Popular articles
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Scenarios and Approaches for Situated Natural Language Explanations
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
SimPO: Simple Preference Optimization with a Reference-Free Reward
Hallucination of Multimodal Large Language Models: A Survey
Instruction-Following Evaluation for Large Language Models
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities