Popular articles

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Scenarios and Approaches for Situated Natural Language Explanations

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

3 weeks ago

SimPO: Simple Preference Optimization with a Reference-Free Reward

4 weeks ago

Hallucination of Multimodal Large Language Models: A Survey

1 month ago

Instruction-Following Evaluation for Large Language Models

2 months ago

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

10 months ago