Popular articles tagged "#LLM評価" (LLM evaluation) | note: create, connect, deliver.

CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

3 months ago

SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory

4 months ago

Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences

4 months ago

A RAG Method for Source Code Inquiry Tailored to Long-Context LLMs

5 months ago

InFoBench: Evaluating Instruction Following Ability in Large Language Models

9 months ago

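arxiv.org/abs/2308.16890 Background: few studies evaluate LVLMs in fine-grained detail on conversational skills and visual storytelling. Proposal: to assess LVLM capabilities comprehensively, the authors introduce TouchStone, a visual dialogue dataset covering 27 subtasks, and run the evaluation using an LLM as the judge.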

1 year ago