Popular Articles

SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities

2 months ago

SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory

1 month ago

Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset

1 month ago

Building a Large Japanese Web Corpus for Large Language Models

2 months ago

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

5 months ago

K-QA: A Real-World Medical Q&A Benchmark

5 months ago