Popular Articles
SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities
SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory
Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset
Building a Large Japanese Web Corpus for Large Language Models
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
K-QA: A Real-World Medical Q&A Benchmark