Popular Articles

Observational Scaling Laws and the Predictability of Language Model Performance

4 months ago

AgentBench: Evaluating LLMs as Agents