Popular articles tagged "#AgentBench" | note: create, connect, deliver.

Observational Scaling Laws and the Predictability of Language Model Performance

4 months ago

AgentBench: Evaluating LLMs as Agents

1 year ago