Popular Articles

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

2 months ago

Deep reinforcement learning from human preferences

1 month ago

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

1 month ago

RLIF: Interactive Imitation Learning as Reinforcement Learning

1 month ago

Large Language Models Open New Way of AI-Assisted Molecule Design for Chemists

2 months ago

From r to Q∗: Your Language Model is Secretly a Q-Function

2 months ago