Popular Articles
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Deep reinforcement learning from human preferences
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
RLIF: Interactive Imitation Learning as Reinforcement Learning
Large Language Models Open New Way of AI-Assisted Molecule Design for Chemists
From r to Q∗: Your Language Model is Secretly a Q-Function