AI Summary
- Accuracy alone does not determine how well a reward model serves RLHF; other properties of the model also affect optimization efficiency.
- Low reward variance slows optimization by creating a flat objective landscape.
- Effective reward models require diverse, informative feedback.
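
To make the second point concrete, here is a minimal sketch, not the source's method. It assumes a toy softmax policy over three candidate outputs and two hypothetical reward models that rank the outputs identically but differ in how far apart their scores are. All names and numbers (`spread_rewards`, `flat_rewards`, the uniform starting policy) are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(probs, rewards):
    """Expected reward of the policy: sum_i p_i * r_i."""
    return sum(p * r for p, r in zip(probs, rewards))


def reward_variance(probs, rewards):
    """Variance of the reward under the policy's output distribution."""
    mean = expected_reward(probs, rewards)
    return sum(p * (r - mean) ** 2 for p, r in zip(probs, rewards))


def gradient_norm(probs, rewards):
    """Norm of the gradient of expected reward w.r.t. the softmax logits.

    For a softmax policy, d E[r] / d theta_j = p_j * (r_j - E[r]).
    """
    mean = expected_reward(probs, rewards)
    grad = [p * (r - mean) for p, r in zip(probs, rewards)]
    return math.sqrt(sum(g * g for g in grad))


# Hypothetical setup: a uniform policy over three candidate outputs.
probs = softmax([0.0, 0.0, 0.0])

# Two illustrative reward models with the same ranking of outputs
# (so the same ranking accuracy) but very different score spread.
spread_rewards = [0.0, 0.5, 1.0]
flat_rewards = [0.49, 0.50, 0.51]

for name, rewards in [("spread", spread_rewards), ("flat", flat_rewards)]:
    print(
        name,
        "variance:", reward_variance(probs, rewards),
        "gradient norm:", gradient_norm(probs, rewards),
    )
```

In this toy setting both reward models order the outputs the same way, yet the one with tightly clustered scores yields a much smaller gradient, so each policy update moves the policy less. That is the sense in which low reward variance flattens the objective.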