AI Summary
- Jeffrey Wang and Joe Reeve propose RLHB (Reinforcement Learning from Human Behavior) as an alternative to RLHF, using implicit behavioral signals instead of explicit feedback
- Highlights the difficulty of collecting high-quality explicit human feedback (even 15,000 items is resource-intensive) and low user engagement with explicit feedback UI
- Explores using behavioral data and implicit signals to scale reward modeling without requiring millions of explicitly annotated items