AI Summary
- Jeffrey Wang and Joe Reeve propose RLHB (Reinforcement Learning from Human Behavior) as an alternative to RLHF, using implicit behavioral signals instead of explicit feedback
- Highlights the difficulty of collecting high-quality explicit human feedback (even 15,000 items is resource-intensive) and low user engagement with explicit feedback UI
- Explores using behavioral data and implicit signals to scale reward modeling without requiring millions of explicitly annotated items