RLHF 201 - with Nathan Lambert of AI2 and Interconnects — Latent Space: The AI Engineer Podcast

✨ AI Summary

Nathan Lambert provides a deep dive into RLHF (Reinforcement Learning from Human Feedback), explaining how transformer models transition from next-token prediction to helpful, honest assistants.
Covers the "shoggoth mask factory" concept and training techniques such as DPO (Direct Preference Optimization), which is used in Tulu 2 and other open-source models.
An educational survey episode on one of the most critical alignment and training techniques in modern LLM development.
