AI Summary
- Nathan Lambert provides a deep dive into RLHF (Reinforcement Learning from Human Feedback), explaining how transformer models go from next-token predictors to helpful, honest assistants
- Covers the "shoggoth mask factory" concept and training techniques such as DPO (Direct Preference Optimization), used in Tulu 2 and other open-source models
- Educational survey episode on one of the most critical alignment and training techniques in modern LLM development