AI Summary
- OpenPipe pivoted from distilling GPT-4 into cheaper models to RL-based agent training as frontier model prices dropped. The new focus targets reliability, which, rather than capability, is cited as the reason 90% of AI projects fail
- RULER (Relative Universal LLM-Elicited Rewards) makes RL training more accessible: an LLM judge ranks agent behaviors relative to one another, eliminating complex reward engineering
- Kyle Corbitt went from leading YC's Startup School to building OpenPipe, a company acquired by CoreWeave, reflecting a shift from supervised fine-tuning to reinforcement learning as the critical path forward
Guests on This Episode
Kyle Corbitt