Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave) — Latent Space: The AI Engineer Podcast

✨ AI Summary

OpenPipe pivoted from distilling GPT-4 into cheaper models to RL-based agent training as frontier model prices dropped, targeting the reliability problems (rather than capability gaps) that the episode cites as the reason 90% of AI projects fail
RULER (Relative Universal LLM-Elicited Rewards) makes RL training more accessible by having an LLM judge rank a group of agent trajectories relative to one another, removing the need for complex hand-engineered reward functions
Kyle Corbitt went from leading YC's Startup School to building OpenPipe, later acquired by CoreWeave; the company's path reflects his view that reinforcement learning, not supervised fine-tuning, is the critical path forward
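
The "relative" part of RULER can be illustrated with a minimal sketch, under loud assumptions: `mock_judge_scores` is a hypothetical stub standing in for an LLM judge call, and `group_relative_advantages` is ordinary GRPO-style group normalization, not OpenPipe's actual RULER code. The point it shows is that when each trajectory's training signal depends only on how its score compares with the rest of its group, the judge never has to produce calibrated absolute rewards.

```python
from statistics import mean, pstdev


def mock_judge_scores(trajectories: list[str]) -> list[float]:
    """Hypothetical stand-in for an LLM judge (not a real RULER API).

    A real judge would see the whole group at once and score each trajectory
    relative to the others; this toy version just rewards trajectories that
    report finishing the task.
    """
    return [1.0 if "task complete" in t else 0.2 for t in trajectories]


def group_relative_advantages(scores: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style normalization within one group: (score - mean) / std."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population std; eps guards against all-equal scores
    return [(s - mu) / (sigma + eps) for s in scores]


if __name__ == "__main__":
    group = [
        "searched inbox, found email, task complete",
        "searched inbox, gave up",
        "hallucinated an answer",
        "searched twice, task complete",
    ]
    scores = mock_judge_scores(group)
    advantages = group_relative_advantages(scores)
    # Shifting every score by the same constant leaves the advantages unchanged,
    # so only the judge's relative ordering and spacing matter.
    shifted = group_relative_advantages([s + 5.0 for s in scores])
    print([round(a, 3) for a in advantages])
    print(all(abs(a - b) < 1e-6 for a, b in zip(advantages, shifted)))
```

Because the advantages are invariant to a constant offset in the judge's scores, a judge that is consistently too generous or too harsh still yields the same training signal, which is one reason relative ranking is easier to get right than absolute reward design.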