[State of RL/Reasoning] IMO/IOI Gold, OpenAI o3/GPT-5, and Cursor Composer — Ashvin Nair, Cursor — Latent Space: The AI Engineer Podcast

✨ AI Summary

Ashvin Nair from Cursor shipped RL breakthroughs on GPT-4o, o1, and o3; the reasoning team scaled from 12 to 300+ people; IOI Gold felt reachable in 2022 but materialized only when o1 shipped
Key insight: RL doesn't generalize beyond its training distribution, so product and model must be co-designed to bring economically useful tasks into distribution rather than overfitting to benchmarks
Cursor's continual learning approach, with policy updates every two hours and bi-directional human-in-the-loop interaction, prevents ADHD-like context-switching and positions Cursor for the next paradigm shift