PodcastIntel
Latent Space: The AI Engineer Podcast

[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI

Dec 31, 2025 · 27m
AI Summary
  • Josh McGrath of OpenAI traces the evolution of post-training from the 2023 debates over PPO versus DPO to the current RLVR (reinforcement learning with verifiable rewards) era, in which data quality and trust in the reward signal matter more than the choice of optimization method.
  • RLHF and RLVR are both policy-gradient methods; they differ in their input data (verifiable signals, such as checkable math answers, versus human preferences). GRPO, introduced in DeepSeekMath, represents an underappreciated shift toward trustworthy rewards.
  • For scaling, token efficiency now matters more than wall-clock time: moving from GPT-5 to GPT-5.1 improved eval scores while reducing token usage. Codex changed workflows from 40 minutes of design work to 15-minute agent sprints.
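
The episode summary contains no code. What follows is a minimal sketch of the second bullet's idea, assuming a binary exact-match checker as the "verifiable" reward. The function names and the answer-parsing rule (take the text after the last "=") are hypothetical illustrations. The sketch covers only GRPO's group-relative advantage, where each sampled completion is scored against the other completions for the same prompt, paired with a plain REINFORCE-style surrogate loss. It omits the clipped probability ratio and the KL penalty that appear in the DeepSeekMath formulation, and it is not a description of OpenAI's pipeline.

```python
from statistics import mean, pstdev


def verifiable_reward(completion: str, reference: str) -> float:
    """Hypothetical binary checker: 1.0 if the final answer matches the reference.

    Assumption for illustration: the answer is the text after the last "=".
    """
    answer = completion.rsplit("=", 1)[-1].strip()
    return 1.0 if answer == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward by its own group's mean and std.

    If every completion in the group earns the same reward, all advantages are 0,
    so that group contributes no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def policy_gradient_loss(advantages: list[float], logprobs: list[float]) -> float:
    """Simplified surrogate loss: mean of -A_i * log pi(completion_i).

    No ratio clipping and no KL term (both present in full GRPO).
    """
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(advantages)


# Toy group: four sampled completions for the prompt "2 + 2".
completions = ["2 + 2 = 4", "2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

# Hypothetical sequence log-probabilities under the current policy.
logprobs = [-1.0, -2.0, -1.5, -3.0]
loss = policy_gradient_loss(advantages, logprobs)
```

Swapping `verifiable_reward` for a learned preference model's score would leave the update itself unchanged. This is the sense in which RLHF and RLVR differ mainly in their input data rather than in the optimizer.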

Guests on This Episode

Josh McGrath
