✨ AI Summary
- Documents the evolution of context window lengths from 84k tokens (MPT-7B) to current 1M+ token models, covering the competitive "Context Extension Campaigns" among frontier labs
- Discusses Mark Huang's techniques for training long-context LLMs at scale, including architectural innovations and training strategies to handle million-token inputs
- Addresses practical challenges in long-context training, such as computational efficiency, positional-interpolation methods (illustrated in the sketch below), and maintaining performance across extended sequences
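
To make the "interpolation methods" bullet concrete, here is a minimal sketch of linear positional interpolation for RoPE-based models, the most common context-extension trick: positions beyond the pretraining window are compressed back into it by a scale factor before the rotary angles are computed. This is an illustrative example, not Gradient's or Mark Huang's actual implementation; the function names, dimensions, and the 4k-to-16k scale factor are assumptions for demonstration.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotary-embedding angles; scale > 1 linearly interpolates positions
    so a context longer than the pretraining window maps back into it."""
    # Per-pair inverse frequencies, as in standard RoPE.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Linear positional interpolation: compress position indices by `scale`.
    pos = np.asarray(positions, dtype=np.float64) / scale
    return np.outer(pos, inv_freq)  # shape: (seq_len, dim // 2)

def apply_rope(x, angles):
    """Rotate consecutive (even, odd) feature pairs of `x` by `angles`."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Hypothetical example: a model pretrained with a 4k window serving 16k
# positions by interpolating with scale = 16_384 / 4_096 = 4.
q = np.random.randn(16_384, 64)
q_rot = apply_rope(q, rope_angles(np.arange(16_384), dim=64, scale=4.0))
```

Variants of the same idea (e.g. NTK-aware scaling, which adjusts the RoPE `base` instead of dividing positions) trade off how much the lowest and highest frequencies are stretched, which affects how well short-context quality is preserved after extension.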