✨ AI Summary
- Tianqi Chen of CMU/OctoML discusses MLC (Machine Learning Compilation), which lets LLMs run on consumer hardware without data-center GPUs (iPhones, browsers, AMD cards) through compiler-based optimization
- The MLC Chat and WebLLM projects demonstrate practical deployment of 70B models on edge devices at 30+ tokens/sec, addressing GPU scarcity challenges
- The compilation-based approach unlocks alternative compute paths and reduces infrastructure costs while maintaining competitive inference performance on consumer hardware