PodcastIntel
Latent Space: The AI Engineer Podcast

LLMs Everywhere: Running 70B models in browsers and iPhones using MLC — with Tianqi Chen of CMU / OctoML

Aug 10, 2023 · 52m
AI Summary
  • Tianqi Chen of CMU/OctoML discusses MLC (Machine Learning Compilation), which uses compiler optimizations to run LLMs on consumer hardware beyond dedicated NVIDIA GPUs: on iPhones, in browsers, and on AMD cards
  • The MLC Chat and WebLLM projects demonstrate practical deployment of models up to 70B parameters on edge devices at 30+ tokens/sec, offering one answer to GPU scarcity
  • A compilation-based approach unlocks alternative compute paths and cuts infrastructure costs while keeping inference performance on consumer hardware competitive
