✨ AI Summary
- Tianqi Chen of CMU/OctoML discusses MLC (Machine Learning Compilation), which lets LLMs run on consumer hardware without data-center GPUs (iPhones, browsers, AMD cards) through compiler-based optimization
- The MLC Chat and WebLLM projects demonstrate practical deployment of 70B models on edge devices at 30+ tokens/sec, addressing GPU scarcity challenges
- The compilation-based approach unlocks alternative compute paths and reduces infrastructure costs while maintaining competitive inference performance on consumer hardware