✨ AI Summary
- AI inference costs fell 10-100x in 2024: open models such as Llama 3.1 405B cost about $3/mtok versus $30/mtok for Claude 3 Opus, and frontier-model prices dropped roughly 400x between 2022 and 2024
- Inference speed improved 4-8x annually: Cerebras Inference runs 70B models at 450 tok/s, and platforms such as Gemini Flash and Cerebras offer 1M free tokens/day for personal use
- Hardware improvements, quantization, and synthetic-data distillation are the three drivers behind roughly 3000x gains in AI efficiency across time, cost, and speed
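
The price figures above imply straightforward per-token arithmetic. A minimal sketch, using the two prices cited in the summary as assumed list prices (real pricing varies by provider and tier):

```python
# Illustrative cost arithmetic for the per-token prices cited above.
# Prices are in $ per million tokens ("mtok"); values are assumptions
# taken from the summary, not live provider pricing.
PRICE_PER_MTOK = {
    "Llama 3.1 405B (open, hosted)": 3.00,
    "Claude 3 Opus": 30.00,
}

def cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost of processing `tokens` tokens at `price_per_mtok` $/1M tokens."""
    return tokens / 1_000_000 * price_per_mtok

# Cost of processing 10M tokens at each price point.
for name, price in PRICE_PER_MTOK.items():
    print(f"{name}: ${cost_usd(10_000_000, price):.2f}")

# The ~10x price gap between the two models cited in the summary.
ratio = PRICE_PER_MTOK["Claude 3 Opus"] / PRICE_PER_MTOK["Llama 3.1 405B (open, hosted)"]
print(ratio)  # 10.0
```

At these assumed prices, a 10M-token workload costs $30 on the open model versus $300 on the frontier model, which is where the order-of-magnitude framing comes from.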