✨ AI Summary
- Meta released Llama 3.1 405B, the largest open source model to date, trained on 15T tokens and beating GPT-4 on several benchmarks; the 8B and 70B models also received significant spec bumps
- Synthetic data generation was key to Llama 3's training; the discussion covers pre-training pipelines, scaling laws, and post-training, including RLHF versus instruction tuning
- Thomas Scialom, who led post-training for Llama 2 and 3, discusses tool calling, evals, and the role of synthetic data in training the largest open source AGI models