How to train your own Large Multimodal Model — with Hugo Laurençon & Leo Tronchon of HuggingFace M4 — Latent Space: The AI Engineer Podcast

✨ AI Summary

Hugo Laurençon and Leo Tronchon (HuggingFace M4) explain how to build open-source multimodal models by combining existing LLMs and vision encoders with adapter layers.
They discuss how DeepMind's Flamingo inspired cheaper alternatives such as LLaVA, BakLLaVA, and FireLLaVA, and why Flamingo wasn't open-sourced.
They also cover LAION's contributions to open-source multimodal research and the democratization of access to vision-language model training techniques.
