AI Summary
- Hugo Laurençon and Leo Tronchon (Hugging Face M4) explain how to build open-source multimodal models by connecting an existing LLM to an existing vision encoder through adapter layers
- Discusses how DeepMind's Flamingo inspired cheaper alternatives such as LLaVA, BakLLaVA, and FireLLaVA, and why Flamingo itself was not open-sourced
- Covers LAION's contributions to open-source multimodal research and to democratizing access to vision-language model training techniques
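
To make the adapter idea in the first bullet concrete, here is a minimal sketch of one common variant: a single linear projection that maps vision-encoder features into the LLM's embedding space, in the spirit of LLaVA-style models. It is an illustration only, not the speakers' implementation. The function names, the tiny dimensions, and the random values are all hypothetical, and treating the vision encoder and LLM as frozen is an assumption of this sketch.

```python
import random

def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random floats (stand-in for weights or features)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def project(features, weight):
    """Multiply an (n x d_in) feature matrix by a (d_in x d_out) weight matrix."""
    d_out = len(weight[0])
    return [
        [sum(f * weight[k][j] for k, f in enumerate(row)) for j in range(d_out)]
        for row in features
    ]

def build_llm_input(image_features, text_embeddings, adapter_weight):
    """Project image features into the LLM embedding space and prepend them to the text tokens."""
    image_tokens = project(image_features, adapter_weight)
    return image_tokens + text_embeddings

rng = random.Random(0)
VISION_DIM, LLM_DIM = 8, 16  # hypothetical sizes, far smaller than real models

# Stand-ins for the outputs of the two pretrained components (assumed frozen here).
image_features = make_matrix(4, VISION_DIM, rng)   # 4 image patches from the vision encoder
text_embeddings = make_matrix(3, LLM_DIM, rng)     # 3 text tokens from the LLM embedding table

# The adapter: the only newly introduced weights in this sketch.
adapter_weight = make_matrix(VISION_DIM, LLM_DIM, rng)

sequence = build_llm_input(image_features, text_embeddings, adapter_weight)
print(len(sequence), len(sequence[0]))  # prints: 7 16
```

The point of the design is that the two expensive pretrained components are reused as they are, and only the small adapter has to learn how to translate between them. Flamingo-style models connect the two with cross-attention layers rather than a single projection, so this sketch covers only the simpler of the two approaches.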