Latent Space: The AI Engineer Podcast

ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Jun 10, 2024 · 4h 29m
AI Summary
  • Discusses code editing, the WebArena and Sotopia agent benchmarks, the OpenDevin agent framework, and tensions between academic research and industry implementation of AI systems
  • Covers SWE-bench for software engineering tasks, methods for detecting dataset contamination, the GAIA benchmark, and Moritz Hardt's research on the science of benchmarking
  • Explores the Self-RAG approach in the context of reasoning and post-training, examining how LLMs can learn to retrieve, generate, and critique through self-reflection

Guests on This Episode

Aman Sanger
1 podcast appearance
Graham Neubig
1 podcast appearance
Moritz Hardt
1 podcast appearance
