Neural intel Pod

Group Sequence Policy Optimization for LLMs

Aug 1, 2025 · 00:32:58
AI Summary
  • Introduces Group Sequence Policy Optimization (GSPO) for LLM training.
  • Contrasts GSPO with GRPO, tracing GRPO's training instability to its token-level importance sampling.
  • Defines importance ratios from the likelihood of the entire sequence rather than individual tokens, which stabilizes training.
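
To make the token-level versus sequence-level distinction concrete, here is a minimal sketch in Python. It assumes we already have per-token log-probabilities of one sampled response under the old and the current policy. The function names and sample numbers are hypothetical, and the length normalization (averaging the log-ratio over tokens) follows our reading of the published GSPO formulation rather than anything stated in this summary.

```python
import math


def token_ratios(new_logps, old_logps):
    """Token-level importance ratios (GRPO-style): one ratio per token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_ratio(new_logps, old_logps):
    """Sequence-level importance ratio (GSPO-style, as assumed here).

    Computes exp(mean_t(log pi_new(y_t) - log pi_old(y_t))), i.e. the
    sequence likelihood ratio raised to the power 1 / sequence_length.
    """
    if len(new_logps) != len(old_logps) or not new_logps:
        raise ValueError("log-prob lists must be non-empty and equal length")
    total = sum(n - o for n, o in zip(new_logps, old_logps))
    return math.exp(total / len(new_logps))


# Hypothetical per-token log-probabilities for a three-token response.
old_logps = [-1.0, -2.0, -0.5]
new_logps = [-0.9, -2.4, -0.5]

print(token_ratios(new_logps, old_logps))    # three separate ratios
print(sequence_ratio(new_logps, old_logps))  # one ratio for the whole response
```

Under this sketch the sequence-level ratio is the geometric mean of the token-level ratios, so a single outlier token moves it far less than it would move that token's own ratio.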
