✨ AI Summary
- Kevin Wang et al. won the NeurIPS 2025 Best Paper award by scaling RL networks to 1,000 layers, defying a decade of conventional wisdom that depth fails in RL
- Key insight: self-supervised RL using contrastive learning on state/action/future representations scales where value-based methods collapse; architecture matters (residual connections, layer norm)
- Scaling depth proves more parameter-efficient than scaling width (parameters grow linearly with depth but quadratically with width); the shift from regression to classification objectives enabled the breakthrough
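
The architectural ingredients and the depth-vs-width parameter arithmetic above can be sketched briefly. This is an illustrative toy (NumPy, plain MLP blocks), not the paper's actual implementation; `layer_norm`, `residual_block`, and `mlp_params` are hypothetical helper names:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector to zero mean, unit variance.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def residual_block(x, W):
    # Pre-norm residual block: x + f(LN(x)). The identity skip path is
    # what lets gradients survive hundreds of stacked layers.
    return x + np.maximum(layer_norm(x) @ W, 0.0)

def mlp_params(depth, width):
    # Each hidden layer holds a width x width weight matrix plus a bias,
    # so parameter count is linear in depth but quadratic in width.
    return depth * (width * width + width)

base = mlp_params(16, 256)
deep = mlp_params(64, 256)   # 4x depth  -> exactly 4x the parameters
wide = mlp_params(16, 1024)  # 4x width  -> roughly 16x the parameters
```

Doubling depth doubles the parameter count, while doubling width roughly quadruples it, which is why very deep, moderately wide networks are the cheaper way to add capacity.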