AI Summary
- Policy entropy declines rapidly in RL for LLMs, limiting exploration.
- Performance gains track entropy reduction closely, so performance hits a ceiling once entropy is exhausted.
- A new analysis links the change in policy entropy to the covariance between an action's probability and the change in its logit.
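
The covariance link in the last bullet can be checked numerically on a toy case. The sketch below is an illustration, not the summarized work's own code. It assumes a single-state softmax policy whose logits are perturbed directly, and uses the standard first-order identity that the entropy change is approximately minus the covariance, under the policy, between log-probability and logit change. The helper names (`softmax`, `entropy`, `predicted_entropy_change`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Shannon entropy in nats; assumes every probability is strictly positive.
    return float(-(p * np.log(p)).sum())

def predicted_entropy_change(z, dz):
    # First-order estimate: dH ~= -Cov_{a ~ pi}(log pi(a), dz(a)).
    # Assumption: softmax policy with logits z, perturbed directly by dz.
    p = softmax(z)
    logp = np.log(p)
    cov = (p * logp * dz).sum() - (p * logp).sum() * (p * dz).sum()
    return float(-cov)

# Random small perturbation: the estimate should match the actual change
# up to second-order terms in dz.
rng = np.random.default_rng(0)
z = rng.normal(size=8)
dz = 1e-4 * rng.normal(size=8)
actual = entropy(softmax(z + dz)) - entropy(softmax(z))
predicted = predicted_entropy_change(z, dz)
print(f"actual={actual:.3e} predicted={predicted:.3e}")

# Raising the logit of the already most likely action gives a positive
# covariance, so entropy falls.
z_peaked = np.array([2.0, 0.0, 0.0])
dz_peaked = np.array([0.01, 0.0, 0.0])
actual_peaked = entropy(softmax(z_peaked + dz_peaked)) - entropy(softmax(z_peaked))
predicted_peaked = predicted_entropy_change(z_peaked, dz_peaked)
```

In this toy setting, updates that favor actions the policy already prefers have positive covariance and reduce entropy, which is consistent with the rapid entropy decline described in the first bullet.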