✨ AI Summary
- Attention Residuals (AttnRes) replace fixed additive residuals with learned, input-dependent softmax attention.
- AttnRes treats model depth like a Transformer sequence, addressing the 'PreNorm dilution' problem, where each layer's relative contribution shrinks as the residual stream grows with depth.
- This new architecture from the Kimi Team offers a potential improvement over standard residual connections.
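The idea in the bullets above can be sketched as follows. This is a minimal, hypothetical illustration (the function and projection names are not from the Kimi Team's paper): instead of the fixed residual `x + f(x)`, the current layer's state forms a query that attends over the stack of all previous layers' outputs, producing an input-dependent softmax mixture over depth.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attn_residual(history, w_q, w_k):
    """Hypothetical sketch of an attention residual.

    history: (depth, d) stack of per-layer hidden states for one token,
             where history[-1] is the newest layer's output.
    w_q, w_k: learned projection matrices (assumed names, not from the paper).
    """
    q = history[-1] @ w_q                    # query from the newest state, shape (d,)
    k = history @ w_k                        # keys from every layer so far, (depth, d)
    scores = k @ q / np.sqrt(len(q))         # scaled dot-product over the depth axis
    weights = softmax(scores)                # input-dependent mixing weights, sum to 1
    return weights @ history                 # learned mixture instead of a fixed sum

rng = np.random.default_rng(0)
d = 8
hist = rng.standard_normal((4, d))           # 4 layers deep, width 8
out = attn_residual(hist, rng.standard_normal((d, d)), rng.standard_normal((d, d)))
print(out.shape)
```

Because the mixing weights depend on the input, different tokens can draw on different depths, which is the claimed advantage over a fixed additive residual.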