🦉 Blogs
- 2022.10: 🎉 Transformer Attention Layer Gradient. The full derivation of the Transformer attention gradient. We also compare our derived gradient against the one PyTorch computes to verify its correctness.
- 2022.08: 🎉 CNN Stochastic Gradient Descent. The full derivation of the CNN gradient.