ML Math

AI Learning Notes

Worked derivations for core ML building blocks — forward passes, gradients, and intuitions.

Loss Functions

Softmax probabilities, numerically stable forward pass, and the clean p − y gradient derivation through the combined softmax and cross-entropy loss.
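As a quick illustration of the p − y result, here is a minimal sketch in plain Python. It assumes the loss is cross-entropy against a single integer class label; the function names (`softmax`, `cross_entropy`, `grad_logits`, `numeric_grad`) are illustrative and not taken from the notes.

```python
import math

def softmax(logits):
    # Subtracting the max logit leaves the result unchanged mathematically
    # but keeps exp() from overflowing on large inputs.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Loss = -log p[target], written as log-sum-exp minus the target logit.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]

def grad_logits(logits, target):
    # Analytic gradient w.r.t. the logits: p - y, with y one-hot at `target`.
    p = softmax(logits)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, used only to sanity-check the analytic form.
    grads = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grads.append((f(xp) - f(xm)) / (2 * eps))
    return grads
```

Comparing `grad_logits(z, t)` against `numeric_grad(lambda v: cross_entropy(v, t), z)` for a few small inputs is a cheap way to confirm the derivation; the entries of p − y also sum to zero, since both p and y sum to one.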

Full forward and backward pass for layer normalization, including the three-term dx formula through mean, variance, and normalized input.
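The three-term structure can be sketched for a single feature vector as follows. This assumes the entry describes layer normalization with a biased (divide-by-n) variance and a learned scale `gamma` and shift `beta`; the function names are illustrative only.

```python
import math

def layernorm_forward(x, gamma, beta, eps=1e-5):
    # Normalize one vector across its features, then scale and shift.
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(xi - mu) * inv_std for xi in x]
    y = [g * xh + b for g, xh, b in zip(gamma, x_hat, beta)]
    return y, x_hat, inv_std

def layernorm_backward_dx(dy, gamma, x_hat, inv_std):
    # dx = inv_std * (dx_hat - mean(dx_hat) - x_hat * mean(dx_hat * x_hat))
    # Term 1: direct path; term 2: path through the mean;
    # term 3: path through the variance.
    n = len(dy)
    dx_hat = [d * g for d, g in zip(dy, gamma)]
    mean_dxh = sum(dx_hat) / n
    mean_dxh_xh = sum(a * b for a, b in zip(dx_hat, x_hat)) / n
    return [inv_std * (dxh - mean_dxh - xh * mean_dxh_xh)
            for dxh, xh in zip(dx_hat, x_hat)]
```

Because the output does not change when a constant is added to every input, the entries of dx sum to zero, which makes a handy sanity check alongside finite differences.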

Simpler two-term dx derivation. Includes a side-by-side comparison with LayerNorm and a feature table.
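A minimal sketch of why only two terms remain, assuming this entry refers to RMS normalization (no mean subtraction, scale by the root mean square, learned gain `gamma`). The names below are illustrative and not taken from the notes.

```python
import math

def rmsnorm_forward(x, gamma, eps=1e-5):
    # Scale by the reciprocal root-mean-square; there is no mean subtraction.
    n = len(x)
    mean_sq = sum(xi * xi for xi in x) / n
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    x_hat = [xi * inv_rms for xi in x]
    y = [g * xh for g, xh in zip(gamma, x_hat)]
    return y, x_hat, inv_rms

def rmsnorm_backward_dx(dy, gamma, x_hat, inv_rms):
    # dx = inv_rms * (dx_hat - x_hat * mean(dx_hat * x_hat))
    # With no mean to differentiate through, the mean(dx_hat) term that
    # appears in layer normalization drops out, leaving two terms.
    n = len(dy)
    dx_hat = [d * g for d, g in zip(dy, gamma)]
    mean_dxh_xh = sum(a * b for a, b in zip(dx_hat, x_hat)) / n
    return [inv_rms * (dxh - xh * mean_dxh_xh)
            for dxh, xh in zip(dx_hat, x_hat)]
```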

Full backward pass normalizing across the batch dimension, including training-time batch statistics versus inference-time running statistics, and a comparison with LayerNorm.
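The training versus inference distinction can be sketched for a single feature as below. This is an assumption-laden toy: the class name `BatchNormOneFeature` is hypothetical, the running averages use an exponential update with `momentum` weighting the newest batch, the variance is the biased (divide-by-n) estimate throughout, and the learned scale and shift are omitted.

```python
import math

class BatchNormOneFeature:
    """Toy batch normalization over a list of scalars (one feature)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0

    def forward(self, batch, training):
        if training:
            # Training: normalize with statistics of the current batch
            # and fold them into the running averages.
            n = len(batch)
            mean = sum(batch) / n
            var = sum((v - mean) ** 2 for v in batch) / n
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Inference: use the stored running statistics, so the output
            # for one example does not depend on the rest of the batch.
            mean, var = self.running_mean, self.running_var
        inv_std = 1.0 / math.sqrt(var + self.eps)
        return [(v - mean) * inv_std for v in batch]
```

In this sketch, inference works even with a batch of size one, whereas training-mode statistics need several examples to be meaningful.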

Swish activation derivative, gated hidden state gradients, and weight gradients for all three projection matrices.
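For the activation and gating part, a scalar sketch may help. It assumes Swish is x·σ(x) and that the gated hidden state is swish(a)·b, where `a` and `b` stand for the gate and value pre-activations; those names and the helper functions are illustrative. The weight gradients for the projection matrices are not shown here.

```python
import math

def sigmoid(x):
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def swish(x):
    return x * sigmoid(x)

def swish_grad(x):
    # d/dx [x * s(x)] = s(x) + x * s(x) * (1 - s(x)) = s(x) * (1 + x * (1 - s(x)))
    s = sigmoid(x)
    return s * (1.0 + x * (1.0 - s))

def gated_hidden(a, b):
    # Gated hidden state: activated gate branch times linear value branch.
    return swish(a) * b

def gated_hidden_backward(a, b, dh):
    # Product rule: each branch's gradient is scaled by the other branch.
    da = dh * swish_grad(a) * b
    db = dh * swish(a)
    return da, db
```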

Per-pair rotation forward pass, the relative-position dot-product property, and the symmetric backward pass via transposed rotation.
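The rotation and its two properties can be sketched in a few lines. This assumes rotary position embeddings with interleaved pairs (x[2i], x[2i+1]) and per-pair frequencies base^(−2i/d); the pairing layout, the `base` default, and the function names are assumptions for illustration.

```python
import math

def rope(x, pos, base=10000.0):
    # Rotate each consecutive pair (x[2i], x[2i+1]) by angle pos * freq_i.
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        angle = pos * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def rope_backward(grad_out, pos, base=10000.0):
    # A rotation matrix is orthogonal, so its transpose is the rotation
    # by the negated angle; the backward pass reuses the forward code.
    return rope(grad_out, -pos, base)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

With these definitions, `dot(rope(q, m), rope(k, n))` depends only on the offset between `m` and `n`, so shifting both positions by the same amount leaves the score unchanged.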