Worked derivations for core ML building blocks — forward passes, gradients, and intuitions.
Full forward and backward pass including the three-term dx formula through mean, variance, and normalized input.
Simpler two-term dx derivation. Includes a side-by-side comparison with LayerNorm and a feature table.
Full backward pass normalizing across the batch dimension, including training vs. inference running statistics and a comparison with LayerNorm.