Batch normalisation – trainable and non trainable – day 27
Demystifying Trainable and Non-Trainable Parameters in Batch Normalization Batch normalization (BN) is a powerful technique used in deep learning to stabilize and accelerate training. The core idea behind BN is to normalize the output of a previous layer by subtracting the batch mean and dividing by the batch standard deviation. This is expressed by the following general formula: \[\hat{x} = \frac{x – \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}\]\[y = \gamma \hat{x} + \beta\] Where: Why This Formula is Helpful The normalization step ensures that the input to each layer has a consistent distribution, which addresses the problem of “internal covariate shift”—where the distribution of inputs to a layer changes during training. By maintaining a stable distribution, the training process becomes more efficient, requiring less careful initialization of parameters and allowing for higher learning rates. The addition of \( \gamma \) and \( \beta \) parameters allows the model to restore the capacity of the network to represent the original data distribution. This means that the model can learn any representation it could without normalization, but with the added benefits of stabilized and accelerated training. The use of batch normalization has been shown empirically to result in faster convergence and improved model performance, particularly...