# Demystifying Trainable and Non-Trainable Parameters in Batch Normalization

Batch normalization (BN) is a powerful technique used in deep learning to stabilize and accelerate training. The core idea behind BN is to normalize the output of a previous layer by subtracting the batch mean and dividing by the batch standard deviation. This is expressed by the following general formula:

\[\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}\]
\[y = \gamma \hat{x} + \beta\]

Where:

- \( x \) is the input to the batch normalization layer.
- \( \mu_B \) and \( \sigma_B^2 \) are the mean and variance of the current mini-batch, respectively.
- \( \epsilon \) is a small constant added to avoid division by zero.
- \( \hat{x} \) is the normalized output.
- \( \gamma \) and \( \beta \) are learnable parameters that scale and shift the normalized output.

## Why This Formula is Helpful

The normalization step ensures that the input to each layer has a consistent distribution, which addresses the problem of "internal covariate shift," where the distribution of inputs to a layer changes during training. By maintaining a stable distribution, the training process becomes more efficient, requiring less careful initialization of parameters and allowing for higher learning rates. The addition of \(…
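To make the two formulas above concrete, here is a minimal NumPy sketch of the training-time forward pass. The function name `batch_norm_forward`, the `(batch_size, num_features)` input shape, and the epsilon value are illustrative assumptions rather than any particular framework's API.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization over a mini-batch (illustrative sketch).

    x     : array of shape (batch_size, num_features)
    gamma : learnable scale, shape (num_features,)
    beta  : learnable shift, shape (num_features,)
    """
    mu = x.mean(axis=0)                     # batch mean, mu_B
    var = x.var(axis=0)                     # batch variance, sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize: zero mean, unit variance
    y = gamma * x_hat + beta                # scale and shift with learnable parameters
    return y

# Usage: normalize a small random batch
x = np.random.randn(4, 3) * 5.0 + 2.0   # inputs with nonzero mean and large variance
gamma = np.ones(3)                       # trainable scale, typically initialized to 1
beta = np.zeros(3)                       # trainable shift, typically initialized to 0
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0), y.std(axis=0))     # approximately 0 and 1 per feature
```

With \( \gamma = 1 \) and \( \beta = 0 \), the layer initially outputs the plain normalized activations; during training, gradient descent updates these two parameters so the network can recover whatever scale and shift of the activations works best.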