Batch Normalization – Trainable and Non-Trainable Parameters – Day 27

Demystifying Trainable and Non-Trainable Parameters in Batch Normalization

Batch normalization (BN) is a powerful technique used in deep learning to stabilize and accelerate training. The core idea behind BN is to normalize the output of a previous layer by subtracting the batch mean and dividing by the batch standard deviation. This is expressed by the following general formula:

\[\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}\]
\[y = \gamma \hat{x} + \beta\]

Where:

- \( x \) is the input to the batch normalization layer.
- \( \mu_B \) and \( \sigma_B^2 \) are the mean and variance of the current mini-batch, respectively.
- \( \epsilon \) is a small constant added to avoid division by zero.
- \( \hat{x} \) is the normalized output.
- \( \gamma \) and \( \beta \) are learnable parameters that scale and shift the normalized output.

Why This Formula is Helpful

The normalization step ensures that the input to each layer has a consistent distribution, which addresses the problem of "internal covariate shift" — where the distribution of inputs to a layer changes during training. By maintaining a stable distribution, the training process becomes more efficient, requiring less careful initialization of parameters and allowing for higher learning rates. The addition of \(…
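To make the two kinds of parameters concrete, here is a minimal NumPy sketch of a BN forward pass. It is an illustration, not any framework's implementation: \( \gamma \) and \( \beta \) are the trainable scale and shift, while the running mean and variance (updated by an exponential moving average, a common convention) are non-trainable statistics used only at inference time. The function name and `momentum` value are my own choices for the example.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, running_mean, running_var,
                       momentum=0.9, eps=1e-5, training=True):
    """Batch-normalize x over axis 0 (the batch dimension).

    gamma, beta              -> trainable (learned by gradient descent)
    running_mean, running_var -> non-trainable (tracked, not learned)
    """
    if training:
        mu = x.mean(axis=0)    # batch mean, mu_B
        var = x.var(axis=0)    # batch variance, sigma_B^2
        # Update the non-trainable running statistics.
        running_mean = momentum * running_mean + (1 - momentum) * mu
        running_var = momentum * running_var + (1 - momentum) * var
    else:
        # At inference, use the accumulated statistics instead.
        mu, var = running_mean, running_var

    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    y = gamma * x_hat + beta               # trainable scale and shift
    return y, running_mean, running_var

# Usage: a mini-batch of 8 samples with 3 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
gamma, beta = np.ones(3), np.zeros(3)   # trainable parameters
rm, rv = np.zeros(3), np.ones(3)        # non-trainable statistics
y, rm, rv = batch_norm_forward(x, gamma, beta, rm, rv)
```

With \( \gamma = 1 \) and \( \beta = 0 \), the output `y` has (approximately) zero mean and unit variance per feature, which is exactly the normalization the formula above describes.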
