Batch normalisation part 2 – day 26

Introduction to Batch Normalization

Batch normalization is a widely used technique in deep learning that significantly improves the performance and stability of neural networks. Introduced by Sergey Ioffe and Christian Szegedy in 2015, it addresses the vanishing and exploding gradients that can occur during training, particularly in deep networks.

Why Batch Normalization?

As training updates the parameters of earlier layers, the distribution of inputs to layers deeper in the network shifts, a phenomenon known as internal covariate shift. This shift can cause vanishing gradients, where gradients become too small and training slows down, or exploding gradients, where they become too large and training becomes unstable. Traditional solutions such as careful initialization and lower learning rates help, but they don't entirely solve these problems.

What is Batch Normalization?

Batch normalization (BN) mitigates these issues by normalizing the inputs of each layer within a mini-batch, so that the inputs to a given layer have a consistent distribution. This normalization happens just before or after the activation function of each hidden layer. Here's a step-by-step breakdown of how batch normalization works:

Zero-Centering and Normalization:

\[ \mu_B = \frac{1}{m_B}…
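The excerpt is cut off before the formulas are complete, so the following is only a minimal sketch of the zero-centering and normalization step, based on the standard formulation from the 2015 paper and not on the article's own text. The function name `batch_norm_1d`, the learnable scale `gamma` and shift `beta`, and the smoothing term `eps` are assumptions introduced for illustration. It handles a single feature across one mini-batch in plain Python.

```python
import math

def batch_norm_1d(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift.

    Hypothetical helper: gamma, beta and eps follow the standard
    batch-normalization formulation, not the truncated source text.
    """
    m = len(values)
    mean = sum(values) / m                           # mini-batch mean
    var = sum((v - mean) ** 2 for v in values) / m   # biased mini-batch variance
    inv_std = 1.0 / math.sqrt(var + eps)             # eps avoids division by zero
    # Zero-center, normalize, then apply the scale and shift.
    return [gamma * (v - mean) * inv_std + beta for v in values]

batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm_1d(batch)
```

With the default `gamma=1.0` and `beta=0.0`, the output has a mean of approximately zero and a variance just under one (slightly below because of `eps`). At inference time, frameworks typically replace the mini-batch statistics with running averages collected during training, which this sketch does not cover.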
