Momentum vs Normalization in Deep Learning – Part 2 – Day 34

Comparing Momentum and Normalization in Deep Learning: A Mathematical Perspective. Momentum and normalization are two pivotal techniques in deep learning that enhance the efficiency and stability of training. This article explores the mathematics behind these methods, provides examples with and without them, and demonstrates why they are beneficial for deep learning models.

Momentum: Smoothing and Accelerating Convergence. Momentum is an optimization technique that modifies standard gradient descent by adding a velocity term to the update rule. This velocity term is a running average of past gradients, which helps the optimizer keep moving in directions where gradients consistently point, thereby accelerating convergence and reducing oscillations.

Mathematical Formulation. Without momentum (standard gradient descent): \[\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t)\] With momentum: \[v_{t+1} = \beta v_t + \nabla_\theta L(\theta_t), \qquad \theta_{t+1} = \theta_t - \eta \, v_{t+1}\] Here, \( \beta \) is the momentum coefficient (typically around 0.9), and the velocity \( v \) accumulates the gradients to provide smoother and more directed updates.

Example with and Without Momentum: consider a simple quadratic loss function \( L(\theta) \), a starting point \( \theta_0 \), a learning rate \( \eta \), and a momentum coefficient \( \beta \). Without momentum, each iteration computes the gradient at the current \( \theta_t \) and applies the update \( \theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t) \). With momentum, each iteration computes the gradient at \( \theta_t \), first updates the velocity, \( v_{t+1} = \beta v_t + \nabla_\theta L(\theta_t) \), and then applies the update \( \theta_{t+1} = \theta_t - \eta \, v_{t+1} \). Why Momentum is Better: Faster Convergence:...
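To make the two update rules concrete, here is a minimal Python sketch (an illustration added here, not code from the article). It assumes a one-dimensional quadratic loss \( L(\theta) = \tfrac{1}{2}\theta^2 \) purely for demonstration, so its gradient is just \( \theta \); the learning rate, momentum coefficient, and step count are arbitrary example values.

```python
def grad(theta):
    # Illustrative assumption: quadratic loss L(theta) = 0.5 * theta**2,
    # so the gradient is simply theta.
    return theta

def gradient_descent(theta0, lr=0.1, steps=20):
    # Standard update: theta <- theta - lr * grad(theta)
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta

def gradient_descent_momentum(theta0, lr=0.1, beta=0.9, steps=20):
    # Velocity accumulates past gradients: v <- beta * v + grad(theta),
    # then the parameter moves against the velocity: theta <- theta - lr * v
    theta, v = theta0, 0.0
    for _ in range(steps):
        v = beta * v + grad(theta)
        theta = theta - lr * v
    return theta

print(gradient_descent(5.0))           # endpoint of plain gradient descent
print(gradient_descent_momentum(5.0))  # endpoint of the momentum variant
```

In this toy one-dimensional case both variants approach the minimum at 0; the benefit of the velocity term shows up most clearly in higher-dimensional problems where gradients point consistently along some directions and oscillate along others.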


Batch normalisation – trainable and non-trainable – day 27

Demystifying Trainable and Non-Trainable Parameters in Batch Normalization. Batch normalization (BN) is a powerful technique used in deep learning to stabilize and accelerate training. The core idea behind BN is to normalize the output of a previous layer by subtracting the batch mean and dividing by the batch standard deviation. This is expressed by the following general formula: \[\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}\]\[y = \gamma \hat{x} + \beta\] where \( x \) is the input to the layer, \( \mu_B \) and \( \sigma_B^2 \) are the mean and variance computed over the current mini-batch, \( \epsilon \) is a small constant added for numerical stability, \( \gamma \) and \( \beta \) are learnable scale and shift parameters, and \( y \) is the normalized output.

Why This Formula is Helpful. The normalization step ensures that the input to each layer has a consistent distribution, which addresses the problem of "internal covariate shift", where the distribution of inputs to a layer changes during training. By maintaining a stable distribution, the training process becomes more efficient, requiring less careful initialization of parameters and allowing for higher learning rates. The addition of the \( \gamma \) and \( \beta \) parameters allows the model to restore the capacity of the network to represent the original data distribution. This means that the model can learn any representation it could without normalization, but with the added benefits of stabilized and accelerated training. The use of batch normalization has been shown empirically to result in faster convergence and improved model performance, particularly...
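As a concrete illustration of the trainable versus non-trainable split, here is a minimal sketch assuming a TensorFlow/Keras environment (this is not code from the article). In Keras's BatchNormalization layer, \( \gamma \) and \( \beta \) are the trainable weights updated by backpropagation, while the moving mean and moving variance used at inference are tracked as non-trainable weights.

```python
import numpy as np
import tensorflow as tf  # assumes a TensorFlow/Keras installation

# Hypothetical minimal model: a Dense layer followed by batch normalization.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8),
    tf.keras.layers.BatchNormalization(),
])
model(np.zeros((2, 4), dtype="float32"))  # run a dummy batch to build the layers

bn = model.layers[-1]
# gamma and beta: learned by gradient descent
print("trainable:", [w.name for w in bn.trainable_weights])
# moving_mean and moving_variance: updated as running averages, not by gradients
print("non-trainable:", [w.name for w in bn.non_trainable_weights])
```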


Batch normalisation part 2 – day 26

Introduction to Batch Normalization. Batch normalization is a widely used technique in deep learning that significantly improves the performance and stability of neural networks. Introduced by Sergey Ioffe and Christian Szegedy in 2015, this technique addresses the issues of vanishing and exploding gradients that can occur during training, particularly in deep networks.

Why Batch Normalization? In deep learning, as data propagates through the layers of a neural network, it can lead to shifts in the distribution of inputs to layers deeper in the network, a phenomenon known as internal covariate shift. This shift can cause issues such as vanishing gradients, where gradients become too small, slowing down the training process, or exploding gradients, where they become too large, leading to unstable training. Traditional solutions like careful initialization and lower learning rates help, but they don't entirely solve these problems.

What is Batch Normalization? Batch normalization (BN) mitigates these issues by normalizing the inputs of each layer within a mini-batch, ensuring that the inputs to a given layer have a consistent distribution. This normalization happens just before or after the activation function of each hidden layer. Here's a step-by-step breakdown of how batch normalization works: Zero-Centering and Normalization: \[ \mu_B = \frac{1}{m_B}...
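The normalization step described above can be sketched in a few lines of NumPy (an illustration added here, not the article's code): each feature is zero-centered and scaled by its mini-batch statistics, then rescaled with learnable \( \gamma \) and \( \beta \). The batch size, feature count, and the identity values for gamma and beta are arbitrary choices for the demo.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time BN forward pass for x of shape (batch_size, num_features)."""
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero-center and normalize
    return gamma * x_hat + beta            # scale and shift with learnable parameters

x = np.random.randn(32, 4) * 3.0 + 5.0     # skewed, shifted inputs
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # approximately 0 and 1 per feature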


Batch Normalization – day 25

Understanding Batch Normalization in Deep Learning. Deep learning has revolutionized numerous fields, from computer vision to natural language processing. However, training deep neural networks can be challenging due to issues like unstable gradients. In particular, gradients can either explode (grow too large) or vanish (shrink too small) as they propagate through the network. This instability can slow down or completely halt the learning process. To address this, a powerful technique called Batch Normalization was introduced.

The Problem: Unstable Gradients. In deep networks, the issue of unstable gradients becomes more pronounced as the network depth increases. When gradients vanish, the learning process becomes very slow, as the model parameters are updated minimally. Conversely, when gradients explode, the model parameters may be updated too drastically, causing the learning process to diverge.

Introducing Batch Normalization. Batch Normalization (BN) is a technique designed to stabilize the learning process by normalizing the inputs to each layer within the network. Proposed by Sergey Ioffe and Christian Szegedy in 2015, this method has become a cornerstone in training deep neural networks effectively.

How Batch Normalization Works. Step 1: Compute the Mean and Variance. For each mini-batch of data, Batch Normalization first...
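As a small sketch of Step 1 (added here for illustration, not part of the original post), the per-feature mean and variance are computed across the batch dimension of a mini-batch of activations; the batch size and feature count below are hypothetical.

```python
import numpy as np

batch = np.random.randn(64, 10)    # hypothetical mini-batch: 64 examples, 10 features
mu_B = batch.mean(axis=0)          # mean of each feature over the mini-batch
sigma2_B = batch.var(axis=0)       # variance of each feature over the mini-batch
print(mu_B.shape, sigma2_B.shape)  # (10,) (10,) -- one statistic per feature
```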
