3 Types of Gradient Descent: Batch, Stochastic & Mini-Batch _ Day 8

Understanding Gradient Descent: Batch, Stochastic, and Mini-Batch

Learn the key differences between Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent, and how to apply them in your machine learning models.

Batch Gradient Descent

Batch Gradient Descent uses the entire dataset to calculate the gradient of the cost function, leading to stable, consistent steps toward an optimal solution. Because every update processes all training examples, it is computationally expensive and best suited to smaller datasets where high precision is crucial.

Formula:

\[\theta := \theta - \eta \cdot \frac{1}{m} \sum_{i=1}^{m} \nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})\]

\(\theta\) = parameters
\(\eta\) = learning rate
\(m\) = number of training examples
\(\nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})\) = gradient of the cost function

Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent updates the parameters using each training example individually. It can adapt quickly to new patterns and can potentially escape local minima more effectively than Batch Gradient Descent, which makes it particularly useful for large datasets and online learning environments.

Formula:

\[\theta := \theta - \eta \cdot \nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})\]

\(\theta\) = parameters
\(\eta\) = learning rate
\(\nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})\) = gradient of the cost function for a single training example

Mini-Batch Gradient Descent

Mini-Batch Gradient Descent is a hybrid approach that divides the dataset into small batches and updates the parameters after each batch. It balances the robustness of Batch Gradient Descent with the flexibility and speed of Stochastic Gradient Descent, making it highly efficient, especially on modern hardware that can parallelize computations.

Formula:

\[\theta := \theta - \eta \cdot \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})\]

\(\theta\) = parameters
\(\eta\) = learning rate
\(n\) = number of training examples in a mini-batch
\(\nabla_{\theta} J(\theta; x^{(i)}, y^{(i)})\) = gradient of the cost function for a mini-batch

Key Takeaways

Batch Gradient Descent: Best for smaller datasets; provides stable but slow convergence.
Stochastic Gradient Descent (SGD): Suitable for large datasets and online learning; faster updates, but they can be noisy.
Mini-Batch Gradient Descent: Combines the benefits of both; optimal for deep learning, with balanced performance and stability.

Choosing the right gradient descent method can significantly impact the efficiency and performance of your machine learning model. For most deep learning tasks, Mini-Batch Gradient Descent is the preferred choice because of its balanced approach. For more in-depth guides and practical implementations, check out resources from Machine Learning Mastery and Analytics Vidhya.
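To make the three update rules concrete, here is a minimal NumPy sketch of one update or one pass with each method. It is an illustrative sketch, not code from this post: it assumes a mean-squared-error cost with the same 2/m gradient convention as the formulas above, and the function names (mse_gradient, batch_step, sgd_epoch, minibatch_epoch) are placeholders.

```python
import numpy as np

def mse_gradient(theta, X, y):
    # Gradient of the mean-squared-error cost for a linear model X @ theta,
    # using the 2/m convention from the formulas above.
    m = len(y)
    error = X @ theta - y
    return (2.0 / m) * (X.T @ error)

def batch_step(theta, X, y, eta):
    # Batch gradient descent: one update computed over the entire dataset.
    return theta - eta * mse_gradient(theta, X, y)

def sgd_epoch(theta, X, y, eta, rng):
    # Stochastic gradient descent: one pass over the data,
    # updating the parameters after every single example.
    for i in rng.permutation(len(y)):
        theta = theta - eta * mse_gradient(theta, X[i:i + 1], y[i:i + 1])
    return theta

def minibatch_epoch(theta, X, y, eta, batch_size, rng):
    # Mini-batch gradient descent: one pass over the data in shuffled batches.
    idx = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        theta = theta - eta * mse_gradient(theta, X[batch], y[batch])
    return theta
```

Here X is the design matrix (one row per example, including a bias column of ones), theta holds the parameters, and eta is the learning rate; shuffling before the SGD and mini-batch passes is a common choice rather than a requirement.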
Gradient Descent Example for Small Business Revenue Prediction

Introduction

This part of the blog uses simple linear regression to predict revenue from marketing spend. We will use a small dataset to illustrate how each gradient descent method performs and how it affects the efficiency and accuracy of the model.

Dataset and Initial Setup

Our dataset consists of three data points representing daily marketing spend and the resulting revenue:

Day 1: \(x_1 = \$2000\), \(y_1 = \$3000\)
Day 2: \(x_2 = \$3000\), \(y_2 = \$5000\)
Day 3: \(x_3 = \$2500\), \(y_3 = \$4000\)

Model: \( y = \theta_0 + \theta_1 \cdot x \)
Initial Parameters: \( \theta_0 = 0 \), \( \theta_1 = 0 \)
Learning Rate: \( \eta = 0.000001 \)

Applying Gradient Descent Methods

Batch Gradient Descent

Calculate the gradient of each parameter across all data points, update the parameters based on these gradients, and repeat until convergence or for a fixed number of iterations.

Gradient Calculations and Update (one iteration):

Gradient of \( \theta_0 \) = \(\frac{2}{3} [(-3000) + (-5000) + (-4000)] = -8000\)
Gradient of \( \theta_1 \) = \(\frac{2}{3} [(-3000 \times 2000) + (-5000 \times 3000) + (-4000 \times 2500)] \approx -20666666.67\)
Updated \( \theta_0 \) = \(0 - 0.000001 \times (-8000) = 0.008\)
Updated \( \theta_1 \) = \(0 - 0.000001 \times (-20666666.67) \approx 20.67\)

Stochastic Gradient Descent

Updates the parameters using each data point individually:

Update after Day 1: \(\theta_0\) updated to 0.006, \(\theta_1\) updated to 6
Update after Day 2: \(\theta_0\) updated to 0.011, \(\theta_1\) updated to 21
Update after Day 3: \(\theta_0\) updated to 0.019, \(\theta_1\) updated to 29.5

Mini-Batch Gradient Descent

Using mini-batches of up to two data points (the first two days form one batch, and the third day forms its own batch):

First Batch (Day 1 & 2): \(\theta_0\) updated to 0.007, \(\theta_1\) updated to 13.5
Second Batch (Day 3): \(\theta_0\) updated to 0.015, \(\theta_1\) updated to 26.75

Comparison and Conclusion

Batch Gradient Descent provides the most stable and consistent updates but may be slow for larger datasets. Stochastic Gradient Descent offers faster updates but can cause significant parameter fluctuations, which may hinder convergence. Mini-Batch Gradient Descent balances both approaches, providing updates that are both timely and relatively stable, making it ideal for most practical applications. The batch update above is reproduced in the short script below.
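As a sanity check, the single batch gradient descent step above can be reproduced with a short script. This is a sketch under the same assumptions as before (squared-error cost averaged with the 2/m factor used in the gradient calculations); the variable names are illustrative.

```python
import numpy as np

# Daily marketing spend (x) and revenue (y) from the example dataset.
x = np.array([2000.0, 3000.0, 2500.0])
y = np.array([3000.0, 5000.0, 4000.0])

theta0, theta1 = 0.0, 0.0   # initial parameters
eta = 1e-6                  # learning rate

# One batch gradient descent update using all three data points.
m = len(x)
pred = theta0 + theta1 * x
grad0 = (2.0 / m) * np.sum(pred - y)         # -8000.0
grad1 = (2.0 / m) * np.sum((pred - y) * x)   # about -20,666,666.67
theta0 -= eta * grad0                        # 0.008
theta1 -= eta * grad1                        # about 20.67

print(theta0, theta1)
```

The per-example (SGD) and two-batch (mini-batch) variants follow the same pattern, with the gradient computed over a single data point or a single batch at a time instead of the full dataset.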

Gradient Descent Detailed Comparison for Revenue Prediction Model

Overview

This…
