Let's go through the paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" – Day 80

Let's first go through the official paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.

What Is DeepSeek-R1? DeepSeek-R1 is a new method for training large language models (LLMs) so they can solve tough reasoning problems (like math and coding challenges) more reliably. It starts with a base model ("DeepSeek-V3") and then applies Reinforcement Learning (RL) in a way that makes the model teach itself to reason step by step, without relying on a huge amount of labeled examples.

In simpler terms: they take an existing language model, let it practice solving problems on its own, and reward it when it reaches a correct, properly formatted answer. Over many practice rounds, it gets really good at giving detailed, logical responses.

Two Main Versions

DeepSeek-R1-Zero: They begin by training the model purely with RL, giving it no extra "teacher" data (no big supervised datasets). Surprisingly, this alone makes the model much better at step-by-step reasoning, almost like how a human can get better at math by practicing a bunch of problems and checking the answers.

DeepSeek-R1: Although DeepSeek-R1-Zero improves reasoning, it sometimes produces messy or mixed-language answers. To fix that, they: Gather a small amount of supervised "cold-start" data to clean up its style and correctness. Do...
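The training signal described above, rewarding answers that are both correct and properly formatted, can be sketched as a toy rule-based reward function. This is a minimal illustrative sketch, not the paper's implementation: the `reward` function, the score weights, and the exact `<think>...</think>` tag convention are assumptions made here for demonstration.

```python
import re

def reward(completion: str, gold_answer: str) -> float:
    """Toy rule-based reward: a format bonus plus an accuracy bonus.

    Hypothetical sketch only; the weights and tag convention are
    illustrative, not taken from the DeepSeek-R1 paper.
    """
    score = 0.0
    # Format reward: the reasoning must be wrapped in <think>...</think>
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: the final answer after the reasoning block
    # must match the reference answer exactly
    answer = completion.split("</think>")[-1].strip()
    if answer == gold_answer.strip():
        score += 1.0
    return score

# A well-formatted, correct completion earns both bonuses
print(reward("<think>2 + 2 = 4</think>4", "4"))  # 1.5
# A bare answer with no reasoning block earns neither
print(reward("no reasoning here", "4"))          # 0.0
```

During RL training, a signal like this is computed for each sampled completion, and the policy is updated to make high-reward completions more likely, which is how the model "teaches itself" to reason step by step.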


Mathematical Explanation behind the SGD Algorithm in Machine Learning – Day 5

In our previous blog post, on Day 4, we talked about using the SGD algorithm on the MNIST dataset. But what is Stochastic Gradient Descent?

Stochastic Gradient Descent (SGD) is an iterative method for optimizing an objective function that is written as a sum of differentiable functions. It is a variant of the traditional gradient descent algorithm with a twist: instead of computing the gradient over the whole dataset, it approximates the gradient using a single data point or a small batch of data points. This makes SGD much faster and more scalable, especially for large datasets.

Why is SGD Important?

Efficiency: By updating the parameters using only a subset of the data, SGD reduces computation time, making it faster than batch gradient descent on large datasets.

Online Learning: SGD can be used in online-learning scenarios where the model is updated continuously as new data arrives.

Convergence: Although SGD introduces more noise into the optimization process, this noise can help the optimizer escape local minima and find a better solution.

The SGD Algorithm

The goal of SGD is to minimize an objective function $J(\theta)$ with respect to the parameters $\theta$. Here's the general procedure: Initialize: Randomly initialize...
