Mathematics Behind CNN or Convolutional Neural Network in Deep Learning – Day 54

CNN Process with Math

You have seen what a CNN is in our previous article (View Article). Now let's work through the mathematics behind it in detail, step by step, with a very simple example.

Part 1: Input Layer, Convolution, and Pooling (Steps 1-4)

Step 1: Input Layer

We are processing two 3×3 grayscale images—one representing a zebra and one representing a cat.

Image 1: Zebra Image (e.g., with stripe-like patterns)

\text{Zebra Image} = \begin{bmatrix} 0.9 & 0.1 & 0.2 \\ 0.8 & 0.7 & 0.3 \\ 0.1 & 0.4 & 0.6 \end{bmatrix}

Image 2: Cat Image (e.g., with smoother, fur-like textures)

\text{Cat Image} = \begin{bmatrix} 0.3 & 0.4 & 0.2 \\ 0.5 & 0.6 & 0.7 \\ 0.1 & 0.2 & 0.3 \end{bmatrix}

These images are represented as 2D grids of pixel values, with each value between 0 and 1 indicating pixel intensity.
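To make this concrete, here is a minimal sketch of the two images as NumPy arrays (NumPy and the variable names zebra_img and cat_img are our choice for illustration, not part of the original example):

```python
import numpy as np

# The two 3x3 grayscale images used throughout this example.
# Each pixel intensity is a float between 0 and 1.
zebra_img = np.array([[0.9, 0.1, 0.2],
                      [0.8, 0.7, 0.3],
                      [0.1, 0.4, 0.6]])

cat_img = np.array([[0.3, 0.4, 0.2],
                    [0.5, 0.6, 0.7],
                    [0.1, 0.2, 0.3]])
```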

Step 2: Convolutional Layer (Feature Extraction)

We’ll apply a 3×3 convolutional filter to detect patterns such as edges. For simplicity, we’ll use the same filter for both images.

Convolution Filter (Edge Detector):

\text{Filter} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix}

Convolution on the Zebra Image

Since the image and the filter are both 3×3, there is only one patch: the full grid. The element-wise multiplication with the filter is:

\text{Zebra Patch} = \begin{bmatrix} 0.9 & 0.1 & 0.2 \\ 0.8 & 0.7 & 0.3 \\ 0.1 & 0.4 & 0.6 \end{bmatrix}

\text{Filter} \odot \text{Zebra Patch} = \begin{bmatrix} 1 \times 0.9 & 0 \times 0.1 & -1 \times 0.2 \\ 1 \times 0.8 & 0 \times 0.7 & -1 \times 0.3 \\ 1 \times 0.1 & 0 \times 0.4 & -1 \times 0.6 \end{bmatrix} = \begin{bmatrix} 0.9 & 0 & -0.2 \\ 0.8 & 0 & -0.3 \\ 0.1 & 0 & -0.6 \end{bmatrix}

Summing the values:

0.9 + 0 + (-0.2) + 0.8 + 0 + (-0.3) + 0.1 + 0 + (-0.6) = 0.7

The feature map value for this part of the zebra image is 0.7.

Convolution on the Cat Image

Now, let’s perform the convolution on the cat image.

\text{Cat Patch} = \begin{bmatrix} 0.3 & 0.4 & 0.2 \\ 0.5 & 0.6 & 0.7 \\ 0.1 & 0.2 & 0.3 \end{bmatrix}

\text{Filter} \odot \text{Cat Patch} = \begin{bmatrix} 1 \times 0.3 & 0 \times 0.4 & -1 \times 0.2 \\ 1 \times 0.5 & 0 \times 0.6 & -1 \times 0.7 \\ 1 \times 0.1 & 0 \times 0.2 & -1 \times 0.3 \end{bmatrix} = \begin{bmatrix} 0.3 & 0 & -0.2 \\ 0.5 & 0 & -0.7 \\ 0.1 & 0 & -0.3 \end{bmatrix}

Summing the values:

0.3 + 0 + (-0.2) + 0.5 + 0 + (-0.7) + 0.1 + 0 + (-0.3) = -0.3

The feature map value for this part of the cat image is -0.3.
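The same arithmetic can be reproduced with a short sketch. Because both the image and the filter are 3×3, the convolution reduces to a single element-wise multiplication followed by a sum (again assuming NumPy; the names kernel, zebra_patch, and cat_patch are illustrative):

```python
import numpy as np

# Vertical-edge detector used in the example above.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

zebra_patch = np.array([[0.9, 0.1, 0.2],
                        [0.8, 0.7, 0.3],
                        [0.1, 0.4, 0.6]])

cat_patch = np.array([[0.3, 0.4, 0.2],
                      [0.5, 0.6, 0.7],
                      [0.1, 0.2, 0.3]])

# Element-wise product, then sum: one feature-map value per 3x3 patch.
zebra_feature = np.sum(kernel * zebra_patch)   # ->  0.7
cat_feature = np.sum(kernel * cat_patch)       # -> -0.3 (up to float rounding)

print(zebra_feature, cat_feature)
```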

Step 3: ReLU Activation (Non-Linearity)

The ReLU activation function converts all negative values to 0 while keeping positive values unchanged.

  • Zebra Image (after ReLU): 0.7 remains 0.7.
  • Cat Image (after ReLU): -0.3 becomes 0.0.

Step 4: Pooling Layer (Downsampling)

Next, we apply max pooling to downsample the feature maps. For simplicity, let’s assume we reduce each feature map to a single value:

  • Zebra Image (after pooling): 0.7
  • Cat Image (after pooling): 0.0
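A minimal sketch of these two steps, under the same simplifying assumption that each feature map is a single value (so max pooling trivially returns that value):

```python
import numpy as np

def relu(x):
    """ReLU: negative values become 0, positive values pass through unchanged."""
    return np.maximum(0.0, x)

zebra_feature = 0.7
cat_feature = -0.3

zebra_activated = relu(zebra_feature)   # 0.7
cat_activated = relu(cat_feature)       # 0.0

# With a single-value feature map, max pooling simply returns that value.
zebra_pooled = np.max([zebra_activated])   # 0.7
cat_pooled = np.max([cat_activated])       # 0.0
```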

Part 2: Fully Connected Layer (Step 5)

Now, we move to the fully connected layer, where we combine the features extracted from the convolutional layer and use weights and biases to compute the logits for each class.

Step 5: Fully Connected Layer – Weight and Bias Calculation

At this point, we’ve flattened the pooled outputs:

  • Zebra Image (flattened): [0.7]
  • Cat Image (flattened): [0.0]

The fully connected layer computes the logits for each class (zebra, cat) using weights and biases. For this step, we’ll show how weights and biases are applied, and how they are updated based on loss.

The formula for the logit of class i is:

z_i = (w_i \times \text{input}) + b_i

Where:

  • w_i is the weight for class i,
  • \text{input} is the feature value from the previous layer (e.g., [0.7]),
  • b_i is the bias for class i,
  • z_i is the logit for class i (before applying Softmax).

Let’s assume the following initial weights and biases for each class:

  • Zebra class: Initial weight: w_{\text{zebra}}^{(0)} = 3.0, Initial bias: b_{\text{zebra}}^{(0)} = 0.4
  • Cat class: Initial weight: w_{\text{cat}}^{(0)} = 1.5, Initial bias: b_{\text{cat}}^{(0)} = 0.2

Logit Calculation for Zebra Image (Flattened Input: [0.7])

For the zebra class:

z_{\text{zebra}}^{(1)} = (w_{\text{zebra}}^{(0)} \times 0.7) + b_{\text{zebra}}^{(0)} = (3.0 \times 0.7) + 0.4 = 2.1 + 0.4 = 2.5

For the cat class:

z_{\text{cat}}^{(1)} = (w_{\text{cat}}^{(0)} \times 0.7) + b_{\text{cat}}^{(0)} = (1.5 \times 0.7) + 0.2 = 1.05 + 0.2 = 1.25

Logit Calculation for Cat Image (Flattened Input: [0.0])

For the zebra class:

z_{\text{zebra}}^{(1)} = (w_{\text{zebra}}^{(0)} \times 0.0) + b_{\text{zebra}}^{(0)} = (3.0 \times 0.0) + 0.4 = 0.0 + 0.4 = 0.4

For the cat class:

z_{\text{cat}}^{(1)} = (w_{\text{cat}}^{(0)} \times 0.0) + b_{\text{cat}}^{(0)} = (1.5 \times 0.0) + 0.2 = 0.0 + 0.2 = 0.2
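The same logit calculations can be sketched in a few lines of Python, using the assumed initial weights and biases from above (the helper name logit is ours):

```python
def logit(w, x, b):
    """Fully connected layer with a single input feature: z = w*x + b."""
    return w * x + b

# Assumed initial parameters from the example.
w_zebra, b_zebra = 3.0, 0.4
w_cat, b_cat = 1.5, 0.2

# Zebra image (flattened input 0.7)
z_zebra_for_zebra_img = logit(w_zebra, 0.7, b_zebra)   # 2.5
z_cat_for_zebra_img = logit(w_cat, 0.7, b_cat)         # 1.25

# Cat image (flattened input 0.0)
z_zebra_for_cat_img = logit(w_zebra, 0.0, b_zebra)     # 0.4
z_cat_for_cat_img = logit(w_cat, 0.0, b_cat)           # 0.2
```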

Part 3: Loss Calculation and Weight/Bias Update (Backpropagation)

Now that we’ve computed the logits, we need to calculate the loss and perform backpropagation to update the weights and biases.

Loss Calculation (Cross-Entropy Loss)

We use cross-entropy loss to measure the difference between the predicted probabilities and the true labels. For simplicity, assume the true label for the zebra image is zebra and the true label for the cat image is cat (the correct class gets a label of 1, every other class gets 0).

Cross-entropy loss formula for a single example:

\text{Loss} = -\sum_{i} y_i \log(p_i)

Where:

  • y_i is the true label (1 for the correct class, 0 for the others),
  • p_i is the predicted probability for class i.

First, we apply the Softmax function to convert the logits into probabilities.

Softmax for the Zebra Image:

p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}

  • Zebra logit: 2.5
  • Cat logit: 1.25

Exponentials:

e^{2.5} \approx 12.18, \quad e^{1.25} \approx 3.49

Sum of exponentials:

12.18 + 3.49 = 15.67

Probabilities:

  • Zebra: p_{\text{zebra}} = \frac{12.18}{15.67} \approx 0.78
  • Cat: p_{\text{cat}} = \frac{3.49}{15.67} \approx 0.22

Cross-Entropy Loss for Zebra Image:

Let’s assume the true label is “zebra” (1 for zebra, 0 for cat):

\text{Loss}_{\text{zebra}} = -[1 \times \log(0.78) + 0 \times \log(0.22)] = -\log(0.78) \approx 0.249

This is the loss value for the zebra image.
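Softmax and cross-entropy can be checked with a short sketch (assuming NumPy; the numbers match the hand calculation up to rounding):

```python
import numpy as np

def softmax(logits):
    """Convert a vector of logits into probabilities that sum to 1."""
    exps = np.exp(logits)
    return exps / exps.sum()

def cross_entropy(probs, true_index):
    """Cross-entropy loss when the true class is given as an index."""
    return -np.log(probs[true_index])

logits_zebra_img = np.array([2.5, 1.25])     # [zebra, cat]
probs = softmax(logits_zebra_img)            # approx [0.78, 0.22]
loss = cross_entropy(probs, true_index=0)    # approx 0.25 (true class: zebra)
print(probs, loss)
```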

Part 4: Weight and Bias Update Using Gradient Descent

Using the loss value, the network performs backpropagation to calculate the gradients (partial derivatives of the loss with respect to each weight and bias). These gradients are used to update the weights and biases to reduce the loss in the next iteration.

For simplicity, assume the gradients calculated for each weight and bias are as follows (based on chain rule and backpropagation):

  • Gradient for zebra weight: \frac{\partial \text{Loss}}{\partial w_{\text{zebra}}} = -0.05
  • Gradient for zebra bias: \frac{\partial \text{Loss}}{\partial b_{\text{zebra}}} = -0.02
  • Gradient for cat weight: \frac{\partial \text{Loss}}{\partial w_{\text{cat}}} = 0.03
  • Gradient for cat bias: \frac{\partial \text{Loss}}{\partial b_{\text{cat}}} = 0.01

Updated Weights and Biases

Using gradient descent, we update the weights and biases:

w_{\text{zebra}}^{(1)} = w_{\text{zebra}}^{(0)} - \eta \times \frac{\partial \text{Loss}}{\partial w_{\text{zebra}}}

Where \eta is the learning rate (assume \eta = 0.1):

w_{\text{zebra}}^{(1)} = 3.0 - 0.1 \times (-0.05) = 3.0 + 0.005 = 3.005

b_{\text{zebra}}^{(1)} = 0.4 - 0.1 \times (-0.02) = 0.4 + 0.002 = 0.402

For the cat class:

w_{\text{cat}}^{(1)} = 1.5 - 0.1 \times (0.03) = 1.5 - 0.003 = 1.497

b_{\text{cat}}^{(1)} = 0.2 - 0.1 \times (0.01) = 0.2 - 0.001 = 0.199
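The update rule is the same one-liner for every parameter. Here is a minimal sketch using the assumed gradients and the learning rate η = 0.1 from above (the helper name sgd_update is ours):

```python
def sgd_update(param, grad, lr=0.1):
    """One gradient-descent step: move the parameter against its gradient."""
    return param - lr * grad

# Assumed gradients from the example above.
w_zebra = sgd_update(3.0, -0.05)   # 3.005
b_zebra = sgd_update(0.4, -0.02)   # 0.402
w_cat = sgd_update(1.5, 0.03)      # 1.497
b_cat = sgd_update(0.2, 0.01)      # 0.199
```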

Recap

  • The network computes logits using weights and biases.
  • Softmax converts logits into probabilities.
  • Cross-entropy loss measures the error, and gradients are computed through backpropagation.
  • Gradients update the weights and biases to reduce the loss for the next iteration.

This entire process repeats until the network minimizes the loss and correctly classifies images of zebras and cats.

Part 5: Continuing the Process for Subsequent Iterations

Now that we’ve seen how the fully connected layer works and how weights and biases are updated in one iteration, let’s continue to show what happens in the following iterations as the network learns. The goal is to keep refining the weights and biases through backpropagation and gradient descent, so the network gets better at predicting whether the input is a zebra or a cat.

Updated Weights and Biases After First Iteration:

  • Zebra class: Updated weight: w_{\text{zebra}}^{(1)} = 3.005, Updated bias: b_{\text{zebra}}^{(1)} = 0.402
  • Cat class: Updated weight: w_{\text{cat}}^{(1)} = 1.497, Updated bias: b_{\text{cat}}^{(1)} = 0.199

Step 5: Fully Connected Layer Calculations for the Second Iteration

Logit Calculation for Zebra Image (Flattened Input: [0.7])

Now, using the updated weights and biases, we recalculate the logits for the zebra image.

For the zebra class:

z_{\text{zebra}}^{(2)} = (w_{\text{zebra}}^{(1)} \times 0.7) + b_{\text{zebra}}^{(1)} = (3.005 \times 0.7) + 0.402 = 2.1035 + 0.402 = 2.5055

For the cat class:

z_{\text{cat}}^{(2)} = (w_{\text{cat}}^{(1)} \times 0.7) + b_{\text{cat}}^{(1)} = (1.497 \times 0.7) + 0.199 = 1.0479 + 0.199 = 1.2469

Logit Calculation for Cat Image (Flattened Input: [0.0])

For the zebra class:

z_{\text{zebra}}^{(2)} = (w_{\text{zebra}}^{(1)} \times 0.0) + b_{\text{zebra}}^{(1)} = (3.005 \times 0.0) + 0.402 = 0.402

For the cat class:

z_{\text{cat}}^{(2)} = (w_{\text{cat}}^{(1)} \times 0.0) + b_{\text{cat}}^{(1)} = (1.497 \times  0.0) + 0.199 = 0.199

Step 6: Reapplying Softmax Function

Now, let’s apply the Softmax function to the updated logits to get the new predicted probabilities.

For the Zebra Image:

Logits:

  • Zebra logit: 2.5055
  • Cat logit: 1.2469

Exponentials:

e^{2.5055} \approx 12.25, \quad e^{1.2469} \approx 3.48

Sum of exponentials:

12.25 + 3.48 = 15.73

Probabilities:

  • Zebra: p_{\text{zebra}} = \frac{12.25}{15.73} \approx 0.779
  • Cat: p_{\text{cat}} = \frac{3.48}{15.73} \approx 0.221

For the Cat Image:

Logits:

  • Zebra logit: 0.402
  • Cat logit: 0.199

Exponentials:

e^{0.402} \approx 1.495, \quad e^{0.199} \approx 1.22

Sum of exponentials:

1.495 + 1.22 = 2.715

Probabilities:

  • Zebra: p_{\text{zebra}} = \frac{1.495}{2.715} \approx 0.55
  • Cat: p_{\text{cat}} = \frac{1.22}{2.715} \approx 0.45

Step 7: Recalculating Loss

Next, we calculate the cross-entropy loss for both images based on the updated predictions.

Loss for Zebra Image:

Assume the true label is still “zebra” (1 for zebra, 0 for cat). The cross-entropy loss is:

\text{Loss}_{\text{zebra}} = -[1 \times \log(0.779) + 0 \times \log(0.221)] = -\log(0.779) \approx 0.249

Loss for Cat Image:

Assume the true label is still “cat” (0 for zebra, 1 for cat). The cross-entropy loss is:

\text{Loss}_{\text{cat}} = -[0 \times \log(0.55) + 1 \times \log(0.45)] = -\log(0.45) \approx 0.799

Step 8: Further Weight and Bias Updates Using Backpropagation

Now, the network uses backpropagation to compute the gradients again and updates the weights and biases based on the new loss values. For simplicity, assume the gradients for this iteration are as follows:

  • Gradient for zebra weight: \frac{\partial \text{Loss}}{\partial w_{\text{zebra}}} = -0.03
  • Gradient for zebra bias: \frac{\partial \text{Loss}}{\partial b_{\text{zebra}}} = -0.01
  • Gradient for cat weight: \frac{\partial \text{Loss}}{\partial w_{\text{cat}}} = 0.02
  • Gradient for cat bias: \frac{\partial \text{Loss}}{\partial b_{\text{cat}}} = 0.005

Using gradient descent, the weights and biases are updated again:

w_{\text{zebra}}^{(2)} = w_{\text{zebra}}^{(1)} - \eta \times \frac{\partial \text{Loss}}{\partial w_{\text{zebra}}} = 3.005 - 0.1 \times (-0.03) = 3.005 + 0.003 = 3.008

b_{\text{zebra}}^{(2)} = b_{\text{zebra}}^{(1)} - \eta \times \frac{\partial \text{Loss}}{\partial b_{\text{zebra}}} = 0.402 - 0.1 \times (-0.01) = 0.402 + 0.001 = 0.403

Similarly, for the cat class:

w_{\text{cat}}^{(2)} = w_{\text{cat}}^{(1)} - 0.1 \times (0.02) = 1.497 - 0.002 = 1.495

b_{\text{cat}}^{(2)} = b_{\text{cat}}^{(1)} - 0.1 \times (0.005) = 0.199 - 0.0005 = 0.1985

Repeat the Process

This process of calculating logits, applying Softmax, calculating loss, and updating weights and biases continues over many iterations until the network minimizes the loss and correctly classifies images with high confidence. Over time, the network improves at distinguishing between zebra and cat features.

To recap the full pipeline:

  • Input: We started with two images (a zebra and a cat) as input to the CNN.
  • Convolutional Layers: We applied a convolutional filter to detect edge features.
  • ReLU: We applied the ReLU activation function to retain positive feature values.
  • Pooling: We downsampled the feature map using max pooling.
  • Fully Connected Layer: The flattened pooled values were multiplied by weights and added to biases to calculate logits.
  • Softmax: We converted logits into probabilities using the Softmax function.
  • Loss Calculation: We used cross-entropy loss to measure the error between the predicted and true labels.
  • Backpropagation: Gradients were computed, and weights and biases were updated using gradient descent to minimize the loss.
  • Iteration: This process repeated, with weights and biases gradually adjusting to improve classification performance.

This detailed process explains how the CNN learns over time to classify images like zebra and cat by adjusting weights and biases based on the loss function!
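The full cycle of logits, Softmax, loss, and parameter updates can be condensed into a small training loop. The sketch below is illustrative rather than the article's original code: instead of the assumed gradient values used in the worked example, it computes the exact softmax/cross-entropy gradients for this one-feature layer, so its numbers will drift slightly from the hand-worked iterations:

```python
import numpy as np

def softmax(z):
    e = np.exp(z)
    return e / e.sum()

# Pooled feature and true class index for each training image (0 = zebra, 1 = cat).
data = [(0.7, 0), (0.0, 1)]

w = np.array([3.0, 1.5])   # [w_zebra, w_cat]
b = np.array([0.4, 0.2])   # [b_zebra, b_cat]
lr = 0.1

for step in range(100):
    for x, y in data:
        z = w * x + b              # logits for both classes
        p = softmax(z)             # predicted probabilities
        loss = -np.log(p[y])       # cross-entropy loss for this example

        # Gradient of the loss w.r.t. the logits is (p - one_hot(y));
        # the chain rule then gives the weight and bias gradients.
        grad_z = p.copy()
        grad_z[y] -= 1.0
        w -= lr * grad_z * x
        b -= lr * grad_z

print(w, b)
```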

Part 6: Continuing with Next Iteration (Iteration 3)

At this point, we’ve already updated the weights and biases after two iterations. Let’s update them once more and show how the network refines its predictions as the process repeats.

Updated Weights and Biases After Iteration 2:

  • Zebra class: Updated weight: w_{\text{zebra}}^{(2)} = 3.008, Updated bias: b_{\text{zebra}}^{(2)} = 0.403
  • Cat class: Updated weight: w_{\text{cat}}^{(2)} = 1.495, Updated bias: b_{\text{cat}}^{(2)} = 0.1985

Step 5: Fully Connected Layer Calculations for the Next Iteration

Logit Calculation for Zebra Image (Flattened Input: [0.7])

For the zebra class:

z_{\text{zebra}}^{(3)} = (w_{\text{zebra}}^{(2)} \times 0.7) + b_{\text{zebra}}^{(2)} = (3.008 \times 0.7) + 0.403 = 2.1056 + 0.403 = 2.5086

For the cat class:

z_{\text{cat}}^{(3)} = (w_{\text{cat}}^{(2)} \times 0.7) + b_{\text{cat}}^{(2)} = (1.495 \times 0.7) + 0.1985 = 1.0465 + 0.1985 = 1.245

Logit Calculation for Cat Image (Flattened Input: [0.0])

For the zebra class:

z_{\text{zebra}}^{(3)} = (w_{\text{zebra}}^{(2)} \times 0.0) + b_{\text{zebra}}^{(2)} = (3.008 \times 0.0) + 0.403 = 0.403

For the cat class:

z_{\text{cat}}^{(3)} = (w_{\text{cat}}^{(2)} \times 0.0) + b_{\text{cat}}^{(2)} = (1.495 \times 0.0) + 0.1985 = 0.1985

Step 6: Softmax Calculation (Converting Logits to Probabilities)

Next, we apply the Softmax function to convert the logits into probabilities, which will tell us how confident the network is about the image being a zebra or a cat.

For the Zebra Image:

Logits:

  • Zebra: 2.5086
  • Cat: 1.245

Exponentials:

e^{2.5086} \approx 12.29, \quad e^{1.245} \approx 3.47

Sum of exponentials:

12.29 + 3.47 = 15.76

Probabilities:

  • Zebra: p_{\text{zebra}} = \frac{12.29}{15.76} \approx 0.78
  • Cat: p_{\text{cat}} = \frac{3.47}{15.76} \approx 0.22

For the Cat Image:

Logits:

  • Zebra: 0.403
  • Cat: 0.1985

Exponentials:

e^{0.403} \approx 1.496, \quad e^{0.1985} \approx 1.22

Sum of exponentials:

1.496 + 1.22 = 2.716

Probabilities:

  • Zebra: p_{\text{zebra}} = \frac{1.496}{2.716} \approx 0.55
  • Cat: p_{\text{cat}} = \frac{1.22}{2.716} \approx 0.45

Step 7: Final Classification Decision

At this point, we have the final probabilities for both the zebra and cat images.

  • For the Zebra Image: The network is 78% confident that the image is a zebra and 22% confident that it’s a cat. Since the probability for zebra is higher, the network classifies the zebra image as a zebra.
  • For the Cat Image: The network is 55% confident that the image is a zebra and 45% confident that it’s a cat. This is a borderline case, but the network leans slightly toward classifying the cat image as a zebra, though with low confidence.
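In code, this decision is just an argmax over the predicted probabilities (a small illustrative sketch; the variable names are ours):

```python
import numpy as np

classes = ["zebra", "cat"]

probs_zebra_img = np.array([0.78, 0.22])
probs_cat_img = np.array([0.55, 0.45])

# Pick the class with the highest predicted probability.
print(classes[np.argmax(probs_zebra_img)])   # "zebra"  (correct)
print(classes[np.argmax(probs_cat_img)])     # "zebra"  (still wrong at this stage)
```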

Step 8: Improving the Network (More Iterations)

At this point, the network still isn’t confident enough about the cat image—the probability for zebra is still higher than for cat. However, the network will continue to update its weights and biases through more iterations to become better at identifying cat-specific features (like smooth textures or fur-like patterns) and distinguishing them from zebra-specific features (like stripes and edges).

After many more training iterations, the network will gradually improve its performance and reach higher confidence for classifying both zebras and cats. Over time, the probability for cat in the cat image will rise as the network learns to associate specific patterns with the correct class.

Final Conclusion

  • The network processes each image through convolutional layers, ReLU, and pooling to extract features.
  • In the fully connected layer, the network uses weights and biases to calculate logits for each class.
  • The Softmax function converts these logits into probabilities, giving the network’s confidence in its predictions.
  • After several iterations of training and weight updates, the network becomes confident that the zebra image is a zebra (with a 78% probability).
  • The network is still unsure about the cat image but leans toward classifying it as a zebra (with 55% probability) due to insufficient training iterations.

By continuing the training process (more iterations), the network will eventually be able to classify both images with high confidence, reducing the classification error over time.

How the CNN Refines Its Predictions

This is how the CNN refines its predictions through the iterative process of backpropagation, weight updates, and loss minimization. With more training and weight adjustment, the network will improve and reduce the classification error for future inputs.

As a practical example, check out the INGOAMPT app called “background img remove INGOAMPT” and discover how the Ingoampt app leverages CNN deep learning through Apple’s Core ML technology.

Want to support INGOAMPT? Purchase this $1 app to contribute and be part of our journey: Background Remove Image Link (Click Here).