
CNN – Convolutional Neural Networks explained by INGOAMPT – DAY 53

Understanding Convolutional Neural Networks (CNNs): A Step-by-Step Breakdown

Convolutional Neural Networks (CNNs) are widely used in deep learning due to their ability to efficiently process image data. They perform complex operations on input images, enabling tasks like image classification, object detection, and segmentation. This step-by-step guide explains each stage of a CNN’s process, along with an example to clarify the concepts.

1. Input Image Representation

The first step is providing an image to the network as input. Typically, the image is represented as a 3D array (a tensor) whose dimensions are:

  • Height: Number of pixels vertically.
  • Width: Number of pixels horizontally.
  • Channels: Number of color channels (e.g., RGB for color images).

Example: A 32×32 RGB image is represented with the shape: (32, 32, 3)
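
For illustration, here is a minimal NumPy sketch (the pixel values are placeholder zeros) showing how such an image is represented in code:

import numpy as np

# A placeholder 32×32 RGB image: height × width × channels
image = np.zeros((32, 32, 3), dtype=np.float32)
print(image.shape)  # (32, 32, 3)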

2. Convolutional Layer

The Convolutional Layer applies filters to the image. Filters are small matrices that slide over the image, performing element-wise multiplication followed by summation. This produces feature maps.

Each filter detects specific features like edges or textures. The network learns these filters during training.

Mathematical Operation:
 S(i,j) = \sum_m \sum_n I(i+m, j+n) \cdot K(m,n)
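
To make the formula concrete, here is a minimal NumPy sketch of this sliding-window operation (strictly speaking, the formula above is cross-correlation, which is what deep learning frameworks implement under the name "convolution"):

import numpy as np

def convolve2d_valid(I, K):
    # Computes S(i,j) = sum over m,n of I(i+m, j+n) * K(m,n) ("valid" mode).
    kh, kw = K.shape
    h = I.shape[0] - kh + 1
    w = I.shape[1] - kw + 1
    S = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return S

I = np.arange(16, dtype=float).reshape(4, 4)
K = np.array([[1.0, 0.0], [0.0, -1.0]])  # A toy 2×2 filter
print(convolve2d_valid(I, K))  # A 4×4 input and a 2×2 filter give a 3×3 output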

3. Activation Function (ReLU)

After the convolutional layer, an Activation Function is applied. The most common activation function is ReLU (Rectified Linear Unit), which is mathematically expressed as:

 f(x) = \max(0, x)

ReLU zeroes out all negative values and retains only positive values, introducing non-linearity into the network.
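
A one-line NumPy sketch shows the effect on some illustrative values:

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0, x))  # [0.  0.  0.  1.5 3. ] – negatives become 0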

4. Pooling Layer (Downsampling)

The Pooling Layer reduces the spatial dimensions of the feature maps. This lowers the computational cost of the model and helps control overfitting. The most common pooling operation is Max Pooling, which keeps only the largest value in each pooling window:

 P(i,j) = \max_{(m,n) \in \text{window}} S(i+m, j+n)
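
As an illustration, 2×2 max pooling with stride 2 over a small feature map can be sketched in NumPy as follows (the reshape trick assumes the input divides evenly into windows):

import numpy as np

S = np.array([[1., 3., 2., 4.],
              [5., 6., 1., 2.],
              [7., 2., 9., 1.],
              [3., 4., 5., 8.]])

# Split the 4×4 map into non-overlapping 2×2 windows and keep each maximum.
P = S.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(P)  # [[6. 4.]
          #  [7. 9.]]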

5. Flattening Layer

Before passing data to the fully connected layers, the stack of 2D feature maps is flattened into a single 1D vector.
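
Continuing the running example, flattening the (15, 15, 32) feature maps from the pooling step yields a vector of 15 × 15 × 32 = 7200 values:

import numpy as np

feature_maps = np.zeros((15, 15, 32))   # Placeholder pooled feature maps
flattened = feature_maps.reshape(-1)    # Collapse to a single 1D vector
print(flattened.shape)  # (7200,)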

6. Fully Connected Layer

The Fully Connected Layer connects every input neuron to every output neuron, using learned features from earlier layers to make predictions.
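
Mathematically this is a matrix multiplication plus a bias. Here is a minimal NumPy sketch with illustrative sizes (7200 inputs and 128 outputs, matching the summary table below; the weights are random placeholders, not learned values):

import numpy as np

x = np.random.rand(7200)       # Flattened input features
W = np.random.rand(128, 7200)  # Weight matrix (placeholder values)
b = np.random.rand(128)        # Bias vector (placeholder values)
out = W @ x + b                # Fully connected: every input feeds every output
print(out.shape)  # (128,)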

7. Output Layer

The final layer in the CNN produces probabilities using a Softmax Activation Function.

 \hat{y}_i = \frac{e^{z_i}}{\sum_j e^{z_j}}
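
A short, numerically stable NumPy sketch of the softmax (subtracting the maximum logit does not change the result but avoids overflow):

import numpy as np

def softmax(z):
    z = z - np.max(z)  # Numerical stability; the result is unchanged
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.2, 0.0])
print(softmax(logits))  # ~[0.9, 0.1], e.g., 90% cat vs. 10% dog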

8. Backpropagation and Optimization

During training, the model uses Backpropagation to calculate the error and update weights. This process ensures the model improves over time by reducing the loss function.

Common optimization algorithms include SGD (Stochastic Gradient Descent) and Adam; a single gradient-descent update is sketched below.
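
The core idea of one gradient-descent update can be shown in a toy example with made-up numbers (not a full training loop):

# One SGD update for a single weight: w_new = w - learning_rate * gradient
w, grad, lr = 0.5, 0.8, 0.01
w = w - lr * grad
print(w)  # 0.492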

Step-by-Step Summary of CNN Operations

Step | Operation | Output
1. Input Layer | Receives an image of size (32, 32, 3). | (32, 32, 3)
2. Convolutional Layer | Applies filters to detect features like edges. | (30, 30, 32) (after applying 32 filters of size 3×3)
3. Activation (ReLU) | Applies ReLU to introduce non-linearity. | (30, 30, 32) (negative values set to 0)
4. Pooling Layer | Max pooling reduces the feature map size. | (15, 15, 32)
5. Flattening Layer | Flattens the stacked feature maps into a 1D vector. | (7200)
6. Fully Connected Layer | Connects all features to output neurons. | (128)
7. Output Layer | Softmax outputs probabilities for each class. | [0.9, 0.1] (e.g., 90% cat, 10% dog)
8. Backpropagation | Updates weights to minimize error. | Weights updated across layers

Key Note

This breakdown explains how CNNs process images step-by-step, from input to classification. Each layer plays a specific role in extracting features, learning complex patterns, and improving through backpropagation. CNNs are widely used because they automatically learn important features and apply them efficiently. Below, we walk through these steps once more using the image shared at the top of the article, for a better understanding of what we have explained so far.

Source: https://developersbreach.com/convolution-neural-network-deep-learning/

Step-by-Step Comparison of CNN Operations and Image (Image on Top of Our Article)

In the image on top of our article, we can visualize the entire flow of a typical CNN architecture, which aligns well with the step-by-step table we previously discussed. Here’s how the steps from the table compare to the components shown in the image:

Step | Operation (Table) | Corresponding Component (Image)
1. Input Layer | Receives the image as input, such as a 32×32 RGB image. | The Input section in the image, showing the zebra image as input.
2. Convolutional Layer | Applies filters (kernels) to detect features like edges. | The first Convolution + ReLU section in the image, where filters are applied to the input image.
3. Activation (ReLU) | Applies ReLU to introduce non-linearity and eliminate negative values. | Part of the Convolution + ReLU layers, shown after the convolution step in the image.
4. Pooling Layer | Applies max pooling to downsample the feature maps and reduce the size of the data. | The Pooling layers in the image, reducing the size of the feature maps.
5. Flattening Layer | Flattens the 2D feature maps into a 1D vector to prepare for the fully connected layer. | The Flatten Layer in the image, located between feature extraction and classification.
6. Fully Connected Layer | Combines all the features and connects them to the output neurons, learning complex representations. | The Fully Connected Layer in the image, where the flattened feature maps are used for classification.
7. Output Layer | Uses the Softmax function to convert logits to probabilities for classification. | The Softmax Activation Function and the Output Layer in the image, showing final class probabilities (e.g., 0.7 for zebra).
8. Backpropagation | Updates the weights based on the loss function during training. | Not explicitly shown in the image, but occurs during training after generating output.

Detailed Comparison: Image Breakdown and Table Alignment

Let’s compare the steps more thoroughly:

  1. Input Layer: The zebra serves as the input in the image, which corresponds to the first step in the table. Here, the image is processed with dimensions like 32×32×3.
  2. Convolutional + ReLU Layers: The convolution layers in the image apply filters to detect simple features like edges. These layers are clearly labeled Convolution + ReLU, aligning with the convolution and activation steps in the table.
  3. Pooling Layers: The pooling layers in the image are responsible for downsampling the feature maps. This is a critical step in reducing the spatial dimensions, matching the pooling step in the table.
  4. Flatten Layer: The feature maps are flattened into a 1D vector, as shown in the image. This step prepares the data for classification in the fully connected layers.
  5. Fully Connected Layer: In the image, the fully connected layer learns from the extracted features and combines them for final decision-making. This corresponds to the step in the table where the network connects all features to the output neurons.
  6. Output Layer: The softmax activation function produces probabilistic outputs (e.g., zebra = 0.7). This matches the table’s output layer step, where softmax is applied to determine the class probabilities.
  7. Backpropagation: Although not shown in the image, backpropagation occurs after the output is generated, updating the network’s weights based on the error. This step is essential during training and is listed in the table as well.

Now It Is Time to Put the CNN Theory We Have Explained into Code Practice

CNN in PyTorch Example

The following example demonstrates a fully functional implementation of ResNet-18, a type of Convolutional Neural Network (CNN), in PyTorch. This model is used to classify images from the CIFAR-10 dataset into 10 categories. The code includes all necessary steps, such as data loading, model definition, training, and evaluation:


import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Step 1: Define the Basic Block (Residual Block) used in ResNet
# Residual blocks allow gradients to flow directly through skip connections, reducing the vanishing gradient problem.
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channels, out_channels, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # Skip connection
        out = torch.relu(out)
        return out

# Step 2: Define the ResNet model
# The ResNet architecture stacks multiple residual blocks for deep feature extraction.
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.linear = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * block.expansion
        return nn.Sequential(*layers)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = torch.nn.functional.avg_pool2d(out, 4)  # Global average pooling
        out = torch.flatten(out, 1)
        out = self.linear(out)  # Fully connected layer
        return out

# Step 3: Load and preprocess the CIFAR-10 dataset
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # Data augmentation
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

# Step 4: Initialize the model, loss function, and optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ResNet(BasicBlock, [2, 2, 2, 2]).to(device)  # ResNet-18
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

# Step 5: Training loop
for epoch in range(100):  # Number of epochs
    model.train()
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(trainloader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 100 == 99:
            print(f'[Epoch {epoch + 1}, Batch {i + 1}] loss: {running_loss / 100:.3f}')
            running_loss = 0.0
    scheduler.step()

print('Training complete.')

# Step 6: Save the trained model
torch.save(model.state_dict(), 'resnet18_cifar10.pth')

# Step 7: Evaluate the model
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in testloader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on test set: {100 * correct / total:.2f}%')
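
As a usage sketch (reusing the testset and device defined above, and assuming training has finished and saved resnet18_cifar10.pth), the weights can be reloaded for single-image inference:

# Reload the saved weights and classify one test image (usage sketch).
model = ResNet(BasicBlock, [2, 2, 2, 2]).to(device)
model.load_state_dict(torch.load('resnet18_cifar10.pth'))
model.eval()

image, label = testset[0]  # One CIFAR-10 test image (already transformed)
with torch.no_grad():
    logits = model(image.unsqueeze(0).to(device))  # Add a batch dimension
    probs = torch.softmax(logits, dim=1)
print(f'Predicted class: {probs.argmax(1).item()}, true class: {label}')
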
CNN Implementation in MLX

This example demonstrates how to implement a CNN for the CIFAR-10 dataset using the MLX framework, which is optimized for Apple Silicon. MLX runs on Apple's Metal GPU backend to accelerate training and inference.

Prerequisites

  • Apple Silicon Mac (M1/M2/M3).
  • MLX framework installed (MLX GitHub).
  • MLX data package (mlx-data) installed.

Code Example

# train.py
import argparse
import os
import time

import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

import dataset
import model

def loss_fn(model, X, y):
    predictions = model(X)
    loss = mx.mean(nn.losses.cross_entropy(predictions, y))
    acc = mx.mean(mx.argmax(predictions, axis=1) == y)
    return loss, acc

def eval_fn(model, X, y):
    predictions = model(X)
    return mx.mean(mx.argmax(predictions, axis=1) == y)

def main(args):
    mx.set_default_device(mx.cpu if args.cpu else mx.gpu)
    mx.random.seed(args.seed)
    os.makedirs("models", exist_ok=True)

    train_iter, test_iter = dataset.load_data(args.dataset, args.batchsize)
    cnn = model.CNN(num_classes=10 if args.dataset == "CIFAR-10" else 100)
    optimizer = optim.Adam(learning_rate=args.lr)

    # MLX computes gradients functionally: nn.value_and_grad returns a function
    # that evaluates loss_fn and its gradients with respect to the model's
    # trainable parameters (there is no loss.backward() as in PyTorch).
    loss_and_grad_fn = nn.value_and_grad(cnn, loss_fn)

    for e in range(args.epochs):
        tic = time.time()
        train_loss, train_acc, num_batches = 0.0, 0.0, 0

        for batch in train_iter:
            X, y = mx.array(batch["image"]), mx.array(batch["label"])
            (loss, acc), grads = loss_and_grad_fn(cnn, X, y)
            optimizer.update(cnn, grads)                # Apply the gradients
            mx.eval(cnn.parameters(), optimizer.state)  # Force lazy evaluation

            train_loss += loss.item()
            train_acc += acc.item()
            num_batches += 1

        train_loss /= num_batches
        train_acc /= num_batches

        test_acc, num_test_batches = 0.0, 0
        for batch in test_iter:
            X, y = mx.array(batch["image"]), mx.array(batch["label"])
            test_acc += eval_fn(cnn, X, y).item()
            num_test_batches += 1
        test_acc /= num_test_batches

        toc = time.time()
        print(f"Epoch {e}: Loss {train_loss:.5f} | Train Acc {train_acc:.3f} "
              f"| Test Acc {test_acc:.3f} | Time {toc - tic:.2f}s")
        cnn.save_weights(f"models/cnn_mlx_epoch{e}.npz")  # Save model parameters

        train_iter.reset()  # Rewind the data streams for the next epoch
        test_iter.reset()

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Train a CNN on CIFAR-10 using MLX.")
    parser.add_argument("--cpu", action="store_true", help="Use CPU instead of GPU.")
    parser.add_argument("--seed", type=int, default=0, help="Random seed.")
    parser.add_argument("--batchsize", type=int, default=64, help="Batch size.")
    parser.add_argument("--epochs", type=int, default=30, help="Number of epochs.")
    parser.add_argument("--lr", type=float, default=0.001, help="Learning rate.")
    parser.add_argument("--dataset", type=str, default="CIFAR-10", choices=["CIFAR-10", "CIFAR-100"], help="Dataset.")
    args = parser.parse_args()
    main(args)

Additional Files

Here are the supporting modules for the above script:

# dataset.py
from mlx.data.datasets import load_cifar10, load_cifar100

def load_data(dataset_name, batch_size):
    # Load the raw data buffers via the mlx-data package.
    if dataset_name == "CIFAR-10":
        train_buf, test_buf = load_cifar10(train=True), load_cifar10(train=False)
    elif dataset_name == "CIFAR-100":
        train_buf, test_buf = load_cifar100(train=True), load_cifar100(train=False)
    else:
        raise ValueError("Unsupported dataset.")

    def normalize(x):
        return x.astype("float32") / 255.0  # Scale pixel values to [0, 1]

    train_iter = (
        train_buf.shuffle()
        .to_stream()
        .image_random_h_flip("image", prob=0.5)  # Data augmentation
        .key_transform("image", normalize)
        .batch(batch_size)
    )
    test_iter = test_buf.to_stream().key_transform("image", normalize).batch(batch_size)
    return train_iter, test_iter

# model.py
import mlx.core as mx
import mlx.nn as nn

class CNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # MLX convolutions operate on channels-last (NHWC) inputs.
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = nn.Linear(256 * 4 * 4, 1024)  # 32×32 -> 4×4 after three 2×2 pools
        self.fc2 = nn.Linear(1024, num_classes)

    def __call__(self, x):
        # MLX modules define __call__ rather than forward().
        x = self.pool(nn.relu(self.conv1(x)))
        x = self.pool(nn.relu(self.conv2(x)))
        x = self.pool(nn.relu(self.conv3(x)))
        x = mx.flatten(x, start_axis=1)  # Flatten everything except the batch axis
        x = nn.relu(self.fc1(x))
        return self.fc2(x)
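
Assuming the three files above are saved in the same directory, training can then be launched with, for example: python train.py --dataset CIFAR-10 --epochs 30 --batchsize 64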

Explanation

The code demonstrates how to:

  • Load the CIFAR-10 or CIFAR-100 dataset using the mlx-data package.
  • Define a CNN with three convolutional layers, max pooling, and fully connected layers.
  • Compute gradients functionally with nn.value_and_grad and train with the Adam optimizer, evaluating accuracy on the test set.
  • Save the model parameters after each epoch.


For a perfect example of a CNN in an iOS app, check out the INGOAMPT app called "background img remove INGOAMPT" and discover how it leverages CNN deep learning through Apple's Core ML technology.
Want to support INGOAMPT? Purchase any of our apps to contribute and be part of our journey.
Any questions? Send us an email at: email@ingoampt.com

To check out the Background Remove Image INGOAMPT app, a very simple but perfect example of CNN integration into an iOS app, click here.

This article is part of a series. To continue reading, check out the next day's article (Day 54), in which we explain the mathematics behind CNNs:

Day 54: Mathematics Behind CNN in Deep Learning
