Introduction to Deep Learning and Neural Networks with a Focus on Perceptrons
Deep Learning is a subset of machine learning that uses neural networks with many layers (hence "deep") to model and understand complex patterns in data. These networks are inspired by the human brain and are particularly powerful for tasks like image and speech recognition.
Neural Networks consist of interconnected layers of nodes, or neurons. Each neuron receives input, processes it, and passes it to the next layer. The simplest form of a neural network is the Perceptron, which is a single-layer neural network used for binary classification tasks.
Perceptron Explained
A Perceptron is a fundamental unit of a neural network, performing binary classification by making predictions based on a linear predictor function. It works by:
- Receiving Input: Taking input features \( x_1, x_2, \ldots, x_n \).
- Weight Multiplication: Multiplying each input by a corresponding weight \( w_1, w_2, \ldots, w_n \).
- Summation: Summing the weighted inputs and adding a bias term \( b \).
- Activation Function: Passing the result through an activation function (typically a step function for a perceptron).
The mathematical formula for a perceptron can be written as:
$$ y = f\left(\sum_{i=1}^{n} w_i x_i + b\right) $$
where \( f \) is the activation function.
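To make this concrete, here is a minimal sketch of the forward pass in NumPy, using the same example values (inputs, initial weights, and bias) that appear in the worked example later in this article:
import numpy as np
# Minimal perceptron forward pass: step(w . x + b)
def perceptron_forward(x, w, b):
    z = np.dot(w, x) + b        # weighted sum of inputs plus bias
    return 1 if z >= 0 else 0   # step activation

x = np.array([2.5, 1.5])        # input features
w = np.array([0.01, -0.02])     # weights
b = 0.0                         # bias
print(perceptron_forward(x, w, b))  # z = -0.005, so the output is 0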
Training a Perceptron
Training involves adjusting the weights and bias to minimize classification errors on the training data. This is typically done using algorithms like the Perceptron Learning Algorithm, which updates weights based on the error in predictions.
Mathematical Foundations
Linear Predictor
$$ z = \sum_{i=1}^{n} w_i x_i + b $$
Purpose: Calculates the weighted sum of inputs and bias, determining the linear combination that will be used for prediction.
Activation Function
$$ y = f(z) $$
where \( f(z) \) is often a step function:
$$ f(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ 0 & \text{if } z < 0 \end{cases} $$
Purpose: Transforms the linear predictor into a binary output, classifying the input as either 1 or 0.
Weight Update Rule
$$ w_i \leftarrow w_i + \Delta w_i $$
where \( \Delta w_i = \eta (t - y) x_i \) with \( \eta \) being the learning rate, \( t \) the target value, and \( y \) the predicted value.
Purpose: Adjusts the weights to reduce the error in future predictions. This iterative process helps the perceptron learn from the training data.
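For instance, a single update step from this rule can be sketched as follows (the numbers are the same ones used in the worked example later in this article):
import numpy as np
eta = 0.1                       # learning rate
w = np.array([0.01, -0.02])     # current weights
b = 0.0                         # current bias
x = np.array([2.5, 1.5])        # input features
t, y = 1, 0                     # target label and (incorrect) prediction
w = w + eta * (t - y) * x       # new weights: [0.26, 0.13]
b = b + eta * (t - y)           # new bias: 0.1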
Iterative Process
Each iteration involves the following steps:
- Prediction: Calculate the linear predictor \( z \) and apply the activation function to obtain the predicted output \( \hat{y} \).
- Error Calculation: Determine the difference between the actual target \( t \) and the predicted output \( \hat{y} \).
- Weight Update: Adjust the weights and bias based on the error. This helps refine the decision boundary that the perceptron uses to classify the input data (a minimal code sketch of this loop follows below).
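Put together, one version of this loop looks roughly like the following sketch (it assumes a step activation; a fuller, runnable implementation appears in part 1 below):
import numpy as np

def train_perceptron(X, y, w, b, eta=0.1, epochs=3):
    # Repeatedly predict, measure the error, and nudge the weights and bias
    for epoch in range(epochs):
        for xi, target in zip(X, y):
            z = np.dot(w, xi) + b          # linear predictor
            y_hat = 1 if z >= 0 else 0     # step activation
            error = target - y_hat         # prediction error
            w = w + eta * error * xi       # weight update
            b = b + eta * error            # bias update
    return w, b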
How Perceptron Helps the Algorithm
- Learning from Data: The perceptron adjusts its weights based on the input data and corresponding labels, learning to classify the data correctly over iterations.
- Updating Weights: The weight update rule ensures that the perceptron corrects its mistakes, moving towards a model that can generalize well to unseen data.
- Binary Classification: By using a step function as the activation function, the perceptron can classify inputs into two categories, making it useful for binary classification tasks.
The perceptron and the mathematics behind it are explained below in three practical ways.
Perceptrons form the building blocks for more complex neural networks used in deep learning. Understanding their working and mathematical foundation is crucial for delving into more advanced topics in neural networks and deep learning.
For a detailed exploration of the mathematics behind it, continue reading below.
Here are the three approaches:
1- Manual perceptron training code, written without importing a Perceptron class
2- The same algorithm using scikit-learn's imported Perceptron class
3- The mathematics behind the perceptron, worked through in detail without any code
All three are covered below so that you can build a deep understanding of the perceptron and the mathematics behind it.
1- Manual Perceptron Training Code
Below is the code that performs manual updates and visualizes the decision boundary at each step:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load the Iris dataset
iris = load_iris()
X = iris.data[:, (2, 3)] # Using petal length and petal width as features
y = (iris.target == 0).astype(int) # Binary target: Iris Setosa vs. others
# Initial parameters
weights = np.array([0.01, -0.02])
bias = 0.0
learning_rate = 0.1
# Function to calculate the net input
def net_input(X, weights, bias):
    return np.dot(X, weights) + bias

# Function to apply the activation function (step function)
def predict(X, weights, bias):
    return np.where(net_input(X, weights, bias) >= 0.0, 1, 0)
# Function to plot the decision boundary
def plot_decision_boundary(weights, bias, X, y, iteration):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = predict(np.c_[xx.ravel(), yy.ravel()], weights, bias)
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.Paired)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.Paired)
    plt.xlabel("Petal Length (cm)")
    plt.ylabel("Petal Width (cm)")
    plt.title(f"Decision Boundary after Iteration {iteration}")
    plt.show()
# Training loop with detailed logging and plotting
def manual_perceptron_train(X, y, weights, bias, learning_rate, num_iterations):
    for iteration in range(num_iterations):
        print(f"Iteration {iteration + 1}")
        for xi, target in zip(X, y):
            z = net_input(xi, weights, bias)
            y_hat = predict(xi.reshape(1, -1), weights, bias)[0]
            error = target - y_hat
            print(f"Data Point: {xi}, True Label: {target}, Net Input: {z}, Predicted: {y_hat}, Error: {error}")
            if error != 0:
                weights += learning_rate * error * xi
                bias += learning_rate * error
                print(f"Updated Weights: {weights}, Updated Bias: {bias}")
        plot_decision_boundary(weights, bias, X, y, iteration + 1)
        print(f"Weights: {weights}, Bias: {bias}")
# Ensure both classes are present in the subset for training
X_train = np.array([[2.5, 1.5], [1.0, 0.5]])
y_train = np.array([1, 0])
# Train the perceptron and plot decision boundaries
manual_perceptron_train(X_train, y_train, weights, bias, learning_rate, 3)
This code demonstrates how the weights and bias are updated at each step based on the perceptron learning rule. The decision boundary is plotted after each iteration to visualize how it evolves as the model learns from the data.
We hope this detailed walkthrough of the perceptron training algorithm and manual implementation helps you understand the inner workings of this fundamental machine learning model.
Now let's see the results:
2- Algorithm Using the Imported Perceptron
This part demonstrates the Perceptron training using sklearn’s Perceptron class. This approach abstracts the manual weight updates and leverages sklearn’s built-in methods.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
# Load the Iris dataset
iris = load_iris()
X = iris.data[:, (2, 3)] # Using petal length and petal width as features
y = (iris.target == 0).astype(int) # Binary target: Iris Setosa vs. others
# Initialize the Perceptron model with warm_start=True to retain weights and bias across iterations
per_clf = Perceptron(max_iter=1, eta0=0.1, random_state=42, tol=None, warm_start=True)
# Manually set the initial weights and bias after the first fit
initial_weights = np.array([0.01, -0.02])
initial_bias = np.array([0.0])
# Function to plot the decision boundary
def plot_decision_boundary(model, X, y, iteration):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.Paired)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.Paired)
    plt.xlabel("Petal Length (cm)")
    plt.ylabel("Petal Width (cm)")
    plt.title(f"Perceptron Decision Boundary after Iteration {iteration}")
    plt.show()
# Train the Perceptron for multiple iterations and plot decision boundaries
def perceptron_train(X, y, model, num_iterations):
    for iteration in range(num_iterations):
        print(f"Iteration {iteration + 1}")
        if iteration == 0:
            # Set initial weights and bias manually after the first fit
            model.fit(X, y)
            model.coef_ = initial_weights.reshape(1, -1)
            model.intercept_ = initial_bias
        else:
            model.partial_fit(X, y, classes=np.array([0, 1]))
        plot_decision_boundary(model, X, y, iteration + 1)
        print(f"Weights: {model.coef_}, Bias: {model.intercept_}")
# Ensure both classes are present in the subset for training
X_train = np.array([[2.5, 1.5], [1.0, 0.5]])
y_train = np.array([1, 0])
# Train the perceptron and plot decision boundaries
perceptron_train(X_train, y_train, per_clf, 3)
The above code demonstrates the Perceptron training process using sklearn’s Perceptron class, with manual initialization of weights and bias, and plotting the decision boundary after each iteration. This approach simplifies the implementation while retaining the core functionality of the manual process.
Here are the results of the code:
3- The Mathematics Behind the Perceptron
Initial weights: \( \mathbf{w} = [0.01, -0.02] \)
Initial bias: \( b = 0.0 \)
Learning rate: \( \eta = 0.1 \)
Iteration 1
First Training Instance \([2.5, 1.5]\) with Label \( 1 \)
Prediction:
\[ z = 0.01 \cdot 2.5 + (-0.02) \cdot 1.5 + 0 = 0.025 - 0.03 = -0.005 \]
\[ \hat{y} = 0 \quad (\text{since } z < 0) \]
Weight Update:
\[ \mathbf{w} \leftarrow \mathbf{w} + \eta (y - \hat{y}) \mathbf{x} = [0.01, -0.02] + 0.1 \cdot (1 - 0) \cdot [2.5, 1.5] \]
\[ \mathbf{w} \leftarrow [0.01, -0.02] + [0.25, 0.15] = [0.26, 0.13] \]
\[ b \leftarrow b + \eta (y - \hat{y}) = 0 + 0.1 \cdot (1 - 0) = 0.1 \]
Second Training Instance \([1.0, 0.5]\) with Label \( 0 \)
Prediction:
\[ z = 0.26 \cdot 1.0 + 0.13 \cdot 0.5 + 0.1 = 0.26 + 0.065 + 0.1 = 0.425 \]
\[ \hat{y} = 1 \quad (\text{since } z \geq 0) \]
Weight Update:
\[ \mathbf{w} \leftarrow \mathbf{w} + \eta (y - \hat{y}) \mathbf{x} = [0.26, 0.13] + 0.1 \cdot (0 - 1) \cdot [1.0, 0.5] \]
\[ \mathbf{w} \leftarrow [0.26, 0.13] + [-0.1, -0.05] = [0.16, 0.08] \]
\[ b \leftarrow b + \eta (y - \hat{y}) = 0.1 + 0.1 \cdot (0 - 1) = 0.1 - 0.1 = 0 \]
Results after Iteration 1:
Iteration | Weights | Bias |
---|---|---|
1 | [0.16, 0.08] | 0.0 |
Iteration 2
First Training Instance \([2.5, 1.5]\) with Label \( 1 \)
Prediction:
\[ z = 0.16 \cdot 2.5 + 0.08 \cdot 1.5 + 0 = 0.4 + 0.12 = 0.52 \]
\[ \hat{y} = 1 \quad (\text{since } z \geq 0) \]
Weight Update:
No update since \( \hat{y} = y \)
Second Training Instance \([1.0, 0.5]\) with Label \( 0 \)
Prediction:
\[ z = 0.16 \cdot 1.0 + 0.08 \cdot 0.5 + 0 = 0.16 + 0.04 = 0.2 \]
\[ \hat{y} = 1 \quad (\text{since } z \geq 0) \]
Weight Update:
\[ \mathbf{w} \leftarrow \mathbf{w} + \eta (y - \hat{y}) \mathbf{x} = [0.16, 0.08] + 0.1 \cdot (0 - 1) \cdot [1.0, 0.5] \]
\[ \mathbf{w} \leftarrow [0.16, 0.08] + [-0.1, -0.05] = [0.06, 0.03] \]
\[ b \leftarrow b + \eta (y - \hat{y}) = 0 + 0.1 \cdot (0 - 1) = -0.1 \]
Results after Iteration 2:
Iteration | Weights | Bias |
---|---|---|
2 | [0.06, 0.03] | -0.1 |
Iteration 3
First Training Instance \([2.5, 1.5]\) with Label \( 1 \)
Prediction:
\[ z = 0.06 \cdot 2.5 + 0.03 \cdot 1.5 - 0.1 = 0.15 + 0.045 - 0.1 = 0.095 \]
\[ \hat{y} = 1 \quad (\text{since } z \geq 0) \]
Weight Update:
No update since \( \hat{y} = y \)
Second Training Instance \([1.0, 0.5]\) with Label \( 0 \)
Prediction:
\[ z = 0.06 \cdot 1.0 + 0.03 \cdot 0.5 - 0.1 = 0.06 + 0.015 - 0.1 = -0.025 \]
\[ \hat{y} = 0 \quad (\text{since } z < 0) \]
Weight Update:
No update since \( \hat{y} = y \)
Results after Iteration 3:
Iteration | Weights | Bias |
---|---|---|
3 | [0.06, 0.03] | -0.1 |
Summary and Final Results
The Perceptron has undergone three iterations. Below are the weights and bias at each iteration:
Iteration | Weights (\( \mathbf{w} \)) | Bias (\( b \)) |
---|---|---|
1 | \([0.16, 0.08]\) | \(0.0\) |
2 | \([0.06, 0.03]\) | \(-0.1\) |
3 | \([0.06, 0.03]\) | \(-0.1\) |
Now let's visualize the table:
import numpy as np
import matplotlib.pyplot as plt
# Define the weights and biases from the table
iterations = [1, 2, 3]
weights = np.array([[0.16, 0.08], [0.06, 0.03], [0.06, 0.03]])
biases = [0.0, -0.1, -0.1]
# Define the training data
X_train = np.array([[2.5, 1.5], [1.0, 0.5]])
y_train = np.array([1, 0])
# Function to plot the decision boundary for a given iteration
def plot_decision_boundary(weights, bias, X_train, y_train, iteration):
    x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
    y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
    Z = np.dot(np.c_[xx.ravel(), yy.ravel()], weights) + bias
    Z = np.where(Z >= 0, 1, 0).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.Paired)
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, edgecolors='k', cmap=plt.cm.Paired)
    plt.xlabel("Petal Length (cm)")
    plt.ylabel("Petal Width (cm)")
    plt.title(f"Decision Boundary after Iteration {iteration}")
    plt.show()

# Plot the decision boundary for each iteration
for i in range(len(iterations)):
    plot_decision_boundary(weights[i], biases[i], X_train, y_train, iterations[i])
Here is the result of plotting it:
You May Still Wonder: How Is the Decision Boundary Drawn?
The one question you might still have is how the perceptron knows where to draw the decision boundary line that separates the data.
Understanding the Decision Boundary in Perceptron Learning
Here, we'll explore how the perceptron in the models above learned to classify data by adjusting its decision boundary over the iterations.
Concepts to Understand
- Decision Boundary: This is the line (or hyperplane in higher dimensions) where the perceptron classifier decides between two classes. It's defined as:
$\mathbf{w}^T \mathbf{x} + b = 0$
where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the feature vector, and $b$ is the bias term.
- Learning Process: The perceptron adjusts $\mathbf{w}$ and $b$ to minimize errors by updating them based on the classification results.
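In two dimensions, the boundary equation can be rearranged into a line that is easy to plot. As a minimal sketch (using the weights and bias reached after iteration 1 of our example), solving $w_1 x_1 + w_2 x_2 + b = 0$ for $x_2$ gives the boundary line directly:
import numpy as np
import matplotlib.pyplot as plt

w = np.array([0.16, 0.08])     # weights after iteration 1
b = 0.0                        # bias after iteration 1

# w1*x1 + w2*x2 + b = 0  =>  x2 = -(w1*x1 + b) / w2
x1 = np.linspace(0, 3, 100)
x2 = -(w[0] * x1 + b) / w[1]   # here: x2 = -2 * x1

plt.plot(x1, x2, label="decision boundary")
plt.xlabel("Petal Length (cm)")
plt.ylabel("Petal Width (cm)")
plt.legend()
plt.show()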
Initial Setup
- Data Points:
- Point 1: $\mathbf{x}_1 = [2.5, 1.5]$ with label $y_1 = 1$
- Point 2: $\mathbf{x}_2 = [1.0, 0.5]$ with label $y_2 = 0$
- Initial Weights and Bias:
- Weights: $\mathbf{w} = [0.01, -0.02]$
- Bias: $b = 0.0$
- Learning Rate ($\eta$): 0.1
Iteration 1
- For $\mathbf{x}_1$:
- Net Input Calculation:
$\text{Net Input} = 0.01 \cdot 2.5 - 0.02 \cdot 1.5 + 0.0 = 0.025 - 0.03 = -0.005$
- Prediction: Since $-0.005$ is less than 0, the perceptron predicts class 0.
- Actual Label: 1 (Error = 1 - 0 = 1)
- Update Weights and Bias:
$\mathbf{w} \leftarrow [0.01, -0.02] + 0.1 \cdot 1 \cdot [2.5, 1.5] = [0.01 + 0.25, -0.02 + 0.15] = [0.26, 0.13]$
$b \leftarrow 0.0 + 0.1 \cdot 1 = 0.1$
- For $\mathbf{x}_2$:
- Net Input Calculation:
$\text{Net Input} = 0.26 \cdot 1.0 + 0.13 \cdot 0.5 + 0.1 = 0.26 + 0.065 + 0.1 = 0.425$
- Prediction: Since $0.425$ is greater than 0, the perceptron predicts class 1.
- Actual Label: 0 (Error = 0 - 1 = -1)
- Update Weights and Bias:
$\mathbf{w} \leftarrow [0.26, 0.13] + 0.1 \cdot (-1) \cdot [1.0, 0.5] = [0.26 - 0.1, 0.13 - 0.05] = [0.16, 0.08]$
$b \leftarrow 0.1 + 0.1 \cdot (-1) = 0.0$
Decision Boundary After Iteration 1:
Equation: $0.16 \cdot x_1 + 0.08 \cdot x_2 + 0.0 = 0$
Rearranged: $x_2 = -\frac{0.16}{0.08} x_1 = -2 x_1$
Iteration 2
- For $\mathbf{x}_1$:
- Net Input Calculation:
$\text{Net Input} = 0.16 \cdot 2.5 + 0.08 \cdot 1.5 + 0.0 = 0.40 + 0.12 = 0.52$
- Prediction: Since $0.52$ is greater than 0, the perceptron predicts class 1.
- Actual Label: 1 (No Error)
- For $\mathbf{x}_2$:
- Net Input Calculation:
$\text{Net Input} = 0.16 \cdot 1.0 + 0.08 \cdot 0.5 + 0.0 = 0.16 + 0.04 = 0.20$
- Prediction: Since $0.20$ is greater than 0, the perceptron predicts class 1.
- Actual Label: 0 (Error = 0 - 1 = -1)
- Update Weights and Bias:
$\mathbf{w} \leftarrow [0.16, 0.08] + 0.1 \cdot (-1) \cdot [1.0, 0.5] = [0.16 - 0.1, 0.08 - 0.05] = [0.06, 0.03]$
$b \leftarrow 0.0 + 0.1 \cdot (-1) = -0.1$
Decision Boundary After Iteration 2:
Equation: $0.06 \cdot x_1 + 0.03 \cdot x_2 - 0.1 = 0$
Rearranged: $x_2 = -\frac{0.06}{0.03} x_1 + \frac{0.1}{0.03} = -2 x_1 + \frac{10}{3} \approx -2 x_1 + 3.33$
Iteration 3
- For $\mathbf{x}_1$:
- Net Input Calculation:
$\text{Net Input} = 0.06 \cdot 2.5 + 0.03 \cdot 1.5 - 0.1 = 0.15 + 0.045 - 0.1 = 0.095$
- Prediction: Since $0.095$ is greater than 0, the perceptron predicts class 1.
- Actual Label: 1 (No Error)
- For $\mathbf{x}_2$:
- Net Input Calculation:
$\text{Net Input} = 0.06 \cdot 1.0 + 0.03 \cdot 0.5 - 0.1 = 0.06 + 0.015 - 0.1 = -0.025$
- Prediction: Since $-0.025$ is less than 0, the perceptron predicts class 0.
- Actual Label: 0 (No Error)
Decision Boundary After Iteration 3:
Equation: $0.06 \cdot x_1 + 0.03 \cdot x_2 - 0.1 = 0$
Rearranged: $x_2 = -\frac{0.06}{0.03} x_1 + \frac{0.1}{0.03} = -2 x_1 + \frac{10}{3} \approx -2 x_1 + 3.33$
Conclusion
As we can see from the iterations, the perceptron adjusts its weights and bias to better fit the data. The decision boundary evolves through these adjustments:
- Iteration 1: Adjusts based on the errors from both data points, resulting in a boundary that may not perfectly separate the classes.
- Iteration 2: Further adjustments refine the boundary, moving closer to an optimal separation.
- Iteration 3: Shows that the decision boundary has stabilized with correct classifications for the given data points.
The decision boundary is defined by the equation $\mathbf{w}^T \mathbf{x} + b = 0$. In this case, it separates the two classes (Iris Setosa vs. others) by adjusting weights and bias based on errors from the training data.
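As a quick sanity check of this conclusion, the final weights and bias from iteration 3 can be plugged back into the decision rule to confirm that both training points end up on the correct side of the boundary:
import numpy as np

w = np.array([0.06, 0.03])               # final weights
b = -0.1                                 # final bias
X = np.array([[2.5, 1.5], [1.0, 0.5]])   # training points
y = np.array([1, 0])                     # true labels

z = X @ w + b                            # net inputs: [0.095, -0.025]
preds = np.where(z >= 0, 1, 0)           # step activation -> [1, 0]
print(preds, np.array_equal(preds, y))   # [1 0] True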
Additional Notes:
Role of the Activation Function
Net Input Calculation:
For each input data point \( \mathbf{x}_i \), the perceptron calculates a net input \( z_i \) as:
\( z_i = \mathbf{w} \cdot \mathbf{x}_i + b \)
This net input is a continuous value.
Binary Classification:
The perceptron is a binary classifier, meaning it needs to classify each data point into one of two categories (e.g., 0 or 1).
The continuous net input \( z_i \) needs to be converted into a discrete class label.
Binary Step Activation Function:
The binary step activation function helps in this conversion:
\( y_{\text{pred}} = \begin{cases} 1 & \text{if } z_i \geq 0 \\ 0 & \text{if } z_i < 0 \end{cases} \)
It outputs 1 if the net input is greater than or equal to zero and 0 otherwise.
This function effectively creates a decision boundary at \( z_i = 0 \).
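In code, this conversion is a one-liner, and comparing its output against the target labels yields the error signal that drives the weight updates. A minimal sketch, using the two net inputs that appear in the example walkthrough below:
import numpy as np

z = np.array([-0.005, 0.425])    # net inputs from the example walkthrough
targets = np.array([1, 0])       # true labels
preds = np.where(z >= 0, 1, 0)   # binary step activation -> [0, 1]
errors = targets - preds         # [1, -1]: both predictions need a correction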
Relationship to Target Labels \( y \)
The target labels \( y \) in your dataset indicate the true class of each data point (\( y = 1 \) or \( y = 0 \)).
The binary step activation function ensures that the perceptron’s predictions are in the same format as these target labels.
Guiding the Training Process:
During training, the perceptron adjusts its weights and bias to minimize the difference between the predicted labels \( y_{\text{pred}} \) and the true labels \( y \).
The binary step function's output provides clear feedback on whether the prediction is correct or not, guiding the weight updates.
Why Use the Binary Step Activation Function?
- Simplicity: The binary step activation function is simple and computationally efficient. It directly maps the net input to the desired binary output.
- Creating a Decision Boundary: The function defines a clear decision boundary where \( \mathbf{w} \cdot \mathbf{x} + b = 0 \). This boundary separates the feature space into two regions: one for each class.
- Ensuring Discrete Outputs: Since the perceptron is a binary classifier, it needs to output discrete class labels. The binary step function ensures that the outputs are 0 or 1, matching the format of the target labels.
Example Walkthrough
Consider again the example with data points and corresponding labels:
- Data points: \( \mathbf{x}_1 = [2.5, 1.5] \) and \( \mathbf{x}_2 = [1.0, 0.5] \)
- Labels: \( y_1 = 1 \) and \( y_2 = 0 \)
Initial Prediction for \( \mathbf{x}_1 \):
Net input: \( z_1 = 0.01 \cdot 2.5 + (-0.02) \cdot 1.5 + 0 = -0.005 \)
Binary step activation function: \( y_{\text{pred}} = 0 \) (since \( z_1 < 0 \))
Prediction for \( \mathbf{x}_2 \) (after the update triggered by \( \mathbf{x}_1 \)):
Net input: \( z_2 = 0.26 \cdot 1 + 0.13 \cdot 0.5 + 0.1 = 0.425 \)
Binary step activation function: \( y_{\text{pred}} = 1 \) (since \( z_2 \geq 0 \))
Conclusion
The binary step activation function is crucial in a perceptron for:
- Converting the continuous net input into discrete class labels.
- Establishing a clear decision boundary.
- Ensuring predictions match the format of the target labels, guiding the training process to minimize classification errors.
By using this activation function, the perceptron can effectively learn to classify data points into the correct binary categories, adjusting its decision boundary as needed based on the training data.