Machine Learning Overview

Day 18: Automatic vs Manual Optimization in Keras

First up is the automatic approach, KerasTuner, which we covered in our previous post.






Automated Hyperparameter Tuning in Keras

Part 1: Automated Approaches for Hyperparameter Tuning in Keras

Hyperparameter tuning is a crucial step in machine learning: finding the set of hyperparameters that gets the best performance out of your model. Keras supports this through the KerasTuner library, which offers several powerful, automated strategies for exploring the hyperparameter space. In this section, we'll walk through the different tuners KerasTuner provides for automated hyperparameter tuning, including the latest additions as of 2024.

1. Random Search

Random search is one of the simplest hyperparameter tuning methods: it randomly samples hyperparameter combinations from a predefined search space. Despite its simplicity, random search can be surprisingly effective, especially when paired with a well-chosen search space. It is often used as a baseline because it is easy to implement and explores diverse regions of the hyperparameter space.
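
Every tuner needs a model-building function that takes a hyperparameter container (hp) and declares the search space. Here is a minimal sketch of such a build_model; the layer sizes, input shape, and learning-rate choices are illustrative assumptions, not values from the original post:

import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    # hp.Int / hp.Choice register hyperparameters with the tuner
    model = keras.Sequential([
        layers.Dense(hp.Int('units', min_value=32, max_value=256, step=32),
                     activation='relu', input_shape=(784,)),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(
            hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
        loss='categorical_crossentropy',
        metrics=['accuracy'])
    return model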

tuner = kt.RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=10,
    executions_per_trial=2,
    directory='random_search_dir',
    project_name='random_search'
)
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))

Here, max_trials defines the number of different hyperparameter combinations to try, while executions_per_trial runs each combination multiple times so that results average out random variation.

2. Bayesian Optimization

Bayesian Optimization is a more sophisticated approach that models the objective function to predict which hyperparameters might perform well, based on the results of previous trials. This method balances exploration of new areas of the hyperparameter space with exploitation of known good areas, making it more efficient than random search.

tuner = kt.BayesianOptimization(
    build_model,
    objective='val_loss',
    max_trials=10,
    directory='bayesian_optimization_dir',
    project_name='bayesian_optimization'
)
tuner.search(x_train, y_train, epochs=10, validation_data=(x_val, y_val))

This method is particularly useful when the search space is large or when computational resources are limited, as it tends to converge to good solutions faster than random search.
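
Whichever tuner you use, results are retrieved the same way once the search finishes. A quick sketch of the standard pattern from the KerasTuner docs:

# Retrieve the best hyperparameters found during the search
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hps.values)

# Rebuild the model from those hyperparameters and train it fully
model = tuner.hypermodel.build(best_hps)
model.fit(x_train, y_train, epochs=10, validation_data=(x_val, y_val))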

3. Hyperband

Hyperband is a method designed to optimize the resource allocation for hyperparameter tuning. It starts by evaluating many configurations with a small budget (e.g., a few epochs) and progressively allocates more resources (e.g., more epochs) to the most promising configurations. This makes Hyperband particularly effective when training is expensive, as it avoids wasting resources on poor configurations.

tuner = kt.Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=20,
    factor=3,
    directory='hyperband_dir',
    project_name='hyperband'
)
tuner.search(x_train, y_train, validation_data=(x_val, y_val))  # Hyperband schedules epochs per trial, up to max_epochs

Hyperband is known for its ability to efficiently handle large hyperparameter spaces by dynamically allocating resources based on performance.
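
Because Hyperband already cuts weak trials short, it pairs naturally with an early-stopping callback for the surviving ones. Extra arguments to tuner.search are forwarded to model.fit for each trial, so a callback can be passed straight through, as in this sketch:

import tensorflow as tf

# Arguments to search() are forwarded to model.fit() for every trial
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
tuner.search(x_train, y_train, validation_data=(x_val, y_val),
             callbacks=[stop_early])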

4. Sklearn Tuner

For users working with Scikit-learn models, SklearnTuner brings KerasTuner's search algorithms to Scikit-learn estimators. This tuner is especially useful in hybrid workflows that involve both Scikit-learn and Keras models.

from sklearn import metrics, model_selection

# SklearnTuner takes an oracle (the search algorithm) plus a hypermodel
# that returns a scikit-learn estimator; scoring and cv control evaluation
tuner = kt.SklearnTuner(
    oracle=kt.oracles.BayesianOptimizationOracle(
        objective=kt.Objective('score', 'max'),
        max_trials=10),
    hypermodel=model_builder,
    scoring=metrics.make_scorer(metrics.accuracy_score),
    cv=model_selection.StratifiedKFold(5),
    directory='sklearn_tuner_dir',
    project_name='sklearn_tuner'
)
tuner.search(x_train, y_train)

This tuner provides flexibility in tuning a wide range of models beyond just those built with Keras.
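
The hypermodel passed to SklearnTuner returns a scikit-learn estimator rather than a Keras model. Here is a minimal sketch of the model_builder referenced above; the choice of RandomForestClassifier and its ranges are illustrative:

from sklearn.ensemble import RandomForestClassifier

def model_builder(hp):
    # Each trial samples an estimator count and a tree depth from these ranges
    return RandomForestClassifier(
        n_estimators=hp.Int('n_estimators', 10, 100, step=10),
        max_depth=hp.Int('max_depth', 3, 10))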

5. Gemma Models with LoRA API (2024 Update)

One of the latest advancements in 2024 is the introduction of Gemma models, a family of lightweight large language models (LLMs) that can be fine-tuned with Keras using the new LoRA (Low-Rank Adaptation) API. This API enables parameter-efficient fine-tuning by drastically reducing the number of trainable parameters without compromising performance.

import keras_nlp

# Load a pre-trained Gemma model and attach rank-4 LoRA adapters
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma_lm.backbone.enable_lora(rank=4)
gemma_lm.generate("Keras is a", max_length=32)

This approach is ideal for deploying large models on resource-constrained environments like mobile devices or for applications where rapid fine-tuning is essential.
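
With LoRA enabled, only the low-rank adapter weights remain trainable, so fine-tuning reduces to an ordinary compile-and-fit. A minimal sketch, assuming train_texts is your own collection of training strings; the optimizer settings follow the pattern in the official Keras LoRA tutorial:

import keras

# Only the LoRA adapter weights are updated during this fit
gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(learning_rate=5e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()])
gemma_lm.fit(train_texts, epochs=1, batch_size=1)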

Conclusion

Automated hyperparameter tuning in Keras has evolved significantly, offering tools that range from simple random search methods to complex Bayesian optimization and cutting-edge approaches like the LoRA API for fine-tuning large language models. These tools empower developers to efficiently explore the hyperparameter space and build optimized models with minimal manual intervention.



Next, let's look at manual tuning:






Manual Hyperparameter Tuning in Keras

Part 2: Manual Hyperparameter Tuning Strategies

While automated hyperparameter tuning in Keras offers powerful and efficient ways to optimize models, there are situations where manual tuning can be invaluable. This approach allows for a deep understanding of how different parameters affect the model and can be particularly useful when dealing with smaller datasets, specific constraints, or when you need fine-grained control over the model’s performance.

1. Manual Grid Search

Explanation

Grid search involves creating a ‘grid’ of hyperparameter values and exhaustively testing every possible combination. It is a systematic and thorough approach, which guarantees finding the best combination within the predefined grid. However, it can be computationally expensive, especially when the grid is large.

Example

Let’s say we’re tuning an SVM model. We want to test specific values of C and gamma based on our prior knowledge.

Manual Code Example

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load a dataset
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define manually selected parameters
C_values = [0.1, 1, 10]
gamma_values = [0.01, 0.1]

best_score = 0
best_params = {}

# Manual grid search
for C in C_values:
    for gamma in gamma_values:
        # Train the model with selected hyperparameters
        svm = SVC(C=C, gamma=gamma)
        svm.fit(X_train, y_train)
        y_pred = svm.predict(X_test)
        
        # Evaluate the model
        score = accuracy_score(y_test, y_pred)
        print(f"C: {C}, Gamma: {gamma}, Accuracy: {score}")
        
        # Update the best parameters if current score is higher
        if score > best_score:
            best_score = score
            best_params = {'C': C, 'gamma': gamma}

print(f'Best Score: {best_score}')
print(f'Best Parameters: {best_params}')

Resource: For more on Grid Search, see Analytics Vidhya’s guide.

2. Sequential Tuning

Explanation

Sequential tuning involves tuning one hyperparameter at a time. This approach allows you to see the impact of each hyperparameter independently before moving on to the next.

Example

For a neural network, start by finding the optimal learning rate, then adjust the number of neurons in the hidden layers.

Manual Code Example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train.reshape(-1, 784) / 255.0, X_test.reshape(-1, 784) / 255.0  # flatten and scale pixels to [0, 1]
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# Define a simple neural network model
def build_model(learning_rate, neurons):
    model = Sequential([
        Dense(neurons, input_shape=(784,), activation='relu'),
        Dense(10, activation='softmax')
    ])
    model.compile(optimizer=Adam(learning_rate=learning_rate), 
                  loss='categorical_crossentropy', 
                  metrics=['accuracy'])
    return model

# Step 1: Tune learning rate
learning_rates = [0.01, 0.001, 0.0001]
best_score = 0
best_lr = None

for lr in learning_rates:
    model = build_model(lr, neurons=64)
    model.fit(X_train, y_train, epochs=5, batch_size=128, verbose=0)
    score = model.evaluate(X_test, y_test, verbose=0)[1]
    print(f"Learning Rate: {lr}, Accuracy: {score}")
    
    if score > best_score:
        best_score = score
        best_lr = lr

print(f'Best Learning Rate: {best_lr}')

# Step 2: Tune number of neurons using the best learning rate
neuron_options = [32, 64, 128]
best_score = 0  # reset so the best neuron count is always recorded
best_neurons = None

for neurons in neuron_options:
    model = build_model(best_lr, neurons=neurons)
    model.fit(X_train, y_train, epochs=5, batch_size=128, verbose=0)
    score = model.evaluate(X_test, y_test, verbose=0)[1]
    print(f"Neurons: {neurons}, Accuracy: {score}")
    
    if score > best_score:
        best_score = score
        best_neurons = neurons

print(f'Best Neurons: {best_neurons}')

Resource: Learn more about sequential model-based optimization in Neptune.ai’s hyperparameter tuning guide.

3. Manual Random Sampling

Explanation

Manual random sampling involves manually picking random values within a specified range for the hyperparameters and testing them. The practitioner decides which ranges to explore and manually checks the results.

Example

You might randomly select different values for the number of trees and the maximum depth in a Random Forest model.

Manual Code Example

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Manually picked random values for hyperparameters
n_estimators_values = [np.random.randint(50, 200) for _ in range(3)]
max_depth_values = [np.random.randint(5, 20) for _ in range(3)]
best_score = 0
best_params = {}

# Manual random sampling
for n_estimators in n_estimators_values:
    for max_depth in max_depth_values:
        # Train the model
        clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        
        # Evaluate the model
        score = accuracy_score(y_test, y_pred)
        print(f"n_estimators: {n_estimators}, max_depth: {max_depth}, Accuracy: {score}")
        
        # Update best parameters
        if score > best_score:
            best_score = score
            best_params = {'n_estimators': n_estimators, 'max_depth': max_depth}

print(f'Best Score: {best_score}')
print(f'Best Parameters: {best_params}')

Resource: Explore the fundamentals of random search in this paper by James Bergstra and Yoshua Bengio.

4. Learning Curves and Validation

Explanation

This approach involves plotting the model’s performance over time and manually interpreting the results to make decisions about hyperparameter adjustments. You might stop training early if the validation loss stops improving, or adjust other hyperparameters like the learning rate.

Example

You could manually plot learning curves after each epoch to monitor how the model is performing on training and validation datasets.

Manual Code Example

import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train, X_test = X_train.reshape(-1, 784) / 255.0, X_test.reshape(-1, 784) / 255.0  # flatten and scale pixels to [0, 1]
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# Define a simple neural network model
model = Sequential([
    Dense(64, input_shape=(784,), activation='relu'),
    Dense(10, activation='softmax')
])

model.compile(optimizer=Adam(learning_rate=0.001), 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])

# Train the model; fit() returns a History object with per-epoch metrics
history = model.fit(X_train, y_train, epochs=20, batch_size=128,
                    validation_data=(X_test, y_test), verbose=0)

# Plot the learning curves
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Learning Curves')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

Resource: For more details on interpreting learning curves, check out this article on Machine Learning Mastery.

Conclusion

Manual hyperparameter tuning, though time-consuming, allows for greater flexibility and insight into how each parameter impacts the model’s performance. By systematically applying strategies like grid search, sequential tuning, random sampling, and careful monitoring of learning curves, you can achieve optimal performance for your models. To deepen your understanding, consider exploring the resources linked above.



Finally, let's ground these strategies in theory:

Deep Learning Theories and Strategies for Hyperparameter Tuning

Part 3: Deep Learning Theories and Strategies for Hyperparameter Tuning

In the previous sections, we focused on both automated and manual hyperparameter tuning methods. Automated approaches help streamline the tuning process using tools like KerasTuner, while manual tuning gives you control over each hyperparameter by tweaking them iteratively. However, these methods must be grounded in a solid understanding of how neural networks operate, especially when it comes to complex deep learning models.

In this part, we explore the theoretical foundations that support these tuning strategies. This includes understanding how the structure and depth of neural networks influence their ability to model complex functions and how you can leverage this understanding in practical tuning.

The Role of Network Depth and Parameter Efficiency

Deep neural networks (DNNs) are powerful because of their ability to model complex functions with high parameter efficiency. This means that deep networks can achieve better performance with fewer neurons compared to shallow networks, thanks to their hierarchical structure.

Key Concepts:

1. Parameter Efficiency

Explanation: Deep networks can model complex functions using exponentially fewer neurons than shallow networks. This efficiency comes from their ability to capture intricate patterns and hierarchical data structures.

Application: In practice, this means that even with fewer neurons, deep networks can outperform shallow networks on complex tasks, like image or speech recognition.

2. Hierarchical Learning

Explanation: Just as building a forest by copying and pasting branches and trees is more efficient than drawing each tree individually, deep networks reuse learned features across layers.

  • Lower Layers: Capture basic patterns (like edges or textures).
  • Intermediate Layers: Combine these basic patterns to form more complex shapes.
  • Higher Layers: Integrate these shapes into full objects or scenes.

Application: This understanding helps in designing network architectures that are capable of capturing the complexity of real-world data.

Transfer Learning: Reusing Knowledge

When facing a new but related task, deep networks can benefit from transfer learning—a strategy where the lower layers of a pre-trained model are reused to speed up training on the new task.

Explanation: Instead of training a new model from scratch, you start with a pre-trained model and only fine-tune the higher layers. For instance, if a model has been trained to recognize faces, you can reuse its lower layers when training it to recognize hairstyles, saving both time and computational resources.

Practical Example: You might train a model on a large dataset like ImageNet and then fine-tune it on a specific task like medical image analysis. The lower layers, which have already learned to detect edges, textures, and shapes, can be reused, allowing the model to adapt quickly to the new task.
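
In Keras, this reuse takes only a few lines: load a pre-trained base, freeze it, and train a new head on top. A minimal sketch; the choice of MobileNetV2, the input shape, and the 5-class head are illustrative assumptions:

from tensorflow import keras
from tensorflow.keras import layers

# Load a convolutional base pre-trained on ImageNet, without its classifier head
base = keras.applications.MobileNetV2(weights='imagenet', include_top=False,
                                      input_shape=(224, 224, 3), pooling='avg')
base.trainable = False  # freeze the reusable lower layers

# Stack a new task-specific head on the frozen base
model = keras.Sequential([
    base,
    layers.Dense(5, activation='softmax')  # e.g. 5 target classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])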

Structuring Your Network: Layer Depth and Neuron Count

Designing a neural network involves deciding on the number of layers and the number of neurons per layer. Here’s how to approach this:

1. Start Simple

Strategy: Begin with a single hidden layer with a moderate number of neurons and gradually increase complexity as needed.

2. Increase Complexity Gradually

Strategy: If the task complexity increases, add more layers or neurons, but be cautious of overfitting. For complex tasks, deeper networks might be required, but this also increases the risk of overfitting.

3. Avoid Bottlenecks

Strategy: Ensure that no layer has too few neurons, which could create a bottleneck, limiting the model’s ability to learn effectively. A common strategy is to make the first hidden layer the largest, gradually reducing the size of subsequent layers.

Vincent Vanhoucke’s Analogy: Think of the first hidden layer as stretch pants that can adjust to different sizes, accommodating various learning requirements. This flexibility helps avoid bottleneck layers that could hinder the network’s performance.
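
Translated into code, the "large first layer, tapering afterwards" guideline looks like the following sketch; the exact layer sizes are illustrative:

from tensorflow import keras
from tensorflow.keras import layers

# Wide first hidden layer, tapering sizes afterwards to avoid early bottlenecks
model = keras.Sequential([
    layers.Dense(300, activation='relu', input_shape=(784,)),
    layers.Dense(100, activation='relu'),
    layers.Dense(30, activation='relu'),
    layers.Dense(10, activation='softmax')
])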

Tuning Hyperparameters: Learning Rate, Batch Size, and More

After determining the structure of your network, optimizing other hyperparameters like learning rate and batch size is crucial for enhancing model performance:

1. Learning Rate

Explanation: The learning rate controls how much to change the model in response to the estimated error each time the model weights are updated. An optimal learning rate is typically around half of the maximum rate at which the model begins to diverge.

Strategy: Start with a very low learning rate and gradually increase it, observing the loss to identify the optimal point.
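
One common way to run this sweep is a callback that grows the learning rate by a constant factor after every batch while recording the loss; you then plot loss against rate, spot where the loss starts climbing, and pick roughly half that rate. A sketch, assuming a TensorFlow-backed optimizer whose learning_rate is a variable; the growth factor is an illustrative choice:

from tensorflow import keras

class ExponentialLR(keras.callbacks.Callback):
    """Multiply the learning rate by `factor` after each batch and record the loss."""
    def __init__(self, factor=1.005):
        super().__init__()
        self.factor = factor
        self.rates, self.losses = [], []

    def on_train_batch_end(self, batch, logs=None):
        lr = float(self.model.optimizer.learning_rate.numpy())
        self.rates.append(lr)
        self.losses.append(logs['loss'])
        self.model.optimizer.learning_rate.assign(lr * self.factor)

# Usage sketch: start from a very low rate, train one epoch, then inspect
# cb.rates vs cb.losses to find the divergence point.
# model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5), ...)
# cb = ExponentialLR()
# model.fit(X_train, y_train, epochs=1, callbacks=[cb])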

2. Batch Size

Explanation: The batch size is the number of training examples utilized in one iteration. Larger batch sizes can accelerate training by leveraging hardware accelerators more efficiently, but smaller batch sizes often lead to better generalization.

Conclusion

In this part, we bridged the gap between theory and practice by discussing how the architecture of neural networks impacts their ability to learn complex functions. Understanding these principles not only helps in setting up more effective automated and manual tuning strategies but also in building a deep learning model architecture that inherently suits the complexity of your task.

With these foundational insights, you’re better equipped to fine-tune your models, knowing exactly how different hyperparameters and architectural choices interact to determine your model’s performance.