# Comparing Deep Learning on TensorFlow (Keras), PyTorch, and Apple's MLX

Deep learning frameworks such as TensorFlow (Keras), PyTorch, and Apple's MLX offer powerful tools for building and training machine learning models. Although they solve similar problems, these frameworks follow different philosophies, APIs, and under-the-hood optimizations. In this post, we will look at how the same model is implemented on each platform and why the differences in code arise, focusing in particular on why MLX resembles PyTorch more closely than TensorFlow.

## 1. Model in PyTorch

PyTorch is known for giving developers granular control over model building and training. The framework encourages writing custom training loops, which makes it highly flexible, especially for research.

PyTorch code:

```python
import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(32, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)

# Initialize the model, optimizer, and loss function
model = SimpleModel()
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Training loop (train_loader is assumed to yield (inputs, labels) batches)
for epoch in range(10):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```

### What's Happening Behind the Scenes in PyTorch?

PyTorch gives the developer direct control over every step of the training process. The training loop is written manually:

- **Forward pass:** defined in the `forward()` method, which explicitly computes the output layer by layer.
- **Backward pass:** after the loss is calculated, gradients are computed with `loss.backward()`.
- **Gradient updates:** the optimizer updates the weights after each batch via `optimizer.step()`.

This manual training loop lets researchers and developers experiment with unconventional architectures or optimization methods. The gradient clipping call `torch.nn.utils.clip_grad_norm_` prevents exploding gradients during backpropagation.

## 2. Model in TensorFlow (Keras)

TensorFlow with Keras abstracts many of the low-level operations you see in PyTorch. Its primary goal is to make model building and training easy and fast, reducing the amount of boilerplate code.

TensorFlow (Keras) code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

# Define a simple model
model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(32,)),
    layers.Dense(10)
])

# Compile the model with gradient clipping
# from_logits=True because the final Dense layer outputs raw logits
optimizer = optimizers.SGD(learning_rate=0.01, clipnorm=1.0)
model.compile(optimizer=optimizer,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32)
```

### What's Happening Behind the Scenes in TensorFlow (Keras)?

TensorFlow abstracts much of the model-building process, particularly:

- **Automatic training loop:** `model.fit()` handles the forward pass, backpropagation, and gradient updates, which keeps the code concise.
- **Gradient clipping:** can be added directly to the optimizer via `clipnorm`, without the explicit handling of gradients required in PyTorch.

TensorFlow's design philosophy is to reduce complexity for users by providing higher-level APIs. However, this abstraction means that developers may have less control over the finer details of the training process. TensorFlow is often preferred for production environments or when rapid prototyping is required.
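To make the abstraction concrete, here is a simplified sketch of roughly what `model.fit()` does for each batch, written as a manual loop with `tf.GradientTape`. This is an illustration rather than Keras's actual internals; `train_dataset` is an assumed `tf.data.Dataset` yielding `(inputs, labels)` batches, and `model` is the `Sequential` model defined above.

```python
import tensorflow as tf

# A simplified picture of what model.fit() does per batch
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

for epoch in range(10):
    for inputs, labels in train_dataset:  # assumed tf.data.Dataset
        with tf.GradientTape() as tape:
            logits = model(inputs, training=True)        # forward pass
            loss = loss_fn(labels, logits)               # compute the loss
        grads = tape.gradient(loss, model.trainable_variables)  # backward pass
        # Manual per-variable clipping, playing the role of clipnorm above
        grads = [tf.clip_by_norm(g, 1.0) for g in grads]
        optimizer.apply_gradients(zip(grads, model.trainable_variables))  # weight update
```

If you want this level of control without giving up `fit()`, Keras also lets you subclass `Model` and override `train_step()`.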
## 3. Model in MLX (Apple Silicon)

MLX is Apple's machine learning framework designed specifically for Apple Silicon (M1/M2) hardware. While it shares many similarities with PyTorch, it is optimized to take full advantage of the unified memory architecture and Metal-accelerated computation of Apple devices.

MLX code:

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(32, 64)
        self.fc2 = nn.Linear(64, 10)

    def __call__(self, x):
        x = mx.maximum(self.fc1(x), 0)  # ReLU activation
        return self.fc2(x)

# Initialize the model
model = SimpleModel()

# Define the optimizer and loss function
optimizer = optim.SGD(learning_rate=0.01)

def loss_fn(model, inputs, labels):
    return mx.mean(nn.losses.cross_entropy(model(inputs), labels))

# value_and_grad returns a function that computes both the loss and the
# gradients with respect to the model's trainable parameters
loss_and_grad_fn = nn.value_and_grad(model, loss_fn)

# Training loop (train_loader is assumed to yield (inputs, labels) batches)
for epoch in range(10):
    for inputs, labels in train_loader:
        loss, grads = loss_and_grad_fn(model, inputs, labels)
        optimizer.update(model, grads)
        mx.eval(model.parameters(), optimizer.state)  # force the lazy computation to run
```

### What's Happening Behind the Scenes in MLX?

MLX is conceptually similar to PyTorch, with a manual training loop and explicit model building, but it is optimized for Apple Silicon.
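One MLX-specific detail the loop above hints at (the `mx.eval` call) is lazy evaluation: operations only record a compute graph, and nothing runs until a result is evaluated. Combined with Apple Silicon's unified memory, the same arrays can be used from the CPU and the GPU without explicit transfers. A minimal sketch of both ideas, under the assumption of a recent `mlx` install:

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))

c = a @ b      # nothing is computed yet; MLX only records the operation
mx.eval(c)     # forces the computation to run

# Unified memory: the same arrays can be used on either device,
# with no .to(device) copies as in PyTorch
d = mx.add(a, b, stream=mx.cpu)
e = mx.add(a, b, stream=mx.gpu)
mx.eval(d, e)
```

This is also why the training loop calls `mx.eval()` after each optimizer update: it forces the parameter updates to actually be computed rather than deferred.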