Machine Learning Overview

Generative Adversarial Networks (GANs) Deep Learning – day 75

Exploring the Evolution of GANs: From DCGANs to StyleGANs

Generative Adversarial Networks (GANs) have revolutionized the field of image generation by allowing us to create realistic images from random noise. Over the years, the basic architecture of GANs has undergone significant enhancements, resulting in more stable and higher-quality image generation. In this post, we will dive deep into three key stages of GAN development: Deep Convolutional GANs (DCGANs), Progressive Growing of GANs, and StyleGANs.

Deep Convolutional GANs (DCGANs)

The introduction of Deep Convolutional GANs (DCGANs) in 2015 by Alec Radford and colleagues marked a major breakthrough in stabilizing GAN training and improving image generation. DCGANs leveraged deep convolutional layers to enhance image quality, particularly for larger images.

Key Guidelines for DCGANs

  • Strided Convolutions: Replace pooling layers with strided convolutions in the discriminator and transposed convolutions in the generator.
  • Batch Normalization: Use batch normalization in all layers except the generator’s output layer and the discriminator’s input layer.
  • No Fully Connected Layers: Remove fully connected layers to enhance training stability and performance.
  • Activation Functions: Use ReLU in the generator (except for the output layer, which uses tanh) and Leaky ReLU in the discriminator.

DCGAN Architecture Example

In the table below, we break down a simple DCGAN architecture that works with the Fashion MNIST dataset.

  • Dense Layer (output: 7 × 7 × 128 units): Projects the input noise vector to a flat feature vector.
  • Reshape (output: 7 × 7 × 128): Reshapes the tensor into a 7×7 feature map with 128 channels.
  • Batch Normalization (output: 7 × 7 × 128): Normalizes the layer’s activations.
  • Conv2DTranspose, stride = 2 (output: 14 × 14 × 64): Upsamples the feature map to 14×14, reducing the depth to 64.
  • Conv2DTranspose, stride = 2 (output: 28 × 28 × 1): Final output layer using tanh, producing a 28×28 image.

Figure 1: DCGAN Generator Architecture

[Random Noise] → [Dense Layer] → [Reshape] → [Conv2DTranspose] → [Conv2DTranspose] → [Generated Image]

Figure 2: DCGAN Discriminator Architecture

[Input Image] → [Conv2D (Leaky ReLU)] → [Dropout] → [Conv2D (Leaky ReLU)] → [Dense Layer] → [Sigmoid]

Although effective for small images, DCGANs struggle as image complexity grows, often producing artifacts or inconsistencies in larger images.

Progressive Growing of GANs

In 2018, Nvidia researchers Tero Karras et al. introduced Progressive Growing of GANs, which enables more stable training of GANs that generate high-resolution images. The idea is to start with low-resolution images (e.g., 4×4) and gradually add layers to increase the resolution as training progresses (e.g., 8×8, 16×16, up to 1024×1024).

How Progressive Growing Works

  • Layer-wise Growth: New convolutional layers are added progressively to the generator and the discriminator during training, as shown in Figure 3; a minimal sketch of the fade-in blending used during these transitions follows the figure.
  • Mini-Batch Standard Deviation Layer: This layer helps prevent mode collapse by encouraging the generator to produce diverse outputs. It computes the standard deviation across feature maps and appends the result as an extra feature map to each instance.

Figure 3: Progressive Growing GAN Architecture

[4×4 Image] → [Conv Layers] → [Upsampling to 8×8] → [Upsampling to 16×16] → … → [1024×1024 Image]
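To make the growth step concrete, here is a minimal sketch of the fade-in blending commonly used when a new resolution is introduced. The function name and the linear alpha schedule are illustrative assumptions, not the paper’s reference code:

import tensorflow as tf

def fade_in(alpha, upsampled_old, new_output):
    # Blend the old low-resolution pathway (upsampled with, e.g.,
    # tf.keras.layers.UpSampling2D) with the output of the newly added
    # high-resolution layers. alpha is ramped from 0 to 1 over the
    # transition phase, so the new layers are introduced gradually
    # instead of all at once.
    return alpha * new_output + (1.0 - alpha) * upsampled_old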

StyleGANs: Taking Image Generation to the Next Level

The StyleGAN architecture, introduced by Nvidia in 2018, took GAN technology to a new level by incorporating style transfer techniques. StyleGAN enhances the generator to produce high-resolution images with unprecedented quality, particularly when generating faces.

StyleGAN Architecture

  • Mapping Network: Transforms the latent vector (random noise) into a style vector that controls different aspects of the image.
  • Synthesis Network: Uses the style vector to generate images, injecting noise at each level to add stochastic variation.
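As a rough illustration, the mapping network is essentially a small multilayer perceptron. The sketch below assumes the 512-dimensional latent and eight fully connected layers described in the paper; it is not Nvidia’s reference implementation:

import tensorflow as tf

# Maps the latent vector z (512-d, as in the paper) to the style vector w.
mapping_network = tf.keras.Sequential(
    [tf.keras.layers.Dense(512, activation=tf.keras.layers.LeakyReLU(0.2))
     for _ in range(8)],
    name="mapping_network",
)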

Key Innovations in StyleGAN

  • Noise Injection: Noise is added at different levels of the synthesis network to introduce randomness, helping generate fine details such as hair and wrinkles.
  • Style Mixing: Two latent vectors are used to control different parts of the image, encouraging more localized variation and discouraging correlations between attributes at different levels (e.g., eyes and mouth).
  • Pixelwise Normalization: After each convolutional layer, activations are normalized to balance features across the image, preventing any one feature from dominating.

Table: StyleGAN Features and Their Benefits

  • Noise Injection: Adds noise to individual levels of the network to introduce variability in generated images (sketched below). Benefit: generates fine details and avoids repetitive artifacts.
  • Style Mixing: Controls different parts of the image using multiple latent vectors. Benefit: ensures more meaningful, localized variations in generated images.
  • Pixelwise Normalization: Normalizes activations at each convolutional layer. Benefit: balances features so that no single feature dominates.
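Here is a minimal sketch of noise injection as a custom Keras layer, a simplified reading of StyleGAN’s per-layer noise inputs; the class name and the zero-initialized per-channel scaling are assumptions:

import tensorflow as tf

class NoiseInjection(tf.keras.layers.Layer):
    def build(self, input_shape):
        # One learned scaling weight per channel, starting at zero so the
        # network learns how much noise each feature map should receive.
        self.scale = self.add_weight(name="scale", shape=[input_shape[-1]],
                                     initializer="zeros")

    def call(self, inputs):
        # Single-channel Gaussian noise, broadcast across all channels.
        noise = tf.random.normal(tf.shape(inputs)[:-1])[..., tf.newaxis]
        return inputs + self.scale * noise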

Figure 4: StyleGAN’s Latent Space Arithmetic

[Man with Glasses] – [Man without Glasses] + [Woman without Glasses] = [Woman with Glasses]

Conclusion

From the introduction of DCGANs to the development of Progressive Growing GANs and StyleGANs, GAN technology has rapidly evolved to produce highly realistic images.

  • DCGANs provided stability in training using deep convolutional architectures.
  • Progressive Growing of GANs allowed for scalable generation of high-resolution images by growing layers during training.
  • StyleGAN refined the process further, incorporating style transfer and noise injection for superior control and image quality.

Understanding the Code Behind GANs

In this section, we will explain the code used in Deep Convolutional GANs (DCGANs), Progressive Growing GANs, and StyleGANs, and then survey other influential GAN techniques. The note at the end compares those techniques to the older architectures to give you a complete picture of how GANs have evolved.

DCGAN Example (Fashion MNIST)

Generator Architecture

import tensorflow as tf

codings_size = 100  # dimension of the random noise vector (a common choice)

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(7 * 7 * 128, input_shape=[codings_size]),  # project noise to 7*7*128 units
    tf.keras.layers.Reshape([7, 7, 128]),          # reshape into a 7x7 feature map with 128 channels
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2DTranspose(64, kernel_size=5, strides=2, padding="same",
                                    activation="relu"),   # upsample to 14x14x64
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding="same",
                                    activation="tanh"),   # upsample to 28x28x1, values in [-1, 1]
])

The generator starts by projecting random noise into a 7×7 feature map with 128 filters, then upsamples it through two transposed convolutional layers, ultimately generating a 28×28 grayscale image. The tanh activation function ensures that the output values range between -1 and 1, which matches the scaled image data.
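With the generator defined, sampling images is a single forward pass, assuming the codings_size noise dimension defined above:

# Draw a batch of random noise vectors and map them to fake images.
codings = tf.random.normal(shape=[32, codings_size])
fake_images = generator(codings)   # shape (32, 28, 28, 1), values in [-1, 1]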

Discriminator Architecture

discriminator = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, kernel_size=5, strides=2, padding="same",
                           activation=tf.keras.layers.LeakyReLU(0.2),
                           input_shape=[28, 28, 1]),    # downsample to 14x14x64
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Conv2D(128, kernel_size=5, strides=2, padding="same",
                           activation=tf.keras.layers.LeakyReLU(0.2)),  # downsample to 7x7x128
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid")      # probability that the image is real
])

The discriminator applies strided convolutions to downsample the input image and distinguish real from fake. Dropout layers help reduce overfitting, and the LeakyReLU activation allows a small gradient to pass even for negative inputs, improving stability.

Combining Generator and Discriminator into a GAN

gan = tf.keras.Sequential([generator, discriminator])

This simply connects the generator and discriminator, allowing the generator to produce an image and the discriminator to evaluate it.
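A minimal sketch of the usual two-phase training setup follows. This is a common pattern for this kind of model, not code from the original post: the discriminator is trained on real versus generated images, then frozen inside the combined model so that only the generator’s weights are updated when the GAN is trained.

# Phase 1: the discriminator learns to separate real from fake images.
discriminator.compile(loss="binary_crossentropy", optimizer="rmsprop")

# Phase 2: freeze the discriminator inside the combined model, so that
# training the GAN updates only the generator's weights.
discriminator.trainable = False
gan.compile(loss="binary_crossentropy", optimizer="rmsprop")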

Progressive Growing of GANs: Mini-Batch Standard Deviation Layer

# `inputs` is a feature map inside the discriminator, shape [batch, height, width, channels]
S = tf.math.reduce_std(inputs, axis=[0, -1])   # std at each spatial position, across batch and channels
v = tf.reduce_mean(S)                          # average into a single scalar statistic
shape = tf.shape(inputs)
outputs = tf.concat([inputs, tf.fill([shape[0], shape[1], shape[2], 1], v)], axis=-1)

In Progressive Growing GANs, new layers are added progressively as the resolution increases. The mini-batch standard deviation layer measures the variability across the mini-batch and appends it to the discriminator’s feature maps. This helps reduce mode collapse: if the generator produces low-diversity outputs, the appended statistic is small, giving the discriminator an easy cue that the batch is fake and pushing the generator toward more diverse outputs.
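The same snippet can be packaged as a reusable Keras layer. This is an illustrative sketch, not Nvidia’s reference implementation, and the class name is assumed:

class MinibatchStdDev(tf.keras.layers.Layer):
    def call(self, inputs):
        s = tf.math.reduce_std(inputs, axis=[0, -1])   # std across batch and channels
        v = tf.reduce_mean(s)                          # collapse to a single scalar
        shape = tf.shape(inputs)                       # [batch, height, width, channels]
        extra = tf.fill([shape[0], shape[1], shape[2], 1], v)
        return tf.concat([inputs, extra], axis=-1)     # append as one extra feature map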

StyleGAN: Pixelwise Normalization Layer

outputs = inputs / tf.sqrt(tf.reduce_mean(tf.square(inputs), axis=-1, keepdims=True) + 1e-8)

StyleGAN introduces several innovations, such as Pixelwise Normalization, which normalizes each pixel’s activations based on all channels. This prevents any single feature from dominating the generation process and enhances control over style variations.
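Wrapped as a layer for reuse in a model (an illustrative sketch; StyleGAN applies this normalization after each convolution in the synthesis network):

pixelwise_norm = tf.keras.layers.Lambda(
    # Divide each pixel's activation vector by its RMS across channels;
    # the 1e-8 term guards against division by zero.
    lambda x: x / tf.sqrt(tf.reduce_mean(tf.square(x), axis=-1, keepdims=True) + 1e-8)
)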

Other Influential GAN Techniques

Beyond the three architectures above, several other techniques have significantly improved GAN training stability, control, and effectiveness across applications:

  • Wasserstein GANs (WGANs): Introduced in 2017, WGANs improve training stability by replacing the standard GAN loss with the Wasserstein distance, giving more reliable convergence and avoiding failure modes seen in earlier architectures like DCGANs (a minimal loss sketch follows this list).
  • Conditional GANs (CGANs): Introduced in 2014, CGANs add control by conditioning the generated outputs on specific labels or attributes, providing greater precision in generating images and other outputs.
  • Self-Supervised Learning for GANs: Self-supervised objectives enable GANs to learn from unlabeled data, significantly improving their generalization across datasets.
  • Attention Mechanisms: Integrating attention (as in Self-Attention GANs) improves how GANs focus on relevant features, enhancing performance in tasks such as text generation and high-quality image synthesis.
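To ground the WGAN bullet, here is a sketch of the critic and generator objectives, simplified and without the gradient penalty that is typically added in practice:

def critic_loss(real_scores, fake_scores):
    # The critic maximizes the gap between its scores on real and fake
    # samples, i.e. minimizes the negative Wasserstein estimate.
    return tf.reduce_mean(fake_scores) - tf.reduce_mean(real_scores)

def generator_loss(fake_scores):
    # The generator tries to raise the critic's scores on its samples.
    return -tf.reduce_mean(fake_scores)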

 

Figure 5: Generative Adversarial Network Illustration (Wikimedia Commons)

This image visually explains the relationship between the generator and discriminator in a GAN, where the generator tries to create realistic data while the discriminator evaluates its authenticity.
