Stanford XCS236 · Deep Generative Models
Generative Adversarial Networks
Week 3–4 · Adversarial Training & Wasserstein Distance
01

Adversarial Framework

Generative Adversarial Networks pit two neural networks in a min-max game: a generator G creates synthetic data from noise, while a discriminator D attempts to distinguish real from fake. This adversarial process drives both networks toward an equilibrium where the generator produces indistinguishable samples.

The framework casts unsupervised learning as a competition. As D improves at detection, G is forced to generate more realistic samples. At the Nash equilibrium neither network can improve by changing its strategy alone: G reproduces the data distribution, and D can do no better than output 1/2 for every sample.

02

GAN Objective & Loss

The GAN objective function V(G, D) sums two expected log-probability terms: the log-probability D assigns to real samples being real, and the log-probability D assigns to generated samples being fake. D maximizes both terms, while G minimizes the second, that is, the probability that its samples are detected. Because one player's gain is the other's loss, the game is zero-sum.
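
For reference, the objective described above is usually written in the following standard form. The symbols p_data (data distribution) and p_z (noise prior) are assumed notation, since the slide does not define them.

```latex
\min_G \max_D V(G, D)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```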

For a fixed generator, the optimal discriminator D* has a closed form: D*(x) = p_data(x) / (p_data(x) + p_g(x)), the posterior probability that x came from the data rather than from G. This theoretical insight guides practical training, even though D* cannot be computed exactly because neither density is known. The minimax formulation defines a theoretical global optimum where the real and generated distributions coincide; however, practical optimization with gradient-based methods does not guarantee convergence to this point and often suffers from mode collapse and training instability.
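
The closed form can be checked numerically on a toy problem. The sketch below assumes a finite support with three outcomes and made-up probabilities (p_data and p_gen are illustrative, not from the course); it verifies the standard identity V(G, D*) = 2·JSD(p_data, p_gen) − log 4, which is why the theoretical optimum is reached when the two distributions match.

```python
import math

def optimal_discriminator(p_data, p_gen):
    """D*(x) = p_data(x) / (p_data(x) + p_gen(x)) for each outcome x."""
    return [p / (p + q) if p + q > 0 else 0.5 for p, q in zip(p_data, p_gen)]

def value(p_data, p_gen, d):
    """V(G, D) = E_data[log D(x)] + E_gen[log(1 - D(x))] on a finite support."""
    real_term = sum(p * math.log(dx) for p, dx in zip(p_data, d) if p > 0)
    fake_term = sum(q * math.log(1 - dx) for q, dx in zip(p_gen, d) if q > 0)
    return real_term + fake_term

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Illustrative (hypothetical) distributions over three outcomes.
p_data = [0.5, 0.3, 0.2]
p_gen = [0.2, 0.3, 0.5]

d_star = optimal_discriminator(p_data, p_gen)
v_star = value(p_data, p_gen, d_star)
print(d_star, v_star, 2 * js_divergence(p_data, p_gen) - math.log(4))
```

When p_gen equals p_data, D* is 1/2 everywhere and the value drops to −log 4, its minimum over generators.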

03

Training Dynamics

Training GANs requires alternating gradient updates: D is updated to better separate real from generated samples, while G is updated to fool D. Instability arises from this coupled optimization: when one network overpowers the other, training can diverge or collapse to trivial solutions. For example, a discriminator that becomes too accurate leaves G with vanishing gradients.

Mode collapse occurs when G learns to generate only a subset of the data distribution's modes, evading D's discrimination while avoiding the complexity of full coverage. Non-convergence is endemic: theoretical guarantees are weak, and in practice the two networks often oscillate, each reacting to the other's latest update, rather than settling into a stable equilibrium. Careful hyperparameter tuning, architectural choices, and training protocols are essential.
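
The oscillation described above already shows up in the simplest two-player game. The sketch below is a toy stand-in, not a GAN: it assumes the bilinear objective V(x, y) = x·y, with x playing the minimizing role of G and y the maximizing role of D. The function name and step size are illustrative.

```python
import math

def descent_ascent(x, y, lr=0.1, steps=200, alternating=False):
    """Gradient play on V(x, y) = x * y; returns the final distance from (0, 0)."""
    for _ in range(steps):
        if alternating:
            x = x - lr * y          # x descends: dV/dx = y
            y = y + lr * x          # y ascends using the freshly updated x
        else:
            # Simultaneous update: both players use the old values.
            x, y = x - lr * y, y + lr * x
    return math.hypot(x, y)

start = math.hypot(1.0, 1.0)
simultaneous = descent_ascent(1.0, 1.0, alternating=False)
alternating = descent_ascent(1.0, 1.0, alternating=True)
print(start, simultaneous, alternating)
```

The equilibrium is (0, 0). Simultaneous updates spiral away from it, while alternating updates keep circling it at a roughly constant distance; neither converges.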

04

Wasserstein GAN

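Wasserstein GANs address training instability by replacing the divergence being minimized. Instead of the Jensen-Shannon divergence implicit in the original objective, WGAN minimizes an estimate of the Earth Mover's (Wasserstein-1) distance, which varies smoothly as the generated distribution moves toward the data. The discriminator becomes a critic that outputs an unbounded score rather than a probability. The dual form of the distance requires the critic to be 1-Lipschitz; weight clipping is a crude way to enforce a Lipschitz bound, which keeps the critic's loss a valid estimate of the distance up to a constant factor.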
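
A small 1-D example makes the contrast between the two distances concrete. The sketch assumes all real samples sit at 0 and all generated samples at a hypothetical offset theta, so the two supports are disjoint; the helper names are illustrative.

```python
import math
from collections import Counter

def js_divergence(samples_p, samples_q):
    """Jensen-Shannon divergence between two empirical discrete distributions."""
    p, q = Counter(samples_p), Counter(samples_q)
    n_p, n_q = len(samples_p), len(samples_q)
    total = 0.0
    for x in set(p) | set(q):
        px, qx = p[x] / n_p, q[x] / n_q
        mx = (px + qx) / 2
        if px > 0:
            total += 0.5 * px * math.log(px / mx)
        if qx > 0:
            total += 0.5 * qx * math.log(qx / mx)
    return total

def wasserstein_1d(samples_p, samples_q):
    """W1 for equal-size 1-D samples: mean gap between sorted values."""
    if len(samples_p) != len(samples_q):
        raise ValueError("this sketch assumes equal sample counts")
    pairs = zip(sorted(samples_p), sorted(samples_q))
    return sum(abs(a - b) for a, b in pairs) / len(samples_p)

real = [0.0] * 4
results = {}
for theta in (2.0, 1.0, 0.5):
    fake = [theta] * 4
    results[theta] = (js_divergence(real, fake), wasserstein_1d(real, fake))
    print(theta, results[theta])
```

As theta shrinks, the JS divergence stays pinned at log 2 and gives no hint that the generator is getting closer, whereas the Wasserstein distance equals |theta| and decreases steadily.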

Gradient penalty and spectral normalization enforce the Lipschitz constraint more gracefully than clipping, by controlling the critic's gradients or the spectral norm of its weight matrices. The Wasserstein loss provides a meaningful training signal even when the real and generated distributions have disjoint supports, which helps against mode collapse and tends to make convergence more reliable. These techniques are widely used in modern GAN training.
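
For reference, the gradient-penalty variant is commonly written as the critic (discriminator) loss below. The notation is assumed rather than taken from the slide: x̃ is a generated sample, x̂ is sampled uniformly along straight lines between real and generated pairs, and λ is the penalty weight.

```latex
L_{\text{critic}}
  = \mathbb{E}_{\tilde{x} \sim p_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x \sim p_{\text{data}}}\big[D(x)\big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}
    \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
```

The last term pushes the gradient norm toward 1 at the interpolated points, a soft version of the 1-Lipschitz constraint.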

05

Conditional GANs

Conditional GANs inject class labels into both generator and discriminator, enabling controlled generation. The class information guides G to generate samples from a specific class, while D learns to discriminate on both authenticity and class consistency. This extends GANs from unsupervised generation to supervised, label-conditioned generation.
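
One common way to inject the label, assumed here for illustration, is to append a one-hot encoding to the input of each network. The sketch only builds the input vectors; the networks themselves are omitted, and all function names are hypothetical.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(label, num_classes, noise_dim, rng):
    """Noise vector z concatenated with the one-hot label: the input to G."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(label, num_classes)

def discriminator_input(sample, label, num_classes):
    """Flattened sample concatenated with the label it claims to belong to."""
    return list(sample) + one_hot(label, num_classes)

rng = random.Random(0)
g_in = generator_input(label=2, num_classes=10, noise_dim=16, rng=rng)
d_in = discriminator_input(sample=[0.1, 0.9, 0.4], label=2, num_classes=10)
print(len(g_in), len(d_in))
```

Because D sees the label alongside the sample, a realistic image paired with the wrong class can still be rejected.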

Pix2pix applies cGANs to image-to-image translation: the generator is conditioned on an input image and trained on paired input-output examples to learn fine-grained transformations like sketch-to-photograph or day-to-night. The discriminator becomes a "patch discriminator" (PatchGAN), judging the realism of local patches rather than of the whole image, which improves fine detail synthesis.

06

Progressive & Style GANs

Progressive GANs grow the network architecture layer-by-layer during training, starting with low resolution and gradually adding detail. This accelerates convergence and improves stability by allowing the generator to first learn coarse structure before refining details. StyleGAN extends this with a mapping network that translates latent codes into style parameters.

StyleGAN uses adaptive instance normalization (AdaIN) to inject style information at multiple scales, decoupling content from style. Latent interpolation in the intermediate style space produces smooth, high-quality transitions. These architectural innovations enable stable training of high-resolution image generation, exemplified by face synthesis at 1024×1024.
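
The AdaIN operation itself is short: each feature channel is normalized to zero mean and unit variance, then rescaled and shifted by style parameters. The sketch below works on a single channel stored as a flat list; the parameter names style_scale and style_bias and the eps value are assumptions for illustration.

```python
import math

def adain(feature_map, style_scale, style_bias, eps=1e-5):
    """Normalize one channel, then apply a style-specific scale and bias."""
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((v - mean) ** 2 for v in feature_map) / n
    std = math.sqrt(var + eps)  # eps guards against division by zero
    return [style_scale * (v - mean) / std + style_bias for v in feature_map]

channel = [1.0, 2.0, 3.0, 4.0]
styled = adain(channel, style_scale=2.0, style_bias=5.0)
print(styled)
```

After the operation the channel's mean and standard deviation are set by the style rather than by the incoming features, which is how style is injected independently at each scale.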

07

Evaluation Metrics

Evaluating GANs is challenging because they do not define an explicit likelihood. Inception Score (IS) uses a pre-trained classifier to reward samples that are confidently classified (quality) and spread across classes (diversity), but it never compares against real data and cannot detect mode collapse within a class. Fréchet Inception Distance (FID) compares the feature statistics of real and generated samples, which makes it a more robust measure.
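
FID fits a Gaussian to each set of features and computes the Fréchet distance ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The sketch below is a simplification that assumes diagonal covariances, so the trace term reduces to a sum of per-dimension terms; the feature vectors are made up and stand in for Inception activations.

```python
import math

def mean_and_var(features):
    """Per-dimension mean and (population) variance of a list of vectors."""
    n, dims = len(features), len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    variances = [sum((f[d] - means[d]) ** 2 for f in features) / n
                 for d in range(dims)]
    return means, variances

def frechet_distance_diag(real_features, fake_features):
    """Frechet distance between Gaussians, assuming diagonal covariances."""
    mu_r, var_r = mean_and_var(real_features)
    mu_g, var_g = mean_and_var(fake_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    # With diagonal covariances the trace term is a per-dimension sum.
    cov_term = sum(vr + vg - 2 * math.sqrt(vr * vg)
                   for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term

real = [[0.0, 0.0], [2.0, 2.0]]
shifted = [[3.0, 0.0], [5.0, 2.0]]   # same spread, mean moved by 3 in dim 0
print(frechet_distance_diag(real, real), frechet_distance_diag(real, shifted))
```

A full implementation would use the complete covariance matrices and a matrix square root; the diagonal version only illustrates how mean shift and spread mismatch both contribute to the score.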

Precision and recall metrics separate the two failure modes: precision is the fraction of generated samples that fall within the real-data manifold (quality), while recall is the fraction of real samples that fall within the generated manifold (mode coverage). Together with FID, these metrics provide a comprehensive evaluation of both quality and diversity in generated samples.
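
One way to estimate these quantities, assumed here for illustration, is to approximate each manifold by balls around the samples whose radii are k-nearest-neighbour distances. The points below are made-up 2-D features: the real data has two modes and the generator covers only one.

```python
import math

def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbour in the same set."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii

def coverage(queries, reference, k=1):
    """Fraction of queries lying inside at least one reference k-NN ball."""
    radii = knn_radii(reference, k)
    hits = sum(
        any(math.dist(q, r) <= rad for r, rad in zip(reference, radii))
        for q in queries
    )
    return hits / len(queries)

real = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]   # two modes
fake = [(0.1, 0.0), (0.3, 0.0), (0.6, 0.0), (0.9, 0.0)]     # one mode only

precision = coverage(fake, real)   # generated samples inside the real manifold
recall = coverage(real, fake)      # real samples inside the generated manifold
print(precision, recall)
```

Every generated point lies near real data, so precision is high, but the second real mode is never reached, so recall is low. This is the signature of mode collapse that a single scalar score can hide.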

08

GAN Theory & Limitations

GANs lack a likelihood-based training objective, making theoretical analysis difficult. The minimax framework does guarantee convergence, but only under idealized assumptions: G and D have sufficient capacity, D is trained to optimality at every step, and the generator's distribution is updated directly rather than through network parameters. In practice, these assumptions rarely hold, and training often oscillates rather than converges.

The diversity-quality tradeoff is fundamental: generating highly realistic samples favors concentrating on a narrow, high-density part of the distribution, while maintaining diversity requires covering the full data manifold. The truncation trick trades diversity for quality by sampling latent codes from a restricted region of the latent space. Understanding these limitations guides proper application: GANs excel at high-quality synthesis but require careful validation of mode coverage and diversity.
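
A minimal sketch of one variant of the truncation trick follows, assuming a standard normal prior and resampling of any coordinate whose magnitude exceeds a threshold. The function name and the threshold value are illustrative; other variants instead pull latent codes toward their mean.

```python
import random

def truncated_latent(dim, threshold, rng):
    """Sample z ~ N(0, I), redrawing any coordinate with |z_i| > threshold."""
    z = []
    while len(z) < dim:
        value = rng.gauss(0.0, 1.0)
        if abs(value) <= threshold:
            z.append(value)
    return z

rng = random.Random(0)
z = truncated_latent(dim=8, threshold=0.7, rng=rng)
print(z)
```

A smaller threshold keeps latent codes in the high-density core of the prior, which tends to give more typical, higher-quality samples at the cost of variety; a large threshold recovers the untruncated prior.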

09

References & Further Reading

Generative Adversarial Networks introduced a paradigm shift through adversarial training. This section gathers foundational papers, improvements, and resources for understanding the GAN framework from theory to modern applications in image generation and beyond.

From the original formulation to conditional variants and architectural innovations, these materials trace the evolution and impact of adversarial training in deep generative modeling.