Introduction to Generative Models

Stanford XCS236 · Deep Generative Models

What Are Generative Models

Generative models learn to represent and sample from data distributions. Rather than predicting labels from inputs, they model p(x), the probability distribution of the observed data itself. A trained model can generate novel samples and often yields representations that are useful for downstream tasks.
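
As a minimal sketch of "estimate p(x), then sample from it", the example below fits a one-dimensional Gaussian by maximum likelihood and draws new points from the fitted model. The synthetic data, the function names, and the choice of a Gaussian family are illustrative assumptions, not part of the course material.

```python
import math
import random

def fit_gaussian(data):
    """Maximum-likelihood estimates of the mean and standard deviation."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE divides by n, not n - 1
    return mu, math.sqrt(var)

def log_density(x, mu, sigma):
    """log p(x) under the Gaussian N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

rng = random.Random(0)
data = [rng.gauss(2.0, 0.5) for _ in range(5000)]  # stand-in for observed data
mu_hat, sigma_hat = fit_gaussian(data)             # density estimation
new_samples = [rng.gauss(mu_hat, sigma_hat) for _ in range(5)]  # generation
```

The same two steps, fitting parameters and then sampling, carry over to the richer model families discussed later; only the form of the density changes.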

At their core, most generative models perform density estimation with a learned parametric distribution p_θ(x). Whether the model is a variational autoencoder, a flow, or a diffusion process, the goal is the same: capture the latent structure of high-dimensional data well enough to synthesize realistic new samples.

Taxonomy of Approaches

Generative modeling encompasses diverse algorithmic families with distinct tradeoffs. Autoregressive models factor joint distributions as products of conditionals; latent variable models introduce unobserved structure; flow-based models apply invertible transformations; implicit models skip explicit likelihoods entirely.
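
The autoregressive factorization can be sketched on a toy model over binary sequences. The conditional `cond_prob_one` below is a hypothetical hand-written rule standing in for a learned network; the point is only that summing conditional log-probabilities gives a properly normalized joint.

```python
import itertools
import math

def cond_prob_one(prefix):
    """Hypothetical conditional p(x_i = 1 | x_<i): logistic in the prefix sum."""
    return 1.0 / (1.0 + math.exp(-(0.5 * sum(prefix) - 0.25)))

def log_joint(x):
    """log p(x) = sum_i log p(x_i | x_<i), by the chain rule."""
    total = 0.0
    for i, xi in enumerate(x):
        p1 = cond_prob_one(x[:i])
        total += math.log(p1 if xi == 1 else 1.0 - p1)
    return total

# Every conditional is normalized, so the joint sums to 1 over all sequences.
total_mass = sum(
    math.exp(log_joint(x)) for x in itertools.product([0, 1], repeat=4)
)
```

Because each factor is a valid distribution, the likelihood is exact and needs no normalizing constant, which is the tractability advantage this family is known for.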

Each approach balances tractability, expressiveness, and sampling efficiency. The taxonomy helps explain why certain methods are common in certain domains: autoregressive transformers for text, score-based diffusion for images, normalizing flows for molecular design.

Maximum Likelihood Principle

Maximum likelihood estimation (MLE) remains the dominant training objective. Maximizing the expected log-likelihood log p_θ(x) over the data is equivalent to minimizing the forward KL divergence KL(p_data || p_θ), which pulls the learned distribution toward the empirical one. This equivalence is the main theoretical justification for using MLE wherever the likelihood, or a bound on it, can be computed.
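
The link between maximum likelihood and KL divergence follows in two lines. The notation p_data for the data distribution and x^(i) for the N training examples is assumed here for illustration.

```latex
\begin{aligned}
D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  &= \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_{\text{data}}(x)\right]
   - \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right], \\
\arg\min_\theta D_{\mathrm{KL}}\!\left(p_{\text{data}} \,\|\, p_\theta\right)
  &= \arg\max_\theta \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log p_\theta(x)\right]
   \approx \arg\max_\theta \frac{1}{N}\sum_{i=1}^{N} \log p_\theta\!\left(x^{(i)}\right).
\end{aligned}
```

The first expectation does not depend on θ, so it drops out of the optimization; the remaining term is approximated by the average log-likelihood of the training set.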

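Alternative objectives exist, including the Wasserstein distance, general f-divergences, and the reverse KL term that appears in variational inference for VAEs, and each reflects different assumptions about how to compare distributions. Forward KL via MLE still dominates because it is computationally tractable, requiring only samples from the data and log-probabilities from the model, and because it has a direct information-theoretic reading as expected code length.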

Sampling & Generation

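Sampling lies at the heart of generative modeling's utility. Ancestral sampling proceeds autoregressively, drawing each variable from its learned conditional given the variables already sampled; rejection sampling draws from a proposal q(x) and accepts each draw with probability p(x) / (M q(x)), where M bounds the ratio p/q; MCMC explores high-dimensional spaces by iterating a Markov chain whose stationary distribution is the target.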
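
Rejection sampling is the easiest of these to show end to end. The sketch below assumes a Beta(2, 2) target and a Uniform(0, 1) proposal, both chosen only because the bound on the density ratio is known in closed form; the function names are illustrative.

```python
import random

def target_pdf(x):
    """Beta(2, 2) density on [0, 1]; its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, rng, bound=1.5):
    """Draw n samples from target_pdf using a Uniform(0, 1) proposal, q(x) = 1."""
    samples = []
    while len(samples) < n:
        x = rng.random()  # draw from the proposal
        u = rng.random()  # uniform variable for the accept/reject decision
        if u < target_pdf(x) / bound:  # accept with probability p(x) / (M q(x))
            samples.append(x)
    return samples

rng = random.Random(0)
samples = rejection_sample(10_000, rng)
sample_mean = sum(samples) / len(samples)
```

The expected acceptance rate is 1/M, here about two thirds. In high dimensions a tight bound M is rarely available and the acceptance rate collapses, which is one reason MCMC and learned samplers are preferred there.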

Each technique trades off computational cost, mixing time, and sample quality. Diffusion models frame sampling as reversing a gradual noising process; transformer-based language models often truncate the next-token distribution with top-k filtering; adversarial approaches sample implicitly by passing noise through a generator network.
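
Top-k filtering can be sketched in a few lines: keep the k most probable tokens, renormalize, and sample from what remains. The probability vector below is a made-up stand-in for a model's next-token distribution, and the function names are illustrative.

```python
import random

def top_k_filter(probs, k):
    """Zero out all but the k most probable tokens, then renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = set(keep)
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

def sample_token(probs, k, rng):
    """Sample a token index from the top-k filtered distribution."""
    filtered = top_k_filter(probs, k)
    return rng.choices(range(len(filtered)), weights=filtered, k=1)[0]

next_token_probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical model output
filtered = top_k_filter(next_token_probs, k=2)
token = sample_token(next_token_probs, 2, random.Random(0))
```

Truncating the tail trades diversity for reliability: low-probability tokens can never be drawn, so the sampled distribution no longer matches the model's own p_θ.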

Evaluation Metrics

Evaluating generative models requires both likelihood-based metrics and perceptual quality measures. Log-likelihood gauges model fit; Fréchet Inception Distance (FID) compares feature statistics of generated and real images; Inception Score rewards samples that are individually confident and collectively diverse under a classifier; precision and recall separate sample fidelity (precision) from coverage of the data distribution (recall).
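
For reference, the two Inception-based metrics have the standard definitions below. The symbols are assumptions of this sketch: μ_r, Σ_r and μ_g, Σ_g are the mean and covariance of Inception features for real and generated images, p(y | x) is the classifier's label distribution for image x, and p(y) is its average over generated images.

```latex
\begin{aligned}
\mathrm{FID} &= \left\lVert \mu_r - \mu_g \right\rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right), \\
\mathrm{IS} &= \exp\!\left(\mathbb{E}_{x \sim p_g}\!\left[
  D_{\mathrm{KL}}\!\left(p(y \mid x) \,\|\, p(y)\right)\right]\right).
\end{aligned}
```

Lower FID is better and higher IS is better. Both depend on a pretrained image classifier, which is why they do not transfer directly to other modalities.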

No single metric suffices. Likelihood alone ignores sample quality; FID suits images but not all modalities; human evaluation remains essential. Modern benchmarks combine multiple metrics, acknowledging that generative quality involves multiple dimensions of fidelity.

Computational Tradeoffs

Computing exact log-likelihoods is intractable for many architectures. Autoregressive models achieve tractability through factorization, and flow models through the change-of-variables formula with a tractable Jacobian determinant; VAEs optimize a variational lower bound instead of the exact likelihood. Implicit models abandon likelihood entirely, which complicates evaluation but leaves the generator architecture unconstrained.
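
The role of the Jacobian determinant in flow models can be checked on the simplest possible case, a one-dimensional affine map x = scale * z + shift with a standard normal base distribution. This is a sketch under those assumptions; real flows stack many learned invertible layers, and the function names here are illustrative.

```python
import math

def standard_normal_logpdf(z):
    """log density of the base distribution N(0, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * z * z

def affine_flow_logpdf(x, scale, shift):
    """log p_X(x) for x = scale * z + shift with z ~ N(0, 1)."""
    z = (x - shift) / scale          # inverse map f(x)
    log_det = -math.log(abs(scale))  # log |df/dx| = log |1 / scale|
    return standard_normal_logpdf(z) + log_det

def gaussian_logpdf(x, mu, sigma):
    """Closed-form log density of N(mu, sigma^2), used as a reference."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

flow_value = affine_flow_logpdf(1.3, scale=2.0, shift=0.5)
reference = gaussian_logpdf(1.3, mu=0.5, sigma=2.0)
```

The two values agree because an affine transform of a standard normal is N(shift, scale^2). The general formula is log p_X(x) = log p_Z(f(x)) + log |det ∂f/∂x|, and flow architectures are designed so that this determinant stays cheap to compute.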

Latent variable models gain expressiveness at the cost of approximate inference, since the true posterior over latents is usually intractable. Recognizing this spectrum, from fully tractable autoregressive models to fully implicit GANs, explains architectural choices and guides practitioners toward methods that match their constraints.
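
For latent variable models the approximation can be stated exactly. Writing q_φ(z | x) for an approximate posterior and p(z) for the prior (notation assumed here), the log-likelihood splits into the evidence lower bound (ELBO) plus a non-negative gap:

```latex
\begin{aligned}
\log p_\theta(x)
  &= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
     - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{ELBO}}
   + D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right) \\
  &\ge \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
     - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right).
\end{aligned}
```

The bound is tight only when q_φ matches the true posterior, so the quality of approximate inference directly limits how well the reported bound reflects the true likelihood.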

Historical Context

Generative modeling's modern renaissance builds on decades of foundational work. Boltzmann machines and RBMs pioneered energy-based approaches; deep belief networks used unsupervised layer-wise pretraining to initialize deep networks; VAEs and GANs scaled latent variable models to high-dimensional data; transformers and diffusion models now dominate.

Each era discovered fundamental principles—the variational autoencoder's latent space trade-off, the adversarial game's nonconvergence challenges, diffusion's connection to score matching. These aren't historical footnotes but active principles guiding current research directions.

Course Roadmap

This course spans the generative modeling landscape from foundations through modern applications. Weeks 1-2 establish the foundations: density estimation, MLE, sampling, and evaluation. Weeks 3-4 cover autoregressive models and VAEs. Weeks 5-6 cover flows and diffusion models. Weeks 7-8 conclude with GANs, synthesis, and applications.

Each approach's tradeoffs become clear through hands-on implementation. Autoregressive models train quickly but sample slowly, one element at a time; flow models offer exact likelihoods but are restricted to invertible architectures; diffusion models reach state-of-the-art image quality despite slow iterative sampling. Understanding these tensions prepares you to design novel architectures.

References & Further Reading

This course draws on seminal papers, textbooks, and pedagogical resources spanning decades of generative modeling research. The references provide both foundational theory and modern implementations, enabling deeper exploration of each architectural family and its applications.

Key resources include Stanford's course materials, classic papers introducing VAEs, GANs, and diffusion models, as well as contemporary blog posts and tutorials that bridge theory and practice. This section guides further learning beyond lecture content.

What Are Generative Models

Core Concepts

Density Estimation

Sample Synthesis

Representation Learning

Likelihood Optimization

Key Distinctions

Foundation Principle

Learning Objectives

Taxonomy of Approaches

Autoregressive Models

Latent Variable Models

Flow-Based Models

Score-Based & Diffusion Models

Implicit Models (GANs)

Advantages

Tradeoffs

Maximum Likelihood Principle

KL Divergence Equivalence

Why MLE Dominates

Optimization Landscape

Information Theory Principle

Forward KL (MLE)

Reverse KL (VAE)

Wasserstein Distance

Implicit Objectives

Practical Considerations

Sampling & Generation

Ancestral Sampling

Rejection Sampling

Markov Chain Monte Carlo

Diffusion Model Sampling

Advantages

Limitations

Evaluation Metrics

Log-Likelihood & Bits/Dimension

Fréchet Inception Distance (FID)

Inception Score (IS) & Precision/Recall

Sample Quality Metrics

Evaluation Principle

Log-Likelihood

FID Score

Inception Score

Precision/Recall

Context-Specific Metrics

Computational Tradeoffs

Exact vs. Approximate Likelihood

Tractable vs. Intractable Posteriors

Training Efficiency

Inference (Sampling) Speed

Autoregressive

Latent Variable

Flows

Diffusion

Scalability to High Dimensions

Design Principle

Historical Context

Energy-Based Models Era (1980s-2000s)

Deep Learning Revolution (2010s Early)

Adversarial Training & GANs (2014-2018)

Attention & Transformer Era (2017-2019)

Score-Based & Diffusion Resurgence (2019-2022)

Lasting Principles

Course Roadmap

Weeks 1-2: Foundations

Weeks 3-4: Autoregressive & VAEs

Weeks 5-6: Flows & Diffusion Models

Weeks 7-8: GANs, Synthesis & Applications

Key Concepts Throughout

Strengths

Challenges