Stanford XCS236 · Deep Generative Models
Introduction to Generative Models
Week 1 · Foundations, Taxonomy & Evaluation
01

What Are Generative Models

Generative models learn to represent and sample from data distributions. Discriminative models predict labels from inputs by modeling p(y | x). Generative models instead estimate p(x), the probability distribution of the observed data. This enables generation of novel samples and provides rich representations useful for downstream tasks.

At their core, most generative models pursue density estimation through learned parametric distributions p_θ(x). Whether through variational autoencoders, flow models, or diffusion processes, the goal remains consistent: discover latent structure in high-dimensional data and synthesize realistic new samples.
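The simplest instance of "fit a parametric density, then sample from it" uses a single Gaussian. The sketch below is illustrative only. The toy data, the seed, and the choice of a Gaussian family are assumptions and do not come from the course. It fits p_θ(x) by maximum likelihood and then draws new points from the fitted model.

```python
import math
import random

rng = random.Random(42)
# Toy "observed data" (an assumption for illustration): 5,000 draws from N(3, 0.5^2).
data = [rng.gauss(3.0, 0.5) for _ in range(5000)]

# Density estimation: fit a Gaussian p_theta(x) by maximum likelihood.
mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)  # the MLE uses 1/N, not 1/(N-1)

def log_p(x):
    """Log-density of the fitted Gaussian model."""
    return -0.5 * (math.log(2 * math.pi * var_hat) + (x - mu_hat) ** 2 / var_hat)

# Generation: draw novel samples from the learned distribution.
new_samples = [rng.gauss(mu_hat, math.sqrt(var_hat)) for _ in range(5)]
print(f"mu_hat={mu_hat:.3f}, sigma_hat={math.sqrt(var_hat):.3f}")
print("average log-likelihood:", sum(map(log_p, data)) / len(data))
print("new samples:", [round(s, 3) for s in new_samples])
```

Real generative models replace the two Gaussian parameters with a neural network. The loop stays the same: fit p_θ to data, then sample from it.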

02

Taxonomy of Approaches

Generative modeling encompasses diverse algorithmic families with distinct tradeoffs. Autoregressive models factor the joint distribution into a product of conditionals, p(x) = ∏_i p(x_i | x_<i). Latent variable models introduce unobserved variables z and define p(x) = ∫ p(x | z) p(z) dz. Flow-based models apply invertible transformations to a simple base distribution. Implicit models define a sampling procedure and skip explicit likelihoods entirely.
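The autoregressive factorization can be checked on a tiny example. This is a minimal sketch over binary sequences. The conditional `conditional_prob_one` is a hypothetical hand-written rule that stands in for a neural network. It scores a sequence with the chain rule and confirms that the resulting joint sums to 1 over all sequences.

```python
import itertools
import math

def conditional_prob_one(prefix):
    """Hypothetical p(x_i = 1 | x_<i): depends only on how many ones appeared so far.

    This toy rule is an illustrative assumption, not a model from the course.
    """
    ones = sum(prefix)
    return 1.0 / (1.0 + math.exp(-(0.5 * ones - 0.25)))

def log_likelihood(x):
    """Chain rule: log p(x) = sum_i log p(x_i | x_<i)."""
    total = 0.0
    for i, xi in enumerate(x):
        p_one = conditional_prob_one(x[:i])
        total += math.log(p_one if xi == 1 else 1.0 - p_one)
    return total

# Each conditional is normalized, so the joint sums to 1 over all 2^n sequences.
n = 4
z = sum(math.exp(log_likelihood(list(bits))) for bits in itertools.product([0, 1], repeat=n))
print(round(z, 10))  # 1.0
```

Because every factor is a properly normalized conditional, the likelihood is exact and tractable without computing any normalizing constant.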

Each approach balances tractability, expressiveness, and sampling efficiency. Understanding this taxonomy clarifies why certain methods suit certain domains: autoregressive transformers for text, score-based diffusion for images, normalizing flows for molecular design.

03

Maximum Likelihood Principle

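Maximum likelihood estimation remains the dominant training objective for models with tractable likelihoods. Maximizing the expected log-likelihood E_{x∼p_data}[log p_θ(x)] is equivalent to minimizing the forward KL divergence D_KL(p_data ‖ p_θ), which pushes the learned distribution toward the data distribution. The expectation can be estimated by averaging log p_θ over training examples. That is why this principled objective applies to autoregressive models and flows directly, and to VAEs through a lower bound.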
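Write p_data for the data distribution and p_θ for the model. The equivalence between MLE and forward-KL minimization then follows from expanding the divergence. The first term does not depend on θ, so it drops out of the optimization. The expectation is then approximated with N training examples x^{(i)}:

```latex
D_{\mathrm{KL}}\big(p_{\mathrm{data}} \,\|\, p_\theta\big)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log p_{\mathrm{data}}(x)\big]
  - \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log p_\theta(x)\big]

\arg\min_\theta D_{\mathrm{KL}}\big(p_{\mathrm{data}} \,\|\, p_\theta\big)
  = \arg\max_\theta \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log p_\theta(x)\big]
  \approx \arg\max_\theta \frac{1}{N} \sum_{i=1}^{N} \log p_\theta\big(x^{(i)}\big)
```

The density p_data never has to be evaluated. Only samples from it are needed.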

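Alternative divergences exist, each reflecting different geometric assumptions. Examples include Wasserstein distances, f-divergences, and the reverse KL used in the variational inference step of VAEs. Forward KL is mode-covering: it heavily penalizes a model that assigns low probability where data exist. Reverse KL is mode-seeking. Yet forward KL via MLE dominates for two reasons. It requires only samples from p_data, never its density. It also corresponds to minimizing expected code length, which matches information-theoretic intuitions about learning as compression.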
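The mode-covering versus mode-seeking behavior shows up with small discrete distributions. In this sketch, a bimodal "data" distribution is compared against two hand-picked candidate models. All of these distributions are illustrative assumptions. One candidate spreads mass over both modes, and the other concentrates on a single mode.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions, assuming q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_data = [0.45, 0.05, 0.05, 0.45]      # bimodal "data" distribution
q_broad = [0.25, 0.25, 0.25, 0.25]     # spreads mass over both modes
q_one_mode = [0.85, 0.05, 0.05, 0.05]  # concentrates on one mode

candidates = [("broad", q_broad), ("one_mode", q_one_mode)]
forward = {name: kl(p_data, q) for name, q in candidates}  # KL(p_data || q)
reverse = {name: kl(q, p_data) for name, q in candidates}  # KL(q || p_data)
print("forward KL:", forward)  # lower for the broad q (mode-covering)
print("reverse KL:", reverse)  # lower for the one-mode q (mode-seeking)
```

The two divergences rank the same pair of candidates in opposite orders, and that is the practical consequence of choosing one divergence over the other.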

04

Sampling & Generation

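Sampling lies at the heart of generative modeling's utility. Ancestral sampling draws each variable in turn from its learned conditional given the previously sampled variables, which for sequence models means sampling autoregressively. Rejection sampling draws a proposal x ∼ q and accepts it with probability p(x) / (M q(x)), where M bounds p/q. MCMC constructs a Markov chain whose stationary distribution is the target and refines samples iteratively to explore high-dimensional spaces.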
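Rejection sampling is simple to sketch. The example below assumes a Beta(2, 2) target on [0, 1], whose density 6x(1 - x) peaks at 1.5, and a Uniform(0, 1) proposal, so M = 1.5. Both choices are illustrative and do not come from the course. The long-run acceptance rate should approach 1/M.

```python
import random

def target_density(x):
    """Beta(2, 2) density on [0, 1]; an illustrative target."""
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

def rejection_sample(n, rng, m=1.5):
    """Draw n samples using a Uniform(0, 1) proposal with density q(x) = 1.

    m must satisfy p(x) <= m * q(x) everywhere; for Beta(2, 2) the max density is 1.5.
    A proposal x is accepted with probability p(x) / (m * q(x)).
    """
    samples, proposals = [], 0
    while len(samples) < n:
        x = rng.random()
        proposals += 1
        if rng.random() < target_density(x) / m:
            samples.append(x)
    return samples, len(samples) / proposals

rng = random.Random(0)
samples, acceptance_rate = rejection_sample(20000, rng)
mean = sum(samples) / len(samples)
# In theory the mean is 0.5 and the acceptance rate is 1/m (about 0.667).
print(f"mean={mean:.3f}  acceptance_rate={acceptance_rate:.3f}")
```

The acceptance rate of 1/M explains why rejection sampling becomes impractical in high dimensions. A tight bound M is rarely available there, and acceptance collapses toward zero.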

Each technique trades off computational cost, mixing time, and sample quality. Modern diffusion models reframe sampling as reversing a gradual noising process. Autoregressive transformers often truncate each conditional with top-k filtering before sampling. Adversarial approaches sample implicitly by passing random noise through a generator network.
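Top-k filtering changes only the last step of autoregressive sampling. The next-token distribution is truncated to the k most likely tokens, renormalized, and then sampled. In this sketch, the logits, the value k = 2, and the `temperature` parameter are hypothetical values chosen for illustration.

```python
import math
import random

def top_k_sample(logits, k, rng, temperature=1.0):
    """Sample a token index after keeping only the k largest logits."""
    # Indices of the k highest-scoring tokens.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the retained logits only (subtract the max for numerical stability).
    scaled = [logits[i] / temperature for i in top]
    mx = max(scaled)
    weights = [math.exp(s - mx) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical next-token scores
rng = random.Random(0)
draws = [top_k_sample(logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # only indices 0 and 1 can ever appear
```

Truncation trades diversity for quality. Low-probability tokens can no longer be sampled, even when they would have been valid continuations.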

05

Evaluation Metrics

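Evaluating generative models requires both likelihood-based metrics and sample-quality measures. Log-likelihood on held-out data gauges model fit. Fréchet Inception Distance (FID) compares the mean and covariance of Inception-network features for real and generated samples. Inception Score (IS) rewards samples that a classifier labels confidently while the overall label distribution stays diverse. Precision and recall separate two properties: fidelity, meaning generated samples fall within the data distribution, and coverage, meaning the modes of the data are represented among the generated samples.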
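The Inception Score formula is IS = exp(E_x[KL(p(y|x) ‖ p(y))]). It can be illustrated with hand-written class probabilities. In practice the per-sample probabilities p(y|x) come from a pretrained Inception network. The toy probability vectors here are assumptions. They show the two extremes: confident and diverse samples versus confident but collapsed ones.

```python
import math

def inception_score(class_probs):
    """IS = exp(mean over samples of KL(p(y|x) || p(y))), where p(y) averages p(y|x).

    class_probs: list of per-sample probability vectors p(y|x) (toy values here).
    """
    n, k = len(class_probs), len(class_probs[0])
    marginal = [sum(p[j] for p in class_probs) / n for j in range(k)]
    mean_kl = sum(
        sum(p[j] * math.log(p[j] / marginal[j]) for j in range(k) if p[j] > 0)
        for p in class_probs
    ) / n
    return math.exp(mean_kl)

confident_diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
confident_collapsed = [[1.0, 0.0, 0.0]] * 3
print(round(inception_score(confident_diverse), 6))    # 3.0 (equals the number of classes)
print(round(inception_score(confident_collapsed), 6))  # 1.0 (no diversity)
```

The collapsed case scores the minimum even though every sample is classified confidently. This is how IS penalizes a lack of diversity. The score never compares against real data, which is one reason FID is often reported alongside it.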

No single metric suffices. High likelihood does not guarantee good samples, and vice versa. FID depends on features from an image classifier, so it does not transfer directly to other modalities. Human evaluation remains essential. Modern benchmarks combine multiple metrics, acknowledging that generative quality has several distinct dimensions of fidelity.

06

Computational Tradeoffs

Computing exact log-likelihoods is intractable for many model families, especially those with latent variables. Autoregressive models achieve tractability through factorization. Flow models achieve it through the change-of-variables formula with tractable Jacobian determinants. VAEs sidestep the intractable marginal by maximizing a variational lower bound, the ELBO. Implicit models abandon likelihood entirely, which complicates evaluation but allows very flexible generators.
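The change-of-variables rule behind flows can be checked in one dimension. This sketch assumes the simplest possible flow, an affine map x = mu + sigma * z with a standard normal base distribution. The resulting log-density should match the closed-form Gaussian exactly. The parameter values are arbitrary illustrations.

```python
import math

def standard_normal_logpdf(z):
    return -0.5 * (z * z + math.log(2 * math.pi))

def affine_flow_logpdf(x, mu, sigma):
    """log p_X(x) for the flow x = mu + sigma * z, with z ~ N(0, 1).

    Change of variables: log p_X(x) = log p_Z(f(x)) + log |df/dx|,
    where f(x) = (x - mu) / sigma is the inverse map, so |df/dx| = 1 / |sigma|.
    """
    z = (x - mu) / sigma
    log_det = -math.log(abs(sigma))
    return standard_normal_logpdf(z) + log_det

def gaussian_logpdf(x, mu, sigma):
    """Closed-form N(mu, sigma^2) log-density, used only as a check."""
    return -0.5 * (((x - mu) / sigma) ** 2 + math.log(2 * math.pi * sigma ** 2))

for x in (-1.0, 0.3, 2.5):
    print(x, affine_flow_logpdf(x, mu=0.5, sigma=2.0), gaussian_logpdf(x, 0.5, 2.0))
```

Deep flows stack many such invertible layers and sum their log-determinant terms. Architectures are designed so that each Jacobian determinant stays cheap to compute.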

Latent variable models gain expressive marginal distributions by integrating over unobserved variables, at the cost of requiring approximate posterior inference. The options form a spectrum, from fully tractable autoregressive models to fully implicit GANs. Recognizing it explains architectural choices and guides practitioners toward methods that match their constraints and requirements.
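The cost of approximate inference is visible as the gap between the ELBO and the true log-likelihood. This sketch assumes a toy linear-Gaussian model, z ~ N(0, 1) and x | z ~ N(z, 1), chosen because its marginal x ~ N(0, 2) and its posterior N(x/2, 1/2) are known exactly. The variational distribution q(z) = N(m, s^2) is hypothetical. The bound is tight only when q equals the true posterior.

```python
import math

LOG_2PI = math.log(2 * math.pi)

def elbo(x, m, s):
    """Closed-form ELBO for z ~ N(0, 1), x | z ~ N(z, 1), with q(z) = N(m, s^2).

    ELBO = E_q[log p(x|z)] + E_q[log p(z)] + H[q]
    """
    expected_log_lik = -0.5 * (LOG_2PI + (x - m) ** 2 + s ** 2)
    expected_log_prior = -0.5 * (LOG_2PI + m ** 2 + s ** 2)
    entropy = 0.5 * (LOG_2PI + 1.0) + math.log(s)
    return expected_log_lik + expected_log_prior + entropy

def exact_log_marginal(x):
    """In this toy model x ~ N(0, 2), so log p(x) is available exactly."""
    return -0.5 * (math.log(2 * math.pi * 2.0) + x ** 2 / 2.0)

x = 1.2
print(exact_log_marginal(x))
print(elbo(x, m=0.0, s=1.0))               # a poor q: strictly below log p(x)
print(elbo(x, m=x / 2, s=math.sqrt(0.5)))  # q = true posterior N(x/2, 1/2): the bound is tight
```

The gap equals KL(q(z) ‖ p(z | x)). In real VAEs that posterior is unknown, so the gap cannot be closed analytically.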

07

Historical Context

Generative modeling's modern renaissance builds on decades of foundational work. Boltzmann machines and RBMs pioneered energy-based approaches. Deep belief networks showed that greedy unsupervised pretraining could initialize deep networks for supervised tasks. VAEs and GANs scaled deep latent-variable and adversarial generative models. Transformers and diffusion models now dominate.

Each era discovered fundamental principles: the variational autoencoder's trade-off between reconstruction quality and latent-space regularization, the adversarial game's nonconvergence and training instability, and diffusion's connection to score matching. These aren't historical footnotes but active principles guiding current research directions.

08

Course Roadmap

This course spans the generative modeling landscape from foundations through modern applications. Weeks 1-3 establish core concepts: density estimation, MLE, sampling, and evaluation. Weeks 4-6 build toward powerful architectures: autoencoders, flows, and diffusion. Weeks 7-8 cover adversarial training and applications.

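Each approach's tradeoffs become clear through hands-on implementation. Autoregressive models train efficiently because all conditionals are evaluated in parallel on observed data. They sample slowly because each element must be generated in sequence. Flow models offer exact likelihood, but the invertibility requirement limits their architectures. Diffusion models achieve state-of-the-art image generation despite requiring many iterative sampling steps. Understanding these tensions prepares you to design novel architectures.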

09

References & Further Reading

This course draws on seminal papers, textbooks, and pedagogical resources spanning decades of generative modeling research. The references below provide both foundational theory and modern implementations, enabling deeper exploration of each architectural family and its applications.

Key resources include Stanford's course materials, classic papers introducing VAEs, GANs, and diffusion models, as well as contemporary blog posts and tutorials that bridge theory and practice. This section guides further learning beyond lecture content.