Stanford XCS229 · Problem Set 1
PS1: GLM Convexity
25 Points · Convexity, Features & Stability
1

Problem Set 1 Overview

This problem set covers three fundamental concepts in machine learning: generalized linear models, feature representations, and algorithmic stability. Across 25 points and 3 problems, you'll explore the mathematical foundations that connect probabilistic modeling, kernel methods, and generalization theory. Each problem builds on core week 1-2 concepts from XCS229, creating a cohesive understanding of supervised learning.

The problems span theoretical proofs, computational implementations, and conceptual connections. Work through each section carefully, as the insights from earlier problems illuminate the later ones. Focus on building intuition alongside rigorous mathematics.

2

GLM Convexity Setup

Generalized linear models (GLMs) form the foundation for supervised learning. The key insight is that the negative log-likelihood (NLL) of GLMs exhibits a convex structure, enabling efficient optimization. This section establishes the mathematical setup: understanding the exponential family, the link function, and how the Hessian matrix determines convexity properties.

The problem asks you to prove that for any GLM, the negative log-likelihood is a convex function of the parameters. This requires understanding the relationship between the second derivative (Hessian) and positive semi-definiteness, ultimately connecting to the information matrix of the exponential family.

3

Proving Convexity

Proving convexity involves demonstrating that the Hessian of the NLL is positive semi-definite. For exponential family distributions, this follows from the structure of the sufficient statistics and the log-partition function. The proof strategy combines calculus, linear algebra, and properties specific to exponential families.

Key techniques include: taking derivatives of the NLL with respect to parameters, recognizing the Hessian as a variance-covariance-like matrix, and showing that it can be written as X^T D X where D is diagonal with positive entries. This establishes not just convexity but also strict convexity for most GLMs.

4

Feature Maps Problem

Feature maps transform raw inputs into a higher-dimensional representation where linear models become more expressive. This problem explores explicit feature maps and their computational implications. You'll work with polynomial kernels, computing feature representations that correspond to polynomial features in the original space.

The central question: how do we efficiently represent polynomial features, and what computational cost does this entail? By understanding explicit feature maps, you gain insight into why the kernel trick exists—as a computational shortcut for high-dimensional feature representations.

5

Feature Map Analysis

Analyzing feature maps requires comparing dimensionality, computational cost, and representational power. For polynomial degree d features in d-dimensional input space, the feature map dimension grows polynomially. This analysis motivates kernel methods: computing inner products in feature space without explicitly constructing features.

The kernel trick exploits the fact that certain inner products in feature space can be computed efficiently in input space. Understanding this connection bridges explicit representations and implicit kernel methods, showing how mathematical structure enables computational efficiency in machine learning.

6

Stability of Learning

Stability of learning algorithms quantifies how much the learned hypothesis changes when one training example is perturbed or removed. This concept connects directly to generalization bounds: stable algorithms generalize well from finite samples. Algorithmic stability provides a framework for understanding why regularization works and how sample complexity relates to model complexity.

Uniform stability—the strongest notion—requires that removing any single example changes the learned function by at most a small amount. This property ensures that the algorithm's performance on unseen data closely matches its training performance, establishing generalization guarantees independent of model class assumptions.

7

Stability Analysis

Analyzing stability involves computing how the leave-one-out error relates to the training error, and connecting this to regularization strength. For learning algorithms like regularized linear regression, stability can be explicitly quantified in terms of the regularization parameter. The leave-one-out cross-validation error provides an empirical measure of stability.

The deep connection emerges: stable algorithms have low generalization gaps. This explains why regularization (ridge regression, LASSO) works—it enforces stability by preventing the algorithm from fitting too closely to any particular training example. The relationship between stability and regularization parameter strength is fundamental to understanding overfitting and generalization.

8

Key Takeaways

This problem set synthesizes three core machine learning concepts. Convexity enables efficient optimization of GLMs. Feature maps and kernels provide flexible representations while managing computational complexity. Stability theory explains generalization and justifies regularization. Together, they form the mathematical and conceptual foundation for supervised learning.

As you complete this problem set, recognize how these concepts interlock: convex optimization finds efficient solutions, feature representations enable expressivity, and stability theory guarantees that these solutions generalize. These three pillars—optimization, representation, and generalization—define modern machine learning practice.

9

References & Further Reading

To deepen your understanding of the concepts covered in Problem Set 1, refer to these key resources. The Stanford CS229 course materials provide comprehensive lectures on GLMs, kernels, and generalization. Lecture notes on exponential families and convexity are essential for Problem 1. For feature maps and kernel methods, review the supervised learning lectures. Stability theory is covered in advanced generalization lectures.

The Elements of Statistical Learning (ESL) is an authoritative reference for practical machine learning theory. Murphy's Machine Learning textbook provides excellent intuitions alongside mathematics. For rigorous proofs and advanced topics, consult survey papers on convex optimization and algorithmic stability.