Optimization Geometry and Convergence
Explore PL inequalities, linear convergence rates, and the geometry of deep network optimization landscapes. Covers SGD dynamics, saddle avoidance, and practical implications for deep learning.
Stanford-level tutorials covering the mathematical foundations of deep learning, from optimization theory to generative models and scaling laws.
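For a flavor of the first article (Optimization Geometry and Convergence), the result behind its "linear convergence rates" is the Polyak-Łojasiewicz (PL) inequality; a standard statement, in our own notation rather than the article's, is:

```latex
% PL inequality with constant \mu > 0, and the rate it implies for gradient
% descent on a \beta-smooth loss L with step size \eta = 1/\beta:
\[
  \tfrac{1}{2}\,\lVert \nabla L(\theta) \rVert^{2} \;\ge\; \mu \bigl( L(\theta) - L^{\star} \bigr)
  \quad \Longrightarrow \quad
  L(\theta_{t}) - L^{\star} \;\le\; \Bigl( 1 - \tfrac{\mu}{\beta} \Bigr)^{t} \bigl( L(\theta_{0}) - L^{\star} \bigr).
\]
```

The point is that a linear (geometric) rate follows without convexity, which is why the PL condition is a natural lens for deep-network landscapes.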
Barron-space approximation theory, Rademacher bounds, and compute-optimal scaling laws for neural networks. Understanding why deep networks generalize and scale effectively.
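The "compute-optimal scaling laws" mentioned here are often summarized by a parametric loss in model size N and data size D; the form below is one common Chinchilla-style convention, with symbols that are our illustration rather than the article's notation:

```latex
% Parametric loss in parameter count N and training tokens D
\[
  L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}} .
\]
% Minimizing over N and D under a fixed compute budget C \approx 6 N D yields
% compute-optimal choices N^{\star} \propto C^{a} and D^{\star} \propto C^{b} with a + b = 1.
```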
Unified view of generative models, probability flow ODEs, and the mathematical foundations of diffusion models. Connects score matching to continuous normalizing flows.
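For reference, the probability flow ODE is the deterministic process that shares its marginals with a forward diffusion SDE; a standard statement, in our notation, is:

```latex
% Forward diffusion SDE and the probability flow ODE with the same marginals p_t(x)
\[
  \mathrm{d}x \;=\; f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w
  \qquad \longleftrightarrow \qquad
  \frac{\mathrm{d}x}{\mathrm{d}t} \;=\; f(x, t) \;-\; \tfrac{1}{2}\, g(t)^{2}\, \nabla_{x} \log p_{t}(x) ,
\]
% so a score model trained by score matching to estimate \nabla_x \log p_t(x)
% can be integrated deterministically, i.e. treated as a continuous normalizing flow.
```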
Newton methods, cubic regularization, natural gradients, and practical approximations like K-FAC and Shampoo. When and how curvature information improves optimization.
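As a baseline for this article, the two idealized updates that curvature methods build on are the Newton step (Hessian H) and the natural-gradient step (Fisher matrix F); the notation below is generic, not the article's:

```latex
\[
  \theta_{t+1} \;=\; \theta_{t} - \eta\, H(\theta_{t})^{-1} \nabla L(\theta_{t})
  \qquad \text{and} \qquad
  \theta_{t+1} \;=\; \theta_{t} - \eta\, F(\theta_{t})^{-1} \nabla L(\theta_{t}) .
\]
% Exact inverses are intractable at scale: K-FAC approximates F with a per-layer
% Kronecker factorization, and Shampoo likewise uses Kronecker-factored
% preconditioners built from gradient statistics.
```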
Learning rate scaling, warmup theorems, cosine decay, and stability analysis for large-scale training. Systematic approaches to hyperparameter design.
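To make the schedule piece concrete, here is a minimal sketch of linear warmup followed by cosine decay; the function name, defaults, and constants are our own illustration, not anything specified by the articles:

```python
import math

def lr_at_step(step, base_lr=3e-4, warmup_steps=2000, total_steps=100_000, min_lr=3e-5):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Linear warmup: ramp from ~0 up to base_lr over warmup_steps.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay: anneal smoothly from base_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return min_lr + (base_lr - min_lr) * cosine

# Example: inspect the learning rate at a few points in training.
for s in (0, 1000, 2000, 50_000, 100_000):
    print(s, round(lr_at_step(s), 6))
```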
PL inequalities, convergence rates, saddle escape
Barron space, Rademacher bounds, scaling laws
Diffusion, flows, score matching
Newton, natural gradients, K-FAC
Hyperparameters, schedules, stability
These articles are designed to be read in sequence, building from fundamental optimization theory through to advanced topics in generative models and training dynamics.
Or explore any article independently, based on your interests.