Diffusion Models and Normalizing Flows: A Unified View
1. Two paradigms, one transport perspective
- Normalizing flows: learn invertible map T_θ such that x = T_θ(z), z ~ p₀, with exact likelihood via change of variables.
- Diffusion models: define forward noising process and learn reverse-time denoising dynamics.
Both can be seen as transporting a simple base distribution to data.
2. Normalizing flow objective
For invertible T_θ, the change of variables formula gives
log p_θ(x) = log p₀(T_θ⁻¹(x)) + log |det J_{T_θ⁻¹}(x)|.
Training maximizes this exact log-likelihood.
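The change of variables formula can be checked on a one-dimensional affine map, where both the inverse and the Jacobian determinant are available in closed form. A minimal numpy sketch (the parameters a, b are illustrative, not from the text):

```python
import numpy as np

# Toy invertible map T_θ(z) = a·z + b with a ≠ 0 (a, b are illustrative).
a, b = 2.0, 0.5

def log_prob(x):
    """Exact log-likelihood of x under the pushforward of N(0, 1) through T_θ.

    Change of variables: log p(x) = log p₀(T⁻¹(x)) + log |det J_{T⁻¹}(x)|,
    with T⁻¹(x) = (x − b)/a and |det J_{T⁻¹}| = 1/|a|.
    """
    z = (x - b) / a
    log_p0 = -0.5 * (z**2 + np.log(2 * np.pi))  # standard normal log-density
    return log_p0 - np.log(abs(a))

# Sanity check: the pushforward of N(0,1) through x = a·z + b is N(b, a²),
# so log_prob must agree with that Gaussian's log-density.
x = 1.3
mu, sigma = b, abs(a)
expected = -0.5 * (((x - mu) / sigma) ** 2 + np.log(2 * np.pi)) - np.log(sigma)
assert np.isclose(log_prob(x), expected)
```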
For continuous normalizing flows (CNFs), with ODE dx/dt = f_θ(x, t),
the log-density along a trajectory evolves as
d log p(x(t))/dt = −tr(∂f_θ/∂x).
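The CNF log-density dynamics can be verified on a linear field where everything is analytic. A sketch under an assumed toy field f(x) = −x (so tr(∂f/∂x) = −1 in one dimension, and d log p/dt = +1):

```python
import numpy as np

# CNF toy: dx/dt = f(x) = -x, so tr(∂f/∂x) = -1 in 1-d and the
# instantaneous change of variables gives d log p/dt = -tr(∂f/∂x) = +1.
def integrate(x0, t1, steps=1000):
    """Euler-integrate the state and its log-density along the trajectory."""
    x = x0
    log_p = -0.5 * (x0**2 + np.log(2 * np.pi))  # log p₀ under N(0, 1)
    h = t1 / steps
    for _ in range(steps):
        x += h * (-x)     # dx/dt = f(x) = -x
        log_p += h * 1.0  # d log p/dt = -tr(∂f/∂x) = 1
    return x, log_p

# Analytic check: flowing N(0,1) under dx/dt = -x for time t gives N(0, e^{-2t}),
# and along the trajectory log p_t(x(t)) = log p₀(x₀) + t.
x0, t1 = 1.7, 0.5
x, log_p = integrate(x0, t1)
assert np.isclose(x, x0 * np.exp(-t1), atol=1e-3)
assert np.isclose(log_p, -0.5 * (x0**2 + np.log(2 * np.pi)) + t1, atol=1e-3)
```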
3. Diffusion ELBO and score matching
Forward SDE: dx = f(x, t) dt + g(t) dw.
Reverse-time SDE: dx = [f(x, t) − g(t)² ∇_x log p_t(x)] dt + g(t) dw̄, where w̄ is a reverse-time Wiener process.
The score model s_θ(x, t) ≈ ∇_x log p_t(x) is trained with the denoising score matching objective
E_{t, x₀, x_t} [ λ(t) ‖s_θ(x_t, t) − ∇_{x_t} log p_t(x_t | x₀)‖² ],
where the conditional score is available in closed form for Gaussian perturbation kernels.
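The objective can be made concrete on a toy variance-exploding perturbation x_t = x₀ + σε with data x₀ ~ N(0, 1), where the marginal is N(0, 1 + σ²) and the true score is s*(x) = −x/(1 + σ²). A sketch fitting a one-parameter linear score model by grid search (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy VE perturbation: x_t = x₀ + σ·ε with x₀ ~ N(0, 1), so p_t = N(0, 1 + σ²)
# and the conditional score is ∇ log p_t(x_t | x₀) = -(x_t - x₀)/σ².
sigma = 0.8
x0 = rng.standard_normal(200_000)
eps = rng.standard_normal(200_000)
xt = x0 + sigma * eps

def dsm_loss(theta):
    """DSM objective for a linear score model s_θ(x) = θ·x (λ(t) ≡ 1)."""
    target = -(xt - x0) / sigma**2
    return np.mean((theta * xt - target) ** 2)

# Minimizing DSM should recover the true marginal score coefficient -1/(1 + σ²),
# even though the regression target is the *conditional* score.
thetas = np.linspace(-2.0, 0.0, 401)
best = thetas[np.argmin([dsm_loss(t) for t in thetas])]
assert abs(best - (-1 / (1 + sigma**2))) < 0.02
```

The point of the check is the standard DSM identity: regressing on the conditional score yields the marginal score in expectation.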
4. Theorem: probability flow ODE equivalence
For the forward SDE above, define the ODE
dx/dt = f(x, t) − ½ g(t)² ∇_x log p_t(x).
The ODE has the same marginal densities p_t as the SDE.
The SDE density follows the Fokker-Planck PDE:
∂p_t/∂t = −∇·(f p_t) + ½ g² Δp_t.
For the deterministic ODE with velocity v(x, t) = f(x, t) − ½ g(t)² ∇_x log p_t(x), the continuity equation gives
∂p_t/∂t = −∇·(v p_t) = −∇·(f p_t) + ½ g² ∇·(p_t ∇ log p_t) = −∇·(f p_t) + ½ g² Δp_t,
using p_t ∇ log p_t = ∇p_t. The two PDEs match, so the marginals coincide. □
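The equivalence can be checked numerically on a 1-d Gaussian toy where the score is known in closed form. A sketch for the VP-type forward SDE dx = −½x dt + dw started at N(m₀, v₀), whose marginals are m_t = m₀e^{−t/2}, v_t = v₀e^{−t} + (1 − e^{−t}); the parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Forward SDE dx = -½x dt + dw (so g ≡ 1), started at N(m₀, v₀). Marginals stay
# Gaussian with m_t = m₀e^{-t/2}, v_t = v₀e^{-t} + (1 - e^{-t}), so the score is
# ∇ log p_t(x) = -(x - m_t)/v_t and the probability flow ODE reads
#   dx/dt = -½x - ½·∇ log p_t(x) = -½x + (x - m_t)/(2 v_t).
m0, v0 = 2.0, 0.25
n, T, steps = 100_000, 1.0, 1000
h = T / steps

x = m0 + np.sqrt(v0) * rng.standard_normal(n)
t = 0.0
for _ in range(steps):
    m_t = m0 * np.exp(-t / 2)
    v_t = v0 * np.exp(-t) + (1 - np.exp(-t))
    x += h * (-0.5 * x + (x - m_t) / (2 * v_t))  # deterministic Euler step
    t += h

# The ODE ensemble should land on the SDE's analytic marginal at time T.
m_T = m0 * np.exp(-T / 2)
v_T = v0 * np.exp(-T) + (1 - np.exp(-T))
assert abs(x.mean() - m_T) < 0.02
assert abs(x.var() - v_T) < 0.02
```

No noise is injected during integration, yet the ensemble reproduces the stochastic process's marginal mean and variance, which is exactly the theorem's claim.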
5. Relation to flows
The probability flow ODE induces a continuous flow with the instantaneous change of variables
d log p(x(t))/dt = −∇·v(x(t), t),
which is exactly the CNF machinery, but with the vector field tied to score dynamics. Diffusion can therefore be interpreted as learning a flow field through score estimation rather than through direct Jacobian-parameterized transport.
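This CNF view yields exact likelihoods from a diffusion's probability flow ODE. A sketch on a 1-d Gaussian toy with forward SDE dx = −½x dt + dw started at N(m₀, v₀), where the velocity v(x, t) = −½x + (x − m_t)/(2v_t) and its divergence ∂v/∂x = −½ + 1/(2v_t) are analytic (parameters illustrative):

```python
import numpy as np

# Likelihood via the probability flow ODE: integrate x(t) forward together with
# the accumulated divergence, then apply d log p(x(t))/dt = -∂v/∂x, i.e.
#   log p₀(x₀) = log p_T(x(T)) + ∫₀ᵀ ∂v/∂x dt.
# Toy setup: p₀ = N(m₀, v₀), m_t = m₀e^{-t/2}, v_t = v₀e^{-t} + (1 - e^{-t}).
m0, v0, T, steps = 2.0, 0.25, 1.0, 2000
h = T / steps

def log_normal(x, m, v):
    return -0.5 * ((x - m) ** 2 / v + np.log(2 * np.pi * v))

x, t = 1.5, 0.0
delta = 0.0  # accumulates ∫ ∂v/∂x dt along the trajectory
for _ in range(steps):
    m_t = m0 * np.exp(-t / 2)
    v_t = v0 * np.exp(-t) + (1 - np.exp(-t))
    x += h * (-0.5 * x + (x - m_t) / (2 * v_t))  # probability flow ODE step
    delta += h * (-0.5 + 1 / (2 * v_t))           # divergence of the velocity
    t += h

m_T = m0 * np.exp(-T / 2)
v_T = v0 * np.exp(-T) + (1 - np.exp(-T))
recovered = log_normal(x, m_T, v_T) + delta
assert abs(recovered - log_normal(1.5, m0, v0)) < 1e-2
```

In high dimensions the divergence is not analytic and is typically estimated with Hutchinson trace estimates, but the bookkeeping is identical.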
6. Likelihoods, speed, and inductive bias
- Flows: exact likelihood, invertibility constraints, sometimes less expressive per FLOP.
- Diffusion: flexible, strong sample quality, expensive sampling unless accelerated.
- Hybrid methods: flow matching, rectified flows, consistency distillation reduce sampling steps while preserving transport interpretation.
7. Convergence and discretization error
If the reverse dynamics are integrated with step size h, the weak error of Euler-Maruyama is O(h) under regularity assumptions; higher-order solvers can improve this to O(h^p) but require smoother score fields and accurate Jacobian-vector products.
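The first-order weak rate can be observed directly on an Ornstein-Uhlenbeck process dx = −x dt + dw, where the mean of the Euler-Maruyama iterates obeys m_{k+1} = (1 − h)m_k exactly, so the weak error in E[X_T] needs no Monte Carlo sampling (a sketch, with x₀ and T chosen for illustration):

```python
import numpy as np

# Weak error of Euler-Maruyama on dx = -x dt + dw: the iterate mean is
# x₀(1 - h)^{T/h}, while the true mean is x₀e^{-T}. First-order weak
# convergence predicts the error roughly halves when h halves.
x0, T = 1.0, 1.0

def weak_err(h):
    n = int(round(T / h))
    return abs(x0 * np.exp(-T) - x0 * (1 - h) ** n)

e1, e2 = weak_err(0.01), weak_err(0.005)
ratio = e1 / e2
assert 1.8 < ratio < 2.2  # consistent with O(h) weak error
```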
Key connections:
- Diffusion models can be viewed as learning continuous normalizing flows through score matching
- Probability flow ODEs provide deterministic sampling paths with the same marginals as the stochastic SDEs
- Hybrid methods combine the strengths of both paradigms for efficient high-quality generation