Where is the probability going?
Score functions point uphill. Velocity fields show the current.
Score-based diffusion and flow matching can generate the same distributions. The difference is what the arrows mean. Score fields show grad_x log p(x), the direction in which density increases fastest. Velocity fields show dx/dt, the direction in which probability mass is actually moving. Closely related math, different intuition. One tells you where the peaks are. The other tells you how to get there.
Lipman et al. (2023) showed that the probability paths used by score-based diffusion are a special case of the Gaussian paths that flow matching can be trained on, so the two approaches are closely related. Both define a continuous path from noise to data. Score matching learns the gradient of the log-density along that path. Flow matching learns the velocity field that transports samples along the path directly.
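The link between the two fields can be checked numerically in a toy setting. The sketch below is not the demo on this page: it assumes a 1D problem, a standard normal source N(0, 1) (not the N(0.5, 0.2^2) used in the figure), point-mass targets, and the linear path x_t = (1-t)*z + t*x. Under those assumptions the marginal density is a Gaussian mixture, and the velocity can be recovered from the score as v_t(x) = x/t + ((1-t)/t) * score_t(x). All function names are illustrative.

```python
import numpy as np

def posterior_weights(x, t, targets):
    # With source N(0, 1) and a point-mass target mu_k, the path position
    # x_t = (1 - t) * z + t * mu_k is distributed as N(t * mu_k, (1 - t)^2).
    logw = -(x - t * targets) ** 2 / (2.0 * (1.0 - t) ** 2)
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    return w / w.sum()

def score(x, t, targets):
    # d/dx log p_t(x) for the mixture of N(t * mu_k, (1 - t)^2) components.
    w = posterior_weights(x, t, targets)
    return np.sum(w * (t * targets - x)) / (1.0 - t) ** 2

def velocity(x, t, targets):
    # E[target - source | x_t = x]; given target mu_k the source is
    # (x - t * mu_k) / (1 - t), so the conditional velocity is (mu_k - x) / (1 - t).
    w = posterior_weights(x, t, targets)
    return np.sum(w * (targets - x)) / (1.0 - t)

def velocity_from_score(x, t, targets):
    # Conversion that holds for this path and a standard normal source only.
    return x / t + (1.0 - t) / t * score(x, t, targets)

targets = np.array([-2.0, 0.5, 3.0])  # hypothetical target locations
print(velocity(0.3, 0.5, targets), velocity_from_score(0.3, 0.5, targets))
```

The two printed values agree up to floating-point error for any 0 < t < 1. With a different source distribution or a different path, the conversion formula changes.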
The velocity formulation has a practical advantage: it uses the conditional flow matching objective, which tends to have lower variance than denoising score matching. The loss is simply ||v_theta(x,t) - u_t(x|z)||^2, averaged over times t, conditioning samples z, and points x on the conditional path, where u_t(x|z) is the conditional velocity along the optimal transport path. No score estimation, no Stein's identity, no noise level weighting. Just regress the velocity.
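As a rough illustration of how little machinery the objective needs, here is a minimal NumPy sketch of a Monte Carlo estimate of the loss for the straight-line path. The function name cfm_loss and the model interface v_theta(points, times) are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def cfm_loss(v_theta, z, x, t):
    """Monte Carlo estimate of the conditional flow matching loss.

    z: source samples, shape (N, d)
    x: target samples, shape (N, d)
    t: times in [0, 1], shape (N,)
    v_theta: callable (points, times) -> predicted velocities, shape (N, d)
             (hypothetical model interface)
    """
    t_col = t[:, None]
    x_t = (1.0 - t_col) * z + t_col * x  # position on the straight-line path
    u = x - z                            # conditional velocity (regression target)
    pred = v_theta(x_t, t)
    return np.mean(np.sum((pred - u) ** 2, axis=1))

rng = np.random.default_rng(0)
z = 0.5 + 0.2 * rng.standard_normal((256, 2))  # source N(0.5, 0.2^2) per coordinate
x = rng.standard_normal((256, 2)) + 3.0        # stand-in target samples (hypothetical)
t = rng.uniform(0.0, 1.0, size=256)

# A model that always predicts zero velocity gives the baseline loss E||x - z||^2.
print(cfm_loss(lambda p, s: np.zeros_like(p), z, x, t))
```

In a real training loop v_theta would be a neural network and the mean would be minimized by gradient descent. The estimate itself only needs paired samples and a uniform draw of t.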
conditional flow matching on 5 gaussian targets · particles start from source N(0.5, 0.2^2) and follow the velocity field v(x,t) via ODE integration · arrows show where probability mass flows, not the density gradient · streamlines show trajectories through time · compare with the score field below
Optimal Transport Path
Given source sample z ~ p_0 and target sample x ~ p_1, the conditional flow is psi_t(x|z) = (1-t)*z + t*x. At t=0 you have noise. At t=1 you have data. The interpolation is linear — the simplest possible transport.
Conditional Velocity
The velocity along the path is u_t(x|z) = d/dt psi_t(x|z) = x - z. It's constant in time — the particle moves at uniform speed from source to target. The neural network learns to predict this velocity given only the current position and time.
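The path and its velocity are simple enough to verify directly. The sketch below uses two hypothetical sample points and compares a central finite difference of psi_t against the closed form x - z at several times.

```python
import numpy as np

def psi(t, z, x):
    # Straight-line conditional path from source z (t = 0) to target x (t = 1).
    return (1.0 - t) * z + t * x

def conditional_velocity(z, x):
    # Time derivative of psi; it does not depend on t.
    return x - z

z = np.array([0.5, 0.5])   # hypothetical source sample
x = np.array([2.0, -1.0])  # hypothetical target sample

h = 1e-6
for t in (0.1, 0.5, 0.9):
    finite_diff = (psi(t + h, z, x) - psi(t - h, z, x)) / (2.0 * h)
    print(t, finite_diff, conditional_velocity(z, x))
```

Every row prints approximately the same vector, which is the "constant in time" property in numerical form.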
Marginal Velocity Field
The marginal v_t(x) = E[u_t | psi_t = x] averages the conditional velocities of all source-target pairs whose paths pass through the point x at time t (here x denotes a location in space, not a target sample). This is the field you see above: the expected direction of transport at every point in space, accounting for the full mixture of possible origins and destinations.
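For a simplified setting the conditional expectation has a closed form, which makes the averaging explicit. The sketch below assumes an isotropic Gaussian source and targets that are point masses with equal weight (a simplification of the Gaussian targets in the figure). Under those assumptions each target contributes the velocity (mu_k - x) / (1 - t), weighted by the posterior probability that a particle at x is heading to mu_k. The function name and arguments are illustrative.

```python
import numpy as np

def marginal_velocity(x, t, targets, src_mean, src_std):
    """v_t(x) = E[target - source | path position = x], valid for 0 <= t < 1.

    Assumes point-mass targets with equal weights and an isotropic Gaussian
    source N(src_mean, src_std^2 I); these are simplifications for illustration.
    """
    x = np.asarray(x, dtype=float)               # (d,)
    targets = np.asarray(targets, dtype=float)   # (K, d)
    # Given target k, the path position is Gaussian with this mean and variance.
    means = (1.0 - t) * src_mean + t * targets   # (K, d)
    var = ((1.0 - t) * src_std) ** 2
    # Posterior probability of each target given the current position x.
    logw = -np.sum((x - means) ** 2, axis=1) / (2.0 * var)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Conditional velocity toward each target, then the weighted average.
    cond_vel = (targets - x) / (1.0 - t)         # (K, d)
    return w @ cond_vel

targets = [[1.0, 0.0], [-1.0, 0.0]]  # hypothetical target locations
print(marginal_velocity([0.0, 0.0], 0.5, targets, src_mean=0.0, src_std=1.0))
print(marginal_velocity([0.4, 0.0], 0.5, targets, src_mean=0.0, src_std=1.0))
```

At the midpoint between two symmetric targets the pulls cancel and the marginal velocity is zero. Slightly off-center, the nearer target dominates the average.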
compare — score field (langevin dynamics)
same 5 gaussian targets, same convergence · but arrows now show grad_x log p(x) — the score function · particles follow langevin dynamics with noise injection · notice: score arrows point toward density peaks (uphill), velocity arrows point along transport trajectories (flow direction) · the distinction matters at low density
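The Langevin update behind this kind of comparison can be sketched in a few lines. The example assumes an equal-weight isotropic Gaussian mixture as the target. The five mean locations, the width, the step size, and the number of steps are hypothetical values chosen for illustration, not the settings used in the figure.

```python
import numpy as np

def gmm_score(x, means, std):
    """grad_x log p(x) for an equal-weight isotropic Gaussian mixture.

    x: points, shape (N, d); means: component means, shape (K, d).
    """
    diff = means[None, :, :] - x[:, None, :]              # (N, K, d)
    logr = -np.sum(diff ** 2, axis=2) / (2.0 * std ** 2)  # (N, K)
    logr -= logr.max(axis=1, keepdims=True)               # numerical stability
    r = np.exp(logr)
    r /= r.sum(axis=1, keepdims=True)                     # responsibilities
    return np.sum(r[:, :, None] * diff, axis=1) / std ** 2

def langevin_step(x, means, std, step, rng):
    # Unadjusted Langevin update: drift uphill along the score, plus injected noise.
    noise = rng.standard_normal(x.shape)
    return x + step * gmm_score(x, means, std) + np.sqrt(2.0 * step) * noise

rng = np.random.default_rng(0)
angles = 2.0 * np.pi * np.arange(5) / 5.0
means = 2.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # hypothetical targets
x = 0.5 + 0.2 * rng.standard_normal((500, 2))                     # source N(0.5, 0.2^2)
for _ in range(300):
    x = langevin_step(x, means, std=0.3, step=0.005, rng=rng)
print(x.mean(axis=0))
```

Far from every mode the responsibilities collapse onto the nearest component, so the score is just a pull toward that one peak. It carries no information about how mass should be divided among the targets, which is one way to read the low-density remark in the caption.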
Stable Diffusion 3, Flux, and many other recent image generators use flow matching rather than score-based diffusion. The velocity formulation trains faster (lower-variance gradients), generates straighter ODE trajectories (fewer steps needed), and naturally supports optimal transport coupling between source and target distributions.
The score formulation requires careful noise level weighting (lambda(sigma) in the loss), struggles in low-density regions (score estimation is poor where p(x) is near zero), and needs more denoising steps because SDE trajectories curve. Flow matching trajectories are straighter by construction: each conditional optimal transport path is a straight line, although the marginal trajectories can still bend where conditional paths cross.
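The effect of straightness on step count can be seen with a plain Euler integrator. The sketch below compares two hand-written toy fields, neither of them a learned model: a straight-line field toward a single hypothetical target, and a rotation field whose trajectories are circular arcs.

```python
import numpy as np

def euler_integrate(velocity, x0, n_steps):
    # Fixed-step Euler integration of dx/dt = velocity(x, t) from t = 0 to t = 1.
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = i * h  # the field is never evaluated at t = 1
        x = x + h * velocity(x, t)
    return x

target = np.array([2.0, -1.0])  # hypothetical single target

def straight_field(x, t):
    # Marginal field for one point-mass target: constant speed along a line.
    return (target - x) / (1.0 - t)

def rotation_field(x, t):
    # Curved toy field: rotates a point by 90 degrees over t in [0, 1].
    omega = np.pi / 2.0
    return omega * np.array([-x[1], x[0]])

for n in (1, 4, 64):
    straight_err = np.linalg.norm(euler_integrate(straight_field, [0.5, 0.5], n) - target)
    curved_err = np.linalg.norm(euler_integrate(rotation_field, [1.0, 0.0], n) - np.array([0.0, 1.0]))
    print(n, straight_err, curved_err)
```

On the straight field a single Euler step already lands on the target, while the error on the curved field shrinks only as the number of steps grows. Learned marginal fields sit somewhere between these two extremes.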
Training loss variance · 10-100x lower than score matching · Conditional FM objective eliminates score estimation noise
ODE steps · 20-50 vs 100-1000 for SDE · Straighter trajectories need fewer integration steps
Adopters · SD3, Flux, Genie 2, MovieGen · Industry-standard for image and video generation