Forward and Backward Updates in Gradient Descent
Recall the equation of a step in gradient descent:

$$x_{i+1} = x_i - \tau \nabla f(x_i),$$

where $\tau > 0$ is the step size. And we can rewrite this into

$$\frac{x_{i+1} - x_i}{\tau} = -\nabla f(x_i).$$

Take $x_i = x(t)$ and $x_{i+1} = x(t + \Delta t)$ with $\Delta t = \tau$; we will have

$$\frac{x(t + \Delta t) - x(t)}{\Delta t} = -\nabla f(x(t)).$$

Consider the iteration of the descent procedure. For a very small step $\Delta t \to 0$, we have the forward equation, which is an ODE:

$$\frac{dx(t)}{dt} = -\nabla f(x(t)).$$

Similarly, we can write the reverse equation (compare with gradient ascent):

$$\frac{dx(t)}{dt} = \nabla f(x(t)).$$
Stochastic Differential Equation
If we introduce a noise term to the gradient descent algorithm, then the ODE will become a stochastic differential equation (SDE). If we denote the noise term by $z_i \sim \mathcal{N}(0, I)$, we will have (treating $x$ as a continuous function of time)

$$x(t + \Delta t) = x(t) - \tau \nabla f(x(t)) + z_i.$$

And we define a random process $w(t)$ such that $z_i = w(t + \Delta t) - w(t)$ for a very small $\Delta t$. In computation, we can generate $w(t)$ by integrating its increments. We can therefore write, in the limit $\Delta t \to 0$ (taking $\tau = \Delta t$),

$$dx(t) = -\nabla f(x(t))\, dt + dw(t).$$
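As a sketch of how such a $w(t)$ can be generated in computation, we can sum independent Gaussian increments of variance $\Delta t$. The scalar process, the interval $[0, 1]$, and $N = 1000$ steps are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
dt = 1.0 / N

# increments w(t + dt) - w(t) ~ N(0, dt): scale standard normals by sqrt(dt)
dw = np.sqrt(dt) * rng.standard_normal(N)

# "integrate" the increments to obtain a sample path of w(t), with w(0) = 0
w = np.concatenate([[0.0], np.cumsum(dw)])
```

The variance of $w(t)$ grows linearly in $t$, which is why each increment is scaled by $\sqrt{\Delta t}$ rather than $\Delta t$.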
This equation reveals a generic form of the SDE, that is,

$$dx(t) = f(x, t)\, dt + g(t)\, dw(t),$$

where (with a slight abuse of notation, $f(x, t)$ here denotes the drift, not the objective function), in terms of physics:
- The drift coefficient $f(x, t)$ defines how molecules in a closed system would move in the absence of random effects. For the gradient descent algorithm, the drift is defined by the negative gradient of the objective function. That is, we want the solution trajectory to follow the negative gradient of the objective.
- The diffusion coefficient $g(t)$ is a scalar function describing how the molecules would randomly walk from one position to another.
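The generic SDE above can be simulated with the Euler–Maruyama scheme. The following is a minimal sketch under assumed choices (a scalar state, drift $-x$ coming from the quadratic objective $\frac{1}{2}x^2$, and a constant diffusion level $0.1$; none of these come from the text):

```python
import numpy as np

def drift(x, t):
    return -x      # f(x, t): negative gradient of 0.5 * x**2 (assumed)

def diffusion(t):
    return 0.1     # g(t): constant diffusion level (assumed)

rng = np.random.default_rng(0)
N = 1000
dt = 1.0 / N

x = 5.0            # initial condition x(0)
for i in range(N):
    t = i * dt
    dw = np.sqrt(dt) * rng.standard_normal()       # Wiener increment
    x = x + drift(x, t) * dt + diffusion(t) * dw   # one Euler-Maruyama step
```

With the diffusion set to zero this reduces exactly to the gradient descent iteration; the noise term makes each run a different sample path around the deterministic trajectory.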
Stochastic Differential Equation for DDPM
We consider the discrete-time DDPM forward iteration, for $i = 1, 2, \ldots, N$ (see the DDPM forward update equation for details; here we define $z_{i-1} \sim \mathcal{N}(0, I)$):

$$x_i = \sqrt{1 - \beta_i}\, x_{i-1} + \sqrt{\beta_i}\, z_{i-1}.$$

And this iteration can be written as an SDE via

$$dx(t) = -\frac{\beta(t)}{2}\, x(t)\, dt + \sqrt{\beta(t)}\, dw(t).$$
This is because, if we define a step size $\Delta t = \frac{1}{N}$, we can rewrite $x_i$ as $x\!\left(\tfrac{i}{N}\right)$,

where we start at $x(0) = x_0$ and end at $x(1) = x_N$. Similarly, we can define $\beta(t)$ and $z(t)$ such that

$$\beta_i = \beta\!\left(\tfrac{i}{N}\right)\Delta t, \qquad z_i = z\!\left(\tfrac{i}{N}\right).$$
Hence, we have

$$\begin{aligned}
x\!\left(\tfrac{i}{N}\right) &= \sqrt{1 - \beta\!\left(\tfrac{i}{N}\right)\Delta t}\; x\!\left(\tfrac{i-1}{N}\right) + \sqrt{\beta\!\left(\tfrac{i}{N}\right)\Delta t}\; z\!\left(\tfrac{i-1}{N}\right) \\
&\approx \left(1 - \tfrac{1}{2}\,\beta\!\left(\tfrac{i}{N}\right)\Delta t\right) x\!\left(\tfrac{i-1}{N}\right) + \sqrt{\beta\!\left(\tfrac{i}{N}\right)\Delta t}\; z\!\left(\tfrac{i-1}{N}\right),
\end{aligned}$$

where the approximation uses the first-order Taylor expansion $\sqrt{1 - s} \approx 1 - \frac{s}{2}$ for small $s$.
As $\Delta t \to 0$, we have

$$dx(t) = -\frac{\beta(t)}{2}\, x(t)\, dt + \sqrt{\beta(t)}\, dw(t).$$
Being able to write the DDPM forward update iteration as an SDE means that the DDPM estimates can be obtained by solving the SDE. In other words, given an appropriately defined SDE solver, we can feed the SDE into the solver; the solution it returns will be the DDPM estimate.
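To illustrate the correspondence numerically, the sketch below compares one DDPM forward step with one Euler–Maruyama step of the SDE. The scalar sample, the constant schedule $\beta(t) = 0.5$, $N = 1000$, and the fixed noise draw are all assumptions made for demonstration:

```python
import numpy as np

N = 1000
dt = 1.0 / N
beta_t = 0.5             # assumed constant schedule beta(t)
beta_i = beta_t * dt     # discrete schedule: beta_i = beta(i/N) * dt

x = 1.0                  # current sample x(t)
z = 0.3                  # a fixed noise draw z(t)

# DDPM forward step: x_i = sqrt(1 - beta_i) x_{i-1} + sqrt(beta_i) z_{i-1}
x_ddpm = np.sqrt(1.0 - beta_i) * x + np.sqrt(beta_i) * z

# Euler-Maruyama step of dx = -(beta/2) x dt + sqrt(beta) dw, with dw = sqrt(dt) z
x_em = x - 0.5 * beta_t * x * dt + np.sqrt(beta_t * dt) * z

print(abs(x_ddpm - x_em))   # the two updates agree to first order in dt
```

The only difference between the two updates is $\sqrt{1 - \beta_i}$ versus its first-order expansion $1 - \frac{\beta_i}{2}$, which vanishes as $\Delta t \to 0$.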