Forward and Backward Updates in Gradient Descent

Recall the equation of a step in gradient descent:

$$\mathbf{x}_{i+1} = \mathbf{x}_i - \tau \nabla f(\mathbf{x}_i),$$

where $\tau$ is the step size. We can rewrite this into

$$\frac{\mathbf{x}_{i+1} - \mathbf{x}_i}{\tau} = -\nabla f(\mathbf{x}_i).$$

Take $\tau \to 0$, identifying $\mathbf{x}_i = \mathbf{x}(i\tau)$ for a continuous-time trajectory $\mathbf{x}(t)$, and we will have

$$\frac{d\mathbf{x}(t)}{dt} = -\nabla f(\mathbf{x}(t)).$$

Consider the iteration of the descent procedure. For a sufficiently small step, we have the forward equation, which is an ODE:

$$d\mathbf{x} = -\nabla f(\mathbf{x})\, dt.$$

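As a quick numerical sketch (the quadratic objective and the step size below are my own illustrative choices), the discrete gradient-descent iterates track the solution of the forward ODE:

```python
# Gradient descent on f(x) = x^2 / 2, so grad f(x) = x.
# The forward ODE dx/dt = -x has the exact solution x(t) = x0 * exp(-t).
import math

x0, tau, n_steps = 1.0, 1e-3, 1000   # small step => iterates stay close to the ODE

x = x0
for _ in range(n_steps):
    x = x - tau * x                  # x_{i+1} = x_i - tau * grad f(x_i)

t = tau * n_steps                    # elapsed "continuous" time
x_ode = x0 * math.exp(-t)

print(abs(x - x_ode))                # discretization error; shrinks with tau
```

The gap between the discrete iterate and the ODE solution is of order $\tau$, which is why the continuum limit is taken with $\tau \to 0$.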
Similarly, we can write the reverse equation by flipping the sign of the drift (compare with gradient ascent):

$$d\mathbf{x} = \nabla f(\mathbf{x})\, dt.$$

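A small sketch of the forward/reverse pair (again with my own toy objective $f(x) = x^2/2$): running forward Euler steps of the descent ODE and then reverse steps with the flipped sign approximately retraces the trajectory back to the start.

```python
# Forward Euler on dx = -grad f(x) dt, then on the reverse equation
# dx = +grad f(x) dt, with f(x) = x^2 / 2. The reverse pass approximately
# undoes the forward pass, up to O(tau) discretization error.
x0, tau, n = 2.0, 1e-3, 500

x = x0
for _ in range(n):
    x = x - tau * x      # forward: descent
for _ in range(n):
    x = x + tau * x      # reverse: ascent

print(abs(x - x0))       # small mismatch from discretization
```
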
Stochastic Differential Equation

If we introduce a noise term into the gradient descent algorithm, then the ODE will become a stochastic differential equation (SDE). If we denote the noise term by $\mathbf{z}_i \sim \mathcal{N}(0, \mathbf{I})$, we will have (treating $\mathbf{x}(t)$ as a continuous function)

$$\mathbf{x}_{i+1} = \mathbf{x}_i - \tau \nabla f(\mathbf{x}_i) + \mathbf{z}_i.$$

And we define a random process $\mathbf{w}(t)$ such that $\mathbf{z}_i = \mathbf{w}(t + \Delta t) - \mathbf{w}(t)$ for a very small $\Delta t$. In computation, we can generate such a $\mathbf{w}(t)$ by integrating the white noise $\mathbf{z}(t)$. We can therefore write

$$d\mathbf{x} = -\nabla f(\mathbf{x})\, dt + d\mathbf{w}.$$

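A minimal sketch of this construction (the objective, step size, and horizon are my own choices; the Wiener increments are drawn with variance $dt$, the standard discretization): $\mathbf{w}(t)$ is built by accumulating Gaussian increments, and each noisy descent step adds one such increment.

```python
# Sketch: build the random process w(t) by accumulating Gaussian increments
# dw ~ N(0, dt), then run the noisy descent dx = -grad f(x) dt + dw
# on f(x) = x^2 / 2 (one Euler step of the SDE per iteration).
import random
random.seed(0)

dt, n = 1e-3, 2000
x, w = 5.0, 0.0
for _ in range(n):
    dw = random.gauss(0.0, dt ** 0.5)  # increment of the Wiener process
    w += dw                            # w(t) = running integral of white noise
    x += -x * dt + dw                  # one step of dx = -grad f(x) dt + dw

print(x)  # hovers near the minimum at 0, perturbed by the noise
```

Unlike plain gradient descent, the iterate no longer converges to the minimizer; it fluctuates around it with a variance set by the noise.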
This equation reveals the generic form of an SDE, that is,

$$d\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\, dt + g(t)\, d\mathbf{w},$$

where, in terms of physics:

  • The drift coefficient $\mathbf{f}(\mathbf{x}, t)$ defines how molecules in a closed system would move in the absence of random effects. For the gradient descent algorithm, the drift is the negative gradient of the objective function; that is, we want the solution trajectory to follow the gradient of the objective.
  • The diffusion coefficient $g(t)$ is a scalar function describing how the molecules would randomly walk from one position to another.
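
The generic form lends itself to a generic integrator. Below is a minimal Euler–Maruyama sketch (the function name, the example drift $\mathbf{f}(x,t) = -x$, and the constant diffusion $g(t) = 0.5$ are my own illustrative choices):

```python
# Minimal Euler-Maruyama integrator for dx = f(x, t) dt + g(t) dw.
import random

def euler_maruyama(f, g, x0, t0, t1, n_steps, rng):
    """Integrate the scalar SDE from t0 to t1 and return x(t1)."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, dt ** 0.5)   # Wiener increment, variance dt
        x = x + f(x, t) * dt + g(t) * dw
        t += dt
    return x

rng = random.Random(0)
# Drift pulls toward 0 (negative gradient of x^2 / 2); constant diffusion.
x_final = euler_maruyama(lambda x, t: -x, lambda t: 0.5, 2.0, 0.0, 3.0, 3000, rng)
print(x_final)
```

Setting $g(t) = 0$ recovers plain forward Euler on the deterministic drift ODE.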

Stochastic Differential Equation for DDPM

We consider the discrete-time DDPM forward iteration, for $i = 1, 2, \ldots, N$ (check the DDPM forward update equation for details; here we define the noise $\mathbf{z}_{i-1} \sim \mathcal{N}(0, \mathbf{I})$):

$$\mathbf{x}_i = \sqrt{1 - \beta_i}\, \mathbf{x}_{i-1} + \sqrt{\beta_i}\, \mathbf{z}_{i-1}.$$

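A one-dimensional sketch of this iteration (the constant schedule $\beta_i = 0.01$ and the starting point are my own illustrative choices): the contribution of $\mathbf{x}_0$ shrinks by a factor $\sqrt{1 - \beta_i}$ per step, so after many steps the iterate is essentially pure Gaussian noise.

```python
# One run of the discrete DDPM forward iteration
#   x_i = sqrt(1 - beta_i) * x_{i-1} + sqrt(beta_i) * z_{i-1},  z ~ N(0, I),
# in one dimension with a constant schedule beta_i (illustrative choice).
import random
rng = random.Random(0)

N, beta, x0 = 1000, 0.01, 4.0
x, signal = x0, 1.0
for _ in range(N):
    x = (1 - beta) ** 0.5 * x + beta ** 0.5 * rng.gauss(0.0, 1.0)
    signal *= (1 - beta) ** 0.5   # surviving fraction of the x0 contribution

print(signal * x0, x)  # the x0 contribution is tiny; x is roughly N(0, 1)
```
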
And this sampling equation can be written as an SDE via

$$d\mathbf{x} = -\frac{\beta(t)}{2}\, \mathbf{x}\, dt + \sqrt{\beta(t)}\, d\mathbf{w}.$$

This is because if we define a step size $\Delta t = \frac{1}{N}$, we can rewrite the iteration as

$$\mathbf{x}(t + \Delta t) = \sqrt{1 - \beta(t + \Delta t)\Delta t}\; \mathbf{x}(t) + \sqrt{\beta(t + \Delta t)\Delta t}\; \mathbf{z}(t),$$

where we start by identifying $\mathbf{x}_i = \mathbf{x}\!\left(\tfrac{i}{N}\right)$ and $\beta_i = \beta\!\left(\tfrac{i}{N}\right)\Delta t$. Similarly, we can define $\mathbf{z}_i = \mathbf{z}\!\left(\tfrac{i}{N}\right)$. Hence, using the first-order approximation $\sqrt{1 - s} \approx 1 - \tfrac{s}{2}$ for small $s$, we have

$$
\begin{aligned}
\mathbf{x}(t + \Delta t)
&= \sqrt{1 - \beta(t + \Delta t)\Delta t}\; \mathbf{x}(t) + \sqrt{\beta(t + \Delta t)\Delta t}\; \mathbf{z}(t) \\
&\approx \mathbf{x}(t) - \frac{\beta(t + \Delta t)\Delta t}{2}\, \mathbf{x}(t) + \sqrt{\beta(t + \Delta t)\Delta t}\; \mathbf{z}(t) \\
&\approx \mathbf{x}(t) - \frac{\beta(t)\Delta t}{2}\, \mathbf{x}(t) + \sqrt{\beta(t)\Delta t}\; \mathbf{z}(t).
\end{aligned}
$$

As $\Delta t \to 0$, we have

$$d\mathbf{x} = -\frac{\beta(t)}{2}\, \mathbf{x}\, dt + \sqrt{\beta(t)}\, d\mathbf{w}.$$

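The only approximation in the derivation is the first-order expansion $\sqrt{1 - s} \approx 1 - \tfrac{s}{2}$ with $s = \beta(t)\Delta t$. A quick numerical check (the value of $\beta$ is an arbitrary illustrative choice) confirms the gap vanishes quadratically as $\Delta t \to 0$:

```python
# The derivation uses sqrt(1 - s) ~ 1 - s/2 for small s = beta(t) * dt.
# The error of this expansion is O(s^2), i.e., O(dt^2), so it vanishes
# faster than the O(dt) terms kept in the SDE limit.
beta = 0.8  # arbitrary illustrative value of beta(t)

gaps = []
for dt in (1e-1, 1e-2, 1e-3):
    s = beta * dt
    gaps.append(abs((1 - s) ** 0.5 - (1 - s / 2)))

print(gaps)  # each entry roughly 100x smaller than the previous
```
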
Being able to write the DDPM forward update iteration as an SDE means that the DDPM estimates can be obtained by solving the SDE. In other words, given an appropriately defined SDE solver, we can feed the SDE into the solver, and the solution it returns is the DDPM estimate.
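
As a sketch of this idea (with Euler–Maruyama standing in for the SDE solver, a constant $\beta(t)$, and all parameters my own simplifications): integrating $d\mathbf{x} = -\tfrac{\beta}{2}\mathbf{x}\,dt + \sqrt{\beta}\,d\mathbf{w}$ for long enough should produce samples whose statistics match the DDPM forward marginal, i.e., approximately $\mathcal{N}(0, 1)$ regardless of the starting point.

```python
# Treat Euler-Maruyama as the "SDE solver" for the DDPM forward SDE
#   dx = -(beta / 2) x dt + sqrt(beta) dw
# with constant beta (a simplification). For x(0) = x0 the marginal is
# N(x0 * exp(-beta * t / 2), 1 - exp(-beta * t)), so for large beta * t
# the samples look like N(0, 1) regardless of x0.
import random
rng = random.Random(0)

beta, dt, n_steps, x0 = 2.0, 5e-3, 2000, 3.0   # beta * t = 20
samples = []
for _ in range(300):
    x = x0
    for _ in range(n_steps):
        x += -0.5 * beta * x * dt + (beta * dt) ** 0.5 * rng.gauss(0.0, 1.0)
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # close to 0 and 1
```
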