Forward and Backward Updates in Gradient Descent

Recall the equation of a step in gradient descent:

$$\mathbf{x}_{i+1} = \mathbf{x}_i - \tau \nabla f(\mathbf{x}_i),$$

where $\tau$ is the step size. We can rewrite this into

$$\frac{\mathbf{x}_{i+1} - \mathbf{x}_i}{\tau} = -\nabla f(\mathbf{x}_i).$$

Take $\tau \to 0$, identifying $\mathbf{x}_i = \mathbf{x}(i\tau)$ for a continuous-time trajectory $\mathbf{x}(t)$, and we will have

$$\frac{d\mathbf{x}(t)}{dt} = -\nabla f(\mathbf{x}(t)).$$

Consider the iteration of the descent procedure. For a sufficiently small step, we have the forward equation, which is an ODE:

$$d\mathbf{x} = -\nabla f(\mathbf{x})\, dt.$$

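As a quick numerical sketch (the quadratic objective and the step size below are my own illustrative choices), the discrete gradient-descent iterates track the solution of the forward ODE:

```python
# Gradient descent on f(x) = x^2 / 2, so grad f(x) = x.
# The forward ODE dx/dt = -x has the exact solution x(t) = x0 * exp(-t).
import math

x0, tau, n_steps = 1.0, 1e-3, 1000   # small step => iterates stay close to the ODE

x = x0
for _ in range(n_steps):
    x = x - tau * x                  # x_{i+1} = x_i - tau * grad f(x_i)

t = tau * n_steps                    # elapsed "continuous" time
x_ode = x0 * math.exp(-t)

print(abs(x - x_ode))                # discretization error; shrinks with tau
```

The gap between the discrete iterate and the ODE solution is of order $\tau$, which is why the continuum limit is taken with $\tau \to 0$.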
Similarly, we can write the reverse equation by flipping the sign of the drift (compare with gradient ascent):

$$d\mathbf{x} = \nabla f(\mathbf{x})\, dt.$$

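A small sketch of the forward/reverse pair (again with my own toy objective $f(x) = x^2/2$): running forward Euler steps of the descent ODE and then reverse steps with the flipped sign approximately retraces the trajectory back to the start.

```python
# Forward Euler on dx = -grad f(x) dt, then on the reverse equation
# dx = +grad f(x) dt, with f(x) = x^2 / 2. The reverse pass approximately
# undoes the forward pass, up to O(tau) discretization error.
x0, tau, n = 2.0, 1e-3, 500

x = x0
for _ in range(n):
    x = x - tau * x      # forward: descent
for _ in range(n):
    x = x + tau * x      # reverse: ascent

print(abs(x - x0))       # small mismatch from discretization
```
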
Stochastic Differential Equation

If we introduce a noise term into the gradient descent algorithm, then the ODE will become a stochastic differential equation (SDE). If we denote the noise term by $\mathbf{z}_i \sim \mathcal{N}(0, \mathbf{I})$, we will have (treating $\mathbf{x}(t)$ as a continuous function)

$$\mathbf{x}_{i+1} = \mathbf{x}_i - \tau \nabla f(\mathbf{x}_i) + \mathbf{z}_i.$$

And we define a random process $\mathbf{w}(t)$ such that $\mathbf{z}_i = \mathbf{w}(t + \Delta t) - \mathbf{w}(t)$ for a very small $\Delta t$. In computation, we can generate such a $\mathbf{w}(t)$ by integrating the white noise $\mathbf{z}(t)$. We can therefore write

$$d\mathbf{x} = -\nabla f(\mathbf{x})\, dt + d\mathbf{w}.$$

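A minimal sketch of this construction (the objective, step size, and horizon are my own choices; the Wiener increments are drawn with variance $dt$, the standard discretization): $\mathbf{w}(t)$ is built by accumulating Gaussian increments, and each noisy descent step adds one such increment.

```python
# Sketch: build the random process w(t) by accumulating Gaussian increments
# dw ~ N(0, dt), then run the noisy descent dx = -grad f(x) dt + dw
# on f(x) = x^2 / 2 (one Euler step of the SDE per iteration).
import random
random.seed(0)

dt, n = 1e-3, 2000
x, w = 5.0, 0.0
for _ in range(n):
    dw = random.gauss(0.0, dt ** 0.5)  # increment of the Wiener process
    w += dw                            # w(t) = running integral of white noise
    x += -x * dt + dw                  # one step of dx = -grad f(x) dt + dw

print(x)  # hovers near the minimum at 0, perturbed by the noise
```

Unlike plain gradient descent, the iterate no longer converges to the minimizer; it fluctuates around it with a variance set by the noise.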
This equation reveals the generic form of an SDE, that is,

$$d\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\, dt + g(t)\, d\mathbf{w},$$

where, in terms of physics:

  • The drift coefficient $\mathbf{f}(\mathbf{x}, t)$ defines how molecules in a closed system would move in the absence of random effects. For the gradient descent algorithm, the drift is the negative gradient of the objective function; that is, we want the solution trajectory to follow the gradient of the objective.
  • The diffusion coefficient $g(t)$ is a scalar function describing how the molecules would randomly walk from one position to another.
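
The generic form lends itself to a generic integrator. Below is a minimal Euler–Maruyama sketch (the function name, the example drift $\mathbf{f}(x,t) = -x$, and the constant diffusion $g(t) = 0.5$ are my own illustrative choices):

```python
# Minimal Euler-Maruyama integrator for dx = f(x, t) dt + g(t) dw.
import random

def euler_maruyama(f, g, x0, t0, t1, n_steps, rng):
    """Integrate the scalar SDE from t0 to t1 and return x(t1)."""
    dt = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, dt ** 0.5)   # Wiener increment, variance dt
        x = x + f(x, t) * dt + g(t) * dw
        t += dt
    return x

rng = random.Random(0)
# Drift pulls toward 0 (negative gradient of x^2 / 2); constant diffusion.
x_final = euler_maruyama(lambda x, t: -x, lambda t: 0.5, 2.0, 0.0, 3.0, 3000, rng)
print(x_final)
```

Setting $g(t) = 0$ recovers plain forward Euler on the deterministic drift ODE.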

Stochastic Differential Equation for DDPM

We consider the discrete-time DDPM forward iteration, for $i = 1, 2, \ldots, N$ (check the DDPM forward update equation for details; here we define the noise $\mathbf{z}_{i-1} \sim \mathcal{N}(0, \mathbf{I})$):

$$\mathbf{x}_i = \sqrt{1 - \beta_i}\, \mathbf{x}_{i-1} + \sqrt{\beta_i}\, \mathbf{z}_{i-1}.$$

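A one-dimensional sketch of this iteration (the constant schedule $\beta_i = 0.01$ and the starting point are my own illustrative choices): the contribution of $\mathbf{x}_0$ shrinks by a factor $\sqrt{1 - \beta_i}$ per step, so after many steps the iterate is essentially pure Gaussian noise.

```python
# One run of the discrete DDPM forward iteration
#   x_i = sqrt(1 - beta_i) * x_{i-1} + sqrt(beta_i) * z_{i-1},  z ~ N(0, I),
# in one dimension with a constant schedule beta_i (illustrative choice).
import random
rng = random.Random(0)

N, beta, x0 = 1000, 0.01, 4.0
x, signal = x0, 1.0
for _ in range(N):
    x = (1 - beta) ** 0.5 * x + beta ** 0.5 * rng.gauss(0.0, 1.0)
    signal *= (1 - beta) ** 0.5   # surviving fraction of the x0 contribution

print(signal * x0, x)  # the x0 contribution is tiny; x is roughly N(0, 1)
```
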
And this sampling equation can be written as an SDE via

$$d\mathbf{x} = -\frac{\beta(t)}{2}\, \mathbf{x}\, dt + \sqrt{\beta(t)}\, d\mathbf{w}.$$

This is because if we define a step size $\Delta t = \frac{1}{N}$, we can rewrite the iteration as

$$\mathbf{x}(t + \Delta t) = \sqrt{1 - \beta(t + \Delta t)\Delta t}\; \mathbf{x}(t) + \sqrt{\beta(t + \Delta t)\Delta t}\; \mathbf{z}(t),$$

where we start by identifying $\mathbf{x}_i = \mathbf{x}\!\left(\tfrac{i}{N}\right)$ and $\beta_i = \beta\!\left(\tfrac{i}{N}\right)\Delta t$. Similarly, we can define $\mathbf{z}_i = \mathbf{z}\!\left(\tfrac{i}{N}\right)$. Hence, using the first-order approximation $\sqrt{1 - s} \approx 1 - \tfrac{s}{2}$ for small $s$, we have

$$
\begin{aligned}
\mathbf{x}(t + \Delta t)
&= \sqrt{1 - \beta(t + \Delta t)\Delta t}\; \mathbf{x}(t) + \sqrt{\beta(t + \Delta t)\Delta t}\; \mathbf{z}(t) \\
&\approx \mathbf{x}(t) - \frac{\beta(t + \Delta t)\Delta t}{2}\, \mathbf{x}(t) + \sqrt{\beta(t + \Delta t)\Delta t}\; \mathbf{z}(t) \\
&\approx \mathbf{x}(t) - \frac{\beta(t)\Delta t}{2}\, \mathbf{x}(t) + \sqrt{\beta(t)\Delta t}\; \mathbf{z}(t).
\end{aligned}
$$

As $\Delta t \to 0$, we have

$$d\mathbf{x} = -\frac{\beta(t)}{2}\, \mathbf{x}\, dt + \sqrt{\beta(t)}\, d\mathbf{w}.$$

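The only approximation in the derivation is the first-order expansion $\sqrt{1 - s} \approx 1 - \tfrac{s}{2}$ with $s = \beta(t)\Delta t$. A quick numerical check (the value of $\beta$ is an arbitrary illustrative choice) confirms the gap vanishes quadratically as $\Delta t \to 0$:

```python
# The derivation uses sqrt(1 - s) ~ 1 - s/2 for small s = beta(t) * dt.
# The error of this expansion is O(s^2), i.e., O(dt^2), so it vanishes
# faster than the O(dt) terms kept in the SDE limit.
beta = 0.8  # arbitrary illustrative value of beta(t)

gaps = []
for dt in (1e-1, 1e-2, 1e-3):
    s = beta * dt
    gaps.append(abs((1 - s) ** 0.5 - (1 - s / 2)))

print(gaps)  # each entry roughly 100x smaller than the previous
```
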
Being able to write the DDPM forward update iteration as an SDE means that the DDPM estimates can be obtained by solving the SDE. In other words, given an appropriately defined SDE solver, we can feed the SDE into the solver, and the solution it returns is the DDPM estimate.
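
As a sketch of this idea (with Euler–Maruyama standing in for the SDE solver, a constant $\beta(t)$, and all parameters my own simplifications): integrating $d\mathbf{x} = -\tfrac{\beta}{2}\mathbf{x}\,dt + \sqrt{\beta}\,d\mathbf{w}$ for long enough should produce samples whose statistics match the DDPM forward marginal, i.e., approximately $\mathcal{N}(0, 1)$ regardless of the starting point.

```python
# Treat Euler-Maruyama as the "SDE solver" for the DDPM forward SDE
#   dx = -(beta / 2) x dt + sqrt(beta) dw
# with constant beta (a simplification). For x(0) = x0 the marginal is
# N(x0 * exp(-beta * t / 2), 1 - exp(-beta * t)), so for large beta * t
# the samples look like N(0, 1) regardless of x0.
import random
rng = random.Random(0)

beta, dt, n_steps, x0 = 2.0, 5e-3, 2000, 3.0   # beta * t = 20
samples = []
for _ in range(300):
    x = x0
    for _ in range(n_steps):
        x += -0.5 * beta * x * dt + (beta * dt) ** 0.5 * rng.gauss(0.0, 1.0)
    samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # close to 0 and 1
```
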