Abandon Bayes and Markov
In the review of DDPM, to get the distribution $q(x_{t-1} \mid x_t, x_0)$, we applied Bayes' Theorem:

$$q(x_{t-1} \mid x_t, x_0) = \frac{q(x_t \mid x_{t-1}, x_0)\, q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}$$

where we set $q(x_t \mid x_{t-1})$ as a series of Gaussian distributions, so that we can derive $q(x_t \mid x_0)$ and $q(x_{t-1} \mid x_0)$ from it and therefore obtain $q(x_{t-1} \mid x_t, x_0)$.
The Gaussian we chose for $q(x_t \mid x_{t-1})$ is

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{\alpha_t}\, x_{t-1},\ (1-\alpha_t) I\right)$$

This transition satisfies the Markov property: $x_t$ depends only on $x_{t-1}$, not on any earlier state.
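As a concrete sketch, the Markov transition and the closed-form marginal it implies (with $\bar\alpha_t = \prod_{s \le t} \alpha_s$) can be simulated in a few lines of NumPy; the linear $\beta_t = 1-\alpha_t$ schedule and the helper names `forward_step` / `forward_jump` are illustrative assumptions, not prescribed here:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
# Hypothetical linear schedule for beta_t = 1 - alpha_t (illustrative only).
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def forward_step(x_prev, t):
    """One Markov step: x_t ~ N(sqrt(alpha_t) x_{t-1}, (1 - alpha_t) I)."""
    eps = rng.standard_normal(x_prev.shape)
    return np.sqrt(alphas[t]) * x_prev + np.sqrt(betas[t]) * eps

def forward_jump(x0, t):
    """Closed-form marginal implied by the chain: x_t ~ N(sqrt(abar_t) x_0, (1 - abar_t) I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
```

Composing `forward_step` for $t$ steps yields the same distribution as a single `forward_jump`, which is why the closed form can be derived from the transition.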
In DDIM, we choose another way to obtain $q(x_{t-1} \mid x_t, x_0)$, by directly assuming it is a Gaussian:

$$q_\sigma(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \kappa\, x_0 + \lambda\, x_t,\ \sigma_t^2 I\right)$$

This is a general form for $q_\sigma(x_{t-1} \mid x_t, x_0)$ with undetermined coefficients $\kappa$ and $\lambda$, which should satisfy the marginal consistency

$$q_\sigma(x_{t-1} \mid x_0) = \int q_\sigma(x_{t-1} \mid x_t, x_0)\, q_\sigma(x_t \mid x_0)\, \mathrm{d}x_t$$
Now we can pin down the form of $q_\sigma(x_{t-1} \mid x_t, x_0)$ by specifying $q(x_t \mid x_0)$ and $q(x_{t-1} \mid x_0)$. To reuse the model we trained in DDPM, we keep them the same as in DDPM:

$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t) I\right), \qquad \bar\alpha_t = \prod_{s=1}^{t} \alpha_s$$
Then we can solve for the coefficients:

$$q_\sigma(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \sqrt{\bar\alpha_{t-1}}\, x_0 + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\cdot\frac{x_t - \sqrt{\bar\alpha_t}\, x_0}{\sqrt{1-\bar\alpha_t}},\ \sigma_t^2 I\right)$$
So $q_\sigma(x_{t-1} \mid x_t, x_0)$ is a family of distributions that depends on $\sigma$. If we choose $\sigma_t^2 = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}(1-\alpha_t)$, the same value as in DDPM, then $q_\sigma(x_{t-1} \mid x_t, x_0)$ will also be the same as the DDPM posterior.
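A minimal NumPy sketch of sampling from this solved posterior; the linear $\beta$ schedule and the function name `ddim_posterior_sample` are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed DDPM-style schedule
alpha_bars = np.cumprod(1.0 - betas)  # \bar{alpha}_t

def ddim_posterior_sample(x_t, x0, t, sigma_t):
    """Sample x_{t-1} ~ q_sigma(x_{t-1} | x_t, x_0) from the solved Gaussian."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    # The noise implied by x_t and x_0 under q(x_t | x_0).
    eps = (x_t - np.sqrt(ab_t) * x0) / np.sqrt(1.0 - ab_t)
    mean = np.sqrt(ab_prev) * x0 + np.sqrt(1.0 - ab_prev - sigma_t**2) * eps
    return mean + sigma_t * rng.standard_normal(x_t.shape)
```

Note that the $\sigma_t^2$ terms in the mean and in the noise cancel in the marginal variance, which is how the consistency condition is met for every choice of $\sigma_t$.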
Now, by Bayes' Theorem we can obtain the forward process $q_\sigma(x_t \mid x_{t-1}, x_0)$, which is no longer Markovian as in DDPM, since it depends on $x_0$:

$$q_\sigma(x_t \mid x_{t-1}, x_0) = \frac{q_\sigma(x_{t-1} \mid x_t, x_0)\, q_\sigma(x_t \mid x_0)}{q_\sigma(x_{t-1} \mid x_0)}$$
Tip
The general idea of DDIM is to specify $q(x_t \mid x_0)$ and $q(x_{t-1} \mid x_0)$ and construct $q_\sigma(x_{t-1} \mid x_t, x_0)$ from them, rather than defining $q(x_t \mid x_{t-1})$ and deriving all the other distributions from it as in DDPM.
Training and Inference
As in DDPM, DDIM trains a denoising network $\epsilon_\theta(x_t, t)$ to recover $x_0$ from $x_t$. Since the loss function, the same as DDPM's, does not include $\sigma$, we can use a pre-trained DDPM model for DDIM inference and simply change the value of $\sigma$.
In particular, if we set $\sigma_t = 0$, the inference process becomes deterministic, and this is what the "Implicit" in DDIM means.
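With $\sigma_t = 0$ the update collapses to a deterministic map. A sketch of one such step, assuming the network's noise prediction is passed in as `eps_pred` (the schedule and names are illustrative):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed schedule
alpha_bars = np.cumprod(1.0 - betas)

def ddim_step_deterministic(x_t, eps_pred, t, t_prev):
    """One sigma = 0 DDIM update from timestep t down to t_prev < t."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t_prev]
    # Predicted x_0 under the eps-prediction parameterization.
    x0_hat = (x_t - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)
    # sigma = 0: the random noise term vanishes, so the step is deterministic.
    return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_pred
```

With a perfect noise prediction, this maps a sample of $q(x_t \mid x_0)$ exactly onto the sample of $q(x_{t_{\text{prev}}} \mid x_0)$ built from the same noise, with no randomness anywhere in the step.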
Accelerate Generation
In DDPM, the generative process is considered an approximation to the reverse process; since the forward process has $T$ steps, the generative process is also forced to sample $T$ steps.
However, the denoising network we trained in DDPM does not rely on any specific forward procedure. That is, as long as $q(x_t \mid x_0)$ is fixed as $\mathcal{N}(\sqrt{\bar\alpha_t}\, x_0, (1-\bar\alpha_t) I)$ with parameters $\bar\alpha_t$, the denoising network works regardless of whether $x_t$ was produced step by step through $q(x_t \mid x_{t-1})$ or in one jump from $q(x_t \mid x_0)$. Therefore, a DDPM trained with $T$ steps already contains all the parameters we need for any subsequence $\{\tau_1, \dots, \tau_S\} \subseteq \{1, \dots, T\}$, and we can build a DDIM model on this new sequence without training a new model.
In this way, we can directly accelerate the generation process to $S$ steps by recalculating the coefficients from $\bar\alpha_{\tau_i}$.
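A sketch of the accelerated sampler: it walks a subsequence of $S$ timesteps while reusing the full $T$-step model's $\bar\alpha$ values. Here `dummy_eps_model` is a hypothetical stand-in for the trained noise predictor $\epsilon_\theta$, and the schedule is again an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # assumed schedule of the full model
alpha_bars = np.cumprod(1.0 - betas)

# Subsequence tau_1 < ... < tau_S, reusing the full model's alpha_bar values.
S = 50
taus = np.linspace(0, T - 1, S, dtype=int)

def dummy_eps_model(x_t, t):
    """Hypothetical stand-in for the trained DDPM noise predictor eps_theta."""
    return np.zeros_like(x_t)

def ddim_sample(shape):
    x = rng.standard_normal(shape)     # start from x_T ~ N(0, I)
    for i in range(S - 1, 0, -1):      # walk the subsequence, not all T steps
        t, t_prev = taus[i], taus[i - 1]
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t_prev]
        eps = dummy_eps_model(x, t)
        # Deterministic (sigma = 0) DDIM update restricted to the subsequence.
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps
    return x
```

The loop body is identical to a full-length DDIM step; the only change is that $t$ and $t_{\text{prev}}$ now index the subsequence, so generation takes $S$ network evaluations instead of $T$.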