This article is a simplified interpretation of DDPM (Denoising Diffusion Probabilistic Models).
The overall goal of diffusion models is to learn a real-world data distribution $p(x)$. DDPM does this with a step-by-step Markov chain:
$$p_\theta(x_0) = \int p_\theta(x_{0:T})\, dx_{1:T}$$
where we set $p(x_T) = \mathcal{N}(x_T \mid 0, I)$, and
$$p_\theta(x_{0:T}) = p_\theta(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$$
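As a concrete (purely illustrative) instance, with $T = 2$ the joint factorizes as
$$p_\theta(x_{0:2}) = p_\theta(x_2)\, p_\theta(x_1 \mid x_2)\, p_\theta(x_0 \mid x_1),$$
so generating $x_0$ amounts to drawing $x_2$ from the Gaussian prior and applying the two learned transitions in turn.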
We don't know what $p_\theta(x_{t-1} \mid x_t)$ is, so we apply Bayes' theorem:
$$p_\theta(x_{t-1} \mid x_t) = \frac{p_\theta(x_t \mid x_{t-1})\, p_\theta(x_{t-1})}{p_\theta(x_t)}$$
Recall the definition of the forward (noising) transition distribution:
$$p_\theta(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t \mid \sqrt{\alpha_t}\, x_{t-1},\ (1-\alpha_t) I\right)$$
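As a minimal sketch of this transition (assuming a PyTorch tensor for $x_{t-1}$ and a scalar $\alpha_t$ taken from some noise schedule, which is not specified in this section), one noising step just scales the previous sample and adds Gaussian noise:

```python
import torch

def forward_step(x_prev, alpha_t):
    # Sample x_t ~ N(sqrt(alpha_t) * x_{t-1}, (1 - alpha_t) * I).
    # alpha_t is a scalar from an (assumed) noise schedule.
    noise = torch.randn_like(x_prev)
    return (alpha_t ** 0.5) * x_prev + ((1.0 - alpha_t) ** 0.5) * noise
```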
From this transition we can derive the closed-form marginals $p_\theta(x_{t-1} \mid x_0)$ and $p_\theta(x_t \mid x_0)$. Conditioning on $x_0$ and using the Markov property $p_\theta(x_t \mid x_{t-1}, x_0) = p_\theta(x_t \mid x_{t-1})$, we then have
$$p_\theta(x_{t-1} \mid x_t, x_0) = \frac{p_\theta(x_t \mid x_{t-1})\, p_\theta(x_{t-1} \mid x_0)}{p_\theta(x_t \mid x_0)} = \frac{\mathcal{N}\!\left(x_t \mid \sqrt{\alpha_t}\, x_{t-1},\ (1-\alpha_t) I\right)\, \mathcal{N}\!\left(x_{t-1} \mid \sqrt{\bar\alpha_{t-1}}\, x_0,\ (1-\bar\alpha_{t-1}) I\right)}{\mathcal{N}\!\left(x_t \mid \sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t) I\right)} = \mathcal{N}\!\left(x_{t-1} \mid \mu(x_t, x_0),\ \sigma_t^2 I\right)$$
where
$$\mu(x_t, x_0) = \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\, x_t + \frac{\sqrt{\bar\alpha_{t-1}}\,(1-\alpha_t)}{1-\bar\alpha_t}\, x_0, \qquad \sigma_t^2 = \frac{(1-\alpha_t)(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}$$
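These two formulas transcribe directly into code. The sketch below assumes scalar schedule values `alpha_t`, `alpha_bar_t` ($\bar\alpha_t$), and `alpha_bar_prev` ($\bar\alpha_{t-1}$); the argument names are hypothetical:

```python
def posterior_mean_variance(x_t, x_0, alpha_t, alpha_bar_t, alpha_bar_prev):
    # Mean and variance of p(x_{t-1} | x_t, x_0) from the formulas above.
    coef_xt = (alpha_t ** 0.5) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    coef_x0 = (alpha_bar_prev ** 0.5) * (1.0 - alpha_t) / (1.0 - alpha_bar_t)
    mean = coef_xt * x_t + coef_x0 * x_0
    var = (1.0 - alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    return mean, var
```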
However, we want $p_\theta(x_{t-1} \mid x_t)$, not $p_\theta(x_{t-1} \mid x_t, x_0)$. To resolve this, we train a model $\hat{x} = \hat{x}_\theta(x_t)$ to predict $x_0$ from $x_t$, so that the distribution depends only on $x_t$. Therefore
$$p_\theta(x_{t-1} \mid x_t) \approx p_\theta\!\left(x_{t-1} \mid x_t, x_0 = \hat{x}_\theta(x_t)\right) = \mathcal{N}\!\left(x_{t-1} \mid \mu(x_t, \hat{x}_\theta(x_t)),\ \sigma_t^2 I\right)$$
This is what denoising means: at each step the model's estimate of $x_0$ is used to sample a slightly less noisy $x_{t-1}$ from $x_t$.
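Putting the pieces together, the reverse (denoising) chain can be sketched as the loop below. It reuses the `posterior_mean_variance` helper from the earlier sketch and assumes a trained `model(x_t, t)` playing the role of $\hat{x}_\theta$ (in practice the network is also given the step index $t$); `alphas` and `alpha_bars` are assumed 1-D tensors holding $\alpha_t$ and $\bar\alpha_t$:

```python
import torch

@torch.no_grad()
def sample(model, alphas, alpha_bars, shape):
    # Start from pure noise x_T ~ N(0, I) and denoise step by step.
    x_t = torch.randn(shape)
    T = len(alphas)
    for t in range(T - 1, -1, -1):
        x0_hat = model(x_t, t)  # predicted x_0, i.e. x_hat_theta(x_t)
        alpha_bar_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
        mean, var = posterior_mean_variance(
            x_t, x0_hat, alphas[t], alpha_bars[t], alpha_bar_prev
        )
        # Sample x_{t-1}; no noise is added at the final step.
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mean + var ** 0.5 * noise
    return x_t
```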