This article is a simplified interpretation of DDPM.

The general target of diffusion models is to learn a real world distribution . DDPM chooses a step-by-step Markov Chain to do this

where we set , and

We don't know what is , so we apply Bayes' Theorem,

Recall the definition of the transition distribution

So we therefore can conclude and , than we would have

where

However, we want other than . To solve this, we train a model to predict from , so that the probability would only depend on . Therefore

This is what denoising means.