Idea
Lower the computational cost of diffusion models by performing the denoising process in a low-dimensional latent space learned by an autoencoder, rather than directly in pixel space.
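As a rough illustration, here is a minimal sketch of one latent-diffusion training step, assuming a frozen, pretrained `encoder` and a latent-space noise-prediction network `denoiser` (both hypothetical stand-ins, with a standard DDPM linear noise schedule swapped in for concreteness):

```python
import torch
import torch.nn.functional as F

def ldm_training_step(encoder, denoiser, x, T=1000):
    # Standard DDPM linear beta schedule, computed inline for self-containment.
    betas = torch.linspace(1e-4, 0.02, T, device=x.device)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    with torch.no_grad():
        z = encoder(x)  # e.g. 3x256x256 pixels -> 4x32x32 latents

    t = torch.randint(0, T, (z.shape[0],), device=z.device)
    noise = torch.randn_like(z)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    z_t = a.sqrt() * z + (1.0 - a).sqrt() * noise  # forward diffusion in latent space

    # The denoiser only ever sees the small latent tensor, which is where
    # the compute savings over pixel-space diffusion come from.
    return F.mse_loss(denoiser(z_t, t), noise)
```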
VAE
KL Regularization (KL-reg)
This variant applies a slight Kullback–Leibler divergence penalty that pushes the latent distribution toward a standard normal. This mirrors a VAE, where the KL divergence between the learned latent distribution and a prior (usually standard normal) is minimized. The penalty prevents the latent space from drifting toward arbitrarily high variance.
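As a sketch, the penalty has a closed form when the encoder outputs a diagonal Gaussian (mean, log-variance) per latent position, as in a standard VAE; the weight below is an illustrative small value, not the paper's exact setting:

```python
import torch

def kl_reg(mean, logvar, weight=1e-6):
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian,
    # summed over latent dims (assuming 4D B,C,H,W tensors) and
    # averaged over the batch.
    kl = 0.5 * (mean.pow(2) + logvar.exp() - 1.0 - logvar)
    # A very small weight keeps this a *slight* penalty: the latent stays
    # roughly unit-scale without being forced to a strict N(0, I), so
    # reconstruction quality is preserved.
    return weight * kl.sum(dim=(1, 2, 3)).mean()
```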
Vector Quantization Regularization (VQ-reg)
In this variant, the model incorporates a vector quantization (VQ) layer: instead of a fully continuous latent space, the latent representations are discretized against a set of learned "codebook" vectors. This is the mechanism used by VQGAN (Vector Quantized Generative Adversarial Network); here, however, the quantization layer is absorbed into the decoder, simplifying the rest of the pipeline (see the sketch after the note below).
Note
This is not a VQ-VAE: because the codebook lookup is absorbed into the decoder, the diffusion model still operates on the continuous latents before quantization.
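For illustration, a minimal sketch of a VQ layer as it might sit at the input of the decoder; the names, codebook size, and straight-through gradient trick are assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=8192, dim=4):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):  # z: (B, C, H, W) continuous latents
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)   # (B*H*W, C)
        # Nearest codebook vector for each spatial latent (L2 distance).
        d = torch.cdist(flat, self.codebook.weight)   # (B*H*W, K)
        idx = d.argmin(dim=1)
        z_q = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # Straight-through estimator: the quantized values go forward,
        # but gradients flow back to the continuous z.
        return z + (z_q - z).detach()
```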
Latent Space and Compression
The learned latent space $z = \mathcal{E}(x)$, where $\mathcal{E}$ is the encoder mapping the input $x$ to the latent space, is structured in two (spatial) dimensions. This is important because the latent diffusion model (DM) then works with a two-dimensional latent, preserving more of the structure and detail of the input $x$.
In contrast to other works that flatten the latent into an arbitrarily ordered 1D representation (discarding much of its inherent structure), this model uses a mild compression rate, leading to better reconstructions of the input (see the shape sketch below).
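To make the compression concrete, here is a minimal sketch of the shape arithmetic, assuming an illustrative downsampling factor f = 8 and latent channel count c = 4 (example values, not fixed by the text above):

```python
# Illustrative shape arithmetic for a mild spatial compression rate.
H, W, f, c = 512, 512, 8, 4
pixel_dims  = H * W * 3                 # 786,432 values in pixel space
latent_dims = (H // f) * (W // f) * c   # 16,384 values in the 2D latent

print(latent_dims / pixel_dims)  # ~0.021: ~48x fewer values, yet the
                                 # latent keeps its 2D spatial layout
```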