Motivation
Replace the commonly used U-Net backbone in latent diffusion models with a transformer that operates on latent patches.
adaLN-Zero Block
Adaptive Layer Normalization (AdaLN)
AdaLN normalizes layer outputs adaptively, based on task-specific conditioning information. This lets a single model handle different types of data (e.g., images from various domains) efficiently, without a separate model for each task.
In contrast to standard layer normalization, which learns one fixed set of scale and shift parameters, AdaLN computes these parameters dynamically from the conditioning. Specifically, it regresses the scale and shift parameters (γ and β) from the sum of the input embeddings (in DiT, the timestep and class-label embeddings), which allows better adaptation to different tasks while keeping a shared model architecture.
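The regression of scale and shift from the conditioning can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes a single linear layer as the regressor, and the names `ada_layer_norm`, `w`, `b`, `t_emb`, and `c_emb` are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize the last axis to zero mean and unit variance (no learned affine)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def ada_layer_norm(x, cond, w, b):
    """x: (tokens, dim); cond: (cond_dim,); w: (cond_dim, 2*dim); b: (2*dim,)."""
    # Assumption: one linear layer regresses gamma (scale) and beta (shift).
    gamma, beta = np.split(cond @ w + b, 2)
    return gamma * layer_norm(x) + beta

rng = np.random.default_rng(0)
dim, cond_dim = 8, 4
x = rng.normal(size=(3, dim))
t_emb = rng.normal(size=cond_dim)  # hypothetical timestep embedding
c_emb = rng.normal(size=cond_dim)  # hypothetical class-label embedding
w = 0.1 * rng.normal(size=(cond_dim, 2 * dim))
b = np.concatenate([np.ones(dim), np.zeros(dim)])  # gamma near 1, beta near 0

out = ada_layer_norm(x, t_emb + c_emb, w, b)       # conditioned on the embedding sum
out_other = ada_layer_norm(x, t_emb - c_emb, w, b) # different conditioning, different output
```

The same input `x` is normalized differently depending on the conditioning vector, while the weights `w` and `b` are shared across all conditions.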
adaLN-Zero
Unlike standard AdaLN, whose regressed scale and shift parameters start from an ordinary initialization, adaLN-Zero zero-initializes the weights that produce them. Initially, then, the conditioning does not alter the layer's input, so the model starts learning from the unmodified signal without any bias introduced by these parameters. In DiT, the same zero-initialized regressor also outputs a scaling parameter α applied just before each residual connection, so every block starts out as the identity function.
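The effect of the zero initialization can be checked with a small NumPy sketch. It rests on several assumptions: a single linear regressor that outputs a shift, a scale, and a residual gate; a `1 + scale` parameterization so that a zero scale leaves the normalized input unchanged; and a `tanh` layer standing in for the attention or MLP sublayer. All names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize the last axis to zero mean and unit variance (no learned affine)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def adaln_zero_block(x, cond, w, b, sublayer):
    """x: (tokens, dim); cond: (cond_dim,); w: (cond_dim, 3*dim); b: (3*dim,)."""
    shift, scale, gate = np.split(cond @ w + b, 3)
    h = layer_norm(x) * (1.0 + scale) + shift  # zero scale/shift: plain layer norm
    return x + gate * sublayer(h)              # zero gate: residual branch is switched off

rng = np.random.default_rng(0)
dim, cond_dim = 8, 4
x = rng.normal(size=(3, dim))
cond = rng.normal(size=cond_dim)

# Zero-initialized regressor, as in adaLN-Zero.
w_zero = np.zeros((cond_dim, 3 * dim))
b_zero = np.zeros(3 * dim)

w_mix = rng.normal(size=(dim, dim))
sublayer = lambda h: np.tanh(h @ w_mix)  # stand-in for attention or an MLP

y = adaln_zero_block(x, cond, w_zero, b_zero, sublayer)
```

With the regressor at zero, `y` equals `x` for any conditioning vector, so the block behaves as the identity until training moves the weights away from zero.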