Idea
Layer normalization is a technique used in machine learning that normalizes the inputs across the features of each individual sample in a layer, independently of batch size.
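As a rough sketch of the idea (using NumPy; the epsilon and the learnable scale/shift `gamma`/`beta` shown here are standard additions, not stated in the text above), per-sample layer normalization can be written as:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization: normalize each sample over its feature dimension.

    x: array of shape (batch, features)
    gamma, beta: learnable scale and shift of shape (features,)
    """
    mean = x.mean(axis=-1, keepdims=True)        # per-sample mean over features
    var = x.var(axis=-1, keepdims=True)          # per-sample variance over features
    x_hat = (x - mean) / np.sqrt(var + eps)      # each sample normalized independently
    return gamma * x_hat + beta                  # learnable scale and shift

x = np.random.randn(4, 8)                        # the same code works for any batch size
y = layer_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```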
Batch Normalization vs Layer Normalization
Batch Normalization:
- Normalizes inputs across the batch dimension (N) for each feature
- Computes mean and variance across the batch for each feature channel independently (see the sketch after this list)
- Works well with fully connected layers and CNNs
- Performance can depend on batch size, which can be problematic for small batches or recurrent models
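To make the contrast concrete, here is a minimal sketch of how batch normalization computes its statistics (NumPy, training-mode statistics only; the running-average update used at inference is omitted). The only difference from the layer-norm sketch above is the axis the statistics are taken over: the batch axis here, the feature axis there.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization (training mode): normalize each feature over the batch.

    x: array of shape (N, C) -- N samples, C features
    gamma, beta: learnable scale and shift of shape (C,)
    """
    mean = x.mean(axis=0)                      # per-feature mean over the batch (N)
    var = x.var(axis=0)                        # per-feature variance over the batch (N)
    x_hat = (x - mean) / np.sqrt(var + eps)    # statistics are shared across all samples
    return gamma * x_hat + beta
```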
Layer Normalization:
- Normalizes inputs across the feature dimension (C) for each sample individually
- Computes mean and variance across all features for each individual sample
- Works better with RNNs
- Does not depend on batch size, making it more suitable for scenarios with variable batch sizes (see the usage sketch after this list)
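As a usage sketch (assuming PyTorch; the tensor shapes are only illustrative), layer normalization behaves identically regardless of batch size, whereas batch normalization falls back on running statistics when only a single sample is available:

```python
import torch
import torch.nn as nn

x_nchw = torch.randn(1, 16, 8, 8)    # a single sample: N=1, C=16, H=W=8

# BatchNorm2d normalizes over (N, H, W) per channel; with N=1 its batch
# statistics come from one sample only, so running statistics are used at eval.
bn = nn.BatchNorm2d(16)

# LayerNorm here normalizes over the (C, H, W) features of each sample,
# so its output does not depend on batch size at all.
ln = nn.LayerNorm([16, 8, 8])

bn.eval()                  # switch to running statistics for the lone sample
y_bn = bn(x_nchw)
y_ln = ln(x_nchw)          # no mode switch needed: LN never uses batch statistics
```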
In the diagram, C stands for channels/features, N stands for batches, and H and W stand for the height and width of the feature map.
For a detailed algorithm and implementation, refer to Batch Normalization.