Architectures to Capture Multi-scale Context
Atrous Spatial Pyramid Pooling
Improvements over DeepLab v2
Method
Four parallel atrous convolutions with different atrous rates are applied on top of the feature map.
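To make "different atrous rates" concrete, the short sketch below computes the effective field of view of a 3×3 filter at several rates. The rates (6, 12, 18, 24) and the helper name effective_kernel_size are assumptions chosen for illustration; the notes above do not state which rates are used.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Side length of the window covered by an atrous filter.

    An atrous (dilated) filter inserts rate - 1 gaps between adjacent
    taps, so a k-tap filter spans k + (k - 1) * (rate - 1) positions.
    """
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Assumed rates for the four parallel branches (illustrative only).
rates = (6, 12, 18, 24)
fov = {rate: effective_kernel_size(3, rate) for rate in rates}

for rate, size in fov.items():
    print(f"rate={rate}: 3x3 filter covers a {size}x{size} window")
```

Each branch keeps the same nine weights but samples them over a progressively wider window, which is how the parallel branches capture context at multiple scales.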
Problem
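As the sampling rate becomes larger, the number of valid filter weights (i.e., the weights applied to the valid feature region rather than to padded zeros) becomes smaller. In the extreme case where the rate is close to the feature map size, a 3×3 filter degenerates to a 1×1 filter, because only its center weight is effective.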
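The effect can be checked by counting, for every output position, how many of the nine taps of a 3×3 atrous filter land inside the feature map. This is a minimal sketch: the 65×65 feature map size, the chosen rates, and the function names are assumptions for illustration.

```python
from collections import Counter


def valid_taps_1d(index: int, size: int, rate: int) -> int:
    """Number of the three taps along one axis that fall inside [0, size)."""
    return sum(1 for d in (-1, 0, 1) if 0 <= index + d * rate < size)


def valid_weight_histogram(size: int, rate: int) -> Counter:
    """Map 'number of valid weights' to 'number of output positions'.

    The filter is 3x3 with zero padding, so the 2D count at a position
    is the product of the valid-tap counts along each axis.
    """
    per_axis = [valid_taps_1d(i, size, rate) for i in range(size)]
    hist = Counter()
    for rows in per_axis:
        for cols in per_axis:
            hist[rows * cols] += 1
    return hist


# Assumed 65x65 feature map; rates chosen to show the trend.
for rate in (1, 12, 32, 65):
    hist = valid_weight_histogram(65, rate)
    print(rate, dict(sorted(hist.items())))
```

At rate 1 almost every position uses all nine weights. At rate 32 only the single center position does, and once the rate reaches the feature map size every position is left with just the center weight.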
Solution
Augment the atrous branches with image-level features. Specifically, apply global average pooling on the last feature map of the model, feed the resulting image-level features to a 1×1 convolution with 256 filters (and batch normalization), and then bilinearly upsample the feature to the desired spatial dimension.
The resulting features from all the branches are then concatenated and passed through another 1×1 convolution (also with 256 filters and batch normalization) before the final 1×1 convolution, which generates the final logits.
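The data flow described above can be traced with a small dependency-free sketch. It is not the authors' implementation: weights are random, batch normalization and nonlinearities are omitted, the channel counts and rates are shrunk to toy values, and all function names are hypothetical. In practice a deep learning framework would be used instead of nested lists.

```python
import random


def conv2d_same(x, weights, rate=1):
    """Atrous convolution with zero padding and unchanged spatial size.

    x is [C_in][H][W]; weights is [C_out][C_in][k][k].
    """
    height, width = len(x[0]), len(x[0][0])
    k = len(weights[0][0])
    half = k // 2
    out = []
    for w_out in weights:
        plane = [[0.0] * width for _ in range(height)]
        for c, w_in in enumerate(w_out):
            for i in range(height):
                for j in range(width):
                    for a in range(k):
                        for b in range(k):
                            r = i + (a - half) * rate
                            s = j + (b - half) * rate
                            # Taps that land on padded zeros add nothing.
                            if 0 <= r < height and 0 <= s < width:
                                plane[i][j] += w_in[a][b] * x[c][r][s]
        out.append(plane)
    return out


def global_average_pool(x):
    """Reduce each channel to a single value: [C][H][W] -> [C][1][1]."""
    return [[[sum(map(sum, p)) / (len(p) * len(p[0]))]] for p in x]


def upsample_1x1(x, height, width):
    """Bilinear upsampling of a 1x1 map is a constant map of that value."""
    return [[[p[0][0]] * width for _ in range(height)] for p in x]


def random_weights(c_out, c_in, k, rng):
    return [[[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]


def aspp_head(x, rates, branch_channels, num_classes, rng):
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    # Parallel 3x3 atrous branches, one per rate.
    branches = [
        conv2d_same(x, random_weights(branch_channels, c_in, 3, rng), rate)
        for rate in rates
    ]
    # Image-level branch: global pool -> 1x1 conv -> upsample.
    pooled = conv2d_same(global_average_pool(x),
                         random_weights(branch_channels, c_in, 1, rng))
    branches.append(upsample_1x1(pooled, height, width))
    # Concatenate along channels, fuse with a 1x1 conv, then predict logits.
    fused = [plane for branch in branches for plane in branch]
    fused = conv2d_same(fused, random_weights(branch_channels, len(fused), 1, rng))
    return conv2d_same(fused, random_weights(num_classes, branch_channels, 1, rng))


rng = random.Random(0)
features = [[[rng.random() for _ in range(6)] for _ in range(6)] for _ in range(3)]
logits = aspp_head(features, rates=(1, 2, 3, 4), branch_channels=4,
                   num_classes=2, rng=rng)
print(len(logits), len(logits[0]), len(logits[0][0]))  # 2 6 6
```

Because the image-level branch is built from a global average, it contributes the same value at every spatial position, so it still carries whole-image context when the large-rate branches have lost most of their valid weights.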