Architectures to Capture Multi-scale Context

Atrous Spatial Pyramid Pooling

Improvements over DeepLab v1

Method

Four parallel atrous convolutions with different atrous rates are applied on top of the feature map.
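As a minimal sketch of the idea, the snippet below applies one 3x3 atrous convolution per rate to the same single-channel feature map, using zero padding so every branch keeps the input size. The function names, the all-ones kernel, and the rates (1, 2, 3, 4) are illustrative assumptions, not values from the original method.

```python
def atrous_conv2d(feature, kernel, rate):
    """3x3 atrous (dilated) convolution with zero padding; output has the input size.

    feature: list of rows (H x W), kernel: 3x3 list, rate: spacing between taps.
    Computed as cross-correlation, as is usual in deep learning libraries.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = y + (ky - 1) * rate
                    xx = x + (kx - 1) * rate
                    # Taps that fall outside the map read padded zeros.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[ky][kx] * feature[yy][xx]
            out[y][x] = acc
    return out


def parallel_atrous(feature, kernels, rates):
    """Apply one atrous convolution per (kernel, rate) pair to the same input."""
    return [atrous_conv2d(feature, k, r) for k, r in zip(kernels, rates)]


# Hypothetical toy input: an 8x8 map of ones and four identical box kernels.
feature = [[1.0] * 8 for _ in range(8)]
box = [[1.0] * 3 for _ in range(3)]
branches = parallel_atrous(feature, [box] * 4, rates=(1, 2, 3, 4))
```

Each entry of `branches` sees the same input at a different effective field of view; in a real network each branch would have its own learned multi-channel kernel.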

Problem

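As the sampling rate becomes larger, the number of valid filter weights (i.e., the weights that are applied to the valid feature region, instead of padded zeros) becomes smaller. In the extreme case, when the rate is close to the feature map size, a 3×3 filter degenerates to a 1×1 filter because only its center weight still falls on valid features, so it no longer captures whole-image context.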
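The effect can be checked by counting, for every position of a square feature map, how many of the nine taps of a 3x3 atrous filter land inside the map. This is a small sketch; the function names and the 65x65 map size are assumptions chosen for illustration.

```python
def valid_taps(size, rate, y, x):
    """Number of 3x3 filter taps that land inside a size x size map at (y, x)."""
    count = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy * rate, x + dx * rate
            if 0 <= yy < size and 0 <= xx < size:
                count += 1
    return count


def tap_histogram(size, rate):
    """Map 'number of valid taps' -> 'number of positions with that count'."""
    hist = {}
    for y in range(size):
        for x in range(size):
            n = valid_taps(size, rate, y, x)
            hist[n] = hist.get(n, 0) + 1
    return hist


# Assumed example: a 65x65 feature map at increasing rates.
for rate in (1, 16, 32, 65):
    print(rate, tap_histogram(65, rate))
```

At rate 1 almost every position uses all nine weights, while at a rate equal to the map size every position is left with the center weight only.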

Solution

Augment the pyramid pooling output with image-level features. Specifically, apply global average pooling on the last feature map of the model, feed the resulting image-level features to a 1×1 convolution with 256 filters (and batch normalization), and then bilinearly upsample the feature to the desired spatial dimension.
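The image-level branch can be sketched in a few lines. This is a simplified illustration: batch normalization and the nonlinearity are omitted, the weights are hypothetical, and the upsampling step relies on the fact that bilinearly interpolating a 1x1 map produces a constant map per channel.

```python
def global_average_pool(feature):
    """feature: [C][H][W] nested lists -> list of C per-channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def pointwise_conv(vector, weights):
    """1x1 convolution applied to a 1x1 map; weights has shape [C_out][C_in]."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def broadcast_to_map(vector, height, width):
    """Upsample a 1x1 map to height x width: each channel becomes a constant map."""
    return [[[v] * width for _ in range(height)] for v in vector]


def image_level_branch(feature, weights):
    """Pool globally, project with a 1x1 convolution, upsample to the input size."""
    height, width = len(feature[0]), len(feature[0][0])
    pooled = global_average_pool(feature)
    projected = pointwise_conv(pooled, weights)
    return broadcast_to_map(projected, height, width)


# Hypothetical toy input: 2 channels of size 2x2, projected to 3 channels.
feature = [[[1, 2], [3, 4]], [[0, 0], [0, 8]]]
weights = [[1, 0], [1, 1], [0, 2]]
branch = image_level_branch(feature, weights)
```

Because the output has the same spatial size as the other branches, it can be stacked with them along the channel axis.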

The resulting features from all the branches are then concatenated and passed through another 1×1 convolution (also with 256 filters and batch normalization) before the final 1×1 convolution, which generates the final logits.
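A rough sketch of this fusion step follows: branch outputs are stacked along the channel axis, mixed by a pointwise convolution, and mapped to per-class logits by a second one. Batch normalization is left out, and the function names, channel counts, and weights are toy assumptions rather than the real configuration.

```python
def concat_channels(branches):
    """Stack branch outputs ([C_i][H][W] each) along the channel axis."""
    merged = []
    for branch in branches:
        merged.extend(branch)
    return merged


def conv1x1(feature, weights):
    """Pointwise convolution: feature [C_in][H][W], weights [C_out][C_in]."""
    c_in = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            [sum(row[c] * feature[c][y][x] for c in range(c_in)) for x in range(width)]
            for y in range(height)
        ]
        for row in weights
    ]


def fuse_branches(branches, fuse_weights, logit_weights):
    """Concatenate branches, fuse them, then produce one logit map per class."""
    fused = conv1x1(concat_channels(branches), fuse_weights)
    return conv1x1(fused, logit_weights)


# Hypothetical toy input: two single-channel 2x2 branches and three classes.
branch_a = [[[1, 2], [3, 4]]]
branch_b = [[[10, 10], [10, 10]]]
logits = fuse_branches(
    [branch_a, branch_b],
    fuse_weights=[[1, 1], [1, -1]],
    logit_weights=[[1, 1], [0, 1], [1, 0]],
)
```

The number of input channels of the fusion convolution equals the sum of the branch channel counts, which is why every branch must share the same spatial size before concatenation.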