Architectures to Capture Multi-scale Context

Atrous Spatial Pyramid Pooling

Method

ASPP applies four parallel atrous convolutions, each with a different atrous rate, on top of the feature map.
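To make the parallel branches concrete, here is a minimal 1D sketch in plain Python. The names `atrous_conv1d` and `parallel_branches` and the rates `(1, 2, 3, 4)` are illustrative assumptions, not values from these notes; a real model would use 2D convolutions from a deep learning framework.

```python
def atrous_conv1d(signal, weights, rate):
    """1D atrous convolution with zero padding; output has the input's length."""
    half = len(weights) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            j = i + (k - half) * rate      # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):       # taps outside the signal read padded zeros
                total += w * signal[j]
        out.append(total)
    return out


def parallel_branches(signal, weights, rates):
    """Apply the same filter shape at several atrous rates, one branch per rate."""
    return [atrous_conv1d(signal, weights, r) for r in rates]


# Hypothetical input: a unit impulse, so each branch shows its own tap spacing.
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
branches = parallel_branches(impulse, [1, 1, 1], rates=(1, 2, 3, 4))
for rate, branch in zip((1, 2, 3, 4), branches):
    print(rate, branch)
```

Each branch sees the same input but with a different spacing between filter taps, so larger rates cover a wider field of view with the same number of weights.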

Problem

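As the sampling rate grows, the number of valid filter weights (i.e., weights applied to the valid feature region rather than to padded zeros) shrinks. In the extreme case where the rate is close to the feature map size, a $3 \times 3$ filter degenerates to a $1 \times 1$ filter, because only its center weight still falls on valid features.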
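The shrinking count of valid weights can be checked directly. The sketch below counts, for a $3 \times 3$ atrous filter with zero padding, how many of the nine taps fall inside the feature map. The function names and the $65 \times 65$ map size are assumptions chosen for illustration.

```python
def valid_taps_1d(pos, size, rate):
    """Count taps at offsets (-rate, 0, +rate) that stay inside [0, size)."""
    return sum(1 for off in (-rate, 0, rate) if 0 <= pos + off < size)


def valid_taps(y, x, size, rate):
    """Valid taps of a 3x3 atrous filter centered at (y, x) on a size x size map."""
    return valid_taps_1d(y, size, rate) * valid_taps_1d(x, size, rate)


def mean_valid_taps(size, rate):
    """Average number of valid taps over all filter positions (at most 9)."""
    # The 2D count factorizes per axis, so the sum over all (y, x) is a square.
    per_axis = sum(valid_taps_1d(p, size, rate) for p in range(size))
    return per_axis * per_axis / (size * size)


# Illustrative rates on an assumed 65 x 65 feature map.
for rate in (1, 6, 12, 18, 65):
    print(rate, round(mean_valid_taps(65, rate), 2))
```

When the rate equals the map size, every position keeps only its center tap, so the $3 \times 3$ filter behaves like a $1 \times 1$ filter.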

Solution

Concatenate the outputs of the atrous branches with image-level features, which restore global context. Specifically, apply global average pooling to the last feature map of the model, feed the resulting image-level features to a $1 \times 1$ convolution with $256$ filters (and batch normalization), and then bilinearly upsample the result to the desired spatial dimension.

The resulting features from all the branches are then concatenated and passed through another $1 \times 1$ convolution (also with $256$ filters and batch normalization) before the final $1 \times 1$ convolution, which generates the final logits.
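The image-level branch and the channel concatenation can be sketched in plain Python on nested lists. This is a minimal illustration under stated assumptions: all function names are hypothetical, batch normalization is omitted, the weights are made up, and 3 output filters stand in for the $256$ used above.

```python
def global_average_pool(feature_map):
    """feature_map is C channels of H x W values; return one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def conv1x1(vector, weights):
    """On a 1x1 spatial map, a 1x1 convolution is a matrix-vector product.

    weights has shape out_channels x in_channels.
    """
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def upsample_from_1x1(vector, height, width):
    """Upsampling a 1x1 map has one source value per channel, so each channel is constant."""
    return [[[v] * width for _ in range(height)] for v in vector]


def image_level_branch(feature_map, weights):
    height, width = len(feature_map[0]), len(feature_map[0][0])
    pooled = global_average_pool(feature_map)
    projected = conv1x1(pooled, weights)   # batch normalization omitted in this sketch
    return upsample_from_1x1(projected, height, width)


def concat_channels(*branches):
    """Concatenate branch outputs along the channel axis."""
    merged = []
    for branch in branches:
        merged.extend(branch)
    return merged


# Hypothetical 2-channel, 2 x 3 feature map and a 3 x 2 weight matrix.
features = [[[1, 2, 3], [4, 5, 6]], [[0, 0, 0], [0, 0, 6]]]
weights = [[1, 0], [0, 1], [1, 1]]
pooled_branch = image_level_branch(features, weights)
# `features` stands in here for the output of an atrous branch.
merged = concat_channels(features, pooled_branch)
print(len(merged), "channels of size", len(merged[0]), "x", len(merged[0][0]))
```

Because the pooled map is $1 \times 1$, upsampling simply broadcasts one value per channel across the spatial grid, which is why this branch carries global rather than local information.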

Rethinking Atrous Convolution for Semantic Image Segmentation