Idea
Improve the semantic segmentation results by effectively utilizing global context, which captures the semantic context of scenes and selectively enhances class-dependent feature maps.
Context Encoding Module
Encoding Layer
Input
Considering the input feature map with the shape of as a set of -dimensional input features , where
Codebook
The Encoding Layer learns a codebook containing codewords (or visual centers)
Each represents a distinct visual center in the feature space that captures a specific semantic meaning or category in the input data
Smoothing Factor
Alongside the codebook, the layer also learns a set of smoothing factors
Each corresponds to a smoothing factor for the visual center . These factors are used to control the influence of each visual center on the feature representation
Output
The layer outputs the residual encoder by aggregating the residuals with soft-assignment weights , where
and the residuals are given by . The final output is , where denotes Batch Normalization with ReLU activation. ( is a -dimensional vector)
Feature Map Attention
Use a fully connected layer to predict feature map scaling factors , where denotes the layer weights and is the sigmoid function. Then the module output is given by .