Idea

Apply a point-wise spatial attention mechanism to scene parsing.

Method

Let $z_i$ be the newly aggregated feature at position $i$, and $x_i$ be the feature representation at position $i$ in the input feature map $X$. Starting from the general aggregation

$$z_i = \frac{1}{N} \sum_{\forall j \in \Omega(i)} F(x_i, x_j, \Delta_{ij})\, x_j$$

and approximating the pairwise function as $F(x_i, x_j, \Delta_{ij}) \approx F_{\Delta_{ij}}(x_i) + F_{\Delta_{ij}}(x_j)$, we have the following bi-directional propagation formula:

$$z_i = \frac{1}{N} \sum_{\forall j \in \Omega(i)} F_{\Delta_{ij}}(x_i)\, x_j + \frac{1}{N} \sum_{\forall j \in \Omega(i)} F_{\Delta_{ij}}(x_j)\, x_j,$$

where $\forall j \in \Omega(i)$ enumerates all positions in the region of interest associated with position $i$, and $\Delta_{ij}$ represents the relative location of positions $i$ and $j$.
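To make the summation concrete, here is a toy sketch of the general (un-decomposed) aggregation over a full feature map, with $\Omega(i)$ taken to be all positions and a made-up pairwise function `F` standing in for the one PSANet avoids modelling directly; the variable names and the choice of `F` are illustrative, not from the paper.

```python
import numpy as np

H, W, C = 4, 4, 8
N = H * W
rng = np.random.default_rng(0)
X = rng.standard_normal((H, W, C))   # toy input feature map

def F(x_i, x_j, delta):
    # Placeholder pairwise weight: feature similarity damped by spatial distance.
    return float(x_i @ x_j) / (1.0 + np.hypot(*delta))

Z = np.zeros_like(X)
for k in range(H):
    for l in range(W):
        acc = np.zeros(C)
        for s in range(H):
            for t in range(W):
                delta = (s - k, t - l)                    # relative location of j w.r.t. i
                acc += F(X[k, l], X[s, t], delta) * X[s, t]
        Z[k, l] = acc / N                                  # z_i = (1/N) * sum_j F(x_i, x_j, Delta_ij) x_j
```

The decomposition above removes the dependence on feature pairs: each term needs only the feature at a single position together with the relative location $\Delta_{ij}$, which is what allows the two attention maps below to be predicted by separate branches.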

Bi-Direction Information Propagation

For the first term, $F_{\Delta_{ij}}(x_i)$ encodes to what extent the features at other positions can help the prediction at position $i$; each position collects information from the other positions. For the second term, $F_{\Delta_{ij}}(x_j)$ denotes the importance of the feature at position $j$ to the features at other positions; each position distributes information to the others.

Specifically, in this model both $F_{\Delta_{ij}}(x_i)$ and $F_{\Delta_{ij}}(x_j)$ can be regarded as predicted attention values used to aggregate the feature $x_j$, so the formula above can be rewritten as

$$z_i = \frac{1}{N} \sum_{\forall j \in \Omega(i)} a^c_{i,j}\, x_j + \frac{1}{N} \sum_{\forall j \in \Omega(i)} a^d_{i,j}\, x_j,$$

where $a^c_{i,j}$ and $a^d_{i,j}$ denote the predicted attention values in the point-wise attention maps $A^c$ and $A^d$ from the collect and distribute branches, respectively.
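As a minimal sketch of this aggregation, assume the two point-wise attention maps have already been predicted and reshaped to $(HW) \times (HW)$ matrices whose entry $[i, j]$ weights $x_j$ when computing $z_i$; the shapes and names below are illustrative, not the authors' implementation.

```python
import numpy as np

# In the collect branch, row i of the matrix is predicted from x_i;
# in the distribute branch, column j is predicted from x_j.
H, W, C = 8, 8, 16
N = H * W

rng = np.random.default_rng(0)
X = rng.standard_normal((N, C))      # input features, one row per position
A_c = rng.standard_normal((N, N))    # attention values from the collect branch
A_d = rng.standard_normal((N, N))    # attention values from the distribute branch

# z_i = (1/N) * sum_j a^c_{i,j} x_j  +  (1/N) * sum_j a^d_{i,j} x_j
Z = (A_c @ X) / N + (A_d @ X) / N    # shape (N, C): one aggregated feature per position

assert Z.shape == (N, C)
```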

Architecture

PSA Module

Attention Map Generation

In the collect branch, at each position $i$, located at the $k$-th row and $l$-th column of the feature map, the network predicts how the current position is related to all other positions based on the feature $x_i$ at position $i$. The prediction is an over-completed map $h^c_i$ of spatial size $(2H-1)\times(2W-1)$, from which the $H \times W$ attention map is cropped.

Specifically, the element at the $s$-th row and $t$-th column of the attention mask $a^c_i$ (i.e. $a^c_{[k,l],[s,t]}$) is

$$a^c_{[k,l],[s,t]} = h^c_{[k,l],[H-k+s,\, W-l+t]}, \quad \forall s \in [0, H),\ t \in [0, W),$$

where $[s,t]$ indexes a position by its row and column.
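Below is a minimal sketch of this cropping step in NumPy with 0-based indexing (the paper writes the same mapping in its own indexing convention); the function name and the toy example are illustrative only.

```python
import numpy as np

def crop_attention_map(h_i: np.ndarray, k: int, l: int) -> np.ndarray:
    """Crop the H x W attention map a_i for the position at row k, column l.

    h_i : over-completed map of shape (2H-1, 2W-1); its centre element
          (H-1, W-1) corresponds to the position (k, l) itself.
    """
    H = (h_i.shape[0] + 1) // 2
    W = (h_i.shape[1] + 1) // 2
    # a_i[s, t] = h_i[H-1-k+s, W-1-l+t]  for s in [0, H), t in [0, W)
    return h_i[H - 1 - k : 2 * H - 1 - k, W - 1 - l : 2 * W - 1 - l]

# Example: H = W = 4, position (k, l) = (1, 2)
h = np.arange(7 * 7, dtype=float).reshape(7, 7)
a = crop_attention_map(h, k=1, l=2)
assert a.shape == (4, 4)
assert a[1, 2] == h[3, 3]   # the centre of h maps back onto position (k, l)
```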

Model