Idea

Apply a point-wise spatial attention mechanism to scene parsing.

Method

Let $z_i$ be the newly aggregated feature at position $i$, and $x_i$ be the feature representation at position $i$ in the input feature map $X$. Starting from the general aggregation

$$z_i = \frac{1}{N} \sum_{\forall j \in \Omega(i)} F(x_i, x_j, \Delta_{ij})\, x_j$$

and approximating the pairwise function as $F(x_i, x_j, \Delta_{ij}) \approx F_{\Delta_{ij}}(x_i) + F_{\Delta_{ij}}(x_j)$, we have the following bi-directional propagation formula:

$$z_i = \frac{1}{N} \sum_{\forall j \in \Omega(i)} F_{\Delta_{ij}}(x_i)\, x_j + \frac{1}{N} \sum_{\forall j \in \Omega(i)} F_{\Delta_{ij}}(x_j)\, x_j,$$

where $\forall j \in \Omega(i)$ enumerates all positions in the region of interest associated with position $i$, and $\Delta_{ij}$ represents the relative location of positions $i$ and $j$.
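To make the summation concrete, here is a toy sketch of the general (un-decomposed) aggregation over a full feature map, with $\Omega(i)$ taken to be all positions and a made-up pairwise function `F` standing in for the one PSANet avoids modelling directly; the variable names and the choice of `F` are illustrative, not from the paper.

```python
import numpy as np

H, W, C = 4, 4, 8
N = H * W
rng = np.random.default_rng(0)
X = rng.standard_normal((H, W, C))   # toy input feature map

def F(x_i, x_j, delta):
    # Placeholder pairwise weight: feature similarity damped by spatial distance.
    return float(x_i @ x_j) / (1.0 + np.hypot(*delta))

Z = np.zeros_like(X)
for k in range(H):
    for l in range(W):
        acc = np.zeros(C)
        for s in range(H):
            for t in range(W):
                delta = (s - k, t - l)                    # relative location of j w.r.t. i
                acc += F(X[k, l], X[s, t], delta) * X[s, t]
        Z[k, l] = acc / N                                  # z_i = (1/N) * sum_j F(x_i, x_j, Delta_ij) x_j
```

The decomposition above removes the dependence on feature pairs: each term needs only the feature at a single position together with the relative location $\Delta_{ij}$, which is what allows the two attention maps below to be predicted by separate branches.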

Bi-Direction Information Propagation

For the first term, $F_{\Delta_{ij}}(x_i)$ encodes to what extent the features at other positions can help the prediction at position $i$; each position collects information from the other positions. For the second term, $F_{\Delta_{ij}}(x_j)$ denotes the importance of the feature at position $j$ to the features at other positions; each position distributes information to the others.

Specifically, in this model both $F_{\Delta_{ij}}(x_i)$ and $F_{\Delta_{ij}}(x_j)$ can be regarded as predicted attention values used to aggregate the feature $x_j$, so the formula above can be rewritten as

$$z_i = \frac{1}{N} \sum_{\forall j \in \Omega(i)} a^c_{i,j}\, x_j + \frac{1}{N} \sum_{\forall j \in \Omega(i)} a^d_{i,j}\, x_j,$$

where $a^c_{i,j}$ and $a^d_{i,j}$ denote the predicted attention values in the point-wise attention maps $A^c$ and $A^d$ from the collect and distribute branches, respectively.
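As a minimal sketch of this aggregation, assume the two point-wise attention maps have already been predicted and reshaped to $(HW) \times (HW)$ matrices whose entry $[i, j]$ weights $x_j$ when computing $z_i$; the shapes and names below are illustrative, not the authors' implementation.

```python
import numpy as np

# In the collect branch, row i of the matrix is predicted from x_i;
# in the distribute branch, column j is predicted from x_j.
H, W, C = 8, 8, 16
N = H * W

rng = np.random.default_rng(0)
X = rng.standard_normal((N, C))      # input features, one row per position
A_c = rng.standard_normal((N, N))    # attention values from the collect branch
A_d = rng.standard_normal((N, N))    # attention values from the distribute branch

# z_i = (1/N) * sum_j a^c_{i,j} x_j  +  (1/N) * sum_j a^d_{i,j} x_j
Z = (A_c @ X) / N + (A_d @ X) / N    # shape (N, C): one aggregated feature per position

assert Z.shape == (N, C)
```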

Architecture

PSA Module

Attention Map Generation

In the collect branch, at each position $i$, located at the $k$-th row and $l$-th column of the feature map, the network predicts how the current position is related to all other positions based on the feature $x_i$ at position $i$. The prediction is an over-completed map $h^c_i$ of spatial size $(2H-1)\times(2W-1)$, from which the $H \times W$ attention map is cropped.

Specifically, the element at the $s$-th row and $t$-th column of the attention mask $a^c_i$ (i.e. $a^c_{[k,l],[s,t]}$) is

$$a^c_{[k,l],[s,t]} = h^c_{[k,l],[H-k+s,\, W-l+t]}, \quad \forall s \in [0, H),\ t \in [0, W),$$

where $[s,t]$ indexes a position by its row and column.
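Below is a minimal sketch of this cropping step in NumPy with 0-based indexing (the paper writes the same mapping in its own indexing convention); the function name and the toy example are illustrative only.

```python
import numpy as np

def crop_attention_map(h_i: np.ndarray, k: int, l: int) -> np.ndarray:
    """Crop the H x W attention map a_i for the position at row k, column l.

    h_i : over-completed map of shape (2H-1, 2W-1); its centre element
          (H-1, W-1) corresponds to the position (k, l) itself.
    """
    H = (h_i.shape[0] + 1) // 2
    W = (h_i.shape[1] + 1) // 2
    # a_i[s, t] = h_i[H-1-k+s, W-1-l+t]  for s in [0, H), t in [0, W)
    return h_i[H - 1 - k : 2 * H - 1 - k, W - 1 - l : 2 * W - 1 - l]

# Example: H = W = 4, position (k, l) = (1, 2)
h = np.arange(7 * 7, dtype=float).reshape(7, 7)
a = crop_attention_map(h, k=1, l=2)
assert a.shape == (4, 4)
assert a[1, 2] == h[3, 3]   # the centre of h maps back onto position (k, l)
```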

Model