Motivation

Mask R-CNN extends Faster R-CNN by adding a branch that predicts an object mask, in parallel with the existing branch for bounding-box recognition.
The mask branch is a small FCN applied to each RoI, predicting a segmentation mask in a pixel-to-pixel manner.

Architecture

Procedure

Mask R-CNN adopts the same two-stage procedure as Faster R-CNN, with an identical first stage: the Region Proposal Network (RPN). In the second stage, in parallel with predicting the class and box offset, Mask R-CNN also outputs a binary mask for each RoI via a small FCN.

Formally, during training, we define a multi-task loss on each sampled RoI as $L = L_{cls} + L_{box} + L_{mask}$, where $L_{cls}$ and $L_{box}$ are identical to those in Faster R-CNN. The mask branch has a $K m^{2}$-dimensional output for each RoI, which encodes $K$ binary masks of resolution $m \times m$, one for each of the $K$ classes. To this we apply a per-pixel sigmoid, and define $L_{mask}$ as the average binary cross-entropy loss. For an RoI associated with ground-truth class $k$, $L_{mask}$ is only defined on the $k$-th mask; the other $K - 1$ mask outputs do not contribute to the loss.
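The definition of $L_{mask}$ above can be illustrated with a minimal NumPy sketch. It is not the reference implementation: the function name `mask_loss`, the array layout `(N, K, m, m)`, and the use of raw logits as input are assumptions made for illustration. It only shows the two ideas in the text: pick the $k$-th mask for each RoI, then average a per-pixel sigmoid binary cross-entropy over that mask.

```python
import numpy as np

def mask_loss(mask_logits, gt_masks, gt_classes):
    """Average per-pixel binary cross-entropy on the ground-truth class mask only.

    Hypothetical helper for illustration (names and shapes are assumptions):
      mask_logits: (N, K, m, m) raw mask-branch outputs for N RoIs.
      gt_masks:    (N, m, m) binary targets in {0, 1}.
      gt_classes:  (N,) integer class index k per RoI, with 0 <= k < K.
    """
    n = mask_logits.shape[0]
    # Select the k-th mask of each RoI; the other K-1 masks are ignored.
    logits_k = mask_logits[np.arange(n), gt_classes]  # shape (N, m, m)
    x = logits_k
    y = gt_masks.astype(x.dtype)
    # Sigmoid + BCE in a numerically stable form:
    # max(x, 0) - x*y + log(1 + exp(-|x|))
    bce = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
    # Average over all pixels of the selected masks (and over RoIs).
    return bce.mean()

# Example call with random data: 2 RoIs, K = 3 classes, m = 4.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 3, 4, 4))
targets = rng.integers(0, 2, size=(2, 4, 4))
classes = np.array([1, 2])
loss = mask_loss(logits, targets, classes)
```

Because only the $k$-th mask is indexed, changing the logits of any other class leaves the loss unchanged, so the masks of different classes do not compete with one another.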
