Idea

Start from PSPNet and speed it up so that it is suitable for real-time semantic segmentation of high-resolution images.

Architecture

Cascade Image Input

Apply multi-resolution inputs. Although the top (lowest-resolution) branch misses some details and produces blurry boundaries, it already captures most of the semantic content. Therefore, we can safely limit the number of parameters in both the middle and bottom branches.

Lightweight CNNs are adopted in the higher-resolution branches.
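The cascade input can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the scale factors 1/4, 1/2 and 1 follow the ICNet paper rather than these notes, the helper names `subsample` and `cascade_inputs` are hypothetical, and strided subsampling stands in for the proper interpolation a real pipeline would use.

```python
def subsample(image, factor):
    """Keep every `factor`-th row and column of a 2-D list (crude downsampling)."""
    return [row[::factor] for row in image[::factor]]


def cascade_inputs(image, factors=(4, 2, 1)):
    """Return [low, mid, high] resolution copies of `image`.

    Assumption: factors (4, 2, 1) correspond to the 1/4, 1/2 and full
    resolution inputs of the top, middle and bottom branches.
    """
    return [subsample(image, f) for f in factors]


# Toy 8x8 "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
low, mid, high = cascade_inputs(image)
print(len(low), len(mid), len(high))  # heights of the three inputs
```

The low-resolution copy would go through the heavy, PSPNet-style branch, while the two larger copies go through the lightweight branches, which is where the speed-up comes from.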

Cascade Feature Fusion

The inputs are:

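  • two feature maps $F_1$ and $F_2$ with sizes $C_1 \times H_1 \times W_1$ and $C_2 \times H_2 \times W_2$ respectively
  • one ground-truth label with resolution $1 \times H_2 \times W_2$

The outputs are:

  • one fused feature map with size $C_3 \times H_2 \times W_2$
  • one auxiliary loss (cascade label guidance)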
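The data flow of the fusion unit can be sketched with nested lists. This is a deliberately simplified sketch, not the paper's layer stack: it assumes the higher-resolution map `f2` has exactly twice the spatial size of the lower-resolution map `f1`, uses nearest-neighbour upsampling and plain 1x1 projections in place of bilinear upsampling, dilated convolution and batch normalisation, and assumes the two branches are merged by an element-wise sum followed by ReLU. All function names and weights are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a C x H x W nested list."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # duplicate columns
            rows.append(wide)
            rows.append(list(wide))                    # duplicate rows
        out.append(rows)
    return out


def project(fmap, weights):
    """1x1 convolution; `weights` is a C_out x C_in nested list."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[sum(wk[c] * fmap[c][i][j] for c in range(len(fmap)))
              for j in range(w)]
             for i in range(h)]
            for wk in weights]


def cascade_feature_fusion(f1, f2, w1, w2):
    """Fuse low-res f1 (C1 x H1 x W1) with high-res f2 (C2 x H2 x W2).

    Returns (fused, f1_up): `fused` has shape C3 x H2 x W2, and `f1_up` is the
    upsampled f1 that an auxiliary classifier would score against the label.
    """
    f1_up = upsample2x(f1)
    a = project(f1_up, w1)  # C3 x H2 x W2
    b = project(f2, w2)     # C3 x H2 x W2
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("f2 must have twice the spatial size of f1")
    fused = [[[max(0.0, x + y) for x, y in zip(ra, rb)]  # sum, then ReLU
              for ra, rb in zip(ca, cb)]
             for ca, cb in zip(a, b)]
    return fused, f1_up


f1 = [[[1.0, -2.0]]]                               # C1=1, H1=1, W1=2
f2 = [[[1.0] * 4, [1.0] * 4],
      [[0.0] * 4, [0.0] * 4]]                      # C2=2, H2=2, W2=4
w1 = [[1.0], [0.5]]                                # C3=2 outputs from C1=1
w2 = [[1.0, 0.0], [0.0, 1.0]]                      # C3=2 outputs from C2=2
fused, f1_up = cascade_feature_fusion(f1, f2, w1, w2)
print(len(fused), len(fused[0]), len(fused[0][0]))  # C3, H2, W2
```

During training, the auxiliary loss would be computed by classifying `f1_up` and comparing it with the ground-truth label at the same resolution; that part is left out here because the notes do not specify the loss.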