Idea

View image segmentation as a rendering problem and adapt a classical idea from computer graphics to efficiently "render" high-quality label maps: in each "rendering" iteration, focus on the most ambiguous points to recover detail on the finer grid.

Method

  1. A point selection strategy chooses a small number of real-valued points to make predictions on. These points should be located more densely near high-frequency areas, which avoids excessive computation over all pixels in the high-resolution output grid.
  2. In each iteration, PointRend upsamples its previously predicted segmentation using bilinear interpolation and then selects the most uncertain points (e.g., those with probabilities closest to 0.5 for a binary mask) on this denser grid. PointRend then computes the point-wise feature representation for each of these points and predicts their labels. This process is repeated until the segmentation is upsampled to the desired resolution.
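
One iteration of the loop in step 2 can be sketched roughly as follows, for a single binary mask. This is a minimal NumPy illustration, not the reference implementation: the helper names (`upsample2x_bilinear`, `most_uncertain_points`, `subdivision_step`) and the `point_head` callable are hypothetical, and the toy head at the bottom merely thresholds the interpolated value, where the real method predicts labels from point-wise features.

```python
import numpy as np

def upsample2x_bilinear(prob):
    """Bilinearly upsample an (H, W) probability map to (2H, 2W)."""
    h, w = prob.shape
    # Map output pixel centers back to input coordinates (half-pixel convention).
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = prob[y0][:, x0] * (1 - wx) + prob[y0][:, x1] * wx
    bot = prob[y1][:, x0] * (1 - wx) + prob[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def most_uncertain_points(prob, n_points):
    """Return (rows, cols) of the n_points pixels with probability closest to 0.5."""
    uncertainty = -np.abs(prob - 0.5)
    flat = np.argsort(uncertainty, axis=None)[::-1][:n_points]
    return np.unravel_index(flat, prob.shape)

def subdivision_step(prob, point_head, n_points):
    """Upsample 2x, then re-predict only the most uncertain points."""
    fine = upsample2x_bilinear(prob)
    rows, cols = most_uncertain_points(fine, n_points)
    # point_head is a hypothetical stand-in for the point-wise predictor.
    fine[rows, cols] = point_head(rows, cols, fine)
    return fine

def toy_head(rows, cols, fine):
    """Placeholder head: snaps the interpolated probability to 0 or 1."""
    return (fine[rows, cols] > 0.5).astype(float)

coarse = np.array([[0.1, 0.6],
                   [0.4, 0.9]])
refined = subdivision_step(coarse, toy_head, n_points=4)
print(refined.shape)  # (4, 4)
```

Calling `subdivision_step` repeatedly on its own output doubles the resolution each time, so only `n_points` predictions per iteration are needed rather than one per output pixel.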

Architecture

Select key points at which to refine the coarse prediction, so that fine details are predicted more accurately.
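
Because the selected points are real-valued, their features have to be read off the feature grid by interpolation. The sketch below shows one way this could look, assuming a `(C, H, W)` feature map and points given as `(y, x)` in pixel-index coordinates; the function name `sample_point_features` and the coordinate convention are illustrative assumptions, not taken from the notes above.

```python
import numpy as np

def sample_point_features(features, points):
    """Bilinearly interpolate a (C, H, W) feature map at N real-valued points.

    points: (N, 2) array of (y, x), where (0, 0) is the top-left pixel center.
    Returns an (N, C) array with one feature vector per point.
    """
    _, h, w = features.shape
    y = np.clip(points[:, 0], 0, h - 1)
    x = np.clip(points[:, 1], 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = y - y0
    wx = x - x0
    # Each indexed term has shape (C, N): the four neighbors of every point.
    top = features[:, y0, x0] * (1 - wx) + features[:, y0, x1] * wx
    bot = features[:, y1, x0] * (1 - wx) + features[:, y1, x1] * wx
    return (top * (1 - wy) + bot * wy).T

feats = np.arange(4, dtype=float).reshape(1, 2, 2)  # one channel: [[0, 1], [2, 3]]
pts = np.array([[0.5, 0.5], [0.0, 1.0], [1.0, 0.5]])
point_feats = sample_point_features(feats, pts)
print(point_feats.shape)  # (3, 1)
```

The resulting per-point vectors are what a point-wise predictor would consume to produce the refined labels.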