Motivation
- R-CNN is slow because it performs a CNN forward pass for each object proposal, without sharing computation
- Selective search is also time-consuming, but the paper does not optimize it
Method
- Take the input image and use selective search to generate object proposals
- Pass the whole input image into a CNN to extract feature maps
- Apply RoI Pooling to extract a fixed-size feature map for each object proposal (Key Optimization: the input of pooling is the projection of each object proposal onto the shared CNN feature map, so the CNN forward pass runs once per image instead of once per proposal)
- Compute the classification loss and box regression loss
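The projection step above can be sketched as follows. This is only an illustration under stated assumptions: the function name `project_rois` and the total stride of 16 are hypothetical choices, not taken from these notes. The stride depends on the backbone.

```python
import numpy as np

def project_rois(boxes, stride=16):
    """Map image-space boxes onto feature-map cells.

    boxes: (N, 4) array of (x1, y1, x2, y2) pixel coordinates, inclusive.
    stride: assumed total downsampling factor of the CNN (hypothetical value).
    Returns an (N, 4) integer array of inclusive feature-map coordinates.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    # A pixel at position p falls into feature-map cell floor(p / stride),
    # so flooring both corners gives the inclusive range of cells covered.
    return np.floor(boxes / stride).astype(np.int64)

# Two proposals on the same image reuse one feature map; only the
# coordinates are transformed, no extra CNN forward pass is needed.
proposals = [[32, 48, 159, 111], [5, 5, 10, 10]]
rois = project_rois(proposals)
```

Even a proposal smaller than one stride still covers one feature-map cell, so every RoI has something to pool from.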
RoI Pooling
RoI Pooling works like SPP: it extracts a fixed-size feature representation from a variable-sized RoI on the input feature map. This allows fully connected layers to follow the convolutional layers, even though the RoIs have different sizes.
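A minimal NumPy sketch of the idea, assuming max pooling over a grid of roughly equal bins. The function name `roi_max_pool`, the (C, H, W) layout, and the 2x2 output size are illustrative assumptions, and the bin rounding here may differ from any particular implementation.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a (C, H, W) feature map to (C, out_h, out_w).

    roi: (x1, y1, x2, y2) in feature-map coordinates, inclusive.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    region = feature_map[:, y1:y2 + 1, x1:x2 + 1]
    _, h, w = region.shape
    # Split the region's rows and columns into out_h and out_w bins.
    y_edges = np.linspace(0, h, out_h + 1)
    x_edges = np.linspace(0, w, out_w + 1)
    pooled = np.empty((feature_map.shape[0], out_h, out_w),
                      dtype=feature_map.dtype)
    for i in range(out_h):
        ys = int(np.floor(y_edges[i]))
        # Each bin keeps at least one row, even if the RoI is tiny.
        ye = max(int(np.ceil(y_edges[i + 1])), ys + 1)
        for j in range(out_w):
            xs = int(np.floor(x_edges[j]))
            xe = max(int(np.ceil(x_edges[j + 1])), xs + 1)
            # Take the per-channel maximum inside the bin.
            pooled[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return pooled

fmap = np.arange(24, dtype=np.float32).reshape(1, 4, 6)
big = roi_max_pool(fmap, (0, 0, 5, 3))    # 4x6 region
small = roi_max_pool(fmap, (1, 1, 3, 2))  # 2x3 region
```

Both `big` and `small` have shape (1, 2, 2) although the RoIs differ in size, which is what lets a fully connected layer with a fixed input dimension consume them.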