Motivation
An improvement of YOLO v1; the same framework is also trained jointly on detection and classification data so it can detect far more object categories (YOLO9000).
Better (Accuracy Improvement)
- Apply Batch Normalization
- Use a high-resolution classifier backbone (fine-tune the classification network at 448×448 before detection training), like SSD
- Use anchor boxes instead of directly predicting exact coordinates, as Faster R-CNN and SSD do
- Rather than hand-picking the dimensions (heights and widths) of the anchor boxes, the new architecture first collects the ground-truth box dimensions in the training data and then runs $k$-means to find the best prior dimensions (see the clustering sketch after this list)
- Predict the position of the box center using relative coordinates squashed by a logistic function (so that they fall in $(0, 1)$) instead of exact positions (see the decoding equations after this list)
- Get features for multi-scale objects via a passthrough layer that concatenates low-level features with high-level features (whereas Faster R-CNN and SSD run their detection heads on feature maps of different sizes); a passthrough sketch follows this list
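
Here is a minimal NumPy sketch of the anchor clustering above, assuming ground-truth boxes are given as (width, height) pairs and using the paper's distance metric $d(\text{box}, \text{centroid}) = 1 - \text{IoU}(\text{box}, \text{centroid})$; the function names are mine, not the paper's:

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, assuming all boxes share the same center."""
    inter = np.minimum(boxes[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + \
            centroids[None, :, 0] * centroids[None, :, 1] - inter
    return inter / union  # shape (num_boxes, k)

def kmeans_anchors(boxes, k=5, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs with d = 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # minimizing 1 - IoU is the same as maximizing IoU
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids
```

The paper settles on $k = 5$ as a good trade-off between recall and model complexity.

For reference, these are the paper's decoding equations: given the offsets $(c_x, c_y)$ of the grid cell from the image's top-left corner and the anchor prior dimensions $(p_w, p_h)$, the network's raw outputs $(t_x, t_y, t_w, t_h)$ become

$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x, \qquad & b_w &= p_w e^{t_w}, \\
b_y &= \sigma(t_y) + c_y, \qquad & b_h &= p_h e^{t_h},
\end{aligned}
$$

where the logistic function $\sigma$ constrains the predicted center to stay inside its grid cell.

And here is a PyTorch sketch of the passthrough idea, assuming the standard YOLOv2 feature-map shapes (the function name is mine): it reorganizes each 2×2 spatial block of the fine feature map into channels (space-to-depth) so it can be concatenated with the coarse map.

```python
import torch

def passthrough(fine, coarse, stride=2):
    """Space-to-depth reorg of a high-resolution map, then channel-wise concat.
    fine: (N, C, s*H, s*W), coarse: (N, C2, H, W) -> (N, C*s*s + C2, H, W)."""
    n, c, h, w = fine.shape
    fine = fine.view(n, c, h // stride, stride, w // stride, stride)
    fine = fine.permute(0, 1, 3, 5, 2, 4).contiguous()
    fine = fine.view(n, c * stride * stride, h // stride, w // stride)
    return torch.cat([fine, coarse], dim=1)

# With the paper's shapes: 26x26x512 reorganizes into 13x13x2048,
# which is concatenated with the 13x13x1024 map -> 13x13x3072.
out = passthrough(torch.randn(1, 512, 26, 26), torch.randn(1, 1024, 13, 13))
assert out.shape == (1, 3072, 13, 13)
```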
Faster
- Use the customized backbone network Darknet-19: 19 convolutional layers (mostly 3×3, interleaved with 1×1 channel-compression layers) plus 5 max-pooling layers; a sketch follows
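
Here is a PyTorch sketch of the Darknet-19 classifier, following the layer table in the paper (the config encoding is my own shorthand):

```python
import torch.nn as nn

# 'M' marks a 2x2 maxpool; tuples are (kernel_size, out_channels).
# 3x3 convs double the channel count after each pool; 1x1 convs compress.
CFG = [(3, 32), 'M', (3, 64), 'M', (3, 128), (1, 64), (3, 128), 'M',
       (3, 256), (1, 128), (3, 256), 'M',
       (3, 512), (1, 256), (3, 512), (1, 256), (3, 512), 'M',
       (3, 1024), (1, 512), (3, 1024), (1, 512), (3, 1024)]

def darknet19(num_classes=1000):
    layers, in_ch = [], 3
    for item in CFG:
        if item == 'M':
            layers.append(nn.MaxPool2d(2, 2))
        else:
            k, out_ch = item
            layers += [nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
                       nn.BatchNorm2d(out_ch),  # every conv is batch-normalized
                       nn.LeakyReLU(0.1)]
            in_ch = out_ch
    # classification head: 1x1 conv to class scores + global average pooling
    layers += [nn.Conv2d(in_ch, num_classes, 1),
               nn.AdaptiveAvgPool2d(1), nn.Flatten()]
    return nn.Sequential(*layers)
```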
Stronger (More Categories Supported)
Since object detection datasets are far smaller than classification datasets, the authors propose a mechanism for jointly training on classification and detection data.
When the network sees an image labelled for detection, it back-propagates the full YOLOv2 loss function; when it sees a classification image, only the loss from the classification-specific parts of the architecture is back-propagated.
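
A simplified sketch of that dispatch, assuming hypothetical `detection_loss_fn` and `classification_loss_fn` callables that stand in for the paper's full YOLOv2 loss and its hierarchical (WordTree) classification loss:

```python
def joint_training_step(model, images, targets, is_detection, optimizer,
                        detection_loss_fn, classification_loss_fn):
    """One optimization step on a batch from either dataset (sketch only)."""
    preds = model(images)
    if is_detection:
        # detection image: back-propagate the full YOLOv2 loss
        loss = detection_loss_fn(preds, targets)
    else:
        # classification image: back-propagate only the classification loss
        loss = classification_loss_fn(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```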