Paper Review: CenterNet: Keypoint Triplets for Object Detection

Problem the paper is solving

The paper aims to improve keypoint-based object detection.
The earlier keypoint-based approach (CornerNet) often produces incorrect bounding boxes.
The paper addresses this by detecting each object as a triplet of keypoints rather than a pair.
It introduces two modules, cascade corner pooling and center pooling, to enrich the information collected by the keypoints.

Issues with anchor-based approaches

Problems with CornerNet

Improvements

CenterNet adds the ability to perceive the visual patterns within each proposed region.
If a predicted bounding box has a high IoU with the ground truth, there is a high probability that the center keypoint lies in the central region of the box.
Center pooling - helps the center keypoint obtain more recognizable visual patterns.
It takes the maximum response in both the horizontal and vertical directions and sums them.
Cascade corner pooling - equips the original corner pooling module with the ability to perceive internal information.
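
The central-region check can be sketched in a few lines. The notes above do not give the exact definition of the central region, so this sketch assumes it is the middle 1/n of the box in each dimension, following the formulation as I recall it from the paper. The function names and the default n = 3 are illustrative assumptions.

```python
def central_region(tlx, tly, brx, bry, n=3):
    """Return (ctlx, ctly, cbrx, cbry): the middle 1/n of the box per dimension.

    Assumption: n is an odd scale factor; the default here is illustrative.
    """
    ctlx = ((n + 1) * tlx + (n - 1) * brx) / (2 * n)
    ctly = ((n + 1) * tly + (n - 1) * bry) / (2 * n)
    cbrx = ((n - 1) * tlx + (n + 1) * brx) / (2 * n)
    cbry = ((n - 1) * tly + (n + 1) * bry) / (2 * n)
    return ctlx, ctly, cbrx, cbry


def center_in_region(cx, cy, box, n=3):
    """True if the center keypoint (cx, cy) falls inside the box's central region."""
    ctlx, ctly, cbrx, cbry = central_region(*box, n=n)
    return ctlx <= cx <= cbrx and ctly <= cy <= cbry
```

A box proposed from a corner pair would be kept only if a center keypoint of the same class passes this check. A larger n gives a smaller central region, which is a stricter test.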

CornerNet as baseline

Produces two heatmaps - one for top-left corners and one for bottom-right corners.
Heatmaps - represent the locations of keypoints and assign a confidence score to each point. The network also predicts an embedding and a group of offsets for each corner.
Embeddings - used to identify whether two corners belong to the same object.
Offsets - learn to remap corners from the heatmaps to the input image.
Generating bounding boxes - the top-k top-left and bottom-right corners are selected according to their scores. The distance between the embedding vectors of each pair of corners is then calculated, and a pair is treated as belonging to the same object if the distance is below a threshold.
Score of a bounding box - the average of the scores of its corner pair.
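
The pairing step above can be illustrated with a minimal sketch. It assumes each corner is given as (x, y, score, embedding) with a scalar embedding. The function name pair_corners and the threshold value max_dist = 0.5 are hypothetical choices for the example, not values taken from the paper.

```python
def pair_corners(top_left, bottom_right, max_dist=0.5):
    """Pair corners into boxes (x1, y1, x2, y2, score).

    top_left, bottom_right: lists of (x, y, score, embedding) tuples,
    assumed to be the already selected top-k corners.
    max_dist: hypothetical embedding-distance threshold.
    """
    boxes = []
    for x1, y1, s1, e1 in top_left:
        for x2, y2, s2, e2 in bottom_right:
            # A bottom-right corner must lie below and to the right.
            if x2 <= x1 or y2 <= y1:
                continue
            # Corners of the same object should have close embeddings.
            if abs(e1 - e2) > max_dist:
                continue
            # Box score is the average of the two corner scores.
            boxes.append((x1, y1, x2, y2, (s1 + s2) / 2))
    return boxes
```

With two objects whose embeddings sit near 1.0 and 3.0, only the two matching pairs survive and the cross pairs are rejected by the distance test.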

CenterNet

Center Pooling

Geometric centers do not necessarily convey very recognizable visual patterns.
Backbone -> feature map -> find the maximum value in both the horizontal and vertical directions at each location and add them together.
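
As a rough illustration, center pooling on a single 2D feature map could look like the NumPy sketch below. It is a simplification: a single channel is assumed, and the convolution layers that surround the pooling step in the actual module are left out.

```python
import numpy as np


def center_pooling(feat):
    """Center pooling on one 2D feature map of shape (H, W).

    Each output value is the maximum of its row plus the maximum of its column.
    """
    row_max = feat.max(axis=1, keepdims=True)  # shape (H, 1): horizontal maximum
    col_max = feat.max(axis=0, keepdims=True)  # shape (1, W): vertical maximum
    return row_max + col_max                   # broadcasts to (H, W)
```

Because every location sees the strongest response in its whole row and column, a center pixel on a featureless part of an object can still pick up a distinctive part elsewhere on that object.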

Cascade Corner Pooling

CornerNet - uses corner pooling, which finds the maximum values along the boundary directions to determine corners. This makes corners sensitive to edges.
Cascade corner pooling - first finds the maximum value along the boundary, then looks inward from the location of the boundary maximum to find an internal maximum, and adds the two values together.
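
The sketch below is a literal, loop-based reading of that description for one direction only: the top boundary of a box whose top-left corner is at (i, j). It works on a single 2D map and is not the convolution-based module from the paper. The function name cascade_top_pool is made up for the example.

```python
import numpy as np


def cascade_top_pool(feat):
    """Cascade pooling along the top boundary, for one 2D map of shape (H, W).

    For a candidate top-left corner at (i, j):
      1. look right along row i for the boundary maximum,
      2. from that location look down (into the box) for the internal maximum,
      3. add the two values.
    """
    height, width = feat.shape
    out = np.zeros((height, width), dtype=float)
    for i in range(height):
        for j in range(width):
            j_star = j + int(np.argmax(feat[i, j:]))  # column of the boundary maximum
            boundary_max = feat[i, j_star]
            internal_max = feat[i:, j_star].max()     # look inward (downward) from there
            out[i, j] = boundary_max + internal_max
    return out
```

Plain corner pooling would stop after step 1. The extra inward look is what lets a corner respond to content inside the object as well as to its edge.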

Building Center Pooling and Cascade Corner Pooling

Training

Input image - 511 x 511 -> heatmaps of size 128 x 128.
Training loss - focal loss for corner and center keypoints + pull loss for corners + push loss for corners + L1 losses for corners to learn offsets.
alpha, beta and gamma are the weights for the pull, push and offset losses (0.1, 0.1 and 1 by default).
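
To make the weighting concrete, here is a small sketch of how the terms listed above might be combined. It assumes each individual loss has already been computed as a scalar. The function and argument names are illustrative, and the offset term is taken as one pre-summed value.

```python
def total_loss(det_corner, det_center, pull, push, offset,
               alpha=0.1, beta=0.1, gamma=1.0):
    """Weighted sum of the training losses.

    det_corner, det_center: focal losses for corner and center keypoints.
    pull, push: embedding losses for corners.
    offset: L1 offset loss (assumed already summed into one scalar).
    """
    return det_corner + det_center + alpha * pull + beta * push + gamma * offset
```

With the default weights, the detection and offset terms count fully while the pull and push terms are scaled down by a factor of ten.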

Inference

The top 70 center keypoints, top 70 top-left corners and top 70 bottom-right corners are selected.
Soft-NMS is used to remove redundant bounding boxes.
The top 100 bounding boxes are selected based on their scores.
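
The keypoint selection step can be sketched as a top-k lookup on a heatmap. This example assumes a single-class 2D heatmap and skips any local-maximum filtering a real pipeline might apply first. The function name top_k_keypoints is hypothetical.

```python
import numpy as np


def top_k_keypoints(heatmap, k=70):
    """Return the k highest-scoring locations as (y, x, score), best first.

    heatmap: 2D array of confidence scores for one class.
    """
    flat = heatmap.ravel()
    k = min(k, flat.size)
    idx = np.argpartition(flat, -k)[-k:]      # indices of the k largest scores, unordered
    idx = idx[np.argsort(flat[idx])[::-1]]    # order them by descending score
    ys, xs = np.unravel_index(idx, heatmap.shape)
    return [(int(y), int(x), float(flat[i])) for y, x, i in zip(ys, xs, idx)]
```

The same helper would be called three times, once each for the center, top-left and bottom-right heatmaps, before corners are paired and the boxes are checked against center keypoints.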

Results

Reference

Link - 1904.08189v3.pdf (arxiv.org)

All figures are taken from the link mentioned above.
