Paper Review: CenterNet: Keypoint Triplets for Object Detection
- Akshat Mandloi
- Feb 8, 2023
- 3 min read
Problem the paper is solving
The paper aims to improve keypoint-based object detection.
The earlier keypoint-based approach (CornerNet) often produces incorrect bounding boxes.
This paper addresses the issue by detecting each object as a triplet of keypoints rather than a pair.
It introduces two modules - cascade corner pooling and center pooling - to enrich the information collected by the corner and center keypoints.
Issues with anchor based approaches
Anchor sizes and aspect ratios need to be designed manually
A large number of anchors is needed to achieve a high IoU with ground-truth boxes
Anchors are usually not aligned with ground-truth boxes
Problems with CornerNet
CornerNet - each object is represented by a pair of corner keypoints
but it has a weak ability to refer to the global information of an object
each corner is sensitive to object boundaries
Improvements
adds the ability to perceive visual patterns within each proposed region
if a predicted bounding box has a high IoU with the ground truth, there is a high probability that the center keypoint lies in its central region
center pooling - helps the center keypoint obtain more recognizable visual patterns
takes the maximum responses in the horizontal and vertical directions and sums them
cascade corner pooling - equips the original corner pooling module with the ability to perceive internal information

CornerNet as baseline
produces two heatmaps - one for top-left corners and one for bottom-right corners
heatmaps - represent the locations of keypoints and assign a confidence score to each point; the network also predicts an embedding and a group of offsets for each corner
embeddings - used to identify whether two corners come from the same object
offsets - learn to remap corners from the heatmaps to the input image
generating bounding boxes - the top-k top-left and bottom-right corners are selected according to their scores; the distance between the embedding vectors of each corner pair is computed, and the pair is treated as belonging to the same object when the distance is below a threshold
score of a bounding box - the average of the scores of its corner pair
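The pairing step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `pair_corners`, the scalar embeddings, and the default `emb_thresh=0.5` are assumptions made for the example, and class matching between corners is left out for brevity.

```python
import numpy as np

def pair_corners(tl_xy, tl_score, tl_emb, br_xy, br_score, br_emb, emb_thresh=0.5):
    """Pair top-left and bottom-right corners into candidate boxes.

    emb_thresh is a hypothetical default chosen for this sketch.
    Returns (boxes, scores) with boxes as rows of (x1, y1, x2, y2).
    """
    boxes, scores = [], []
    for i in range(len(tl_xy)):
        for j in range(len(br_xy)):
            # A valid box needs the top-left corner above and to the left.
            if tl_xy[i][0] >= br_xy[j][0] or tl_xy[i][1] >= br_xy[j][1]:
                continue
            # Corners of the same object should have similar embeddings.
            if abs(tl_emb[i] - br_emb[j]) > emb_thresh:
                continue
            boxes.append([*tl_xy[i], *br_xy[j]])
            # Box score is the average of the two corner scores.
            scores.append((tl_score[i] + br_score[j]) / 2.0)
    return np.array(boxes, dtype=float).reshape(-1, 4), np.array(scores, dtype=float)
```

In practice the corners passed in would already be the top-k by score, so the double loop stays small.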
CenterNet
Each object is represented by a triplet - an additional keypoint in the center
Additionally predict a heatmap for the center keypoints, along with their offsets
Use the CornerNet method to generate the top-k boxes
Procedure to filter out incorrect boxes -
select the top-k center keypoints according to their scores
use the offsets to remap these center keypoints to the input image
define a central region for each box and check whether it contains a center keypoint of the same class as the corner keypoints
if such a center keypoint is detected, preserve the box; otherwise remove it
score = average of the scores of the three keypoints
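As a rough sketch of the filtering procedure above, assuming the central region of each box has already been computed: the function name `filter_by_center`, the tuple layouts, and the choice to use the highest-scoring matching center are all assumptions for illustration.

```python
def filter_by_center(boxes, regions, centers):
    """Keep boxes whose central region contains a same-class center keypoint.

    boxes:   list of (tl_score, br_score, cls)
    regions: list of (x1, y1, x2, y2), the central region of each box
    centers: list of (x, y, score, cls), already remapped to image coordinates
    Returns a list of (box_index, new_score).
    """
    kept = []
    for idx, ((tl_s, br_s, cls), (x1, y1, x2, y2)) in enumerate(zip(boxes, regions)):
        # Scores of same-class center keypoints inside the central region.
        hits = [s for (cx, cy, s, c) in centers
                if c == cls and x1 <= cx <= x2 and y1 <= cy <= y2]
        if not hits:
            continue  # no supporting center keypoint: drop the box
        # Assumption: if several centers match, use the highest-scoring one.
        kept.append((idx, (tl_s + br_s + max(hits)) / 3.0))
    return kept
```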
Choosing the central region size - affects precision and recall
A scale-aware central region is proposed

n = 3 for boxes smaller than 150 (width) and n = 5 for boxes larger than 150
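A minimal sketch of the scale-aware central region, assuming the paper's definition in which the region is the middle 1/n of the box along each axis. The function name `central_region` and the `scale_thresh` parameter are illustrative, "scale" is taken to be the box width as in the note above, and treating a width of exactly 150 as "large" is an assumption.

```python
def central_region(tl_x, tl_y, br_x, br_y, scale_thresh=150):
    """Return (ctl_x, ctl_y, cbr_x, cbr_y), the central region of a box."""
    # Assumption: scale is measured by box width; exactly 150 falls into n = 5.
    n = 3 if (br_x - tl_x) < scale_thresh else 5
    # Top-left corner of the central region.
    ctl_x = ((n + 1) * tl_x + (n - 1) * br_x) / (2 * n)
    ctl_y = ((n + 1) * tl_y + (n - 1) * br_y) / (2 * n)
    # Bottom-right corner of the central region.
    cbr_x = ((n - 1) * tl_x + (n + 1) * br_x) / (2 * n)
    cbr_y = ((n - 1) * tl_y + (n + 1) * br_y) / (2 * n)
    return ctl_x, ctl_y, cbr_x, cbr_y
```

With n = 3 the region covers the middle third of the box in each direction, and with n = 5 the middle fifth, so large boxes get a relatively smaller central region.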
Center Pooling
geometric centers do not necessarily convey very recognizable visual patterns
backbone -> feature map -> for each location, find the maximum value in both the horizontal and vertical directions and add them together
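The pooling rule above can be illustrated on a single-channel feature map with NumPy. This is only a sketch of the arithmetic: the name `center_pool` is made up for the example, and the real module operates on multi-channel features with learned convolutions around the pooling.

```python
import numpy as np

def center_pool(feat):
    """feat: (H, W) array. Each output location holds the maximum of its row
    plus the maximum of its column."""
    row_max = feat.max(axis=1, keepdims=True)  # (H, 1): max along the horizontal direction
    col_max = feat.max(axis=0, keepdims=True)  # (1, W): max along the vertical direction
    return row_max + col_max                   # broadcasts to (H, W)
```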
Cascade Corner Pooling
CornerNet - uses corner pooling - it finds the maximum values along the boundary directions to determine corners, which makes corners sensitive to edges
Cascade corner pooling - first finds the maximum value along the boundary, then looks inward from the location of the boundary maximum to find an internal maximum, and adds the two values together
Building Center Pooling and Cascade Corner Pooling
Both can be built by combining corner pooling in different directions
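One way to picture this composition is with directional running maxima on a single-channel map. The sketch below assumes the CornerNet naming convention (top pooling scans from each location down to the bottom edge, left pooling from each location to the right edge). The `cascade_top_pool` function is a heavily simplified assumption: it only shows a left pooling feeding into a top pooling, and omits the learned convolutions and the exact branch merging of the real module.

```python
import numpy as np

def left_pool(f):    # max over columns j..W-1 of each row
    return np.maximum.accumulate(f[:, ::-1], axis=1)[:, ::-1]

def right_pool(f):   # max over columns 0..j of each row
    return np.maximum.accumulate(f, axis=1)

def top_pool(f):     # max over rows i..H-1 of each column
    return np.maximum.accumulate(f[::-1], axis=0)[::-1]

def bottom_pool(f):  # max over rows 0..i of each column
    return np.maximum.accumulate(f, axis=0)

def center_pool_from_corner_pools(f):
    # Chaining two opposite directions yields the full row / column maximum.
    horizontal = right_pool(left_pool(f))
    vertical = bottom_pool(top_pool(f))
    return horizontal + vertical

def cascade_top_pool(f):
    # Simplified assumption: add the left-pooled response to the features,
    # then apply top pooling to the sum.
    return top_pool(f + left_pool(f))
```

Chaining left and right pooling gives every location the maximum of its whole row, which is why center pooling needs no new primitive beyond the four directional pools.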

Training
Input image - 511×511 -> heatmaps of size 128×128
Training loss - focal loss for corner and center keypoints + pull loss for corners + push loss for corners + L1 offset losses for corner and center keypoints
alpha, beta and gamma are the weights for the pull, push and offset losses (0.1, 0.1 and 1 by default)
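As a small sketch of how the terms combine, assuming each individual loss has already been computed as a scalar and that the offset term covers both corner and center keypoints. The function and argument names are illustrative only.

```python
def centernet_loss(det_corner, det_center, pull, push, off_corner, off_center,
                   alpha=0.1, beta=0.1, gamma=1.0):
    """Weighted sum of the training losses described above.

    det_*: focal (detection) losses; pull/push: corner embedding losses;
    off_*: L1 offset losses. All inputs are assumed to be scalars.
    """
    return (det_corner + det_center
            + alpha * pull + beta * push
            + gamma * (off_corner + off_center))
```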
Inference
the top 70 center keypoints, top 70 top-left corners and top 70 bottom-right corners are selected
Soft-NMS is used to remove redundant boxes
the top 100 bounding boxes are selected based on their scores
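The top-k selection of keypoints from a heatmap could look roughly like this. It is a sketch assuming a heatmap of shape (classes, height, width); the name `top_k_keypoints` is made up for the example, and any local-maximum suppression an implementation might apply beforehand is omitted.

```python
import numpy as np

def top_k_keypoints(heatmap, k=70):
    """heatmap: (C, H, W) array of scores.
    Returns (scores, classes, ys, xs) for the k highest-scoring locations."""
    c, h, w = heatmap.shape
    flat = heatmap.reshape(-1)
    k = min(k, flat.size)
    # Indices of the k largest scores, in descending order.
    idx = np.argsort(flat)[::-1][:k]
    classes, ys, xs = np.unravel_index(idx, (c, h, w))
    return flat[idx], classes, ys, xs
```

The same function would be called three times at inference, once each for the center, top-left and bottom-right heatmaps.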
Results

Reference
Link - 1904.08189v3.pdf (arxiv.org)
All figures are taken from the link mentioned above.