Paper Review: TinaFace - Strong but Simple Baseline for Face Detection
- Akshat Mandloi
- Mar 9, 2023
- 2 min read
Problem the paper is solving
Establish that there is little to no gap between face detection and generic object detection.
Borrow modules and techniques already established in generic object detection to build a simple but strong baseline.
Similarities between face detection and generic object detection
Data perspective - pose, scale, occlusion, illumination, blur - all occur in generic objects too
Expression and makeup can be mapped to distortion and colour in objects.
Challenges like multi-scale, small faces and dense scenes all have counterparts in generic object detection
Contributions
Face detection - one class generic object detection problem
Use modules and techniques from generic object detection to give a simple strong baseline - TinaFace
Achieves 2nd place on the WIDER FACE test benchmark under the hard setting.
TinaFace

Start from RetinaNet
Deformable Convolutional Networks
CNNs have strong built-in priors about sampling positions (a fixed, regular grid)
This makes it hard for the network to encode complex geometric transformations, limiting its capability
DCN is therefore applied in stage 4 and stage 5 of the backbone
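A minimal sketch (not the authors' code) of how the 3x3 convolutions in ResNet stages 4 and 5 could be swapped for deformable convolutions, using torchvision's DeformConv2d; the block structure and initialization here are assumptions for illustration.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableConvBlock(nn.Module):
    """3x3 deformable convolution: sampling offsets are predicted from the
    input, so the kernel can adapt to geometric variation (pose, scale)."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # 2 offset values (dx, dy) per kernel position -> 2 * 3 * 3 = 18 channels
        self.offset_conv = nn.Conv2d(in_channels, 18, kernel_size=3,
                                     stride=stride, padding=1)
        self.deform_conv = DeformConv2d(in_channels, out_channels, kernel_size=3,
                                        stride=stride, padding=1, bias=False)
        # zero-initialized offsets start the block close to a regular convolution
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)

    def forward(self, x):
        offsets = self.offset_conv(x)
        return self.deform_conv(x, offsets)

# Replacing the 3x3 convs inside the bottlenecks of ResNet-50 layer3/layer4
# (stages 4 and 5) with DeformableConvBlock mirrors the DCN change described above.
```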
Inception Module
Dealing with multi-scale - multi-scale training, FPN and multi-scale testing are the most common ways
The inception module runs 3x3 convolution layers in parallel branches to form features with different receptive fields and combines them, helping capture the object as well as its context
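A minimal sketch of such a block; the number of branches, their depth and the 1x1 fusion are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel branches of stacked 3x3 convs give effective receptive fields
    of roughly 3x3, 5x5 and 7x7; their outputs are concatenated and fused."""
    def __init__(self, channels):
        super().__init__()
        def stacked_3x3(n):
            layers = []
            for _ in range(n):
                layers += [nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                           nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)
        self.branch1 = stacked_3x3(1)   # ~3x3 receptive field
        self.branch2 = stacked_3x3(2)   # ~5x5
        self.branch3 = stacked_3x3(3)   # ~7x7
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        out = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return self.fuse(out)
```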
IoU Aware Branch
A method to relieve the mismatch between classification score and localization accuracy in single-stage detectors
It helps re-sort boxes by a more reliable confidence and suppress false-positive detections
At inference time, the final detection confidence combines the classification score p with the predicted IoU (as in the IoU-aware single-stage detector):

score = p^alpha * IoU^(1 - alpha), with alpha in [0, 1] balancing the two terms
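A small sketch of that rescoring; alpha = 0.5 here is an assumed value for the hyperparameter.

```python
import torch

def iou_aware_score(cls_score: torch.Tensor,
                    pred_iou: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """Final detection confidence: score = p**alpha * iou**(1 - alpha).
    Both inputs are expected to be probabilities in [0, 1]."""
    return cls_score.pow(alpha) * pred_iou.pow(1.0 - alpha)

# usage: rank boxes / run NMS on the combined score instead of the raw cls score
# final = iou_aware_score(cls_prob, iou_prob, alpha=0.5)
```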
Distance IoU Loss
Smooth L1 loss is not consistent with IoU, the evaluation metric for box regression.
DIoU loss is also more friendly to small objects than other IoU-based losses.
L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2

b and b_gt denote the central points of the predicted and ground-truth boxes
rho(.) is the Euclidean distance
c is the diagonal length of the smallest enclosing box covering the two boxes
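A self-contained sketch of this loss for boxes in (x1, y1, x2, y2) format; the mean reduction and the eps term are assumptions, not the paper's exact implementation.

```python
import torch

def diou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Distance-IoU loss: 1 - IoU + rho^2(b, b_gt) / c^2."""
    # intersection area
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # rho^2: squared distance between the box centers
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)

    # c^2: squared diagonal of the smallest box enclosing both boxes
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + eps

    return (1 - iou + rho2 / c2).mean()
```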
Experiments
Implementation details
Feature extractor - ResNet-50 with FPN; 6 FPN levels, from P2 to P7.
Losses: classification - focal loss, regression - DIoU Loss, IoU prediction - cross-entropy loss
Normalization - Group Normalization, which is stable and independent of batch size; with Batch Normalization, model performance drops as the batch size shrinks.
Anchor settings - 6 anchor scales from the set 2^(4/3) * {4, 8, 16, 32, 64, 128} are used for the 6 FPN levels; the IoU threshold for positive matching is set to 0.35.
Data augmentation - crop a square patch of random size from the original image, apply photometric distortion and horizontal flip with probability 0.5, then resize to 640x640 and normalize.
Training - SGD optimizer with batch size 4 (single GPU); the learning rate is annealed, with a warmup over the first 500 iterations (a sketch follows below).
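A rough sketch of the SGD-plus-warmup setup; the base learning rate, weight decay and annealing milestones are placeholders, not the paper's exact values.

```python
import torch

def build_optimizer_and_scheduler(model, base_lr=0.01, warmup_iters=500):
    """SGD with momentum, a linear per-iteration warmup, then step annealing."""
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                                momentum=0.9, weight_decay=5e-4)

    def lr_lambda(it):
        if it < warmup_iters:
            # linear warmup from 10% to 100% of the base learning rate
            return 0.1 + 0.9 * it / warmup_iters
        # placeholder step annealing; the milestones are assumptions
        if it < 50_000:
            return 1.0
        return 0.1 if it < 65_000 else 0.01

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# training loop sketch (scheduler stepped per iteration):
# for it, (images, targets) in enumerate(loader):
#     loss = model(images, targets)
#     optimizer.zero_grad(); loss.backward(); optimizer.step(); scheduler.step()
```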
Results
Precision-Recall Curves on WIDER FACE validation and test subsets.
