Paper Review: TinaFace - Strong but Simple Baseline for Face Detection
- Akshat Mandloi
- Mar 9, 2023
- 2 min read
Problem the paper is solving
Establish that there is little to no gap between face detection and generic object detection.
Borrow modules and techniques already established in generic object detection to build a simple but strong baseline.
Similarities between face detection and generic object detection
Data perspective - pose, scale, occlusion, illumination, blur - all occur in generic objects too
Expression and makeup can be mapped to distortion and colour in objects.
Challenges like multi-scale, small faces and dense scenes all have counterparts in generic object detection
Contributions
Face detection - one class generic object detection problem
Use modules and techniques from generic object detection to give a simple strong baseline - TinaFace
Achieves 2nd place on the WIDER FACE test benchmark under the hard setting.
TinaFace

Start from RetinaNet
Deformable Convolutional Networks
CNNs have strong built-in priors about sampling positions (a fixed, regular grid)
This makes it hard for the network to encode complex geometric transformations, limiting its capability
DCN is therefore applied in stage 4 and stage 5 of the backbone
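A minimal sketch (not the authors' code) of how the 3x3 convolutions in ResNet stages 4 and 5 could be swapped for deformable convolutions, using torchvision's DeformConv2d; the block structure and initialization here are assumptions for illustration.

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableConvBlock(nn.Module):
    """3x3 deformable convolution: sampling offsets are predicted from the
    input, so the kernel can adapt to geometric variation (pose, scale)."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # 2 offset values (dx, dy) per kernel position -> 2 * 3 * 3 = 18 channels
        self.offset_conv = nn.Conv2d(in_channels, 18, kernel_size=3,
                                     stride=stride, padding=1)
        self.deform_conv = DeformConv2d(in_channels, out_channels, kernel_size=3,
                                        stride=stride, padding=1, bias=False)
        # zero-initialized offsets start the block close to a regular convolution
        nn.init.zeros_(self.offset_conv.weight)
        nn.init.zeros_(self.offset_conv.bias)

    def forward(self, x):
        offsets = self.offset_conv(x)
        return self.deform_conv(x, offsets)

# Replacing the 3x3 convs inside the bottlenecks of ResNet-50 layer3/layer4
# (stages 4 and 5) with DeformableConvBlock mirrors the DCN change described above.
```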
Inception Module
Dealing with multi-scale - multi-scale training, FPN and multi-scale testing are the most common ways
The inception module runs 3x3 convolution layers in parallel branches to form features with different receptive fields and combines them, helping capture the object as well as its context
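A minimal sketch of such a block; the number of branches, their depth and the 1x1 fusion are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel branches of stacked 3x3 convs give effective receptive fields
    of roughly 3x3, 5x5 and 7x7; their outputs are concatenated and fused."""
    def __init__(self, channels):
        super().__init__()
        def stacked_3x3(n):
            layers = []
            for _ in range(n):
                layers += [nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                           nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)
        self.branch1 = stacked_3x3(1)   # ~3x3 receptive field
        self.branch2 = stacked_3x3(2)   # ~5x5
        self.branch3 = stacked_3x3(3)   # ~7x7
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        out = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return self.fuse(out)
```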
IoU Aware Branch
A method to relieve the mismatch between classification score and localization accuracy in single-stage detectors
It helps re-sort boxes by a more reliable confidence and suppress false-positive detections
At inference time, the final detection confidence combines the classification score p with the predicted IoU (as in the IoU-aware single-stage detector):

score = p^alpha * IoU^(1 - alpha), with alpha in [0, 1] balancing the two terms
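A small sketch of that rescoring; alpha = 0.5 here is an assumed value for the hyperparameter.

```python
import torch

def iou_aware_score(cls_score: torch.Tensor,
                    pred_iou: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """Final detection confidence: score = p**alpha * iou**(1 - alpha).
    Both inputs are expected to be probabilities in [0, 1]."""
    return cls_score.pow(alpha) * pred_iou.pow(1.0 - alpha)

# usage: rank boxes / run NMS on the combined score instead of the raw cls score
# final = iou_aware_score(cls_prob, iou_prob, alpha=0.5)
```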
Distance IoU Loss
Smooth L1 loss is not consistent with IoU, the evaluation metric for box regression.
DIoU loss is also more friendly to small objects than other IoU-based losses.
L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2

b and b_gt denote the central points of the predicted and ground-truth boxes
rho(.) is the Euclidean distance
c is the diagonal length of the smallest enclosing box covering the two boxes
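A self-contained sketch of this loss for boxes in (x1, y1, x2, y2) format; the mean reduction and the eps term are assumptions, not the paper's exact implementation.

```python
import torch

def diou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Distance-IoU loss: 1 - IoU + rho^2(b, b_gt) / c^2."""
    # intersection area
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # rho^2: squared distance between the box centers
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)

    # c^2: squared diagonal of the smallest box enclosing both boxes
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + eps

    return (1 - iou + rho2 / c2).mean()
```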
Experiments
Implementation details
Feature extractor - ResNet-50 with FPN; 6 FPN levels, from P2 to P7.
Losses: classification - focal loss, regression - DIoU Loss, IoU prediction - cross-entropy loss
Normalization - Group Normalization, which is stable and independent of batch size; with Batch Normalization, model performance drops as the batch size shrinks.
Anchor settings - 6 anchor scales from the set 2^(4/3) * {4, 8, 16, 32, 64, 128} are used for the 6 FPN levels; the IoU threshold for positive matching is set to 0.35.
Data augmentation - crop a square patch of random size from the original image, apply photometric distortion and horizontal flip with probability 0.5, then resize to 640x640 and normalize.
Training - SGD optimizer with batch size 4 (single GPU); the learning rate is annealed, with a warmup over the first 500 iterations (a sketch follows below).
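A rough sketch of the SGD-plus-warmup setup; the base learning rate, weight decay and annealing milestones are placeholders, not the paper's exact values.

```python
import torch

def build_optimizer_and_scheduler(model, base_lr=0.01, warmup_iters=500):
    """SGD with momentum, a linear per-iteration warmup, then step annealing."""
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                                momentum=0.9, weight_decay=5e-4)

    def lr_lambda(it):
        if it < warmup_iters:
            # linear warmup from 10% to 100% of the base learning rate
            return 0.1 + 0.9 * it / warmup_iters
        # placeholder step annealing; the milestones are assumptions
        if it < 50_000:
            return 1.0
        return 0.1 if it < 65_000 else 0.01

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# training loop sketch (scheduler stepped per iteration):
# for it, (images, targets) in enumerate(loader):
#     loss = model(images, targets)
#     optimizer.zero_grad(); loss.backward(); optimizer.step(); scheduler.step()
```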
Results
Precision-Recall Curves on WIDER FACE validation and test subsets.
