Paper Review: TinaFace - Strong but Simple Baseline for Face Detection

Problem the paper is solving

  • Establish that there is little to no gap between face detection and generic object detection.

  • Import modules from already established techniques in generic object detection to get a simple but strong baseline.

Similarities between face detection and generic object detection

  • Data perspective - pose, scale, occlusion, illumination, blur - all occur in generic objects too

  • Expression and makeup can be mapped to distortion and colour in objects.

  • Multi-scale, small faces, dense scenes, they all exist in objects too

Contributions

  • Face detection is treated as a one-class generic object detection problem

  • Use modules and techniques from generic object detection to give a simple strong baseline - TinaFace

  • Ranks 2nd on the WIDER FACE test benchmark in the hard setting.

TinaFace



  • Start from RetinaNet

  • Deformable Convolutional Networks

    • Regular CNNs carry strong priors about the sampling position

    • This makes it hard for networks to learn to encode complex geometric transformations, which limits their capability

    • DCN is applied in stage 4 and stage 5

  • Inception Module

    • Multi-scale training, FPN and multi-scale testing are the most common ways of dealing with multi-scale objects

    • The inception module uses 3*3 convolution layers in parallel to form features with different receptive fields and combines them, helping capture the object as well as its context

  • IoU Aware Branch

    • A method to relieve the mismatch between the classification score and the localization accuracy of a single-stage detector

    • Helps re-sort the classification scores and suppress false-positive detected boxes

    • At inference time, the final detection confidence combines the classification score with the predicted IoU

  • Distance IoU Loss

    • Smooth L1 loss is not consistent with the regression evaluation metric, IoU.

    • More friendly to small objects than other IoU-based losses.

    • b and b(gt) denote the central points of the predicted and ground-truth boxes

    • rho is the Euclidean distance

    • c is the diagonal length of the smallest enclosing box covering the two boxes
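
The equations for the IoU-aware confidence and the DIoU loss did not survive on this page, so here is a minimal Python sketch of both as I understand them from the IoU-aware detector and DIoU formulations: confidence = p^alpha * IoU^(1 - alpha), and loss = 1 - IoU + rho^2(b, b(gt)) / c^2. The function names, the (x1, y1, x2, y2) box layout and the default alpha = 0.5 are assumptions for illustration, not values taken from the paper.

```python
def iou_aware_score(cls_score, pred_iou, alpha=0.5):
    """Blend classification score p and predicted IoU: p**alpha * IoU**(1 - alpha).

    alpha in [0, 1] controls the contribution of each term (0.5 is a placeholder).
    """
    return (cls_score ** alpha) * (pred_iou ** (1.0 - alpha))


def diou_loss(box, gt):
    """DIoU loss for two boxes given as (x1, y1, x2, y2) with positive area."""
    # Plain IoU of the two boxes.
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_box + area_gt - inter)

    # rho^2: squared Euclidean distance between the central points b and b(gt).
    bx, by = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    gx, gy = (gt[0] + gt[2]) / 2.0, (gt[1] + gt[3]) / 2.0
    rho2 = (bx - gx) ** 2 + (by - gy) ** 2

    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cx1, cy1 = min(box[0], gt[0]), min(box[1], gt[1])
    cx2, cy2 = max(box[2], gt[2]), max(box[3], gt[3])
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2

    return 1.0 - iou + rho2 / c2
```

Unlike a plain IoU loss, the distance term still gives a useful signal when the boxes do not overlap at all, which is one reason it is considered friendlier to small objects.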


Experiments

  • Implementation details

    • Feature extractor - ResNet-50 with FPN, 6 FPN levels from P2 to P7.

    • Losses: classification - focal loss, regression - DIoU Loss, IoU prediction - cross-entropy loss

    • Normalization - Group Normalization, which is stable and independent of batch size; with Batch Normalization, model performance degrades as the batch size gets smaller.

    • Anchor settings - 6 anchors with scales 2**(4/3)*{4,8,16,32,64,128} are used for the 6 levels of FPN; the IoU threshold for matching is set at 0.35

    • Data Augmentation - crop a square patch of random size from the original picture, apply photo distortion and horizontal flip with 0.5 probability, resize to 640*640 and normalize.

    • Training - train the model with the SGD optimizer and batch size 4 (single GPU), annealing the learning rate after a warmup over the first 500 iterations.
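
To make the anchor setting concrete, here is a small Python sketch that computes one anchor size per FPN level. It assumes the base sizes are the strides of P2 to P7 (4 through 128), each multiplied by 2**(4/3); the helper names and the use of >= for the 0.35 matching threshold are my assumptions, not something stated in the notes above.

```python
# Strides of the six FPN levels P2..P7 (assumed to double at each level).
STRIDES = [4, 8, 16, 32, 64, 128]
SCALE = 2 ** (4 / 3)  # roughly 2.52


def anchor_sizes(strides=STRIDES, scale=SCALE):
    """Return the anchor side length (in pixels) for each pyramid level."""
    return {f"P{i + 2}": scale * stride for i, stride in enumerate(strides)}


def is_positive_match(iou, threshold=0.35):
    """An anchor counts as a positive match when its IoU reaches the threshold."""
    return iou >= threshold


if __name__ == "__main__":
    for level, size in anchor_sizes().items():
        print(f"{level}: {size:.1f} px")
```

Under these assumptions the anchors range from about 10 px at P2 to about 323 px at P7, which shows why the extra P2 level matters for small faces.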

Results

  • Precision-Recall Curves on WIDER FACE validation and test subsets.

©2020 by Akshat Mandloi