A Fast and Accurate Unconstrained Face Detector
Liao, Shengcai; Jain, Anil K.; Li, Stan Z.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Feb. 2016, Volume 38, Issue 2
Journal Article
Peer reviewed
Open access
We propose a method to address challenges in unconstrained face detection, such as arbitrary pose variations and occlusions. First, a new image feature called the Normalized Pixel Difference (NPD) is proposed. The NPD feature is computed as the difference-to-sum ratio between two pixel values, inspired by the Weber fraction in experimental psychology. The new feature is scale invariant, bounded, and able to reconstruct the original image. Second, we propose a deep quadratic tree to learn the optimal subset of NPD features and their combinations, so that complex face manifolds can be partitioned by the learned rules. This way, only a single soft-cascade classifier is needed to handle unconstrained face detection. Furthermore, we show that the NPD features can be efficiently obtained from a lookup table, and the detection template can be easily scaled, making the proposed face detector very fast. Experimental results on three public face datasets (FDDB, GENKI, and CMU-MIT) show that the proposed method achieves state-of-the-art performance in detecting unconstrained faces with arbitrary pose variations and occlusions in cluttered scenes.
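The NPD feature described in this abstract, the difference-to-sum ratio of two pixel values with its lookup-table evaluation, can be sketched directly. A minimal sketch follows; the convention f(0, 0) = 0 is how the ratio is made well-defined for black pixel pairs, and the 256×256 table mirrors the paper's constant-time lookup idea.

```python
import numpy as np

# Precompute the 256x256 lookup table of NPD values f(x, y) = (x - y) / (x + y),
# with f(0, 0) defined as 0 so the ratio is total over all intensity pairs.
_vals = np.arange(256, dtype=np.float64)
_num = _vals[:, None] - _vals[None, :]
_den = _vals[:, None] + _vals[None, :]
npd_table = np.divide(_num, _den, out=np.zeros_like(_num), where=_den != 0)

def npd(a, b):
    """NPD feature for two pixel intensities in [0, 255], via table lookup."""
    return npd_table[a, b]
```

The feature is bounded in [-1, 1], antisymmetric, and scale invariant: multiplying both pixel values by a constant leaves it unchanged, which is why it tolerates illumination changes.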
Face detection from low-light images is challenging due to limited photons and inevitable noise, which, to make the task even harder, are often spatially unevenly distributed. A natural solution is to borrow the idea from multi-exposure imaging, which captures multiple shots to obtain well-exposed images under challenging conditions. High-quality implementation or approximation of multi-exposure from a single image is, however, nontrivial. Fortunately, as shown in this paper, such high quality is not necessary, since our task is face detection rather than image enhancement. Specifically, we propose a novel Recurrent Exposure Generation (REG) module and couple it seamlessly with a Multi-Exposure Detection (MED) module, and thus significantly improve face detection performance by effectively inhibiting non-uniform illumination and noise issues. REG progressively and efficiently produces intermediate images corresponding to various exposure settings, and such pseudo-exposures are then fused by MED to detect faces across different lighting conditions. The proposed method, named REGDet, is the first 'detection-with-enhancement' framework for low-light face detection. It not only encourages rich interaction and feature fusion across different illumination levels, but also enables effective end-to-end learning of the REG component to be better tailored for face detection. Moreover, as clearly shown in our experiments, REG can be flexibly coupled with different face detectors without extra low/normal-light image pairs for training. We tested REGDet on the DARK FACE low-light face benchmark with a thorough ablation study, where REGDet outperforms the previous state of the art by a significant margin, with only negligible extra parameters.
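As a rough illustration of the multi-exposure idea behind REG (not the learned recurrent module itself), one can synthesize several brightened variants of a single low-light image for a downstream detector to fuse. The gamma-curve generator below is a hypothetical stand-in; the function name and exposure settings are illustrative assumptions.

```python
import numpy as np

def pseudo_exposures(img, gammas=(0.4, 0.7, 1.0)):
    """Illustrative stand-in for the pseudo-exposure idea: synthesize several
    brightened versions of one low-light image with simple gamma curves
    (REG instead learns this generation recurrently, end to end).
    img: float array with values in [0, 1]."""
    img = np.clip(img, 0.0, 1.0)
    return [img ** g for g in gammas]
```

A gamma below 1 lifts dark regions more than bright ones, which is why a small set of such curves can mimic different exposure settings from one shot.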
In this paper, we present a new face detection scheme using deep learning and achieve state-of-the-art detection performance on the well-known FDDB face detection benchmark evaluation. In particular, we improve the state-of-the-art Faster R-CNN framework by combining a number of strategies, including feature concatenation, hard negative mining, multi-scale training, model pre-training, and proper calibration of key parameters. As a consequence, the proposed scheme obtains state-of-the-art face detection performance and was ranked as one of the best models in terms of ROC curves among the published methods on the FDDB benchmark. (The result of this work ranked #1 on the FDDB leaderboard in Feb. 2017; an earlier version of this work was published on arXiv.org on 28 Jan 2017.)
In recent years, face detection algorithms based on deep learning have made great progress. Nevertheless, the effective use of face detectors on small and occluded faces remains challenging, primarily stemming from limited pixel information and missing features. In this paper, we propose a novel real-time face detector, YOLO-FaceV2, built upon the YOLOv5 architecture. Our approach introduces a Receptive Field Enhancement (RFE) module designed to extract multi-scale pixel information and enlarge the receptive field for accurately detecting small faces. To address face occlusion, we introduce an attention mechanism termed the Separated and Enhancement Attention Module (SEAM), which effectively focuses on the regions affected by occlusion. Furthermore, we propose a Slide Weight Function (SWF) to mitigate the imbalance between easy and hard samples. The experiments demonstrate that our YOLO-FaceV2 achieves performance exceeding the state of the art on the WiderFace validation dataset. Source code and a pre-trained model are available at https://github.com/Krasjet-Yu/YOLO-FaceV2.
• Proposed the YOLO-FaceV2 detector to address face detection.
• Good performance under face occlusion and varying scales.
• Designed a novel weighting function that alleviates the problem of imbalanced samples.
• Detection results on the WiderFace validation dataset are 98.6%, 97.9%, and 91.9%.
• Achieved state-of-the-art performance on the easy and medium subsets of the WiderFace dataset.
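The easy/hard sample weighting mentioned above can be sketched as a piecewise function of a sample's IoU relative to a threshold. The exact form below (weight 1 well below the mean IoU `mu`, a boosted plateau just under `mu`, exponential decay above it) and the 0.1 margin are assumptions for illustration, not a verbatim transcription of the paper's function.

```python
import math

def slide_weight(iou, mu):
    """Sketch of a slide-style weighting for easy/hard sample balancing:
    clearly easy negatives (low IoU) keep weight 1, samples near the
    boundary mu get a boosted constant weight, and confident positives
    above mu are down-weighted exponentially."""
    if iou <= mu - 0.1:
        return 1.0
    if iou < mu:
        return math.exp(1.0 - mu)   # boost the hard samples near the boundary
    return math.exp(1.0 - iou)      # decay for samples well above the threshold
```

The intent is that samples near the decision boundary, which are the genuinely hard ones, receive the largest gradients during training.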
Joint Multi-View Face Alignment in the Wild
Deng, Jiankang; Trigeorgis, George; Zhou, Yuxiang ...
IEEE Transactions on Image Processing, July 2019, Volume 28, Issue 7
Journal Article
Peer reviewed
Open access
The de facto algorithm for facial landmark estimation involves running a face detector with a subsequent deformable model fitting on the bounding box. This encompasses two basic problems: 1) the detection and deformable fitting steps are performed independently, while the detector might not provide the best-suited initialization for the fitting step, and 2) the face appearance varies hugely across different poses, which makes the deformable face fitting very challenging and thus distinct models have to be used (e.g., one for profile and one for frontal faces). In this paper, we propose the first, to the best of our knowledge, joint multi-view convolutional network to handle large pose variations across faces in-the-wild, and elegantly bridge face detection and facial landmark localization tasks. The existing joint face detection and landmark localization methods focus only on a very small set of landmarks. By contrast, our method can detect and align a large number of landmarks for semi-frontal (68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a plethora of datasets including the standard static image datasets such as IBUG, 300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile faces. A significant improvement over the state-of-the-art methods on deformable face tracking is witnessed on the 300VW benchmark. We also demonstrate state-of-the-art results for face detection on the FDDB and MALF datasets.
Face detection and alignment in unconstrained environments are challenging due to various poses, illuminations, and occlusions. Recent studies show that deep learning approaches can achieve impressive performance on these two tasks. In this letter, we propose a deep cascaded multitask framework that exploits the inherent correlation between detection and alignment to boost their performance. In particular, our framework leverages a cascaded architecture with three stages of carefully designed deep convolutional networks to predict face and landmark locations in a coarse-to-fine manner. In addition, we propose a new online hard sample mining strategy that further improves the performance in practice. Our method achieves superior accuracy over the state-of-the-art techniques on the challenging Face Detection Data Set and Benchmark (FDDB) and WIDER FACE benchmarks for face detection, and the Annotated Facial Landmarks in the Wild (AFLW) benchmark for face alignment, while keeping real-time performance.
Pig face recognition has a wide range of applications in breeding farms, including precision feeding and disease surveillance. This article proposes a method to guarantee its performance in complex environments such as with dirty faces and in unconstrained outdoor conditions. First, inspired by the shape of the pig face, a trapezoid normalized pixel difference (T-NPD) feature is designed to achieve more accurate detection in unconstrained outdoor conditions. Subsequently, a trimmed mean attention mechanism (TMAM) uses a trimmed mean-based squeeze method to assign more precise weights to feature channels, and is then fused into a 50-layer ResNet (ResNet50) backbone network to classify detected pig face images with high accuracy. In addition, the TMAM can be applied in numerous common networks due to its universality. Finally, comprehensive experiments conducted on the publicly available JD pig face dataset indicate that the proposed method has superior performance compared with other methods, with an overall accuracy of 95.06%.
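The trimmed mean-based squeeze mentioned above can be sketched as a drop-in replacement for the global average pooling in an SE-style attention block: per channel, discard the most extreme spatial responses before averaging. The trim fraction, the (C, H, W) layout, and the function name are illustrative assumptions, and the excitation MLP that follows in a full attention block is omitted.

```python
import numpy as np

def trimmed_mean_squeeze(feat, trim=0.1):
    """Squeeze step of a trimmed-mean attention sketch: per channel, sort
    the spatial responses, drop the lowest and highest `trim` fraction,
    and average what remains. feat: (C, H, W) -> returns (C,)."""
    c, h, w = feat.shape
    flat = np.sort(feat.reshape(c, h * w), axis=1)
    k = int(trim * h * w)                       # entries trimmed from each end
    core = flat[:, k:h * w - k] if k > 0 else flat
    return core.mean(axis=1)
```

Compared with a plain mean, the trimmed mean is robust to a few extreme activations (e.g., mud spots or specular highlights), which is the motivation for using it to weight channels.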
Occlusion and pose variations, which can change facial appearance significantly, are two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progress in the past few decades, the occlusion-robust and pose-invariant aspects of FER have received relatively little attention, especially in real-world scenarios. This paper addresses the real-world pose- and occlusion-robust FER problem in the following aspects. First, to stimulate research on FER under real-world occlusions and variant poses, we annotate several in-the-wild FER datasets with pose and occlusion attributes for the community. Second, we propose a novel Region Attention Network (RAN) to adaptively capture the importance of facial regions for occlusion and pose-variant FER. The RAN aggregates and embeds a varied number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region biased loss to encourage high attention weights for the most important regions. We validate our RAN and region biased loss on both our built test datasets and four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region biased loss largely improve the performance of FER with occlusion and variant pose. Our method also achieves state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW. Code and the collected test data will be publicly available.
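The aggregation step described above, embedding a variable number of region features into one fixed-length vector via attention weights, can be sketched minimally. RAN's actual self-attention and relation-attention modules are more involved; the linear scorer `w` below is a hypothetical stand-in for the learned attention head.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aggregate_regions(region_feats, w):
    """Attention-weighted aggregation in the spirit of RAN: each region
    feature (rows of region_feats, shape (K, D)) gets a scalar attention
    score from a linear scorer w (shape (D,)), and the output is the
    score-weighted average of the region features, shape (D,)."""
    scores = sigmoid(region_feats @ w)                        # (K,) weights
    return (scores[:, None] * region_feats).sum(0) / scores.sum()
```

Because the weighted average is normalized by the total score, the output dimension is fixed regardless of how many regions K are cropped, and occluded regions can be suppressed simply by receiving low scores.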
Deep Face Recognition: A Survey Masi, Iacopo; Wu, Yue; Hassner, Tal ...
2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)
Conference Proceeding
Face recognition has made tremendous leaps in the last five years, with a myriad of systems proposing novel techniques substantially backed by deep convolutional neural networks (DCNN). Although face recognition performance sky-rocketed using deep learning on classic datasets like LFW, leading to the belief that this technique reached human performance, it still remains an open problem in unconstrained environments, as demonstrated by the newly released IJB datasets. This survey aims to summarize the main advances in deep face recognition and, more generally, in learning face representations for verification and identification. The survey provides a clear, structured presentation of the principal, state-of-the-art (SOTA) face recognition techniques appearing within the past five years in top computer vision venues. The survey is broken down into multiple parts that follow a standard face recognition pipeline: (a) how SOTA systems are trained and which public data sets they have used; (b) the face preprocessing part (detection, alignment, etc.); (c) the architecture and loss functions used for transfer learning; and (d) face recognition for verification and identification. The survey concludes with an overview of the SOTA results at a glance, along with some open issues currently overlooked by the community.
Though tremendous strides have been made in uncontrolled face detection, accurate and efficient 2D face alignment and 3D face reconstruction in-the-wild remain an open challenge. In this paper, we present a novel single-shot, multi-level face localisation method, named RetinaFace, which unifies face box prediction, 2D facial landmark localisation and 3D vertices regression under one common target: point regression on the image plane. To fill the data gap, we manually annotated five facial landmarks on the WIDER FACE dataset and employed a semi-automatic annotation pipeline to generate 3D vertices for face images from the WIDER FACE, AFLW and FDDB datasets. Based on extra annotations, we propose a mutually beneficial regression target for 3D face reconstruction, that is, predicting 3D vertices projected on the image plane constrained by a common 3D topology. The proposed 3D face reconstruction branch can be easily incorporated, without any optimisation difficulty, in parallel with the existing box and 2D landmark regression branches during joint training. Extensive experimental results show that RetinaFace can simultaneously achieve stable face detection, accurate 2D face alignment and robust 3D face reconstruction while being efficient through single-shot inference.
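The shared "point regression on the image plane" target can be illustrated with a minimal projection: 3D vertices become 2D supervision points once projected. The scaled-orthographic parameterisation below (`scale`, `tx`, `ty`) is an assumption for illustration, not RetinaFace's exact projection head.

```python
import numpy as np

def project_vertices(v3d, scale, tx, ty):
    """Sketch of a common-target projection: map 3D vertices (N, 3) to 2D
    image-plane points (N, 2) with a scaled-orthographic model, so the 3D
    branch can be supervised with the same point-regression loss as the
    box corners and 2D landmarks."""
    return scale * v3d[:, :2] + np.array([tx, ty])
```

Once every branch predicts points on the same image plane, boxes, landmarks and 3D vertices can share one loss formulation, which is the unification the abstract describes.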