•A novel cattle face identification framework for practical production scenarios.•RetinaFace-mobilenet is designed for cattle face detection and location.•ArcFace loss is combined with RetinaFace-mobilenet for individual identification.•The framework outperforms competing algorithms and achieves an accuracy of 91.3%.
Cattle identification is crucial for breeding association registration, food quality tracing, disease prevention and control, and the prevention of fraudulent insurance claims. Traditional non-biometric methods of cattle identification cannot provide satisfactory reliability because of theft, fraud, and duplication. In this study, a computer vision technique is proposed to facilitate precision animal management and improve livestock welfare. This paper presents a novel face identification framework, namely CattleFaceNet, which integrates the light-weight RetinaFace-mobilenet with the Additive Angular Margin Loss (ArcFace). RetinaFace-mobilenet is designed for face detection and localization, and ArcFace is adopted to strengthen within-class compactness and between-class discrepancy during training. Experiments on a real-world scenario dataset show that RetinaFace-mobilenet achieves superior detection performance and significantly accelerates computation compared with RetinaNet. Three loss functions used in human face recognition are combined with RetinaFace-mobilenet and compared; the results indicate that the proposed CattleFaceNet outperforms the alternatives with an identification accuracy of 91.3% and a processing speed of 24 frames per second (FPS). This work demonstrates the potential of CattleFaceNet for real-time livestock identification in practical production scenarios.
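The additive angular margin (ArcFace) loss recurs throughout the works collected here. As a general illustration of the idea, not the exact implementation used by any of these papers, a minimal PyTorch-style sketch follows; the scale s, margin m, and class names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin loss head (illustrative sketch).

    Embeddings and class weights are L2-normalized, so their dot product is the
    cosine of the angle between them. A margin m is added to the angle of the
    target class before rescaling by s and applying softmax cross-entropy, which
    tightens within-class compactness and enlarges between-class discrepancy.
    """
    def __init__(self, embedding_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target_logits = torch.cos(theta + self.m)
        one_hot = F.one_hot(labels, num_classes=self.weight.size(0)).float()
        logits = self.s * (one_hot * target_logits + (1.0 - one_hot) * cosine)
        return F.cross_entropy(logits, labels)
```

Usage is a single call per batch, e.g. loss = ArcFaceHead(512, num_identities)(embeddings, labels), with the embeddings produced by whatever backbone a given paper uses.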
Face recognition models often encounter various unseen domains and environments in real-world applications, leading to unsatisfactory performance due to the open-set nature of face recognition. Models trained on centralized datasets may generalize poorly to new candidates under varying illumination and blur conditions. In this paper, our goal is to enhance the generalization of face recognition models to diverse target conditions without relying on active or incremental learning. We propose a face recognition approach that utilizes contrastive learning with synthesized positive and multiple negative samples. To address the combinatorial challenges posed by positive and negative samples, our framework combines a contrastive regularizer loss with the ArcFace loss, along with an effective sampling strategy for batch model learning. We update the model weights by jointly back-propagating the contrastive and ArcFace gradients. We validate our method on both generalized and standard face recognition benchmark datasets, namely IJB-B and IJB-C. A series of experiments shows that the proposed framework outperforms other state-of-the-art methods.
•Proposed a generalized face recognition approach that handles unknown target domains without model updates or fine-tuning.•Proposed a contrastive learning-based approach to address the problem of illumination and motion blur for registered candidates.•Employ augmentation techniques to generate positive and negative samples and mitigate computational complexity.•Integrate ArcFace and a contrastive regularizer loss to learn a distinctive face representation for each identity (a sketch follows this list).•Performed a series of experiments demonstrating the convergence of the proposed model on the IJB-B and IJB-C datasets.
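As a hedged illustration of how a contrastive regularizer can be combined with ArcFace and back-propagated jointly, one possible sketch is given below; the temperature, the loss weighting lam, and the names arcface_head and backbone are assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_regularizer(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style regularizer: pull an augmented positive toward its anchor
    and push K sampled negatives away, all on L2-normalized embeddings."""
    anchor = F.normalize(anchor, dim=-1)        # (B, D)
    positive = F.normalize(positive, dim=-1)    # (B, D)
    negatives = F.normalize(negatives, dim=-1)  # (B, K, D)
    pos = (anchor * positive).sum(dim=-1, keepdim=True) / temperature     # (B, 1)
    neg = torch.einsum('bd,bkd->bk', anchor, negatives) / temperature     # (B, K)
    logits = torch.cat([pos, neg], dim=1)                                 # (B, 1+K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

# Joint objective: ArcFace classification loss plus the contrastive term,
# back-propagated together so both gradients update the shared backbone.
# `arcface_head`, `backbone`, and `lam` are hypothetical names.
# loss = arcface_head(backbone(images), ids) + lam * contrastive_regularizer(a, p, n)
# loss.backward()
```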
•An unsupervised method is proposed to detect unknown anomalous sounds without training data of anomalous sounds.•The ArcFace loss-based classifier is presented to ensure high separability of the hidden features, thus improving detection performance.•The Gaussian mixture model-based anomaly score calculation method is adopted to determine more complex decision boundaries.
The operating state of a machine can be monitored by performing anomalous sound detection (ASD). Unsupervised ASD is a detection task in which the model must detect unknown anomalous sounds without any anomalous sounds available for training. However, when detecting completely unknown anomalous samples, it is challenging to classify samples with high similarity and to determine decision boundaries, which leads to poor detection performance. To address these deficiencies, we propose an unsupervised-ASD method based on an ArcFace classifier and a Gaussian mixture model (GMM). The ArcFace loss-based classifier aggregates the hidden features of different classes into the corresponding arc space, which increases the separability of samples. The GMM-based anomaly score calculation determines more complex decision boundaries. Experiments are carried out on the datasets provided by DCASE 2020 Task 2 and CRWU. As measured by the area under the receiver operating characteristic curve (AUC) and the partial AUC (pAUC), the proposed method outperforms the other methods in the comparison.
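To illustrate the kind of GMM-based anomaly scoring described above, a minimal scikit-learn sketch is shown below; the number of mixture components, the embedding dimensionality, and the placeholder features are assumptions rather than details from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(normal_embeddings, n_components=4):
    """Fit a GMM on embeddings extracted from normal (non-anomalous) sounds only."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='full', random_state=0)
    gmm.fit(normal_embeddings)
    return gmm

def anomaly_score(gmm, embeddings):
    """Higher score = less likely under the normal-data model = more anomalous."""
    return -gmm.score_samples(embeddings)

# Random placeholder features; in practice these would be the hidden features
# produced by the ArcFace-trained classifier.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 64))
test_feats = rng.normal(size=(10, 64))
gmm = fit_gmm(train_feats)
scores = anomaly_score(gmm, test_feats)
```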
Speech emotion recognition (SER) is an essential part of human–computer interaction, and SER systems have increasingly utilized multimodal information in recent years. This paper focuses on exploiting the acoustic and textual modalities for the SER task. We propose a bimodal network based on an Audio–Text-Interactional-Attention (ATIA) structure, which facilitates the interaction and fusion of emotionally salient information across the acoustic and textual modalities. We explored four different ATIA structures, verified their effectiveness, and selected the best-performing one to build our bimodal network. Furthermore, our SER model adopts the additive angular margin loss (ArcFace loss) originally applied in deep face recognition. Compared with the widely used softmax loss, our visualization results demonstrate the effectiveness of the ArcFace loss, which improves the discriminative power of features by focusing on the angles between the features and the class weights. To the best of our knowledge, this is the first application of ArcFace loss to SER. Finally, the results show that the bimodal network combined with ArcFace loss achieves 72.8% weighted accuracy (WA) and 62.5% unweighted accuracy (UA) for seven-class emotion classification, and 82.4% WA and 80.6% UA for four-class emotion classification on the IEMOCAP dataset.
•A bimodal network based on an Audio–Text-Interactional-Attention (ATIA) structure and ArcFace loss is proposed.•Four ATIA structures were explored for the Speech Emotion Recognition (SER) network.•The discriminative power of ArcFace loss is visually demonstrated.•Experimental evaluation demonstrates that the proposed bimodal network with ArcFace loss outperforms multiple recently reported multimodal SER methods.
In multi-view 3D object retrieval tasks, it is pivotal to aggregate the visual features extracted from multiple view images into a discriminative representation of a 3D object. Existing multi-view convolutional neural networks employ view pooling for feature aggregation, which ignores both the local view-relevant discriminative information within each view image and the global correlative information across all view images. To leverage both types of information, we propose two self-attention modules, namely a View Attention Module and an Instance Attention Module, to learn view-attentive and instance-attentive features, respectively. The final representation of a 3D object is the aggregation of three features: original, view-attentive, and instance-attentive. Furthermore, we propose employing the ArcFace loss together with a cosine-distance-based triplet-center loss as the metric learning guidance to train our model. As the cosine distance is used to rank the retrieval results, our angular metric learning losses give a consistent objective between training and testing, thereby facilitating discriminative feature learning. Extensive experiments and ablation studies on four publicly available 3D object retrieval datasets show the superiority of the proposed method over multiple state-of-the-art methods.
•We propose to leverage the aggregation of view- and instance-attentive features for multi-view 3D object retrieval.•To leverage the local view-relevant discriminative information within each view image, we propose a View Attention Module (VAM) to learn view-attentive features for each view image.•To leverage the global correlative information across all view images, we propose an Instance Attention Module (IAM) to learn instance-attentive features for each view image.•We propose to employ the ArcFace loss together with a cosine-distance-based triplet-center loss as the metric learning guidance to learn discriminative representations in the angular feature space (a sketch follows this list).
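As an illustration of a cosine-distance-based triplet-center loss of the kind described above (the margin, the loss weighting in the combined objective, and the class names are assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineTripletCenterLoss(nn.Module):
    """Illustrative cosine-distance triplet-center loss.

    Each class keeps a learnable center; a sample's cosine distance to its own
    center should be smaller, by a margin, than its distance to the nearest
    center of any other class.
    """
    def __init__(self, num_classes, embedding_dim, margin=0.2):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, embedding_dim))
        self.margin = margin

    def forward(self, embeddings, labels):
        x = F.normalize(embeddings, dim=1)                 # (B, D)
        c = F.normalize(self.centers, dim=1)               # (C, D)
        dist = 1.0 - x @ c.t()                             # cosine distance, (B, C)
        pos = dist.gather(1, labels.view(-1, 1)).squeeze(1)
        neg = dist.masked_fill(
            F.one_hot(labels, dist.size(1)).bool(), float('inf')
        ).min(dim=1).values                                # nearest other-class center
        return F.relu(pos - neg + self.margin).mean()

# Combined metric learning objective (the 0.1 weighting is hypothetical):
# loss = arcface_loss + 0.1 * CosineTripletCenterLoss(C, D)(embeddings, labels)
```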
Noonan syndrome (NS), a genetically heterogeneous disorder, presents with hypertelorism, ptosis, dysplastic pulmonary valve stenosis, hypertrophic cardiomyopathy, and small stature. Early detection ...and assessment of NS are crucial to formulating an individualized treatment protocol. However, the diagnostic rate of pediatricians and pediatric cardiologists is limited. To overcome this challenge, we propose an automated facial recognition model to identify NS using a novel deep convolutional neural network (DCNN) with a loss function called additive angular margin loss (ArcFace).
The proposed automated facial recognition models were trained on a dataset that included 127 NS patients, 163 healthy children, and 130 children with several other dysmorphic syndromes. The photo dataset contained only one frontal face image from each participant. A novel DCNN framework with the ArcFace loss function (DCNN-Arcface model) was constructed. Two traditional machine learning models and a DCNN model with the cross-entropy loss function (DCNN-CE model) were also constructed. Transfer learning and data augmentation were applied in the training process. The identification performance of the facial recognition models was assessed by five-fold cross-validation. The DCNN-Arcface model was compared with the two traditional machine learning models, the DCNN-CE model, and six physicians.
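As a generic illustration of the five-fold cross-validation protocol used to report accuracy and AUC with their spread (the classifier and random features below are placeholders, not the DCNN-Arcface model or the study's data):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Placeholder features/labels standing in for face embeddings and NS/healthy labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(290, 128))
y = rng.integers(0, 2, size=290)

accs, aucs = [], []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    prob = clf.predict_proba(X[test_idx])[:, 1]
    accs.append(accuracy_score(y[test_idx], (prob > 0.5).astype(int)))
    aucs.append(roc_auc_score(y[test_idx], prob))

print(f"accuracy: {np.mean(accs):.4f} ± {np.std(accs):.4f}")
print(f"AUC:      {np.mean(aucs):.4f} ± {np.std(aucs):.4f}")
```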
At distinguishing NS patients from healthy children, the DCNN-Arcface model achieved an accuracy of 0.9201 ± 0.0138 and an area under the receiver operating characteristic curve (AUC) of 0.9797 ± 0.0055. At distinguishing NS patients from children with several other genetic syndromes, it achieved an accuracy of 0.8171 ± 0.0074 and an AUC of 0.9274 ± 0.0062. In both cases, the DCNN-Arcface model outperformed the two traditional machine learning models, the DCNN-CE model, and the six physicians.
This study shows that the proposed DCNN-Arcface model is a promising way to screen NS patients and can improve the NS diagnosis rate.
In the past few years, there has been a leap from traditional palmprint recognition methodologies, which use handcrafted features, to deep-learning approaches that automatically learn feature representations from the input data. However, the information extracted from such deep-learning models typically corresponds to the global image appearance, where only the most discriminative cues in the input image are considered. This characteristic is especially problematic when data is acquired in unconstrained settings, as in contactless palmprint recognition systems, where visual artifacts caused by elastic deformations of the palmar surface are typically present in spatially local parts of the captured images. In this study we address the problem of elastic deformations by introducing a new approach to contactless palmprint recognition based on a novel CNN model, designed as a two-path architecture, in which one path processes the input holistically while the second path extracts local information from smaller image patches sampled from the input image. As elastic deformations can be assumed to affect the global appearance most significantly, while having a lesser impact on spatially local image areas, the local processing path addresses the issues related to elastic deformations and thereby supplements the information from the global processing path. The model is trained with a learning objective that combines the Additive Angular Margin (ArcFace) loss and the well-known center loss. With the proposed model design, the discriminative power of the learned image representation is significantly enhanced compared to standard holistic models, which, as we show in the experimental section, leads to state-of-the-art performance for contactless palmprint recognition. Our approach is tested on two publicly available contactless palmprint datasets, namely IITD and CASIA, and is demonstrated to perform favorably against state-of-the-art methods from the literature. The source code for the proposed model is made publicly available.
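For illustration, a center loss term of the kind that could complement ArcFace in the objective described above is sketched below; the weighting factor lambda_c and the variable names are assumptions, and the ArcFace term itself is sketched earlier in this section.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Illustrative center loss: penalizes the squared distance between each
    embedding and the learnable center of its class, tightening clusters."""
    def __init__(self, num_classes, embedding_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, embedding_dim))

    def forward(self, embeddings, labels):
        return ((embeddings - self.centers[labels]) ** 2).sum(dim=1).mean()

# Combined training objective (lambda_c is a hypothetical weighting factor):
# loss = arcface_loss(embeddings, labels) + lambda_c * CenterLoss(C, D)(embeddings, labels)
```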
Face recognition is an important research topic in machine learning and artificial intelligence. It analyses and compares a person's facial traits with a database of faces to automatically recognise and verify the person's identity. It has attracted considerable attention recently due to its non-intrusive nature: unlike other biometric identification systems, face recognition does not require physical contact with the individual being identified, making it more convenient and hygienic. Although existing face recognition systems achieve good performance, recognizing obscured and disguised faces remains difficult. To deal with these problems, this paper presents a new real-time face recognition network called YOLO-InsightFace, which combines YOLO-V7, a cutting-edge deep learning model, with InsightFace, a 2D and 3D face analysis Python module. YOLO-V7 is highly accurate and fast, making it well suited to real-time applications, while InsightFace recognizes faces by generating highly discriminative face embeddings.
Facial expression recognition is an important research direction in affective computing and has broad application prospects in human-computer interaction. However, noise such as illumination variation and occlusion in natural environments poses many challenges for facial expression recognition. To address problems such as the low recognition rate of facial expressions in natural environments, the inability of purely global facial analysis to highlight expression-specific characteristics, and misclassification caused by the similarity between negative expressions, this paper proposes a multi-region coordinate attention residual expression recognition model (MrCAR). The model consists of three main parts: 1) Multi-region input: MTCNN is used for face detection and alignment, and the eye and mouth regions are further cropped to obtain multi-region images. The multi-region input makes local details and global features easier to capture, reduces the influence of complex environmental noise, and highlights the facial features. 2) Feature extraction module: coordinate attention (CA-Net) and multi-scale convolution are added on top of the residual unit to form a coordinate residual attention module, which improves the model's ability to distinguish subtle changes in expression and its utilization of key features. 3) Classifier: ArcFace loss is used to enhance intra-class compactness and inter-class separation simultaneously, reducing the misclassification of negative expressions. The model achieves accuracy rates of 98.78%, 99.09%, 74.50%, and 88.26% on CK+, JAFFE, FER2013, and RAF-DB, respectively. The experimental results show that, compared with many advanced models, MrCAR is better suited to the expression classification task.
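As a hedged sketch of the coordinate attention idea referenced above (this follows the general coordinate attention formulation; the exact module used in MrCAR, its reduction ratio, and its pooling choices may differ):

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate attention: encodes spatial position along H and W separately,
    then reweights the feature map with direction-aware attention maps."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.size()
        x_h = x.mean(dim=3, keepdim=True)                      # pool along width  -> (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # pool along height -> (n, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)                       # (n, c, h+w, 1)
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        y_w = y_w.permute(0, 1, 3, 2)                          # (n, mid, 1, w)
        a_h = torch.sigmoid(self.conv_h(y_h))                  # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w))                  # (n, c, 1, w)
        return x * a_h * a_w
```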
The Chinese mitten crab (Eriocheir sinensis), a species unique to Chinese aquaculture, holds significant economic value in the seafood market. In response to increasing concerns about the quality and safety of Chinese mitten crab products, the high cost of traceability, and the difficulty consumers face in verifying the authenticity of individual crabs, this study proposes a lightweight individual recognition model for Chinese mitten crab carapace images based on an improved MobileNetV2. The method first uses the lightweight MobileNetV2 backbone, combined with a coordinate attention mechanism, to extract features of the carapace, enhancing the ability to recognize critical morphological features of the crab shell while keeping the model lightweight. The model is then trained with the ArcFace loss function, which effectively extracts generalizable features of the carapace images. Finally, authenticity is verified by calculating the similarity between two input carapace images. Experimental results show that the model, combined with the coordinate attention mechanism and ArcFace, achieves a high accuracy of 98.56% on the Chinese mitten crab image dataset, surpassing ShuffleFaceNet, MobileFaceNet, and VarGFaceNet by 13.63, 11.1, and 6.55 percentage points, respectively, while requiring an average of only 1.7 milliseconds per image for verification. While remaining lightweight, the model offers high efficiency and accuracy, providing an effective technical solution for enhancing the traceability of Chinese mitten crab products and combating counterfeit goods.
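To illustrate the verification step described above, where authenticity is decided from the similarity of two carapace embeddings, a minimal sketch follows; the decision threshold and the 128-D placeholder embeddings are assumptions, not values from the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(embedding_a, embedding_b, threshold=0.6):
    """Declare the two carapace images a match if their embeddings are similar enough."""
    return cosine_similarity(embedding_a, embedding_b) >= threshold

# Placeholder embeddings standing in for the features produced by the
# coordinate-attention MobileNetV2 backbone trained with ArcFace.
rng = np.random.default_rng(0)
emb1, emb2 = rng.normal(size=128), rng.normal(size=128)
print(verify(emb1, emb2))
```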