Cross-modality person re-identification is a challenging problem that retrieves a given pedestrian image in the RGB modality among gallery images in the infrared modality. The task addresses the limitation of RGB-based person Re-ID in dark environments. Existing studies mainly focus on enlarging inter-class feature differences, but few investigate improving intra-class cross-modality similarity, which is important for this task. In this paper, we propose a novel loss function, called Hetero-Center loss (HC loss), to reduce intra-class cross-modality variation. Specifically, HC loss supervises the network in learning cross-modality-invariant information by constraining the distance between the intra-class centers of the two heterogeneous modalities. Under the joint supervision of Cross-Entropy (CE) loss and HC loss, the network is trained to achieve two vital objectives: inter-class discrepancy and intra-class cross-modality similarity. Besides, we propose a simple, high-performance network architecture to learn local feature representations for cross-modality person re-identification, which can serve as a baseline for future research. Extensive experiments demonstrate the effectiveness of the proposed methods, which outperform state-of-the-art methods by a wide margin.
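The core idea of HC loss can be illustrated with a minimal NumPy sketch: for each identity, compute the feature center in each modality and penalize the squared distance between the two centers. Function and variable names here are illustrative, not the authors' implementation.

```python
import numpy as np

def hetero_center_loss(feats_rgb, labels_rgb, feats_ir, labels_ir):
    # For each identity, compute the feature center in each modality and
    # penalize the squared Euclidean distance between the two centers.
    loss = 0.0
    for c in np.unique(labels_rgb):
        center_rgb = feats_rgb[labels_rgb == c].mean(axis=0)
        center_ir = feats_ir[labels_ir == c].mean(axis=0)
        loss += np.sum((center_rgb - center_ir) ** 2)
    return loss
```

When the per-class centers of the two modalities coincide the loss is zero, so minimizing it pulls the modality-specific centers of each identity together.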
Person search in real-world scenarios is a new and challenging computer vision task with many meaningful applications. The challenge mainly comes from: (1) bounding boxes for pedestrians are unavailable, so the model needs to search for the person over whole gallery images; (2) the visual appearance of a particular person varies hugely owing to varying poses, lighting conditions, and occlusions. To address these two critical issues, we propose a novel Individual Aggregation Network (IAN) that can accurately localize persons by learning to minimize intra-person feature variations. IAN is built upon a state-of-the-art object detection framework, Faster R-CNN, so that high-quality region proposals for pedestrians can be produced in an online manner. In addition, to relieve the negative effect caused by varying visual appearances of the same individual, IAN introduces a novel center loss that increases the intra-class compactness of feature representations. This center loss encourages persons with the same identity to have similar feature characteristics. Extensive experimental results on two benchmarks, CUHK-SYSU and PRW, demonstrate the superiority of the proposed model. In particular, IAN achieves 77.23% mAP and 80.45% top-1 accuracy on CUHK-SYSU, outperforming the state of the art by 1.7% and 1.85%, respectively.
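The compactness term described above follows the familiar center-loss form: each feature is pulled toward a learnable center for its identity. A minimal NumPy sketch of that form (names illustrative, not IAN's exact variant):

```python
import numpy as np

def center_loss(feats, labels, centers):
    # Penalize the squared distance of each feature to its identity's center,
    # pulling features of the same person together.
    d = feats - centers[labels]
    return 0.5 * np.sum(d ** 2, axis=1).mean()
```

In training, the centers themselves are updated alongside the network so that they track the moving class means.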
With the development of smart cities, urban surveillance video analysis plays an increasingly significant role in intelligent transportation systems. Vehicle re-identification (re-ID) aims at identifying the same target vehicle across non-overlapping cameras in large datasets, and has grown into a hot topic in promoting intelligent transportation systems. However, because different vehicles can look very similar, vehicle re-ID is a challenging task. In this paper, we tackle this challenge by proposing a Triplet-Center Loss based Part-aware Model (TCPM) that leverages the discriminative features in part details of vehicles to refine the accuracy of vehicle re-ID. TCPM partitions the vehicle along the horizontal and vertical directions to strengthen the details of the vehicle and reinforce the internal consistency of the parts. In addition, to eliminate intra-class differences in local regions of the vehicle, we utilize external memory modules to emphasize the consistency of each part and learn discriminating features, forming a global dictionary over all categories in the dataset. Moreover, TCPM introduces a triplet-center loss to ensure that each part of the vehicle features has intra-class consistency and inter-class separability. Experimental results show that our proposed TCPM is competitive with existing state-of-the-art methods on the benchmark datasets VehicleID and VeRi-776.
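Triplet-center loss combines the two objectives the abstract names: each feature should lie closer to its own class center than to the nearest other-class center by a margin. A minimal NumPy sketch under that standard formulation (names and the margin value are illustrative):

```python
import numpy as np

def triplet_center_loss(feats, labels, centers, margin=1.0):
    # Each feature should be closer to its own class center than to the
    # nearest other-class center by at least `margin`.
    loss = 0.0
    for f, y in zip(feats, labels):
        d = np.sum((centers - f) ** 2, axis=1)   # distance to every center
        d_pos = d[y]
        d_neg = np.min(np.delete(d, y))          # nearest wrong-class center
        loss += max(0.0, d_pos - d_neg + margin)
    return loss / len(feats)
```

The hinge vanishes once classes are separated by the margin, so well-separated centers contribute no gradient.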
Constrained Center Loss for Convolutional Neural Networks Shi, Zhanglei; Wang, Hao; Leung, Chi-Sing
IEEE Transactions on Neural Networks and Learning Systems, Feb. 2023, Volume 34, Issue 2
Journal Article
From the feature representation's point of view, the feature learning module of a convolutional neural network (CNN) transforms an input pattern into a feature vector. This feature vector is then multiplied with a number of output weight vectors to produce softmax scores. The common training objective in CNNs is based on the softmax loss, which ignores intra-class compactness. This brief proposes a constrained center loss (CCL)-based algorithm to extract robust features. The training objective of the CNN consists of two terms, the softmax loss and the CCL. The softmax loss pushes the feature vectors from different classes apart, while the CCL clusters the feature vectors such that those from the same class are close together. Instead of using stochastic gradient descent (SGD) to learn all the connection weights and the cluster centers at the same time, our CCL-based algorithm uses an alternating learning strategy. We first fix the connection weights of the CNN and update the cluster centers with an analytical formula, which can be implemented based on the minibatch concept. We then fix the cluster centers and update the connection weights for a number of SGD minibatch iterations. We also propose a simplified CCL (SCCL) algorithm. Experiments are performed on six commonly used benchmark datasets. The results demonstrate that the two proposed algorithms outperform several state-of-the-art approaches.
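The analytical center step of the alternating strategy can be sketched as follows: with the network weights frozen, a squared-distance clustering objective is minimized by setting each center to the mean feature of its class. This is a simplified sketch; the paper's exact constrained formula and minibatch accumulation are not reproduced here.

```python
import numpy as np

def update_centers(feats, labels, num_classes, dim):
    # Analytical center step: with the network weights frozen, the clustering
    # loss is minimized by setting each center to the mean feature of its class.
    centers = np.zeros((num_classes, dim))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centers[c] = feats[mask].mean(axis=0)
    return centers
```

The SGD step then runs with these centers held fixed, and the two steps alternate.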
In recent years, convolutional neural networks (CNNs) have become the predominant method for content-based aerial image retrieval (CBAIR) and aerial scene classification (ASC) due to their overwhelming performance advantages. However, existing CNN-based models have the following shortcomings: first, they do not deal with large intra-class variations, thereby overlooking the possibility of fine-grained retrieval and classification; second, all similarity learning methods for CBAIR treat the similarity between two images as a constant, neglecting the fact that image similarity is uncertain in nature; third, similarity learning is separated from ASC, ignoring the advantages of joint optimization. To address these issues, we propose a novel metric learning method called center-metric learning, coupled with a new loss called positive-negative center loss, which, with the help of several "experts," enables CNNs to cope successfully with within-class variations. Besides, we propose similarity distribution learning, making the first attempt to embed uncertainty about similarity into the training process. The resulting fine-grained similarity predictions further strengthen the CNN's fine discrimination ability. Furthermore, three tasks, namely center-metric learning, similarity distribution learning, and ASC, are incorporated into one CNN, benefiting from one another and leading to better generalization. Just like an eagle, our model is able to discriminate subtle differences among aerial images, hence the name "eagle-eyed multitask CNN." We carry out extensive experiments on four publicly available aerial image sets and achieve better performance than all existing methods.
Objective: In recent years, the early diagnosis and treatment of coronary microvascular dysfunction (CMD) have become crucial for preventing coronary heart disease. This paper aims to develop a computer-assisted autonomous diagnosis method for CMD using ECG features and expert features. Approach: Clinical electrocardiogram (ECG), myocardial contrast echocardiography (MCE), and coronary angiography (CAG) are used in our method. First, morphological features, temporal features, and T-wave features of the ECG are extracted by a multi-channel residual network with BiLSTM (MCResnet-BiLSTM) model and a multi-source T-wave features (MTF) extraction model, respectively, and fused to form the ECG features. In addition, the CFR<inline-formula><tex-math notation="LaTeX">_\text{MCE}</tex-math></inline-formula> is calculated from MCE parameters at rest and under stress, and the Angio-IMR is calculated from CAG. The combination of CFR<inline-formula><tex-math notation="LaTeX">_\text{MCE}</tex-math></inline-formula> and Angio-IMR is termed the expert features. Furthermore, the hybrid features, fused from the ECG features and the expert features, are input into a multilayer perceptron to identify CMD. The weighted sum of the softmax loss and the center loss is used as the total loss function for training the classification model, which optimizes its classification ability. Result: The proposed method achieved 93.36% accuracy, 94.46% specificity, 92.10% sensitivity, 95.89% precision, and a 93.95% F1 score on the clinical dataset of the Second Affiliated Hospital of Zhejiang University. Conclusion: The proposed method accurately extracts global ECG features, combines them with expert features to obtain hybrid features, and uses a weighted loss to significantly improve diagnostic accuracy. It provides a novel and practical method for the clinical diagnosis of CMD.
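The weighted total loss described in the Approach can be sketched in NumPy as softmax cross-entropy plus a scaled center-loss term; the weighting `lam` and all names are illustrative, not the paper's tuned values.

```python
import numpy as np

def softmax_ce(logits, labels):
    # Numerically stable softmax cross-entropy over a minibatch.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def center_loss(feats, labels, centers):
    # Squared distance of each feature to its class center.
    d = feats - centers[labels]
    return 0.5 * np.sum(d ** 2, axis=1).mean()

def total_loss(logits, feats, labels, centers, lam=0.01):
    # Weighted sum: the CE term separates classes, the center term
    # compacts each class.
    return softmax_ce(logits, labels) + lam * center_loss(feats, labels, centers)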
Recently, deep learning-based cross-view gait recognition has become popular owing to the strong capacity of convolutional neural networks (CNNs). Current deep learning methods often rely on loss functions widely used in face recognition, e.g., contrastive loss and triplet loss, which suffer from the problem of hard negative mining. In this paper, a robust, effective, and gait-related loss function, called angle center loss (ACL), is proposed to learn discriminative gait features. The proposed loss function is robust to different local parts and temporal window sizes. Unlike center loss, which learns one center per identity, the proposed loss function learns multiple sub-centers for each angle of the same identity. Only the largest distance between the anchor feature and the corresponding cross-view sub-centers is penalized, which achieves better intra-subject compactness. We also propose to extract discriminative spatial-temporal features with local feature extractors and a temporal attention model. A simplified spatial transformer network localizes suitable horizontal parts of the human body; local gait features for each horizontal part are extracted and then concatenated as the descriptor. We introduce long short-term memory (LSTM) units as the temporal attention model to learn an attention score for each frame, e.g., focusing more on discriminative frames and less on low-quality frames. The temporal attention model outperforms temporal average pooling and gait energy images (GEI). By combining the three aspects, we achieve state-of-the-art results on several cross-view gait recognition benchmarks.
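The "penalize only the largest distance" rule of ACL can be sketched as a max over per-angle sub-center distances; this is a simplified per-anchor sketch, not the authors' full batched formulation.

```python
import numpy as np

def angle_center_loss(feat, sub_centers):
    # sub_centers: (num_angles, dim), one learned sub-center per viewing
    # angle of the anchor's identity.  Only the largest anchor-to-sub-center
    # distance is penalized, pulling in the worst cross-view case.
    d = np.sum((sub_centers - feat) ** 2, axis=1)
    return np.max(d)
```

Minimizing the max distance tightens the anchor to all of its cross-view sub-centers at once, which is the intra-subject compactness the paper targets.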
The purpose of person re-identification (PReID) is to identify the same individual across non-overlapping cameras, a task that has been greatly advanced by deep learning. In this study, we review two widely used CNN frameworks in the PReID community, the identification model and the triplet model, provide a comprehensive overview of the advantages and limitations of each, and present a hybrid model that combines the advantages of both. Specifically, the proposed model employs triplet loss, identification loss, and center loss to simultaneously train a carefully designed network, and the dropout scheme is adopted in its identification subnetwork. Given a triplet of input images, the model outputs the identities of the three images, forces the Euclidean distance between mismatched pairs to be larger than that between matched pairs, and at the same time reduces the variance within each class. Extensive comparative experiments on three PReID benchmark datasets (CUHK01, CUHK03, Market-1501) show that our proposed architecture outperforms many state-of-the-art methods in most cases.
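The three-term supervision can be sketched as a sum of a triplet hinge, an identification (classification) loss computed elsewhere, and a center term over the triplet's features. All names, the margin, and the weighting `lam` are illustrative, not the paper's exact settings.

```python
import numpy as np

def triplet_term(anchor, pos, neg, margin=0.3):
    # Matched pair must be closer than the mismatched pair by `margin`.
    d_ap = np.sum((anchor - pos) ** 2)
    d_an = np.sum((anchor - neg) ** 2)
    return max(0.0, d_ap - d_an + margin)

def hybrid_loss(id_loss, anchor, pos, neg, centers, labels, margin=0.3, lam=0.01):
    # id_loss: scalar identification (softmax) loss computed by the
    # identification subnetwork; labels: identity of each of the 3 images.
    feats = np.stack([anchor, pos, neg])
    center_term = 0.5 * np.sum((feats - centers[labels]) ** 2, axis=1).mean()
    return id_loss + triplet_term(anchor, pos, neg, margin) + lam * center_term
```

When the triplet is well separated and features sit on their centers, only the identification term remains.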
Attend and Discriminate Abedin, Alireza; Ehsanpour, Mahsa; Shi, Qinfeng ...
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 03/2021, Volume 5, Issue 1
Journal Article
Peer reviewed
Wearables are fundamental to improving our understanding of human activities, especially for an increasing number of healthcare applications from rehabilitation to fine-grained gait analysis. Although our collective know-how for solving Human Activity Recognition (HAR) problems with wearables has progressed immensely with end-to-end deep learning paradigms, several fundamental opportunities remain overlooked. We rigorously explore these new opportunities to learn enriched and highly discriminating activity representations. We propose: i) learning to exploit the latent relationships between multi-channel sensor modalities and specific activities; ii) investigating the effectiveness of data-agnostic augmentation for multi-modal sensor data streams to regularize deep HAR models; and iii) incorporating a classification loss criterion that encourages minimal intra-class representation differences whilst maximising inter-class differences, to achieve more discriminative features. Our contributions achieve new state-of-the-art performance on four diverse activity recognition benchmarks by large margins, with up to 6% relative improvement. We validate our design concepts through extensive experiments, including activity misalignment measures, ablation studies, and insights shared through both quantitative and qualitative studies. The code base and trained network parameters are open-sourced on GitHub at https://github.com/AdelaideAuto-IDLab/Attend-And-Discriminate to support further research.
In recent years, a growing body of research has focused on the problem of person re-identification (re-id), which attempts to match images of pedestrians across disjoint, non-overlapping camera views. A major challenge of re-id is the serious intra-class variation caused by changing viewpoints. To overcome this challenge, we propose a deep neural network-based framework that utilizes view information in the feature extraction stage. The proposed framework learns a view-specific network for each camera view with a cross-view Euclidean constraint (CV-EC) and a cross-view center loss. We utilize the CV-EC to decrease the margin between features from diverse views, and extend the center loss to a view-specific version that better suits the re-id problem. Moreover, we propose an iterative algorithm to optimize the parameters of the view-specific networks from coarse to fine. The experiments demonstrate that our approach significantly improves the performance of existing deep networks and outperforms state-of-the-art methods on the VIPeR, CUHK01, CUHK03, SYSU-mReId, and Market-1501 benchmarks.
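The CV-EC idea of pulling same-identity features from two view-specific networks together can be sketched as an average squared distance over cross-view pairs sharing a label. This is a minimal sketch assuming minibatch features from two views; names are illustrative.

```python
import numpy as np

def cv_ec(feats_v1, labels_v1, feats_v2, labels_v2):
    # Cross-view Euclidean constraint: average squared distance between
    # features of the same identity produced by the two view-specific nets.
    loss, n = 0.0, 0
    for i, y1 in enumerate(labels_v1):
        for j, y2 in enumerate(labels_v2):
            if y1 == y2:
                loss += np.sum((feats_v1[i] - feats_v2[j]) ** 2)
                n += 1
    return loss / max(n, 1)
```

The cross-view center loss described above replaces the pairwise target with per-view class centers, which scales better than enumerating all pairs.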