As a subjective psychological and physiological response to external stimuli, emotion is ubiquitous in our daily life. With the continuous development of artificial intelligence and brain science, EEG-based emotion recognition has rapidly become a multidisciplinary research field. This paper surveys the relevant scientific literature of the past five years and reviews emotional feature extraction and classification methods that use EEG signals. Commonly used feature extraction methods include time-domain, frequency-domain, and time-frequency-domain analysis. Widely used classifiers include machine learning algorithms such as Support Vector Machine (SVM), k-Nearest Neighbor (KNN), and Naive Bayes (NB), whose classification accuracy ranges from 57.50% to 95.70%. The classification accuracy of deep learning algorithms based on Neural Networks (NN), Long Short-Term Memory (LSTM), and Deep Belief Networks (DBN) ranges from 63.38% to 97.56%.
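As a concrete illustration of the frequency-domain analysis mentioned above, the sketch below computes average band power from an FFT power spectrum with NumPy. The function name and band boundaries are illustrative choices, not taken from any surveyed paper.

```python
import numpy as np

def band_power(signal, fs, band):
    """Average power of `signal` within a frequency band (lo, hi) in Hz,
    computed from the one-sided FFT power spectrum."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= band[0]) & (freqs < band[1])
    return psd[mask].mean()

# Example: a synthetic 10 Hz (alpha-band) oscillation sampled at 128 Hz.
fs = 128
t = np.arange(0, 4, 1.0 / fs)
eeg = np.sin(2 * np.pi * 10 * t)

# Power should concentrate in the alpha band (8-13 Hz) rather than beta (13-30 Hz).
alpha = band_power(eeg, fs, (8, 13))
beta = band_power(eeg, fs, (13, 30))
print(alpha > beta)  # True
```

In a real pipeline, such band powers (per channel, per band) would form the frequency-domain feature vector fed to a classifier such as SVM or KNN.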
Small and cluttered objects are common in real-world imagery and are challenging to detect. The difficulty is further pronounced when the objects are rotated, as traditional detectors routinely locate objects with horizontal bounding boxes, so that the region of interest is contaminated by background or nearby interleaved objects. In this paper, we first introduce the idea of denoising to object detection: instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects. To handle rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long-standing boundary problem, which, by our analysis, is mainly caused by the periodicity of angular (PoA) and exchangeability of edges (EoE). Combining these two features, our proposed detector is termed SCRDet++. Extensive experiments are performed on the large public aerial image datasets DOTA, DIOR, and UCAS-AOD, as well as the natural image dataset COCO, the scene text dataset ICDAR2015, the small traffic light dataset BSTLD, and our newly released S²TLD. The results show the effectiveness of our approach. The released dataset S²TLD is made publicly available; it contains 5,786 images with 14,130 traffic light instances across five categories.
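The base regression term that SCRDet++ augments with an IoU constant factor is the standard smooth L1 loss; the exact IoU factor is defined in the paper and is not reproduced here. For reference, a minimal sketch of the standard smooth L1 term:

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    """Standard smooth L1 (Huber-style) regression loss, elementwise:
    quadratic for |diff| < beta, linear beyond."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a * a / beta, a - 0.5 * beta)

# Small residuals are penalized quadratically, large ones linearly.
print(smooth_l1(np.array([0.5, 2.0])))  # [0.125 1.5]
```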
The stethoscope is a commonly used medical device for diagnosing a patient's condition by listening to the heart. This diagnostic technique is known as auscultation; each heart sound heard through the stethoscope has a distinct pattern that depends on the person's heart condition. To overcome the subjective nature of auscultation, researchers have developed computational methods that analyze heart sounds automatically. In this study, we utilized the Hjorth method for feature extraction and a Long Short-Term Memory (LSTM) RNN for classifying normal and abnormal heart sounds. With 100 epochs, two LSTM layers, and one Dense layer, the classification accuracy for distinguishing normal and abnormal sounds reached 71.95%. This research aims to contribute to the development of an accurate system for detecting normal and abnormal heart sounds.
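The Hjorth features referenced above have a standard closed form: activity (signal variance), mobility, and complexity, defined from the variances of the signal and its first two differences. The following is a minimal NumPy sketch of that computation, independent of this study's pipeline.

```python
import numpy as np

def hjorth(x):
    """Hjorth descriptors of a 1-D signal: activity (variance),
    mobility, and complexity."""
    dx = np.diff(x)
    ddx = np.diff(dx)
    var_x, var_dx, var_ddx = np.var(x), np.var(dx), np.var(ddx)
    activity = var_x
    mobility = np.sqrt(var_dx / var_x)
    complexity = np.sqrt(var_ddx / var_dx) / mobility
    return activity, mobility, complexity

# A pure sine is maximally "regular": its complexity stays close to 1,
# and its activity equals the variance of a unit-amplitude sine (~0.5).
t = np.linspace(0, 1, 2000, endpoint=False)
act, mob, comp = hjorth(np.sin(2 * np.pi * 5 * t))
print(round(act, 2), round(comp, 2))
```

Applied to fixed-length heart-sound segments, these three scalars per segment form a compact feature vector for the LSTM classifier.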
This paper addresses the problem of text-to-video temporal grounding, which aims to identify the time interval in a video that is semantically relevant to a text query. We tackle this problem with a novel regression-based model that learns to extract a collection of mid-level features for semantic phrases in a text query, corresponding to the important semantic entities described in the query (e.g., actors, objects, and actions), and to reflect bi-modal interactions between the linguistic features of the query and the visual features of the video at multiple levels. The proposed method effectively predicts the target time interval by exploiting contextual information from local to global during bi-modal interaction. Through in-depth ablation studies, we find that incorporating both local and global context in video-text interactions is crucial to accurate grounding. Our experiments show that the proposed method outperforms the state of the art on the Charades-STA and ActivityNet Captions datasets by large margins: 7.44% and 4.61% points at the Recall@tIoU=0.5 metric, respectively.
To address the problem of image texture feature extraction, a direction measure statistic based on the directionality of image texture is constructed, and a new texture feature extraction method, based on a fusion of the direction measure and the gray-level co-occurrence matrix (GLCM), is proposed. The method applies the GLCM to extract the texture feature values of an image and integrates a weight factor introduced by the direction measure to obtain the final texture features. A set of classification experiments on high-resolution remote sensing images was performed using a support vector machine (SVM) classifier with the direction measure and GLCM fusion algorithm. Both qualitative and quantitative approaches were applied to assess the classification results. The experiments demonstrate that texture feature extraction based on the fusion algorithm achieves better image recognition and significantly improves classification accuracy.
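A GLCM such as the one this method builds on can be computed directly from pixel co-occurrences. The sketch below (plain NumPy, with an illustrative offset convention and only the contrast statistic) shows how directionality surfaces in co-occurrence statistics; the paper's direction-measure weighting itself is not reproduced here.

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Gray-level co-occurrence matrix for pixel offset (dx, dy),
    normalized so its entries sum to 1."""
    h, w = img.shape
    m = np.zeros((levels, levels))
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y, x], img[y2, x2]] += 1
    return m / m.sum()

def contrast(m):
    """GLCM contrast: sum over (i, j) of (i - j)^2 * p(i, j)."""
    i, j = np.indices(m.shape)
    return ((i - j) ** 2 * m).sum()

# A vertically striped 4-level image: horizontal neighbors differ,
# so horizontal contrast is high while vertical contrast is zero.
img = np.tile(np.array([0, 3]), (4, 2))  # 4x4, columns 0 3 0 3
c_h = contrast(glcm(img, 1, 0, 4))
c_v = contrast(glcm(img, 0, 1, 4))
print(c_h > c_v)  # True
```

Comparing such statistics across several offsets is what makes the co-occurrence representation sensitive to texture direction.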
Our work tackles the challenging but natural visual recognition task of long-tailed data distributions (i.e., a few classes occupy most of the data, while most classes have very few samples). In the literature, class re-balancing strategies (e.g., re-weighting and re-sampling) are the prominent and effective methods for alleviating the extreme imbalance of long-tailed problems. In this paper, we first show that these re-balancing methods achieve satisfactory recognition accuracy because they significantly promote classifier learning in deep networks; however, at the same time, they unexpectedly damage, to some extent, the representation ability of the learned deep features. We therefore propose a unified Bilateral-Branch Network (BBN) that takes care of both representation learning and classifier learning simultaneously, where each branch performs its own duty. In particular, our BBN model is further equipped with a novel cumulative learning strategy, designed to first learn universal patterns and then gradually pay attention to the tail data. Extensive experiments on four benchmark datasets, including the large-scale iNaturalist ones, show that the proposed BBN significantly outperforms state-of-the-art methods. Furthermore, validation experiments demonstrate both our preliminary finding and the effectiveness of the tailored designs in BBN for long-tailed problems. Our method won first place in the iNaturalist 2019 large-scale species classification competition, and our code is open source at https://github.com/Megvii-Nanjing/BBN.
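The cumulative learning strategy above amounts to a scheduled mixing weight between the two branch outputs. Assuming a parabolic decay schedule (the form used in the BBN paper, to our reading), a minimal sketch:

```python
def bbn_alpha(epoch, max_epoch):
    """Cumulative-learning weight: decays parabolically from 1 to 0,
    shifting focus from the conventional (representation-learning)
    branch to the re-balancing (tail-focused) branch over training."""
    return 1.0 - (epoch / max_epoch) ** 2

def mixed_logits(logit_conv, logit_rebal, epoch, max_epoch):
    """Weighted combination of the two branch outputs for the final loss."""
    a = bbn_alpha(epoch, max_epoch)
    return a * logit_conv + (1.0 - a) * logit_rebal

# Early training is dominated by the conventional branch (alpha near 1),
# late training by the re-balancing branch (alpha near 0).
print(bbn_alpha(0, 100), bbn_alpha(100, 100))  # 1.0 0.0
```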
Existing domain generalization methods for face anti-spoofing endeavor to extract common differentiation features to improve generalization. However, due to the large distribution discrepancies among fake faces from different domains, it is difficult to find a compact and generalized feature space for fake faces. In this work, we propose an end-to-end single-side domain generalization framework (SSDG) to improve the generalization ability of face anti-spoofing. The main idea is to learn a generalized feature space in which the feature distribution of real faces is compact, while that of fake faces is dispersed across domains but compact within each domain. Specifically, a feature generator is trained to make only the real faces from different domains indistinguishable, but not the fake ones, thus forming single-side adversarial learning. Moreover, an asymmetric triplet loss is designed to keep the fake faces of different domains separated while aggregating the real ones. These two points are integrated into a unified framework trained end to end, resulting in a more generalized class boundary that is especially good for samples from novel domains. Feature and weight normalization is incorporated to further improve generalization. Extensive experiments show that our approach is effective and outperforms state-of-the-art methods on four public databases. The code is released online.
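The asymmetric triplet loss mentioned above follows the standard triplet form; the asymmetry lies in how positives are chosen, not in the loss itself. A minimal sketch of the triplet term, assuming squared Euclidean distance, with the pairing policy left to comments:

```python
import numpy as np

def triplet_term(anchor, positive, negative, margin=0.5):
    """Standard triplet hinge: pull anchor toward positive, push it
    beyond the negative by at least `margin`.
    Asymmetric pairing policy (per the SSDG idea): real faces from ANY
    domain serve as mutual positives, while a fake face is a positive
    only for fakes from the SAME domain."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# A well-separated triplet incurs zero loss; a violated one is penalized.
good = triplet_term(np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([1.0, 0.0]))
bad = triplet_term(np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.1, 0.0]))
print(good, bad)
```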
Deep-learning-based salient object detection methods have achieved great progress. However, the variable scale and unknown category of salient objects remain great challenges, both closely related to how multi-level and multi-scale features are utilized. In this paper, we propose aggregate interaction modules to integrate features from adjacent levels; less noise is introduced because only small up-/down-sampling rates are used. To obtain more efficient multi-scale features from the integrated features, self-interaction modules are embedded in each decoder unit. Moreover, the class imbalance caused by scale variation weakens the binary cross-entropy loss and leads to spatially inconsistent predictions. We therefore exploit a consistency-enhanced loss to highlight the foreground/background difference and preserve intra-class consistency. Experimental results on five benchmark datasets demonstrate that the proposed method, without any post-processing, performs favorably against 23 state-of-the-art approaches. The source code will be publicly available at https://github.com/lartpang/MINet.
Extracting joint spatial-spectral features has been proven to improve the classification performance of hyperspectral images (HSIs). Recently, using convolutional neural networks (CNNs) to learn joint spatial-spectral features has attracted great interest. However, existing CNN models ignore the complementary spatial-spectral information between shallow and deep layers. Moreover, the scarcity of training samples in HSIs afflicts these CNN models with overfitting. To address these problems, we propose a novel CNN method for HSI classification that considers multilayer spatial-spectral feature fusion and sample augmentation with local and nonlocal constraints, abbreviated MSLN-CNN. In MSLN-CNN, a triple-architecture CNN extracts spatial-spectral features by cascading spectral features to dual-scale spatial features from shallow to deep layers. Multilayer spatial-spectral features are then fused to learn the complementary information between the shallow layers, which carry detailed information, and the deep layers, which carry semantic information. Finally, multilayer spatial-spectral feature fusion and classification are integrated into a unified network, so MSLN-CNN can be optimized end to end. To alleviate the small-sample-size problem, unlabeled samples with high confidence under a local spatial constraint and a nonlocal spectral constraint are selected and pre-labeled. The nonlocal spectral constraint exploits structural information from spectrally similar samples found by nonlocal search, while the local spatial constraint utilizes contextual information from spatially adjacent samples. Experimental results on several hyperspectral datasets demonstrate that the proposed method achieves more encouraging classification performance than current state-of-the-art methods, especially with limited training samples.