The problem of dense semantic labeling consists of assigning a semantic label to every pixel in an image. In the context of aerial image analysis, it is particularly important to yield high-resolution outputs. In order to use convolutional neural networks (CNNs) for this task, new architectures must be designed to provide fine-grained classification maps. Many dense semantic labeling CNNs have been proposed recently. Our first contribution is an in-depth analysis of these architectures. We establish the desired properties of an ideal semantic labeling CNN and assess how existing methods stand with regard to these properties. We observe that, although they provide competitive results, these CNNs often underexploit properties of semantic labeling that could lead to more effective and efficient architectures. From these observations, we then derive a CNN framework specifically adapted to the semantic labeling problem. In addition to learning features at different resolutions, it learns how to combine these features. By integrating local and global information in an efficient and flexible manner, it outperforms previous techniques. We evaluate the proposed framework and compare it with state-of-the-art architectures on public benchmarks of high-resolution aerial image labeling.
Face parsing infers a pixel-wise label map for each semantic facial component. Previous methods generally work well for uncovered faces; however, they overlook facial occlusion and ignore the contextual areas outside a single face, even though facial occlusion became a common situation during the COVID-19 epidemic. Inspired by the lighting phenomenon in everyday life, where illumination from four distinct lamps provides a more uniform distribution than a single central light source, we propose a novel homogeneous tanh-transform for image preprocessing, which is composed of four tanh-transforms. These transforms fuse the central vision and the peripheral vision together. Our proposed method addresses the dilemma of face parsing under occlusion and compresses more information from the surrounding context. Based on homogeneous tanh-transforms, we propose an occlusion-aware convolutional neural network for occluded face parsing. It combines information in both tanh-polar space and tanh-Cartesian space, which enlarges the receptive fields. Furthermore, we introduce an occlusion-aware loss to focus on the boundaries of occluded regions. The network is simple, flexible, and can be trained end-to-end. To facilitate future research on occluded face parsing, we also contribute a new, cleaned face parsing dataset. This dataset is manually purified from several academic and industrial datasets, including CelebAMask-HQ, Short-video Face Parsing, and the Helen dataset, and will be made public. Experiments demonstrate that our method surpasses state-of-the-art methods in face parsing under occlusion.
•Propose a tanh-transform neural network for occluded face parsing.
•The four-point tanh-transform is used to enhance facial component recognition.
•Design a four-point block structure and an occlusion-aware loss function.
•A Sheltered Face Parsing Dataset is presented for face parsing.
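The abstract above describes the homogeneous tanh-transform only at a high level, so the sketch below is an illustration rather than the authors' implementation: each of four warps compresses the image radially with a tanh profile around one center, keeping resolution near that center and squeezing the periphery, and the four warped views are stacked. The center placement (quarter points) and the `scale` parameter are assumptions of this sketch.

```python
import numpy as np

def tanh_warp(image, center, scale):
    """Warp an image with a radial tanh transform around one center.

    Pixels near `center` keep roughly their resolution while the
    periphery is compressed, so context around the face still fits in
    a fixed-size output. `scale` controls how quickly compression sets
    in (a free parameter in this sketch, not taken from the paper).
    """
    h, w = image.shape[:2]
    cy, cx = center
    ys, xs = np.mgrid[0:h, 0:w]
    # Output coordinates normalized to roughly [-1, 1] around the center.
    u = (xs - cx) / (w / 2)
    v = (ys - cy) / (h / 2)
    r = np.sqrt(u ** 2 + v ** 2) + 1e-8
    # Invert r_out = tanh(r_in / scale):  r_in = scale * arctanh(r_out).
    r_in = scale * np.arctanh(np.clip(r, 0.0, 0.999))
    src_x = np.clip((cx + u / r * r_in * (w / 2)).astype(int), 0, w - 1)
    src_y = np.clip((cy + v / r * r_in * (h / 2)).astype(int), 0, h - 1)
    return image[src_y, src_x]  # nearest-neighbor resampling

def homogeneous_tanh_transform(image, scale=0.5):
    """Four tanh warps centered at the four quarter points, stacked
    along a new axis -- analogous to four lamps lighting a scene more
    evenly than one central lamp."""
    h, w = image.shape[:2]
    centers = [(h // 4, w // 4), (h // 4, 3 * w // 4),
               (3 * h // 4, w // 4), (3 * h // 4, 3 * w // 4)]
    return np.stack([tanh_warp(image, c, scale) for c in centers])
```

A downstream network would then consume the four warped views jointly, e.g. as extra channels.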
Toward the development of effective and efficient brain-computer interface (BCI) systems, precise decoding of brain activity measured by electroencephalography (EEG) is in high demand. Traditional works classify EEG signals without considering the topological relationships among electrodes. However, neuroscience research has increasingly emphasized network patterns of brain dynamics, so the Euclidean structure of the electrodes might not adequately reflect the interactions between signals. To fill this gap, a novel deep learning (DL) framework based on graph convolutional neural networks (GCNs) is presented to enhance the decoding performance of raw EEG signals during different types of motor imagery (MI) tasks while incorporating the functional topological relationships of the electrodes. The graph Laplacian of the EEG electrodes is built from the absolute Pearson correlation matrix of the overall signals. The GCNs-Net, constructed from graph convolutional layers, learns generalized features; the subsequent pooling layers reduce dimensionality, and the fully connected (FC) softmax layer derives the final prediction. The introduced approach has been shown to converge for both personalized and groupwise predictions. It achieved the highest averaged accuracies, 93.06% and 88.57% (PhysioNet dataset) and 96.24% and 80.89% (high gamma dataset), at the subject and group level, respectively, compared with existing studies, which suggests adaptability and robustness to individual variability. Moreover, the performance is stably reproducible across repeated cross-validation experiments. The excellent performance of our method shows that it is an important step toward better BCI approaches. In conclusion, the GCNs-Net filters EEG signals based on the functional topological relationships of the electrodes, which enables it to decode relevant features for brain MI.
A DL library for EEG task classification, including the code for this study, is open source at https://github.com/SuperBruceJia/EEG-DL for scientific research.
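The graph construction step in the abstract above is concrete enough to sketch: edge weights are absolute Pearson correlations between electrode signals, from which a graph Laplacian is formed. The symmetric normalization and the simplified one-layer propagation rule below are standard GCN conventions, assumed here rather than taken from the paper.

```python
import numpy as np

def eeg_graph_laplacian(signals):
    """Graph Laplacian of the EEG electrodes, with edge weights given
    by absolute Pearson correlations as described in the abstract.
    `signals` has shape (n_electrodes, n_samples)."""
    A = np.abs(np.corrcoef(signals))          # absolute Pearson matrix
    np.fill_diagonal(A, 0.0)                  # drop self-correlations
    d = A.sum(axis=1)
    L = np.diag(d) - A                        # combinatorial Laplacian
    # Symmetric normalization D^{-1/2} L D^{-1/2}, standard in GCNs.
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    return d_inv_sqrt @ L @ d_inv_sqrt

def graph_conv(L, X, W):
    """One simplified graph-convolution layer: propagate node features
    X (n_electrodes, f_in) over the graph support L, then mix feature
    channels with a weight matrix W (f_in, f_out)."""
    return np.tanh(L @ X @ W)
```

In a full GCNs-Net, several such layers would be interleaved with graph pooling before the FC softmax classifier.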
Object detection, which automatically labels objects, is a fundamental task in very high-resolution remote sensing images (RSIs). At present, deep learning has gradually gained a competitive advantage for remote sensing object detection, especially with convolutional neural networks (CNNs). Most existing methods use the global information in the fully connected feature vector and ignore the local information in the convolutional feature cubes. However, local information provides spatial cues that are helpful for accurate localization. In addition, variable factors such as rotation and scaling affect object detection accuracy in RSIs. To solve these problems, this paper presents a hierarchical robust CNN. First, multiscale convolutional features are extracted to represent the hierarchical spatial semantic information. Second, multiple fully connected layer features are stacked together to improve robustness to rotation and scaling. Experiments on two data sets show the effectiveness of our method. In addition, a large-scale high-resolution remote sensing object detection data set is established to compensate for existing data sets being insufficient or too small. The data set is available at https://github.com/CrazyStoneonRoad/TGRS-HRRSD-Dataset.
The classification of hyperspectral images (HSIs) using convolutional neural networks (CNNs) has recently drawn significant attention. However, it is important to address the overfitting problems that CNN-based methods suffer from when dealing with HSIs. Unlike common natural images, HSIs are essentially third-order tensors containing two spatial dimensions and one spectral dimension. As a result, exploiting both spatial and spectral information is very important for HSI classification. This paper proposes a new hand-crafted feature extraction method, based on multiscale covariance maps (MCMs), specifically aimed at improving the classification of HSIs using CNNs. The proposed method has the following distinctive advantages. First, with the use of covariance maps, the spatial and spectral information of the HSI can be jointly exploited. Each entry in the covariance map stands for the covariance between two different spectral bands within a local spatial window, which absorbs and integrates the two kinds of information (spatial and spectral) in a natural way. Second, by means of our multiscale strategy, each sample is enhanced with spatial information from different scales, significantly increasing the information conveyed by the training samples. To verify the effectiveness of our proposed method, we conduct comprehensive experiments on three widely used hyperspectral data sets, using a classical 2-D CNN (2DCNN) model. Our experimental results demonstrate that the proposed method indeed increases the robustness of the CNN model. Moreover, the proposed MCMs+2DCNN method exhibits better classification performance than other CNN-based classification strategies and several standard techniques for spectral-spatial classification of HSIs.
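The covariance-map construction described above can be sketched directly: the pixels of a local spatial window are treated as samples, the spectral bands as variables, and their covariance matrix becomes one map; repeating this at several window sizes yields the multiscale stack. The particular window sizes below are placeholders, not the paper's settings.

```python
import numpy as np

def covariance_map(hsi, row, col, window):
    """Covariance map for one pixel of an HSI cube of shape (H, W, B):
    the pixels inside a (window x window) spatial neighborhood are the
    samples, and the covariance between every pair of the B spectral
    bands is computed, jointly encoding spatial and spectral cues."""
    half = window // 2
    h, w, b = hsi.shape
    r0, r1 = max(0, row - half), min(h, row + half + 1)
    c0, c1 = max(0, col - half), min(w, col + half + 1)
    patch = hsi[r0:r1, c0:c1].reshape(-1, b)   # (n_pixels, bands)
    return np.cov(patch, rowvar=False)          # (bands, bands)

def multiscale_covariance_maps(hsi, row, col, windows=(5, 9, 13)):
    """Stack covariance maps at several window sizes so each training
    sample carries spatial context at multiple scales; the stack can
    then be fed to a 2-D CNN as a multi-channel image."""
    return np.stack([covariance_map(hsi, row, col, w) for w in windows])
```

Each per-pixel stack has shape (n_scales, B, B), i.e. an image-like input whose channels are the scales.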
Capturing subtle texture variations in remote sensing images with limited samples remains a challenge for accurate tree classification. To address this, we propose Local Binary Convolutional Neural Networks with Attention Mechanisms (AM-LBCNN), a deep learning framework that combines the Convolutional Block Attention Module (CBAM) with local binary convolutional neural networks (LBCNNs). Experimental results on a limited dataset of remotely sensed tree images demonstrate the superiority of AM-LBCNN in capturing subtle texture differences, achieving a classification accuracy of 93.57%. The processing efficiency of the LBCNN is five times that of the baseline, and introducing the CBAM increases Top-1 accuracy by 0.11% over the plain LBCNN. This integrated approach addresses the limitations of CNNs in texture extraction by using the LBCNN and the attention mechanism to accurately identify fine-grained texture variations among tree species, while mitigating the computational challenges associated with few-shot learning.
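The local binary convolution underlying the LBCNN can be illustrated with a minimal sketch: the spatial filters are fixed, sparse {-1, 0, +1} patterns that are never trained, and only the 1x1 linear combination across the resulting maps is learnable, which is where the efficiency gain over a standard convolution comes from. Filter count, sparsity, and the 3x3 kernel size below are assumptions of this sketch.

```python
import numpy as np

def local_binary_conv(x, n_filters=8, seed=0):
    """One local binary convolution layer on a 2-D input `x` (H, W).

    The 3x3 filters are fixed sparse {-1, 0, +1} patterns (drawn once
    and never updated); only the per-map weights of the final 1x1
    combination would be trained in a real LBCNN."""
    rng = np.random.default_rng(seed)
    filters = rng.choice([-1, 0, 1], size=(n_filters, 3, 3),
                         p=[0.25, 0.5, 0.25])
    h, w = x.shape
    pad = np.pad(x, 1)
    maps = np.zeros((n_filters, h, w))
    for k in range(n_filters):           # naive sliding-window conv
        for i in range(h):
            for j in range(w):
                maps[k, i, j] = np.sum(pad[i:i + 3, j:j + 3] * filters[k])
    act = np.maximum(maps, 0)            # ReLU on the binary responses
    # Learnable part: a 1x1 conv, i.e. a weighted sum across the maps
    # (random stand-in weights here).
    weights = rng.normal(size=n_filters)
    return np.tensordot(weights, act, axes=1)
```

In AM-LBCNN, attention would then reweight the channels of such responses before classification.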
This article proposes a quantum spatial graph convolutional neural network (QSGCN) model that is implementable on quantum circuits, providing a novel avenue for processing non-Euclidean data on state-of-the-art parameterized quantum circuit (PQC) computing platforms. Four basic blocks are constructed to form the whole QSGCN model: quantum encoding, the quantum graph convolutional layer, the quantum graph pooling layer, and network optimization. In particular, the trainability of the QSGCN model is analyzed through a discussion of the barren plateau phenomenon. Simulation results on various types of graph data demonstrate the learning, generalization, and robustness capabilities of the proposed quantum neural network (QNN) model.
Class/Regression Activation Maps (CAMs/RAMs; AMs) are often embedded into convolutional neural networks (CNNs) to check the activated regions of input images at estimation time. CNNs sometimes generate unreliable AMs in which the activated regions are inappropriate. Because an AM is calculated by stacking the many feature maps generated by the final convolutional layer, unreliable AMs can be generated when there are Anomaly Feature Maps (AFMs). For example, suppose we have a CNN that evaluates the heart. In this case, feature maps that focus on regions unrelated to the heart (e.g., the shoulders and esophagus) are AFMs. Additionally, we hypothesize that the estimation accuracy of CNNs can be increased by removing AFMs. However, methods for automatically detecting and removing AFMs to improve the performance of CNNs have not been sufficiently studied. Therefore, we propose a method named “Removal Operation of Anomaly Feature Maps (RO-AFM)” that automatically detects and removes AFMs. When an RO-AFM is applied to the Global Average Pooling (GAP) feature vectors of a CNN, the dimensionality of the GAP vector is reduced; an RO-AFM can therefore be regarded as a deep-feature selection algorithm. Applying an RO-AFM to a Regression CNN (R-CNN) that estimates pulmonary artery wedge pressure, a measurement representing the anomaly state of the heart, improved both the reliability of the AM and the estimation accuracy. A comparison of RO-AFM with existing methods, i.e., Lasso and the Feature Selection Layer (FSL), indicated that RO-AFM performed slightly better in estimation accuracy. The computation time required for RO-AFM to evaluate all features was 1.833 s on average, confirming that RO-AFM is a lightweight process. Therefore, RO-AFM is useful for constructing medical CNNs that emphasize explainability (e.g., CNNs for estimating the risk of a disease or a test value from chest X-ray or computed tomography images).
•Explainable convolutional neural networks sometimes generate unreliable activation maps.
•A method to automatically detect and remove anomaly feature maps is proposed.
•The proposed method improves the reliability of activation maps and estimation accuracy.
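The abstract frames RO-AFM as feature selection on the GAP vector but does not spell out the detection criterion, so the sketch below substitutes a simple stand-in criterion: a GAP dimension is flagged as anomalous when its absolute correlation with the regression target falls below a threshold. This criterion and the threshold value are illustrative assumptions, not the paper's method.

```python
import numpy as np

def remove_anomaly_features(gap_vectors, targets, threshold=0.1):
    """Deep-feature selection in the spirit of RO-AFM: drop the GAP
    dimensions flagged as anomalous, shrinking the GAP vector.

    Stand-in detection rule (NOT the paper's): a feature is anomalous
    if its absolute Pearson correlation with the regression target is
    below `threshold`.

    gap_vectors : (n_samples, n_features) GAP activations
    targets     : (n_samples,) regression targets
    Returns the reduced feature matrix and the boolean keep mask.
    """
    corrs = np.array([
        abs(np.corrcoef(gap_vectors[:, k], targets)[0, 1])
        for k in range(gap_vectors.shape[1])
    ])
    keep = corrs >= threshold          # features to retain
    return gap_vectors[:, keep], keep
```

Recomputing the activation map from only the kept feature maps is what would then yield the cleaner AM reported in the abstract.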