Hyperspectral data classification is an active topic in the remote sensing community, and significant effort has been devoted to it in recent years. However, most methods extract features from the original data in a shallow manner. In this paper, we introduce a deep learning approach to hyperspectral image classification. A new feature extraction (FE) and image classification framework is proposed for hyperspectral data analysis based on the deep belief network (DBN). First, we verify the eligibility of the restricted Boltzmann machine (RBM) and DBN through spectral information-based classification. Then, we propose a novel deep architecture that combines spectral-spatial FE and classification to achieve high classification accuracy. The framework is a hybrid of principal component analysis (PCA), hierarchical learning-based FE, and logistic regression (LR). Experimental results on hyperspectral data indicate that the classifier provides a competitive solution compared with state-of-the-art methods. In addition, this paper reveals that deep learning systems have great potential for hyperspectral data classification.
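The RBM building block mentioned in this abstract can be sketched as a single contrastive-divergence (CD-1) update in NumPy. This is a minimal sketch, assuming binary units and illustrative sizes and learning rate; it is not the paper's actual configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.05, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0    : (batch, n_vis) visible data (e.g. normalized spectral vectors)
    W     : (n_vis, n_hid) weights; b_vis, b_hid : bias vectors.
    Returns updated (W, b_vis, b_hid) and the reconstruction error.
    """
    rng = rng or np.random.default_rng(0)
    # Positive phase: sample hidden units given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step down to the visible layer and back up.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Gradient approximation: <v h>_data - <v h>_model.
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    err = float(np.mean((v0 - p_v1) ** 2))
    return W, b_vis, b_hid, err
```

Stacking such RBMs layer by layer and fine-tuning with a logistic-regression output is the DBN pipeline the abstract describes.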
Graph convolutional networks (GCNs) have been applied in a variety of fields due to their powerful ability to process graph-like data. However, the massive number of hyperspectral pixels makes it challenging to define general graph structures on hyperspectral images (HSIs). On the other hand, convolutional neural networks (CNNs) take in regular image regions of fixed square size and have demonstrated impressive accuracy while remaining computationally efficient. Inspired by the classification framework of CNNs, we develop a GCN-based model that generates effective local spectral-spatial features for HSI classification. Specifically, graph convolutions are performed separately on every local region, which significantly limits the graph size. While graph convolution extracts features for every pixel, it does not reduce their number. To fuse suitable representations for the classification task, we develop a graph pooling operation that preserves classification-specific features and removes redundant pixels. Because it operates on local regions of HSIs, pooling in the graph domain is equivalent to pooling in the spatial domain. The proposed method is thus named the spatial pooling graph convolutional network (SPGCN). Experimental results on several typical datasets demonstrate that the proposed SPGCN provides competitive results compared with other state-of-the-art CNN-based methods.
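The idea of a graph convolution plus pooling restricted to one local region can be sketched as follows. This is an illustrative toy, assuming a Gaussian-similarity adjacency, an identity-weight propagation step, and norm-based node selection; the actual SPGCN layer is not specified at this level of detail in the abstract.

```python
import numpy as np

def local_graph_conv_pool(patch, n_keep=4, sigma=1.0):
    """Illustrative local graph convolution + pooling on one HSI patch.

    patch : (n_pixels, n_bands) spectra of a small local region.
    Builds a dense similarity graph, applies one normalized-adjacency
    propagation step (a simplified GCN layer with identity weights),
    then keeps the n_keep pixels with the largest feature norm -- a
    toy stand-in for classification-specific graph pooling.
    """
    # Gaussian similarity between pixel spectra as edge weights.
    d2 = ((patch[:, None, :] - patch[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalization D^{-1/2} A D^{-1/2}; self-loops are
    # already present because A's diagonal is 1.
    Dinv = 1.0 / np.sqrt(A.sum(1))
    A_hat = Dinv[:, None] * A * Dinv[None, :]
    h = np.maximum(A_hat @ patch, 0.0)           # propagate + ReLU
    keep = np.argsort(-np.linalg.norm(h, axis=1))[:n_keep]
    return h[np.sort(keep)]                      # pooled node features
```

Performing this per local region keeps each graph small, which is the efficiency argument the abstract makes.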
With more detailed spatial information being represented in very-high-resolution (VHR) remote sensing images, stringent requirements are imposed on accurate image classification. Due to diverse land objects with intraclass variation and interclass similarity, efficient and fine classification of VHR images, especially in complex scenes, is challenging. Even in popular deep learning (DL) frameworks, the geometric details of land objects may be lost at deep feature levels, so it is difficult to preserve highly detailed spatial information (e.g., edges, small objects) by relying only on the last high-level layer. Moreover, many newly developed DL methods require massive well-labeled samples, which inevitably degrades model generalization in few-shot settings. Therefore, in this article, a lightweight shallow-to-deep feature fusion network (SDF²N) is proposed for VHR image classification, in which traditional machine learning (ML) and DL schemes are integrated to learn rich and representative information that improves classification accuracy. In particular, shallow spectral-spatial features are first extracted, and a novel triple-stage fusion (TSF) module is then designed to learn salient and discriminative information at different levels for classification. The TSF module comprises three feature fusion stages: low-level spectral-spatial feature fusion, middle-level multiscale feature fusion, and high-level multilayer feature fusion. The proposed SDF²N takes advantage of shallow-to-deep features and can extract representative and complementary information across layers. Notably, even with limited training samples, SDF²N can still achieve satisfactory classification performance. Experimental results obtained on three real VHR remote sensing datasets, including two multispectral images and one airborne hyperspectral image covering complex urban scenarios, confirm the effectiveness of the proposed approach compared with state-of-the-art methods.
Humans can describe image contents with coarse to fine detail as they wish. However, most image captioning models are intention-agnostic and cannot generate diverse descriptions according to different user intentions. In this work, we propose the Abstract Scene Graph (ASG) structure to represent user intention at a fine-grained level and to control what the generated description covers and how detailed it should be. The ASG is a directed graph consisting of three types of abstract nodes (object, attribute, relationship) grounded in the image without any concrete semantic labels, so it is easy to obtain either manually or automatically. From the ASG, we propose a novel ASG2Caption model that recognizes user intentions and semantics in the graph and therefore generates the desired captions following the graph structure. Our model achieves better controllability conditioned on ASGs than carefully designed baselines on both the VisualGenome and MSCOCO datasets. It also significantly improves caption diversity by automatically sampling diverse ASGs as control signals. Code will be released at \url{https://github.com/cshizhe/asg2cap}.
To address the insufficient representation of action features, this paper proposes an action recognition method based on an attention mechanism. First, in the feature extraction stage, a CSE module is designed to model action features spatio-temporally; this module is then incorporated into a residual network to improve the model's feature extraction ability. An LSTM network is subsequently used to capture the temporal associations among features, and finally the actions are classified with a Softmax layer. Experimental results show that this method achieves recognition rates of 96.23%, 92.03%, and 75.65% on the UCF101, HMDB51, and Kinetics400 datasets, respectively.
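The temporal-modeling stage of this pipeline is a standard LSTM. A single forward step of an LSTM cell can be sketched in NumPy as below; the layer sizes and gate ordering are illustrative assumptions, not details from the paper.

```python
import numpy as np

def lstm_step(x, h, c, Wx, Wh, b):
    """One forward step of a standard LSTM cell, the temporal model used
    after frame-level feature extraction (illustrative sizes/weights).

    x : (n_in,) input feature vector for one time step.
    h, c : (n_hid,) previous hidden and cell states.
    Wx : (n_in, 4*n_hid), Wh : (n_hid, 4*n_hid), b : (4*n_hid,).
    Gate order assumed here: input, forget, candidate, output.
    """
    n_hid = h.shape[0]
    z = x @ Wx + h @ Wh + b
    i = 1.0 / (1.0 + np.exp(-z[:n_hid]))            # input gate
    f = 1.0 / (1.0 + np.exp(-z[n_hid:2 * n_hid]))   # forget gate
    g = np.tanh(z[2 * n_hid:3 * n_hid])             # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * n_hid:]))        # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Running this step over a clip's frame features and feeding the final hidden state to a Softmax classifier mirrors the LSTM-then-Softmax structure in the abstract.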
The purpose of question classification (QC) is to assign a question to an appropriate category from a set of predefined categories that constitute a question taxonomy. Well-chosen question features can significantly improve QC performance. However, feature extraction, particularly syntax feature extraction, has a high computational cost. To maintain or enhance performance without syntax features, this study presents a hybrid approach to semantic and lexical feature extraction. These features are generated by improved information gain and sequential pattern mining methods, respectively. The selected features are then fed into classifiers for question classification. Benchmark testing is performed on the public UIUC dataset. The results reveal that the proposed approach achieves a coarse accuracy of 96% and a fine accuracy of 90.4%, which is superior to existing methods.
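The information-gain scoring mentioned here, in its standard (unimproved) form, can be computed as below. This sketch assumes binary presence/absence features, which is a common setup for lexical features in QC; the paper's "improved" variant is not specified in the abstract.

```python
import numpy as np

def information_gain(x, y):
    """Standard information gain of one binary feature x for labels y.

    x : (n,) 0/1 array -- e.g. presence of a lexical pattern in a question.
    y : (n,) integer class labels (question categories).
    Returns H(y) - H(y | x), in bits.
    """
    def entropy(labels):
        if len(labels) == 0:
            return 0.0
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    ig = entropy(y)
    for v in (0, 1):
        mask = (x == v)
        # Subtract the label entropy within each branch, weighted by its size.
        ig -= mask.mean() * entropy(y[mask])
    return ig
```

Ranking candidate features by this score and keeping the top ones is the usual way such a criterion feeds a downstream classifier.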
State of health (SOH) is essential for battery management, timely maintenance, and the avoidance of safety incidents. A variety of SOH estimation methods have been proposed for specific applications; however, it is often difficult to transfer these methods to other applications. In this article, a novel feature extraction method is proposed to extract health indicators (HIs) under general discharging conditions. A voltage partition strategy is used to obtain the discharge capacity difference between two cycles, ΔQ(V), from nonmonotonic or pulsed discharge voltage curves, and a filtering strategy is employed to obtain smooth voltage curves under dynamic discharging conditions. The standard deviations of the discharge capacity curve and of ΔQ(V) are selected as HIs and are verified to correlate strongly with battery capacity across different datasets for three types of batteries. Using these HIs as input features, typical data-driven methods, including linear regression, support vector machines, relevance vector machines, and Gaussian process regression (GPR), are constructed to predict battery SOH. The estimation results of these methods are compared under different operating conditions for the three types of batteries, and good accuracy is achieved by all of them. Among them, GPR performs best, with a maximum absolute error and root-mean-square error below 1% and 1.3%, respectively.
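The core of the HI extraction described here can be sketched as a voltage-grid resampling followed by the two standard deviations. This is a minimal sketch assuming monotonic, already-smoothed curves; the paper's partition and filtering strategies for nonmonotonic or pulsed discharges are what make the full method more general.

```python
import numpy as np

def delta_q_features(v_a, q_a, v_b, q_b, n_grid=50):
    """Illustrative extraction of the two health indicators (HIs).

    v_a, q_a : voltage and discharge-capacity samples of an early cycle.
    v_b, q_b : the same for a later cycle.
    Resamples both Q(V) curves on a common voltage grid, forms
    dQ(V) = Q_b(V) - Q_a(V), and returns (std of the later capacity
    curve, std of dQ(V)) as the two HIs.
    """
    # np.interp needs increasing x; discharge voltage decreases, so flip.
    grid = np.linspace(max(v_a.min(), v_b.min()),
                       min(v_a.max(), v_b.max()), n_grid)
    qa = np.interp(grid, v_a[::-1], q_a[::-1])
    qb = np.interp(grid, v_b[::-1], q_b[::-1])
    dq = qb - qa
    return float(np.std(qb)), float(np.std(dq))
```

These two scalars are then the input features to any of the regressors compared in the abstract (linear regression, SVM, RVM, GPR).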
Modern industrial processes are typically composed of multiple operating units with reaction interactions and energy-mass coupling, which results in mixed time-varying and spatial-temporal coupling of process variables. It is challenging to develop a comprehensive and precise fault detection model for multiple interconnected units by simple superposition of individual unit models. In this study, fault detection is formulated as a spatial-temporal problem over the process data of multiple interconnected unit processes. A spatial-temporal variational graph attention autoencoder (STVGATE) using interactive information is proposed for fault detection, aiming to effectively capture the spatial and temporal features of the interconnected unit processes. First, slow feature analysis (SFA) is applied to extract temporal information that reveals the dynamic relevance of the process data. Then, an integration of metric learning and prior knowledge is proposed to construct coupled spatial relationships based on the temporal information. In addition, a variational graph attention autoencoder (VGATE) is introduced to extract temporal and spatial information for fault detection, combining the advantages of variational inference and graph attention mechanisms. The proposed method automatically extracts and deeply mines spatial-temporal interactive features to boost detection performance. Finally, three industrial process experiments are performed to verify its feasibility and effectiveness. The results demonstrate that the proposed method dramatically increases the fault detection rate (FDR) and reduces the false alarm rate (FAR).
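The SFA stage mentioned here has a compact linear form: whiten the data, then rotate to the directions whose temporal differences have the smallest variance. A minimal linear SFA sketch in NumPy, with illustrative inputs, might look like this:

```python
import numpy as np

def linear_sfa(X, n_out=2):
    """Minimal linear slow feature analysis (SFA), as in the first stage
    above for extracting temporal (dynamic) information (illustrative).

    X : (T, d) multivariate process time series.
    Whitens the data, then finds directions minimizing the variance of
    the temporal difference signal; slowest features come first.
    """
    Xc = X - X.mean(0)
    # Whitening via eigendecomposition of the covariance matrix.
    C = np.cov(Xc, rowvar=False)
    w, U = np.linalg.eigh(C)
    keep = w > 1e-10                     # drop near-zero directions
    S = U[:, keep] / np.sqrt(w[keep])
    Z = Xc @ S
    # Slowness: eigenvectors of the covariance of the difference signal;
    # eigh returns ascending eigenvalues, so the first columns are slowest.
    dZ = np.diff(Z, axis=0)
    _, Ud = np.linalg.eigh(np.cov(dZ, rowvar=False))
    W = S @ Ud[:, :n_out]
    return Xc @ W                        # (T, n_out) slow features
```

In the abstract's pipeline, these slow features are the temporal information on which the coupled spatial relationships are then constructed.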
There is a growing worldwide demand for ready-to-eat kiwifruit. However, ready-to-eat kiwifruit has a rather narrow firmness range (e.g., 10–30 N), and it remains challenging to predict this firmness non-destructively. Here, we report a strategy for non-destructive prediction of kiwifruit firmness based on Fourier transform near-infrared (FT-NIR) spectroscopy. The radial basis function (RBF) model displayed superior performance, with a calibration coefficient of determination (Rc²) of 0.83, a cross-validation coefficient of determination (Rp²) of 0.73, a root mean square error of calibration (RMSEC) of 0.58, a root mean square error of prediction (RMSEP) of 0.72, and a ratio of performance to deviation (RPD) of 1.92. To enhance prediction accuracy, we optimized the FT-NIR modeling pipeline through data preprocessing, feature selection, and dimensionality reduction. The FD-CARS-SVR (RBF) algorithm exhibited the best performance in predicting kiwifruit firmness during shelf life, with an Rc² of 0.99, Rp² of 0.92, RMSEC of 0.15, RMSEP of 0.40, and RPD of 3.48. To further evaluate its applicability, we compared the predictions of the FT-NIR model with data acquired from the Kiwifirm™ device and the GY-4 penetrometer. The results revealed a pronounced superiority of the FT-NIR model over the Kiwifirm™ for firmness ranging from 10 to 40 N, providing a new non-destructive model for predicting the firmness of ready-to-eat kiwifruit.
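The regression step here is an SVR with an RBF kernel; as a compact, dependency-free stand-in, the same shape of model can be sketched with kernel ridge regression using an RBF kernel. Everything below (kernel width, regularization, synthetic data) is an illustrative assumption, not the paper's calibrated model.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class RBFRidge:
    """Kernel ridge regression with an RBF kernel -- a minimal stand-in
    for the SVR(RBF) firmness model described above (illustrative)."""
    def __init__(self, gamma=0.5, alpha=1e-3):
        self.gamma, self.alpha = gamma, alpha
    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form dual coefficients: (K + alpha I)^{-1} y.
        self.coef = np.linalg.solve(K + self.alpha * np.eye(len(X)), y)
        return self
    def predict(self, X):
        return rbf_kernel(X, self.X, self.gamma) @ self.coef
```

In the abstract's pipeline, a wavelength-selection step (CARS) would precede this, so X would contain only the selected spectral variables rather than full spectra.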
•We developed a firmness-testing algorithm based on FT-NIR during kiwifruit ripening.
•The approach for predicting kiwifruit firmness shows pronounced superiority.
•Classification based on shelf life offers further advantages.
Traditionally, multi-object tracking and object detection have been performed by separate systems, with most prior works focusing exclusively on one of these aspects. Tracking systems clearly benefit from access to accurate detections; moreover, there is ample evidence in the literature that detectors can benefit from tracking, which, for example, can help smooth predictions over time. In this paper we focus on the tracking-by-detection paradigm for autonomous driving, where both tasks are mission critical. We propose a conceptually simple and efficient joint model of detection and tracking, called RetinaTrack, which modifies the popular single-stage RetinaNet approach so that it is amenable to instance-level embedding training. We show, via evaluations on the Waymo Open Dataset, that we outperform a recent state-of-the-art tracking algorithm while requiring significantly less computation. We believe that our simple yet effective approach can serve as a strong baseline for future work in this area.