Hyperspectral image super-resolution (SR) methods based on deep learning have achieved significant progress recently. However, previous methods lack a joint analysis between the spectrum and the horizontal or vertical spatial direction. Besides, when both 2D and 3D convolutions are present in a network, existing models cannot effectively combine the two. To address these issues, in this article, we propose a novel hyperspectral image SR method that explores the relationship between 2D and 3D convolution (ERCSR). Our method alternately employs 2D and 3D units to solve the structural-redundancy problem of existing models by sharing spatial information during reconstruction, which enhances the learning ability of the 2D spatial domain. Importantly, compared with a network using only 3D units, i.e., with the 2D units replaced by 3D units, it not only reduces the size of the model but also improves its performance. Furthermore, to exploit the spectrum fully, a split adjacent spatial and spectral convolution (SAEC) is designed to explore, in parallel, the information between the spectrum and the horizontal or vertical direction in space. Experiments on widely used benchmark datasets demonstrate that the proposed approach outperforms state-of-the-art SR algorithms across different scales in terms of both quantitative and qualitative analysis.
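The split spatial-spectral idea above can be illustrated with a toy numpy sketch: instead of one full 3×3×3 kernel, two parallel kernels cover the (spectrum × vertical) and (spectrum × horizontal) planes. This is only a minimal single-channel illustration of the factorisation, not the paper's SAEC implementation; all kernel values and sizes here are assumptions.

```python
import numpy as np

def conv3d_valid(x, k):
    """Naive valid 3-D cross-correlation, single channel."""
    d, h, w = k.shape
    out = np.zeros((x.shape[0] - d + 1, x.shape[1] - h + 1, x.shape[2] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for l in range(out.shape[2]):
                out[i, j, l] = np.sum(x[i:i + d, j:j + h, l:l + w] * k)
    return out

rng = np.random.default_rng(0)
x = rng.random((8, 16, 16))        # (spectral bands, height, width)
k_sv = rng.random((3, 3, 1))       # spectrum x vertical branch
k_sh = rng.random((3, 1, 3))       # spectrum x horizontal branch
# Crop each branch so the two outputs align, then fuse by summation.
y = conv3d_valid(x, k_sv)[:, :, 1:-1] + conv3d_valid(x, k_sh)[:, 1:-1, :]
# The two split kernels use 9 + 9 = 18 weights versus 27 for a full 3x3x3 kernel.
```

With zero extra machinery, the split already shows the parameter saving (18 vs. 27 weights per kernel) that motivates separating spectral-horizontal from spectral-vertical processing.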
Traffic forecasting is a particularly challenging application of spatiotemporal forecasting, due to the time-varying traffic patterns and the complicated spatial dependencies on road networks. To address this challenge, we model the traffic network as a graph and propose a novel deep learning framework, Traffic Graph Convolutional Long Short-Term Memory Neural Network (TGC-LSTM), to learn the interactions between roadways in the traffic network and forecast the network-wide traffic state. We define the traffic graph convolution based on the physical network topology. The relationship between the proposed traffic graph convolution and the spectral graph convolution is also discussed. An L1-norm on graph convolution weights and an L2-norm on graph convolution features are added to the model's loss function to enhance the interpretability of the proposed model. Experimental results show that the proposed model outperforms baseline methods on two real-world traffic state datasets. The visualization of the graph convolution weights indicates that the proposed framework can recognize the most influential road segments in real-world traffic networks.
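The two regularizers described above attach directly to the prediction loss. A minimal sketch of that composite objective, assuming a plain MSE prediction term (the coefficients `lam1`/`lam2` are hypothetical, not values from the paper):

```python
import numpy as np

def tgc_lstm_loss(pred, target, gc_weights, gc_features, lam1=0.5, lam2=0.25):
    """Prediction MSE plus the two interpretability regularizers:
    an L1 penalty on graph-convolution weights (sparsity makes the
    learned weights easier to read off) and an L2 penalty on
    graph-convolution features (keeps feature magnitudes bounded)."""
    mse = np.mean((pred - target) ** 2)
    l1 = lam1 * np.sum(np.abs(gc_weights))      # L1 on GC weights
    l2 = lam2 * np.sum(gc_features ** 2)        # L2 on GC features
    return mse + l1 + l2

# Tiny worked example: perfect prediction, so only the penalties remain.
loss = tgc_lstm_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]),
                     gc_weights=np.array([1.0, -1.0]),
                     gc_features=np.array([2.0]))
```

Sparse (near-zero) weights surviving the L1 penalty are exactly what the paper's weight visualizations inspect to identify influential road segments.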
Low-dose computed tomography (LDCT) scans, which can effectively alleviate the radiation problem, degrade imaging quality. In this paper, we propose a novel LDCT reconstruction network that unrolls the iterative scheme and operates in both image and manifold spaces. Because patch manifolds of medical images have low-dimensional structures, we can build graphs from the manifolds. Then, we simultaneously leverage spatial convolution to extract local pixel-level features from the images and incorporate graph convolution to analyze the nonlocal topological features in manifold space. The experiments show that our proposed method outperforms state-of-the-art methods in both quantitative and qualitative aspects. In addition, aided by a projection loss component, our proposed method also demonstrates superior performance for semi-supervised learning: the network can remove most of the noise while preserving details with only 10% (40 slices) of the training data labeled.
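Building a graph from a patch manifold and propagating features over it can be sketched in a few lines. This is a generic k-nearest-neighbour construction plus one normalised propagation step, assumed here for illustration; the paper's actual graph construction and unrolled iterations differ in detail.

```python
import numpy as np

def knn_graph(patches, k=1):
    """Symmetric k-NN adjacency over flattened image patches, a common
    way to expose the low-dimensional patch-manifold structure."""
    d = np.linalg.norm(patches[:, None, :] - patches[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # no self-edges
    A = np.zeros_like(d)
    for i, nbrs in enumerate(np.argsort(d, axis=1)[:, :k]):
        A[i, nbrs] = 1.0
    return np.maximum(A, A.T)                   # symmetrise

def graph_conv(X, A, W):
    """One propagation step: row-normalised adjacency (with self-loops)
    times node features times a weight matrix."""
    A_hat = A + np.eye(A.shape[0])
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    return D_inv @ A_hat @ X @ W

patches = np.array([[0.0], [1.0], [2.0], [10.0]])   # 4 toy 1-pixel patches
out = graph_conv(np.ones((4, 1)), knn_graph(patches), np.eye(1))
```

Because the propagation matrix is row-stochastic, a constant feature field passes through unchanged, which is a quick sanity check on the normalisation.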
Convolution in Convolution for Network in Network
Pang, Yanwei; Sun, Manli; Jiang, Xiaoheng; ...
IEEE Transactions on Neural Networks and Learning Systems, 05/2018, Volume 29, Issue 5
Journal Article, Open access
Network in network (NiN) is an effective instance and an important extension of the deep convolutional neural network, consisting of alternating convolutional layers and pooling layers. Instead of using a linear filter for convolution, NiN utilizes a shallow multilayer perceptron (MLP), a nonlinear function, to replace the linear filter. Because of the power of the MLP and of 1 × 1 convolutions in the spatial domain, NiN has a stronger ability of feature representation and hence yields better recognition performance. However, the MLP itself consists of fully connected layers that give rise to a large number of parameters. In this paper, we propose to replace the dense shallow MLP with a sparse shallow MLP. One or more layers of the sparse shallow MLP are sparsely connected in the channel dimension or the channel-spatial domain. The proposed method is implemented by applying unshared convolution across the channel dimension and shared convolution across the spatial dimension in some computational layers. The proposed method is called convolution in convolution (CiC). The experimental results on the CIFAR10 dataset, augmented CIFAR10 dataset, and CIFAR100 dataset demonstrate the effectiveness of the proposed CiC method.
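The contrast between NiN's dense 1 × 1 MLP layer and CiC's channel-sparse variant can be sketched as grouped versus dense channel mixing. This is a minimal illustration of the sparsity pattern (unshared weights per channel group, shared across spatial positions), not the paper's exact layer; group sizes here are assumptions.

```python
import numpy as np

def dense_1x1(x, W):
    """Dense 1x1 convolution (NiN's MLP layer): every output channel
    mixes every input channel.  x: (C_in, H, W), W: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', W, x)

def grouped_1x1(x, Ws):
    """Channel-sparse 1x1 convolution in the spirit of CiC: channels
    are split into groups, each group has its own (unshared) weight
    matrix, and those weights are shared across all spatial positions.
    Ws: list of (C_out_g, C_in_g) matrices, one per group."""
    groups = np.split(x, len(Ws), axis=0)
    return np.concatenate([np.einsum('oc,chw->ohw', W, g)
                           for W, g in zip(Ws, groups)], axis=0)

x = np.ones((4, 2, 2))                          # 4 channels, 2x2 spatial
y_dense = dense_1x1(x, np.ones((1, 4)))         # 1x4 = 4 weights
y_group = grouped_1x1(x, [np.eye(2), 2 * np.eye(2)])  # 2x(2x2) = 8 weights
```

For C channels split into g groups, the grouped layer needs roughly 1/g of the dense layer's parameters, which is the saving CiC exploits.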
Accurate traffic forecasting is critical to improving the safety, stability, and efficiency of intelligent transportation systems. Despite years of study, accurate traffic prediction still faces several challenges, including modeling the dynamics of traffic data along both temporal and spatial dimensions and capturing the periodicity and spatial heterogeneity of traffic data; the problem is even more difficult for long-term forecasting. In this paper, we propose an Attention-based Spatial-Temporal Graph Neural Network (ASTGNN) for traffic forecasting. Specifically, in the temporal dimension, we design a novel self-attention mechanism that is capable of utilizing the local context, specialized for numerical sequence representation transformation. It enables our prediction model to capture the temporal dynamics of traffic data and to enjoy global receptive fields that are beneficial for long-term forecasting. In the spatial dimension, we develop a dynamic graph convolution module, employing self-attention to capture the spatial correlations in a dynamic manner. Furthermore, we explicitly model the periodicity and capture the spatial heterogeneity through embedding modules. Experiments on five real-world traffic flow datasets demonstrate that ASTGNN outperforms the state-of-the-art baselines.
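The idea of self-attention that sees local context can be sketched by deriving queries and keys from a local neighbourhood summary of the sequence rather than from single time steps. The moving-average projection below is a stand-in assumption for the 1-D convolution the paper uses; everything else is standard scaled dot-product attention.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_context_attention(X, Wq, Wk, Wv, width=3):
    """Self-attention over a sequence X of shape (T, d) where the
    queries/keys come from a local window mean (the local-context
    injection), while values come from the raw time steps.  Attention
    itself is global, so every step retains a global receptive field."""
    pad = width // 2
    Xp = np.pad(X, ((pad, pad), (0, 0)), mode='edge')
    local = np.stack([Xp[t:t + width].mean(axis=0) for t in range(X.shape[0])])
    Q, K, V = local @ Wq, local @ Wk, X @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (T, T), rows sum to 1
    return scores @ V

X = np.ones((4, 2))                                   # constant toy sequence
out = local_context_attention(X, np.eye(2), np.eye(2), np.eye(2))
```

A constant input passes through unchanged (each row of the attention matrix sums to 1), which checks that the normalisation is correct.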
Automatic segmentation of retinal vessels in fundus images plays an important role in the diagnosis of diseases such as diabetes and hypertension. In this paper, we propose Deformable U-Net (DUNet), which exploits the retinal vessels' local features with a U-shape architecture, in an end-to-end manner, for retinal vessel segmentation. Inspired by the recently introduced deformable convolutional networks, we integrate deformable convolution into the proposed network. The DUNet, with upsampling operators to increase the output resolution, is designed to extract context information and enable precise localization by combining low-level features with high-level ones. Furthermore, DUNet captures retinal vessels at various shapes and scales by adaptively adjusting the receptive fields according to the vessels' scales and shapes. The public datasets DRIVE, STARE, CHASE_DB1, and HRF are used to test our models. Detailed comparisons between the proposed network, the deformable neural network, and U-Net are provided in our study. Results show that more detailed vessels can be extracted by DUNet, and it exhibits state-of-the-art performance for retinal vessel segmentation with global accuracies of 0.9566/0.9641/0.9610/0.9651 and AUCs of 0.9802/0.9832/0.9804/0.9831 on DRIVE, STARE, CHASE_DB1, and HRF, respectively. Moreover, to show the generalization ability of DUNet, we use another two retinal vessel datasets, i.e., WIDE and SYNTHE, to qualitatively and quantitatively analyze and compare with other methods. Extensive cross-training evaluations are used to further assess the extendibility of DUNet. The proposed method has the potential to be applied to the early diagnosis of diseases.
•A deep neural network (DUNet) for automatic segmentation of retinal vessel is built.
•DUNet exploits the retinal vessels' features with a U-shape architecture.
•DUNet captures retinal vessels adaptively, according to vessels' scales and shapes.
•DUNet extracts more detailed vessels than deformable neural network and U-Net.
•Comparisons between many methods show competitive performance and generalization.
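The deformable-convolution mechanism that DUNet builds on is easiest to see in one dimension: each kernel tap samples the input at a learned fractional offset via interpolation, so the receptive field bends to fit the data. The sketch below is a toy 1-D reduction, not the 2-D layer DUNet uses; offsets are supplied directly rather than predicted by a network.

```python
import numpy as np

def deformable_conv1d(x, w, offsets):
    """Toy 1-D deformable convolution.  offsets[i, j] shifts the j-th
    kernel tap of output position i; fractional positions are resolved
    by linear interpolation (the 1-D analogue of bilinear sampling)."""
    k = len(w)
    out = np.zeros(len(x) - k + 1)
    for i in range(len(out)):
        for j in range(k):
            p = np.clip(i + j + offsets[i, j], 0, len(x) - 1)
            lo = int(np.floor(p))
            hi = min(lo + 1, len(x) - 1)
            frac = p - lo
            out[i] += w[j] * ((1 - frac) * x[lo] + frac * x[hi])
    return out

# With all offsets at zero this reduces to an ordinary valid convolution.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = deformable_conv1d(x, np.array([1.0, 1.0]), np.zeros((3, 2)))
```

In the real layer the offsets are themselves the output of a small convolution, which is what lets the receptive field adapt to vessel scale and shape.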
•A new multi short-term load forecasting model named RICNN is proposed.
•The proposed model combines an RNN and a 1-D CNN with an inception module.
•The proposed RICNN yields better forecasting performance than MLP, 1-D CNN, and RNN.
•The proposed RICNN is verified on actual power consumption data collected from three industrial complexes in South Korea.
Smart grid and microgrid technology based on energy storage systems (ESS) and renewable energy are attracting significant attention in addressing the challenges associated with climate change and energy crises. In particular, building an accurate short-term load forecasting (STLF) model for energy management systems (EMS) is a key factor in the successful formulation of an appropriate energy management strategy. Recent recurrent neural network (RNN)-based models have demonstrated favorable performance in electric load forecasting. However, when forecasting electric load at a specific time, existing RNN-based forecasting models use neither a predicted future hidden state vector nor the fully available past information. Therefore, once a hidden state vector has been incorrectly generated at a specific prediction time, it cannot be corrected to enhance forecasting at the following prediction times. To address these problems, we propose a recurrent inception convolution neural network (RICNN) that combines an RNN and a 1-dimensional CNN (1-D CNN). We use the 1-D convolution inception module to calibrate the prediction time and the hidden state vector values calculated from nearby time steps. By doing so, the inception module generates an optimized network via the prediction time generated in the RNN and the nearby hidden state vectors. The proposed RICNN model has been verified on the power usage data of three large distribution complexes in South Korea. Experimental results demonstrate that the RICNN model outperforms the benchmarked multi-layer perceptron, RNN, and 1-D CNN in daily electric load forecasting (48 time steps with an interval of 30 min).
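A 1-D inception module of the kind described above runs several convolutions with different kernel sizes in parallel and concatenates the results, so each time step is summarised at multiple temporal scales. The kernel sizes below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def conv1d_same(x, k):
    """'Same'-padded 1-D cross-correlation of a sequence with kernel k."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

def inception1d(x, kernels):
    """Minimal 1-D inception block: parallel convolutions with
    different kernel sizes, stacked along a channel axis.  In RICNN a
    module like this calibrates nearby RNN hidden-state vectors."""
    return np.stack([conv1d_same(x, k) for k in kernels])

x = np.array([1.0, 2.0, 3.0])
# A size-1 identity kernel and a size-3 'pick the centre' kernel:
out = inception1d(x, [np.array([1.0]), np.array([0.0, 1.0, 0.0])])
```

Because every branch uses 'same' padding, the branch outputs align time step by time step and can be concatenated directly for the downstream forecaster.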
Due to their spatial and spectral information, hyperspectral images are frequently used in various scientific and industrial fields. Recent developments in hyperspectral image classification have revolved around the use of convolutional neural networks (CNNs) and transformers, which are capable of modeling local and global data. However, most of the backbone networks of existing methods are based on 3-D convolution, which leads to high structural complexity. Moreover, local information and global information are extracted by different modules, and the coupling between the two types of information is weak. To address the above issues, we propose a main-sub transformer network with spectral-spatial separable convolution (MST-SSSNet), which includes two key modules: the spectral-spatial separable convolution (SSSC) module and the main-sub transformer encoder (MST) module. The SSSC module uses the proposed spectral-spatial separable convolution, reducing network parameters and efficiently extracting local features. The MST module adds the designed sub-transformer in front of the conventional transformer encoder (main-transformer); it assists the main-transformer encoder in establishing global correlation by learning local information. The WHU-Hi dataset can be used as a benchmark for precise crop classification and hyperspectral image classification research. MST-SSSNet is shown to deliver better classification performance than current state-of-the-art methods on these datasets. The code is available at https://github.com/fengqinshou/MST-SSSNet .
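The parameter saving from factorising a full 3-D kernel into a spectral part and a spatial part can be made concrete with a quick count. This shows only the generic spectral (k×1×1) plus spatial (1×k×k) factorisation; the exact SSSC design may differ.

```python
def sep_conv_params(c_in, c_out, k):
    """Compare parameter counts for one layer mapping c_in -> c_out
    channels: a full k x k x k 3-D kernel versus a spectral-spatial
    separable pair (spectral k x 1 x 1 followed by spatial 1 x k x k)."""
    full = c_in * c_out * k ** 3
    separable = c_in * c_out * k + c_in * c_out * k * k
    return full, separable

full, sep = sep_conv_params(32, 64, 3)
# For k = 3 the separable pair needs (3 + 9) / 27 = 4/9 of the weights.
```

The ratio `(k + k**2) / k**3` shrinks further as k grows, which is why the factorisation pays off most for larger kernels.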
One essential problem in skeleton-based action recognition is how to extract discriminative features over all skeleton joints. However, recent State-Of-The-Art (SOTA) models for this task tend to be exceedingly sophisticated and over-parameterized. Their low efficiency in model training and inference has increased the cost of validating model architectures on large-scale datasets. To address this issue, recent advanced separable convolutional layers are embedded into an early-fused Multiple Input Branches (MIB) network, constructing an efficient Graph Convolutional Network (GCN) baseline for skeleton-based action recognition. In addition, building on this baseline, we design a compound scaling strategy to expand the model's width and depth synchronously, eventually obtaining a family of efficient GCN baselines with high accuracy and small numbers of trainable parameters, termed EfficientGCN-Bx, where "x" denotes the scaling coefficient. On two large-scale datasets, i.e., NTU RGB+D 60 and 120, the proposed EfficientGCN-B4 baseline outperforms other SOTA methods, e.g., achieving 92.1% accuracy on the cross-subject benchmark of the NTU 60 dataset, while being 5.82× smaller and 5.85× faster than MS-G3D, one of the SOTA methods. The source code in PyTorch and the pretrained models are available at https://github.com/yfsong0709/EfficientGCNv1 .
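The compound scaling strategy above grows width and depth together from a single coefficient x, in the style popularised by EfficientNet. A minimal sketch, assuming illustrative growth rates `alpha` and `beta` (the paper's actual coefficients are not given here):

```python
def compound_scale(base_width, base_depth, x, alpha=1.2, beta=1.35):
    """Derive the width (channels) and depth (layers) of variant Bx
    from a B0 baseline and one scaling coefficient x, widening by
    alpha**x and deepening by beta**x, rounded to integers."""
    width = round(base_width * alpha ** x)
    depth = round(base_depth * beta ** x)
    return width, depth

b0 = compound_scale(64, 4, 0)   # x = 0 returns the baseline unchanged
b2 = compound_scale(64, 4, 2)   # x = 2 widens and deepens synchronously
```

Tying both dimensions to one knob keeps the family on a single accuracy/size trade-off curve instead of searching width and depth independently.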