The representation power of convolutional neural network (CNN) models for hyperspectral image (HSI) analysis is in practice limited by the available amount of labeled samples, which is often insufficient to sustain deep networks with many parameters. We propose a novel approach to boost the network representation power with a two-stream 2-D CNN architecture. The proposed method simultaneously extracts spectral features and local and global spatial features with two 2-D CNN networks, and makes use of channel correlations to identify the most informative features. Moreover, we propose a layer-specific regularization and a smooth normalization fusion scheme to adaptively learn the fusion weights for the spectral-spatial features from the two parallel streams. An important asset of our model is the simultaneous training of the feature extraction, fusion, and classification processes with the same cost function. Experimental results on several hyperspectral data sets demonstrate the efficacy of the proposed method compared with state-of-the-art methods in the field.
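A minimal NumPy sketch of the adaptive fusion idea described above: two feature streams combined with learnable weights passed through a smooth (softmax-style) normalization, so the weights stay positive and sum to one. The function names and shapes are illustrative only, not taken from the paper, and the logits would in practice be trained jointly with the classification loss.

```python
import numpy as np

def fuse_streams(spectral_feat, spatial_feat, fusion_logits):
    # Smooth-normalization fusion: a numerically stable softmax turns the
    # raw fusion logits into positive weights that sum to 1, so the two
    # streams are blended adaptively rather than simply concatenated.
    w = np.exp(fusion_logits - fusion_logits.max())
    w = w / w.sum()
    return w[0] * spectral_feat + w[1] * spatial_feat
```

With equal logits the scheme reduces to plain averaging of the two streams; during training, the logits shift the balance toward whichever stream is more informative.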
Despite significant efforts made so far for Weakly Supervised Object Detection (WSOD), proposal generation and proposal selection remain two major challenges. In this paper, we address both challenges by generating and selecting high-quality proposals. Specifically, for proposal generation, we combine selective search with a Gradient-weighted Class Activation Mapping (Grad-CAM) based technique to generate proposals that have a higher Intersection-over-Union (IoU) with the ground-truth boxes than those obtained by greedy search approaches and thus better envelop the entire objects. For proposal selection, for each object class we choose as many confident positive proposals as possible, while selecting only class-specific hard negatives and up-weighting their losses to focus training on the more discriminative negative proposals, which makes training more effective. The proposed proposal generation and selection approaches are generic and can therefore be broadly applied to many WSOD methods. In this work, we unify them in the framework of Online Instance Classifier Refinement (OICR). Experimental results on the PASCAL VOC 2007 and 2012 datasets and the MS COCO dataset demonstrate that our method improves the OICR baseline by large margins (13.4% mAP and 11.6% CorLoc gains on VOC 2007, 15.0% mAP and 8.9% CorLoc gains on VOC 2012, and 6.4% mAP and 5.0% CorLoc gains on COCO) and achieves state-of-the-art results compared with existing methods.
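A toy sketch of the hard-negative up-weighting step described above: among the negative proposals of a class, pick the highest-loss ("hard") ones and scale up their losses. The function name, the top-k selection rule, and the weight value are my own simplifications, not the paper's exact scheme.

```python
import numpy as np

def hard_negative_weights(neg_losses, top_k, up_weight=2.0):
    # Negatives with the largest current loss are the most discriminative;
    # up-weight them so training focuses on these hard examples, while all
    # other negatives keep the default weight of 1.
    weights = np.ones_like(neg_losses)
    hard_idx = np.argsort(neg_losses)[-top_k:]
    weights[hard_idx] = up_weight
    return weights
```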
Automatic detection and localization of anomalies in nanofibrous materials help to reduce the cost of the production process and the time of the post-production visual inspection process. Among the available monitoring methods, those exploiting Scanning Electron Microscope (SEM) imaging are the most effective. In this paper, we propose a region-based method for the detection and localization of anomalies in SEM images, based on Convolutional Neural Networks (CNNs) and self-similarity. The method evaluates the degree of abnormality of each subregion of an image under consideration by computing a CNN-based visual similarity with respect to a dictionary of anomaly-free subregions belonging to a training set. The proposed method outperforms the state of the art.
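The core scoring idea above can be sketched in a few lines: a subregion's abnormality is its distance, in some CNN feature space, to the nearest entry of an anomaly-free dictionary. This assumes features are plain vectors and uses Euclidean distance as the (dis)similarity; the paper's actual feature extractor and similarity measure may differ.

```python
import numpy as np

def anomaly_score(region_feat, dictionary):
    # dictionary: (N, D) array of features from anomaly-free training
    # subregions; region_feat: (D,) feature of the region under test.
    # A region far from every normal exemplar gets a high score.
    d = np.linalg.norm(dictionary - region_feat, axis=1)
    return d.min()
```

Thresholding this score per subregion then yields both detection (any region above threshold) and localization (which regions).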
Automatic crack detection is vital for efficient and economical road maintenance. With the explosive development of convolutional neural networks (CNNs), recent crack detection methods are mostly based on CNNs. In this article, we propose a deeply supervised convolutional neural network for crack detection via a novel multiscale convolutional feature fusion module. Within this multiscale feature fusion module, the high-level features are introduced directly into the low-level features at different convolutional stages. Besides, deep supervision provides integrated direct supervision for convolutional feature fusion, which helps to improve model convergence and the final crack detection performance. Multiscale convolutional features learned at different convolution stages are fused together to robustly represent cracks, whose geometric structures are complicated and hardly captured by single-scale features. To demonstrate its superiority and generalizability, we evaluate the proposed network on three public crack data sets. Extensive experimental results demonstrate that our method outperforms other state-of-the-art crack detection, edge detection, and image segmentation methods in terms of F1-score and mean IU.
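A minimal NumPy sketch of the fusion step described above: high-level (coarse, semantically rich) features are upsampled to the spatial size of a low-level stage and injected by channel-wise concatenation. Nearest-neighbor upsampling and concatenation are my own simplifications of what a real network would implement with learned layers.

```python
import numpy as np

def upsample2x(feat):
    # Nearest-neighbor 2x spatial upsampling of a (C, H, W) feature map.
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse(low_feat, high_feat):
    # Inject coarse high-level features into fine low-level features by
    # upsampling to the low-level resolution and concatenating channels.
    return np.concatenate([low_feat, upsample2x(high_feat)], axis=0)
```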
Deep learning has been successfully applied to image denoising. In this study, we take one step forward by using deep learning to suppress random noise in poststack seismic data, from the aspects of both network architecture and training samples. On the one hand, poststack seismic data denoising mainly targets 3-D seismic data. We design an end-to-end 3-D denoising convolutional neural network (3-D-DnCNN) that takes raw 3-D cubes as input in order to better extract the features of the 3-D spatial structure of poststack seismic data. On the other hand, denoising images with deep learning requires noisy-clean sample pairs for training. In the field of seismic data processing, researchers usually suppress noise with complex workflows that combine different methods, but clean labels of seismic data are not available; building training samples from field seismic data has therefore become an interesting but challenging problem. We propose a training sample selection method with a complex workflow to produce comparatively ideal training samples. Experiments in this study demonstrate that deep learning can directly learn to denoise field seismic data from the selected samples. Although building the training samples involves a complex process, the experimental results on synthetic and field seismic data show that the 3-D-DnCNN learns to suppress both Gaussian and super-Gaussian noise from different training samples. Moreover, the 3-D-DnCNN network has better denoising performance on arc-like imaging noise. In addition, we adopt residual learning and batch normalization to accelerate training. Once network training is completed, its processing efficiency is significantly higher than that of conventional denoising methods.
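The residual-learning formulation mentioned above can be sketched in one line: a DnCNN-style network predicts the noise rather than the clean signal, and the clean cube is recovered by subtraction. The `noise_predictor` callable stands in for the trained 3-D CNN, which is not reproduced here.

```python
import numpy as np

def residual_denoise(noisy_cube, noise_predictor):
    # Residual learning (as in DnCNN): the network estimates the noise
    # component, so clean = noisy - predicted_noise. Learning the residual
    # is easier than learning the clean signal directly.
    return noisy_cube - noise_predictor(noisy_cube)
```

With an oracle predictor that returns the true noise, the clean data are recovered exactly, which is why the quality of the noisy-clean training pairs discussed above directly bounds the achievable denoising quality.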
Scene text detection is an important step of a scene text recognition system and also a challenging problem. Different from general object detection, the main challenges of scene text detection lie in the arbitrary orientations, small sizes, and significantly variant aspect ratios of text in natural images. In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrarily oriented scene text with both high accuracy and efficiency in a single network forward pass. No post-processing other than efficient non-maximum suppression is involved. We have evaluated the proposed TextBoxes++ on four public data sets. In all experiments, TextBoxes++ outperforms competing methods in terms of text localization accuracy and runtime. More specifically, TextBoxes++ achieves an f-measure of 0.817 at 11.6 frames/s for 1024 × 1024 ICDAR 2015 incidental text images and an f-measure of 0.5591 at 19.8 frames/s for 768 × 768 COCO-Text images. Furthermore, combined with a text recognizer, TextBoxes++ significantly outperforms state-of-the-art approaches for word spotting and end-to-end text recognition tasks on popular benchmarks. Code is available at: https://github.com/MhLiao/TextBoxes_plusplus.
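Since non-maximum suppression is the only post-processing step named above, here is a standard axis-aligned NMS in NumPy for reference. Note that TextBoxes++ handles oriented text, so its actual suppression operates on rotated boxes; this sketch shows only the classic axis-aligned variant.

```python
import numpy as np

def iou(a, b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # Greedily keep the highest-scoring boxes, discarding any box that
    # overlaps an already-kept box by more than the IoU threshold.
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```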
Gradient descent optimization has become a paradigm for training deep convolutional neural networks (DCNNs). However, other learning strategies for training a DCNN have rarely been explored by the deep learning (DL) community. This motivates us to introduce a non-iterative learning strategy that retrains the neurons of the top dense or fully connected (FC) layers of a DCNN, resulting in higher performance. The proposed method exploits the Moore-Penrose inverse to pull back the current residual error to each FC layer, generating well-generalized features. Furthermore, the weights of each FC layer are recomputed according to the Moore-Penrose inverse. We evaluate the proposed approach on six widely used object recognition benchmark datasets: Scene-15, CIFAR-10, CIFAR-100, SUN-397, Places365, and ImageNet. The experimental results show that the proposed method obtains improvements over 30 state-of-the-art methods. Interestingly, they also indicate that any DCNN combined with the proposed method can achieve better performance than the same network with its original Backpropagation (BP)-based training.
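The non-iterative recomputation above rests on a classical fact: given layer activations H and targets T, the Moore-Penrose pseudoinverse gives the closed-form least-squares weights in one step, with no gradient iterations. The sketch below shows only this core solve; the paper's full scheme (pulling residual errors back through several FC layers) is not reproduced.

```python
import numpy as np

def refit_fc_layer(H, T):
    # Closed-form refit of an FC layer: W = pinv(H) @ T minimizes
    # ||H @ W - T||_F, i.e., the least-squares solution, computed
    # non-iteratively via the Moore-Penrose pseudoinverse.
    return np.linalg.pinv(H) @ T
```

When H has full column rank and T is realizable (T = H @ W for some W), the refit recovers W exactly; otherwise it returns the minimum-norm least-squares solution.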
Understanding deep convolutional networks. Mallat, Stéphane. Philosophical Transactions of the Royal Society of London, Series A: Mathematical, Physical and Engineering Sciences, 04/2016, Volume 374, Issue 2065. Journal article; peer reviewed; open access.
Deep convolutional networks provide state-of-the-art classification and regression results over many high-dimensional problems. We review their architecture, which scatters data with a cascade of linear filter weights and nonlinearities. A mathematical framework is introduced to analyse their properties. Computations of invariants involve multiscale contractions with wavelets, the linearization of hierarchical symmetries, and sparse separations. Applications are discussed.
Convolution in Convolution for Network in Network. Pang, Yanwei; Sun, Manli; Jiang, Xiaoheng, et al. IEEE Transactions on Neural Networks and Learning Systems, 05/2018, Volume 29, Issue 5. Journal article; open access.
Network in network (NiN) is an effective instance and an important extension of the deep convolutional neural network, consisting of alternating convolutional and pooling layers. Instead of using a linear filter for convolution, NiN utilizes a shallow multilayer perceptron (MLP), a nonlinear function, to replace the linear filter. Because of the power of the MLP and of 1 × 1 convolutions in the spatial domain, NiN has a stronger feature representation ability and hence yields better recognition performance. However, the MLP itself consists of fully connected layers that give rise to a large number of parameters. In this paper, we propose to replace the dense shallow MLP with a sparse shallow MLP. One or more layers of the sparse shallow MLP are sparsely connected in the channel dimension or the channel-spatial domain. The proposed method is implemented by applying unshared convolution across the channel dimension and shared convolution across the spatial dimension in some computational layers, and is called convolution in convolution (CiC). Experimental results on the CIFAR10 data set, the augmented CIFAR10 data set, and the CIFAR100 data set demonstrate the effectiveness of the proposed CiC method.
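To make the NiN starting point concrete: a 1 × 1 convolution is simply a per-pixel linear map across channels, which is why stacking them with nonlinearities amounts to applying a shared MLP at every spatial location. The NumPy sketch below shows this dense baseline only; CiC's contribution, sparse (unshared channel-wise) connectivity, is not implemented here.

```python
import numpy as np

def conv1x1(feat, weights):
    # A 1x1 convolution maps channels independently at each pixel:
    # feat (C_in, H, W) and weights (C_out, C_in) -> (C_out, H, W).
    # This is the dense per-pixel linear layer that NiN stacks into
    # an MLP, and that CiC proposes to sparsify.
    c, h, w = feat.shape
    return (weights @ feat.reshape(c, h * w)).reshape(-1, h, w)
```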
Pansharpening refers to the fusion of a panchromatic (PAN) image with a high spatial resolution and a multispectral (MS) image with a low spatial resolution, aiming to obtain a high spatial resolution MS (HRMS) image. In this article, we propose a novel deep neural network architecture with a level-domain-based loss function for pansharpening, built on the following double-type structures, i.e., double-level, double-branch, and double-direction, and called the triple-double network (TDNet). With the structure of TDNet, the spatial details of the PAN image can be fully exploited and progressively injected into the low spatial resolution MS (LRMS) image, thus yielding the high spatial resolution output. The specific network design is motivated by the physical formula of the traditional multi-resolution analysis (MRA) methods; hence, an effective MRA fusion module is also integrated into TDNet. Besides, we adopt a few ResNet blocks and some multi-scale convolution kernels to deepen and widen the network, effectively enhancing the feature extraction and the robustness of the proposed TDNet. Extensive experiments on reduced- and full-resolution datasets acquired by the WorldView-3, QuickBird, and GaoFen-2 sensors demonstrate the superiority of the proposed TDNet compared with some recent state-of-the-art pansharpening approaches. An ablation study has also corroborated the effectiveness of the proposed approach. The code is available at https://github.com/liangjiandeng/TDNet.
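The physical MRA formula that motivates the design above is the classical detail-injection scheme: HRMS = upsampled LRMS + gain × (PAN − low-pass PAN). A minimal NumPy sketch, with per-band injection gains as a free parameter (the low-pass filter and gain estimation, which vary across MRA methods, are left abstract):

```python
import numpy as np

def mra_fusion(lrms_up, pan, pan_low, gains):
    # Classical MRA pansharpening: extract PAN spatial detail as the
    # difference between PAN and its low-pass version, then inject it
    # into each upsampled MS band with a per-band gain.
    # lrms_up: (B, H, W), pan/pan_low: (H, W), gains: (B,)
    detail = pan - pan_low
    return lrms_up + gains[:, None, None] * detail[None, :, :]
```

When the PAN detail is zero the LRMS passes through unchanged, which is the sanity check that such fusion modules should preserve spectral content where no spatial detail is injected.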