Deep convolutional neural networks (CNNs) have recently attracted researchers' attention for the classification of hyperspectral remote sensing images. A CNN mainly consists of convolution layers, pooling layers and fully connected layers. Pooling acts as a regularisation technique and improves the performance of the CNN while reducing computation time. Various pooling strategies have been developed in the literature. This study shows the effect of the pooling strategy on the performance of a deep CNN for the classification of hyperspectral remote sensing images. The authors compare the performance of several pooling strategies: max pooling, average pooling, stochastic pooling, rank-based average pooling and rank-based weighted pooling. The experiments were performed on three well-known hyperspectral remote sensing datasets: Indian Pines, University of Pavia and Kennedy Space Center. The experimental results show that max pooling produced better results on all three datasets.
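For readers unfamiliar with the strategies compared in the abstract above, the following is a minimal NumPy sketch of three of them (max, average and stochastic pooling) on non-overlapping 2×2 windows. It is not the authors' implementation: the function names, the window size and the sample `feature_map` are assumptions made for illustration, and the rank-based variants are omitted.

```python
import numpy as np

def to_windows(x, size=2):
    """Split a 2-D map into non-overlapping size x size windows (flattened on the last axis)."""
    h, w = x.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    return (x[:h, :w]
            .reshape(h // size, size, w // size, size)
            .transpose(0, 2, 1, 3)
            .reshape(h // size, w // size, size * size))

def max_pool(x, size=2):
    """Keep the largest activation in each window."""
    return to_windows(x, size).max(axis=-1)

def average_pool(x, size=2):
    """Keep the mean activation of each window."""
    return to_windows(x, size).mean(axis=-1)

def stochastic_pool(x, size=2, rng=None):
    """Sample one activation per window with probability proportional to its value.

    Assumes non-negative activations (for example, after a ReLU).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    win = to_windows(x, size)
    out = np.empty(win.shape[:2])
    for i in range(win.shape[0]):
        for j in range(win.shape[1]):
            total = win[i, j].sum()
            # Fall back to uniform sampling when the whole window is zero.
            p = win[i, j] / total if total > 0 else None
            out[i, j] = rng.choice(win[i, j], p=p)
    return out

# Hypothetical 4x4 feature map used only for illustration.
feature_map = np.array([[1., 2., 0., 0.],
                        [3., 4., 0., 8.],
                        [5., 5., 1., 1.],
                        [5., 5., 1., 1.]])

print(max_pool(feature_map))
print(average_pool(feature_map))
print(stochastic_pool(feature_map))
```

Max pooling keeps only the strongest response, average pooling blends all responses, and stochastic pooling sits between the two by sampling, which is why it is often described as a regulariser.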
•We propose a learnable residual pooling layer comprising a residual encoding module and an aggregation module that retains spatial information and aggregates it into a feature with a lower dimension.
•We propose an end-to-end learning framework that integrates the residual pooling layer into any pre-trained CNN model for efficient feature transfer for texture recognition.
•We compare the performance of the proposed pooling layer with other residual encoding schemes to illustrate state-of-the-art performance on benchmark texture datasets, an industry dataset and a scene recognition dataset.
Current deep learning-based texture recognition methods extract spatially orderless features from pre-trained deep learning models that are trained on large-scale image datasets. These methods either produce high-dimensional features or involve multiple steps such as dictionary learning, feature encoding and dimension reduction. In this paper, we propose a novel end-to-end learning framework that not only overcomes these limitations, but also demonstrates faster learning. The proposed framework incorporates a residual pooling layer consisting of a residual encoding module and an aggregation module. The residual encoder preserves the spatial information for improved feature learning, and the aggregation module generates an orderless feature for classification through simple averaging. The feature has the lowest dimension among previous deep texture recognition approaches, yet it achieves state-of-the-art performance on benchmark texture recognition datasets such as FMD, DTD and 4D Light, and on one industry dataset used for metal surface anomaly detection. Additionally, the proposed method obtains comparable results on the MIT-Indoor scene recognition dataset. Our code is available at https://github.com/maoshangbo/DRP-Texture-Recognition.
•Pooling is one of the key elements in a convolutional neural network.
•In this paper, we propose a new pooling method named universal pooling (UP).
•UP can actually be considered as a channel-wise local spatial attention module.
•UP includes previous simple poolings such as MP, AP, and SP as special cases.
•UP achieves better performance than the previous pooling methods.
Pooling is one of the key elements in a convolutional neural network. It reduces the feature map size, thereby enabling training with a limited amount of computation. The most common pooling methods are average pooling, max pooling, and stride pooling. The common pooling methods, however, have the disadvantage that they can perform only specified and fixed pooling functions and thus have limited expressive power. In this paper, we propose a new pooling method named universal pooling (UP). UP performs different pooling functions depending on the training samples. UP is a general pooling and includes the previous common pooling methods as special cases. The structure of UP is inspired by attention methods. UP can actually be considered as a channel-wise local spatial attention module. It is quite different from attention-based feature reduction methods. We insert UP into a couple of popular networks and apply the networks to benchmark sets in two applications, namely, image recognition and semantic segmentation. The experiment results show that complex poolings are trained in the proposed UP and that UP achieves better performance than the previous pooling methods.
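The claim that common poolings are special cases of an attention-style pooling can be illustrated with a small NumPy sketch. This is not the UP module from the paper: in UP the weights come from a trained sub-network, whereas here the `logits` are supplied by hand, and `weighted_pool`, `to_windows` and the sample map `x` are hypothetical names chosen for the example.

```python
import numpy as np

def to_windows(x, size=2):
    """Split a 2-D map into non-overlapping size x size windows (flattened on the last axis)."""
    h, w = x.shape
    return (x.reshape(h // size, size, w // size, size)
             .transpose(0, 2, 1, 3)
             .reshape(h // size, w // size, size * size))

def weighted_pool(x, logits, size=2):
    """Pool each window as a softmax-weighted sum of its entries.

    `logits` has shape (H/size, W/size, size*size); in a learned module these
    would be predicted from the input, here they are given directly.
    """
    win = to_windows(x, size)
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return (weights * win).sum(axis=-1)

# Hypothetical single-channel feature map.
x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 8.],
              [5., 5., 1., 1.],
              [5., 5., 1., 1.]])
win = to_windows(x)

# Equal logits give uniform weights: average pooling.
avg_like = weighted_pool(x, np.zeros_like(win))

# Very sharp logits proportional to the activations: approximately max pooling.
max_like = weighted_pool(x, 50.0 * win)

# All weight on the first position of each window: stride pooling (x[::2, ::2]).
first = np.zeros_like(win)
first[..., 0] = 50.0
stride_like = weighted_pool(x, first)
```

Because the three fixed poolings correspond to three particular weight patterns, a module that predicts the weights can in principle represent any of them, or something in between.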
Rank Pooling for Action Recognition Fernando, Basura; Gavves, Efstratios; Oramas M., Jose ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017-04-01, Volume 39, Issue 4
Journal article, peer reviewed, open access
We propose a function-based temporal pooling method that captures the latent structure of the video sequence data, e.g., how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation. As a specific example, we learn a pooling function via ranking machines. By learning to rank the frame-level features of a video in chronological order, we obtain a new representation that captures the video-wide temporal dynamics of a video, suitable for action recognition. Other than ranking functions, we explore different parametric models that could also explain the temporal changes in videos. The proposed functional pooling methods, and rank pooling in particular, are easy to interpret and implement, fast to compute and effective in recognizing a wide variety of actions. We evaluate our method on various benchmarks for generic action, fine-grained action and gesture recognition. Results show that rank pooling brings an absolute improvement of 7-10% over the average pooling baseline. At the same time, rank pooling is compatible with and complementary to several appearance and local motion based methods and features, such as improved trajectories and deep learning features.
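The idea of using fitted function parameters as the video representation can be sketched in a few lines of NumPy. This is a simplified stand-in, not the method as published: ordinary least squares replaces the ranking machine, and the function name `rank_pool`, the time-varying-mean smoothing and the toy `frames` array are assumptions made for illustration.

```python
import numpy as np

def rank_pool(frames):
    """Return a D-dimensional descriptor for a (T, D) sequence of frame features.

    A linear function u is fitted so that the score of the smoothed feature at
    time t is close to t, which makes later frames score higher than earlier ones.
    Least squares is used here in place of a ranking machine.
    """
    T = frames.shape[0]
    t = np.arange(1, T + 1, dtype=float)
    # Time-varying mean: smooths frame-level noise before fitting.
    smoothed = np.cumsum(frames, axis=0) / t[:, None]
    u, *_ = np.linalg.lstsq(smoothed, t, rcond=None)
    return u

# Toy sequence: one feature grows linearly with time, the other is constant.
frames = np.column_stack([np.arange(1, 7, dtype=float), np.ones(6)])
u = rank_pool(frames)

# Scores of the smoothed frames under the fitted function.
scores = (np.cumsum(frames, axis=0) / np.arange(1, 7)[:, None]) @ u
```

The vector `u` has the same dimension as a single frame feature regardless of video length, which is what makes it usable as a fixed-size input to a classifier.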
Alcohol use disorder (AUD) is an important brain disease. It alters the brain structure. Recently, scholars have tended to use computer vision based techniques to detect AUD. We collected data from 235 subjects: 114 alcoholic and 121 non-alcoholic. Of the 235 images, 100 were used as the training set, to which data augmentation was applied. The remaining 135 images were used as the test set. Further, we chose a recent and powerful technique, the convolutional neural network (CNN), built from convolutional layers, rectified linear unit layers, pooling layers, fully connected layers, and a softmax layer. We also compared three different pooling techniques: max pooling, average pooling, and stochastic pooling. The results showed that our method achieved a sensitivity of 96.88%, a specificity of 97.18%, and an accuracy of 97.04%. Our method was better than three state-of-the-art approaches. In addition, stochastic pooling performed better than max pooling and average pooling. We found that a CNN with five convolution layers and two fully connected layers performed best. The GPU yielded a 149× acceleration in training and a 166× acceleration in testing, compared to the CPU.
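The three reported metrics follow from the standard confusion-matrix definitions, sketched below. The abstract does not give the underlying counts; the values passed to `binary_metrics` are an assumption, chosen only because they sum to the 135 test images and reproduce the reported percentages after rounding.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of positive cases detected
    specificity = tn / (tn + fp)   # fraction of negative cases correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts (not stated in the abstract): 64 positive and 71 negative
# test images, with two errors in each class.
sens, spec, acc = binary_metrics(tp=62, fn=2, tn=69, fp=2)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")
```

Other splits of the 135 test images may also match the reported figures, so these counts should be read as one consistent possibility rather than the study's actual confusion matrix.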
To obtain the best pooling effect and higher accuracy in image recognition, an improved method based on optimal search theory for the pooling layer of convolutional neural networks (CNNs) is proposed. The purpose is to solve the problems of the traditional pooling methods, namely that they are too simplistic and have difficulty extracting effective features. The basic principle and network structure of the CNN are introduced in the study. A new optimum-pooling method is proposed, and the authors study how to obtain the maximum probability of detecting the target function under the constrained condition. Comparison experiments of different pooling methods are performed on three widely used datasets: LFW, CIFAR-10, and ImageNet. The experimental results show that the proposed method extracts features more effectively, adapts to a wide range of data, and leads to higher accuracy and a lower error rate in image recognition.
Cloud cover is a common and inevitable phenomenon that often hinders the usability of optical remote sensing (RS) image data and further interferes with continuous cartography based on RS image interpretation. In the literature, the off-the-shelf cloud detection methods either require various hand-crafted features or utilize data-driven features using deep networks. Overall, deep networks achieve much better performance than traditional methods using hand-crafted features. However, the current deep networks used for cloud detection depend on massive pixel-level annotation labels, which require a great deal of manual annotation labor. To reduce the labor needed for annotating the pixel-level labels, this paper proposes a weakly supervised deep learning-based cloud detection (WDCD) method using block-level labels indicating only the presence or the absence of cloud in one RS image block. In the training phase, a new global convolutional pooling (GCP) operation is proposed to enhance the ability of the feature map to represent useful information (e.g., spatial variance). In the testing phase, the trained deep networks are modified to generate the cloud activation map (CAM) via the local pooling pruning (LPP) strategy, which prunes the local pooling layers of the deep networks that are trained in the training phase to improve the quality (e.g., spatial resolution) of CAM. One large RS image is cropped into multiple overlapping blocks by a sliding window, and then the CAM of each block is generated by the modified deep networks. Based on the correspondence between the image blocks and CAMs, multiple corresponding CAMs are collected to mosaic the CAM of the large image. By segmenting the CAM using a statistical threshold against a clear-sky surface, the pixel-level cloud mask of the testing image can be obtained.
To verify the effectiveness of our proposed WDCD method, we collected a new global dataset, for which the training dataset contains over 200,000 RS image blocks with block-level labels from 622 large GaoFen-1 images from all over the world; the validation dataset contains 5 large GaoFen-1 images with pixel-level annotation labels, and the testing dataset contains 25 large GaoFen-1 and ZiYuan-3 images with pixel-level annotation labels. Even under this extremely weak supervision, our proposed WDCD method achieved excellent cloud detection performance, with an overall accuracy (OA) as high as 96.66%. Extensive experiments demonstrated that our proposed WDCD method clearly outperforms the state-of-the-art methods. The collected datasets have been made publicly available online (https://github.com/weichenrs/WDCD).
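The testing-phase pipeline described in this abstract (sliding-window cropping, per-block activation maps, mosaicking, thresholding) can be sketched as follows. This is an illustration under stated assumptions, not the WDCD code: `cam_fn` is a hypothetical placeholder for the modified network, overlapping predictions are merged by simple averaging, and the fixed `threshold` stands in for the statistical threshold derived from clear-sky surfaces.

```python
import numpy as np

def mosaic_cams(image, block, stride, cam_fn):
    """Average per-block activation maps into one map covering the full image.

    `cam_fn` takes a (block, block) crop and returns a (block, block) activation
    map. Pixels not covered by any window (possible when the stride does not
    tile the image exactly) are left at zero.
    """
    H, W = image.shape
    total = np.zeros((H, W))
    count = np.zeros((H, W))
    for top in range(0, H - block + 1, stride):
        for left in range(0, W - block + 1, stride):
            crop = image[top:top + block, left:left + block]
            total[top:top + block, left:left + block] += cam_fn(crop)
            count[top:top + block, left:left + block] += 1
    return total / np.maximum(count, 1)

# Toy single-band "image"; the identity function plays the role of the network.
image = np.arange(64, dtype=float).reshape(8, 8) / 63.0
cam = mosaic_cams(image, block=4, stride=2, cam_fn=lambda crop: crop)

threshold = 0.5            # hypothetical fixed value
cloud_mask = cam > threshold
```

Overlapping windows give each pixel several predictions, and merging them reduces block-boundary artefacts in the final map.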
•A weakly supervised deep learning framework is proposed to address cloud detection.
•A novel global convolutional pooling (GCP) operation is proposed.
•This paper proposes a new local pooling pruning (LPP) strategy.
•A large-scale remote sensing image dataset for cloud detection is released.
Graph Neural Network (GNN) models have recently been proposed to process graph-structured data for learning tasks on graphs, e.g., node classification and link prediction. This work focuses on the graph classification task, aiming to obtain the graph representation and predict the class label for a graph. Existing works apply graph pooling to obtain the graph embedding but still suffer from several issues. First, node embeddings are generated according to the topological information of the whole graph, ignoring the local isomorphic substructures commonly seen in bioinformatics and chemistry. Another limitation arises when aggregating node embeddings. The hard assignment obtained through clustering algorithms, which rely on preset and fixed parameters instead of considering the graph’s properties adaptively, restricts the flexibility in handling graphs of varying scales. To address the above problems, a module-based graph pooling framework (MGPool) is proposed in this work. Inspired by the rules of bioinformatics, MGPool assumes that a graph consists of multiple modules (also known as substructures), which are identified based on the natural organization of the graph rather than the hard allocation of nodes. Building on this hypothesis, MGPool generates node embeddings from a graph view and a module view, which capture global graph information and local isomorphic information respectively. Then module-level pooling is used to capture the intra-module information, while the inter-module information, in terms of the correlation between modules, is obtained through graph-level pooling. Finally, an entropy-based weighting mechanism is proposed to adjust the modules’ weights for the graph aggregation. Experiments conducted on bioinformatics benchmark datasets demonstrate the effectiveness of MGPool, which outperforms other state-of-the-art graph pooling methods. For social network datasets, MGPool also provides competitive performance.
Moreover, the visualization of module entropy weights is given to reveal the interpretability of the model.
•The module-based graph pooling (MGPool) framework obtains the graph representation in three stages from bottom to top: node, module and graph.
•MGPool considers information from both graph and module views during node encoding.
•An entropy-based weighting mechanism is adopted to model the modules’ contribution to the graph representation.
•MGPool outperforms other SOTA graph pooling methods on the benchmark datasets of graph classification.
•The visualization of modules in the experiments reveals the interpretability of MGPool.
•The source code of MGPool is available at https://github.com/SubaiDeng/MGPool.
•The dilated convolution kernel enlarges the local receptive field and enhances feature extraction.
•The global pooling layer reduces the number of training parameters and avoids the overfitting problem.
•Multi-scale convolutional kernels extract multi-scale features of the input image.
•The improvement in recognition accuracy and robustness is verified by the experimental results.
Identifying plant disease from images of diseased leaves is a challenging research topic because of the complexity of those images. Deep learning models are promising for identifying plant disease based on leaf images, and AlexNet is one of these models. To address the large number of parameters and the single feature scale of the AlexNet model, a global pooling dilated convolutional neural network (GPDCNN) is proposed in this paper for plant disease identification by combining dilated convolution with global pooling. Compared with the classical convolutional neural network (CNN) and AlexNet models, GPDCNN has three improvements: (1) the convolution receptive field is increased without increasing the computational complexity and without losing the discriminant information by replacing fully connected layers with a global pooling layer; (2) a dilated convolutional layer is employed to recover the spatial resolution without increasing the number of training parameters; (3) GPDCNN also integrates the merits of dilated convolution and global pooling. Experimental results on the datasets of six common cucumber leaf diseases demonstrate that the proposed model can effectively recognize cucumber diseases.
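Two pieces of arithmetic lie behind the claims in this abstract: how dilation enlarges the span of a kernel without adding weights, and how many parameters a fully connected layer carries compared with global pooling, which has none. The sketch below is illustrative only; the function names are assumptions, and the layer sizes are AlexNet-style shapes chosen for the example rather than values taken from the GPDCNN paper.

```python
def effective_kernel(k, dilation):
    """Span of a k x k kernel along one axis when its taps are `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)

def fc_params(h, w, c, n_out):
    """Weights plus biases of a fully connected layer on a flattened (h, w, c) map."""
    return h * w * c * n_out + n_out

def global_pool_outputs(c):
    """Global pooling maps each of the c channels to one value and has no weights."""
    return c

# A 3x3 kernel with dilation 2 spans 5 pixels while still holding 9 weights.
span = effective_kernel(3, 2)

# Illustrative sizes: a 6x6x256 feature map feeding 4096 fully connected units.
dense_weights = fc_params(6, 6, 256, 4096)
pooled_values = global_pool_outputs(256)

print(span, dense_weights, pooled_values)
```

The comparison shows why replacing a large fully connected layer with global pooling cuts the parameter count so sharply, which is the basis for the reduced overfitting noted in the highlights.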