Recently, the channel attention mechanism has demonstrated great potential for improving the performance of deep convolutional neural networks (CNNs). However, most existing methods are dedicated to developing more sophisticated attention modules for better performance, which inevitably increases model complexity. To overcome the trade-off between performance and complexity, this paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain. By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and that appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via 1D convolution. Furthermore, we develop a method to adaptively select the kernel size of the 1D convolution, determining the coverage of local cross-channel interaction. The proposed ECA module is both efficient and effective; e.g., against a ResNet50 backbone, our module adds 80 parameters vs. 24.37M and 4.7e-4 GFLOPs vs. 3.86 GFLOPs, while boosting Top-1 accuracy by more than 2%. We extensively evaluate our ECA module on image classification, object detection, and instance segmentation with ResNet and MobileNetV2 backbones. The experimental results show that our module is more efficient while performing favorably against its counterparts.
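The mechanism this abstract describes can be sketched in a few lines of NumPy: a global average pool produces one descriptor per channel, a 1D convolution mixes each descriptor with its neighbours, and a sigmoid gate reweights the channels. The kernel-size mapping below mirrors the adaptive rule reported for ECA-Net (γ=2, b=1), but the convolution weights here are fixed placeholders rather than learned parameters, so this is an illustration of the data flow, not the trained module.

```python
import numpy as np

def adaptive_kernel_size(channels, gamma=2, b=1):
    # Kernel size grows with log2(C) and is forced odd,
    # following the defaults reported for ECA-Net.
    k = int(abs((np.log2(channels) + b) / gamma))
    return k if k % 2 else k + 1

def eca_attention(x, k=None):
    """x: feature map of shape (C, H, W). Returns the channel-reweighted map."""
    c = x.shape[0]
    k = k or adaptive_kernel_size(c)
    desc = x.mean(axis=(1, 2))                  # global average pooling -> (C,)
    padded = np.pad(desc, k // 2, mode="edge")  # 1D conv across neighbouring channels
    kernel = np.ones(k) / k                     # placeholder weights (learned in practice)
    mixed = np.convolve(padded, kernel, mode="valid")
    weights = 1.0 / (1.0 + np.exp(-mixed))      # sigmoid gate, one weight per channel
    return x * weights[:, None, None]

feat = np.random.rand(64, 8, 8)
out = eca_attention(feat)
print(out.shape)  # (64, 8, 8)
```

Note how the module never projects the C descriptors down to a lower dimension: every channel keeps its own gate, and only the k-wide neighbourhood interacts, which is the paper's core argument for efficiency.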
Recent research has shown that using spectral-spatial information can considerably improve the performance of hyperspectral image (HSI) classification. HSI data is typically presented in the format of 3D cubes. Thus, 3D spatial filtering naturally offers a simple and effective method for simultaneously extracting the spectral-spatial features within such images. In this paper, a 3D convolutional neural network (3D-CNN) framework is proposed for accurate HSI classification. The proposed method views the HSI cube data as a whole without relying on any preprocessing or post-processing, extracting the deep spectral-spatial-combined features effectively. In addition, it requires fewer parameters than other deep learning-based methods. Thus, the model is lighter, less likely to over-fit, and easier to train. For comparison and validation, we test the proposed method along with three other deep learning-based HSI classification methods—namely, stacked autoencoder (SAE), deep belief network (DBN), and 2D-CNN-based methods—on three real-world HSI datasets captured by different sensors. Experimental results demonstrate that our 3D-CNN-based method outperforms these state-of-the-art methods and sets a new record.
Time series classification is an important research topic in machine learning and data mining communities, since time series data exist in many application domains. Recent studies have shown that machine learning algorithms could benefit from good feature representation, explaining why deep learning has achieved breakthrough performance in many tasks. In deep learning, the convolutional neural network (CNN) is one of the most well-known approaches, since it incorporates feature learning and the classification task in a unified network architecture. Although CNN has been successfully applied to image and text domains, it is still a challenge to apply CNN to time series data. This paper proposes a tensor scheme along with a novel deep learning architecture called multivariate convolutional neural network (MVCNN) for multivariate time series classification, in which the proposed architecture considers multivariate and lag-feature characteristics. We evaluate our proposed method with the prognostics and health management (PHM) 2015 challenge data, and compare with several algorithms. The experimental results indicate that the proposed method outperforms the other alternatives using the prediction score, which is the evaluation metric used by the PHM Society 2015 data challenge. Besides performance evaluation, we provide detailed analysis about the proposed method.
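The "tensor scheme" with lag-feature characteristics that the abstract mentions amounts to turning a flat multivariate series into image-like samples a CNN can consume. The paper's exact tensorization is its own contribution; the sketch below shows only the generic idea, with window and stride values chosen for illustration: each sample stacks a window of consecutive lags across all sensor channels.

```python
import numpy as np

def lag_tensor(series, window, stride=1):
    """Slice a multivariate series (T, D) into overlapping lag windows.

    Returns an array of shape (N, window, D): each sample stacks `window`
    consecutive time steps over all D sensor channels, the kind of
    image-like input a multivariate CNN can consume.
    """
    t, d = series.shape
    starts = range(0, t - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

x = np.random.rand(100, 4)           # 100 time steps, 4 sensors
samples = lag_tensor(x, window=16, stride=8)
print(samples.shape)                  # (11, 16, 4)
```

Each (window, D) slice then plays the role an image patch plays in vision CNNs: convolutions along the time axis learn lag features, while the channel axis carries the multivariate structure.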
Object detection is a fundamental visual recognition problem in computer vision and has been widely studied in the past decades. Visual object detection aims to find objects of certain target classes with precise localization in a given image and assign each object instance a corresponding class label. Due to the tremendous successes of deep learning based image classification, object detection techniques using deep learning have been actively studied in recent years. In this paper, we give a comprehensive survey of recent advances in visual object detection with deep learning. By reviewing a large body of recent related work in literature, we systematically analyze the existing object detection frameworks and organize the survey into three major parts: (i) detection components, (ii) learning strategies, and (iii) applications & benchmarks. In the survey, we cover a variety of factors affecting the detection performance in detail, such as detector architectures, feature learning, proposal generation, sampling strategies, etc. Finally, we discuss several future directions to facilitate and spur future research for visual object detection with deep learning.
A review on deep learning in UAV remote sensing. Osco, Lucas Prado; Marcato Junior, José; Marques Ramos, Ana Paula; et al.
International Journal of Applied Earth Observation and Geoinformation, Volume 102, October 2021.
Journal article; peer reviewed; open access.
•Combining deep learning and UAV-based data is an emerging trend in remote sensing.
•Most published articles rely on CNN-based methods.
•Future perspectives in UAV-based data processing still have much to cover.
Deep Neural Networks (DNNs) learn representations from data with an impressive capability, and have brought important breakthroughs for processing images, time series, natural language, audio, video, and many other data types. In the remote sensing field, surveys and literature reviews specifically covering applications of DNN algorithms have been conducted in an attempt to summarize the amount of information produced in its subfields. Recently, Unmanned Aerial Vehicle (UAV)-based applications have dominated aerial sensing research. However, a literature review that combines both the "deep learning" and "UAV remote sensing" themes has not yet been conducted. The motivation for our work was to present a comprehensive review of the fundamentals of Deep Learning (DL) applied to UAV-based imagery. We focused mainly on describing the classification and regression techniques used in recent applications with UAV-acquired data. For that, a total of 232 papers published in international scientific journal databases were examined. We gathered the published materials and evaluated their characteristics regarding the application, sensor, and technique used. We discuss how DL presents promising results and has potential for processing tasks associated with UAV-based image data. Lastly, we project future perspectives, commenting on prominent DL paths to be explored in the UAV remote sensing field. This review consists of an approach to introduce, comment on, and summarize the state of the art in UAV-based image applications with DNN algorithms in diverse subfields of remote sensing, grouped into environmental, urban, and agricultural contexts.
Automated melanoma recognition in dermoscopy images is a very challenging task due to the low contrast of skin lesions, the huge intraclass variation of melanomas, the high degree of visual similarity between melanoma and non-melanoma lesions, and the existence of many artifacts in the image. In order to meet these challenges, we propose a novel method for melanoma recognition by leveraging very deep convolutional neural networks (CNNs). Compared with existing methods employing either low-level hand-crafted features or CNNs with shallower architectures, our substantially deeper networks (more than 50 layers) can acquire richer and more discriminative features for more accurate recognition. To take full advantage of very deep networks, we propose a set of schemes to ensure effective training and learning under limited training data. First, we apply residual learning to cope with the degradation and overfitting problems when a network goes deeper. This technique ensures that our networks benefit from the performance gains achieved by increasing network depth. Then, we construct a fully convolutional residual network (FCRN) for accurate skin lesion segmentation, and further enhance its capability by incorporating a multi-scale contextual information integration scheme. Finally, we seamlessly integrate the proposed FCRN (for segmentation) and other very deep residual networks (for classification) to form a two-stage framework. This framework enables the classification network to extract more representative and specific features based on segmented results instead of the whole dermoscopy images, further alleviating the insufficiency of training data. The proposed framework is extensively evaluated on the ISBI 2016 Skin Lesion Analysis Towards Melanoma Detection Challenge dataset.
Experimental results demonstrate the significant performance gains of the proposed framework, ranking first in classification among 25 teams and second in segmentation among 28 teams. This study corroborates that very deep CNNs with effective training mechanisms can be employed to solve complicated medical image analysis tasks, even with limited training data.
We consider the use of deep convolutional neural networks (CNNs) with transfer learning for the image classification and detection problems posed within the context of X-ray baggage security imagery. The use of the CNN approach requires large amounts of data to facilitate a complex end-to-end feature extraction and classification process. Within the context of X-ray security screening, limited availability of object-of-interest data examples can thus pose a problem. To overcome this issue, we employ a transfer learning paradigm such that a pre-trained CNN, primarily trained for generalized image classification tasks where sufficient training data exists, can be optimized explicitly as a later secondary process towards this application domain. To provide a consistent feature-space comparison between this approach and traditional feature-space representations, we also train a support vector machine (SVM) classifier on CNN features. We empirically show that fine-tuned CNN features yield superior performance to conventional hand-crafted features on object classification tasks within this context. Overall, we achieve 0.994 accuracy based on AlexNet features with an SVM classifier. In addition to classification, we also explore the applicability of multiple CNN-driven detection paradigms, such as sliding window-based CNN (SW-CNN), Faster region-based CNNs (F-RCNN), region-based fully convolutional networks (R-FCN), and YOLOv2. We train numerous networks tackling both single and multiple detections over SW-CNN/F-RCNN/R-FCN/YOLOv2 variants. YOLOv2, Faster-RCNN, and R-FCN provide superior results to the more traditional SW-CNN approaches. With the use of YOLOv2, using input images of size 544×544, we achieve 0.885 mean average precision (mAP) for a six-class object detection problem.
The same approach with an input of size 416×416 yields 0.974 mAP for the two-class firearm detection problem and requires approximately 100 ms per image. Overall, we illustrate the comparative performance of these techniques and show that object localization strategies cope well with cluttered X-ray security imagery, where classification techniques fail.
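The mAP figures quoted above are averages of per-class average precision (AP), the area under the precision-recall curve of ranked detections. A minimal sketch of that computation (the IoU-based matching of detections to ground truth is assumed to have been done beforehand, and the example numbers are invented for illustration):

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """AP as the area under the precision-recall curve.

    scores: detector confidences; is_true_positive: whether each detection
    matched a ground-truth box (matching, e.g. via an IoU threshold, is
    assumed done beforehand); num_gt: number of ground-truth objects.
    """
    order = np.argsort(scores)[::-1]                  # rank detections by confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    recall = cum_tp / num_gt
    # integrate precision over the recall steps
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

scores = [0.9, 0.8, 0.7, 0.6]       # hypothetical confidences
tp = [1, 0, 1, 1]                   # hypothetical match outcomes
print(average_precision(scores, tp, num_gt=3))
```

mAP is then simply the mean of this quantity over the six (or two) object classes. Benchmark protocols differ in how they interpolate the curve; this raw-step integration is the simplest variant.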
With the continuous improvement and development of the power system, insulators have gradually become one of its most important components. At present, unmanned aerial vehicles (UAVs) are widely used to inspect insulators, and convolutional neural networks (CNNs) are extensively applied to identify insulators in the captured images with high accuracy and efficiency. However, existing methods based on Faster R-CNN or You Only Look Once (YOLO) either require more identification time due to their complex network structures or lack sufficient accuracy for insulator defects. Based on the YOLOv3 network, this article proposes a new CNN for target detection, which improves detection accuracy while maintaining detection speed. In addition, this article applies the latest EIoU loss function to YOLOv3, which significantly improves the overlap between the prediction and annotation boxes and accelerates convergence. The experimental results show that the detection model proposed in this article achieves an average precision (AP) of 0.94 for insulators and 0.89 for insulator defects, with a detection speed of 93.5 ms/image. Finally, experimental verification shows that the proposed detection model meets the requirements of power inspection and has good prospects for engineering application.
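The EIoU loss mentioned in this abstract extends the plain IoU loss with penalties on the centre distance and on the width and height differences of the two boxes, each normalised by the smallest enclosing box. A plain-Python sketch of that commonly cited formulation (the box coordinates below are illustrative, not from the paper):

```python
def eiou_loss(box_p, box_g):
    """EIoU loss between a predicted and a ground-truth box, each (x1, y1, x2, y2).

    EIoU = 1 - IoU + centre-distance term + width term + height term,
    each normalised by the smallest enclosing box.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # intersection and union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # smallest enclosing box
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # centre distance and width/height differences
    d2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    dw2 = ((px2 - px1) - (gx2 - gx1)) ** 2
    dh2 = ((py2 - py1) - (gy2 - gy1)) ** 2
    return 1 - iou + d2 / c2 + dw2 / cw ** 2 + dh2 / ch ** 2

print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0 for a perfect match
```

Because the width and height errors are penalised directly rather than through an aspect-ratio term, the gradient pushes each side of the predicted box toward its target independently, which is the usual explanation for the faster convergence the abstract reports.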
•A fast and accurate fully automatic method for brain tumor segmentation which is competitive both in terms of accuracy and speed compared to the state of the art.
•The method is based on deep neural networks (DNN) and learns features that are specific to brain tumor segmentation.
•We present a new DNN architecture which exploits both local features as well as more global contextual features simultaneously.
•Using a GPU implementation and a convolutional output layer, the model is an order of magnitude faster than other state of the art methods.
•Introduces a novel cascaded architecture that allows the system to more accurately model local label dependencies.
In this paper, we present a fully automatic brain tumor segmentation method based on Deep Neural Networks (DNNs). The proposed networks are tailored to glioblastomas (both low and high grade) pictured in MR images. By their very nature, these tumors can appear anywhere in the brain and have almost any kind of shape, size, and contrast. These reasons motivate our exploration of a machine learning solution that exploits a flexible, high capacity DNN while being extremely efficient. Here, we give a description of different model choices that we’ve found to be necessary for obtaining competitive performance. We explore in particular different architectures based on Convolutional Neural Networks (CNN), i.e. DNNs specifically adapted to image data.
We present a novel CNN architecture which differs from those traditionally used in computer vision. Our CNN exploits both local features as well as more global contextual features simultaneously. Also, unlike most traditional uses of CNNs, our networks use a final layer that is a convolutional implementation of a fully connected layer, which allows a 40-fold speed-up. We also describe a 2-phase training procedure that allows us to tackle difficulties related to the imbalance of tumor labels. Finally, we explore a cascade architecture in which the output of a basic CNN is treated as an additional source of information for a subsequent CNN. Results reported on the 2013 BRATS test data-set reveal that our architecture improves over the currently published state-of-the-art while being over 30 times faster.
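The "convolutional implementation of a fully connected layer" behind the reported speed-up rests on a simple equivalence: an FC layer trained on k×k patches can be rewritten as a convolution, so the network labels every pixel position in one dense sweep instead of one crop-and-classify call per pixel. A NumPy sketch of the equivalence (the shapes and weights are hypothetical; the 40-fold figure is the paper's measurement, not reproduced here):

```python
import numpy as np

def fc_as_conv(image, w, b):
    """Apply an FC layer, trained on k x k patches, at every spatial position.

    image: (H, W) single-channel map; w: (k*k, n_out) FC weights; b: (n_out,).
    Returns (H-k+1, W-k+1, n_out): the FC layer's output everywhere at once.
    """
    k = int(np.sqrt(w.shape[0]))
    h, wd = image.shape
    out = np.empty((h - k + 1, wd - k + 1, w.shape[1]))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + k, j:j + k].reshape(-1)
            out[i, j] = patch @ w + b      # identical arithmetic to the FC layer
    return out

rng = np.random.default_rng(0)
img = rng.random((10, 10))
w, b = rng.random((9, 5)), rng.random(5)   # FC layer over 3x3 patches, 5 outputs
dense = fc_as_conv(img, w, b)
# equals calling the FC layer on one cropped patch
single = img[2:5, 4:7].reshape(-1) @ w + b
print(np.allclose(dense[2, 4], single))    # True
```

In a real implementation the loop is of course replaced by an actual convolution, which shares the overlapping-patch computation across positions; that sharing, rather than the rewrite itself, is where the speed-up comes from.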