Scene classification of high-resolution images is an active research topic in the remote sensing community. Although convolutional neural network (CNN)-based methods have obtained good performance, large-scale changes of ground objects in complex scenes restrict further improvement of classification accuracy. In this letter, a global-local dual-branch structure (GLDBS) is designed to explore discriminative features of both the original images and their crucial areas, and decision-level fusion is applied for performance improvement. To discover the crucial area of an original image, the energy map generated by the CNN is transformed into a binary image, from which the coordinates of the maximally connected region are obtained. Two shallow CNNs, ResNet18 and ResNet34, are selected as the backbone to construct the dual-branch network, and a joint loss is designed to optimize the whole model. In the GLDBS, the two streams employ the same structure (ResNet18-ResNet34) as the backbone, but their parameters are not shared. Experimental results on the aerial image data set (AID) and NWPU-RESISC45 datasets show that the proposed GLDBS achieves remarkable classification performance compared with state-of-the-art (SOTA) methods. The highest overall accuracies (OAs) on the AID and NWPU-RESISC45 datasets are 97.01% and 94.46%, respectively.
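The crucial-area step described above (binarize the CNN energy map, keep the maximally connected region, read off its coordinates) can be sketched as follows. This is a minimal illustration, not the letter's exact procedure: the mean-value threshold and the function names are assumptions.

```python
import numpy as np
from scipy import ndimage

def crucial_region_bbox(energy_map, threshold=None):
    """Binarize an energy map and return the bounding box
    (r0, r1, c0, c1) of its largest connected region."""
    if threshold is None:
        threshold = energy_map.mean()      # simple global threshold (assumed)
    binary = energy_map > threshold
    labels, n = ndimage.label(binary)      # connected-component labeling
    if n == 0:                             # no region found: fall back to full image
        return 0, energy_map.shape[0], 0, energy_map.shape[1]
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    largest = int(np.argmax(sizes)) + 1    # label of the maximal region
    rows, cols = np.where(labels == largest)
    return rows.min(), rows.max() + 1, cols.min(), cols.max() + 1

# toy energy map with one dominant blob
e = np.zeros((8, 8))
e[2:5, 3:7] = 1.0
print(crucial_region_bbox(e))  # → (2, 5, 3, 7)
```

The bounding box would then be used to crop the crucial area that is fed to the second (local) branch.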
We introduce a comprehensive benchmark for local features and robust estimation algorithms, focusing on the downstream task, the accuracy of the reconstructed camera pose, as our primary metric. Our pipeline's modular structure allows easy integration, configuration, and combination of different methods and heuristics. This is demonstrated by embedding dozens of popular algorithms, from seminal works to the cutting edge of machine learning research, and evaluating them. We show that with proper settings, classical solutions may still outperform the perceived state of the art. Besides establishing the actual state of the art, the conducted experiments reveal unexpected properties of structure-from-motion pipelines that can help improve their performance, for both algorithmic and learned methods. Data and code are online ( https://github.com/ubc-vision/image-matching-benchmark ), providing an easy-to-use and flexible framework for benchmarking local features and robust estimation methods, both alongside and against top-performing methods. This work provides a basis for the Image Matching Challenge ( https://image-matching-challenge.github.io ).
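The pose-accuracy metric above is typically computed from the angular error of the estimated rotation (and translation direction) against ground truth. A minimal sketch of the standard rotation-error measure, assuming the usual residual-rotation formulation (the benchmark's exact thresholds and aggregation are not reproduced here):

```python
import numpy as np

def rotation_angle_error_deg(R_est, R_gt):
    """Angular error (degrees) between two rotation matrices:
    the angle of the residual rotation R_est @ R_gt.T."""
    cos = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding

# a 10-degree rotation about z, compared with the identity
th = np.radians(10.0)
Rz = np.array([[np.cos(th), -np.sin(th), 0.0],
               [np.sin(th),  np.cos(th), 0.0],
               [0.0, 0.0, 1.0]])
print(round(rotation_angle_error_deg(Rz, np.eye(3)), 3))  # → 10.0
```

Errors like this are then thresholded and averaged over image pairs to score a full feature-plus-estimator pipeline.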
Deep convolutional neural networks (DCNNs) show impressive similarities to the human visual system. Recent research, however, suggests that DCNNs have limitations in recognizing objects by their shape. We tested the hypothesis that DCNNs are sensitive to an object's local contour features but have no access to the global shape information that predominates human object recognition. We employed transfer learning to assess local and global shape processing in trained networks. In Experiment 1, we used restricted and unrestricted transfer learning to retrain AlexNet, VGG-19, and ResNet-50 to classify circles and squares. We then probed these networks with stimuli carrying conflicting global shape and local contour information: overall square shapes comprised of curved elements, and circles comprised of corner elements. The networks classified the test stimuli by local contour features rather than global shapes. In Experiment 2, we changed the training data to include circles and squares comprised of different elements, so that the local contour features of an object were uninformative. This considerably increased the networks' tendency to produce global shape responses, but deeper analyses in Experiment 3 revealed that the networks still showed no sensitivity to the spatial configuration of local elements. These findings demonstrate that DCNNs' performance is an inversion of human performance with respect to global and local shape processing. Whereas abstract relations of elements predominate in human perception of shape, DCNNs appear to extract only local contour fragments, with no representation of how they spatially relate to each other to form global shapes.
Speeded-Up Robust Features (SURF)
Bay, Herbert; Ess, Andreas; Tuytelaars, Tinne; ...
Computer Vision and Image Understanding, 06/2008, Volume 110, Issue 3
Journal Article · Peer reviewed · Open access
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps.
The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF’s application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF’s usefulness in a broad range of topics in computer vision.
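The integral images mentioned above are what make SURF's box-filter convolutions cheap: once the summed-area table is built, the sum over any rectangle costs four lookups regardless of its size. A minimal sketch (array names and the zero-padding convention are illustrative choices, not taken from the paper):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero-padded first row/column,
    so box sums need exactly four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1), independent of box size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # → 30 (5 + 6 + 9 + 10)
```

SURF's Hessian-based detector approximates Gaussian second-order derivatives with box filters evaluated this way, which is why its cost is independent of filter scale.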
Deep models have been widely and successfully used in image manipulation detection, which aims to classify tampered images and localize tampered regions. Most existing methods mainly focus on extracting global features from tampered images, while neglecting the relationships of local features between tampered and authentic regions within a single tampered image. To exploit such spatial relationships, we propose Proposal Contrastive Learning (PCL) for effective image manipulation detection. Our PCL consists of a two-stream architecture that extracts two types of global features, from the RGB and noise views respectively. To further improve the discriminative power, we exploit the relationships of local features through a proxy proposal contrastive learning task by attracting/repelling proposal-based positive/negative sample pairs. Moreover, we show that our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features. Extensive experiments on several standard datasets demonstrate that our PCL can serve as a general module that yields consistent improvement. The code is available at https://github.com/Sandy-Zeng/PCL.
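The attract/repel objective over positive and negative proposal pairs is commonly implemented as an InfoNCE-style contrastive loss. The sketch below shows that generic formulation for one anchor embedding; PCL's exact loss and proposal sampling are not specified here, so treat the function and its temperature as assumptions.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor: pull the positive
    embedding closer, push the negatives away."""
    a = anchor / np.linalg.norm(anchor)
    p = positive / np.linalg.norm(positive)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    logits = np.concatenate([[p @ a], n @ a]) / temperature  # cosine similarities
    logits -= logits.max()                                   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))                          # positive is index 0

# anchor aligned with the positive and opposed to a negative → loss near zero
loss = info_nce_loss(np.array([1.0, 0.0]),
                     np.array([1.0, 0.0]),
                     np.array([[-1.0, 0.0], [0.0, 1.0]]))
print(loss < 0.01)  # → True
```

In a proposal-based setting, the anchor/positive/negative vectors would be pooled features of region proposals drawn from tampered and authentic areas.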
Accurately predicting the remaining useful life (RUL) of equipment is crucial for planning production and eliminating unplanned downtime events. Specifically, effective RUL prediction methods can detect potential equipment failures in advance and trigger timely maintenance measures, which helps enterprises better plan and manage resources, optimize production plans, and provide strong support for subsequent maintenance decisions. Data-driven approaches have achieved great success in the field of RUL prediction by fully exploiting mechanical degradation information in historical operation data. However, these approaches have certain limitations: (1) they often fail to precisely extract spatial and temporal features simultaneously in noisy environments; and (2) they often fail to capture local features and global degradation trends simultaneously. To overcome these limitations, we design an end-to-end model, termed ASATCN-TABGRU, for mechanical failure prediction, which contains an automatic shrinking attention temporal convolutional network (ASATCN) and a temporal attention bidirectional gated recurrent unit (TABGRU). In the ASATCN module, to extract spatio-temporal information from historical operation data, we first perform multi-scale modeling through a deliberately designed dilated causal convolution subnetwork (DCCS) to obtain local features. Then, we propose a novel soft thresholding subnetwork (STS) based on the normalization-based attention module (NAM), which captures useful temporal features from the local feature sequence through an automatic shrinking soft thresholding mechanism; in addition, we design a hybrid attention subnetwork (HAS) that captures spatial features of flexible, varying importance via a spatial-channel attention mechanism applied to the historical operation data.
The precise extraction of spatio-temporal features is then achieved through a concatenation operation. With the encoded spatio-temporal feature sequence, the TABGRU module is further proposed to capture global degradation trends by simultaneously extracting contextual information and historical influence information, thereby effectively modeling both local and global features. Experiments show that our approach achieves better performance and robustness than other state-of-the-art approaches, particularly on the small-sample dataset.
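The soft thresholding at the heart of the STS is the standard shrinkage operator often used to suppress noise: values are pulled toward zero by a threshold, and anything below it is zeroed out. A minimal sketch with a fixed threshold; in the actual model, the threshold is produced automatically by the NAM-based attention rather than set by hand.

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft thresholding: shrink each value toward zero by tau,
    zeroing anything whose magnitude is below tau (noise suppression)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([-2.0, -0.3, 0.1, 0.8, 3.0])
print(soft_threshold(x, 0.5))  # → [-1.5  0.   0.   0.3  2.5]
```

Learning tau per channel lets the network adapt how aggressively it denoises each local feature map.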