Change detection (CD) is a critical task for observing and analyzing dynamic land cover processes. Although numerous deep learning-based CD models perform excellently, further performance improvements are constrained by the limited knowledge that can be extracted from the given labelled data. Recently emerged foundation models, on the other hand, contain a huge amount of knowledge gained by scaling up across data modalities and proxy tasks. In this paper, we propose a Bi-Temporal Adapter Network (BAN), a universal foundation model-based CD adaptation framework that aims to extract the knowledge of foundation models for CD. The proposed BAN contains three parts, i.e., a frozen foundation model (e.g., CLIP), a bi-temporal adapter branch (Bi-TAB), and bridging modules between them. Specifically, BAN extracts general features through the frozen foundation model; these features are then selected, aligned, and injected into Bi-TAB via the bridging modules. Bi-TAB is designed as a model-agnostic concept for extracting task/domain-specific features, and it can be either any existing CD model or a set of hand-crafted stacked blocks. Beyond current customized models, BAN is the first extensive attempt to adapt foundation models to the CD task. Experimental results show the effectiveness of BAN in improving the performance of existing CD methods (e.g., up to 4.08% IoU improvement) with only a few additional learnable parameters. More importantly, these results demonstrate the potential of foundation models for remote sensing CD. The code is available at https://github.com/likyoo/BAN and will be supported in our Open-CD.
Remote sensing object detection (RSOD) faces two challenges, complex backgrounds and small objects, which are interconnected and cannot be addressed separately. To this end, we propose an attention-free global multiscale fusion network (AGMF-Net). First, we present a spatial bias module (SBM) that obtains long-range dependencies as part of our proposed global information extraction module (GIEM). GIEM efficiently captures global information, overcoming the challenges posed by complex backgrounds. Moreover, we propose a multi-task enhanced structure (MES) and multi-task feature pre-treatment (MFP) to enhance the feature representation of multiscale targets while eliminating interference from complex backgrounds. In addition, an efficient context decoupled detector (ECDD) is presented to provide distinct features for the regression and classification tasks, aiming to improve the efficiency of RSOD. Extensive experiments demonstrate that our proposed method achieves superior performance compared with state-of-the-art detectors. Specifically, AGMF-Net obtains a mean average precision (mAP) of 73.2%, 92.03%, 95.21% and 94.30% on the DIOR, HRRSD, NWPU VHR-10 and RSOD datasets, respectively.
• An attention-based hierarchical framework is proposed for building footprint extraction.
• Three CNN modules are applied to learn the features of various types of buildings.
• The combination of focal loss and Dice loss outperforms loss functions in the literature.
• An overlap splicing and voting mechanism is proposed to reduce edge effects.
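The combination of focal loss and Dice loss named in the highlights can be illustrated with a minimal sketch for binary segmentation. The highlights do not give the formulation, so everything here is an assumption for illustration: the standard binary focal loss and soft Dice loss definitions, the default `gamma`, `alpha`, and `smooth` values, and the equal weighting of the two terms.

```python
import math

def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over flat lists of probabilities and 0/1 labels.
    gamma and alpha are commonly used defaults, assumed here."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p             # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*|P.T| + s) / (|P| + |T| + s)."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)

def combined_loss(probs, targets, focal_weight=1.0, dice_weight=1.0):
    """Weighted sum of the two losses; the weights are hypothetical."""
    return (focal_weight * focal_loss(probs, targets)
            + dice_weight * dice_loss(probs, targets))
```

Focal loss down-weights easy pixels, while Dice loss measures region overlap directly, which is one plausible reason the two are often paired when building pixels are a minority class.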
Convolutional neural networks show excellent performance in image segmentation. However, compared with natural images, remote sensing images are characterized by large coverage, multi-scale nesting, and complex geographic context. Extracting building footprints from high-resolution remote sensing images has therefore remained a challenging task. In this study, an end-to-end Multi-Scale Geoscience Network (MS-GeoNet) is proposed for building footprint extraction. The proposed architecture focuses on multi-scale nested characteristics and the spatial correlation between buildings and their surroundings. The performance of a number of embedding modules and loss functions in extracting various types of buildings is explored in detail. Our proposed method outperforms the baseline model, Fully Convolutional DenseNets (FC-DenseNet), by 7.10% in intersection over union (IoU) and by 3.09% in F1-score. Moreover, to increase the accuracy of large-area interpretation, an overlap splicing and voting mechanism is proposed, which is also an effective means of handling tile edges. Compared with the traditional splicing method, the proposed method achieves approximately 1.19% IoU improvement and 0.83% F1-score improvement on our dataset. MS-GeoNet is a promising approach for the automatic generation of building footprints in practical applications.
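The idea behind overlap splicing and voting can be sketched as follows. This is a generic majority-vote scheme over overlapping tiles, not the paper's exact mechanism, which the abstract does not specify. The function names, the tile size, the stride, and the tie-breaking rule are all assumptions, and `predict_tile` stands in for any per-tile segmentation model.

```python
def tile_starts(length, tile, stride):
    """Start offsets so tiles of size `tile` cover [0, length) with overlap."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)   # final tile sits flush with the border
    return starts

def predict_with_voting(image, predict_tile, tile=4, stride=2):
    """Run `predict_tile` on overlapping tiles and majority-vote each pixel.
    `image` is a 2D list; `predict_tile` returns a 2D list of 0/1 labels
    with the same shape as the patch it receives."""
    h, w = len(image), len(image[0])
    votes = [[0] * w for _ in range(h)]    # number of "building" votes
    counts = [[0] * w for _ in range(h)]   # number of tiles covering the pixel
    for r in tile_starts(h, tile, stride):
        for c in tile_starts(w, tile, stride):
            patch = [row[c:c + tile] for row in image[r:r + tile]]
            pred = predict_tile(patch)
            for i, pred_row in enumerate(pred):
                for j, label in enumerate(pred_row):
                    votes[r + i][c + j] += label
                    counts[r + i][c + j] += 1
    # A pixel is labelled 1 only if strictly more than half of the tiles agree.
    return [[1 if 2 * votes[i][j] > counts[i][j] else 0 for j in range(w)]
            for i in range(h)]
```

Because pixels near a tile border are also covered by neighbouring tiles in which they lie closer to the centre, voting lets those better-contextualized predictions outweigh edge errors.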
Cloud cover can influence numerous important ecological processes, including reproduction, growth, survival, and behavior, yet our assessment of its importance at the appropriate spatial scales has remained remarkably limited. If captured over a large extent yet at sufficiently fine spatial grain, cloud cover dynamics may provide key information for delineating a variety of habitat types and predicting species distributions. Here, we develop new near-global, fine-grain (≈1 km) monthly cloud frequencies from 15 y of twice-daily Moderate Resolution Imaging Spectroradiometer (MODIS) satellite images that expose spatiotemporal cloud cover dynamics of previously undocumented global complexity. We demonstrate that cloud cover varies strongly in its geographic heterogeneity and that the direct, observation-based nature of cloud-derived metrics can improve predictions of habitat, ecosystem, and species distributions with reduced spatial autocorrelation compared to commonly used interpolated climate data. These findings support the fundamental role of remote sensing as an effective lens through which to understand and globally monitor the fine-grain spatial variability of key biodiversity and ecosystem properties.
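As a rough illustration of what a monthly cloud frequency is, the sketch below computes, for one pixel, the fraction of valid observations flagged as cloudy in each calendar month. The abstract does not describe the actual MODIS processing chain, so the input format, the handling of missing observations, and the function name are assumptions.

```python
from collections import defaultdict

def monthly_cloud_frequency(observations):
    """Fraction of valid observations that were cloudy, per calendar month.
    `observations` is an iterable of (month, is_cloudy) pairs for one pixel,
    with month in 1..12 and is_cloudy set to None for a missing observation."""
    cloudy = defaultdict(int)
    valid = defaultdict(int)
    for month, is_cloudy in observations:
        if is_cloudy is None:          # skip missing or unusable observations
            continue
        valid[month] += 1
        cloudy[month] += int(is_cloudy)
    return {month: cloudy[month] / valid[month] for month in sorted(valid)}
```

Pooling observations by calendar month across all years yields a climatological frequency per month rather than a value for one specific month.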
Knowledge of the temporal dynamics and spatial variability of soil moisture is crucial in understanding many environmental processes and their impacts on plant fertility, crop yields, droughts, or exposure to flood hazards. The Soil Moisture Active Passive (SMAP) satellite was launched on 31 January 2015 by the National Aeronautics and Space Administration (NASA) to provide surface soil moisture (SSM) using brightness temperature through its active (radar, 3 km) and passive (radiometer, 36 km) sensors at an intermediate resolution of 9 km. ...we utilized these advancements to further postprocess the downscaled soil moisture dataset at 1-km spatial resolution and provide a more accurate and reliable product. According to the U.S. Department of Agriculture National Agricultural Statistical Service (USDA NASS), the rice farmlands are flooded and seeded each year from late April through May.
Remote-sensing images contain a large number of small targets that are difficult to detect. To address this problem, this paper proposes a DenseYOLOv5 detection model for practical applications. DenseYOLOv5 is based on YOLOv5s and adds the small-target detection head P2, together with its feature fusion part, to improve the detection performance on small targets. To address the loss of semantic information about small targets caused by continuous upsampling in YOLOv5s, DenseYOLOv5 reconstructs the feature fusion pyramid (FPN) structure and incorporates dense connections. In addition, DenseYOLOv5 uses transposed convolution as the upsampling method to further improve its small-target detection capability. DenseYOLOv5 achieves better detection results with less memory and computational overhead and is therefore more usable in practice.
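How transposed convolution acts as a learnable upsampling operator can be seen in a one-dimensional sketch. This is the textbook operation without padding or bias, not code from DenseYOLOv5; the function name and the example kernels are assumptions.

```python
def conv_transpose_1d(x, kernel, stride=2):
    """1D transposed convolution without padding or bias.
    Each input value scatters a scaled copy of `kernel` into the output,
    and consecutive copies are placed `stride` positions apart."""
    out_len = (len(x) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(x):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out

# With kernel [1, 1] and stride 2 the result equals nearest-neighbour upsampling.
print(conv_transpose_1d([1, 2, 3], [1, 1]))      # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
# A wider kernel makes neighbouring copies overlap and blend.
print(conv_transpose_1d([1, 3], [0.5, 1, 0.5]))  # [0.5, 1.0, 2.0, 3.0, 1.5]
```

Since the kernel weights are trained, the network can learn an interpolation suited to the data instead of using a fixed scheme such as nearest-neighbour or bilinear upsampling.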
Recently, remote sensing images have become increasingly popular in a number of tasks, such as environmental monitoring. However, the images observed by satellite sensors often suffer from low resolution (LR), making it difficult to meet the requirements of further analysis. Super-resolution (SR) aims to increase image resolution while providing finer spatial details, which directly remedies this weakness of satellite images. Therefore, in this article, we propose an innovative mixed high-order attention network (MHAN) for remote sensing SR. It comprises two components: a feature extraction network, and a feature refinement network with a high-order attention (HOA) mechanism for detail restoration. In the feature extraction network, we replace the elementwise addition with weighted channelwise concatenation in all skip connections, which greatly facilitates the information flow. In the feature refinement network, rather than exploring first-order statistics (spatial or channel attention), we introduce the HOA module to restore the missing details. Finally, to fully exploit hierarchical features, we introduce a frequency-aware connection to bridge the feature extraction and feature refinement networks. Experiments on two widely used remote sensing image data sets demonstrate that MHAN not only obtains better accuracy than the state-of-the-art methods but is also superior in terms of running time and GPU cost. Code is available at https://github.com/ZhangDY827/MHAN.
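The difference between an additive skip connection and a weighted channelwise concatenation can be sketched on toy feature maps. The abstract does not say how the weights are defined, so one scalar weight per branch is assumed here purely for illustration, and both function names are hypothetical.

```python
def elementwise_add(a, b):
    """Additive skip connection: the channel count stays the same and the
    two feature maps are merged irreversibly."""
    return [[x + y for x, y in zip(ch_a, ch_b)] for ch_a, ch_b in zip(a, b)]

def weighted_channel_concat(a, b, weight_a=1.0, weight_b=1.0):
    """Concatenating skip connection: each branch is scaled by its own weight
    (assumed scalar) and the channels are stacked, so both sources stay
    separately available to the next layer."""
    scaled_a = [[weight_a * x for x in ch] for ch in a]
    scaled_b = [[weight_b * x for x in ch] for ch in b]
    return scaled_a + scaled_b

# Feature maps are lists of channels; each channel is a flat list of values.
shallow = [[1, 2], [3, 4]]
deep = [[10, 20], [30, 40]]
print(elementwise_add(shallow, deep))                 # [[11, 22], [33, 44]]
print(weighted_channel_concat(shallow, deep, 0.5, 2))
# [[0.5, 1.0], [1.5, 2.0], [20, 40], [60, 80]]
```

Concatenation doubles the channel count, so in a real network a following convolution would typically reduce it again.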
Deep learning-based object detection models rely heavily on large-scale, precise annotations for training. However, manually producing bounding-box annotations for such data is both time-consuming and costly, especially for high-resolution satellite imagery containing densely packed small objects. To alleviate the burden of manual annotation, we propose a simple yet effective approach, called Progressive Self-Training Object Detection (PSTDet), that enables accurate object detection in remote sensing imagery without relying on manual annotations. Our PSTDet framework consists of two main components: Initial Pseudo Label Generation (IPLG) and Progressive Self-training with Re-Labeling (PST-R). In IPLG, we leverage unsupervised image clustering, unsupervised instance detection, and geometric constraints to automatically generate high-quality bounding-box annotations for the initial training dataset. This approach significantly reduces the time and expense associated with data annotation, laying a solid foundation for the subsequent progressive self-training stage. The annotations produced by IPLG serve as the training data for PST-R, which enhances the detector and pseudo labels through progressive self-training and our proposed Noisy Pseudo-Label Filtering strategy (NPLFilter). NPLFilter improves the quality of pseudo labels by integrating geometric constraints, prior knowledge, and category-adaptive thresholds. Experimental results demonstrate that our method achieves significant performance improvements on the challenging NWPU VHR-10.v2 and DIOR datasets. Notably, our method far outperforms state-of-the-art weakly-supervised methods and compares favorably with fully-supervised methods.
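A filter that combines geometric constraints with category-adaptive thresholds could look roughly like the sketch below. It is not the paper's NPLFilter, whose rules the abstract does not detail. The detection dictionary layout, the aspect-ratio limit, the per-category threshold table, and the function name are all hypothetical.

```python
def filter_pseudo_labels(detections, thresholds, default_threshold=0.5,
                         max_aspect_ratio=5.0):
    """Keep detections that pass a geometric check and a per-category score
    threshold. Each detection is a dict with "box" (x1, y1, x2, y2),
    "label", and "score"; this layout is assumed for illustration."""
    kept = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        width, height = x2 - x1, y2 - y1
        if width <= 0 or height <= 0:
            continue                              # degenerate box
        aspect = max(width / height, height / width)
        if aspect > max_aspect_ratio:
            continue                              # implausibly elongated box
        if det["score"] >= thresholds.get(det["label"], default_threshold):
            kept.append(det)
    return kept

# Hypothetical thresholds: stricter for a category assumed to be noisier.
thresholds = {"ship": 0.8, "airplane": 0.6}
detections = [
    {"box": (0, 0, 10, 10), "label": "ship", "score": 0.7},      # below 0.8
    {"box": (0, 0, 10, 10), "label": "airplane", "score": 0.7},  # kept
    {"box": (0, 0, 100, 5), "label": "airplane", "score": 0.9},  # too elongated
]
print([d["label"] for d in filter_pseudo_labels(detections, thresholds)])
# ['airplane']
```

In a self-training loop, the retained detections would serve as pseudo labels for the next training round.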