Accurate road detection and centerline extraction from very high resolution (VHR) remote sensing imagery are of central importance in a wide range of applications. Owing to complex backgrounds and occlusions by trees and cars, most road detection methods produce heterogeneous segments; moreover, for the centerline extraction task, most current approaches fail to produce a centerline network that is smooth, complete, and of single-pixel width. To address these issues, we propose a novel deep model, a cascaded end-to-end convolutional neural network (CasNet), that copes with the road detection and centerline extraction tasks simultaneously. Specifically, CasNet consists of two networks. The first addresses the road detection task; its strong representation ability allows it to handle complex backgrounds and occlusions by trees and cars. The second is cascaded to the first and makes full use of the feature maps produced by it to obtain accurate centerline extraction. Finally, a thinning algorithm is proposed to obtain a smooth, complete, single-pixel-width road centerline network. Extensive experiments demonstrate that CasNet greatly outperforms state-of-the-art methods in both quality and speed: it exceeds the competing methods by a large margin in quantitative performance, and it is nearly 25 times faster. Moreover, as a further contribution, a large and challenging road centerline data set for VHR remote sensing images will be made publicly available for further study.
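To make the thinning step concrete, the sketch below applies a classical Zhang-Suen-style thinning pass to a binary road mask. This is a generic textbook algorithm, not necessarily the paper's exact thinning procedure; the `thin` function and the toy 3-pixel-thick bar are illustrative assumptions.

```python
def thin(grid):
    """Zhang-Suen thinning on a binary 2D list (1 = road, 0 = background)."""
    h, w = len(grid), len(grid[0])

    def neighbors(y, x):
        # P2..P9, clockwise starting from the pixel directly above.
        return [grid[y-1][x], grid[y-1][x+1], grid[y][x+1], grid[y+1][x+1],
                grid[y+1][x], grid[y+1][x-1], grid[y][x-1], grid[y-1][x-1]]

    def transitions(n):
        # Number of 0 -> 1 transitions in the circular sequence P2..P9.
        return sum((a, b) == (0, 1) for a, b in zip(n, n[1:] + n[:1]))

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_zero = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if grid[y][x] != 1:
                        continue
                    n = neighbors(y, x)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    if not (2 <= sum(n) <= 6 and transitions(n) == 1):
                        continue
                    if step == 0 and p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0:
                        to_zero.append((y, x))
                    elif step == 1 and p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0:
                        to_zero.append((y, x))
            # Delete simultaneously after each sub-iteration, as in Zhang-Suen.
            for y, x in to_zero:
                grid[y][x] = 0
                changed = True
    return grid

# Toy example: a 3-pixel-thick horizontal road segment.
bar = [[0] * 7 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 6):
        bar[y][x] = 1
skeleton = thin(bar)
```

The two-subiteration structure with simultaneous deletion is what keeps the skeleton connected while eroding it down to single-pixel width.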
In this paper, we propose a robust framework for building extraction in visible-band images. We first obtain an initial classification of the pixels from an unsupervised presegmentation. We then develop a novel conditional random field (CRF) formulation for accurate rooftop extraction, which incorporates both pixel-level and segment-level information to identify rooftops. Compared with the commonly used CRF model, our model adds a higher-order potential defined on segments, exploiting region consistency and shape features at the segment level. Our experiments show that the proposed higher-order CRF model outperforms state-of-the-art methods at both the pixel and object levels on rooftops of complex structure and size in challenging environments.
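As background, the sketch below evaluates the kind of energy such a model minimizes: per-pixel unary terms, a pairwise Potts term, and a robust P^n-style higher-order potential on segments that penalizes pixels disagreeing with their segment's majority label. The function `crf_energy`, its weights `w_pair`, `w_seg`, `trunc`, and the toy example are illustrative assumptions, not the paper's actual potentials.

```python
def crf_energy(labels, unary, edges, segments, w_pair=1.0, w_seg=2.0, trunc=0.3):
    """Energy of a labeling: unary + pairwise Potts + robust P^n segment term."""
    # Unary data costs: cost of the chosen label at each pixel.
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    # Pairwise Potts: a constant penalty for each edge whose endpoints disagree.
    e += sum(w_pair for i, j in edges if labels[i] != labels[j])
    for seg in segments:
        # Robust P^n Potts: cost grows with the fraction of pixels that
        # disagree with the segment's majority label, capped at w_seg.
        counts = {}
        for i in seg:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        disagree = len(seg) - max(counts.values())
        e += w_seg * min(1.0, disagree / (trunc * len(seg)))
    return e

# Toy example: 4 pixels, 2 labels, a chain of edges, one segment covering all.
unary = [[0.5, 0.1], [0.5, 0.1], [0.2, 0.9], [0.4, 0.2]]
labels = [1, 1, 0, 1]
edges = [(0, 1), (1, 2), (2, 3)]
segments = [[0, 1, 2, 3]]
energy = crf_energy(labels, unary, edges, segments,
                    w_pair=1.0, w_seg=2.0, trunc=0.5)
```

The truncation makes the segment term robust: a few dissenting pixels pay a partial cost instead of invalidating the whole segment, which is what lets region consistency help without over-smoothing.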
Accurate object detection is important in computer vision. However, detecting small objects in low-resolution images remains a challenging and elusive problem, primarily because such objects carry little visual information and cannot easily be distinguished from similar background regions. To resolve this problem, we propose a Hierarchical Small Object Detection Network for low-resolution remote sensing images, named HSOD-Net. We develop a point-to-region detection paradigm: we first perform key-point prediction to obtain position hypotheses, and only later super-resolve the image and detect objects around those candidate positions. By postponing object prediction until after the resolution has been increased, the obtained key-points are more stable than their traditional counterparts produced by early object detection from less visual information. This hierarchical approach saves significant run-time, which makes HSOD-Net well suited to practical applications such as search and rescue and drone navigation. In comparison with state-of-the-art models, HSOD-Net achieves remarkable precision in detecting small objects in low-resolution remote sensing images.
In unmanned aerial vehicle (UAV) large-scale scene modeling, challenges such as missed shots, low overlap, and data gaps caused by flight paths and environmental factors (e.g., variations in lighting, occlusion, and weak textures) often lead to incomplete 3D models with blurred geometric structures and textures. To address these challenges, an implicit–explicit coupled enhancement framework for UAV large-scale scene modeling is proposed. Benefiting from the mutual promotion of implicit and explicit models, we first address the missing co-visibility clusters caused by environmental noise through large-scale implicit modeling with UAVs, which enhances inter-frame photometric and geometric consistency. We then increase the density of multi-view point cloud reconstruction via synthetic co-visibility clusters, effectively recovering missing spatial information and constructing a more complete dense point cloud. Finally, during the mesh modeling phase, high-quality 3D modeling of large-scale UAV scenes is achieved by inversely radiating and mapping additional texture details into 3D voxels. Experimental results demonstrate that our method achieves state-of-the-art modeling accuracy across various scenarios, outperforming existing UAV aerial photogrammetry software (COLMAP 3.9, Context Capture 2023, PhotoScan 2023, Pix4D 4.5.6) and related algorithms.
This paper presents a unified variational formulation for joint object segmentation and stereo matching that takes both accuracy and efficiency into account. In our approach, the depth map consists of compact objects, each represented through three aspects: its perimeter in image space; its slanted object depth plane; and a planar bias, which adds a level of detail on top of the object plane to model depth variations within the object. In contrast to traditional high-quality low-level solvers, we use a convex formulation of the multilabel Potts model together with PatchMatch stereo techniques to generate a depth map for each image at the object level, and we show that accurate multiple-view reconstruction can be achieved with our formulation by means of induced homographies, without discretization or staircasing artifacts. Our model is formulated as an energy minimization optimized via a fast primal-dual algorithm that can handle several hundred object depth segments efficiently. Performance evaluations on the Middlebury benchmark data sets show that our method outperforms the traditional integer-valued disparity strategy as well as the original PatchMatch algorithm and its variants in subpixel-accurate disparity estimation. The proposed algorithm also produces consistently good results on various real-world data sets (the KITTI benchmark and multi-view benchmark data sets).
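For context, a standard convex relaxation of the multilabel Potts model (a common textbook form, not necessarily the paper's exact formulation) assigns each of the $L$ labels an indicator function $u_i$ relaxed to take values in $[0,1]$ on the simplex:

```latex
\min_{u \in \mathcal{S}} \; \sum_{i=1}^{L} \int_{\Omega} f_i(x)\, u_i(x)\, dx
\;+\; \lambda \sum_{i=1}^{L} \int_{\Omega} \lvert \nabla u_i(x) \rvert \, dx,
\qquad
\mathcal{S} = \Big\{ u \;:\; u_i(x) \ge 0,\ \sum_{i=1}^{L} u_i(x) = 1 \Big\}
```

Here $f_i$ is the data cost of assigning label $i$ at $x$ (in this setting, the matching cost under an object's slanted depth plane plus planar bias), and the total-variation terms penalize boundary length; a primal-dual scheme then alternates projected updates of $u$ and of the dual variables of the TV terms, which is what makes several hundred depth segments tractable.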
High-fidelity mesh reconstruction from point clouds has long been a fundamental research topic in computer vision and computer graphics. Traditional methods require dense triangle meshes to achieve high fidelity, but excessively dense triangles incur unnecessary storage and computational burdens while still struggling to capture clear, sharp, and continuous edges. This paper argues that the key to high-fidelity reconstruction lies in preserving sharp features. We therefore introduce a novel sharp-feature-preserving reconstruction framework based on primitive detection. It comprises an improved deep-learning-based primitive detection module and two novel modules that we propose for mesh splitting and selection. Our framework can accurately segment primitive patches, fit meshes within each patch, and split overlapping meshes at the triangle level, ensuring true sharpness while producing lightweight mesh models. Quantitative and visual experimental results demonstrate that our framework outperforms both state-of-the-art learning-based primitive detection methods and traditional reconstruction methods. Moreover, our modules are plug-and-play: they apply not only to learning-based primitive detectors but can also be combined with other point cloud processing tasks, such as edge extraction or random sample consensus (RANSAC), to achieve high-fidelity results.
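To illustrate the RANSAC-style primitive detection the framework can build on, the sketch below fits a single plane to a noisy point cloud by repeated minimal sampling. It is a generic sketch under stated assumptions (function name `ransac_plane`, the tolerance, iteration count, and toy data are all illustrative), not the paper's learned detector.

```python
import random

def ransac_plane(points, iters=200, tol=0.05, seed=0):
    """Minimal RANSAC plane fit: return the largest inlier set found."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        p, q, r = rng.sample(points, 3)
        # Plane normal = (q - p) x (r - p).
        u = [q[i] - p[i] for i in range(3)]
        v = [r[i] - p[i] for i in range(3)]
        n = [u[1]*v[2] - u[2]*v[1],
             u[2]*v[0] - u[0]*v[2],
             u[0]*v[1] - u[1]*v[0]]
        norm = sum(c * c for c in n) ** 0.5
        if norm < 1e-12:
            continue  # degenerate (collinear) sample
        n = [c / norm for c in n]
        d = -sum(n[i] * p[i] for i in range(3))
        # Inliers: points within tol of the candidate plane.
        inliers = [pt for pt in points
                   if abs(sum(n[i] * pt[i] for i in range(3)) + d) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Toy cloud: a 5x4 grid on the plane z = 0, plus 5 elevated outliers.
pts = [(float(i % 5), float(i // 5), 0.0) for i in range(20)]
pts += [(float(i), float(i), 5.0) for i in range(5)]
best = ransac_plane(pts)
```

In a full pipeline, detected primitives like this plane would then be meshed per patch and split against their neighbors at the triangle level.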
Traditional multi-view stereo (MVS) is ill-suited to point cloud reconstruction from serialized video frames: exhaustive feature extraction and matching over all prepared frames is time-consuming, and the search must cover all key frames. In this paper, we propose a novel serialized reconstruction method to solve these issues. Specifically, a covisibility cluster generation strategy based on joint feature descriptors is designed to accelerate feature matching and improve pose estimation. A serialized structure-from-motion (SfM) and dense point cloud reconstruction framework is then designed to achieve highly efficient reconstruction of serialized frames at competitive precision. To fully demonstrate the superiority of our method, we collect a public aerial sequence dataset with reference ground truth for evaluating dense point cloud reconstruction. Through a time complexity analysis and experimental validation on this dataset, the comprehensive performance of our algorithm proves better than that of the other outstanding methods compared.
Real-time large-scale point cloud segmentation is an important but challenging task for practical applications such as remote sensing and robotics. Existing real-time methods have achieved acceptable performance by aggregating local information. However, most of them exploit local spatial geometric or semantic information only independently, and few consider the complementarity of both. In this paper, we propose the Spatial–Semantic Incorporation Network (SSI-Net) for real-time large-scale point cloud segmentation. A Spatial-Semantic Cross-correction (SSC) module is introduced in SSI-Net as a basic unit. High-quality contextual features are learned through SSC by correcting and updating high-level semantic information using spatial geometric cues, and vice versa. Adopting the plug-and-play SSC module, we design SSI-Net as an encoder–decoder architecture. To ensure efficiency, it also adopts a hierarchical network structure based on random sampling. Extensive experiments on several prevalent indoor and outdoor point cloud semantic segmentation datasets demonstrate that the proposed approach achieves state-of-the-art performance.
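The cross-correction idea can be sketched in miniature: each branch gates the other's features with a sigmoid of its own and adds the gated result as a correction. This is a deliberately simplified, hypothetical scalar-per-channel toy (the function `cross_correct` and its gating rule are assumptions); the actual SSC module operates on learned point features inside the network.

```python
import math

def cross_correct(spatial, semantic):
    """Toy spatial-semantic cross-correction on two feature vectors:
    each branch is corrected by a sigmoid-gated copy of the other."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    # Semantic features corrected by spatial geometric cues.
    sem_out = [se + sigmoid(sp) * sp for sp, se in zip(spatial, semantic)]
    # Spatial features corrected by semantic cues (the "vice versa" path).
    spa_out = [sp + sigmoid(se) * se for sp, se in zip(spatial, semantic)]
    return spa_out, sem_out

# One-channel example: a flat spatial cue (0.0) and a strong semantic cue (1.0).
spa, sem = cross_correct([0.0], [1.0])
```

The point of the bidirectional gating is that neither branch is treated as ground truth: each is softly corrected by the other, which is the complementarity the abstract refers to.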
This paper presents an effective framework for correspondence field estimation. The core idea is to perform patch matching at both the pixel and superpixel levels to achieve highly accurate estimation with fast computation. To this end, a hybrid edge-preserving support weighting approach is first developed, which improves performance at the pixel level, especially in regions of fine structure. A local Minimum Spanning Tree (MST) is then constructed to describe regions and derive adaptive smoothness penalty weights, so that over-patching in large textureless regions is effectively avoided. The MST is further extended to handle occlusions through an edge-preserving strategy. Finally, all of the above treatments are collected into an optimization model whose objective function is formulated as a Markov Random Field (MRF), and a fast yet efficient iterative optimization strategy is developed for its computation. Our approach ranks favorably on the optical flow benchmark, placing in the top two for endpoint error and the top four for angular error among more than 130 approaches listed on the benchmark webpage.
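The MST construction at the heart of the region description can be sketched with Kruskal's algorithm over a weighted pixel graph; the paper's specific edge weights and penalty derivation are not reproduced here, so the `mst` function and the toy 4-node graph are illustrative assumptions.

```python
def mst(n, edges):
    """Kruskal's minimum spanning tree.

    n: number of nodes; edges: list of (weight, u, v) tuples.
    Returns the list of tree edges. Over a local pixel graph, a tree like
    this gives the region structure on which adaptive smoothness penalty
    weights can be defined."""
    parent = list(range(n))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):  # greedily take the cheapest safe edges
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# Toy graph: 4 pixels; weights could be, e.g., color differences.
tree = mst(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (1, 2, 3)])
```

Because the tree keeps only the cheapest connections, smoothness weights derived along its edges naturally stay within visually coherent regions and weaken across strong edges.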
Existing semantic modeling methods for architecture in complex 3D urban scenes continue to face difficulties such as limited training data, lack of semantic information, and inflexible model processing. Focusing on extracting accurate semantic information and incorporating it into the modeling process, this work presents a framework for lightweight building modeling that combines point cloud semantic segmentation with 3D feature line detection constrained by geometric and photometric consistency. The main steps are: (1) extraction of single buildings from point clouds using 2D-3D semi-supervised semantic segmentation under photometric and geometric constraints; (2) generation of lightweight building models using 3D plane-constrained multi-view feature line extraction and optimization; and (3) introduction of detailed semantics of building elements into independent 3D building models using fine-grained segmentation of multi-view images, achieving high-accuracy lightweight architectural modeling with fine-grained semantic information. Experimental results demonstrate that the framework can perform independent lightweight modeling of each building on point clouds at various scales and scenes, with accurate geometric appearance details and realistic textures. It also enables independent processing and analysis of each building in the scene, making the models more useful in practical applications.