Reconstruction and expansion, as well as asset management, of highways necessitate a current and highly precise 3D pavement model. Current inverse modeling methods based on point clouds are laborious, time-consuming, and limited in precision. This article introduces an alternative framework for parametric inverse procedural modeling of highway pavement with standardized alignments, seamlessly integrated with off-the-shelf modeling software. It comprises three key steps. (1) Extraction of highway pavement boundaries and lane markings: we combine grid-based and model-driven methods, followed by line-structure-based clustering, to accurately generate road centerlines and layouts. (2) Road centerline generation: the centerline, derived from lane markings, informs highway alignments and parameters based on geometric characteristics such as curvature and slope; cost functions are used to facilitate this process. (3) Novel inverse procedural assembly: this step integrates off-the-shelf modeling software by extracting vector lines from point clouds and applying constraints at pivotal points on highway pavement cross-sections. The focus is on refined component-level modeling, allowing diverse highway elements to be assembled. This method significantly reduces human intervention and achieves high precision. In tests on two highway datasets from Sichuan Province, China, the method achieved excellent results, attaining an average correctness of 98.63% and completeness of 99.66% within a 10 cm error margin. A comparison with the intersection point method indicated minimal errors, with maximum values below 1.2%. The resulting 3D highway pavement model is modular and accurate at the centimeter level.
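To illustrate the curvature-driven alignment analysis the abstract describes, here is a minimal, hypothetical sketch, not the authors' implementation, of estimating discrete curvature along a 2D centerline polyline via the circumscribed-circle formula k = 4A / (|ab||bc||ca|):

```python
import numpy as np

def discrete_curvature(xy):
    """Estimate unsigned curvature at interior vertices of a 2D polyline
    using the circumscribed circle through each vertex triple."""
    a, b, c = xy[:-2], xy[1:-1], xy[2:]
    ab = np.linalg.norm(b - a, axis=1)
    bc = np.linalg.norm(c - b, axis=1)
    ca = np.linalg.norm(c - a, axis=1)
    # cross product gives twice the triangle area
    cross = (b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1]) \
          - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0])
    return 2.0 * np.abs(cross) / (ab * bc * ca + 1e-12)

# Vertices on a circle of radius 10 -> curvature 1/10 everywhere.
t = np.linspace(0, np.pi / 2, 50)
circle = np.c_[10 * np.cos(t), 10 * np.sin(t)]
k = discrete_curvature(circle)
```

Runs of near-zero curvature would mark candidate straight alignments, while near-constant nonzero curvature suggests a circular arc; a cost function can then fit standardized alignment parameters to each run.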
Digital twins (DTs) have proven useful in manufacturing, construction, and maintenance. When adapting DTs to serve cities, the question arises of what an urban digital twin should contain, how it should be orchestrated to serve a city’s dynamic ecosystem, and how it can enhance the efficiency of the city. We align with the common view that the main advantage of using DTs is economic: for example, DTs can improve the planning of activities, saving money and time. But how can they be useful for a city? Instead of treating DTs as solutions in search of problems to solve, we start from city needs. Our approach is twofold. We begin by briefly reviewing existing possibilities for meeting some specific needs, but keep the focus on identifying, and attempting to close, the gap between the needs arising from everyday city functions and the latest DT techniques useful for meeting those needs. DTs are technically diverse and serve different applications, yet they share a common identity and name, as well as several technical similarities. Adopting computer science terminology, we see a back-end city DT as the container of all information, while any single front-end, visualized or used either by humans or robots, offers a limited but meaningful representation of the DT for a specific application. There remain multiple open questions regarding the realization and benefits of such a back-end DT. Nevertheless, we discuss how the back-end DT (or any specific DT) could be updated autonomously from sensor data using artificial intelligence techniques, and how the front-ends could bring large benefits to the entire city ecosystem.
•There is a call to better match DT technology to overall city needs.
•City DTs differ from DTs used in manufacturing, construction, and maintenance.
•Differences include both technical (BIM-GIS) and human-factor-induced complexities.
•Novel AI methods could serve in automated updating of city DTs from sensor data.
•Human factors and the inclusion of third parties need to be considered for city DTs.
Three-dimensional semantic segmentation is the foundation for automatically creating enriched Digital Twin Cities (DTCs) and their updates. For this task, prior-level fusion approaches show more promising results than other fusion levels. This article proposes a new approach by developing and benchmarking three prior-level fusion scenarios to enhance the outcomes of point-cloud-enriched semantic segmentation. These scenarios were compared with a baseline approach that used the point cloud only. In each scenario, specific prior knowledge (geometric features, classified images, or classified geometric information) and aerial images were fused with the point cloud data in the neural network’s learning pipeline. The goal was to identify the scenario that most profoundly enhanced the neural network’s knowledge. Two deep learning techniques, “RandLaNet” and “KPConv”, were adopted, and their parameters were modified for the different scenarios. Efficient feature engineering and selection for the fusion step facilitated the learning process and improved the semantic segmentation results. Our contribution provides a good solution for addressing some challenges, particularly the more accurate extraction of semantically rich objects from the urban environment. The experimental results demonstrate that Scenario 1 has higher precision (88%) on the SensatUrban dataset than the baseline approach (71%), the Scenario 2 approach (85%), and the Scenario 3 approach (84%). Furthermore, the qualitative results obtained by the first scenario are close to the ground truth. It was therefore identified as the most efficient fusion approach for point-cloud-enriched semantic segmentation, which we have named the efficient prior-level fusion (Efficient-PLF) approach.
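The core of prior-level fusion can be sketched very simply: per-point prior features are concatenated with the raw inputs before they enter the network. The sketch below is an assumption about the general scheme (with made-up feature dimensions), not the paper's exact pipeline:

```python
import numpy as np

def fuse_priors(xyz, rgb, prior):
    """Prior-level fusion: stack per-point coordinates, image colours, and
    precomputed prior features into one input tensor for the network."""
    assert xyz.shape[0] == rgb.shape[0] == prior.shape[0]
    return np.hstack([xyz, rgb, prior])

pts   = np.random.rand(1000, 3)   # N x 3 coordinates
cols  = np.random.rand(1000, 3)   # N x 3 colours from aerial imagery
geom  = np.random.rand(1000, 4)   # N x 4 hypothetical geometric features
x = fuse_priors(pts, cols, geom)  # N x 10 network input
```

The network (e.g., RandLaNet or KPConv) then only needs its input channel count adjusted to consume the fused tensor.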
A Survey of Visual Transformers
Liu, Yang; Zhang, Yao; Wang, Yixin ...
IEEE Transactions on Neural Networks and Learning Systems, 06/2024, Volume 35, Issue 6
Journal Article · Open Access
Transformer, an attention-based encoder-decoder model, has already revolutionized the field of natural language processing (NLP). Inspired by such significant achievements, pioneering works have recently employed Transformer-like architectures in the computer vision (CV) field, demonstrating their effectiveness on three fundamental CV tasks (classification, detection, and segmentation) as well as multiple sensory data streams (images, point clouds, and vision-language data). Because of their competitive modeling capabilities, visual Transformers have achieved impressive performance improvements over multiple benchmarks compared with modern convolutional neural networks (CNNs). In this survey, we comprehensively review over 100 different visual Transformers according to three fundamental CV tasks and different data stream types, and propose a taxonomy to organize the representative methods according to their motivations, structures, and application scenarios. Because of their differences in training settings and dedicated vision tasks, we also evaluate and compare all these existing visual Transformers under different configurations. Furthermore, we reveal a series of essential but unexploited aspects that may empower visual Transformers to stand out from numerous architectures, e.g., slack high-level semantic embeddings to bridge the gap between the visual Transformers and the sequential ones. Finally, two promising research directions are suggested for future investigation. We will continue to update the latest articles and their released source code at https://github.com/liuyang-ict/awesome-visual-transformers .
Recently, the advancement of deep learning (DL) in discriminative feature learning from 3-D LiDAR data has led to rapid development in the field of autonomous driving. However, automated processing of uneven, unstructured, noisy, and massive 3-D point clouds is a challenging and tedious task. In this article, we provide a systematic review of existing compelling DL architectures applied to LiDAR point clouds, detailing specific tasks in autonomous driving, such as segmentation, detection, and classification. Although several published research articles focus on specific topics in computer vision for autonomous vehicles, to date, no general survey on DL applied to LiDAR point clouds for autonomous vehicles exists. Thus, the goal of this article is to narrow that gap. More than 140 key contributions from the past five years are summarized in this survey, including the milestone 3-D deep architectures; the remarkable DL applications in 3-D semantic segmentation, object detection, and classification; specific datasets; evaluation metrics; and state-of-the-art performance. Finally, we conclude with the remaining challenges and directions for future research.
Point clouds, as a form of 3D object representation, are the most primitive outputs obtained from 3D sensors. Unlike 2D images, point clouds are disordered and unstructured, so classification techniques such as convolutional neural networks are not directly applicable to point cloud analysis. To solve this problem, we propose a novel network for extracting point cloud features, named the attention-based graph convolutional network (AGCN). Treating the learning process as message propagation between adjacent points, we introduce an attention mechanism to construct a point attention layer that analyzes the relationships among local point features. Object classification is implemented by stacking multiple point attention layers. In addition, the proposed network is extended to an attention-based encoder-decoder structure for segmentation tasks. We also introduce an additional global graph structure network to compensate for the relative location information of individual points in the graph structure network. Experimental results show that our network has lower computational complexity and faster convergence. Compared with existing methods, the proposed network achieves comparable performance in classification and segmentation tasks.
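The point attention idea can be sketched as follows; this toy NumPy version (with made-up shapes and a plain dot-product score, not the AGCN architecture itself) computes softmax attention weights over each point's k neighbours and aggregates their features:

```python
import numpy as np

def point_attention(feats, neighbors):
    """Toy point-attention aggregation: each point attends over its k
    neighbours' features with softmax weights from a dot-product score."""
    nbr = feats[neighbors]                      # N x k x C neighbour features
    score = np.einsum('nc,nkc->nk', feats, nbr) # N x k attention scores
    score -= score.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(score)
    w /= w.sum(axis=1, keepdims=True)           # softmax over neighbours
    return np.einsum('nk,nkc->nc', w, nbr)      # weighted feature sum

feats = np.random.randn(100, 8)                    # 100 points, 8 channels
nbrs = np.random.randint(0, 100, size=(100, 4))    # 4 neighbours per point
out = point_attention(feats, nbrs)
```

A full layer would add learned projections for queries, keys, and values; stacking several such layers yields the classification backbone the abstract describes.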
Registration of partially overlapping, featureless three-dimensional (3D) point sets with noise is a difficult problem in large-scale metrology (LSM) applications. Existing approaches use the sparse iterative closest points (SICP) method, applying the lp norm to decrease the influence of outliers during registration. However, we reveal in this study that the sparse point-to-point metric easily becomes trapped in local minima when registering featureless point clouds, and the error landscape of the sparse point-to-plane metric is too shallow to restrain sliding, owing to the lack of constraints in large flat areas. Moreover, point clouds sampled from large flat areas make the linear system used to estimate the transformation matrix rank-deficient. Hence, we propose using point-to-point lp distance constraints to restrain sliding along large flat areas. We further define a weighted enhanced lp distance (WELD) error metric to slacken the constraints and escape from local minima. WELD also improves stability by keeping the linear system full-rank when estimating the transformation matrix. To verify the capability of escaping from local minima and restraining sliding, we compare our method with SICP and two other algorithms on simulated and real point clouds. The comparisons show that our method can successfully escape from local minima and restrain sliding, handling outliers and noisy featureless point clouds effectively. The source code is available at https://github.com/Timbersaw-wangzw/WES-ICP.
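The lp-norm robustness that SICP-style methods rely on is typically realized via iteratively reweighted least squares (IRLS). The sketch below shows the standard IRLS per-correspondence weights for minimizing Σ‖r_i‖^p with 0 < p < 2; it illustrates the general mechanism, not the WELD metric itself:

```python
import numpy as np

def lp_irls_weights(residuals, p=0.4, eps=1e-6):
    """IRLS weights for minimising sum ||r_i||^p (0 < p < 2):
    w_i = ||r_i||^(p-2), so large residuals (likely outliers or
    non-overlapping regions) are strongly down-weighted."""
    d = np.linalg.norm(residuals, axis=1)
    return (d + eps) ** (p - 2.0)

r = np.array([[0.01, 0.0, 0.0],   # small residual -> large weight
              [1.0,  0.0, 0.0]])  # large residual -> small weight
w = lp_irls_weights(r)
```

Each ICP iteration would recompute these weights and solve the resulting weighted least-squares problem for the rigid transform.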
In this study, we propose a high-throughput, low-cost automatic detection method based on deep learning to replace the inefficient manual counting of rapeseed siliques. First, a video is captured with a smartphone around the rapeseed plants in the silique stage. Feature point detection and matching based on SIFT operators are applied to the extracted video frames, and sparse point clouds are recovered using epipolar geometry and triangulation principles. The depth map is obtained by calculating the disparity of the matched images, and the dense point cloud is fused. The model of the whole rapeseed plant in the silique stage is reconstructed with the structure-from-motion (SfM) algorithm, and the background is removed using a passthrough filter. The downsampled 3D point cloud data are processed by the DGCNN network, which divides the point cloud into two categories: sparse rapeseed canopy siliques and rapeseed stems. The sparse canopy siliques are then segmented from the original whole-plant silique point cloud using a sparse-dense point cloud mapping method, which effectively saves running time and improves efficiency. Finally, Euclidean clustering segmentation is performed on the rapeseed canopy siliques, and the RANSAC algorithm is used to fit line segments to the connected siliques after clustering, yielding the three-dimensional spatial position of each silique and the silique count. The proposed method was applied to identify 1457 siliques from 12 rapeseed plants, and the experimental results showed a recognition accuracy greater than 97.80%. The proposed method achieved good results in rapeseed silique recognition and provides a useful example of applying deep learning networks to dense 3D point cloud segmentation.
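The Euclidean clustering step can be sketched as a flood fill over a distance threshold; this illustrative NumPy version (not the paper's implementation, and O(n²) for clarity) labels connected groups of points so each cluster can then be fitted separately:

```python
import numpy as np

def euclidean_cluster(pts, radius):
    """Greedy Euclidean clustering: points closer than `radius` to any
    member of a cluster join that cluster (flood fill)."""
    n = len(pts)
    labels = -np.ones(n, dtype=int)
    cur = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = cur
        while stack:
            i = stack.pop()
            near = np.where((np.linalg.norm(pts - pts[i], axis=1) < radius)
                            & (labels == -1))[0]
            labels[near] = cur
            stack.extend(near.tolist())
        cur += 1
    return labels

# Two well-separated blobs should yield exactly two clusters.
blob1 = np.random.rand(50, 3) * 0.1
blob2 = np.random.rand(50, 3) * 0.1 + 5.0
labels = euclidean_cluster(np.vstack([blob1, blob2]), radius=0.5)
n_clusters = labels.max() + 1
```

In practice a k-d tree accelerates the radius queries; RANSAC line fitting is then applied per cluster to split touching siliques.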
Digitalization of Nuclear Power Plants (NPPs) is critical for their safe and effective operation and maintenance. Development of Digital Twins (DTs) of NPP legacy assets and subsystems is key to achieving this goal. Doing this effectively requires a framework for intelligent allocation of limited resources. Such a framework is developed here by synthesizing emerging best practices with NPP operators' needs for legacy asset management. Within the framework, a pipeline employs deep-learning object detection to read and locate equipment tags in images, computes their locations in the corresponding 3D point clouds, and then relates that data to an asset management system. The pipeline is premised on preserving and augmenting existing NPP asset management processes, which preclude options such as RFID tags or barcodes. It is a significant step toward more efficient development of DTs of legacy assets. The contributions are framed in the context of a typical Canadian legacy NPP.
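Relating a detected tag to the 3D point cloud can be sketched by back-projecting the detection's centre pixel through the camera intrinsics; intersecting the resulting ray with the registered cloud localises the tag. The following illustration uses a hypothetical intrinsic matrix and is not the paper's pipeline:

```python
import numpy as np

def bbox_center_ray(bbox, K):
    """Back-project the centre pixel of a tag bounding box
    (x_min, y_min, x_max, y_max) into a unit viewing ray in camera
    coordinates using intrinsic matrix K."""
    u = (bbox[0] + bbox[2]) / 2.0
    v = (bbox[1] + bbox[3]) / 2.0
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return ray / np.linalg.norm(ray)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.,   0., 320.],
              [  0., 500., 240.],
              [  0.,   0.,   1.]])
ray = bbox_center_ray([300, 220, 340, 260], K)
```

A box centred on the principal point maps to the optical axis; in the full pipeline the ray would be transformed into world coordinates and matched against the nearest cloud points.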
•A framework that defines the key aspects of legacy assets for Digital Twins.
•Automatic link between photographic records and asset management software.
•Performance of tag detection and optical character recognition on real images.
•Pipeline for efficient linking of assets in point clouds to management systems.
•Identification of asset locations using legacy asset tags in nuclear power plants.
•A novel pillar-based feature encoder named PSCFE is proposed in the paper.
•A length-adaptive RNN-based module is designed to deal with the inhomogeneity.
•ASCNet achieves competitive performance on the KITTI dataset.
This paper presents a novel two-stage 3D point cloud object detector named ASCNet for autonomous driving. Most current works project 3D point clouds into 2D space, where quantization loss in the transformation is inevitable. A Pillar-wise Spatial Context Feature Encoding (PSCFE) module is proposed to drive the learning of discriminative features and reduce the loss of detailed information. Inhomogeneity in 3D object detection from point clouds, such as the inconsistent number of points per pillar and the diverse sizes of Regions of Interest (RoI), must be handled carefully owing to the sparsity and individual specificity of the data. We introduce a length-adaptive RNN-based module to address this inhomogeneity. A novel backbone combining an encoder-decoder structure with shortcut connections is designed to learn multi-scale features for 3D object detection. Additionally, we employ multiple RoI heads and class-wise NMS to deal with the class imbalance in scenes. Extensive experiments on the KITTI dataset demonstrate that our algorithm achieves competitive performance in 3D bounding box detection and BEV detection.
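Pillar-wise encoding of the kind the abstract describes can be sketched as bucketing points into a bird's-eye-view grid and pooling features per pillar; this minimal NumPy version (illustrative only, not PSCFE, with made-up grid sizes) uses max pooling:

```python
import numpy as np

def pillarize(pts, feats, grid=(4, 4), extent=1.0):
    """Minimal pillar encoding: bucket points into a BEV grid of pillars
    and max-pool their features per pillar, producing a dense 2D feature
    map that a conventional CNN backbone can consume."""
    H, W = grid
    ix = np.clip((pts[:, 0] / extent * W).astype(int), 0, W - 1)
    iy = np.clip((pts[:, 1] / extent * H).astype(int), 0, H - 1)
    C = feats.shape[1]
    bev = np.full((H, W, C), -np.inf)
    for x, y, f in zip(ix, iy, feats):
        bev[y, x] = np.maximum(bev[y, x], f)  # per-pillar max pooling
    bev[np.isinf(bev)] = 0.0                  # empty pillars get zeros
    return bev

pts = np.random.rand(200, 3)     # points in a unit extent
feats = np.random.rand(200, 16)  # per-point features (e.g., learned)
bev = pillarize(pts, feats)
```

The inconsistent number of points per pillar visible here is exactly the inhomogeneity the paper's length-adaptive RNN-based module is designed to handle.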