Deep Learning for 3D Point Clouds: A Survey
Guo, Yulan; Wang, Hanyun; Hu, Qingyong; ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 43, Issue 12, 1 December 2021
Journal Article
Peer reviewed
Open access
Point cloud learning has lately attracted increasing attention due to its wide applications in many areas, such as computer vision, autonomous driving, and robotics. As a dominating technique in AI, deep learning has been successfully used to solve various 2D vision problems. However, deep learning on point clouds is still in its infancy due to the unique challenges faced in processing point clouds with deep neural networks. Recently, deep learning on point clouds has been thriving, with numerous methods proposed to address different problems in this area. To stimulate future research, this paper presents a comprehensive review of recent progress in deep learning methods for point clouds. It covers three major tasks: 3D shape classification, 3D object detection and tracking, and 3D point cloud segmentation. It also presents comparative results on several publicly available datasets, together with insightful observations and inspiring future research directions.
3D object recognition in cluttered scenes is a rapidly growing research area. Based on the type of features used, 3D object recognition methods can broadly be divided into two categories: global feature based or local feature based methods. Intensive research has been done on local surface feature based methods as they are more robust to the occlusion and clutter frequently present in real-world scenes. This paper presents a comprehensive survey of existing local surface feature based 3D object recognition methods. These methods generally comprise three phases: 3D keypoint detection, local surface feature description, and surface matching. This paper covers an extensive literature survey of each phase of the process. It also lists a number of popular and contemporary databases together with their relevant attributes.
Recognizing 3D objects in the presence of noise, varying mesh resolution, occlusion and clutter is a very challenging task. This paper presents a novel method named Rotational Projection Statistics (RoPS). It has three major modules: local reference frame (LRF) definition, RoPS feature description, and 3D object recognition. We propose a novel technique to define the LRF by calculating the scatter matrix of all points lying on the local surface. RoPS feature descriptors are obtained by rotationally projecting the neighboring points of a feature point onto 2D planes and calculating a set of statistics (including low-order central moments and entropy) of the distribution of these projected points. Using the proposed LRF and RoPS descriptor, we present a hierarchical 3D object recognition algorithm. The performance of the proposed LRF, RoPS descriptor and object recognition algorithm was rigorously tested on a number of popular and publicly available datasets. Our proposed techniques exhibited superior performance compared to existing techniques. We also showed that our method is robust with respect to noise and varying mesh resolution. Our RoPS based algorithm achieved recognition rates of 100%, 98.9%, 95.4% and 96.0%, respectively, when tested on the Bologna, UWA, Queen’s and Ca’ Foscari Venezia Datasets.
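A minimal sketch of the two RoPS ingredients named in this abstract: an LRF from the scatter matrix of the local surface, and low-order central moments plus entropy of rotationally projected points. The bin count, number of rotations, and choice of statistics here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def local_reference_frame(neighbors):
    """LRF axes as eigenvectors of the neighborhood scatter matrix."""
    centered = neighbors - neighbors.mean(axis=0)
    scatter = centered.T @ centered / len(neighbors)
    _, vecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    return vecs[:, ::-1]                # principal axis first

def projection_statistics(points2d, bins=8):
    """Central moments and entropy of a 2D point distribution."""
    hist, _, _ = np.histogram2d(points2d[:, 0], points2d[:, 1], bins=bins)
    p = hist / max(hist.sum(), 1)
    i, j = np.indices(p.shape)
    m10, m01 = (i * p).sum(), (j * p).sum()
    mu11 = ((i - m10) * (j - m01) * p).sum()       # low-order central moments
    mu20 = (((i - m10) ** 2) * p).sum()
    mu02 = (((j - m01) ** 2) * p).sum()
    entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()
    return np.array([mu11, mu20, mu02, entropy])

def rops_descriptor(neighbors, rotations=3):
    """Rotate LRF-aligned neighbors and collect projection statistics."""
    lrf = local_reference_frame(neighbors)
    local = (neighbors - neighbors.mean(axis=0)) @ lrf
    feats = []
    for k in range(rotations):                     # rotations about the x axis
        theta = 2 * np.pi * k / rotations
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
        rotated = local @ R.T
        feats.append(projection_statistics(rotated[:, 1:]))  # project to yz
    return np.concatenate(feats)
```

The full method projects onto several coordinate planes per rotation axis; this sketch keeps one plane per rotation for brevity.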
A number of 3D local feature descriptors have been proposed in the literature. It is, however, unclear which descriptors are more appropriate for a particular application. A good descriptor should be descriptive, compact, and robust to a set of nuisances. This paper compares ten popular local feature descriptors in the contexts of 3D object recognition, 3D shape retrieval, and 3D modeling. We first evaluate the descriptiveness of these descriptors on eight popular datasets which were acquired using different techniques. We then analyze their compactness using the recall of feature matching per float value in the descriptor. We also test the robustness of the selected descriptors with respect to support radius variations, Gaussian noise, shot noise, varying mesh resolution, distance to the mesh boundary, keypoint localization error, occlusion, clutter, and dataset size. Moreover, we present the performance results of these descriptors when combined with different 3D keypoint detection methods. We finally analyze the computational efficiency for generating each descriptor.
Several bandwise total variation (TV) regularized low-rank (LR)-based models have been proposed to remove mixed noise in hyperspectral images (HSIs). These methods convert high-dimensional HSI data into 2-D data based on LR matrix factorization. This strategy introduces the loss of useful multiway structure information. Moreover, these bandwise TV-based methods exploit the spatial information in a separate manner. To cope with these problems, we propose a spatial-spectral TV regularized LR tensor factorization (SSTV-LRTF) method to remove mixed noise in HSIs. From one aspect, the hyperspectral data are assumed to lie in an LR tensor, which can exploit the inherent tensorial structure of hyperspectral data. The LRTF-based method can effectively separate the LR clean image from sparse noise. From another aspect, HSIs are assumed to be piecewise smooth in the spatial domain. The TV regularization is effective in preserving the spatial piecewise smoothness and removing Gaussian noise. These facts inspire the integration of the LRTF with TV regularization. To address the limitations of bandwise TV, we use the SSTV regularization to simultaneously consider local spatial structure and spectral correlation of neighboring bands. Both simulated and real data experiments demonstrate that the proposed SSTV-LRTF method achieves superior performance for HSI mixed-noise removal, as compared to the state-of-the-art TV regularized and LR-based methods.
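The combined model this abstract describes can be sketched as follows. This is a representative objective for the SSTV-regularized low-rank family, not necessarily the paper's exact formulation; the trade-off parameters $\lambda$, $\beta$ and the constraint form are assumptions.

```latex
\min_{\mathcal{L},\,\mathcal{S}} \;
  \lambda \,\|\mathcal{L}\|_{\mathrm{SSTV}}
  + \beta \,\|\mathcal{S}\|_{1}
\quad \text{s.t.} \quad
  \mathcal{Y} = \mathcal{L} + \mathcal{S} + \mathcal{N},
\qquad
\|\mathcal{L}\|_{\mathrm{SSTV}}
  = \|D_x \mathcal{L}\|_1 + \|D_y \mathcal{L}\|_1 + \|D_z \mathcal{L}\|_1,
```

where $\mathcal{Y}$ is the observed HSI tensor, $\mathcal{L}$ the low-rank clean image, $\mathcal{S}$ sparse noise, $\mathcal{N}$ Gaussian noise, and $D_x, D_y, D_z$ are finite-difference operators along the two spatial dimensions and the spectral dimension; differencing along $z$ is what distinguishes SSTV from bandwise TV.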
Camera arrays provide spatial and angular information within a single snapshot. With refocusing methods, focal planes can be altered after exposure. In this letter, we propose a light field refocusing method to improve the imaging quality of camera arrays. In our method, the disparity is first estimated. Then, the unfocused region (bokeh) is rendered by using a depth-based anisotropic filter. Finally, the refocused image is produced by a reconstruction-based superresolution approach where the bokeh image is used as a regularization term. Our method can selectively refocus images, with the focused region superresolved and the bokeh esthetically rendered. Our method also enables post-adjustment of depth of field. We conduct experiments on both public and self-developed datasets. Our method achieves superior visual performance with acceptable computational cost as compared to the other state-of-the-art methods.
Range image registration is a fundamental research topic for 3D object modeling and recognition. In this paper, we propose an accurate and robust algorithm for pairwise and multi-view range image registration. We first extract a set of Rotational Projection Statistics (RoPS) features from a pair of range images, and perform feature matching between them. The two range images are then registered using a transformation estimation method and a variant of the Iterative Closest Point (ICP) algorithm. Based on the pairwise registration algorithm, we propose a shape growing based multi-view registration algorithm. The seed shape is initialized with a selected range image and then sequentially updated by performing pairwise registration between itself and the input range images. All input range images are iteratively registered during the shape growing process. Extensive experiments were conducted to test the performance of our algorithm. The proposed pairwise registration algorithm is accurate, and robust to small overlaps, noise and varying mesh resolutions. The proposed multi-view registration algorithm is also very accurate. Rigorous comparisons with the state-of-the-art show the superiority of our algorithm.
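The pairwise stage this abstract describes (correspondences, then transformation estimation, then ICP refinement) can be sketched with a standard SVD-based rigid-transform solver and plain point-to-point ICP. This is a generic illustration, not the RoPS-based pipeline itself; the paper uses RoPS feature matches for correspondences and an ICP variant, and all names below are illustrative.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch/SVD) from matched point pairs."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    H = (src - sc).T @ (dst - dc)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    t = dc - R @ sc
    return R, t

def icp_refine(src, dst, iters=20):
    """Plain point-to-point ICP with brute-force nearest neighbors."""
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matches = dst[d2.argmin(axis=1)]        # closest point in dst for each
        R, t = estimate_rigid_transform(cur, matches)
        cur = cur @ R.T + t
    return cur
```

In practice the feature-matching step supplies the coarse alignment, so ICP only has to refine a nearby pose; without that initialization, point-to-point ICP can fall into local minima.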
This paper studies the hyperspectral image (HSI) denoising problem under the assumption that the signal is low in rank. In this paper, a mixture of Gaussian noise and sparse noise is considered. The sparse noise includes stripes, impulse noise, and dead pixels. The denoising task is formulated as a low-rank tensor recovery (LRTR) problem from Gaussian noise and sparse noise. Traditional low-rank tensor decomposition methods are generally NP-hard to compute. Besides, these tensor decomposition based methods are sensitive to sparse noise. In contrast, the proposed LRTR method can preserve the global structure of HSIs and simultaneously remove Gaussian noise and sparse noise. The proposed method is based on a new tensor singular value decomposition and tensor nuclear norm. The NP-hard tensor recovery task is well accomplished by polynomial time algorithms. The convergence of the algorithm and the parameter settings are also described in detail. Preliminary numerical experiments have demonstrated that the proposed method is effective for low-rank tensor recovery from Gaussian noise and sparse noise. Experimental results also show that the proposed LRTR method outperforms other denoising algorithms on real corrupted hyperspectral data.
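A hedged sketch of the recovery model this abstract describes, written in the style of tensor robust PCA with the tensor nuclear norm; the exact constraint form and the parameter names $\lambda$, $\delta$ are assumptions.

```latex
\min_{\mathcal{L},\,\mathcal{S}} \;
  \|\mathcal{L}\|_{\mathrm{TNN}} + \lambda \,\|\mathcal{S}\|_{1}
\quad \text{s.t.} \quad
  \|\mathcal{Y} - \mathcal{L} - \mathcal{S}\|_F \le \delta,
```

where $\|\cdot\|_{\mathrm{TNN}}$ is the tensor nuclear norm induced by the tensor singular value decomposition, $\mathcal{S}$ captures the stripes, impulse noise, and dead pixels, and $\delta$ bounds the Gaussian-noise energy. The convex surrogate is what makes the otherwise NP-hard low-rank recovery solvable in polynomial time.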
The goal of ground-to-aerial image geo-localization is to determine the location of a ground query image by matching it against a reference database consisting of aerial/satellite images. This task is highly challenging due to the large appearance difference caused by extreme changes in viewpoint and orientation. In this work, we show that the training difficulty is an important cue that can be leveraged to improve metric learning on cross-view images. More specifically, we propose a new Soft Exemplar Highlighting (SEH) loss to achieve online soft selection of exemplars. Adaptive weights are generated for exemplars by measuring their associated training difficulty using distance rectified logistic regression. These weights are then constrained to remove simple exemplars from training and to truncate the large weights of extremely hard exemplars, so as to avoid becoming trapped in a local optimum. We further use the proposed SEH loss to train two mainstream convolutional neural networks for ground-to-aerial image-based geo-localization. Experimental results on two benchmark cross-view image datasets demonstrate that the proposed method achieves significant improvements in feature discriminativeness and outperforms the state-of-the-art image-based geo-localization methods.
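The exemplar-weighting idea can be sketched as below. This is an illustrative reading of the abstract, not the authors' formulation: the logistic-of-rectified-distance form, the slope, and the removal/truncation thresholds (margin, alpha, w_min, w_max) are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def seh_weights(neg_distances, margin=0.5, alpha=10.0, w_min=0.05, w_max=0.9):
    """Soft exemplar weights: a logistic function of the margin-rectified
    embedding distance to the negative exemplar.  Closer (harder) negatives
    get larger weights; easy exemplars (weight below w_min) are removed from
    training, and extremely hard ones are truncated at w_max."""
    w = sigmoid(alpha * (margin - neg_distances))
    return np.where(w < w_min, 0.0, np.minimum(w, w_max))
```

With these illustrative settings, a negative at distance 0 (extremely hard) is truncated to `w_max`, one at the margin gets weight 0.5, and a distant easy negative is dropped entirely, which matches the "remove simple, truncate extreme" behavior the abstract describes.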