Learning depth from a single image, an important problem in scene understanding, has attracted much attention in the past decade. The accuracy of depth estimation has been improved successively by conditional Markov random fields, non-parametric methods, and, most recently, deep convolutional neural networks. However, there exist inherent ambiguities in recovering 3D from a single 2D image. In this paper, we first prove the ambiguity between focal length and monocular depth learning and verify the result experimentally, showing that the focal length has a great influence on accurate depth recovery. In order to learn monocular depth with the focal length embedded, we propose a method to generate a synthetic varying-focal-length data set from fixed-focal-length data sets, together with a simple and effective method to fill the holes in the newly generated images. For accurate depth recovery, we propose a novel deep neural network that infers depth by effectively fusing middle-level information on the fixed-focal-length data set, outperforming the state-of-the-art methods built on pre-trained VGG. Furthermore, the newly generated varying-focal-length data set is taken as input to the proposed network in both the learning and inference phases. Extensive experiments on the fixed- and varying-focal-length data sets demonstrate that the monocular depth learned with the embedded focal length is significantly more accurate than that learned without the focal length information.
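The focal-length/depth ambiguity discussed above can be illustrated with a minimal pinhole-projection sketch (plain Python, hypothetical numbers): scaling the focal length and all scene depths by the same factor leaves every pixel unchanged, so a single image alone cannot disambiguate the two.

```python
def project(f, cx, cy, X, Y, Z):
    """Pinhole projection of a 3D point (X, Y, Z) to pixel (u, v)."""
    return (f * X / Z + cx, f * Y / Z + cy)

# A point seen with focal length f = 500 at depth Z = 2.0 ...
p1 = project(500.0, 320.0, 240.0, 0.4, -0.2, 2.0)

# ... lands on exactly the same pixel when both the focal length and
# the depth are scaled by the same factor k: (k*f)*X/(k*Z) = f*X/Z.
k = 1.5
p2 = project(500.0 * k, 320.0, 240.0, 0.4, -0.2, 2.0 * k)
```

Any network trained on pixels alone therefore sees identical inputs for these two different scene depths, which is why embedding the focal length matters.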
A novel method, ICF (Identifying point correspondences by Correspondence Function), is proposed for rejecting mismatches from given putative point correspondences. By analyzing the connotation of homography, we introduce the novel concept of a correspondence function for two images of a general 3D scene, which captures the relationship between corresponding points by mapping a point in one image to its corresponding point in the other. Since the correspondence functions are unknown in real applications, we also study how to estimate them from given putative correspondences and propose an algorithm, IECF (Iteratively Estimate Correspondence Function), based on the diagnostic technique and SVM. The proposed ICF method is then able to reject mismatches by checking whether they are consistent with the estimated correspondence functions. Extensive experiments on real images demonstrate the excellent performance of the proposed method. In addition, ICF is a general method for rejecting mismatches, and it is applicable to images of rigid objects as well as images of non-rigid objects with unknown deformation.
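As a rough illustration of the iterate-and-reject idea (not the paper's SVM-based IECF), the sketch below models the correspondence function as a single affine map, fits it to all putative matches by least squares, discards matches with large residuals, and refits; the names, thresholds, and test data are invented for the example.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine map src -> dst (a stand-in correspondence function)."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def icf_reject(src, dst, n_iter=3, thresh=1.0):
    """Iteratively fit the correspondence function and drop matches whose
    residual is large (a toy version of the reject-by-consistency idea)."""
    A = np.hstack([src, np.ones((len(src), 1))])
    keep = np.ones(len(src), dtype=bool)
    for _ in range(n_iter):
        M = fit_affine(src[keep], dst[keep])
        resid = np.linalg.norm(A @ M - dst, axis=1)
        keep = resid < max(3 * np.median(resid), thresh)
    return keep

# 20 correct matches under a known affine map, plus 3 gross mismatches.
rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(23, 2))
M_true = np.array([[1.0, 0.1], [-0.1, 1.0], [5.0, -3.0]])
dst = np.hstack([src, np.ones((23, 1))]) @ M_true
dst[20:] += 1000.0          # the last three are mismatches
keep = icf_reject(src, dst)
```

The real method learns a far more flexible, non-parametric correspondence function, which is what makes it applicable to non-rigid deformation.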
This article presents a challenging new dataset for indoor localization research. We have recorded the whole internal structure of Fengtai Wanda Plaza, an area of over 15,800 m², with a NavVis M6 device. The dataset contains 679 RGB-D panoramas and 2,664 query images collected by three different smartphones. In addition to the data, an aligned 3D point cloud is produced after the elimination of moving objects based on the building floorplan. Furthermore, a method is provided to generate a corresponding high-resolution depth image for each panorama. By fixing the smartphones to the device using a specially designed bracket, six-degree-of-freedom camera poses can be calculated precisely. We believe this provides a new benchmark for indoor visual localization; the full dataset can be downloaded from http://vision.ia.ac.cn/Faculty/wgao/data_code/data_indoor_localizaiton/data_indoor_localization.htm
This paper proposes a novel method for interest region description which pools local features based on their intensity orders in multiple support regions. Pooling by intensity orders is not only invariant to rotation and monotonic intensity changes, but also encodes ordinal information into the descriptor. Two kinds of local features are used in this paper, one based on gradients and the other on intensities; hence, two descriptors are obtained: the Multisupport Region Order-Based Gradient Histogram (MROGH) and the Multisupport Region Rotation and Intensity Monotonic Invariant Descriptor (MRRID). Thanks to the intensity order pooling scheme, the two descriptors are rotation invariant without estimating a reference orientation, which appears to be a major error source for most existing methods, such as the Scale Invariant Feature Transform (SIFT), SURF, and DAISY. Promising experimental results on image matching and object recognition demonstrate the effectiveness of the proposed descriptors compared to state-of-the-art descriptors.
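A minimal sketch of the pooling idea (invented shapes, not the full MROGH/MRRID construction): sample points are grouped by the rank of their intensity rather than by spatial position, so the grouping, and hence the pooled descriptor, survives any rotation of the support region.

```python
import numpy as np

def intensity_order_pool(intensities, features, n_groups):
    """Pool per-point features by intensity order: each sample point is
    assigned to a group according to the rank of its intensity, and the
    features are accumulated group by group."""
    rank = np.argsort(np.argsort(intensities))      # intensity rank of each point
    group = rank * n_groups // len(intensities)     # rank -> group id
    desc = np.stack([features[group == g].sum(axis=0)
                     for g in range(n_groups)]).ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)

rng = np.random.default_rng(1)
inten = rng.uniform(size=50)
feat = rng.uniform(size=(50, 4))
d1 = intensity_order_pool(inten, feat, n_groups=5)
perm = rng.permutation(50)   # e.g. the sample order after a rotation
d2 = intensity_order_pool(inten[perm], feat[perm], n_groups=5)
```

Because group membership depends only on intensity values, permuting the samples (as a rotation of the region would) leaves the descriptor unchanged, with no reference orientation ever being estimated.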
Structure-from-Motion (SfM) methods can be broadly categorized as incremental or global according to how they estimate the initial camera poses. While incremental systems have advanced in robustness and accuracy, efficiency remains their key challenge. To address this problem, global reconstruction systems estimate all camera poses simultaneously from the epipolar geometry graph, but they are usually sensitive to outliers. In this work, we propose a new hybrid SfM method to tackle the issues of efficiency, accuracy, and robustness in a unified framework. More specifically, we first propose an adaptive community-based rotation averaging method to estimate camera rotations in a global manner. Then, based on these estimated camera rotations, the camera centers are computed incrementally. Extensive experiments show that our hybrid method performs similarly to or better than many state-of-the-art global SfM approaches in terms of computational efficiency, while achieving reconstruction accuracy and robustness comparable to two state-of-the-art incremental SfM approaches.
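For intuition, here is a toy planar version of the global rotation-averaging step (angles instead of full rotation matrices, no community structure, wraparound ignored): camera angles are recovered from relative-angle measurements by linear least squares, with the first camera fixed to remove the gauge freedom.

```python
import numpy as np

def average_rotations_2d(n, rel):
    """Toy planar rotation averaging: recover n camera angles theta from
    measurements theta_ij ≈ theta_j - theta_i by linear least squares.
    rel: list of (i, j, theta_ij). Gauge: theta_0 = 0."""
    A = np.zeros((len(rel) + 1, n))
    b = np.zeros(len(rel) + 1)
    for r, (i, j, t) in enumerate(rel):
        A[r, i], A[r, j], b[r] = -1.0, 1.0, t
    A[-1, 0] = 1.0                     # fixes the global gauge
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta

true = [0.0, 0.10, 0.30, 0.35]
rel = [(0, 1, 0.10), (1, 2, 0.20), (2, 3, 0.05), (0, 3, 0.35), (0, 2, 0.30)]
est = average_rotations_2d(4, rel)
```

Real rotation averaging works on SO(3) and must be robust to outlier relative rotations, which is where the adaptive community-based grouping of the paper comes in.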
Line matching plays an important role in many applications, such as image registration, 3D reconstruction, object recognition and video understanding. However, compared with the matching of other features (such as points and regions), it has made little progress in recent years.
In this paper, we investigate the problem of matching line segments automatically from their neighborhood appearance alone, without resorting to any other constraints or a priori knowledge. A novel line descriptor, the mean–standard deviation line descriptor (MSLD), is proposed for this purpose. It is constructed in three steps: (1) for each pixel on the line segment, its pixel support region (PSR) is defined and divided into non-overlapping sub-regions; (2) the gradient description matrix (GDM) of the line is formed by characterizing each sub-region as a vector; (3) the MSLD is built by computing the mean and standard deviation of the GDM column vectors. Extensive experiments on real images show that the MSLD descriptor is highly distinctive for line matching under rotation, illumination change, image blur, viewpoint change, noise, JPEG compression and partial occlusion.
In addition, the concept of the MSLD can be extended to create a curve descriptor (the mean–standard deviation curve descriptor, MSCD), and promising MSCD-based results for both curve and region matching are also demonstrated in this work.
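Step (3) of the construction can be sketched in a few lines: given a gradient description matrix with one row per line pixel, the descriptor is the concatenation of the column means and column standard deviations, each part normalized, so its size is independent of the line length. The GDM below is random stand-in data.

```python
import numpy as np

def msld_from_gdm(gdm):
    """MSLD step 3: mean and standard deviation of the GDM column
    vectors, each part normalized, then concatenated."""
    mean, std = gdm.mean(axis=0), gdm.std(axis=0)
    return np.concatenate([mean / (np.linalg.norm(mean) + 1e-12),
                           std / (np.linalg.norm(std) + 1e-12)])

rng = np.random.default_rng(2)
gdm = rng.uniform(size=(30, 8))      # 30 line pixels, 8-D sub-region vectors
d_short = msld_from_gdm(gdm)
# A line twice as long but with the same per-column statistics
# yields the identical fixed-length descriptor.
d_long = msld_from_gdm(np.vstack([gdm, gdm]))
```

This is why line segments of different lengths can still be compared directly with a standard distance on their descriptors.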
Robust and precise visual localization over extended periods of time remains a formidable challenge in spatial vision. The primary difficulty lies in effectively handling significant variations in appearance caused by seasonal changes (summer, winter, spring, autumn) and diverse lighting conditions (dawn, day, sunset, night). With the rapid development of related technologies, more and more relevant datasets have emerged, which has also promoted the progress of 6-DOF visual localization for both autonomous vehicles and handheld devices. This manuscript endeavors to rectify the limitations of the current public benchmark for long-term visual localization, especially in the autonomous vehicle challenge. Taking into account that autonomous vehicle datasets are primarily captured by multi-camera rigs with fixed extrinsic calibration and consist of serialized image sequences, we present several modifications designed to enhance the rationality and comprehensiveness of the evaluation algorithm. We advocate standardized preprocessing procedures to minimize the possibility of human intervention influencing evaluation results. These procedures involve aligning the positions of the multiple cameras on the vehicle with a predetermined canonical reference system, replacing the individual camera positions with uniform vehicle poses, and incorporating sequence information to compensate for any failed localized poses. These steps are crucial to ensuring a fair and accurate evaluation of algorithmic performance. Lastly, we introduce a novel indicator to resolve potential ties in the Schulze ranking among submitted methods. The inadequacies highlighted in this study are substantiated through simulations and real experiments, which demonstrate the necessity and effectiveness of the proposed amendments.
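The camera-to-vehicle canonicalization described above can be sketched with homogeneous transforms (the names and the rig calibration below are invented): with fixed camera-from-vehicle extrinsics, every individual camera pose maps to one canonical vehicle pose, so any camera of the same rig is evaluated identically.

```python
import numpy as np

def rot_z(a):
    """4x4 homogeneous rotation about the z axis."""
    T = np.eye(4)
    c, s = np.cos(a), np.sin(a)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def trans(x, y, z):
    """4x4 homogeneous translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

# Hypothetical fixed rig calibration: camera-from-vehicle transforms.
cam_from_vehicle = {"front": trans(0.0, 1.2, -0.5),
                    "rear": rot_z(np.pi) @ trans(0.0, 1.2, 0.5)}

def vehicle_pose(world_from_cam, cam):
    """Replace an individual camera pose by the canonical vehicle pose."""
    return world_from_cam @ cam_from_vehicle[cam]

# Both cameras of the same rig recover the same vehicle pose.
V = trans(12.0, 3.0, 0.0) @ rot_z(0.7)      # ground-truth vehicle pose
poses = {c: V @ np.linalg.inv(cam_from_vehicle[c]) for c in cam_from_vehicle}
recovered = [vehicle_pose(poses[c], c) for c in cam_from_vehicle]
```

Evaluating the single recovered vehicle pose, rather than each camera pose separately, removes the double counting that a multi-camera rig would otherwise introduce.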
Traditionally, the danger cylinder is intimately related to solution stability in the P3P problem. In this work, we show that the danger cylinder is also closely related to the multiple-solution phenomenon. More specifically, we show that when the optical center lies on the danger cylinder, among the three possible P3P solutions (one double solution and two other solutions), the optical center of the double solution still lies on the danger cylinder, but the optical centers of the other two solutions do not. Moreover, as the optical center moves on the danger cylinder, the optical centers of the two other solutions of the corresponding P3P problem trace out a new surface, characterized by a polynomial equation of degree 12 in the optical center coordinates, called the deltoidal surface of the danger cylinder (DSDC). This indicates that the danger cylinder always has a companion deltoidal surface. Regarding the significance of the DSDC, we show that when the optical center passes through the DSDC, the number of solutions of the P3P constraint system must change by 2; in other words, the DSDC acts as a delimiting surface of the P3P solution space. These new findings shed new light on the P3P multi-solution phenomenon, an important issue in the study of P3P.
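For reference, whether an optical center lies on the danger cylinder can be checked numerically: the cylinder passes through the three control points, its axis runs through their circumcenter perpendicular to their plane, and its radius is the circumradius (a standard construction; the helper below is a sketch with illustrative points).

```python
import numpy as np

def on_danger_cylinder(A, B, C, P, tol=1e-9):
    """True iff P lies on the danger cylinder of control points A, B, C:
    the circular cylinder through the three points whose axis passes
    through their circumcenter, perpendicular to their plane."""
    n = np.cross(B - A, C - A)                 # normal of the control plane
    # Circumcenter: equidistant from A, B, C and lying in their plane.
    M = np.vstack([B - A, C - A, n])
    rhs = np.array([(B @ B - A @ A) / 2.0, (C @ C - A @ A) / 2.0, n @ A])
    center = np.linalg.solve(M, rhs)
    r = np.linalg.norm(A - center)             # circumradius
    axis = n / np.linalg.norm(n)
    d = P - center
    dist_to_axis = np.linalg.norm(d - (d @ axis) * axis)
    return abs(dist_to_axis - r) < tol

# Control points on the unit circle in the z = 0 plane:
A = np.array([1.0, 0.0, 0.0])
B = np.array([-0.5, np.sqrt(3) / 2, 0.0])
C = np.array([-0.5, -np.sqrt(3) / 2, 0.0])
```

The DSDC, by contrast, has no such elementary closed form; it is the degree-12 surface described in the abstract.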
In this paper, we put forward a new method for surface reconstruction from image-based point clouds. In particular, we introduce a new visibility model for each line of sight to preserve scene details without decreasing the noise-filtering ability. To make the proposed method suitable for point clouds with heavy noise, we introduce a new likelihood energy term into the total energy of the binary labeling problem of the Delaunay tetrahedra and give its graph implementation. Besides, we further improve the performance of the proposed method with the dense visibility technique, which helps keep object edges sharp. Experimental results show that the proposed method rivals the state-of-the-art methods in terms of accuracy and completeness, and performs better with respect to detail preservation.
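The binary inside/outside labeling of Delaunay tetrahedra is typically solved as an s-t min cut. The toy sketch below (plain Python, invented capacities) assembles t-links from unary costs and n-links from pairwise smoothness weights, runs a small Edmonds-Karp max-flow, and reads labels off residual reachability; a real system would use a dedicated graph-cut library.

```python
from collections import defaultdict, deque

def min_cut_labels(n, source_cap, sink_cap, edges):
    """Binary labeling (e.g. inside/outside for tetrahedra) by s-t min cut.
    source_cap[i]: cost of putting node i on the sink side (t-link from s),
    sink_cap[i]: cost of the source side, edges: (i, j, w) smoothness links."""
    S, T = n, n + 1
    cap, adj = defaultdict(int), defaultdict(set)
    def add(u, v, w):
        cap[(u, v)] += w
        adj[u].add(v); adj[v].add(u)
    for i in range(n):
        add(S, i, source_cap[i]); add(i, T, sink_cap[i])
    for i, j, w in edges:
        add(i, j, w); add(j, i, w)
    while True:
        # BFS for an augmenting path in the residual graph (Edmonds-Karp).
        parent, q = {S: None}, deque([S])
        while q and T not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u; q.append(v)
        if T not in parent:
            break
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        f = min(cap[e] for e in path)          # bottleneck capacity
        for u, v in path:
            cap[(u, v)] -= f; cap[(v, u)] += f
    # Nodes still reachable from S in the residual keep the source label.
    reach, q = {S}, deque([S])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in reach and cap[(u, v)] > 0:
                reach.add(v); q.append(v)
    return [i in reach for i in range(n)]

# Two nodes with opposite unary preferences and a weak smoothness link:
labels = min_cut_labels(2, source_cap=[10, 1], sink_cap=[1, 10],
                        edges=[(0, 1, 5)])
```

In the paper's setting, the unary terms would come from the visibility and likelihood energies and the pairwise terms from surface-quality costs between adjacent tetrahedra.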
In this work, a language-level Semantics-Conditioned framework for 3D Point cloud segmentation, called SeCondPoint, is proposed, where language-level semantics are introduced to condition both the modeling of the point feature distribution and the pseudo-feature generation, and a feature–geometry-based Mixup approach is further proposed to facilitate the distribution learning. Since a large number of point features can be generated from the learned distribution thanks to the semantics-conditioned modeling, any existing segmentation network can be embedded into the proposed framework to boost its performance. In addition, the proposed framework has the inherent ability to deal with novel classes, which is beyond the reach of current segmentation networks. Extensive experiments on two public datasets demonstrate that three typical segmentation networks achieve significant improvements over their original performance when enhanced by the proposed framework in the conventional 3D segmentation task. Two benchmarks are also introduced for the newly introduced zero-shot 3D segmentation task, and the results likewise validate the proposed framework.
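For orientation, the sketch below shows generic feature-level Mixup, the basic operation that the paper's feature-geometry variant builds on: convex combinations of feature pairs with matching soft labels (all shapes and names here are illustrative, not the paper's exact formulation).

```python
import numpy as np

rng = np.random.default_rng(3)

def feature_mixup(feats, labels, n_classes, alpha=0.4):
    """Generic feature-level Mixup: convex combinations of feature pairs
    with matching soft labels (a simplified sketch, not the exact
    feature-geometry variant of SeCondPoint)."""
    lam = rng.beta(alpha, alpha)               # mixing coefficient
    perm = rng.permutation(len(feats))         # random partner for each sample
    one_hot = np.eye(n_classes)[labels]
    return (lam * feats + (1 - lam) * feats[perm],
            lam * one_hot + (1 - lam) * one_hot[perm])

feats = rng.normal(size=(100, 32))             # e.g. generated point features
labels = rng.integers(0, 5, size=100)
mixed_x, mixed_y = feature_mixup(feats, labels, n_classes=5)
```

Training the segmentation head on such interpolated features densifies the learned feature distribution, which is the role the Mixup component plays in the framework.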