Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple learning algorithms. Many regress precise geometric quantities, like poses or 3D points, from an input image. This either fails to generalize to new viewpoints or ties the model parameters to a specific scene. In this paper, we go Back to the Feature: we argue that deep networks should focus on learning robust and invariant visual features, while the geometric estimation should be left to principled algorithms. We introduce PixLoc, a scene-agnostic neural network that estimates an accurate 6-DoF pose from an image and a 3D model. Our approach is based on the direct alignment of multiscale deep features, casting camera localization as metric learning. PixLoc learns strong data priors by end-to-end training from pixels to pose and exhibits exceptional generalization to new scenes by separating model parameters and scene geometry. The system can localize in large environments given coarse pose priors but also improve the accuracy of sparse feature matching by jointly refining keypoints and poses with little overhead. The code will be publicly available at github.com/cvg/pixloc.
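To make the direct-alignment idea concrete, the following is a minimal toy sketch, not the paper's implementation: reference feature descriptors are attached to 3D points, and a Gauss-Newton loop adjusts the pose so that features sampled at the points' projections match them. The synthetic "feature map", the translation-only pose, and the numerical Jacobian are all simplifying assumptions made for illustration.

```python
import numpy as np

def project(points_3d, pose):
    """Pinhole projection of Nx3 points; pose is a (3,) translation (toy:
    a full method would also estimate rotation)."""
    p = points_3d + pose
    return p[:, :2] / p[:, 2:3]

def residuals(pose, points_3d, ref_feats, feat_fn):
    """Difference between query features at the projections and the
    reference features stored with the 3D points."""
    return (feat_fn(project(points_3d, pose)) - ref_feats).ravel()

def gauss_newton(pose0, points_3d, ref_feats, feat_fn, iters=20):
    pose = pose0.astype(float).copy()
    for _ in range(iters):
        r = residuals(pose, points_3d, ref_feats, feat_fn)
        # numerical Jacobian w.r.t. the pose parameters
        J = np.zeros((r.size, pose.size))
        eps = 1e-6
        for k in range(pose.size):
            d = np.zeros_like(pose)
            d[k] = eps
            J[:, k] = (residuals(pose + d, points_3d, ref_feats, feat_fn) - r) / eps
        pose -= np.linalg.lstsq(J, r, rcond=None)[0]  # Gauss-Newton step
    return pose

# Synthetic example: a smooth analytic "feature map" and reference features
# observed at the true pose; alignment should recover that pose.
rng = np.random.default_rng(0)
pts = rng.uniform([-1, -1, 4], [1, 1, 6], size=(50, 3))
feat_fn = lambda uv: np.concatenate([np.sin(3 * uv), np.cos(2 * uv)], axis=1)
true_pose = np.array([0.1, -0.05, 0.2])
ref = feat_fn(project(pts, true_pose))
est = gauss_newton(np.zeros(3), pts, ref, feat_fn)
print(np.round(est, 3))  # close to true_pose
```

Because the residual is zero at the true pose and the feature map is smooth, the Gauss-Newton iteration converges from a coarse initial guess, which mirrors why the method can work from coarse pose priors.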
Long-term visual localization is the problem of estimating the camera pose of a given query image in a scene whose appearance changes over time. It is an important problem in practice that is, for example, encountered in autonomous driving. In order to gain robustness to such changes, long-term localization approaches often use semantic segmentations as an invariant scene representation, as the semantic meaning of each scene part should not be affected by seasonal and other changes. However, these representations are typically not very discriminative due to the very limited number of available classes. In this paper, we propose a novel neural network, the Fine-Grained Segmentation Network (FGSN), that can be used to provide image segmentations with a larger number of labels and can be trained in a self-supervised fashion. In addition, we show how FGSNs can be trained to output consistent labels across seasonal changes. We show through extensive experiments that integrating the fine-grained segmentations produced by our FGSNs into existing localization algorithms leads to substantial improvements in localization performance.
Robust cross-seasonal localization is one of the major challenges in long-term visual navigation of autonomous vehicles. In this paper, we exploit recent advances in semantic segmentation of images, i.e., where each pixel is assigned a label related to the type of object it represents, to attack the problem of long-term visual localization. We show that semantically labeled 3D point maps of the environment, together with semantically segmented images, can be efficiently used for vehicle localization without the need for detailed feature descriptors (SIFT, SURF, etc.). Thus, instead of depending on hand-crafted feature descriptors, we rely on the training of an image segmenter. The resulting map takes up much less storage space compared to a traditional descriptor-based map. A particle-filter-based semantic localization solution is compared to one based on SIFT features; even with large seasonal variations over the year, we perform on par with the larger and more descriptive SIFT features and are able to localize with an error below 1 m most of the time.
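A particle filter scored by semantic agreement can be sketched in a few lines. The toy below localizes along a 1-D road whose map stores one semantic class per metre; the vehicle observes the class in front of it with label noise. The map, motion model, and sensor model are invented for illustration and are far simpler than the paper's 3D point maps and segmented images.

```python
import numpy as np

rng = np.random.default_rng(42)

# semantic map: one class per metre (0=building, 1=tree, 2=sign), circular road
road = rng.integers(0, 3, size=200)
true_pos = 50.0
n = 500
particles = rng.uniform(0, 200, n)      # pose hypotheses
weights = np.ones(n) / n

def observe(pos):
    """Observed semantic class at the current position, with 10% label noise."""
    c = road[int(pos) % 200]
    return c if rng.random() > 0.1 else rng.integers(0, 3)

for step in range(60):
    true_pos = (true_pos + 1.0) % 200                       # drive 1 m/step
    particles = (particles + 1.0 + rng.normal(0, 0.2, n)) % 200
    z = observe(true_pos)
    match = road[particles.astype(int) % 200] == z
    weights *= np.where(match, 0.9, 0.05)                   # semantic sensor model
    weights /= weights.sum()
    if 1.0 / (weights ** 2).sum() < n / 2:                  # resample on low ESS
        idx = rng.choice(n, n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)

est = particles[np.argmax(weights)]
print(est, true_pos)
```

Even though a single semantic label is far less discriminative than a SIFT descriptor, the sequence of observations along the trajectory disambiguates the position, which is the effect the abstract describes.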
Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera pose estimates. In this paper, we introduce the first benchmark datasets specifically designed for analyzing the impact of such factors on visual localization. Using carefully created ground truth poses for query images taken under a wide variety of conditions, we evaluate the impact of various factors on 6DOF camera pose estimation accuracy through extensive experiments with state-of-the-art localization approaches. Based on our results, we draw conclusions about the difficulty of different conditions, showing that long-term localization is far from solved, and propose promising avenues for future work, including sequence-based localization approaches and the need for better local features. Our benchmark is available at visuallocalization.net.
For self-localization, a detailed and reliable map of the environment can be used to relate sensor data to static features with known locations. This paper presents a method for construction of detailed radar maps that describe the expected intensity of detections. Specifically, the measurements are modelled by an inhomogeneous Poisson process with a spatial intensity function given by the sum of a constant clutter level and an unnormalized Gaussian mixture. A substantial difficulty with radar mapping is the presence of data association uncertainties, i.e., the unknown associations between measurements and landmarks. In this paper, the association variables are introduced as hidden variables in a variational Bayesian expectation maximization (VBEM) framework, resulting in a computationally efficient mapping algorithm that enables a joint estimation of the number of landmarks and their parameters.
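The map representation itself is compact and easy to evaluate. The sketch below implements only the intensity function described above, a constant clutter level plus an unnormalized Gaussian mixture with one component per landmark; all parameter values (weights, means, covariances, clutter level) are invented for illustration.

```python
import numpy as np

def gaussian_2d(x, mean, cov):
    """Density of a 2-D Gaussian evaluated at x (last axis holds coordinates)."""
    d = x - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * np.einsum('...i,ij,...j->...', d, inv, d))

def intensity(x, clutter, weights, means, covs):
    """Poisson intensity: lambda(x) = clutter + sum_k w_k N(x; m_k, P_k),
    where the weights w_k are unnormalized (expected detections per landmark)."""
    lam = np.full(x.shape[:-1], clutter, dtype=float)
    for w, m, P in zip(weights, means, covs):
        lam += w * gaussian_2d(x, m, P)
    return lam

# Two illustrative landmarks on a small map
means = [np.array([2.0, 1.0]), np.array([-1.0, 3.0])]
covs = [np.eye(2) * 0.2, np.eye(2) * 0.5]
weights = [8.0, 5.0]          # expected detections per landmark per scan
grid = np.stack(np.meshgrid(np.linspace(-4, 4, 81),
                            np.linspace(-4, 4, 81)), axis=-1)
lam = intensity(grid, clutter=0.01, weights=weights, means=means, covs=covs)
print(lam.max(), lam.min())
```

Because each landmark is a weighted Gaussian component, the map is stored as a handful of means, covariances, and weights rather than a dense grid, which is what makes the representation parsimonious.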
In this paper, we propose a Bayesian filtering approach that uses information from camera-based driver monitoring systems to find the probability that the driver is looking in different zones. In particular, the focus is on a set of zones directly related either to active driving or to visual distraction, such as the road, the mirrors, the infotainment display, or control buttons. For systems that do not provide direct observations of the gaze direction, or as a complement to noisy gaze data, we propose to use probabilistic functions that describe the gaze direction as a function of head pose and eye closure. It is further shown how these functions can be estimated from data with known visual focus points using Gaussian processes. Evaluation on data from two driver monitoring systems shows a significant improvement compared with the gaze zone estimates based on unprocessed data.
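The Gaussian-process part can be illustrated in one dimension: learn a smooth function from head yaw to the probability of looking at the road, from a handful of calibration points with known visual focus. The kernel, its hyperparameters, and the training data below are invented for the sketch; a real system would use the full head pose and eye closure as inputs.

```python
import numpy as np

def rbf(a, b, length=10.0, var=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    return var * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

# calibration data: head yaw (degrees) -> indicator of "looking at the road"
x_train = np.array([-40., -25., -10., 0., 10., 25., 40.])
y_train = np.array([0.05, 0.2, 0.8, 0.95, 0.8, 0.2, 0.05])

K = rbf(x_train, x_train) + 1e-4 * np.eye(len(x_train))   # noise jitter
alpha = np.linalg.solve(K, y_train)                        # GP weights

def p_road(yaw):
    """GP posterior mean, clipped to [0, 1] so it can act as a probability."""
    return np.clip(rbf(np.atleast_1d(yaw), x_train) @ alpha, 0.0, 1.0)

print(p_road(0.0), p_road(35.0))
```

A Bayesian filter can then combine these smooth per-zone probabilities over time, instead of classifying each noisy frame independently.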
Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate six degree-of-freedom (6DOF) camera pose estimates. In this paper, we extend three publicly available datasets containing images captured under a wide variety of viewing conditions, but lacking camera pose information, with ground truth pose information, making evaluation of the impact of various factors on 6DOF camera pose estimation accuracy possible. We also discuss the performance of state-of-the-art localization approaches on these datasets. Additionally, we release around half of the poses for all conditions, and keep the remaining half private as a test set, in the hopes that this will stimulate research on long-term visual localization, learned local image features, and related research areas. Our datasets are available at visuallocalization.net, where we are also hosting a benchmarking server for automatic evaluation of results on the test set. The presented state-of-the-art results are to a large degree based on submissions to our server.
Mapping stationary objects is essential for autonomous vehicles and many autonomous functions in vehicles. In this contribution, the probability hypothesis density (PHD) filter framework is applied to automotive imagery sensor data for constructing such a map, where the main advantages are that it avoids the detection, data association, and track handling problems in conventional multiple-target tracking, and that it gives a parsimonious representation of the map, in contrast to grid-based methods. Two original contributions address the inherent complexity issues of the algorithm: first, a data clustering algorithm is suggested to group the components of the PHD into different clusters, which structures the description of the prior and considerably improves the measurement update in the PHD filter; second, a merging step is proposed to simplify the map representation in the PHD filter. The algorithm is applied to multi-sensor radar data collected on public roads, and the resulting map is shown to describe the environment well, as a human perceives it.
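A merging step of the kind mentioned above is standard in Gaussian-mixture PHD implementations and is easy to sketch: components whose means lie within a Mahalanobis threshold of the current largest-weight component are collapsed into one moment-matched Gaussian. The threshold and the example mixture below are illustrative, not values from the paper.

```python
import numpy as np

def merge(weights, means, covs, thresh=4.0):
    """Greedy merging of Gaussian mixture components (moment matching)."""
    weights, means, covs = list(weights), list(means), list(covs)
    out_w, out_m, out_P = [], [], []
    while weights:
        j = int(np.argmax(weights))                 # strongest component
        inv = np.linalg.inv(covs[j])
        group = [i for i in range(len(weights))
                 if (means[i] - means[j]) @ inv @ (means[i] - means[j]) <= thresh]
        w = sum(weights[i] for i in group)
        m = sum(weights[i] * means[i] for i in group) / w
        # moment-matched covariance includes the spread of the merged means
        P = sum(weights[i] * (covs[i] + np.outer(means[i] - m, means[i] - m))
                for i in group) / w
        out_w.append(w); out_m.append(m); out_P.append(P)
        for i in sorted(group, reverse=True):       # remove merged components
            del weights[i]; del means[i]; del covs[i]
    return out_w, out_m, out_P

# two nearly coincident components and one distant component
w, m, P = merge([0.6, 0.5, 0.3],
                [np.array([0., 0.]), np.array([0.1, 0.]), np.array([5., 5.])],
                [np.eye(2)] * 3)
print(len(w))  # the two nearby components collapse into one
```

Merging keeps the number of mixture components, and hence the map size, bounded as measurements accumulate.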
This paper addresses the mapping problem. Using a conjugate prior form, we derive the exact theoretical batch multiobject posterior density of the map given a set of measurements. The landmarks in the map are modeled as extended objects, and the measurements are described as a Poisson process, conditioned on the map. We use a Poisson process prior on the map and prove that the posterior distribution is a hybrid Poisson, multi-Bernoulli mixture distribution. We devise a Gibbs sampling algorithm to sample from the batch multiobject posterior. The proposed method can handle uncertainties in the data associations and the cardinality of the set of landmarks, and is parallelizable, making it suitable for large-scale problems. The performance of the proposed method is evaluated on synthetic data and is shown to outperform a state-of-the-art method.
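The role of Gibbs sampling over data associations can be shown on a deliberately tiny problem: 1-D measurements come from two landmarks with known spread or from uniform clutter, and the sampler alternates between drawing each measurement's association and redrawing each landmark mean given its assigned measurements. This is a toy stand-in with invented parameters, not the paper's hybrid Poisson/multi-Bernoulli posterior, which also handles the number of landmarks.

```python
import numpy as np

rng = np.random.default_rng(3)
true_means = np.array([-3.0, 4.0])
z = np.concatenate([rng.normal(m, 0.5, 20) for m in true_means]
                   + [rng.uniform(-10, 10, 5)])      # 5 clutter measurements

sigma, clutter_pdf = 0.5, 1.0 / 20.0                 # clutter uniform on [-10, 10]
means = np.array([np.quantile(z, 0.2), np.quantile(z, 0.8)])  # rough init
samples = []
for sweep in range(200):
    # 1) sample each measurement's association given the current means
    ll = np.stack([np.exp(-0.5 * ((z - m) / sigma) ** 2)
                   / (sigma * np.sqrt(2 * np.pi)) for m in means]
                  + [np.full_like(z, clutter_pdf)])  # rows: lm0, lm1, clutter
    probs = ll / ll.sum(axis=0)
    assoc = np.array([rng.choice(3, p=probs[:, i]) for i in range(z.size)])
    # 2) redraw each landmark mean given its assigned measurements (flat prior)
    for k in range(2):
        zk = z[assoc == k]
        if zk.size:
            means[k] = rng.normal(zk.mean(), sigma / np.sqrt(zk.size))
    samples.append(np.sort(means))

est = np.mean(samples[100:], axis=0)                 # posterior mean, post burn-in
print(np.round(est, 2))
```

Each measurement's association is resampled independently given the means, which is the property that makes this kind of sampler parallelizable.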
This paper is concerned with the problem of decision-making in systems that assist drivers in avoiding collisions. An important aspect of these systems is not only assisting the driver when needed but also not disturbing the driver with unnecessary interventions. Aimed at improving both of these properties, a probabilistic framework is presented for jointly evaluating the driver acceptance of an intervention and the necessity thereof to automatically avoid a collision. The intervention acceptance is modeled as high if it is estimated that the driver judges the situation as critical, based on the driver's observations and predictions of the traffic situation. One advantage of the proposed framework is that interventions can be initiated at an earlier stage when the estimated driver acceptance is high. Using a simplified driver model, the framework is applied to a few different types of collision scenarios. The results show that the framework has appealing properties, both with respect to increasing the system benefit and to decreasing the risk of unnecessary interventions.
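A toy version of such a joint decision rule can be written down directly: estimate the probability that braking is necessary, use a crude proxy for whether the driver would judge the situation as critical (and hence accept an intervention), and intervene only when both are high. The constant-velocity prediction, the Monte-Carlo noise model, the acceptance proxy, and all thresholds below are illustrative assumptions, not the paper's driver model.

```python
import numpy as np

def p_collision(gap, rel_speed, horizon=2.0, sigma=2.0, n=10000,
                rng=np.random.default_rng(0)):
    """Monte-Carlo probability that the gap (m) closes within the horizon (s),
    under Gaussian uncertainty on the predicted relative motion."""
    future_gap = gap - rel_speed * horizon + rng.normal(0, sigma, n)
    return float((future_gap <= 0).mean())

def decide(gap, rel_speed, acc_thresh=0.5, nec_thresh=0.3):
    """Intervene only if the intervention is both necessary and acceptable."""
    necessity = p_collision(gap, rel_speed)
    # crude proxy: assume the driver judges the situation critical when a
    # collision looks likely over a slightly longer prediction horizon
    acceptance = p_collision(gap, rel_speed, horizon=3.0)
    return acceptance > acc_thresh and necessity > nec_thresh

print(decide(gap=30.0, rel_speed=20.0))   # closing fast: intervene
print(decide(gap=30.0, rel_speed=2.0))    # benign: stay quiet
```

Keeping the two probabilities separate is what lets the system trigger earlier when acceptance is high while still suppressing interventions the driver would perceive as unnecessary.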