In this work we address the task of contextual classification of an airborne LiDAR point cloud. For that purpose, we integrate a Random Forest classifier into a Conditional Random Field (CRF) framework. This flexible approach yields a reliable classification result even in complex urban scenes: we benefit from the consideration of context on the one hand and from the opportunity to use a large number of features on the other. Considering the interactions in our experiments increases the overall accuracy by 2%, though a larger improvement becomes apparent in the completeness and correctness of some of the seven classes discerned in our experiments. We compare the Random Forest approach to linear models for the computation of the unary and pairwise potentials of the CRF, and investigate the relevance of different features for the LiDAR points as well as for the interaction of neighbouring points. In a second step, building objects are detected based on the classified point cloud. For that purpose, the CRF probabilities for the classes are plugged into a Markov Random Field as unary potentials, in which the pairwise potentials are based on a Potts model. The 2D binary building object masks are extracted and evaluated by the benchmark ISPRS Test Project on Urban Classification and 3D Building Reconstruction. The evaluation shows that the main buildings (larger than 50 m²) can be detected very reliably, with a correctness larger than 96% and a completeness of 100%.
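The Potts model mentioned above adds a constant penalty whenever two neighbouring sites disagree in their labels. A minimal sketch of the resulting MRF energy, assuming a generic neighbourhood list and an illustrative weight `beta` (not the parameters used in the paper):

```python
import numpy as np

def potts_energy(labels, neighbours, unary_neg_log_probs, beta=1.0):
    """Total MRF energy: unary terms (e.g. negative log class
    probabilities from a CRF) plus a Potts pairwise term that adds
    `beta` whenever two neighbouring sites carry different labels."""
    # unary term: -log P(label | data) for the chosen label at each site
    energy = sum(unary_neg_log_probs[i, labels[i]] for i in range(len(labels)))
    # pairwise Potts term: constant penalty for label disagreement
    energy += sum(beta for i, j in neighbours if labels[i] != labels[j])
    return energy
```

Minimising this energy over all label configurations (e.g. with graph cuts) then yields the smoothed labelling.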
During the last few years, artificial intelligence based on deep learning, and particularly on convolutional neural networks, has acted as a game changer in just about all tasks related to photogrammetry and remote sensing. Results have shown in part significant improvements in many projects all across the photogrammetric processing chain, from image orientation to surface reconstruction, scene classification and change detection, as well as object extraction and object tracking and recognition in image sequences. This paper summarizes the foundations of deep learning for photogrammetry and remote sensing before illustrating, by way of example, different projects being carried out at the Institute of Photogrammetry and GeoInformation, Leibniz University Hannover, in this exciting and fast-moving field of research and development.
In feature-based image matching, distinctive features in images are detected and represented by feature descriptors. Matching is then carried out by assessing the similarity of the descriptors of potentially conjugate points. In this paper, we first briefly discuss the general framework. Then, we review feature detection as well as the determination of the affine shape and orientation of local features, before analyzing feature description in more detail. In the feature description review, the general framework of local feature description is presented first. Then, the review discusses the evolution from hand-crafted feature descriptors, e.g. SIFT (Scale Invariant Feature Transform), to machine learning and deep learning based descriptors. The machine learning models, the training loss and the respective training data of learning-based algorithms are examined in more detail; subsequently, the various advantages and challenges of the different approaches are discussed. Finally, we present and assess some current research directions before concluding the paper.
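The matching step described above can be illustrated by nearest-neighbour search combined with Lowe's ratio test, the classic acceptance criterion used with SIFT-like descriptors; the threshold `ratio=0.8` is an illustrative value, not one taken from the paper:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test: accept a
    match only if the best distance is clearly smaller than the
    second best, which suppresses ambiguous correspondences."""
    matches = []
    for i, d in enumerate(desc_a):
        # Euclidean distances from descriptor d to all descriptors in desc_b
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

The same scheme applies to learned descriptors; only the descriptor vectors change, not the matching logic.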
In this paper, we present a new fast and robust method for structure from motion (SfM) for data sets potentially comprising thousands of ordered or unordered images. Our work focuses on the two most time-consuming procedures: (a) image matching and (b) pose estimation. For image matching, a new method employing a random k-d forest is proposed to quickly obtain pairs of overlapping images from an unordered set. After that, image matching and the estimation of relative orientation parameters are performed only for pairs found to be very likely to overlap. For pose estimation, we use a two-stage global approach, separating the determination of rotation matrices and translation parameters; the latter are computed simultaneously using a new method. In order to cope with outliers in the relative orientations, to which global approaches are particularly sensitive, we present a new constraint based on triplet loop closure errors of rotation and translation. Finally, a robust bundle adjustment is carried out to refine the image orientation parameters.
We demonstrate the potential and limitations of our pipeline using various real-world datasets, including ordered image data acquired from UAVs (unmanned aerial vehicles) and other platforms as well as unordered data from the internet. The experiments show that our pipeline performs better than comparable state-of-the-art SfM systems in terms of run time while achieving similar accuracy and robustness.
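The overlap-detection step of the pipeline can be sketched as a nearest-neighbour search over global per-image descriptors: each image proposes its k closest images as likely overlap candidates. Brute-force search is used here for clarity, whereas the paper accelerates this lookup with a random k-d forest; the descriptor representation and `k` are illustrative assumptions:

```python
import numpy as np

def candidate_pairs(global_descs, k=2):
    """Return likely-overlapping image pairs as the k nearest
    neighbours of each image in a global descriptor space.
    (Brute force for clarity; a k-d forest would replace the
    linear scan for large image sets.)"""
    descs = np.asarray(global_descs, dtype=float)
    pairs = set()
    for i, d in enumerate(descs):
        dists = np.linalg.norm(descs - d, axis=1)
        dists[i] = np.inf                      # exclude self-match
        for j in np.argsort(dists)[:k]:
            pairs.add((min(i, int(j)), max(i, int(j))))  # undirected pair
    return sorted(pairs)
```

Only the returned candidate pairs are then passed to the expensive feature matching and relative orientation stage.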
With the growing number of digitally available collections consisting of images depicting relevant objects from the past together with descriptive annotations, suitable information retrieval techniques are becoming increasingly important to support historians in their work. In this context, we address the problem of image retrieval for searching records in a database of silk fabrics. The descriptors, used as an index to the database, are learned by a convolutional neural network, exploiting the available annotations to automatically generate training data. Descriptor learning is combined with an auxiliary classification loss with the aim of supporting the clustering in the descriptor space with respect to the properties of the depicted silk objects, such as the place or time of origin. We evaluate our approach on a dataset of fabric images in a kNN classification, showing promising results with respect to the ability of the descriptors to represent semantic properties of silk fabrics; integrating the auxiliary loss improves the overall accuracy by 2.7% and the average F1 score by 5.6%. The largest improvements can be observed for variables with imbalanced class distributions. An evaluation on the WikiArt dataset demonstrates the transferability of our approach to other digital collections.
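The combination of descriptor learning with an auxiliary classification loss can be sketched as a weighted sum of a metric-learning term and a cross-entropy term on a semantic property head. The triplet formulation and the weight are illustrative assumptions, not the paper's exact loss functions:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Descriptor loss: pull matching pairs together and push
    non-matching pairs apart by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def cross_entropy(logits, label):
    """Auxiliary classification loss on a semantic property
    (e.g. the place or time of origin of the depicted fabric)."""
    logits = logits - logits.max()             # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]

def combined_loss(anchor, positive, negative, logits, label, weight=0.5):
    """Total training loss: descriptor term plus a weighted auxiliary
    classification term (the weight is an illustrative assumption)."""
    return triplet_loss(anchor, positive, negative) + weight * cross_entropy(logits, label)
```

The auxiliary term encourages descriptors of fabrics sharing a property value to cluster, which is the effect evaluated in the kNN classification above.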
A method for the detection of buildings in densely built-up urban areas by the fusion of first and last pulse laser scanner data and multi-spectral images is presented. The method aims at a classification of land cover into the classes “building”, “tree”, “grassland”, and “bare soil”, the latter three being considered relevant for the subsequent generation of a high-quality digital terrain model (DTM). Building detection is accomplished by first applying a hierarchical rule-based technique for coarse DTM generation based on morphological filtering. After that, data fusion based on the theory of Dempster–Shafer is applied at two different stages of the classification process. We describe the algorithms involved, giving examples for a test site in Fairfield (New South Wales).
In this paper, we describe the evaluation of a method for building detection by the Dempster–Shafer fusion of airborne laser scanner (ALS) data and multi-spectral images. For this purpose, ground truth was digitised for two test sites with quite different characteristics. Using these data sets, the heuristic models for the probability mass assignments are validated and improved, and rules for tuning the parameters are discussed. The sensitivity of the results to the most important control parameters of the method is assessed. Further, we evaluate the contributions of the individual cues used in the classification process to determine the quality of the results. Applying our method with a standard set of parameters on two different ALS data sets with a spacing of about 1 point/m², 95% of all buildings larger than 70 m² could be detected, and 95% of all detected buildings larger than 70 m² were correct in both cases. Buildings smaller than 30 m² could not be detected. The parameters used in the method have to be appropriately defined, but all except one (which must be determined in a training phase) can be determined from meaningful physical entities. Our research also shows that adding the multi-spectral images to the classification process improves the correctness of the results for small residential buildings by up to 20%.
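The Dempster–Shafer fusion underlying both of the building-detection studies above combines basic probability assignments from individual cues by Dempster's rule of combination. A minimal sketch, with masses represented as dictionaries over subsets of the frame of discernment (the class names are illustrative):

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic probability
    assignments. Masses are dicts mapping frozensets of hypotheses
    to belief mass; conflicting mass is redistributed by
    normalisation."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb            # empty intersection
    norm = 1.0 - conflict                      # non-conflicting mass
    return {k: v / norm for k, v in combined.items()}
```

Combining one cue per call in this way fuses all cues into a single mass function, from which the class with the highest support is selected.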
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We base the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leverage a specific data model and taxonomy in a Knowledge Graph to create the dataset and to store classification results.
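The late-fusion step can be sketched as stacking the per-class outputs of the image and text classifiers together with the tabular features into one vector, on which the Gradient Tree Boosting model is then trained; the boosting step itself is omitted here, and the feature layout is an illustrative assumption:

```python
import numpy as np

def late_fusion_features(image_probs, text_probs, tabular_feats):
    """Late fusion input: concatenate the per-class probability
    vectors of the unimodal classifiers with the tabular features.
    A gradient tree boosting model is trained on this vector."""
    return np.concatenate([image_probs, text_probs, tabular_feats])
```

Because fusion happens at the level of classifier outputs rather than raw inputs, each modality can be trained and updated independently.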
Global Navigation Satellite Systems (GNSS) deliver absolute position and velocity, as well as time information (P, V, T). In urban areas, however, GNSS navigation performance is restricted by signal obstructions and multipath. This is especially true for applications dealing with highly automated or even autonomous driving. Consequently, multi-sensor platforms including laser scanners and cameras, as well as map data, are used to enhance navigation performance in terms of accuracy, integrity, continuity and availability. Although well-established procedures for integrity monitoring exist for aircraft navigation, such concepts are still lacking for the sensors and fusion algorithms used in automotive navigation. The research training group i.c.sens (integrity and collaboration in dynamic sensor networks) aims to fill this gap and to contribute to relevant topics. These include the definition of alternative integrity concepts for space and time based on set theory and interval mathematics, the establishment of new types of maps that report on the trustworthiness of the represented information, and the exploitation of collaboration by improved filters incorporating person and object tracking. In this paper, we describe our approach and summarize the preliminary results.
The 3D reconstruction of objects is a prerequisite for many highly relevant applications of computer vision, such as mobile robotics or autonomous driving. To deal with the inverse problem of reconstructing 3D objects from their 2D projections, a common strategy is to incorporate prior object knowledge into the reconstruction approach by establishing a 3D model and aligning it to the 2D image plane. However, current approaches are limited by inadequate shape priors and by image observations that are insufficient for a reliable alignment with the 3D model. The goal of this paper is to show how 3D object reconstruction can profit from a more sophisticated shape prior and from the combined incorporation of different observation types inferred from the images. We introduce a subcategory-aware deformable vehicle model that makes use of a prediction of the vehicle type for a more appropriate regularisation of the vehicle shape. A multi-branch CNN is presented to derive predictions of the vehicle type and orientation. This information is also introduced as prior information for model fitting. Furthermore, the CNN extracts vehicle keypoints and wireframes, which are well suited for model-to-image association and model fitting. The task of pose estimation and reconstruction is addressed by a versatile probabilistic model. Extensive experiments are conducted using two challenging real-world data sets, on both of which the benefit of the developed shape prior can be shown. A comparison to state-of-the-art methods for vehicle pose estimation shows that the proposed approach performs on par or better, confirming the suitability of the developed shape prior and probabilistic model for vehicle reconstruction.