Autonomous robotic manipulation in clutter is challenging. A large variety of objects must be perceived in complex scenes, where they are partially occluded and embedded among many distractors, often in restricted spaces. To tackle these challenges, we developed a deep-learning approach that combines object detection and semantic segmentation. The manipulation scenes are captured with RGB-D cameras, for which we developed a depth fusion method. Employing pretrained features makes learning from small annotated robotic datasets possible. We evaluate our approach on two challenging datasets: one captured for the Amazon Picking Challenge 2016, where our team NimbRo came in second in the Stowing task and third in the Picking task; and one captured in disaster-response scenarios. The experiments show that object detection and semantic segmentation complement each other and can be combined to yield reliable object perception.
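To illustrate how the two outputs can be combined, the sketch below intersects each detected bounding box with a per-pixel semantic map to obtain object masks. This is a minimal, hypothetical fusion rule, not the authors' actual pipeline; all function and variable names are illustrative.

```python
# Hypothetical fusion of object detections and semantic segmentation:
# within each detected box, keep only the pixels whose semantic label
# matches the detection's class.
import numpy as np

def fuse_detections_with_segmentation(boxes, classes, seg_map):
    """boxes   : list of (x0, y0, x1, y1) pixel coordinates
    classes : list of integer class ids, one per box
    seg_map : (H, W) array of per-pixel class ids
    Returns one boolean mask per detection."""
    masks = []
    for (x0, y0, x1, y1), cls in zip(boxes, classes):
        mask = np.zeros(seg_map.shape, dtype=bool)
        mask[y0:y1, x0:x1] = seg_map[y0:y1, x0:x1] == cls
        masks.append(mask)
    return masks
```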
A common practice for gaining invariant features in object recognition models is to aggregate multiple low-level features over a small neighborhood. However, the differences between those models make a direct comparison of the properties of different aggregation functions hard. Our aim is to gain insight into different functions by directly comparing them on a fixed architecture across several common object recognition tasks. Empirical results show that a maximum pooling operation significantly outperforms subsampling operations. Despite their shift-invariant properties, overlapping pooling windows offer no significant improvement over non-overlapping pooling windows. By applying this knowledge, we achieve state-of-the-art error rates of 4.57% on the NORB normalized-uniform dataset and 5.6% on the NORB jittered-cluttered dataset.
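The aggregation functions under comparison are easy to state concretely. The PyTorch snippet below (an illustrative stand-in; the paper's fixed architecture is not reproduced here) contrasts max pooling, average-pooling-style subsampling, and overlapping pooling windows, where overlap means the stride is smaller than the window.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)  # (batch, channels, height, width)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)   # non-overlapping max pooling
subsample = nn.AvgPool2d(kernel_size=2, stride=2)  # averaging, i.e. subsampling
overlap = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # overlapping windows

# All three halve the spatial resolution: (1, 8, 16, 16)
print(max_pool(x).shape, subsample(x).shape, overlap(x).shape)
```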
Scene understanding is an important capability for robots acting in unstructured environments. While most SLAM approaches provide a geometrical representation of the scene, a semantic map is necessary for more complex interactions with the surroundings. Current methods treat the semantic map as part of the geometry, which limits scalability and accuracy. We propose to represent the semantic map as a geometrical mesh and a semantic texture coupled at independent resolutions. The key idea is that in many environments the geometry can be greatly simplified without losing fidelity, while semantic information can be stored at a higher resolution, independent of the mesh. We construct a mesh from depth sensors to represent the scene geometry and fuse information into the semantic texture from segmentations of individual RGB views of the scene. Making the semantics persistent in a global mesh enables us to enforce temporal and spatial consistency of the individual view predictions. For this, we propose an efficient method of establishing consensus between individual segmentations by iteratively retraining semantic segmentation with the information stored within the map and using the retrained segmentation to re-fuse the semantics. We demonstrate the accuracy and scalability of our approach by reconstructing semantic maps of scenes from NYUv2 and a scene spanning large buildings.
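A minimal sketch of the fusion step follows, assuming the projection from view pixels to texels is already given. The running average over per-view class probabilities shown here is one plausible fusion rule, not necessarily the paper's exact update.

```python
import numpy as np

def fuse_view(texture_probs, counts, texel_ids, pixel_probs):
    """Accumulate one view's segmentation into the semantic texture.
    texture_probs : (T, C) running mean of class probabilities per texel
    counts        : (T,) number of observations per texel
    texel_ids     : (N,) texel index of each projected view pixel
    pixel_probs   : (N, C) softmax output of the view's segmentation
    """
    for t, p in zip(texel_ids, pixel_probs):
        counts[t] += 1
        texture_probs[t] += (p - texture_probs[t]) / counts[t]
    return texture_probs, counts

# The fused label per texel is then texture_probs.argmax(axis=1).
```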
Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus a part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of large datasets for this task based on automotive LiDAR. In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete 360-degree field of view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires anticipating the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.
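Point-wise predictions on such benchmarks are typically scored with the mean intersection-over-union over classes. The sketch below shows that metric in its standard form; it is not the benchmark's official evaluation code.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """pred, gt: (N,) integer class labels, one per LiDAR point."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(conf, (gt, pred), 1)            # confusion matrix
    tp = np.diag(conf).astype(np.float64)     # true positives per class
    fp = conf.sum(axis=0) - tp                # false positives per class
    fn = conf.sum(axis=1) - tp                # false negatives per class
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(iou)                    # ignore classes absent from gt
```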
Deep convolutional neural networks have shown outstanding performance in the task of semantically segmenting images. Applying the same methods to 3D data still poses challenges due to the heavy memory requirements and the lack of structured data. Here, we propose LatticeNet, a novel approach for 3D semantic segmentation, which takes raw point clouds as input. A PointNet describes the local geometry, which we embed into a sparse permutohedral lattice. The lattice allows for fast convolutions while keeping a low memory footprint. Further, we introduce DeformSlice, a novel learned data-dependent interpolation for projecting lattice features back onto the point cloud. We present results of 3D segmentation on multiple datasets where our method achieves state-of-the-art performance. We also extend and evaluate our network for instance and dynamic object segmentation.
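To give an intuition for the splat step, the sketch below scatters per-point features into sparse cells. A regular voxel grid stands in for the permutohedral lattice of the actual method, and mean aggregation stands in for the learned PointNet descriptor.

```python
import numpy as np

def splat(points, feats, cell_size):
    """points: (N, 3) xyz, feats: (N, F). Returns {cell key: mean feature},
    touching only occupied cells, which keeps the memory footprint low."""
    keys = np.floor(points / cell_size).astype(np.int64)
    cells = {}
    for key, f in zip(map(tuple, keys), feats):
        acc, n = cells.get(key, (np.zeros(f.shape), 0))
        cells[key] = (acc + f, n + 1)
    return {key: acc / n for key, (acc, n) in cells.items()}
```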
Understanding the growth and development of individual plants is of central importance in modern agriculture, crop breeding, and crop science. To this end, using 3D data for plant analysis has gained attention in recent years. High-resolution point clouds offer the potential to derive a variety of plant traits, such as plant height, biomass, and the number and size of relevant plant organs. Periodically scanning the plants even allows for performing spatio-temporal growth analysis. However, highly accurate 3D point clouds from plants recorded at different growth stages are rare, and acquiring this kind of data is costly. Besides, advanced plant analysis methods from machine learning require annotated training data and thus entail intense manual labor before an analysis can be performed. To address these issues, we present with this dataset paper a multi-temporal dataset featuring high-resolution registered point clouds of maize and tomato plants, which we manually labeled for computer vision tasks such as instance segmentation and 3D reconstruction, providing approximately 260 million labeled 3D points. To highlight the usability of the data and to provide baselines for other researchers, we show a variety of applications ranging from point cloud segmentation to non-rigid registration and surface reconstruction. We believe that our dataset will help to develop new algorithms to advance the research on plant phenotyping, 3D reconstruction, non-rigid registration, and deep learning on raw point clouds. The dataset is freely accessible at https://www.ipb.uni-bonn.de/data/pheno4d/.
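As a toy example of deriving a plant trait from such labeled point clouds, the snippet below estimates plant height as the vertical extent of the points labeled as plant. The label id and the assumption that z is the vertical axis are illustrative, not the dataset's specification.

```python
import numpy as np

def plant_height(points, labels, plant_label=1):
    """points: (N, 3) xyz in a common frame, labels: (N,) class ids."""
    plant = points[labels == plant_label]
    return plant[:, 2].max() - plant[:, 2].min()  # extent along the z axis
```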
To verify and validate networks, it is essential to gain insight into their decisions, their limitations, and possible shortcomings of the training data. In this work, we propose a post-hoc, optimization-based visual explanation method that highlights the evidence in the input image for a specific prediction. Our approach is based on a novel technique for defending against adversarial evidence (i.e., faulty evidence due to artefacts) by filtering gradients during optimization. The defense does not depend on human-tuned parameters. It enables explanations that are both fine-grained and preserve the characteristics of images, such as edges and colors. The explanations are interpretable, suited for visualizing detailed evidence, and can be tested because they are valid model inputs. We qualitatively and quantitatively evaluate our approach on a multitude of models and datasets.
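The following PyTorch sketch shows the general shape of such an optimization-based explanation: a mask over the input is optimized so that the retained evidence preserves the target prediction while staying sparse. The percentile-based gradient clipping here is a crude, hand-tuned stand-in for the paper's parameter-free gradient-filtering defense, not the actual method.

```python
import torch

def explain(model, image, target, steps=200, lr=0.1):
    """image: (1, C, H, W); model returns logits of shape (1, num_classes)."""
    mask = torch.full_like(image, -3.0, requires_grad=True)  # start mostly masked out
    opt = torch.optim.Adam([mask], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        evidence = image * mask.sigmoid()        # retained evidence only
        loss = (-model(evidence)[0, target]      # keep the target prediction...
                + 0.05 * mask.sigmoid().mean())  # ...while keeping the mask sparse
        loss.backward()
        with torch.no_grad():                    # crude gradient filter:
            lim = mask.grad.abs().quantile(0.98)
            mask.grad.clamp_(-lim, lim)          # damp extreme (adversarial) updates
        opt.step()
    return mask.sigmoid().detach()               # per-pixel evidence map
```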
This work demonstrates that it is possible to prepare new, competitive thin-film composite (TFC) membranes with a polyolefin ultrafiltration membrane as support and a non-porous photo-cross-linked polyimide as separation layer for organic solvent nanofiltration. The commercial polyimide Lenzing P84® was modified by a polymer-analogous reaction to introduce side groups with carbon–carbon double bonds, increasing its photo-reactivity with respect to cross-linking. Polymer characterization revealed that this was achieved at an acceptable level of main-chain scission. The higher reactivity of the photo-cross-linkable polyimide was confirmed by comparison with the original polymer: shorter gelation times upon UV irradiation, stronger suppression of swelling by solvents, and complete stability in strong solvents such as dimethylformamide (DMF), which dissolve the non-cross-linked polyimide. For films from unmodified and modified polyimide, the degree of swelling in various solvents could be adjusted via the UV irradiation time. Photo-cross-linking of the original polyimide did not lead to stability in DMF. TFC membranes were prepared by casting a polymer solution on a polyethylene ultrafiltration membrane, UV-irradiating the liquid film, and subsequently evaporating the solvent. Polyimide barrier film thicknesses between 10 μm and 1 μm were obtained by varying the cast film thickness. Performance in organic solvent nanofiltration was analyzed using hexane, toluene, isopropanol, and DMF as solvents and two dyes with molar masses of ∼300 and ∼1000 g/mol. Permeances of TFC membranes from unmodified polyimide were low (<0.1 L/(h·m²·bar)), while rejections of up to 100% were achieved for the dye with ∼1000 g/mol. TFC membranes from modified and photo-cross-linked polyimide had adjustable separation performance in DMF, with a trade-off between permeance and selectivity, in the same range (e.g., 0.3 L/(h·m²·bar) and 97% rejection for the dye with ∼1000 g/mol) as a commercial conventional polyimide membrane tested in parallel. The established membrane preparation method is promising because, by tuning the degree of cross-linking of the polymeric barrier layer, the membrane separation performance can be tailored within the same manufacturing process.
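To make the reported permeance figures concrete: permeance P relates the permeate flux J to the transmembrane pressure Δp via J = P Δp. Assuming an illustrative operating pressure of 10 bar (a value not stated above), the cross-linked membrane's permeance of 0.3 L/(h·m²·bar) would correspond to:

```latex
% Illustrative arithmetic only; the 10 bar operating pressure is an assumption.
\[
  J = P \,\Delta p
    = 0.3\ \frac{\mathrm{L}}{\mathrm{h\,m^{2}\,bar}} \times 10\ \mathrm{bar}
    = 3\ \frac{\mathrm{L}}{\mathrm{h\,m^{2}}}
\]
```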