While any grasp must satisfy the grasping stability criteria, good grasps depend on the specific manipulation scenario: the object, its properties and functionalities, as well as the task and grasp constraints. We propose a probabilistic logic approach for robot grasping, which improves grasping capabilities by leveraging semantic object parts. It provides the robot with semantic reasoning skills about the most likely object part to be grasped, given the task constraints and object properties, while also dealing with the uncertainty of visual perception and grasp planning. The probabilistic logic framework is task-dependent. It semantically reasons about pre-grasp configurations with respect to the intended task and employs object-task affordances and object/task ontologies to encode rules that generalize over similar object parts and object/task categories. The use of probabilistic logic for task-dependent grasping contrasts with current approaches, which usually learn direct mappings from visual perceptions to task-dependent grasping points. The logic-based module receives data from a low-level module that extracts semantic object parts, and sends information to the low-level grasp planner. These three modules define our probabilistic logic framework, which is able to perform robotic grasping in realistic kitchen-related scenarios.
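The core idea of combining semantic part-task affordances with perception uncertainty can be illustrated with a toy scoring function. This is only an illustrative sketch under assumed inputs (the part names, affordance table, and confidence values below are all hypothetical); the actual framework uses a full probabilistic logic program with ontologies, not this simple product rule.

```python
def best_grasp_part(parts, task, affordances, perception_conf):
    """Toy sketch: score candidate object parts for a task by combining a
    semantic part-task affordance with the perception confidence of each
    detected part. All names and values are illustrative assumptions."""
    scores = {
        part: affordances.get((part, task), 0.0) * perception_conf[part]
        for part in parts
    }
    # Return the most likely part to grasp together with all scores.
    return max(scores, key=scores.get), scores


# Hypothetical example: pouring favors grasping the handle of a mug.
affordances = {("handle", "pour"): 0.9, ("body", "pour"): 0.2}
confidence = {"handle": 0.8, "body": 0.95}
part, scores = best_grasp_part(["handle", "body"], "pour", affordances, confidence)
```

Even with a lower perception confidence, the handle wins here because the task affordance dominates, which is the kind of trade-off the probabilistic rules encode.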
Object recognition, localization, and tracking are of fundamental importance in computer vision applications. They remain extremely difficult tasks, however, particularly in scenarios where objects are attended to using fast-moving UAVs that need to operate robustly in real time. The performance of such vision-based systems is typically affected by motion blur and geometric distortions, to name but two issues. Gimbal systems are thus essential to compensate for motion blur and ensure stable visual streams. In this work, we investigate the advantages of active tracking approaches using a three-degrees-of-freedom (DoF) gimbal system mounted on UAVs. We propose a method that utilizes joint movement and visual information for actively tracking spherical and planar objects in real time. Tracking methodologies are tested and evaluated in two different realistic Gazebo simulation environments: the first on 3D positional tracking (sphere) and the second on tracking of 6D poses (planar fiducial markers). We show that active object tracking is advantageous for UAV applications, first, by reducing motion blur caused by fast camera motion and vibrations and, second, by fixating the object of interest at the center of the field of view, thus reducing re-projection errors due to peripheral distortion. The results demonstrate significant object pose estimation accuracy improvements of active approaches over traditional passive ones. More specifically, a set of experiments suggests that active gimbal tracking can increase the spatial estimation accuracy of known-size moving objects under challenging motion patterns and in the presence of image distortion.
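The fixation behavior described above can be sketched as a simple proportional controller that turns the pixel offset of the tracked object into pan/tilt rate commands. This is a minimal illustrative sketch, not the paper's controller; the function name, gain, and small-angle conversion are assumptions.

```python
import numpy as np

def gimbal_rate_command(target_px, image_size, fov_deg, kp=1.5):
    """Illustrative proportional controller: drive the tracked object's
    pixel position toward the image center. Assumes a square-pixel camera
    and a small-angle approximation; gains and names are hypothetical."""
    w, h = image_size
    half_fov = np.radians(fov_deg) / 2.0
    # Pixel error relative to the principal point, normalized to [-1, 1].
    ex = (target_px[0] - w / 2) / (w / 2)
    ey = (target_px[1] - h / 2) / (h / 2)
    # Convert to approximate angular errors and apply the proportional gain.
    pan_rate = kp * ex * half_fov
    tilt_rate = kp * ey * half_fov
    return pan_rate, tilt_rate
```

An object to the right of center produces a positive pan rate, and a perfectly centered object produces zero commands, which keeps the target fixated and reduces peripheral distortion.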
When large vessels such as container ships approach their destination port, they are required by law to have a maritime pilot on board who is responsible for safely navigating the vessel to its desired location. The maritime pilot has extensive knowledge of the local area and of how currents and tides affect the vessel's navigation. In this work, we present a novel end-to-end solution for estimating the time-to-collision (TTC) between moving objects (i.e., vessels), using real-time image streams from aerial drones in dynamic maritime environments. Our method relies on deep features, which are learned using realistic simulation data, for reliable and robust object detection, segmentation, and tracking. Furthermore, our method uses rotated bounding box representations, which are computed by taking advantage of pixel-level object segmentation, for enhanced TTC estimation accuracy. We present collision estimates in an intuitive manner, as collision arrows that gradually turn red to indicate an imminent collision. A set of experiments in a realistic shipyard simulation environment demonstrates that our method can accurately, robustly, and quickly predict the TTC between dynamic objects seen from a top view, with a mean error and standard deviation of 0.358 and 0.114 s, respectively, in a worst-case scenario.
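The basic geometry behind TTC estimation between two tracked objects can be sketched under a constant-velocity assumption: the distance between the objects divided by their closing speed. This is a simplified stand-in for the paper's learned pipeline, with hypothetical function and variable names.

```python
import numpy as np

def time_to_collision(p1, v1, p2, v2):
    """Constant-velocity TTC sketch between two tracked objects.
    Positions and velocities are 2D vectors (e.g., from top-view tracks).
    Returns infinity when the objects are not closing in on each other."""
    p1, v1, p2, v2 = map(np.asarray, (p1, v1, p2, v2))
    rel_p = p2 - p1                   # relative position
    rel_v = v2 - v1                   # relative velocity
    # Closing speed: rate at which the separation distance shrinks.
    closing_speed = -np.dot(rel_p, rel_v) / (np.linalg.norm(rel_p) + 1e-9)
    if closing_speed <= 0:
        return float("inf")           # diverging or parallel tracks
    return float(np.linalg.norm(rel_p) / closing_speed)
```

Two vessels 10 m apart closing at 1 m/s yield a TTC of roughly 10 s; a color ramp on the collision arrows can then be driven directly by this value.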
In this work we study how information provided by foveated images, sampled according to the log-polar transformation, can be integrated over time in order to build accurate world representations and accomplish visual search tasks in an efficient manner. We focus on a specific visual information modality, depth, and on how to store it in a flexible memory structure. We propose a probabilistic observation model for a stereo system that relies on the Unscented Transform to propagate the uncertainty in stereo matching, due to spatial quantization in the retina, to the 3D Cartesian domain. Probabilistic depth measurements are integrated into a novel Sensory Ego-Sphere whose topology can be biased with foveal-like distributions, according to the autonomous agent's short-term tasks and goals. Furthermore, we investigate an Upper Confidence Bound algorithm for the task of simultaneously finding the closest object to the observer (visual search) and learning a 3D map of the surrounding environment (mapping). The performance of task execution is assessed both with a foveated log-polar sensor and with a classical uniform one. The advantages of foveal vision and custom ego-sphere representations are illustrated in a series of experiments with a realistic simulator.
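The log-polar sampling underlying the foveated sensor can be sketched as a coordinate mapping: resolution is uniform in (rho, theta) space, which corresponds to high resolution at the fovea and a logarithmic fall-off with eccentricity. A minimal sketch, with an assumed foveal radius parameter:

```python
import numpy as np

def cartesian_to_logpolar(x, y, rho0=1.0):
    """Map Cartesian retina coordinates to log-polar (rho, theta).
    rho0 is an assumed foveal radius inside which the mapping is clamped;
    uniform sampling in (rho, theta) then yields dense foveal coverage
    and logarithmically coarser sampling in the periphery."""
    r = np.hypot(x, y)
    rho = np.log(np.maximum(r, rho0) / rho0)  # log radial coordinate
    theta = np.arctan2(y, x)                  # angular coordinate
    return rho, theta
```

A point at eccentricity e (Euler's number) maps to rho = 1, while the entire foveal disk (r <= rho0) maps to rho = 0, which is the space-variant compression that makes the ego-sphere biasing effective.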
This article presents a complete solution for autonomous mapping and inspection tasks, namely a lightweight multi-camera drone design coupled with computationally efficient planning algorithms and environment representations for enhanced autonomous navigation in exploration and mapping tasks. The proposed system utilizes state-of-the-art Next-Best-View (NBV) planning techniques, with geometric and semantic segmentation information computed by Deep Convolutional Neural Networks (DCNNs), to improve the environment map representation. The main contributions of this article are the following. First, we propose a novel, efficient sensor observation model and a utility function that encodes the expected information gain from observations taken from specific viewpoints. Second, we propose a reward function that incorporates both geometric and semantic probabilistic information provided by a DCNN for semantic segmentation that operates close to real time. Incorporating semantics in the environment representation enables biasing exploration towards specific object categories while disregarding task-irrelevant ones during path planning. Experiments in both a virtual and a real scenario demonstrate the benefits, in reconstruction accuracy, of using semantics to bias exploration towards task-relevant objects, when compared with purely geometric state-of-the-art methods. Finally, we present a unified approach for selecting the number of cameras on a UAV, to optimize the balance between power consumption, flight-time duration, and exploration and mapping performance. Unlike previous design optimization approaches, our method is coupled with the sense and plan algorithms.
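The combination of geometric uncertainty and semantic relevance in a viewpoint utility can be sketched as a semantically weighted expected entropy over the voxels visible from a candidate view. This is only an illustrative sketch of the idea; the names and the exact weighting scheme are assumptions, not the paper's formulation.

```python
import numpy as np

def expected_gain(occ_probs, class_weights):
    """Sketch of an NBV-style utility: sum of per-voxel binary entropies
    (geometric uncertainty), each weighted by the semantic relevance of
    the class predicted for that voxel. Weight 0 disregards a voxel."""
    p = np.clip(np.asarray(occ_probs, dtype=float), 1e-6, 1 - 1e-6)
    # Binary entropy of each voxel's occupancy probability.
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return float(np.sum(np.asarray(class_weights, dtype=float) * entropy))
```

A maximally uncertain voxel (p = 0.5) contributes 1 bit if it belongs to a task-relevant class (weight 1) and nothing if its class is disregarded (weight 0), which is exactly how semantics biases exploration.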
The proposed system and general formulations can be applied to the mapping, exploration, and inspection of any type of environment, as long as environment-dependent semantic training data are available, with demonstrated successful applicability in the inspection of dry-dock shipyard environments.
In this work we present a novel end-to-end solution for tracking objects (i.e., vessels), using video streams from aerial drones, in dynamic maritime environments. Our method relies on deep features, which are learned using realistic simulation data, for robust object detection, segmentation, and tracking. Furthermore, we propose the use of rotated bounding-box representations, computed by taking advantage of pixel-level object segmentation, which improve tracking accuracy by reducing erroneous data associations when combined with appearance-based features. A thorough set of experiments in a realistic shipyard simulation environment demonstrates that our method can accurately and quickly detect and track dynamic objects seen from a top view.
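One common way to obtain a rotated bounding box from a pixel-level segmentation mask is to fit the box to the principal axes of the mask's pixels. The sketch below uses PCA via an SVD for this; it is an illustrative stand-in (OpenCV's minAreaRect is a frequent alternative), not necessarily the paper's exact procedure.

```python
import numpy as np

def rotated_bbox_from_mask(mask):
    """Fit an oriented bounding box to a binary segmentation mask via PCA.
    Returns (center_xy, (extent_major, extent_minor), angle_radians).
    Illustrative sketch; a minimum-area rectangle is a common alternative."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    center = pts.mean(axis=0)
    # Principal axes of the pixel distribution give the box orientation.
    _, _, vt = np.linalg.svd(pts - center)
    proj = (pts - center) @ vt.T          # pixels in the box's own frame
    size = proj.max(axis=0) - proj.min(axis=0)
    angle = np.arctan2(vt[0, 1], vt[0, 0])
    return center, size, angle
```

For an elongated vessel seen from above, the major axis of the box follows the hull, so two nearby ships with different headings produce clearly distinct boxes, which is what reduces erroneous data associations.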
In order to explore and understand the surrounding environment in an efficient manner, humans have developed a set of space-variant vision mechanisms that allow them to actively attend to different locations in the surrounding environment and to compensate for memory, neuronal transmission bandwidth, and computational limitations in the brain. Similarly, humanoid robots deployed in everyday environments have limited on-board resources and are faced with increasingly complex tasks that require interaction with objects arranged in many possible spatial configurations. The main goal of this work is to describe and survey the benefits of biologically inspired, space-variant human visual mechanisms when combined with state-of-the-art algorithms for different visual tasks (e.g., object detection), ranging from low-level hardwired attention vision (i.e., foveal vision) to high-level visual attention mechanisms. We review the state of the art in biologically plausible, space-variant, resource-constrained vision architectures, namely for active recognition and localization tasks.
To enable flexible quality inspection in industrial manufacturing environments, there is an increasing demand for easy-to-use automation systems that can aid factory workers in repetitive tasks. However, quality-control tasks may be too complex for a single classification algorithm to deliver results comparable to a human operator. Therefore, we propose a computer-vision architecture that improves the flexibility of quality inspection tasks by first detecting objects of interest, estimating their poses, and defining regions of interest before the classification task takes place. The proposed hybrid architecture consists of a deep-learning-based object detection and segmentation model and an edge-based pose estimation method. We validate our approach on texture-less, reflective, and symmetric vehicle metal sheets, which are challenging for state-of-the-art object detection and pose estimation methods. Furthermore, since annotating the data required by these methods is laborious and time-consuming, we train our object detection and pose estimation architectures on 3D synthetic datasets based on available CAD models. We demonstrate promising domain generalization results at the object detection stage of our architecture (Mask R-CNN): while trained only with synthetic data, it reaches a bounding-box mAP of 0.781 on synthetic images and an mAP of 0.686 on real images. On the other hand, our simple edge-based pose estimation method can cope with texture-less parts even when using synthetic data as reference and being evaluated on real images. Our custom edge-based pose estimator reaches an average rotation error of 16° and an average translation error of 0.14 m on real images.
In this paper we propose algorithms for 3D object recognition from 3D point clouds of rotationally symmetric objects. We base our work on a recent method that represents objects using a hash table of shape features, which allows efficient matching of features that vote for object pose hypotheses. In the case of symmetric objects, the rotation angle about the axis of symmetry does not provide any information, so the hash table contains redundant information. We propose a way to remove redundant features by adding a weight factor for each set of symmetric features. The removal procedure leads to significant computational savings, both in storage and in time, while preserving recognition performance. We analyze the theoretical storage gains and compare them against the practical ones. We also compare the execution time gains in feature matching and pose clustering. The experiments show storage gains of up to 100× and execution time savings of up to 3500× with respect to state-of-the-art methods.
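The redundancy-removal idea can be sketched as collapsing hash-table entries that differ only in the rotation angle about the symmetry axis, while keeping a weight (multiplicity) per surviving entry so that Hough-style pose votes stay correctly scaled. The feature tuple and canonicalization below are simplified assumptions, not the paper's exact feature definition.

```python
from collections import defaultdict

def collapse_symmetric_features(features):
    """Sketch: merge hash-table entries that are equivalent under the
    object's rotational symmetry. Each input feature is assumed to be
    (distance, normal_angle_1, normal_angle_2, angle_about_symmetry_axis);
    the last component carries no information for symmetric objects, so
    it is dropped and a per-key weight records how many features merged."""
    table = defaultdict(int)
    for d, n1, n2, _angle_about_axis in features:
        # Canonicalize: discard the uninformative rotation about the axis.
        key = (round(d, 3), round(n1, 3), round(n2, 3))
        table[key] += 1    # weight = number of merged symmetric features
    return dict(table)
```

Three features identical up to the symmetry angle collapse into a single entry with weight 3, which is the source of the storage savings while the weights keep the voting scores unchanged.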