Picking tasks in logistics warehouses require handling many objects of various types, and the number of object types increases daily. High generalization performance is therefore required for object detection in bin-picking systems in logistics warehouses, but conventional methods have yet to meet this requirement. We propose a Multi-modal Mask R-CNN (M3R-CNN) and a training method for this purpose. M3R-CNN is a network for the instance-segmentation task that takes RGB and depth images as input and achieves high generalizability with a small amount of training data. We trained this network on 561 scenes of training data using our proposed method and obtained a recognition accuracy of F1-score = 0.631 and mAP = 0.958 for unknown objects. We also performed an object-grasping experiment with a robot using the M3R-CNN and obtained an availability score of 0.97.
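As a purely illustrative sketch of the multi-modal idea in the abstract above, the snippet below fuses an RGB image and a depth map with two small convolutional branches before feature extraction; the branch layout, channel counts, and the RGBDFusionStem name are assumptions, not the published M3R-CNN architecture.

# Minimal sketch of one way to fuse RGB and depth inputs for an
# instance-segmentation backbone; the actual M3R-CNN fusion design is
# not described here, so this two-branch early fusion is an assumption.
import torch
import torch.nn as nn

class RGBDFusionStem(nn.Module):
    """Encodes RGB and depth separately, then concatenates the feature maps."""
    def __init__(self, out_channels=64):
        super().__init__()
        self.rgb_branch = nn.Sequential(
            nn.Conv2d(3, out_channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, out_channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb, depth):
        # rgb: (B, 3, H, W), depth: (B, 1, H, W)
        fused = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return fused  # (B, out_channels, H, W), fed to the detection backbone

if __name__ == "__main__":
    stem = RGBDFusionStem()
    rgb = torch.randn(2, 3, 128, 128)
    depth = torch.randn(2, 1, 128, 128)
    print(stem(rgb, depth).shape)  # torch.Size([2, 64, 128, 128])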
Pose estimation is a particularly important link in the task of robotic bin-picking. Its purpose is to obtain the 6D pose (3D position and 3D orientation) of the target object. In real bin-picking scenarios, noise, overlap, and occlusion degrade the accuracy of pose estimation and lead to failures in robot grasping. In this paper, a new point-pair feature (PPF) descriptor is proposed in which curvature information of point pairs is introduced to strengthen the feature description and improve the point cloud matching rate. The proposed method also introduces an effective point cloud preprocessing step that extracts candidate targets in complex scenarios and thus improves the overall computational efficiency. By combining it with the curvature distribution, a weighted voting scheme is presented to further improve the accuracy of pose estimation. Experimental results on a public dataset and in real scenarios show that the proposed method is both more accurate and more efficient than the existing PPF method. The proposed method can be used for robotic bin-picking in real industrial scenarios.
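To make the descriptor idea concrete, the sketch below computes the classic four-dimensional point-pair feature and appends the two per-point curvature values; treating curvature as two extra feature dimensions is an assumption about how the description is strengthened, not the paper's exact formulation.

# Sketch of a point-pair feature (PPF) extended with per-point curvature terms.
import numpy as np

def angle(v1, v2):
    """Angle in [0, pi] between two vectors."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def curvature_ppf(p1, n1, c1, p2, n2, c2):
    """p*: 3D points, n*: unit normals, c*: estimated curvatures."""
    d = p2 - p1
    f1 = np.linalg.norm(d)   # pair distance
    f2 = angle(n1, d)        # angle between n1 and the pair direction
    f3 = angle(n2, d)        # angle between n2 and the pair direction
    f4 = angle(n1, n2)       # angle between the two normals
    return np.array([f1, f2, f3, f4, c1, c2])  # curvatures appended (assumption)

if __name__ == "__main__":
    p1, n1, c1 = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]), 0.02
    p2, n2, c2 = np.array([0.1, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), 0.15
    print(curvature_ppf(p1, n1, c1, p2, n2, c2))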
Object segmentation is a common task in bin-picking. Region-growing-based methods have proven applicable to ordinary tasks, but they are not suitable for closely adjacent and stacked scenes. In this paper, we propose an inward-region-growing-based accurate partitioning method for bin-picking. A boundary bud generation algorithm is proposed for detecting the boundary initial points of closely adjacent objects for region growing. Then, a simplified growing algorithm, namely the oriented unrestrained growing algorithm, is proposed to limit the growing direction to the inward direction and accelerate the growing process. Experimental results demonstrate that the proposed method achieves higher accuracy and speed than existing methods, especially in closely adjacent scenes.
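A minimal sketch of region growing with an inward-direction constraint is given below; the boundary-bud detection and the oriented unrestrained growing rule are not reproduced, and the simple dot-product test against an assumed inward direction is only an illustrative stand-in.

# Region growing from a boundary seed, accepting only roughly inward steps.
import numpy as np
from scipy.spatial import cKDTree

def grow_inward(points, seed_idx, inward_dir, radius=0.01):
    """Grow a region from points[seed_idx], only stepping roughly inward."""
    inward_dir = inward_dir / (np.linalg.norm(inward_dir) + 1e-12)
    tree = cKDTree(points)
    region, frontier = {seed_idx}, [seed_idx]
    while frontier:
        i = frontier.pop()
        for j in tree.query_ball_point(points[i], r=radius):
            if j in region:
                continue
            step = points[j] - points[i]
            # accept only neighbours that do not move against the inward direction
            if np.dot(step, inward_dir) >= 0.0:
                region.add(j)
                frontier.append(j)
    return sorted(region)

if __name__ == "__main__":
    pts = np.random.rand(500, 3) * 0.1
    print(len(grow_inward(pts, seed_idx=0, inward_dir=np.array([1.0, 0.0, 0.0]))))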
Instance segmentation is an important pre-processing task in numerous real-world applications, such as robotics, autonomous vehicles, and human–computer interaction. Compared with the rapid development of deep learning for two-dimensional (2D) image tasks, deep learning-based instance segmentation of 3D point clouds still has much room for development. In particular, distinguishing a large number of occluded objects of the same class is a highly challenging problem, as seen in robotic bin-picking. In a typical bin-picking scene, many identical objects are stacked together and the object model is known. Thus, the semantic information can be ignored; instead, the focus in bin-picking is on the segmentation of instances. Based on this task requirement, we propose Fast Point Cloud Clustering (FPCC) for instance segmentation of industrial bin-picking scenes. FPCC includes a network named FPCC-Net and a fast clustering algorithm. FPCC-Net extracts features of each point and simultaneously infers the geometric center point of each instance. The proposed clustering algorithm then assigns the remaining points to the closest geometric center in the feature embedding space. Experiments show that FPCC surpasses existing works in bin-picking scenes and is more computationally efficient. Our code and data are available at https://github.com/xyjbaal/FPCC.
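The clustering step can be illustrated as follows: given per-point embeddings and the inferred geometric-center embeddings, every point is assigned to its nearest center in the embedding space. The embedding dimensionality and the data below are placeholders, not FPCC-Net outputs.

# Assign each point to the nearest inferred instance center in embedding space.
import numpy as np

def cluster_to_centers(point_embeddings, center_embeddings):
    """
    point_embeddings:  (N, D) per-point features from the network
    center_embeddings: (K, D) features of the inferred instance centers
    returns:           (N,) instance label for each point
    """
    # pairwise squared distances between every point and every center
    diff = point_embeddings[:, None, :] - center_embeddings[None, :, :]
    dist2 = np.einsum("nkd,nkd->nk", diff, diff)
    return np.argmin(dist2, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(3, 16))                 # 3 instances, 16-D embedding
    points = np.repeat(centers, 100, axis=0) + 0.05 * rng.normal(size=(300, 16))
    labels = cluster_to_centers(points, centers)
    print(np.bincount(labels))  # roughly 100 points per instance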
Industrial robots have been utilized for factory automation due to their high repeatability. Along with the development of visual servoing and machine learning techniques, various vision-based autonomous pick-and-place methods have been presented. However, unlike parts piled in a bin in a manufacturing environment, these techniques have mostly been studied in less cluttered environments and require huge datasets for training. This research proposes a robot bin-picking platform that starts from an initial Convolutional Neural Network (CNN) model trained on human-labeled data and then increases the model's accuracy through continuous training by the robot itself. In the human part, a user determines the pickable and non-pickable parts from a depth image obtained by a lidar sensor, and a reference 3D partial point cloud of a block is generated for the Iterative Closest Point (ICP) algorithm. Next, the autonomous part trains a CNN model with the initial human data, attempts pick operations autonomously, and repeatedly retrains the CNN model on the data it collects. In experiments, the success rate increased from 74%, obtained with only the initial human data, to 87% after self-learning with a dataset of 2000 samples. This platform is expected to enable an autonomous robotic bin-picking system without CAD models and with less effort spent preparing labeled datasets for training deep learning models.
•A robotic bin picking of cluttered objects is proposed, utilizing human skills to teach autonomy.
•The platform consists of a human interface for creating the initial convolutional neural network model and its self-learning.
•The success rate of bin picking increased from 74% with human data to 87% after self-learning with a dataset of 2000 samples.
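A schematic sketch of the self-learning loop described above is shown below: a model is first fitted to human-labelled depth patches, then the robot attempts picks and retrains on the outcomes it collects. The tiny logistic-regression classifier and the simulated pick outcome are placeholders for the paper's CNN and real hardware.

# Self-learning loop: initial human-labelled data, then robot-collected outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def attempt_pick(patch):
    """Placeholder for a real pick attempt; returns 1 on success, 0 on failure."""
    return int(patch.mean() > 0.5)  # hypothetical success criterion

# 1) initial model from human-labelled data
human_patches = rng.random((200, 64))           # flattened depth patches
human_labels = (human_patches.mean(axis=1) > 0.5).astype(int)
model = LogisticRegression(max_iter=1000).fit(human_patches, human_labels)

# 2) self-learning: collect robot outcomes and retrain
patches, labels = list(human_patches), list(human_labels)
for _ in range(100):
    patch = rng.random(64)
    if model.predict(patch[None])[0] == 1:      # model thinks this spot is pickable
        outcome = attempt_pick(patch)           # robot tries and reports the result
        patches.append(patch)
        labels.append(outcome)
model = LogisticRegression(max_iter=1000).fit(np.array(patches), np.array(labels))
print("dataset size after self-learning:", len(labels))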
•A point cloud based two-stage deep learning method is proposed to solve the pose estimation problem in cluttered and occluded scenes; it restricts the pose prediction to an individual object and improves the accuracy and efficiency of the pose estimation.
•An efficient physically-simulated engine is constructed to generate a synthetic point cloud dataset instead of a hand-annotated dataset for industrial objects, which greatly reduces the cost of dataset generation in the real scene.
•The distance feature branch and the semantic branch are fused into the instance feature to enlarge the difference between different instances, which improves the robustness of the semantic instance segmentation network.
•A point cloud projection-based pose estimation network is constructed to predict the instance point cloud pose, and the pose estimation accuracy is improved by fusing the depth and normal feature maps.
•A loss function for training the pose estimation network is specially designed for symmetric objects, which avoids incorrect pose predictions caused by the symmetry of the object.
3D object pose estimation for robotic grasping and manipulation is a crucial task in the manufacturing industry. In cluttered and occluded scenes, 6D pose estimation of low-textured or textureless industrial objects is a challenging problem due to the lack of color information. Thus, point clouds, which are hardly affected by lighting conditions, are gaining popularity as an alternative input for pose estimation. This article proposes a deep learning-based pose estimation method that takes point clouds as input and consists of instance segmentation and instance point cloud pose estimation. The instance segmentation divides the scene point cloud into multiple instance point clouds, and the pose of each instance point cloud is accurately predicted by fusing the depth and normal feature maps. To reduce the time needed for dataset acquisition and annotation, a physically-simulated engine is constructed to generate the synthetic dataset. Finally, several experiments are conducted on public, synthetic, and real datasets to verify the effectiveness of the pose estimation network. The experimental results show that the point cloud based pose estimation network can effectively and robustly predict the poses of objects in cluttered and occluded scenes.
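One common way to realize the symmetry-aware loss mentioned in the highlights is to take the minimum model-point distance over the object's known symmetry rotations, as sketched below; the discrete-symmetry assumption and the averaging scheme are illustrative, not the paper's exact loss.

# Pose loss taken as the minimum over an object's symmetry rotations.
import numpy as np

def symmetric_pose_loss(R_pred, t_pred, R_gt, t_gt, model_points, symmetries):
    """
    R_*: (3, 3) rotations, t_*: (3,) translations,
    model_points: (M, 3) object model points,
    symmetries: list of (3, 3) rotations mapping the object onto itself.
    """
    pred = model_points @ R_pred.T + t_pred
    losses = []
    for S in symmetries:
        gt = model_points @ (R_gt @ S).T + t_gt
        losses.append(np.mean(np.linalg.norm(pred - gt, axis=1)))
    return min(losses)

if __name__ == "__main__":
    pts = np.random.rand(100, 3)
    I = np.eye(3)
    flip = np.diag([-1.0, -1.0, 1.0])   # 180-degree symmetry about z
    # a prediction that differs from ground truth only by the symmetry rotation
    print(symmetric_pose_loss(flip, np.zeros(3), I, np.zeros(3), pts, [I, flip]))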
Bin picking is still a challenge in robotics, as is apparent in recent robot competitions. These competitions are an excellent platform for technology comparisons, since some participants may use state-of-the-art technologies while others use conventional ones. Nevertheless, even though points are awarded or subtracted based on performance within the framework of the competition rules, the final score does not directly reflect the suitability of the technology. It is therefore difficult to understand which technologies, and which combinations of them, are optimal for various real-world problems. In this paper, we propose a set of performance metrics selected in terms of actual field use as a means to clarify the important technologies in bin picking. Moreover, we use the selected metrics to compare our four original robot systems, which achieved the best performance in the Stow task of the Amazon Robotics Challenge 2017. Based on this comparison, we discuss which technologies are best suited for practical use in bin-picking robots in the fields of factory and warehouse automation.
Suction cups are an important gripper type in industrial robot applications, and the prior literature focuses on using vision-based planners to improve grasping success in these tasks. Vision-based planners can fail on adversarial objects or lose generalizability in unseen scenarios unless the learned algorithms are retrained. In this article, we propose haptic exploration to improve suction cup grasping when visual grasp planners fail. We present the smart suction cup, an end effector that uses internal flow measurements for tactile sensing. We show that model-based haptic search methods guided by these flow measurements improve grasping success by up to 2.5× compared with using only a vision planner during a bin-picking task. In characterizing the smart suction cup on both geometric edges and curves, we find that flow rate can accurately predict the ideal motion direction even with large postural errors. The smart suction cup contains no electronics on the cup itself, so the design is easy to fabricate and haptic exploration does not damage the sensor. This work motivates the use of suction cups with autonomous haptic search capabilities in especially adversarial scenarios.
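A toy sketch of flow-guided haptic search is given below: the cup probes small lateral offsets, reads the leak flow at each, and moves toward the lowest reading until a seal threshold is met. The flow model, step size, and thresholds are made-up stand-ins for the real sensor and the paper's model-based search.

# Greedy lateral search that minimizes a (hypothetical) leak-flow reading.
import numpy as np

def leak_flow(position, seal_center=np.array([0.03, -0.02])):
    """Hypothetical flow reading: grows with distance from a good seal point."""
    return float(np.linalg.norm(position - seal_center))

def haptic_search(start, step=0.005, flow_threshold=0.003, max_iters=50):
    pos = np.array(start, dtype=float)
    directions = [np.array(d) for d in ([step, 0], [-step, 0], [0, step], [0, -step])]
    for _ in range(max_iters):
        if leak_flow(pos) < flow_threshold:
            return pos, True                      # seal achieved
        # probe each direction and move where the measured flow is smallest
        pos = min((pos + d for d in directions), key=leak_flow)
    return pos, False

if __name__ == "__main__":
    final, sealed = haptic_search(start=[0.0, 0.0])
    print(final, sealed)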
Real-time robotic grasping that supports a subsequent precise object-in-hand operation is a priority target for highly advanced autonomous systems. However, an algorithm that can perform sufficiently accurate grasping with time efficiency has yet to be found. This paper proposes a novel two-stage method that combines fast 2D object recognition using a deep neural network with a subsequent accurate and fast 6D pose estimation based on the Point Pair Feature framework, forming a real-time 3D object recognition and grasping solution that can handle multi-object-class scenes. The proposed solution has the potential to perform robustly in real-time applications that require both efficiency and accuracy. To validate our method, we conducted extensive and thorough experiments, including laborious preparation of our own dataset. The results show that the proposed method scores 97.37% accuracy on the 5cm5deg metric and 99.37% on the Average Distance metric, an overall relative improvement of 62% (5cm5deg metric) and 52.48% (Average Distance metric). Moreover, pose estimation execution showed an average running-time improvement of 47.6%. Finally, to illustrate the overall efficiency of the system in real-time operations, a pick-and-place robotic experiment was conducted and showed a convincing success rate of 90%. A video of this experiment is available at https://sites.google.com/view/dl-ppf6dpose/.
•Towards robust and accurate real-time robotic grasping.
•A custom pipeline coordinates submodules to form a 3D object recognition system.
•Performance achieves high metric scores: 97.37% (5cm5deg) and 99.37% (ADD).
•A large relative improvement in detection metrics: 5cm5deg (62%) and ADD (52.48%).
•A large average running-time improvement of 47.6%.
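For reference, the 5cm5deg metric reported above counts a predicted pose as correct when the translation error is below 5 cm and the rotation error below 5 degrees; the sketch below uses this standard definition, with metric units (meters, degrees) assumed for the evaluation setup.

# 5cm5deg pose-correctness check from predicted and ground-truth poses.
import numpy as np

def pose_correct_5cm5deg(R_pred, t_pred, R_gt, t_gt):
    t_err = np.linalg.norm(t_pred - t_gt)                         # meters
    cos_angle = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))  # degrees
    return t_err < 0.05 and r_err < 5.0

if __name__ == "__main__":
    R = np.eye(3)
    # a 3-degree rotation about z and a 2 cm translation error count as correct
    a = np.radians(3.0)
    R_pred = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a),  np.cos(a), 0.0],
                       [0.0,        0.0,       1.0]])
    print(pose_correct_5cm5deg(R_pred, np.array([0.02, 0.0, 0.0]), R, np.zeros(3)))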
Most supervised learning-based pose estimation methods for stacked scenes are trained on massive synthetic datasets. In most cases, the challenge is that a network learned on the training dataset is no longer optimal on the testing dataset. To address this problem, we propose a pose regression network, PPR-Net++. It transforms each scene point into a point in the centroid space, followed by a clustering process and a voting process. In the training phase, a mapping function between the network's critical parameter (i.e., the bandwidth of the clustering algorithm) and the compactness of the centroid distributions is obtained. This function is used to adapt the bandwidth between the centroid distributions of two different domains. In addition, to further improve the pose estimation accuracy, the network also predicts the confidence of each point based on its visibility and pose error. Only the points with high confidence are allowed to vote for the final object pose. In experiments, our method is trained on the IPA synthetic dataset and compared with the state-of-the-art algorithm. When tested on the public synthetic Siléane dataset, our method is better on all eight objects, five of which are improved by more than 5% in average precision (AP). On the IPA real dataset, our method outperforms the state of the art by a large margin of 20%. This lays a solid foundation for robot grasping in industrial scenarios.

Note to Practitioners: Our work is motivated by industrial product assembly based on robot grasping. Industrial parts are usually manufactured by numerically controlled machines and piled in bins. Our method can estimate the poses of visible parts accurately; a pose of a part includes its centroid and spatial orientation. Combined with a depth camera, this algorithm allows an industrial robot to understand complex stacked scenes. We improve the pose estimation accuracy in order to assemble parts with robot grasping, without an additional pose adjuster. Our network can learn from a synthetic dataset and be applied to real-world data without a significant drop in accuracy. The synthetic dataset can be obtained easily with computer simulation programs, so the training data are sufficient. Experiments demonstrate that our method outperforms state-of-the-art pose estimation approaches.
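The clustering-and-voting idea can be sketched as follows: each scene point predicts an instance centroid and a confidence, predicted centroids are grouped within a bandwidth, and only confident points vote for the final centroid via a weighted mean. The greedy grouping, fixed bandwidth, and thresholds below are simplifications, not PPR-Net++'s actual clustering or bandwidth adaptation.

# Group predicted centroids within a bandwidth, then let confident points vote.
import numpy as np

def cluster_and_vote(pred_centroids, confidences, bandwidth=0.03, conf_thresh=0.5):
    """pred_centroids: (N, 3), confidences: (N,). Returns a list of instance centroids."""
    order = np.argsort(-confidences)            # seed clusters from confident points
    assigned = np.full(len(pred_centroids), -1)
    seeds = []
    for i in order:
        for k, s in enumerate(seeds):
            if np.linalg.norm(pred_centroids[i] - s) < bandwidth:
                assigned[i] = k
                break
        else:
            assigned[i] = len(seeds)
            seeds.append(pred_centroids[i])
    instance_centroids = []
    for k in range(len(seeds)):
        mask = (assigned == k) & (confidences > conf_thresh)   # only confident voters
        if mask.any():
            w = confidences[mask]
            instance_centroids.append((pred_centroids[mask] * w[:, None]).sum(0) / w.sum())
    return instance_centroids

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
    preds = np.vstack([true[0] + 0.005 * rng.normal(size=(50, 3)),
                       true[1] + 0.005 * rng.normal(size=(50, 3))])
    conf = rng.uniform(0.4, 1.0, size=100)
    print(cluster_and_vote(preds, conf))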