With the increasing presence of robots in daily life, there is strong demand for strategies that improve the quality of interaction between robots and users by enabling robots to understand users' mood, intention, and other aspects. During human-human interaction, personality traits have an important influence on behavior, decisions, mood, and more. Therefore, we propose an efficient computational framework that endows the robot with the capability of understanding the user's personality traits based on the user's nonverbal communication cues, represented by three visual features (head motion, gaze, and body motion energy) and three vocal features (voice pitch, voice energy, and mel-frequency cepstral coefficients (MFCC)). We used the Pepper robot as a communication robot that interacts with each participant by asking questions while extracting the nonverbal features from the participant's habitual behavior using its on-board sensors. In parallel, each participant's personality traits are evaluated with a questionnaire. We then train ridge regression and linear support vector machine (SVM) classifiers using the nonverbal features and the questionnaire-derived personality trait labels, and evaluate the performance of the classifiers. The proposed models showed promising binary classification performance in recognizing each of the Big Five personality traits of the participants based on individual differences in nonverbal communication cues.
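As a rough illustration of the classification step, the sketch below fits a ridge regression model to binary trait labels (+1 for high, -1 for low) and classifies by the sign of the score. It covers only the ridge variant, not the linear SVM. The two-dimensional toy features, the `alpha` value, and all function names are assumptions made for this example and do not come from the study.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(features, labels, alpha=1.0):
    """Closed-form ridge weights w = (X^T X + alpha I)^-1 X^T y.

    A constant 1.0 column is appended so the last weight acts as a bias.
    For simplicity the bias is regularized as well.
    """
    x = [row + [1.0] for row in features]
    d = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (alpha if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(x, labels)) for i in range(d)]
    return solve_linear(xtx, xty)


def predict(weights, feature_row):
    """Binary decision: +1 if the regression score is non-negative, else -1."""
    score = sum(w * v for w, v in zip(weights, feature_row + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical toy data: [normalized head motion, normalized voice energy].
features = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9],
            [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
labels = [1, 1, 1, -1, -1, -1]  # +1 = high trait score, -1 = low

weights = fit_ridge(features, labels, alpha=0.1)
print([predict(weights, row) for row in features])
```

In practice one binary model of this kind would be trained per Big Five trait, with labels obtained by thresholding the questionnaire scores; the thresholding rule is not specified in the abstract.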
Image registration is one of the most fundamental and widely used tools in optical mapping applications. It is mostly achieved by extracting and matching salient points (features) described by vectors (feature descriptors) from images. While matching the descriptors, mismatches (outliers) occur. Probabilistic methods are then applied to remove the outliers and to find the transformation (motion) between images. These methods work iteratively. In this paper, an efficient way of integrating geometric invariants into feature-based image registration is presented, aiming to improve the performance of image registration in terms of both computational time and accuracy. To do so, geometrical properties that are invariant to coordinate transforms are studied. This would benefit all methods that use image registration as an intermediate step. Experimental results are presented using both semi-synthetically generated data and real image pairs from underwater environments.
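To make the idea of a geometric invariant concrete, the following sketch filters matches using one simple invariant: under a similarity transform (rotation, translation, uniform scale), the ratio between corresponding point-to-point distances is the same for every pair of correct matches. This is an illustrative assumption; the abstract does not state which invariants the paper uses, and the function name, tolerance, and support threshold are hypothetical.

```python
import math
from statistics import median


def distance_ratio_filter(src, dst, tol=0.05, min_support=0.5):
    """Return indices of matches whose pairwise distance ratios agree
    with the dominant (median) scale between the two images."""
    n = len(src)
    ratios = {}
    for i in range(n):
        for j in range(i + 1, n):
            d_src = math.dist(src[i], src[j])
            if d_src > 1e-9:  # skip coincident source points
                ratios[(i, j)] = math.dist(dst[i], dst[j]) / d_src
    if not ratios:
        return []
    scale = median(ratios.values())
    support = [0] * n
    for (i, j), ratio in ratios.items():
        if abs(ratio - scale) <= tol * scale:
            support[i] += 1
            support[j] += 1
    # Each match takes part in n - 1 pairs; keep it if enough of them agree.
    return [i for i in range(n) if support[i] >= min_support * (n - 1)]


# Toy matches: dst = 90 degree rotation, scale 2, translation (5, 3) of src,
# except the last match, which is deliberately corrupted.
src = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1)]
dst = [(5, 3), (5, 5), (3, 3), (3, 5), (5, 7), (10, 10)]

inliers = distance_ratio_filter(src, dst)
print(inliers)
```

A filter like this runs once, without random sampling, so it could be applied before an iterative probabilistic estimator to reduce the number of outliers that estimator has to handle.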
Object rearrangement is widely demanded in many of the manipulation tasks performed by industrial and service robots. Rearranging an object through planar pushing is deemed more energy efficient and safer than the pick-and-place operation. However, because the physical properties of the object are unknown, rearranging an object toward the target position is difficult to accomplish. Even though robots can benefit from multimodal sensory data for estimating novel object dynamics, the exact estimation error bound is still unknown. In this work, we first demonstrate a way to obtain an error bound on the center of mass (CoM) estimation for a novel object using only a position-controlled robot arm and a vision sensor. Specifically, we extend Mason's Voting Theorem to object CoM estimation in the absence of accurate information on friction and object shape. The probable CoM locations are monotonically narrowed down to a convex region, and the extended voting theorems guarantee that the convex region contains the CoM ground truth in the presence of contact normal estimation error and pushing execution error. For the object translation task, existing methods generally assume that the pusher-object system's physical properties and full-state feedback are available, or rely on iterative pushing executions, which limits the application of planar pushing to real-world settings. In this work, assuming a nominal friction coefficient between the pusher and object through contact normal error bound analysis, we leverage the estimated convex region and the Zero Moment Two Edge Pushing method (Gao et al., 2023) to select the contact configurations for pure object translation. The selected contact configurations are ensured to tolerate the CoM estimation error. The experimental results show that the object can be accurately translated to the target position with at most two controlled pushes.
As dynamic human–robot interaction becomes increasingly prevalent in daily life, there is great demand for enabling the robot to better understand human personality traits and for inspiring humans to be more engaged in the interaction with the robot. Therefore, in this work, as we design the paradigm of human–robot interaction to be as close to the real situation as possible, the following three main problems are addressed: (1) fusion of visual and audio features of human interaction modalities, (2) integration of variable-length feature vectors, and (3) compensation of shaky camera motion caused by the robot's communicative gestures. Specifically, the three most important visual features of humans, namely head motion, gaze, and body motion, were extracted from a camera mounted on the robot while it performed verbal and body gestures during the interaction. Then, our system fused these visual features with different types of vocal features, such as voice pitch, voice energy, and Mel-Frequency Cepstral Coefficients, while handling multiple feature vectors of variable length. Lastly, considering unknown patterns and sequential characteristics of human communicative behavior, we proposed a multi-layer Hidden Markov Model that improved the classification accuracy of personality traits and offered notable advantages in fusing the multiple features. The results were thoroughly analyzed and supported by psychological studies. The proposed multi-modal fusion approach is expected to deepen the communicative competence of social robots interacting with humans from different cultures and backgrounds.
• Human personality trait recognition from nonverbal cues in Human–Robot Interaction.
• A framework for extracting visual features of human motion with robot camera motion compensation.
• Multi-modal feature fusion approach to improving personality trait classification accuracy.
• A multi-layer Hidden Markov Model for autonomous nonverbal feature selection.
• Analysis (and interpretation) of experimental results, reflecting on psychological studies.
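One reason Hidden Markov Models suit variable-length behavioral sequences is that the forward algorithm assigns a likelihood to a sequence of any length, so one model per class can be compared directly. The sketch below shows that mechanism with a plain single-layer, discrete-observation HMM. It is not the multi-layer model proposed in the paper, and the two toy models, the binary observation symbols, and the function names are assumptions made for illustration.

```python
def sequence_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(obs | model) for a discrete-observation HMM.

    No scaling is applied, so this is only suitable for short sequences.
    """
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for symbol in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][symbol]
                 for s in states]
    return sum(alpha)


# Hypothetical observation symbols: 0 = little motion, 1 = large motion.
START = [0.5, 0.5]
TRANS = [[0.8, 0.2], [0.2, 0.8]]
MODELS = {
    # Both hidden states of the "high" model favor symbol 1, and vice versa.
    "high": {"start": START, "trans": TRANS, "emit": [[0.3, 0.7], [0.1, 0.9]]},
    "low": {"start": START, "trans": TRANS, "emit": [[0.7, 0.3], [0.9, 0.1]]},
}


def classify(obs):
    """Pick the class whose HMM gives the observation sequence the highest likelihood."""
    return max(MODELS, key=lambda name: sequence_likelihood(obs, **MODELS[name]))


print(classify([1, 1, 0, 1, 1, 1]))  # six frames
print(classify([0, 0, 1]))           # three frames, same models
```

Because the likelihood is defined for any sequence length, no padding or truncation of the feature sequences is needed before classification.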
Image mosaicing sits at the core of many optical mapping applications with mobile robotic platforms. As these platforms evolve rapidly and increase their capabilities, the amount of data they are able to collect is increasing drastically. For this reason, the need for efficient methods to handle and process such big data has been rising in different scientific fields where the optical data provides valuable information. One of the challenging steps of image mosaicing is finding the best image-to-map (or mosaic) motion (represented as a planar transformation) for each image while considering the constraints imposed by inter-image motions. This problem is referred to as Global Alignment (GA) or Global Registration, and it usually requires a non-linear minimization. In this paper, following the aforementioned motivations, we propose a two-step global alignment method to obtain globally coherent mosaics with less computational cost and time. It first estimates the scale and rotation parameters and then the translation parameters. Although it requires a non-linear minimization, the Jacobians are simple to compute and do not contain the positions of correspondences, which saves computational cost and time. It can also be used as a fast way to obtain an initial estimate for the Symmetric Transfer Error Minimization (STEMin) approach. We present experimental and comparative results on different datasets obtained by robotic platforms for mapping purposes.
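The separation of scale and rotation from translation can be illustrated on a much simpler problem than global alignment: a single image pair related by a similarity transform. In the sketch below, points are treated as complex numbers so that the transform is q = a·p + t with a = s·e^{iθ}. Centering the points removes t, so a can be estimated first and t recovered afterwards. This pairwise closed-form version is an assumption made for illustration and is not the paper's multi-image method; the function name is hypothetical.

```python
import cmath


def estimate_similarity_two_step(src, dst):
    """Least-squares similarity (scale, rotation, translation) from matches.

    Step 1 estimates scale and rotation from centered coordinates, which do
    not depend on the translation. Step 2 recovers the translation.
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)

    # Step 1: a = s * exp(i * theta), independent of the translation.
    numerator = sum((pi - p_mean).conjugate() * (qi - q_mean)
                    for pi, qi in zip(p, q))
    denominator = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = numerator / denominator
    scale, angle = abs(a), cmath.phase(a)

    # Step 2: translation, given the scale and rotation from step 1.
    t = q_mean - a * p_mean
    return scale, angle, (t.real, t.imag)


# Toy matches: 90 degree rotation, scale 2, translation (5, 3).
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(5, 3), (5, 5), (3, 3), (3, 5)]

scale, angle, translation = estimate_similarity_two_step(src, dst)
print(scale, angle, translation)
```

In a multi-image setting the same ordering applies to the whole set of pairwise motions rather than to one pair, which is where the non-linear minimization mentioned in the abstract comes in.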
Robot planar pushing is one of the primitive elements of non-prehensile manipulation skills and has been widely studied as an alternative solution to complex manipulation tasks. To transfer this skill to novel objects, reasoning about the effect of a push on object motion is important for selecting proper contact locations and pushing directions. However, complex contact conditions and unknown physical properties of the object make this reasoning difficult. In this work, firstly, we present a new large planar pushing dataset that contains a wide range of simulated objects, together with a novel representation of pushing primitives for the data-driven prediction model. Secondly, we propose a computationally efficient planning method that employs a heuristic to reduce the possibility of sliding contact between the pusher and the object. The prediction model and planning method were evaluated in both simulation and real experimental settings. The results show that the prediction model, trained purely on our simulation dataset, is capable of predicting real object motions accurately. The push planning method effectively reduces the number of pushes required to move unknown real objects to target positions.
Recent advancements in AI have significantly enhanced smart diagnostic methods, bringing us closer to achieving end-to-end diagnosis. Ultrasound image segmentation plays a crucial role in this diagnostic process. An accurate and robust segmentation model accelerates the process and reduces the burden on sonographers. In contrast to previous research, we consider two inherent features of ultrasound images: (1) different organs and tissues vary in spatial size, and (2) the anatomical structures inside the human body form a relatively constant spatial relationship. Based on these two ideas, we propose two segmentation models that combine multi-scale convolutional neural network backbones with a spatial context feature extractor. We discuss two backbone structures for extracting anatomical structures of different scales: the Feature Pyramid Network (FPN) backbone and the Trident Network backbone. Moreover, we show how a Spatial Recurrent Neural Network (SRNN) is implemented to extract the spatial context features in abdominal ultrasound images. The two proposed models achieved Dice coefficient scores of 0.919 and 0.931, respectively.
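For readers unfamiliar with the reported metric, the Dice coefficient between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). A minimal sketch for binary masks stored as nested lists follows; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions of this example.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal shape."""
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * intersection / total


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]

# 2 overlapping pixels, 3 + 3 foreground pixels in total: 4 / 6.
print(dice_coefficient(pred_mask, true_mask))
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all.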
Pushing is one of the fundamental nonprehensile manipulation skills for changing the position and orientation of an object. To exploit this skill to manipulate novel objects, explicit knowledge of their physical properties usually has to be given a priori. In this work, we estimate the center of mass (CoM) of an object by narrowing down its probable location with a deep learning model and Mason's voting theorem. In addition, we propose the Zero Moment Two Edge Pushing (ZMTEP) method to translate a novel object without rotation to a goal pose. The proposed method enables a pusher to select the most suitable two-edge-contact configuration for a given object using the estimated CoM and the geometrical shape of the object. Notably, neither the friction between the object and its support plane nor the friction between the object and the pusher is assumed to be known. We evaluate the proposed CoM estimation and ZMTEP methods through a series of experiments in both simulation and real robotic pusher settings. The results show that the CoM estimation method has a low mean squared error and a small standard deviation, and that the ZMTEP method significantly outperforms competitive baseline methods.
Note to Practitioners: This article aims to endow robotic arms with the capability of moving or aligning objects by pushing, which is much simpler and more secure than pick-and-place or in-hand manipulation. Most in-demand manipulation skills require sophisticated hand design and control, which might not be affordable for industrial applications that need to stay cost-competitive. In contrast, robot pushing can be implemented with different types of simple pushers and applied straightforwardly to pre-grasp manipulation. This article makes the estimation of an object's CoM location practical. Building upon the estimation method, a robust and noise-tolerant two-edge-contact pushing configuration selection method is presented to translate an arbitrarily shaped unknown object to its goal pose.
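The role of Mason's voting theorem in narrowing down the CoM can be sketched as follows. For a single-point push, three rays through the contact point (the pushing direction and the two edges of the friction cone) each vote according to the sign of their moment about the CoM, and the object rotates in the sense of the majority. Observing the rotation sense therefore rules out every candidate CoM that predicts a different sense. The sketch covers only this voting step, not the deep learning model. The candidate grid, the nominal friction angle, the exactly known contact normal, and all names are simplifying assumptions of this example and are not taken from the article.

```python
import math


def cross(a, b):
    """z-component of the 2D cross product a x b."""
    return a[0] * b[1] - a[1] * b[0]


def sign(value, eps=1e-9):
    """Return +1, -1, or 0 (within a small tolerance)."""
    return (value > eps) - (value < -eps)


def rotate(v, angle):
    """Rotate a 2D vector counterclockwise by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def predicted_rotation(com, contact, push_dir, inward_normal, friction_angle):
    """Voting theorem: +1 = counterclockwise, -1 = clockwise, 0 = tie."""
    rays = [push_dir,
            rotate(inward_normal, friction_angle),
            rotate(inward_normal, -friction_angle)]
    arm = (contact[0] - com[0], contact[1] - com[1])
    return sign(sum(sign(cross(arm, ray)) for ray in rays))


def narrow_candidates(candidates, pushes):
    """Keep candidate CoMs whose predicted rotation matches every observation."""
    return [c for c in candidates
            if all(predicted_rotation(c, p["contact"], p["push_dir"],
                                      p["normal"], p["friction_angle"])
                   == p["observed"] for p in pushes)]


# Hypothetical unit-square object with a coarse grid of candidate CoMs.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
candidates = [(x, y) for x in grid for y in grid]

pushes = [
    # Push upward at the middle of the bottom edge; object rotated CCW.
    {"contact": (0.5, 0.0), "push_dir": (0.0, 1.0), "normal": (0.0, 1.0),
     "friction_angle": 0.3, "observed": 1},
    # Push rightward at the middle of the left edge; object rotated CW.
    {"contact": (0.0, 0.5), "push_dir": (1.0, 0.0), "normal": (1.0, 0.0),
     "friction_angle": 0.3, "observed": -1},
]

remaining = narrow_candidates(candidates, pushes)
print(remaining)
```

Each observed push removes the candidates on the wrong side of the voting rays, so the surviving set shrinks toward the lower-left quadrant in this toy setup.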
Over the past decade, several image mosaicing methods have been proposed in robotic mapping and remote sensing applications. Owing to rapid developments in obtaining optical data from areas beyond human reach, there is a high demand from different science fields for creating large-area image mosaics, often using images as the only source of information. One of the most important steps in the mosaicing process is motion estimation between overlapping images to obtain the topology, i.e., the spatial relationships between images.
In this paper, we propose a generic framework for feature-based image mosaicing capable of obtaining the topology with a reduced number of matching attempts and of getting the best possible trajectory estimation. Innovative aspects include the use of a fast image similarity criterion combined with a Minimum Spanning Tree (MST) solution to obtain a tentative topology, and the use of information theory principles to decide when to update the trajectory estimation. Unlike previous approaches for large-area mosaicing, our framework naturally handles cases where time-consecutive images cannot be matched successfully, such as completely unordered sets. This characteristic also makes our approach robust to sensor failure. The performance of the method is illustrated with experimental results obtained from different challenging underwater image sequences.
► Obtaining the topology with a reduced number of matching attempts.
► Use of a fast image similarity criterion combined with a Minimum Spanning Tree solution.
► Use of information theory principles to decide when to update trajectory estimation.
► Being able to obtain topology from completely unordered sets.
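The tentative-topology idea can be sketched with a small example: given a cheap pairwise similarity score between images, a spanning tree over the most similar pairs yields n - 1 image pairs to try matching first, instead of all n(n - 1)/2 pairs. The sketch below uses Prim's algorithm on the cost 1 - similarity. The similarity values, the cost definition, and the function name are assumptions for illustration; the abstract does not specify the similarity criterion.

```python
def tentative_topology(similarity):
    """Prim's algorithm on cost = 1 - similarity.

    Returns n - 1 image pairs (i, j) forming a spanning tree that connects
    all images through their most similar neighbors. Assumes a symmetric
    similarity matrix with values in [0, 1].
    """
    n = len(similarity)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        _, i, j = min((1.0 - similarity[i][j], i, j)
                      for i in in_tree
                      for j in range(n) if j not in in_tree)
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Hypothetical similarity scores between four images.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.8, 0.3],
    [0.2, 0.8, 1.0, 0.7],
    [0.1, 0.3, 0.7, 1.0],
]

edges = tentative_topology(similarity)
print(edges)
```

Because the tree is built from similarity alone, it does not rely on images being time-ordered, which is consistent with the claim about completely unordered sets.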