This article presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses an object-agnostic grasping framework to map from visual observations to actions: inferring dense pixel-wise probability maps of the affordances for four different grasping primitive actions. It then executes the action with the highest affordance and recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional data collection or re-training. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT–Princeton Team system that took first place in the stowing task at the 2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are available online at http://arc.cs.princeton.edu/
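The action-selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the primitive names (`primitive_a` to `primitive_d`), the `select_best_action` helper, and the tiny 2×2 maps with made-up scores are all hypothetical. It assumes each primitive's affordance map is a grid of scores and simply picks the highest-scoring pixel across all four maps.

```python
from typing import Dict, List, Tuple

# One affordance map per grasping primitive: a 2D grid of per-pixel scores.
AffordanceMap = List[List[float]]


def select_best_action(maps: Dict[str, AffordanceMap]) -> Tuple[str, int, int, float]:
    """Return (primitive, row, col, score) for the highest affordance overall."""
    best = None
    for primitive, grid in maps.items():
        for r, row in enumerate(grid):
            for c, score in enumerate(row):
                # Keep the first maximum encountered; ties are not broken further.
                if best is None or score > best[3]:
                    best = (primitive, r, c, score)
    if best is None:
        raise ValueError("no affordance values supplied")
    return best


# Hypothetical primitive names and scores, for illustration only.
maps = {
    "primitive_a": [[0.10, 0.20], [0.05, 0.30]],
    "primitive_b": [[0.60, 0.15], [0.25, 0.10]],
    "primitive_c": [[0.05, 0.05], [0.91, 0.40]],
    "primitive_d": [[0.33, 0.12], [0.08, 0.02]],
}

print(select_best_action(maps))  # ('primitive_c', 1, 0, 0.91)
```

In a real system the maps would be full-resolution network outputs, and the chosen pixel would still need to be converted into a 3D grasp pose; neither step is shown here.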
Children’s social learning (SL) is characterized by significant variation. Explaining when and why children excel in some SL problems but not others is an unappreciated but significant problem in the developmental sciences. Here, two studies explore different forms of SL in preschoolers (3–6 years) using two tablet-based tasks, Cognitive and Spatial. These tasks involve sequencing items by their identity (e.g., Apple→Boy→Cat) or spatial location (e.g., Top→Bottom→Right). Experiment 1 (n = 189) explored children’s ability to learn different sequences by individual (trial-and-error) learning (baseline), recalling these individually learned sequences after a brief delay (recall), copying a novel sequence following a demonstration (novel imitation), and copying a familiar sequence that had been previously learned by trial and error (familiar imitation). Experiment 2 (n = 99) measured novel imitation and individual recall in addition to children’s ability to learn different sequences from a model’s mistake (goal emulation) and from physical/symbolic feedback provided automatically by a tablet (i.e., ghost condition). Results showed that familiar imitation and goal emulation developed early across tasks, whereas novel imitation and ghost (affordance) learning developed late. An exploration of the dimensionality of these skills showed that imitation (Exp. 1), whether familiar or novel, was domain-specific. In contrast, emulation (Exp. 2) was multi-dimensional in the Spatial Task but unidimensional in the Cognitive Task. These results highlight the mosaic nature of children’s SL development. Results provide a model for explaining some of the observed variation in children’s performance across tasks and research paradigms. This information can be used to better predict when and why children are likely to succeed (or fail) in SL tasks.
•Children aged 3–6 years learned what (Apple→Boy→Cat) and where (Top→Bottom→Right) sequences in two tablet-based tasks.
•Sequences could be learned by individual recall, novel or familiar imitation, goal emulation or affordance learning (ghost control).
•Familiar imitation and goal emulation developed early across tasks. Novel imitation and ghost (affordance) learning developed late.
•There were few associations between learning conditions; these were mostly restricted to within-task performance.
•Results suggest that variation in social learning may be explained by domain-type and content-familiarity.
•Behavioral structure helps infer behavioral function.
•Temporal analysis of object play helps test functional hypotheses about tool use.
•Stone play actions afford the emergence of stone tool use in a sexual context.
•Male monkeys can use stones as tools to masturbate.
Inferring functional components of behavioral sequences is a crucial but challenging task. A systematic comparison of their temporal structure is a good starting point, based on the postulate that more functional traits are less structurally variable. We studied stone handling behavior (SH) in Balinese long-tailed macaques, a versatile form of stone-directed play. We tested the hypothesis that stones are used by male monkeys to stimulate their genitals in a sexual context (i.e., “sex toy” hypothesis). Specifically, two SH actions (i.e., “tap-on-groin” (TOG) and “rub-on-groin” (ROG), respectively the repetitive tapping and rubbing of a stone onto the genital area) gained functional properties as self-directed tool-assisted masturbation. Owing to the structural organization of playful activities, we predicted that SH sequences without TOG/ROG would exhibit higher levels of variability, repeatability and exaggeration than SH sequences with TOG/ROG. We also predicted that TOG/ROG would occur more often and last longer in SH sequences in which penile erection – a sexually-motivated physiological response in primates – was observed than in SH sequences in which penile erection was not observed.
To identify and compare recurring series of SH patterns otherwise undetectable by using conventional quantitative approaches across SH sequences containing TOG/ROG or not, we used a temporal analysis known as “T-pattern detection and analysis” (TPA). Our predictions about variability, exaggeration and temporal association between TOG/ROG in males and penile erection were supported. As expected, SH sequences without TOG/ROG were, on average, more repeatable than SH sequences with TOG/ROG, but the difference was not statistically significant. Overall, the “sex toy” hypothesis was partly supported, and our results suggested that TOG and ROG are two forms of tool-assisted genital stimulation, possibly derived from the playful handling of stones. These findings are consistent with the view that tool use may evolve in stages from initially non-functional object manipulation, such as object play.
Six-degree-of-freedom (6-DoF) robotic grasping is a fundamental task in robot manipulation. It aims to detect graspable points and their corresponding parameters in 3D space (i.e., affordance learning), after which a robot executes grasp actions using the detected affordances. Existing work on affordance learning predominantly focuses on learning local features directly for each grid in a voxel scene or each point in a point cloud scene, subsequently filtering the most promising candidate for execution. In contrast, cognitive models of grasping highlight the significance of global descriptors, such as size, shape, and orientation, in grasping. These global descriptors indicate a grasp path closely tied to actions. Inspired by this, we propose a novel bio-inspired neural network that explicitly incorporates global feature encoding. In particular, our method utilizes a Truncated Signed Distance Function (TSDF) as input, and employs the recently proposed Transformer model to encode the global features of a scene directly. With the effective global representation, we then use deconvolution modules to decode multiple local features to generate graspable candidates. In addition, to integrate global and local features, we propose using a skip-connection module to merge lower-layer global features with higher-layer local features. Our approach, when tested on a recently proposed pile and packed grasping dataset for a decluttering task, surpassed state-of-the-art local feature learning methods by approximately 5% in terms of success and declutter rates. We also evaluated its running time and generalization ability, further demonstrating its superiority. We deployed our model on a Franka Panda robot arm, with real-world results aligning well with simulation data. This underscores our approach’s effectiveness for generalization and real-world applications.
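To make the TSDF input mentioned above more concrete, here is a minimal sketch of how a signed distance is truncated and stored on a voxel grid. It is not the authors' pipeline: a real TSDF is usually fused from depth images, whereas this example uses an analytic sphere as a stand-in surface. The function names, the grid size, the normalization to [-1, 1], and the sign convention (positive outside the surface, negative inside) are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def truncated_sdf(distance: float, truncation: float) -> float:
    """Clamp a signed distance to [-truncation, truncation], then scale to [-1, 1]."""
    if truncation <= 0:
        raise ValueError("truncation must be positive")
    return max(-truncation, min(truncation, distance)) / truncation


def sphere_tsdf_grid(
    size: int,
    voxel: float,
    center: Tuple[float, float, float],
    radius: float,
    truncation: float,
) -> List[List[List[float]]]:
    """Build a size^3 TSDF grid for a sphere (an illustrative stand-in for a scene)."""
    grid = []
    for i in range(size):
        plane = []
        for j in range(size):
            row = []
            for k in range(size):
                # Sample at the voxel centre.
                point = ((i + 0.5) * voxel, (j + 0.5) * voxel, (k + 0.5) * voxel)
                # Signed distance to the sphere: positive outside, negative inside.
                signed = math.dist(point, center) - radius
                row.append(truncated_sdf(signed, truncation))
            plane.append(row)
        grid.append(plane)
    return grid


# Hypothetical scene: a 4x4x4 grid over a unit cube containing one sphere.
grid = sphere_tsdf_grid(size=4, voxel=0.25, center=(0.5, 0.5, 0.5),
                        radius=0.3, truncation=0.1)
print(grid[0][0][0])      # far outside the sphere, saturates at 1.0
print(grid[1][1][1] < 0)  # inside the sphere, so the value is negative
```

Voxels far from any surface saturate at the truncation bound, so only the band near the surface carries distance information; a grid like this is the kind of volumetric input that a network could then encode.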
Recent reports on tool use in nonforaging contexts have led researchers to reconsider the proximate drivers of instrumental object manipulation. In this study, we explore the physiological and behavioral correlates of two stone‐directed and seemingly playful actions, the repetitive tapping and rubbing of stones onto the genital and inguinal area, respectively, that may have been co‐opted into self‐directed tool‐assisted masturbation in long‐tailed macaques (i.e., “Sex Toy” hypothesis). We predicted that genital and inguinal stone‐tapping and rubbing would be more closely temporally associated with physiological responses (e.g., estrus in females, penile erection in males) and behavior patterns (e.g., sexual mounts and other mating interactions) that are sexually motivated than other stone‐directed play. We also predicted that the stones selected to perform genital and inguinal stone‐tapping and rubbing actions would be less variable in number, size, and texture than the stones typically used during other stone‐directed playful actions. Overall, our data partly supported the “Sex Toy” hypothesis, indicating that stone‐directed tapping and rubbing onto the genital and inguinal area are sexually motivated behaviors. Our research suggests that instrumental behaviors of questionably adaptive value may be maintained over evolutionary time through pleasurable/self‐rewarding mechanisms, such as those underlying playful and sexual activities.
Genital‐directed stone play actions are sexually motivated in male long‐tailed macaques.
Adult females show a higher level of selectivity for the texture of the stones they use to perform genital‐directed stone play.
Balinese long‐tailed macaques can use stones as tools to masturbate.
Learning by observing others is especially beneficial for young and naïve individuals. The relationship to the social partner is thus important. While peers are often used as demonstrators to test for social learning abilities in a species, thereby studying horizontal transmission of information, this study focused on the vertical transmission of information, i.e., learning across generations, in a highly social species. Half-a-year-old piglets of the Kune Kune breed, Sus scrofa domesticus (in contrast to the usual subjects in studies on pigs raised and kept in seminatural conditions), were first exposed to their mother or aunt pushing one of two differently coloured bars to either the left or right side to open a sliding door, and were then tested after 1 min, 1 h and 1-day retention intervals. Results indicated that subjects recalled the movement of the door, rather than using local or stimulus enhancement. A second test series revealed that the pigs used the demonstrated opening technique and even remembered it after a delay of 24 h. Nonexposed piglets did not show a side bias during their first encounters with the apparatus; however, habit formation was at play during later test sessions and was possibly the reason for long-term memory of the self-acquired techniques. Altogether, this study revealed that piglets learned how to solve a manipulative foraging problem from both their mother and their aunt, probably by acquiring some information through observation and then memorizing it for up to a day.
•Piglets learned from observing their mother or aunt to recreate object movements.
•Piglets were able to recall observed information for up to 24 h.
•Nonexposed piglets memorized self-acquired box-opening techniques for 5 months.
Object affordances play a major role in action expression: (a) providing opportunities to generate potential solutions to instrumental problems and (b) shaping and constraining the motor actions available to an individual. The playful manipulation of objects can facilitate individual acquisition of functional object-assisted actions through affordance learning. We tested the "object affordance" hypothesis in free-ranging long-tailed macaques. This hypothesis holds that the physical properties associated with stone size afford different stone-directed actions, in the context of stone handling (SH) behavior, a form of culturally maintained stone play from which stone tool use can emerge. We predicted that higher SH versatility (i.e., total number of different SH behavioral elements expressed) and higher duration of the SH behavioral element "Pound" would be associated with the manipulation of medium-sized stones, followed by small stones, and then large stones. Our data partly supported these predictions. Both medium-sized and small-sized stones afforded the highest SH versatility, and a higher duration of "Pound" than large stones. As expected, duration of "Pound" was higher with medium than small stones, but the difference was not statistically significant. Our results were consistent with Newell's constraint model, which emphasizes the role of objects' physical properties in limiting and enhancing the expression of actions directed to these objects. The relaxed selective pressures acting on SH behavior may enhance the expression of a range of actions directed toward stones of different sizes that could facilitate the emergence of instrumental solutions and may contribute to explaining the evolution of lithic technology in early humans.
Robots need to understand their environment to perform their tasks. While it is possible to pre-program a visual scene analysis process in closed environments, robots operating in an open environment would benefit from the ability to learn it through interaction with their environment. This ability furthermore opens the way to the acquisition of affordance maps, in which the action capabilities of the robot structure its visual scene understanding. We propose an approach to build such affordance maps by relying on interactive perception and online classification for a real robot equipped with two arms with 7 degrees of freedom. Our system is modular and allows maps to be learned from different skills. In the proposed formalization of affordances, actions and effects are related to visual features, not objects; thus, our approach does not need a prior definition of the concept of object. We have tested the approach on three action primitives and on a real PR2 robot.
Learning from demonstration holds the promise of enabling robots to learn diverse actions from expert experience. In contrast to learning from observation-action pairs, humans learn to imitate in a more flexible and efficient manner: learning behaviors by simply "watching." In this article, we propose a "watch-and-act" imitation learning pipeline that endows a robot with the ability of learning diverse manipulations from visual demonstrations. Specifically, we address this problem by intuitively casting it as two subtasks: 1) understanding the demonstration video and 2) learning the demonstrated manipulations. First, a captioning module based on visual change is presented to understand the demonstration by translating the demonstration video into a command sentence. Then, to execute the captioning command, a manipulation module that learns the demonstrated manipulations is built upon an instance segmentation model and a manipulation affordance prediction model. We validate the superiority of the two modules over existing methods separately via extensive experiments and demonstrate the whole robotic imitation system developed based on the two modules in diverse scenarios using a real robotic arm. Supplementary video is available at https://vsislab.github.io/watch-and-act/.
As a popular concept proposed in the field of psychology, affordance has been regarded as one of the important abilities that enable humans to understand and interact with the environment. Briefly, it captures the possibilities and effects of the actions of an agent applied to a specific object or, more generally, a part of the environment. This paper provides a short review of the recent developments of deep robotic affordance learning (DRAL), which aims to develop data-driven methods that use the concept of affordance to aid in robotic tasks. We first classify these papers from a reinforcement learning (RL) perspective and draw connections between RL and affordances. The technical details of each category are discussed and their limitations are identified. We further summarise them and identify future challenges from the aspects of observations, actions, affordance representation, data-collection and real-world deployment. A final remark is given at the end to propose a promising future direction of the RL-based affordance definition to include the predictions of arbitrary action consequences.