In this paper, a new imitation learning algorithm, Restored Action Generative Adversarial Imitation Learning (RAGAIL), is proposed for learning from observation. An action policy is trained to move a robot manipulator so that it reproduces a demonstrator's behavior, using actions restored from state-only demonstrations. To imitate the demonstrator, a target trajectory is generated by a Recurrent Generative Adversarial Network (RGAN), and the action is restored from the output of a tracking controller constructed from the current state and the generated target trajectory. The proposed imitation learning algorithm does not require access to the demonstrator's actions (internal control signals such as force/torque commands) and provides better learning performance. The effectiveness of the proposed method is validated through experiments on a robot manipulator.
•A new generative adversarial imitation learning algorithm is proposed with the restored action.
•The state of the demonstration and the recurrent state are used to generate the target trajectory.
•The experimental results were obtained from a drawing task on a 7 degree of freedom (DOF) Sawyer robot.
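The abstract above describes a two-step pipeline: a recurrent generator produces a target trajectory, and the missing action is restored from the output of a tracking controller. The following is a minimal Python sketch of the action-restoration idea under stated assumptions: a PD tracking controller on a toy 1-DOF unit-mass system stands in for the paper's controller, a simple interpolation stands in for the RGAN, and the gains and function names are illustrative rather than the paper's implementation.

import numpy as np

KP, KD = 8.0, 1.5  # illustrative PD gains (assumption, not from the paper)

def restore_action(q, qd, q_target, qd_target=0.0):
    # Restore a torque-like action from the tracking controller's output,
    # given the current joint state and the generated target state.
    return KP * (q_target - q) + KD * (qd_target - qd)

def generate_target_trajectory(state_demo, horizon=50):
    # Placeholder for the recurrent generator (RGAN): simply interpolate
    # between the first and last demonstrated states.
    return np.linspace(state_demo[0], state_demo[-1], horizon)

# Toy rollout: the restored (state, action) pairs are what an adversarial
# imitation learner could use in place of the demonstrator's unknown actions.
demo_states = np.array([0.0, 0.4, 0.9, 1.2])  # state-only demonstration
targets = generate_target_trajectory(demo_states)

q, qd, dt = 0.0, 0.0, 0.01
restored_pairs = []
for q_ref in targets:
    u = restore_action(q, qd, q_ref)   # restored action
    restored_pairs.append((q, u))
    qd += u * dt                       # unit-mass double-integrator dynamics
    q += qd * dt

print(f"restored {len(restored_pairs)} (state, action) pairs; final q = {q:.3f}")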
Recent Advances in Robot Learning from Demonstration. Ravichandar, Harish; Polydoros, Athanasios S.; Chernova, Sonia; et al. Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, no. 1, 05/2020. Journal article, peer reviewed, open access.
In the context of robotics and automation, learning from demonstration (LfD) is the paradigm in which robots acquire new skills by learning to imitate an expert. The choice of LfD over other robot learning methods is compelling when ideal behavior can be neither easily scripted (as is done in traditional robot programming) nor easily defined as an optimization problem, but can be demonstrated. While there have been multiple surveys of this field in the past, there is a need for a new one given the considerable growth in the number of publications in recent years. This review aims to provide an overview of the collection of machine-learning methods used to enable a robot to learn from and imitate a teacher. We focus on recent advancements in the field and present an updated taxonomy and characterization of existing methods. We also discuss mature and emerging application areas for LfD and highlight the significant challenges that remain to be overcome both in theory and in practice.
Vehicle path planning is one of the effective ways to relieve the huge traffic-flow pressure on modern urban transportation systems, and it is also an important way to realize carbon emission reduction and to build green transportation systems as well as smart cities. At present, artificial intelligence (AI) algorithms, with reinforcement learning (RL) as the mainstream, have achieved great success in the field of vehicle path planning. However, RL conducts policy learning based only on the evaluative feedback of the environment, whereas imitation learning (IL) can obtain more direct feedback from expert decision data and can therefore, compared with RL, obtain a decision model closer to the expert level. At present, there are very few vehicle path planning algorithms based on IL, and they are often hindered by the compounding-error and sample-complexity dilemmas, resulting in poor path planning effectiveness. To overcome these problems, this paper proposes a mixed generative adversarial IL (MixGAIL) algorithm, which effectively integrates the transition-aware adversarial IL (TAIL) and minimum-distance-function (MIMIC-MD) methods under the framework of generative adversarial IL (GAIL). To overcome the optimization dilemma of the non-convex and non-smooth objective function after this integration, MixGAIL uses a mixed policy gradient actor-critic model with a random escape term and filter optimization (MPGACEF), and pioneers a momentum noise projected subgradient descent (MNPSGD) method for global optimization. Experiments have shown that, by learning expert decision data, MixGAIL achieves better vehicle path planning performance and faster iteration speed than classic IL algorithms such as behavioral cloning (BC), dataset aggregation (DAgger), feature expectation matching (FEM), game-theoretic apprenticeship learning (GTAL), TAIL, and MIMIC-MD, and is closer to the expert level.
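Several of the entries in this collection (MixGAIL above, and the GAIL-family methods below) build on the standard GAIL objective. For reference, the generic formulation (Ho and Ermon's objective, not the MixGAIL-specific loss) is

\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}\!\left[\log D(s,a)\right] + \mathbb{E}_{\pi_E}\!\left[\log\left(1 - D(s,a)\right)\right] - \lambda H(\pi),

where D(s,a) is the discriminator's probability that the pair (s,a) was generated by the learner's policy \pi rather than the expert \pi_E, H(\pi) is the causal entropy of \pi, and \lambda \ge 0 weights the entropy regularizer; conventions that swap the roles of \pi and \pi_E also appear in the literature.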
With the development of deep representation learning, the domain of reinforcement learning (RL) has become a powerful learning framework now capable of learning complex policies in high-dimensional environments. This review summarises deep reinforcement learning (DRL) algorithms and provides a taxonomy of automated driving tasks where (D)RL methods have been employed, while addressing key computational challenges in real-world deployment of autonomous driving agents. It also delineates adjacent domains such as behavior cloning, imitation learning, and inverse reinforcement learning, which are related but are not classical RL algorithms. The role of simulators in training agents and methods to validate, test, and robustify existing solutions in RL are discussed.
Recently, the Internet of Vehicles (IoV) has become one of the most active research fields in both academia and industry; it exploits the resources of vehicles and Road Side Units (RSUs) to execute various vehicular applications. Due to the increasing number of vehicles and the asymmetrical distribution of traffic flows, it is essential for the network operator to design intelligent offloading strategies to improve network performance and provide high-quality services for users. However, the lack of global information and the time-varying nature of IoVs make it challenging to perform effective offloading and caching decisions under the long-term energy constraints of RSUs. Since Artificial Intelligence (AI) and machine learning can greatly enhance the intelligence and performance of IoVs, we push AI-inspired computing, caching, and communication resources to the proximity of smart vehicles, jointly enabling RSU peer offloading, vehicle-to-RSU offloading, and content caching in the IoV framework. A Mixed-Integer Non-Linear Programming (MINLP) problem is formulated to minimize the total network delay, consisting of the communication, computation, network congestion, and content-downloading delays of all users. Then, we develop an online multi-decision making scheme (named OMEN) by leveraging the Lyapunov optimization method to solve the formulated problem, and prove that OMEN achieves near-optimal performance. Leveraging the strong cognition of AI, we put forward an imitation learning enabled branch-and-bound solution in edge-intelligent IoVs to speed up the problem-solving process with few training samples. Experimental results based on real-world traffic data demonstrate that our proposed method outperforms other methods from various aspects.
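The Lyapunov-based online scheme sketched above is typically derived by minimizing a drift-plus-penalty bound at every time slot. A generic form of that bound (not OMEN's exact expression, which depends on the paper's queue and delay definitions) is

\min \; \Delta\!\left(\Theta(t)\right) + V \, \mathbb{E}\!\left[\, p(t) \mid \Theta(t) \,\right],

where \Theta(t) collects the virtual queues tracking the long-term energy constraints of the RSUs, \Delta(\Theta(t)) is the one-slot Lyapunov drift, p(t) is the per-slot penalty (here, the total network delay), and the parameter V trades off delay minimization against constraint satisfaction.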
Existing safe imitation learning (safe IL) methods mainly focus on learning safe policies that are similar to expert ones, but may fail in applications requiring different safety constraints. In this paper, we propose the Lagrangian Generative Adversarial Imitation Learning (LGAIL) algorithm, which can adaptively learn safe policies from a single expert dataset under diverse prescribed safety constraints. To achieve this, we augment GAIL with safety constraints and then relax it into an unconstrained optimization problem by utilizing a Lagrange multiplier. The Lagrange multiplier enables explicit consideration of safety and is dynamically adjusted to balance imitation and safety performance during training. We then apply a two-stage optimization framework to solve LGAIL: (1) a discriminator is optimized to measure the similarity between agent-generated data and expert data; (2) forward reinforcement learning is employed to improve the similarity while accounting for the safety concerns enabled by the Lagrange multiplier. Furthermore, theoretical analyses of the convergence and safety of LGAIL demonstrate its capability to adaptively learn a safe policy under prescribed safety constraints. Finally, extensive experiments in OpenAI Safety Gym confirm the effectiveness of our approach.
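Written out, the Lagrangian relaxation described above takes the generic constrained-imitation form (notation assumed here for illustration, not taken from the paper)

\min_{\pi} \max_{\lambda \ge 0} \; J_{\mathrm{GAIL}}(\pi) + \lambda \left( J_{c}(\pi) - d \right),

where J_{\mathrm{GAIL}}(\pi) is the adversarial imitation objective, J_{c}(\pi) is the expected cumulative safety cost of policy \pi, d is the prescribed safety budget, and the multiplier \lambda is updated during training, growing when the constraint is violated and shrinking toward zero when it is satisfied.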
Expert demonstrations in imitation learning often contain different behavioral modes, e.g., driving modes such as driving on the left, keeping the lane, and driving on the right in driving tasks. Although most existing multi-modal imitation learning methods allow learning from demonstrations of multiple modes, they impose strict constraints on the data of each mode, generally requiring a nearly equal data ratio across all modes. Otherwise, they tend to fall into mode collapse or learn only the data distribution of the mode with the largest data volume. To address this problem, an algorithm that balances the real-fake loss and the classification loss by modifying the output of the discriminator, referred to as BAlanced Generative Adversarial Imitation Learning (BAGAIL), is proposed. With this modification, the generator is rewarded only for generating realistic trajectories with the correct modes. BAGAIL is therefore able to deal with imbalanced expert demonstrations and to learn each mode efficiently. The learning process of BAGAIL is divided into a pre-training stage and an imitation learning stage. During the pre-training stage, BAGAIL initializes the generator parameters by means of conditional Behavioral Cloning, laying the foundation for the direction of parameter optimization. During the imitation learning stage, BAGAIL optimizes the parameters through the adversarial game between the generator and the modified discriminator, so that the resulting policy successfully learns the distribution of imbalanced expert data. Experiments showed that BAGAIL accurately distinguishes different behavioral modes from imbalanced demonstrations. Moreover, the learned behavior for each mode is close to the expert standard and more stable than that of other multi-modal imitation learning methods.
•Introduce the imbalance classification concept to multi-modal imitation learning.
•Train the agent to imitate multiple expert behaviors from imbalanced demonstrations.
•Treat the real-fake loss and classification loss fairly to prevent mode collapse.
•Utilize behavioral cloning with modal labels to speed up the convergence.
•Relax the strict balance limits of demonstrations to increase general applicability.
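A minimal sketch of a discriminator with both a real/fake head and a behavioral-mode classification head, in the spirit of the balanced real-fake and classification losses described above; the PyTorch architecture, loss weighting, and toy tensors are assumptions for illustration, not the BAGAIL implementation.

import torch
import torch.nn as nn

class ModeDiscriminator(nn.Module):
    # Shared body with a real/fake head (expert vs. generated) and a mode head.
    def __init__(self, obs_act_dim, n_modes, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_act_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh())
        self.real_fake_head = nn.Linear(hidden, 1)
        self.mode_head = nn.Linear(hidden, n_modes)

    def forward(self, x):
        h = self.body(x)
        return self.real_fake_head(h), self.mode_head(h)

def discriminator_loss(d, expert_x, expert_mode, fake_x, fake_mode, mode_weight=1.0):
    # Balance the adversarial (real/fake) loss against the mode-classification loss,
    # so realistic trajectories are only rewarded when their mode is also correct.
    bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()
    rf_e, cls_e = d(expert_x)
    rf_f, cls_f = d(fake_x)
    adv = bce(rf_e, torch.ones_like(rf_e)) + bce(rf_f, torch.zeros_like(rf_f))
    cls = ce(cls_e, expert_mode) + ce(cls_f, fake_mode)
    return adv + mode_weight * cls

# Toy usage with random state-action features and three behavioral modes.
d = ModeDiscriminator(obs_act_dim=10, n_modes=3)
loss = discriminator_loss(d,
                          torch.randn(32, 10), torch.randint(0, 3, (32,)),
                          torch.randn(32, 10), torch.randint(0, 3, (32,)))
loss.backward()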
High-quality and representative data is essential for both Imitation Learning (IL)- and Reinforcement Learning (RL)-based motion planning tasks. For real robots, it is challenging to collect enough qualified data, either as demonstrations for IL or as experiences for RL, due to safety considerations in environments with obstacles. We target this challenge by proposing the self-imitation learning by planning plus (SILP+) algorithm, which efficiently embeds experience-based planning into the learning architecture to mitigate the data-collection problem. The planner generates demonstrations based on successfully visited states from the current RL policy, and the policy improves by learning from these demonstrations. In this way, we relieve the demand for human expert operators to collect the demonstrations required by IL and improve RL performance as well. Various experimental results show that SILP+ achieves better training efficiency and a higher, more stable success rate in complex motion planning tasks compared to several other methods. Extensive tests on physical robots illustrate the effectiveness of SILP+ in a physical setting, retaining a success rate of 90% where the next-best contender drops from 87% to 75% in the Sim2Real transition.
•We propose the self-imitation learning with planning plus (SILP+) algorithm for motion-planning tasks.
•Data preparation for reinforcement learning from demonstrations can be augmented with experience-based planning.
•Extrapolation error occurring in common actor–critic reinforcement learning algorithms generally leads to unstable results.
•Gaussian-process-guided exploration near obstacles contributes to a safer training process.
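The self-imitation-by-planning loop described above can be sketched with a toy one-dimensional task. Everything below, including the trivial planner, the success criterion, and the behavioral-cloning-style update, is an illustrative stand-in rather than the SILP+ implementation, and the RL update on collected experience is omitted for brevity.

import random

def rollout(policy, length=20):
    # Collect a trajectory of 1-D states with the current stochastic policy.
    s, traj = 0.0, []
    for _ in range(length):
        a = policy["mean"] + random.gauss(0.0, policy["std"])
        s += a
        traj.append((s, a))
    return traj

def plan_from_visited(visited_states, goal=10.0, steps=5):
    # Stand-in planner: build a demonstration from an already-visited state toward
    # the goal, instead of asking a human expert for one.
    s0 = visited_states[-1]
    a = (goal - s0) / steps
    return [(s0 + i * a, a) for i in range(steps)]

def imitation_update(policy, demos, lr=0.1):
    # Behavioral-cloning-like step: nudge the policy mean toward the demonstrated actions.
    mean_a = sum(a for _, a in demos) / len(demos)
    policy["mean"] += lr * (mean_a - policy["mean"])

policy, demo_buffer = {"mean": 0.0, "std": 0.5}, []
for episode in range(50):
    traj = rollout(policy)
    visited = [s for s, _ in traj if s > 0]  # toy criterion for "successfully visited" states
    if visited:
        demo_buffer.extend(plan_from_visited(visited))
        imitation_update(policy, demo_buffer)

print(f"policy mean after self-imitation: {policy['mean']:.2f}")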
Kernelized Movement Primitives. Huang, Yanlong; Rozo, Leonel; Silvério, João; et al. The International Journal of Robotics Research, vol. 38, no. 7, 06/2019. Journal article, peer reviewed, open access.
Imitation learning has been studied widely as a convenient way to transfer human skills to robots. This learning approach is aimed at extracting relevant motion patterns from human demonstrations and subsequently applying these patterns to different situations. Despite the many advancements that have been achieved, solutions for coping with unpredicted situations (e.g., obstacles and external perturbations) and high-dimensional inputs are still largely absent. In this paper, we propose a novel kernelized movement primitive (KMP), which allows the robot to adapt the learned motor skills and fulfill a variety of additional constraints arising over the course of a task. Specifically, KMP is capable of learning trajectories associated with high-dimensional inputs owing to the kernel treatment, which in turn renders a model with fewer open parameters in contrast to methods that rely on basis functions. Moreover, we extend our approach by exploiting local trajectory representations in different coordinate systems that describe the task at hand, endowing KMP with reliable extrapolation capabilities in broader domains. We apply KMP to the learning of time-driven trajectories as a special case, where a compact parametric representation describing a trajectory and its first-order derivative is utilized. In order to verify the effectiveness of our method, several examples of trajectory modulations and extrapolations associated with time inputs, as well as trajectory adaptations with high-dimensional inputs, are provided.
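A minimal sketch of kernel-based trajectory prediction in the spirit of the KMP idea described above; the squared-exponential kernel, the ridge regularization, and the toy data below are assumptions for illustration, and the paper's full formulation additionally propagates the demonstration covariances rather than predicting only a mean trajectory.

import numpy as np

def sq_exp_kernel(A, B, length_scale=0.2):
    # Squared-exponential kernel between two input sets (rows are samples).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def kmp_like_predict(S, Y, S_query, lam=1e-3):
    # Kernel ridge regression: predict trajectory outputs Y at new inputs S_query.
    K = sq_exp_kernel(S, S)
    K_star = sq_exp_kernel(S_query, S)
    return K_star @ np.linalg.solve(K + lam * np.eye(len(S)), Y)

# Toy demonstration: a 2-D end-effector path indexed by time. The input here is
# 1-D time, but the same code accepts high-dimensional inputs without modification.
t = np.linspace(0.0, 1.0, 20)[:, None]
path = np.column_stack([np.sin(2 * np.pi * t[:, 0]), t[:, 0] ** 2])
t_query = np.linspace(0.0, 1.0, 5)[:, None]
print(kmp_like_predict(t, path, t_query))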