In this article, we present a planning framework that uses a combination of implicit (robot motion) and explicit (visual/audio/haptic feedback) communication during mobile robot navigation. First, we developed a model that approximates both continuous movements and discrete behavior modes in human navigation, considering the effects of implicit and explicit communication on human decision-making. The model approximates the human as an optimal agent, with a reward function obtained through inverse reinforcement learning. Second, a planner uses this model to generate communicative actions that maximize the robot's transparency and efficiency. We implemented the planner on a mobile robot, using a wearable haptic device for explicit communication. In a user study of an indoor orthogonal-crossing situation between a human and a robot, the robot actively communicates its intent to users in order to avoid collisions and facilitate efficient trajectories. Results show that the planner generated plans that are easier to understand, reduce users' effort, and increase users' trust in the robot, compared to simply performing collision avoidance. The key contribution of this article is the integration and analysis of explicit communication (together with implicit communication) for social navigation.
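To make the planning idea concrete, here is a minimal sketch of how a robot might jointly choose a motion (implicit communication) and an optional explicit signal by minimizing expected cost under a human model. The human_response_prob function, the candidate plans, and all cost weights are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's planner): score candidate robot plans
# paired with an optional explicit (haptic) signal under a hypothetical
# learned human model, and pick the pair with the lowest expected cost.
import itertools

def human_response_prob(plan, signal):
    """Hypothetical stand-in for the learned human model: probability the
    human yields (a discrete behavior mode) given robot motion and signal."""
    base = 0.4 + 0.4 * plan["assertiveness"]        # implicit communication
    boost = 0.3 if signal == "haptic_cue" else 0.0  # explicit communication
    return min(1.0, base + boost)

def expected_cost(plan, signal):
    p_yield = human_response_prob(plan, signal)
    collision_cost = (1 - p_yield) * 10.0           # penalize likely conflicts
    effort_cost = plan["detour"] + (0.5 if signal else 0.0)  # signaling costs too
    return collision_cost + effort_cost

plans = [{"assertiveness": 0.2, "detour": 0.0},     # keep speed, straight path
         {"assertiveness": 0.8, "detour": 1.0}]     # slow down / swerve early
signals = [None, "haptic_cue"]

best = min(itertools.product(plans, signals),
           key=lambda ps: expected_cost(*ps))
print("chosen (plan, signal):", best)
```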
Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has applied reward learning to each of these data sources independently. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information that are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries that are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and usability of our integrated framework.
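As a rough illustration of this pipeline, the sketch below initializes a sampled belief over linear reward weights from one demonstration and then selects the preference query the belief is most uncertain about. The uncertainty heuristic and the Bradley-Terry answer model are simplifications of the paper's information-theoretic objective, and all features are synthetic.

```python
# Minimal sketch of demos-then-preferences reward learning, assuming
# rewards linear in features: R(xi) = w . phi(xi).
import numpy as np
rng = np.random.default_rng(0)

D = 3                                    # feature dimension (assumed)
W = rng.normal(size=(5000, D))           # samples from the prior over w
W /= np.linalg.norm(W, axis=1, keepdims=True)

# 1) Initialize the belief from a demonstration (soft-optimality likelihood).
demo_phi = np.array([1.0, 0.2, -0.5])    # features of a user demonstration
log_w = W @ demo_phi                     # log-likelihood up to a constant
weights = np.exp(log_w - log_w.max()); weights /= weights.sum()

# 2) Actively pick the preference query the belief is most unsure about.
candidates = rng.normal(size=(50, 2, D))  # candidate (xi_A, xi_B) feature pairs
def p_prefers_A(pair):
    diff = W @ (pair[0] - pair[1])
    return np.sum(weights * (1 / (1 + np.exp(-diff))))  # Bradley-Terry model
query = min(candidates, key=lambda pr: abs(p_prefers_A(pr) - 0.5))

# 3) Update with the (here simulated) user's answer, then repeat 2)-3).
answer_A = True                          # placeholder for real user feedback
diff = W @ (query[0] - query[1]) * (1 if answer_A else -1)
weights *= 1 / (1 + np.exp(-diff)); weights /= weights.sum()
print("posterior mean w:", weights @ W)
```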
Designing reward functions is a difficult task in AI and robotics. The complex task of directly specifying all the desirable behaviors a robot needs to optimize often proves challenging for humans. A popular solution is to learn reward functions using expert demonstrations. This approach, however, is fraught with many challenges. Some methods require heavily structured models, for example, reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that may necessitate tremendous amounts of data. Moreover, it is difficult for humans to provide demonstrations on robots with high degrees of freedom, or even to quantify reward values for given trajectories. To address these challenges, we present a preference-based learning approach, where human feedback is in the form of comparisons between trajectories. We do not assume highly constrained structures on the reward function. Instead, we employ a Gaussian process to model the reward function and propose a mathematical formulation to actively fit the model using only human preferences. Our approach enables us to tackle both the inflexibility and data-inefficiency problems within a preference-based learning framework. We further analyze our algorithm in comparison to several baselines on reward optimization, where the goal is to find the optimal robot trajectory in a data-efficient way instead of learning the reward function for every possible trajectory. Our results in three different simulation experiments and a user study show that our approach can efficiently learn expressive reward functions for robotic tasks, and outperforms the baselines in both reward learning and reward optimization.
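The following sketch shows one way to realize preference-based reward learning with a GP prior over a finite set of trajectories, using importance sampling with a Bradley-Terry likelihood. The RBF kernel, the sampling-based inference, and the synthetic features are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch: GP prior over reward values at trajectory feature points,
# posterior approximated by reweighting prior samples with preference data.
import numpy as np
rng = np.random.default_rng(1)

X = rng.uniform(-1, 1, size=(20, 2))     # feature vectors of 20 trajectories
def rbf(A, B, ell=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

K = rbf(X, X) + 1e-6 * np.eye(len(X))    # GP prior covariance over rewards
F = rng.multivariate_normal(np.zeros(len(X)), K, size=4000)  # reward samples

prefs = [(3, 7), (12, 7)]                # user preferred traj i over traj j
logw = np.zeros(len(F))
for i, j in prefs:                       # Bradley-Terry preference likelihood
    logw += -np.log1p(np.exp(-(F[:, i] - F[:, j])))
w = np.exp(logw - logw.max()); w /= w.sum()

post_mean = w @ F                        # posterior mean reward per trajectory
print("best trajectory:", int(np.argmax(post_mean)))
```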
We consider the problem of dynamically allocating tasks to multiple agents under time window constraints and task completion uncertainty. Our objective is to minimize the number of unsuccessful tasks at the end of the operation horizon. We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination, and addresses them in a hierarchical manner. The lower layer computes policies for individual agents using dynamic programming with tree search, and the upper layer resolves conflicts in individual plans to obtain a valid multi-agent allocation. Our algorithm, Stochastic Conflict-Based Allocation (SCoBA), is optimal in expectation and complete under some reasonable assumptions. In practice, SCoBA is computationally efficient enough to interleave planning and execution online. On the metric of successful task completion, SCoBA consistently outperforms a number of baseline methods and shows strong competitive performance against an oracle with complete lookahead. It also scales well with the number of tasks and agents. We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
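A toy version of the two-layer structure, shown below, makes the decoupling concrete: the lower layer plans per agent, and the upper layer branches on conflicting task claims in a best-first manner (in the spirit of conflict-based search). It is heavily simplified, with one task per agent and no time windows or completion uncertainty, whereas SCoBA's lower layer runs dynamic programming with tree search over task sequences.

```python
# Toy sketch of conflict-based allocation (not SCoBA itself).
import heapq

success = {"a1": {"t1": 0.9, "t2": 0.6},   # expected success value
           "a2": {"t1": 0.8, "t2": 0.3}}   # per (agent, task) pair

def best_plan(agent, banned):
    """Lower layer: each agent independently picks its best allowed task."""
    options = [(v, t) for t, v in success[agent].items() if t not in banned]
    return max(options) if options else (0.0, None)

def allocate(agents):
    """Upper layer: branch on conflicting task claims, best-first search."""
    heap = [(0.0, 0, {a: frozenset() for a in agents})]
    tick = 0
    while heap:
        _, _, banned = heapq.heappop(heap)
        plans = {a: best_plan(a, banned[a]) for a in agents}
        conflicts = [(a, b) for a in agents for b in agents
                     if a < b and plans[a][1] is not None
                     and plans[a][1] == plans[b][1]]
        if not conflicts:
            return {a: task for a, (_, task) in plans.items()}
        for agent in conflicts[0]:          # branch: forbid the contested task
            child = dict(banned)
            child[agent] = child[agent] | {plans[agent][1]}
            cost = -sum(best_plan(a, child[a])[0] for a in agents)
            tick += 1
            heapq.heappush(heap, (cost, tick, child))

print(allocate(["a1", "a2"]))              # -> {'a1': 't2', 'a2': 't1'}
```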
Roles such as leading and following can emerge naturally in human groups. However, in human–robot teams, such roles are often predefined due to the difficulty of scalably learning and adapting to them. In this work, we enable a robot to efficiently learn how group dynamics emerge and evolve in human teams, and we leverage this understanding to plan influencing actions for autonomous robots that guide the team toward achieving a common goal. We first develop an effective and concise representation of group dynamics, such as leading and following, by enforcing a graph structure while learning the weights of the edges corresponding to one-to-one relationships between the agents. We then develop an optimization-based robot policy that leverages this graph representation to attain an objective by influencing a human team. We apply our framework to two types of group dynamics, leading–following and predator–prey, and show that our structured representation is scalable with different human team sizes and also generalizable across different tasks. We also show that robots that utilize this representation are able to successfully influence a group to achieve various goals, compared to robots that do not have access to these graph representations. (Parts of this work have been published at Robotics: Science and Systems (RSS): Kwon et al., Proceedings of Robotics: Science and Systems (RSS), 2019, https://doi.org/10.15607/rss.2019.xv.075.)
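As a minimal illustration of the graph representation, the sketch below fits one-to-one edge weights by least squares from observed (state, next-state) pairs. The linear dynamics and synthetic data are assumptions, standing in for the paper's learned weights over a fixed graph structure.

```python
# Minimal sketch of fitting edge weights of a group-dynamics graph
# (synthetic data; a plain least-squares stand-in for the learned model).
import numpy as np
rng = np.random.default_rng(2)

N, T = 4, 200                                  # agents, observed transitions
W_true = rng.uniform(0, 1, size=(N, N))        # who follows whom, and how much
W_true /= W_true.sum(axis=1, keepdims=True)

states = rng.normal(size=(T, N))               # agent states x(t)
nxt = states @ W_true.T + 0.05 * rng.normal(size=(T, N))  # x(t+1) = W x(t) + noise

# Solve nxt ~ states @ W.T, i.e. x_i(t+1) = sum_j w_ij x_j(t), row by row.
W_hat = np.linalg.lstsq(states, nxt, rcond=None)[0].T
print("max edge-weight error:", np.abs(W_hat - W_true).max())
```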
Assistive robot arms enable people with disabilities to conduct everyday tasks on their own. These arms are dexterous and high-dimensional; however, the interfaces people must use to control their robots are low-dimensional. Consider teleoperating a 7-DoF robot arm with a 2-DoF joystick. The robot is helping you eat dinner, and currently you want to cut a piece of tofu. Today's robots assume a pre-defined mapping between joystick inputs and robot actions: in one mode the joystick controls the robot's motion in the x–y plane, in another mode the joystick controls the robot's z–yaw motion, and so on. But this mapping misses out on the task you are trying to perform! Ideally, one joystick axis should control how the robot stabs the tofu, and the other axis should control different cutting motions. Our insight is that we can achieve intuitive, user-friendly control of assistive robots by embedding the robot's high-dimensional actions into low-dimensional and human-controllable latent actions. We divide this process into three parts. First, we explore models for learning latent actions from offline task demonstrations, and formalize the properties that latent actions should satisfy. Next, we combine learned latent actions with autonomous robot assistance to help the user reach and maintain their high-level goals. Finally, we learn a personalized alignment model between joystick inputs and latent actions. We evaluate our resulting approach in four user studies where non-disabled participants reach marshmallows, cook apple pie, cut tofu, and assemble dessert. We then test our approach with two disabled adults who leverage assistive devices on a daily basis.
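A minimal sketch of the embedding step follows, assuming a conditional autoencoder trained in PyTorch on synthetic demonstration data. The architecture and dimensions are illustrative, and the paper explores several such models plus additional properties the latent space should satisfy.

```python
# Minimal sketch: learn 2-DoF latent actions for 7-DoF robot actions with
# a conditional autoencoder (synthetic data; illustrative architecture).
import torch
import torch.nn as nn

demo_states = torch.randn(512, 7)        # robot joint states from demos
demo_actions = torch.randn(512, 7)       # 7-DoF actions from demos

enc = nn.Sequential(nn.Linear(14, 32), nn.Tanh(), nn.Linear(32, 2))
dec = nn.Sequential(nn.Linear(9, 32), nn.Tanh(), nn.Linear(32, 7))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for _ in range(2000):                    # reconstruct actions through a
    z = enc(torch.cat([demo_states, demo_actions], dim=1))  # 2-DoF bottleneck
    a_hat = dec(torch.cat([demo_states, z], dim=1))         # condition on state
    loss = ((a_hat - demo_actions) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# At run time, the 2-DoF joystick input plays the role of the latent action z,
# and the decoder maps (current state, z) to a full 7-DoF robot action.
joystick = torch.tensor([[0.3, -0.8]])
action = dec(torch.cat([demo_states[:1], joystick], dim=1))
print(action.shape)                      # torch.Size([1, 7])
```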
Imitation learning enables robots to learn from demonstrations. Previous imitation learning algorithms usually assume access to optimal expert demonstrations. However, in many real-world applications this assumption is limiting: most collected demonstrations are not optimal, or are produced by an agent with slightly different dynamics. We therefore address the problem of imitation learning when the demonstrations can be sub-optimal or drawn from agents with varying dynamics. We develop a metric, composed of a feasibility score and an optimality score, to measure how useful a demonstration is for imitation learning. The proposed score enables learning from the more informative demonstrations and disregarding the less relevant ones. Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
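To illustrate how such a score could be used, the sketch below weights each demonstration by feasibility times optimality before behavioral cloning. Both scoring functions here are hand-coded placeholders, whereas the paper learns them.

```python
# Minimal sketch: score demonstrations, then do weighted behavioral cloning
# of a linear policy a = K s (synthetic data, placeholder scores).
import numpy as np
rng = np.random.default_rng(3)

demos = [dict(states=rng.normal(size=(50, 4)),
              actions=rng.normal(size=(50, 2)),
              ret=rng.uniform(0, 1)) for _ in range(10)]

def feasibility(demo):        # placeholder: can our dynamics reproduce this?
    return 1.0 / (1.0 + np.abs(demo["actions"]).max())

def optimality(demo):         # placeholder: how well did the demonstrator do?
    return demo["ret"]

weights = np.array([feasibility(d) * optimality(d) for d in demos])
weights /= weights.sum()

S = np.vstack([d["states"] for d in demos])
A = np.vstack([d["actions"] for d in demos])
sw = np.sqrt(np.repeat(weights, [len(d["states"]) for d in demos]))[:, None]
K = np.linalg.lstsq(S * sw, A * sw, rcond=None)[0].T   # weighted least squares
print("cloned policy matrix shape:", K.shape)          # (2, 4)
```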
• Autonomous vehicles increase road capacities under the CTM model.
• Reinforcement learning can route autonomous vehicles so as to minimize latency.
• The optimal Nash equilibrium in parallel networks under CTM can be computed efficiently.
Road congestion induces significant costs across the world, and road network disturbances, such as traffic accidents, can cause highly congested traffic patterns. If a planner had control over the routing of all vehicles in the network, they could easily reverse this effect. In a more realistic scenario, we consider a planner that controls only the autonomous cars, which make up a fraction of all the cars present. We study a dynamic routing game in which the route choices of the autonomous cars can be controlled while the human drivers react selfishly and dynamically. As the problem is prohibitively large, we use deep reinforcement learning to learn a policy for controlling the autonomous vehicles. This policy indirectly influences human drivers to route themselves in a way that minimizes congestion on the network. To gauge the effectiveness of our learned policies, we establish theoretical results characterizing equilibria and empirically compare the learned policy's results with the best possible equilibria. We prove properties of equilibria on parallel roads and provide a polynomial-time optimization for computing the most efficient equilibrium. Moreover, we show that in the absence of these policies, high demand and network perturbations would result in severe congestion, whereas using the policy greatly decreases travel times by minimizing congestion. To the best of our knowledge, this is the first work that employs deep reinforcement learning to reduce congestion by indirectly influencing humans' routing decisions in mixed-autonomy traffic.
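To make the congestion dynamics referenced in the highlights concrete, here is a minimal single-road cell transmission model (CTM) step. The parameters and the simplified boundary handling are illustrative; the paper studies routing over networks of such roads.

```python
# Minimal single-road CTM step: vehicles flow downstream, limited by each
# cell's sending capacity and the next cell's remaining space.
import numpy as np

n = np.array([10.0, 25.0, 30.0, 5.0])   # vehicles currently in each cell
N_max, Q = 30.0, 12.0                   # jam capacity and max flow per step

def ctm_step(n, inflow=8.0):
    send = np.minimum(n, Q)                      # what each cell can send
    recv = np.minimum(Q, N_max - n)              # what each cell can receive
    y = np.minimum(send[:-1], recv[1:])          # realized inter-cell flows
    n_next = n.copy()
    n_next[:-1] -= y; n_next[1:] += y            # move vehicles downstream
    n_next[0] += min(inflow, recv[0])            # demand entering the road
    n_next[-1] -= send[-1]                       # vehicles exiting the road
    return n_next

print(ctm_step(n))   # the jammed third cell (30 = N_max) blocks upstream flow
```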
Information gathering actions over human internal state. Sadigh, Dorsa; Sastry, S. Shankar; Seshia, Sanjit A.; et al.
2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2016.
Conference proceeding. Open access.
Much of the estimation of human internal state (goals, intentions, activities, preferences, etc.) is passive: an algorithm observes human actions and updates its estimate of human state. In this work, we embrace the fact that robot actions affect what humans do, and leverage it to improve state estimation. We enable robots to do active information gathering by planning actions that probe the user in order to clarify their internal state. For instance, an autonomous car will plan to nudge into a human driver's lane to test their driving style. Results in simulation and in a user study suggest that active information gathering significantly outperforms passive state estimation.
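A minimal sketch of the active probing idea follows, with a discrete belief over two driving styles and two candidate robot actions. The probabilities and the expected-information-gain criterion are illustrative stand-ins for the paper's continuous planning formulation.

```python
# Minimal sketch: pick the probing action that maximizes expected
# information gain over the human's internal state (driving style).
import numpy as np

belief = np.array([0.5, 0.5])            # P(style): [aggressive, timid]
# P(human yields | robot action, style); "nudge_in" separates the styles.
p_yield = {"keep_lane": np.array([0.5, 0.6]),
           "nudge_in":  np.array([0.1, 0.9])}

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def expected_info_gain(action, b):
    gain = 0.0
    for obs_prob in (p_yield[action], 1 - p_yield[action]):  # yields / doesn't
        p_obs = b @ obs_prob                    # marginal prob of observation
        posterior = b * obs_prob / p_obs        # Bayes update of the belief
        gain += p_obs * (entropy(b) - entropy(posterior))
    return gain

best = max(p_yield, key=lambda a: expected_info_gain(a, belief))
print("probe action:", best)             # nudging is the more informative probe
```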
Social Coordination and Altruism in Autonomous Driving. Toghi, Behrad; Valiente, Rodolfo; Sadigh, Dorsa; et al.
IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 12, December 2022.
Journal article. Peer-reviewed. Open access.
Despite the advances in the autonomous driving domain, autonomous vehicles (AVs) are still inefficient and limited in terms of cooperating with each other or coordinating with vehicles operated by humans. A group of autonomous and human-driven vehicles (HVs) that work together to optimize an altruistic social utility can co-exist seamlessly and assure safety and efficiency on the road. Achieving this mission without explicit coordination among agents is challenging, mainly due to the difficulty of predicting the behavior of humans with heterogeneous preferences in mixed-autonomy environments. Formally, we model an AV's maneuver planning in mixed-autonomy traffic as a partially observable stochastic game and attempt to derive optimal policies that lead to socially desirable outcomes using a multi-agent reinforcement learning (MARL) framework; we propose a semi-sequential multi-agent training and policy dissemination algorithm for this MARL problem. We introduce a quantitative representation of the AVs' social preferences and design a distributed reward structure that induces altruism in their decision-making process. Altruistic AVs are able to form alliances, guide the traffic, and affect the behavior of the HVs to handle competitive driving scenarios. We compare egoistic AVs to our altruistic autonomous agents in a highway merging setting and demonstrate the emerging behaviors that lead to improvements in the number of successful merges and in the overall traffic flow and safety.
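One common way to induce such altruism, sketched below, is a social-value-orientation style reward that blends an agent's own reward with the average reward of the other agents via an angle phi. This is a simplified stand-in for the paper's distributed reward structure.

```python
# Minimal sketch of a social-value-orientation (SVO) style blended reward.
import numpy as np

def social_reward(r_ego, r_others, phi):
    """phi = 0 -> purely egoistic; phi = pi/2 -> purely altruistic."""
    return np.cos(phi) * r_ego + np.sin(phi) * np.mean(r_others)

r_ego, r_others = 1.0, [0.2, -0.5, 0.8]
for phi in (0.0, np.pi / 6, np.pi / 4):
    print(f"phi={phi:.2f}: r={social_reward(r_ego, r_others, phi):.3f}")
```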