Although communication plays a pivotal role in achieving coordinated activities in multi-agent systems, conventional approaches often involve complicated high-dimensional messages generated by deep networks. These messages are typically indecipherable to humans, relatively costly to transmit, and require intricate encoding and decoding networks, which can be a design limitation for agents such as autonomous (mobile) robots. This lack of interpretability can also lead to systemic issues with security and reliability. In this study, inspired by the common human practice of sharing likely actions in collaborative endeavors, we propose a novel approach in which each agent’s action probabilities are transmitted to other agents as messages. Our framework, referred to as communication based on action probabilities (CAP), focuses on generating straightforward, low-dimensional, interpretable messages that support multiple agents in coordinating their activities toward specified cooperative goals. CAP streamlines our comprehension of the agents’ learned coordinated and cooperative behaviors and eliminates the need for additional network models to generate messages. CAP’s network architecture is simpler than that of state-of-the-art methods, yet our experimental results show that it performed comparably, converged faster, and required a lower volume of communication with better interpretability.
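The core mechanism can be illustrated with a minimal sketch: an agent's message is simply its own action-probability vector, which other agents append to their observations. This is a simplified illustration under assumptions (a discrete action space and a softmax over Q-values as the policy head); the function names and sizes are hypothetical, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def cap_message(q_values):
    """CAP-style message: the agent's own action probabilities.
    Deriving them from Q-values via softmax is an assumption here;
    any stochastic policy head producing probabilities would do."""
    return softmax(q_values)

def augmented_observation(own_obs, peer_messages):
    """Each agent conditions its policy on its local observation
    concatenated with the low-dimensional probability vectors
    received from the other agents."""
    return np.concatenate([own_obs] + peer_messages)

# Two agents with 4 discrete actions each (hypothetical sizes).
q_a = np.array([1.0, 0.5, 0.2, -0.3])
msg_a = cap_message(q_a)            # 4-dim, human-readable probabilities
obs_b = np.zeros(8)                 # agent B's local observation
obs_b_aug = augmented_observation(obs_b, [msg_a])
print(msg_a.round(3), obs_b_aug.shape)
```

Because the message is just a probability distribution over actions, a human observer can read it directly ("agent A will most likely move left"), which is the interpretability advantage the abstract describes.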
Cooperation and coordination are major issues in studies on multi-agent systems because the overall performance of such systems is greatly affected by these activities. These issues are challenging, however, because appropriate coordinated behaviors depend not only on environmental characteristics but also on other agents’ strategies. Meanwhile, advances in multi-agent deep reinforcement learning (MADRL) have recently attracted attention because MADRL can considerably improve the overall performance of multi-agent systems in certain domains. The characteristics of the learned coordination structures and the agents’ resulting behaviors, however, have not been sufficiently clarified. Therefore, we focus here on MADRL in which agents have their own deep Q-networks (DQNs), and we analyze their coordinated behaviors and structures for the
pickup and floor laying problem
, which is an abstraction of our target application. In particular, we analyze the behaviors around scarce resources and long narrow passages in which conflicts such as collisions are likely to occur. We then show that different types of inputs to the networks exhibit similar performance but generate various coordination structures and associated behaviors, such as division of labor and a shared social norm, with no direct communication.
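The setting above has each agent learning independently from its own inputs. A minimal sketch of such an independent learner is given below; it uses a tabular Q-update instead of a deep Q-network (an assumption made purely to keep the example self-contained), and the class and parameter names are illustrative.

```python
import random
from collections import defaultdict

class IndependentQLearner:
    """One learner per agent: each updates its own Q-table from its
    local observation only, ignoring the other agents' internals
    (the independent-learning setup the abstract describes, with a
    table standing in for a DQN)."""
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.alpha, self.gamma, self.eps = n_actions, alpha, gamma, eps

    def act(self, obs):
        # Epsilon-greedy over this agent's own value estimates.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        qs = self.q[obs]
        return qs.index(max(qs))

    def update(self, obs, action, reward, next_obs):
        # Standard one-step Q-learning target.
        target = reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])

agent = IndependentQLearner(n_actions=4)
agent.update(obs=(0, 0), action=1, reward=1.0, next_obs=(0, 1))
print(agent.q[(0, 0)][1])  # 0.1 after one update with alpha=0.1
```

In the multi-agent case, one such learner would be instantiated per agent, and any coordination (e.g., division of labor) emerges only through the shared environment, since there is no direct communication.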
Decentralized execution is a widely used framework in multi-agent reinforcement learning. However, it has a well-known but neglected shortcoming, redundant computation: the same or similar computation is performed redundantly in different agents owing to their overlapping observations. This study proposes a novel method, the locally centralized team transformer (LCTT), to address this problem. We first propose a locally centralized execution framework that autonomously designates some agents as leaders that generate instructions, and the other agents as workers that act according to the received instructions without running their own policy networks. For the LCTT, we then propose the team transformer (T-Trans) structure, which enables leaders to generate targeted instructions for each worker, and the leadership shift, which enables agents to determine which of them should instruct or be instructed by others. The experimental results demonstrate that the proposed method significantly reduces redundant computation without decreasing rewards and achieves faster learning convergence.
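The leader/worker execution split can be sketched as follows. This is a deliberately simplified sketch of the locally centralized idea, not the LCTT implementation: the `policy` and `instruct` callables are placeholders, and the agent names are hypothetical.

```python
def locally_centralized_step(agents, leader_ids, policy, instruct):
    """One execution step under a locally centralized scheme:
    leaders run their policy and also produce a targeted instruction
    per worker; workers simply follow instructions, so their own
    policy networks are never evaluated (the computation saving)."""
    actions = {}
    workers = [a for a in agents if a not in leader_ids]
    for leader in leader_ids:
        actions[leader] = policy(leader)
        for worker in workers:
            # In T-Trans this instruction is targeted per worker.
            actions[worker] = instruct(leader, worker)
    return actions

# Hypothetical 4-agent team with a single leader.
agents = ["a0", "a1", "a2", "a3"]
actions = locally_centralized_step(
    agents, leader_ids={"a0"},
    policy=lambda a: "explore",
    instruct=lambda ld, wk: f"follow-{ld}",
)
print(actions)
```

The leadership shift described in the abstract would then periodically re-select `leader_ids` based on which agents are best placed to instruct the others.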
In this work, we focus on an environment where multiple agents with complementary capabilities cooperate to generate non-conflicting joint actions that achieve a specific target. The central problem addressed is how several agents can collectively learn to coordinate their actions such that they complete a given task together without conflicts. Sequential decision-making under uncertainty, however, is one of the most challenging issues for intelligent cooperative systems. To address this, we propose a multi-agent concurrent framework in which agents learn coordinated behaviors in order to divide their areas of responsibility. The proposed framework extends recent deep reinforcement learning algorithms such as DQN, double DQN, and dueling network architectures. We then investigate how the learned behaviors change according to the dynamics of the environment, the reward scheme, and the network structures. Next, we show how agents behave and choose their actions such that the resulting joint actions are optimal. Finally, we show that our method leads to stable solutions in our specific environment.
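Of the algorithms named above, the double DQN variant differs from plain DQN only in how the learning target is computed: the online network selects the next action and the target network evaluates it. A minimal sketch of that standard target computation (not this paper's code) is:

```python
import numpy as np

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: action selection by the online network,
    evaluation by the target network, which reduces the value
    overestimation of vanilla DQN."""
    if done:
        return reward
    best_action = int(np.argmax(next_q_online))
    return reward + gamma * next_q_target[best_action]

# Online net prefers action 1; the target net scores that action 2.0.
y = double_dqn_target(1.0, np.array([0.2, 0.9]), np.array([3.0, 2.0]))
print(y)  # 1.0 + 0.99 * 2.0 = 2.98
```

A dueling architecture would change the network that produces these Q-value arrays (splitting them into state-value and advantage streams) but leaves this target computation unchanged.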
This study proposes a method to automatically generate paths for multiple autonomous agents to collectively form a sequence of consecutive patterns. Several studies have considered minimizing the total travel distance of all agents for formation transitions in applications with multiple self-driving robots, such as drone shows by unmanned aerial vehicles or group actions in which self-propelled robots move synchronously, consecutively transforming the patterns without collisions. However, few studies consider fairness in travel distance between agents, even though unfairness can exhaust the batteries of certain agents and thereby reduce operating time. Furthermore, because these group actions are usually performed with a large number of agents, the agents can carry only small batteries to reduce cost and weight, so their performance time depends on the battery duration. The proposed method, which is based on ant colony optimization (ACO), considers fairness in the distances traveled by the agents as well as reducing the total travel distance, and can achieve long transitions in both two- and three-dimensional spaces. Our experiments demonstrate that the proposed method allows agents to execute more formation patterns without collisions than a conventional method that is also based on ACO.
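The kind of objective such a method balances can be sketched concretely: score an agent-to-target assignment by total travel distance plus a fairness penalty on the spread of individual distances. The cost function below is an illustrative assumption (Manhattan distance, a simple max-minus-min penalty), and exhaustive search over a tiny instance stands in for the ACO search the paper actually uses.

```python
from itertools import permutations

def assignment_cost(starts, targets, perm, fairness_weight=1.0):
    """Score an agent->target assignment for one formation transition:
    total travel distance plus a fairness penalty (spread between the
    longest and shortest trip). Metric and weighting are illustrative."""
    dists = [abs(s[0] - t[0]) + abs(s[1] - t[1])
             for s, t in zip(starts, [targets[i] for i in perm])]
    return sum(dists) + fairness_weight * (max(dists) - min(dists))

# Three agents moving into a new pattern (toy 2-D coordinates).
starts = [(0, 0), (0, 1), (5, 5)]
targets = [(1, 0), (6, 5), (0, 2)]
best = min(permutations(range(3)), key=lambda p: assignment_cost(starts, targets, p))
print(best)
```

With `fairness_weight = 0` this reduces to the conventional total-distance objective; a positive weight trades some total distance for evening out the battery drain across agents.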
In this paper, we propose an enhanced version of the distributed attentional actor architecture (eDA3-X) for model-free reinforcement learning. This architecture is designed to facilitate the interpretability of learned coordinated behaviors in multi-agent systems through a saliency vector that captures partial observations of the environment. Our proposed method can, in principle, be integrated with any deep reinforcement learning method, as indicated by the X, and can help identify the information in the input data that individual agents attend to during and after training. We validated eDA3-X through experiments on the object collection game. We also analyzed the relationship between cooperative behaviors and three types of attention heatmaps (standard, positional, and class attention), which provided insight into the information that agents consider crucial when making decisions. In addition, we investigated how an agent’s attention develops through training experience. Our experiments indicate that our approach offers a promising solution for understanding coordinated behaviors in multi-agent reinforcement learning.
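The attention heatmaps referred to above are, at their core, normalized attention weights over parts of an agent's observation. A generic sketch of computing such weights with scaled dot-product attention follows; this is the standard formulation, not eDA3-X's specific saliency vector, and the shapes are toy values.

```python
import numpy as np

def attention_weights(query, keys):
    """Scaled dot-product attention weights over observation patches.
    Visualized over the input layout, these weights form the kind of
    heatmap used to see which parts of the observation an agent
    attends to when deciding its action."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)
    e = np.exp(scores - scores.max())   # stable softmax
    return e / e.sum()

# One query vector against four observation-patch keys (toy numbers).
rng = np.random.default_rng(0)
w = attention_weights(rng.normal(size=3), rng.normal(size=(4, 3)))
print(w.round(3), w.sum())
```

Tracking how these weights change across training checkpoints is one way to study how attention develops with experience, as the abstract describes.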
In high-carbon steel, the TTT nose temperature rises and upper bainite forms more easily as the amount of added Si increases. The formation of upper bainite is reduced by boron addition. In this study, the influence of boron addition on the isothermal transformation behavior of Si-added high-carbon steel was clarified. With boron addition, the lamellar spacing and growth rate of pearlite do not change, but the nucleation of pearlite is reduced. However, the nucleation of pearlite is promoted when Fe23(C,B)6 precipitates. In the Si-added high-carbon steel, upper bainite often forms together with ferrite generated on the prior austenite grain boundaries. It is inferred that boron reduces ferrite generation at the grain boundaries, which otherwise causes upper bainite formation. It was confirmed that the effective state of boron is grain boundary segregation.
We propose a two-stage reward allocation method with decay, using an extension of replay memory to adapt this rewarding scheme to deep reinforcement learning (DRL), in order to generate coordinated behaviors for tasks that are completed by heterogeneous agents executing a few subtasks sequentially. An independent learner in a cooperative multi-agent system needs to learn policies both for effective execution of its own subtask and for coordinated behaviors under a certain coordination structure. Although the reward scheme is a central issue in DRL, it is difficult to design one that learns both kinds of policies. Our proposed method attempts to generate these different behaviors in multi-agent DRL by dividing the timing of rewards into two stages and varying the ratio between them over time. By introducing the coordinated delivery and execution problem with an expiration time, in which a task is executed sequentially by two heterogeneous agents, we experimentally analyze how various ratios of the reward division in the two-stage allocation affect the generated behaviors. The results demonstrate that the proposed method improves overall performance relative to conventional one-time or fixed rewards and establishes robust coordinated behavior.
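The idea of splitting the reward into two stages and decaying the ratio between them over training can be sketched in a few lines. The exponential schedule and constants below are illustrative assumptions, not the paper's exact parameterization.

```python
def two_stage_reward(subtask_reward, completion_reward, episode, decay=0.99):
    """Two-stage allocation with decay: early in training the
    first-stage (own-subtask) reward dominates, shaping each agent's
    individual policy; its weight decays over episodes so the final
    (task-completion) reward gradually takes over and coordinated
    behavior is learned. The decay schedule is an assumption."""
    ratio = decay ** episode
    return ratio * subtask_reward + (1.0 - ratio) * completion_reward

print(two_stage_reward(1.0, 2.0, episode=0))    # 1.0: all weight on the first stage
print(round(two_stage_reward(1.0, 2.0, episode=500), 3))  # close to the completion reward
```

Combined with a replay memory that stores which stage a transition's reward came from, this lets old experience be re-weighted consistently as the ratio changes.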
The retweet is a characteristic mechanism of several social networking services and social media platforms, such as Facebook, Twitter, and Weibo. By retweeting a tweet, users can share an article with their friends and followers. However, it is not clear how retweets affect the dominant behaviors of users. Therefore, this study investigates the impact of retweets on the behavior of social media users from the perspective of networked game theory, i.e., how the existence of the retweet mechanism in social media promotes or reduces users’ willingness to post and comment on articles. To address these issues, we propose the retweet reward game and the quote tweet reward game by adding retweet and quote tweet mechanisms to a relatively simple social networking service model known as the reward game. Subsequently, we conduct simulation-based experiments to understand the influence of retweets on user behavior on various networks. We demonstrate that users are more willing to post new articles under a retweet mechanism, and that quote retweets are more beneficial to users, as users can expect to spread both their information and their own comments on already posted articles.
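The intuition of the abstract's result, that retweets raise a poster's payoff by extending reach, can be illustrated with a toy payoff function. The reach model and the reward/cost constants below are purely illustrative assumptions, not the reward game's actual formulation.

```python
def posting_payoff(graph, poster, retweeters, reward_per_reader=1.0, post_cost=2.0):
    """Illustrative payoff in a retweet-style reward game: a poster
    pays a posting cost and earns a reward per unique reader; each
    retweeter exposes the article to their own followers as well,
    extending its reach beyond the poster's direct audience."""
    readers = set(graph[poster])
    for r in retweeters:
        readers |= set(graph[r])
    readers.discard(poster)
    return reward_per_reader * len(readers) - post_cost

# Tiny follower graph: node -> list of followers.
graph = {"u": ["a", "b"], "a": ["c", "d"], "b": ["u"]}
print(posting_payoff(graph, "u", retweeters=[]))     # 2 readers - cost = 0.0
print(posting_payoff(graph, "u", retweeters=["a"]))  # reach grows via a's followers
```

In a game-theoretic simulation, payoffs of this shape are what make posting a better strategy when retweeting exists: the same posting cost buys a larger expected audience.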
Social networking services (SNSs) are constantly used by a large number of people with various motivations and intentions depending on their social relationships and purposes, resulting in diverse strategies of posting and consuming content on SNSs. It is therefore important to understand how individual strategies differ depending on users’ network locations and surroundings. For this purpose, by using a game-theoretical model of users called
agents
and proposing a co-evolutionary algorithm called
multiple-world genetic algorithm
to evolve diverse strategies for each user, we investigated the differences in individual strategies and compared the results in artificial networks with those in the Facebook ego network. From our experiments, we found that agents did not select the free-rider strategy, i.e., only reading the articles and comments posted by other users, in the Facebook network, although this strategy is usually cost-effective and frequently appeared in the artificial networks. We also found that agents who mainly comment on posted articles and comments and rarely post their own articles appeared in the Facebook network but not in the connecting nearest-neighbor networks, although we believe that this kind of user actually exists in real-world SNSs. Through an analysis of the impact of differences in the reward for a comment on various ego networks, our simulations also revealed that the number of friends is a crucial factor in identifying users’ strategies on SNSs.
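The co-evolutionary idea, each agent evolving its own strategy in its own population while fitness is evaluated against the other agents' current strategies, can be sketched minimally as below. Strategies are reduced to single numbers and the fitness function is a toy, so this is only a structural sketch in the spirit of a multiple-world GA, not the algorithm's actual operators or the SNS strategy representation.

```python
import random

def evolve_worlds(worlds, fitness, generations=30, mut=0.2):
    """Minimal co-evolutionary loop: each agent has its own population
    ('world'); each generation, the best strategy per world is chosen
    by evaluating against the other agents' current bests, and the
    next population is mutated around that best. All details are
    simplified stand-ins for the real operators."""
    best = [w[0] for w in worlds]
    for _ in range(generations):
        best = [max(w, key=lambda s, i=i: fitness(i, s, best))
                for i, w in enumerate(worlds)]
        worlds = [
            [min(1.0, max(0.0, b + random.uniform(-mut, mut))) for _ in w]
            for b, w in zip(best, worlds)
        ]
    return best

random.seed(1)
# Two agents; each is rewarded for complementing the other (toy fitness).
fit = lambda i, s, best: -abs(s - (1.0 - best[1 - i]))
best = evolve_worlds([[random.random() for _ in range(8)] for _ in range(2)], fit)
print([round(b, 2) for b in best])
```

Because each agent's fitness depends on the others' evolving strategies, heterogeneous strategies can emerge across agents, which is the property the experiments above rely on.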