Adversarial risk analysis has been introduced as a framework to deal with risks derived from intentional actions of adversaries. The analysis supports one of the decision makers, who must forecast the actions of the other agents. Typically, this forecast must take account of random consequences resulting from the set of selected actions. The solution requires one to model the behavior of the opponents, which entails strategic thinking. The supported agent may face different kinds of opponents, who may use different rationality paradigms: for example, the opponent may behave randomly, seek a Nash equilibrium, perform level-k thinking, use mirroring, or employ prospect theory, among many other possibilities. We describe the appropriate analysis for these situations, and also show how to model the uncertainty about the rationality paradigm used by the opponent through a Bayesian model averaging approach, enabling a fully decision-theoretic solution. We also show how, as we observe an opponent's decision behavior, this approach allows learning about the validity of each of the rationality models used to predict his decisions by computing the models' posterior probabilities, which can be understood as a measure of their validity. We focus on simultaneous decision making by two agents.
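A minimal sketch of the Bayesian model averaging idea described above, assuming a toy action set and fixed illustrative predictive distributions for three candidate rationality models; the model names, priors, and observed actions are hypothetical, not the paper's setup:

```python
# Bayesian model averaging over candidate rationality models of an opponent.
import numpy as np

ACTIONS = ["a0", "a1", "a2"]

# Each candidate model yields a distribution over the opponent's actions;
# here they are fixed toy distributions rather than situation-dependent ones.
models = {
    "random":    lambda: np.array([1/3, 1/3, 1/3]),
    "nash_like": lambda: np.array([0.6, 0.3, 0.1]),
    "level_k":   lambda: np.array([0.1, 0.2, 0.7]),
}
posterior = {m: 1 / len(models) for m in models}  # uniform prior over models

def update(observed_action_idx):
    """Bayes update of model probabilities after observing one opponent move."""
    global posterior
    likelihoods = {m: models[m]()[observed_action_idx] for m in models}
    evidence = sum(posterior[m] * likelihoods[m] for m in models)
    posterior = {m: posterior[m] * likelihoods[m] / evidence for m in models}

def predictive():
    """Model-averaged distribution over the opponent's next action."""
    return sum(posterior[m] * models[m]() for m in models)

for obs in [2, 2, 1, 2]:    # hypothetical observed opponent actions
    update(obs)
print(posterior)            # posterior model probabilities ("validity")
print(predictive())         # averaged forecast of the next action
```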
Opponent modeling is a key challenge in Real-Time Strategy (RTS) games, as the environment in these games is adversarial and the player cannot predict the future actions of her opponent. Additionally, the environment is partially observable due to the fog of war. In this paper, we propose an opponent model that is robust to the observation noise caused by the fog of war. To cope with the uncertainty inherent in these games, we design a Bayesian network whose parameters are learned from an unlabeled dataset of game logs, so it does not require a human expert's knowledge. We evaluate our model on StarCraft, which is considered a unified test-bed in this domain. The model is compared with that proposed by Synnaeve and Bessiere. Experimental results on recorded games of human players show that the proposed model predicts the opponent's future decisions more effectively. Using this model, it is possible to create an adaptive game intelligence algorithm applicable to RTS games in which the concept of build order (the order of building construction) exists.
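A minimal sketch of the kind of noise-robust Bayesian prediction described above, under an assumed (much simpler) structure: strategies generate buildings, and an unobserved building may be either absent or hidden by the fog of war. All names and probabilities are illustrative, not the paper's learned parameters:

```python
# Noise-robust posterior over an opponent's strategy from partial scouting.
import numpy as np

STRATEGIES = ["rush", "tech", "expand"]
BUILDINGS  = ["barracks", "factory", "expansion"]

# P(building present | strategy), e.g. estimated from unlabeled game logs.
p_build = np.array([[0.9, 0.2, 0.1],
                    [0.5, 0.8, 0.2],
                    [0.4, 0.3, 0.9]])
prior  = np.array([0.4, 0.3, 0.3])   # P(strategy) from game logs
p_miss = 0.3                         # chance a present building stays unseen (fog of war)

def strategy_posterior(seen):
    """seen[j] = 1 if building j was observed; 0 means absent OR hidden by fog."""
    post = prior.copy()
    for j, s in enumerate(seen):
        if s:    # observed -> building is certainly present
            post *= p_build[:, j]
        else:    # not observed -> absent, or present but missed
            post *= (1 - p_build[:, j]) + p_build[:, j] * p_miss
    return post / post.sum()

print(strategy_posterior([1, 0, 0]))   # only a barracks scouted so far
```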
Negotiation is a process essential for a wide range of applications. The complex decision making involved in negotiation makes its automation difficult. The complexity is further increased as negotiators hide their individual preferences from each other to avoid exploitation by the opponent. Even though sharing private preference information leads to better agreements for both sides, it is never done in the absence of trust. In this work, we learn the opponent's preference information from the offers given by the opponent using the Analytic Hierarchy Process (AHP). We apply our approach to the negotiation of Quality-of-Service (QoS) parameters for the establishment of Service Level Agreements (SLAs) between a provider and a consumer. Experiments show that, using AHP, the negotiations are faster and the agreements lie on or near the Pareto-optimal line.
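A minimal sketch of the AHP step assumed here: issue weights are recovered from a pairwise comparison matrix via its principal eigenvector. The comparison matrix over three hypothetical QoS issues is illustrative only:

```python
# AHP: derive issue weights from a pairwise comparison matrix.
import numpy as np

# A[i, j] = how strongly issue i is preferred over issue j (Saaty's 1-9 scale).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

eigvals, eigvecs = np.linalg.eig(A)
principal = np.argmax(eigvals.real)            # principal eigenvalue
weights = np.abs(eigvecs[:, principal].real)   # its eigenvector gives the weights
weights /= weights.sum()
print(weights)   # estimated relative importance of each QoS issue
```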
One important component of developing autonomous agents lies in accurately predicting their opponents' behaviors when the agents interact with others in an uncertain environment. Most recent studies focus on first constructing predictive types (or models) of the opponents, considering their various properties of interest, and subsequently using these models to predict their behaviors accordingly. However, as the possible type space can be rather large, it is time-consuming, and sometimes even infeasible, to predict the actual behaviors of opponents with all candidate types. Thus, in this paper a tractable opponent behavior reasoning approach is proposed that facilitates (a) extraction of a small yet representative summary of all candidates using submodular maximization and, accordingly, (b) identification of the most appropriate type for real-time behavior prediction based on multi-armed bandits. In addition, we propose a knowledge-transfer scheme through demonstration learning to synchronize subject agents' knowledge about their opponents' behaviors. This further reduces the burden of reasoning with all models of their opponents from the perspective of individual subject agents. We integrate the new behavior prediction and reasoning method into a state-of-the-art evolutionary multi-agent framework, namely a memetic multi-agent system (MeMAS), and demonstrate its empirical performance in two problem domains.
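A minimal sketch of steps (a) and (b), with illustrative stand-ins: greedy maximization of a coverage-style submodular objective to select a small summary of opponent types, and UCB1 over the summary to choose a type for online prediction. Similarities and rewards are toy data, not the paper's models:

```python
# (a) greedy submodular summary of opponent types, (b) UCB1 type selection.
import math
import numpy as np

rng = np.random.default_rng(0)
n_types, k = 20, 4
sim = rng.random((n_types, n_types))            # similarity between candidate types

def coverage(S):
    """Facility-location objective: how well S represents all types (submodular)."""
    return sum(max(sim[i, j] for j in S) for i in range(n_types)) if S else 0.0

summary = []
for _ in range(k):                               # greedy selection
    best = max((t for t in range(n_types) if t not in summary),
               key=lambda t: coverage(summary + [t]))
    summary.append(best)

# UCB1: pick the summarized type whose predictions pay off best at run time.
counts, rewards = {t: 0 for t in summary}, {t: 0.0 for t in summary}
def choose(step):
    for t in summary:
        if counts[t] == 0:                       # try each type once first
            return t
    return max(summary, key=lambda t: rewards[t] / counts[t]
               + math.sqrt(2 * math.log(step) / counts[t]))

for step in range(1, 200):
    t = choose(step)
    r = float(rng.random() < 0.3 + 0.1 * (t % 5))   # toy prediction payoff
    counts[t] += 1; rewards[t] += r
print(summary, {t: round(rewards[t] / counts[t], 2) for t in summary})
```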
We consider an autonomous agent facing a stochastic, partially observable, multiagent environment. In order to compute an optimal plan, the agent must accurately predict the actions of the other agents, since they influence the state of the environment and ultimately the agent’s utility. To do so, we propose a special case of interactive partially observable Markov decision process, in which the agent does not explicitly model the other agents’ beliefs and preferences, and instead represents them as stochastic processes implemented by probabilistic deterministic finite state controllers (PDFCs). The agent maintains a probability distribution over the PDFC models of the other agents, and updates this belief using Bayesian inference. Since the number of nodes of these PDFCs is unknown and unbounded, the agent places a Bayesian nonparametric prior distribution over the infinite-dimensional set of PDFCs. This allows the size of the learned models to adapt to the complexity of the observed behavior. In this case, deriving the posterior distribution is too complex for analytical computation; therefore, we provide a Markov chain Monte Carlo algorithm that approximates the posterior beliefs over the other agents’ PDFCs, given a sequence of (possibly imperfect) observations about their behavior. Experimental results show that the learned models converge behaviorally to the true ones. We consider two settings, one in which the agent first learns and then interacts with other agents, and one in which learning and planning are interleaved. We show that the agent’s performance increases as a result of learning in both situations. Moreover, we analyze the dynamics that ensue when two agents are simultaneously learning about each other while interacting, showing in an example environment that coordination emerges naturally from our approach. Furthermore, we demonstrate how an agent can exploit the learned models to perform indirect inference over the state of the environment via the modeled agent’s actions.
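A far simpler sketch than the paper's nonparametric MCMC, fixing a two-node PDFC structure (deterministic node transitions, stochastic action emissions) and running a Metropolis step over its emission probabilities given a hypothetical sequence of observed opponent actions:

```python
# Toy Metropolis sampling of a PDFC's emission parameters from observed actions.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_actions = 2, 2
trans = np.array([[1, 0], [0, 1]])            # next node given (node, last action)
emit = np.full((n_nodes, n_actions), 0.5)     # P(action | node), to be inferred
obs = [0, 1, 1, 0, 1, 1, 0, 1]                # hypothetical observed opponent actions

def log_lik(emit):
    node, ll = 0, 0.0
    for a in obs:
        ll += np.log(emit[node, a])
        node = trans[node, a]
    return ll

current, cur_ll = emit.copy(), log_lik(emit)
for _ in range(2000):                         # Metropolis over emission parameters
    prop = np.clip(current + rng.normal(0, 0.05, current.shape), 1e-3, 1 - 1e-3)
    prop /= prop.sum(axis=1, keepdims=True)   # keep rows valid distributions
    prop_ll = log_lik(prop)
    if np.log(rng.random()) < prop_ll - cur_ll:
        current, cur_ll = prop, prop_ll
print(current)   # inferred emission probabilities of the opponent's controller
```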
The payoff of an agent depends on both the environment and the actions of other agents. Thus, the ability to model and predict the strategies and behaviors of other agents in an interactive decision-making scenario is one of the core functionalities of intelligent systems. State-of-the-art methods for opponent modeling mainly build an explicit model of opponents’ actions, preferences, targets, etc., which the primary agent uses to make decisions. However, it is more important for an agent to increase its payoff than to accurately predict opponents’ behavior. Therefore, we propose a framework that synchronizes the opponent modeling and decision making of the primary agent by incorporating opponent modeling into reinforcement learning. In interactive decisions, the payoff depends not only on the behavioral characteristics of the opponent but also on the current state. However, confounding the two obscures the effects of state and action, which then cannot be accurately encoded. To this end, state evaluation is separated from action evaluation in our model. Experimental results from two game environments, a simulated soccer game and a real game called quiz bowl, show that the introduction of opponent modeling effectively improves decision payoffs. In addition, the proposed framework for opponent modeling outperforms benchmark models.
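A minimal sketch of separating state evaluation from action evaluation, assuming a dueling-style architecture in which the opponent's behavioral features enter only the advantage stream; layer sizes and the feature split are illustrative and not the paper's exact network:

```python
# Q(s, opp, a) = V(s) + A(s, opp, a): state value kept free of opponent features.
import torch
import torch.nn as nn

class OpponentAwareQ(nn.Module):
    def __init__(self, state_dim=8, opp_dim=4, n_actions=5, hidden=32):
        super().__init__()
        self.opp_enc = nn.Sequential(nn.Linear(opp_dim, hidden), nn.ReLU())
        self.value = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))           # state only
        self.advantage = nn.Sequential(nn.Linear(state_dim + hidden, hidden),
                                       nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, state, opp_history_features):
        z = self.opp_enc(opp_history_features)
        v = self.value(state)                                      # V(s)
        a = self.advantage(torch.cat([state, z], dim=-1))          # A(s, opp, a)
        return v + a - a.mean(dim=-1, keepdim=True)                # Q-values

q = OpponentAwareQ()
print(q(torch.randn(1, 8), torch.randn(1, 4)).shape)   # -> torch.Size([1, 5])
```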
• A novel approach to complex agent-based negotiations is proposed.
• The approach is able to effectively learn an unknown opponent’s strategy.
• The approach suggests concessions toward opponents in an adaptive manner.
• Extensive experimental results show the negotiation qualities of the approach.
Negotiation among computational autonomous agents has attracted rapidly growing interest in recent years, mainly due to its broad application potential in areas such as e-commerce and e-business. This work deals with automated bilateral multi-issue negotiation in complex environments. Although tremendous progress has been made, available algorithms and techniques are typically limited in their applicability to more complex situations, in that most of them rely on simplifying assumptions about the negotiation complexity, such as simple or partially known opponent behaviors and the availability of negotiation history. We propose a negotiation approach called OMAC★ that aims to tackle these problems. OMAC★ enables an agent to efficiently model opponents in real time through discrete wavelet transformation and non-linear regression with Gaussian processes. Based on the approximated model, the decision-making component of OMAC★ adaptively adjusts its utility expectations and negotiation moves. Extensive experimental results demonstrate the negotiation qualities of OMAC★, both from the standard mean-score performance perspective and from the perspective of empirical game theory. The results show that OMAC★ outperforms the top agents from the 2012, 2011, and 2010 International Automated Negotiating Agents Competition (ANAC) in a broad range of negotiation scenarios.
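A minimal sketch of the two ingredients named above, with synthetic data, a one-level Haar transform in place of a full wavelet decomposition, and a hand-rolled RBF-kernel GP rather than OMAC★'s actual implementation:

```python
# Wavelet-smoothed opponent concession curve + GP extrapolation of future offers.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 0.6, 32)                               # normalized negotiation time
utils = 0.9 - 0.5 * t + 0.03 * rng.standard_normal(32)    # utilities of opponent offers

# (1) one-level Haar transform: keep approximation coefficients, drop details.
approx = (utils[0::2] + utils[1::2]) / 2
smooth = np.repeat(approx, 2)                             # denoised concession curve

# (2) GP regression with an RBF kernel, fit on the smoothed curve.
def rbf(a, b, ls=0.15):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

K = rbf(t, t) + 1e-4 * np.eye(len(t))                     # kernel matrix + jitter
t_future = np.linspace(0.6, 1.0, 10)
mean_future = rbf(t_future, t) @ np.linalg.solve(K, smooth)
print(mean_future)   # forecast of the opponent's future concessions
```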
This paper proposes a dynamic resource trading scheme for an unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) network. A UAV-assisted MEC server adaptively adjusts its trajectory to sell computation offloading services to mobile users (MUs) with stochastic task arrivals. In this context, we formulate the sequential resource trading problem as a stochastic Stackelberg game composed of two stages in each trading round. In the first stage, the self-interested UAV jointly optimizes its trajectory and service price to maximize its long-term profit. In the second stage, the non-cooperative MUs optimize their binary offloading decisions to minimize the average task processing delay and service payment. However, it is challenging to reach the equilibrium across fully decentralized agents with constantly evolving and tightly coupled policies, where each agent faces a non-stationary environment. To solve this problem, we propose an opponent modeling based double deep Q-learning (OM-DDQN) algorithm, in which each agent adopts opponent modeling to effectively predict the trading strategies of the other agents in the network. Simulation results demonstrate that, compared with the baseline algorithms, the proposed algorithm achieves a win-win resource trading outcome that not only enhances the UAV's profit but also reduces the MUs' costs.
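A minimal sketch (toy dimensions and data, not the paper's OM-DDQN) of the combination described above: an opponent model predicts the other agents' actions, and a double-DQN target conditions the Q-network on that prediction:

```python
# Opponent-model-conditioned double-DQN target computation.
import torch
import torch.nn as nn

state_dim, opp_actions, my_actions = 6, 3, 4

opponent_model = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                               nn.Linear(32, opp_actions), nn.Softmax(dim=-1))
q_net      = nn.Sequential(nn.Linear(state_dim + opp_actions, 32), nn.ReLU(),
                           nn.Linear(32, my_actions))
target_net = nn.Sequential(nn.Linear(state_dim + opp_actions, 32), nn.ReLU(),
                           nn.Linear(32, my_actions))
target_net.load_state_dict(q_net.state_dict())

def ddqn_target(reward, next_state, gamma=0.99):
    """Double-DQN target, conditioning Q on the predicted opponent behavior."""
    with torch.no_grad():
        x = torch.cat([next_state, opponent_model(next_state)], dim=-1)
        best = q_net(x).argmax(dim=-1, keepdim=True)         # online net selects
        return reward + gamma * target_net(x).gather(-1, best).squeeze(-1)

s_next = torch.randn(2, state_dim)                           # toy batch of 2 transitions
print(ddqn_target(torch.tensor([1.0, 0.5]), s_next))
```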
Real-time strategy game online adversarial planning is a challenging problem in the field of multi-agent learning. In the process of game confrontation, facing an uncertain threat environment and non-stationary opponents, the agent needs to reason about the opponent's actions within a limited time according to the game situation, quickly formulate its own action plan, and perform adversarial planning in a huge state space and action space. The real-time strategy game platform is an ideal testbed for studying online adversarial planning problems. This paper first uses a typical real-time strategy game model to introduce the real-time strategy game confrontation problem, classifies it into three levels and two operation control methods, and sorts out the challenges faced across five sub-directions. Second, the current online adversarial planning methods are comprehensively reviewed and analyzed from the three perspectives of tactical adversarial planning, strategic adversarial planning, and mixed adversarial planning.
Electronic negotiation experiments provide a rich source of information about relationships between the negotiators, their individual actions, and the negotiation dynamics. This information can be effectively utilized by intelligent agents equipped with adaptive capabilities to learn from past negotiations and assist in selecting appropriate negotiation tactics. This paper presents an approach to modeling the negotiation process in a time-series fashion using an artificial neural network. In essence, the network uses information about past offers and the current proposed offer to simulate expected counter-offers. On the basis of the model’s prediction, “what-if” analysis of counter-offers can be performed with the purpose of optimizing the current offer. The neural network has been trained using the Levenberg-Marquardt algorithm with Bayesian regularization. The simulation of the predictive model on a testing set shows very good and highly significant performance. The findings suggest that machine learning techniques may find useful applications in the context of electronic negotiations. These techniques can be effectively incorporated into an intelligent agent that can sense the environment and assist negotiators by providing predictive information, and possibly automate some negotiation steps.
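A minimal sketch of the time-series idea described above, on synthetic data: a small feed-forward network maps recent offers and the current proposed offer to an expected counter-offer, and a "what-if" scan scores candidate offers. Training here uses a generic optimizer with weight decay as a stand-in for Levenberg-Marquardt with Bayesian regularization:

```python
# Neural counter-offer prediction and "what-if" screening of candidate offers.
import torch
import torch.nn as nn

rng = torch.Generator().manual_seed(0)
# Synthetic history: columns = [my previous offer, their previous offer, my current offer].
X = torch.rand(200, 3, generator=rng)
y = 0.5 * X[:, 1:2] + 0.3 * X[:, 2:3] + 0.05 * torch.randn(200, 1, generator=rng)

net = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
opt = torch.optim.Adam(net.parameters(), lr=0.01, weight_decay=1e-4)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y)   # fit expected counter-offers
    loss.backward()
    opt.step()

# What-if analysis: which of my candidate offers draws the best counter-offer?
history = torch.tensor([[0.7, 0.4]])                 # last offers (mine, theirs)
candidates = torch.linspace(0.3, 0.9, 7).unsqueeze(1)
inputs = torch.cat([history.repeat(len(candidates), 1), candidates], dim=1)
print(candidates.squeeze(), net(inputs).detach().squeeze())
```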