We introduce Re-determinizing IS-MCTS, a novel extension of Information Set Monte Carlo Tree Search (IS-MCTS) that prevents a leakage of hidden information into opponent models that can occur in IS-MCTS, and is particularly severe in Hanabi. When combined with a learned evaluation function to estimate leaf-node values and avoid full simulations during MCTS, this won the Hanabi competition at Computational Intelligence in Games (CIG) 2018.
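The determinization step underlying this family of methods can be illustrated with a small sketch: at each iteration the searching player samples a concrete assignment of its own hidden hand, consistent with every card it can see. The cut-down deck and hand sizes below are illustrative assumptions, not the competition entry's actual setup.

```python
import random

# Toy illustration of the (re-)determinization step. In Hanabi each player
# sees every hand except their own, so a determinization must assign the
# searching player a concrete hand consistent with all visible cards.
# The deck below is a cut-down stand-in for the real Hanabi deck.
DECK = [(colour, rank) for colour in "RGBY" for rank in (1, 2, 3)]

def sample_determinization(visible_cards, my_hand_size, rng):
    """Sample a concrete hidden hand consistent with what is visible:
    cards seen elsewhere cannot also be in my hand."""
    remaining = [card for card in DECK if card not in visible_cards]
    return rng.sample(remaining, my_hand_size)

# Re-determinizing IS-MCTS draws a FRESH sample like this at the start of
# every iteration, so statistics at opponent decision nodes never come to
# encode hidden information accumulated under one particular sample.
rng = random.Random(0)
visible = [("R", 1), ("G", 2)]
hands = [sample_determinization(visible, 2, rng) for _ in range(3)]
```

Resampling per iteration is the point: a single fixed determinization would let opponent nodes "know" the searching player's hidden cards.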
The adversarial character of real-time strategy (RTS) games is one of the main sources of uncertainty within this domain. Since players lack exact knowledge about their opponent's actions, they need a reasonable representation of alternative possibilities and their likelihood. In this article we propose a method of predicting the most probable combination of units produced by the opponent during a certain time period. We employ a logic programming paradigm called Answer Set Programming, since its semantics is well suited for reasoning with uncertainty and incomplete knowledge. In contrast with typical, purely probabilistic approaches, the presented method takes into account background knowledge about the game and considers only the combinations that are consistent with the game mechanics and with the player's partial observations. Experiments, conducted during different phases of StarCraft: Brood War and Warcraft III: The Frozen Throne games, show that the prediction accuracy for time intervals of 1-3 min is surprisingly high, making the method useful in practice. Root-mean-square error grows only slowly, almost linearly, with increasing prediction intervals.
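The core idea of restricting prediction to combinations consistent with the game mechanics and partial observations can be sketched without a full ASP solver, as a plain generate-and-test filter. The unit types, tech-tree constraints, and observation below are invented for illustration; the paper itself encodes such rules in Answer Set Programming.

```python
from itertools import product

# Hypothetical mechanics: a barracks supports marines, a factory supports
# tanks, and building a factory requires a barracks (toy tech-tree rules).
def consistent(model, observed):
    if model["factory"] > 0 and model["barracks"] == 0:   # tech tree
        return False
    if model["marines"] > 0 and model["barracks"] == 0:
        return False
    if model["tanks"] > 0 and model["factory"] == 0:
        return False
    # Partial observations act as lower bounds: we have already scouted
    # at least this many of each unit, so any valid model must match.
    return all(model[k] >= v for k, v in observed.items())

observed = {"marines": 2}   # we scouted at least 2 marines
keys = ("barracks", "factory", "marines", "tanks")
candidates = [dict(zip(keys, combo))
              for combo in product(range(2), range(2), range(4), range(3))]
models = [m for m in candidates if consistent(m, observed)]
```

An ASP program does the same thing declaratively and far more scalably; the filter above just makes the "only mechanics-consistent answer sets survive" behaviour concrete.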
Agent modelling is a critical aspect of many artificial intelligence systems. Many different techniques are used to learn the tendencies of another agent, though most suffer from a slow learning time. The research proposed in this paper examines stereotyping as a method to improve the learning time of poker-playing agents. Poker is a difficult domain for opponent modelling due to its hidden information, stochastic elements and complex strategies. However, the literature suggests there are clusters of similar poker strategies, making it an ideal environment to test the effectiveness of stereotyping. This paper presents a method for using stereotyping in a poker bot, and shows that stereotyping improves performance in early-match play in many scenarios.
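The mechanism behind stereotyping can be sketched simply: match a few early observations of a new opponent against stereotype centroids learned offline, then use the matched stereotype as a prior that real observations gradually wash out. The stereotype names, frequencies, and pseudo-count blending below are illustrative assumptions, not the paper's actual parameters.

```python
import math

# Hypothetical stereotype centroids: (fold, call, raise) frequencies
# learned offline from clusters of past opponents.
STEREOTYPES = {
    "tight-passive": (0.70, 0.25, 0.05),
    "loose-passive": (0.30, 0.60, 0.10),
    "aggressive":    (0.20, 0.30, 0.50),
}

def nearest_stereotype(observed_freqs):
    """Match early observations against stereotype centroids."""
    return min(STEREOTYPES,
               key=lambda name: math.dist(STEREOTYPES[name], observed_freqs))

def blended_model(observed_counts, weight=10):
    """Use the matched stereotype as a pseudo-count prior, so early
    estimates lean on the stereotype and later ones on real data."""
    total = sum(observed_counts)
    freqs = tuple(c / total for c in observed_counts)
    prior = STEREOTYPES[nearest_stereotype(freqs)]
    return tuple((c + weight * p) / (total + weight)
                 for c, p in zip(observed_counts, prior))

# After only 5 observed actions (1 fold, 1 call, 3 raises) the blend is
# already pulled toward the "aggressive" stereotype.
model = blended_model((1, 1, 3))
```

This is why stereotyping helps early-match play: the prior gives a usable model after a handful of hands, rather than hundreds.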
The main goal of agent modelling is to extract and represent knowledge about the behaviour of other agents. Nowadays, modelling an agent in multi-agent systems is becoming increasingly complex and significant. The robotic soccer domain is also an interesting environment where agent modelling can be applied. In this paper, we present an approach to classify and compare the behaviour of a multi-agent system using a coach in the RoboCup soccer simulation domain.
This thesis investigates decision-making in two-player imperfect information games against opponents whose actions can affect our rewards, and whose strategies may be based on memories of interaction, may be changing, or both. The focus is on modelling these dynamic opponents and using the models to learn high-reward strategies. The main contributions of this work are:
1. An approach to learning high-reward strategies in small simultaneous-move games against these opponents. A model of the opponent, learnt by sequence prediction, is combined with (possibly discounted) rewards learnt by reinforcement learning to look ahead using explicit tree search. Empirical results show that this gains higher average rewards per game than state-of-the-art reinforcement learning agents in three simultaneous-move games. They also show that several sequence prediction methods model these opponents effectively, supporting the idea of borrowing them from areas such as data compression and string matching.
2. An online expectation-maximisation algorithm that infers an agent's hidden information from its behaviour in imperfect information games.
3. An approach to learning high-reward strategies in medium-size sequential-move poker games against these opponents. A model of the opponent, learnt by sequence prediction and supplied with hidden information inferred by the online expectation-maximisation algorithm, is used to train a state-of-the-art no-regret learning algorithm by simulating games between the algorithm and the model. Empirical results show that this improves the no-regret learning algorithm's rewards when playing against popular and state-of-the-art algorithms in two simplified poker games.
4. A demonstration that several change detection methods can effectively model changing categorical distributions, with experimental results comparing their accuracies against empirical distributions. These results also show that their models can be used to outperform state-of-the-art reinforcement learning agents in two simultaneous-move games, supporting the idea of modelling changing opponent strategies with change detection methods.
5. Experimental results on the self-play convergence of the empirical distributions of play of sequence prediction and change detection methods to mixed-strategy Nash equilibria. The results show that they converge faster than fictitious play, and in more cases for change detection.
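Change detection over categorical play distributions, as used in contribution 4 above, can be sketched with a simple window-based detector: compare the opponent's recent empirical action distribution against the older history and restart the model when they drift apart. The window size, threshold, and total-variation test here are illustrative choices, not the thesis's specific methods.

```python
from collections import Counter, deque

class WindowedChangeDetector:
    """Track an opponent's action distribution; flag a change when the
    recent window's empirical distribution drifts far (in total variation
    distance) from the older history, then restart the estimate.
    Window size and threshold are illustrative, not from the thesis."""

    def __init__(self, actions, window=30, threshold=0.35):
        self.actions = actions
        self.window = window
        self.threshold = threshold
        self.recent = deque(maxlen=window)
        self.history = Counter()

    def _dist(self, counts, total):
        return {a: counts[a] / total for a in self.actions}

    def update(self, action):
        """Record one action; return True if a strategy change is detected."""
        self.recent.append(action)
        hist_total = sum(self.history.values())
        if hist_total >= self.window and len(self.recent) == self.window:
            old = self._dist(self.history, hist_total)
            new = self._dist(Counter(self.recent), self.window)
            tv = 0.5 * sum(abs(old[a] - new[a]) for a in self.actions)
            if tv > self.threshold:
                self.history.clear()          # forget the stale regime
                self.history.update(self.recent)
                self.recent.clear()
                return True
        self.history[action] += 1
        return False

# Opponent plays "rock" for 60 rounds, then switches to "paper".
det = WindowedChangeDetector(("rock", "paper", "scissors"))
changed_at = next((i for i, a in enumerate(["rock"] * 60 + ["paper"] * 60)
                   if det.update(a)), None)
```

Resetting the history on detection is what lets the model track the new regime instead of averaging over both.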
In this contribution we propose a class of strategies which focus on the game as well as on the opponent. Preference is given to the thoughts of the opponent, so that the strategy under investigation might be speculative. We describe a generalization of OM search, called (D,d)-OM search, where D stands for the depth of search by the player and d for the opponent's depth of search. A known difference in search depth can be exploited by purposely choosing a suboptimal variation with the aim of gaining a larger advantage than when playing the objectively best move. The difference in search depth may have the result that the opponent does not see the variation in sufficiently deep detail. We then give a pruning alternative for (D,d)-OM search, denoted by α-β² pruning. A best-case analysis shows that α-β² prunes very efficiently, comparable to the efficiency of α-β with regard to minimax. The effectiveness of the proposed strategy is confirmed by simulations using a game-tree model including an opponent model and by experiments in the domain of Othello.
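The recursive structure of (D,d)-OM search can be sketched compactly: search D plies with our own evaluation, but at opponent nodes follow the move that the opponent's own d-ply minimax (with the opponent's evaluation) would choose. This is an unpruned sketch under simplifying assumptions (the opponent runs a fresh d-ply search at each of its nodes); the paper's α-β² pruning is not shown.

```python
def minimax(state, depth, maximizing, evaluate, moves, apply_move):
    """Plain fixed-depth minimax, used here to model the opponent's own
    (possibly shallower) reasoning."""
    if depth == 0 or not moves(state):
        return evaluate(state)
    children = [minimax(apply_move(state, m), depth - 1, not maximizing,
                        evaluate, moves, apply_move) for m in moves(state)]
    return max(children) if maximizing else min(children)

def om_search(state, D, d, my_eval, opp_eval, moves, apply_move, my_turn=True):
    """(D,d)-OM search sketch: look ahead D plies with my evaluation, but at
    opponent nodes follow the move their own d-ply minimax (with THEIR
    evaluation) prefers. A shallower or differently informed opponent can
    thereby be steered into lines it cannot evaluate deeply enough."""
    if D == 0 or not moves(state):
        return my_eval(state)
    if my_turn:
        return max(om_search(apply_move(state, m), D - 1, d, my_eval, opp_eval,
                             moves, apply_move, my_turn=False)
                   for m in moves(state))
    predicted = max(moves(state),
                    key=lambda m: minimax(apply_move(state, m), d - 1, False,
                                          opp_eval, moves, apply_move))
    return om_search(apply_move(state, predicted), D - 1, d, my_eval, opp_eval,
                     moves, apply_move, my_turn=True)
```

When the opponent model is exact (same evaluation, sufficient depth), this collapses to ordinary minimax; the gain appears precisely when the models diverge.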
Opponent modelling is one of the fundamental and challenging research areas in Real-Time Strategy (RTS) games. Most current approaches to opponent modelling are inefficient: they are either computationally expensive or require a large number of online gameplays before they start to learn successful models. Unfortunately, the most successful approaches are also game-specific, depending mainly on expert knowledge of the game. In this paper, a generic and adaptive opponent modelling approach for RTS games is proposed. It is a completely automated approach for learning highly informative features of an opponent's behaviour in any RTS game. Inspired by the case-based reasoning technique, a case base of different opponent models is constructed in the approach's offline phase. The online phase (during gameplay) uses only this model base for opponent classification. To better cope with opponents that switch strategies, the approach keeps track of performance after classification. To show how the proposed approach is beneficial, a case study using the SPRING game is presented.
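The offline/online split described above can be sketched as nearest-case classification over behaviour features, with a performance check that triggers re-classification when the opponent appears to have switched strategies. The feature names, case base, and win-rate floor below are invented for illustration, not the paper's actual features or thresholds.

```python
import math

# Hypothetical offline case base: feature vectors of opponent behaviour
# (e.g. early aggression, expansion rate, air-unit ratio) -> model label.
CASE_BASE = [
    ((0.9, 0.1, 0.0), "rush"),
    ((0.2, 0.8, 0.1), "economic"),
    ((0.3, 0.4, 0.7), "air-tech"),
]

def classify(features):
    """Online phase: label the opponent by the nearest case in the base."""
    return min(CASE_BASE, key=lambda case: math.dist(case[0], features))[1]

def reclassify_if_slipping(features, current, win_rates, floor=0.45):
    """Track post-classification performance; if our counter-strategy's
    recent win rate drops below a floor, assume the opponent switched
    strategies and classify again from fresh observations."""
    if win_rates and sum(win_rates) / len(win_rates) < floor:
        return classify(features)
    return current
```

Keeping the online phase down to a nearest-neighbour lookup is what makes the approach cheap enough to run during gameplay.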
The Benefits of Opponent Models in Negotiation. Hindriks, Koen; Jonker, Catholijn M.; Tykhonov, Dmytro. 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, 2009, Volume 2. Conference Proceeding.
Information about the opponent is essential to improve automated negotiation strategies for bilateral multi-issue negotiation. In this paper we propose a negotiation strategy that exploits a technique to learn a model of opponent preferences within a single negotiation session. An opponent model may be used to achieve at least two important goals in negotiation. First, it can be used to recognize, avoid and respond appropriately to exploitation, which differentiates the proposed strategy from commonly used concession-based strategies. Second, it can be used to increase the efficiency of a negotiated agreement by searching for Pareto-optimal bids. A negotiation strategy should be efficient, transparent, maximize the chance of an agreement and avoid exploitation. We argue that the proposed strategy satisfies these criteria and analyze its performance experimentally.
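The second use of an opponent model, searching for Pareto-efficient bids, can be sketched with linear additive utilities over discrete issues: among bids above our own utility target, offer the one the modelled opponent values most. The issues, weights, and opponent model below are illustrative stand-ins, not the paper's learning technique.

```python
from itertools import product

# Toy bilateral multi-issue negotiation with linear additive utilities.
# The opponent weights stand in for a preference model learned during the
# session; all names and numbers are illustrative assumptions.
ISSUES = {"price": (0, 1, 2), "delivery": (0, 1, 2)}   # discrete options
MY_W = {"price": 0.8, "delivery": 0.2}
OPP_W = {"price": 0.3, "delivery": 0.7}

def util(bid, weights):
    """My utility: a higher option index on each issue is better for me."""
    return sum(weights[i] * bid[i] / 2 for i in ISSUES)

def opp_util(bid):
    """Modelled opponent utility: they prefer the opposite end of each issue."""
    return sum(OPP_W[i] * (2 - bid[i]) / 2 for i in ISSUES)

def next_bid(target):
    """Among bids worth at least `target` to me, offer the one the modelled
    opponent likes best: this walks toward the Pareto frontier instead of
    conceding blindly on every issue at once."""
    bids = [dict(zip(ISSUES, combo)) for combo in product(*ISSUES.values())]
    acceptable = [b for b in bids if util(b, MY_W) >= target]
    return max(acceptable, key=opp_util)
```

Because the two sides weight the issues differently, the selected bid concedes on the issue the opponent cares about most while holding the issue we care about most, which is exactly how an opponent model improves efficiency over pure concession.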
Many games require opponent modelling for optimal performance. The implicit learning and adaptive nature of evolutionary computation techniques offer a natural way to develop and explore models of an opponent's strategy without significant overhead. In this paper, we propose the use of genetic programming to play the game of Spoof, a simple guessing game of imperfect information. We discuss the technical details needed to equip a computer to play the game and report on experiments using this approach that demonstrate emergent adaptive behaviour. We further show that specialisation via adaptation is crucial to maximise winnings and that no general strategy will suffice against all opponents.
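The genetic-programming setup can be sketched minimally: evolve expression trees that map a player's private information to a guess, with fitness measured by play against a fixed opponent. The function set, game parameters, and mutation-only evolution below are simplifying assumptions for illustration, not the paper's actual configuration.

```python
import random

# Minimal genetic-programming sketch in the spirit of evolving a Spoof-style
# guesser: programs are expression trees over the player's own hidden coins
# ("c") and the number of players ("n"); fitness counts correct guesses of
# the true coin total against a random opponent. All details here are
# illustrative assumptions.
FUNCS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}
TERMS = ["c", "n", 1, 2, 3]

def random_tree(rng, depth=2):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMS)
    op = rng.choice(sorted(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, c, n):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, c, n), evaluate(right, c, n))
    return {"c": c, "n": n}.get(tree, tree)  # terminals: variables or constants

def fitness(tree, rng, games=200):
    """Play two-player rounds; score 1 for each correct total guess."""
    wins = 0
    for _ in range(games):
        mine, theirs = rng.randint(0, 3), rng.randint(0, 3)
        wins += evaluate(tree, mine, 2) == mine + theirs
    return wins

def mutate(tree, rng):
    """Replace a random subtree with a fresh random one."""
    if not isinstance(tree, tuple) or rng.random() < 0.5:
        return random_tree(rng)
    return (tree[0], mutate(tree[1], rng), tree[2])

def evolve(rng, pop_size=30, gens=15, elite=10):
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda t: -fitness(t, rng))
        pop = pop[:elite] + [mutate(rng.choice(pop[:elite]), rng)
                             for _ in range(pop_size - elite)]
    return pop[0]
```

Selection quickly favours trees that add a plausible opponent estimate to the player's own hidden coins, a small-scale analogue of the emergent adaptive behaviour the paper reports.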