A Survey of Monte Carlo Tree Search Methods
Browne, C. B.; Powley, E.; Whitehouse, D.; et al.
IEEE Transactions on Computational Intelligence and AI in Games, March 2012, Volume 4, Issue 1
Journal Article · Open access
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
Monte Carlo algorithms are one of the three main reinforcement learning paradigms that are capable of efficiently solving control and decision problems in dynamic environments. Through sampling, they shape the values of states in the search space. Based on these values, they develop an exploration policy that is in turn used to guide the future direction of sampling. Studies confirm the convergence of this interleaving iterative approach to an optimal solution; however, when a learning agent lacks prior knowledge of the problem domain, the convergence rate may be extremely slow if an erroneous starting policy causes far-from-optimal value estimates. In this paper we present a brief overview of Monte Carlo control algorithms in the scope of reinforcement learning and propose a method to improve the convergence by gradually forgetting early estimates. Our method keeps track of the state values with a moving average that gives a higher weight to recent rewards and discounts the weight of previous rewards, while assuming that the policy is improving over time. We apply it to the general on-policy Monte Carlo control algorithm and to the popular upper confidence bounds for trees algorithm in the Monte Carlo tree search framework. The evaluation on several decision problems confirms that our method regularly improves the convergence rate of both algorithms and, in some cases, also their final policy.
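The recency-weighted moving average described in this abstract can be sketched as follows. A minimal illustration, assuming the standard contrast between a plain sample average (all returns weighted equally) and a constant-step-size update (early returns gradually forgotten); the function names and the alpha value are illustrative, not taken from the paper.

```python
# Two incremental value updates for Monte Carlo state-value estimates.
# update_sample_average weights every observed return equally; the
# recency-weighted variant uses a constant step size so that returns
# gathered under the older (presumably worse) policy fade away.

def update_sample_average(value: float, reward: float, n: int) -> float:
    """Classic Monte Carlo estimate after the n-th return: mean of all returns."""
    return value + (reward - value) / n

def update_recency_weighted(value: float, reward: float, alpha: float = 0.1) -> float:
    """Constant step size alpha (> 1/n for large n) gives higher weight to
    recent returns, gradually discounting early, possibly misleading estimates."""
    return value + alpha * (reward - value)
```

With the sample average, an early run of bad returns keeps pulling the estimate long after the policy has improved; the constant-alpha update bounds the influence of any single old return to a geometrically decaying weight.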
Monte Carlo Tree Search (MCTS) is a decision-making technique that has received considerable interest in the past decade due to its success in a number of domains. In this paper, we explore its application in the “Diplomacy” multi-agent strategic board game, by putting forward and evaluating eight (8) variants of MCTS Diplomacy agents. At the core of our MCTS agents lies the well-known Upper Confidence Bounds for Trees (UCT) bandit method, which attempts to strike a balance between exploration and exploitation during search-tree creation. Moreover, we devised a heuristic weighting system for prioritizing the tree nodes’ actions, and used it to effectively incorporate high-quality domain knowledge into some of our agents. We provide a thorough experimental evaluation of our approach, in which we systematically compare the performance of our agents against each other and against other opponents, including the state-of-the-art Diplomacy agent, D-Brane. Our results verify that several of our agents are highly competitive in this domain, exhibiting performance comparable to, and in some instances superior to, that of D-Brane. Interestingly, the MCTS approach consistently outperforms all others in tournaments in which one MCTS agent faces one D-Brane agent and several other opponents.
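The exploration–exploitation balance that UCT strikes is the standard UCB1 node-selection rule, which can be sketched as follows. A minimal illustration of the generic formula only; the function name and exploration constant are illustrative and nothing here reflects the paper's Diplomacy-specific heuristic weighting.

```python
import math

def uct_score(total_value: float, child_visits: int,
              parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score for a child node during tree descent.

    The first term (mean return) rewards exploitation of good actions;
    the second term grows for rarely visited children, forcing exploration.
    Unvisited children get infinite score so each is tried at least once.
    """
    if child_visits == 0:
        return float("inf")
    exploit = total_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

During each simulation the search descends from the root by repeatedly picking the child with the highest score, so under-sampled actions are revisited until their upper confidence bound drops below that of the empirically best action.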
Monte Carlo tree search (MCTS) has recently been very successful for game playing, particularly for games where the evaluation of a state is difficult to compute, such as Go or General Games. We compare nested Monte Carlo (NMC) search, upper confidence bounds for trees (UCT), UCT with transposition tables (UCT+T), and a simple combination of NMC and UCT+T (MAX) on single-player games from past General Game Playing (GGP) competitions. We show that transposition tables improve UCT and that MAX is the best of these four algorithms. Using UCT+T, the program Ary won the 2009 GGP competition. MAX and NMC are slight improvements over this 2009 version.
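The idea behind UCT+T can be sketched as follows: node statistics are keyed by a canonical state identifier rather than by the path that reached the node, so transposed paths leading to the same position share visit counts and value estimates. This is a generic illustration under that assumption; the class and method names are hypothetical, not the Ary implementation.

```python
from collections import defaultdict

class TranspositionTable:
    """Shared UCT statistics keyed by a hashable canonical state key.

    Two different move sequences that transpose into the same state
    update and read the same entry, pooling their simulation results.
    """

    def __init__(self) -> None:
        self.visits = defaultdict(int)    # state_key -> visit count
        self.value = defaultdict(float)   # state_key -> running mean return

    def update(self, state_key, reward: float) -> None:
        """Record one simulation result, maintaining an incremental mean."""
        self.visits[state_key] += 1
        n = self.visits[state_key]
        self.value[state_key] += (reward - self.value[state_key]) / n

    def mean(self, state_key) -> float:
        """Mean return of all simulations through this state (0.0 if unseen)."""
        return self.value[state_key] if self.visits[state_key] else 0.0
```

Pooling statistics across transpositions effectively multiplies the sample count per state, which is one plausible reading of why the paper finds transposition tables improve plain UCT.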
Symbolic Execution is a widely used technique for program testing and analysis. When a program executes a trace symbolically, it simulates all possible paths. This results in an exponential growth of the number of states within the problem, which is commonly referred to as "path explosion." We therefore propose novel strategies that only require limited resources to give priority to more valuable paths. In this paper, we utilize a method based on the Monte Carlo tree search (MCTS) strategy to resolve the problem. We then compare the proposed MCTS-based strategy with other methods such as depth-first search (DFS) and breadth-first search (BFS). We also perform experiments at different scales based on time and space resource constraints. For smaller test cases, we found that MCTS performs on average 1.4 times better than BFS and DFS in terms of the block discovery rate. In addition, for larger test cases, MCTS performs on average 2.8 times better than DFS and BFS in terms of the block discovery rate.
A Problem Case for UCT
Browne, C.
IEEE Transactions on Computational Intelligence and AI in Games, March 2013, Volume 5, Issue 1
Journal Article
This paper examines a simple 5 × 5 Hex position that not only completely defeats flat Monte Carlo search, but also initially defeats plain upper confidence bounds for trees (UCT) search until an excessive number of iterations is performed. The inclusion of domain knowledge during playouts significantly improves UCT performance, but a slight negative effect is shown for the rapid action value estimate (RAVE) heuristic under some circumstances. This example was drawn from an actual game during standard play, and highlights the dangers of relying on flat Monte Carlo and unenhanced UCT search even for rough estimates. A brief comparison is made with RAVE failure in Go.