Teaching computer programs to play games through machine learning has been an important way to achieve better artificial intelligence (AI) in a variety of real-world applications. Monte Carlo Tree Search (MCTS) is one of the key AI techniques developed recently that enabled AlphaGo to defeat a legendary professional Go player. What makes MCTS particularly attractive is that it requires only the basic rules of the game and does not rely on expert-level knowledge. Researchers thus expect that MCTS can be applied to other complex AI problems where domain-specific expert-level knowledge is not yet available. So far there are very few analytic studies in the literature. In this paper, our goal is to develop analytic studies of MCTS to build a more fundamental understanding of the algorithms and their applicability in complex AI problems. We start with a simple version of MCTS, called random playout search (RPS), to play Tic-Tac-Toe, and find that RPS may fail to discover the correct moves even in a very simple game position of Tic-Tac-Toe. Both probability analysis and simulation confirm this finding. We continue our studies with the full version of MCTS to play Gomoku and find that, while MCTS has shown great success in playing more sophisticated games like Go, it is not effective at addressing the problem of sudden death/win. The main reason that MCTS often fails to detect sudden death/win lies in its reliance on random playouts, which leads to prediction distortion. Therefore, although MCTS in theory converges to the optimal minimax search, under real-world computational resource constraints MCTS has to rely on RPS as an important step in its search process, and therefore suffers from the same fundamental prediction distortion problem as RPS does. By examining the detailed statistics of the scores in MCTS, we investigate a variety of scenarios where MCTS fails to detect sudden death/win.
Finally, we propose an improved MCTS algorithm that incorporates minimax search to overcome prediction distortion. Our simulations confirm the effectiveness of the proposed algorithm, and we provide an estimate of the additional computational cost.
How to cite this paper: Li, W. (2018)
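The RPS idea described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions of my own (scoring a move by its raw playout win rate with a fixed playout budget), not the paper's implementation:

```python
import random

# The eight winning lines of a 3x3 board, indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_playout(board, to_move):
    """Play uniformly random moves to the end; return the winner or None (draw)."""
    board = board[:]
    while True:
        w = winner(board)
        if w is not None:
            return w
        empties = [i for i, v in enumerate(board) if v is None]
        if not empties:
            return None
        board[random.choice(empties)] = to_move
        to_move = 'O' if to_move == 'X' else 'X'

def rps_move(board, player, n_playouts=200):
    """Score each legal move by the win rate of random playouts after it."""
    other = 'O' if player == 'X' else 'X'
    best, best_rate = None, -1.0
    for m in [i for i, v in enumerate(board) if v is None]:
        child = board[:]
        child[m] = player
        wins = sum(random_playout(child, other) == player
                   for _ in range(n_playouts))
        rate = wins / n_playouts
        if rate > best_rate:
            best, best_rate = m, rate
    return best
```

Because a move is scored only by the average outcome of random continuations, a forced reply that the opponent would certainly play gets averaged away — precisely the prediction distortion the paper analyzes.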


Artificial intelligence in games serves as an excellent platform for facilitating collaborative research with undergraduates. This paper explores several aspects of a research challenge proposed for a newly-developed variant of a solitaire game. We present multiple classes of game states that can be identified as solvable or unsolvable. We present a heuristic for quickly finding goal states in a game state search tree. Finally, we introduce a Monte Carlo Tree Search-based player for the solitaire variant that can win almost any solvable starting deal efficiently.

However, this problem is intrinsically difficult because it is hard to encode what to say into a sentence while ensuring its syntactic correctness. We propose to use Monte Carlo tree search (MCTS) (Kocsis and Szepesvari, 2006; Browne et al., 2012), a stochastic search algorithm for decision processes, to find an optimal solution in the decision space. We build a search tree of possible syntactic trees to generate a sentence, selecting proper rules through numerous random simulations of possible yields.

The central theme of this paper is the use of multiple determinized trees as a means of dealing with imperfect information in an MCTS search, and we have shown that this approach provides significant benefits in playing strength, becoming competitive with a sophisticated expert rules player with a simulation budget of less than one CPU second on standard hardware, despite having no access to expert knowledge. In addition, we have presented a wide variety of enhancements to the determinized trees and analysed the effect on playing strength that each enhancement offers. All of these enhancements show further improvement. We investigated a modification of the structure of the decision tree to a binary tree, well suited to M:TG where decisions amount to the choice of a subset of cards from a small set, rather than an individual card. As well as providing significant improvements in playing strength, the binary tree representation substantially reduced CPU time per move. Dominated move pruning used limited domain knowledge, of a type applicable to a wide variety of games involving subset choice, to significantly reduce the branching factor within the tree. Another promising approach maintained pressure on the Monte Carlo Tree Search algorithm by choosing “interesting” determinizations which were balanced between the two players. An enhancement which used decaying reward to encourage delaying moves when behind had some positive effect, but was not as effective as the preceding three enhancements.
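The subset-choice restructuring described above can be illustrated with a toy sketch (my own example, not the authors' M:TG implementation): choosing a subset of n items becomes a chain of n binary include/exclude decisions, so each tree node branches two ways instead of up to 2^n ways:

```python
def binary_subset_tree(items):
    """Enumerate subsets as a sequence of binary include/exclude decisions.
    Each level of the recursion corresponds to one binary tree node, so the
    branching factor is 2 at every depth instead of 2**len(items) at once."""
    def rec(i, chosen):
        if i == len(items):
            yield tuple(chosen)
            return
        yield from rec(i + 1, chosen)               # exclude items[i]
        yield from rec(i + 1, chosen + [items[i]])  # include items[i]
    return list(rec(0, []))
```

In an MCTS setting the search would not enumerate all leaves as this sketch does; it would descend one include/exclude decision at a time, which is what reduces the per-node branching factor.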


In recent years there has been much interest in the Monte Carlo Tree Search (MCTS) algorithm. In 2006 it was a new, adaptive, randomized optimization algorithm [Cou06, KS06]. In fields as diverse as Artificial Intelligence, Operations Research, and High Energy Physics, research has established that MCTS can find valuable approximate answers without domain-dependent heuristics [KPVvdH13]. The strength of the MCTS algorithm is that it provides answers with a random amount of error for any fixed computational budget [GBC16]. Much effort has been put into the development of parallel algorithms for MCTS to reduce the running time. These efforts span a broad spectrum of parallel systems, ranging from small shared-memory multi-core machines to large distributed-memory clusters. In recent years, parallel MCTS played a major role in the success of AI by defeating humans in the game of Go [SHM+16, HS17].


Abstract— We address the course timetabling problem in this work. In a university, students can select their favorite courses each semester. Thus, the general requirement is to allow them to attend lectures without clashes with other lectures. A feasible solution is one in which this and other conditions are satisfied. Constructing reasonable solutions for the course timetabling problem is a hard task, and most existing methods fail to generate reasonable solutions for all cases. This is because the problem is heavily constrained, and an effective method is required to explore and exploit the search space. We utilize Monte Carlo Tree Search (MCTS) to find feasible solutions for the first time. In MCTS, we build a tree incrementally in an asymmetric manner by sampling the decision space, and the tree is traversed in a best-first manner. We propose several enhancements to MCTS, such as simulation and tree pruning based on a heuristic. The performance of MCTS is compared with methods based on graph coloring heuristics and Tabu search. We test the solution methodologies on the three most-studied publicly available datasets. Overall, MCTS performs better than the method based on a graph coloring heuristic; however, it is inferior to the Tabu-based method. Experimental results are discussed.

Abstract—Monte-Carlo Tree Search (MCTS) is a best-first search in which pseudorandom simulations guide the solution of a problem. Recent improvements to MCTS have produced strong computer Go programs; Go has a large search space, and selecting the best move in it remains a hot topic. So far, most reports on MCTS have concerned two-player games, and MCTS has rarely been applied to one-player games. MCTS does not need an admissible heuristic, so its application to one-player games is an interesting alternative. Additionally, one-player games whose situation changes through the player's decisions, such as puzzles, can be described as network diagrams such as PERT, representing the interdependences between operations. Therefore, if MCTS for one-player games is developed as a metaheuristic algorithm, it could be used not only for many practical problems but also for combinatorial optimization problems. This paper investigates the application of Single-Player MCTS (SP-MCTS), introduced by Schadd et al., to a puzzle game called Bubble Breaker. It then shows the effectiveness of new simulation strategies for SP-MCTS through numerical experiments, and identifies the differences between the search methods and their parameters. Based on the results, the paper discusses the application potential of SP-MCTS for a practical scheduling problem.

This chapter presents the historical motivation for our involvement in the topic of hierarchical bandits. It starts with an experimental success: UCB-based bandits (see the previous chapter) used in a hierarchy demonstrated impressive performance for tree search in the field of Computer Go, such as in the Go programs CrazyStone [Coulom, 2006] and MoGo [Wang and Gelly, 2007, Gelly et al., 2006]. This impacted the field of Monte-Carlo Tree Search (MCTS) [Chaslot, 2010, Browne et al., 2012], which provided a simulation-based approach to game programming and has also been used in other sequential decision-making problems. However, the analysis of the popular UCT (Upper Confidence Bounds applied to Trees) algorithm [Kocsis and Szepesvári, 2006] has been a theoretical failure: the algorithm may perform very poorly (much worse than a uniform search) on toy problems and does not possess nice finite-time performance guarantees (see [Coquelin and Munos, 2007]).
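For reference, the UCB1 rule that such hierarchical bandits apply at every node can be written in a few lines. This is a generic sketch; the exploration constant c = √2 and the use of the summed child visits as the parent count are my own choices:

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 value: mean reward plus an exploration bonus that shrinks
    as the arm accumulates visits."""
    if visits == 0:
        return float('inf')   # unvisited arms are always tried first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select(children):
    """children: list of (total_reward, visits) pairs for one node's arms.
    Returns the index of the arm maximizing the UCB1 score."""
    n = sum(v for _, v in children)
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1], n))
```

Applied in a hierarchy (UCT), the rarely-tried arm's logarithmic bonus eventually dominates its low empirical mean, which is exactly the mechanism behind the poor worst-case behaviour discussed above: the bonus can take very long to overturn an initially misleading mean.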


In order to address problems with larger search spaces, we must turn to alternative methods. Monte Carlo tree search (MCTS) has had a lot of success in Go and in other applications [2][1]. MCTS eschews the typical brute-force tree-searching methods and utilizes statistical sampling instead. This makes MCTS a probabilistic algorithm: it will not always choose the best action, but it still performs reasonably well given sufficient time and memory. MCTS performs lightweight simulations that select actions at random. These simulations are used to selectively grow a game tree over a large number of iterations. Since these simulations are fast, MCTS can explore search spaces quickly, which gives it an advantage over deterministic methods in large search spaces.

While common αβ-algorithms generally do not give satisfactory results, there have been several attempts based on Monte-Carlo Tree Search (MCTS) (Coulom, 2007; Kocsis and Szepesvári, 2006). The MCTS algorithm performs random simulations of the game in order to determine which move to play. It consists of four steps that are repeated until no time is left (Chaslot et al., 2008). The first step is selection, in which a node is selected that is not yet part of the MCTS tree. The node is then added to the tree in the expansion step. The third step is simulation (also called play-out), in which a game is simulated from that node until the end by playing random moves. The last step is backpropagation, where the result of the simulated game is propagated back up to the root of the MCTS tree. MCTS is discussed in detail in Chapter 2.
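The four steps can be sketched as one compact UCT loop. The toy game below (a Nim variant: take one or two stones, and whoever takes the last stone wins) and all constants are illustrative choices of my own, not the thesis' engine:

```python
import math, random

class Node:
    """One tree node: `player` is the player to move with `state` stones left."""
    def __init__(self, state, player, parent=None, move=None):
        self.state, self.player = state, player
        self.parent, self.move = parent, move
        self.children = []
        self.untried = [m for m in (1, 2) if m <= state]
        self.wins, self.visits = 0.0, 0

def uct_select(node, c=1.4):
    # Child statistics are stored from the viewpoint of the player moving at `node`.
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(stones, player, iterations=3000):
    """UCT for the toy Nim game; returns the most-visited root move."""
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: attach one previously untried move as a new child.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.state - m, 1 - node.player, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation (play-out): finish the game with uniformly random moves.
        state, to_move = node.state, node.player
        last_mover = 1 - node.player          # the player who moved into `node`
        while state > 0:
            m = random.choice([x for x in (1, 2) if x <= state])
            state -= m
            last_mover, to_move = to_move, 1 - to_move
        # 4. Backpropagation: credit each node from the viewpoint of the
        #    player whose move led into it.
        while node is not None:
            node.visits += 1
            if last_mover == 1 - node.player:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

From a pile of 4 the winning strategy is to take one stone, leaving the opponent a losing pile of 3; the UCT loop concentrates its visits on that move.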


remaining computations. Accordingly, for the game of Go there exists a large number of publications about the design of such policies, e.g., [43][27][28]. One objective playout designers pursue is balancing simulations to prevent biased evaluations [43][93][51]. Simulation balancing aims to ensure that the policy generates moves of equal quality for both players in any situation. Hence, adding domain knowledge to the playout policy for attacking moves also necessitates adding domain knowledge for the corresponding defense moves. One of the greatest early improvements in Monte-Carlo Go was sequence-like playout policies [43], which concentrate strongly on local answer moves. They lead to a very selective search. Further concentration on local attack and defense moves improved the handling of some tactical fights and thereby contributed to additional strength gains of MCTS programs. However, by adding more and more specific domain knowledge, with the result of increasingly selective playouts, we open the door to more imbalance. This in turn allows for severe false estimates of position values. Accordingly, the correct evaluation of, e.g., semeai is still considered extremely challenging for MCTS-based Go programs [81]. This holds true especially when they require long sequences of correct play by either player. In order to face this issue, we search for a way to make MCTS aware of probably biased evaluations due to the existence of semeai or groups with uncertain status. In this chapter we present our results on the analysis of score histograms to infer information about the presence of groups with uncertain status. We heuristically locate fights on the Go board and estimate their corresponding relevance for winning the game. The developed heuristic is not yet used by the MCTS search; accordingly, we cannot definitively specify and empirically prove the benefit of the proposed heuristic in terms of playing strength.
We further conducted experiments with our MCTS Computer Go engine Gomorra on a number of 9 × 9 game positions that are known to be difficult to handle by state-of-the-art Go programs. All these positions include two ongoing capturing fights that were successfully recognized and localized by Gomorra using the method presented in the remainder of this chapter.


These measurements, averaged across all 5000 deals, are presented in Table II. It should be noted that these measurements are a function of the deal; the first measurement is exact for each deal, while the second depends on the sampled determinizations. These measurements were made only for the non-Landlord players, since the playing-strength experiments in Section VII-B were conducted from the point of view of the Landlord player. This means the algorithms tested always had the same number of branches at nodes where the Landlord makes a move, since the Landlord can see his cards in hand. The first measurement is an indicator of the number of branches that may be expected at opponent nodes for the cheating UCT player as the Landlord. Similarly, the second measurement indicates the number of branches at opponent nodes with determinized UCT as the Landlord. Both of these measurements are upper bounds, since if an opponent has played any cards at all, the number of leading plays will be smaller. The third, fourth, and fifth measurements indicate how many expansions ISMCTS will make at opponent nodes after a certain number of visits, since a new determinization is used on each iteration. Again, this measurement is an upper bound, since only one move is actually added per iteration, and if there were moves unique to a determinization that were never seen again, only one of them would be added to the tree.



The idea of updating a tree by adding leaves dates back to at least Felsenstein (1981), in which he describes, for maximum likelihood estimation, that an effective search strategy in tree space is to add species one by one. More recent work also makes use of the idea of adding sequences one at a time: ARGWeaver (Rasmussen et al. 2014) uses this approach to initialise MCMC on t + 1 sequences (in this case, on a space of graphs) using the output of MCMC on t sequences, and TreeMix (Pickrell and Pritchard 2012) uses a similar idea in a greedy algorithm. In work conducted simultaneously to our own, Dinh et al. (2018) also propose a sequential Monte Carlo approach to inferring phylogenies in which the sequence of distributions is given by introducing sequences one by one. However, their approach uses different proposal distributions for new sequences; does not infer the mutation rate simultaneously with the tree; does not exploit intermediate distributions to reduce the variance; and does not use adaptive MCMC moves. Further investigation of their approach can be found in Fourment et al. (2018), where different guided proposal distributions are explored but the aforementioned limitations remain.


Samples are not picked directly from the transition kernel P; they are instead sampled from the proposal distribution q(θ′|θ), with a portion of the drawn samples dismissed. Evaluating P directly is usually not feasible, since most of the time it is intractable. It is worth noting that the choice of the proposal distribution is essential, to the extent that the statistical features of the Markov chain strongly depend on this decision: an inadequate choice results in potentially poor performance of the Monte Carlo estimators. Specifically, the efficiency of the MH algorithm depends on the quality of the selected standard deviation of the proposal distribution. If it is excessively narrow, the algorithm will generate samples that are closely correlated and mix slowly. If, on the other hand, it is excessively wide, the rejection rate can be remarkably high, resulting in the repetition of old parameter values across iterations. Generally, samples created by MCMC should mix well enough that they effectively forget their previous values; in particular, it is necessary to choose a sensible proposal variance to guarantee good mixing. It is simple to demonstrate that the samples produced by the MH algorithm converge asymptotically to those drawn from the target distribution, as shown by Andrieu et al. (2003).
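A minimal random-walk Metropolis-Hastings sketch makes the role of the proposal standard deviation concrete. This is a generic illustration under my own assumptions (a symmetric Gaussian proposal, a log-density target), not tied to any particular model:

```python
import math, random

def metropolis_hastings(log_target, x0, proposal_sd, n_samples, rng=random):
    """Random-walk Metropolis: a Gaussian proposal whose standard deviation
    controls mixing.  Because the proposal is symmetric, the acceptance
    ratio reduces to the ratio of target densities."""
    x, samples, accepted = x0, [], 0
    log_p = log_target(x)
    for _ in range(n_samples):
        cand = x + rng.gauss(0.0, proposal_sd)
        log_p_cand = log_target(cand)
        # Accept with probability min(1, p(cand)/p(x)), in log space.
        if math.log(rng.random()) < log_p_cand - log_p:
            x, log_p = cand, log_p_cand
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

Sampling a standard normal (log_target = lambda x: -0.5 * x * x) with proposal_sd around 2.4 gives a moderate acceptance rate; a much smaller sd yields near-certain acceptance but slow, highly correlated movement, while a much larger sd yields frequent rejections and repeated values — the two failure modes described above.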


Multiple hypothesis testing is widely used to evaluate scientific studies involving statistical tests. However, for many of these tests, p-values are not available and are thus often approximated using Monte Carlo tests such as permutation tests or bootstrap tests. This article presents a simple algorithm based on Thompson Sampling to test multiple hypotheses. It works with arbitrary multiple testing procedures, in particular with step-up and step-down procedures. Its main feature is to sequentially allocate Monte Carlo effort, generating more Monte Carlo samples for tests whose decisions are so far less certain. A simulation study demonstrates that for a low computational effort, the new approach yields a higher power and a higher degree of reproducibility of its results than previously suggested methods.
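The allocation idea can be illustrated with a deliberately simplified sketch (my own toy construction, not the article's algorithm): keep a Beta posterior over each unknown p-value and spend the next Monte Carlo sample on the test whose Thompson draw lies closest to the rejection threshold, i.e. whose reject/accept decision is currently least certain:

```python
import random

def allocate_mc_effort(exceed_prob, n_rounds, alpha=0.05, rng=random):
    """Toy sketch: K hypotheses, where each Monte Carlo sample for test k is an
    indicator that a permuted statistic exceeds the observed one (true
    probability exceed_prob[k], unknown to the algorithm).  Beta(1,1)
    posteriors track the unknown p-values; each round, one Thompson draw per
    test decides which test receives the next sample."""
    K = len(exceed_prob)
    a = [1] * K
    b = [1] * K
    for _ in range(n_rounds):
        draws = [rng.betavariate(a[k], b[k]) for k in range(K)]
        # Spend effort where the sampled p-value is closest to the threshold.
        k = min(range(K), key=lambda i: abs(draws[i] - alpha))
        hit = rng.random() < exceed_prob[k]   # one more Monte Carlo sample
        a[k] += hit
        b[k] += 1 - hit
    counts = [a[k] + b[k] - 2 for k in range(K)]      # samples spent per test
    est = [a[k] / (a[k] + b[k]) for k in range(K)]    # posterior-mean p-values
    return counts, est
```

Tests with p-values far from the threshold (clearly significant or clearly not) are quickly starved of samples, while borderline tests keep receiving effort — the qualitative behaviour the article exploits.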


Moreover, the introduction of the Hamiltonian flow elevates the integration time to a fundamental parameter, with the integrator step size accompanying the use of an approximate flow. A common error in empirical optimizations is to reparameterize the integration time and step size as the number of integrator steps, which can obscure the optimal settings. For example, Wang, Mohamed and de Freitas (2013) use Bayesian optimization methods to derive an adaptive Hamiltonian Monte Carlo implementation, but they optimize the integrator step size and the number of integrator steps over only a narrow range of values. This leads not only to a narrow range of short integration times, which limits the efficacy of the Hamiltonian flow, but also to a step-size-dependent range of integration times, which skews the optimization values. Empirical optimizations of Hamiltonian Monte Carlo are most productive when studying the fundamental parameters directly.
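The parameterization point is easy to make concrete: a leapfrog integrator should be tuned through the integration time T, with the number of steps derived as L ≈ T/ε, rather than through L directly. A minimal sketch (the function names and the harmonic-oscillator example below are my own):

```python
def leapfrog(grad_U, q, p, step_size, int_time):
    """Leapfrog integration of Hamilton's equations for potential U, run for a
    total integration time `int_time`.  The number of steps L is derived from
    the two fundamental parameters rather than tuned independently."""
    L = max(1, round(int_time / step_size))
    p = p - 0.5 * step_size * grad_U(q)   # initial half-step for momentum
    for _ in range(L - 1):
        q = q + step_size * p             # full position step
        p = p - step_size * grad_U(q)     # full momentum step
    q = q + step_size * p
    p = p - 0.5 * step_size * grad_U(q)   # final half-step for momentum
    return q, p
```

For U(q) = q²/2 the exact flow is a rotation, so integrating for a quarter period from (q, p) = (1, 0) returns approximately (0, −1) for any sufficiently small step size: the trajectory endpoint is governed by T, not by the particular (ε, L) split.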


The term “quantum Monte Carlo” covers several different techniques based on random sampling. The simplest of these, variational Monte Carlo (VMC), uses a stochastic integration method to evaluate expectation values for a chosen trial wave function. In a system of 1000 electrons the required integrals are 3000-dimensional, and for such problems Monte Carlo integration is much more efficient than conventional quadrature methods such as Simpson's rule. The main drawback of VMC is that the accuracy of the result depends entirely on the accuracy of the trial wave function. The other method we consider, diffusion Monte Carlo (DMC), overcomes this limitation by using a projection technique to enhance the ground-state component of a starting trial wave function. During the last ten years it has become clear that VMC and DMC can produce very accurate ground-state expectation values for weakly correlated solids. They can also provide some information about specific excited states, although not yet full excitation spectra.
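The VMC procedure can be illustrated on a textbook toy problem (the 1-D harmonic oscillator with ħ = m = ω = 1 and trial wavefunction ψ(x) = exp(−αx²); this example is my own, not the review's calculations): sample x from |ψ|² and average the local energy E_L(x) = Hψ/ψ = α + x²(1/2 − 2α²):

```python
import random

def vmc_energy(alpha, n_samples, rng=random):
    """Variational Monte Carlo energy for the 1-D harmonic oscillator with
    trial wavefunction psi(x) = exp(-alpha * x**2).  Here |psi|^2 is Gaussian
    with variance 1/(4*alpha), so we can sample it directly instead of using
    a Metropolis walk."""
    sd = (1.0 / (4.0 * alpha)) ** 0.5
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sd)
        total += alpha + x * x * (0.5 - 2.0 * alpha * alpha)  # local energy
    return total / n_samples
```

At α = 1/2 the trial function is the exact ground state: the local energy is constant at 1/2 and the estimator has zero variance. Any other α gives a higher mean energy, as the variational principle requires — the drawback noted above, that VMC accuracy rests entirely on the trial wave function, is visible directly in this dependence on α.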


plus the cost of curtailing renewable energy in case of excess production, subject to the constraint of ensuring reliable energy supply. A regression Monte Carlo method from the mathematical finance literature is used to solve this stochastic optimization problem numerically. Three variants of the regression algorithm, called grid discretization, Regress-now, and Regress-later, are proposed and compared in this paper. The numerical examples illustrate the performance of the optimal policies, provide insights into the optimal sizing of the battery, and compare the policies obtained by stochastic optimization to the industry standard, which uses deterministic policies.


We propose a novel class of Sequential Monte Carlo (SMC) algorithms, appropriate for inference in probabilistic graphical models. This class of algorithms adopts a divide-and-conquer approach based upon an auxiliary tree-structured decomposition of the model of interest, turning the overall inferential task into a collection of recursively solved sub-problems. The proposed method is applicable to a broad class of probabilistic graphical models, including models with loops. Unlike a standard SMC sampler, the proposed Divide-and-Conquer SMC employs multiple independent populations of weighted particles, which are resampled, merged, and propagated as the method progresses. We illustrate empirically that this approach can outperform standard methods in terms of the accuracy of the posterior expectation and marginal likelihood approximations. Divide-and-Conquer SMC also opens up novel parallel implementation options and the possibility of concentrating the computational effort on the most challenging sub-problems. We demonstrate its performance on a Markov random field and on a hierarchical logistic regression problem. Supplementary material including proofs and additional numerical results is available online.
