Search results

Volkert, M. 2017

Confidence Bound Algorithms in Game Trees: Effective Monte Carlo Tree Search through Bernstein’s Inequality and the Law of the Iterated Logarithm

Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)

open access

Game trees have been utilized as a formal representation of adversarial planning scenarios such as two-player zero-sum games like chess [1, 2]. When using stochastic leaf values based on Bernoulli...Show moreGame trees have been utilized as a formal representation of adversarial planning scenarios such as two-player zero-sum games like chess [1, 2]. When using stochastic leaf values based on Bernoulli trials to model noisy game trees, a challenging task is to solve the Monte Carlo Tree Search (MCTS) problem of identifying a best move under uncertainty. Confidence bound algorithms are investigated as one solution, with focus on the FindTopWinner algorithm by Teraoka, Hatano, and Takimoto [3], which uses (a) the minimax rule to evaluate the game tree by alternately minimizing and maximizing over the values associated with each move, (b) Hoeffding’s Inequality to estimate sample size requirements by fixing precision and error probability, and (c) an epoch-wise pruning regime to reduce investment on suboptimal nodes. We experimented on this algorithm by equipping it with methods that are based on (i) Bernstein’s Inequality to create a tighter confidence bound [4], (ii) the Law of the Iterated Logarithm (LIL) to sample in single-sample steps, allowing for exact pruning and stopping [5, 6], and (iii) a combination of both. An empirically-derived Hoeffding-based Iterated-Logarithm confidence bound will be proposed in a fully refurbished FindTopWinner algorithm, which achieved much better performance in terms of samples required to find a best move, whereas the Bernstein-based approaches did not fare better than the original by Teraoka et al. [3]. Possible reasons such as limited, more asymptotic advantages for Bernstein-based algorithms will be discussed and the recommended parameter space for the empirically-derived Hoeffding-based confidence bound will be provided.Show less

Leiden University Student Repository

Refine Results

Availability

Faculty

Thesis type

Programme

Issued

Supervisor

Language

Your search

Enabled Filters

Sort