Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
Game trees have been utilized as a formal representation of adversarial planning scenarios such as two-player zero-sum games like chess [1, 2]. When using stochastic leaf values based on Bernoulli trials to model noisy game trees, a challenging task is to solve the Monte Carlo Tree Search (MCTS) problem of identifying a best move under uncertainty. Confidence bound algorithms are investigated as one solution, with a focus on the FindTopWinner algorithm by Teraoka, Hatano, and Takimoto [3], which uses (a) the minimax rule to evaluate the game tree by alternately minimizing and maximizing over the values associated with each move, (b) Hoeffding's Inequality to estimate sample size requirements by fixing precision and error probability, and (c) an epoch-wise pruning regime to reduce investment in suboptimal nodes. We experimented on this algorithm by equipping it with methods based on (i) Bernstein's Inequality, to create a tighter confidence bound [4], (ii) the Law of the Iterated Logarithm (LIL), to sample in single-sample steps and thereby allow exact pruning and stopping [5, 6], and (iii) a combination of both. An empirically derived Hoeffding-based iterated-logarithm confidence bound is proposed in a fully refurbished FindTopWinner algorithm, which achieved much better performance in terms of the number of samples required to find a best move, whereas the Bernstein-based approaches did not fare better than the original by Teraoka et al. [3]. Possible reasons, such as the limited, more asymptotic advantages of the Bernstein-based algorithms, are discussed, and a recommended parameter space for the empirically derived Hoeffding-based confidence bound is provided.
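As a minimal illustration of step (b) above: for rewards bounded in [0, 1], Hoeffding's Inequality gives P(|x̄ − μ| ≥ ε) ≤ 2 exp(−2nε²), so fixing the precision ε and the error probability δ and solving for n yields a per-leaf sample size. The sketch below is only a reading aid; the function name and its role inside FindTopWinner are assumptions, not the thesis's code.

    import math

    def hoeffding_sample_size(eps: float, delta: float) -> int:
        """Samples needed so the empirical mean of a Bernoulli leaf is
        within eps of its true mean with probability at least 1 - delta
        (hypothetical helper, for illustration only)."""
        return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

    print(hoeffding_sample_size(0.05, 0.05))  # 738 samples per leaf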
Sparse Coding is an unsupervised method for learning a basis which can be used to represent the data using only a few basis vectors. In recent years, this approach has successfully been applied to signal processing and classification. However, the performance of Sparse Coding deteriorates when tested on data that follows a distribution different from the data on which it was trained. This has led to the development of models that make Sparse Coding more robust against differences between training and test data. Distinct efforts in supervised Sparse Coding have resulted in Sparse Coding models that are optimized for prediction problems. This thesis builds on both lines of research in order to develop a Sparse Coding model that learns from different datasets and simultaneously allows direct optimization for a classification task. Robustness against dataset differences is achieved by transforming dissimilar samples to a shared representation. Direct optimization of the model is achieved by backpropagation of the classification error, which is made possible by switching from the conventional non-differentiable L1-regularization to a differentiable alternative. The effectiveness of the proposed model is demonstrated by applying it to classification of handwritten digits from different datasets.
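The abstract does not name the differentiable alternative to the L1 penalty; a common smooth surrogate, shown here purely as an assumption for illustration, is sqrt(z² + ε), which tends to |z| as ε → 0 and therefore lets the classification error backpropagate through the sparse codes.

    import numpy as np

    def smooth_l1(z: np.ndarray, eps: float = 1e-6) -> float:
        """Smooth surrogate for sum |z_i|: differentiable everywhere."""
        return float(np.sum(np.sqrt(z ** 2 + eps)))

    def smooth_l1_grad(z: np.ndarray, eps: float = 1e-6) -> np.ndarray:
        """Gradient z_i / sqrt(z_i^2 + eps), well-defined even at z_i = 0."""
        return z / np.sqrt(z ** 2 + eps)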
We study covariate selection for genome-wide gene expression data by comparing three methods, one of them introduced in this thesis. All methods apply the multi-task principle to the regression of the different genes. The first method detects which covariates are important for all the genes and uses only these covariates for the regression. The second uses the Akaike Information Criterion for each gene independently. The third and new method introduces a general trend over all the genes; this trend describes the average influence of each covariate on the different genes. In its original formulation this method is computationally intractable, but we derive a reduction to a LASSO problem, which can be solved efficiently. We report experiments on simulated and real data sets which demonstrate that the new method gives the best results if there is a general trend over all the genes, whereas one of the other methods is preferable if there is no trend or if there are several trends. If there are several trends, the new method orders the covariates according to their mean influence, whereby different trends can cancel each other out. The first two methods, on the other hand, are insensitive to different trends. We analyse gene expression data of human embryos and find that there are five groups of genes, each with its own general trend.
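The abstract states that the trend model reduces to a LASSO problem but does not spell out the reduction. The sketch below only illustrates the flavour of such a reduction, with all shapes and names hypothetical: stacking the genes into one long response and repeating the design matrix lets a single LASSO fit estimate one shared, trend-like coefficient per covariate.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))   # hypothetical covariates (samples x covariates)
    Y = rng.normal(size=(100, 5))    # hypothetical expression of 5 genes

    X_stacked = np.tile(X, (Y.shape[1], 1))   # design repeated once per gene
    y_stacked = Y.T.reshape(-1)               # responses in matching gene order

    model = Lasso(alpha=0.1).fit(X_stacked, y_stacked)
    print(model.coef_)  # sparse shared effects: the selected covariates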
Mini-max is a concept often used for solving games. The idea behind it is a constant alternation of minimizing and maximizing the value of moves to account for an adversarial opponent in the game. Many established methods have been developed to allow computers to play games like Chess [2] and Go [12]. Many of these methods involve evaluating sequences of moves to determine the best move to play. Because games like Chess and Go are very large, it is infeasible to evaluate all possible sequences, so we resort to algorithms that pick sequences selectively, collected under the name Monte Carlo Tree Search [5]. These methods, in their quest to find the best move, already try to play as optimally as possible while figuring out the best move. We think that by letting go of the desire to only sample good sequences, and instead only caring about a good conclusion on the best move, we can improve on current algorithms. We do this by adapting Best-Arm Identification's objective to fit a Mini-max structure: Mini-max Action Identification. We believe that this has not been done before. In Section 2 we establish the framework and details of Mini-max Action Identification. We define the problem of finding an optimal algorithm in two ways: in Section 3 we define the problem as finding the algorithm that provides the best guaranteed performance on the hardest set of parameters; in Section 4 the problem is based on parameters following a fixed distribution. Further algorithms are provided in Section 5, in which the algorithms are also compared on their performance. Lastly, we present some findings on the worst-case set of parameters in Section 6, providing proofs for a couple of these findings.
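For readers unfamiliar with the mini-max rule the thesis builds on, the recursion below is a minimal sketch (not the thesis's algorithm): it alternately maximizes and minimizes over the children of a game tree whose leaves hold move values.

    def minimax(node, maximizing: bool = True) -> float:
        """Value of a game-tree node under optimal adversarial play.
        A node is either a numeric leaf value or a list of child nodes."""
        if isinstance(node, (int, float)):
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # The max player picks the branch whose worst reply is best:
    print(minimax([[3, 5], [2, 9]]))  # min(3,5)=3, min(2,9)=2, max(3,2)=3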
Approximation properties and improper learning of deep learning models with the rectifier activation function are studied. Explicit approximation bounds are given for single-layer networks and a class of target functions with a smoothness property related to the Fourier transform. An alternative kernel is proposed for the kernel method of Zhang, Lee and Jordan [Zhang, Y., Lee, J. D. and Jordan, M. I. (2015). ℓ1-regularized neural networks are improperly learnable in polynomial time.] that enables improper learning of single-layer neural networks with one-dimensional input and bounded weights.
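The abstract does not state the smoothness class explicitly; for orientation only, a classical Barron-type condition and the single-layer approximation bound it yields (background material, not the thesis's exact result) read

    C_f = \int_{\mathbb{R}^d} \lVert \omega \rVert \, \lvert \hat{f}(\omega) \rvert \, d\omega < \infty
    \quad \Longrightarrow \quad
    \inf_{f_n} \lVert f - f_n \rVert_{L^2(\mu)} \;\lesssim\; \frac{C_f}{\sqrt{n}},

where f_n ranges over single-layer networks with n hidden units, f̂ is the Fourier transform of f, and the hidden constant depends on the domain.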
In this study the performance of feature-based dissimilarity space (FDS) classification is evaluated by comparing it to conventional classification techniques. In FDS classification a classifier is trained on a dissimilarity space instead of a feature vector space. Since FDS classification can be applied with a wide range of classifiers, a new, model-independent dissimilarity feature selection method is presented and tested. The fundamentals of this newly proposed selection method are given by the compactness hypothesis (Arkadev and Braverman, 1966). The performance of the proposed dissimilarity feature selection technique is evaluated by a Monte Carlo simulation experiment and a bootstrap study. The performance of FDS classification itself is estimated with a bootstrap procedure and compared to that of conventional classification techniques. The results indicate that FDS classification is beneficial in combination with a linear classifier and a complex classification task. By combining FDS classification with a linear classifier, a linear decision boundary is fitted in the dissimilarity space; this decision boundary becomes non-linear in the original feature vector space.
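A minimal sketch of the dissimilarity-space construction (prototype choice, distance, and classifier are assumptions for illustration): each sample is represented by its distances to a set of prototypes, and a linear classifier is trained on those distance vectors.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import pairwise_distances

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))                      # feature vectors
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 2).astype(int)   # non-linear labels

    prototypes = X[:20]                      # assumed prototype set
    D = pairwise_distances(X, prototypes)    # the dissimilarity space

    # A boundary that is linear in D can be non-linear in the original space.
    clf = LogisticRegression(max_iter=1000).fit(D, y)
    print(clf.score(D, y))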
Introduction: Action recognition is an important task for domestic care robots. Current action recognition literature exclusively studies closed-set recognition problems, where performance is evaluated on videos of action classes that were also present in the training set. However, the real-world environment is by definition open, and it is both theoretically and practically infeasible to supply the robot with all necessary information beforehand. This work develops novelty detection methodology applicable to HMM-based classifiers, which have shown earlier success in action recognition. By filtering unknown action instances, our novelty detection module increases system robustness in open environments, and is a first step towards adaptively learning robots.

Methodology: We first develop an ordinary action recognition system based on a new skeleton-derived feature and an HMM back-end classifier. The HMM system is estimated in three ways: clustered, Expectation-Maximization (EM) and Extended Baum-Welch (EBW). The latter is a discriminative training criterion, which could theoretically improve novelty detection accuracy. Since the EBW algorithm has only been implemented in speech recognition software, we wrote its first open-source implementation (in Matlab, publicly available from www.github.com/thomasmoerland/Thesis). Then, novelty detection is approached from both a posterior probability and a hypothesis-testing perspective, which we unify as background models. Since novelty detection for action recognition has not been reported before, we investigate a diverse set of background models: sum over competing models, filler models, flat models, anti-models, reweighted anti-models, and some combinations of them.

Results: Our HMM classification system has around 95% closed-set recognition accuracy on the Microsoft Action 3D dataset, which is near the state-of-the-art. Performance did not differ between the clustered, EM and EBW estimation methods, although our results do indicate the latter two might be beneficial on more challenging datasets. The optimal novelty detection module, combining anti-models with flat models, had 78% novelty accuracy while maintaining 78% recognition accuracy. Novelty detection results were consistent over various dataset splits. Discriminative training did not alter novelty detection performance.

Conclusion: We are the first to study novelty detection for action recognition. Our results could increase system robustness in an open-set real-world environment, and furthermore serve as a first step towards an adaptively learning robot.
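As a minimal sketch of the background-model idea (the threshold, names, and specific background score are assumptions, not the thesis's implementation): an instance is flagged as novel when its best per-class HMM log-likelihood does not sufficiently exceed a background score, here the log-sum over the competing models.

    import numpy as np

    def is_novel(log_liks: np.ndarray, threshold: float) -> bool:
        """log_liks: per-class HMM log-likelihoods for one action instance.
        Novel when no single class clearly dominates the background."""
        best = np.max(log_liks)
        background = np.logaddexp.reduce(log_liks)  # log sum_k exp(ll_k)
        return bool(best - background < threshold)

    # A flat, indistinct likelihood profile is flagged as novel:
    print(is_novel(np.array([-10.0, -10.5, -10.2]), threshold=-0.8))  # True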