Sparse Coding is an unsupervised method for learning a basis that can be used to represent data using only a few basis vectors. In recent years, this approach has been applied successfully to signal processing and classification. However, the performance of Sparse Coding deteriorates when tested on data that follows a distribution different from the data on which it was trained. This has led to the development of models that make Sparse Coding more robust against differences between training and test data. Distinct efforts in supervised Sparse Coding have resulted in Sparse Coding models that are optimized for prediction problems. This thesis builds on both lines of research to develop a Sparse Coding model that learns from different datasets and simultaneously allows direct optimization for a classification task. Robustness against dataset differences is achieved by transforming dissimilar samples to a shared representation. Direct optimization of the model is achieved by backpropagation of the classification error, which is made possible by switching from the conventional non-differentiable L1-regularization to a differentiable alternative. The effectiveness of the proposed model is demonstrated by applying it to the classification of handwritten digits from different datasets.
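To make the "differentiable alternative to L1" concrete, here is a minimal sketch. The abstract does not specify which surrogate the thesis uses; this example assumes the common smooth approximation sqrt(x² + ε) of |x|, whose gradient is defined everywhere, including at zero, so the classification error can be backpropagated through it.

```python
import numpy as np

def smooth_l1(x, eps=1e-4):
    """Differentiable surrogate for |x|: sqrt(x^2 + eps).
    Approaches |x| as eps -> 0 (illustrative assumption, not
    necessarily the surrogate used in the thesis)."""
    return np.sqrt(x**2 + eps)

def smooth_l1_grad(x, eps=1e-4):
    """Gradient of the surrogate: x / sqrt(x^2 + eps).
    Defined everywhere, unlike sign(x), which is undefined at 0."""
    return x / np.sqrt(x**2 + eps)

x = np.array([-1.0, 0.0, 2.0])
vals = smooth_l1(x)        # close to |x| for small eps
grads = smooth_l1_grad(x)  # close to sign(x), and exactly 0 at x = 0
```

The non-differentiability of the L1 norm at zero is exactly what blocks plain gradient-based training; any smooth surrogate with this shape restores end-to-end differentiability at the cost of slightly blurring the sparsity-inducing kink.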
In this thesis, we present the theory of test martingales. Whereas in traditional statistics the p-value indicates the level of evidence against the null hypothesis, in martingale testing a betting strategy that allows one to make a (virtual) profit is seen as evidence against the null hypothesis. We extend the concept of test martingales of Shafer et al. (2011) to composite null hypotheses. We refer to these martingales as composite test martingales; they are the main innovation in this thesis. Using composite test martingales, we construct two martingale tests as an alternative to Student's t-test. These two tests appear to be the first published martingale tests that can differentiate the t-test hypotheses. The main result of this thesis concerns the Jeffreys Bayesian t-test, which was already known, experimentally, to be robust under optional stopping. Optional stopping refers to the practice of looking at observed experimental data in order to decide whether or not to continue testing. Robustness in this context means that the statistical method preserves its significance level even when optional stopping is employed. We prove that the Jeffreys Bayesian t-test is a martingale test and therefore robust under optional stopping. In general, under a composite null hypothesis, Bayesian tests are not robust under optional stopping.
In network data analysis, community detection is a fundamental statistical problem. A popular approach to analysing such networks is the stochastic block model, in which the connection probabilities between nodes depend only on the communities they belong to. This thesis is concerned with a two-stage method that achieves the optimal misclassification proportion in the stochastic block model. Even though the stochastic block model is simple and captures the group structure, it lacks flexibility and is not realistic: networks generated from it do not contain hubs, i.e. nodes with a high degree, which are often observed in real network data. Indeed, many real-life networks exhibit a cumulative advantage where "the rich get richer". The preferential attachment model captures this advantage by adding a degree-dependent term to the connection probabilities, but does not include any group structure. This thesis extends a previous result on the misclassification proportion in stochastic block models by adding a degree dependence to the connection probability that changes over time, which allows models beyond the stochastic block model to be used. We propose a model that combines the stochastic block model with the preferential attachment model to allow the modelling of hubs in networks. For this model we prove that, under some conditions, the average edge degrees are comparable to those in the stochastic block model. We achieve this by bounding the expected degree of the time-dynamic model in terms of the stochastic block model, which requires techniques such as martingale concentration inequalities and probabilistic approximations. First, we define the stochastic block model in Section 2.
In Section 3 we summarize previous results that use the stochastic block model as a basis. After that, we introduce the preferential attachment model in Section 4, which we use to build our own time-dynamic stochastic block model in Section 5.
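The defining property of the stochastic block model, that connection probabilities depend only on community membership, can be sketched in a few lines. This is a generic illustration (community labels `z` and connectivity matrix `B` are example inputs, not the thesis's setup):

```python
import numpy as np

def sample_sbm(z, B, rng):
    """Sample an undirected stochastic block model graph.
    z[i] is node i's community label; B[a, b] is the probability
    of an edge between a node in community a and one in community b.
    The edge probability depends only on (z[i], z[j])."""
    n = len(z)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = int(rng.random() < B[z[i], z[j]])
    return A

rng = np.random.default_rng(0)
z = np.array([0] * 10 + [1] * 10)              # two communities of 10 nodes
B = np.array([[0.8, 0.1],
              [0.1, 0.8]])                      # dense within, sparse between
A = sample_sbm(z, B, rng)
```

Because every node in a community faces the same edge probabilities, degrees concentrate around their common expectation; this is precisely why the plain model cannot produce hubs, and why the thesis adds a degree-dependent, time-evolving term.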
We study covariate selection for genome-wide gene expression data by comparing three methods, one of which is introduced in this thesis. All methods apply the multi-task principle to the regression of the different genes. The first method detects which covariates are important for all the genes and uses only these covariates for the regression. The second uses the Akaike Information Criterion for each gene independently. The third and new method introduces a general trend over all the genes, describing the averaged influence of each covariate on the different genes. In its original formulation this method is computationally intractable, but we derive a reduction to a LASSO problem, which can be solved efficiently. We report experiments on simulated and real data sets which demonstrate that the new method gives the best results if there is a general trend over all the genes, whereas one of the other methods is preferable if there is no trend or if there are several trends. If there are several trends, the new method orders the covariates according to their mean influence, whereby different trends can cancel each other out. The first two methods, on the other hand, are insensitive to different trends. We analyse gene expression data of human embryos and find that there are five groups of genes, each with its own general trend.
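A LASSO problem of the kind the reduction targets can be solved by coordinate descent with soft-thresholding. This is a generic textbook solver, not the thesis's derivation, shown only to make "reduction to a LASSO problem, which can be solved efficiently" concrete:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2) * ||y - X b||^2 + lam * ||b||_1.
    Cycles through coordinates, each time solving the one-dimensional
    problem exactly via soft-thresholding."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X**2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed
            r = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return b
```

For orthonormal design the solver reduces to soft-thresholding the ordinary least-squares coefficients, which is a quick way to sanity-check it.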
Mini-max is a concept often used for solving games. The idea behind it is a constant alternation between minimizing and maximizing the value of moves to account for an adversarial opponent in the game. Many established methods have been developed to allow computers to play games like Chess [2] and Go [12]. Many of these methods involve evaluating sequences of moves to determine the best move to play. Because games like Chess and Go are very large, it is infeasible to evaluate all possible sequences, so we resort to algorithms that pick sequences selectively, collected under the name Monte Carlo Tree Search [5]. These methods, in their quest to find the best move, already try to play as optimally as possible while figuring out the best move. We think that by letting go of the desire to sample only good sequences, and instead caring only about reaching a good conclusion on the best move, we can improve on current algorithms. We do this by adapting the objective of Best-Arm Identification to fit a mini-max structure: Mini-max Action Identification. We believe this has not been done before. In Section 2 we establish the framework and details of Mini-max Action Identification. We define the problem of finding an optimal algorithm in two ways: in Section 3 we define it as finding the algorithm that provides the best guaranteed performance on the hardest set of parameters, and in Section 4 the problem is based on parameters following a fixed distribution. Further algorithms are provided in Section 5, where we also compare their performance. Lastly, we present some findings on the worst-case set of parameters in Section 6, providing proofs for a couple of these findings.
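The alternation of minimizing and maximizing described above can be sketched on a tiny game tree (an illustrative toy, not the thesis's bandit formulation):

```python
def minimax(node, maximizing):
    """Value of a game tree. Leaves are numeric payoffs; internal
    nodes are lists of children. Players alternate: the maximizer
    picks the best child, the minimizer the worst (for the maximizer)."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Maximizer moves first; the opponent then minimizes within each branch:
tree = [[3, 5], [2, 9]]
# value = max(min(3, 5), min(2, 9)) = max(3, 2) = 3
```

In Mini-max Action Identification the leaf values are not known but must be estimated from samples, and the goal shifts from playing well during exploration to identifying the root's best action with few samples, which is the Best-Arm Identification objective adapted to this tree structure.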
Mean-Variance optimization is widely used to find portfolios that make an optimal trade-off between expected return and volatility. The method, however, suffers from a robustness problem, since the portfolio weights are very sensitive to changes in the input parameters. There is a vast literature on methods that try to solve this problem, and we discuss two of them: resampling and shrinkage. In addition to the methods from the literature, we develop a new method which we call maximum distance optimization. The resampling method attempts to obtain more robust portfolios by changing the optimization procedure. The shrinkage method attempts to obtain more robust portfolios by making the estimation of the input parameters more robust. The maximum distance optimization method explores a region closely beneath the efficient frontier and determines which portfolios are nearly optimal yet have very different portfolio weights. First, we show that any convex combination of these near-optimal portfolios is also near-optimal. Second, we show that the set of near-optimal portfolios is robust. Apart from the robustness, the advantage of this method is that we obtain a whole range of solutions instead of the single portfolio that Mean-Variance optimization provides. Since the region is robust, the investor or consultant can use their own qualitative arguments to select a preferred portfolio from this region.
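The claim that convex combinations of near-optimal portfolios remain near-optimal rests on the convexity of portfolio variance in the weights. A small numerical illustration with made-up inputs (the weight vectors, mean vector `mu`, and covariance matrix `Sigma` are hypothetical, not from the thesis):

```python
import numpy as np

def port_stats(w, mu, Sigma):
    """Expected return and variance of a portfolio with weights w."""
    return w @ mu, w @ Sigma @ w

mu = np.array([0.08, 0.10, 0.07])               # hypothetical expected returns
Sigma = np.array([[0.04, 0.01, 0.00],           # hypothetical covariance matrix
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.05]])

w1 = np.array([0.5, 0.3, 0.2])   # two stand-ins for near-optimal portfolios
w2 = np.array([0.2, 0.3, 0.5])
w_mix = 0.5 * w1 + 0.5 * w2      # a convex combination of the two

r1, v1 = port_stats(w1, mu, Sigma)
r2, v2 = port_stats(w2, mu, Sigma)
rm, vm = port_stats(w_mix, mu, Sigma)
# Return is linear in w, so rm is exactly the average of r1 and r2;
# variance w -> w @ Sigma @ w is convex, so vm <= 0.5*v1 + 0.5*v2.
```

Linearity keeps the mixed portfolio's expected return between those of its parents, while convexity caps its variance by their average, which is why the whole near-optimal region, and not just its corner points, stays close to the efficient frontier.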