Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
A problem for survey datasets is that the data may come from a selective group of the population, which makes it hard to produce unbiased and accurate estimates for the entire population. One way to overcome this problem is to use sample matching. In sample matching, one draws a sample from the population using a well-defined sampling mechanism. Next, units in the survey dataset are matched to units in the drawn sample using some background information. Usually the background information is insufficiently detailed to enable exact matching, where a unit in the survey dataset is matched to the same unit in the drawn sample. Instead one usually needs to rely on synthetic matching methods, where a unit in the survey dataset is matched to a similar unit in the drawn sample. This study developed several sample matching methods for categorical data. A selective panel represents the available complete but biased dataset, which is used to estimate the distribution of the target variable in the population. The results show that exact matching unexpectedly performs best among all matching methods, and that using weighted sampling instead of random sampling does not increase the accuracy of matching. Although predictive mean matching performed worse than exact matching, a proper transformation of the categorical variables into numerical values would substantially increase its matching accuracy. All the matches are used to reduce overfitting in machine learning, and the results show that all matches are able to increase the prediction precision.
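The synthetic matching step described above can be illustrated with a minimal Python sketch. It is not the thesis's method: the Hamming distance on categorical background variables, the random tie-breaking, and the names `match_sample` and `estimate_distribution` are all assumptions made for illustration.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of background variables on which two units differ."""
    return sum(x != y for x, y in zip(a, b))


def match_sample(sample, panel, rng=None):
    """For each sample unit, return the index of a closest panel unit.

    sample: list of tuples of categorical background values.
    panel:  list of (background tuple, target value) pairs.
    Ties are broken at random (an assumption, not taken from the thesis).
    """
    rng = rng or random.Random(0)
    matches = []
    for unit in sample:
        dists = [hamming(unit, background) for background, _ in panel]
        best = min(dists)
        candidates = [i for i, d in enumerate(dists) if d == best]
        matches.append(rng.choice(candidates))
    return matches


def estimate_distribution(sample, panel, rng=None):
    """Estimate the target distribution from the matched panel donors."""
    matches = match_sample(sample, panel, rng)
    counts = Counter(panel[i][1] for i in matches)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}
```

A sample unit whose background values coincide with those of a panel unit has distance zero, so it is always matched to a unit with identical background values; other units fall back to the most similar donor.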
Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
The brain's Default Mode Network (DMN) has raised a lot of interest in neuroscience over the past decade. The DMN is active even when a person is resting and the mind is not task-oriented [1]. The literature [2, 3] reports that disruptions within the DMN often occur in patients with disorders such as Parkinson's disease (PD), Alzheimer's disease (AD) and epilepsy. In this thesis we aim to build a classification model that predicts whether a new subject is an Alzheimer's patient or not. This model is based on the DMN profiles of 250 subjects. For this purpose, we employ the δ-machine classification approach of Yuan, Heiser and De Rooij [4], which uses the distances between DMN profiles as the predictor matrix in a lasso logistic regression model. It is essential to define a distance measure that best fits the univariate DMN time series data, that is, a measure that represents the distances well irrespective of possible distortion of the data in time. With that in mind, five distance measures were investigated, all of which are designed for time series and are implemented in the up-to-date R packages TSdist and TSclust. The final goal is twofold: on the one hand, building a classification model with the δ-machine approach, based on the profile of the activity in the DMN of 250 subjects, and on the other hand, uncovering which distance measure is the most suitable when used in the δ-machine approach.
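The idea of using pairwise distances between time series as a predictor matrix can be sketched in Python as follows. Dynamic time warping (DTW) is used here purely as an illustrative distance that tolerates distortion in time; the abstract does not say which five measures were studied. The function names are hypothetical, and the lasso logistic regression step is left out.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences,
    using the absolute difference as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat y[j-1]
                cost[i][j - 1],      # repeat x[i-1]
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


def distance_matrix(series):
    """Pairwise distances; row i holds the distance-based predictors
    for subject i."""
    return [[dtw_distance(a, b) for b in series] for a in series]
```

Each row of the resulting matrix could then be passed to a penalised logistic regression as the feature vector for one subject.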
Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
The Elo rating system has been used in various sports and games, such as chess, soccer, tennis and even video games, to calculate the relative playing strengths of players or teams. The Elo system was originally invented by a Hungarian physics professor, Arpad Elo, to improve the chess rating system. Many rating systems now used in sports are based on the Elo rating system with modifications. The objective of this thesis project is to examine the Elo rating system for soccer tournaments and how it can be applied to the 2017 UEFA Women’s Championship (UEFA Women’s Euro 2017 for short). More specifically, this project has two primary interests. The first is determining the strength of each team by assigning an Elo rating to each competing team after the tournament. In addition, by incorporating home-field advantage in the Elo formula, it is interesting to see how home-field advantage helped the Netherlands (the host country) win UEFA Women’s Euro 2017. The second is the strengths of the players of all teams. To estimate these strengths, each player is assigned a rating (not an Elo rating) that represents how strong the player is, so that players can be compared across all teams. To assess the reliability of our ideas and methodology, a simulation study follows the theoretical part of our research. In Chapter 1 I first describe the basic concepts of the Elo rating system. Then a short summary of the relevant literature is presented. Finally I discuss the source of the data, the arrangement of the tournament, and the process followed to go through the algorithm and methodology.
In Chapter 2 the basic Elo formula and some modified Elo models are proposed, which later allows us to determine the most appropriate model for estimating the strengths of every competing country and of the players of all teams. At the end of this chapter, I develop an ordered probit regression model for forecasting match results in UEFA Women’s Euro 2017. Chapter 3 presents a simulation study for estimating the strengths of all participating countries and of the football players of all teams. Chapter 4 presents the main conclusions drawn from the model computations and suggests some further research.
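For orientation, the basic Elo update with a home-field term can be sketched in Python as follows. This is the standard textbook form (base-10 logistic curve with a scale of 400), not necessarily the exact model used in the thesis. The K-factor of 20 and the additive `home_advantage` parameter are assumptions chosen for illustration.

```python
def expected_score(rating_a, rating_b, home_advantage=0.0):
    """Expected score of team A against team B.

    home_advantage is added to A's rating when A plays at home
    (an assumed way of modelling home-field advantage).
    """
    diff = (rating_a + home_advantage) - rating_b
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0, home_advantage=0.0):
    """Return the new ratings of A and B after one match.

    score_a is 1.0 for a win by A, 0.5 for a draw and 0.0 for a loss.
    k is the K-factor; 20 is an illustrative value only.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b, home_advantage))
    return rating_a + delta, rating_b - delta
```

Two equally rated teams each have an expected score of 0.5, so a win moves the winner up by half the K-factor and the loser down by the same amount, which keeps the total number of rating points constant.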