
Schouten, Frederiek 2020

Handling Missing Values in Multi View Data with Stacked Penalized Regression

Master thesis | Psychology (MSc)

open access

Excessive data collection can be expensive to perform and process. Feature selection methods such as penalized regression are useful techniques for excluding less relevant features from a prediction model. However, these methods cannot be applied directly to multi-view data, because they ignore the grouped structure of this type of data, and valuable information may be lost. For multi-view data, Stacked Penalized Regression (StaPLR) can be used for view selection. A challenge arises when missing data are present, especially when complete views are missing. Applying listwise deletion, the most common method for dealing with missing values, to such datasets is therefore likely to cause substantial information loss. In this thesis, the effects of mean substitution, single Bayesian regression imputation, multiple Bayesian regression imputation and predictive mean matching on the predictive accuracy and sparsity of StaPLR are compared across different missing data scenarios, using an existing dataset containing items from the Mood and Anxiety Symptoms Questionnaire (MASQ). In addition to traditional feature-level imputation, StaPLR offers the possibility of imputing view-level predictions when views are missing completely, which can reduce the computational burden. To evaluate the performance of the missing data methods, 27 missing data scenarios are created, with features missing either randomly or within the view structure. Predictive accuracy is measured with the AUC, and sparsity by the number of views StaPLR selects. Multiple imputation methods perform well in terms of predictive accuracy, but as the amount of missing values increases, they perform poorly in terms of sparsity. Mean substitution could be an acceptable alternative to multiple imputation when features are missing randomly.
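The missing-data scenarios and evaluation criteria described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's code: `make_missing`, `mean_substitute` and `auc` are hypothetical helpers, the three-view column layout is invented, and "missing within the view structure" is read here, as an assumption, as a whole view going missing for a given row. The Bayesian regression and predictive mean matching imputations are not shown.

```python
import numpy as np

def make_missing(X, views, rate, mode, rng):
    """Return a float copy of X with NaNs inserted (hypothetical helper).

    views: list of column-index arrays, one per view.
    mode="random": each cell goes missing independently with probability `rate`.
    mode="view":   per row, each view goes missing as a whole with probability
                   `rate` (an assumed reading of "within the view structure").
    """
    X = np.array(X, dtype=float)  # copy so the caller's data is untouched
    n = X.shape[0]
    if mode == "random":
        X[rng.random(X.shape) < rate] = np.nan
    elif mode == "view":
        for cols in views:
            rows = rng.random(n) < rate
            X[np.ix_(rows, cols)] = np.nan  # blank out the whole view for those rows
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return X

def mean_substitute(X):
    """Feature-level mean substitution: each NaN gets its column's observed mean."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

def auc(y, scores):
    """Rank-based (Mann-Whitney) AUC; tied scores receive their average rank."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    for s in np.unique(scores):
        tie = scores == s
        ranks[tie] = ranks[tie].mean()
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == 0)
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
views = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 10)]  # hypothetical 3-view layout
X = rng.normal(size=(200, 10))
X_rand = make_missing(X, views, rate=0.1, mode="random", rng=rng)
X_view = make_missing(X, views, rate=0.25, mode="view", rng=rng)
X_imp = mean_substitute(X_view)
```

Mean substitution is the cheapest of the compared methods because it needs only one pass over each column; the AUC helper computes the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.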
View-level imputation performs similarly to feature-level imputation for most methods when views are missing completely, and could therefore be used with these methods to reduce the computational burden. This thesis is the first step in handling missing values with StaPLR. Future research could repeat this study on more elaborate multi-view data structures, in which the views are collected from different sources, to determine whether these effects hold under different circumstances.
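View-level imputation can be sketched as filling in the missing entries of the matrix of per-view (level-1) predictions, rather than imputing every feature of a missing view before refitting the base learners. This is a hedged illustration only: the level-1 predictions are simulated instead of coming from cross-validated per-view models, the imputed value is simply the view's mean prediction (the thesis compares several imputation methods), and the meta-learner is assumed to be a nonnegative lasso, simplified here to squared-error loss and fitted by projected gradient descent. `impute_view_level` and `nonneg_lasso` are hypothetical names.

```python
import numpy as np

def impute_view_level(Z):
    """Fill NaN level-1 predictions with that view's mean prediction.

    Z: (n, V) matrix with one column of predictions per view; an entry is NaN
    when the view was missing completely for that row.
    """
    col_means = np.nanmean(Z, axis=0)
    return np.where(np.isnan(Z), col_means, Z)

def nonneg_lasso(Z, y, lam, n_iter=5000):
    """Linear meta-learner with nonnegative, L1-penalized view weights.

    Minimizes (1/2n)||y - b0 - Z w||^2 + lam * sum(w) subject to w >= 0
    with projected (proximal) gradient steps.
    """
    n, V = Z.shape
    Zc = Z - Z.mean(axis=0)  # centering lets the intercept be solved afterwards
    yc = y - y.mean()
    step = n / (np.linalg.norm(Zc, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    w = np.zeros(V)
    for _ in range(n_iter):
        grad = -Zc.T @ (yc - Zc @ w) / n
        w = np.maximum(0.0, w - step * (grad + lam))  # prox of L1 + nonnegativity
    b0 = y.mean() - Z.mean(axis=0) @ w
    return b0, w

rng = np.random.default_rng(1)
n = 300
y = rng.integers(0, 2, size=n).astype(float)
# Simulated level-1 predictions (assumption): two informative views, one noise view.
Z = np.column_stack([
    0.25 + 0.5 * y + 0.1 * rng.normal(size=n),
    0.35 + 0.3 * y + 0.1 * rng.normal(size=n),
    rng.uniform(size=n),
])
Z[rng.random(n) < 0.2, 0] = np.nan  # view 0 missing completely for about 20% of rows
Z_imp = impute_view_level(Z)
b0, w = nonneg_lasso(Z_imp, y, lam=0.01)
n_selected = int(np.sum(w > 0))  # sparsity: number of views with a nonzero weight
print(w, n_selected)
```

Only one value per missing view and row is imputed here, instead of one per missing feature, which is where the computational saving comes from. The combination of the nonnegativity constraint and the L1 penalty lets weights of uninformative views reach exactly zero, so counting nonzero weights gives the view-selection sparsity measure.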

Leiden University Student Repository
