Longitudinal data are often collected in different research areas such as medicine, biology, education, and psychology. We can build a transitional model using longitudinal binary data, which aims to model the probability of transition between the response categories. In this type of data, it is common to find missing values due to dropouts, costs, and organizational problems. The missing-indicator model is often used to handle missing values in this type of data. This method consists of creating a new category for the missing values. Therefore, the binary logistic model changes to a baseline-category logit model. This study aims to evaluate the bias of the estimated coefficients when the missing-indicator method is used in the response of a binary transitional model. Based on an empirical example, a Monte Carlo simulation with three factors is carried out: (1) type of missingness, (2) sample size, and (3) proportion of missing data. The bias of the coefficients from the baseline-category logit model is evaluated using boxplots and a three-way MANOVA. The results suggest that sample size, the proportion of missing data, the type of missingness, and the interaction between sample size and proportion affect the bias of the estimated coefficients; nonetheless, the effect size is small. When each dependent variable is analysed separately using ANOVA, the effects of the proportion of missing data and the interaction between sample size and proportion were statistically significant for only one coefficient. However, the effect size is still small. Therefore, the conclusion is that the bias of the estimated coefficients is low for all missingness types.
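As an illustration of the missing-indicator idea described above, the following minimal sketch recodes missing binary responses into an extra category and tabulates first-order transitions. It is not the thesis's model: it only counts transitions and does not fit a baseline-category logit model with covariates. The label `"M"`, the function names, and the toy sequences are all hypothetical.

```python
from collections import Counter

MISSING = "M"  # hypothetical label for the extra "missing" category


def recode(sequence):
    """Replace None (a missing response) with the extra category label."""
    return [MISSING if y is None else y for y in sequence]


def transition_counts(sequences):
    """Count first-order transitions (previous -> current) over all subjects."""
    counts = Counter()
    for seq in sequences:
        coded = recode(seq)
        for prev, curr in zip(coded, coded[1:]):
            counts[(prev, curr)] += 1
    return counts


def transition_probabilities(counts):
    """Turn transition counts into row-wise proportions per previous state."""
    totals = Counter()
    for (prev, _), n in counts.items():
        totals[prev] += n
    return {(prev, curr): n / totals[prev] for (prev, curr), n in counts.items()}


# Toy binary sequences for three subjects; None marks a missing response.
data = [
    [0, 1, None, 1],
    [1, 1, 0, None],
    [0, None, None, 0],
]
counts = transition_counts(data)
probs = transition_probabilities(counts)
```

With the missing values recoded, the response has three categories instead of two, which is why the binary logistic model gives way to a baseline-category logit model.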
Excessive data collection can be expensive to perform and process. Feature selection methods such as penalized regression are useful techniques to exclude less relevant features from a prediction model. However, these methods cannot be directly applied to multi-view data, as they ignore the grouped structure of this type of data and valuable information may be lost. For multi-view data, Stacked Penalized Regression (StaPLR) can be used for view selection. A challenge arises when missing data are present, especially when complete views are missing. Using listwise deletion, the most common method to deal with missing values, on these datasets will therefore likely lead to substantial information loss. In this thesis, the effects of mean substitution, single Bayesian regression, multiple Bayesian regression, and predictive mean matching on the predictive accuracy and sparsity of StaPLR are compared across different missing data scenarios on an existing dataset containing items from the Mood and Anxiety Symptoms Questionnaire (MASQ). In addition to traditional feature imputation, StaPLR offers the possibility to impute view-level predictions when views are missing completely, which can decrease the computational burden. To evaluate the performance of the missing data methods, 27 missing data scenarios are created with features missing either randomly or within the view structure. Predictive accuracy is measured with AUC, and sparsity by the number of views StaPLR selects. Multiple imputation works well in terms of predictive accuracy, but as the quantity of missing values increases it does not work well in terms of sparsity. Mean substitution could potentially be an acceptable alternative to multiple imputation when features are missing randomly.
View-level imputation performs similarly to feature-level imputation for most methods when views are missing completely, and could therefore be used with these methods to decrease the computational burden. This thesis is the first step in handling missing values with StaPLR. Future research directions include repeating this research on more elaborate multi-view data structures, where the views are collected from different sources, to identify whether these effects are found under different circumstances.
Although ecological momentary assessment (EMA) is increasingly used in clinical and research settings due to its high ecological validity, low compliance rates still keep it from reaching its full potential. Findings on which predictors affect EMA compliance remain inconsistent. As students frequently suffer from mental health problems, we as a Bachelor project group conducted an EMA study measuring mental health and related behaviors in 84 Bachelor students of Dutch universities via a smartphone application. The study consisted of a baseline assessment, a two-week-long EMA with four measurements per day, and a post-assessment. My goal was to explore whether mental health and self-efficacy predict EMA compliance and whether self-efficacy mediates the relationship between mental health and compliance. I computed a multiple linear regression model and a mediation analysis with bootstrapping using the program “PROCESS” (Hayes, 2009) in IBM SPSS Statistics, version 24. The dependent variable was compliance, derived from the percentage of completed EMA surveys, and the independent variables were mental health and self-efficacy at baseline, with the latter serving as the mediator between mental health and compliance. I added age and gender as covariates. Results showed a mean EMA compliance rate of 83.9% with minimal variation over time. No predictor was significantly related to EMA compliance (R² = 0%). The mediation analysis showed non-significant direct and indirect paths to compliance. This demonstrates that students generally complied well with the EMA and did not systematically miss EMA reports based on their mental health and self-efficacy, which is promising for future EMA use.
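To make the idea of a bootstrapped mediation analysis concrete, here is a minimal sketch of a percentile bootstrap for the indirect effect (path a times path b) in a simple mediation model. It is not the PROCESS procedure used in the study: it omits the age and gender covariates, uses simulated data, and all names, coefficients, and the seed are assumptions made for illustration.

```python
import random


def _centered(values):
    """Return the values minus their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def indirect_effect(x, m, y):
    """Estimate a*b: a from m ~ x, b as the coefficient of m in y ~ m + x."""
    dx, dm, dy = _centered(x), _centered(m), _centered(y)
    sxx = sum(u * u for u in dx)
    smm = sum(u * u for u in dm)
    sxm = sum(u * v for u, v in zip(dx, dm))
    sxy = sum(u * v for u, v in zip(dx, dy))
    smy = sum(u * v for u, v in zip(dm, dy))
    a = sxm / sxx                                        # slope of m on x
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)  # partial slope of m
    return a * b


def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated (hypothetical) data: predictor x, mediator m, outcome y.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

estimate = indirect_effect(x, m, y)
ci_low, ci_high = bootstrap_ci(x, m, y)
```

An indirect path is usually called non-significant when the bootstrap interval includes zero, which is the pattern the abstract reports for the real data.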
The present study aims to investigate a possible relationship between perceived stress and missing data. Respondents were asked to answer a series of questionnaires (baseline, ecological momentary assessment (EMA), and post-assessment) over the course of a 2-week period. One hundred undergraduate students between 18 and 48 years of age comprised the sample. The respondents were asked to complete four EMA questionnaires per day on each day of the study. The analysed data came from 84 of the respondents: 19 males, 64 females, and one person who did not identify their gender. The level of perceived stress was collected at baseline for each individual, and the evolution of stress level was analysed in relation to the cumulative percentage of missing data throughout the EMA period. To explore this relationship, two hypotheses were tested: stressed individuals have more missing data, and women have more perceived stress in relation to the levels of missing data. The regression analysis of the level of perceived stress, gender, and missing data yielded a non-significant p-value of 0.861. Concerning the exploratory research question, multiple stressors, such as the burden created by the questionnaires and the COVID-19 pandemic, showed an influence on missing data. A positive relationship between stress created by the Coronavirus (COVID-19) pandemic and missing data was found, F(5, 78) = 2.335, p = .050, indicating the impact of the pandemic on the respondents' compliance. In conclusion, the obtained results did not show any significant relationship between stress, gender, and missing data. Consequently, both hypotheses were rejected.
Interestingly, the stress caused by the current pandemic might have influenced the amount of missing data. A peculiarity of the study was its co-occurrence with the COVID-19 pandemic, which might have influenced the results and the level of perceived stress of the respondents. In analysing and interpreting the results, it is necessary to take this particular situation and its impact on each individual’s daily life into consideration.