Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
Food-borne disease outbreaks constitute a large, ongoing public health burden worldwide (Hald et al., 2016). Early identification of contaminated food products plays an important role in reducing the health burden of these outbreaks (Jacobs et al., 2017). Food-borne outbreak investigations rely primarily on case-control studies analysed with logistic regression. However, this methodology suffers from several problems, including response misclassification, missing values and neglect of small-sample bias. Jacobs et al. (2017) developed a formal Bayesian variable selection method that deals with missing covariates and a misclassified response. A re-analysis of the 2012 Dutch Salmonella Thompson outbreak data (Friesema et al., 2014) illustrated that this Bayesian approach allows a relatively easy implementation of these concepts and performs better than standard logistic regression analysis in identifying the responsible food products. The complete Bayesian variable selection model is composed of three parts: misclassification correction, missing value imputation and Bayesian variable selection. In this thesis, we investigate how these parts affect the performance of Bayesian variable selection models in scenarios with (i) the same response misclassification rate and the same missingness rate in an assumed responsible food product covariate as in the original food-borne disease outbreak dataset, (ii) different response misclassification rates, (iii) different missingness rates in the assumed responsible food product covariate and (iv) combinations of different response misclassification rates and missingness rates. We answer this research question by designing and executing a simulation study.
Our results indicate that, for all four versions of the Bayesian variable selection model studied in this thesis, model performance decreases when the response misclassification rate increases, when the missingness rate in the assumed responsible food product covariate increases, or when both increase. Bayesian variable selection, misclassification correction and missing value imputation all contribute positively to model performance. Although missing value imputation is the most computationally expensive of these three components, it also contributes the most to model performance.
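To make the simulation scenarios above concrete, the following minimal Python sketch shows one way a single simulated dataset with a misclassified response and a partially missing food covariate could be generated. It is an illustration under stated assumptions, not the simulation code of the thesis: the function name `simulate_outbreak_data`, all parameter names and default values, the symmetric flipping of the response, and the completely-at-random missingness mechanism are hypothetical choices made here for illustration.

```python
import numpy as np


def simulate_outbreak_data(n=500, n_foods=5, culprit=0, log_odds_ratio=1.5,
                           misclass_rate=0.1, missing_rate=0.2, seed=0):
    """Simulate one dataset (hypothetical design, for illustration only).

    Returns the observed covariates (with NaN for missing values), the
    observed (possibly misclassified) response and the true response.
    """
    rng = np.random.default_rng(seed)

    # Binary food exposure indicators; 0.3 is an assumed exposure prevalence.
    x = rng.binomial(1, 0.3, size=(n, n_foods)).astype(float)

    # True case status: only the assumed responsible product raises the risk.
    logit = -1.0 + log_odds_ratio * x[:, culprit]
    p_case = 1.0 / (1.0 + np.exp(-logit))
    y_true = rng.binomial(1, p_case)

    # Response misclassification (assumption: each label is flipped
    # independently with probability misclass_rate, in both directions).
    flip = rng.random(n) < misclass_rate
    y_obs = np.where(flip, 1 - y_true, y_true)

    # Missingness in the assumed responsible covariate (assumption:
    # missing completely at random with probability missing_rate).
    missing = rng.random(n) < missing_rate
    x_obs = x.copy()
    x_obs[missing, culprit] = np.nan

    return x_obs, y_obs, y_true
```

Varying `misclass_rate` and `missing_rate` over a grid and calling the function with a different `seed` per replicate would give scenarios analogous to (ii) to (iv); the rates of the original outbreak dataset needed for scenario (i) are not given in this abstract.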