Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
Criminal profiling is a rapidly growing field of research in which statistics are increasingly incorporated alongside the traditional behavioural profiling approach, which uses psychological theories to predict the behaviour of an offender. A model was built to predict offender characteristics from crime and victim characteristics for single-victim-single-offender homicides in the Netherlands. Using the Dutch Homicide Monitor, eight different Bayesian network structure learning algorithms were combined into one model: arcs that were present in at least three separate structure learning algorithms were represented in the combined model, and their direction was determined by the highest cumulative arc strength. The graphical representation of the model gives insight into the dependence relationships between crime, victim, and offender characteristics, and could therefore be used to confirm existing hypotheses on criminal psychology and to develop new ones. Moreover, with an appropriate threshold resulting in a prediction error of less than 10 percent, the combined Bayesian network might be suitable for actual implementation by the police. This practical implication and the restrictions of the model are discussed, and recommendations for future research are given.
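The combination rule described in this abstract (keep an arc if at least three algorithms found it, then orient it by the highest cumulative arc strength) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function name `combine_networks`, the input layout, the variable names in the example and the strength values are all hypothetical. It also assumes that arc strengths are non-negative with larger meaning stronger, and that an arc counts as "present" regardless of the direction an algorithm gave it. The sketch does not check that the combined graph is acyclic.

```python
from collections import defaultdict


def combine_networks(learned, min_votes=3):
    """Combine arc sets from several structure learning runs.

    learned: list with one dict per algorithm, mapping a directed arc
             (parent, child) to its arc strength (assumed: larger = stronger).
    Returns a sorted list of directed arcs for the combined model.
    """
    votes = defaultdict(int)       # undirected edge -> number of algorithms
    strength = defaultdict(float)  # directed arc -> cumulative strength

    for arcs in learned:
        seen = set()
        for (parent, child), s in arcs.items():
            edge = frozenset((parent, child))
            if edge not in seen:   # count each algorithm once per edge
                votes[edge] += 1
                seen.add(edge)
            strength[(parent, child)] += s

    combined = []
    for edge, n in votes.items():
        if n < min_votes:
            continue
        a, b = sorted(edge)
        # Orient the arc in the direction with the larger cumulative strength.
        if strength[(a, b)] >= strength[(b, a)]:
            combined.append((a, b))
        else:
            combined.append((b, a))
    return sorted(combined)


# Hypothetical output of three algorithms (made-up variables and strengths).
learned = [
    {("weapon", "offender_age"): 0.8, ("victim_age", "offender_age"): 0.5},
    {("offender_age", "weapon"): 0.3},
    {("weapon", "offender_age"): 0.6, ("victim_age", "offender_age"): 0.4},
]
print(combine_networks(learned))
```

In this toy input, only the edge between `weapon` and `offender_age` reaches three votes, and it is oriented from `weapon` to `offender_age` because that direction has the larger summed strength.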
Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
Food-borne disease outbreaks constitute a large, ongoing public health burden worldwide (Hald et al., 2016). Early identification of contaminated food products plays an important role in reducing the health burden of food-borne disease outbreaks (Jacobs et al., 2017). Case-control studies together with logistic regression analysis are primarily used in food-borne outbreak investigations. However, the current methodology is associated with problems including response misclassification, missing values and ignoring small-sample bias. Jacobs et al. (2017) developed a formal Bayesian variable selection method which deals with the problems of missing covariates and misclassified response. The re-analysis of the Dutch Salmonella Thompson 2012 outbreak data (Friesema et al., 2014) has illustrated that this Bayesian approach allows a relatively easy implementation of these concepts and performs better than the standard logistic regression analysis in the identification of responsible food products. The complete Bayesian variable selection model is composed of three different parts, namely misclassification correction, missing value imputation and Bayesian variable selection. In this thesis, we are interested in how these different parts affect the performance of Bayesian variable selection models in scenarios with (i) the same response misclassification rate and missingness rate in an assumed responsible food product covariate as in the original food-borne disease outbreak dataset, (ii) different response misclassification rates, (iii) different missingness rates in an assumed responsible food product and (iv) the combination of different response misclassification rates and missingness rates. We answer this research question by designing and executing a simulation study.
Our results indicate that, for the four different versions of Bayesian variable selection models studied in this thesis, an increase in the response misclassification rate, in the missingness rate in the assumed responsible food product covariate, or in both results in a decrease in model performance. Bayesian variable selection, misclassification correction and missing value imputation all contribute positively to the model performance. Although missing value imputation is the most computationally expensive, it contributes the most to the model performance among these three components.
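The simulation scenarios described in this abstract vary two rates: the response misclassification rate and the missingness rate in one food product covariate. A minimal sketch of how one such degraded dataset might be generated is given below. It is not the thesis's simulation design: the function name `degrade`, the record layout, the field names `case` and `salmon`, and the choice to flip case status symmetrically and to make values missing completely at random are all assumptions made for illustration.

```python
import random


def degrade(records, exposure, misclass_rate, missing_rate, seed=0):
    """Return a copy of records with a noisy response and missing exposure.

    records: list of dicts with a binary "case" field (1 = case, 0 = control)
             and a binary exposure field named by `exposure`.
    misclass_rate: probability that a record's case status is flipped
                   (assumed symmetric for cases and controls).
    missing_rate:  probability that the exposure value is set to None
                   (assumed missing completely at random).
    """
    rng = random.Random(seed)  # fixed seed so a scenario is reproducible
    out = []
    for rec in records:
        rec = dict(rec)        # leave the input records untouched
        if rng.random() < misclass_rate:
            rec["case"] = 1 - rec["case"]
        if rng.random() < missing_rate:
            rec[exposure] = None
        out.append(rec)
    return out


# Hypothetical toy data: two cases and two controls with one exposure.
clean = [
    {"case": 1, "salmon": 1},
    {"case": 1, "salmon": 1},
    {"case": 0, "salmon": 0},
    {"case": 0, "salmon": 1},
]
noisy = degrade(clean, "salmon", misclass_rate=0.1, missing_rate=0.3, seed=42)
```

Looping over a grid of `misclass_rate` and `missing_rate` values and fitting each model version to every degraded dataset would give one way to compare performance across scenarios of the kind listed in (i) to (iv).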