Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
Data is often collected in an aggregated fashion, for instance as categories, in intervals, or in predefined areas. To estimate the underlying continuous distribution of an aggregated variable, the penalized composite link mixed model (PCLMM) can be used. The PCLMM assumes only that the underlying distribution is smooth, so it can be used to estimate a smooth regression function without imposing a parametric form. The model combines the generalized linear mixed model, penalized B-splines, and the composite link model. This thesis describes the mathematical framework of these three well-known techniques and then uses their close connection to the PCLMM to give a mathematical description of its estimation procedure. A simulation of a one-dimensional function and an application to Q-fever cases in the Netherlands in 2009 show that the PCLMM can accurately estimate even small details of the underlying distribution when covariate information is available on the finer scale. When covariate data is available only on the aggregated scale, decent approximations of the underlying distribution are still obtained.
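The core idea of a penalized composite link fit can be illustrated with a small numerical sketch. It is a simplified stand-in, not the PCLMM developed in the thesis. The assumptions are as follows. Aggregated counts `y` are Poisson with mean `C @ gamma`, where the hypothetical composition matrix `C` maps fine-scale bins to observed groups. The latent fine-scale means are `gamma = exp(X @ beta)`. A difference penalty `lam * ||D beta||^2` enforces smoothness. In place of a B-spline basis, an identity basis is used for `X`. The smoothing parameter `lam` is fixed by hand rather than estimated through the mixed-model representation, and no covariates are included. The function name `pclm_fit` and all parameter names are illustrative.

```python
import numpy as np


def pclm_fit(y, C, lam=1.0, order=2, max_iter=100, tol=1e-8):
    """Sketch of a Poisson penalized composite link fit with a fixed lam.

    y : observed aggregated counts, length m
    C : composition matrix (m x n) mapping n fine-scale bins to m groups
    Returns the estimated fine-scale expected counts gamma (length n).
    """
    y = np.asarray(y, dtype=float)
    m, n = C.shape
    X = np.eye(n)  # assumption: identity basis instead of B-splines
    D = np.diff(np.eye(n), n=order, axis=0)  # difference matrix on beta
    P = lam * D.T @ D  # penalty matrix
    beta = np.full(n, np.log(y.sum() / n))  # flat starting value
    for _ in range(max_iter):
        gamma = np.exp(X @ beta)  # latent fine-scale means
        mu = C @ gamma  # expected aggregated counts
        Q = C @ (gamma[:, None] * X)  # derivative of mu with respect to beta
        W = 1.0 / mu  # Poisson working weights
        z = y - mu + Q @ beta  # working response
        lhs = Q.T @ (W[:, None] * Q) + P
        rhs = Q.T @ (W * z)
        beta_new = np.linalg.solve(lhs, rhs)  # penalized scoring step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return np.exp(X @ beta)


# Illustrative example with made-up data: 60 fine bins aggregated
# into 6 groups of 10 consecutive bins each.
x = np.arange(60)
true_gamma = 1000.0 * np.exp(-0.5 * ((x - 25) / 8.0) ** 2)
C = np.kron(np.eye(6), np.ones((1, 10)))
y = np.round(C @ true_gamma)
gamma = pclm_fit(y, C, lam=10.0)
print(np.round(gamma[:12], 1))
```

The difference penalty does not affect a constant shift of `beta`. As a result, the fitted fine-scale counts reproduce the observed total at convergence. Larger values of `lam` pull `log(gamma)` towards a polynomial of degree `order - 1`.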