Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
Prediction rule ensembles (PREs) aim to offer a good compromise between prediction accuracy and interpretability by selecting a small set of the most important prediction rules. The accuracy of tree-based methods, such as single decision trees, is known to be negatively affected by measurement error. The PRE algorithm is based on single decision trees, which are turned into an ensemble of multiple rules, and may thus inherit the negative effect of measurement error. However, the influence of measurement error on the performance of PREs has not been investigated extensively before. We therefore evaluated the impact of measurement error on the performance of PREs through two simulation studies: one for data with continuous predictor variables and one for data with binary predictor variables. Both studies focus solely on binary classification. We found that the predictive accuracy of PREs, as measured by AUC values, deteriorated in the presence of measurement error. More importantly, performance deteriorated further as the amount of measurement error increased, in both the binary and the continuous predictor scenarios. In addition, we evaluated the performance of PREs in terms of the number of correctly selected rules and the type I and type II errors. We found that, apart from reducing predictive performance, measurement error can also harm the interpretability of the fitted ensemble by causing wrong rules to be selected, which leads to unreliable and incorrect conclusions.
Keywords: RuleFit, prediction rule ensembles, measurement error, classification error, reliability, type I error, type II error