In scientific research, interpretability and high predictive performance are difficult to combine: while black-box models predict better than interpretable models, only the latter allow for transparency and inference, which are necessary when models are used for decision-making or hypothesis testing. Models such as RuleFit combine the flexibility of a black-box tree ensemble with the interpretability of a sparse LASSO linear regression. Later work (Horserule) substitutes Bayesian regression for the LASSO step, further improving the model's predictive performance. The work in this thesis was two-sided: on the one hand, we applied a different Bayesian prior (the informative Horseshoe prior) to the linear step of the RuleFit model, a prior that can naturally take the structure of RuleFit into account; on the other hand, we used Shapley values to measure the contribution of each predictor in the RuleFit model and combined these values with the Bayesian regression to build inferential tools. The new machinery was tested on both synthetic data and the dataset from the Helius study. The predictive performance of the resulting model was higher than that of the original RuleFit model, but lower than that of Horserule. Compared to Horserule, the proposed model excessively favours trees over linear terms, but in doing so it more strongly enforces the choice of simpler trees. Shapley values were also compared to other importance measures from the RuleFit literature and shown to be more accurate in reconstructing the contributions as defined in the synthetic datasets.
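The RuleFit pipeline summarised above (fit a tree ensemble, turn each root-to-leaf path into a binary rule, then fit a sparse linear model on the original features plus the rule indicators) can be sketched roughly as follows. This is an illustrative reconstruction using scikit-learn under assumed settings (ensemble size, tree depth, LASSO penalty), not the thesis's actual implementation, which further replaces the LASSO step with a Horseshoe-prior Bayesian regression:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso

def extract_rules(tree):
    """Collect one rule (a conjunction of split conditions) per leaf."""
    t = tree.tree_
    rules = []

    def recurse(node, conditions):
        if t.children_left[node] == -1:        # leaf: the path so far is a rule
            if conditions:
                rules.append(list(conditions))
            return
        f, thr = t.feature[node], t.threshold[node]
        recurse(t.children_left[node], conditions + [(f, thr, "<=")])
        recurse(t.children_right[node], conditions + [(f, thr, ">")])

    recurse(0, [])
    return rules

def rule_features(X, rules):
    """Binary design matrix: column j is 1 where a sample satisfies rule j."""
    Z = np.ones((X.shape[0], len(rules)))
    for j, rule in enumerate(rules):
        for f, thr, op in rule:
            Z[:, j] *= (X[:, f] <= thr) if op == "<=" else (X[:, f] > thr)
    return Z

# 1) Fit a small black-box tree ensemble (hyperparameters are illustrative).
X, y = make_friedman1(n_samples=400, n_features=10, random_state=0)
ens = GradientBoostingRegressor(n_estimators=20, max_depth=3,
                                random_state=0).fit(X, y)

# 2) Turn every root-to-leaf path of every tree into a binary rule feature.
rules = [r for est in ens.estimators_.ravel() for r in extract_rules(est)]
Z = rule_features(X, rules)

# 3) Sparse linear fit on [original linear terms, rule terms]; the LASSO
#    penalty zeroes out most rules, keeping an interpretable subset.
design = np.hstack([X, Z])
lasso = Lasso(alpha=0.05, max_iter=10000).fit(design, y)
n_active = int(np.sum(lasso.coef_ != 0))
```

The resulting model is linear in its (rule and raw-feature) terms, which is what makes the subsequent Bayesian treatment of the coefficients, and predictor-wise importance measures such as Shapley values, straightforward to attach.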