Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
Currently, platelet transfusion is the main treatment for patients with thrombocytopenia due to haematological malignancy and intensive chemotherapy. When the platelet count is low, a transfusion is given to prevent bleeding. However, the platelet count is not the only determinant of bleeding (Ypma et al., 2019). Other biomarkers, such as the albumin/creatinine ratio measured in urine, might predict bleeding additionally or even better. This thesis project determines the predictive value of these new biomarkers for the "untreated risk" of bleeding: the risk of bleeding if patients did not receive a transfusion. We used a real dataset of 88 patients with 116 thrombocytopenic episodes, during which patients' platelet counts are low and bleeding may develop. One problem is that some patients received transfusions, which makes the "untreated risk" difficult to predict. Another problem is that transfusions were given partly based on the platelet count, so the effect of transfusion on bleeding is confounded by platelet count. We considered two situations. The one-day situation was to predict bleeding during the day based on the platelet count measured in the morning. The two-day situation was to predict bleeding in the next two days, but before the night of the second day, based on the platelet count measured on the morning of the first day. In the first part of this thesis, we structured the relationships between biomarkers, transfusions and bleeding by expressing them in causal diagrams. Using the causal diagrams, we identified why conventional models fail to predict the untreated risk in the two-day situation, and we found that the marginal structural model might be a solution.
In the second part, we set up a simulation to verify whether the marginal structural model or conventional regression models can handle the confounding in the one-day situation and the time-dependent confounding in the two-day situation. Based on our simulation studies, we concluded that for the one-day situation the regression model including treatment and predictor was well suited, while for the two-day situation the marginal structural model is recommended to estimate the "untreated" risk. In the third part, we applied the models to the dataset. We found that in the one-day situation the urine albumin/creatinine ratio and the platelet count have potential predictive value for same-day bleeding, while in the two-day situation only the urine albumin/creatinine ratio was significantly associated with the risk of bleeding in all models. Additionally, no clear effect of transfusion was detected in either the one-day or the two-day situation.
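To illustrate why weighting can recover an "untreated" risk when transfusion depends on the platelet count, the sketch below uses inverse probability weighting, the building block of a marginal structural model, at a single time point. It is not the thesis's two-day model with time-dependent weights. The cohort counts, the binary "low platelet count" stratum, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_cohort(counts):
    """Expand {(low_platelets, transfused, bled): n} into one record per episode."""
    return [key for key, n in counts.items() for _ in range(n)]


def naive_untreated_risk(cohort):
    """Bleeding proportion among untransfused episodes, ignoring confounding."""
    untreated = [bled for (_, transfused, bled) in cohort if not transfused]
    return sum(untreated) / len(untreated)


def ipw_untreated_risk(cohort):
    """Weight each untransfused episode by 1 / P(no transfusion | platelet stratum)."""
    n_stratum = Counter(low for (low, _, _) in cohort)
    n_untreated = Counter(low for (low, transfused, _) in cohort if not transfused)
    num = den = 0.0
    for low, transfused, bled in cohort:
        if transfused:
            continue
        # Probability of staying untreated, estimated within the stratum.
        p_untreated = n_untreated[low] / n_stratum[low]
        weight = 1.0 / p_untreated
        num += weight * bled
        den += weight
    return num / den


# Hypothetical counts (assumption): episodes with a low platelet count are
# transfused more often and also bleed more often when left untreated.
HYPOTHETICAL_COUNTS = {
    (True, True, True): 60, (True, True, False): 340,
    (True, False, True): 30, (True, False, False): 70,
    (False, True, True): 5, (False, True, False): 95,
    (False, False, True): 40, (False, False, False): 360,
}

cohort = build_cohort(HYPOTHETICAL_COUNTS)
print("naive untreated risk:", naive_untreated_risk(cohort))
print("IPW untreated risk:  ", ipw_untreated_risk(cohort))
```

In this constructed cohort the untreated risk is 0.30 in the low-count stratum and 0.10 in the other stratum, each making up half of the episodes, so the population untreated risk is 0.20. The naive estimate is lower because most untransfused episodes come from the low-risk stratum, while the weighted estimate restores the original stratum mix.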
Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
This study deals with the introduction of a customer lifetime value for business customers with a focus on lifetime estimations using mobile contracts that are part of larger business contracts of a large Dutch telecom provider. Customer lifetime value is the total profit or loss to a company over the whole period of transactions by a customer. Business customers are defined here as firms or locations of large firms that are contracted for one or more business products of the telecom provider. Customer lifetime values are calculated at the level of mobile contracts and combined per location afterwards. In order to calculate customer lifetime values, individual lifetime predictions and a definition of the values are needed. The lifetime predictions resemble a survival analysis that models the time from becoming contract-free until one of three possible decisions (contract renewal, product migration or contract termination) is made. Using survival estimates and semi-parametric models, the overall survival is analyzed, as well as the influence of characteristics of locations and of the companies to which the locations belong. Then, with the R package mstate, competing risks models are applied to model the time to each decision while taking into account the other possible decisions. Additionally, lifetime estimations that result from the competing risks models are updated, whereby the survival analysis starts several months after becoming contract-free. Results show that approximately 25% of the decisions have been made at the start of the study. The duration of mobile contracts and ownership of a business internet product or a mobile internet product next to the mobile contract discriminate most between the occurrence of the decisions.
Furthermore, results of the competing risks models show that the probabilities of making any decision attenuate over time. This is confirmed with a fictional product offer at the level of both the mobile contract and the business customer. The customer lifetime value as described here is a useful metric for the telecom provider to make customer selections and, after applying it to other business products, it could be used to discriminate between product offers.
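As a minimal sketch of what a competing risks analysis estimates, the code below computes nonparametric cumulative incidence curves for several mutually exclusive decisions. The thesis itself used the R package mstate; this Python function, its name, and the toy data are assumptions for illustration only and do not reproduce that analysis.

```python
def cumulative_incidence(times, causes, censored_label=None):
    """Nonparametric cumulative incidence per cause under competing risks.

    times:  time from becoming contract-free to decision or censoring
    causes: decision label per contract, or censored_label if no decision yet
    Returns {cause: [(event_time, cumulative_incidence), ...]}.
    """
    data = sorted(zip(times, causes), key=lambda pair: pair[0])
    labels = sorted({c for _, c in data if c != censored_label})
    cif = {label: 0.0 for label in labels}
    curves = {label: [] for label in labels}
    surv = 1.0  # probability that no decision of any kind has been made yet
    event_times = sorted({t for t, c in data if c != censored_label})
    for t in event_times:
        at_risk = sum(1 for u, _ in data if u >= t)
        total_events = 0
        for label in labels:
            d = sum(1 for u, c in data if u == t and c == label)
            # Increment uses the decision-free probability just before t.
            cif[label] += surv * d / at_risk
            total_events += d
        surv *= 1.0 - total_events / at_risk
        for label in labels:
            curves[label].append((t, cif[label]))
    return curves


# Toy data (hypothetical): months until decision; None marks a censored contract.
months = [1, 2, 3]
decisions = ["renewal", None, "termination"]
curves = cumulative_incidence(months, decisions)
for decision, curve in curves.items():
    print(decision, curve)
```

Treating the other decisions as competing events, rather than as censoring, keeps the cause-specific probabilities from summing to more than one.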
Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
The area under the receiver operating characteristic (ROC) curve (AUC) is a commonly used measure of the discriminative ability of a model. For the time-to-event variable in survival analysis the case and control sets vary over time, so a dynamic definition of AUC is required. We choose the dynamic AUC defined by the incident true positive rate and the dynamic false positive rate (I/D AUC) proposed by Heagerty and Zheng [6]. However, the difficulty of empirically obtaining the incident true positive rate hampers the estimation of the dynamic AUC. Thus, several semi-parametric and non-parametric estimators have been proposed. Heagerty and Zheng [6] proposed a semi-parametric estimation method based on the Cox model. A non-parametric estimate using an intermediate concordance measure with LOWESS smoothing was proposed by van Houwelingen and Putter [14]. Based on the same intermediate concordance measure, Saha-Chaudhuri and Heagerty suggested using locally weighted mean rank smoothing [10]. Recently, Shen et al. proposed a semi-parametric method that uses fractional polynomials to fit the dynamic AUC [12]. In this thesis, we compare the performance of these methods under different configurations in a series of simulations. The plain Cox method is not recommended when the proportional hazards assumption is not satisfied. The Cox model with time-varying coefficients is relatively stable when the marker has a mediocre effect. For the non-parametric methods, a span/bandwidth that is too wide may lead to large bias, and one that is too narrow may lead to unstable estimates, so a trade-off between bias and standard deviation has to be made. For the fractional polynomial method, adding extra fractional polynomial terms does not improve performance.
In addition, many researchers observed a decreasing trend of the I/D AUC over time in their empirical studies [10][12][6], yet Pepe et al. held the opinion that the I/D AUC may be an increasing function of time [7]. We investigate the trend of the I/D AUC under a Cox model with a binary marker. We observe that under certain Cox models the I/D AUC curve first increases and then decreases; thus the I/D AUC is not necessarily a decreasing function of time.
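The bandwidth trade-off for the non-parametric estimators can be seen in a small sketch. It assumes the raw quantity at each event time is the fraction of subjects still at risk afterwards (controls) whose marker is lower than that of the subject who has the event (the case), with ties counted as one half. The smoother here is a plain moving average, swapped in as a stand-in for LOWESS or weighted mean rank smoothing; function names and data are hypothetical.

```python
def raw_id_concordance(times, events, markers):
    """Per observed event: share of later-at-risk controls with a lower marker."""
    points = []
    for t_i, e_i, m_i in zip(times, events, markers):
        if not e_i:
            continue  # censored subjects never serve as cases
        controls = [m_j for t_j, m_j in zip(times, markers) if t_j > t_i]
        if not controls:
            continue  # no controls left, concordance undefined
        score = sum(
            1.0 if m_i > m_j else 0.5 if m_i == m_j else 0.0
            for m_j in controls
        )
        points.append((t_i, score / len(controls)))
    return sorted(points)


def smooth_auc(points, t, bandwidth):
    """Uniform-kernel average of raw concordances within bandwidth of t."""
    window = [auc for s, auc in points if abs(s - t) <= bandwidth]
    return sum(window) / len(window) if window else None


# Toy data (hypothetical): follow-up time, event indicator, marker value.
times = [1, 2, 3, 4]
events = [1, 1, 1, 0]
markers = [4.0, 3.0, 1.0, 2.0]

points = raw_id_concordance(times, events, markers)
print(points)
print(smooth_auc(points, t=2, bandwidth=0.5))  # narrow window: one event only
print(smooth_auc(points, t=2, bandwidth=1.0))  # wide window: averages over time
```

A narrow window relies on very few events and is therefore unstable, while a wide window borrows concordances from distant time points and can blur a real change in the I/D AUC over time.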