Employment contract types are often distinguished into permanent, flexible, and other types of contracts. Accurate estimates of their frequencies are valuable for socio-economic research and legislative purposes. In member states of the European Union, the Labour Force Survey (LFS) is used to acquire such estimates. In the Netherlands, the Employment Register (ER) is available in addition to the LFS, and employment contract type frequencies can be estimated from it as well. Estimates based on the two indicators are known to differ substantially and consistently. Studies have found several plausible contributing factors for the inconsistencies. However, when these factors (excluding measurement error) are taken into account, a substantial part of the inconsistencies remains unexplained. The true employment contract type can be regarded as a latent variable of which the ER and the LFS are indicators. Potentially, the inconsistencies between the indicators result from a difference in measured concept. Aside from the true employment contract type, direct effects (DEs) may exist from external covariates on the recorded employment contract type in the ER or the LFS. If so, such covariates are a source of differential item functioning (DIF) for the indicators. This study focuses on potential DIF for the ER and the LFS. An attempt is made to deduce DIF using latent class (LC) analysis. LC models in which various types of DEs are included are compared with a stepwise likelihood-ratio test (LRT) method, based on Masyn (2017), and an exhaustive Bayesian information criterion (BIC) method. Over multiple datasets, the results for both methods were inconsistent. Additionally, there was little agreement between the methods. The exhaustive BIC method was more conservative, as all best-fitting models were nested in the best-fitting models of the stepwise LRT method. To test the performance of the assessed methods in scenarios with two indicators, an additional simulation study is included. It was found that when no DEs were present, both methods deduced the correct relationships in all cases. However, when DEs were present, both methods performed poorly in deducing the correct relationships. Correct relationships between covariates and indicators were found more often when DIF was relatively simple and effect sizes were relatively large. The moderate success of the stepwise LRT method with two indicators had not been described in the literature thus far. As the results for the real data were inconsistent and the simulation study showed poor performance overall for the assessed methods, no decisive evidence was found that a specific covariate is a source of DIF for the employment contract type as recorded in the ER or the LFS. However, as there are hints of DIF, a difference in measured concept cannot be ruled out. Follow-up research should consider other avenues to investigate the question at hand, as the assessed methods gave unsatisfactory results.
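A hedged, schematic statement of the latent class structure described above (standard latent-class-with-covariates notation; the exact parameterization used in the thesis is not given in the abstract): without DIF, covariates Q affect the indicators only through the latent contract type C,

\[
P\big(Y^{\mathrm{ER}}=y,\, Y^{\mathrm{LFS}}=z \mid Q=q\big)
  = \sum_{c} P(C=c \mid Q=q)\, P\big(Y^{\mathrm{ER}}=y \mid C=c\big)\, P\big(Y^{\mathrm{LFS}}=z \mid C=c\big),
\]

whereas a direct effect of Q on, say, the ER indicator replaces \(P(Y^{\mathrm{ER}}=y \mid C=c)\) by \(P(Y^{\mathrm{ER}}=y \mid C=c, Q=q)\). The compared LC models differ in which of these direct-effect terms are included.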
Machine learning algorithms are frequently deployed for predictive classification problems. However, as implied by the No Free Lunch (NFL) theorem, not every algorithm is destined to perform well on a given data set. One is often interested in whether the predictive capacity of an algorithm is significantly better than chance level (better than chance) for a given choice of the hyperparameter(s). Machine learning algorithms generally lack the intrinsic statistical framework of statistical learning algorithms to make statements about better than chance performance. In supervised binary classification problems, multiple methods have been proposed to do so. These have been shown to be flawed. Arguably, this can be attributed to the idea that the NFL also applies to these better than chance methods, or more generally to any test, suggesting that the performance of a test depends on the type of signal. Therefore, in the current project, we propose novel global test (GT) based tests that are in accordance with the signal detected by their respective learning algorithm. To do so, we reformulated two popular machine learning algorithms, k-nearest neighbors (kNN) and random forest, as empirical Bayesian linear models. It turned out we can construct tests not only for specific (combinations of) hyperparameters but also for sets of hyperparameters. Properties of these tests have been explored for simulated linearly and nonlinearly separable alternatives as well as for real world data. Results from our simulation studies indicated that our novel tests had competitive power characteristics compared to existing methods. Moreover, we demonstrated their applicability to real world data. Our findings indicate that our novel tests for kNN and random forest can be readily used to assess better than chance performance. Equally importantly, the GT framework we exploited can be applied to construct tests for other learning algorithms. Ultimately, our tests and possible future GT based tests add to the list of existing methods that each serve a niche in the detection of a better than chance signal for a learning algorithm.
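A hedged sketch of the statistic underlying global test (GT) based tests (the classical form of Goeman's global test, written up to centering and scaling; that the thesis-specific tests replace the linear kernel with kernels induced by kNN and random forest is an inference from the abstract, not a quoted result):

\[
S = (Y - \hat\mu_0)^\top R \,(Y - \hat\mu_0),
\qquad R = \tfrac{1}{p}\, X X^\top \ \text{(linear kernel)},
\]

where \(\hat\mu_0\) is the fitted mean under the null model without predictors; large values of S indicate better-than-chance association between predictors and outcome. Choosing R to match a learning algorithm (for example a kNN smoother matrix or a random forest proximity matrix) yields a test attuned to the signal that the corresponding learner can detect.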
This thesis research aims to improve traffic sign detection within dashcam footage by using temporal information. Essentially, a video is a set of images displayed at a fast rate. Temporal information lies in the similarity across subsequent frames. However, current state-of-the-art object detection frameworks only use single images. To test whether temporal information can increase the performance of a Convolutional Neural Network (CNN), we train three models: YoloV5, a 3D CNN and a 4D CNN. First, YoloV5 is used to benchmark the other models against a state-of-the-art framework for object detection. Second, the existing architecture of YoloV5 is adopted as a basis for the 3D CNN. After tuning the hyperparameters of the 3D CNN, its performance is compared to YoloV5. Third, the 3D CNN is changed into a 4D CNN that processes sets of frames. By combining the frames within a set, the information in each frame is fused together, including the temporal information across the frames. We call this temporal information fusion (TIF). Comparing the performance of the 3D CNN to that of the 4D CNN shows the effect of TIF. In this research, a balanced dataset of 444 sets of frames containing traffic signs from dashcam videos is used to train and test the models. The objective is to correctly classify the traffic signs on the frames. The results show that the addition of TIF alone can increase the accuracy of a CNN model by 2%. The main drawback of using TIF is an increase in processing time. Instead of a single image, the network needs to process a set of images, which naturally takes longer. The results in this research can form a basis to explore TIF in object detection further.
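As an illustration of how information across a set of frames can be fused (a minimal sketch, not the thesis architecture; tensor shapes, layer sizes, and the number of classes are assumptions), a 3D convolution slides a kernel over a stack of consecutive frames so that each activation mixes spatial and temporal information:

```python
import torch
import torch.nn as nn

# A set of 4 consecutive dashcam frames, 3 RGB channels, 128x128 pixels,
# laid out as (batch, channels, frames, height, width) as Conv3d expects.
frames = torch.randn(1, 3, 4, 128, 128)

fusion = nn.Sequential(
    # The kernel spans 3 neighbouring frames, so temporal context is fused
    nn.Conv3d(in_channels=3, out_channels=16, kernel_size=(3, 3, 3), padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),   # collapse the temporal and spatial dimensions
    nn.Flatten(),
    nn.Linear(16, 10),         # e.g. 10 traffic-sign classes (assumed number)
)

logits = fusion(frames)
print(logits.shape)  # torch.Size([1, 10])
```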
Product data can be useful for performing environmental impact assessments of product life-cycles. In order to automate such assessments, this research examines methodologies that address the challenges in the processing and classification of product data. We consider a large, imbalanced, and multilingual dataset with short and noisy product descriptions that have been labeled by human annotators. The product classes are hierarchically ordered and characterized by two levels. To treat the class imbalance we proposed two data enrichment methods for the training data: oversampling and a web scraping method with prior filtering. The web scraper parsed web data using a search engine and used Sentence-BERT with cosine similarity to assess semantically relevant information. In addition, we proposed two classification methods, Support Vector Machines (SVM) and BERT. Both models were evaluated in several experiments considering a flattened and a hierarchical classification approach for the products. In addition, we performed an extensive error analysis on the model results considering the SVM feature importance and the BERT attention weights. The results showed that both models achieve similar flattened classification performance using the normal data, i.e. no data enrichment. SVM showed better flattened classification performance after the class imbalance was treated with data enrichment. BERT showed poor performance using data enrichment and overfitted the training data. Hierarchical classification improved the classification performance of BERT using oversampling. SVM did not benefit from the hierarchical classification approach and showed better classification performance using flattened classification. Finally, the error analysis showed that the data contain incorrect or subjective manual labels. The SVM feature importance and BERT attention weight results suggest that nonrepresentative tokens or out-of-vocabulary tokens tend to decrease the classification performance.
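A minimal sketch of the prior-filtering step described above, using Sentence-BERT embeddings and cosine similarity (the sentence-transformers calls are standard; the model name, similarity threshold, and example strings are illustrative assumptions):

```python
from sentence_transformers import SentenceTransformer, util

# Multilingual model, since the product descriptions are multilingual (model choice assumed)
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

product = "rvs koffiezetapparaat 1.2l"          # short, noisy product description
snippets = [
    "Stainless steel coffee maker with 1.2 litre jug",
    "Cheap flights to Lisbon - book now",
    "Espresso machine descaling instructions",
]

emb_product = model.encode(product, convert_to_tensor=True)
emb_snippets = model.encode(snippets, convert_to_tensor=True)

# Keep only scraped snippets that are semantically close to the product description
scores = util.cos_sim(emb_product, emb_snippets)[0]
keep = [s for s, score in zip(snippets, scores) if score > 0.5]  # threshold assumed
print(keep)
```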
Functional Magnetic Resonance Imaging (fMRI) data capturing the BOLD response for various voxels through time for a single subject can reveal idiosyncratic dynamic functional connectivity (dFC) patterns underlying a subject's brain responses. These dFC patterns are known to be related to mental disorders, like schizophrenia (Lynall et al., 2010) and Alzheimer's disease (AD; Gili et al., 2011). Current directions in neuroscience hope to identify possible types and subtypes of mental disorders. In this thesis, we make the assumption that heterogeneity in dFC patterns across subjects may be indicative of such mental disorder (sub)types. To detect these (sub)types based on dFC patterns, we propose the clusterwise INDSCAL model to analyze multi-subject rs-fMRI data, which is a generalization of the K-INDSCAL model of Bocci and Vichi (2011). In this model, subjects with similar dFC patterns are clustered together, whereas subjects with clearly different patterns are allocated to different clusters. As such, clusterwise INDSCAL captures heterogeneity between subjects in dFC patterns and is able to identify unknown disease (sub)types. An Alternating Least Squares (ALS) algorithm to estimate the model parameters is presented, in which the clustering and the model parameters for each cluster are updated alternatingly. This algorithm, along with a model selection heuristic to determine the optimal number of clusters and dFC patterns, is evaluated in an extensive simulation study in which several data characteristics (e.g., signal-to-noise ratio, similarity of clusters) are manipulated. The results show that the CLINDSCAL algorithm can successfully identify the true clustering of the patients and their underlying dFC patterns. Further, when the spatial overlap in dFC patterns between the clusters increases, the performance of the algorithm in terms of recovering the clustering of the patients decreases. It can be concluded that CLINDSCAL is an interesting tool to discover a natural subject clustering with subject clusters differing in the dFC patterns underlying their data. Such a clustering may point at the existence of (yet unknown) mental disorder (sub)types.
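Schematically, and as a hedged sketch only (the precise clusterwise formulation is given in the thesis, not the abstract), an INDSCAL-type decomposition approximates the connectivity (scalar-product) matrix \(S_i\) of subject i assigned to cluster c by cluster-specific dimensions with subject-specific weights:

\[
S_i \approx X_c\, W_i\, X_c^\top + E_i ,
\]

where the columns of \(X_c\) hold the dFC patterns of cluster c, \(W_i\) is a diagonal matrix of nonnegative subject weights, and \(E_i\) is error. The ALS algorithm then alternates between reassigning subjects to clusters and updating \(X_c\) and the \(W_i\) given the current assignment.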
The emergence of automated high-throughput phenotyping platforms provides us with the potential for powerful genome-wide association studies (GWAS) using image phenotypes. A major challenge of GWAS on imaging-genetics datasets is to define a meaningful representation of the traits, to make the data amenable to GWAS-based prediction. We propose a different approach of reverse-GWAS mapping, which predicts the genetic markers from the phenotypes rather than the other way around. This method would allow us to use chlorophyll fluorescence images as phenotypes to identify markers that influence photosynthesis efficiency in Arabidopsis thaliana. We implemented several deep learning methods for reverse-GWAS, based on Convolutional Neural Networks (CNNs), including well-known architectures such as ResNet-50, DenseNet-121, and Xception. The results suggest that the various convolutional neural networks are not able to classify SNP markers with high accuracy. The main challenge seems to be the highly overlapping photosynthesis efficiency of alleles a and b, even for the most significant SNP markers detected from the GWAS, which makes it difficult for the models to classify the alleles correctly. Moreover, the individual narrow-sense heritability of the trait is low, indicating that the additive genetic effect is low. Furthermore, the number of accessions used in this study is relatively low compared to the approximately 6,000 registered accessions. Hence, it is possible that accessions with high and low photosynthesis efficiency are not included in the study. For future studies, adding more accessions might improve the accuracy of reverse-GWAS.
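A minimal sketch of the reverse-GWAS classification setup (standard torchvision API; the image size and the use of a two-class head for the alleles of a single SNP marker are illustrative assumptions):

```python
import torch
import torch.nn as nn
from torchvision import models

# ResNet-50 backbone; the final layer is replaced by a 2-class head predicting
# the allele (a vs b) of one SNP marker from a chlorophyll fluorescence image.
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 2)

images = torch.randn(8, 3, 224, 224)   # batch of fluorescence images (shape assumed)
alleles = torch.randint(0, 2, (8,))    # 0 = allele a, 1 = allele b

criterion = nn.CrossEntropyLoss()
loss = criterion(model(images), alleles)
loss.backward()
print(float(loss))
```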
Statistical hypothesis testing is central to many scientific fields. Testing many hypotheses simultaneously is called multiple testing. The main concern in multiple testing is to ensure that most of the rejected null hypotheses are indeed false, i.e., that the number of incorrect rejections remains low. A major challenge in multiple testing is to account for the complex dependencies in the data. A powerful approach in this regard is offered by permutation-based multiple testing methods. These methods make few distributional assumptions. In fact, they often make only one assumption, called joint exchangeability. In this thesis we investigate the robustness of these methods to violations of this assumption. We do this by means of simulations, where we focus on case-control data. We find that, while the theoretical literature always makes the mentioned assumption, it is often not necessary in practice. Thus, this thesis provides further evidence for the validity of these powerful methods in practice.
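A minimal sketch of a permutation-based multiple testing procedure of the kind studied here, on simulated case-control data (a generic single-step max-statistic procedure; the specific methods evaluated in the thesis may differ, and all sizes and effect values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 100                        # 40 subjects, 100 features (sizes assumed)
labels = np.repeat([0, 1], n // 2)    # case-control labels
X = rng.normal(size=(n, p))
X[labels == 1, :5] += 1.0             # signal in the first 5 features

def group_diff(X, labels):
    """Per-feature absolute mean difference between cases and controls."""
    return np.abs(X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0))

obs = group_diff(X, labels)

# Permute the labels: under joint exchangeability this preserves the joint null
# distribution of the feature-wise statistics, including their dependence structure.
B = 1000
max_null = np.empty(B)
for b in range(B):
    perm = rng.permutation(labels)
    max_null[b] = group_diff(X, perm).max()

# Single-step max-statistic adjusted p-values controlling the family-wise error rate
adj_p = (1 + (max_null[None, :] >= obs[:, None]).sum(axis=1)) / (B + 1)
print((adj_p < 0.05).sum(), "features rejected")
```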
In this study, the combination of multi-state survival analysis and causal inference was used to estimate the probability of an event occurring as a function of treatment timing. This study follows through from the results and recommendations of a previous methodological research project estimating the average pregnancy probability as a function of intrauterine insemination (IUI) treatment timings, using observational data from a prospective cohort study in the Netherlands. The study applied an illness-death multi-state model with expectation management as the initial state, IUI treatment as the transition state, and pregnancy as the final or absorbing state. To study the performance of causal multi-state survival analysis, multiple datasets were generated by simulation, with a woman's age following a standardised normal distribution and treatment timings following an exponential distribution. Five treatment strategies were considered: when a patient receives the treatment without delay; when treatment is delayed until three, six, or nine months after the start of follow-up; and when treatment is delayed indefinitely, i.e., the patient does not receive treatment during the observation period. For each strategy, the pregnancy probability for an individual and the group average were estimated using the causal multi-state model and compared to the calculated true values for an observation period of 1.5 years from the start of follow-up. Variance, bias, and the root mean square error (RMSE) were used as performance measures to assess whether the method can accurately estimate the average pregnancy probabilities by treatment strategy over time. The results from the performance measures indicate that the methodology can provide precise and unbiased estimates. Future work in this area includes introducing a mechanism for censoring in the data generating step of the simulation, exploring other probability distributions to generate the transition times, and comparing the results for the multi-state approach with those for other similar methodologies, such as inverse probability weighting, used to estimate the outcomes of treatment timing from observational data.
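A minimal sketch of a data-generating step of the kind described, for the "treat after three months" strategy (exponential transition times in a simplified illness-death structure; all hazard rates and the 18-month horizon are illustrative assumptions, not the thesis parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
horizon = 18.0                      # months of follow-up (1.5 years)
delay = 3.0                         # strategy: delay IUI treatment until month 3

# Illustrative monthly hazards (assumed values)
rate_preg_untreated = 0.03          # expectation management -> pregnancy
rate_preg_treated = 0.08            # after IUI treatment -> pregnancy

# Time to pregnancy under expectation management only
t_untreated = rng.exponential(1 / rate_preg_untreated, n)

# Under the strategy, women still not pregnant at `delay` switch to the
# treated hazard from that point onward (exponential times are memoryless).
t_after_treatment = delay + rng.exponential(1 / rate_preg_treated, n)
t_pregnancy = np.where(t_untreated <= delay, t_untreated, t_after_treatment)

true_prob = np.mean(t_pregnancy <= horizon)
print(f"P(pregnancy within {horizon} months | treat at {delay} months) = {true_prob:.3f}")
```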
Background - Recent studies suggest that adding genomic markers leads to improved risk prediction and reduces over-treatment for women with early stage breast cancer. A more refined evaluation from a decision analytic perspective may be informative for patients and policymakers. We aim to examine the clinical utility of two recently proposed genomic markers.
Methods - We reanalyzed aggregated data from the MINDACT and TAILORx trials, including N = 6653 and N = 10253 women, to evaluate the clinical utility of the MammaPrint and OncotypeDX tests. Clinical utility was quantified using Net Benefit, reflecting the relation between 8-year distant metastasis free interval (DMFI) and the number of women receiving chemotherapy. Net Benefit balances the DMFI and the number of chemotherapy courses by a weight; this weight is determined by the decision threshold, which indicates the point at which the gain in DMFI is considered sufficiently high to indicate chemotherapy. Key parameters were estimated from the two trials, including distributions for clinical and genomic risks, where the statistical correlation between clinical and genomic risks was retained. First, a reanalysis was performed for the MINDACT and TAILORx trials separately using decision analytic modelling, and we compared the strategies proposed in the trials with decision analytic modelling. Second, we resimulated from the MINDACT trial population to enable a direct comparison between the MammaPrint and OncotypeDX genomic tests. Here, the effectiveness of chemotherapy was estimated from earlier randomized controlled trials (HR = 0.64), similar to the PREDICT decision support tool. We also assumed a similar clinical risk function. This approach allowed for estimating individualized risk distributions and expected benefit distributions differing only in the specific genomic marker used. Third, sensitivity analyses examined how different qualities of baseline models, quality of genomic tests, effectiveness of chemotherapy, and decision thresholds affected clinical utility.
Results - Results show that using decision analytic modelling leads to a more favorable balance between benefits and harms. For MINDACT, decision analytic modelling increased Net Benefit from 0.55 to 0.58, whereas for TAILORx the Net Benefit increased from 0.04 to 0.05. The comparison of the genomic markers shows that OncotypeDX performs similarly to or better than MammaPrint. Although the average DMFI was similar (MammaPrint: 92.38 vs OncotypeDX: 92.49), the Net Benefit was higher for OncotypeDX (MammaPrint: 0.31 vs OncotypeDX: 0.50). Moreover, MammaPrint had a prognostic effect of HR = 2.37, whereas the dichotomized OncotypeDX (i.e. dichotomized at the same proportion of high risks as MammaPrint) showed a significantly higher prognostic effect of HR = 3.23. The sensitivity analysis showed that the clinical utility of genomic markers depended on the quality of the baseline model, the effectiveness of chemotherapy, and the decision threshold for the expected benefit.
Conclusions - Decision analytic modelling confirms that the MammaPrint and OncotypeDX genomic tests both have clinical utility, with OncotypeDX potentially outperforming MammaPrint. Decision analytic modelling provides detailed information on the expected benefits of treatments, which can assist in shared decision-making about adjuvant chemotherapy. Further validation and direct comparison are needed to optimally compare and evaluate the clinical utility of MammaPrint and OncotypeDX.
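For reference, the standard decision curve analysis form of Net Benefit, which the measure described above follows in spirit (a hedged sketch; the thesis adapts it to 8-year DMFI and chemotherapy courses, and the exact expression is not given in the abstract):

\[
\mathrm{NB} = \frac{\text{true positives}}{n} \;-\; \frac{\text{false positives}}{n}\cdot\frac{p_t}{1-p_t},
\]

where \(p_t\) is the decision threshold. A higher threshold (a larger gain in DMFI required before chemotherapy is indicated) weighs each unnecessary chemotherapy course more heavily in the harm term.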
Survival analysis deals with the study of the time until an event of interest occurs. The Cox Proportional Hazards model (Cox model) is commonly used to model the relationship between a survival outcome and a set of cross-sectional covariates, but it cannot handle longitudinal covariates, i.e. covariates that are repeatedly measured over time. Traditional ways to deal with longitudinal covariates include joint modelling, landmarking and the time-dependent Cox model, but to date their applicability has mostly been restricted to problems with a small number of longitudinal covariates. Recently, the increasing availability of repeated measurements in biomedical studies has motivated the development of statistical methods specifically designed to predict survival from a large (potentially high-dimensional) number of longitudinal covariates. Because such methods are still quite new, little is known about how they may perform in practice. The aim of this thesis is to compare the performance of various statistical methods to predict survival on a real dataset where many longitudinal covariates are available as predictors. Four methods were chosen for comparison: two novel methods employing different techniques to harness the longitudinal information, Penalized Regression Calibration (PRC) and the Multivariate Functional Principal Component Cox (MFPCCox) model, and penalized Cox models using landmarking (last observation carried forward) and baseline measurements respectively. These methods were applied to data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study in the context of dynamic prediction of time to develop dementia. The ADNI study monitored the development of dementia in a cohort of elderly individuals, and collected an extensive, heterogeneous set of markers over multiple years of follow-up. Predictions were computed using a total of 26 covariates, of which 21 were longitudinal. The predictive performance of the models was evaluated using three performance measures (time-dependent AUC, C index, and Brier score). The results showed that the best performing method depended on the choice of performance measure, landmark time, and prediction time. Landmarking was the best performing method when looking at the time-dependent AUC and C index, whereas PRC was the best performing method in terms of Brier score. Landmarking, PRC, and MFPCCox outperformed the baseline model that ignored the follow-up information, suggesting that the longitudinal information in the ADNI data can be used to improve predictions for dementia. Overall, our results seem to indicate that for the ADNI data a simple approach such as landmarking may be enough to deliver accurate predictions, when compared to more sophisticated approaches (PRC and MFPCCox) that model the trajectories of longitudinal covariates.
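A minimal sketch of landmarking with last observation carried forward followed by a penalized Cox fit (using the lifelines API; the data layout, landmark time, and penalty value are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200

# Long-format repeated measurements: one row per subject per visit (structure assumed)
long = pd.DataFrame({
    "id": np.repeat(np.arange(n), 3),
    "visit_time": np.tile([0.0, 1.0, 2.0], n),
    "biomarker": rng.normal(size=3 * n),
    "event_time": np.repeat(rng.exponential(5.0, n), 3),
    "event": np.repeat(rng.integers(0, 2, n), 3),
})

landmark = 2.0  # prediction starts at this landmark time (assumed)

# Landmark dataset: subjects still at risk at the landmark, with the last
# observed biomarker value carried forward (LOCF) to the landmark time.
at_risk = long[(long["visit_time"] <= landmark) & (long["event_time"] > landmark)]
lm = at_risk.sort_values("visit_time").groupby("id").last().reset_index()
lm["time_from_landmark"] = lm["event_time"] - landmark

# Penalized (ridge) Cox model fitted on the landmark dataset
cph = CoxPHFitter(penalizer=0.1, l1_ratio=0.0)
cph.fit(lm[["biomarker", "time_from_landmark", "event"]],
        duration_col="time_from_landmark", event_col="event")
cph.print_summary()
```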
Survival analysis studies time-to-event outcomes. One of the main characteristics of survival data is that some survival times are not observed; we call those observations censored. Standard methods to analyze censored data, like the Kaplan-Meier estimator or the Cox proportional hazards model, assume that censored observations are independent of the time to event, and we call this type of censoring non-informative. In real life studies, however, this is not always the case and the censoring may depend on the time to event either directly or through covariates. In that case the censoring is called informative or dependent, and using the standard methods can lead to biased results. In this thesis we first examined how serious the issue of dependent censoring is by generating data with dependent censoring using two methods, one with two time-independent covariates and one with two time-independent and one time-dependent covariate, and studying how much bias is introduced if we assume independent censoring in the analysis. Different approaches have been proposed to correct for dependent censoring; one of them is Inverse Probability of Censoring Weighting (IPCW). In the second part, we performed a simulation study to evaluate the performance of the IPCW method in the presence of dependent censoring, for each of the two methods we examined in the first part. Results showed that the survival curves estimated by the traditional Kaplan-Meier method have only small bias in most cases. The bias increased when the dependent censoring got stronger. The IPCW method overall performs well and corrects for the presence of dependent censoring, but it is not able to correct the bias fully when the dependency is too strong or when we introduced a time-dependent covariate which is subject to measurement error.
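A hedged sketch of the IPCW idea in its standard formulation (the exact estimator used in the thesis is not given in the abstract): each subject still at risk at time t is weighted by the inverse of their estimated probability of remaining uncensored given their covariates,

\[
w_i(t) = \frac{1}{\hat K(t \mid Z_i)},
\]

where \(\hat K(t \mid Z_i)\) is the conditional survival function of the censoring time (estimated, for example, with a Cox model for the censoring process). Using these weights in a weighted Kaplan-Meier or Cox analysis re-balances the risk sets and removes the bias induced by covariate-dependent censoring, provided the censoring model is correctly specified.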
The technique of whole slide imaging (WSI) boosts the application of deep learning in medical image analysis and computational pathology. However, fully supervised learning runs into a bottleneck due to its heavy reliance on manual annotations, which require specific expertise and are expensive to obtain. Self-supervised learning, which is supervised by signals generated from the data itself, is a potential solution. It has been shown to perform as well as supervised learning on ImageNet classification tasks. Yet, its performance on medical image classification is largely unexplored. This study verifies the effectiveness of four self-supervised learning algorithms, SimCLR, MoCo, SwAV and Barlow Twins, for detecting anatomic structures on kidney biopsy WSI. In the pretext task, these self-supervised learning algorithms are trained for 500 epochs with the same backbone architecture, ResNet-50, initialized with weights pre-trained on ImageNet. The evaluation protocol is a semi-supervised linear classifier, implemented using multinomial logistic regression. The results of the classification task show that the features extracted by the four algorithms all achieve good accuracy scores, higher than 85% with only 10% of the labels. Among them, SwAV outperforms the other algorithms both overall and for each class. Through this study, self-supervised learning algorithms exhibit the potential for more complex tasks related to renal pathology.
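A minimal sketch of the linear evaluation protocol described: features from the frozen self-supervised backbone are classified with multinomial logistic regression trained on only 10% of the labels (the feature arrays below are random placeholders standing in for ResNet-50 encodings; the number of classes is assumed):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder for 2048-d ResNet-50 features of WSI tiles and their class labels
features = rng.normal(size=(5000, 2048))
labels = rng.integers(0, 5, size=5000)        # e.g. 5 anatomic structure classes (assumed)

# Keep only 10% of the labels for training the linear classifier
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, train_size=0.10, stratify=labels, random_state=0)

clf = LogisticRegression(max_iter=1000)  # multinomial logistic regression on frozen features
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```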
A single tree model is very easy to understand and interpret. However, it is often unstable and relatively inaccurate. The aim of this article is to evaluate and improve the performance of single tree algorithms. In total, three single tree algorithms were evaluated: Classification and Regression Trees (CART), applied with the R package 'rpart'; Evolutionary Trees, applied with the R package 'evtree'; and a new method combining Bayesian Additive Regression Trees (BART) and born-again trees. We did a benchmark study on six different datasets and found that the evolutionary trees and born-again trees both perform better than CART in terms of accuracy. The relative performance of evolutionary and born-again trees depended on the dataset. Evolutionary trees performed better on relatively larger datasets and born-again trees performed better on relatively smaller datasets. However, these single tree methods still showed a large gap in performance compared to BART, especially when applied to large datasets. We conclude that there is still room for improvement of single trees compared to ensemble methods.
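A minimal sketch of the born-again tree principle: an ensemble is fitted first and a single tree is then trained to mimic its predictions (scikit-learn's gradient boosting is used here as a stand-in for BART, which scikit-learn does not provide, so this illustrates the idea rather than the thesis implementation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: fit a flexible ensemble (stand-in for BART)
ensemble = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Step 2: "born-again" tree -- a single tree trained on the ensemble's predicted
# labels, so it approximates the ensemble with one interpretable model
born_again = DecisionTreeClassifier(max_depth=4, random_state=0)
born_again.fit(X_tr, ensemble.predict(X_tr))

# Step 3: compare a plain single tree with the born-again tree
plain = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print("plain tree:     ", accuracy_score(y_te, plain.predict(X_te)))
print("born-again tree:", accuracy_score(y_te, born_again.predict(X_te)))
```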
Gaussian graphical models (GGMs) are probabilistic models that represent the conditional independence between random variables and present them in a graph. These models are applied in a variety of domains, such as the social sciences, economics and the natural sciences, when visualizing the topology of a network. However, traditional GGMs can be improved by conditioning the estimation of the network on another related data source. The resulting models are called conditional Gaussian graphical models (cGGMs). In these models, the graph of the primary data source is estimated conditional on another related data source. Most developments in the field of cGGMs use an ℓ1 norm penalty, which can show shortcomings in certain scenarios. In this thesis, we proposed three simple cGGM estimation methods using ℓ2 norm penalty parameters as an alternative to these methods. We conducted a simulation study and a real data analysis to test our proposed estimation methods. Our results demonstrated that several of our proposed estimation methods better reconstruct the network topology when compared to ℓ1-based cGGMs and the GGM.
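For reference, a hedged sketch of the general form of penalized GGM estimation (the thesis proposes ℓ2-penalized conditional variants whose exact objectives are not stated in the abstract): the precision matrix Θ is estimated by maximizing a penalized Gaussian log-likelihood,

\[
\hat\Theta = \arg\max_{\Theta \succ 0}\ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda\, P(\Theta),
\]

where S is the sample covariance matrix and the graph is read off from the off-diagonal entries of \(\hat\Theta\). Taking \(P(\Theta)\) as an (off-diagonal) ℓ1 norm gives the graphical lasso, whereas a ridge-type choice such as \(P(\Theta)=\lVert\Theta\rVert_F^2\) corresponds to the ℓ2 penalties considered here; a cGGM additionally conditions this estimation on the secondary data source.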