Objective
Randomized controlled trials (RCTs) for rare neurological diseases, such as the Guillain-Barré syndrome (GBS), have a disappointing lack of success, possibly due to inefficient statistical analysis. We aimed to evaluate the impact of covariate adjustment for baseline characteristics, ordinal analysis and repeated assessments on statistical power in randomized controlled trials with ordinal scales as outcome measure.

Methods
We re-analysed a previous trial in GBS (the IVIg + placebo vs IVIg + methylprednisolone trial, n = 221) and conducted power simulations to assess the performance of different approaches for the analysis of ordinal scales such as the GBS Disability Scale under different conditions. The approaches consist of binary logistic regression and proportional odds logistic regression, with and without covariate adjustment for important prognostic factors (MRC sum score and days from onset of weakness to randomisation). The conditions consist of satisfaction of the proportional odds assumption, the use of weaker prognostic baseline characteristics, and quantitative versus qualitative violation of the proportional odds assumption. We extended these approaches to a longitudinal proportional odds model. Simulations varied in sample size and treatment effect.

Results
Covariate adjustment led to an increased estimated treatment effect and an increased standard error in the GBS trial. Proportional odds analysis decreased the standard error in comparison to a binary logistic regression analysis, indicating a more sensitive analysis. The longitudinal proportional odds model resulted in a larger standard error than single time point proportional odds analyses. Simulations for the analysis of continuous data with a linear mixed model confirmed that a longitudinal approach does not increase power compared to a single time point analysis when the within-subject variance is low, as was observed for the GBS trial. In the simulations we focused on the effect of covariate adjustment and ordinal analysis. Simulations indicated that Type I errors were generally around 5%. A small gain in power was achieved by covariate adjustment for two known prognostic factors in GBS, and a larger gain by exploiting ordinality instead of dichotomizing the ordinal scale: up to 7 and 13 percentage points, respectively. The gains in power were only slightly smaller under violation of the proportional odds assumption and with smaller prognostic effects of the covariates.

Conclusion
Optimal analysis of ordinal scales should adjust for baseline characteristics (covariate adjustment) and should respect the ordinality of the outcome measure. A longitudinal proportional odds model for the analysis of repeated assessments may have no added benefit over a single time point proportional odds model. Further research should confirm that a longitudinal proportional odds model is only beneficial when the observed disease course within patients is more variable over time.
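The core power comparison can be sketched in a few lines of Python. This is a hypothetical illustration, not the trial's data: the scale distribution, odds ratio and sample size are invented, a z-test on mean scores stands in for the proportional odds model, and a test on a dichotomised "good outcome" plays the binary role.

```python
import math
import random

random.seed(1)

# Hypothetical 7-category disability-scale distribution for the control arm;
# the treatment arm applies a common odds ratio to every cumulative cutpoint,
# so the proportional odds assumption holds exactly.
CTRL = [0.05, 0.10, 0.20, 0.25, 0.20, 0.15, 0.05]

def po_shift(pmf, odds_ratio):
    """Apply the same odds ratio to every cumulative cutpoint of a pmf."""
    out, cum, prev = [], 0.0, 0.0
    for p in pmf[:-1]:
        cum += p
        odds = odds_ratio * cum / (1 - cum)
        new = odds / (1 + odds)
        out.append(new - prev)
        prev = new
    out.append(1 - prev)
    return out

TREAT = po_shift(CTRL, 1.8)

def z_ordinal(x, y):
    """z-statistic on mean scores: a crude way to exploit ordinality."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))

def z_dichotomised(x, y, cut=4):
    """z-statistic on the proportion with score >= cut (dichotomisation)."""
    px, py = sum(v >= cut for v in x) / len(x), sum(v >= cut for v in y) / len(y)
    pool = (px * len(x) + py * len(y)) / (len(x) + len(y))
    return (px - py) / math.sqrt(pool * (1 - pool) * (1 / len(x) + 1 / len(y)))

def power(stat, n=110, reps=2000):
    hits = 0
    for _ in range(reps):
        x = random.choices(range(7), weights=TREAT, k=n)
        y = random.choices(range(7), weights=CTRL, k=n)
        hits += abs(stat(x, y)) > 1.96
    return hits / reps

p_ordinal, p_binary = power(z_ordinal), power(z_dichotomised)
print(p_ordinal, p_binary)  # the ordinal test is typically more powerful
```

Under these invented settings the ordinal test gains noticeably over the dichotomised one, mirroring the pattern reported above.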
In the past decade, experimental developments in the field of transcriptomics have enabled researchers to measure gene expression at the level of single cells, leading to a great increase in measurement resolution. Clustering the cells according to their gene expression profiles can aid the discovery of novel cell types. Unfortunately, it is often difficult to tell whether the established clusters are homogeneous or if sub-clustering might be possible. Recently, a new clusterability measure has been developed that aims to quantify the heterogeneity of gene expression within clusters. The so-called SIGnal Measurement Angle (SIGMA) is based on a result from random matrix theory which states that the singular values of a random matrix follow a known probability distribution. Singular values that strongly deviate from this distribution are likely to be caused by deterministic sources of variability, such as differences between cell types. However, the heterogeneity may also be caused by unwanted technical sources of variance, such as batch effects, which arise when data from several experiments are combined. Various methods exist for batch effect correction, but it is as yet unclear whether they reduce batch effects to the extent that their effect on SIGMA is sufficiently eliminated. In this thesis, we compared the efficacy of three batch correction methods (fastMNN, Harmony, and Seurat) on simulated data and on two empirical data sets. Their effectiveness was evaluated with several batch mixing metrics, as well as by inspecting the singular values and vectors of the batch-corrected expression matrices. In conclusion, both fastMNN and Harmony worked well in most cases, but with unbalanced data sets (i.e., when one or more cell types were absent from one of the batches) it became increasingly difficult to decide whether singular values were batch-effect-associated, because in those cases the batch effect also contained biological heterogeneity.
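The random-matrix idea behind SIGMA can be illustrated with a toy expression matrix (invented dimensions and effect size; power iteration stands in for a full SVD): the largest singular value of pure noise stays near the Marchenko-Pastur bulk edge, while a rank-one "cell type" signal pushes it far beyond.

```python
import math
import random

random.seed(0)
N_CELLS, N_GENES = 120, 60

def top_singular_value(A, iters=150):
    """Estimate the largest singular value of A by power iteration on A^T A."""
    v = [random.gauss(0, 1) for _ in A[0]]
    for _ in range(iters):
        w = [sum(a_j * v_j for a_j, v_j in zip(row, v)) for row in A]   # A v
        u = [sum(A[i][j] * w[i] for i in range(len(A)))                 # A^T w
             for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    w = [sum(a_j * v_j for a_j, v_j in zip(row, v)) for row in A]
    return math.sqrt(sum(x * x for x in w))

# Pure noise: i.i.d. N(0,1) entries; the top singular value stays near the
# asymptotic bulk edge sqrt(n) + sqrt(p).
noise = [[random.gauss(0, 1) for _ in range(N_GENES)] for _ in range(N_CELLS)]
edge = math.sqrt(N_CELLS) + math.sqrt(N_GENES)

# 'Two cell types': the first half of the cells get a mean shift in all genes,
# a rank-one deterministic signal added on top of the noise.
signal = [[x + (0.6 if i < N_CELLS // 2 else 0.0) for x in row]
          for i, row in enumerate(noise)]

s_noise, s_signal = top_singular_value(noise), top_singular_value(signal)
print(edge, s_noise, s_signal)  # s_signal clearly exceeds the noise edge
```

A batch effect enters the same way as the biological signal here, which is exactly why the two are hard to separate in the unbalanced case discussed above.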
In the analysis of 2×2×K contingency tables, a common hypothesis is the conditional independence of rows and columns controlling for a third variable. While frequentist tests of this hypothesis exist (e.g., the Cochran-Mantel-Haenszel test), the goal of this thesis was to evaluate a Bayes factor alternative for the conditional independence test in contingency tables. Framing the test as a Bayesian model comparison using generalized linear models, multiple g-prior variants were evaluated and compared to each other through a simulation study. The simulation results indicate that priors like the hyper-g/n, intrinsic or robust prior generally show desirable patterns for medium to large effect sizes, but are prone to lead to wrong conclusions for small underlying effect sizes unless the sample size is large. The R code for the simulation study can be found on GitHub: https://github.com/DHeemann/Bayesian-conditional-independence-simulation
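As a rough sketch of the model-comparison idea (a Schwarz/BIC approximation, not the g-prior Bayes factors evaluated in the thesis), the Bayes factor for association versus conditional independence in a 2×2×K table can be approximated from the closed-form expected counts of the conditional independence model:

```python
import math

def log_bf10_bic(tables):
    """Schwarz/BIC approximation to the log Bayes factor of 'association in
    each stratum' (saturated) versus conditional independence, for a list of
    K 2x2 tables. A crude stand-in for proper g-prior Bayes factors."""
    g2, big_n = 0.0, 0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        big_n += n
        # expected counts under conditional independence: row * col / total
        cells = [(a, (a + b) * (a + c) / n), (b, (a + b) * (b + d) / n),
                 (c, (c + d) * (a + c) / n), (d, (c + d) * (b + d) / n)]
        g2 += 2 * sum(o * math.log(o / m) for o, m in cells if o > 0)
    # the saturated model has one extra parameter (a log odds ratio) per stratum
    return (g2 - len(tables) * math.log(big_n)) / 2

assoc = [[[40, 10], [10, 40]], [[30, 15], [15, 30]]]   # association in strata
indep = [[[25, 25], [25, 25]], [[20, 20], [20, 20]]]   # exact independence
print(log_bf10_bic(assoc), log_bf10_bic(indep))        # positive vs negative
```

A positive value favours association; a negative value favours conditional independence, with the BIC penalty playing the role of the prior.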
In the context of factor analysis, the most common estimation method for analysing discrete data is multiple-step Diagonally Weighted Least Squares (DWLS). A novel estimation method is called Pairwise Maximum Likelihood (PML). PML maximises the product of bivariate likelihoods in a single step. PML estimation has been found effective for small datasets with few discrete variables. In this study, we investigate how PML performs with large datasets and different types of data (e.g., discrete data, continuous data, and combinations thereof). We conducted two simulation studies to compare the performance of PML to DWLS in terms of accuracy and efficiency. We examined different experimental conditions: model sizes (small, medium, large, and huge), sample sizes (200, 400, and 800), and numbers of answer categories (two and four). In addition, we checked the robustness of PML by fitting a model without misspecifications (i.e., a correctly specified model) and with misspecifications (i.e., a misspecified model). ANOVAs were conducted to test whether the differences between PML and DWLS depend on the aforementioned design factors. Our results indicate that the (relative) bias of both the parameter estimates and the standard errors remains very small across the varying experimental conditions for the correctly specified model and increases slightly in conditions with a misspecified model. Overall, our findings demonstrate that PML performed slightly better than DWLS in terms of bias of both the parameter estimates and the standard errors.
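The single-step idea of PML can be illustrated on one pair of variables (a toy bivariate normal with an invented correlation; the full method sums such bivariate log-likelihoods over all variable pairs and maximises them jointly):

```python
import math
import random

random.seed(9)
RHO_TRUE, N = 0.6, 4000

z = [random.gauss(0, 1) for _ in range(N)]
x = z
y = [RHO_TRUE * a + math.sqrt(1 - RHO_TRUE ** 2) * random.gauss(0, 1) for a in z]

def bivariate_loglik(rho, xs, ys):
    """Bivariate standard-normal log-likelihood (up to a constant) in rho."""
    c = 1 - rho * rho
    return sum(-0.5 * math.log(c) - (a * a - 2 * rho * a * b + b * b) / (2 * c)
               for a, b in zip(xs, ys))

# PML maximises a sum of bivariate log-likelihoods in one step; with a single
# pair this reduces to one grid maximisation over the correlation.
rho_hat = max((r / 100 for r in range(-99, 100)),
              key=lambda r: bivariate_loglik(r, x, y))
print(rho_hat)  # close to 0.6
```

For discrete indicators the bivariate blocks are polychoric likelihoods rather than normal densities, but the composite-likelihood structure is the same.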
The detection of anomalies is a research area that has made great progress in recent years and decades. As more and more applications produce ever larger amounts of data, anomaly detection becomes increasingly important. In the past, most anomaly detection algorithms focused on static data sets, that is, data sets without a time stamp, and did not take the element of time into account even when it was provided. In addition, these algorithms rarely have the ability to incorporate additional knowledge into their decision-making process and cannot adapt to changes in the data over time. Building on an algorithm called Evolutionary Isolation Forest, which attempts to solve both of these problems, this paper suggests a variation of this algorithm called Extended Evolutionary Isolation Forest. This algorithm uses more complex splitting criteria to isolate anomalies and uses evolutionary operators to refine the decision process and adapt to feedback from experts. Using benchmark data, it is shown that the algorithm performs similarly to the Evolutionary Isolation Forest, but without generally outperforming it. In addition, the algorithms are compared on a real-world data set from the energy infrastructure provided by WithTheGrid.
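A minimal isolation forest (the classic algorithm the evolutionary variants build on; subsample size, depth and the data here are illustrative) shows the core mechanism: anomalies are isolated in fewer random splits, giving them shorter average path lengths and higher scores.

```python
import math
import random

random.seed(2)

def grow(X, depth=0, max_depth=8):
    """Grow an isolation tree: random axis, random split, until isolation."""
    if depth >= max_depth or len(X) <= 1:
        return len(X)                      # leaf stores how many points remain
    dim = random.randrange(len(X[0]))
    lo, hi = min(p[dim] for p in X), max(p[dim] for p in X)
    if lo == hi:
        return len(X)
    split = random.uniform(lo, hi)
    left = [p for p in X if p[dim] < split]
    right = [p for p in X if p[dim] >= split]
    return (dim, split, grow(left, depth + 1, max_depth),
            grow(right, depth + 1, max_depth))

def c(n):
    """Average path length of an unsuccessful BST search (iForest norm)."""
    return 2 * (math.log(n - 1) + 0.5772) - 2 * (n - 1) / n if n > 1 else 0.0

def path_length(tree, p, depth=0):
    if not isinstance(tree, tuple):
        return depth + c(tree)             # correction for an unsplit leaf
    dim, split, left, right = tree
    return path_length(left if p[dim] < split else right, p, depth + 1)

def anomaly_scores(X, n_trees=100, sub=64):
    forest = [grow(random.sample(X, min(sub, len(X)))) for _ in range(n_trees)]
    return [2 ** (-sum(path_length(t, p) for t in forest) / n_trees / c(sub))
            for p in X]

points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
points.append((6.0, 6.0))                  # an obvious anomaly, appended last
scores = anomaly_scores(points)
print(scores[-1], max(scores[:-1]))        # the outlier should score highest
```

The evolutionary variants replace the purely random splits with criteria refined by evolutionary operators and expert feedback, but the scoring skeleton is this one.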
E-variables are tools for statistical testing that allow results from multiple experiments to be easily combined. Large E-variables denote a large amount of evidence against the null hypothesis. In this thesis we focus on E-variables that are valid under optional stopping, which means that the researcher may choose to stop collecting data at any point in time, even after viewing the statistical result. We look at a non-parametric setting in which we test whether a distribution is symmetric around 0. These E-variables may be conditioned on past data, which means that we can learn ideal values of E-variable parameters while collecting data, so that the E-variables perform better and better as more data come in. We examine multiple versions of Efron-De la Peña E-variables and introduce 'hedge' E-variables. We also examine E-variables based on rank-based methods such as the Sequential Rank Test, and we introduce a modified version of the Safe Mann-Whitney U test, which we call the Split-Safe Mann-Whitney U test. We evaluate these E-variables by comparing the amount of data needed to gather enough evidence for a 'significant' result, as well as the rate at which the E-variables grow as more data are collected, when data are generated by a variety of probability distributions. We found that different E-variables perform better under different generative distributions, but overall the 'hedge' Efron-De la Peña E-variable and the Sequential Rank Test appear to be the best E-variables for this setting. All examined E-variables require more data for a 'significant' result than the classical Mann-Whitney U test. However, the benefit of E-variables is that they do not require the researcher to set a fixed sample size beforehand, which can outweigh the lower performance.
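The optional-stopping idea can be sketched with a simple sign-betting e-process for symmetry around 0. This is a toy construction, not the Efron-De la Peña or rank-based E-variables of the thesis, but it shares the key property: each factor has expectation 1 under the null, and the betting parameter is predictable, learned from past data only.

```python
import random

random.seed(3)

def sign_betting_e_process(xs, cap=0.5):
    """An e-process for H0: 'X is symmetric around 0'. Each factor
    1 + lam*sign(x) has expectation 1 under H0 for any lam in (-1, 1),
    and lam is chosen from past observations only (predictable)."""
    e, n_pos, n = 1.0, 0, 0
    for x in xs:
        s = 1 if x > 0 else -1
        lam = max(-cap, min(cap, (2 * n_pos - n) / (n + 2)))  # shrunk mean sign
        e *= 1 + lam * s
        n_pos += s == 1
        n += 1
    return e

null_data = [random.gauss(0, 1) for _ in range(2000)]   # symmetric: H0 true
shifted = [random.gauss(0.4, 1) for _ in range(2000)]   # asymmetric: H0 false

e_null, e_alt = sign_betting_e_process(null_data), sign_betting_e_process(shifted)
print(e_null, e_alt)
```

Because the product is a non-negative martingale under the null, Ville's inequality bounds the probability that it ever exceeds 1/α by α, which is what makes stopping at any time safe; under the shifted alternative the learned bet makes the process grow exponentially.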
Patient survival in biomedical studies is often subject to multiple clinical endpoints, all of which compete for the first and possibly only opportunity of occurrence. As a result, the occurrence of competing events may preclude the observation of a specific clinical outcome of interest. To gain further insight into specific outcomes in the presence of competing events, a special type of survival analysis is required, known as competing risks analysis. The presence of treatment effects in competing risks models can be visually examined by constructing cumulative incidence curves. These curves illustrate the probability of first occurrence of each event over a series of time points, and thereby avoid the bias that is introduced by competing events in classic survival curves.

In randomized controlled trials, cumulative incidence curves are unaffected by confounding from patient-specific covariates, as a result of the strict random assignment of patients between treatment cohorts. However, observational studies often introduce imbalance of covariates between treatment cohorts, as certain groups of patients may be overrepresented within a particular treatment strategy. Covariate imbalance between cohorts results in a biased comparison of cumulative incidence curves, since they reflect the average failure probability within each cohort. This may discourage researchers from using cumulative incidence curves to report findings in the presence of both competing risks and covariate imbalance. Fortunately, well-documented strategies already exist to address covariate imbalance in survival analysis, which have led to covariate-adjusted survival functions. However, these methods have yet to be extended to provide covariate adjustment for the cumulative incidence curves used to report competing risks models.

In this study, we developed and examined various adjustment methods to produce covariate-adjusted cumulative incidence curves in the presence of covariate imbalance between cohorts. A simulation study was carried out to compare the accuracy and precision of these methods, and the best-performing method was applied to real-world breast cancer survival data. Covariate adjustment in the breast cancer data allowed us to shed light on the role of covariate imbalance between patients treated with mastectomy and those treated with breast-conserving therapy for each of the competing outcomes.
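The bias that cumulative incidence curves avoid can be reproduced in a few lines (hypothetical hazards, no censoring for simplicity): treating competing events as censoring in a Kaplan-Meier estimate overstates the incidence of the event of interest.

```python
import math
import random

random.seed(4)
H1, H2 = 0.03, 0.06          # hypothetical cause-specific hazards (per day)
N, T_EVAL = 5000, 60.0

data = []                    # (event time, cause); everyone fails eventually
for _ in range(N):
    t1, t2 = random.expovariate(H1), random.expovariate(H2)
    data.append((t1, 1) if t1 < t2 else (t2, 2))

# Cumulative incidence of cause 1 at T_EVAL: with no censoring this is just
# the fraction of subjects who experienced cause 1 first, by that time.
cif1 = sum(t <= T_EVAL and cause == 1 for t, cause in data) / N

# Naive '1 - Kaplan-Meier' that wrongly treats cause-2 events as censoring:
surv, at_risk = 1.0, N
for t, cause in sorted(data):
    if t > T_EVAL:
        break
    if cause == 1:
        surv *= 1 - 1 / at_risk
    at_risk -= 1
naive = 1 - surv

true_cif1 = H1 / (H1 + H2) * (1 - math.exp(-(H1 + H2) * T_EVAL))
print(true_cif1, cif1, naive)   # the naive curve clearly overestimates
```

The naive estimator targets the "net" probability of cause 1 in a world where cause 2 does not exist, which is why it exceeds the true cumulative incidence; the covariate-adjustment problem studied in this thesis sits on top of this distinction.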
One of the key characteristics describing an infectious disease is the incubation period. Commonly, incubation period estimates are obtained via interval-censored methods. Deng, You, Liu, Qin, and Zhou (2020) and Qin et al. (2020) proposed a new family of methods for estimating the incubation period and applied them to data from the initial SARS-CoV-2 outbreak in Wuhan. These methods are based on the theory of renewal processes and do not require information on the time of infection. Instead, travel information (i.e., day of departure) is needed. These data tend to be easier to obtain, and hence larger datasets can usually be used. However, both Deng and Qin made a number of assumptions that appear questionable. To date, no study has addressed the validity of their proposed renewal methods or their assumptions.

In a novel simulation study, the impact of changing these assumptions on the estimated incubation time was investigated. Deng and Qin assumed that the time from infection to leaving Wuhan follows a uniform distribution. This assumption is problematic because of the exponential increase in SARS-CoV-2 cases and the sharp increase in people leaving Wuhan before lockdown measures were implemented. In addition, both assume that up to 20% of additional infections occur on the day of travel due to busy environments. However, it is not clear whether the correction for additional infections on the travel day is warranted. As part of the thesis, a data generation method was introduced that takes these aspects into account.

In this thesis, it is shown that the assumptions underlying the renewal process method are violated by Qin and Deng and by the proposed data generation method. The simulation study showed that the violated assumptions introduce a bias that is partially compensated by the bias introduced by the inclusion of additional infections on the day of travel. The findings suggest that incubation period estimates based on current renewal process methods should be interpreted with caution. The results of this work provide important insights into the accuracy of current methods for estimating the incubation period. This can help to better understand the dynamics of infectious diseases, which in turn can help to contain the spread of future outbreaks.
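The problem with the uniform assumption can be made concrete with a toy sampler (invented window length and growth rate): under exponential epidemic growth, most travellers were infected shortly before departure, so assuming a uniform infection time overstates the time already elapsed since infection, and with it the incubation period.

```python
import math
import random

random.seed(5)
W, R, N = 30.0, 0.15, 20000   # window (days), epidemic growth rate, travellers

def infection_age(exponential_growth):
    """Time from infection to departure (departure at t = 0)."""
    if not exponential_growth:
        return random.uniform(0, W)       # the uniform assumption
    # Infections grow like e^{R t}, so the density of the age a (time before
    # departure) is proportional to e^{-R a} on [0, W]; sample by inverse CDF.
    u = random.random()
    return -math.log(1 - u * (1 - math.exp(-R * W))) / R

mean_uniform = sum(infection_age(False) for _ in range(N)) / N
mean_growth = sum(infection_age(True) for _ in range(N)) / N
print(mean_uniform, mean_growth)  # growth makes recent infections dominate
```

With these illustrative numbers, the mean elapsed time since infection under growth is less than half that under the uniform assumption, hinting at the direction of the bias examined in the simulation study.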
This thesis investigates the effect of having a family history of exceptional longevity on the risk of contracting age-related diseases. It analyses electronic health records of the offspring (and their partners) of members of long-lived sibships that were collected as part of the Leiden Longevity Study (LLS). Because participants can contract multiple age-related diseases, I work within a recurrent events framework; this is an adaptation of the classical survival analysis framework that allows events to happen repeatedly to an individual. The data have a nested structure: events happen to individuals who are organised within families. As such, a random-effects survival model with two levels of correlation, termed the Nested Frailty model, is applied to the data, with an additional element of event dependence. The thesis consists of three parts. In the first, I derive the likelihood for a Nested Frailty model. Next, two simulation studies explore the possible pitfalls of ignoring nested frailties and event dependence when these are present in the data, and demonstrate that it is key to include both elements in the model. Finally, the LLS data are analysed. I find that a family history of exceptional longevity is linked with a slower rate of acquisition of age-related diseases.
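The effect of a shared family-level frailty can be sketched with a toy Poisson count model (invented rates; a Knuth sampler is used because the `random` module has no Poisson generator): a gamma frailty shared by family members induces exactly the within-family correlation that a frailty model is designed to capture.

```python
import math
import random
import statistics

random.seed(6)

def poisson(lam):
    """Knuth's Poisson sampler."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < limit:
            return k
        k += 1

def sibling_counts(frailty_var, families=4000, base_rate=2.0):
    """Event counts for two family members sharing one gamma frailty."""
    pairs = []
    for _ in range(families):
        z = (random.gammavariate(1 / frailty_var, frailty_var)
             if frailty_var > 0 else 1.0)   # mean 1, variance frailty_var
        pairs.append((poisson(base_rate * z), poisson(base_rate * z)))
    return pairs

def corr(pairs):
    xs, ys = [a for a, _ in pairs], [b for _, b in pairs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs) / len(pairs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

c_shared, c_indep = corr(sibling_counts(0.5)), corr(sibling_counts(0.0))
print(c_shared, c_indep)   # positive within-family correlation vs ~0
```

The Nested Frailty model adds a second, individual-level frailty and event dependence on top of this family-level mechanism, but the induced clustering is the same phenomenon.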
Missing data are common in clinical research. How these missing values are handled has a direct impact on the final study results. Existing medical studies commonly use complete case analysis to remove observations with missing values, which has the advantage of being simple, but depletes the information in the original dataset and may result in biased estimates. Multiple imputation (MI) methods are often considered more reliable than complete case analysis, missing indicator methods and single imputation methods. However, recent research comparing a number of MI methods has shown that, particularly where the underlying assumptions are undermined, some MI methods may cause more bias in model estimates than complete case analysis.

To study which methods perform better under which circumstances, this thesis presents a simulation study comparing the results of the above-mentioned techniques under certain types of missingness, such as MCAR, MAR and MNAR. Complicated scenarios, such as missingness correlated with survival time, are also considered. The various parameter settings for the simulation study are based on a real case study in which about 12% of the observations contain missing values for some variables. In addition to basic MICE, two other multiple imputation methods are compared: one with interaction terms between the fully observed variables and the baseline hazard in the imputation model, and one with a specific substantive model in the iteration; in this thesis, the substantive model is the Cox model. The simulation studies show that the method with interaction terms is not significantly different from MICE and its improvements are of limited applicability. The method with the specific substantive model is more suitable for complex data types and when there are strong correlations between covariates. Basic MICE also performs well in data sets with a high proportion of missing binary covariates, while the missing indicator method produces large bias in many settings, even in the full case study.
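Why complete case analysis can mislead is easy to demonstrate with a toy MAR mechanism (invented parameters), in which missingness in a covariate depends on the observed outcome:

```python
import random
import statistics

random.seed(7)
N = 20000
x = [random.gauss(0, 1) for _ in range(N)]
y = [xi + random.gauss(0, 1) for xi in x]      # outcome depends on x

# MAR mechanism: x goes missing with probability 0.7 whenever y > 0.5, so
# missingness depends on the observed outcome, not on x itself.
complete = [(xi, yi) for xi, yi in zip(x, y)
            if not (yi > 0.5 and random.random() < 0.7)]

full_mean = statistics.mean(x)                        # ~0, the truth
cc_mean = statistics.mean(xi for xi, _ in complete)   # biased downwards
print(full_mean, cc_mean)
```

Because the retained cases systematically have smaller outcomes, and x and y are correlated, the complete-case mean of x is pulled well below the true value; imputation models that condition on y can repair exactly this distortion.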
We provide a simulation study complementing the theoretical results of Bos and Schmidt-Hieber (2021) for supervised classification using deep neural networks. Their main risk bound suggests a faster truncated Kullback-Leibler divergence risk convergence rate with smoother conditional class probability functions and when fewer conditional class probabilities are near zero, as well as a fast convergence rate when the functions have a high degree of smoothness even if many probabilities are near zero. The proportion of small conditional class probabilities can be measured by the small value bound index α. We calculate α for an illustrative selection of settings with conditional class probability functions that have an arbitrarily high Hölder smoothness index β. We estimate the Kullback-Leibler divergence risk convergence rate in these settings by evaluating networks trained on simulated datasets of various sizes. We find slower convergence rates than suggested by the main risk bound. However, in line with expectations, α has no consistent effect on the convergence rate when combined with arbitrarily high β.
The feasibility of detecting heavy neutral leptons (HNLs) and distinguishing their signals from neutrino signals in the IceCube/KM3NeT neutrino telescopes is discussed. We propose a distinguishable double-track event geometry and calculate the sensitivity curves using Mathematica. A comparison of the number of events at the lower boundary in the KM3NeT and CHARM experiments implies that IceCube/KM3NeT has no potential for probing this kind of event. We also study the full decay channels of HNLs and calculate the corresponding sensitivity curves for the IceCube and CHARM experiments; excellent agreement is obtained between the numeric sensitivity bounds and the analytic formulas. A further comparison of the HNL sensitivity curves of both experiments indicates that IceCube has no advantage in HNL detection. Methods to improve the HNL detection sensitivity of IceCube are also considered, of which extending the observation time is the most reliable. In addition, IceCube's sensitivity may exceed CHARM's if the detector area and/or decay volume length is increased; thus, IceCube-Gen2 may be expected to contribute to HNL detection.
The formation of large-scale structure, e.g., clusters of galaxies, has attracted plenty of attention lately, since the technology now exists to collect data with high precision. This means that we may be able to start probing theories that go beyond the simplest single-field inflationary scenario, such as an effective single-field theory arising from a two-field inflation framework. In particular, in this work we study the (scale-dependent) bias in galaxy clustering that arises from an effective field theory whose effect on the inflaton dynamics is a transient reduction in the speed of sound.
Fluorescence Correlation Spectroscopy (FCS) and Förster Resonance Energy Transfer (FRET) are used in combination to study conformational dynamics of biological complexes such as nucleosomes. However, it is currently unclear how experimental conditions affect the accuracy with which rates of conformational dynamics are determined. We developed a computational method that allows us to simulate FCS data for FRET-labelled molecules under Pulsed Interleaved Excitation (PIE). Using these simulations, we determined the effect of experimental conditions such as laser intensity, measurement time, sample concentration and shift in laser focus on the correlation curves and the extracted diffusion time and transition rates. We show two ways to accurately determine transition rates of diffusing molecules in single experiments: one using PIE and a wavelength-dependent correction factor, and another under continuous excitation. These simulations can assist in obtaining new insight from combined PIE FRET-FCS studies of protein-DNA interactions.
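The quantity extracted from FCS data is the intensity autocorrelation curve. A minimal simulated two-state ("bright/dark") emitter (hypothetical switching and count rates, far simpler than the PIE FRET-FCS simulations of the thesis) shows how state dynamics appear as a decaying correlation:

```python
import math
import random

random.seed(8)

def poisson(lam):
    """Knuth's Poisson sampler for per-bin photon counts."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod < limit:
            return k
        k += 1

# Two-state emitter: bright/dark switching with hypothetical per-bin rates.
K_ON, K_OFF, N_BINS = 0.02, 0.05, 200000
bright, trace = True, []
for _ in range(N_BINS):
    trace.append(poisson(5.0 if bright else 0.5))   # photon counts per bin
    if bright and random.random() < K_OFF:
        bright = False
    elif not bright and random.random() < K_ON:
        bright = True

def autocorr(trace, tau):
    """Normalised intensity correlation G(tau) = <I(t)I(t+tau)>/<I>^2 - 1."""
    m = len(trace) - tau
    mean = sum(trace) / len(trace)
    return sum(trace[i] * trace[i + tau] for i in range(m)) / m / mean ** 2 - 1

g_short, g_long = autocorr(trace, 1), autocorr(trace, 500)
print(g_short, g_long)   # decays on the state-relaxation timescale
```

Fitting the decay of such a curve yields the relaxation rate of the state dynamics, here 1/(K_ON + K_OFF) bins, which is the toy analogue of extracting conformational transition rates from FRET-FCS curves.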
Magnetic Resonance Force Microscopy (MRFM) is a sensitive method to investigate spin systems, which uses a flexible cantilever as a mechanical amplifier of the forces on its magnetic tip. However, MRFM is generally limited in its application at millikelvin temperatures, because existing devices rely on laser interferometry to detect the cantilever deflection, which heats the cantilever, leaving many condensed matter systems out of reach for MRFM. This is unfortunate, because lower temperatures correspond to lower cantilever force noise, so samples with more dilute spins could be investigated. SQUID-detected MRFM, which uses the flux induced by the moving cantilever tip, does allow operation at millikelvin temperatures. Yet SQUID-detecting setups have so far been limited in sample accessibility, because the detection loop is printed on the sample. This thesis reports on the construction of a SQUID-detected MRFM device that employs a single probe head design to overcome this issue. The design choices and assembly methods for this device, called the easyMRFM, are discussed, as well as models to predict its sensitivity. It was found that the coupling is large enough to perform optimisations in liquid-helium dipstick experiments, although the thermal cantilever motion signal will only barely rise above the flux noise level. Lastly, a room-temperature magnetometry setup for cantilever chips is discussed that has proven useful in characterising cantilevers before mounting them in more permanent setups.
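The thermal cantilever motion mentioned above follows from equipartition. With an illustrative (hypothetical) stiffness for a soft MRFM lever, the expected rms deflection at room temperature and at millikelvin temperatures can be computed directly:

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
STIFFNESS = 1e-4        # hypothetical soft MRFM cantilever stiffness, N/m

def thermal_rms_deflection(temperature):
    """Equipartition: (1/2) k <x^2> = (1/2) kB T  =>  x_rms = sqrt(kB T / k)."""
    return math.sqrt(K_B * temperature / STIFFNESS)

x_room = thermal_rms_deflection(300.0)    # ~ nanometres
x_cold = thermal_rms_deflection(0.010)    # at 10 mK: ~ tens of picometres
print(x_room, x_cold)
```

The sqrt(T) shrinkage of the thermal motion is what makes detecting it at millikelvin temperatures so demanding, consistent with the thermal signal barely rising above the flux noise level in the predicted sensitivity.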