This study explores the circumstances under which traditional statistical methods and machine learning methods perform best. The literature has provided no conclusive answer on when each approach performs best; instead, many contradictory findings have been reported, demonstrating situations from both perspectives. Sample size often plays a role in which approach is recommended, as machine learning methods are said to perform better when the sample is large. We performed a simulation study in which we varied several complexity parameters: number of covariates, interactions, interaction depth, regression coefficients, variance of p(x), and formula complexity. Additionally, we assessed whether sample size and continuous covariates had any bearing on the results by comparing results across different sample sizes and by including continuous covariates in combination with binary covariates. To analyze the results, we used accuracy, sensitivity, and specificity. From 138 models, we identified seven general patterns, analyzed across different sample sizes: (a) a machine learning method performed best, (b) a traditional statistical method performed best, and (c) mixed performance. We extended the analysis to include more methods from both approaches. For each pattern and performance measure we selected representative models, resulting in 20 median models in which not all patterns recurred. A similar analysis on three empirical data sets showed similar behavior, although identifying patterns became more challenging. Our findings indicate that the variety within each pattern is too great to conclusively identify which complexity parameters produce a particular pattern, although nuances do exist. Moreover, many similar models are spread across multiple patterns. The identification of patterns suggests that the opposing views in the literature might be explained by the existence of these patterns. We find that traditional statistical methods outperformed complex machine learning methods in several patterns. Furthermore, sample size is not the sole determinant for selecting the best approach, as the results demonstrate several instances in which traditional statistical methods perform better on larger sample sizes. This adds new insights into how sample size and methods are related.
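A minimal sketch of the kind of comparison this abstract describes: one traditional statistical method (logistic regression) against one machine learning method (random forest) on simulated binary covariates with a two-way interaction, scored with the three performance measures named above. The method choices, sample size, and coefficients here are illustrative assumptions, not the study's actual design.

```python
# Hypothetical sketch: traditional statistical method vs. machine learning
# method on simulated binary data; all settings are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
n, p = 1000, 5                                      # assumed sample size / covariates
X = rng.binomial(1, 0.5, (n, p)).astype(float)
logit = X[:, 0] - X[:, 1] + 2 * X[:, 0] * X[:, 1]   # one two-way interaction
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = X[:700], X[700:], y[:700], y[700:]
for name, model in [("logistic", LogisticRegression()),
                    ("forest", RandomForestClassifier(random_state=1))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print(name,
          "accuracy:", (tp + tn) / len(y_te),
          "sensitivity:", tp / (tp + fn),
          "specificity:", tn / (tn + fp))
```

In a full simulation study, such a loop would be repeated over the complexity parameters (number of covariates, interaction depth, coefficient sizes) and sample sizes to produce the performance patterns discussed above.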
Understanding the functioning of the brain and how it relates to behavior is one of the primary objectives of neuroscience. The focus of neuroscience has evolved from studying a single brain to studying interactions between multiple brains. In several fields, synchrony in brain responses between individuals has been shown to positively influence psychological processes and lead to better outcomes.

To measure synchrony, time-series data are obtained for each subject's behavior or modality. Comparative studies of synchrony methods have been carried out to gain insight into the similarities and differences between the many measures for evaluating synchrony between subjects using such time-series data. This research provides only a partial picture of how well the synchrony methods capture synchrony and of the conditions under which these methods are optimal. It is still unknown how well the synchrony methods perform when other data characteristics change.

The goal of this study is to evaluate the performance of several methods for capturing different types of synchrony between a pair of time series. Two mechanisms are used to generate a pair of time series with a known amount of synchrony between them: (1) two unidirectionally coupled Hénon maps, and (2) a bivariate von Mises distribution. The correlation between the two time series is computed as another definition of true synchrony, to provide a different perspective on true synchrony. In addition, a systematic evaluation of the performance of the synchrony methods on simulated data with various data characteristics is carried out.

For the generated data, coherence and phase synchrony are the two best-performing methods. Regarding the varied data characteristics, the amount of true synchrony in particular has a large effect on recovery performance. These main effects are qualified by several two-way and three-way interactions that almost always involve the synchrony method and the amount of true synchrony. No synchrony method is perfect under all data characteristics, and none of the synchrony methods in this study is always stable. As a result, using a combination of different synchrony methods to detect synchrony is recommended.
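For concreteness, a minimal sketch of generating mechanism (1), two unidirectionally coupled Hénon maps, in one common parameterization from the synchronization literature (Quian Quiroga et al.; Schiff et al.); the exact settings used in the study may differ. The coupling strength C controls the amount of true synchrony (C = 0: independent systems; C = 1: strongly driven response).

```python
# Sketch of a pair of unidirectionally coupled Hénon maps; parameters are
# one standard benchmark choice, not necessarily the study's settings.
import numpy as np

def coupled_henon(n, C, b_x=0.3, b_y=0.3, burn_in=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-0.5, 0.5, 2)   # driver state (x1, x2)
    y = rng.uniform(-0.5, 0.5, 2)   # response state (y1, y2)
    xs, ys = [], []
    for t in range(n + burn_in):
        # update both maps from the states at the same time step
        new_x = np.array([1.4 - x[0] ** 2 + b_x * x[1], x[0]])
        new_y = np.array([1.4 - (C * x[0] * y[0] + (1 - C) * y[0] ** 2)
                          + b_y * y[1], y[0]])
        x, y = new_x, new_y
        if t >= burn_in:            # discard transients
            xs.append(x[0]); ys.append(y[0])
    return np.array(xs), np.array(ys)

x, y = coupled_henon(n=2048, C=0.6)
print(np.corrcoef(x, y)[0, 1])      # correlation as one view of true synchrony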
Using individual participant data (IPD) has many advantages over using aggregate data (AD) in clinical meta-analysis. However, access to IPD is often limited, whereas aggregate data are available from most clinical trials. Papadimitropoulou et al. [4] propose a method for studies with continuous outcomes at baseline and follow-up to generate pseudo-IPD from the aggregate data, which can then be analyzed as IPD using analysis of covariance (ANCOVA) models and linear mixed models. The pseudo-IPD are generated from the means and standard deviations at baseline and follow-up and the correlation between baseline and follow-up, which are sufficient statistics of the linear mixed model. This thesis exemplifies the pseudo-IPD models, standard meta-analysis models, and a Trowman meta-regression model on Obstructive Sleep Apnea data with two treatment groups. We further explored the performance of the models under different conditions in a simulation study. The estimates of the Trowman meta-regression suffered from large variance, and the standard AD models produced biased estimates when baseline imbalance exists. The ANCOVA models for pseudo-IPD and AD offered more accurate and stable results. The pseudo-IPD ANCOVA model is preferred because it can account for baseline differences and the interaction between treatment and baseline, and because different residual structures can be used.
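A minimal sketch of the pseudo-IPD idea: for one trial arm, generate baseline/follow-up pairs whose sample means, standard deviations, and correlation exactly match the reported aggregate data, so that IPD-style models can be fitted. This illustrates the principle only, under assumed summary values; it is not the exact algorithm of Papadimitropoulou et al. [4].

```python
# Generate pseudo-IPD for one arm that exactly reproduces the reported
# sufficient statistics (means, SDs, baseline-follow-up correlation).
import numpy as np

def pseudo_ipd(n, mean_b, mean_f, sd_b, sd_f, corr, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 2))
    z -= z.mean(axis=0)                        # exact zero sample means
    L = np.linalg.cholesky(np.cov(z, rowvar=False))
    z = z @ np.linalg.inv(L).T                 # exact identity sample covariance
    target = np.array([[sd_b ** 2, corr * sd_b * sd_f],
                       [corr * sd_b * sd_f, sd_f ** 2]])
    z = z @ np.linalg.cholesky(target).T       # exact target covariance
    return z + np.array([mean_b, mean_f])      # exact target means

# Illustrative summary values, not taken from the thesis:
arm = pseudo_ipd(n=50, mean_b=10.2, mean_f=7.5, sd_b=3.1, sd_f=2.8, corr=0.6)
print(arm.mean(axis=0), np.corrcoef(arm[:, 0], arm[:, 1])[0, 1])
```

Because the generated data match the sufficient statistics exactly, an ANCOVA or linear mixed model fitted to the pooled pseudo-IPD recovers the same likelihood information as the aggregate data provide.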
Longitudinal data are often collected in research areas such as medicine, biology, education, and psychology. From longitudinal binary data we can build a transitional model, which aims to model the probability of transitions between the response categories. In this type of data it is common to find missing values due to dropout, costs, and organizational problems. The missing-indicator method is often used to handle missing values in such data. This method consists of creating a new category for the missing values; the binary logistic model therefore changes to a baseline-category logit model. This study aims to evaluate the bias of the estimated coefficients when the missing-indicator method is applied to the response of a binary transitional model. Based on an empirical example, a Monte Carlo simulation with three factors is carried out: (1) type of missingness, (2) sample size, and (3) proportion of missing data. The bias of the coefficients from the baseline-category logit model is evaluated using boxplots and a three-way MANOVA. The results suggest that sample size, proportion of missing data, type of missingness, and the interaction between sample size and proportion affect the bias of the estimated coefficients; nonetheless, the effect sizes are small. When each dependent variable is analysed separately using ANOVA, the effects of the proportion of missing data and the interaction between sample size and proportion are statistically significant for only one coefficient, and the effect size is still small. Therefore, the conclusion is that the bias of the estimated coefficients is low for all missingness types.
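A minimal sketch of the missing-indicator method as described above: missing responses are recoded into a third category, turning the binary logistic model into a baseline-category (multinomial) logit. The data, the MCAR missingness, and all variable names are illustrative assumptions, not the study's empirical example.

```python
# Recode missing responses as a third category and fit a baseline-category
# logit; here the only predictor is the previous response, as in a simple
# first-order transitional model (illustrative setup).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
prev = rng.binomial(1, 0.5, n)                     # previous response
xb = -0.5 + 1.2 * prev
y = rng.binomial(1, 1 / (1 + np.exp(-xb))).astype(float)
y[rng.random(n) < 0.15] = np.nan                   # 15% missing, MCAR here

y3 = np.where(np.isnan(y), 2, y).astype(int)       # categories 0, 1, 2 = "missing"
X = sm.add_constant(pd.DataFrame({"prev": prev}))
fit = sm.MNLogit(y3, X).fit(disp=False)            # category 0 is the baseline
print(fit.params)                                  # one coefficient column per non-baseline category
```

Comparing the fitted coefficients to the generating values (-0.5 and 1.2) across many replications is the kind of bias evaluation the simulation study performs, varying missingness type, sample size, and missingness proportion.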
We provide a simulation study complementing the theoretical results of Bos and Schmidt-Hieber (2021) for supervised classification using deep neural networks. Their main risk bound suggests a faster truncated Kullback-Leibler divergence risk convergence rate when the conditional class probability functions are smoother and fewer conditional class probabilities are near zero, as well as a fast convergence rate when the functions have a high degree of smoothness even if many probabilities are near zero. The proportion of small conditional class probabilities can be measured by the small value bound index α. We calculate α for an illustrative selection of settings with conditional class probability functions that have an arbitrarily high Hölder smoothness index β. We estimate the Kullback-Leibler divergence risk convergence rate in these settings by evaluating networks trained on simulated datasets of various sizes. We find slower convergence rates than suggested by the main risk bound. However, in line with expectations, α has no consistent effect on the convergence rate when combined with arbitrarily high β.
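For reference, a sketch of the two quantities named above, as we recall their form from Bos and Schmidt-Hieber (2021); consult the paper for the precise statements and constants.

```latex
% Truncated Kullback-Leibler divergence risk, with truncation level B,
% between the true conditional class probabilities p_k and an estimate q_k:
\mathrm{KL}_B(p, q) \;=\; \mathbb{E}\Bigl[\,\sum_{k} p_k(X)\,
    \Bigl(B \wedge \log\tfrac{p_k(X)}{q_k(X)}\Bigr)\Bigr]

% Small value bound with index \alpha: each conditional class probability
% places little mass near zero,
\mathbb{P}\bigl(p_k(X) \le t\bigr) \;\le\; C\, t^{\alpha}
\qquad \text{for all } t \in (0,1] \text{ and all classes } k.
```

Larger α thus means fewer conditional class probabilities near zero, which is why α enters the risk bound alongside the Hölder smoothness index β.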