Master thesis | Statistical Science for the Life and Behavioural Sciences (MSc)
open access
Real-world data contain both signal and noise. In this study, we developed a method that uses replications to separate the two. The proposed method employs the expectation-maximization (EM) algorithm to estimate both the signal and the noise precision matrices. The estimated precision matrices are used to construct a Gaussian graphical model, which represents the network of variables. In high-dimensional settings, regularization techniques were used to ensure the positive definiteness of the estimated precision matrices. In the simulation study, we varied the graphical structure, the number of edges, and the magnitude of the noise to assess how the proposed method performs. Because the true signal precision matrix is known in simulation, the estimates of the proposed method were compared with those of other methods in terms of Kullback-Leibler (KL) divergence from the true distribution and accuracy in predicting the presence or absence of edges. The results show that, for the clique models considered, the unpenalized version of the proposed method performed best in edge detection, whereas for banded and star models, under certain circumstances, the unpenalized estimates of the proposed method ranked last in edge detection. The distributions based on the penalized estimates of our method were the best or second-best approximations of the true distribution in terms of KL divergence. Our results also show that, as the numbers of samples and replications increase, the estimates improve in both edge detection and approximation of the true distribution. In the real-world data analysis, we used three pathways from a lung cancer dataset of the TCGA project. The results show more overlap between the estimates from the merged data and those from single-platform data than between the estimates from the two platforms, in terms of both KL divergence and edges in common. We also found that the distribution constructed from the signal-noise estimator of the proposed method approximates the distribution of new data better than the one constructed from the signal estimator alone.
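The two evaluation criteria named in the abstract, KL divergence from the true distribution and agreement on edge presence or absence, can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes zero-mean Gaussian distributions parameterized by their precision matrices, takes the true distribution as the first argument of the divergence, and reads an edge as a nonzero off-diagonal precision entry. The function names, the tolerance `tol`, and the toy matrices are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kl_from_precisions(omega_true, omega_est):
    """KL( N(0, omega_true^-1) || N(0, omega_est^-1) ) for zero-mean Gaussians.

    Uses 0.5 * [ tr(omega_est @ sigma_true) - p - logdet(omega_est) + logdet(omega_true) ].
    The direction (true distribution first) is an assumption of this sketch.
    """
    p = omega_true.shape[0]
    sigma_true = np.linalg.inv(omega_true)
    sign_t, logdet_t = np.linalg.slogdet(omega_true)
    sign_e, logdet_e = np.linalg.slogdet(omega_est)
    if sign_t <= 0 or sign_e <= 0:
        raise ValueError("precision matrices must be positive definite")
    return 0.5 * (np.trace(omega_est @ sigma_true) - p - logdet_e + logdet_t)


def edge_set(omega, tol=1e-8):
    """Edges of the Gaussian graphical model: off-diagonal entries with |value| > tol."""
    p = omega.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if abs(omega[i, j]) > tol}


# Toy banded (tridiagonal) "true" precision matrix on three variables.
omega_true = np.array([[2.0, -0.5, 0.0],
                       [-0.5, 2.0, -0.5],
                       [0.0, -0.5, 2.0]])

# Hypothetical estimate that misses the edge between variables 1 and 2.
omega_est = np.array([[2.0, -0.4, 0.0],
                      [-0.4, 2.0, 0.0],
                      [0.0, 0.0, 2.0]])

kl = gaussian_kl_from_precisions(omega_true, omega_est)
true_edges = edge_set(omega_true)
est_edges = edge_set(omega_est)
shared_edges = true_edges & est_edges
```

In this toy case the divergence is zero only when the two precision matrices coincide, and `shared_edges` counts the edges that the estimate recovers. A thresholding tolerance matters mainly for unpenalized estimates, whose off-diagonal entries are rarely exactly zero.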