
Li, Yanzhe 2020

Estimating the Effect of Classification Errors on Domain Statistics

Master thesis | Statistics and Data Science (MSc)

open access

The reliability of statistics is essential for official statistics. With administrative data increasingly used instead of survey data, non-sampling errors become important factors in the accuracy of statistics. For domain statistics, such as the yearly turnover of enterprises, classification errors occur. This study aims to measure the effect of classification errors on domain statistics, more specifically the bias and variance due to classification errors. In this study, a new method was developed that applies a Gaussian mixture model estimated by the EM algorithm, referred to in short as the EM method. A further method was introduced that combines the EM method with bootstrapping, referred to as the combined method. The EM method estimates only the bias, whereas the combined method is able to estimate both bias and variance. Together with a previously used bootstrap method, the three methods were tested in a simulation study and in a case study. The bias and variance estimates from the three methods were compared with their corresponding true values in different settings. The results showed that the bias estimates from the EM and the combined method were closer to the true values than those from the bootstrap method; the combined method also produced variance estimates closer to the true values than the bootstrap method. The EM and the combined method were equally accurate in estimating the true bias. These results suggest that the EM and the combined method estimated the bias, and the combined method the variance, more accurately than the bootstrap method. In practice, the combined method is recommended since both the bias and the variance can be estimated.
In a situation with a very large data set, where the variance is usually small and the bias is of most concern, the EM method may be preferred.

Leiden University Student Repository
