This paper describes the results of a study that tests the efficiency of a set of estimation methods used in Statistical Matching (SM) for categorical variables under a variety of conditions. SM is a technique that integrates datasets that contain different units, are drawn from the same population, and share some common variables. The goal of the technique is to estimate the association between the variables that are not common. The tested estimators include some existing methods (namely the Direct estimator, the CIA estimator, the Combined estimator and the EM estimator), along with Iterative Proportional Fitting (IPF), which is applied to SM for the first time in this research. The methods are tested in populations with different levels of dependence between the target variables, and also for the effect of using a selective overlap of units (sampled from another, relevant population) to make these estimations. For this reason, synthetic populations are created and used both to test the estimators directly for their predictive accuracy and as populations from which selective overlap sets can be sampled. Furthermore, the accuracy, bias and variance of the cells of the estimated contingency table are assessed. The results suggest that the Direct and EM estimators remain almost unaffected by changes in the populations' characteristics and by the selective overlaps, respectively. In contrast, methods based on the CIA estimator appear to have an advantage when the conditional independence assumption is met in the population.
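To make the conditional independence assumption (CIA) concrete, the following is a minimal sketch of one textbook formulation of a CIA-based estimate for categorical data. It assumes the target variables Y and Z are independent given the common variable X, so that P(Y, Z) = Σ_x P(x) P(Y | x) P(Z | x). The function name `cia_estimate`, the two-table input layout, and the choice to pool the X marginal over both datasets are illustrative assumptions, not necessarily the estimator as implemented in the study. The sketch also assumes every category of X is observed in both datasets.

```python
import numpy as np


def cia_estimate(xy_counts, xz_counts):
    """Estimate the joint distribution of Y and Z, assuming Y and Z are
    conditionally independent given X (hypothetical helper).

    xy_counts: (n_x, n_y) table of counts of (X, Y) from the first dataset
    xz_counts: (n_x, n_z) table of counts of (X, Z) from the second dataset
    """
    xy = np.asarray(xy_counts, dtype=float)
    xz = np.asarray(xz_counts, dtype=float)

    # Marginal distribution of X, pooled over both datasets (an assumption).
    x_totals = xy.sum(axis=1) + xz.sum(axis=1)
    p_x = x_totals / x_totals.sum()

    # Conditional distributions P(Y | X) and P(Z | X), one row per X category.
    p_y_given_x = xy / xy.sum(axis=1, keepdims=True)
    p_z_given_x = xz / xz.sum(axis=1, keepdims=True)

    # P(Y=j, Z=k) = sum over x of P(x) * P(j | x) * P(k | x)
    return np.einsum("x,xj,xk->jk", p_x, p_y_given_x, p_z_given_x)


# Made-up counts, for illustration only.
xy_example = [[30, 10], [20, 40]]
xz_example = [[25, 15], [10, 50]]
yz_estimate = cia_estimate(xy_example, xz_example)
```

The result is a contingency table of estimated cell probabilities for (Y, Z) that sums to one. When Y and Z are in fact dependent given X, this estimate is biased, which is the situation the study varies across its synthetic populations.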