Krisztián Boros – Meta-analysis of missing data handling methods with text-mining

2020 Survey Statistics and Data Analytics MSC

Krisztián Boros (LinkedIn, GitHub)

The ubiquity of missing data in quantitative research is undeniable. We may encounter with missing data due to, for example, non-response, incorrect sampling, or data processing errors. During the past 50 years, researchers have developed a wide variety of missing data handling methods; the spectrum of available techniques extends from the basic deletion methods (e.g. listwise- and pairwise deletion) to the more involved techniques (e.g. Multiple Imputation, EM-algorithm).

The aim of my thesis is twofold. On one hand, I introduce a text-mining approach to collect and analyze papers while pointing out the advantages and disadvantages of this particular approach using the Total Survey Error Framework. On the other hand, I try to examine the possible trends of the missing data handling methods across years and scientific fields.

The results show that the popularity of advanced techniques (e.g. Multiple Imputation, EM-algorithm) had been growing over the past 20 years, but the not-advanced techniques (e.g. deletion methods, mean imputation) are still in widespread use. In the case of the methodology, several limitations of the text-mining approach were pointed out such as the questionable generalizability and reliability of the results.

View Thesis