Krisztián Boros (LinkedIn, GitHub)
The ubiquity of missing data in quantitative research is undeniable. We may encounter missing data due to, for example, non-response, incorrect sampling, or data processing errors. Over the past 50 years, researchers have developed a wide variety of missing data handling methods; the spectrum of available techniques extends from basic deletion methods (e.g. listwise and pairwise deletion) to more involved techniques (e.g. Multiple Imputation, the EM algorithm).
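To make the contrast between these two families concrete, here is a minimal sketch (not taken from the thesis; the data and column names are hypothetical) comparing listwise deletion with a multiple-imputation-style approach using scikit-learn's iterative imputer:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical survey data with non-response in the income column.
df = pd.DataFrame({
    "age":    [23, 35, 41, 29, 52, 61],
    "income": [31000, np.nan, 54000, np.nan, 72000, 68000],
})

# Listwise deletion: drop every row with any missing value.
# Simple, but it discards information and can bias estimates
# unless the data are missing completely at random (MCAR).
listwise = df.dropna()

# A multiple-imputation-style alternative: draw several completed
# datasets from a chained-equations imputer and pool the estimates.
means = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    completed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    means.append(completed["income"].mean())

print("listwise mean income:", listwise["income"].mean())
print("pooled imputed mean income:", np.mean(means))
```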
The aim of my thesis is twofold. On the one hand, I introduce a text-mining approach for collecting and analyzing papers, and I discuss the advantages and disadvantages of this approach using the Total Survey Error framework. On the other hand, I examine trends in missing data handling methods across years and scientific fields.
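The thesis's actual pipeline is not reproduced here, but a hypothetical sketch of the core tagging step might look like the following: scan each paper's abstract for keyword patterns and record which method family it mentions, so that mentions can later be aggregated by year and field. The keyword dictionaries and toy corpus below are illustrative assumptions, not the thesis's own.

```python
import re
from collections import Counter

# Hypothetical keyword patterns for the two method families.
PATTERNS = {
    "advanced": re.compile(
        r"multiple imputation|em[- ]algorithm",
        re.IGNORECASE,
    ),
    "basic": re.compile(
        r"listwise deletion|pairwise deletion|mean imputation",
        re.IGNORECASE,
    ),
}

def tag_abstract(text: str) -> set:
    """Return the method families mentioned in one abstract."""
    return {family for family, pat in PATTERNS.items() if pat.search(text)}

# Toy corpus standing in for the collected papers: (year, abstract).
corpus = [
    (2003, "We handle non-response with listwise deletion."),
    (2015, "Missing values were treated via multiple imputation."),
    (2019, "We compare mean imputation with the EM-algorithm."),
]

counts = Counter()
for year, abstract in corpus:
    for family in tag_abstract(abstract):
        counts[(year, family)] += 1
print(counts)
```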
The results show that the popularity of advanced techniques (e.g. Multiple Imputation, the EM algorithm) has grown over the past 20 years, but basic techniques (e.g. deletion methods, mean imputation) remain in widespread use. On the methodological side, the text-mining approach has several limitations, such as the questionable generalizability and reliability of the results.