Renáta Németh (ELTE Research Center for Computational Social Science): Who knows it better? The task of detecting discrimination using human coding vs. text mining (co-authors: Jakab Buda (ELTE TáTK, ELTE RC2S2) and Bori Simonovits (ELTE PPK))

In our study, we assess the responsiveness of Hungarian local governments to requests for information from Roma and non-Roma clients, relying on a nationwide email study with a randomized controlled trial design.
Two methods were used in parallel to evaluate the response emails: traditional qualitative coding and machine learning (ML). Both methods provided evidence of attention discrimination. Our ML models performed significantly better than random classification, confirming the differential treatment of Roma clients. The most important predictors showed that the answers sent to ostensibly Roma clients are not only shorter but also less polite and more reserved in tone, supporting the idea of attention discrimination. A higher level of attention discrimination is detectable against male senders and in smaller settlements.
We show that it is possible to detect discrimination in textual data in an automated way without human coding, and that ML may detect linguistic features of discrimination that human coders may not recognize. To the best of our knowledge, our study is the first attempt to assess discrimination using ML techniques.
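
The comparison with random classification mentioned above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a balanced two-class task (emails answering Roma vs. non-Roma senders), so that random guessing is right half the time, and it uses a one-sided exact binomial test. The function name `binomial_p_value` and the counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value.

    Returns the probability of getting at least `correct` right answers
    out of `total` if the classifier were only guessing with success
    probability `chance` (0.5 is an assumption for balanced classes).
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical counts for illustration only, not results from the study.
p = binomial_p_value(correct=60, total=100)
print(f"p-value against random classification: {p:.4f}")
```

A small p-value indicates that the classifier separates the two groups of response emails better than chance, which is the sense in which the text of the replies carries a signal of differential treatment.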