Zsófia Rakovics, statistician and sociologist. During her university studies, she conducted research on precarity, globalization and their effects on society and work. She also took part in other sociological investigations (e.g. on gender inequality) and studied the remembrance of the Holocaust through qualitative analysis of survivors' testimonies. Her methodological interests span both qualitative and quantitative methods. After completing her master's studies, she worked as a data scientist at an international company that uses computer vision and machine learning technologies to capture emotional reactions. Recently, her interest has turned back to memory research and towards the study of populism and political public discourse with natural language processing methods, which is what links her to the research group. Her doctoral research focuses on language polarisation, analysing online political public discourse with deep learning language models.
Doctoral research related to the project "The layers of political public sphere in Hungary (2001-2020)" (supported by NKFIH-K-134428)
Sociological study of language polarization
Doctoral student: Zsófia Rakovics
Advisors: Renáta Németh, PhD and Domonkos Sik, PhD
Description
The doctoral research empirically investigates language change and polarization tendencies in online political discourse (Gentzkow et al. 2016; Prior 2013) using deep learning-based language models (Vaswani et al. 2017; Devlin et al. 2018).
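Gentzkow et al. (2016) quantify polarization in speech as the expected posterior probability that an observer with a neutral prior assigns to a speaker's true party after hearing a single phrase. The following is a minimal sketch of the naive plug-in version of that measure for two groups of texts. It assumes lowercase whitespace tokenization of single words and uses hypothetical function names. It is not presented as the method of this doctoral research. Gentzkow et al. show that this plug-in estimate is biased upward when vocabularies are large and data are sparse, and they propose a penalized estimator instead.

```python
from collections import Counter


def phrase_frequencies(texts: list[str]) -> dict[str, float]:
    """Pooled relative frequency of each token across one group's texts.

    Tokenization is a naive lowercase whitespace split (an assumption; real
    Hungarian corpora would need proper tokenization and multi-word phrases).
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def plugin_partisanship(group_a: list[str], group_b: list[str]) -> float:
    """Naive plug-in estimate of average partisanship between two groups.

    For each phrase x, rho(x) = q_a(x) / (q_a(x) + q_b(x)) is the posterior
    probability, under a neutral prior, that a speaker using x belongs to A.
    The result is the expected posterior on the speaker's true group after
    one randomly drawn phrase: 0.5 = identical usage, 1.0 = no shared words.
    """
    q_a = phrase_frequencies(group_a)
    q_b = phrase_frequencies(group_b)
    score = 0.0
    # Only phrases used by at least one group, so a + b > 0 below.
    for x in set(q_a) | set(q_b):
        a, b = q_a.get(x, 0.0), q_b.get(x, 0.0)
        rho = a / (a + b)
        score += 0.5 * a * rho + 0.5 * b * (1.0 - rho)
    return score
```

Identical inputs yield 0.5 and groups with no words in common yield 1.0. Tracking this score over time for, say, two hypothetical camps of political actors would be one way to operationalize a polarization tendency.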
Deep learning-based language models are artificial neural networks with many layers and parameters that learn the syntactic and semantic features of natural language and can thereby generate meaningful text (Vaswani et al. 2017; Devlin et al. 2018). Based on an internal, abstract representation of the input text, these models can generate the text that best matches the input (Brown et al. 2020; Mikolov et al. 2013; Radford et al. 2019).
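In Transformer models (Vaswani et al. 2017), the internal representation of each token is built mainly by self-attention. Each token's vector is recomputed as a weighted average of the value vectors of all tokens, with weights given by scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below implements only that formula in plain Python, as an illustration. It omits the learned projection matrices, multiple heads, positional encodings and every other layer of a real model. The toy vectors are made up.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Returns the output vectors and the attention weight matrix. Each row of
    weights sums to 1, so each output row is a weighted average of V's rows.
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors, column by column.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights


# Toy self-attention: three 2-dimensional token vectors (made-up values),
# used directly as queries, keys and values (no learned projections).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, attn = scaled_dot_product_attention(tokens, tokens, tokens)
```

Because every output row mixes information from all input tokens, the same word can receive different representations in different contexts. This property is what makes such models useful for tracking shifts in meaning.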
The textual traces of information diffusion patterns can be used to reconstruct the discursive structure of the online political space, identifying network centers and peripheries (Bryden et al. 2013) and 'contagion patterns' of information diffusion (Alshaabi et al. 2021; Hamilton and Hamilton 2010). By empirically scanning online political communication for language change and language polarization, and describing their characteristics and dynamics in detail, we can gain deeper insight into how public discourse operates and, in turn, into its impact on society.
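Once diffusion traces (for example, who retweeted or replied to whom) have been extracted, k-core decomposition is one simple and common way to separate a network's center from its periphery. This is an illustrative choice, not a method taken from Bryden et al. (2013) or the other cited works. The sketch assumes the traces arrive as hypothetical (user, user) pairs. It ignores edge direction and weights.

```python
from collections import defaultdict


def k_core_numbers(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Core number of every node in an undirected graph, by simple peeling.

    Repeatedly removes the node with the lowest remaining degree. A node's
    core number is the largest k such that it lies in a subgraph where every
    node has degree >= k. O(n^2) because of the linear min; fine for a sketch.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops (e.g. users retweeting themselves)
            neighbors[u].add(v)
            neighbors[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in neighbors.items()}
    remaining = set(degree)
    core: dict[str, int] = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # core numbers never decrease while peeling
        core[node] = k
        remaining.remove(node)
        for nb in neighbors[node]:
            if nb in remaining:
                degree[nb] -= 1
    return core


def split_center_periphery(core: dict[str, int]) -> tuple[set[str], set[str]]:
    """Treat the innermost core as the center and everything else as periphery."""
    top = max(core.values())
    center = {n for n, k in core.items() if k == top}
    return center, set(core) - center
```

For a toy graph in which users A, B and C all interact with one another, D interacts only with A, and E only with D, the triangle A-B-C forms the center and D and E form the periphery. Comparing the vocabulary of center and periphery users would then link this network structure back to language use.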
The goal of the research is to make the models trained on Hungarian data available together with an easy-to-use graphical user interface, through which new tasks (e.g. text classification and summarization) can be defined and results obtained without in-depth technical knowledge. The methodology developed in this research can help pave the way for a wider sociological application of language models.
References
Alshaabi, T., Dewhurst, D. R., Minot, J. R., Arnold, M. V., Adams, J. L., Danforth, C. M., & Dodds, P. S. (2021). The growing amplification of social media: measuring temporal and social contagion dynamics for over 150 languages on Twitter for 2009–2020. EPJ Data Science, 10(1), 1-28.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
Bryden, J., Funk, S., & Jansen, V. A. (2013). Word usage mirrors community structure in the online social network Twitter. EPJ Data Science, 2(1), 1-9.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Gentzkow, M., Shapiro, J., & Taddy, M. (2016). Measuring polarization in high-dimensional data: Method and application to congressional speech (No. id: 11114).
Hamilton, J. D., & Hamilton, L. C. (2010 [1981]). Models of social contagion. Journal of Mathematical Sociology, 8(1), 133-160.
Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 746-751).
Prior, M. (2013). Media and political polarization. Annual Review of Political Science, 16, 101-127.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.