Eszter Katona, Mihály Fazekas (2022): Hidden barriers to open competition: Using text mining to uncover corrupt restrictions to competition in Public Procurement

2022.07.15. Presentation Corruption in Online Editorial Media

Following a previous presentation, Eszter Katona, a member of our research group, was invited to the ICSA (International Conference on Sustainability Analysis) conference on 15 July 2022. The panel was entitled “Challenges and recent advancements in corruption risk assessment”. Eszter presented her research with her co-supervisor, Mihály Fazekas.

https://icsadias.wordpress.com/

Zsófia Rakovics, Márton Rakovics (2022): Semantic evolution of words in Hungarian PM Viktor Orbán’s speeches using a temporal word embedding model focusing on the issue of migration

2022.06.22. Conference poster The layers of political public sphere in Hungary (2001–2020)

Zsófia Rakovics and Márton Rakovics will display the results of their research entitled “Semantic evolution of words in Hungarian PM Viktor Orbán’s speeches using a temporal word embedding model focusing on the issue of migration” at the poster session of the 8th International Conference on Computational Social Science IC2S2 (The University of Chicago Booth School of Business, Chicago, IL, USA), July 19-22, 2022.

Zsófia Rakovics (2022): Analyzing the semantic evolution of words related to migration in prime ministerial speeches using a temporal word embedding model

2022.06.03. Presentation The layers of political public sphere in Hungary (2001–2020)

Zsófia Rakovics gave a presentation entitled “Analyzing the semantic evolution of words related to migration in prime ministerial speeches using a temporal word embedding model” at the annual conference of the Centre for Social Sciences on 3 June 2022.

Emese Tímea Tóth (2022): Analysis of the Twitter discourse on sustainability using the methods of natural language processing

2022.06.03. Presentation The layers of political public sphere in Hungary (2001–2020)

Emese Tímea Tóth gave a presentation entitled “Analysis of the Twitter discourse on sustainability using the methods of natural language processing” at the Centre for Social Sciences’ annual conference on June 3, 2022. The presentation was a great success, with Emese receiving the Special Award for the Best Doctoral Presentation in the non-doctoral section.

Poster of the members of our research group at the CEU Data Stories exhibition

2022.05.23. Poster

A visualisation of Fanni, Eszter and Árpád’s previous research was also exhibited at this year’s Data Stories event. https://datastoriesceu.org/gallery/happy-birthday-information-society

The primary aim of the research is to use NLP tools to review the themes that the journal Információs Társadalom (Information Society) has introduced into the domestic discourse of “information society studies” over the past 15 years, and to explore the thematic structure of the journal. In addition to content analysis, the research provides insights into the co-authorship network of the journal’s contributors and the relationship between authors and specific topics.

To read: https://doi.org/10.22503/inftars.XXI.2021.1.1

To view: https://inftars.infonia.hu/inftars20?lang=hu

Ildikó Barna, Árpád Knap (2022): Analysis of the Thematic Structure and Discursive Framing in Articles about Trianon and the Holocaust in the Online Hungarian Press Using LDA Topic Modelling

2022.05.16. Publication The layers of political public sphere in Hungary (2001–2020)

The latest paper by Ildikó Barna and Árpád Knap has been published in the journal Nationalities Papers (D1). In their paper, they examined the thematic structure and discursive framing of newspaper articles related to the Trianon Peace Treaty and the Holocaust using LDA topic models and qualitative analysis. The article is open access.

Eszter Katona (2022): Using algorithms to track corruption. Applications of natural language processing in corruption research

2022.04.22. Presentation Corruption in Online Editorial Media

Eszter Katona, a member of our research group, gave a presentation at the ‘There’s something new under the sun’ conference on innovative methods. Eszter’s presentation was titled ‘Using algorithms to track corruption. Applications of natural language processing in corruption research’, and was related to her doctoral research.

Zsófia Rakovics (2022): Demonstrating the potential of Temporal Positive Pointwise Mutual Information (TPPMI) temporal word embedding model – Semantic evolution of words in prime ministerial speeches

2022.04.22. Presentation The layers of political public sphere in Hungary (2001–2020)

Zsófia Rakovics gave a presentation entitled “Demonstrating the potential of Temporal Positive Pointwise Mutual Information (TPPMI) temporal word embedding model – Semantic evolution of words in prime ministerial speeches” at the ‘Van új a nap alatt’ (‘There’s something new under the sun’) conference of the ELTE Angelusz Róbert College for Advanced Studies in Social Sciences on 22 April 2022. The work will also be published in the accompanying conference proceedings (in print).

Eszter Katona, Mihály Fazekas (2022): Hidden barriers to open competition: Using text mining to uncover corrupt restrictions to competition in Public Procurement

2022.04.20. Presentation

Eszter Katona, a member of our research team, also presented a paper co-authored with her co-supervisor, Mihály Fazekas, at the ECPR workshop on measuring corruption. Their presentation, ‘Hidden barriers to open competition: using text mining to uncover corrupt restrictions to competition in Public Procurement’, is related to Eszter’s PhD research.

Ildikó Barna, Árpád Knap (2021): An exploration of coronavirus-related online antisemitism in Hungary using quantitative topic model and qualitative discourse analysis

2022.01.11. Publication Online Antisemitism

Soon after the outbreak of the pandemic, antisemitism connected to the coronavirus appeared around the world. Ildikó Barna and Árpád Knap, members of our group, analyzed the Hungarian online public sphere using a quantitative topic model complemented with qualitative text analysis. They also proposed a comprehensive typology of the coronavirus-related antisemitic content found there.

Árpád Knap, Diána Bartha, Ildikó Barna (2021): Analysing The Memory Politics Of Trianon And The Holocaust Using Natural Language Processing

2022.01.04. Publication The layers of political public sphere in Hungary (2001–2020)

Members of our research team (Árpád Knap, Diána Bartha, Ildikó Barna) have published a new paper analysing the memory politics of Trianon and the Holocaust using NLP. The research was carried out within the “European Memory Politics – Populism, Nationalism and the Challenges to a European Memory Culture (EuMePo)” international research network, supported by a Jean Monnet Network Grant. An English abstract is also available at the link, and an English-language paper on this research is in press.

Eszter Katona, Zoltán Kmetty, Renáta Németh (2021): Applying natural language processing to analyse the representation of corruption in the Hungarian online media

2021.07.19. Publication Corruption in Online Editorial Media

This paper presents a thematic analysis of the representation of corruption in the Hungarian online media, using a text mining tool called dynamic topic modeling. The text corpus was provided by K-Monitor and includes online articles on corruption and issues related to the misuse of public funds. Our study is exploratory in nature: it aims to identify the main topics of the articles and the dynamics of thematic changes in the period 2007–2018, including the meaning, the background and the changes of each corruption topic. The causal links examined concern whether the medium takes an oppositional or a pro-government position, and how election campaign periods affect the thematic structure of the representation of corruption. Because the ownership of the news portal Origó changed during the analysed period, a natural experiment was also possible, which we used to assess the impact of this change on the thematic structure of the corruption discourse on the portal.

https://mediakutato.hu/kiadvany/2021_02_nyar.html

Eszter Katona, Árpád Knap, Fanni Máté, Mihály Csótó – Topic modelling of the Információs Társadalom

2021.07.13. Publication

Three members of our research group, Eszter Katona, Árpád Knap and Fanni Máté, wrote an article, with the contribution of Mihály Csótó, for the special anniversary issue of the journal Információs Társadalom (Information Society). The primary aim of the study is to review the topics that the journal has brought into the Hungarian discourse of “information society studies” over the past 15 years and to explore the thematic structure of the journal with NLP methods. In addition to the content analysis, the article also provides an insight into the co-author network of the journal, as well as the relationship between the authors and each topic.

The paper is accessible at the following URL: https://doi.org/10.22503/inftars.XXI.2021.1.1
The visualizations are available at this link: https://inftars.infonia.hu/inftars20?lang=hu

Fanni Máté (2021): Social Support on an Online Forum for Depression and Anxiety

2021.06.13. Publication Discursive framing of depression in online health communities

Nowadays, online communities are typical sources of social support, which is a considerable help, especially for those suffering from depression or anxiety. The aim of my research is to investigate the patterns of social support on an online depression and anxiety forum and to explore the use of Natural Language Processing for classifying comments into categories of social support. What makes the research distinctive is that the quantitative text analysis builds on a complete qualitative analysis of the whole dataset. The conclusions of the qualitative analysis provide detailed information for defining the models and for evaluating them. This knowledge is important for investigating the potential of automatic text analysis in sociology. On average, four out of five comments are related to social support on the examined forum. Informational support appears in 59.9 percent of the supportive comments, while emotional support appears in 44.7 percent. The applied models’ accuracies are nearly 80 percent, which means that they classified the vast majority of comments into the right category. The results show that there is potential for building reliable models to classify comments into the previously defined categories of social support.

https://szociologia.hu/szociologiai-szemle/tarsas-tamogatas-megjelenese-egy-depresszio-es-szorongas-temaju-online-forumon

Eszter Katona, Renáta Németh (2021): Automated text analytics in corruption research

2021.05.22. Publication Corruption in Online Editorial Media The layers of political public sphere in Hungary (2001–2020)

Our study examines the use and possible applicability of Natural Language Processing (NLP) in corruption research. In our review, we aim to collect and summarize automated text analytics-based corruption research published after 2000. We focus on the prevalence and potential of NLP methods. We found significant differences in the textual data sources, the corruption measurement methods, and the analytical approaches used. However, there were unfortunately few mixed-type studies (in terms of data source, method, or corruption measurement method). In addition to the classic works describing the volume of corruption or the attitudes or perceptions related to it, we found results that can be used to prevent corruption and may even be directly suitable for intervention. NLP has been used in only a few studies, and mostly only for some technical tasks. Our results show that NLP is not yet widespread in this area. However, it is also clear that it can be useful and could support traditional quantitative research as an alternative tool. The aim of our article is to provide inspiration for the use of NLP in the social sciences and to draw attention to its embeddability in existing scientific discourses.

https://socio.hu/index.php/so/article/view/853

Németh, Sik, Katona (2021) – The asymmetries of the biopsychosocial model of depression in lay discourses – Topic modelling online depression forums

2021.04.26. Publication Discursive framing of depression in online health communities

New results of our project ‘NLP analysis of online depression forums’ have been published in SSM Population Health (D1), in the article “The asymmetries of the biopsychosocial model of depression in lay discourses – Topic modelling online depression forums”, written by Renáta Németh, Domonkos Sik and Eszter Katona.

Our former publications in related topics

2021.04.14. Publication Discursive framing of depression in online health communities

Sik, Domonkos: From mental disorders to social suffering: Making sense of depression for critical theories. European Journal of Social Theory (2018)

Sik, Domonkos: Válaszok a szenvedésre: A hálózati szolidaritás elmélete. Budapest, Hungary: ELTE Eötvös Kiadó (2018), 228 p.

Sik, Domonkos: A szenvedés határállapotai: Egy kritikai hálózatelmélet vázlata. Budapest, Hungary: ELTE Eötvös Kiadó (2018), 246 p.

Deckovic-Dukres, V., Hrkal, J., Németh, R., Vitrai, J., Zach, H.: Inequalities in health system responsiveness. Joint World Health Survey Report Based on Data from Selected Central European Countries, 2007. Report commissioned by the WHO.

Remák, E., Gál, R.I., Németh, R.: Health and morbidity in the accession countries. Country report – Hungary. ENEPRI Research Reports 28, Brussels: ENEPRI, 2006.

Albert, F., Dávid, B., Németh, R.: Social support, social cohesion. In: National Health Interview Survey 2003, Research Report, 2005. (In Hungarian.)

(Hungarian original: Albert Fruzsina, Dávid Beáta, Németh Renáta: Társas támogatottság, társadalmi kohézió. In: Országos Lakossági Egészségfelmérés OLEF2003, Kutatási Jelentés, 2005.)

Sik, Domonkos (2020): From Lay Depression Narratives to Secular Ritual Healing: An Online Ethnography of Mental Health Forums

2020.12.29. Publication Discursive framing of depression in online health communities

The article aims to analyse online depression forums that enable lay reinterpretation and criticism of expert biomedical discourses. Firstly, two contrasting interpretations of depression are reconstructed: expert psy-discourses are confronted with the phenomenological descriptions of lay experiences, with a special emphasis on online forums as empirical platforms hosting such debates. After clarifying the general theoretical stakes concerning contested ‘depression narratives’, the results of an online ethnography are introduced: the main topics appearing in online discussions are summarised (analysing how the abstract tensions between lay and expert discourses appear in the actual discussions), along with the ideal-typical discursive logics (analysing pragmatic advice, attempts at reframing self-narratives and expressions of unconditional recognition). Finally, based on these analyses, an attempt is made to explore the latent functionality of online depression forums by referring to a secular ‘ritual healing’ existing as an unreflected, contingent potential.

Renáta Németh, Domonkos Sik, Fanni Máté. 2020. “Machine learning of concepts hard even for humans: the case of online depression forums”. International Journal of Qualitative Methods

2020.08.25. Publication Discursive framing of depression in online health communities

Social scientists doing mixed-methods research have traditionally used human annotators to classify texts according to some predefined knowledge. The ‘big data’ revolution, the fast growth of digitized texts in recent years, brings new opportunities but also new challenges. In our research project, we aim to examine the potential of natural language processing (NLP) techniques for understanding the individual framing of depression in online forums. In this paper, we introduce a part of this project that experiments with an NLP classification (supervised machine learning) method, which is capable of classifying large digital corpora according to various discourses on depression. Our question was whether an automated method can be applied to sociological problems outside the scope of hermeneutically more trivial business applications.

The present article introduces our learning path from the difficulties of human annotation to the hermeneutic limitations of algorithmic NLP methods. We faced our first failure when we experienced significant inter-annotator disagreement. In response to the failure, we moved to the strategy of intersubjective hermeneutics (interpretation through consensus). The second failure arose because we expected the machine to learn effectively from the human-annotated sample despite its hermeneutic limitations. The machine learning seemed to work appropriately in predicting biomedical and psychological framing, but it failed in the case of sociological framing. These results show that the sociological discourse about depression is not as well founded as the biomedical and the psychological discourses, a conclusion which requires further empirical study in the future. An increasing share of machine learning solutions is based on human annotation of semantic interpretation tasks, and such human-machine interactions will probably define many more applications in the future. Our paper shows the hermeneutic limitations of ‘big data’ text analytics in the social sciences, and highlights the need for a better understanding of the use of annotated textual data and the annotation process itself.

The supplementary material of this article can be found here.

Renáta Németh, Eszter Katona, Zoltán Kmetty (2020): The Perspective of Automated Text Analytics in Social Sciences

2020.04.30. Publication Data Science in Social Research

In our paper, we present an overview of Natural Language Processing (NLP) methods, which developed in parallel with the spread of the ‘Big Data’ paradigm. We present the most promising methods for social sciences, the specific research questions they can answer and the methodological features that distinguish them from classic quantitative methods. These methods go far beyond classic quantitative text analysis based on simple word frequencies. Their modelling logic arises from machine learning methods; hence, it differs substantially from the classic social science logic that seeks explanation and causal effects. Our goal is to inspire Hungarian social scientists by providing an insight into a less-institutionalized area, since we believe that at an international level, text mining will be a standard method for empirical social science research within a few years.

Barna, Ildikó, and Árpád Knap. 2020. “A Case Study of Using LDA Topic Modeling in Sociological Research – Antisemitism in Contemporary Hungary”. Presentation, Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic.

2020.01.20. Presentation Online Antisemitism

Ildikó Barna, co-leader of our research group, gave a presentation on contemporary Hungarian antisemitism at the Institute of Formal and Applied Linguistics of Charles University, Prague. The presentation was based on the Online Antisemitism project conducted with Árpád Knap. In addition to presenting the results of the research so far, in her lecture she also discussed why sociological and domain knowledge is indispensable for interpreting the output of natural language processing.

Further information on the lecture is available on the university’s website. The video recording of the presentation can be accessed via this link.

The post about the presentation can be found here on our website.

Németh, Renáta; Koltai, Júlia (2019): Sociological knowledge discovery through text analytics. In: Rudas, Tamás – Péli, Gábor (eds.) Pathways Between Social Science and Computational Social Science – Theories, Methods and Interpretations. New York, NY, Springer.

2019.12.01. Publication Data Science in Social Research

In our work, based on recent research reports, we discuss the advances, challenges and opportunities of Big Data text analytics in sociology. The advances include the use of developments in information technology, data science, AI and NLP that were originally and primarily business- and technology-oriented, as well as the rapid growth of computing capacity. These advances provide opportunities. Social behavior can be directly observed, not only on a self-reported basis. Observation and analysis can happen in real time, and, because of the development of NLP methods, the understanding of content is getting deeper.

As our paper shows, there are new possibilities for sociological research which are in some sense just a by-product of information science. We introduce recently developed methods which can be applied to specific sociological problems outside the scope of business applications. We present sociological topics not yet studied in this area and show new insights the approach can offer to classical sociological questions. As our aim is to encourage sociologists to enter this field, we discuss the new methods on the basis of the classic quantitative approach, using its concepts and terminology, and also address the question of the new skills required of traditionally trained sociologists.