Koltai, Júlia – Kmetty, Zoltán – Bozsonyi, Károly (2019) From Durkheim to machine learning – finding the relevant sociological content in a social media discourse. In: Rudas, Tamás – Péli, Gábor (eds.) Pathways Between Social Science and Computational Social Science – Theories, Methods and Interpretations. New York, NY, Springer. (forthcoming)

2019.12.15. Publication Data Science in Social Research

Suicide has been a focus of social scientists since Durkheim. The Internet and social media sites give people new ways to express positive feelings, but they are also platforms for expressing suicidal ideation and depressive thoughts. Most of this content does not consist of notes about actual suicides, but some of it is a cry for help. However, suicide- and depression-related content varies across platforms, and it is not evident how a researcher can find it in the mass of social media data. Our paper uses a corpus of more than 4 million Instagram posts related to mental health problems. After defining the initial corpus, we present two different strategies for finding the relevant sociological content in the noisy environment of social media. The first approach starts with topic modelling (Latent Dirichlet Allocation), whose output serves as the basis of a supervised classification method built on advanced machine learning techniques. The second strategy is built on a word embedding language model based on an artificial neural network.