Beáta Gallina – Sentiment analysis on articles from online news sites

2019, Survey Statistics and Data Analytics MSc. Supervisor: Renáta Németh, PhD

Beáta Gallina (https://github.com/bgallina, www.linkedin.com/in/bgallina)

In my thesis I focus on sentiment analysis (SA) of Hungarian online news articles. In this case study, I present the methodological steps of text mining and sentiment analysis, with special emphasis on preprocessing, and the most important SA models, and then carry out a comparative analysis. In addition, I contrast two traditional models (lexicon-based and machine-learning-based) with a combination of the two, and use the best-performing model to answer the following social science research questions: To what extent do emotional attitudes towards political actors appear in the Hungarian online press? Did journalists’ perception of political actors change as a result of the elections? Is there a parallel between the results of traditional popularity polls and the results of SA, more specifically, is there a relationship between voters’ preferences and the valence of political actors’ media presence?

After evaluating the models, I worked with the Naive Bayes classifier. Based on the results, the largest sentiment category is neutral, but the dominant class depends strongly on which political actor appears in the given text. The work revealed that election day affected how politicians were portrayed in the media: most opposition politicians appeared in a more negative light in the opposition media after the vote than before. For some parties, the polls and the SA results show a similar tendency.

The accuracy of the models could be further enhanced by including other features (namely topics, n-grams and article authors) and by using a larger training set and a more comprehensive sentiment dictionary.
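
As an illustration of the classifier named above and of the n-gram features suggested as an extension, the following is a minimal sketch of a multinomial Naive Bayes sentiment classifier over word unigrams and bigrams. It is not the implementation used in the thesis: the add-one smoothing, the whitespace tokenisation, the three class labels and the English toy sentences are all assumptions made only for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercase word n-grams of length 1..n_max."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for feat in ngrams(text):
                if feat in self.vocab:               # ignore unseen n-grams
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (English stand-ins, not the thesis corpus).
train_texts = [
    "the minister praised the successful reform",
    "voters welcomed the successful program",
    "the party was criticised for the failed reform",
    "critics attacked the failed program",
    "the parliament met on monday",
    "the committee published the schedule",
]
train_labels = ["positive", "positive", "negative",
                "negative", "neutral", "neutral"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("the failed reform was criticised")
print(prediction)
```

Real Hungarian news text would need the preprocessing steps emphasised in the thesis (for example proper tokenisation and stemming or lemmatisation) before counts like these become meaningful.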

Keywords: elections, text mining, sentiment analysis, polls, machine learning, Naive Bayes classifier