Anna Sára Piros – Possibilities and limitations of using BERTopic

Survey Statistics and Data Analytics MSc, 2024. Supervisor: Zsófia Rakovics

Anna Sára Piros (LinkedIn, GitHub)

I present the application and performance of BERTopic, a new topic modelling technique, in comparison with the widely used LDA model. For a practical comparison, I tested one LDA model and two BERTopic models on a corpus of English-language speeches by Prime Minister Viktor Orbán. To compare against an optimised LDA model, I ran one BERTopic model with fixed settings and one with optimised settings. To evaluate the models, I examined topic coherence and topic diversity metrics, as well as the interpretability of the topic representations. The optimised LDA model produced redundant and incoherent topics, while both BERTopic models produced diverse, coherent and specific topics. BERTopic achieved better results, was simpler to use and offers a wide range of possibilities thanks to its modular, flexible architecture.
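The abstract does not state exactly how topic diversity was computed. As a hedged illustration only, the sketch below assumes one widely used definition: the share of unique words among the top-k words of all topics. A score of 1.0 means no two topics share a top word, while lower values signal redundant topics such as those reported for the LDA model. The function name `topic_diversity`, the default `top_k=10` and the toy word lists are hypothetical and are not taken from the thesis.

```python
def topic_diversity(topics: list[list[str]], top_k: int = 10) -> float:
    """Share of unique words among the top_k words of every topic.

    Assumed definition (not confirmed by the thesis): 1.0 means no overlap
    between topics; values close to 0 mean the topics repeat the same words.
    """
    if not topics:
        raise ValueError("need at least one topic")
    # Collect the top_k words of each topic (topics are assumed to be
    # ranked word lists, most representative word first).
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics contain no words")
    # Divide by the number of collected words, which equals
    # len(topics) * top_k whenever every topic has at least top_k words.
    return len(set(top_words)) / len(top_words)


# Invented toy topics for illustration, not results from the thesis:
toy_topics = [
    ["football", "stadium", "team"],
    ["energy", "gas", "prices"],
    ["energy", "nuclear", "plant"],
]
print(f"{topic_diversity(toy_topics, top_k=3):.3f}")  # prints 0.889
```

Here 8 of the 9 top words are unique because "energy" appears in two topics. In practice this score is usually read together with a coherence score, since a model can reach high diversity with topics that are not coherent.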
