Topic Modelling for Tweets (Bachelor Thesis, Ongoing)


Simon Peterhans


The European Commission has found that, depending on the age of the respondents, between 59% and 72% of people consume news via social media. To facilitate the analysis of the public conversation surrounding events, there is a clear need for a system that makes it easy to analyze both traditional newspaper articles and user-generated content such as comments or posts on social media. The verifir system currently collects both tweets and newspaper articles for further analysis. One important dimension of the collected data is its textual content, since it allows for the investigation of the subjects discussed. This type of study has already been done in the context of the 2017 German Federal Election.

The main objective of this thesis is to apply the topic modelling techniques from [1], with a special focus on combining newspaper articles and tweets. Given enough time, the thesis can also investigate techniques for dealing with multilingual content or the use of other algorithms.
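
To make the idea of a joint topic model over tweets and articles concrete, the following is a minimal sketch, not the method from [1] and not the verifir implementation. It swaps in plain latent Dirichlet allocation (LDA) with collapsed Gibbs sampling, written with the Python standard library only. The toy documents, the function names (tokenize, fit_lda, top_words, source_share) and all parameter values are assumptions made for illustration.

```python
import random
import re

# Hypothetical toy corpus; real input would be the collected tweets and articles.
DOCS = [
    ("tweet", "climate strike today vote green climate"),
    ("article", "parliament debates climate policy and carbon tax"),
    ("tweet", "health insurance premiums rise again"),
    ("article", "rising health insurance premiums worry voters"),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two characters."""
    return [t for t in re.findall(r"[a-zäöü]+", text.lower()) if len(t) > 2]


def fit_lda(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on (source, text) pairs.

    Tweets and articles share one vocabulary and one set of topics.
    Returns the vocabulary, topic-word counts and per-document topic shares.
    """
    rng = random.Random(seed)
    tokens = [tokenize(text) for _, text in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(tokens):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assign[d][i]
                # Remove the token from the counts, then resample its topic.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    theta = [
        [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for row in doc_topic
    ]
    return vocab, topic_word, theta


def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


def source_share(docs, theta):
    """Average the topic shares separately for each source (tweet, article)."""
    grouped = {}
    for (source, _), row in zip(docs, theta):
        grouped.setdefault(source, []).append(row)
    return {
        source: [sum(col) / len(rows) for col in zip(*rows)]
        for source, rows in grouped.items()
    }


if __name__ == "__main__":
    vocab, topic_word, theta = fit_lda(DOCS)
    print(top_words(vocab, topic_word))
    print(source_share(DOCS, theta))
```

Because both sources are fitted jointly, the per-source averages indicate how strongly each topic is represented in tweets compared with articles. Short tweets carry few tokens per document, so a real study would likely need preprocessing choices (for example pooling tweets by hashtag or user) that this sketch does not cover.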

For the evaluation, the thesis shall analyze tweets and newspaper articles collected for the Swiss national election on the 20th of October, with a focus on the Swiss-German part of Switzerland. One important analysis is the identification of the topic clusters that emerge in the data and of what they discuss. This should allow a discussion of which topics were important on social media with regard to the election.


1 Morstatter, F., Shao, Y., Galstyan, A., & Karunasekera, S. (2018, April). From alt-right to alt-rechts: Twitter analysis of the 2017 German federal election. In Companion Proceedings of the The Web Conference 2018 (pp. 621-628). International World Wide Web Conferences Steering Committee.

Start / End Dates

2019/08/19 - 2019/12/19


Research Topics