Guided Data Collection using Twitter and NewsAPI for Elections (Bachelor Thesis, Finished)


Cristina Illi


Social media has joined the traditional newspaper as one of the main news sources, and keeping up with the ever-increasing pace at which information is produced has become a challenge. However, the tools that would allow citizens, researchers and journalists to analyse this wealth of information are often very limited in scope, not publicly available, or simply do not exist yet.
In the course of this thesis, Pythia’s functionality was extended from streaming Twitter data to also searching Twitter and collecting news articles from NewsAPI. In addition, a user interface was implemented to make it easier to start the collection process and to refine the search criteria based on the analysis results.
As a use case, data from Twitter and NewsAPI relating to the Swiss vote held on May 19th, 2019 was collected. In this first iteration, the list of search terms was chosen manually. For Twitter in particular, hashtags are a central component when collecting data focused on a specific topic. To improve the catalogue of search terms, analyses of the collected data were implemented. The Jaccard coefficient in particular is valuable for defining future search terms.
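
The abstract does not describe how the Jaccard coefficient is applied, so the following is only a minimal sketch of one plausible reading: each hashtag is represented by the set of tweets it appears in, and a candidate hashtag is scored by the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B| between its tweet set and the tweet set of the current search terms. The function names, the input format (one list of hashtags per tweet) and the example hashtags are all assumptions made for illustration and are not taken from Pythia.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_candidate_hashtags(tweets, seed_terms):
    """Rank hashtags that are not yet search terms by their overlap with the seeds.

    tweets     -- iterable of hashtag lists, one list per tweet (assumed format)
    seed_terms -- hashtags currently used as search terms
    Returns (hashtag, score) pairs, highest score first, ties in alphabetical order.
    """
    # Map every hashtag to the set of tweet indices in which it occurs.
    occurrences = defaultdict(set)
    for idx, hashtags in enumerate(tweets):
        for tag in hashtags:
            occurrences[tag.lower()].add(idx)

    seeds = {s.lower() for s in seed_terms}
    # All tweets that contain at least one of the seed hashtags.
    seed_tweets = set().union(*(occurrences[s] for s in seeds if s in occurrences))

    scores = {
        tag: jaccard(ids, seed_tweets)
        for tag, ids in occurrences.items()
        if tag not in seeds
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical example data: the hashtags of five invented tweets.
example_tweets = [
    ["vote19", "taxreform"],
    ["vote19", "gunlaw"],
    ["vote19", "taxreform"],
    ["taxreform"],
    ["football"],
]
ranking = rank_candidate_hashtags(example_tweets, ["vote19"])
for tag, score in ranking:
    print(f"{tag}: {score:.2f}")
```

In this invented sample, a hashtag that frequently co-occurs with the seed term scores highest and would be a natural candidate for the next collection round, while an unrelated hashtag scores zero. Any cut-off for accepting a candidate would have to be chosen separately.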

Start / End Dates

2019/02/25 - 2019/06/25


Research Topics