Detecting protest repression incidents from tweets
[Abstract] Governments and political elites often regard protests as a threat, which is why protesters are likely to face repression. To study protest repression, social scientists need protest repression datasets. Currently, they depend on news reports to build protest and political conflict datasets. Although news reports give access to historical and international events, they have limitations, such as poor coverage of small protest events and delays in reporting incidents. This research explores the use of social media posts, especially from Twitter, to build a protest repression dataset and to overcome the limitations of news reports. We apply supervised machine learning models to a dataset of tweets sent during the Turkish Gezi Park protests in 2013 in order to detect tweets that report protest repression events. To accomplish this, we run a crowdsourcing experiment to build a training dataset of tweets, each labelled as protest-related or not and as violent or not. We then use this dataset to train two baseline machine learning models, Support Vector Machine (SVM) and Multinomial Naive Bayes (MNB), with different text representation models: Bag of Words (BOW), TF-IDF, and Word Embeddings (WE). The empirical results show that crowdsourcing, with the right settings and quality measures, provides a fast and cheap way to hand-label datasets for training machine learning models. The results also show that the baseline models perform well on the tweet classification tasks, achieving good AUC scores (a high true-positive rate and a low false-positive rate).
[Publication date] [Publishing institution] University: University of Glasgow; Department: School of Computing Science
[Validity level] [Subject classification]
[Keywords] Protests, Violence, Protest repression, Twitter, Machine learning, Text classification, Support vector machine (SVM), Naive Bayes (NB), Crowdsourcing, Figure-Eight. [Timeliness]
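
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain-Python Multinomial Naive Bayes over Bag of Words counts with Laplace smoothing, a whitespace tokenizer, and a pairwise AUC computation. The example tweets and labels are invented for illustration and do not come from the Gezi Park dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in tokenize(doc)}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        return self

    def log_scores(self, doc):
        """Unnormalised log-posterior for each class."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                num = self.counts[c][tok] + self.alpha
                den = self.totals[c] + self.alpha * v
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


def auc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = reports repression, 0 = does not.
train_docs = [
    "police fired tear gas at protesters in the park",
    "riot police used water cannon on the crowd",
    "protesters beaten and detained by police",
    "lovely sunny day in the park today",
    "concert tickets on sale this weekend",
    "great coffee near the square this morning",
]
train_labels = [1, 1, 1, 0, 0, 0]
model = MultinomialNB().fit(train_docs, train_labels)

test_docs = ["police used tear gas on the crowd", "free concert in the park this weekend"]
test_labels = [1, 0]
# Rank tweets by the log-odds of the positive class, then score the ranking.
margins = [model.log_scores(d)[1] - model.log_scores(d)[0] for d in test_docs]
test_auc = auc(margins, test_labels)
```

In practice a library such as scikit-learn would supply the vectorizers, the SVM and MNB classifiers, and the AUC metric. The hand-written version above only makes the underlying arithmetic visible.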