Determining truth in tweets using feature based supervised statistical classifiers

Janssen, Bas (2016) Determining truth in tweets using feature based supervised statistical classifiers.

Abstract: Social media is becoming increasingly important in society. It is actively used by 32 percent of the world population, and the number of active users grew by 10 percent in 2015. Social media has changed how people communicate with each other, is replacing newspapers as the way people obtain information such as (financial) news, and is changing how companies carry out their market research. Social media can be described as uncurated and uncontrolled, and its messages can propagate information in real time with an enormous reach. Several popular papers have shown the usefulness of so-called social media mining, which has attracted other researchers to perform similar experiments with social media data. Alongside these success stories, social media can also have a negative impact on society: among other well-known examples, it gives rioters a communication channel and lets users spread false rumours, which can cause panic and have far-reaching consequences. Given this context, with both the tremendous opportunities of working with social media data and the acknowledged negative effects, a way of determining the truth of claims on social media would be not only interesting but also very valuable. Such an ability could support applications that use social media data (for example, as a filter step that discards false tweets), or it could serve as a selection tool in research on the spread of false rumours. In this thesis, we show that we can determine truth by using a statistical classifier supported by three preprocessing phases: filtering, detecting types of facts and extracting facts. We base our research on a dataset of Twitter messages (including meta-information) about the 2014 FIFA World Cup. 
We determine the truth of a tweet by using seven popular fact types (involving events in the matches of the tournament, such as scoring a goal). We show that a feature-based classifier can determine truth, achieving an F1-score of 0.988 for the first class (tweets that contained no false facts) and an F1-score of 0.818 for the second class (tweets that contained one or more false facts). We show that we can determine truth for the selected kinds of facts by combining features that identify which fact type the facts in the tweet belong to with features that capture the popularity of the facts (how many times users have repeated the fact), the reach of the facts (how many people were able to see the fact) and the number of replies to the facts in the tweet. Our results look promising, and we expect that there are several situations, which we describe in detail in this thesis, in which the reliability classifier will perform about as well as it did in our experiments. We expect the classifier to perform well only in situations comparable to the dataset used in this thesis; more research is needed to obtain the same results in situations that are not comparable, and we offer some advice for that research.
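The abstract reports a separate F1-score for each of the two classes. As a reading aid, the sketch below shows how a per-class F1-score is conventionally computed from gold and predicted labels. It is not taken from the thesis: the function name `per_class_f1`, the label strings and the toy labels are all assumptions made for illustration, and the numbers it produces are unrelated to the reported results.

```python
from typing import Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable],
                 y_pred: Sequence[Hashable],
                 label: Hashable) -> float:
    """F1-score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): "no_false" = tweet contains no false facts,
# "has_false" = tweet contains one or more false facts.
y_true = ["no_false", "no_false", "no_false", "has_false", "has_false"]
y_pred = ["no_false", "no_false", "has_false", "has_false", "no_false"]

f1_no_false = per_class_f1(y_true, y_pred, "no_false")
f1_has_false = per_class_f1(y_true, y_pred, "has_false")
```

Reporting the score per class rather than as a single average matters when the classes are imbalanced, since a strong score on the majority class could otherwise mask weaker performance on the minority class.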
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)