University of Twente Student Theses

As of Friday, 8 August 2025, the current Student Theses repository is no longer available for thesis uploads. A new Student Theses repository will be available starting Friday, 15 August 2025.

The generalization performance of hate speech detection using machine learning

Coroiu, Alexandra (2019) The generalization performance of hate speech detection using machine learning.

PDF
342kB

Abstract:	The current need for automatic hate speech detection is supported by existing research and current implementations of natural language processing. The ability to generalize is an important characteristic of classification models used in natural language processing. In the case of hate speech detection, it assures accurate identification of abusive messages aimed at various groups, even if the model has not yet been trained on messages targeting those specific groups. This research measures the generalization performance of a machine learning implementation trained on sexist messages and tested on racist ones. The word count and term frequency - inverse document frequency features are extracted from text messages and used in a support vector machine with three different kernels: linear, radial basis function and polynomial. There is a substantial difference between the training F1 score benchmark of 0.8 and the testing F1 score result of hardly 0.3. The results show an overall low generalization performance for this classical machine learning method.
Item Type:	Essay (Bachelor)
Faculty:	EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:	54 computer science
Programme:	Computer Science BSc (56964)
Keywords:	Hate speech detection, Text classification, Natural language processing, Machine learning
Link to this item:	https://purl.utwente.nl/essays/78739
Export this item as:	BibTeX EndNote HTML Citation Reference Manager

Show download statistics for this publication

Repository Staff Only: item control page