University of Twente Student Theses

The generalization performance of hate speech detection using machine learning

Coroiu, Alexandra (2019) The generalization performance of hate speech detection using machine learning.

Abstract: Existing research and current applications of natural language processing support the need for automatic hate speech detection. The ability to generalize is an important property of classification models used in natural language processing. For hate speech detection, it means that abusive messages aimed at various groups are identified accurately even when the model has not been trained on messages targeting those specific groups. This research measures the generalization performance of a machine learning implementation trained on sexist messages and tested on racist ones. Word count and term frequency-inverse document frequency (TF-IDF) features are extracted from the text messages and used in a support vector machine with three different kernels: linear, radial basis function, and polynomial. The F1 score drops substantially, from a training benchmark of 0.8 to barely 0.3 on the test set. The results show low overall generalization performance for this classical machine learning method.
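The building blocks named in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch, not the thesis code: the abstract does not state the tokenization, the TF-IDF variant, or the kernel hyperparameters, so the whitespace tokenizer, the unsmoothed `idf = ln(N / df)` weighting, and the defaults `gamma`, `degree`, and `coef0` below are all assumptions. Function names are hypothetical.

```python
import math
from collections import Counter


def word_counts(message):
    """Bag-of-words counts for one message (lowercased, whitespace-split)."""
    return Counter(message.lower().split())


def tfidf_vectors(messages):
    """Sparse TF-IDF vectors, assuming tf = count / length and idf = ln(N / df)."""
    counts = [word_counts(m) for m in messages]
    n_docs = len(messages)
    # Document frequency: number of messages in which each term appears.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {t: (k / total) * math.log(n_docs / doc_freq[t]) for t, k in c.items()}
        )
    return vectors


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def linear_kernel(u, v):
    return dot(u, v)


def rbf_kernel(u, v, gamma=1.0):
    """exp(-gamma * ||u - v||^2); gamma is an assumed default."""
    sq_dist = dot(u, u) - 2 * dot(u, v) + dot(v, v)
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(u, v, degree=3, coef0=1.0):
    """(u . v + coef0)^degree; degree and coef0 are assumed defaults."""
    return (dot(u, v) + coef0) ** degree


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a cross-domain evaluation like the one described, the feature vocabulary and the classifier would be fitted on the training domain only, and `f1_score` would then be computed separately on the training-domain benchmark and on the unseen test domain to expose the gap.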
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Keywords: Hate speech detection, Text classification, Natural language processing, Machine learning
Link to this item: https://purl.utwente.nl/essays/78739