University of Twente Student Theses
The generalization performance of hate speech detection using machine learning
Coroiu, Alexandra (2019) The generalization performance of hate speech detection using machine learning.
Abstract: | The need for automatic hate speech detection is supported by existing research and by current applications of natural language processing. The ability to generalize is an important characteristic of classification models used in natural language processing. In hate speech detection, it allows abusive messages aimed at various groups to be identified accurately even when the model has not been trained on messages targeting those specific groups. This research measures the generalization performance of a machine learning implementation trained on sexist messages and tested on racist ones. Word count and term frequency-inverse document frequency (TF-IDF) features are extracted from the text messages and used in a support vector machine with three different kernels: linear, radial basis function, and polynomial. There is a substantial difference between the training F1 score benchmark of 0.8 and the testing F1 score of barely 0.3. The results show an overall low generalization performance for this classical machine learning method. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Keywords: | Hate speech detection, Text classification, Natural language processing, Machine learning |
Link to this item: | https://purl.utwente.nl/essays/78739 |
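The abstract names two feature types (word counts and TF-IDF) and one metric (the F1 score). The following is a minimal sketch of how these could be computed, not the thesis's actual implementation, which the abstract does not describe. The tokenizer, the smoothed IDF formula, the function names, and the toy messages are all assumptions made for illustration, and the support vector machine step is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_counts(docs):
    """Return (vocabulary, count vectors), one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tfidf(count_vectors):
    """Weight raw counts by a smoothed inverse document frequency.

    Assumed variant: idf = ln((1 + n_docs) / (1 + df)) + 1.
    Other variants exist; the thesis may have used a different one.
    """
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0]) if count_vectors else 0
    df = [sum(1 for v in count_vectors if v[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[c * w for c, w in zip(v, idf)] for v in count_vectors]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy messages, invented for illustration only.
docs = ["you are awful", "you are kind"]
vocab, counts = word_counts(docs)
weights = tfidf(counts)
score = f1_score([1, 1, 0, 0], [1, 0, 0, 1])
```

In a cross-domain setup like the one the abstract describes, the vocabulary and IDF weights would be fitted on the training messages only and then reused for the test messages, so that terms seen only in the test domain contribute nothing to the feature vectors.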