Efficient and accurate classification of cyber security related documents

Kobes, W.J. (2019)

Cyber security is a topical issue in the media and in publications by both organisations and academia. An overview of how many documents relate to cyber security could indicate how much attention the topic receives. International organisations in particular publish many documents on cyber security, across many different document types. To make organisations comparable on this topic, their publications must be classified by their relevance to cyber security. The aim of this research was to create a classifier that classifies such cyber security related documents efficiently and accurately. To achieve this, several text classification methods were investigated, and a selection of them was implemented. In addition, the document types that occurred in the data were analysed and grouped. The classification methods were tested on a manually classified subset of the data. The highest classification accuracy, 96%, was achieved with a Neural Network classifier. Finally, this classifier was applied to the entire data set.
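The evaluation described above (train a classifier, then measure accuracy against manually classified documents) can be illustrated with a minimal sketch. The abstract does not state the features or network architecture used, so this sketch assumes a bag-of-words representation and swaps in the simplest possible neural network, a single sigmoid unit (equivalent to logistic regression) trained with stochastic gradient descent. The function names and the toy documents and labels are hypothetical, and the sketch does not reproduce the 96% result.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also handle punctuation, stop words, etc.
    return text.lower().split()


def build_vocab(docs):
    # Map every token seen in the training documents to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def vectorize(doc, vocab):
    # Sparse bag-of-words vector {feature index: count}; unknown tokens are ignored.
    counts = Counter(t for t in tokenize(doc) if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(docs, labels, vocab, epochs=200, lr=0.5):
    # Single sigmoid unit trained by SGD on log loss (label 1 = cyber security related).
    w = [0.0] * len(vocab)
    b = 0.0
    vecs = [vectorize(d, vocab) for d in docs]
    for _ in range(epochs):
        for x, y in zip(vecs, labels):
            z = b + sum(w[i] * v for i, v in x.items())
            err = sigmoid(z) - y  # gradient of log loss with respect to z
            for i, v in x.items():
                w[i] -= lr * err * v
            b -= lr * err
    return w, b


def predict(doc, vocab, w, b):
    x = vectorize(doc, vocab)
    z = b + sum(w[i] * v for i, v in x.items())
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(docs, labels, vocab, w, b):
    # Fraction of manually labelled documents the classifier gets right.
    correct = sum(predict(d, vocab, w, b) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Hypothetical toy data standing in for the manually classified subset.
train_docs = [
    "malware attack on network",
    "phishing email steals passwords",
    "ransomware encrypts hospital files",
    "annual budget for road maintenance",
    "school lunch menu update",
    "football match results announced",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_docs = ["new ransomware attack on network", "road maintenance budget approved"]
test_labels = [1, 0]

vocab = build_vocab(train_docs)
w, b = train(train_docs, train_labels, vocab)
print(accuracy(test_docs, test_labels, vocab, w, b))
```

In practice the held-out documents would come from the manually classified subset, and a multi-layer network or a different feature representation (for example TF-IDF) could replace the single unit without changing the evaluation step.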