University of Twente Student Theses


SecBERT : Analyzing reports using BERT-like models

Liberato, M. (2022) SecBERT : Analyzing reports using BERT-like models.

Abstract:Natural Language Processing (NLP) is a field of computer science that enables computers to interact with human language through the use of specialized software. Generic NLP tools do not work well on domain-specific language, as each domain has unique characteristics that a generic tool is not trained to handle. The domain of cyber security presents a variety of unique difficulties, such as the need to understand ever-evolving technical terms and, in the case of Cyber Threat Intelligence (CTI) reports, the extraction of Indicators of Compromise (IoCs) and attack campaigns. After evaluating how existing systems addressed these issues, we created SecBERT by training BERT, a state-of-the-art neural network for NLP tasks, on cyber security data. We evaluated SecBERT using a Masked Language Modeling task, in which sentences from cyber security reports were masked and SecBERT was used to predict the hidden parts. Models trained on the cyber security language domain improved in precision by 3.4% to 5.2% over the baseline of models trained on general language performing the same task.
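The Masked Language Modeling evaluation described in the abstract can be sketched roughly as follows. This is not code from the thesis: it is a minimal illustration assuming whitespace tokenization, a hypothetical `predict` callable standing in for the model, and accuracy over masked positions as the score; a real evaluation would use a subword tokenizer and the model's vocabulary.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace a fraction of tokens with a mask token.

    Returns the masked token list and a dict mapping each masked
    position to the original (gold) token.
    """
    rng = random.Random(seed)
    masked, answers = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            answers[i] = tok
        else:
            masked.append(tok)
    return masked, answers

def mlm_score(predict, sentences, **mask_kwargs):
    """Fraction of masked positions the model fills in correctly.

    `predict` is any callable (hypothetical here) that takes a masked
    token list and returns a full token list with masks filled in.
    """
    correct = total = 0
    for sent in sentences:
        tokens = sent.split()
        masked, answers = mask_tokens(tokens, **mask_kwargs)
        preds = predict(masked)
        for i, gold in answers.items():
            total += 1
            correct += int(preds[i] == gold)
    return correct / max(total, 1)
```

A domain-adapted model like SecBERT and a general-language baseline would each be wrapped as a `predict` callable and scored on the same set of cyber security report sentences, so the reported 3.4% to 5.2% gap corresponds to the difference between their two scores.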
Item Type:Essay (Master)
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:54 computer science
Programme:Computer Science MSc (60300)
Link to this item:https://purl.utwente.nl/essays/93906
