Extracting Indicators of Compromise from Threat Reports by Leveraging the Power of LLMs

Author(s): Croquet Thorne, Mauricio (2025)

Abstract:
Recent advances in cybersecurity often aim to facilitate the search for and identification of vulnerabilities. These include Indicators of Compromise (IOCs), which serve as data artifacts that can be targeted or used with the intention of exploiting a system. Cyber Threat Intelligence (CTI) reports are produced by cybersecurity teams to expose vulnerabilities in a given security framework. Exposing these vulnerabilities includes extracting IOCs. These indicators are commonly extracted from CTI reports using rule-based extraction tools. However, these tools are limited: they only extract known, correctly structured IOC types and therefore overlook other potential indicators. Recent developments in Large Language Models (LLMs) suggest that LLMs can also be used to extract IOCs from CTI reports. This research compares rule-based and LLM-based extraction in order to understand the difference in performance between the two approaches. The results show that an LLM can extract IOCs with high completeness (80% on average), while rule-based extraction tools are more consistent and more precise, achieving higher F1-scores than LLM-based extraction.
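
To make the comparison in the abstract concrete, the following is a minimal sketch of how a rule-based extractor and its scoring could look. It is not the tooling or evaluation code used in the thesis. The patterns, the function names `extract_iocs` and `score`, and the sample report text are all hypothetical, and reading "completeness" as recall is an assumption.

```python
import re

# Hypothetical patterns for a few well-structured IOC types (illustrative only,
# not the rule set of any specific extraction tool).
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(text):
    """Return the set of (type, value) pairs matched by the fixed patterns."""
    found = set()
    for ioc_type, pattern in IOC_PATTERNS.items():
        for match in pattern.findall(text):
            found.add((ioc_type, match.lower()))
    return found


def score(predicted, gold):
    """Compute precision, recall and F1 of a predicted IOC set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up report snippet; the domain is written in "defanged" form.
report = (
    "The dropper contacted 192.168.10.5 and evil[.]example; "
    "payload MD5 d41d8cd98f00b204e9800998ecf8427e."
)
# Hypothetical gold annotations for the snippet above.
gold = {
    ("ipv4", "192.168.10.5"),
    ("md5", "d41d8cd98f00b204e9800998ecf8427e"),
    ("domain", "evil.example"),
}

predicted = extract_iocs(report)
precision, recall, f1 = score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this made-up example the patterns find the IP address and the hash but miss the defanged domain, so precision is 1.0 while recall is 2/3 and F1 is 0.8. This mirrors the limitation described above: rules only match the structures they were written for.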

Document(s):

Mauricio_Croquet_Thorne_BA_EEMCS.pdf