Extracting Indicators of Compromise from Threat Reports by Leveraging the Power of LLMs

Croquet Thorne, Mauricio

Current advancements in cybersecurity often aim to facilitate the search for and identification of vulnerabilities. These include Indicators of Compromise (IOCs), which serve as data artifacts that can be targeted or used with the intention of exploiting the affected systems. Cyber Threat Intelligence (CTI) reports are produced by cybersecurity teams with the expectation of exposing vulnerabilities in a given security framework. Exposing these vulnerabilities includes extracting IOCs. These indicators are commonly extracted from CTI reports using rule-based extraction tools. However, these tools are limited because they only extract known and correctly structured IOC types, causing them to overlook other potential indicators. Recent developments in Large Language Models (LLMs) suggest that these models can be used to extract IOCs from CTI reports. This research explores the current differences between rule-based and LLM-based extraction in order to understand how the two approaches differ in performance. It shows that an LLM can extract IOCs with high completeness (80% on average), while rule-based extraction tools are more consistent and more precise, with F1-scores that outperform those of LLM-based extraction.
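
To make the rule-based side of this comparison concrete, the following is a minimal sketch of pattern-based IOC extraction together with the precision, recall, and F1 calculation. It is not the tooling or evaluation code used in this research: the pattern set, the function names (extract_iocs, precision_recall_f1), and the sample report text are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately small pattern set. Real rule-based tools
# cover many more IOC types; these three are only for illustration.
PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "domain": re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|net|org)\b", re.IGNORECASE),
}


def extract_iocs(text):
    """Return a set of (type, value) pairs matched by the fixed patterns."""
    found = set()
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.add((kind, match.group(0).lower()))
    return found


def precision_recall_f1(predicted, expected):
    """Compare extracted IOCs against a hand-labelled set."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented sample text (not from any real CTI report). The last indicator
# is "defanged", so it does not match the well-formed domain pattern.
report = (
    "The dropper contacted 203.0.113.7 and evil-update.com; "
    "a second host was seen at hxxp://bad[.]example[.]net."
)

# Hand-labelled ground truth for the sample text (an assumption).
expected = {
    ("ipv4", "203.0.113.7"),
    ("domain", "evil-update.com"),
    ("domain", "bad.example.net"),
}

predicted = extract_iocs(report)
precision, recall, f1 = precision_recall_f1(predicted, expected)
print(sorted(predicted))
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case every match is correct, but the defanged domain is missed because it is not written in the structure the patterns expect, which is the kind of limitation the abstract attributes to rule-based tools.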

Author(s): Croquet Thorne, Mauricio (2025)
