University of Twente Student Theses
Leveraging LLMs for Automating the Extraction of Users and Financial Structures from the Multilingual Unstructured Data Leak of I-Soon
Condu, Alexandru-Stefan (2025) Leveraging LLMs for Automating the Extraction of Users and Financial Structures from the Multilingual Unstructured Data Leak of I-Soon.
Abstract: | The I-Soon data leak provides extensive insight into the inner workings of a private cybersecurity contractor involved in state-affiliated cyber-espionage activities. This paper investigates the use of Large Language Models (LLMs) to extract key users and financial structures from the multilingual, unstructured dataset leaked anonymously on GitHub. Using a set of pipelines (available at https://github.com/alexCondu/LLM-pipelines-for-I-Soon-Analysis) that cover data parsing, translation, enrichment and analysis, the LLMs demonstrate their ability to parse files (.md, .png, .txt, .log) into CSVs, translate message contents from Chinese to English using a multi-threaded approach, identify user and financial data, and build structured profiles of the actors involved. The LLM-powered pipelines reduce the time law enforcement spends on analysis while increasing its speed, scale and consistency. Despite challenges in message translation, OCR extraction and noise within the data, the LLMs can effectively determine the company’s name, URL, CEO, financial insights and user profiles, laying a foundation for AI-driven cyber-leak investigations. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Business & IT BSc (56066) |
Link to this item: | https://purl.utwente.nl/essays/107499 |
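The abstract mentions translating leaked chat messages from Chinese to English with a multi-threaded approach and writing the results to CSV. As an illustration only, the sketch below shows one way such a step could be structured in Python, assuming a thread pool from the standard library and a hypothetical `translate_zh_to_en` stub standing in for the actual LLM call. It is not the code from the author's repository, and the function names are invented for this example.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def translate_zh_to_en(text: str) -> str:
    """Hypothetical stand-in for an LLM translation call.

    A real pipeline would send `text` to a model here; this stub only
    tags the input so the threading logic can be run offline.
    """
    return f"[EN] {text}"


def translate_messages(messages: list[str], max_workers: int = 4) -> list[str]:
    """Translate messages concurrently; executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(translate_zh_to_en, messages))


def messages_to_csv(messages: list[str]) -> str:
    """Return a CSV string pairing each original message with its translation."""
    translated = translate_messages(messages)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["original", "translated"])
    for original, english in zip(messages, translated):
        writer.writerow([original, english])
    return buffer.getvalue()


if __name__ == "__main__":
    print(messages_to_csv(["你好", "转账"]))
```

Threads are a reasonable fit here because each translation request would mostly wait on a remote model rather than use the CPU, and `executor.map` returns results in input order, so every translation stays aligned with its original message in the CSV.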