University of Twente Student Theses

Adversarial attacks on neural text detectors

Fishchuk, Vitalii (2023) Adversarial attacks on neural text detectors.

Abstract: This thesis explores the effectiveness of adversarial attack methods in evading AI-text detection. It experiments with three attack categories (prompt engineering, parameter tweaking, and character-level mutations) and uses a mixed-methods approach to examine how effective such attacks are with the recently released GPT-3.5 model. The results reveal the low robustness of existing detectors against practical and resource-efficient attack methods: all three categories can be exploited to evade detection effectively. Additionally, the study shows that detector algorithms struggle with the GPT-4 model and highlights the urgent need to improve existing detectors.
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science, 81 education, teaching
Programme: Business & IT BSc (56066)
Awards: The Twente Student Conference on IT - best paper
Link to this item: https://purl.utwente.nl/essays/95835