University of Twente Student Theses


Differentially Private Synthetic Data Generation using Large Language Models

Khalil, Saad (2024) Differentially Private Synthetic Data Generation using Large Language Models.

Abstract: This paper presents an approach that integrates differential privacy with Large Language Models (LLMs) to generate synthetic data, focusing on sensitive content such as user chats. Unlike traditional methods reviewed in the literature, our methodology uses a more heuristic approach to guide generation, improving utility and fidelity while remaining computationally efficient. Our method conditions LLMs on labels and on the initial words of the input text using special tokens, preserving context and semantic integrity, a crucial property for sensitive data sources. This contrasts with more complex data generation methods, which are computationally intensive on larger datasets and guarantee neither high utility and fidelity nor preservation of style in the synthetic dataset. We assess data utility and fidelity by comparing the original and synthetic datasets on their semantic and syntactic properties, and observe that utility and fidelity decrease as the level of privacy increases. These findings underscore the difficulty of balancing privacy protection against the usefulness of synthetic data, particularly when handling sensitive information such as private chats. Striking this balance is critical for advancing research methodologies in sensitive fields without compromising data confidentiality.
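The conditioning scheme described in the abstract (a label plus the first words of an input text, separated by special tokens) can be illustrated with a minimal sketch. The token strings `<|label|>`, `<|text|>` and `<|endoftext|>`, the helper names, and the default prefix length of three words are all hypothetical assumptions for illustration; the abstract does not give the actual tokens, prefix length, or the differential privacy mechanism used during training.

```python
from dataclasses import dataclass

# Hypothetical special tokens (assumption): the thesis's actual token strings are not stated.
LABEL_TOKEN = "<|label|>"
TEXT_TOKEN = "<|text|>"
END_TOKEN = "<|endoftext|>"


@dataclass
class Example:
    label: str  # class label of the record, e.g. a chat category
    text: str   # the original (sensitive) text


def training_string(ex: Example) -> str:
    """Full sequence for fine-tuning: label marker, label, text marker, full text, end marker."""
    return f"{LABEL_TOKEN}{ex.label}{TEXT_TOKEN}{ex.text}{END_TOKEN}"


def generation_prompt(ex: Example, n_prefix_words: int = 3) -> str:
    """Prompt for generation: the label plus only the first n words of the text.

    The model is expected to continue the text after the prefix; n_prefix_words
    is a hypothetical parameter, not a value taken from the thesis.
    """
    prefix = " ".join(ex.text.split()[:n_prefix_words])
    return f"{LABEL_TOKEN}{ex.label}{TEXT_TOKEN}{prefix}"


if __name__ == "__main__":
    ex = Example("complaint", "my order arrived late and the box was damaged")
    print(training_string(ex))
    print(generation_prompt(ex))
```

In this sketch the same markers appear in training and generation strings, so a model fine-tuned on `training_string` outputs would learn to continue text after `<|text|>` in a way that matches the given label; with `n_prefix_words=0` the prompt conditions on the label alone.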
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Business & IT BSc (56066)
Link to this item: https://purl.utwente.nl/essays/98143