University of Twente Student Theses

Reducing labeled data usage in duplicate detection using deep belief networks

Janssen, Stefan C. (2016) Reducing labeled data usage in duplicate detection using deep belief networks.

Abstract:	Modern duplicate detection systems typically use supervised machine learning algorithms to build duplicate detection models. These algorithms require a large amount of manually labeled data for training. Semi-supervised deep learning techniques would allow training on not only labeled data but also unlabeled data, which is readily available. The expectation is that this will allow models trained on less manually labeled data to achieve accuracy similar to or better than that of traditional supervised algorithms.
Item Type:	Essay (Master)
Faculty:	EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:	54 computer science
Programme:	Interaction Technology MSc (60030)
Link to this item:	https://purl.utwente.nl/essays/70362
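
The abstract outlines the semi-supervised idea but does not spell out the model, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes record pairs are already encoded as binary field-agreement vectors (hypothetical toy data). A two-layer deep belief network is pretrained greedily on unlabeled pairs by stacking restricted Boltzmann machines trained with one-step contrastive divergence (CD-1), and a logistic-regression output layer is then trained on just two labeled pairs. All function names, layer sizes, and hyperparameters are illustrative. For brevity, the sketch omits the supervised fine-tuning of the pretrained layers that DBN classifiers commonly add.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function (only exponentiates non-positive values).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible
        self.b_h = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_h))]

    def visible_probs(self, h):
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_v))]

    def fit(self, data, epochs=30, lr=0.05):
        # Unsupervised: only unlabeled vectors are used, no duplicate/non-duplicate labels.
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)  # one-step reconstruction
                h1 = self.hidden_probs(v1)
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b_v[i] += lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.b_h[j] += lr * (h0[j] - h1[j])


def pretrain_dbn(unlabeled, layer_sizes, seed=0):
    # Greedy layer-wise pretraining: each RBM is trained on the previous layer's output.
    rng = random.Random(seed)
    layers, data = [], unlabeled
    for n_visible, n_hidden in zip(layer_sizes, layer_sizes[1:]):
        rbm = RBM(n_visible, n_hidden, rng)
        rbm.fit(data)
        layers.append(rbm)
        data = [rbm.hidden_probs(v) for v in data]
    return layers


def encode(layers, v):
    for rbm in layers:
        v = rbm.hidden_probs(v)
    return v


def train_logistic(features, labels, epochs=500, lr=0.1):
    # Supervised step: batch gradient descent on the small labeled set only.
    w, b, n = [0.0] * len(features[0]), 0.0, len(features)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            err = y - sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi + lr * g / n for wi, g in zip(w, gw)]
        b += lr * gb / n
    return w, b


def predict(layers, w, b, v):
    # Estimated probability that the record pair encoded by v is a duplicate.
    x = encode(layers, v)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def noisy(base, rng):
    # Hypothetical noise model: flip each bit with probability 0.1.
    return [bit if rng.random() > 0.1 else 1.0 - bit for bit in base]


# Hypothetical toy data: each bit says whether two records agree on one field.
data_rng = random.Random(42)
unlabeled = [noisy([1.0] * 6, data_rng) for _ in range(40)]
unlabeled += [noisy([0.0] * 6, data_rng) for _ in range(40)]
labeled_x, labeled_y = [[1.0] * 6, [0.0] * 6], [1, 0]  # only two labeled pairs

dbn = pretrain_dbn(unlabeled, [6, 4, 3])
w, b = train_logistic([encode(dbn, v) for v in labeled_x], labeled_y)
print(predict(dbn, w, b, [1.0, 1.0, 1.0, 0.0, 1.0, 1.0]))
```

In this sketch the unlabeled pairs are used only during pretraining, so the labeled pairs only have to fit the small output layer. That is the general mechanism by which a DBN could reduce manual labeling; whether it matches supervised accuracy for duplicate detection is the question the thesis sets out to examine.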
