University of Twente Student Theses
A Large-Scale Real World Data Integration Use Case Analysis for DuBio
Hesthaven, D. (2024) A Large-Scale Real World Data Integration Use Case Analysis for DuBio.
Abstract: | This research examines the limitations and execution performance of semantic duplicate cleaning on real-world data in DuBio, a probabilistic database developed at the University of Twente. Using the WDC Product Data Corpus, a large collection of data, it measures the overhead of running similar queries on two versions of the same database, one probabilistic and one not. The goal is to determine how increasing the size of the database, or the size of clusters within it, affects the relative difference in overhead between the two versions, and to identify any additional aspects that may influence the overhead. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/100879 |