University of Twente Student Theses
A Large-Scale Real World Data Integration Use Case Analysis for DuBio
Hesthaven, D. (2024) A Large-Scale Real World Data Integration Use Case Analysis for DuBio.
Abstract: | This research examines the limitations and execution performance of semantic duplicate cleaning on real-world data in DuBio, a probabilistic database developed at the University of Twente. Using the WDC Product Data Corpus, a large collection of data, it measures the overhead of running similar queries on two versions of the same database, one probabilistic and one not. The goal is to determine how increasing the size of the database, or the size of the clusters within it, affects the relative difference in overhead between the two versions, and to identify any additional aspects that may influence the overhead. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/100879 |