University of Twente Student Theses


Reimagining Probability Storage and Variable Assignments in DuBio : A Comparative Study of Data Structures

Maas, T.P. (2025) Reimagining Probability Storage and Variable Assignments in DuBio : A Comparative Study of Data Structures.

Abstract: Merging and inserting data from multiple sources can cause problems such as duplication, multiple labels referring to the same source, and other data quality issues. Probabilistic databases treat such "uncertain" data as valuable rather than problematic, and use its intrinsic probabilities to draw conclusions. This paper discusses, provides and analyzes solutions to a bottleneck in dictionary lookups in DuBio, a PostgreSQL extension for probabilistic data. A black-box explanation of DuBio is provided, and the rapid growth of the number of variables in its dictionary structure is addressed. Suitable solutions within the scope of this research are a table of tuples or a table of triplets that stores the variables instead of a dictionary; either can be used to calculate conjunctions of propositional formulas, called sentences. An algorithm for those calculations using only PostgreSQL is provided as a high-level diagram and as pseudocode; the full queries are provided in the appendix. The performance analysis shows the new methods to be almost strictly better: DuBio’s dictionary lookups combined with binary decision diagrams outperform them only on very small datasets.
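
The idea of keeping variables in a table of triplets and evaluating a conjunction against it can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, which runs entirely in PostgreSQL and is given in the thesis itself. The triplet layout (variable, alternative, probability), the function name conjunction_probability, and the example values are all assumptions made for illustration, as is the simplification that a sentence is a plain conjunction of assignments over independent variables.

```python
# Hypothetical triplet table: one row per (variable, alternative, probability).
# The layout and the numbers are illustrative assumptions, not taken from the thesis.
TRIPLETS = [
    ("x", 1, 0.6),
    ("x", 2, 0.4),
    ("y", 1, 0.7),
    ("y", 2, 0.3),
]


def conjunction_probability(assignments, triplets=TRIPLETS):
    """Probability that every (variable, alternative) assignment holds at once.

    Assumes variables are mutually independent and that the alternatives of
    one variable are mutually exclusive.
    """
    chosen = {}
    for var, alt in assignments:
        # Two different alternatives for the same variable contradict each other.
        if chosen.setdefault(var, alt) != alt:
            return 0.0

    result = 1.0
    for var, alt in chosen.items():
        # Scan the rows, much as a join against the triplet table would.
        matches = [p for v, a, p in triplets if v == var and a == alt]
        if not matches:
            raise KeyError(f"no row for {var}={alt}")
        result *= matches[0]
    return result
```

Under these assumptions, conjunction_probability([("x", 1), ("y", 2)]) multiplies the rows for x=1 and y=2, while conjunction_probability([("x", 1), ("x", 2)]) returns 0.0 because the two assignments cannot both hold.
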
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Awards: Best Presentation Award
Link to this item: https://purl.utwente.nl/essays/107806