University of Twente Student Theses
Using functional dependency thresholding to discover functional dependencies for data cleaning
Smink, Ruben (2021) Using functional dependency thresholding to discover functional dependencies for data cleaning.
Abstract: | Data must be cleaned before it can be processed: erroneous data needs to be filtered out or repaired in order to achieve good results. One method is to use functional dependencies (FDs) to clean data. Doing this by hand is feasible for smaller data sets, but it becomes labor-intensive as data sets grow larger and contain more attributes. In this paper, we describe a method for discovering functional dependencies that are useful for data cleaning. Using an FD-based data cleaning method, we test and evaluate how well each functional dependency performs. We then score the functional dependencies and use Bayesian optimization to find the minimum score a functional dependency needs in order to have a positive impact on the data cleaning process. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/85699 |
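The scoring-and-thresholding idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the thesis's actual method: the score used here (the fraction of rows that agree with the most common right-hand-side value for their left-hand-side value), the helper names `fd_score` and `select_fds`, the toy rows, and the fixed threshold of 0.75. In the thesis the threshold is found with Bayesian optimization, which this sketch does not implement.

```python
from collections import Counter, defaultdict


def fd_score(rows, lhs, rhs):
    """Hypothetical score for a candidate FD lhs -> rhs.

    Returns the fraction of rows whose rhs value matches the most common
    rhs value among rows sharing the same lhs values. A score of 1.0 means
    the FD holds exactly on the data.
    """
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    consistent = sum(max(counts.values()) for counts in groups.values())
    return consistent / len(rows)


def select_fds(rows, candidates, threshold):
    """Keep only the candidate FDs whose score reaches the threshold."""
    return [
        (lhs, rhs)
        for lhs, rhs in candidates
        if fd_score(rows, lhs, rhs) >= threshold
    ]


# Toy data (made up): one misspelled city violates zip -> city.
rows = [
    {"zip": "7500", "city": "Enschede", "name": "A"},
    {"zip": "7500", "city": "Enschede", "name": "B"},
    {"zip": "7500", "city": "Enschede", "name": "C"},
    {"zip": "7500", "city": "Enschde", "name": "D"},
    {"zip": "7550", "city": "Hengelo", "name": "E"},
]

# Candidate FDs as (left-hand-side attributes, right-hand-side attribute).
candidates = [(("zip",), "city"), (("city",), "name")]

# Assumed fixed threshold; the thesis tunes this value instead.
selected = select_fds(rows, candidates, threshold=0.75)
print(selected)
```

With these toy rows, `zip -> city` is violated by a single misspelling and still passes the threshold, so it could be used to repair that row, while `city -> name` scores too low and is discarded.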