Using functional dependency thresholding to discover functional dependencies for data cleaning
Author(s): Smink, Ruben (2021)
Abstract:
Data must be cleaned before it can be processed: erroneous data needs to be filtered out or repaired in order to achieve good results. One interesting method is to use functional dependencies (FDs) to clean data. This can be done by hand on smaller data sets, but it becomes labor-intensive as data sets grow larger and contain more attributes. In this paper, we describe a method for discovering functional dependencies that are useful for data cleaning. Using an FD-based data cleaning method, we can test and evaluate how well a functional dependency performs. We then score the functional dependencies and use Bayesian optimization to determine the threshold, that is, the minimum score a functional dependency needs in order to have a positive impact on the data cleaning process.
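To make the idea of scoring and thresholding FDs concrete, here is a minimal Python sketch. It is not the method from the thesis: the score used here (the fraction of rows that agree with the majority right-hand-side value within each left-hand-side group) is an assumed stand-in, the names `fd_score` and `select_fds` are hypothetical, and the threshold is fixed by hand rather than tuned with Bayesian optimization.

```python
from collections import Counter, defaultdict


def fd_score(rows, lhs, rhs):
    """Score the candidate FD lhs -> rhs on a list of dict rows.

    Assumed score (not taken from the thesis): the fraction of rows that
    agree with the most common rhs value inside their lhs group.
    A score of 1.0 means the FD holds exactly on the data.
    """
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    # Rows that agree with the majority value of their group.
    consistent = sum(c.most_common(1)[0][1] for c in groups.values())
    return consistent / len(rows)


def select_fds(rows, candidates, threshold):
    """Keep only the candidate FDs whose score reaches the threshold."""
    return [
        (lhs, rhs)
        for lhs, rhs in candidates
        if fd_score(rows, lhs, rhs) >= threshold
    ]


# Toy data: one misspelled city violates zip -> city.
rows = [
    {"zip": "7500", "city": "Enschede", "name": "A"},
    {"zip": "7500", "city": "Enschede", "name": "B"},
    {"zip": "7500", "city": "Enschde", "name": "C"},
    {"zip": "1000", "city": "Amsterdam", "name": "D"},
]
candidates = [(["zip"], "city"), (["zip"], "name")]

# Hand-picked threshold for illustration only.
selected = select_fds(rows, candidates, threshold=0.7)
print(selected)
```

On this toy data, `zip -> city` scores 0.75 and is kept, while `zip -> name` scores 0.5 and is dropped. In the setting the abstract describes, the threshold would instead be chosen by evaluating how the selected FDs affect the downstream cleaning result.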
Document(s):
Smink_BA_EEMCS.pdf