University of Twente Student Theses
The Impact of Data Noise on a Naive Bayes Classifier
Stribos, R.H. (2021) The Impact of Data Noise on a Naive Bayes Classifier.
Abstract: | Data from the real world often contains noise. Human mistakes, incorrect measurements, and malfunctioning equipment are just a few examples of how data noise arises. There has been a lot of research on how to clean such noise from databases, but there is a shortage of research on the effect of data noise on the accuracy of different classification algorithms. This research aims to study this effect on a Naive Bayes classifier and to compare it to a Random Forest classifier. In this paper, both classification algorithms are explained, as are the different types of data noise and how such noise is added to the different data sets for the experiments. Furthermore, the effect of data noise on accuracy is discussed and the two algorithms are compared to each other. This research shows that Naive Bayes is robust against noise in the training data up to around 90 percent noise, whereas noise in the testing data has an intermediate effect. In both cases, however, it is more robust than a Random Forest classifier, which is immediately and more significantly affected by noise. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/85678 |
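
The abstract mentions that noise is added to the data sets but does not say how. As a rough illustration only, the sketch below shows one common way to do it: replacing a chosen fraction of class labels with a different class. The function name `add_label_noise`, its parameters, and the choice of label noise (rather than attribute noise) are assumptions made for this example and are not taken from the thesis.

```python
import random


def add_label_noise(labels, noise_fraction, classes, seed=0):
    """Return a copy of `labels` with a fraction of entries changed to another class.

    Hypothetical helper for illustration; not the procedure used in the thesis.
    `classes` must contain at least two distinct class values.
    """
    if not 0.0 <= noise_fraction <= 1.0:
        raise ValueError("noise_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    noisy = list(labels)
    n_noisy = round(noise_fraction * len(noisy))
    # Pick distinct positions to corrupt, then assign each a different class.
    for i in rng.sample(range(len(noisy)), n_noisy):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


clean = ["spam"] * 50 + ["ham"] * 50
noisy = add_label_noise(clean, 0.3, ["spam", "ham"])
changed = sum(1 for a, b in zip(clean, noisy) if a != b)
```

Sweeping `noise_fraction` from 0 to 1 and retraining a classifier at each step would give an accuracy-versus-noise curve of the kind the abstract describes, under the assumptions stated above.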