University of Twente Student Theses
Testing the sensitivity of machine learning classifiers to attribute noise in training data.
Schooltink, W.T. (2020) Testing the sensitivity of machine learning classifiers to attribute noise in training data.
Abstract: | Real-world datasets often contain some degree of noise, arising from factors such as human error, and a lot of research has therefore been done on data cleaning algorithms. A notably less studied aspect of the data quality problem is the degree to which noise in data affects classifier accuracy. This paper provides insights through an experimental approach to determine the impact that different levels of noise in training data have on the accuracy of the resulting classifier, for Support Vector Classifiers and Random Forest Classifiers. The experiments show a high tolerance for noise in sensor data across both classifiers. With these results, one might be able to tune data cleaning algorithms or make an informed decision on which machine learning technique to choose based on a known level of data dirtiness. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/82072 |
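
The abstract describes varying the level of attribute noise in the training data before fitting a classifier. The sketch below illustrates one way such noise could be injected; it is not the thesis's actual procedure. The function name `add_attribute_noise`, the Gaussian noise model, the `scale` parameter, and the toy data are all assumptions made for illustration. Only the feature values are perturbed; class labels are left untouched, which is what distinguishes attribute noise from label noise.

```python
import random


def add_attribute_noise(rows, noise_level, scale=1.0, seed=0):
    """Return a copy of `rows` in which a fraction of attribute values is perturbed.

    rows:        list of feature vectors (lists of floats); labels are stored
                 separately and are never modified.
    noise_level: probability in [0, 1] that any single value is corrupted.
    scale:       standard deviation of the zero-mean Gaussian added to a
                 corrupted value (hypothetical noise model, an assumption).
    seed:        seed for the random generator, so runs are repeatable.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    rng = random.Random(seed)
    noisy = []
    for row in rows:
        new_row = []
        for value in row:
            # Corrupt this value with probability `noise_level`.
            if rng.random() < noise_level:
                value = value + rng.gauss(0.0, scale)
            new_row.append(value)
        noisy.append(new_row)
    return noisy


# Toy training set (invented values, not data from the thesis).
train_X = [[0.1, 1.2, 3.4], [0.3, 1.0, 3.1], [2.5, 0.2, 0.9], [2.7, 0.4, 1.1]]
train_y = [0, 0, 1, 1]

for level in (0.0, 0.25, 0.5, 1.0):
    noisy_X = add_attribute_noise(train_X, level, scale=0.5, seed=42)
    changed = sum(
        a != b
        for clean_row, noisy_row in zip(train_X, noisy_X)
        for a, b in zip(clean_row, noisy_row)
    )
    print(f"noise level {level:.2f}: {changed} of 12 values perturbed")
```

In an experiment of the kind the abstract outlines, each noisy copy of the training set would then be used to fit a classifier (for example scikit-learn's `SVC` or `RandomForestClassifier`), and accuracy would be measured on a clean held-out test set so that the effect of each noise level can be compared.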