University of Twente Student Theses

Testing the sensitivity of machine learning classifiers to attribute noise in training data.

Schooltink, W.T. (2020) Testing the sensitivity of machine learning classifiers to attribute noise in training data.

Abstract: Real-world datasets often contain some degree of noise, arising from factors such as human error, and a lot of research has therefore been done on data cleaning algorithms. A notably less studied aspect of the data quality problem is the degree to which noise in data affects classifier accuracy. This paper takes an experimental approach to determine the impact that different levels of attribute noise in training data have on the accuracy of the resulting classifier, for Support Vector Classifiers and Random Forest Classifiers. The experiments show a high tolerance for noise in sensor data across both classifiers. With these results, one might be able to tune data cleaning algorithms or make an informed choice of machine learning technique based on a known level of data dirtiness.
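The kind of experiment the abstract describes can be illustrated with a minimal sketch: corrupt a fraction of the attribute values in the training set, train on the corrupted copy, and score on clean test data. Everything here is an assumption made for illustration, not the thesis's method: the noise model (replacing a value with a uniform draw from that attribute's observed range), the synthetic two-class data, and the nearest-centroid classifier, which stands in for the Support Vector and Random Forest classifiers so the example needs only the Python standard library. The helper names `add_attribute_noise`, `fit_centroids`, `predict`, `accuracy` and `make_blobs` are hypothetical.

```python
import random


def add_attribute_noise(rows, noise_level, rng):
    """Return a noisy copy of rows (assumed noise model, not the thesis's).

    Each attribute value is replaced, with probability noise_level, by a
    value drawn uniformly from that attribute's observed [min, max] range.
    The input rows are left unmodified.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be between 0 and 1")
    n_features = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_features)]
    highs = [max(r[j] for r in rows) for j in range(n_features)]
    return [
        [
            rng.uniform(lows[j], highs[j]) if rng.random() < noise_level else value
            for j, value in enumerate(row)
        ]
        for row in rows
    ]


def fit_centroids(rows, labels):
    """Stand-in classifier: compute the mean attribute vector per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Predict the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], row)),
    )


def accuracy(centroids, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def make_blobs(n_per_class, rng):
    """Synthetic data: two 2-D Gaussian clusters centred at 0 and 4."""
    rows, labels = [], []
    for label, centre in ((0, 0.0), (1, 4.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            labels.append(label)
    return rows, labels


rng = random.Random(0)
train_x, train_y = make_blobs(200, rng)
test_x, test_y = make_blobs(200, rng)  # the test set is kept clean

results = {}
for level in (0.0, 0.2, 0.4):
    noisy_x = add_attribute_noise(train_x, level, rng)
    model = fit_centroids(noisy_x, train_y)
    results[level] = accuracy(model, test_x, test_y)
    print(f"noise={level:.1f} accuracy={results[level]:.3f}")
```

To compare the classifiers named in the abstract, the same noisy training rows could instead be passed to scikit-learn's `SVC` and `RandomForestClassifier`; that substitution is not shown here, and the figures this sketch prints say nothing about the thesis's own results.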
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Link to this item: http://purl.utwente.nl/essays/82072