Comparison of different types of auto-encoders for data cleaning

Alberts, K.J.

Machine learning techniques have considerable potential for data cleaning, for example in repairing corrupted data or restoring missing information. Previous research has produced many ways of applying machine learning to this task, one of which is the auto-encoder. Many types of auto-encoder have since emerged, but each is usually tested on a single dataset or compared with only one other type. This raises the question of which type performs best and whether auto-encoders can be used for data cleaning more generally. In this research, we propose an experimental comparison of five auto-encoder types (basic, sparse, contractive, denoising and variational) to determine which are the most suitable and most accurate for cleaning three datasets: CIFAR-10 and MNIST (images) and US Weather Data (tabular). We implement a testing framework that makes it easy to add new auto-encoders and datasets, and use it to test the five types of auto-encoder on two image datasets. We find that some types of auto-encoder perform similarly regardless of the type of dataset, whereas others work considerably better on certain types of data.
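As a rough illustration of what cleaning with an auto-encoder means, the following minimal sketch trains a linear denoising auto-encoder on synthetic tabular data with NumPy. It is not the testing framework described in the abstract: the data generator, the noise model, the linear architecture and all function names (`make_data`, `train_denoising_autoencoder`, `repair`) are assumptions made for this example only.

```python
import numpy as np


def make_data(rng, n_samples, noise_std=0.3):
    """Return (clean, corrupted) synthetic tabular data with 8 features.

    Assumption for illustration: clean rows lie on a 2-dimensional
    subspace and the corruption is additive Gaussian noise.
    """
    basis = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                      [0, 0, 0, 0, 1, 1, 1, 1]], dtype=float) / 2.0
    latent = rng.normal(size=(n_samples, 2))
    clean = latent @ basis
    corrupted = clean + noise_std * rng.normal(size=clean.shape)
    return clean, corrupted


def train_denoising_autoencoder(corrupted, clean, rng, hidden=2,
                                lr=0.05, steps=2000):
    """Fit a linear encoder and decoder by plain gradient descent.

    The loss is the squared error between the reconstruction of the
    corrupted input and the clean target (the denoising objective).
    """
    n_samples, n_features = corrupted.shape
    w_enc = 0.1 * rng.normal(size=(n_features, hidden))
    w_dec = 0.1 * rng.normal(size=(hidden, n_features))
    for _ in range(steps):
        code = corrupted @ w_enc              # encode to the bottleneck
        error = code @ w_dec - clean          # reconstruction minus target
        # Both gradients are computed before either weight is updated.
        grad_dec = (2.0 / n_samples) * code.T @ error
        grad_enc = (2.0 / n_samples) * corrupted.T @ (error @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec


def repair(corrupted, w_enc, w_dec):
    """Pass corrupted rows through the trained auto-encoder."""
    return corrupted @ w_enc @ w_dec


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_clean, train_corrupted = make_data(rng, 500)
    w_enc, w_dec = train_denoising_autoencoder(train_corrupted, train_clean, rng)
    test_clean, test_corrupted = make_data(rng, 200)
    repaired = repair(test_corrupted, w_enc, w_dec)
    print("error before cleaning:", np.mean((test_corrupted - test_clean) ** 2))
    print("error after cleaning: ", np.mean((repaired - test_clean) ** 2))
```

In this sketch, comparing the mean squared error against the clean data before and after repair gives a simple accuracy measure for cleaning. The other auto-encoder types named in the abstract would differ mainly in the loss or architecture, which this example does not attempt to cover.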


Author(s): Alberts, K.J. (2021)
