University of Twente Student Theses
Added value of machine learning in retail credit risk
Gorter, D. (2017) Added value of machine learning in retail credit risk.
Abstract: | This thesis aims to pinpoint the added value of machine learning in the domain of retail credit risk, where (logistic) regression approaches are most commonly used. Credit data from the online peer-to-peer lending platform Lending Club is used to build retail credit risk models with Logistic Regression, Random Forests, Neural Networks and Support Vector Machines. A level playing field is created for the models by applying a single data transformation, so that the input of all models is equal. This is achieved with a Weight of Evidence transformation, which produces a scaled data set without outliers or missing values. The resulting retail credit risk models are evaluated in terms of modeling approach and in terms of model performance in order to identify added value. The research shows that machine learning does add value over the traditional (logistic) regression approach. Whereas the machine learning algorithms can handle all variables and decide for themselves how to model the relationships between them, the (logistic) regression approaches need a careful selection of subsets of independent variables. This can be valuable in the future, when the amount of information available about loan applicants grows beyond what there is time to examine for data issues such as correlated variables. The research has also found added value of machine learning in terms of model performance: the Neural Networks and Random Forests produce more accurate results than (logistic) regression. The Support Vector Machines, however, are not suitable for retail credit risk predictions, because the best predictions are made when models are trained on large amounts of data, which proved to be problematic for the Support Vector Machines. The results of this research depend on the Weight of Evidence transformation, which is shown to be suboptimal for the Random Forests and possibly for the other machine learning models.
However, even though this transformation is suitable for Logistic Regression, Logistic Regression is still outperformed by the Random Forests and Neural Networks. |
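The Weight of Evidence (WoE) idea mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation; it assumes one common convention, WoE = ln(share of goods in a bin / share of bads in a bin) (some texts use the inverse ratio), hand-picked cut points, and a hypothetical smoothing pseudo-count to avoid log(0) in bins that contain only goods or only bads. The function names (`woe_table`, `bin_value`, `woe_transform`) and the toy data are illustrative only. The sketch shows why the transformed data has no outliers or missing values: extreme values fall into the outermost bin, missing values get their own bin, and every variable ends up on the same log-odds scale.

```python
import math
from collections import Counter


def woe_table(bins, targets, smoothing=0.5):
    """Weight of Evidence per bin: ln(share of goods in bin / share of bads in bin).

    bins: bin label per observation (None = missing, kept as its own bin)
    targets: 1 for default ("bad"), 0 for repaid ("good")
    smoothing: assumed pseudo-count added to every cell so empty cells do not give log(0)
    """
    goods = Counter(b for b, y in zip(bins, targets) if y == 0)
    bads = Counter(b for b, y in zip(bins, targets) if y == 1)
    labels = set(bins)
    total_good = sum(goods.values()) + smoothing * len(labels)
    total_bad = sum(bads.values()) + smoothing * len(labels)
    return {
        b: math.log(((goods[b] + smoothing) / total_good)
                    / ((bads[b] + smoothing) / total_bad))
        for b in labels
    }


def bin_value(x, edges):
    """Map a raw value to a bin index using sorted cut points; None stays None."""
    if x is None:
        return None
    for i, edge in enumerate(edges):
        if x <= edge:
            return i
    return len(edges)  # everything above the last cut point, including outliers


def woe_transform(values, targets, edges):
    """Replace each raw value by the WoE of the bin it falls into."""
    bins = [bin_value(x, edges) for x in values]
    table = woe_table(bins, targets)
    return [table[b] for b in bins], table


# Toy example (hypothetical data): income with a missing value and an outlier.
income = [20, 35, None, 50, 80, 1000, 25, None, 60, 45]
default = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
encoded, table = woe_transform(income, default, edges=[30, 55])
```

In this toy data the outlier 1000 receives the same encoding as 80, the two missing incomes share one finite value, and the bins with more repayments get higher WoE values, so the encoded column is monotone in credit quality. In practice the cut points would be chosen from the data (for example by quantiles or by merging bins with similar default rates) rather than fixed by hand.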
Item Type: | Essay (Master) |
Faculty: | BMS: Behavioural, Management and Social Sciences |
Subject: | 50 technical science in general, 83 economics |
Programme: | Industrial Engineering and Management MSc (60029) |
Link to this item: | https://purl.utwente.nl/essays/72314 |