University of Twente Student Theses

Predicting Loan Default in Retail Credits: The Case of Indonesian Market

Haq, Muhammad Yasir Muzayan (2018) Predicting Loan Default in Retail Credits: The Case of Indonesian Market.

Full text not available from this repository.

Full Text Status:	Access to this publication is restricted
Abstract:	Credit scoring has been a popular research topic for decades. It generally aims to predict the probability that a particular loan will default. The most recent and popular way to predict loan default is a data-driven approach, often referred to as data mining. This approach lets machine learning algorithms study the patterns within a dataset to build a prediction model. Such a model is commonly used in the credit approval process to assess the default risk of a particular loan application based on its characteristics. This thesis presents an application of Knowledge Discovery in Data (KDD) to retail credit data from the Indonesian market to predict loan default, especially for new loan applications. The data originated from a rural sharia bank in Indonesia, and the dataset was developed for this study using a bottom-up approach: it started with a list of potentially relevant features found in the literature, whose values were then populated by querying the relevant information from the raw operational tables. After preprocessing tailored to predictive modelling of the credit risk of new retail loan applications, the dataset consists of 25 features with 56,903 non-default cases and only 1,498 default cases. Given the extreme class imbalance between default and non-default cases in the dataset (100:2 imbalance ratio), special attention was given to handling it. Two class rebalancing techniques were applied: 1) SMOTE+Tomek-link, which produced an oversampled dataset, and 2) random undersampling. Although both datasets have a better class balance, the undersampled dataset contains relatively more outliers than the oversampled dataset according to Principal Component Analysis. Additionally, several alternative metrics were used to measure prediction performance so that the prediction accuracy in the minority class (default cases) is not masked by the majority class (non-default cases).
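To make the second rebalancing technique concrete, the following is a minimal sketch of random undersampling in plain Python. The function name `random_undersample`, the label coding (1 = default, 0 = non-default), the fixed seed, and the 1:1 target ratio are assumptions made for illustration; the abstract does not state which implementation or target ratio the thesis used. Only the class counts come from the abstract.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the minority class.

    Hypothetical helper: assumes a 1:1 target ratio, which the abstract
    does not specify.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # The smallest class determines how many rows each class keeps.
    n_min = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        kept = rng.sample(members, n_min)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * n_min)
    return out_rows, out_labels


# Class counts reported in the abstract; the rows are placeholder IDs.
labels = [0] * 56903 + [1] * 1498
rows = list(range(len(labels)))

bal_rows, bal_labels = random_undersample(rows, labels)
balanced_counts = Counter(bal_labels)
```

Under these assumptions the undersampled set keeps all 1,498 default cases and an equally sized random subset of the non-default cases, which illustrates why it is much smaller than the original dataset.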
The experiment was performed using 10-fold cross-validation on the original and the rebalanced datasets with five classification algorithms: logistic regression, ANN, decision tree, random forest, and naïve Bayes. The results showed that an overall accuracy measure such as PCC is a less appropriate performance indicator in imbalanced learning. Hence, the F-measure was chosen to compare prediction performance, and it suggested that random forest is the best-performing algorithm. The results also suggested that both rebalanced datasets yield similarly improved performance compared to the imbalanced dataset. However, the undersampled dataset is more efficient in terms of computing time and resource consumption because of its smaller size. The experiment was extended by introducing two additional features concerning the socio-economic situation in which the loan was granted, namely the Quarter of Ramadan (QoR) and Issued After Financial Crisis (IAFC). The results showed that adding either QoR or IAFC to the dataset had an insignificant impact on prediction performance. Moreover, the mean decrease Gini (MDG) impurity score from the random forest algorithm, which indicates feature importance, suggested that both features are less important in predicting loan default, although IAFC is relatively more important than QoR. The MDG score also gave insight into the importance of other features, which might help the bank improve its credit approval process by implementing a data-driven approach in order to increase decision quality and reduce personal sentiment and subjectivity.
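The point about overall accuracy can be illustrated with a short worked example. The sketch below takes the class counts reported in the abstract and scores a hypothetical classifier that labels every application as non-default. It is not one of the thesis's models. PCC is read here as overall accuracy (the share of correctly classified cases), and the F-measure is assumed to be the balanced F1 score computed on the default class. The function names are illustrative.

```python
def pcc(tp, fp, fn, tn):
    """Overall accuracy: share of all cases that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp, fp, fn):
    """F1 score for the positive (default) class; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Class counts reported in the abstract.
n_non_default, n_default = 56903, 1498

# Hypothetical baseline: predict "non-default" for every application,
# so no default is ever detected (tp = 0) and none is falsely flagged (fp = 0).
tp, fp = 0, 0
fn, tn = n_default, n_non_default

baseline_pcc = pcc(tp, fp, fn, tn)
baseline_f = f_measure(tp, fp, fn)

print(f"PCC: {baseline_pcc:.4f}, F-measure (default class): {baseline_f:.4f}")
```

Under these assumptions the baseline reaches a PCC of about 0.974 while its F-measure on the default class is 0, since it never identifies a single default.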
Item Type:	Essay (Master)
Faculty:	EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:	54 computer science, 83 economics, 85 business administration, organizational science
Programme:	Business Information Technology MSc (60025)
Link to this item:	https://purl.utwente.nl/essays/74475