University of Twente Student Theses


Analysing machine learning algorithms to automate a decision making process

Marsman, B. (2020) Analysing machine learning algorithms to automate a decision making process.

Abstract: This research was conducted at K&R Consultants, a consultancy company in the construction sector located in Apeldoorn. The company provides advice on costs, installation and management. One of its activities is analysing the budgets of contractors. Chapter 1 explains the motivation for this thesis and illustrates the problems encountered in the manual analysis at K&R Consultants. From these problems, the main research question is formulated, which is later divided into multiple sub-questions. The main research question is: How can the application of machine learning algorithms help K&R Consultants to automate the labelling process and in turn decrease the lead time of the process?
Chapter 2 explains the current situation of the manual analysis at K&R Consultants. This process runs from exporting the open budget from PDF to an Excel file, to filling in the company's tool, to labelling all the elements present in the open budget, and finally to comparing the prices of K&R Consultants with those of the contractors.
After this, a literature study on machine learning is conducted. It describes various types of machine learning, as well as the different algorithms that fit the type of machine learning relevant to K&R Consultants, which is supervised machine learning. In addition, ways to validate a machine learning algorithm are explained: the division of the dataset into training and testing sets, the confusion matrix, the F-score, the ROC curve, and finally k-fold cross-validation.
Next, the data of K&R Consultants is analysed in RapidMiner, a data science software platform that provides an integrated environment for data preparation, machine learning, deep learning, and predictive analytics. Before the analysis, the data is prepared by removing inconsistent data and by balancing the data.
After this, the predictive models and validation models are built. Chapter 5 gives the results. Here we find that the following machine learning algorithms work well on the data of K&R Consultants: Gradient Boosted Trees, ID3, Naive Bayes, and k-Nearest Neighbour. These algorithms are then analysed using k-fold cross-validation, which shows that there is very little bias in the algorithms, because the accuracy does not fluctuate much. We also explain why the ROC curve does not work for the data of K&R Consultants: the error is not uniformly distributed in the matrix. This chapter also discusses the advantages and disadvantages of the best performing algorithms.
Finally, several recommendations are given to the company. The first is continuous improvement: K&R Consultants should be aware that new data must be assessed before being added to the dataset. The second is to show the three highest predictions for every prediction when implementing the algorithm into the analysis. Sometimes the second prediction is the right one, so showing the three highest predictions lets an expert pick the right prediction more efficiently. There are also some points of improvement. K&R Consultants should keep in mind that a more accurate machine learning algorithm could become available in the future. In addition, adding new data to the dataset can cause an imbalance again, which may favour a different machine learning algorithm.
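The recommendation to show the three highest predictions can be illustrated with a minimal Python sketch. It is not taken from the thesis, which built its models in RapidMiner. The function name `top_k_labels`, the label names, and the probability values are all hypothetical, and the sketch assumes the classifier returns one probability per label.

```python
from typing import Dict, List, Tuple


def top_k_labels(probabilities: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k labels with the highest predicted probability, best first."""
    # Sort by probability, highest first; break ties alphabetically for stable output.
    ranked = sorted(probabilities.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical class probabilities for one budget element.
# The labels and values are invented for illustration only.
example = {
    "foundation": 0.41,
    "roofing": 0.38,
    "installation": 0.15,
    "finishing": 0.06,
}

suggestions = top_k_labels(example)
for label, probability in suggestions:
    print(f"{label}: {probability:.2f}")
```

In this invented example the first and second labels are close in probability, which is the situation where offering an expert a short ranked list rather than a single label could help.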
Item Type: Essay (Bachelor)
Faculty: BMS: Behavioural, Management and Social Sciences
Subject: 58 process technology
Programme: Industrial Engineering and Management BSc (56994)