University of Twente Student Theses

Identification of a drinking water softening model using machine learning

Jenden, J.N. (2020) Identification of a drinking water softening model using machine learning.

Abstract:	This report identifies Machine Learning (ML) models of the water softening treatment process in a Water Treatment Plant (WTP), using two ML algorithms applied to time series data: eXtreme Gradient Boost (XGBoost) and Recurrent Neural Networks (RNNs). In addition, a control method for the draining of pellets in the softening reactor is explored, based on collected softening treatment data and the resulting ML models. In particular, the pH is identified as a potential variable for the control of pellet draining within a softening reactor. The pH forecasts produced by the ML models are able to predict the future behaviour of the pH and can potentially anticipate when the pellets should be drained. To implement the ML algorithms, the inputs and outputs of the ML models are first identified; the pH within the softening reactor is selected as the output because of its potential control properties. Subsequently, water softening treatment data is collected from a water company in the Netherlands. After collection, the data is pre-processed and analysed in order to better interpret the ML results and to improve the performance of the trained ML models. During pre-processing, two ML data splitting methods are implemented: walk-forward and train-validation-test. The performance of the models is gauged using two evaluation metrics: Mean Squared Error (MSE) and R-squared. Lastly, predictions are carried out using the trained ML models for a set of forecast horizon lengths. Comparing the XGBoost and RNN pH predictions, the RNN generally performs better than XGBoost: the RNN model with a train-validation-test split has an MSE value of 0.0004 (4 d.p.) and an R-squared value of 0.9007 (4 d.p.). Extending the forecast horizon to four hours for the RNN walk-forward model yielded MSE values below 0.01, but only negative R-squared values, suggesting that the predictions are relatively close to the actual data points but do not follow their shape well. The evaluation metric results suggest that it is possible to create a well-performing model using the RNN method for a forecast horizon length of one minute. However, this model is heavily dependent on the current pH value and is therefore deemed not to be a good predictor of the pH. Increasing the horizon length leads to only slightly lower MSE values, but the R-squared values are in general negative, indicating a poor fit.
Keywords:	Machine Learning (ML), water softening treatment, Water Treatment Plant (WTP), time series, eXtreme Gradient Boost (XGBoost), Recurrent Neural Network (RNN), pH, control, pellet draining, softening reactor, forecast, inputs, outputs, pre-process, data splitting method, walk-forward, train-validation-test, evaluation metric, Mean Squared Error (MSE), R-squared, prediction, forecast horizon
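The abstract reports forecasts with a small MSE but a negative R-squared. The sketch below shows how the two metrics can diverge when the target varies only slightly, as a pH signal might. It is a minimal illustration and is not taken from the thesis: the pH readings, the forecast values, and the helper names `mse` and `r_squared` are hypothetical, and R-squared is assumed to be the usual coefficient of determination, 1 - SS_res / SS_tot.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical pH readings that vary only slightly around 8.5.
actual = [8.50, 8.52, 8.48, 8.51, 8.49, 8.50]
# Hypothetical forecast: numerically close, but moving in the wrong direction.
predicted = [8.50, 8.47, 8.53, 8.48, 8.52, 8.50]

print(f"MSE       = {mse(actual, predicted):.4f}")
print(f"R-squared = {r_squared(actual, predicted):.4f}")
```

With these made-up values the MSE is well below 0.01, while R-squared is negative. The squared errors are small in absolute terms, yet they exceed the variance of the signal itself, so the forecast does worse than simply predicting the mean pH.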
Item Type:	Essay (Master)
Faculty:	EEMCS: Electrical Engineering, Mathematics and Computer Science
Programme:	Systems and Control MSc (60359)
Link to this item:	https://purl.utwente.nl/essays/80662