University of Twente Student Theses


Incorporating spatial autocorrelation in machine learning

Liu, Xiaojian (2020) Incorporating spatial autocorrelation in machine learning.


Abstract:	Applications of machine learning algorithms have increased substantially in the geosciences. However, the predictive performance of these algorithms can be biased if the spatial autocorrelation present in geospatial data is left unaddressed. This study investigates an approach that accounts for spatial autocorrelation by introducing additional spatial features into machine learning. We explore the incorporation of two kinds of spatial features, spatial lag and eigenvector spatial filtering (ESF) features, into the widely used random forest (RF) algorithm. Least absolute shrinkage and selection operator (LASSO) selection is used to determine the best subset of the candidate spatial features to include in the model. The effects of these spatial features are illustrated on two public datasets of different sizes (the Meuse dataset and the California housing dataset). Normal and spatial cross-validation (CV) are applied to hyper-parameter tuning and performance evaluation. We use Moran’s I and local indicators of spatial association (LISA) to assess whether spatial autocorrelation is captured at both global and local scales. The results show that RF models combined with either spatial lag or ESF features generally yield lower training errors (a difference of up to 38%) than the model with no spatial features. The global spatial autocorrelation of the residuals is reduced (up to a 95% decrease in Moran’s I) when spatial features are included. The local patterns, especially homogeneous clusters, are weakened as well. However, the generalization error of the spatial models is considerably higher under spatial CV than under normal CV (an average difference of up to 43%). Normal CV generally returns a lower generalization error, which indicates a potentially over-optimistic estimate.
It can be concluded that the two proposed spatial features are able to account for spatial autocorrelation in machine learning. The differences between normal and spatial cross-validation should be considered whenever a spatial model is evaluated. This study demonstrates the effectiveness of spatial features in capturing spatial autocorrelation and provides insight into the use of spatial cross-validation for performance estimation.
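As a rough illustration of two quantities named in the abstract, the sketch below builds a spatial lag feature and computes the global Moran’s I with NumPy. It is not taken from the thesis: the function names (`knn_weights`, `spatial_lag`, `morans_i`), the choice of row-standardised k-nearest-neighbour weights, and the toy grid data are all assumptions made for the example.

```python
import numpy as np


def knn_weights(coords, k=4):
    """Row-standardised k-nearest-neighbour weights (an assumed neighbourhood definition)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise Euclidean distances; a point is never its own neighbour.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    # Each row sums to 1, so W @ y is a neighbourhood mean.
    return w / w.sum(axis=1, keepdims=True)


def spatial_lag(values, w):
    """Spatial lag: the weighted mean of each observation's neighbours."""
    return w @ np.asarray(values, dtype=float)


def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)


# Toy data (hypothetical): a 6 x 6 grid whose values follow a smooth gradient.
xs, ys = np.meshgrid(np.arange(6), np.arange(6))
coords = np.column_stack([xs.ravel(), ys.ravel()])
values = (xs + ys).ravel().astype(float)

w = knn_weights(coords, k=4)
lag_feature = spatial_lag(values, w)   # could be appended as an extra model input
gradient_i = morans_i(values, w)       # positive for spatially smooth values
```

In a workflow like the one the abstract describes, the lag column would be added to the predictors before fitting the random forest, and `morans_i` would be applied to the model residuals rather than to the raw values; a value near zero suggests little remaining global autocorrelation.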
Item Type:	Essay (Master)
Faculty:	ITC: Faculty of Geo-information Science and Earth Observation
Programme:	Geoinformation Science and Earth Observation MSc (75014)
Link to this item:	https://purl.utwente.nl/essays/83881