University of Twente Student Theses


The diversity of deprived areas: applications of unsupervised machine learning and open geodata

Trento Oliveira, Lorraine (2021) The diversity of deprived areas: applications of unsupervised machine learning and open geodata.

Abstract: The rapid growth of deprived areas in Low- to Middle-Income Countries (LMICs) is a major urban challenge that requires consistent and up-to-date information, specifically about the physical and living conditions in such areas. When available, census data provides detailed socioeconomic information at the household level, but it is resource-intensive to collect and is aggregated at the enumeration-area level, masking important spatial differences. The increasing availability of very-high-resolution (VHR) imagery has boosted the publication of remote sensing (RS) mapping studies. Yet only a few studies focus on characterizing deprived areas, because RS studies mostly focus on their common physical morphologies and thus oversimplify their features. Moreover, even fewer studies address city-wide analysis due to VHR acquisition and computational costs. To address these gaps, this research explores the intra-urban diversity of deprived areas using solely open RS and geo-data sources for São Paulo, Brazil. More specifically, it makes use of the potential of unsupervised machine learning (ML) models to capture intra-urban differences in deprived areas. First, based on the literature, a pool of GIS- and RS-based features is developed to derive morphological and environmental characteristics of the study area. Next, a k-means clustering model is trained while running several optimisation experiments, including feature selection techniques and the inclusion of census-derived features. A feature importance tool is coupled to the k-means model to highlight the relevance of specific features for each of the four resulting cluster types. The first cluster, “Infant settlements in open spaces”, is characterised by low accessibility to services and infrastructure, very sparse occupation and the presence of vegetation. The second, “Unordered and poorly consolidated settlements”, is marked by steep terrain, lack of infrastructure and relatively low population densities. The third, “Less deprived settlements connected to non-residential areas”, is identified mainly by a more regular layout and mixed land uses. The fourth, “Densely urbanized and mature settlements with irregular layout”, is highly influenced by built-up density and complex (slum-like) morphology. The qualitative validation shows that the unsupervised model successfully captures the intra-urban diversity of deprived settlements in São Paulo, highlighting higher precariousness in the second identified cluster. The assessment demonstrates that the proposed approach can be an alternative to current characterization studies using solely open data, providing a gridded output that supports the scalability of the model and its transferability to different cities. The cluster types are profiled and can be used to support decision-making. Moreover, the census-derived features add an important additional perspective to the characterization analysis. For further research, the main recommendation is to transfer the approach to other Brazilian cities and to scale it to regional and national levels.
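The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the grid cells are synthetic, the feature names are hypothetical, and the abstract does not name the feature importance tool, so the sketch substitutes a simple stand-in that ranks features by how far each cluster centroid lies from the overall mean on standardised data. It uses only the Python standard library.

```python
import random
import statistics

# Hypothetical feature names; the thesis feature pool is not listed in the abstract.
FEATURES = ["built_up_density", "slope", "vegetation_index", "road_distance"]


def standardise(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(rows, centroids):
    """Label each row with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(row, centroids[j]))
            for row in rows]


def kmeans(rows, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centroids drawn from the data."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(n_iter):
        labels = assign(rows, centroids)
        updated = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append([statistics.fmean(c) for c in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(rows, centroids), centroids


def cluster_profile(centroids, feature_names):
    """Rank features per cluster by absolute centroid value.

    On standardised data the overall mean is 0, so a large absolute value
    means the cluster deviates strongly from the average on that feature.
    This is an assumed stand-in for the thesis's feature importance tool.
    """
    return [sorted(zip(feature_names, c), key=lambda t: abs(t[1]), reverse=True)
            for c in centroids]


# Synthetic grid cells around four made-up prototypes (illustration only).
rng = random.Random(42)
prototypes = [[0.1, 5.0, 0.8, 900.0], [0.4, 25.0, 0.4, 600.0],
              [0.6, 8.0, 0.2, 150.0], [0.9, 10.0, 0.05, 300.0]]
cells = [[rng.gauss(mu, 0.1 * abs(mu) + 0.01) for mu in proto]
         for proto in prototypes for _ in range(30)]

scaled = standardise(cells)
labels, centroids = kmeans(scaled, k=4)
profiles = cluster_profile(centroids, FEATURES)

for j, profile in enumerate(profiles):
    name, value = profile[0]
    print(f"cluster {j}: {labels.count(j)} cells, most distinctive feature: {name} ({value:+.2f})")
```

Standardising first keeps features with large units, such as a distance in metres, from dominating the Euclidean distance. With real data, the per-cluster rankings would be the starting point for naming and profiling the cluster types.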
Item Type: Essay (Master)
Faculty: ITC: Faculty of Geo-information Science and Earth Observation
Programme: Geoinformation Science and Earth Observation MSc (75014)