University of Twente Student Theses


Deep learning-based building extraction using aerial images and digital surface models

Sun, Xiaoyu (2021) Deep learning-based building extraction using aerial images and digital surface models.

Abstract: Building information is essential in multiple applications. The emergence of very high-resolution remote sensing imagery has made the recognition of small-scale objects such as buildings possible. However, manually extracting buildings from images is time-consuming, so various automatic and semi-automatic approaches have been developed for building extraction. With the rise of deep learning, Convolutional Neural Networks (CNNs) have outperformed traditional methods based on handcrafted features and become the dominant approach in image analysis. As the most popular CNN type for semantic segmentation, fully convolutional networks (FCNs) are widely used in building extraction. Most deep learning-based building extraction methods produce building masks in raster format, which cannot be directly integrated into geographic information system (GIS) applications. Hence, some deep learning-based semantic segmentation models have been adapted to extract building footprints directly as polygons. However, these models struggle to produce precise and regular building outlines. Recently, Girard et al. (2020) proposed a building delineation method based on frame field learning that extracts regular building footprints as vector polygons directly from aerial RGB images: an FCN is trained to simultaneously learn the building mask, contours, and frame field, followed by a polygonization step. Optical imagery alone has limitations, however. The normalized digital surface model (nDSM) derived from Light Detection and Ranging (LiDAR) data provides 3D information that can serve as complementary input to help overcome them. We therefore introduce 3D information into this framework and explore the data fusion of different combinations of aerial imagery (RGB), near-infrared (NIR) and the nDSM to extract precise and regular building polygons. The results are evaluated at pixel level, object level and polygon level, respectively.
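The data fusion described above, stacking the nDSM (and optionally NIR) as extra channels alongside RGB to form a composite network input, can be sketched as follows. The array shapes, the 30 m height cap, and the [0, 1] normalization are illustrative assumptions, not the thesis's exact preprocessing.

```python
import numpy as np

def make_composite(rgb, ndsm, nir=None):
    """Stack RGB, nDSM (and optionally NIR) into one multi-channel input.

    rgb  : (H, W, 3) uint8 aerial image
    ndsm : (H, W) float height above ground in metres
    nir  : (H, W) uint8 near-infrared band, optional
    Returns a float32 array of shape (H, W, C) scaled to [0, 1].
    """
    channels = [rgb.astype(np.float32) / 255.0]
    # Cap heights at an assumed 30 m so tall buildings do not
    # dominate the dynamic range, then scale to [0, 1].
    channels.append((np.clip(ndsm, 0.0, 30.0) / 30.0)[..., None].astype(np.float32))
    if nir is not None:
        channels.append((nir.astype(np.float32) / 255.0)[..., None])
    return np.concatenate(channels, axis=-1)

# Example: a 4-channel RGB + nDSM composite tile
rgb = np.zeros((256, 256, 3), dtype=np.uint8)
ndsm = np.full((256, 256), 12.0, dtype=np.float32)
comp = make_composite(rgb, ndsm)
print(comp.shape)  # (256, 256, 4)
```

The first convolutional layer of the segmentation network then simply takes 4 (RGB + nDSM) or 5 (RGB + NIR + nDSM) input channels instead of 3.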
Moreover, we analysed the statistical deviation in the number of vertices per building between polygons extracted by the proposed methods and the reference polygons. This vertex-count comparison indicates which output polygons are easier for human analysts to edit in operational applications, and can serve as guidance for reducing the post-processing workload needed to obtain high-accuracy building footprints. The experiments were conducted in Enschede, the Netherlands. The results demonstrate that the 3D information provided by the nDSM compensates for the limitations of the aerial images and helps distinguish buildings from the background more accurately. The method benefited from the data fusion: by both quantitative and qualitative criteria, the composite images (RGB + nDSM) yielded better results than RGB or the nDSM alone. The height information reduces false positives and prevents real buildings from being missed. In addition, the nDSM improves positional accuracy and shape similarity, resulting in better-aligned building polygons; the additional NIR information improves the results further. The investigated method outperformed the alternative, PolyMapper, in all COCO metrics, showing that it can predict more precise and regular polygons for the study area.
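The per-building vertex-count deviation mentioned above can be sketched as below. The polygon representation (a ring of (x, y) tuples, closed by repeating the first point) and the one-to-one matching of predicted to reference buildings are assumptions for illustration.

```python
def vertex_count(ring):
    """Number of distinct vertices of a polygon ring given as a list of
    (x, y) tuples, where a closing point may repeat the first point."""
    if len(ring) > 1 and ring[0] == ring[-1]:
        return len(ring) - 1
    return len(ring)

def vertex_deviation(predicted, reference):
    """Per-building vertex-count difference (predicted minus reference).
    Assumes the two lists are already matched one-to-one."""
    return [vertex_count(p) - vertex_count(r) for p, r in zip(predicted, reference)]

# A predicted L-shaped footprint (6 vertices) vs. a square reference (4 vertices)
pred = [[(0, 0), (4, 0), (4, 3), (2, 3), (2, 4), (0, 4), (0, 0)]]
ref = [[(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]]
print(vertex_deviation(pred, ref))  # [2]
```

A deviation near zero suggests the extracted polygon needs little manual editing; large positive deviations flag over-segmented outlines with redundant vertices.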
Item Type: Essay (Master)
Faculty: ITC: Faculty of Geo-information Science and Earth Observation
Programme: Geoinformation Science and Earth Observation MSc (75014)

