University of Twente Student Theses

Monocular Depth Estimation of UAV Images using Deep Learning

Madhuanand, Logambal (2020) Monocular Depth Estimation of UAV Images using Deep Learning.

Abstract:	UAVs have become an important photogrammetric measurement platform due to their affordability, easy accessibility and widespread applications in various fields. The aerial images captured by UAVs are suitable for small- and large-scale texture mapping, 3D modelling, object detection and similar tasks. UAV images are especially used for 3D reconstruction, which has applications in forestry, archaeological excavations, mining sites, building modelling in urban areas, surveying, etc. Depth in an image, defined as the distance of an object from the viewpoint, is the primary information required for 3D reconstruction. Depth can be obtained from active sensors or through much cheaper passive techniques such as image-based modelling. The general approach in image-based modelling is to capture multiple images with overlapping fields of view, which can then be processed into a 3D model using methods such as structure from motion. However, acquiring multiple images that cover the same scene with a sufficient base is not always possible in complex terrains or environments because of occlusions. Single image depth estimation (SIDE) can not only overcome these limitations but also has various applications of its own. Estimating depth from a single image has traditionally been difficult to solve analytically. However, with recent advances in computer vision and deep learning, SIDE has attracted considerable attention. Most studies that estimate depth from a single image have used indoor or outdoor images taken at ground level. Applying similar techniques to UAV images has applications in object detection, tracking, semantic segmentation, digital terrain models, obstacle or sensor mapping, etc. It can also be used to reconstruct a 3D scene from a limited set of previously acquired images.
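Depth is defined above as the distance of an object from the viewpoint, and image-based modelling needs a sufficient base between exposures. For a rectified stereopair the two are linked by the standard pinhole relation Z = f·B/d, where f is the focal length in pixels, B the base (baseline) and d the disparity in pixels. The NumPy sketch below applies that relation; the function name and the camera values in the example are hypothetical, and the thesis's own conversion may differ (for instance if a network outputs disparity normalised by image width).

```python
import numpy as np


def disparity_to_depth(disparity_px, focal_px, baseline_m, min_disp=1e-6):
    """Convert a disparity map (pixels) of a rectified stereopair to depth.

    Applies the pinhole stereo relation Z = f * B / d. Depth is returned in the
    unit of the baseline. Disparities <= min_disp cannot be converted and are
    returned as NaN.
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(d.shape, np.nan)
    valid = d > min_disp
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth


# Hypothetical camera: 4000 px focal length, 10 m base between exposures.
disp = np.array([[400.0, 500.0], [0.0, 800.0]])
print(disparity_to_depth(disp, focal_px=4000.0, baseline_m=10.0))
```

With these assumed values the example yields depths of 100 m, 80 m and 50 m, and NaN for the pixel with zero disparity. Because depth is inversely proportional to disparity, a small disparity error on distant surfaces produces a large depth error, which is one reason a sufficient base matters.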
The problem is generally approached through supervised techniques that use pixel-wise ground truth depth, semi-supervised techniques that use information that is easier to obtain than depth (such as semantics), or self-supervised techniques that require no information other than the images themselves. Since collecting ground truth depth is not always feasible, and depths from self-supervised approaches have proven comparable to those from supervised approaches, a self-supervised approach is preferable. This study therefore aims to estimate depth from single UAV images in a self-supervised manner. Self-supervised learning requires a large number of images, so a training dataset was prepared from UAV images of three different regions. The preparation involves undistortion and rectification to produce stereopairs, from which smaller image patches are extracted to fit the deep learning models. Around 22000 stereo image patches are produced for training.
The main objective is to find a suitable deep learning model for SIDE. Two models, a CNN and a GAN, are chosen for their proven success in single image depth estimation on indoor images, and their network architectures are adapted to the specifications of the UAV image dataset. Both models take one image of a stereopair as input, generate a disparity map, and use it to warp the other image of the stereopair so as to reproduce the input image. The CNN model is based on the VGG architecture and is trained by backpropagating an image loss, i.e. the difference between the original and the reconstructed image. The GAN model instead uses a generator and discriminator structure to handle the image reconstruction task. Both models are found to be capable of producing disparity images.
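The self-supervised training signal described above (predict a disparity map, warp the other image of the stereopair with it, and penalise the difference to the input image) can be sketched in NumPy. This is an illustrative, non-differentiable sketch that assumes rectified pairs with purely horizontal disparity, linear interpolation along rows, border clamping and a plain L1 image loss. The function names are hypothetical, and the exact loss terms used in the thesis are not given in this abstract. In an actual training loop the same sampling would be done with a differentiable operator of the deep learning framework, such as `torch.nn.functional.grid_sample` in PyTorch.

```python
import numpy as np


def warp_horizontal(src, disparity):
    """Reconstruct an image by sampling `src` at (x - disparity, y).

    src: (H, W) or (H, W, C) image of the other view of the stereopair.
    disparity: (H, W) disparity map in pixels, predicted for the target view.
    Uses linear interpolation along each row; coordinates outside the image
    are clamped to the border.
    """
    src = np.asarray(src, dtype=np.float64)
    disparity = np.asarray(disparity, dtype=np.float64)
    h, w = disparity.shape
    xs = np.clip(np.arange(w)[None, :] - disparity, 0, w - 1)  # sample positions
    x0 = np.floor(xs).astype(int)             # left neighbour
    x1 = np.minimum(x0 + 1, w - 1)            # right neighbour
    a = xs - x0                               # weight of the right neighbour
    rows = np.arange(h)[:, None]
    if src.ndim == 3:                         # broadcast weights over channels
        a = a[..., None]
    return (1.0 - a) * src[rows, x0] + a * src[rows, x1]


def l1_image_loss(target, reconstructed):
    """Mean absolute difference between the input image and its reconstruction."""
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean(np.abs(target - reconstructed)))


# Toy example: the left view is the right view shifted by 2 px.
right = np.tile(np.arange(10.0), (4, 1))
left = np.tile(np.clip(np.arange(10.0) - 2.0, 0.0, None), (4, 1))
disparity = np.full((4, 10), 2.0)
print(l1_image_loss(left, warp_horizontal(right, disparity)))
```

In the toy example the correct disparity of 2 px reconstructs the left view exactly, so the loss is zero; a wrong disparity increases it. Minimising this reconstruction error over many stereopairs is what lets a network learn disparity without ground truth depth.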
The results from both models are compared qualitatively and quantitatively with reference depths from SURE. The disparity output of the CNN model showed a closer approximation to the SURE depths, while the GAN model produced disparities with fine details, reproducing features such as roof edges. However, the GAN output contains considerable noise and spikes on ground surfaces, which needed improvement. To improve the quality of the SIDE models, a third model, InfoGAN, is proposed, in which additional mutual information supplied through an added network is used to improve model performance. In this study, disparity from the stereopairs and gradient information are used as mutual information. The InfoGAN model with disparity information shows improved results that are closer to those of the CNN. Providing the right mutual information through extended networks can improve model performance even further.
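The abstract does not state which quantitative measures were used against the SURE reference depths. As a hedged sketch, the function below computes three metrics commonly reported in single image depth estimation (mean absolute relative error, RMSE and the δ < 1.25 threshold accuracy) over pixels that are valid in both maps. The optional median scaling, which aligns the scale of a self-supervised prediction with the reference, is an assumption and not necessarily what the thesis does; the function name is hypothetical, and predicted disparities would first have to be converted to depth.

```python
import numpy as np


def depth_metrics(pred, ref, median_scale=True):
    """Compare a predicted depth map with a reference depth map.

    Only pixels that are finite and positive in both maps are used.
    If median_scale is True, the prediction is first rescaled so that its
    median matches the reference median (an assumption for scale-ambiguous,
    self-supervised outputs).
    """
    pred = np.asarray(pred, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    valid = np.isfinite(pred) & np.isfinite(ref) & (pred > 0) & (ref > 0)
    if not np.any(valid):
        raise ValueError("no valid pixels to compare")
    p, r = pred[valid], ref[valid]
    if median_scale:
        p = p * (np.median(r) / np.median(p))
    ratio = np.maximum(p / r, r / p)
    return {
        "abs_rel": float(np.mean(np.abs(p - r) / r)),   # mean absolute relative error
        "rmse": float(np.sqrt(np.mean((p - r) ** 2))),  # root mean square error
        "delta1": float(np.mean(ratio < 1.25)),         # share with max(p/r, r/p) < 1.25
    }


# Hypothetical values; the NaN marks a pixel without a reference depth.
ref = np.array([100.0, 100.0, 100.0, 100.0, np.nan])
pred = np.array([100.0, 100.0, 100.0, 130.0, 90.0])
print(depth_metrics(pred, ref, median_scale=False))
```

Here the pixel without a reference is ignored, and one prediction that is 30 % too deep gives abs_rel = 0.075, rmse = 15.0 and delta1 = 0.75. Noise and spikes on ground surfaces, as reported for the GAN model, mainly inflate RMSE, which weights large errors more heavily than abs_rel does.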
Item Type:	Essay (Master)
Faculty:	ITC: Faculty of Geo-information Science and Earth Observation
Programme:	Geoinformation Science and Earth Observation MSc (75014)
Link to this item:	https://purl.utwente.nl/essays/85209