University of Twente Student Theses


Semi-Supervised Semantic Segmentation of UAV Videos

Chaprana, Akshay Kumar (2023) Semi-Supervised Semantic Segmentation of UAV Videos.

Abstract: Unmanned Aerial Vehicles (UAVs) have become crucial in various fields, collecting vast amounts of aerial data. Semi-supervised semantic segmentation techniques play an important role in extracting valuable information from this data. Combining labelled and unlabelled data enables efficient classification and segmentation of objects in UAV imagery. This integration has revolutionized decision-making processes in environmental monitoring, agriculture, infrastructure inspection, disaster management, and security surveillance. The extracted information aids in precision agriculture, urban planning, risk detection, and real-time situational awareness. The synergy between UAVs and semi-supervised semantic segmentation holds immense potential for advancing data analysis and decision support systems. Semi-supervised semantic segmentation has emerged as a powerful technique in computer vision for extracting precise object boundaries from images. Unlike traditional approaches that rely solely on labelled data, it uses both labelled and unlabelled data to improve segmentation performance. By merging the strengths of supervised and unsupervised learning, this approach addresses the limited availability of labelled data while exploiting the abundance of unlabelled data. Various algorithms, such as self-training methods and deep learning models, have been developed to exploit the potential of semi-supervised learning in semantic segmentation. This research explored the scope of using semi-supervised techniques for semantically segmenting the UAVid dataset. This dataset offers high-resolution 4K videos and poses unique challenges such as dynamic object recognition, large scale variation, and maintaining temporal consistency. The BiMSANet model is used to segment the semantic classes in the UAVid dataset, as it handles these challenges efficiently. 
Each video sequence in the dataset provides only 10 labelled images, sampled at 5-second intervals. To obtain higher temporal resolution and additional information, the interval between frames is reduced from 5 seconds to 1 second, which yields more frames per video sequence. The newly extracted frames are pseudo-labelled using a BiMSANet model trained on the original labelled images. Three experiments were conducted with different combinations of pseudo-labelled frames and original labelled images to determine the best frame-selection strategy for semantic segmentation of the UAVid dataset. The results are reported as mean Intersection over Union (mIoU). The experiments show improved segmentation for most of the semantic classes in the UAVid dataset. For instance, Exp-1 gives the best results for the Building and Road classes, Exp-2 gives the best results for the Static_car and Moving_car classes, and Exp-3 gives better accuracy for the Vegetation class. Exp-3, owing to its balanced segmentation across all semantic classes, achieved the best mIoU score of 76.51% (mIoU over 7 classes, excluding the Human class). Exp-3 outperforms the previous best BiMSANet result (76.43%) by 0.08 percentage points.
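As an illustration of the metric reported in the abstract, the following is a minimal sketch of how mIoU can be computed from a confusion matrix while leaving one class out of the average, as is done for the Human class. It is not the thesis's evaluation code: the function names, the `exclude` parameter, and the three-class toy labels are assumptions made for this example, and the class indices do not correspond to the UAVid label order.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (ground-truth, prediction) pair.

    Rows index the ground-truth class, columns the predicted class.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    flat_index = y_true * num_classes + y_pred
    counts = np.bincount(flat_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def per_class_iou(cm):
    """IoU = TP / (TP + FP + FN) for each class; NaN if the class never occurs."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled as the class, but predicted otherwise
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

def mean_iou(cm, exclude=()):
    """Average the per-class IoU, skipping any class index listed in `exclude`."""
    iou = per_class_iou(cm)
    excluded = set(exclude)
    keep = [c for c in range(len(iou)) if c not in excluded]
    return float(np.nanmean(iou[keep]))

# Toy example with three hypothetical classes (not the UAVid label set).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(per_class_iou(cm))          # per-class IoU values
print(mean_iou(cm))               # average over all classes
print(mean_iou(cm, exclude=[0]))  # average with class 0 left out
```

In practice the confusion matrix would be accumulated over all evaluation images before the IoU is computed, so that each class's score reflects the whole test set and not a per-image average.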
Item Type: Essay (Master)
Faculty: ITC: Faculty of Geo-information Science and Earth Observation
Programme: Geoinformation Science and Earth Observation MSc (75014)

