University of Twente Student Theses


Semantic video segmentation from UAV

Wang, Yiwen (2019) Semantic video segmentation from UAV.

Abstract: With the rapid growth of deep learning research, semantic video segmentation has advanced remarkably (Jang & Kim, 2017) and has been used in a wide range of academic and real-world applications. Processing Unmanned Aerial Vehicle (UAV) data is one popular application. UAVs can obtain high-resolution images and videos of dangerous and inaccessible areas that manned vehicles cannot reach, at relatively low cost. They are therefore well suited to tasks in small or dangerous areas that require high-resolution imagery rich in detail. However, semantic segmentation of UAV data also faces particular challenges arising from the characteristics of UAVs. The largest challenge is the enormous change in the appearance of objects across a video. Traditional methods for semantic segmentation of UAV video ignore temporal information and simply extend a single-image segmentation method to multiple frames. In these approaches, a UAV video is treated as a collection of consecutive frames; each frame is a static image and is segmented individually. The segmentation result for each frame is easily affected by changes in the viewpoint of the object, changes in illumination, and deformation of the object. The same object may look different in different frames and therefore receive different segmentation results, so the accuracy of these segmentations is relatively low. To maintain temporal consistency, the pixels of the same object in different frames should be assigned the same label.

This research proposes an FCN + Conv_LSTM framework for semantic video segmentation, which combines the FCN model and the Conv_LSTM model. The FCN model serves as the frame-based segmentation method and segments each frame individually. The outputs of the FCN model are passed to the Conv_LSTM model. Depending on the input given to the Conv_LSTM model, the framework is divided into two methods: one uses the segmentation result of each frame, and the other uses the feature map of each frame. The Conv_LSTM model serves as a post-processing step that exploits the temporal information between consecutive frames. Its inputs are sequences formed from either the output segmentation results or the feature maps extracted from the FCN model. The Conv_LSTM model learns the temporal information in these sequences and outputs the final segmentation results.

The dataset used in this experiment consists of UAV videos captured in Wuhan, China from June to August 2017 and in Gronau, Germany in May 2017; 27 sequences are extracted from these videos. The experimental results show that the FCN + Conv_LSTM model outperforms the single-image segmentation model, especially for some classes, and that feature maps are the more suitable input for the Conv_LSTM model. This result shows the usefulness of temporal information in semantic video segmentation.
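The abstract does not give the network configuration, so the following is only a minimal sketch of how a convolutional LSTM cell can carry information from one frame to the next. It assumes a single channel, 3x3 kernels, and hand-picked untrained weights; the names `conv3x3`, `ConvLSTMCell`, and `run_sequence` are hypothetical and do not come from the thesis. Each input frame stands in for a per-frame map (a segmentation result or a feature map) produced by an FCN.

```python
import math

GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate


def conv3x3(img, kernel):
    """Zero-padded 'same' 3x3 cross-correlation on a 2-D list of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        s += img[yy][xx] * kernel[dy + 1][dx + 1]
            out[y][x] = s
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ConvLSTMCell:
    """Single-channel ConvLSTM cell; weights are illustrative placeholders."""

    def __init__(self, wx, wh, bias=0.0):
        # wx[g] and wh[g] are the 3x3 kernels applied to the input frame
        # and to the previous hidden state for gate g.
        self.wx, self.wh, self.bias = wx, wh, bias

    def step(self, x, state=None):
        rows, cols = len(x), len(x[0])
        if state is None:  # start of a sequence: zero hidden and cell state
            state = ([[0.0] * cols for _ in range(rows)],
                     [[0.0] * cols for _ in range(rows)])
        h_prev, c_prev = state
        pre = {}
        for g in GATES:
            cx = conv3x3(x, self.wx[g])
            ch = conv3x3(h_prev, self.wh[g])
            pre[g] = [[cx[r][c] + ch[r][c] + self.bias for c in range(cols)]
                      for r in range(rows)]
        h_new = [[0.0] * cols for _ in range(rows)]
        c_new = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                i = sigmoid(pre["i"][r][c])
                f = sigmoid(pre["f"][r][c])
                o = sigmoid(pre["o"][r][c])
                g = math.tanh(pre["g"][r][c])
                c_new[r][c] = f * c_prev[r][c] + i * g   # update cell memory
                h_new[r][c] = o * math.tanh(c_new[r][c])  # new hidden map
        return h_new, c_new


def run_sequence(cell, frames):
    """Feed per-frame maps in temporal order; return the last hidden map."""
    state = None
    for frame in frames:
        state = cell.step(frame, state)
    return state[0]


# Hand-picked kernels (an assumption for illustration, not trained weights).
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
HALF = [[0, 0, 0], [0, 0.5, 0], [0, 0, 0]]
cell = ConvLSTMCell({g: IDENTITY for g in GATES}, {g: HALF for g in GATES})

frame = [[1.0, 0.0], [0.0, 1.0]]          # stand-in for one FCN output map
one = run_sequence(cell, [frame])         # the frame seen in isolation
two = run_sequence(cell, [frame, frame])  # the same frame with one frame of history
```

In this sketch `one` and `two` differ even though the last input frame is identical, because the hidden and cell states carry information from the earlier frame. That dependence on history is the property a temporal post-processing step relies on; a practical model would use multi-channel learned kernels from a deep learning library.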
Item Type: Essay (Master)
Faculty: ITC: Faculty of Geo-information Science and Earth Observation
Programme: Geoinformation Science and Earth Observation MSc (75014)