University of Twente Student Theses

Study of semantic segmentation applications for autonomous vehicles

Sanchez-Escalonilla Plaza, Santiago (2019) Study of semantic segmentation applications for autonomous vehicles.

Abstract:	Autonomous vehicles are machines capable of navigating the environment without human intervention. Such interaction with the environment requires a system that provides the vehicle with accurate information about its surroundings; this is called scene understanding. Scene understanding includes the acquisition, processing and analysis of data. There are different ways of obtaining information from the environment, although the most common one is through cameras, which capture visual information about the surroundings in much the same way humans do. Different techniques allow the user to learn from the composition of an image, depending on the final goal and the required level of accuracy. Applications of these techniques include image classification, object detection, object tracking and semantic image segmentation. Semantic image segmentation provides insight into the composition of the image at the highest possible level of detail: it consists of assigning a class label to every pixel of the image. Semantic image segmentation can be very useful in navigation scenarios, since it makes it possible to build an accurate representation of the environment. Classical computer vision provides the necessary tools for any of the previously mentioned applications; however, these tools are rigid and cope poorly with variations in external and internal parameters (illumination, occlusion, depth, camera resolution, disturbances or noise). Deep learning has proved to be key for computer vision applications that work in dynamic environmental conditions. There are different deep learning models designed for semantic image segmentation ([1], [2], [3]). Unfortunately, the segmentation obtained with these methods differs from the ideal expected case (figure 1.3): in the best case the object is only partially segmented, and in the worst case the target is missed completely.
There are two valid approaches to semantic video segmentation. The first consists of applying semantic image segmentation models directly on a frame-by-frame basis. However, this approach often leads to inconsistent results and a 'flickering' effect caused by rapidly changing conditions between frames. The second approach is to design specific tools for the analysis of videos, extracting the temporal context and using it to segment the current frame. This thesis focuses on the second approach. To do so, it defines three research questions: What is the current state of the art for semantic image segmentation? How can semantic image segmentation models be extended for the analysis of sequences? What kind of mechanisms can be applied to reduce the number of false classifications? In answer to the first research question, DeepLabv3 [4] (pretrained on Cityscapes [5]) was selected as the state of the art and as the baseline model for this study because of its high accuracy in urban scenarios. The last two research questions are answered together through the design of an extension algorithm that can be applied to semantic image segmentation models for the analysis of sequences. Two approaches were the outcome of this study: one that combines the baseline segmentation of neighbouring frames (Image Buffer), and another that modifies the segmentation probabilities and afterwards establishes a relation between frames (Attention Module). Each method was then evaluated using two metrics chosen to determine the temporal consistency of the segmentation. As a result, both of the suggested extensions improve the consistency of the segmentation over time (chapter 5), in some cases helping to segment objects that are difficult for the baseline model to detect. On the other hand, these extensions also reduce the accuracy of the segmentation because they increase the number of false positive classifications.
Item Type:	Essay (Master)
Faculty:	EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:	50 technical science in general
Programme:	Systems and Control MSc (60359)
Link to this item:	https://purl.utwente.nl/essays/79787
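
Illustrative note: the abstract describes combining the baseline segmentation of neighbouring frames (Image Buffer) and measuring temporal consistency, but it does not say how the frames are combined or which metrics are used. The sketch below is therefore only one possible reading, not the thesis's method. It assumes the combination is a plain average of per-pixel class probabilities over a sliding window, and it uses a simple label-change rate as a stand-in consistency measure. The names FrameBuffer and flicker_rate, and the window size, are hypothetical.

```python
from collections import deque


class FrameBuffer:
    """Hypothetical sliding-window smoother for per-pixel class probabilities.

    Assumption: neighbouring frames are combined by simple averaging.
    The thesis abstract does not specify the actual combination rule.
    """

    def __init__(self, size=3):
        # Keep only the most recent `size` probability maps.
        self.frames = deque(maxlen=size)

    def push(self, probs):
        """Add one frame and return the smoothed label map.

        probs[row][col] is a list of class probabilities for that pixel.
        """
        self.frames.append(probs)
        n = len(self.frames)
        labels = []
        for r in range(len(probs)):
            row = []
            for c in range(len(probs[r])):
                num_classes = len(probs[r][c])
                # Average each class probability over the buffered frames.
                mean = [
                    sum(frame[r][c][k] for frame in self.frames) / n
                    for k in range(num_classes)
                ]
                # Pick the class with the highest averaged probability.
                row.append(max(range(num_classes), key=mean.__getitem__))
            labels.append(row)
        return labels


def flicker_rate(label_maps):
    """Fraction of pixels whose label changes between consecutive frames.

    Assumption: an illustrative consistency measure, not one of the two
    metrics used in the thesis.
    """
    changed = 0
    total = 0
    for prev, cur in zip(label_maps, label_maps[1:]):
        for prev_row, cur_row in zip(prev, cur):
            for a, b in zip(prev_row, cur_row):
                changed += a != b
                total += 1
    return changed / total if total else 0.0


if __name__ == "__main__":
    # One-pixel, two-class toy sequence whose middle frame disagrees.
    frames = [[[[0.9, 0.1]]], [[[0.4, 0.6]]], [[[0.9, 0.1]]]]
    buffer = FrameBuffer(size=3)
    smoothed = [buffer.push(frame) for frame in frames]
    print(smoothed, flicker_rate(smoothed))
```

A window like this trades responsiveness for stability: a label that is only briefly correct can be outvoted by its neighbours. That is one way, under these assumptions, in which a smoothing extension could add misclassifications while still reducing flicker.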
