University of Twente Student Theses
Multitask Approach to Video Scene Understanding
Vinasiththamby, V. (2024) Multitask Approach to Video Scene Understanding.
Abstract: | This thesis proposes a multitask approach to enhance video scene understanding by focusing on two distinct yet complementary tasks, each of which classifies an aspect of the video content into a finite set of classes. The first task, action recognition, categorises the actions in the video. The second task, object detection, aims to localise any objects in the frame and classify them. We propose a multitask model that utilises self-attention mechanisms to jointly output action classes, objects, and bounding boxes. Two pre-trained models that encode task-specific information are used as frozen feature encoders while the merger model is fine-tuned. The approach is evaluated on the EPIC-KITCHENS dataset. Integrating the two tasks is important for maintaining the coherent spatial and temporal information that is crucial for accurate video scene understanding. The multitask model shows promising results, as it learns well and does not overfit during the training phase. Although the tasks are distinct, leveraging information from both leads to a more holistic understanding for each task individually. Additionally, the multitask model is lightweight, as only a few attention layers are trained. The integration of action recognition and object detection enhances the overall understanding of video scenes, providing a comprehensive and efficient analysis. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/100992 |
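The abstract describes features from two frozen encoders being merged by a few self-attention layers that jointly output action classes, object classes, and bounding boxes. The following is a minimal illustrative sketch of that idea, not the thesis implementation: the abstract does not specify the architecture, so the single attention layer, the feature dimension, the token counts, the class counts, and all function and parameter names are assumptions. Random features stand in for the outputs of the pre-trained encoders, and no training is performed.

```python
import math
import random

def matmul(a, b):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over all tokens.
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def multitask_forward(action_feats, object_feats, params):
    # Concatenate tokens from both (frozen) encoders so attention can mix them.
    tokens = action_feats + object_feats
    fused, weights = self_attention(tokens, params["w_q"], params["w_k"], params["w_v"])
    # Action head: mean-pool every fused token, then project to action classes.
    pooled = [sum(col) / len(fused) for col in zip(*fused)]
    action_logits = matmul([pooled], params["w_action"])[0]
    # Object heads: one class prediction and one box per object token.
    obj_tokens = fused[len(action_feats):]
    object_logits = matmul(obj_tokens, params["w_object"])
    boxes = [[1.0 / (1.0 + math.exp(-v)) for v in row]
             for row in matmul(obj_tokens, params["w_box"])]
    return action_logits, object_logits, boxes, weights

rng = random.Random(0)
dim, n_actions, n_objects = 8, 5, 6          # assumed sizes, for illustration only
action_feats = rand_matrix(4, dim, rng)      # stand-in for action-encoder tokens
object_feats = rand_matrix(3, dim, rng)      # stand-in for detector tokens
params = {
    "w_q": rand_matrix(dim, dim, rng),
    "w_k": rand_matrix(dim, dim, rng),
    "w_v": rand_matrix(dim, dim, rng),
    "w_action": rand_matrix(dim, n_actions, rng),
    "w_object": rand_matrix(dim, n_objects, rng),
    "w_box": rand_matrix(dim, 4, rng),       # 4 normalised box coordinates
}
action_logits, object_logits, boxes, weights = multitask_forward(
    action_feats, object_feats, params)
```

In this sketch only the attention and head weights in `params` would be trainable, which mirrors the abstract's point that the merger model is lightweight because the encoders stay frozen.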