University of Twente Student Theses

Multitask Approach to Video Scene Understanding

Vinasiththamby, V. (2024) Multitask Approach to Video Scene Understanding.

Abstract: This thesis proposes a multitask approach to enhance video scene understanding by focusing on two distinct yet complementary tasks, each of which categorises aspects of video content into a finite set of classes. The first task, action recognition, categorises the actions in the video. The second task, object detection, localises the objects in each frame and classifies them. We propose a multi-task model that uses self-attention mechanisms to jointly output action classes, object classes, and bounding boxes. Two pre-trained models that encode task-specific information serve as frozen feature encoders while the merger model is fine-tuned. The approach is evaluated on the EPIC-KITCHENS dataset. Integrating the two tasks is important for maintaining the coherent spatial and temporal information that accurate video scene understanding depends on. The multi-task model shows promising results: it learns well and does not overfit during training. Although the tasks are distinct, leveraging the information from both leads to a more holistic understanding of each task individually. Additionally, the multi-task model is lightweight, as only a few attention layers are trained. The integration of action recognition and object detection enhances the overall understanding of video scenes, providing a comprehensive and efficient analysis.
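The merger idea described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the thesis implementation: the feature size, class counts, single attention head, random weights, and all function and variable names below are assumptions made for illustration, and the two frozen encoders are replaced by fixed random feature vectors.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Small random weight matrix (a stand-in for trained parameters)."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a list of tokens."""
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    d = len(Q[0])
    fused, all_weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        fused.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)])
        all_weights.append(weights)
    return fused, all_weights

D = 8                             # hypothetical shared feature size
NUM_ACTIONS, NUM_OBJECTS = 5, 7   # hypothetical class counts

# Stand-ins for the outputs of the two frozen, pre-trained encoders.
action_feat = [random.uniform(-1, 1) for _ in range(D)]
object_feat = [random.uniform(-1, 1) for _ in range(D)]

# Only these merger weights would be trained; the encoders stay frozen.
Wq, Wk, Wv = rand_matrix(D, D), rand_matrix(D, D), rand_matrix(D, D)
W_action = rand_matrix(NUM_ACTIONS, D)
W_object = rand_matrix(NUM_OBJECTS, D)
W_box = rand_matrix(4, D)

# Each task token attends to both feature vectors, so information is shared.
fused, attn = self_attention([action_feat, object_feat], Wq, Wk, Wv)
action_token, object_token = fused

action_probs = softmax(matvec(W_action, action_token))   # action class scores
object_probs = softmax(matvec(W_object, object_token))   # object class scores
box = [1 / (1 + math.exp(-z)) for z in matvec(W_box, object_token)]  # normalised box
```

In this sketch the only parameters that would receive gradients are the attention projections and the three output heads, which mirrors the abstract's point that the model is lightweight because only a few attention layers are trained.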
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Link to this item: https://purl.utwente.nl/essays/100992