University of Twente Student Theses
Multitask Approach to Video Scene Understanding
Vinasiththamby, V. (2024) Multitask Approach to Video Scene Understanding.
Abstract: | This thesis proposes a multi-task approach to enhance video scene understanding by focusing on two distinct yet complementary tasks, each of which maps aspects of video content onto a finite set of classes. The first task, action recognition, categorises the actions in the video. The second task, object detection, aims to localise the objects in each frame and classify them. We propose a multi-task model that utilises self-attention mechanisms to jointly output action classes, object classes, and bounding boxes. Two pre-trained models that encode task-specific information are used as frozen feature encoders, and only the merger model is fine-tuned. The approach is evaluated on the EPIC-KITCHENS dataset. Integrating the two tasks is important for maintaining the coherent spatial and temporal information that is crucial for accurate video scene understanding. The multi-task model shows promising results: it learns well and does not overfit during the training phase. Although the tasks are distinct, leveraging the information from both leads to a more holistic understanding for each task individually. Additionally, the multi-task model is lightweight, as only a few attention layers are trained. The integration of action recognition and object detection enhances the overall understanding of video scenes, providing a comprehensive and efficient analysis. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/100992 |
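The merger design summarised in the abstract (two frozen encoders feeding a small self-attention module that jointly predicts action classes, object classes, and bounding boxes) can be illustrated with a minimal sketch. This is not the thesis implementation: the token counts, feature size `D`, class counts, the single attention head, the mean pooling, the sigmoid box parameterisation, and every function and parameter name are assumptions made for illustration. The frozen encoders are stood in for by fixed random feature vectors.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token vectors."""
    Q = [matvec(Wq, t) for t in tokens]
    K = [matvec(Wk, t) for t in tokens]
    V = [matvec(Wv, t) for t in tokens]
    scale = math.sqrt(len(Q[0]))
    fused = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        weights = softmax(scores)
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return fused


def merger_forward(action_tokens, object_tokens, params):
    """Fuse features from both (frozen) encoders and apply the task heads."""
    # Both token sets attend to each other in one joint sequence.
    tokens = action_tokens + object_tokens
    fused = self_attention(tokens, params["Wq"], params["Wk"], params["Wv"])

    # Action head: mean-pool the fused action tokens, then classify.
    n_act = len(action_tokens)
    pooled = [sum(col) / n_act for col in zip(*fused[:n_act])]
    action_probs = softmax(matvec(params["W_action"], pooled))

    # Detection heads: one class distribution and one box per object token.
    detections = []
    for tok in fused[n_act:]:
        obj_probs = softmax(matvec(params["W_object"], tok))
        # Sigmoid keeps the 4 box values in (0, 1), i.e. normalised coordinates.
        box = [1.0 / (1.0 + math.exp(-z)) for z in matvec(params["W_box"], tok)]
        detections.append((obj_probs, box))
    return action_probs, detections


rng = random.Random(0)
D, N_ACTIONS, N_OBJECTS = 8, 5, 6  # hypothetical sizes

# Only these merger parameters would be trained; the encoders stay frozen.
params = {
    "Wq": rand_matrix(D, D, rng),
    "Wk": rand_matrix(D, D, rng),
    "Wv": rand_matrix(D, D, rng),
    "W_action": rand_matrix(N_ACTIONS, D, rng),
    "W_object": rand_matrix(N_OBJECTS, D, rng),
    "W_box": rand_matrix(4, D, rng),
}

# Stand-ins for the outputs of the two frozen, pre-trained encoders.
action_tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(3)]
object_tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(2)]

action_probs, detections = merger_forward(action_tokens, object_tokens, params)
print("action distribution:", [round(p, 3) for p in action_probs])
for obj_probs, box in detections:
    print("object class:", obj_probs.index(max(obj_probs)),
          "box:", [round(b, 3) for b in box])
```

Because only the attention and head weights in `params` would receive gradient updates in such a setup, the trainable part stays small, which is consistent with the abstract's description of the model as lightweight.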