University of Twente Student Theses


Perspective Interactions: Detecting Multimodal Social Interactions from an Egocentric View

Nadar, Aditya (2024) Perspective Interactions: Detecting Multimodal Social Interactions from an Egocentric View.

Full text: PDF (3 MB)
Abstract: The idea of integrating data from multiple modalities is intuitively attractive, as it can enhance the efficacy of machine learning models. The system proposed here uses two modalities, video and audio, to build a multimodal deep learning system that classifies "Talking to me" social interactions from an egocentric view. This study extends the baseline work on Ego4D social interactions by devising a methodology to apply different multimodal fusion techniques, namely Early and Late fusion, and then determining which of the two is the better way to fuse the modalities. To apply these fusion techniques and introduce optimizations at different stages, the system builds on the multimodal framework MultiBench. The dataset used for this study is Ego4D, which consists of 3,670 hours of egocentric video. By employing MultiBench and the optimizations it offers, our approach improves mAP by 3.67% (Early fusion) and 5.52% (Late fusion) over the baseline. The study also compares Early and Late fusion directly to identify the superior fusion strategy for the dataset at hand, and concludes by discussing the shortcomings of the system and directions for future improvement.
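
To illustrate the two fusion strategies named in the abstract, the sketch below contrasts Early fusion (concatenating audio and video features before a single classifier) with Late fusion (classifying each modality separately and combining the outputs) on pre-extracted clip features. The PyTorch modules, feature dimensions, and the simple averaging of logits are illustrative assumptions for exposition only, not the architecture used in the thesis.

# Minimal sketch contrasting Early and Late fusion for an audio+video
# "Talking to me" classifier. Dimensions, module names, and the simple
# linear heads are illustrative assumptions, not the thesis design.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate modality features first, then classify the joint vector."""
    def __init__(self, video_dim=512, audio_dim=128, hidden=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(video_dim + audio_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # binary talking-to-me logit
        )

    def forward(self, video_feat, audio_feat):
        return self.head(torch.cat([video_feat, audio_feat], dim=-1))

class LateFusion(nn.Module):
    """Classify each modality separately, then combine the per-modality logits."""
    def __init__(self, video_dim=512, audio_dim=128):
        super().__init__()
        self.video_head = nn.Linear(video_dim, 1)
        self.audio_head = nn.Linear(audio_dim, 1)

    def forward(self, video_feat, audio_feat):
        # Plain average of logits; weighted or learned combinations are
        # common alternatives.
        return 0.5 * (self.video_head(video_feat) + self.audio_head(audio_feat))

# Usage with dummy per-clip feature vectors (batch of 4 clips).
video_feat = torch.randn(4, 512)
audio_feat = torch.randn(4, 128)
print(EarlyFusion()(video_feat, audio_feat).shape)  # torch.Size([4, 1])
print(LateFusion()(video_feat, audio_feat).shape)   # torch.Size([4, 1])

Running the example on dummy batches of four clips prints a (4, 1) logit tensor from each model; in practice the feature encoders, fusion depth, and combination weights would follow the MultiBench-based setup described in the thesis.
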
Item Type: Essay (Master)
Clients: ALT001 - ALTEN Nederland B.V., Apeldoorn, Netherlands
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Embedded Systems MSc (60331)
Link to this item: https://purl.utwente.nl/essays/98424