University of Twente Student Theses
Perspective Interactions : Detecting Multimodal Social interactions from an Egocentric View
Nadar, Aditya (2024) Perspective Interactions : Detecting Multimodal Social interactions from an Egocentric View.
Abstract: | Integrating data from multiple modalities is intuitively attractive because it can improve the performance of machine learning models. The system proposed here combines video and audio in a multimodal deep learning system that classifies "Talking to me" social interactions from an egocentric view. The study extends the Ego4D social interactions baseline with a methodology for applying two multimodal fusion techniques, Early and Late fusion, and for determining which of them fuses the modalities better. To apply these fusion techniques and implement optimizations at different stages, the system uses the multimodal framework MultiBench. The dataset used is Ego4D, which consists of 3,670 hours of egocentric video. By employing MultiBench and the optimizations it offers, the approach improves mAP by 3.67% (Early fusion) and 5.52% (Late fusion) compared to the baseline. The study also compares the performance of Early and Late fusion to identify the superior fusion alternative for the dataset at hand. It concludes by discussing the shortcomings of the system and guidelines for future improvements. |
Item Type: | Essay (Master) |
Clients: | ALT001- ALTEN Nederland B.V., Apeldoorn, Netherlands |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Embedded Systems MSc (60331) |
Link to this item: | https://purl.utwente.nl/essays/98424 |
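
The abstract contrasts Early and Late fusion without showing what separates them. The sketch below is a minimal illustration of the general idea only, not the thesis implementation and not MultiBench code. The function names, the toy logistic scorer, the hand-set weights, and the choice of averaging as the late-fusion rule are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(features, weights, bias=0.0):
    """Toy logistic scorer, a hypothetical stand-in for a trained network."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def early_fusion(video_feat, audio_feat, joint_weights):
    """Early fusion: concatenate the modality features, then classify once."""
    fused = list(video_feat) + list(audio_feat)
    return linear_score(fused, joint_weights)


def late_fusion(video_feat, audio_feat, video_weights, audio_weights):
    """Late fusion: classify each modality separately, then combine decisions.

    Averaging the two probabilities is one common combination rule and is
    an assumption here; the source does not state which rule was used.
    """
    p_video = linear_score(video_feat, video_weights)
    p_audio = linear_score(audio_feat, audio_weights)
    return 0.5 * (p_video + p_audio)


if __name__ == "__main__":
    video = [0.2, -0.1, 0.4]  # hypothetical per-clip video features
    audio = [0.7, 0.3]        # hypothetical per-clip audio features
    print(early_fusion(video, audio, [0.5, 0.5, 0.5, 1.0, 1.0]))
    print(late_fusion(video, audio, [0.5, 0.5, 0.5], [1.0, 1.0]))
```

In this sketch the early-fusion scorer sees both modalities at once and can weight them jointly, while the late-fusion variant only combines the two per-modality probabilities after each has been computed on its own.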