Exploring human relationship recognition in egocentric videos using deep learning techniques

Peeters, Bart (2022)

Automating human relationship recognition (e.g., friends, strangers, colleagues) has considerable application potential in fields such as social media analysis, intelligent business services, and public security. Deep learning techniques have made this automation possible. In the work of Costa et al. [1], three different cues, called the face, context, and body encoding streams, were used to gather information from different parts of a scene. This information was combined using an Adaptive Fusion Module. To apply this recognition to videos instead of single images, this work extends the original architecture by incorporating Convolutional LSTMs. Ego4D, a new massive-scale egocentric dataset containing videos of daily-life situations, was used to test the resulting relationship recognition model. This dataset has been labeled with six classes: Friend, Stranger, Service, Colleague, Parents-offs, and Couple. The model achieved an accuracy of 57% and an F1-score of 0.55 on this six-class classification problem. Compared with the best-performing model that uses only a single cue, this is an increase of about 3% in accuracy and 0.04 in F1-score.
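
As an illustration of the two reported metrics, the following is a minimal sketch of how accuracy and an F1-score could be computed for a six-class problem like this one. The abstract does not state how the F1-score was averaged over classes, so macro-averaging is an assumption here, as are the function names and the convention that a class with no true or predicted samples scores 0.

```python
from collections import Counter

# Class names as listed in the abstract.
CLASSES = ["Friend", "Stranger", "Service", "Colleague", "Parents-offs", "Couple"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); assumed to be 0 when the class never occurs.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro-averaging weights every class equally, rare classes influence the score as much as frequent ones, which is one reason an F1-score can differ noticeably from accuracy on an imbalanced dataset.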
Peeters_MA_DMB.pdf