University of Twente Student Theses


Investigating vision transformers for human activity recognition from skeletal data

Joseph, A.M. (2023) Investigating vision transformers for human activity recognition from skeletal data.

Abstract: Transformers are increasingly used across a wide range of applications. Recent work shows that vision transformers also have strong capacity for solving Human Activity Recognition tasks based on skeletal trajectories. However, some aspects remain unexplored, concerning both the input representation and the model architecture. We investigate two of them. First, we use skeletal keypoint trajectories as inputs, decomposed both locally and globally. Second, we introduce convolutional learning into transformers by using tubelet embeddings, which we hypothesize could extract better spatio-temporal information. We evaluate our model on two datasets, NTU RGB+D 120 and HR-Crime. We observe that decomposing the keypoints globally and locally does not improve performance. We also observe that adding a tubelet embedder to a simple transformer architecture gives results similar to the baseline at significantly lower computational cost. Finally, we discuss the limitations of our work and how it could be improved.
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)

