University of Twente Student Theses
Explanation Guided Learning for Sports Video Data
Tomassen, Y. (2025) Explanation Guided Learning for Sports Video Data.
Abstract: | Deep Neural Networks achieve high accuracy in human activity recognition but often lack transparency, which hinders trust in applications such as automated sports officiating. Explainable AI (XAI) aims to elucidate model decisions, while Explanation Guided Learning (EGL) seeks to improve model intuition by incorporating explanations into training. This thesis investigates the efficacy of EGL for diving activity recognition. We propose and evaluate novel EGL methodologies that leverage optical flow to automatically generate ground truth attention maps, addressing a common EGL challenge: reliance on manual annotations. Our methods include: 1) an Optical Flow Guided Learning (OGL) approach that uses a Dice loss to align model-generated GradCAM attention with optical flow-derived diver silhouettes; 2) an OGL approach that directly transforms input frames using these diver masks (Temporal Mask Transform); and 3) a "Right for the Right Reasons" (RRR) loss, guided by either GradCAM (RRR + GradCAM) or optical flow-derived attention maps (RRR + OGL), that penalises misleading input gradients. These methods are implemented on a SlowFast network architecture. Results indicate varying effectiveness among the EGL approaches. The Dice-based method underperformed, likely due to a fundamental mismatch between GradCAM's attention and binary segmentation masks. However, the Temporal Mask Transform method showed a 6.67% improvement at the lowest temporal resolution, and the RRR approach guided by optical flow (RRR + OGL) showed significant improvements, outperforming other augmentation methods at higher temporal resolutions (4.70% improvement). Experiments with denser temporal sampling in the slow pathway of the SlowFast model challenged the original assumptions of the SlowFast architecture. Our combined approach performed better than prior work while using approximately 56.0% fewer computations.
This research demonstrates EGL's potential for enhancing diving action recognition and underscores the viability of optical flow as a source of ground truth attention maps. |
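The three training signals summarised in the abstract can be illustrated with a short NumPy sketch. It is not the thesis implementation: the magnitude threshold used to turn optical flow into a diver mask, the function names (`flow_to_mask`, `soft_dice_loss`, `temporal_mask_transform`, `rrr_penalty`), and the use of plain multiplication for the Temporal Mask Transform are all hypothetical choices made for illustration. The optical-flow field, the attention map, and the input gradients are treated as precomputed arrays; in a real training loop they would come from an optical-flow estimator and from backpropagation through the SlowFast network.

```python
import numpy as np


def flow_to_mask(flow: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Binary diver mask from a dense optical-flow field of shape (H, W, 2).

    Hypothetical rule: pixels whose flow magnitude exceeds `threshold`
    are treated as the moving diver; everything else is background.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    return (magnitude > threshold).astype(np.float32)


def soft_dice_loss(attention: np.ndarray, mask: np.ndarray, eps: float = 1e-6) -> float:
    """1 - soft Dice coefficient between an attention map in [0, 1] and a binary mask.

    For attention values in [0, 1], the loss reaches 0 only when the
    attention map equals the binary mask exactly.
    """
    intersection = np.sum(attention * mask)
    total = np.sum(attention) + np.sum(mask)
    return float(1.0 - (2.0 * intersection + eps) / (total + eps))


def temporal_mask_transform(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Zero out background pixels in a clip (assumed, simplest form of the transform).

    frames: (T, H, W, C) video clip; masks: (T, H, W) per-frame diver masks.
    """
    return frames * masks[..., None]  # broadcast each frame's mask over the channels


def rrr_penalty(input_grad: np.ndarray, relevant_mask: np.ndarray, lam: float = 1.0) -> float:
    """RRR-style penalty on input gradients that fall outside the relevant region.

    input_grad: gradient of the summed log-probabilities w.r.t. the input,
    same shape as relevant_mask (1 = diver, 0 = background).
    """
    irrelevant = 1.0 - relevant_mask
    return float(lam * np.sum((irrelevant * input_grad) ** 2))


if __name__ == "__main__":
    # Toy 4x4 frame in which only a 2x2 patch moves.
    flow = np.zeros((4, 4, 2), dtype=np.float32)
    flow[1:3, 1:3] = 2.0
    mask = flow_to_mask(flow)
    print(mask.sum())                               # 4.0: four pixels flagged as diver
    print(soft_dice_loss(mask, mask))               # perfect overlap gives a loss near 0
    clip = np.ones((2, 4, 4, 3), dtype=np.float32)  # two all-ones RGB frames
    masked = temporal_mask_transform(clip, np.stack([mask, mask]))
    print(masked.sum())                             # 24.0 = 2 frames * 4 pixels * 3 channels
    print(rrr_penalty(np.ones((4, 4)), mask))       # 12.0 from the 12 background pixels
```

One possible reading of the Dice result follows from the docstring above: a coarse, continuous GradCAM map can only drive the soft Dice loss to zero by becoming an exact binary copy of the silhouette, so the term keeps pushing even when attention already sits roughly on the diver. The RRR term avoids this by constraining only the gradients outside the mask; for the RRR + GradCAM variant, a thresholded GradCAM map would presumably take the place of `relevant_mask`.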
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science MSc (60300) |
Link to this item: | https://purl.utwente.nl/essays/106575 |