University of Twente Student Theses
Automatic Phase Recognition for Surgical Video Analysis: A Cross-Modal Multi-Visual Cue Approach
Schokker, J. (2025) Automatic Phase Recognition for Surgical Video Analysis: A Cross-Modal Multi-Visual Cue Approach.
Abstract: | Surgical phase recognition is an important task in medical image analysis, with applications in improving surgical safety, efficiency, and training. It involves predicting the phases of a surgery using machine learning methods. This research proposes a framework that improves phase recognition using a cross-modal multi-visual cue approach. The proposed model combines the video frames with extracted descriptors of tool presence, segmentation masks, and action labels. An ablation study shows that the best configuration of visual cues is image data combined with action triplets, which achieves an accuracy of 0.826 and an F1 score of 0.871 on the Cholec80 dataset, compared to 0.792 and 0.844 for the baseline model. The model also outperforms state-of-the-art models on the HeiChole benchmark dataset, with an F1 score of 0.796 and an accuracy of 0.732. These results demonstrate the effectiveness of integrating multiple visual cues for phase recognition and offer a promising direction for improving surgical workflow analysis. |
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 44 medicine, 54 computer science |
Programme: | Interaction Technology MSc (60030) |
Link to this item: | https://purl.utwente.nl/essays/105181 |