University of Twente Student Theses


Algorithms for Automated Scoring of Respiratory Events in Sleep

Nassi, Thijs-Enagnon (2021) Algorithms for Automated Scoring of Respiratory Events in Sleep.

Abstract: Manual scoring of polysomnography (PSG) data, in particular respiratory event labelling, is a time-consuming task. Scoring patient recordings with highly irregular breathing and frequent apneic events is an iterative operation that can take several hours. In recent years the development and application of computer algorithms that assist manual labor have grown rapidly. Automation by such computer models has a great impact on the medical field, but interpretation of medical data is often exceptionally heterogeneous. For instance, the American Academy of Sleep Medicine (AASM) provides rules for manual scoring that contain several arbitrary thresholds. This leaves room for interpretation of the criteria, which may be applied differently from patient to patient. In turn, this increases inter-rater variability and reduces scoring consistency among annotators. Both rule-based models and machine learning algorithms offer a palette of potentially more robust options for automated respiratory event scoring. In this work a deep neural network and a rule-based model were designed and tested on the world's largest available PSG database, from the Massachusetts General Hospital. The proposed approach using a deep neural network (WaveNet) showed that performance comparable to the literature can be obtained while using a minimally invasive methodology. Differentiation between event types was possible with limited accuracy, which may reflect in part the complexity of human respiratory output and some degree of arbitrariness in the clinical thresholds and criteria used during manual annotation. Next, a completely original rule-based modelling approach to automatically score respiratory events during sleep is introduced. The AASM criteria were used as a blueprint to design a compartmentalized rule-based model architecture, including hyperparameters that can be adjusted to mimic the ambiguity encapsulated in manual scoring.
Global patient assessment by the model resulted in strong agreement with the original single-scorer labels. Per-event scoring led to performance comparable with current state-of-the-art models, and clinical implementation seems feasible. Preliminary results from an experiment studying the inter-rater agreement among human scorers indicate significant misclassification at event-level granularity. These findings demonstrate that new approaches should be judged relative to human-to-human agreement, and not in direct contrast to single-scorer data. Comparison of the inter-rater agreement between human scorers and the model showed an average decrease in Cohen's kappa value from 0.43 to 0.30. The already promising results of the proposed prototype model are expected to improve to human-level scoring performance with future development iterations.
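The agreement figures above are Cohen's kappa values, which correct the raw agreement between two scorers for the agreement expected by chance. The following is a minimal sketch of that computation only, not the evaluation code of the thesis; the function name `cohens_kappa`, the per-epoch label strings, and the example annotations are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each class, multiply the two marginal proportions.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and perfect agreement is reported by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical per-epoch annotations from a human scorer and a model.
human = ["normal", "apnea", "hypopnea", "normal", "normal", "apnea"]
model = ["normal", "apnea", "normal", "normal", "hypopnea", "apnea"]
print(f"kappa = {cohens_kappa(human, model):.2f}")
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why a model's kappa is more informative when read next to the human-to-human value than on its own.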
Item Type: Essay (Master)
Faculty: TNW: Science and Technology
Subject: 44 medicine, 50 technical science in general, 54 computer science
Programme: Technical Medicine MSc (60033)