University of Twente Student Theses
Comorbidity identification in clinical documents with medical terminology-based weak supervision.
Brouwer, S. (2024) Comorbidity identification in clinical documents with medical terminology-based weak supervision.
Abstract: | Knowledge of patient comorbidities is crucial for effective healthcare decision-making and predictive modeling. However, comorbidity data are often buried in unstructured text in the electronic health record (EHR). The aim of this work was to evaluate the potential of machine learning (ML) for extracting comorbidity data from EHRs. We frame comorbidity identification as a multi-label classification problem: emergency department documents for elderly hip fracture patients are classified into the categories of the Charlson Comorbidity Index (CCI). We first evaluated four ML models in a fully supervised learning scheme. The performance of these fully supervised classifiers was hampered by significant class imbalance across the CCI categories. We attempted to mitigate the effects of this imbalance by augmenting our training data with documents for patients outside the hip fracture cohort, using a weak supervision scheme. Weak labels were generated programmatically by checking for the presence of relevant terminology from SNOMED CT and the UMLS, supplemented with pseudo-labels generated by a fully supervised Random Forest. We find that this approach considerably improves classification performance for rare CCI categories. Random Forest was the best-performing model, achieving a classification accuracy of 0.75 after inclusion of the weakly labelled documents. |
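The weak labelling step described in the abstract (terminology matching supplemented with classifier pseudo-labels, producing multi-label CCI targets) can be illustrated with a minimal sketch. It is not the thesis implementation: the category names and term lists below are hypothetical stand-ins for the SNOMED CT and UMLS terminology, the pseudo-labels are passed in as plain per-category probabilities rather than coming from a trained Random Forest, and combining the two sources by set union with a 0.5 threshold is an assumption.

```python
import re

# Hypothetical, illustrative term lists per CCI category. The thesis drew its
# terminology from SNOMED CT and the UMLS; these short lists are NOT those sets.
CCI_TERMS = {
    "congestive_heart_failure": ["heart failure", "cardiac failure"],
    "dementia": ["dementia", "alzheimer"],
    "chronic_pulmonary_disease": ["copd", "chronic obstructive pulmonary disease", "asthma"],
    "diabetes": ["diabetes mellitus", "diabetes", "dm2"],
}


def _compile(terms):
    # Case-insensitive, whole-word match so short terms such as "dm2"
    # do not fire inside longer tokens.
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


PATTERNS = {cat: _compile(terms) for cat, terms in CCI_TERMS.items()}


def terminology_labels(text):
    """Return the set of CCI categories whose terms occur anywhere in the text."""
    return {cat for cat, pat in PATTERNS.items() if pat.search(text)}


def weak_labels(text, pseudo_probs, threshold=0.5):
    """Combine terminology matches with pseudo-labels (assumed rule: set union).

    pseudo_probs maps a CCI category to a classifier probability; categories at
    or above the (hypothetical) threshold are added as pseudo-labels.
    """
    pseudo = {cat for cat, p in pseudo_probs.items() if p >= threshold}
    return terminology_labels(text) | pseudo


def to_multi_hot(labels, categories=tuple(CCI_TERMS)):
    """Encode a label set as a multi-hot vector in a fixed category order."""
    return [1 if cat in labels else 0 for cat in categories]


if __name__ == "__main__":
    doc = "Known with COPD and type 2 diabetes mellitus. No cardiac complaints."
    labels = weak_labels(doc, {"dementia": 0.8, "congestive_heart_failure": 0.2})
    print(sorted(labels))
    print(to_multi_hot(labels))
```

Plain presence checks like this ignore negation and context ("no history of dementia" would still yield a dementia label), which is one reason such labels are treated as weak, noisy supervision rather than ground truth.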
Item Type: | Essay (Master) |
Clients: | Ziekenhuisgroep Twente (ZGT), Hengelo, The Netherlands |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 44 medicine, 54 computer science |
Programme: | Computer Science MSc (60300) |
Link to this item: | https://purl.utwente.nl/essays/101039 |