University of Twente Student Theses
Biomarker Based COVID Severity Prediction and Data Quality Exploration
El Habashy, Mohamed (2023) Biomarker Based COVID Severity Prediction and Data Quality Exploration.
Abstract: | This report investigates how Data Quality (DQ) affects COVID severity prediction models and how these models can, in turn, affect resource allocation management. Understanding the impact of DQ on COVID datasets yields insights that can improve the allocation of resources. The main research question of this project is: “What is the importance of Data Quality in predicting the severity and progression of COVID?”. The study follows the Design Science Research Methodology (DSRM). The findings reveal that DQ issues are prevalent in COVID data, with Missing Data and Imbalanced Data being the most common. To evaluate the effects of data quality, we developed a COVID severity prediction model using a Support Vector Machine (SVM) and performed a feature importance analysis using permutation importance to demonstrate the correlation between biomarkers and COVID severity. Among the biomarkers, Leuco (Leukocytes) exhibited the strongest correlation. The model achieved an accuracy of 76%, a precision of 91%, a recall of 69%, and an F1 score of 79%. The findings underscore the critical role of Data Quality in influencing model outcomes and highlight the importance of proper preprocessing for obtaining accurate and reliable results from the machine learning model. These results are crucial for the effective allocation of resources. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 42 biology, 54 computer science |
Programme: | Computer Science BSc (56964) |
Link to this item: | https://purl.utwente.nl/essays/96187 |
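As a reading aid for the metrics quoted in the abstract, the following minimal sketch shows how an F1 score relates to precision and recall (their harmonic mean). It is not the thesis code: the function name `f1_from_precision_recall` is hypothetical, and the inputs are simply the rounded percentages from the abstract, not the underlying confusion matrix, which is not given here.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: F1 is taken as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract (assumption: used as exact inputs).
reported_precision = 0.91
reported_recall = 0.69

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 from rounded precision and recall: {f1:.3f}")
```

With the rounded inputs this gives roughly 0.785, close to the reported 79%. The small gap would be consistent with precision and recall having been rounded before being reported, although the abstract does not say so.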