University of Twente Student Theses


Quality assessment of medical health records using information extraction

Zanden, Guido van der (2010) Quality assessment of medical health records using information extraction.

[img] PDF
Abstract:The most important information in Electronic Health Records is in free text form. The result is that the quality of Electronic Health Records is hard to assess. Since Electronic Health Records are exchanged more and more, badly written or incomplete records can cause problems when other healthcare providers do not completely understand them. In this thesis we try to automatically assess the quality of Electronic Health Records using Information Extraction. Another advantage of the automated analysis of Electronic Health Records is to extract management information which can be used in order to increase efficiency and decrease cost, another popular subject in healthcare nowadays. Our solution for automated assessment of Electronic Health Records consists out of two parts. In the first part we theoretically determine what the quality of Electronic Health Records is, based upon Data and Information Quality theory. Based upon this analysis we propose three quality metrics. The first two check whether an Electronic Health Record is written as prescribed by guidelines of the association of general practitioners. The first checks whether the SOEP methodology is used correctly, the second whether a treatment is carried out according to the guideline for that illness. The third metric is more general applicable and measures conciseness. In the second part we designed and implemented a prototype system to execute the quality assessment. Due to time limitations we only implemented the SOEP methodology metric. This metric tests whether a piece of text is placed in the right place. The fields that can be used by a healthcare provider are (S)ubjective, (O)bjective, (E)valuation and (P)lan. We implemented a prototype based upon the 'General Architecture for Text Engineering'. Many generic Information Extraction tasks were available already, we implemented two domain specific tasks ourselves. The first looks up words in a thesaurus (the UMLS) in order to give meaning to the text, since to every word in the thesaurus one or more semantic types are assigned. The semantic types found in a sentence are then resolved to one of the four SOEP types. In a good Electronic Health Record, sentences are resolved to the SOEP field they are actually in. To validate our prototype we annotated text from real Electronic Health Records with S,O,E and P and compared it to the output of our prototype. We found a Precision of roughly 50% and a recall of 20-25%. Although not perfect, because we had time nor resources to involve domain experts we think this result is encouraging for further research. Furthermore we shown that our other two metrics are sensible with use cases. Although no proof they are feasible in practice, they show that a whole set of different metrics can be used to assess the quality of Electronic Health Records.
Item Type:Essay (Master)
Topicus Zorg
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:54 computer science
Programme:Computer Science MSc (60300)
Link to this item:
Export this item as:BibTeX
HTML Citation
Reference Manager


Repository Staff Only: item control page