University of Twente Student Theses

Building a sense inventory for Dutch healthcare abbreviations

Heijden, G.A.M. van der (2022) Building a sense inventory for Dutch healthcare abbreviations.

Abstract: Healthcare abbreviations pose problems both for people reading healthcare reports and for text mining, because they are often unknown or ambiguous. Word sense disambiguation (WSD) has been used to tackle the ambiguity of abbreviations, but WSD is bound by the exhaustiveness of abbreviation sense inventories. Unsupervised WSD, more often referred to as word sense induction (WSI), has been proposed to overcome this inhibiting dependency on sense inventories. A sense inventory can be constructed by annotating randomly sampled abbreviation occurrences, but this is a cumbersome approach. This thesis explores whether WSI can be used to reduce the annotation cost of finding abbreviation senses while maintaining high sense coverage. In this thesis, WSI entails clustering vectorized abbreviation occurrences, with the aim of grouping together the occurrences of the same sense. Each cluster centroid is then annotated with a sense, which reduces the number of annotations needed to retrieve an abbreviation's senses. Aside from clustering, abbreviation occurrences and occurrences of candidate senses are compared through sentence similarity measures. The exact sense itself tends to rank poorly, but the highest-ranking candidate senses are often inflections or synonyms of the exact sense.
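The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the bag-of-words vectors, the plain k-means with deterministic seeding, the choice to annotate the occurrence nearest each centroid, and the toy English contexts for a hypothetical abbreviation "pt" are all assumptions made for illustration.

```python
from collections import Counter


def vectorize(context, vocab):
    """Bag-of-words count vector for one abbreviation occurrence (assumed representation)."""
    counts = Counter(context.lower().split())
    return [float(counts[word]) for word in vocab]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means, seeded with the first k vectors so the result is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every occurrence to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives(vectors, labels, centroids):
    """Index of the occurrence closest to each centroid: the only ones sent to an annotator."""
    reps = []
    for c, centroid in enumerate(centroids):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            reps.append(min(members, key=lambda i: sq_dist(vectors[i], centroid)))
    return reps


# Invented toy occurrences of a hypothetical ambiguous abbreviation "pt".
occurrences = [
    "pt slept well tonight",
    "pt exercises for knee",
    "pt slept badly tonight",
    "pt exercises for shoulder",
]
vocab = sorted({word for text in occurrences for word in text.lower().split()})
vectors = [vectorize(text, vocab) for text in occurrences]

labels, centroids = kmeans(vectors, k=2)
to_annotate = representatives(vectors, labels, centroids)

print("cluster labels:", labels)
print("occurrences to annotate:", [occurrences[i] for i in to_annotate])
```

In this toy case four occurrences fall into two clusters, so only two annotations are requested instead of four. Whether that saving holds on real data depends on how well the clusters line up with actual senses, which is the question the thesis investigates.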
Item Type: Essay (Master)
Clients: Nedap, Groenlo, Netherlands
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)
Link to this item: https://purl.utwente.nl/essays/93139