University of Twente Student Theses
Building a sense inventory for Dutch healthcare abbreviations
Heijden, G.A.M. van der (2022) Building a sense inventory for Dutch healthcare abbreviations.
Abstract: | Healthcare abbreviations cause problems for people reading healthcare reports and for text mining, because they are unknown or ambiguous. Word sense disambiguation (WSD) has been used to tackle the ambiguity of abbreviations, but WSD is limited by how exhaustive abbreviation sense inventories are. Unsupervised WSD, more often referred to as word sense induction (WSI), has been proposed to remove this dependency on sense inventories. A sense inventory can be constructed by annotating randomly sampled abbreviation occurrences, but this approach is labour-intensive. This thesis explores whether WSI can reduce the annotation cost of finding abbreviation senses while maintaining high sense coverage. Here, WSI consists of clustering vectorized abbreviation occurrences, with the aim of grouping together occurrences of the same sense. Each cluster centroid is then annotated with a sense, which reduces the number of annotations needed to retrieve an abbreviation's senses. Besides clustering, abbreviation occurrences and occurrences of candidate senses are compared using sentence similarity measures. The exact sense tends to rank poorly, but the highest-ranking candidate senses are often inflections or synonyms of the exact sense. |
Item Type: | Essay (Master) |
Clients: | Nedap, Groenlo, Netherlands |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Computer Science MSc (60300) |
Link to this item: | https://purl.utwente.nl/essays/93139 |