University of Twente Student Theses

Identifying Covid-19 Shortages with the Help of an Automatically Constructed Knowledge Graph

Theodorakopoulos, Daphne (2022) Identifying Covid-19 Shortages with the Help of an Automatically Constructed Knowledge Graph.

Abstract: During the Covid-19 pandemic, supply chains suffered severe product shortages, e.g. of facemasks. Detecting such shortages early mitigates their consequences. The aim of this study is to automatically identify Covid-19 shortages from text, supported by a Knowledge Graph (KG). The Covid-19 Open Research Dataset (CORD-19) of Covid-19 research publications forms the basis of this work. The method consists of three main parts:
1. An ensemble of term weighting schemes computed over time was used to identify shortages in text. The schemes are monthly term frequencies, monthly TF-IDF, word embeddings, and the monthly co-occurrences of certain keywords, together with how these measures change over time.
2. Topic Modeling (TM) was applied to select relevant articles. One topic in a guided LDA model was seeded with keywords, and all articles belonging to the seeded topic were selected.
3. A domain-specific KG was automatically constructed from text to improve the identification of shortages. A subgraph was extracted from DBpedia based on keywords and enhanced with open relation extraction from the TM-selected articles. The KG was completed with entity types, superclasses, and text cleaning. Link prediction and neighbor occurrences within the KG were added to the ensemble to identify shortages.
The shortage identification was partly successful: around half of the expected terms were retrieved, but the list also contained many irrelevant terms. The best weighting schemes were similar terms from the word embedding and the KG neighbor occurrences, which is a new scheme. The TM selection of relevant articles outperformed the standard keyword selection. However, the shortage identification on all data was better than on the selected articles, which calls the method into question. This drawback is outweighed by the advantage of saving human effort. The KG is domain-related but noisy: it contains 70% of the expected entities but also much irrelevant and meaningless data.
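Part 1 of the method scores terms by their monthly TF-IDF and how it changes. The abstract does not give the exact formulas, so the following Python sketch is only one plausible reading: it assumes each month's articles are pooled into a single document, uses a smoothed IDF computed across months, and ranks terms by their TF-IDF increase relative to the previous month. The function names `monthly_tfidf` and `rising_terms` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def monthly_tfidf(monthly_tokens: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Compute TF-IDF per month, treating each month's pooled tokens as one document.

    Assumption: IDF is smoothed across months, log((1 + n) / (1 + df)) + 1,
    so a term occurring in every month still gets a non-zero weight.
    """
    months = sorted(monthly_tokens)
    n = len(months)
    counts = {m: Counter(monthly_tokens[m]) for m in months}
    # Document frequency: in how many months does each term occur?
    df = Counter()
    for m in months:
        df.update(counts[m].keys())
    scores: dict[str, dict[str, float]] = {}
    for m in months:
        total = sum(counts[m].values()) or 1  # avoid division by zero for empty months
        scores[m] = {
            term: (c / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, c in counts[m].items()
        }
    return scores


def rising_terms(scores, month, prev_month, top_k=5):
    """Rank terms by their TF-IDF increase from prev_month to month (ties broken alphabetically)."""
    current = scores[month]
    previous = scores.get(prev_month, {})
    delta = {t: s - previous.get(t, 0.0) for t, s in current.items()}
    return sorted(delta, key=lambda t: (-delta[t], t))[:top_k]


# Toy example with made-up tokens (not CORD-19 data).
monthly = {
    "2020-01": "virus outbreak hospital patients hospital".split(),
    "2020-02": "facemask shortage hospital facemask ventilator".split(),
}
scores = monthly_tfidf(monthly)
print(rising_terms(scores, "2020-02", "2020-01", top_k=3))
# prints ['facemask', 'shortage', 'ventilator']
```

Only terms present in the current month are ranked, so a term that disappears is ignored; a real pipeline would also normalize tokens (e.g. lemmatization, stop-word removal) before counting.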
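Part 3 adds KG neighbor occurrences to the ensemble, but the abstract does not define the score or say how the schemes are combined. The sketch below assumes that "neighbor occurrences" means how often an entity appears as a direct neighbor of a seed entity such as "shortage" in (subject, relation, object) triples, and it combines rankings with reciprocal rank fusion, a standard rank-aggregation technique swapped in here for illustration. The names `neighbor_occurrences` and `fuse_rankings`, the constant `k=60`, and the example triples are all hypothetical.

```python
from collections import Counter


def neighbor_occurrences(triples, seeds):
    """Count how often each non-seed entity is directly linked to a seed entity.

    Edges are treated as undirected: (s, r, o) links s and o in both directions.
    """
    counts = Counter()
    for subj, _relation, obj in triples:
        if subj in seeds and obj not in seeds:
            counts[obj] += 1
        elif obj in seeds and subj not in seeds:
            counts[subj] += 1
    return counts


def fuse_rankings(rankings, k=60):
    """Reciprocal rank fusion: score(t) = sum over rankings of 1 / (k + rank), rank starting at 1."""
    fused = Counter()
    for ranking in rankings:
        for rank, term in enumerate(ranking, start=1):
            fused[term] += 1.0 / (k + rank)
    return [term for term, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical triples, e.g. from a DBpedia subgraph or open relation extraction.
triples = [
    ("facemask", "has", "shortage"),
    ("shortage", "of", "ventilator"),
    ("hospital", "reports", "shortage"),
    ("shortage", "of", "facemask"),
    ("hospital", "uses", "ventilator"),
]
kg_scores = neighbor_occurrences(triples, seeds={"shortage"})
kg_ranking = [t for t, _ in sorted(kg_scores.items(), key=lambda kv: (-kv[1], kv[0]))]
print(kg_ranking)  # prints ['facemask', 'hospital', 'ventilator']

# Fuse with a second, made-up ranking from another weighting scheme.
text_ranking = ["ventilator", "facemask", "test"]
print(fuse_rankings([kg_ranking, text_ranking]))
# prints ['facemask', 'ventilator', 'hospital', 'test']
```

Reciprocal rank fusion uses only positions, so schemes with incomparable score scales (raw frequencies, TF-IDF changes, embedding similarities, neighbor counts) can be combined without first normalizing their scores.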
The shortage identification on the TM-selected articles considering only KG entities was slightly better than when considering all terms. However, it still did not outperform the method applied to all data without the KG, most likely because the topic model did not select the relevant articles well enough. An important limitation is that the ground-truth list of shortages is incomplete; therefore, the precision of the shortage identification and of the KG domain affiliation is underestimated. Future work should apply additional KG completion methods, such as entity resolution, fact-checking, and error detection, and should include a human evaluation of the suggested shortages, the selected articles, and the KG. We conclude that the suggested method is a valid approach towards a shortage-identification system, but many open challenges remain. The main contributions include a shortage-identification method, an automated method to select relevant articles, a method to automatically construct a KG from text, and the resulting Covid-19 KG of product shortages.
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Interaction Technology MSc (60030)
Awards: Cum Laude
Link to this item: https://purl.utwente.nl/essays/89572