University of Twente Student Theses

Effectiveness of natural language processing techniques in categorizing scientific articles by research methodology

Sakhi, Tarek (2023) Effectiveness of natural language processing techniques in categorizing scientific articles by research methodology.

Abstract: Motivation: With the ever-growing number of published scientific articles, it is becoming increasingly challenging for researchers to find, review and use relevant research. Aim: This study explores the potential of unsupervised text classification models, specifically a zero-shot classification model (GPTNLI) and a similarity-based classification model (Lbl2vec), to streamline the literature review process. Method: These models predict an article's methodological approach from basic information such as the title, keywords and abstract, thereby providing an extra filter for scientific database searches. To accomplish this, an extensive and well-structured definition is established for each class. Result: The findings demonstrate that the GPTNLI model using GPT4 outperforms the other models in accuracy and F1 score while showing reduced variability in its performance. A binomial test shows that the model performs statistically significantly better than a random-guess strategy. Conclusion: Although the study has limitations, for instance the use of small test datasets and the lack of a cost-benefit analysis, the results are promising. Future research could improve the performance of the models by incorporating more sections of each article, further fine-tuning and adding self-learning capabilities.
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Business & IT BSc (56066)
Link to this item: https://purl.utwente.nl/essays/95832