University of Twente Student Theses
Effectiveness of natural language processing techniques in categorizing scientific articles by research methodology
Sakhi, Tarek (2023) Effectiveness of natural language processing techniques in categorizing scientific articles by research methodology.
Abstract: | Motivation: With the ever-growing number of published scientific articles, it becomes increasingly challenging for researchers to find, review and use relevant research. Aim: This study explores the potential of using unsupervised text classification models, specifically a zero-shot classification model (GPTNLI) and a similarity-based classification model (Lbl2vec), to streamline the literature review process. Method: These models predict an article's methodological approach from basic information such as the title, keywords and abstract, thereby allowing for an extra filter during scientific database searches. To accomplish this, an extensive and well-structured definition is established for each class. Result: The findings demonstrate that the GPTNLI model using GPT4 outperforms the other models in accuracy and F1 score while showing reduced variability in its performance. A binomial test shows that the model performs statistically significantly better than a random-guess strategy. Conclusion: Although the study has limitations, for instance the use of small test datasets and the lack of a cost-benefit analysis, the results are promising. Future research could improve the performance of the models by incorporating more sections of each article, further fine-tuning, and adding self-learning capabilities. |
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 54 computer science |
Programme: | Business & IT BSc (56066) |
Link to this item: | https://purl.utwente.nl/essays/95832 |
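Note on the binomial test mentioned in the abstract: the comparison against a random-guess strategy can be illustrated with a one-sided exact binomial test. The sketch below is not taken from the thesis. The function name `binomial_tail_p_value`, the number of classes (4), the test-set size (40) and the number of correct predictions (28) are all hypothetical values chosen for illustration.

```python
from math import comb


def binomial_tail_p_value(successes: int, trials: int, p_chance: float) -> float:
    """One-sided exact binomial test.

    Returns P(X >= successes) for X ~ Binomial(trials, p_chance), i.e. the
    probability that a random-guess classifier gets at least this many
    articles right purely by chance.
    """
    return sum(
        comb(trials, i) * p_chance**i * (1.0 - p_chance) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Hypothetical numbers, NOT results from the thesis:
num_classes = 4        # assumed number of methodology classes
n_articles = 40        # assumed size of the test set
n_correct = 28         # assumed number of correct predictions

# A uniform random guesser is right with probability 1 / num_classes.
p_value = binomial_tail_p_value(n_correct, n_articles, 1.0 / num_classes)
print(f"one-sided p-value versus random guessing: {p_value:.3g}")
```

A small p-value means the observed accuracy would be very unlikely under random guessing. With a small test set, as the abstract notes, the test can still reject chance-level performance, but it says little about how large the improvement is.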