University of Twente Student Theses


Automating Scientific Paper Screening with ChatGPT: An Evaluation of Efficiency and Accuracy

Botnarenco, Daniel (2023) Automating Scientific Paper Screening with ChatGPT: An Evaluation of Efficiency and Accuracy.

Abstract: Goals: This study evaluates the performance of several language models, including BERT and GPT, on scientific paper screening. It assesses their classification accuracy and language generation capabilities to gain insight into their potential and limitations. Method: The models are evaluated on a dataset of 6865 scientific papers, with the recorded screening decisions serving as ground-truth labels. Classification performance is measured with accuracy and F1 scores, supported by confusion matrices. Results: BERT achieved the highest accuracy and F1 score among the tested models, while GPT-3 Turbo and GPT-4 scored lower on both metrics. Processing speeds varied, with BERT benefiting from the CUDA framework. At best, each model processes documents twice as fast as a human coder. Implications: The findings highlight the importance of prompt engineering and fine-tuning in improving language model performance on specific tasks. The study contributes to the development and understanding of large language models for natural language processing and supports their effective use in scientific paper screening.
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)