University of Twente Student Theses


Leveraging disagreement among annotators for text classification

Xu, J. (2024) Leveraging disagreement among annotators for text classification.

Abstract: Training a model on a single "ground truth" label can sacrifice the valuable nuances and diverse perspectives inherent in annotators’ assessments, thereby compromising the authenticity and richness of the annotated dataset. In this study, we introduce approaches that incorporate annotation disagreement into the model training process. We focus mainly on hate speech detection and abusive conversation detection, tasks that inherently entail a high degree of subjectivity. Our approaches construct models using three different strategies: a probability-based multi-label method, an ensemble system, and instruction tuning. The probability-based multi-label method treats the detection tasks as a multi-label text classification problem and outputs a probability distribution across the labels. The ensemble system imitates an annotation process that involves multiple annotators. It consists of multiple sub-models that are trained individually, thereby incorporating the diverse perspectives within the annotations. The predictions from all sub-models are combined and transformed into the final decisions. Both the multi-label method and the ensemble system use BERT as their foundation model. Instruction tuning follows the same principle as the ensemble system but employs LLaMa 2 as the foundation model and fine-tunes it with natural language instructions. Cross entropy is used as the metric to compare the performance of these three approaches. Moreover, to evaluate the effectiveness of embracing annotation disagreement in model training, we conduct an online survey that compares the performance of the multi-label model against the baseline model.
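The idea of scoring predictions against the annotators' label distribution, rather than a single majority label, can be illustrated with a small sketch. This is not the thesis implementation: the function names, the class names, the example votes, and the use of simple averaging to combine sub-model outputs are all assumptions made for illustration, and the abstract does not specify how the predictions are actually combined.

```python
import math
from collections import Counter


def soft_label(votes, classes):
    """Turn one item's annotator votes into a probability distribution over classes."""
    counts = Counter(votes)
    total = len(votes)
    return [counts[c] / total for c in classes]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy H(target, predicted) in nats; eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def average_distributions(distributions):
    """Combine sub-model outputs by averaging per-class probabilities
    (an assumed combination rule, chosen only for illustration)."""
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]


# Hypothetical example: four annotators label one comment.
classes = ["hateful", "not_hateful"]
votes = ["hateful", "not_hateful", "hateful", "hateful"]
target = soft_label(votes, classes)

# Hypothetical outputs of two sub-models, each imitating one annotator.
ensemble_prediction = average_distributions([[0.9, 0.1], [0.5, 0.5]])

# A hard majority-vote prediction ignores the dissenting annotator.
majority_prediction = [1.0, 0.0]

print("soft label:", target)
print("CE, ensemble prediction:", cross_entropy(target, ensemble_prediction))
print("CE, majority prediction:", cross_entropy(target, majority_prediction))
```

Under this metric, a prediction that puts all its probability on the majority class is penalised heavily whenever some annotators disagreed, while a prediction close to the vote distribution scores well.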
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 50 technical science in general, 54 computer science
Programme: Computer Science MSc (60300)
Link to this item:

