University of Twente Student Theses


Leveraging disagreement among annotators for text classification

Xu, J. (2024) Leveraging disagreement among annotators for text classification.

Abstract: Using a "ground truth" label for model training can sacrifice the valuable nuances and diverse perspectives inherent in annotators’ assessments, compromising the authenticity and richness of the annotated dataset. In this study, we introduce approaches that incorporate annotation disagreement into the model training process. We focus mainly on hate speech detection and abusive conversation detection, tasks that inherently entail a high degree of subjectivity. Our approaches build models using three strategies: a probability-based multi-label method, an ensemble system, and instruction tuning. The probability-based multi-label method treats the detection tasks as a multi-label text classification problem and outputs a probability distribution across the labels. The ensemble system imitates an annotation process involving multiple annotators: it consists of multiple sub-models that are trained individually, thereby incorporating the diverse perspectives within the annotations. The predictions of all sub-models are combined and transformed into the final decisions. Both the multi-label method and the ensemble system use BERT as their foundation model. Instruction tuning follows the same principle as the ensemble system but uses Llama 2 as the foundation model and fine-tunes it with natural language instructions. Cross entropy is used as the metric for comparing the performance of these three approaches. Moreover, to evaluate the effectiveness of embracing annotation disagreement in model training, we conduct an online survey that compares the performance of the multi-label model against the baseline model.
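To illustrate how annotation disagreement can enter training targets and evaluation, the following minimal Python sketch shows three steps: turning one item's annotator votes into a soft label distribution, combining the hard predictions of several sub-models into a distribution in the spirit of the ensemble system, and scoring a predicted distribution against the annotator distribution with cross entropy. This is not the thesis implementation. The label names, the vote-counting combination rule, and all function names are hypothetical assumptions, and the BERT and Llama 2 models are replaced by predictions supplied directly.

```python
import math
from collections import Counter

# Hypothetical binary label set; the thesis's actual label sets may differ.
LABELS = ["not_hate", "hate"]


def soft_label(annotations, labels=LABELS):
    """Convert one item's annotator votes into a probability distribution over labels."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts[label] / total for label in labels]


def ensemble_distribution(sub_model_preds, labels=LABELS):
    """Assumed combination rule: each sub-model acts like one annotator and
    casts a hard vote; the votes are counted into a distribution."""
    return soft_label(sub_model_preds, labels)


def final_decision(distribution, labels=LABELS):
    """Turn a distribution into a single decision by taking the most probable label."""
    best = max(range(len(labels)), key=lambda i: distribution[i])
    return labels[best]


def cross_entropy(target, predicted, eps=1e-12):
    """H(target, predicted) = -sum_i t_i * log(p_i); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


if __name__ == "__main__":
    # Three annotators disagree on one item: two say "hate", one says "not_hate".
    target = soft_label(["hate", "hate", "not_hate"])
    # Three hypothetical sub-models predict on the same item.
    predicted = ensemble_distribution(["hate", "not_hate", "not_hate"])
    print("annotator distribution:", target)
    print("ensemble distribution:", predicted)
    print("ensemble decision:", final_decision(predicted))
    print("cross entropy:", cross_entropy(target, predicted))
```

Lower cross entropy means the predicted distribution is closer to the spread of annotator opinions, so a model that reproduces disagreement is rewarded rather than penalised for not matching a single label.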
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 50 technical science in general, 54 computer science
Programme: Computer Science MSc (60300)
Link to this item: https://purl.utwente.nl/essays/99023