University of Twente Student Theses


Co-attention-based pairwise learning for author name disambiguation

Li, Q. (2022) Co-attention-based pairwise learning for author name disambiguation.

Abstract: Digital library management systems suffer from inefficient retrieval caused by author name ambiguity. Manual annotation requires domain-specific knowledge and time-consuming data cleaning. Natural language processing and deep neural networks have recently been used to distinguish the authorship of publications that share an identical author name. However, earlier machine learning approaches do not use the latest embedding techniques for feature processing, so crucial latent information about the relationships between records is lost. In addition, they provide no human-readable interpretation of their decisions. Building on state-of-the-art embedding techniques and attention mechanisms, this thesis proposes a co-attention-based pairwise learning model for author name disambiguation. The contribution of this thesis is threefold. First, it applies appropriate processing methods to multiple feature types (textual, discrete, and coauthor features) with the goal of retaining the latent information in every component of a record. Second, it uses self-attention and co-attention mechanisms to capture latent interactions between records. Third, it explains model predictions by visualising the self-attention and co-attention weights. Experiments show that the co-attention-based model achieves the best accuracy, F1, and ROC AUC scores on most of the generated datasets. Although whether attention weights constitute explanations remains debatable, they intuitively provide evidence of the decision process.
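To make the co-attention idea in the abstract more concrete, here is a minimal sketch. It is not the thesis implementation. It assumes each of the two records in a pair has already been embedded as a list of d-dimensional token vectors, and it uses a plain scaled dot-product affinity with no learned projection weights. The function names (`co_attention`, `matmul`, `softmax_rows`) are hypothetical. The returned weight matrices are the kind of quantity that can be visualised to show which parts of one record the model attends to when reading the other.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x k) matrix by a (k x n) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def co_attention(a: Matrix, b: Matrix):
    """Hypothetical co-attention between record A (m x d) and record B (n x d).

    Assumption: affinity is scaled dot product, with no learned weights.
    """
    scale = 1.0 / math.sqrt(len(a[0]))
    # Affinity between every token of A and every token of B: shape (m x n).
    affinity = [[v * scale for v in row] for row in matmul(a, transpose(b))]
    a_to_b = softmax_rows(affinity)              # each A token attends over B's tokens
    b_to_a = softmax_rows(transpose(affinity))   # each B token attends over A's tokens
    a_context = matmul(a_to_b, b)                # B-aware summary for each A token (m x d)
    b_context = matmul(b_to_a, a)                # A-aware summary for each B token (n x d)
    return a_to_b, b_to_a, a_context, b_context


if __name__ == "__main__":
    record_a = [[1.0, 0.0], [0.0, 1.0]]
    record_b = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    weights_ab, weights_ba, _, _ = co_attention(record_a, record_b)
    for row in weights_ab:
        print([round(w, 3) for w in row])
```

In a pairwise model, the two context matrices would typically be pooled and combined into a single pair representation for a same-author classifier. How the thesis does this is not stated on this page.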
Item Type: Essay (Master)
OCLC B.V., Leiden, Netherlands
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)