University of Twente Student Theses


Evaluating and comparing textual summaries using question answering models and reading comprehension datasets

Gavenavicius, M. (2020) Evaluating and comparing textual summaries using question answering models and reading comprehension datasets.

Abstract: The currently dominant approaches to automatic evaluation of summaries measure similarity between a candidate and a reference summary solely through lexical overlap. Such methods may be limited in their ability to assess summary factuality. We address this by evaluating summaries according to their usefulness for question answering on reading comprehension tasks. We develop a framework for performing these evaluations without relying on Question Generation models, by repurposing existing human-crafted datasets. Our experiments show that the scores produced by our method correlate highly with ROUGE when evaluated on the RACE dataset, and have low to medium correlation when evaluated on SQuAD 2. This implies that summarization systems that perform well according to ROUGE also do well on factual retention, although the strength of this relationship varies considerably with the dataset. Our experiments also indicate that the gap between current state-of-the-art summarization models and simple baselines is still narrow when given out-of-domain text. We further test our method’s sensitivity to word order, showing that further adjustments are needed to evaluate the fluency of the summaries.
Item Type: Essay (Bachelor)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science BSc (56964)
Link to this item: http://purl.utwente.nl/essays/81989