Author(s): Scheuter, L.R. (2021)
Abstract:
Discourse coherence has been studied for years, yet it appears to be largely unexplored for longer discourse. I therefore raise the question: how can longer fictional discourse of at least 50,000 words be assessed in terms of coherence? To analyze discourse coherence, I first distinguish between syntactic coherence (concerning grammar) and semantic coherence (concerning the meaning of the text). For the assessment of syntactic coherence, Barzilay and Lapata’s Entity Grid model is used to create a feature vector per document that represents the probabilities of the sentence-to-sentence transitions between the syntactic roles of the entities present in the document. For the assessment of semantic coherence, Global Vectors (GloVe) are used to create a vector representation of a document, after which the semantic similarity of adjacent sentences is computed using the cosine similarity score. Both the feature vectors and the cosine similarity scores then serve as input to two models: a logistic regression model and a random forest model. Coherence is evaluated in three ways: using only the feature vectors, using only the similarity scores, and using both the feature vectors and the similarity scores.
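The two feature types described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and several details are assumptions made only for illustration: the role labels S, O, X and - (subject, object, other, absent), the transition length of 2, the averaging of word vectors into a sentence vector, and the tiny hand-made embeddings that stand in for real GloVe vectors. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Assumed role inventory: subject, object, other, absent.
ROLES = ("S", "O", "X", "-")


def transition_probabilities(grid, n=2):
    """Return the probability of every length-n role transition.

    grid maps each entity to its list of roles, one role per sentence.
    """
    counts = Counter()
    total = 0
    for roles in grid.values():
        for i in range(len(roles) - n + 1):
            counts[tuple(roles[i:i + n])] += 1
            total += 1
    # A fixed ordering over all possible transitions gives a feature vector
    # of the same length for every document.
    return {
        t: (counts[t] / total if total else 0.0)
        for t in product(ROLES, repeat=n)
    }


def sentence_vector(tokens, embeddings, dim):
    """Average the word vectors of the known tokens (an assumed pooling choice)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_similarities(sentences, embeddings, dim):
    """Cosine similarity between each sentence and the one that follows it."""
    vecs = [sentence_vector(s, embeddings, dim) for s in sentences]
    return [cosine(a, b) for a, b in zip(vecs, vecs[1:])]


# Toy entity grid: two entities tracked across three sentences.
toy_grid = {"Alice": ["S", "S", "-"], "book": ["O", "-", "X"]}
probs = transition_probabilities(toy_grid)

# Toy 2-dimensional embeddings (placeholders, not real GloVe vectors).
toy_embeddings = {"cat": [1.0, 0.0], "dog": [1.0, 0.0], "car": [0.0, 1.0]}
toy_sentences = [["cat"], ["dog"], ["car"]]
sims = adjacent_similarities(toy_sentences, toy_embeddings, dim=2)
```

In this sketch, the values of `probs` taken in a fixed order would form the per-document feature vector, and `sims` holds one similarity score per pair of adjacent sentences; either or both could then be passed to a classifier.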
Document(s):
scheuter_MA_eemcs.pdf