University of Twente Student Theses


Predicting dialogue state transitions using prosodic markers: Exploring AMI Corpus backchannels

Nouwens, I.C.C. (2009) Predicting dialogue state transitions using prosodic markers: Exploring AMI Corpus backchannels.

Abstract: In a typical conversation within a small group, one person usually speaks while the others listen. In most dialogues, however, the listeners are not completely silent while the speaker has the floor. Occasionally they signal their engagement in the discourse to the speaker and the rest of the group by giving feedback in the form of words like “Oh really?”, “yeah”, “hmm-mm”, “you don’t say!”. In doing so, the listener informs the speaker of their opinion about what is being said. As an alternative to these feedback words, called backchannels, a listener can choose to interrupt or take over with a new statement if he or she wishes to contribute more than a few backchanneling words. In this thesis we studied recordings of conversations to determine whether the speaker's prosody differs characteristically between two situations: one in which a listener responds with a backchannel, and one in which a listener adds an entirely new verbal contribution to the conversation. We used a corpus consisting of the recorded signals of 138 multiparty meetings, with an average length of 33 minutes each, in which four participants discuss the design of a new product. Each utterance in the participants' speech is annotated with a type that indicates whether or not it is a backchannel. From this corpus we selected the speaker utterances during which, or shortly after which, one of the listeners started a contribution. Using Praat, we extracted several prosodic features from the selected utterances, normalized them for each speaker, and used the resulting dataset in a series of machine learning experiments. By applying statistical techniques to our data, we assessed whether the two types of contribution could be distinguished on the basis of prosodic features taken from the speaker’s speech.
We found that our decision tree correctly classified the contribution as a backchannel or a non-backchannel in 65.9% of all cases. With the baseline set at 50%, this is an improvement of 15.9 percentage points. This report is organized as follows: chapter one serves as the introduction and is followed by chapter two, which presents previous findings from fields related to our study. Chapter three presents the corpus and how it was formatted to support automatic utterance selection. Chapter four presents the selection criteria, the resulting data selection, and the feature sets that were extracted. Chapter five describes the experiments that were conducted and the results obtained from them. Finally, chapter six presents our conclusions and suggestions for further research.
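The abstract states that the prosodic features were normalized for each speaker but does not say how. As a minimal sketch only, assuming a per-speaker z-score (an assumption, not the method confirmed by the thesis), the step could look like this. The function name `normalize_per_speaker`, the field names `speaker` and `mean_f0`, and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_per_speaker(rows, feature):
    """Z-score one feature within each speaker.

    Assumed normalization scheme for illustration; the thesis abstract
    only says features were "normalized for each speaker".
    """
    # Collect the raw feature values of every speaker.
    by_speaker = defaultdict(list)
    for row in rows:
        by_speaker[row["speaker"]].append(row[feature])

    # Per-speaker mean and population standard deviation.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}

    normalized = []
    for row in rows:
        mu, sigma = stats[row["speaker"]]
        # Guard against a speaker whose values are all identical.
        z = (row[feature] - mu) / sigma if sigma > 0 else 0.0
        normalized.append({**row, feature: z})
    return normalized


# Hypothetical toy data: mean pitch (Hz) of four utterances by two speakers.
rows = [
    {"speaker": "A", "mean_f0": 100.0},
    {"speaker": "A", "mean_f0": 120.0},
    {"speaker": "B", "mean_f0": 200.0},
    {"speaker": "B", "mean_f0": 240.0},
]
normalized = normalize_per_speaker(rows, "mean_f0")
```

In this toy input, each speaker's lower-pitched utterance maps to -1 and the higher-pitched one to +1, so a classifier would see pitch relative to the speaker's own range rather than absolute pitch differences between speakers.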
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Interaction Technology MSc (60030)