In the eye of the wizard: effects of (mutual) gaze on an avatar-mediated conversation

Abstract: In this report we look at the effects of gaze on avatar-mediated conversations. It is theorized that gaze affects the efficiency of turn-taking, perceived dominance and perceived rapport. Does this effect also occur when people communicate via an avatar that copies their gaze? And does a delay in copying gaze behavior to the avatar have an effect? To test this we extend the AsapRealizer [1] with an eye motion capture and animation module that uses data from the Kinect [2] and video analysis. The eye module mirrors iris movements and detects blinks. We perform an experiment in which participants communicate with each other via this avatar. The experiment is run with ten pairs, with two sessions of roughly six minutes per pair. In one session participants see the other participant's gaze behavior mirrored directly; in the other session this behavior is delayed by four seconds. Participants are given a questionnaire on perceived rapport, dominance and user satisfaction. The sessions are also recorded on video. After the experiment we perform both a system evaluation and a user evaluation. For the system evaluation we analyze the generated log files on eye and head behavior using MATLAB [3]. This shows that the head module copies user behavior within 200 ms 88% of the time. Standard deviations are also fairly low, indicating few extremes. The eye module, however, does not perform as well. It copies user behavior within 200 ms only 66% of the time, though within 500 ms 81% of the time. The eyes are closed too often, especially when participants look down. The capabilities of the avatar are somewhat limited. It sometimes loses track of the participant for a moment. The avatar also has a fixed, female appearance, so it does not look much like the participants. It does, however, give a reasonable impression of where a participant is looking. The user evaluation is split into an analysis of the questionnaire and an analysis of the video recordings of the participants.
There are no significant differences between the delay conditions in the questionnaire results. For the video analysis we manually annotate participant speaking time using the annotation tool ELAN [4]. We then have ELAN derive the gaps and overlaps of each session. We look at the percentage of conversation time taken up by speech, gaps and overlap. We also look at the number of utterances per minute and the average duration of speech, gaps and overlap. Turn-taking is considered more efficient when there are fewer gaps and less overlap. The video analysis shows no significant differences between the delay conditions. In conclusion, we found no differences between the delay conditions with this avatar. This could be because there simply is no effect of gaze behavior in avatar-mediated conversation. It could also be due to the limited capabilities of the avatar and its lack of likeness to the participants. Future work will have to investigate this further.
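
The turn-taking measures described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, where ELAN derives gaps and overlaps from the manual annotations. The function names and the interval format (start and end times in seconds) are hypothetical. The sketch also assumes that "speech" means time in which exactly one participant speaks, "overlap" means time in which both speak, and "gap" means time in which neither speaks, so that the three shares add up to 100%.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s); hypothetical annotation format


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse those that touch or overlap."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total(intervals: List[Interval]) -> float:
    """Summed duration of a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two merged, sorted interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def turn_taking_shares(speaker_a: List[Interval],
                       speaker_b: List[Interval],
                       session_length: float) -> Dict[str, float]:
    """Percentage of session time spent in speech, overlap and gap.

    Assumption: 'speech' counts time with exactly one speaker.
    """
    a, b = merge(speaker_a), merge(speaker_b)
    overlap = total(intersect(a, b))
    any_speech = total(merge(a + b))
    gap = session_length - any_speech
    return {
        "speech": 100.0 * (any_speech - overlap) / session_length,
        "overlap": 100.0 * overlap / session_length,
        "gap": 100.0 * gap / session_length,
    }


if __name__ == "__main__":
    # Made-up example intervals, not data from the experiment.
    a = [(0.0, 4.0), (10.0, 12.0)]
    b = [(3.0, 6.0), (8.0, 10.0)]
    print(turn_taking_shares(a, b, session_length=20.0))
```

Under these assumptions, a session with lower gap and overlap shares would count as more efficient turn-taking in the sense used above.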
