University of Twente Student Theses


Artificial Intelligence Conversational Agents : Using Card Sorting to evaluate the Chatbot Usability Scale (BUS-11), and investigate this in relation to Chatbot Experience

Schwemin, L. (2022) Artificial Intelligence Conversational Agents : Using Card Sorting to evaluate the Chatbot Usability Scale (BUS-11), and investigate this in relation to Chatbot Experience.

Abstract: Introduction. Chatbot use is growing rapidly worldwide. Especially in the field of customer service, chatbots are regarded as time- and cost-effective. As chatbots become more accessible for everyday use, it is important to understand the user’s needs in chatbot interactions in order to facilitate user uptake. Measuring chatbot satisfaction is the first step. Because previous satisfaction measurement tools did not capture the complexity of chatbots, the BUS-11 was developed. To further validate the BUS-11, this study investigated the construct and face validity of the scale. It also tested whether previous chatbot experience influences how chatbot satisfaction is perceived. Methods. Twenty-three participants were included in the study. A closed card sorting study was designed to investigate construct and face validity: construct validity was assessed using heatmaps, and face validity using item-factor tables. Additionally, the participants were grouped by chatbot experience level, and heatmaps and item-factor tables were plotted for each group. An ANOVA was performed to investigate whether previous experience affects the number of matches between the card sorting results and the factorial structure. Results. On average, participants assigned the items to the expected factors during the card sorting. This confirmed the transparency of the construct underlying the scale (face validity). Additionally, the BUS-11 displayed good construct validity, as participants grouped the items in accordance with the factorial structure. No significant differences were observed between levels of experience, as the results from each group also mostly confirmed the factorial structure. The ANOVA was not significant, but its statistical power was limited by the small sample size, and the assumption of normality could not be confirmed. The high correlation between factors 2 and 3 found in the original study was also observed here. Discussion. The results indicate that the BUS-11 provides a reasonable estimate of chatbot satisfaction. Chatbot experience did not appear to affect the card sorting results, and construct and face validity were good across all chatbot experience levels. The ANOVA was limited by the sample size but indicated no between-group differences. The participants' mental model seems to fit the factorial structure to a large extent, although the results suggest that factors 2 and 3 might be combined.
Item Type: Essay (Bachelor)
Faculty: BMS: Behavioural, Management and Social Sciences
Subject: 02 science and culture in general, 05 communication studies, 50 technical science in general, 54 computer science, 77 psychology
Programme: Psychology BSc (56604)

