Experiments in How Automated Systems Should Talk to Users
David Williams 1, Christine Cheepen 2 and Nigel Gilbert 2
1 Vocalis Ltd
Great Shelford,
Cambridge
CB2 5LD, UK
cdw021@email.mot.com
2 Department of Sociology
University of Surrey, Guildford GU2 5XH, UK
{christine, nigel}@soc.surrey.ac.uk
ABSTRACT
This paper describes experiments carried out in the domain of automated telephone banking. The results suggest that for this domain the use in system prompts of human-like talk should be avoided.
KEYWORDS:
INTRODUCTION
This paper describes experiments which investigate the notion of naturalness in human-machine spoken dialogues. The paper focuses on the experimental method and results. For a more detailed theoretical background see Williams and Cheepen (1998). The experimental hypothesis is motivated by the widely-held assumption in the commercial sphere that for dialogues to be perceived as 'natural' or 'friendly' by a novice user, the system output (prompts) must contain a wide variety of human-like person-directed tokens, e.g. 'please', 'thanks', 'I', 'your,' etc. This paper proposes that embellishing a dialogue with such tokens will produce no better and possibly worse interaction than a more laconic prompt style. The experiments take a highly goal-directed domain which is typical of current automation targets, i.e. telebanking. A commercially available dialogue provides the dialogue logic and speech recognition performance.
Two prompt sets are compared. The first set (which we call the original set) illustrates the typical, arbitrary use of human-like person-directed tokens in system output. The second set had these tokens stripped out or replaced by material which was not person-directed, in order to produce a 'denatured' prompt set. For example, the original prompt "I'm sorry I didn't understand that" was reduced in the denatured condition to "Not understood". There was no difference in recognition performance or dialogue logic between the two 'systems'. We proposed that there would be no objective or subjective advantage for the original system.
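As an illustration of what 'person-directed tokens' means in practice, the following minimal sketch flags such tokens in a prompt. It is not the procedure used to build the denatured set, which was produced by hand. The token list is an assumption drawn from the examples given above ('please', 'thanks', 'I', 'your'), with the contraction "I'm" added, and the function name is hypothetical.

```python
import re

# Assumed token list: the four examples named in the text plus the
# contraction "i'm". A real inventory would need to be larger.
PERSON_DIRECTED = {"please", "thanks", "i", "i'm", "your"}


def person_directed_tokens(prompt):
    """Return the person-directed tokens found in a prompt, in order."""
    # Lower-case the prompt and split it into words, keeping apostrophes
    # so that contractions such as "I'm" survive as single tokens.
    words = re.findall(r"[a-z']+", prompt.lower())
    return [w for w in words if w in PERSON_DIRECTED]


if __name__ == "__main__":
    for text in ("I'm sorry I didn't understand that", "Not understood"):
        print(text, "->", person_directed_tokens(text))
```

Applied to the two example prompts above, the original version contains person-directed tokens and the denatured version contains none.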
EXPERIMENTAL METHODS
A pilot phase used 12 subjects in a within-subjects design. Ordering effects were countered by organising the subjects into two groups: group one used the denatured system first, then the original; group two used them in the reverse order. Results were anecdotal, with subjects spontaneously referring to a perception of a more direct and rapid interaction with the denatured prompts. Furthermore, they identified this as the main reason for their preference for this version. A second experiment used 22 naive users from the general public. The experiment used a within-subjects design with the original followed by the denatured system. Organisational constraints meant that ordering effects for system type were not addressed; however, the pilot experiment showed that the order of system presentation was not a confounding variable. The dependent variable was transaction time for each of the four tasks (Bill Pay, Statement, Balance, Transfer Funds), measured from option selection to the end of the last task-related prompt. In addition, subjects completed a short evaluation questionnaire for each condition. On completing both conditions, subjects were simply asked which of the two systems was quicker and which they preferred.
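The counterbalancing used in the pilot and the per-task comparison of transaction times can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually run: the function names are hypothetical, alternating assignment is only one way to form the two groups, and the source does not name a statistical test, so only the mean paired difference is computed.

```python
from statistics import mean

# The two presentation orders used in the pilot phase.
ORDERS = (("denatured", "original"), ("original", "denatured"))


def assign_orders(subject_ids):
    """Alternate subjects between the two presentation orders (assumed scheme)."""
    return {sid: ORDERS[i % 2] for i, sid in enumerate(subject_ids)}


def mean_paired_difference(original_times, denatured_times):
    """Mean per-subject difference (original minus denatured) for one task.

    Both lists hold transaction times in seconds, one entry per subject,
    in the same subject order. A positive result means the denatured
    prompts were faster on average.
    """
    if len(original_times) != len(denatured_times):
        raise ValueError("each subject needs a time for both systems")
    return mean(o - d for o, d in zip(original_times, denatured_times))


if __name__ == "__main__":
    groups = assign_orders(range(1, 13))  # 12 pilot subjects
    for subject, order in groups.items():
        print(subject, "->", " then ".join(order))
```

Because every subject uses both systems, the comparison is made on paired times for each of the four tasks separately rather than on pooled means.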
DISCUSSION: TRANSACTION TIMES AND USER PREFERENCE
Our hypothesis, based on our findings in the pilot experiment, was that for the highly goal-directed domain of telebanking, the denatured prompts would perform as well overall (in terms of usability) as the human-like, supposedly 'natural' prompts. The analysis of transaction times shows no significant overall advantage for the denatured prompts: they produced a significantly shorter transaction time for only one task. However, they clearly performed as well as the original prompts.
For user preference, the hypothesis suggests that the denatured prompts will be preferred by users, or at least be on a par with the original prompts. The subjective results indicate a clear preference for the denatured system. Examples of subject comments on the denatured system corroborate this, e.g. "Much clearer", "Seemed easier", "No fancy language and faster". However, this only makes a good argument for proposing shorter prompts in a highly goal-directed and business-oriented domain. Further work must be done in interpersonal domains, e.g. leisure services.
ACKNOWLEDGMENTS
This work arises from an ESRC-funded research project, 'Design guidelines for advanced voice dialogues', under the Cognitive Engineering Programme, project no. L127251012.
