1 Introduction

In computer science, centaur has come to refer to human-computer teams that collaborate on tasks and can often outperform either humans or computers alone. This approach has been used successfully in domains such as chess [1] and protein folding with the FoldIt system [2]. In the teamed view of centaur, humans and computers remain distinct but join together to solve problems. In Greek mythology, however, a centaur is not a team but a single, distinct entity: a hybrid of man and horse that blends characteristics of both. From that perspective, we could broaden the notion of centaur in computer science beyond the nature of human-computer teams and confront what it might be like to create hybrid entities that combine aspects of both people and computers.

2 Virtual Humans as Centaurs

At the USC Institute for Creative Technologies, our work on virtual humans [3, 4] may suggest one possible approach to creating such hybrids. Virtual humans are embodied, autonomous computer agents that look and behave as much as possible like real people. They use verbal and non-verbal communication to interact naturally with real people, and they perceive humans using computer vision. They model and exhibit emotions, and they represent their own beliefs, desires, and intentions as well as those of others. Studies have shown that people respond to virtual humans much as they do to real people [5–7]. We have used virtual humans in a variety of roles: as role-players in simulations, as guides in educational settings, and as coaches in medical applications.

3 Simsensei and MultiSense

Recently, we have seen ways in which a virtual human may outperform either real people or inanimate systems alone. Simsensei [8] is a virtual human designed to act like an intake nurse, interviewing patients about PTSD and depression. The Simsensei virtual human, Ellie (shown in Fig. 1), uses language and non-verbal behaviors such as head nods, mirroring gestures, and body posture to engage and build rapport with the patient. Simsensei uses the MultiSense framework [8], which employs machine learning to form hypotheses about the patient's condition by integrating information from multiple data streams, such as the patient's facial expressions, body posture and activity, voice prosody, and speech content.
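To make the fusion step concrete, the following minimal sketch illustrates feature-level multimodal fusion in the spirit of MultiSense. Every name here (Observation, fuse, DistressClassifier) is a hypothetical stand-in; the paper does not specify the framework's actual APIs or models.

```python
# A minimal sketch of feature-level multimodal fusion in the spirit of
# MultiSense. All names here are hypothetical stand-ins, not the
# framework's real API.
from dataclasses import dataclass

@dataclass
class Observation:
    """One time-slice of per-channel perceptual features."""
    facial_expression: list[float]  # e.g., facial action-unit intensities
    body_activity: float            # overall movement energy
    voice_prosody: list[float]      # e.g., pitch, energy, speech rate
    speech_content: list[float]     # e.g., an embedding of the transcript

def fuse(obs: Observation) -> list[float]:
    """Early fusion: concatenate features from all channels."""
    return (obs.facial_expression
            + [obs.body_activity]
            + obs.voice_prosody
            + obs.speech_content)

class DistressClassifier:
    """Placeholder for a learned model that maps fused features to a
    hypothesis about the patient's condition."""
    def predict(self, features: list[float]) -> float:
        # A trained model would score the features here; we return a
        # neutral placeholder so the sketch runs as-is.
        return 0.0

# Usage: fuse one observation and score it.
obs = Observation([0.2, 0.7], 0.4, [180.0, 0.6, 3.1], [0.1, -0.3])
print(DistressClassifier().predict(fuse(obs)))
```

Early (feature-level) fusion is only one design choice; a real system might instead fuse per-channel decisions, or weight channels by their reliability.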

Fig. 1. Virtual interviewer Ellie

The MultiSense dashboard, shown in Fig. 2, displays the framework's inputs, such as body position, facial tracking, and eye gaze, as well as some of its derived outputs, such as the level of body activity and gaze attention.
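The derived outputs can be thought of as simple statistics computed over the raw tracking streams. The sketch below shows one plausible way to compute a body-activity level and a gaze-attention score; the formulas and the 10-degree threshold are illustrative assumptions, not MultiSense's actual algorithms.

```python
# One plausible derivation of the dashboard's "body activity" and
# "gaze attention" readouts from raw tracking streams. The formulas
# and thresholds are illustrative assumptions.
import math

def body_activity(joints: list[tuple[float, float, float]],
                  prev_joints: list[tuple[float, float, float]]) -> float:
    """Mean per-joint displacement between two consecutive frames."""
    return sum(math.dist(a, b) for a, b in zip(joints, prev_joints)) / len(joints)

def gaze_attention(gaze_offsets_deg: list[float],
                   threshold_deg: float = 10.0) -> float:
    """Fraction of recent frames in which the gaze angle stayed within
    threshold_deg of the virtual interviewer."""
    on_target = sum(1 for a in gaze_offsets_deg if abs(a) <= threshold_deg)
    return on_target / len(gaze_offsets_deg)

# Usage with toy data: slight movement, mostly attentive gaze.
print(body_activity([(0, 0, 0), (1, 1, 0)], [(0, 0.1, 0), (1, 1, 0.2)]))
print(gaze_attention([2.0, 5.5, 14.0, 1.2]))
```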

Fig. 2. MultiSense perception dashboard

The hypotheses MultiSense forms about the patient are used to guide the interview and to form an assessment of the patient's condition. For example, if the patient seems disturbed by a particular question (which might be indicated by agitated body movements), Simsensei will ask whether talking about the topic makes the patient uncomfortable. Similarly, if MultiSense detects that a patient seems to be avoiding a question (which might be indicated by gaze avoidance, among other things), Simsensei will probe deeper.
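The paper describes this interview-guiding behavior only at a high level; the sketch below renders it as a simple rule-based policy over MultiSense-style hypotheses. The signal names, thresholds, and utterances are all hypothetical.

```python
# Sketch of the interview-guiding logic described above: perception
# hypotheses trigger follow-up moves. Signal names, thresholds, and
# utterances are illustrative assumptions, not Simsensei's actual policy.

def next_move(hypotheses: dict[str, float]) -> str:
    """Choose a follow-up utterance from the perceived patient state."""
    if hypotheses.get("agitation", 0.0) > 0.7:
        # Agitated body movements suggest the topic is disturbing.
        return "Does talking about this make you uncomfortable?"
    if hypotheses.get("gaze_avoidance", 0.0) > 0.7:
        # Averted gaze suggests the patient is avoiding the question.
        return "Can you tell me more about that?"
    return "I see. Please go on."

# Example: high agitation triggers the comfort check.
print(next_move({"agitation": 0.82, "gaze_avoidance": 0.1}))
```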

4 Results and Conclusions

Early on, we found that subjects reported feeling more comfortable and more willing to disclose sensitive information to Ellie than to a real human because they did not fear being judged. A follow-up study [9] confirmed this effect: people disclose more to a computer-based virtual agent. In another study with veterans returning from Afghanistan [10], we found that soldiers disclosed more symptoms of PTSD and depression to Ellie than they did to a standard paper form, even when the form was anonymous.

Two factors seem to lead to these results. On one hand, people recognize that they are interacting with a computer, which removes the fear of judgment. On the other hand, the virtual human uses gestures and conversation to build rapport and make the subject more comfortable. Thus, the computer and human aspects of a virtual human work together to perform better than either aspect could alone.

Virtual humans may therefore represent a new metaphor for how we interact with computers. At one level, adopting this metaphor means that the human-computer interface disappears, because interacting with a computer becomes much like interacting with another person. At a perhaps deeper level, adopting this metaphor can bring social elements into the interaction, as outlined above, something that has been difficult to achieve with traditional interfaces. Adding social elements makes it possible to create new kinds of applications that address new issues.