1 Introduction

Psychiatry is a unique medical specialty. While other physicians can use objective measures to diagnose a disease (e.g., body temperature, blood pressure, auscultation), psychiatric diagnosis relies only on the face-to-face clinical interview [1, 2]. Two major skills therefore need to be acquired by future physicians, and psychiatrists in particular: clinical reasoning, together with knowledge of symptomatology, in order to reach an accurate diagnosis; and clinical empathy [3], in order to conduct the interview in an engaging way so that the patient is willing to disclose their symptoms. Historically, as opposed to sympathy, which corresponds to “an affective state matching the state we observe, such as sharing fear”, empathy can be defined as “a deliberate intellectual effort to get ‘inside’ the other, for a better understanding” [4]. In healthcare, empathy has been shown to lower patients’ anxiety and encourage the expression of symptoms [1].

In France, the Haute Autorité de Santé (National Authority for Health) recommends “never the first time with a patient”. To follow this recommendation, new techniques are emerging to simulate healthcare situations (e.g., actors, manikins, role playing), but they remain practically unused in French medical schools, notably in psychiatry [5].

In recent decades, embodied conversational agents (ECAs) have attracted increasing interest in several areas and show great potential for medical education, where they can play the role of virtual patients (VPs) [6].

In the last 5 years, our research team has developed and validated several ECAs used as virtual doctors to detect sleepiness [7], tobacco and alcohol addiction [8] and major depressive disorder [9]. Building on this work, and in collaboration with Bordeaux Medical School, we designed another ECA simulating a patient, in order to train medical students to conduct a psychiatric interview. We developed a VP suffering from major depressive disorder and used it to train students’ abilities to extract symptomatology and communicate with empathy.

In this study we present theories regarding mental health disorders and psychiatric interview characteristics in order to propose some guidelines for the development of a new virtual patient. We applied these guidelines in the design of a virtual patient suffering from depression and tested it with 35 medical students.

2 Related work

2.1 Major depressive disorder signs and symptoms

Clinical diagnosis relies on both signs (i.e., externally observable phenomena, or expressions) and symptoms (i.e., the patient’s subjective complaints, or experiences) [10]. In most medical specialties, physicians use tools to measure signs (e.g., blood pressure monitor, medical imaging) and clinical interviews to collect subjective symptoms. In the field of mental disorders, however, the vast majority of the signs investigated, such as body movements, mood and discourse [11], are expressed as patients progressively disclose their symptoms.

According to the Diagnostic and Statistical Manual of Mental Disorders (DSM-5 [12]), the international standard for mental disorder diagnosis, a major depressive disorder (MDD) is diagnosed if at least 5 of the following symptoms have been present for at least 2 weeks:

  A. Depressed mood or irritable mood

  B. Diminished interest or loss of pleasure in almost all activities (anhedonia)

  C. Significant weight change or appetite disturbance

  D. Sleep disturbance (insomnia or hypersomnia)

  E. Psychomotor agitation or retardation

  F. Fatigue or loss of energy

  G. Feelings of worthlessness

  H. Diminished ability to think or concentrate; indecisiveness

  I. Recurrent thoughts of death, recurrent suicidal ideation without a specific plan, or suicide attempt or specific plan for committing suicide

Other authors [11, 13] describe MDD symptoms along three dimensions:

  • Affectivity: including depressive ideas, irritable mood and suicidal thoughts

  • Psychomotricity: including reduced cognitive and intellectual abilities, locomotion retardation and lack of expressiveness

  • Physiological functioning: including sleep complaints, appetite disruption and sexuality disturbance

Again, we can see that the psychiatrist needs to collect and disentangle both observable signs and subjective symptoms during the psychiatric interview in order to diagnose MDD [1, 2].

2.2 Characteristics of psychiatric interviews

As a result of these intertwining signs and symptoms, the psychiatric interview should be conversational, contextually-adapted and empathic [10].

Notably, Shea [1] suggests that a first psychiatric interview should last at least 30 min and should follow three main phases:

  1. The introduction and beginning of the interview, which aims to lower the patient’s anxiety about coming to see a psychiatrist and to set out the objectives of the interview

  2. The main part of the interview, during which the objective is to help the patient express their symptoms by guiding them through the different dimensions of depressive disorders

  3. The ending of the interview, where the psychiatrist presents the diagnosed disorder and proposes an adapted solution, while taking into account the patient’s feelings and representations and giving them hope about future recovery

Each of these three phases is therefore built upon two complementary skills. The first is the physician’s ability to extract symptomatology, i.e., “the clinical evaluation of signs and symptoms, leading to the identification of a psychiatric disorder” [12, 14], based on the patient’s answers and behaviors during the interview. Just as importantly, the physician needs to establish empathic communication with the patient in order to facilitate symptomatology extraction. Empathy is arguably the most important psychosocial characteristic of a physician engaged in patient care [15], as it helps build patient trust [16], increases patient satisfaction and compliance, improves medical care outcomes and may reduce medical malpractice lawsuits [17]. However, there is still a need to train and evaluate empathic communication skills in the medical field [18, 19].

2.3 Medical education

Until now in France (a major reorganization of medical education will take place from September 2020), medical school has started directly after high school and has been divided into three cycles. The first cycle, or premedical school (3 years), is purely theoretical and includes a very selective examination at the end of the first year. The second cycle (3 years) is still mainly theoretical, but medical students, often called “externs” or “hospital students”, generally spend five mornings per week in several specialty departments, under the responsibility of a senior physician, to observe and learn how to recognize the various signs of a disease. At the end of the 6th year, medical students must pass a “classifying national examination” that tests their medical knowledge and determines their specialization based on their rank. It is only during the third cycle, or internship (lasting 3–6 years depending on the medical specialty), that “interns” can manage patients and prescribe drugs, still under the supervision of senior physicians.

Experts from around the world agree that medical education still consists principally of passive learning through classroom-based lectures and observation [20], and that new tools are thus needed to provide future physicians with more active, practical and experiential training [21]. Notably, new techniques are emerging to train students’ empathic skills [22], mainly through role play with standardized patients (i.e., actors trained to act as patients). However, even though these initiatives have been effective in improving medical students’ empathy, they are sometimes not feasible given the schedules and resources needed to train and employ actors. Additionally, regarding assessment, current methods in the 1st- and 6th-year examinations rely mainly on multiple-choice questions (MCQs), which may not reflect students’ ability to conduct an empathic interview with a patient. It is therefore now recommended to focus more on assessing medical students’ skills and competencies rather than just their theoretical knowledge. Nonetheless, these tools need to remain standardized and common among medical schools [20].

2.4 Virtual patients in medical education

Virtual patients have demonstrated their potential for training communication skills in the education of medical students [6]. Notably, in France, Ochs and colleagues [23] designed a virtual patient to train doctors to break bad news in an empathic manner. Kenny and colleagues [24] designed a VP suffering from Post-Traumatic Stress Disorder, and noted that students were able to conduct a clinical interview appropriately and diagnose this disorder based on the VP simulation. The results of the Kleinsmith study [25] indicate that students gave significantly more empathetic responses to virtual patients than to standardized patients. In addition, the study conducted by Foster and colleagues [26] showed that empathic skills learnt by interacting with a VP were later effectively applied when interacting with real patients.

Taken together, these studies highlight the numerous benefits VPs can offer compared to non-virtual pedagogic tools, such as:

  • Enabling the simulation of complex, variable or not frequently encountered medical situations

  • Allowing a safe and repetitive practice

  • Facilitating evaluation through standardized content and automatic data gathering

  • Offering the possibility to pause the scene, re-do actions, or display real-time cues and feedback

  • Allowing students to experiment and see the consequences of their decisions

  • Providing skills that are applicable in real life

However, these studies only simulated a short and non-contextual interview, and none of them focused on depressive disorders. This leaves considerable room for technology to provide new approaches to teaching and evaluation.

3 Design guidelines of virtual patients for psychiatric interviews

Based on this educational and medical context, we propose four design guidelines for the conception of virtual patients to train and evaluate medical students. These guidelines are then presented in the form of a checklist to facilitate the design of virtual patients for pedagogy in medicine (Table 1).

Table 1 Checklist for the design of virtual patients for psychiatric interviews

Guideline 1: Simulate a realistic interview. The simulated interview needs to engage sensory, emotional and episodic memory in order to improve recall. The interview should therefore be as realistic as possible in terms of social presence (i.e., the feeling of being there with a “real” person [27]), structure and duration.

Guideline 2: Simulate a realistic symptomatology. Mental disorders are complex diseases in which observable signs and subjective symptoms are both revealed during the psychiatric interview. Particular effort should be made to simulate a patient showing affective, psychomotor and physiological symptoms, following the symptomatology description.

Guideline 3: Focus on abilities needed in psychiatric interviews. As previously mentioned, two main skills are needed in a psychiatric interview: empathic and engaging communication, to encourage patient disclosure, and symptomatology extraction, to perform accurate clinical reasoning and diagnosis. These two skills should therefore be emphasized in the interaction scenario and training tools.

Guideline 4: Follow assessment standards but provide personalized training tools. In France, medical examinations are based on MCQs whose scores and errors determine students’ classification and specialization. Virtual patients should therefore follow the same question format (i.e., MCQs) to prepare students for examination. Nonetheless, for training, it is recommended to show feedback in order to help students understand their errors and improve their competencies in the future.

4 Applications of these guidelines: our virtual patient with depressive symptoms

Following our guidelines, we designed a virtual patient suffering from major depressive disorder (MDD), intended for 4th-year medical students. We present the choices we made to design a realistic interview, the technique we used to simulate a realistic patient, how we included empathic communication and symptomatology extraction in the interaction scenario, and how we designed the assessment tools. Lastly, we present the overall architecture of our VP, which builds the interaction situation.

4.1 The interview (Guideline 1)

In order to provide realistic interview conditions, the VP was displayed at life size on a TV screen (Samsung Full HD, 55 inches). The virtual environment was a consultation room, and the patient was seated in front of the participant, shown in a first-person view (Fig. 1). Users interacted through voice, and we used the Microsoft Speech service for voice recognition. The interaction scenario was predetermined, following Shea’s [1] three phases of a psychiatric interview, and offered several options leading to a single endpoint (also known as a linear string-of-pearls narrative [28]), using a decision-tree architecture. The interaction lasted about as long as a real interview (around 35 min). A webcam recorded students during the entire interview.
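As an illustration only, a linear string-of-pearls scenario can be sketched as an ordered list of steps ("pearls"), each offering a few sentences that all lead to the same next step. The sketch below is written in Python for brevity and is not the authors' implementation; the names `Option`, `Pearl`, `play` and the `preferred` flag are hypothetical, and the VP replies are placeholders. Only the two sleep questions are taken from the examples given in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    sentence: str    # what the student can say to the virtual patient
    vp_reply: str    # placeholder standing in for the pre-recorded VP answer
    preferred: bool  # hypothetical flag: the sentence that would be scored as right


@dataclass(frozen=True)
class Pearl:
    phase: str       # one of the interview phases, e.g. "main part"
    options: tuple   # the sentences offered at this step


def play(pearls, picks):
    """Walk the pearls in order; whatever is picked, the next pearl is the same."""
    if len(picks) != len(pearls):
        raise ValueError("one pick is needed per pearl")
    log = []
    for pearl, pick in zip(pearls, picks):
        option = pearl.options[pick]
        log.append((pearl.phase, option.sentence, option.vp_reply, option.preferred))
    return log


# Hypothetical two-step scenario; angle-bracket strings are placeholders.
SCENARIO = (
    Pearl("main part", (
        Option("Now, could you describe your sleep?", "<VP describes her sleep>", True),
        Option("Do you sleep well?", "<VP says she is a bit lost>", False),
    )),
    Pearl("ending", (
        Option("<closing sentence A>", "<VP reply A>", True),
        Option("<closing sentence B>", "<VP reply B>", False),
    )),
)

best = play(SCENARIO, [0, 0])
worst = play(SCENARIO, [1, 1])
```

Both runs visit the same phases in the same order and stop at the same endpoint; only the sentences, replies and recorded flags differ, which is what keeps the content standardized across students.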

Fig. 1 Examples of questions appearing during the interaction with the Virtual Patient. On the left, an example of MCQs for symptomatology extraction. The picture also displays the interaction situation: TV screen, first-person view, vocal interface. On the right, an example of a two-choice question for empathic communication evaluation

4.2 Simulation of signs and symptoms of depression (Guideline 2)

To provide users with a realistic simulation of MDD symptomatology, the scenario was written by experienced psychiatrists so as to cover all the symptoms of MDD described in psychiatric theories [11, 12, 13]. We used motion capture technology (Optitrack for body tracking and Dynamixyz for face tracking) and worked with an actress (who was also a psychologist with long experience of depressive patients) to display both verbal and nonverbal (prosody, gestures and general appearance) symptoms of MDD. The captured animations were applied, using Autodesk MotionBuilder software, to a 3D model of a virtual woman rigged with bones and with facial blend shapes for facial expressions. The animated model was then displayed using Unity 3D software (Unity-Technologies) in order to be used in the intended scenario.

4.3 Training of psychiatric skills (Guideline 3)

Examples of empathic communication and symptomatology-extraction questions are presented in Table 2 and Fig. 1. To emphasize empathic communication and information-seeking strategies, throughout the interview the participant had to choose, between two sentences, the one that seemed most appropriate for conducting the interview. Questions were written by two experienced psychiatrists (who also teach in the psychiatry department of Bordeaux Medical School) and were based on simple and consensual rules in the field of psychiatric interviews [1, 10]:

  • Avoid negative judgments (e.g., “you are not trying hard enough”)

  • Prefer open questions (e.g., “Now, could you describe your sleep?” rather than “Do you sleep well?”)

  • Avoid multiple questions (e.g., “Do you have allergies, a medical history, and do you take medication?”)

  • Prefer reformulation (e.g., “You told me that you feel like having a knot in your stomach, can you tell me more?”)

Table 2 Example of questions asked to the participant during the interview with the VP

As this was a training tool for 4th-year students, who have little or no practical knowledge, the empathic communication questions were deliberately stereotypical and mirrored common mistakes made by students.

Regarding symptomatology extraction, during the main part of the interview, several lists of symptoms and signs were repeatedly presented to the student, in an MCQ format, and the student had to select the one(s) shown by the VP in her previous intervention. Signs and symptoms were based on psychiatric reference works [11, 12, 13].

4.4 Students’ evaluation and personalized feedback (Guideline 4)

Mirroring classical evaluation tools in medical examinations, symptomatology extraction was evaluated with 13 MCQs, each listing 5 items corresponding to depressive symptoms, from which students had to select the one(s) observed during the interview. The numbers of right answers and errors were recorded, and participants received a score ranging from 0 to 20 (calculated from the raw score ranging from 0 to 65, corresponding to the total number of right answers). Once users had validated their answers, the system gave corrections, highlighting correct and incorrect answers.

Evaluation of empathic communication was based on 32 two-choice questions, where students had to choose the right answer favoring empathy and patient disclosure. The numbers of right answers and errors were recorded, and participants received a score ranging from 0 to 20 (calculated from the raw score ranging from 0 to 32, corresponding to the total number of right answers). When the student picked the wrong sentence, the VP answered that she did not understand or was a bit lost, and the correct answer was then given, so that students could understand the consequences of their choice.

Note that we gave students a mark out of 20 because it is the common metric in French education, with higher marks corresponding to better performance.
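The conversion from raw scores to marks out of 20 can be sketched as a simple linear rescaling. The snippet below is a minimal illustration in Python, not the authors' code. It assumes that each of the 5 items of an MCQ counts as one right answer when it is correctly selected or correctly left unselected (which is consistent with 13 × 5 = 65 but is not stated explicitly), and that no rounding is applied; `count_right_items` and `rescale_to_20` are hypothetical names.

```python
def count_right_items(selected, expected, n_items=5):
    """Count items handled correctly in one MCQ (assumed per-item marking).

    An item is right when it is ticked and expected, or unticked and not expected.
    """
    return sum((i in selected) == (i in expected) for i in range(n_items))


def rescale_to_20(raw_right, raw_max):
    """Linearly map a raw number of right answers onto a mark out of 20."""
    if not 0 <= raw_right <= raw_max:
        raise ValueError("raw score must lie between 0 and raw_max")
    return 20 * raw_right / raw_max


SYMPTOM_RAW_MAX = 13 * 5   # 13 MCQs with 5 items each, raw score 0 to 65
EMPATHY_RAW_MAX = 32       # 32 two-choice questions, raw score 0 to 32

# Student ticks items 0 and 2, the expected items were 0 and 3:
# items 0, 1 and 4 are right, items 2 and 3 are errors.
right_in_one_mcq = count_right_items({0, 2}, {0, 3})

full_symptom_mark = rescale_to_20(65, SYMPTOM_RAW_MAX)
half_empathy_mark = rescale_to_20(16, EMPATHY_RAW_MAX)
```

Under these assumptions a student with 16 right answers out of 32 empathic communication questions would receive 10 out of 20, the usual pass mark.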

After the interaction with the VP, a semi-structured debriefing was conducted by a psychiatrist, in order to go through students’ answers and errors committed, as well as to assess their attitudes toward the agent.

4.5 The virtual patient architecture

The system was implemented in C# in Unity 3D software (Unity-Technologies) to provide a robust and generic architecture that can be run on a PC, a tablet, a virtual reality headset, or in an immersive room (e.g., CAVE™).

The main functionalities of this architecture are the following (Fig. 2):

  • a scenario manager based on decision trees

  • a display manager that automatically plays the voice and animations of the VP

  • an interaction manager that handles speech recognition and the graphical interface

  • a statistics manager that gathers the scores, errors and interaction duration

  • a debriefing manager that shows errors and replays the VP animations with a teacher in order to improve students’ skills

Fig. 2 The different modules composing the VP software. The software is composed of a scenario manager, a display manager, an interaction manager, a statistics manager and a debriefing manager
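To make the division of responsibilities more concrete, the sketch below shows how a statistics manager and a debriefing manager might cooperate. It is written in Python for brevity, whereas the actual system is implemented in C# in Unity; every class, method and identifier here (`StatisticsManager`, `DebriefingManager`, `record`, the question ids) is a hypothetical illustration rather than the authors' API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StatisticsManager:
    """Gathers right answers, errors and the interaction duration."""
    answers: list = field(default_factory=list)
    started_at: float = field(default_factory=time.monotonic)

    def record(self, question_id, kind, is_right):
        # kind is a free label such as "empathy" or "symptoms"
        self.answers.append({"id": question_id, "kind": kind, "right": is_right})

    def right_count(self, kind):
        return sum(a["right"] for a in self.answers if a["kind"] == kind)

    def errors(self):
        return [a for a in self.answers if not a["right"]]

    def duration_seconds(self):
        return time.monotonic() - self.started_at


class DebriefingManager:
    """Lists the questions a teacher would go through again with the student."""

    def __init__(self, stats):
        self.stats = stats

    def items_to_review(self):
        return [a["id"] for a in self.stats.errors()]


# Hypothetical session with made-up question ids.
stats = StatisticsManager()
stats.record("E01", "empathy", True)
stats.record("S01", "symptoms", False)
review = DebriefingManager(stats).items_to_review()
```

Keeping the recording of answers separate from the debriefing view means the same log could feed both the automatic score and the teacher-led replay.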

5 Experimental study

In order to test our VP with real users, we recruited thirty-five fourth-year medical students to interact with it.

Participants were recruited from Bordeaux University Hospital (France), were 22 years old on average, and about half of them (N = 17) were male. Among them, 15 were trainees in the psychiatry department (and had thus already observed psychiatric interviews) and 20 were trainees in the neurology department (and had therefore never experienced a psychiatric interview before).

Scores and errors on empathic communication questions and symptomatology extraction questions were collected, and students’ answers during debriefing sessions were transcribed and analyzed afterwards by the experimenters.

This project is part of a larger project on virtual reality and clinical phenotyping (PHENOVIRT) that has been approved in compliance with French and European regulations on clinical research by a local ethics committee (Comité pour la Protection des Personnes – Institutional Review Board of University of Bordeaux). All participants gave their written informed consent before entering the study. Note that this study was only exploratory and had no impact on the validation of students’ exams.

5.1 Symptomatology extraction and empathic communication: scores and errors

Scores and errors are presented using means (M), standard deviations (SD), minimum and maximum values. All statistical analyses were performed using SPSS software (version 18, PASW Statistics).
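For readers without SPSS, the same four descriptive statistics can be computed with the Python standard library, as in the minimal sketch below. The score values are made-up placeholders used only to show the calculation and are not data from this study; `describe` is a hypothetical helper, and the standard deviation is the sample standard deviation (n − 1 denominator).

```python
import statistics


def describe(values):
    """Return the mean (M), sample standard deviation (SD), minimum and maximum."""
    return {
        "M": statistics.mean(values),
        "SD": statistics.stdev(values),  # sample SD, n - 1 in the denominator
        "min": min(values),
        "max": max(values),
    }


# Placeholder marks out of 20, NOT taken from the study.
placeholder_scores = [12.0, 15.5, 18.0, 19.5]
summary = describe(placeholder_scores)
```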

Overall, students obtained very good total scores and made few errors (Table 3).

Table 3 Descriptive statistics of scores and errors in empathic communication questions and symptomatology extraction MCQs for all students

All students obtained a total score over 10 (the minimum mark in France to pass an exam), and 3 students obtained a total score over 19 out of 20. Regarding empathic communication questions, 13 students obtained a score over 19, with 3 students reaching the maximum score of 20. Scores for symptomatology extraction were lower, with only three students reaching a score over 19 and none obtaining the maximum score.

5.2 Qualitative evaluation of the VP by the students

Generally, feedback given by the students after the interaction with the VP was very positive. Three main advantages were highlighted:

  • Pedagogic usefulness. Many students mentioned the benefits of the VP for learning, as it “presents a good panel of symptoms” (P1), “uses the actual terms of the psychiatry manual” (P22) and “enables us to test [our] knowledge” (P21). They also drew attention to the additional communication skills learned during the interaction with the VP, such as “learn how to conduct an interview” (P6) and “[understand] which questions to ask a patient” (P21). Moreover, they stressed the advantages of using digital solutions: “they give you ready access to patients” (P4), “we cannot do clinical observation in every domain” (P24) and “they could be used at home to prepare for an exam” (P3).

  • User experience. Students expressed positive feelings regarding their interaction with the device, in terms of ease of use (“not too difficult” (P6, P7)), time consumption (“not too long” (P1, P3, P6, P17), “half an hour, it’s OK […], it is the same duration as a real interview” (P7)) and enjoyment (e.g., “awesome” (P4), “funny” (P8, P17), “cool” (P14, P24), “interesting” (P7, P18, P20, P21), “unexpected” (P12), “I wasn’t expecting it to be that good!” (P16), “the patient is truly endearing!” (P3)).

  • Realism of interaction. Several participants mentioned the realism of the VP in terms of “gestures” (P4), “sight” (P12), and “voice” (P24). They found the interaction to be “immersive” (P25) and “credible” (P5): “[depressive patients] are exactly like that!” (P16), “feels like conducting a real interview” (P11). One student who had done an observation placement in the psychiatry department even said “I saw some real depressive patients, and they talk just like that. And psychiatrists ask the exact same questions!” (P15).

They also pointed out some limitations. Notably, many found that the empathic questions (two-choice questions) were too easy: “I felt like it was too obvious” (P24), “we understood quickly which question to choose” (P20), “two choices is too easy” (P16), and “too repetitive” (P17), “all the time same type of questions” (P15). However, as P26 said: “the questioning was a bit obvious, but not when we moved to the symptomatology questions…”. Indeed, some complained about the difficulty of the questions listing psychiatric signs, as some terms (e.g., abulia, apragmatism, bradypsychia) can be complex and very specific: “hard to remember it all” (P25), “I did not know all the symptomatologic terms” (P20). Additionally, not all students had the same theoretical background regarding these terms: “we have not learned about it yet” (P17), “we just started to see it in lectures” (P14).

Finally, two students offered ideas for future work: “It would be nice to have it for other disorders” (P22), and “it would be fun to do it with somebody undergoing a manic episode or something like that!” (P9).

6 Discussion

The objective of this work was to identify guidelines, based on psychiatric theories, to design and test a virtual patient that trains students’ abilities to conduct a psychiatric interview. Our study suggests that the proposed design guidelines were applied effectively and identifies areas for improvement.

The students managed to interact appropriately with the system: overall, they obtained good scores and made few errors. This supports the effective computerization of the examination tools we currently use (Guideline 4), which are mainly based on paper-based multiple-choice examinations and human observations [20], making them more time-efficient and standardized for medical education. Interestingly, students made very few errors on empathic communication questions, while they made more errors on symptomatology questions. This suggests that the two types of questions are not equally difficult, with empathic communication questions being easier to answer than symptomatology questions. This explanation is supported by students’ feedback during the debriefing session. One improvement could be to offer several levels of difficulty for empathic communication and symptomatology questions, in order to keep students motivated [29]. The question format (i.e., two choices vs. MCQs) could also have influenced students’ perceived difficulty and performance during the interview. Indeed, for this first prototype, we wanted very stereotypical questions to probe students’ level of communication skills. New versions could increase the realism of empathic communication questions by proposing more than two choices and subtler questions.

During the debriefing sessions, the students gave much positive feedback regarding the usefulness of the VP for pedagogy, such as the possibility of simulating various symptoms and clinical situations, of using it from home, and of learning and remembering better than with theoretical knowledge alone (Guideline 3). Indeed, many studies have shown the added value of virtual reality tools for fostering and improving learning, as they display multimodal stimuli and therefore favor the involvement of additional memory systems, such as emotional, associative and procedural memory [30]. Besides, our generic architecture would make it easy to design new scenarios, using several displays and different VP appearances. Students also shared their positive experience with the tool, seen as “interesting” or “cool”, referring to well-known factors in the Human–Computer Interaction literature (i.e., usefulness, ease of use, enjoyment), which suggests good acceptance of the system by its users [31, 32].

Lastly, the realism and credibility of the VP were highlighted by the students, and one trainee in the psychiatry department even found the same characteristics as in a real psychiatric interview. This feedback corroborates our design guidelines (Guidelines 1 and 2) and suggests an effective application of psychiatric interview recommendations [1, 10], as well as the added value of motion capture technology in providing a realistic interview situation in terms of social presence [27] and symptomatology. It also emphasizes the need for transdisciplinarity between computer scientists and healthcare professionals when designing and evaluating ECAs for medicine, in order to provide a more credible, realistic and user-centered solution.

Overall, this first validation study paves the way for further research. First, we could adapt our VP to more expert participants, for example healthcare professionals or interns, and develop a VP showing subtler symptoms and a more complex scenario, in order to train clinical reasoning skills and differential diagnosis (distinguishing one disorder from another), which is closer to a real psychiatric interview. Second, further studies could aim to demonstrate the validity of VPs for students’ training and evaluation. For example, by following a longitudinal approach, one could measure students’ improvement when training with VPs, and their ability to transfer their skills from virtual reality to real situations with standardized patients (as in [26]) or with real patients, compared to classical medical training. Additionally, analyses could compare assessment with VPs against other assessment tools in order to ensure its accuracy. Conclusions from such studies would validate the use of VPs as an additional assessment tool, which could provide a more practical evaluation of students’ medical skills and could be applied, for example, during the classifying national examination.

7 Conclusion

To conclude, this study proposed guidelines for the conception of a virtual patient (VP) for training in psychiatric interviews, based on psychiatric and medical education theories. This theoretical background enabled us to develop a realistic tool for training students to conduct a psychiatric interview. User testing with medical students showed encouraging results, supporting our guidelines and suggesting areas for improvement. Our guidelines would enable the ECA community to design more consistent VPs in the context of psychiatric interviews. These VPs would provide new training and assessment modalities in medical education, and could be intended for students but also for healthcare professionals (nurses, homecare providers, general practitioners), with the aim of improving the diagnosis of mental health disorders and the healthcare delivery model.