1 Introduction and Motivation

More than 20 years ago, the World Health Organization classified obesity as an epidemic and a global problem [1]. Since then, prevalence rates have steadily increased, making obesity and associated secondary diseases (e.g., cardiovascular diseases or diabetes) a significant burden for national healthcare systems [2]. Sustainable lifestyle modifications with changes in nutrition and exercise behavior are mandatory for long-term treatment success. However, weight loss programs often miss their goal of long-term weight reduction due to a lack of adherence and the resulting relapses [3]. Current research focuses on embodied conversational agents (ECAs) as virtual coaches (VCs), i.e., intelligent software systems with an animated avatar that support patients in their daily life and promise to improve therapy adherence [4]. In particular, the COVID-19 pandemic, with its restrictions on social contact, highlights the need for ECAs to support traditional healthcare approaches. However, there is a lack of digital health applications that utilize avatar and conversational agent (CA) technologies to address the needs of obese and especially morbidly obese patients. Previous research investigated the use of an ECA as unimodal support (i.e., promoting physical activity) for overweight patients (BMI of 30 kg/m²) [5] and the use of a disembodied chatbot [6]. However, an avatar-based VC supporting the recommended multimodal treatment of morbid obesity (BMI > 40 kg/m²), consisting of diet, exercise and behavioral therapy, is not yet available. Therefore, we introduce an avatar-based VC prototype for morbidly obese patients. The remainder of this paper describes the design approach, provides a system overview, explains the components in detail and reports the results of an evaluation of the prototype’s potential usability from a patient’s perspective.

2 Design of the Artifact

2.1 Approach and Design Process

For the design of socio-technical systems that target behavior change, findings from health psychology are especially well suited to inform design decisions. Therefore, our prototype design is informed by the Behavior Change Technique Taxonomy by Michie et al. [7]. A literature review by Asbjørnsen et al. [8] revealed that, with respect to weight loss maintenance, self-monitoring, feedback, goal-setting, shaping knowledge and social support were significant behavior change techniques applied in existing e-Health applications. We used these findings as a first set of meta-requirements (MRs) to prepare interviews with health professionals and to construct questionnaire items for a patient survey. Three guided interviews with a psychologist, a dietary assistant and a physician were conducted to identify and refine requirements. The interview questions focused on possible use cases, frequently asked patient questions and examples of shared therapy goals. In addition, two patient surveys were conducted. The first survey (n = 27) focused on self-monitoring using sensor integration and self-reporting scales. The second survey (n = 33) assessed how helpful patients perceived goal-setting features, personalized feedback, reminders and the option to ask the coach various therapy-related questions; these ratings were used to prioritize the requirements. The two surveys were independent of each other and were conducted with patients from a non-surgical obesity treatment program. The program participants have a mean age of 46 years and a mean BMI of 49.5 kg/m² [3]. As our surveys focused solely on opinions, formal medical ethical approval was not required. All patients provided their consent. The highly prioritized requirements were then instantiated in a first prototype. Figure 1 depicts the elicited core set of MRs and the derived design principles (DPs), based on the structure proposed by Gregor et al. [9].

Fig. 1. Core set of meta-requirements and derived design principles

2.2 System Overview

Our implementation of the virtual obesity coach extends and adapts a web-based patient portal for patients with multiple sclerosis [10]. In the following, the main system components (see Fig. 2) are briefly described.

Animation Rendering: This component represents the avatar of the coach, rendered via WebGL (footnote 1) in the browser. We deliberately chose a cartoon-style avatar similar to “Disney Pixar” characters or Apple “Memojis”, as previous research has shown that this might help avoid uncanny feelings caused by the imperfections of overly photorealistic representations [11]. Regarding gender, we chose a female avatar for this research. The avatar model was designed with the software “Reallusion Character Creator” (footnote 2) and animated and exported (WebGL build) with the game engine “Unity” (footnote 3). To animate the avatar, we used animations such as idle, greeting or pointing gestures from the library “mixamo.com”. Further, we used a Unity package (footnote 4) that provides a lip sync approximation and automated animation of the eyes, eyelids and head. The speech data is delivered from the backend, and the lip sync is generated in real time. To realize the multimodal behavior of the avatar (e.g., talking while pointing at the weight curve), we build on the Behavior Markup Language (BML), which aims to standardize the behavioral control of ECAs. To this end, we reused a BML realizer provided by the Horizon 2020 project “RAGE” (footnote 5).

Fig. 2. The system architecture of the virtual coach

The Self-Monitoring component provides the data needed to generate coaching messages and gives the patient an overview of health- and lifestyle-related data in a chart. Our requirements analysis revealed that caloric intake, fluid intake, daily steps, sleep and stress are particularly important for therapy. Weight, step and sleep data are delivered by wearables and a Bluetooth body scale and queried from the Google fitness store. Nutrition and stress data must be entered manually by the patients using forms or questionnaires. On the backend side, several manager components control data provision and integration. To avoid intervention fatigue, patients are prompted by the system only twice a week (on randomly selected days) to log their diet.
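
The twice-weekly prompting on randomly selected days could be sketched as follows. This is a minimal illustration only: the function name `pick_prompt_days`, its parameters and the choice of Monday as the start of the week are assumptions and are not taken from the prototype.

```python
import random
from datetime import date, timedelta
from typing import List, Optional


def pick_prompt_days(week_start: date,
                     prompts_per_week: int = 2,
                     rng: Optional[random.Random] = None) -> List[date]:
    """Pick distinct, randomly chosen days within the 7-day week
    beginning at `week_start` (hypothetical helper, not the authors' code)."""
    rng = rng or random.Random()
    # Sample day offsets 0..6 without replacement so no day is chosen twice.
    offsets = rng.sample(range(7), k=prompts_per_week)
    return sorted(week_start + timedelta(days=offset) for offset in offsets)


# Usage: draw the diet-log prompt days for the week starting Monday, 1 March 2021.
prompt_days = pick_prompt_days(date(2021, 3, 1))
```

Sampling without replacement keeps the two prompts on different days, and drawing afresh each week avoids a fixed, predictable rhythm.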

Goal-Setting: Patients can actively set goals (steps, calories, weight, time of lapses) that were agreed upon with the therapist. Several time periods, as well as details on how to achieve the goals, are adjustable (e.g., to increase daily steps, the patient is instructed to get off the tram one stop earlier on the way to work). The specified goals also feed the coaching engine for message generation, and patients can track their progress in a chart.
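
One way to represent such a goal and its progress is sketched below. The class `Goal`, its fields and the linear progress measure are illustrative assumptions; the prototype's actual data model is not described in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Goal:
    """A therapy goal agreed upon with the therapist (hypothetical structure)."""
    metric: str      # e.g. "steps", "calories", "weight"
    target: float    # value to reach by the end of the period
    start: date      # first day of the goal period
    end: date        # last day of the goal period
    how: str = ""    # free-text hint on how to achieve the goal

    def progress(self, baseline: float, current: float) -> float:
        """Fraction of the way from `baseline` to `target`, clamped to [0, 1].

        Works for increasing goals (steps) and decreasing goals (weight),
        because the sign of the span cancels out in the division.
        """
        span = self.target - baseline
        if span == 0:
            return 1.0  # target equals baseline, so the goal is already met
        return max(0.0, min(1.0, (current - baseline) / span))


# Usage: a step goal with the hint from the text, halfway reached.
step_goal = Goal("steps", 8000, date(2021, 3, 1), date(2021, 3, 31),
                 how="Get off the tram one stop earlier on the way to work.")
halfway = step_goal.progress(baseline=4000, current=6000)
```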

Coaching-Engine: Our prototype instantiates the just-in-time adaptive intervention (JITAI) framework [12]. JITAIs are an intervention design that aims to provide the right type and amount of support at the right time by continuously adapting to the changing context. Central building blocks of the JITAI framework are decision rules (if-then) that link tailoring variables (e.g., stress, weight loss or hour of day) with intervention options (e.g., feedback, reminder). For example, patients often relapse after work by turning to unhealthy food. Here, the coach may intervene, for instance: “Before you reach for unhealthy food, always remember: now you may feel better, but just imagine how you feel on the scale tomorrow! Does this really reduce your stress?”. To integrate the coach into daily life, we added a job scheduler that sends e-mail and SMS notifications.
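
The if-then decision rules described above can be illustrated with a small sketch. The rule names, thresholds (e.g., a stress rating of 7 or more after 5 p.m.) and the second message text are hypothetical and chosen only to show how tailoring variables might be linked to intervention options; they are not the rules used in the prototype.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, float]  # current values of the tailoring variables


@dataclass
class DecisionRule:
    name: str
    condition: Callable[[Context], bool]  # "if" part over tailoring variables
    intervention: str                     # "then" part: intervention option
    message: str                          # text the coach would deliver


# Hypothetical rules; all thresholds are assumptions for illustration.
RULES: List[DecisionRule] = [
    DecisionRule(
        name="after_work_stress",
        condition=lambda ctx: ctx["hour_of_day"] >= 17 and ctx["stress"] >= 7,
        intervention="feedback",
        message="Before you reach for unhealthy food: "
                "does this really reduce your stress?",
    ),
    DecisionRule(
        name="low_steps_evening",
        condition=lambda ctx: ctx["hour_of_day"] >= 18
        and ctx["steps"] < 0.5 * ctx["step_goal"],
        intervention="reminder",
        message="You are below half of your step goal for today.",
    ),
]


def select_interventions(ctx: Context,
                         rules: List[DecisionRule] = RULES) -> List[DecisionRule]:
    """Return every rule whose condition holds in the current context."""
    return [rule for rule in rules if rule.condition(ctx)]


# Usage: a stressful evening with a reasonable step count.
evening = {"hour_of_day": 18, "stress": 8, "steps": 6000, "step_goal": 8000}
triggered = [rule.name for rule in select_interventions(evening)]
```

Keeping the rules as data rather than hard-coded branches would let rules be added or adjusted without touching the selection logic.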

For the conversational competencies of the VC, a Natural Language Understanding component and a Dialog and Behavior Manager are used. The user can interact with the coach via a text-based chat widget that combines constrained input for safety-critical conversations (action buttons with predefined questions/answers) with unconstrained input (free text). Understanding of unconstrained text input is realized via the Microsoft language understanding service (footnote 6) and may be complemented by automatic speech recognition in a future version of the prototype. The dialog manager keeps track of the conversation flow based on a finite state machine (i.e., it is rule-based). The behavior manager then generates appropriate verbal and non-verbal behavior based on the dialog state and converts it into a representation that the Animation Rendering component can realize. For verbal behavior, we used the text-to-speech service Amazon Polly (footnote 7) and a voice called “Vicky”, which is similar to the voice of the popular speech assistant “Alexa”. For example, the user can ask for practical support in the form of diet suggestions (“What can I eat for lunch?”) or ask other therapy-related questions (e.g., “Why have I gained weight this week?”). If the user reports medical problems and feels unwell, the therapist is notified by the system. The VC also shapes knowledge by providing behavioral guidance in critical situations, such as when eating out or when the system has noticed unexpected changes in daily routines. By using unconstrained input for well-defined and uncritical intents (e.g., recipe recommendation), the input modalities of the VC go beyond previous work on an ECA for this health context that relied solely on multiple-choice menus [5]. Although language understanding capabilities make a more natural interaction possible, conducting empathic dialogues in this way has remained a challenge since the advent of ECAs [13]; we therefore handled such dialogues with constrained input options.
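
A rule-based dialog manager of the kind described above can be sketched as a small finite state machine. The states, intents and utterances below are invented for illustration, and the intent is assumed to arrive either from an action button or from a separate language understanding step; none of this reflects the prototype's actual dialog model.

```python
from typing import Dict, List, Tuple

# (current state, intent) -> (next state, coach utterance); all entries hypothetical.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("idle", "greet"): ("idle", "Hello, how can I help you today?"),
    ("idle", "ask_recipe"): ("recipe", "Would you like a warm or a cold lunch?"),
    ("recipe", "choose_warm"): ("idle", "Here is a warm recipe suggestion."),
    ("recipe", "choose_cold"): ("idle", "Here is a cold recipe suggestion."),
    ("idle", "report_unwell"): ("safety", "Shall I notify your therapist?"),
    ("safety", "confirm"): ("idle", "Your therapist has been notified."),
    ("safety", "deny"): ("idle", "Alright. Please tell me if you feel worse."),
}

# States in which only action buttons are offered (constrained input).
CONSTRAINED_STATES = {"safety"}

FALLBACK = "Sorry, I did not understand that."


class DialogManager:
    """Tracks the conversation flow with a finite state machine."""

    def __init__(self) -> None:
        self.state = "idle"

    def allowed_intents(self) -> List[str]:
        """Intents that have a transition from the current state."""
        return sorted(intent for (state, intent) in TRANSITIONS
                      if state == self.state)

    def free_text_allowed(self) -> bool:
        """Free text is only accepted outside safety-critical states."""
        return self.state not in CONSTRAINED_STATES

    def step(self, intent: str) -> str:
        """Advance the machine; unknown intents leave the state unchanged."""
        key = (self.state, intent)
        if key not in TRANSITIONS:
            return FALLBACK
        self.state, reply = TRANSITIONS[key]
        return reply


# Usage: a safety-critical exchange restricted to predefined answers.
dm = DialogManager()
dm.step("report_unwell")
buttons = dm.allowed_intents()
```

In this sketch the same transition table also yields the button labels for constrained states, so the widget can never offer an answer the state machine cannot handle.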

3 Evaluation, Significance of Results and Outlook

We evaluated our software artifact with a video demonstration. A one-minute video (footnote 8), in which the VC introduces herself (“Hello, my name is Lea, I’m your virtual coach…”) and explains the core functions of the system, was presented to n = 12 patients (female: 7, male: 5). We assessed perspicuity (pragmatic quality) as well as stimulation and novelty (hedonic quality) based on items of the User Experience Questionnaire (UEQ) [14]. In addition, we asked for verbal feedback regarding the overall impression. Since we used a video, an evaluation of efficiency and dependability was considered inappropriate. Overall, the results indicate clearly positive feedback (see Fig. 3), which is supported by a two-sided Wilcoxon rank sum test against the neutral value of the 7-point scale (all p < .05). Although the concept of an avatar-based VC for morbidly obese patients is novel, the ratings for novelty varied. This variance could reflect previous experiences with avatars, e.g., in video games. Regarding the verbal feedback, only one patient criticized the artificial voice of the coach. Using pre-recorded voices might be an alternative but would require more resources and restrict adaptability.
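
A comparison of ratings against the neutral scale value can be illustrated as follows. The text names a rank sum test; the sketch below instead uses an exact one-sample Wilcoxon signed-rank test, a variant commonly applied when a single sample is compared with a fixed value. That substitution, the function names, the 1 to 7 coding with neutral value 4 and the ratings themselves are assumptions made for illustration; they are not the study's procedure or data.

```python
from itertools import product
from typing import List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_test(sample: List[float], neutral: float) -> Tuple[float, float]:
    """Exact two-sided one-sample Wilcoxon signed-rank test.

    Returns (W+, p). Ratings equal to `neutral` are dropped. The p-value is
    obtained by enumerating all sign assignments, feasible for small samples.
    """
    diffs = [x - neutral for x in sample if x != neutral]
    if not diffs:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed_dev = abs(w_plus - expected)
    hits = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - expected) >= observed_dev - 1e-9:
            hits += 1
    return w_plus, hits / total


# Made-up ratings from 12 hypothetical respondents on a scale coded 1 to 7.
hypothetical_ratings = [6, 5, 7, 6, 5, 6, 4, 7, 5, 6, 6, 5]
w_plus, p_value = signed_rank_test(hypothetical_ratings, neutral=4)
```

With only 12 respondents, the exact enumeration avoids relying on a normal approximation, which tends to be less accurate for small samples with many ties.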

Fig. 3. Evaluation results of perspicuity, stimulation, novelty and verbal feedback

Our research contributes further evidence to the notion that ECAs could help address a gap in medical care and, moreover, introduces a promising first design of an avatar-based VC for obesity patients. The present data suggest that patients not only welcome a VC but may even have a demand for it. This is of particular importance as ECAs could also help reduce healthcare costs and provide the continuous care support required to manage chronic diseases such as obesity. Because ECAs are complex systems that have so far been mostly deployed as desktop installations [4], our study also demonstrates the practical feasibility of a web-based ECA. The web-based implementation enables broad accessibility and adoption of the VC. Additionally, this research presents a core set of MRs and DPs validated with multiple surveys and interviews. However, the current version does not contain elements for the management of diabetes (e.g., monitoring blood sugar levels), a condition quite common in the target population. This will be taken into account in further development. We also plan to port the application to mobile phones to enable more comprehensive just-in-time support. For example, the VC could then automatically detect whether the patient is in a grocery store and provide nutrition advice to maintain behavior changes or help prevent relapses. A more advanced avatar could also be given the ability to demonstrate physical exercises. At the same time, further usability studies are required to determine whether, for example, an obese “peer buddy” is more suitable as an avatar than a lean one and which gender might be beneficial in this context. Finally, future studies should provide clinical evidence and investigate long-term effects.