Introduction

Chatbots are machine agents that serve as natural language user interfaces for data and service providers [15]. Many chatbots are being devised to help patients with symptom-based diagnosis and to give instant feedback on general health questions. Chatbots can learn from previous interactions in order to increase the accuracy of their disease recognition. The vision is for chatbots to help people in less time and for less money than it would take to see a medical professional. Early experiences with this technology began around 2014, and it is currently attracting great interest in both the medical and the computing communities. This work surveys the available knowledge about the usage of health chatbots.

Chatbots bring several benefits: anonymity, asynchronicity, personalization, scalability, authentication and consumability, to name a few [34]. These benefits spark interest in this technology as a way to cope with a rapidly aging population and the stringent demands of chronic illness care [6]. From this perspective, chatbots align with previous IT technologies that support “mobile health”. Specifically, Embodied Conversational Agents (ECAs) (i.e. human-like avatars that are usually oriented to desktop computers rather than mobile devices) have been assessed to support different kinds of services, namely interviews [52, 41], counseling [53, 45], chronic health condition monitoring [48, 10] and medication adherence [49, 5]. These approaches are generally not usable on mobile phones, mostly due to browser-plugin requirements or the assumption of large-screen availability [37, 36, 20, 6, 8]. On the other hand, Conversational Agents (i.e. those that accept unconstrained natural language input) have been recently surveyed [40]. There, the emphasis is on the capabilities offered by Natural Language Processing in a healthcare setting, no matter whether these capabilities are offered through chatbots, ECAs or Smart Conversational Interfaces such as Apple’s Siri or Amazon’s Alexa.

Unlike previous studies, our focus is not so much on the means as on the ends of the conversation. If chatbots are conversational interfaces, then what are those conversations aimed at? No matter the illness, chatbots engage in a conversation to track, educate, encourage or prevent some behavior in the patient. The term “behavior (change) centered paradigm” has been coined to stress that the final aim is to assist users in changing their behavior, rather than the more traditional approach of assisting users in conducting a task [6]. Specifically, we focus on using chatbots for behavior change for healthy purposes. This gives rise to three research questions:

  • RQ1 (“for healthy purposes”): What illnesses are chatbots tackling?

  • RQ2 (“for behavior change”): What patient competences are chatbots aimed at?

  • RQ3 (“using chatbots”): Which chatbot enablers are of most interest in the health domain?

This paper presents the results of a Systematic Mapping Study (SLR) to address the previous questions. Our work differs from previous surveys in both the platform (i.e. chatbots rather than ECAs) and the target audience (i.e. patients rather than clinicians), no matter the sophistication of the chatbot abilities (e.g. we do not exclude studies where user input occurred by clicking rather than voice or natural language recognition). Our interest in chatbots comes from having developed health chatbots ourselves [46].

The chatbot space

Chatbots allow for direct user engagement through text messaging. This engagement is sought for different purposes: marketing, banking, booking, etc. We focus on “using chatbots for behavior change for healthy purposes”. That is, primary studies (PSs) are regarded as instantiations of the previous statement along a triplet <illness, competences, technicalEnablers>, where “illness” classifies PSs according to Curlie’s classification (footnote 1); “competences” denotes the desired patient behavior; and “technicalEnablers” refers to the benefits of chatbot technology that are mentioned in the primary study. This forms a three-dimensional space in which to arrange chatbot proposals (see Fig. 1). The rest of this section introduces the main values for each dimension.

Fig. 1 The chatbot space along three dimensions: illness, patient competence and technical enabler
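As a reading aid, the triplet <illness, competences, technicalEnablers> can be sketched as a small data structure. This is only an illustrative sketch: the class and field names (`PrimaryStudy`, `ref`, and so on) are assumptions and do not come from the study, and the example record is made up rather than taken from Tables 1 and 2. The enumeration values follow the competences and enablers named in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Competence(Enum):
    MONITORING = "monitoring"
    COGNITION = "cognition"
    AFFECT = "affect"
    BEHAVIOR = "behavior"


class Enabler(Enum):
    ASYNCHRONICITY = "asynchronicity"
    CONSUMABILITY = "consumability"
    ANONYMITY = "anonymity"
    AUTHENTICATION = "authentication"
    PERSONALIZATION = "personalization"
    SCALABILITY = "scalability"


@dataclass(frozen=True)
class PrimaryStudy:
    """One point in the chatbot space (hypothetical representation)."""
    ref: str                # hypothetical study identifier
    illness: str            # a single illness category per study
    competences: frozenset  # zero or more Competence values
    enablers: frozenset     # zero or more Enabler values

    def __post_init__(self):
        # Reject plain strings so that each dimension stays a closed vocabulary.
        if not all(isinstance(c, Competence) for c in self.competences):
            raise TypeError("competences must contain Competence values")
        if not all(isinstance(e, Enabler) for e in self.enablers):
            raise TypeError("enablers must contain Enabler values")


# Made-up record, for illustration only.
example = PrimaryStudy(
    ref="PS-example",
    illness="Addictions",
    competences=frozenset({Competence.BEHAVIOR}),
    enablers=frozenset({Enabler.SCALABILITY, Enabler.ASYNCHRONICITY}),
)
```

Using sets for the last two dimensions reflects the fact that a study sits in exactly one illness category but may touch several competences and enablers.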

Technical enablers

This subsection highlights what we value most in chatbot technology in a health setting. Other domains (e.g. banking, retailing) might well value other features more.

Footnote 1: http://www.curlie.org/Health/

Asynchronicity

Chatbots blend immediacy (prompt answers, i.e. a reactive behavior) with asynchronicity (notifications and reminders, i.e. a proactive behavior). When combined with social media, chatbots offer a powerful way not only to reach patients but also to engage them in illness prevention and care, especially young people, given their familiarity with the media.

Consumability

Some scenarios require chatbots to be readily “consumable”, e.g. when craving appears in addiction treatment. Consumability describes customers’ end-to-end experience with technology solutions, and includes not only the use of the tool but also the extent to which installing, configuring, and administering the tool is perceived as easy (footnote 2). In terms of consumability, chatbots outperform previous technologies in several respects: installation (limited to adding the chatbot to the list of contacts), platform independence (e.g. Instant Messaging (IM) apps are available for all major platforms: Android, iOS, Windows Mobile, Linux, Windows and macOS) and learning effort (i.e. chatbots follow a conversational interface with multi-modal input, whether text, button-oriented navigation or voice, without the need to learn a new GUI). We also include ubiquity as part of consumability. Basically, chatbots are instantly available in the same way that other IM contacts are, with no installation hassle. This might result in a competitive advantage compared with mobile apps [1]. Chatbots can also tap into the myriad of smartphone sensors to collect data in a transparent way (i.e. no consumption effort).

Anonymity

When it comes to sensitive healthcare issues, the possibility of interacting anonymously becomes a main enabler. Patients might feel less shame and be more open when interacting with computers, and show positive sentiment towards software agents, feeling more private and anonymous than when speaking to real humans [23].

Authentication

The process of verifying the identity of patients can be facilitated through built-in smartphone mechanisms. Chatbots can be secured using many of the same security strategies used for other mobile technologies: login credentials, two-factor authentication (i.e. the patient is required to verify their identity through two separate channels), biometric authentication (e.g. retina scan, fingerprint), etc.

Personalization

Meeting patients’ idiosyncrasies more effectively increases user satisfaction which, in turn, leads to better treatment engagement. Smartphone sensors provide a transparent mechanism to collect data on patients’ behavior, which can later feed AI algorithms. GPS and accelerometer data can serve as indicators of physical activity; facial recognition can help recognize the user’s physical state or, in conjunction with a smartphone-based heart rhythm monitor, support counseling for patients with heart conditions. This promotes a more engaging and personalized conversation experience.

Scalability

Chatbots have the potential to target large audiences in a cost-effective way [37].

Footnote 2: While usability addresses a user’s ability to use a product, consumability is a higher-level concept that incorporates all the other aspects of the customer’s experience with the product. https://en.wikipedia.org/wiki/Consumability

Behavior change

The previous technical enablers aim at providing an engaging experience, but what is the ultimate purpose? If chatbots are conversational interfaces, what is that conversation aimed at? In a health setting, the final aim is to facilitate behavior change by enhancing human competences. This subsection reviews these competences following those provided by Brinkman [6].

Monitoring

Awareness and tracking of bad habits is certainly a first step in changing behavior. Chatbots can benefit from external data captured through a wealth of sensors: blood pressure, stress level, weight or amount of physical activity could all be monitored to encourage healthy behavior.

Cognition

The next step towards change is to elaborate on the causes and potential diagnosis. Chatbots can play a role in empowering patients to understand the implications of health conditions. In this way, the hope is to support engagement through understanding.

Affect

Sustained engagement might be needed beyond the success of a first encounter with the chatbot. Empathy, understanding and acknowledging people’s emotional state are key for sustained patient involvement. Combining personality and emotional aspects in the dialogues, for instance by introducing social dialogue (small-talk or chit-chat sentences), can improve patients’ satisfaction and engagement with the bot. Developing rapport with chatbots has also been on the radar as a way to promote a sense of self-efficacy that ends up with patients persevering in the related behavior [25].

Behavior

Chatbots do not stop at tracking and informing patients. They can go a step further by influencing their behaviors. Chatbots might help through reminders (e.g. take medication, do exercise), gamification (e.g. badges, peer competition) or removing potential barriers (e.g. keeping evening slots free in the agenda for jogging).

Methodology

To tackle our research questions, we systematically assessed existing evidence related to chatbot usage using SLR guidelines [47]. An SLR facilitates identifying and collecting key papers in a specific area of interest, and evaluating and interpreting the reporting discussions and findings. An SLR comprises a defined review protocol, search strategy, explicit inclusion and exclusion criteria, and specified information that will be retrieved from primary studies. This section summarizes this approach for our purposes.

Search strategy

To help build the search terms and identify potential overlap, a set of key papers was identified, mainly surveys of nearby areas. On these grounds, the search string was built following the PICOC (Population, Intervention, Comparison, Outcome, Context) criteria:

  • Population: patients (rather than health practitioners).

  • Intervention: the use of chatbots for healthcare interventions.

  • Comparison: we do not compare different strategies for healthcare interventions but assess the area as a whole.

  • Outcomes: characterization of how patients’ competences are promoted in terms of monitoring, cognition, affect, and behavior.

  • Context: any context in which patients interact with chatbots.

The identified keywords were: ‘patient’, ‘chatbot’ and ‘healthcare’. Next, synonyms had to be found. Following the guidelines of Petersen [47], we conducted an informal literature search in order to identify keywords and to find a balance between hits and noise. We noticed that the terms ‘chatbots’, ‘conversational agents’ and ‘virtual agents’ tend to be used interchangeably. Hence, we included those terms. This resulted in the following search string:

$$ \left(\left(\text{``conversational agent''}\ \mathrm{OR}\ \text{``virtual agent''}\ \mathrm{OR}\ \text{``chatbot''}\right)\ \mathrm{AND}\ \left(\text{``health*''}\right)\right)\ \mathrm{AND}\ \left(\text{``patient''}\right) $$

However, the population filter (i.e. ‘patient’) was too restrictive for two repositories, namely IEEE Xplore and ACM Digital Library. Hence, for these two we decided to remove ‘patient’ from the search query and to enforce it as an exclusion criterion instead.
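The two variants of the search string (with and without the population term) can be assembled programmatically. The sketch below is illustrative only: `build_query` and its parameters are hypothetical names, and the exact Boolean syntax accepted by each database may differ from this generic form.

```python
def quote(term: str) -> str:
    """Wrap a search term in double quotes for exact-phrase matching."""
    return f'"{term}"'


def build_query(agent_terms, domain_term, population_term=None):
    """Combine synonyms with OR, then AND the domain and (optionally) the population."""
    agents = " OR ".join(quote(t) for t in agent_terms)
    core = f"(({agents}) AND ({quote(domain_term)}))"
    if population_term is None:
        # Variant for repositories where the population filter was too restrictive.
        return core
    return f"{core} AND ({quote(population_term)})"


AGENT_TERMS = ["conversational agent", "virtual agent", "chatbot"]

full_query = build_query(AGENT_TERMS, "health*", "patient")
relaxed_query = build_query(AGENT_TERMS, "health*")
```

Here `full_query` corresponds to the search string above, while `relaxed_query` drops ‘patient’, leaving that condition to be checked manually during filtering.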

This query was matched against the title, the abstract and the keywords. We restricted the search to studies published from 2014 up to December 2018. Five electronic databases were consulted: IEEE Xplore, ACM Digital Library, Springer Link, Science Direct and Google Scholar. Figure 2 outlines the resulting numbers. In total, we obtained 2377 primary studies in this first step.

Fig. 2 Filtering studies

Filter strategy

Studies were excluded if they met any of the following criteria:

  • EC1 (‘The study was published before 2014’). Chatbots started to be created around 2014, when Telegram added support for chatbot development [38].

  • EC2 (‘The study is not centric to mobile-based chatbots’). Embodied Conversational Agents (i.e. human-like avatars that are usually oriented to desktop computers rather than mobile devices) are left outside this study (see [29] for a review).

  • EC3 (‘The study does not target patients’). We removed articles that do not directly target patients (e.g. those aimed at caregivers or healthcare professionals).

Filtering was conducted by one author, and if in doubt, the second author also intervened. This happened eleven times. No quality assessment was conducted except that of being peer-reviewed papers.
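The three exclusion criteria can be read as a simple screening function. The following sketch is an assumption-laden illustration, not the procedure actually used: the `Candidate` record and its fields are hypothetical, and the EC2 and EC3 flags stand for a reviewer's manual judgement rather than anything computed automatically.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for a retrieved study."""
    title: str
    year: int
    mobile_chatbot: bool    # reviewer judgement for EC2
    targets_patients: bool  # reviewer judgement for EC3


def exclusion_reasons(c: Candidate) -> list:
    """Return the list of exclusion criteria that a candidate meets."""
    reasons = []
    if c.year < 2014:
        reasons.append("EC1")
    if not c.mobile_chatbot:
        reasons.append("EC2")
    if not c.targets_patients:
        reasons.append("EC3")
    return reasons


def screen(candidates):
    """Keep only candidates that meet none of the exclusion criteria."""
    return [c for c in candidates if not exclusion_reasons(c)]


# Made-up candidates, for illustration only.
pool = [
    Candidate("Desktop avatar for counseling", 2012, False, True),
    Candidate("Messaging bot for clinicians", 2016, True, False),
    Candidate("Messaging bot for patients", 2017, True, True),
]
kept = screen(pool)
```

Recording the reasons, rather than a bare yes/no, keeps a trace of why each study was dropped, which helps when a second reviewer has to resolve doubtful cases.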

Data extraction and classification

This includes two main steps. The first is keywording, i.e. identifying relevant topics that help answer the research questions. This yields the classification schema. Our classification schema includes three main dimensions: <illness, competences, technicalEnablers> (see Section 2). The second is data synthesis. Each PS was positioned in the "chatbot space" along the aforementioned dimensions. A PS is characterized by a single illness, but it might tackle more than one competence and technical enabler. Illnesses are explicitly mentioned in the PS text. However, competences and enablers were obtained using an integrated approach to thematic analysis that aims at quantifying content (according to predetermined categories) in a systematic and reliable manner [14]. Two investigators reviewed details extracted from the set of PSs in order to identify the illness, the targeted patient competences, and the underlying technical enablers brought by chatbots. Tables 1 and 2 gather these 30 triplets, which provide the raw data on which the findings are based.

Table 1 Primary studies classified by addressed illness and main enablers
Table 2 Primary studies, by purpose and leveraged competences
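Because each PS has one illness but possibly several competences and enablers, per-dimension tallies behave differently: illness counts add up to the number of studies, whereas competence and enabler counts can exceed it. A minimal sketch of such a tally follows, using made-up triplets (not the data in Tables 1 and 2) and hypothetical study identifiers.

```python
from collections import Counter

# (study id, illness, competences, enablers): made-up values for illustration.
studies = [
    ("PS-a", "Addictions", {"behavior"}, {"scalability", "asynchronicity"}),
    ("PS-b", "Mental and Physical Wellness", {"monitoring", "cognition"}, {"personalization"}),
    ("PS-c", "Addictions", {"affect", "cognition"}, {"anonymity", "personalization"}),
]

# One illness per study, so this tally sums to the number of studies.
illness_counts = Counter(illness for _, illness, _, _ in studies)

# A study contributes once to every competence or enabler it mentions.
competence_counts = Counter(c for _, _, comps, _ in studies for c in comps)
enabler_counts = Counter(e for _, _, _, enablers in studies for e in enablers)
```

This is why the per-competence and per-enabler figures reported in the Findings need not sum to the number of primary studies.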

Findings

This section goes back to the research questions. References refer to the primary studies (PSs) being considered.

RQ1: What illnesses are chatbots tackling?

We classified Primary Studies (PSs) according to Curlie’s classification of illnesses (see Fig. 3).

Fig. 3 Tree-map depicting the illness categories tackled by the reviewed chatbots. PS numbers are shown in brackets

Neurological Disorders (6 PSs)

These chatbots are oriented towards assisting patients with psychiatric conditions such as dementia [3, 13], Alzheimer’s disease [25], insomnia [4], depression [50] or general psychiatric counseling [45].

Mental and Physical Wellness (8 PSs)

Promoting healthy habits [42, 44, 9, 31, 24] and physical exercises [8, 20], along with delivering positive psychology and mental well-being techniques is the aim of 8 PSs.

Nutritional-metabolic-disorders (6 PSs)

This basically covers obesity [11, 27] (especially childhood obesity [36, 22]), allergies [26] and diabetes [10].

Addictions (3 PSs)

This includes drug cessation interventions, with a special focus on alcoholism [41, 17] and smoking [16].

Sexually-Transmitted-Diseases (3 PSs)

This includes HIV/AIDS [7, 53] and syphilis [35].

Others (3 PSs)

Chatbot experiences also include treating aphasia [54], diagnosing rare diseases [39], and detecting heart condition and cardiovascular disorder problems [32].

RQ2: What patient competences are chatbots aimed at?

This moves us to the human competences that chatbots look to promote (see Fig. 4). Evidence is collected from PS quotes that highlight the interest in the human competences. Hence, PSs account for one illness but potentially several competences.

Fig. 4 Bubble chart mapping illness across human competences

Monitoring (7 PSs)

Chatbots can help by tracking different bio-signals and behavioral cues: physical activity [20, 8], heart rhythm [32], blood pressure [11], body temperature [27], facial expressions [31], voice intonation [45], or feedback gathered from chat-based answers [36].

Cognition (13 PSs)

Unlike Expert Systems, chatbots target patients, not clinicians. This restricts cognition complexity to indicating simple causal relationships that help patients understand their condition and support them in their decision making. In this way, chatbots become health coaches that back and advise users in different settings: their physical activities [8, 37]; counseling about healthy eating behavior [20, 27, 24]; helping patients to provide allergy-related information about visited restaurants and offering appropriate feedback [26]; making patients aware of depression episodes in cancer [50] and diabetes scenarios [10]; fighting Alzheimer’s symptoms through quizzes (e.g., showing a clock and asking for the time) [25]; improving their adherence to insomnia therapy [4]; providing advice in either general terms (e.g. first-aid advice through Apple Siri, Google Now, Samsung Voice or Microsoft Cortana [44]) or illness-specific terms (e.g., atrial fibrillation [32]); or keeping long-term recurrent interviews with the patient for rare disease diagnosis [39].

Affect & Attitude (14 PSs)

Brinkman highlights how “attitudinal change, for example, self-efficacy, or attitude towards healthy living, can be a key enabler that could lead up to change in people’s behavior” [6]. Here, chatbots can be engineered to look like “a friendly advisor or mentor to the user rather than a therapist or health care professional” [17]. This is even more so when handling sensitive health problems, like alcohol-related issues [17], syphilis [35], HIV/AIDS [53, 7], or aphasia, where patients with speaking difficulties can practice without fear of shame and embarrassment [54]. Here, chatbots provide a secure, anonymous setting, where patients can develop a sense of rapport without fear of stigmatization and discrimination.

In this setting, chatbots can analyze patient mood and reinforce attitudinal changes accordingly. First, mood analysis. It can be conducted in different ways: capturing participants’ facial expressions through the camera [11]; analyzing the sentiment of text messages [31, 3]; or using prosodic and statistical features extracted from the voice signal [50]. Once mood has been analyzed, chatbots might resort to distinct strategies to reinforce attitudinal changes: positive psychology in the context of mental well-being (e.g. practicing kindness, expressing gratitude [42]); telling jokes in the context of dementia [13]; motivational interviewing (i.e. aiming at finding the intrinsic motivation to change patients’ lifestyle) [41, 28]; or praising insomnia patients for their efforts [4].

Behavior (8 PSs)

Chatbots can influence patients’ behavior through distraction and encouragement. The former is illustrated by [16], where smokers are helped to quit by distracting them during craving phases (e.g. by sending multimedia content). In a similar vein, Cameron et al. report on a bot that starts conversations when users are experiencing stress in the workplace [9]. As for encouragement, different strategies are being attempted: meal-portion recommendation messaging at lunch time [22]; gamification techniques [21]; resorting to social-influence strategies (e.g. shared decision making [4]); and promoting self-efficacy by tracking patients’ own challenges (e.g. counting steps as daily goals [36]).

Figure 4 showcases how chatbots address the four competences. Interestingly, affect & attitude seem to be the most popular competences being tackled, hence moving chatbot technology a step further from Expert Systems, i.e. addressing not only cognition but also attempting to engage and improve the treatment adherence of patients.

RQ3: Which chatbot enablers are of most interest?

This subsection cross-matches aims and enablers to uncover which technological features might deserve more focus from a health perspective. It also unveils enablers that are not yet widely used in illness treatments (see Fig. 5). Evidence is collected from PS quotes that highlight the interest in the enabler. Hence, PSs account for one illness but potentially several enablers.

Fig. 5 Bubble chart mapping illness across technical enablers

Scalability (3 PSs)

This enabler makes it possible to handle large patient bases in a cost-effective way. Interestingly enough, only three studies highlight this enabler as key in their setting. In [36], the authors refer to nearly 13000 conversational turns during 4 months, something nearly impossible to obtain without automating the conversations. Likewise, [16] sets scalability as a main non-functional requirement in order to handle nearly 7000 participants with distraction tricks during craving times while quitting smoking. Finally, [28] highlights the value of chatbots as an efficient tool to address a global public health challenge: the lack of enough human psychotherapists to cope with the increasing mental health problems among the population.

Personalization (13 PSs)

This enabler facilitates tailoring to patients’ medical history and specifics. This comes in handy in nutritional-metabolic disorders and physical wellness. The former is illustrated for allergies [26] and diabetes [10], where chatbot behavior is adapted to the patient’s profile. [27] and [24] also leverage this enabler to provide personalized diet and healthy eating tips to their patients. As for physical wellness, personalization allows for custom health screening [9], custom motivational support [20, 42] and custom guidance assistance [8]. In [22], the chatbot explicitly asks users about message appropriateness and changes its behavior accordingly.

In addition, personalization techniques can also help simulate affect. Discussing addictions or at-risk behaviors can be emotionally engaging (e.g. shame, discouragement, fear, anger) [41, 17]. For therapy efforts to succeed, affinity and empathy are a must. Chatbots can simulate emotion by resorting to computer vision to capture the patient’s emotion and reacting accordingly, e.g. by introducing motivational comments in the conversation [16]. This explains why all 3 addiction-related PSs value personalization capabilities.

Consumability (11 PSs)

Easy consumption becomes key to promoting the persistent use of chatbots and keeping patients engaged. This is the case in health screening [9], motivation support [20] and guidance assistance [8]. In addition, this enabler is also widely noted in the area of neurological disorders. Facial-expression capture and voice analysis are available to tune chatbot interactions [31]. This comes in handy when tackling mental disorders, usually aging-associated [3], where patients are usually more apt to use their voice than text commands. For patients with neurological conditions (dementia [13], depression [50] or Alzheimer’s disease [25]), chatbots’ usability and accessibility features are highly regarded. Finally, ubiquity turns out to be a must in different scenarios: when trying to detect early symptoms of cardiovascular disorders [32], when patients with allergies want to check whether there are suitable restaurants nearby [26], or for rapid intervention in insomnia episodes [4].

Asynchronicity (7 PSs)

This enabler promotes “on-site openness”, where patients expose their hesitance at the time craving appears [16]. This enabler is especially appreciated in three scenarios: remembering assigned exercises [36, 42]; praising achievements [11]; and encouraging adherence to the treatment [22].

Anonymity (6 PSs)

As expected, anonymity is a key enabler for sexually transmitted diseases. Potential carriers of sexually transmitted infections tend to avoid seeking early medical advice for fear of stigmatization [7, 53, 35]. Similar rationales hold for addictions [41], sensitive mental health issues [43], and aphasia, where patients might unashamedly practice with the bot [54].

As a byproduct of this technical dimension, we were also interested in knowing the extent to which developers resorted to chatbot platforms rather than building their own. The figures are as follows: custom developments (57.7%), Facebook Messenger (15.4%), Telegram (11.5%) and not specified (15.4%). These figures seem counter-intuitive, since platforms speed up chatbot development in contrast with cumbersome custom development. The rationale behind this choice might lie in the inability of chatbot platforms to use custom UI widgets for communication or, more importantly, the inability to directly access the smartphone’s built-in or external (wearable) sensors (such as accelerometer, gyroscope, or health-related sensors like ECG, heart rate, breathing and activity monitors, spectrophotometer, or any other biometric sensor). When developers do not need these features, they tend to use the Facebook Messenger and Telegram platforms for their bots. But custom development also has its downsides. In addition to costly development, consumability decreases since patients are forced to install the custom app. A compromise could be reached if IM platforms allowed developers to access, with users’ permission, any data available on the smartphone (as is the case for apps).

Discussion

Implications for research and practice

To our knowledge, this is the first mapping study of chatbot-mediated behavior change in the health domain. Our findings show an increasing interest in this kind of technology to engage patients. However, given the novelty of the area, most reported efforts are devoted to developing the chatbot from scratch rather than to conducting more systematic and ample studies. In this respect, the large number of custom developments points to an opportunity for dedicated platforms that tackle the specifics of the health domain. The youth of the domain is also reflected in the lack of rigor when reporting experiences. Good practices such as those presented in [30] would help compare chatbots that address similar illnesses or tap into the same enablers. Similar initiatives are also available in the health domain, e.g. the Consolidated Standards of Reporting Trials of electronic and mobile health applications and online telehealth (Consort-eHealth) [18].

Primary studies also fall short of reporting the role played by supporters during the chatbot intervention. In this setting, support is defined as contact with a human with the aim of increasing the patient’s ability to use the chatbot to obtain the intended treatment result. Indeed, integrating human support to promote engagement and provide technical and clinical troubleshooting is being reported as more efficacious than self-guided, self-help interventions [2]. This plays in favor of chatbots. Unlike traditional clinical practice, chatbots bring opportunities to gain information about how people use the chatbot and to intervene accordingly. The point to note is that integrating human support would benefit from models grounded in psychological insights.

Chatbots should be aligned with other Behavioral Intervention Technologies (BITs) insofar as adopting an Efficiency Model of Support, i.e. a conceptual framework to capture the interplay between information and intervention in an efficient way [51]. Here, efficiency is defined as “the ratio of the outcome of an intervention relative to the human resources required to deliver it, since each decision corresponds to supporting that intervention (what, when, how much, who provides it) represents a trade-off between devoting additional resources and accruing additional benefits” [51]. According to this model, decisions should be based on the consideration of why people may fail to benefit from BITs. This model introduces five categories of possible failures. Usability failures refer to the design, ease of learning, and ease of use of the technical features. Engagement failures occur when the person has the capacity to use a BIT but does not, for example, because he or she lacks the intention or motivation. Fit failures occur when the assigned tool does not meet the patient’s needs. Knowledge failures occur if a person uses a tool, but does so incorrectly (e.g. patients might need more instruction in order to understand the treatment’s rationale), which would impact effective practice. Lastly, implementation failures occur when a person uses the tool within a BIT, but fails to incorporate it into his or her life (i.e. no recurrent usage). Unfortunately, support intervention has been largely overlooked in the reviewed primary studies. Some exceptions include:

– [19], where the chatbot includes a human-based backup “feature” in the process. This makes it possible to respond to patients in need, “especially when they feel trapped at certain point during their treatment”; and

– [53], where the authors suggest that chatbots dealing with very sensitive health conditions like AIDS should let users “request to talk to a trained counselor” at any time, and that the bot should link the patient to a trained professional whenever “a mention was made of thoughts of self-harm or suicidal ideation” (p.81).

So far, most examples are predominantly self-help interventions. In these interventions, support is not a continuous, pro-active process. Rather, a supporting individual (e.g. a physician) may be aware that the patient is using the chatbot and receive timely information in order to, e.g., reinforce use of the chatbot, but the chatbot mostly operates separately from the support. This form of support is intended to address engagement failures. But this might not be enough. The advantage of the Efficiency Model of Support is that it provides a framework in which chatbot designers might reflect on potential failures and on whether support resources might pay off.
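To make the trade-off concrete, the efficiency ratio and the five failure categories of the model can be sketched as follows. This is an illustrative reading only: the source does not specify how outcome or human resources are measured, so the units, the function name `efficiency` and the numbers below are all assumptions.

```python
from enum import Enum


class SupportFailure(Enum):
    """The five failure categories, paraphrased from the model description."""
    USABILITY = "problems with design, ease of learning or ease of use"
    ENGAGEMENT = "able to use the tool but does not, e.g. lacks motivation"
    FIT = "assigned tool does not meet the patient's needs"
    KNOWLEDGE = "tool is used, but incorrectly"
    IMPLEMENTATION = "tool is used but not incorporated into daily life"


def efficiency(outcome: float, human_resources: float) -> float:
    """Ratio of intervention outcome to the human resources needed to deliver it."""
    if human_resources <= 0:
        raise ValueError("human_resources must be positive")
    return outcome / human_resources


# Made-up comparison: outcome in arbitrary units, resources in support hours.
light_support = efficiency(outcome=10.0, human_resources=4.0)
heavy_support = efficiency(outcome=12.0, human_resources=8.0)
```

In this made-up comparison, the heavier support option yields a better outcome but a lower ratio, which is the kind of trade-off the model asks designers to weigh against the failure category they are trying to address.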

Finally, socio-technical implications are hardly mentioned. Monitoring, counseling, preventing and detecting illnesses might well thrive if chatbots could harness full data from wearable sensors, improve their natural language processing skills, or enhance their emotion detection algorithms. These topics are evolving research areas that will, without doubt, be applied in future chatbots. Yet, it remains to be seen whether the ethics of delegating human health problems to software programs involve acceptable trade-offs. Indeed, when it comes to serious health concerns that require immediate action, chatbots do not always handle them adequately or in a timely manner [44]. This calls for continuous evaluation of chatbot functioning, including problematic responses, privacy breaches, or patient harm, that might eventually leave chatbots in “quarantine”. Attention has been drawn to the deep social implications of bringing automation to previously predominantly social activities [12]. Given the sensitivity of health matters, chatbot studies would benefit from including a socio-technical analysis in which potential biases and the impact on the patient-doctor relationship are discussed.

Threats to validity

Construct Validity

This includes two major threats. The first threat is that the research questions may not provide complete coverage of chatbot-mediated behavior change in the health domain. We focused our study on arranging chatbot experiences along three main dimensions: illness, competences and technical enablers. Other dimensions might also be of importance to understand the adoption of chatbots in the health area (e.g. patient age, social setting, etc.). The second threat is about not including all the relevant works in the field. This threat was alleviated by combining several databases with manual searches of selected journals and conferences. We also cross-checked against related reviews, specifically the one about Embodied Conversational Agents and the one on Conversational Agents [36, 40].

Internal Validity

Limitations mainly come from the categories being used. Firstly, we resort to the taxonomy of [6] to categorize chatbot opportunities to enhance human competences. This is a new taxonomy that has not yet been widely used by academia. Likewise, the categories for the technology enablers brought by chatbots, though inspired by other works [34, 46], reflect our understanding as health chatbot developers ourselves. Whereas the vocabulary of clinical medicine has been standardized over a long period of time, chatbots are a recent technology whose terminology is still emerging as chatbots become mainstream.

External Validity

Our mapping study provides a global view of illnesses, patient competences and chatbot enablers based on 30 primary studies. Kitchenham et al. suggest that 10 papers or fewer are insufficient for an assessment of completeness for a mapping study, whereas 30 papers would be enough [33]. For a mapping study, the aim is to look at the high-level research trends in a broad topic area, where completeness might not be so critical as long as the search strategy remains unbiased. In our case, we selected chatbot experiences published between 2014 and 2018. It is unlikely that excluding papers published before 2014 affects the generalizability of the results. 2014 is the year in which Telegram added support for chatbot development [38], which marks chatbots becoming mainstream.

Conclusion Validity

Bias in data collection could hinder reproducibility of our study. We mitigated this threat by establishing a protocol to extract the data of each paper equally. In addition, Table 1 and Table 2 summarize classification decisions for researchers to replicate the study.

Conclusion

We classified 30 peer-reviewed articles along the triplet <illness, competences, technicalEnablers>. In so doing, we provide three initial insights about the healthcare-chatbot space. First, the two most active health areas are mental & physical wellness and nutritional & metabolic disorders. Second, “affect” and “cognition” are the human competences most sought after by chatbots to attain behavior change. Finally, “consumability” is a widely mentioned chatbot enabler.

The field is still in a nascent period of investigation but with a large potential to provide a cost-effective way to handle a larger aging population. Based on the surveyed papers, two main considerations arise. First, the need for experiences to be reported along good practices [18] so that they can be compared or replicated. Second, the convenience of including the broader sociological implications brought by chatbots. Though this study focuses on the patient, other main stakeholders are also impacted: caregivers, physicians or relatives should also be surveyed about how chatbots affect their relationship with patients. After all, chatbots are embodiments of behavior-change principles from psychological science.