1 Introduction

Social robots have been evaluated in many domains for supporting humans, showing different levels of success in a variety of user studies. Health and wellbeing contexts are one such domain and hold great potential for the use of social robots in supporting individuals, as well as caregivers and health professionals. To better understand the benefits and potential of social robots, it is informative to review past Human-Robot Interaction (HRI) studies in which social robots interacted with individuals, and to understand both the outcomes of those studies and how the robots were evaluated.

There have been extensive studies of social robots for specific user groups, such as children with autism or persons with dementia. Similarly, while there is a large body of reviews related to social robots and health and wellbeing, most of the review articles focus on a single type of user group or setting, such as hospital settings [16], supporting older adults (e.g., [5, 6, 7, 18]), supporting people with dementia (e.g., [14, 19, 20, 22, 33]), mental health or psychological wellbeing interventions (e.g., [17, 24, 28]), the perspectives of non-patients who interacted with the robots (e.g., [15, 27]), or the robots themselves as opposed to users’ evaluations [4, 10, 21, 25, 32]. However, evaluation of HRI studies in the larger scope of health/wellbeing contexts has seen limited attention, perhaps due to the very large scope of the evaluation.

Recently, Santos et al. (2021) systematically mapped the literature relating to robotics and human care [26], covering 69 past studies in this domain to understand the types of tasks performed with the robot (e.g., personal assistant, object manipulation, human monitoring). A recent large-scale literature review conducted by the authors’ research team covered 443 articles where social robots were evaluated in HRI studies with adult participants (including younger and older adults) [13]. The review presented the social robots used in the studies, the settings and situations in which HRI studies were conducted, type of data collected in the studies (i.e., quantitative, qualitative, or mixed), robot control (e.g., autonomous), and user groups and their health conditions [13]. Here, we expand on these findings and further analyze the data gathered in the search presented in [13]. The contributions of this review include (a) outcome of the past HRI studies, (b) presence of statistical analysis for supporting the outcome, and (c) distribution of number of articles contributed by different authors in the past studies as identified by our search.

2 Research Questions

This review article addresses the following research questions.

  • RQ1 What were the outcomes of the past HRI studies that evaluated social robots in health/wellbeing contexts with adult participants?

  • RQ2 How were the data used for reporting the outcomes of the studies analyzed? Specifically, we ask if the results of the articles were based on statistical analysis.

  • RQ3 How broad is the field in terms of the number of different researchers who have contributed to the publications in the reviewed context?

3 Methodology

This systematic review carefully followed the steps outlined by the Centre for Reviews and Dissemination [3], and the reporting follows the PRISMA 2020 guidelines [23]. In this section, we present a short summary of the methodology of the systematic review. A more thorough description of the methodology is presented in [13], where research questions beyond the scope of this article are addressed. As discussed earlier, in this paper, we expand on the results of the data collected in [13] to address new research questions that are presented here.

In this review, social robots are defined as robots that operate alongside humans and are capable of interacting in human-centric terms [8, 9]. Health/wellbeing is defined as “the extent to which an individual or group is able, on the one hand, to realize aspirations and satisfy needs and, on the other hand, to cope with the interpersonal, social, biological and physical environments” [31]. A more thorough definition of the terminologies is presented in [13].

According to these definitions, our eligibility criteria required peer-reviewed studies that used and reported on social robots in a health or wellbeing context, where the participants interacted with the social robots.

3.1 Eligibility Criteria

Our inclusion and exclusion criteria were as below.

Inclusion Criteria:

  • Studies with adult participants (18+ yrs old)

  • Studies published in peer-reviewed conferences or journals

  • Studies that involved participants who engaged with or evaluated a social robot in the context of health and wellbeing

  • Studies on the use of social robots for a health or a wellbeing intervention, with related outcomes/evaluations

  • Studies on the use of physically embodied robots, and robots that possess social skills, i.e., those that are considered social robots based on our definitions above

  • Studies reported in English

Exclusion Criteria:

  • Studies on the use of a purely robotic device (exoskeleton, sensors, artificial limbs etc.) without social attributes

  • Studies on the use of robots in healthcare, where the robots did not exhibit a social behaviour (i.e., where the robot was not being operated/programmed to act as a social robot according to our above-mentioned definition)

  • Studies with only children as participants

  • Studies reported in a language other than English

  • Studies that were not included in a conference proceeding or a journal (e.g., book chapters, technical reports, etc.)

  • Studies that did not have any results related to health/wellbeing as defined above (e.g., studies that only evaluated general attitudes towards or acceptance of social robots without interactions with a robot or without considering a health context)

3.2 Information Sources and Search Strategy

Five databases were searched on February 6, 2021 to find relevant studies. MEDLINE via PubMed, PsycInfo via APA PsycNet, IEEE Xplore Digital Library, ACM Digital Library, and Scopus were chosen for their coverage of the health and/or technology literature. The initial search strategy was developed for PubMed in an iterative process by a librarian in computer science with input from the review team (see [13] for more information). See Table 1 for the search used in PubMed.

Table 1. PubMed Search

Fig. 1. PRISMA flow diagram for systematic reviews

The main concepts searched in PubMed’s MEDLINE were social AND robot and were selected based on the research questions. To cut down on the amount of irrelevant results found in the other databases (mainly on the development of social robots and their use outside of health), the concepts of participant AND health were added. To define the search terms, we reviewed relevant papers to ensure that we captured the different keywords and vocabulary used by authors in both social robotics and health domains. After multiple iterations where different keywords were checked for their precision and recall of relevant articles, the search terms were defined for each database.

The databases returned a total of 11338 results. These results were exported into RefWorks [2] and 1932 duplicates were removed. The remaining 9406 were exported into Covidence [1] and 44 more duplicates were removed.

3.3 Selection Process

The 9362 unique articles were screened in Covidence by six members of the review team (two people per article) and disagreements were settled by discussion with at least two additional team members. 739 articles were included for full-text review. A full-text review (one person per article) was conducted afterwards. Full texts were checked again for eligibility at the time of data collection. Please see Fig. 1 for more details.
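The record counts reported in Sections 3.2 and 3.3 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch; the variable names are ours, and the number of records excluded at title/abstract screening is derived here from the reported counts rather than quoted from the text.

```python
# Record counts as reported in Sections 3.2 and 3.3 (variable names are illustrative).
retrieved = 11338            # total results returned by the five databases
refworks_duplicates = 1932   # duplicates removed in RefWorks
covidence_duplicates = 44    # further duplicates removed in Covidence
full_text_reviewed = 739     # articles included for full-text review

after_refworks = retrieved - refworks_duplicates
unique_screened = after_refworks - covidence_duplicates

# Derived value, not a figure stated in the text: records that did not
# proceed from title/abstract screening to full-text review.
excluded_at_screening = unique_screened - full_text_reviewed

print(after_refworks)         # 9406
print(unique_screened)        # 9362
print(excluded_at_screening)  # 8623
```

The first two printed values match the counts given in the text (9406 and 9362).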

3.4 Data Collection and Synthesis Methods

The data items and extraction process were developed through discussion by a multidisciplinary team and tested by five of the reviewers. Five reviewers performed the data extraction (one person per article). Some studies did not include all the data points of interest and those were left blank in the chart (unless the missing information was required as a part of the inclusion criteria, in which case the article was removed).

4 Results

For a thorough summary of the country of authors of the reviewed articles and year of articles published see [13]. The majority of the articles were published by researchers in Japan and the United States; however, the search identified articles written by researchers in 44 different countries [13]. Below, we will report on the new results related to each of the above-mentioned research questions.

Fig. 2. Study outcomes

4.1 RQ1 - Study Outcomes

If the social robot had a positive influence on users (attitudes, behaviours, quality of life, perceptions, etc.), the outcome was categorized as positive. This category included instances where the robot improved various aspects of people’s lives or moods, or was associated with positive attitudes. If the robot had a negative influence on participants, did not work as intended, or attitudes were negative, the outcome was categorized as negative. The negative category included instances in which participants displayed disinterest in using the robot or negative interactions were observed. Some articles had more than one study with different outcomes, or reported both positive and negative outcomes in a single study. These cases are shown as Positive/Negative. Similarly, neutral indicates that no difference was observed in the presence/absence of the robot, and positive/neutral indicates cases where both were observed in a paper, e.g., in two studies reported in the same article. The “not clear” category covers instances where the effect of the robot was indeterminable, or the study failed to yield conclusive results.
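To illustrate how article-level labels such as Positive/Negative can arise from several study-level outcomes, the following is a minimal sketch rather than the procedure actually used in the review. The function name `article_outcome` is hypothetical. The rule that any indeterminate study makes the whole article "Not clear" is our assumption, as are combinations the text does not mention (e.g., Negative/Neutral).

```python
from typing import Iterable

# Fixed order used when an article reports more than one kind of outcome.
_ORDER = ("Positive", "Negative", "Neutral")


def article_outcome(study_outcomes: Iterable[str]) -> str:
    """Combine per-study outcomes into one article-level label (illustrative only)."""
    observed = {o.strip().capitalize() for o in study_outcomes}
    # Assumption: no usable outcome, or any outcome outside the three
    # known labels (such as "not clear"), yields "Not clear".
    if not observed or not observed <= set(_ORDER):
        return "Not clear"
    return "/".join(label for label in _ORDER if label in observed)


print(article_outcome(["positive"]))               # Positive
print(article_outcome(["positive", "negative"]))   # Positive/Negative
print(article_outcome(["neutral", "positive"]))    # Positive/Neutral
print(article_outcome(["positive", "not clear"]))  # Not clear
```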

The social robots in the reviewed studies were used in many different roles, such as for providing companionship, as therapeutic and rehabilitation robots including animal therapy, for health data acquisition or diagnosis of different conditions, for cognitive support, as health and exercise coaches, or for helping with fall detection/prevention. Different aspects of the social robots were evaluated in the reported HRI studies, including their effectiveness and participants’ attitudes toward the robots. The studies were conducted in a variety of settings, including research labs, participants’ homes, care centres, and hospitals (see [13]).

Figure 2 shows the outcomes of the reviewed papers. The vast majority of the articles (365 out of 443 articles) suggested a positive outcome of social robots on aspects of participants’ attitudes or health/wellbeing. As can be seen in Fig. 2, in approximately half of these cases, a proper data analysis method (i.e., statistical analysis) was performed.

Fig. 3. Whether the articles reported their results based on statistical tests

4.2 RQ2 - Data Analysis

The presence of statistical analysis was assessed for the studies in the reviewed papers to investigate if the results or conclusions were drawn from those analyses. We acknowledge that statistical tests are not necessarily a requirement for all studies in the health and wellbeing contexts, but such tests can be meaningful in order to interpret the results. If a study conducted such tests and reported on any aspect of the analysis (e.g., even p-values only), it was classified in the “Yes” category (see Fig. 3). On the other hand, if a study did not conduct any statistical tests, it was categorized under the “No” category.

Figure 3 shows the number of articles that performed statistical analysis to support the reported outcomes. The others included studies where observations (in many cases with a few participants) motivated specific outcomes, by only reporting on what was observed. In other words, although those studies provided evidence that supported a specific outcome, they did not report on a thorough analysis to provide stronger evidence in favour of those outcomes.

Fig. 4. Distribution of the number of articles contributed by authors based on our search.

4.3 RQ3 - Authors

We identified a total of 1406 unique authors across all the reviewed articles. Figure 4 shows the distribution of authors in terms of the number of articles published, based on our search and in the context of this paper. For example, it shows that over 1000 authors contributed to a single paper in our search, while 51 authors contributed 5 or more articles.
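A distribution like this can be computed from per-article author lists by first counting articles per author and then counting how many authors share each count. The sketch below uses toy data with placeholder names, not data from the review. It assumes author names are already disambiguated, which the text does not describe.

```python
from collections import Counter

# Toy data: each inner list holds the authors of one hypothetical article.
articles = [
    ["A. Author", "B. Author"],
    ["A. Author", "C. Author"],
    ["A. Author"],
    ["D. Author", "B. Author"],
]


def articles_per_author(author_lists):
    """Count how many articles each unique author appears on."""
    counts = Counter()
    for authors in author_lists:
        # set() guards against a name listed twice on the same article.
        counts.update(set(authors))
    return counts


def distribution(per_author):
    """Map 'number of articles' to 'number of authors with that many articles'."""
    return dict(sorted(Counter(per_author.values()).items()))


per_author = articles_per_author(articles)
print(len(per_author))           # 4
print(distribution(per_author))  # {1: 2, 2: 1, 3: 1}
```

In the toy data, two authors appear on one article each, one on two, and one on three.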

5 Discussion, Gaps, and Future Directions

We revisited a review of 443 articles on social robots in health/wellbeing contexts for adults to better understand the outcome of Human-Robot Interaction (HRI) studies in these contexts, as well as to study how those outcomes were supported and to see the breadth of authors involved.

The vast majority of the articles (365 out of 443 articles) reported positive outcomes based on HRI studies, about half of which were supported by in-depth data analysis and statistical tests. These results are promising and point to benefits of using social robots to support health/wellbeing.

However, the very high number of positive outcomes compared with neutral and negative outcomes may be partly due to under-reporting: many researchers may not report negative or neutral outcomes because journals and conferences tend to favour positive results, even though only some of those outcomes would be attributable to methodological issues. As HRI studies, and user studies in general, may be affected by many factors (e.g., participants are self-selecting, i.e., they have to self-enrol in the studies in order to meet the requirements of institutional ethics boards), it is reasonable to expect that more cases with neutral and negative outcomes exist that have not been published. Such reports could nevertheless be very beneficial for understanding how social robots can be improved in this context and beyond. In other words, a well-thought-out methodology that did not lead to positive results could still inform the research community about different factors that may negatively affect outcomes and guide future research. This is also supported in other areas of science. For example, Teixeira da Silva (2015) argues that negative results can indicate what does not work and that negative results are important in motivating scientific thought [29]. The author highlights the lack of a publishing channel for these negative results [29]. Furthermore, the general mindset in science that perceives negative results negatively could be a reason why negative results are not published as often [29]. This emphasis on the importance of reporting negative results is not recent and goes back many years in many scientific fields. For example, Smart (1964) argued for the importance of reporting negative results in research related to psychology, pointing out how negative results can inform researchers, and emphasized that negative results are often unpublished [30]. Fanelli (2012) argues that this can lead to a positive-outcome bias and may also affect how researchers treat their data and results [11].

Similarly, in HRI, we argue that the field could greatly benefit from learning about the negative outcomes of research, if the methodology is well thought out and executed. This could inform researchers about aspects of the robots (appearance, behaviour, etc.) that may not be desirable and/or be acceptable for users, including primary and secondary users. Additionally, these results could highlight user populations that may not have positive attitudes toward robots, or could point to methodologies or settings that may not work as well with social robots. A particular concern here is that researchers, students and faculty alike, who join the field of HRI for the first time, might unknowingly end up replicating unreported studies that previously gave neutral or negative results. Therefore, as HRI researchers, we need to be able to publish these negative or neutral results, as well as to see the value in such work when evaluating other researchers’ work in the role of reviewers. But in order to succeed in this endeavour, conferences, journals, and funding agencies need to recognize the importance of reporting neutral or negative results, and mechanisms have to be in place to be able to publish and acknowledge those results, similar to publications with ‘positive’ results. Otherwise, generations of HRI research might replicate studies that were never published because the results were inconclusive or negative, which is counterproductive to advancing research in HRI.

Further, the lack of statistical comparisons might be due in part to the limited number of participants in many studies. As reported in [13], many of these studies were based on relatively small sample sizes. Therefore, although many articles reported positive outcomes, only about half of them, namely those that performed more in-depth data analysis such as statistical tests (e.g., comparing experimental conditions, before-after studies, etc.), could provide strong evidence and support; the others remain valuable and informative. Future work based on larger sample sizes and methodologies that would allow for statistical tests and comparisons is needed to better understand the potential of using social robots in health/wellbeing contexts. This includes studies with multiple conditions with and without social robots, as well as studies where aspects of participants’ health and wellbeing or attitudes are evaluated before and after using the social robots. Those quantitative studies can complement other in-depth studies, including case studies, and other methodologies such as conversation analysis [12]. Ultimately, although in this review it was not possible to evaluate all data analysis methods and we focused only on statistical analysis (as the specific methods, especially those related to qualitative analysis, were often not reported in the reviewed articles), we acknowledge that statistical analysis is not a necessity in all HRI studies. Rather, the methods should be selected based on the research questions addressed, the setting of the study, the number of participants that could realistically be recruited, etc. For example, in many of the reviewed studies that examined the effect of social robots on users’ attitudes, moods, and other behavioural effects such as reducing depression, statistical tests comparing conditions with and without social robots would be required to provide evidence of positive effects of social robots. In the absence of such evidence, those studies can still be informative in terms of showing the impact that social robots can have, but the claims then need to be adjusted to be representative of the findings.

It is important to acknowledge that conducting long-term studies with social robots in health settings (similar to many other application-oriented settings), with specific user groups, and/or with a large number of participants can be highly challenging. Depending on the location of the study, social robots introduced in many environments (e.g., hospitals, care centres) might be perceived as a novelty. This may affect the number of participants who would be willing to join the research studies, or the number of facilities (hospitals, care centres, rehabilitation centres, etc.) that may approve such studies, depending on their attitudes towards robots as a novel technology and on their consideration of the effort required in terms of staff time and of concerns such as interruptions to the operation of the unit.

Additionally, the number of social robots available in a lab that is running a study is usually limited, in part because of the cost of the robots. This is another factor that can affect the recruitment of participants, compared with other types of technologies such as virtual agents and mobile applications, which can be made more widely available and used in parallel by multiple participants. Therefore, despite the need for long-term, large-scale studies with in-depth data analysis, there is definitely value in small-scale studies that report on general observations with a small number of participants and based on shorter interactions. In particular, studies conducted with a specific user group, in specific settings, etc., can act as a stepping stone for expanding these HRI studies to larger-scale future studies, for example by exposing different settings and user groups to social robots and reducing the hesitation that may be due to the novelty of social robots and general assumptions about them. Furthermore, while field studies in real-world settings such as hospitals and care centres are the ultimate goal in order to evaluate social robot technology in situ, lab-based studies and their outcomes are still important as initial steps for becoming technically and methodologically prepared before going out ‘into the wild’.

Limitations: This review had several limitations. We relied on authors’ reports of the results and data analysis methods. Therefore, our review did not identify cases where the reported results were not supported by the study or where in-depth data analysis was performed but not reported in the articles. Also, although we had a large multidisciplinary team who originally helped with the screening and data extraction steps, and despite carefully designing the search terms with the direct involvement of an experienced librarian, we might have missed some of the related articles. Furthermore, for such a large-scale review, some papers might have been missed due to human error during the screening stages, despite being screened in duplicate. This may especially affect the reported distribution of the number of papers contributed by different authors published before our cut-off date of February 6, 2021. Finally, data analysis methods other than statistics may be as valuable, or more appropriate, depending on a study’s research questions and context. Here, we had to rely only on the statistical analysis reported in the papers, as it was not possible to evaluate all approaches based on the information provided in the reviewed papers.

6 Conclusion

Social robots have great potential in health/wellbeing contexts and for supporting individuals. To better understand the results of HRI studies with social robots in this context, we reported on a large-scale systematic review in which we investigated the outcomes of HRI studies where a social robot was used in health/wellbeing contexts with adult participants. PRISMA guidelines were followed, and the reported results expanded on another systematic review that was conducted on the same set of articles, addressing other research questions. Here, we reported on the study outcomes and whether statistical tests were performed to support those outcomes. We also assessed the distribution of authors, which showed a broad range of authors who have contributed to this field. We identified a need for publishing studies with negative or neutral outcomes based on robust methodologies, as well as a need for performing studies with a larger number of participants and robust methodologies. This would allow for data analysis that can help better understand and inform how social robots can assist people in health/wellbeing contexts. We also highlighted that different research methodologies, both qualitative and quantitative, including studies with small sample sizes, or studies with neutral or negative outcomes, can be important to advance HRI research in the context of supporting adults in health and wellbeing and beyond, as long as the conclusions drawn match the findings of the studies.