1 Introduction

Industry 4.0 has ushered in a new era of process automation, redefining the role of people and transforming existing workplaces into previously unknown formats [1]. The number of robots in the manufacturing industry has been increasing steadily for several decades, and in recent years the number and variety of industries using robots have also grown [2,3,4]. In this context, operators will continue to be of great importance, so optimising the interactions between persons and robots will be crucial. In contrast to standard automation, collaborative robots (cobots) [5] enable close and safe interactions between humans and machines, taking advantage of the strengths of both sides.

ISO 8373 [6] defined a robot as a powered mechanism controlled via an interface that is programmable in two or more axes, moves within its environment with a degree of autonomy, and performs intended tasks. Dautenhahn [7, sec. 38.2] defined human–robot interaction (HRI) as ‘the science that studies people’s behaviour and attitudes towards robots in relation to the physical, technological, and interactive characteristics of robots, with the aim of developing robots that facilitate the generation of human–robot interactions that are at the same time efficient (in accordance with the original requirements of their intended area of use), acceptable to people, meet the social and emotional needs of their individual users, and respect human values’.

For robots to become allies in the day-to-day lives of operators, they need to provide positive and fit-for-purpose experiences through smooth and satisfying interactions [8,9,10,11]. In this sense, the user experience (UX) serves as the key link between persons and robots. ISO 9241-210 [12, sec. 2.15] defined UX as ‘a person’s perceptions and responses resulting from the use or anticipated use of a product, system or service’. This includes user emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviours, and achievements that occur before, during, and after use [12]. This means that humans must experience robots as fulfilling existing goals, and as entities that act efficiently and make people feel confident, safe, and comfortable while they are working together [13]. A clearer understanding of social cognitive constructs (such as determining intentionality, which suggests an intimate connection between social cues and the perception of robots as social agents) is required to fully optimise HRI [14]. This requirement emerges from a shift in the perception of robots from tools that extend human capabilities to teammates that collaborate with people [15,16,17,18].

Over the next few years, the coexistence between people and robots will increase [19]. This will take place in technologically enriched environments, where information will be exchanged "naturally" between humans and robots, giving rise to hybrid environments in which people move between the digital and real worlds [19]. The combination of human and robotic skills is becoming increasingly important [20]. While certain routine tasks or specific skills can be effectively supported by automation, local decisions or exceptional interventions often require human input. This need can arise from the extraordinary characteristics of a given situation, or from the complexity or implicit nature of the knowledge required to find a feasible solution within a limited period. To date, the combination of human and artificial resources has not been part of standard automation practice, in which (i) robots and people are usually kept at arm’s length from each other, and (ii) people must adhere to work procedures that are as rigid as the rest of the automated production environment. Symbiotic human–robot collaboration (HRC) goes beyond these constraints and requires a more responsive, transparent, and accessible environment. Thus, to improve HRI, the skills and expertise of humans must be combined with the accuracy and automation of robots, which work not as passive tools but as active partners [21].

To this end, it is important to optimise the UX between humans and robots. Evaluating UX will enable the continuous improvement of industrial workplaces.

1.1 Research Background on HRI Design and Evaluation

Numerous contributions have been written on HRI design and evaluation. The recent adoption of the Industry 5.0 concept by the European Commission [22] has increased interest in incorporating human factors. Nevertheless, the literature reports few attempts to bring human factors metrics together in a comprehensive way to evaluate UX in HRI.

A method for performing detailed ergonomic assessments of co-manipulation activities exists, and it could be applied to optimise the design of collaborative robots [23]. Maurice et al. [23] defined multiple ergonomic indicators to estimate the different biomechanical demands (e.g., muscle force, tendon deformation, muscle fibre length) that occur during whole-body activities (e.g., joint loads, joint dynamics, mechanical energy). These indicators are measured through virtual human simulations.

Amoretti et al. [24] stated that understanding the characteristics, advantages and disadvantages of different technical architecture paradigms and software strategies for their use in the robotics domain is crucial for the design, implementation, and successful use of cobotic software architectures.

There are several literature reviews in the context of HRI. The work by Hentout et al. [25] proposes a rough classification of the content of works in HRI into several categories and subcategories, such as hardware and software design of collaborative robotic systems, safety in industrial robotics and cognitive HRI. They stated that the goal of HRI is for robots to satisfy three fundamental requirements: (i) human intention should be easy for the robot to infer, (ii) the control should be intuitive from the human viewpoint, and (iii) the designed controller should be safe for both humans and robots. Simões et al. [21] listed a number of guidelines broadly classified into: (i) human operator and technology, (ii) human–robot team performance, and (iii) an integrated approach to designing HRC. As a general conclusion, they highlighted the importance of feedback in improving trust and blame attribution. They presented recommendations for the design of safe, ergonomic, sustainable, and healthy human-centred workplaces in which not only technical but also social and psychophysical aspects of collaboration are considered. Savela et al. [26] examined how the social acceptance of robots in different occupational fields had been studied and what kinds of attitudes the studies had discovered regarding robots as workers. Their results imply that attitudes toward robots are positive in many fields of work. Nevertheless, they indicated that there is a need for validated measures. Veling et al. [27] analysed the use of qualitative methods and approaches in the HRI literature to contribute to the development of a foundation of approaches and methodologies in the research area. Their review revealed six predominant qualitative data gathering methods in the HRI literature: qualitative observations, semi-structured interviews, focus groups, generative activities, reflective and narrative accounts, and textual/content analysis.

According to Moulières-Seban et al. [28], focusing on humans, tasks, robots, and system interactions when designing a cobotic system is necessary. These authors introduced a method of designing cobotic systems that is composed of four stages: (i) activity analysis, (ii) basic analysis, (iii) detailed design, and (iv) realisation, setup, validation and putting into service.

Numerous studies on robots in industries have been published, but most of them focus on safety and security aspects [25, 29, 30]. Other researchers have studied standardisation to improve workplaces [31, 32]. In this sense, the robotics industry is growing to a level where people and robots will be able to collaborate [33]. However, as Harriott et al. [33] pointed out, there is still no universal model that assesses the effect of this collaboration on people’s performance.

Furthermore, little attention has been paid in the literature to the human factors resulting from human–robot interaction. Emotional factors such as trust, satisfaction and mental workload have been poorly studied for the optimisation of collaborative robotic systems. Assessing these factors is beneficial for knowing how people feel before, during and after the interaction. In this way, robot actions could be adapted to people's needs, in line with the human-centred design approach.

The direction indicated by the European Commission’s Horizon 2020 framework establishes that gender equality must be promoted through changes in the culture of scientific institutions, changes in the composition of research teams in order to achieve equality, and changes in the content and design of research activities [34]. Nevertheless, the gender perspective has not been included in any of the identified studies. The integration of the gender perspective in research is necessary to avoid biases in which the realities, experiences, and expectations of one group of people (considering men as a reference) are constructed as the norm, thus producing partial and non-universal results. The gender perspective in research means integrating sex and gender variables into the scientific process, which has implications when considering gender norms, identities, and relations as explanatory variables of the analysed phenomenon. On the other hand, many people face accessibility barriers when interacting with robots, mainly people who do not usually interact with new technologies, elderly people, and users with disabilities [35]. Designing and developing robotic systems that ensure accessibility for all users with different abilities and needs is essential to making HRI systems more inclusive. Nevertheless, none of the identified reviews considers inclusivity as a necessary aspect to be addressed.

Therefore, the objective of this systematic literature review (SLR) is to identify evaluations of HRI that include a human-centred design perspective. Furthermore, we aim to understand the human factors that affect HRI in an industrial environment.

The organisation of this review paper is as follows. Section 2 provides information about HRI and interaction types. Additionally, it presents information about UX evaluation. Section 3 explains the search methodology used in this article. Section 4 details the results, which are the literature characterisations and the answers to the research questions (RQs). Section 5 presents the discussion, and Sect. 6 the research gaps and future research directions. In Sect. 7 the limitations are set out, and, finally, Sect. 8 presents the summary and conclusions.

2 Human–Robot Collaboration and UX Evaluation

HRC implies a deeper interaction between the two entities involved (i.e., humans and robots). In this context, interfaces play a central role as the main communication channels. A key aspect of collaboration is interaction, and discussing interactions also means discussing interfaces. High-quality HRI requires intuitive user interfaces [36]. On the one hand, operators can give robots simple inputs without any distraction from their main tasks. On the other hand, robots provide clear information to users, resulting in an immediate understanding and interpretation of data [19]. The adoption of intuitive interfaces becomes even more important in the case of closer collaborations between robots and humans. Humans naturally interact with the world using multiple resources simultaneously [37]. Consequently, interacting with cobotic systems should be easy for them [25].

Establishing what effective communication entails and determining the interfaces through which humans and robots can communicate are necessary. In this regard, we should define (i) the intended interactions between persons and robots, and (ii) the purpose of the information exchange. Both elements are largely outlined by the scope of the application and the functions of humans and robots [38], and they need to be adapted to different contexts.

Interfaces can generate different types of interactions [39]. For example, graphical communication can take place using specific devices (e.g., a monitor or a touch screen), voice-based communication can use natural language interfaces and gesture-based communication can use cameras suitable for tracking human hands. Depending on the typology of communication, human–robot interfaces can be classified into four categories: (i) visual displays (e.g., graphical user interfaces and augmented reality [AR] interfaces), (ii) gestural (e.g., hand and face movements), (iii) voice and natural language (e.g., auditory and text-based responses) and (iv) physical and haptic interactions [40].

HRI has been classified into different areas depending on the authors. Prati et al. [39] used the classification by Schmidtler et al. [41], who categorised HRI into: (i) human–robot coexistence, (ii) human–robot cooperation and (iii) human–robot collaboration (HRC). According to Prati et al. [39], these interfaces can also be related to the level of interaction provided. In particular, the first level of interaction (coexistence) is usually satisfied with graphical interfaces. The second level (cooperation) often requires more advanced interfaces, such as voice and gestures. Finally, the third level (collaboration) may require direct physical or haptic interaction to be both effective and natural.

On the other hand, Wang et al. [42] studied the relationship between interaction and risk. At the first level (coexistence), commitment and complexity are low, and safety is easy to guarantee because the operator is protected by physical boundaries. In the second level (interaction), the interaction is not significantly higher, but the safety risk is much higher, as the person and the robot start to share the same space. At the human–robot cooperation level, the interaction and the safety risk increase significantly, as direct contact between the operator and the robot is high. In a fully symbiotic human–robot partnership, the task is carried out by both parties in a collaborative manner, and it is inevitable that the human operator comes into direct contact with the robot. Therefore, the level of safety risk is higher than that in cooperation.

As indicated by Wang et al. [20], the development of solutions for HRC requires an analysis and synthesis framework containing (i) means to classify and characterise the problem and (ii) solution templates and guidelines for developing a solution that fits seamlessly into existing production requirements. The authors identified the fundamental elements in an HRC scenario as (i) agents (robots and humans actively participating in the production process), (ii) the working environment, which includes resources that are necessary for production but do not play an active role, as well as ambient conditions (light, noise, etc.), and (iii) parts and operations.

UX is a term that has become established in human–computer interaction research and practice. It denotes that the interaction with a contemporary technological system goes beyond usability and extends to the emotions before, during and after using the system. UX cannot be defined solely by studying the fundamental attributes of usability, such as effectiveness, efficiency, and user satisfaction. Measuring UX becomes a more complicated task when the target of the interaction is not just a technology system or an application but an entire environment.

UX is a key factor in the quality of a product, service or system [43,44,45]. Essential to its study is its evaluation, which refers to the application of a set of methods and instruments whose objective is to determine the perception of the use of a system or product, allowing the identification of aspects to improve or maintain [46].

In a study on “User eXperience Evaluation Methods”, Väänänen-Vainio-Mattila et al. [47] proposed a set of requirements for good UX evaluation in industrial environments. Although they stated that it is not possible to have a single method that meets all the requirements, because some of them may be contradictory or even unrealistic, it is worthwhile to identify the different evaluation methods used to assess HRI in the literature.

3 Research Methodology

As stated above, this literature review aims to identify evaluations of HRI that include a human-centred design perspective and to understand the human factors that affect HRI in an industrial environment. This can be achieved by performing an SLR and identifying all the available research papers within a specific period.

The guidelines proposed by Kitchenham et al. [48] were used to carry out the SLR, which comprises three phases: planning, organising, and documenting. These phases have their own components: (1) research questions, (2) data/information sources, (3) criteria for the inclusion/exclusion of selected papers, (4) quality assessment (QA), (5) systematic review strategy, and (6) extraction of data and synthesis. The following sections describe these phases.

3.1 Research Questions

In this SLR, two RQs were formulated; they must be clearly answered to complete the SLR successfully. Table 1 presents the RQs.

Table 1 Defined research questions

3.2 Data/Information Sources

There are three main groups of keywords. Scholars use varying terms to describe the concepts; therefore, a range of keywords were identified and combined to discover the different studies comprehensively and objectively. The following terms were used in the search for information:

“Human–robot Interaction” OR “Human–robot Collaboration” OR “Human–robot Coexistence” OR “Human–robot Workstation”

“Human factors” OR “user experience” OR “ux”

Evaluation OR Assessment

In this way, the following main search equation was created:

(“Human–robot Interaction” OR “Human–robot Collaboration” OR “Human–robot Coexistence” OR “Human–robot Workstation”) AND (“Human factors” OR “user experience” OR “ux”) AND (Evaluation OR Assessment).
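As an illustration, the following minimal Python sketch assembles this search equation from the three keyword groups before it is adapted to each database's syntax (see Sect. 3.3). The helper name and structure are our own illustrative scaffolding, not part of the reviewed methodology.

```python
# Minimal sketch: assembling the boolean search equation from the three
# keyword groups defined above. All names here are illustrative.

interaction_terms = [
    "Human-robot Interaction",
    "Human-robot Collaboration",
    "Human-robot Coexistence",
    "Human-robot Workstation",
]
factor_terms = ["Human factors", "user experience", "ux"]
evaluation_terms = ["Evaluation", "Assessment"]

def or_group(terms, quote=True):
    """Join terms with OR, quoting multi-word phrases when requested."""
    parts = [f'"{t}"' if quote else t for t in terms]
    return "(" + " OR ".join(parts) + ")"

query = " AND ".join([
    or_group(interaction_terms),
    or_group(factor_terms),
    or_group(evaluation_terms, quote=False),  # single words need no quotes
])
print(query)
```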

The electronic databases used for the search are shown in Table 2.

Table 2 Description of research databases

3.3 Literature Search

Each database was searched, adapting the equation as required by the database. One problem with this breadth of databases is the noticeable difference in their search functionality, which requires adjustments for each database, as detailed in Table 3. All articles had to meet these general requirements: peer-reviewed journal articles dated between January 2011 and the date this search was conducted (November 2021). Using peer-reviewed journal articles ensured validated knowledge [49], while the publication year limit was set to reduce the number of inappropriate hits. We did not expect to find any articles before 2011 that were significant for the review, because it is an emerging field and the interest in human factors is also recent. Moreover, we assumed that the latest work builds on that of previous years. We also excluded papers that were not mainly written in English or Spanish.

Table 3 Adopted search syntax for each database and number of results obtained

A total of 555 papers were identified, nearly half of which were found in the ScienceDirect database, whose disciplines focus on science, technology, and the social sciences.

3.4 Selection of Literature

The next step continued with the review protocol. The main motivation for applying inclusion and exclusion criteria was to ensure that the studies selected for the systematic review were related to the evaluation of HRI, taking human factors into account.

3.4.1 Criteria for Inclusion/Exclusion

Table 4 shows the criteria used in this review process. In addition to the language limitation (LL), we also ensured the credibility of the published papers by excluding journal articles that were not peer-reviewed (LP1). Another limitation was the publication year, which was set to reduce the number of inappropriate hits (LP3). The first step was then to exclude duplicate articles (LP2); a total of 117 duplicate articles were identified.

Table 4 Inclusion and exclusion criteria

The next step consisted of literature selection based on the article title and abstract, taking into account the directly related (DR), partially related (PR) and loosely related (LR) criteria. A total of 331 articles were excluded.

3.4.2 Quality Assessment (QA)

The next step was to conduct a QA. This process allowed us to identify whether the articles were related to the specific topic being reviewed and whether they were useful when considering an evaluation of UX in an industrial robotic environment. For this purpose, five QA questions were formulated, reviewed in detail and scored based on the analysis. Because of the nature of this type of experimentation, sample size and replicability were not established as crucial aspects in the QA: there is large variability in the sample sizes used in the different experiments because of the different application protocols that may exist [50]. Considering that QA does not exist in isolation but directly or indirectly serves to answer RQs and support conclusions in this study [51], and that our RQs concern the human factors in HRI, ways to evaluate them, and the consideration of gender and inclusiveness, we defined the QA questions by following the recommendations of Yang et al. [51], resulting in the following:

  • QA-1—Does the proposed topic relate to human factors in a human–robot interaction in an industrial environment? QA-1 aims to give higher scores to articles related to human factors in HRI and specifically in industrial settings.

  • QA-2—Does this research help identify human factors that affect human–robot interaction? QA-2 aims to value papers which at least assist in the identification of human factors influencing HRI, those that are most aligned with the human-centred design approach, and therefore with the purpose of this research.

  • QA-3—Is the proposed topic adequately described? QA-3 has been established to assess quality and rigour, valuing articles that are well written and describe their subject correctly.

  • QA-4—Does the proposed theme consider a gender or inclusive perspective? Directly aligned with the scope of this research, QA-4 aims to value papers that consider the gender perspective and inclusiveness.

  • QA-5—Does the research describe how to evaluate the user experience of human–robot interaction? QA-5 aims to assess those papers that present experiments where HRI evaluations are carried out from a UX perspective.

The five QA questions mentioned help in evaluating the selected studies in terms of their contributions to the present literature review. The aim of the QA was to facilitate the understanding of the studies’ appropriateness and usefulness to the current study. Nidhra et al. [52] proposed high-level quality criteria that provide specific scores for the findings, consisting of three types of ratings: high, medium, and low. These ratings are given by answering the QA questions.

A score of 2 was given to studies that fully met the quality standard, a score of 1 was given to studies that partially met the quality standard, and a score of 0 was given to the studies that did not meet the quality standard. Therefore, the maximum score for each study is 10 (i.e., 5 × 2 = 10), and the lowest possible score is 0 (i.e., 5 × 0 = 0).

In this SLR, we considered the articles that obtained a score higher than 7, a threshold that is reasonable for answering the RQs of this study and ensuring high-quality, reliable findings (Table 5); this yielded a total of 24 articles.

Table 5 Papers that obtained a quality score higher than 7
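To make the scoring scheme concrete, the following minimal Python sketch applies the 0/1/2 scale over the five QA questions and the >7 inclusion threshold described above. The paper names and scores are hypothetical, not taken from Table 5.

```python
# Minimal sketch of the QA scoring: each of the five questions scores
# 0 (not met), 1 (partially met) or 2 (fully met); papers scoring
# strictly above 7 (out of a maximum of 10) are retained.

QA_QUESTIONS = ["QA-1", "QA-2", "QA-3", "QA-4", "QA-5"]
THRESHOLD = 7

papers = {  # hypothetical scores for illustration only
    "paper_A": {"QA-1": 2, "QA-2": 2, "QA-3": 2, "QA-4": 1, "QA-5": 2},  # total 9
    "paper_B": {"QA-1": 1, "QA-2": 1, "QA-3": 2, "QA-4": 0, "QA-5": 1},  # total 5
}

def total_score(scores):
    assert all(scores[q] in (0, 1, 2) for q in QA_QUESTIONS)
    return sum(scores[q] for q in QA_QUESTIONS)

selected = [name for name, s in papers.items() if total_score(s) > THRESHOLD]
print(selected)  # ['paper_A']
```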

3.4.3 Systematic Process Review

Figure 1 describes the process carried out during the literature review. In the first phase, a total of 555 articles were identified, of which 117 were duplicates and were therefore discarded. In the second phase, screening was carried out by reading the titles and abstracts based on the previously defined inclusion and exclusion criteria (Table 4). A total of 438 titles and abstracts were reviewed, of which 331 were discarded. The next phase consisted of a complete reading of the 107 remaining articles, which were evaluated one by one based on the QA questions. After this evaluation, only articles scoring more than 7 were retained; the resulting 24 articles were analysed in depth for data extraction and synthesis.

4 Results

4.1 Literature Characterisation

4.1.1 Evolution in the Field

The earliest articles identified date from 2013; three articles were published that year: the studies [33, 53] and [54]. It is only in 2017 that publications studying the evaluation of human factors in industrial human–robot environments began to increase. In fact, more than 80% of the identified publications date from 2017 to 2021 (Fig. 2).

Fig. 1 Diagram of the data collection process, according to the guidelines in [48]

Fig. 2 Number of articles identified per year. Note that the search only includes articles published before mid-November 2021 and thus does not include all 2021 publications

4.1.2 Nature of Journals

The journals in which the most articles have been identified, with three articles each, are Procedia CIRP, a journal focused on publishing high-quality proceedings of CIRP conferences, and ACM Transactions on Human–Robot Interaction (THRI). Two articles have been identified in the IFAC PapersOnLine journal, two in Procedia Manufacturing, and two in Robotics and Computer Integrated Manufacturing (Fig. 3).

Fig. 3 Journals in which most articles have been identified

As for the impact of the publications, Table 6 shows that only one journal is not indexed. In other words, 96% of the articles belong to indexed journals. Of the 17 journals identified, 5 of them (29%) are classified in the first quartile, and 9 (53%) in the second quartile.

Table 6 Impact of the identified journals (nd = no data, SJR = Scimago Journal Rank)

4.1.3 Number of Citations per Article

The number of citations in the articles is relatively low compared with that in broader fields. The most cited article is that by Villani et al. [63] with 246 citations. The paper by Lasota et al. [55] is in second place with 97 citations. With 31 citations, the third most cited article is that by Hietanen et al. [68]. Table 7 lists the 10 most cited articles included in this review.

Table 7 Papers by citations, retrieved in December 2021

4.2 RQ-1: Is there an HRI Assessment Model that Includes Human Factors in Industrial Settings?

Although no widely validated assessment model has been identified, 24 papers in which an assessment of HRI in industrial settings is carried out have been determined. Five of these studies are theoretical frameworks, five are tools and 15 are experimental studies (Table 8). The study by Prati et al. [39] first presents a theoretical framework and then tools; this is why the article appears in both groups, making a total of 25 entries for only 24 articles.

Table 8 Classification of the identified papers taking into account the type of study

4.2.1 Theoretical Frameworks

This SLR has identified five recently created theoretical frameworks. The studies by Cohen et al. [64] and Villani et al. [63] date from 2018. The studies by Meissner et al. [59] and Lindblom et al. [11] are from 2020, and that by Prati et al. [39] is from 2021. As these are recent theoretical frameworks, it could be said that interest in the field is growing. In addition, the high impact of the journals involved shows the scientific community’s acceptance of and interest in HRI. Specifically, the study by Villani et al. [63] has 246 citations (as of December 2021).

Cohen et al. [64] proposed a theoretical framework for analysing and improving workplaces that focuses on three phases: observation, analysis, and reaction. They emphasised the importance of examining the inputs, whether from the operator or the workplace itself, analysing them and selecting how the reaction should take place on that basis. According to Schillaci et al. [53] and Kildal et al. [58], providing the right feedback is important for the interaction to be perceived satisfactorily by users. In this regard, Cohen et al. [64] underlined the selection of the reaction mode (i.e., the channel, frequency and intensity by which feedback should be given) based on observation and analysis of the different elements involved.

Villani et al. [63], on the other hand, placed safety at the centre of the system. According to Bo et al. [74] and Hentout et al. [25], safety is the foremost consideration in HRI. Industrial cobots interact and perform tasks with humans, creating close ties between the two. However, this close relationship changes the current paradigm regarding safety procedures and the separation of workspaces between humans and robots [75]. The safety of working with cobots remains a challenge today. Reducing the weight of their moving parts is one of the main factors to be considered when designing intrinsically safe cobots [76]. The sensorial apparatus of robots could also be improved [25], for example by using proximity sensors, to reduce risk during interactions.

To this end, the importance of designing intuitive interfaces has been emphasised. The information provided by robots should be adequate for users to be aware of the situation, understand the behaviour of the system and thus intervene in dynamic and unexpected situations. Villani et al. [63] stated that affective robotics could be suitable for guaranteeing an intuitive interface, alleviating the cognitive load of the user, as the robot would adapt to the person’s situation. Cohen et al. [64], Schillaci et al. [53] and Kildal et al. [58] reported that providing adequate feedback is indispensable in establishing bidirectional person-robot communication. Furthermore, Villani et al. [63] stressed the importance of having adequate design methods and introducing adaptive solutions for inclusive robotics.

Beyond feedback, the theoretical framework of Meissner et al. [59] showed the factors that influence worker acceptance in HRC contexts. They indicated that the most influential factors (the primary ones) are perceived risk, perceived benefits, and positive and negative emotions. They also pointed out that a number of secondary factors affect the acceptance process, such as object-related, subject-related and context-related factors. All these factors influence people’s attitudes towards the acceptance of the system [59].

In relation to system acceptance, the study by Lindblom et al. [11] was based on Donald Norman’s seven-stage action model [77, 78]. In the model, the person starts from an intention, which is executed and subsequently has consequences, such as perception, interpretation, and evaluation of the context. In this sense, designers develop systems with specific intentions for users. However, this does not always happen, and people’s resulting emotions may differ from the intention with which the system is designed. Along these lines, Lindblom et al. [11] constructed the ANEMONE theoretical framework. It consists of a phased, iterative procedure that focuses on (i) determining whether people can perceive, understand, and predict robots’ intentions and actions and on (ii) providing relevant insight into why something works or does not work in a particular use situation. The goals are to provide guidance on how UX evaluation can be conducted and to facilitate an understanding of why something does or does not work by identifying UX issues.

Prati et al. [39] proposed a structured UX-oriented method to investigate human–robot dialogue. The method aims to introduce a set of UX techniques that support interface design. In accordance with the human-centred design approach, the method places the user at the centre and follows an iterative process. The first step of the method consists of requirements gathering, which involves a multidisciplinary team, user analysis (for which a set of tools is proposed), activity analysis and interaction visualisation. The second step consists of interface design, subsequent prototyping and, finally, UX evaluation. For the latter, the authors proposed user testing, but they did not present any process, technique, or tool to carry out the evaluation.

4.2.2 Operational Tools

This SLR has also identified five tools. The five articles presenting the tools are fairly recent: the study by Charalambous et al. [61] dates from 2017; Von Der Pütten et al. [57], from 2018; Gualtieri et al. [69], from 2020; and Qbilat et al. [73] and Prati et al. [39], from 2021.

Charalambous et al. [61] proposed a system to determine industrial maturity level. The goal is to develop a new human actor readiness level tool for system design practitioners and thus optimise the successful implementation of industrial HRC.

Von Der Pütten et al. [57] developed and validated a new measure of self-efficacy in HRI. After conducting several experimental studies, they proposed a questionnaire consisting of 18 items. Participants have to rate the items on a six-point Likert scale [79].

In the study by Gualtieri et al. [69], a collection and classification of prerequisites and design guidelines were developed. These guidelines could help application designers properly develop and evaluate safe, people-centred, and efficient collaborative assembly workstations. Qbilat et al. [73] proposed HRI accessibility guidelines. These guidelines were evaluated by 17 HRI designers and/or developers. The authors developed a questionnaire consisting of nine five-point Likert-scale questions and six open-ended questions to evaluate the proposed guidelines for developers and designers in terms of four main factors: usability, social acceptance, UX and social impact.

Prati et al. [39] presented two design tools in addition to the theoretical framework. The first is the user/task matrix, which is used to synthesise in a chart all the information about users and tasks, as well as operational conditions; this tool helps designers define suitable interfaces. The second is experience maps, which represent a synthetic visualisation of the entire end-to-end experience that a "generic" user goes through to achieve a given goal. These maps are used to understand general human behaviour, as opposed to journey maps, which are more specific and focused on aspects related to a specific business.

4.2.3 Experimental Studies

The SLR has also identified 15 experimental studies evaluating HRI. Interest in experimental studies is growing, as can be seen in the results: more than 50% of the studies were carried out in the last three years (2019–2021). Furthermore, the high impact of the journals shows the scientific community’s interest in the field.

In general terms, the experiments are divided into three phases: (i) prior to the execution of the task, (ii) during the execution of the task and (iii) after the execution of the task. However, no validated evaluation model has been identified, as each of the experimental studies uses a different process. A comparison is shown in Table 10.

Regarding the phase prior to task execution, none of the studies conducted an expert evaluation using tools such as heuristics. Nor were data collected from experts’ or users’ perspectives. It is important to note that experience consists of the emotions before, during and after an interaction, as stated in the definition of UX in ISO 9241-210 [12].

In the phase during execution, most studies used robots of various kinds, commonly robotic arms (defined in Table 9). However, the study by Baskaran et al. [66] was an evaluation carried out using Siemens Process software, so the authors did not use any robots. Similarly, Colim et al. [60] performed their experimentation on a workstation without any robot, as did Almeida et al. [67], who focused on interfaces.

Table 9 Type of robot used in each experimental study
Table 10 Summary of the reviewed experimental studies

Performance is measured in seven experiments. Psychophysiological measures are only used in one of the experiments. In seven of the experiments, observation during the task is also carried out, from which qualitative information about the interaction is obtained.

As for the phase after execution, data collection of the participants’ perceptions is carried out mainly through questionnaires (on 13 occasions). In one of the experiments, an interview is also carried out.

4.2.4 Sample Size and Gender Perspective

The sizes of the samples used in the different case studies and the numbers of men and women have also been collected. Table 11 shows the number of people who participated in each of the case studies and the distribution between genders. There were 450 participants in total: 134 women (30%) and 291 men (65%); the gender of the remaining participants was not reported. Only two studies (one of the case studies of Aromaa et al. [62] and the study by Tang et al. [65]) use equal samples. Furthermore, the studies do not show the results disaggregated by gender, preventing us from determining whether there are differences between genders when interacting with robots.

Table 11 Sample size in each experimental study (nd = no data)

4.3 RQ-2: What Human Factors Does It Include and How Does It Assess Them?

Human factors is the scientific discipline concerned with the interaction between humans and artifacts and with the design of systems in which people participate [80]. The purpose is to match systems, jobs, products and environments to the physical and mental abilities and limitations of people [80]. According to Beith [81], human factors focus on system usability and on designing system interfaces to optimise the users' ability to accomplish their tasks error-free in a reasonable time and, therefore, to accept the system as a useful tool. Applying human factors principles leads to designs that are safer, more acceptable, more comfortable, and more effective for accomplishing their given tasks [81]. Table 12 shows the factors evaluated in the experimental studies and the ways in which they were evaluated. Four groups of measures have been identified: (i) performance, (ii) posture, (iii) robot-related factors and (iv) emotion-related factors.

Table 12 Factors and techniques evaluated in each experimental study

4.3.1 Performance

Performance is the most evaluated factor; it refers to how people perform their task. Users’ performance is shaped by their capabilities (e.g., memory, attention, flexibility), and it is a consequence of the human factors at play in the system. Therefore, a human-centred approach also involves taking performance into account. The indicators in this group are directly or indirectly reflected in human performance and hence provide insights into the human factors and UX. Eight indicators to evaluate it have been identified in seven of the fifteen studies (a computational sketch of several of these indicators follows the list).

(i) Task execution time is measured in seven of the fifteen studies (47%). It is thus the most evaluated indicator among the studies identified; in fact, it is measured by all the studies that assess performance.

(ii) The number of interactions performed is measured once, i.e., in the study by Daniel et al. [54]. As the authors stated, this variable shows the quality of the user interface and offers insight into the possibilities for incorrect data input [54].

(iii) Errors are measured once, i.e., in the study by Almeida et al. [67].

(iv) Robot idle time is measured twice, i.e., in the studies by Lasota et al. [55] and Hietanen et al. [68].

(v) Person idle time is measured once, i.e., in the study by Lasota et al. [55].

(vi) Variability in production times is measured once, i.e., in the study by Colim et al. [60].

(vii) Production rate is measured once, i.e., in the study by Colim et al. [60]. According to the authors, this is a key indicator measuring performance in terms of pieces produced within a specified time interval (e.g., number of preforms per hour).

(viii) The ratio between the time required to complete the task with and without the robot is measured once, i.e., in the study by Beschi et al. [71]. As they stated, this indicator verifies whether human productivity is also affected by robot movement during unsynchronised tasks [71].
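As announced above, the following minimal Python sketch illustrates how several of these indicators can be derived from timestamped task logs. The event data and function names are hypothetical, not taken from the reviewed studies.

```python
# Minimal sketch: deriving task execution time, idle times, and the
# with/without-robot time ratio from hypothetical timestamped intervals.

def task_execution_time(start, end):
    """Total task duration in seconds."""
    return end - start

def idle_time(busy_intervals, start, end):
    """Idle time = task duration minus time covered by non-overlapping busy intervals."""
    busy = sum(e - s for s, e in busy_intervals)
    return (end - start) - busy

# Hypothetical timestamps (seconds from task start)
task_start, task_end = 0.0, 120.0
robot_busy = [(0.0, 40.0), (55.0, 110.0)]   # intervals in which the robot acted
human_busy = [(10.0, 70.0), (75.0, 120.0)]  # intervals in which the person acted

print(task_execution_time(task_start, task_end))    # 120.0
print(idle_time(robot_busy, task_start, task_end))  # robot idle time: 25.0
print(idle_time(human_busy, task_start, task_end))  # person idle time: 15.0

# Ratio between task completion times with and without the robot [71]
time_with_robot, time_without_robot = 120.0, 150.0
print(time_with_robot / time_without_robot)         # 0.8
```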

4.3.2 Posture

Related to anthropometrics and biomechanics, this group focuses on eliminating harmful and unsafe work practices and aims to study human capabilities and limitations in order to adapt the task to the person while minimising fatigue [82]. Four of the fifteen studies analyse the posture of the person. Six indicators have been identified from the studies (a computational sketch of two of them follows the list).

(i) Postural load is measured once, i.e., in the study by Harriott et al. [33]. It measures the percentage of time the participants spent with their trunks flexed at an angle of more than 45º from the vertical [33]. The longer a participant spent with severe trunk flexion, the higher the physical workload [83].

(ii) Variance in posture is measured once, i.e., in the study by Harriott et al. [33].

(iii) Total movement is measured once, i.e., in the study by Harriott et al. [33]. It is presented as the total number of times the participant stood up and crouched down.

(iv) Vector magnitude is measured once, i.e., in the study by Harriott et al. [33]. As stated by the authors, it is a measure of overall physical activity that combines acceleration from the three axes of movement. Vector magnitude measures participants’ physical movement in the evaluation area.

(v) Rapid upper limb assessment (RULA) is an observational method [84] to evaluate physical work-related upper limb disorders [85]. Its application involves the assessment of a worker’s posture, as well as the exerted forces, the repetitiveness of movements, and external loads (e.g., handling heavy materials) [85]. It is measured in three studies: those by Aromaa et al. [62], Tang et al. [65] and Colim et al. [60].

(vi) The revised strain index (RSI) is also measured in the study of Colim et al. [60]. The RSI consists of a five-variable model using continuous multipliers. The five variables/risk factors measured are the intensity of exertion (force), exertions per minute (frequency), duration per exertion, hand–wrist posture and duration of a task per day [86].
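As a complement to the list above, the following minimal Python sketch shows how two of these indicators can be computed: postural load as the share of samples with trunk flexion beyond 45º (following the description of Harriott et al. [33]) and accelerometer vector magnitude. All sample values are hypothetical.

```python
# Minimal sketch: postural load and vector magnitude from hypothetical samples.
import math

# Hypothetical trunk-flexion angles (degrees), sampled at a fixed rate
trunk_flexion = [10, 20, 50, 60, 48, 30, 15, 55]

def postural_load(angles, threshold_deg=45.0):
    """Percentage of samples with trunk flexion beyond the threshold."""
    over = sum(1 for a in angles if a > threshold_deg)
    return 100.0 * over / len(angles)

# Hypothetical tri-axial accelerometer samples (ax, ay, az), in m/s^2
accel = [(0.1, 0.2, 9.8), (1.5, 0.3, 9.6), (0.4, 2.1, 9.9)]

def vector_magnitude(sample):
    """Combine acceleration from the three movement axes into one magnitude."""
    ax, ay, az = sample
    return math.sqrt(ax ** 2 + ay ** 2 + az ** 2)

print(postural_load(trunk_flexion))                    # 50.0 (4 of 8 samples)
print([round(vector_magnitude(s), 2) for s in accel])  # per-sample magnitudes
```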

4.3.3 Robot-Related Factors

This group includes the characteristics that describe the nature of the system and that, therefore, influence human perception. Factors related to the robot are measured in five of the fifteen studies. In total, seven factors have been identified.

(i) Anthropomorphism is measured using the Godspeed questionnaire [87] in two studies. As Bartneck et al. [87] stated, anthropomorphism refers to the attribution of a human form, human characteristics, or human behaviour to nonhuman things, such as robots.

(ii) Animacy is measured in the same two studies using the Godspeed questionnaire [87]. As Bartneck et al. [87] stated, the goal of many robotics researchers is to make their robots lifelike.

(iii) Likeability is also measured in the same two studies using the Godspeed questionnaire [87]. As Bartneck et al. [87] stated, the way in which people form positive impressions of others is, to some degree, dependent on the visual and vocal behaviour of the targets; positive first impressions of a person often lead to more positive evaluations of that person.

(iv) Perceived intelligence is also measured in the same two studies using the Godspeed questionnaire [87]. As Bartneck et al. [87] stated, interactive robots face tremendous challenges in acting intelligently. The reasons can be traced back to the field of artificial intelligence (AI): robots’ behaviours are based on methods and knowledge developed within AI.

(v) Perceived safety is measured in three studies. In two of them, it is measured using the Godspeed questionnaire [87]. A key issue for robots interacting with humans is safety [87]. This topic has received considerable attention in the robotics literature, particularly in terms of the systems and standards established for both industrial robots and service robots intended for use in the home. In the study by Lasota et al. [55], a self-generated four-item questionnaire was used to measure perceived safety.

(vi) Usability is measured twice. In the study by Danielsson et al. [56], the system usability scale (SUS) questionnaire [88] was used, and in the study by Almeida et al. [67], a questionnaire based on the IBM Computer Usability Satisfaction Questionnaire was used. ISO 9241-11 defines usability as ‘the extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use’ [89].

(vii) Learnability is measured once, i.e., in the study by Danielsson et al. [56], using the SUS questionnaire [88]. According to Joyce [90], learnability considers how easy it is for users to accomplish a task the first time they encounter the interface and how many repetitions it takes for them to become efficient at that task.

In summary, the seven robot-related factors have been measured through four questionnaires, which are as follows:

(i) The Godspeed questionnaire [87], used to measure anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety. It was used twice, i.e., in the studies by Schillaci et al. [53] and Joosse et al. [72].

(ii) The SUS questionnaire [88], used to measure usability and learnability. It was used once, i.e., in the study by Danielsson et al. [56].

(iii) A questionnaire based on the IBM Computer Usability Satisfaction Questionnaire, used once, i.e., in the study by Almeida et al. [67], to measure usability and satisfaction.

(iv) Another questionnaire, used in the study by Lasota et al. [55] to measure satisfaction with robots as teammates and to determine perceived safety and comfort.

4.3.4 Emotion-Related Factors

This group includes emotional responses resulting from human–robot interaction, which evaluate the hedonic quality [91] of the system. Eight of the fifteen studies measured factors related to emotions. A total of five factors have been identified.

(i) Trust is measured once, i.e., in the study by Daniel et al. [54]. The authors asked participants some questions adapted from the web accessibility initiative (WAI) Site Usability Testing Questions [92].

(ii) Satisfaction is measured in two studies, i.e., those by Lasota et al. [55] and Almeida et al. [67]. In the former [55], the authors used a questionnaire that measured satisfaction with robots as teammates; in the latter [67], the authors used a questionnaire based on the IBM Computer Usability Satisfaction Questionnaire.

(iii) Mental workload is measured in three studies. All of them, i.e., those by Aromaa et al. [62], Pantano et al. [70] and Harriott et al. [33], used the NASA Task Load Index (NASA-TLX) [93] questionnaire (a scoring sketch follows this list). Harriott et al. [33] also used physiological measures, such as heart rate and heart rate variability.

(iv) Physical and mental stress is measured once, i.e., in the study by Hietanen et al. [68], using a self-generated questionnaire.

(v) Perceived risk is also assessed once, i.e., in the study by Beschi et al. [71], using a self-generated questionnaire.
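As announced in item (iii), the following minimal Python sketch illustrates how an overall NASA-TLX [93] workload score is commonly computed. It shows the unweighted ("raw TLX") variant; the original procedure additionally weights each subscale using 15 pairwise comparisons. The ratings are hypothetical.

```python
# Minimal sketch: raw (unweighted) NASA-TLX score from six subscale ratings.

SUBSCALES = [
    "mental_demand", "physical_demand", "temporal_demand",
    "performance", "effort", "frustration",
]

ratings = {  # hypothetical 0-100 ratings from one participant
    "mental_demand": 70, "physical_demand": 30, "temporal_demand": 55,
    "performance": 40, "effort": 65, "frustration": 25,
}

raw_tlx = sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)
print(raw_tlx)  # 47.5
```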

4.3.5 Types of Measurements

A classification of the measurements used was made to understand the types of measurements applied and the interest in them. In addition to those previously mentioned, other measurements have been identified. The classification distinguishes objective, subjective, qualitative, and quantitative measures.

Table 13 shows that the subjective and quantitative measures are basically questionnaires. Various questionnaires have been identified according to the indicator to be measured.

(i) Path-following precision is measured in the study by Almeida et al. [67]. According to the authors, this indicator refers to 3D path-following precision with the tip of a stick.

(ii) The ratio between touch screen and keyboard interactions is measured once, i.e., in the study by Daniel et al. [54]. The ratio between the interactions with the touch screen and the keys on the teach pendant or the robot controller may indicate a tendency towards divided attention caused by the user interface [54].

(iii) The percentage of concurrent motion is measured once, i.e., in the study by Lasota et al. [55].

(iv) The distance between the human and the robot verifies whether human motion is affected by robot movement during unsynchronised tasks. It has been measured twice, i.e., in the studies by Lasota et al. [55] and Beschi et al. [71] (a computational sketch follows this list).

(v) Siemens Process software can virtually validate manufacturing concepts up front. Through this platform, an evaluation of the interactions between associates working on the assembly line, equipment and materials flow can be performed [66].
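As announced in item (iv), the following minimal Python sketch computes the human–robot separation from synchronised position tracks. The positions are hypothetical, and the minimum separation is reported as a simple summary of how closely the two approached during the task.

```python
# Minimal sketch: Euclidean human-robot distance per time step.
import math

# Hypothetical synchronised 3D positions (metres) per time step
human_positions = [(0.0, 0.0, 1.0), (0.3, 0.1, 1.0), (0.6, 0.2, 1.0)]
robot_positions = [(1.0, 1.0, 1.2), (0.9, 0.8, 1.2), (0.8, 0.6, 1.2)]

def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

distances = [distance(h, r) for h, r in zip(human_positions, robot_positions)]
print([round(d, 3) for d in distances])  # per-step separation
print(round(min(distances), 3))          # minimum separation over the task
```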

Table 13 Type of measurements and tools used in each study

Regarding subjective measurements, we can classify them into (i) quantitative and (ii) qualitative, as shown in Table 13. Questionnaires are particularly efficient methods of application and analysis that are commonly used for user-driven assessments [94]. They allow for efficient quantitative measurements of product characteristics, as they are usually measured using Likert scales [79] or semantic pairs [95] (a scoring sketch for one such scale is given after the list below). Questionnaires measure users’ perspectives and do not necessarily require any kind of monitoring.

(i) A general interest questionnaire is used in the study by Danielsson et al. [56]. Participants filled out questionnaires with six questions regarding general interest and five questions regarding the information displayed on the screen.

(ii) Observation is used in two studies, i.e., those by Danielsson et al. [56] and Kildal et al. [58].
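To illustrate how such Likert-scale questionnaires yield quantitative scores, the following minimal Python sketch applies the conventional scoring of the system usability scale (SUS) [88] mentioned above: ten items rated 1–5, odd items contributing (rating − 1), even items contributing (5 − rating), and the sum multiplied by 2.5 to give a 0–100 score. The ratings are hypothetical.

```python
# Minimal sketch: conventional SUS scoring from ten hypothetical 1-5 ratings.

ratings = [4, 2, 5, 1, 4, 2, 5, 2, 4, 1]  # items 1..10

contributions = [
    (r - 1) if i % 2 == 0 else (5 - r)  # 0-based index: even index = odd-numbered item
    for i, r in enumerate(ratings)
]
sus_score = 2.5 * sum(contributions)
print(sus_score)  # 85.0 on the 0-100 SUS scale
```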

5 Discussion

In the context of HRC, the role of people remains central. This mutual relationship between people and robots results in a powerful collaboration framework with a positive impact on productivity and flexibility. Using a human-centred approach is essential to knowing people’s perceptions and thus bringing out the best in them during interactions. Therefore, analysing UX in these environments is crucial. Users must perceive robots as allies so that they can leverage the strengths of both for common goals. Collaborative robots enable closer and safer interactions between humans and machines, so that both sides can benefit from each other’s strengths. An SLR was conducted to learn how evaluations of HRI occur. The review identified a total of twenty-four articles, of which five were theoretical frameworks, five were evaluation techniques or tools and fifteen were experimental studies.

The theoretical frameworks identified are consistent, and they present similarities. Although the articles are recent, their similarities make us understand that there is a common line within this field of research.

In general terms, the importance of safety is emphasised to ensure an effective HRC. However, the safety perceived by the person should be considered equally important because if the user does not perceive it as such, the interaction will not be satisfactory and the UX will not be evaluated positively.

According to Norman [77, 78], when a person interacts with any object, in this case a robot, they start from an intention, which is executed and subsequently has consequences, such as the perception, interpretation and evaluation of the context. Along this line, Lindblom et al. [11] built a theoretical framework consisting of a phased and iterative procedure with the aims of providing guidance on how to carry out a UX evaluation and facilitating an understanding of why something works or does not work by identifying UX problems.

The human factors that influence HRI have also been identified, and those described by Meissner et al. [59] stand out. The most influential factors affecting the acceptance of workers in HRC contexts are perceived risk, perceived benefits, and positive and negative emotions. These factors influence attitudes towards system acceptance.

The present paper reveals how HRI evaluations have been performed to date. There is a lack of experiments that evaluate UX before, during and after interactions. It would be appropriate to include the evaluation in these three phases and with a comprehensive approach, i.e., by using different measures (qualitative and quantitative, objective and subjective), to better interpret the data obtained from each measurement.

In the experiments on HRI assessment, only a single study using physiological monitoring was identified. The use of physiological monitoring could be beneficial in obtaining objective data on user emotions. Compared with traditional methods, including physiological monitoring in UX testing has limitations in terms of price, complexity and time required to ensure that the assessment is done properly. Moreover, physiological signals require some degree of interpretation, as the output must be processed to move from the raw data to actionable insights [50].

It can be concluded from this study that evaluating UX by combining methods, tools and physiological devices could be beneficial, as the interaction would be evaluated considering different types of information. Physiological devices provide quantitative and objective data about the user at the moment of the interaction, while questionnaires help obtain quantitative and subjective data that are useful for understanding user perceptions of a system.

Finally, this study highlights the need to consider the gender perspective. A marked imbalance was observed in the samples used in the experimental studies, which consisted of 65% men and only 30% women. To contribute to the reduction of the gender digital gap, researchers should use equal samples and disaggregate the data obtained by gender to determine whether differences between genders exist in HRI.

6 Research Gaps and Future Research Directions

6.1 Research on the Correlation Between Dynamics of Robots and User Perceptions

Dynamic variables have a significant influence on perception studies. An important factor in HRI is the speed at which robots act. Several works in the literature have examined the appropriate speeds for robotic actions [58, 80, 81], determining that robots should act more slowly than people. According to Joosse et al. [72], a possible reason for people’s preference for a slower robot speed may be that not all robots give a clear indication of when they are going to stop, i.e., they do not provide feedback on their intentions. One way to overcome this is to equip robots with functional feedback systems so that they can convey their intentions. Therefore, there is an opportunity for research into dynamic variables and the correlation between user perceptions and performance. A further opportunity is evaluating people’s abilities to understand robots, the degree of accuracy in predicting the robots’ actions, and whether the sequence of actions performed is appropriate.

According to Lindblom et al. [11], UX is not absolute, which means that each person may perceive their experience differently. Helping a person understand a robot, perceive its intention, predict the sequence of actions that will take place, evaluate its actions, determine whether any action is necessary, specify a sequence of actions and perform these actions is necessary to optimally perceive UX in HRI. To this end, and according to Cohen et al. [64] and Villani et al. [63], robots must provide adequate feedback so that users can understand and predict their actions. There is a research opportunity to analyse how this feedback can be presented according to the robots’ actions and the people’s emotions. In this sense, affective robotics could be used to provide useful feedback by determining the appropriate channel, frequency, and intensity of interaction.

6.2 Research on the Evaluation of HRI in Design and Interaction

Operators need to have positive and fit-for-purpose experiences through trust-based, smooth, safe, and satisfying interactions in order to integrate robots as natural parts of their daily lives. Therefore, evaluation is a key aspect of ensuring a good UX. According to Gammieri et al. [98], virtual reality (VR) and AR are effective tools that are capable of simulating industrial cobotic systems with a high level of immersion. Such tools can simulate HRI safely and economically through a digital twin, even in early design phases when the workplace is still under development.

In conclusion, this SLR identified the lack of a structured evaluation method adapted to different HRI contexts and stages of the design process. Such a method would need to function in a virtual context, since the HRI system might be designed virtually in the early stages of the design process, and it would also need to be operational at later stages, when the system is, for example, in a laboratory or even in a real environment. The framework proposed by Prati et al. [39], called the UX cycle in HRI, is the closest approach; however, it has shortcomings in the evaluation phase.

Meissner et al. [59] identified several factors influencing the individual in the context of collaborative robotics in industry. There is an opportunity to investigate how these factors can be evaluated and, more specifically, the correlation between the influencing factors and already validated assessment questionnaires. For example, perceived risk could be identified using questionnaires measuring perceived safety (the Godspeed questionnaire [87] or the questionnaire used in the study by Lasota et al. [55]) or confidence (SUPR-Q [67] or UEQ+ [100]). Physical and mental relief could be related to ease of use, which could be measured by the Perceived Usefulness and Ease of Use Questionnaire [101], the PSQ [102], the ASQ [103], and the USE questionnaire [104]; it could also be related to intuitive use (measured by UEQ+ [100]) or cognitive effort (measured by the DEEP questionnaire [105]). The perception of progress could be measured through efficiency (SUMI [106], WAMMI [107], UEQ [94], UMUX [108], UEQ+ [100]) and effectiveness (UMUX [108]). In this sense, another line of research could study positive and negative emotions and the use of physiological tools.
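
One hypothetical way to make this line of research concrete is to encode the factor-to-instrument pairings suggested above as an explicit data structure from which a study protocol can select candidate questionnaires. The Python sketch below is illustrative only; the pairings mirror the suggestions in the text and are not validated mappings.

    # Candidate pairings between influencing factors [59] and validated
    # questionnaires, as suggested in the text; illustrative, not validated.
    CANDIDATE_INSTRUMENTS: dict[str, list[str]] = {
        "perceived_risk": ["Godspeed (perceived safety)", "Lasota et al. safety items"],
        "confidence": ["SUPR-Q", "UEQ+"],
        "physical_mental_relief": ["Perceived Usefulness and Ease of Use", "PSQ", "ASQ", "USE"],
        "intuitive_use": ["UEQ+"],
        "cognitive_effort": ["DEEP"],
        "perceived_progress": ["SUMI", "WAMMI", "UEQ", "UMUX", "UEQ+"],
    }

    def instruments_for(factors: list[str]) -> set[str]:
        """Union of candidate questionnaires covering the requested factors."""
        return {q for f in factors for q in CANDIDATE_INSTRUMENTS.get(f, [])}

    print(sorted(instruments_for(["perceived_risk", "cognitive_effort"])))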

The lack of experiments evaluating the temporal nature of UX, i.e., before, during and after interactions, provides an opportunity to investigate how evaluation can be incorporated into these three phases. In particular, emphasis on the phase before the interaction is necessary, because an interview was conducted prior to task execution in only one experiment (Table 12).

It has also been observed that, during the interaction, mainly performance-related aspects are evaluated, whereas emotions are hardly examined. The retrospective and subjective evaluation facilitated by questionnaires and interviews is not, on its own, an optimal approach, because it does not measure UX at the moment when the interaction between a person and a robot occurs and can therefore be prone to human error caused by inaccurate recall. Using physiological monitoring to assess UX during interactions is therefore essential. According to Neumann et al. [109], psychophysiological measures are more objective than self-report measures, such as questionnaires. The opportunities offered by physiological signals are increasing thanks to the evolution of sensors and signal processing [110].
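
As a minimal illustration of anchoring such continuous measures to the interaction itself, the sketch below time-stamps the three phases of an experimental session so that a synchronised sensor stream can later be segmented into before/during/after windows. The phase names and the logging approach are our assumptions for illustration, not a protocol from the reviewed studies.

    import time
    from dataclasses import dataclass, field

    @dataclass
    class PhaseLog:
        """Ordered (phase, timestamp) markers for one experimental session."""
        events: list[tuple[str, float]] = field(default_factory=list)

        def mark(self, phase: str) -> None:
            self.events.append((phase, time.monotonic()))

        def window(self, phase: str) -> tuple[float, float]:
            # Start of the phase and start of the next marker (or now),
            # usable to slice a synchronised physiological recording.
            start = next(t for p, t in self.events if p == phase)
            later = [t for _, t in self.events if t > start]
            return start, (min(later) if later else time.monotonic())

    log = PhaseLog()
    log.mark("before")   # e.g., pre-task interview starts
    log.mark("during")   # task execution starts
    log.mark("after")    # post-task questionnaire starts
    print(log.window("during"))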

6.3 Research on the Differences Between Genders when Interacting with Robots

In the experimental studies, the gender variable was not studied. Investigating the gender digital gap and addressing it are therefore considered important.

Women continue to be underrepresented in technology compared with their number in the overall workforce [111]. As indicated by Holtzblatt et al. [112], research shows that a diverse and inclusive workforce correlates with higher innovation, creativity, revenue, and profit [113, 114]. Bala et al. [115] stated that today’s workforce needs to be built in a holistic manner that ensures a diverse group of people creates the technology of the future. From the perspective of HRI analysis, it is appropriate to focus on the differences in the behaviour, perceptions and emotional effects experienced by women and men when faced with different designs of these machines.

For all these reasons, it is necessary to evaluate UX during HRI in industrial environments using a gender-balanced sample and to disaggregate the results obtained by gender. Doing so will help identify whether there are differences in HRI between genders, so that workplaces can be designed in a gender-inclusive way that mitigates the gender digital divide.
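
By way of illustration, a gender-disaggregated analysis of questionnaire scores could be as simple as the following sketch. The data, column names and choice of test are assumptions for illustration; a real study should also consider effect sizes, non-parametric alternatives and non-binary gender categories.

    import pandas as pd
    from scipy import stats

    # Hypothetical per-participant UX scores; in practice these would come
    # from a questionnaire export. Values and column names are illustrative.
    df = pd.DataFrame({
        "gender": ["female"] * 5 + ["male"] * 5,
        "ux_score": [4.1, 3.8, 4.4, 3.9, 4.2, 3.5, 3.7, 3.2, 3.9, 3.4],
    })

    # Descriptives disaggregated by gender.
    print(df.groupby("gender")["ux_score"].agg(["count", "mean", "std"]))

    # Welch's t-test for a difference in mean UX score between the groups.
    women = df.loc[df["gender"] == "female", "ux_score"]
    men = df.loc[df["gender"] == "male", "ux_score"]
    t, p = stats.ttest_ind(women, men, equal_var=False)
    print(f"Welch t = {t:.2f}, p = {p:.3f}")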

7 Limitations

Our review has the following limitations:

(i) The SLR methodology does not guarantee that all the publications related to a given research area will be identified [116].

(ii) By limiting the search to peer-reviewed articles, we may have missed case studies published at conferences that could have been relevant to the study.

(iii) Reviewer bias: despite our attempts to make the review objective, we may have introduced bias in some cases.

(iv) The choice of databases. Although we strategically selected the databases to ensure appropriate coverage of this research area and designed a search strategy to capture as many publications as possible, using additional databases might have identified further articles relevant to the research.

(v) The QA criteria. Had we defined other QA questions, the result of the SLR would have been different; the same applies to setting the QA cut-off at 7, as a lower value might have admitted other relevant papers. Despite our efforts to avoid bias, according to Yang et al. [51], QA criteria can introduce factors that potentially bias the findings of a study. Nevertheless, both the QA questions used and the cut-off value provided us with quality papers, as supported by the characterisation of the literature: 96% of the articles were published in indexed journals, 29% in the first quartile and 56% in the second quartile.

(vi) The restriction of results to publications in English and Spanish only.

8 Conclusions

The number of robots in the manufacturing industry has been steadily increasing for several decades and in recent years the number and variety of industries using robots have also increased. As stated by Hentout et al. [25], HRI can effectively contribute to developing future factories in which humans and robots can share tasks and work shoulder to shoulder. Therefore, operators need to have positive and fit-for-purpose experiences through trust-based, smooth, safe, and satisfying interactions in order to integrate robots as natural parts of their daily lives. According to the HCD approach, placing the human at the centre of the system is important to guarantee fluid, safe and satisfactory interactions.

This article reviewed existing works on HRI evaluation that considered human factors in industrial environments and were published between 2011 and 2021. A total of twenty-four full-text articles were analysed, providing a summary of (i) theoretical frameworks, (ii) operational and evaluation tools and (iii) experimental studies.

(i) The theoretical frameworks identified emphasised safety and provided insight into the human factors that could influence HRC. In the context of interaction with a robot, safety and perceived risk are determining factors, as they directly impact a person’s performance and emotions. The theoretical framework proposed by Villani et al. [63] placed safety at the centre, and the framework proposed by Meissner et al. [59] indicated that one of the primary influential factors was perceived risk. Emotions, such as loyalty, stimulation, and trust, must also be considered, and even appearance must be taken into account, because little attention has been given to the emotional effects of aesthetic impressions on users [117]. A theoretical framework of HRI from the UX perspective was also identified [39], but it had shortcomings in the evaluation phase. Nevertheless, evaluation is a key aspect of optimising UX.

(ii) The operational tools identified through the SLR are recent. They are tools of different natures that determine industrial maturity, assess self-efficacy in HRI, propose design guidelines for collaborative assembly workstations, or allow HRI experiences to be synthesised. In addition to these, other tools that have been applied in experimental HRI evaluation studies were identified. Physiological tools, in turn, allow objective assessment because they provide information without retrospective bias. Given the lack of experiments using these tools to evaluate UX in HRI, integrating user monitoring through physiological tools will be essential in future experiments. It will also be essential to include different tools, such as VR or AR, in a new evaluation model, enabling the assessment of UX in different phases of the design process, including the design phases of the workplace. Combining different tools, with different characteristics, at different times could help ensure the accuracy and reliability of the results, as it would provide a better understanding of the context in which the interaction takes place.

(iii) The present article summarised how evaluations of HRI have been performed in the literature. No validated model for assessing UX in HRI was identified. In general terms, the experiments comprise three phases: (i) prior to the execution of the task, (ii) during the execution of the task and (iii) after the execution of the task. Performance is the most frequently evaluated factor, mainly through task execution time. As for subjective evaluation, questionnaires were the most frequently used tools, although different questionnaires were identified across the case studies. Complementing the study with traditional tools, such as questionnaires or interviews, which provide subjective insight into users’ perceptions, would be worthwhile.

Future experiments must adopt a holistic approach to capture people’s perceptions at all times, i.e., before, during and after an interaction. This means including different measurement methods (quantitative and qualitative, objective and subjective) at different moments of the interaction and integrating gender aspects, in order to guarantee the correct interpretation of the data and an understanding of the entire flow of interaction.