Introduction

Robotics research has made incredible strides in recent years, providing benefits across a wide variety of applications, such as manufacturing, search-and-rescue, and healthcare. However, we lack seamless human-robot team interactions similar to those that appeared in science fiction decades ago, with robots such as R2-D2 or C-3PO working with Luke in Star Wars or Rosey helping her family in The Jetsons. Today, most industries utilizing robots keep them in caged setups or human-free environments to avoid accidents. The few appearances of human-collaborative robots (i.e., “cobots”) are in manufacturing [1,2,3], healthcare [4, 5], search and rescue [6], and the military [7]. However, even such “collaboration” is highly predefined and substantially constrained (e.g., robots will stop or slow down when humans are in the vicinity), limiting the impact of such technologies [8]. These systems maintain high levels of autonomy [9,10,11] and perform rigid, predefined behaviors; they can assist humans but cannot effectively team with them.

Teaming has been incredibly significant in human history, allowing humans to build at incredible speed and scale, and ultimately spearheading technological development and cultural growth. As the field of robotics has reached a level of maturity, we are now at a critical point where we can enhance the collaboration between humans and robots, namely human-robot teaming (HRT), which can bring us into a new technological age. HRT will be crucial in increasing the efficiency of production lines [8], reducing workload for healthcare professionals by creating healthcare robot aides [4], and saving lives through rapid and coordinated disaster response. However, HRT also brings certain risks, such as humans manipulating models for personal gain, or robots being misapplied outside of their trained context, leading to harm to humans, damage to property, or privacy violations. Effective HRTs require robots and humans to understand and support each other, develop and maintain shared mental models, and generate and dynamically adapt long-term collaboration plans, all while ensuring the safety and privacy of humans and jointly considering the ethical implications of the robot’s actions on different users.

In this paper, we carefully define nine grand challenges to guide the research community toward successful HRT while avoiding potential pitfalls. The challenges are the following: (1) Communication, (2) Modeling Human Behavior, (3) Long-Term Interactions, (4) Scalability, (5) Safety, (6) Privacy, (7) Ethics, (8) Metrics and Benchmarking, and (9) Human Social and Psychological Wellbeing (Fig. 1).

Fig. 1: An overview of the Human-Robot Teaming challenges covered in this paper

Communication

Information sharing is crucial for team cooperation and achieving shared goals [12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]. In human-robot collaboration, effective communication depends on the robot’s autonomy level and the human’s supervisory role [18, 30, 31]. At lower levels of automation, communication in HRTs is necessary for passing task information, while at higher levels, it increases situational awareness. Building a successful HRT requires a sense of partnership in which robots work jointly with humans rather than merely following commands [18]. Such an interpretation of HRT requires social dexterity and understanding from both the human and the robot, who need to reason about their counterparts’ intentions, beliefs, and goals to take appropriate actions at the right time. Such social dexterity can be achieved through communication.

There is extensive literature studying communication in successful human teams [12, 13, 32,33,34,35], but developing effective communication frameworks for HRT remains an open challenge. Although some prior works have explored leveraging communication strategies from human-human teams for human-robot and robot-robot teams [14, 32, 36, 37], core challenges in communication still need to be resolved, including communication modality (i.e., how to communicate), communication frequency (i.e., when to communicate), and communication content (i.e., what to communicate) in HRTs.

Communication Modality

For successful, smooth, and efficient cooperation and collaboration, HRTs need clear communication that is maintained throughout interactions to synchronize goals, task states, and actions [38].

Researchers have studied various forms of communication in HRT, such as two-way dialogue [39], natural language [40], multi-modal communication (i.e., including gesture, gaze, etc.) [18, 41], and visual messages [18]. However, these approaches can increase mental workload [42] and pose challenges to situational awareness (see the “Communication Frequency” section) [30, 43]. To address these issues, discrete and sparse communication channels that preserve the human interpretability of shared information may offer a solution for high-quality decision-making [44].

In specific domains (e.g., military, underwater, and road signs), humans use gestures or visual interfaces to communicate; however, these require pre-training and knowledge of domain-specific message spaces. For the general public, natural language appears to be more intuitive [45], but it poses challenges due to its ambiguity, colloquialisms, and context-dependent use [46, 47]. Nonverbal communication, such as hand gestures and visual interfaces, combined with language (i.e., multi-modal communication), can be effective for human-robot interaction and coordination [48, 49].

Communication Frequency

To optimize communication frequency, it is important to consider its impact on workload, situational awareness, and decision-making. Too many messages can be overwhelming and reduce situational awareness, resulting in low-quality decision-making [50,51,52]. Conversely, too few messages can lead to low situational awareness, insufficient knowledge of the world state, and thus, reduced teamwork effectiveness and performance [53, 54].

Previous research recommends sending messages at fixed intervals to create a steady stream of anticipatory information among team members [32, 35]. However, other studies propose event-triggered communication to enhance communication efficiency [55,56,57,58]. Efficient communication in HRTs requires an effective balance between the amount of information conveyed in each message and the frequency of messages sent and received during a task.
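As a concrete illustration of the event-triggered alternative, the following minimal sketch (with hypothetical function and variable names) sends a message only when the robot estimates that its teammate’s belief about the world has drifted beyond a threshold, and otherwise stays silent to limit workload:

```python
import numpy as np

def should_send(true_state: np.ndarray,
                teammate_estimate: np.ndarray,
                threshold: float) -> bool:
    # Event trigger: communicate only when the teammate's estimate of
    # our state has drifted too far from reality.
    return float(np.linalg.norm(true_state - teammate_estimate)) > threshold

def communication_step(robot_state, last_sent, predict, threshold=0.5):
    # The robot predicts how the teammate extrapolates from the last
    # message; if that prediction diverges from the true state, it sends
    # a fresh update, otherwise it stays silent to save bandwidth/workload.
    teammate_estimate = predict(last_sent)
    if should_send(robot_state, teammate_estimate, threshold):
        return robot_state  # send an update; new shared reference point
    return last_sent
```

A fixed-interval scheme would instead replace the trigger with a timer; tuning the threshold (or interval) is precisely the workload-versus-situational-awareness trade-off described above.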

Communication Content

In this section, we discuss the communication content, or what to send, in HRTs. Both humans and robots need to effectively communicate their world-state, action-intentions, and objectives to their collaborators [59]. Such information enables true human-robot collaboration and shoulder-to-shoulder teamwork (rather than simply following commands), including self- and world-assessment for mutual support, communicating back to support joint activity, and negotiating labor division and task allocation [18].

Sharing state observations has been the central communication technique in prior works [15, 16, 60,61,62,63,64,65], while communicating action-decisions allows for strategic decision-making through theory of mind (ToM) and cognitive hierarchy [17, 66,67,68]. Recently, sharing experiences (i.e., state, action, and task reward trajectories) has also been proposed [69], but with larger teams, such experience-sharing mechanisms can become costly, imposing high communication and computation overhead (see the “Scalability” section). An effective communication mechanism should efficiently summarize behaviors and include messages rich in essential information for decision-making, while avoiding unnecessary information [59, 70, 71].
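To make these content categories concrete, a hedged sketch of a hypothetical message schema is given below; the field names are illustrative, and a real system would populate only the fields worth their communication and workload cost:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TeamMessage:
    # Each optional field corresponds to one content type discussed above.
    state_observation: Optional[dict] = None   # world-state, e.g., detected objects
    action_intention: Optional[str] = None     # next planned action, enabling ToM reasoning
    objective: Optional[str] = None            # current goal or proposed task allocation
    experience_summary: Optional[dict] = None  # compressed (state, action, reward) statistics

# Example: an intention-plus-objective message, omitting raw observations.
msg = TeamMessage(action_intention="fetch_wrench", objective="assemble_frame")
```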

Modeling Human Behavior

Teamwork is best achieved when team members understand one another [72, 73]. Prior works in HRT have shown that shared mental models among humans and robots positively correlate with team performance [74,75,76,77]. This section addresses the challenges associated with modeling the goals and capabilities of human teammates.

Modeling user behavior may involve learning the user’s objective [78,79,80,81,82] or learning their policy [83,84,85] to predict how the user will respond in different situations. Prior works have explored model-based and data-driven approaches for modeling human behavior [71, 82, 86,87,88,89,90]. Here, we highlight two critical challenges for robots modeling human behavior: complexity and suboptimality.

Complexity

The ability to decipher another person’s mental state is known as Theory of Mind (ToM) [91]. Robots with a ToM capability could understand how people behave; thus, augmenting robot policies with reasoning about human behavior can enhance robot assistance and collaboration with humans. Prior HRT research uses various techniques to model human behavior across different levels of reasoning [92]: first-order models infer human goals or intents solely from human behavior, whereas second-order models learn how humans make inferences about a robot’s objectives [93].

Modeling human behavior poses a significant challenge for robots due to the complex nature of human behaviors. Several internal and external factors often influence human behaviors, including trust in robots [87], stress levels [94], physical capability [95], engagement [96], and sleep deprivation and caffeine or alcohol intake [97]. Current computational models of human behavior only explore a subset of these factors at once and often rely on simplistic assumptions about the latent dynamics of human behaviors [87, 96, 98]. Hence, we need additional work exploring multi-faceted approaches that can incorporate various interaction effects to model human behavior.

The difficulty of modeling human behavior also increases with higher orders of robotic reasoning. As robots become more complex, understanding the robot’s objective will become essential for humans to collaborate effectively. Thus, robots should consider how users understand the robot’s objectives (second-order models) to choose more predictable plans [99] or disambiguate their intentions via communication [100, 101]. The development of second-order models of human behavior is still nascent [102], and recent works assume that humans and robots in HRT share the same goal or objective [59], which may not hold in practice [103]. Thus, we need further work exploring how to tie first-order and second-order models of human behavior together to enable fluent collaboration.
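As a minimal sketch of what a first-order model can look like in practice, the following Bayesian goal-inference routine (hypothetical names; the behavior-likelihood model is an assumption supplied by the caller) updates a posterior over candidate human goals from observed state-action pairs:

```python
import numpy as np

def infer_goal_posterior(trajectory, goals, likelihood, prior=None):
    # First-order ToM: infer a distribution over the human's goal from
    # observed (state, action) pairs via Bayes' rule.
    # `likelihood(s, a, g)` returns P(a | s, g) under a chosen behavior model.
    prior = np.ones(len(goals)) / len(goals) if prior is None else np.asarray(prior)
    log_post = np.log(prior)
    for s, a in trajectory:
        log_post += np.log([likelihood(s, a, g) for g in goals])
    post = np.exp(log_post - log_post.max())  # subtract max for numerical stability
    return post / post.sum()
```

A second-order model would additionally maintain the human’s inferred beliefs about the robot’s own objective, e.g., by running this same computation from the human’s perspective.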

Suboptimality

Several human modeling works assume that humans exhibit rational behavior, i.e., that they choose actions that (approximately) maximize their intent or reward function [104,105,106]. However, humans deviate from rational behavior due to cognitive biases, time pressure, or limited processing capabilities. Accounting for such suboptimality can improve human behavior modeling, and a few recent works explore incorporating such inconsistencies into human behavior models [85, 86, 89, 107, 108]. These approaches mainly address human suboptimality in simple domains over short time horizons, while ideally, robots should model humans over longer interactions (see the “Longer-Term Interactions” section). To overcome this challenge, we require more sophisticated models of human decision-making from other disciplines, such as psychology, cognitive science [109], and behavioral economics [110], for modeling humans in HRT.
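One standard way to relax the strict rationality assumption (not specific to the works cited above) is the Boltzmann noisily rational model, under which the human selects actions with probability

```latex
P(a \mid s, \theta) \;=\; \frac{\exp\big(\beta \, Q_{\theta}(s, a)\big)}{\sum_{a'} \exp\big(\beta \, Q_{\theta}(s, a')\big)},
```

where $\theta$ denotes the human’s intent or reward function, $Q_{\theta}(s,a)$ is the value of taking action $a$ in state $s$ under $\theta$, and $\beta \ge 0$ is a rationality coefficient: $\beta \to \infty$ recovers a perfectly rational human, while $\beta = 0$ yields uniformly random behavior. Such a likelihood can be plugged directly into the goal-inference sketch above.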

Moreover, modeling suboptimality at the team level remains under-explored in HRT. Suboptimal behaviors in HRT can arise not only from individual agent behaviors but also from the interaction of various entities. For instance, misunderstanding between humans and robots can lead to task redundancy [111], and robot suboptimality can lower human trust and willingness to coordinate with robots, reducing team efficiency [112, 113]. Hence, we need models that account for team dynamics when modeling suboptimality in HRT.

Longer-Term Interactions

Teamwork between humans and robots is not completed within a single moment but rather develops over time [114,115,116] and can last for a variable duration. Furthermore, these teaming interactions may repeat, resulting in an interaction that could last weeks, months, or even years, effectively turning the interaction into a lifelong deployment [117,118,119]. Across the various domains in which HRT will be beneficial, there will be dynamic components of the environment, requiring a robot to intelligently and continuously reason over streams of information [120]. Furthermore, as an interaction proceeds, it is important for the robot to understand a situation by considering both past and current information, as well as to predict the future to create a collaboration plan.

We must enable robots to reason effectively in such longer-term interactions, adapting their behavior to new situations and personalizing to ever-learning users.

In the past, robots have been deployed long-term in applications that require relatively limited interaction and have been shown to provide benefits across cardiac rehabilitation [121], robot therapy for autism [117, 118, 122], and education [123, 124]. However, as we shift to the rich interactions required in HRT, longer-term collaboration is especially challenging, as it can involve providing robots with the ability to (1) dynamically learn new concepts and adapt learned behaviors to accomplish objectives and (2) collaborate with a human whose behavior may change over time. Furthermore, evaluating algorithms in these longer-term contexts can prove difficult, as such studies require substantial resources and interactions can vary widely across users [125]. Augmenting robots with the ability to learn and adapt to new contexts and behaviors, understand human behavior, and personalize their actions will enable them to support lifelong HRT in unstructured and dynamic settings.

Continuous Task Learning and Adaptation

To effectively team with humans in long-term interactions, robots must be able to learn new behaviors and adapt current behaviors to new situations [126,127,128,129]. There has been much progress toward the goal of facilitating speedy task learning [130,131,132]. Approaches include creating task-agnostic world or model representations [131,132,133,134,135], development of models that can support continual learning [136, 137], and techniques that minimize the forgetting of previously acquired knowledge by these models [138,139,140]. Other work has studied the learning of sub-skills to allow for reasoning over how to adapt a current set of sub-skills in a new context [141,142,143].
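One well-known instance of the forgetting-mitigation techniques cited above is elastic weight consolidation (EWC), which penalizes changes to parameters that were important for previously learned tasks. A minimal sketch follows (hypothetical function names; the Fisher information is assumed to be estimated elsewhere):

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    # Quadratic penalty anchoring parameters with high Fisher information
    # (i.e., parameters important for old tasks) near their old values.
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_old) ** 2))

def continual_loss(new_task_loss, theta, theta_old, fisher, lam=1.0):
    # Total objective: learn the new task while resisting forgetting.
    return new_task_loss(theta) + ewc_penalty(theta, theta_old, fisher, lam)
```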

However, these works have not been extended to HRT scenarios, a domain where robots may need to simultaneously team with humans and learn new behaviors, all while a dynamic scenario is evolving. In HRT, human teammates will need to teach robots or correct existing robot behaviors online so that the robot can perform duties essential to the teaming interaction. Addressing this challenge will require creating new paradigms to facilitate human-in-the-loop robot learning [144], and developing techniques so that (1) human teammates can quickly teach/correct robot behaviors [145] and (2) robots are able to update models with minimal exploration (without prolonged training or excessive environment interaction).

Accounting for a Changing Human Teammate

A unique challenge in HRT is that effective reasoning over context requires the robot to understand its human teammate (e.g., a teammate’s intent, latent characteristics, current state, and future behavior), not only the state of the world. The fields of Human-Robot Interaction and Human-Computer Interaction have utilized an understanding of human behavior to personalize robot decision-making (approaches discussed in the “Modeling Human Behavior” section), resulting in benefits across education [146, 147], healthcare [118, 148], and domestic applications [149].

However, many of these models reason well only over a set of behaviors well-represented within a dataset. Common assumptions are that the human will maintain a static modality throughout an interaction [150], that humans are rational [78, 151], or that humans have an advanced understanding of the task at hand. In a long-term interaction, such assumptions will be violated at some point, rendering these models unsuitable for HRT. We need to provide robots with the flexibility to understand human teammates in more complex, “in-the-wild” [152] long-term interactions.

Evaluating Longer-Term Interaction

A long-term teaming interaction may last a variable duration and may repeat, resulting in an interaction spanning months or even years. Conducting studies of such repeated interactions within an HRT scenario can prove difficult, as robot systems are not currently robust enough for such long-term deployments [153], and interactions can vary widely over time. Some works have begun to deploy robots within homes for longer periods [154], but the interaction between the robot and user is limited and, as such, does not fit our definition of teaming. Thus, it is critically important to begin conducting HRT studies at longer scales of interaction so that research questions for future work can be clearly identified. Furthermore, at these longer scales of interaction, it is important to pivot from episodic measures of teaming, such as minimizing workload or maximizing performance, to longer-term measures that may provide overarching benefits [115].

Scalability

Modeling a larger number of diverse team members [155], accounting for changes in constraints or resources [156], and supporting diverse computation methods [157, 158] are all challenges for scalable HRT. The challenges of scaling human teams [159,160,161] and multi-robot teams [162, 163] apply to HRT coordination [164,165,166] and collaboration [167] but do not account for the added heterogeneity.

Modeling of Large Heterogeneous Teams

In HRT, modeling, training, and information sharing all face challenges arising from the diversity of humans and robots.

When modeling heterogeneous teams, one-size-fits-all models [168] or multi-modal approaches [169] are used to account for stochasticity [170], preferences [85], and capabilities [171, 172].

Training HRTs at large scale is hazardous due to proximity to humans; current scalable training approaches [173] leverage curriculum learning in simulation [174] with fine-tuning at larger scales [175] or calibration through online learning [176]. Training methods for large-scale robot teams [22, 168, 177,178,179,180] may become infeasible in HRT [181] due to the credit assignment problem [182].

As HRT scales, communication overhead [183] and limited large-scale communication [184] become relevant, requiring human [185] or robot [168] supervisors for efficient coordination.

Future work may include developing stochastic and type-independent models for coordination and communication in HRT.

Robustness to Different Conditions

HRTs should be robust enough to handle changes in constraints, distances, and available resources.

A versatile HRT could handle different constraints (e.g., temporal, spatial, motion control) [186,187,188], but current training methods for learning new constraints (e.g., curriculum learning [189], zero-shot transfer [190], multi-task learning [191]) may lead to catastrophic forgetting [192].

Changes in environment scale and distances must be accounted for [193, 194], as units take time to traverse them [195]. HRT algorithms should therefore scale with map size and remain robust to the different failures that may occur.

Dynamic changes in team composition (e.g., breakdown of robots, reassignment of humans) and resource availability can lead to different policies in cooperative planning [196,197,198,199].

Further research is needed to account for human stochasticity in scaled-up problem domains and to assess the feasibility of transferring methods from multi-robot teams, which are both easier and safer to explore, to HRT systems.

Architecture of Solution

Industry 4.0 and the IoT have shifted robot decision-making toward a decentralized model [200, 201], with challenges in training, communication, and application.

Robot team coordination can be centralized, while humans are inherently decentralized [202]. Scaling optimal central planners is intractable [156, 165, 203], and remote control of robots may lead to communication problems [27, 194, 204,205,206,207,208].

Decentralized HRT is possible through cloud robotics and edge computing [209], and crowdsourcing can be used for training [210,211,212,213]; however, communication in large decentralized systems remains an open challenge [214, 215].

Semi-centralized models leverage a central high-level supervisor with low-level decision-making in sub-teams [27], remaining robust to communication interruptions at the cost of some sub-optimality [216, 217].

Safety

As human-robot collaboration increases, the chance of safety hazards also increases, making safety one of the most important factors in HRT [218]. In industrial settings, where human workers team with strong robots [1,2,3], collisions can result in serious injuries [219], while domestic robots’ proximity to end-users and the broad population they team with pose special challenges in ensuring safety [147]. However, the issue of safety extends beyond these examples and is prevalent across all HRT applications, e.g., assistive driving [220], rescue operations [221], agriculture [222], and supporting astronauts [223].

Multiple safety-guarantee frameworks have been proposed in prior work. Control barrier functions [224, 225] encode safe and dangerous states and can maintain the system within safe states. Reachability analysis [226, 227] and minimally invasive safe control [228] override the policy to avoid unsafe regions when the agent is on the safe-unsafe boundary. Researchers also constrain Markov Decision Processes such that certain unsafe states and actions can never be visited [229]. Other safety assurance methods include regulating the control energy, velocity, and force [230,231,232] when humans and robots are in close proximity.
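To illustrate the minimally invasive idea, the sketch below (a toy, single-constraint case with hypothetical numbers; real systems typically solve a quadratic program over many constraints) projects a nominal command onto the half-space allowed by a control-barrier-function condition of the form a·u ≥ b:

```python
import numpy as np

def safety_filter(u_nom, a, b):
    # Minimally invasive filter for one affine CBF constraint a @ u >= b:
    # return the control closest to u_nom that satisfies the constraint
    # (closed-form projection onto the constraint half-space).
    if a @ u_nom >= b:
        return u_nom  # nominal command is already safe; do not intervene
    return u_nom + a * (b - a @ u_nom) / (a @ a)

# Toy example: a 1D robot at x approaches a human at x_h with dynamics x' = u.
# The barrier h(x) = (x_h - x) - d_min enforces a minimum separation d_min;
# the CBF condition h' >= -alpha * h becomes -u >= -alpha * h, i.e., a = -1.
x, x_h, d_min, alpha = 0.0, 1.0, 0.5, 2.0
h = (x_h - x) - d_min
u_safe = safety_filter(np.array([1.5]), np.array([-1.0]), -alpha * h)
print(u_safe)  # [1.0]: the approach speed is clipped near the safety boundary
```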

Safety is a key bottleneck in achieving effective HRT: measures such as stopping the robot when humans are close significantly impact teaming fluency [8], and more sophisticated safety modules that are compatible with complex environments and different human partners are required. Two major challenges on the road to effective safety for HRT are (1) adaptation and personalization and (2) human understanding of robotic safety.

Adaptation and Personalization

The International Organization for Standardization (ISO) has provided safe-robot-behavior guidelines for industrial robots [233, 234] and general human-robot collaboration [235] to stop, reduce speed, or limit the applied force when humans enter a robot’s safety region. However, in dynamic and unknown environments, the definition of the safety region itself must be adapted to account for the environment and the task, making the enforcement of ISO standards ambiguous. Most of the aforementioned approaches only work with fixed unsafe regions, rendering them unsuitable for HRT. Online re-planning methods for dynamic environments [236, 237] are often impractical due to high computation requirements. Brown and Niekum [238] reason conservatively about unknown space but require a large number of user queries for trajectory rankings, similar to [239]. One possible direction for safety in dynamic and unknown environments is to develop interfaces and approaches that allow end-users or domain experts to intuitively specify and adjust the safe/unsafe regions as needed.

Lasota et al. [240] define two types of robotic safety: physical safety and psychological safety. The previous paragraph focuses on physical safety (human safety and environmental safety). Psychological safety (i.e., subjective safety) is equally essential, as a perceived-safe system is key to maximizing team performance. For instance, an experienced worker may regard working closely with a robot as safe and productive, whereas a new user who has just unboxed a robot may prefer to keep their distance [241]. Enabling users to define preferred safe states and thresholds could also help address this challenge. Demonstration-based techniques provide a promising direction to empower end-users to specify their safety boundaries [242]. The robot could also create informative queries to ask human teammates about uncertain states.

Once robots can ensure safety in complex environments and fit different teammates’ needs, the acceptance of HRT systems could significantly increase in various risk-averse applications.

Human Understanding of Robotic Safety

Understanding the robot’s limitations and potential hazards is crucial for humans making decisions in HRT. For example, an elderly-care robot may not be equipped with depth sensing and could fall down stairs. If the human partner knows the robot’s capabilities, they can decide to deploy the robot only on the ground floor. Prior work has explored explaining to humans after a robot fails [207, 243], but few works have considered the best way to inform humans about robots’ limits and potential failures and to proactively prevent unsafe cases from happening. Huang et al. [101] present closely related work in which an autonomous vehicle informs the user about its policy so that the human understands possible safety concerns.

More research on how to best inform the human partner about the robot’s limits could grant the human more confidence to collaborate with the robot. As humans acquire full safety knowledge of robotic teammates, the HRT will become safer, more robust, and more seamless.

Privacy

Social robots are expected to become prevalent in highly privacy-sensitive domains, such as industrial floors [244], healthcare [5, 245, 246], assistive therapy [247], schools [248, 249], homes [250, 251], and workplaces [244, 252]. As these robots become prevalent in day-to-day environments, humans and robots will share workspaces, participate in conversations, and collaborate on tasks, while robots actively manage and utilize sensory information [253, 254]. Such information can include audio and video recordings, personal information, and even biometric data. However, little is known about how robots discern the sensitivity of this information, which has major implications for human-robot trust [254]. Mishandling sensitive information can lead to great harm in government applications (e.g., through leakage of classified information), healthcare (e.g., HIPAA), citizen security and wellbeing (e.g., Illinois BIPA, EU GDPR [255]), and any application involving sensitive populations (e.g., minors, prisoners). We discuss two key privacy challenges in HRT: (1) personalization in HRT and (2) following domain-specific regulations.

Personalization and Privacy: Opposing Objectives

Effective personalization requires detailed records of user interactions with the robot and an understanding of their habits and lifestyle to uncover user needs, preferences, and expectations [256, 257]. Downstream, such personalization can enhance a user’s trust and anthropomorphism within the HRT [256, 258]. However, the question remains: can we have personal, trustworthy, and reliable robots that respect humans’ privacy, without users giving away their personal data?

Some end-users simply rely on the privacy policies and terms and conditions developed and released with a robot by its manufacturer, while more recently, researchers have developed privacy controllers for human-robot interactions to improve privacy awareness and trustworthiness [254]. Creating transparency via Explainable AI techniques (further discussed in the “Transparency to Minimize Overreliance” section) can also help build privacy awareness and support a trustworthy, private relationship in HRT.

Following Domain-Specific Regulations

Robots interacting with humans will capture, store, and transmit information to improve their ability to reason effectively. However, this information can also be misused or mishandled. There has been a string of legislation to avoid such misuse, striving to improve the overall quality of life of citizens around the world. For example, in the United States, the Family Educational Rights and Privacy Act (FERPA) and the Health Insurance Portability and Accountability Act (HIPAA) were passed to protect the transfer of sensitive information that can be easily misused. However, such policies are not yet applied to robots, and such misuses have already occurred, both for virtual agents such as Alexa [259] and for robots such as the iRobot Roomba [260].

Collecting data across users can improve the accuracy of data-driven techniques but requires sending private information to a centralized server, which may violate laws or user-specific criteria. Federated techniques attempt to address this issue by sending only gradient information back to centralized servers, keeping the benefits of crowdsourcing data while maintaining privacy [261, 262]. Other work in differential privacy [263, 264] and homomorphic encryption [265] adds noise to or encrypts the data to protect individual user information directly. However, even with these techniques, users are not given the ability to control the information sent to the centralized server.
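A minimal sketch of this federated pattern, with an optional differential-privacy-style noise step (all names and hyperparameters hypothetical), might look as follows:

```python
import numpy as np

def local_update(theta, grad_fn, clip=1.0, noise_std=0.0, rng=None):
    # One client step: compute a gradient locally, clip its norm, and
    # optionally add Gaussian noise before anything leaves the device.
    rng = rng or np.random.default_rng()
    g = grad_fn(theta)
    g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # bound each user's influence
    return g + rng.normal(0.0, noise_std, size=g.shape)   # privatize the contribution

def federated_round(theta, client_grad_fns, lr=0.1, noise_std=0.05):
    # The server aggregates only (noisy, clipped) gradients;
    # raw user data never leaves each client.
    grads = [local_update(theta, f, noise_std=noise_std) for f in client_grad_fns]
    return theta - lr * np.mean(grads, axis=0)
```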

It is important for a robot to understand its context (e.g., whether it is working with a child in an educational setting or in a hospital) and use that context to control the transfer of information in accordance with current legislation and user-specified preferences. Furthermore, where possible, data should be encrypted or anonymized, and transparent procedures should be in place for data collection, use, and storage to minimize data leakage and misuse.

Ethics

The challenges associated with ethical decision-making in HRT include identifying the responsibilities of system designers, incorporating transparency to improve robot trustworthiness, and designing robots that promote diversity, equity, and inclusion.

Ethical Decision-Making and Responsibility for Decisions

Ethical decision-making plays a crucial role in HRT, as robots may need to make quick, life-altering decisions [266, 267]. The uncertainty in the information available to robots and the design of algorithmic frameworks that account for ethical issues both pose challenges for decision-making in HRT [268, 269].

Challenges in this area go beyond the well-known “Trolley Problem” [270, 271], which assumes no uncertainty in the outcomes of actions. With robots being involved in emergency services, such as the redistribution of critical medical supplies during COVID-19 or triage care, it is the ethical responsibility of HRT researchers to consider the accountability of each actor [272,273,274,275,276].

The lack of clarity regarding the ethical and legal responsibilities of actors in HRT, as well as the possibility that robots can become autonomous moral agents, further complicates matters [277,278,279]. In such cases, it becomes challenging to decide who is responsible for a failure when robots and humans work together [280, 281].

Transparency to Minimize Overreliance

Successful HRT depends on humans’ trust and willingness to collaborate. However, adopting a solely user-centric approach for building trust in AI and robots risks creating “dark patterns,” i.e., it may improve user trust without ensuring the system is trustworthy [282]. Developing trustworthy robots that empower humans to make informed decisions is crucial. Explainable AI (xAI) aims to address the trustworthiness gap, but developing appropriate explanations that cater to different stakeholders’ expertise and functional roles remains a challenge [282, 283].

Despite these challenges, there exists positive evidence highlighting the critical role of transparency in AI decision-making in establishing human trust in human-AI systems [284]. Recently, Paleja et al. demonstrated the effectiveness of user-readable decision trees in increasing situational awareness [150]. However, such increased situational awareness comes at the cost of significant cognitive load, making it impractical for rapid decision-making. Additionally, Miller describes the pitfalls of operationalizing xAI without incorporating relational information about the operator, task, and environment, counter-indicating a one-size-fits-all approach to xAI [285].

Miller describes a lifecycle approach to transparency and trust, building upon prior work in human-human teams [286]. That work found that high-performing teams with a high level of a priori and a posteriori transparency (i.e., displaced transparency) can obtain a high level of trust with very little in-the-moment transparency. This displaced transparency facilitates trust across each of the three tiers defined by [287], namely affective, analogic, and analytic. This research indicates the need for AI development approaches that facilitate explainability accessible to a wide variety of stakeholders while mitigating unfair bias and ensuring the safety and privacy of the individuals involved.

Diversity, Equity and Inclusion

Diversity, equity, and inclusion (DEI) promote fairness and equal opportunity, leading to a more creative workforce that enhances innovation and problem-solving. DEI in HRT should lead to robots that support and enhance diversity rather than perpetuate existing inequalities.

Widespread adoption of automation with human characteristics impacts how people perceive other people. Design choices for voice personal assistants (VPAs) such as Siri, Alexa, and Cortana [288], which utilize a default female voice, can strengthen gender stereotypes [289]. When designing robots that will interact with all members of society, designers must ensure that the body types, voices, and appearances of these robots do not reinforce negative stereotypes. The effects of widespread usage of these automated systems need to be further explored.

Prior works in HRT have shown that the acceptance of robots depends on many factors, including previous experience or familiarity with robots and technologies, robot predictability, the transparency of the robot’s policy, and the human’s sense of control and trust [290,291,292]. Moreover, research in psychology has shown that people with different cultural backgrounds and personalities have different preferences over proxemics with others [293].

As such, robot designers should either include a personalization module in the system or cater to a specific target population, tailoring the hardware and software design accordingly; in either case, care must be taken to avoid inherent biases and harmful groupings of people.

Metrics and Benchmarking

As the field of HRT continues to expand, measuring interaction quality and success is becoming increasingly important [294,295,296,297]. It is critical to develop reliable metrics that quantify and evaluate (1) the performance of the team and (2) the human experience of being within the team [298, 299]. Performance metrics can include task metrics such as time to complete tasks, operation time, concurrent activity, and accuracy, as well as physiological measures like heart rate and skin conductance to estimate the current state of the interaction.

It is equally crucial to measure the human experience within HRTs, such as safety [300, 301], trust [302], workload [303, 304], and acceptance [305]. Measuring these factors over long periods of time and scaling them to multiple humans and robots present significant challenges (see the “Longer-Term Interactions” and “Scalability” sections). This section specifically focuses on challenges in measuring human factors, correlating metrics with team performance, and benchmarking. Overcoming these challenges and improving the metrics will facilitate the development of more effective HRTs that can tackle increasingly complex tasks, contributing to the progress and evolution of the field.

Selecting Metrics

The human-robot interaction community has combined methodologies from psychology, automation, and human factors [301] to quantitatively assess the usability, user experience, and accessibility of the team [306].

Robots, especially social ones, may influence group dynamics when they are active participants and may impact people in the group differently [307]. Applying metrics designed for individual interactions to group settings does not capture these social dynamics and can miss the group-level phenomena analyzed in social psychology [308].

Measuring Shared Mental Models and Situational Awareness

Researchers currently measure shared mental models in human-robot teams with methods developed for human-only teams, including similarity [309], perceived mutual understanding [310], and situational awareness. However, a robotic teammate cannot easily express its beliefs about its human teammates or the world [311]. Situational awareness metrics (SAGAT and SART [312]) can be used to directly compare mental models between robots and humans [313]. However, these measures again only capture the human’s perception of the robot and cannot be used to equivalently measure the robot’s level of situational awareness of humans.

Developing ways to measure shared mental models between humans and robots can lead to a better understanding of team fluency, and creating a standardized methodology can help researchers compare results from various task domains. Recent trends in explainable AI (xAI) have explored the issue of black-box models and aim to create systems where we can more easily measure the belief overlap between humans and robots [314].

Benchmarking

Benchmarking HRTs allows researchers to quantitatively compare novel approaches with previous ones. Great advances in reinforcement learning [315], computer vision [316], and natural language processing [317] have utilized competitions and benchmarks to advance the state of the art. However, similar competitions and benchmarks are scarce for evaluating HRTs due to the diversity of tasks and the difficulty of setting up physical experimental testbeds. Robotics competitions serve as an intermediary way to measure performance metrics but commonly do not measure safety or the human-factor aspects of teaming [318].

Recently, simulated cooperative human-agent teams have been used to evaluate the performance of artificial collaborative agents, for example in Overcooked [319, 320], Minecraft [321], and Roblox [322]. However, it is not clear that algorithms and methods evaluated within a simulated environment (or in Wizard of Oz studies [323, 324]) transfer to the real world. Additionally, while physical constraints may be present in simulated environments, aspects such as perceived safety, communication, and physical workload may not transfer between simulated and real environments.

Creating common benchmarks beyond limited assembly tasks [325, 326] has the potential for accelerating the progress in designing effective human-robot teams.

Human Social and Psychological Wellbeing

Discussions in the previous sections focus on the performance of HRTs. However, considering the scale and ubiquity of future HRT applications, we must also consider the social and psychological implications of HRT, as HRT aims to alleviate the physical and mental burden on humans [327]. In this section, we highlight the challenges in ensuring humans’ social and psychological wellbeing in HRTs: robot sociability and human replacement by robots. Tackling these challenges will go a long way on HRT’s road to achieving a net positive for society.

Sociability of Robots

HRT can be applied to many applications that require understanding social cues and conventions. For example, a robot receptionist meeting customers can be modeled as an ad-hoc HRT: the robot greets the visitor, asks about their visit, and leads the way, during which the facial expression, eye contact, and appearance of the robot all contribute to the success of the HRT [328]. The requirement of sociability of robots is further amplified in multi-robot multi-human team settings where the robots must be capable of understanding the human social dynamics to contribute effectively to the team [329]. Despite the importance of sociability, most prior work focuses on the social navigation of robots, an over-simplified version of the rich, complex human social behaviors [330,331,332,333,334]. More research is needed on empowering robots to understand human social behaviors and equipping robots with the strategies for various social occasions.

Further, robots must adhere to social norms while collaborating in HRT. When robots need assistance from human teammates, they must assess when, whom, and how to ask for help [88, 335, 336]. Inappropriate interruptions negatively impact task performance, the user’s social perception of the robot, and their willingness to collaborate in the future [337]. Hence, there is a growing need to develop robots that reason about social and contextual cues in real time when collaborating with humans.

Robot Replacement of Humans

One of the most profound social impacts that HRT may cause is the replacement of human jobs by robots [338, 339]. Robot teammates offer multiple benefits, including higher durability, stronger physical capabilities, and arguably better robustness. However, replacing human-only teams with HRTs could have significant social impacts, both on the humans replaced by robots and on the humans who team up with robots after the replacement.

It is hypothesized that as robots take over high-risk, physically demanding, repetitive, or tedious jobs, they will also create more flexibility for humans to pursue creative and novel work with less concern about physical limitations. However, more work is needed to verify this hypothesis; blindly adopting HRT may result in significant psychological, ethical, and social concerns [340,341,342].

Further, humans teaming with robots after such replacement will require additional training, since humans need to understand the robot’s limits, as mentioned in the “Safety” section. Mariah et al. [339] showed that humans prefer human partners over robot teammates due to the lower perceived team fluency and rapport established with robots. Therefore, it is essential to explore the psychological and social consequences HRT may bring before its large-scale deployment.

Conclusion

As technology progresses, effective human-robot collaboration is becoming more feasible. xAI can improve shared mental models and enhance implicit communication, safety, and ethical decision-making. Short-term advancements in personalized algorithms can aid in modeling human behavior, enabling longer-term interaction, and improving robot acceptance. Establishing benchmarks and metrics will help assess HRT factors. Longer-term challenges include scaling to larger teams, preserving privacy, and accounting for robot sociability.

HRT has the potential to benefit a multitude of fields and applications. In manufacturing, collaborative robots can work closely with humans, improving production lines’ efficiency, productivity, and safety. In healthcare, assistive robots can aid professionals in patient and elderly care, reducing the burden on healthcare workers and allowing for better patient care. Assistive driving systems can create a safer and more ergonomic driving experience for humans. HRT can help us achieve a new level of efficiency and productivity, allowing us to tackle some of the most pressing issues facing our world today.

To realize this vision of HRT, addressing the challenges in this paper will pave the way for a new era of human-robot collaboration, in which robots and humans work together seamlessly to accomplish tasks that were once impossible. We hope this paper will inspire further research and development in this exciting field, and we look forward to the day when human-robot teaming is a reality in all our lives.