1 Introduction

In the aftermath of major disasters (man-made or natural), first responders (FRs), such as medics, security personnel, and search and rescue teams, are rapidly dispatched to help save lives and infrastructure. In particular, FRs with different skills, capabilities, and experience may be required for the different tasks that need to be performed. For example, finding out where civilians are trapped requires search and rescue teams and medics; transporting them to safe houses requires ambulances, security personnel, and medics; while removing dangerous material from the environment requires safety experts and security personnel. While performing such tasks, FRs often operate in a very dynamic and uncertain environment where, for example, fires spread, riots start, or the environment floods. Given this, FRs find it difficult to determine the best course of action and which task should be allocated to which team.

To assist in such situations, over the last few years, a number of algorithms and mechanisms have been developed to solve the coordination challenges faced by emergency responders (see Sect. 2 for more details). For example, [45] provide an algorithm to compute the optimal teams of emergency responders to allocate to tasks that require specific types of skills to complete, while [12, 43] distribute such computations in an attempt to reduce the bandwidth required to coordinate. However, none of these approaches consider the inherent uncertainty in the environment or in the first responders’ abilities. Crucially, to date, while all of these algorithms have been shown to perform well in simulations, none of them have been exercised to guide real human responders in real-time rescue missions. In turn, studies on the deployment of such intelligent technologies in the real world reveal that they typically impose a cognitive burden on the first responders [37, 38] and disrupt task performance. Hence, it is important to develop real-world simulations of disaster response where such technologies can be trialled so that the interactional issues between humans and agents may be explored. Moreover, only through such trials will it become clear whether these algorithms will cope with real-world uncertainties (e.g., communication breakdowns or changes in weather conditions), be acceptable to humans (i.e., take into account their capabilities and preferences to perform certain tasks), and actually augment, rather than hinder, human performance (e.g., by providing useful rather than intrusive guidance and support).

Against this background, we develop a novel algorithm for team coordination under uncertainty and evaluate it within a real-world mixed-reality game that embodies the simulation of team coordination in disaster response settings. Specifically, we consider a scenario involving rescue tasks (involving carrying a specific object to a safe zone) distributed in a physical space over which a (virtual) radioactive cloud is spreading. Tasks need to be completed by pairs of FRs with specific roles (e.g., medic, soldier, fire-fighter, or transporter) who need to plan paths from their individual locations to meet at specific points in the disaster space to undertake these tasks. Moreover, they have to do so before the area is completely covered by the cloud (as FRs will die from radiation exposure), which is spreading according to varying wind speed and direction (and may thus result in highly uncertain radiation level predictions). Our algorithm captures the uncertainty in the scenario (i.e., in terms of environment and player states) and is able to compute a policy that allocates responders to tasks so as to minimise task completion time and plans routes for responders to ensure they are not exposed to significant radiation. In order to be responsive, our algorithm is designed to return approximate solutions rather than optimal ones (which would take too long to compute in a real-time setting). The algorithm is then used by a planning agent, working alongside a human commander, to guide FRs on the ground. Specifically, the planning agent is integrated into our test platform, AtomicOrchid, which structures the interaction between FRs (i.e., players on the ground), a human commander, and the planning agent in a mixed-reality location-based game. In particular, the planning agent is designed to take over the burden of computing team plans from the human commander (who takes up a more supervisory role) while being responsive to requests to change plans from FRs (e.g., in case they are tired or prefer to do other tasks). By so doing, we are able to study, both quantitatively and qualitatively, the performance of a human–agent collective (i.e., a mixed-initiative team where control can shift between humans and agents) and the interactions between the different actors in the system [24]. In particular, we advance the state of the art in the following ways:

  1. We develop a multi-agent Markov decision process (MMDP) to represent the problem of team coordination (i.e., path planning and task allocation) under uncertainty [7] and provide a novel algorithm to compute approximate solutions to the MMDP. We embed this algorithm in a planning agent within the AtomicOrchid game to evaluate it with users in a real-world setting.

  2. We present a novel mixed-reality game, AtomicOrchid, to evaluate team coordination under uncertainty, focussing particularly on human–agent collaboration. In AtomicOrchid, human players in the field are supported by our planning agent in their mission to coordinate rescue tasks efficiently by communicating with headquarters and each other via a mobile phone application.

  3. We run field trials of our planning agent in AtomicOrchid, where it instructs field responders through mobile messages in a disaster response scenario. Our quantitative and qualitative analysis of the results shows that providing flexible interactions between human participants and the planning agent improves task performance, particularly when the agent can rapidly respond to human requests for tasks.

When taken together, our results show, for the first time, how agent-based coordination algorithms for disaster response can be integrated and validated with human teams. Moreover, these results allow us to derive a methodology and guidelines for systems involving human–agent collaboration.

The rest of this paper is structured as follows. Section 2 presents related work and provides some background on the techniques used in later parts of the paper. Section 3 formalises the disaster response problem as an MMDP. Section 4 describes the algorithm to solve the path planning and task allocation problems presented by the MMDP. Section 5 details the AtomicOrchid platform. Section 6 presents our pilot study and the field trial evaluation. Finally, Sect. 7 presents our design guidelines and concludes.

2 Background and related work

In this section we discuss related work and provide a short background on the techniques used in this paper. As our work lies at the intersection between multi-agent coordination and human–computer interaction (HCI) for disaster management applications, we discuss relevant algorithms for multi-agent coordination to support emergency responders and then describe the HCI techniques and approaches developed for the same purpose. We then discuss the challenges relevant to human–agent collaboration and justify our use of mixed-reality games to evaluate this form of collaboration. Through our analysis, we also identify the challenges that pertain to the coordination of human emergency responders under significant uncertainty and therefore conclude with a survey of decision-theoretic approaches to solving the coordination problem under uncertainty, which justifies our agent-based solution to the team coordination problem.

2.1 Human team coordination for disaster response

Team coordination focuses on managing interdependencies between activities performed by individual team members to achieve the team’s goal [34]. In emergency response situations, failures in team coordination can often occur due to the complexities of such interdependencies, and such failures are widely acknowledged as the most significant factor that can cost human lives [59, p. 2]. In particular, related work studies the challenges that arise when team members aim to create a shared understanding (e.g., of what needs to be done) [14], develop situation awareness (i.e., knowledge of the environment and the actors within it) [3], and align cooperative action through on-going communication [59].

Moreover, we note the work of [13], which highlighted that a key characteristic of large-scale disasters is the presence of multiple, spatially distributed incidents. To deal with multiple incidents, the disaster response team has to coordinate spatially distributed resources and personnel to carry out operations (e.g., search, rescue, and evacuation). Therefore, it is necessary to optimise the coordination of teams by allocating tasks to teams efficiently in time and space. Given this, a number of tools and system architectures have been developed to support such team coordination [14, 36, 41]. However, while these approaches focus on providing tools that help human teams better share information and formulate plans, they do not consider how such team coordination could be optimised using agent-based planning. Hence, in the next section, we survey a number of agent-based solutions to the task allocation problem in disaster response.

2.2 Agent-based planning for disaster response

Kitano et al. [27] were the first to propose disaster response as a key application area for multi-agent systems. Since then, a number of algorithms and simulation platforms have been developed to solve the computational challenges involved. For example, algorithms have been developed to efficiently allocate emergency responders to rescue tasks (e.g., to rescue civilians, extinguish fires, or unblock roads) for (i) decentralised coordination: where emergency responders need to choose their actions based on local knowledge [12, 43], (ii) coordination by a central authority: where a command centre is able to choose actions (against potentially adversarial agents) for all the members of the team given complete knowledge of the system [26, 30, 50, 58], and (iii) coalition formation: where sub-teams can perform tasks with different levels of efficiency, as defined by the synergies between their capabilities (e.g., when two policemen help rescue a civilian from rubble they would be less effective than a fire and rescue officer and a medic) [45]. Similar to [26, 30, 50], in our work, we adopt a centralised approach to the coordination problem and additionally consider the uncertainty in the environment (see more details in Sect. 2.5).

Now, many of the above algorithms are actually evaluated in simulation using the RoboCupRescue disaster simulation platform [56]. In this platform, emergency response tasks are modelled computationally (e.g., functions describing the speed and efficiency of agents at completing tasks) and the emergency responders are modelled as agents that automatically implement the outputs of a given task allocation algorithm [28, 45]. On a related note, [40] present a system to simulate evacuations during major disasters and show how human subjects using mobile phones can be influenced by the movement of artificial agents. However, they do not attempt to optimise the evacuation process. While such evaluations are useful to determine extreme scenarios (e.g., the best case when all agents implement all tasks perfectly or the worst case when they do not), they are prone to misrepresentations of human decision making since they do not capture all the subtleties of human interactions and perception. In particular, they assume that all the actors in the system perfectly understand the messages exchanged and the information presented on their communication devices, and that they always follow the instructions received. In contrast, in a disaster response setting, responders may not always understand the plans computed by an agent, nor always obey the instructions they receive from it (e.g., if they are part of different agencies, misunderstand the instructions, or are just tired).

2.3 Challenges for human–agent collaboration

Many multi-agent coordination algorithms have the potential to be applied to support task assignment in responder teams. However, before we use those algorithms to build agent-based planning support systems, there is a need to understand how humans and agents can effectively collaborate. Controlled experiments designed by the Human Factors community have sought to identify key aspects of human–agent collaboration [10, 15, 57, 60], propose transfer-of-control policies to shift control between humans and agents [52], and evaluate strategies of agent support for teams [33]. In particular, prior research has recognised that interaction design is vital for the performance of socio-technical human–agent systems [38], particularly where an agent directly instructs humans [37]. The latter argue that, with inappropriate interaction design, agent-based planning support may function inefficiently or, at worst, hinder the performance of human teams. This is echoed by [9], who have shown that an ill-designed workflow management/automation system can lead to undesirable results, not only failing to improve work efficiency but also hindering human performance. Bowers et al. found that extreme difficulties might be encountered when introducing new technology support for human teams; thus, new technologies might disrupt, rather than support, smooth workflows if they are designed in an organisationally unacceptable way [1]. In some cases, people may not be able to access agent-based guidance simply because they do not wish to share their context with a centralised autonomous system [61].

Although there is much literature on planning support, task assignment, and human–agent collaboration, with the exception of [51, 53], no real-world studies of how human emergency response teams actually handle agent support have been carried out. In fact, [51, 53] mainly focus on humans acting as peers to agents in computational simulations rather than real-world deployments in the field. This further highlights the need to evaluate these technologies with human users before they are deployed in real-world settings. In particular, in this work, we turn to the use of gamification to run such evaluations.

Recent work by Tambe et al. [58] has shown how humans may be able to implement plans computed as solutions to a Stackelberg game. While their solutions have been deployed with various human teams (e.g., guards at LAX airport or security teams in the Boston/New York/LA harbours), they do not consider how such plans can be generated in real-time in collaboration with humans (i.e., internalising human input dynamically).

2.4 Disaster simulation and games

Computational simulations, particularly agent-based simulations, are the predominant approach in the computing literature to predict the consequences of certain courses of action in disasters [22], to model information flow among first responders [49], and to model the logistic distribution of emergency relief supplies [32]. However, as hinted above, these simulations are a poor substitute for real-world field trials. For example, Simonovic highlights that simulations may rely on unrealistic geographical topography, and most importantly, may not “account for human psychosocial characteristics and individual movement, and (\(\ldots \)) learning ability” [55]. Moreover, the impact of the emotional and physical responses likely in a disaster, such as stress, fear, exertion or panic [17], remains absent in most approaches that rely purely on computational simulation.

To combat this, in our study, we adopt a serious mixed-reality game approach to provide a setting in which people experience realistic cognitive and physical stress [18]. Mixed-reality games are recreational experiences that make use of pervasive technologies such as smart phones, wireless technologies and sensors with the aim of blending game events into a real world environment [5]. Arguably, they have become an established vehicle to explore socio-technical issues in complex real world settings [16]. The major advantage of mixed-reality games is the fact that they are situated in the real world, which leads to increased efficacy of the behavioural observations when compared to computational simulations.

By adopting the mixed-reality games approach, for the first time, we are also able to run repeated trials of interfaces that humans use to control, and receive instructions from, an agent in real time. In our particular context, the planning agent works alongside a human commander at central command (who can visualise its outputs and override them) in order to instruct human players on the ground. This arrangement of humans being guided by an agent in collaboration with a supervisor is novel in itself and raises issues related to trust in the planning agent, delegation of control (to humans on the ground, to the agent at headquarters, or to the human at headquarters), and interaction design for human–agent systems that can support collaborative work in real-time settings.

2.5 A short background on decision-theoretic multi-agent planning

Decision-theoretic planning is typically solved using Markov decision processes [35]. A Markov decision process (MDP) is a mathematical framework for sequential decision making under uncertainty where the problem is specified as a set of states and transition functions (specifying links between these states based on the actions taken). The solution to an MDP then specifies what action should be taken in each state given the possible transitions from that state. In the presence of multiple agents, this model has been extended to the multi-agent MDP (MMDP) [7], where the action chosen at any state consists of individual action components performed by the agents. Theoretically, any algorithm, such as linear programming, value iteration, or policy iteration, that can solve MDPs can also be used to solve MMDPs. However, these are likely to be very inefficient because the action space grows exponentially with the number of agents. In an attempt to combat this complexity, [8] show how domain structure can be exploited and thus introduce the factored MDP (FMDP), in which the state space is described by a set of variables and the transition model is defined by a dynamic Bayesian network (DBN). When the agents can only observe partial information about the state, the problem can be modelled by multi-agent partially observable MDPs (MPOMDPs) [44]. Similar to MMDPs, MPOMDPs can be treated as an extension of single-agent POMDPs to multi-agent domains. This analogy is useful because MPOMDPs can be solved as belief-state MMDPs, where a belief state is a probability distribution over the states. All the above models assume that there is a centralised unit that will select a joint action for the team and distribute each action component to the corresponding agent. The decentralised POMDP (DEC-POMDP) [6] is a more general model where the agents are controlled in a decentralised manner. In other words, there is no centralised unit for distributing the actions and each agent must choose its own action based on its local observation.
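To make this machinery concrete, the following minimal sketch runs value iteration on a toy MDP; the states, actions, transition probabilities, and rewards are invented for illustration and are not part of our model.

```python
# Minimal value-iteration sketch for a toy MDP (illustrative only; the
# states, actions and probabilities below are NOT part of our model).
# An MDP is given by states S, actions A, transition probabilities P(s'|s,a)
# and rewards R(s,a); value iteration repeatedly applies the Bellman backup.

GAMMA = 0.95  # discount factor

# P[s][a] is a list of (next_state, probability); R[s][a] is the reward.
P = {
    "s0": {"wait": [("s0", 1.0)], "go": [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"wait": [("s1", 1.0)], "go": [("s2", 0.9), ("s1", 0.1)]},
    "s2": {"wait": [("s2", 1.0)]},  # absorbing state
}
R = {
    "s0": {"wait": 0.0, "go": -1.0},
    "s1": {"wait": 0.0, "go": 10.0},
    "s2": {"wait": 0.0},
}

V = {s: 0.0 for s in P}
for _ in range(200):  # iterate Bellman backups to (approximate) convergence
    V = {
        s: max(
            R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# Extract a greedy policy from the converged value function.
policy = {
    s: max(P[s], key=lambda a: R[s][a] + GAMMA * sum(p * V[s2] for s2, p in P[s][a]))
    for s in P
}
print(V, policy)
```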

In this paper, we restrict ourselves to modelling our problem as an MMDP because the other models either do not fit the characteristics of our domain or are too difficult to solve at the size of our problem. Specifically, in our domain, we consider a central controller (at headquarters) that collects all the information and distributes the commands to each field responder. Therefore, it is not necessary to assume that the information is only partial (as in MPOMDPs) or that decisions must be made locally by the responders (as in DEC-POMDPs). Furthermore, those models are much harder to solve than MMDPs and the existing algorithms can only solve very small problems. Moreover, we do not use the FMDP because most of the algorithms for solving this model require that the value function can be factored additively into a set of localised value functions [20, 21, 31], and our problem does not have such structure. For example, in our domain, several tasks may depend on the same responder. If she is delayed in one task, this may affect the completion of the other tasks. In other words, the completion of one task may depend on the completion of the other tasks, and so the value function cannot be factored on the basis of the local task states. Our setting also differs from that in [12], where the responders are assumed to be self-interested and need to negotiate with each other over the task that they want to perform next.

As discussed above, any algorithm that can solve large MDPs can be used to solve MMDPs. However, most of the existing approaches are offline algorithms (see the most recent survey [35] for more detail). The main disadvantage of offline algorithms is that they must compute a complete action mapping for all possible states in the policy. This is intractable for problems with a huge state space such as ours. In contrast to offline approaches, online algorithms interleave planning with execution and only need to compute the best action for the current state instead of the entire state space. Specifically, we adopt the basic framework of Monte-Carlo tree search (MCTS) [29], which is currently the leading online planning algorithm for large MDPs, and divide our online algorithm into two levels: task planning and path planning. It is worth pointing out that our method is different from hierarchical planning for MMDPs [39] because the latter requires the task hierarchy to be part of the model and our problem does not have such a task hierarchy for the responders. Indeed, our problem is more closely related to the coalition formation with spatial and temporal constraints (CFST) problem, where agents form coalitions to complete tasks, each with different demands. However, existing work on CFST typically assumes that there is no uncertainty in the agents’ actions or the environment [45].

3 The disaster scenario

To develop a scenario for the evaluation of agent-based planning support, we held a number of consultations with emergency response organisations such as Rescue Global and Hampshire County Council. Specifically, these discussions informed the design of decision-making challenges (e.g., hazard avoidance, path planning, team coordination) that mimic those that pervade real-world disaster response planning and execution while making reasonable assumptions about the environment and the participants.

In more detail, we consider a disaster scenario in which a satellite, powered by radioactive fuel, has crashed in a suburban area. Debris is strewn around a large area, damaging buildings, causing accidents, and injuring civilians. Moreover, radioactive particles discharged from the debris are gradually spreading over the area, threatening to contaminate food reserves and people. As the movement of this radioactive cloud depends on wind speed and direction, radiation levels may change drastically across the disaster space over time (see Appendix 1 for more details of the model we use). Hence, emergency services (including medics, soldiers, transporters, and fire-fighters) are deployed to evacuate the casualties and key assets (e.g., food reserves, medication, fuel), each requiring different teams of FRs, before they are engulfed by the radioactive cloud. In what follows, we first model this scenario formally. Second, we describe the use of a planning agent at headquarters to help coordinate the team. Third, we formalise the optimisation problem faced by the planning agent in trying to coordinate the team of FRs on the ground (i.e., including fire-fighters, medics, and soldiers) with the objective to save as many lives and assets as possible while minimising the risk of exposure to harmful radiation.

3.1 Formal model

Let \(G\) denote a grid overlaid on top of the disaster space, and assume the satellite debris, casualties, assets, and actors are located at various coordinates \((x,y) \in G\) in this grid. The radioactive cloud induces a radioactivity level \(l \in [0,100]\) at every point it covers (100 corresponds to maximum radiation and 0 to no radiation). While the exact radiation levels can be measured by responders on the ground (at a given location) using their Geiger counters, we assume that additional information is available from existing sensors in the area. However, this information is uncertain due to the poor positioning of the sensors and the variations in wind speed and direction (we show how this uncertainty is captured in the next section). A number of safe zones \(G' \subseteq G\) are defined where the responders can drop off assets and casualties (i.e., targets to be rescued). Let the set of FRs be denoted as \(I = \{p_1, \cdots p_i \cdots , p_n\}\), where \(|I| = n\), and the set of targets to be rescued (i.e., rescue tasks) be denoted as \(T = \{t_1,\cdots , t_j, \cdots , t_m\}\), where \(|T| = m\). A rescue task is performed by picking the target up, carrying it to a safe zone, and dropping it off. As FRs perform rescue tasks, they may become tired, get injured, or receive radiation doses that may, at worst, be life threatening. Hence, we assign each responder a health level \(h_i\in [0,100]\) that decreases with her radiation dose (\(h_i\) is decreased by \(0.02 \times l\) per second given a radiation level \(l\)) and assume that her decision to pick up and carry the target allocated to her is liable to some uncertainty (e.g., she may not want to pick up a target because it is too far away or it is unclear how long it will take to carry it back to a safe zone). Moreover, let \(\varTheta \) denote the set of responder types (e.g., fire brigade, soldier, transporter, or medic) and assume a responder’s type determines the capabilities she has and therefore the tasks she can perform. We denote as \(\theta _i \in \varTheta \) the type of responder \(p_i\). In turn, to complete a given task \(t_j\), a set of responders \(C \subseteq I\) with specific types \(\varTheta _{t_j} \subseteq \varTheta \) is required to pick up \(t_j\). Thus, a task can only be completed by a team of responders \(C_j\) if \( \varTheta _{t_j} \subseteq \{\theta _i | p_i \in C_j\}\). Given the distribution of responders across the physical space, different sub-teams will perform to different levels (as they have to travel different distances), and this poses a challenge for the commander in finding the best teams to perform the tasks.
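For concreteness, the sketch below encodes the main elements of this formal model in Python; the class and function names are our own illustrative choices rather than a specification of the deployed system.

```python
# Illustrative encoding of the formal model (names are ours, not a
# specification of the deployed system).
from dataclasses import dataclass

@dataclass
class Responder:
    rid: str
    rtype: str                 # theta_i in Theta, e.g. "medic", "soldier"
    pos: tuple                 # (x, y) in the grid G
    health: float = 100.0      # h_i in [0, 100]

@dataclass
class Task:
    tid: str
    required_types: frozenset  # Theta_{t_j}, e.g. {"medic", "fire-fighter"}
    pos: tuple

def can_complete(team: list, task: Task) -> bool:
    """A team C_j can complete t_j iff Theta_{t_j} is a subset of its types."""
    return task.required_types <= {r.rtype for r in team}

def apply_radiation(r: Responder, level: float, seconds: float) -> None:
    """Health decreases by 0.02 * l per second at radiation level l."""
    r.health = max(0.0, r.health - 0.02 * level * seconds)

# Example: a medic and a fire-fighter can rescue a "victim"-type target.
team = [Responder("p1", "medic", (3, 4)), Responder("p2", "fire-fighter", (5, 4))]
victim = Task("t1", frozenset({"medic", "fire-fighter"}), (4, 4))
assert can_complete(team, victim)
```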

3.2 Human–agent collaboration

In line with practice in many countries, we assume that the FRs are coordinated from a headquarters (HQ) headed by a human coordinator \(H\). In our case, \(H\) is assisted by a planning agent \(PA\) (more details in Sect. 4), which can receive input from, and direct, the FRs. Both \(H\) and \(PA\) can communicate their instructions (task plans to pick up targets) directly to the responders using an instant messaging system (or walkie talkie). While these instructions may be in natural language for \(H\), \(PA\) instructs them with simple requests such as “Pick up target X at position Y with team-mates Z”. In turn, the responders may not want to do some tasks (e.g., if they are too tired, prefer to work with specific peers, or prioritise some tasks over others) and may therefore simply accept or reject the instruction received from \(PA\) or \(H\). However, \(H\) can query the responders’ decisions and request more information about their status (e.g., fatigue or health) and goals (e.g., meeting a team-mate at position X or going for task Y). If a task is rejected by the responders, \(PA\) records this as a constraint on its task allocation procedure (see details in Sect. 4.1.3) and returns a new plan. Thus, on the one hand, richer interactions are possible between \(H\) and the first responders than between them and \(PA\). On the other hand, \(PA\) runs a sophisticated task allocation algorithm that can compute an efficient allocation, possibly better than the one computable by \(H\) (particularly when many responders need to be managed). Note that, in contrast to previous work that suggested transfer-of-control regimes [52], our approach to handling requests from FRs to change the agent-based plan does not constrain transfers of control to specific decision points in the operation of the system. Rather, our interaction mechanisms are designed (see Sect. 5) to be more flexible, allowing human control at any point (and our results in Sect. 6 validate this approach), along with human supervision (to implement possible corrective actions) by \(H\).

3.3 The optimisation problem

Previous agent-based models for team coordination in disaster response typically assume deterministic task execution and environments [45, 50]. However, in order to evaluate agent-guided coordination in a real-world environment, it is important to consider uncertainties due to player behaviours and the environment (as discussed in the previous section). Given this, we propose a new representation for the task allocation problem in disaster response that does take such uncertainties into account. More specifically, we represent this problem using an MMDP that captures the uncertainties of the radioactive cloud and the responders’ behaviours. We model the spreading of the radioactive cloud as a random process over the disaster space and allow the actions requested from the responders to fail (because they decline to go to a task) or incur delays (because they are too slow) during the rescue process. Thus, in the MMDP model, we represent task executions as stochastic processes of state transitions, while the uncertainties of the radioactive cloud and the responders’ behaviours can be easily captured with transition probabilities. More formally, the MMDP is represented by the tuple \(\mathcal {M} = \langle I, S, \{A_i\}, P, R \rangle \), where \(I = \{ p_1, p_2, \cdots , p_n \}\) is the set of actors as defined in the previous section, \(S\) is the state space, \(A_i\) is the set of responder \(p_i\)’s actions, \(P\) is the transition function, and \(R\) is the reward function. In the MMDP, the state is Markovian and represents all the information that is sufficient for the agent to make decisions. The transition function models the dynamics of the system and how it reacts to the responders’ actions. The reward function specifies the objective of the system (e.g., saving as many lives as possible or minimising the radiation dose received). We elaborate on each of these below.

In more detail, \(S= S^G_r \times S_{p_1} \times \cdots \times S_{p_n} \times S_{t_1} \times \cdots \times S_{t_m}\) where:

  • \(S^G_r = \{l_{(x,y)}| (x, y) \in G\}\) is the state variable of the radioactive cloud that specifies the radioactive level \(l_{(x,y)}\in [0, 100]\) at every point \((x, y)\in G\).

  • \(S_{p_i} = \langle h_i, (x_i, y_i), t_j \rangle \) is the state variable for each responder \(p_i\) that specifies her health level \(h_i\in [0, 100]\), her present position \((x_i, y_i)\), and the task \(t_j\) she is carrying out, which is \(\mathtt{null}\) when she has no task.

  • \(S_{t_j} = \langle \mathtt{st_j}, (x_j, y_j) \rangle \) is then the state variable for task \(t_j\) to specify its status \(\mathtt{st_j}\) (i.e., the target is picked up, dropped off, or idle) and position \((x_j, y_j)\).

The three types of actions (in the set \(A_i\)) a responder can take are: (i) stay in the current location \((x_i, y_i)\), (ii) move to one of the 8 neighbouring locations, or (iii) complete a task located at \((x_i, y_i)\). A joint action \(\mathbf {a}=\langle a_1, \cdots , a_n \rangle \) is a set of actions, one for each responder, where \(a_i\in A_i\) (a responder may just stay at her current position if she has no targets to rescue).
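A minimal sketch of how such an action set could be enumerated, assuming an 8-connected grid (the function name and signature are illustrative):

```python
# Illustrative enumeration of a responder's action set A_i: stay, move to
# one of the 8 neighbouring cells, or complete a task at the current cell.
def actions(pos, grid_w, grid_h, task_here=False):
    x, y = pos
    acts = [("stay", (x, y))]
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) == (0, 0):
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < grid_w and 0 <= ny < grid_h:
                acts.append(("move", (nx, ny)))
    if task_here:
        acts.append(("complete", (x, y)))
    return acts

print(actions((0, 0), 50, 55))  # corner cell: stay plus 3 valid moves
```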

The transition function \(P\) is defined in more detail as: \(P= P_r \times P_{p_1} \times \cdots \times P_{p_n} \times P_{t_1} \times \cdots \times P_{t_m}\) where:

  • \(P_r(s'_r|s_r)\) is the probability the radioactive cloud spreads from state \(s_r\in S^G_r\) to \(s'_r\in S^G_r\). It captures the uncertainty of the radiation levels in the environment due to noisy sensor readings and the variations in wind speed and direction.

  • \(P_{p_i}(s'_{p_i}|s, a_i)\) is the probability responder \(p_i\) transitions to a new state \(s'_{p_i}\in S_{p_i}\) when executing action \(a_i\). For example, when a responder is asked to go to a new location, she may not end up there because she is tired, gets injured, or receives radiation doses that are life threatening.

  • \(P_{t_j}(s'_{t_j}|s, \mathbf {a})\) is the probability of task \(t_j\) being completed. A task \(t_j\) can only be completed by a team of responders with the required types (\(\varTheta _{t_j}\)) located at the same position as \(t_j\).

Now, if task \(t_j\) is completed (i.e., in \(\mathtt{st_j}\in S_{t_j}\), the status \(\mathtt{st_j}\) is marked as “dropped off” and its position \((x_j, y_j)\) is within a safe zone), the team will be rewarded using function \(R\). The team is penalised if a responder \(p_i\) gets injured or receives a high dose of radiation (i.e., in \(s_{p_i}\), the health level \(h_i\) is 0). Moreover, we attribute a cost to each of the responders’ actions since each action requires them to exert some effort (e.g., running or carrying objects).

Given the above definitions, a policy for the MMDP is a mapping from states to joint actions, \(\pi : S \rightarrow \mathbf {A}\), so that the responders know which actions to take given the current state of the problem. The quality of a policy \(\pi \) is measured by its expected value \(V^\pi \), which can be computed recursively by the Bellman equation:

$$\begin{aligned} V^\pi (s^\tau ) = R(s^\tau , \pi (s^\tau )) + \sum _{s^{\tau +1}\in S} P(s^{\tau +1}|s^\tau , \pi (s^\tau )) V^\pi (s^{\tau +1}) \end{aligned}$$
(1)

where \(\tau \) denotes the current time point and \(\pi (s^\tau )\) is a joint action given \(s^\tau \). The goal of solving the MMDP is to find an optimal policy \(\pi ^*\) that maximises the expected value with the initial state \(s^0\), \(\pi ^* = \arg \max _{\pi } V^\pi (s^0)\).

At each decision step, we assume that \(PA\) can fully observe the state of the environment \(s\) by collecting sensor readings of the radioactive cloud and the GPS locations of the responders. Given a policy \(\pi \) of the MMDP, a joint action \(\mathbf {a}=\pi (s)\) can be selected and broadcast to the responders (as mentioned earlier).

4 Team coordination algorithm

Unfortunately, as in most MDP-based approaches to solving team coordination problems (see Sect. 2.5), our MMDP model results in a very large search space, even for small-sized problems. For example, with 8 responders and 17 tasks in a \(50\times 55\) grid, the number of possible states is more than \(2\times 10^{400}\). Therefore, it is practically impossible to compute the optimal solution. In such cases, we need to consider approximate solutions that result in high quality allocations. To this end, we develop an approximate solution using the observation that responders first need to cooperatively form teams (i.e., agree on who will do what), and that they can then independently compute the best path to the task. In our planning algorithm, we use this observation to decompose the decision-making process into a hierarchical structure with two levels: at the top level, a task planning algorithm is run for the whole team to assign the best task to each responder given the current state of the world; at the lower level, given a task, a path planning algorithm is run by each responder to find the best path to the task from her current location.

Furthermore, not all states of the MMDP are relevant to the problem (e.g., if a responder gets injured, she is incapable of doing any task in the future and therefore her state is irrelevant to the other responders) and we only need to consider the reachable states given the current global state \(s\) of the problem. Hence, given the current state, we compute the policy online only for the reachable states. This saves a considerable amount of computation because the set of reachable states is usually much smaller than the overall state space. For example, given the current location of a responder, the one-step reachable locations are the 8 neighbouring locations plus the current location, i.e., 9 locations out of the \(50\times 55\) grid. Jointly, the reduction is huge, from \((50\times 55)^8\) to \(9^8\) for 8 responders. Another advantage of online planning is that it allows us to refine the model as more information is obtained or unexpected events happen. For example, given that the wind speed or direction may change, the uncertainty about the radioactive cloud may increase. If a responder becomes tired, the outcome of her actions may be liable to greater uncertainty.
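The scale of this reduction is easy to verify numerically (a quick back-of-the-envelope check, not part of the planner):

```python
# Joint one-step location space: every grid cell per responder vs. the 9
# one-step-reachable cells per responder, for 8 responders.
full_joint = (50 * 55) ** 8      # any cell of the 50x55 grid, per responder
reachable_joint = 9 ** 8         # stay + 8 neighbours, per responder
print(f"{float(full_joint):.3e} vs {reachable_joint:,}")
```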

The main process of our online hierarchical planning algorithm is outlined in Algorithm 1. The following sections describe the procedures of each level in more detail.

Algorithm 1 The online hierarchical planning process (task planning and path planning)
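Since Algorithm 1 appears as a figure in the original, the sketch below restates its two-level structure as we read it from the text; all function bodies are illustrative placeholders, not the deployed implementation.

```python
# Sketch of the two-level online planning loop (Algorithm 1 is a figure in
# the original; the placeholder functions below stand in for the procedures
# of Sects. 4.1 and 4.2).

def estimate_team_values(state, responders, tasks):
    # Placeholder for Algorithm 2 (simulation-based team value estimation).
    return {(t, frozenset(responders[:2])): 1.0 for t in tasks}

def solve_task_allocation(team_values):
    # Placeholder for the MILP of Eq. (3); here: greedily pick the best teams.
    allocation = {}
    for (task, team), _ in sorted(team_values.items(), key=lambda kv: -kv[1]):
        if not any(r in allocation for r in team):
            for r in team:
                allocation[r] = task
    return allocation

def plan_path(state, responder, task):
    # Placeholder for the single-agent path planner of Sect. 4.2 (RTDP).
    return ("move_towards", task)

def online_plan(state, responders, tasks):
    # Top level: task planning for the whole team given the current state.
    allocation = solve_task_allocation(estimate_team_values(state, responders, tasks))
    # Lower level: each responder plans her own path to the assigned task.
    return {r: plan_path(state, r, allocation[r]) if r in allocation else ("stay", None)
            for r in responders}

# The loop interleaves planning and execution: after the joint action is
# executed and the new state observed, online_plan is called again.
print(online_plan(state={}, responders=["p1", "p2", "p3"], tasks=["t1", "t2"]))
```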

4.1 Task planning

As described in Sect. 3.1, each FR \(p_i\) is of a specific type \(\theta _i \in \varTheta \) that determines which task she can perform and a task \(t\) can only be completed by a team of responders with the required types \(\varTheta _t\). If, at some point in the execution of a plan, a responder \(p_i\) is incapable of performing a task (e.g., because she is tired or suffered a high radiation dose), she will be removed from the set of responders under consideration (that is \(I \rightarrow I \setminus p_i\)). This information can be obtained from the state \(s \in S\). When a task is completed by a chosen team, the task is simply removed from the set (that is \(T \rightarrow T\setminus t_k\) if \(t_k\) has been completed).

Now, to capture the efficiency of groupings of FRs at performing tasks, we define the value of a team, \(v(C_{jk})\), which reflects the level of performance of team \(C_{jk}\) in performing task \(t_j\). This is computed from the estimated rewards the team obtains for performing \(t_j\) (as we show below). Then, the goal of the task planning algorithm is to find an assignment of teams to tasks that maximises the overall team performance given the current state \(s\), i.e., \(\sum _{j=1}^m v(C_{j})\) where \(C_j\) is the team selected for task \(t_j\) and \(\{ C_1, \cdots , C_m \}\) is a partition of \(I\) (\(\forall j\ne j', C_j \bigcap C_{j'} = \emptyset \) and \(\bigcup _{j=1}^m C_j=I\)). In what follows, we first detail the procedure to compute the value of all teams that are valid in a given state and then proceed to detail the main algorithm to allocate tasks. Note that these algorithms take into account the uncertainties captured by the transition function of the MMDP.

4.1.1 Team value calculation

The computation of \(v(C_{jk})\) for each team \(C_{jk}\) is challenging because not all tasks can be completed by one allocation (there are usually more targets than FRs). Moreover, the policy after completing task \(t_j\) must also be computed by the agent, which is time-consuming given the number of states and joint actions. Given this, we propose to estimate \(v(C_{jk})\) through several simulations. This is much cheaper computationally as it avoids computing the complete policy to come up with a good estimate of the team value, though we may not be able to evaluate all possible future outcomes. According to the law of large numbers, if the number of simulations is sufficiently large, the estimated value will converge to the true \(v(C_{jk})\). This process is outlined in Algorithm 2.

Algorithm 2 Estimating the team value \(v(C_{jk})\) by simulation

In each simulation of Algorithm 2, we first assign the FRs in \(C_{jk}\) to task \(t_j\) and run the simulator starting from the current state \(s\) (Line 4). After task \(t_j\) is completed, the simulator returns the sum of the rewards \(r\) and the new state \(s'\) (Line 4). If all the FRs in \(C_{jk}\) are incapable of doing other tasks (e.g., suffered radiation burns), the simulation is terminated (Line 5). Otherwise, we estimate the expected value of \(s'\) using Monte-Carlo Tree Search (MCTS) [29] (Line 8), which provides a good trade-off between exploitation and exploration of the policy space and has been shown to be efficient for large MDPs. After \(N\) simulations, the average value is returned as an approximation of the team value (Line 10).
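A compact sketch of this estimation loop is given below; the simulator interface and the dummy dynamics are hypothetical stand-ins for the game simulator and the MCTS rollouts described above.

```python
# Sketch of the team-value estimation of Algorithm 2 (shown as a figure in
# the original). DummySimulator is a hypothetical stand-in; only the
# averaging over N rollouts is the point of the illustration.
import random

class DummySimulator:
    """Stand-in for the game simulator; real dynamics come from the MMDP."""
    def simulate_until_done(self, s, team, task):
        reward = random.uniform(0.0, 10.0)        # reward accrued completing the task
        team_capable = random.random() > 0.1      # team still able to act afterwards
        return reward, s, team_capable

    def mcts_value(self, s):
        return random.uniform(0.0, 5.0)           # placeholder for the MCTS estimate

def estimate_team_value(sim, s, team, task, n_sims=100):
    total = 0.0
    for _ in range(n_sims):
        r, s_next, capable = sim.simulate_until_done(s, team, task)
        # If the team can still act, add the estimated value of continuing from s_next.
        total += r + (sim.mcts_value(s_next) if capable else 0.0)
    return total / n_sims  # converges towards v(C_jk) as n_sims grows

print(estimate_team_value(DummySimulator(), s={}, team=("p1", "p2"), task="t1"))
```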

The basic idea of MCTS is to maintain a search tree where each node is associated with a state \(s\) and each branch is a task assignment for all FRs. To implement MCTS, the main step is to compute an assignment for the free FRs (a FR is free when she is capable of doing tasks but not assigned to any) at each node of the search tree. This can be computed by Eq. 3 using the team values estimated by the UCB1 heuristic [2] to balance exploitation and exploration:

$$\begin{aligned} v(C_{jk}) = \overline{v(C_{jk})} + c\sqrt{\frac{2\ln N(s)}{N(s, C_{jk})}} \end{aligned}$$
(2)

where \(\overline{v(C_{jk})}\) is the average value of team \(C_{jk}\) at state \(s\) so far, \(c\) is a trade-off constant, \(N(s)\) is the visiting frequency of state \(s\), and \(N(s, C_{jk})\) is the frequency with which team \(C_{jk}\) has been selected at state \(s\). Intuitively, if a team \(C_{jk}\) has a higher average value in the trials so far, or has rarely been selected in previous visits, it has a higher chance of being selected at the next visit to the tree node.
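For illustration, a minimal UCB1-style selection rule over candidate teams might look as follows (the bookkeeping dictionaries are our own and purely illustrative):

```python
# Minimal UCB1 team-selection sketch (standard formulation from [2]); the
# counters and value table are illustrative bookkeeping structures.
import math

def ucb1_select(teams, mean_value, n_state, n_team, c=1.0):
    """Pick the team maximising mean value + exploration bonus at a tree node."""
    def score(team):
        if n_team.get(team, 0) == 0:
            return float("inf")            # try unvisited teams first
        return mean_value[team] + c * math.sqrt(2.0 * math.log(n_state) / n_team[team])
    return max(teams, key=score)

# Example: two candidate teams at a node visited 20 times.
teams = ["C_11", "C_12"]
print(ucb1_select(teams, {"C_11": 4.2, "C_12": 3.9}, 20, {"C_11": 12, "C_12": 8}))
```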

As we assume that the type of a FR and the role requirements of each task are static, we can enumerate all possible teams offline. Therefore, in the online phase, we only need to filter out the teams for completed tasks and those containing incapacitated FRs to compute the team set \(\{ C_{jk} \}\). Note that the expected health levels of the FRs after completing tasks are considered during the simulations if they enter the radioactive cloud. To simplify the simulations, we assume that there are no rejections from the FRs when completing the tasks.

4.1.2 Coordinated task allocation

Given the team values \(v(C_{jk})\) computed above, we then solve the following optimisation problem to find the best solution:

$$\begin{aligned} \begin{array}{lll} \max \limits _{x_{jk}} &{} \sum _{j, k} x_{jk} \cdot v(C_{jk}) &{} \\ \text{ s.t. } &{} x_{jk} \in \{0, 1\} &{} \\ &{} \forall j, \sum _{k} x_{jk} \le 1 &{} \text{(i) } \\ &{} \forall i, \sum _{j, k} x_{jk} \cdot \delta _i(C_{jk}) \le 1 &{} \text{(ii) } \end{array} \end{aligned}$$
(3)

where \(x_{jk}\) is a Boolean variable indicating whether team \(C_{jk}\) is selected for task \(t_j\) or not, \(v(C_{jk})\) is the value of team \(C_{jk}\), and \(\delta _i(C_{jk}) = 1\) if FR \(p_i\in C_{jk}\) and 0 otherwise. In the optimisation, constraint (i) ensures that a task \(t_j\) is allocated to at most one team (a task does not need more than one group of FRs) and constraint (ii) ensures that a FR \(p_i\) is assigned to at most one task (a FR cannot do more than one task at the same time). This is a standard mixed integer linear program (MILP) that can be efficiently solved using off-the-shelf solvers (e.g., IBM CPLEX or lp_solve).
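As an illustration, the sketch below encodes Eq. 3 with PuLP, an open-source modelling library used here purely as a stand-in for the solvers mentioned above; the team values and memberships are made-up toy numbers.

```python
# Illustrative encoding of the allocation MILP of Eq. (3) using PuLP (the
# paper mentions CPLEX / lp_solve; PuLP is simply a convenient open-source
# stand-in here). Team values and memberships are toy numbers.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

# v[(j, k)]: value of candidate team C_jk; members[(j, k)]: responders in it.
v = {("t1", 0): 5.0, ("t1", 1): 3.0, ("t2", 0): 4.0}
members = {("t1", 0): {"p1", "p2"}, ("t1", 1): {"p1", "p3"}, ("t2", 0): {"p2", "p3"}}
tasks = {"t1", "t2"}
responders = {"p1", "p2", "p3"}

prob = LpProblem("task_allocation", LpMaximize)
x = {jk: LpVariable(f"x_{jk[0]}_{jk[1]}", cat=LpBinary) for jk in v}

prob += lpSum(x[jk] * v[jk] for jk in v)                 # objective
for j in tasks:                                          # (i) at most one team per task
    prob += lpSum(x[jk] for jk in v if jk[0] == j) <= 1
for i in responders:                                     # (ii) at most one task per FR
    prob += lpSum(x[jk] for jk in v if i in members[jk]) <= 1

prob.solve()
print({jk: int(x[jk].value()) for jk in v})
```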

4.1.3 Adapting to FR requests

An important characteristic of our approach is that it can easily incorporate the preferences of the FRs. For example, if a FR declines a task allocated to her by the planning agent, we simply filter out the candidate teams for that task that contain this FR. By so doing, the FR will not be assigned to the task. Moreover, if a FR prefers to do tasks with another FR, we can increase the weights of the teams that contain both of them in Eq. 3 (by default, all teams have identical weights of 1.0). Thus, our approach is adaptive to the preferences of the human FRs.
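A sketch of how such rejections and pairing preferences could be folded into the candidate-team set before solving Eq. 3 (the data structures and the boost factor are illustrative assumptions):

```python
# Sketch: rejected (responder, task) pairs remove the corresponding teams,
# while preferred pairings receive a weight > 1 before Eq. (3) is solved.
def apply_preferences(v, members, rejections, preferred_pairs, boost=1.5):
    """rejections: set of (responder, task); preferred_pairs: set of frozensets."""
    adjusted = {}
    for (task, k), value in v.items():
        team = members[(task, k)]
        if any((r, task) in rejections for r in team):
            continue                       # drop teams violating a rejection
        weight = boost if any(pair <= team for pair in preferred_pairs) else 1.0
        adjusted[(task, k)] = weight * value
    return adjusted

v = {("t1", 0): 5.0, ("t1", 1): 3.0}
members = {("t1", 0): {"p1", "p2"}, ("t1", 1): {"p1", "p3"}}
print(apply_preferences(v, members, rejections={("p2", "t1")},
                        preferred_pairs={frozenset({"p1", "p3"})}))
# -> {('t1', 1): 4.5}: the team containing p2 is removed, the p1+p3 team is boosted.
```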

4.2 Path planning

In the path planning phase, we compute the best path for a FR to her assigned task. Our approach accommodates the uncertainties in the radioactive cloud and in the FR’s actions. We model this problem as a single-agent MDP defined by the tuple \(\mathcal {M}_i = \langle S_i, A_i, P_i, R_i \rangle \), where: (1) \(S_i = S^G_r \times S_{p_i}\) is the state space, (2) \(A_i\) is the set of \(p_i\)’s actions, (3) \(P_i = P_r \times P_{p_i}\) is the transition function, and (4) \(R_i\) is the reward function. At this level, FR \(p_i\) only needs to consider the states of the radioactive cloud \(S^G_r\), her own state \(S_{p_i}\), and her moving actions. Similarly, the transition function only needs to consider the spreading of the radioactive cloud \(P_r\) and the changes in her location and health level when moving in the field \(P_{p_i}\), and the reward function only needs to consider the cost of moving to a task and the penalty for receiving high radiation doses. This is a typical MDP that can be solved by many existing solvers (see the most recent survey [35]). We choose Real-Time Dynamic Programming (RTDP) [4] because it is simple and particularly suits our problem, that is, a goal-directed MDP with a large number of states. However, other approaches for solving large MDPs could equally be used here.

We use several techniques to speed up the convergence of RTDP. In our problem, the map is static. Thus, we can initialise the value function \(V(s)\) using the cost of the shortest path between the current location and the task location on the map, which can be computed offline without considering the radioactive cloud. This helps RTDP quickly navigate among the obstacles (e.g., buildings, water pools, blocked roads) without getting trapped in dead-ends during the search.
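The sketch below illustrates this idea on a toy grid: BFS shortest-path costs on the static map seed the value function, after which greedy RTDP-style trials refine it with a radiation penalty. The grid, costs, penalty weight, and noise level are all illustrative assumptions, not the deployed code.

```python
# Toy sketch of the heuristic initialisation described above: shortest-path
# costs on the static map (ignoring the cloud) seed the RTDP value function.
from collections import deque
import random

def shortest_path_costs(grid, goal):
    """BFS distances (in steps) from every free cell to the goal (0 = free cell)."""
    h, w = len(grid), len(grid[0])
    dist = {goal: 0}
    q = deque([goal])
    while q:
        x, y = q.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (dx, dy) != (0, 0) and 0 <= nx < w and 0 <= ny < h \
                        and grid[ny][nx] == 0 and (nx, ny) not in dist:
                    dist[(nx, ny)] = dist[(x, y)] + 1
                    q.append((nx, ny))
    return dist

def rtdp(grid, start, goal, radiation, trials=200, move_cost=1.0, rad_penalty=0.2):
    # Initialise V with the static shortest-path heuristic.
    V = {cell: float(d) for cell, d in shortest_path_costs(grid, goal).items()}
    for _ in range(trials):                              # greedy trials from the start
        s = start
        for _ in range(200):
            if s == goal:
                break
            neighbours = [c for c in V
                          if c != s and abs(c[0] - s[0]) <= 1 and abs(c[1] - s[1]) <= 1]
            # Bellman backup: movement cost plus a penalty for radiation at the next cell.
            q = {c: move_cost + rad_penalty * radiation.get(c, 0.0) + V[c] for c in neighbours}
            best = min(q, key=q.get)
            V[s] = q[best]
            # Act greedily, with a small chance of drifting (action uncertainty).
            s = best if random.random() > 0.1 else random.choice(neighbours)
    return V

grid = [[0] * 5 for _ in range(5)]                        # 5x5 open map, no obstacles
V = rtdp(grid, start=(0, 0), goal=(4, 4), radiation={(2, 2): 50.0})
print(round(V[(0, 0)], 1))                                # estimated cost-to-go from the start
```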

Since, in this paper, we focus on the integration and validation of the algorithm in a real-world deployment, we leave the presentation of computational simulation results and comparisons with other agent-based planning solutions (using our MMDP formulation) to Appendix 3. As argued earlier (see Sect. 2.3), while computational simulations are useful to exemplify extreme cases, they do not explain how human FRs and the planning agent actually collaborate. We resolve this issue via a real-world trial of our algorithm as part of a planning agent next.

5 The AtomicOrchid platform

In this section, we describe the AtomicOrchid environment used to embed the planning agent in order to trial mixed-initiative coordination. In more detail, AtomicOrchid is a location-based mobile game based on the fictitious scenario described in Sect. 3. First responders are assigned a specific type: medic, fire-fighter, soldier, or transporter. Their mission is to evacuate four types of targets: victims (require a medic and a fire-fighter), animals (medic and transporter), fuel (soldier and fire-fighter), and other resources (soldier and transporter) before the disaster space is covered by a radioactive cloud (which we simulate using a diffusion process described in Appendices 1 and 2). The first responders are supported by (at least) one person in a centrally located HQ room, and by the planning agent \(PA\) that sends the next task (as described in the previous section) to the team of first responders. In what follows, we present the player interfaces used and the interactions with the planning agent in the game. A video of the operation of AtomicOrchid can be viewed at: http://bit.ly/1ebNYty.

5.1 Player interfaces

FRs are equipped with a ‘mobile responder tool’ providing sensing and awareness capabilities in three tabs (Geiger counter and map, messaging, and tasks; see Fig. 1). The first tab shows a reading of radioactivity, the player’s health level (based on exposure), and a GPS-enabled map of the game area to locate fellow FRs, the targets to be rescued, and the drop-off zones for the targets. The second tab provides a broadcast messaging interface to communicate with fellow FRs and the commander \(H\). The third tab shows the team and task allocation dynamically provided by the agent \(PA\), which can be accepted or rejected. Notifications are used to alert to both new messages and task allocations.

Fig. 1 Mobile responder tool

\(H\) has at her disposal an ‘HQ dashboard’ that provides an overview of the game area, including real-time information on the players’ locations (see Fig. 2). The dashboard provides a broadcast messaging widget and a player status widget so that the responders’ exposure and health levels can be monitored. \(H\) can further monitor the current team and task allocations made to individual responders by \(PA\) (by clicking on a button). The radioactive cloud (see Appendices 1 and 2 for more details) is graphically depicted as a heatmap (‘hotter’ red zones correspond to higher levels of radiation). Crucially, only \(H\) can ‘see’ the entire cloud, while field responders are restricted to seeing a reading for their current location on their Geiger counters—this is a deliberate design choice to require frequent communication between HQ and field responders.

Fig. 2 HQ interface

5.2 System architecture

AtomicOrchid is based on the open-source geo-fencing game MapAttack, which has been iteratively developed for a responsive, (relatively) scalable experience. The location-based game is underpinned by a client-server architecture that relies on real-time data streaming between client and server. Client-side requests for less dynamic content use HTTP. Frequent events (e.g., location updates and radiation exposure) are streamed to clients to avoid the overhead of HTTP. In this way, FRs are kept informed in near real time. Finally, to build the mobile app, we adapted the existing MapAttack Android app.

5.3 Integrating the planning agent

The planning agent \(PA\) takes the game status (i.e., positions of players, known status of the cloud, and messages received from players) as input and produces a plan for each FR for the current state. \(PA\) is deployed on a separate server. The AtomicOrchid server requests a plan from the agent via a stateless HTTP interface by transmitting the game status in JSON format. Polling (and thus re-planning) is triggered by two types of game events:

  • Completion of task On successful rescue of a target, a new plan (i.e., allocation of tasks to each FR) is requested from the agent.

  • Explicit reject On rejection of a task allocation by any of the allocated FRs, a new plan is requested. More importantly, the rejected allocation is, in turn, used as a constraint within the optimisation run by the planning agent (as described in Sect. 4.1.3). For example, if two FRs, one medic and one soldier, were allocated a task and the medic rejected it, the planning agent would rerun the optimisation algorithm with the constraint that this medic should not be allocated this task. If another medic is free (and accepts), the soldier and this medic can go ahead and complete the task. Otherwise, a new plan is created for the soldier. We provide a depiction of the flow of control between \(PA\), \(H\), and the FRs in Fig. 3, highlighting the types of messages that may be sent (e.g., \(H\) will supervise instructions sent by \(PA\) and advise FRs rather than instruct them as \(PA\) does, while FRs will simply reject \(PA\)’s instructions with a tap on the mobile responder tool, whereas they will normally respond to \(H\) over a chat interface).

Fig. 3 Interactions between \(PA\), FRs, and \(H\)

Once a plan is received from \(PA\), the AtomicOrchid game engine splits the plan for a given team into individual task allocations for each player and sends them to their mobile FR app. The app presents the task allocation in the task tab, detailing: (i) the FR to team up with, (ii) the allocated target (using the target id), and (iii) the approximate direction of the target (e.g., north, east). Once a player accepts a task, an acknowledgement is sent to their team-mate, while rejecting a task triggers a new assignment from \(PA\).
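The paper does not specify the wire format beyond stating that the game status is sent as JSON over HTTP and a per-responder plan is returned; the snippet below therefore only illustrates the kind of exchange described, with a hypothetical endpoint and hypothetical field names.

```python
# Illustration of the HTTP/JSON exchange described above. The endpoint URL
# and all field names are hypothetical; the system only specifies that game
# status is sent as JSON and a plan is returned for each responder.
import json

game_status = {
    "responders": [
        {"id": "p1", "type": "medic",   "pos": [12, 30], "health": 85},
        {"id": "p2", "type": "soldier", "pos": [14, 28], "health": 90},
    ],
    "tasks": [{"id": "t7", "type": "victim", "pos": [20, 25], "status": "idle"}],
    "cloud": {"readings": [[20, 25, 35.0]]},            # (x, y, radiation level)
    "rejections": [{"responder": "p1", "task": "t3"}],  # becomes a planning constraint
}

# e.g. POST the status to the planning agent and read back one task per FR:
#   response = requests.post("http://pa.example/plan", json=game_status)
plan = {"p1": {"task": "t7", "teammate": "p2", "direction": "east"},
        "p2": {"task": "t7", "teammate": "p1", "direction": "east"}}
print(json.dumps(plan, indent=2))
```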

Now, in order to create a challenging, dynamic, and pressured setting for the game players, \(H\), and \(PA\), we simulate a radioactive cloud that moves according to simulated wind conditions that mimic real-world cloud movement behaviours. The model we use for this purpose is described in Appendices 1 and 2.

6 Field trialling AtomicOrchid

We ran three sessions of AtomicOrchid with participants recruited from the University of Nottingham to trial mixed-initiative coordination in a disaster response scenario. The following sections describe the participants, procedure, session configuration and methods used to collect and analyse quantitative and qualitative data.

6.1 Participants and procedure

Our approach to field trials is strongly influenced by [11, p. 1663] who state that “participants in field trials are used as experts on their own activity, attempting to predict what might happen with a particular technology, to develop insight based on their use”. In particular, they point to the fact that running multiple trials (in an attempt to obtain statistically significant results) has no relationship to insightfulness. Crucially, they suggest that participants can be seen as investigators themselves in the trial. Hence, in line with this argument, we do not attempt to run tens or hundreds of trials on many different scenarios to generate statistically significant results. Rather, we focus on running enough trials to gain an insight into how participants perceive and interact with our planning agent.

Hence, a total of 24 participants (17 males and 7 females) were recruited through posters and emails, and reimbursed with £15 for 1.5–2 h of study. The majority were students. The procedure consisted of 30 min of game play, and about 1 h in total of pre-game briefing, consent forms, a short training session, and a post-game group discussion. The participants were told that their goal was to save as many targets as possible (out of a total of 20) before being engulfed by the ‘radioactive cloud’. Observations in the trial (such as running and screaming players) showed that this game mechanic put participants under the kind of stress that incites a real visceral response, supporting the objective of our mixed-reality game approach (see Sect. 2.4).

At the end of the briefing, in which mission objectives and rules were outlined, FR types were randomly assigned to all participants (fire-fighter, medic, transporter, soldier). The HQ was staffed by a different member of the research team in each session in order to mimic an experienced \(H\), while avoiding the same person running the HQ every time. Moreover, FRs were provided with a smartphone and \(H\) with a laptop. The team was given 5 min to discuss a common game strategy.

FRs were then accompanied to the starting point within the designated game area, about 1 min walk from headquarters. Once FRs were ready to start, \(H\) sent a ‘game start’ message. After 30 min of game play the FRs returned to the HQ where a group interview was conducted, before participants were debriefed and dismissed.

6.2 Game sessions

We ran one session without \(PA\), and two sessions with \(PA\) to be able to compare team performance in the two versions. Each session involved different sets of FRs (8 each). Thus, players were unique to a session to avoid learning effects between sessions. While we cannot rule out learning effects within sessions, we accept this as an integral part of working with humans in the real world.

We also ran a pilot study for each condition to fine tune game configuration. The 8 FRs in each session were randomly allocated a role so that the whole team had two of each of the four types of FRs. The terrain of the \(400\times 400\) metre game area includes grassland, a lake, buildings, roads, footpaths and lawns. There were two drop-off zones and 16 targets in each session. There were four targets for each of the four target types. The target locations, pattern of cloud movement and expansion were kept constant for all game sessions. The pilot study showed that this was a challenging, yet not too overwhelming configuration of game area size, and number of targets to collect in a 30 min game session.

6.3 Data collection and analysis

We collected more than 20 h of video of participants at the HQ, and participants on the ground (using multiple cameras to follow multiple teams at the same time). Video recordings of field action were catalogued to identify sequences (episodes) of interest through identification of key decision points in teaming and task allocation. Interesting distinct units of interaction were transcribed and triangulated with log files of relevant game activity for qualitative interaction analysis, presented in detail elsewhere [25]. For the purposes of the analysis presented here, we analysed the video data to classify how the agent task allocations were handled by humans.

In addition, we developed a log file replay tool to triangulate video recordings of game action with the timestamped system logs that contain a complete record of the game play, including FRs’ GPS location, their health status and radioactive exposure, messages, cloud location, locations of target objects and task status.
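As an illustration, the kind of timestamped record the replay tool works over, and its triangulation with video timecodes, could be sketched as follows; the field names and the helper function are hypothetical and do not reflect the actual log format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical record structure: each log entry is timestamped and covers the
# categories listed above (GPS location, health/exposure, messages, cloud and
# target locations, task status).
@dataclass
class LogEvent:
    timestamp: float                                  # seconds since game start
    kind: str                                         # e.g. "gps", "health", "message", "cloud", "task"
    player_id: Optional[str] = None                   # FR identifier, if applicable
    position: Optional[Tuple[float, float]] = None    # lat/lon for GPS and cloud events
    health: Optional[int] = None                      # reduced by radioactive exposure
    payload: Optional[dict] = None                    # message text, task status, target info, ...

def events_around(log: list, t: float, window: float = 30.0) -> list:
    """Return all log events within +/- `window` seconds of a video timecode `t`,
    sketching how video episodes were triangulated with the system logs."""
    return [e for e in log if abs(e.timestamp - t) <= window]
```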

In what follows, we present an analysis of both \(PA\)’s task allocations and the messages sent between FRs and \(H\) to support coordination in order to assess how humans interact with each other and with \(PA\). In particular, we use speech-act theory [54] to classify the messages composed by humans. We focus on the most relevant types of acts in this paper (which are also the most frequently used in AtomicOrchid):

  • Assertives: speech acts that commit a speaker to the truth of the expressed proposition; these were a common category as they include messages that contain situational information (e.g., You are next to the radiation cloud or the task is North of your current position).

  • Directives: speech acts that are meant to cause the hearer to take a particular action; these include requests, commands and advice, such as task and team allocation messages (e.g., X go to task 1, Y go with X).

6.4 Results

The participants were seen to engage with the game scenario, showing signs of stress when targets could not be saved and, at times, running to save targets before these were covered by the radioactive cloud. This confirmed that the fun factor of the game helped to incentivise the participants to optimise their task allocation plans. Figure 4a, b show two example traces of player movements during the games that demonstrate different levels of coordination and how it materialises in the players' choices of whom to pair up with and where to go. Specifically, Fig. 4a shows an example of poor coordination whereby players go multiple times to the same spot and different teams attempt to perform the same task. Some of this confusion was seen to result from delays in communicating and GPS jitter. On the other hand, Fig. 4b shows an example of good coordination where only pairs of players go to individual positions and avoid trying to complete other teams' tasks.

Fig. 4 Traces of player movements during two games. a Players are seen to herd to different tasks with no clear coordination. User ids in the game are shown in the legend. b Players are seen to pair up with different partners at different times, indicating good coordination

Overall, 8 targets were rescued in the non-agent condition (Session A), while 12 targets (Session B) and 11 targets (Session C) were rescued in the agent condition. Teams (re-)formed six times in Session A, four times in Session B and nine times in Session C. Average player health after the game was much higher (more than double) for the agent-assisted sessions (91 for Session B and 72 for Session C) than for the non-agent-assisted session (40 in Session A) (see Table 1). In fact, one FR ‘died’ in Session A. This suggests that the agent's more conservative approach of not sending FRs into ‘harm's way’ paid off. Surprisingly, this more conservative approach did not result in fewer targets saved than the arguably more risk-seeking approach taken by the HQ in Session A. This may be explained by the agent's more timely instructions, as these were generated automatically after a target had been dropped off.

\(PA\) dynamically re-planned 14 times in session B and 18 times in session C. In most cases, this was triggered when a target was dropped off in the safe zone (24 times)—as this frees up resources for \(PA\) to recompute an allocation. In the remaining cases, this was triggered by a player declining the agent’s task allocation (8 times).
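For clarity, the two re-planning triggers can be summarised in the short sketch below; the event, planner and world-state interfaces shown are assumptions for illustration only, not AtomicOrchid's API.

```python
# Illustrative sketch of the two re-planning triggers reported above; the
# handler and planner interfaces are hypothetical.
def on_game_event(event, planner, world_state):
    if event.kind == "target_dropped_off":
        # Dropping a target at the safe zone frees the pair of FRs that carried it,
        # so the agent recomputes the allocation with the freed resources.
        world_state.mark_responders_free(event.carriers)
        return planner.recompute_allocation(world_state)
    if event.kind == "task_rejected":
        # A rejection is treated as a constraint: do not propose the rejected
        # pairing/task again, then re-plan for the affected responders.
        world_state.forbid(event.responder, event.task)
        return planner.recompute_allocation(world_state)
    return None  # other events do not trigger re-planning
```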

In particular, Fig. 5 shows how FRs handled task allocations in the agent and non-agent conditions. In the non-agent condition, the HQ commander sent 43 task allocation directives (see Table 2 for details of each run). Of these, the recipient FRs addressed only 15 messages (bringing them up in conversation). Of these 15, FRs chose to ignore an instruction only once, because they were engaged in another task and did not want to abandon it. A further 4 of \(H\)'s instructions were consistent with a decision to rescue a certain target that had already been agreed locally by the FRs. In the remaining 10 cases, FRs chose to follow the instructions. However, although players were willing to follow \(H\)'s instructions, they often failed to carry them out correctly due to confusion and misunderstanding in the communication. In fact, only 2 instances of directives from \(H\) led to task completion. The FRs performed 6 rescue operations (tasks) without being instructed by \(H\).

Table 1 Health of FRs.
Fig. 5 How task allocations were handled by FRs in the version with agent (left), and without agent (right). The ‘overridden’ box denotes the number of times an ‘accepted’ allocation in the non-agent case was not carried out to its full extent and, instead, another task was chosen by \(H\) to be completed

Table 2 Message classification

In contrast, when task allocation was handled by the agent (on average, 52 tasks allocated across the two trials), FRs explicitly accepted 24 tasks, of which they completed 15 successfully. For a further 17 allocated tasks there was either no response or no consensus among the FRs, yet 6 of these tasks were still completed successfully. In total, 20 task allocations were withdrawn by the agent as a result of re-planning.

In terms of task rejections, FRs rejected \(PA\)'s task allocations 11 times in the agent version. All of the rejections happened when the task allocation would have split existing teams, or instructed FRs to team up with physically more distant FRs. In most cases (9 out of 11), the rejection triggered re-planning that adjusted the task allocation to be consistent with the FR's current team. In the other two cases, the FRs had to reject the task allocation one more time before receiving the desired task allocation. For accepted instructions, the average distance between suggested teammates was 12 metres; for rejected instructions, it was 86 metres.

The results above show that the simple mechanism for getting \(PA\) to re-plan (i.e., reject/accept) was more successful (more tasks completed and less confusion) than the open-ended interactions between \(H\) and the FRs (which were prone to confusion). Moreover, the fact that many of the rejections were due to the long distance to travel and to teammate preference implies that players chose to do the tasks they preferred rather than those deemed optimal by the agent. This indicates that there may be an issue of trust in the agent, but also that it may be easier for an FR to impose (through a reject) such preferences on an agent (and indirectly on other team members) than to express them to \(H\) or directly to teammates. It is also important to note that in the agent-assisted setting, \(H\) frequently monitored the allocation of tasks returned by the agent (57 clicks on ‘show task’ in the FR status widget of the UI). Whereas 43 directives out of 68 in the non-agent session were task allocations, only 16 out of 68 were directly related to task allocations in the agent version. Out of these, \(H\) directly reinforced the agent's instruction 6 times (e.g., “SS and LT retrieve 09”), and complemented (i.e., added to or elaborated) \(PA\)'s task allocation 5 times (e.g., “DP and SS, as soon as you can head to 20 before the radiation cloud gets there first”). \(H\) did ‘override’ \(PA\)'s instruction in 5 cases (as depicted by the ‘overridden’ box).

In the agent version, most of \(H\)’s directives (52 out of 68) and assertives (49 out of 51) focussed on providing situational awareness and routing the FRs to avoid exposing them to radiation. For example, “NK and JL approach drop off 6 by navigating via 10 and 09.”, or “Radiation cloud is at the east of the National College”.

These quantitative results point to the fact that \(H\) used \(PA\) as an advisor, whereas in the non-agent case \(H\) had to provide more information to the FRs on the ground. In fact, during the debriefing session, \(H\) operators in the non-agent case explicitly mentioned being overloaded by the number of variables under consideration when allocating tasks to FRs (e.g., providing situational awareness while also allocating tasks). When supported by \(PA\), however, the participants mentioned that they trusted the agent to make the right decision and mainly spent time supervising its allocations, or even improving them in some cases. For example, in one instance, \(H\) explicitly advised FRs to go for a task that \(PA\) calculated would be covered by the radiation cloud before they could get there. However, \(PA\)'s estimate of the FRs' running speed was clearly wrong in this case and the FRs were able to rescue the target.

In summary, these results suggest three key observations with regard to human–agent coordination in the trial:

  1. FRs performed better (rescued more targets) and maintained higher health levels when supported by the agent. These results echo those obtained under simulation (see Appendix 3) and may reflect the better forward-planning capability of the planning agent compared to human FRs in the specific task allocation context under consideration. However, this result applies within the specific setting where the agent is able to model the priorities of the FRs very precisely (as per our constrained scenario). In a more ill-defined setting, the precision of such a model is likely to degrade, and human-driven strategic decision making may then take precedence over the agent (see point 3 below).

  2. Rejecting tasks was relatively frequently employed to trigger re-planning and obtain new task allocations aligned with FR preferences. In each case, the planning agent was able to adapt and provide an alternative that was acceptable to the FRs. Without this facility we believe the FRs would have chosen to ignore the plan (as experienced in our pilot studies). Task rejection seemed to be linked to changes to established teams, especially when members were relatively distant. Consequently, these kinds of allocations may need particular support (e.g., explanation) or might be selected less often by \(PA\).

  3. When task allocation was handled by \(PA\), \(H\) could focus on providing contingent and vital situational awareness to safely route FRs around danger zones, thus demonstrating an effective division of labour and complementary collaboration between humans and agents. We suggest that this division of labour contributes substantially to the observed performance increase of FRs when supported by the agent.

Given the above observations, we argue that a planning agent for team formation should not only model the uncertainty in player behaviours and in the environment, but that interactional challenges also need to be addressed if such a technology is to be accepted in practice. In the next section we elaborate on how our study results lead to new design guidelines for human–agent collectives.

7 Discussion and conclusions

In this paper we developed a novel approach to evaluating human–agent collaboration in uncertain, dynamic, and real-time environments. We developed a planning agent (using an MMDP approach) that is able to allocate tasks to teams in real-time and showed how it outperforms other baseline approaches. Furthermore, we demonstrated how such a planning agent can be used by a human commander in AtomicOrchid, a mixed-reality game that simulates a real-world disaster and requires participants to collaborate and perform tasks in a physical environment.

Results from our field trials indicate that the planning agent guided players to carry out successful plans (outperforming the no-agent setting in terms of tasks completed and responders unharmed). The agent's ability to re-plan according to responders' preferences and constraints was particularly effective, as evidenced by the acceptance levels of agent-issued instructions. In particular, based on our analysis, we propose the following design guidelines for human–agent collaboration in human–agent collectives:

Adaptivity Our experience with AtomicOrchid suggests that planning algorithms should be designed to take in human input and, more importantly, be responsive to the needs of the users. As we saw in AtomicOrchid, players repeatedly requested new tasks, and this would not have been possible had our algorithm not been computationally efficient and able to dynamically assimilate updates, requests, and constraints. We believe this makes the algorithm more acceptable to the users. However, this adaptivity does cost the system in terms of efficiency, as the rejection of tasks may leave the problem so constrained that the algorithm cannot return any solution. To alleviate such issues, we believe human mediation may be important in nudging the players to justify their rejection of tasks, or in nudging them not to reject too frequently.
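The sketch below illustrates the point about rejections accumulating as constraints until no feasible plan remains, at which point human mediation is needed; the planner and world-state interfaces are hypothetical and shown only to make the argument concrete.

```python
# A minimal sketch (not the deployed implementation) of how accumulated task
# rejections can over-constrain the allocation problem.
def allocate_with_rejections(planner, world_state, rejected_pairs):
    """Re-plan while honouring every rejection received so far.
    Returns (plan, needs_human_mediation)."""
    constrained = world_state.copy()
    for responder, task in rejected_pairs:
        constrained.forbid(responder, task)   # never re-propose a rejected pairing
    plan = planner.recompute_allocation(constrained)
    if plan is None:
        # Rejections have made the problem infeasible: the human commander should
        # mediate, e.g. by asking FRs to justify or relax some of their rejections.
        return None, True
    return plan, False
```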

Interaction simplicity Our agent was designed to issue simple commands (Do X with Y) and respond to simple requests (OK or Reject Task). Such simple messages were shown to be far more effective at guiding players to do the right task than the unstructured human communication in the non-agent-assisted case, which was fraught with inconsistencies and inaccuracies. In fact, we would suggest that agents should be designed with minimal options to simplify the reasoning users have to do to interact with the agent, particularly when they are under pressure to act. However, interaction simplicity in this context also means providing human responders with the interactive means to do what they are good at: dealing with unforeseen contingencies. Hence, it is important to provide unconstrained communication channels such as chat, walkie-talkies or mobile phones in addition to the ‘simple’ instructions that the agent provides. In effect, we are designing an interactional setting in which the agent deals with the routine and predictable aspects of the setting (repetitive task assignments), while the human coordinators in the HQ are freed up to deal with contingencies and the less predictable complexities as and when they arise.
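As an illustration of this minimal vocabulary, the allocation message and the two possible responses could look like the following sketch; the message schema is ours and is not AtomicOrchid's actual format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

# Hypothetical message types reflecting the 'simple command / simple response'
# style described above (Do X with Y; OK / Reject Task).
@dataclass
class TaskAllocation:
    responder_id: str                    # "Do X ..."
    teammate_id: str                     # "... with Y"
    task_id: str
    meeting_point: Tuple[float, float]   # where the pair should meet

class Response(Enum):
    OK = "accept the allocated task"
    REJECT = "decline the task and trigger re-planning"
```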

Flexible autonomy The HQ dashboard proved to be a key tool for the HQ coordinator \(H\) to check and correct the allocations of \(PA\), taking into account the real-world constraints that the players on the ground faced. In particular, letting the human oversee the agent (i.e., “on-the-loop”) at times, and actively instruct the players while bypassing the agent (i.e., “in-the-loop”) at other times, was seen to be particularly effective. This was achieved by \(H\) without the agent defining when such transfers of control should happen (as in [52]), and therefore left the coordinator the option of taking control when she judged it was needed. However, while this allows humans to choose what to do, it is not clear whether they would have been better off going with the agent's plan. Hence, we suggest that such deployed autonomous systems should be built for flexible autonomy. Specifically, interfaces should be designed to pass control seamlessly between humans and agents, and the implications of human-based “corrective” actions should be made explicit to the humans to ensure they know when to take control and when to let the agent decide.
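The sketch below captures this mode of operation, in which \(H\), not the agent, decides when to step in; the names and the function are illustrative assumptions only.

```python
from enum import Enum, auto

# Illustrative sketch of the two control modes discussed above; deciding when to
# switch between them is left to the human commander H rather than to the agent.
class ControlMode(Enum):
    ON_THE_LOOP = auto()   # H supervises: PA allocates, H monitors and reinforces
    IN_THE_LOOP = auto()   # H intervenes: H instructs FRs directly, bypassing PA

def dispatch_instruction(pa_plan, h_override=None):
    """Return the instruction actually sent to the FRs and the mode it reflects.
    An explicit override by H amounts to stepping 'into the loop' for that task."""
    if h_override is not None:
        return h_override, ControlMode.IN_THE_LOOP
    return pa_plan, ControlMode.ON_THE_LOOP
```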

Much remains to be done to further validate agent-based planning in real-world disaster response, given that field trials of AtomicOrchid are limited to volunteers and to settings that only approximate the typical environments faced by emergency responders. Hence, in future work, building upon our ethnographic studies of real-world training exercises [19], we aim to deploy our planning agent to support expert emergency responders from RescueGlobal during their annual multi-national disaster response training exercise (Angel Thunder). By so doing, we will develop new requirements for agent-based technologies for settings where users are highly trained and where the actions taken by the agent can have a major impact on the performance of the team (e.g., leading to loss of lives or waste of real resources).