1 Introduction

The Future Combat Air System (FCAS) is intended to meet the challenges of future operating environments (FOE) for European air forces. One part of this system network is the Next Generation Weapon System (NGWS), whose purpose is to penetrate denied airspace. Due to the high risk associated with this task, it is envisioned to reduce the number of manned platforms by using unmanned aerial vehicles. To investigate how this joint operation of manned and unmanned forces can be realized, we developed a laboratory prototype of cockpit and mission dynamics at the Institute of Flight Systems. In our approach, the manned assets command the unmanned aerial vehicles as well as their mission payloads. This approach is known as Manned-Unmanned Teaming (MUM-T). The term describes the interoperability of manned and unmanned assets in pursuit of a common mission objective. MUM-T requires mastering the high work demands that multi-platform mission management places on the operator. To this end, we developed intelligent automation which supports and cooperates with the pilot [1, 2]. This study describes an effort to evaluate our prototype in a human-machine experiment.

2 Background

This section details the assumptions made for the design of the MUM-T prototype, the hypotheses we address, and the background on quantifying complex human-in-the-loop experiments.

2.1 Assumptions

Future Operating Environments

The FOE is the operationalization of possible military conflicts. It is an important objective for capability development. From the guidelines of the FOE we generated realistic application scenarios with the involvement of military domain experts. The challenges resulting from these scenarios place requirements on the MUM-T prototype to be developed. A future weapon system must be capable of effective service in both permissive and contested air missions. For this reason, we must design automation approaches that are suitable for general use.

Human-Autonomy Teaming Technologies

The key aspect of a MUM-T system is to keep the pilot in the decision-making process without overtaxing him. All developed automation functions contribute to keeping the pilot’s task load in a manageable range and facilitate the accomplishment of the mission. For mission management and unmanned vehicle guidance we used a concept called task-based guidance, in which the pilot uses a generic task formulation to command unmanned vehicles [3]. The task assignment is supported by a mission planning instance that interacts with the pilot to find optimal mission plans in a mixed-initiative manner [4]. With increasing autonomy, automation-induced errors arise [5]. The onboard assistant system therefore supports the pilot in the decision-making process by continuously analyzing his current activity [6] and his mental state [7]. We decided to team the manned fighter with two types of unmanned systems:

  • Highly capable unmanned combat aerial vehicles (UCAV):

    These have fighter-jet-like platform performance. They receive mission tasks from the pilot and are able to derive a course of action, including route planning, pattern calculation and sensor/effector management. Additionally, they incorporate intelligent cognitive automation for mutual support [8].

  • Swarm UAV:

    These are cheap, disposable aerial vehicles that can be deployed in large numbers in enemy territory. They organize themselves in a decentralized manner and use different swarming algorithms depending on the assigned task. Swarming promises a variety of operational advantages [9, 10].

All platforms must be integrated into a MUM-T system, and the human-machine interaction and hierarchies must be clearly defined. For this reason, the prototype design is a human-integration challenge. The description language for Human-Autonomy Teaming [11] helps to visualize the relations present within our MUM-T system. Figure 1 shows a MUM-T system with the manned fighter jet guiding two UCAVs and one swarm network.

Fig. 1. Possible configuration of a MUM-T system. The manned fighter jet hosts the pilot, the assistant system and the platform. Each UCAV comprises a cognitive agent and the platform. Represented by an avatar agent, the entities of the swarm are considered as a single team member.
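The team structure of Fig. 1 can be illustrated with a minimal Python sketch. It is not the prototype's data model: the class names, field names and the swarm size of ten are assumptions chosen for illustration. The point it shows is that the swarm avatar counts as one team member for tasking, regardless of how many vehicles it represents.

```python
from dataclasses import dataclass, field


@dataclass
class TeamMember:
    """One entity the pilot can task (hypothetical structure)."""
    name: str
    platforms: int = 1  # number of physical vehicles this member represents


@dataclass
class MumTTeam:
    """Manned fighter plus the unmanned members it guides (hypothetical)."""
    leader: str
    members: list = field(default_factory=list)

    def tasking_targets(self):
        # The pilot delegates tasks per member, not per vehicle.
        return [m.name for m in self.members]

    def platform_count(self):
        # One manned fighter plus every vehicle behind each member.
        return 1 + sum(m.platforms for m in self.members)


team = MumTTeam(
    leader="manned fighter",
    members=[
        TeamMember("UCAV 1"),
        TeamMember("UCAV 2"),
        TeamMember("swarm avatar", platforms=10),  # assumed swarm size
    ],
)
print(team.tasking_targets())
print(team.platform_count())
```

In this sketch the pilot addresses three team members while thirteen platforms are airborne, which is the delegation effect the avatar representation is meant to achieve.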

Force Compositions

In this study, we consider the presented works on our Human-Autonomy Teaming technologies as the proposed technical solution. However, each mission scenario poses specific challenges and consequently requires a specific composition of the MUM-T system. A significant challenge in assembling effective force compositions is therefore to identify, from among the range of possibilities, the combinations of parameters that are most relevant to the mission. Identifying these interdependent parameters yields the operational requirements we can contribute to a future weapon system.

2.2 Hypotheses

Regarding the MUM-T prototype, the following hypotheses are examined.

H-1: A pilot is able to guide a MUM-T system efficiently in a military air mission.

H-2: The human places trust in the automation (cognitive agents).

H-3: The human is kept in the decision-making process.

H-4: The force composition of the MUM-T system impacts the workload imposed on the pilot.

2.3 Quantifying a Human-in-the-Loop Experiment

Kantowitz [12] suggests that the external validity of a complex human-machine experiment can be viewed as comprising three major issues: representativeness of subjects, of setting and of variables. Our professional military pilots meet the requirements for suitable subjects: due to the complexity of military air missions, only experts can assess the system design and execute the missions. Setting representativeness refers to the coherence between the test situation in which research is performed and the target situation in which the research is to be applied. The cockpit simulator and the missions were developed with domain knowledge gained from expert interviews in the squadrons. To evaluate this, the acceptance and realization of the missions were also part of the feedback questionnaire that each pilot filled out after each mission. The remaining point for the validation of our human-machine approach is the variables. For this, mission execution must be quantified using indicators for performance, efficiency and workload. Renger [13] identified four areas of indicators for the evaluation of human-machine systems:

  • Goal achievement indicators

  • Work rate indicators

  • Operability indicators

  • Knowledge indicators

We will base the discussion of the experimental results on this structure. The chosen variables relating to the individual domains are presented in Sect. 3.4.

3 HITL Experiment

The experiments were conducted with eight active German Air Force pilots of different ages, experience levels and trained platform types. Each participant was trained on our system for two days. Each pilot performed six full missions from take-off to landing to prevent incorrect results arising from a lack of immersion or decreased situational awareness.

3.1 Test Environment

Our MUM-T cockpit simulation consists of a generic fighter jet cockpit and a dome projection system as the external view. In addition to throttle and stick, three multi-touch screens are available to the participant for input. The central screen displays a tactical map and is mainly used for mission management, such as tasking unmanned assets or threat assessment. Freely configurable additional information can be shown on all displays. A non-intrusive eye-tracking system is used for gaze estimation, providing important context information for the assistant system to assess the pilot’s activity [6] (Fig. 2).

Fig. 2. Generic fighter jet cockpit simulator of the Institute of Flight Systems.

3.2 Procedure

We defined missions at three levels of difficulty, from asymmetric conflict to a symmetric peer-level opponent. Within each difficulty level the threats, mission tasks and timings are kept constant. Each difficulty level was performed with two force configurations. The configurations vary in, e.g., the number of UCAVs and swarm networks, the payload, and the authority/autonomy given to the UCAVs. We want to verify the influence and possible operational advantages of different force compositions. Figure 3 shows the experimental design with the configurations that appeared most relevant to us. The faded areas depict further possibilities in the design space.

Fig. 3. Heuristic for selecting suitable scenarios and configurations. The dimensions should be read as nominal scales. The x-axis depicts different compositions of the MUM-T team; the y-axis shows the level of mission difficulty.

In total, six full missions were conducted with each pilot. Within these, the MUM-T system had to cope with different kinds of tasks. A typical mission sequence is the penetration of enemy territory (ingress), reaching the target area to achieve the desired effect, and leaving enemy territory (egress). Enemy air defense (ground- and air-based) had to be considered and dealt with. If the objective was the engagement of a ground target, an F2T2EA (find, fix, track, target, engage, assess) cycle had to be performed. For this, an imaging-capable platform of the MUM-T system and a suitable effector must be available at the desired location. In the following, the missions and the main differences between the individual configurations are described.

MV-A (permissive): Find a few unknown mobile targets with all enemy defense sites known.

  • Configuration I: NGF (Next Generation Fighter) with a single-digit number of UCAVs.

  • Configuration II: NGF with one UCAV and a swarm network of around ten platforms.

MV-B (medium): Find a time-sensitive target within a medium sized target area (CAT 7) in partially known enemy territory.

  • Configuration III: NGF carrying a swarm (around ten entities) with a single-digit number of UCAVs.

  • Configuration IV: NGF with the same single-digit number of UCAVs.

MV-C (contested): Enable an Offensive Counter Air (OCA) mission by suppressing the belt of enemy air defenses during ingress and egress:

  • Configuration V: NGF with a single-digit number of UCAVs. Each UCAV could be tasked using an Area of Responsibility. Within this area each UCAV could operate autonomously by assigning itself targets fitting the task profile (e.g., SAM sites in a SEAD area).

  • Configuration VI: NGF with the same single-digit number of UCAVs and an air-launched decoy swarm (around ten entities) stimulating the enemy air defense.

These mission scenarios were designed to systematically analyze the MUM-T system. Thus, we must evaluate the team performance of this system quantitatively.

3.3 Test Persons

(See Table 1)

Table 1. Test persons for the HITL experiment

3.4 Indicators of the HITL Experiment

Goal Achievement Indicators

contain the percentage of achieved mission objectives and check whether the mission success criteria were met. All missions are governed by rules of engagement (ROE), compliance with which is also important.

Work Rate Indicators

comprise factors that specifically address the mission execution. In the experiment setting, two different types exist. The first measures system performance without considering the human explicitly. We will focus on the following parameters:

  • Time for which an enemy radar tracks a friendly force. The following list names the military terms for different indications of the aircraft’s radar warning receiver (RWR). The resulting risk increases down the list.

    • DIRT: RWR indication of surface threat in search mode.

    • MUD: RWR indication of surface threat in track mode.

    • SINGER: RWR indication of a surface-to-air missile (SAM) launch.

  • Distance to threats (air or surface)

  • Dislocation of own MUM-T system

  • Average time for target identification
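To illustrate how the tracking-time indicator can be accumulated, the following minimal Python sketch sums the time spent in each RWR state from a list of timestamped state changes. The log format, the state name "CLEAR", the function name and the example values are assumptions for illustration and are not taken from the prototype.

```python
from collections import defaultdict


def accumulate_rwr_times(events, mission_end):
    """Sum the seconds spent in each RWR state.

    events: list of (timestamp_s, state) tuples, one per state change.
    mission_end: timestamp in seconds at which the last state ends.
    """
    totals = defaultdict(float)
    ordered = sorted(events)  # process state changes in time order
    # Pair each state change with the next one; the last state ends at mission_end.
    following = ordered[1:] + [(mission_end, None)]
    for (start, state), (end, _) in zip(ordered, following):
        totals[state] += end - start
    return dict(totals)


# Hypothetical RWR log of one platform (times in seconds since mission start)
log = [
    (0, "CLEAR"),
    (120, "DIRT"),
    (300, "MUD"),
    (330, "SINGER"),
    (345, "DIRT"),
    (500, "CLEAR"),
]
times = accumulate_rwr_times(log, mission_end=600)
print(times)
```

Summing such per-platform totals over all members of a configuration would give one value per pilot and mission, which can then be averaged across pilots.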

Within the scope of this article we focus on the tracking time of the MUM-T components. The second type comprises the mental performance indicators, which describe the workload acting on a person. We decided to use the NASA-TLX questionnaire, which is a well-established tool and is nowadays used far beyond its original application [14].

Operability Indicators

focus on the human user. Here we evaluate which form of automation the pilot uses. Also of interest are measures that provide information about the task spectrum and activity of the pilot. In this way it is possible to trace what a pilot does. For this we use information provided by the eye-tracking system. The gaze measurements are analyzed in real time and evaluated with a specific task model [6]. System errors are a further indicator, although beyond the scope considered here: why and when did they become apparent during the mission, and how can they be prevented in further development?

Knowledge Acquisition Indicators

describe the effort that the test persons must put in to learn, understand and remember how to use the system. We did not evaluate this type of indicator qualitatively during the days of training. We assumed that the test persons would be in control of the system at the end of the test phase.

4 Results

This section contains the results of the human-in-the-loop experiments that will be used to validate our MUM-T approach. The first section provides the results of the performed missions. In the following section the subjective opinions gathered in questionnaires are presented.

4.1 Mission Results

The results are outlined according to the different indicators described in Sect. 3.4.

Goal Achievement Indicators

With all provided configurations, the pilots were able to carry out the missions successfully. With configuration I, one pilot was not able to fulfill the secondary task in MV-A. The rules of engagement were also violated twice, as the minimum safety distance to the enemy interceptors was not maintained. In configuration V, one pilot strayed away from other aircraft that were to be protected (Table 2).

Table 2. Goal achievement indicators for the different configurations.

Work Rate Indicators

The results are split into two sets: the operational and the human-related indicators.

Operational Work Rate Indicators

MV-A: This mission was a covert operation. In configuration I the pilots used a UCAV for the search and verification of the targets. In configuration II, this task was entrusted to the swarm network. All pilots chose holding points outside the range of the enemy’s early warning radar. The assumption was made that the swarm platforms could not be tracked due to their size. The enemy interceptors were alerted when UCAVs entered the range of an early warning radar. The total time the configuration was DIRT is shown in Fig. 4 (top). The mean time is 783 s for configuration I and 162 s for configuration II.

Fig. 4. DIRT, MUD and SINGER times for missions MV-A (top), MV-B (center) and MV-C (bottom).

MV-B: This comparison investigates the operational differences of using a swarm network for time-critical missions. The UCAV needed 225 s for a full scan of the search area (CAT 7 coordinates). The swarm network, using pheromone-based search algorithms [15] with around ten entities, needed 45 s (97% coverage). The overall workload for the pilot showed no difference between the missions. Using the swarm network (configuration III) reduced the time the system was DIRT from 857 to 582 s. In contrast, the MUD time increased to 72 s for configuration III (configuration IV: 17 s). Two pilots performing mission MV-B with configuration IV were even fired upon (SINGER).

MV-C: Within this mission we compared the impact of locally given autonomy for UCAVs to the situational awareness gained from decoy swarms [16]. Configuration V shows lower DIRT (565 s), MUD (85 s) and SINGER (18 s) times than configuration VI (1250 s, 117 s, 20 s). The overall workload of the pilot (Fig. 5) also increases by 7.2 percentage points (configuration V: 41.9%, configuration VI: 49.1%). Thus, MV-C with configuration VI placed greater demands on both the system and the pilot.
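The size of these differences is easier to compare when expressed as relative changes. The short Python sketch below does this arithmetic using only the mean values reported in the text; the helper name is an assumption, and the sketch makes no statement about statistical significance.

```python
def relative_change(reference, value):
    """Return the change of value relative to reference, in percent."""
    return 100.0 * (value - reference) / reference


# Mean times in seconds as reported in the text
mv_a_dirt = relative_change(783, 162)    # MV-A DIRT, configuration I -> II
mv_b_scan_factor = 225 / 45              # MV-B scan time, UCAV vs. swarm
mv_b_dirt = relative_change(857, 582)    # MV-B DIRT, configuration IV -> III
mv_c_dirt = relative_change(565, 1250)   # MV-C DIRT, configuration V -> VI

# Workload values in percent; the difference is in percentage points,
# which is not the same as the relative increase.
workload_points = round(49.1 - 41.9, 1)
workload_relative = relative_change(41.9, 49.1)

print(f"MV-A DIRT: {mv_a_dirt:+.1f}%")
print(f"MV-B scan: swarm faster by factor {mv_b_scan_factor:.1f}")
print(f"MV-B DIRT: {mv_b_dirt:+.1f}%")
print(f"MV-C DIRT: {mv_c_dirt:+.1f}%")
print(f"MV-C workload: +{workload_points} points ({workload_relative:+.1f}% relative)")
```

The last two values show why the workload difference is best stated in percentage points: 7.2 points on a base of 41.9% corresponds to a relative increase of roughly 17%.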

Mental Work Rate Indicators

The workload of the pilots performing the missions with different force configurations is shown in Fig. 5. With increasing mission difficulty (MV-A, MV-B, MV-C), an increase of the mean workload is discernible. An outlier in MV-A with configuration II, indicating higher workload for one pilot, is excluded. For MV-A the mean workload is 34.3% with configuration I and lower, at 29.5%, with configuration II. In MV-B, configurations III and IV show the same mean workload of 36.5%. In MV-C the workload of configuration VI, at 49.0%, exceeds that of configuration V, at 41.8%.

Fig. 5. Workload, measured with the NASA-TLX questionnaire, for the different missions within the experiment.
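For readers unfamiliar with NASA-TLX, the following Python sketch shows how a single workload score can be derived from the six subscale ratings. It uses the unweighted ("raw TLX") variant, i.e. the mean of the six ratings on a 0 to 100 scale. Whether the raw or the weighted variant was used in this experiment is not stated here, so this choice, like the example ratings, is an assumption.

```python
from statistics import mean

# The six NASA-TLX subscales
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Return the unweighted NASA-TLX score (0-100) for one questionnaire."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    for s in SUBSCALES:
        if not 0 <= ratings[s] <= 100:
            raise ValueError(f"rating for {s} must be between 0 and 100")
    return mean(ratings[s] for s in SUBSCALES)


# Hypothetical ratings of one pilot after one mission
example = {
    "mental_demand": 55,
    "physical_demand": 20,
    "temporal_demand": 45,
    "performance": 30,
    "effort": 50,
    "frustration": 25,
}
score = raw_tlx(example)
print(score)
```

Averaging such scores over all pilots of one configuration would yield a mean workload comparable in form to the percentages reported above.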

Operability Indicators

This section focuses on how the human operates the system. We recorded the observations of the assistant system to gain insight into the pilot’s activity. The tasks a pilot performed depended on two factors: the mission with its objectives, and the individual pilot. The chronological progression of the domains of occupation is shown in Fig. 6 for a single pilot with configuration I. Classic pilot tasks are aviating, manually or with autopilot, and navigating (red, light red and light green). Additional task load emerges from tactical assessment of the environment (blue), mission planning (green) and UAV management and monitoring (light orange). From the figure, one can deduce the phases of the mission. The target engagement took place 11.5 min after mission start. This pilot took over the managed aircraft and performed the attack manually. Before the engagement, the pilot verified the target with a provided sensor picture (purple). About one minute after the effector was released, the pilot confirmed the effector’s impact.

Fig. 6. Pilot activity determination for one pilot in MV-A and configuration I. (Color figure online)

Figure 6 shows the mission sequence for a single pilot. As mentioned, the activity history looks different for each mission and each pilot. To determine the differences in activity across all pilots, Fig. 7 shows the percentage of activity over the total mission time. The magnitude of the variation can be read off the box plot. The wide spread of the quartiles for Aviate and Autopilot Flight can be attributed to the fact that these activities were mutually exclusive: the aircraft was flown either manually or by autopilot.

Fig. 7. Box plot of the percentage of activity over the total mission time for configuration I.
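The percentage of activity over the total mission time can be sketched as follows. The interval format (label, start, end), the activity labels and the example values are assumptions for illustration; the actual output of the assistant system's task model may be structured differently. Because several activities can be observed in parallel, the shares are computed per activity and need not add up to 100 percent.

```python
from collections import defaultdict


def activity_shares(intervals, mission_duration):
    """Return the share of mission time per activity, in percent.

    intervals: list of (label, start_s, end_s) tuples. Intervals with the
    same label are assumed not to overlap each other; intervals with
    different labels may overlap.
    """
    totals = defaultdict(float)
    for label, start, end in intervals:
        totals[label] += end - start
    return {label: 100.0 * t / mission_duration for label, t in totals.items()}


# Hypothetical activity timeline of one pilot (times in seconds)
timeline = [
    ("aviate", 0, 200),
    ("autopilot_flight", 200, 1000),
    ("tactical_assessment", 100, 600),
    ("uav_management", 500, 800),
]
shares = activity_shares(timeline, mission_duration=1000)
print(shares)
```

Computing these shares for every pilot and collecting them per activity gives the samples from which a box plot like Fig. 7 can be drawn.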

4.2 Questionnaire Results

The pilots completed a questionnaire after each mission. An excerpt from the questionnaire is depicted in Fig. 8. With eight pilots each performing six missions, 48 data points are available for each question. A Likert scale was chosen to uncover subtleties of opinion, to understand the received feedback and to improve the system.

Fig. 8. Answers to the questionnaire from 8 pilots, each performing 6 missions.

The mission scenario design showed high acceptance. The chosen force configurations correspond closely to the configurations desired by the pilots. The assistance system, including the planning agent and the interface, is predominantly rated very well. Trust was placed in both types of unmanned systems, the UCAVs and the swarm network. Few pilots would like to have more possibilities to intervene in the unmanned systems; the same applies to more automation. The concept of task-based guidance was very well accepted and, according to the pilots, it also seems suitable for delegating tasks to manned wingmen. More than 50% could envisage conducting future military operations with the system design proposed here.

5 Discussion

We structured the discussion in accordance with the different indicators.

Goal Achievement

All pilots could efficiently operate the MUM-T system within various environmental settings. Thus, hypothesis H-1 is supported. In MV-A, we found that the use of a swarm network could positively influence the achievement of secondary goals and the compliance with restrictions. The same holds for the two configurations (V, VI) in MV-C.

Other advantages of configurations cannot be evaluated at this goal-based level.

Work Rate

Operational advantages of incorporating swarming platforms are present in MV-A using configuration II. Due to the covert nature of the mission, one must remain undetected by the enemy for as long as possible. Using swarm platforms in this mission contributes to a temporal resolution of the conflict with enemy aircraft. This has a direct impact on the workload affecting the pilot during the mission (supports hypothesis H-4).

In MV-B, the pilots also operated with a swarm network. Although the pilots must guide many vehicles, the necessary effort of delegation and monitoring stays the same. The sheer number of platforms in configuration III is more than double that of configuration IV. This speaks in favor of our approach of integrating the swarm as an avatar into the system network. With equal workload, the target is found, identified and engaged more quickly.

In MV-C with configuration V, the pilots were able to give the UCAVs higher authority by allowing them to attack SAM sites within SEAD boxes on their own (Area of Responsibility). Thus, the UCAV was able to derive targets and actions as an autonomous, but locally restricted, system. This resulted in a reduction of the perceived workload (Fig. 5). In configuration VI, a swarm of cruise missiles was used to stimulate the enemy air defense. Although most of the SAM systems were known, pilots experienced this configuration as more strenuous to fly. Objectively, the workload index speaks for the higher automation, while the pilots appreciate the operational advantages of the decoy swarm. In the debriefing, 75% of the pilots in configuration V were no longer able to name the number and type of SAM systems that occurred. Reducing the workload by increasing the autonomy of the UCAVs inevitably yields a loss in transparency. Thus, the level of automation must suit the situation.

Operability

For operability, the activities of the pilot were observed. Classical pilot tasks such as aviating and navigating are reduced by using automation. In our case, these activities occupied about 46 percent of the mission. One hundred percent would mean that the pilot is doing this task continuously. In a MUM-T system, the pilot no longer has to deal only with his own aircraft, but also with the operational planning of the unmanned aircraft. Thus, tasks such as mission planning and monitoring occupy the pilot for 27 and 32 percent of the mission, respectively. In parallel, the pilots check the tactical situation for 53 percent of the mission execution time. Threat assessment can be counted among the main responsibilities of a future MUM-T pilot.

General Assessment of the Pilots

The subjective rating of the pilots gave evidence of great acceptance of the overall system design. The degree of reality of the mission design was predominantly assessed as high to very high. The representativeness of the setting, which is, as presented, one of the main aspects contributing to the validity of a human-in-the-loop evaluation, could be verified through the questionnaires. All missions could be performed within manageable work- and task load. In no situation did any of the test persons feel overwhelmed (supports hypothesis H-3). Thus, the trust in automation for the assistant system was high. All pilots relied on the unmanned systems to perform their tasks independently. This indicates support for hypothesis H-2. Due to the high trust, the pilots kept the monitoring process to a minimum. Operator responses to Likert-scale questionnaires as well as their verbal feedback during debrief sessions reinforced the finding that the task-based guidance concept is a sophisticated way of interacting with other teammates.

Operational Requirements

The operational requirements for a human-machine system can be derived from the mission scenarios, the concept of deployment and the automation. The missions define the need for specific platform parameters and abilities that each member needs to possess. We presented three different types of aerial vehicles: a manned fighter jet, highly capable UCAVs, and cheap disposable air-launched decoys. We used a teaming structure as the concept of deployment and integrated the UCAVs and the swarming UAVs into the team. To team with the swarming network we used an avatar representation which allows the network to be viewed as a single unit. The automation functions must incorporate a concept for multi-platform guidance, intelligent agents onboard the unmanned systems and an assistant system supporting the pilot. Assuming the automation presented in this article, all pilots were able to conduct a military air operation together with unmanned systems in a simulated environment.

6 Summary

The validation of modern military simulation relies heavily on the opinion of military experts, which makes the validation task exhaustive and time-consuming. With our approach of integrating the human into our system verification process, we receive additional subjective verification. The experts were able to contribute their knowledge through real-time interaction with the system. Their opinions therefore arose not only from purely observing a system. From the experiment, one can conclude that the design of a MUM-T system, as it was realized, enables a trained military fighter pilot to handle unmanned systems within a complex military air mission. The assistant system and the introduced automatic functions for the cognitive agent onboard the UCAVs could reduce workload to maintain the human’s situational awareness. Even delegating a swarm network of around ten vehicles, in addition to UCAVs, can be made possible with the presented avatar. The human is adaptively kept in the decision-making process to form an efficient human-machine team. Concepts such as task-based guidance can also be transferred to purely manned systems to improve cooperation. We were thus able to provide a first impression of what a MUM-T system might look like. We identified that, with concepts like scalable autonomy, the individual habits of the pilots can be satisfied. It should be considered that automation-induced errors can occur due to the multitude of automatic functions. This can be counteracted with an adaptive assistance system that provides support based on the situation, the pilot’s current activity and his mental state.