1 Introduction

Extensive research in human performance has investigated the association between operator states and performance. As pilots' primary role in flight operations is to perceive, process, and act upon information from their environment, flight operations rely on the human operator's cognitive resources to accomplish this task (Endsley 1995; Rolfe and Lindsay 1973). A related construct frequently discussed in aviation is workload, which arises when a human operator must dedicate physical and psychological resources to performing a task (Hart and Staveland 1988; Young et al. 2015). The association between operator workload and task performance is well established and can be described by an inverted U-shape: performance is poor when workload is either too low or too high (Young et al. 2015). As such, researchers have dedicated considerable effort to measuring workload. One approach infers operator workload from psychophysiological measures. Proponents of this approach highlight that, in addition to being objective, these measures do not intrude on the primary task, allowing workload to be measured in a naturalistic environment (Cain 2007; Lehrer et al. 2010). However, interpreting physiological measures in relation to flying tasks is difficult, as these measures reflect overall psychophysiological activity rather than workload alone.

Of the various psychophysiological measures of workload, a large body of research has investigated cardiac function through electrocardiography (ECG; reviewed in Roscoe 1992). In aviation, this is commonly done by measuring heart rate and heart rate variability (HRV) across different phases of flight. Heart rate is higher in phases with increased workload, such as take-off and landing, across different pilots and airplanes (Causse et al. 2012; Roscoe 1993). Even in trained fighter pilots, progressive increases in heart rate have been observed (Mansikka et al. 2015, 2016). Abnormal situations such as in-flight emergencies have also been shown to increase heart rate (Kinney and O'Hare 2020). HRV (i.e., the variation in the duration of the R-R interval) has also been investigated: De Rivecourt et al. (2008) studied pilot candidates in an instrument flying simulator, and decreases in HRV from baseline, indicative of reduced parasympathetic activity, have been observed during manoeuvres considered to impose higher workload (Mansikka et al. 2015, 2019). Together, these experiments provide ample evidence that heart rate and HRV change across phases of flight.

While this research shows that cardiac function differs between separate phases of a flight, it remains difficult to establish that these changes are a direct function of differences in workload. These changes could instead result from other task-related factors (e.g., stress, attention). Roscoe (1978) reviewed early studies of cardiac responses in two-crew operations (i.e., one flying and one non-flying pilot). Because both pilots are exposed to the same psychological stressors, only workload differs between them. He suggested that cardiac responses changed consistently for the pilot flying, whereas the non-flying pilot's responses varied with the degree of task involvement, supporting the use of cardiac function as an indicator of workload. Further strengthening the association between cardiac function and workload, cardiac function also correlates with subjective workload across different phases of flight (Lee and Liu 2003; Mansikka et al. 2019). Together, these studies provide further evidence that cardiac function changes in response to the workload imposed by flight. However, most of this research employs an experimental design in which workload is compared between different types of flying tasks (e.g., different phases of flight or different manoeuvres). Therefore, it is unknown whether heart rate and HRV are sensitive to workload changes between similar flight tasks, where workload is manipulated by increasing task difficulty.

In sum, research in aviation and similar domains suggests that heart rate and HRV reflect the demands of flight on pilots. However, most of this research investigates workload by comparing measurements between manoeuvres, phases of flight, or pilots' tasks. To the authors' knowledge, heart rate and HRV have seldom been used to measure workload across different variants of the same manoeuvre. Thus, the present study investigates the use of ECG as a measure of workload in a different aviation paradigm, in which workload is induced by increasing the difficulty of a manoeuvre.

2 Method

The sample included commercial (n = 5) and airline (n = 9) pilots (12 males; age range 23–42 years, median 26.5 years). All pilots held a valid medical certificate, a multi-engine qualification, and an instrument flying rating. Participants flew an Ascent XJ flight training device (Mechtronix, Montréal, Canada), a fixed-base simulator configured as a narrow-body transport jet. Participants acted as pilot flying, while a qualified pilot researcher acted as pilot monitoring.

Electrocardiogram data were collected using a Polar H10 heart rate monitor (Polar Electro, Kempele, Finland) sampling at 1000 Hz. The Elite HRV app (Polar Electro, Kempele, Finland; Elite HRV LLC, Asheville, USA) exported the sequential R-R interval durations in milliseconds. Subjective workload was measured with the NASA-TLX (Hart and Staveland 1988) application on an Apple iPad Pro (Apple, Cupertino, USA).
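The paper reports "compound" NASA-TLX scores without stating whether the weighted or the unweighted (raw TLX) variant was used. As a hedged illustration, the sketch below computes both for one hypothetical pilot: each of the six subscales is rated 0–100, and the weighted variant multiplies each rating by the number of times that subscale was chosen across the 15 pairwise comparisons before dividing by 15 (Hart and Staveland 1988). The function names and data are hypothetical; this is not the code of the iPad application.

```python
# Hypothetical sketch of NASA-TLX overall scoring; not the iPad app's code.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: dict) -> float:
    """Unweighted ("raw TLX") mean of the six subscale ratings, each 0-100."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: dict, tallies: dict) -> float:
    """Weighted TLX: each rating is weighted by how often its subscale was chosen
    across the 15 pairwise comparisons, so the weights must sum to 15."""
    if sum(tallies[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison tallies must sum to 15")
    return sum(ratings[s] * tallies[s] for s in SUBSCALES) / 15


# Hypothetical ratings and tallies for one pilot after one manoeuvre.
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 40, "effort": 65, "frustration": 30}
tallies = {"mental": 5, "physical": 0, "temporal": 3,
           "performance": 2, "effort": 4, "frustration": 1}
print(round(raw_tlx(ratings), 1))                # 46.7
print(round(weighted_tlx(ratings, tallies), 1))  # 59.0
```

Both variants land on the same 0–100 scale, which is why either could be reported as a percentage.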

2.1 Procedure

Participants were briefed on the study objectives, procedures, and simulator handling characteristics, and gave informed consent. Pilots then completed two flights. The first flight familiarized participants with the simulator's handling characteristics and established coordination between the pilot flying and the pilot monitoring. The second flight was the experimental flight. Take-off and landing were conducted for ecological validity but were not evaluated. Table 1 describes the manoeuvre sequence for both flights. Pilots completed the NASA-TLX between manoeuvres.

Table 1. Manoeuvre sequence for the practice and experimental flights.

2.2 Data Handling

Because of the small sample size, missing data were excluded, as interpolation or mean substitution would be unreliable. ECG data were missing for one participant, and NASA-TLX ratings were missing for two participants. Interbeat interval data were analyzed in Kubios HRV 3.3.1 (Kubios, Kuopio, Finland). Heart rate and HRV, indexed by SDNN (the standard deviation of normal-to-normal R-R intervals), were calculated for time windows corresponding to each manoeuvre. Simulator data were extracted from video recordings of a screen displaying simulator parameters, using optical character recognition in a custom R program built on Google's Tesseract OCR engine (Google, Mountain View, USA). Flight path deviations were calculated as an objective performance measure for each manoeuvre. For turns, this is the time, in seconds, spent outside the acceptable airspeed (240–260 knots) and altitude (4900–5100 feet) ranges (Transport Canada 2017). For stalls, this is the interval between stall warning onset and the point at which safe airspeed (200 knots) and altitude (4800 feet) were attained.
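The per-manoeuvre measures described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's pipeline (which used Kubios HRV and a custom R program): mean heart rate is taken as 60,000 divided by the mean R-R interval in milliseconds, SDNN as the sample standard deviation (n − 1) of the R-R intervals with no artifact correction, simulator samples are assumed to be evenly spaced, and "attained" is read as airspeed and altitude both being at or above their safe thresholds. All function names and data are hypothetical.

```python
from statistics import mean, stdev


def mean_heart_rate(rr_ms):
    """Mean heart rate (beats/min) from R-R intervals in milliseconds,
    taken here as 60,000 divided by the mean R-R interval."""
    return 60_000 / mean(rr_ms)


def sdnn(rr_ms):
    """SDNN: sample standard deviation (n - 1) of normal-to-normal R-R intervals.
    No artifact or ectopic-beat correction is applied in this sketch."""
    return stdev(rr_ms)


def time_outside_tolerance(samples, dt_s, speed_band=(240, 260), alt_band=(4900, 5100)):
    """Seconds spent with airspeed (kt) or altitude (ft) outside its acceptable band.
    `samples` is a sequence of (airspeed, altitude) pairs recorded every `dt_s` seconds."""
    lo_v, hi_v = speed_band
    lo_h, hi_h = alt_band
    n_out = sum(1 for v, h in samples
                if not (lo_v <= v <= hi_v and lo_h <= h <= hi_h))
    return n_out * dt_s


def stall_recovery_time(times_s, samples, onset_s, safe_speed=200, safe_alt=4800):
    """Seconds from stall-warning onset to the first sample at or after onset where
    airspeed and altitude are both at or above their safe thresholds (None if never)."""
    for t, (v, h) in zip(times_s, samples):
        if t >= onset_s and v >= safe_speed and h >= safe_alt:
            return t - onset_s
    return None


# Hypothetical data for illustration only.
rr = [800, 820, 780, 810, 790]
print(mean_heart_rate(rr), round(sdnn(rr), 2))                   # 75.0 15.81
turn = [(250, 5000), (262, 5000), (255, 5120), (245, 4950)]
print(time_outside_tolerance(turn, dt_s=1.0))                    # 2.0
stall = [(180, 5000), (170, 4900), (190, 4850), (205, 4820), (215, 4850)]
print(stall_recovery_time([0, 1, 2, 3, 4], stall, onset_s=1.0))  # 2.0
```

Counting samples and multiplying by the sampling interval keeps the turn measure independent of how long each manoeuvre lasted, while the stall measure is a single elapsed time.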

3 Results

Descriptive statistics and distribution plots were calculated for flight path deviations to verify that the complex manoeuvres were more difficult, as indicated by larger deviations. Mean flight path deviations (in seconds) were higher during steep turns (M = 13.29, SD = 18.47) than during normal turns (M = 6.50, SD = 12.60; t = −1.57). Mean recovery time was longer for the complex approach to stall (M = 43.50, SD = 11.95) than for the simple approach to stall (M = 17.50, SD = 5.37; t = −8.19). The distribution plots suggest that between-participant variability in performance was greater for the complex variants. Altogether, these data suggest that the complex manoeuvres were more difficult than their simple counterparts.

Table 2. Descriptive Statistics for heart rate, heart rate variability (SDNN), flight path deviations and NASA-TLX

Descriptive statistics and distribution plots were calculated for heart rate, heart rate variability, and NASA-TLX ratings (see Table 2 and Fig. 1). Visual inspection of the plots suggests that heart rate and heart rate variability differ between manoeuvre types, but not between difficulty levels.

Fig. 1. Distribution plots for each manoeuvre. Each dot represents a single participant, and lines connect within-subject observations. (A) Heart rate, in beats per minute. (B) Heart rate variability (SDNN), in milliseconds. (C) Time outside the ideal flight path, in seconds. (D) Compound NASA-TLX subjective workload ratings, in percent. Note: NT, normal turn; SAS, simple approach to stall (clean configuration); ST, steep turn; CAS, complex approach to stall (landing configuration).

Repeated-measures analyses of variance (ANOVA) were conducted to investigate between-manoeuvre differences in cardiac function and NASA-TLX ratings. Planned, uncorrected dependent-samples t-tests are reported to compare manoeuvre pairs, contrasting the turns (i.e., normal vs. steep turn) and the approaches to stall (i.e., simple vs. complex). Effect sizes, Bayes Factors, and confidence intervals are reported, but p-values are not. Interpretations of Bayes Factors follow general guidelines (Wetzels et al. 2011). Because of the sample size, these statistical tests are considered exploratory and should be interpreted with caution.
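The paired comparisons above can be sketched in Python with NumPy and SciPy. Several reported pairs are consistent with Cohen's d_z = t/√n (for example, t = 1.92 and d = .53 with 13 participants), so the sketch uses d_z. The confidence interval is computed by inverting the noncentral t distribution, and BF10 is the default JZS Bayes factor of Rouder et al. (2009) with a Cauchy prior scale r = 0.707; both choices are assumptions, as the paper does not name its software or prior settings. Function names and data are hypothetical.

```python
import math

import numpy as np
from scipy import optimize, stats
from scipy.integrate import quad


def jzs_bf10(t, n, r=0.707):
    """JZS Bayes factor BF10 for a one-sample or paired t statistic (Rouder et al.
    2009) with a Cauchy(0, r) prior on the standardized effect; n = number of pairs."""
    nu = n - 1

    def integrand(g):
        if g <= 0:
            return 0.0
        a = 1 + n * g * r ** 2
        # Work in log space so very small g cannot overflow g ** -1.5.
        log_f = (-0.5 * math.log(a)
                 - (nu + 1) / 2 * math.log1p(t ** 2 / (a * nu))
                 - 0.5 * math.log(2 * math.pi) - 1.5 * math.log(g) - 1 / (2 * g))
        return math.exp(log_f)

    marginal_h1, _ = quad(integrand, 0, math.inf)
    marginal_h0 = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0


def paired_summary(x, y, r=0.707, conf=0.95):
    """Paired t, Cohen's d_z, a noncentral-t CI for d_z, and JZS BF10 for y - x."""
    diff = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)
    n, df = diff.size, diff.size - 1
    d_z = diff.mean() / diff.std(ddof=1)  # mean difference / SD of differences
    t = d_z * math.sqrt(n)                # equals the usual paired t statistic

    # CI: noncentrality values placing the observed t at the 1 - a/2 and a/2 quantiles.
    alpha = 1 - conf

    def ncp_at(p):
        return optimize.brentq(lambda nc: stats.nct.cdf(t, df, nc) - p, t - 20, t + 20)

    ci = (ncp_at(1 - alpha / 2) / math.sqrt(n), ncp_at(alpha / 2) / math.sqrt(n))
    return {"t": t, "d_z": d_z, "ci": ci, "bf10": jzs_bf10(t, n, r)}


# Hypothetical heart rates (beats/min) for ten pilots in two manoeuvre variants.
simple = [72, 75, 80, 68, 77, 74, 79, 70, 73, 76]
complex_ = [75, 78, 81, 70, 80, 74, 83, 72, 76, 79]
print(paired_summary(simple, complex_))
```

Reporting BF10 alongside d_z and its interval separates the strength of evidence from the size of the effect, which matters when samples are as small as here.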

First, an ANOVA investigating overall heart rate differences between manoeuvres suggested an effect of manoeuvre on heart rate, F(3,36) = 9.97, η2 = .45, with decisive evidence for the research hypothesis (BFM = 324.58). Pairwise comparisons found anecdotal evidence for a medium increase in heart rate between normal and steep turns (BF10 = 1.15, t = 1.92, d = .53, 95% CI [−.06, 1.11]) and between the simple and complex approaches to stall (BF10 = 1.09, t = 1.88, d = .52, 95% CI [−.07, 1.09]). However, the Bayes Factors and visual inspection of the descriptive plots suggest that these differences are not large enough to be meaningful.
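The η2 values reported for the three ANOVAs match partial η2 recovered from each F ratio and its degrees of freedom, η2p = F·df_effect / (F·df_effect + df_error), which follows from F = (SS_effect/df_effect)/(SS_error/df_error). This is our reading of the numbers rather than something the paper states; the short check below reproduces the reported values.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F ratio: SS_effect / (SS_effect + SS_error)."""
    return f * df_effect / (f * df_effect + df_error)


# F ratios and degrees of freedom as reported for the three repeated-measures ANOVAs.
reported = [("heart rate", 9.97, 3, 36), ("SDNN", 8.70, 3, 36), ("NASA-TLX", 5.24, 3, 33)]
for label, f, df1, df2 in reported:
    print(label, round(partial_eta_squared(f, df1, df2), 2))
# heart rate 0.45
# SDNN 0.42
# NASA-TLX 0.32
```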

Second, an ANOVA investigating overall differences in SDNN between manoeuvres suggested an effect of manoeuvre on SDNN (F(3,36) = 8.70, η2 = .42), with decisive evidence for the research hypothesis (BFM = 164.82). Two non-directional dependent-samples t-tests were conducted to compare the simple and complex manoeuvre variants. Anecdotal evidence was found suggesting no difference in SDNN between normal and steep turns (BF10 = .30, t = .39, d = .11, 95% CI [−.44, .65]) or between the complex and simple stalls (BF10 = .38, t = −.87, d = −.44, 95% CI [−.79, .32]). However, visual inspection of the descriptive plots suggested differences between manoeuvre types (Fig. 1). Non-directional dependent-samples t-tests were therefore conducted contrasting the normal turn with the simple approach to stall, and the steep turn with the complex approach to stall. Substantial evidence was found suggesting that SDNN was lower during the turns than during the stalls (for simple variants: BF10 = 9.22, t = −.36, d = −.93, 95% CI [−1.58, .26]; for complex variants: BF10 = 9.91, t = −3.41, d = −.95, 95% CI [−1.59, −.27]).

Last, an ANOVA was conducted to investigate overall differences in subjective workload, which found substantial evidence supporting differences in subjective workload between manoeuvres (F(3, 33) = 5.24, η2 = .32, BFM = 9.78). Two non-directional, dependent sample t-tests were conducted to compare simple and complex manoeuvre variants. Substantial evidence was found suggesting a difference in subjective workload between normal and steep turns (BF10 = 4.22, t = 2.88, d = .83, 95% CI [.15, 1.48]) and between the complex and simple stalls (BF10 = 9.24, t = 3.41, d = .99, 95% CI [.27, 1.67]).

4 Discussion

Here we aimed to investigate the usefulness of heart rate and HRV for detecting workload changes between manoeuvres of varying complexity. We hypothesized that heart rate, HRV, and NASA-TLX scores would differentiate between high- and low-workload manoeuvres, and between manoeuvre types. Our results provide partial support for this hypothesis. Heart rate and HRV differed between turns and stalls, but did not vary meaningfully between the simple and complex variants. Conversely, NASA-TLX scores differed between the simple and complex variants, but not between manoeuvre types.

Increases in heart rate and decreases in HRV (measured by SDNN) are associated with increased arousal and workload (Jorna 1993; Luque-Casado et al. 2016). We therefore expected heart rate to increase and HRV to decrease during the complex manoeuvres. Instead, we observed clear differences between manoeuvre types (i.e., turn vs. stall), but little difference between the complex and simple variants of each manoeuvre. Pilots' heart rate was higher, and HRV lower, during the turns than during the approaches to stall. When the complex manoeuvres were contrasted with their simple variants, heart rate and HRV were similar. Given the limited sample size, the statistical tests of these between-condition differences can only be considered exploratory. Nevertheless, the general pattern of results suggests that cardiac function differed between manoeuvre types (i.e., turns vs. approaches to stall), but not between difficulty levels of the same manoeuvre type.

These results replicate the findings reported in similar studies (Causse et al. 2012; Hankins and Wilson 1998; Kinney and O'Hare 2020). Taken together, these findings suggest that heart rate changes predictably with the workload induced by different manoeuvre types, but does not vary with workload changes due to manoeuvre difficulty. It is possible that the difficulty manipulation was not sufficient to produce a meaningful change in heart rate (see the limitations discussed below). Additional research is required to define the contexts in which psychophysiological measures are the most appropriate measures of workload in human factors research. For example, changes in cardiac function may reflect the workload imposed by the basic processes required to complete a task, regardless of its difficulty level. One could hypothesize that the between-manoeuvre patterns in heart rate and HRV are caused by inherent differences in the pilot resources required to complete each manoeuvre. Turns require sustained attention, continuous updating of a working-memory model of multiple parameters (i.e., bank angle, altitude, airspeed, engine thrust), and precise control. Stalls, by contrast, require the prompt application of a well-rehearsed recovery procedure, with the emphasis on exiting the situation quickly and successfully rather than on the precision of flying inputs. Changes in psychophysiological measures may occur because of these different types of task demands.

By contrast, subjective workload evaluations can be biased by previous exposure to similar manoeuvres and by performance (Moore and Picou 2018). In the present experiment, pilots reported differences in workload between the simple and complex manoeuvres, but not between manoeuvre types. Pilots' workload estimates may have been derived from previous exposure to similar situations rather than from the resources required. A performance bias may also explain why subjective workload scores differentiated between difficulty levels but not between manoeuvre types, as flight path deviations were generally higher during the complex variants.

This study has limitations. We were unable to attain the planned sample size due to the COVID-19 pandemic; as such, the statistical tests reported are considered exploratory. However, our sample size remains similar to those in aviation psychophysiological research (e.g., Hankins and Wilson 1998; Hidalgo-Muñoz et al. 2018; Lee and Liu 2003; Mansikka et al. 2019). This study also shares the limitations associated with using heart rate and HRV as objective workload indicators. Cardiac function results from multiple physiological processes occurring simultaneously, so its use as a workload metric remains confounded by physiological factors unrelated to workload. This makes heart rate and HRV difficult to interpret in the aviation environment. Further research is required to establish guidelines for using cardiac function as a workload indicator.

The present results add to the literature supporting the strategic use of cardiac function as an objective workload measure for comparing aviation situations likely to require different types of cognitive and physiological resources from the pilot. The present study also suggests that heart rate and HRV are limited indicators of workload in the cockpit, especially when different variants of the same manoeuvre are being compared. Nonetheless, cardiac function can complement subjective measures to better understand workload in the cockpit.