1 Introduction

Mental workload researchers have identified a variety of psychophysiological measures that have proven sensitive to cognitive task demands, including indices based on electrocardiography (ECG) [17, 44], transcranial Doppler sonography (TCD) [39, 53, 61], electroencephalography (EEG) [4, 14, 21], functional near-infrared spectroscopy (fNIRS) [12, 61], and eye tracking [2, 5, 27, 37]. Electromyographic (EMG) measures, based on electrical potentials produced by motor units during muscle contraction, have also demonstrated sensitivity to task demands (e.g., [18, 65]), yet face and neck surface EMG (sEMG) has received little attention in workload research despite the critical role of face and neck musculature in reflecting and expressing human mental/emotional state (whether through non-verbal cues or spoken expression). The lack of attention is perhaps not surprising given the obtrusiveness of many existing sEMG sensor designs. However, recent advances in sEMG sensor design/miniaturization and in EMG signal processing technologies are paving the way for a new generation of unobtrusive sEMG sensors that will conform to the skin surface and minimize any negative impact on the wearer. For example, there are new, commercially available miniature differential sensors specifically designed for high-fidelity, wireless recording of facial sEMG signals [41, 42]. These advances have sparked new interest in the application of face/neck sEMG to cognitive workload assessment.

Face and neck sEMG may offer a unique window into human emotional state, complementing or even replacing previously studied psychophysiological measures as sensing modalities in the real-time assessment of cognitive workload. Many facial muscles are situated immediately below the skin surface and are thereby readily accessible for sEMG recording. They have high endurance and show little change in EMG power spectra across repeated facial contractions [57], and a link between emotional responses and high levels of cognitive strain has also been established (e.g., [20]). A study of error-related activity in the corrugator supercilii, a muscle of the upper face (medial eyebrow region) involved in facial expressions, has linked amplified EMG activity to error commission with less than 100 milliseconds of latency [35].

This paper reports findings of an exploratory reanalysis of an existing dataset, originally collected in a previous investigation by Stepp et al. into the modulation of a specific neck sEMG signal, known as neck intermuscular beta coherence (NIBcoh), by speech and non-speech behaviors [52]. In the present work, the authors reanalyzed this dataset to investigate the potential utility of NIBcoh in real-time cognitive workload assessment. Specifically, the sensitivity of NIBcoh to cognitive task demands and the relationships between NIBcoh and task performance were examined. This reanalysis was an initial step in an ongoing program of research intended to shed light on the potential application of face and neck surface EMG to the real-time assessment of cognitive workload.

The paper makes three primary contributions to the sciences of cognitive workload and of EMG-EMG coherence analysis. First, the reanalysis provides limited evidence that NIBcoh is sensitive to variations in task demand (or attention) across similar speech-related tasks. Second, it indicates that NIBcoh may be correlated with error commission within the context of a specific time-pressured mental arithmetic task requiring verbal responses. Finally, the findings offer validation for concerns raised in [45] regarding the common use of full-wave rectification in EMG-EMG coherence analysis.

The rest of the paper is organized as follows. Section 2 provides relevant background concerning EMG and EMG-EMG coherence measures, establishes existing support for a potential connection between cognitive workload and face/neck sEMG, generally, and NIBcoh more specifically, and offers possible advantages of face and neck sEMG as a real-time cognitive workload sensing modality. Section 3 describes the conditions in which the NIBcoh dataset was collected and the methods employed in the recent reanalysis. Section 4 presents and evaluates the results of the analysis. Finally, Sect. 5 concludes by summarizing the key findings and positioning them within the context of ongoing and future research.

2 Related Work

2.1 Human Mental Workload

The general concept of cognitive workload has been recognized and studied for at least 50 years, although no formal, standard definition of the construct has yet emerged within the research community [9]. This paper defines cognitive workload similarly to the operational definition proposed by O’Donnell and Eggemeier [46], as “the fractional utilization of an individual’s limited cognitive resources at a particular moment.” As noted above, a number of psychophysiological measures have proven sensitive to variations in cognitive task demands, supporting their potential utility in the measurement of cognitive workload. However, several studies offer evidence of divergence among known psychophysiological workload indices [26, 32, 38, 64]. The hypothesized causes for dissociation among these measures are varied and include both lack of specificity (i.e., some measures can be influenced by non-workload factors) and lack of diagnosticity (i.e., measures may reflect differing aspects of workload, consistent with multi-resource theories (e.g., [40, 63]) in which workload is a multi-faceted construct arising from the capacity of and demand for multiple cognitive resources).

In addition to psychophysiological measures, a number of subjective instruments for cognitive workload assessment have been developed and validated [36]. These include both offline, retrospective instruments, such as the NASA Task Load Index (NASA-TLX) [25], the Workload Profile (WP) [56], and the Subjective Workload Assessment Technique (SWAT) [48], and online (real-time) techniques, such as the Instantaneous Self-Assessment of Workload (ISA) [30]. While subjective instruments may help to validate other measures or models of cognitive workload, a valid objective measure offers obvious relative benefits, including correspondence to reality uncontaminated by subjectivity and the avoidance of self-assessment procedures that may distract from primary tasks. The assessment of cognitive workload has found applications in a wide variety of human endeavors, including: manual assembly/manufacturing [43], medical education [7], air traffic control [15], and vehicle operation including that of trains [1, 47, 50], aircraft [8], and motor vehicles [4, 55]. Therefore, the development of a robust, objective, real-time measure of cognitive workload can be expected to have widespread benefits.

2.2 Face and Neck Surface Electromyography

Previous research has demonstrated the utility of facial sEMG for recognizing and classifying emotional responses [11, 54, 58]. In contrast to image-based expression assessment, sEMG has the potential to identify rapid or slight facial expressions, including subtle muscle contractions below the threshold necessary to generate visible changes in the surface contours of the face [57]. Further, sEMG may provide greater sensitivity regarding the locations and magnitude of facial contraction when compared to image-based assessment. Automatic video quantification of facial movements is relatively difficult for features aside from high-contrast tissue edges, whereas sEMG can discern a broad combination of facial muscle actions across both high- and low-contrast regions.

Beyond the identification of facial expressions, face/neck sEMG can quantify neuromuscular activity related to the vocalization and articulation of speech, even for utterances that are not vocalized (i.e., sub-vocal speech). Several research groups have demonstrated the utility of non-acoustic speech recognition technologies based on surface EMG signals [13, 31, 41, 42], confirming that surface EMG provides ample speech-related information whether speech is spoken aloud or only “mouthed”. Because computer-based speech recognition is possible from EMG alone, it is reasonable to suggest that these signals may also change under varying cognitive load conditions in a manner resembling acoustic markers, such as those employed in voice stress analysis. The characteristics of face and neck sEMG make it particularly well suited to specific operational contexts. In noisy environments (such as an aircraft cockpit) in which an acoustic signal might be compromised, for instance, subtle acoustic features relating to cognitive load might be more readily gleaned from activity in the musculature involved in speech articulation than from degraded acoustic signals. The ability to detect subtle contractions associated with sub-vocal speech or slight (perhaps even involuntary) facial expressions could prove beneficial even in tasks that do not involve significant amounts of speech. Most facial expression assessment and eye blink tracking is done through image-based data collection, but this approach is problematic when individuals are freely moving and thereby changing head orientation relative to video capture sources. In addition, flight equipment such as helmets, glasses, and face-masks can preclude visualization of facial movements. In contrast, sEMG sensors can reside under headgear [3] or be incorporated into face masks, chin straps, etc. [10].

Finally, the dimensionality of sEMG, particularly when considering signals from multiple locations on the face and neck, offers a distinct advantage over many of the low-dimensional measures that are unobtrusive enough for operational use in real-time workload assessment. If face/neck sEMG can support speech recognition and the classification of emotional responses, perhaps it is also capable, alone or in concert with other measures, of distinguishing cognitive states that are conflated by other physiological signals. That is, it may help to overcome the lack of specificity and diagnosticity noted in Sect. 2.1.

Although sEMG appears to offer distinct advantages over other sensing modalities in some recording contexts, it also has potential drawbacks. Physiological measures typically require some degree of instrumentation, and even though sEMG is less prone to noise from movements and environmental sources compared to electrically weaker EEG, it is perhaps more cumbersome and prone to noise than other measures such as heart and respiration rate. Moreover, while modern sEMG recording systems do not require the use of conductive gels [41, 42], the recorded skin surface should nevertheless be clean and free from hair that can impede adequate electrode contact. This precludes some potential speech-related neck/face recording locations in individuals with beards or when proper skin preparation is impractical (e.g., military field deployment, extremely dirty or wet environments, etc.). In addition, the degree to which NIBcoh measurement is potentially degraded by motion artifact or environmental sources of electrical noise in the non-laboratory setting is still unknown.

2.3 Intermuscular Beta Coherence

The present study focuses on a particular measure, known as neck intermuscular beta coherence (NIBcoh), derived from the surface EMG signal at two anterior neck recording locations superior to (above) neck strap muscles involved in speech. Coherence, generally, is a frequency domain measure of the linear dependency or strength of coupling between two processes [24, 62]. The coherence function, \( {|{R}_{xy}(\lambda )|}^{2} \), can be defined as in Eq. 1 below, where \(f_{xx}\) represents the auto-spectra of a time series x(t), \(f_{yy}\) the auto-spectra of y(t), and \(f_{xy}\) the cross-spectra of the two. Intermuscular coherence, the coherence between EMG signals, is a measure of the common presynaptic drive to motor neurons [6].

$$\begin{aligned} {|R_{xy}(\lambda )|}^2 = \frac{{|f_{xy}(\lambda )|}^2}{f_{xx}(\lambda ) f_{yy}(\lambda )} \end{aligned}$$
(1)
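As an illustration of Eq. 1, the following minimal Python sketch estimates magnitude-squared coherence by averaging windowed periodograms over overlapping segments. It is not the authors' implementation: the function names, the synthetic 24 Hz shared drive, the 256 Hz sampling rate, and the 64-point segments are assumptions chosen only so that a naive DFT from the standard library runs quickly.

```python
import cmath
import math
import random


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def dft(frame):
    """Naive O(n^2) DFT; returns only the non-negative-frequency bins."""
    n = len(frame)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, nperseg, noverlap):
    """Estimate |R_xy|^2 (Eq. 1) from spectra averaged over overlapping segments."""
    w = hamming(nperseg)
    step = nperseg - noverlap
    nbins = nperseg // 2 + 1
    fxx, fyy, fxy = [0.0] * nbins, [0.0] * nbins, [0j] * nbins
    for start in range(0, len(x) - nperseg + 1, step):
        X = dft([w[i] * x[start + i] for i in range(nperseg)])
        Y = dft([w[i] * y[start + i] for i in range(nperseg)])
        for k in range(nbins):
            fxx[k] += abs(X[k]) ** 2           # auto-spectrum of x
            fyy[k] += abs(Y[k]) ** 2           # auto-spectrum of y
            fxy[k] += X[k].conjugate() * Y[k]  # cross-spectrum
    # Normalization constants cancel in the ratio, so raw sums suffice.
    return [abs(fxy[k]) ** 2 / (fxx[k] * fyy[k]) for k in range(nbins)]


# Synthetic demo (assumed values): a shared 24 Hz drive plus independent noise.
FS, N, NPERSEG = 256, 1024, 64
rng = random.Random(0)
drive = [2.0 * math.sin(2 * math.pi * 24 * t / FS) for t in range(N)]
x = [d + rng.gauss(0, 1) for d in drive]
y = [d + rng.gauss(0, 1) for d in drive]
coh = coherence(x, y, NPERSEG, NPERSEG // 2)
bin_hz = FS / NPERSEG  # 4 Hz per bin
print(f"coherence at 24 Hz: {coh[int(24 / bin_hz)]:.2f}")
print(f"coherence at 80 Hz: {coh[int(80 / bin_hz)]:.2f}")
```

In this sketch the coherence should be high at the shared 24 Hz component and low at frequencies containing only independent noise. Note that when only a single segment is used, the ratio in Eq. 1 is identically 1 at every frequency, so the estimate is only informative when spectra are averaged over several segments.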

Muscle is thought to be driven by a number of different physiological oscillations at varying frequencies (see [23] for a review). The frequencies at which physiological oscillations occur appear to be characteristic of the function of distinct neural circuits and have been categorized into distinct bands such as alpha (8–13 Hz), beta (15–35 Hz), gamma (30–70 Hz), and others. It is generally thought that the beta and low gamma bands originate primarily from the primary motor cortex [23]. The beta band is typically associated with production of static motor tasks and is reduced with movement onset (e.g., [33]). Intermuscular coherence measurements reflect all oscillatory presynaptic drives to lower motoneurons. However, the intermuscular coherence in the beta band has been shown to be qualitatively similar to corticomuscular coherence, both in healthy individuals and in individuals with cortical myoclonus [6, 33], supporting the hypothesis that beta-band intermuscular coherence is due to oscillatory drives originating in the motor cortex and is thereby likely influenced by cognitive state. The coherence of neuromuscular oscillations, whether measured through MEG-EMG, EEG-EMG or EMG-EMG, is affected by concurrent cognitive demands differently across the distinct frequency bands, making measures of coherence potentially useful for detecting changes in cognitive workload. For example, although alpha-band coherence is not dominant during motor tasks, it is known to increase when attention is drawn specifically to motor task execution [19, 34]. Beta-band coherence is the dominant signal during synchronized oscillatory discharges of corticospinal or corticobulbar pathways onto lower motor neurons, and is likewise reduced when attention is divided or otherwise drawn away from the motor task at hand [29, 34, 52].
In addition, beta-band coherence is negatively correlated with motor output errors during concurrent cognitive tasks in young adults [28], suggesting that it may be predictive of both cognitive workload and motor performance in younger individuals. In contrast, beta-band coherence is not necessarily correlated with motor performance in the elderly during divided-attention tasks (61–75 yr; [28]), perhaps due to reduced attentional resources [59] and motor coordination [60] with advancing age.

Measures of neck and face intermuscular beta coherence might be particularly well suited to real-time workload assessment given the bilateral symmetry of contraction typical for neck midline and facial muscles. For example, superficial facial muscles involved in speech articulation and neck midline strap muscles typically contract symmetrically across the right and left sides during speech and swallowing [41], providing an opportunity for coherence measurement during these synchronous contractions. Stepp and colleagues found that NIBcoh measured from ventral neck strap muscles (sternohyoid, sternothyroid, and thyrohyoid) can distinguish not only individuals with disordered (strained, hyperfunctional) versus healthy voice production [51], but also healthy individuals when they mimic a strained voice versus natural speech [52]. Vocal hyperfunction is associated with heightened speaking effort and anxiety [22], which may represent increased cognitive demand during speech and thereby reduce NIBcoh, regardless of whether the hyperfunction is pathological or mimicked. Stepp and colleagues [52] also found that NIBcoh decreases when speech is produced under divided attention (cognitive load imposed by rapid, backwards skip-counting), consistent with prior reports of divided attention effects on beta coherence in different motor systems [29, 34]. The goal of the present study was to re-examine the Stepp et al. [52] dataset of NIBcoh during their normal versus divided-attention speaking conditions, with the hypotheses that (1) their finding of reduced NIBcoh under the divided-attention condition would be replicated, and (2) the commission of cognitive errors (miscounting) could be detected in the NIBcoh measure as errors occurred during their recordings of running speech (e.g., at a sub-second time resolution). If NIBcoh indeed correlates with cognitive errors, this measure would have important implications for real-time monitoring of cognitive load and performance.

3 Design and Methodology

3.1 Data Collection Procedures

The dataset analyzed in this study consists of simultaneous neck surface EMG (sEMG) and acoustic signals recorded during an earlier investigation by Stepp et al. into the modulation of neck intermuscular beta coherence (NIBcoh) by speech and non-speech behaviors [52]. The signals were recorded under a variety of speech and non-speech task conditions, including a normal speech condition involving both spontaneous and scripted speech and a “divided-attention” condition in which participants were instructed to rapidly skip-count backwards from 100 by 7s. In the present study, these data were reanalyzed (as detailed in Sects. 3.2 and 3.3) to explore the relationships among the neck sEMG signals, the acoustic signal, task demands, and task performance in order to shed more light on the possible relationship between neck sEMG and mental workload. Because this research is ultimately focused on real-time workload assessment, the previous analysis was also extended by considering time-varying measures and not only summary statistics over the entire time series.

Participants. The participants were ten (10) vocally healthy female volunteers (mean age: 25 years, standard deviation: 2.6 years). They reported no complaints related to their voice, and no abnormal pathology of the larynx was observed during standard digital video endoscopy with stroboscopy performed by a certified speech-language pathologist (SLP). Informed consent was obtained from all participants in compliance with the Institutional Review Board of the Massachusetts General Hospital.

Recording Procedures. As reported by Stepp et al. [52], simultaneous neck sEMG and acoustic signals from a lavalier microphone (Sennheiser MKE2-P-K, Wedemark, Germany) were filtered and digitally recorded at 20 kHz with Delsys hardware (Bagnoli Desktop System, Boston, MA) and software (EMGworks 3.3). The neck of each participant was prepared for electrode placement by cleaning the neck surface with an alcohol pad and “peeling” (exfoliating) with tape to reduce electrode-skin impedance, DC voltages, and motion artifacts. Neck sEMG was recorded with two Delsys 3.1 double differential surface electrodes placed on the neck surface, parallel to underlying muscle fibers. Each electrode consisted of three 10-mm silver bars with interbar distances of 10 mm. Double differential electrodes were chosen instead of single differential electrodes in order to increase spatial selectivity and to minimize electrical cross-talk between the two electrodes.

Fig. 1. sEMG electrode placement [52]. Copyright 2011 by the American Speech-Language-Hearing Association. Reprinted with permission.

The two electrodes were placed on the right and left anterior neck surface, as depicted by the schematic in Fig. 1. Electrode 1 was centered approximately 1 cm lateral to the neck midline, as far superior as was possible without impeding the jaw opening, superficial to fibers of the thyrohyoid and sternohyoid muscles, and to some degree the omohyoid. Electrode 2 was centered vertically on the gap between the cricoid and thyroid cartilages of the larynx, and centered 1 cm lateral to the midline contralateral to Electrode 1, superficial to the cricothyroid, sternothyroid, and sternohyoid muscles. However, based on previous examinations of sEMG recordings during pitch glides [51], it is doubtful that cricothyroid contraction contributed much energy to the sEMG due to its relatively deep position. The platysma muscle likely contributed to some degree to the activity recorded at both electrode locations. A ground electrode was placed on the superior aspect of the participant’s left shoulder. The sEMG recordings were pre-amplified and filtered using the Delsys Bagnoli system set to a gain of 1,000, with a bandpass filter with roll-off frequencies of 20 Hz and 450 Hz. All recordings were monitored by the experimenters in real time to ensure signal integrity, and no recordings included movement artifacts.

Tasks. Participants completed eleven separate speech and non-speech tasks, broadly organized into six task conditions. Only two conditions are relevant to the present study, however: a “normal speech” condition and a “divided attention” condition. The normal speech condition consisted of two tasks: a scripted task in which participants read “The Rainbow Passage” [16], and a spontaneous speech task in which participants produced speech spontaneously in response to a variety of available prompts, selected by participants (e.g., “What did you do last weekend?”). The Rainbow Passage was typically produced for 30–45 s. Spontaneous speech samples were approximately 1 min in length. No participant had any problems completing these speech tasks correctly. In order to collect speech under divided attention, participants were given 60 s to count backwards from 100, aloud, as quickly as possible in decrements of 7. These recordings were typically about 45 s in length. Participants uniformly reported this task as difficult, but all were able to produce continuous speech during the recording. The primary cognitive demand in this task is a (non-verbal) one imposed by time-pressured arithmetic computation. The production of verbal responses, which imposes modest demands for linguistic processing and motor control resources, can be considered a secondary task. Since EMG-EMG coherence measures have been observed to decrease when attention is diverted from the motor task involving the instrumented muscle, the authors hypothesized that NIBcoh would decrease in response to increased demands of the primary, mental arithmetic task. From this perspective, NIBcoh was expected to function as a measure of secondary task attention.

3.2 Data Analysis

The original data consisted of discrete multivariate time series for ten subjects under five speech-related conditions and one non-speech condition, sampled at a rate of 20 kHz. Each time series included EMG variables from the two anterior neck surface recording locations depicted in Fig. 1 and an acoustic variable. From these “raw” time series, the authors derived several dependent EMG and acoustic measures and down-sampled to a rate of approximately 1.83 Hz (i.e., three samples for every 32,768 samples of the “raw” time series). The derived EMG-based variables were: NIBcoh, average magnitude, and gradient. The acoustic variables were: average amplitude, peak amplitude, spectral roll-off, cepstral peak prominence, and sound intensity.

Two versions of the intermuscular beta coherence measure were computed, with and without full-wave rectification (NIBcoh-rect and NIBcoh, respectively) of the EMG signals. It is common practice in EMG-EMG coherence analysis to apply full-wave rectification as an EMG pre-processing step, and this step was performed in the research in which the NIBcoh dataset originated [52]. Other work, however, has called this practice into question, demonstrating that full-wave rectification may impair the identification of common oscillatory inputs to muscle pairs [45]. Therefore, the present study experimented with both rectified and unrectified EMG signals.

The NIBcoh and NIBcoh-rect time series were computed as follows. First, any DC offset was removed from the raw sEMG signals. In the case of the NIBcoh-rect measure, the resulting signals were then full-wave rectified. The signals were segmented by sliding a 16,384-point (\(\approx \)820 ms) rectangular window over each of the resulting EMG-EMG bivariate time series, with 50% overlap. Coherence between the two EMG signals, as defined in Eq. 1, was then estimated within each rectangular window using Welch’s overlapped averaged periodogram method [62], with sliding 8,192-point (\(\approx \)410 ms) Hamming windows, an 8,192-point fast Fourier transform, and 50% overlap (i.e., three Hamming windows per rectangular segment). Finally, the beta-band coherence values were computed by averaging the coherence values over the 15–35 Hz frequency range. Based on the findings of Neto and Christou [45] suggesting that oscillatory activity in the 100–150 Hz frequency band of the unrectified signal may drive variations in the beta band of rectified EMG signals, coherence in the 100–150 Hz band was also computed.
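The pre-processing and windowing steps described above can be sketched in a few lines of Python. This is a hedged illustration, not the code used in the study: the helper names are hypothetical, and treating the band edges as inclusive when selecting FFT bins is an assumption, since the text does not say how the 15–35 Hz and 100–150 Hz limits were applied.

```python
FS = 20_000        # sampling rate (Hz), from the recording description
SEGMENT = 16_384   # rectangular analysis segment (samples)
WELCH = 8_192      # Hamming window and FFT length (samples)


def remove_dc(signal):
    """Subtract the mean so that any DC offset is removed."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def full_wave_rectify(signal):
    """Full-wave rectification: absolute value of each sample (NIBcoh-rect only)."""
    return [abs(v) for v in signal]


def window_starts(n_samples, length, overlap=0.5):
    """Start indices of sliding windows of `length` samples with fractional overlap."""
    hop = int(length * (1 - overlap))
    return list(range(0, n_samples - length + 1, hop))


def band_bins(lo_hz, hi_hz, nfft=WELCH, fs=FS):
    """FFT bins whose centre frequency lies in [lo_hz, hi_hz] (inclusive edges assumed)."""
    return [k for k in range(nfft // 2 + 1) if lo_hz <= k * fs / nfft <= hi_hz]


def band_mean(coherence, bins):
    """Average a per-bin coherence estimate over the selected bins."""
    return sum(coherence[k] for k in bins) / len(bins)


print(1000 * SEGMENT / FS)                 # segment duration in ms
print(1000 * WELCH / FS)                   # Hamming window duration in ms
print(len(window_starts(SEGMENT, WELCH)))  # Hamming windows per segment
print(FS / WELCH)                          # frequency resolution in Hz
print(band_bins(15, 35))                   # bins averaged for the beta band
```

Under these assumptions, each segment spans 819.2 ms and contains three half-overlapping Hamming windows of 409.6 ms, and the frequency resolution is about 2.44 Hz, so the beta-band average covers eight FFT bins (indices 7 through 14).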

In order to associate the acoustic and sEMG signals with time-varying performance metrics, the acoustic signals for the divided-attention condition were manually annotated with labels indicating the participants’ verbal responses—each interval i, terminated by the completion of a response, was labeled with the number uttered \(l_i\). Performance in the backwards-skip-counting task is characterized by both speed and accuracy. Accuracy is quantified as a function of error commission, with “errors” defined relative to the most recent element that a subject produced (e.g., 80 was regarded as the correct successor to 87, despite 87 not being an element of the correct sequence) to avoid an error-compounding effect. Speed is quantified by the duration of time required for a subject to produce each element of the sequence (i.e., response time). From the manual response annotations, discrete time series were generated capturing the error commission \(\epsilon \) and response time r performance metrics. Specifically, the value of the error commission indicator variable \(\epsilon _i\) for a given labeled interval i reflects whether the response label \(l_i\) for the interval is 7 less than the response label \(l_{i-1}\) for the previous labeled interval. The value of the response time variable for a given labeled interval is simply the duration of the interval. That is:

$$\begin{aligned} \begin{aligned} \epsilon _i&= {\left\{ \begin{array}{ll} 1 &{} \quad \text {if } l_{i} \ne l_{i-1} - 7 \\ 0 &{} \quad \text {otherwise} \end{array}\right. } \\ r_i&= \mathrm {len}(i) \end{aligned} \end{aligned}$$
(2)
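A minimal Python sketch of Eq. 2 follows. The `Response` record and the function name are hypothetical, and leaving the error indicator undefined (`None`) for the first labeled interval is an assumption, since Eq. 2 requires a preceding label and the text does not state how the first response was scored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Response:
    start: float  # interval onset (s)
    end: float    # interval offset, i.e. completion of the response (s)
    label: int    # number uttered, l_i


def performance_metrics(
    responses: List[Response], step: int = 7
) -> Tuple[List[Optional[int]], List[float]]:
    """Return (error indicators, response times) for consecutive labeled intervals."""
    errors: List[Optional[int]] = []
    times: List[float] = []
    previous: Optional[int] = None
    for resp in responses:
        if previous is None:
            errors.append(None)  # no predecessor: epsilon undefined (assumption)
        else:
            # Error is judged against the previous utterance, not the ideal sequence.
            errors.append(int(resp.label != previous - step))
        times.append(resp.end - resp.start)
        previous = resp.label
    return errors, times


# Hypothetical annotations: 87 is an error (93 - 7 = 86), but 80 then counts as correct.
demo = [
    Response(0.0, 1.0, 100),
    Response(1.0, 3.0, 93),
    Response(3.0, 6.5, 87),
    Response(6.5, 8.0, 80),
    Response(8.0, 10.0, 73),
]
errors, times = performance_metrics(demo)
print(errors)
print(times)
```

Scoring each response against its immediate predecessor is what prevents a single slip from marking every later response as erroneous.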

The label boundaries did not, in general, align with the \(\approx \)820 ms segment boundaries used to derive down-sampled time series from the raw EMG and acoustic signals. Therefore, a time-weighted average of the down-sampled time series over each labeled interval was computed to assign a single value of each dependent variable (e.g., NIBcoh) to the interval.
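One way to realize such a time-weighted average is sketched below. The exact weighting used in the study is not specified, so this sketch assumes that each down-sampled value is weighted by the duration of overlap between its analysis segment and the labeled interval; the function name and the tuple layout are hypothetical.

```python
def time_weighted_average(segments, start, end):
    """Average segment values over [start, end], weighted by temporal overlap.

    segments: iterable of (seg_start, seg_end, value), with times in seconds.
    Returns None when no segment overlaps the interval.
    """
    weighted, total = 0.0, 0.0
    for seg_start, seg_end, value in segments:
        overlap = min(end, seg_end) - max(start, seg_start)
        if overlap > 0:
            weighted += overlap * value
            total += overlap
    return weighted / total if total > 0 else None


# Hypothetical values: the interval covers half of the first segment and all of the second.
segments = [(0.0, 1.0, 1.0), (1.0, 2.0, 3.0)]
print(time_weighted_average(segments, 0.5, 2.0))
```

Because overlapping analysis segments simply contribute in proportion to their overlap with the interval, the same function applies whether or not the segments themselves overlap one another.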

3.3 Statistical Analysis

As an initial step in the present analysis, the authors sought to determine how well a key finding in [52] held up in light of Neto and Christou’s [45] criticism of full-wave rectification in EMG-EMG coherence analysis. Specifically, Stepp et al. [52] had found a significant effect of task condition on neck intermuscular beta coherence. This result was re-examined by using ANOVA to quantify the effect of condition on four EMG-EMG coherence measures: NIBcoh, NIBcoh-rect, and the corresponding intermuscular coherence measures for the 100–150 Hz frequency band. For consistency with the methods employed in [52], coherence was estimated over each signal as a whole with Welch’s overlapped averaged periodogram method [62], with a sliding 16,384-point Hamming window, 16,384-point FFT, and 50% overlap. The present analysis differs from that of [52] in that their two-factor ANOVA was replaced with a more conservative one-factor repeated measures ANOVA, which makes weaker independence assumptions. The results of this analysis are reported in Table 1, below.

Since the authors’ primary interest was in evaluating the coherence measure’s utility in workload assessment, a post hoc two-tailed t-test was performed to contrast the cognitively demanding divided-attention condition and the normal speech condition under the assumptions of the ANOVA model, as a means to evaluate the sensitivity of NIBcoh to varying task demands. The relationships between the performance measures for the backwards skip-counting task (defined by Eq. 2) and the measures derived from the EMG and acoustic signals were then investigated. Specifically, associations between response time and each of the EMG/acoustic variables were evaluated by using Student’s t-tests to test the null hypotheses that each Pearson’s product-moment correlation coefficient was 0 (i.e., \(H_0: \rho _{r,v} = 0\), where \(\rho _{r,v}\) denotes the correlation between response time (r) and an EMG/acoustic variable v). Similarly, Student’s two-sample t-tests were used to test the null hypotheses that each EMG/acoustic variable had equal means for correct versus incorrect responses (i.e., \(H_0: \bar{v}_{\epsilon = 1} = \bar{v}_{\epsilon = 0}\)). The results of these statistical tests are shown in Tables 2 and 3, in Sect. 4 below, along with the estimated correlation coefficients and differences in means. The t-tests had 123 degrees of freedom, corresponding to 125 observations (i.e., “responses”) across the 10 participants. The reported p-values are not adjusted for multiple comparisons, since any such adjustment could itself be misleading due to correlations among several of the variables. In order to statistically control for substantial variability across subjects, both in terms of performance on the skip-counting task and in terms of the EMG/acoustic measures, the EMG/acoustic variables were normalized before computing t-values and correlation coefficients.
Specifically, given a value x of a variable for subject s and the within-subject sample mean \(\mu _s\) and standard deviation \(\sigma _s\) of that variable over the normal speech condition, the standardized value z(x) was computed using Eq. 3. The statistical tests thus reflect the “effects” of performance on the other measures relative to each subject’s “baseline” from the normal speech condition.

$$\begin{aligned} z(x) = \frac{x - \mu _s}{\sigma _s} \end{aligned}$$
(3)
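The normalization in Eq. 3 and the correlation test described above can be sketched as follows. The function names are hypothetical, and using the sample standard deviation with an n - 1 denominator is an assumption; the text specifies only a within-subject "sample" standard deviation over the normal speech condition.

```python
import math
import statistics


def standardize(values, baseline):
    """Apply Eq. 3 using the mean and SD of one subject's normal-speech baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample SD, n - 1 denominator (assumption)
    return [(v - mu) / sigma for v in values]


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical baseline NIBcoh values for one subject and one divided-attention value.
print(standardize([0.25], baseline=[0.30, 0.40, 0.50]))
# With the 125 responses reported in the text, the test has 125 - 2 = 123 df.
print(t_from_r(0.5, 125))
```

Standardizing against each subject's own baseline before pooling responses keeps between-subject differences in overall signal level from masquerading as associations with performance.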

The analysis of the relationships between performance and the EMG/acoustic variables is motivated by the hypothesis that within-subject variations in performance on the skip-counting task result, at least in part, from variations in the difficulty of the task. Given the nature of the task, within-subject variations in cognitive demand might reasonably be expected to be quite small. However, some such variation may arise from differences in the arithmetic problems that each participant encountered while completing the task, especially in subjects with less developed mental arithmetic skills. Some subjects may find it easier to compute \(100 - 7 = 93\) than to compute \(93 - 7 = 86\), for instance. Whether this was actually the case was examined by comparing mean response times for decrementing numbers with a ones digit of 7, 8, or 9 (which can be computed without regard to the tens digit) versus other numbers (which require “borrowing” from the tens digit). Further, participants were encouraged to count as quickly as possible and to maintain continuous speech during the divided-attention task, which might be expected to lead them to sacrifice accuracy for speed and thereby to experience amplified within-subject variations in cognitive demands. This hypothesis was examined by analyzing the distributions of errors and response times and by comparing mean response times for correct versus incorrect responses within and across subjects. If within-subject variations in performance during the task can be plausibly connected to time-varying cognitive demands, then it is plausible that any observed relationship between performance and a physiological indicator can be interpreted as evidence of a possible relationship between that indicator and cognitive workload (i.e., that task performance acts as a proxy for cognitive workload within the context of this task).
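The borrowing comparison described above can be sketched as follows. The function names and the response times are hypothetical; the only rule taken from the text is that decrementing by 7 needs no borrowing when the ones digit of the number being decremented is 7, 8, or 9.

```python
from statistics import mean


def needs_borrow(number, step=7):
    """True when subtracting `step` requires borrowing from the tens digit."""
    return number % 10 < step


def mean_time_by_borrow(responses):
    """Mean response time for (no-borrow, borrow) items.

    responses: iterable of (number_being_decremented, response_time_s).
    Either mean is None when its group is empty.
    """
    easy = [t for n, t in responses if not needs_borrow(n)]
    hard = [t for n, t in responses if needs_borrow(n)]
    return (mean(easy) if easy else None, mean(hard) if hard else None)


# Hypothetical response times, for illustration only.
demo = [(87, 2.0), (79, 1.0), (93, 4.0)]
print(mean_time_by_borrow(demo))
```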

4 Results and Evaluation

Replication analysis (Table 1) confirmed the finding in [52] that the experimental conditions had a significant effect on neck intermuscular beta coherence, despite the use of a more conservative statistical test than that employed by Stepp et al. The results also lend credence to the concerns of [45] regarding the common use of full-wave rectification in EMG-EMG coherence analysis. Rectifying the EMG signals prior to estimating coherence appears to dilute the estimated difference in coherence across conditions, suggesting that full-wave rectification may result in a harmful loss of information (compare the effect sizes and p-values for the unrectified and full-wave rectified beta-band EMG-EMG coherence signals). No significant effect of the task conditions on neck intermuscular coherence in the 100–150 Hz frequency band was found.

Table 1. Summary of replication analysis. Effect of task condition on EMG-EMG coherence measures.

A post hoc comparison of the normal speech and divided-attention conditions revealed a significant difference between the two (\(p < 0.001\)). The linear model corresponding to the repeated measures ANOVA implied that NIBcoh was lower by an average of 0.0956 in the divided-attention condition, with a standard error of 0.0290. This difference may indicate sensitivity of NIBcoh to the change in cognitive demands between the two conditions.

Statistical analysis of the performance metrics for the backwards skip-counting task supported the use of these metrics as proxies for cognitive workload within the context of this task. First, it was found that participants tended to compute the successors for numbers with a ones digit of 7, 8, or 9 faster on average, with a mean time of 1.9 s, than for other numbers (3.5 s). This result is consistent with the hypothesis that cognitive demands vary between these two conditions and that the varying cognitive demands are reflected in task performance. Additionally, the data are consistent with the expectation that subjects sacrificed accuracy for speed, with the resulting time pressure tending to amplify the effects of within-subject variations in cognitive demands on accuracy. A full 36% (45 out of 125) of the subjects’ responses were erroneous, despite the presumed simplicity of the task. The mean response time was 3.1 s with a standard deviation of 2.8 s, and the response time distribution was heavily right-skewed, with a median response time of only 2.1 s. Furthermore, longer response times were strongly associated with incorrect responses both between and within most subjects, indicating that participants took longer to respond when they were struggling but that delayed responses did not generally result in greater accuracy.

Analysis of cross-subject variation suggests that standardization of the EMG and acoustic variables, using Eq. 3 as described in the previous section, was justified. Subjects were found to vary substantially, both in terms of their performance on the skip-counting task (the total number of errors committed by each subject ranged from 0 to 10, with a mean of 5 and a median of 4) and on the EMG/acoustic measures (e.g., the mean within-subject standard deviation of NIBcoh was 0.104, while the overall mean was 0.446, overall standard deviation was 0.116, and standard deviation of within-subject means was 0.0497).

Table 2 shows the results of the t-tests comparing the distributions of the standardized EMG and acoustic variables across correct and incorrect responses. NIBcoh, EMG magnitude, sound intensity, average sound amplitude, peak amplitude, and cepstral peak prominence all exhibited significant (\(\alpha = 0.05\)) differences in estimated means across correct and incorrect responses. The NIBcoh-rect, EMG gradient, and acoustic spectral roll-off measures showed no significant difference across correct and incorrect responses. NIBcoh associated with incorrect responses was lower than NIBcoh associated with correct responses, as may be expected if workload or simply reduced attention to speech reduces intermuscular coherence of oscillatory drives in speech-related muscles during speech. The results for several acoustic measures frequently used in voice stress analysis also accord with expectations.

Table 2. Difference in mean standardized feature values for correct vs. incorrect responses.
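For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a two-sample t statistic on standardized feature values. The text does not state which t-test variant was used; Welch's unequal-variance form is an assumption made here, and the data are invented for illustration. Only the statistic and degrees of freedom are computed, since a p-value would require a t-distribution CDF from a statistics library.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test.
    Assumption: the source does not specify the t-test variant used."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical standardized NIBcoh values (not real data).
correct = [0.2, -0.1, 0.4, 0.0, 0.3]
incorrect = [-0.6, -0.2, -0.9, -0.3]
t_stat, dof = welch_t(correct, incorrect)
print(t_stat, dof)
```

A positive t here means the feature was higher on correct than on incorrect responses, the direction reported for NIBcoh.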

Correlations between response time and the EMG/acoustic variables are shown in Table 3. NIBcoh, peak acoustic amplitude, sound intensity, and cepstral peak prominence exhibited significant (\(\alpha = 0.05\)) associations with response time.

Table 3. Correlations between response time and the standardized EMG and acoustic features.
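A correlation of the kind summarized in Table 3 can be sketched as below. The text does not say whether Pearson or rank-based correlations were computed; the Pearson coefficient is assumed here, and the response times and feature values are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Hypothetical response times (s) and standardized NIBcoh values (not real data).
response_time = [1.2, 2.1, 3.5, 5.0]
z_nibcoh = [0.4, 0.1, -0.3, -0.8]
r = pearson_r(response_time, z_nibcoh)
print(r)
```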

4.1 Caveats and Limitations

While the results are promising, several caveats must be acknowledged and the exploratory nature of the analysis must be stressed. Because the original experiment in which the data were collected was designed to shed light on the modulation of NIBcoh by speech and non-speech behaviors and not to explore the sensitivity of NIBcoh to cognitive demands, the present reanalysis was necessarily ad hoc. The findings should therefore be considered suggestive rather than conclusive; robust conclusions will require well-controlled experiments in which cognitive demands are manipulated directly (Sect. 5 briefly describes such an experiment, which the authors plan to conduct in the near future).

The small sample size of only 10 participants and the primary focus on the (single) backwards skip-counting task further limit the generalizability of the results. In particular, although the task did require mental arithmetic, the verbal response format makes it impossible to determine from this study whether the NIBcoh measure may offer any insight into cognitive workload in tasks that do not involve speech. Additionally, it was found that the statistical hypothesis tests were quite sensitive to outliers. For instance, the omission of one subject in particular, who rapidly produced the entire sequence without errors, changes the p-value for the t-test comparing mean NIBcoh in correct versus incorrect responses from less than 0.006 to 0.074. This lack of robustness underscores the limits of this reanalysis and warrants no more than cautious optimism in interpreting the results.
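The outlier sensitivity described above can be probed with a leave-one-subject-out check, sketched here under stated assumptions. The record layout, function names, and values are hypothetical, and a simple difference in means stands in for the full t-test.

```python
from statistics import mean

def mean_difference(records):
    """Mean feature value on correct responses minus mean on incorrect ones.
    Each record is a (subject_id, value, is_correct) tuple."""
    correct = [v for _, v, ok in records if ok]
    incorrect = [v for _, v, ok in records if not ok]
    return mean(correct) - mean(incorrect)

def leave_one_subject_out(records, statistic):
    """Recompute `statistic` with each subject's records omitted in turn."""
    subjects = sorted({s for s, _, _ in records})
    return {s: statistic([r for r in records if r[0] != s]) for s in subjects}

# Hypothetical standardized values; subject "s3" made no errors (not real data).
records = [
    ("s1", 0.3, True), ("s1", -0.4, False),
    ("s2", 0.1, True), ("s2", -0.2, False),
    ("s3", 2.0, True), ("s3", 1.8, True),
]
sensitivity = leave_one_subject_out(records, mean_difference)
print(sensitivity)
```

A large change in the statistic when a single subject is dropped, as for "s3" in this toy data, flags a result that depends heavily on that subject.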

Finally, it must be noted that several of the EMG and acoustic measures were themselves somewhat correlated (e.g., NIBcoh and peak acoustic amplitude), and the design of the original experiment makes it impossible to infer the causal factors that underlie these correlations. Were NIBcoh and peak acoustic amplitude correlated because of a mutual causal relationship to cognitive demands / workload? It seems intuitively likely that the effect on peak acoustic amplitude is an artifact of the experimental conditions, rather than an indication of any general utility in predicting workload or error commission. Bursts of nervous laughter accompanying high workload might, for instance, contribute to high peak amplitudes in the specific conditions of this experiment, but one would not expect to find the same effect in other settings. Could such an artifact also explain lower intermuscular beta coherence associated with erroneous responses? Questions such as these cannot be addressed adequately with the existing data.

5 Conclusions and Future Work

The data reanalysis reported herein offers some evidence bearing on the utility of neck surface EMG for detecting cognitive strain in real time. Specifically, the analysis demonstrates that time-varying EMG-derived measures, with a sub-second temporal resolution, are correlated with error commission and response time in a backwards skip-counting task. Although the task exhibits only mild variations in task difficulty over the sub-problems that make up the task, the analysis indicates that a time-varying NIBcoh measure was lower by about half a standard deviation on average during intervals in which subjects produced incorrect responses. Further research is necessary to confirm the effect in a well-controlled context, determine whether it is due to a causal relationship with error commission, cognitive demands, and/or workload, investigate whether the effect is limited to tasks involving speech, and generalize the work to other EMG sensing locations on the face/neck surface where other physiological responses to variations in cognitive workload may be detected.

A planned study, commencing in 2018, will establish more conclusively whether intermuscular beta coherence or other measures derived from face and neck sEMG signals are sensitive to cognitive task demands by recording these, and other, psychophysiological signals while participants complete tasks with varying levels of difficulty in the NASA Multi-Attribute Task Battery (MATB) [49]. The research will investigate multiple EMG sensing locations on the face and neck surface, relating to muscles involved in facial expression, mastication/jaw clenching, speech articulation, and voice production, and a psychometric analysis will establish relationships to more conventional workload indicators (including subjective workload as measured by the NASA Task Load Index administered within the MATB).

The experiments will employ a novel protocol designed to establish whether a perceived risk of aversive consequences affects the measured psychophysiological responses to cognitive task demands. Specifically, after half of the task blocks, identified to participants before and during each such block, a series of mildly noxious electrical stimuli will be delivered to participants, with the number of stimuli ostensibly associated with task performance but actually determined by the (manipulated) level of task demand in the block. The protocol will thus test how the presence of perceived risks mediates the relationship between task demands and psychophysiological responses. It is our hope that the technique of employing aversive consequences in order to elevate physiological responses to workload will resolve two key challenges for workload researchers: the risk that muted responses may lead to Type II errors (missed effects) in laboratory studies, and the problem that laboratory models may not transfer well into operational environments.