Introduction

Research background

In many real-world complex human–machine cooperative systems, the performance of the automated components is already satisfactory, so the overall performance and task-execution effectiveness of the whole system are largely determined by the effectiveness of the cooperation between the human operator and the automated systems, in which the operator always plays a crucial role. With the increasing maturity of automation technologies, automatic control systems have become increasingly sophisticated and have found applications in virtually all areas. Unfortunately, most automated systems at the current technological level are not equipped with the judgment and reasoning capacities, knowledge and experience of humans, which often results in less desirable control performance and even the so-called operator effectiveness issue (Wilson and Fisher 1991). A decrement or impairment in Operator Functional State (OFS) makes operational errors, risks or accidents more likely, and the consequences may be severe or catastrophic, particularly in safety–critical systems. Human operators unavoidably exhibit fatigue and mental overload. If these operator states cannot be predicted accurately, the safety and performance of systems involving complex forms of human–machine cooperation will suffer. For instance, in nuclear power plants operator errors have accounted for 50–70 % of all accidents, while pilot misjudgment caused nearly 80 % of all flight accidents.

In safety–critical human–machine cooperative systems in fields such as nuclear power, aviation and aerospace, the impact of a possible decrement in operator work performance on the reliability and safety of the system has drawn worldwide attention and has motivated scientific disciplines such as human factors engineering. A viable solution to this problem is to dynamically adjust (or control) the task (or function) allocation between human and machine agents based on the estimated OFS. For example, with OFS pattern recognition methods we can design an adaptive aiding system that either reminds the operator or reduces the task load during periods of excessive mental workload, with the aim of enhancing overall system performance (Hockey 2003). The OFS refers to the task-completion performance of the operator under the current task environment and is basically an estimate of the operator’s work performance, which is related not only to the external environment and task difficulty but also to the psychophysiological state of the operator. A key problem is how to accurately recognize (or identify) the OFS from measured data. In practical applications, the computational efficiency of OFS pattern recognition also needs to be considered.

A literature review on OFS recognition and estimation

In such domains as accident analysis and system safety assessment, some qualitative studies on the OFS analysis have appeared, in which the subjective ratings of the operator are widely utilized. It has been shown that different physiological measures may reflect different aspects (or dimensions) of the OFS (Gao et al. 2011; Werner 2012; Zhang and Lee 2012). Due to their high bandwidth and fast and reliable responses, the physiological measures of the autonomic and central nervous systems are major OFS features (Hockey et al. 1998).

As an example, heart rate (HR) was found to be closely linked to the overall task engagement of the operator, and the blink rate reflects the visual demand imposed on the operator (Wilson and Fisher 1991, 1995; Gevins et al. 1998; Russell and Wilson 1998; Wilson and Eggemerier 1991). Fahrenberg and Wientjes (2000) found that cardiovascular indices (in particular HR and heart rate variability, HRV) respond reliably to changes in workload and mental effort, especially in operational settings involving problem solving (Tattersall and Hockey 1995). Current HRV analysis uses spectral analysis of the cardiac interval signals to separate the effects of mental or psychological effort on different components, although concomitant measurement of respiration is necessary to identify the artifacts caused by respiration (Tattersall and Hockey 1995).

Analogous to HRV, EEG spectral analysis is usually necessary to reveal the effects of mental state (Chen et al. 2008; Pockett et al. 2007). The EEG spectrum is typically divided into 4 frequency bands: delta (1–3 Hz), theta (4–7 Hz), alpha (8–12 Hz) and beta (13–30 Hz). The most sensitive index of overall vigilance (or alertness) is based on the ratio between the higher and lower frequency power (Helton et al. 2010; Parasuraman et al. 2009; Lin et al. 2006, 2007, 2009). For example, the Langley group at NASA proposed an engagement index (EI) [beta/(alpha + theta)] based on the relative predominance of higher frequency brain activity, and successfully used the change in EI to switch between manual and computer-based aiding modes in laboratory tracking and vigilance tasks (Pope et al. 1995; Scerbo et al. 2003). EI has potential for use in adaptive automation (AA). However, it has several limitations that may restrict its value for safety–critical systems, in which the relationship between operator and tasks is more complicated and varies dynamically. First, it is based on the principle of stabilizing mental engagement at a moderate level throughout the task experiment. The negative-feedback logic in the simple tasks used is strong: the operator assumes control when disengaged and abandons the control task when highly strained. This is not suitable for real-world tasks. In such cases, AA should allow the operator a period of rest from continuous executive (working-memory-based) decision-making, but should not interrupt him unnecessarily (when he is engaged with the task and shows no sign of the deleterious effects of stress or strain).
Another possible limitation is that EI seems to be a measure of generalized vigilance (Jung et al. 1997; Kristjansson et al. 2009; Pattyn et al. 2008), rather than of engagement in the sense of task engagement orientation (Hockey et al. 2009). In contrast, the EEG-based task load index (TLI) was found to be very sensitive to the mental stress of the operator (Nickel et al. 2005, 2006).
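As a minimal illustration of the engagement index quoted above, the following sketch computes EI = beta/(alpha + theta) from three band-power values. The function name and the guard against a non-positive denominator are assumptions made for illustration; they are not taken from Pope et al. (1995).

```python
def engagement_index(beta, alpha, theta):
    """Return EI = beta / (alpha + theta) for band powers in the same units."""
    denominator = alpha + theta
    if denominator <= 0:
        # Assumption: band powers are non-negative, so this signals bad input.
        raise ValueError("alpha + theta power must be positive")
    return beta / denominator


# Hypothetical band powers (arbitrary units), for illustration only.
ei = engagement_index(beta=2.0, alpha=1.0, theta=3.0)
print(ei)
```

A higher EI indicates a relative predominance of beta activity over alpha and theta activity.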

In some studies (e.g., Comstock and Arnegard 1992; Pope et al. 1995; Freedman et al. 1999), adaptive aiding systems based on physiological measures were developed in laboratory settings, and the results showed that such a system can adjust task allocation to improve the operator’s level of task engagement and thus enhance system performance. Apart from the electrophysiological measures, the operator’s task performance, which can be derived from how effectively the operator has accomplished the tasks, can also be used to evaluate the OFS. For instance, the performance criterion can be the percentage of time during which the manually controlled system variables are within the target ranges in process control operations, or the degree of deviation from the glide slope during the landing phase of an aircraft. In order to assess the OFS accurately enough, the electrophysiological and performance data can also be combined. Wilson (1999) used an artificial neural network (ANN) technique to classify the OFS with 3 levels of task difficulty (i.e., low, middle, and high) and achieved a correct classification rate of 86.8 % on the test data. This work has shown the efficacy of hybrid physiological and performance measures for the OFS classification problem. However, its main disadvantages are that: (1) the real-time performance of the OFS assessment algorithm was not considered; (2) the classification method developed is fully deterministic; and (3) no physiological interpretation of the classification results is possible (i.e., the ANN method is an opaque black box for the users).

From the above short literature survey, it can be observed that most work on observed OFS data has focused on a single type of physiological data, such as electroencephalographic (EEG), electrocardiographic (ECG), electrooculographic (EOG), or electromyographic (EMG) data. Existing OFS work commonly suffers from either overly simple tasks or too few recorded physiological variables (Wilson and Fisher 1995; Gevins et al. 1998; Russell and Wilson 1998; Gevins and Smith 1999; Nikolaev et al. 1998). While most previous work relied on either performance or psychophysiological measures, a hybrid data approach is seen as the most appropriate and promising for studying the executive control processes underlying the regulation of human performance in complex dynamical task environments. As the executive control processes [i.e., cognitive processes such as the flexible use of attentional and planning strategies, problem-solving, reasoning and decision making (Royall et al. 2002)] are mediated by the prefrontal cortex, measures of the central nervous system, such as frontal midline theta activity and TLI (Gevins and Smith 2003; Gevins et al. 1997), have been found to better reflect load manipulations in complex task environments (Smith et al. 2001; Lorenz and Parasuraman 2003). Existing OFS assessment techniques are also not sufficiently accurate, owing to a lack of data-based OFS temporal analysis, and hence have met difficulties in real-world applications under operational settings.

Objectives and overview of the present work

The primary goal of the present study is to establish the proper experimental task parameters and to develop the OFS data analysis, feature extraction and pattern classification methods. We used the automation-enhanced cabin air management system (AUTO-CAMS), developed originally by Hockey et al. (1998) and later modified in Lorenz (2002), to simultaneously record multiple types of psychophysiological data (including EEG and ECG) as well as operator performance data under operational risks and cognitive stress, which were induced by stepwise increments of the task load imposed on the operator. A total of 22 experimental sessions on 11 healthy male subjects (each participating in 2 sessions with exactly the same experimental procedure) were performed in laboratory settings in order to obtain data-based evidence for detecting the vulnerable operator state. Unlike the tracking and vigilance tasks used by the NASA/Langley group, AUTO-CAMS makes more executive demands on the operator’s mental or psychological resources. In order to induce a high-risk operator state, we adopted a novel cyclical loading method, similar to the strain testing method used in mechanical engineering. The workload is heightened in a stepwise fashion until the compensatory limit is reached and the primary performance starts to break down; the workload is then gradually reduced until the performance recovers to the normal range. This experimental design enables us to detect the effect of workload increment (loading) as well as the hysteresis effect of the unloading phase (caused by accumulated fatigue).

The most important OFS features were then selected. In this regard, we adopted the EEG-based task load index (TLI) proposed by Gevins and his group (Gevins and Smith 1999, 2003; Smith et al. 2001). TLI is defined as the ratio of theta activity at the frontal midline region to alpha activity at parietal sites [theta/alpha]. Whereas theta (5–8 Hz) at central or parietal sites is typically a marker of drowsiness, its occurrence at frontal midline sites is known to be correlated with executive control activity and effective use of working memory (Gevins and Smith 2003; Scerbo et al. 2003; Schacter 1977). A reduction of theta power at the frontal midline region may reflect strategic disengagement from the usual executive demands of task management (i.e., AUTO-CAMS) (Lorenz 2002; Lorenz and Parasuraman 2003). Since frontal theta activity is generated by brain regions that are strongly implicated in executive control (Miller and Cohen 2001; Onton et al. 2005), TLI is a particularly useful candidate marker of mental strain. It would therefore be expected that theta activity (and TLI) increases with load. While performance may be well protected under such conditions, psychophysiological features are expected to reflect the costs of sustained mental effort. Given the basis of executive activity in frontal brain areas, this expectation should be strongly supported by the variations in TLI, where reduced use of executive control in a fatigued state may generate a lower level of theta activity.

As the OFS pattern classification problem is fuzzy in nature (i.e., the actual OFS at a given time instant may fall within several different classes or categories), a proper tool for dealing with it is fuzzy logic (FL) theory (Zadeh 1973). In the past decades FL-based methods have been successfully applied in a multitude of engineering and biomedical fields. Nevertheless, their application to quantitative OFS recognition and prediction is still rare, except for a few studies such as Parasuraman et al. (2000), Zhang et al. (2007, 2008a), and Qin and Zhang (2012). Considering the inherent fuzziness and uncertainty of OFS assessment (either modeling or classification), fuzzy systems theory is a natural tool for addressing the problem. In this work the fuzzy c-means (FCM) algorithm was employed to classify the OFS time-series data, and both the instantaneous OFS class label and the maximum degree of membership in that class were given. In comparison with the ANN method, the advantages of the fuzzy OFS recognition method proposed in this work are as follows: (1) the fuzzy methodology allows for overlapping classes to which a momentary OFS may belong, which naturally accommodates the fuzziness and uncertainty of the OFS pattern classification problem under study; and (2) in addition to the specific OFS class labels, the fuzzy method also produces membership grades (in the interval [0, 1]), which can be considered a confidence measure for the OFS category decisions.

Experimental data acquisition and analysis

The process control experiments were performed to carry out OFS pattern recognition based on measured heterogeneous (or hybrid) data from multiple sources. The AUTO-CAMS software was utilized to simulate a highly complex safety–critical process control task environment. Under different task-load conditions, the operator was required to manually control a different number of system variables, which overcomes the disadvantage of the overly simple tasks usually adopted in previous OFS experimental studies. Each experimental session consisted of 9 task-load conditions, each requiring a different number of control subsystems to be manually controlled. The recorded data include EEG, ECG, performance and subjective data. Several effective OFS-related EEG and ECG temporal features were then extracted. All features were normalized to the interval [0, 1] before classification. In the following, the experimental task environment and design, data acquisition and preprocessing, and feature extraction methods are introduced in detail.

Subjects

Eleven healthy male graduate students (denoted A, B, C, D, E, F, G, H, J, K and L), aged between 23 and 29 years, voluntarily participated in our experiments. All subjects had normal or corrected-to-normal visual acuity, had no diseases, and did not take any medication that might influence their task performance. Each subject was informed that the experiment was concerned with OFS testing during simulated process control. Prior to the experiment, each subject had taken part in a long-term (>10 h) training and testing program, consisting of at least 3 sessions, to ensure his familiarity with the experimental environment and the manual control tasks. The training sessions were evaluated based on the level of performance achieved on AUTO-CAMS and on relevant process control expertise. Afterwards, each subject underwent 2 experimental sessions, arranged at the same time of day on two different days in order to avoid the effects of circadian rhythms.

Task environment and measurement equipment

The process control software AUTO-CAMS was run on a PC, and the subject was asked to monitor (or supervise) the system operation in real time on a 19 inch monitor (at a distance of about 50 cm) and to manually control the system using the keyboard or mouse. The subjective ratings and performance parameters were recorded on the process control PC, while the psychophysiological data were recorded by a separate experimenter PC. The ActiveTwo system (BioSemi, The Netherlands) was used to continuously record 45-channel psychophysiological data, including ECG (Nehb’s triangle), respiration (nasal/mouth thermistor for 3 point measurement), EMG (muscle activity from the dominant forearm), EOG (vertical and horizontal electrical ocular activity), and EEG (electrode cap, 32 sites in a modified 10–20 system with FC5, T7, T8 and FC6 replaced by FPz, AFz, CPz, and POz respectively), with the reference electrodes placed at the left and right mastoids. All psychophysiological signals were sampled at a rate of 2,048 Hz. The ActiView interface (BioSemi, The Netherlands) was used to monitor signal preprocessing, to mark specific events (disturbances or artefacts), and to store the psychophysiological data in the BioSemi data file (BDF) format on the experimenter PC.

Experimental tasks

Our experiments used AUTO-CAMS, shown in Fig. 1, to simulate with high fidelity a highly complex and safety–critical process control environment, which overcomes a major weakness of most previous studies, namely the use of overly simple tasks. AUTO-CAMS was initially designed for the European Space Agency (ESA) to investigate the stressors acting on space crewmembers in a highly isolated and confined environment (Hockey et al. 1998). The primary task of the operator was to manage a semi-automatic system in order to control the atmospheric environment (air quality, temperature, air pressure, etc.) of a closed system, such as a space capsule or submersible. The manual control task is to keep the five key variables (i.e., temperature, humidity, pressure, oxygen concentration and CO2 concentration) within their normal ranges. Whenever an automatic controller malfunctions, the operator has to assume manual control to maintain the normal operation of the system.

Fig. 1 The schematic functional configuration of the AUTO-CAMS system

In addition to the primary task, two secondary tasks, namely tank level recording (TLR) and alarm acknowledgement response, were also designed. The TLR is basically a prospective memory task (i.e., remembering to perform an action at a given time), which requires the operator to make a precise electronic recording of the current level of the oxygen tank every minute. During the experimental session, the system issued alarm signals to the operator at irregular intervals. After receiving an alarm, the operator was required to react (i.e., by clicking the mouse) as quickly as possible. This secondary task thus provides a measure of the alarm reaction time (ART), defined as the duration from the presentation of the alarm signal to the completion of the reaction. The shorter the reaction time, the better the operator state, and vice versa. Furthermore, right before the onset of each load condition, the operator was asked to report several subjective measures (such as fatigue, mental effort, and anxiety) on a set of 1-D rating scales within 20 s.

Experimental procedure

Each subject participated in 2 experimental sessions, which were arranged during the same period of time on two different days to avoid the unwanted effect of circadian rhythms. Each session consisted of 9 task-load conditions (each lasting 15 min, so that each session lasted 9 × 15 = 135 min) and was divided into two phases: a loading phase with stepwise (graded) increments of task load, followed by an unloading phase with gradual reduction of task load. The number of variables to be manually controlled was varied across load conditions (levels 1, 2, 3, 4, 5, 4, 3, 2, and 1) to simulate different levels of task load (or difficulty). Specifically, the workload during the loading phase (C1–C5) was increased stepwise (monotonically) according to the manual control load (i.e., 1, 2, 3, 4, 5), while the workload during the unloading phase (C6–C9) was reduced stepwise (monotonically) (i.e., 4, 3, 2, 1). The variations in the level of manual control load mimic the practical occurrence of faults, failures or malfunctions in some of the automatic controllers of the five key parameters (O2 flow, nitrogen flow, CO2, humidity, and temperature).
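The cyclical loading scheme described above can be written down as a simple schedule. The sketch below only restates the stated design (9 conditions of 15 min, with NOV levels 1 to 5 and back to 1); the function name and dictionary keys are hypothetical and chosen for illustration.

```python
def cyclical_load_schedule(peak=5, minutes_per_condition=15):
    """Return one dict per task-load condition (C1..C9 when peak=5)."""
    # Loading levels 1..peak followed by unloading levels peak-1..1.
    levels = list(range(1, peak + 1)) + list(range(peak - 1, 0, -1))
    schedule = []
    for index, nov in enumerate(levels, start=1):
        schedule.append({
            "condition": f"C{index}",
            "nov": nov,  # number of manually controlled variables
            "phase": "loading" if index <= peak else "unloading",
            "start_min": (index - 1) * minutes_per_condition,
            "end_min": index * minutes_per_condition,
        })
    return schedule


for step in cyclical_load_schedule():
    print(step["condition"], step["phase"], step["nov"])
```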

For each session, this ‘cyclical loading’ method, inspired by the stress–strain testing method commonly used in the field of mechanical engineering and successfully applied to detect compensatory control strategies using subjective and performance measures (Conway 2006; Hockey 2005), would result in a loading phase (the first 5 task-load conditions, C1–C5) and an unloading phase (the following 4 task-load conditions, C6–C9) as shown schematically in Fig. 2. The purpose of the cyclical changes in manual control load is to induce the operator’s performance breakdown with an aim to detect when it occurs. The use of the cyclical loading procedure also allows us to investigate how the psychophysiological features respond to the accumulation of fatigued state. It is assumed that the psychophysiological responses under mental effort and stress during unloading phase will be affected by fatigue accumulated during the loading phase (Conway 2005; Hockey 2005).

Fig. 2 The schematic of cyclical changes in manual control load resulting from the cyclical loading scheme during a session of simulated process control experiment of about 2 h, where the y axis stands for NOV (number of manually controlled variables), an indication of the discrete (graded) level of workload in each task-load condition, and there are 2 phases: loading (load conditions C1 → C5) and unloading (C5 → C9)

At the start of each session of psychophysiological data acquisition, the subject was asked first to move his eyes and then to close them while sitting calmly on the chair for about 5 min, in order to acquire the range of the individual EOG activity. The process control operation started right after the operator completed the health questionnaire and subjective ratings. After the EOG baseline was established, the sequence of cyclical loading was presented to the operator. Each task-load condition lasted for 15 min and was interrupted by the completion of subjective ratings, which took about 20 s. The performance data were recorded simultaneously during the process control operations.

Data acquisition and preprocessing

In each experimental session (lasting 135 min), the recorded data include the EEG, ECG and operator performance data. OFS classification was based on 9 data segments of 14 min each, in which the first 5 manual control load conditions had incremental levels (levels 1, 2, 3, 4, and 5), followed by 4 load conditions with decremental levels (levels 4, 3, 2, and 1). The number associated with the task-load level denotes the number of variables to be manually controlled by the operator in that task-load condition.

AUTO-CAMS data acquisition

The levels of the key performance parameters were sampled at 1 Hz by AUTO-CAMS, logged into a data file and classified as either within or out of the normal range according to the AUTO-CAMS simulation software and the requirements of cabin air quality. During the experiment, we also recorded in real time the values of the five controlled variables (i.e., temperature, humidity, pressure, oxygen concentration and CO2 concentration), which were used to calculate the performance measure time-in-range (TIR), the percentage of time when any of the five key controlled parameters was in normal range. Primary task performance parameters were extracted from the log files and analyzed offline using special-purpose software to compute the corresponding TIR scores.

Psychophysiological data acquisition

The ActiveTwo system (BioSemi, The Netherlands) was used to continuously record psychophysiological (EEG, ECG and EOG) signals. The EEG electrodes were placed according to the international standard 10–20 system (Jasper 1958), including the 32 scalp sites shown in Fig. 3.

Fig. 3 International standard 10–20 EEG electrode placement system

Four electrodes in the original 10–20 system, namely FC5, T7, T8, and FC6, were replaced by FPz, AFz, CPz, and POz in our modified version. The EEG reference was at the left and right mastoids. The multichannel electrophysiological data (including EEG and ECG activities) were sampled at a rate of 2,048 Hz and controlled via ActiView 5.33 software (BioSemi, The Netherlands), which enabled the experimenter to monitor signal acquisition, to save psychophysiological and marker data in Biosemi-data-files (*.BDF) and to allow for setting up data transmission via TCP/IP.

Psychophysiological data were analyzed using Brain Vision Analyzer (Brain Products, Germany). We used LabVIEW (NI, USA) virtual instruments to automatically compute the instantaneous heart rate (HR) and the 0.1 Hz component of the heart rate variability (HRV), based on the procedure given in Nickel and Nachreiner (2003). The psychophysiological measures were further processed for statistical analysis using MS Excel (Microsoft, USA). The EEG spectral power in three frequency bands, namely theta (4–8 Hz), alpha (8–13 Hz), and beta (13–22 Hz), was calculated at each selected site on the scalp of the subject. The ECG signals were recorded at Nehb’s triangle and segmented every 10 s. After baseline correction, the R peaks in the ECG signal were detected by a level trigger (1 mV) and marked with their time of occurrence. Artifact correction was also performed by visually examining the detected R waves.

After down-sampling the original very large time series, there is one EEG or HR data sample per minute, since we want to recognize the operator state from moment to moment (with a temporal resolution of 1 min). The vertical and horizontal EOGs were used to remove ocular artifacts from the EEG recordings. For the EEG and electrooculographic (EOG) activities, the passband of the band-pass filter was preset to 1.6–55 Hz. Based on the segmented data (one segment every 2 s), EOG correction, baseline correction and automatic artifact detection were carried out sequentially. Then the FFT (with a 10 % Hamming window) was performed on each EEG segment to obtain its power spectrum with a spectral resolution of 0.5 Hz. By combining psychophysiological and performance features, we can make more accurate OFS classification that makes full use of the physiological response and task performance data. Because the operator was asked to provide subjective ratings (taking about 20 s) by filling in the corresponding rating scales, the data for the first and last 30 s of each 15-min task-load condition were removed, leaving 14 min of data for each condition. In this way, each session finally contains 9 × 14 = 126 data points.
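The figures quoted in this paragraph follow from simple arithmetic, which the sketch below makes explicit. It assumes that the spectral resolution of an FFT equals the reciprocal of the segment length and that one data point is kept per remaining minute; the function names are hypothetical.

```python
def spectral_resolution_hz(segment_seconds):
    """Frequency spacing of an FFT taken over a segment of the given length."""
    return 1.0 / segment_seconds


def points_per_condition(condition_minutes=15, trim_seconds=30):
    """One-minute data points left after trimming both ends of a condition."""
    usable_seconds = condition_minutes * 60 - 2 * trim_seconds
    return usable_seconds // 60


def points_per_session(n_conditions=9, condition_minutes=15, trim_seconds=30):
    """Total one-minute data points in a session."""
    return n_conditions * points_per_condition(condition_minutes, trim_seconds)


print(spectral_resolution_hz(2))   # 2 s segments
print(points_per_condition())      # 15 min minus 2 x 30 s
print(points_per_session())        # 9 conditions
```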

OFS feature extraction

The OFS analysis framework predicts that, in combination with performance measures, TLI will provide the most influential feature of mental effort/fatigue with increased workload. Along with measures of primary task performance, the following three candidate features of mental workload are considered. More specifically, three kinds of indices will be derived: EEG indices of generalized cortical activation (vigilance), whose most sensitive markers are based on ratios between the power in higher and lower EEG frequency bands [for example, in Pope et al. (1995)]; EEG indices of specific frontal executive control (mental effort); and HRV indices of cardiovascular activity.

Based on the preprocessed ECG data, we can obtain the R peak interval between successive heart beats, then HRV data every second through linear regression technique. Much information about the equilibrium of the nervous system can be obtained by examining the HR data which has found many applications in medicine and mental workload assessment. In terms of autonomic markers of effort, HRV has been found to respond reliably to changes in workload and mental effort (Mulder et al. 2000), especially in operational settings where executive problem-solving is involved (Tattersall and Hockey 1995; de Waard 1996; Izso and Lang 2000). As with EEG measures, current HRV analysis often makes use of spectral analysis of the cardiac interval signal to separate effects of mental effort on different components, with HRV1 (the mid-frequency (0.1 Hz) band) strongly linked with effort manipulations. Despite its success in a range of studies, HRV may have limited value of being used as an index of mental stress (or strain) under our taskload environment with cyclical loading procedure. In Nickel and Nachreiner (2003), the authors found that HRV discriminated well between loaded (task) and unloaded (corresponding to resting or background state of the operator) conditions, but not between tasks with different levels of difficulty. If HRV1 is to be effective as a marker of mental effort, stress or strain, it is expected that it shows a progressive reduction with the increment of workload. Here the HR is taken as the average of heart rate every minute, while HRV2 is defined as the ratio between the standard deviation and mean value of HR data segment of 1 min (Wilson 1999; Zhang et al. 2008b):

$$ HRV_{2} = \frac{{\sigma_{HRV} }}{{\mu_{HRV} }} $$
(1)

where \( \sigma_{HRV} \) and \( \mu_{HRV} \) are the standard deviation and mean of the HR data, respectively.
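Equation (1) is the coefficient of variation of the HR samples in a 1-min segment. A minimal sketch is given below; the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption, since the text does not specify which is meant, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def hrv2(hr_segment):
    """Ratio of standard deviation to mean of the HR values in one segment."""
    mu = mean(hr_segment)
    if mu == 0:
        raise ValueError("mean heart rate must be non-zero")
    # Assumption: population standard deviation (divide by N, not N - 1).
    return pstdev(hr_segment) / mu


# Hypothetical HR values (beats per minute) within one 1-min segment.
print(hrv2([68.0, 72.0, 70.0, 74.0, 66.0]))
```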

Based on the spectral analysis method given in Zhang et al. (2008a), the index LF/HF is defined and calculated as the ratio between the lower (0.03–0.15 Hz) and higher frequency (0.18–0.4 Hz) power.

The EEG-based TLI may be defined by using different EEG sites and frequency bands (Gevins et al. 1997, 1998; Smith et al. 2001; Gevins and Smith 2003). Here we calculate the two indices, TLI1 and TLI2, as follows:

$$\left\{ {\begin{array}{*{20}l} {TLI_{1} = \frac{{P_{{\theta ,Fz}} }}{{P_{{\alpha ,Pz}} }}} \\ {TLI_{2} = \frac{{P_{{\theta ,AFz}} }}{{P_{{\alpha ,CPz}} + P_{{\alpha ,POz}} }}} \\ \end{array} } \right. $$
(2)

where \( P_{\theta} \) and \( P_{\alpha} \) represent the theta and alpha power, respectively, and the second subscript denotes the EEG site. The EEG frequency bands are defined as: θ at Fz: 6–7 Hz; α at Pz: 10–12 Hz; θ at AFz: 5–7 Hz; α at CPz: 8–10.5 Hz; and α at POz: 10–13.5 Hz, where Fz (frontal), Pz (parietal), AFz, CPz and POz are the five EEG sites in the modified 10–20 system.
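Equation (2) can be evaluated directly once the five band powers are available. The sketch below assumes they are supplied in a dictionary with hypothetical keys such as `theta_Fz`; how the band powers are obtained from the spectrum is outside its scope.

```python
def task_load_indices(power):
    """Return (TLI1, TLI2) from a dict of band powers keyed by band and site."""
    tli1 = power["theta_Fz"] / power["alpha_Pz"]
    tli2 = power["theta_AFz"] / (power["alpha_CPz"] + power["alpha_POz"])
    return tli1, tli2


# Hypothetical band powers (arbitrary units), for illustration only.
example_power = {
    "theta_Fz": 4.0, "alpha_Pz": 2.0,
    "theta_AFz": 3.0, "alpha_CPz": 1.0, "alpha_POz": 2.0,
}
print(task_load_indices(example_power))
```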

Another OFS assessment index is the operator performance when he is carrying out the main tasks. The TIR (Time-In-Range) index (variable), the percentage of the controlled variables within the target range, is derived to measure the momentary primary task performance (i.e., essentially an overall measure of the error rate of the operator-machine system). In the experiment, it is required that the five controlled system variables be maintained within the normal (or target) range cooperatively by the operator (in manual mode of operation) and the control computer (in automatic mode of operation). TIR refers to the percentage of the duration when the five controlled variables fall within the target range within a given period of time (i.e., 1 min). Moreover, another variable NOV (Number Of Variables requiring manual control), which is used to quantify the level of task difficulty and defined as the number of manually controlled system variables, is also considered in the work.
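A possible way to compute TIR from the 1 Hz log is sketched below. It assumes that a sample counts as in range only when all controlled variables lie inside their target ranges at that second, which is one reading of the definition above; the variable names and target ranges in the example are hypothetical and are not the AUTO-CAMS settings.

```python
def time_in_range(samples, target_ranges):
    """Percentage of samples in which every variable is within its range.

    samples: list of dicts, one per second, mapping variable name to value.
    target_ranges: dict mapping variable name to a (low, high) tuple.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    seconds_in_range = sum(
        1 for sample in samples
        if all(low <= sample[name] <= high
               for name, (low, high) in target_ranges.items())
    )
    return 100.0 * seconds_in_range / len(samples)


# Hypothetical ranges and four 1 Hz samples, for illustration only.
ranges = {"temperature": (19.0, 23.0), "humidity": (35.0, 45.0)}
log = [
    {"temperature": 20.0, "humidity": 40.0},
    {"temperature": 24.0, "humidity": 40.0},  # temperature out of range
    {"temperature": 21.0, "humidity": 41.0},
    {"temperature": 22.0, "humidity": 44.0},
]
print(time_in_range(log, ranges))
```

For a per-minute TIR value, the function would be applied to the 60 samples of each minute.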

All the above-mentioned candidate features are normalized as follows:

$$ \left\{ {\begin{array}{*{20}l} {z^{\prime} = \frac{{z - z_{{\min }} }}{{L_{z} }}} \\ {L_{z} = z_{{\max }} - z_{{\min }} } \\ \end{array} } \right. $$
(3)

where \( z^{\prime } \in [0,1] \) stands for the normalized data, \( z \) for the original data, and \( z_{\max } \) and \( z_{\min } \) for the maximal and minimal values of \( z \), respectively.
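A minimal Python sketch of the normalization in Eq. (3) is given below. The function name `min_max_normalize` and the sample heart-rate values are hypothetical, and the handling of a constant feature (where \( L_{z} = 0 \)) is an assumption, since the text does not cover that case.

```python
def min_max_normalize(z):
    """Map a sequence of feature values to [0, 1] following Eq. (3)."""
    z_min, z_max = min(z), max(z)
    span = z_max - z_min  # L_z in Eq. (3)
    if span == 0:
        # Assumption: a constant feature carries no information; map it to 0.
        return [0.0 for _ in z]
    return [(value - z_min) / span for value in z]


# Hypothetical heart-rate samples (beats per minute).
normalized = min_max_normalize([62.0, 75.0, 88.0, 70.0])
```

Each feature is normalized separately, so features with different physical units become comparable in the clustering step.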

OFS pattern classification based on FCM algorithm

In this section, the FCM algorithm is first reviewed and then applied to the problem of fuzzy OFS classification.

FCM algorithm

The FCM (Fuzzy C-Means) algorithm is a fuzzy-partition-based clustering method that relies entirely on the interrelationships within the data set and does not require any information about the target classes. The basic idea is to maximize the similarity between the objects within a cluster while minimizing the similarity between clusters. The FCM algorithm is a modified version of the standard (crisp) c-means algorithm. In the crisp algorithm, a data point either belongs or does not belong to a given cluster, whereas the FCM algorithm additionally produces, for each data point, the degrees of membership (in the range [0, 1]) to which the point partially belongs to each of the clusters. The closer the membership grade is to 1, the stronger the membership in the given cluster. Note that, for each data point, the grades of membership to all clusters sum to 1.

Crisp (hard) c-partition and fuzzy (soft) c-partition

Given a dataset \( X = \left\{ {{\mathbf{x}}_{1} ,{\mathbf{x}}_{2} , \ldots ,{\mathbf{x}}_{n} } \right\} \) with \( {\mathbf{x}}_{k} \in {\mathbb{R}}^{p} ,\quad k = 1,2, \ldots ,n \), let \( P\left( X \right) \) be the power set of \( X \) (i.e., the set of all subsets of \( X \)). A crisp c-partition of \( X \) is then a family of sets \( \left\{ {A_{i} \in P\left( X \right)|1 \le i \le c} \right\} \) which satisfies \( \cup_{i = 1}^{c} A_{i} = X \) and \( A_{i} \cap A_{j} = \emptyset \left( {1 \le i \ne j \le c} \right) \). Each \( A_{i} \) is a cluster; thus \( X \) is said to be partitioned into c clusters \( \left\{ {A_{1} , \ldots ,A_{c} } \right\} \).

The crisp partition can be described by the characteristic (or membership) function of the element \( {\mathbf{x}}_{k} \) in \( A_{i} \) as follows:

$$ u_{ik} = \left\{ \begin{gathered} 1,\quad {\mathbf{x}}_{k} \in A_{i} \hfill \\ 0,\quad {\mathbf{x}}_{k} \notin A_{i} \hfill \\ \end{gathered} \right. $$
(4)

where \( {\mathbf{x}}_{k} \in X \), \( A_{i} \in P\left( X \right) \), \( i = 1,2, \ldots ,c \), \( k = 1,2, \ldots ,n \). Evidently \( {\mathbf{x}}_{k} \) belongs to \( A_{i} \) if \( u_{ik} = 1 \). As a result, when the \( u_{ik} \) are given, a unique crisp c-partition of \( X \) can be determined, and vice versa. The \( u_{ik} \) must satisfy the following three conditions:

$$ u_{ik} \in \left\{ {0,1} \right\},\quad 1 \le i \le c,\,1 \le k \le n $$
(5)
$$ \sum\limits_{i = 1}^{c} {u_{ik} } = 1,\quad \forall k \in \left\{ {1,2, \ldots ,n} \right\} $$
(6)
$$ 0 < \sum\limits_{k = 1}^{n} {u_{ik} } < n,\quad \forall i \in \left\{ {1,2, \ldots ,c} \right\} $$
(7)


Eqs. (5) and (6) imply that any \( {\mathbf{x}}_{k} \in X \) belongs to one and only one cluster, while Eq. (7) shows that any \( A_{i} \) contains at least 1 and at most n − 1 data points. All the elements \( u_{ik} \left( {1 \le i \le c,\,1 \le k \le n} \right) \) constitute a (c × n) matrix \( U_{c \times n} \). Thus the crisp c-partition can be defined in matrix form as follows.

Definition 1

Let \( X = \left\{ {{\mathbf{x}}_{1} ,{\mathbf{x}}_{2} , \ldots ,{\mathbf{x}}_{n} } \right\} \) be any set and \( V_{cn} \) be the set of all real-valued c × n matrices \( U = \left[ {u_{ik} } \right] \). If c is an integer with 2 ≤ c < n, the crisp c-partitions of \( X \) form the set:

$$ M_{c} = \left\{ {U \in V_{cn} \mid {\text{Eqs. (5)–(7) are valid}}} \right\} $$
(8)

Unfortunately, in many practical classification problems the boundaries between the clusters are not well defined. Furthermore, the discrete (binary) characteristic function \( u_{ik} \) makes it impossible to perform gradient-descent-based optimization. Therefore, it is necessary to introduce the fuzzy c-partition (Bezdek 1981) defined below.

Definition 2

Let \( X \), \( V_{cn} \) and c be as in Definition 1. The fuzzy c-partitions of \( X \) then form the set:

$$ M_{fc} = \left\{ {U \in V_{cn} \mid u_{ik} \in \left[ {0,1} \right],\quad 1 \le i \le c,\,1 \le k \le n;\;{\text{Eq. (6) is valid}}} \right\} $$
(9)

where \( u_{ik} \) represents the degree of membership of \( {\mathbf{x}}_{k} \) in the cluster \( A_{i} \).
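The two definitions can be made concrete with a short Python sketch that tests whether a given membership matrix belongs to \( M_{c} \) or \( M_{fc} \). The function names, the numerical tolerance `tol`, and the two example matrices are hypothetical and serve only as an illustration.

```python
def is_fuzzy_partition(U, tol=1e-9):
    """Definition 2: all entries in [0, 1] and every column sums to 1 (Eq. 6)."""
    c, n = len(U), len(U[0])
    in_range = all(0.0 <= U[i][k] <= 1.0 for i in range(c) for k in range(n))
    cols_ok = all(abs(sum(U[i][k] for i in range(c)) - 1.0) <= tol
                  for k in range(n))
    return in_range and cols_ok


def is_crisp_partition(U):
    """Definition 1: Eqs. (5)-(7) hold for the c x n matrix U."""
    c, n = len(U), len(U[0])
    binary = all(U[i][k] in (0, 1) for i in range(c) for k in range(n))  # Eq. (5)
    cols_ok = all(sum(U[i][k] for i in range(c)) == 1 for k in range(n))  # Eq. (6)
    rows_ok = all(0 < sum(U[i]) < n for i in range(c))                    # Eq. (7)
    return binary and cols_ok and rows_ok


# Rows are clusters, columns are data points (c = 3, n = 4).
crisp = [[1, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
fuzzy = [[0.8, 0.6, 0.1, 0.0],
         [0.1, 0.3, 0.7, 0.2],
         [0.1, 0.1, 0.2, 0.8]]
```

Every crisp partition matrix also passes the fuzzy test, which reflects the fact that \( M_{c} \) is contained in \( M_{fc} \) as defined above.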

Crisp c-means and fuzzy c-means algorithms

From the above section it is seen that the essence of the c-means algorithm is to find an optimal partition of the dataset from the set \( M_{c} \) (or \( M_{fc} \)). The most common way to measure the quality of a partition is through a predefined objective function. The most widely used objective function for the FCM algorithm is defined as the sum of the squared errors:

$$ J_{w} \left( {U,V} \right) = \sum\limits_{k = 1}^{n} {\sum\limits_{i = 1}^{c} {u_{ik} \left\| {{\mathbf{x}}_{k} - {\mathbf{v}}_{i} } \right\|^{2} } } $$
(10)

where \( U = \left[ {u_{ik} } \right] \in M_{c} \) (or \( M_{fc} \)) and \( V = \left( {{\mathbf{v}}_{1} ,{\mathbf{v}}_{2} , \ldots ,{\mathbf{v}}_{c} } \right) \) with \( {\mathbf{v}}_{i} \) being the center of the cluster \( A_{i} \) defined by:

$$ {\mathbf{v}}_{i} = \frac{{\sum\nolimits_{k = 1}^{n} {u_{ik} {\mathbf{x}}_{k} } }}{{\sum\nolimits_{k = 1}^{n} {u_{ik} } }} .$$
(11)

Obviously \( {\mathbf{v}}_{i} \) is the average of the data points in cluster \( A_{i} \) (in the case of a crisp c-partition) or their membership-weighted average (in the case of a fuzzy c-partition). The task of the FCM algorithm is to find \( U = \left[ {u_{ik} } \right] \in M_{fc} \) and \( V = \left( {{\mathbf{v}}_{1} ,{\mathbf{v}}_{2} , \ldots ,{\mathbf{v}}_{c} } \right),\,{\mathbf{v}}_{i} \in {\mathbb{R}}^{p} \) such that

$$ J_{m} \left( {U,V} \right) = \sum\limits_{k = 1}^{n} {\sum\limits_{i = 1}^{c} {\left( {u_{ik} } \right)^{m} \left\| {{\mathbf{x}}_{k} - {\mathbf{v}}_{i} } \right\|^{2} } } $$
(12)

is minimized, where \( m \in \left( {1,\infty } \right) \) is a weighting exponent.

It can be shown that only when

$$ u_{ik} = \frac{1}{{\sum\nolimits_{j = 1}^{c} {\left( {\frac{{\left\| {{\mathbf{x}}_{k} - {\mathbf{v}}_{i} } \right\|}}{{\left\| {{\mathbf{x}}_{k} - {\mathbf{v}}_{j} } \right\|}}} \right)^{{\frac{2}{m - 1}}} } }},\quad 1 \le i \le c;1 \le k \le n $$
(13)

and

$$ {\mathbf{v}}_{i} = \frac{{\sum\nolimits_{k = 1}^{n} {\left( {u_{ik} } \right)^{m} {\mathbf{x}}_{k} } }}{{\sum\nolimits_{k = 1}^{n} {\left( {u_{ik} } \right)^{m} } }},\quad 1 \le i \le c $$
(14)

\( U = \left[ {u_{ik} } \right] \) and \( V = \left( {{\mathbf{v}}_{1} ,{\mathbf{v}}_{2} , \ldots ,{\mathbf{v}}_{c} } \right) \) locally minimize the objective function \( J_{m} \left( {U,V} \right) \).
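A brief sketch of why Eqs. (13) and (14) arise may be helpful. It follows the standard Lagrange-multiplier argument and assumes that \( d_{ik} = \left\| {{\mathbf{x}}_{k} - {\mathbf{v}}_{i} } \right\| > 0 \) for all i and k; the shorthand \( d_{ik} \) and the multipliers \( \lambda_{k} \) are introduced here for illustration only. For fixed \( V \), the constraint (6) is attached to \( J_{m} \) separately for each data point k:

$$ L_{k} = \sum\limits_{i = 1}^{c} {\left( {u_{ik} } \right)^{m} d_{ik}^{2} } - \lambda_{k} \left( {\sum\limits_{i = 1}^{c} {u_{ik} } - 1} \right) $$

Setting \( \partial L_{k} /\partial u_{ik} = m\left( {u_{ik} } \right)^{m - 1} d_{ik}^{2} - \lambda_{k} = 0 \) gives

$$ u_{ik} = \left( {\frac{{\lambda_{k} }}{{m\,d_{ik}^{2} }}} \right)^{{\frac{1}{m - 1}}} $$

Substituting this expression into Eq. (6) eliminates \( \lambda_{k} \) and yields Eq. (13):

$$ u_{ik} = \frac{{d_{ik}^{{ - \frac{2}{m - 1}}} }}{{\sum\nolimits_{j = 1}^{c} {d_{jk}^{{ - \frac{2}{m - 1}}} } }} = \frac{1}{{\sum\nolimits_{j = 1}^{c} {\left( {\frac{{d_{ik} }}{{d_{jk} }}} \right)^{{\frac{2}{m - 1}}} } }} $$

For fixed \( U \), the condition on the centers is unconstrained:

$$ \frac{{\partial J_{m} }}{{\partial {\mathbf{v}}_{i} }} = - 2\sum\limits_{k = 1}^{n} {\left( {u_{ik} } \right)^{m} \left( {{\mathbf{x}}_{k} - {\mathbf{v}}_{i} } \right)} = 0 $$

which, solved for \( {\mathbf{v}}_{i} \), is exactly Eq. (14).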

FCM computational procedure

The flowchart of the FCM algorithm is shown in Fig. 4; it comprises the following computational steps:

Fig. 4 Flowchart of the FCM algorithm

Step 1 Given the data set \( X = \left\{ {{\mathbf{x}}_{1} ,{\mathbf{x}}_{2} , \ldots ,{\mathbf{x}}_{n} } \right\},\,{\mathbf{x}}_{k} \in {\mathbb{R}}^{p} \), preset the number of clusters \( c \in \left\{ {2,3, \ldots ,n - 1} \right\} \) and the weighting exponent \( m \in \left( {1,\infty } \right) \), and randomly initialize all the elements of the membership degree matrix \( U^{\left( 0 \right)} \in M_{fc} \), where \( M_{fc} \) is the set of fuzzy c-partitions of \( X \).

Step 2 At the l-th iteration, compute the c-means cluster center (vector) by:

$$ {\mathbf{v}}_{i}^{\left( l \right)} = \frac{{\sum\nolimits_{k = 1}^{n} {\left( {u_{ik}^{\left( l \right)} } \right)^{m} {\mathbf{x}}_{k} } }}{{\sum\nolimits_{k = 1}^{n} {\left( {u_{ik}^{\left( l \right)} } \right)^{m} } }},\quad 1 \le i \le c;l = 0,1 \cdots $$
(15)

Step 3 Update \( U^{\left( l \right)} = \left[ {u_{ik}^{\left( l \right)} } \right] \) to \( U^{{\left( {l + 1} \right)}} = \left[ {u_{ik}^{{\left( {l + 1} \right)}} } \right] \) by:

$$ u_{ik}^{{\left( {l + 1} \right)}} = \frac{1}{{\sum\nolimits_{j = 1}^{c} {\left( {\frac{{\left\| {{\mathbf{x}}_{k} - {\mathbf{v}}_{i}^{\left( l \right)} } \right\|}}{{\left\| {{\mathbf{x}}_{k} - {\mathbf{v}}_{j}^{\left( l \right)} } \right\|}}} \right)^{{\frac{2}{m - 1}}} } }},\quad 1 \le i \le c;1 \le k \le n $$
(16)

Step 4 If \( \left\| {U^{{\left( {l + 1} \right)}} - U^{\left( l \right)} } \right\| < \varepsilon \) (where ε is a small positive constant) or the iteration number l reaches its maximum value, stop the algorithm and output the clustering outcome; otherwise, let l = l + 1 and return to Step 2 to continue the iterative procedure.
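Steps 1 to 4 can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in this work: the function name `fcm`, the default values of `eps`, `max_iter` and `seed`, the use of the maximum absolute entry change as the matrix norm in Step 4, and the small floor on distances (to avoid division by zero when a point coincides with a center) are all assumptions.

```python
import math
import random


def fcm(X, c=3, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means on a list of p-dimensional points; returns (U, V)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Step 1: random positive memberships, each column scaled to sum to 1.
    U = [[rng.uniform(0.01, 1.0) for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col_sum = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= col_sum
    for _ in range(max_iter):
        # Step 2: cluster centers from the current memberships, Eq. (15).
        V = []
        for i in range(c):
            w = [U[i][k] ** m for k in range(n)]
            total = sum(w)
            V.append([sum(w[k] * X[k][d] for k in range(n)) / total
                      for d in range(p)])
        # Step 3: membership update, Eq. (16).
        U_new = [[0.0] * n for _ in range(c)]
        for k in range(n):
            # Assumption: distances are floored so the ratio is always defined.
            dist = [max(math.dist(X[k], V[i]), 1e-12) for i in range(c)]
            for i in range(c):
                U_new[i][k] = 1.0 / sum(
                    (dist[i] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c))
        # Step 4: stop when the largest membership change falls below eps.
        change = max(abs(U_new[i][k] - U[i][k])
                     for i in range(c) for k in range(n))
        U = U_new
        if change < eps:
            break
    return U, V


# Hypothetical one-dimensional data with two well-separated groups.
X = [[0.02], [0.05], [0.08], [0.10], [0.90], [0.93], [0.95], [0.98]]
U, V = fcm(X, c=2, m=2.0)
```

Because the initial memberships are random, the order of the returned clusters is arbitrary, and like any FCM run the result is only a local minimum of \( J_{m} \).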

OFS feature selection

The FCM algorithm introduced above produces the center vector of each cluster after the iterative procedure stops. The difference (or dissimilarity) between these center points along each dimension of the feature space may reflect the relative sensitivity of the corresponding feature to variations in OFS. Therefore, in order to reduce the computational complexity of the OFS classification algorithm and in turn to meet the future requirement of real-time OFS classification, we eliminate the less sensitive features from the candidate feature set based on the criterion of differences between cluster centers. The benefit of this method is a reduced computational burden without a discernible sacrifice of classification accuracy. Specifically, we preset the threshold of the inter-cluster distance to 0.1; features that result in an inter-cluster distance <0.1 are considered insensitive to changes in OFS and are eliminated from the candidate feature set. In this way, the most influential features of the OFS are eventually selected.
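The selection rule can be sketched as follows. The text does not state exactly how the inter-cluster distance of a single feature is computed, so this sketch assumes it is the spread (maximum minus minimum) of the cluster-center coordinates along that feature; the function name `select_features` and the center values are made up for illustration.

```python
def select_features(centers, names, threshold=0.1):
    """Keep features whose cluster centers differ by at least `threshold`."""
    kept = []
    for d, name in enumerate(names):
        coords = [center[d] for center in centers]
        # Assumption: per-feature inter-cluster distance = max - min of centers.
        spread = max(coords) - min(coords)
        if spread >= threshold:
            kept.append(name)
    return kept


names = ["HRV2", "LF/HF", "TLI1", "TLI2", "HR"]
# Hypothetical normalized cluster centers (one row per OFS class).
centers = [
    [0.20, 0.41, 0.30, 0.15, 0.25],
    [0.45, 0.44, 0.33, 0.50, 0.55],
    [0.70, 0.47, 0.36, 0.85, 0.80],
]
selected = select_features(centers, names)
```

In this made-up example the centers barely move along LF/HF and TLI1, so only HRV2, TLI2 and HR survive, and clustering can then be repeated in the reduced three-dimensional feature space.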

OFS classification results

As introduced in the Introduction section, the OFS refers to the current cognitive, psychological or mental state of the human operator, whose assessment is determined by many different factors, including the current physiological and psychological condition, the current task demand, ambient environmental stressors, etc. Consequently, in practical applications it is usually rather difficult to quantify the OFS with crisp parameters. A very accurate quantitative estimate of the OFS is normally not required, and the OFS can thus be characterized by a linguistic variable (with a limited number of discrete linguistic values such as Good, Average, and Poor) in the sense of fuzzy set theory. More importantly, it is crucial to identify the redline beyond which the operator would be unable to meet the current task demand, so as to prevent operational incidents or accidents caused by OFS impairment or breakdown. The classification of the OFS into a few discrete categories has therefore become a central problem. In this work we employ the FCM algorithm for OFS classification for the following reasons: (1) The nature of the OFS pattern recognition problem requires the use of fuzzy models. The concept of fuzzy membership degree allows for overlapping OFS classes (i.e., the OFS at a certain time instant may belong to several different classes with different degrees of membership) and hence for more flexible classification; (2) The use of fuzzy models makes the solution of the pattern recognition problem faster, as gradient-based optimization can be conducted on the continuous variables of fuzzy models, whereas a non-fuzzy model normally demands a brute-force search over the whole state space. The OFS classification via the FCM algorithm yields the class labels, the cluster centers, and the respective membership grades. Finally, we can simply select the class with maximal membership as the defuzzified category of the OFS data at a certain moment.
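The final defuzzification step amounts to a column-wise maximum over the membership matrix. A minimal Python sketch is shown below; the function name `defuzzify` and the membership values are hypothetical, and the rows are assumed to be already ordered as Good, Average, Risky (FCM itself returns clusters in arbitrary order).

```python
def defuzzify(U):
    """Return 1-based class labels and maximal memberships per data point."""
    c, n = len(U), len(U[0])
    labels, confidences = [], []
    for k in range(n):
        column = [U[i][k] for i in range(c)]
        best = max(range(c), key=column.__getitem__)
        labels.append(best + 1)          # class labels 1, 2, 3 as in the text
        confidences.append(column[best])
    return labels, confidences


# Hypothetical memberships for three time instants (columns).
U = [[0.90, 0.30, 0.05],
     [0.08, 0.55, 0.15],
     [0.02, 0.15, 0.80]]
labels, confidences = defuzzify(U)
# Time indices whose maximal membership is below 60 %.
uncertain = [k for k, grade in enumerate(confidences) if grade < 0.6]
```

The maximal membership retained alongside each label can serve as a rough indication of how certain the assignment is.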

As mentioned before, from the measured physiological data we derived five OFS features, namely HRV2, LF/HF, TLI1, TLI2, and HR. For two sessions of data (denoted by s1 and s2, each containing a temporal sequence of 126 data points) from a particular subject, the dynamic OFS is classified by the FCM algorithm, based on the features examined, into three distinct states: Good (class label 1), Average (class label 2), and Risky (class label 3). Accordingly, for the FCM algorithm we preset c = 3 and m = 2. First we randomly initialize all entries of the membership degree matrix in the range [0, 1] under the constraint that the three elements in each column sum to 1; then the iterative procedure given in subsection 3.1.3 is carried out step by step.

In the selected examples of the OFS classification results for some subjects, all features are normalized to dimensionless quantities in the range [0, 1]. The horizontal axis of each OFS feature time series (or time history) is the time index with a sampling interval of 1 min, so the unit of the x-axis is min. The following figures illustrate the dynamic discrete (three-level) change in the OFS over time (minute to minute). In other words, based on the measured psychophysiological and performance time-series data, the momentary (or instantaneous) OFS (i.e., whether the operator state at a certain moment is good, average or vulnerable) can be identified by using the method proposed in this paper.

Figure 5 shows the OFS classification result using the 2nd session of data from subject B (dataset B-s2 for short; the same shorthand for designating a dataset of a subject is used in the following). Figure 5a shows the time history of the performance feature TIR, from which it can be observed that the TIR under low workload conditions (i.e., C1 and C9 with the lightest task difficulty 1) is close to 1 (exhibiting almost perfect control task performance) and begins to show a clear decrement under higher workload conditions (e.g., C3 and C7 with mild to highest task difficulty 3, 4, 5). The maximum membership degrees are shown in Fig. 5c. For each data point, the sum of its degrees of membership to the three OFS classes should be 1. It can also be seen from Fig. 5b that the OFS remains satisfactory for most of the experimental time (cf. Fig. 5c, also with the highest membership degrees or confidence). From Fig. 5c, it can be observed that there are only 4 moments when the maximum degrees of membership to the corresponding classes are <60 % and that most maximal degrees of membership to class 1 (“Good” state) are close to 100 % (which implies that the OFS at those moments belongs to “Good” almost with full certainty). After the iteration of the FCM algorithm stops, we also obtain the three cluster centers: 0.9433, 0.5523, and 0.0859, respectively.

Fig. 5 Dataset B-s2: a The time series of the performance feature TIR; b The momentary OFS classification results; c The maximum OFS category membership grades corresponding to (b)

For dataset B-s2, the five candidate features and the corresponding OFS classification result are shown in Fig. 6a, b, respectively. To evaluate the validity and relative accuracy of the OFS classification results, the task-difficulty variable NOV is also shown as a solid line. As mentioned earlier, the NOV is used to quantify the varying levels of task difficulty under different load conditions. From Fig. 6b, it is seen that the OFS classification result captures well the following characteristics of the real OFS variations induced by our experimental design: (1) during the loading phase the OFS is gradually impaired with heightened task load; (2) the OFS gradually recovers to the normal range with the reduction in task load during the unloading phase of our experiment. For datasets D-s2, K-s1, and L-s1, the five candidate psychophysiological features and the corresponding OFS classification results are presented in Figs. 8, 10, and 12, respectively. From these results, it is found that the dynamic OFS classification results for subjects D, K and L are also in good agreement with the real change in workload due to the cyclical loading method used in our experiment. Therefore, the effectiveness of using the five features and the FCM algorithm for OFS classification is demonstrated by the results obtained on all 11 subjects. For datasets B-s2, D-s2, K-s1, and L-s1, the distribution of the sample OFS data points in the reduced feature space (or plane) is shown in Figs. 7a, 9a, 11a, and 13a, respectively. The respective momentary OFS classification results are shown in Figs. 7b, 9b, 11b, and 13b. It is easy to observe that the OFS classification results based on feature selection are consistent with the actual workload variations across task-load conditions in the experiment.

Fig. 6 Dataset B-s2: a The time history of the five physiological features; b The instantaneous OFS classification result

Fig. 7 Dataset B-s2 with feature selection: a The data distribution in the reduced 3D feature space; b The instantaneous OFS classification result

Fig. 8 Dataset D-s2: a The time history of the five features; b The corresponding OFS classification result

Fig. 9 Dataset D-s2 with feature selection: a 3D feature space after eliminating 2 other features; b Data clustering result

Fig. 10 Dataset K-s1: a Time history of the five features; b Instantaneous OFS classification result

Fig. 11 Dataset K-s1 with feature selection: a Data distribution in the reduced 2D feature plane; b Instantaneous OFS classification result

Fig. 12 Dataset L-s1: a The five features; b The data clustering result

Fig. 13 Dataset L-s1 with feature selection: a 3D feature space after eliminating 2 other features; b Clustering result

Results analysis and discussion

In this subsection, we present a comparative analysis of the OFS classification results based on the two datasets measured from each subject. The OFS feature selection and classification results for all 11 subjects are summarized in Table 1, whose last three columns give the correlation between the classified OFS states and the variable NOV and the percentage of consistent OFS classifications before and after feature selection. In Table 1, the symbol √ marks the selected features, and s1 and s2 denote the 1st and 2nd experimental sessions, respectively. From the last row of Table 1, the importance (or sensitivity) ranking of the five candidate features in descending order is TLI2 > HR = HRV2 > TLI1 > LF/HF. Individual differences across subjects clearly exist in terms of their dominant OFS feature patterns. An analysis of the (linear) correlation between the OFS class output (valued in the finite discrete set {1, 2, 3}) and the variable NOV (valued in the finite discrete set {1, 2, 3, 4, 5}) reveals a significant correlation between the OFS and the task difficulty. The OFS classification results from all 11 subjects reflect the real task-load variations due to the cyclical loading paradigm used in the experiment, especially for the 6 subjects B, C, D, J, K and L. Furthermore, the classification consistency rates before and after feature selection show that the feature selection procedure has little effect on the OFS classification accuracy. The benefit brought by feature selection is a reduction of the computational overhead and, accordingly, an enhancement of the real-time performance of the OFS classification method.
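The two summary measures discussed above can be sketched in Python as follows. The sketch assumes that the linear correlation is the Pearson coefficient and that the consistency rate is the percentage of time instants receiving the same label before and after feature selection; the function names and the short label sequences are hypothetical and much shorter than the 126-point sessions.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def consistency_rate(labels_full, labels_reduced):
    """Percentage of time instants with identical labels in both runs."""
    same = sum(a == b for a, b in zip(labels_full, labels_reduced))
    return 100.0 * same / len(labels_full)


# Hypothetical cyclical loading pattern and classification outputs.
nov = [1, 2, 3, 4, 5, 4, 3, 2, 1]          # task difficulty per interval
ofs_full = [1, 1, 2, 3, 3, 3, 2, 1, 1]     # labels using all five features
ofs_reduced = [1, 1, 2, 2, 3, 3, 2, 1, 1]  # labels after feature selection

r = pearson(nov, ofs_full)
rate = consistency_rate(ofs_full, ofs_reduced)
```

Both measures presuppose that the cluster indices have been mapped to the ordered labels 1 (Good) to 3 (Risky) in the same way for the two runs.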

Table 1 The OFS feature selection and classification results for all subjects

In summary, based on the measured objective physiological and performance data, this work investigated three-state (corresponding to Good, Average and Risky operator state) classification of the time-varying OFS. By selecting the proper OFS feature vector tailored to individual subjects, the momentary OFS classification is performed by using the FCM algorithm, which gives the maximal degrees of membership (it can be considered as a measure or estimate of the confidence on the classification result) to the assigned classes in addition to the class labels. The results have shown that the method proposed can lead to satisfactory classification performance in terms of both accuracy and computational efficiency if the proper individual-specific OFS features are selected. Based on a comparative analysis of the results across 11 subjects, significant individual differences were also observed.

Conclusions and future work

In practical OFS assessment situations, information about which discrete category a momentary OFS belongs to is usually desired. Based on a series of electrophysiological data measured from 11 subjects in laboratory-based human–machine cooperative process control experiments, five candidate OFS features were first derived, and the FCM algorithm was then used to perform fuzzy classification of the OFS at each moment (with a temporal resolution of 1 min). The selection of the most important features (i.e., data dimensionality reduction) was further conducted based on the cluster centers obtained by the FCM algorithm. Because the target OFS class is often unknown, it is hard to evaluate the classification accuracy (i.e., the correct classification rate). The cyclical loading method used in our experiments helps alleviate this difficulty. According to the correlation analysis between the OFS classification decision and the variable NOV, which quantitatively characterizes the varying levels of task difficulty manipulated by the cyclical loading experimental paradigm, the physiological-data-based OFS classification results clearly reflect the stepwise change in the OFS with the workload and task-difficulty variations. The results have also confirmed that the selected salient OFS features differ from subject to subject, reflecting the expected individual differences. It was also shown that feature selection improves the computational efficiency of the classification algorithm at no obvious cost in classification accuracy, which may make online real-time classification of massive OFS data possible in the future. Based on accurate and dynamic recognition of the OFS, adaptive task allocation between human and machine (or computer) can be triggered with the aim of enhancing the overall performance of human–machine cooperative systems. In this regard, our simulation work on adaptive control of human–machine cooperative systems has been reported recently (see Yang and Zhang 2012).

As the interactive mechanism between the OFS and electrophysiological measures is generally very complex and unknown, we can only examine it based on a combination of the measured physiological and performance data using a hybrid data approach. Although the fuzzy classification method yielded promising results, further investigation of both the experimental studies and the algorithms is still necessary. For instance, further work along these two lines of research may include: (1) OFS feature extraction: we are considering using Principal Component Analysis (PCA) and other nonlinear analysis methods (Schiff 2011) to better extract the individualized optimal OFS features; (2) Development of novel EEG pattern recognition (PR) methods: in future work we will compare the FCM algorithm with other popular PR methods, such as recurrent neural networks and support vector machines (Qin and Zhang 2012), in terms of multiclass OFS classification performance; (3) Fine-grained and real-time OFS analysis: in the present work only three operator cognitive states are differentiated. As a natural complement to the obtained results, a finer-grained analysis (corresponding to a finer grid in the OFS state space) based on dynamical variations in workload during online process control operations may also be necessary, which calls for the future development of a real-time and accurate OFS estimator for real-life operational applications.