
1 Introduction

Drivers’ mental workload (MWL) plays a crucial role in driving performance. Under excessive MWL, drivers undergo a complex state of fatigue that manifests as a lack of alertness and reduced performance [1]. Consequently, drivers are prone to committing more mistakes as MWL increases. Human error has been reported as the prime cause of around 72% of road accidents per year [2], so increased MWL during driving can produce errors that lead to fatal accidents. Driving is a complex and dynamic activity involving secondary tasks, i.e. simultaneous cognitive, visual and spatial tasks. Diverse secondary tasks performed alongside natural driving, together with different road environments, increase the MWL of drivers, which leads to errors in traffic situations [3]. The alarming number of traffic accidents attributable to increased MWL underlines the need to determine drivers’ MWL efficiently. Several research works have identified mechanisms to measure drivers’ MWL while driving in both simulated and real environments [1, 4, 5]. Methods of measuring MWL can be clustered into three main classes: (i) subjective measures, e.g. NASA Task Load Index (NASA-TLX) and workload profile (WP); (ii) task performance measures, e.g. time to complete a task and reaction time to a secondary task; and (iii) physiological measures, e.g. electroencephalogram (EEG) and heart rate measures [6]. Compared with traditional subjective measures, physiological measures are intrinsically objective and can be gathered during the task without requiring any additional action from the user. Compared with performance measures, physiological measures do not require secondary tasks either and are generally able to predict a mental impairment, whereas performance generally degrades only when the user is already overloaded [7,8,9].
Owing to the wide availability of measuring technology, its portability and its capability of clearly indicating neural activation, this work focuses on physiological measures, specifically EEG. With the increase in data storage and computational power, data-driven machine learning (ML) techniques have become a popular means of quantifying MWL from EEG signals.

Relevant features extracted from the EEG signals are a sine qua non for quantifying MWL. Currently, feature extraction is done using theory-driven manual analytic methods that demand considerable time and effort [10, 11]. The proposed work explores a novel deep learning model for automated feature extraction from EEG signals to reduce the time, effort and complexity. The literature study shows that several ML techniques have been applied to extract features from EEG automatically, but a proper comparative study of traditional and automatic feature extraction methods has not been put forward. In this paper, a deep learning model, the convolutional neural network autoencoder (CNN-AE), is proposed for automatic feature extraction. The automatically extracted features are evaluated with several classification algorithms and compared with a manual feature extraction technique for comparative analysis and feature optimisation.

The rest of the paper is organised as follows: the background of the research domain and several related works are described in Sect. 2. Section 3 contains a detailed description of the experimental setup, data collection, analysis, feature extraction and classification techniques. Results and discussion are provided in Sects. 4 and 5 respectively. Finally, Sect. 6 concludes the paper and discusses the limitations and future directions of this work.

2 Background and Related Work

The literature indicates that MWL is an important aspect of assessing human performance [12], and driving is a complex task associated with several subsidiary tasks. Drivers’ performance has been assessed by quantifying MWL for decades. There are several means of measuring mental workload, but physiological measures are often chosen because the technology has become cheaper and smaller [13]. Physiological measures include respiration, electrocardiac activity, skin conductance, blood pressure, ocular measures, brain measures etc. Recently, Charles and Nixon stated that brain measures in the form of EEG have been used for measuring MWL in most research works [12, 14]. Moreover, several studies have shown a strong correlation between MWL and EEG features in both the time and frequency domains. Features such as the theta and alpha rhythms of the EEG signal over the frontal and parietal sites, respectively, significantly reflect MWL variations [8, 15, 16].

Since EEG signals began to be explored as a tool for measuring MWL, conventional feature extraction techniques, including statistical analysis and signal processing, have been in practice. Ahmed et al. proposed a non-linear approach to feature extraction using fractal dimensions to determine different brain conditions of participants [10]. In classifying motor imagery signals, Sherwani et al. used discrete wavelet transform analysis to extract features from EEG signals [17], whereas Sakai used non-negative matrix factorisation [18]. Several techniques based on time and frequency domain analysis have been proposed for feature extraction [19, 20]. Tzallas et al. proposed a method of extracting features from the power spectral density (PSD) of EEG segments using the Fourier transform for epileptic seizure detection [11]. Individual alpha frequency (IAF) analysis has been adopted in several studies to adjust features of EEG signals [21]. Recently, Wen and Zhang proposed a genetic algorithm based feature search technique for multi-class epilepsy classification [22]. In addition, a considerable number of works have classified MWL from EEG signals by deploying different ML algorithms after extracting features analytically. The ML algorithms found in the literature for classifying MWL include support vector machine (SVM) [23, 24], k-nearest neighbours (k-NN) [23], fuzzy c-means clustering [25], multi-layer perceptron (MLP) [23, 26], etc.

Extracting features automatically from EEG signals is a relatively new field of research. Researchers have deployed a diverse range of deep learning (DL) algorithms, commonly termed autoencoders (AE), to extract features from EEG signals both with and without preprocessing. Recently, Wen et al. used a deep convolutional neural network (CNN) for unsupervised feature learning from EEG signals after applying data normalisation as preprocessing. To assess the performance of their proposed model, several classification algorithms were used to classify epilepsy patients [27]. In several works, authors used a stacked denoising autoencoder (SDAE) [28], long short-term memory (LSTM) [29] and a deep belief network (DBN) [30] for feature extraction after applying PSD for preprocessing. Gou et al. extracted features by deploying a genetic algorithm for classifying epilepsy with a k-NN classifier; in this approach, the discrete wavelet transform (DWT) was used for preprocessing the raw EEG signals [31]. In 2018, Shaha et al. investigated two different DL models, SDAE and LSTM, for extracting features from EEG signals without any preprocessing. Afterwards, an MLP was used to classify the cognitive load of participants who were asked to perform a learning task [23]. Ayata et al. [32] and Almogbel et al. [33] both used a CNN autoencoder (CNN-AE) for extracting features from EEG signals in order to classify arousal and MWL among participants.

Evidently, feature extraction from EEG signals using CNN-AE has been a popular technique among researchers for classification tasks in the epilepsy and MWL domains. Moreover, several classification algorithms have been used to measure the effectiveness of the automatically extracted features. However, to our knowledge, none of these works presented a comparative study of manual feature extraction and automatic feature extraction using DL techniques in terms of workload classification performance, particularly for driving situations.

3 Materials and Methods

3.1 Experimental Setup

The experiment was performed on a route going through urban areas at the periphery of Bologna, Italy. Twenty participants took part in the experiment. All participants were students of the University of Bologna, Italy, with a mean age of 24 (±1.8) years, and had held a driving licence for about 5.9 (±1) years on average. The participants were recruited on a voluntary basis. Only male participants were selected in order to obtain a homogeneous experimental group. The experiment was conducted following the principles defined in the Declaration of Helsinki of 1975 (revised in 2000). Informed consent and authorisation to use the recorded data were signed after a proper description of the experiment was provided to the participants.

Fig. 1. The experimental circuit about 2.5 km long along Bologna roads. The red and yellow line along the route indicates ‘Hard’ and ‘Easy’ segments of the road respectively. The green arrow in the bottom-right corner shows the direction of driving from the starting and finishing point. (Color figure online)

During the experiment, each participant had to drive a car, a Fiat 500L 1.3Mjt with diesel engine and manual transmission, along the route illustrated in Fig. 1. In particular, the route consisted of three laps of a circuit about 2.5 km long, to be covered in daylight. The circuit was designed on the basis of evidence put forward in the scientific literature [34, 35]. In the designed circuit, there were two segments of interest in terms of road complexity and cognitive demand: (i) Easy, a straight secondary road serving a residential area, with an intersection halfway with the right-of-way; (ii) Hard, a major road with two roundabouts, three lanes and high traffic capacity, serving a commercial area. This factor is termed “ROAD” in the following sections. Furthermore, each participant had to drive the circuit twice a day, once during rush-hour traffic and once during off-peak hours. This factor is termed “HOUR”, with two conditions, Normal and Rush, and was designed following the General Plan of Urban Traffic of Bologna, Italy. Table 1 reports the traffic flow intensity considered when designing the two experimental conditions in this study.

Table 1. Traffic flow intensity in the experimental area during a day retrieved from General Plan of Urban Traffic of Bologna, Italy.

At the end of each experimental procedure, consisting of a three-lap driving task performed twice (during rush and normal hours), each participant was properly debriefed. The order of the rush and normal hour conditions was randomised among the participants to avoid any order effect [36]. Each lap contained two segments, easy and hard, referring to road complexity and task difficulty. During the whole experimental protocol, physiological data in terms of brain activity were recorded through EEG. A detailed description of the EEG recording is given in the following sections. Two very recent studies by Di Flumeri et al. followed the same experimental procedure [15, 16].

3.2 Data Collection and Processing

EEG signals were recorded using the digital monitoring BEmicro system provided by EBNeuro, Italy. Twelve EEG channels (FPz, AF3, AF4, F3, Fz, F4, P3, P7, Pz, P4, P8 and POz) were used to collect the EEG signals. The electrodes were placed on the scalp according to the 10–20 International System. The sampling frequency was 256 Hz. All the electrodes were referenced to both earlobes and grounded to the Cz site. Impedance was kept below 20 k\(\Omega \). No signal conditioning was done during the experiment; all EEG signal processing was done offline. Events were recorded along with the EEG signals to associate specific signals with the different road and hour conditions.

Raw EEG signals were cropped with reference to the recorded events: three laps for both Normal and Rush hours, including Easy and Hard conditions. Two ROAD-HOUR driving situations, Easy-Normal and Hard-Rush, were then selected for the classification of MWL, since the literature suggests that these conditions demand low and high MWL respectively [15]. Data from all the laps driven by the participants in the Easy-Normal and Hard-Rush conditions were used for further analysis. EEG signals were sliced into 2 s segments (epoch length) by a sliding window technique with a stride of 0.125 s, which keeps an overlap of 1.875 s between two consecutive epochs. The windowing was performed to obtain a higher number of observations than the number of variables while respecting the condition of stationarity of the EEG signals [37]. Specific procedures of the EEGLAB toolbox [38] were used for slicing the recorded EEG signals. To remove artefacts, e.g. ocular and muscle movements, from the raw EEG signals, the ARTE algorithm by Barua et al. [39] was used.
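The sliding-window epoching can be illustrated with a minimal NumPy sketch. It is not the EEGLAB procedure used in the study; the function name `sliding_epochs` and the synthetic 10 s recording are assumptions made only for illustration, while the sampling rate, epoch length and stride are the values stated in the text.

```python
import numpy as np

def sliding_epochs(signal, fs=256, epoch_s=2.0, stride_s=0.125):
    """Cut a (channels, samples) array into overlapping epochs.

    Returns an array of shape (n_epochs, channels, epoch_samples).
    """
    win = int(round(epoch_s * fs))    # 2 s * 256 Hz = 512 samples
    step = int(round(stride_s * fs))  # 0.125 s * 256 Hz = 32 samples
    n_samples = signal.shape[-1]
    if n_samples < win:
        # Recording shorter than one epoch: nothing to return
        return np.empty((0, signal.shape[0], win), dtype=signal.dtype)
    starts = np.arange(0, n_samples - win + 1, step)
    return np.stack([signal[:, s:s + win] for s in starts])

# Hypothetical input: 12 channels, 10 s of synthetic data (not study data)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((12, 10 * 256))
epochs = sliding_epochs(eeg)
print(epochs.shape)  # (65, 12, 512)
```

With these settings, consecutive epochs share 512 − 32 = 480 samples, so a 10 s recording already yields 65 observations. Such strongly overlapping epochs are highly correlated, which is worth keeping in mind when splitting the data for validation.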

3.3 Feature Extraction

Two types of feature extraction technique, manual and automatic, were investigated in this study. Both methods used the artefact-handled EEG signals. First, a technique following traditional practice, with filtering and signal processing methods, was used; here, 25 relevant features were retrieved. For the other approach, DL was used to extract features from the EEG signals; here, 284 features were initially extracted by the CNN-AE. After analysing the feature importance based on a random forest (RF) classifier, 124 features were used for further tasks. Table 2 gives the number of relevant features extracted by the different techniques, and the two feature extraction techniques are described below.

Table 2. Number of features selected from different techniques.

Traditional Approach. The feature extraction process performed in this work is largely motivated by the work of Di Flumeri et al. [15]. First, the PSD was calculated for each channel of each windowed epoch of the ARTE-cleaned EEG signals mentioned in Sect. 3.2. To calculate the PSD from the EEG signals, Welch’s method [40] with a Blackman-Harris window function was used with a window length equal to the epoch length (2 s, i.e. 0.5 Hz frequency resolution). In particular, only the theta band (5–8 Hz) over the EEG frontal channels and the alpha band (8–11 Hz) over the EEG parietal channels were considered as variables for the mental workload evaluation [8]. The EEG frequency bands of interest were defined on the basis of IAF values estimated with the algorithm developed by Corcoran et al. [21]. Figure 2 illustrates the generation of the final feature vector for each observation following the aforementioned sequence of steps.
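The PSD and band-power step can be sketched as follows, under several assumptions. When the window is as long as the epoch, Welch’s method reduces to a single windowed periodogram, which is what the hypothetical helper `epoch_psd` computes. The sketch uses the fixed 5–8 Hz and 8–11 Hz limits quoted above rather than IAF-adjusted bands, and the two-channel test signal is synthetic. It illustrates the technique and is not the authors’ implementation.

```python
import numpy as np

FS = 256  # sampling frequency in Hz (Sect. 3.2)

def blackman_harris(n):
    """4-term Blackman-Harris window (periodic form) of length n."""
    a = (0.35875, 0.48829, 0.14128, 0.01168)
    k = 2.0 * np.pi * np.arange(n) / n
    return a[0] - a[1] * np.cos(k) + a[2] * np.cos(2 * k) - a[3] * np.cos(3 * k)

def epoch_psd(epoch, fs=FS):
    """One-sided PSD of a (channels, samples) epoch, even sample count.

    A single segment spanning the whole epoch: 512 samples at 256 Hz
    give frequency bins spaced 0.5 Hz apart.
    """
    n = epoch.shape[-1]
    w = blackman_harris(n)
    x = epoch - epoch.mean(axis=-1, keepdims=True)  # remove the DC offset
    spec = np.fft.rfft(x * w, axis=-1)
    psd = (np.abs(spec) ** 2) / (fs * np.sum(w ** 2))
    psd[..., 1:-1] *= 2.0  # fold negative frequencies into the one-sided PSD
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

def band_power(freqs, psd, lo, hi):
    """Mean PSD in [lo, hi) Hz for every channel."""
    mask = (freqs >= lo) & (freqs < hi)
    return psd[..., mask].mean(axis=-1)

# Synthetic 2 s epoch: channel 0 oscillates at 6 Hz, channel 1 at 10 Hz
t = np.arange(2 * FS) / FS
epoch = np.vstack([np.sin(2 * np.pi * 6 * t), np.sin(2 * np.pi * 10 * t)])
freqs, psd = epoch_psd(epoch)
theta = band_power(freqs, psd, 5.0, 8.0)   # theta band limits from the text
alpha = band_power(freqs, psd, 8.0, 11.0)  # alpha band limits from the text
```

In an analysis like the one described, `theta` would be read from the frontal channels and `alpha` from the parietal channels, with the band limits shifted according to each participant’s IAF.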

Fig. 2. Steps in traditional feature extraction technique.

Fig. 3. Network architecture of the CNN-AE for feature extraction.

Deep Learning Approach. The CNN-AE architecture used for automatic feature extraction is shown in Fig. 3. The network is divided into two parts: (i) an encoder and (ii) a decoder. The encoder comprises a number of convolutional layers paired with pooling layers and finds deep hidden features in the original signal. The decoder, on the other hand, uses several deconvolutional layers to reconstruct the signal from the features. The quality of the signal reconstructed by the decoder is used to assess the performance of the encoder, and the whole model is trained on the basis of this compression and reconstruction. The encoder developed in this study consists of four convolutional layers and four max-pooling layers. The decoder is designed in the inverse order of the encoder; it contains five convolutional layers and four upsampling layers that perform the depooling. Zero padding, batch normalisation and the ReLU activation function were used in each of the layers. The CNN-AE was trained using RMSprop optimisation with a learning rate of 0.002 and binary cross-entropy as the loss function. After a successful learning procedure, the CNN-AE extracted 284 features from the experimental EEG signals.
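The way the signal length evolves through such an encoder and decoder can be traced with a few lines of plain Python. The sketch below only does the bookkeeping of temporal lengths and does not build or train a network. It assumes stride-1 zero-padded convolutions (which keep the length), a pooling and upsampling factor of 2, and a 512-sample input epoch; none of these values is stated in the text, and reading the fifth decoder convolution as a final output layer is likewise an assumption.

```python
def encoder_lengths(n_samples, n_blocks=4, pool=2):
    """Temporal length after each conv + max-pool block of the encoder."""
    lengths = [n_samples]
    for _ in range(n_blocks):
        # Zero-padded, stride-1 convolution: length unchanged.
        # Max-pooling without padding: floor division by the pool size.
        lengths.append(lengths[-1] // pool)
    return lengths

def decoder_lengths(n_code, n_blocks=4, up=2):
    """Temporal length after each conv + upsampling block of the decoder."""
    lengths = [n_code]
    for _ in range(n_blocks):
        # Convolution keeps the length, upsampling multiplies it.
        lengths.append(lengths[-1] * up)
    # Assumed final zero-padded convolution (fifth one): length unchanged.
    lengths.append(lengths[-1])
    return lengths

enc = encoder_lengths(512)      # one 2 s epoch at 256 Hz (assumed input)
dec = decoder_lengths(enc[-1])
print(enc)  # [512, 256, 128, 64, 32]
print(dec)  # [32, 64, 128, 256, 512, 512]

# An input length that is not a multiple of 2**4 is not recovered exactly:
print(decoder_lengths(encoder_lengths(500)[-1])[-1])  # 496
```

Under these assumptions the bottleneck holds 32 time steps per filter, so the number of extracted features would be a multiple of 32. Since 284 is not, the input layout or layer configuration actually used must differ from the one assumed here.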

3.4 Classification of MWL

After extracting features with the two methods, several classifiers were deployed to classify MWL. Table 3 lists the classifiers and the values of their prominent parameters.

Table 3. Parameters used in different classifiers.
Fig. 4. Variation in classification accuracy with respect to the change of threshold on feature importance values. Highest average accuracy 87.30% was found for 0.003 (point marked with red dot) as threshold on feature selection. (Color figure online)

Before classifying MWL, feature importance was calculated using an RF classifier to further reduce the dimension of the feature set. Different numbers of features were selected from the 284 features depending on the threshold value and were used to classify MWL with an SVM classifier on the training data set. The accuracy was observed to vary with the threshold. Finally, imposing a threshold of 0.003 on feature importance yielded 124 relevant features, which reduced the feature set by more than half while increasing accuracy. For both classifiers (RF and SVM), the parameters given in Table 3 were used. Figure 4 illustrates the change in accuracy for different feature importance thresholds used to select features for classification.
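The thresholding step can be sketched as below. The importance vector is taken as given, for example from the `feature_importances_` attribute of a fitted scikit-learn `RandomForestClassifier`; the text does not state which implementation was used. The helper names, the choice of `>=` as the comparison and the synthetic importances are assumptions for illustration only.

```python
import numpy as np

def select_by_importance(X, importances, threshold=0.003):
    """Keep the columns of X whose importance reaches the threshold."""
    importances = np.asarray(importances, dtype=float)
    keep = importances >= threshold
    return X[:, keep], np.flatnonzero(keep)

def sweep_thresholds(importances, thresholds):
    """Number of features retained for each candidate threshold."""
    importances = np.asarray(importances, dtype=float)
    return {float(t): int(np.sum(importances >= t)) for t in thresholds}

# Synthetic stand-in for RF importances over 284 features (sums to 1)
rng = np.random.default_rng(0)
importances = rng.dirichlet(np.ones(284))
X = rng.standard_normal((100, 284))

X_sel, kept = select_by_importance(X, importances, threshold=0.003)
counts = sweep_thresholds(importances, [0.001, 0.002, 0.003, 0.004, 0.005])
```

If the importances are normalised to sum to one, a uniform share over 284 features is 1/284 ≈ 0.0035, so a threshold of 0.003 sits just below the average importance. Each candidate in `counts` would then be scored with the SVM on the training set, as described above, to choose the threshold.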

Table 4. Average performance measures of classifiers applied on traditionally extracted features.
Table 5. Average performance measures of classifiers applied on features extracted by CNN-AE.

4 Result and Evaluation

All the observations with relevant features from the EEG signals were divided into training and testing sets containing 80% and 20% of the data respectively. The training set was used to train the model and the testing set was used to validate the accuracy of MWL classification. Several common classifiers, listed in Table 3, were deployed to verify the effectiveness of the features obtained by the traditional method and by the CNN-AE. To measure classification performance, the average overall accuracy, balanced classification rate (BCR, or balanced accuracy) and \(F_1\) score were calculated for each classifier and each feature extraction method. Tables 4 and 5 contain the performance measures of classification for traditionally extracted features and CNN-AE extracted features respectively. Features extracted by the CNN-AE produced better performance measures for all the classifiers. In particular, SVM classified MWL with the highest overall accuracy of 87%.

To investigate the performance of the classifiers further, specificity and sensitivity were calculated and are illustrated in Fig. 5. The figure clearly shows that both scores were higher for CNN-AE features than for traditionally extracted features.
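For reference, the performance measures named above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions. Treating high MWL as the positive class is an assumption, and the counts are hypothetical and unrelated to the reported results.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard measures from a binary confusion matrix.

    Assumes every denominator is non-zero, i.e. both classes occur
    and at least one observation is predicted positive.
    """
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    bcr = 0.5 * (sensitivity + specificity)  # balanced accuracy
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "bcr": bcr,
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }

# Hypothetical counts for an imbalanced test set (100 positive, 50 negative)
scores = binary_metrics(tp=80, fn=20, tn=45, fp=5)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

On imbalanced data, the overall accuracy is pulled towards the majority class, whereas BCR weights sensitivity and specificity equally; this is why BCR is reported alongside accuracy.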

Fig. 5. MWL classification results in terms of Sensitivity and Specificity.

Fig. 6. AUC-ROC curves for different classifiers with features extracted by traditional methods and CNN-AE where models were trained using 10-fold cross-validation.

Fig. 7. AUC-ROC curves for different classifiers with features extracted by traditional methods and CNN-AE where models were trained using leave-one-out (participant) cross-validation.

To establish the validity of the proposed model, 10-fold and leave-one-participant-out cross-validations were performed. The average AUC-ROC curves over the cross-validations are illustrated in Figs. 6 and 7, where the SVM classifier has the highest AUC in both. For 10-fold cross-validation, all the observations were divided into 10 segments; in each iteration, one segment was used for testing a model built on the other segments as the training set. In leave-one-participant-out cross-validation, for each participant in the experiment, the observations from that participant were used for testing a model built on the observations from the other participants as training data. For both cross-validations, the AUC values obtained with CNN-AE extracted features are notably higher than those obtained with traditionally extracted features.
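The two validation schemes differ only in how the observation indices are split, which the following NumPy sketch makes explicit. The generator names are hypothetical, and the random shuffling in `k_fold` is an assumption, since the text does not say how the 10 segments were formed.

```python
import numpy as np

def k_fold(n_obs, k=10, seed=0):
    """Yield (train_idx, test_idx) for k folds over shuffled observations."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_obs), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def leave_one_participant_out(participant_ids):
    """Yield (participant, train_idx, test_idx), one split per participant."""
    ids = np.asarray(participant_ids)
    for pid in np.unique(ids):
        yield pid, np.flatnonzero(ids != pid), np.flatnonzero(ids == pid)

# Hypothetical labels: which participant produced each of six epochs
ids = np.array([1, 1, 2, 2, 2, 3])
for pid, train_idx, test_idx in leave_one_participant_out(ids):
    print(pid, train_idx, test_idx)
```

Because the epochs overlap heavily, a random 10-fold split can place nearly identical epochs from the same participant in both the training and the testing folds. The participant-wise split avoids this and is therefore the stricter test of generalisation to unseen drivers.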

5 Discussion

In this study, traditional and CNN-AE based EEG feature extraction methods were comparatively investigated using four well-established classifiers: SVM, k-NN, RF and MLP. Of the two feature extraction techniques, CNN-AE led the classifiers to achieve higher classification accuracy and better values of the other performance measures. Initially, the number of features extracted by the CNN-AE was substantially higher than the number extracted through traditional methods, but with the feature selection mechanism the feature set was reduced to less than half (from 284 to 124 features), resulting in improved accuracy measures for all classifiers. The performance measures presented in Sect. 4 show that SVM achieves higher accuracy than the other classifiers in classifying MWL from EEG signals, irrespective of the feature extraction technique.

Many factors affect the performance of the classifier models used for MWL classification in related works. Generally, if there is a clear correlation between the characteristics of the data and the class labels, the deployed classifier achieves higher prediction accuracy. However, when classifying the MWL of drivers driving in real life or in a simulator, the probability of noise being recorded with the EEG signals is quite high owing to eye movement, power signals, miscellaneous interference etc. In practice, such noise is termed artefacts. In traditional feature extraction methods, removing these artefacts from the data, along with handling inter- and intra-individual variability, requires considerable manual effort and processing. A characteristic of deep learning is that its layers can find hidden features in the data that are responsible for the assigned labels. The results of this study indicate that CNN-AE, and potentially other deep learning mechanisms, can produce a feature set from EEG signals that is equivalent to or better than the manually extracted feature set with less effort, leaving aside the preprocessing and artefact handling tasks. Initially, the proposed CNN-AE produced an extensive set of features. An exploratory investigation of feature selection, using an RF classifier and imposing a threshold on feature importance, produced a considerably shorter feature vector with higher classification accuracy. Further investigation of feature selection in this domain could produce a more robust set of relevant features.

The data recorded under the experimental protocol were balanced in terms of class labels: each participant drove once under each ROAD and HOUR condition, and the recorded EEG signals formed the initial labelled, balanced data. For further investigation, the raw EEG signals were segmented into overlapping epochs to increase the number of observations while keeping the core characteristics of the data. This operation facilitated this data-driven study by increasing the amount of data, at the cost of class balance. Owing to the uneven driving duration among the participants, the number of windowed epochs varied from participant to participant as well as across study factors, which made the data imbalanced. The performance measures reported in Sect. 4 were therefore chosen from the measures recommended for imbalanced data by Tharwat [41].

6 Conclusion

This paper presents a new hybrid approach for automatic feature extraction from EEG signals and demonstrates it with MWL classification. The main contribution of this paper is threefold: (i) a CNN method is used to extract features automatically from artefact-handled EEG signals, (ii) RF is used for feature selection and (iii) several machine learning algorithms are used to classify drivers’ mental workload on the CNN-based feature sets. This new hybrid approach is compared with the traditional feature extraction approach using four machine learning classifiers, i.e. SVM, k-NN, RF and MLP. According to the outcome of both the 10-fold and the leave-one-participant-out cross-validation, SVM outperforms the other classifiers with CNN-AE extracted features. One advantage of CNN-AE for feature extraction is that it works directly on the artefact-handled data sets, i.e. additional signal processing, individual feature extraction etc. are not needed, thus reducing the time spent on manual work. More experimental work with a large and heterogeneous data set is planned as future work to increase the performance of the proposed method and to extract features directly from raw EEG signals. Moreover, classifying MWL in real time using the proposed approach and suggesting external actions to mitigate road casualties is the final goal of the planned research.