Introduction

Fatigue is a complex mental state, often accompanied by drowsiness (Kar et al. 2010) and usually manifesting as a lack of vigilance and reduced attention. It is one of the major causes of motor vehicle accidents (Sahayadhas et al. 2012; Khushaba et al. 2010), which bring serious physical injuries, psychological distress, and significant economic loss to drivers and their families. Driver fatigue is reported to account for 35–45% of all vehicle accidents (Idogawa 2006). Therefore, detecting drivers' cognitive state during driving, especially the fatigue state, has great potential for reducing vehicle accidents.

Many methods for driver fatigue detection are based on physiological signals, such as electroencephalography (EEG), electrooculogram (EOG), electromyogram (EMG), and electrocardiogram (ECG) (Khushaba et al. 2010; Kong et al. 2017; Hu and Zheng 2009; Fu and Wang 2014; Ahn et al. 2016; Lin et al. 2014), or their combination. Most studies report a strong correlation between these signals and drivers' cognitive state, so they can be used to detect driver fatigue accurately. For instance, some studies (Khushaba et al. 2010; Brookhuis and De 1993; Jap et al. 2009) report that changes in cognitive state are usually accompanied by significant changes in the EEG frequency bands, such as delta (0.5–3.5 Hz), theta (4–7 Hz), alpha (8–12 Hz), and beta (13–30 Hz). Eye movement and closure are also considered two important indicators of driver fatigue (Kar et al. 2010): when a person is fatigued, eye movement decreases and blink rate increases (Lal and Craig 2001). In addition, it is reported that heart rate variability, measured from the ECG power spectrum, can be used to distinguish fatigue from other cognitive states (Tsuchida et al. 2009); heart rate variability decreases when a person is in a fatigued or drowsy state (Jeong et al. 2007).

EEG records the electrical potentials generated by the nerve cells of the cerebral cortex (Liang et al. 2010), provides rich sample data with high temporal resolution (Zeng et al. 2017; Stein et al. 2013), and contains abundant physiological and psychological information. EEG-based methods are considered the most convenient and effective among physiology-based methods. In general, most EEG-based methods for driver fatigue detection utilize waveform information, power spectra, nonlinear analysis, and modeling techniques (Kong et al. 2017; Chen et al. 2017). For example, Correa et al. (2014) developed an automatic method to detect the drowsiness stage in EEG records using time, spectral, and wavelet analysis, and obtained 87.4 and 83.6% accuracy in detecting alertness and drowsiness, respectively. Khushaba et al. (2010) proposed a fuzzy mutual-information (MI)-based wavelet packet transform feature-extraction method to predict drowsiness levels. Pal et al. (2008) found that the power spectrum of the alpha band in the EEG is related to the loss of alertness. Similar work is reported in Jap et al. (2009) and Lin et al. (2010). Mu et al. (2017) used four types of entropy, namely spectral entropy, approximate entropy, sample entropy, and fuzzy entropy, to extract EEG features for driver fatigue detection. In addition, some modeling techniques have been used to detect driver state. Hu (2017) developed an AdaBoost classifier for automated detection of driver fatigue from EEG signals. Wali et al. (2013) fused discrete wavelet packet transformation (DWPT) and fast Fourier transformation (FFT) to classify the driver distraction level, achieving up to 85% classification accuracy. In Zhao et al. (2010), a KPCA-SVM classifier was employed to differentiate the normal and mental fatigue states, achieving a high accuracy of 98.7%. Fu et al. (2016) presented an HMM (hidden Markov model)-based dynamic fatigue detection model to estimate driver fatigue, and obtained 92.5% accuracy.

Despite these advances, robust and accurate detection of driver cognitive performance from EEG remains challenging. First, it is well known that EEG is highly non-stationary and varies over time within a single subject (intra-subject) and between different subjects (inter-subject) (Thodoroff et al. 2016). It is challenging to identify general patterns from non-stationary EEG signals. Second, the above-mentioned methods generally separate the detection process into two steps, feature extraction and classification; feature extraction usually requires hand-crafted operations, which may lose useful information in the EEG (Tang et al. 2017). Third, the low signal-to-noise ratio (SNR) of EEG also impacts detection accuracy.

Deep learning (DL) (LeCun et al. 2015) has been applied in various domains, such as computer vision, speech recognition, and natural language processing. The convolutional neural network (CNN) represents one of the most significant advances in DL due to its success in many challenging classification tasks (He et al. 2016; Abdel-Hamid et al. 2014; Domhan et al. 2015). CNNs are feed-forward neural networks, usually composed of feature-extraction and feature-mapping layers, that learn local patterns in data by convolution. A distinctive property of CNNs is their suitability for end-to-end learning without any a priori feature selection (Schirrmeister et al. 2017), which avoids information loss and is especially fit for low-SNR, task-irrelevant raw EEG data. Hence, many EEG-based studies and applications have emerged in recent years, such as P300 feature detection (Cecotti and Graser 2011; Puanhvuan et al. 2017), motor imagery classification (Sakhavi et al. 2015), seizure detection (Page et al. 2016; Raghu et al. 2017), cognitive therapy in depressive disorder (Bornas et al. 2015; Schoenberg and Speckens 2015), drowsy and alert state prediction (Hajinoroozi et al. 2016), momentary mental workload recognition and classification (Zhang et al. 2017, 2017), an emergent visual attention model for identifying possible causes of autism (Gravier et al. 2016), and brain-computer interface communication (Lawhern et al. 2016; Manor and Geva 2015).

In this work, we construct two novel classifiers, EEG-Conv and EEG-Conv-R, where EEG-Conv is based on the traditional CNN and EEG-Conv-R combines the CNN with recent deep residual learning. We study the prediction performance of the proposed classifiers in both intra- and inter-subject settings on raw EEG data. We also compare EEG-Conv and EEG-Conv-R with the support vector machine (SVM) and LSTM (long short-term memory), an existing deep learning method.

The rest of the paper is organized as follows: the “Materials” section introduces the experiment design, EEG data acquisition, and data preprocessing. The “Methods” section provides a detailed description of the proposed classifiers, including the design of EEG-Conv and EEG-Conv-R. The results and discussion of the experiments are presented in the “Results and discussion” section. Finally, the conclusion is given in the “Conclusion” section.

Materials

Driving simulation platform

We construct a driving simulation platform, as shown in Fig. 1. The platform consists of: 1) simulated driving devices, including a racing seat, steering wheel, liquid crystal display (LCD), loudspeaker, and projector; 2) physiological signal collection instruments, including a Neuroscan system with 64 electrodes for EEG collection, a camera for eye-blink detection, and a heart-rate sensor for counting the heart rate. The physiological signals are acquired simultaneously. (Here, eye-blink and heart-rate detection is used to rank mental states. For instance, if the number of eye blinks per minute is fewer than 20 and the heart rate is greater than 70 beats per minute, we define it as the sober state, which we call ‘TAV’. Correspondingly, if the number of eye blinks exceeds 30 per minute and the heart rate is less than 60 beats per minute, we define it as the fatigue state, which we call ‘DROWS’.); 3) one computer for data recording, on which the driving simulation software Need For Speed: Shift 2 Unleashed (NFS-S2U) and the ‘WorldRecord’ software for recording all driving parameters are installed; 4) another computer for presenting alert tasks such as image and sound stimuli, and for processing physiological signals.

Fig. 1

Driving simulation experiment platform

Experiment protocol

Ten healthy subjects aged 23–25 participate in the experiment for driving data collection. All of them hold C1 (Manual Transmission, MT) driving licenses and know the whole experiment procedure in advance. They are asked to ensure adequate sleep the day before the experiment, are told not to drink stimulant or inhibitory beverages such as coffee, alcohol, or tea, and are asked to avoid strenuous exercise on the experiment day. The study is approved by a local ethics committee, and all participants voluntarily sign a written consent form before the experiment. The experiment is performed in a quiet, isolated room between 18:00 and 21:00. In addition, facial expression is recorded by the camera in front of the driver, and the heart rate is collected by the corresponding electrode attached to the subject’s right wrist.

Experiment setup

The experiment consists of two stages, the practice stage and the experimental stage (Kong et al. 2017), performed on two successive days. The aim of practice is to make sure all subjects become familiar with the simulated driving environment and are able to respond correctly to various stimuli. After training, every subject is asked to drive the specified track for two laps and must not deviate from the track, to ensure safe driving.

When collecting EEG data, we simultaneously record the subject’s number of eye blinks per minute. Combined with the heart rate collected by the EKG, we divide the mental states into 8 phases: WUP, PERFO, TAV3, TAV1, TAV5, TAV2, TAV4, and DROWS (Kong et al. 2017), as shown in Table 1. The schematic diagram of the experimental procedure is shown in Fig. 2. WUP corresponds to the incipient stage of the experiment, in which the subject drives as in the practice stage for about 10 min without any stimuli. PERFO is similar to WUP, but requires the subject to finish the track in a time 2% shorter than the baseline time of the WUP state. From TAV1 to TAV5, the subject is given video and auditory tasks (which we call alert and vigilance stimuli, respectively) to increase the subject’s workload, and responds to these stimuli by pressing the ‘RIGHT’ or ‘LEFT’ button on the steering wheel. The ‘RIGHT’ button is for the video task with alert stimuli, and the ‘LEFT’ button for the auditory task with vigilance stimuli. That is, under alert stimuli a traffic jam is simulated, and the subject should press the ‘RIGHT’ button with the right index finger when an ‘X’ appears on the screen 1 m ahead of the subject. Under vigilance stimuli, the subject should press the ‘LEFT’ button with the left index finger when two consecutive beeps sound. This ensures that the subjects remain alert or vigilant, so EEG signals of the wake condition can be collected. The difference among the TAV states is the stimulus frequency: from TAV1 to TAV5, the stimulus intervals are 9800–10,200, 7700–8100, 5900–6300, 4100–4500 and 2300–2700 ms, respectively (Kong et al. 2017). DROWS is a boring driving condition at a speed of about 60 km/h without any extra video or auditory stimuli, in which the subject is apt to be immersed in drowsiness.

Table 1 Eye-blink and heart-rate counts for the eight mental states

In the present experiment, TAV3 is the first stage in which video and sound stimuli appear, so the subject pays the highest attention to these tasks and is in the most sober state. DROWS is the last stage of the experiment. After nearly 2 h of driving, the subject is prone to fatigue. Moreover, driving at a constant speed of 60 km/h is monotonous and more likely to induce fatigue. Also, as shown in Table 1, the obvious difference in eye blinks and heart rate between TAV3 and DROWS confirms our experiment design. Therefore, in this paper, the collected EEG data of TAV3 and DROWS is selected for the prediction of driver fatigue.

Fig. 2

Schematic diagram of experiment procedure

EEG data acquisition

EEG is collected by a gUSBamp amplifier with 16 channels (g.Tec Medical Engineering GmbH), continuously sampled at 256 Hz with impedance kept below 5 k\(\Omega\). The electrodes are deployed in accordance with the international 10/20 standard. Fifteen channels, Fz, Pz, Oz, Fp1, Fp2, F7, F3, F4, F8, C3, C4, P7, P3, P4, and P8, are used to record EEG signals. An EKG electrode is placed on the chest for recording the heart rate, and an additional electrode is attached to the left ear lobe as the reference.

EEG data preprocessing

First, all trials containing ocular artifacts are discarded using independent component analysis (ICA) (Jung et al. 2000), and EEG data between 1 and 40 Hz is retained by a band-pass filter. Second, we convert the EEG data of the 15 channels (excluding the EKG channel, which records ECG data) into the format {SP*CH*TR}, where SP refers to the sample rate, which is 256 Hz in the experiment, CH is the corresponding sample channel, and TR is the event. Because this format does not fit the DL structure well, we segment the EEG data into 0.5-second (0.5 s) epochs. Since the sample rate is 256 Hz and there are 15 channels, each epoch can be represented as a \(15\times 128\) matrix. We label DROWS epochs ‘0’ and TAV3 epochs ‘1’. In total, 28,176 epochs are obtained, including 18,672 DROWS epochs and 9504 TAV3 epochs, as shown in Table 2. Our purpose is to train a classifier on these epochs to better predict cognitive performance. The last preprocessing step is normalization of the EEG data to eliminate inter-subject variability. We adopt the z-score function for normalization, denoted by:

$$X^* = \frac{X-\mu }{\delta }$$
(1)

where X is the amplitude of the raw EEG data and \(X^{*}\) is the value after normalization. \(\mu\) and \(\delta\) are the mean and standard deviation of all EEG data, respectively.
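To make these steps concrete, here is a minimal NumPy sketch of the epoching and z-score normalization (Eq. 1), assuming a recording that has already been ICA-cleaned and band-pass filtered and is stored as a 15 × n_samples array; the function name and array layout are our own illustration, not the authors' code.

```python
import numpy as np

def preprocess(eeg, fs=256, epoch_sec=0.5):
    """Segment a filtered (15, n_samples) recording into 0.5-s epochs
    and apply z-score normalization (Eq. 1). Returns (n_epochs, 15, 128)."""
    win = int(fs * epoch_sec)                    # 128 samples per epoch
    n_epochs = eeg.shape[1] // win
    epochs = eeg[:, :n_epochs * win].reshape(15, n_epochs, win)
    epochs = np.transpose(epochs, (1, 0, 2))     # -> (n_epochs, 15, 128)
    mu, sigma = epochs.mean(), epochs.std()      # over all EEG data
    return (epochs - mu) / sigma
```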

Table 2 DROWS and TAV3 epochs per subject

Methods

Construction of EEG-Conv classifier

Fig. 3

The overall architecture of EEG-Conv classifier

The architecture of our CNN-based EEG classifier (hereinafter referred to as EEG-Conv) is illustrated in Fig. 3. It contains eight layers: the input layer, three convolutional layers, a pooling layer, an LRN (local response normalization) layer, a fully connected layer, and the output layer.

Conv1 The input data is a matrix of \(15\times 128\). The first convolutional layer convolves the input with a kernel of \(5\times 5\). The stride is 1, and the bias is set to 0. After convolution, 32 feature maps of size \(11\times 124\) are generated.

LRN2 A local response normalization layer after Conv1 normalizes the preceding data flow locally. This type of normalization implements a form of lateral inhibition, inspired by a phenomenon observed in real neurons, creating competition among large activations of neuron outputs computed with different kernels. In the EEG-Conv classifier, we employ the local response normalization layer to inhibit the outputs of the activation functions and highlight the peak values of the corresponding local regions. In the EEG domain, these highlighted high-frequency features are more important for detecting driver cognitive states.

Conv3 The second convolutional layer convolves data generated by the previous layer with kernel of \(3\times 3\). The stride is 1 and the initial bias is set to 0. After this convolution, 64 feature maps of size \(9 \times 122\) are generated.

Conv4 The third convolutional layer convolves data generated by the previous layer with kernel of \(3\times 3\). The stride is 1 and the initial bias is set to 0. After this convolution, 32 feature maps of size \(7 \times 120\) are generated.

Pool5 A max pooling layer is placed after the third convolutional layer. The kernel size in Pool5 is \(2 \times 2\) and the stride is 2. The pooling layer lowers the computational burden by reducing the number of connections between the hidden layers in EEG-Conv. By stacking three convolutional layers and a pooling layer, a relatively concise EEG signal feature representation is extracted.

FC6 The fully connected layer performs high-level reasoning on the EEG feature representation. FC6 takes all neurons in Pool5 and connects them to every neuron of the current layer to generate global semantics of the EEG signals. FC6 is composed of 2048 neurons. The dropout strategy is applied to prevent overfitting: the output of each hidden neuron in FC6 is set to 0 with probability 0.5. Dropout forces EEG-Conv to learn more robust EEG features.

Out7 Logistic regression is placed on top of the previous hidden layers as the output layer of the EEG-Conv classifier. A single logistic regression layer is itself a linear, probabilistic classifier. Driver cognitive states are detected by projecting data points onto a set of hyperplanes, the distances to which reflect class-membership probabilities. Out7 is parameterized by a weight matrix \({W_O}\) and a bias vector \({b_O}\). The logistic regression layer is calculated by:

$$P({Y_{pred}} = i \mid {O_{HL}},{W_O},{b_O}) = \mathrm{softmax}_i({W_O}{O_{HL}} + {b_O}) = \frac{e^{{W_{Oi}}{O_{HL}} + {b_{Oi}}}}{\sum_j e^{{W_{Oj}}{O_{HL}} + {b_{Oj}}}}$$
(2)

where \({O_{HL}}\) is the output of the last hidden layer, i.e., the input to Out7. The output of the EEG-Conv classifier is then generated by taking the argmax of the vector whose i-th element is \(P({Y_{pred}} = i|{O_{HL}},{W_O},{b_O})\). It can be calculated by Eq. 3, where the output result is denoted by \(Y \in \{ 0,1\}\).

$$Y = \arg {\max _i}P({Y_{pred}} = i|{O_{HL}},{W_O},{b_O})$$
(3)
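As a toy illustration of Eqs. 2 and 3 (not the authors' implementation), the following PyTorch snippet applies a softmax to the output layer and takes the argmax; the tensor values are hypothetical.

```python
import torch
import torch.nn.functional as F

o_hl = torch.randn(2048)                     # output of the last hidden layer (FC6)
W_O, b_O = torch.randn(2, 2048), torch.randn(2)
probs = F.softmax(W_O @ o_hl + b_O, dim=0)   # Eq. 2: P(Y_pred = i | O_HL, W_O, b_O)
y = int(torch.argmax(probs))                 # Eq. 3: 0 = DROWS, 1 = TAV3
```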

Activation function Each neuron in a deep CNN combines a linear part (an affine transformation) and a nonlinear part (an activation function). Activation functions selected according to EEG signal domain knowledge are very important for the performance of the network. An activation function should satisfy the following requirements: nonlinearity, saturability, continuity, smoothness, and monotonicity. The nonlinear activation function \(\varphi ( \cdot )\) is generally chosen to be the sigmoid function, tanh function, or ReLU (rectified linear unit). We choose ReLU as the activation function in the convolutional layers and the fully connected layer due to the following advantages: 1) it is more efficient than the sigmoid or tanh functions; 2) it induces sparsity in the hidden units and allows the EEG classifier to easily obtain sparse brain signal feature representations. The ReLU function used in EEG-Conv is defined as:

$${\varphi _{i,j,k}} = \max ({a_{i,j,k}},0)$$
(4)

where \({a_{i,j,k}}\) is the input of the activation at location (i, j) on the k-th channel. ReLU works better than the logistic sigmoid and tanh functions in our experiments.
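Assembling the layer descriptions above, the following is a minimal PyTorch sketch of EEG-Conv. The feature-map sizes follow the text; hyperparameters the paper does not state, such as the LRN neighborhood size and weight initialization, are placeholder assumptions, and the class name is ours.

```python
import torch
import torch.nn as nn

class EEGConv(nn.Module):
    """Sketch of EEG-Conv following the layer descriptions above.
    Input: (batch, 1, 15, 128) epochs; output: logits for 2 classes."""

    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5),        # Conv1: -> (32, 11, 124)
            nn.ReLU(),
            nn.LocalResponseNorm(size=5),           # LRN2 (size is an assumption)
            nn.Conv2d(32, 64, kernel_size=3),       # Conv3: -> (64, 9, 122)
            nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=3),       # Conv4: -> (32, 7, 120)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),  # Pool5: -> (32, 3, 60)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 3 * 60, 2048),           # FC6
            nn.ReLU(),
            nn.Dropout(p=0.5),                      # dropout strategy
            nn.Linear(2048, n_classes),             # Out7 logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Training with a cross-entropy loss then applies the softmax of Eq. 2 to the Out7 logits.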

Training of the EEG-Conv classifier

Training the EEG-Conv classifier can be regarded as solving a non-convex optimization problem, because the loss function is not a convex function of the network parameters. Hence, applying a combination of strategies during training is necessary. In this subsection we describe the practical strategies used during training: choosing proper learning rates lets the learned weights approximate the global optimum as closely as possible, and the dropout method is used to prevent overfitting.

Learning rate The BP (back-propagation) algorithm approximates the trajectory of steepest descent in the weight parameter space. The learning rate is initialized to 0.01 and changed throughout the training phase. A step strategy is adopted, so the learning rate is adjusted after a fixed number of iterations to prevent the network from oscillating. The learning rate is adjusted according to the following formula:

$$\eta = bas{e_\eta } \times {\alpha ^{floor(iter/stepsize)}}$$
(5)

where \(bas{e_\eta }\) is the base (initial) learning rate, \(\alpha\) is a fixed hyperparameter set to 0.1 in the experiments, iter is the current number of iterations, stepsize is the number of iterations after which the learning rate is changed (set to 20,000 in our experiments), and \(floor( \cdot )\) is the rounding-down operation.
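A small sketch of the step decay in Eq. 5, under the stated settings (base rate 0.01, α = 0.1, step size 20,000); the function name is our own.

```python
import math

def step_lr(iteration, base_lr=0.01, alpha=0.1, stepsize=20000):
    """Step-decay learning rate schedule of Eq. 5."""
    return base_lr * alpha ** math.floor(iteration / stepsize)

# e.g. step_lr(0) -> 0.01; step_lr(25000) -> 0.001 after the first decay
```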

Dropout To prevent overfitting, we apply the dropout strategy to the fully connected layer FC6; that is, we drop the neurons of FC6 with probability 0.5. Since any hidden node may be discarded, dropout prevents the neurons in FC6 from co-adapting with particular other nodes during training. Each time an input is presented, the EEG-Conv classifier samples a different architecture. This training strategy reduces sophisticated co-adaptations of neurons, since a neuron cannot rely on the presence of particular other neurons.

Fig. 4

The residual block

Fig. 5

The architecture of EEG-Conv-R classifier with residual learning

Improved EEG classifier with residual learning

Our EEG-Conv classifier has good prediction accuracy on the test set. To further improve accuracy, we develop an EEG-Conv-R classifier by combining EEG-Conv with residual learning.

Residual learning explicitly reformulates the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions (He et al. 2016; Qin et al. 2018). In other words, the residual layer learns perturbations relative to its input. As shown in Fig. 4, we add a shortcut from the input X, and the output of the stacked layers is superimposed on the input; hence the output of the block becomes \(F(X)+X\), and what the network weights need to learn is F(X). In the EEG-Conv-R classifier, the residual block is defined as:

$$Y = F(X,\{ {W_i}\} ) + X$$
(6)

where X and Y are the input and output vectors of the layers considered. The function \(F(X,\{ {W_i}\} )\) describes the residual mapping to be learned.

Currently we add two residual blocks to EEG-Conv classifier to improve its performance. The architecture of EEG-Conv-R classifier is shown in Fig. 5.
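For illustration, a minimal PyTorch sketch of the residual block of Fig. 4 and Eq. 6 is given below; the inner layout (two 3 × 3 convolutions) is an assumption, since the paper specifies only the shortcut structure.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of the residual block in Fig. 4: the stacked layers learn
    F(X) and the block outputs F(X) + X (Eq. 6)."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(                 # assumed inner layout
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)         # shortcut: Y = F(X, {W_i}) + X
```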

Results and discussion

Here we evaluate the predictive performance of EEG-Conv and EEG-Conv-R in both intra-subject and inter-subject settings. Intra-subject prediction means the training and test data come from the same subject, whereas inter-subject prediction means they come from different subjects.

Intra-subject classification performance

We randomly take 80% of the TAV3 and DROWS samples of each subject to form a training set, named \(Train\_i\), and the remaining 20% of each subject as the test set, named \(Test\_i\), \(i=1,2,\ldots ,10\). Here, \(Train\_i\) and \(Test\_i\) are the training and test sets of the \(i{\mathrm{th}}\) subject, respectively. To ensure generality, the TAV3 and DROWS samples are randomly drawn for each \(Train\_i\). During training, each subject’s EEG data is used to train an individual classification model: each \(Train\_i\) is used as input to train EEG-Conv and EEG-Conv-R, and each \(Test\_i\) is used to test both classifiers. Furthermore, we randomly extract 10% of the samples from each \(Train\_i\) as a validation set for cross-validation, named \(Veri\_i\), \(i=1,2,\ldots ,10\).
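A sketch of this per-subject split (80/20 train/test, with 10% of the training set held out as \(Veri\_i\)); stratified sampling and the function name are our assumptions, not details from the paper.

```python
from sklearn.model_selection import train_test_split

def split_subject(epochs, labels, seed=0):
    """80/20 train/test split for one subject, then hold out 10% of the
    training set as a validation set (Train_i, Veri_i, Test_i)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        epochs, labels, test_size=0.2, stratify=labels, random_state=seed)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_tr, y_tr, test_size=0.1, stratify=y_tr, random_state=seed)
    return (X_tr, y_tr), (X_val, y_val), (X_te, y_te)
```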

The experimental results are shown in Fig. 6. For intra-subject mental state detection, EEG-Conv and EEG-Conv-R have similar performance, with average accuracies of 91.788 and 92.682%, respectively.

For comparison, the common LSTM (long short-term memory) neural network and an SVM classifier are used as baselines for our proposed models. For the SVM, a Gaussian kernel function is used, the penalty factor is set to 9.6, and probability estimation is not enabled. For the LSTM, two layers are stacked, the time step is set to 128, the learning rate is 0.01, and stochastic gradient descent is used for optimization. In addition, we follow Chang and Lin (2011) and Hochreiter and Schmidhuber (1997) for the training of the SVM and LSTM, respectively.

The LSTM yields an average accuracy of 85.132%, lower than our EEG-Conv and EEG-Conv-R. The SVM achieves an average accuracy of 88.070% with CSP (common spatial pattern) feature extraction. Among the 10 subjects, the SVM outperforms our EEG-Conv and EEG-Conv-R on only two, i.e., s2 and s4. The average classification accuracy of the four models is shown in Table 3.

Fig. 6

Accuracy of EEG-Conv, EEG-Conv-R, LSTM and SVM in the intra-subject test

Table 3 Average classification accuracy of EEG-Conv, EEG-Conv-R, LSTM and SVM in the intra-subject setting

The above results show that, despite the significant variance of EEG signals among different subjects, our proposed models, especially EEG-Conv-R, learn the features of EEG data well and yield excellent intra-subject classification results.

We also perform variance analysis of SVM, LSTM, EEG-Conv, and EEG-Conv-R in the intra-subject setting, as shown in Table 4. The stability of EEG-Conv and EEG-Conv-R is close to or slightly lower than that of SVM, but much higher than that of LSTM.

Table 4 Variance analysis of EEG-Conv, EEG-Conv-R, SVM and LSTM in the intra-subject setting

Furthermore, we also use cross-validation to compare EEG-Conv and EEG-Conv-R, as shown in Fig. 7. EEG-Conv-R quickly approaches 100% accuracy, although its validation accuracy fluctuates slightly. The main reason for the fluctuations is the insufficient number of samples per subject (between 2000 and 3200, as shown in Table 2). The application of residual blocks in the EEG-Conv-R classifier makes its training much faster than that of EEG-Conv. Overall, EEG-Conv-R exhibits a significant improvement in training speed over EEG-Conv.

Fig. 7

Accuracy curves of EEG-Conv-R versus EEG-Conv during the training process

Inter-subject classification performance

To test the inter-subject classification performance of EEG-Conv and EEG-Conv-R, we mix the TAV3 and DROWS samples of all subjects together. As before, 80% of the samples are extracted as the training set and the remainder as the test set. We also randomly choose 10% of the training set for cross-validation.

Fig. 8

Accuracy of SVM, LSTM, EEG-Conv and EEG-Conv-R in the inter-subject test

As shown in Fig. 8, our EEG-Conv and EEG-Conv-R classifiers achieve higher classification accuracy than SVM and LSTM for inter-subject mental state recognition. The average accuracies of EEG-Conv and EEG-Conv-R are 82.95 and 84.38%, respectively, while those of SVM and LSTM are 81.85 and 75.55%. This result suggests that our methods generalize better to mental state detection across different subjects.

The convergence of EEG-Conv-R versus EEG-Conv

Figure 9 depicts the loss curves of EEG-Conv and EEG-Conv-R during training. The loss of EEG-Conv decreases slowly and shows larger fluctuations; EEG-Conv needs nearly 250 batches to converge. EEG-Conv-R converges quickly, achieving a better convergence effect than EEG-Conv within 70 batches; that is, it takes less time to train EEG-Conv-R. The underlying reason is that EEG-Conv-R deepens the model with two additional convolution layers, with a \(5\times 5\) and a \(3\times 3\) kernel respectively, and introduces residual learning into the model design. The \(5\times 5\) kernel yields a greater receptive field and processes more parameters, which reduces the number of convolution layers needed. For residual learning on non-linear EEG signals, the model captures the difference between the output of the base mapping and the true response, so the output of EEG-Conv-R is closer to the true value and disturbances in the signals are detected more easily. In our experiments, the learned residual function usually has a small response and fits faster.

Fig. 9

The convergence comparison between EEG-Conv and EEG-Conv-R

Conclusion

In this paper, we have described two deep learning-based models, EEG-Conv and EEG-Conv-R, to predict the mental state of drivers. A 5-layer convolutional neural network is built to classify the mental states of driver fatigue, and both classifiers are tested on raw EEG data. The classification performance of the two models is compared with the classical SVM classifier and the LSTM deep learning model on the same EEG data.

Our experimental results suggest the following findings: 1) for intra-subject mental state detection, both EEG-Conv and EEG-Conv-R achieve better classification performance than the SVM and LSTM classifiers; 2) for inter-subject mental state detection, EEG-Conv-R performs better than EEG-Conv, LSTM, and the SVM-based classifier; 3) EEG-Conv-R converges faster than EEG-Conv and takes less time for feature extraction at the training stage.

However, the insufficient number of samples per subject limits further performance improvement of EEG-Conv-R. We will collect more EEG data to further validate EEG-Conv-R. Moreover, we have so far studied only binary classification. In future work, we will apply the proposed deep learning methods to multi-label classification of EEG signals.