1 Introduction

Fatigue driving refers to the physiological and psychological disorders caused by long periods of continuous driving [1,2,3,4]. Due to lack of sleep, long continuous driving, and other factors, drivers easily become fatigued, which results in inattentive driving, impaired judgment, and improper driving operations, thereby increasing the potential for traffic accidents [5, 6]. Effective driver mental state detection reduces the probability of unsafe driving and property loss [7].

Currently, there are three main driver mental state detection methods: computer vision-based [8,9,10], human physiological signal-based [11, 12], and information integration technology-based [13, 14] methods. Computer vision-based methods typically collect images of the driving process, establish appropriate judgment criteria, and use image processing techniques to analyze the driver’s facial expressions to determine whether the driver is fatigued. Human physiological signal-based methods usually collect data such as the subject’s electroencephalogram (EEG), eye movements, electrocardiogram (ECG), heart rate, and blood pressure to analyze the driver’s physical state and predict his or her mental state. Information integration technology-based methods mainly establish a model that integrates the various factors that may contribute to the driver’s mental state, and use this model to perform driver mental state detection and analysis. Among these signals, EEG [15], as an objective measurement, quickly reflects physiological and mental changes and is widely considered one of the most readily available and effective bases for driver mental state detection.

However, the following problems are inevitably encountered when using EEG signals for analysis.

  1. The EEG signal is particularly weak and susceptible to noise.

  2. EEG is spontaneous and highly individualized, with varying data distributions across subjects and time periods.

  3. EEG data samples are precious, and the collection of large amounts of data entails high time and financial costs.

Accordingly, it is extremely important to obtain desirable detection results from a small number of cross-subject sample features. Transfer learning [16] can address the problem of sparse data labels by transferring knowledge from a learned source domain to an unlabeled target domain. It differs from traditional machine learning methods in two aspects: (1) transfer learning approaches do not rely on the premise that data from different domains obey the same distribution and are therefore applicable when data distributions are inconsistent; (2) transfer learning approaches attempt to solve unsupervised problems with only a few labeled samples.

As transfer learning continues to progress in EEG data processing, a growing number of studies have applied it to driver state detection. Among current review articles [17,18,19] addressing driver mental state and fatigue, to the best of our knowledge, this is the first review that focuses on transfer learning approaches.

The remainder of this paper is organized as follows. Section 2 provides a simple introduction to transfer learning. Section 3 describes the transfer learning-based driver fatigue detection methods. Finally, our conclusions are outlined in Sect. 4.

2 Transfer Learning

Transfer learning is the ability to systematically identify and apply knowledge and skills learned in a previous domain to a new domain.

There are two very important terms in transfer learning: domain and task. In EEG-based driver fatigue state detection, a domain usually represents the EEG observations obtained when a subject performs the same learning task; the EEG observations of different subjects under the same task are defined as separate domains. Furthermore, a domain can be divided into the source domain \( \left( {D_{S} } \right) \) and the target domain \( \left( {D_{T} } \right) \), depending on whether it carries label information. Specifically, the source domain with label information is defined as \( \left\{ {X_{S} ,Y_{S} } \right\} \), and the target domain without label information is denoted as \( \left\{ {X_{T} } \right\} \). Additionally, a learning task consists of a label space, represented by \( Y \), and the corresponding prediction function, represented by \( f\left( \cdot \right) \). For instance, sentiment analysis and driver state analysis are two different tasks.

2.1 Definition of Transfer Learning

Definition 1.

As described in [16], a domain \( D \) consists of two parts: a feature space \( X \) and a marginal probability distribution \( P\left( X \right) \), where \( X = \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\} \). Different domains may have different feature spaces or obey different marginal probability distributions. Given a source domain \( D_{S} \) with its learning task \( T_{S} \), and a target domain \( D_{T} \) with its learning task \( T_{T} \), the goal of transfer learning is to improve the learning performance of the target prediction function \( f_{T} ( \cdot ) \) in \( D_{T} \) by effectively employing the knowledge learned from \( D_{S} \) and \( T_{S} \), under the condition that \( D_{S} \ne D_{T} \) or \( T_{S} \ne T_{T} \).

The condition \( D_{S} \ne D_{T} \) implies that the source and target feature spaces differ, i.e., \( X_{S} \ne X_{T} \), or that the source and target marginal probability distributions differ, i.e., \( P_{S} \left( X \right) \ne P_{T} \left( X \right) \). Specifically, each task is defined as \( T = \left\{ {Y,f\left( \cdot \right)} \right\} \), where \( f\left( \cdot \right) \) can be interpreted as the conditional probability distribution \( P\left( {Y|X} \right) \). Furthermore, \( T_{S} \ne T_{T} \) means that the source and target label spaces differ, i.e., \( Y_{S} \ne Y_{T} \), or that the source and target conditional probability distributions differ, i.e., \( P\left( {Y_{S} |X_{S} } \right) \ne P\left( {Y_{T} |X_{T} } \right) \). It is worth noting that the problem reduces to a traditional machine learning problem if the source and target domains are equal, \( D_{S} = D_{T} \), and the source and target tasks are also equal, \( T_{S} = T_{T} \).

Domain Adaptation.

As described in [20], given a labeled source domain \( D_{S} = \{ X_{i} ,Y_{i} \}_{i = 1}^{n} \) and an unlabeled target domain \( D_{T} = \{ X_{j} \}_{j = 1}^{m} \), it is assumed that their feature spaces are the same, i.e., \( X_{S} = X_{T} \), and their label spaces are also the same, i.e., \( Y_{S} = Y_{T} \). However, the marginal and conditional distributions of the two domains differ, i.e., \( P_{S} \left( X \right) \ne P_{T} \left( X \right) \) and \( P\left( {Y_{S} |X_{S} } \right) \ne P\left( {Y_{T} |X_{T} } \right) \). The goal of domain adaptation is then to use the labeled data in \( D_{S} \) to train a classifier \( f:x_{T} \to y_{T} \) that predicts the labels \( y_{T} \in Y_{T} \) of the target domain \( D_{T} \).
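To make this setting concrete, the following minimal Python sketch (with randomly generated, hypothetical features) shows the naive source-only baseline that domain adaptation methods aim to improve upon: a classifier is trained on the labeled source domain and applied directly to the unlabeled target domain.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: a labeled source domain {X_s, y_s} and an unlabeled
# target domain {X_t} share the same feature space but have shifted distributions.
rng = np.random.default_rng(0)
X_s = rng.normal(loc=0.0, scale=1.0, size=(200, 8))    # source EEG features
y_s = (X_s[:, 0] + X_s[:, 1] > 0).astype(int)          # source labels (alert / fatigued)
X_t = rng.normal(loc=0.5, scale=1.2, size=(100, 8))    # target features, different distribution

# Naive baseline: train f on the source only and apply it to the target.
# Transfer learning methods improve on this by correcting the distribution mismatch.
clf = LogisticRegression(max_iter=1000).fit(X_s, y_s)
y_t_pred = clf.predict(X_t)                             # predicted target-domain labels
print(y_t_pred[:10])
```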

2.2 A Brief Introduction of Transfer Learning Methods

Instance-Based Transfer Learning.

To exploit the similarity between the source and target domains, instance-based methods reuse selected source domain samples according to weight generation rules to carry out the transfer [21,22,23,24]. Instance weighting methods have rich theoretical foundations and are easy to derive and use. However, they are usually effective only when the distribution difference between domains is small.
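As an illustration of the instance-weighting idea (a generic sketch, not a method from the surveyed papers), the snippet below estimates importance weights with a source-vs-target domain discriminator and reuses those weights when fitting a classifier on the source data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_source, X_target):
    """Estimate per-sample weights approximating p_target(x) / p_source(x)
    with a domain discriminator - one common instance-weighting heuristic."""
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])  # 0 = source, 1 = target
    disc = LogisticRegression(max_iter=1000).fit(X, d)
    p_target = disc.predict_proba(X_source)[:, 1]
    # The odds p(target|x) / p(source|x) approximate the density ratio up to a constant.
    return p_target / np.clip(1.0 - p_target, 1e-6, None)

# Usage: reweight labeled source samples so a classifier trained on them
# emphasizes the region covered by the (unlabeled) target domain.
# weights = importance_weights(X_s, X_t)
# clf = LogisticRegression(max_iter=1000).fit(X_s, y_s, sample_weight=weights)
```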

Feature-Based Transfer Learning.

Methods of this type mine the correlation between the source and target domains through feature transformation so as to reduce the distribution discrepancy between the two domains [25], or map the features of the source and target domains into a unified feature space so that improved traditional methods can complete the related task [26,27,28,29,30,31]. Currently, this is the most important and common class of transfer learning methods, and it is extensively employed for the cross-subject transfer of EEG in fatigue driving.
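As a simple, well-known example of such a feature transformation (a CORAL-style second-order alignment, not one of the methods surveyed in Sect. 3.3), the sketch below re-colors source-domain features so that their covariance matches the target domain before a standard classifier is trained.

```python
import numpy as np

def _sym_mat_pow(C, p):
    """Matrix power of a symmetric positive definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** p) @ vecs.T

def coral_align(X_source, X_target, eps=1e-3):
    """CORAL-style alignment: whiten the source covariance, then re-color it
    with the target covariance, so the second-order statistics of the two
    domains match. eps regularizes the covariance estimates."""
    d = X_source.shape[1]
    Cs = np.cov(X_source, rowvar=False) + eps * np.eye(d)
    Ct = np.cov(X_target, rowvar=False) + eps * np.eye(d)
    return X_source @ _sym_mat_pow(Cs, -0.5) @ _sym_mat_pow(Ct, 0.5)

# Usage: clf.fit(coral_align(X_s, X_t), y_s); clf.predict(X_t)
```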

Model (Parameter)-Based Transfer Learning.

These methods assume that the data in the source and target domains share certain model parameters; the goal of transfer learning is then to discover the relevant shared parameters across the source and target domains. Presently, methods involving deep neural networks [32,33,34,35] also belong to the model-based transfer learning family. In addition, models combining domain adaptation and deep neural networks are commonly used; they are both model-based and feature-based methods.
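The following PyTorch fragment sketches the parameter-sharing idea under assumed settings (a small fully connected network on hypothetical EEG features and a hypothetical checkpoint name): feature layers learned on source subjects are frozen and only the output head is re-fitted on a small amount of target-subject data.

```python
import torch
import torch.nn as nn

class EEGNetSmall(nn.Module):
    """Hypothetical small network over pre-extracted EEG features."""
    def __init__(self, n_features=310, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                      nn.Linear(64, 32), nn.ReLU())
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.head(self.features(x))

model = EEGNetSmall()
# model.load_state_dict(torch.load("source_pretrained.pt"))  # hypothetical source-trained checkpoint
for p in model.features.parameters():
    p.requires_grad = False                                   # share (freeze) source-learned parameters
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)  # fine-tune only the head on target data
```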

Relationship-Based Transfer Learning [36].

Most methods of this type focus on identifying the relationships between source and target domain samples, although only a few related studies exist. Most of them rely on Markov logic networks to mine the commonalities between different domains.

Over the past decades, an abundance of effective EEG feature extraction methods has been developed to tackle the non-linear, unstable, and high-dimensional characteristics of EEG data, further improving extraction effectiveness. In current EEG-based applications, instance-based, feature-based, and model (parameter)-based methods have drawn the most attention from researchers. Therefore, Sect. 3 focuses on introducing the development of these three categories of methods.

3 Transfer Learning-Based Driver Fatigue Detection Methods

Transfer learning has been proposed to address the small sample problem and the adaptation of different domains. Transfer learning is capable of providing more effective solutions to the EEG transfer classification problem across subjects. Accordingly, it has been applied in driver fatigue detection.

In the application of transfer learning to driver fatigue detection, most existing detection systems (frameworks) follow the same construction process, from which several general characteristics can be summarized. Therefore, the first part of this chapter summarizes the characteristics of driver fatigue detection systems based on transfer learning methods. Presently, the transfer learning pattern recognition methods for driver fatigue detection are mainly feature- and model-based, with a smaller proportion being instance-based. These methods are summarized next.

3.1 System Features

Driver fatigue detection systems collect EEG signals during driving, process them either online or offline, and feed the results back for warning and control.

A complete driver fatigue detection system [18, 37] usually includes signal acquisition, signal preprocessing, feature extraction, pattern recognition, and feedback.

Signal acquisition: The weak EEG signal is detected by electrodes placed on the subject’s scalp, then amplified and digitized, and finally recorded by the matching recording system. Signals are mainly collected with EEG caps and other EEG devices, including 64- and 32-electrode caps.

Signal preprocessing: Used to remove common noise, interference, and artifacts, in order to improve signal quality. Commonly used preprocessing methods are digital filtering and independent component analysis (ICA).
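A minimal preprocessing sketch is given below, assuming a 250 Hz sampling rate and a 1–40 Hz band of interest; it chains the two preprocessing steps mentioned above, digital band-pass filtering and ICA, using SciPy and scikit-learn.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import FastICA

def preprocess_eeg(raw, fs=250.0, band=(1.0, 40.0), n_components=None):
    """Band-pass digital filtering followed by ICA decomposition, as commonly
    used to suppress noise and ocular artifacts (assumed fs and band).

    raw: array of shape (n_samples, n_channels)."""
    b, a = butter(4, band, btype="bandpass", fs=fs)     # 4th-order Butterworth band-pass
    filtered = filtfilt(b, a, raw, axis=0)              # zero-phase filtering along time
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(filtered)               # independent components (time x components)
    # Artifact components (e.g., eye blinks) would be identified and zeroed here
    # before reconstructing the signal with ica.inverse_transform(sources).
    return filtered, sources
```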

Feature extraction: Used to reduce the dimensionality of EEG data and extract relevant features for pattern recognition. Common feature extraction methods operate in the temporal, frequency, and spatial domains, as well as in combined analyses across these domains. Frequency-domain analysis: autoregressive (AR) modeling and power spectral density (PSD) estimation; time-frequency analysis: wavelet transform and wavelet packet transform; spatial-domain analysis: principal component analysis (PCA) and common spatial patterns (CSP).
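For the frequency-domain branch, a common and simple choice is band power computed from the Welch PSD; the sketch below (band edges are typical textbook values, not taken from the surveyed papers) turns a multi-channel EEG window into one feature vector.

```python
import numpy as np
from scipy.signal import welch

# Typical EEG frequency bands in Hz (assumed values for illustration).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_power_features(eeg, fs=250.0):
    """Welch power spectral density per channel, integrated over the classical
    EEG bands. eeg: array (n_samples, n_channels); returns (n_channels * n_bands,)."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs), axis=0)   # psd: (n_freqs, n_channels)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(np.trapz(psd[mask], freqs[mask], axis=0))    # band power per channel
    return np.concatenate(feats)
```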

Pattern recognition model (PRM): Existing transfer learning pattern recognition models for driver fatigue detection are mainly based on classification (C) and regression (R). The classification model usually assigns category information to the driver’s mental state from the EEG data according to a certain threshold; the transfer learning model then outputs a discrete fatigue category. The regression model predicts the specific drowsiness level, and its output is a continuous value.

Explore transferability in transfer learning: Many existing methods [38,39,40] consider how to select optimal auxiliary source domains in order to further reduce the cost and error of transfer learning. Appropriate auxiliary data often makes transfer more effective with less effort.

In the literature, feature extraction methods and pattern recognition models have some characteristics that can be summarized. In the following sections we summarize them according to different pattern recognition methods.

3.2 Instance-Based Transfer Learning Methods

The processing pipeline of instance-based transfer learning methods is simple. To fully utilize the existing source domain data in EEG-based fatigue detection, similarity matching with the target domain is performed according to weight generation rules to complete data alignment and transfer. Table 1 summarizes this approach.

Table 1. Instance-based transfer learning methods.

Wu et al. [38] proposed an online weighted adaptation regularization for regression (OwARR) algorithm to decrease the amount of subject-specific calibration data required. A source domain selection (SDS) method was also proposed to reduce the computational cost of OwARR by about 50%. The online classification/regression setting means that there is not enough labeled data for calibration. In this work, each subject performing the same driving task is considered a distinct source domain. OwARR is first applied to each source domain, and the final regression model is constructed as a weighted mean of these base models; this final model is then applied to future unlabeled data. Specifically, SDS is employed to reduce the clustering error across the multiple source domains before domain adaptation. By keeping only the best Z source domains, SDS maintains model performance at a lower computational cost. On average, the training time of OwARR-SDS is approximately half that of OwARR.
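The core idea of source domain selection can be illustrated with a much simplified sketch: rank the auxiliary subjects by how close their feature statistics are to the target subject and keep only the best Z of them. The actual SDS procedure in [38] uses a clustering-based criterion, so the snippet below is only a stand-in for the idea.

```python
import numpy as np

def select_source_domains(source_domains, X_target, Z=3):
    """Keep the Z auxiliary subjects whose mean feature vectors are closest
    to the target subject's mean (simplified illustration of source selection).

    source_domains: list of (X_s, y_s) tuples, one per auxiliary subject."""
    mu_t = X_target.mean(axis=0)
    dists = [np.linalg.norm(X_s.mean(axis=0) - mu_t) for X_s, _ in source_domains]
    keep = np.argsort(dists)[:Z]                 # indices of the Z closest source domains
    return [source_domains[i] for i in keep]
```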

Since EEG characteristics are relatively consistent across individuals, existing data from auxiliary subjects can be used to improve EEG-based performance. Wei et al. [39] proposed a selective transfer learning framework that effectively utilizes the large amount of training data from other subjects to improve the recognition of unlabeled target domain data. This finding is a good reference for cross-subject transfer.

To improve system performance with minimal personalized calibration data, Wei et al. [40] used hierarchical clustering to evaluate inter- and intra-subject variability in a large-scale EEG dataset from a simulated driving task. A pool of source models was constructed from the existing data collected from source subjects, and the framework designs an adjustment mechanism that orders and fuses the source models for each target subject. In terms of time cost, the calibration time of the self-decoding (SD) method was 89.91 min, whereas that of the subject-transfer (ST) method was 1.48 min; the calibration time required for new users was thus reduced by more than 90%.

3.3 Feature-Based Transfer Learning Methods

Feature-based transfer learning methods account for the largest proportion of EEG-based driver fatigue detection methods because of their better feature alignment. These methods typically map the source and target domain data into a common feature space, optimize an alignment of their probability distributions, or combine both strategies. They are not difficult to train and bring significant gains; moreover, compared to deep learning methods, their training time and data costs are low. Therefore, they are extensively used in EEG-based tasks. Table 2 summarizes this approach.

Table 2. Feature-based transfer learning methods.

An online multi-view transfer Takagi–Sugeno–Kang (TSK) fuzzy system [42] was proposed to estimate driver drowsiness; it represents the source and target domain characteristics from multiple perspectives. In this algorithmic framework, the EEG data of each domain are characterized from multiple views, and the multi-view setting is injected into the transfer learning framework to enhance consistency across views. This online fuzzy system is more flexible and controllable than offline training.

Chen et al. [43] proposed an automatic detection system based on cross-subject feature selection and transfer classifiers to identify different driving mental states. Considering the negative effects of noise and irrelevant information on transfer learning, they designed the class separation and domain fusion (CSDF) criterion and used a hybrid feature selection methodology to combine different types of filter methods in one framework. Additionally, they adopted adaptation regularization-based transfer learning (ARTL) as the pattern recognition method, which simultaneously optimizes the structural risk, the joint distribution alignment, and the manifold consistency of the two domains. This optimization is based on the structural risk minimization principle and regularization theory.

The kernel spectral regression (KSR) with transferable discriminant dimension reduction (TDDR) method was proposed by Zhang et al. [44]. This method reduces the feature vector dimensionality to achieve classifier model transfer across subjects. However, considering only discrimination in a low-dimensional source space is undesirable, as it generalizes poorly to the target domain. In this work, knowledge transfer with TDDR defines an objective function that rewards class separability of the merged-domain data and penalizes the distance between the source and target domains. A low-dimensional latent space can thus be found that ensures both discriminability and transferability, addressing the problem that traditional dimension reduction methods consider only low-dimensional discrimination. Furthermore, KSR overcomes the limitation of linear discriminant analysis (LDA) in detecting nonlinear components when reducing the EEG feature dimension. Detection results on two datasets show that the framework improves multi-class and multi-bandwidth identification performance.

Liu et al. [45] proposed a calibration-free, transfer learning-based cross-subject EEG fatigue recognition algorithm. They also explored the influence of the number of EEG channels on accuracy and compared single- and multi-channel settings. Specifically, the random forest algorithm was used to select the channel with the highest discriminative power; their experiments showed that an occipital channel performs best when only one channel is considered. Two classical transfer learning strategies are used, namely transfer component analysis (TCA) and maximum independence domain adaptation (MIDA) [46]. TCA is employed to alleviate the drop in classification accuracy caused by the distribution mismatch between the source and target data. The goal of TCA is to seek a latent mapping subspace in which the maximum mean discrepancy (MMD) between the source and target data is reduced in a reproducing kernel Hilbert space (RKHS) [47]; this distance is estimated empirically from the means of the projected samples of the two domains. MIDA maps data from different domains into a latent domain-invariant space, where the projected samples are independent of the domain features. The accuracy reached 73.01% with all thirty channels using MIDA and 68.00% with one selected channel using TCA, outperforming the baseline and deep learning methods.
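For reference, the empirical MMD that TCA seeks to reduce can be written down in a few lines; the sketch below uses an RBF kernel with an assumed width gamma and a plain biased estimator.

```python
import numpy as np
from scipy.spatial.distance import cdist

def mmd_rbf(X_s, X_t, gamma=1.0):
    """Biased empirical maximum mean discrepancy (MMD) between source and
    target features under an RBF kernel - the quantity TCA minimizes in the
    RKHS (gamma is an assumed kernel width)."""
    Kss = np.exp(-gamma * cdist(X_s, X_s, "sqeuclidean"))
    Ktt = np.exp(-gamma * cdist(X_t, X_t, "sqeuclidean"))
    Kst = np.exp(-gamma * cdist(X_s, X_t, "sqeuclidean"))
    return Kss.mean() + Ktt.mean() - 2.0 * Kst.mean()
```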

3.4 Model-Based Transfer Learning Methods

Parametric model-based transfer learning methods for driver fatigue detection usually address how to find common parameters or prior distributions shared between the models of the source and target domains, so that knowledge can be transferred through further processing. Deep transfer learning methods also belong to this category. Table 3 summarizes this approach.

Table 3. Model-based transfer learning methods.

Wu et al. [51] proposed a method combining transfer learning, active class selection (ACS), and a mean squared difference user-similarity heuristic to select the best samples. Specifically, collaborative filtering is used to combine a solitary subject’s training data with external training data from other similar subjects. In addition, to improve learning performance by combining a limited number of subject-specific training samples with a substantial number of auxiliary samples from similar subjects, ACS optimizes class selection when generating user-specific training samples. This boosts recognition accuracy without increasing the number of training samples.

Wu et al. [41] proposed an online EEG-based drowsiness estimation method based on adaptive model fusion. In this framework, only a small amount of subject-specific calibration is required to achieve satisfactory results. Specifically, each of the Z auxiliary source domains is combined with the target domain to perform a ridge regression-based domain adaptation operation, yielding Z different models that are then fused into the final model.
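A hedged sketch of the fusion step is given below: one ridge regressor is fitted per auxiliary source domain, each on that domain pooled with the few labeled calibration samples from the target subject, and the predictions are fused by a weighted average. The actual ridge regression-based adaptation in [41] is more elaborate; this only illustrates the model-fusion pattern.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fused_drowsiness_model(source_domains, X_t_cal, y_t_cal, weights=None):
    """Fit one ridge regressor per auxiliary source domain (pooled with the
    target calibration samples) and return a fused predictor.

    source_domains: list of (X_s, y_s) tuples; X_t_cal, y_t_cal: few labeled
    target samples; weights: optional fusion weights, one per source domain."""
    models = []
    for X_s, y_s in source_domains:
        X = np.vstack([X_s, X_t_cal])
        y = np.concatenate([y_s, y_t_cal])
        models.append(Ridge(alpha=1.0).fit(X, y))
    if weights is None:
        weights = np.ones(len(models)) / len(models)      # uniform fusion weights

    def predict(X_new):
        preds = np.stack([m.predict(X_new) for m in models])   # shape (Z, n_samples)
        return weights @ preds                                  # fused drowsiness estimate
    return predict
```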

In [52], a deep neural network-based transfer learning driver fatigue detection system is proposed, which increases system usability by relying on only a small number of EEG channels. First, the signal is preprocessed and filtered, then transformed into a two-dimensional spectrum image. The spectrum image is classified with AlexNet, and the final normal/fatigue decision is made using a transfer learning approach. The FP1 and T3 channels were shown experimentally to be the most effective channels for reflecting the driver’s fatigue state. Furthermore, with the improved AlexNet convolutional neural network (CNN) model, an efficient driver fatigue detection system can be obtained using only one channel. This flexibility is a major advantage of the method.
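The transfer step can be sketched with torchvision as follows (the exact training details in [52] may differ): load ImageNet-pretrained AlexNet weights, replace the final fully connected layer with a two-class normal/fatigue output, and fine-tune on the two-dimensional spectrum images.

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained AlexNet (for older torchvision: models.alexnet(pretrained=True)).
model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)

# Replace the final fully connected layer: 4096 -> 2 classes (normal vs. fatigue).
model.classifier[6] = nn.Linear(4096, 2)

# Optionally freeze the convolutional features and fine-tune only the classifier head.
for p in model.features.parameters():
    p.requires_grad = False
```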

In [53], two kinds of domain adaptation networks, a domain adaptive neural network (DaNN) and adversarial discriminative domain adaptation (ADDA), are applied to the SEED-VIG dataset [50] to classify electrooculogram (EOG) and EEG signals. Compared with traditional domain adaptation methods, this approach significantly improves performance. The experimental results show that the Pearson correlation coefficients of both domain adaptation networks improve by more than 10% over the baseline. Therefore, using adversarial networks for EEG-based driver fatigue classification is a promising direction.
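A widely used building block in adversarial domain adaptation of this kind is the gradient reversal layer, sketched below in PyTorch; it is shown only as a generic illustration of the adversarial idea and is not necessarily how the networks in [53] are implemented.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: features pass through unchanged in the forward
    pass, but gradients from the domain discriminator are negated on the way
    back, pushing the feature extractor toward domain-invariant representations."""
    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None   # reversed gradient for x, no gradient for lam

# Usage inside a model's forward pass (hypothetical modules):
# domain_logits = domain_discriminator(GradReverse.apply(features, 1.0))
```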

Owing to the continuous development of deep networks, EEG data can also be processed with model-based transfer learning methods, and parameter-based methods can be combined with feature-based methods to achieve better performance. However, deep network-based methods for fatigue detection and classification are still scarce, and more efficient methods are needed to obtain better experimental results.

4 Conclusion

With the continuous improvement of EEG acquisition devices, EEG-based driver mental state detection has become more objective and accurate. Presently, traditional machine learning- and deep learning-based methods achieve remarkable results in within-subject experiments. However, the distribution of EEG data is complex and unstable; in practice, samples are precious, and more powerful models are needed to handle cross-subject and cross-time EEG signals. The cross-subject problem may be addressed more effectively as transfer learning research progresses. Nevertheless, certain limitations remain, which may be overcome in the future with the development of a large number of transfer learning algorithms.