6.1 Introduction

Road accidents are a growing concern in most nations because of the rising number of deaths they cause. Reports from the World Health Organization (WHO) state that road accidents are the 9th leading cause of death globally [1]. Sleep-associated vehicle accidents account for the highest share of traffic accidents, with sleepiness implicated in 30% of all fatalities and injuries worldwide.

The US National Highway Traffic Safety Administration (NHTSA) estimated that drowsy driving is responsible for 40,000 injuries and 1500 deaths in car crashes per year [2]. These deaths could be avoided if driver drowsiness were detected correctly and drivers were alerted in time. Sleepiness is a physiological phenomenon signifying a lack of energy and motivation. Factors such as inadequate sleep, extended mental or physical work, and long periods of stress and anxiety generate fatigue in humans. Driver drowsiness has gained importance as a research topic over the last 80 years. The statistics point to a strong need for systems that monitor drivers and gauge their level of attention, which form part of advanced driver assistance systems (ADAS). The techniques for detecting driver drowsiness are broadly categorized as vehicle based, behavior based and physiology based.

In the vehicle-based methods, parameters such as pressure on the acceleration pedal, lane deviation and steering angle position are used to determine whether the driver is drowsy. Modern vehicle manufacturers implement one or more of these methods in their driver assistance systems as a safety measure. However, the driving environment, road markings, climatic conditions and driving skill make it difficult to evaluate the state of the driver with this method [2].

Another technique is to continually monitor and record the driver’s behavior, namely eye closure, eye blinks, yawning and head movements, through a fixed camera coupled with an intelligent system that discriminates between vigilance and drowsiness. The critical limitations of this method are the reduced recognition rate under dim lighting and the late stage at which drowsiness is detected: the behavioral method detects drowsiness only after the driver is already overcome by sleep.

Among the various drowsiness detection techniques, the physiology-based methods use the electroencephalogram (EEG), electromyogram (EMG), electrocardiogram (ECG) and electrooculogram (EOG). The EEG method yields high accuracy and is referred to as the gold standard; it is the most preferred physiological signal because it has the closest association with drowsiness. Thus, the physiology-based method detects drowsiness at an early stage with high efficiency and reliability. The drawback of EEG signals is that, although they have high temporal resolution, they are easily contaminated by EMG, eye blinks and electromagnetic noise [2].

Drowsy driving is a critical issue and hence needs to be identified at the earliest. Drowsiness has the following effects: (a) reduced attention to the surroundings, (b) a considerable delay in reaction time and (c) impaired capability to make decisions. This literature study reviews the previous approaches to detecting drowsiness. Drivers involved in sleep-related crashes reported that the quality of their sleep was either poor or fair [3]. Missing even an hour of sleep can raise the risk of a car crash. The only antidote for drowsiness is sleep [4].

Li et al. [5] used the Fp1 and O1 EEG channels to discriminate between alert and drowsy states. Pranoto et al. [6] detected drowsiness in truck drivers using a speed limiter integrated fatigue analyzer; the system is based on the Arduino UNO and senses the driver’s body temperature and heart rate. Ribeiro et al. [7, 8] developed an EEG-based drowsiness detection system based on the Sleep-EDF database and used Hjorth coefficients, power spectral density and average power to classify drowsiness. Albalawi and Li [9] developed a real-time drowsiness detection system based on a single-channel EEG signal using eight frequency bands. They computed the relative power and classified the drowsy and alert states with a support vector machine (SVM) classifier. They compared the results of the developed system with the MIT-BIH polysomnographic database. Leger et al. [10] developed an in-flight vigilance state detection system for pilots that used a single EEG channel. Pathak and Jayanthy [11, 12], Da Silveira et al. [13], Picot et al. [14, 15] and Ogino and Mitsukura [16] investigated the use of a single EEG channel to classify the alert and drowsy states of drivers. Hu [17] compared different features and classifiers for driver drowsiness detection based on a single EEG channel [18]. Alluhaibi et al. [19] discussed various driver behavior detection methods along with their advantages and disadvantages. Mu et al. [20] detected driver fatigue using combined entropy features from the EEG signals. Belakhdar et al. [21,22,23] used a single-channel EEG signal from the MIT-BIH polysomnographic dataset to classify drowsiness based on the average power of the delta and alpha waves, and compared the ANN and SVM classifiers for the best accuracy. Correa et al. [24, 25] extracted features using spectral and wavelet decomposition methods to classify the drowsy and alert states.

In this work, we consider the ULg multimodality drowsiness database. The database contains multimodal signals, namely five EEG channels, two EOG channels, one EMG channel and one ECG channel, along with psychomotor vigilance test (PVT) and video data. We use only the five EEG signals for analysis and to classify between the alert and drowsy states. We use the framing method to segment the signal into equal frames of 2 s using a rectangular window with an overlap of 50%. The raw signal is trimmed using a band-pass filter to a maximum cut-off frequency of 50 Hz. Further, each frame is split into eight sub-bands using a band-pass filter bank. We extract the log energy entropy and band power features for each of the frames. The feature set is then rescaled using a bipolar normalization method. The feature sets are labeled as drowsy or alert, and all the alert and drowsy data are then combined to form the final dataset. The K-fold cross-validation technique is used. K-nearest neighbor and support vector machine classifiers are trained on the dataset, and the results are compared.

The remainder of the paper is organized as follows. The details from the literature and the proposed methodology are explained in the second part of the introduction section. The multimodality drowsiness database, the data acquisition protocol and the channel selection are explained in the first part of the methods section. The second part details the preprocessing, sub-band filtering and feature extraction procedures used in this approach. The data preprocessing section details dimensionality reduction and data preparation for the six classification models. The results section presents the results, followed by the discussion section, which discusses the outcome of this research. The final conclusion section details the contributions and concludes the paper.

6.2 Methods

DROZY database: The publicly available ULg multimodality drowsiness (DROZY) database is used in this research because it contains multimodal data. The complete details of the multimodal data collection, the data description and the protocol are represented in Fig. 6.1 [1].

Fig. 6.1 Protocol design during the data collection. Adapted from [1]

This research uses the ULg multimodality drowsiness database. The database contains physiology-related signals, i.e., EEG, EOG, ECG and EMG. Using the Embla Titanium system, the EEG signals were recorded from channels C3, C4, Cz, Pz and Fz, referenced at A1, in the international 10-20 system. The vertical and horizontal EOG signals and the ECG and EMG signals were also recorded, at a sampling frequency of 512 Hz. Along with these signals, video signals were also recorded. The drowsiness and alert signals were recorded from 14 male and female subjects. The test was conducted in a controlled environment in three trials. Before the first trial, the subjects were asked to maintain a good sleep pattern for the preceding week. In the first trial, the subjects were asked to perform a task while watching the screen. After the first trial, the subjects were asked to stay awake for 36–38 h to keep them sleep deprived. In the second and third trials, the subjects repeated the same experiment. At the end of the final test, arrangements were made for the subjects to have a sound sleep before leaving the laboratory.

Channel selection: The experiment was conducted in three trials over two days, as mentioned in the protocol. During each trial, a 10-min psychomotor vigilance test (PVT), an extensively used tool for measuring performance impairment due to drowsiness, was administered. This test measures the reaction time to visual or auditory stimuli that occur at random inter-stimulus intervals.

The database contains PSG recordings of five EEG, two EOG, one ECG and one EMG signals. The five EEG channels are recorded from the C3, C4, Cz, Pz and Fz locations, which lie over the central, parietal and frontal regions of the scalp. The placement of the EEG channels is depicted in Fig. 6.2. The C3, C4 and Cz locations lie over areas that deal with the sensory and motor functions [26]. This research uses monopolar montage with C3 as a reference. The vertical and horizontal EOG signals are recorded from above and at the side of the right eye to capture the eye blinks. An ECG channel is recorded from the electrode placed on the chest, and an EMG signal is recorded from the electrode placed on the neck of the participant. The placement of these electrodes is pictorially represented in Fig. 6.3.

Fig. 6.2 EEG electrode placement according to the international 10-20 system

6.3 Feature Extraction

Dimensionality reduction plays a vital role in classifier performance. It is achieved either by feature extraction or by feature selection. Both approaches require a feature evaluation criterion, the dimensionality of the feature space and an optimization procedure. Feature selection reduces the raw data to a dataset with a selected number of variables that contain the most discriminatory evidence. Feature extraction, on the other hand, considers the whole original data and maps the useful information into a lower-dimensional space. The following section explains the signal processing and feature extraction process (Fig. 6.3).

Fig. 6.3 Multimodal electrode placement during the experiment

EEG signal processing: EEG signals are a non-invasive physiological means of recording brain activity and have the closest association with drowsiness. The EEG carries neural information with a high temporal resolution but is easily contaminated by EMG, eye blinks and electromagnetic noise [27]. The EEG, along with the EMG, ECG and EOG signals, is stored in the European data format (EDF). A MATLAB function is used to extract only the EEG signals, which are of interest in this research. The experiment was conducted for 10 min, so each EEG signal contains up to 600 s of data. The signals are recorded at a sampling frequency of 512 Hz. The EEG signals include EOG and EMG activity, which contributes toward drowsiness detection. Hence, the removal of eye blink and EMG artifacts that is customary in conventional biosignal processing is omitted in this research, and the raw signal is processed directly without any artifact removal. However, a Butterworth low-pass filter with a cut-off frequency of 50 Hz is designed to attenuate the high-frequency components.
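The low-pass step can be illustrated with a minimal sketch. The source gives only the filter type (Butterworth), the 50 Hz cut-off and the 512 Hz sampling rate; the second-order design, the function names `butter2_lowpass` and `apply_biquad`, and the synthetic test tones below are assumptions made for illustration, not the authors' MATLAB implementation.

```python
import math

def butter2_lowpass(cutoff_hz, fs_hz):
    """Coefficients of a second-order Butterworth low-pass (bilinear transform).

    The order (2) is an assumption; the source does not state it.
    """
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)  # Q = 1/sqrt(2) gives the Butterworth response
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / (2.0 * a0), (1.0 - cos_w0) / a0, (1.0 - cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def apply_biquad(b, a, x):
    """Run a direct-form I second-order IIR filter over the samples in x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

FS = 512.0      # sampling frequency from the source
CUTOFF = 50.0   # cut-off frequency from the source
b, a = butter2_lowpass(CUTOFF, FS)

# Synthetic 2 s test tones (hypothetical data): one inside and one outside the pass band.
t = [n / FS for n in range(1024)]
slow = [math.sin(2.0 * math.pi * 5.0 * ti) for ti in t]
fast = [math.sin(2.0 * math.pi * 200.0 * ti) for ti in t]

# Skip the first 0.5 s so the start-up transient does not affect the gain estimate.
slow_gain = rms(apply_biquad(b, a, slow)[256:]) / rms(slow[256:])
fast_gain = rms(apply_biquad(b, a, fast)[256:]) / rms(fast[256:])
print(f"gain at 5 Hz: {slow_gain:.3f}, gain at 200 Hz: {fast_gain:.3f}")
```

A higher-order design would give a steeper roll-off above 50 Hz; the second-order form is used here only because it fits in a few lines.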

Sub-band filtering: The EEG signal is conventionally divided into delta, theta, alpha, beta and gamma waves based on their rhythms. These waves are further subdivided into low, medium and high sub-bands, which are extracted using suitable band-pass filters based on their frequencies [9]. A Butterworth band-pass filter bank with eight frequency bands is designed and used to extract the sub-bands of the EEG signals. Table 6.1 details the cut-off frequencies for all eight sub-bands.

Table 6.1 EEG sub-band filtering

Frame blocking, windowing and overlap: The raw signal is recorded for 10 min, and it is difficult to apply feature extraction methods to the whole signal. Hence, the continuous EEG signal is divided into specific time windows (signal epochs). The duration of the time windows is selected by trial and error based on the performance metrics. In this case, a 2-s window is chosen to segment the raw EEG signal. An overlap of 50% is applied during the process of windowing (Fig. 6.4).

Fig. 6.4 Frame blocking

During each trial of the experiment, data are recorded for an average time of 10 min, which is approximately 600 s. All the signals are read, and the maximum signal length is computed and taken as the reference; signals shorter than this are zero-padded at the end so that all signals have equal length. The raw EEG signal is then segmented into 2-s frames. After segmentation with the windowing technique, each raw signal yields 300 frames per subject per trial, thereby enlarging the dataset.
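The zero-padding and frame-blocking steps can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `pad_to_length` and `frame_signal` are hypothetical, and the all-zero record stands in for a real 600 s EEG channel. A rectangular window means the samples in each frame are left unweighted.

```python
def pad_to_length(signal, target_len):
    """Append zeros so that the signal has target_len samples."""
    return list(signal) + [0.0] * (target_len - len(signal))

def frame_signal(signal, fs_hz, frame_s=2.0, overlap=0.5):
    """Split a signal into fixed-length frames (rectangular window)."""
    frame_len = int(round(frame_s * fs_hz))
    hop = int(round(frame_len * (1.0 - overlap)))
    if hop <= 0:
        raise ValueError("overlap must be smaller than 1")
    frames = []
    # Only complete frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames

FS = 512                       # sampling frequency from the source
record = [0.0] * (600 * FS)    # placeholder for one 600 s EEG channel

overlapped = frame_signal(record, FS, frame_s=2.0, overlap=0.5)
non_overlapped = frame_signal(record, FS, frame_s=2.0, overlap=0.0)
print(len(overlapped), len(non_overlapped))
```

For a 600 s record this sketch produces 300 frames when the 2-s frames do not overlap and 599 frames with a 50% overlap (a 1 s hop), so the frame count depends on how the overlap is applied.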

Spectral Centroid

The spectral centroid is widely used in speech research to identify the dominant frequency of a signal. The spectral centroid of each EEG frame is calculated using Eq. (6.1) as shown below.

$${\text{SC}} = \frac{\int k\,X\left( k \right)\,{\text{d}}k}{\int X\left( k \right)\,{\text{d}}k}$$
(6.1)

where X(k) represents the magnitude of the FFT output at component k, and k is the index of the FFT component.
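A discrete version of Eq. (6.1) replaces the integrals with sums over the FFT bins. The sketch below assumes that X(k) is the magnitude spectrum over the non-negative frequencies; the function names and the short synthetic frame are hypothetical, and a plain DFT is used instead of an FFT library so that the example has no dependencies.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitude of the DFT for bins 0..N/2 (naive O(N^2) implementation)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(frame):
            acc += x * cmath.exp(-2j * cmath.pi * k * i / n)
        spectrum.append(abs(acc))
    return spectrum

def spectral_centroid(frame):
    """Discrete form of Eq. (6.1): sum(k * X[k]) / sum(X[k]), in bin units."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(k * m for k, m in enumerate(mags)) / total

FS = 512.0   # sampling frequency from the source
N = 64       # shortened frame for the demo; a 2 s frame would have 1024 samples

# Synthetic frame: a single tone that falls exactly on bin 8.
frame = [math.sin(2.0 * math.pi * 8 * i / N) for i in range(N)]

centroid_bin = spectral_centroid(frame)
centroid_hz = centroid_bin * FS / N   # convert bin index to frequency
print(round(centroid_bin, 3), round(centroid_hz, 3))
```

For a pure tone the centroid coincides with the tone's bin; for an EEG frame it moves toward whichever part of the spectrum carries the most amplitude.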

Spectral Centroid frequency

The spectral centroid frequency is computed as the sum of the amplitude-weighted frequencies divided by the total amplitude. The spectral centroid frequency of each EEG frame is computed based on Eq. (6.2) as shown below.

$${\text{SCF}}_{i} = \frac{\sum\nolimits_{f = 1}^{M} f\,S\left[ f \right]\,w_{i} \left[ f \right]}{\sum\nolimits_{f = 1}^{M} S\left[ f \right]\,w_{i} \left[ f \right]}$$
(6.2)

where M is the number of frequency bins, S[f] is the spectral amplitude at bin f and w_i[f] is the weighting function of the ith sub-band.
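Eq. (6.2) can be sketched for a single sub-band as below. The rectangular weighting function, the helper names and the hand-made amplitude spectrum are assumptions chosen to keep the arithmetic easy to check; the source does not specify the shape of the weighting function.

```python
def rectangular_weights(n_bins, lo_bin, hi_bin):
    """Weight 1 inside [lo_bin, hi_bin] and 0 elsewhere (assumed window shape)."""
    return [1.0 if lo_bin <= k <= hi_bin else 0.0 for k in range(n_bins)]

def spectral_centroid_frequency(spectrum, weights, fs_hz, n_fft):
    """Amplitude-weighted mean frequency (Hz) of one sub-band, after Eq. (6.2)."""
    numerator = 0.0
    denominator = 0.0
    for k, (s, w) in enumerate(zip(spectrum, weights)):
        f_hz = k * fs_hz / n_fft   # centre frequency of bin k
        numerator += f_hz * s * w
        denominator += s * w
    return numerator / denominator if denominator > 0.0 else 0.0

FS = 512.0    # sampling frequency from the source
N_FFT = 64    # hypothetical FFT length, giving 8 Hz per bin

# Hand-made amplitude spectrum: two components inside the band, one outside.
spectrum = [0.0] * (N_FFT // 2 + 1)
spectrum[4] = 1.0    # 32 Hz
spectrum[6] = 3.0    # 48 Hz
spectrum[20] = 5.0   # 160 Hz, outside the band below

band = rectangular_weights(len(spectrum), lo_bin=2, hi_bin=10)
scf = spectral_centroid_frequency(spectrum, band, FS, N_FFT)
print(scf)   # (32*1 + 48*3) / (1 + 3) = 44.0
```

The component at 160 Hz does not influence the result because its weight is zero, which is how one centroid per sub-band is obtained from the same spectrum.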

Classification: Data preprocessing refers to the set of procedures used to prepare the raw data for further processing such as classification or clustering. The dataset derived from the features does not follow a uniform data distribution. Hence, the feature set is rescaled to a fixed range to enhance the performance of the classifier. The spectral features derived from the EEG signals are rescaled between ‘0’ and ‘1’ and associated with the class label to form the dataset for classification.
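The rescaling to the range 0 to 1 can be sketched with column-wise min-max normalization. The function names and the three example rows are hypothetical; the labels follow the convention used elsewhere in the text (‘0’ for drowsy, ‘1’ for alert).

```python
def min_max_scale(column):
    """Rescale one feature column linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)   # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in column]

def scale_features(rows):
    """Apply min-max scaling to every feature (column) of a feature matrix."""
    columns = list(zip(*rows))
    scaled_columns = [min_max_scale(c) for c in columns]
    return [list(r) for r in zip(*scaled_columns)]

# Hypothetical feature rows: [spectral centroid, spectral centroid frequency]
features = [[10.0, 0.2], [20.0, 0.4], [30.0, 0.8]]
labels = [0, 0, 1]   # 0 = drowsy, 1 = alert

scaled = scale_features(features)
dataset = list(zip(scaled, labels))   # each entry pairs a feature row with its class
print(dataset)
```

In a cross-validated experiment, the minimum and maximum would usually be estimated on the training folds only and then reused for the test fold, so that no information leaks from the test data.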

The support vector machine (SVM) model is a machine learning algorithm based on statistical learning theory. The SVM model is chosen in this study to classify the drowsy and alert states. In this method, each data item is plotted as a point in n-dimensional space, with the value of each feature being the value of a particular coordinate. The classification is performed by finding the hyperplane that separates the two classes. Hence, finding the right hyperplane is crucial in SVM. The right hyperplane is the one that maximizes the margin, i.e., the distance between the hyperplane and the nearest data points. In a nonlinear separation problem, the kernel method is used to transform the low-dimensional input space into a higher-dimensional space. The ‘fitcsvm’ function in MATLAB is used to model the SVM. The ‘radial basis function’ kernel is used to classify between the drowsy and alert classes.
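The margin maximization and the kernel mapping described above can be written compactly. What follows is the standard textbook formulation rather than anything stated in the source; the symbols w, b, x_i, y_i, N and γ are introduced here for illustration, and the class labels are assumed to be mapped to y_i ∈ {−1, +1}. For linearly separable training points, the hyperplane w·x + b = 0 with the largest margin 2/‖w‖ solves

$$\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2}\quad {\text{subject to}}\quad y_{i}\left( w^{\top}x_{i} + b \right) \ge 1,\quad i = 1,\ldots,N$$

For a nonlinear problem, the inner products between feature vectors are replaced by a kernel function. The radial basis function kernel has the form

$$K\left( x, x' \right) = \exp\left( -\gamma\lVert x - x'\rVert^{2} \right),\quad \gamma > 0$$

so that two frames with similar feature vectors give a kernel value close to 1, and dissimilar frames give a value close to 0. The width parameter γ controls how quickly the similarity decays with distance.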

The K-nearest neighbor (KNN) algorithm is one of the simplest supervised machine learning algorithms and is easy to implement. The value of K is crucial in determining the class boundaries, which become smoother as K increases. The training error rate and the validation error rate are the two parameters used to assess candidate values of K. The ‘fitcknn’ function in MATLAB is used to model the KNN classifier. The ‘minkowski’ method is used as the distance metric, and the number of neighbors is chosen to be 3 in this classification method.
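A KNN classifier with a Minkowski distance and three neighbors can be sketched in a few lines. This is an illustration of the technique, not the ‘fitcknn’ implementation: the function names and the toy feature vectors are hypothetical, and the exponent p = 2 is an assumption because the source does not state it.

```python
from collections import Counter

def minkowski(a, b, p=2.0):
    """Minkowski distance between two feature vectors (p = 2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_predict(train_x, train_y, query, k=3, p=2.0):
    """Predict the class of query by majority vote among its k nearest neighbors."""
    order = sorted(range(len(train_x)),
                   key=lambda i: minkowski(train_x[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy normalized feature vectors (hypothetical): 0 = drowsy, 1 = alert.
train_x = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
           [0.80, 0.90], [0.90, 0.80], [0.85, 0.85]]
train_y = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(train_x, train_y, [0.20, 0.20], k=3)
pred_b = knn_predict(train_x, train_y, [0.70, 0.90], k=3)
print(pred_a, pred_b)
```

An odd K such as 3 avoids tied votes in a two-class problem, which is one practical reason for that choice.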

6.4 Results

The spectral features, namely the spectral centroid and the spectral centroid frequency, are extracted for each frame of the EEG signals. The resulting feature matrix is used to train the KNN and SVM classifiers. The K-fold cross-validation method is used to split the dataset into training and testing sets. The average classification accuracy of the KNN and SVM classifiers is calculated, and the results are tabulated in Table 6.2.
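The K-fold split can be sketched as follows. The number of folds is not given in the source, so K = 5, the sample count and the helper name `k_fold_indices` are assumptions made for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    splits = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits

N_SAMPLES = 10   # hypothetical number of feature rows
K = 5            # assumed number of folds

splits = k_fold_indices(N_SAMPLES, K)
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_no, sorted(test_idx))
```

Each sample is used for testing exactly once, and the reported accuracy is the mean of the K per-fold accuracies. Because overlapping frames from the same recording are highly correlated, splitting by subject rather than by frame is a stricter alternative worth considering.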

Table 6.2 Classification results

The results show that KNN outperforms SVM in accuracy, and the sensitivity of the KNN classifier is high.

6.5 Conclusion

This study performed the detection and classification of drowsiness, distinguishing between the alert and drowsy states using the five EEG channels. We used the framing method to segment the signal into equal frames of 2 s using a rectangular window with an overlap of 50%. The raw signal was trimmed using a band-pass filter to a maximum cut-off frequency of 50 Hz. Later, each frame was split into eight sub-bands using a band-pass filter bank. The log energy entropy and band power of each frame were extracted as features. The feature set was then rescaled using a bipolar normalization method. From the three trials, the feature sets for the drowsy class extracted from trial 2 and trial 3 were labeled as ‘0’, and the alert feature set extracted from trial 1 was labeled as ‘1’. The labeled feature sets were then combined to form the final dataset. The K-fold cross-validation method was used. The dataset was used to train two classifier models: K-nearest neighbor and support vector machine. The trained models were validated using the test dataset, and the performance of the classifiers on the two datasets was compared. Overall, the KNN classifier outperforms the other classifier, achieving 95.2 and 94.6%. The proposed frame-based approach can be used for other classification applications as well. The developed model can be applied to driver drowsiness classification and other drowsiness research.