
1 Introduction

Based on World Health Organization estimates, car accidents are responsible for nearly 1.35 million deaths each year [1], primarily due to driver drowsiness [2]. For this reason, several research studies aim to detect driver drowsiness through suitable sensors combined with Machine Learning (ML) and Artificial Intelligence (AI) algorithms. In some cases, cars already implement safety technologies that exploit on-board sensors to classify driving behavior by monitoring lane deviations together with steering wheel rotation, sometimes enriched by additional sensors, such as camera-based systems that capture the driver’s status. However, these systems aim to detect explicit signs of micro-sleeps or sleepy behavior (such as changes in visual facial descriptors, like the estimated distance between nose and mouth; blinks and longer eye closures; eye gaze location; face orientation), not drowsiness itself, which develops with decreasing levels of arousal that vary between individuals [3]. Stress, fatigue, and illness contribute to drowsiness, which impairs the driver’s cognitive capabilities and increases the risk of accidents [4]. Moreover, the above-mentioned approaches exhibit some degree of reliability but also suffer from limitations due to uncontrollable operating conditions: variable lighting, or glasses and sunglasses worn by the driver, may reduce the expected detection capability. Consequently, different and innovative solutions have been investigated to overcome these limitations.

Drowsiness influences the driver’s behavior and is associated with Autonomic Nervous System (ANS) activity, which is reflected by changes in physiological signals [2]. The Blood Volume Pulse (BVP) signal refers to the total amount of blood circulating within the arteries, capillaries, veins, venules, and chambers of the heart at any given time [5]. The BVP allows the estimation of Heart Rate Variability (HRV), which the literature reports as a useful indicator of both fatigue and drowsiness [6, 7]. The Skin Conductance (SC) signal is usually exploited to detect changes in a subject’s arousal: it varies with sweat gland secretion and can be decomposed into a slowly varying component, known as the Skin Conductance Level (SCL) [8], plus a second component featuring rapid changes in signal amplitude, known as the Skin Conductance Response (SCR) [9]. For both signals, gold-standard measurement methods require sensors in direct contact with the skin, applied at specific positions. The SC signal, in particular, should preferably be acquired on the fingers, the palms of the hands, or the soles of the feet, where sweat glands are most densely distributed. Unfortunately, the skin contact required by the sensors that acquire SC or Photoplethysmography (PPG) signals [10] is a major shortcoming of physiological approaches in the automotive environment, where normal driving conditions are incompatible with these optimal sensor positions.
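To make the SCL/SCR decomposition concrete, a minimal Python sketch follows. It assumes an SC signal sampled at 4 Hz (the E4 rate reported in Sect. 2.1) and approximates the SCL with a 10 s centered moving average, an arbitrary choice; the SCR is simply the residual. The function name `split_scl_scr` is hypothetical, and this is an illustration only, not the decomposition method used in this work: dedicated model-based decompositions separate the two components more accurately.

```python
import numpy as np


def split_scl_scr(sc, fs=4.0, window_s=10.0):
    """Crude split of a skin conductance signal into tonic and phasic parts.

    The SCL is approximated by a centered moving average over `window_s`
    seconds (an assumed value); the SCR is the residual, so scl + scr
    reproduces the input exactly.
    """
    sc = np.asarray(sc, dtype=float)
    win = max(1, int(round(window_s * fs)))      # window length in samples
    half = win // 2
    # Repeat the edge values so the output keeps the input length.
    padded = np.pad(sc, (half, win - 1 - half), mode="edge")
    scl = np.convolve(padded, np.ones(win) / win, mode="valid")
    scr = sc - scl
    return scl, scr


fs = 4.0                       # E4 SC sampling rate (Hz)
sc = np.full(400, 2.0)         # 100 s of a flat 2 µS baseline
sc[100:104] += 1.0             # 1 s burst mimicking a phasic response
scl, scr = split_scl_scr(sc, fs)
```

Here the burst appears almost entirely in the SCR (about 0.9 µS at its peak), while the SCL rises only by the burst's share of the 10 s average. A moving average also makes the SCR slightly negative around each burst, one reason why model-based methods are preferred in practice.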

For this reason, and driven by the growing market of IoT-enabled wearables, several studies have tested wrist-worn devices (or similar ones, such as bracelets [11] and double rings [12]) to collect physiological data that are then processed by classification algorithms to monitor drivers and detect drowsiness [4]. For example, Lee et al. assessed driving behavior using the built-in motion sensor of a smartwatch and achieved an accuracy of 98.15% with a Support Vector Machine (SVM) classifier [13]. In another work, the same authors combined the accelerometer signal with the PPG signal and obtained an accuracy of 95.80% [14]. Leng et al., instead, used a wearable device that measures both PPG and SC signals on the fingers; their SVM model reached an accuracy of 98.70% [15]. Similarly, Choi et al. developed a wrist-worn device with PPG, SC, temperature, acceleration, and gyroscope sensors, and used an SVM to distinguish normal, stressed, and drowsy states with an accuracy of about 85.00%. However, in that case an additional PPG signal is acquired on the ear for more robust feature extraction, which makes the system intrusive and less suitable for real-life driving applications [16]. In [17], we exploited the SC signal alone to assess driver drowsiness, obtaining an accuracy of 84.1%. Later, Li et al. [18] showed that a properly designed feature extracted from the SC signal could be used for driver status management and takeover-safety prediction in autonomous driving systems.

Wrist-worn devices are an appealing solution for driver monitoring, since they allow physiological signals to be acquired in a minimally invasive fashion, and future cars could interact with them directly, enabling a feasible and integrated approach to driver drowsiness detection. The device used in this work is the Empatica E4 wristband, which has obtained CE medical certification in Europe and was chosen because it provides signals of better quality than prototype devices, such as the purpose-built ones used in other research works [15].

The analysis of recent literature shows that almost all published works exploit ML or Deep Learning (DL) algorithms to process the data acquired from wearables and automatically classify the driver’s condition. It is therefore reasonable to investigate whether the way the collected data are pre-processed affects the attainable classification accuracy. In this respect, the research presented in this paper exploits a minimally invasive wrist-worn device to collect both BVP and SC physiological signals, with the aim of exploring the ANS activity related to changes in the driver’s arousal, which may be associated with drowsy behavior. The collected physiological time series are segmented using different window sizes before computing the features that feed three different ML algorithms, which detect and classify driver drowsiness. The impact of the chosen segment size is then assessed on the basis of the attained classification accuracy.

The paper is structured as follows: Sect. 2 describes the experimental setup and the acquisition protocol. Section 3 presents the performed data processing, while in Sect. 4 the obtained results are provided. Lastly, Sect. 5 draws the main conclusions and outlines possible future works.

2 Materials and Methods

2.1 Experimental Setup and Acquisition Device

For this study, a driving simulator was placed in a room with an average ambient temperature of about 23 °C. The temperature was kept fairly stable to reduce its influence on the user’s skin (e.g., sweating or cold body), and consequently on the physiological signals acquired during the experimental tests. The simulated route was an 80 km-long overnight drive on a three-lane highway. A picture of the whole experimental setup is shown in Fig. 1.

During the driving simulation, to keep the approach as realistic as possible, BVP and SC signals were collected with a single smart band, the Empatica E4 [19], worn on the driver’s dominant wrist. The E4 is a multi-sensor wrist-worn device able to detect changes in both the user’s cardiac activity and skin conductance through two dedicated embedded sensors: a PPG sensor and an SC sensor, respectively. The PPG sensor, equipped with 2 green LEDs and 2 red LEDs, extracts the BVP signal from the pulse waves with a proprietary algorithm, at a fixed sampling rate of 64 Hz. For the SC, a small alternating current (maximum peak-to-peak value of 100 µA, at a frequency of 8 Hz) flows through two Ag/AgCl electrodes located on the bottom side of the bracelet, and the skin conductance is acquired at a sampling frequency of 4 Hz, with a resolution of 900 pS and a dynamic range of 0.01–100 µS. Once an acquisition session ends, the data collected by all the sensors embedded in the E4 are stored in the Empatica Cloud, from which they can be downloaded for post-processing and analysis.
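As an aid to the post-processing step, a small Python loader for a downloaded E4 channel is sketched below. It assumes the single-column CSV layout of E4 exports (first row: session start as a UNIX timestamp in UTC; second row: sampling rate in Hz; then one sample per row) and reads only the first column. The helper name `load_e4_channel` and the file contents are hypothetical; the actual export should be checked before relying on this layout.

```python
import os
import tempfile
from datetime import datetime, timezone


def load_e4_channel(path):
    """Load a single-channel E4 CSV export such as EDA.csv or BVP.csv.

    Assumed layout: row 1 = session start (UNIX timestamp, UTC),
    row 2 = sampling rate in Hz, remaining rows = one sample each.
    Only the first column is read.
    """
    with open(path, encoding="utf-8") as f:
        rows = [line.strip() for line in f if line.strip()]
    start = datetime.fromtimestamp(float(rows[0].split(",")[0]), tz=timezone.utc)
    fs = float(rows[1].split(",")[0])
    samples = [float(r.split(",")[0]) for r in rows[2:]]
    times = [i / fs for i in range(len(samples))]   # seconds from session start
    return start, fs, times, samples


# Tiny synthetic file in the assumed layout (values are made up).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1600000000.000000\n4.000000\n0.000000\n0.105000\n0.110000\n")
start, fs, times, eda = load_e4_channel(tmp.name)
os.remove(tmp.name)
```

At the stated rates, a 40 min session (2400 s) corresponds to 64 × 2400 = 153,600 BVP samples and 4 × 2400 = 9,600 SC samples.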

Fig. 1.

The experimental setup used for the acquisitions, including the driving simulator, the tablet displaying the KSS for the subject’s self-assessment, and the E4 worn by the subject during the tests.

2.2 Data Acquisition Protocol

Nine healthy subjects, five women and four men, were recruited for the test procedure. Since physiological data vary with age and gender, drivers were chosen to cover the 28–60 years age range, which includes the majority of the active driver population. Moreover, to avoid gender bias, we selected a male and a female driver in each 10-year age cohort, from 20 to 60 years of age. After receiving information about the study, the participants signed the informed consent. They were then given the Empatica E4 device, to be worn on the dominant wrist during the six simulated driving sessions (each about 40 min long), to acquire the BVP and SC signals simultaneously. Samples of the BVP and SC signals acquired over a whole 40 min session are given in Fig. 2.

Fig. 2.

Sample acquired BVP (left) and SC (right) signals during a whole session (40 min duration).

Every 10 min, the participants reported their perceived level of drowsiness on a tablet, using the 9-point Karolinska Sleepiness Scale (KSS) questionnaire [20]. As in other studies [21], we used the KSS self-assessment to quantify the participants’ subjective level of drowsiness.

Table 1. Features extracted from HRV and SC.

3 Data Processing

The quality of signals acquired from the wrist, and consequently the performance of driver drowsiness detection, can be strongly affected by the natural body movements performed while driving. To identify and reduce motion artifacts in the SC signals [22], Stationary Wavelet Transform (SWT) denoising with a Haar mother wavelet (7 decomposition levels) was implemented, following previous studies [23]. This approach first models the wavelet coefficients with a zero-mean Laplace distribution and then defines high and low thresholds to separate the clean SC signal from motion artifacts. Wavelet coefficients exceeding these thresholds are set to zero, and the inverse SWT then yields the denoised signal. For the BVP signal, an algorithm embedded in the Empatica E4 firmware (whose details are not disclosed by the manufacturer) removes motion artifacts by exploiting the signal measured under red LED exposure. Since HRV is a well-known drowsiness indicator [24], it was derived from the BVP signal by computing the inter-beat intervals (i.e., the time between two consecutive signal peaks). The resulting HRV and SC signals (both SCR and SCL components) were divided into windows of fixed length \( \varDelta \)t seconds, with an overlap of \( \varDelta \)t/2 s between consecutive windows. As explained later, different \( \varDelta \)t values were tested.
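The SWT denoising step can be sketched in NumPy as follows, with a Haar wavelet and 7 levels. This is not the implementation of [23]: the per-level thresholds are assumed to be symmetric quantiles of a zero-mean Laplace distribution whose scale is fitted by maximum likelihood (the mean absolute coefficient), with a hypothetical tail probability of 1% on each side. The transform uses periodic boundary extension, only detail coefficients are thresholded, and all function names are illustrative.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_swt(x, levels):
    """Stationary (undecimated) Haar transform with periodic extension."""
    approx = np.asarray(x, dtype=float)
    details = []
    for j in range(levels):
        d = 2 ** j                                   # filter dilation at level j + 1
        shifted = np.roll(approx, -d)                # approx[n + d]
        details.append((approx - shifted) / SQRT2)
        approx = (approx + shifted) / SQRT2
    return approx, details


def haar_iswt(approx, details):
    """Inverse of haar_swt, averaging the two redundant reconstructions."""
    for j in reversed(range(len(details))):
        d = 2 ** j
        det = details[j]
        from_n = (approx + det) / SQRT2                      # x[n] from the pair at n
        from_n_minus_d = np.roll((approx - det) / SQRT2, d)  # x[n] from the pair at n - d
        approx = 0.5 * (from_n + from_n_minus_d)
    return approx


def swt_laplace_denoise(x, levels=7, tail_prob=0.01):
    """Zero detail coefficients outside symmetric zero-mean Laplace quantiles."""
    approx, details = haar_swt(x, levels)
    cleaned = []
    for det in details:
        b = np.mean(np.abs(det))                     # ML estimate of the Laplace scale
        high = b * np.log(1.0 / (2.0 * tail_prob))   # P(D > high) = tail_prob
        low = -high                                  # P(D < low) = tail_prob
        cleaned.append(np.where((det < low) | (det > high), 0.0, det))
    return haar_iswt(approx, cleaned)               # approximation left untouched


n = np.arange(512)
clean = 2.0 + 0.3 * np.sin(2 * np.pi * n / 512)     # slow, periodic SC-like trend
noisy = clean.copy()
noisy[200:204] += 5.0                               # short motion-artifact spike
denoised = swt_laplace_denoise(noisy)
```

With these synthetic values, the spike is largely removed while samples far from it are reconstructed unchanged, because the smooth trend produces small detail coefficients that stay within the thresholds. Unlike some library SWT routines, this sketch does not require the signal length to be a multiple of 2^levels.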

Each data window was labeled with a drowsiness status according to the 9-point KSS scores reported by the participants. Specifically, the KSS responses were grouped into two classes: scores lower than or equal to 6 (from 1 to 6) in class 1 (labeled as awake), and scores greater than 6 (from 7 to 9) in class 2 (labeled as drowsy). Finally, a total of 32 features in both the time and frequency domains, listed in Table 1, were computed for each window. Some features were chosen because they had already been used in previous studies [25,26,27], others, such as the SC peaks, because of their significant information content [28, 29].
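The derivation of inter-beat intervals from the BVP and the computation of HRV features per window can be illustrated with the Python sketch below. The paper does not detail its peak detector, so a plain local-maximum detector with a 0.33 s refractory period (about 180 bpm, an assumed value) is used. The four features (mean IBI, mean heart rate, SDNN, RMSSD) are standard time-domain HRV quantities chosen for illustration; they are not claimed to match the 32 features of Table 1, and the input is a synthetic 72 bpm sinusoid.

```python
import numpy as np


def detect_peaks(bvp, fs=64.0, min_interval_s=0.33):
    """Local-maximum peak detector with an assumed refractory period."""
    x = np.asarray(bvp, dtype=float)
    min_gap = int(round(min_interval_s * fs))
    cand = np.where((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]))[0] + 1
    peaks = []
    for i in cand:
        if peaks and i - peaks[-1] < min_gap:
            if x[i] > x[peaks[-1]]:
                peaks[-1] = i          # keep the higher of two close maxima
        else:
            peaks.append(i)
    return np.array(peaks)


def hrv_time_features(ibi_s):
    """A few common time-domain HRV features (illustrative selection)."""
    ibi = np.asarray(ibi_s, dtype=float)
    return {
        "mean_ibi_s": float(np.mean(ibi)),
        "mean_hr_bpm": float(60.0 / np.mean(ibi)),
        "sdnn_s": float(np.std(ibi, ddof=1)),
        "rmssd_s": float(np.sqrt(np.mean(np.diff(ibi) ** 2))),
    }


fs = 64.0                                   # E4 BVP sampling rate (Hz)
t = np.arange(0, 30, 1 / fs)                # one 30 s window
bvp = np.sin(2 * np.pi * 1.2 * t)           # synthetic 72 bpm pulse wave
peaks = detect_peaks(bvp, fs)
ibi = np.diff(peaks) / fs                   # inter-beat intervals in seconds
features = hrv_time_features(ibi)
```

At 64 Hz, each IBI is quantized to multiples of 1/64 s (about 15.6 ms), so the small SDNN and RMSSD values obtained here reflect sampling jitter rather than physiology.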

As a first choice, \( \varDelta \)t = 30 s with a 15 s overlap was considered [25, 30]. This window size was chosen following [30], whose authors state that drowsiness can be detected over short periods. Then, to study the effect of the window length on the performance of the tested ML classifiers, the analysis was repeated with different segment sizes: windows of 15 s, 45 s, and 60 s were considered, each with a 50% overlap to preserve the condition applied in the first analysis.
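Segmentation with 50% overlap and window labeling can be sketched in Python as follows, assuming an SC channel at 4 Hz and a session of exactly 40 min. The labeling rule is a hypothetical choice, since the text does not state how windows are matched to the KSS reports issued every 10 min. Here each report is assumed to describe the preceding interval, so a window takes the class of the first report at or after its end. The KSS values are made up.

```python
import numpy as np


def window_starts(n_samples, fs, window_s):
    """Start indices of windows of window_s seconds with 50% overlap."""
    win = int(round(window_s * fs))
    step = win // 2                                   # 50% overlap
    return np.arange(0, n_samples - win + 1, step), win


def kss_to_class(kss):
    """Grouping from the text: KSS 1-6 -> class 1 (awake), 7-9 -> class 2 (drowsy)."""
    return 1 if kss <= 6 else 2


def label_windows(starts, win, fs, kss_times_s, kss_scores):
    """Hypothetical rule: use the first KSS report at or after the window end."""
    labels = []
    for s in starts:
        end_t = (s + win) / fs
        idx = int(np.searchsorted(kss_times_s, end_t, side="left"))
        idx = min(idx, len(kss_scores) - 1)
        labels.append(kss_to_class(kss_scores[idx]))
    return np.array(labels)


fs_sc = 4.0
n = int(40 * 60 * fs_sc)                              # 40 min of SC = 9600 samples
counts = {w: len(window_starts(n, fs_sc, w)[0]) for w in (15, 30, 45, 60)}

starts, win = window_starts(n, fs_sc, 30)
kss_times = np.array([600.0, 1200.0, 1800.0, 2400.0])  # one report every 10 min
kss_scores = [3, 5, 7, 8]                               # made-up answers
labels = label_windows(starts, win, fs_sc, kss_times, kss_scores)
```

With 50% overlap, a session of T seconds yields ⌊(T − Δt)/(Δt/2)⌋ + 1 windows. For T = 2400 s this gives 319, 159, 105, and 79 windows for Δt = 15, 30, 45, and 60 s, so shorter windows produce considerably more training instances per session.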

After data pre-processing and feature extraction, several ML algorithms were tested with the Classification Learner app in MATLAB. All the features were used, with 10-fold cross-validation on the input data. For each window length, the dataset was randomly divided into training and test sets, corresponding to 80% and 20% of the entire set, respectively. Since the two sets differ at each run, every algorithm was run 10 times for each window length, and the mean classification accuracy was then computed. To improve ML performance, features were normalized by their maximum value in each window, resulting in the range [0, 1]. Data analysis and feature extraction were performed in MATLAB. Figure 3 summarizes the different steps of the applied data processing.
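The evaluation loop (max-normalization, repeated random 80/20 hold-out, mean accuracy over 10 runs) is sketched below in Python rather than in MATLAB, under two stated assumptions. First, normalization divides each feature column by its maximum, which maps values to [0, 1] only for non-negative features. Second, a simple nearest-centroid classifier stands in for the Decision Tree, SVM, and Ensemble models, which are not reproduced here. The 10-fold cross-validation is omitted, and the data are synthetic.

```python
import numpy as np


def normalize_by_max(features):
    """Divide each feature column by its maximum (assumes non-negative features)."""
    f = np.asarray(features, dtype=float)
    col_max = f.max(axis=0)
    col_max[col_max == 0] = 1.0                  # avoid division by zero
    return f / col_max


def nearest_centroid_accuracy(x_tr, y_tr, x_te, y_te):
    """Stand-in classifier: assign each test sample to the closest class mean."""
    classes = np.unique(y_tr)
    centroids = np.array([x_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((x_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    y_pred = classes[np.argmin(dists, axis=1)]
    return 100.0 * np.mean(y_pred == y_te)


def repeated_holdout(x, y, runs=10, test_frac=0.2, seed=0):
    """Mean accuracy (%) over `runs` random train/test splits."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_frac * len(y)))
    accs = []
    for _ in range(runs):
        perm = rng.permutation(len(y))
        te, tr = perm[:n_test], perm[n_test:]
        accs.append(nearest_centroid_accuracy(x[tr], y[tr], x[te], y[te]))
    return float(np.mean(accs))


rng = np.random.default_rng(1)
awake = rng.normal(0.2, 0.01, size=(50, 32))     # 32 synthetic features per window
drowsy = rng.normal(0.8, 0.01, size=(50, 32))
x = normalize_by_max(np.vstack([awake, drowsy]))
y = np.array([1] * 50 + [2] * 50)                # 1 = awake, 2 = drowsy
mean_acc = repeated_holdout(x, y)
```

Averaging over several random splits reduces the dependence of the reported accuracy on one particular split. Because the split is drawn over windows, overlapping windows from the same session can fall in both sets, which tends to make hold-out accuracy optimistic.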

Fig. 3.

Graphical summary of the applied processing steps.

Table 2. Accuracy of different classifiers with the tested approaches.

4 Results

The classification performance was assessed in terms of the accuracy achieved by each algorithm for the different time window lengths used for data segmentation. The accuracy of a classification algorithm is defined as:

$$\mathrm{Accuracy} = (1 - \mathrm{ErrorRate}) \cdot 100 \tag{1}$$

where \(\mathrm{ErrorRate} = |N_{cci} - N_{ti}|/N_{ti}\), with \(N_{cci}\) and \(N_{ti}\) the numbers of correctly classified instances and total instances, respectively.
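Since \(N_{cci} \le N_{ti}\), the error rate equals \(1 - N_{cci}/N_{ti}\), so Eq. (1) reduces to \(100 \cdot N_{cci}/N_{ti}\). A minimal Python check of the definition follows; the function name and the label sequences (1 = awake, 2 = drowsy) are hypothetical.

```python
def accuracy_percent(y_true, y_pred):
    """Accuracy as in Eq. (1): (1 - ErrorRate) * 100,
    with ErrorRate = |N_cci - N_ti| / N_ti."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    n_ti = len(y_true)                                      # total instances
    n_cci = sum(t == p for t, p in zip(y_true, y_pred))     # correctly classified
    error_rate = abs(n_cci - n_ti) / n_ti
    return (1 - error_rate) * 100


acc = accuracy_percent([1, 1, 2, 2], [1, 2, 2, 2])   # 3 of 4 correct -> 75.0
```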

Table 2 summarizes the classification accuracy obtained for each time window length. Among the tested algorithms, the best-performing ones were Decision Tree, SVM, and Ensemble. For the 30 s window, i.e., the first length tested, the best result was achieved by the Ensemble classifier with an accuracy of 92.2%, followed by SVM with 89.3% and Decision Tree with 83.2%. Across all the analyzed window lengths, however, the best result was achieved by the Ensemble classifier with a 15 s window, at an accuracy of 93.0%. SVM and Decision Tree reached their highest accuracy, 90.0% and 84.7% respectively, with a 60 s window. These results are comparable to those of the previously discussed works [13,14,15,16].

Figure 4 shows the accuracy obtained for the different window lengths. The differences among the considered values are small, which suggests that the physiological onset of drowsiness is a slow phenomenon. Moreover, the consistently high accuracy and the limited performance variations make the results more robust, since they show that high classification accuracy can be obtained with several window lengths.

The results presented here, obtained on a dataset including only a few subjects, suggest that suitable classifiers can detect drowsiness in drivers. High classification accuracy was obtained from signals collected with a minimally invasive wrist-worn device, which may be an important step towards practical solutions for increasing driver safety.

Fig. 4.

ML accuracy of the tested algorithms for different time window lengths. The black line refers to the Ensemble classifier, the blue line to SVM, and the red line to the Decision Tree algorithm. (Color figure online)

5 Conclusions and Future Works

This study focused on detecting driver drowsiness from BVP and SC physiological variations, recorded through a wrist-worn device during a simulated overnight driving session. In particular, the authors attempted to distinguish the alert state from the drowsy one by testing three ML algorithms. Since BVP and SC signals are linked to ANS activity, 32 features were extracted from these physiological signals after segmenting them into time windows of 15 s, 30 s, 45 s, and 60 s, with an overlap equal to 50% of the window length. Although the dataset is limited, it covers a wide age range (28–60 years) and is nearly gender-balanced (4 males and 5 females). Table 2 and Fig. 4 show that the drowsiness detection capability of the tested classifiers is not substantially affected by the window length used for data segmentation: using a shorter window instead of a longer one does not significantly change the attainable accuracy. For two classifiers out of three (namely, Decision Tree and SVM), increasing the window length slightly improves the classification accuracy, which could support the idea that drowsiness onset is a slow process, better detected over a longer observation time. Overall, the obtained results indicate that driver drowsiness can be detected reliably.

In future works, we will perform driving tests on a larger number of drivers, including drivers aged 18–28 years and 60–80 years. In this way, the tested population will cover almost all the ages at which a driving license is allowed. We will study ML performance with longer segmentation windows (e.g., 1 min, 2 min, 5 min) to investigate how quickly the drowsiness level varies. We will also investigate the effects of feature normalization on the classifiers, to find the best data processing approach for driver drowsiness detection. Based on these studies, we will develop a custom algorithm to further improve detection accuracy.