1 Introduction

Driving is a highly complex activity that places considerable perceptual, physical, and cognitive demands on the driver [1]. Because the driver must remain aware of the environment, active attention plays a crucial role in safe driving. The World Health Organization estimates that vehicle collisions cause approximately 1.35 million deaths worldwide each year, and an even greater number of non-fatal injuries [2]. One of the leading contributors to this public health problem is drowsy driving, which accounts for 10–30% of all road accidents and is a major cause of traffic fatalities [3].

Several factors can contribute to driver drowsiness. The most frequent causes include sleep disorders and behavioral factors such as sleep deprivation or shift work [4]. Long driving hours and time of day have also been identified as factors that increase accident risk [5]. Professional drivers are therefore more susceptible to crashes. In a Portuguese study, more than 8 out of every 10 truck drivers reported driving while feeling sleepy [6]. In addition to their high driving exposure, many professional drivers work long hours, sometimes on irregular schedules that conflict with natural circadian rhythms [7]. As a result, a considerable sleep debt can accumulate. The working schedule also makes it harder to adopt a healthy lifestyle, including a balanced diet and regular exercise [8]. Professional drivers are consequently identified as a high-risk group for health conditions such as obesity and sleep apnea.

Technology has a key role in reducing the likelihood of accidents. Driving monitoring and assistance systems have been progressively integrated into vehicles to support a safe and comfortable driving experience [9]. Several commercial products are also available on the market, based on different measurement methods [10]. However, most current approaches focus on detecting an impaired driver state rather than predicting it [11]. It is therefore relevant to distinguish these two terms. The ideal goal should be to predict the onset of drowsiness since, at the detection point, drowsy driving may already have led to a potentially dangerous situation or even an accident [12].

These systems can be seen as a reactive approach to drowsiness events during driving. A preventive approach is also possible, by identifying the underlying cause of drowsiness. Sleep deprivation has increased globally in today’s fast-paced lifestyle, with sleep disorders affecting a substantial number of people [13]. In particular, insomnia affects approximately 10–15% of the general adult population [14], and obstructive sleep apnea 9–38% [15]. In this context, consumer products such as wearable devices are becoming widely available and can automatically analyze sleep patterns. However, these new systems are rarely validated against polysomnography, considered the gold-standard method for assessing sleep, to ensure their reliability and validity [16].

The solution proposed to increase road safety is a wrist-worn wearable device that can detect and predict drowsiness while the user is driving, and continuously screen for potential chronic sleep deprivation or sleep disorders. A flowchart of the system is shown in Fig. 1. As a step towards this goal, this work uses a public dataset for sleep staging based on heart rate variability (HRV) measured from electrocardiogram (ECG) signals. The preliminary results obtained will serve as a starting point for analyzing future wearable data.

Fig. 1. Representation of the proposed system.

The remainder of this paper is organized as follows: Sect. 2 reviews the literature; Sect. 3 describes the methodology used for sleep staging, with the results presented in Sect. 4; Sect. 5 presents the conclusions of the work and future directions.

2 Related Work

2.1 Measurement of Driver Drowsiness

Several techniques to estimate driver drowsiness have been proposed in the literature. According to the source of information, these methods can be classified into the following measures: subjective, behavioral, vehicle-based and physiological [17]. A hybrid approach that combines several methods can also be used.

Subjective measures include self-assessment and observer ratings [18]. The driver’s personal estimation is evaluated through scales such as the commonly used Karolinska sleepiness scale (KSS) [19], a nine-point scale that ranges from “extremely alert” to “very sleepy”, as shown in Fig. 2. During an experiment, the questionnaire is presented to the subject repeatedly, either at fixed time intervals or when certain conditions are met. For observer ratings, experts or trained individuals observe the driver in real time or through video recordings, using scales that focus on behavioral changes. As these measures are not practical in real driving conditions, they are mainly used as ground truth for drowsiness detection systems.

Fig. 2. Karolinska sleepiness scale (KSS).

Alternatively, behavioral measures use a camera and image processing techniques to monitor the driver. These methods mainly evaluate three parameters: facial expression, eye movements, and head position. Vehicle-based systems assess driving performance, with features such as steering wheel movement and deviation of lane position. The last category uses physiological signals, which include the following [10]:

  • Brain activity: captured by electroencephalography (EEG);

  • Ocular activity: measured by electrooculography (EOG);

  • Muscle tone: recorded using electromyography (EMG);

  • Cardiac activity: monitored through electrocardiography (ECG) and photoplethysmography (PPG) signals;

  • Skin conductance: measured by electrodermal activity (EDA).

All of these methods have limitations [11, 20]. Behavioral measures can be affected by the environment and driving conditions, such as changes in lighting and the use of glasses. Vehicle-based systems are highly dependent on road geometry and are often ineffective in conditions with substantial variation. Finally, physiological methods are limited by the intrusive nature of the sensors. Nevertheless, physiological data is considered reliable and accurate for measuring the driver’s functional state. It starts to change in the early stages of drowsiness and is therefore better suited to providing a timely alert. Thus, non-invasive strategies for recording these signals are required.

Over the years, the use of wearables has grown steadily. According to the International Data Corporation, global shipments of wearable devices reached 444.7 million units in 2020, a 28.4% increase over the previous year [21]. This market is driven particularly by fitness tracking and health monitoring with wrist-worn devices, such as smartwatches and fitness trackers [18]. Their wide user acceptance is associated with advantages such as low cost, comfort, and continuous recording of several physiological signals. These devices can be considered suitable for detecting driver drowsiness and are assessed further in the following sections.

2.2 Drowsiness Detection

The use of wrist-worn wearable devices for driver drowsiness detection has been explored by previous work. Table 1 summarizes existing studies, comparing the methodology adopted in terms of measures, algorithms and evaluation.

Table 1. Summary of research on driver drowsiness detection with measures collected from a wrist-wearable device. MVT–movement with accelerometer and gyroscope sensors; TMP–temperature; C–classes; Acc–accuracy; N–participants; SVM–support vector machine; CNN–convolutional neural network; KNN–k-nearest neighbors; and DS–decision stump. (*) detects drowsiness, stress and fatigue.

To record driver state, studies are conducted in simulated environments. The scoring is obtained with subjective metrics, whose levels are typically grouped into a reduced number of classes. The collected signals are mainly divided using a sliding window strategy, and model performance is evaluated with different forms of cross-validation (CV). Because physiological signals can differ greatly between individuals, tests that split the data by subject are crucial to evaluate the ability to generalize to new users.

The majority of studies use PPG sensors to derive HRV features. HRV refers to the variation in the time between successive heartbeats, called inter-beat intervals (IBIs), and represents a non-invasive measure of the autonomic nervous system [26]. IBIs are also commonly called RR intervals, i.e., the time between two successive R-peaks of the QRS complex on the ECG. HRV can be described in 24 h, short-term (~5 min), and ultra-short-term (<5 min) measurement periods, using time-domain, frequency-domain, and non-linear parameters [27]. Time-domain indices measure the amount of HRV that was observed. Frequency-domain values estimate the distribution of absolute or relative power into component bands. Finally, non-linear metrics quantify the unpredictability and complexity of the time series. Although multi-lead ECG devices are established as the gold standard for computing HRV, wearable devices based on single-lead ECG and PPG are considered a viable and popular alternative. The main drawback is that this type of sensing is more affected by motion artifacts, pressure disturbances, and skin pigmentation [28]. Nevertheless, noise and artifact reduction techniques can be used to improve signal quality. An overview of the typical methodology of HRV analysis for drowsiness detection systems is presented in Fig. 3.

Fig. 3. Overview of the typical methodology used for drowsiness detection, when performing heart rate variability (HRV) analysis on photoplethysmogram (PPG) signals collected from wrist-worn wearable devices.
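To make the time-domain category concrete, the sketch below computes three widely used indices (SDNN, RMSSD, and pNN50) from a series of RR intervals. It is an illustration only: the function name and the RR values are hypothetical, and the reviewed studies may use different feature sets or dedicated HRV libraries.

```python
import math
import statistics


def time_domain_hrv(rr_ms):
    """Compute a few standard time-domain HRV indices from RR intervals (ms)."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Differences between successive RR intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "mean_rr_ms": statistics.fmean(rr_ms),
        # SDNN: sample standard deviation of the RR intervals
        "sdnn_ms": statistics.stdev(rr_ms),
        # RMSSD: root mean square of the successive differences
        "rmssd_ms": math.sqrt(statistics.fmean([d * d for d in diffs])),
        # pNN50: percentage of successive differences larger than 50 ms
        "pnn50_pct": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }


# Hypothetical RR series in milliseconds, for illustration only
rr = [800, 810, 790, 860, 800, 795, 805]
features = time_domain_hrv(rr)
```

In a windowed analysis, a function of this kind would be applied to the RR intervals of each segment, yielding one feature vector per window.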

Results show that high accuracies can be achieved, but the datasets employed can introduce confounding factors. In HRV recordings, important subject variables that can affect measurements include age, sex, and health status [27]. Moreover, the association between measured signals and driver alertness is often established at an unknown circadian phase and wake duration [10]. The influence of inter-driver variance is reflected in lower accuracy values, which points to challenges that must still be addressed to develop a robust, yet comfortable and cost-effective, commercial drowsiness warning system. In this context, market products based on physiological signals have progressed less than driving and driver behavioral technologies [10]. Outside of research, a wrist-wearable device for this purpose is not yet available.

2.3 Drowsiness Prediction

For the task of predicting driver drowsiness, current research is still limited as there are no studies that consider wearable devices.

In [11], two independent models were developed using neural networks. Every minute, a detection model identifies the level of drowsiness, and a prediction model indicates the time required to reach a certain level of drowsiness (1.5 on a 0–4 scale). For that, physiological, behavioral, and vehicle-based indicators were investigated. The best performance was obtained with behavioral measures and additional information, namely, driving time and participant data. These models were able to detect and predict drowsiness with a mean square error of 0.22 and 4.18 min, respectively. However, inter-individual variability was only considered in [29]. To find a compromise between generalized and individual models, adaptive learning was used. The improvement in performance was significant from the first 3 min up to 15 min of input data, reaching about 40% in detection and 80% in prediction. Nevertheless, intra-individual variability was not addressed, that is, how regularly this adaptation would be necessary.

The studies above predict the drowsy state through the time remaining until a target level is reached. Other studies consider different approaches. In [30], logistic regression models were built to detect micro-sleep with 93% accuracy, considering the individual driver factor and eyelid measures. A specificity of 98% and a sensitivity of 67% were achieved, and performance did not change significantly when using different time intervals relative to the events (from 1 min to 10 min). In [31], an accelerated failure time model was developed to estimate the driving time before the onset of drowsiness. For that, environmental and demographic factors were used, such as time of day, temperature, travel speed, driving experience, age, and sleep habits. The proposed model provides an understanding of how driver drowsiness is influenced by these factors and could be used in real-time drowsiness warning systems.

In these studies, physiological measures were mainly collected in an intrusive manner, i.e., using electrodes. Therefore, a relevant direction is to investigate if similar results can be obtained in a simple and non-invasive way.

2.4 Sleep Staging

Sleep staging is essential to assess sleep and diagnose sleep disorders. This process involves segmenting a sleep period into 30 s epochs and assigning a sleep stage to each epoch [32]. According to the American Academy of Sleep Medicine (AASM) manual, sleep is divided into five stages: wake (W), rapid eye movement (REM), and three levels of non-REM (NREM) corresponding to N1, N2, and N3. Traditionally, sleep staging is performed by experts based on visual inspection of polysomnographic (PSG) recordings, which include multiple physiological parameters. Although it remains the gold standard for clinical assessment of sleep, PSG has some drawbacks: the scoring procedure is expensive, time-consuming, and prone to human errors [33]. Therefore, alternative methods and algorithms capable of accurately estimating sleep stages are needed.

To assess long-term sleep, actigraphy can be a useful tool [34]. This technique relies on a wrist-worn device that infers wake and sleep states by measuring movement through an accelerometer. Although it has some advantages, its cost and the need for specialized technicians are among the main factors leading to the consideration of consumer wearables. These devices use multi-sensor data acquisition and are not limited to binary sleep classification. Despite their widespread use, validation studies show that they tend to underestimate sleep disruptions and overestimate sleep efficiency, i.e., they prioritize sensitivity over specificity [35]. In particular, these measures ranged from 95–97% and 39–62%, respectively, in four commercial solutions analyzed [36]. It is important to note that the algorithms implemented in these self-tracking devices are not public, and raw sensor data is not accessible for external use. As a result, although promising for understanding sleep health, their application in sleep research and clinical sleep medicine is still limited [34]. Some recent studies are summarized in Table 2, considering different scoring resolutions.
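The trade-off between sensitivity and specificity can be illustrated with a small epoch-by-epoch comparison. The sketch below assumes the convention, common in sleep/wake validation, that sleep is the positive class; the function name and both hypnograms are hypothetical and do not come from the cited studies.

```python
def sleep_wake_agreement(reference, predicted):
    """Epoch-by-epoch sensitivity and specificity, with sleep ("S") as the positive class."""
    if len(reference) != len(predicted):
        raise ValueError("hypnograms must have the same number of epochs")
    pairs = list(zip(reference, predicted))
    tp = sum(r == "S" and p == "S" for r, p in pairs)  # sleep scored as sleep
    fn = sum(r == "S" and p == "W" for r, p in pairs)  # sleep scored as wake
    tn = sum(r == "W" and p == "W" for r, p in pairs)  # wake scored as wake
    fp = sum(r == "W" and p == "S" for r, p in pairs)  # wake scored as sleep
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both sleep and wake epochs")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference (PSG) and device hypnograms, 20 epochs each
psg = list("WWSSSSWWSSSSSSWSSSSW")
device = ["W"] + ["S"] * 18 + ["W"]  # device misses the brief awakenings
sensitivity, specificity = sleep_wake_agreement(psg, device)
```

In this toy case the device detects every sleep epoch but only two of the six wake epochs, which is the pattern of high sensitivity and low specificity described above.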

Table 2. Summary of research on sleep staging with measures collected from a wrist-wearable device. Classification is divided into two stages (wake/sleep), three stages (wake/NREM/REM), four stages (wake/light sleep (N1+N2)/deep sleep (N3)/REM), and five stages (wake/N1/N2/N3/REM). Results are presented in an accuracy/kappa format. N–participants; LDA–Linear Discriminant Analysis; BLSTM–Bidirectional Long Short-Term Memory; ANN–Artificial Neural Network.

Despite the differences among studies, classifiers achieve lower performance as the number of classes increases. A sequential model that considers the temporal dependencies of sleep is trained in [39] and [40]. This type of algorithm has also recently shown good results when HRV analysis is performed on single-lead ECG data [43]. The influence of factors such as demographics and environmental conditions on the signals recorded by the worn devices, and thus on their ability to stage sleep accurately, should not be underestimated [34]. Except in [40], datasets are limited to healthy adults, without additional validation for other age groups or sleep disorders.
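Sleep staging results are often reported with Cohen's kappa alongside accuracy because sleep stages are strongly imbalanced, and kappa corrects the observed agreement for the agreement expected by chance. The sketch below is a minimal implementation of the standard formula, kappa = (p_o - p_e) / (1 - p_e); the label sequences are hypothetical.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa between two label sequences of equal length."""
    n = len(reference)
    if n == 0 or n != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    # Observed agreement (this is the plain accuracy)
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    # Agreement expected by chance from the marginal label frequencies
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical night dominated by NREM, scored by a trivial "always NREM" classifier
reference = ["NREM"] * 8 + ["REM"] + ["W"]
predicted = ["NREM"] * 10
accuracy = sum(r == p for r, p in zip(reference, predicted)) / len(reference)
kappa = cohen_kappa(reference, predicted)
```

Here the trivial classifier reaches 80% accuracy while its kappa is zero, which shows why accuracy alone can be misleading when one stage dominates.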

3 Methodology

For the sleep staging task based on HRV from ECG signals, the public dataset “EEG/EOG/EMG data from a cross sectional study on psychophysiological insomnia and normal sleep subjects” [44] was used. The data consist of 8 h recordings from 22 subjects, aged between 18 and 63 years. Table 3 shows the epoch distribution of the normal subjects by sleep stage. For the experiments, the data are classified into Wake, REM, and NREM (grouping N1, N2, and N3). Data processing was performed with the Python programming language and the scikit-learn and pyhrv [45] libraries.
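The grouping into three classes amounts to a simple label mapping. The sketch below assumes the stages are available as the strings "W", "N1", "N2", "N3", and "REM"; the annotation codes actually used in the dataset files may differ, so the dictionary keys are hypothetical.

```python
# Hypothetical stage labels; the dataset's own annotation codes may differ.
STAGE_TO_CLASS = {
    "W": "Wake",
    "REM": "REM",
    "N1": "NREM",
    "N2": "NREM",
    "N3": "NREM",
}


def group_stages(epoch_labels):
    """Map per-epoch sleep stages to the three-class scheme (Wake, REM, NREM)."""
    return [STAGE_TO_CLASS[label] for label in epoch_labels]


grouped = group_stages(["W", "N1", "N2", "N3", "REM"])
```

An unknown label raises a `KeyError`, which makes unexpected annotations visible instead of silently assigning them to a class.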

Table 3. Distribution of segments by sleep stage. Stages N3 and N4 were merged into stage N3 according to the AASM manual.

The ECG signal was first synchronized in time with the PSG results, and the segments classified as movement were removed. For HRV analysis, the signal was divided into segments of 1.5 min, 2.5 min, 3.5 min, and 4.5 min, each centered on a 30 s epoch, with the goal of evaluating the impact of segment length on performance. After extracting the RR interval time series, segments were processed in the time, frequency, and non-linear domains, obtaining a total of 34 features. Two types of validation were applied, namely stratified 10-fold CV and leave-one-subject-out CV (LOSO-CV). In each iteration, each attribute of the training data was first normalized to a mean of zero and a standard deviation of one. Then, after selecting the best subset of features using Pearson’s correlation coefficient with a threshold of 0.9, the data was over-sampled with the SVM-SMOTE technique. Finally, performance measures were calculated as the average over all iterations, in particular the accuracy and the sensitivity of each class. Figure 4 illustrates the process described. The four classification algorithms tested were support vector machine (SVM), linear discriminant analysis (LDA), k-nearest neighbors (KNN) with 15 neighbors, and random forest (RF) with a maximum depth of 20. The remaining parameters were set to their default values.
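The segmentation step can be sketched as follows. The code assumes that R-peak times are already available in seconds and that epochs whose window would extend beyond the recording are skipped; this edge handling, like the function names, is an assumption made for illustration and is not stated in the text.

```python
EPOCH_S = 30.0  # scoring epoch length in seconds
WINDOW_LENGTHS_S = (90.0, 150.0, 210.0, 270.0)  # 1.5, 2.5, 3.5 and 4.5 min


def centered_window(epoch_index, window_s, recording_s):
    """Return (start, end) in seconds of a window centered on the given epoch.

    Returns None when the window would extend beyond the recording
    (assumed edge handling).
    """
    center = (epoch_index + 0.5) * EPOCH_S
    start, end = center - window_s / 2, center + window_s / 2
    if start < 0 or end > recording_s:
        return None
    return start, end


def rr_in_window(r_peak_times_s, start, end):
    """RR intervals (ms) between consecutive R-peaks that fall inside [start, end]."""
    peaks = [t for t in r_peak_times_s if start <= t <= end]
    return [(b - a) * 1000.0 for a, b in zip(peaks, peaks[1:])]


# A 1.5 min window around the second epoch of a 5 min recording
window = centered_window(1, 90.0, 300.0)
# Hypothetical R-peak times in seconds
rr = rr_in_window([0.0, 1.0, 2.0, 3.5], 0.5, 3.5)
```

With a 1.5 min window, each segment covers the labeled epoch plus one neighboring epoch on each side, so longer windows add context at the cost of mixing more neighboring stages into one segment.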

Fig. 4. Methodology adopted for sleep staging.
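The key property of this procedure is that normalization statistics and feature selection are derived from the training data of each iteration only. The pure-Python sketch below illustrates that idea for LOSO-CV under two assumptions: the correlation threshold is read as discarding any feature whose absolute Pearson correlation with an already kept feature exceeds 0.9, and over-sampling and classifier fitting are omitted. All names and values are hypothetical, and the original work used scikit-learn rather than this code.

```python
import math
import statistics


def loso_splits(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out one subject at a time."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def fit_zscore(rows):
    """Per-feature mean and (population) standard deviation of the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant features
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize rows with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm else 0.0


def select_uncorrelated(rows, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is <= threshold."""
    columns = list(zip(*rows))
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept


# Hypothetical data: feature 1 duplicates feature 0 up to scale
X = [[1.0, 2.0, 5.0], [2.0, 4.0, 1.0], [3.0, 6.0, 4.0],
     [4.0, 8.0, 2.0], [5.0, 10.0, 6.0], [6.0, 12.0, 3.0]]
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]

folds = []
for held_out, train_idx, test_idx in loso_splits(subjects):
    x_train = [X[i] for i in train_idx]
    x_test = [X[i] for i in test_idx]
    means, stds = fit_zscore(x_train)            # statistics from training data only
    x_train = apply_zscore(x_train, means, stds)
    x_test = apply_zscore(x_test, means, stds)   # test data reuses training statistics
    kept = select_uncorrelated(x_train)          # selection also uses training data only
    x_train = [[row[j] for j in kept] for row in x_train]
    x_test = [[row[j] for j in kept] for row in x_test]
    folds.append((held_out, kept, len(x_train), len(x_test)))
    # Over-sampling and classifier fitting would follow here, on x_train only.
```

Fitting the scaler or the feature selection on the full dataset before splitting would leak information from the held-out subject into training, which is why both steps sit inside the loop.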

4 Results

The results of sleep classification are presented in Table 4. In the first validation test (10-fold CV), a larger segment length increases the accuracy for all algorithms except LDA. Using a 4.5 min window, RF obtained the best performance, with a sensitivity to Wake, REM, and NREM of 81%, 71%, and 93%, respectively. In the subject-independent test (LOSO-CV), this approach proved insufficient to deal with individual variability. In the same setting, the REM sensitivity of RF drops to 18%. This can be explained by a significant difference in class distribution between subjects. In particular, the REM stage ranges from 1 to 118 instances per subject, and Wake from 7 to 331 instances. Therefore, to evaluate this type of scenario, a more comprehensive dataset is required.
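The effect of uneven class distributions on per-class sensitivity can be seen in a small example. The sketch below computes the sensitivity of each class per fold and averages over folds; the fold contents are hypothetical, and skipping classes that are absent from a fold is an assumption rather than the authors' stated procedure.

```python
from collections import Counter

CLASSES = ("Wake", "REM", "NREM")


def per_class_sensitivity(reference, predicted):
    """Fraction of reference epochs of each class that were predicted correctly."""
    totals = Counter(reference)
    hits = Counter(r for r, p in zip(reference, predicted) if r == p)
    return {c: hits[c] / totals[c] for c in CLASSES if totals[c] > 0}


def mean_over_folds(fold_scores):
    """Average each class over the folds in which that class occurs."""
    result = {}
    for c in CLASSES:
        values = [scores[c] for scores in fold_scores if c in scores]
        if values:
            result[c] = sum(values) / len(values)
    return result


# Hypothetical fold A: the held-out subject has a single REM epoch, which is missed
ref_a = ["Wake"] * 2 + ["REM"] + ["NREM"] * 5
pred_a = ["Wake", "NREM", "NREM"] + ["NREM"] * 5
# Hypothetical fold B: the held-out subject has four REM epochs, three detected
ref_b = ["Wake"] * 2 + ["REM"] * 4 + ["NREM"] * 4
pred_b = ["Wake", "Wake", "REM", "REM", "REM", "NREM"] + ["NREM"] * 4

fold_scores = [per_class_sensitivity(ref_a, pred_a),
               per_class_sensitivity(ref_b, pred_b)]
average = mean_over_folds(fold_scores)
```

A held-out subject with a single REM epoch can only yield a REM sensitivity of 0 or 1, so folds of this kind weigh heavily on the average and make the LOSO-CV estimate for minority classes unstable.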

Table 4. Results of the classification of 3 classes (accuracy and standard deviation), with different algorithms and window length.

5 Conclusion

The impact of drowsy driving is of recognized severity. This work reviews current solutions to address this problem, with a focus on wrist-wearable devices, which allow continuous long-term monitoring of multiple signals. In this context, a system that can detect, predict, and prevent driver drowsiness is proposed. Towards the final solution, sleep staging was performed with HRV analysis on ECG signals, using traditional machine learning algorithms.

Results show that a broader dataset is essential to improve performance on subject-independent tests. Future work will explore deep learning architectures and the inclusion of new signals.