
1 Introduction

Drivers and operators of transportation vehicles, including aerial vehicles [1, 2], must be professionally prepared to react quickly and efficiently on the roads. A long trip can take many hours, during which the driver is expected to remain attentive at all times. However, after a long period of driving, fatigue symptoms develop quickly and give way to drowsiness, which can cause the driver to fall asleep and provoke serious accidents. In fact, driver fatigue is one of the major factors in traffic accident causation, although its contribution to accidents is often underestimated in official reporting [3].

Statistics show that in a 24-h period about 38% of truck drivers exceed 14 h of driving, and 51% exceed 14 h of driving plus other non-driving work [4]. On many working days these drivers get less than 4 h of sleep, a favorable scenario for the appearance of fatigue symptoms. Driving in a state of fatigue causes the driver to lose attention and the ability to respond quickly and adequately to an unexpected or dangerous situation [5], and leads to lack of concentration, poorer decision-making and worsened mood [6].

In the field of professional driver fatigue [7], studies have been carried out on truck drivers [4, 8, 9], aircraft pilots [10, 11], car drivers [8, 10], taxi drivers [6] and bus drivers [9]. As methods to recognize fatigue and somnolence, neurophysiological measurements have been used, such as electroencephalography (EEG), electrooculography (EOG) and heart rate (HR) [10]. Psychological measures have also been used, such as mood [6], accumulated sleep [4, 6, 8] and stress [7].

In the field of software and signal analysis, systems capable of detecting driver fatigue have been developed through heart rate monitoring and grip pressure on the steering wheel [12]; other systems rely on electroencephalography (EEG) and electrooculography (EOG) monitoring in driving simulators [13]. In the field of computer vision [14,15,16,17], specifically for perception [18,19,20,21], recognition and monitoring of the eyes [22, 23], pupils and mouth [24, 25] in videos of drivers [26] have been used, as well as road detection [27], path planning [28], object detection [29], and real-time processing [11, 30, 31].

Our proposal for driver fatigue detection is to detect and monitor the shape of the eye. To do this, we first detect the person's face [32] in the video of an HD camera placed right in front of the driver. This is done using HOG [33, 34] with a linear SVM [35], which detects the movement vectors of the face and places it in a visible area on the PC screen. Then we build a facial landmark using a trained algorithm based on the 68-point facial landmark detector. Our intention is to delimit the eye area and build an eye landmark, from which we calculate the eye aspect ratio (EAR). With it we can determine whether the eyes are open or closed, the latter state being the characteristic used to detect drowsiness in drivers.

2 Related Works

Two important algorithms for face detection, and object detection in general, have been developed: Haar-like features [36] and HOG [37] by Dalal and Triggs. Both algorithms have been used in many applications and have generated more than 40 new approaches [38]. Several methods for face detection combine feature extraction algorithms (Haar [36], HOG [37], HOG-LBP [39]) with machine learning based on SVMs [39, 40] or AdaBoost [41]. Other, more robust and accurate methods use deep learning [42, 43].

On the other hand, for the detection of key points of interest [44,45,46] and the construction of a landmark along the shape of the face, a very important algorithm has been developed: the facial landmark detector by Kazemi and Sullivan [47]. Other shape detection methods include spatial fuzzy c-means clustering (s-FCM) [26], edge recognition [23] and LDA [24]. More robust methods have also been developed based on deep learning [48] and constrained neural fields [49].

For the detection of faces and people, different datasets have been created, such as ETH [50], focused on detection of locomotion patterns, and MIT-CBCL [51], based on edge detection. For the facial landmark algorithm there are three main pre-trained detectors, based on different datasets created by several research groups. The most common is the shape predictor with 68 key points trained on the 300-W dataset [52,53,54]. There is also a shape predictor with 194 key points trained on the HELEN dataset [55] and, most recently, one with 5 key points trained on FRGC [52, 54].

The combination of HOG linear SVM face detection and the facial landmark detector has been widely used for facial recognition [56], 3D facial scans [57] and pose estimation [58]. Other authors have restricted detection to certain areas of the face, for example the ocular zone, for eye blink detection [59], face alignment [60] and drowsiness detection in car drivers [61].

3 Our Approach

3.1 Face Detection

Our approach uses HOG [37] for face detection in real time. We chose this method because, although it is slower, it provides greater accuracy with fewer false positives than Haar cascades. We combined it with machine learning based on an SVM, through which we can train the tracking of the moving face and classify it with high accuracy.

For training we used the MIT-CBCL face database [51], from which we obtained the positive samples (P, what we want to detect) and the negative samples (N, what we do not want to detect). From both we extracted the HOG descriptors. Then we trained an SVM on the positive and negative samples. Finally, we applied the hard-negative mining technique: record the false positives, sort them by their classification probability, and re-train the classifier using them as additional negative samples (Fig. 1).
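The training-plus-mining loop above can be sketched in pure Python; the perceptron-style learner and the two-dimensional toy descriptors below are hypothetical stand-ins for the real linear SVM and real HOG features:

```python
# Toy sketch of hard-negative mining. A simple perceptron stands in
# for the linear SVM; 2-D vectors stand in for HOG descriptors.

def train(samples, labels, epochs=50, lr=0.1):
    """Train a linear classifier w.x + b with the perceptron rule."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):       # y is +1 (face) or -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                  # misclassified -> update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Positive (face-like) and negative (background-like) toy descriptors.
P = [[1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
N = [[-1.0, -0.8], [-0.9, -1.1]]
w, b = train(P + N, [1] * len(P) + [-1] * len(N))

# Hard-negative mining: scan extra negatives, collect the false
# positives and retrain with them added to the negative set.
extra_negatives = [[0.4, 0.6], [-1.2, -0.7], [0.5, 0.5]]
hard = [x for x in extra_negatives if predict(w, b, x) == 1]
if hard:
    N += hard
    w, b = train(P + N, [1] * len(P) + [-1] * len(N))
```

In the real pipeline the same loop runs over HOG descriptors of image patches, and the sorting by classification probability selects the hardest negatives first.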

Fig. 1. HOG features with SVM.

A very important step is obtaining the face bounding box (the (x, y)-coordinates of the face in the image), on which the facial landmark fitting is based. We applied the sliding-window technique to all the test images, extracting the HOG descriptors and applying the classifier to each window. The process iterates until a sufficiently high probability is detected, at which point the bounding box is recorded.
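A minimal sketch of the sliding-window search, where a hypothetical score() function stands in for "extract the HOG descriptor and apply the SVM classifier":

```python
# Sliding-window sketch: scan every window position, score it, and
# keep the highest-scoring bounding box.

def score(window):
    # Stand-in for the real HOG+SVM score: here, mean pixel intensity.
    return sum(sum(row) for row in window) / (len(window) * len(window[0]))

def detect(image, win=2, step=1):
    """Slide a win x win window over the image, keep the best box."""
    best, best_box = float("-inf"), None
    for y in range(0, len(image) - win + 1, step):
        for x in range(0, len(image[0]) - win + 1, step):
            window = [row[x:x + win] for row in image[y:y + win]]
            s = score(window)
            if s > best:
                best, best_box = s, (x, y, win, win)   # (x, y, w, h)
    return best_box, best

# Tiny synthetic "image" with a bright 2x2 blob at (1, 1).
image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
box, s = detect(image)   # box -> (1, 1, 2, 2)
```

In practice the window is also scanned over an image pyramid so that faces of different sizes are found.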

3.2 Facial Landmarks Detection

Facial landmarks are used to localize and represent salient regions of the face, such as the eyes, eyebrows, nose, mouth and jawline. In this context, our goal was to detect important facial structures using shape prediction methods.

The facial landmark detector is an implementation of Kazemi and Sullivan [47]. This method starts from a training set of labeled facial landmarks on images and priors on the probability of the distance between pairs of input pixels. We used the original pre-trained facial landmark detector with 68 (x, y)-coordinates. The indexes of the 68 coordinates can be visualized in the image below (Fig. 2).
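The 68 coordinates follow a fixed region layout (shown here 0-indexed, as in common implementations of this predictor):

```python
# Standard 0-indexed regions of the 68-point facial landmark layout
# (the convention used by predictors trained on iBUG 300-W).
FACIAL_LANDMARKS_68 = {
    "jaw":           range(0, 17),
    "right_eyebrow": range(17, 22),
    "left_eyebrow":  range(22, 27),
    "nose":          range(27, 36),
    "right_eye":     range(36, 42),
    "left_eye":      range(42, 48),
    "mouth":         range(48, 68),
}

# Sanity check: the regions partition all 68 points exactly once.
all_points = sorted(i for r in FACIAL_LANDMARKS_68.values() for i in r)
assert all_points == list(range(68))
```

Given the full list of 68 predicted points, slicing with these ranges yields the points of any single facial region, which is how the eye region is isolated later.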

Fig. 2. Visualizing the 68 facial landmark coordinates from the iBUG 300-W dataset.

These annotations are part of the iBUG 300-W dataset [52,53,54], on which the facial landmark predictor was trained.

Eye Aspect Ratio (EAR).

To detect drowsiness, we analyzed the driver's eyes using a metric called the eye aspect ratio (EAR), introduced by Soukupová and Čech [59]. This method is fast, efficient, and easy to implement.

We extracted the ocular structure from the facial landmark. Each eye is represented by 6 (x, y)-coordinates, starting at the left corner of the eye and working clockwise around the remainder of the region (Fig. 3: Top-left).

Fig. 3. Top-left: A visualization of the eye landmarks when the eye is open. Top-right: Eye landmarks when the eye is closed. Bottom: The eye aspect ratio (EAR) of Eq. (1) plotted for several frames of a video sequence.

The EAR relates the height and the width of these coordinates, and is calculated by the following equation:

$$ EAR = \frac{\left\| p_{2} - p_{6} \right\| + \left\| p_{3} - p_{5} \right\|}{2\left\| p_{1} - p_{4} \right\|} $$
(1)

where $p_{1}, \ldots, p_{6}$ are the 2D facial landmark locations.
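Equation (1) translates directly into code; the two example eyes below are synthetic coordinates for illustration, not real landmark data:

```python
# Eye aspect ratio from Eq. (1): six landmarks per eye, p1..p6 ordered
# clockwise from the left corner (stored 0-indexed as p[0]..p[5]).
from math import dist  # Euclidean distance, Python 3.8+

def eye_aspect_ratio(p):
    """p: list of six (x, y) landmark tuples for one eye."""
    vertical = dist(p[1], p[5]) + dist(p[2], p[4])   # ||p2-p6|| + ||p3-p5||
    horizontal = 2 * dist(p[0], p[3])                # 2 ||p1-p4||
    return vertical / horizontal

# Synthetic examples: a wide-open eye and a nearly closed eye.
open_eye   = [(0, 0), (1, 2), (2, 2), (3, 0), (2, -2), (1, -2)]
closed_eye = [(0, 0), (1, 0.2), (2, 0.2), (3, 0), (2, -0.2), (1, -0.2)]
```

The open eye yields a large ratio and the closed eye a small one, which is exactly the behavior the threshold of the next section exploits.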

The EAR is approximately constant while the eye is open, but rapidly falls toward zero when the eye closes. With this information we can tell whether the driver has kept his eyes closed for a considerable time, a clear sign of drowsiness (see Fig. 3, bottom).

We applied a threshold to decide whether the eye is considered open or closed. The threshold is a value between 65% and 70% of the observed EAR range:

$$ TH = 0.65\left( EAR_{max} - EAR_{min} \right) + EAR_{min} $$
(2)

The calibration was based on both eyes: the threshold values found individually are averaged, because the program has trouble detecting the blinking of a single eye.
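A sketch of this calibration, using hypothetical EAR samples for each eye:

```python
# Per-eye calibration from Eq. (2), averaged over both eyes because
# single-eye blinks are detected unreliably.
def eye_threshold(ear_values, k=0.65):
    """TH = k*(EAR_max - EAR_min) + EAR_min for one eye's samples."""
    return k * (max(ear_values) - min(ear_values)) + min(ear_values)

def calibrate(left_ears, right_ears):
    """Average the two per-eye thresholds."""
    return (eye_threshold(left_ears) + eye_threshold(right_ears)) / 2

# Hypothetical EAR samples recorded during a calibration phase
# (each list should include at least one blink, i.e. a low value).
left  = [0.35, 0.33, 0.10, 0.34]
right = [0.31, 0.29, 0.12, 0.30]
th = calibrate(left, right)
```

The factor k can be raised toward 0.70 to make the detector more sensitive to partial closures.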

Finally, we implemented a timer to avoid triggering events on involuntary blinks. Unless the eye stays closed for a certain time, no action is triggered.

3.3 Drowsiness Detection Algorithm

First, we detect the driver's face with HOG linear SVM and obtain the face bounding box. Then we detect the facial landmarks using the original pre-trained detector with 68 (x, y)-coordinates. We delimit the ocular zone of the resulting facial structure, i.e., we obtain the eye landmarks. Finally, we calculate the EAR and perform the respective calibration for each frame of the video.

To detect drowsiness, we set a threshold of 0.3 for the EAR value. If this metric stays at or above 0.3, the driver is awake. If it falls below 0.3 and stays there for more than one second (approximately 40 consecutive video frames), the driver shows symptoms of drowsiness and is falling asleep at the wheel. In the second case, an alarm is triggered to wake the driver.
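The decision rule can be sketched as a simple consecutive-frame counter:

```python
# Frame-counter sketch of the decision rule: EAR below 0.3 for about
# 40 consecutive frames (~1 s of video) triggers the alarm.
EAR_THRESHOLD = 0.3
CONSEC_FRAMES = 40

def drowsiness_alarm(ear_stream):
    """Yield True for every frame on which the alarm should sound."""
    below = 0
    for ear in ear_stream:
        below = below + 1 if ear < EAR_THRESHOLD else 0
        yield below >= CONSEC_FRAMES

# Synthetic EAR sequences: a short blink (5 low frames) must not fire
# the alarm, while a sustained eye closure (45 low frames) must.
blink  = [0.35] * 10 + [0.15] * 5 + [0.35] * 10
asleep = [0.35] * 10 + [0.15] * 45
```

Resetting the counter on any single open-eye frame is what makes ordinary blinks harmless.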

This is presented graphically in Fig. 4.

Fig. 4. Algorithm solution flow diagram.

4 Results and Discussion

To test our algorithm we performed several trials in a car. We placed a high-resolution webcam on the upper part of the car dash, connected to a standard computer, and made sure the camera clearly focused on the driver's face (Fig. 5).

Fig. 5. Left: Mounting the camera on our car dash for drowsiness detection. Right: Using a PC to run the drowsiness detection algorithm.

We used a PC because it is the most convenient device for running tests and evaluating the performance of our algorithm. For a real application, a microcontroller compatible with the algorithm, able to execute it without difficulty, would be used.

Once the test devices were ready, we ran the algorithm and drove normally. The eye landmarks were detected and tracked reliably, and the EAR was calculated continuously in real time (Fig. 6, left). When the person pretended to fall asleep, the EAR dropped below 0.3 for a little more than a second, after which the warning alarm was activated, waking and alerting the person (Fig. 6, right).

Fig. 6. Left: Detection of a driver's eyes with normal gestures. Right: Detection of a drowsy driver.

The parameters considered in evaluating the performance of the algorithm were the EAR and the number of false positives.

Figure 3 clearly shows that the EAR under normal conditions is equal to or greater than 0.3, which is within the usual range. On the other hand, the EAR drops below 0.3 when the driver narrows his eyes; in this case the ratio was 0.16. The exact value also depends on the geometry of the eye.

Ten drivers participated in the experiment, with five tests performed per driver. The numbers of true positives and false negatives were counted in 3 situations. The data found are shown below.

When the driver looks down, the algorithm recognizes this as a sign of drowsiness. Something similar happens under very abrupt changes in brightness, such as when the driver faces the light.

From the data in Table 1 we can calculate the sensitivity of the system:

Table 1. Counting of true positives (TP) and false negatives (FN).
$$ Sensitivity = \frac{True\ positives}{True\ positives + False\ negatives} $$
(3)

We obtain the table below (Table 2):

Table 2. Sensitivity of the System.
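As a sketch, Eq. (3) can be computed as follows; the counts used here are hypothetical, not the actual values of Table 1:

```python
# Sensitivity from Eq. (3). The counts below are illustrative only,
# not the values reported in Table 1.
def sensitivity(tp, fn):
    """TP / (TP + FN)."""
    return tp / (tp + fn)

sens = sensitivity(tp=47, fn=3)   # hypothetical: 47 hits, 3 misses
```

With these hypothetical counts the sensitivity is 0.94; the paper's per-situation values are those of Table 2.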

This is an acceptable sensitivity, although it could be increased with filters and a more sophisticated algorithm structure.

The best situation for our algorithm is a cloudy day, where it reaches a sensitivity of 98%. The recognition of the ocular zone was carried out effectively, with the landmark visible on the eye contours of the detected person. The method was very stable and robust: detection ran without errors for a long period of time and is not sensitive to sudden movements. It worked so well that it even detected the shape of the eyes through eyeglasses. The number of frames allowed before activating the alarm was adequate: it is not so short that a blink is taken as a sign of drowsiness, nor so long that the driver sleeps long enough to produce an accident. The same can be said of the eye aspect ratio: 0.3 was the most appropriate value, i.e., the best threshold between alertness and drowsiness.

5 Conclusions and Future Works

Our system uses a predictor file and a detection algorithm to locate facial landmarks, focusing on the eyes, the main area that shows symptoms of fatigue in a driver. By means of a timer we set the time for which the EAR must remain reduced before determining that the driver has fallen asleep and emitting an alarm. At first we used the Haar algorithm, but because it produced many false positives and was not robust, we opted for landmarks, which through a simple mathematical relationship yield an EAR that decides whether the driver is awake or asleep.

The detection of the eye contour by the landmark detection method was effective in terms of robustness and precision. This was evidenced by the continuous recognition of the eyes of the detected person, even with changes in luminosity and sudden movements, and even when the person is in profile. A good feature of the algorithm is that it performs continuous recognition and detection over long periods of time, thanks to the while loop programmed for this purpose, which only breaks when the program is completely closed.

A very suitable application is the detection of drowsy drivers, since it would prevent accidents caused by drivers falling asleep at the wheel. This represents a high social impact, as it would prevent many accidents and save lives. As future work, we aim to improve the algorithm so that it is immune to the lack of illumination and to the inaccuracy that occurs when the driver turns the head or extends and contracts the neck. The final objective is to apply the algorithm in a real situation: we intend to set up or simulate a scenario in which the driver starts to feel fatigued and, through the algorithm, the car begins to stop.