1 Introduction

Ten to thirty per cent of car crashes are due to driver fatigue [1]. In Egypt, driver fatigue is one of the main causes of truck accidents. Truck drivers drive continuously for long hours [2] and, because of lack of sleep, lose track of the road while driving. Unfortunately, there is no clear policy for controlling this behaviour other than control points in different places. There is also an absence of definitive criteria for establishing the level of fatigue [2].

Recently, embedded systems have been introduced in cars to assess driver fatigue automatically. The Lexus Toyota car [3] has a driver monitoring safety feature: an infrared camera attached to the steering wheel, whose beams monitor the driver’s eye attention. This feature is limited to one car brand and is not available in trucks [3].

There are also attachable products for cars that detect the size and direction of the pupils, such as MR688 [4] and RVS-350 [5]. These products are quite expensive, and their quality has not yet been tested with the public. The challenge is to develop a low-cost real-time system that detects the features of driver fatigue. Using more than one measure, such as eye closure and yawning, can increase the accuracy of the system. Additionally, the system needs to work in different light conditions and recognize the facial features of people in different age groups. It should also identify the eyes of drivers who wear reading glasses, and the mouth even when it is surrounded by a beard and a moustache.

This paper presents a non-intrusive system that applies image processing techniques to video. The system tracks the driver’s current state. It first recognizes the driver’s facial features, then infers the symptoms of fatigue, and finally classifies the state as fatigue or non-fatigue using a support vector machine (SVM). The paper starts by discussing related work on detecting driver fatigue. This is followed by the proposed system and its evaluation in different conditions. It ends by discussing the results and conclusions.

2 Related Work

2.1 Current Systems

The increase in the number of accidents [1] raises the importance of developing systems that detect driver fatigue. Work in this area uses intrusive devices, non-intrusive devices, or a hybrid prototype of both categories to increase accuracy. Intrusive systems use sensors attached to the human body to detect physiological fatigue signals from the breathing rate, heart rate, brain activity, muscles, body temperature, and eye movement. ‘The physiological signals start to change in earlier stages of drowsiness’ [6].

Devices such as the electroencephalogram (EEG), electrocardiogram (ECG), and electrooculogram (EOG) are attached to the body to read these signals. ‘Electrocardiogram (ECG) signals of the heart vary significantly between the different stages of drowsiness such as alertness and fatigue’ [7]. EEG signals of the brain waves are categorized as delta, theta, alpha, beta, and gamma. A decrease in the alpha frequency band and an increase in the theta frequency band indicate drowsiness [6, 7].

‘Electrooculography (EOG) detects the electric difference between the cornea and the retina that reflects the orientation of the eyes. It identifies the rapid eye movements (REM) which occur when a subject is in a drowsy state’ [6]. These systems need to be attached to the human body to take readings, and this is the main challenge in implementing them. They cause discomfort to drivers, especially when attached for a long time.

Non-intrusive systems use devices or sensors that are attached to the car. For example, a camera captures the driver’s eye closure, head nodding, head orientation, and yawning [7], or sensors attached to the side of the car detect the driver’s deviation from the driving lane [7].

Some of these systems are embedded in cars. For example, Ford [8], Volkswagen [9], and BMW [10] have alert systems that work based on driving behaviour. Ford, for instance, has sensors attached to both the right and left sides of the car to detect road lanes. It alerts the driver if there is a sudden change and a deviation from the current lane. The Volkswagen system senses sudden steering towards other lanes and how fast the driver steers. The BMW system has more features, such as forward collision warning, pedestrian warning, and city collision mitigation. These systems are limited to certain car brands and types. They are not yet used widely enough to assess their accuracy and performance.

2.2 Image Processing Techniques

Research continues to improve the accuracy of these systems. In the domain of image processing, more effort is being devoted to improving the detection of the facial features of fatigue and to selecting the best classifiers to differentiate between fatigue and non-fatigue states. Learning-based algorithms such as Haar-like [11], Eigen-/Fisher face [12], LBP map [13], Gabor-based template [14], and HOG [15] are implemented to extract the eyes or the mouth.

Based on the algorithm used to extract the eyes and the mouth, a further algorithm is developed to identify eye closure or yawning. Percentage of eyelid closure (PERCLOS) is one of the algorithms used to measure eye closure. ‘It counts the number of video frames in which there was no eye pupil detected and dividing this by the total number of frames for a specific time interval’ [16].
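The PERCLOS definition quoted above can be illustrated with a minimal sketch. It assumes that an upstream detector (not shown) has already produced one Boolean per frame stating whether a pupil was found; the function name `perclos` and the sample interval are hypothetical and are not taken from [16].

```python
def perclos(pupil_detected):
    """Fraction of frames in the interval in which no pupil was detected."""
    flags = list(pupil_detected)
    if not flags:
        raise ValueError("the interval contains no frames")
    # Count the frames where the (assumed) upstream detector found no pupil.
    missing = sum(1 for seen in flags if not seen)
    return missing / len(flags)


# Hypothetical 10-frame interval: the pupil is missing in 3 frames.
frames = [True, True, False, False, True, True, True, False, True, True]
print(perclos(frames))  # 0.3
```

In practice the interval length and the rule for deciding that a pupil is "not detected" would both need to be chosen for the camera and frame rate in use.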

The primary difficulty of real-time estimation of PERCLOS is that the algorithm should provide good accuracy under different lighting conditions, changing image backgrounds, and camera shake due to the car’s motion, for instance. Then, a classifier such as a support vector machine (SVM), convolutional neural network (CNN), artificial neural network (ANN), or nearest neighbour is selected to differentiate between the fatigue and non-fatigue states.

Selecting the right learning-based algorithm is not easy. Each one has its pros and cons. For example, Haar-like features [11] recognize regions that are darker than the skin. For this reason, the approach does not work well if the face is not upright or if the light conditions change, and it can miss the eye or mouth regions.

The histogram of oriented gradients (HOG) [15] is a feature descriptor that extracts certain features and ignores other unimportant information. According to Dalal and Triggs [17], who used HOG for human detection, HOG detected humans with minimal false-positive results.

The accuracy of these algorithms was tested either with images of faces in different states of fatigue or with videos. A few were tested with real-time streaming video. The evaluations were also done in controlled environments, such as labs with driving simulators. A few have been tested outdoors.

To gain more confidence in these techniques, their accuracy needs to be tested in different conditions, such as day and night lighting and with subjects wearing reading glasses. These conditions affect the accuracy of extracting the facial features and of measuring eye blinks. A mouth surrounded by a beard and a moustache can influence the accuracy of measuring yawning.

This work develops a system that detects the facial features and extracts fatigue measures, and tests this system in different conditions.

3 Proposed System

Figure 1 shows the system activity diagram. It starts with face detection and recognition of the landmarks, i.e. the eyes and mouth. Then, it calculates eye blinks from the number of eyelid closures, and yawning from mouth opening.

Fig. 1 The system activity diagram

3.1 Face Landmarks Detection

The driver fatigue detection system first detects the facial features in each frame. Second, the eye blinks and the yawning are calculated every 1 min.

These steps were developed over a number of trials. To detect the facial features in real time, first, the Haar-like algorithm was tested with Java and MATLAB. The algorithm was affected by different lighting conditions and did not detect tilted faces, i.e. face movements.

Second, another feature-tracking algorithm, Kanade–Lucas–Tomasi (KLT) [17], was implemented with MATLAB and Java. KLT tracks a set of points and detects the corners within the face bounding box using the minimum eigenvalue algorithm. Neither the MATLAB nor the Java implementation solved the problem of lighting variations. Additionally, the Java implementation was too slow to process each frame.
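The minimum eigenvalue criterion mentioned above can be sketched as follows. This is an illustrative pure-Python version, not the MATLAB or Java code used in the trials: it scores a small greyscale patch by the smaller eigenvalue of its 2x2 gradient structure matrix, using central differences. The function name `min_eigenvalue` and the three test patches are hypothetical.

```python
import math


def min_eigenvalue(patch):
    """Smaller eigenvalue of the 2x2 gradient structure matrix of a grey patch."""
    rows, cols = len(patch), len(patch[0])
    sxx = sxy = syy = 0.0
    # Accumulate gradient products over the interior pixels of the patch.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (patch[r][c + 1] - patch[r][c - 1]) / 2.0
            gy = (patch[r + 1][c] - patch[r - 1][c]) / 2.0
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    # Closed-form smaller eigenvalue of [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    return mean - math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)


flat = [[5] * 5 for _ in range(5)]                      # no texture
edge = [[0, 0, 9, 9, 9] for _ in range(5)]              # vertical edge only
corner = [[9 if r >= 2 and c >= 2 else 0 for c in range(5)] for r in range(5)]

for name, patch in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, min_eigenvalue(patch))
```

The flat patch and the pure edge both score zero, while the corner patch scores clearly higher, which is why the smaller eigenvalue is a useful test for points that can be tracked reliably in both directions.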

Third, the histogram of oriented gradients (HOG) [15], a feature descriptor, was tried. ‘It counts the occurrences of gradient orientation in localized portions of an image’ [15].
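To make the quoted description concrete, the following sketch computes the orientation histogram for a single cell, which is the core step of HOG. It is a simplification under stated assumptions: nine unsigned orientation bins over 0 to 180 degrees (a common choice, not confirmed by the source), votes weighted by gradient magnitude, and no interpolation between bins or block normalisation. The function name `cell_histogram` is hypothetical.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)."""
    hist = [0.0] * bins
    width = 180.0 / bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixel, no orientation to vote for
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // width), bins - 1)] += magnitude
    return hist


# Hypothetical 4x4 cells: intensity rising left-to-right, then top-to-bottom.
horizontal_ramp = [[c for c in range(4)] for _ in range(4)]
vertical_ramp = [[r] * 4 for r in range(4)]
print(cell_histogram(horizontal_ramp))
print(cell_histogram(vertical_ramp))
```

For the horizontal ramp all votes fall in the first bin (0 to 20 degrees), and for the vertical ramp they fall in the bin containing 90 degrees. A full descriptor concatenates such histograms over many cells after normalising them in overlapping blocks.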

Figures 2 and 3 show the implementation of the HOG algorithm with MATLAB. The implementation was slow, and it did not identify the eyes and the lips very accurately. This was tested with five participants.

Fig. 2 HOG input image

Fig. 3 HOG descriptor example

Fourth, D-lib [18], a general-purpose cross-platform software library, was used. It uses ‘HOG along with linear support vector machine to train face images to get 68 face landmarks points’ [18]. The system uses D-lib with C++. It detects the eyes and the lips accurately.

3.2 Fatigue Detection

Fatigue is identified by eye blinking, i.e. the frequency of eye closure in a time frame, and by yawning, which is measured by the frequency of mouth opening in a time frame. For eye closure, the Euclidean distance between the upper and lower eyelids, or eye aspect ratio (EAR), is calculated. For mouth opening, the Euclidean distance between the upper and lower lips is calculated. Eye closure and yawning (mouth open) are defined in Eqs. (1) and (2), respectively:

$$E(f) = \begin{cases} 1, & \text{eye closed in frame } f \\ 0, & \text{eye open in frame } f \end{cases}$$
(1)
$$Y(f) = \begin{cases} 1, & \text{yawning in frame } f \\ 0, & \text{not yawning in frame } f \end{cases}$$
(2)

where E(f) indicates eye closure in frame f, taking the value 1 for a closed eye and 0 for an open eye, and Y(f) indicates yawning in frame f, taking the value 1 for yawning and 0 for not yawning. These measures are calculated over a frame window that contains a sequence of frames \(x_{i}\), as in Eq. (3).

$$fw(t) = \{\, x_{i} \mid t - 1 \le i \le t \,\}$$
(3)
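The per-frame indicators E(f) and Y(f) can be sketched from landmark coordinates as follows. The source does not spell out the EAR formula, so the sketch assumes the commonly used six-point definition (the two vertical eyelid distances divided by twice the horizontal eye width). The landmark ordering, both thresholds, and all function names are assumptions made for illustration and are not values reported in this work.

```python
import math

EAR_CLOSED_BELOW = 0.2    # assumed threshold, not taken from the paper
MOUTH_OPEN_ABOVE = 20.0   # assumed lip distance in pixels, not from the paper


def eye_aspect_ratio(eye):
    """eye: six (x, y) points ordered corner, upper, upper, corner, lower, lower."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    return vertical / (2.0 * math.dist(p1, p4))


def eye_closed(eye):
    """E(f): 1 if the eye counts as closed in this frame, otherwise 0."""
    return 1 if eye_aspect_ratio(eye) < EAR_CLOSED_BELOW else 0


def yawning(upper_lip, lower_lip):
    """Y(f): 1 if the lip distance counts as a yawn in this frame, otherwise 0."""
    return 1 if math.dist(upper_lip, lower_lip) > MOUTH_OPEN_ABOVE else 0


# Hypothetical landmark coordinates in pixels.
open_eye = [(0, 0), (2, -2), (4, -2), (6, 0), (4, 2), (2, 2)]
shut_eye = [(0, 0), (2, -0.5), (4, -0.5), (6, 0), (4, 0.5), (2, 0.5)]
print(eye_closed(open_eye), eye_closed(shut_eye))   # 0 1
print(yawning((10, 10), (10, 40)))                  # 1
```

Because EAR is a ratio of distances, it is largely insensitive to how far the face is from the camera, whereas a raw lip distance in pixels is not; a deployed threshold for yawning would likely need to be normalised by face size.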

The fatigue is calculated by the following formula in Eq. (4).

$$FA(t) = \left( \sum_{x \in fw(t)} E(x) \ge 21 \right) \wedge \left( \sum_{x \in fw(t)} Y(x) \ge 2 \right)$$
(4)

where FA(t) is the fatigue state in time frame t. Fatigue is flagged when the number of eye closures in the window is 21 or more and there are at least 2 yawns. To build a support vector machine (SVM) model that classifies fatigue and non-fatigue states, 831 records of the fatigue measures, i.e. eye closure and yawning, were collected. The model was trained and tested, and the results are presented as the receiver operating characteristic (ROC) curve and the area under the curve (AUC), which is 95% (see Fig. 4).
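The rule in Eq. (4) can be written as a short sketch. It sums the per-frame indicators over the frames of one window exactly as the equation is written; whether the counts refer to individual frames or to distinct blink and yawn events is not stated in the text, so that reading is an assumption. The function name `fatigue` and the sample window are hypothetical.

```python
EYE_CLOSURE_MIN = 21   # threshold from Eq. (4)
YAWN_MIN = 2           # threshold from Eq. (4)


def fatigue(window):
    """window: list of (E, Y) indicator pairs, one per frame x in fw(t)."""
    eye_total = sum(e for e, _ in window)
    yawn_total = sum(y for _, y in window)
    # Both conditions of Eq. (4) must hold for the window to count as fatigue.
    return eye_total >= EYE_CLOSURE_MIN and yawn_total >= YAWN_MIN


# Hypothetical window: 21 eye closures, 2 yawns, and 7 neutral frames.
window = [(1, 0)] * 21 + [(0, 1)] * 2 + [(0, 0)] * 7
print(fatigue(window))  # True
```

Dropping either count below its threshold, for example a window with 21 eye closures but only one yawn, makes the function return `False`.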

Fig. 4 ROC curve and AUC
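As an illustration of what the AUC measures, the sketch below computes it directly from classifier scores as the probability that a randomly chosen fatigue record receives a higher score than a randomly chosen non-fatigue record, with ties counted as one half. This is a generic pure-Python illustration and not the procedure used to produce Fig. 4; the function name `roc_auc`, the labels, and the scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # fatigue record ranked above non-fatigue record
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: 1 = fatigue, 0 = non-fatigue.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))
```

In this example eight of the nine fatigue and non-fatigue pairs are ordered correctly, so the AUC is 8/9. A value of 1.0 means the two classes are perfectly separated by the score, and 0.5 corresponds to chance.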

4 Evaluation

The evaluation was guided by two research questions. RQ1: Will the system work in different light conditions? RQ2: Will the system work with different age groups and genders, with people wearing reading glasses, and with men who have a beard and a moustache? The evaluation was done in two phases.

4.1 First Study

Equipment

The system uses the laptop’s camera and an external HD 1080 camera attached to the laptop as live video streaming input. The laptop is an HP with a 7th-generation Intel Core i5, 4 GB RAM, and a 500 GB hard disk. The system was installed in a car, with the laptop on the right side of the driver and the webcam on the left-hand side. The camera was fixed to the far left of the driver so as not to distract him from driving (see Fig. 5).

Fig. 5 System installed in the car

Participants and Procedures

Only one participant tested the system. He was asked to drive for 10 min at midday, when the sun is bright, and to repeat the same drive at night for 10 min. The two trips were recorded.

Results

The videos of the two trips were analysed, and the results show that the system managed to detect the face landmarks and the fatigue measures accurately.

4.2 Second Study

Equipment

The system was tested in a lab in the morning with the laptop camera. The camera was on top of the laptop, facing the participants’ eyes.

Participants and Procedures

The system was evaluated with ten participants: three females and seven males. Five of the participants were aged between 20 and 25, four were between 30 and 35, and one male was above 70. Three of the participants were wearing reading glasses. Four of the males had moustaches and beards (see Fig. 6).

Fig. 6 The participants’ evaluation

They were told that there would be two short sessions of 5 min. In the first session, they would look into the camera, blink, and yawn. In the second, they would just look into the camera.

Results

The system promptly showed whether the state was fatigue or non-fatigue (see Fig. 7). For the ten participants, the system detected all fatigue measures accurately.

Fig. 7 The system working with reading glasses and hair around the mouth

5 Conclusions and Future Work

This work presents a novel non-intrusive system that detects driver fatigue using a simple webcam. To develop this system, different algorithms and implementation tools were tested to select the best. The system detects the facial features using the HOG algorithm. Then, it calculates the fatigue symptoms of eye blinking and yawning. An SVM classifier model is created to differentiate between fatigue and non-fatigue states. The AUC of this model is 95%. The system was evaluated in different light conditions, with different age groups, and with reading glasses and hair around the mouth. The evaluation gave encouraging results. The system recognized all the fatigue symptoms and showed the fatigue and non-fatigue states immediately to the participants. To gain confidence in the results, the system will in future be tested in different contexts of fatigue.