
1 Introduction

Drowsiness while driving is one of the principal causes of road accidents. According to surveys conducted by the U.S. National Highway Traffic Safety Administration [1], almost 60,000 road accidents every year are attributable to drowsiness. Sinan et al. [2] support this with a statistical estimate: 25–30% of driving accidents are caused by driver drowsiness. Drowsiness or sleepiness while driving can result from loss of attention, long working hours, fatigue, monotonous driving, late-night drives, or the consumption of certain medications or alcohol [3]. An effective system that can automatically identify the onset of drowsiness can save many lives.

To address this, the factors indicating the onset of drowsiness need to be identified first. Broadly, the survey papers [2, 4,5,6] state that these factors fall under three major categories: behavioral factors, physiological factors and vehicular factors. Detection based on behavioral factors adopts computer vision techniques [7,8,9] to capture videos or images of the driver’s drowsy face. Closure of the eyelids, eye blink rate, gaze, yawning, facial expressions, head movements and driving time [3] are some of the significant behavioral factors. The vehicular factors include lane detection, steering wheel angle and angular velocity. The major physiological factors are the electroencephalogram (EEG) [10, 11], electro-oculogram (EOG), heart rate, breathing rate and body temperature. Analysis of driving patterns is not accurate because it relies on monitoring the steering wheel, brake and accelerator; moreover, these characteristics depend on the type of road and the driver’s skill [12]. Physiological characteristics of the driver are measured by body sensors such as EEG, electrocardiogram (ECG), EOG and blood pressure monitors [13], but it is uncomfortable for the driver to wear all these sensors while driving. In contrast, analysis of driver behavior using camera-based methods [14] is more accurate than the other methods mentioned, since it preserves the driver’s comfort without distracting his/her attention.

In this fast-moving world, people prefer overnight travel in order to utilize time effectively, though it is dangerous. Driver drowsiness detection systems are therefore crucial to safeguard lives, and an ample amount of research has been carried out in this regard. Owing to their accuracy and reduced discomfort to the driver, computer vision-based techniques have been preferred for this analysis. Some of the related research in this area is discussed below. Continuous monitoring is always prioritized while designing and developing driver alert systems, as can be seen in several works in the literature. Ali et al. [15] designed a fatigue detection system using a web camera for continuous monitoring of drivers. The eye region was detected from the video using OpenCV libraries, and prediction of drowsiness was accomplished by Partial Least Squares Regression using the Percentage of Eyelid Closure (PERCLOS). They also computed the Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR); in both cases, if the threshold limit is crossed, an alert message is generated, followed by an alarm.

To reduce the number of casualties, Mehta et al. [16] developed an Android application for driver drowsiness detection using facial features. Among the machine learning techniques used for analysis, the bagging classifier yielded the highest detection accuracy. The authors also suggest integrating the app with currently available cab hiring services. Accidents may be due to natural drowsiness of the driver or to alcohol consumption. Charniya et al. [17] carried out an analysis to detect the drowsy or drunk state of the driver while driving. Their paper mainly focuses on the detection of drowsiness from the face and facial features with a yawning detector. Face detection was done using Haar cascade features, and the state of the eye was classified as either open or closed using Support Vector Machines (SVM). Visual features such as eye index and pupil activity were used to identify whether the driver was in an alert or non-alert state: the relative deviation between the estimated and ground-truth pupil centers was compared against a threshold to issue the warning. Yawns were detected by the Viola-Jones algorithm. An MQ303A alcohol detector was also included to prohibit ignition of the vehicle if the driver is drunk.

This work focuses on implementing a driver alert system that triggers at the onset of drowsiness. It includes a Raspberry Pi with a Pi camera module for acquiring videos of the driver’s face while driving. The images captured from the video frames are preprocessed for facial landmark detection; 68 landmark features of the face from the eyes, eyebrows, nose, mouth and jawline are used to detect the onset of drowsiness. EAR, MAR, blink rate and yawn rate are the features used. Three types of detection techniques, including two supervised learning algorithms, are used for detecting drowsiness: the first is a threshold technique and the other two are based on linear discriminant analysis (LDA) and the Support Vector Machine (SVM). On detecting drowsiness, the driver is alerted with the help of a buzzer. The performance of the system developed with the different classifiers is measured in terms of accuracy, precision, recall and F1 score. The paper is organized as follows: the proposed system is discussed in the next section. The hardware system developed is detailed in Sect. 3. The data processing, including the database, preprocessing and features acquired, is elaborated in Sect. 4. A brief description of the detection techniques used is given in Sect. 5. A detailed analysis of the results obtained is given in Sect. 6, followed by the conclusion and future work.

2 Proposed System

A Raspberry Pi-based driver drowsiness detection system is proposed and developed. The Pi camera module is used to collect real-time video of the driver’s face. The Raspberry Pi module then segments the acquired video into static frames of facial images. After the face is identified, the images are resized and features are extracted. This includes the identification of the facial coordinate points corresponding to the eyes and mouth; EAR and MAR are computed from the eye and mouth coordinates, respectively. Further, the rates of blinking and yawning are also taken into consideration for detecting drowsiness. Two types of detection methods are adopted in this work: the first is a threshold-based technique and the second is a computer-assisted method based on machine learning. If, based on the aforementioned parameters, the person is identified as drowsy, an alarm is turned on to forewarn the driver. A display module is also attached to cross-verify whether the facial landmarks are identified correctly. The performance evaluation of the proposed system is carried out with the help of two databases: one publicly available and the other a custom-made database captured using the associated hardware. The block diagram of the proposed system is given in Fig. 1.

Fig. 1 The block diagram of the drowsiness detection system

3 Hardware System Developed

In this project, a Raspberry Pi 4, a Pi camera module and a display are used. The Raspberry Pi has 2 GB of RAM and USB 3.0 ports, and supports dual 4K video output. The Pi camera module is a portable, lightweight camera that supports the Raspberry Pi; it communicates with the Pi using the MIPI camera serial interface protocol. Python libraries are used to control the camera module and to process the images from the captured video. The hardware module is shown in Fig. 2a.

Fig. 2 a Hardware module with display, b sample image in the dataset, indicating salient points

4 Data Processing

4.1 Database

The detection methodologies are analyzed using a publicly available dataset, the Closed Eyes in the Wild (CEW) dataset. Further, the performance of the drowsiness detection system developed is evaluated using a custom dataset acquired with the Pi camera module.

4.1.1 Closed Eyes in the Wild Dataset (CEW)

The CEW dataset was developed by Song et al. [18] for detecting closed eyes in still images. The dataset comprises 2423 subjects: 1192 with closed eyes, whose images were downloaded directly from the Internet, and 1231 with open eyes, selected from another publicly available database, Labeled Faces in the Wild (LFW) [19]. 120 randomly selected samples from the CEW dataset are used for analysis.

4.1.2 Custom Dataset

An image dataset was developed with the help of students at Amrita Viswa Vidyapeetham. The subjects were asked to play a virtual driving game for 2 h to simulate a monotonous driving state, and then the videos were acquired. The database consists of 240 images, with 120 samples in the closed-eye state and the remaining in the open-eye state. Both states also include images in which the subjects yawned. A sample image is shown in Fig. 2b.

4.2 Preprocessing and Feature Extraction

Preprocessing followed by facial feature extraction is done in four steps. The first step comprises extracting images from the video frames and converting them to grayscale. The second step is detecting the face in the grayscale image so that facial features can be identified easily. In the third step, the salient points of the face corresponding to the eye and mouth regions are detected using the Haar cascade face detector [20]; 68 facial landmarks [21] are used to localize the eyes, eyebrows, nose, mouth and jawline. The final step detects the eye and mouth coordinates for computing EAR and MAR. EAR is calculated from the horizontal and vertical distances of the eye, and MAR from the respective values of the mouth. The computation of these ratios is given in Eqs. (1) and (2), where pxx denotes the corresponding facial landmark points shown in Fig. 2b. The values of EAR and MAR averaged over 16 adjacent video frames are taken for further analysis.

$${\text{EAR}} = \frac{{\left| {p44 - p48} \right| + \left| {p45 - p47} \right|}}{{2\left| {p43 - p46} \right|}};$$
(1)
$${\text{MAR}} = \frac{{\left| {p62 - p68} \right| + \left| {p64 - p66} \right|}}{{2\left| {p61 - p65} \right|}}$$
(2)
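Equations (1) and (2) translate directly into code. The sketch below is a minimal illustration, assuming the 68 facial landmarks are available as a mapping from the 1-based point index to (x, y) pixel coordinates; the function names `dist`, `eye_aspect_ratio` and `mouth_aspect_ratio` are ours, not from the paper.

```python
import math

def dist(p, q):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_aspect_ratio(lm):
    """EAR per Eq. (1), using the right-eye points 43-48 of the
    68-point landmark scheme (lm maps index -> (x, y))."""
    return (dist(lm[44], lm[48]) + dist(lm[45], lm[47])) / (2.0 * dist(lm[43], lm[46]))

def mouth_aspect_ratio(lm):
    """MAR per Eq. (2), using the inner-lip points 61-68."""
    return (dist(lm[62], lm[68]) + dist(lm[64], lm[66])) / (2.0 * dist(lm[61], lm[65]))
```

An open eye yields a larger vertical-to-horizontal ratio (higher EAR), while a yawn widens the mouth vertically (higher MAR); in practice these values would be averaged over the adjacent frames as described above.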

5 Drowsiness Detection Techniques

Two main methodologies are adopted for detecting the onset of drowsiness. The first is threshold-based, and the second is based on two different machine learning techniques, viz. LDA and SVM.

5.1 Threshold-Based Drowsiness Detection Technique

Average values of EAR and MAR are taken over 15 images of the video. If the EAR is above 0.3, the state is considered awake. If the drowsy state persists for more than 3 s, the person is considered to be sleeping and the alarm is turned on. Similarly, if the MAR value crosses a threshold of 0.6 for more than 3 s, it is considered yawning; otherwise it is ignored as talking. These threshold values were chosen by trial and error. If the number of blinks is greater than 16 per minute, the person is considered drowsy [22]; likewise, drowsiness is detected if the number of yawns exceeds 2 per minute [23].
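The decision rules above can be combined into a single predicate. The sketch below is an illustrative summary only: the function name is ours, and it assumes the caller tracks the elapsed closed-eye and open-mouth durations and the per-minute blink and yawn counts.

```python
def threshold_drowsy(avg_ear, avg_mar, closed_secs, open_mouth_secs,
                     blinks_per_min, yawns_per_min):
    """True if any Sect. 5.1 rule flags drowsiness. Thresholds
    (EAR 0.3, MAR 0.6, 3 s, 16 blinks/min, 2 yawns/min) follow the text."""
    sleeping = avg_ear < 0.3 and closed_secs > 3.0      # eyes closed too long
    yawning = avg_mar > 0.6 and open_mouth_secs > 3.0   # sustained yawn, not talking
    rapid_blinking = blinks_per_min > 16                # blink-rate rule [22]
    frequent_yawns = yawns_per_min > 2                  # yawn-rate rule [23]
    return sleeping or yawning or rapid_blinking or frequent_yawns
```

In the deployed system, a True return would trigger the buzzer described in Sect. 2.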

5.2 Machine Learning-Based Drowsiness Detection Technique

The machine learning techniques used in this work are LDA and SVM. LDA tries to differentiate between the drowsy and awake states by fitting a linear discriminant, with a constraint to reduce misclassification. The SVM, on the other hand, fits a separating hyperplane between the two states such that the margin is maximized. SVMs are also capable of fitting curvilinear decision boundaries with the help of kernel functions.
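As a sketch of how both classifiers can be fitted on (EAR, MAR) feature pairs, the example below uses scikit-learn on synthetic data; the cluster centers and spreads are invented for illustration and do not come from the paper's datasets or training procedure.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic (EAR, MAR) samples: awake -> high EAR / low MAR, drowsy -> the reverse.
awake = rng.normal([0.32, 0.30], 0.03, size=(100, 2))
drowsy = rng.normal([0.18, 0.65], 0.03, size=(100, 2))
X = np.vstack([awake, drowsy])
y = np.array([0] * 100 + [1] * 100)   # 0 = awake, 1 = drowsy

lda = LinearDiscriminantAnalysis().fit(X, y)  # linear discriminant
svm = SVC(kernel="rbf").fit(X, y)             # kernel gives a curvilinear boundary

print(lda.predict([[0.30, 0.28]]), svm.predict([[0.20, 0.70]]))
```

With well-separated clusters like these, both models recover the expected labels (awake for the first query point, drowsy for the second).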

6 Results and Discussion

The Pi camera module used in the system for acquiring live video is Rev 1.3. From this, still images are collected at an average frame rate of 16 fps. For preprocessing, OpenCV’s Haar cascade face detector is used to detect the faces, and the images are subsequently resized to 400 × 400 pixels. Then 68 facial landmarks are identified using OpenCV libraries. The features EAR, MAR, blink rate and yawn rate are obtained from the localized points corresponding to the eyes, eyebrows, nose, mouth and jawline.

Drowsiness detection is done using two methods: a simple threshold technique and supervised machine learning, viz. LDA and SVM. Both techniques use the above-mentioned features. Further, the system designed is validated using a publicly available database as well as a custom-made one. The performance of the system is compared in terms of accuracy, precision, recall and F1 score. Table 1 shows the performance analysis of the system using the CEW and custom datasets. For the CEW dataset, LDA gave the highest accuracy and F1 score, viz. 0.908 and 0.906, respectively, compared to the other techniques. For the custom dataset, SVM gave the highest accuracy and F1 score, viz. 0.9 and 0.883, respectively. In short, the machine learning-based techniques exhibit higher performance measures than the threshold-based one.
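The four performance measures can all be derived from confusion-matrix counts, treating "drowsy" as the positive class. The sketch below uses illustrative counts only, not the actual values behind Table 1.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts,
    with drowsy as the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # flagged-drowsy frames that were drowsy
    recall = tp / (tp + fn)             # drowsy frames that were caught
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 55 of 60 drowsy samples caught, 6 awake samples falsely flagged.
print(metrics(tp=55, fp=6, fn=5, tn=54))
```

F1, the harmonic mean of precision and recall, is the more informative single number here because a detector that rarely fires can still score a high accuracy on a mostly-awake stream.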

Table 1 Performance measures of a drowsiness detection system using CEW and custom datasets

7 Conclusions and Future Work

A drowsiness detection system was developed using a Raspberry Pi. The system was tested with two datasets, a publicly available one and a custom-made one. Drowsiness detection was carried out both with a threshold-based technique and with supervised learning techniques, and the latter yielded better results on the respective datasets analyzed. Future work aims at incorporating physiological signals such as EEG along with the facial features to enhance the reliability of the system.