Keywords

1 Introduction

Traffic accidents injure or kill thousands of people every year. According to statistics from the World Health Organization (WHO), fatigued driving accounts for a considerable share of these accidents, and fatigue driving detection technology is accordingly developing rapidly [1]. Based on the dimensionality of the acquired feature data, current fatigue detection methods can be divided into single-dimensional detection and multi-dimensional detection. Researchers [2, 3] proposed an improved method for detecting driver fatigue by calculating the eyelid movement parameter PERCLOS. However, PERCLOS can only be applied under certain conditions: uncontrolled factors such as indoor lighting, changes in light, and head motion cause detection errors. Researchers have found that tired drivers show many facial signs, including frequent blinking, yawning, and head shaking. Sahayad et al. [4] pointed out that hybrid fatigue driving detection methods that combine multiple techniques are far more reliable and accurate than methods that use a single sensor. Multi-dimensional fatigue driving detection classifies multiple data items, which involves convex quadratic programming problems. C Buchheim et al. [5] studied ellipsoid bounds for convex quadratic programming problems. Tao Cai et al. [6] designed a Newton-CG augmented Lagrangian algorithm for convex quadratically constrained quadratic semidefinite programming, assuming the Robinson constraint qualification, the strong second-order sufficient condition, and other conditions. These assumptions are relatively strong.

Researchers [7, 8] studied the use of support vector machines (SVMs) for data classification and feature selection. When classifying data, an SVM maps the input vectors into a high-dimensional space through a nonlinear transformation and finds the optimal hyperplane that separates the classes. Mingze Xia et al. [9] proposed using genetic algorithms to optimize the RBF kernel parameter and the error penalty factor C, thus achieving better classification performance.

Qingshuo Zhang et al. [10] proposed multiple kernel support vector machines based on kernel alignment, which significantly improved the model's training efficiency. The kernel function is an essential part of the SVM; common choices include the linear kernel and the polynomial kernel, and different kernel functions give the SVM different characteristics. M. Tanveer et al. [11] proposed a novel, exact 1-norm linear programming formulation with a linear kernel for the twin support vector machine (TWSVM), which has good generalization ability but lacks strong learning and predictive ability. G Sideratos et al. [12] proposed a probabilistic wind power prediction model based on a radial basis function neural network (RBFNN), which has good learning ability but lacks good generalization and predictive ability. VH Moghaddam [13] proposed a new kernel based on Hermite orthogonal polynomials, which has good predictive ability but lacks good generalization and learning ability. In addition, the emergence of federated learning has greatly improved the accuracy of fatigue driving models [14, 15].

According to previous research, single-dimensional fatigue driving detection is easy to implement, but the data it obtains are more susceptible to interference from indoor lighting and head movement. Multi-dimensional fatigue driving detection, which reasonably combines several single-dimensional detection methods, can significantly improve detection accuracy in real environments.

However, a single kernel function often cannot possess all of these characteristics simultaneously, and a kernel function that combines them is currently lacking. At present, few researchers combine multi-dimensional feature data with an SVM that merges the characteristics of several kernel functions to build a complete set of fatigue detection equipment. To fill this gap, this paper designs a fatigue driving detection device. The device's processor is the Raspberry Pi 4 Model B, a microcomputer motherboard that provides an open-source software architecture. Images are captured by an infrared camera based on the OV5647 sensor chip. Because the camera does not touch the driver, it does not interfere with normal driving, and it can capture good images of the driver even in low-light or no-light conditions. In addition, to meet various needs, the system is equipped with a Weixue Global Navigation Satellite System (GNSS) module that measures speed using the Global Positioning System (GPS), the BeiDou Navigation Satellite System (BDS), and the Quasi-Zenith Satellite System (QZSS), along with other sensors. The core of the device is its algorithm [16]. Following the concept of multi-dimensional detection, this paper uses a face alignment algorithm based on cascaded regression trees trained by gradient boosting to locate the driver's face in the image and obtain the eye aspect ratio (EAR) and mouth aspect ratio (MAR). It also applies Eulerian video magnification to the face video to obtain the driver's heart rate signal without touching the driver, which greatly reduces the system's intrusiveness (Table 1).

Table 1. Pseudo-code of the proposed fatigue driving detection method

The collected data are then loaded into a pre-designed multi-dimensional dataset. To address the problem that an SVM with a single kernel function cannot have multiple characteristics simultaneously, this paper proposes a hybrid kernel function that combines the Logistic kernel function, which has good generalization, with the radial basis polynomial kernel (RBPK), which has excellent learning and predictive capabilities [17], and uses it to construct an SVM with a hybrid kernel function. The SVM based on this improved kernel function has strong learning and prediction capabilities as well as good generalization capabilities. Finally, the improved SVM classifies the mixed dataset, and the corresponding output is produced locally on the Raspberry Pi 4 Model B [18]. In the experiments, the average total accuracy of fatigue driving level detection reached 96.92%. The system can efficiently and accurately detect the driver's fatigue state in real time without contacting the driver.

2 Fatigue Driving Detection Method Based on Improved Kernel Function Support Vector Machine

The method obtains a video stream through the camera. After frames are captured from the video stream, the driver's facial feature points are detected with a cascaded regression algorithm based on gradient boosting decision trees, and the eye aspect ratio (EAR) and mouth aspect ratio (MAR) are calculated. The driver's heart rate is obtained by analyzing the RGB images with Eulerian video magnification. An SVM improved with a hybrid of the Logistic and RBPK kernel functions then classifies these features to determine whether the driver is fatigued. The system can run on low-end development boards such as the Raspberry Pi 4 Model B (Fig. 1).

Fig. 1. Flow chart of the fatigue driving detection system based on the SVM with an improved kernel function

2.1 Multi-dimensional Fatigue Driving Feature Extraction Based on a Gradient Boosting Cascaded Regression Model

Face Location Based on Cascaded Regression with Gradient Boosting Decision Trees.

This paper locates the face based on the Gradient Boosting Decision Tree (GBDT) [19]. This algorithm can locate facial landmarks within one millisecond, significantly improving detection efficiency.

Let \(S\) denote the coordinates of all 68 facial landmarks, and let \({\widehat{S}}^{(t)}\) be the current estimate of \(S\). Each regressor \({r}_{t}\) in the cascade is learned with the gradient boosting tree algorithm. From the image \(I\) and the current estimate, \({r}_{t}\) predicts an update vector that is added to the current shape estimate, moving the estimate closer to the true shape. This completes face alignment and yields the coordinates of the 68 facial landmarks:

$${\widehat{S}}^{(t+1)}={\widehat{S}}^{(t)}+{r}_{t}(I,{\widehat{S}}^{(t)})$$
(1)
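The update rule in Eq. (1) can be sketched in Python with NumPy. The stage regressors here are hypothetical toy functions (`make_toy_regressor` is not part of any library); in practice a trained model such as dlib's `shape_predictor`, which implements Kazemi and Sullivan's ensemble of regression trees, plays this role. The sketch only shows how each stage adds its predicted update to the current estimate.

```python
import numpy as np
from typing import Callable, Sequence

# Hypothetical regressor signature: (image I, current estimate S^(t)) -> update vector.
Regressor = Callable[[np.ndarray, np.ndarray], np.ndarray]


def run_cascade(image: np.ndarray, s0: np.ndarray,
                regressors: Sequence[Regressor]) -> np.ndarray:
    """Apply Eq. (1), S^(t+1) = S^(t) + r_t(I, S^(t)), once per cascade stage."""
    s = np.asarray(s0, dtype=float).copy()   # (68, 2) landmark coordinates
    for r_t in regressors:
        s = s + r_t(image, s)                # add the predicted update to the estimate
    return s


def make_toy_regressor(target: np.ndarray, step: float = 0.5) -> Regressor:
    """Toy stand-in for a trained stage: moves the estimate `step` of the way to `target`.

    A real stage is a gradient-boosted tree ensemble that reads pixel features of
    the image and never sees the true shape; this only exercises the update rule.
    """
    def r_t(image: np.ndarray, s: np.ndarray) -> np.ndarray:
        return step * (target - s)
    return r_t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_shape = rng.uniform(0, 240, size=(68, 2))          # synthetic 68-point shape
    start = true_shape + rng.normal(0, 10, size=(68, 2))    # perturbed initial estimate
    stages = [make_toy_regressor(true_shape) for _ in range(10)]
    final = run_cascade(np.zeros((240, 320)), start, stages)
    print("max error after 10 stages:", np.abs(final - true_shape).max())
```

With `step = 0.5`, each toy stage halves the remaining error, so ten stages shrink it by a factor of 2^10; a learned cascade typically converges less uniformly.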

After face alignment yields the coordinates of the 68 facial landmarks, this paper selects 32 of them to calculate the eye aspect ratio (EAR) and mouth aspect ratio (MAR) and thereby determine whether the driver is fatigued.

Calculation of EAR and MAR.

In this paper, 12 eye landmarks (six per eye) and 20 mouth landmarks are selected to calculate the degree of opening and closing. The EAR [20] is defined as follows, where \({P}_{1}\) to \({P}_{6}\) represent the left eye and \({P}_{7}\) to \({P}_{12}\) represent the right eye. The EAR of each eye is calculated separately; Eq. (2) is written for the left eye, and the right-eye EAR is obtained by substituting \({P}_{7}\) to \({P}_{12}\) for \({P}_{1}\) to \({P}_{6}\):

$$EAR=\frac{\left|\left|{P}_{2}-{P}_{6}\right|\right|+\left|\left|{P}_{3}-{P}_{5}\right|\right|}{2\left|\left|{P}_{1}-{P}_{4}\right|\right|}$$
(2)

The MAR is calculated as follows, where \({P}_{13}\) to \({P}_{32}\) represent the mouth (Fig. 2):

$$MAR=\frac{\| {P}_{14}-{P}_{24}\| +\| {P}_{15}-{P}_{23}\| +\| {P}_{16}-{P}_{22}\| +\| {P}_{17}-{P}_{21}\| +\| {P}_{18}-{P}_{20}\| }{2\| {P}_{13}-{P}_{19}\| }$$
(3)
Fig. 2. Schematic diagram of EAR and MAR (the three pictures on the right show, from top to bottom, open-eye EAR, closed-eye EAR, and MAR)
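Eqs. (2) and (3) translate directly into NumPy. The helper `features_from_landmarks` assumes that the landmarks follow the 0-based 68-point iBUG layout used by dlib, with \({P}_{1}\) to \({P}_{6}\) mapped to points 36 to 41, \({P}_{7}\) to \({P}_{12}\) to points 42 to 47, and \({P}_{13}\) to \({P}_{32}\) to points 48 to 67; the text does not state this mapping, so treat it as an assumption.

```python
import numpy as np


def _dist(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between two 2-D points."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))


def eye_aspect_ratio(eye: np.ndarray) -> float:
    """Eq. (2); `eye` is a (6, 2) array ordered P1..P6 (or P7..P12 for the right eye)."""
    p = eye
    return (_dist(p[1], p[5]) + _dist(p[2], p[4])) / (2.0 * _dist(p[0], p[3]))


def mouth_aspect_ratio(mouth: np.ndarray) -> float:
    """Eq. (3); `mouth` is a (20, 2) array where mouth[k] holds P_(13+k)."""
    m = mouth
    pairs = [(1, 11), (2, 10), (3, 9), (4, 8), (5, 7)]   # P14-P24, ..., P18-P20
    vertical = sum(_dist(m[i], m[j]) for i, j in pairs)
    return vertical / (2.0 * _dist(m[0], m[6]))           # 2 * ||P13 - P19||


def features_from_landmarks(shape68: np.ndarray):
    """Assumed mapping onto the 0-based 68-point iBUG layout used by dlib:
    P1-P6 -> points 36-41, P7-P12 -> 42-47, P13-P32 -> 48-67."""
    s = np.asarray(shape68, dtype=float)
    return (eye_aspect_ratio(s[36:42]),
            eye_aspect_ratio(s[42:48]),
            mouth_aspect_ratio(s[48:68]))
```

Only the outer-lip points (\({P}_{13}\) to \({P}_{24}\)) enter Eq. (3); the inner-lip points are passed along but not used.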

Heart Rate Detection Based on Eulerian Video Magnification.

In addition to EAR and MAR, this paper uses heart rate as a further indicator for comprehensively judging fatigue driving.

This paper uses the Eulerian video magnification algorithm to process face video images. Compared with independent component analysis, this algorithm does not require the source signals to be non-Gaussian and independent, and it has lower time complexity, which reduces the time needed for fatigue driving detection. The algorithm processes video in both the spatial and temporal domains, magnifying subtle changes that are usually invisible or difficult to detect with the naked eye.

In this paper, the forehead region of each magnified frame is used, and among the three RGB color channels, the G channel, which carries the strongest pulse wave signal, is analyzed. The pixel values in the G channel's region of interest are averaged for each frame to form a signal sequence, and the frequency at which the power spectrum of this sequence peaks is used as the heart rate estimate. Writing the original intensity as \(I(x,t)=f(x+\delta (t))\), the magnified output \(\tilde{I }(x,t)\) is calculated as in [21]. The following formula shows that the original small translational motion \(\delta (t)\) is amplified to \((1+\alpha )\delta (t)\) after temporal band-pass filtering, where \(\alpha\) is the amplification factor (Fig. 3):

$$\tilde{I }(x,t)\approx f(x+(1+\alpha )\delta (t))$$
(4)
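Eq. (4) follows from a first-order Taylor expansion, as in the linear analysis behind Eulerian video magnification. The steps below assume that \(\delta (t)\) is small, that it lies entirely within the band-pass range of the temporal filter, and they introduce \(B(x,t)\) as a name for the band-pass filtered signal:

$$\begin{aligned}
I(x,t) &= f\big(x+\delta (t)\big)\approx f(x)+\delta (t)\,\frac{\partial f(x)}{\partial x}\\
B(x,t) &= \delta (t)\,\frac{\partial f(x)}{\partial x}\\
\tilde{I }(x,t) &= I(x,t)+\alpha B(x,t)\approx f(x)+(1+\alpha )\,\delta (t)\,\frac{\partial f(x)}{\partial x}\approx f\big(x+(1+\alpha )\delta (t)\big)
\end{aligned}$$

The last approximation holds only while \((1+\alpha )\delta (t)\) remains small, which bounds how large \(\alpha\) can be chosen.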
Fig. 3. Heart rate detection based on Eulerian video magnification of the forehead region
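The spectral step described above (average the G channel over the region of interest, then take the power-spectrum peak) can be sketched with NumPy as follows. The magnification stage itself is omitted, the `(y0, y1, x0, x1)` ROI format is hypothetical, a plain FFT periodogram stands in for whatever spectrum estimator the authors used, and the 0.75 to 3.0 Hz search band (45 to 180 beats/min) is an assumed plausible heart-rate range.

```python
import numpy as np


def mean_green(frames: np.ndarray, roi) -> np.ndarray:
    """Per-frame mean of the G channel inside a forehead ROI.

    frames: (T, H, W, 3) array; channel index 1 is G in both RGB and BGR order.
    roi: (y0, y1, x0, x1) pixel bounds of the region of interest (assumed format).
    """
    y0, y1, x0, x1 = roi
    patch = frames[:, y0:y1, x0:x1, 1].astype(float)
    return patch.reshape(len(frames), -1).mean(axis=1)


def estimate_heart_rate(signal: np.ndarray, fps: float,
                        lo_hz: float = 0.75, hi_hz: float = 3.0) -> float:
    """Heart rate in beats/min from the power-spectrum peak inside [lo_hz, hi_hz]."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # drop the DC component
    power = np.abs(np.fft.rfft(x)) ** 2                # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)       # bin frequencies in Hz
    band = (freqs >= lo_hz) & (freqs <= hi_hz)         # assumed plausible heart-rate band
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz
```

For a 30 s clip at 30 frames per second, the frequency resolution is 1/30 Hz, i.e. 2 beats/min; a 1.2 Hz pulse component maps to 72 beats/min.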

2.2 Improved Logistic Kernel Function

The kernel function is the core of the SVM. Different kernel functions have their own advantages and disadvantages, so the performance of an SVM varies with its kernel. Some kernel functions are global and therefore generalize well, while others have good learning and predictive ability. Generally speaking, a single kernel function rarely has both good learning and good generalization capabilities. Therefore, this paper combines the Logistic kernel function, which generalizes well, with the radial basis polynomial kernel function (RBPK), which has excellent learning and predictive ability, to construct an SVM with a mixed kernel function. The SVM based on this improved kernel function has strong learning and prediction ability as well as good generalization ability.

Logistic Kernel Function.

The Logistic function is expressed as:

$$K\left(x\right)=\frac{1}{1+{e}^{-a{x}^{2}}},\;\;a>0$$
(5)

The Logistic kernel function is expressed as:

$$K\left({x}_{i},{x}_{j}\right)=\frac{1}{1+{e}^{-a{\left\|{x}_{i}-{x}_{j}\right\|}^{2}}},\;\;a>0$$
(6)

As long as a kernel function satisfies the Mercer condition, the dot product in the high-dimensional space can be replaced by a kernel function evaluation in the input space, which avoids computing directly in the high-dimensional space and keeps the algorithm's complexity low.

Reference [22] proves that the Logistic kernel function can serve as the kernel function of an SVM (Fig. 4).

Fig. 4. Graph of the Logistic kernel function

Radial Basis Polynomial Kernel Function (RBPK).

Reference [23] defines a kernel function called the Radial Basis Polynomial Kernel (RBPK):

$$K\left({x}_{i},{x}_{j}\right)=\exp\left(\frac{{\left({x}_{i}\cdot {x}_{j}\right)}^{d}}{{\sigma }^{2}}\right),\;\;d>0$$
(7)

RBPK is derived from two kernel functions, the polynomial kernel and the RBF kernel, and makes full use of the good predictive ability of the polynomial kernel and the learning ability of the RBF kernel.

LRBPK Hybrid Kernel Function.

Analysis of the graphs of the Logistic kernel function and the radial basis polynomial kernel function shows that the Logistic kernel function has good generalization ability, whereas an SVM whose kernel function is RBPK has good learning and prediction ability. Therefore, to obtain an SVM with strong learning and predictive capabilities as well as good generalization, this system uses the Logistic and Radial Basis Polynomial Kernel (LRBPK), a mixture of the Logistic and RBPK kernel functions, as the kernel function of the improved SVM.

By a standard lemma, if \({K}_{1}\) and \({K}_{2}\) are kernel functions on \(X\times X\), where \(X\) is the input space, and \(a\ge 0\) is a constant, then \(K\left(x,y\right)={K}_{1}\left(x,y\right)+{K}_{2}(x,y)\) and \(K\left(x,y\right)=a{K}_{1}(x,y)\) are also kernel functions.

Therefore, for a mixing weight \(0\le n\le 1\) (so that both \(n\) and \(1-n\) are non-negative), the LRBPK hybrid kernel function below is still a kernel function:

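$${K}_{LRBPK}\left({x}_{i},{x}_{j}\right)=n\,{K}_{logistic}\left({x}_{i},{x}_{j}\right)+\left(1-n\right){K}_{RBPK}\left({x}_{i},{x}_{j}\right),\;\;0\le n\le 1$$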
(8)
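The three kernels in Eqs. (6) to (8) can be written as Gram-matrix functions in NumPy. This is a sketch: the default values of `a`, `d`, `sigma`, and `n` are placeholders rather than the paper's settings, the Logistic kernel uses the squared Euclidean distance as reconstructed in Eq. (6), and `d` is assumed to be a positive integer so that \({\left({x}_{i}\cdot {x}_{j}\right)}^{d}\) is defined for negative dot products.

```python
import numpy as np


def logistic_kernel(X: np.ndarray, Y: np.ndarray, a: float = 1.0) -> np.ndarray:
    """Eq. (6): K = 1 / (1 + exp(-a * ||x_i - x_j||^2)), a > 0."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)   # pairwise ||x_i - x_j||^2
    return 1.0 / (1.0 + np.exp(-a * sq_dist))


def rbpk_kernel(X: np.ndarray, Y: np.ndarray, d: int = 2, sigma: float = 1.0) -> np.ndarray:
    """Eq. (7): K = exp((x_i . x_j)^d / sigma^2); d is assumed to be a positive integer."""
    return np.exp((X @ Y.T) ** d / sigma ** 2)


def lrbpk_kernel(X: np.ndarray, Y: np.ndarray, n: float = 0.5, a: float = 1.0,
                 d: int = 2, sigma: float = 1.0) -> np.ndarray:
    """Eq. (8): n * K_logistic + (1 - n) * K_RBPK, with mixing weight 0 <= n <= 1."""
    if not 0.0 <= n <= 1.0:
        raise ValueError("mixing weight n must lie in [0, 1]")
    return n * logistic_kernel(X, Y, a) + (1.0 - n) * rbpk_kernel(X, Y, d, sigma)
```

scikit-learn's `SVC` accepts a callable `kernel` that receives two sample matrices and returns their Gram matrix, so `SVC(kernel=lambda A, B: lrbpk_kernel(A, B, n=0.5))` is one way to plug this in. Because the RBPK term grows exponentially with the dot product, the EAR, MAR, and heart-rate features would normally be scaled first.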

3 Experiment and Result Analysis

3.1 Experimental Background

Dataset Description.

The data in this paper come from the Driver Drowsiness Detection Dataset [24] and a dataset built by the authors. In the Driver Drowsiness Detection Dataset, subjects played driving games to induce different states and, under the guidance of the experimenters, showed a series of facial expressions. The total duration of this dataset is about nine and a half hours.

A self-built dataset was collected with the experimental device described below. The video stream was captured with the OV5647 infrared camera at 30 frames per second as color images of 320 × 240 pixels. The total recording time is 5 h and covers 12 testers, six female and six male drivers aged between 18 and 40. The testers simulated normal driving, yawning, squinting, and sleepiness, and were filmed from four different directions. They were also recorded without glasses, with black-rimmed glasses, and with sunglasses.

To simulate real driving scenes, data were collected under different lighting conditions and from different angles. The lighting conditions are strong light, normal light, low light, and no light, and the angles are front, left, and right. The camera's built-in infrared fill light still produces an image when there is no light, although this image differs from those taken under normal lighting (Fig. 5).

Fig. 5. Dataset display

Description of Experimental Device.

This paper designs and builds a fatigue driving detection device, composed of a Raspberry Pi 4 Model B, an OV5647 infrared camera, and various sensors, to test actual driving situations. The device runs the fatigue driving detection algorithm proposed in this paper and is used to collect the self-built dataset (Fig. 6).

Fig. 6. Overall view of the equipment. The infrared camera is connected to the Raspberry Pi 4 Model B through the CSI interface. To display alarm information, the Raspberry Pi is connected to a micro display and LED lights.

Raspberry Pi 4 Model B.

The Raspberry Pi 4 Model B selected in this paper is a microcomputer motherboard that provides an open-source software architecture. It has a quad-core ARM processor clocked at 1.5 GHz and 4 GB of memory, which is sufficient to run the proposed algorithm model. Its low price makes it easy to mass-produce similar low-cost devices. Its 40-pin GPIO header can connect a variety of sensors, facilitating the acquisition of various data. Its built-in Wi-Fi module can transmit data to a server during the experiment, reducing the local storage and cost associated with fatigue driving data.

Infrared Camera.

This paper uses an infrared camera based on the OV5647 sensor chip. It has a 160-degree field of view, which captures a wider area, and its focus is adjustable. With an infrared fill light that senses ambient light, the camera reaches a visual distance of 2 m at night, so it adapts well to the in-car environment, including the low-light and no-light conditions common in driving (Fig. 7).

Fig. 7. OLED display module and GNSS module

Other Modules.

Fatigue driving detection in real scenes must consider many factors. Because fatigue driving needs to be detected only while the vehicle is moving, a Weixue GNSS module is installed to measure speed using the GPS, BeiDou Navigation Satellite System (BDS), and QZSS multi-satellite systems. A DHT11 sensor is installed to measure the actual temperature and humidity, and a 0.96-inch OLED screen, among other components, is added so that data can be read intuitively.
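The text does not describe how the GNSS module's data are read. Assuming the module emits standard NMEA 0183 RMC sentences over its serial port (common for GNSS receivers, but an assumption here), gating detection on vehicle speed could look like the sketch below. The function names and the 5 km/h threshold are hypothetical.

```python
from functools import reduce
from typing import Optional

KNOT_TO_KMH = 1.852  # 1 knot = 1.852 km/h


def nmea_checksum(body: str) -> str:
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    return format(reduce(lambda acc, ch: acc ^ ord(ch), body, 0), "02X")


def rmc_speed_kmh(sentence: str) -> Optional[float]:
    """Ground speed in km/h from an RMC sentence, or None if it is invalid."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    body, checksum = sentence[1:].split("*", 1)
    if nmea_checksum(body) != checksum.upper():
        return None                              # corrupted sentence
    fields = body.split(",")
    if len(fields) < 8 or not fields[0].endswith("RMC"):
        return None                              # not an RMC sentence
    if fields[2] != "A" or fields[7] == "":      # 'A' = valid fix, 'V' = no valid fix
        return None
    return float(fields[7]) * KNOT_TO_KMH        # field 7 = speed over ground in knots


def should_run_detection(speed_kmh: Optional[float], threshold_kmh: float = 5.0) -> bool:
    """Hypothetical gate: run fatigue detection only while the vehicle is moving."""
    return speed_kmh is not None and speed_kmh >= threshold_kmh
```

Treating a missing or invalid fix as "not moving" keeps the alarm silent when no position is available; the opposite choice (keep detecting) is equally defensible and depends on the deployment.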

3.2 Experimental Process

First, a total of 240 video clips, each 30 s long, are extracted from the dataset. Using subjective judgment, they are labeled as 113 fatigue clips and 127 non-fatigue clips and assigned category labels (0, 1). The videos are then randomly sampled so that the samples cover the four shooting angles and the three glasses-wearing conditions, yielding 95 fatigue and 95 non-fatigue clips, which are divided equally into a training set and a test set (Fig. 8).

Fig. 8. Experimental flowchart

A total of 1,800 pictures were captured from the training and test videos and classified subjectively, with 900 images assigned to each of the training set and the test set. Because the dataset is small, this paper performs data augmentation, using OpenCV to batch flip the images, adjust brightness, apply blur, and apply other processing, which yields 5,400 pictures (Fig. 9).
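The augmentation step can be sketched as follows. The paper uses OpenCV; to keep the sketch dependency-free, three of the operations are reimplemented in NumPy, and in OpenCV the closest counterparts are `cv2.flip(img, 1)`, `cv2.convertScaleAbs(img, alpha=1, beta=beta)` for non-negative `beta`, and `cv2.blur(img, (k, k))` (which uses a different default border mode). The exact operations and parameters are not given in the text, so three variants per image, a brightness offset of 40, and a 3 × 3 kernel are illustrative assumptions.

```python
import numpy as np
from typing import List


def hflip(img: np.ndarray) -> np.ndarray:
    """Mirror the image left-right (same effect as cv2.flip(img, 1))."""
    return img[:, ::-1].copy()


def adjust_brightness(img: np.ndarray, beta: int) -> np.ndarray:
    """Add `beta` to every pixel and saturate to [0, 255]."""
    return np.clip(img.astype(np.int32) + beta, 0, 255).astype(np.uint8)


def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k mean filter with edge-replicated borders (k odd)."""
    pad = k // 2
    widths = [(pad, pad), (pad, pad)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img.astype(np.float64), widths, mode="edge")
    h, w = img.shape[:2]
    acc = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + h, dx:dx + w]       # sum the k*k shifted copies
    return np.rint(acc / (k * k)).astype(np.uint8)


def augment(img: np.ndarray, beta: int = 40, k: int = 3) -> List[np.ndarray]:
    """Return flipped, brightened, and blurred variants of one image (illustrative recipe)."""
    return [hflip(img), adjust_brightness(img, beta), box_blur(img, k)]
```

Working in a wider integer type before clipping avoids the silent wrap-around that plain `uint8` addition would cause at bright pixels.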

Fig. 9. Data augmentation results. Different processing is applied to the fatigue-state image data.

The cascaded regression algorithm based on gradient boosting decision trees is used to locate the face in each picture, obtain its 68 facial landmarks, and extract the EAR and MAR feature values. The heart rate feature is extracted from each video clip, and each video is matched to its corresponding pictures. The three groups of feature values of the training set are then fed to the SVM classifier for training.

The trained model is then evaluated on the test set, whose images are divided into four groups of 675 pictures each. To balance the false alarm rate and the missed detection rate, this paper defines the total accuracy as \(\mathrm{Total\;accuracy}=1-\left(\mathrm{false\;alarm\;rate}+\mathrm{missed\;detection\;rate}\right)\) (Table 2).
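The total accuracy metric can be computed from confusion counts as below. The text does not define the two rates; this sketch assumes the false alarm rate is false positives over all non-fatigue samples and the missed detection rate is false negatives over all fatigue samples. Under that assumption the metric is not the same as the usual (TP + TN) / N accuracy.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # fatigue images judged fatigued
    fn: int  # fatigue images missed (missed detections)
    fp: int  # non-fatigue images flagged as fatigued (false alarms)
    tn: int  # non-fatigue images judged normal


def total_accuracy(c: Counts) -> float:
    """1 - (false alarm rate + missed detection rate), using assumed per-class rates."""
    false_alarm_rate = c.fp / (c.fp + c.tn)   # share of non-fatigue samples flagged
    missed_rate = c.fn / (c.fn + c.tp)        # share of fatigue samples missed
    return 1.0 - (false_alarm_rate + missed_rate)
```

For example, `Counts(tp=90, fn=10, fp=5, tn=95)` gives a false alarm rate of 0.05, a missed detection rate of 0.10, and a total accuracy of 0.85.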

Table 2. Accuracy results

The SVM with the improved kernel function proposed in this paper performs better in terms of the false positive rate and effectively reduces the false negative rate.

Because judgments on individual pictures cannot directly reflect performance in an actual driving environment, this paper uses video clips for a continuous monitoring test. Since the video runs at 30 frames per second, a fatigue driving judgment is made every 0.5 s (15 frames). This interval was also analyzed against the blinking frequency of human eyes. According to Sakai's research [25], people blink about 25 times per minute on average (Nob), and each blink lasts about 0.2 s (BT). Because BT is shorter than the judgment interval \(T=0.5\) s, a blink affects roughly one judgment, so the probability that a judgment is recognized as closed eyes because of blinking is approximately:

$$P(C)\approx \frac{Nob\times T}{60}=\frac{25\times 0.5}{60}\approx 20.833\%$$
(9)

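Assuming that blinks in different judgment intervals are independent, the probability that blinking causes closed eyes to be recognized in 5 consecutive judgments is:

$$P\left(FC\right)={P\left(C\right)}^{5}\approx 0.039\%$$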
(10)
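The arithmetic behind Eqs. (9) and (10) can be checked in a few lines. The reported 20.833% equals 25 × 0.5 / 60, that is, it uses the 0.5 s judgment interval rather than the 0.2 s blink time; the sketch adopts that reading and also assumes consecutive judgments are independent, so that the probability for five in a row is the fifth power.

```python
def blink_false_closure(nob_per_min: float = 25.0, interval_s: float = 0.5, runs: int = 5):
    """Return (P(C), P(FC)) for the blink model of Eqs. (9) and (10)."""
    p_c = nob_per_min * interval_s / 60.0   # chance a judgment interval contains a blink
    p_fc = p_c ** runs                      # `runs` consecutive intervals, assumed independent
    return p_c, p_fc


if __name__ == "__main__":
    p_c, p_fc = blink_false_closure()
    print(f"P(C) = {p_c:.3%}, P(FC) = {p_fc:.3%}")
```

Substituting the 0.2 s blink time for the interval would give a much smaller P(C), so the choice of interval matters for the reported figures.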

Studies have shown that people blink less frequently when concentrating on driving [26], so the probability of closed eyes in five consecutive judgments will be even lower. In repeated comparative experiments, this paper also found that accuracy is higher and the misjudgment rate is lower when fatigue driving is reported only after five consecutive fatigue judgments. Finally, the method is tested on the 95 non-fatigue video clips of the test set and compared with other fatigue driving detection methods (Table 3).
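The five-in-a-row decision rule can be sketched as a small streaming function. How each 0.5 s judgment is formed from its 15 frames (for example, classifying one frame or voting over all of them) is not specified in the text and is left outside the sketch; the constant names are hypothetical, and the alarm here stays active on every judgment while the run lasts.

```python
from typing import Iterable, Iterator

FPS = 30
FRAMES_PER_JUDGMENT = int(FPS * 0.5)   # one judgment every 0.5 s = 15 frames
CONSECUTIVE_REQUIRED = 5               # alarm after 5 fatigue judgments in a row


def alarm_stream(judgments: Iterable[bool],
                 required: int = CONSECUTIVE_REQUIRED) -> Iterator[bool]:
    """Yield True whenever the last `required` judgments were all 'fatigued'."""
    run = 0
    for is_fatigued in judgments:
        run = run + 1 if is_fatigued else 0   # any normal judgment resets the run
        yield run >= required
```

With five judgments required, an alarm is raised 2.5 s after a sustained eye closure begins, which is the latency cost of suppressing blink-induced false alarms.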

Table 3. Comparison of the accuracy of various fatigue driving detection methods

The test shows that the accuracy of this method reaches 98.95%, which is 1.06% higher than that of the fatigue driving detection method based on the YCbCr color space. In actual driving, the equipment described above was used to conduct multiple tests. Compared with similar commercial products, our fatigue driving detection has a lower false alarm rate and a lower missed detection rate (Fig. 10).

Fig. 10. Practical testing. The left side of the picture shows the actual test environment, the upper right shows the packaged test equipment, and the lower right shows the device's recognition image.

4 Conclusion and Future Directions

This paper proposes a multi-dimensional fatigue driving detection system based on an SVM with an improved kernel function, which locates the driver's face and extracts the EAR, MAR, and heart rate. By improving the kernel function of the SVM, a new fatigue driving detection framework is constructed. A new dataset is built and combined with a public dataset for training and testing, achieving a better recognition rate. Compared with single-feature detection and with detection by a classic SVM, the method achieves a clear improvement in accuracy. However, obtaining the heartbeat with a camera still has shortcomings; in the future, we will address them and use better prediction methods. We will also collect more data and strive to build a complete fatigue driving detection system. The streamlined system can be used in mid-range IoT devices. In the future, more features such as body pressure, steering wheel behavior, and head tracking can be combined to develop a more accurate fatigue driving detection system, and integration with L1-L3 autonomous driving systems is another direction for follow-up research.