
1 Introduction

Driving style refers to the way drivers habitually choose to drive, that is, the stable characteristics common to a driver's varied driving behavior [1]. Recognizing a driver's driving style on the basis of rear-end collision risk is of great significance for driving safety. Reliable recognition helps guarantee the safety and adequate performance of drivers, makes it possible to meet drivers' needs and adapt to their preferences, and ultimately improves the safety of the driving environment. Driving style recognition also has potential value in helping agencies design effective control strategies [2, 3].

This paper proposes a driving style recognition model that accounts for the impact of traffic flow level on driving behavior. Traffic flow is classified as normal or congested, and driving style is labelled with respect to the prevailing traffic condition. The differences in risk surrogates between normal and congested traffic are examined, because a driving style label that ignores traffic level is not reliable: behavior that counts as risky in one condition may be ordinary in the other. Traffic flow level is therefore taken into account both when labelling and when recognizing driving style. The study uses trajectory data extracted from video, which contain vehicle identifiers, GPS positions, velocity, and acceleration.

2 Literature Review

In recent years, studies of new modes of transportation [4, 5] and other innovative approaches have emerged, and some research focuses on cooperative scheduling for optimization [6, 7]. Machine learning algorithms for driving behavior recognition have been studied in several previous works. Different types of neural network (NN) algorithms have been used [8,9,10,11]; however, larger networks can lead to long training times [12]. Tree-based algorithms [13, 14] and the Hidden Markov Model (HMM) [15, 16] have also been adopted to detect driving behaviors from extracted features, and some researchers combined HMM with dynamic Bayesian networks or ANNs to predict driving behavior from driving data [17, 18]. However, HMM requires a long training time, especially with many states, and its recognition time also grows with the number of states [19]. A more suitable and efficient method for identifying driving style is therefore needed. The support vector machine (SVM) has been widely applied to pattern recognition problems, including voice identification, text categorization, and face detection [20, 21]. In addition, SVM performs well with a limited number of training samples and has few parameters to determine [22, 23]. For these reasons, many studies have employed SVM to build driving style recognition models.

Apart from unsupervised algorithms such as clustering, machine learning algorithms require labelled or partially labelled driving behavior data. Some studies adopted behavior-based or accident-based methods to label driving style [20, 24]. Driver self-reported questionnaires [25] and expert scoring [13] have also been used to evaluate driving style. However, these two methods rely on the subjective judgments of drivers or experts and become very time-consuming when the number of drivers in the sample is large. This paper proposes a new method for labelling driving data that is based on collision risk surrogates and incorporates traffic level.

3 Methodology

Three collision risk surrogates are used to determine the risk level of the car-following process for each leader-follower pair, with different classification thresholds for normal and congested traffic. The K-means algorithm is then applied to group drivers as calm or aggressive based on the risk levels of their trajectories. Because traffic flow strongly affects driving behavior, it is taken into account when labelling driving style.

3.1 Collision Risk Surrogate

It is essential to find the most effective surrogates for describing collision risk on the road [26,27,28,29]. Raw trajectory variables such as a vehicle's velocity and acceleration are usually not sufficient on their own to estimate rear-end collision risk. In this paper, the Margin to Collision (MTC) is used to evaluate the risk.

MTC indicates the final relative position of the preceding vehicle (PV) and the following vehicle (FV) if both vehicles decelerate abruptly.

$$ MTC = \frac{x_{r} + v_{p}^{2}/(2a_{p})}{v_{f}^{2}/(2a_{f})} \tag{1} $$

where $a_f$ and $a_p$ denote the decelerations of the FV and PV, respectively, both usually set to 0.7g; $v_f$ and $v_p$ denote the velocities of the FV and PV, respectively; and $x_r$ denotes the relative distance between the two vehicles. A modified MTC (MMTC) is proposed in this paper to include the following vehicle's reaction time when the PV decelerates abruptly. If the FV travels at $v_f$ during a reaction time $t$ and then brakes, it needs a distance of $v_f t + v_f^2/(2a_f)$ to stop, while the distance available is $x_r + v_p^2/(2a_p)$. Solving for the largest $t$ that avoids a collision gives

$$ MMTC = \frac{x_{r} + v_{p}^{2}/(2a_{p}) - v_{f}^{2}/(2a_{f})}{v_{f}} \tag{2} $$

MMTC therefore measures the reaction time available to the FV to avoid a collision when the PV decelerates abruptly. A lower MMTC value means higher collision risk, because the driver has less time to react; in this way MMTC evaluates the potential collision risk posed by abrupt deceleration of the PV.
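To make the surrogates concrete, the sketch below computes MTC and MMTC for a single time step of a leader-follower pair. It assumes SI units and treats $a_p$ and $a_f$ as positive deceleration magnitudes defaulting to 0.7g. MMTC is computed from stopping distances: the FV covers $v_f t$ during its reaction time $t$ and then brakes, so the largest $t$ that avoids a collision is the remaining margin $x_r + v_p^2/(2a_p) - v_f^2/(2a_f)$ divided by $v_f$. The function names are illustrative, not taken from the paper.

```python
G = 9.81  # gravitational acceleration (m/s^2)
DEFAULT_DECEL = 0.7 * G  # assumed braking deceleration magnitude for PV and FV


def braking_distance(v: float, a: float) -> float:
    """Distance needed to stop from speed v (m/s) at constant deceleration a (m/s^2)."""
    return v ** 2 / (2.0 * a)


def mtc(x_r: float, v_p: float, v_f: float,
        a_p: float = DEFAULT_DECEL, a_f: float = DEFAULT_DECEL) -> float:
    """Margin to Collision: (gap + PV braking distance) / FV braking distance.

    Values above 1 mean the FV could stop behind the PV if both brake at once.
    """
    if v_f <= 0:
        raise ValueError("MTC is undefined for a stationary following vehicle")
    return (x_r + braking_distance(v_p, a_p)) / braking_distance(v_f, a_f)


def mmtc(x_r: float, v_p: float, v_f: float,
         a_p: float = DEFAULT_DECEL, a_f: float = DEFAULT_DECEL) -> float:
    """Reaction time (s) the FV can afford before braking without hitting the PV."""
    if v_f <= 0:
        raise ValueError("MMTC is undefined for a stationary following vehicle")
    margin = x_r + braking_distance(v_p, a_p) - braking_distance(v_f, a_f)
    return margin / v_f


if __name__ == "__main__":
    # 20 m gap, both vehicles at 15 m/s: the braking distances cancel in MMTC.
    print(round(mmtc(20.0, 15.0, 15.0), 3))  # 1.333
    print(round(mtc(20.0, 15.0, 15.0), 3))
```

Under these assumptions a negative MMTC means that even an instantaneous reaction would not prevent a collision, which is why low MMTC values are treated as risky.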

3.2 Key Features Extraction

In this paper, the vehicle acceleration $a_f$, relative distance $x_r$, and relative velocity $v_r$ are used to recognize driving style. The Discrete Fourier Transform (DFT) and a statistical method (SM) are each used to extract key features from the vehicle trajectory; the extracted parameters are intended to capture most of the distributional information in the trajectory.
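The paper does not specify which DFT coefficients or statistics make up the feature vector, so the sketch below makes hypothetical choices: the magnitudes of the first `n_coeffs` coefficients of the one-sided DFT (via NumPy's `np.fft.rfft`) as DFT features, and the mean, standard deviation, minimum, maximum, and 15th/85th percentiles as SM features. Each of $a_f$, $x_r$, and $v_r$ is processed separately and the results are concatenated; the dictionary keys are assumptions.

```python
import numpy as np

SIGNALS = ("a_f", "x_r", "v_r")  # hypothetical keys for the three trajectory variables


def dft_features(signal, n_coeffs: int = 5) -> np.ndarray:
    """Magnitudes of the first n_coeffs one-sided DFT coefficients, scaled by length."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x) / x.size
    mags = np.abs(spectrum[:n_coeffs])
    # Short signals have fewer coefficients; pad so every vector has equal length.
    return np.pad(mags, (0, n_coeffs - mags.size))


def statistical_features(signal) -> np.ndarray:
    """Summary statistics of the signal (a hypothetical SM feature set)."""
    x = np.asarray(signal, dtype=float)
    return np.array([x.mean(), x.std(), x.min(), x.max(),
                     np.percentile(x, 15), np.percentile(x, 85)])


def extract_features(trajectory: dict, method: str = "dft") -> np.ndarray:
    """Concatenate per-variable features for one leader-follower trajectory."""
    if method == "dft":
        fn = dft_features
    elif method == "sm":
        fn = statistical_features
    else:
        raise ValueError(f"unknown method: {method}")
    return np.concatenate([fn(trajectory[name]) for name in SIGNALS])
```

Because trajectories differ in length, fixed-size summaries like these let every driver be represented by a vector of the same dimension, which the classifiers require.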

3.3 Recognition Algorithms

Four machine learning algorithms, i.e., support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), and multilayer perceptron (MLP), are adopted to build the driving style recognition model. The model inputs are the features extracted in Sect. 3.2, and the output is the driving style. The process for recognizing an aggressive driving style is as follows:

Step 1: Use leave-one-out to split the data into training and test sets. Each time, one sample is held out as the test set and the remaining samples form the training set, ensuring that the training set contains both calm and aggressive driving styles.

Step 2: To remove the influence of differing units and scales among the trajectory variables, normalize the sample data with the min–max method.

Step 3: Apply the Differential Evolution (DE) algorithm to optimize the parameters of each algorithm and obtain the parameter settings of the optimized model.

Step 4: Use the four algorithms to identify aggressive driving style under normal and congested traffic.

Step 5: Evaluate model performance.
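Steps 1 to 3 can be sketched with scikit-learn and SciPy as below. This is an illustration under assumptions, not the paper's code: a linear-kernel SVM stands in for the four algorithms, DE (`scipy.optimize.differential_evolution`) searches only over $\log_{10} C$, and the DE objective is the leave-one-out error on the given data. Placing `MinMaxScaler` inside the pipeline means it is refit on each training fold, so the held-out sample does not influence the normalization. Using the same leave-one-out error both to tune and to report accuracy is optimistic; a stricter setup would run DE inside each training fold.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def loo_accuracy(X: np.ndarray, y: np.ndarray, C: float) -> float:
    """Leave-one-out accuracy of a min-max scaled, linear-kernel SVM."""
    model = make_pipeline(MinMaxScaler(), SVC(kernel="linear", C=C))
    # Each fold holds out one sample; the scaler is fit on the other n-1 only.
    return float(cross_val_score(model, X, y, cv=LeaveOneOut()).mean())


def tune_svm_with_de(X, y, log10_c_bounds=(-3.0, 3.0), seed=0):
    """Search log10(C) with differential evolution to minimise leave-one-out error."""
    def objective(params):
        return 1.0 - loo_accuracy(X, y, 10.0 ** params[0])

    result = differential_evolution(objective, [log10_c_bounds],
                                    maxiter=5, popsize=5, seed=seed, polish=False)
    return 10.0 ** result.x[0], 1.0 - result.fun


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in data: 20 "calm" (0) and 20 "aggressive" (1) feature vectors.
    X = np.vstack([rng.normal(0.0, 1.0, (20, 4)), rng.normal(3.0, 1.0, (20, 4))])
    y = np.array([0] * 20 + [1] * 20)
    best_c, acc = tune_svm_with_de(X, y)
    print(f"best C = {best_c:.3g}, LOO accuracy = {acc:.3f}")
```

Searching $\log_{10} C$ rather than $C$ itself lets DE explore several orders of magnitude with a small population.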

4 Results and Discussion

In this paper, the I-80 trajectory dataset of the Next Generation Simulation (NGSIM) program is used to study driving style. Data analysis shows that the aggregate flows of the HOV lanes during the two periods are 250 and 398 vph, respectively, indicating two levels of traffic flow. A total of 370 leader-follower vehicle pairs (LVP) are selected under congested and normal traffic, since these pairs involve fewer interrupting vehicles from other lanes.

4.1 Significance Analysis Considering Traffic Levels

Based on the trajectory data extracted from NGSIM, a significance analysis of trajectory features across traffic levels was conducted; the results are shown in Table 1. Table 1 shows no significant difference in the inverse time to collision (ITTC) surrogate between traffic levels. Drivers in normal traffic tend to keep a higher velocity (17.314 m/s) and lower acceleration (0.040 m/s²) when following a preceding vehicle. Drivers in congested traffic keep a larger velocity difference with the preceding vehicle to stay safe, while their gap is smaller. Moreover, ITTC, time headway (THW), and MMTC are all smaller in congested traffic. Traffic condition should therefore be taken into account when labelling risky car-following maneuvers.
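The paper does not state which statistical test produced Table 1. As a hypothetical illustration, the sketch compares each feature between normal and congested traffic with Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`) and, as a distribution-free check, the Mann-Whitney U test (`scipy.stats.mannwhitneyu`), at a 5% significance level. The input layout (feature name mapped to a 1-D array of per-pair values) is an assumption.

```python
import numpy as np
from scipy import stats


def compare_traffic_levels(normal: dict, congested: dict, alpha: float = 0.05) -> dict:
    """Per-feature tests of whether normal and congested samples differ.

    `normal` and `congested` map feature names (e.g. "MMTC", "THW") to 1-D arrays.
    """
    results = {}
    for name in normal.keys() & congested.keys():
        a = np.asarray(normal[name], dtype=float)
        b = np.asarray(congested[name], dtype=float)
        welch_p = stats.ttest_ind(a, b, equal_var=False).pvalue
        mwu_p = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue
        results[name] = {
            "mean_normal": float(a.mean()),
            "mean_congested": float(b.mean()),
            "welch_p": float(welch_p),
            "mannwhitney_p": float(mwu_p),
            "significant": bool(welch_p < alpha),
        }
    return results
```

Welch's variant is used here because the two traffic levels need not share the same variance.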

Table 1 Significance analysis considering traffic level

4.2 The Sample Data Label

Figure 1 shows the MMTC data for normal and congested traffic flow fitted with three distributions: the normal, logistic, and t distributions. The t distribution fits better than the other two, so it is adopted to determine the threshold values of the features. The 85th percentile of the fitted cumulative distribution is taken as the threshold that divides car-following maneuvers into segments with two risk levels, safe and risky.
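A sketch of the fitting step with `scipy.stats`: each candidate distribution is fitted by maximum likelihood and compared by AIC, and the 85th percentile of the selected fit is taken as the threshold. AIC is an assumption here, since the paper does not state its goodness-of-fit criterion. The same routine would be run separately on the normal-traffic and congested-traffic MMTC samples.

```python
import numpy as np
from scipy import stats

CANDIDATES = {"normal": stats.norm, "logistic": stats.logistic, "t": stats.t}


def select_distribution(samples):
    """Fit each candidate by maximum likelihood and return the lowest-AIC fit."""
    x = np.asarray(samples, dtype=float)
    fits = {}
    for name, dist in CANDIDATES.items():
        params = dist.fit(x)
        log_lik = np.sum(dist.logpdf(x, *params))
        aic = 2 * len(params) - 2 * log_lik  # penalise the extra t parameter (df)
        fits[name] = (aic, params)
    best = min(fits, key=lambda name: fits[name][0])
    return best, fits[best][1], {name: fit[0] for name, fit in fits.items()}


def percentile_threshold(samples, q: float = 0.85) -> float:
    """q-quantile of the selected fitted distribution, used as the risk threshold."""
    best, params, _ = select_distribution(samples)
    return float(CANDIDATES[best].ppf(q, *params))
```

Taking the quantile from the fitted distribution rather than the raw sample smooths the threshold when the number of observations in a traffic level is small.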

Fig. 1

Statistical fitting curves for MMTC for normal traffic flow and congested traffic flow

Each driver's trajectory can be divided into several segments, each belonging to one of the two risk levels. Figure 2 shows the trajectory segments of one selected driver, split according to the MMTC threshold values.
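Assuming a per-frame MMTC series for one driver and a threshold from the fitted distribution, the sketch marks frames whose MMTC falls below the threshold as risky (lower MMTC means less time to react), groups consecutive frames with the same label into segments, and computes the fraction of risky frames. Measuring the risky share in frames rather than in segment counts is a hypothetical choice.

```python
import numpy as np


def risk_labels(mmtc_series, threshold: float) -> np.ndarray:
    """1 for risky frames (MMTC below the threshold), 0 for safe frames."""
    return (np.asarray(mmtc_series, dtype=float) < threshold).astype(int)


def split_segments(labels) -> list:
    """Group consecutive frames with equal labels into (start, end, label) runs.

    `end` is exclusive, so frames start..end-1 belong to the segment.
    """
    labels = np.asarray(labels)
    if labels.size == 0:
        return []
    change = np.flatnonzero(np.diff(labels)) + 1  # first index of each new run
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [labels.size]))
    return [(int(s), int(e), int(labels[s])) for s, e in zip(starts, ends)]


def risky_ratio(labels) -> float:
    """Share of frames labelled risky (a hypothetical definition of the ratio)."""
    labels = np.asarray(labels)
    return float(labels.mean()) if labels.size else 0.0
```

For example, an MMTC series of `[3, 3, 1, 1, 1, 3]` with a threshold of 2 yields segments `(0, 2, 0)`, `(2, 5, 1)`, `(5, 6, 0)` and a risky ratio of 0.5.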

Fig. 2

Trajectory segments for a driver based on threshold values of MMTC

The proportions of trajectory segments at each risk level determine each driver's driving style. The K-means algorithm is applied separately to the trajectories under each traffic level to classify drivers as calm or aggressive based on their ratio of risky maneuvers. The clustering yields 246 calm and 124 aggressive drivers under normal traffic flow, and 200 calm and 170 aggressive drivers under congested traffic flow.
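The clustering step can be sketched as two-cluster K-means (Lloyd's algorithm) on the one-dimensional risky ratio, written directly in NumPy. Initializing the centres at the smallest and largest ratios and naming the higher-ratio cluster "aggressive" are assumptions for illustration; the paper does not give its initialization.

```python
import numpy as np


def kmeans_calm_aggressive(ratios, max_iter: int = 100):
    """Two-cluster K-means on per-driver risky ratios."""
    x = np.asarray(ratios, dtype=float)
    centers = np.array([x.min(), x.max()])  # deterministic initialisation
    for _ in range(max_iter):
        # Assign each driver to the nearest centre.
        assign = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # Move each centre to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([
            x[assign == k].mean() if np.any(assign == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the converged centres.
    assign = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    aggressive = assign == int(np.argmax(centers))
    return np.where(aggressive, "aggressive", "calm"), centers
```

Running this once per traffic level, on that level's ratios, reproduces the idea of separate calm/aggressive splits for normal and congested flow.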

4.3 Driving Style Recognition

The SVM method is adopted to recognize driving style under each traffic level. The trajectory variables used are the vehicle acceleration $a_f$, relative distance $x_r$, and relative velocity $v_r$, and key features are extracted from them with the Discrete Fourier Transform (DFT) and the statistical method (SM), respectively. The z-score method is used to standardize the features before model training.

Accuracy, precision, and recall are used to evaluate the model's ability to recognize aggressive drivers among all vehicles, and performance is estimated with leave-one-out cross-validation. Driving style recognition results for the different feature extraction methods with SVM are shown in Table 2. Unless otherwise stated, the SVM uses a linear kernel.
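With aggressive drivers as the positive class, the three metrics reduce to counts over the leave-one-out predictions: precision is the share of drivers predicted aggressive who truly are, and recall is the share of truly aggressive drivers who are found. A plain-Python sketch (the function name is hypothetical):

```python
def classification_metrics(y_true, y_pred, positive="aggressive"):
    """Accuracy, precision, and recall with `positive` as the target class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed aggressive drivers
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

For instance, three truly aggressive drivers of whom two are detected, with no false alarms among four drivers, give accuracy 0.75, precision 1.0, and recall 2/3.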

Table 2 The recognition results of driving style based on SVM with DFT and SM

As shown in Table 2, DFT outperforms SM for feature extraction, reaching an accuracy of 91.7%, and for every feature combination the SM-based accuracy is lower than the DFT-based accuracy. In general, the features $x_r$ and $v_r$ perform better than $a_f$ in recognizing driving style. A possible reason is that the driving style label is determined by rear-end collision risk, which depends on the relative motion of the two vehicles, whereas $a_f$ describes only the following vehicle and cannot capture that relative motion.

The performance of the four machine learning algorithms RF, MLP, KNN, and SVM, using all features extracted with DFT, is compared in terms of accuracy, precision, and recall in Table 3. SVM outperforms the other algorithms, and RF ranks second. However, MLP gives the highest recall among all candidates. KNN, the simplest classification method, unsurprisingly performs worst. As Table 4 shows, the recognition accuracy without traffic levels is 76.50%, lower than the accuracy obtained when traffic level is considered. Traffic levels should therefore be taken into account.
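A comparison in the spirit of Table 3 can be sketched by generating leave-one-out predictions for each algorithm with scikit-learn and scoring them on the aggressive class. The hyperparameters below are untuned placeholders rather than the DE-optimized values used in the paper, and `X` is assumed to hold the DFT features with `y` coded 1 for aggressive and 0 for calm.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def candidate_models(seed: int = 0) -> dict:
    """Untuned stand-ins for the four algorithms (hyperparameters are assumptions)."""
    return {
        "SVM": SVC(kernel="linear"),
        "RF": RandomForestClassifier(n_estimators=50, random_state=seed),
        "MLP": MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                             max_iter=500, random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }


def compare_models(X, y, positive=1) -> dict:
    """Leave-one-out predictions for each model, scored on the aggressive class."""
    scores = {}
    for name, clf in candidate_models().items():
        # Scaling inside the pipeline keeps each held-out sample unseen by the scaler.
        model = make_pipeline(MinMaxScaler(), clf)
        y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
        scores[name] = {
            "accuracy": accuracy_score(y, y_pred),
            "precision": precision_score(y, y_pred, pos_label=positive, zero_division=0),
            "recall": recall_score(y, y_pred, pos_label=positive, zero_division=0),
        }
    return scores
```

Because leave-one-out predicts every sample exactly once, `cross_val_predict` returns one prediction per driver, so all three metrics are computed over the full sample.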

Table 3 The recognition results of driving style based on RF
Table 4 The recognition results of driving style with and without traffic level

5 Conclusion

In this study, a novel labelling method is proposed that assigns calm and aggressive labels based on collision risk while incorporating traffic level; such labels provide the sample data that supervised machine learning requires. The main findings can be summarized as follows:

The rear-end collision risk surrogate MMTC is adopted to evaluate risk during car following. Each driver's trajectory can be divided into segments at two risk levels using traffic-level-specific thresholds, and all drivers can be grouped as calm or aggressive using the K-means algorithm.

Two feature extraction methods, DFT and SM, are compared in the recognition model, and three machine learning algorithms, RF, MLP, and KNN, are compared with SVM. The results show that DFT better captures the characteristics of driving behavior, and driving style is recognized with the highest accuracy of 91.7% using SVM.

This study opens the possibility of developing more sophisticated driving style recognition methods. In further work, the proposed method can be extended by selecting other features that reflect driving style more accurately. Driving style is also influenced by road conditions, which could likewise be incorporated to improve driving style recognition.