
1 Introduction

Since the world's first expressway was completed and opened to traffic in the 1930s, countries around the world have vigorously pursued expressway planning and construction. By the end of 2006, the total length of expressways in China had reached 45,300 km [1]. Expressways not only promote economic growth and social development, but also bring huge economic and social benefits [2]. However, with the rapid development of expressways, problems have emerged in traffic operation and management. As traffic flow continuously increases, congestion and traffic accidents become more frequent, and their impact grows accordingly [3]. To reduce the negative impact of traffic incidents on expressway operation, research on Traffic Incident Detection algorithms, which rapidly detect and identify traffic incidents so that traffic flow guidance can be applied, can effectively reduce the impact of traffic incidents on traffic operation [4].

2 Related Works

Expressway incident detection is the core of the expressway traffic management system, so research on incident detection has produced impressive results [3, 5, 6]. An early class of algorithms compares the traffic parameter data between neighboring detector stations to identify possible sudden traffic accidents [6].

In 1990, based on catastrophe theory, Persaud et al. developed the McMaster algorithm. The algorithm uses a large amount of uncongested and congested traffic data to establish a template of the distribution relationship between flow and occupancy [6].

In 1995, Cheu et al. developed an algorithm based on a multi-layer feedforward (MLF) network, which is composed of an input layer, a hidden layer and an output layer [6].

In recent years, with increased scientific research funding in this area, experts and scholars in the transportation field have proposed various Traffic Incident Detection algorithms. Domestic research on AID algorithms mainly focuses on the application of new theories and technologies, including wavelet transforms, BP networks, SVM, etc. [7, 8].

Li Wenjiang and Jing Bian proposed an incident detection algorithm based on wavelet analysis; its basic idea is to use logical judgment to determine whether a traffic incident has occurred [9].

Lv Qi and Wang Hui proposed a traffic detection algorithm based on a dynamic BP network. The algorithm builds on the training algorithm of the static BP network and remedies its slow convergence [10].

Liang Xinrong and Liu Zhiyong proposed an SVM algorithm based on the principle of structural risk minimization. Exploiting the differences in traffic flow parameters between incident and incident-free conditions, traffic flow parameters are used as the input samples of the algorithm [11].

3 Support Vector Machine Theory

In 1993, Vapnik et al. developed a new trainable machine learning method: Support Vector Machine (SVM) theory. It is a model based on statistical learning theory, the maximum margin classification principle and the structural risk minimization principle. The Support Vector Machine algorithm has good generalization ability and is well suited to the limited, small-sample problems encountered in pattern recognition and regression analysis [1,2,3,4,5].

3.1 Description of Classification Problem

Consider the following classification problem. The training set containing l sample points is:

$$\mathrm{T}=\{\left({x}_{1},{y}_{1}\right),\dots ,\left({x}_{l},{y}_{l}\right)\}\in {(\mathrm{X}\times \mathrm{Y})}^{l}$$
(1)

where the input vector is \({\mathrm{x}}_{i}\in X={R}^{n}\) and the output is \({y}_{i}\in Y=\){+1, −1}, i = 1, 2, …, l. The training set is the collection of these l sample points. For a given new input x, the task is to infer whether its output is +1 or −1 according to the training set.

The classification problem is described in mathematical language as follows:

For a given training set \(\mathrm{T}=\{\left({x}_{1},{y}_{1}\right),\dots ,\left({x}_{l},{y}_{l}\right)\}\in {(\mathrm{X}\times \mathrm{Y})}^{l}\), where \({\mathrm{x}}_{i}\in X={R}^{n}\) and \({y}_{i}\in Y=\){+1, −1}, i = 1, 2, …, l, find a real-valued function g(x) on X = Rn. As this shows, the essence of solving the classification problem is to find a rule that divides the points of Rn into two parts.

A machine that solves such a two-class problem is called a classification learning machine. When g(x) differs, the decision function f(x) = sgn(g(x)) also differs.

Fig. 1. Linearly separable problem.

As shown in Fig. 1, the linearly separable problem is a classification problem in which the two types of training samples can easily be separated by a straight line.

Fig. 2. Linearly inseparable problem.

As shown in Fig. 2, the linearly inseparable problem is a classification problem in which the training samples can only be roughly separated by a straight line.

Fig. 3. Nonlinearly separable problem.

As shown in Fig. 3, the nonlinearly separable problem is a classification problem in which dividing the samples with a straight line would produce large errors. The Support Vector Machines corresponding to the above three classification problems are the linearly separable, linearly inseparable and nonlinearly separable Support Vector Machines.

3.2 Linear Separable Support Vector Machine

Optimal Separating Hyperplane

In the two-dimensional linearly separable situation shown in Fig. 4, solid and hollow points represent the two types of training samples. H0 is the classification line that completely separates the two types of samples, and w is its normal vector. Suppose the normal direction w of the optimal classification line H0 has been determined. Moving H0 up and down in parallel until it touches a sample of each type yields the boundary lines H1 and H−1; the line midway between H1 and H−1 is the best classification line among these candidates. (w · x) + b = +1 is the normalized representation of the line H1, and (w · x) + b = −1 is the normalized representation of the line H−1. The distance between the two classification boundary lines is 2/\(\left\| w \right\|\). The optimal classification line is therefore obtained by choosing the normal direction w that maximizes the distance between H1 and H−1.

Fig. 4. The optimal classification surface.

It can be seen from Fig. 4 that only a few individual samples determine the maximum classification margin. The samples that fall exactly on the two classification boundary lines H1 and H−1 are called support vectors. The following analysis shows why Support Vector Machines use the principle of maximizing the classification margin:

For any training sample (x, y), the corresponding test sample has the form (x + \(\Delta_{{\rm x}}\), y), where the norm of the perturbation \(\Delta_{{\rm x}}\) is bounded above by a positive number r. If the training samples are separated by a hyperplane with margin ρ > r, then all the test samples are separated without error.

Since all training samples lie at a distance of at least ρ from the separating hyperplane, and the norm of any input sample xi (i = 1, …, l) is bounded, the perturbation does not change the division of the samples.

Linear Separable Support Vector Machine

2/\(\left\| w \right\|\) is the maximum classification margin in the linearly separable case. According to the principle of maximizing the classification margin, finding the Optimal Separating Hyperplane becomes the problem of minimizing \(\frac{1}{2}\left\| w \right\|^{2}\) over the normal direction w, subject to:

$$\left\{\begin{array}{ll}\left(w\cdot {x}_{i}\right)+b\le -1, & {y}_{i}=-1\\ \left(w\cdot {x}_{i}\right)+b\ge 1, & {y}_{i}=1\end{array}\right.$$
(2)

Combining the two cases of (2), we get \({y}_{i}\left(\left(w\cdot {x}_{i}\right)+b\right)\ge 1\). Constructing the Optimal Separating Hyperplane in the linearly separable case is thus transformed into the following quadratic programming problem:

$$\underset{w,b}{\mathrm{min}}\ \frac{1}{2}\left(w\cdot w\right)\quad \mathrm{s}.\mathrm{t}.\ {y}_{i}\left(\left(w\cdot {x}_{i}\right)+b\right)\ge 1,\ i=1,\dots ,l$$
(3)

Equation (3) is the primal problem for the Optimal Separating Hyperplane. By duality theory, it can be converted into a dual problem in the dual space. Define the Lagrange function as:

$$\mathrm{L}\left(w,b,a\right)=\frac{1}{2}\left(w\cdot w\right)-\sum\nolimits_{i=1}^{l}{a}_{i}\left[{y}_{i}\left(\left(w\cdot {x}_{i}\right)+b\right)-1\right],\quad {a}_{i}\ge 0$$
(4)
$$\frac{\partial L\left(w,b,a\right)}{\partial w}=w-\sum\nolimits_{i=1}^{l}{a}_{i}{y}_{i}{x}_{i}=0$$
(5)
$$\frac{\partial L(w,b,a)}{\partial b}=\sum\nolimits_{i=1}^{l}{a}_{i}{y}_{i}=0$$
(6)

According to the Karush-Kuhn-Tucker (KKT) theorem, the optimal solution should also satisfy:

$${a}_{i}\left({y}_{i}\left(\left(w\cdot {x}_{i}\right)+b\right)-1\right)=0,\quad \forall i$$
(7)

From Eq. (7) it can be seen that only the support vectors have non-zero coefficients \({a}_{i}\), so w can be expressed as:

$$\mathrm{w}=\sum\nolimits_{SV}{a}_{i}{y}_{i}{x}_{i}$$
(8)

Substituting Eqs. (5) and (6) into Eq. (4), we get:

$$\mathrm{L}\left({w}^{*},{b}^{*},a\right)=\sum\nolimits_{i=1}^{l}{a}_{i}-\frac{1}{2}{\Vert {w}^{*}\Vert }^{2}=\sum\nolimits_{i=1}^{l}{a}_{i}-\frac{1}{2}\sum\nolimits_{i=1}^{l}\sum\nolimits_{j=1}^{l}{a}_{i}{a}_{j}{y}_{i}{y}_{j}({x}_{i}\cdot {x}_{j})$$
(9)

The optimization problem of Eq. (3), transformed into the dual space, becomes the dual problem:

$$\mathrm{max}\ L\left(a\right)=\sum\nolimits_{i=1}^{l}{a}_{i}-\frac{1}{2}\sum\nolimits_{i=1}^{l}\sum\nolimits_{j=1}^{l}{a}_{i}{a}_{j}{y}_{i}{y}_{j}\left({x}_{i}\cdot {x}_{j}\right)$$
$$\mathrm{s}.\mathrm{t}.\;{a}_{i}\ge 0,i=\mathrm{1,2},\dots ,l$$
$$\sum\nolimits_{i=1}^{l}{a}_{i}{y}_{i}=0$$
(10)

If the coefficients \({a}_{i}^{*}\) are the optimal solution of the dual problem, then

$${\mathrm{w}}^{\mathrm{*}}=\sum\nolimits_{i=1}^{l}{a}_{i}^{*}{y}_{i}{x}_{i}$$
(11)

For any given test sample x, select a non-zero support vector coefficient \({a}_{i}^{*}\) and substitute it into Eq. (7) to obtain \({b}^{*}\); then put the obtained \({a}_{i}^{*}\), \({b}^{*}\), the training samples \({x}_{i}\), the training labels \({y}_{i}\) and the test sample x into Eq. (12):

$$\mathrm{f}\left(\mathrm{x}\right)=\mathrm{sgn}\left[\sum\nolimits_{SV}{a}_{i}^{*}{y}_{i}\left({x}_{i}\cdot x\right)+{b}^{*}\right]$$
(12)
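The training-and-prediction pipeline of Eqs. (10)–(12) is implemented in off-the-shelf libraries. Below is a minimal sketch using scikit-learn's SVC on hypothetical toy data; the data and the large C value (used to approximate the hard margin) are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of Eqs. (10)-(12) on hypothetical two-class toy data.
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds labeled -1 / +1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates the hard margin
clf.fit(X, y)

# w* = sum over support vectors of a_i* y_i x_i  (Eq. 11);
# dual_coef_ already stores the products a_i* y_i for each support vector.
w = clf.dual_coef_ @ clf.support_vectors_
b = clf.intercept_

# f(x) = sgn( sum a_i* y_i (x_i . x) + b* )  (Eq. 12)
x_new = np.array([[1.5, 1.8]])
print(np.sign(x_new @ w.T + b))  # agrees with clf.predict(x_new)
```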

3.3 Linear Inseparable Support Vector Machine

The basic idea for the linearly inseparable Support Vector Machine problem is to introduce a nonlinear transformation \(\phi (x)\) that maps the input space \({R}^{n}\) to a high-dimensional feature space \(\tilde{Z}\): \(X \subseteq {R}^{n} \xrightarrow{\phi} Z \subseteq \tilde{Z}\). The Optimal Separating Hyperplane is then found in the high-dimensional feature space, as shown in Fig. 5.

Fig. 5. The nonlinear mapping.

In the feature space, the Optimal Separating Hyperplane of the nonlinearly separable Support Vector Machine is obtained in the same way as in the linear case, with the inner product replaced by a kernel function. The kernel functions mainly used for classification problems are:

(1) Polynomial kernel function:

$$\mathrm{K}\left(x,{x}_{i}\right)={\left[\langle x\cdot {x}_{i}\rangle +1\right]}^{q}$$
(13)

(2) Gaussian radial basis kernel function:

$$\mathrm{K}\left(x,{x}_{i}\right)=\mathrm{exp}\left(-\frac{{\Vert x-{x}_{i}\Vert }^{2}}{{\sigma }^{2}}\right)$$
(14)

(3) Hyperbolic tangent kernel function:

$$\mathrm{K}\left(x,{x}_{i}\right)=\mathrm{tanh}\left[v\langle x\cdot {x}_{i}\rangle +c\right]$$
(15)

Because the final result is determined by a small number of support vectors, a large number of redundant samples can be "rejected" by capturing these key samples, which gives the method good "robustness" [6].
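For illustration, the three kernel functions of Eqs. (13)–(15) can be written down directly; the parameter values (q, sigma, v, c) below are illustrative assumptions, not values from the paper.

```python
# Sketch of the three kernels in Eqs. (13)-(15).
import numpy as np

def polynomial_kernel(x, xi, q=3):
    """K(x, xi) = (<x, xi> + 1)^q  -- Eq. (13)."""
    return (np.dot(x, xi) + 1) ** q

def gaussian_rbf_kernel(x, xi, sigma=6.0):
    """K(x, xi) = exp(-||x - xi||^2 / sigma^2)  -- Eq. (14)."""
    return np.exp(-np.sum((x - xi) ** 2) / sigma ** 2)

def tanh_kernel(x, xi, v=0.5, c=-1.0):
    """K(x, xi) = tanh(v <x, xi> + c)  -- Eq. (15)."""
    return np.tanh(v * np.dot(x, xi) + c)

x, xi = np.array([1.0, 2.0]), np.array([0.5, 1.5])
print(polynomial_kernel(x, xi), gaussian_rbf_kernel(x, xi), tanh_kernel(x, xi))
```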

4 Design of Event Detection Algorithm Based on SVM

Based on the principles of traffic incident detection and the Support Vector Machine theory introduced in the previous chapter, an SVM-based incident detection algorithm is designed to detect the occurrence of traffic incidents.

4.1 The Principle and Design Process of SVM-AID Algorithm

(1) The principle of the SVM-AID algorithm: according to the principle of maximum classification margin, the acquired data set is optimally classified.

(2) The design process of the SVM-AID algorithm is as follows:

Fig. 6. Design steps of the SVM-based incident detection algorithm.

As shown in Fig. 6, the realization of the SVM-AID algorithm generally goes through the following steps: first, determine the traffic parameter indicators used in this design; then normalize the data so that all values fall within [−1, 1]; next, select training and test samples and use them to train and test the SVM; finally, verify the suitability of the selected parameters.
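As a sketch of the normalization step, hypothetical traffic samples can be mapped to [−1, 1] as follows; the feature columns and values are assumed purely for illustration.

```python
# Sketch: map every traffic feature into [-1, 1] before training.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Rows are samples; columns are traffic parameters (e.g. density, flow).
X_train = np.array([[35.0, 1200.0], [80.0, 600.0], [20.0, 1500.0]])
X_test = np.array([[50.0, 900.0]])

scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_scaled = scaler.fit_transform(X_train)  # fit only on training data
X_test_scaled = scaler.transform(X_test)        # reuse the training min/max
```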

4.2 Traffic Parameter Selection

If all of the correlated indicators of each traffic sample were used, the complexity of the analysis would increase greatly. Given the relationships between the variable indicators, it is worth considering whether a few new indicators can replace the original ones.

Principal Component Analysis

(1) Basic concept: the Principal Component Analysis method replaces the original indicators with a few mutually independent indicators that retain the information to be presented.

(2) Basic principle: suppose there are p indicators and n samples, forming an n × p matrix:

$$\mathrm{X}=\left[\begin{array}{cccc}{x}_{11}&{x}_{12}&\cdots &{x}_{1p}\\ {x}_{21}&{x}_{22}&\cdots &{x}_{2p}\\ \vdots &\vdots & &\vdots \\ {x}_{n1}&{x}_{n2}&\cdots &{x}_{np}\end{array}\right]$$
(16)

Definition: denote x1, x2, …, \({x}_{p}\) as the original indicators and z1, z2, …, \({\mathrm{z}}_{m}\) (m ≤ p) as the new indicators.

$$\left\{\begin{array}{l}{z}_{1}={l}_{11}{x}_{1}+{l}_{12}{x}_{2}+\cdots +{l}_{1p}{x}_{p}\\ {z}_{2}={l}_{21}{x}_{1}+{l}_{22}{x}_{2}+\cdots +{l}_{2p}{x}_{p}\\ \cdots \cdots \\ {z}_{m}={l}_{m1}{x}_{1}+{l}_{m2}{x}_{2}+\cdots +{l}_{mp}{x}_{p}\end{array}\right.$$
(17)

How to determine the coefficient \({l}_{ij}\):

1) \({\mathrm{z}}_{i}\) and \({\mathrm{z}}_{j}\) (i ≠ j; i, j = 1, 2, …, m) are linearly independent of each other.

The new indicators z1, z2, …, \({\mathrm{z}}_{m}\) are then called the first, second, …, mth principal components of the original indicators x1, x2, …, \({\mathrm{x}}_{p}\).

From the above analysis it follows that the essence of Principal Component Analysis is to determine the loading \({\mathrm{l}}_{ij}\) (i = 1, 2, …, m; j = 1, 2, …, p) of the original indicator \({\mathrm{x}}_{j}\) on each principal component \({\mathrm{z}}_{i}\).

2) Calculation steps.

Assuming there are p indicators and n samples, the original data form an n × p matrix:

$$\mathrm{X}=\left[\begin{array}{cccc}{x}_{11}&{x}_{12}&\cdots &{x}_{1p}\\ {x}_{21}&{x}_{22}&\cdots &{x}_{2p}\\ \vdots &\vdots & &\vdots \\ {x}_{n1}&{x}_{n2}&\cdots &{x}_{np}\end{array}\right]$$
(18)
3) Standardize the raw data, then use the standardized matrix to compute the correlation coefficient matrix:

$${r}_{ij}=\frac{\sum\nolimits_{k=1}^{n}({X}_{ki}-{\bar{X}}_{i})({X}_{kj}-{\bar{X}}_{j})}{\sqrt{\sum\nolimits_{k=1}^{n}({X}_{ki}-{\bar{X}}_{i}{)}^{2}}\sqrt{\sum\nolimits_{k=1}^{n}({X}_{kj}-{\bar{X}}_{j}{)}^{2}}}$$
(19)

$$\mathrm{R}=({r}_{ij}{)}_{p\times p}$$
(20)
4) Use the correlation coefficient matrix to obtain the eigenvalues and the corresponding eigenvectors:

$${\mathrm{\lambda }}_{1}\ge {\mathrm{\lambda }}_{2}\ge \dots \ge {\mathrm{\lambda }}_{\mathrm{p}}>0$$
(21)

$${\mathrm{a}}_{1}=\left[\begin{array}{c}{a}_{11}\\ \vdots \\ {a}_{p1}\end{array}\right],{\mathrm{a}}_{2}=\left[\begin{array}{c}{a}_{12}\\ \vdots \\ {a}_{p2}\end{array}\right],\dots ,{\mathrm{a}}_{p}=\left[\begin{array}{c}{a}_{1p}\\ \vdots \\ {a}_{pp}\end{array}\right]$$
(22)
5) Calculate the contribution rate and cumulative contribution rate of the principal components.

Contribution rate:

$$\frac{{\lambda }_{i}}{\sum\nolimits_{k=1}^{p}{\lambda }_{k}},\quad i=1,2,\dots ,p$$
(23)

Cumulative contribution rate:

$$\frac{\sum\nolimits_{k=1}^{i}{\lambda }_{k}}{\sum\nolimits_{k=1}^{p}{\lambda }_{k}},\quad i=1,2,\dots ,p$$
(24)

The components corresponding to the eigenvalues whose cumulative contribution rate exceeds 85% are taken as the first m (m ≤ p) principal components.
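A compact sketch of these calculation steps, standardizing the data and keeping the principal components whose cumulative contribution rate first reaches 85%, might look as follows; the random data merely stand in for the paper's 350 density/flow/speed samples.

```python
# Sketch: PCA with components kept until cumulative contribution >= 85%.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(1).normal(size=(350, 3))  # density, flow, speed

X_std = StandardScaler().fit_transform(X)  # PCA on standardized data
pca = PCA().fit(X_std)                     # equivalent to using R = (r_ij)

cum = np.cumsum(pca.explained_variance_ratio_)  # Eq. (24)
m = int(np.searchsorted(cum, 0.85) + 1)         # smallest m with cum >= 85%
Z = pca.transform(X_std)[:, :m]                 # principal components z_1..z_m
```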

Acquisition and Discrimination of Traffic Input Data

In this paper, the critical congestion density \({k}_{m}\) is determined by analyzing the experimental results; the samples are then divided according to \({k}_{m}\).

Selection of Training Samples and Test Samples

The flow and density measured at the downstream detection station at times t, t−1 and t−2 are paired with the flow and density measured at the upstream detection station at times t−2, t−3 and t−4 to form the input. When selecting training samples, several groups of data can be chosen from this data as the design's test samples [7].
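A sketch of assembling one input sample under this pairing might look as follows; the array names and the helper function are hypothetical, introduced only for illustration.

```python
# Sketch: build one SVM input sample from upstream/downstream time series.
import numpy as np

def build_sample(down_flow, down_dens, up_flow, up_dens, t):
    """Feature vector: downstream flow/density at t, t-1, t-2 and
    upstream flow/density at t-2, t-3, t-4."""
    down_idx = [t, t - 1, t - 2]
    up_idx = [t - 2, t - 3, t - 4]
    return np.concatenate([down_flow[down_idx], down_dens[down_idx],
                           up_flow[up_idx], up_dens[up_idx]])
```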

4.3 Choice of Model and Kernel Function

Selection of Kernel Function.

The most commonly used kernel functions for SVM are the polynomial kernel and the radial basis kernel. The radial basis kernel is selected for the SVM-AID experiment; its form is as follows:

$$\mathrm{K}\left(\mathrm{x},{\mathrm{x}}_{\mathrm{i}}\right)=\mathrm{exp}\left(-\frac{{\Vert x-{x}_{i}\Vert }^{2}}{{\sigma }^{2}}\right)$$
(25)

Selection of Penalty Coefficient C and Kernel Width \(\sigma\).

In the SVM algorithm, one of the penalty coefficient C and the kernel width \(\sigma\) is fixed while the other is varied, and the same set of data is detected in each case.
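A sketch of this procedure, fixing one of C and \(\sigma\) while sweeping the other and scoring on held-out data, might look as follows; the synthetic data and candidate values are assumptions for illustration.

```python
# Sketch: fix one of (C, sigma), vary the other, compare accuracy.
import numpy as np
from sklearn.svm import SVC

# Hypothetical stand-in for the normalized traffic samples.
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

def accuracy(C, sigma):
    # sklearn's RBF kernel is exp(-gamma ||x - xi||^2), so gamma = 1 / sigma^2
    clf = SVC(kernel="rbf", C=C, gamma=1.0 / sigma ** 2).fit(X_train, y_train)
    return clf.score(X_test, y_test)

for C in [10, 100, 300, 1000]:   # kernel width fixed at sigma = 6
    print("C =", C, "accuracy =", accuracy(C, 6.0))
for sigma in [2, 4, 6, 8]:       # penalty coefficient fixed at C = 300
    print("sigma =", sigma, "accuracy =", accuracy(300, sigma))
```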

4.4 Algorithm Simulation and Result Analysis

Simulation Experiment 1: Using Principal Component Analysis to Select Input Traffic Flow Parameters.

In the traffic database, 350 sets of samples whose attributes are density, flow, and speed are selected. The experimental results are as follows:

Table 1. Results of simulation experiment 1.

The principal components 1, 2 and 3 in Table 1 correspond to the three variable indicators of density, flow and speed, respectively. In the following simulation experiments, the two indicators of density and flow are used to detect traffic incidents.

Simulation Experiment 2: Control Experiment with and Without Normalization.

The normalization method is used to reduce the span of the data. Two sets of data are selected; one set is normalized and the other is left unprocessed. The results are as follows:

Table 2. Results of simulation experiment 2.

By analyzing Table 2, the following conclusions can be drawn: (1) with normalization the data are processed faster, saving time; (2) normalization adjusts the non-uniform data scales to the common range [−1, 1]; (3) normalization speeds up convergence when the program runs.

Simulation Experiment 3: Choice of Penalty Coefficient C and Kernel Width \(\sigma\).

Because different penalty coefficients or kernel widths affect all of the algorithm's performance indicators, the following two experiments are designed to select the penalty coefficient and the kernel width. The experimental results are as follows:

Table 3. Experimental results with \(\sigma\) fixed and C varied.

From the analysis of Table 3 and Fig. 6, C = 300 is selected for SVM-AID. The experimental results are as follows:

Table 4. Experimental results with C fixed and \(\sigma\) varied.

As Table 4 shows, the average detection time keeps growing as the kernel width \(\sigma\) increases. When \(\sigma\) reaches 6, the classification accuracy no longer increases significantly while the detection time continues to rise; therefore \(\sigma\) = 6 is selected for SVM-AID.

Simulation Experiment 4: Four Road Sections Tested with the Selected Parameters.

Four sets of data from different road sections are chosen; each set consists of 30 groups of training data and 361 groups of test data. The experimental results are as follows:

Table 5. Results of simulation experiment 4.

It can be seen from Table 5 that the incident detection rate is as high as 98%. Comparing the results of the four road sections shows that the classification accuracy is slightly lower than the incident detection rate.

5 Summary

This article first analyzes in depth the characteristics of expressway traffic flow under traffic incident conditions. Then, based on Support Vector Machine theory, a Traffic Incident Detection algorithm using SVM is proposed. The traffic flow parameters, penalty coefficient and kernel width of the SVM-AID algorithm are determined through multiple experiments, the influence of each SVM parameter on traffic incident identification is analyzed, and the suitability of the selected parameters is verified.