1 Introduction

Current research on intelligent vehicles follows two directions: automatic driving and assisted driving. The former is mainly used in military research and other special settings, whereas the latter is aimed at general car users. Providing reasonable driving assistance clearly requires a correct understanding of the driver’s behavior. The modern road traffic system is a complex dynamic system composed of four elements (people, vehicles, roads and the environment), in which the driver’s behavior plays a key role in the normal operation of road traffic.

Driving behavior is a continuous, repeating cycle of information perception, judgment and decision, and action. In the perception stage, the driver perceives the road traffic environment and the performance and state of the vehicle, mainly through vision, hearing and touch. In the judgment and decision stage, the driver analyzes the perceived information and decides what to do. In the action stage, the driver operates the car according to that decision; the actions include starting, switching off the engine, changing speed, steering, braking, overtaking and operating the lights.

Research on the driver’s vision covers many topics, of which gaze behavior is the core. During driving, changes in the fixation point reflect the driver’s selective attention mechanism well. Selective attention is an important feature of natural human vision. On the one hand, when observing the external environment, humans are not interested in everything in it and focus on only some parts of it. On the other hand, selective attention changes over time; that is, the frequency at which the eyes sample the external environment is not fixed. Biologically, there is a macular region at the center of the retina, and the closer a point is to the macula, the higher the resolution. Log-polar coordinates have been applied to the logarithmic visual acuity chart, and related research shows that they can also represent the mapping between the retina and the striate cortex. The log-polar mapping is a conformal mapping. One of its advantages is that it reduces redundant information about the target; another is that it is scale and rotation invariant. Because of selective attention, we do not have to process all the information in the environment.

Studying drivers’ driving characteristics, analyzing the operations performed during driving and applying the results to the active safety of unmanned vehicles has both theoretical and practical value. This paper applies cognitive psychology to the study of driving behavior and combines the selective attention mechanism of psychophysics with the Weber–Fechner law. According to the Weber–Fechner law, every human sensation (visual, auditory and so on) is proportional not to the strength of the corresponding physical quantity but to its logarithm. This relationship allows us to greatly simplify the information when dealing with massive data from different sensors. The approach not only reduces the processing load on the server but also advances information computing for intelligent vehicles.

Applying the theory of hierarchical analysis, we take the safety and comfort of the intelligent vehicle as the starting point. We took data on human drivers’ perception behavior as the training set and carried out machine-learning regression analysis on the charts of vehicle speed versus visual field, vehicle speed versus fixation point, and vehicle speed versus dynamic vision. We then established linear and nonlinear regression models for the training set and verified the accuracy of the models by comparing the different regression analyses. A logarithmic relationship turned out to describe each of the three relationships better than the other models. On the application side, we adopted multi-sensor fusion and transformed the data acquired from radar, navigation and images into log-polar coordinates, so that information from different sensors is supplemented and cross-checked and a comprehensive description of the surrounding environment is formed. This approach makes full use of the redundancy and complementarity of the different sensors and obtains the information the intelligent vehicle needs. We also use the logarithmic model in an interactive system. Users can intervene in the intelligent vehicle in real time, and the interactive system lets people handle problems according to the actual situation and choose a reasonable scheme. It can also improve the task-execution ability of the intelligent vehicle to some extent. Through this human–machine interaction system we can control the intelligent vehicle, conveniently assign tasks to it and monitor their completion, which helps to explain and understand intelligent driving behavior.

The rest of this paper is organized as follows. In Sect. 2, we review related work on drivers’ behavior in China and abroad. In Sect. 3, we briefly introduce the regression method of machine learning. In Sect. 4, we present our experiment on the sample data using different regression analyses. In Sect. 5, we carry out the regression diagnostics. Finally, conclusions are drawn and future prospects are discussed in Sect. 6.

2 Recent studies

From the perspective of cognitive psychology, two psychological mechanisms mainly govern how the driver adapts to a complex environment: the attention mechanism and the working memory mechanism. Cognitive psychology holds that people are not entities that accept external stimuli mechanically and respond passively, but ones that acquire and process environmental stimuli actively and selectively. Using cognitive theory and methods to imitate human psychological activity and intelligent behavior in the selective attention mechanism raises the driver’s perception and understanding of the environment from a one-sided, discrete and passive level to a global, connected and active cognitive level.

Gaze behavior is the core of research on a driver’s visual characteristics. Changes of the fixation point during driving reflect the selection, duration and shifting of visual attention well. Selective attention is an important characteristic of natural human vision. On the one hand, people do not identify every visual object in a scene carefully and accurately; they usually perceive and understand only the areas of the scene that interest them. On the other hand, selective attention varies with time: the visual scene is sampled non-uniformly, and the sampling frequency is not fixed but changes with time [1]. Research shows that the retina does not acquire information from the outside world uniformly. There is a high-resolution fovea in the macula at the center of the retina, the resolution decreases gradually with increasing distance from the center, and the mapping from the retina to the striate cortex can be approximated by a log-polar mapping. The log-polar mapping is a conformal mapping that samples the visual field non-uniformly and reduces redundant information about the target, and it has the advantage of scale and rotation invariance. The selective attention mechanism may therefore solve the problems of real-time visual information processing and massive data in complex environments.
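The scale and rotation behavior of the log-polar mapping can be checked numerically. The following is a minimal sketch in Python (not the R environment used later in the paper); the function name `to_log_polar` and the sample points are hypothetical and chosen only for illustration.

```python
import math

def to_log_polar(x, y, x0=0.0, y0=0.0):
    """Map a Cartesian point to log-polar coordinates (rho, theta) about (x0, y0)."""
    dx, dy = x - x0, y - y0
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("the log-polar map is undefined at the centre")
    return math.log(r), math.atan2(dy, dx)

# Illustrative point, not taken from the paper.
rho1, theta1 = to_log_polar(3.0, 4.0)

# Scaling about the centre by a factor of 2 shifts rho by ln(2); theta is unchanged.
rho2, theta2 = to_log_polar(6.0, 8.0)
scale_shift = rho2 - rho1

# Rotating about the centre by phi shifts theta by phi; rho is unchanged.
phi = math.pi / 6
xr = 3.0 * math.cos(phi) - 4.0 * math.sin(phi)
yr = 3.0 * math.sin(phi) + 4.0 * math.cos(phi)
rho3, theta3 = to_log_polar(xr, yr)
rotation_shift = theta3 - theta1

print(scale_shift, rotation_shift)
```

In this sketch, scaling and rotation in the image plane both become pure translations in the (rho, theta) plane, which is the sense in which the mapping is described as scale and rotation invariant.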

Studying models of the driver’s visual perception, we found that such a model mainly has to answer three questions. The first is where the driver’s visual focus lies during actual driving. The second is how large the driver’s visual range is. Drivers always focus on the road in front of the car; the higher the speed, the more concentrated the driver’s attention is and the harder it is to shift. As the speed increases, the driver’s field of vision narrows and the fixation point moves forward. (Drawing on the literature on visual focus, we analyzed the main factors that affect vision and made the following assumption: when there is no conflict, the driver’s visual focus lies on the path that he or she expects to follow.) The third is how the driver’s dynamic visual perception behaves. Dynamic vision plays an important role in driving and varies with the vehicle speed.

Research on driving behavior is complex and difficult. Work in this area started earlier abroad, and over the past decades the modeling of human behavior in traffic has developed considerably. Recently, a great number of theoretical and computational models have been proposed for traffic flow and microscopic modeling. In 2016, Dehban et al. [2] used a cognition-based model of the driver’s steering behavior to explain how the driver acquires information in the visual field and how the driver manipulates the environment. At the same time, Schnelle et al. [3] put forward a driver steering model with personalized desired-path generation. In light of the growing attention to intelligent vehicle systems, Driggs-Campbell and Bajcsy [4] derived driver models, based on human perception and interaction with the environment (e.g., other vehicles on the road), that predict driver behavior. Tang et al. [5] proposed a model based on hidden Markov theory to interpret the dynamic decision-making process during the phase transition period at high-speed signalized intersections. The proposed model was found to predict stop-pass decisions with very high accuracy and revealed that approximately 50% of drivers used a two-step decision-making process. Maran et al. [6] studied a real-time, MPC-based motion cueing procedure with time-varying prediction for different classes of drivers. In 2015, Xiang et al. [7] put forward a closed-loop speed advisory model with driver’s behavior adaptability for eco-driving. In 2014, Saifuzzaman and Zheng [8] presented a literature review focusing on the latest advances in incorporating human behavior into car-following models.

Much research has also been done in China in recent years. In 2012, Tao [9] established the framework of a driving behavior model based on psychological field theory. In 2013, Zheng [10] proposed a cellular automaton model and a two-lane optimization speed model for the driver’s behavior in complex traffic scenes. In 2014, Zhang et al. [11] put forward a state analysis model of driving behavior built on a Bayesian network; it sets aside the relationship between the type of driving behavior and the observed driving data and directly expresses the correlation between the observational data and the state of driving behavior as a probability network. Ji [12] at Jilin University constructed a prediction model of driving behavior, put forward a prediction method based on the driver’s visual characteristics, and analyzed the impact of the behavior parameters and of driving-intention identification on driving behavior for the bridge. Tan [13] at Zhongnan University put forward a method for detecting traffic danger based on a visual attention model, a method of driver behavior modeling based on a Bayesian model, and a method of modeling dangerous driving based on fuzzy rules, which can be used to decide on a variety of dangerous driving behaviors. In 2015, Ren [14] identified the key driving operations for specific driving behaviors using principal components, by collecting and analyzing data on driving behavior and driving operation modes. Qu [15] at Jilin University proposed a method for predicting driving behavior based on a stochastic model, which describes the driver’s behavior characteristics effectively. Xu et al. [16] at Zhejiang University modeled the vehicle environment so that the car becomes a personalized space for the user and provides personalized in-car services.

Fig. 1 Interactive debugging diagram

Fig. 2 Log polar coordinate transformation diagram

The above models have the following shortcomings: their structure is complex and the conditions for applying them are strict. Although they can explain some traffic phenomena or simulate driver behavior well, they are hard to adopt because they cannot be solved, are difficult to solve, or take a long time to optimize. As a result, they are rarely applied. In addition, most of the models are built on the perceptual recognition used in existing microscopic car-following simulation models, and the researchers process the data with statistical analysis without considering the choice of input variables. This paper exploits the ability of log-polar coordinates to reduce redundant information about the scale characteristics of the target. We construct logarithmic models suited to the driver’s safety and comfort from the charts of vehicle speed versus visual field, vehicle speed versus fixation point, and vehicle speed versus dynamic vision, and we verify the accuracy of the models by comparing different regression analyses.

In this paper, the model is applied to the intelligent driver’s cognitive interactive debugging program (Fig. 1), which helps to understand intelligent driving behavior. Perceiving and understanding the surrounding environment is the basic premise of an intelligent vehicle: only when the information about the road, vehicles, pedestrians and other objects around the vehicle is perceived in a timely and accurate manner is there a reliable basis for decisions about the vehicle’s driving behavior. No single sensor can provide reliable information in every situation. The driver’s cognitive coordinate system is a log-polar coordinate system. This paper therefore adopts multi-sensor fusion and transforms the data acquired from radar, navigation and images into log-polar coordinates (Fig. 2), so that information from different sensors is supplemented and cross-checked and a comprehensive description of the surrounding environment is formed. This makes full use of the redundancy and complementarity of the different sensors and obtains the information the intelligent vehicle needs. The cognition and interaction procedure of the intelligent car runs on the Beijing Union University intelligent vehicle platform C70 (Fig. 3). The interactive debugging program shows that the human selective attention mechanism can solve the problems of real-time visual information processing and massive data in complex environments. It also shows that the higher the speed, the more concentrated the driver’s attention is and the harder it is to shift. In addition, the interactive debugging program strongly supports driving decisions for the intelligent vehicle on urban roads by constructing a driving situation map (Fig. 4) that fuses the information from the camera, radar and GPS/IMU sensors.

Fig. 3 C70 car

Fig. 4 The map of driving situation

Table 1 The sample data for the vehicle speed and the visual field as well as the fixation point
Table 2 The sample data for the vehicle speed and the dynamic vision

3 Regression methods of machine learning

This paper mainly uses machine-learning regression to model the sample data on the vehicle speed and the visual field [17], the vehicle speed and the fixation point [17], and the vehicle speed and the dynamic vision [18] (Tables 1, 2).

Regression analysis uses a sample (known data) to obtain a fitted equation, which is then used to predict unknown data. It includes analyzing the form of the relationship for the specific phenomenon under study. Depending on the purpose of the research, we must distinguish the independent variable from the dependent variable and determine the specific form of the equation relating them (Figs. 5 and 6). This paper takes the data from the charts of vehicle speed and visual field, vehicle speed and fixation point, and vehicle speed and dynamic vision as the samples for regression analysis.

Fig. 5 The first classification on statistical analysis of variables

Fig. 6 The second classification on statistical analysis of variables

3.1 The relationship between variables in regression analysis

Regression analysis examines the relevant factors, determines the relationship between cause and result, and expresses it as a specific equation in a mathematical model, on which further statistical analysis can be carried out. If, in the scatter diagram, the points of the independent and dependent variables lie along a straight line, or the calculated correlation coefficient indicates a significant linear correlation, the data can be fitted with a linear equation.

The main steps of regression analysis are to establish the regression model, to solve for its parameters and to test the model.

Two types of relationship between the independent and dependent variables arise in regression analysis. The first is a functional (deterministic) relationship, for example \(y=a+bx.\) In statistics, x is then said to decide y, or to be the decision factor of y.

The second is a correlation (non-deterministic) relationship: there is no exact relationship between x and y, but a common factor acts behind both, so that the data appear correlated. Whether such a relationship exists determines whether a regression model is appropriate, and the correlation coefficient determines whether linear regression is appropriate. The formula for the correlation coefficient is as follows:

$$\begin{aligned} \gamma _{xy} =\frac{\sum \nolimits _{i=1}^n {(x_i -\bar{{x}})(y_i -\bar{{y}})} }{\sqrt{\sum \nolimits _{i=1}^n {(x_i -\bar{{x}})^{2}} \sum \nolimits _{i=1}^n {(y_i -\bar{{y}})^{2}} }}. \end{aligned}$$
(1)
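As a hedged illustration of Eq. (1), the correlation coefficient can be computed directly from its definition. The sketch below is in Python rather than the R used in the experiments; the function name `pearson_r` and the sample values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient of Eq. (1), computed from its definition."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative values only: an exactly increasing and an exactly decreasing line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(xs, [2.0 * x + 1.0 for x in xs])
r_down = pearson_r(xs, [10.0 - 3.0 * x for x in xs])
print(r_up, r_down)
```

A value close to 1 or to -1 indicates a strong linear correlation, which is the situation in which a linear fit is appropriate.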

3.1.1 How to determine the parameters?

The sum of squared errors measures the gap between the predicted values and the actual ones. If the true value is y and the predicted value is \(\hat{{y}},\) then the sum of squared errors is \(\sum {(y-\hat{{y}})} ^{2}.\) We need to find the parameters that minimize the residual sum of squares \(RSS=\sum \nolimits _{i=1}^n (y_i -\hat{{y}}_i )^2 \) (Fig. 7).

Fig. 7 The square error between the predicted value and the actual one

The least-squares criterion is:

$$\begin{aligned} RSS=\sum \limits _{i=1}^n \left( y_i -\hat{{y}}_i\right) ^2 =\sum \limits _{i=1}^n \left[ y_i -\left( \alpha +\beta x_i\right) \right] ^2 . \end{aligned}$$
(2)

RSS is a function of \(\alpha \) and \(\beta . \) Taking the partial derivatives of RSS with respect to \(\alpha \) and \(\beta \) and setting them equal to zero gives the values of \(\alpha \) and \(\beta {\text {:}}\)

$$\begin{aligned} \beta =\frac{\sum \nolimits _{i=1}^n {(x_i -\bar{{x}})} (y_i -\bar{{y}})}{\sum \nolimits _{i=1}^n {(x_i -\bar{{x}})^{2}} }, \end{aligned}$$
(3)
$$\begin{aligned} \alpha =\bar{{y}}-\beta \bar{{x}}. \end{aligned}$$
(4)

Because the population is unknown, the values of a and b are estimated from the sample values:

$$\begin{aligned} b=\hat{{\beta }}=\frac{\sum \nolimits _{i=1}^n {(x_i -\bar{{x}})(y_i -\bar{{y}})} }{\sum \nolimits _{i=1}^n {(x_i -\bar{{x}})^2 } }, \end{aligned}$$
(5)
$$\begin{aligned} a=\hat{{\alpha }}=\bar{{y}}-b\bar{{x}}. \end{aligned}$$
(6)

Thus for each \(x_i ,\) we can predict the corresponding value of y through \(\hat{{y}}_i =a+bx_i. \)
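The estimates of Eqs. (5) and (6) can be sketched in a few lines. This is an illustrative Python version, not the R code used in the experiments; the names `fit_line` and `rss` and the four sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates a (intercept) and b (slope), as in Eqs. (5)-(6)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def rss(xs, ys, a, b):
    """Residual sum of squares of Eq. (2) for the line y = a + b x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative sample, not data from the paper.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
best = rss(xs, ys, a, b)

# Moving either parameter away from the estimate increases RSS.
worse_a = rss(xs, ys, a + 0.1, b)
worse_b = rss(xs, ys, a, b + 0.1)
print(a, b, best)
```

The last two evaluations illustrate why the zero-derivative condition identifies the minimum: any perturbation of a or b gives a larger residual sum of squares.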

4 Experiment

The experiment was carried out with the R software.

R is a software environment whose programming language is closely related to the S language, on which S-PLUS is based. It is a complete system for data processing, computation and graphics, and it interfaces well with other programming languages and databases.

4.1 Experiment results

  1. The hypothesis test of \(\beta _1 .\) We use hypothesis tests to interpret the summary data in the tables below. We take the test of \(\beta _1 \) as an example; the same test applies to \(\beta _0 .\) Here \(b_1 \) and \(b_0 \) denote the estimates of the slope \(\beta _1 \) and the intercept \(\beta _0 \) (b and a in Sect. 3.1.1). To test whether \(\beta _1 \) is 0, we construct a statistic t according to probability and mathematical statistics theory. The statistic t is as follows:

    $$\begin{aligned} s^{2}=\frac{1}{n-2}\sum \limits _{i=1}^n {\left( \hat{{y}}_i -y_i \right) ^2 } =\frac{1}{n-2}\sum \limits _{i=1}^n {e_i ^2 } , \end{aligned}$$
    (7)
    $$\begin{aligned} SE\left( b_1 \right) =\frac{s}{\sqrt{\sum \nolimits _{i=1}^n { (x_i -\bar{{x}})^2 } }}, \end{aligned}$$
    (8)
    $$\begin{aligned} t=\frac{b_1 -\beta _1 }{SE(b_1 )}. \end{aligned}$$
    (9)

    We then calculate the value of t and the corresponding area Pr (Fig. 9). The smaller the Pr value is, the better the result is; generally the Pr value is required to be less than 0.05. The steps of the hypothesis test are as follows:

    1. State the hypotheses: \(H_0{\text {:}}\,\beta _1 =0,\,H_1 {\text {:}}\,\beta _1 \ne 0.\)

    2. Choose the test level \(\alpha ,\) that is, the significance level at which the hypothesis is tested.

    3. Calculate the test statistic \(t=\frac{b_1 -\beta _1 }{SE(b_1 )},\) which follows a t-distribution.

    4. Calculate the \(\Pr \) value corresponding to the t value.

    5. Compare the test level with the calculated \(\Pr \) value. If \(\Pr \le \alpha ,\) reject \(H_0 \) and accept \(H_1.\) If \(\Pr >\alpha , \) do not reject \(H_0 .\)

    The statistic t follows a t-distribution with n − 2 degrees of freedom, which is similar in shape to the normal distribution (Fig. 8).

  2. The hypothesis test of \(\beta _0 .\) The estimate of \(\beta _0 \) is \(b_0 ,\) and it is also unbiased. Its standard error is \(SE(b_0 )=s\sqrt{\frac{\sum \nolimits _{i=1}^n {x_i ^{2}} }{n\sum \nolimits _{i=1}^n {(x_i -\bar{{x}})^{2}} }}=s\sqrt{\frac{1}{n}+\frac{\bar{{x}}^{2}}{\sum \nolimits _{i=1}^n {(x_i -\bar{{x}})^{2}} }}.\) Based on this, the statistic \(t=\frac{b_0 -\beta _0 }{SE(b_0 )}\) follows a t-distribution with n − 2 degrees of freedom. The null hypothesis is \(H_0{\text {:}}\,\beta _0 =0\) and the alternative hypothesis is \(H_1{\text {:}}\,\beta _0 \ne 0.\) We calculate the value of the test statistic and examine the corresponding two-sided P value, consistent with the alternative hypothesis. The steps of the hypothesis test of \(\beta _0 \) are as follows:

    1. State the hypotheses: \(H_0 {\text {:}}\,\beta _0 =0,\,H_1 {\text {:}}\,\beta _0 \ne 0.\)

    2. Choose the test level \(\alpha ,\) that is, the significance level at which the hypothesis is tested.

    3. Calculate the test statistic \(t=\frac{b_0 -\beta _0 }{SE(b_0 )},\) which follows a t-distribution.

    4. Calculate the P value corresponding to the t value.

    5. Compare the test level with the calculated P value. If P is less than or equal to \(\alpha , \) reject \(H_0 \) and accept \(H_1. \) If P is greater than \(\alpha ,\) do not reject \(H_0 \) (Tables 3, 4, 5).
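The two tests above can be sketched end to end using Eqs. (7)-(9) and the standard error of the intercept. This is an illustrative, standard-library-only Python version, not the R routine used in the paper; the function names and the four sample points are hypothetical, and the two-sided P value is obtained by numerically integrating the t density with Simpson's rule, which is an implementation choice made here for self-containment.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central = 2 * total * h / 3  # probability mass in [-|t|, |t|]
    return max(0.0, 1.0 - central)

def coefficient_tests(xs, ys):
    """t statistics and two-sided P values for H0: beta_1 = 0 and H0: beta_0 = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))  # Eq. (7)
    se_b1 = s / math.sqrt(sxx)                        # Eq. (8)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)   # standard error of b0
    df = n - 2
    t1, t0 = b1 / se_b1, b0 / se_b0                   # Eq. (9) under H0
    return {"t1": t1, "p1": two_sided_p(t1, df), "t0": t0, "p0": two_sided_p(t0, df)}

# Illustrative sample, not data from the paper.
result = coefficient_tests([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 7.0])
alpha = 0.05
reject_slope = result["p1"] <= alpha
reject_intercept = result["p0"] <= alpha
print(result, reject_slope, reject_intercept)
```

For this illustrative sample the slope test rejects \(H_0\) at the 0.05 level while the intercept test does not, which shows that the two coefficients are judged separately.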

Fig. 8 The t-distribution for different sample sizes (most of the probability lies in the middle part and little in the two tails. By observing which region the t value falls in, we can judge how likely it is that the hypothesis holds. The threshold value is 0.05 in this paper)

Fig. 9 The given test level \(\alpha \)

Table 3 The results of the vehicle speed and the visual field
Table 4 The results of the vehicle speed and the fixation point
Table 5 The results of the vehicle speed and the dynamic vision

4.2 Experiment conclusion

According to the Weber–Fechner law, every human sensation (visual, auditory, etc.) is proportional not to the strength of the corresponding physical quantity but to its logarithm.
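In its usual textbook form, the law can be written as follows. The symbols S, I, \(I_0 \) and k do not appear elsewhere in this paper and are introduced here only for illustration:

$$\begin{aligned} S=k\ln \frac{I}{I_0 }, \end{aligned}$$

where S is the perceived magnitude of the sensation, I is the intensity of the physical stimulus, \(I_0 \) is the threshold intensity and k is a constant that depends on the sense involved. Under this form, doubling I always adds the same amount \(k\ln 2\) to S, whatever the starting intensity, which is why a logarithmic model is a natural candidate for the relationships studied here.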

We verified the accuracy of the final model by comparing several regression models (linear regression, nonlinear regression and so on) using probability theory, hypothesis tests and significance tests from mathematical statistics. This comparison is discussed in detail in Sect. 5; here we present only the final models.

The relationship between the vehicle speed and the fixation point:

$$\begin{aligned} v_s ={-}235.312+52.856\ln f_p , \end{aligned}$$

where \(v_s \) represents the vehicle speed and \(f_p\) represents the fixation point.

The relationship between the vehicle speed and the dynamic vision:

$$\begin{aligned} v_s =46.982-120.071\ln v_d , \end{aligned}$$

where \(v_s\) represents the vehicle speed and \(v_d\) represents the dynamic vision.

The relationship between the vehicle speed and the visual field:

$$\begin{aligned} v_s =311.331-58.557\ln v_f , \end{aligned}$$

where \(v_s\) represents the vehicle speed and \(v_f\) represents the visual field.
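The three fitted models share the form \(v_s =a+b\ln x,\) so they can be evaluated and inverted with the same two functions. The sketch below is in Python and copies the coefficients from the equations above; the dictionary keys, function names and input values are hypothetical, and the units of the inputs are those of Tables 1 and 2, which are not restated here.

```python
import math

# Coefficients (a, b) of v_s = a + b * ln(x), copied from the models above.
MODELS = {
    "fixation_point": (-235.312, 52.856),
    "dynamic_vision": (46.982, -120.071),
    "visual_field": (311.331, -58.557),
}

def predicted_speed(name, value):
    """Vehicle speed given by the named logarithmic model for a positive input."""
    a, b = MODELS[name]
    if value <= 0:
        raise ValueError("the logarithm requires a positive input")
    return a + b * math.log(value)

def inverse_input(name, speed):
    """Input value x for which the named model returns the given speed."""
    a, b = MODELS[name]
    return math.exp((speed - a) / b)

# Round trip on an arbitrary illustrative input.
x = 60.0
round_trip = inverse_input("visual_field", predicted_speed("visual_field", x))
print(round_trip)
```

The signs of b match the qualitative behavior described in Sect. 2: the visual-field model decreases with its input (a wider field corresponds to a lower speed), while the fixation-point model increases with its input.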

Table 6 Normality distribution test table

5 Regression diagnostics

Any model that we construct must be verified to ensure that it simulates the real system reasonably. Model validation and evaluation is an important process and strongly affects how well the model performs. There are two main ways to compare the model with the real results: the first tests the model using data curves, and the second evaluates the effectiveness of the model using mathematical statistics. The former shows the simulation results intuitively; the latter quantifies the size of the error and evaluates the model results [7].

5.1 Regression diagnostics

(1) The first question in the regression analysis is whether the sample satisfies the normal distribution assumption. To test this, we perform a hypothesis test whose null hypothesis is that the sample, and hence the population, is normally distributed, and check whether this hypothesis can be rejected. If it can, the sample does not follow the normal distribution; otherwise, we can say that the sample follows the normal distribution.

From Table 6, we can see that the overall sample follows the normal distribution.

(2) The sample data used for the regression analysis are obtained by measurement or sampling, and any measurement produces errors. Some errors are large and some are small; reasonable errors have little impact on the model, and as the amount of data grows, positive and negative errors tend to cancel each other. However, there may be gross errors that deviate far from the normal data. Such data are called outliers. The question is whether there are outliers that cause the model to produce large errors. We can check for outliers by plotting the residuals against the predicted values (Figs. 10, 11, 12).
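The residual check can also be sketched numerically. The paper inspects residual plots by eye; the Python example below instead flags points whose standardized residual exceeds 2 in absolute value, which is a common rule of thumb assumed here and not a threshold stated in the paper. The data are synthetic (an exact line with one deliberately corrupted point), and all names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def standardized_residuals(xs, ys):
    """Residuals divided by the residual standard error s (n - 2 degrees of freedom)."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in residuals) / (len(xs) - 2))
    if s == 0.0:
        raise ValueError("perfect fit: residuals cannot be standardized")
    return [e / s for e in residuals]

# Synthetic data: the line y = 2x + 1 with one gross error added at x = 5.
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x + 1.0 for x in xs]
ys[4] += 8.0

THRESHOLD = 2.0  # assumed rule of thumb, not taken from the paper
z = standardized_residuals(xs, ys)
flagged = [i for i, value in enumerate(z) if abs(value) > THRESHOLD]
print(flagged)
```

Only the corrupted point is flagged in this synthetic case; on real data the flagged points would still need to be judged against the residual plots, as done in Figs. 10, 11 and 12.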

(3) Is the linear model reasonable? We assume that the relationship is linear, but many relationships in nature are not. A relationship may be a quadratic function, an exponential function or something more complex, possibly even one for which no analytical expression can be written. Whether the linear model is reasonable must therefore be determined by further statistical tests.

To help in reading the tables below, the summary data are interpreted as follows:

Signif: the significance mark. The more “*” there are, the higher the reliability: *** stands for extremely significant, ** for highly significant and * for significant. The number of * is tied to the Pr value (we construct a statistic t according to probability and mathematical statistics theory and then calculate the value of t and the corresponding area Pr). We mark one * if the value of Pr is about 0.05, two * if it is about 0.01 and three * if it is about 0.001. For convenience, we write “0*” if there are no *, “2*” if there are two and “3*” if there are three.

Fig. 10 The distribution of anomalous values between the vehicle speed and the visual field (two points deviate from the path, but they do not deviate much, so there are no outliers that have much impact on the model)

Fig. 11 The distribution of anomalous values between the vehicle speed and the fixation point (one point deviates from the path, but it does not deviate much, so there are no outliers that have much impact on the model)

Fig. 12 The distribution of anomalous values between the vehicle speed and the dynamic vision (no points deviate from the path, so there are no outliers that have much impact on the model)

P-value: we construct a statistic t according to probability and mathematical statistics theory and then calculate the value of t and the corresponding area. That area is the P-value. The smaller the P-value is, the better the result is.

Multiple R-squared: the larger the multiple R-squared is, the better the corresponding model fits the data.

Equation: the expression of the corresponding model.
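The Signif marking can be written as a small lookup. The Python sketch below assumes the cut-offs printed in the legend of R's regression summary (0.001, 0.01 and 0.05); the function names are hypothetical, and the "1*" label is an extrapolation of the "0*", "2*", "3*" notation described above.

```python
def significance_stars(p_value):
    """Number of stars for a P value, using the cut-offs 0.001, 0.01 and 0.05."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    for cutoff, stars in ((0.001, 3), (0.01, 2), (0.05, 1)):
        if p_value <= cutoff:
            return stars
    return 0

def signif_label(p_value):
    """Label in the '0*' / '2*' / '3*' notation used in the tables."""
    return f"{significance_stars(p_value)}*"

# Illustrative P values only.
labels = [signif_label(p) for p in (0.0005, 0.004, 0.03, 0.2)]
print(labels)
```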

As far as Signif is concerned, Table 7 clearly shows that the linear regression and the logarithmic function are better than the polynomial function, so we exclude the polynomial function. As for multiple R-squared, the higher it is, the better the equation fits, and here the linear regression is better than the logarithmic function. However, we cannot rely entirely on the results calculated by software; the theory should be considered first. According to the Weber–Fechner law, every human sensation (visual, auditory, etc.) is proportional not to the strength of the corresponding physical quantity but to its logarithm. Moreover, the multiple R-squared of the logarithmic function is 0.9566, which is relatively high. We therefore conclude that it is better to use the logarithmic function \(v_s =311.331-58.557\ln v_f \) (\(v_s \) represents the vehicle speed, \(v_f \) represents the visual field) to express the relationship between the vehicle speed and the visual field.

Table 7 The relationship between the vehicle speed and the visual field
Table 8 The relationship between the vehicle speed and the fixation point

As far as Signif is concerned, Table 8 shows that the logarithmic function has more * than the linear regression and the exponential function. The multiple R-squared of the logarithmic function is also higher than that of the linear regression and the exponential function. We therefore conclude that it is better to use the logarithmic function \(v_s =-235.312+52.856\ln f_p \) (where \(v_s \) represents the vehicle speed and \(f_p \) represents the fixation point) to express the relationship between the vehicle speed and the fixation point.

Table 9 The relationship between the vehicle speed and the dynamic vision

As far as Signif is concerned, Table 9 clearly shows that the linear regression and the logarithmic function are better than the polynomial function, so we exclude the polynomial function. As for the multiple R-squared, the higher it is, the better the equation fits the data, and by this measure the logarithmic function is better than the linear regression. We therefore conclude that it is better to use the logarithmic function \(v_s =46.982-120.071\ln v_d \) (where \(v_s \) represents the vehicle speed and \(v_d \) represents the dynamic vision) to express the relationship between the vehicle speed and the dynamic vision.
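The three logarithmic models selected above share the form \(v_s =a+b\ln x.\) As a minimal sketch, they can be collected in one place and evaluated as follows. The coefficients are copied from the equations in the text; the dictionary keys and the function name `predicted_speed` are hypothetical, and the units and valid input range are those of the original measurements, which are not restated here.

```python
import math

# (intercept a, slope b) of v_s = a + b*ln(x), copied from the text.
REPORTED_MODELS = {
    "visual_field": (311.331, -58.557),
    "fixation_point": (-235.312, 52.856),
    "dynamic_vision": (46.982, -120.071),
}

def predicted_speed(kind, value):
    """Evaluate the reported logarithmic model of the given kind at `value`."""
    if value <= 0:
        raise ValueError("the logarithm requires a positive input")
    a, b = REPORTED_MODELS[kind]
    return a + b * math.log(value)

if __name__ == "__main__":
    for kind, x in [("visual_field", 100.0), ("fixation_point", 335.0)]:
        print(kind, x, round(predicted_speed(kind, x), 3))
```

For instance, `predicted_speed("visual_field", 100.0)` evaluates to roughly 41.67 under the reported coefficients. Extrapolating far outside the measured range is not supported by the fit.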

(4) Does the error meet the assumptions of independence, equal variance (that is, the error theoretically does not change with the size of the variable and is unrelated to the dependent variable) and normal distribution? (The normal distribution is also called the Gaussian distribution; according to the central limit theorem, the error also follows the normal distribution.) (Table 10).

Table 10 Whether the error follows the normal distribution
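The text does not state which test was used for the normality check, so the following is only a sketch of one common option, the Jarque–Bera test, swapped in for illustration. It covers the normality assumption only, not independence or equal variance, and it relies on a large-sample approximation that is weak for very small samples. The residuals below are synthetic and the function name `jarque_bera` is hypothetical.

```python
import math
import numpy as np

def jarque_bera(residuals):
    """Jarque-Bera statistic and its asymptotic P-value.

    The statistic combines sample skewness and excess kurtosis; under
    normality it is approximately chi-square with 2 degrees of freedom,
    whose upper-tail probability is exp(-x / 2).
    """
    r = np.asarray(residuals, dtype=float)
    n = r.size
    centered = r - r.mean()
    m2 = np.mean(centered ** 2)
    skew = np.mean(centered ** 3) / m2 ** 1.5
    excess_kurt = np.mean(centered ** 4) / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return float(jb), p_value

# Synthetic residuals for illustration only.
rng = np.random.default_rng(1)
normal_resid = rng.normal(0.0, 1.0, size=500)
skewed_resid = rng.exponential(1.0, size=500)

jb_normal, p_normal = jarque_bera(normal_resid)
jb_skewed, p_skewed = jarque_bera(skewed_resid)
print(f"normal residuals: JB = {jb_normal:.3f}, P = {p_normal:.3f}")
print(f"skewed residuals: JB = {jb_skewed:.3f}, P = {p_skewed:.3g}")
```

A small P-value is evidence against normally distributed errors; a large one means the test found no such evidence.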

(5) If the relationship between the independent variables and the dependent variable is a multiple linear regression, does multicollinearity exist? How many truly independent variables (variables that cannot be expressed by the other variables) are there among the independent variables \(X_1 ,\,X_2 ,\,X_3 ?\) Solving the regression model requires inverting a matrix. If that matrix is singular or nearly singular, its determinant is zero or very close to 0, the error becomes particularly large, and the resulting model is basically meaningless. Independent variables that are not independent of one another in this sense are said to exhibit multicollinearity. To make this clearer, we explain collinearity as follows:

Collinearity: if there exist constants \(c_0 ,\) \(c_1 \) and \(c_2 ,\) with \(c_1 \) and \(c_2 \) not both zero, such that the linear equation \(c_1 x_1 +c_2 x_2 =c_0 \) holds for all the sample data, then the two variables \(X_1 \) and \(X_2 \) are called exactly collinear.

The discovery of multicollinearity: let \(x_{(1)} ,\,x_{(2)} ,\ldots ,x_{(p)} \) be the vectors obtained after the independent variables \(X_1 ,\,X_2 ,\ldots , X_p \) are centered and standardized, and denote the resulting matrix by \(X=(x_{(1)} ,\,x_{(2)} ,\ldots , x_{(p)} ).\) Let \(\gamma \) be an eigenvalue of \(X^{T}X\) and \(\delta \) the corresponding eigenvector of length 1, that is, \(\delta ^{T}\delta =1.\) If \(\gamma \approx 0,\) then \(X^{T}X\delta =\gamma \delta \approx 0;\) multiplying this equation on the left by \(\delta ^{T}\) gives:

$$\begin{aligned} \delta ^{T}X^{T}X\delta =\gamma \delta ^{T}\delta =\gamma \approx 0. \end{aligned}$$

Since the left-hand side equals \(\left\| {X\delta } \right\| ^{2},\) it follows that:

$$\begin{aligned} X\delta \approx 0. \end{aligned}$$

That is:

$$\begin{aligned} \delta _1 x_{(1)} +\delta _2 x_{(2)} +\cdots +\delta _p x_{(p)} \approx 0. \end{aligned}$$

As the above equation shows, for the independent variables \(x_{(1)} ,\,x_{(2)} ,\ldots ,x_{(p)} \) there exist coefficients \(\delta _1 ,\,\delta _2 ,\ldots , \delta _p ,\) not all zero, for which the above formula holds. That is to say, there is multicollinearity among the independent variables. An important indicator of the severity of multicollinearity is the condition number of the matrix \(X^{T}X,\) i.e.:

$$\begin{aligned} k\left( X^{T}X\right) =\left\| {X^{T}X} \right\| \cdot \left\| {\left( X^{T}X\right) ^{-1}} \right\| =\frac{r_{\max } (X^{T}X)}{r_{\min } (X^{T}X)}. \end{aligned}$$

In the above formula, \(r_{\max } (X^{T}X)\) and \(r_{\min } (X^{T}X)\) represent the largest and the smallest eigenvalues of \(X^{T}X.\) Usually, if \(k<100,\) the degree of multicollinearity is considered small; if \(100\le k\le 1000,\) there is moderate or strong multicollinearity; if \(k>1000,\) there is serious multicollinearity.
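The derivation above can be checked numerically. The sketch below builds a synthetic, deliberately near-collinear design (the third column is almost the sum of the first two; these are not experimental data), standardizes it, and confirms that the smallest eigenvalue \(\gamma \) of \(X^{T}X\) equals \(\left\| {X\delta } \right\| ^{2}\) for its unit eigenvector \(\delta .\) The helper names are hypothetical, and "standardized" is assumed here to mean centered columns scaled to unit length.

```python
import numpy as np

def standardize(columns):
    """Center each column and scale it to unit Euclidean length."""
    x = np.asarray(columns, dtype=float)
    x = x - x.mean(axis=0)
    return x / np.linalg.norm(x, axis=0)

def smallest_eigenpair(x):
    """Smallest eigenvalue of X^T X and its unit-length eigenvector."""
    gamma, vectors = np.linalg.eigh(x.T @ x)   # eigenvalues in ascending order
    return float(gamma[0]), vectors[:, 0]

# Synthetic near-collinear variables for illustration only.
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + x2 + 1e-3 * rng.normal(size=50)

X = standardize(np.column_stack([x1, x2, x3]))
gamma_min, delta = smallest_eigenpair(X)
residual_norm_sq = float(np.sum((X @ delta) ** 2))

print(f"smallest eigenvalue gamma = {gamma_min:.3e}")
print(f"||X delta||^2             = {residual_norm_sq:.3e}")
print(f"condition number k        = {np.linalg.cond(X.T @ X):.3e}")
```

Because \(\gamma \) is close to 0 here, the combination \(\delta _1 x_{(1)} +\delta _2 x_{(2)} +\delta _3 x_{(3)} \) is close to the zero vector, and the condition number is correspondingly large.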

After the matrix \(\left( {{\begin{array}{cc} {170}&{} {80} \\ {100}&{} {180} \\ {86}&{} {335} \\ {60}&{} {377} \\ {40}&{} {564} \\ {22}&{} {710} \\ \end{array} }} \right) \) is centered and standardized and the result is substituted into the above equation, the ratio of the maximum eigenvalue to the minimum eigenvalue is 30.38056. Because \(30.38056<100,\) the degree of multicollinearity between the independent variable representing the visual field and the independent variable representing the distance of the gaze point is very small.
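The check above can be sketched as follows, using the six rows of the matrix given in the text. The standardization convention (centered columns scaled to unit length) and the helper names are assumptions, since the text does not say which routine produced the reported ratio; a direct eigenvalue computation therefore gives a value close to, but not necessarily identical to, the reported figure, while the classification by the thresholds stated earlier is unaffected.

```python
import numpy as np

# Rows copied from the matrix in the text (two independent variables).
DATA = np.array([
    [170.0, 80.0],
    [100.0, 180.0],
    [86.0, 335.0],
    [60.0, 377.0],
    [40.0, 564.0],
    [22.0, 710.0],
])

def condition_number(data):
    """Ratio of the largest to the smallest eigenvalue of X^T X,
    where X is the centered, unit-length-scaled version of `data`."""
    x = data - data.mean(axis=0)
    x = x / np.linalg.norm(x, axis=0)
    eigenvalues = np.linalg.eigvalsh(x.T @ x)   # ascending order
    return float(eigenvalues[-1] / eigenvalues[0])

def severity(k):
    """Classify multicollinearity using the thresholds stated in the text."""
    if k < 100:
        return "small"
    if k <= 1000:
        return "moderate or strong"
    return "serious"

k = condition_number(DATA)
print(f"k = {k:.4f} -> {severity(k)} multicollinearity")
```

With only two standardized variables, \(X^{T}X\) is the correlation matrix, so the ratio reduces to \((1+|r|)/(1-|r|)\) for the sample correlation \(r\) between the two columns.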

6 Conclusion

Intelligent vehicles have developed rapidly in recent years, but domestic research has a relatively short history, especially in machine learning of driving behavior, which has been a bottleneck in the field of smart cars. In this article, we took data on human drivers’ perception behavior as the training set and carried out regression analysis, a machine learning method, on the relationships between the vehicle speed and the visual field, the vehicle speed and the fixation point, and the vehicle speed and the dynamic vision. We then established linear and nonlinear regression models for the training set and verified the accuracy of the models through probability theory and the hypothesis and significance tests of mathematical statistics. It turned out that a logarithmic relationship expresses each of these three relationships better than the other models. In terms of application, we adopted multi-sensor fusion and transformed the data acquired from radar, navigation and images to log-polar coordinates, which allowed information from different sensors to supplement and verify one another and formed a comprehensive description of the surrounding environment. In addition, the model is consistent with the well-known Weber–Fechner law, which states that human sensation, including visual and auditory sensation, is proportional not to the strength of the corresponding physical quantity but to its logarithm. We also use this logarithmic model to debug the cognitive and interactive system in intelligent driving, which can lead to a better understanding of intelligent driving behavior.
Furthermore, by combining the logarithmic model with the vehicle control system, we can analyze the stability of traffic flow using simulation techniques, which can reveal the mechanism of rear-end collision accidents to some extent. As for the limitations of this paper, we will increase the number of samples in the future to improve the validity of the experimental results.