Introduction

The use of microscopic simulation modeling and intelligent transport systems (ITS) to evaluate traffic system performance has exposed the inadequacy of existing models, including car-following models, in accurately representing behavioral phenomena in complex real-world contexts. Despite recent advances in technologies (such as ITS) for road system applications, achieving a detailed understanding of the dynamic behavioral response of drivers in ‘car-following processes’ has been a major safety concern for decades [1].

Although many studies have supported the car-following theories and the development of their sub-models, the calibration and validation of these models, as well as the empirical verification of their underlying assumptions, have been seriously hampered by difficulties in collecting and processing accurate, unbiased time-series data in a common space–time reference system [2, 3]. Improved accuracy in experimental data collection can substantially improve both the representation of behavioral phenomena from a microscopic perspective and the realism of traffic sub-models, yet proper estimation of time-series data has proven to be challenging.

The literature, however, reports several techniques for gathering time-series data, including static laboratory simulators [4] and instrumented vehicles equipped with radar sensors [1, 5, 6] or onboard global positioning system (GPS) receivers. Driving simulator-based experiments lack the flexibility to represent real traffic phenomena; with instrumented vehicles, by contrast, the dynamics of the following vehicle can be thoroughly analyzed over long stretches of a route. On the one hand, instrumented vehicles can adequately capture relative spacing and speeds using sensors; on the other hand, accurate positional data of the equipped vehicles can be obtained with satellite-based GPS technology [7].

Advances in real-time GPS technology have opened a new horizon in the field of traffic engineering. In the context of car-following theories, Hatipkarasulu et al. [8] and Jiang and Li [9] underlined the flexibility of vehicle-mounted GPS receivers in collecting position and speed data with a positional accuracy of 1–5 m and a speed accuracy of 0.16 kmph. Additionally, using time-series data for a platoon of ten vehicles in a probing field, Gurusinghe et al. [7] demonstrated that GPS technology estimates such data more accurately than other conventional methods. Punzo and Simonelli [2] used kinematic differential GPS instruments to investigate the methodological issues of car-following model calibration and validation by accurately monitoring the trajectories of four vehicles in a platoon under real traffic conditions. The peer-reviewed literature has also demonstrated the efficacy of real-time differential GPS receivers in modeling the following behavior of cars [10,11,12].

Despite the high expected accuracy of kinematic GPS in vehicle tracking, estimating time-series trajectories that capture the car-following process of weak-lane discipline traffic is a demanding task. Recent literature has underlined the importance of a lateral descriptor of traffic (centerline separation) in modeling staggered-following behavior [13,14,15,16], in which following vehicles often maintain some lateral separation from their immediate leader, either to perceive the forward visual field with more confidence or to anticipate the behavioral response of the vehicles in front and the associated proximity risks. However, the requirements for staggered-following trajectory data are stringent. For instance, with satellite-based GPS technology, inter-vehicle longitudinal spacing can be obtained from the recorded vehicle positions with an accuracy of 1 m; the estimation of lateral separation from the GPS receivers, by contrast, may produce unreliable results. Because a proper evaluation of car-following behavior in non-lane-based traffic environments requires accurate characterization of the microscopic traffic variables and reliable experimental data, a suitable integration of satellite-based GPS technology with an image-based data collection scheme can substantially improve the collection and processing of continuous time-series data for the staggered-following scenario.

Vehicle detection and tracking in mixed traffic conditions remain a challenge. Intrusive and non-intrusive techniques have been tried and tested; however, only the image-based technique has been observed to trace vehicles with better accuracy [17]. Several efforts have been made to extract vehicle trajectories from video images. Initial efforts by Metkari et al. [18], Jin et al. [19] and others relied on physically marking road sections with strips of known width. Other efforts calibrated field data points from their corresponding points visible in the image; Fukui [20], Courtney et al. [21], Bas and Chrisman [22] and Fung et al. [23] are the leading efforts in this regard. Errors were observed with changing camera orientations, and the modifications necessary to avoid errors due to camera orientation angles are best described in Fung et al. [23]. Camera calibration and image detection techniques are used together in several vehicle trajectory extraction software packages, the most popular of which for mixed traffic is the traffic analyzer and enumerator (TRAZER). However, these packages need a camera at a fixed position and are not useful if dynamic data need to be obtained from within the stream. There is, thus, a need for a semi-manual data collection method that can manually classify and trace the trajectories of neighboring vehicles. Software based on this concept has been used to obtain vehicle trajectories [24, 25] or to study inter-vehicular gaps [26, 27]. In these studies, however, the video of the traffic stream was recorded from a static camera placed at a high vantage point, and the entire vehicular interaction cannot be recorded by this technique.

If the entire interaction between vehicles needs to be captured, one has to either compromise on the video resolution used to mark the trajectory (thereby increasing error) or deploy complicated methods such as stitching several simultaneously recorded videos, which has not been attempted earlier. It is, therefore, necessary to use an instrumented vehicle for capturing video of the entire car-following interaction.

In this context, efforts are being directed to the development of a suitable data collection technique for estimation of reliable dynamic time-series car-following data for non-lane-based traffic. The study reported in this paper attempts to establish an image-based in-vehicle trajectory data collection system to process the microscopic variables (such as longitudinal gap, centerline separation, vehicle speeds, accelerations, etc.) using camera calibration and in-vehicle GPS information on straight roads, and to provide a copula-based methodological approach for the safety evaluation of vehicles in car-following process.

First, a brief overview of the car-following process is presented, followed by a detailed description of the data collection methodology, which forms the core of this research, and of the resulting data estimation process. A discussion of safety evaluation in the car-following process and of the methodological approach for examining the safety criteria follows. Finally, the analysis and conclusions, which provide an overview of the suitability of the data collection exercise, complete the paper.

Car-Following Behavior

Car-following behavior is a control process in which each driver attempts to maintain a desired following distance behind the lead vehicle by accelerating or decelerating in response to the actions of the vehicle in front. This continuous adjustment of vehicle speed is governed by the driver’s perception of the leading vehicle’s speed and of his/her own speed, the proximity to the desired following headway (or longitudinal gap, LG) and the response time delay, which may lead to overcompensation for small deviations from a target point [28, 29]. The result is the following ‘spirals’, which can be expressed in the relative speed–relative spacing plane (Fig. 1).

Fig. 1 A typical car-following ‘spiral’ showing the variation of relative speed with longitudinal spacing

Such spirals are indicative of the car-following processes of the driver across all speed ranges in staggered-following scenario and his/her perceptions to maintain a safe desired headway. The following behavior in weak-lane discipline traffic, however, requires the consideration of an indicator of lateral interaction [30], namely centerline separation (CS) for a precise representation of staggered-following behavior. In essence, centerline separation describes the off-centeredness between the leading and following vehicles in the car-following process. Figure 2 illustrates the car-following ‘spirals’ for different CS levels of staggered-following scenario.

Fig. 2 Typical following ‘spirals’ for different CS levels

The ‘spirals’ in the figure show that the driver continuously adjusts to the behavior of the vehicle in front at all levels of CS and that the inter-vehicle spacing gradually decreases as the vehicles become more staggered. This is indicative of typical staggered-following behavior in non-lane-based traffic environments, where following vehicles often try to avoid the obstruction of the lead vehicle in order to have a clear forward view and anticipate the criticality of the car-following process. As a result, they tend to maintain a large centerline separation, which results in reduced spacing.

Such behavior justifies the consideration of CS in car-following theories; accurate estimation and processing of CS data will enrich the realism of model development and the representation of driver behavior in simulation modeling.

Experimental Data

Details of the data used in this study and a comprehensive discussion about the experimental setup, calibration process and the issues regarding data collection are provided in this section.

Instrumented Vehicle Setup and Data Collection

The data used in this study were collected from a series of experiments conducted in March 2018 along straight sections of NH-27, a four-lane divided carriageway in Guwahati, India, under real traffic conditions. Experiments were conducted by driving two vehicles in the car-following state along the rural highway, covering a total stretch of 37.33 km during the afternoon hours (2:00–4:30 p.m.). The vehicles were equipped with GPS receivers [Racelogic video V-box] that recorded the position of each vehicle at 0.1-s intervals, and with a video system attached to the windshield, allowing a complete visual record of the experiment. From the positional data of the GPS receivers, vehicle speeds and longitudinal/lateral accelerations are calculated through successive differentiation of the distance traveled, while inter-vehicle longitudinal and lateral gaps are evaluated from the synchronized recorded video by a semi-automated trajectory extraction system (discussed in detail in the “Calibration Process” section).
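The differentiation step can be illustrated with a minimal sketch. It assumes that the GPS positions have already been projected onto an along-road distance in metres sampled every 0.1 s; the function name `differentiate` and the sample values are hypothetical, and the receiver's own internal processing may differ.

```python
def differentiate(values, dt):
    """Central-difference derivative of evenly sampled values (one-sided at both ends)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out

DT = 0.1  # GPS sampling interval in seconds (10 Hz, as in the study)

# Hypothetical along-road distance (m) of a vehicle accelerating at 2 m/s^2 from rest.
distance_m = [0.5 * 2.0 * (k * DT) ** 2 for k in range(6)]

speed_ms = differentiate(distance_m, DT)   # first derivative: speed (m/s)
accel_ms2 = differentiate(speed_ms, DT)    # second derivative: acceleration (m/s^2)
speed_kmph = [v * 3.6 for v in speed_ms]   # convert m/s to kmph
```

Central differences are exact for uniformly accelerated motion at interior samples; the first and last samples fall back to one-sided differences and are less accurate, so in practice they might be discarded or smoothed.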

In the process of data collection, careful attention was devoted to the experimental setup. The drivers of both the vehicles were familiar with the traveled path, but were not aware of the purpose of experimental design. The driver of the leading vehicle was instructed to maintain different speeds in the range of 30–90 kmph at an interval of 10 kmph each (that is, 30 kmph, 40 kmph, 50 kmph, 60 kmph, 70 kmph, 80 kmph and 90 kmph) based on prevailing traffic conditions of the considered road stretch. On the other hand, the driver of the following vehicle was asked to follow the leading vehicle maintaining a safe distance based on his perception. In response to the actions of the leading vehicle, the following process of the subject vehicle was then captured based on the proposed data collection technique. As the aim of the experiment was to collect car-following data, the lane-changing maneuver and intrusions of other vehicles were avoided. Data for any unimpeded intrusions in unavoidable circumstances were discarded. In particular, the car-following data were processed for only such scenarios when the subject vehicle was under the influence of the leading instrumented vehicle. Moreover, the study was conducted for only straight sections where effect of external factors such as curves and gradients affecting the traffic flow was not considered. An example of a car-following scenario and a staggered-following scenario recorded from the camera attached to the following vehicle is presented in Fig. 3.

Fig. 3 A typical (a) car-following and (b) staggered-following scenario recorded from the camera attached to the following vehicle. LG longitudinal gap, CS centerline separation; the red dot indicates the position of the camera attached to the following vehicle, i (color figure online)

Calibration Process

In an attempt to extract and analyze the inter-vehicle spacing (both longitudinal and lateral) from the video footage, a semi-automated trajectory system is employed in this research where each vehicle’s location on the screen is recognized and tracked at different time stamps using frame-by-frame analysis. The screen coordinates of the vehicle’s position recorded at each tracking are then transformed to real-world coordinates by employing the camera calibration equations devised by Fung et al. [23]. For brevity the detailed process is not discussed in this paper. The same calibration technique was also used to study the car-following behavior and lateral gap maintaining behavior of vehicles in non-lane-based traffic scenario [17, 26, 27].

Four reference points in the video sequence and their respective real-world coordinates need to be captured. From the calculated camera parameters (pan angle, tilt angle, swing angle, focal length and distance of the plane from the camera lens) and the screen coordinates of any point, the real-world coordinates of that point can then be estimated. As shown in Fig. 4, the four endpoints of the road edge markings are selected to form a rectangular calibration pattern ABCD.
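The idea of mapping screen coordinates to road-plane coordinates from four reference points can be sketched as follows. Note that this sketch swaps in a generic four-point planar homography and is not the formulation of Fung et al. [23], which recovers the pan, tilt and swing angles and the focal length explicitly. All pixel values, the 20 m by 7 m rectangle and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate reference points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_homography(screen_pts, world_pts):
    """Eight coefficients h mapping screen (u, v) to road-plane (x, y)."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(screen_pts, world_pts):
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    return solve_linear(rows, rhs)

def screen_to_world(h, u, v):
    """Apply the homography to one screen point."""
    w = h[6] * u + h[7] * v + 1.0
    return ((h[0] * u + h[1] * v + h[2]) / w,
            (h[3] * u + h[4] * v + h[5]) / w)

# Hypothetical pixel positions of A, B, C, D and their road-plane coordinates (m):
# x runs along the road, y across it; AB and CD are the two road edges.
SCREEN = [(300, 900), (800, 500), (1600, 900), (1100, 500)]
WORLD = [(0.0, 0.0), (20.0, 0.0), (0.0, 7.0), (20.0, 7.0)]

h = fit_homography(SCREEN, WORLD)
x_p, y_p = screen_to_world(h, 950, 600)  # a tracked point between the road edges
```

A homography is valid only for points lying on the calibrated plane, which is consistent with the requirement that both vehicles rest on the same plane.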

Fig. 4 Camera calibration technique considered in the study: (a) rectangle ABCD viewed from the camera; (b) top view (bird’s-eye view) of the calibration pattern along with the lead vehicle

With reference to Fig. 4, four endpoints representing an exact rectangle in the world coordinates of known dimensions (AB = CD and AC = BD) are selected and using the semi-automated trajectory extractor, the corresponding end-point is traced to obtain its respective screen coordinates. Each end-point is, however, clicked several times on the screen, and for further data estimation and processing, the average pixel value for each point is considered to reduce errors in manual mouse-clicks. For the estimation of inter-vehicle spacing, it is considered that the edge of the test vehicle (EG or FH) is parallel to the road edge (AB or CD), and the longitudinal and lateral distance of the end-point (either B or D) from one front corner of the test vehicle (either E or F) is known. For any point P representing the mid-point of the rear bumper of the leading vehicle, the transformed real-world coordinates are estimated similar to the methodology described above. With all known measured distances and transformed real-world coordinates, the inter-vehicle spacing can be accurately estimated according to the expression described below:

$$ {\text{Longitudinal gap }} = \left( {{\text{x}}_{\text{P}} - {\text{x}}_{\text{B}} } \right) + \left( {{\text{x}}_{\text{B}} - {\text{x}}_{\text{E}} } \right), $$
$$ {\text{Centerline separation }} = \left| {{\text{y}}_{\text{P}} - {\text{y}}_{\text{B}} - \left( {{\text{y}}_{\text{E}} - {\text{y}}_{\text{B}} } \right) + \left( {{\text{y}}_{{{\text{P}}1}} - {\text{y}}_{\text{E}} } \right)} \right|, $$

where the distances \( \left( {{\text{x}}_{\text{B}} - {\text{x}}_{\text{E}} } \right) \), \( \left( {{\text{y}}_{\text{E}} - {\text{y}}_{\text{B}} } \right) \) and \( \left( {{\text{y}}_{{{\text{P}}1}} - {\text{y}}_{\text{E}} } \right) \) are accurately measured in the field during the calibration exercise (that is, before the start of experiment when the vehicle is stationary).
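As a minimal sketch of how the two expressions might be evaluated, the snippet below averages repeated clicks on a point and then applies the formulas term by term, with the three field-measured offsets passed in as constants. The function names and all numerical values are hypothetical, except the 8 m offset between point B and the front edge of the test vehicle, which is the value reported for this experiment.

```python
from statistics import mean

def average_click(clicks):
    """Average repeated mouse clicks (u, v) on one point to damp clicking error."""
    return (mean(u for u, _ in clicks), mean(v for _, v in clicks))

def longitudinal_gap(x_p, x_b, xb_minus_xe):
    """(x_P - x_B) + (x_B - x_E), with the second term measured in the field."""
    return (x_p - x_b) + xb_minus_xe

def centerline_separation(y_p, y_b, ye_minus_yb, yp1_minus_ye):
    """|y_P - y_B - (y_E - y_B) + (y_P1 - y_E)|, mirroring the printed expression."""
    return abs((y_p - y_b) - ye_minus_yb + yp1_minus_ye)

# Hypothetical repeated clicks (pixels) on one reference point.
clicked = average_click([(100, 200), (102, 198), (101, 202)])

# Hypothetical transformed real-world coordinates (m) of P and B.
gap_m = longitudinal_gap(x_p=22.5, x_b=10.0, xb_minus_xe=8.0)
cs_m = centerline_separation(y_p=2.1, y_b=0.0, ye_minus_yb=1.0, yp1_minus_ye=0.8)
```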

The sampling time, global coordinates and speed of each vehicle involved in the experiment are obtained directly from the GPS receivers at a frequency of 10 Hz, while the positional data of the vehicle in front are extracted from the video recordings at 5 Hz. After suitable transformation, the GPS data and the extracted video data are synchronized to obtain the speeds of both vehicles, global time, longitudinal gap and centerline separation at 0.2-s intervals. While extracting data from the video, an extractor may make erroneous mouse clicks with a fixed standard deviation (in pixels) on the screen; an average extractor clicks with an accuracy of six pixels [26]. These six pixels correspond to different distances in the field, and the error in the calculated longitudinal gap increases with the field distance from point B (see Fig. 4). The maximum error (normally distributed, 3σ) increases from 1.2% for a point exactly at B to 11.7% for a point 30 m ahead of B for the experiment conducted on Guwahati roads (when point B was 8 m ahead of the test vehicle’s front edge). Moreover, the accuracy of the speed measurement is 0.16 kmph [31].
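One way to picture the synchronization of the 10-Hz GPS stream with the 5-Hz video stream is a join on a common time grid. The sketch below is an assumption about how such a join might be done, not the authors' procedure; the row layouts, function names and values are hypothetical, and both streams are assumed to share a common clock.

```python
def to_tenths(t_seconds):
    """Integer count of 0.1-s ticks, which avoids float equality tests on timestamps."""
    return round(t_seconds * 10)

def synchronize(gps_rows, video_rows):
    """Join GPS rows (t, speed) with video rows (t, lg, cs) on matching timestamps."""
    gps_by_tick = {to_tenths(t): speed for t, speed in gps_rows}
    merged = []
    for t, lg, cs in video_rows:
        k = to_tenths(t)
        if k in gps_by_tick:
            merged.append((k / 10, gps_by_tick[k], lg, cs))
    return merged

# Hypothetical records: GPS speed (kmph) at 10 Hz, video-derived LG and CS (m) at 5 Hz.
gps_rows = [(k * 0.1, 50.0 + k) for k in range(6)]
video_rows = [(0.0, 15.2, 0.4), (0.2, 15.0, 0.5), (0.4, 14.7, 0.5)]

merged = synchronize(gps_rows, video_rows)  # one row per 0.2-s interval
```

Keying on integer ticks rather than on floating-point times keeps timestamps such as 0.30000000000000004 from silently failing to match.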

The experimental setup rests on two requirements: (a) the road under surveillance should be reasonably straight such that both vehicles rest on the same plane, and (b) the camera should be fixed carefully to the vehicle, since any change in camera orientation may result in improper data estimation. Under these conditions, the setup can produce reliable and accurate experimental data for understanding the car-following process of non-lane-based traffic streams.

Results

Preliminary Analysis

The synchronized GPS and extracted video data resulted in 7002 cases of car-following events (data being obtained at 0.2-s intervals), which help in understanding the following behavior as well as its associated proximity risks. A summary of the descriptive statistics of the inter-vehicle spacing (both longitudinal and lateral) and vehicle speeds for the extracted data is presented in Table 1.

Table 1 Descriptive statistics of the microscopic traffic variables

Statistics of the traffic variables show that the car-following data cover a wide range of vehicle speeds lying in the range of 9–78 kmph. The extent of off-centeredness (CS) of the subject vehicle with the front leader is observed to range from a minimum of zero to a maximum of 2.50 m. The observed range of longitudinal gap further justifies that the dataset covers the entire spectrum of all car–car interactions because the peer-reviewed literature suggests a maximum longitudinal gap of 30 m for a close-following behavior [15, 32].

The skewness and kurtosis values of longitudinal gap indicate that its distribution has a sharper peak and heavier tails than a normal distribution, while the kurtosis value of CS indicates that the CS data are approximately normally distributed. Most of the speed data are characterized by skewness close to zero and negative kurtosis values, indicating that the speed data are more symmetrically distributed than the longitudinal gap data and that their distribution has lighter tails and a flatter peak than a normal distribution; a comparison of the mean and median values further corroborates this result.

Dependence Structure Between the Traffic Variables

With an aim to comprehend the behavioral characteristics of longitudinal gap (LG), vehicle speed and CS and the potential dependence relationship between them, a summary of mean and median values of inter-vehicle spacing (LG, CS) for different ranges of vehicle speeds is presented in Table 2.

Table 2 Statistics of the variables for different speed ranges

The table shows that the mean and median values of longitudinal gap increase with the following vehicle’s speed and decrease with increasing centerline separation. Further, vehicle speeds are observed to follow a decreasing trend with centerline separation. This observed trend indicates that longitudinal gap and vehicle speed depend on the lateral positioning of vehicles in the car-following processes of non-lane-based traffic streams. Because following vehicles at large CS have a wide forward field of view and tend to avoid the obstruction of lead vehicles, they tend to follow the leaders closely, maintaining a lower longitudinal gap and speed. The correlation coefficients of LG and CS (\( \tau = - 0.257,\rho_{\text{s}} = - 0.376,r = - 0.439 \)), speed and CS (\( \tau = - 0.198,\rho_{\text{s}} = - 0.293,r = - 0.264 \)), and LG and speed (\( \tau = - 0.172,\rho_{\text{s}} = 0.254,r = 0.293 \)) further corroborate the dependence relationship of the microscopic traffic variables.
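For readers unfamiliar with the three coefficients, the following sketch computes Pearson's r, Spearman's ρ_s (Pearson's r on average ranks) and Kendall's τ (the tie-corrected τ-b variant, which is an assumption since the variant used is not stated). The data are hypothetical, and the pairwise loop is quadratic in sample size, so a library routine would be preferable for thousands of observations.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant, discordant and tied pairs."""
    conc = disc = tied_x = tied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x += 1
            elif dy == 0:
                tied_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    return (conc - disc) / math.sqrt((conc + disc + tied_x) * (conc + disc + tied_y))

# Hypothetical paired observations: centerline separation (m) and longitudinal gap (m).
cs_m = [0.2, 0.6, 1.0, 1.4, 1.8]
lg_m = [25.0, 20.0, 15.0, 5.0, 10.0]

r = pearson_r(cs_m, lg_m)
rho = spearman_rho(cs_m, lg_m)
tau = kendall_tau_b(cs_m, lg_m)
```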

To attain additional insights into the statistical differences of inter-vehicle spacing according to the following vehicle’s speeds, a one-way analysis of variance test is conducted; the results of which indicated significant statistical difference in longitudinal gaps [F(6,6996) = 110.46; p < 0.001] and centerline separations [F(6,6996) = 70.51; p < 0.001] across all speed ranges. This finding justifies that the longitudinal and lateral separations of the subject vehicle with the vehicle in-front vary significantly according to the speeds of the following vehicle in the car-following processes.
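The F statistic of a one-way analysis of variance can be sketched as below, assuming each group holds the observations of one speed range. With seven speed groups the between-group degrees of freedom are 6, in line with the reported F(6, 6996). The function name and data are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of observation groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical longitudinal gaps (m) in three speed ranges.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The p value would then follow from the F distribution with (df_between, df_within) degrees of freedom, which requires a statistics library and is omitted here.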

Phenomenology of Safe ‘Distance-Keeping’ in the Car-Following Scenario

How drivers control their vehicles while following another vehicle, how substantial the safety problem is, what percentage of drivers exhibit unsafe critical events and how the safety requirements can be modeled are still matters of debate that require more attention. The homeostasis theory proposed by Wilde [33] emphasizes that a driver always tends to maintain a constant level of risk exposure by adjusting his/her behavior, that is, by controlling speed and headway. Similarly, Summala [34] proposed the ‘zero-risk theory’, which states that a driver acts to control the risk level when the risk exceeds the safety margins. Although quantifying ‘safety’ is difficult, many pioneering efforts have been devoted to recommending threshold values for headways that distinguish between relatively safe and dangerous encounters in the car-following processes of lane-based traffic [35,36,37,38,39,40]. The literature on car-following models, however, suggests the consideration of different parameters, namely space headway, speed, relative speed, relative acceleration, desired speed, maximum acceleration, etc., to predict the decision of the following vehicle in response to the actions of the leader. Among these, space headway (the sum of longitudinal gap and vehicle length) and speed form the basis of most car-following models. Because drivers in real-world scenarios can perceive available distance more precisely than time, and because safe longitudinal gap information is used in the development of car-following models (safety distance models or Gipps’ model), the safe longitudinal gap rather than the time headway is evaluated in this study.

With regard to non-lane-based traffic in particular, there is still a paucity of research on the applicability and evaluation of safety criteria, where, in addition to speed and headway, centerline separation has proved to be an essential indicator in describing vehicle-following processes.

Because the use of a safety indicator in weak-lane discipline traffic requires an integration of the longitudinal descriptors (longitudinal gap and vehicle speed) with the lateral descriptor (centerline separation), the headway thresholds recommended in the literature may not represent the actual scenario in non-lane-based traffic environments, as these minimum thresholds are expected to vary across different lateral positions of vehicles. A methodological approach using copulas is thus employed in this research to accommodate the dependence structure of the time-series data of longitudinal gap, vehicle speed and centerline separation. A synthesis of the peer-reviewed literature reveals that the average minimum safe and comfortable time headways in the car-following scenario lie in the range of 0.64–1.78 s [35, 36, 38,39,40] for vehicle speeds of 45–150 kmph.

To quantify ‘safety’ in car-following processes, the lower 5% values of longitudinal gap at each speed and centerline separation are selected in this study. First, the CS data were segregated into different groups (0–0.5 m, 0.5–1 m, 1–1.5 m, 1.5–2 m and 2–2.5 m), and for each CS group the corresponding longitudinal gap and speed data were separated as well. Second, the speeds belonging to each CS group were further separated into approximate speed values (± 3 kmph) of 30 kmph, 40 kmph, 50 kmph, 60 kmph, 70 kmph, 80 kmph and 90 kmph, and the corresponding LG data for each speed were also separated. Finally, for each CS group and approximate speed value, the lowest 5% of the LG data were selected for evaluating the safe distance requirements in the car-following process. These lower 5% LG values obtained for each speed and CS range are then compared with the thresholds recommended in the existing literature and are accordingly taken as the minimum safe distance-keeping requirements for the driver in the following scenario. Figure 5 depicts the variation of the average safe longitudinal gap (considering the lower 5% values) with vehicle speed and centerline separation.
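The three-step selection described above can be sketched as follows. Several details are assumptions rather than choices stated in the text: CS groups are treated as half-open intervals (with 2.5 m included in the last group), the number of retained values is rounded up, and at least one value is kept per cell. The function names and records are hypothetical.

```python
import math

CS_EDGES = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]       # CS group boundaries (m)
TARGET_SPEEDS = [30, 40, 50, 60, 70, 80, 90]    # approximate speed values (kmph)
SPEED_TOL = 3.0                                 # +/- 3 kmph tolerance

def cs_group(cs):
    """Index of the CS group containing cs, or None outside 0-2.5 m."""
    last = len(CS_EDGES) - 2
    for i in range(last + 1):
        below_upper = cs < CS_EDGES[i + 1] or (i == last and cs <= CS_EDGES[i + 1])
        if CS_EDGES[i] <= cs and below_upper:
            return i
    return None

def speed_bin(speed):
    """Target speed within the tolerance of speed, or None if there is none."""
    for target in TARGET_SPEEDS:
        if abs(speed - target) <= SPEED_TOL:
            return target
    return None

def lowest_five_percent(records):
    """records: (lg_m, speed_kmph, cs_m) tuples -> {(cs_group, speed): lowest LG values}."""
    cells = {}
    for lg, speed, cs in records:
        key = (cs_group(cs), speed_bin(speed))
        if None not in key:
            cells.setdefault(key, []).append(lg)
    selected = {}
    for key, gaps in cells.items():
        gaps.sort()
        count = max(1, math.ceil(len(gaps) / 20))  # 5% of the cell, rounded up
        selected[key] = gaps[:count]
    return selected

# Hypothetical records: 40 observations near 40 kmph at small CS, plus two others.
records = [(10 + k, 41.0, 0.3) for k in range(40)]
records += [(5.0, 55.0, 0.3), (7.0, 60.0, 1.2)]
safe_lg = lowest_five_percent(records)
```

The record at 55 kmph falls outside every ± 3 kmph window and is therefore dropped, which mirrors the use of approximate speed values rather than contiguous speed ranges.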

Fig. 5 Variation of the safe longitudinal gap with speed and CS

The plots depict an increasing relationship of longitudinal gap with speed for each level of CS; a decreasing trend of safe longitudinal gap with CS is also observed for speeds of 30 kmph, 40 kmph, 50 kmph, 60 kmph, 70 kmph and 80 kmph. Considering lower CS values as representative of actual car-following events in lane-based traffic, the safe longitudinal gap (Fig. 5) can be compared directly with the recommended thresholds for CS less than 0.5 m. In our study, the average safe longitudinal gaps for speeds of 40 kmph, 50 kmph and 80 kmph are 8.55 m, 8.85 m and 18.41 m, respectively, which is similar to the safe LG range of 8.70–23.10 m found by Duan et al. [40] for speeds of 45 kmph and 90 kmph. Similarly, in a study by Taieb-Maimon and Shinar [38], the range of safe time headway obtained was 0.64–0.69 s for vehicle speeds of 50 kmph, 60 kmph, 70 kmph, 80 kmph, 90 kmph and 100 kmph. Direct comparison of the safe longitudinal gaps implied by these headways (8.88–19.17 m) with our data (8.85–18.41 m for speeds in the range of 50–80 kmph) indicates that the safe longitudinal gaps considered in our study for different vehicle speeds are similar to the values obtained by Taieb-Maimon and Shinar [38]. This justifies that the average safe longitudinal gaps considered in our study (corresponding to the lower 5% values) can be used to represent the safe following conditions in car-following and staggered-following cases of non-lane-based traffic environments. The lower 5% values of longitudinal gap corresponding to each speed and CS level are, therefore, utilized for the development of the copula model.
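The conversion between a time headway and the corresponding distance is simple arithmetic, sketched below. Pairing 0.64 s with 50 kmph and 0.69 s with 100 kmph is an assumption inferred from the fact that it reproduces the quoted 8.88–19.17 m range; vehicle length is ignored, and the function name is hypothetical.

```python
def headway_to_gap_m(time_headway_s, speed_kmph):
    """Distance covered during the time headway at constant speed (kmph -> m/s via 3.6)."""
    return time_headway_s * speed_kmph / 3.6

low_gap_m = headway_to_gap_m(0.64, 50)    # lower end of the quoted range
high_gap_m = headway_to_gap_m(0.69, 100)  # upper end of the quoted range
```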

Application of Copulas in Safety Evaluation

As identified previously, the risk level of drivers defined for lane-based homogeneous traffic may not imply the same level of risk in non-lane-based traffic cases. Considering the dependence relationship among safe LG, speed and CS, a copula-based methodological framework can address the relationship of micro-level parameters used in the car-following models with the lateral descriptor which will, in turn, ascertain the propensity of crash risks at any lateral positioning of vehicle in non-lane-based traffic streams.

Concept of Copulas

A copula is a joint cumulative distribution function whose margins are each uniformly distributed on [0, 1]; it links a multivariate distribution of any dimension to its univariate marginal distributions. For continuous random variables \( \left( {X_{1} ,X_{2} , \ldots ,X_{n} } \right) \) with marginal cumulative distribution functions \( \left( {F_{1} \left( {x_{1} } \right),F_{2} \left( {x_{2} } \right), \ldots ,F_{n} \left( {x_{n} } \right)} \right) \), the joint n-dimensional cumulative distribution function (CDF) \( F\left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right) \) can be written as follows:

$$ \begin{aligned} F\left( {x_{1} ,x_{2} , \ldots ,x_{n} } \right) & = {\mathbb{P}}\left( {X_{1} < x_{1} ,X_{2} < x_{2} , \ldots ,X_{n} < x_{n} } \right) \\ & = {\mathbb{P}}\left( {F_{1} \left( {X_{1} } \right) < F_{1} \left( {x_{1} } \right),F_{2} \left( {X_{2} } \right)}\right. \\&\left.{< F_{2} \left( {x_{2} } \right), \ldots ,F_{n} \left( {X_{n} } \right) < F_{n} \left( {x_{n} } \right)} \right) \\ & = C_{\theta } \left( {F_{1} \left( {x_{1} } \right),F_{2} \left( {x_{2} } \right), \ldots ,F_{n} \left( {x_{n} } \right)} \right) \\ \end{aligned} $$
(1)

Equation (1) shows that the joint CDF F can be described by the margins \( F_{1} , F_{2} , \ldots ,F_{n} \) and the n-dimensional copula \( C_{\theta } \) with dependence parameter θ, which captures the dependence structure among \( X_{1} ,X_{2} , \ldots ,X_{n} \).

Selection of Univariate Marginal Distributions

The first step in estimating a suitable copula model is the selection of optimal marginal probability distributions. Several probability distributions were fitted in this study to select the best marginal probability distribution for the safe LG, speed and CS data. A distribution was selected as the best fit when the Akaike information criterion (AIC) and the statistics of three goodness-of-fit tests, namely the Kolmogorov–Smirnov (K–S), Anderson–Darling (A–D) and Cramér–von Mises (C–vM) tests, were minimal and the null hypothesis could not be rejected at α = 0.05 [16, 41]. Table 3 presents a summary of the results of the goodness-of-fit tests.
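Two of the selection criteria can be illustrated with a short sketch: the AIC, computed from a fitted model's log-likelihood, and the K–S statistic, computed as the largest distance between the empirical and candidate CDFs. The sample values are hypothetical; the candidate CDF uses the normal parameters reported for CS purely as an example, and critical values or p values are not computed here.

```python
from statistics import NormalDist

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); smaller is better."""
    return 2 * n_params - 2 * log_likelihood

def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical CS sample (m) tested against a normal candidate distribution.
cs_sample_m = [0.4, 0.9, 1.1, 1.3, 1.6, 2.0]
candidate = NormalDist(mu=1.284, sigma=0.525)
d_stat = ks_statistic(cs_sample_m, candidate.cdf)
```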

Table 3 Results of goodness-of-fit tests

Based on the goodness-of-fit index, AIC and log-likelihood criterion, logistic distribution was selected as the best fitted for safe longitudinal gap [location = 12.514, scale = 2.893] and speed data [location = 47.221, scale = 5.995]; whereas, normal distribution [mean = 1.284, standard deviation = 0.525] provided the best fit for CS.
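With the fitted parameters above, each observation can be mapped to the unit interval through its marginal CDF, which is the form in which a copula receives its inputs. The sketch below uses the reported location, scale, mean and standard deviation; the function names are hypothetical.

```python
import math
from statistics import NormalDist

LG_LOC, LG_SCALE = 12.514, 2.893           # logistic margin for safe LG (m)
VS_LOC, VS_SCALE = 47.221, 5.995           # logistic margin for speed (kmph)
CS_DIST = NormalDist(mu=1.284, sigma=0.525)  # normal margin for CS (m)

def logistic_cdf(x, loc, scale):
    """CDF of the logistic distribution."""
    return 1.0 / (1.0 + math.exp(-(x - loc) / scale))

def logistic_quantile(p, loc, scale):
    """Inverse CDF of the logistic distribution for 0 < p < 1."""
    return loc + scale * math.log(p / (1.0 - p))

def to_uniform(lg_m, speed_kmph, cs_m):
    """Probability integral transform of one (LG, speed, CS) observation."""
    return (logistic_cdf(lg_m, LG_LOC, LG_SCALE),
            logistic_cdf(speed_kmph, VS_LOC, VS_SCALE),
            CS_DIST.cdf(cs_m))

u_lg, u_vs, u_cs = to_uniform(10.0, 40.0, 1.5)
lg_85 = logistic_quantile(0.85, LG_LOC, LG_SCALE)  # 85th percentile of the LG margin
```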

Estimation of Suitable Copula Model

Constructing a joint distribution model for the three variables (LG, CS, speed) requires a careful assessment of candidate tri-variate copula functions against the observations. The selection of a suitable copula model was guided by the admissible dependence ranges of the candidate copulas and the nature of association in the data [42].

Considering the flexibility of the multivariate Gaussian copula in modeling all ranges of dependence structure \( \left( { - 1,1} \right) \) and the widespread popularity of Archimedean copulas, five copula functions, namely Gaussian, Gumbel, Frank, Clayton and Joe, are employed in this study. However, since the multivariate Archimedean copulas (Frank, Gumbel, Clayton and Joe) lack the flexibility to model negative dependence for higher dimensions \( \left( {n \ge 3} \right) \), the original CS data are transformed to a new variable \( {\text{CS}}^{\prime} \), where \( {\text{CS}}^{\prime} = \frac{1}{\text{CS}} \), and the tri-variate Frank, Gumbel, Clayton and Joe copulas are subsequently fitted to the LG, speed and transformed centerline separation data. The performances of the tri-variate Gaussian, Frank, Gumbel, Clayton and Joe copulas with/without the transformed variables are assessed based on the log-likelihood and AIC criteria, and the parameters of each copula are estimated using the maximum pseudo-likelihood method. The dependence parameter value (θ) of each copula function along with the log-likelihood (LL) and Akaike’s information criterion (AIC) values are listed in Table 4.
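Two preparatory steps for the maximum pseudo-likelihood fit can be sketched as below: the reciprocal transform of CS and the conversion of each variable to pseudo-observations (scaled ranks). The floor applied to CS values at or near zero is a hypothetical guard, since the handling of such values is not described; ties are broken by order of appearance, and the data are illustrative.

```python
def pseudo_observations(values):
    """Scaled ranks r_i / (n + 1), which lie strictly inside (0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]

def reciprocal_cs(cs_values, floor_m=0.01):
    """CS' = 1 / CS; floor_m is a hypothetical guard against division by zero."""
    return [1.0 / max(cs, floor_m) for cs in cs_values]

# Hypothetical CS values (m): the transform reverses their ordering, so a
# negative association with LG becomes a positive one.
cs_m = [0.5, 1.0, 2.0]
u_cs = pseudo_observations(cs_m)
u_cs_prime = pseudo_observations(reciprocal_cs(cs_m))
```

Because pseudo-observations depend only on ranks, the reciprocal transform simply mirrors them about 0.5, which is what allows copulas restricted to positive dependence to be fitted.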

Table 4 Parameter estimation and its associated LL values for the transformed tri-variate Archimedean copulas and Gaussian copula

The results indicate that the tri-variate Gaussian copula yields the highest log-likelihood and the lowest AIC for the LG, CS and speed data, followed by the Frank, Clayton, Gumbel and Joe copulas, in that order. On the basis of the performance measures presented in Table 4, the tri-variate Gaussian copula, with logistic marginals for LG and speed and a normal marginal for CS, is therefore retained to assess the level of safety in the following processes of non-lane-based traffic environments.
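The fitting and scoring of a Gaussian copula can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the data are synthetic, the correlation matrix `R_demo` is hypothetical, and the copula correlation is estimated as the correlation of the normal scores of the rank-based pseudo-observations, a common closed-form approximation to the maximum pseudo-likelihood estimate.

```python
import numpy as np
from scipy import stats


def pseudo_observations(x):
    """Rank-transform each column of x into (0, 1) using u = rank / (n + 1)."""
    n = x.shape[0]
    ranks = np.apply_along_axis(stats.rankdata, 0, x)
    return ranks / (n + 1)


def gaussian_copula_fit(x):
    """Return (R, log-likelihood, AIC) of a Gaussian copula fitted to x.

    R is the correlation matrix of the normal scores (an approximation to
    the maximum pseudo-likelihood estimate).
    """
    z = stats.norm.ppf(pseudo_observations(x))
    R = np.corrcoef(z, rowvar=False)
    d = x.shape[1]
    # Copula log-density: log phi_R(z) minus the sum of standard normal log-densities.
    joint = stats.multivariate_normal(mean=np.zeros(d), cov=R).logpdf(z)
    log_lik = float(np.sum(joint - stats.norm.logpdf(z).sum(axis=1)))
    k = d * (d - 1) // 2  # number of free correlation parameters
    return R, log_lik, 2 * k - 2 * log_lik


# Synthetic three-variable sample with a HYPOTHETICAL dependence structure
# (positive between columns 0 and 1, negative with column 2).
rng = np.random.default_rng(1)
R_demo = np.array([[1.0, 0.5, -0.4],
                   [0.5, 1.0, -0.3],
                   [-0.4, -0.3, 1.0]])
sample = rng.multivariate_normal(np.zeros(3), R_demo, size=1500)

R_hat, log_lik, aic = gaussian_copula_fit(sample)
print(np.round(R_hat, 2))
print("LL =", round(log_lik, 1), "| AIC =", round(aic, 1))
```

Competing copulas would be scored on the same pseudo-observations, and the one with the highest log-likelihood and lowest AIC retained.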

Non-Exceedance Conditional Probability Distributions

For a particular non-exceedance conditional probability, several combinations of the microscopic variables exceeding a certain threshold can be obtained from the developed tri-variate Gaussian copula model. Figure 6 shows the non-exceedance conditional probabilities of safe longitudinal gap given both vehicle speeds (VS) and centerline separation exceeding certain thresholds.

Fig. 6 Conditional probabilities \( {\mathbb{P}}\left( {{\text{LG}}\, \le \,{ \lg }|{\text{VS}}\, \ge \,{\text{vs}},{\text{CS}}\, \ge \,{\text{cs}}} \right) \) with centerline separation equal to (a) 0.10 m and (b) 1.50 m

From the figure, it can be observed that the non-exceedance conditional probability of the safe longitudinal gap increases as the conditioning vehicle speed decreases, and that it also increases with centerline separation. For instance, the non-exceedance conditional probability \( {\mathbb{P}}\left( {{\text{LG}} \le 10\,{\text{m}}} \right) \) with speeds exceeding 40 kmph in the car-following state (CS = 0.1 m) is 0.23, whereas for a large centerline separation (CS = 1.5 m) the probability is 0.42. This increase in probability is due to staggered car-following, in which drivers tend to follow their leaders more closely while maintaining a larger centerline separation in order to have a clear view of the scenario ahead. Similarly, for a conditional probability of 0.85, the safe longitudinal gaps corresponding to vehicle speeds exceeding 60 kmph and 80 kmph in the car-following state are 21.3 m and 24.7 m, respectively, whereas the corresponding gaps for the staggered-following scenario are 17.8 m and 19.3 m (a decrease of about 16–22% for staggered following). This indicates that the decreasing relationship of LG with CS holds for each value of vehicle speed.
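A conditional probability of the form \( {\mathbb{P}}\left( {{\text{LG}} \le {\lg}|{\text{VS}} \ge {\text{vs}},{\text{CS}} \ge {\text{cs}}} \right) \) can be approximated by sampling from a Gaussian copula, as in the minimal sketch below. The marginals are those reported above; the copula correlation matrix `R` is a hypothetical placeholder (only the signs of its entries follow the text), so the printed values are illustrative and are not expected to reproduce those read from Fig. 6.

```python
import numpy as np
from scipy import stats

# Marginals as reported in the text.
LG = stats.logistic(loc=12.514, scale=2.893)   # safe longitudinal gap
VS = stats.logistic(loc=47.221, scale=5.995)   # vehicle speed
CS = stats.norm(loc=1.284, scale=0.525)        # centerline separation

# HYPOTHETICAL copula correlations in the order (LG, VS, CS); the fitted
# values belong to the study's Table 4 and are not reproduced here.
R = np.array([[1.0, 0.5, -0.4],
              [0.5, 1.0, -0.3],
              [-0.4, -0.3, 1.0]])


def conditional_non_exceedance(lg, vs, cs, n=200_000, seed=0):
    """Monte Carlo estimate of P(LG <= lg | VS >= vs, CS >= cs)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(3), R, size=n)
    u = stats.norm.cdf(z)                        # Gaussian copula sample on (0, 1)^3
    lg_s, vs_s, cs_s = LG.ppf(u[:, 0]), VS.ppf(u[:, 1]), CS.ppf(u[:, 2])
    cond = (vs_s >= vs) & (cs_s >= cs)           # conditioning event
    if not cond.any():
        return float("nan")
    return float(np.mean(lg_s[cond] <= lg))


p_small_cs = conditional_non_exceedance(lg=10.0, vs=40.0, cs=0.10)
p_large_cs = conditional_non_exceedance(lg=10.0, vs=40.0, cs=1.50)
print(p_small_cs, p_large_cs)
```

With a negative LG–CS dependence, the estimate for the larger CS threshold exceeds that for the smaller one, which is the qualitative pattern described for Fig. 6.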

The joint probability density distributions of the longitudinal gap and centerline separation for vehicle speeds exceeding thresholds of 40 kmph, 50 kmph, 60 kmph and 70 kmph are shown in Fig. 7.

Fig. 7 Joint probability density plots for safe LG and CS with (a) speed ≥ 40 kmph, (b) speed ≥ 50 kmph, (c) speed ≥ 60 kmph and (d) speed ≥ 70 kmph

The bivariate LG–CS plots of Fig. 7 depict an inverse relationship between the two variables at each speed level. There is also a gradual rightward shift along the LG-axis as speed increases, as expected. For example, the majority of drivers in the car-following state (say CS = 0) maintain safe longitudinal gaps of 17.5 m, 18.75 m, 22.5 m and 23.75 m for speeds exceeding 40 kmph, 50 kmph, 60 kmph and 70 kmph, respectively, while for an arbitrary CS value (say CS = 1 m) the corresponding longitudinal gaps are 14.75 m, 16 m, 17.5 m and 20 m, respectively. This supports the negative dependence of LG and speed on CS and the positive association between LG and speed. The results therefore suggest that car drivers in staggered-following cases perceive the scenario ahead with more confidence and can anticipate the lead vehicle's behavior with better predictability; hence, they tend to follow the leading vehicles closely, maintaining lower longitudinal gaps and speeds at higher CS levels.

Such information can provide useful insights for car-following models: it allows the safe ‘distance-keeping’ requirements at different speeds and CS values to be evaluated, which in turn supports a better replication and representation of actual drivers’ behavior in the following processes.

Possible Applications of the Current Study

The tri-variate Gaussian copula can provide useful insights for the development of car-following models, by which the safe ‘distance-keeping’ requirements at different speeds and CS values can be evaluated. The widely used safety-distance car-following models hypothesize that drivers try to maintain a sufficient distance from the leading vehicle at different speeds so as to avoid a collision if the leading vehicle brakes suddenly. In such models, the safe longitudinal gap thresholds obtained from the copulas at different speeds and CS values can be used directly in model development, which will help in a better replication and representation of actual drivers’ behavior in microsimulation models.

In addition, the results of the study can serve as adaptation databases for advanced collision warning systems, supporting the evaluation of conflict severities in the car-following process of non-lane-based traffic streams. With the advent of driving-assistance systems, vehicles are now equipped with adaptive cruise control (ACC) and collision avoidance systems (CAS), in which the system maintains a safe headway to the vehicle in front according to settings predefined by the user and also warns the driver of potential threats and imminent collisions. In particular, for a given probability value, if the safe longitudinal gap exceeds the available gap at the specified CS and speed levels, the warning and intervention systems used for detecting safety hazards in the car-following process may be triggered, so that the risk of an imminent collision can be avoided. The work undertaken in this study can, therefore, be useful in the development of car-following models and advanced driver assistance systems, and for safety evaluation in the car-following process of non-lane-based traffic streams.
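The triggering rule described above (warn when the safe longitudinal gap exceeds the available gap) can be sketched as a simple lookup. The four safe-gap values are those quoted earlier for a conditional probability of 0.85; the function name `should_warn` and the coarse binning of speed and CS onto the tabulated levels are hypothetical simplifications for illustration, not part of the study.

```python
# Safe longitudinal gaps (m) at a conditional probability of 0.85, as quoted
# in the text. Keys: (speed threshold in kmph, centerline separation in m).
SAFE_GAP_P85 = {
    (60, 0.1): 21.3,
    (80, 0.1): 24.7,
    (60, 1.5): 17.8,
    (80, 1.5): 19.3,
}


def should_warn(available_gap_m, speed_kmph, cs_m):
    """Return True when the tabulated safe gap exceeds the available gap.

    HYPOTHETICAL binning: speed and CS are snapped to the nearest tabulated
    level; a real system would query the copula model at the exact values.
    """
    speed_key = 60 if speed_kmph < 70 else 80
    cs_key = 0.1 if cs_m < 0.8 else 1.5
    return SAFE_GAP_P85[(speed_key, cs_key)] > available_gap_m


# Same 20 m available gap, same speed, different lateral offsets.
print(should_warn(20.0, speed_kmph=62, cs_m=0.0))   # aligned following
print(should_warn(20.0, speed_kmph=62, cs_m=1.4))   # staggered following
```

The example highlights the role of CS: an available gap that triggers a warning in aligned following may be acceptable in staggered following at the same speed.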

Conclusions and Future Scope

The major contribution of this study lies in its focus on the collection and estimation of car-following data in non-lane-based traffic environments. Efforts were directed toward developing an image-based in-vehicle trajectory data collection system that estimates reliable dynamic time-series car-following data using camera calibration and in-vehicle GPS information. This study not only describes the stringent requirements of the data but also provides a copula-based methodological framework for the safety evaluation of vehicles in the following processes.

First, the car-following ‘spirals’ indicated that drivers’ speed adjustments in the following process occur at all levels of centerline separation and that the inter-vehicle spacing decreases with the lateral separation between the vehicles. Preliminary analysis of longitudinal gap (LG), speed and centerline separation (CS) also corroborated an inverse relationship between LG and CS and between speed and CS, and a positive relationship between LG and speed, complementing earlier research in mixed traffic. Drivers in the car-following state (say CS = 0) are observed to maintain safe longitudinal gaps varying from 17.5 to 23.75 m for speed thresholds of 40–70 kmph, while the gaps for 1 m CS are in the range of 14.75–20 m for the same speeds. Recognizing that estimating the safe longitudinal spacing in the following process can enrich the realism of car-following models and the representation of drivers’ behavior, a tri-variate copula model was developed considering the lower 5% of longitudinal gap data for each speed and CS level. Based on the performance measures of the copula functions, a tri-variate Gaussian copula with logistic marginals for LG and speed and a normal marginal for CS was found suitable for assessing the level of safety in the following processes of non-lane-based traffic environments. The conditional and joint probability distributions further demonstrated the importance of CS in modeling the car-following behavior of non-lane-based traffic streams.

The results obtained in this study can find suitable applications in car-following model development, advanced driver assistance systems and safety evaluation of non-lane-based traffic environments, as discussed in the “Possible Applications of the Current Study” section. As future scope, the results of the study can be extended to other leader–follower vehicle pairs that are more prevalent in mixed traffic streams. Various driver behaviors, such as drifting, shying-away and closing-in reactions, changes in lateral position and overtaking, can be studied using the camera calibration methodology. The copula models provide a robust representation of the obtained datasets and thereby capture the effect of all explanatory terms on gap-keeping. Further studies in diverse cities may also be needed to capture the variability in driving behavior.