Introduction

Vehicle speed on a road section is known to be an important contributing factor to accidents and their resulting severity. Speed plays a significant role in planning and designing road geometry, as well as in setting safe speed limits. Moreover, the speed that drivers choose under prevailing traffic conditions is a key factor in models used to monitor traffic operations and to evaluate the performance of traffic systems [1].

A driver’s choice of speed is affected by many factors, and there is a wealth of literature on this topic. From the perspective of statistical analysis, most conventional practices for identifying the factors affecting speed choice, and for formulating models to predict the speed on a road section, are based on the ordinary least squares (OLS) linear regression model (e.g. [1,2,3,4,5]). The OLS regression model is significantly constrained in how it handles uncertainty, as it treats the effect of each variable as fixed across all roadway segments [6, 7]. In practice, the effect of exogenous factors may be heterogeneous and random in nature, arising from observed and unobserved variables not considered in the data collection process. The use of a fixed parameter regression approach could therefore lead to inconsistent and biased identification of the factors. As such, there is scope to improve the performance of the least squares model by adding random effects associated with the many unaccounted factors. Tarris et al. [8] attempted to incorporate random effects to examine the effects of road geometry and drivers on operating speeds. Poe and Mason Jr [9] used a mixed-model approach to account for the random effect of an individual observation site of horizontal curvature on two-lane highways. Both studies used data collected from low-speed urban streets in the USA.

However, most of the literature deals with free-flow speeds at the macro-level, i.e. for a network comprising different road sections, or for particular sections of several roads, such as curves or intersections. In addition, most existing models are based on homogeneous traffic settings, whereas the road-traffic environment and driving behaviour in developing countries are very different from those in developed countries [10, 11]. As a result, there is still much to learn about the factors influencing speed choice at the micro-level, especially on undivided two-lane rural highways in heterogeneous traffic environments. The main objective of this paper is to identify the factors affecting the speed behaviour of a driver at different speed levels and to develop a model for estimating the impact of those factors on speed choice on a two-lane bidirectional highway in a developing country. The study formulates and estimates a micro-level speed prediction model using the random parameter approach to incorporate the individual heterogeneous effects of the exogenous variables. The model is based on micro-level analysis, with the variation of speed being analysed on the basis of short road segments.

To quantify the potential random effects of exogenous factors, the current study compared the performance of OLS and random parameter models, using several aggregated and disaggregated goodness-of-fit measures. With the identification of significant influencing factors, the study also discussed the partial effects to quantify the effect of those individual variables on driver speed choice. The model was validated using data collected from different trips. The results confirm the need to accommodate the random effect of exogenous variables in order to examine the speed choice behaviour under a heterogeneous traffic environment.

The study collected data from a 13 km section of a two-lane bidirectional major national highway (N4) in Bangladesh. Field observation, including an instrumented floating vehicle, was used to collect speed and related data. This paper reports on the analysis of the data mainly obtained using a vehicle instrumented with three inside cameras and observers. Speed data were extracted from the speedometer reading of the instrumented vehicle. The back camera provided the phasing and opposing traffic data. Detailed geometric and road environmental attributes were gathered using on-site field and video observation. Speed data extracted second-by-second were analysed for a range of short road segments. The critical segment length, from the point of view of speed and overtaking analysis, was identified as 200 m. Speed profiles for each 200 m segment were analysed, and descriptive speed-related statistics were obtained. In addition, the variation of directional speed and its significance was also quantified. Abrupt speed changes were identified to determine the probability of conflict using surrogate safety measures. As speed choice is affected by many factors, the most significant attributes influencing speeding behaviour have been identified using different sensitivity analyses. Those attributes include traffic flow by vehicle composition, road and roadside environment, as well as traffic operational characteristics.

The paper is organized as follows. Section 2 briefly outlines the overall data collection methodology. Section 3 presents some of the key findings of the analysis of speed behaviour. Section 4 identifies the significant factors affecting speed behaviour. Section 5 describes the estimated random parameter model and presents the model estimation and validation results. Finally, Sect. 6 summarizes the main findings and discusses potential model applications, limitations and areas for future research.

Data Collection Methodology

Study Area

The study area is the Jamuna Multipurpose Bridge approach roads, a major highway section in Bangladesh. The selected road segments form a two-lane bidirectional rural highway with a heterogeneous traffic environment, typical of a developing country. The bridge is about 4.8 km long in total. The approach roads extend 16 km from the east side and 17 km from the west. For the analysis of speed behaviour and for model development, the current study used data collected from the east approach road. Details of the road geometry, environmental and traffic operational characteristics are given in Mahmud et al. [12] and Islam [13].

Collection Methods

The primary data for the study reported here were collected mainly by the naturalistic driving method, using an instrumented vehicle, with field observation carried out simultaneously.

Naturalistic Driving

An instrumented vehicle (microbus) was used to gather data on driving behaviour, as well as to continuously monitor vehicle speed. Two in-vehicle cameras were attached to the front dashboard and the rear of the vehicle to track the movements of the leader and the follower. A third camera was set to record the speedometer reading, which enabled the continuous monitoring of vehicle speed on a second-by-second basis. Figure 1 shows glimpses of the front and speedometer camera views. Simultaneously, two observers recorded all overtaking events and any other risky driver behaviour. The video footage also included an audio record of any prompt made by the camera operator regarding essential data items, such as the identity of the overtaking vehicle and the details of any manoeuvers deemed to be risky.

Fig. 1 Glimpses of front and speedometer camera view

The driver was instructed to drive normally at all times during the entire period of data collection. To verify that the floating vehicle driver behaved normally, some of the driving attributes were compared with those of the general traffic. These attributes include space mean speed (SMS), time mean speed (TMS), maximum and minimum spot speed, travel time, the number of overtaking events and average overtaking distance. The attributes of the test vehicle were found not to be statistically different from those of the corresponding general traffic. The comparative analysis excluded two- and three-wheeler light vehicles (around 10% of total traffic), as these vehicles are significantly slower and behave differently from other mainstream traffic.

Field Observations

Observational studies were made continuously by a group of trained observers whilst video data were being captured along the entire study section (Fig. 2). Different attributes related to driver behaviour were collected using a pre-defined form. Moreover, information related to road geometry, roadway condition, road surface friction, road abutting land use pattern, the degree of access control and spot speeds was also collected by the observers from different segments of the road. All of those data were cross-checked against the video data.

Fig. 2 Glimpses of field observation

Selection of Appropriate Segment Length

Figure 3 represents the step-by-step process used to select the most appropriate segment length for speed analysis. Firstly, the instrumented vehicle’s speeds were extracted from the video footage of the speedometer. The speed-related data were verified using a two-step process to minimize errors. Following a random cross-check of the full dataset, a trip-based time-speed scatter plot was used to identify anomalies and any sudden change in speed. Adjustments were made using alternative video data.

Fig. 3 Critical segment identification process

Altogether, 18 reference points along the 13.7 km road section were used to verify the speed data and to match them to the exact geographic or chainage location. Estimated travel distances were found to have small errors (2–3%) compared with actual on-road distances. Those errors are due mainly to lateral vehicle movements and speedometer reading measurement error.

A number of speed-related statistical attributes (maximum, minimum, standard deviation, speed difference and average speed for the respective segment length) were used to determine the most appropriate short segment length for speed analysis, a process termed segmentation. The significance of the differences in those attributes between segment lengths was evaluated using standard sensitivity analyses, including ANOVA and t tests. The segment lengths tested ranged from 100 to 500 m, in steps of 50 m. The results show that a 200-m segment length is the most appropriate, taking into account the statistical analysis and the minimum required overtaking distance.
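The segmentation step can be illustrated with a minimal sketch. It assumes that speed is sampled once per second in kph and that distance is accumulated as speed multiplied by a 1 s interval; the function name `segment_speed_stats` and the toy trip are hypothetical and are not taken from the study data.

```python
from statistics import mean, pstdev

def segment_speed_stats(speeds_kph, segment_length_m=200.0):
    """Group second-by-second speeds into fixed-length segments and summarise each."""
    segments = {}      # segment index -> speeds observed while inside that segment
    distance_m = 0.0   # cumulative distance travelled before the current sample
    for v in speeds_kph:
        idx = int(distance_m // segment_length_m)
        segments.setdefault(idx, []).append(v)
        distance_m += v / 3.6  # kph -> m/s, times an assumed 1 s sampling interval
    stats = []
    for idx in sorted(segments):
        vals = segments[idx]
        stats.append({
            "segment": idx,
            "mean": mean(vals),
            "min": min(vals),
            "max": max(vals),
            "std": pstdev(vals),
            "range": max(vals) - min(vals),
        })
    return stats

# Hypothetical trip: 30 s at a constant 72 kph (20 m/s) covers 600 m.
trip = [72.0] * 30
summary = segment_speed_stats(trip)
```

Repeating the same summary for candidate lengths between 100 and 500 m would give the per-length attributes that the sensitivity analysis compares.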

Analysis of Speed Behaviour

The study reported here considered 20 typical trips by a floating vehicle (10 trips in each direction) at different times of day. The total travel distance considered for the detailed analysis is around 272 km, equivalent to around 4.3 h of travel. Speed data were extracted second-by-second for each trip and have been analysed from different perspectives. A brief summary of that analysis is given in the following sub-sections. For the factor analysis modelling, however, the study selected 13 trips, 10 for development and 3 for validation purposes. All of the selected trips took place in normal traffic conditions under bright, sunny weather.

Overall Speed Profiles

The posted speed limit for the selected road section is 80 kph. Both the TMS and the SMS were observed to be significantly different from that posted speed. The spot speed or TMS of the floating vehicle reached over 100 kph, and the SMS in different trips ranged from 53 to 73 kph. The mean SMS for the entire study section is 59 kph (60 kph west–east and 58 kph east–west). The overall speed profiles, along with descriptive statistics by direction of travel, are summarized in Table 1. The difference in SMS between the two directions was found to be statistically significant (p value close to zero (4.68E−19); z > zCritical two-tail > zCritical one-tail; F > Fcrit).
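The distinction between the two speed measures can be made concrete with a short sketch. It uses the standard textbook definitions (TMS as the arithmetic mean of spot speeds, SMS as total distance divided by total travel time), which the paper does not spell out; the function names and the two example traversals are hypothetical.

```python
def time_mean_speed(spot_speeds_kph):
    """Arithmetic mean of spot speeds (kph)."""
    return sum(spot_speeds_kph) / len(spot_speeds_kph)

def space_mean_speed(section_length_km, travel_times_h):
    """Total distance covered divided by total travel time (kph)."""
    n = len(travel_times_h)
    return n * section_length_km / sum(travel_times_h)

# Hypothetical example: two traversals of a 1 km section at 40 and 60 kph.
speeds = [40.0, 60.0]
times = [1.0 / v for v in speeds]  # travel time in hours for 1 km

tms = time_mean_speed(speeds)
sms = space_mean_speed(1.0, times)
```

For these inputs the arithmetic mean is 50 kph whereas the distance-over-time mean is 48 kph, which illustrates why SMS never exceeds TMS for the same set of observations.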

Table 1 Overall speed: descriptive statistics

Overall Distribution of Speeds

Table 2 shows the total time and distance spent travelling, for each speed range and by direction, for all 20 recorded trips. Although the overall mean speed is well below 80 kph, the survey vehicle spent, on average, 13% of the time and 20% of the distance at speeds above 80 kph. As shown in Table 2, around 24 and 17% of the total distance was travelled at speeds over 80 kph in the west–east and east–west directions, respectively. Speeds in each direction were found to be normally distributed (Fig. 4).

Table 2 Distribution of time and distance travelled
Table 3 Deceleration and acceleration frequencies for different ranges of speed change
Fig. 4 Normal distribution of speed by direction

Acceleration and Deceleration Changes

The sudden change of speed or acceleration/deceleration behaviour has a significant impact on different traffic operational and road environmental factors [14]. The abrupt change of speed is an indication of disturbance of free flow and indicates a probability of conflict. Table 3 shows the frequency of deceleration and acceleration at different ranges of speed change (m/s2), found in the 20 trips analysed here.

As shown in Table 3, 24% of decelerations between −2 and −1 m/s² occurred from initial speeds above 70 kph. Observation of these critical deceleration events revealed that evasive actions were taken not only to avoid rear-end conflicts but also to avoid severe head-on conflicts with opposing vehicles due to overtaking manoeuvers. Table 3 also shows the number of large acceleration events for different acceleration ranges. Most of those events took place during overtaking manoeuvers, mainly to avoid impending collisions.
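A minimal sketch of how such frequencies could be derived from second-by-second speed readings is given here. It assumes a 1 s sampling interval and speeds in kph; the helper names and the sample readings are hypothetical, and the binning convention (lower bound inclusive, upper bound exclusive) is an assumption rather than the study's documented rule.

```python
def accelerations_ms2(speeds_kph, dt_s=1.0):
    """Finite-difference acceleration (m/s^2) between consecutive speed samples."""
    return [(b - a) / 3.6 / dt_s for a, b in zip(speeds_kph, speeds_kph[1:])]

def count_in_range(values, low, high):
    """Number of values v with low <= v < high."""
    return sum(1 for v in values if low <= v < high)

# Hypothetical speedometer readings (kph), one per second.
readings = [72, 66, 60, 60, 70]
acc = accelerations_ms2(readings)

# Events falling in the deceleration band between -2 and -1 m/s^2.
n_moderate_decel = count_in_range(acc, -2.0, -1.0)
```

Filtering the same events by the speed at which they start (for example above 70 kph) would reproduce the kind of breakdown reported in Table 3.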

Speed Variation for Small Segment

Figure 5 shows the average segmental SMS for each trip. The average segment speed for all trips is also shown. Although there is noticeable variation in speed between different segments, the ranges of speeds are consistently similar.

Fig. 5 Segment-wise SMS speed profile

The overall SMS means are 65 kph and 62 kph in the west–east and east–west directions, respectively. Although the mean speeds show similar profiles in both directions, the differences were found to be statistically significant (z > zCritical two-tail > zCritical one-tail; P(Z ≤ z) one-tail = 0.017 and P(Z ≤ z) two-tail = 0.035). The differences between the segmental average SMS in the two directions are relatively small in the first half of the road section (segments 2–36), where the average SMS varies between 60 and 70 kph. In the second half (beyond segment 36), the average SMS varies considerably more. Segments 40–52 (7.8–10.2 km) show the highest SMS in the entire section, reaching up to 80 kph. The distribution of speed in each segment of the road is shown as a box plot in “Appendix B”.

Factors Affecting Speed Choice

Selection of Explanatory Variables

The choice of speed is affected by several factors (e.g. [1, 2, 5]), which can be divided into three main categories: road geometry, traffic flow and traffic control. Several studies have dealt with the relationships between speeding behaviour and personal factors, including demography and psychology [15]. Most such past work relates to free-flow speeds on networks comprising different types of road, where the psychology and demography of drivers and the vehicular characteristics differed between drivers and roads. Those attributes were therefore considered and were found to be significant in speed choice. However, the current study attempted to identify the factors that affect the speed choice of a driver for each small segment of the studied road, where the posted speed limit is the same throughout the section. Moreover, considering the traffic characteristics of the study area, a microbus with a young driver (around 25 years of age) was used for the data collection to ensure maximum vehicle representation. Hence, those variables are constant in the context of the current study. Indeed, the factors influencing speed choice on undivided two-lane rural highways in heterogeneous traffic environments have yet to receive the same level of attention.

A number of external factors have been assessed, such as those related to road geometry, road environment, traffic flow and operations. Special attention has been given to the different characteristics of road and traffic environment in developing countries. Road geometric factors include alignment, shoulder, presence of bridge, culvert or access road. Road environmental factors include roadside friction in terms of pedestrian activities, roadside trading activities and roadside parking. Factors related to traffic flow and operations include directional flow by different types of vehicles. Segmental traffic flow has been counted during the period of crossing (facing and following and/or overtaking) by the floating vehicle. All of those attributes have been collected for each segment by field observers and instrumented vehicle video. As the study examined the change of speed within small segments, the driver and vehicle-related factors are assumed constant for each segment. A total of 32 independent variables were analysed for four statistical measures of speed levels, namely average; minimum; maximum; and 3rd quartile speed.

To select appropriate variables, the study tested variables under different measures (such as indicator, ordinal, scale or continuous) to obtain the best results on a trial and error basis. Separate trials were made in combining and segregating different variables, such as bus and truck. Segregating buses and trucks into two independent variables provided a better result in terms of goodness of fit. Moreover, geometric and environmental attributes were divided into different categories based on their characteristics. Roadside shoulder was also divided into three categories according to its geometric condition: good, medium and bad. Finally, each variable was converted to an indicator measure, as it provided better output than the categorical or ordinal measure. A detailed definition of the categories selected is given in “Appendix A”.

Correlation Test

All of the explanatory variables shown in “Appendix A” were considered in analysing their impact on speed behaviour. Pearson’s correlation test among all of those explanatory variables was initially undertaken (Fig. 6). Variables found to be strongly correlated with others (r > 0.7) were excluded from the model. Variables found not to be statistically significant were also progressively discarded.

Fig. 6 Correlation among all of the initially considered explanatory variables
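The screening rule described above can be sketched as follows. The example flags variable pairs whose Pearson coefficient exceeds 0.7 in absolute value; applying the threshold to the absolute value is an assumption, and the variable names and per-segment values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def highly_correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            pairs.append((name_a, name_b, r))
    return pairs

# Hypothetical per-segment observations for three candidate variables.
data = {
    "bus_same_dir": [0, 1, 2, 3, 4, 5],
    "truck_same_dir": [0, 2, 4, 6, 8, 11],
    "culvert": [1, 0, 1, 0, 1, 0],
}
flagged = highly_correlated_pairs(data)
```

In this toy data only the bus and truck counts are flagged, so one of the two would be a candidate for exclusion before model estimation.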

Identification of Significant Factors

Ordinary Least Squares (OLS) Regression Model

OLS regression was used to identify the factors that significantly influence speed. The best model specification was developed for calculating the average speed, maximum speed, minimum speed and 75th-percentile speed of a particular road segment. Sixteen factors were found to be significant for different speed levels. Among them, nine are related to traffic flow characteristics, such as buses, trucks and other four wheelers from both directions, same directional three wheelers, and motorcycles and non-motorized vehicles (NMV) from the opposite direction. Five are related to road geometry, including shoulder (good and bad), on-road small bridge, culvert and major access; and two are related to the roadside environment, namely medium roadside pedestrian activity and low NMV activity along the roadside. Those factors are listed with their coefficients, t values and p values in Table 4. The constant term is also given to show the full specification of the OLS regression model.

Table 4 Significant factors affecting speed choice

All of the significant factors related to traffic flow negatively influence the choice of speed at all levels (i.e. the higher the flow, the lower the speed). Good shoulder, culvert, major access and absence of NMV help to reduce congestion by increasing the speed choice at the minimum speed level. Culvert and good shoulder also have a significant positive impact at the maximum speed level.

Random Parameter Modelling Approach

Most research identifies the factors affecting speed choice using ordinary least squares models, which treat the effect of each variable as fixed across the sample (e.g. [1, 2, 5, 16, 17]). However, the effect of an independent variable might vary randomly across sites with different influencing attributes. As such, there is scope for model improvement by adding random effects associated with the many unaccounted factors. A random parameter model takes into account the effect of heterogeneity due to unobserved factors that may differ from segment to segment [6, 18]. Tarris et al. [8] attempted to incorporate random effects to examine whether the groups comprising individual drivers and time effects vary randomly. Poe and Mason Jr [9] used a mixed-model approach to account for random effects at individual observation sites. Those models are built around the following structural equations:

$$y_{it} = \alpha 'w_{it} + x_{it} '\beta_{i} + \varepsilon_{it} ,\;\;i = 1, \ldots ,N,\;\;t = 1, \ldots ,T_{i}$$
(1)

where \(\varepsilon_{it}\) is the random error term (representing pure random noise), with

$$\beta_{i} = \beta + \varGamma v_{i}$$
$$\varepsilon_{it} \sim\;N\left[ {0, \sigma_{i}^{2} } \right]$$

The conditional mean function is:

$$E\left[ {y_{it} \mid x_{it} ,\beta_{i} } \right] = \alpha 'w_{it} + x_{it} '\beta_{i} , \quad i = 1, \ldots ,N,\;\;t = 1, \ldots ,T_{i}$$
(2)

The model assumes that parameters are randomly distributed with a possible heterogeneous (across individuals) distribution. For the random parameter or mixed model, the general form is:

$$\beta_{i} = \beta + \Delta z_{i} + \varGamma v_{i}$$
(3)

where it is assumed that

$$E\left[ {v_{i} \mid X_{i} ,z_{i} } \right] = 0,\;\;\;{\text{Var}}\left[ {v_{i} \mid X_{i} ,z_{i} } \right] = I$$
$${\text{Var}}\left[ {\beta_{i} \mid X_{i} ,z_{i} } \right] = \varSigma = \varGamma \varGamma '$$
$$\sigma_{i}^{2} = \sigma^{2} \left( {\text{constant}} \right)$$

The detailed methodology for computing the random effect can be seen in Greene [19].
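To make the structure of the random parameter specification more tangible, the following sketch simulates data from a scalar special case. It assumes a single regressor, an intercept-only fixed part, no observed heterogeneity term and normally distributed draws; all function names and numerical values are hypothetical and the sketch does not reproduce the estimation procedure of Greene [19].

```python
import random
from statistics import mean, stdev

def draw_random_coefficients(beta, gamma, n_groups, rng):
    """Draw beta_i = beta + gamma * v_i with v_i ~ N(0, 1) (scalar case)."""
    return [beta + gamma * rng.gauss(0.0, 1.0) for _ in range(n_groups)]

def simulate_group(alpha, beta_i, sigma, x_values, rng):
    """Generate y_it = alpha + x_it * beta_i + eps_it with eps_it ~ N(0, sigma^2)."""
    return [alpha + x * beta_i + rng.gauss(0.0, sigma) for x in x_values]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: mean effect -10 kph per unit of x, spread 3 kph.
betas = draw_random_coefficients(beta=-10.0, gamma=3.0, n_groups=5000, rng=rng)

# Observations for the first group at three hypothetical regressor values.
y_first = simulate_group(alpha=70.0, beta_i=betas[0], sigma=2.0,
                         x_values=[0, 1, 2], rng=rng)

sample_mean = mean(betas)
sample_sd = stdev(betas)
```

The sample mean and standard deviation of the drawn coefficients approximate the population mean and spread, which are the two quantities the random parameter model estimates for each random variable.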

Following Agbelie [20], the study considered a parameter to have a random effect if the standard deviation of its density is statistically significant. If, on the other hand, the estimated standard deviation is not statistically different from zero, the parameter is considered to be fixed across the segments of the highway section.

For the random parameter density functional forms, uniform, triangular and normal distributions were investigated. For a variable with the triangular or uniform distribution, the variance of \(\beta_{i}\) is \(\sigma_{i}^{2} /6\) or \(\sigma_{i}^{2} /3\), respectively, with \(\sigma_{i}\) taken as the spread parameter of the distribution. It was found that a normal distribution gave the best statistical result for all random parameters.

Interpretation of Results

Though separate random parameters models have been developed and validated for four different speed levels, only the mean speed model is discussed here, for reasons of brevity.

Model Estimation

Table 5 presents the estimation results for the fixed and random parameter average speed models. In the random parameter model, two additional variables were found to be significant, namely medium shoulder and absence of roadside parking. A total of 15 variables were found to be significant, of which six were found to have random effects across the population of segments. The following sub-sections highlight the results by the type of effect for each parameter.

Table 5 Estimation results for OLS and RPM linear regression models

Fixed Parameters

Among the nine significant non-random variables, five are related to traffic flow characteristics. The coefficients of all of the fixed parameter variables have a negative sign, which implies that those variables inversely affect the speed choice. In other words, an increase in flow decreases the speed of the vehicle. However, the impact of the variables and their level of significance differ between vehicle types. Traffic flow parameters for vehicles travelling in the direction of the subject vehicle have the most significant negative influence on speed choice. Motorcycles and non-motorized vehicles travelling in the opposite direction also have a significantly negative influence.

The impact of a poor shoulder is almost five times that of an average shoulder. This output confirms that the worsening condition of the road shoulder is one of the most important reasons for the reduction in speed. Sporadic parking, though not significant in the fixed parameter OLS model, is significant in the RPM model, where it has a negative influence on the choice of speed.

Random Parameters

Six variables were found to have a random effect with a normal distribution and a significant standard deviation. Buses and trucks travelling in the direction of the subject vehicle are random parameters with mean coefficients of −11.50 and −13.43, respectively. Both parameters show a significant influence on the speed choice, and their influence varies among the segments, with standard deviations of 3.10 and 6.24, respectively. This significant influence of buses and trucks on speed choice is mainly due to their relatively low speeds. Most local buses and loaded trucks are very slow and hinder the maintenance of normal traffic speeds.

Among the road geometry parameters, the presence of a small bridge on the segment has a normally distributed random parameter with a mean of −3.48 and a standard deviation of 7.32. The presence of major access roads is also a normally distributed random parameter, with a positive effect on speed choice, a mean coefficient of 6.27 and a standard deviation of 5.59. Drivers tend to increase speed at major access road locations. This might be due to the segregation of directional traffic using physical devices, since most of the major access segments are channelized.

Two variables were found to have random parameters with respect to road environmental measures. Medium pedestrian activity reduces speed, with this parameter having a normally distributed coefficient with a mean of −4.57 and a standard deviation of 5.45. The other random parameter related to the road environment is the absence or low volume of non-motorized vehicles (NMV). This variable has a significant positive impact on speed, with a normally distributed coefficient with a mean of 9.32 and a standard deviation of 5.73. These results imply that roadside exposure, whether to pedestrians or non-motorized vehicles, has a significant influence on the speed choice.
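One common way to read a normally distributed random parameter is to ask what share of its distribution lies below zero. The sketch below computes that share from the means and standard deviations quoted in the text, taking the normality assumption at face value; the dictionary keys are hypothetical labels and the resulting shares are an illustration, not results reported by the study.

```python
from statistics import NormalDist

def share_below_zero(mean_coef, sd_coef):
    """Probability mass of N(mean_coef, sd_coef^2) lying below zero."""
    return NormalDist(mu=mean_coef, sigma=sd_coef).cdf(0.0)

# (mean, standard deviation) pairs as quoted in the text for the average speed model.
random_params = {
    "bus_same_direction": (-11.50, 3.10),
    "truck_same_direction": (-13.43, 6.24),
    "small_bridge": (-3.48, 7.32),
    "major_access": (6.27, 5.59),
    "medium_pedestrian_activity": (-4.57, 5.45),
    "low_or_no_nmv": (9.32, 5.73),
}

shares = {name: share_below_zero(m, s) for name, (m, s) in random_params.items()}
```

Under these assumptions, about 68% of the small-bridge distribution lies below zero, whereas nearly all of the bus distribution does, which suggests a far more uniform speed-reducing effect for buses than for small bridges.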

Model Performance and Validation Analysis

The performance of both models was assessed using several goodness-of-fit measures, both aggregated and disaggregated. Specifically, the effect of including randomness and heterogeneity when estimating segmental speed choice models was analysed. The Akaike Information Criterion (AIC), Corrected Akaike Information Criterion (AICc) and Bayesian Information Criterion (BIC) are commonly used disaggregated goodness-of-fit measures. They are estimators of the relative quality of a statistical model based on the log-likelihood function. The model with the lowest BIC is preferred [21,22,23]. The AIC, AICc and BIC values were calculated using the following equations:

$${\text{AIC}} = 2K - 2{\text{LL}}$$
(4)
$${\text{AICc}} = 2K - 2{\text{LL}} + \frac{{2K\left( {K + 1} \right)}}{{\left( {n - K - 1} \right)}}$$
(5)
$${\text{BIC}} = \ln \left( n \right)K - 2{\text{LL}}$$
(6)

where n is the number of observations, K is the number of parameters estimated by the model and LL is the log-likelihood value at convergence.
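Equations (4) to (6) translate directly into code. The sketch below is a straightforward transcription; the function name and the example log-likelihood, parameter count and sample size are hypothetical.

```python
from math import log

def information_criteria(log_likelihood, k, n):
    """Return AIC, AICc and BIC for a model with k parameters and n observations."""
    aic = 2 * k - 2 * log_likelihood
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = log(n) * k - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Hypothetical model: log-likelihood of -100 with 5 parameters and 50 observations.
ic = information_criteria(log_likelihood=-100.0, k=5, n=50)
```

Because all three criteria penalise the number of parameters, a random parameter model is preferred only if its gain in log-likelihood outweighs the extra standard deviation terms it estimates.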

A number of established aggregate-level measures were used to evaluate the predictive performance of the model, including the Pearson correlation (R), R-square (R2), adjusted R2, root mean square error (RMSE), mean prediction bias (MPB), mean absolute deviation (MAD), mean squared prediction error (MSPE) and mean absolute percentage error (MAPE).

The Pearson correlation coefficient R measures the strength of relationship between observed and predicted output estimated from the independent variables. The R is formulated as:

$$R = \frac{{n\sum y_{i} f_{i} - \sum y_{i} \sum f_{i} }}{{\sqrt {\left[ {n\sum y_{i}^{2} - \left( {\sum y_{i} } \right)^{2} } \right]\left[ {n\sum f_{i}^{2} - \left( {\sum f_{i} } \right)^{2} } \right]} }}$$
(7)

R2 provides the proportion of the variance that is predictable from the independent variables. The R2 is computed as:

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - f_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \bar{y}} \right)^{2} }}$$
(8)

where \(f_{i}\) and \(y_{i}\) are the predicted and observed speed of individual segment population, \(\bar{y}\) is the observed mean speed. Adjusted R2 is an extension of R2, which takes into account the number of explanatory variables and sample size. This is computed as:

$${\text{Adjusted}}\;R^{2} = 1 - \left( {1 - R^{2} } \right)\frac{n - 1}{n - p - 1}$$
(9)

where p is the total number of explanatory variables and n is the sample size.

RMSE measures the differences between observed and predicted values and is defined as:

$${\text{RMSE}} = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {f_{i} - y_{i} } \right)^{2} }}{n}}$$
(10)

where \(f_{i}\) and \(y_{i}\) are the predicted and observed speed of individual segment population.

MPB shows the magnitude and direction of average bias in model prediction. MPB is defined as:

$${\text{MPB}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {f_{i} - y_{i} } \right)}}{n}$$
(11)

MAD describes average prediction error of the estimated models. MAD is defined as:

$${\text{MAD}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left| {f_{i} - y_{i} } \right|}}{n}$$
(12)

MSPE estimates the model prediction error and is defined as:

$${\text{MSPE}} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {f_{i} - y_{i} } \right)^{2} }}{n}$$
(13)

Finally, MAPE expresses error as a percentage and is defined by:

$${\text{MAPE}} = \frac{100}{n}\mathop \sum \nolimits_{i = 1}^{n} \left| {\frac{{y_{i} - f_{i} }}{{y_{i} }}} \right|$$
(14)
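The aggregate measures can be computed together from paired observed and predicted speeds, as in the following sketch. R is computed with the standard sum form of the Pearson coefficient; the function name and the five example segment speeds are hypothetical.

```python
from math import sqrt

def fit_metrics(observed, predicted, n_predictors):
    """Aggregate goodness-of-fit measures for observed (y) and predicted (f) values."""
    n = len(observed)
    y_bar = sum(observed) / n
    errors = [f - y for f, y in zip(predicted, observed)]
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    sum_y, sum_f = sum(observed), sum(predicted)
    sum_yf = sum(y * f for y, f in zip(observed, predicted))
    sum_y2 = sum(y * y for y in observed)
    sum_f2 = sum(f * f for f in predicted)
    r = (n * sum_yf - sum_y * sum_f) / sqrt(
        (n * sum_y2 - sum_y ** 2) * (n * sum_f2 - sum_f ** 2))
    r2 = 1 - ss_res / ss_tot
    return {
        "R": r,
        "R2": r2,
        "adj_R2": 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1),
        "RMSE": sqrt(ss_res / n),
        "MPB": sum(errors) / n,
        "MAD": sum(abs(e) for e in errors) / n,
        "MSPE": ss_res / n,
        "MAPE": 100 / n * sum(abs((y - f) / y) for y, f in zip(observed, predicted)),
    }

# Hypothetical segment speeds (kph): observed versus model estimates.
observed = [60.0, 65.0, 70.0, 55.0, 50.0]
predicted = [62.0, 63.0, 69.0, 57.0, 49.0]
m = fit_metrics(observed, predicted, n_predictors=2)
```

In this toy example the positive and negative errors cancel, so MPB is zero even though MAD and RMSE are not, which is why the bias measure is reported alongside the magnitude measures.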

Table 6 shows the results of the goodness-of-fit measures.

Table 6 Performance of the models

As shown in Table 6, the goodness-of-fit measures differ considerably between the two models. The results clearly suggest that the random parameter model is superior in characterizing driver speed choice across the different road segments in the heterogeneous traffic environments of developing countries.

The adjusted R2 value for the RPM model is 0.79. The model uses the segment mean speed to correlate with the variables for all 200-m segments. However, speed choice of a roadway segment is related to human behaviour, as well as the environment, including weather conditions and the on-road environment. Therefore, the adjusted R2 of 0.79 indicates that the model has a reasonable fit. The overall statistical assessment shows that the model provides acceptable estimates of segmental speed.

The performance of the models was validated using a tenfold cross-validation technique. The observed results for three randomly selected separate trips were compared with the model estimates. Figure 7 shows the relationship between the observed and estimated values for each modelling approach. The adjusted R2 values are 0.50 and 0.78 for the OLS and RP models, respectively. This validation test further established the benefit of accommodating random effects when modelling speed behaviour in heterogeneous traffic environments.

Fig. 7 Performance of the models

Summary and Conclusions

This paper summarizes some key features of speed choice behaviour and provides the results of regression modelling used to investigate the factors affecting speed choice behaviour on a two-lane highway in the heterogeneous traffic environments of developing countries. Data were collected using different techniques on a 13 km section of a two-lane bidirectional major highway in Bangladesh. The second-by-second choice of driver speeds was extracted using a test vehicle driving normally in traffic. The results show that both the time mean speeds and the space mean speeds are significantly different from the posted 80 kph speed limit. Over-speeding is common, with speeds reaching around 110 kph. On average, vehicles travel at speeds above 80 kph for 13% of the time and 20% of the distance. Abrupt changes of speed are also a common and concerning issue on this highway section. The analysis of the speed profiles for individual 200 m segments shows that speeds vary considerably across the segments.

The factors influencing speed choice were quantified in detail for four different speed levels using traditional ordinary least squares (OLS) regression. A comprehensive set of exogenous variables, grouped into three main categories (traffic flow and operation, road geometry, and road and roadside environment), was tested to identify the significant factors and their influence on speed choice. As the OLS regression model does not consider the randomness and heterogeneity of exogenous variables across the population of segments, the study further formulated a random parameter (RPM) regression model. This paper thus contributes to research on driving behaviour by formulating and estimating an RPM regression model to better understand the speed choice behaviour of drivers at the micro-level on a two-lane, two-way rural highway with a heterogeneous traffic environment in a developing country.

Although models were developed for four different speed levels, results for the average speed are highlighted here. In the average speed random parameter model, 15 independent variables in total were found to have a statistically significant impact on speed choice. Six variables were found to have significant random effects: buses and trucks travelling in the same direction; the presence of a small bridge and major access; medium-level pedestrian activities; and low non-motorized vehicle (NMV) activities along the segment. Other significant variables include the presence of four and three wheelers, improper shoulders and parking along the roadside.

To further assess the predictive performance of the models, eleven goodness-of-fit measures were calculated. The comparative results indicated that the model treating some parameters as random and others as fixed offers a better fit on all measures. The models were validated using data collected from different trips. These validation results further indicate the considerable potential of random parameter regression for analysing speed behaviour in the heterogeneous traffic environments of developing countries.

The results of the estimated model have important implications for effective speed management to ensure more disciplined and safer traffic operation in developing countries. The presence of heavy vehicles, particularly local buses and overloaded trucks, is one of the main reasons for extreme speed differences between vehicle types. To maintain homogeneous speeds among different motorized vehicles throughout the road section, the presence of unfit, overloaded heavy vehicles needs to be controlled. Long-term policy decisions need to be taken to define acceptable specifications for freight transport vehicles. Loading and unloading of passengers and freight need to be restricted to specific locations. Segregation of three-wheelers and non-motorized vehicles from mainstream traffic could be highly effective in improving safety. Road shoulders and the approaches to bridges and culverts need to be adequately maintained. The results confirm the need to manage roadside parking, pedestrian and non-motorized activities. Moreover, access management measures have the potential to reduce speed inhomogeneity among vehicles. In addition, location-specific speed enforcement, including police patrolling and the installation of speed cameras, needs to be provided to manage speed and to ensure efficient and safe traffic operation.

Although the comparative model analysis offers strong evidence of improved performance of the RPM model, further enhancements are possible. For example, more advanced modelling approaches, such as latent class modelling, could be used. Applying machine learning techniques to a larger sample could improve the model and provide better insight into the factors affecting driver speed behaviour. Discriminant function analysis (DFA) could be used to classify observations, to estimate the probability of their membership in a given group and to derive optimal combinations of variables. The current study has some limitations related to data and the consideration of explanatory variables. Owing to limited resources and time, only a small number of trips by a particular vehicle operated by a particular driver were used. The collection of data from different roads using different vehicle types and drivers is likely to yield improved models. A further attempt could be made to develop segment-specific micro-level speed prediction models that incorporate additional attributes. Those attributes could be time of day, weather, drivers’ demographic details, vehicle-related factors, roadside friction profiles and exposure to local traffic. The latter should include pedestrians, informal local para-transit, as well as non-motorized two and three wheelers.