1 Introduction

As society progresses, new requirements and needs appear. With regard to road transport, researchers, administrations, and private companies are aware that controlling the evolution of traffic increases productivity and safety, allows exploiting synergies among different means of transport, and contributes to more sustainable growth (SHRP 2 2013). Many different initiatives, such as real-time calculation of travel times, active traffic management or automated driving, emerge as examples of key achievements.

Although these lines of research are very different, they have two commonalities, namely the need for (i) appropriate data and (ii) well-founded calculations. The development of new technologies and computer software offers the possibility of collecting varied data and combining them to obtain accurate results (Yuan et al. 2014). Mobile phones, GPS (Global Positioning System) receivers, Bluetooth devices, Optical Character Recognition (OCR) cameras, and many other devices are sources of traffic data usable for calculations. GPS-enabled cell phones, RFID technologies, etc. have opened a new way of collecting traffic data, as they are able to register individual vehicle trajectories (Hiribarren and Herrera 2014). Furthermore, vehicles themselves will also act as high-level “sensors” in future cooperative scenarios. With these Lagrangian sensors moving within the traffic stream, increasing amounts of data will be available. Therefore, it will be possible to design much more precise methodologies, either for real-time travel time estimation or for any other purpose aimed at the dynamic management of traffic.

However, at present, neither totally accurate data nor the most complex programs are usually available, at least in sufficient amounts, in less trafficked areas. This is the case, for example, on secondary roads, in rural areas or in small traffic management centers. In fact, the majority of these centers in developed countries depend on equipment such as loop detectors and common cameras (unable to identify vehicles); that is, loops are the main sources of data. Traffic researchers have demonstrated the advisability of deploying double loops (in pairs in each section of each lane) rather than single loops to obtain more data and thus better results in later calculations (Chen et al. 2003). Fortunately, this recommendation is nowadays usually followed. Moreover, the situation in which a single data source covers a road is expected to gradually disappear. Still, until today's scenarios evolve, some modifications can be introduced in the procedures currently implemented in traffic centers so that they manage traffic better. In this chapter, travel time estimation by means of spot speed methods is improved solely through the application of traffic flow theory, maintaining loops as the unique data source. First, a reminder of the basics of these detectors is included next.

All inductive loop detectors are similar. They consist of a wire loop installed under the pavement of a lane, which detects the presence of a vehicle (in essence, a metallic object) through the change that it causes in the electromagnetic properties of the loop. The main differences among loops are related to the software that manages and stores these data, which can be programmed in several ways. As explained in Chap. 1, the data usually available in previously determined time intervals of aggregation, \(\Delta t,\) with the double-loop configuration are as follows:

  • Number of vehicles that pass over the detectors.

  • Lengths of these vehicles: the software that manages the information usually classifies them into groups and keeps only the number of vehicles in each group. For example, in Spain the usual classification comprises vehicles shorter than 6 m, between 6 and 10 m, and longer than 10 m.

  • Spot speed measurements: again, although at first individual spot speeds are detected, the software calculates and registers only their mean, i.e., the time mean speed, \(\overline{{v }_{t}}\), the average speed of all vehicles passing over a particular spot.

  • Number of vehicles that pass over the detectors with a speed lower than a particular reference speed. It is common to have two different reference speeds. Only the number of vehicles that meet this requirement is stored. It must be highlighted that obtaining these data directly from the software of loop detectors is not standard in the USA, but it is quite common in Europe; as an example, all Spanish freeway traffic centers manage them.

The duration of the time intervals of aggregation ranges from 20–30 s in the USA up to 15 min in some European countries. Intervals of 3–5 min have proven to be the most suitable (Soriguera and Robusté 2013): both shorter and longer durations have advantages but also disadvantages, as will be discussed in Sect. 3.5.
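
To make these inputs concrete, the following minimal sketch models the aggregated record that a traffic management center might receive per lane and per interval. It is illustrative only; the field names do not correspond to the format of any real controller.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class LoopIntervalRecord:
    """Aggregated double-loop data for one lane and one interval (illustrative)."""
    n: int                         # number of vehicles detected during the interval
    n_by_length: Dict[str, int]    # counts per length class, e.g. {"<6m": 90, "6-10m": 8, ">10m": 2}
    v_t_mean: float                # time mean speed (km/h): arithmetic mean of the spot speeds
    n_below_ref: Dict[float, int]  # vehicles slower than each reference speed, e.g. {50.0: 1, 100.0: 37}
```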

Variation of traffic speeds at various places over time turns out to be one of the basic inputs for subsequent studies, for example, the indirect estimation of travel times. However, the problem is that most studies are based on the fundamental equation of traffic flow (Eq. 3.1, introduced as Eq. 1.8 in Chap. 1); it provides the relationship between flow, \(q\), and density, \(k\), by means of a specific type of speed, the so-called space mean speed, \(\overline{{v }_{s}}\), which is really a harmonic mean calculated under particular conditions (Wardrop 1952). Further explanation about this point is included in Sect. 3.2.

$$q=\overline{{v }_{s}}*k$$
(3.1)

The use of data provided by loop detectors involves various difficulties when determining the evolution of speeds:

  • Individual speeds are measured at fixed points of a road and must be extrapolated to some extent to achieve the spatial implication needed. This spatial generalization is extremely complicated, particularly in case of congestion.

  • As mentioned, the software delivers time mean speeds. The use of these time means as substitutes of the space means required for calculations can cause a considerable loss of accuracy in the final results.

  • Although loops are simpler, more economical and more common than other devices used to collect traffic data, their utility depends on their density on the road (Bachmann et al. 2013). Some research has resulted in the development of simple search algorithms that efficiently select sensor locations in order to obtain suitable data when the number of available sensors is limited (Viti et al. 2014). Nevertheless, difficulties remain on those roads already constructed.

The goal of the algorithm introduced in this chapter is to calculate spot space mean speeds exclusively from the data provided by double-loop detectors, avoiding extra expenses for the administrations. Specifically, it focuses on the calculation of the variance of the speeds with respect to the time mean, which allows using the relationship between time mean speeds and space mean speeds under stationarity defined by Rakha and Zhang (2005). As explained in Chap. 2, further improvements must be implemented to obtain more accuracy in the final objectives, in this case, the travel time estimations. Once space mean speeds are available, a procedure for the generalization of these speeds over the links between detectors, based on traffic dynamics and queue evolution, would be the next challenge to face. In any case, improvements in this first basic input have, as demonstrated below, satisfactory consequences.

The remaining sections of this chapter are as follows. Section 3.2 gives the background of different traffic speed definitions and summarizes their relationships according to various researchers. Section 3.3 develops the proposed algorithm, whose implementation is demonstrated in Sects. 3.4 and 3.5 with artificial and real data, and also compared with other methodologies. After the discussion of the results, attempts to find new relationships between mean speeds are performed in Sect. 3.6. Finally, Sect. 3.7 includes the conclusions and a proposal for new lines of research.

2 Background

Since 1952, when Wardrop (1952) stated his two principles concerning the idea of traffic equilibrium previously developed by Knight (1935), the differences between the time mean speed and the space mean speed have been widely demonstrated. The space mean speed, \(\overline{{v }_{s}},\) is the average speed of all vehicles on a particular stretch of a road at a specific instant (Homburger et al. 1996). The time mean speed, \(\overline{{v }_{t}},\) is the average of the speeds of all vehicles that pass over a section of a road during a certain time interval. It is easy to deduce that the time mean speed is greater than the space mean speed (Daganzo 1997) because faster vehicles contribute more to the time mean than slower ones, whereas vehicles of all speeds contribute equally to the space mean. Space averages equal time averages only in the case of space–time homogeneous traffic (Breiman 1969).

As explained before, loops on a road detect and average spot speeds in stipulated time intervals, thus providing time mean speeds. However, if the individual spot speeds were stored, \(\overline{{v }_{s}}\) could be calculated by giving them a certain spatial nature and by considering stationary traffic in the section (Edie 1965), as Eq. 3.2 shows:

$$\overline{{v }_{s}}=\frac{\sum_{i=1}^{n}{x}_{i}}{\sum_{i=1}^{n}{tt}_{i}}= \frac{n*dx}{\sum_{i=1}^{n}\frac{dx}{{v}_{i}}}= \frac{1}{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{{v}_{i}}},$$
(3.2)

where,

\({x}_{i}\)= distance covered by vehicle i,

\({tt}_{i}\)= time used by vehicle i to cover the distance \({x}_{i}\),

\({v}_{i}\)= spot speed of vehicle i,

\(n\) = number of vehicles that pass over the detector during the time interval,

\(dx\) = differential length taken up by the detector.

Therefore, under these conditions the space mean speed can be calculated as the harmonic mean of the individual spot speeds. It must be highlighted, however, that this formulation originally established neither a time mean nor a space mean, but a generalized definition of the average speed. Labelling this generalized average speed as the space mean speed \(\overline{{v }_{s}}\) is an abuse of notation: \(\overline{{v }_{s}}\) does not share the spatial implications of the original space mean speed definition unless traffic is stationary. For that reason, some limitations are imposed: the identification is only valid when the average speed is computed over a narrow rectangular strip in the \(x-t\) plane with a spatial width \(dx\) and a time length \(T\), which corresponds to the measurement region of a loop detector on a highway. Taking this definition into account, the space mean speed appears, for example, in the mathematical formulation of the average travel time \(\overline{tt }\) of \(n\) vehicles that cover a specific distance of a road \(L\) at constant speeds \({v}_{i}\) (Eq. 3.3, already introduced in Chap. 2 as Eq. 2.1):

$$\overline{tt }= \frac{\sum_{i=1}^{n}{tt}_{i}}{n}= \frac{\sum_{i=1}^{n}\frac{L}{{v}_{i}}}{n}=L* \frac{1}{n}\sum_{i=1}^{n}\frac{1}{{v}_{i }}= \frac{L}{\overline{{v }_{s}}}.$$
(3.3)

Consequently, travel times would be underestimated if \(\overline{{v }_{t}}\) were used instead of \(\overline{{v }_{s}}\) (Soriguera and Robusté 2011). This substitution could also lead to other inaccuracies such as wrong estimates of jam densities or shock wave speeds (Knoop et al. 2009). The data aggregation process is in fact an influential source of noise and errors in conventional measures of the traffic state (Coifman 2014). Many authors have stressed the importance of correctly using time-based or space-based data, regardless of their source. For example, the inverse of the harmonic mean of instantaneous speeds from probe vehicles is an unbiased and consistent estimator of the mean segment travel time when sampling by space, whereas it is biased upward when sampling by time (Jenelius et al. 2015).
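
A minimal numerical sketch of Eqs. 3.2 and 3.3 illustrates this bias; the spot speeds and the link length are arbitrary illustrative values.

```python
import numpy as np

# Spot speeds (km/h) of the vehicles crossing the detector in one interval
# (illustrative values only).
v = np.array([85.0, 92.0, 110.0, 70.0, 125.0, 98.0])

v_t = v.mean()                    # time mean speed (arithmetic mean)
v_s = len(v) / np.sum(1.0 / v)    # space mean speed under stationarity (Eq. 3.2)

L = 10.0                          # link length in km (assumed)
tt_true = L / v_s                 # average travel time from Eq. 3.3 (hours)
tt_naive = L / v_t                # naive estimate using the time mean

print(f"v_t = {v_t:.2f} km/h, v_s = {v_s:.2f} km/h")
print(f"travel time: {tt_true * 60:.2f} min vs {tt_naive * 60:.2f} min (naive)")
# Since v_t >= v_s always holds, the naive travel time is underestimated.
```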

Clearly, upgrades in the loop software would allow these devices to store individual data or even to directly calculate space mean speeds. However, the large number of loops deployed worldwide and human inertia have so far precluded those modifications. Therefore, many researchers have tried to calculate space mean speeds from the time mean speeds provided by the loops, especially in case of stationarity, which is the common hypothesis of all the following methodologies.

The first of these relationships, shown in Eq. 3.4, is due to Wardrop (1952):

$$\overline{{v }_{t}}= \overline{{v }_{s}}+ \frac{{\sigma }_{s}^{2}}{\overline{{v }_{s}}},$$
(3.4)

where \({\sigma }_{s}^{2}\) is the variance of the speed with regard to the space mean for the specific time interval of aggregation chosen. The accuracy of the formula has been experimentally verified, but most traffic management centers cannot use it because individual speeds are needed to calculate the variance with regard to the space mean. Moreover, this formula was actually devised to calculate time means from space means, which is not usually necessary in practice.

Another formula postulated to relate both means is that of Garber (2002) shown in Eq. 3.5:

$$\overline{{v }_{t}}= 0.966*\overline{{v }_{s}}+ 3.541.$$
(3.5)

The main problem with this relationship is that it was established based only on experimental data; thus, it cannot be extrapolated to the many situations in which the boundary conditions differ from the original ones. It would have to be continuously recalibrated and, ultimately, it is not worth using.

Equation 3.6 has been used in several traffic studies. It was first derived by Khisty (2003), but it was Rakha and Zhang (2005) who proved it analytically:

$$\overline{{v }_{s}}= \overline{{v }_{t}}- \frac{{\sigma }_{t}^{2}}{\overline{{v }_{t}}}.$$
(3.6)

In this equation \({\sigma }_{t}^{2}\) is the variance of the speed with regard to the time mean for the specific time interval of aggregation. However, the impossibility of calculating this variance arises again. Nevertheless, given the utility of the formula, Soriguera and Robusté (2011) were able to estimate it by imposing the common hypothesis of stationary traffic in each time interval of aggregation and additionally assuming normality of the speed distribution. Then, the standard deviation with regard to the time mean speed is given by Eq. 3.7:

$${\sigma }_{t}= \frac{{v}^{a}- \overline{{v }_{t}}}{{F}^{-1}\left[\frac{{n}_{v}^{a}}{n}\right]},$$
(3.7)

where

\({\sigma }_{t}\) = standard deviation of the speed with regard to the time mean,

\({v}^{a}\) = reference speed chosen by the traffic management centers,

\({F}^{-1}\) = inverse of the cumulative standard normal distribution,

\({n}_{v}^{a}\)= number of vehicles that pass over the detectors with a speed lower than \({v}^{a}\) in each time interval of aggregation,

\(n\) = number of vehicles that pass over the detectors in each time interval of aggregation.
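
Under those hypotheses, Eq. 3.7 translates into a few lines of code. The sketch below assumes SciPy's `norm.ppf` as the inverse cumulative standard normal \({F}^{-1}\); the numbers in the example call are illustrative.

```python
from scipy.stats import norm

def sigma_t_normal(v_t_mean, v_a, n_below, n):
    """Standard deviation about the time mean under the normality and
    stationarity hypotheses of Soriguera and Robuste (2011), Eq. 3.7.
    Returns None when the quantile is degenerate."""
    p = n_below / n
    if p <= 0.0 or p >= 1.0:
        return None                  # F^{-1} would tend to +/- infinity
    z = norm.ppf(p)                  # inverse cumulative standard normal
    if z == 0.0:
        return None                  # v_a equal to the median gives 0/0
    return (v_a - v_t_mean) / z

# Illustrative call: 12 of 180 vehicles slower than 100 km/h, time mean 112 km/h.
print(sigma_t_normal(112.0, 100.0, 12, 180))
```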

Although this methodology performs well in specific conditions, Soriguera and Robusté (2011) warned that it is inappropriate to use it indiscriminately, especially at shock wave onsets or offsets or in “stop and go” situations. As Cassidy (1998) noted, stationarity validates some relationships that are otherwise meaningless. However, the relationship established by Rakha and Zhang (2005) has proven useful under certain conditions even with non-spot data such as those from GPS (Poomrittigul et al. 2008).

Another fact that must be taken into account when establishing relationships between speeds is that they fit, more or less closely, common statistical distributions. The normal, log-normal, gamma and bimodal distributions appear in the majority of traffic studies. The normal distribution is undoubtedly the most used because of its simplicity, and it performs well when traffic conditions are homogeneous; consequently, multivariate normal distributions are also commonly assumed for link travel times (Jenelius et al. 2013). However, the log-normal and gamma distributions are usually more suitable because they have additional advantages (Haight 1962):

  • They avoid the appearance of negative speeds.

  • They keep their shape if either time speeds or space speeds are fitted.

In the case of the log-normal distribution, another important advantage is that the distribution of travel times based on speeds that fit this distribution maintains the same shape (El Faouzi et al. 2007). If the logarithms of the speeds are normal with mean μ and standard deviation σ, the distribution of travel times over a section of unit length follows Eq. 3.8:

$${f}_{t}\left(t\right)= \frac{1}{\sqrt{2*\pi }*\sigma *t}*{e}^{\left[-\dfrac{{\left(Ln\,t+ \mu \right)}^{2}}{2*{\sigma }^{2}}\right]}.$$
(3.8)

In the cases where traffic is too heterogeneous (for example, because there are many different vehicle types that may behave differently or because phases of free flow follow congestion periods), unimodal distributions should be avoided (Dey et al. 2006). Bimodal or even multimodal distributions might be used. Each of their components would often be a normal or log-normal distribution (May 1990).

Many other complex distributions have been used in research, but their complexity prevents them from being put into practice (Zou and Zhang 2012). Even for log-normal distributions, some improvements can be expected if the distributions are truncated because only a range of speeds makes sense. In addition, the variances of these truncated distributions are always smaller than those of the original ones (Wang 2012).

3 Simple Algorithm for the Estimation of Space Mean Speeds from the Data Provided by Double-Loop Detectors

Having analyzed previous investigations and taking into account the data available, the author decided to use the equation of Rakha and Zhang (2005), solving the problem of not having an explicit value of the variance. The motivation is that the validity of this formula has been widely demonstrated in experimental studies. Nevertheless, a specific analysis was performed to compare it with other possible relationships. Section 3.6 contains the results of this comparison, which confirm that this formula outperforms the others.

To be able to estimate the variance, two important hypotheses are assumed. In each time interval of aggregation \(T\):

  • Traffic is stationary.

  • The speed distribution is log-normal.

The validity of these hypotheses will be discussed in Sect. 3.5.4. The first one has also been taken for granted in the other methodologies discussed in this chapter. With regard to the second, the author exploits the advantages of the log-normal distribution mentioned in Sect. 3.2. Assuming that the distribution of the individual speeds \({v}_{i}\) in each time interval of aggregation \(T\) is log-normal, the distribution of the logarithms of these speeds, \(x=Ln\,v\), is a normal distribution \(N({\mu }_{x}, {\sigma }_{x})\). Therefore, the probability density function of the speeds, their mean and their variance are given by Eqs. 3.9 to 3.11, respectively:

$${f}_{v}\left(v\right)= \frac{1}{\sqrt{2*\pi }*{\sigma }_{x}*v}*{e}^{\left[-\dfrac{{\left(Ln\,v- {\mu }_{x}\right)}^{2}}{2*{\sigma }_{x}^{2}}\right]} \quad \text{with} \; v>0,$$
(3.9)
$${\mu }_{v}= \overline{{v }_{t}}= {e}^{{\mu }_{x}+ \dfrac{{\sigma }_{x}^{2}}{2}},$$
(3.10)
$${\sigma }_{v}^{2}= {\sigma }_{t}^{2}=\left({e}^{{\sigma }_{x}^{2}}- 1\right)*{e}^{2{*\mu }_{x}+ {\sigma }_{x}^{2}},$$
(3.11)

where

\(v\) = individual speed,

\({\mu }_{x}\) = arithmetic mean of the logarithms of the speeds,

\({\sigma }_{x}^{2}\)= variance of the logarithms of the speeds with regard to the mean.

Note that the goal of the algorithm is to estimate \({\sigma }_{v}^{2}\), which corresponds to the variance with regard to the time mean speed, termed \({\sigma }_{t}^{2}\) by Rakha and Zhang (2005). Therefore, \({\mu }_{x}\) and \({\sigma }_{x}\) are needed. \({\mu }_{v}\) is supplied by the loops (the time mean speed, termed \(\overline{{v }_{t}}\) by Rakha and Zhang (2005)).

Let \({n}_{v}^{a}\) be the number of vehicles that pass over the detectors in a section with a speed lower than \({v}^{a}\) in one time interval of aggregation \(T\). The probability that a vehicle passes over the detector with such a speed is shown in Eq. 3.12:

$$\begin{aligned} P\left[V\le {v}^{a}\right] & = P\left[{e}^{X}\le {e}^{{x}^{a}}\right]= P\left[X\le {x}^{a}\right]=F\left[Z\left({x}^{a}\right)\right] \\ & =F\left[Z\left(Ln\,{v}^{a}\right)\right]= F\left[\frac{Ln\,{v}^{a}-{\mu }_{x}}{{\sigma }_{x}}\right]\approx \frac{{n}_{v}^{a}}{n}, \end{aligned}$$
(3.12)

where

\({v}^{a}\) = speed chosen as a reference,

\(n\) = number of vehicles that pass over the detectors in each time interval of aggregation,

\({x}^{a}\) = logarithm of the reference speed \({v}^{a}\),

\(F\) = cumulative standard normal distribution,

\(Z\) = standardized value.

Rearranging Eqs. 3.10 and 3.12 yields a system with two equations (Eqs. 3.13 and 3.14) and two unknowns:

$$2{\mu }_{x}+ {\sigma }_{x}^{2}=Ln\left({\overline{{v }_{t}}}^{2}\right),$$
(3.13)
$${\mu }_{x}+ {F}^{-1}\left[\frac{{n}_{v}^{a}}{n}\right]*{\sigma }_{x}= Ln{v}^{a},$$
(3.14)

where

\({F}^{-1}\)= inverse of the cumulative standard normal distribution.

Finally, Eq. 3.15 is obtained

$${\sigma }_{x}^{2}- 2*{F}^{-1}\left[\frac{{n}_{v}^{a}}{n}\right]*{\sigma }_{x}+Ln\left[{\left(\frac{{v}^{a}}{\overline{{v }_{t}}}\right)}^{2}\right]=0.$$
(3.15)

Solving Eq. 3.15, two possible values of \({\sigma }_{x}\) arise. For two reference values of speed (\({v}^{a1}\) and \({v}^{a2}\)), four values are provided. In practice, some of these are nullified during the calculations because there are some mathematical limitations for the algorithm. In each time interval of aggregation \(T\):

  • \(n\) cannot be too small; otherwise, the substitution of the theoretical probability by the accumulated relative frequency (Eq. 3.12) becomes problematic and the confidence interval of the estimations is too wide.

  • It is necessary that \({n}_{v}^{a}\ne 0\) and \({n}_{v}^{a}\ne n\). This keeps the inverse of the cumulative standard normal distribution from tending to infinity.

  • \({\left({F}^{-1}\left[\frac{{n}_{v}^{a}}{n}\right]\right)}^{2}\) must be greater than \(Ln\left[{\left(\frac{{v}^{a}}{\overline{{v }_{t}}}\right)}^{2}\right]\) to avoid square roots of negative numbers when solving Eq. 3.15.

  • It is necessary that \(\frac{{v}^{a}}{\overline{{v }_{t}}}\ne 0\) to avoid natural logarithms of zero.

When more than one value of \({\sigma }_{x}\) results, an action protocol must be established to choose the most suitable one. One possibility is to keep the value with the smallest confidence interval for a specific level of confidence. Once a value of \({\sigma }_{x}\) is found and introduced into Eq. 3.13, the corresponding \({\mu }_{x}\) can be calculated. By using both values in Eq. 3.11, \({\sigma }_{t}^{2}\) is finally obtained and can be introduced into Eq. 3.6 to estimate \(\overline{{v }_{s}}\). The flow chart in Fig. 3.1 summarizes the main steps of the algorithm, and a code sketch is given after the figure.

Fig. 3.1
figure 1

Steps of the algorithm to obtain space mean speeds from loop detector data
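
The sketch announced above solves the quadratic Eq. 3.15 for each reference speed, recovers \({\mu }_{x}\) from Eq. 3.13, and applies Eqs. 3.11 and 3.6. It assumes the stationarity and log-normality hypotheses; function and variable names are illustrative, and \({F}^{-1}\) is taken from SciPy's `norm.ppf`.

```python
import math
from scipy.stats import norm

def space_mean_candidates(v_t_mean, v_a, n_below, n):
    """Candidate space mean speeds from one reference speed v_a
    (Eqs. 3.13-3.15, 3.11 and 3.6); a sketch, not a production routine."""
    p = n_below / n
    if not (0.0 < p < 1.0) or v_a <= 0.0 or v_t_mean <= 0.0:
        return []                                  # limitations listed in Sect. 3.3
    z = norm.ppf(p)                                # F^{-1}[n_v^a / n]
    disc = z * z - 2.0 * math.log(v_a / v_t_mean)  # discriminant of Eq. 3.15
    if disc < 0.0:
        return []                                  # no real root for sigma_x
    candidates = []
    for sigma_x in (z + math.sqrt(disc), z - math.sqrt(disc)):
        if sigma_x <= 0.0:
            continue                               # a standard deviation must be positive
        mu_x = math.log(v_t_mean) - sigma_x ** 2 / 2.0                              # Eq. 3.13
        var_t = (math.exp(sigma_x ** 2) - 1.0) * math.exp(2.0 * mu_x + sigma_x ** 2)  # Eq. 3.11
        candidates.append(v_t_mean - var_t / v_t_mean)                              # Eq. 3.6
    return candidates

# Illustrative interval: 180 vehicles, time mean 112 km/h, 12 vehicles below
# 100 km/h and 170 below 125 km/h.
for v_a, n_below in ((100.0, 12), (125.0, 170)):
    print(v_a, space_mean_candidates(112.0, v_a, n_below, 180))
```

With these illustrative inputs the sketch returns up to two candidates per reference speed; the action protocol described above then selects among them.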

As noted, in practice it is not easy to choose the best estimate of \({\sigma }_{t}^{2}\) from more than one possible value. There are no simple methods to calculate confidence intervals for the variance of log-normal distributions. Bayesian procedures seem to be the most suitable (Harvey et al. 2012), although they are quite difficult to implement.

A naïve solution could be to consider the confidence intervals of a parameter calculated in a previous step of the method, for example \({\sigma }_{x}\). If the best \({\sigma }_{x}\) is chosen, the best \({\sigma }_{t}^{2}\) and thus a more accurate \(\overline{{v }_{s}}\) will be obtained. Because the variable \(x\) is normally distributed, the solution for the confidence interval limits of \({\sigma }_{x}\) proposed by Soriguera and Robusté (2011) and developed in Eqs. 3.16 and 3.17 can be used:

$${\varepsilon }_{{\sigma }_{x(1)}}= - \dfrac{\left(Ln\,{v}^{a} - {\mu }_{x}\right)*{\varepsilon }_{z(1)}}{Z*\left(Z+{\varepsilon }_{z\left(1\right)}\right)},$$
(3.16a)
$${\varepsilon }_{{\sigma }_{x(2)}}= - \dfrac{\left(Ln\,{v}^{a}- {\mu }_{x}\right)*{\varepsilon }_{z\left(2\right)}}{Z*\left(Z+{\varepsilon }_{z\left(2\right)}\right)},$$
(3.16b)

where

$${\varepsilon }_{Z\left(1\right)} = {{F}^{-1}\left(p + {\varepsilon }_{p}\right)- {F}^{-1}\left(p\right),}$$
(3.17a)
$${\varepsilon }_{Z\left(2\right)} = {{F}^{-1}\left(p- {\varepsilon }_{p}\right)- {F}^{-1}\left(p\right).}$$
(3.17b)

The variable \(p\) is the probability that a vehicle passing over the detector in the time interval of aggregation has a speed lower than \({v}^{a}\). The passage of vehicles over the detectors can be viewed as a Bernoulli process: each vehicle either drives slower than the reference speed or it does not, and these trials are independent. Thus, the estimator of \(p\), \(\widehat{p}\), is given by Eq. 3.18:

$$\widehat{p}= \frac{{n}_{v}^{a}}{n}.$$
(3.18)
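
A sketch of this interval computation is shown below. Since the chapter does not fix how the half-width \({\varepsilon }_{p}\) of \(\widehat{p}\) is obtained, the sketch assumes the normal approximation of the Bernoulli process; all names are illustrative.

```python
import math
from scipy.stats import norm

def sigma_x_ci_limits(v_a, mu_x, p_hat, n, conf=0.95):
    """Error bounds on sigma_x (Eqs. 3.16-3.17); the half-width eps_p of
    p_hat (Eq. 3.18) is assumed to follow the normal approximation of the
    Bernoulli trials."""
    x_a = math.log(v_a)                 # logarithm of the reference speed
    z = norm.ppf(p_hat)                 # standardized value Z
    eps_p = norm.ppf(0.5 + conf / 2.0) * math.sqrt(p_hat * (1.0 - p_hat) / n)
    limits = []
    for q in (p_hat + eps_p, p_hat - eps_p):
        if not 0.0 < q < 1.0:
            limits.append(float("inf"))           # sample too small for this bound
            continue
        eps_z = norm.ppf(q) - z                                   # Eqs. 3.17a-b
        limits.append(-(x_a - mu_x) * eps_z / (z * (z + eps_z)))  # Eqs. 3.16a-b
    return limits

# Illustrative call: reference speed 100 km/h, mu_x from the algorithm,
# and 12 of 180 vehicles below the reference.
print(sigma_x_ci_limits(100.0, 4.716, 12 / 180, 180))
```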

The proposed methodology relies heavily on the availability of \({n}_{v}^{a}\). If \({n}_{v}^{a}\) is not reported to the traffic management center in the normal functioning of the system, the method cannot be applied. Obviously, modifying the controllers just to obtain these data makes no sense, as it would then be simpler to introduce other modifications in order to obtain \(\overline{{v }_{s}}\) directly. Nevertheless, in those countries where \({n}_{v}^{a}\) is available (a substantial number), using the estimated \(\overline{{v }_{s}}\) instead of \(\overline{{v }_{t}}\) (the current procedure) for later calculations would imply a higher level of accuracy without any re-coding.

4 Implementation of the Algorithm with Artificial Data

To first verify the proper functioning of the algorithm, it was tested successfully with data generated in Matlab and readjusted to fulfil the main hypotheses of the method, i.e., the stationarity of the traffic and the log-normality of the speed distribution in each time interval of aggregation \(T\), as well as the mathematical requirements detailed in Sect. 3.3. To meet these requirements, the reference values were set at 101 km/h and 110 km/h (90 and 98% of the total time mean speed), ensuring enough vehicles participating in the calculations. The steps followed and the results are shown in Table 3.1, whereas Fig. 3.2 compares them with the time means and the real space mean speeds.

Table 3.1 Estimation of the space mean speeds and comparison of the results obtained with the data provided by the loops and with the real values
Fig. 3.2
figure 2

Comparison of the real space mean speeds, the time mean speeds and the space mean speeds estimated with the algorithm from data that completely fulfil the initial conditions of the method

The estimated space mean speeds are much closer to the real space mean speeds than the time mean speeds that the loops provide. The error introduced by the latter is 2.17%, compared to 0.65% for the estimations of the algorithm. The validity of the algorithm has therefore been demonstrated in these ideal conditions.

The mean relative error was calculated taking into account the absolute values of the differences. In addition, regarding the estimated space means, only values with differences smaller than the maximum difference incurred by the loops were admitted. This procedure was also followed in Sect. 3.5 with real data. A synthetic interval of this kind can be generated along the following lines.
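
The sketch below reproduces the structure of this artificial test, assuming NumPy's log-normal generator and arbitrary parameters: it draws stationary log-normal speeds, computes the ground truth, and derives the four aggregated inputs that the algorithm of Sect. 3.3 would receive.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# One synthetic interval fulfilling the hypotheses: stationary traffic and
# log-normal speeds; mu_x and sigma_x are arbitrary illustrative values.
mu_x, sigma_x, n = np.log(112.0), 0.08, 200
v = rng.lognormal(mean=mu_x, sigma=sigma_x, size=n)

v_t = v.mean()                      # what the loop software would report
v_s_real = n / np.sum(1.0 / v)      # ground-truth space mean (Eq. 3.2)

# Reference speeds at roughly 90 and 98% of the time mean, as in the text.
v_a1, v_a2 = 0.90 * v_t, 0.98 * v_t
n_a1, n_a2 = int((v < v_a1).sum()), int((v < v_a2).sum())

# (v_t, n, n_a1, n_a2) are the only inputs the algorithm receives; its
# estimate can then be checked against v_s_real.
print(v_t, v_s_real, n_a1, n_a2)
```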

5 Implementation of the Algorithm with Real Data

The validity of the algorithm has been demonstrated in an ideal situation where all the initial conditions that were assumed when defining the method were met. However, it is also necessary to test it with different combinations of real data for which one or more of these conditions probably will not apply.

5.1 The Data

The data used for this study were collected during two days, March 31st and April 1st, 2014, in a section with double loops (P.K. 86 + 211, with two lanes in the direction toward A Coruña) of the AP-9 freeway, which runs north–south along the west coast of Galicia in Spain. The data were provided per lane and for aggregation time intervals \(T\) of 15 min. The fact that the data are a few years old has no special implication; in fact, the traffic control center in charge of this freeway still generates this type of information on a daily basis.

During the normal management of this freeway, the common data available were and are as follows:

  • Number of vehicles that pass over the loops (\(n\)).

  • Number of vehicles with lengths \(L\) shorter than 6 m, between 6 and 10 m or longer than 10 m.

  • Time mean speeds \(\overline{{v }_{t}}\): in an initial stage these speeds are averaged every 5 min, but then they are smoothed for time intervals of 15 min.

  • Number of vehicles (\({n}_{V}^{a}\)) that pass over the loops with speeds lower than 50 km/h (\({V}^{a1}\)) and 100 km/h (\({V}^{a2}\)), respectively.

For research purposes, however, the individual speeds and lengths were also provided on this occasion, thus allowing an analysis of the algorithm under a wide range of boundary conditions, as well as the comparison of the estimated space mean speeds with the real ones. The algorithm was executed with data obtained on different days, in different lanes (the left, for the fastest vehicles, and the right, for medium–low speed vehicles) and for all vehicles or only those whose lengths \(L\) were within a specified range. In addition, different time intervals of aggregation (\(T\), in minutes) and reference speeds (\({V}^{a1}\) and \({V}^{a2}\)) were used. \(N\) is the number of vehicles detected during the entire data acquisition period. Table 3.2 shows the cases that have been analyzed:

Table 3.2 Cases analyzed to test the algorithm

5.2 The Results

Table 3.3 shows the difference between using the time mean speeds provided by the loop detectors or the space mean speeds estimated with the algorithm as substitutes for real space mean speeds. This difference is shown as in Sect. 3.4, i.e., by determining the mean relative error in each case.

Table 3.3 Comparison between the errors derived from the use of time means and those of the algorithm

In 8 of the 11 cases analyzed (taking into account that case V was subdivided), the algorithm yields an improvement; however, in two cases the results were worse, and in another no reasonable value was obtained. This behavior was analyzed and understood; it is discussed in Sect. 3.5.4.

Note that in most cases it is not possible to determine the validity of the algorithm by focusing on only one of the boundary conditions; attention to the combination of all of them is required. Nevertheless, once all the conditions for the calculation have been established, performance can be improved by changing only one of them. As an example, between cases VI (Fig. 3.3) and VII (Fig. 3.4) only the reference speeds differ; however, the algorithm shows good performance only in the latter case. The underlying reason is that, in case VI, the sample includes fewer vehicles because most of them were driving at speeds higher than 50 km/h. Another example is based on cases IV (Fig. 3.5) and V (Fig. 3.6). Segregating the sample according to vehicle length improves the performance for light vehicles because the hypothesis of log-normality is better fulfilled. As for heavy vehicles, the algorithm in this specific example does not even run due to the small sample size of these vehicles. The influence of the length of the time interval of aggregation can be observed, for example, between cases II and IV (Figs. 3.7 and 3.5): the results of case IV, where \(T=5\) min, are much better.

Fig. 3.3
figure 3

Comparison of the real space mean speeds, the time mean speeds and the space mean speeds estimated with the algorithm in case VI

Fig. 3.4
figure 4

Comparison of the real space mean speeds, the time mean speeds and the space mean speeds estimated with the algorithm in case VII

Fig. 3.5
figure 5

Comparison of the real space mean speeds, the time mean speeds and the space mean speeds estimated with the algorithm in case IV

Fig. 3.6
figure 6

Comparison of the real space mean speeds, the time mean speeds and the space mean speeds estimated with the algorithm in case Va

Fig. 3.7
figure 7

Comparison of the real space mean speeds, the time mean speeds and the space mean speeds estimated with the algorithm in case II

5.3 Comparison Between the Proposed Algorithm and Other Methods

Because the proposed algorithm is somewhat more complicated than that introduced by Soriguera and Robusté (2011), a comparative analysis was performed to verify that it is worth using. In case I, for example, the proposed algorithm demonstrated good behavior, diminishing the error incurred by the use of time mean speeds by 0.58%. Figure 3.8 and Table 3.4 compare these results with those obtained with the methodology of Soriguera and Robusté (2011), which, as mentioned before, assumes normality and stationarity in each time interval of aggregation \(T\).

Fig. 3.8
figure 8

Comparison of the real space mean speeds, the time mean speeds and the space mean speeds estimated with the proposed algorithm and the algorithm of Soriguera and Robusté (2011) in case I

Table 3.4 Comparison of the errors introduced by different methodologies in case I

Although the dependence of Garber's formula on the boundary conditions is acknowledged, Table 3.4 also includes the results that would be obtained from its application, for comparison purposes only. The equation of Wardrop, as previously stated, is clearly useful only to calculate \(\overline{{v }_{t}}\) from \(\overline{{v }_{s}}\), which is not necessary in practical uses.

5.4 Discussion

Given the accuracy of the estimates achieved in each case, some conclusions can be drawn. The algorithm seems worth using in numerous situations because its results are usually more accurate than the time mean speeds currently accepted as substitutes. However, while it clearly performs better in some of these cases, it does not do so well in others. The analysis was carried out taking into account the following boundary conditions:

  • Sample size.

  • Log-normality of the speed distribution.

  • Speeds chosen as references.

  • Length of the time interval of aggregation.

  • Prevailing type of vehicles.

  • General traffic conditions.

  • Place, day and moment of data acquisition.

Regarding the sample size, the larger the sample, the better the algorithm performs. The main reasons are that the probability of the speeds following a log-normal distribution in each time interval of aggregation increases and that fewer mathematical inconsistencies appear during the calculations.

The log-normality of the speed distribution in each time interval of aggregation is one of the main hypotheses of the method and, therefore, it must be met. This can be more or less difficult depending on the conditions established for the calculations. For example, with low traffic densities, the behaviors of fast (e.g. cars) and slow (e.g. trucks, buses, vans) vehicles can be very different (Dey et al. 2006). If the estimation is made with samples from all lanes, bimodal or even multimodal distributions will probably appear; therefore, the analysis must be made by lane (Soriguera and Robusté 2011). However, with medium-high densities, log-normality could appear even for the whole section because the faster vehicles will not be able to reach their usual speeds. As previously mentioned, log-normality is more likely with large samples. To illustrate the importance of fulfilling this hypothesis, two time intervals of \(T=5\) min from case Va were chosen (between 7.40 and 7.45 a.m. and between 11.10 and 11.15 a.m.). The errors of estimation in these intervals were among the smallest (0.04% and 0.03%, respectively). The logarithms of the speeds were tested with the Kolmogorov–Smirnov (KS) test. Table 3.5 shows the results; the p-value in both cases was greater than 0.05, indicating normality of the logarithms and thus log-normality of the speeds. Figures 3.9 and 3.10 also roughly illustrate this trend, and a sketch of the test is given after the figures.

Table 3.5 KS test results for two time intervals with accurate estimates
Fig. 3.9
figure 9

Log-normal trend for time interval between 7.45 and 7.50 a.m

Fig. 3.10
figure 10

Log-normal trend for time interval between 11.15 and 11.20 a.m
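
The log-normality check anticipated above can be sketched as follows, assuming SciPy's `kstest`. Note that estimating the normal parameters from the same sample makes the nominal p-values somewhat optimistic; a Lilliefors-type correction would be stricter.

```python
import numpy as np
from scipy.stats import kstest

def log_normality_not_rejected(speeds, alpha=0.05):
    """KS check of the log-normality hypothesis for one interval: the logs
    of the speeds are tested against a normal distribution fitted with the
    sample's own mean and standard deviation."""
    x = np.log(np.asarray(speeds, dtype=float))
    stat, p_value = kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    return p_value > alpha, p_value   # True -> log-normality not rejected

# Illustrative check on arbitrary speeds (km/h) from one 5-min interval.
print(log_normality_not_rejected([96.0, 104.0, 88.0, 115.0, 99.0, 108.0, 92.0]))
```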

The choice of the reference speeds must be made logically, with the sole purpose of ensuring a sufficient number of vehicles in the sample. In the specific case of the AP-9 freeway, the values used were 50 and 100 km/h. As it is obviously uncommon for a vehicle to drive slower than 50 km/h on a freeway, some data will still be missed. Since the individual speeds were available, other values were chosen for some of the analyses, which led to better results. In this research, values of 90 and 98% of the average speed were chosen. In practice, these values could be based on (recent) historical data.

As for the lengths of the time intervals of aggregation, both long and short intervals show advantages and disadvantages. Short durations are more likely to comply with the other main hypothesis of the method, i.e., the stationarity of the traffic flow, and yield more accuracy in subsequent real-time calculations (for example, travel time calculations). In contrast, longer periods involve a greater sample size and a lower need for computing capacity because fewer iterations will be run each day.

Again, the prevailing type of vehicle is related to the convenience of making the estimations per lane or in a whole section to help to ensure the appearance of log-normal distributions. If possible, it is always advisable to work per lane and even to divide the vehicles into groups by their usual speeds, although this last step adds some extra effort. In case of working per lane, later estimates for the section can be obtained with equations such as Eq. 3.19, where the superscript \(i\) labels the lanes of the section (Soriguera and Robusté 2011):

$$\overline{{v }_{s}^{section}}= \frac{1}{\left[\frac{1}{\sum_{i}{n}^{i}}\right]*\sum_{i}\left({n}^{i}/{\overline{v} }_{s}^{i}\right)}.$$
(3.19)
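
Equation 3.19 is simply a vehicle-weighted harmonic mean of the per-lane space mean speeds, as the following sketch shows; the lane counts and speeds are illustrative.

```python
def section_space_mean(n_by_lane, v_s_by_lane):
    """Vehicle-weighted harmonic mean of per-lane space mean speeds (Eq. 3.19)."""
    return sum(n_by_lane) / sum(n / v for n, v in zip(n_by_lane, v_s_by_lane))

# Illustrative two-lane section: a faster left lane and a slower right lane.
print(section_space_mean([60, 120], [115.0, 96.0]))
```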

A preliminary analysis of the behavior of each type of vehicle should be done to avoid useless work. In this study, dividing the vehicles into the three size classes established by the Galician traffic management center generally provided the same results as classifying them into only two (the fast and the slow ones), or even worse results in some time intervals of aggregation lacking vehicles of specific groups in the sample.

Note that the hypothesis of stationarity for the traffic flow has conditioned most of the steps followed when deriving the algorithm and, thus, is essential to achieve a good performance. This stationarity is assumed for each time interval of aggregation, and it is quite likely to occur. Nevertheless, there will also be frequent occasions in which transients (shock waves, stop and go behavior, etc.) will be present, and, thus, in which the algorithm as it is will not provide accurate estimates and would need some complex changes. To detect these situations, some simple measures can be taken. One parameter that can help to detect the presence of transients is the coefficient of variation (\(CV\)) (Eq. 3.20):

$${CV}_{v}= \frac{{\sigma }_{v}}{\overline{v} },$$
(3.20)

where

\({CV}_{v}\) = speed coefficient of variation,

\({\sigma }_{v}\)= speed standard deviation,

\(\overline{v }\)= mean speed.

Theoretically, if stationary traffic is assumed, this parameter tends to increase as the mean speed does: although the mean appears in the denominator, the standard deviation grows even more as the mean increases. Moreover, the coefficient of variation indicates the importance of distinguishing time mean speeds from space mean speeds, based on the relationships established by Wardrop (1952) or Rakha and Zhang (2005), as Eq. 3.21 shows:

$$\overline{{v }_{t}}- \overline{{v }_{s}}= \frac{{\sigma }_{t}^{2}}{\overline{{v }_{t}}}= \frac{{\sigma }_{s}^{2}}{\overline{{v }_{s}}}=CV*\sigma ={CV}^{2}*\overline{v }.$$
(3.21)

The formula indicates that greater differences will occur with high \(CV\) values and high mean speeds. However, empirically, the greatest differences commonly appear with high \(CV\) values and low mean speeds, a supposedly incompatible pairing. This indicates that the traffic is not stationary (May 1990; Rakha and Zhang 2005; Soriguera and Robusté 2011). Figure 3.11 shows the relationship between the mean speed and the \(CV\) in case VI, in which the algorithm did not perform well. In this case the \(CV\) diminishes with the mean, indicating the presence of transients and thus explaining the poor functioning of the method. In case IX (Fig. 3.12), the trend agrees with the stationarity assumption and the algorithm provides good results. A simple screening based on this observation is sketched after the figures.

Fig. 3.11
figure 11

Mean speeds versus the coefficient of variation in case VI

Fig. 3.12
figure 12

Mean speeds versus the coefficient of variation in case IX
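
Such a screening could look as follows: a negative association between the interval mean speeds and their \(CV\) values suggests the presence of transients. This is a heuristic under the assumptions above, not a formal test, and the numbers are illustrative.

```python
import numpy as np

def transients_suspected(mean_speeds, cvs):
    """Flag possible non-stationarity: under stationarity the CV tends to
    grow with the mean speed, so a negative correlation between the two
    series points to transients (heuristic sketch)."""
    corr = np.corrcoef(mean_speeds, cvs)[0, 1]
    return corr < 0.0, corr

# Illustrative series of interval mean speeds (km/h) and their CVs.
print(transients_suspected([112, 108, 95, 70, 55], [0.06, 0.07, 0.09, 0.14, 0.20]))
```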

Although similar trends are usually obtained by directly comparing average speeds with the difference between time and space means (Figs. 3.13 and 3.14), not taking into account the variance of the speeds could give an exaggerated impression of the magnitude of the relationship. The use of the \(CV\) is strongly advised.

Fig. 3.13
figure 13

Average speeds versus the difference between time mean speed and space mean speed in case VI

Fig. 3.14
figure 14

Average speeds versus the difference between time mean speed and space mean speed in case IX

Finally, the place, day and moment when the data are collected is related to some of the issues previously mentioned. For example, the number and type of vehicles that drive on a freeway toward a capital on a workday morning in March will be very different from that on an August Sunday on a secondary road surrounding a small town. Therefore, speeds and traffic conditions will also be very different.

6 In Search of Other Relationships Between Mean Speeds

As explained, the algorithm proposed in Sect. 3.3 draws on the premise that the formula derived by Rakha and Zhang (2005) is the one that best defines the relationship between time mean speeds and space mean speeds under different boundary conditions. Several researchers (e.g. Soriguera and Robusté 2011) reached the same conclusion, and this chapter has demonstrated the goodness of this formula compared, for example, to Garber's. However, the author wanted to check whether a formula yielding better results for the real case study analyzed in Sect. 3.5 could be found, even assuming a priori that it could not be extrapolated. As the reference speeds played no role in this analysis, a different and more concise nomenclature was defined (Table 3.6).

Table 3.6 Cases analyzed to verify the best relationship between the time mean speeds and the space mean speeds

As explained in Sect. 3.5.1, individual spot speed data were in this case available. This allowed the calculation of the exact time mean speeds (arithmetic means) and space mean speeds (harmonic means). Then, space mean speeds were estimated from time means by using Garber’s and Rakha and Zhang’s relations. The mean absolute and mean relative errors in relation to the real space mean for each case were also calculated.

In addition, an attempt was made to find another kind of correlation between both means. In particular, the possibility of a linear, quadratic, cubic, logarithmic, inverse, exponential or power-type relationship was analyzed (Table 3.7).

Table 3.7 Tested correlations between space and time mean speeds

Both the corrected coefficient of determination, \({R}_{c}^{2},\) and the p-value were determined for this purpose. As is well known, \({R}_{c}^{2}\) is a downward correction of \({R}^{2}\) based on the sample size \(n\) and on the number of independent variables \(k^{\prime}\), as shown in Eq. 3.22 below:

$$R_{c}^{2} \; = \;R^{2} \; - \;\left[ {\frac{{k^{\prime}*(1 - R^{2} )}}{{(n\; - \;k^{\prime}\; - \;1)}}} \right].$$
(3.22)

The p-value is related to the significance test of the regression (ANOVA). In this case, the null hypothesis is that \({R}^{2}\) equals zero. If the significance (p-value) of the statistical F-test is lower than 5% (for a confidence level of 95%), the null hypothesis can be rejected and, therefore, the existence of a correlation is established.
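
The diagnostics just described can be sketched as follows for the quadratic case, using NumPy's polynomial fit; the function and its inputs are illustrative.

```python
import numpy as np
from scipy import stats

def quadratic_fit_diagnostics(v_t, v_s):
    """Fit v_s = a*v_t^2 + b*v_t + c and report the corrected R^2 (Eq. 3.22)
    and the p-value of the regression F-test (ANOVA)."""
    v_t, v_s = np.asarray(v_t, float), np.asarray(v_s, float)
    n, k = len(v_s), 2                               # k independent terms
    coeffs = np.polyfit(v_t, v_s, deg=2)             # a, b, c
    resid = v_s - np.polyval(coeffs, v_t)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((v_s - v_s.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    r2_c = r2 - k * (1.0 - r2) / (n - k - 1)         # Eq. 3.22
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    p_value = stats.f.sf(f_stat, k, n - k - 1)       # significance of the F-test
    return coeffs, r2_c, p_value
```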

In each of the cases studied, the estimated space mean speeds and the errors for the most suitable correlation were calculated. In this way, the best relationship, both in general and for each particular case, was determined. In order to remove possible outliers, a slight smoothing was also applied.

It should be highlighted that the variance with regard to the time mean for each specific time interval of aggregation introduced into Rakha and Zhang's equation was again calculated from individual spot speeds, which, as noted, are not usually available. It is also important to note that the data used in this study fit different types of distributions depending on the time interval of aggregation, with log-normal and normal distributions being the most common, as expected.

Table 3.8 shows the results of the curvilinear estimation. The corrected coefficient of determination indicates that the quadratic correlation is the most suitable in most cases. The coefficients of the quadratic correlation for each analysis are included in Table 3.9, and new estimates of space mean speeds were calculated with these values. A level of significance was computed for each coefficient, with the null hypothesis being that the coefficient equals zero. As shown in Table 3.9, most coefficients are significant (p-value < 0.05); that is, they are needed to establish a good correlation. A linear relationship could achieve the same results in only two cases (those in which coefficient \(a\) is non-significant).

Table 3.8 Corrected coefficient of determination and significance for each correlation
Table 3.9 Coefficients and their significance for quadratic relationships
Table 3.10 Errors observed with the different estimations of space mean speeds from time mean speeds

Finally, the mean absolute and mean relative errors with respect to the real space mean speeds (i.e., those calculated from individual speeds) obtained with the formula of Rakha and Zhang, with that of Garber and with the quadratic correlation were compared. The results are included in Table 3.10. It can be observed that the relationship of Rakha and Zhang, despite being the most complex in practice because of the need to estimate the variance with regard to the time mean, is worth considering: both the absolute and the relative errors are the lowest in all the cases analyzed in this study. Therefore, its appropriateness as part of the algorithm presented in Sect. 3.3 has been demonstrated once again.

7 Conclusions and Further Research

The development of road networks and new technologies has proven useful in responding to the increasing demands of society regarding the control of traffic evolution. Nevertheless, fundamental traffic theory must be correctly incorporated into modern methodologies in order to obtain accurate results. This chapter introduces an algorithm that estimates space mean speeds in a specific time interval of aggregation as a first step, for example, for the calculation of travel times or occupancies. After analyzing the results obtained, three main conclusions can be drawn:

  • It is possible to improve the current procedure followed by most traffic management centers, i.e., considering time means equal to space means. It can be done inexpensively by exploiting all the data delivered by loop detectors. Specifically, the proposed algorithm allows an estimation of space mean speed values that are accurate in most cases, or, at least, much closer to the real values than time mean speeds. Consequently, the use of these data also improves the results of subsequent calculations.

  • The good performance of the algorithm depends on the fulfilment of its initial hypotheses, i.e., stationarity of the traffic stream and log-normality of speeds in each time interval of aggregation. The boundary conditions for data acquisition and for the calculations can be established to a certain extent in order to achieve these characteristics.

  • In case of transients, for example the formation or dissipation of shock waves, most of the steps followed to design the algorithm are not valid (starting from the extrapolation of the spot speeds to a section). Thus, other specific methodologies should be used. Data fusion appears promising in this respect, as well as other completely different approaches that try to explain the propagation of traffic oscillation by means of car-following models (Li et al. 2014).

Further research can be carried out to improve the accuracy of the results or to enlarge the sphere of application of the proposed algorithm. Some lines could be:

  • Including a smoothing process to remove erroneous data derived from the tendency of traffic loops to drift.

  • Including in the algorithm the steps necessary to calculate the confidence interval for the means in order to be able to choose the most accurate when more than one value is obtained.

  • Designing other algorithms adapted to other common speed distributions in addition to that introduced in this chapter and that in Soriguera and Robusté (2011). Thus, after the application of a prior step that may help to find the most suitable distribution for the speeds, the appropriate algorithm could be chosen in each case.

As noted, it is necessary to develop different and more evolved methodologies to estimate space mean speeds in case of transients. Loop data are probably insufficient in these situations. Other researchers have achieved good results with various techniques of data fusion (Soriguera and Robusté 2011; Bachmann et al. 2013; Yuan et al. 2014). However, there is still much work to do, since it is difficult to put most of them into practice because of their complexities and/or high costs. Of course, the same issues arise when thinking of data-driven approaches.

In view of the results, usual spot speed methods enhanced by the proposed algorithm would be satisfactory for estimating travel times in stable traffic conditions. Their combination with more elaborate methodologies that rely only partially on loop data would allow making the most of these widespread detectors on other occasions, for example, at present when congestion exists, or even in future driving environments.