1 Dynamic Traffic Assignment Models

Most real-time traffic management systems are based on conceptual architectures embedding in their core engines dynamic traffic models, usually a Dynamic Traffic Assignment (DTA) or Dynamic User Equilibrium (DUE) model. These models are aimed at providing, among other outputs, predictions of traffic flows and travel times on road networks, which vary over time because of various factors. One of these factors is particularly relevant: the time variation of the demand. Traffic assignment accounting for these time dependencies is referred to as DTA. When the predicted flows are such that no user can unilaterally reduce their travel time, the resulting assignment is said to be a DUE. In any case, their main input is an OD-matrix, that is, the matrix representing the time dependencies of the demand (e.g., Barceló et al. 2004; Allström et al. 2017). Dynamic Traffic Models (DTM), either DTA or DUE, are the key tool to estimate traffic states and understand traffic patterns and, as already mentioned, to provide predictive information consistent with the conditions that drivers will experience in the network, thus accounting for traffic evolution. Both functionalities become more relevant in the case of complex urban networks. This is explained in detail in Ben-Akiva et al. (2010), which describes the approaches on which DynaMIT is based as well as its objectives. Descriptions of other similar systems can be found in Barceló et al. (2007), Hegyi et al. (2009), Meschini (2017), and Aimsun (2020). The last two references illustrate these approaches through their implementation in two professional systems used worldwide: OPTIMA and Aimsun. The role of DTM becomes even more critical in recent real-time traffic management systems like the Active Transportation and Demand Management (ATDM) and the Dynamic Mobility Applications (DMA), two programs of the United States Department of Transportation (USDOT) (Mahmassani et al. 2017).

The DTA problem can be considered an extension of the well-known Static Traffic Assignment (STA) problem, widely used in transport planning. The dynamic version must be able to determine how link and path flows evolve with time in the traffic network because of a time-dependent demand defined in terms of a time-varying OD-matrix. In other words, the dynamic approach to traffic assignment must describe how traffic flow patterns evolve in time and space on the network (Mahmassani 2001). It must subsequently provide estimates of the link and path travel times and of their short-term expected evolution. These are the main inputs to derive the KPIs that lead to specific traffic management policies, such as those concerning information to travelers, alternative dynamic re-routing, etc.

From this standpoint, the DUE problem can be defined as the dynamic version of Wardrop’s Principle (Wardrop 1952; Friesz et al. 1993; Smith 1993; Ran and Boyce 1996): “If, for each OD pair at each instant of time, the actual travel times experienced by travelers departing at the same time are equal and minimal, the dynamic traffic flow over the network is in a travel-time-based dynamic user equilibrium state”. In other words, the DUE formulation stipulates that the experienced travel cost, including travel time and early/late arrival penalties, is identical for those route and departure time choices selected by travelers between a given OD pair. There have been several attempts to translate this formulation into a suitable model.

In a recent paper, Han et al. (2019) review the various formulations of the models and the associated algorithms used to compute DUE, starting from the seminal proposal of Friesz et al. (1993), which formulates it as an open-loop, non-atomic Nash-like game. “Open-loop” means in this context that the selection of routes by the travelers after leaving the origin does not vary in response to changes in the dynamic network conditions. For its part, the term “non-atomic” implies the assumption of techniques based on aggregated traffic flow dynamics instead of techniques based on individual vehicle dynamics. This hypothesis ensures that DUE suitably accounts for two main aspects of travel behavior: the departure time choice and the route choice. Therefore, the modeling hypothesis implies that travel times are identical for all trips departing at the same time interval using the same route. Following Han et al. (2019), two main components of DUE modeling approaches can be highlighted:

  • The mathematical expression of the equilibrium condition.

  • The network performance model, which mimics flow propagation through the network. This is usually referred to as Dynamic Network Loading (DNL).

DTA/DUE models have been the subject of intensive research and development, both from the theoretical point of view and as key components of most software platforms used for the practical implementation of traffic management systems. Consequently, as Han et al. (2019) report, the concept of dynamic equilibrium has been implemented in various ways, as, for example, variational inequalities, nonlinear complementarity problems, differential variational inequalities, etc. In this Chapter, we limit our discussion to the formulation in terms of variational inequalities (Friesz et al. 2013; Smith and Wisten 1995), which is the most frequent in practical implementations. It is based on the mathematical model (Eqs. 7.1 and 7.2) proposed by Wu (1998):

$$\left[{tt}_{rsp}\left(t\right)-{\theta }_{rs}(t)\right]*{x}_{rsp}\left(t\right)=0\quad \forall p\in {K}_{rs}\left(t\right), \forall \left(r,s\right)\in I, t\in \left[0,T\right]$$
(7.1)
$${s.t.\;\; tt}_{rsp}\left(t\right)-{\theta }_{rs}(t)\ge 0\quad \forall p\in {K}_{rs}\left(t\right), \forall \left(r,s\right)\in I, t\in \left[0,T\right]$$
$${tt}_{rsp}\left(t\right), {\theta }_{rs}\left(t\right), {x}_{rsp}\left(t\right)>0\quad \forall p\in {K}_{rs}\left(t\right), \forall \left(r,s\right)\in I, t\in \left[0,T\right]$$

and the flow balancing equations:

$$\sum_{\forall p\in {K}_{rs}\left(t\right)}{x}_{rsp}\left(t\right)={X}_{rs}\left(t\right)\,\, \forall \left(r,s\right)\in I, t\in \left[0,T\right]$$
(7.2)

where \({x}_{rsp}\left(t\right)\) is the flow on path \(p\) departing from origin \(r\) to destination \(s\), \({tt}_{rsp}\left(t\right)\) is the actual path cost from \(r\) to \(s\) on route \(p\), \({\theta }_{rs}\left(t\right)\) is the cost of the shortest path from \(r\) to \(s\), \({K}_{rs}\left(t\right)\) is the set of all available paths from \(r\) to \(s\) and \({X}_{rs}\left(t\right)\) is the demand (number of trips) from \(r\) to \(s\), all of them at time interval \(t\). For their part, \(I\) is the set of all origin–destination pairs \((r,s)\) in the network and \(T\) the overall time period considered. It can be demonstrated that this is equivalent to solving a finite-dimensional variational inequality problem consisting of finding a vector \({\varvec{x}}^{*}\) of path flows and a vector \(\tau\) of path travel times, such that

$${\left[x-{x}^{*}\right]}^{T}*\tau \ge 0, \forall x\in \aleph$$
(7.3)

where \(\aleph\) is the set of feasible flows defined by

$$\aleph =\left\{{x}_{rsp}\left(t\right)\left|\sum_{\forall p\in {K}_{rs}\left(t\right)}{x}_{rsp}\left(t\right)={X}_{rs}\left(t\right) \forall \left(r,s\right)\in I, t\in \left[0,T\right]\right., {x}_{rsp}\left(t\right)>0\right\}$$
(7.4)

Wu et al. (1991, 1998a, b) prove that this is equivalent to solving the discretized variational inequality:

$$\sum_{t\in \left[0,T\right]}\sum_{p\in \mathfrak{R}}{tt}_{rsp}\left(t\right)*\left[{x}_{rsp}\left(t\right)-{x}_{rsp}^{*}\left(t\right)\right]\ge 0$$
(7.5)

where \(\mathfrak{R}=\bigcup_{\left(r,s\right)\in I}{K}_{rs}\) is the set of all available paths from origins to destinations.

Reviews of DTA models can be found in Boyce et al. (2001), Peeta and Ziliaskopoulos (2001), Szeto and Lo (2005), Szeto and Wong (2012), Jeihani (2007), and Bliemer et al. (2017).

Algorithms to deal with DTA or DUE problems usually involve solving this variational inequality formulation. A wide variety of algorithms has been proposed: from projection algorithms (Wu et al. 1991, 1998a, b; Florian et al. 2001) or methods of alternating directions (Lo and Szeto 2002) to various versions of the Method of Successive Averages (MSA) (Tong and Wong 2000; Florian et al. 2002; Mahut et al. 2003a, b, 2004; Varia and Dhingra 2004).

The computational approaches proposed to solve the DTA problem can be broadly classified into two classes: mathematical formulations, looking for analytical solutions, and traffic simulation-based approaches, looking for approximate heuristic solutions. Both fit the conceptual framework proposed by Florian et al. (2001, 2002), formalizing the relationships and dependencies between the two main components identified (Fig. 7.1):

Fig. 7.1 Conceptual algorithmic scheme for DTA

  • A method to determine the path-dependent flow rates on the paths of the network, usually applying one of the approaches mentioned above (MSA, projection methods, etc.).

  • A DNL method, which determines how these path flows give rise to time-dependent arc volumes, arc travel times, and path travel times.

Quite frequently, and in practically all implementations mentioned above, the DNL method is based on a mesoscopic simulation model (Barceló 2010a) emulating the flow propagation through the network under the current conditions. Depending on how the convergence criterion and the iterative process are implemented, the resulting assignment is a DTA or a DUE (see Chiu et al. 2011 for more details).

1.1 Determining the Path-Dependent Flow Rates by MSA: Convergence Criterion to Equilibrium

If the convergence criteria are not met after one particular iteration of the conceptual algorithmic scheme in Fig. 7.1, a new iteration is performed. In this new iteration, after computing the new potential paths once the link costs have been updated, the key point is the determination of how the demand will be split among these paths, producing the corresponding path flows. Carey and Ge (2012) or Han et al. (2019) provide a comprehensive overview of the many computational alternatives.

To illustrate these concepts, in this chapter we address the MSA method, one of the methods most frequently used in practice to estimate the path-dependent flow rates solving Eq. 7.5. MSA redistributes the flows among the available paths in an iterative procedure that, at any iteration \(n\), computes a new shortest path, \({c}_{rs}(t)\), from origin \(r\) to destination \(s\) at time interval \(t\). Depending on whether \({c}_{rs}\left(t\right)\) already belongs to the current path set, the path flow update process is as follows:

If \({c}_{rs}\left(t\right)\notin {K}_{rs}^{n}\left(t\right)\)

$${x}_{rsp}^{n+1}\left(t\right)=\left\{\begin{array}{ll}{\alpha }_{n}*{x}_{rsp}^{n}\left(t\right) & if\,\, p\in {K}_{rs}^{n}\left(t\right)\\ \left(1-{\alpha }_{n}\right)*{X}_{rs}\left(t\right) & if\,\, p={c}_{rs}\left(t\right)\end{array}\right.\quad \forall r,s,t$$
(7.6a)
$${K}_{rs}^{n+1}\left(t\right)={K}_{rs}^{n}\left(t\right)\cup {c}_{rs}\left(t\right)$$
(7.6b)

Otherwise if \({c}_{rs}\left(t\right)\in {K}_{rs}^{n}\left(t\right)\)

$${x}_{rsp}^{n+1}\left(t\right)=\left\{\begin{array}{ll}{\alpha }_{n}*{x}_{rsp}^{n}\left(t\right) & if\,\, p\ne {c}_{rs}\left(t\right)\\ {\alpha }_{n}*{x}_{rsp}^{n}\left(t\right)+\left(1-{\alpha }_{n}\right)*{X}_{rs}\left(t\right) & if\,\, p={c}_{rs}\left(t\right)\end{array}\right.\quad \forall r,s,t$$
(7.7a)
$${K}_{rs}^{n+1}\left(t\right)={K}_{rs}^{n}\left(t\right)$$
(7.7b)
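To make these update rules concrete, the following minimal Python sketch applies Eqs. 7.6 and 7.7 to the flows of a single OD pair and departure interval, using the typical weighting coefficient \({\alpha }_{n}=\frac{n}{n+1}\) discussed below. The data structure (a dictionary of path flows) and the function name are illustrative assumptions, not taken from any of the cited implementations.

```python
def msa_update(path_flows, shortest_path, demand, n):
    """One MSA step (Eqs. 7.6-7.7) for a single OD pair and time interval.

    path_flows    : dict mapping path id -> current flow x^n (defines K^n)
    shortest_path : id of the new shortest path c_rs(t) at iteration n
    demand        : total demand X_rs(t) for this OD pair and interval
    n             : iteration counter (n >= 1)
    Returns the updated dict of path flows x^{n+1}.
    """
    alpha = n / (n + 1)  # the most typical choice of weighting coefficient
    updated = {p: alpha * x for p, x in path_flows.items()}  # scale existing paths
    if shortest_path not in path_flows:
        # Eq. 7.6: a new path enters the set and receives (1 - alpha) * demand
        updated[shortest_path] = (1 - alpha) * demand
    else:
        # Eq. 7.7: the shortest path already belongs to the set; it keeps its
        # scaled flow and additionally receives (1 - alpha) * demand
        updated[shortest_path] += (1 - alpha) * demand
    return updated


# Illustrative use: demand 100, two known paths, a new shortest path "p3"
flows = {"p1": 60.0, "p2": 40.0}
flows = msa_update(flows, "p3", demand=100.0, n=2)
print(flows)  # {'p1': 40.0, 'p2': ~26.67, 'p3': ~33.33}; total demand conserved
```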

Depending on the values of the weighting coefficients \({\alpha }_{n}\), different MSA schemes can be implemented (Carey and Ge 2012), the most typical choice probably being \({\alpha }_{n}=\frac{n}{n+1}\). Many variants have been suggested. For example, Varia and Dhingra (2004) propose a modified MSA algorithm in which the weighting coefficient incorporates a variable step length that depends on the current path travel times (Eq. 7.8):

$${\alpha }_{n}=\frac{{\lambda }_{k}*\left[exp\left(-{tt}_{rsp}\left(t\right)\right)\right]}{\left(n+1\right)*\sum_{p}\left[exp\left(-{tt}_{rsp}\left(t\right)\right)\right]}$$
(7.8)

One of the potential computational drawbacks of these implementations of MSA is the growing number of paths when dealing with large networks. To avoid this in the case of DTA assignments, an alternative is to specify the maximum number \(K\) of paths to keep for each OD pair. Several modified implementations have been suggested to control the number of paths in MSA algorithms (Peeta and Mahmassani 1995; Sbayti et al. 2007). Interesting proposals are those in Mahut et al. (2003a, b, 2004). Possibly, one of the most computationally efficient is the one proposed by Florian et al. (2002). This variant of the algorithm initializes the process based on an incremental loading scheme that distributes the demand among the available shortest paths. The process is repeated for a predetermined number of iterations, after which no new paths are added and the corresponding fraction of the demand is redistributed according to the MSA scheme. This modified MSA works as follows:

Let K be the maximum number of iterations to compute new paths.

If \(n \le K\)

a new shortest path \({c}_{rs}\left(t\right)\notin {K}_{rs}^{n}\left(t\right)\) is found. Then,

$${x}_{rsp}^{n+1}\left(t\right)=\frac{1}{n+1}*{X}_{rs}\left(t\right)\;\; \forall p\in {K}_{rs}^{n}\left(t\right), \forall \left(r,s\right)\in I, t\in \left[0,T\right]$$
(7.9a)
$${K}_{rs}^{n+1}\left(t\right)={K}_{rs}^{n}\left(t\right)\cup {c}_{rs}\left(t\right)$$
(7.9b)

If \(n > K\)

the new shortest path is computed among the existing paths \({c}_{rs}\left(t\right)\in {K}_{rs}^{n}\left(t\right)\). Then, the set \({K}_{rs}^{n}\left(t\right)\) does not change, \({K}_{rs}^{n+1}\left(t\right)={K}_{rs}^{n}\left(t\right)\), and

$${x}_{rsp}^{n+1}\left(t\right)=\left\{\begin{array}{ll}\frac{n}{n+1}*{x}_{rsp}^{n}\left(t\right) & if\,\, p\ne {c}_{rs}\left(t\right)\\ \frac{n}{n+1}*{x}_{rsp}^{n}\left(t\right)+\frac{1}{n+1}*{X}_{rs}\left(t\right) & if\,\, p={c}_{rs}\left(t\right)\end{array}\right.\quad \forall p\in {K}_{rs}^{n}\left(t\right), \forall \left(r,s\right)\in I, t\in \left[0,T\right]$$
(7.10)

Note that shortest paths may repeat from one iteration to the next in order to keep at most \(K\) different shortest paths; a proper implementation of the algorithm therefore requires the iteration counter \(n\) to be defined for each OD pair and time interval.
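A minimal sketch of this bounded-path-set variant is given below; it assumes, consistently with Eq. 7.9a, that during the incremental loading phase the demand is split equally among the paths of the enlarged set, and it freezes the set afterwards. Names and data structures are again illustrative.

```python
def modified_msa_update(path_flows, shortest_path, demand, n, K):
    """One step of the modified MSA with a bounded path set (Eqs. 7.9-7.10).

    For n <= K new shortest paths may still be added and the demand is
    redistributed equally (incremental loading, Eq. 7.9); for n > K the
    path set is frozen and the usual MSA averaging applies (Eq. 7.10).
    """
    if n <= K:
        # Eq. 7.9: the new path joins the set and every path receives
        # an equal share of the demand
        paths = set(path_flows) | {shortest_path}
        return {p: demand / len(paths) for p in paths}
    # Eq. 7.10: the shortest path is computed among the existing paths
    assert shortest_path in path_flows, "for n > K the path set is frozen"
    updated = {p: n / (n + 1) * x for p, x in path_flows.items()}
    updated[shortest_path] += demand / (n + 1)
    return updated
```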

All the approaches for DUE based on simulation procedures for the network loading process are, therefore, heuristic in nature. Thus, no formal proof of convergence can be provided. However, a convergence criterion is necessary. In this context, a way to empirically determine whether the solution reached can be interpreted in terms of a DUE, in the mentioned sense that “the actual travel times experienced by travelers departing at the same time are equal and minimal”, can be based on an ad hoc version of the Relative Gap Function proposed by Janson (1991):

$$Rgap\left(n\right)=\frac{\sum_{t}\sum_{\left(r,s\right)\in I}\sum_{p\in {K}_{rs}\left(t\right)}{x}_{rsp}^{n}\left(t\right)*\left[{tt}_{rsp}^{n}\left(t\right)-{\theta }_{rs}^{n}\left(t\right)\right]}{\sum_{t}\sum_{\left(r,s\right)\in I}{X}_{rs}\left(t\right)*{\theta }_{rs}^{n}\left(t\right)}$$
(7.11)

where \({x}_{rsp}^{n}\left(t\right)\) is the flow on path \(p\) from \(r\) to \(s\) departing at time \(t\) at iteration \(n\). The difference \({tt}_{rsp}^{n}\left(t\right)-{\theta }_{rs}^{n}\left(t\right)\) measures the excess cost experienced because of using a path of cost \({tt}_{rsp}^{n}\left(t\right)\) instead of the shortest path, with cost \({\theta }_{rs}^{n}\left(t\right)\), at iteration \(n\). The ratio measures the total excess cost with respect to the total minimum cost that would result had all travelers used the shortest paths.
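The following short Python function sketches the computation of Eq. 7.11; the nested-dictionary layout, in which each index \(i\) stands for one \((r,s,t)\) triple, is an illustrative assumption.

```python
def relative_gap(path_flows, path_times, shortest_times, demand):
    """Relative gap of Eq. 7.11 at a given iteration.

    path_flows[i][p]  : flow on path p of OD pair/interval i
    path_times[i][p]  : experienced travel time on that path
    shortest_times[i] : shortest-path travel time theta for i
    demand[i]         : demand X for i
    """
    excess = sum(path_flows[i][p] * (path_times[i][p] - shortest_times[i])
                 for i in path_flows for p in path_flows[i])
    minimum = sum(demand[i] * shortest_times[i] for i in demand)
    return excess / minimum


# Illustrative use: one OD pair/interval, two paths
flows = {0: {"p1": 60.0, "p2": 40.0}}
times = {0: {"p1": 10.0, "p2": 12.0}}
print(relative_gap(flows, times, {0: 10.0}, {0: 100.0}))  # 0.08
```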

1.2 Dynamic Network Loading

Once the path flows have been estimated, the next step in the DTA determines how these flows propagate across the network along the assigned paths. Thus, it yields travel times as a function of flows, accounting for their temporal profiles (Xu et al. 1999). The procedures to achieve this goal are precisely the DNL methods, which have been, and still are, a fertile research domain. In fact, a wide variety of DNL methods has already been proposed; Carey and Ge (2012) or Han et al. (2019) provide comprehensive overviews of them. Some of these methods, for example, those in Friesz et al. (1993), Wu et al. (1998b), or Xu et al. (1999), assume travel time functions of the form \({tt}_{ij}\left({x}_{ij}^{t}\right)={f}_{ij}\left({x}_{ij}^{t}\right)\), where \({f}_{ij}\left({x}_{ij}^{t}\right)\) is the travel time function for link \((ij)\) that provides the travel time \({tt}_{ij}\) to traverse the link as a function of \({x}_{ij}^{t}\), i.e., the flow in link \((ij)\) at time \(t.\)

However, most of the DNL methods currently used, both in research and in professional practice, are based on a mesoscopic modeling of traffic flow dynamics. This is a simplification that, while capturing the essentials of the dynamics, is less data demanding and computationally more efficient than microscopic models, which emulate the dynamics of traffic flows from the detailed dynamics of each vehicle. Mesoscopic approaches sometimes combine microscopic aspects in a simplified way (basically, they can deal with individual vehicles) with macroscopic aspects (e.g., those directly concerning the flow dynamics). There are two main approaches to mesoscopic traffic simulation. First, those in which individual vehicles are not tracked separately but grouped into packets or multivehicle platoons that move along the links. This is, for example, the case in CONTRAM (Leonard et al. 1989). Second, those in which flow dynamics are determined by simplified dynamics of individual vehicles. DYNASMART (Jayakrishnan et al. 1994), DynaMIT (Ben-Akiva et al. 1997, 2001, 2002, 2010), Dynameq (Mahut 2000; Florian et al. 2001, 2002; Mahut et al. 2003a, b, 2004; Mahut and Florian 2010), MEZZO (Burghout 2004; Burghout et al. 2005), or Aimsun (Casas et al. 2010) are well-known examples.

From a methodological point of view, the distinguishing feature of a mesoscopic simulation approach lies in the way it deals with time. The most common approaches are based on synchronous timing, that is, time-oriented simulations in which time in the model progresses according to an appropriately chosen time unit \(\Delta t\), also known as the simulation step. This is the case of DYNASMART and DynaMIT. Other approaches are asynchronous or event-based; that is, the state of the model changes when some events occur, and thus time advances in variable amounts. Dynameq and MEZZO are examples of event-based mesoscopic traffic simulators.

One of the main phenomena determining the time evolution of traffic flows across the network is the formation of vehicle queues and their backward propagation (or spillback). As finite-difference approximations to the fluid-flow models based on the theory of kinematic waves (LWR; Lighthill and Whitham 1955; Richards 1956) satisfactorily reproduce these dynamics, it has been quite natural to use them to develop DNL models. One of the first was the Cell Transmission Model (CTM) proposed by Daganzo (1994, 1995a), which has been extensively used by other authors (e.g., Lo and Szeto 2002; Szeto and Lo 2004). This model assumes a triangular or trapezoidal flow-density function. Daganzo (1995b) developed a second model similar to the CTM, in this case a Finite-Difference Approximation Method (FDAM), which assumes a general nonlinear flow-density function. This FDAM can be used for network loading in the same way as the Cell Transmission Model for networks in Daganzo (1995a). These basic models exhibit limitations, namely in the case of urban networks, since they only account for flow dynamics in links. This means that they do not explicitly deal with intersections and, more particularly, with signalized intersections, quite usual in urban networks. In this context, Bellei et al. (2005) propose a DUE approach that is an extension of the CTM. This approach, described theoretically in detail in Gentile et al. (2007), is the basis for the General Link Transmission Model (GLTM), which can deal with any concave fundamental diagram and node topology. The road network is modeled in terms of an oriented graph \(G=\left(N,A\right)\), where \(N\) is the set of nodes, each one representing an intersection, and \(A\) is the set of links connecting them. The forward and backward stars of each node identify the sets of links converging to or diverging from it. The GLTM link model provides the main input to the node model in terms of the incoming flows. The output from the node model is the outflows, which constitute the main input for the link model (Gentile et al. 2010, 2015).
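To give a concrete flavor of how a DNL scheme of this family propagates flow, the following Python fragment sketches one update step of a single-link CTM with a triangular fundamental diagram (Daganzo 1994); all parameter values, the cell discretization, and the function name are illustrative assumptions rather than the formulation of any of the cited tools, and node models are deliberately left out.

```python
import numpy as np

# Illustrative triangular fundamental diagram (assumed values)
V_F   = 15.0   # free-flow speed [m/s]
W     = 5.0    # backward wave speed [m/s]
K_JAM = 0.15   # jam density [veh/m]
Q_MAX = 0.5    # capacity [veh/s]
DX, DT = 100.0, 5.0   # cell length [m] and time step [s]; DT <= DX / V_F (CFL)

def ctm_step(k, inflow=0.0):
    """Advance cell densities k one time step (single link, no source term).

    Sending/receiving functions of the triangular diagram:
    S_i = min(v_f * k_i, q_max),  R_i = min(w * (k_jam - k_i), q_max).
    """
    sending = np.minimum(V_F * k, Q_MAX)             # what each cell can send
    receiving = np.minimum(W * (K_JAM - k), Q_MAX)   # what each cell can take
    # Flow across each internal boundary = min(upstream sending, downstream receiving)
    q = np.minimum(sending[:-1], receiving[1:])
    q = np.concatenate(([min(inflow, receiving[0])], q, [sending[-1]]))
    # Conservation: finite-difference form of the continuity equation (Eq. 7.12)
    return k + (DT / DX) * (q[:-1] - q[1:])

k = np.zeros(10)                      # empty 1 km link split into ten 100 m cells
for _ in range(20):
    k = ctm_step(k, inflow=0.4)       # constant entry demand of 0.4 veh/s
print(k.round(3))                     # density profile after 100 s
```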

This modeling approach has also been used in many other developments that model the link, explicitly or implicitly, by splitting it into two parts: the running part and the queueing part (Fig. 7.2). The running part is the one in which vehicles are not yet delayed by the queue spillback at the downstream node, where capacity is limited by stop or give-way signs, or by traffic lights.

Fig. 7.2 Link model

Nodes are modeled according to the interactions between traffic flows at intersections, as node transfer modules, or according to a queue server approach, explicitly accounting for traffic lights and the delays that they cause (Mahmassani et al. 1994). In this case, a simplified car-following model compatible with the macroscopic speed-density relationship on the link approximates the individual vehicle dynamics in the running part. This speed is used to estimate the earliest time at which the target vehicle could exit the link, unless it is affected by the queue spillback when reaching the border between the running part and the queueing part. Vehicle dynamics are then ruled by the queue discharging process. The boundary between the running part and the queueing part is dynamic, according to the queue spillback and queue discharge processes.

Various solutions have been proposed for simulating flow dynamics in the link running part in a simplified way. In essence, they solve the continuity equation of traffic flow:

$$\frac{\partial q}{\partial x}+\frac{\partial k}{\partial t}=g\left(x,t\right)$$
(7.12)
$$q\left(x,t\right)=k\left(x,t\right)*u\left(x,t\right)$$
(7.13)

Link densities are determined by solving the finite-difference form of the continuity Eq. (7.12). This can be done using a suitable approach, as, for example, CTM or GLTM, and a functional form (7.13) of the fundamental diagram, where \(q(x,t)\) is the flow, \(k(x,t)\) the density, \(u(x,t)\) the spatial speed and \(g(x,t)\) a flow generation term, all of them at time \(t\) and location \(x\). Jayakrishnan et al. (1994) solve these equations in DYNASMART given the densities and the in- and outflows for each section at each time step, assuming that section speeds are calculated from the densities using the modified Greenshields (1934) speed-density relationship (Eq. 7.14):

$${u}_{i}^{t}=\left({u}_{f}-{u}_{0}\right)*{\left(1-\frac{{k}_{i}^{t}}{{k}_{jam}}\right)}^{\alpha }+{u}_{0}$$
(7.14)

where \({u}_{i}^{t}\) and \({k}_{i}^{t}\) are, respectively, the mean speed and density in section \(i\) at time step \(t\), \({u}_{f}\) and \({u}_{0}\) are the mean free speed and the minimum speed, \({k}_{jam}\) is the jam density and \(\alpha\) is a parameter that captures speed sensitivity to density. DynaMIT (Ben-Akiva et al. 2001, 2010) includes a speed-density relationship (7.15) that generalizes the one proposed by May and Keller (1967), including a lower bound limiting density, \({k}_{min}\), and a second parameter \(\beta\) to capture speed sensitivity to concentration:

$$u=\left\{\begin{array}{ll}{u}_{f} & if\,\, k<{k}_{min}\\ {u}_{f}*{\left[1-{\left(\frac{k-{k}_{min}}{{k}_{jam}}\right)}^{\alpha }\right]}^{\beta } & otherwise\end{array}\right.$$
(7.15)
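Both relationships are straightforward to transcribe; the sketch below implements Eqs. 7.14 and 7.15 with illustrative parameter values. The branch \(k<{k}_{min}\) of Eq. 7.15 is handled by clamping the density at \({k}_{min}\), which yields \({u}_{f}\) there; the function names are assumptions, not those of the cited tools.

```python
import numpy as np

def speed_greenshields_mod(k, u_f, u_0, k_jam, alpha):
    """Modified Greenshields relationship of Eq. 7.14 (DYNASMART-style)."""
    return (u_f - u_0) * (1.0 - k / k_jam) ** alpha + u_0

def speed_dynamit_style(k, u_f, k_min, k_jam, alpha, beta):
    """Speed-density relationship of Eq. 7.15 (DynaMIT-style).

    Clamping the density at k_min reproduces the first branch (u = u_f
    for k < k_min), since the bracketed term then equals one.
    """
    kk = np.maximum(np.asarray(k, dtype=float), k_min)
    return u_f * (1.0 - ((kk - k_min) / k_jam) ** alpha) ** beta

# Illustrative values: densities in veh/m, speeds in m/s (all assumed)
print(speed_greenshields_mod(0.06, u_f=15.0, u_0=2.0, k_jam=0.15, alpha=2.0))
print(speed_dynamit_style(0.06, u_f=15.0, k_min=0.01, k_jam=0.15,
                          alpha=2.0, beta=1.5))
```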

More particularly, the link speed is modeled assuming that it is constant on the upstream section of the link and changes along a deceleration zone covering a downstream section, varying linearly as a function of the position within this zone. Under this assumption, \({v}_{u}\), the speed at the upstream end of the link, is a function of the average density on its running part; that is, \({v}_{u}\) is determined by Eq. 7.15. For their part, \({v}_{d}\) is the speed at the downstream end of the segment and \({L}_{s}\) is the length of the deceleration zone, which depends on the geometry of the segment and on traffic conditions. Ben-Akiva et al. (2001) propose a way to determine \({L}_{s}\) that is consistent with the empirical evidence that the majority of delays are related to queuing. Finally, assuming that the target link starts at position \(0\) and ends at position \(L\) (i.e., \(L\) is the length of the segment), the speed function at an intermediate point \(x\) of the segment can be written as follows (Eqs. 7.16 and 7.17):

$${\varvec{v}}\left(x\right)=\left\{\begin{array}{ll}{v}_{u} & if\,\, 0\le x\le L-{L}_{s}\\ \lambda *\left(x-L\right)+{v}_{d} & if\,\, L-{L}_{s}\le x\le L\end{array}\right.$$
(7.16)

where

$$\lambda =\frac{{v}_{d}-{v}_{u}}{{L}_{s}}$$
(7.17)
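A direct transcription of this piecewise profile reads as follows; the function name and arguments are illustrative.

```python
def link_speed(x, L, L_s, v_u, v_d):
    """Speed profile of Eqs. 7.16-7.17 at position x along a link of length L:
    constant at v_u upstream, then linear down to v_d over the last L_s."""
    if x <= L - L_s:
        return v_u
    lam = (v_d - v_u) / L_s           # Eq. 7.17
    return lam * (x - L) + v_d        # v_u at x = L - L_s, v_d at x = L
```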

Other models like MEZZO (Burghout 2004; Burghout et al. 2005) complement this approach according to empirical evidence establishing that there are two limiting densities \({k}_{\mathrm{min}}\) and \({k}_{\mathrm{max}}\), which delimit the range in which speed is still a function of the density (del Castillo and Benitez 1995; Eq. 7.18):

$$u=\left\{\begin{array}{ll}{u}_{f} & if\,\, k<{k}_{min}\\ {u}_{0}+\left({u}_{f}-{u}_{0}\right)*{\left[1-{\left(\frac{k-{k}_{min}}{{k}_{max}-{k}_{min}}\right)}^{\alpha }\right]}^{\beta } & if\,\, k\in \left[{k}_{min},{k}_{max}\right]\\ {u}_{min} & if\,\, k>{k}_{max}\end{array}\right.$$
(7.18)

where \({u}_{\mathrm{min}}\) is the minimum speed in congested conditions. Various queueing models have been proposed to calculate the waiting times in the queuing part of the link, that is, the delays incurred by vehicles because of the output and acceptance capacities of the links. These, respectively, determine the rate at which vehicles can leave the link and how many vehicles can enter it depending on the available space. Obviously, when the acceptance capacity of a link is zero, no more vehicles can enter the segment and spillback occurs. A good example that illustrates this idea is the simplified model in DynaMIT (Ben-Akiva et al. 2001), which considers that the delay of the \(i\)-th vehicle in the queue is given by Eq. 7.19:

$$\frac{i}{\rho }$$
(7.19)

where \(\rho\) is the output capacity of the link. Then, during a time period of length \(t\), \(\rho *t\) vehicles will leave the queue. A vehicle in the running part that at time \(t\) reaches the end of the queue will find it at position \(lq\left(t\right)\), given by

$$lq\left(t\right)={lq}_{0}+{l}_{\mathrm{eff}}*\left(\rho *t-m\right)$$
(7.20)

In Eq. 7.20, \({lq}_{0}\) is the position of the end of the queue at time \(t = 0\), \({l}_{\mathrm{eff}}\) is the effective vehicle length (i.e., the physical length plus the headway), and \(m\) is the number of vehicles that reached the queue before the considered vehicle. Obviously, the model is relevant only when \(0<lq\left(t\right)<L\).
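In code, this queue model reduces to two one-line functions (Eqs. 7.19 and 7.20); the argument names are illustrative.

```python
def queue_delay(i, rho):
    """Delay of the i-th vehicle in the queue (Eq. 7.19); rho = output capacity."""
    return i / rho

def queue_end_position(t, lq0, l_eff, rho, m):
    """Position of the end of the queue at time t (Eq. 7.20).

    lq0   : position of the end of the queue at t = 0
    l_eff : effective vehicle length (physical length plus headway)
    rho   : output capacity of the link [veh per unit time]
    m     : number of vehicles that joined the queue before the target vehicle
    Only meaningful while the result stays within the link, 0 < lq(t) < L.
    """
    return lq0 + l_eff * (rho * t - m)
```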

A completely different approach is taken in Dynameq (Mahut and Florian 2010). It is based on a simulation model that moves vehicles individually, according to a simplified car-following model. In this model, given two consecutive vehicles, the leader vehicle \(n\) and the follower \(n+1\), the position \({x}_{n+1}\left(t\right)\) of the follower at time t relative to the position of the leader at \({x}_{n}\left(t-T\right)\) is estimated according to Eq. 7.21:

$${x}_{n+1}\left(t\right)=\mathrm{Min}\left[{x}_{n+1}\left(t-\varepsilon \right)+\varepsilon {u}_{f},{x}_{n}\left(t-T\right)-{l}_{\mathrm{eff}}\right]$$
(7.21)

where \(T\) is the reaction time, \({u}_{f}\) the free-flow speed, \({l}_{\mathrm{eff}}\), as before, the effective vehicle length and \(\varepsilon\) an arbitrary short time interval. The first term inside the minimizing operator represents the farthest position downstream that can be attained at time \(t\) based on the follower’s position at time \(\left(t-\varepsilon \right)\), as constrained by the maximum speed of the vehicle, \({u}_{f}\). The second term inside this operator represents the farthest position downstream that can be attained based on the trajectory of the next vehicle downstream in the same lane, according to a simple collision-avoidance rule (Mahut 1999, 2001; Newell 2002). It is a simplified model that only depends on the free-flow speed and does not account for accelerations. It can be considered a lower-order model, since it only defines the position of each vehicle in time, rather than vehicle speed or acceleration.

The solution of the car-following relationship (Eq. 7.21) for time results in (Eq. 7.22):

$${t}_{n+1}\left(x\right)=\mathrm{Max}\left[{t}_{n+1}\left(x-\delta \right)+\frac{\delta }{{u}_{f}},{t}_{n}\left(x+{l}_{\mathrm{eff}}\right)+T\right]$$
(7.22)

The relationship in Eq. 7.22 enables the event-based simulation approach used in Dynameq, because it makes it possible to derive the expression in Eq. 7.23, which calculates the link entrance and exit times for each vehicle:

$${t}_{n+1}\left({L}_{1}\right)=\mathrm{Max}\left[{t}_{n+1}\left(0\right)+\frac{{L}_{1}}{{u}_{f}^{1}}, {t}_{n}\left({L}_{1}\right)+T+\frac{{l}_{\mathrm{eff}}}{\mathrm{min}\left[{u}_{f}^{1},{u}_{f}^{2}\right]},{t}_{n+{L}_{2}/{l}_{eff}}\left({L}_{2}\right)+\frac{{L}_{2}}{{l}_{\mathrm{eff}}}*T\right]$$
(7.23)

where \({L}_{1}\) and \({L}_{2}\) are the lengths of two sequential links with free-flow speeds \({u}_{f}^{1}\) and \({u}_{f}^{2}\), respectively. The vehicle attributes represented by \({l}_{\mathrm{eff}}\) and \(T\) are considered identical over the entire traffic stream, and each vehicle adopts the link-specific free-flow speed when traversing a given link. The link lengths are assumed to be integer multiples of the vehicle length \({l}_{\mathrm{eff}}\). It can be shown (Mahut 2000) that this model yields the triangular fundamental flow-density diagram (Daganzo 1994). The main events changing the state of the model are the arrivals of vehicles at links and their link departures or transfers from one link to the next, according to the turning movements at intersections.
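As an illustration of this event-based logic, the sketch below computes link exit times for a vehicle stream by keeping only the first two terms of Eq. 7.23 (free-flow traversal and the follower constraint); the third, spillback-related term is deliberately omitted, and all names and parameter values are assumptions.

```python
def link_exit_times(entry_times, L, u_f, T, l_eff, u_next=None):
    """Event-based exit times for a vehicle stream on a single link.

    Each vehicle exits after free-flow traversal or one reaction time T plus
    the effective-length lag behind its leader, whichever is later (first two
    terms of Eq. 7.23). Downstream spillback is ignored in this sketch.
    """
    u_next = u_f if u_next is None else u_next   # free-flow speed downstream
    exits = []
    for n, t0 in enumerate(entry_times):
        free_flow = t0 + L / u_f
        if n == 0:
            exits.append(free_flow)
        else:
            follower = exits[-1] + T + l_eff / min(u_f, u_next)
            exits.append(max(free_flow, follower))
    return exits

# Five vehicles entering 1 s apart on a 500 m link (all values assumed):
# the short entry headway makes the follower constraint bind from vehicle 2 on
print(link_exit_times([0, 1, 2, 3, 4], L=500.0, u_f=15.0, T=1.2, l_eff=6.5))
```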

This one-lane link model can be extended to multilane links, including lane changing decisions and additional terms in (7.23) to account for conflicts at nodes with multiple outgoing links. Details can be found in Florian et al. (2008) and Mahut and Florian (2010).

The summary description of the most common DTA and DUE approaches included in this section has shown how they can provide TMS with useful information: on the one hand, the inputs allowing them to estimate the network traffic state; on the other hand (and what is even more relevant), the necessary outputs to predict traffic flows and travel times on road networks. Moreover, this prediction accounts for their evolution over time because of various factors, one of which is particularly relevant: the time variation of the demand. The main pending question at this point is how to provide this time variation of the traffic demand that constitutes the main input to DTA or DUE; in other words, how to estimate OD-matrices.

2 The Static Formulation of the OD-Estimation Problem

Traffic assignment models aim at estimating traffic flows in the network by assigning a trip OD-matrix to it according to a route choice mechanism. Therefore, OD trip matrices become their major data input to describe the patterns of traffic behavior across the network. All formulations of static traffic assignment models (e.g., Florian and Hearn 1995), as well as the dynamic ones (e.g., Ben-Akiva et al. 2001), assume that a reliable estimate of an OD-matrix is available. However, OD-matrices are not directly observable, especially in the case of the time-dependent OD-matrices that are necessary for DTA models. Consequently, it has been natural to resort to indirect estimation methods. These are the matrix adjustment methods, whose main modeling hypothesis can be stated as follows: “if traffic flows in the links of a network are the consequence of the assignment of an OD matrix to a network, and if we are capable of measuring link flows, the problem of estimating the OD matrix that generates such flows can be considered as the inverse of the assignment problem” (Cascetta 2001). In other words, the traffic assignment problem is defined as the direct problem, i.e., “given the O/D matrix \(X\) and the cost conditions for using links on the road network, the user equilibrium assignment problem estimates the user equilibrium flows \(Y\) on the links of the road network” (Eq. 7.24):

$$Y=\mathrm{Assignmt}\left(X\right)$$
(7.24)

where \(Y\) is the set of all link flows, \(X\) is the OD-matrix, and \(Assignmt\) is an equilibrium assignment algorithm assigning the OD-matrix \(X\) to the network. The reciprocal problem would be that of estimating, from the observed link flows \({y}_{l}\), the OD-matrix \(X\) that originated them. In other words, the reciprocal problem of traffic assignment, as described by Cascetta (2001), consists in “assuming that the observed flows \({y}_{l}\) on a subset \(\widehat{L}\subseteq L\) of links in the network (or in all links) constitute a user equilibrium flow pattern as defined by Wardrop (1952), determining the OD matrix \(X\) whose assignment would produce the observed flows \({y}_{l}\)”. Formally, this implies that (Eq. 7.25)

$$X={\mathrm{Assignmt}}^{-1}\left(Y\right)$$
(7.25)

Since the early formulation proposed by van Zuylen and Willumsen (1980), the matrix adjustment problem has been a relevant research and practical problem. Given a road network \(G=\left\{L,N\right\}\), with a set of links \(L\), a set of nodes \(N\), and a set \(I\) of OD pairs, the OD-matrix estimation problem consists in finding a feasible vector (OD-matrix) \(X\in\Omega\), where \(\Omega\) is the set of all feasible OD-matrices. Here, \(X=\left\{{X}_{i}\right\}, i\in I\), are the demands for all OD pairs, \(I\) being the set of all OD pairs in the network. \((r,s)\), as introduced in Sect. 7.1, stands for the \(i\)-th OD pair. The assignment of the OD-matrix explains the observed flows \({y}_{l}\) on a subset \(\widehat{L}\subseteq L\) of links equipped with counting stations. It is usually accepted that the assignment of the OD-matrix to the links of the network is made according to an assignment proportion matrix \(P=\left\{{p}_{il}\right\}, \forall i\in I, \forall l\in L\), where each element \({p}_{il}\) in the matrix is defined as the proportion of the OD demand \({X}_{i}\) that uses link \(l\). The notation \(P=P\left(X\right)\) indicates that, in general, these proportions depend on the demand.

The hypotheses supporting the approach are illustrated in Fig. 7.3, which depicts possible positions of a hypothetical detector at a link \(l\).

Fig. 7.3 Possible link-path relationships

Let \({y}_{l}\) be the flow measured by one detector and \({h}_{k}\) the flow on path \(k\) to which this link belongs. If \({\varphi }_{ik}\) is the fraction of the demand \({X}_{i}\) of the \(i\)-th OD pair that uses path \(k\), the flow \({h}_{k}\) is given by Eq. 7.26:

$${h}_{k}={\varphi }_{ik}*{X}_{i}$$
(7.26)

\({\delta }_{lk}\) is the link-path assignment matrix, taking the following values (Eq. 7.27):

$${\delta }_{lk}=\left\{\begin{array}{ll}1 & \mathrm{if\; link}\; l\; \mathrm{belongs\; to\; path}\; k: l\in \mathrm{Path}\; k\\ 0 & \mathrm{otherwise}\end{array}\right.\quad \forall l\in L, k\in {K}_{i}, i\in I$$
(7.27)

where \({K}_{i}=\left\{\mathrm{Set\,\, of\,\, all\,\, paths\,\, connecting\,\, the }\,\,i\mathrm{th }\,\,OD\,\,\mathrm{ pair}\right\}\). The relationship between the measured flow \({y}_{l}\) on link \(l\) and the flows \({h}_{k}\) on the paths using link \(l\) is given by Eqs. 7.28 and 7.29:

$${y}_{l}=\sum_{i\in I}\sum_{k\in {K}_{i}}{h}_{k}*{\delta }_{lk} =\sum_{i\in I}\sum_{k\in {K}_{i}}{\varphi }_{ik}*{\delta }_{lk}{*X}_{i}=\sum_{i\in I}{p}_{il}*{X}_{i}$$
(7.28)
$${p}_{il}=\sum_{k\in {K}_{i}}{\varphi }_{ik}{*\delta }_{lk}$$
(7.29)
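A toy numerical transcription of Eqs. 7.26–7.29 may help fix the notation; the instance (two OD pairs, three paths, four links) and all values are invented for illustration.

```python
import numpy as np

# phi[i][k]: share of OD demand i on path k;  paths[k]: links traversed by k
phi = {0: {"pA": 0.7, "pB": 0.3}, 1: {"pC": 1.0}}
paths = {"pA": [0, 1], "pB": [0, 2], "pC": [2, 3]}
X = {0: 100.0, 1: 50.0}          # OD demands X_i
n_links = 4

# Eq. 7.29: proportions p_il (delta_lk is 1 exactly for the links on path k)
p = np.zeros((len(X), n_links))
for i, shares in phi.items():
    for k, share in shares.items():
        for l in paths[k]:
            p[i, l] += share

# Eq. 7.28: link flows induced by the demands
y = np.array([X[i] for i in sorted(X)]) @ p
print(p)   # assignment proportions per OD pair and link
print(y)   # link flows: [100.  70.  80.  50.]
```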

When assigned to the network, the OD-matrix induces a flow \(Y=\left\{{y}_{l}\right\}, \forall l\in L\) in its links. If we assume that observed flows \(\widehat{Y}=\left\{{\widehat{y}}_{l}\right\}\) are available for a subset \(\widehat{L}\) of the links, \(l\in \widehat{L}\subseteq L,\) and that a target matrix \({X}^{H}\in\Omega\) is also available, the generic OD-matrix estimation problem can be formulated (Lundgren and Peterson 2008) as (Eq. 7.30):

$${\mathrm{Min}}_{X,Y}\, F\left(X, Y\right)={w}_{1}*{F}_{1}\left(X,{X}^{H}\right)+{w}_{2}{*F}_{2}\left(Y,\widehat{Y}\right)$$
$$s.t. \sum_{i\in I}{p}_{il}\left({X}^{H}\right){*X}_{i}={\widehat{y}}_{l}, \forall l\in \widehat{L}$$
(7.30)
$$X\in\Omega$$

The functions \({F}_{1}\left(X,{X}^{H}\right)\) and \({F}_{2}\left(Y,\widehat{Y}\right)\) represent generalized distance measures: the first one, the distance between the estimated OD-matrix \(X\) and the given target matrix \({X}^{H}\); the second one, the distance between the estimated link flows \(Y\) and the observed link flows \(\widehat{Y}\). The parameters \({w}_{1}\) and \({w}_{2}\) reflect the relative belief (or uncertainty) in the information contained in \({X}^{H}\) and \(\widehat{Y}\). The problem expressed in Eq. 7.30 can be interpreted as a two-objective optimization problem, the objectives being precisely \({F}_{1}\) and \({F}_{2}\), whereas \({w}_{1}\) and \({w}_{2}\) are the corresponding weighting factors.

The set \(\Omega\) of feasible OD-matrices normally includes the non-negative OD-matrices. However, it can also be limited to those matrices within a certain deviation from the target values (Eq. 7.31), i.e.,

$$\Omega =\left\{X\ge 0\left|\left(1-\alpha \right)*{X}^{H}\le X\le \left(1+\alpha \right){*X}^{H}\right.\right\}$$
(7.31)

for some suitable parameter \(\alpha >0\) stating the tolerance level. An analogous formulation can be used to state, instead, a maximum deviation from the link flow observations with a tolerance parameter \(\beta >0\) (Eq. 7.32):

$$\Omega =\left\{X\ge 0\left|\left(1-\beta \right)\right.{*\widehat{y}}_{l}\le {y}_{l}\le \left(1+\beta \right)*{\widehat{y}}_{l}\right\}$$
(7.32)

Another possibility is to restrict the total travel demand in all OD pairs originating or terminating at a certain node. This mirrors the adjustment of the trip distribution with respect to the trip generation made in the four-step demand model (Ortúzar and Willumsen 2011). In any case, all these constraints on \(\Omega\) are linear or convex and can be easily handled from the optimization point of view.

Obviously, the resulting OD-matrix is dependent on the objective function minimized in (7.30), that is, on the distance measure chosen. One of the distances initially proposed, probably as an analogy with the trip distribution problem, was the maximum entropy function. It was derived from the principle of minimum information (van Zuylen and Willumsen 1980) and is expressed as in Eq. 7.33:

$${F}_{1}\left(X,{X}^{H}\right)=\sum_{i\in I}{X}_{i}*\left\{\mathrm{log}\frac{{X}_{i}}{{X}_{i}^{H}}-1\right\}$$
(7.33)

The function \({F}_{2}\) in (7.30) can be formulated in a similar way.

A type of objective function that has become widely used in these models is the one based on the least squares formulation. This is equivalent to assuming a Euclidean distance function between observed and estimated variables. For example, the function \({F}_{2}\) for the observed volumes would correspond to Eq. 7.34:

$${F}_{2}\left(Y,\widehat{Y}\right)=\sum_{l\in \widehat{L}}{\left({y}_{l}-{\widehat{y}}_{l}\right)}^{2}$$
(7.34)

and could be weighted using the information on the significance of each observation. For instance, when the measurements contained in \(\widehat{Y}\) are computed as means of a set of observations for each link, the variance \({\sigma }_{l}^{2}\) can be used as a measure of how important each link observation is. Equation 7.34 would then be reformulated as Eq. 7.35:

$${F}_{2}\left(Y,\widehat{Y}\right)=\sum_{l\in \widehat{L}}\frac{1}{{\sigma }_{l}^{2}}*{\left({y}_{l}-{\widehat{y}}_{l}\right)}^{2}$$
(7.35)
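For concreteness, the two families of distance measures in Eqs. 7.33–7.35 can be sketched as follows; the array layouts and the example numbers are illustrative.

```python
import numpy as np

def entropy_distance(X, X_H):
    """Maximum-entropy distance of Eq. 7.33 between estimate X and target X_H."""
    X, X_H = np.asarray(X, float), np.asarray(X_H, float)
    return float(np.sum(X * (np.log(X / X_H) - 1.0)))

def weighted_least_squares(y, y_obs, sigma2=None):
    """(Weighted) least-squares distance of Eqs. 7.34-7.35 on observed links."""
    y, y_obs = np.asarray(y, float), np.asarray(y_obs, float)
    w = 1.0 if sigma2 is None else 1.0 / np.asarray(sigma2, float)
    return float(np.sum(w * (y - y_obs) ** 2))

print(entropy_distance([90, 110], [100, 100]))
print(weighted_least_squares([480, 510], [500, 500], sigma2=[100, 25]))  # 8.0
```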

One disadvantage of the entropy maximizing approaches as formulated in Eq. 7.33 lies in the treatment of link flow observations as error-free constraints (Bell and Iida 1997). An attempt to overcome this disadvantage consists in using a generalized least squares approach to provide a framework accounting for errors from various sources. The method, first proposed by Cascetta (1984), also yields standard errors for the trip table, thereby indicating the relative robustness of the fitted values. The equivalent problem, assuming that the weighting factors \({w}_{1}\) and \({w}_{2}\) have the same value, takes the following form (Eq. 7.36):

$$\begin{aligned}{\mathrm{Min}}_{X} F\left(X\right)&=\frac{1}{2}*\left[{\left(X-{X}^{H}\right)}^{T}{*\left({X}_{C}^{H}\right)}^{-1}*\left(X-{X}^{H}\right)\right]\\ & \quad +\frac{1}{2}*\left[{\left(\widehat{Y}-P\left({X}^{H}\right)*X\right)}^{T}*{\left({Y}_{C}\right)}^{-1}*\left(\widehat{Y}-P\left({X}^{H}\right)*X\right)\right]\end{aligned}$$
(7.36)

The inputs are prior estimates of OD flows, \({X}^{H}\), link flow measurements, \(\widehat{Y}\), variance–covariance matrices of the prior estimates and link flow measurements, respectively, \({X}_{C}^{H}\) and \({Y}_{C}\) and the matrix of link choice proportions \(P({X}^{H}).\) As the variance–covariance matrices are positive definite and the objective function is convex, the minimum is uniquely given by (Eq. 7.37):

$$\begin{aligned}\nabla F\left({X}^{*},{Y}^{*}\right)&={\left({X}_{C}^{H}\right)}^{-1}*\left({X}^{*}-{X}^{H}\right) \\ & \quad -{P}^{T}\left({X}^{H}\right)*{\left({Y}_{C}\right)}^{-1}*\left(\widehat{Y}-P\left({X}^{H}\right){*X}^{*}\right)=0 \end{aligned}$$
(7.37)

This yields the following linear estimator (Eq. 7.38):

$$\begin{aligned} X^{*} & = \left[ {\left( {X_{C}^{H} } \right)^{{ - 1}} + P^{T} \left( {X^{H} } \right)*\left( {Y_{C} } \right)^{{ - 1}} *P\left( {X^{H} } \right)} \right]^{{ - 1}} \\ & \quad *\left[ {\left( {X_{C}^{H} } \right)^{{ - 1}} *X^{H} + P^{T} \left( {X^{H} } \right)*\left( {Y_{C} } \right)^{{ - 1}} *\hat{Y}} \right] \\ \end{aligned}$$
(7.38)

For their part, the sensitivities of this estimator are given by Eq. 7.39:

$$\begin{aligned} \Delta X^{*} & = \left[ {\left( {X_{C}^{H} } \right)^{{ - 1}} + P^{T} \left( {X^{H} } \right)*\left( {Y_{C} } \right)^{{ - 1}} *P\left( {X^{H} } \right)} \right]^{{ - 1}} \\ & \quad *\left[ {\left( {X_{C}^{H} } \right)^{{ - 1}} *\Delta X^{H} + P^{T} \left( {X^{H} } \right)*\left( {Y_{C} } \right)^{{ - 1}} *\Delta \hat{Y}} \right] \\ \end{aligned}$$
(7.39)

Additionally, taking into account that \({X}^{H}\) and \(\widehat{Y}\) are uncorrelated and defining \(E={\left[{\left({X}_{C}^{H}\right)}^{-1}+{P}^{T}\left({X}^{H}\right)*{\left({Y}_{C}\right)}^{-1}*P\left({X}^{H}\right)\right]}^{-1},\) the variance of \({X}^{*}\) is given by Eq. 7.40:

$$\mathrm{Var}\left\{{X}^{*}\right\}=E{\left({X}_{C}^{H}\right)}^{-1}*E+E{*P}^{T}\left({X}^{H}\right){*\left({Y}_{C}\right)}^{-1}*P\left({X}^{H}\right)*E$$
(7.40)

Unlike the maximum entropy model, there is nothing to prevent the generalized least squares estimator from producing negative fitted values for the OD flows. While negative values would reflect small real values, they are nonetheless counterintuitive. Bell (1991) has also considered the introduction of non-negativity constraints for the fitted OD-matrix.
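The estimator of Eqs. 7.38–7.40 translates directly into a few lines of linear algebra; the sketch below, with an invented two-OD-pair, two-link instance, also illustrates the point just made, since nothing in it enforces non-negativity of the result.

```python
import numpy as np

def gls_od_estimate(X_H, XC_H, y_hat, Y_C, P):
    """Generalized least squares OD estimator (Eqs. 7.38 and 7.40).

    X_H   : prior (target) OD vector, shape (n,)
    XC_H  : variance-covariance matrix of the prior, shape (n, n)
    y_hat : observed link flows, shape (m,)
    Y_C   : variance-covariance matrix of the observations, shape (m, m)
    P     : link choice proportions with y = P @ X, shape (m, n)
    Returns the estimate X* and its variance.
    """
    XC_inv = np.linalg.inv(XC_H)
    YC_inv = np.linalg.inv(Y_C)
    E = np.linalg.inv(XC_inv + P.T @ YC_inv @ P)           # common factor
    X_star = E @ (XC_inv @ X_H + P.T @ YC_inv @ y_hat)     # Eq. 7.38
    var = E @ XC_inv @ E + E @ P.T @ YC_inv @ P @ E        # Eq. 7.40
    return X_star, var

# Toy instance (all numbers assumed)
X_H = np.array([100.0, 50.0])
P = np.array([[1.0, 0.0],      # link 1 carries OD pair 1 only
              [0.7, 1.0]])     # link 2 carries 70% of OD 1 plus all of OD 2
X_star, var = gls_od_estimate(X_H, np.diag([400.0, 225.0]),
                              np.array([110.0, 115.0]),
                              np.diag([25.0, 25.0]), P)
print(X_star.round(2))   # note: negative entries are not ruled out in general
```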

3 Bi-level Optimization Models for OD Adjustment

The estimation of OD-matrices from observed flows as the reciprocal of the assignment problem is a highly undetermined problem; that is, there are in general many OD-matrices which, when assigned to the network, induce equivalent link flows. The objective function and the set of constraints in the formulation of the problem are aimed at reducing this indetermination. However, this simple formulation can still have some drawbacks. The set of constraints in the generic problem formulation (Eq. 7.30) to determine \(X\) is expressed by Eq. 7.41:

$$\sum_{i\in I}{p}_{il}\left({X}^{H}\right)*{X}_{i}={\widehat{y}}_{l}, \forall l\in \widehat{L}$$
(7.41)
$$s.t. \,\,X\in\Omega$$

It consists of one equation for every link flow observation. Thus, it is an undetermined equation system, as long as the number of OD pairs \(\left|I\right|\) is greater than the number of link flow observations \(\left|\widehat{L}\right|\). This is especially the case for large real-world networks. Additionally, the information transferred through the equation system is delimited by topological dependencies. A basic principle in network flows is that, for consistent flows, the balance equations must hold. In other words, the sum of incoming and outgoing flows at any intermediate node must be zero. This principle, which can also be interpreted in physical terms using Kirchhoff’s law, means that, for each intersection, at least one link flow is linearly dependent on the others. This results in a row-wise dependency for the equation system.

On the other hand, the elements \({p}_{il}\left({X}^{H}\right)\) are non-zero when link \(l\) is part of one or more shortest paths for OD pairs \(i\in I\). However, since every subpath of a shortest path is a shortest path, every pair of nodes along a certain shortest path is connected through parts of this shortest path. This results in a column-wise dependency for the equation system. Thus, we can conclude that the equation system (Eq. 7.41) most likely does not have full rank, which further increases the freedom of choice for the OD-estimation problem. Therefore, the way of determining \({p}_{il}\left({X}^{H}\right)\) is crucial for the quality of the OD-matrix estimation model. This is usually done depending on how the assignment matrix \(P({X}^{H})\) is calculated, and on whether it depends on \(X\) or not; in other words, on whether the route choices are made depending on the congestion or not. If the assignment of the OD-matrix to the network is independent of the link flows, that is, if we have an uncongested network, \(P\left({X}^{H}\right)=P\) is a constant matrix. In that case, the first set of constraints in Eq. 7.41 can be reformulated as in Eq. 7.42:

$$\sum_{i\in I}{p}_{il}{*X}_{i}={\widehat{y}}_{l} \,\,\forall l\in \widehat{L}$$
(7.42)
$$s.t.\,\, X\in\Omega$$

In addition, this substitution can be directly performed in the objective, i.e., in the function \({F}_{2}\left(Y,\widehat{Y}\right)\), which reduces the OD-matrix estimation to a problem only in the variable \(X\). Assuming that the deviation measures \({F}_{1}\) and \({F}_{2}\) are convex and that the set of feasible OD-matrices \(\Omega\) is linear or, at least, convex, the OD-estimation problem can be easily solved with some suitable standard algorithms for nonlinear programming. This is the usual approach in most cases (van Zuylen and Willumsen 1980). However:

The assumption that the assignment, i.e., the route choice, is independent of the load on the links is only realistic in a network with a very low congestion rate or in networks where, in practice, only one route can be used.

If we assume that the network is congested and that the routes are chosen depending on the current travel times, the route proportions are in turn dependent on the existing traffic situation. For its part, this situation depends on the OD-matrix. Thus, the relationship between the route proportions \(P\) and the OD-matrix \(X\) can only be defined implicitly. In this case, a plausible hypothesis is to assume that the choice proportions can be derived from a traffic assignment model. Then, the set of feasible solutions to the estimation problem (Eq. 7.30) is defined as all points \((X,Y)\) in which \(Y\) is the link flow solution satisfying an assignment of the corresponding demand \(X\in\Omega\). In this case, the generic OD-matrix estimation problem (Eq. 7.30) can be reformulated as a bi-level optimization problem. Bell and Iida (1997) propose an approach based on the hypothesis that a traffic assignment model can be represented by a function whose input is the OD-matrix \(X\) and whose outputs are the link flows \(Y\) (Eq. 7.43)

$$Y=A\left(X\right)*X$$
(7.43)

That is simply a reformulation of the direct assignment problem as defined in Sect. 7.2, in which, given the OD-matrix \(X\), it is possible to find the link flows \(Y\). The reciprocal problem of finding \(X\) given \(\widehat{Y}\) (Eq. 7.25) cannot be solved in this way, since the inverse of this function does not exist. However, a way of accounting for this functional relationship in the OD-estimation process could be to reformulate the least squares formulation including it explicitly in the model (Eq. 7.44):

$$\begin{aligned} {\text{Minimize}}_{X} F\left( X \right) & = \frac{1}{2}*\left( {X - X^{H} } \right)^{T} *\left( {X_{C}^{H} } \right)^{{ - 1}} *\left( {X - X^{H} } \right) \\ & \quad + \frac{1}{2}*\left[ {\hat{Y} - A\left( X \right)*X} \right]^{T} *\left( {Y_{C} } \right)^{{ - 1}} *\left[ {\hat{Y} - A\left( X \right)*X} \right] \\ \end{aligned}$$
(7.44)

If the assignment function (Eq. 7.43) is differentiable, then (Eq. 7.45):

$$\nabla F\left(X\right)={\left({X}_{C}^{H}\right)}^{-1}*\left(X-{X}^{H}\right)-\nabla A{\left(X\right)}^{T}{*\left({Y}_{C}\right)}^{-1}*\left[\widehat{Y}-A\left(X\right)*X\right]$$
(7.45)

And if the Jacobian of the assignment function \(\nabla A\left(X\right)\) is (locally) independent of \(X\), then (Eq. 7.46):

$${\nabla }^{2}F\left(X\right)={\left({X}_{C}^{H}\right)}^{-1}+\nabla A{\left(X\right)}^{T}{*\left({Y}_{C}\right)}^{-1}*\nabla A\left(X\right)$$
(7.46)

\({\nabla }^{2}F\left(X\right)\) is positive definite, since \({X}_{C}^{H}\) and \({Y}_{C}\) are variance–covariance matrices, and there is a unique solution to the equivalent optimization problem. Yang (1995) proposes an efficient heuristic approach to solve this bi-level problem.

However, as Florian and Chen (1995) prove, the assignment function is usually not differentiable. Therefore, analytical approaches are of limited usefulness, since they are constrained to simple uncongested cases. Consequently, other formulations have been proposed. The most common formulation of the bi-level OD-matrix estimation problem for the general case is that of Eqs. 7.47 and 7.48, which refer, respectively, to the upper level and to the lower level problem. Equation 7.47 is as follows:

$${\mathrm{Min}}_{X} F\left(X,Y\right)={w}_{1}{*F}_{1}\left(X,{X}^{H}\right)+{w}_{2}{*F}_{2}\left(Y,\widehat{Y}\right)$$
(7.47)
$$\mathrm{s}.\mathrm{t}.\,\, X\in \Omega$$

We want to find the \(X\) that minimizes \(F\left(X,Y\right)\) subject to \(X\in \Omega\), under the hypothesis that the induced link flows \(Y\left(X\right)\) satisfy the equilibrium assignment conditions obtained by solving Eq. 7.48, that is, the lower level problem:

$$Y\left(X\right)=\mathrm{argmin}\sum_{l\in L}\underset{0}{\overset{{y}_{l}}{\int }}{s}_{l}\left(x\right)dx$$
$$s.t. \sum_{k\in {K}_{i}}{h}_{k}={X}_{i}, \forall i\in I$$
(7.48)
$${h}_{k}\ge 0 \forall k\in {K}_{i}, \forall i\in I$$
$${y}_{l}=\sum_{i\in I}\sum_{k\in {K}_{i}}{\delta }_{lk}{*h}_{k}, \forall l\in L$$

The algorithm for this OD adjustment method based on a bi-level optimization process can be viewed as the calculation of a sequence of OD-matrices, so that the least squares error between the traffic counts coming from detectors and the traffic flows obtained by a traffic assignment is increasingly reduced. The estimation of the OD-matrix requires information about the routes used for the trips contained in the OD-matrix; particularly, it requires the definition of the routes and of the trip proportions relative to the total trips \({X}_{rs}\) originating at zone \(r\) and ending at zone \(s\). This information is difficult both to handle and to store in traffic databases, considering that the number of routes connecting all OD pairs of a connected network can grow exponentially with the size of the network. This is the reason to use a mathematical programming approach based on a traffic assignment algorithm, which is solved at each iteration without requiring the explicit route definition. The algorithmic scheme to numerically solve the bi-level formulation of the OD-matrix adjustment problem is illustrated in the conceptual diagram in Fig. 7.4. The solution at the \(k\)-th iteration of the upper level nonlinear optimization problem for the current estimates of the link flows, \({\widehat{Y}}^{k},\) provides a new estimate \({X}^{k}\) of the OD-matrix, which is the input to the lower level equilibrium assignment problem. In turn, the solution to this lower level problem updates the link flow estimates. The iterative process continues until a certain convergence criterion is satisfied.

Fig. 7.4 Algorithmic scheme for the bi-level approach to the OD adjustment problem

One of the first operational approaches of the bi-level algorithm was the one proposed by Spiess (1990), whose bi-level optimization adjustment procedure solves the following bi-level nonlinear optimization problem (Eqs. 7.49 and 7.50):

$$\mathrm{Min}\,\, F\left[Y\left(X\right)\right]=\frac{1}{2}*\left\{\sum_{l\in \widehat{L}}{\left[{y}_{l}\left(X\right)-{\widehat{y}}_{l}\right]}^{2}\right\}$$
(7.49)
$$Y\left(X\right)=\mathrm{argmin}\sum_{l\in L}\underset{0}{\overset{{y}_{l}}{\int }}{s}_{l}\left(x\right)dx$$
(7.50)
$$s.t. \sum_{k\in {K}_{i}}{h}_{k}={X}_{i}, \forall i\in I$$
$${h}_{k}\ge 0 \forall k\in {K}_{i}, \forall i\in I$$
$${y}_{l}=\sum_{i\in I}\sum_{k\in {K}_{i}}{\delta }_{lk}*{h}_{k}=\sum_{i\in I}{X}_{i}\sum_{k\in {K}_{i}}{\delta }_{lk}*{p}_{k}, {p}_{k}=\frac{{h}_{k}}{{X}_{i}} \forall l\in L$$

where \({y}_{l}(X)\) is the flow on link \(l\) estimated by the lower level traffic assignment problem with the adjusted trip matrix \(X\), \({h}_{k}\) is the flow on the \(k\)-th path for the \(i\)-th OD pair and \({\widehat{y}}_{l}\) is the measured flow on link \(l\). \(I\) is the set of all OD pairs in the network, and \({K}_{i}\) is the set of paths connecting the \(i\)-th OD pair. \({s}_{l}\)(\({y}_{l}\)) is the volume-delay function for link \(l\in L\). The algorithm used to solve the problem is heuristic in nature, of the steepest descent type, and does not guarantee that a global optimum of the problem will be found. The iterative process for a generic iteration \(k\) is as follows (a compact sketch in code is given after the list):

  • Given a solution \({X}_{i}^{k}\), an equilibrium assignment is solved, yielding link flows \({y}_{l}^{k}\) and proportions \(\left\{{p}_{il}^{k}\right\}\) satisfying the relationship in Eq. 7.51:

    $${y}_{l}^{k}=\sum_{i\in I}{p}_{il}^{k}{*X}_{i}^{k}\quad \forall l\in L$$
    (7.51)

    The target matrix is used in the first iteration (i.e., \({X}_{i}^{1}={X}_{i}^{H}, \forall i\in I\)).

  • The estimate of the OD-matrix at iteration \(k+1\) is calculated in terms of the gradient of the objective function \(F\left[Y\left(X\right)\right]\) with Eq. 7.52:

    $${X}_{i}^{k+1}=\left\{\begin{array}{ll}{X}_{i}^{H} & \mathrm{for}\; k=0\\ {X}_{i}^{k}*\left[1-{\lambda }^{k}*{\left(\frac{\partial F\left[Y\left(X\right)\right]}{\partial {X}_{i}}\right)}_{{X}_{i}^{k}}\right] & \mathrm{for}\; k=1,2,3\ldots \end{array}\right.$$
    (7.52)

    That is, a change in the demand is proportional to the demand in the initial matrix and zeroes are preserved in the process.

  • The gradient is approximated as in Eq. 7.53:

    $$\frac{\partial F\left[Y\left(X\right)\right]}{\partial {X}_{i}}=\sum_{k\in {K}_{i}}{p}_{k}\sum_{l\in \widehat{L}}{\delta }_{lk}*\left({y}_{l}-{\widehat{y}}_{l}\right)\,\, \forall i\in I$$
    (7.53)

    where \(\widehat{L}\subset L\) is the subset of links with flow counts and \({p}_{k}=\frac{{h}_{k}}{{X}_{i}}\).

  • The step length is approximated as in Eqs. 7.54 and 7.55:

    $${\lambda }^{*}=\frac{\sum_{l\in \widehat{L}}{y}_{l}^{^{\prime}}*\left({\widehat{y}}_{l}-{y}_{l}\right)}{\sum_{l\in \widehat{L}}{\left({y}_{l}^{^{\prime}}\right)}^{2}}$$
    (7.54)

    where

    $${y}_{l}^{^{\prime}}=-\sum_{i\in I}{X}_{i}*\left(\sum_{k\in {K}_{i}}{p}_{k}\sum_{l\in \widehat{L}}{\delta }_{lk}*\left({y}_{l}-{\widehat{y}}_{l}\right)\right)*\left(\sum_{k\in {K}_{i}}{\delta }_{lk}*{p}_{k}\right)$$
    (7.55)

    To ensure convergence, the step length must satisfy the condition in Eq. 7.56:

    $${\lambda }^{*}\frac{\partial F\left[Y\left(X\right)\right]}{\partial {X}_{i}}<1\,\, \forall i\in I$$
    (7.56)

    If the condition is violated for some \(i\in I\), the step length must be bounded accordingly (Eq. 7.57):

    $${\lambda }^{*}=\frac{1}{\underset{i}{\mathrm{max}}\left\{\frac{\partial F\left[Y\left(X\right)\right]}{\partial {X}_{i}}\right\}+\varepsilon }$$
    (7.57)

    where \(\varepsilon\) is a small positive constant added to avoid numerical errors.

Further details on the algorithmic properties of this approach are available in Florian and Chen (1995). Alternative approaches improving the simplified gradient approach can be found in Codina and Barceló (2004) and Lundgren and Peterson (2008), among others.
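For illustration, the following Python fragment is a minimal sketch of one upper-level iteration of this procedure. It assumes that the lower-level equilibrium assignment has already produced, for each OD pair, the path proportions and the links traversed by each path (the `paths` container below is a hypothetical representation of that output), and it uses the gradient of Eq. 7.53 consistently with its dynamic counterpart, Eq. 7.67.

```python
import numpy as np

def spiess_iteration(X, paths, y_hat):
    """One upper-level step of the Spiess gradient adjustment (Eqs. 7.51-7.57).

    X      : np.ndarray of current OD demands X_i (floats).
    paths  : paths[i] = list of (p_k, links_k) tuples from the last equilibrium
             assignment: path proportion and the links the path traverses.
    y_hat  : dict {link: measured flow} on the detector subset L-hat.
    """
    # Link flows on the counted links induced by the current demand (Eq. 7.51).
    y = {l: 0.0 for l in y_hat}
    for i, X_i in enumerate(X):
        for p_k, links_k in paths[i]:
            for l in links_k:
                if l in y:
                    y[l] += p_k * X_i

    # Gradient of F (Eq. 7.53).
    grad = np.zeros_like(X)
    for i in range(len(X)):
        grad[i] = sum(p_k * sum(y[l] - y_hat[l] for l in links_k if l in y)
                      for p_k, links_k in paths[i])

    # Derivative of the counted link flows along the search direction (Eq. 7.55).
    y_prime = {l: 0.0 for l in y_hat}
    for i, X_i in enumerate(X):
        for p_k, links_k in paths[i]:
            for l in links_k:
                if l in y_prime:
                    y_prime[l] -= X_i * grad[i] * p_k

    # Optimal step length (Eq. 7.54), bounded so that demands stay positive
    # (Eqs. 7.56-7.57).
    den = sum(v ** 2 for v in y_prime.values())
    lam = (sum(y_prime[l] * (y_hat[l] - y[l]) for l in y_hat) / den
           if den > 0 else 0.0)
    g_max = grad.max()
    if g_max > 0 and lam * g_max >= 1.0:
        lam = 1.0 / (g_max + 1e-9)

    # Multiplicative update (Eq. 7.52): zeros in X are preserved.
    return X * (1.0 - lam * grad)
```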

In summary, the most common practices consist of using an initial OD estimate, the OD seed \({X}^{H}\), as input and adjusting it. This adjustment is done based on the available link counts \(y\) provided by an existing layout of traffic counting stations and on other additional information, whenever available. These adjustments can be considered indirect estimation methods based on optimization approaches. All of them share two fundamental modeling hypotheses:

  • A mapping scheme between OD flows and link flow counts is available

  • If \(L\) is the set of links in the network, flow detectors are only located in a subset \(\widehat{L}\subset L\), from which link flow measurements \({\widehat{y}}_{l}\), \(l\in \widehat{L}\), are available.

Assuming these hypotheses, a bi-level optimization model can be proposed, which is usually solved by computational schemes like the one conceptually depicted in Fig. 7.4. That is, iterating between an upper and a lower level. Again, the upper level solves a nonlinear optimization problem that minimizes the distance between available empirical evidence (i.e., a target OD-matrix \({X}^{H}\) and observed flows \(\widehat{Y}\) in a subset of links) and the estimations provided by the algorithm, while the lower level solves a User Equilibrium Traffic Assignment (UETA). The solution to the upper level nonlinear optimization problem provides new estimates of the OD-matrix, which constitute the input to the lower level assignment problem. In turn, the solution to this latter problem provides new estimates of link flows. This computational scheme is in fact a computational framework from which multiple algorithmic variants to solve the problem, both at the upper and at the lower level, can be derived.

The second modeling hypothesis strongly depends on the detection layout available in the network. Unfortunately, detection layouts are usually designed and implemented with the primary purpose of providing the data required by traffic control applications. Therefore, current detection layouts in traffic networks are not appropriate for the reconstruction of OD-matrices, as they do not explicitly take into account the OD pattern structure. This can represent a serious drawback regarding the quality of the OD reconstruction, since it has been observed in practice that the adjustment procedure can act implicitly as a metaregression model. That is, it fits quite well those parts of the network with a relatively rich detection infrastructure (in fact, overfitting them in most cases), while completely distorting other parts of the network where detection is sparse. This generates an unbalancing process that moves trips between parts of the network according to the numerical requirements of the process, but completely unrelated to the underlying transportation phenomena modeled by the OD pattern. In this context, the objective of identifying a detection layout that optimizes the coverage of origin–destination demand on the road network, while minimizing the uncertainties of the estimated OD, is a subsidiary prior requirement. Since the seminal work of Yang and Zhou (1998), the problem has received substantial attention; Ehlert et al. (2006) and Fei et al. (2007) are just two example references. Castillo et al. (2008) must be highlighted: they formulate the problem from the perspective of the observability of systems, a sine qua non condition for their state estimation and forecasting. Larsson et al. (2010) provide an overview of the pros and cons of various approaches, and Barceló et al. (2012) complement the detection layout models with a sensitivity analysis, enabling the analyst to establish a relationship between the quality of the layout and the quality of the OD pattern reconstruction.

Consequently, for the practice of matrix adjustment, not only is the mathematical modeling approach relevant; it is also highly advisable to pay attention to the detection layout whose measurements are going to be used for the adjustment of an OD-matrix.

4 Analytical Formulations for the Dynamic OD Matrix Estimation (DODME) Problem

The static bi-level optimization OD adjustment problem can be reformulated as (Eqs. 7.58 and 7.59):

$$\mathrm{Min}\,\,Z(X,Y)= {w}_{1}{*F}_{1}\left(X,{X}^{H}\right)+{w}_{2}*{F}_{2}\left(Y,\widehat{Y}\right)$$
(7.58)
$$s.t.\,\, Y=\mathrm{Assignmt}\left(X\right)$$
(7.59)
$$X\ge 0$$

where \({F}_{1}\) and \({F}_{2}\), as before, are suitable distance functions between estimated and observed values, while \({w}_{1}\) and \({w}_{2}\) are weighting factors reflecting the uncertainty of the information contained in \({X}^{H}\) and \(\widehat{Y}\), respectively. The underlying hypothesis is that \(Y\left(X\right)\) are the link flows predicted by assigning the demand matrix \(X\) to the network, each of which can be expressed in terms of the proportions of the OD demand flows passing through the count location at a certain link. In terms of the assignment matrix \(A(X)\), which collects the proportions of OD flows contributing to each link traffic count, this relationship is (Eq. 7.60):

$$Y=A\left(X\right)*X$$
(7.60)

This is a bi-level optimization problem that solves (at the upper level) the nonlinear optimization problem obtained by substituting the estimated flows \(Y\) in the objective function (Eq. 7.58) using the relationship in Eq. 7.60. Thus, it results in (Eq. 7.61):

$$\mathrm{Min}\,\,Z\left(X,Y\right)={w}_{1}*{F}_{1}\left(X,{X}^{H}\right) +{w}_{2}{*F}_{2}\left(A\left(X\right)*X,\widehat{Y}\right)$$
(7.61)
$$s.t.\,\, X\ge 0$$

The upper level thus estimates a new OD-matrix \(X\), while, at the lower level, a Static User Equilibrium Assignment is used to solve the assignment problem \(Y=\mathrm{Assignmt}\left(X\right)\), i.e., to estimate the assignment matrix \(A\left(X\right)\) induced by the new \(X\). Spiess (1990) is a good example of a seminal model based on this approach. Static models have made wide use of the analytical approaches that include flow counts as complementary information to reduce indeterminacy when solving the minimization problem (Eq. 7.61), as in Codina and Montero (2006), Lundgren and Peterson (2008), and Spiess (1990). The various algorithmic approaches to numerically solve the problem look for algorithmic efficiency, convergence properties, and stability. However, since they are static, they are supported by static assignment models.

In this context, some researchers such as Frederix et al. (2011), Lundgren and Peterson (2008), Toledo and Kolechkina (2013), or Yang et al. (2017) drew attention to the role played by the quality of the assignment matrix, which results from the lower level assignment process when estimating the flows used in the upper level. Therefore, they proposed either analytical or empirical approaches for improving it. The analytical approaches assume a functional dependency that allows for a Taylor expansion around the current solution. While some authors like Lundgren and Peterson (2008) still derive the expansion from a static traffic assignment, others like Frederix et al. (2013) or Toledo and Kolechkina (2013) propose a dynamic traffic assignment to account for time dependencies. The approaches based on the hypothesis of linear relationships may become invalid when congestion builds up in the network, resulting in nonlinearities. The dynamic assignment is more appropriate for working with congestion building processes, which are captured by the analytical expansion of the dynamic assignment matrix. Frederix et al. (2013) offer a relevant theoretical contribution, while Toledo and Kolechkina (2013) provide more insights to apply it to large networks.

A simpler approach is the modification of the Spiess procedure performed by Ros-Roca et al. (2020). On the one hand, they used a first-order approximation to the assignment matrix, obtained by replacing the static assignment at the lower level with a dynamic traffic assignment. On the other hand, they used an ad hoc reformulation of the analytical calculation of the gradient that is suitable for a straightforward calculation of the step length at each iteration.

The following notation is used for the dynamic analytical extension from this point until the end of the chapter:

  • \(I\) is the set of OD pairs.

  • \(\mathcal{T}=\left\{1, \dots ,T\right\}\) is the set of time intervals.

  • \(L\) is the set of links in the network. \(\widehat{L}\subseteq L\) is the subset of links that have sensors.

  • \({\widehat{y}}_{l,t}\) are the measured flow counts at link \(l\) during time period \(t\). \({y}_{l,t}\) are the corresponding simulated flow counts, \(\forall l\in \widehat{L}\subseteq L\) and \(\forall t\in \mathcal{T}\). \(Y=\left({y}_{l,t}\right)\) and \(\widehat{Y}=\left({\widehat{y}}_{l,t}\right)\) are the link flow counts in vector form.

  • \({x}_{n,r}\) are the OD flows for the \(n-\mathrm{th}\) OD pair departing during time period \(r\), \(\forall n\in I\) and \(\forall r\in \mathcal{T}\). \(X=\left({x}_{n,r}\right)\) are the OD flows in vector form.

  • \({a}_{n,r}^{l,t}\) is the flow proportion of the \(n-\mathrm{th}\) OD pair, \(n\in I\), departing at time period \(r\in \mathcal{T}\) and captured by link \(l\in \widehat{L}\) at time period \(t\in \mathcal{T}\). \(A=\left[{a}_{n,r}^{l,t}\right]\) is the assignment matrix.

Given a network with a set of links \(L\), a set \(I\) of OD pairs, and the set of time periods \(\mathcal{T}\), the goal of the dynamic OD-matrix estimation problem is to find a feasible vector (OD-matrix) \({X}^{*}\in \mathrm{G}\subseteq {\mathbb{R}}_{+}^{I\times \mathcal{T}}\), where \({X}^{*}=\left({x}_{n,r}^{*}\right), n\in I, r\in \mathcal{T}\) consists of the demands for all OD pairs. It can be assumed that the assignment of the time-sliced OD-matrices to the links of the network should be done according to an assignment proportion matrix \(A=\left[{a}_{n,r}^{l,t}\right], \forall l\in L, \forall n\in I, \forall r,t\in \mathcal{T}\), where each element in the matrix is defined as the proportion of the OD demand \({x}_{n,r}\) that uses link \(l\) at time period \(t\). The notation \(A=A\left(X\right)\) is used to indicate that, in general, these proportions depend on the demand. The linear relationship between the flow count on a link and the given OD pair has a matrix form, which thus sets the vector of detected flows as \(Y=\left({Y}_{1}, \dots ,{Y}_{T}\right)=({y}_{\mathrm{1,1}}, \dots , {y}_{L,1}, \dots ,{y}_{1,T}, \dots {y}_{L,T}\)) and the vector of OD flows as \(X=\left({X}_{1}, \dots ,{X}_{T}\right)=({x}_{\mathrm{1,1}}, \dots , {x}_{N,1}, \dots ,{x}_{1,T}, \dots , {x}_{N,T})\). Expressing this relationship as the matrix product (Eq. 7.42), \(A\left(X\right)\) is now (Eq. 7.62):

$$A\left(X\right)=\left(\begin{array}{cccc}{A}^{1,1}& 0& \cdots & 0\\ {A}^{1,2}& {A}^{2,2}& \ddots & \vdots \\ \vdots & \ddots & \ddots & 0\\ {A}^{1,T}& \cdots & {A}^{T-1,T}& {A}^{T,T}\end{array}\right),\quad \mathrm{where}\;\; {A}^{r,t}=\left(\begin{array}{ccc}{a}_{1,r}^{1,t}& \cdots & {a}_{N,r}^{1,t}\\ \vdots & \ddots & \vdots \\ {a}_{1,r}^{L,t}& \cdots & {a}_{N,r}^{L,t}\end{array}\right)$$
(7.62)

\({a}_{n,r}^{l,t}\) represents the proportion of the OD flow departing at time \(r\), \({x}_{n,r}\), passing through link \(l\) at time \(t\), \({y}_{l,t}\). \({A}^{r,t}\) represents the assignment matrix for the flows departing at time window \(r\) and detected at time window \(t\). Therefore, \(A\) is a block lower-triangular matrix, because OD flows departing at time \(r\) cannot pass through link \(l\) at time \(t<r\).
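The block lower-triangular structure of Eq. 7.62 is straightforward to exploit computationally. The following sketch, with random toy proportions standing in for the DTA output, shows how the stacked matrix \(A\) can be assembled and used to evaluate the link flows of Eq. 7.60.

```python
import numpy as np

N, L, T = 3, 2, 4                        # OD pairs, counted links, time intervals
rng = np.random.default_rng(0)

# Stacked assignment matrix: rows indexed by (l, t), columns by (n, r).
A = np.zeros((L * T, N * T))
for t in range(T):
    for r in range(t + 1):               # blocks with r > t stay zero (Eq. 7.62)
        A[t * L:(t + 1) * L, r * N:(r + 1) * N] = 0.1 * rng.random((L, N))

X = 100.0 * rng.random(N * T)            # stacked OD vector (x_{1,1},...,x_{N,T})
Y = A @ X                                # stacked link flows, Eq. 7.60
```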

This linear mapping between the link flows and the OD flows is indeed the first term in the Taylor expansion of the relationship between link flows and OD flows, where additional terms capture the assignment matrix’s sensitivity to changes in the OD flows, path choice, and congestion propagation effects (Frederix et al. 2011, 2013; Toledo and Kolechkina 2013). Let \(X\mathrm{^{\prime}}\) be in the neighborhood of \(X\). Then, the Taylor expansion is (Eq. 7.63):

$$\begin{aligned} y_{{l,t}} & = \sum\limits_{{n \in I}} {\sum\limits_{{r = 1}}^{t} {a_{{n,r}}^{{l,t}} } } \left( {X^{\prime}} \right)*x_{{n,r}}^{\prime } + \sum\limits_{{n \in I}} {\sum\limits_{{r = 1}}^{t} {\frac{{\partial y_{{l,t}} \left( {X^{\prime}} \right)}}{{\partial x_{{n,r}} }}} } *\left( {x_{{n,r}} - x_{{n,r}}^{\prime } } \right) = \\ & = \sum\limits_{{n \in I}} {\sum\limits_{{r = 1}}^{t} {a_{{n,r}}^{{l,t}} } } \left( {X^{\prime}} \right)*x_{{n,r}}^{\prime } \\ & \quad + \sum\limits_{{n \in I}} {\sum\limits_{{r = 1}}^{t} {\left. {\frac{{\partial \left[ {\sum\nolimits_{{n \in I}} {\sum\nolimits_{{r = 1}}^{t} {a_{{n,r}}^{{l,t}} \left( {X^{\prime}} \right)*x_{{n,r}} } } } \right]}}{{\partial x_{{n,r}} }}} \right|} } _{{X^{\prime}}} *\left( {x_{{n,r}} - x_{{n,r}}^{\prime } } \right) = \\ & = \sum\limits_{{n \in I}} {\sum\limits_{{r = 1}}^{t} {a_{{n,r}}^{{l,t}} } } \left( {X^{\prime}} \right)*x_{{n,r}}^{\prime } \\ & \quad + \sum\limits_{{n \in I}} {\sum\limits_{{r = 1}}^{t} {\left( {x_{{n,r}} - x_{{n,r}}^{\prime } } \right)} } *\left[ {\sum\limits_{{n^{\prime} \in I}} {\sum\limits_{{r^{\prime} = 1}}^{t} {\left. {\frac{{\partial a_{{n^{\prime},r^{\prime}}}^{{l,t}} \left( {X^{\prime}} \right)}}{{\partial x_{{n,r}} }}} \right|} } _{{X^{\prime}}} *x^{\prime}_{{n^{\prime},r^{\prime}}} } \right] \\ \end{aligned}$$
(7.63)

This enables redefining Spiess’ approach to the dynamic case by simply using the first term in the above Taylor expansion. It does not account for the propagation effects, but it explicitly considers time dependencies. The traffic assignment problem at the lower level must now be a dynamic traffic assignment (DTA). Then, the time periods for the entire formulation must be considered as follows (Eqs. 7.64 and 7.65):

$$\mathrm{Min}\,\,Z\left(X\right)=\frac{1}{2}*{\sum }_{t\in \mathcal{T}}{\sum }_{l\in \widehat{L}}{\left(\left({\sum }_{n\in I}{\sum }_{r = 1}^{t}{a}_{n,r}^{l,t}{*x}_{n,r}\right)-{\widehat{y}}_{l,t}\right)}^{2}$$
(7.64)
$$s.t.\,\, {a}_{n,r}^{l,t}=\mathrm{Assignment}\left(X\right)$$
(7.65)
$${x}_{n,r}\ge 0$$

where \({a}_{n,r}^{l,t}\) is the assignment matrix described before. The linear combination inside the brackets is therefore the simulated flow \({y}_{l,t}\), so that (Eq. 7.66):

$$\frac{\partial {y}_{l,t}}{\partial {x}_{n,r}}={a}_{n,r}^{l,t}$$
(7.66)

As in Spiess (1990), the chain rule can be used to obtain the gradient of the objective function (Eq. 7.67):

$$\frac{\partial Z}{\partial {x}_{n,r}}= {\sum }_{t\in \mathcal{T}}{\sum }_{l\in \widehat{L}}\frac{\partial {y}_{l,t}}{\partial {x}_{n,r}}*\left({y}_{l,t} - {\widehat{y}}_{l,t}\right)= {\sum }_{t\in \mathcal{T}}{\sum }_{l\in \widehat{L}}{a}_{n,r}^{l,t}*\left({y}_{l,t} - {\widehat{y}}_{l,t}\right)$$
(7.67)

Similar equations are obtained for the optimal step size by using the same procedure (Eq. 7.68):

$${y}_{l,t}^{\mathrm{^{\prime}}}= \frac{d {y}_{l,t}}{d \lambda }= {\sum }_{r=1}^{t}{\sum }_{n\in I}\frac{d {x}_{n,r}}{d \lambda }*\frac{\partial {y}_{l,t}}{\partial {x}_{n,r}}= {\sum }_{r=1}^{t}{\sum }_{n\in I}-{x}_{n,r}*\frac{\partial Z}{\partial {x}_{n,r}}*\frac{\partial {y}_{l,t}}{\partial {x}_{n,r}}$$
(7.68)

The optimal step length \(\lambda\) can be calculated by solving the one-dimensional optimization problem in Eq. 7.69, whose solution is given by Eq. 7.70:

$${\mathrm{Z}}^{\mathrm{^{\prime}}}\left(\lambda \right)= {\sum }_{t\in \mathcal{T} }{\sum }_{l\in \widehat{L}}{y}_{l,t}^{\mathrm{^{\prime}}}*\left({\tilde{y }}_{l,t}-{\widehat{y}}_{l,t}+\lambda *{y}_{l,t}^{\mathrm{^{\prime}}}\right)=0$$
(7.69)
$${ \lambda }^{*}= \frac{-{\sum }_{t\in \mathcal{T}}{\sum }_{l\in \widehat{L}}{y}_{l,t}^{^{\prime}}*({y}_{l,t}-{\widehat{y}}_{l,t})}{{\sum }_{t\in \mathcal{T}}{\sum }_{l\in \widehat{L}}{{y}_{l,t}^{^{\prime}}}^{2}}$$
(7.70)

Then, the iterative procedure described by Spiess (1990) can be used in DTA with these new equations, which are expanded with the time windows. In addition, this procedure can be improved by adding a second term to the objective function to compare the estimate with a historical OD-matrix. If the quadratic function is used, replacing \({w}_{1}\) and \({w}_{2}\) by \(w={w}_{2}/{w}_{1}\) for simplification, Eq. 7.71 arises:

$$\begin{aligned} {\text{Min}}\,\,Z & = \frac{1}{2}*\sum _{{t \in {\mathcal{T}}}} \sum _{{l \in \hat{L}}} \left( {\left( {\sum _{{n \in I}} \sum _{{r = 1}}^{t} a_{{n,r}}^{{l,t}} *x_{{n,r}} } \right) - \hat{y}_{{l,t}} } \right)^{2} \\ & \quad + \frac{w}{2}*\sum _{{r \in {\mathcal{T}}}} \sum _{{n \in I}} \left( {x_{{n,r}} - x_{{n,r}}^{H} } \right)^{2} \\ \end{aligned}$$
(7.71)

In this case, Eq. 7.67 is updated, resulting in Eq. 7.72:

$$\begin{aligned} \frac{\partial Z}{\partial x_{n,r}} & = \sum_{t\in \mathcal{T}}\sum_{l\in \widehat{L}}\frac{\partial y_{l,t}}{\partial x_{n,r}}*\left(y_{l,t}-\widehat{y}_{l,t}\right)+w*\left(x_{n,r}-x_{n,r}^{H}\right) \\ & = \sum_{t\in \mathcal{T}}\sum_{l\in \widehat{L}}a_{n,r}^{l,t}*\left(y_{l,t}-\widehat{y}_{l,t}\right)+w*\left(x_{n,r}-x_{n,r}^{H}\right) \end{aligned}$$
(7.72)

Therefore, the Iterative Dynamic Spiess Procedure would be (Eq. 7.73):

$${X}_{i}^{\left(k+1\right)}= \left\{\begin{array}{cc}{X}_{i}^{H}& \mathrm{for}\,\, k=0\\ {X}_{i}^{\left(k\right)}*\left(1-{\lambda }^{\left(k\right)}*{\left[\frac{\partial Z\left(X\right)}{\partial {X}_{i}}\right]}_{{X}_{i}^{\left(k\right)}}\right)& \mathrm{for}\,\, k>0\end{array}\right.$$
(7.73)
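Putting Eqs. 7.64–7.73 together, one iteration of the dynamic Spiess procedure admits a compact vectorized sketch. The function below is illustrative only: it assumes that the stacked assignment matrix \(A\) of Eq. 7.62 has been re-estimated by the DTA before each call, and the optional \(w\) term corresponds to the demand-anchoring gradient of Eq. 7.72.

```python
import numpy as np

def dynamic_spiess_step(X, A, y_hat, X_hist=None, w=0.0):
    """One iteration of the dynamic Spiess procedure (Eqs. 7.64-7.73).

    X, X_hist : stacked OD vectors over (n, r); A : stacked assignment matrix
    (Eq. 7.62); y_hat : stacked measured counts over (l, t).
    """
    y = A @ X                                   # simulated counts (Eq. 7.60)
    grad = A.T @ (y - y_hat)                    # gradient (Eq. 7.67)
    if X_hist is not None and w > 0.0:
        grad += w * (X - X_hist)                # anchoring term (Eq. 7.72)
    y_prime = A @ (-X * grad)                   # flow derivative (Eq. 7.68)
    den = float(y_prime @ y_prime)
    lam = -float(y_prime @ (y - y_hat)) / den if den > 0 else 0.0   # Eq. 7.70
    # Multiplicative update (Eq. 7.73); clipping at zero is a safeguard only.
    return np.maximum(X * (1.0 - lam * grad), 0.0)
```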

The use of Euclidean distances to measure the distance between the estimated OD-matrix, \(X\), and the historical one, \({X}^{H}\), has been critically discussed in Frederix et al. (2013). For example, Djukic (2014) shows that using a Euclidean distance term can result in two matrices that have very different structures but maintain the same distance value with respect to the reference matrix. Other distance measures have been suggested, for example, in Ros-Roca et al. (2020). Although additional measurements are expected to improve the outcome of the OD-estimation in terms of structural similarity, the analytic approaches do not seem capable of incorporating measurements other than link counts.

Resorting to the classical entropy function, as in the original analytical formulations, is an appealing option because of its structural meaning. With this approach, Eqs. 7.71 and 7.72, respectively, become Eqs. 7.74 and 7.75:

$$\begin{aligned} {\text{Min}}\,\,Z & = \frac{1}{2}*\sum _{{t \in {\mathcal{T}}}} \sum _{{l \in \hat{L}}} \left( {\left( {\sum _{{n \in I}} \sum _{{r = 1}}^{t} a_{{n,r}}^{{l,t}} *x_{{n,r}} } \right) - \hat{y}_{{l,t}} } \right)^{2} \\ & \quad + \frac{w}{2}*\sum _{{r \in {\mathcal{T}}}} \sum _{{n \in I}} x_{{n,r}} *{\text{log}}\left( {\frac{{x_{{n,r}} }}{{x_{{n,r}}^{H} }}} \right) \\ \end{aligned}$$
(7.74)
$$\frac{\partial Z}{\partial {x}_{n,r}}= {\sum }_{t\in \mathcal{T}}{\sum }_{l\in \widehat{L}}{a}_{n,r}^{l,t}*\left({y}_{l,t} - {\widehat{y}}_{l,t}\right)+\frac{w}{2}*\left(\mathrm{log}\left(\frac{{x}_{n,r}}{{x}_{n,r}^{H}}\right)+1\right)$$
(7.75)

5 Practical Applications for Traffic Management

Because DTA is a core component of most Dynamic Traffic Management Systems and the Dynamic Origin–Destination Matrices are the main input to DTA, algorithms to numerically implement DODME approaches become a basic procedure in all of them. The main approaches are:

  • The strict analytical dynamic approaches based on State-Space Modeling (Ashok and Ben-Akiva 1993, 2002), which are the basis of DynaMIT (Ben-Akiva et al. 2020).

  • The numerical approximations of analytical optimization approaches, as the ones proposed by Frederix et al. (2011), Frederix et al. (2013), Toledo and Kolechkina (2013), or Ros-Roca et al. (2020). Other variants are those studied by Djukic et al. (2017, 2018, 2019), currently implemented in Aimsun Live, Aimsun (2020), or OPTIMA.

  • Simulation-based approaches: Simultaneous Perturbation Stochastic Approximation (SPSA).

5.1 Analytical Approaches Based on State-Space Modeling

The approach taken in DynaMIT to estimate dynamic OD-matrices, aimed at providing support to real-time management decisions, is different from the bi-level optimization considered so far. DynaMIT formulates the real-time dynamic OD-estimation based on the Kalman Filtering framework proposed by Ashok and Ben-Akiva (1993). The basic information, as in all other approaches, is that contained in the historical OD-matrix, which is combined with traffic count data from the counting stations along the network. Another differential aspect of the estimation proposed in DynaMIT is the use it makes of each day’s estimate to update the original historical OD estimate in a learning process. These updated historical OD-matrices contain rich information about the latent factors that affect travel demand and its daily variations, which the approach tries to capture. To achieve this goal, this approach uses as state variables the deviations of the OD flows from the historical OD estimates, instead of the actual flows themselves.

The underlying hypothesis (Antoniou et al. 2007) states that modern surveillance systems generate data and historical information that can be used for the estimation and prediction of the time-evolving demand patterns represented by OD-matrices. The wealth of information contained in these off-line values, which affects trip making and traffic dynamics, as well as their temporal and spatial evolution, can be incorporated into the DODME process as a priori estimates.

The approach based on Kalman filtering assumes an autoregressive procedure that provides a prediction tool consistent with the estimation process. That autoregressive procedure models the temporal relationships among deviations in OD flows, also accounting for unobserved factors that are correlated over time, as, for instance, weather effects. A proper approach that incorporates this information and its associated errors in the estimation process considers transport systems as dynamic systems and resorts to the state-space modeling approach. The formulation of the DODME problem discussed so far shows that the most critical issue is the calculation of the assignment matrix, \({a}_{ijr}^{lt},\) mapping the observed link flows, \({y}_{lt}\), and the unobserved OD flows, \({x}_{ijr}\). This matrix must be estimated at each step of the iterative processes by solving numerically the corresponding mathematical model (Eq. 7.76):

$${y}_{lt}={\sum }_{\left(i,j\right)\in I}{\sum }_{r = 1}^{t}{a}_{ijr}^{lt}{*x}_{ijr}\,\, \forall l\in \widehat{L}, t\in T$$
(7.76)

The dynamic problem formulation assumes that the assignment matrix depends on link and path travel times and on traveler route choice factors, all of which are time-varying. Precisely, these time variations are captured by the time indices in Eq. 7.76. The mapping can be interpreted as the contribution, i.e., the fraction, of the OD flow of pair \((i,j)\), departing origin \(i\) with destination \(j\) at time interval \(r\), that flows across the detectors located at link \(l\) during time interval \(t\).

Ashok and Ben-Akiva (2002), in an extension to their previous seminal work in Ashok and Ben-Akiva (1993), make the observation that “all quantities are imperfectly observed, thereby they introduce errors into the OD estimation process, erroneous travel times and/or route choice fractions resulting in an imperfect assignment matrix”. Therefore, they propose reformulating Eq. 7.76 as Eq. 7.77:

$${y}_{lt}={\sum }_{\left(i,j\right)\in I}{\sum }_{r = 1}^{t}{a}_{ijr}^{lt}*{x}_{ijr}+{\nu }_{lt}\,\, \forall l\in \widehat{L}, t\in T$$
(7.77)

where \({\nu }_{lt}\) is the measurement error. The reformulation of the DODME as a state-space model involves two types of equations:

  • Transition equations that capture the evolution of the state vector over time.

  • Measurement equations that, according to Antoniou et al. (2007), “capture a mapping of the state vector on the measurements: a priori values of the model parameters provide direct measurements of the unknown parameters”.

Let \({X}_{k}\) be the vector of state variables whose values define the state of the system at time interval \(k\). A Kalman filter iterates between an updating (prediction) of the system’s state at time \(k\), obtained from the system’s state at time \(k-1\), and a correction based on an update of the measurements of the system. This corresponds to a process model that models the transformation of the system’s state in terms of a linear stochastic difference equation (Eq. 7.78):

$${X}_{k}=\Phi {X}_{k-1}+{w}_{k-1}$$
(7.78)

where \(\Phi\) is the transition matrix from system’s state at time \(k-1\) to system’s state at time \(k\), and \({w}_{k-1}\) is the process error term. Additionally, a measurements model describes the relationship between the process changing the system’s state and the system measurements (Eq. 7.79):

$${Y}_{k}=A*{X}_{k}+{v}_{k}$$
(7.79)

Assuming initial estimates of the state vector \({\widehat{X}}_{k-1}\) and of the error covariance \({P}_{k-1}\) at time interval \(k-1\), the prediction phase consists of two steps: (i) a state projection step (Eq. 7.80) and (ii) a covariance projection step (Eq. 7.81), respectively, projecting forward the state estimate or the covariance from time step \(k-1\) to step \(k\):

$${\widehat{X}}_{k}^{k-1}=\Phi *{\widehat{X}}_{k-1}^{k-1}$$
(7.80)
$${P}_{k}^{k-1}=\Phi *{P}_{k-1}^{k-1}{*\Phi }^{T}+Q$$
(7.81)

The correction regarding the measurements update consists of three steps: (i) the computation of the Kalman Gain (Eq. 7.82), (ii) the update of the error covariance (Eq. 7.83), and (iii) the update of the state estimates with the measurements \({Y}_{k}\) (Eq. 7.84):

$${K}_{k}={P}_{k}^{k-1}*{A}^{T}*{\left(A{*P}_{k}^{k-1}{*A}^{T}+R\right)}^{-1}$$
(7.82)
$${P}_{k}^{k}=\left(I-{K}_{k}*A\right)*{P}_{k}^{k-1}$$
(7.83)
$${\widehat{X}}_{k}^{k}={\widehat{X}}_{k}^{k-1}+{K}_{k}*\left({Y}_{k}-A*{\widehat{X}}_{k}^{k-1}\right)$$
(7.84)

where \({w}_{k}\) and \({v}_{k}\), the process and measurement errors, are independent, white noise, and normally distributed (Eqs. 7.85 and 7.86):

$$p\left(w\right)\sim N\left(0,Q\right)$$
(7.85)
$$p\left(v\right)\sim N\left(0,R\right)$$
(7.86)

Q and R are, respectively, the covariance matrices of the process and the measurement errors.

When applying Kalman filtering to DODME, the state vector is the vector \(X\) of unknown OD flows, and the transition equation represents an autoregressive process. However, Ashok and Ben-Akiva (1993) state that “an autoregressive process can only capture interdependencies among OD flows. It does not include structural information about trip patterns, which are a function of spatial and temporal distribution of activities, as well as of the characteristics of the transportation system”. Therefore, it is desirable to modify the model in such a way that it also incorporates structural information. This information could be, for example, that contained in a prior estimate. For instance, a historical OD-matrix \({X}^{H}\) provided by a reliable surveillance system. It can be accommodated in the model by reformulating the state vector in terms of the deviations from that historical OD flows. The transition equation would then be as follows (Eq. 7.87):

$${X}_{ij\left(t+1\right)}-{X}_{ij\left(t+1\right)}^{H}=\sum_{r=t-s}^{t}\sum_{\left(p,q\right)\in I}{f}_{ijt}^{pqr}*\left({X}_{pqr}-{X}_{pqr}^{H}\right)+{w}_{ijt}$$
(7.87)

where \({f}_{ijt}^{pqr}\) describes the effect of the deviation \(\left({X}_{pqr}-{X}_{pqr}^{H}\right)\) on the deviation \(\left({X}_{ij\left(t+1\right)}-{X}_{ij\left(t+1\right)}^{H}\right)\). The first one is the deviation of the OD flow from origin \(p\) to destination \(q\) departing at time \(r\); equivalently, the second one is the deviation of the OD flow from origin \(i\) to destination \(j\) departing at time \(t+1\). Finally, \({w}_{ijt}\) is a random error term for OD pair \((i,j)\) at time \(t\), and \(s\) is the order of the autoregressive process, that is, the number of lagged OD flow deviations assumed to affect the OD deviation in interval \(t+1\). Equation 7.87 “models the temporal relationship among deviations in OD flows, capturing the correlation over time among deviations which arise from unobserved factors that are correlated over time. It assumes dependency of deviations corresponding to one OD pair on deviations corresponding to other OD pairs in prior periods” (Ashok and Ben-Akiva 1993). It can be rewritten in matrix form (Eq. 7.88):

$$\Delta {X}_{t+1}={X}_{t+1}-{X}_{t+1}^{H}=\sum_{r=t-s}^{t}{\Phi }_{t}^{r}*\left({X}_{r}-{X}_{r}^{H}\right)+{w}_{t}$$
(7.88)

In the general case, the computation of the transition matrix \({\Phi }_{t}^{r}\) involves estimating linear regression models for each OD pair and for each time interval. However, depending on the network topology, some of these correspondences may be ignored and thus the matrix is simplified. There are also other hypotheses enabling further simplifications, as, for example, the assumption that the autoregressive process remains constant with respect to \(t\); this implies that \({\Phi }_{t}^{r}\) depends only on the difference \((t-r)\) and not on the individual values of \(t\) and \(r\). Equation 7.77 can be rewritten accordingly to get the measurements equation in terms of deviations with respect to the historical values \({y}_{lt}^{H}\), as in Eq. 7.89:

$${y}_{lt}-{y}_{lt}^{H}={\sum }_{\left(i,j\right)\in I}{\sum }_{r =\mathrm{ t}-\mathrm{s}}^{t}{a}_{ijr}^{lt}*\left({x}_{ijr}-{x}_{ijr}^{H}\right)+{\nu }_{lt}\,\, \forall l\in \widehat{L}, t\in T$$
(7.89)

It can also be expressed in matrix form (Eq. 7.90):

$${\Delta {Y}_{t}=Y}_{t}-{Y}_{t}^{H}={\sum }_{r =\mathrm{ t}-\mathrm{s}}^{t}{A}_{r}^{t}*\left({X}_{r}-{X}_{r}^{H}\right)+{\nu }_{t}$$
(7.90)

where \({\nu }_{t}\) is the measurements random error vector at time \(t\). Error terms \({w}_{t}\) and \({\nu }_{t}\) are uncorrelated, zero-mean errors, i.e., \(E\left[{w}_{t}\right]=E\left[{\nu }_{t}\right]=0\). The variance–covariance matrices are \({Q}_{t}\) and \({R}_{t}\), respectively.

There is an additional advantage in reformulating the Kalman filtering in terms of deviations as state variables and measurements, since the traffic flow variables have skewed distributions (Antoniou et al. 2007). However, the deviations of these variables from available estimates have symmetric distributions and, hence, are more amenable to approximations by normal distributions. This is a useful property in terms of Kalman filtering (Kalman 1960; Gelb 1974). Then, assuming an initial state of the system \(\Delta {X}_{0}\), with mean \(\Delta {\overline{X} }_{0}\) and variance–covariance \({P}_{0}\), the Kalman filtering algorithm for DODME, for a time horizon \(T\) divided into \(N\) intervals of equal length, is:

(Figure a in the original: algorithm listing of the Kalman filtering procedure for DODME.)
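As a complement, the following sketch outlines one predict/correct cycle of this filter in the deviation variables. It is a textbook Kalman filter instantiated with the DODME matrices of Eqs. 7.88 and 7.90, not the actual DynaMIT implementation.

```python
import numpy as np

def kalman_dodme_step(dX, P, Phi, Q, A_t, R, dY_t):
    """One predict/correct cycle on OD-flow deviations (Eqs. 7.80-7.84).

    dX, P : previous state deviation and covariance; Phi, Q : transition matrix
    and process noise covariance (Eq. 7.88); A_t, R : assignment matrix and
    measurement noise covariance; dY_t : observed count deviations (Eq. 7.90).
    """
    # Prediction: project state and covariance forward (Eqs. 7.80-7.81).
    dX_pred = Phi @ dX
    P_pred = Phi @ P @ Phi.T + Q
    # Correction: Kalman gain and measurement update (Eqs. 7.82-7.84).
    K = P_pred @ A_t.T @ np.linalg.inv(A_t @ P_pred @ A_t.T + R)
    dX_new = dX_pred + K @ (dY_t - A_t @ dX_pred)
    P_new = (np.eye(len(dX)) - K @ A_t) @ P_pred
    return dX_new, P_new                 # OD estimate: X_t = X_t^H + dX_new
```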

Many alternative versions of these basic algorithms resorting to variants of Kalman filtering have been proposed, such as those in Ashok and Ben-Akiva (2002), Hu et al. (2001), Antoniou et al. (2007), and Lin and Chang (2007). In essence, many of the most appealing ones deal with the calculation of the matrices \(\Phi\) and \(A\), that is, with the characteristics of the autoregressive model and with the mapping between OD paths and link flows, the latter being the most critical. Antoniou et al. (2007) propose nonlinear relationships for the measurement equations, generically defined as (Eq. 7.91):

$${\Delta M}_{t}={M}_{t}-{M}_{t}^{H}=\mathcal{S}\left(\Delta {X}_{t}\right)-{M}_{t}^{H}+{v}_{t}$$
(7.91)

where \({M}_{t}\) is the vector of measurements at time \(t\), \(\mathcal{S}\left(\Delta {X}_{t}\right)\) is a simulation model and \({M}_{t}^{H}=\mathcal{S}\left(\Delta {X}_{t}^{H}\right)\). When traffic flow models are used to simulate the time progression of traffic flows through the network, they can be approximated by continuous functions \(h\left(x\right)\) (Antoniou 2004). These functions can be linearized to approximate the measurement equation as in Eq. 7.92:

$${H}_{t}={\left.\frac{\partial h\left({x}^{*}\right)}{\partial {x}^{*}}\right|}_{{x}^{*}={X}_{t}^{t-1}}$$
(7.92)

An example based on this linearization included in Antoniou (2004) and Antoniou et al. (2007) is the following Extended Kalman Filter (EKF):

(Figure b in the original: algorithm listing of the Extended Kalman Filter (EKF) for DODME.)
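A minimal sketch of such an EKF step is given below; the simulator \(\mathcal{S}(\cdot)\) is abstracted as a callable and the Jacobian of Eq. 7.92 is approximated by finite differences, an illustrative choice rather than the exact procedure of Antoniou et al. (2007). Note that the historical terms \(M_t^H\) cancel in the innovation.

```python
import numpy as np

def ekf_dodme_step(dX, P, Phi, Q, R, M_t, simulate, eps=1e-3):
    """Extended Kalman step: nonlinear measurement map S(.) (Eq. 7.91),
    linearized numerically around the predicted state (Eq. 7.92)."""
    dX_pred = Phi @ dX
    P_pred = Phi @ P @ Phi.T + Q
    base = simulate(dX_pred)                     # S(dX_pred)
    # Numerical Jacobian H_t, one column per state component (Eq. 7.92).
    H = np.column_stack([
        (simulate(dX_pred + eps * e) - base) / eps
        for e in np.eye(len(dX))
    ])
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    dX_new = dX_pred + K @ (M_t - base)          # M_t^H cancels in the innovation
    P_new = (np.eye(len(dX)) - K @ H) @ P_pred
    return dX_new, P_new
```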

Equation 7.91 also opens the door to the consideration of additional measurements in Kalman filters, other than traffic variables like the link flow counts from fixed counting stations (e.g., inductive loop detectors, magnetometers, etc.): for example, the travel times between pairs of points in the network, as measured by ICT applications (e.g., Bluetooth, GPS, etc.).

5.2 Aimsun Live

Aimsun Live (Aimsun 2020) is a professional software platform for traffic management with a DTA as its core engine and dynamic OD-matrices as its main input. The DODME process implemented in Aimsun Live is a variant of the numerical approximations of analytical optimization approaches discussed in Sect. 7.4. Djukic et al. (2017, 2018, 2019) reformulate the bi-level approach (Eq. 7.61) as in Eq. 7.93:

$$\mathrm{Min}\,\, Z\left(X\right)= \alpha *{\Vert X-{X}^{H}\Vert }^{2}+\left(1-\alpha \right)*{\Vert A\left(X\right)*X-\widehat{Y}\Vert }^{2}$$
(7.93)
$$s.t.\,\, X\ge 0$$

Assuming that the flow estimates are provided by the DTA at the lower level of the algorithmic framework in Fig. 7.4, implemented in Aimsun (2020), then \(Y=\mathrm{DTA}\left(X\right)\). This allows a Taylor expansion as in Eq. 7.63. Then, Djukic et al. (2018) propose a modified bi-level approach that, at iteration \(k\), replaces the upper level objective function in Eq. 7.93 with the approximation in Eq. 7.94:

$${Z}_{k}\left(X\right)=\alpha *{\Vert X-{X}^{H}\Vert }^{2}+{\Vert \widehat{Y}-{Y}_{k}-{A}_{k}*\left(X-{X}_{k}\right)\Vert }^{2}$$
(7.94)

where at iteration \(k\), \({X}_{k}\) is the estimated OD demand vector, \({A}_{k}\) the assignment matrix estimated from Aimsun’s DTA using Eq. 7.63 and \({Y}_{k}\) the vector of estimated link flow counts in the subset of links with counting stations. Aimsun’s DTA estimates \({A}_{k}\) by stopping the Taylor expansion at either the first or the second term, depending on the desired degree of accuracy or on the affordable computing cost. Djukic et al. (2018) propose to solve the approximated upper level optimization problem (Eq. 7.94) with non-negative variable constraints, using a gradient descent method. Particularly, one using as descent direction the one defined by the following gradient (Eq. 7.95):

$${d}_{k}=-\nabla {Z}_{k}\left(X\right)$$
(7.95)

This gradient can be calculated from Eq. 7.94 as (Eq. 7.96):

$$\begin{aligned} \nabla Z_{k} \left( X \right) & = 2*\alpha *\left( {X - X^{H} } \right) \\ & \quad + 2*\left( {A_{k}^{T} *A_{k} *X - A_{k}^{T} *\hat{Y} + A_{k}^{T} *Y_{k} - A_{k}^{T} *A_{k} *X_{k} } \right) \\ \end{aligned}$$
(7.96)

Then, the new OD-matrix for the lower level iteration \(k+1\) is given by Eq. 7.97:

$${X}_{k+1}= {X}_{k}+{\lambda }_{k}{*d}_{k}$$
(7.97)

where \({\lambda }_{k}\) is the optimal step length in the gradient movement along the descent direction. The gradient procedure to optimize Eq. 7.94 is also iterative. It recalculates the step size at each iteration until either a convergence criterion is met or a maximum number \(M\) of iterations is reached, whichever occurs first. At gradient iteration \(m\), the estimated demand is \({X}_{k}^{m}\), the search direction is given by \(\nabla Z\left({X}_{k}^{m}\right)\) (calculated from Eq. 7.96), and the step size can be calculated by solving Eq. 7.98 using any of the available line search procedures (Bazaraa et al. 1993):

$${\lambda }_{k}^{m}={Min}_{\lambda } Z\left[{X}_{k}^{m}-\lambda *\nabla Z\left({X}_{k}^{m}\right)\right]$$
(7.98)

However, since \(Z\left(X\right)\) is quadratic, the optimal step can be computed analytically using Eq. 7.99:

$${\lambda }_{k}^{m}=\frac{{\Vert \nabla Z\left({X}_{k}^{m}\right)\Vert }^{2}}{{\Vert \nabla Z\left({X}_{k}^{m}\right)\Vert }^{2}+{\Vert {A}_{k}*\nabla Z\left({X}_{k}^{m}\right)\Vert }^{2}}$$
(7.99)

The proposed algorithm iteratively updates the demand at iteration \(k+1\) from the demand at the previous iteration \(k\), until some convergence criteria are satisfied. The algorithm is modified with respect to the usual approaches to better fit the requirements of congested large-scale networks. The proposed modification relaxes the assumption on the link flow proportions provided by the DTA assignment matrix by computing the marginal effects of the demand deviations on the link flows given by Eq. 7.63. Therefore, it reduces the number of OD variables in Eq. 7.63 by including only those OD pairs whose change in demand values causes significant deviations in the link flows. The modified algorithm is, according to Djukic et al. (2018), as follows:

(Figure c in the original: algorithm listing of the modified gradient-based OD-estimation procedure of Djukic et al. (2018).)
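The following fragment sketches the inner, upper-level loop of this scheme (Eqs. 7.94–7.99) under the stated linearization. It is an illustrative reading of Djukic et al. (2018), not the production Aimsun Live code.

```python
import numpy as np

def djukic_upper_level(X_k, A_k, Y_k, X_hist, y_hat, alpha,
                       max_iter=20, tol=1e-6):
    """Gradient descent on the linearized objective of Eq. 7.94 with the
    analytic quadratic step of Eq. 7.99 and non-negativity of the demand."""
    X = X_k.copy()
    for _ in range(max_iter):
        # Gradient of the linearized objective (Eq. 7.96).
        residual = A_k @ (X - X_k) + Y_k - y_hat
        g = 2.0 * alpha * (X - X_hist) + 2.0 * (A_k.T @ residual)
        if np.linalg.norm(g) < tol:
            break
        # Analytic step length for the quadratic objective (Eq. 7.99).
        g2 = float(g @ g)
        lam = g2 / (g2 + float((A_k @ g) @ (A_k @ g)))
        X = np.maximum(X - lam * g, 0.0)         # descent step (Eq. 7.97)
    return X
```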

Computational testing of this modified bi-level optimization framework, which copes with the high dimensionality of nonlinear OD-estimation problems by computing the marginal effects only for the most significant OD pairs with respect to traffic observations, shows that it allows the modeler to control the trade-off between the simplicity of the model and the level of realism. It is thus very efficient for practical purposes.

5.3 Simulation-Based Approaches: Simultaneous Perturbation Stochastic Approximation (SPSA)

The optimization problem in Eqs. 7.58 and 7.59, as already mentioned, is highly underdetermined because there are many more variables than equations in the system. In other words, \(X\in {\mathbb{R}}^{\left|I\right|\times T}, Y\in {\mathbb{R}}^{\left|\widehat{L}\right|\times T}\) and \(\left|I\right|\gg |\widehat{L}|\). Therefore, the problem is very sensitive to the quantity of data and the detection layout in the real network. As the availability of new measurements like those provided by smartphone and GPS localization allows calculating travel times between arbitrary pairs of points, the use of these data seems to be a promising approach for reducing the aforementioned underdetermination. An apparently straightforward extension of the bi-level formulation in Eqs. 7.58 and 7.59 accounting for measured, \(\widehat{tt}\), and estimated travel times, \(tt\), would be the expansion of the objective function by adding a third term, \({F}_{3}\left(tt,\widehat{tt}\right)\). This term would be aimed at minimizing the distance between measured and estimated travel times between arbitrary pairs of points in the network, assuming that trips are most likely made via the shortest paths. The hypothetical formulation (Ros-Roca et al. 2021a) would be (Eqs. 7.100–7.102):

$$\mathrm{Min}\,\,Z\left(X\right)={w}_{1}{*F}_{1}\left(X,{X}^{H}\right)+{w}_{2}{*F}_{2}\left(Y,\widehat{Y}\right)+{w}_{3}*{F}_{3}\left(tt,\widehat{tt}\right)$$
(7.100)
$$s.t.\,\, Y\left(X\right)=\mathrm{Assignmt}\left(X\right)$$
(7.101)
$$tt\left(X\right)=\mathcal{F}\left(X\right)$$
(7.102)
$$X\in \Omega$$

Assuming that \(Y\left(X\right)=\mathrm{Assignmt}\left(X\right)=A\left(X\right)*X,\) that is, the relationship between the estimated link flows and the estimated OD-matrix defined by the assignment, the problem can be reformulated as follows (Eqs. 7.103 and 7.104):

$$\mathrm{Min}\,\,Z\left(X\right)={w}_{1}{*F}_{1}\left(X,{X}^{H}\right)+{w}_{2}{*F}_{2}\left(A\left(X\right)X,\widehat{Y}\right)+{w}_{3}{*F}_{3}\left(tt,\widehat{tt}\right)$$
(7.103)
$$s.t.\,\, tt\left(X\right)=\mathcal{F}\left(X\right)$$
(7.104)
$$X\in \Omega$$

The analytical relationship in Eq. 7.104 either does not exist or is unclear. In practice, however, travel times can be estimated if the assignment is a DTA. Therefore, it can be accepted that some kind of relationship exists, and the relationship \(tt\left(X\right)\sim \mathrm{Assignmt}\left(X\right)\) is assumed. The problem to be solved is again reformulated as (Eq. 7.105):

$$\mathrm{Min}\,\,Z\left(X\right)={w}_{1}*{F}_{1}\left(X,{X}^{H}\right)+{w}_{2}*{F}_{2}\left(Y,\widehat{Y}\right)+{w}_{3}*{F}_{3}\left(tt,\widehat{tt}\right)$$
(7.105)
$$s.t.\,\, \left(Y, tt\right)=\mathrm{Assignmt}\left(X\right)$$
$$X\in \Omega$$

As mentioned before, it is unclear how these new measurements can be included in the analytical formulations. Nevertheless, it seems rather easy to deal with them by using approaches based on derivative-free optimization methods that approximate the descent direction based on simulation. Among them, simulation optimization techniques are especially suited to deal with optimization problems that cannot be solved with the usual analytical algorithms. Some reasons are:

  • The objective function cannot be analytically expressed as a function of parameters because its evaluation requires a simulation. Therefore, it is not differentiable in terms of the parameters.

  • Evaluating the objective function is time-consuming, as it requires having simulated data for each evaluation of the function.

Simulation-based optimization techniques can be generically formulated assuming that there is a mathematical model \(\mathcal{M}\) with a set of parameters \(P=\left\{{p}_{1},{p}_{2},\dots ,{p}_{N}\right\}\) and an objective function \(\mathcal{F}\left(\mathcal{R}, S\right)\) defined as the sum of error functions between real observations \(\mathcal{R}\) and the corresponding simulated data \(S\). The purpose of \(\mathcal{M}\) is then to provide (Eq. 7.106):

$$\mathrm{Min}\,\,\mathcal{F}\left(\mathcal{R}, S\right)$$
(7.106)
$$s.t.\,\, P\in\Omega {\subseteq {\mathbb{R}}}^{N}$$

when \(\mathcal{F}\left(\mathcal{R}, S\right)\) (i) is non-convex and nonlinear, (ii) cannot be represented analytically as a function of the set of parameters \(P\), and (iii) has to be evaluated by simulation.

There is a wide range of simulation optimization techniques to solve Eq. 7.106. For example, Nelder–Mead, SNOBFIT, and SPSA are optimization techniques, either derivative-free or approximating the gradient, that evaluate the objective function using simulation. Osorio and Linsen (2015) approximate the upper level function by building a metamodel that can be solved analytically. Its conceptual diagram is depicted in Fig. 7.5.

Fig. 7.5

Conceptual diagram of the simulation-based optimization approach of Osorio and Linsen (2015)

Fig. 7.6

Conceptual diagram of a SPSA approach adding travel times

Simultaneous Perturbation Stochastic Approximation (SPSA) (Spall 1992) is commonly used in OD-matrix estimation (Cipriani et al. 2011; Cantelmo et al. 2014; Antoniou et al. 2015; Lu et al. 2015; Ros-Roca et al. 2020) and it can easily account for additional measurements (Bullejos et al. 2014; Antoniou et al. 2016; Carrese et al. 2017; Nigro et al. 2018). SPSA preserves the original upper level formulation and is easy to implement for simulation optimization problems.

SPSA is a simulation-based optimization algorithm that only requires two evaluations of the objective function to approximate the gradient, instead of \(N\) as in the case of a finite-difference gradient approach. As in many iterative procedures, it begins with an initial OD-matrix (usually a historical OD-matrix). The matrix at iteration \(k+1\) is computed from the matrix at iteration \(k\), moving a distance \({a}_{k}\) along the descent direction provided by the estimated gradient (Eq. 7.107):

$${X}_{k+1}={X}_{k}-{a}_{k} {*\widehat{g}}_{k}\left({X}_{k}\right)$$
(7.107)

Two particularities distinguish this method from the conventional gradient descent method:

  • The estimated gradient \({\widehat{g}}_{k}\left({X}_{k}\right)\), is calculated according to Eq. 7.108:

    $${\widehat{g}}_{k}\left({X}_{k}\right)=\frac{Z\left({X}_{k}+{c}_{k}*{\Delta }_{k}\right)-Z({X}_{k})}{{c}_{k}}*\left(\begin{array}{c}{\Delta }_{k,1}^{-1}\\ \vdots \\ {\Delta }_{k, N}^{-1}\end{array}\right)=\left(\begin{array}{c}\frac{Z\left({X}_{k}+{c}_{k}{*\Delta }_{k}\right)-Z\left({X}_{k}\right)}{{c}_{k}{*\Delta }_{\mathrm{k},1}}\\ \vdots \\ \frac{Z\left({X}_{k}+{c}_{k}{*\Delta }_{k}\right)-Z\left({X}_{k}\right)}{{c}_{k}{*\Delta }_{\mathrm{k},\mathrm{N}}}\end{array}\right)$$
    (7.108)

    where \({\Delta }_{k}\) is a random perturbation N-dimensional vector with \({\Delta }_{i}, \forall i\) independent identically distributed random variables that satisfy \({\mathbb{E}}\left({\Delta }_{i}\right)=0\) and \(\left|{\mathbb{E}}\left({\left({\Delta }_{i}^{-1}\right)}^{n}\right)\right|<\infty , \forall n\). One commonly used perturbation is \({\Delta }_{i}\sim Be(1/2, \pm 1)\), which is a Bernoulli distribution with a probability of ½ for each ± 1. This is the asymmetric design, although a symmetric design using \(Z\left({X}_{k}+{c}_{k}{*\Delta }_{k}\right)\) and \(Z\left({X}_{k}-{c}_{k}{*\Delta }_{k}\right)\) can also be considered.

  • The spacing coefficient \({c}_{k}\) and the step size \({a}_{k}\) are decreasing sequences of positive real values, and they satisfy some regularity conditions in order to ensure the convergence of the method, as detailed in Spall (1992). Typically, the sequences used are (Eqs. 7.109 and 7.110):

    $${a}_{k}= \frac{a}{{\left(A+k+1\right)}^{\alpha }}$$
    (7.109)
    $${c}_{k}= \frac{c}{{\left(k+1\right)}^{\gamma }}$$
    (7.110)

    where \(a,A\) and \(c\) are chosen depending on the problem, while \(\alpha =0.602\) and \(\gamma =0.101\).

Averaging many independent estimates of the gradient of Eq. 7.108 contributes to a more stable and quicker convergence of the SPSA method (Spall 1992). Therefore, the gradient estimation is finally calculated as (Eq. 7.111):

$$\widehat{g}\left({X}_{k}\right)=\frac{1}{{n}_{g}}*\sum_{j=1}^{{n}_{g}}{\widehat{g}}_{k}^{j}\left({X}_{k}\right)$$
(7.111)

where \({\widehat{g}}_{k}^{j}({X}_{k})\) is precisely calculated as in Eq. 7.108. The asymmetric design for the gradient saves a large number of assignments, since all \({\widehat{g}}_{k}^{j}\left({X}_{k}\right), \forall j\) share the mid-point \({X}_{k}\) evaluation.
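A compact sketch of this averaged asymmetric gradient estimate (Eqs. 7.108 and 7.111) could read as follows, with `Z` the simulation-based objective passed in as a callable; the single mid-point evaluation \(Z(X_k)\) is shared by all \(n_g\) perturbations.

```python
import numpy as np

def spsa_gradient(Z, X, c_k, n_g=5, rng=None):
    """Averaged asymmetric SPSA gradient estimate (Eqs. 7.108 and 7.111)."""
    rng = rng or np.random.default_rng()
    z0 = Z(X)                                    # shared mid-point evaluation
    g = np.zeros_like(X)
    for _ in range(n_g):
        delta = rng.choice([-1.0, 1.0], size=X.shape)   # Bernoulli(1/2, +-1)
        g += (Z(X + c_k * delta) - z0) / (c_k * delta)  # Eq. 7.108, per component
    return g / n_g                               # Eq. 7.111
```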

The versatility of simulation optimization techniques, especially when using SPSA, allows including additional information in a new form, such as the constraints in the OD-estimation problem. Ros-Roca et al. (2017) tried adding constraints to simulation optimization problems when dealing with the calibration of microsimulation models.

A potential improvement with respect to the original formulation (Bullejos et al. 2014; Cantelmo et al. 2014) replaces the plain gradient direction by a Conjugate Gradient (CG) direction (Luenberger and Ye 2008) in the optimization algorithm of the OD-estimation problem. This modifies the descent direction in the iterative procedure by reusing the direction from the previous iteration. It can be incorporated into SPSA by replacing Eq. 7.107 with Eqs. 7.112–7.114:

$${X}_{k}={X}_{k-1}+{a}_{k}{*d}_{k}$$
(7.112)
$${d}_{k}= -\widehat{g}\left({X}_{k}\right)+{\beta }_{k}*{d}_{k-1}$$
(7.113)
$${\beta }_{k}= \frac{{\widehat{g}{\left({X}_{k}\right)}^{T}*d}_{k-1}}{{\Vert {d}_{k-1}\Vert }^{2}}$$
(7.114)
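A direct transcription of this direction update might look as follows (a sketch; `g_k` is the averaged SPSA gradient estimate at the current iterate).

```python
import numpy as np

def cg_spsa_step(X_prev, g_k, d_prev, a_k):
    """SPSA step along a conjugate-gradient-style direction (Eqs. 7.112-7.114)."""
    beta = float(g_k @ d_prev) / float(d_prev @ d_prev)   # Eq. 7.114
    d_k = -g_k + beta * d_prev                            # Eq. 7.113
    return X_prev + a_k * d_k, d_k                        # Eq. 7.112
```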

SPSA’s main drawback for the OD-estimation problem is that all OD flows receive the same perturbation magnitude (Eq. 7.108). As OD flows usually have very different magnitudes, this implies very different relative changes to each flow, which can lead to convergence problems. Tympakianaki et al. (2015) approached this phenomenon by clustering the variables according to their magnitude. A different alternative is to normalize all variables to the interval \([0,1]\) using some particular reasonable bounds \(\left[{a}_{i},{b}_{i}\right]\). For example, Ros-Roca et al. (2018) performed a classical linear transformation from \(\left[{a}_{i},{b}_{i}\right]\) to \([0,1]\), where \({a}_{i}\) and \({b}_{i}\) were based on additional information from the network, particularly socioeconomic data or past reliable OD-matrices. The normalization was performed using the following linear application (Eq. 7.115):

$$\begin{array}{ccc}{\varphi }_{i}:& \left[{a}_{i}, {b}_{i}\right]\to & \left[\mathrm{0,1}\right]\\ \\ { X}_{i }& \mapsto & \frac{{{X}_{i}-a}_{i}}{{b}_{i}-{a}_{i}} \end{array}$$
(7.115)

Using the normalized variables in the SPSA procedure, each variable is perturbed according to its own magnitude.

Experience with similar problems shows that the selection of the SPSA gain sequences \({a}_{k}\) and \({c}_{k}\) is crucial for the convergence and performance of the algorithm. The sequences in the form of Eqs. 7.109 and 7.110 are widely used, as they satisfy the conditions of convergence proved in Spall (1992). This reduces the problem to selecting appropriate values for \(a, A, \alpha , c\) and \(\gamma\). Kostic et al. (2017b) showed the sensitivity of SPSA with respect to these parameters. Based on the guidelines in Spall (2003), an automated selection of the parameters \(a, A\) and \(c\) can be based on the objective function's variability that results from the simulation, and on the desired perturbation steps in the early iterations. The selection would be done according to the following schema (sketched in code after the list):

  • First, those values stated as optimal for convergence in Spall (1998) are fixed. That is, \(\alpha =0.602, \gamma =0.101.\)

  • Several evaluations of \(Z({X}^{H})\) are computed to capture the variability of the objective function. Since the variables have been normalized, it seems natural to use the coefficient of variation (\(CoV\left(Z\right) ={\sigma }_{Z}/{\mu }_{Z}\)) for this purpose. The parameter \(c\) is set at \(c=CoV\).

  • A is set as \(10\mathrm{\%}\) of the maximum number of iterations (\(A=0.1\cdot {\mathrm{iter}}_{\mathrm{max}}\)).

  • \({n}_{g}\) experiments are simulated using the SPSA logic \({X}_{i}={X}^{H}+c{\Delta }_{N}\). This allows finding the respective gradients \({g}_{k}\) as in the SPSA procedure.

  • The desired iterative modification of the first iteration must be determined with Eq. 7.116:

    $${X}_{k+1}={X}_{k}-{a}_{k}*{g}_{k}\to \left|{X}_{k+1}-{X}_{k}\right|=\left|{a}_{k}*{g}_{k}\right|$$
    (7.116)
  • The corresponding \(a\) for the desired change in the initial iteration must be computed using Eq. 7.117:

    $$\left|{a}_{k}*{g}_{k}\right|= \frac{a}{{\left(1+A+k\right)}^{\alpha }}*\left|{g}_{k}\right|\to a=\frac{\left|{a}_{k}*{g}_{k}\right|*{\left(1+A+k\right)}^{\alpha }}{\left|{g}_{k}\right|}$$
    (7.117)
  • The minimum of the \({n}_{g}\) performed experiments must be finally chosen. That is (Eq. 7.118):

    $$a=\mathrm{min}\left\{{a}_{1}, \dots , {a}_{{n}_{g}}\right\}$$
    (7.118)
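A possible automation of this schema, reusing the `spsa_gradient` sketch above, is given below. The objective `Z` is assumed stochastic (a simulation), and the elementwise maximum is used as the norm \(\left|{g}_{k}\right|\), which is an assumption of this sketch.

```python
import numpy as np

def tune_spsa_gains(Z, X_hist, desired_change, iter_max=100, n_rep=5, n_g=5,
                    rng=None):
    """Heuristic SPSA gain selection following the schema above."""
    rng = rng or np.random.default_rng()
    alpha, gamma = 0.602, 0.101                  # Spall (1998) recommendations
    # Variability of the simulated objective at the historical matrix.
    z = np.array([Z(X_hist) for _ in range(n_rep)])
    c = float(z.std() / z.mean())                # coefficient of variation
    A = 0.1 * iter_max                           # 10% of the iteration budget
    # Candidate values of 'a' from pilot gradients (Eqs. 7.116-7.117, k = 0).
    candidates = []
    for _ in range(n_g):
        g = spsa_gradient(Z, X_hist, c, n_g=1, rng=rng)
        g_norm = float(np.abs(g).max())
        if g_norm > 0:
            candidates.append(desired_change * (1.0 + A) ** alpha / g_norm)
    return min(candidates), A, c, alpha, gamma   # Eq. 7.118
```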

As already mentioned, the underdetermination of the OD-estimation problem can lead to different adjusted OD-matrices that reproduce the same traffic counts at the sensor locations. Furthermore, the adjusted OD-matrix can also be inconsistent with the socioeconomic factors of the area under study. In traffic analyses, practitioners usually have access to historical data in the form of an OD-matrix \({X}^{H}\) which, with a certain degree of uncertainty, provides prior information about the mobility patterns of the target area. Therefore, including constraints in the SPSA formulation that account for this information can lead to more realistic results. A possible approach is to add bounding values to the OD values, which is not easy to do in analytical formulations (Codina and Montero 2006) but is relatively easy to manage in SPSA. In Cipriani et al. (2011), a single generation constraint is added to the minimization problem (Eq. 7.119):

$$\sum_{i=1}^{{n}_{h}}{G}_{o}^{i}\le {G}_{o}^{*}\;\; \forall o\in \left\{\mathrm{origins}\right\}$$
(7.119)

with \({G}_{o}^{*}\) being the a priori generation value for the origin zone \(o\) and \({n}_{h}\) the number of time periods. Other approaches, such as that of Ros-Roca et al. (2020), specify upper and lower bounds for each OD flow, defined in terms of a percentage \(\beta\) of this flow's historical value, according to its degree of uncertainty. With these constraints, the minimization problem is updated as follows (Eqs. 7.120 and 7.121):

$$\mathrm{Min}\,\,Z(X,Y) ={w}_{1}{*F}_{1}\left(X,{X}^{H}\right)+{w}_{2}{*F}_{2}\left(Y, \widehat{Y}\right)$$
(7.120)
$$s.t.\,\, Y=\mathrm{Assignmt}\left(X\right)$$
(7.121)
$$X\in G=\left\{\left(1-\beta \right)*{x}_{n,r}^{H}\le {x}_{n,r}\le \left(1+\beta \right)*{x}_{n,r}^{H} , \forall {x}_{n,r}\in X\right\}\subset {\mathbb{R}}_{+}^{I\times T}$$
$$X\ge 0$$

The single constraint in Eq. 7.119 results from summing, for each origin, all the upper bounds in the former minimization problem. Aggregating all the constraints makes the feasible region bigger: greater values are therefore allowed for some variables, compensated by others taking low values. On the contrary, the constrained SPSA proposal in Ros-Roca et al. (2020) defines a smaller feasible region that accounts for further information on each OD pair.

These constraints added to the problem also have an effect on the originally presented SPSA algorithm. Sadegh and Spall (1998) proposed to add a projection onto the set \(G\) during the iterative procedure shown in Eq. 7.107. The projection would be applied only to the iterative update, as \({X}_{k+1}={\pi }_{G}\left({X}_{k}-{a}_{k}{*\widehat{g}}_{k}({X}_{k})\right)\), while \(Z({X}_{k}+{c}_{k}{*\Delta }_{k})\) could be computed subject only to non-negative OD values. This method, in which some strict constraints are added to the procedure, is called Constrained SPSA.
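For the box set \(G\) of Eq. 7.121, the projection \({\pi }_{G}\) reduces to elementwise clipping, as in this short sketch.

```python
import numpy as np

def project_to_G(X, X_hist, beta):
    """Projection onto the box G of Eq. 7.121 (Constrained SPSA update:
    X_next = project_to_G(X - a_k * g_hat, X_hist, beta))."""
    return np.clip(X, (1.0 - beta) * X_hist, (1.0 + beta) * X_hist)
```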

Inspired by Wang and Spall (1999), other formulations equivalent to Eqs. 7.120 and 7.121 add penalty functions to the objective function (Eqs. 7.122 and 7.123):

$$\mathrm{Min}\,\,Z(X,Y) ={w}_{1}{*F}_{1}\left(X,{X}^{H}\right)+{w}_{2}{*F}_{2}\left(Y, \widehat{Y}\right)+{r}_{k}*P\left(X, {X}^{H}\right)$$
(7.122)
$$s.t.\,\, Y=\mathrm{Assignmt}\left(X\right)$$
(7.123)
$$X\ge 0$$

where \({r}_{k}\) is an increasing sequence of the form \({r}_{k}=r{*\left(1+k\right)}^{\rho }\) and \(P\left(X, {X}^{H}\right)\) is a penalization function for the set of constraints that delimit \(G\). Formally (Eq. 7.124):

$$G\triangleq \left\{{q}_{n,r}\left(X, {X}^{H}\right)\le 0, \forall n\in I, r\in T\right\}=\left\{{x}_{n,r}-\left(1+\beta \right)*{x}_{n,r}^{H}\le 0,\;\; \left(1-\beta \right)*{x}_{n,r}^{H}-{x}_{n,r}\le 0\,\, \forall n\in I, r\in T\right\}$$
(7.124)

The penalty function \(P\left(X, {X}^{H}\right)\) must be differentiable, non-negative, and increasing. Wang and Spall (1999) propose summing, over all constraints, penalizing functions that satisfy \(p\left(x\right)=0\) if and only if \(x\le 0\). That is (Eq. 7.125):

$$\begin{aligned} P\left( {X,X^{H} } \right) & = \sum _{{n \in I}} \sum _{{r = 1}}^{T} w_{{n,r}} *p\left( {q_{{n,r}} \left( {X,X^{H} } \right)} \right) \\ & = \sum _{{n \in I}} \sum _{{r = 1}}^{T} w_{{n,r}} *{\text{max}}\left\{ {0,q_{{n,r}} \left( {X,X^{H} } \right)} \right\}^{2} \\ \end{aligned}$$
(7.125)

As in the previous variant, the iterative procedure is also modified to incorporate the gradient of the penalization function (Eq. 7.126):

$${X}_{k+1}={X}_{k}-{a}_{k}* {\widehat{g}}_{k}\left({X}_{k}\right)-{a}_{k}*{r}_{k}*\nabla P\left({X}_{k}, {X}^{H}\right)$$
(7.126)
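A minimal sketch of this penalized variant is given below, assuming the box constraints of Eq. 7.124 and uniform weights; the quadratic penalty of Eq. 7.125 admits the closed-form gradient used in the update of Eq. 7.126. All names are illustrative.

```python
import numpy as np

def penalty_and_gradient(x, x_hist, beta, w=1.0):
    """Quadratic penalty of Eq. 7.125 for the box constraints of Eq. 7.124,
    together with its analytical gradient (used in the update of Eq. 7.126)."""
    upper_violation = np.maximum(0.0, x - (1.0 + beta) * x_hist)   # x too large
    lower_violation = np.maximum(0.0, (1.0 - beta) * x_hist - x)   # x too small
    P = np.sum(w * (upper_violation ** 2 + lower_violation ** 2))
    grad = 2.0 * w * (upper_violation - lower_violation)
    return P, grad

def penalized_spsa_update(x, g_hat, a_k, r_k, x_hist, beta):
    """Iterate of Eq. 7.126: X_{k+1} = X_k - a_k*g_hat - a_k*r_k*grad P."""
    _, grad_P = penalty_and_gradient(x, x_hist, beta)
    return np.maximum(x - a_k * g_hat - a_k * r_k * grad_P, 0.0)
```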

When additional information from ICT measurements is available, it can be included in the SPSA formulation (Eq. 7.107) as long as it can be estimated from the current OD-matrix \(X\) by means of a DTA. This is, for example, the case of the subpath travel times \(\widehat{tt}\) measured either by Bluetooth (Bullejos et al. 2014; Antoniou et al. 2016) or by GPS tracking (Ros-Roca et al. 2021a). The logical diagram of this process is described in Fig. 7.6. The calculation of these observed subpath travel times \(\widehat{tt}\) requires identifying the most used paths from the available measurements and map matching them to the transport model supporting the DTA. This allows computing the corresponding estimated travel times \(tt\) from the current OD, which are added in the additional term to the objective function in Eq. 7.107. The processing of the GPS data to calculate \(\widehat{tt}\) is described in Sect. 7.6.

In Kostic et al. (2017a), the additional term of the objective function in Eq. 7.107 is formulated as a function of the measured speeds at detection stations equipped with conventional technologies (i.e., inductive loops), and the DTA used is TRE (Gentile et al. 2007; Gentile 2010), supporting OPTIMA.

6 Data-Driven Approaches

The availability of new traffic data supplied by ICT applications, i.e., mobile phones, image processing techniques for license plate recognition, Bluetooth devices, FCD from vehicles with onboard tracking devices such as GPS, etc., prompted research interest in finding the advantages of explicitly including these data in the OD-estimation methods. In this context, probe (or equipped) vehicles can be grouped into two generic classes (Nanthawichit et al. 2003; Eisenman and List 2004), according to the explanations in Chap. 1. First, those vehicles equipped with devices that can only be detected at specific locations (i.e., where the detection technology is located), as, for example, those equipped with a tag-reader or with a Bluetooth or Wi-Fi device. Known as "space-based" probe vehicles, their true origin and destination are not known, and approximate estimates can only be inferred, this inference being strongly dependent on the layout of the detection devices (e.g., tag-readers, Bluetooth antennas). Second, those vehicles equipped with wireless communication mobile devices that are fully visible in the areas covered by the corresponding telecommunications system. These systems can therefore provide seamless data about their location, speed, travel direction, etc., depending on the device. These are known as "time-based" probe vehicles.

Methodologies related to space-based probe vehicles that have received significant attention are those based on the identification and reidentification of the license plates of vehicles passing the area covered by a TV camera with LPR technology (Mo et al. 2020), and those based on the identification of Bluetooth devices between coupled pairs of Bluetooth antennas (Barceló et al. 2013; Behara et al. 2021). However, as already mentioned, the results of these methodologies strongly depend on the layout of the TV cameras or Bluetooth antennas in the network; this layout becomes a critical aspect for the observability of the system (Castillo et al. 2008) and thus determines the capability of the methods to estimate and predict its state.

As regards time-based probe vehicles, the pervasive penetration of mobile phones has allowed a better understanding of human mobility patterns from their traces, that is, by means of their digital footprints. As mobility patterns include information about where people are and how they got there, mobile phones were soon identified as an important data source for urban modeling. They attracted the interest of researchers and practitioners, as they were seen as a powerful data source that would allow overcoming the well-known drawbacks and limitations of conventional methods in transportation analysis (i.e., household surveys). Analyses are usually conducted using the so-called Call Detail Records (CDR), datasets previously recorded by a mobile provider for communication and billing purposes, after an anonymization process. A seminal example of this process can be found in González et al. (2008), where each individual calling activity is characterized to allow monitoring the user's movement over time. Calabrese et al. (2013) provide an example of techniques aimed at extracting useful mobility information from the mobile phone traces of millions of users, from which to infer individual mobility patterns in large urban areas, especially OD-matrices (Zhang et al. 2010; Calabrese et al. 2011). Since CDR are time tagged and locations can be identified after suitable processing, added value information for a variety of mobility analyses can be extracted from them (Çolak et al. 2015). Additionally, OD-matrices can be differentiated by purpose and time of the day (Alexander et al. 2015). However, this requires resorting to very specific Data Analytics techniques, given the huge amount of data frequently recorded from millions of users. Gundlegård et al. (2015) or Jianga et al. (2016) are good examples of this data processing to extract the OD-matrices.

However, the type of OD-matrices that the dynamic traffic models used in traffic management systems require as input is rather different from the matrices directly extracted from CDR. Indeed, the mobility patterns modeled by these latter OD-matrices are global, that is, they include all types of trips without distinguishing the transportation mode used. Conversely, the OD-matrices of interest for traffic management purposes are usually those modeling the passenger car patterns. Additional work is necessary to estimate these specific OD-matrices. For example, CDR OD-matrices can be combined with simulation models like MITSIM (Iqbal et al. 2014) or they can be fused with other data sources (Montero et al. 2019). Bassolas et al. (2019) also propose a fusion variant to generate inputs to activity-based travel demand models using MATSIM.

Among the time-based probe vehicles, the best suited to generate OD-matrices that can be exploited by dynamic traffic models seem to be those allowing the tracking of individual equipped vehicles and the reconstruction of their trajectories. Assuming that the data collected by the tracking technologies include geolocation and time stamps, i.e., waypoints in the terminology of commercial GPS providers, map matching and path inference procedures can provide comprehensive information about origins, destinations, taken paths, and path travel times. This was essentially the assumption in an early paper of van Aerde et al. (1993), accepting that probe vehicles were fully visible. The seminal papers of Nanthawichit et al. (2003) and Eisenman and List (2004) later adopted this hypothesis. Therefore, assuming that these sampled trajectory data are available, the question is whether and how they can be used to find sound estimates of dynamic OD-matrices, that is, OD-matrices discretized in time, exploiting for that purpose the time-tagged recorded data.

Research on the potential use of these mobile data for transport analysis has also prompted a key question with relevant practical applications. Most of the DODME approaches discussed in the previous sections assume that one of the inputs is provided by an available historical matrix. The reliability and quality of such historical OD-matrices has been questioned in transport planning practice, as they could be largely outdated or even not exist. However, this is not the case in most practical traffic management applications, due to the amount and quality of data supplied by modern surveillance systems. Therefore, the improvement of the seed matrices used in DODME by means of sample data from probe vehicles is a relevant contribution. However, the previous discussion on DODME approaches makes evident that all of them rely on the estimate of a dynamic assignment matrix. Because this assignment matrix must be estimated by a DTA or a DUE and the approach implies an iterative process, it can represent a heavy computational burden not affordable in real-time applications. Therefore, the key question is: can the dynamic traffic assignment matrix be empirically estimated from probe vehicle data? And, if possible, how can it be used to improve DODME approaches? A positive answer to the first question opens the door to building models some of whose components are derived directly from an empirical procedure based on the observed data, instead of from an analytical procedure. In other words, this approach paves the way for building data-driven models.

6.1 A Conceptual Proposal on Data-Driven Modeling

From this latter perspective, an interesting proposal is that made by Yang et al. (2017). They wanted to determine whether the availability of such trajectory data could be used to develop an approach to DODME independent of the reliability of a historical OD. That is, to make a good empirical estimation of the assignment matrix, making it unnecessary to resort to DTA. According to the above-mentioned statements, it is assumed that each probe vehicle reports its position in the form of GPS coordinates after a preprocessing procedure performed with map-matching techniques. In summary, the approach is as follows:

  • It is assumed that vehicle trajectories from origins to destinations are traceable for each probe vehicle, and that the supplied GPS locations have been suitably preprocessed by data cleansing and map-matching procedures. Therefore, GPS locations in the approach are assumed to be exact.

  • If \(\widehat{L}\subseteq L\) is the subset of links with counting stations, two link flow measurements are available for each time period \(r\). There are \({\widehat{y}}_{lr}, l\in \widehat{L}, r\in T\) flows from the counting stations at links \(l\in \widehat{L}\), and \({\widehat{h}}_{lr}, l\in \widehat{L}, r\in T\) flows of probe vehicles crossing that link \(l\) at time interval \(r\).

  • The OD probe ratios, that is, the average proportion of probe vehicles observed across the entire network during a time interval, are given by Eq. 7.127:

    $${\gamma }_{r}=\frac{\sum_{ l\in \widehat{L}}{\widehat{h}}_{lr}}{\sum_{l\in \widehat{L}}{\widehat{y}}_{lr}} \forall r\in T$$
    (7.127)
  • Thus, the seed OD-matrix \({\widehat{x}}_{nr}\) can be estimated with Eq. 7.128:

    $${\widehat{x}}_{nr}=\frac{{\widehat{z}}_{nr}}{{\gamma }_{r}} \forall n\in I, \forall r\in T$$
    (7.128)

    where \(I\) is the set of all OD pairs, and \({\widehat{z}}_{nr}\) is the number of identified probe vehicles traveling from the origin to the destination of the \(n-th\) OD pair.

  • The assumed identification of the locations of probe vehicles allows, in a similar way, directly estimating the assignment matrix (Eq. 7.129; see the sketch after this list):

    $${a}_{ln}^{t}=\frac{1}{\left|T\right|}*\sum_{r\in T}\left(\frac{{\widehat{z}}_{ln}^{r,r+t}}{{\widehat{z}}_{nr}}\right) t\in T, n\in I, l\in \widehat{L}$$
    (7.129)
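The three empirical estimates above (Eqs. 7.127–7.129) reduce to simple array operations. The following sketch assumes the probe observations have already been aggregated into dense arrays; the variable names are illustrative, not from Yang et al. (2017).

```python
import numpy as np

def empirical_seed_and_assignment(h, y, z, z_ln):
    """Empirical estimates of Yang et al. (2017), in a dense-array sketch.
    h[l, r] : probe counts at counting links
    y[l, r] : total counts at counting links
    z[n, r] : identified probe flows per OD pair n departing at interval r
    z_ln[l, n, r, t] : probes of OD pair n departing at r, seen on link l at r + t
    """
    gamma = h.sum(axis=0) / y.sum(axis=0)        # OD probe ratios, Eq. 7.127
    x_seed = z / gamma[np.newaxis, :]            # seed OD-matrix, Eq. 7.128
    with np.errstate(divide="ignore", invalid="ignore"):
        shares = np.nan_to_num(z_ln / z[np.newaxis, :, :, np.newaxis])
    a = shares.mean(axis=2)                      # assignment matrix a[l, n, t], Eq. 7.129
    return gamma, x_seed, a
```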

Assuming this data-driven approach, a variant of the model in Eq. 7.71 is proposed (Eq. 7.130):

$$\underset{{x}_{nr}}{\mathrm{Min}}\,\, \left[\sum_{r\in T}\sum_{n\in I}\frac{{\left({x}_{nr}-{\widehat{x}}_{nr}\right)}^{2}}{{w}_{nr}^{2}}+\sum_{r\in T}\sum_{l\in \widehat{L}}\frac{{\left({y}_{lr}-{\widehat{y}}_{lr}\right)}^{2}}{{q}_{rl}^{2}}\right]$$
(7.130)

s.t.

$${y}_{lr}=\sum_{t\in T}\sum_{n\in I}{a}_{ln}^{t}{*x}_{n,r-t} \forall l\in \widehat{L}, \forall r\in T$$
$$-\beta {x}_{nr}\le {x}_{n,r+1}-{x}_{nr}\le {\beta x}_{nr} \forall n\in I,\forall r\in T$$
$${x}_{nr}\ge 0 \forall n\in I,\forall r\in T$$

where the first constraint expresses, as in the analytical models, the relationships set up by the empirical assignment matrix \({a}_{ln}^{t}\) between \({y}_{lr}\), the estimated flows at the links \(l\in \widehat{L}\) with traffic detection stations for each time interval \(r\), and \({x}_{n,r-t}\), the OD flows leaving the origin at time \(r-t\) and observed in link \(l\) at time \(r\). The objective function in this case is formulated in terms of a quadratic distance function. Its metrics are defined, as in Eq. 7.35, by the matrices of variances \({w}_{nr}^{2}\) and \({q}_{rl}^{2}\) of the empirical OD-matrix \({\widehat{x}}_{nr}\) and of the link flow measurements \({\widehat{y}}_{lr}\), respectively. The coefficient \(\beta\) in the bounding constraints of Eq. 7.130 is the maximum percentage change of OD flows between two consecutive intervals.

Yang et al. (2017) also propose a more general variant of the model assuming that there is a correlation between the OD probe ratios, \({\gamma }_{nr}\), and the link probe ratios, \({\theta }_{lr}\). In other words, that there exists a function \({\theta }_{lr}=P\left({\gamma }_{nr}\right)\), for which they postulate the following form (Eq. 7.131):

$${\theta }_{lr}= \sum_{t\in T}\sum_{n\in I}{\rho }_{ln}^{t}*{\gamma }_{n,r-t}=\sum_{t\in T}\sum_{n\in I}{\rho }_{ln}^{t}*\left(\frac{{\widehat{z}}_{n,r-t}}{{x}_{n,r-t}}\right) \forall l\in \widehat{L}, \forall r\in T$$
(7.131)

where \({\rho }_{ln}^{t}\) defines the assignment matrix of probe ratios, which is assumed to be computed empirically from the collected data according to the main hypothesis of the method. These link probe ratios depend on the estimated OD-matrix \({x}_{nr}\), as expressed in Eq. 7.131, therefore becoming new variables of the model. Assuming that the available GPS data allow estimating the empirical values \({\widehat{\theta }}_{lr}\), they can be added to the objective function (Eq. 7.130), yielding the enhanced model in Eq. 7.132:

$$\underset{{x}_{nr}}{\mathrm{Min}}\,\, \left[\sum_{r\in T}\sum_{n\in I}\frac{{\left({x}_{nr}-{\widehat{x}}_{nr}\right)}^{2}}{{w}_{nr}^{2}}+\sum_{r\in T}\sum_{l\in \widehat{L}}\frac{{\left({y}_{lr}-{\widehat{y}}_{lr}\right)}^{2}}{{q}_{rl}^{2}}+\sum_{r\in T}\sum_{l\in \widehat{L}}\frac{{\left({\theta }_{lr}-{\widehat{\theta }}_{lr}\right)}^{2}}{{v}_{rl}^{2}}\right]$$
(7.132)

s.t.

$${y}_{lr}=\sum_{t\in T}\sum_{n\in I}{a}_{ln}^{t}*{x}_{n,r-t} \forall l\in \widehat{L}, \forall r\in T$$
$${\theta }_{lr}=\sum_{t\in T}\sum_{n\in I}{\rho }_{ln}^{t}*\left(\frac{{\widehat{z}}_{n,r-t}}{{x}_{n,r-t}}\right) \forall l\in \widehat{L}, \forall r\in T$$
$$-\beta {x}_{nr}\le {x}_{n,r+1}-{x}_{nr}\le {\beta x}_{nr} \forall n\in I,\forall r\in T$$
$${x}_{nr}\ge 0 \forall n\in I,\forall r\in T$$

where, as before, \({v}_{rl}^{2}\) is the variance of the observed ratios. Since the optimization model is quadratic, the gradient can be easily calculated, and a gradient algorithm is proposed to solve the problem numerically.
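A minimal projected-gradient sketch for the first model (Eq. 7.130) is shown below. For brevity it projects only the non-negativity constraint, leaving out the inter-interval bound; all names are illustrative.

```python
import numpy as np

def solve_dodme_quadratic(A, x_seed, y_obs, w2, q2, steps=200, lr=0.05):
    """Projected-gradient sketch for the quadratic model of Eq. 7.130.
    A[l, n, t] : empirical assignment matrix of Eq. 7.129
    x_seed[n, r], w2[n, r] : seed OD-matrix of Eq. 7.128 and its variances
    y_obs[l, r], q2[l, r]  : measured link flows and their variances"""
    x = x_seed.copy()
    n_T, n_lag = x.shape[1], A.shape[2]
    for _ in range(steps):
        # estimated link flows: y[l, r] = sum_t sum_n A[l, n, t] * x[n, r - t]
        y = np.zeros_like(y_obs, dtype=float)
        for r in range(n_T):
            for t in range(min(r + 1, n_lag)):
                y[:, r] += A[:, :, t] @ x[:, r - t]
        grad = 2.0 * (x - x_seed) / w2                 # gradient of the demand term
        resid = 2.0 * (y - y_obs) / q2                 # residuals of the flow term
        for r in range(n_T):
            for t in range(min(r + 1, n_lag)):
                grad[:, r - t] += A[:, :, t].T @ resid[:, r]
        x = np.maximum(x - lr * grad, 0.0)             # projection onto x >= 0
    return x
```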

6.2 Accounting for Mobility Learning from ICT Data Collection

Cascetta et al. (2013) formulate the hypothesis that "an OD estimator can be based on the assumption of constant distribution shares across larger time horizons with respect to the within-day variation of the production profiles, leading to an estimator that dramatically improves the unknowns/equations ratio". Krishnakumari et al. (2019) propose to go a step further. They assume that all realized travel times are available over all (shortest) paths, and that it is only necessary to specify how many of the shortest paths are actually used for each OD pair and the proportions of each OD flow over these used shortest paths. These proportions are a behavioral assumption at the macroscopic scale (a path flow proportion), not a detailed route choice model with (elaborate) trade-offs.

Nevertheless, the assumptions in Krishnakumari et al. (2019) about the distribution of traffic over the network are not sufficient to estimate the underlying OD-matrix; they must be complemented with additional information that, for instance, can be provided by link flow counts \({\widehat{y}}_{lt}\) measured at links \(l\in \widehat{L}\) at time \(t\in T\). As shown in Eq. 7.76, link flow counts \({y}_{lt}\) can be estimated in terms of the relationships between the OD flows \({x}_{ijr}\), departing from origin \(i\) with destination \(j\) at time interval \(r\) and arriving at link \(l\) at time \(t\), and the assignment matrix \({a}_{ijr}^{lt}\).

However, to be valid, these relationships must be set up considering that counts in or downstream of congestion are not informative of demand, but of (discharge) capacity, as shown in Frederix et al. (2011). Information on demand is only provided if \({y}_{lt}\) are measured in uncongested conditions and no path flows for OD pair \(\left(i,j\right)\in I\) using a path \(k\) to which \(l\) belongs experience a bottleneck upstream before crossing link \(l\). Therefore, to overcome these limitations, the computation of the assignment matrix, or that of any related terms, must explicitly account for congestion effects.

The approach proposed by Krishnakumari et al. (2019) assumes that, in addition to the availability of OD travel times, also the productions \({P}_{ir}\), i.e., the total outgoing flows from each origin \(i\), during period \(r,\) as well as the attractions \({A}_{jr}\), i.e., the total incoming flows to each zone \(j\) during period \(r,\) are observable and, therefore, available.

The availability of these inputs from the observed data relies on a methodology proposed by López et al. (2017b), based on specific Data Analytics techniques suited to building consensual 3D speed maps from link speeds by clustering techniques. These speeds are estimated from field data by a heuristic procedure defined in López et al. (2017a). According to the authors, this procedure can exploit classical data (e.g., from inductive loops or cameras) as well as data from more modern sources (e.g., mobile phone records, GPS tracking, etc.). The methodology is illustrated in the referenced papers for a case in which link speeds were estimated from individual travel times recorded by TV cameras equipped with LPR technology. However, it is extensible to other technologies as long as the requirements are met.

In any case, field data must be appropriately cleansed and the outliers removed. Krishnakumari et al. (2019) discuss several procedures for this purpose, among them a moving average process where, if \({\tau }_{n}\) is the \(n\)-th realized travel time for a given OD pair, Eq. 7.133 gives the moving average:

$${\overline{\tau }}_{n}=\frac{1}{k}\sum_{i=0}^{k-1}{\tau }_{n-i}$$
(7.133)

Outliers are those observations exceeding \({\overline{\tau }}_{n}+\Delta \tau\), where \(\Delta \tau\) is an empirically determined time window, for instance, the standard deviation of the travel times recorded during the peak demand. The mean of the observed travel times for a given OD pair \((i,j)\) at a given period is taken as the travel time from \(i\) to \(j\) at time \(t\), \({tt}_{ij}\left(t\right)\). Additionally, the k-shortest paths are considered the most likely used between each OD pair. For a particular pair \((i,j)\), a path \({L}_{p}\) is characterized by a sequence of links \({L}_{p}=\left({l}_{p1},{l}_{p2}, \dots ,{l}_{pn}\right)\). Then, the path speed is (Eq. 7.134):

$${s}_{p}=\frac{\mathrm{dist}\left({L}_{p}\right)}{{tt}_{ij}\left(t\right)}$$
(7.134)
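The sketch below illustrates the trailing moving average of Eq. 7.133 and the path speed of Eq. 7.134; the choice of \(\Delta \tau\) as the sample standard deviation is one of the empirical options mentioned above, and the function names are illustrative.

```python
import numpy as np

def filter_outliers(tau, k=5, delta_tau=None):
    """Trailing moving-average filter of Eq. 7.133: an observation is treated
    as an outlier when it exceeds the moving average by more than delta_tau
    (here defaulting to the sample standard deviation)."""
    tau = np.asarray(tau, dtype=float)
    moving_avg = np.array([tau[max(0, n - k + 1):n + 1].mean()
                           for n in range(len(tau))])
    if delta_tau is None:
        delta_tau = tau.std()
    return tau[tau <= moving_avg + delta_tau]

def path_speed(link_lengths, tt_ij_t):
    """Path speed of Eq. 7.134: path length divided by the OD travel time."""
    return sum(link_lengths) / tt_ij_t
```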

Krishnakumari et al. (2019) also consider various approaches to impute link speeds when no data are available.

The main assumption behind the approach proposed by Lopez et al. (2017b) is that the data provided by these more modern sources allow empirically uncovering driving insights of human mobility, namely, those concerning its dynamic aspects, and thus enable their use in mathematical models aimed at predicting that dynamic mobility. This means investigating the regularity of macroscopic mobility patterns and how they vary within days and from day to day. For that purpose, Lopez et al. (2017b) propose a methodology based on what they call 3D maps, in essence spatial–temporal speed cluster maps: a joint partition of space (i.e., the road network) and time into homogeneous clusters characterized by constant mean speeds. The proposed approach considers that link speed data can be reconstructed from trip travel time observations with Eq. 7.134, as in Lopez et al. (2017b), and that the network is coded in the Open Street Map Geographical Information System (OSM GIS) Database, also used to compute all shortest paths. The cluster building process is based on the following partitioning criteria:

  • All clusters should contain a single connected component. In other words, all links in the cluster are reachable within the cluster.

  • An intra-cluster homogeneity criterion, formulated in terms of the minimization of the internal speed variance for all clusters. If \(n\) is the number of clusters, the total within cluster variance \({TV}_{n}\) is given by (Eq. 7.135):

    $${TV}_{n}= \frac{1}{{\sum }_{i=1}^{n}{n}_{i}}*\left(\frac{{\sum }_{i=1}^{n}{n}_{i}{*s}_{i}^{2}}{{s}^{2}}\right)$$
    (7.135)

    where \({n}_{i}\) is the number of links in cluster \(i\), \({s}_{i}\) the standard deviation of links speeds in cluster \(i\), and \(s\) the standard deviation for the whole network. It is assumed that link speeds have been estimated from Eq. 7.134.

  • An inter-cluster dissimilarity criterion that maximizes the difference in speed between neighbor clusters, where the inter-cluster dissimilarity is given by Eqs. 7.136 and 7.137:

    $${CCD}_{n}=\frac{\sum_{i=1}^{n}{\sum }_{k=i+1}^{n}{\delta }_{ik}*\sqrt{{n}_{i}*{n}_{k}}*\left|{\overline{v} }_{i}-{\overline{v} }_{k}\right|}{\sum_{i=1}^{n}{\sum }_{k=i+1}^{n}{\delta }_{ik}*\sqrt{{n}_{i}*{n}_{k}}}$$
    (7.136)
    $$\delta_{ik}=\left\{\begin{array}{ll} 1 & \text{if clusters } i \text{ and } k \text{ have a common border} \\ 0 & \text{otherwise} \end{array}\right.$$
    (7.137)

    where \({\overline{v} }_{i}\) is the mean speed in cluster \(i\).

Lopez et al. (2017b) test three different clustering approaches (k-means, DBSCAN, and S-cut) and conclude that, at least in the case study reported in the paper, k-means is the most economical in terms of computational cost to obtain the envisaged 3D speed maps. Furthermore, assuming that the observational data cover a period of \(M\) days, they add a new process to find commonalities in these days' congestion patterns, the so-called "consensual" patterns, by means of Consensus Learning Techniques (Filkov and Skiena 2004).
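The following sketch shows the k-means step on a matrix of link speeds and evaluates the within-cluster variance criterion of Eq. 7.135. It deliberately omits the connectivity constraint (each cluster a single connected component) and the consensus step across days, which require the graph structure; it is an illustration, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

def speed_map_clusters(link_speeds, n_clusters=4):
    """Cluster links by their speed profiles over time.
    link_speeds[l, t]: estimated speed of link l in time slice t (Eq. 7.134)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(link_speeds)
    # total within-cluster variance TV_n of Eq. 7.135
    s_all = link_speeds.std()
    n_i = np.bincount(labels, minlength=n_clusters)
    s_i = np.array([link_speeds[labels == i].std() for i in range(n_clusters)])
    tv_n = (n_i * s_i ** 2).sum() / (n_i.sum() * s_all ** 2)
    return labels, tv_n
```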

The approach proposed by Krishnakumari et al. (2019) uses these results for different purposes:

  • To estimate or predict the production and attraction patterns using the identified 3D speed and flow patterns (possibly augmented with other data) using machine learning techniques (especially Neural Network techniques, although other techniques could also be used).

  • To compute \(N\) weighted (by travel time) shortest paths, where \(N\) is an assumption on how many alternative routes are used on average for each OD flow on these paths.

  • To estimate path flows on the used paths assuming that they are inversely proportional to the realized travel times on these paths, considering path overlap, and under the additional constraint that the path flow solution space is determined by all admissible link flow counts.

Let's assume that \({x}_{ijr}^{k}\) is the path flow from origin \(i\in O\) (where \(O\) is the set of all origins) to destination \(j\in D\) (where \(D\) is the set of all destinations), departing from origin \(i\) at time period \(r\in T\) (where \(T\) is the time horizon) on path \(k\in {N}_{ij}^{r}\) (where \({N}_{ij}^{r}\) is the set of all paths between origin \(i\) and destination \(j\) at time period \(r\)); \({x}_{ijr}\) is the OD flow from origin \(i\in O\) to destination \(j\in D\), departing from origin \(i\) at time period \(r\in T\); \({P}_{ir}\) is the production of origin \(i\) during period \(r\); \({A}_{jr}\) is the attraction of destination \(j\) during period \(r\); \({TT}_{ijr}^{k}\) is the travel time for vehicles traversing path \(k\) from origin \(i\) to destination \(j\) departing from \(i\) in time period \(r\); \({P}_{ijr}^{k}\) is the proportion of vehicles traveling on path \(k\) from origin \(i\) to destination \(j\) departing from \(i\) in time period \(r\); and \({\widehat{y}}_{lr}\) is the measured flow count in link \(l\) at time period \(r\). \({P}_{ir}\) is the sum of all outgoing flows from \(i\) at this time period along all paths \(k\in {N}_{ij}^{r}\) from \(i\) to all destinations \(j\in D\) (Eq. 7.138):

$${P}_{ir}=\sum_{j\in D}\sum_{k\in {N}_{ij}^{r}}{x}_{ijr}^{k}$$
(7.138)

In a similar way, the attraction \({A}_{jr}\) of destination \(j\) during period \(r\) is the sum of all incoming flows to destination \(j\) from all origins \(i\in O\) along all paths \(k\in {N}_{ij}^{r}\) (Eq. 7.139):

$${A}_{jr}=\sum_{i\in O}\sum_{k\in {N}_{ij}^{r}}{x}_{ijr}^{k}$$
(7.139)

Since link speeds are available, path travel times \({TT}_{ijr}^{k}\) can be calculated. From them, a behavioral assumption can be made on the proportion of trips using each available path in terms of its utility, which is defined by the path travel time. Krishnakumari et al. (2019) estimate this path proportion with the modified logit-based model proposed by Ben-Akiva and Bierlaire (1999) (Eq. 7.140):

$${P}_{ijr}^{k}=\frac{{e}^{{TT}_{ijr}^{k}*\left(1-{PS}^{k}\right)}}{\sum_{p\in {N}_{ij}^{r} }{e}^{{TT}_{ijr}^{p}*\left(1-{PS}^{p}\right)}}$$
(7.140)

In Eq. 7.140 a correction term \({PS}^{k}\) is added to the deterministic component of the discrete-choice model. It is the path size factor defined by Eqs. 7.141 and 7.142:

$${PS}^{k}=\sum_{a\in \mathrm{Path}\,\, k}\left(\frac{{l}_{a}}{{L}^{k}}\right)*\frac{1}{\sum_{p\in {N}_{ij}^{r}}{\delta }_{ap}}$$
(7.141)
$$\delta_{ap}=\left\{\begin{array}{ll} 1 & \text{if link } a \text{ belongs to path } p \\ 0 & \text{otherwise} \end{array}\right.$$
(7.142)

where \({l}_{a}\) is the length of link \(a\), \({L}^{k}\) is the length of path \(k\), and \({\delta }_{ap}\) is the link-path incidence matrix. The path size factor tries to capture the correlations between alternative paths, measuring their dependencies in terms of the degree of similarity given by the shared links. The calculation of the path proportions allows setting up the relationships between the OD flows, \({x}_{ijr}\), and the path flows, \({x}_{ijr}^{k}\) (Eq. 7.143):

$${x}_{ijr}^{k}= {P}_{ijr}^{k}{*x}_{ijr} \forall i\in {O}_{i},\forall j\in {D}_{j},\forall r\in T, \mathrm{and}\,\, k\in {N}_{ij}^{r}$$
(7.143)
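A small sketch of Eqs. 7.140–7.142 for a single OD pair and departure period follows. The expression of Eq. 7.140 is coded exactly as written; note that in most path-size logit applications the travel time enters the utility with a negative sign. Names are illustrative.

```python
import numpy as np

def path_size_proportions(paths, link_lengths, travel_times):
    """Path proportions P_ijr^k of Eq. 7.140 with the path size factor of
    Eqs. 7.141-7.142 for one OD pair and departure period.
    paths: list of link-id tuples; link_lengths: dict link-id -> length."""
    usage = {}
    for p in paths:
        for a in p:
            usage[a] = usage.get(a, 0) + 1            # sum_p delta_ap of Eq. 7.142
    ps = []
    for p in paths:
        L_k = sum(link_lengths[a] for a in p)
        ps.append(sum((link_lengths[a] / L_k) / usage[a] for a in p))  # Eq. 7.141
    tt = np.asarray(travel_times, dtype=float)
    util = np.exp(tt * (1.0 - np.array(ps)))          # numerators of Eq. 7.140 as written
    return util / util.sum()

# Path flows then follow Eq. 7.143: x_ijr^k = P_ijr^k * x_ijr
```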

The number of paths in \({N}_{ij}^{r}\) can be exponentially large but, in practice, as not all of them are significantly used, it can be reduced to a smaller set \({N}_{ij}^{r*}\subseteq {N}_{ij}^{r}\). This smaller set can be identified as part of the data analytics procedures used to estimate the values of the model components. This leads to the approximation of the estimated OD-matrix in Eq. 7.144:

$${x}_{ijr}=\sum_{k\in {N}_{ij}^{r*}}{x}_{ijr}^{k}$$
(7.144)

This approximation is sufficiently good if \({N}_{ij}^{r*}\) has been properly defined.

The relationship between link flows and path flows can be reformulated to explicitly consider the effects of congestion and account for the conditions discussed above. That is, flows \({y}_{lr}\) measured in link \(l\) at time \(r\) are informative of the path flows crossing the link only if the link is not congested at that time and none of the links upstream of it experiences a bottleneck. The approach chosen by Krishnakumari et al. (2019), considering the subset of paths to which link \(l\) belongs that satisfy these conditions, can be formulated as follows. If \({\mathrm{\wp }}_{r}^{l}\) is the set of paths to which link \(l\) belongs at time \(r\), the subset of paths satisfying the conditions is given by Eq. 7.145:

$$\wp _{{ijt}}^{k} \in \wp _{r}^{l} ,t \le r - \overline{{TT}} _{{ijr}}^{{k\backslash l}}\,\, \forall i,j,k\quad k\;{\text{all}}\;{\text{paths}}\;{\text{traversing}}\;l\;{\text{during}}\;{\text{period}}\;r$$
(7.145)

where \({\overline{TT} }_{ijr}^{k\backslash l}\) estimates the partial arrival travel times to link \(l\) along the paths in \({\mathrm{\wp }}_{ijt}^{k}\). This implies that (Eq. 7.146):

$$\sum_{\wp_{ijt}^{k}\in \wp_{r}^{l}} P_{ijt}^{k}*x_{ijr}=\left\{\begin{array}{ll} 0 & \text{if links upstream of } l\in \wp_{ijt}^{k} \text{ are congested} \\ P_{ijt}^{k}*x_{ijr} & \text{otherwise} \end{array}\right.$$
(7.146)

Thus, if \({\widehat{y}}_{lr}\) are the link flows measured at links \(l\in \widehat{L}\subseteq L\) equipped with detection stations, their relationships with the OD flows \({x}_{ijr}\) can be stated with Eq. 7.147:

$${\widehat{y}}_{lr}=\sum_{\left(i,j\right)\in I}{P}_{ijr}^{k}*{x}_{ijr}\,\, \forall l\in \widehat{L},\forall r\in T$$
(7.147)

Together with the corresponding reformulations of Eqs. 7.138 and 7.139 in terms of Eq. 7.146, a system of equations (Eqs. 7.148 and 7.149) is defined:

$${P}_{ir}=\sum_{j\in D}{x}_{ijr} \,\,\forall i\in O, \forall r\in T$$
(7.148)
$${A}_{jr}=\sum_{i\in O}{x}_{ijr}\,\, \forall j\in D, \forall r\in T$$
(7.149)

As highlighted in Krishnakumari et al. (2019), "this system is underdetermined or overdetermined (or rare cases full rank) depending on the available link counts and the choice and number of link paths for each OD pair". To solve the system, the authors propose the constrained least squares algorithm of Altman and Gondzio (1999), either with lower bounds set to 0 to ensure non-negative solutions, or without bounds when no solution exists, ignoring the negative values when computing the solution error.
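As an illustration of this solution step, the sketch below solves the stacked linear system with a bounded least-squares routine; non-negative least squares stands in for the Altman and Gondzio (1999) solver, and the assembly of the coefficient matrix is assumed done elsewhere.

```python
import numpy as np
from scipy.optimize import lsq_linear

def solve_od_system(C, d):
    """One-period sketch: the rows of C stack the production equations
    (Eq. 7.148), the attraction equations (Eq. 7.149) and the link-count
    equations (Eq. 7.147); d stacks P_ir, A_jr and the measured counts."""
    result = lsq_linear(C, d, bounds=(0.0, np.inf))   # non-negative least squares
    return result.x          # flattened estimate of the OD flows x_ijr
```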

A potential limitation of the proposed approach arises for large networks. That is, when the number of origins and destinations grows and, then, the number of OD flows grows quadratically. However, the number of Eqs. 7.148 and 7.149 in the system only grows linearly, as link flow equations do (Eq. 7.147), assuming also an increase in the number of detection stations. The authors propose to use in this case the dimensionality reduction techniques studied in Djukic et al. (2012), which are based on the application of the Principal Components Analysis (Jolliffe 2002).

To end this section, it should be noted that this data-driven approach is the planned forthcoming OD-estimation method in future versions of the corresponding modules of the Aimsun Next and Live software platforms for traffic analysis and management.

6.3 Estimating Assignment Matrices from FCD Data

As mentioned, the computational burden associated with the DTA required in the analytical approaches to the DODME problem, which is necessary to estimate the assignment matrix, and the existing doubts on how to integrate the additional information that may be available, have fostered research on these issues not only among researchers, but also among practitioners and developers of professional software platforms. An example of this motivation can be found in a recent work of the team supporting the OPTIMA traffic management platform (Mitra et al. 2020). This work aims at estimating base OD demand matrices for large-scale networks using the information that can be extracted from large amounts of FCD and link flow counts. The main assumption, similar to that of previous approaches, is that a detailed analysis of FCD trajectories, if properly and accurately done, enables the estimation of the two main required inputs: (i) a revealed OD-matrix \({X}^{0}\) extracted from the FCD trajectories, playing the role of seed matrix, and (ii) information to build from FCD data a reliable assignment matrix that can replace the one provided by DTA in analytical approaches.

A critical point is the quality of the FCD data, since they can be poor, non-homogeneous, or biased. However, Mitra et al. (2020) claim that, even in these cases, it is possible to take advantage of these data. Useful techniques include suitable cleansing and filtering, clustering according to similar average behaviors, and specialized map-matching algorithms that match each individual raw GPS trajectory onto the transportation graph in order to reconstruct the most likely paths in this graph (Hart et al. 1968; Marchal et al. 2004; Quddus et al. 2007; Kubicka et al. 2018; Millard-Ball et al. 2019). The map-matched trajectories can then be associated with origin and destination zones, departing from origin zones at specific times of the day.

Let's assume that \(X\) is the estimated OD vector of size \(\left|I\right|*\left|T\right|\) (where \(I\) is the set of OD pairs and \(T\) the set of time intervals), that \(\widehat{Y}\) is the vector of traffic counts (of size \(\left|\widehat{L}\right|*\left|T\right|\), \(\widehat{L}\) being the set of links with counting stations), and that \(A\) is the assignment matrix estimated from FCD data. Then, mapping the estimated OD flows to the estimated link flow counts \(Y\) (with \(Y=A*X\)) and simplifying the formulation to a single fixed time interval (no interdependencies between time intervals are assumed in this approach; see Mitra et al. (2020) for additional details), the DODME problem can be formulated as in Eq. 7.150:

$$\mathrm{Min}\,\, \varphi \left(X\right)=\frac{1}{2}{*\Vert A*X-\widehat{Y}\Vert }^{2}+\frac{\lambda }{2}{*\Vert X-\widehat{X}\Vert }^{2}$$
(7.150)

where \(\lambda\) is the relative weight of the demand term, and \(\widehat{X}\) is the reference demand vector, whose \(ij-\mathrm{th}\) element is given by Eq. 7.151:

$${\widehat{X}}_{ij}=\gamma *{\alpha }_{i}*{\beta }_{j}*{X}_{ij}^{0}\,\, \forall i\in O, \forall j\in D$$
(7.151)

where \(O\) is the set of origins, \(D\) the set of destinations, and \({X}^{0}\) the observed seed OD-matrix from FCD trajectories. \(\gamma\) is a constant factor that homogeneously scales all OD pairs, and \({\alpha }_{i},\forall i\in O\), and \({\beta }_{j}, \forall j\in D\), respectively, are the generation and attraction factors for each origin and each destination.

The solution to Eq. 7.150 is found by an iterative process that generates a sequence of feasible solutions \(\left\{{X}^{k}\right\}\). This is done in such a way that a new solution is found at iteration \(k+1\) from the solution at iteration \(k\) by moving a step of length \({\theta }^{k}\in \left(0,\left.1\right]\right.\) along a feasible descent direction \(\Delta {X}^{k}\) (Eq. 7.152):

$${X}^{k+1}={X}^{k}+{\theta }^{k}*\Delta {X}^{k}$$
(7.152)

Since \(\varphi \left(X\right)\) is quadratic, the descent direction can be found by a Newton method solving Eq. 7.153:

$$\Delta {X}^{k}=-{\left[{\nabla }^{2}\varphi \left({X}^{k}\right)\right]}^{-1}*\nabla \varphi \left({X}^{k}\right)$$
(7.153)

where \(\nabla \varphi \left({X}^{k}\right)\) is the gradient of \(\varphi \left(X\right)\), and \({\nabla }^{2}\varphi \left({X}^{k}\right)\) the Hessian at \({X}^{k}\). In practice, Eq. 7.153 can be solved efficiently without inverting the Hessian and, since \(\varphi \left(X\right)\) is quadratic, the solution can be found exactly in one step if the Hessian is positive definite.
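A sketch of one such step for a single period is given below: the Newton direction of Eq. 7.153 is obtained by solving the linear system rather than inverting the Hessian. `A`, `lam`, and the other names mirror the symbols of Eq. 7.150 but are otherwise illustrative.

```python
import numpy as np

def newton_step(A, x, y_obs, x_ref, lam):
    """Newton step for the quadratic objective of Eq. 7.150, with the
    direction of Eq. 7.153 computed via a linear solve.
    A maps the flattened OD vector to link flows for one period."""
    grad = A.T @ (A @ x - y_obs) + lam * (x - x_ref)   # gradient of Eq. 7.150
    hess = A.T @ A + lam * np.eye(x.size)              # Hessian (constant here)
    delta_x = np.linalg.solve(hess, grad)
    return x - delta_x      # full step (theta = 1) solves the quadratic exactly
```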

Several alternatives have been proposed (Mitra et al. 2020) to estimate the values of the factors \(\gamma\), \({\alpha }_{O}\), and \({\beta }_{D}\), where \({\alpha }_{O}\) and \({\beta }_{D}\) are the vectors of generation and attraction factors. An example procedure that simultaneously optimizes \({\alpha }_{O}\) and \({\beta }_{D}\) could consist of (i) calculating the optimal value of the common global factor \(\gamma\) by solving the quadratic problem in Eq. 7.154 and (ii) calculating the optimal values of \({\alpha }_{O}\) and \({\beta }_{D}\) by solving Eq. 7.155:

$$\underset{\gamma \ge 0}{\mathrm{Min}}\,\,\varphi \left(X\right)$$
(7.154)

s.t. \({\alpha }_{O}={\beta }_{D}=1\)

$$\underset{{\alpha }_{O},{\beta }_{D}\ge 0}{\mathrm{Min}}\,\,\varphi \left(X\right)$$
(7.155)

Mitra et al. (2020) present promising results of this approach applied to the large-scale network of Turin, with 438 zones, 96,420 links, 6,352 nodes, 1203 counting locations and GPS data for 1 year.

The potential problems of dealing with GPS data reported when discussing previous approaches fostered the search for other practical solutions. Most of these problems concern the unbiased reconstruction of vehicle trajectories and the estimation of the observed seed OD-matrix \({X}^{0}\), and are usually linked to the fact that most of the available commercial GPS data are obtained from non-homogeneous vehicle fleets (e.g., an indiscriminate mix of commercial vehicles and passenger cars). Another source of issues is trajectories being split by random identity changes introduced for privacy reasons. However, once these data are properly cleansed and filtered, the waypoints or POIs (Points of Interest) supplied by GPS data can be considered reliable. These are usually given as an ordered sequence of waypoints containing the information (ID\(_k\), date, ts(\(k,l\)), lat\(_{kl}\), long\(_{kl}\)), as illustrated in Table 7.1. ID\(_k\) is the identity of each trip \(k\), the date stands for the recording date, ts(\(k,l\)) is the time tag for the \(l\)-th observation of trip \(k\), and lat\(_{kl}\) and long\(_{kl}\), respectively, are its latitude and longitude.

Table 7.1 Example of GPS recorded waypoints

However, these geographically referenced data do not usually correspond to the analyzed road network. Therefore, as already mentioned, they must be properly map-matched to transform these sequences of waypoints into points corresponding to paths on that network. The most used procedures (Marchal et al. 2004; Schuessler and Axhausen 2009; Pereira et al. 2009; Rahmani and Koutsopoulos 2013; Kubicka et al. 2018) assign each waypoint to a point in the nearest link of the network. Software platforms provide tools to perform this operation, for instance, OpenLR (OpenLR 2020) or GPX (PTV Visum 2020). An example of how this works is depicted in Fig. 7.7, in which the red stars are the waypoints and the red numbers near the links are the relative positions of the waypoint projections on the target links. Timestamps for waypoints are depicted in green.

Fig. 7.7 Schematic overview of the map-matching process

Link travel times can be heuristically estimated from waypoint timestamps according to their sequence (Ros-Roca et al. 2021b). In this example, for all links in the sequence, the interpolated travel time for a link is derived from the timestamp differences of consecutive waypoints mapped onto the target link. When two consecutive waypoints are not wholly projected within one link, the distance-based fraction within the link is taken (\(l_k\) is the length of link \(k\) in Fig. 7.7). For instance, the travel time for link \({l}_{3}\) can be estimated taking into account that the travel time for the trip between the 3rd and 4th waypoints is 20 s, and that it covers the whole of link \({l}_{3}\) plus a 0.2 fraction of \({l}_{2}\) and a 0.7 fraction of \({l}_{4}\) (Eq. 7.156, with the result in s):

$${tt}_{3}=\frac{{l}_{3}}{0.2{*l}_{2}+{l}_{3}+0.7{*l}_{4}}*20$$
(7.156)

The estimated travel time in link \({l}_{4}\) is obtained by adding two parts. The first part is the travel time proportion of link \({l}_{4}\) between the 3rd and 4th timestamps (0.7 of \({l}_{4}\) over 0.2 of the length of link \({l}_{2}\) plus the entire length of link \({l}_{3}\) plus 0.7 of \({l}_{4}\)). The second part is estimated directly from the proportion of link \({l}_{4}\) lying between the 4th and 5th timestamps (a fraction of the 7 s travel time between these waypoints, calculated as 0.3 of \({l}_{4}\) over the total distance between the 4th and 5th waypoints, that is, 0.3\({l}_{4}\)+0.2\({l}_{5}\)). Overall, the travel time in link \({l}_{4}\) is given by Eq. 7.157 (in s):

$${tt}_{4}=\frac{0.7*{l}_{4}}{0.2*{l}_{2}+{l}_{3}+0.7*{l}_{4}}*20+\frac{0.3{*l}_{4}}{0.3{*l}_{4}+0.2*{l}_{5}}*7$$
(7.157)
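The two interpolations can be reproduced numerically. The link lengths below are hypothetical values chosen only to make the example of Fig. 7.7 concrete.

```python
# hypothetical lengths (in m) of the links in the example of Fig. 7.7
l2, l3, l4, l5 = 100.0, 250.0, 120.0, 80.0

# Eq. 7.156: the 20 s between the 3rd and 4th waypoints cover 0.2*l2 + l3 + 0.7*l4
tt3 = l3 / (0.2 * l2 + l3 + 0.7 * l4) * 20.0

# Eq. 7.157: travel time on l4 combines a share of the 20 s segment and a
# share of the 7 s segment between the 4th and 5th waypoints
tt4 = (0.7 * l4 / (0.2 * l2 + l3 + 0.7 * l4) * 20.0
       + 0.3 * l4 / (0.3 * l4 + 0.2 * l5) * 7.0)
print(tt3, tt4)   # estimated link travel times in s
```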

Finally, once all the waypoint sequences are converted to paths with full details at the link level, the link travel times are averaged. The outcome of this process is the set of observed link travel times at each time period \(t\), \({\widehat{tt}}_{lt}\,\, \forall l\in L, \forall t\in T\), for all links in the network that are monitored by GPS tracking; that is, the dataset of estimated link travel times. Although the quantity of trajectories available for the target network may be huge, depending on the penetration rate of GPS devices among the population, the final sample may leave some links uncovered. It is also possible that some links are not fully covered by time information, as, for instance, the first and last links in each sequence (e.g., links 1 and 6 in the example in Fig. 7.7). Moreover, the procedure that infers link travel times can produce non-feasible values when they are below the free-flow link travel time. In these situations, scaled travel times are used (Eqs. 7.158 and 7.159):

$$\widehat{t}{t}_{l^\mathrm{^{\prime}}t}=R*t{{t}_{0}}_{{l}^{\mathrm{^{\prime}}}}$$
(7.158)
$$R={\mathrm{mean}}_{l\in GPS}\left(\frac{\widehat{t}{t}_{lt}}{t{{t}_{0}}_{l}}\right)$$
(7.159)

where \(t{{t}_{0}}_{l}\) is the free-flow travel time of each link, and \(R\) is computed using all observed link travel times and their corresponding free-flow travel times. That is, \(R\) is the arithmetic mean of the expansion factors found for each link and can be understood as a global expansion factor linked to the congestion effect. The methodological process for generating the observed link travel times dataset is summarized in Fig. 7.8.

Fig. 7.8 Conceptual methodological approach to the process of importing waypoints into a network model and their use to estimate link travel times

The estimated average link travel times \({\widehat{tt}}_{lt}\) for each link \(l\in L\) and each time interval \(t\in T\) can be used to generate a plausible Route Choice Set \(\mathcal{K}=\left\{{K}_{ijr},\forall i\in O,\forall j\in D,\forall r\in \mathcal{T}\right\}\) of the most likely used paths between each origin and each destination at each departure time. This can be done by applying variants of Dijkstra-based algorithms explicitly accounting for commonalities between paths in terms of shared links, as in Krishnakumari et al. (2019). However, as link travel times are considered in this case, other alternatives like those proposed by Chabini (1998), dealing directly with time-dependent shortest paths, can be more appropriate. Nassir et al. (2014) and Janmyr and Wadell (2018) use the penalization of overlapping in terms of the "commonality factors" proposed by Cascetta et al. (1996) and Cascetta (2001) as a measure of similarity between alternatives. This overcomes the problems derived from the independence-of-irrelevant-alternatives hypothesis of discrete-choice models by reducing the systematic utility of each path, measured here in terms of travel time, in proportion to its level of overlap with other alternative paths. Such procedures can be additionally strengthened by applying the modification of the variant of Bovy et al. (2008) proposed by Janmyr and Wadell (2018). According to this modification, paths in \({K}_{ijr}\) are denoted here as \({k\left(i,j,r\right)\in K}_{ijr}\) in order to explicitly show the dependence on \(\left(i,j,r\right)\). Let's assume that the sequence of links composing a certain path \(k\left(i,j,r\right)\) is \({\Gamma }_{k\left(i,j,r\right)}=\left\{{e}_{1},\dots ,{e}_{m}\right\}\). Then, the path choice proportion for each path in the set \({K}_{ijr}\) is calculated with the following modified discrete logit-based choice model that uses the commonality factor (CF) for each OD pair and time period (Eqs. 7.160 and 7.161):

$$C{F}_{k(i,j,r)}=\frac{1}{{\mu }_{{CF}_{k}}}*\sum_{a\in {\Gamma }_{k(i,j,r)}}\left(\frac{{l}_{a}}{{L}_{k(i,j,r)}}*\mathrm{log}\left(\sum_{h\in {K}_{ijr}}({\delta }_{ahr}+1)\right)\right)$$
(7.160)
$${P}_{k(i,j,r)}=\frac{\mathrm{exp}[{\mu }_{{P}_{k}}(-\widehat{t}{t}_{k(i,j,r)}-C{F}_{k(i,j,r)})]}{\sum_{h\in {K}_{ijr}}\mathrm{exp}[{\mu }_{{P}_{k}}(-\widehat{t}{t}_{h(i,j,r)}-C{F}_{h(i,j,r)})]}$$
(7.161)

where \({\delta }_{ahr}=1\) if path \(h\in {K}_{ijr}\) uses link \(a\) at time \(r\) and \(0\) otherwise, \({l}_{a}\) is the length of link \(a\) and \({L}_{k(i,j,r)}\) is the total length of path \(k\in {K}_{ijr}\). In order to adapt magnitudes for the discrete-choice summation, \({\mu }_{{P}_{k}}\) and \({\mu }_{{CF}_{k}}\) are parameters fixed as in Eq. 7.162:

$${\mu }_{{P}_{k}}={\mu }_{C{F}_{k}}=\frac{1}{{\mathrm{mean}}_{k\in {K}_{ijr}}\left(\widehat{t}{t}_{k(i,j,r)}\right)}$$
(7.162)

These calculations provide the flow distribution for each path on the basis of the observed path travel times, which are the summation of the observed time-dependent link travel times. That is, they consider the travel time \(\widehat{t}{t}_{at(k)}\) at the arrival time \(t(k)\) at each link \(a\) belonging to the path \(k(i,j,r)\) (Eq. 7.163):

$$\widehat{t}{t}_{k(i,j,r)}=\sum_{a\in {\Gamma }_{k(i,j,r)}}\widehat{t}{t}_{at(k)}$$
(7.163)

Once \({P}_{k}=\left\{{P}_{k\left(i,j,r\right)}\right\}\) is determined from the \(k\) shortest paths obtained from the estimated travel times, the estimated time-dependent assignment matrix \(\overline{A }= \left[{\overline{a} }_{ijr}^{lt}\right]\) can be calculated with Eqs. 7.164 and 7.165:

$${\overline{a} }_{ijr}^{lt}=\sum_{k\in {K}_{ijr}}{\delta }_{k(i,j,r)}^{lt}{*P}_{k(i,j,r)}\;\; \forall i,j,r,l,t$$
(7.164)
$$\delta _{{k(i,j,r)}}^{{lt}} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if}}\;{\text{path}}\;k\left( {i,j,r} \right)\;{\text{uses}}\;{\text{link}}\;l\;{\text{at}}\;{\text{time}}\;t} \hfill \\ 0 \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right.$$
(7.165)

where \({\delta }_{k(i,j,r)}^{lt}\) is the estimated incidence indicator.

This is the estimated assignment matrix that can replace the calculated assignment matrix from DTA in an alternative formulation of DODME. Therefore, the relationship in Eq. 7.76 that the assignment matrix establishes between estimated link flows \({y}_{lt}\) and estimated OD flows \({x}_{ijr}\) can now be rewritten as Eq. 7.166:

$${y}_{lt}=\sum_{i\in O}\sum_{j\in D}\sum_{r=1}^{t}{\overline{a} }_{ijr}^{lt}{*x}_{ijr}$$
(7.166)

If data collected from a sample of GPS-tracked vehicles are available and if it is possible to create a discrete-time estimate of a seed OD-matrix from them, that is, the observed OD-matrix \({X}^{0}=\left[{x}_{ijr}^{0}\right]\), this matrix can be expanded to estimate the OD-matrix in terms of scaling factors per origin, \({\alpha }_{i}, \forall i\in O\), and per destination, \({\beta }_{j}, \forall j\in D\), such that (Eq. 7.167):

$${x}_{ijr}={\alpha }_{i}{*\beta }_{j}*{x}_{ijr}^{0}$$
(7.167)

It can be assumed, as in all previous formulations, that a reliable historical OD-matrix \({X}^{H}\) is available. As already mentioned, this assumption would be questionable in long-term planning applications, as this matrix could be largely outdated or simply not exist. However, its existence is a reasonable hypothesis in traffic management applications, where a surveillance system is already in operation and provides rich structural information (Ashok and Ben-Akiva 1993; Ben-Akiva et al. 2001; Djukic et al. 2018; Aimsun 2020). Once the existence of a historical OD-matrix is accepted, the DODME problem can be reformulated in terms of the estimation of the scaling factors \({\alpha }_{i}\) and \({\beta }_{j}\) in the following way (Eq. 7.168):

$$\begin{gathered} \mathop {{\text{Min}}}\limits_{{\alpha _{i} ,\beta _{j} }} \left[ {w\left( {\sum\limits_{{i \in O}} {\sum\limits_{{j \in D}} {\sum\limits_{{r = 1}}^{t} {\left( {x_{{ijr}}^{H} - \alpha _{i} *\beta _{j} *x_{{ijr}}^{0} } \right)^{2} } } } } \right)} \right. \hfill \\ \quad \quad \quad \left. { + \sum\limits_{{l \in \hat{L}}} {\sum\limits_{{t \in {\mathcal{T}}}} {\left( {\hat{y}_{{lt}} - \sum\limits_{{i \in O}} {\sum\limits_{{j \in D}} {\sum\limits_{{r = 1}}^{t} {\alpha _{i} } } } *\beta _{j} *\bar{a}_{{ijr}}^{{lt}} *x_{{ijr}}^{0} } \right)^{2} } } } \right] \hfill \\ \end{gathered}$$
(7.168)
$$s.t.\,\, {\alpha }_{i},{\beta }_{j}\ge LB\;\; \forall i\in O, \forall j\in D$$

The problem variables are multiplicative scaling factors for each origin \({\alpha }_{i}\) and each destination \({\beta }_{j}\), which significantly reduces the number of variables from \(\left|I\right|* \left|J\right|* |T|\) to \(|I|+|J|.\) Moreover, the fact of using the scaling factors as variables aims at preserving the structure of the seed OD-matrix, as gravity models do. Since the model is no longer quadratic and is bounded from below, other optimization procedures could be advisable. Ros-Roca et al. (2021a, b) report good results using the L-BFGS-B method (Morales and Nocedal 2011). It is a quasi-Newton method suitable for constrained nonlinear problems with a high number of variables, and it efficiently reduces the memory requirements and the computational burden.
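A compact sketch of this reformulation, simplified to a single period and flattened arrays, is shown below. It is an illustration of Eqs. 7.167–7.168 under stated assumptions, not the implementation of Ros-Roca et al. (2021a, b); the hypothetical `A_bar` stands for the empirical assignment matrix mapping the flattened OD vector to link flows.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_scaling_factors(x_hist, x_seed, A_bar, y_obs, w=1.0, lb=0.1):
    """Single-period sketch of Eq. 7.168: only the origin and destination
    scaling factors are optimized, with L-BFGS-B and a strictly positive
    lower bound (see the discussion of LB below)."""
    n_o, n_d = x_hist.shape

    def objective(v):
        alpha, beta = v[:n_o], v[n_o:]
        x = alpha[:, None] * beta[None, :] * x_seed          # Eq. 7.167
        demand_term = w * np.sum((x_hist - x) ** 2)
        count_term = np.sum((y_obs - A_bar @ x.ravel()) ** 2)
        return demand_term + count_term

    v0 = np.ones(n_o + n_d)                                  # start from unit factors
    res = minimize(objective, v0, method="L-BFGS-B",
                   bounds=[(lb, None)] * (n_o + n_d))
    return res.x[:n_o], res.x[n_o:]
```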

Theoretically, the lower bound (LB) should be a non-negativity constraint for all the scaling factors \({\alpha }_{i}, {\beta }_{j}\). However, from a practical point of view, \({\alpha }_{i}=0\) or \({\beta }_{j}=0\) implies that a positive OD flow of the seed OD-matrix from a certain origin or to a certain destination would become null. Therefore, considering that the seed OD-matrix in Eq. 7.167 comes from reliable information on mobility, the scaling factors cannot be null and the lower bound should therefore be larger than zero.

If the quality of the observed seed matrix \({X}^{0}\) is questionable due to the conditions in which GPS data have been collected, (this could be the case for some commercial GPS data, as mentioned) but the historical matrix \({X}^{H}\) is very reliable, both matrices could be fused to generate an improved seed matrix (Ros-Roca et al. 2021b).

7 Measuring the Quality of the OD Estimates

A critical question when estimating an OD-matrix is how the quality of the resulting estimate can be assessed. This quality has usually been assessed in terms of the convergence of the objective function and the \({R}^{2}\) fit between measured and simulated traffic counts at links with counting stations. From the optimization point of view, these measures are a reasonable choice because they show explicitly that the method works for the purpose it was designed for, minimizing the objective function of the OD-matrix estimation problem, and they verify that the estimated OD acceptably replicates the observed flows. However, despite \({R}^{2}\) being a good indicator of how the optimization problem is performing, it can produce misleading results. For example, a high regression fit may be achieved while the resulting estimated OD-matrix does not match the reality of the demand pattern and the internal mobility of the study area. Therefore, other indicators that evaluate the mobility patterns in the OD-matrices are needed.

Conventional goodness-of-fit indicators do not pay any attention to the quality of the results from a structural point of view. In other words, they do not distinguish whether the traffic OD patterns resulting from the adjustment approach exhibit an acceptable degree of structural similarity to the historical OD-matrix (when a reliable one is available), or whether the used approach provides a perturbed matrix that, even fitting the observed link flows, is structurally different. In the latter case, it is doubtful that such a structural change could be physically interpreted in terms of the underlying transportation system, particularly when the increases or decreases in the total number of trips between transportation zones are inconsistent with the socioeconomic attributes of the zones generating or attracting them. Looking at the link-path relationships visualized on the right-hand side of Fig. 7.3, the optimization process used to solve the DODME problem may locally behave as a retrogression model, pulling the OD flows back and forth in paths crossing the link with the counting station in order to fit measured and simulated flows as well as possible. This would just be a consequence of a numerical procedure that ignores the underlying structure of the modeled reality.

A widely used proposal has been to resort to other goodness-of-fit indicators, like the Mean Square Error (MSE) and similar ones (Hollander and Liu 2008). Other approaches consider alternative formulations of the objective function in terms of a distance function between the historical and the estimated OD-matrices. Classical distances between vectors can be applied to matrices by considering these matrices \({X}^{H}, X\in {\mathcal{M}}_{n}({\mathbb{R}})\) as vectors of \({\mathbb{R}}^{n\times n}\). Euclidean, Manhattan, and other vector distances can be used in the objective function of the OD-estimation problem aimed at minimizing the distance between matrices. However, these metrics, although comparing the OD-matrices cell by cell, cannot capture the differences and similarities of many aspects, such as their structure. Therefore, the spatio-temporal similarities of OD-matrices are not captured by these measures (Djukic 2014), and it seems clear that alternatives to these vector measures must be used. Djukic (2014) or Behara (2019) present a reference matrix \({M}_{R}\), which could be considered a hypothetical ground truth matrix \({X}^{GT}\), and two additional matrices \({M}_{1}\) and \({M}_{2}\) generated by perturbations of that reference matrix, such that they clearly have different structures but are indistinguishable in terms of measures like MSE. The example in Fig. 7.9 illustrates this situation. Let us consider the three matrices \({M}_{R}\), \({M}_{1}\), and \({M}_{2}\), the reference and perturbed matrices, respectively, generated following the guidelines of Djukic (2014).

Fig. 7.9 Comparing matrices with the same MSE and different structures

Comparing \({M}_{R}\), \({M}_{1}\) and \({M}_{2}\) in terms of MSE, the results are that MSE (\({M}_{R}\), \({M}_{1}\)) = MSE (\({M}_{R}\), \({M}_{2}\)) = 16. Therefore, MSE does not help to discriminate which of the two matrices \({M}_{1}\) and \({M}_{2}\) is closer to \({M}_{R}.\)

Djukic (2014) proposes a measure of structural similarity based on the Image Quality Assessment process for comparing two different images (Wang et al. 2004). This measure is the Structural SIMilarity index (SSIM) for a matrix of pixels, that is, the product of three different comparison components: luminance, contrast, and structure. Luminance corresponds to the intensity of illumination, which is indeed the mean of the pixels in a sub-matrix. Contrast is the average squared deviation between pixels once the luminance is removed, that is, the standard deviation. Finally, the structure is compared by using the covariance between the two matrices. These three factors are first transformed so as to adjust them to the interval \([0,1]\), where 1 means perfect match and 0 means no match. SSIM is therefore a similarity measure that is independent of the magnitude of the values in the matrix. Equation 7.169 summarizes this explanation:

$$SSIM\left(x, y\right)=l{\left(x,y\right)}^{\alpha }*c{\left(x,y\right)}^{\beta }*s{\left(x, y\right)}^{\gamma }$$
(7.169)

where luminance, contrast, and structure are, respectively, defined by Eqs. 7.170–7.172:

$$l\left(x,y\right)=\frac{2{*\mu }_{x}*{\mu }_{y}+{C}_{1}}{{\mu }_{x}^{2}+{\mu }_{y}^{2}+{C}_{1}}$$
(7.170)
$$c\left(x,y\right)= \frac{2{*\sigma }_{x}*{\sigma }_{y}+{C}_{2}}{{\sigma }_{x}^{2}+{\sigma }_{y}^{2}+{C}_{2}}$$
(7.171)
$$s\left(x,y\right)=\frac{{\sigma }_{xy}+{C}_{3}}{{\sigma }_{x}{\sigma }_{y}+{C}_{3}}$$
(7.172)

and \({\mu }_{x}, {\sigma }_{x}\) and \({\mu }_{y}, {\sigma }_{y}\) are the means and standard deviations of the vectors \(x\) and \(y\), respectively, while \({\sigma }_{xy}\) is their covariance. \({C}_{1},{C}_{2}, {C}_{3}\) are stability constants aimed at avoiding numerical problems, typically set to \({C}_{1}={C}_{2}=2*{C}_{3}=1\). For their part, \(\alpha , \beta , \gamma\) are weighting coefficients typically set to 1 (Wang et al. 2004). In image comparison, because pixel proximity is crucial in image pattern recognition, Wang et al. (2004) propose to first generate sliding submatrices of dimension \(N\) entirely covering the image, then compute the SSIM index for each of them and, finally, calculate the MSSIM as the mean of the SSIM values over all submatrices of dimension \(N\). Djukic (2014) assimilates the OD-matrix to an image whose pixels are the OD cells and explores various alternatives for generating these sliding windows in terms of proximities. Behara (2019) and Behara et al. (2020) propose a procedure to generate them based on the geographical structure of the area spanned by the transport system under study. Ros-Roca et al. (2020) propose to use rectangular sliding windows as submatrices corresponding to either rows or columns of the OD-matrix. In any case, SSIM captures the similarity between these distributions by considering the mean, the variance, and the structure of the departing and arriving trip distributions, all of which correspond to the structural properties of the trip patterns described by the OD-matrix.
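A minimal Python sketch of Eqs. 7.169–7.172 follows; the function name and the default constants (\({C}_{1}={C}_{2}=1\), \({C}_{3}=0.5\), \(\alpha =\beta =\gamma =1\), as stated above) are choices of this illustration, not part of the original formulation:

```python
import numpy as np

def ssim(x, y, C1=1.0, C2=1.0, C3=0.5, alpha=1.0, beta=1.0, gamma=1.0):
    """SSIM of two windows x, y (Eqs. 7.169-7.172), flattened to vectors."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    sig_x, sig_y = x.std(), y.std()
    sig_xy = np.mean((x - mu_x) * (y - mu_y))                  # covariance
    l = (2 * mu_x * mu_y + C1) / (mu_x**2 + mu_y**2 + C1)      # luminance (7.170)
    c = (2 * sig_x * sig_y + C2) / (sig_x**2 + sig_y**2 + C2)  # contrast (7.171)
    s = (sig_xy + C3) / (sig_x * sig_y + C3)                   # structure (7.172)
    return (l ** alpha) * (c ** beta) * (s ** gamma)           # SSIM (7.169)
```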

Furthermore, let us assume that the number of generated submatrices is \({N}_{s}\), that \(a\) and \(b\) are, respectively, the corresponding windows of the matrices \(A\) and \(B\) to compare, and that SSIM(a, b) is their similarity value. Then, if MSSIM is \(SSIM(a,b)\) averaged over the \({N}_{s}\) sliding windows, a key question arises: should all windows have the same weight, or does their role in the total demand require different weights? In the case of OD-matrices, it is obvious that not all origins or destinations are equivalent in a transport network. Therefore, a weighted MSSIM, as in Wang and Simoncelli (2008), prioritizes those origins and destinations with more impact on the network. This weighted average is defined as in Eq. 7.173:

$$MSSIM\left(A,B\right)=\frac{\sum_{i=1}^{{N}_{s}}w\left({a}_{i},{b}_{i}\right)*SSIM\left({a}_{i},{b}_{i}\right)}{\sum_{i=1}^{{N}_{s}}w\left({a}_{i},{b}_{i}\right)}$$
(7.173)

where \({a}_{i}, {b}_{i}\) are, respectively, the \(i\)-th windows of \(A\) and \(B\), while the weight \(w\left({a}_{i},{b}_{i}\right)\) is given by Eq. 7.174:

$$w\left({a}_{i},{b}_{i}\right)=\mathrm{log}\left[\left(1+\frac{{\sigma }_{{a}_{i}}^{2}}{{C}_{2}}\right)*\left(1+\frac{{\sigma }_{{b}_{i}}^{2}}{{C}_{2}}\right)\right]$$
(7.174)

The weighting factors for the sliding windows account, in the case of OD-matrices, for the variances of the selected windows which, given how the windows are defined, represent the variance of the trips from one origin to all destinations or from all origins to one destination. Experience with MSSIM in addition to the conventional performance indicators (e.g., Djukic 2014; Behara et al. 2020; Ros-Roca et al. 2020, 2021b) has shown that the usual \({R}^{2}\) goodness of fit between observed and simulated link flows must be carefully complemented with an MSSIM analysis in order to check the structural quality of the estimated OD-matrix \(X\), whenever an acceptable historical \({X}^{H}\) conveying reliable structural information on the OD patterns is available.
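As an illustration, the sketch below computes the weighted MSSIM of Eqs. 7.173–7.174 using rows and columns of the OD-matrices as rectangular sliding windows, in the spirit of Ros-Roca et al. (2020). It reuses the ssim function from the previous sketch; the window choice and function names are assumptions of this example:

```python
import numpy as np

def mssim_weighted(A, B, C2=1.0):
    """Weighted MSSIM (Eqs. 7.173-7.174) over row and column windows.

    Assumes the ssim() function from the previous sketch is in scope.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Rectangular windows: each row (one origin to all destinations)
    # and each column (all origins to one destination).
    windows = [(A[i, :], B[i, :]) for i in range(A.shape[0])]
    windows += [(A[:, j], B[:, j]) for j in range(A.shape[1])]
    num, den = 0.0, 0.0
    for a, b in windows:
        w = np.log((1 + a.var() / C2) * (1 + b.var() / C2))  # weight (7.174)
        num += w * ssim(a, b)
        den += w
    return num / den  # weighted average of window SSIMs (7.173)
```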

Comparing again \({M}_{R}\), \({M}_{1}\), and \({M}_{2}\), now in terms of MSSIM, the results are MSSIM(\({M}_{R}\), \({M}_{1}\)) = 0.914882 and MSSIM(\({M}_{R}\), \({M}_{2}\)) = 0.510276, which clearly shows that \({M}_{2}\) is structurally different from \({M}_{R}\).

The relevance of this structural similarity measure (Behara et al. 2020, 2021) led to its explicit inclusion in the objective function of the mathematical model for DODME, reformulated as follows (Eqs. 7.175–7.179):

$$\begin{aligned} {\text{Min}}\,\,Z\left( X \right) & = \frac{1}{2}*\left[ {\left( {c_{1} + \left( {Y - \hat{Y}} \right)^{T} *\left( {Y - \hat{Y}} \right)} \right)} \right] \\ & \quad \quad \quad *\left[ {\left( {c_{2} + f\left( {s,\hat{s}} \right)} \right)^{T} *\left( {c_{2} + f\left( {s,\hat{s}} \right)} \right)} \right] \\ \end{aligned}$$
(7.175)
$$Y=A*X$$
(7.176)
$$s=Q*X$$
(7.177)
$$f\left(s,\widehat{s}\right)= \frac{1-\rho \left(s,\widehat{s}\right)}{2}$$
(7.178)
$$\rho \left(s,\widehat{s}\right)=\frac{{\left(\widehat{s}-{\mu }_{\widehat{s}}\right)}^{T}*\left(s-{\mu }_{s}\right)}{\sqrt{{\left(\widehat{s}-{\mu }_{\widehat{s}}\right)}^{T}*\left(\widehat{s}-{\mu }_{\widehat{s}}\right)}*\sqrt{{\left(s-{\mu }_{s}\right)}^{T}*\left(s-{\mu }_{s}\right)}}$$
(7.179)

where \(A\) is the assignment matrix, \(Y\) and \(\widehat{Y}\) are, respectively, the estimated and the observed link flows at links with counting stations, and \(s\) and \(\widehat{s}\) denote, respectively, the estimated and the observed flows on subpaths detected by Bluetooth (or Wi-Fi) antennas. For their part, \(Q\) is the corresponding subpath assignment matrix, while \({c}_{1}\) and \({c}_{2}\) are stabilizing constants. The algorithmic approach assumes that \(A\) and \(Q\) are locally constant.
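A minimal sketch of how such an objective could be evaluated is given below, under two stated assumptions: the OD-matrix \(X\) is handled as a flattened vector, and the second factor of Eq. 7.175 is read as the square of the scalar \({c}_{2}+f(s,\widehat{s})\), since \(f\) is scalar by Eqs. 7.178–7.179. Function names are illustrative only:

```python
import numpy as np

def pearson(s, s_hat):
    """Pearson correlation between estimated and observed subpath flows (Eq. 7.179)."""
    s, s_hat = np.asarray(s, dtype=float), np.asarray(s_hat, dtype=float)
    ds, dh = s - s.mean(), s_hat - s_hat.mean()
    return (dh @ ds) / (np.sqrt(dh @ dh) * np.sqrt(ds @ ds))

def dodme_objective(X, A, Q, Y_hat, s_hat, c1=1.0, c2=1.0):
    """Evaluates Z(X) of Eq. 7.175 for a flattened OD vector X,
    with A and Q treated as locally constant assignment matrices."""
    Y = A @ X                                  # estimated link flows (7.176)
    s = Q @ X                                  # estimated subpath flows (7.177)
    f = (1.0 - pearson(s, s_hat)) / 2.0        # correlation distance (7.178)
    return 0.5 * (c1 + (Y - Y_hat) @ (Y - Y_hat)) * (c2 + f) ** 2
```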

8 Concluding Remarks

The main objective of this chapter has been to highlight the role of two key components of the engine of most traffic management and information systems. First, a Dynamic Traffic Model, usually a DTA or DUE, which is quite frequently supported by a Network Loading process based on a mesoscopic traffic simulation approach. Second, a Dynamic Origin–Destination Matrix Estimator (DODME) that suitably models the time-dependent mobility patterns. The main goals of these components are the estimation of the traffic state in the managed road network and its short-term prediction, accounting for the impacts of external events, like traffic incidents, that change the operational conditions in the network. Travel times are one of the main outputs describing these states for both managers and travelers in the network. Figure 7.10 conceptually summarizes a generic architecture of a traffic management and information system, highlighting the role of these two key components and their interactions since, as discussed in this chapter, the main input to a DTA or DUE is a dynamic OD-matrix, and DODME procedures usually rely on information generated by a DTA.

Fig. 7.10
figure 10

Generic architecture of a traffic management and information system highlighting the role of the two main engine components: a DODME and a DTA/DUE

This chapter has also provided an overview of the main approaches to both models, DODME and DTA/DUE, and their relationships. The role of one critical component, the dynamic assignment matrix, has been discussed extensively. This matrix describes how, over time, the traffic flows on the paths from origins to destinations use the links of the network. The possibility of exploiting the huge amount of traffic data supplied by ICT applications, which allows reproducing the assignment matrix empirically from data instead of deriving it from models, in the direction of data-driven modeling, has also been addressed. This trend is intellectually very appealing and, in fact, is currently leaving the academic realm to enter the domain of real-life applications, as can be deduced from the latest versions of some professional software platforms.