1 Introduction

One of the most popular approaches to travel forecasting in transportation networks is the four-step procedure (Dios Ortúzar and Willumsen 2011): sequential run of trip generation, trip distribution, modal split, and traffic assignment stages. However, this approach has a number of limitations, e.g. there is no convergence guarantee (Oppenheim 1995; Boyce et al. 1994; Boyce 2002).

To overcome this issue, there were proposed network equilibrium models (NE / combined models) which can be formulated as an optimization or, more generally, a variational inequality problem (Beckmann et al. 1956; De Cea et al. 2005). In particular, Evans (1976) reduced the problem of searching equilibrium in the case of one transport mode to a convex optimization problem, combining trip distribution and route assignment models. Authors of Florian and Nguyen (1978) made an extension to the multi-modal case, where destination and mode are chosen simultaneously with the same value of a calibration parameter. The first mathematical formulation of a network equilibrium model with hierarchical destination and mode choices was proposed in Fernández et al. (1994) — the approach was presented for modelling nested choice structure of trips using several modes (e.g. park’n ride trips). Abrahamsson and Lundqvist (1999) formulated a nested combined model where mode choice is conditioned by destination choice and demonstrated its application for the Stockholm region. The recent works (Chu 2018), Liu et al. (2018), and Gao et al. (2022) proposed the extensions of the combined models for the cases of modeling trip frequency, remote park-and-ride, and tourism demand, respectively.

Finding a solution in trip distribution and traffic assignment problems — whether they are considered separately in the four-step approach or combined into one network equilibrium problem — relies on numerical methods for convex optimization. E.g., a classic choice for the traffic assignment problem (which is the most computationally expensive part) is the Frank–Wolfe algorithm (Frank and Wolfe 1956), and for the trip distribution problem it is the Sinkhorn algorithm (Sinkhorn 1974). A class of path-based algortihms can be an alternative to the link-based Frank–Wolfe algorithm for solving traffic assignment problem: Chen et al. (2020), Xie et al. (2017), Babazadeh et al. (2020). A popular choice for solving an optimization problem in the above-mentioned combined models is a partial linearization algorithm of Evans (1976) and its modifications for multi-modal and multi-user cases (Abrahamsson and Lundqvist 1999; Boyce et al. 1983). Recently, in Yang et al. (2013); Fan et al. (2022); Zarrinmehr et al. (2019); Cabannes et al. (2019); Wang et al. (2022), improvements of these algorithms were presented. Also, in subsequent years there have been developed a lot of new optimization methods, in particular, accelerated gradient methods (Nesterov 2004, 2009, 2015), which can be applied to the described problems.

Another direction of research on travel modelling in recent years is related to capacitated transportation networks, which allow to overcome some limitations of the standard Beckmann traffic assignment model (Nesterov and De Palma 2003; Zokaei Aashtiani et al. 2021; De Cea et al. 2005; Wang et al. 2019; Smith et al. 2019; Anikin et al. 2020; Zhu et al. 2020).

For the best of our knowledge, there is no works considering the application of accelerated gradient methods to combined models.

In this paper, we consider:

  • An entropy-based trip distribution model with hierarchical choice structure (Wilson 1969; Fernández et al. 1994; Abrahamsson and Lundqvist 1999);

  • The Beckmann traffic assignment model with inelastic demand (Beckmann et al. 1956);

  • The stable dynamics traffic assignment model, where resulting flow distribution satisfy the network’s capacity constraints (Nesterov and De Palma 2003);

  • An NE model combining all the models mentioned above.

In the last case, we consider the nested combined model proposed in Abrahamsson and Lundqvist (1999), where transit and road networks are independent, and the transit network has constant travel costs. We extend it to the case of the stable dynamics model for traffic assignment.

We employ accelerated primal-dual gradient methods to solve corresponding optimization problems and compare their performances to the classic Sinkhorn, Frank–Wolfe, and generalised Evans algorithms. Also, we provide theoretical guarantees for their convergence rate.

The main contributions of the paper are the following:

  • We propose a way to solve the dual problem of the nested combined model of Abrahamsson and Lundqvist (1999) with a universal accelerated gradient method USTM (Gasnikov and Nesterov 2018);

  • We extend the nested combined model of Abrahamsson and Lundqvist (1999) to the case of capacitated networks: namely, we propose a way to solve the dual problem for searching equilibrium in combined trip distribution model with the nested choice structure and the stable dynamics traffic assignment model;

  • We provide theoretical upper bounds on the complexity of searching network equilibrium by the USTM algorithm.

  • We conducted numerical experiments comparing different algorithms on Moscow and Berlin transportation networks.

The paper is organized as follows. In Sect. 2, we give a general problem statement for a combined trip distribution, modal split, and traffic assignment model. In Sect. 3, we describe the primal-dual accelerated method to solve the NE problem and provide its convergence analysis. In Sects. 4 and 5, we describe optimization algorithms that we consider for separate traffic assignment and trip distribution models. Section 6 presents numerical experiments conducted on Moscow and Berlin transportation networks.

2 Problem statement

We start with the description of the Beckmann and the stable dynamics models for searching the road network user equilibrium. Similarly to Abrahamsson and Lundqvist (1999), we assume the road and the transit networks are independent, and there is no congestion effects in the transit network (its travel costs are constant and defined as the costs of the shortest routes). Then, in Sect. 2.2, we describe the trip distribution model with a hierarchical choice structure of destination and travel mode (by car, public transport, or on foot). And finally, in 2.3, we consider the combined trip distribution-modal split-assignment problem and formulate its dual problem.

2.1 Route assignment models

Let the urban road network be represented by a directed graph \({\mathcal {G}} = ( {\mathcal {V}}, {\mathcal {E}} )\), where vertices \({\mathcal {V}}\) correspond to intersections or centroids (Sheffi 1985) and edges \({\mathcal {E}}\) correspond to roads, respectively. Suppose we are given the travel demands: namely, let \(d_{ij}\)(veh/h) be a trip rate from origin i to destination j. We denote by \(P_{ij}\) the set of all simple paths from i to j. Respectively, \(P = \bigcup _{(i,j) \in OD} P_{ij}\) is the set of all possible routes for all origin–destination pairs OD. Agents traveling from node i to node j are distributed among paths from \(P_{ij}\), i.e. for any \(p \in P_{ij}\) there is a flow \(x_p \in {\mathbb {R}}_+\) along the path p, and \(\sum _{p \in P_{ij}} x_p = d_w\). Flows from origin nodes to destination nodes create the traffic in the entire network \({\mathcal {G}}\), which can be represented by an element of

$$\begin{aligned} X = X(d) = \Bigl \{x \in {\mathbb {R}}_{+}^{|P|}: \; \sum _{p \in P_{ij}} x_p = d_{ij}, \; (i, j) \in OD \Bigr \}. \end{aligned}$$
(1)

Note that the dimension of X can be extremely large: e.g. for \(n \times n\) Manhattan network \(\log |P| = \Omega (n)\). To describe a state of the network we do not need to know an entire vector x, but only flows on arcs:

$$\begin{aligned} f_e(x) = \sum _{p \in P} \delta _{e p} x_p \quad \text {for} \quad e \in {\mathcal {E}}, \end{aligned}$$

where \(\delta _{e p} = \mathbbm {1}\{e \in p\}\). Let us introduce a matrix \(\Theta\) such that \(\Theta _{e, p} = \delta _{e p}\) for \(e \in {\mathcal {E}}\), \(p \in P\), so in vector notation we have \(f = \Theta x\). To describe an equilibrium we use both path- and link-based notations (xt) or (ft).

Beckmann model (Beckmann et al. 1956; Patriksson 2015). One of the key ideas behind the Beckmann model is that the cost (e.g. travel time, gas expenses, etc.) of passing a link e is the same for all agents and depends solely on the flow \(f_e\) along it. In what follows, we denote this cost for a given flow \(f_e\) by \(t_e = \tau _e(f_e)\). In practice the BPR functions are usually employed (US Bureau of Public Roads 1964):

$$\begin{aligned} \tau _e(f_e) = {\bar{t}}_e \left( 1 + \rho \left( \frac{f_e}{{\bar{f}}_e}\right) ^{\frac{1}{\mu }} \right) , \end{aligned}$$
(2)

where \({\bar{t}}_e\) are free flow times, and \({\bar{f}}_e\) are road capacities of a given network’s link e. We take these functions with parameters \(\rho = 0.15\) and \(\mu = 0.25\).

Another essential point is a behavioral assumption on agents called the first Wardrop’s principle: we suppose that each of them knows the state of the whole network and chooses a path p minimizing the total cost

$$\begin{aligned} T_p(t) = \sum _{e \in p} t_e. \end{aligned}$$

The cost functions are supposed to be continuous, non-decreasing, and non-negative. Then \((x^*, t^*)\), where \(t^* = (t_e^*)_{e \in {\mathcal {E}}}\), is an equilibrium state, i.e. it satisfies conditions

$$\begin{aligned} t_e^* = \tau _e(f_e^*), \quad \text {where}\quad f^* = \Theta x^*, \\ x^*_{p_w} > 0 \Longrightarrow T_{p_{ij}}(t^*) = T_{ij}(t^*) = \min _{p \in P_{ij}} T_p(t^*), \end{aligned}$$

if and only if \(x^*\) is a minimum of the potential function:

$$\begin{aligned}&\Psi (x) = \sum _{e \in {\mathcal {E}}} \underbrace{\int _{0}^{f_e} \tau _e (z) d z}_{\sigma _e(f_e)} \longrightarrow \min _{f = \Theta x, \; x \in X} \\&\Longleftrightarrow \Psi (f) = \sum _{e \in {\mathcal {E}}} \sigma _e (f_e) \longrightarrow \min _{f = \Theta x : \; x \in X}, \end{aligned}$$
(B)

and \(t_e^* = \tau _e(f_e^*)\) (Beckmann et al. 1956).

Another way to find an equilibrium numerically is by solving a dual problem. We can construct it according to Theorem 4 from Nesterov and De Palma (2003), the solution of which is \(t^*\):

$$\begin{aligned} Q(t) = \sum _{{ij} \in OD} d_{ij} T_{ij}(t) - \underbrace{\sum _{e \in {\mathcal {E}}} \sigma _e^*(t_e)}_{h(t)} \longrightarrow \max _{t \ge {\bar{t}}}, \end{aligned}$$
(DualB)

where

$$\begin{aligned} \sigma _e^*(t_e) = \sup _{f_e \ge 0} \{t_e f_e - \sigma _e(f_e) \} = {\bar{f}}_e \left( \frac{t_e - {\bar{t}}_e}{{\bar{t}}_e \rho } \right) ^{\mu } \frac{\left( t_e - {\bar{t}}_e \right) }{1 + \mu } \end{aligned}$$

is the conjugate function of \(\sigma _e(f_e)\), \(e \in {\mathcal {E}}\).

When we search for the solution to this problem numerically, on every step of an applied method we can reconstruct primal variable f from the current dual variable t: \(f \in \partial \sum _{(i,j) \in OD} d_{ij} T_{ij}(t)\). Then we can use the duality gap — which is always nonnegative — for the estimation of the method’s accuracy:

$$\begin{aligned} \Delta (f, t) = \Psi (f) - Q(t). \end{aligned}$$

It vanishes only at the equilibrium \((f^*, t^*)\).

Stable dynamics model. Nesterov and De Palma (2003) proposed an alternative model called the stable dynamics model, which takes an intermediate place between static and dynamic network assignment models. Namely, its equilibrium can be interpreted as the stationary regime of some dynamic process. Its key assumption is that we no longer introduce a complex dependence of the travel cost on the flow (as in the standard static models), but only pose capacity constraints, i.e. the flow value on each link imposes the feasible set of travel times

$$\begin{aligned} \tau _e(f_e) = {\left\{ \begin{array}{ll} {\bar{t}}_e, &{} 0 \le f_e < {\bar{f}}_e,\\ \left[ {\bar{t}}_e, \infty \right] , &{} f_e = {\bar{f}}_e,\\ +\infty , &{} f_e > {\bar{f}}_e. \end{array}\right. } \end{aligned}$$
(3)

Unlike in the Beckmann model, there is no one-to-one correspondence between equilibrium travel times and flows on the links of the network. There are examples in Nesterov and De Palma (2003) illustrating the difference. Also, one can find in Chudak et al. (2007) a detailed comparison of equilibria in these two models conducted for large and small networks.

Hence, an equilibrium state \((x^*, t^*)\) of the stable dynamics model satisfies the next conditions:

$$\begin{aligned} t_e^* \in \tau _e(f_e^*), \\ x^*_{p_{ij}} > 0 \Longrightarrow T_{p_{ij}}(t^*) = T_{ij}(t^*). \end{aligned}$$

The above formula can be reformulated in terms of an optimization problem. The pair \((f^*, t^*)\) is an equilibrium if and only if it is a solution of the saddle-point problem

$$\begin{aligned} \sum _{e \in E} [t_e f_e - (t_e - {\bar{t}}_e) {\bar{f}}_e] \longrightarrow \min _{\begin{array}{c} f = \Theta x: \\ x \in X \end{array}} \max _{t_e \ge {\bar{t}}_e}, \end{aligned}$$
(SaddleSD)

where its primal problem is

$$\begin{aligned} \Psi (f) = \sum _{e \in {\mathcal {E}}} f_e {\bar{t}}_e \longrightarrow \min _{\begin{array}{c} f = \Theta x : \\ x \in X,\, f_e \le {\bar{f}}_e \end{array}}, \end{aligned}$$
(SD)

and its dual problem is

$$\begin{aligned} Q(t) = \sum _{(i,j) \in OD} d_{ij} T_{ij}(t) - \underbrace{\langle t - {\bar{t}}, {\bar{f}}\rangle }_{h(t)} \longrightarrow \max _{t_e \ge {\bar{t}}_e}. \end{aligned}$$
(DualSD)

In contrast with the Beckmann model, the equilibrium state in the stable dynamics model is defined by pair \((f^*, t^*)\) (in particular, it differs from the system optimum \((f^*, {\bar{t}})\) in the model only by the time value).

In both cases the dual problem has form

$$\begin{aligned} Q(t) = \sum _{(i,j) \in OD} d_{ij} T_{ij}(t) - h(t) \longrightarrow \max _{t \ge {\bar{t}}}. \end{aligned}$$

The optimization problem is convex, non-smooth and composite.

2.2 Trip distribution with modal split (D-MS)

Let us further assume that there are several trip purposes (demand layers), travel modes (transportation modes), and agents types. We use the logit model with calibration parameters \(\alpha _{am}\), \(\beta _{am}\) corresponding to choices of travel mode m by agents of the type a. Further, we consider the case when \(\alpha _{am}\) values are the same for all travel modes of agent type a:

$$\begin{aligned} \alpha _{am} {:}{=}\alpha _a. \end{aligned}$$

Necessity of this condition will be explained below.

For example, if we want to make travelling by car (travel mode \(m_1\)) unavailable for non-car-owners \(a_1\), we can set \(\beta _{a_1 m_1} {:}{=}\inf\) to get zero trips \(d^{a_1m_1}=0\). Thus, for every agent type a we can implicitly set its group (nest) of available travel modes.

To define destination choice model, we use the entropy-based trip distribution model of Wilson (1969). For every trip purpose r (e.g., home-work, home-other) we define calibration parameter \(\gamma _r\). This parameter defines the sensitivity of agents with the trip purpose r to trip length.

According to Abrahamsson and Lundqvist (1999), Fernández et al. (1994), we consider the following problem:

$$\begin{aligned} \sum _{i, j, r, a, m} d_{ij}^{ram} T_{ij}^m + \underbrace{\sum _{i, j, r, a} \frac{1}{\gamma _r} d_{ij}^{ra} \ln d_{ij}^{ra} + \sum _{i, j, r, a, m} \frac{1}{\alpha _a} d_{ij}^{ram} \left( \ln \left( \frac{d_{ij}^{ram}}{d_{ij}^{ra}} \right) + \beta _{am} \right) }_{H(d)} \rightarrow \min _{d \in \Pi '(l, w)}, \end{aligned}$$
(P1)

where

$$\begin{aligned} \Pi '(l, w) = \left\{ d_{ij}^{ram}\ge 0: \sum _{j, m} d_{ij}^{ram} = l_i^{ra}, \sum _{i, a, m} d_{ij}^{ram} = w_j^{r} \right\} , \end{aligned}$$

\(d_{ij}^{ram}\) is a number of trips from zone i to j by travel mode m of agents type a with trip purpose r and \(d_{ij}^{ra} = \sum _m d_{ij}^{ram}\); \(l_i^{ra}\) is a number of production from zone i of agents type a with trip purpose r; \(w_j^{r}\) is a number of attractions to zone j of the trip purpose r.

This is the combined trip distribution-modal split (D-MS) problem, where the choice structure is nested: travel mode choice is conditioned by destination choice (Abrahamsson and Lundqvist 1999). If \(\gamma _r\) and \(\alpha _a\) are equal, then (P1) reduces to the problem that also corresponds to D-MS model with simultaneous choices of destination and travel mode with the same calibration parameters (Florian and Nguyen 1978; Abrahamsson and Lundqvist 1999).

For fixed values \(d_{ij}^{ra}\), it is straightforward to check that the optimal \(d_{ij}^{ram}\) satisfy the following relation:

$$\begin{aligned} {{\,\textrm{prob}\,}}_{ij}^{ram} = \frac{d_{ij}^{ram}}{d_{ij}^{ra}} = \frac{\exp \left( - \alpha _a T_{ij}^m - \beta _{am}\right) }{ \sum _{m'} \exp \left( -\alpha _a T_{ij}^{m'} - \beta _{am'}\right) }, \end{aligned}$$
(4)

where \(T_{ij}^{am} = T_{ij}^m + \frac{\beta _{am}}{\alpha _a}\), i.e., the modal split corresponds to the logit model. Moreover,

$$\begin{aligned} \min _{d_{ij}^{ram}} \left( \sum _{i,j,r,a,m} d_{ij}^{ram}T_{ij}^{am}+ \frac{1}{\alpha _a} \sum _{i,j,r,a,m} d_{ij}^{ram}\ln \frac{d_{ij}^{ram}}{d_{ij}^{ra}} \right) = d_{ij}^{ra}\underbrace{\left( - \frac{1}{\alpha _a} \ln \sum _{m} \exp \left( - \alpha _a T_{ij}^{am}\right) \right) }_{T_{ij}^a}, \end{aligned}$$

where \(T_{ij}^a\) is a composite travel cost for agents of type a.

Substituting \(d_{ij}^{ram}= {{\,\textrm{prob}\,}}_{ij}^{ram}d_{ij}^{ra}\) we reduce the problem (P1) to

$$\begin{aligned} E(d, T) = \sum _{i,j,r,a} d_{ij}^{ra}T_{ij}^a+ \sum _{i,j,r,a} \frac{1}{\gamma _r} d_{ij}^{ra}\ln d_{ij}^{ra}\rightarrow \min _{d \in \Pi (l, w)}, \end{aligned}$$
(E)

where

$$\begin{aligned} \Pi (l,w) = \left\{ d \ge 0: \sum _{j} d_{ij}^{ra}= l_i^{ra}, \sum _{i,a} d_{ij}^{ra}= w_j^{r} \right\} , \end{aligned}$$

and \(T_{ij}^a = - \frac{1}{\alpha _a} \ln \sum _m \exp \left( - \alpha _a T_{ij}^{m}- \beta _{am} \right)\).

Let us derive its dual problem. In our problem statement, the system of constraints \(\Pi '(l, w)\) is consistent \(\sum _{i,a} l_i^{r,a} = \sum _{j} w_j^r\). Therefore Gasnikov et al. (2015), we can introduce a tautological constraint

$$\begin{aligned}&\sum _{i,j,a} d^{ra}_{ij} = \sum _{i,a} l_i^{r,a} = \sum _{j} w_j^r = N_{r}. \end{aligned}$$
(5)

We will utilize tautological constraint (5) to obtain dual function with bounded subgradient norm.

$$\begin{aligned}&\min _{d \in \Pi (l, w)} \sum _{i,j,r,a} d_{ij}^{ra}T_{ij}^a+ \sum _{i,j,r,a} \frac{1}{\gamma _r} d_{ij}^{ra}\ln d_{ij}^{ra}\nonumber \\ {}&= \min _{\begin{array}{c} {d^{ra}_{ij}}\ge 0\\ \sum _{i,j,a} {d^{ra}_{ij}}= N_r \end{array}} \max _{\lambda \ge 0} \sum _r \frac{1}{\gamma }_r \sum _{i,j,a} {d^{ra}_{ij}}\ln {d^{ra}_{ij}}+ \sum _{i,j,a} {d^{ra}_{ij}}T_{ij}^a+ \sum _{i,a} \lambda ^l_{rai} \left( \sum _j {d^{ra}_{ij}}- l_i^{ra} \right) \nonumber \\ {}&+ \sum _{j} \lambda ^w_{rj} \left( \sum _{i,a} {d^{ra}_{ij}}- w^r_j \right) \nonumber \\ {}&= \max _{\lambda \ge 0} \sum _r \min _{\begin{array}{c} {d^{ra}_{ij}}\ge 0\\ \sum _{i,j,a} {d^{ra}_{ij}}= N_r \end{array}} \frac{1}{\gamma }_r \sum _{i,j,a} {d^{ra}_{ij}}\ln {d^{ra}_{ij}}+ \sum _{i,j,a} {d^{ra}_{ij}}T_{ij}^a+ \sum _{i,a} \lambda ^l_{rai} \left( \sum _j {d^{ra}_{ij}}- l_i^{ra} \right) \nonumber \\ {}&+ \sum _{j} \lambda ^w_{rj} \left( \sum _{i,a} {d^{ra}_{ij}}- w^r_j \right) \nonumber \\ {}&= -\min _{\lambda \ge 0} \varphi (\lambda ^l, \lambda ^w, T), \end{aligned}$$
(6)

where by \(\varphi (\lambda ^l, \lambda ^w, T)\) we denoted the negative of the dual function.

Let y be the dual variable for the tautological constraint \(\sum _{i,j,a} d^{ra}_{ij} = N_{r}\). Then taking the gradient by d we get one of the optimality conditions for the inner minimization problem:

$$\begin{aligned} \frac{1}{\gamma _r}\left( \ln {d^{ra}_{ij}}+ 1 \right) + T_{ij}^a+ \lambda ^l_{rai} + \lambda _{rj}^w + y = 0, \end{aligned}$$

therefore

$$\begin{aligned} {d^{ra}_{ij}}= \exp \left( -1 - \gamma _r \left( T_{ij}^a+ \lambda ^l_{rai} + \lambda _{rj}^w + y \right) \right) . \end{aligned}$$

By choosing y such that \({d^{ra}_{ij}}\) satisfies \(\sum _{i,j,a} d^{ra}_{ij} = N_{r}\) we obtain

$$\begin{aligned} d^{ra}_{ij}(\lambda ^l_r, \lambda ^w_r, T) = \frac{N_r \exp \left( -\gamma _r\left( T^{a}_{ij} + \lambda ^l_{rai} + \lambda ^w_{rj} \right) \right) }{\sum _{i,j,a}\exp \left( -\gamma _r\left( T^{a}_{ij} + \lambda ^l_{rai} + \lambda ^w_{rj} \right) \right) }. \end{aligned}$$
(7)

Substituting this into (6) yields

$$\begin{aligned} \varphi (\lambda ^l, \lambda ^w, T) = \sum _r \frac{N_r}{\gamma _r} \ln {\frac{1}{N_r}\sum _{i,j,a}\exp \left( -\gamma _r\left( T_{ij}^a+ \lambda ^l_{rai} + \lambda ^w_{rj} \right) \right) } + \sum _{i,a} \lambda _{rai}^l l_i^{ra} + \sum _j \lambda _{rj}^w w^r_j. \end{aligned}$$
(8)

2.3 Combined distribution-modal split-assignment problem (D-MS-A)

Now, we combine the road, the transit, and the pedestrian networks into one multi-modal network, which we denote again by \({\mathcal {G}} = ( {\mathcal {V}}, {\mathcal {E}} )\). Slightly abusing notations, in the same way as in Sect. 2.1 we can define the set of path flows X(d) corresponding to an interzonal trip matrix \(d \in \Pi '(l, w)\), and link flows \(f_e^{ram}\), \(f_e^m = \sum _{r,a} f_e^{ram}\), and \(f_e = \sum _{m} f_e^m\).

According to Abrahamsson and Lundqvist (1999), the combined distribution-modal split-assignment problem can be formulated as follows:

$$\begin{aligned} P_3(f,d) = \Psi (f) + H(d) \rightarrow \min _{\begin{array}{c} f = \Theta x, \; x \in X(d)\\ d \in \Pi '(l,w) \end{array}}, \end{aligned}$$
(P3)

where

$$\begin{aligned} \Psi (f) = \sum _{e \in {\mathcal {E}}} \left( \sigma _e(f_e) + \sum _{m \in V} c_e^{m} f_e^{m}\right) . \end{aligned}$$

Similarly to Sect. 2.1 we obtain from (P3) the saddle-point problem

$$\begin{aligned} S_3'(d, t) = \sum _{i,j,r,a,m} d_{ij}^{ram}T_{ij}^{m}(t) - h(t) + H(d) \rightarrow \min _{d \in \Pi '(l, w)} \ \max _{t \ge {\bar{t}}}, \end{aligned}$$
(S3')

where \(T_{ij}^{m}(t)\) is the minimal cost of the path from \(i \in O\) to \(j \in D\) with the links cost \(t_e + c_e^m\). According to Sect. 2.2, the above problem reduces to

$$\begin{aligned} S_3(d,t) = \underbrace{\sum _{i,j,r,a} d_{ij}^{ra}T_{ij}^a(t) + \sum _{i,j,r,a} \frac{1}{\gamma _r} d_{ij}^{ra}\ln d_{ij}^{ra}}_{ E(d, T(t))} - h(t) \rightarrow \min _{d \in \Pi (l,w)} \max _{t \ge {\bar{t}}}, \end{aligned}$$
(S3)

where \(T_{ij}^a(t) = - \frac{1}{\alpha _a} \ln \left( \sum _m \exp \left( - \alpha _a T_{ij}^{m}(t) - \beta _{am} \right) \right)\).

Respectively, the dual problem is

$$\begin{aligned} D_3(\lambda , t) = -\varphi (\lambda ^l, \lambda ^w, T(t)) - h(t) \longrightarrow \max _{t \ge {\bar{t}}, \; \lambda ^l, \lambda ^w}. \end{aligned}$$
(D3)

Thus, there are several ways to formulate an optimization problem. In this paper, we consider the following particular formulation of the problem and further provide convergence analysis of the accelerated gradient method application to it:

$$\begin{aligned} D'_3 (t) = - \varphi _3(t) - h(t) \longrightarrow \max _{t \ge {\bar{t}}}, \end{aligned}$$
(D3')

where

$$\begin{aligned} \varphi _3(t) = \min _{\lambda } \varphi (\lambda , T(t)) = - \min _{d \in \Pi (l,w)} E(d, T(t)). \end{aligned}$$

3 Dual approach for solving the combined model

In Sect. 3.1, we describe the universal gradient method of similar triangles (USTM) for solving the dual problem (D3’) of the described combined model. And we provide its convergence analysis in Sect. (3.2).

3.1 Dual method for NE problem

Algorithm 1
figure a

Universal Method of Similar Triangles

A popular approach for searching equilibrium in combined models is the partial linearization algorithm of Evans (1976) and its modifications for multi-modal and multi-user cases (Abrahamsson and Lundqvist 1999; Boyce et al. 1983). The approach is further developed in Yang et al. (2013) by incorporating better line-search procedures. Note that the algorithm can be viewed as partly dual, because it is formulated in terms external to the primal problem: it includes cost matrices \(T_{ij}\), which are the dual variables of the saddle-point problem (S3) (or (S3’)) or the dual problem (D3). But still, the algorithm is essentially primal, since it optimizes (P3) by flow and trip distribution pair.

Here we propose an alternative approach based on solving the dual problem (D3’) with the universal method of similar triangles (USTM), and afterwards we prove the convergence rates for it.

Algorithm 1 provides the pseudocode of USTM with an inexact oracle and the euclidean prox-structure. Here we used the following notations:

$$\begin{aligned} \phi _0(t) = \frac{1}{2} \Vert t - t^0\Vert _2^2, \\ \phi _{k+1}(t) = \phi _k(t) + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \left\langle {\tilde{\nabla }} \Phi (y^{k+1}), t - y^{k+1} \right\rangle + h(t) \right] . \end{aligned}$$

Note that we did not specify the stopping criterion as it can be different for different models (Fan et al. 2022).

To find a network equilibrium in D-MS-A model, we apply USTM to minimize the composite objective \(-D'_3(t) = \varphi _3(t) + h(t)\) in (D3’), thus we set \(\Phi (t) {:}{=}\varphi _3(t)\) in Algorithm 1. Recall that

$$\begin{aligned} \varphi _3(t) = - \min _{d \in \Pi (l, w)} E(d, T(t)). \end{aligned}$$

Note that at each iteration we need to compute \(\varphi _3\) and \(\nabla \varphi _3\) with travel costs \(y^{k+1}\), \(t^{k+1}\). Since this itself is done by an iterative procedure, we cannot expect to find the exact solution of the subproblem. Instead, we use the following inexact oracle for \(\varphi _3\):

$$\begin{aligned} {\tilde{\varphi }}_3(t) = \varphi _{3,\delta }(t) = - E({\tilde{d}}(t), T(t)), \quad {\tilde{\nabla }} \varphi _3(t) = \nabla _\delta \varphi _3(t) = - \nabla _t E({\tilde{d}}(t), T(t)), \end{aligned}$$

where \(\delta \ge 0\) and \({\tilde{d}}(t) = d_\delta (t) \in \Pi (l, w)\) (see Sect. 5.4) is a \(\delta\)-solution of \(\min _d E(d, T(t))\), i.e.

$$\begin{aligned} E(d_\delta (t), T(t)) \le \min _{d \in \Pi (l, w)} E(d, T(t)) + \delta . \end{aligned}$$

Recall that E(dT(t)) is concave w.r.t. t, and its superdifferential \(\partial _t E(d, T(t))\) is given by

$$\begin{aligned} \partial _t E(d, T(t)) = \sum _{i,j,r,a} d_{ij}^{ra} \partial T_{ij}^a(t). \end{aligned}$$

Further, \(d_{ij}^{ra} \partial T^a_{ij}(t) = \sum _m d_{ij}^{ram} \partial T_{ij}^m(t)\), and

$$\begin{aligned} \partial T_{ij}^m(t) = \partial \min _{a_p^m \in P_{ij}^m} \langle t, a_{pm} \rangle = \textrm{Conv}\left\{ a_p^m \in P_{ij}^m: \langle t, a_{pm} \rangle = T_{ij}^m(t)\right\} , \end{aligned}$$

where \(a_p^m \in \{0,1\}^{|{\mathcal {E}}|}\) is a binary vector encoding a path p for the travel mode m. Note that several shortest paths may exist. Finally, we get that

$$\begin{aligned} \partial _t E(d, T(t)) = \sum _{i,j,r,a,m} d_{ij}^{ram} \textrm{Conv}\left\{ a_p^m \in P_{ij}^m: \langle t, a_{pm} \rangle = T_{ij}^m(t)\right\} , \end{aligned}$$
(9)

and any supergradient \(\nabla _t E(d, T(t))\) is a vector of link flows by shortest paths, corresponding to the trip distribution d.

Since we solve the dual problem (D3’), we need a way to recover an approximate solution (df) of the primal problem (P3). For any \(t \ge 0\), given \({\tilde{d}}_{ij}^{ra}(t)\) and \(T_{ij}^m(t)\), define \(d'(t) \in \Pi '(l, w)\) by formula (4). Then we reconstruct a full correspondence matrix after K iterations of Algorithm 1 as

$$\begin{aligned} {\hat{d}}^K = \frac{1}{A_K} \sum _{k=1}^K \alpha _k d'(y^k) \in \Pi '(l, w). \end{aligned}$$
(10)

Corresponding link flows can be recovered as (see Kubentayeva and Gasnikov 2021, f. (18))

$$\begin{aligned}{}[{\hat{f}}^K]_e^m = \frac{1}{A_K} \sum _{k=1}^K \alpha _k [f^k]_e^m. \end{aligned}$$
(11)

where \(f^k\) are link flows by shortest paths for times \(y^k\) and correspondence matrix \(d'(y^k)\), such that

$$\begin{aligned} \sum _m [f^k]_e^m = - \left[ {\tilde{\nabla }} \varphi _3(y^k)\right] _e, \quad e \in {\mathcal {E}}. \end{aligned}$$

3.2 Convergence analysis

Below we derive some properties of the problem and then use them to apply the USTM convergence theorem, what gives us the convergence rate of our dual algorithm for searching equlibria in combined model.

The next lemma is a trivial counterpart of f. (5) in Kubentayeva and Gasnikov (2021), following from (9).

Lemma 1

For any \(d, d' \in \Pi (l, w)\), \(t, t' \ge 0\), and supergradients \(\nabla _t E(d, T(t))\), \(\nabla _t E(d', T(t'))\) it holds that \(\Vert \nabla _t E(d, T(t)) - \nabla _t E(d', T(t'))\Vert _2 \le M = \sqrt{2\,H} N\), where \(H \le |{\mathcal {V}}|-1\) is the maximum simple path length in the network, and \(N = \sum _r N_r\) is the total number of active agents.

Typically (e.g. for a Manhattan network) \(H = O(\sqrt{|{\mathcal {V}}|})\).

Recall the following standard result concerning inexact oracles.

Lemma 2

For any \(t, t' \ge 0\)

$$\begin{aligned} {\tilde{\varphi }}_3(t') + \delta \ge \varphi _3(t') \ge {\tilde{\varphi }}_3(t) + \langle {\tilde{\nabla }} \varphi _3(t), t' - t \rangle . \end{aligned}$$

Proof

Since E(dT(t)) is concave w.r.t. t,

$$\begin{aligned} E({\tilde{d}}(t), T(t')) \le E({\tilde{d}}(t), T(t)) + \langle \nabla _t E({\tilde{d}}(t), T(t)), t' - t \rangle = - {\tilde{\varphi }}_3(t) - \langle {\tilde{\nabla }} \varphi _3(t), t' - t \rangle . \end{aligned}$$

Thus,

$$\begin{aligned} \varphi _3(t') \ge - E({\tilde{d}}(t), T(t')) \ge {\tilde{\varphi }}_3(t) + \langle {\tilde{\nabla }} \varphi _3(t), t' - t \rangle . \end{aligned}$$

The claim follows. \(\square\)

The following bound is the main tool to prove the convergence rate. It is an analogue of Lemma 2 in Gasnikov and Nesterov (2016) adapted to the case of an inexact oracle.

Lemma 3

Assume at the k-th iteration of Algorithm 1 we call an inexact oracle \(({\tilde{\Phi }}, {\tilde{\nabla }} \Phi )\) satisfying

$$\begin{aligned} {\tilde{\Phi }}(t') + \delta _k \ge \Phi (t') \ge {\tilde{\Phi }}(t) + \langle {\tilde{\nabla }} \Phi (t), t' - t \rangle \quad \forall t, t' \in {{\,\textrm{dom}\,}}h, \end{aligned}$$

with \(\delta _k = \frac{\alpha _{k+1} \varepsilon }{4 A_{k+1}}\). Then for any \(k > 0\)

$$\begin{aligned} \Phi (t^k) + h(t^k) \le \frac{1}{A_k} \phi _k(u^k) + \frac{3 \varepsilon }{4} \le \frac{1}{A_k} \phi _k(t) - \frac{1}{2 A_k} \Vert t - u^k\Vert _2^2 + \frac{3 \varepsilon }{4} \quad \forall t \in {{\,\textrm{dom}\,}}h. \end{aligned}$$

Moreover,

$$\begin{aligned} \Phi (t^k) + h(t^k) + \frac{1}{2 A_k} \Vert t^* - u^k\Vert _2^2 \le \Phi (t^*) + h(t^*) + \frac{1}{2 A_k} \Vert t^* - t^0\Vert _2^2, \end{aligned}$$

where \(t^* = \mathop {\mathrm {arg\,min}}\limits \Phi (t) + h(t)\).

Proof

We are going to prove by induction that

$$\begin{aligned} A_k (\Phi (t^k) + h(t^k)) \le \phi _k(u^k) + A_k \frac{3 \varepsilon }{4}. \end{aligned}$$

Note that since \(\phi _k\) is 1-strongly convex,

$$\begin{aligned} \phi _k(u^k) \le \phi _k(t) - \frac{1}{2} \Vert t - u^k\Vert _2^2 \quad \forall t \in {{\,\textrm{dom}\,}}h. \end{aligned}$$

The inner stopping criterion yields that

$$\begin{aligned} {\tilde{\Phi }}(t^{k+1})&\le {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), t^{k+1} - y^{k+1} \rangle + \frac{L_{k+1}}{2} \Vert t^{k+1} - y^{k+1}\Vert _2^2 + \frac{\alpha _{k+1}}{2 A_{k+1}} \varepsilon \\&= {\tilde{\Phi }}(y^{k+1}) + \frac{\alpha _{k+1}}{A_{k+1}} \left[ \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - u^k \rangle + \frac{\varepsilon }{2}\right] + \frac{1}{2 A_{k+1}} \Vert u^{k+1} - u^k\Vert _2^2. \end{aligned}$$

By the assumptions of the lemma,

$$\begin{aligned} {\tilde{\Phi }}(y^{k+1}) \le \Phi (t^k) - \langle {\tilde{\nabla }} \Phi (y^{k+1}), t^k - y^{k+1}\rangle = \Phi (t^k) + \frac{\alpha _{k+1}}{A_{k+1}} \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^k - t^k\rangle , \end{aligned}$$

thus

$$\begin{aligned} A_{k+1}&{\tilde{\Phi }}(t^{k+1}) \le A_{k+1} {\tilde{\Phi }}(y^{k+1}) + \alpha _{k+1} \left[ \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - u^k \rangle + \frac{\varepsilon }{2}\right] + \frac{1}{2} \Vert u^{k+1} - u^k\Vert _2^2 \\&\le A_k \Phi (t^k) + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - u^k + \frac{A_k}{A_{k+1}} (u^k - t^k)\rangle + \frac{\varepsilon }{2}\right] \\&\quad + \frac{1}{2} \Vert u^{k+1} - u^k\Vert _2^2 \\&= A_k \Phi (t^k) + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - y^{k+1}\rangle + \frac{\varepsilon }{2}\right] + \frac{1}{2} \Vert u^{k+1} - u^k\Vert _2^2 . \end{aligned}$$

Now, using the convexity of h and the definition of \(\delta _k\) we obtain that

$$\begin{aligned} A_{k+1}&\left[ \Phi (t^{k+1}) + h(t^{k+1})\right] \\&\le A_{k+1} \left[ {\tilde{\Phi }}(t^{k+1}) + \delta _k\right] + A_k h(t^k) + \alpha _{k+1} h(u^{k+1}) \\&\le A_k \left[ \Phi (t^k) + h(t^k)\right] + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - y^{k+1}\rangle + h(u^{k+1}) + \frac{3 \varepsilon }{4}\right] \\&\quad + \frac{1}{2} \Vert u^{k+1} - u^k\Vert _2^2 \\&\le \phi _k(u^{k+1}) + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - y^{k+1}\rangle + h(u^{k+1})\right] + A_{k+1} \frac{3 \varepsilon }{4} \\&= \phi _{k+1}(u^{k+1}) + A_{k+1} \frac{3 \varepsilon }{4} . \end{aligned}$$

The last claim of the lemma follows from the inequality

$$\begin{aligned} \Phi (t^*) \ge {\tilde{\Phi }}(t) + \langle {\tilde{\nabla }} \Phi (t), t^* - t \rangle , \end{aligned}$$

what implies

$$\begin{aligned} \frac{1}{A_k} \phi _k(t^*) \le \Phi (t^*) + h(t^*) + \frac{1}{2 A_k} \Vert t^* - t^0\Vert _2^2. \end{aligned}$$

\(\square\)

Now we are ready to prove the main result of this section: a primal-dual convergence rate for USTM in the combined model. The complexity analysis in the next theorem is similar to Theorems 3 and 4 in Kubentayeva and Gasnikov (2021), where USTM was applied to the route assignment problem.

Theorem 1

Assume \(t^0 = {\bar{t}}\), \(L_0 \le \frac{M^2}{\varepsilon }\), and at the k-th iteration problem (E) is solved with accuracy \(\delta _k = \frac{\alpha _{k+1} \varepsilon }{4 A_{k+1}}\). Define

$$\begin{aligned} R = \Vert t^* - {\bar{t}}\Vert , \quad {\tilde{R}}^2 = \rho ^2 N^{2 / \mu } \sum _{e \in E} \frac{{\bar{t}}_e^2}{{\bar{f}}_e^{2 / \mu }}. \end{aligned}$$

Then in the case of Beckmann’s model, after at most

$$\begin{aligned} K = 4 \left( \frac{M {\tilde{R}}}{\varepsilon }\right) ^2 \end{aligned}$$

iterations it holds that

$$\begin{aligned} 0 \le P_3({\hat{d}}^K, {\hat{f}}^K) - D_3'(t^K) \le \varepsilon . \end{aligned}$$

In the case of Stable Dynamics model, after at most

$$\begin{aligned} K = 4 \left( \frac{M R}{\varepsilon }\right) ^2 \end{aligned}$$

iterations it holds that

$$\begin{aligned} 0 \le P_3({\hat{d}}^K, {\hat{f}}^K) - D_3'(t^K) + \langle ({\hat{f}}^K - {\bar{f}})_+, t^* - {\bar{t}}\rangle \le \varepsilon , \\ \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2 \le \frac{2 \varepsilon }{R}. \end{aligned}$$

Proof

By Lemma 1,

$$\begin{aligned} {\tilde{\Phi }}(t^{k+1}) \le {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), t^{k+1} - y^{k+1} \rangle + M \Vert t^{k+1} - y^{k+1}\Vert _2. \end{aligned}$$

Then f. (A3) from the proof of Theorem 3 in Kubentayeva and Gasnikov (2021) ensures that for all k

$$\begin{aligned} A_k \ge \frac{\varepsilon k}{2 M^2}. \end{aligned}$$

Recall that \(f = f^k\) are link flows by shortest paths for times \(y = y^k\) and interzonal trips \(d' = d'(y^k)\). Then, according to Sect. 2.3,

$$\begin{aligned} E({\tilde{d}}(y), T(y)) = \sum _{i, j, r, a, m} d_{ij}^{ram} T_{ij}^m(y) + H(d') = \sum _{e,m} f_e^m (y_e + c_e^m) + H(d'), \end{aligned}$$

and for any t

$$\begin{aligned} {\tilde{\varphi }}_3(y) + \left\langle {\tilde{\nabla }}\varphi _3(y), t - y \right\rangle&= -E({\tilde{d}}(y), T(y)) - \langle \nabla _t E({\tilde{d}}(y), T(y)), t - y \rangle \\&= - \sum _{e,m} f_e^m (y_e + c_e^m) - H(d') + \sum _{e,m} f_e^m (y_e - t_e) \\&= - \sum _{e,m} f_e^m (t_e + c_e^m) - H(d'). \end{aligned}$$

Therefore, due to the convexity of the entropy,

$$\begin{aligned} \phi _K(t)&= \sum _{k=1}^K \alpha _k \left( {\tilde{\varphi }}_3(y^k) + \left\langle {\tilde{\nabla }} \varphi _3(y^K), t - y^k \right\rangle + h(t)\right) + \frac{1}{2} \Vert t - t^0\Vert _2^2 \\&= - A_K \sum _{e,m} [{\hat{f}}^K]_e^m (t_e + c_e^m) - \sum _{k=1}^K \alpha _k H(d'(y^k)) + A_K h(t) + \frac{1}{2} \Vert t - {\bar{t}}\Vert _2^2 \\&\le A_K \sum _{e} \left( \sigma ^*_e(t_e) - [{\hat{f}}^K]_e t_e - \sum _m [{\hat{f}}^K]_e^m c_e^m\right) - A_K H({\hat{d}}^K) + \frac{1}{2} \Vert t - {\bar{t}}\Vert _2^2. \end{aligned}$$

Then by Lemma 3,

$$\begin{aligned} -D_3'(t^K) \le \sum _{e} \left( \sigma ^*_e(t_e) - [{\hat{f}}^K]_e t_e - \sum _m [{\hat{f}}^K]_e^m c_e^m\right) - H({\hat{d}}^K) + \frac{1}{2 A_K} \Vert t - {\bar{t}}\Vert _2^2 + \frac{3 \varepsilon }{4}. \end{aligned}$$

The rest of the proof repeats the proofs of Lemmata 1 and 2 in Kubentayeva and Gasnikov (2021), mutatis mutandis. In the case of the Beckmann model, we substitute \(t_e = \tau _e\left( [{\hat{f}}^K]_e\right)\), what gives us the bound

$$\begin{aligned} -D_3'(t^K)&\le - \sum _{e} \left( \sigma _e\left( [{\hat{f}}^K]_e\right) + \sum _m [{\hat{f}}^K]_e^m c_e^m\right) - H({\hat{d}}^K) + \frac{{\tilde{R}}^2}{2 A_K} + \frac{3 \varepsilon }{4} \\&= - \Psi ({\hat{f}}^K) - H({\hat{d}}^K) + \frac{{\tilde{R}}^2}{2 A_K} + \frac{3 \varepsilon }{4} \le - P_3({\hat{d}}^K, {\hat{f}}^K) + \frac{(M {\tilde{R}})^2}{K \varepsilon } + \frac{3 \varepsilon }{4}. \end{aligned}$$

In the case of the Stable Dynamics model,

$$\begin{aligned} -D_3'(t^K)&\le \min _{t \ge {\bar{t}}} \left\{ \sum _{e} \left( {\bar{f}}_e (t_e - {\bar{t}}_e) - [{\hat{f}}^K]_e t_e - \sum _m [{\hat{f}}^K]_e^m c_e^m\right) + \frac{1}{2 A_K} \Vert t - {\bar{t}}\Vert _2^2\right\} \\&- H({\hat{d}}^K) + \frac{3 \varepsilon }{4} \\&= - \Psi ({\hat{f}}^K) - H({\hat{d}}^K) + \min _{t \ge {\bar{t}}} \left( \langle {\bar{f}}- {\hat{f}}^K, t - {\bar{t}}\rangle + \frac{1}{2 A_K} \Vert t - {\bar{t}}\Vert _2^2\right) + \frac{3 \varepsilon }{4} \\&= - P_3({\hat{d}}^K, {\hat{f}}^K) - \frac{A_K}{2} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2^2 + \frac{3 \varepsilon }{4} \\&\le - P_3({\hat{d}}^K, {\hat{f}}^K) - \frac{K \varepsilon }{4 M^2} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2^2 + \frac{3 \varepsilon }{4}. \end{aligned}$$

Since optimal \(t^* - {\bar{t}}\ge 0\) are Lagrange multipliers for the problem (P3),

$$\begin{aligned} P_3({\hat{d}}^K, {\hat{f}}^K) \ge P_3(d^*, f^*) - \langle ({\hat{f}}^K - {\bar{f}})_+, t^* - {\bar{t}}\rangle = D_3'(t^*) - \langle ({\hat{f}}^K - {\bar{f}})_+, t^* - {\bar{t}}\rangle , \end{aligned}$$

and thus

$$\begin{aligned} - \langle ({\hat{f}}^K - {\bar{f}})_+, t^* - {\bar{t}}\rangle \le P_3({\hat{d}}^K, {\hat{f}}^K) - D_3'(t^K) \le - \frac{K \varepsilon }{4 M^2} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2^2 + \frac{3 \varepsilon }{4}. \end{aligned}$$

Therefore,

$$\begin{aligned} \frac{K \varepsilon }{4 M^2} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2^2 \le R \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2 + \frac{3 \varepsilon }{4} \end{aligned}$$

and, finally,

$$\begin{aligned} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2 \le \frac{4 M^2 R}{K \varepsilon } + M \sqrt{\frac{3}{K}}, \end{aligned}$$

what yields the result. \(\square\)

4 Frank–Wolfe variations and USTM in traffic assignment

Here, we consider several numerical methods for solving a separate problem of searching user equilibrium with inelastic demands. The Frank–Wolfe method and its variations with different line search strategies effectively solve the Beckmann traffic assignment problem, but due to its primal nature it cannot be applied to the stable dynamics model. Meanwhile, the primal-dual USTM method can be applied to both problems. Further, we conduct the experiments for these methods.

4.1 Frank–Wolfe variations

In the Beckmann model, searching equilibria reduces to minimization of the potential function (B). One of the most popular and effective approaches to solve this problem numerically is the famous Frank–Wolfe method (Frank and Wolfe 1956; Jaggi 2013) as well as its numerous modifications (Fukushima 1984; LeBlanc et al. 1985; Arezki and Van Vliet 1990; Chen et al. 2002). Also, one can apply the primal-dual subgradient methods to optimize the dual problem and then reconstruct a solution to the primal one. However, our research (Kubentayeva and Gasnikov 2021) showed that this approach demands more parameter adjustments to reach the Frank–Wolfe algorithm’s performance with standard step size strategy.

In this paper, we test various step size strategies of Frank–Wolfe method. Namely, we consider some simple decaying step size schedules like standard choice of step size \(\gamma _k = \frac{2}{k+1}\) and \(\gamma _k = \frac{1}{k}\) leading to the averaging of \(f^k\), and a number of approaches based on a choice of the optimal step size by solving auxiliary one-dimensional problem

$$\begin{aligned} \gamma _k {:}{=}\mathop {\mathrm {arg\,min}}\limits \limits _{\gamma \in [\gamma _{\min }, \gamma _{\max }]} \Psi ((1 - \gamma ) f^k + \gamma s^k). \end{aligned}$$

The variety of these approaches corresponds to the different one-dimensional optimization methods. We consider the Brent method on a segment \(\gamma _k \in [0, 1]\) (Brent 1971) and exponential decreasing of \(\gamma _k\) until Armijo rule is satisfied (Armijo 1966). The last modification considered is the backtracking line-search method developed for specific use in Frank–Wolfe algorithms proposed in Pedregosa et al. (2020).

The Frank–Wolfe method’s theoretical convergence rate for convex objective (with Lipschitz-continuous gradient) is \(O(1/\varepsilon )\) (Pedregosa et al. 2020; Jaggi 2013)

Algorithm 2
figure b

Frank–Wolfe algorithm

4.2 Primal-dual universal similar triangles method

Let us remind that (DualB) and (DualSD) dual problems of Beckmann and stable dynamics traffic assignment models have the same structure:

$$\begin{aligned} Q(t) = \sum _{i,j} d_{ij} T_{ij}(t) - h(t) \longrightarrow \max _{t \ge {\bar{t}}}. \end{aligned}$$

The optimization problems are convex, non-smooth and composite. We apply the USTM method described to minimize the composite objective Q(t). Here, in the Algorithm 1, we set \(\Phi (t) {:}{=}\sum _{i,j} d_{ij} T_{ij}(t)\). As in Sect. 3.1, for both models, flows (primal variables) are reconstructed in the following way:

$$\begin{aligned} {\hat{f}}^K = - \frac{1}{A_K} \sum _{k = 1}^{K} \alpha _k \nabla \Phi (y^k), \end{aligned}$$
(12)

where \(\alpha _k\) is a coefficient of the USTM method on iteration k, and \(A_K = \sum _{k = 1}^{K} \alpha _k\). Note that any element from the set \({{\,\mathrm{\partial }\,}}\Phi (t)\) has form \(\nabla \Phi (t) = - f\), where \(f = \Theta x\) is a flow distribution on links induced by \(x \in X\) concentrated on the shortest paths for given times t (and vice versa: any such f corresponds to a subgradient of \(\Phi (t)\)). Hence, weighted \({\hat{f}}^K\) are also induced by flows on the paths.

For the Beckmann model, we can also use the duality gap to estimate the method’s accuracy:

$$\begin{aligned} \Delta ^K = Q(t^K) + \Psi ({\hat{f}}^K). \end{aligned}$$

For the stable dynamics model, flows reconstruction according to (12) keeps feasibility of \({\hat{f}}^K\) (i.e. they are induced by flows on the paths), but can violate the networks capacity constraints — so the duality gap \(\Delta ^K\) can be negative. To solve the SD traffic assignment problem with inelastic demand, Kubentayeva and Gasnikov (2021) proposed a novel way to reconstruct admissible flows — which also meet capacity constraints – together with a novel computable duality gap, which can be used in a stopping criterion.

The USTM method requires \(O(1 / \varepsilon ^2)\) iterations to obtain an \(\varepsilon\)-solution of primal and dual problems of Beckmann and SD models (Nesterov 2015; Kubentayeva and Gasnikov 2021).

5 Sinkhorn’s variations for trip distribution

Optimization problem of entropy-based trip distribution model of Wilson (1969) coincides with optimal transport (OT) problem with entropy regularizer (Cuturi 2013). To solve the problem, celebrated Sinkhorn’s algorithm is used (Sect. 5.1). In Sects. 5.2 and 5.3, we consider accelerated gradient methods adapted for solving OT problems. These methods achieve better theoretical convergence rates compared to Sinkhorn-like methods in some regimes. Later, in Sect. 6.3, we conduct experiments to compare performances of the mentioned methods.

5.1 Sinkhorn’s algorithm

In this section, for the sake of formulas simplicity, we assume a single agent type and travel mode. Since the problem (8) is separable, without loss of generality, we consider only one trip purpose and suppose \(\sum _{i,j} d_{ij} = 1\). Thus, eq. (8) takes form

$$\begin{aligned} \varphi (\lambda ^l, \lambda ^w) = \frac{1}{\gamma } \ln {\sum _{i,j}\exp \left( -\gamma \left( T_{ij} + \lambda ^l_{i} + \lambda ^w_{j} \right) \right) } + \sum _{i} \lambda _{i}^l l_i + \sum _j \lambda _{j}^w w_j. \end{aligned}$$
(13)

Following Guminov et al. (2021), we perform a change of variables \(\mu ^l = -\gamma \lambda ^l\), \(\mu ^w = -\gamma \lambda ^w\) in (13) and obtain an equivalent formulation

$$\begin{aligned} \phi (\mu ^l, \mu ^w) = \frac{1}{\gamma } \left[ \ln \left( {\textbf{1}}^T d\left( \mu ^l,\mu ^w \right) {\textbf{1}} \right) - \langle \mu ^l,l \rangle - \langle \mu ^w,w \rangle \right] \rightarrow \min _{{\mu ^l}, {\mu ^w}}, \end{aligned}$$
(14)

where

$$\begin{aligned} \left[ d(\mu ^l,\mu ^w) \right] _{i, j} = \exp {\left( \mu ^l_i+\mu ^w_j-\gamma T_{ij}(t) \right) }, \end{aligned}$$
(15)

with the primal-dual coupling

$$\begin{aligned} d= d(\mu ^l, \mu ^w) / {\textbf{1}}^T d(\mu ^l, \mu ^w){\textbf{1}}. \end{aligned}$$
(16)

Similarly to the well-known Sinkhorn algorithm, the objective in (14) can be alternatively minimized (see Algorithm 3).

Algorithm 3
figure c

Sinkhorn’s Algorithm (dual objective with the tautological constraint)

Note that, according to Lemma 9 in Guminov et al. (2021) for the problem (14), partial explicit minimization is possible via the same formulas as for classical entropy-regularized OT problem (Cuturi 2013) without tautological constraint:

$$\begin{aligned} \psi (\mu ^l, \mu ^w) = {\textbf{1}}^T d\left( \mu ^l,\mu ^w \right) {\textbf{1}}- \langle \mu ^l, l \rangle - \langle \mu ^w, w \rangle \rightarrow \min _{\mu ^l, \mu ^w} , \end{aligned}$$
(17)

but the primal-dual coupling formulas are different: (16) for the problem (14) and (15) for the problem (17).

The argminima of (14) should be implemented using numerically stable computation of the logarithm of the sum of exponents (logsumexp trick), but analytically the argminima are given by

$$\begin{aligned}&\ln \mu ^l_{k+1} {:}{=}\ln \mu ^l_{k}+\ln l-\ln \left( d\left( \mu ^l_k, \mu ^w_k \right) {\textbf{1}} \right) , \end{aligned}$$
(18)
$$\begin{aligned}&\ln \mu ^w_{k+1} {:}{=}\ln \mu ^w_{k}+\ln w-\ln \left( {\textbf{1}}^T d\left( \mu ^l_k, \mu ^w_k \right) \right) , \end{aligned}$$
(19)

where logarithm is taken element-wisely.

The authors of Gasnikov et al. (2015) pointed out that the objective (14), its gradient

$$\begin{aligned}&\nabla _{\mu ^l}\phi (\mu ) = \partial \phi (\mu ^l, \mu ^w) /\partial \mu ^l = \frac{1}{\gamma }\left( \frac{d\left( \mu ^l, \mu ^w \right) {\textbf{1}}}{{\textbf{1}}^T d\left( \mu ^l, \mu ^w \right) {\textbf{1}}} -l\right) , \end{aligned}$$
(20)
$$\begin{aligned}&\nabla _{\mu ^w}\phi (\mu ) = \partial \phi (\mu ^l, \mu ^w) /\partial \mu ^w = \frac{1}{\gamma }\left( \frac{d\left( \mu ^l, \mu ^w \right) ^T{\textbf{1}}}{{\textbf{1}}^T d\left( \mu ^l, \mu ^w \right) {\textbf{1}}} - w\right) , \end{aligned}$$
(21)

and eq. (15) are invariant under transformations

$$\begin{aligned}&\mu ^l\rightarrow \mu ^l+t_{\mu ^l}{\textbf{1}}\end{aligned}$$
(22)
$$\begin{aligned}&\mu ^w\rightarrow \mu ^w+t_{\mu ^w}{\textbf{1}}, \end{aligned}$$
(23)

with \(t_{\mu ^l},t_{\mu ^w}\in {\mathbb {R}}\). That leads to better numerical stability. In our experiments, we present a variant of Algorithm 3 (labeled as SINKHORN-TAUT-SHIFT) with such invariant transformations, that provide maximum of the dual variables equals 1, and with numerically stable computations of the logarithm of the sum of exponents.

5.2 Accelerated Sinkhorn’s algorithm

Besides the Sinkhorn’s algorithm, accelerated gradient methods are adapted for solving OT problems. These methods achieve better theoretical convergence rates compared to Sinkhorn-like methods in some regimes. To the best available knowledge, the first such method was proposed in Gasnikov et al. (2016), where the authors proposed non-adaptive Accelerated Gradient Descent (AGD) method for a more general class of entropy-linear programming problems. The algorithmic idea is to run AGD for solving (14) and equip it with some primal updates to guarantee the convergence rate also for the primal problem.

In this subsection, Algorithm 1 and Algorithm 2 (its adaptation for trip distribution problem listed as Algorithm 4) from Guminov et al. (2021) are described. The authors proposed to replace in the classical AGD methods the gradient step with a step of explicit minimization w.r.t. one of the blocks of variables. To formalize the latter, suppose that the vector of dual variables can be divided into m block s.t. \(\mu = \left( \mu _1^T, \dots , \mu _m^T\right) ^T\). So that, notations \(\phi (\mu )\) and \(\phi (\mu _1, \dots , \mu _m)\) are equivalent. And suppose that it is possible to minimize the dual objective (14) over i-th block holding the others variables fixed:

$$\begin{aligned} \mathop {\mathrm {arg\,min}}\limits _i \phi (\mu ) {:}{=}\mathop {\mathrm {arg\,min}}\limits _{\xi } \phi (\mu _1, \dots , \mu _{i-1},\xi ,\mu _{i+1}, \cdots , \mu _m). \end{aligned}$$
(24)

Introduce also a notation for block gradient

$$\begin{aligned} \nabla _i \phi (\mu ) = \frac{\partial \phi (\mu _1, \dots , \mu _{i-1},\xi ,\mu _{i+1}, \cdots , \mu _m)}{\partial \xi }. \end{aligned}$$
(25)

The resulting algorithms m times theoretically slower than its gradient counterpart, where m is the number of blocks of variables used in alternating minimization. But in practice the algorithms work faster (Guminov et al. 2021).

Algorithm 4
figure d

AGM-NONPD

In practice variable transformations (2223) (with \(t_\xi = -\Vert \xi \Vert _\infty\), \(\xi \in \{\mu ^l,\mu ^w\}\)) performed after steps 5 and 7 of Algorithm 4 and can lead to a better numerical stability when \(\gamma\) is big.

For Algorithm 2 from Guminov et al. (2021) constraints residual \(\Vert ((d{\textbf{1}}- l)^T, ({\textbf{1}}^Td -w)^T)^T\Vert _2 = {{\widetilde{O}}} \left( \frac{1}{k^{2}}\right)\), but in our experiment it was observed that constraints residual decrease faster for \(d=d(\kappa _{k})\) (16) than for the theoretically obtained primal variable d using primal-dual property of the algorithm. We present experiments only on the best performing modifications with \(d=d(\kappa _{k})\) primal variable reconstruction, labeled as NONPD (since it does not utilizes primal-dual property of the algorithms considered in this subsection).

According to (Guminov et al. 2021, Theorem 3) the objective of the form (14) can be minimized with the following rate

$$\begin{aligned} \phi (\kappa _k)-\phi ^* = {{\widetilde{O}}} \left( \frac{\gamma ^{1/2}\Vert T\Vert _\infty }{{k^{2}}}\right) . \end{aligned}$$

5.3 MIXED AGM

One more natural modification of Algorithm 4 can be obtained by performing several steps of explicit minimization instead of one. The natural number of steps seems to be equal to the number of blocks m. But the proof of Algorithm 4 utilizes the following property of a step explicit minimization

$$\begin{aligned} \phi (\eta ^{k+1})\le \phi (\mu ^k)-\frac{1}{2L}\Vert \nabla _{i^k} \phi (\mu ^k)\Vert _2^2 \end{aligned}$$

in order to obtain

$$\begin{aligned} \phi (\eta ^{k+1}) \le \phi (\mu ^k)-\frac{1}{2nL}\Vert \nabla \phi (\mu ^k)\Vert _2^2. \end{aligned}$$
(26)

The latter is true since \(i^k=\mathop {\mathrm {arg\,max}}\limits \limits _{i\in \{1, \dots , m\}} \Vert \nabla _{i^k} \phi (\mu ^k)\Vert _2^2\).

But the inequality (26) can be satisfied if one replaces lines 6 and 7 in Algorithm 4 with the following Algorithm 5.

Algorithm 5
figure e

k-th step

Despite the practical performance, this modification has no theoretical guarantees because \(\Vert \nabla _{i^k} \phi (\mu ^k)\Vert _2^2\) can be greater than \(\sum ^J_j \Vert \nabla \phi (\zeta ^j)\Vert _2^2\) for any \(J > m\).

Moreover, Algorithm 4 is non-increasing. But non-increasing property of the algorithm can be violated (with either the exact minimization given by lines 6 and 7 of Algorithm 4 replaced with Algorithm 5 or not) due to numerical instabilities. Once it happened, Algorithm 4 is stopped. The computations can be proceeded from the last obtained \(\eta ^k\) with Sinkhorn’s iterations. In fact, these numerical instabilities break monotonicity of Sinkhorn’s iterations too, but in practice the proceeding of computations with Sinkhorn’s iterations allows to find better minima.

The modification, named MIXED-AGM-NONPD, combines Sinkhorn’s iterations after reaching the stability limit and the exact minimization given by Algorithm 5.

5.4 Reconstruction of correspondence matrix

Finally, let us discuss a reconstruction of a solution to the primal problem (E). Assume we reconstruct a solution \(d_k\) of the primal problem (E) by formula (16) with \(\mu ^l = \mu ^l_{k}\), \(\mu ^w = \mu ^w_{k}\). However, since the dual problem is only approximately solved, \(d^k\) in general does not satisfy the marginal constraints. So obtain a feasible solution, one can use projection Algorithm 2 from Altschuler et al. (2017). According to (Altschuler et al. 2017, Theorem 4), it has complexity \(O(|O| \cdot |D|)\) and returns a correspondence matrix \({\hat{d}}_k \in \Pi (l, w)\) such that

$$\begin{aligned} \Vert d_k - {\hat{d}}_k\Vert _1 \le \delta _k = \Vert d_k {\textbf{1}}- l\Vert _1 + \Vert d_k^T {\textbf{1}}- w\Vert _1. \end{aligned}$$

The error can be estimated following Theorem 8 in Stonyakin et al. (2019):

$$\begin{aligned} E({\hat{d}}_k, T) \le \min _{d \in \Pi (l, w)} E(d, T) + 2 \delta _k \Vert T\Vert _\infty + \frac{4 \delta _k}{\gamma } \log \left( \frac{|O| \cdot |D|}{\delta _k}\right) . \end{aligned}$$
(27)

Consider \(\mu ^l_k\), \(\mu ^w_k\) obtained with Sinkhorn’s algorithm. Using (20) and the convexity of \(\phi\) we get

$$\begin{aligned} \phi (\mu ^l_k, \mu ^w_k) - \phi ^*&\le \langle \mu ^l_k - \mu ^l_*, \nabla _{\mu ^l} \phi (\mu ^l_k, \mu ^w_k)\rangle + \langle \mu ^w_k - \mu ^w_*, \nabla _{\mu ^w} \phi (\mu ^l_k, \mu ^w_k)\rangle \\&= \frac{1}{\gamma } \left( \langle \mu ^l_k - \mu ^l_*, d_k {\textbf{1}}- l\rangle + \langle \mu ^w_k - \mu ^w_*, d_k^T {\textbf{1}}- w\rangle \right) , \end{aligned}$$

where \((\mu ^l_*, \mu ^w_*)\) is the solution of (15). Then Lemma 3 and Theorem 7 from Stonyakin et al. (2019) ensure that

$$\begin{aligned} \phi (\mu ^l_k, \mu ^w_k) - \phi (\mu ^l_*, \mu ^w_*) \le \frac{1}{2} \Vert T\Vert _\infty \left( \Vert d_k {\textbf{1}}- l\Vert _1 + \Vert d_k^T {\textbf{1}}- w\Vert _1\right) . \end{aligned}$$

Combining the above bounds, we obtain the following bound on the duality gap:

$$\begin{aligned} E({\hat{d}}_k, T) + \phi (\mu ^l_k, \mu ^w_k) \le \frac{5}{2} \delta _k \Vert T\Vert _\infty + \frac{4 \delta _k}{\gamma } \log \left( \frac{|O| \cdot |D|}{\delta _k}\right) . \end{aligned}$$
(28)

6 Numerical experiments

In our experiments, we consider the morning peak-hour in Moscow transportation network. The city’s data are provided by Russian University of Transport.

The city and its suburbs are split into 1420 zones. Moscow road network consists of 12970 nodes and 36905 links, a part of it is visualized on Fig. 1. We model the crossings by inserting auxiliary links for each allowed turn between road links. Resulting graph contains 63073 nodes and 94546 links.

Fig. 1
figure 1

Moscow network link loads, obtained for the combined Beckmann model

In our four-stage model of Moscow we consider

  • Two demand layers: home-to-work, and home-to-others;

  • Two agent types: car owners and non-car-owners;

  • And three travel modes: public transport, pedestrian and car.

6.1 Parallel computing

Calculation of flows f is the most expensive part, since we have to find the shortest paths for all pairs \(w \in OD\). We use Dijkstra’s algorithm (Dijkstra 1959) to find the shortest paths, which runs in \(O(|{\mathcal {E}}| + |{\mathcal {V}}| \log |{\mathcal {V}}|)\) time; Given the shortest paths tree, flows aggregation have linear performance \(O(|{\mathcal {V}}|)\). Hence, the total complexity of flows calculation is \(O\bigl (|O| (|{\mathcal {E}}| + |{\mathcal {V}}| \log |{\mathcal {V}}|)\bigr )\). Moreover, flows reconstruction for every source \(o \in O\) can be computed in parallel. Table 1 shows the result of running 100 iterations of the Frank–Wolfe method on Moscow road network with the various number of cores involved (processor’s speed is 3092,72 MHz).

Table 1 Effect of CPU parallelism

6.2 Frank–Wolfe algorithm’s variations

Each of the considered modifications of the Frank–Wolfe algorithm was run up to 2000 iterations for the traffic assignment task of the classic four-stage model for the Moscow road network. The results are shown in Fig. 2.

Fig. 2
figure 2

Convergence rate for the different Frank–Wolfe modifications for Moscow network

6.3 Sinkhorn algorithm’s variations

Experiments were run for the Trip distribution stage with dual function adjustment for gradient methods described in Sect. 5 for the Moscow road network. The results are shown in Fig. 3.

Different formulations of minimized targets for Sinkhorn’s method were considered (for example, formulation (14) or (17)), but conceptual differences were not identified, therefore only (14) formulation is shown as SINKHORN-TAUT-SHIFT. Label AAM-NONPD corresponds (Guminov et al. 2021, Algorithm 3), that can be easily adapted similarly as Algorithm 4 was adapted from (Guminov et al. 2021, Algorithm 1). One should note that utilized Sinkhorn’s variation has comparable to gradient methods convergence rate, hence common approach is suitable for solving Trip Distribution problem.

Fig. 3
figure 3

Sinkhorn’s algorithm modifications for the trip distribution stage

6.4 Combined model, Beckmann

Here we compare three algorithms for finding a fixed-point of the four-stage Beckmann traffic model, namely Four-stage procedure, Evans algorithm and our dual approach via USTM. The difference with the two-stage model is the addition of mode split and mode cost averaging steps. Mode split step usually cause wobbling between public transport and car modes when using straightforward Four-stage procedure: if the road network is free at the first iteration agents start alternating between these two modes at each iteration. So we applied exponential averaging of modes cost matrices to handle this problem: \(T^m_{ij}[k+1] = \frac{1}{2} \left( T^m_{ij}[\text {new}] + T^m_{ij}[k]\right)\).

Figure 4 shows the convergence of the duality gap for all three algorithms considered. It can be seen that the Four-stage procedure does not tend to converge to zero duality gap: after 5-6 iterations (about 70 min) it reaches its lower value of the duality gap, then it starts to fluctuate around this value. In order to increase the accuracy of the approximate solution found by Four-stage procedure, one has to increase the number of inner iterations, which will make each outer iteration slower.

In contrast, Evans method steadily converge to zero duality gap.

Fig. 4
figure 4

Duality gap convergence

Some intuition about the behavior of the methods can be given by the Fig. 5, where two-dimensional projections of \(d^m_{ij}\) trajectories are depicted. The projections were made by multidimensional scaling method, which tries to preserve pairwise distances while matching points from high-dimensional space (in our case — correspondence matrices) to points on the plane. As one can see, the trajectories start from the same point, since the calculation of the correspondence matrices and the modal splitting in both methods is the same. After a few iterations the trajectories of the methods are in the same region again, but the Evans method proceeds with small steps, while the Four-stage procedure makes long jumps around the point to which Evans method converges. The trajectory of USTM is similar to the trajectory of the Evans algorithm and is omitted for the sake of readability.

Fig. 5
figure 5

2-dimensional projections of \(d^m_{ij}\) trajectories for the Evans algorithm and the four-stage procedure, obtained by multidimensional scaling. The trajectory of the Evans method is sparsified to 50 points. The last point is marked with a large cross

6.5 Combined model, stable dynamics

Here we compare the results obtained for the Beckman and the Stable Dynamics models on the Moscow city transportation model. We use the USTM algorithm to search for the equilibria because other algorithms are not applicable since the link travel times are not functions of the link flows in the Stable Dynamics model.

We used the same Moscow network as in previous experiment, but, since Stable Dynamics model is usually infeasible for peak-hours correspondences, we divided the peak-hour departures \(l_i^{ra}\) and arrivals \(w_j^r\) by two.

Convergence trajectories for Stable Dynamics model are shown in Fig. 6. We discuss convergence of Beckmann model in more representative case of peak-hour departures and arrivals in Sect. 6.4, therefore convergence trajectories for Beckmann model are omitted in this subsection.

We asses the convergence by monitoring two values: constraints violation and function suboptimality. Since the dual approach allows the flows to exceed the link capacities, the primal variables stay outside of the feasible region, thus the duality gap could be negative, as shown in Fig. 6b. Then duality gap is negative, the objective function value at that approximate solution (of the minimization problem) is less than the optimal function value, but the approximate solution is infeasible.

The comparison of the approximate solutions is given in Fig. 7. It is evident that Beckmann’s model is more likely to exceed the link capacity. Figure 7b shows that the travel time on some links in Stable Dynamics model exceeds the free-flow time by several hundred times. This implies that some zones are connected to the rest of the network only by low-capacity links, leading to huge traffic congestion at equilibrium. This result is likely due to inaccuracies in the input data, but if not, these bottleneck links should be prioritized in the transportation network improvement process.

Fig. 6
figure 6

Convergence of USTM on the Stable Dynamics model: a total flow above the link capacity limits, b absolute value of the duality gap, the sign is marked by color

Fig. 7
figure 7

Histograms of the network load: a histogram of the ratio of the amount of flow on the link to its capacity, b histogram of the ratio of the travel time on the link to the travel time on the same link when it is free

6.6 Traffic assignment model: Frank–Wolfe vs USTM for Beckmann model

Experiments were conducted for single trip purpose, agent type and travel mode (by car) for the Berlin-Center network split into 865 zones with 12981 nodes and 28376 links (for more details see Transportation Networks for Research Core Team 2023). As it was shown in the article (Kubentayeva and Gasnikov 2021), performance of the USTM method is better than UGD (Nesterov 2015) and other variations of accelerated gradient descent, thus only USTM and conventional Frank–Wolfe methods are considered. Convergence by primal function and duality gap is presented in Fig. 8. It is necessary to emphasize that the bigger \(\varepsilon\), the faster USTM converges to \(\varepsilon\) accuracy and oscillates. Thereby, it makes sense to use restarting technique for faster convergence — run method with \(\varepsilon '\) and then with final desired accuracy \(\varepsilon < \varepsilon '\).

Fig. 8
figure 8

Convergence of Frank–Wolfe and USTM with different \(\varepsilon\) for Beckmann model for Berlin-Center network: a primal function, b duality gap