Primal-dual gradient methods for searching network equilibria in combined models with nested choice structure and capacity constraints

Kubentayeva, Meruza; Yarmoshik, Demyan; Persiianov, Mikhail; Kroshnin, Alexey; Kotliarova, Ekaterina; Tupitsa, Nazarii; Pasechnyuk, Dmitry; Gasnikov, Alexander; Shvetsov, Vladimir; Baryshev, Leonid; Shurupov, Alexey

doi:10.1007/s10287-023-00494-8

Primal-dual gradient methods for searching network equilibria in combined models with nested choice structure and capacity constraints

Original Paper
Published: 04 January 2024

Volume 21, article number 15, (2024)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Computational Management Science Aims and scope Submit manuscript

Primal-dual gradient methods for searching network equilibria in combined models with nested choice structure and capacity constraints

Download PDF

Meruza Kubentayeva¹,
Demyan Yarmoshik¹,
Mikhail Persiianov^1,2,
Alexey Kroshnin^2,3,
Ekaterina Kotliarova¹,
Nazarii Tupitsa¹,
Dmitry Pasechnyuk¹,
Alexander Gasnikov^1,2,3,
Vladimir Shvetsov^1,4,
Leonid Baryshev⁴ &
…
Alexey Shurupov⁴

232 Accesses
2 Citations
Explore all metrics

Abstract

We consider a network equilibrium model (i.e. a combined model), which was proposed as an alternative to the classic four-step approach for travel forecasting in transportation networks. This model can be formulated as a convex minimization program. We extend the combined model to the case of the stable dynamics model in the traffic assignment stage, which imposes strict capacity constraints in the network. We propose a way to solve corresponding dual optimization problems with accelerated gradient methods and give theoretical guarantees of their convergence. We conducted numerical experiments with considered optimization methods on Moscow and Berlin networks.

Dual Methods for Finding Equilibriums in Mixed Models of Flow Distribution in Large Transportation Networks

Article 01 September 2018

Hybrid Methods for Network Equilibrium Problems

Equilibrium in Multi-criteria Transportation Networks

Article 01 February 2016

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

One of the most popular approaches to travel forecasting in transportation networks is the four-step procedure (Dios Ortúzar and Willumsen 2011): sequential run of trip generation, trip distribution, modal split, and traffic assignment stages. However, this approach has a number of limitations, e.g. there is no convergence guarantee (Oppenheim 1995; Boyce et al. 1994; Boyce 2002).

To overcome this issue, there were proposed network equilibrium models (NE / combined models) which can be formulated as an optimization or, more generally, a variational inequality problem (Beckmann et al. 1956; De Cea et al. 2005). In particular, Evans (1976) reduced the problem of searching equilibrium in the case of one transport mode to a convex optimization problem, combining trip distribution and route assignment models. Authors of Florian and Nguyen (1978) made an extension to the multi-modal case, where destination and mode are chosen simultaneously with the same value of a calibration parameter. The first mathematical formulation of a network equilibrium model with hierarchical destination and mode choices was proposed in Fernández et al. (1994) — the approach was presented for modelling nested choice structure of trips using several modes (e.g. park’n ride trips). Abrahamsson and Lundqvist (1999) formulated a nested combined model where mode choice is conditioned by destination choice and demonstrated its application for the Stockholm region. The recent works (Chu 2018), Liu et al. (2018), and Gao et al. (2022) proposed the extensions of the combined models for the cases of modeling trip frequency, remote park-and-ride, and tourism demand, respectively.

Finding a solution in trip distribution and traffic assignment problems — whether they are considered separately in the four-step approach or combined into one network equilibrium problem — relies on numerical methods for convex optimization. E.g., a classic choice for the traffic assignment problem (which is the most computationally expensive part) is the Frank–Wolfe algorithm (Frank and Wolfe 1956), and for the trip distribution problem it is the Sinkhorn algorithm (Sinkhorn 1974). A class of path-based algortihms can be an alternative to the link-based Frank–Wolfe algorithm for solving traffic assignment problem: Chen et al. (2020), Xie et al. (2017), Babazadeh et al. (2020). A popular choice for solving an optimization problem in the above-mentioned combined models is a partial linearization algorithm of Evans (1976) and its modifications for multi-modal and multi-user cases (Abrahamsson and Lundqvist 1999; Boyce et al. 1983). Recently, in Yang et al. (2013); Fan et al. (2022); Zarrinmehr et al. (2019); Cabannes et al. (2019); Wang et al. (2022), improvements of these algorithms were presented. Also, in subsequent years there have been developed a lot of new optimization methods, in particular, accelerated gradient methods (Nesterov 2004, 2009, 2015), which can be applied to the described problems.

Another direction of research on travel modelling in recent years is related to capacitated transportation networks, which allow to overcome some limitations of the standard Beckmann traffic assignment model (Nesterov and De Palma 2003; Zokaei Aashtiani et al. 2021; De Cea et al. 2005; Wang et al. 2019; Smith et al. 2019; Anikin et al. 2020; Zhu et al. 2020).

For the best of our knowledge, there is no works considering the application of accelerated gradient methods to combined models.

In this paper, we consider:

An entropy-based trip distribution model with hierarchical choice structure (Wilson 1969; Fernández et al. 1994; Abrahamsson and Lundqvist 1999);
The Beckmann traffic assignment model with inelastic demand (Beckmann et al. 1956);
The stable dynamics traffic assignment model, where resulting flow distribution satisfy the network’s capacity constraints (Nesterov and De Palma 2003);
An NE model combining all the models mentioned above.

In the last case, we consider the nested combined model proposed in Abrahamsson and Lundqvist (1999), where transit and road networks are independent, and the transit network has constant travel costs. We extend it to the case of the stable dynamics model for traffic assignment.

We employ accelerated primal-dual gradient methods to solve corresponding optimization problems and compare their performances to the classic Sinkhorn, Frank–Wolfe, and generalised Evans algorithms. Also, we provide theoretical guarantees for their convergence rate.

The main contributions of the paper are the following:

We propose a way to solve the dual problem of the nested combined model of Abrahamsson and Lundqvist (1999) with a universal accelerated gradient method USTM (Gasnikov and Nesterov 2018);
We extend the nested combined model of Abrahamsson and Lundqvist (1999) to the case of capacitated networks: namely, we propose a way to solve the dual problem for searching equilibrium in combined trip distribution model with the nested choice structure and the stable dynamics traffic assignment model;
We provide theoretical upper bounds on the complexity of searching network equilibrium by the USTM algorithm.
We conducted numerical experiments comparing different algorithms on Moscow and Berlin transportation networks.

The paper is organized as follows. In Sect. 2, we give a general problem statement for a combined trip distribution, modal split, and traffic assignment model. In Sect. 3, we describe the primal-dual accelerated method to solve the NE problem and provide its convergence analysis. In Sects. 4 and 5, we describe optimization algorithms that we consider for separate traffic assignment and trip distribution models. Section 6 presents numerical experiments conducted on Moscow and Berlin transportation networks.

2 Problem statement

We start with the description of the Beckmann and the stable dynamics models for searching the road network user equilibrium. Similarly to Abrahamsson and Lundqvist (1999), we assume the road and the transit networks are independent, and there is no congestion effects in the transit network (its travel costs are constant and defined as the costs of the shortest routes). Then, in Sect. 2.2, we describe the trip distribution model with a hierarchical choice structure of destination and travel mode (by car, public transport, or on foot). And finally, in 2.3, we consider the combined trip distribution-modal split-assignment problem and formulate its dual problem.

2.1 Route assignment models

Let the urban road network be represented by a directed graph ${\mathcal {G}} = ( {\mathcal {V}}, {\mathcal {E}} )$, where vertices ${\mathcal {V}}$ correspond to intersections or centroids (Sheffi 1985) and edges ${\mathcal {E}}$ correspond to roads, respectively. Suppose we are given the travel demands: namely, let $d_{ij}$(veh/h) be a trip rate from origin i to destination j. We denote by $P_{ij}$ the set of all simple paths from i to j. Respectively, $P = \bigcup _{(i,j) \in OD} P_{ij}$ is the set of all possible routes for all origin–destination pairs OD. Agents traveling from node i to node j are distributed among paths from $P_{ij}$, i.e. for any $p \in P_{ij}$ there is a flow $x_p \in {\mathbb {R}}_+$ along the path p, and $\sum _{p \in P_{ij}} x_p = d_w$. Flows from origin nodes to destination nodes create the traffic in the entire network ${\mathcal {G}}$, which can be represented by an element of

$$\begin{aligned} X = X(d) = \Bigl \{x \in {\mathbb {R}}_{+}^{|P|}: \; \sum _{p \in P_{ij}} x_p = d_{ij}, \; (i, j) \in OD \Bigr \}. \end{aligned}$$

(1)

Note that the dimension of X can be extremely large: e.g. for $n \times n$ Manhattan network $\log |P| = \Omega (n)$. To describe a state of the network we do not need to know an entire vector x, but only flows on arcs:

$$\begin{aligned} f_e(x) = \sum _{p \in P} \delta _{e p} x_p \quad \text {for} \quad e \in {\mathcal {E}}, \end{aligned}$$

where $\delta _{e p} = \mathbbm {1}\{e \in p\}$. Let us introduce a matrix $\Theta$ such that $\Theta _{e, p} = \delta _{e p}$ for $e \in {\mathcal {E}}$, $p \in P$, so in vector notation we have $f = \Theta x$. To describe an equilibrium we use both path- and link-based notations (x, t) or (f, t).

Beckmann model (Beckmann et al. 1956; Patriksson 2015). One of the key ideas behind the Beckmann model is that the cost (e.g. travel time, gas expenses, etc.) of passing a link e is the same for all agents and depends solely on the flow $f_e$ along it. In what follows, we denote this cost for a given flow $f_e$ by $t_e = \tau _e(f_e)$. In practice the BPR functions are usually employed (US Bureau of Public Roads 1964):

$$\begin{aligned} \tau _e(f_e) = {\bar{t}}_e \left( 1 + \rho \left( \frac{f_e}{{\bar{f}}_e}\right) ^{\frac{1}{\mu }} \right) , \end{aligned}$$

(2)

where ${\bar{t}}_e$ are free flow times, and ${\bar{f}}_e$ are road capacities of a given network’s link e. We take these functions with parameters $\rho = 0.15$ and $\mu = 0.25$.

Another essential point is a behavioral assumption on agents called the first Wardrop’s principle: we suppose that each of them knows the state of the whole network and chooses a path p minimizing the total cost

$$\begin{aligned} T_p(t) = \sum _{e \in p} t_e. \end{aligned}$$

The cost functions are supposed to be continuous, non-decreasing, and non-negative. Then $(x^*, t^*)$, where $t^* = (t_e^*)_{e \in {\mathcal {E}}}$, is an equilibrium state, i.e. it satisfies conditions

$$\begin{aligned} t_e^* = \tau _e(f_e^*), \quad \text {where}\quad f^* = \Theta x^*, \\ x^*_{p_w} > 0 \Longrightarrow T_{p_{ij}}(t^*) = T_{ij}(t^*) = \min _{p \in P_{ij}} T_p(t^*), \end{aligned}$$

if and only if $x^*$ is a minimum of the potential function:

$$\begin{aligned}&\Psi (x) = \sum _{e \in {\mathcal {E}}} \underbrace{\int _{0}^{f_e} \tau _e (z) d z}_{\sigma _e(f_e)} \longrightarrow \min _{f = \Theta x, \; x \in X} \\&\Longleftrightarrow \Psi (f) = \sum _{e \in {\mathcal {E}}} \sigma _e (f_e) \longrightarrow \min _{f = \Theta x : \; x \in X}, \end{aligned}$$

(B)

and $t_e^* = \tau _e(f_e^*)$ (Beckmann et al. 1956).

Another way to find an equilibrium numerically is by solving a dual problem. We can construct it according to Theorem 4 from Nesterov and De Palma (2003), the solution of which is $t^*$:

$$\begin{aligned} Q(t) = \sum _{{ij} \in OD} d_{ij} T_{ij}(t) - \underbrace{\sum _{e \in {\mathcal {E}}} \sigma _e^*(t_e)}_{h(t)} \longrightarrow \max _{t \ge {\bar{t}}}, \end{aligned}$$

(DualB)

where

$$\begin{aligned} \sigma _e^*(t_e) = \sup _{f_e \ge 0} \{t_e f_e - \sigma _e(f_e) \} = {\bar{f}}_e \left( \frac{t_e - {\bar{t}}_e}{{\bar{t}}_e \rho } \right) ^{\mu } \frac{\left( t_e - {\bar{t}}_e \right) }{1 + \mu } \end{aligned}$$

is the conjugate function of $\sigma _e(f_e)$, $e \in {\mathcal {E}}$.

When we search for the solution to this problem numerically, on every step of an applied method we can reconstruct primal variable f from the current dual variable t: $f \in \partial \sum _{(i,j) \in OD} d_{ij} T_{ij}(t)$. Then we can use the duality gap — which is always nonnegative — for the estimation of the method’s accuracy:

$$\begin{aligned} \Delta (f, t) = \Psi (f) - Q(t). \end{aligned}$$

It vanishes only at the equilibrium $(f^*, t^*)$.

Stable dynamics model. Nesterov and De Palma (2003) proposed an alternative model called the stable dynamics model, which takes an intermediate place between static and dynamic network assignment models. Namely, its equilibrium can be interpreted as the stationary regime of some dynamic process. Its key assumption is that we no longer introduce a complex dependence of the travel cost on the flow (as in the standard static models), but only pose capacity constraints, i.e. the flow value on each link imposes the feasible set of travel times

$$\begin{aligned} \tau _e(f_e) = {\left\{ \begin{array}{ll} {\bar{t}}_e, &{} 0 \le f_e < {\bar{f}}_e,\\ \left[ {\bar{t}}_e, \infty \right] , &{} f_e = {\bar{f}}_e,\\ +\infty , &{} f_e > {\bar{f}}_e. \end{array}\right. } \end{aligned}$$

(3)

Unlike in the Beckmann model, there is no one-to-one correspondence between equilibrium travel times and flows on the links of the network. There are examples in Nesterov and De Palma (2003) illustrating the difference. Also, one can find in Chudak et al. (2007) a detailed comparison of equilibria in these two models conducted for large and small networks.

Hence, an equilibrium state $(x^*, t^*)$ of the stable dynamics model satisfies the next conditions:

$$\begin{aligned} t_e^* \in \tau _e(f_e^*), \\ x^*_{p_{ij}} > 0 \Longrightarrow T_{p_{ij}}(t^*) = T_{ij}(t^*). \end{aligned}$$

The above formula can be reformulated in terms of an optimization problem. The pair $(f^*, t^*)$ is an equilibrium if and only if it is a solution of the saddle-point problem

$$\begin{aligned} \sum _{e \in E} [t_e f_e - (t_e - {\bar{t}}_e) {\bar{f}}_e] \longrightarrow \min _{\begin{array}{c} f = \Theta x: \\ x \in X \end{array}} \max _{t_e \ge {\bar{t}}_e}, \end{aligned}$$

(SaddleSD)

where its primal problem is

$$\begin{aligned} \Psi (f) = \sum _{e \in {\mathcal {E}}} f_e {\bar{t}}_e \longrightarrow \min _{\begin{array}{c} f = \Theta x : \\ x \in X,\, f_e \le {\bar{f}}_e \end{array}}, \end{aligned}$$

(SD)

and its dual problem is

$$\begin{aligned} Q(t) = \sum _{(i,j) \in OD} d_{ij} T_{ij}(t) - \underbrace{\langle t - {\bar{t}}, {\bar{f}}\rangle }_{h(t)} \longrightarrow \max _{t_e \ge {\bar{t}}_e}. \end{aligned}$$

(DualSD)

In contrast with the Beckmann model, the equilibrium state in the stable dynamics model is defined by pair $(f^*, t^*)$ (in particular, it differs from the system optimum $(f^*, {\bar{t}})$ in the model only by the time value).

In both cases the dual problem has form

$$\begin{aligned} Q(t) = \sum _{(i,j) \in OD} d_{ij} T_{ij}(t) - h(t) \longrightarrow \max _{t \ge {\bar{t}}}. \end{aligned}$$

The optimization problem is convex, non-smooth and composite.

2.2 Trip distribution with modal split (D-MS)

Let us further assume that there are several trip purposes (demand layers), travel modes (transportation modes), and agents types. We use the logit model with calibration parameters $\alpha _{am}$, $\beta _{am}$ corresponding to choices of travel mode m by agents of the type a. Further, we consider the case when $\alpha _{am}$ values are the same for all travel modes of agent type a:

$$\begin{aligned} \alpha _{am} {:}{=}\alpha _a. \end{aligned}$$

Necessity of this condition will be explained below.

For example, if we want to make travelling by car (travel mode $m_1$) unavailable for non-car-owners $a_1$, we can set $\beta _{a_1 m_1} {:}{=}\inf$ to get zero trips $d^{a_1m_1}=0$. Thus, for every agent type a we can implicitly set its group (nest) of available travel modes.

To define destination choice model, we use the entropy-based trip distribution model of Wilson (1969). For every trip purpose r (e.g., home-work, home-other) we define calibration parameter $\gamma _r$. This parameter defines the sensitivity of agents with the trip purpose r to trip length.

According to Abrahamsson and Lundqvist (1999), Fernández et al. (1994), we consider the following problem:

$$\begin{aligned} \sum _{i, j, r, a, m} d_{ij}^{ram} T_{ij}^m + \underbrace{\sum _{i, j, r, a} \frac{1}{\gamma _r} d_{ij}^{ra} \ln d_{ij}^{ra} + \sum _{i, j, r, a, m} \frac{1}{\alpha _a} d_{ij}^{ram} \left( \ln \left( \frac{d_{ij}^{ram}}{d_{ij}^{ra}} \right) + \beta _{am} \right) }_{H(d)} \rightarrow \min _{d \in \Pi '(l, w)}, \end{aligned}$$

(P1)

where

$$\begin{aligned} \Pi '(l, w) = \left\{ d_{ij}^{ram}\ge 0: \sum _{j, m} d_{ij}^{ram} = l_i^{ra}, \sum _{i, a, m} d_{ij}^{ram} = w_j^{r} \right\} , \end{aligned}$$

$d_{ij}^{ram}$ is a number of trips from zone i to j by travel mode m of agents type a with trip purpose r and $d_{ij}^{ra} = \sum _m d_{ij}^{ram}$; $l_i^{ra}$ is a number of production from zone i of agents type a with trip purpose r; $w_j^{r}$ is a number of attractions to zone j of the trip purpose r.

This is the combined trip distribution-modal split (D-MS) problem, where the choice structure is nested: travel mode choice is conditioned by destination choice (Abrahamsson and Lundqvist 1999). If $\gamma _r$ and $\alpha _a$ are equal, then (P1) reduces to the problem that also corresponds to D-MS model with simultaneous choices of destination and travel mode with the same calibration parameters (Florian and Nguyen 1978; Abrahamsson and Lundqvist 1999).

For fixed values $d_{ij}^{ra}$, it is straightforward to check that the optimal $d_{ij}^{ram}$ satisfy the following relation:

$$\begin{aligned} {{\,\textrm{prob}\,}}_{ij}^{ram} = \frac{d_{ij}^{ram}}{d_{ij}^{ra}} = \frac{\exp \left( - \alpha _a T_{ij}^m - \beta _{am}\right) }{ \sum _{m'} \exp \left( -\alpha _a T_{ij}^{m'} - \beta _{am'}\right) }, \end{aligned}$$

(4)

where $T_{ij}^{am} = T_{ij}^m + \frac{\beta _{am}}{\alpha _a}$, i.e., the modal split corresponds to the logit model. Moreover,

$$\begin{aligned} \min _{d_{ij}^{ram}} \left( \sum _{i,j,r,a,m} d_{ij}^{ram}T_{ij}^{am}+ \frac{1}{\alpha _a} \sum _{i,j,r,a,m} d_{ij}^{ram}\ln \frac{d_{ij}^{ram}}{d_{ij}^{ra}} \right) = d_{ij}^{ra}\underbrace{\left( - \frac{1}{\alpha _a} \ln \sum _{m} \exp \left( - \alpha _a T_{ij}^{am}\right) \right) }_{T_{ij}^a}, \end{aligned}$$

where $T_{ij}^a$ is a composite travel cost for agents of type a.

Substituting $d_{ij}^{ram}= {{\,\textrm{prob}\,}}_{ij}^{ram}d_{ij}^{ra}$ we reduce the problem (P1) to

$$\begin{aligned} E(d, T) = \sum _{i,j,r,a} d_{ij}^{ra}T_{ij}^a+ \sum _{i,j,r,a} \frac{1}{\gamma _r} d_{ij}^{ra}\ln d_{ij}^{ra}\rightarrow \min _{d \in \Pi (l, w)}, \end{aligned}$$

(E)

where

$$\begin{aligned} \Pi (l,w) = \left\{ d \ge 0: \sum _{j} d_{ij}^{ra}= l_i^{ra}, \sum _{i,a} d_{ij}^{ra}= w_j^{r} \right\} , \end{aligned}$$

and $T_{ij}^a = - \frac{1}{\alpha _a} \ln \sum _m \exp \left( - \alpha _a T_{ij}^{m}- \beta _{am} \right)$.

Let us derive its dual problem. In our problem statement, the system of constraints $\Pi '(l, w)$ is consistent $\sum _{i,a} l_i^{r,a} = \sum _{j} w_j^r$. Therefore Gasnikov et al. (2015), we can introduce a tautological constraint

$$\begin{aligned}&\sum _{i,j,a} d^{ra}_{ij} = \sum _{i,a} l_i^{r,a} = \sum _{j} w_j^r = N_{r}. \end{aligned}$$

(5)

We will utilize tautological constraint (5) to obtain dual function with bounded subgradient norm.

$$\begin{aligned}&\min _{d \in \Pi (l, w)} \sum _{i,j,r,a} d_{ij}^{ra}T_{ij}^a+ \sum _{i,j,r,a} \frac{1}{\gamma _r} d_{ij}^{ra}\ln d_{ij}^{ra}\nonumber \\ {}&= \min _{\begin{array}{c} {d^{ra}_{ij}}\ge 0\\ \sum _{i,j,a} {d^{ra}_{ij}}= N_r \end{array}} \max _{\lambda \ge 0} \sum _r \frac{1}{\gamma }_r \sum _{i,j,a} {d^{ra}_{ij}}\ln {d^{ra}_{ij}}+ \sum _{i,j,a} {d^{ra}_{ij}}T_{ij}^a+ \sum _{i,a} \lambda ^l_{rai} \left( \sum _j {d^{ra}_{ij}}- l_i^{ra} \right) \nonumber \\ {}&+ \sum _{j} \lambda ^w_{rj} \left( \sum _{i,a} {d^{ra}_{ij}}- w^r_j \right) \nonumber \\ {}&= \max _{\lambda \ge 0} \sum _r \min _{\begin{array}{c} {d^{ra}_{ij}}\ge 0\\ \sum _{i,j,a} {d^{ra}_{ij}}= N_r \end{array}} \frac{1}{\gamma }_r \sum _{i,j,a} {d^{ra}_{ij}}\ln {d^{ra}_{ij}}+ \sum _{i,j,a} {d^{ra}_{ij}}T_{ij}^a+ \sum _{i,a} \lambda ^l_{rai} \left( \sum _j {d^{ra}_{ij}}- l_i^{ra} \right) \nonumber \\ {}&+ \sum _{j} \lambda ^w_{rj} \left( \sum _{i,a} {d^{ra}_{ij}}- w^r_j \right) \nonumber \\ {}&= -\min _{\lambda \ge 0} \varphi (\lambda ^l, \lambda ^w, T), \end{aligned}$$

(6)

where by $\varphi (\lambda ^l, \lambda ^w, T)$ we denoted the negative of the dual function.

Let y be the dual variable for the tautological constraint $\sum _{i,j,a} d^{ra}_{ij} = N_{r}$. Then taking the gradient by d we get one of the optimality conditions for the inner minimization problem:

$$\begin{aligned} \frac{1}{\gamma _r}\left( \ln {d^{ra}_{ij}}+ 1 \right) + T_{ij}^a+ \lambda ^l_{rai} + \lambda _{rj}^w + y = 0, \end{aligned}$$

therefore

$$\begin{aligned} {d^{ra}_{ij}}= \exp \left( -1 - \gamma _r \left( T_{ij}^a+ \lambda ^l_{rai} + \lambda _{rj}^w + y \right) \right) . \end{aligned}$$

By choosing y such that ${d^{ra}_{ij}}$ satisfies $\sum _{i,j,a} d^{ra}_{ij} = N_{r}$ we obtain

$$\begin{aligned} d^{ra}_{ij}(\lambda ^l_r, \lambda ^w_r, T) = \frac{N_r \exp \left( -\gamma _r\left( T^{a}_{ij} + \lambda ^l_{rai} + \lambda ^w_{rj} \right) \right) }{\sum _{i,j,a}\exp \left( -\gamma _r\left( T^{a}_{ij} + \lambda ^l_{rai} + \lambda ^w_{rj} \right) \right) }. \end{aligned}$$

(7)

Substituting this into (6) yields

$$\begin{aligned} \varphi (\lambda ^l, \lambda ^w, T) = \sum _r \frac{N_r}{\gamma _r} \ln {\frac{1}{N_r}\sum _{i,j,a}\exp \left( -\gamma _r\left( T_{ij}^a+ \lambda ^l_{rai} + \lambda ^w_{rj} \right) \right) } + \sum _{i,a} \lambda _{rai}^l l_i^{ra} + \sum _j \lambda _{rj}^w w^r_j. \end{aligned}$$

(8)

2.3 Combined distribution-modal split-assignment problem (D-MS-A)

Now, we combine the road, the transit, and the pedestrian networks into one multi-modal network, which we denote again by ${\mathcal {G}} = ( {\mathcal {V}}, {\mathcal {E}} )$. Slightly abusing notations, in the same way as in Sect. 2.1 we can define the set of path flows X(d) corresponding to an interzonal trip matrix $d \in \Pi '(l, w)$, and link flows $f_e^{ram}$, $f_e^m = \sum _{r,a} f_e^{ram}$, and $f_e = \sum _{m} f_e^m$.

According to Abrahamsson and Lundqvist (1999), the combined distribution-modal split-assignment problem can be formulated as follows:

$$\begin{aligned} P_3(f,d) = \Psi (f) + H(d) \rightarrow \min _{\begin{array}{c} f = \Theta x, \; x \in X(d)\\ d \in \Pi '(l,w) \end{array}}, \end{aligned}$$

(P3)

where

$$\begin{aligned} \Psi (f) = \sum _{e \in {\mathcal {E}}} \left( \sigma _e(f_e) + \sum _{m \in V} c_e^{m} f_e^{m}\right) . \end{aligned}$$

Similarly to Sect. 2.1 we obtain from (P3) the saddle-point problem

$$\begin{aligned} S_3'(d, t) = \sum _{i,j,r,a,m} d_{ij}^{ram}T_{ij}^{m}(t) - h(t) + H(d) \rightarrow \min _{d \in \Pi '(l, w)} \ \max _{t \ge {\bar{t}}}, \end{aligned}$$

(S3')

where $T_{ij}^{m}(t)$ is the minimal cost of the path from $i \in O$ to $j \in D$ with the links cost $t_e + c_e^m$. According to Sect. 2.2, the above problem reduces to

$$\begin{aligned} S_3(d,t) = \underbrace{\sum _{i,j,r,a} d_{ij}^{ra}T_{ij}^a(t) + \sum _{i,j,r,a} \frac{1}{\gamma _r} d_{ij}^{ra}\ln d_{ij}^{ra}}_{ E(d, T(t))} - h(t) \rightarrow \min _{d \in \Pi (l,w)} \max _{t \ge {\bar{t}}}, \end{aligned}$$

(S3)

where $T_{ij}^a(t) = - \frac{1}{\alpha _a} \ln \left( \sum _m \exp \left( - \alpha _a T_{ij}^{m}(t) - \beta _{am} \right) \right)$.

Respectively, the dual problem is

$$\begin{aligned} D_3(\lambda , t) = -\varphi (\lambda ^l, \lambda ^w, T(t)) - h(t) \longrightarrow \max _{t \ge {\bar{t}}, \; \lambda ^l, \lambda ^w}. \end{aligned}$$

(D3)

Thus, there are several ways to formulate an optimization problem. In this paper, we consider the following particular formulation of the problem and further provide convergence analysis of the accelerated gradient method application to it:

$$\begin{aligned} D'_3 (t) = - \varphi _3(t) - h(t) \longrightarrow \max _{t \ge {\bar{t}}}, \end{aligned}$$

(D3')

where

$$\begin{aligned} \varphi _3(t) = \min _{\lambda } \varphi (\lambda , T(t)) = - \min _{d \in \Pi (l,w)} E(d, T(t)). \end{aligned}$$

3 Dual approach for solving the combined model

In Sect. 3.1, we describe the universal gradient method of similar triangles (USTM) for solving the dual problem (D3’) of the described combined model. And we provide its convergence analysis in Sect. (3.2).

3.1 Dual method for NE problem

A popular approach for searching equilibrium in combined models is the partial linearization algorithm of Evans (1976) and its modifications for multi-modal and multi-user cases (Abrahamsson and Lundqvist 1999; Boyce et al. 1983). The approach is further developed in Yang et al. (2013) by incorporating better line-search procedures. Note that the algorithm can be viewed as partly dual, because it is formulated in terms external to the primal problem: it includes cost matrices $T_{ij}$, which are the dual variables of the saddle-point problem (S3) (or (S3’)) or the dual problem (D3). But still, the algorithm is essentially primal, since it optimizes (P3) by flow and trip distribution pair.

Here we propose an alternative approach based on solving the dual problem (D3’) with the universal method of similar triangles (USTM), and afterwards we prove the convergence rates for it.

Algorithm 1 provides the pseudocode of USTM with an inexact oracle and the euclidean prox-structure. Here we used the following notations:

$$\begin{aligned} \phi _0(t) = \frac{1}{2} \Vert t - t^0\Vert _2^2, \\ \phi _{k+1}(t) = \phi _k(t) + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \left\langle {\tilde{\nabla }} \Phi (y^{k+1}), t - y^{k+1} \right\rangle + h(t) \right] . \end{aligned}$$

Note that we did not specify the stopping criterion as it can be different for different models (Fan et al. 2022).

To find a network equilibrium in D-MS-A model, we apply USTM to minimize the composite objective $-D'_3(t) = \varphi _3(t) + h(t)$ in (D3’), thus we set $\Phi (t) {:}{=}\varphi _3(t)$ in Algorithm 1. Recall that

$$\begin{aligned} \varphi _3(t) = - \min _{d \in \Pi (l, w)} E(d, T(t)). \end{aligned}$$

Note that at each iteration we need to compute $\varphi _3$ and $\nabla \varphi _3$ with travel costs $y^{k+1}$, $t^{k+1}$. Since this itself is done by an iterative procedure, we cannot expect to find the exact solution of the subproblem. Instead, we use the following inexact oracle for $\varphi _3$:

$$\begin{aligned} {\tilde{\varphi }}_3(t) = \varphi _{3,\delta }(t) = - E({\tilde{d}}(t), T(t)), \quad {\tilde{\nabla }} \varphi _3(t) = \nabla _\delta \varphi _3(t) = - \nabla _t E({\tilde{d}}(t), T(t)), \end{aligned}$$

where $\delta \ge 0$ and ${\tilde{d}}(t) = d_\delta (t) \in \Pi (l, w)$ (see Sect. 5.4) is a $\delta$-solution of $\min _d E(d, T(t))$, i.e.

$$\begin{aligned} E(d_\delta (t), T(t)) \le \min _{d \in \Pi (l, w)} E(d, T(t)) + \delta . \end{aligned}$$

Recall that E(d, T(t)) is concave w.r.t. t, and its superdifferential $\partial _t E(d, T(t))$ is given by

$$\begin{aligned} \partial _t E(d, T(t)) = \sum _{i,j,r,a} d_{ij}^{ra} \partial T_{ij}^a(t). \end{aligned}$$

Further, $d_{ij}^{ra} \partial T^a_{ij}(t) = \sum _m d_{ij}^{ram} \partial T_{ij}^m(t)$, and

$$\begin{aligned} \partial T_{ij}^m(t) = \partial \min _{a_p^m \in P_{ij}^m} \langle t, a_{pm} \rangle = \textrm{Conv}\left\{ a_p^m \in P_{ij}^m: \langle t, a_{pm} \rangle = T_{ij}^m(t)\right\} , \end{aligned}$$

where $a_p^m \in \{0,1\}^{|{\mathcal {E}}|}$ is a binary vector encoding a path p for the travel mode m. Note that several shortest paths may exist. Finally, we get that

$$\begin{aligned} \partial _t E(d, T(t)) = \sum _{i,j,r,a,m} d_{ij}^{ram} \textrm{Conv}\left\{ a_p^m \in P_{ij}^m: \langle t, a_{pm} \rangle = T_{ij}^m(t)\right\} , \end{aligned}$$

(9)

and any supergradient $\nabla _t E(d, T(t))$ is a vector of link flows by shortest paths, corresponding to the trip distribution d.

Since we solve the dual problem (D3’), we need a way to recover an approximate solution (d, f) of the primal problem (P3). For any $t \ge 0$, given ${\tilde{d}}_{ij}^{ra}(t)$ and $T_{ij}^m(t)$, define $d'(t) \in \Pi '(l, w)$ by formula (4). Then we reconstruct a full correspondence matrix after K iterations of Algorithm 1 as

$$\begin{aligned} {\hat{d}}^K = \frac{1}{A_K} \sum _{k=1}^K \alpha _k d'(y^k) \in \Pi '(l, w). \end{aligned}$$

(10)

Corresponding link flows can be recovered as (see Kubentayeva and Gasnikov 2021, f. (18))

$$\begin{aligned}{}[{\hat{f}}^K]_e^m = \frac{1}{A_K} \sum _{k=1}^K \alpha _k [f^k]_e^m. \end{aligned}$$

(11)

where $f^k$ are link flows by shortest paths for times $y^k$ and correspondence matrix $d'(y^k)$, such that

$$\begin{aligned} \sum _m [f^k]_e^m = - \left[ {\tilde{\nabla }} \varphi _3(y^k)\right] _e, \quad e \in {\mathcal {E}}. \end{aligned}$$

3.2 Convergence analysis

Below we derive some properties of the problem and then use them to apply the USTM convergence theorem, what gives us the convergence rate of our dual algorithm for searching equlibria in combined model.

The next lemma is a trivial counterpart of f. (5) in Kubentayeva and Gasnikov (2021), following from (9).

Lemma 1

For any $d, d' \in \Pi (l, w)$, $t, t' \ge 0$, and supergradients $\nabla _t E(d, T(t))$, $\nabla _t E(d', T(t'))$ it holds that $\Vert \nabla _t E(d, T(t)) - \nabla _t E(d', T(t'))\Vert _2 \le M = \sqrt{2\,H} N$, where $H \le |{\mathcal {V}}|-1$ is the maximum simple path length in the network, and $N = \sum _r N_r$ is the total number of active agents.

Typically (e.g. for a Manhattan network) $H = O(\sqrt{|{\mathcal {V}}|})$.

Recall the following standard result concerning inexact oracles.

Lemma 2

For any $t, t' \ge 0$

$$\begin{aligned} {\tilde{\varphi }}_3(t') + \delta \ge \varphi _3(t') \ge {\tilde{\varphi }}_3(t) + \langle {\tilde{\nabla }} \varphi _3(t), t' - t \rangle . \end{aligned}$$

Proof

Since E(d, T(t)) is concave w.r.t. t,

$$\begin{aligned} E({\tilde{d}}(t), T(t')) \le E({\tilde{d}}(t), T(t)) + \langle \nabla _t E({\tilde{d}}(t), T(t)), t' - t \rangle = - {\tilde{\varphi }}_3(t) - \langle {\tilde{\nabla }} \varphi _3(t), t' - t \rangle . \end{aligned}$$

Thus,

$$\begin{aligned} \varphi _3(t') \ge - E({\tilde{d}}(t), T(t')) \ge {\tilde{\varphi }}_3(t) + \langle {\tilde{\nabla }} \varphi _3(t), t' - t \rangle . \end{aligned}$$

The claim follows. $\square$

The following bound is the main tool to prove the convergence rate. It is an analogue of Lemma 2 in Gasnikov and Nesterov (2016) adapted to the case of an inexact oracle.

Lemma 3

Assume at the k-th iteration of Algorithm 1 we call an inexact oracle $({\tilde{\Phi }}, {\tilde{\nabla }} \Phi )$ satisfying

$$\begin{aligned} {\tilde{\Phi }}(t') + \delta _k \ge \Phi (t') \ge {\tilde{\Phi }}(t) + \langle {\tilde{\nabla }} \Phi (t), t' - t \rangle \quad \forall t, t' \in {{\,\textrm{dom}\,}}h, \end{aligned}$$

with $\delta _k = \frac{\alpha _{k+1} \varepsilon }{4 A_{k+1}}$. Then for any $k > 0$

$$\begin{aligned} \Phi (t^k) + h(t^k) \le \frac{1}{A_k} \phi _k(u^k) + \frac{3 \varepsilon }{4} \le \frac{1}{A_k} \phi _k(t) - \frac{1}{2 A_k} \Vert t - u^k\Vert _2^2 + \frac{3 \varepsilon }{4} \quad \forall t \in {{\,\textrm{dom}\,}}h. \end{aligned}$$

Moreover,

$$\begin{aligned} \Phi (t^k) + h(t^k) + \frac{1}{2 A_k} \Vert t^* - u^k\Vert _2^2 \le \Phi (t^*) + h(t^*) + \frac{1}{2 A_k} \Vert t^* - t^0\Vert _2^2, \end{aligned}$$

where $t^* = \mathop {\mathrm {arg\,min}}\limits \Phi (t) + h(t)$.

Proof

We are going to prove by induction that

$$\begin{aligned} A_k (\Phi (t^k) + h(t^k)) \le \phi _k(u^k) + A_k \frac{3 \varepsilon }{4}. \end{aligned}$$

Note that since $\phi _k$ is 1-strongly convex,

$$\begin{aligned} \phi _k(u^k) \le \phi _k(t) - \frac{1}{2} \Vert t - u^k\Vert _2^2 \quad \forall t \in {{\,\textrm{dom}\,}}h. \end{aligned}$$

The inner stopping criterion yields that

$$\begin{aligned} {\tilde{\Phi }}(t^{k+1})&\le {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), t^{k+1} - y^{k+1} \rangle + \frac{L_{k+1}}{2} \Vert t^{k+1} - y^{k+1}\Vert _2^2 + \frac{\alpha _{k+1}}{2 A_{k+1}} \varepsilon \\&= {\tilde{\Phi }}(y^{k+1}) + \frac{\alpha _{k+1}}{A_{k+1}} \left[ \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - u^k \rangle + \frac{\varepsilon }{2}\right] + \frac{1}{2 A_{k+1}} \Vert u^{k+1} - u^k\Vert _2^2. \end{aligned}$$

By the assumptions of the lemma,

$$\begin{aligned} {\tilde{\Phi }}(y^{k+1}) \le \Phi (t^k) - \langle {\tilde{\nabla }} \Phi (y^{k+1}), t^k - y^{k+1}\rangle = \Phi (t^k) + \frac{\alpha _{k+1}}{A_{k+1}} \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^k - t^k\rangle , \end{aligned}$$

thus

$$\begin{aligned} A_{k+1}&{\tilde{\Phi }}(t^{k+1}) \le A_{k+1} {\tilde{\Phi }}(y^{k+1}) + \alpha _{k+1} \left[ \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - u^k \rangle + \frac{\varepsilon }{2}\right] + \frac{1}{2} \Vert u^{k+1} - u^k\Vert _2^2 \\&\le A_k \Phi (t^k) + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - u^k + \frac{A_k}{A_{k+1}} (u^k - t^k)\rangle + \frac{\varepsilon }{2}\right] \\&\quad + \frac{1}{2} \Vert u^{k+1} - u^k\Vert _2^2 \\&= A_k \Phi (t^k) + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - y^{k+1}\rangle + \frac{\varepsilon }{2}\right] + \frac{1}{2} \Vert u^{k+1} - u^k\Vert _2^2 . \end{aligned}$$

Now, using the convexity of h and the definition of $\delta _k$ we obtain that

$$\begin{aligned} A_{k+1}&\left[ \Phi (t^{k+1}) + h(t^{k+1})\right] \\&\le A_{k+1} \left[ {\tilde{\Phi }}(t^{k+1}) + \delta _k\right] + A_k h(t^k) + \alpha _{k+1} h(u^{k+1}) \\&\le A_k \left[ \Phi (t^k) + h(t^k)\right] + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - y^{k+1}\rangle + h(u^{k+1}) + \frac{3 \varepsilon }{4}\right] \\&\quad + \frac{1}{2} \Vert u^{k+1} - u^k\Vert _2^2 \\&\le \phi _k(u^{k+1}) + \alpha _{k+1} \left[ {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), u^{k+1} - y^{k+1}\rangle + h(u^{k+1})\right] + A_{k+1} \frac{3 \varepsilon }{4} \\&= \phi _{k+1}(u^{k+1}) + A_{k+1} \frac{3 \varepsilon }{4} . \end{aligned}$$

The last claim of the lemma follows from the inequality

$$\begin{aligned} \Phi (t^*) \ge {\tilde{\Phi }}(t) + \langle {\tilde{\nabla }} \Phi (t), t^* - t \rangle , \end{aligned}$$

what implies

$$\begin{aligned} \frac{1}{A_k} \phi _k(t^*) \le \Phi (t^*) + h(t^*) + \frac{1}{2 A_k} \Vert t^* - t^0\Vert _2^2. \end{aligned}$$

$\square$

Now we are ready to prove the main result of this section: a primal-dual convergence rate for USTM in the combined model. The complexity analysis in the next theorem is similar to Theorems 3 and 4 in Kubentayeva and Gasnikov (2021), where USTM was applied to the route assignment problem.

Theorem 1

Assume $t^0 = {\bar{t}}$, $L_0 \le \frac{M^2}{\varepsilon }$, and at the k-th iteration problem (E) is solved with accuracy $\delta _k = \frac{\alpha _{k+1} \varepsilon }{4 A_{k+1}}$. Define

$$\begin{aligned} R = \Vert t^* - {\bar{t}}\Vert , \quad {\tilde{R}}^2 = \rho ^2 N^{2 / \mu } \sum _{e \in E} \frac{{\bar{t}}_e^2}{{\bar{f}}_e^{2 / \mu }}. \end{aligned}$$

Then in the case of Beckmann’s model, after at most

$$\begin{aligned} K = 4 \left( \frac{M {\tilde{R}}}{\varepsilon }\right) ^2 \end{aligned}$$

iterations it holds that

$$\begin{aligned} 0 \le P_3({\hat{d}}^K, {\hat{f}}^K) - D_3'(t^K) \le \varepsilon . \end{aligned}$$

In the case of Stable Dynamics model, after at most

$$\begin{aligned} K = 4 \left( \frac{M R}{\varepsilon }\right) ^2 \end{aligned}$$

iterations it holds that

$$\begin{aligned} 0 \le P_3({\hat{d}}^K, {\hat{f}}^K) - D_3'(t^K) + \langle ({\hat{f}}^K - {\bar{f}})_+, t^* - {\bar{t}}\rangle \le \varepsilon , \\ \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2 \le \frac{2 \varepsilon }{R}. \end{aligned}$$

Proof

By Lemma 1,

$$\begin{aligned} {\tilde{\Phi }}(t^{k+1}) \le {\tilde{\Phi }}(y^{k+1}) + \langle {\tilde{\nabla }} \Phi (y^{k+1}), t^{k+1} - y^{k+1} \rangle + M \Vert t^{k+1} - y^{k+1}\Vert _2. \end{aligned}$$

Then f. (A3) from the proof of Theorem 3 in Kubentayeva and Gasnikov (2021) ensures that for all k

$$\begin{aligned} A_k \ge \frac{\varepsilon k}{2 M^2}. \end{aligned}$$

Recall that $f = f^k$ are link flows by shortest paths for times $y = y^k$ and interzonal trips $d' = d'(y^k)$. Then, according to Sect. 2.3,

$$\begin{aligned} E({\tilde{d}}(y), T(y)) = \sum _{i, j, r, a, m} d_{ij}^{ram} T_{ij}^m(y) + H(d') = \sum _{e,m} f_e^m (y_e + c_e^m) + H(d'), \end{aligned}$$

and for any t

$$\begin{aligned} {\tilde{\varphi }}_3(y) + \left\langle {\tilde{\nabla }}\varphi _3(y), t - y \right\rangle&= -E({\tilde{d}}(y), T(y)) - \langle \nabla _t E({\tilde{d}}(y), T(y)), t - y \rangle \\&= - \sum _{e,m} f_e^m (y_e + c_e^m) - H(d') + \sum _{e,m} f_e^m (y_e - t_e) \\&= - \sum _{e,m} f_e^m (t_e + c_e^m) - H(d'). \end{aligned}$$

Therefore, due to the convexity of the entropy,

$$\begin{aligned} \phi _K(t)&= \sum _{k=1}^K \alpha _k \left( {\tilde{\varphi }}_3(y^k) + \left\langle {\tilde{\nabla }} \varphi _3(y^K), t - y^k \right\rangle + h(t)\right) + \frac{1}{2} \Vert t - t^0\Vert _2^2 \\&= - A_K \sum _{e,m} [{\hat{f}}^K]_e^m (t_e + c_e^m) - \sum _{k=1}^K \alpha _k H(d'(y^k)) + A_K h(t) + \frac{1}{2} \Vert t - {\bar{t}}\Vert _2^2 \\&\le A_K \sum _{e} \left( \sigma ^*_e(t_e) - [{\hat{f}}^K]_e t_e - \sum _m [{\hat{f}}^K]_e^m c_e^m\right) - A_K H({\hat{d}}^K) + \frac{1}{2} \Vert t - {\bar{t}}\Vert _2^2. \end{aligned}$$

Then by Lemma 3,

$$\begin{aligned} -D_3'(t^K) \le \sum _{e} \left( \sigma ^*_e(t_e) - [{\hat{f}}^K]_e t_e - \sum _m [{\hat{f}}^K]_e^m c_e^m\right) - H({\hat{d}}^K) + \frac{1}{2 A_K} \Vert t - {\bar{t}}\Vert _2^2 + \frac{3 \varepsilon }{4}. \end{aligned}$$

The rest of the proof repeats the proofs of Lemmata 1 and 2 in Kubentayeva and Gasnikov (2021), mutatis mutandis. In the case of the Beckmann model, we substitute $t_e = \tau _e\left( [{\hat{f}}^K]_e\right)$, what gives us the bound

$$\begin{aligned} -D_3'(t^K)&\le - \sum _{e} \left( \sigma _e\left( [{\hat{f}}^K]_e\right) + \sum _m [{\hat{f}}^K]_e^m c_e^m\right) - H({\hat{d}}^K) + \frac{{\tilde{R}}^2}{2 A_K} + \frac{3 \varepsilon }{4} \\&= - \Psi ({\hat{f}}^K) - H({\hat{d}}^K) + \frac{{\tilde{R}}^2}{2 A_K} + \frac{3 \varepsilon }{4} \le - P_3({\hat{d}}^K, {\hat{f}}^K) + \frac{(M {\tilde{R}})^2}{K \varepsilon } + \frac{3 \varepsilon }{4}. \end{aligned}$$

In the case of the Stable Dynamics model,

$$\begin{aligned} -D_3'(t^K)&\le \min _{t \ge {\bar{t}}} \left\{ \sum _{e} \left( {\bar{f}}_e (t_e - {\bar{t}}_e) - [{\hat{f}}^K]_e t_e - \sum _m [{\hat{f}}^K]_e^m c_e^m\right) + \frac{1}{2 A_K} \Vert t - {\bar{t}}\Vert _2^2\right\} \\&- H({\hat{d}}^K) + \frac{3 \varepsilon }{4} \\&= - \Psi ({\hat{f}}^K) - H({\hat{d}}^K) + \min _{t \ge {\bar{t}}} \left( \langle {\bar{f}}- {\hat{f}}^K, t - {\bar{t}}\rangle + \frac{1}{2 A_K} \Vert t - {\bar{t}}\Vert _2^2\right) + \frac{3 \varepsilon }{4} \\&= - P_3({\hat{d}}^K, {\hat{f}}^K) - \frac{A_K}{2} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2^2 + \frac{3 \varepsilon }{4} \\&\le - P_3({\hat{d}}^K, {\hat{f}}^K) - \frac{K \varepsilon }{4 M^2} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2^2 + \frac{3 \varepsilon }{4}. \end{aligned}$$

Since optimal $t^* - {\bar{t}}\ge 0$ are Lagrange multipliers for the problem (P3),

$$\begin{aligned} P_3({\hat{d}}^K, {\hat{f}}^K) \ge P_3(d^*, f^*) - \langle ({\hat{f}}^K - {\bar{f}})_+, t^* - {\bar{t}}\rangle = D_3'(t^*) - \langle ({\hat{f}}^K - {\bar{f}})_+, t^* - {\bar{t}}\rangle , \end{aligned}$$

and thus

$$\begin{aligned} - \langle ({\hat{f}}^K - {\bar{f}})_+, t^* - {\bar{t}}\rangle \le P_3({\hat{d}}^K, {\hat{f}}^K) - D_3'(t^K) \le - \frac{K \varepsilon }{4 M^2} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2^2 + \frac{3 \varepsilon }{4}. \end{aligned}$$

Therefore,

$$\begin{aligned} \frac{K \varepsilon }{4 M^2} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2^2 \le R \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2 + \frac{3 \varepsilon }{4} \end{aligned}$$

and, finally,

$$\begin{aligned} \Vert ({\hat{f}}^K - {\bar{f}})_+\Vert _2 \le \frac{4 M^2 R}{K \varepsilon } + M \sqrt{\frac{3}{K}}, \end{aligned}$$

what yields the result. $\square$

4 Frank–Wolfe variations and USTM in traffic assignment

Here, we consider several numerical methods for solving a separate problem of searching user equilibrium with inelastic demands. The Frank–Wolfe method and its variations with different line search strategies effectively solve the Beckmann traffic assignment problem, but due to its primal nature it cannot be applied to the stable dynamics model. Meanwhile, the primal-dual USTM method can be applied to both problems. Further, we conduct the experiments for these methods.

4.1 Frank–Wolfe variations

In the Beckmann model, searching equilibria reduces to minimization of the potential function (B). One of the most popular and effective approaches to solve this problem numerically is the famous Frank–Wolfe method (Frank and Wolfe 1956; Jaggi 2013) as well as its numerous modifications (Fukushima 1984; LeBlanc et al. 1985; Arezki and Van Vliet 1990; Chen et al. 2002). Also, one can apply the primal-dual subgradient methods to optimize the dual problem and then reconstruct a solution to the primal one. However, our research (Kubentayeva and Gasnikov 2021) showed that this approach demands more parameter adjustments to reach the Frank–Wolfe algorithm’s performance with standard step size strategy.

In this paper, we test various step size strategies of Frank–Wolfe method. Namely, we consider some simple decaying step size schedules like standard choice of step size $\gamma _k = \frac{2}{k+1}$ and $\gamma _k = \frac{1}{k}$ leading to the averaging of $f^k$, and a number of approaches based on a choice of the optimal step size by solving auxiliary one-dimensional problem

$$\begin{aligned} \gamma _k {:}{=}\mathop {\mathrm {arg\,min}}\limits \limits _{\gamma \in [\gamma _{\min }, \gamma _{\max }]} \Psi ((1 - \gamma ) f^k + \gamma s^k). \end{aligned}$$

The variety of these approaches corresponds to the different one-dimensional optimization methods. We consider the Brent method on a segment $\gamma _k \in [0, 1]$ (Brent 1971) and exponential decreasing of $\gamma _k$ until Armijo rule is satisfied (Armijo 1966). The last modification considered is the backtracking line-search method developed for specific use in Frank–Wolfe algorithms proposed in Pedregosa et al. (2020).

The Frank–Wolfe method’s theoretical convergence rate for convex objective (with Lipschitz-continuous gradient) is $O(1/\varepsilon )$ (Pedregosa et al. 2020; Jaggi 2013)

4.2 Primal-dual universal similar triangles method

Let us remind that (DualB) and (DualSD) dual problems of Beckmann and stable dynamics traffic assignment models have the same structure:

$$\begin{aligned} Q(t) = \sum _{i,j} d_{ij} T_{ij}(t) - h(t) \longrightarrow \max _{t \ge {\bar{t}}}. \end{aligned}$$

The optimization problems are convex, non-smooth and composite. We apply the USTM method described to minimize the composite objective Q(t). Here, in the Algorithm 1, we set $\Phi (t) {:}{=}\sum _{i,j} d_{ij} T_{ij}(t)$. As in Sect. 3.1, for both models, flows (primal variables) are reconstructed in the following way:

$$\begin{aligned} {\hat{f}}^K = - \frac{1}{A_K} \sum _{k = 1}^{K} \alpha _k \nabla \Phi (y^k), \end{aligned}$$

(12)

where $\alpha _k$ is a coefficient of the USTM method on iteration k, and $A_K = \sum _{k = 1}^{K} \alpha _k$. Note that any element from the set ${{\,\mathrm{\partial }\,}}\Phi (t)$ has form $\nabla \Phi (t) = - f$, where $f = \Theta x$ is a flow distribution on links induced by $x \in X$ concentrated on the shortest paths for given times t (and vice versa: any such f corresponds to a subgradient of $\Phi (t)$). Hence, weighted ${\hat{f}}^K$ are also induced by flows on the paths.

For the Beckmann model, we can also use the duality gap to estimate the method’s accuracy:

$$\begin{aligned} \Delta ^K = Q(t^K) + \Psi ({\hat{f}}^K). \end{aligned}$$

For the stable dynamics model, flows reconstruction according to (12) keeps feasibility of ${\hat{f}}^K$ (i.e. they are induced by flows on the paths), but can violate the networks capacity constraints — so the duality gap $\Delta ^K$ can be negative. To solve the SD traffic assignment problem with inelastic demand, Kubentayeva and Gasnikov (2021) proposed a novel way to reconstruct admissible flows — which also meet capacity constraints – together with a novel computable duality gap, which can be used in a stopping criterion.

The USTM method requires $O(1 / \varepsilon ^2)$ iterations to obtain an $\varepsilon$-solution of primal and dual problems of Beckmann and SD models (Nesterov 2015; Kubentayeva and Gasnikov 2021).

5 Sinkhorn’s variations for trip distribution

Optimization problem of entropy-based trip distribution model of Wilson (1969) coincides with optimal transport (OT) problem with entropy regularizer (Cuturi 2013). To solve the problem, celebrated Sinkhorn’s algorithm is used (Sect. 5.1). In Sects. 5.2 and 5.3, we consider accelerated gradient methods adapted for solving OT problems. These methods achieve better theoretical convergence rates compared to Sinkhorn-like methods in some regimes. Later, in Sect. 6.3, we conduct experiments to compare performances of the mentioned methods.

5.1 Sinkhorn’s algorithm

In this section, for the sake of formulas simplicity, we assume a single agent type and travel mode. Since the problem (8) is separable, without loss of generality, we consider only one trip purpose and suppose $\sum _{i,j} d_{ij} = 1$. Thus, eq. (8) takes form

$$\begin{aligned} \varphi (\lambda ^l, \lambda ^w) = \frac{1}{\gamma } \ln {\sum _{i,j}\exp \left( -\gamma \left( T_{ij} + \lambda ^l_{i} + \lambda ^w_{j} \right) \right) } + \sum _{i} \lambda _{i}^l l_i + \sum _j \lambda _{j}^w w_j. \end{aligned}$$

(13)

Following Guminov et al. (2021), we perform a change of variables $\mu ^l = -\gamma \lambda ^l$, $\mu ^w = -\gamma \lambda ^w$ in (13) and obtain an equivalent formulation

$$\begin{aligned} \phi (\mu ^l, \mu ^w) = \frac{1}{\gamma } \left[ \ln \left( {\textbf{1}}^T d\left( \mu ^l,\mu ^w \right) {\textbf{1}} \right) - \langle \mu ^l,l \rangle - \langle \mu ^w,w \rangle \right] \rightarrow \min _{{\mu ^l}, {\mu ^w}}, \end{aligned}$$

(14)

where

$$\begin{aligned} \left[ d(\mu ^l,\mu ^w) \right] _{i, j} = \exp {\left( \mu ^l_i+\mu ^w_j-\gamma T_{ij}(t) \right) }, \end{aligned}$$

(15)

with the primal-dual coupling

$$\begin{aligned} d= d(\mu ^l, \mu ^w) / {\textbf{1}}^T d(\mu ^l, \mu ^w){\textbf{1}}. \end{aligned}$$

(16)

Similarly to the well-known Sinkhorn algorithm, the objective in (14) can be alternatively minimized (see Algorithm 3).

Note that, according to Lemma 9 in Guminov et al. (2021) for the problem (14), partial explicit minimization is possible via the same formulas as for classical entropy-regularized OT problem (Cuturi 2013) without tautological constraint:

$$\begin{aligned} \psi (\mu ^l, \mu ^w) = {\textbf{1}}^T d\left( \mu ^l,\mu ^w \right) {\textbf{1}}- \langle \mu ^l, l \rangle - \langle \mu ^w, w \rangle \rightarrow \min _{\mu ^l, \mu ^w} , \end{aligned}$$

(17)

but the primal-dual coupling formulas are different: (16) for the problem (14) and (15) for the problem (17).

The argminima of (14) should be implemented using numerically stable computation of the logarithm of the sum of exponents (logsumexp trick), but analytically the argminima are given by

$$\begin{aligned}&\ln \mu ^l_{k+1} {:}{=}\ln \mu ^l_{k}+\ln l-\ln \left( d\left( \mu ^l_k, \mu ^w_k \right) {\textbf{1}} \right) , \end{aligned}$$

(18)

$$\begin{aligned}&\ln \mu ^w_{k+1} {:}{=}\ln \mu ^w_{k}+\ln w-\ln \left( {\textbf{1}}^T d\left( \mu ^l_k, \mu ^w_k \right) \right) , \end{aligned}$$

(19)

where logarithm is taken element-wisely.

The authors of Gasnikov et al. (2015) pointed out that the objective (14), its gradient

$$\begin{aligned}&\nabla _{\mu ^l}\phi (\mu ) = \partial \phi (\mu ^l, \mu ^w) /\partial \mu ^l = \frac{1}{\gamma }\left( \frac{d\left( \mu ^l, \mu ^w \right) {\textbf{1}}}{{\textbf{1}}^T d\left( \mu ^l, \mu ^w \right) {\textbf{1}}} -l\right) , \end{aligned}$$

(20)

$$\begin{aligned}&\nabla _{\mu ^w}\phi (\mu ) = \partial \phi (\mu ^l, \mu ^w) /\partial \mu ^w = \frac{1}{\gamma }\left( \frac{d\left( \mu ^l, \mu ^w \right) ^T{\textbf{1}}}{{\textbf{1}}^T d\left( \mu ^l, \mu ^w \right) {\textbf{1}}} - w\right) , \end{aligned}$$

(21)

and eq. (15) are invariant under transformations

$$\begin{aligned}&\mu ^l\rightarrow \mu ^l+t_{\mu ^l}{\textbf{1}}\end{aligned}$$

(22)

$$\begin{aligned}&\mu ^w\rightarrow \mu ^w+t_{\mu ^w}{\textbf{1}}, \end{aligned}$$

(23)

with $t_{\mu ^l},t_{\mu ^w}\in {\mathbb {R}}$. That leads to better numerical stability. In our experiments, we present a variant of Algorithm 3 (labeled as SINKHORN-TAUT-SHIFT) with such invariant transformations, that provide maximum of the dual variables equals 1, and with numerically stable computations of the logarithm of the sum of exponents.

5.2 Accelerated Sinkhorn’s algorithm

Besides the Sinkhorn’s algorithm, accelerated gradient methods are adapted for solving OT problems. These methods achieve better theoretical convergence rates compared to Sinkhorn-like methods in some regimes. To the best available knowledge, the first such method was proposed in Gasnikov et al. (2016), where the authors proposed non-adaptive Accelerated Gradient Descent (AGD) method for a more general class of entropy-linear programming problems. The algorithmic idea is to run AGD for solving (14) and equip it with some primal updates to guarantee the convergence rate also for the primal problem.

In this subsection, Algorithm 1 and Algorithm 2 (its adaptation for trip distribution problem listed as Algorithm 4) from Guminov et al. (2021) are described. The authors proposed to replace in the classical AGD methods the gradient step with a step of explicit minimization w.r.t. one of the blocks of variables. To formalize the latter, suppose that the vector of dual variables can be divided into m block s.t. $\mu = \left( \mu _1^T, \dots , \mu _m^T\right) ^T$. So that, notations $\phi (\mu )$ and $\phi (\mu _1, \dots , \mu _m)$ are equivalent. And suppose that it is possible to minimize the dual objective (14) over i-th block holding the others variables fixed:

$$\begin{aligned} \mathop {\mathrm {arg\,min}}\limits _i \phi (\mu ) {:}{=}\mathop {\mathrm {arg\,min}}\limits _{\xi } \phi (\mu _1, \dots , \mu _{i-1},\xi ,\mu _{i+1}, \cdots , \mu _m). \end{aligned}$$

(24)

Introduce also a notation for block gradient

$$\begin{aligned} \nabla _i \phi (\mu ) = \frac{\partial \phi (\mu _1, \dots , \mu _{i-1},\xi ,\mu _{i+1}, \cdots , \mu _m)}{\partial \xi }. \end{aligned}$$

(25)

The resulting algorithms m times theoretically slower than its gradient counterpart, where m is the number of blocks of variables used in alternating minimization. But in practice the algorithms work faster (Guminov et al. 2021).

In practice variable transformations (22, 23) (with $t_\xi = -\Vert \xi \Vert _\infty$, $\xi \in \{\mu ^l,\mu ^w\}$) performed after steps 5 and 7 of Algorithm 4 and can lead to a better numerical stability when $\gamma$ is big.

For Algorithm 2 from Guminov et al. (2021) constraints residual $\Vert ((d{\textbf{1}}- l)^T, ({\textbf{1}}^Td -w)^T)^T\Vert _2 = {{\widetilde{O}}} \left( \frac{1}{k^{2}}\right)$, but in our experiment it was observed that constraints residual decrease faster for $d=d(\kappa _{k})$ (16) than for the theoretically obtained primal variable d using primal-dual property of the algorithm. We present experiments only on the best performing modifications with $d=d(\kappa _{k})$ primal variable reconstruction, labeled as NONPD (since it does not utilizes primal-dual property of the algorithms considered in this subsection).

According to (Guminov et al. 2021, Theorem 3) the objective of the form (14) can be minimized with the following rate

$$\begin{aligned} \phi (\kappa _k)-\phi ^* = {{\widetilde{O}}} \left( \frac{\gamma ^{1/2}\Vert T\Vert _\infty }{{k^{2}}}\right) . \end{aligned}$$

5.3 MIXED AGM

One more natural modification of Algorithm 4 can be obtained by performing several steps of explicit minimization instead of one. The natural number of steps seems to be equal to the number of blocks m. But the proof of Algorithm 4 utilizes the following property of a step explicit minimization

$$\begin{aligned} \phi (\eta ^{k+1})\le \phi (\mu ^k)-\frac{1}{2L}\Vert \nabla _{i^k} \phi (\mu ^k)\Vert _2^2 \end{aligned}$$

in order to obtain

$$\begin{aligned} \phi (\eta ^{k+1}) \le \phi (\mu ^k)-\frac{1}{2nL}\Vert \nabla \phi (\mu ^k)\Vert _2^2. \end{aligned}$$

(26)

The latter is true since $i^k=\mathop {\mathrm {arg\,max}}\limits \limits _{i\in \{1, \dots , m\}} \Vert \nabla _{i^k} \phi (\mu ^k)\Vert _2^2$.

But the inequality (26) can be satisfied if one replaces lines 6 and 7 in Algorithm 4 with the following Algorithm 5.

Despite the practical performance, this modification has no theoretical guarantees because $\Vert \nabla _{i^k} \phi (\mu ^k)\Vert _2^2$ can be greater than $\sum ^J_j \Vert \nabla \phi (\zeta ^j)\Vert _2^2$ for any $J > m$.

Moreover, Algorithm 4 is non-increasing. But non-increasing property of the algorithm can be violated (with either the exact minimization given by lines 6 and 7 of Algorithm 4 replaced with Algorithm 5 or not) due to numerical instabilities. Once it happened, Algorithm 4 is stopped. The computations can be proceeded from the last obtained $\eta ^k$ with Sinkhorn’s iterations. In fact, these numerical instabilities break monotonicity of Sinkhorn’s iterations too, but in practice the proceeding of computations with Sinkhorn’s iterations allows to find better minima.

The modification, named MIXED-AGM-NONPD, combines Sinkhorn’s iterations after reaching the stability limit and the exact minimization given by Algorithm 5.

5.4 Reconstruction of correspondence matrix

Finally, let us discuss a reconstruction of a solution to the primal problem (E). Assume we reconstruct a solution $d_k$ of the primal problem (E) by formula (16) with $\mu ^l = \mu ^l_{k}$, $\mu ^w = \mu ^w_{k}$. However, since the dual problem is only approximately solved, $d^k$ in general does not satisfy the marginal constraints. So obtain a feasible solution, one can use projection Algorithm 2 from Altschuler et al. (2017). According to (Altschuler et al. 2017, Theorem 4), it has complexity $O(|O| \cdot |D|)$ and returns a correspondence matrix ${\hat{d}}_k \in \Pi (l, w)$ such that

$$\begin{aligned} \Vert d_k - {\hat{d}}_k\Vert _1 \le \delta _k = \Vert d_k {\textbf{1}}- l\Vert _1 + \Vert d_k^T {\textbf{1}}- w\Vert _1. \end{aligned}$$

The error can be estimated following Theorem 8 in Stonyakin et al. (2019):

$$\begin{aligned} E({\hat{d}}_k, T) \le \min _{d \in \Pi (l, w)} E(d, T) + 2 \delta _k \Vert T\Vert _\infty + \frac{4 \delta _k}{\gamma } \log \left( \frac{|O| \cdot |D|}{\delta _k}\right) . \end{aligned}$$

(27)

Consider $\mu ^l_k$, $\mu ^w_k$ obtained with Sinkhorn’s algorithm. Using (20) and the convexity of $\phi$ we get

$$\begin{aligned} \phi (\mu ^l_k, \mu ^w_k) - \phi ^*&\le \langle \mu ^l_k - \mu ^l_*, \nabla _{\mu ^l} \phi (\mu ^l_k, \mu ^w_k)\rangle + \langle \mu ^w_k - \mu ^w_*, \nabla _{\mu ^w} \phi (\mu ^l_k, \mu ^w_k)\rangle \\&= \frac{1}{\gamma } \left( \langle \mu ^l_k - \mu ^l_*, d_k {\textbf{1}}- l\rangle + \langle \mu ^w_k - \mu ^w_*, d_k^T {\textbf{1}}- w\rangle \right) , \end{aligned}$$

where $(\mu ^l_*, \mu ^w_*)$ is the solution of (15). Then Lemma 3 and Theorem 7 from Stonyakin et al. (2019) ensure that

$$\begin{aligned} \phi (\mu ^l_k, \mu ^w_k) - \phi (\mu ^l_*, \mu ^w_*) \le \frac{1}{2} \Vert T\Vert _\infty \left( \Vert d_k {\textbf{1}}- l\Vert _1 + \Vert d_k^T {\textbf{1}}- w\Vert _1\right) . \end{aligned}$$

Combining the above bounds, we obtain the following bound on the duality gap:

$$\begin{aligned} E({\hat{d}}_k, T) + \phi (\mu ^l_k, \mu ^w_k) \le \frac{5}{2} \delta _k \Vert T\Vert _\infty + \frac{4 \delta _k}{\gamma } \log \left( \frac{|O| \cdot |D|}{\delta _k}\right) . \end{aligned}$$

(28)

6 Numerical experiments

In our experiments, we consider the morning peak-hour in Moscow transportation network. The city’s data are provided by Russian University of Transport.

The city and its suburbs are split into 1420 zones. Moscow road network consists of 12970 nodes and 36905 links, a part of it is visualized on Fig. 1. We model the crossings by inserting auxiliary links for each allowed turn between road links. Resulting graph contains 63073 nodes and 94546 links.

In our four-stage model of Moscow we consider

Two demand layers: home-to-work, and home-to-others;
Two agent types: car owners and non-car-owners;
And three travel modes: public transport, pedestrian and car.

6.1 Parallel computing

Calculation of flows f is the most expensive part, since we have to find the shortest paths for all pairs $w \in OD$. We use Dijkstra’s algorithm (Dijkstra 1959) to find the shortest paths, which runs in $O(|{\mathcal {E}}| + |{\mathcal {V}}| \log |{\mathcal {V}}|)$ time; Given the shortest paths tree, flows aggregation have linear performance $O(|{\mathcal {V}}|)$. Hence, the total complexity of flows calculation is $O\bigl (|O| (|{\mathcal {E}}| + |{\mathcal {V}}| \log |{\mathcal {V}}|)\bigr )$. Moreover, flows reconstruction for every source $o \in O$ can be computed in parallel. Table 1 shows the result of running 100 iterations of the Frank–Wolfe method on Moscow road network with the various number of cores involved (processor’s speed is 3092,72 MHz).

Table 1 Effect of CPU parallelism

Full size table

6.2 Frank–Wolfe algorithm’s variations

Each of the considered modifications of the Frank–Wolfe algorithm was run up to 2000 iterations for the traffic assignment task of the classic four-stage model for the Moscow road network. The results are shown in Fig. 2.

6.3 Sinkhorn algorithm’s variations

Experiments were run for the Trip distribution stage with dual function adjustment for gradient methods described in Sect. 5 for the Moscow road network. The results are shown in Fig. 3.

Different formulations of minimized targets for Sinkhorn’s method were considered (for example, formulation (14) or (17)), but conceptual differences were not identified, therefore only (14) formulation is shown as SINKHORN-TAUT-SHIFT. Label AAM-NONPD corresponds (Guminov et al. 2021, Algorithm 3), that can be easily adapted similarly as Algorithm 4 was adapted from (Guminov et al. 2021, Algorithm 1). One should note that utilized Sinkhorn’s variation has comparable to gradient methods convergence rate, hence common approach is suitable for solving Trip Distribution problem.

6.4 Combined model, Beckmann

Here we compare three algorithms for finding a fixed-point of the four-stage Beckmann traffic model, namely Four-stage procedure, Evans algorithm and our dual approach via USTM. The difference with the two-stage model is the addition of mode split and mode cost averaging steps. Mode split step usually cause wobbling between public transport and car modes when using straightforward Four-stage procedure: if the road network is free at the first iteration agents start alternating between these two modes at each iteration. So we applied exponential averaging of modes cost matrices to handle this problem: $T^m_{ij}[k+1] = \frac{1}{2} \left( T^m_{ij}[\text {new}] + T^m_{ij}[k]\right)$.

Figure 4 shows the convergence of the duality gap for all three algorithms considered. It can be seen that the Four-stage procedure does not tend to converge to zero duality gap: after 5-6 iterations (about 70 min) it reaches its lower value of the duality gap, then it starts to fluctuate around this value. In order to increase the accuracy of the approximate solution found by Four-stage procedure, one has to increase the number of inner iterations, which will make each outer iteration slower.

In contrast, Evans method steadily converge to zero duality gap.

Some intuition about the behavior of the methods can be given by the Fig. 5, where two-dimensional projections of $d^m_{ij}$ trajectories are depicted. The projections were made by multidimensional scaling method, which tries to preserve pairwise distances while matching points from high-dimensional space (in our case — correspondence matrices) to points on the plane. As one can see, the trajectories start from the same point, since the calculation of the correspondence matrices and the modal splitting in both methods is the same. After a few iterations the trajectories of the methods are in the same region again, but the Evans method proceeds with small steps, while the Four-stage procedure makes long jumps around the point to which Evans method converges. The trajectory of USTM is similar to the trajectory of the Evans algorithm and is omitted for the sake of readability.

6.5 Combined model, stable dynamics

Here we compare the results obtained for the Beckman and the Stable Dynamics models on the Moscow city transportation model. We use the USTM algorithm to search for the equilibria because other algorithms are not applicable since the link travel times are not functions of the link flows in the Stable Dynamics model.

We used the same Moscow network as in previous experiment, but, since Stable Dynamics model is usually infeasible for peak-hours correspondences, we divided the peak-hour departures $l_i^{ra}$ and arrivals $w_j^r$ by two.

Convergence trajectories for Stable Dynamics model are shown in Fig. 6. We discuss convergence of Beckmann model in more representative case of peak-hour departures and arrivals in Sect. 6.4, therefore convergence trajectories for Beckmann model are omitted in this subsection.

We asses the convergence by monitoring two values: constraints violation and function suboptimality. Since the dual approach allows the flows to exceed the link capacities, the primal variables stay outside of the feasible region, thus the duality gap could be negative, as shown in Fig. 6b. Then duality gap is negative, the objective function value at that approximate solution (of the minimization problem) is less than the optimal function value, but the approximate solution is infeasible.

The comparison of the approximate solutions is given in Fig. 7. It is evident that Beckmann’s model is more likely to exceed the link capacity. Figure 7b shows that the travel time on some links in Stable Dynamics model exceeds the free-flow time by several hundred times. This implies that some zones are connected to the rest of the network only by low-capacity links, leading to huge traffic congestion at equilibrium. This result is likely due to inaccuracies in the input data, but if not, these bottleneck links should be prioritized in the transportation network improvement process.

6.6 Traffic assignment model: Frank–Wolfe vs USTM for Beckmann model

Experiments were conducted for single trip purpose, agent type and travel mode (by car) for the Berlin-Center network split into 865 zones with 12981 nodes and 28376 links (for more details see Transportation Networks for Research Core Team 2023). As it was shown in the article (Kubentayeva and Gasnikov 2021), performance of the USTM method is better than UGD (Nesterov 2015) and other variations of accelerated gradient descent, thus only USTM and conventional Frank–Wolfe methods are considered. Convergence by primal function and duality gap is presented in Fig. 8. It is necessary to emphasize that the bigger $\varepsilon$, the faster USTM converges to $\varepsilon$ accuracy and oscillates. Thereby, it makes sense to use restarting technique for faster convergence — run method with $\varepsilon '$ and then with final desired accuracy $\varepsilon < \varepsilon '$.

References

Abrahamsson T, Lundqvist L (1999) Formulation and estimation of combined network equilibrium models with applications to stockholm. Transp Sci 33(1):80–100
Article Google Scholar
Altschuler J, Weed J, Rigollet P (2017) Near-linear time approxfimation algorithms for optimal transport via Sinkhorn iteration. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in neural information processing systems 30, pp. 1961–1971. Curran Associates, Inc., arXiv:1705.09634. http://papers.nips.cc/paper/6792-near-linear-time-approximation-algorithms-for-optimal-transport-via-sinkhorn-iteration.pdf
Anikin A, Dorn Y, Nesterov Y (2020) Computational methods for the stable dynamic model, pp 280–294 . https://doi.org/10.1007/978-3-030-38603-0_21
Arezki Y, Van Vliet D (1990) A full analytical implementation of the PARTAN/Frank-Wolfe algorithm for equilibrium assignment. Transp Sci 24(1):58–62
Article Google Scholar
Armijo L (1966) Minimization of functions having Lipschitz continuous first partial derivatives. Pac J Math 16(1):1–3
Article Google Scholar
Babazadeh A, Javani B, Gentile G, Florian M (2020) Reduced gradient algorithm for user equilibrium traffic assignment problem. Transp A: Transp Sci 16(3):1111–1135. https://doi.org/10.1080/23249935.2020.1722279
Article Google Scholar
Beckmann MJ, McGuire CB, Winsten CB (1956) Studies in the economics of transportation. Technical report
Boyce D (2002) Is the sequential travel forecasting paradigm counterproductive? J Urban Plan Develop 128(4):169–183
Article Google Scholar
Boyce DE, Chon KS, Lee YJ, Lin K, LeBlanc LJ (1983) Implementation and computational issues for combined models of location, destination, mode, and route choice. Environ Plan A 15(9):1219–1230
Article Google Scholar
Boyce DE, Zhang Y-F, Lupa MR (1994) Introducing feedback into four-step travel forecasting procedure versus equilibrium solution of combined model. Transport Res Record 1443:65
Google Scholar
Brent RP (1971) An algorithm with guaranteed convergence for finding a zero of a function. Comput J 14(4):422–425
Article Google Scholar
Cabannes T, Glista E, Dwarakanath K, Rao X, Veeravalli T, Bayen AM (2019) Sensitivity analysis and relaxation of the static traffic assignment problem with capacity constraints. In: 2019 IEEE 58th conference on decision and control (CDC), pp 2214–2219. https://doi.org/10.1109/CDC40024.2019.9030149
De Cea J, Fernández JE, Dekock V, Soto A (2005) Solving network equilibrium problems on multimodal urban transportation networks with multiple user classes. Transp Rev 25(3):293–317
Article Google Scholar
Chen A, Jayakrishnan R, Tsai W (2002) Faster Frank-Wolfe traffic assignment with new flow update scheme. J Transp Eng ASCE. https://doi.org/10.1061/(ASCE)0733-947X(2002)128:1(31)
Article Google Scholar
Chen X, Liu Z, Zhang K, Wang Z (2020) A parallel computing approach to solve traffic assignment using path-based gradient projection algorithm. Transp Res Part C: Emerg Technol 120:102809. https://doi.org/10.1016/j.trc.2020.102809
Article Google Scholar
Chu Y-L (2018) Implementation of a new network equilibrium model of travel choices. J Traffic Transp Eng (English Edition) 5(2):105–115. https://doi.org/10.1016/j.jtte.2017.05.014
Article Google Scholar
Chudak FA, Dos Santos Eleuterio V, Nesterov Y (2007) Static traffic assignment problem: a comparison between Beckmann (1956) and Nesterov & de Palma (1998) models. In: 7th swiss transport research conference . ETH
Cuturi M (2013) Sinkhorn distances: lightspeed computation of optimal transportation distances
Dijkstra EW et al (1959) A note on two problems in connexion with graphs. Numer Math 1(1):269–271
Article Google Scholar
Dios Ortúzar J, Willumsen LG (2011) Modelling Transport, (4th edn). Wiley
Evans SP (1976) Derivation and analysis of some models for combining trip distribution and assignment. Transp Res 10(1):37–57. https://doi.org/10.1016/0041-1647(76)90100-3
Article Google Scholar
Fan Y, Ding J, Liu H, Wang Y, Long J (2022) Large-scale multimodal transportation network models and algorithms-part i: the combined mode split and traffic assignment problem. Transp Res Part E: Logist Transp Rev 164:102832
Article Google Scholar
Fernández E, De Cea J, Florian M, Cabrera E (1994) Network equilibrium models with combined modes. Transp Sci 28(3):182–192
Article Google Scholar
Florian M, Nguyen S (1978) A combined trip distribution modal split and trip assignment model. Transp Res 12(4):241–246
Article Google Scholar
Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res Logist Q 3(1–2):95–110
Article Google Scholar
Fukushima M (1984) A modified Frank-Wolfe algorithm for solving the traffic assignment problem. Transp Res Part B: Methodol 18(2):169–177
Article Google Scholar
Gao J, Wang Y, Zhou J (2022) A study on two-stage selection model of tourism destination at the scale of urban agglomerations. Arch Transp 63:143–157. https://doi.org/10.5604/01.3001.0016.0020
Article Google Scholar
Gasnikov AV, Gasnikova EV, Nesterov YE, Chernov AV (2016) Efficient numerical methods for entropy-linear programming problems. Comput Math Math Phys 56(4):514–524. https://doi.org/10.1134/S0965542516040084
Article Google Scholar
Gasnikov AV, Nesterov YE (2018) Universal method for stochastic composite optimization problems. Comput Math Math Phys 58(1):48–64
Article Google Scholar
Gasnikov A, Dvurechensky P, Kamzolov D, Nesterov Y, Spokoiny V, Stetsyuk P, Suvorikova A, Chernov A (2015) Universal method with inexact oracle and its applications for searching equillibriums in multistage transport problems. arXiv preprint arXiv:1506.00292
Gasnikov A, Nesterov Y (2016) Universal fast gradient method for stochastic composit optimization problems. arXiv:1604.05275
Guminov S, Dvurechensky P, Tupitsa N, Gasnikov A (2021) On a combination of alternating minimization and Nesterov’s momentum. In: International conference on machine learning, pp 3886–3898. PMLR
Jaggi M (2013) Revisiting frank–wolfe: projection-free sparse convex optimization. In: Proceedings of the 30th international conference on machine learning, pp 427–435
Kubentayeva M, Gasnikov A (2021) Finding equilibria in the traffic assignment problem with primal-dual gradient methods for stable dynamics model and Beckmann model. Mathematics 9(11):1217
Article Google Scholar
LeBlanc LJ, Helgason RV, Boyce DE (1985) Improved efficiency of the Frank-Wolfe algorithm for convex network programs. Transp Sci 19(4):445–462
Article Google Scholar
Liu Z, Chen X, Meng Q, Kim I (2018) Remote park-and-ride network equilibrium model and its applications. Transp Res Part B: Methodol 117:37–62. https://doi.org/10.1016/j.trb.2018.08.004
Article Google Scholar
Nesterov Y (2009) Primal-dual subgradient methods for convex problems. Math Program 120(1):221–259. https://doi.org/10.1007/s10107-007-0149-x
Article Google Scholar
Nesterov Y (2015) Universal gradient methods for convex optimization problems. Math Program 152(1):381–404. https://doi.org/10.1007/s10107-014-0790-0
Article Google Scholar
Nesterov Y, De Palma A (2003) Stationary dynamic solutions in congested transportation networks: summary and perspectives. Netw Spat Econ 3(3):371–395
Article Google Scholar
Nesterov Y (2004) Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, Massachusetts
Oppenheim N et al (1995) Urban travel demand modeling: from individual choices to general equilibrium. Wiley, New York
Google Scholar
Patriksson M (2015) The traffic assignment problem: models and methods. Courier Dover Publications
Pedregosa F, Negiar G, Askari A, Jaggi M (2020) Linearly convergent Frank–Wolfe with backtracking line-search. In: international conference on artificial intelligence and statistics. Proceedings of machine learning research
Sheffi Y (1985) Urban transportation networks, vol 6, Prentice-Hall, Englewood Cliffs, NJ
Sinkhorn R (1974) Diagonal equivalence to matrices with prescribed row and column sums II. Proc Am Math Soc 45:195–198. https://doi.org/10.2307/2040061
Article Google Scholar
Smith M, Huang W, Viti F, Tampère CMJ, Lo HK (2019) Quasi-dynamic traffic assignment with spatial queueing, control and blocking back. Transp Res Part B: Methodol 122:140–166. https://doi.org/10.1016/j.trb.2019.01.018
Article Google Scholar
Stonyakin FS, Dvinskikh D, Dvurechensky P, Kroshnin A, Kuznetsova O, Agafonov A, Gasnikov A, Tyurin A, Uribe CA, Pasechnyuk D, Artamonov S (2019) Gradient methods for problems with inexact model of the objective. In: Khachay M, Kochetov Y, Pardalos P (eds) Math Optim Theor Oper Res. Springer, Cham, pp 97–114
Chapter Google Scholar
Transportation Networks for Research Core Team (2023) Transportation networks for research. https://github.com/bstabler/TransportationNetworks. Accessed: 2023-04-30
US Bureau of Public Roads (1964) Traffic assignment manual. Department of Commerce, urban planning division, Washington D.C
Wang X, Shahidehpour M, Jiang C, Li Z (2019) Coordinated planning strategy for electric vehicle charging stations and coupled traffic-electric networks. IEEE Transact Power Syst 34(1):268–279. https://doi.org/10.1109/TPWRS.2018.2867176
Article Google Scholar
Wang Z, Zhang K, Chen X, Wang M, Liu R, Liu Z (2022) An improved parallel block coordinate descent method for the distributed computing of traffic assignment problem. Transp A: Transp Sci 18(3):1376–1400. https://doi.org/10.1080/23249935.2021.1942303
Article Google Scholar
Wilson AG (1969) The use of entropy maximising models, in the theory of trip distribution, mode split and route split. J Transp Econ Polic 108–126
Xie J, Nie Y, Liu X (2017) A greedy path-based algorithm for traffic assignment. Transp Res Record. https://doi.org/10.13140/RG.2.2.18661.09447
Article Google Scholar
Yang C, Chen A, Xu X (2013) Improved partial linearization algorithm for solving the combined travel-destination-mode-route choice problem. J Urban Plan Develop 139(1):22–32
Article Google Scholar
Zarrinmehr A, Aashtiani H, Nie Y, Azizian H (2019) Complementarity formulation and solution algorithm for auto-transit assignment problem. Transp Res Record J Transp Res Board 2673:036119811983795. https://doi.org/10.1177/0361198119837956
Article Google Scholar
Zhu J-X, Luo Q-Y, Guan X-Y, Yang J-L, Bing X (2020) A traffic assignment approach for multi-modal transportation networks considering capacity constraints and route correlations. IEEE Access 8:158862–158874
Article Google Scholar
Zokaei Aashtiani H, Poorzahedy H, Nourinejad M (2021) Wardrop’s first principle: extension for capacitated networks. Sci Iran 28(1):175–191
Google Scholar

Download references

Acknowledgements

This work was supported by a grant for research centers in the field of artificial intelligence, provided by the Analytical Center for the Government of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000D730321P5Q0002) and the agreement with the Moscow Institute of Physics and Technology dated November 1, 2021 No. 70-2021-00138.

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology, Dolgoprudny, Russia
Meruza Kubentayeva, Demyan Yarmoshik, Mikhail Persiianov, Ekaterina Kotliarova, Nazarii Tupitsa, Dmitry Pasechnyuk, Alexander Gasnikov & Vladimir Shvetsov
Institute for Information Transmission Problems RAS, Moscow, Russia
Mikhail Persiianov, Alexey Kroshnin & Alexander Gasnikov
Higher School of Economics, Moscow, Russia
Alexey Kroshnin & Alexander Gasnikov
Russian University of Transport, Moscow, Russia
Vladimir Shvetsov, Leonid Baryshev & Alexey Shurupov

Authors

Meruza Kubentayeva
View author publications
You can also search for this author in PubMed Google Scholar
Demyan Yarmoshik
View author publications
You can also search for this author in PubMed Google Scholar
Mikhail Persiianov
View author publications
You can also search for this author in PubMed Google Scholar
Alexey Kroshnin
View author publications
You can also search for this author in PubMed Google Scholar
Ekaterina Kotliarova
View author publications
You can also search for this author in PubMed Google Scholar
Nazarii Tupitsa
View author publications
You can also search for this author in PubMed Google Scholar
Dmitry Pasechnyuk
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Gasnikov
View author publications
You can also search for this author in PubMed Google Scholar
Vladimir Shvetsov
View author publications
You can also search for this author in PubMed Google Scholar
Leonid Baryshev
View author publications
You can also search for this author in PubMed Google Scholar
Alexey Shurupov
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

AG conceptualized the research topic. In Sect. 2, AG, VS, MK, DY, and AK made a review of the existing results and formulated optimization problems of the proposed NE model. AK, DY, and MK provided in Sects. 3 and 5.4 convergence analysis for the USTM method for solving the considered NE problem with SD and Beckmann models in the traffic assignment stage. DY, MP, and MK implemented combined models and conducted the experiments of Sects. 6.1, 6.4, and 6.5. DP made a review of numerical methods in Sect. 4 for the traffic assignment problem and conducted the experiments of Sects. 6.2. and 6.6. NT and EK made a review of numerical methods for the trip distribution problem in Sects. 5.1$-$5.3 and conducted the experiments of the Sect. 6.3 VS, LB, and AS aggregated travel demand and transportation network data of Moscow, designed the network structure, participated in designing the experiments for the Moscow network, and provided computing resources.

Corresponding author

Correspondence to Meruza Kubentayeva.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Kubentayeva, M., Yarmoshik, D., Persiianov, M. et al. Primal-dual gradient methods for searching network equilibria in combined models with nested choice structure and capacity constraints. Comput Manag Sci 21, 15 (2024). https://doi.org/10.1007/s10287-023-00494-8

Download citation

Received: 01 July 2023
Accepted: 28 November 2023
Published: 04 January 2024
DOI: https://doi.org/10.1007/s10287-023-00494-8

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Primal-dual gradient methods for searching network equilibria in combined models with nested choice structure and capacity constraints

Abstract

Similar content being viewed by others

Dual Methods for Finding Equilibriums in Mixed Models of Flow Distribution in Large Transportation Networks

Hybrid Methods for Network Equilibrium Problems

Equilibrium in Multi-criteria Transportation Networks

1 Introduction

2 Problem statement

2.1 Route assignment models

2.2 Trip distribution with modal split (D-MS)

2.3 Combined distribution-modal split-assignment problem (D-MS-A)

3 Dual approach for solving the combined model

3.1 Dual method for NE problem

3.2 Convergence analysis

Lemma 1

Lemma 2

Proof

Lemma 3

Proof

Theorem 1

Proof

4 Frank–Wolfe variations and USTM in traffic assignment

4.1 Frank–Wolfe variations

4.2 Primal-dual universal similar triangles method

5 Sinkhorn’s variations for trip distribution

5.1 Sinkhorn’s algorithm

5.2 Accelerated Sinkhorn’s algorithm

5.3 MIXED AGM

5.4 Reconstruction of correspondence matrix

6 Numerical experiments

6.1 Parallel computing

6.2 Frank–Wolfe algorithm’s variations

6.3 Sinkhorn algorithm’s variations

6.4 Combined model, Beckmann

6.5 Combined model, stable dynamics

6.6 Traffic assignment model: Frank–Wolfe vs USTM for Beckmann model

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation