1 Introduction

1.1 Road traffic modeling

With the concentration of populations in cities, where transport flows are constantly increasing, road traffic modeling has become a central issue in urban planning, whether it is to configure traffic lights, to adapt the public transport offer, or to manage situations of high congestion in an optimal way. Knowing how a road network behaves is also a key issue for crisis management in an urban environment: traffic, and in particular traffic jams, must be predicted quickly in order to organize rescue operations. Before creating decision-support tools for crisis management involving road traffic, a first step is to develop a method for monitoring road traffic.

In this work, we focus on the evacuation of a road axis in finite time. It can be, for example, an axis that we wish to free in order to provide access to first aid. We formulate this question as an optimal control problem on a graph in which each edge corresponds to a road. The main objective of this work is to design a prototype algorithm answering this question.

1.2 State of the art

Several types of approaches co-exist to model this phenomenon, depending on the scale and the desired accuracy. At the microscopic scale, the trajectories (speed and position) of each vehicle can be described through Lagrangian models such as car-following models [22, 44, 48] or cellular automaton models [7, 27, 37]. Kinetic traffic models were also introduced in the 60’s [14, 16, 18, 36, 38, 40, 46]; they describe vehicles on a mesoscopic scale in the form of distributions in position-velocity space.

Finally, on a macroscopic scale, the vehicle flow is considered as a continuous medium. This is the case of the famous model introduced by Lighthill, Whitham and Richards (denoted LWR in what follows) [33, 42]. We will use it in the rest of this article, since the continuous description of the flow is particularly well adapted to the writing of a dynamic programming problem. Note that this first-order model, introduced in the 50’s, was later extended to second-order Euler-type systems with viscosity [3, 39, 47, 50].

In this paper, we introduce and analyze an optimal control problem modeling an evacuation scenario for an axis belonging to a road network. A closely related control model has already been introduced in [24], but under restrictive assumptions on the flow regime designed to avoid congestion phenomena, a restriction that we want to avoid here. It was in particular highlighted there that the flow may not be invertible with respect to the control parameters, meaning that the density at the nodes may not be reconstructed from boundary conditions. A related model, also applicable to crisis management, is discussed in [35], where the objective function is the mean spatial velocity at final time computed on a specific path, e.g., between an accident location and a hospital. The control variables are the traffic distribution parameters at junctions. The exact solution is computed locally for each junction and a sub-optimal global solution is retrieved by applying the local optimal solution at each junction. In [23], a method to manage variable speed limits combined with coordinated ramp metering within the framework of the Lighthill-Whitham-Richards (LWR) network model has been introduced; the authors consider a “first-discretize-then-optimize” numerical approach to solve it numerically. Note that the optimal control problem we deal with in the following has some similarities with the one proposed in that article, but differs in several respects: our control variables aim to control not just ramp metering, but an entire axis. Furthermore, our approach to solving the underlying optimal control problem is quite different, in particular through our use of optimality conditions leading to an efficient solution algorithm.

In [41], a similar control problem is discussed; it is addressed using a piecewise-linear flux framework leading to a much simpler adjoint system involving Jacobian matrices that are piecewise constant in time and space. This approach enables faster gradient computation, allowing the authors to tackle real-world simulations related to ramp-metering configuration. In our study, we opted for the model introduced in [13] and dealt with the nonlinear flux using automatic differentiation techniques to compute the gradient. This kind of approach has also been used in [6] for air traffic flow with a modified LWR-based network model.

We also mention several contributions aiming at dealing with concrete or real-time cases. The paper [19] deals with this kind of control problem for simplified nonlinear (essentially congestion-free) and linear formulations of the model on large-scale networks. In [28], an algorithm for real-time traffic state estimation and short-term prediction is introduced.

Several other closely related control problems have been investigated in the past: in [25], the authors use switching controls to deal with traffic lights at an 8 \(\times \) 4 junction in a piecewise linear flow framework. The paper [2] deals with an optimal control problem of the same kind, formulated from the continuous LWR model and solved numerically using a heuristic random parameter search. Other works have looked at traffic regulation by comparing and discussing the use of traffic lights and traffic circles [12]. We refer to [1, 5] for a broad view on this topic.

Regarding controlled microscopic models, let us mention [34], where an approach based on model predictive control (MPC) is developed, and [4], which uses a reinforcement learning algorithm. In the book [45], the last chapter is devoted to the modeling of various optimization problems related to road traffic.

Concerning the control of mesoscopic and macroscopic models, there are software contributions seeking in particular to provide real-time answers and guidance tools in case of accident. Let us also mention [26, 31], which deal in part with traffic prediction and control problems.

This article uses a controlled model close to the one studied in [24], which is devoted to the control of the macroscopic LWR model on a directed graph using the framework introduced in [13]. However, we are interested in taking into account some properties of the model that were not addressed in [24], such as congestion phenomena.

Hence, we will not focus on individual paths since we are interested in the overall dynamics with a focus on congestion. Particular attention is paid to the modeling of junctions as they introduce a nonlinear coupling between roads.

1.3 Main objectives

Our goal is to design a numerical method to control the flow of vehicles in all fluid regimes, including saturated/congested regimes, using dynamical barriers on each road of the graph.

We thus propose a road traffic model in the form of a graph controlled at each junction, modeling an urban area structured by roads. For practical reasons, even though we wish to take into account possible congestion/shocks in the system, we place ourselves in a differentiable framework by adding weak diffusion terms to the system through a semi-discretization in space. The resulting problem consists in minimizing the vehicle density at the final time on a given route under ODE constraints obtained by a finite volume semi-discretization of the LWR system. We then enrich this problem with a constraint on the number of simultaneously active roadblocks in order to take staffing issues into account. We use this analysis to build a new algorithm adapted to this problem, mixing a classical gradient approach with a fixed-point method that allows non-local perturbations. One of the advantages of this approach is that the constructed sequence can escape from a minimum well and survey a larger part of the admissible set than a standard gradient-type algorithm.

1.4 Plan of this article

While the macroscopic model describing the flow is very standard (Lighthill-Whitham-Richards, 1955), the problem of flow distribution at junctions, inspired by more recent works [20, 21], introduces a nonlinear coupling between the different edges. The distribution of vehicles at junctions is modeled by an optimal allocation process that depends on the maximum possible flows at the junctions, through a linear programming (LP) problem maximizing the incoming flows. In order to influence the traffic flow, we introduce control functions that are defined at each road entrance and act as barriers by weighting the capacity of a road leaving a junction to admit new vehicles.

Section 2 is devoted to the description of the traffic flow model we consider and to the optimal control problem modeling the best evacuation scenario for a road.

After investigating the existence of an optimal control in Sect. 3.1, we derive necessary optimality conditions that we reformulate in an exploitable way in Sect. 3.2. We then introduce an optimization algorithm in Sect. 4, based on a hybrid combination of a variable-step gradient method and a well-chosen fixed-point method translating the optimality conditions and allowing global perturbations. Finally, in Sect. 5, we illustrate the efficiency of the method on various graphs modeling, in particular, a traffic circle or a main road surrounded by satellite roads.

Our contribution is also embodied in an open source computational code able to process complex graphs, although of rather small dimensions. This code, written in the Julia language, is available and usable at the following link:

https://github.com/mickaelbestard/TRoN.jl

Notations. Throughout this article, we will use the following notations:

  • For \(x\in \mathbb {R}^n\), \(x_+\) will denote the positive part of x, namely \(x_+=\max \{x,0_{\mathbb {R}^n}\}\), the max being understood component by component;

  • \({\text {BV}}(0,T)\) denotes the space of all \(L^1(0,T)\) functions of bounded variation on (0, T). For any \(u \in L^1(0,T)\), the total variation of u is defined by

    $$\begin{aligned} {\text {TV}}(u)= \sup \left\{ \int \limits _0^T u(t) \phi ' ( t ) \, dt, \ \phi \in C_c^1 ([0,T]), \ \Vert \phi \Vert _{L^\infty ( 0,T )} \leqslant 1 \right\} ; \end{aligned}$$
  • \(W^{1,\infty }(0,T)\) denotes the space of all functions f in \(L^\infty (0,T)\) whose gradient in the sense of distributions also belongs to \(L^\infty (0,T)\);

  • \(\Vert \cdot \Vert _{\mathbb {R}^d}\) (resp. \(\langle \cdot ,\cdot \rangle _{\mathbb {R}^d}\)) stands for the standard Euclidean norm (resp. inner product) in \(\mathbb {R}^d\);

  • Let \(F:\mathbb {R}^p\times \mathbb {R}^q\rightarrow \mathbb {R}^m\) and \((x_0,y_0)\in \mathbb {R}^p\times \mathbb {R}^q\). We will denote by \(\partial _x F(x_0,y_0)\), the Jacobian matrix of the mapping \(x\mapsto F(x,y_0)\). When no ambiguity is possible, we will simply denote it by \(\partial _x F\);

  • \(N_r\): number of roads in the model;

  • \(N_c\): number of mesh cells per road for the first-order Finite Volume scheme;

  • \(N_T\): number of time discretization steps in the numerical schemes;

2 A controlled model of traffic flow

In this section, we first present the traffic flow model on a road network and its semi-discretization. Then we introduce the precise control model: the control at junctions models regulation actions implemented using traffic signs or traffic lights.

2.1 Traffic dynamics on a network without control

The road network is a directed graph of \(N_r\) roads, with \(N_r\in \mathbb {N}^*\), where each edge corresponds to a road and each vertex to a junction. The roads are all represented as real intervals \([a_i,b_i]\) for \(i\in \llbracket 1,N_r\rrbracket \). Of course, this representation alone does not determine the topology of the graph: it is necessary to define, for each junction, the indices of the incoming and outgoing roads in order to characterize the directed graph completely.

We are interested in the time evolution of the densities on each road. On the i-th road, the evolution of the density \(\rho _i(t,x)\) is provided by the so-called LWR model

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \rho _i(t,x) + \partial _x f_i(\rho _i(t,x)) = 0 , &{} (t,x) \in (0,T)\times [a_i, b_i], \\ \rho _i(0,x) = \rho _i^0(x), &{} x\in [a_i, b_i], \\ f_i(\rho _i(t,a_i)) = \gamma ^L_i(t), &{} t\in (0,T), \\ f_i(\rho _i(t,b_i)) = \gamma ^R_i(t), &{} t\in (0,T), \end{array}\right. } \end{aligned}$$
(LWR)

where the flux \(f_i(\rho )\) is given by

$$\begin{aligned} f_i(\rho ) = \rho \,v_i^{\max }\left( 1-\frac{\rho }{\rho _i^{\max }}\right) , \end{aligned}$$
(1)

where \(v_i^{\max }\) and \(\rho ^{\max }_i\) denote, respectively, the maximal velocity and density allowed on the i-th road. The initial density is denoted \(\rho _i^0(x)\), and \(\gamma _i^L\) and \(\gamma _i^R\) are functions prescribing the flux at the left and right boundaries of the interval.
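To fix ideas, the flux (1) and its derivative can be written in a few lines of Julia; the function names below are illustrative and are not taken from the TRoN.jl code.

```julia
# Concave LWR flux (1) and its derivative; vmax and rhomax stand for the
# per-road parameters v_i^max and rho_i^max (illustrative names).
flux(rho; vmax = 1.0, rhomax = 1.0)  = rho * vmax * (1 - rho / rhomax)
dflux(rho; vmax = 1.0, rhomax = 1.0) = vmax * (1 - 2rho / rhomax)

flux(0.5)   # maximal flux vmax * rhomax / 4 = 0.25, reached at rho = rhomax / 2
```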

The fluxes \({\varvec{\gamma }}\) at junctions are determined using the model introduced in [13]. Considering a junction J, the sets of indices corresponding to the incoming and outgoing roads are denoted \(J_{\text {in}}\) and \(J_{\text {out}}\). We assume that there is a statistical traffic distribution matrix \(A_J\) defined by

$$\begin{aligned} A_J {:}{=} \left( \alpha _{ji}\right) _{(j,i) \in J_{\text {out}}\times J_{\text {in}}},\quad 0< \alpha _{ji} < 1,\quad \sum _{j \in J_{\text {out}}}\alpha _{ji}=1, \end{aligned}$$

with \(\alpha _{ji}\) the proportion of vehicles going to the j-th outgoing road among those coming from the i-th incoming road. Outgoing fluxes have to satisfy the relation

$$\begin{aligned} (\gamma _j^L)_{j \in J_{\text {out}}} = A_J (\gamma _i^R)_{i \in J_{\text {in}}}, \end{aligned}$$
(2)

and the balance between incoming and outgoing fluxes holds thanks to the column-stochasticity of \(A_J\):

$$\begin{aligned} \sum _{i \in J_{\text {in}}} \gamma _i^R = \sum _{j \in J_{\text {out}}} \gamma _j^L. \end{aligned}$$

Let us note that we have natural constraints on the fluxes. Indeed the LWR flux function \(f_i\) reaches its maximum at \(\rho _i^{\max }/2\). Therefore, incoming fluxes \(\gamma _i^R\) have to be smaller than the following upper bound:

$$\begin{aligned} {\gamma _i^{R,\max } = f_i(\min \{\rho _i(t,b_i^-),\rho _i^{\max }/2\})}, \end{aligned}$$

which takes into account reduced demand when the density is lower than \(\rho _i^{\max }/2\) in the incoming road. Similarly outgoing fluxes \(\gamma _j^L\) have to be smaller than the upper-bound:

$$\begin{aligned} {\gamma _j^{L,\max } = f_j(\max \{\rho _j(t,a_j^+),\rho _j^{\max }/2\})}, \end{aligned}$$

which takes into account reduced capacities when the density is larger than \(\rho _j^{\max }/2\) in the outgoing road. We refer to [13] for details. It is notable that, according to J.P. Lebacque, \(\gamma _i^{R,\max }\) and \(\gamma _j^{L,\max }\) can, respectively, be seen as the upstream demand and the downstream supply functions [29]. Consequently, the fluxes belong to the following set:

$$\begin{aligned} \Omega _J = \left\{ \left( (\gamma _i^R)_{i \in J_{\text {in}}},(\gamma _j^L)_{j \in J_{\text {out}}}\right) \in \prod _{i \in J_{\text {in}}} \left[ 0,\gamma _i^{R,\max }\right] \prod _{j \in J_{\text {out}}} \left[ 0,\gamma _j^{L,\max }\right] \quad \text {with} \quad (\gamma _j^L)_{j \in J_{\text {out}}} = A_J (\gamma _i^R)_{i \in J_{\text {in}}} \right\} \end{aligned}$$

Then, we assume that drivers succeed in maximizing the total flow. Consequently, the fluxes are solutions to the following linear programming (LP) problem:

$$\begin{aligned} \max _{ \left( (\gamma _i^R)_{i \in J_{\text {in}}},(\gamma _j^L)_{j \in J_{\text {out}}}\right) \in \Omega _J} \ \sum _{i \in J_{\text {in}}} \gamma _i^R, \end{aligned}$$
(3)

This definition is valid at any junction of the network. Note that this problem does not necessarily have a unique solution when there are more incoming roads than outgoing roads; in that case, a priority modeling assumption is added to select one solution. We refer to [8] for more details. In the following, we will only consider networks with at most two incoming roads and two outgoing roads, i.e., either \(1\times 1\), \(1\times 2\), \(2\times 1\) or \(2\times 2\) junctions. This ensures existence and stability for the LWR Cauchy problem [13] and has the advantage that the linear programming problem has a unique solution that can be provided explicitly [43], while still allowing a wide variety of road networks to be modeled. We formally write:

$$\begin{aligned} {\varvec{\gamma }}(t) = \phi ^{LP}({\varvec{\rho }}(t)), \end{aligned}$$

for the simultaneous resolution of the linear programming problems at all the junctions of the network. Note that the dependency on \({\varvec{\rho }}(t)\) results from the upper bounds involved in the definition of the sets \(\Omega _J\). The explicit expression of \(\phi ^{LP}\) for the four types of junctions is detailed in “Appendix A”.
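As an illustration, the demand and supply bounds above, together with the explicit solution of (3) for a \(1\times 1\) junction, can be sketched as follows in Julia, reusing the flux helper introduced earlier; this is only a sketch, the general cases being handled in “Appendix A”.

```julia
# Upstream demand and downstream supply of Sect. 2.1 (traces taken at the junction).
demand(rho; rhomax = 1.0) = flux(min(rho, rhomax / 2); rhomax = rhomax)
supply(rho; rhomax = 1.0) = flux(max(rho, rhomax / 2); rhomax = rhomax)

# For a 1x1 junction, the LP (3) reduces to the classical formula
# gamma = min(demand, supply); the 1x2, 2x1 and 2x2 junctions also admit
# explicit, but longer, expressions.
junction_1x1(rho_in, rho_out) = min(demand(rho_in), supply(rho_out))
```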

2.2 Control at junctions

We introduce the vector of controls \(\textbf{u}= \left( u_i(\cdot )\right) _{1\leqslant i\leqslant N_r}\in L^\infty ([0,T],\mathbb {R}^{N_r})\) at every road entrance. Note that we implicitly assume that every road junction is controlled, but our model makes it easy to neutralize some controls in order to model the fact that only some junctions are controlled.

We interpret each control \(u_j\) as a rate: at each time \(t\in (0,T)\), the maximum flow out of a junction and coming into road j is weighted by the coefficient \(1-u_j(t)\), with \(u_j(t)\in [0,1]\). At junction J, we thus define the following polytope of constraints

$$\begin{aligned} \Omega _J({\textbf{u}})&= \Big \{ \left( (\gamma _i^R)_{i \in J_{\text {in}}},(\gamma _j^L)_{j \in J_{\text {out}}}\right) \in \prod _{i \in J_{\text {in}}} \left[ 0,\gamma _i^{R,\max }\right] \prod _{j \in J_{\text {out}}} \left[ 0,(1-u_j)\gamma _j^{L,\max }\right] \nonumber \\&\quad \text {with}\quad \ (\gamma _j^L)_{j \in J_{\text {out}}} = A_J (\gamma _i^R)_{i \in J_{\text {in}}} \Big \} \end{aligned}$$
(4)

Thus the fluxes at junctions are determined by solving the LP problem (3) with this new set of constraints.

In the case where the junction has more than one outgoing road, this initial model is not completely satisfactory, since fully controlling one outgoing road would force all the incoming fluxes, and hence all the outgoing fluxes, to zero: indeed, relation (2) implies that the outgoing fluxes are linear combinations of the incoming fluxes with positive weights. Consequently, as soon as one control is fully activated, all the incoming fluxes vanish and then all the outgoing fluxes vanish, too. This is definitely not the desired behavior, as we would expect the traffic flow to be deviated to the uncontrolled outgoing roads. To solve this problem, we choose to make the traffic distribution matrix \(A_J\) also depend on the control \(\textbf{u}\). More precisely, we require that, as soon as a road entry is fully controlled while the others are not controlled, the proportion of vehicles entering that road is set to 0. To illustrate this point, let us describe the control of a \(1\times 2\) junction, i.e., with one incoming and two outgoing roads. The traffic distribution matrix A then writes:

$$\begin{aligned} A({\textbf{u}}) = \begin{pmatrix} \alpha ({\textbf{u}}) \\ 1-\alpha ({\textbf{u}}) \end{pmatrix}, \end{aligned}$$

where \(\alpha ({\textbf{u}})\) denotes the proportion of vehicles going in the first outgoing road. Then, denoting \({\textbf{u}}= (u_1, u_2)\) the controls of the two outgoing roads, the function \(\alpha (u_1, u_2)\) has to be chosen such that:

  (a)

    if the first outgoing road is fully controlled and the second is not controlled, then \(\alpha ({\textbf{u}})\) has to vanish, resulting in the condition: \(\alpha (1, 0) = 0\), and then the entire vehicle flow goes in the second outgoing road.

  (b)

    inversely, if the second outgoing road is fully controlled and the first one is not controlled, then \(1-\alpha ({\textbf{u}})\) has to vanish, resulting in the condition: \(1-\alpha (0, 1) = 0\), and then the entire vehicle flow goes in the first outgoing road.

We also ask for the following additional property:

  (c)

    the parameters are not modified if the outgoing roads are equally controlled, which results in the condition:

    $$\begin{aligned} \forall u \in [0,1],\quad \alpha (u, u) = {\bar{\alpha }}, \end{aligned}$$

    where \({\bar{\alpha }}\) is a given distribution coefficient.

In “Appendix A”, we provide an explicit construction of such a function; a simple candidate is also sketched below. Note that, in practice, to avoid the degeneracy of the junction problems, the distribution coefficient is set to a very small value \(\varepsilon ^2\) (resp. \(1-\varepsilon ^2\)) in situation (a) (resp. (b)). We also treat the case of \(2\times 2\) junctions. Note that junctions with only one outgoing road (\(1\times 1\) and \(2\times 1\)) do not require such a modification.
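For instance, a convex reweighting of the uncontrolled coefficient \({\bar{\alpha }}\) gives one possible choice, written here in Julia; this is only an illustrative candidate satisfying (a)-(c) up to the safeguard \(\varepsilon ^2\), not necessarily the construction of “Appendix A”.

```julia
# Illustrative distribution coefficient alpha(u1, u2) for a 1x2 junction:
# alphabar is the uncontrolled coefficient, eps2 the small safeguard value.
function alpha_controlled(u1, u2; alphabar = 0.5, eps2 = 1e-4)
    w1 = alphabar * (1 - u1)          # weight of the first outgoing road
    w2 = (1 - alphabar) * (1 - u2)    # weight of the second outgoing road
    a  = w1 + w2 > 0 ? w1 / (w1 + w2) : alphabar
    return clamp(a, eps2, 1 - eps2)   # avoid degenerate junction problems
end

alpha_controlled(1.0, 0.0)   # ≈ eps2,       case (a)
alpha_controlled(0.0, 1.0)   # ≈ 1 - eps2,   case (b)
alpha_controlled(0.3, 0.3)   # = alphabar,   case (c)
```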

As in the previous section, the simultaneous resolution of the linear programming problems at each junction is now denoted:

$$\begin{aligned} {\varvec{\gamma }}= \phi ^{LP}({\varvec{\rho }},{\textbf{u}}) \end{aligned}$$

The explicit expressions of \(\phi ^{LP}\) in the cases we deal with are provided in “Appendix A”. Beyond the explicit expression of \(\phi ^{LP}({\varvec{\rho }},{\textbf{u}})\), what matters is that \(\phi ^{LP}\) is a Lipschitz function of \(({\varvec{\rho }},{\textbf{u}})\). In the following, in order to use tools of differentiable optimization, we will consider a \(C^1\) approximation of this function. This issue is commented on at the end of “Appendix A”. According to these comments, we will assume from now on:

(H\(_{\phi ^{LP}}\)) The mapping \(({\varvec{\rho }},{\textbf{u}})\mapsto \phi ^{LP}({\varvec{\rho }},{\textbf{u}})\) is Lipschitz continuous and of class \(C^1\).

In this paper, we do not investigate the well-posedness of this continuous controlled dynamics but rather focus on a semi-discretized version.

2.3 Semi-discretized model

For algorithmic efficiency reasons, we prefer to be able to define the sensitivity of the different data of the problem with respect to the control. Since the model (LWR) is known to generate possible irregularities in the form of shocks, we have decided to introduce regularity through a semi-discretization of the model in space. The relevance of this choice will be discussed in the concluding section.

Hence, we discretize the model (LWR) with a first-order Finite Volume (FV) scheme. To simplify the notations, we consider the same number \(N_c\in \mathbb {N}^*\) of mesh cells per road. The cell interfaces on the i-th road are denoted \(a_i = x_{i,1/2}< x_{i,3/2}< \ldots < x_{i,N_c+1/2} = b_i\) and the space steps \(\Delta x_{i,j} = x_{i,j+1/2} - x_{i,j-1/2}\). Then the discrete densities and the boundary fluxes are denoted:

$$\begin{aligned} {\varvec{\rho }}(t)&= \big (\rho _{i,j}(t)\big )_{\begin{array}{c} 1\leqslant i\leqslant N_r\\ 1\leqslant j\leqslant N_c \end{array}} = \left( \frac{1}{\Delta x_{i,j}}\int \limits _{x_{i,j-\frac{1}{2}}}^{x_{i,j+\frac{1}{2}}}\rho (t,x)\,\textrm{d}x \right) _{\begin{array}{c} 1\leqslant i\leqslant N_r\\ 1\leqslant j\leqslant N_c \end{array}},\\ {\varvec{\gamma }}(t)&= \left( \gamma _1^L(t), \gamma _1^R(t),\ldots , \gamma _{N_r}^L(t), \gamma _{N_r}^R(t)\right) . \end{aligned}$$

With a slight abuse of notation, we have written the discrete variable similarly as the continuous one, since we will essentially work on the semi-discretized model. Then the semi-discretized dynamics are given by the differential equation

$$\begin{aligned} {\left\{ \begin{array}{ll} \dfrac{\textrm{d}{\varvec{\rho }}}{\textrm{d}t}\big (t\big ) = f^{FV}\left( {\varvec{\rho }}(t),{\varvec{\gamma }}(t)\right) , &{} t\in (0,T) \\ {\varvec{\gamma }}(t) = \phi ^{LP}\left( {\varvec{\rho }}(t), \textbf{u}(t)\right) , &{} t\in (0,T) \\ {\varvec{\rho }}(0) = {\varvec{\rho }}_0, &{} \end{array}\right. } \end{aligned}$$
(LWR-sd)

where the finite volume flow at the j-th mesh cell of the i-th road reads

$$\begin{aligned} f^{FV}({\varvec{\rho }},{\varvec{\gamma }})_{i,j} = {\left\{ \begin{array}{ll} \frac{-1}{\Delta x_{i,j}}\left( {\mathcal {F}}_i(\rho _{i,j},\rho _{i,j+1}) - \gamma _i^L\right) ,&{} \text {if } j=1, \\ \frac{-1}{\Delta x_{i,j}}\left( {\mathcal {F}}_i(\rho _{i,j},\rho _{i,j+1}) - {\mathcal {F}}_i(\rho _{i,j-1},\rho _{i,j})\right) ,&{} \text {if } 2\leqslant j\leqslant N_c-1, \\ \frac{-1}{\Delta x_{i,j}}\left( \gamma _i^R - {\mathcal {F}}_i(\rho _{i,j-1},\rho _{i,j})\right) ,&{} \text {if } j=N_c, \\ \end{array}\right. } \end{aligned}$$
(5)

with \({\mathcal {F}}_i(u,v)\) the so-called local Lax-Friedrichs numerical flux given by

$$\begin{aligned} {\mathcal {F}}_i(u,v) = \frac{f_i(u) + f_i(v)}{2} - \max {\left\{ |f_i'(u)|, |f_i'(v)|\right\} }\frac{(v - u)}{2}. \end{aligned}$$

We refer to [32] for an introduction to finite volume approximations.
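A minimal Julia sketch of the local Lax-Friedrichs flux and of the right-hand side (5) on a single road reads as follows; uniform cells of size dx are assumed, the flux/dflux helpers of Sect. 2.1 are reused, and the names are ours rather than those of TRoN.jl.

```julia
# Local Lax-Friedrichs numerical flux F_i(u, v).
llf(u, v) = (flux(u) + flux(v)) / 2 - max(abs(dflux(u)), abs(dflux(v))) * (v - u) / 2

# Semi-discrete right-hand side (5) on one road with N_c cells; gammaL and
# gammaR are the junction fluxes returned by phi^LP for that road.
function fv_rhs(rho::Vector{Float64}, gammaL, gammaR, dx)
    Nc   = length(rho)
    drho = similar(rho)
    F    = [llf(rho[j], rho[j+1]) for j in 1:Nc-1]   # interior interfaces
    drho[1] = -(F[1] - gammaL) / dx
    for j in 2:Nc-1
        drho[j] = -(F[j] - F[j-1]) / dx
    end
    drho[Nc] = -(gammaR - F[Nc-1]) / dx
    return drho
end
```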

2.4 Conclusion: an optimal control problem

We are interested in an approximate controllability problem which consists in emptying a given route as much as possible for a fixed final time \(T>0\). Let us introduce \(\chi _{\text {path}}\subset \llbracket 1,N_r\rrbracket \), a set of indices corresponding to the route we wish to empty in a time T.

We would like to minimize a functional with respect to the control \({\textbf{u}}\), representing the sum of all the densities on this path, in other words

$$\begin{aligned} C_T({\textbf{u}})&= \sum _{i\in \chi _{\text {path}}} \sum _{j=1}^{N_c}\rho _{i,j}(T;{\textbf{u}}) = {\textbf{c}}\cdot {\varvec{\rho }}(T;{\textbf{u}}), \end{aligned}$$

where \({\varvec{\rho }}(t;{\textbf{u}})\) denotes the solution to the controlled system (LWR-sd) and \({\textbf{c}}=(c_i)_{1\leqslant i\leqslant N_r}\) the vector defined by

$$\begin{aligned} c_i=1 \text { if }i\in \chi _{\text {path}}\quad \text {and}\quad c_i=0\text { else}. \end{aligned}$$

Of course, it is necessary to introduce a certain number of constraints on the sought controls, in accordance with the model under consideration, and to ensure that the obtained control is feasible in practice. We are thus driven to consider the following constraints:

  (i)

    \(0\leqslant u_i(\cdot ) \leqslant 1\), meaning that at each time, the control is a vehicle acceptance rate on a road;

  (ii)

    \(\sum _{i=1}^{N_r} u_i(\cdot ) \leqslant N_{\max }\): this imposes a constraint on the total amount of control in order to take into account the staff required for roadblocking. In particular, there cannot be more than \(N_{\max }\) fully active controls (\(u_i = 1\)) at the same time.

  (iii)

    Regularity: we will assume that \({\textbf{u}}\) is of bounded variation, and write \({\textbf{u}}\in {\text {BV}}([0,T];\mathbb {R}^{N_r})\). This constraint involves the total variation of the control and models the fact that the roadblocks are not supposed to “blink” over time. From a control point of view, we aim at avoiding the so-called chattering phenomenon.

The first constraint above will be included in the set of admissible controls. Concerning the other two constraints, we have chosen to include them as penalty/regularization terms in the functional. Of course, other choices would be quite possible and relevant.

Let \(N_{\max }\in \mathbb {N}^*\) be an integer standing for the maximal number of active controls for this problem and \(\theta =(\theta _S, \theta _B)\in \mathbb {R}_+^2\) be two non-negative parameters. According to all the considerations above, the optimal control problem we will investigate reads:

$$\begin{aligned} \inf _{{\textbf{u}}\in \mathcal {U}_{ad}} {\mathcal {J}_\theta }({\textbf{u}}), \qquad \text {where } {\varvec{\rho }}(\cdot ;{\textbf{u}}) \text { solves (LWR-sd)}, \end{aligned}$$
(\(\mathcal {P}_\theta \))

where the admissible set of controls is defined by

$$\begin{aligned} \mathcal {U}_{ad}=L^\infty ([0,T],[0,1]^{N_r}), \end{aligned}$$
(6)

and the regularized cost functional \({\mathcal {J}_\theta }\) writes

$$\begin{aligned} {\mathcal {J}_\theta }({\textbf{u}}) = C_T({\textbf{u}}) + \frac{\theta _S}{2} S({\textbf{u}}) + \theta _B B({\textbf{u}}), \end{aligned}$$
(7)

where \(S({\textbf{u}})\) denotes the regularizing term modeling the limitation on the number of active controls and \(B({\textbf{u}})\) is the total variation of \({\textbf{u}}\) in time, namely

$$\begin{aligned} S({\textbf{u}}) = \int \limits _0^T\left( \sum _{i=1}^{N_r} u_i(t)-N_{\max }\right) _+^2\,dt \quad \text {and}\quad B({\textbf{u}}) = \sum _{i=1}^{N_r}{\text {TV}}(u_i). \end{aligned}$$
(8)
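In discrete form (anticipating the time discretization of Sect. 4), the cost and its two penalty terms can be evaluated as in the Julia sketch below; the names and the array layout (one row per road, one column per time step or cell) are ours, and the TV term is written here in its plain, non-smoothed version.

```julia
# rhoT: densities at final time (N_r x N_c), U: controls (N_r x N_T),
# dt: time step, path: indices of chi_path. Illustrative sketch only.
cost_CT(rhoT, path) = sum(rhoT[i, j] for i in path, j in axes(rhoT, 2))

# Staffing penalty S(u): quadrature of the squared excess above N_max.
cost_S(U, dt, Nmax) = dt * sum(max(sum(U[:, n]) - Nmax, 0)^2 for n in axes(U, 2))

# Total-variation penalty B(u), plain discrete version.
cost_B(U) = sum(abs(U[i, n] - U[i, n-1]) for i in axes(U, 1), n in 2:size(U, 2))

cost(rhoT, U, dt; path = [1], Nmax = 2, θS = 1.0, θB = 0.1) =
    cost_CT(rhoT, path) + θS / 2 * cost_S(U, dt, Nmax) + θB * cost_B(U)
```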

3 Analysis of the optimal control problem (\(\mathcal {P}_\theta \))

In this section, we investigate the well-posedness of problem (\(\mathcal {P}_\theta \)) and derive its optimality conditions. These conditions will form the basis of the algorithms used in the rest of this study. Without loss of generality, we will suppose that all roads have the same traffic parameters, \(v_i^{\max } = 1\) and \(\rho _i^{\max }=1\), and have the same length and the same number of mesh cells. In particular, the space step \(\Delta x_i = \Delta x\) is the same for all roads.

3.1 Well-posedness of problem (\(\mathcal {P}_\theta \))

We follow the so-called direct method of calculus of variations. The key point of the following result is the establishment of uniform estimates of \({\varvec{\rho }}\) with respect to the control variable \({\textbf{u}}\), in the \(W^{1,\infty }(0,T)\) norm.

Theorem 1

Let us assume that \(\theta _B\) is positive. Then, problem (\(\mathcal {P}_\theta \)) has a solution.

Proof

Let \(({\textbf{u}}_n)_{n\in \mathbb {N}}\) be a minimizing sequence for problem (\(\mathcal {P}_\theta \)). Observe first that, by minimality, the sequence \(({\mathcal {J}_\theta }({\textbf{u}}_n))_{n\in \mathbb {N}}\) is bounded, and therefore, so is the total variation \((B({\textbf{u}}_n))_{n\in \mathbb {N}}\), as \(\theta _B\) is assumed to be positive. We infer that, up to a subsequence, \(({\textbf{u}}_n)_{n\in \mathbb {N}}\) converges in \(L^1(0,T)\) and almost everywhere toward an element \({\textbf{u}}^*\in {\text {BV}}(0,T)\). Since the class \(\mathcal {U}_{ad}\) is closed for this convergence, we moreover get that \({\textbf{u}}^*\in L^\infty ([0,T],[0,1]^{N_r})\).

For the sake of readability, we will denote similarly a sequence and any subsequence with a slight abuse of notation.

In what follows, it is convenient to introduce the function \(g:\mathbb {R}^{N_c\times N_r}\times [0,1]^{N_r}\rightarrow \mathbb {R}^{N_c\times N_r}\) defined by \(g({\varvec{\rho }},{\textbf{u}}) = f^{FV}({\varvec{\rho }},\phi ^{LP}({\varvec{\rho }},{\textbf{u}}))\), so that \({\varvec{\rho }}\) solves the ODE system \({\varvec{\rho }}' = g({\varvec{\rho }},{\textbf{u}})\). Let us set \({\varvec{\rho }}_n {:}{=} {\varvec{\rho }}(\cdot ;{\textbf{u}}_n)\).

Step 1: the function g is continuous. Indeed, note that the function \(\phi ^{LP}\) is continuous (and even Lipschitz), according to (H\(_{\phi ^{LP}}\)). Explicit expressions of \(\phi ^{LP}\) for the four considered types of junctions are detailed in “Appendix A”. According to (5), \(f^{FV}\) therefore has the same regularity as the numerical flow \({\varvec{\rho }}\mapsto {\mathcal {F}}({\varvec{\rho }})\), defined by

$$\begin{aligned} {\mathcal {F}}_i^{j,j+1}({\varvec{\rho }}) = \frac{f_i^j+f_i^{j+1}}{2} - \frac{\max \{|df_i^j|,|df_i^{j+1}|\}}{2}\left( \rho _i^{j+1}-\rho _i^j\right) , \end{aligned}$$

with \(f_i^j = \rho _i^j(1-\rho _i^j)\) and \(df_i^j = 1-2\rho _i^j\), which is obviously continuous.

Step 2: the sequence \(({\varvec{\rho }}_n)_{n\in \mathbb {N}}\) is uniformly bounded in \(W^{1,\infty }(0,T)\). For a given \({\textbf{u}}\in \mathcal {U}_{ad}\), let us introduce the function \(\mathcal {L}: t\mapsto \frac{1}{2}\Vert {\varvec{\rho }}(t)\Vert _{\mathbb {R}^{N_c\times N_r}}^2\).

According to the chain rule, \(\mathcal {L}\) is differentiable, and its derivative reads

$$\begin{aligned} \mathcal {L}'(t)= & {} \langle {\varvec{\rho }}'(t),{\varvec{\rho }}(t) \rangle _{\mathbb {R}^{N_c\times N_r}} = \langle g({\varvec{\rho }}(t),{\textbf{u}}(t)), {\varvec{\rho }}(t) \rangle _{\mathbb {R}^{N_c\times N_r}} \\= & {} -\frac{1}{\Delta x}\sum _{i=1}^{N_r}\sum _{j=1}^{N_c}\left( {\mathcal {F}}_i^{j,j+1}({\varvec{\rho }})- {\mathcal {F}}_i^{j-1,j}({\varvec{\rho }})\right) \rho _i^j\\= & {} -\frac{1}{\Delta x}\sum _{i=1}^{N_r}\left( \gamma _i^R\rho _i^{N_c}+\sum _{j=1}^{N_c-1}{\mathcal {F}}_i^{j,j+1}({\varvec{\rho }})\rho _i^j - \gamma _i^L\rho _i^1 - \sum _{j=2}^{N_c}{\mathcal {F}}_i^{j-1,j}({\varvec{\rho }})\rho _i^j \right) \\= & {} \frac{1}{\Delta x}\sum _{i=1}^{N_r}\left( \gamma _i^L\rho _i^1-\gamma _i^R\rho _i^{N_c} + \sum _{j=1}^{N_c-1}{\mathcal {F}}_i^{j, j+1}({\varvec{\rho }})\left( \rho _i^{j+1}-\rho _i^{j}\right) \right) . \end{aligned}$$

For a given \(i\in \llbracket 1,N_r\rrbracket \), and \(j\in \llbracket 1,N_c-1\rrbracket \), we have

$$\begin{aligned} {\mathcal {F}}_i^{j,j+1}({\varvec{\rho }})\left( \rho _i^{j+1}-\rho _i^j\right)&= \left( \frac{f_i^j+f_i^{j+1}}{2} - \frac{\max \{|df_i^j|,|df_i^{j+1}|\}}{2}\left( \rho _i^{j+1}-\rho _i^j\right) \right) \left( \rho _i^{j+1}-\rho _i^j\right) \\&\leqslant \left( f_i(\sigma _i) + (\rho _i^{j+1}-\rho _i^j) \right) \left( \rho _i^{j+1}-\rho _i^j\right) , \end{aligned}$$

since \(f_i^j\leqslant f_i(\sigma _i)= 1/4\) according to the explicit expression (1) of \(f_i\), where \(\sigma _i=\rho _i^{\max }/2\) denotes the density at which \(f_i\) reaches its maximum, and \(|df_i^j|\leqslant v_i^{\max }=1\). Since the local Lax-Friedrichs scheme is monotone, we infer that \(\rho _i^j\) is lower than \(\rho _i^{\max } = 1\). We then have

$$\begin{aligned} {\mathcal {F}}_i^{j,j+1}({\varvec{\rho }})\left( \rho _i^{j+1}-\rho _i^j\right)&\leqslant 2 f_i(\sigma _i) + \left( \rho _i^{j+1}-\rho _i^j\right) ^2 = \frac{1}{2} + \left( \rho _i^{j+1}-\rho _i^j\right) ^2 \\&\leqslant \frac{1}{2} + (\rho _i^j)^2 + (\rho _i^{j+1})^2 - 2\rho _i^j\rho _i^{j+1} \\&\leqslant \frac{1}{2} + (\rho _i^j)^2 + (\rho _i^{j+1})^2, \end{aligned}$$

by positivity of the \(\rho _i^j\). Using the bound \( \gamma _i^L\rho _i^1-\gamma _i^R\rho _i^{N_c} \leqslant 2 f_i(\sigma _i) = 1/2\), all the calculations above yield

$$\begin{aligned} \mathcal {L}'(t)&\leqslant \frac{1}{\Delta x}\sum _{i=1}^{N_r} \left( \frac{1}{2} + \sum _{j=1}^{N_c-1} \left( \frac{1}{2}+ (\rho _i^j)^2 + (\rho _i^{j+1})^2 \right) \right) \\&\leqslant \frac{1}{\Delta x} \left( \frac{N_r N_c}{2} + || {\varvec{\rho }}(t) || ^2 \right) . \end{aligned}$$

We infer the existence of two positive numbers \({\bar{\alpha }}\), \({\bar{\beta }}\) that do not depend on \({\textbf{u}}\), such that \(\mathcal {L}'(t) \leqslant {\bar{\alpha }} \mathcal {L}(t) + {\bar{\beta }}\) for a.e. \(t\in (0,T)\). By using a Grönwall-type inequality, we get

$$\begin{aligned} \mathcal {L}(t) \leqslant \mathcal {L}(0)e^{{\bar{\alpha }} t} + \int \limits _0^t {\bar{\beta }} e^{{\bar{\alpha }}(t-s)}\, ds = \left( \mathcal {L}(0) + \frac{\bar{\beta }}{{\bar{\alpha }}}(1-e^{-{\bar{\alpha }} t})\right) e^{{\bar{\alpha }} t}. \end{aligned}$$
(9)

Step 3: conclusion According to (9), the sequence \(({\varvec{\rho }}_n)_{n\in \mathbb {N}}\) is bounded in \(L^\infty (0,T)\). Since \({\varvec{\rho }}_n' = g({\varvec{\rho }}_n,{\textbf{u}}_n)\) a.e. in (0, T) and g is continuous, it follows that \(({\varvec{\rho }}_n)_{n\in \mathbb {N}}\) is uniformly bounded in \(W^{1,\infty }(0,T)\). Therefore, by using the Arzela-Ascoli theorem, this sequence converges up to a subsequence in \(C^0([0,T])\) toward an element \({\varvec{\rho }}^* \in W^{1,\infty }(0,T)\).

Let us recast System (LWR-sd) as the fixed point equation

$$\begin{aligned} \text {for every }t\in (0,T), \quad {\varvec{\rho }}_n(t) - {\varvec{\rho }}_0 = \int \limits _0^t g({\varvec{\rho }}_n(s),{\textbf{u}}_n(s))\, \textrm{d}s. \end{aligned}$$

Letting n go to \(+\infty \) and using the convergences above, we obtain

$$\begin{aligned} {\varvec{\rho }}^*(t) - {\varvec{\rho }}_0 = \int \limits _0^t g({\varvec{\rho }}^*(s),{\textbf{u}}^*(s))\, \textrm{d}s, \end{aligned}$$

meaning that \({\varvec{\rho }}^*\) solves System (LWR-sd) with \({\textbf{u}}^*\) as control. Finally, according to the aforementioned convergences and since the functionals S and B are convex, it is standard that one has

$$\begin{aligned} \lim _{n\rightarrow +\infty }C_T({\textbf{u}}_n)=C_T({\textbf{u}}^*),\quad \liminf _{n\rightarrow +\infty }S({\textbf{u}}_n)\geqslant S({\textbf{u}}^*), \quad \liminf _{n\rightarrow +\infty }B({\textbf{u}}_n)\geqslant B({\textbf{u}}^*). \end{aligned}$$

We get that \({\mathcal {J}_\theta }({\textbf{u}}^*)\leqslant \liminf _{n\rightarrow +\infty } {\mathcal {J}_\theta }({\textbf{u}}_n)\), and we infer that \({\textbf{u}}^*\) solves problem (\(\mathcal {P}_\theta \)). \(\square \)

3.2 Optimality conditions

We have established the existence of an optimal control in Theorem 1. In order to derive a numerical solution algorithm, we will now state the necessary optimality conditions on which the algorithm we will build is based. One of the difficulties is that the functional we use involves non-differentiable quantities. We will therefore first use the notion of subdifferential to write the optimality conditions. In Sect. 4, dedicated to the derivation of a numerical algorithm, we will explain how we approximate these quantities.

Let us first compute the differential of the functional \(C_T\). To this aim, we introduce the tangent cone to the set \(\mathcal {U}_{ad}\).

Definition 1

Let \({\textbf{u}}\in \mathcal {U}_{ad} \). A function \({\textbf{h}}\) in \(L^{\infty }(0,T)\) is said to be an “admissible perturbation” of \({\textbf{u}}\) in \(\mathcal {U}_{ad}\) if, for every sequence of positive real numbers \((\varepsilon _n)_{n\in \mathbb {N}}\) decreasing to 0, there exists a sequence of functions \({\textbf{h}}^n\) converging to \({\textbf{h}}\) for the weak-star topology of \(L^{\infty }(0,T)\) as \(n \rightarrow +\infty \), and such that \({\textbf{u}}+\varepsilon _n {\textbf{h}}^n \in \mathcal {U}_{ad}\) for every \(n \in \mathbb {N}\).

Proposition 1

Let \({\textbf{u}}\in \mathcal {U}_{ad}\) and \(({\varvec{\rho }},{\varvec{\gamma }})\) the associated solution to (LWR-sd). We introduce the two matrices M and N defined from the Jacobian matrices of \(f^{FV}\) and \(\phi ^{LP}\) as

$$\begin{aligned} M({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}})&= (\partial _{\varvec{\rho }}f^{FV})({\varvec{\rho }},{\varvec{\gamma }}) + (\partial _{\varvec{\gamma }}f^{FV})({\varvec{\rho }},{\varvec{\gamma }}) \, (\partial _{\varvec{\rho }}\phi ^{LP})({\varvec{\rho }},{\textbf{u}}), \end{aligned}$$
(10)
$$\begin{aligned} N({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}})&= (\partial _{\varvec{\gamma }}f^{FV})({\varvec{\rho }},{\varvec{\gamma }})(\partial _{\textbf{u}}\phi ^{LP})({\varvec{\rho }},{\textbf{u}}), \end{aligned}$$
(11)

where we use the notation conventions introduced in Sect. 1.

The functional \(C_T\) is differentiable in the sense of Gâteaux and its differential reads

$$\begin{aligned} \textrm{d}C_T({\textbf{u}}){\textbf{h}}= \int \limits _0^T (N({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}})^\top {\textbf{p}})\cdot {\textbf{h}}\, \textrm{d}t, \end{aligned}$$
(12)

for every admissible perturbation \({\textbf{h}}\), where \({\textbf{p}}\) is the so-called adjoint state, defined as the unique solution to the Cauchy system

$$\begin{aligned} {\left\{ \begin{array}{ll} {\textbf{p}}' + M({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}})^\top {\textbf{p}}= 0\quad \text {in }(0,T),\\ {\textbf{p}}(T) = {\textbf{c}}. \end{array}\right. } \end{aligned}$$
(13)

Remark 1

In Proposition 1 above, the matrix M encodes the sensitivity of the dynamics with respect to \({\varvec{\rho }}\), both directly and through \({\varvec{\gamma }}\), while N encodes its sensitivity with respect to \({\textbf{u}}\) through \({\varvec{\gamma }}\).

Proof of Proposition 1

Let \({\textbf{u}}\in \mathcal {U}_{ad}\). The Gâteaux differentiability of \(C_T\), \(\mathcal {U}_{ad}\ni {\textbf{u}}\mapsto {\varvec{\rho }}\) and \(\mathcal {U}_{ad}\ni {\textbf{u}}\mapsto {\varvec{\gamma }}\) is standard, and follows, for instance, directly from the proof of the Pontryagin Maximum Principle (PMP, see e.g., [30]). Although the expression of the differential of \(C_T\) could also be obtained by using the PMP, we provide a short proof hereafter to make this article self-contained.

Let \({\textbf{h}}\in L^\infty ([0,T],\mathbb {R}^{N_r})\) be an admissible perturbation of \({\textbf{u}}\) in \(\mathcal {U}_{ad}\). One has

$$\begin{aligned} \textrm{d}C_T({\textbf{u}})\,{\textbf{h}}= {\textbf{c}}\cdot {\dot{{\varvec{\rho }}}}(T), \end{aligned}$$
(14)

where \( {\dot{{\varvec{\rho }}}} = \textrm{d}({\textbf{u}}\mapsto {\varvec{\rho }}){\textbf{h}}\) solves the system

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{{\varvec{\rho }}}}' = (\partial _{\varvec{\rho }}f^{FV})({\varvec{\rho }},{\varvec{\gamma }}){\dot{{\varvec{\rho }}}} + (\partial _{\varvec{\gamma }}f^{FV})({\varvec{\rho }},{\varvec{\gamma }}){\dot{{\varvec{\gamma }}}}, \\ {\dot{{\varvec{\gamma }}}} = (\partial _{\varvec{\rho }}\phi ^{LP})({\varvec{\rho }},{\textbf{u}}){\dot{{\varvec{\rho }}}} + (\partial _{\textbf{u}}\phi ^{LP})({\varvec{\rho }},{\textbf{u}}){\textbf{h}}, \\ {\dot{{\varvec{\rho }}}}(0) = 0. \end{array}\right. } \end{aligned}$$

We infer that \({\dot{{\varvec{\rho }}}}\) solves

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{{\varvec{\rho }}}}' = M({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}}) {\dot{{\varvec{\rho }}}} + N({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}}) {\textbf{h}}, \\ {\dot{{\varvec{\rho }}}}(0) = 0, \end{array}\right. } \end{aligned}$$
(15)

with M and N as defined in (10) and (11). Let us multiply the main equation of (13) by \({\dot{{\varvec{\rho }}}}\) in the sense of the inner product, and integrate over (0, T). We obtain:

$$\begin{aligned} \int \limits _0^T {\dot{{\varvec{\rho }}}} \cdot \frac{\textrm{d}{\textbf{p}}}{\textrm{d}t} \, \textrm{d}t + \int \limits _0^T {\dot{{\varvec{\rho }}}}\cdot (M({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}})^\top {\textbf{p}}) \, \textrm{d}t=0. \end{aligned}$$

Similarly, let us multiply the main equation of (15) by \({\textbf{p}}\) in the sense of the inner product, and integrate over (0, T). We obtain:

$$\begin{aligned} \int \limits _0^T {\textbf{p}}\cdot \frac{\textrm{d}{\dot{{\varvec{\rho }}}}}{\textrm{d}t} \, \textrm{d}t - \int \limits _0^T M({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}}){\dot{{\varvec{\rho }}}}\cdot {\textbf{p}}\, \textrm{d}t=\int \limits _0^T N({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}}){\textbf{h}}\cdot {\textbf{p}}\, \textrm{d}t. \end{aligned}$$

Summing the last two equalities above yields

$$\begin{aligned} {\dot{{\varvec{\rho }}}}(T)\cdot {\textbf{p}}(T)-{\dot{{\varvec{\rho }}}}(0)\cdot {\textbf{p}}(0)=\int \limits _0^T {\textbf{h}}\cdot (N({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}})^\top {\textbf{p}})\, \textrm{d}t. \end{aligned}$$

Using this identity with \({\textbf{p}}(T) = {\textbf{c}}\) and \({\dot{{\varvec{\rho }}}}(0) = 0\) yields expression (12). \(\square \)

From this result, we will now state the optimality conditions for problem (\(\mathcal {P}_\theta \)). Let us first recall that, according to [17, Proposition I.5.1], the subdifferential of the total variation is well-known, given by

$$\begin{aligned} \partial {\text {TV}}({\textbf{u}}^*)=\{\varvec{\eta } \in C^0([0,T];\mathbb {R}^{N_r})\mid \Vert \varvec{\eta } \Vert _\infty \leqslant 1\text { and }\int \varvec{\eta } \textrm{d}{\textbf{u}}^*={\text {TV}}({\textbf{u}}^*)\}. \end{aligned}$$

Let us denote by \(e_i\) the i-th vector of the canonical basis of \(\mathbb {R}^{N_r}\).

Theorem 2

Let \({\textbf{u}}^*=(u_i^*)_{1\leqslant i\leqslant N_r}\) denote a solution to problem (\(\mathcal {P}_\theta \)), \(({\varvec{\rho }},{\varvec{\gamma }})\) the associated solution to (LWR-sd), and let \(i\in \llbracket 1,N_r\rrbracket \). There exists \(\varvec{\eta ^*}\in \partial {\text {TV}}({\textbf{u}}^*)\) such that

  • on \(\{u^*_i=0\}\), one has \(\Psi \cdot e_i \geqslant 0\),

  • on \(\{u^*_i=1\}\), one has \(\Psi \cdot e_i \leqslant 0\),

  • on \(\{0<u^*_i<1\}\), one has \(\Psi \cdot e_i = 0\),

where the function \(\Psi :[0,T]\rightarrow \mathbb {R}^{N_r}\) is given by

$$\begin{aligned} \Psi (t)=N({\varvec{\rho }}(t),{\varvec{\gamma }}(t),{\textbf{u}}^*(t))^\top {\textbf{p}}^*(t)+\theta _S\left( \sum _{i=1}^{N_r}u_i^*(t) -N_{\max }\right) _++\theta _B\varvec{\eta ^*}(t) \end{aligned}$$

and where \({\textbf{p}}^*\) denotes the adjoint state introduced in Proposition 1, associated with the control choice \({\textbf{u}}^*\).

Remark 2

Written in this way, the first-order optimality conditions are difficult to use. In the next section, we will introduce an approximation of the total variation of \({\textbf{u}}^*\) leading to optimality conditions more easily usable within an algorithm.

Proof of Theorem 2

To derive the first-order optimality conditions for this problem, it is convenient to introduce the so-called indicator function \(\iota _{\mathcal {U}_{ad}}\) given by

$$\begin{aligned} \iota _{\mathcal {U}_{ad}}({\textbf{u}})=\left\{ \begin{array}{ll} 0 &{} \text {if }{\textbf{u}}\in \mathcal {U}_{ad}\\ +\infty &{} \text {else}. \end{array}\right. \end{aligned}$$

Observing that the optimization problem we want to deal with can be recast as

$$\begin{aligned} \min _{{\textbf{u}}\in L^\infty ((0,T);\mathbb {R}^{N_r})}\mathcal {J}_\theta ({\textbf{u}})+\iota _{\mathcal {U}_{ad}}({\textbf{u}}), \end{aligned}$$

it is standard in nonsmooth analysis to write the first order optimality conditions as:

$$\begin{aligned} 0\in \partial \left( \mathcal {J}_\theta ({\textbf{u}}^*)+ \iota _{\mathcal {U}_{ad}}({\textbf{u}}^*)\right) , \end{aligned}$$

or similarly, by using standard computational rules [17],

$$\begin{aligned} -\partial C_T({\textbf{u}}^*)-\frac{\theta _S}{2}\partial S({\textbf{u}}^*)\in \theta _B\partial B({\textbf{u}}^*)+\partial \iota _{\mathcal {U}_{ad}}({\textbf{u}}^*), \end{aligned}$$

Let \({\textbf{u}}\in \mathcal {U}_{ad}\). The condition above yields the existence of \(\varvec{\eta ^*}=(\eta _i^*)_{1\leqslant i\leqslant N_r}\in \partial {\text {TV}}({\textbf{u}}^*)\) such that

$$\begin{aligned} \textrm{d}C_T({\textbf{u}}^*)({\textbf{u}}-{\textbf{u}}^*) +\frac{\theta _S}{2}\textrm{d}S({\textbf{u}}^*)({\textbf{u}}-{\textbf{u}}^*)+\theta _B\sum _{i=1}^{N_r}\langle \eta _i^*,u_i-u_i^*\rangle _{L^2(0,T)}\geqslant 0. \end{aligned}$$

Since \({\textbf{u}}\) is arbitrary, we infer that for any admissible perturbation \({\textbf{h}}\) of \({\textbf{u}}^*\) (see definition 1), one has

$$\begin{aligned} \textrm{d}C_T({\textbf{u}}^*){\textbf{h}}+\frac{\theta _S}{2}\textrm{d}S({\textbf{u}}^*){\textbf{h}}+\theta _B\sum _{i=1}^{N_r}\langle \eta _i^*,h_i\rangle _{L^2(0,T)}\geqslant 0, \end{aligned}$$

or similarly

$$\begin{aligned} \int \limits _0^T {\textbf{h}}\cdot \left( N({\varvec{\rho }},{\varvec{\gamma }},{\textbf{u}}^*) ^\top {\textbf{p}}^*+\theta _S\left( \sum _{i=1}^{N_r}u_i^*-N_{\max }\right) _+ +\theta _B\varvec{\eta ^*}\right) \, \textrm{d}t\geqslant 0. \end{aligned}$$
(16)

To analyze this optimality condition, let us introduce the function \(\Psi :[0,T]\rightarrow \mathbb {R}^{N_r}\) defined by

$$\begin{aligned} \Psi (t)=N({\varvec{\rho }}(t),{\varvec{\gamma }}(t),{\textbf{u}}^*(t)) ^\top {\textbf{p}}^*(t)+\theta _S\left( \sum _{i=1}^{N_r}u_i^*(t)-N_{\max }\right) _+ +\theta _B\varvec{\eta ^*}(t). \end{aligned}$$
(17)

In what follows, we will write the optimality conditions holding for the i-th component of \({\textbf{u}}^*\), where \(i\in \llbracket 1,N_r\rrbracket \) is given.

Let us assume that the set \(\mathcal {I}=\{0<u_i^*<1\}\) has positive Lebesgue measure. Let \(x_0\) be a Lebesgue point of \(u_i^*\) in \(\mathcal {I}\) and let \((G_{n})_{n\in \mathbb {N}}\) be a sequence of measurable subsets with \(G_{n}\) included in \(\mathcal {I}\) and containing \(x_0\). Let us consider \({\textbf{h}}=(h_j)_{1\leqslant j\leqslant N_r}\) such that \(h_j=0\) for all \(j\in \llbracket 1,N_r\rrbracket \backslash \{i\}\) and \(h_i=\mathbb {1}_{G_{n}}\). Notice that \({\textbf{u}}^* \pm \eta {\textbf{h}}\) belongs to \(\mathcal {U}_{ad}\) whenever \(\eta \) is small enough. According to (16), one has

$$\begin{aligned} \pm \int \limits _{G_n} \Psi (t)\cdot e_i\, \textrm{d}t\geqslant 0. \end{aligned}$$

Dividing this inequality by \(|G_{n}|\) and letting \(G_{n}\) shrink to \(\{x_0\}\) as \(n\rightarrow +\infty \) shows that one has

$$\begin{aligned} \Psi (t)\cdot e_i=0, \quad \text {a.e. in }\mathcal {I}. \end{aligned}$$

Let us now assume that the set \(\mathcal {I}_1=\{u_i^*=1\}\) has positive Lebesgue measure. Then, by mimicking the reasoning above, we consider \(x_1\), a Lebesgue point of \(u_i^*\) in \(\mathcal {I}_1\), and \({\textbf{h}}=(h_j)_{1\leqslant j\leqslant N_r}\) such that \(h_j=0\) for all \(j\in \llbracket 1,N_r\rrbracket \backslash \{i\}\) and \(h_i=-\mathbb {1}_{G_{n}}\), where \((G_{n})_{n\in \mathbb {N}}\) is a sequence of measurable subsets with \(G_{n}\) included in \(\mathcal {I}_1\) and containing \(x_1\). According to (16), one has

$$\begin{aligned} - \int \limits _{G_n} \Psi (t)\cdot e_i\, \textrm{d}t\geqslant 0. \end{aligned}$$

As above, we divide this inequality by \(|G_{n}|\) and let \(G_{n}\) shrink to \(\{x_1\}\) as \(n\rightarrow +\infty \). We recover that \(\Psi (t)\cdot e_i\leqslant 0\).

Regarding now the set \(\mathcal {I}_0=\{u_i^*=0\}\), the reasoning is a direct adaptation of the case above, which concludes the proof. \(\square \)

4 Toward a numerical algorithm

In this section, we introduce an exploitable approximation of the problem we are dealing with and describe the algorithm implemented in the numerical part.

4.1 An approximate version of problem (\(\mathcal {P}_\theta \))

The fact that problem (\(\mathcal {P}_\theta \)) involves the total variation of the control makes the problem non-differentiable. Of course, dedicated algorithms exist to take such a term into account, for instance Chambolle's projection algorithm [10]. Nevertheless, in order to avoid too costly numerical approaches, we have chosen to consider a simple differentiable approximation of the term \(B({\textbf{u}})\), namely

$$\begin{aligned} B_\nu ({\textbf{u}}) = \sum _{i=1}^{N_r} {TV_\nu }(u_i), \end{aligned}$$
(18)

where \(\nu >0\) is a small regularization parameter and the total variation \({\text {TV}}(u_i)\) is approximated by a functional \({\text {TV}}_\nu (u_i)\) that is differentiable in \(L^2(0,T)\). The concrete choice of this differentiable approximation of the TV semi-norm will be discussed in the rest of this section, together with some elements on its practical implementation.

We are thus led to consider the following approximate version of problem (\(\mathcal {P}_\theta \)):

$$\begin{aligned} \inf _{{\textbf{u}}\in \mathcal {U}_{ad}} \mathcal {J}_{\theta ,\nu }({\textbf{u}}), \qquad \text {where } {\varvec{\rho }}(\cdot ;{\textbf{u}}) \text { solves (LWR-sd)}, \end{aligned}$$
(\(\mathcal {P}_{\theta ,\nu }\))

where \(\mathcal {U}_{ad}\) is given by (6), and \(\mathcal {J}_{\theta ,\nu }\) is given by

$$\begin{aligned} \mathcal {J}_{\theta ,\nu }({\textbf{u}}) = C_T({\textbf{u}}) + \frac{\theta _S}{2} S({\textbf{u}}) + \theta _B B_\nu ({\textbf{u}}), \end{aligned}$$
(19)

We will see that this approximation is in fact well adapted to practical use. Indeed, the first-order optimality conditions for this approximated problem can be rewritten in a very concise and workable way, unlike the result stated in Theorem 2. This is the purpose of the following result.

Theorem 3

Let \({\textbf{u}}^*\in \mathcal {U}_{ad}\) denote a local minimizer for problem (\(\mathcal {P}_{\theta ,\nu }\)) and \(({\varvec{\rho }},{\varvec{\gamma }})\) the associated solution to (LWR-sd). Then, \({\textbf{u}}^*\) satisfies the first order necessary condition

$$\begin{aligned} \Lambda (\cdot )=0\quad \text {a.e. on }[0,T], \end{aligned}$$

where \(\Lambda :[0,T]\rightarrow \mathbb {R}^{N_r}\) is defined by

$$\begin{aligned} \Lambda (t) = \min \left\{ {\textbf{u}}^*(t),\max \left\{ {\textbf{u}}^*(t)-1,\nabla _{{\textbf{u}}} \mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)(t)\right\} \right\} , \end{aligned}$$

and

$$\begin{aligned} \nabla _{{\textbf{u}}} \mathcal {J}_{\theta ,\nu }({\textbf{u}}^*):[0,T]\ni t \mapsto N({\varvec{\rho }}(t),{\varvec{\gamma }}(t),{\textbf{u}}^*(t)) ^\top {\textbf{p}}^*(t)+\theta _S\left( \sum _{i=1}^{N_r}u_i^*(t)-N_{\max }\right) _+ +\theta _B\sum _{i=1}^{N_r}\nabla _{{\textbf{u}}}{\text {TV}}_\nu (u_i^*)(t), \end{aligned}$$

where \({\textbf{p}}^*\) has been introduced in Theorem 2, the min, max operations being understood componentwise, and the term \(\nabla _{{\textbf{u}}}\) denoting the gradient with respect to \({\textbf{u}}\) in \(L^2(0,T)\).

Proof

The proof is similar to that of Theorem 2. Indeed, let \({\textbf{u}}^*\) be a local minimizer for problem (\(\mathcal {P}_{\theta ,\nu }\)). The first-order optimality conditions are given by the so-called Euler inequation and read \( \textrm{d}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*){\textbf{h}}\geqslant 0, \) or similarly

$$\begin{aligned} \int \limits _0^T \nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)\cdot {\textbf{h}}\, \textrm{d}t\geqslant 0. \end{aligned}$$

for every admissible perturbation \({\textbf{h}}\), as defined in Definition 1. Let us fix \(i\in \llbracket 1,N_r\rrbracket \). By mimicking the reasoning involving Lebesgue points in the proof of Theorem 2, we get

  • on \(\{u^*_i=0\}\), one has \(\nabla _{{\textbf{u}}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)\cdot e_i \geqslant 0\),

  • on \(\{u^*_i=1\}\), one has \(\nabla _{{\textbf{u}}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)\cdot e_i \leqslant 0\),

  • on \(\{0<u^*_i<1\}\), one has \(\nabla _{{\textbf{u}}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)\cdot e_i = 0\).

Note that, on \(\{u^*_i=0\}\), one has \(\nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)\cdot e_i\geqslant 0\) and then \(\Lambda (t) \cdot e_i= \min \{0, \max \{-1, \nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)(t)\cdot e_i\}\} = 0\). On \(\{u^*_i=1\}\), one has \(\nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)\cdot e_i\leqslant 0\) and therefore \(\Lambda (t) \cdot e_i= \min \{1, \max \{0, \nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)(t)\cdot e_i\}\} = 0\). On \(\{0<u^*_i<1\}\), one has \(\nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)\cdot e_i= 0\) and therefore \(\Lambda (t) \cdot e_i= \min \{u_i^*, \max \{u_i^*(t)-1, 0\}\} = 0\).

Conversely, let us assume that \(\Lambda (\cdot )=0\). On \(\{u^*_i=0\}\), one has

$$\begin{aligned} 0=\Lambda (t) \cdot e_i=\min \{0, \max \{-1,\nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)(t)\cdot e_i\}\}, \end{aligned}$$

so that \(\max \{-1,\nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)(t)\cdot e_i\}\geqslant 0\) and finally, \(\nabla _{\textbf{u}}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^*)(t)\cdot e_i \geqslant 0\). A similar reasoning yields the optimality conditions on \(\{u^*_i=1\}\) and \(\{0<u^*_i<1\}\). The conclusion follows. \(\square \)
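In practice, the quantity \(\Lambda \) of Theorem 3 provides a natural stopping criterion for the optimization loop; on the discrete control it can be evaluated componentwise as in the short Julia sketch below, where gradJ stands for the discrete gradient of \(\mathcal {J}_{\theta ,\nu }\) and is an assumption of this snippet.

```julia
# Componentwise optimality residual of Theorem 3 on the discrete control
# matrix U (N_r x N_T); gradJ is the discrete gradient of J_{theta,nu}.
lambda_residual(U, gradJ) = min.(U, max.(U .- 1, gradJ))

# A possible stopping test for the optimization loop:
# maximum(abs.(lambda_residual(U, gradJ))) < 1e-6
```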

4.1.1 Practical computation of the TV operator gradient in a discrete framework

From a practical point of view, we discretize the space of controls, which leads us to consider a time discretization denoted \((t_n)_{1\leqslant n\leqslant N_T}\) with \(N_T\in \mathbb {N}^*\) fixed, as well as piecewise constant controls in time, denoted \((u_i^n)_{\begin{array}{c} 1\leqslant i\leqslant N_r\\ 1\leqslant n\leqslant N_T \end{array}}\).

Hence, the term \(u_i^n\) corresponds to the control of road i on the n-th time step. The simplest discrete version of the TV semi-norm is given by

$$\begin{aligned} {\text {TV}}({\textbf{u}}) \simeq \sum _{i=1}^{N_r}\sum _{n=2}^{N_T}|u_i^n - u_i^{n-1}|. \end{aligned}$$

In order to manipulate differentiable expressions, we introduce, for some \(\nu >0\), the smoothed discrete TV operator given by

$$\begin{aligned} {\text {TV}}_\nu : \mathbb {R}^{N_r\times N_T}\ni (u_i^n)_{\begin{array}{c} 1\leqslant i\leqslant N_r\\ 1\leqslant n\leqslant N_T \end{array}} \mapsto \sum _{i=1}^{N_r}\sum _{n=2}^{N_T}a_\nu (u_i^n - u_i^{n-1}), \end{aligned}$$
(20)

where \( a_\nu :\mathbb {R}\ni x \mapsto \sqrt{x^2 + \nu ^2}\).

Let \((i_0,n_0)\in \llbracket 1,N_r\rrbracket \times \llbracket 1,N_T\rrbracket \). In what follows, we will use a discrete equivalent of Theorem 3, involving the gradient of \( {\text {TV}}_\nu \), obtained from the expression

$$\begin{aligned} \partial _{u_{i_0}^{n_0}} {\text {TV}}_\nu ({\textbf{u}})=\left\{ \begin{array}{ll} a_\nu '(u_{i_0}^{n_0}-u_{i_0}^{n_0-1})-a_\nu '(u_{i_0}^{n_0+1} -u_{i_0}^{n_0}) &{} \text {if }2\leqslant n_0\leqslant N_T-1\\ a_\nu '(u_{i_0}^{N_T}-u_{i_0}^{N_T-1}) &{} \text {if }n_0=N_T\\ -a_\nu '(u_{i_0}^{2}-u_{i_0}^{1}) &{} \text {if }n_0=1, \end{array} \right. \end{aligned}$$

where \({\textbf{u}}=(u_i^n)_{\begin{array}{c} 1\leqslant i\leqslant N_r\\ 1\leqslant n\leqslant N_T \end{array}}\) is given.
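The smoothed operator (20) and the gradient formula above translate directly into code; the following Julia sketch (with illustrative names) acts on the control matrix \((u_i^n)\).

```julia
# Smoothed absolute value a_nu and its derivative.
aν(x; ν = 1e-3)  = sqrt(x^2 + ν^2)
daν(x; ν = 1e-3) = x / sqrt(x^2 + ν^2)

# Smoothed discrete TV (20) of the control matrix U (N_r x N_T).
TVν(U; ν = 1e-3) = sum(aν(U[i, n] - U[i, n-1]; ν = ν)
                       for i in axes(U, 1), n in 2:size(U, 2))

# Gradient of TVν with respect to U, following the formula above.
function gradTVν(U; ν = 1e-3)
    G  = zeros(size(U))
    Nt = size(U, 2)
    for i in axes(U, 1)
        G[i, 1]  = -daν(U[i, 2] - U[i, 1]; ν = ν)
        G[i, Nt] =  daν(U[i, Nt] - U[i, Nt-1]; ν = ν)
        for n in 2:Nt-1
            G[i, n] = daν(U[i, n] - U[i, n-1]; ν = ν) - daν(U[i, n+1] - U[i, n]; ν = ν)
        end
    end
    return G
end
```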

4.2 Numerical solving of the primal and dual problems

The models we use are already discretized in space. We now explain how we discretize them in time. We recall that, at the end of Sect. 4.1, we have already assumed that the controls are represented by functions that are piecewise constant on each cell of the considered mesh and on each time step.

4.2.1 Finite volumes scheme for the primal problem

The main transport equation on the network is solved by integrating in time, for each road, the semi-discrete finite volume system (LWR-sd) with an explicit Euler scheme:

$$\begin{aligned} \rho _{i,j}^{n+1} = \rho _{i,j}^{n} - \frac{\Delta t_n}{\Delta x_{i,j}}\left( {\mathcal {F}}_{i,j+\frac{1}{2}}^{n} - {\mathcal {F}}_{i,j-\frac{1}{2}}^{n}\right) , \quad 1\le i\le N_r,\quad 1\leqslant j\leqslant N_c,\quad {1\leqslant n\leqslant N_T-1}, \end{aligned}$$

using the local Lax-Friedrichs numerical flux

$$\begin{aligned} {\mathcal {F}}_{i,j+\frac{1}{2}}^{n} = {\mathcal {F}}(\rho _{i,j}^{n}, \rho _{i,j+1}^{n}),\quad 1\leqslant j\leqslant N_c-1, \end{aligned}$$

and the flux boundary conditions

$$\begin{aligned} {\mathcal {F}}_{i,\frac{1}{2}}^n {:}{=} \gamma _{i,L}^n, \quad {\mathcal {F}}_{i,N_c+\frac{1}{2}}^n {:}{=} \gamma _{i,R}^n, \end{aligned}$$

where \( \gamma _{i,L}^n\) and \( \gamma _{i,R}^n\) are obtained as the solution to the linear programming problem

$$\begin{aligned} {\varvec{\gamma }}^n = \phi ^{\text {LP}}({\varvec{\rho }}^{n}, {\textbf{u}}^{n}). \end{aligned}$$
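For the reader's convenience, here is a hedged NumPy sketch of one finite volume update on a single road. The junction solver \(\phi ^{\text {LP}}\) is abstracted as externally provided boundary fluxes, and the quadratic LWR flux used in the final lines (with \(\rho ^{\max }=v^{\max }=1\), as in Sect. 5) is only an assumption made for illustration.

```python
import numpy as np

def llf_flux(rho_l, rho_r, f, df):
    """Local Lax-Friedrichs numerical flux F(rho_l, rho_r) for the flux function f."""
    lam = np.maximum(np.abs(df(rho_l)), np.abs(df(rho_r)))   # local wave speed bound
    return 0.5 * (f(rho_l) + f(rho_r)) - 0.5 * lam * (rho_r - rho_l)

def fv_step(rho, gamma_L, gamma_R, dt, dx, f, df):
    """One explicit Euler finite volume update on a single road.
    rho: cell densities (N_c,); gamma_L, gamma_R: boundary fluxes at the road ends,
    assumed to be returned by the junction solver phi^LP (not implemented here)."""
    flux = np.empty(rho.size + 1)
    flux[1:-1] = llf_flux(rho[:-1], rho[1:], f, df)   # interior interfaces j+1/2
    flux[0], flux[-1] = gamma_L, gamma_R              # boundary fluxes
    return rho - dt / dx * (flux[1:] - flux[:-1])

# Illustration with the usual quadratic LWR flux (rho_max = v_max = 1, an assumption):
f = lambda rho: rho * (1.0 - rho)
df = lambda rho: 1.0 - 2.0 * rho
rho_new = fv_step(np.full(50, 0.66), gamma_L=0.2, gamma_R=0.2, dt=0.01, dx=0.02, f=f, df=df)
```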

4.2.2 Euler scheme for adjoint problem

To solve the backward ODE (13), it is convenient to introduce \(Z(t) {:}{=} (M(T-t))^T\) and \({\textbf{q}}(t) {:}{=} {\textbf{p}}(T-t)\), so that we now deal with the Cauchy problem

$$\begin{aligned} {\left\{ \begin{array}{ll} {\textbf{q}}'(t) = Z(t)\,{\textbf{q}}(t), &{} t\in (0,T), \\ {\textbf{q}}(0) = {\textbf{c}}, &{} \, \end{array}\right. } \end{aligned}$$

that we integrate with a classical explicit Euler scheme:

$$\begin{aligned} {\textbf{q}}^1 = {\textbf{c}},\quad {\textbf{q}}^{n+1} = \left( I+\Delta t_n\,Z^n\right) {\textbf{q}}^n, \quad n=1,\cdots ,N_T-1. \end{aligned}$$

The solution is finally recovered using that \({\textbf{p}}^n = {\textbf{q}}^{N_T-n+1}\).
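A minimal sketch of this time-reversed adjoint solve is given below, assuming that the matrix \(M(\cdot )\) of the adjoint ODE (13) is available as a callable; names and signatures are illustrative.

```python
import numpy as np

def solve_adjoint(M_of_t, c, t_grid):
    """Time-reversed adjoint solve: q(t) = p(T - t) satisfies q' = Z(t) q with
    Z(t) = M(T - t)^T, integrated by an explicit Euler scheme on the grid (t_n).
    M_of_t: callable returning the matrix M(t); c: terminal condition p(T)."""
    T = t_grid[-1]
    q = [np.asarray(c, dtype=float)]
    for n in range(len(t_grid) - 1):
        dt = t_grid[n + 1] - t_grid[n]
        Z_n = M_of_t(T - t_grid[n]).T
        q.append(q[-1] + dt * (Z_n @ q[-1]))
    return np.array(q[::-1])                 # p^n = q^{N_T - n + 1}
```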

4.3 Optimization algorithms

The starting point of the algorithm we implement is a standard primal-dual approach, in which the state and the adjoint are computed in order to deduce the gradient of the considered functional. We combine it with a projection method in order to enforce the \(L^\infty \) constraints on the control. This method has the advantage of being robust, as it generally allows a significant decrease of the cost functional. On the other hand, it is often very local, which results in a strong dependence on the initialization. Moreover, one can expect many local minima, since the targeted problem is intrinsically infinite-dimensional, which makes the search difficult.

We will propose a modification of this well-known method, using a fixed point method inspired by the optimality condition stated in Theorem 3.

4.3.1 Projected gradient descent

A direct approach is to consider the gradient algorithm, in which the condition \(0\leqslant u^k_i\leqslant 1\) is handled by projection, according to Algorithm 1. The specific difficulty of this approach is to find a suitable descent step \(\delta _k\), which must be small enough to ensure descent but large enough for the algorithm to converge in a reasonable number of iterations. We therefore combine this algorithm with the scheduler (21), inspired by classical learning-rate schedulers in deep learning [49], to select an acceptable descent step, where \(\delta _0\) and \({\text {decay}}\) are given real numbers.

$$\begin{aligned} \delta _k {:}{=} \frac{\delta _0}{1 + {\text {decay}}\times k}. \end{aligned}$$
(21)
Algorithm 1: Optimal control by projected gradient descent method (GD)
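The following sketch illustrates the structure of Algorithm 1, assuming a routine grad_J that returns the gradient of \(\mathcal {J}_{\theta ,\nu }\) computed from the primal and adjoint solves of Sect. 4.2; the stopping test reuses the residual \(\Lambda \), and all names are ours.

```python
import numpy as np

def projected_gradient_descent(u0, grad_J, delta0, decay, max_iter=100, tol=1e-1):
    """Sketch of the projected gradient loop (Algorithm 1).
    grad_J(u) is assumed to return the gradient of J_{theta,nu} obtained from the
    primal and adjoint solves; u has shape (N_r, N_T)."""
    u = np.clip(u0, 0.0, 1.0)
    for k in range(max_iter):
        g = grad_J(u)
        Lambda = np.minimum(u, np.maximum(u - 1.0, g))   # optimality residual
        if np.linalg.norm(Lambda) < tol:
            break
        delta_k = delta0 / (1.0 + decay * k)             # scheduler (21)
        u = np.clip(u - delta_k * g, 0.0, 1.0)           # projected gradient step
    return u
```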

4.3.2 Fixed point method

A fixed-point (FP) algorithm is derived using the first-order optimality conditions stated in Theorems 2 and 3, by rewriting them in a fixed-point formulation. We have seen that they can be written in the form

$$\begin{aligned} u_i \in I_\mu \Rightarrow \partial _{u_i}\mathcal {J}_{\theta ,\nu }({\textbf{u}})\in E_\mu , \end{aligned}$$

where \(\mu \in \{0, *, 1\}\), \(I_0 = \{0\}\), \(I_*=(0,1)\), \(I_1=\{1\}\), \(E_0 = \mathbb {R}_+\), \(E_*=\{0\}\), \(E_1=\mathbb {R}_-\). This rewrites as

$$\begin{aligned} u_i \in I_\mu \Rightarrow u_i \in F_\mu {:}{=}\{u_i|\partial _{u_i}\mathcal {J}_{\theta ,\nu }({\textbf{u}})\in E_\mu \}. \end{aligned}$$

This leads us to compute \(u_i\) by using the following fixed-point relationship

$$\begin{aligned} u_i^{k+1} = 1 \times \mathbb {1}_{\{\partial _{u_i}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^k) < -\kappa \}} + u_i^k\,\mathbb {1}_{\{ -\kappa \leqslant \partial _{u_i}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^k) \leqslant \kappa \}} + 0 \times \mathbb {1}_{\{\partial _{u_i}\mathcal {J}_{\theta ,\nu }({\textbf{u}}^k) > \kappa \}}, \end{aligned}$$
(22)

where \(\kappa >0\) is a small threshold preventing too small gradients from modifying the control. This finally leads us to Algorithm 2.
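A NumPy version of the update (22) could read as follows; this is only a sketch, where the gradient array is assumed to be provided by the primal and adjoint computations of Sect. 4.2.

```python
import numpy as np

def fixed_point_update(u, grad_J_u, kappa):
    """One fixed-point iteration (22): the control is set to 1 where the partial
    derivative is below -kappa, to 0 where it is above kappa, and left unchanged
    in the dead zone [-kappa, kappa]."""
    return np.where(grad_J_u < -kappa, 1.0,
                    np.where(grad_J_u > kappa, 0.0, u))
```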

Remark 3

An important advantage of the fixed-point formulation is that some common control configurations lead to a flat gradient even though optimization is still possible: gradient descent is too local to detect certain existing descent directions.

Algorithm 2: Optimal control with Fixed Point method (FP)

It is noteworthy that the convergence of a fixed-point algorithm applied to an extremal eigenvalue problem has been established in [11], for certain parameter regimes involved in the problem. To the best of our knowledge, there is no general convergence result for this type of optimal control problem.

4.3.3 Hybrid GDFP methods

Since gradient descent theoretically always guarantees a descent direction (provided the step is chosen small enough) but can be slow and/or remain trapped in a local extremum (see Remark 3), and since the fixed-point method appears more exploratory but does not guarantee descent, it seems worthwhile to investigate the hybridization of both methods.

The chosen algorithm is implemented by computing most of the iterations with the gradient descent method and using the fixed point method every \(K\in \mathbb {N}\) iterations in the expectation of escaping from possible basins of attraction of local minimizers. Note that K is a hyper-parameter of the method. This is summarized in Algorithm 3.

Instead of performing FP steps regularly, we can choose to space them out in order to give the gradient descent more iterations to converge. This second version is given in Algorithm 4, where the number of iterations between two FP steps increases by a factor \(\tau \) (a code sketch of this strategy is given after Algorithm 4 below).

Algorithm 3: Optimal control with hybrid GD & FP algorithm (GDFP)

Algorithm 4: Optimal control with GDFP with spaced FP steps
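To fix ideas, here is a schematic Python implementation of the hybrid strategy, assuming that gd_step and fp_step wrap the updates of Algorithms 1 and 2; with \(\tau =1\) it reduces to the regular spacing of Algorithm 3, and \(\tau >1\) gives the spaced version of Algorithm 4.

```python
def gdfp_spaced(u0, gd_step, fp_step, n_iter, K, tau=1.0):
    """Schematic hybrid GDFP loop: gradient-descent iterations with a fixed-point
    step every K iterations; after each FP step, K is multiplied by tau.
    gd_step and fp_step are assumed to wrap the updates of Algorithms 1 and 2."""
    u, next_fp = u0, K
    for k in range(1, n_iter + 1):
        if k == next_fp:
            u = fp_step(u)                     # exploratory fixed-point step (22)
            K = max(int(round(tau * K)), 1)    # increase the spacing between FP steps
            next_fp += K
        else:
            u = gd_step(u, k)                  # projected gradient step with scheduler (21)
    return u
```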

5 Numerical results

This section presents some results obtained in various situations using the methods presented above. In all that follows, we assume that \(\rho ^{\max } = 1\), \(v^{\max } = 1\) and \(L = 1\). We first validate our approach on single junctions before studying more complex road networks, namely a traffic circle and a three routes network of intermediate size, with a configuration unfavorable to our objective. In all the test cases, we impose a flow equal to \(f_i(\rho _{\text {in}}^b)\) with \(\rho _{\text {in}}^b = 0.66\) on the incoming boundaries of the network and a Neumann flow boundary condition on all outgoing boundaries.

In the following numerical tests, the regularization parameters \(\nu \) and \(\eta \) are ultimately set equal to 0, after testing the algorithm for very small values of these parameters: we found no unusual phenomena in this limit, and the simulations suggest the continuity of the minimizers. The question of choosing \(\nu \) and \(\eta \) is therefore related to the study of the continuous problem mentioned in the concluding Sect. 6.

5.1 Single junctions

We start by considering single junctions of type \(1\! \times \! 1\), \(1\! \times \! 2\), \(2\! \times \! 1\) and \(2\! \times \! 2\) as they will be the building blocks of the more complex networks.

We consider 50 mesh cells per road and an initial density equal to 0.66. The route to empty is always composed of an incoming and an outgoing road. The time interval of the simulations is adjusted so that the route can be completely emptied: the final time T thus equals, respectively, 6, 3.5, 10 and 5 for the four junctions. We start the optimization algorithms with initial controls equal to 0 and use the convergence criterion \(\Vert \Lambda \Vert < 10^{-1}\), with a prescribed maximum of 100 iterations. Neither constraints on the number of controls (\(\theta _S=0\)) nor BV regularization (\(\theta _B=0\)) are considered for these test cases.

Figure 1 compares the cost functional histories obtained with the gradient descent (GD), the fixed-point (FP) and the hybrid (GDFP) methods. The specific numerical parameters of the algorithms are given in Table 1. It should be noted that, although we report results for given parameter values, we have systematically verified that wide ranges of values yield the reported behavior, possibly at the price of a different number of iterations to reach numerical convergence. For example, for the weights \(\theta _S\) and \(\theta _B\), any initial value below \(10^{-5}\) works effectively. The acceptable range for the parameter \(\kappa \) is, for instance, between 0 and 1 for the \(1\times 2\) junction. Of course, the choice of \(\kappa \) influences the number of iterations required to reach numerical convergence.

We observe fundamental differences in behavior between the cost functional histories obtained by GD and by FP. Indeed, GD produces a regular descent, whereas FP generates jumps and oscillations, which allows a better exploration of the control space and avoids certain unsatisfactory local minima.

In Fig. 2, we also plotted the evolution of the optimality conditions \(\Lambda \) for all methods: as expected, we observe that this quantity reaches values close to 0 at the optimal point.

The obtained controls are shown in Fig. 3 and seem relevant. For instance, in the \(2\! \times \! 1\) case, the control of the outgoing road 3 is fully activated only after time \(t\approx 8\) to enable the emptying of the incoming road 1 first. We further note that the controls are essentially sparse and non-oscillating. They are almost bang-bang, i.e., take only values 0 and 1.

These results seem to suggest that the hybrid method is the most likely to generalize to larger graphs because of its ability to explore and find critical points while still being able to provide convergence. It is therefore the one we will use in the following.

Table 1: Numerical parameters for single junctions

Fig. 1: (Single junctions) Cost functional as a function of iterations for the gradient descent (GD), the fixed point (FP) and the hybrid (GDFP) methods

Fig. 2: (Single junctions) Convergence criterion \(||\Lambda ||\) along the iterations

Fig. 3: (Single junctions) Optimal controls obtained with the GDFP method. The routes to empty are respectively (1, 2), (1, 3), (1, 3), (1, 3)

5.2 Traffic circle

We consider the traffic circle test case proposed in [9]. It is composed of 8 roads, as depicted in Fig. 4a: 2 incoming roads, 2 outgoing roads and 4 circle roads. The roads (2, 3, 5, 7) in the circle are oriented counter-clockwise, roads (1, 6) are incoming and roads (4, 8) are outgoing. The initial density is constant, equal to 0.66 on each road, and in Fig. 4b we observe that congestion appears if no control is applied, since the boundary conditions constantly feed new vehicles into the incoming boundaries.

The route to evacuate is chosen to be (1, 2, 3, 5, 8) (see Fig. 4a) and there are no constraints on the maximal number of controls (\(\theta _S = 0\)) or BV regularization (\(\theta _B=0\)). As this test case is more difficult to handle than the single junctions, we use Algorithm 4, which incorporates increasingly large spacing between fixed-point iterations: this gives the gradient iterations more time to converge. The parameters are given in Table 2.

Results are gathered in Fig. 5. The upper left panel depicts the cost functional history. The first four iterations of the gradient descent make the cost functional decrease very slowly, except for the second iteration; a FP step then triggers the jump seen at iteration 5. A few GD steps follow, and another FP step results in a second jump at iteration 10. The gradient descent is then able to reach a satisfactory local minimum by the 12th iteration. We note that the optimality function \(\Lambda \) (Fig. 5) follows essentially the same behavior and reaches a small value at the final iteration.

The control obtained by the algorithm is given in Fig. 5c (bottom left). We first observe that roads 1 and 6 are always controlled, since they are the entry roads of the network: these controls acting on the incoming boundaries have an immediate impact on the total number of cars on the road network. The entrance of road 3 is also always controlled in order to drive the flow towards the outgoing road 4. Furthermore, while the entrance of road 5 is mostly controlled after time \(t=8\) to let the cars from road 3 leave the route first, the control on the entrance of road 7 behaves in the opposite way: up to time \(t=8\), it is activated to let the flow circulate while the entrance of road 8 is open; then, from time \(t=8\) to \(t=10\), it is deactivated as the entrance of the outgoing road 8 is now closed. Finally, in Fig. 5d, we can check that the route is effectively empty at the final time.

Table 2: Numerical parameters for the traffic circle and the three routes networks

Fig. 4: (Traffic circle) Left: Traffic circle configuration. The roads (2, 3, 5, 7) in the circle are counter-clockwise, roads (1, 6) are incoming and roads (4, 8) are outgoing. The route to empty (1, 2, 3, 5, 8) is in red. Right: Numerical simulations without control at time \(T=10\)

Fig. 5: (Traffic circle) Results obtained with GDFP algorithm with spaced FP steps

5.3 Three routes network

We consider the three routes network composed of 23 roads, as shown in Fig. 6a: all the roads are directed from left to right, road 1 and road 11 being respectively the incoming and outgoing leaves. The initial density is still constant, equal to 0.66, and the corresponding incoming flux is imposed at the entrance of the network on road 1. Neumann boundary conditions are used at the end of the outgoing road 11. The traffic distribution at each junction is uniform: there are no preferred trajectories. Figure 6b depicts the result of the simulation without control: the density on the central route tends to saturate, as several roads lead onto it.

In the sequel, we would like to determine an optimal control in order to empty the central route, made of roads (4, 6, 7, 8, 9) (see Fig. 6a), with \(N_{\max }=5\). Note that, contrary to the previous test cases, controlling this route may not prevent the flow from circulating from the incoming to the outgoing road.

Fig. 6: (Three routes network) Left: Three routes network configuration. Road 1 is incoming and road 11 is outgoing and all the others are directed from left to right. The route to empty (4, 6, 7, 8, 9) is in red. Right: Numerical simulations without control at time \(T=30\)

We now activate the regularization by setting \(\theta _S=10^{-4}\) and \(\theta _B=10^{-6}\) at each iteration. We use the numerical parameters of Table 2: in particular, the FP step is used only at iteration 5 (Fig. 7).

Fig. 7: (Three routes network with activated constraints) Controls obtained with the GDFP algorithm

The penalization has clearly solved the two issues: the number of active controls is now less than 7, which is still higher than 5 but far preferable to the earlier 16 from an approximate-control point of view, and there are no time oscillations of the controls, even though such oscillations would have been a mathematically admissible way to satisfy the constraint on the number of simultaneously active controls.

It is worth noticing that the fixed-point algorithm is also able to converge for this case, but only if we first consider the optimal control problem with essentially no limitation on the number of active controls (\(\theta _S = 10^{-8}\)) and no BV regularization (\(\theta _B = 10^{-8}\)) until \(\Vert \Lambda \Vert _2\) becomes lower than 1. We then consider that we are close to a basin of attraction and increase the regularization weights by setting \(\theta _S=10^{-4}\) and \(\theta _B=10^{-6}\). The associated plots are given in Figs. 8 and 9, where we used the parameters of Table 2. Figures 8a and 8b represent the cost functional history and the optimality defect history during the optimization process. We first observe a rapid decrease of the cost functional during the first six FP steps, and \(\Vert \Lambda \Vert _2\) becomes lower than 1 at iteration 7.

Figure 8c shows the control provided by the algorithm at this iteration. The time evolution of the road densities with this control is represented in Fig. 9a. The control performs rather well regarding the final cost functional and the final road densities on the central route. However, as might be expected, this remains rather unsatisfactory given the relatively high number of active controls. When the regularization parameters \(\theta _S\) and \(\theta _B\) are changed at iteration 7, we observe a slight increase of the loss \({\mathcal {J}_\theta }\), mainly due to the increase of \(\theta _S\).

After the fixed-point step at iteration 8, the convergence criterion \(\Vert \Lambda \Vert _2 < 1\) is again satisfied and the algorithm stops.

Fig. 8: (Three routes network with activated constraints) Controls obtained with the FP algorithm

Fig. 9: (Three routes network with scheduled constraints) Densities obtained with the FP algorithm

6 Conclusion

This article has introduced a new network traffic control problem in which the control acts not only on the flux constraints at the entrance of the roads but also on the traffic distribution matrices at the junctions. The theoretical analysis of the optimal control problem led to a hybrid numerical method, which combines the gradient descent method with a fixed-point strategy. The numerical results obtained on the traffic circle and on the three routes network have shown that the method is relatively effective in providing relevant controls. It is also possible to impose constraints on the number of active controls.

This work will be pursued in several directions. Firstly, it would be interesting to show theoretically that optimal controls are indeed bang-bang, i.e., that they take only the two extreme values 0 and 1, as suggested by our numerical results. This could lead to further improvements in the optimization algorithm. For example, we plan to formulate a shape optimization problem whose unknown is the (temporal) domain over which the control is equal to 1, and then apply efficient algorithms for such problems, such as the level set method.

Secondly, the indicator and projection functions make our algorithms sensitive to threshold or scaling effects. Further analysis could prove very useful in determining hyperparameters such as \(\theta \) or \(\kappa \) in a generic way, given that they have been chosen heuristically so far. Based on our tests, a wide range of values for \(\kappa \) has proven effective. However, in some test cases, \(\kappa \) should not be too small in order to ensure the convergence of the algorithm. For instance, in the \(2\times 2\) test case, the FP algorithm is stuck in a local minimum for any \(\kappa \) whose order of magnitude is less than \(10^{-4}\), and converges for every \(\kappa \in [10^{-4},1]\). As for the value of K, it mainly influences the number of iterations required for convergence in the single junction cases. However, in more complex test cases, such as the traffic circle scenario, it can significantly alter the behavior of the algorithm. In these cases, solutions that are less easily reachable may not be found if the fixed-point iterations take steps that are too large, causing the control to move away from these solutions. Finally, the problem has been tackled here using a strategy of discretization (in space) followed by optimization. Another possibility would have been to optimize first and then discretize. However, this would require dealing with a non-differentiable cost function, since the density is no longer differentiable. Such a strategy may give rise to a different type of algorithm.

More generally, we consider it important to clarify the links between the solutions of fully discretized (in time and space), semi-discretized (in space) and continuous (without discretization) versions of the problem. These are fairly advanced \(\Gamma \)-convergence issues, which will be the subject of future reflection. In particular, we will seek to implement an argument to certify that the approach chosen in this article does not create “numerical pathology” when trying to approximate solutions to the continuous version of the optimization problem.

A longer-term objective we plan to tackle in the coming years is to generalize and test our algorithm both on larger road networks and on real data. To this end, we will seek to couple this method with relevant initialization methods for which initial control would be obtained by considering a macroscopic graph and implementing model reduction methods. Such an approach has, for example, been developed in a different context in epidemiology [15].