1 Introduction

Transportation systems play an important role in modern life. Due to population growth and the massive production of vehicles, traffic congestion in metropolitan areas has become a common daily occurrence. To a commuter or traveler, congestion means loss of time, potentially missed business opportunities, and increased stress and frustration. To an employer, congestion means lost worker productivity, reduced trade opportunities, delivery delays, and increased costs (Wen 2008). A significant component of this cost is the value of wasted fuel and lost productivity: in 2010, traffic congestion cost about US$115 billion in the 439 urban areas of the United States alone (Schrank et al. 2011).

Minimizing driving time directly impacts quality of life. One way to reduce travel time is to lower congestion by redistributing traffic throughout the network. Improvements in transportation systems require a careful analysis of several factors. Different alternatives are evaluated using models that attempt to capture the nature of transportation systems and thus allow the estimation of the effect of future changes on system performance. Performance measures include efficiency in time and cost, safety, and social and environmental impact, among others.

Several strategies have been proposed to reduce traffic congestion. Among them, the deployment of tolls on certain roads can induce drivers to choose alternative routes, thus reducing congestion as the result of a better traffic flow distribution. Naturally, tolls can increase the cost of a trip, but this can be compensated by shorter travel times, reduced fuel costs, and lower stress. In the 1950s, Beckmann et al. (1956) proposed the use of tolls with this objective. The idea has made its way into modern transportation networks. Singapore introduced road pricing in 1975, a scheme later upgraded to Electronic Road Pricing (ERP). Several cities in Europe and the United States, such as London and San Diego, have begun to charge tolls on their transportation networks (Bai et al. 2010). In fact, tolls are being deployed for traffic engineering in many small as well as large cities around the world.

Determining the location of tollbooths and their corresponding tariffs is a combinatorial optimization problem. This problem has attracted interest in the scientific community not only because of its intrinsic difficulty, but also because of the social importance and impact of its solution.

The optimization of transportation network performance has been widely discussed in the literature. The minimum tollbooth problem (MINTB), first introduced by Hearn and Ramana (1998), aims at minimizing the number of toll locations needed to achieve system optimality. Yang and Zhang (2003) formulate second-best link-based pricing as a bi-level program and solve it with a genetic algorithm. Bai et al. (2010) show that the problem is NP-hard and propose a local search heuristic. Two other related problems are MINSYS and MINREV, which minimize the total toll revenue collected: in MINREV tolls can be negative as well as positive, while MINSYS does not accept negative tolls (Hearn and Ramana 1998; Dial 1999a, b; Hearn and Yildirim 2002; Bai et al. 2004). For a complete review of the design and evaluation of road network pricing schemes we refer the reader to the survey by Tsekeris and Voß (2009).

Two important transportation network concepts were introduced by Wardrop (1952): user equilibrium (UE) and system optimal (SO). The former refers to the equilibrium reached when each user chooses a route that minimizes his/her own cost in a congested network. In a UE state, no user can reduce his/her own travel cost by unilaterally changing routes. In contrast, SO refers to a state of equilibrium with minimum average journey time, which occurs when the users cooperate in choosing their routes. In practice, however, users usually choose their own routes in a non-cooperative manner. Simplified behavioral models assume that users choose their routes according to different criteria. One possible simplification assumes that users choose their routes considering only fixed costs such as free flow travel time, or a value that depends on the congestion, or even only the toll values. These situations do not correspond to user equilibrium, but model different behaviors of the users.

In this paper, we approach the tollbooth problem by routing on shortest paths, as first studied in Buriol et al. (2010). The objective is to determine the location of a fixed number \({\mathcal {K}}\) of tollbooths and set their corresponding tariffs so that users travel on shortest paths between origin and destination, reducing network congestion. We calculate shortest paths according to two weight functions. In the first, the weights correspond to the tariffs of the tolled arcs. The second function considers as the weight of each arc its toll tariff added to its free flow time, where the free flow time of an arc is defined to be the congestion-free time to traverse the arc. We also present mathematical models for the minimum average link travel time and for the tollbooth problem. We further propose two piecewise-linear functions that approximate an adapted convex travel cost function of the Bureau of Public Roads (1964) for measuring link congestion. Finally, we extend the work in Buriol et al. (2010) by presenting a larger set of experiments, a new arc weight function for calculating shortest paths, a review of the algorithm components such as the local search, and a more detailed analysis of the algorithm's behavior, including a new set of instances and an analysis of characteristics of the final solutions.

This paper is organized as follows. In Sect. 2 we present mathematical models for the minimum average link travel time, the tollbooth problem, and two approximate piecewise-linear functions for travel cost. The biased random-key genetic algorithm with local search proposed in Buriol et al. (2010) is presented in Sect. 3. Computational results are reported in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 Problem formulation

A road network can be represented as a directed graph \(G=(V,A)\), where \(V\) represents the set of nodes (street or road intersections or points of interest) and \(A\) the set of arcs (street or road segments). Each arc \(a \in A\) has an associated capacity \(c_a\) and a time \(t_a\), called the free flow time, necessary to traverse the unloaded arc \(a\). To calculate the congestion on each link, a potential function \(\Phi _a\) is computed as a function of the load or flow \(\ell _a\) on arc \(a\), along with \(\alpha _a\) and \(\beta _a\), two real-valued arc-tuning parameters. In addition, let

$$\begin{aligned} K =\left\{ (o(1),d(1)), (o(2),d(2)), \ldots , (o(|K|),d(|K|))\right\} \subseteq V \times V \end{aligned}$$

denote the set of commodities or origin-destination (OD) pairs, where \(o(k)\) and \(d(k)\) represent, respectively, the origin and destination nodes for \(k = 1,\ldots , |K|\). Each commodity \(k\) has an associated demand of traffic flow \(d_k=d_{o(k),d(k)}\), i.e., for each OD pair \((o(k), d(k))\), there is an associated flow \(d_k\) that emanates from node \(o(k)\) and terminates at node \(d(k)\). In this paper we address the problem in which all the demand is routed on the network such that traffic congestion is minimized. To encourage traffic to take particular routes, we resort to levying tolls on selected street or road segments.

Before we describe our mathematical models, some notation is introduced. We denote by \( IN (v)\) the set of incoming arcs to node \(v \in V\), by \( OUT (v)\) the set of outgoing arcs from node \(v \in V\), by \(a = (a_t, a_h) \in A\) a directed arc of the network, where \(a_t \in V\) and \(a_h \in V\) are, respectively, the tail and head nodes of arc \(a\), by \({\mathcal {S}}=\sum _{k = 1}^{|K|} d_k\) the total sum of demands, and by \({\mathcal {Q}} \subseteq V\) the set of destination nodes. Moreover, we denote by \(\Phi _a\) the traffic congestion of arc \(a \in A\), and by \({\mathcal {K}}\) the number of tollbooths to deploy (tolls are levied on users of the network at tollbooths). The values of \(\varphi _a^u\) and \(\varphi _a^l\) are approximations of traffic congestion cost on arc \(a \in A\) given by piecewise-linear functions. We note that throughout the paper we refer to flow and load interchangeably, as we do for commodity and demand.
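
To fix ideas, the notation above can be mirrored directly in code. The following Python sketch is purely illustrative (the class and attribute names are ours, not part of the paper or of any library); the later sketches in this paper assume this container.

```python
# Illustrative data structures for the network notation above (names are ours, not the paper's).
from dataclasses import dataclass, field

@dataclass
class Arc:
    tail: int          # a_t
    head: int          # a_h
    capacity: float    # c_a
    fftime: float      # t_a, free flow time
    alpha: float       # alpha_a
    beta: float        # beta_a

@dataclass
class Network:
    nodes: list
    arcs: list                                   # list of Arc
    demand: dict = field(default_factory=dict)   # (o(k), d(k)) -> d_k

    def OUT(self, v):                            # indices of arcs leaving v
        return [i for i, a in enumerate(self.arcs) if a.tail == v]

    def IN(self, v):                             # indices of arcs entering v
        return [i for i, a in enumerate(self.arcs) if a.head == v]

    def total_demand(self):                      # S, the total sum of demands
        return sum(self.demand.values())

    def destinations(self):                      # Q, the set of destination nodes
        return {d for (_, d) in self.demand}
```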

In the next subsection we present a mathematical model of a relaxation of the tollbooth problem that does not take into account shortest paths. In Sect. 2.2 a complete model for the tollbooth problem is presented and in Sect. 2.3 we propose two piecewise-linear functions that approximate the convex cost function.

2.1 Model for minimization of average user travel time (MM1)

Traffic congestion cost can be evaluated in different ways, according to specific goals. In this paper we use the potential function

$$\begin{aligned} \Phi = \sum _{a \in A}{\Phi _a}, \text{ where }~ \Phi _a=\frac{\ell _a}{{\mathcal {S}}} t_a \left[ 1+\beta _a\left( \frac{\ell _a}{c_a}\right) ^{\alpha _a}\right] , \text{ for } \text{ all }~a \in A, \end{aligned}$$

which is the convex travel cost function of the Bureau of Public Roads (1964) for measuring link congestion, scaled by the term \(\ell _a/{\mathcal {S}}\). This way, the potential function evaluates the average user travel time over all trips. Function \(\Phi _a\) is convex, nonlinear, and strictly increasing in \(\ell _a\).
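
For concreteness, the scaled BPR cost above translates directly into code. The sketch below assumes the illustrative Network/Arc containers sketched earlier and is not part of the authors' implementation.

```python
# Direct transcription of Phi_a and Phi = sum of Phi_a over all arcs.
def phi_arc(load, t, c, alpha, beta, S):
    """(l_a / S) * t_a * [1 + beta_a * (l_a / c_a)^alpha_a]."""
    return (load / S) * t * (1.0 + beta * (load / c) ** alpha)

def phi_total(loads, net):
    """Phi for a vector of per-arc loads, given a Network as sketched earlier."""
    S = net.total_demand()
    return sum(phi_arc(loads[i], a.fftime, a.capacity, a.alpha, a.beta, S)
               for i, a in enumerate(net.arcs))
```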

A mathematical programming model of average user travel time is

$$\begin{aligned} \min \;&\Phi = \sum _{a \in A}\ell _a t_a\bigl [1+\beta _a(\ell _a/c_a)^{\alpha _a}\bigr ]/{\mathcal {S}} \end{aligned}$$
(1)
$$\begin{aligned} \text {subject to: } \nonumber \\&\ell _a=\sum _{q \in {\mathcal {Q}}}x_a^{q}, \; \forall a \in A, \end{aligned}$$
(2)
$$\begin{aligned}&\sum _{a\in OUT (v)} x_a^{q} - \sum _{a\in IN (v)}x_a^{q} = d_{v,q}, \; \forall v \in V \backslash \{q\}, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(3)
$$\begin{aligned}&x_{a}^{q} \ge 0, \quad \forall a \in A, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(4)
$$\begin{aligned}&\ell _a \ge 0, \quad \forall a \in A. \end{aligned}$$
(5)

Its goal is to determine flows on each arc such that the average user travel time is minimized. In this model, decision variables \(x_a^{q} \in \mathbb {R}^+\) represent the total flow to destination \(q \in {\mathcal {Q}}\) on arc \(a \in A\), and variables \(\ell _a \in \mathbb {R}^+\) represent the total flow on arc \(a \in A\). Objective function (1) minimizes average user travel time. Constraints (2) define total flow on each arc \(a \in A\) taking into consideration the contribution of all commodities. Constraints (3) guarantee flow conservation, and (4)–(5) define the domains of the variables.

This model computes a flow distribution without taking into account that users travel on least-cost routes, and therefore provides a lower bound for the tollbooth problem described in the next subsection.
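
As an illustration of how MM1 can be stated in code, the sketch below uses cvxpy (an assumption on our part; any convex solver able to handle the power terms would do) together with the illustrative Network container sketched earlier. It is a minimal sketch of (1)–(5), not the authors' implementation.

```python
# Sketch of MM1 (1)-(5) with cvxpy, assumed installed; destination-aggregated flow variables.
import cvxpy as cp

def solve_mm1(net):
    dests = sorted(net.destinations())
    S = net.total_demand()
    nA = len(net.arcs)
    x = {(i, q): cp.Variable(nonneg=True) for i in range(nA) for q in dests}   # (4)
    load = {i: sum(x[i, q] for q in dests) for i in range(nA)}                 # (2)
    obj = 0
    for i, a in enumerate(net.arcs):                                           # objective (1)
        obj += a.fftime * load[i] / S \
             + a.fftime * a.beta * cp.power(load[i], a.alpha + 1) / (a.capacity ** a.alpha * S)
    cons = []
    for q in dests:                                                            # conservation (3)
        for v in net.nodes:
            if v == q:
                continue
            balance = sum(x[i, q] for i in net.OUT(v)) - sum(x[i, q] for i in net.IN(v))
            cons.append(balance == net.demand.get((v, q), 0.0))
    prob = cp.Problem(cp.Minimize(obj), cons)
    prob.solve()
    return prob.value, [load[i].value for i in range(nA)]
```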

2.2 Model for the tollbooth problem (MM2)

A mathematical programming model for the tollbooth problem is

$$\begin{aligned} \min \;&\; \Phi = \sum _{a \in A} \ell _a t_a \bigl [1 + \beta _a(\ell _a/c_a)^{\alpha _a}\bigr ]/{\mathcal {S}} \end{aligned}$$
(6)
$$\begin{aligned} \text {subject to:} \nonumber \\&\ell _a=\sum _{q \in {\mathcal {Q}}}x_a^{q}, \; \forall a \in A, \end{aligned}$$
(7)
$$\begin{aligned}&\sum _{a\in OUT (v)} x_a^{q} - \sum _{a\in IN (v)}x_a^{q} = d_{v,q}, \; \forall v \in V \backslash \{q\}, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(8)
$$\begin{aligned}&C_a+w_a+\delta _{a_h}^{q}-\delta _{a_t}^{q}\ge 0, \; \forall a\in A, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(9)
$$\begin{aligned}&\delta _q^{q}=0,\; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(10)
$$\begin{aligned}&C_a+w_a+\delta _{a_h}^{q}-\delta _{a_t}^{q}\ge (1-y_a^{q})/M_1, \; \forall a\in A, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(11)
$$\begin{aligned}&C_a+w_a+\delta _{a_h}^{q}-\delta _{a_t}^{q}\le (1-y_a^{q})M_2, \; \forall a\in A, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(12)
$$\begin{aligned}&M_3 y_a^{q}\ge x_a^{q},\; \forall a\in A, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(13)
$$\begin{aligned}&M_3 y_{a}^{q} +M_3 y_{b}^{q}\le 2M_3-x_{a}^{q}+x_{b}^{q},\; \forall a, b \in \mathsf {A}_{ OUT (v)}^2, \forall v \in V, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(14)
$$\begin{aligned}&P_{l}p_a\le w_a\le P_{u}p_a,\; \forall a \in A, \end{aligned}$$
(15)
$$\begin{aligned}&\sum _{a\in A}p_a={\mathcal {K}}, \end{aligned}$$
(16)
$$\begin{aligned}&x_{a}^{q} \ge 0, \; \forall a \in A, \; \forall q \in {\mathcal {Q}}, \end{aligned}$$
(17)
$$\begin{aligned}&\ell _a\ge 0,\; \forall a \in A, \end{aligned}$$
(18)
$$\begin{aligned}&w_a\ge 0,\; \forall a \in A, \end{aligned}$$
(19)
$$\begin{aligned}&\delta _v^{q}\ge 0,\; \forall q \in {\mathcal {Q}},\; \forall v \in V, \end{aligned}$$
(20)
$$\begin{aligned}&p_a \in \{0,1\},\; \forall a \in A. \end{aligned}$$
(21)

This model seeks to levy tolls on \({\mathcal {K}}\) arcs of the transportation network such that the average user travel time is minimized if traffic is routed on least-cost paths. Here, the cost of a path is defined to be the sum of the tolls levied on the arcs of the path, or the sum of tolls and free flow times. We later describe these arc weight functions in more detail.

The decision variables for this model determine whether an arc will host a tollbooth and the amount of toll levied at each deployed tollbooth. Denote by \(w_a \in \{ 0, P_l, P_l+1, \ldots , P_u \}\) the toll tariff levied on arc \(a \in A\), where \(P_{l}, P_{u} \in \mathbb {N}^+\) are the minimum and maximum tariff values, respectively. For convenience we define \(P_l=1\). If no toll is levied on arc \(a\), then \(w_a= 0\). The binary decision variable \(p_a = 1\) if a tollbooth is deployed on arc \(a \in A\). The auxiliary binary variable \(y_a^{q} = 1\) if arc \(a \in A\) is part of a shortest path to destination node \(q \in {\mathcal {Q}}\). Finally, auxiliary variable \(\delta _v^{q}\) is the shortest-path distance from node \(v \in V\) to destination node \(q \in {\mathcal {Q}}\), and the constants \(M_1,\,M_2\), and \(M_3\) are sufficiently large numbers.

Objective function (6) minimizes average user travel time. Constraints (7) define the total flow on each arc \(a \in A\), while constraints (8) impose flow conservation. The other constraints force the flow of each commodity to follow the shortest path between the corresponding OD pair. An arc \(a\) belongs to the shortest path to destination \(q\) if the distance difference \(\delta _{a_t}^{q}-\delta _{a_h}^{q}\) is equal to the arc cost, which in this case is \(C_a+w_a\), where \(C_a\) will be introduced later in this subsection. Thus, constraints (9) define the shortest path distance for each node \(v \in V\) and each destination \(q \in {\mathcal {Q}}\). For consistency, constraints (10) require, for all \(q \in {\mathcal {Q}}\), that the shortest distance from \(q\) to itself be zero. Constraints (11) and (12) together with (9) and (10) determine whether arc \(a \in A\) belongs to the shortest path and thus determine the values of \(y_a^{q}\), for \(q \in {\mathcal {Q}}\). Constraints (11) require that an arc that does not belong to the shortest path have reduced cost \(C_a+w_a+\delta _{a_h}^{q}-\delta _{a_t}^{q} > 0\). Constraints (12) assure that if the reduced cost of arc \(a \in A\) and destination \(q \in {\mathcal {Q}}\) is equal to zero, then arc \(a\) belongs to the shortest path to destination \(q\), i.e. \(y_a^{q} = 1\). In the computational experiments of Sect. 4.2, we used \(M_1=100\) and \(M_2=1{,}000\). Constraints (13) assure that flow is sent only on arcs belonging to a shortest path. Constraints (14) are the even-split constraints. They guarantee that flow is split evenly among all shortest paths. In these constraints, \(\mathsf {A}_{ OUT (v)}^2\) is the set of all ordered pairs of two distinct elements of \( OUT (v)\). We later discuss these constraints in more detail. Constraints (15) limit the minimum and maximum tariff for a deployed tollbooth. Constraint (16) requires that exactly \({\mathcal {K}}\) tolls be deployed. The remaining constraints define the domains of the variables.

Constraints (14) come in pairs for each node \(v \in V\): for every two distinct outgoing links \(a \in OUT (v)\) and \(b \in OUT (v)\), both ordered pairs \((a,b) \in \mathsf {A}_{ OUT (v)}^2\) and \((b,a) \in \mathsf {A}_{ OUT (v)}^2\) contribute a constraint. They model load balancing by assuring that if the flow from node \(v \in V\) to destination \(q \in {\mathcal {Q}}\) is routed on both arcs \(a \in A\) and \(b \in A\), i.e. if \(y_a^{q}=y_b^{q}=1\), then the flow on these arcs must be evenly split, i.e. \(x_a^{q}=x_b^{q}\). To see this, suppose \(y_a^{q}=y_b^{q}=1\). The constraint for pair \((a,b) \in \mathsf {A}_{ OUT (v)}^2\) implies that \(x_a^{q} \le x_b^{q}\). By symmetry, the constraint for pair \((b,a) \in \mathsf {A}_{ OUT (v)}^2\) implies that \(x_a^{q} \ge x_b^{q}\). Consequently, \(x_a^{q} = x_b^{q}\). Note that taking \(M_3 = \max _{q \in {\mathcal {Q}}} \left( \sum _{v \in V}d_{v,q} \right) \) we assure that the right-hand side of constraint (14) is bounded from below by \(M_3\), making these constraints redundant for pairs of links with at most one of \(y_a^{q}\) or \(y_b^{q}\) equal to one.
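
To make the construction of (13)–(14) concrete, the sketch below generates them for one destination with cvxpy, assuming the flow variables x[(i, q)] and the binary variables y[(i, q)] have already been created and using the illustrative Network container sketched earlier; it is not the implementation used in the experiments.

```python
# Generating constraints (13)-(14) for a single destination q.
from itertools import permutations

def even_split_constraints(net, x, y, q):
    # M3 = max over destinations of the total demand routed to that destination.
    M3 = max(sum(val for (orig, dest), val in net.demand.items() if dest == qq)
             for qq in net.destinations())
    cons = []
    for i in range(len(net.arcs)):
        cons.append(M3 * y[i, q] >= x[i, q])                    # (13): flow only on SP arcs
    for v in net.nodes:
        for a, b in permutations(net.OUT(v), 2):                # ordered pairs of OUT(v)
            cons.append(M3 * y[a, q] + M3 * y[b, q]
                        <= 2 * M3 - x[a, q] + x[b, q])          # (14): even split
    return cons
```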

A model for OSPF routing, which also considers shortest paths and even flow splitting, was previously proposed in Broström and Holmberg (2006). In their model a shortest path graph is built for each OD pair, while we opted for building a shortest path graph from all nodes to each node \(q \in {\mathcal {Q}}\). This modification reduces the number of variables and constraints of the model.

We evaluate shortest paths according to two weight functions. In the first approach, called SPT (Shortest Path Toll), we define the weight of an arc \(a \in A\) to be the tariff \(w_a\) levied on that arc. In this case, we set \(C_a = \epsilon \), a sufficiently small value. This way, when there are one or more zero-cost paths, the flow is always sent along paths having smallest hop count. In the second approach, called SPTF (Shortest Path Toll+Free flow time), we define the weight of an arc \(a \in A\) to be the tariff \(w_a\) levied on the arc plus the free flow time \(t_a\) of the arc, i.e. parameter \(C_a = t_a + \epsilon \). The value \(\epsilon > 0\) is added to the cost with the same goal as in the case of SPT since it is possible that \(t_a = 0\) for one or more arcs \(a \in A\).
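
The two weight functions can be summarized as follows; the sketch is illustrative, with EPS playing the role of \(\epsilon \).

```python
# Arc weights used for shortest-path computations: SPT and SPTF.
EPS = 1e-6   # illustrative value for epsilon

def arc_weight(arc, toll, mode):
    """toll is w_a (0 if the arc is untolled); mode is 'SPT' or 'SPTF'."""
    if mode == 'SPT':
        return toll + EPS               # C_a = epsilon
    if mode == 'SPTF':
        return toll + arc.fftime + EPS  # C_a = t_a + epsilon
    raise ValueError(mode)
```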

2.3 Piecewise-linear functions for the models

The performance of mixed-integer linear programming solvers has improved considerably over the last few years. The two mathematical programming models presented so far have a nonlinear objective function \(\Phi \). To apply these solvers, one must first linearize \(\Phi \), resulting in an approximation of the nonlinear objective function. One option is to approximate the nonlinear function by a piecewise-linear function. Fortz and Thorup (2004) proposed a piecewise-linear function for a general routing problem to approximate network congestion cost. Ekström et al. (2012) describe an iterative piecewise-linear approximation of travel time and total travel time, resulting in a mixed-integer linear program.

In this subsection, we propose two piecewise-linear approximations of the function \(\Phi = \sum _{a \in A} \Phi _a\). The first linearization, \(\varphi ^u\), is an overestimation and, under certain conditions, an upper bound of \(\Phi \). The second linearization, \(\varphi ^l\), is an underestimation and provides a lower bound of \(\Phi \). It is possible to apply these linearizations to any model with this type of nonlinear function. We apply them to models MM1 and MM2.

Let \(\Omega \) be the set of constraints (2)–(5) or (7)–(21) of the previously described mathematical models. For the case where \(\Omega \) represents the constraints of the MM1 model the approximation is called LMM1. On the other hand, when \(\Omega \) represents the constraints of the MM2 model, we call the approximation LMM2.

In approximation \(\varphi ^u\), the cost function of each arc \(a \in A\) is composed of a series of line segments sequentially connecting coordinates

$$\begin{aligned} (X_0,\Phi _a(X_0)), \left( X_1,\Phi _a(X_1)\right) , \ldots , (X_n,\Phi _a(X_n)), \end{aligned}$$

where values \(X_0, X_1, \ldots , X_n\) are given such that \(X_0 = 0\), and for \(i=1,\ldots ,n,\,X_i \in \mathbb {R}\) and \(X_i > X_{i-1}\).

If we denote the cost on arc \(a \in A\) by \(\varphi _a^u\), then the resulting mathematical programming model of the overestimation \(\varphi ^u\) is

$$\begin{aligned} \min&\; \sum _{a \in A}\varphi _a^u \end{aligned}$$
(22)
$$\begin{aligned} \text {subject to:} \nonumber \\&\text{ Constraints } \,\Omega \,\text{ are } \text{ satisfied, } \end{aligned}$$
(23)
$$\begin{aligned}&(m_a^i/c_a)\ell _a+b_a^i\le \varphi _a^u, \; \forall a \in A,\; \forall i=1,\ldots ,n, \end{aligned}$$
(24)
$$\begin{aligned}&\varphi _a^u \ge 0,\; \forall a \in A, \end{aligned}$$
(25)

where

$$\begin{aligned} m_a^i&=(\Phi _a(X_{i})-\Phi _a(X_{i-1}))/(X_{i}-X_{i-1}),\\ b_a^i&=\Phi _a(X_{i})-X_{i}m_a^i, \end{aligned}$$

with

$$\begin{aligned} \Phi _a(X_i) = X_i c_a t_a (1+\beta _a(X_i)^{\alpha _a})/{\mathcal {S}} \end{aligned}$$

for \(X_0=0<X_1<\cdots <X_n\). Objective function (22) minimizes the approximation of average user travel time. Constraints (24) evaluate the partial cost on each arc by determining the approximate value \(\varphi _a^u\) for \(\Phi _a\) according to load \(\ell _a\). Constraints (25) define the domain of the variables.
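
The segment slopes \(m_a^i\) and intercepts \(b_a^i\) can be precomputed per arc as in the sketch below (illustrative code, reusing the Arc container sketched earlier; the breakpoints are given as ratios \(\ell _a/c_a\)).

```python
# Slopes and intercepts of the overestimation phi^u for one arc.
def phi_at_ratio(X, arc, S):
    """Phi_a evaluated at load l_a = X * c_a."""
    return X * arc.capacity * arc.fftime * (1.0 + arc.beta * X ** arc.alpha) / S

def upper_segments(X, arc, S):
    """X = [X_0=0, X_1, ..., X_n]; returns (m_a^i, b_a^i) for i = 1, ..., n."""
    segs = []
    for i in range(1, len(X)):
        m = (phi_at_ratio(X[i], arc, S) - phi_at_ratio(X[i - 1], arc, S)) / (X[i] - X[i - 1])
        b = phi_at_ratio(X[i], arc, S) - X[i] * m
        segs.append((m, b))   # used in (24) as (m / c_a) * l_a + b <= phi_a^u
    return segs
```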

The linearization requires the definition of the breakpoints \(X_0,X_1,\ldots ,X_n\), whose values are expressed as values of the ratio \({\ell _a}/{c_a}\). The number of breakpoints can be chosen according to the accuracy required for the linearization of the cost function, or according to characteristics of the set of instances. This choice involves a trade-off between the accuracy of the computed solution and the time needed to solve the resulting model: a large \(n\) tends to provide a better approximation of the original function, while a small \(n\) saves solution time, since each breakpoint entails \(|A|\) additional constraints.

A second linearization, which we denote by \(\varphi ^l_a\), is an underestimation and gives us a lower bound on \(\Phi _a\). The mathematical model of this linearization is similar to that of the overestimation. However, to estimate \(\varphi _a^l\), we first compute the slope \(m_a(x)\) of \(\Phi _a\) at \(x = (X_{i-1}+X_{i})/{2}\), for \(i=1,\ldots ,n\), as

$$\begin{aligned} m_a(x) = \frac{\partial \Phi _a}{\partial x}= \frac{t_a}{{\mathcal {S}}}+\frac{(\alpha _a+1)t_a\beta _a x^{\alpha _a}}{c_a^{\alpha _a}{\mathcal {S}}}. \end{aligned}$$

Given \(x\) and \(m_a(x)\), the independent term can be easily computed from the tangency condition, i.e. by requiring the segment to touch \(\Phi _a\) at \(x\).
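
A sketch of the corresponding computation follows; it reuses phi_at_ratio from the previous sketch and adopts one possible reading of the construction, namely that the tangent is taken with respect to the load at the midpoint of each breakpoint interval.

```python
# Tangent segments of the underestimation phi^l for one arc.
def lower_segments(X, arc, S):
    segs = []
    for i in range(1, len(X)):
        mid = 0.5 * (X[i - 1] + X[i])                  # midpoint, as a ratio l_a / c_a
        load_mid = mid * arc.capacity                  # corresponding load
        m = arc.fftime / S + (arc.alpha + 1) * arc.fftime * arc.beta \
            * load_mid ** arc.alpha / (arc.capacity ** arc.alpha * S)
        b = phi_at_ratio(mid, arc, S) - m * load_mid   # tangency: segment touches Phi_a at mid
        segs.append((m, b))                            # used as m * l_a + b <= phi_a^l
    return segs
```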

Linearizations \(\varphi _a^l\) and \(\varphi _a^u\) produce, respectively, an underestimation and an overestimation of \(\Phi _a\), as Proposition 1 states.

Proposition 1

Let \(\varphi ^u = \sum _{a \in A} \varphi _a^u\), \(\varphi ^l = \sum _{a \in A} \varphi _a^l\), and as before \(\Phi = \sum _{a \in A} \Phi _a\). Let \(X_0, X_1,\ldots ,X_n\) be the values for which the approximation is computed. If \({\ell _a}/{c_a} \le X_n, \forall a \in A\), then \(\varphi ^l \le \Phi \le \varphi ^u\).

Proof

Since each \(\Phi _a\) is convex, by construction \(\varphi _a^l \le \Phi _a\), and therefore \(\Phi =\sum _{a \in A} \Phi _a \ge \sum _{a \in A} \varphi _a^l=\varphi ^l\), i.e. \(\Phi \ge \varphi ^l\). Furthermore, if \({\ell _a}/{c_a}\le X_n\) for all \(a \in A\), then by construction \(\varphi _a^u \ge \Phi _a\), which implies that \(\varphi ^u=\sum _{a \in A} \varphi _a^u \ge \sum _{a \in A} \Phi _a=\Phi \), i.e. \(\varphi ^u \ge \Phi \). Therefore \(\varphi ^l \le \Phi \le \varphi ^u\). \(\square \)

Note that Proposition 1 requires no assumption for the underestimation \(\varphi ^l\) to be a lower bound of \(\Phi \), whereas the overestimation \(\varphi ^u\) is guaranteed to be an upper bound only if \({\ell _a}/{c_a} \le X_n\) holds for all \(a \in A\).

A representation of the functions \(\varphi _a^u\), \(\varphi _a^l\), and \(\Phi _a\) is depicted in Fig. 1. It shows the cost function \(\Phi _a\) (solid line) as well as the piecewise-linear cost functions \(\varphi _a^u\) and \(\varphi _a^l\) for an arc \(a \in A\) with \(t_a=5\), \(c_a=200\), \(\alpha _a=4\), \(\beta _a=0.15\), and \({\mathcal {S}}=1{,}000\), using \(\{X_0, X_1, \ldots , X_6\}=\{0, 0.65, 1, 1.25, 1.7, 2.7, 5\}\). Observe that there is a higher concentration of breakpoints in the range \(\ell _a/c_a \in [0.65, 1.25]\). Breakpoints are concentrated in this region because the flow on the majority of the arcs lies close to their capacity, so a good approximation requires several \(X\) values around \(\ell _a/c_a = 1\). Note that a ratio \(\ell _a/c_a > 1\) indicates that the arc is overloaded.

Fig. 1 Comparison of the cost function with the piecewise-linear cost functions

3 A biased random-key genetic algorithm

In this section we describe the biased random-key genetic algorithm (BRKGA) for the tollbooth problem, proposed in Buriol et al. (2010).

A random-key genetic algorithm (RKGA) is a metaheuristic, originally proposed by Bean (1994), for finding optimal or near-optimal solutions to optimization problems. RKGAs encode solutions as vectors of random keys, i.e. randomly generated real numbers in the interval \((0,1]\). An RKGA starts with a set (or population) of \(p\) random vectors of size \(n\). Parameter \(n\) depends on the encoding while parameter \(p\) is user-defined. Starting from the initial population, the algorithm generates a series of populations. Each iteration of the algorithm is called a generation. The algorithm evolves the population over the generations by combining pairs of solutions from one generation to produce offspring solutions for the following generation.

RKGAs rely on decoders to translate a vector of random keys into a solution of the optimization problem being solved. A decoder is a deterministic algorithm that takes as input a vector of random keys and returns a feasible solution of the optimization problem as well as its cost (or fitness).

At the \(k\)th generation, the decoder is applied to all newly created random-key vectors, and the population is partitioned into a smaller set of \(p_e\) elite solutions, i.e., the fittest \(p_e\) solutions in the population, and a larger set of \(p-p_e>p_e\) non-elite solutions. Population \(k+1\) is generated as follows. All \(p_e\) elite solutions of population \(k\) are copied without change to population \(k+1\). This elitist strategy keeps the best solutions found so far in the population. In biology, as well as in genetic algorithms, evolution only occurs if mutation is present. Unlike most genetic algorithms, RKGAs do not use a mutation operator that modifies each component of a solution with small probability. Instead, \(p_m\) mutants are added to population \(k+1\). A mutant is simply a vector of random keys, generated in the same way a solution of the initial population is generated.

With \(p_e + p_m\) solutions accounted for in population \(k+1\), \(p-p_e-p_m\) additional solutions must be generated to complete the \(p\) solutions that make up population \(k+1\). This is done through mating or crossover. In the RKGA of Bean (1994), two solutions are selected at random from the entire population. One is parent-\(A\) while the other is parent-\(B\). A child \(C\) is produced by combining the parents using parameterized uniform crossover (Spears and DeJong 1991). Let \(\rho _A > 1/2\) be the probability that the offspring solution inherits the key of parent-\(A\) and \(\rho _B=1-\rho _A\) be the probability that it inherits the key of parent-\(B\), i.e.

$$\begin{aligned} c_i = {\left\{ \begin{array}{ll} a_i &{}\text {with probability}\,\, \rho _A,\\ b_i &{}\text {with probability}\,\, \rho _B=1-\rho _A, \end{array}\right. } \end{aligned}$$

where \(a_i\) and \(b_i\) are, respectively, the \(i\)-th key of parent-\(A\) and parent-\(B\), for \(i=1,\ldots ,n\). This crossover always produces a feasible solution since \(c\) is also a vector of random keys and by definition the decoder takes as input any vector of random keys and outputs a feasible solution.

Biased random-key genetic algorithms (Gonçalves and Resende 2011) differ from Bean’s algorithm in the way parents are selected. In a BRKGA parent-\(A\) is always selected at random from the set of \(p_e\) elite solutions while parent-\(B\) is selected at random from the set of \(p-p_e\) non-elite solutions. The selection process is biased since an elite solution \(s\) has probability \(Pr(s)=1/p_e\) of being selected for mating while a non-elite solution \(\bar{s}\) is selected with probability \(Pr(\bar{s})=1/(p-p_e)\). Since \(p-p_e > p_e\), then \(Pr(s) > Pr(\bar{s})\). In addition, elite solutions have a higher probability of passing on their random keys since probability \(\rho _A>1/2\). Though the difference between RKGAs and BRKGAs is small, the resulting heuristics behave quite differently. Experimental results in Gonçalves et al. (2014) show that BRKGAs are almost always faster and more effective than RKGAs.
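
The biased selection and the parameterized uniform crossover amount to a few lines of code; the following sketch is illustrative and assumes chromosomes represented as plain Python lists of keys.

```python
# Biased parent selection and parameterized uniform crossover (rho_A > 1/2).
import random

def crossover(parent_a, parent_b, rho_a=0.7):
    """Each key of the child is inherited from the elite parent with probability rho_a."""
    return [a if random.random() < rho_a else b for a, b in zip(parent_a, parent_b)]

def mate(elite, non_elite, rho_a=0.7):
    parent_a = random.choice(elite)       # parent-A: always an elite solution
    parent_b = random.choice(non_elite)   # parent-B: always a non-elite solution
    return crossover(parent_a, parent_b, rho_a)
```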

To describe a BRKGA, one need only show how solutions are encoded and decoded, what choice of parameters \(p\), \(p_e\), \(p_m\), and \(\rho _A\) was made, and how the algorithm stops. We describe the encoding and decoding procedures next and give values for the parameters as well as the stopping criterion in Sect. 4.

Solutions are encoded as a vector \(\mathcal {X}\) of \(2 \times |A|\) random keys, where \(|A|\) is the cardinality of the set \(A\) of arcs in the network. The first \(|A|\) keys correspond to the random keys which define the toll tariffs, while the last \(|A|\) keys correspond to a binary vector \(b\), with \({\mathcal {K}}\) positions set to one, used to indicate the tolled arcs.

The decoder has two phases. In the first phase, tolls are selected and arc tariffs are set directly from the random keys. In the second phase, a local improvement procedure attempts to change the tariffs with the goal of reducing the value of the objective function. Each tolled arc \(a\) has a tariff in the interval \([1,w_{\mathrm{max}}]\), where \(w_{\mathrm{max}}\) is an input parameter. The tariff for arc \(a\) is simply decoded as \(b_a \cdot \lceil \mathcal {X}_a \cdot w_{\mathrm{max}} \rceil \). In an initial solution, the \({\mathcal {K}}\) tolled arcs are selected uniformly at random. In a crossover, if both parents have a toll on arc \(a\), the same arc is tolled in the child. The remaining tolls are selected randomly among the arcs for which the parents' values differ.
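
A minimal sketch of this first decoding phase is given below; it assumes the binary vector \(b\) has already been fixed (uniformly at random in an initial solution, or inherited during crossover as described above) and is not the authors' code.

```python
# Phase 1 of the decoder: tariffs decoded as b_a * ceil(X_a * w_max).
import math
import random

def random_tolled_arcs(num_arcs, K):
    """Initial solution: choose the K tolled arcs uniformly at random."""
    b = [0] * num_arcs
    for i in random.sample(range(num_arcs), K):
        b[i] = 1
    return b

def decode_tariffs(keys, b, w_max):
    """keys: the first |A| random keys in (0, 1]; b: binary vector marking tolled arcs."""
    return [bi * math.ceil(x * w_max) for x, bi in zip(keys, b)]
```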

Demands are routed forward to their destinations on shortest paths with respect to the arc weights. For SPT, tolled links have weights equal to their tariffs and untolled links are assumed to have weight zero. For SPTF, we add the free flow time to the tariff to define the weight of each tolled arc, while each untolled arc has weight equal to its free flow time. Depending on the number of tolls and the network, there can be several shortest paths of cost zero (especially for SPT). In this case, we use the path with the least number of hops. Traffic at intermediate nodes is split equally among all outgoing links on shortest paths to the destination. After the flow is defined, the fitness of the solution is computed by evaluating the objective function \(\Phi \).
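
The routing step can be sketched as below for a single destination: a Dijkstra computation on the reversed graph followed by an even split of the accumulated flow at each node. The sketch assumes strictly positive arc weights (e.g., SPTF, or SPT with the \(\epsilon \) of Sect. 2.2), omits the minimum-hop tie-breaking used for zero-cost paths, and relies on the illustrative Network container sketched earlier.

```python
# Route all demand destined to q on the shortest-path graph, splitting flow evenly.
import heapq

def route_to_destination(net, weights, q, demand_to_q, loads, tol=1e-9):
    """weights[i]: weight of arc i; demand_to_q[v]: demand from v to q; loads accumulates l_a."""
    dist = {v: float('inf') for v in net.nodes}          # dist[v] = shortest distance v -> q
    dist[q] = 0.0
    heap = [(0.0, q)]
    while heap:                                          # Dijkstra on the reversed graph
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue
        for i in net.IN(v):
            u = net.arcs[i].tail
            if d + weights[i] < dist[u]:
                dist[u] = d + weights[i]
                heapq.heappush(heap, (dist[u], u))
    inflow = dict(demand_to_q)                           # flow that must leave each node
    for v in sorted(net.nodes, key=lambda u: dist[u], reverse=True):
        flow = inflow.get(v, 0.0)
        if v == q or flow <= 0.0:
            continue
        outs = [i for i in net.OUT(v)                    # outgoing arcs on shortest paths
                if abs(dist[v] - (weights[i] + dist[net.arcs[i].head])) <= tol]
        if not outs:                                     # unreachable demand; skipped in sketch
            continue
        for i in outs:                                   # even split among shortest-path arcs
            loads[i] = loads.get(i, 0.0) + flow / len(outs)
            head = net.arcs[i].head
            inflow[head] = inflow.get(head, 0.0) + flow / len(outs)
```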

The second phase of the decoder is a local improvement: a local search is applied to the solution produced in the first phase. In short, it works as follows. Let \(q_{ ls }\) be an integer parameter and \(A^* \subseteq A\) be the set of the \(q = \min \{|A|,q_{ ls }\}\) arcs having the largest congestion costs \(\Phi _a\), i.e. \(|A^*|=q\) and \(\Phi _{a^*} \ge \Phi _{a}\) for all pairs \(\{a^*,a\}\) such that \(a^* \in A^*\) and \(a \in A \setminus A^*\). For each arc \(a^* \in A^*\), in case it is tolled, its weight is increased by one unit at a time, to induce a reduction of its load. The unit increase is repeated until either the weight reaches \(w_{max}\) or \(\Phi \) no longer improves. If no improvement in the objective function is achieved, the weight is reset to its initial value. In case the arc is not currently tolled, a new toll is installed on the arc with initial weight one, and a toll is removed from some other link, tested in circular order. If no reduction in the objective function is achieved, the solution is reverted to its original state. Every time a reduction in \(\Phi \) is achieved, a new set \(A^*\) is computed and the local search restarts. The procedure stops at a local minimum, when no change in the weights of the candidate arcs in set \(A^*\) improves the solution.
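
The overall structure of the local improvement can be sketched as follows. The solution object, and the helpers evaluate (which reroutes the demands and returns \(\Phi \)) and per_arc_cost (which returns the \(\Phi _a\) values), are assumptions for illustration only; in the actual algorithm the shortest paths are updated incrementally, as noted below.

```python
# High-level sketch of the local search on the q_ls most congested arcs.
def local_search(solution, net, w_max, q_ls, evaluate, per_arc_cost):
    best = evaluate(solution)
    improved = True
    while improved:
        improved = False
        costs = per_arc_cost(solution)                   # Phi_a for every arc
        candidates = sorted(range(len(net.arcs)), key=lambda i: costs[i], reverse=True)
        for a in candidates[:min(len(net.arcs), q_ls)]:  # the set A*
            if solution.tolled[a]:
                while solution.tariff[a] < w_max:        # raise the tariff one unit at a time
                    solution.tariff[a] += 1
                    value = evaluate(solution)
                    if value < best:
                        best, improved = value, True
                    else:
                        solution.tariff[a] -= 1          # undo the non-improving step
                        break
            else:
                # Install a toll of weight one on arc a and remove a toll from another arc,
                # tested in circular order; revert if Phi does not improve (omitted here).
                pass
            if improved:
                break                                    # recompute A* and restart
    return best
```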

In the local improvement, every time a weight is changed (increased by one unit, inserted, or removed), the current shortest paths are updated (Buriol et al. 2008) instead of being recomputed from scratch, thus saving a considerable amount of running time.

4 Computational results

In this section we present computational experiments with the models and algorithms introduced in the previous sections of this paper. Initially, we describe the dataset used in the experiments. Then, we detail three sets of experiments. The first set evaluates the mathematical models MM1 and LMM1. The second set evaluates the full model MM2 with the piecewise-linear functions (LMM2), which considers the shortest-path constraints with even split of loads. The last set of experiments evaluates the biased random-key genetic algorithm presented in Sect. 3.

The experiments were done on a computer with an Intel Core i7 930 processor running at 2.80 GHz, with 12 GB of DDR3 main memory, and Ubuntu 10.04 Linux operating system. The biased random-key genetic algorithm (BRKGA) was implemented in C and compiled with the \(\mathtt gcc \) compiler, version 4.4.3, with optimization flag -O3. The commercial solver CPLEX 12.3 was used to solve the proposed linearizations of the mathematical models, while MOSEK was used to solve the mathematical model MM1 (with convex objective function).

Table 1 details six synthetic instances (S1) and ten real-world instances (S2) considered in our experiments and made available by Bar-Gera (2013).

Table 1 Attributes for the instances are given in each column

To test model LMM2, we created the instances of set S1 from instance SiouxFalls of S2 by removing from SiouxFalls some of its nodes and their adjacent links, as well as all OD pairs in which these nodes appear as origin or destination. Let \(n<|V|\) be the new number of nodes. We choose to remove nodes

$$\begin{aligned} v\in ~V: v= \left\lfloor k \frac{|V|}{|V|-n}+1 \right\rfloor \, \text{ with }\quad k = 0,\ldots ,|V|-n-1. \end{aligned}$$

Let \(v \in V\) be a node to be removed, and let \(a \in OUT (v)\) and \(b \in IN (v)\) be arcs incident to it. Furthermore, let \(a_t\) (\(b_t\)) and \(a_h\) (\(b_h\)) be, respectively, the tail and head nodes of links \(a\) (\(b\)). We create a link \(a'\) from \(b_t\) to \(a_h\) if there does not already exist a link between \(b_t\) and \(a_h\) and, furthermore, \(| OUT (b_t)|<4\) or \(| IN (a_h)|<4\). Link \(a'\) has the same characteristics (free flow time, capacity, etc.) as link \(a\). After all extensions, we remove from the network all arcs in \( OUT (v) \cup IN (v)\), as well as node \(v\).

4.1 Results for models MM1 and LMM1

The first set of experiments evaluates the models when solved with commercial solvers. Table 2 presents, for each instance, the objective function value \(\Phi \) and the lower and upper bounds \(\varphi ^l\) and \(\varphi ^u\), respectively.

In the first two columns after the name of the instance, we present the objective function values \(\Phi \) and the computational times for model MM1 obtained with the nonlinear solver MOSEK \(6.0\) using the modeling system GAMS. A few nonlinear solvers are part of the GAMS system and we evaluated the performance of all of them. Some of them are general nonlinear solvers, and have no specific routines for convex functions. Most were not able to solve the larger instances. MOSEK presented the best performance in terms of running times and for this reason we report only the results obtained with MOSEK. The next columns present results for CPLEX 12.3 with the proposed piecewise-linear functions \(\varphi ^l\) and \(\varphi ^u\), respectively the lower and upper estimations of function \(\Phi \). In each case, we show the objective function values in columns \(\varphi ^l\) and \(\varphi ^u\), as well as \(\Phi {\{\varphi ^l}\}\) and \(\Phi {\{\varphi ^u}\}\), the values of \(\Phi \) considering the arc loads obtained by the different approximations. The computational times are reported in seconds.

Table 2 Computational results for MM1 and LMM1

From the results in Table 2, three main observations can be made. First, there are small gaps between \(\varphi ^l\) and \(\Phi {\{\varphi ^l\}}\), as well as between \(\varphi ^u\) and \(\Phi {\{\varphi ^u\}}\), i.e. both piecewise-linear functions \(\varphi ^l\) and \(\varphi ^u\) have values that are, respectively, close to \(\Phi {\{\varphi ^l\}}\) and \(\Phi {\{\varphi ^u\}}\). In a small number of cases the gap is significant and we observe that, as expected, this occurs in instances with higher average or higher maximum utilization (\(\ell _a/c_a\)), like Barcelona and Winnipeg. Second, we compare the results for models MM1 and LMM1. The gaps between \(\varphi ^l\) and \(\Phi \), and between \(\varphi ^u\) and \(\Phi \), are also small, which means that the piecewise functions have similar values to the original convex function \(\Phi \). However, for most of the instances, the computational times spent by MOSEK on the convex function are two to four orders of magnitude greater than the time spent by CPLEX on the piecewise-linear functions. The only case where solving the model with a piecewise linear function (computing \(\varphi ^u\) with CPLEX) took longer than solving the model with the convex function \(\Phi \) (using MOSEK) was for instance ChicagoSketch. However, CPLEX found good solutions quickly, and spent most of the time certifying optimality. For example, CPLEX found solutions with a gap of 3 % with respect to the optimal solution in about 650 s, while MOSEK needed more than 1,600 s to reach this gap.

The last important observation is that the MM1 model is a relaxation of MM2. Moreover, the shortest-path and even-split constraints (Eq. 14) of model MM2 add a considerable number of variables and constraints to the model. Thus, evaluating MM2 with a convex objective function proved impractical in terms of computational time, and for this reason no corresponding results are reported. In the next experiment we evaluate both approximations for the full model (MM2).

4.2 Results for the tollbooth problem with piecewise-linear cost (LMM2)

This set of experiments tests the performance of CPLEX on MM2, the model that includes shortest paths and even-split constraints. We run the model considering both weight functions to calculate shortest paths (SPT and SPTF) and both piecewise-linear functions introduced in Sect. 2.3.

Table 3 presents results for model LMM2 when shortest paths are calculated considering only the toll tariffs (SPT) and the tariffs plus the free flow times (SPTF), respectively. For each instance, we tested several scenarios of \({\mathcal {K}}\). For each scenario we present the objective function values of approximations \(\varphi ^l\) and \(\varphi ^u\) obtained by CPLEX, the corresponding \(\Phi \{\varphi ^l\}\) and \(\Phi \{\varphi ^u\}\) values (as described in the previous subsection), the gap returned by the solver for a time limit of 1,800 s, and finally the running times in seconds. Dashes (–) indicate that no feasible solution was found within the time limit.

Table 3 Computational results for LMM2 with SPT and SPTF

Table 3 illustrates the difficulty in solving these instances with CPLEX. For most of the instances no optimal solution was found within 30 min, and for many of them not even a feasible solution was found within this time limit. A small increase in the instance size implies a large increase in the computational effort spent to solve the model. We observe that for SPT the solver has more difficulty in finding an initial solution, and the gap returned by the solver is slightly higher in comparison with SPTF. Furthermore, the computational time is slightly reduced for SPTF, and \(\varphi ^l\) was computed slightly faster than \(\varphi ^u\).

Results for instances SiouxFalls_06 and SiouxFalls_08 were found for \(\varphi ^l\) and \(\varphi ^u\) in less than one second, and for this reason they were omitted from the table. On the other hand, results were omitted for SiouxFalls_16, for both piecewise-linear functions and shortest-path evaluations SPT and SPTF, since the solver was unable to find any feasible solution within the time limit.

In summary, Table 2 shows that solving the simplified model MM1, i.e. MM2 without the shortest-path and even-split constraints, takes considerable time, while its linearized versions \(\varphi ^l\) and \(\varphi ^u\) are solved very quickly in almost all cases. Table 3, on the other hand, shows that solving the linearizations \(\varphi ^l\) and \(\varphi ^u\) of the full model MM2 takes a long time even for small instances. These results motivated us to propose a heuristic for the tollbooth problem; results for the proposed biased random-key genetic algorithm are presented in the next subsection.

4.3 Results for the biased random-key genetic algorithm

This section presents results for the biased random-key genetic algorithm applied to the instances of set S2. We extended the experimental study performed by Buriol et al. (2010), in which results for only three of these instances were presented. Moreover, we provide an analysis of the best solution for each combination of instance, value of \({\mathcal {K}}\), and problem type (SPT or SPTF).

To tune the parameters, a set of experiments was performed in two steps. In the first step, we determined a fixed running time for each triple: instance (SiouxFalls, Prenzlauerberg Center, and Anaheim), value of \({\mathcal {K}}\), and problem type (SPT or SPTF). To define the fixed time, we ran the BRKGA with local search using a set of predefined parameters: population size \(p=50\), elite set of size \(p_e = 0.25p\), mutant set of size \(p_m = 0.05p\), elite key inheritance probability \(\rho _A = 0.7\), and a restart parameter \(r=10\). Every \(r\) generations we verify whether the best three individuals in the population have identical fitness (within \(10^{-3}\) of each other). If they do, then the second and third best are replaced by two new randomly generated solutions. The BRKGA was run for at least 500 and at most 2,000 generations, stopping after 100 generations without improvement of the incumbent solution. The fixed time is defined to be the average running time over five independent runs.

The running time defined in the first step of the tuning phase is used in the second step to determine the best combination of parameter values. We ran the BRKGA for this fixed amount of time with parameters taken from the sets of values shown in the third column of Table 4. All combinations of parameter values were considered.

Table 4 Parameter values in tuning experiment

Given a set of triples, each consisting of an instance, a value of \({\mathcal {K}}\), and a problem type (SPT or SPTF), we run the BRKGA on each triple using all combinations of the parameters in Table 4. For each triple, the relative gap between the fitness value of each run and the best fitness over all runs is computed. We observe that using a local search in the BRKGA results in better solutions than using no local search. In the case of \(q_{ ls }=0\), the average relative gap is 29.86, while for \(q_{ ls } = 2, 5,\) and \(10\), the relative gaps were 6.70, 5.96, and 5.88, respectively. Therefore, we analyze the remaining parameters considering only runs where \(q_{ ls }=10\). Table 5 shows the average relative gaps for these remaining parameters. The best observed parameter values (in bold) were \(p = 100\) for population size, \(p_e=0.15p\) for elite population size, \(p_m=0.05p\) for mutant population size, \(\rho _A=0.70\) for inheritance probability, and \(r=10\) for restart.

Table 5 Average of relative gaps obtained for different parameters

Once the parameters were set, we ran the BRKGA with local search (BRKGA \(+\) LS) with a time limit of 3,600 s (except for ChicagoSketch, the largest instance, for which we ran with a time limit of 7,200 s). We allow the maximum number of generations to be 2,000, and the maximum number of generations without improvement to be \(100\).

Table 6 shows averages over five runs of BRKGA \(+\) LS and a comparison between SPT and SPTF. For each value of \({\mathcal {K}}\), it lists the best solution value (Best \(\Phi \)) over the five runs, average fitness value (Avg \(\Phi \)), standard deviation (SD), and average running time in seconds.

Table 6 Detailed results of SPT and SPTF for BRKGA \(+\) LS

The first observation is that as the value of \({\mathcal {K}}\) increases, the value of \(\Phi \) tends to decrease and to vary less. In fact, in most cases, the best solutions were found for \({\mathcal {K}} \ge \frac{|A|}{2}\). With small \({\mathcal {K}}\) it is easy for flow to bypass tolled arcs, which impairs traffic engineering. On the other hand, the search space increases considerably for larger \({\mathcal {K}}\) values, making the problem hard to solve. Since there are \(\left( {\begin{array}{c}|A|\\ {\mathcal {K}}\end{array}}\right) \) configurations for the location of \({\mathcal {K}}\) tolls and for each configuration each toll can have 20 different values, the size of the solution space is \(\sigma ({\mathcal {K}}) = \left( {\begin{array}{c}|A|\\ {\mathcal {K}}\end{array}}\right) 20^{\mathcal {K}}\). Thus, the solution space is much larger for \({\mathcal {K}} \ge \frac{|A|}{2}\) than for \({\mathcal {K}} < \frac{|A|}{2}\). Furthermore, even though the maximum of \(\sigma ({\mathcal {K}})\) is achieved for a value of \({\mathcal {K}} < |A|\), in all of the instances \(\sigma ({\mathcal {K}}') > \sigma ({\mathcal {K}})\) for every tested value \({\mathcal {K}} < {\mathcal {K}}'\), where \({\mathcal {K}}'\) is the largest \({\mathcal {K}}\) value tested. For example, for the SiouxFalls instance, for which the largest value of \({\mathcal {K}}\) tested was 70, \(\sigma ({\mathcal {K}}) = \left( {\begin{array}{c}76\\ {\mathcal {K}}\end{array}}\right) 20^{\mathcal {K}}\), which achieves its maximum at \({\mathcal {K}} = 73\).
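
The figures quoted above are easy to verify; the short check below is illustrative only.

```python
# sigma(K) = C(|A|, K) * 20^K; for SiouxFalls (|A| = 76) the maximum is attained at K = 73.
from math import comb

def sigma(num_arcs, K, tariffs=20):
    return comb(num_arcs, K) * tariffs ** K

assert max(range(77), key=lambda K: sigma(76, K)) == 73
```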

In most entries of Table 6 the standard deviation is small, showing robustness of the algorithm. The table also shows that for small values of \({\mathcal {K}}\), SPTF has smaller \(\Phi \) than SPT. This occurs because, for small values of \({\mathcal {K}}\), SPT has many zero-cost paths, making it difficult to influence flow distribution with tolls.

Table 7 Approximation of the lower bound with tolls

Table 7 presents, for each instance, the smallest average user travel time obtained with tolls by the BRKGA, compared with the optimal flow distribution obtained by solving model MM1. The optimal value of MM1 is a lower bound for the tollbooth problem. The results show that with tolls it is possible to obtain a near-optimal flow distribution.

We next explore the main characteristics of the near-optimal solutions found by BRKGA \(+\) LS. For the best solution found in the five runs, Table 8 lists the average number of paths for each OD pair (#Path), the average number of hops among all OD shortest paths (#Hop), the average sum, over all OD pairs, of the tariffs on the shortest paths (#Toll), and the average number of distinct arcs used, over all OD pairs (#UArc).

Table 8 Detailed results of best solution found by BRKGA \(+\) LS algorithm

Columns #Path in Table 8 show that when \({\mathcal {K}}\) increases, a strong reduction in the number of shortest paths is observed for SPT, while for SPTF, the reduction is not as pronounced. Again, this occurs because of the large number of zero-cost paths present in SPT when \({\mathcal {K}}\) is small. Of these, traffic flows on one or more paths of minimum hop count. On the other hand, for SPTF, the inclusion of free flow time to the arc weight leads to paths of distinct cost, with a few of minimum cost (in many cases a single minimum cost path).

Columns #Hop in Table 8 show the minimum hop count distance between OD vertices. For large values of \({\mathcal {K}}\) we observe that as the number of installed tolls increases, the hop count decreases in both SPT and SPTF. This occurs because with a large number of tolls it is possible to do better traffic engineering. For small values of \({\mathcal {K}}\) in SPT, the hop count is small because it corresponds to a minimum hop-count path among the zero-cost shortest paths.

The columns #Toll in Table 8 show the average number of tolls that a user traverses on an OD shortest path. Clearly, this value increases with \({\mathcal {K}}\). Since an increase in \({\mathcal {K}}\) leads to a decrease in the number of shortest paths (column #Path), the number of distinct arcs (column #UArc) consequently decreases.

5 Conclusions

In this paper we presented an extensive study of the tollbooth problem. Two mathematical formulations for different versions of the tollbooth problem were presented, as well as linearizations that give lower and upper bounds for their objective functions. Computational tests were conducted taking into account the original and the linearized models, applied to two sets of synthetic and real-world instances. Moreover, a biased random-key genetic algorithm was run on the same sets of instances.

When analyzing the results for the mathematical models, we concluded that model MM2, which includes shortest-path and even-split constraints, has a large number of variables and constraints, making it difficult to solve with general-purpose solvers, even when we limit ourselves to small instances. On the other hand, if the shortest-path and even-split constraints are removed from the model, giving rise to a simplified version of the problem, the linearized versions of the problem can be solved efficiently with CPLEX. The results obtained with the biased random-key genetic algorithm for the complete model, however, show a good tradeoff between computation time and solution quality on this problem.

Finally, considering that users naturally take the least costly path, toll setting can be used to better distribute the flow in the network and consequently reduce traffic congestion.