1 Introduction

Several variants of the spanning tree problem are useful to model problems arising in communication networks. For example, the network may be required to connect a specified subset of nodes (Steiner Tree Problem [13]); if a label is assigned to each link, one may look for homogeneous subgraphs of the network (Minimum Labelling Spanning Tree Problem [4, 6, 7]); when costs are related to the connectivity of a node, a spanning tree with constraints on the degree of the nodes may be of interest ([13]). The problem addressed in this paper is related to this last variant and is known in the literature as the Spanning Tree with Minimum Number of Branch Vertices Problem ([5, 8]). It is motivated by applications in optical networks where it is useful to connect the nodes so that the number of connections of each node is limited.

In particular, this problem arises from a practical application in the context of all-optical networks, a class of backbone wide-area networks (WAN) where connections are routed by intermediate nodes in the optical domain without electronic conversion [18]. The main advantage of such networks is the great expansion of bandwidth provided by wavelength-division multiplexing (WDM), a multiplexing technique that permits each fiber to carry many independent signals with different wavelengths.

Let us consider the simple example depicted in Fig. 1. The optical network is composed of three stations, namely A, B and C, and four intermediate nodes, namely 1, 2, 3 and 4. Station A is connected to node 2, station B to node 3 and station C to node 1. Roughly speaking, we could define a station as a node in the network where the traffic is aggregated and the opto-electronic conversions are carried out. A Wavelength Routed optical network is a network where the intermediate nodes are able to route the signal according to the assigned bandwidth. Suppose that in our example network station A needs to communicate with station B and station C. This communication could be carried out by 2 different unicast connections (i.e., single source–single destination) that would use two different bandwidths, or by means of a multicast connection (single source–multiple destinations) using the same bandwidth. In the former case, a Routing and Wavelength Assignment (RWA) algorithm defines, for each unicast connection, a light-path (p,w), where p is the path used to route the connection, i.e. the set of fiber links from the source to the destination, and w is the wavelength of the connection allowing communication between the two stations. In our example, we would have two light-paths, the first one connecting station A to station B with intermediate nodes 2, 4, 3 and using bandwidth w_{AB}; the second one connecting station A to station C with intermediate nodes 2, 4, 1 and using bandwidth w_{AC}. In the latter case, a multicast connection from a source to a set of destinations can be represented by a pair (t,w), where t is a directed tree used for signal routing and w is the connection wavelength. Such a pair is referred to as a light-tree [19], whose practical implementation requires the intermediate nodes (in our case node 4) to have the ability to split the input signal to multiple outputs. A light-splitting switch is the optical device needed to perform such a task. Such devices are rather expensive and cause significant power loss of the output signals. Therefore, a design method aimed at reducing the number of light-splitting switches is in order.

Fig. 1 An example of optical network

Note that light-splitting switches are located at branch vertices, that is, at the nodes of the tree whose degree is greater than two. Hence this problem can be formalized as follows: given a connected graph G representing a network, look for a spanning tree of G with the minimum number of branch vertices (in the sequel the problem will be referred to as MBV-Minimum Branch Vertices Problem). Observe that, in such a context, a Hamiltonian path, if any, would obviously ensure complete connection requiring no light-splitting switches at all. Of course, in real-world networks it is not known in advance whether such a Hamiltonian path exists.
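As a minimal illustration of the objective, counting the branch vertices of a candidate spanning tree only requires its degree sequence; the following sketch (function and variable names are ours, not from the paper) does exactly that:

```python
from collections import Counter

def branch_vertices(tree_edges):
    """Return the set of branch vertices (degree > 2) of a tree
    given as a list of edges (u, v)."""
    deg = Counter()
    for u, v in tree_edges:
        deg[u] += 1
        deg[v] += 1
    return {v for v, d in deg.items() if d > 2}

# A star on 5 vertices has one branch vertex (its center);
# a Hamiltonian path has none.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(branch_vertices(star))  # {0}
print(branch_vertices(path))  # set()
```

A Hamiltonian path is precisely a spanning tree for which this set is empty, which is why its existence would make the optimum zero.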

The problem was introduced by Gargano et al. [5], where its computational complexity was analyzed and the problem was proved to be NP-hard. Cerulli et al. [8] provide some heuristics suitable for general instances and compare the results with those provided by a standard solver. In this paper we further study the problem by analyzing lower and upper bounds obtained from several relaxations of four formulations of the problem, namely a Lagrangian relaxation, a continuous relaxation and a mixed integer-continuous relaxation. The Lagrangian dual is solved by means of both a standard subgradient algorithm and an ad-hoc dual ascent procedure. A useful by-product of tackling the Lagrangian dual is the possibility of getting a feasible solution of the original problem at no extra cost. The quality of the corresponding upper bound is tested by comparison either with the optimal solution, whenever available, or with the feasible solution provided by some existing heuristic algorithms [8].

The paper is organized as follows. In Sect. 2 we describe the problem and provide the different mathematical formulations. In Sect. 3 the Lagrangian relaxation is discussed together with the Lagrangian dual. Our dual ascent algorithm is presented in Sect. 4, while several computational results are discussed in Sect. 5. Some concluding remarks are discussed in Sect. 6.

2 Mathematical formulations

The first formulation we provide is an IP mathematical model. Due to its exponential number of constraints, this formulation is not suitable to be tested on instances of significant size; we use it because it is particularly suitable for the Lagrangian relaxation approach described in Sect. 3. In addition, we consider a single commodity flow formulation (SC), a multicommodity flow formulation (MC) and a Miller-Tucker-Zemlin formulation (MTZ).

2.1 Integer programming formulation (IP)

Let us consider an undirected network G=(V,E), where V denotes the set of n vertices and E the set of m edges. The variables of the classical (IP) model are the following:

  • a binary variable x_e for each e ∈ E, equal to 1 if edge e is selected and 0 otherwise;

  • a binary variable y_v for each v ∈ V, equal to 1 if vertex v is a branch vertex (that is, its degree in the tree is greater than two) and 0 otherwise.

Then, the (IP) formulation of MBV is the following:

$$ \min\sum_{v \in V} y_v $$
(2.1)
$$ \sum_{e \in E} x_e = n-1 $$
(2.2)
$$ \sum_{e \in E(S)} x_e \leq |S|-1 \quad \forall S \subseteq V,\ S \neq \emptyset $$
(2.3)
$$ \sum_{e \in A(v)} x_e - 2 \leq \delta_v y_v \quad \forall v \in V $$
(2.4)
$$ x_e \in \{0,1\} \quad \forall e \in E $$
(2.5)
$$ y_v \in \{0,1\} \quad \forall v \in V $$
(2.6)

where for any given subset of vertices S we denote by E(S) the set of edges having both endpoints in S. Moreover, we denote by A(v) the set of edges incident to vertex v and by δ_v its size, i.e. δ_v = |A(v)|. The objective function (2.1) requires the minimization of the total number of branch vertices. Constraint (2.2) requires the selection of exactly n−1 edges in the spanning tree. Constraints (2.3) guarantee that the selected edges cannot form cycles. The coupling constraints (2.4) ensure that each variable y_v is set to 1 whenever v has more than two incident edges belonging to the optimum spanning tree.
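As a sketch of what constraints (2.2)–(2.4) demand of a candidate solution (all names are ours; a union-find pass plays the role of the exponentially many constraints (2.3)):

```python
def is_feasible_ip(n, tree_edges, y):
    """Sketch of the feasibility check for (IP) on vertices 0..n-1:
    (2.2) exactly n-1 selected edges; (2.3) no cycle among them,
    detected with union-find instead of enumerating all subsets S;
    (2.4) y[v] = 1 whenever v has more than two incident selected edges."""
    if len(tree_edges) != n - 1:                      # (2.2)
        return False
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]             # path compression
            a = parent[a]
        return a

    deg = [0] * n
    for u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:                                  # would close a cycle: (2.3)
            return False
        parent[ru] = rv
        deg[u] += 1
        deg[v] += 1
    return all(y[v] == 1 for v in range(n) if deg[v] > 2)   # (2.4)
```

On a star with center 0, for instance, the check passes only if y[0] = 1, mirroring the role of the coupling constraints.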

2.2 Single commodity flow formulation (SC)

For the flow-based formulations we consider the directed version of the graph, where two oriented arcs (u,v) and (v,u) are associated with each edge (u,v) ∈ E. We denote by E′ this new set of arcs. Let us denote by A^+(v) and A^−(v) the set of arcs in E′ outgoing from v and incoming into v, respectively. A spanning tree T of G can be found by sending one unit of flow from a source vertex s ∈ V to every other vertex v ∈ V∖{s} of the graph [16]. We introduce both the binary variable x_{uv} for each (u,v) ∈ E′, equal to 1 if arc (u,v) is selected and 0 otherwise, and the flow variable f_{uv} for each (u,v) ∈ E′, representing the flow going from vertex u to vertex v.

The single-commodity flow formulation (SC) of MBV is then the following [8]:

$$ \min\sum_{v \in V} y_v $$
(2.7)
$$ \sum_{(u,v) \in A^-(v)} x_{uv} = 1 \quad \forall v \in V\setminus\{s\} $$
(2.8)
$$ \sum_{(s,v) \in A^+(s)} f_{sv} - \sum_{(v,s) \in A^-(s)} f_{vs} = n-1 $$
(2.9)
$$ \sum_{(u,v) \in A^-(v)} f_{uv} - \sum_{(v,u) \in A^+(v)} f_{vu} = 1 \quad \forall v \in V\setminus\{s\} $$
(2.10)
$$ f_{uv} \leq (n-1)x_{uv} \quad \forall (u,v) \in E' $$
(2.11)
$$ \sum_{(u,v) \in A^-(v)} x_{uv} + \sum_{(v,u) \in A^+(v)} x_{vu} - 2 \leq \delta_v y_v \quad \forall v \in V $$
(2.12)
$$ y_v \in \{0,1\} \quad \forall v \in V $$
(2.13)
$$ x_{uv} \in \{0,1\} \quad \forall (u,v) \in E' $$
(2.14)
$$ f_{uv} \geq 0 \quad \forall (u,v) \in E' $$
(2.15)

The objective function (2.7) to be minimized is the same as in the (IP) model and counts the number of branch vertices in the spanning tree. Constraints (2.8) ensure that each vertex other than the source has exactly one incoming arc in the optimal spanning tree. Equations (2.9) and (2.10) balance the flow at each vertex and ensure the connectivity of any feasible solution. Constraints (2.11) are coupling constraints linking the flow variables f with the binary variables x. The coupling constraints (2.12) ensure that each variable y_v equals 1 whenever v has more than two adjacent arcs belonging to the optimum spanning tree.

We considered in our experiments two different relaxations of this model. The first one is a mixed integer-continuous relaxation obtained by replacing constraints (2.14) with the new set of constraints 0 ≤ x_{uv} ≤ 1, ∀(u,v) ∈ E′ (denoted in the sequel (2.14′)). The second one is a continuous relaxation obtained by replacing constraints (2.14) by constraints (2.14′) and constraints (2.13) by the new set of constraints 0 ≤ y_v ≤ 1, ∀v ∈ V (denoted in the sequel (2.13′)). Additionally (as was also done, for example, in [1] and [2]), in order to improve the quality of the bounds returned by both relaxations, we added the following set of constraints:

$$ x_{uv}+x_{vu} \leq1\quad \forall (u,v) \in E' $$
(2.16)

imposing that for each edge (u,v) ∈ E of the original graph, at most one of the two associated arcs (u,v) and (v,u) is selected. The single commodity flow model (SC) will also be solved to optimality by means of the CPLEX solver, to get the optimum solution and to evaluate the quality of the lower bounds obtained by the different relaxations and of the upper bounds provided by our algorithms.

2.3 Multi commodity flow formulation (MC)

MBV can also be formulated in terms of multicommodity flow (MC) [14]. We introduce a set of flow variables \(f_{uv}^{k}\), defined for each (u,v) ∈ E′ and k ∈ V∖{s}, representing the flow along the arc (u,v) relative to the path from the root s to vertex k. As in the previous model, a given vertex s is chosen as the source.

$$ \min\sum_{v \in V} y_v $$
(2.17)
$$ \sum_{(u,v) \in A^-(v)} x_{uv} = 1 \quad \forall v \in V\setminus\{s\} $$
(2.18)
$$ \sum_{(s,v) \in A^+(s)} f^k_{sv} - \sum_{(v,s) \in A^-(s)} f^k_{vs} = 1 \quad \forall k \in V\setminus\{s\} $$
(2.19)
$$ \sum_{(u,v) \in A^-(v)} f^k_{uv} - \sum_{(v,u) \in A^+(v)} f^k_{vu} = 0 \quad \forall v,k \in V\setminus\{s\},\ v \neq k $$
(2.20)
$$ \sum_{(u,k) \in A^-(k)} f^k_{uk} - \sum_{(k,u) \in A^+(k)} f^k_{ku} = 1 \quad \forall k \in V\setminus\{s\} $$
(2.21)
$$ f^k_{uv} \leq x_{uv} \quad \forall (u,v) \in E',\ \forall k \in V\setminus\{s\} $$
(2.22)
$$ \sum_{(u,v) \in A^-(v)} x_{uv} + \sum_{(v,u) \in A^+(v)} x_{vu} - 2 \leq \delta_v y_v \quad \forall v \in V $$
(2.23)
$$ x_{uv} \in \{0,1\} \quad \forall (u,v) \in E' $$
(2.24)
$$ y_v \in \{0,1\} \quad \forall v \in V $$
(2.25)
$$ f^k_{uv} \geq 0 \quad \forall (u,v) \in E',\ \forall k \in V\setminus\{s\} $$
(2.26)

The objective function (2.17) is the same as in the previous models. Constraints (2.18) ensure that each vertex other than the source has exactly one incoming arc in the spanning tree. Flow conservation constraints (2.19)–(2.21) ensure that the root vertex s is connected to every vertex in the graph. Constraints (2.22) are logical constraints linking the flow and the arc variables. Constraints (2.23) correspond to constraints (2.12) of the (SC) model. The mixed continuous relaxation of this model is obtained by replacing constraints (2.24) by (2.14′) and by adding constraints (2.16). The continuous relaxation of this model is obtained by replacing constraints (2.24) by (2.14′), constraints (2.25) by (2.13′) and by adding constraints (2.16). Note that the polyhedron defined by (2.18)–(2.22), (2.14′) and (2.26) describes the convex hull of the incidence vectors of spanning trees; hence, this relaxation will be used in our experiments to compute the optimum solution of the Lagrangian dual described in the next section. This solution will then be used to evaluate the quality of the bounds provided by our dual ascent procedure and by the subgradient algorithm in solving the Lagrangian dual.

2.4 Miller-Tucker-Zemlin formulation (MTZ)

Finally, we provide a Miller-Tucker-Zemlin (MTZ) formulation for the problem. We need an additional set of variables u_v, defined for each vertex v ∈ V, that assign a label to each vertex of the graph. In particular, such a labeling ensures that any directed arc belonging to the optimum spanning tree goes from a vertex with a lower label to a vertex with a higher label. Such a label assignment is aimed at subtour elimination. The basic MTZ formulation for our problem is the following:

$$ \min\sum_{v \in V} y_v $$
(2.27)
$$ \sum_{(u,v) \in A^-(v)} x_{uv} = 1 \quad \forall v \in V\setminus\{s\} $$
(2.28)
$$ n x_{uv} + u_u \leq u_v + (n-1) \quad \forall (u,v) \in E' $$
(2.29)
$$ u_v \geq 1 \quad \forall v \in V\setminus\{s\} $$
(2.30)
$$ u_v \leq n-1 \quad \forall v \in V\setminus\{s\} $$
(2.31)
$$ u_s = 0 $$
(2.32)
$$ \sum_{(u,v) \in A^-(v)} x_{uv} + \sum_{(v,u) \in A^+(v)} x_{vu} - 2 \leq \delta_v y_v \quad \forall v \in V $$
(2.33)
$$ x_{uv} \in \{0,1\} \quad \forall (u,v) \in E' $$
(2.34)
$$ y_v \in \{0,1\} \quad \forall v \in V $$
(2.35)
$$ u_v \geq 0 \quad \forall v \in V $$
(2.36)

Constraints (2.29)–(2.32) ensure that: (i) each node v ≠ s is assigned a label 1 ≤ u_v ≤ n−1 (constraints (2.30)–(2.31)), (ii) the root node s has label equal to zero (constraint (2.32)), and (iii) each selected arc (u,v) is such that u_u < u_v (constraints (2.29)). Note that the above mentioned set of constraints does not associate a unique set of node labels to a given spanning tree (see for example Fig. 2). Indeed, for a given selected arc (u,v), variable u_v may assume a value strictly greater than u_u + 1. However, it is possible to lift these constraints to ensure u_v = u_u + 1, so that u_v indicates the position of vertex v in the spanning tree, i.e. the number of arcs in the path between the root s and vertex v. To address this issue, we added some lifting constraints proposed by Desrosiers and Laporte [9] and by Gouveia [12]. In particular, Desrosiers and Laporte proposed the following lifted modification of constraints (2.29):

$$ (n-2)x_{vu} + n x_{uv} + u_u \leq u_v+ (n-1) \quad\forall (u,v) \in E',\ u\neq s,\ v\neq s $$
(2.37)

Constraints (2.37) ensure that for each arc (u,v) in the spanning tree with u ≠ s and v ≠ s we have u_v = u_u + 1. This is due to the combined effect of these constraints for arcs (u,v) and (v,u) when x_{uv} = 1: the constraint written for (u,v) reduces to n + u_u ≤ u_v + (n−1), i.e. u_v ≥ u_u + 1, while the constraint written for (v,u) (where x_{vu} = 0) reduces to (n−2) + u_v ≤ u_u + (n−1), i.e. u_v ≤ u_u + 1.

Fig. 2 An example of multiple labeling assignments corresponding to the same spanning tree

Note that if an arc (s,v) is selected to be in the final tree, constraints (2.37) do not ensure u_v = u_s + 1. For this reason we also consider the following inequalities, proposed by Gouveia [12], to lift constraints (2.30)–(2.31) for arcs incident to the root node:

$$ u_v \geq 2 - x_{sv} \quad \forall (s,v) \in A^+(s) $$
(2.38)
$$ u_v \leq (n-1) - (n-2)x_{sv} \quad \forall (s,v) \in A^+(s) $$
(2.39)

They ensure that variable u_v is equal to 1 in case arc (s,v) is selected, and u_v ≥ 2 otherwise. Note that these constraints were proposed in [12] for the minimum spanning tree problem with hop constraints, that is, the problem of finding a spanning tree of minimum cost with the additional constraint that each path starting from the root node has no more than p arcs; however, by setting p = n these inequalities are valid for our problem too. Some other lifting constraints were proposed in [12]; however, we did not add them to our formulation since, when considered together with constraints (2.37), (2.38) and (2.39), they did not affect the quality of the lower bound for either of the considered relaxations.

The final MTZ formulation we considered for our problem is reported below for clarity of exposition:

minimize (2.27) subject to constraints (2.28), (2.37), (2.38), (2.39), and (2.32)–(2.36).

The continuous relaxation of this formulation is obtained by replacing (2.34) and (2.35) with (2.14′) and (2.13′), respectively, and by adding the set of constraints (2.16). The mixed integer-continuous relaxation, instead, is obtained by replacing constraints (2.34) by (2.14′) and by adding the set of constraints (2.16). The next section presents the Lagrangian relaxation of problem (IP) obtained by relaxing the coupling constraints (2.4).

3 Relaxing the (IP) formulation: the Lagrangian dual

In this section we focus on the (IP) formulation of (MBV) presented in the previous section and study its Lagrangian relaxation [17]. The corresponding Lagrangian dual can be solved by means of a standard subgradient approach or by means of an appropriate dual ascent procedure. We implemented and tested both of them.

3.1 Relaxing the coupling constraints

Before introducing the Lagrangian relaxation, note that whenever δ_v = |A(v)| ≤ 2, constraint (2.4) is trivially satisfied for any choice of the decision variables satisfying the remaining constraints. Thus we rewrite it in the form:

$$ \sum_{e\in A(v)}x_{e}-2\leq \delta_v y_v, \quad \forall v \in V' $$
(3.1)

where V′ = {v ∈ V | δ_v > 2}.

By introducing the Lagrangian multipliers λ_v ≥ 0, v ∈ V′, and relaxing constraints (3.1), we obtain the following relaxed problem (LR):

$$ z(\lambda)= \min\sum_{v \in V'}y_v + \sum _{v \in V'}\lambda_v\biggl(\sum _{e\in A(v)}x_{e}-2 -\delta_v y_v\biggr) $$
(3.2)

subject to constraints (2.2)–(2.3) and constraints (2.5)–(2.6), where the multipliers λ_v ≥ 0, v ∈ V′, have been grouped into the vector λ of appropriate dimension. Note that the above problem is formally identical to the problem obtained by relaxing the original constraints (2.4) and letting λ_v = 0, v ∈ V∖V′. Function z(λ) will be referred to as the Lagrangian function.

By simple manipulations, z(λ) may be rewritten as:

$$ z(\lambda)= -2\sum_{v \in V'}\lambda_v + z_1(\lambda)+z_2(\lambda) $$
(3.3)

where z_1(λ) and z_2(λ) are defined as follows:

$$ z_1(\lambda)= \min\biggl\{\sum_{e=(u,v) \in E}(\lambda_u+\lambda_v)x_e : \mbox{(2.2), (2.3), (2.5)}\biggr\} $$
(3.4)

and

$$ z_2(\lambda)= \min\biggl\{\sum_{v \in V'}(1-\delta_v\lambda_v)y_v : y_v \in \{0,1\},\ \forall v \in V'\biggr\} $$
(3.5)

The Lagrangian dual problem (LD) is then:

$$ \max_{\lambda\geq0}z(\lambda)= \max_{\lambda\geq0} -2\sum _{v \in V'}\lambda_v + z_1( \lambda)+z_2(\lambda) $$
(3.6)

It is well known that z(λ) is a piecewise linear concave function; a subgradient of z at λ is the vector \(g \in\mathbb{R}^{|V'|}\) whose generic component g_v is

$$ g_v = \sum_{e\in A(v)}x_{e}( \lambda)-2 - \delta_{v} y_{v}(\lambda),\quad v\in V' $$
(3.7)

The subgradient method [17] for maximizing z(λ) consists of a sequence of moves along the subgradient direction, according to appropriately calculated stepsizes. The pseudocode of the subgradient algorithm is given next. The algorithm requires setting the precision parameter ϵ_k and the maximum number of iterations k_max.

Algorithm 3.1

(Subgradient)

  1. Set \(\lambda_{v} = \max_{u \in V^{d}}\{\frac{1}{\delta_{u}}\},\ \forall v \in V';\ k=0\).

  2. Solve the Lagrangian relaxation by solving separately problems (3.4) and (3.5), obtaining \(z(\lambda^{(k)})=-2\sum_{v \in V'}\lambda_{v}^{(k)}+z_{1}(\lambda^{(k)})+z_{2}(\lambda^{(k)})\) and (x(λ^{(k)}), y(λ^{(k)})). Keep \(z_{EUR}\), the best solution value found so far.

  3. Calculate the subgradient g^{(k)}, whose v-th component is \(g_{v}^{(k)}= \sum_{e \in A(v)}x_{e}(\lambda^{(k)})-2-\delta_{v}y_{v}(\lambda^{(k)})\).

  4. Calculate the stepsize \(t_{k}=\epsilon_{k} \frac{z_{EUR}-z(\lambda^{(k)})}{\parallel{g^{(k)}}\parallel^{2}}\).

  5. Set

     $$\lambda_v^{(k+1)}= \begin{cases} 0 &\mbox{if}\ \lambda_v^{(k)}+t_kg_v^{(k)}<0 \\ \lambda_v^{(k)}+t_kg_v^{(k)} &\mbox{if}\ 0\leq \lambda_v^{(k)}+t_kg_v^{(k)}\leq\frac{1}{\delta_v} \\ \frac{1}{\delta_v} &\mbox{otherwise} \end{cases} $$

  6. If k ≥ k_max Stop; else set k = k+1 and iterate.
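Steps 3–5 of the algorithm (subgradient computation, stepsize, projection onto \([0,\frac{1}{\delta_v}]\)) can be sketched as follows; the function and container names are ours, and the oracle solving the relaxed problem is assumed to be provided elsewhere:

```python
def subgradient_step(lam, x, y, A, delta, z_best, z_lam, eps):
    """One multiplier update of the subgradient algorithm:
    g_v = sum_{e in A(v)} x_e - 2 - delta_v * y_v, stepsize
    t = eps * (z_best - z(lam)) / ||g||^2, then projection onto
    [0, 1/delta_v].  lam, delta, y are dicts over V'; x is a dict
    over edges; A maps each v in V' to its incident edges."""
    g = {v: sum(x[e] for e in A[v]) - 2 - delta[v] * y[v] for v in lam}
    norm2 = sum(gv * gv for gv in g.values())
    if norm2 == 0:
        return lam                      # zero subgradient: keep multipliers
    t = eps * (z_best - z_lam) / norm2
    return {v: min(max(lam[v] + t * g[v], 0.0), 1.0 / delta[v]) for v in lam}
```

For instance, with a single violated vertex of degree 4 (so g_v = 2), λ_v = 0.1, z_best − z(λ) = 2 and ε = 0.1, the update moves λ_v to 0.2, well inside [0, 0.25].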

We remark that in general the subgradient method does not guarantee monotonicity of the objective function z(λ); such monotonicity is, instead, typical of the dual ascent method we are introducing next.

The effectiveness of any dual ascent procedure lies in the possibility of easily solving problem (LR) for any fixed set of multipliers, that is, of calculating z(λ), which requires the evaluation of both z_1(λ) and z_2(λ).

In particular, z_1(λ) can be calculated by solving a minimum spanning tree problem where the weight of edge e=(u,v) is λ_u + λ_v, while z_2(λ) is obtained simply by inspection, that is, by setting y_v = 1 whenever 1 − δ_v λ_v ≤ 0 and y_v = 0 otherwise:

$$z_2(\lambda)= \sum_{v \in V'}(1- \delta_v\lambda_v)_{-} $$

where we define:

$$(a)_{-}=\min\{0,a\} $$
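Following the definition above, z_2(λ) and the corresponding optimal y(λ) can be computed in a single inspection pass; a minimal sketch (the function and variable names are ours):

```python
def z2_by_inspection(lam, delta):
    """Evaluate z_2(lambda) = sum_{v in V'} min(0, 1 - delta_v*lambda_v)
    together with the minimizer y(lambda): y_v = 1 iff the coefficient
    1 - delta_v*lambda_v is nonpositive.  lam and delta are dicts
    indexed by the vertices of V'."""
    value, y = 0.0, {}
    for v, lv in lam.items():
        c = 1.0 - delta[v] * lv         # coefficient of y_v in (3.5)
        y[v] = 1 if c <= 0 else 0
        value += min(0.0, c)
    return value, y
```

With λ = {1: 0.5, 2: 0.1} and δ = {1: 4, 2: 3}, for example, vertex 1 has coefficient −1 (so y_1 = 1) and vertex 2 a positive coefficient (so y_2 = 0), giving z_2(λ) = −1.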

Now let {x_e(λ)} and {y_v(λ)} be any pair of optimal solutions to z_1(λ) and z_2(λ), respectively. If they constitute a feasible solution for the original problem, then such a solution is also ϵ-optimal, for ϵ defined as:

$$\epsilon= \sum_{v \in V'} \lambda_v \biggl(2 + \delta_vy_v(\lambda) - \sum _{e\in A(v)}x_{e}(\lambda)\biggr) $$

Suppose instead that {x_e(λ)}, {y_v(λ)} do not constitute a feasible solution for (IP); then there exists at least one node v ∈ V′ such that the corresponding constraint (3.1) is not satisfied, which means that for such a node v the following Case 1 occurs:

  • Case 1: ∑_{e∈A(v)} x_e(λ) > 2 and y_v(λ) = 0.

On the other hand, it might also happen that there exists some node v for which we have:

  • Case 2: ∑_{e∈A(v)} x_e(λ) ≤ 2 and y_v(λ) = 1.

The two cases described above are the basis for our multiplier-updating strategy in the dual ascent procedure, which is aimed at obtaining a greater value of the objective function of the Lagrangian dual problem and, therefore, a better bound for the objective function of (IP). The proposed dual ascent procedure, starting from any initial setting of the multipliers, works by updating one multiplier at a time, as explained in the next subsection.

3.2 Multipliers update

The multiplier setting and updating process is based on the following property.

Theorem 3.2

The optimal multiplier vector λ* of (LD) satisfies the condition

$$\lambda^*_v \leq \frac{1}{\delta_v}, \quad \forall v \in V' $$

Proof

Suppose that at the optimum of the Lagrangian dual (3.6) we have \(\lambda^{*}_{v'} > \frac{1}{\delta_{v'}}\) for some v′ ∈ V′. Observe that, corresponding to such a value of the multiplier, we have y_{v′}(λ*) = 1. Let \(\sigma= \lambda^{*}_{v'} - \frac{1}{\delta_{v'}} >0\) and let us decrease the value of the multiplier by the quantity σ, obtaining the new value \(\lambda_{v'}^{new}= \lambda^{*}_{v'}- \sigma\). Let us evaluate the new value of the corresponding objective function z(λ) given in Eq. (3.2). Note that the optimal value of y_{v′} is still equal to 1 when evaluated at the new value of the multiplier; moreover, since any feasible choice of the variables x_e is such that ∑_{e∈A(v′)} x_e − 2 − δ_{v′} ≤ −2, the objective function value of the relaxed problem under the new multiplier setting increases at least by 2σ, which contradicts the optimality of λ*. □

From Theorem 3.2, the following remark can be stated:

Remark 3.3

In the sequel we assume that every multiplier λ v can only range in the closed interval \([0,\frac{1}{\delta_{v}}]\).

Suppose now that for a given choice of the multiplier vector λ, Case 1 (described in the previous section) occurs for some node v′ ∈ V′, that is:

$$\sum_{e\in A(v')}x_{e}(\lambda)-2 > 0 \quad \hbox{and}\quad y_{v'}(\lambda)=0 $$

Since y v(λ)=0, then we have \(\lambda_{v'} < \frac{1}{\delta_{v'}}\) hence in this case it is possible to increase λ v while keeping y v=0. Indeed, recalling the expression of the subgradient (3.7), the v′-th component of g is g v=∑ eA(v′) x e (λ)−2>0. Consequently, we consider the case of modifying just one multiplier, in particular of increasing only the multiplier λ v. This corresponds to move along the direction e v, that is along the v′-th unit vector and it is clear that the necessary ascent condition g T e v≥0 is satisfied. Consequently, it is worth assuming such direction suitable for a line search process. Observe that by Theorem 3.2 the maximum increase to be considered for multiplier v′ is \(\frac{1}{\delta_{v'}}- \lambda_{v'}\).

Let us now examine when Case 2 occurs for a given vertex v′, that is, for a given choice of the multiplier vector λ there is some vertex v′ ∈ V′ such that:

$$\sum_{e\in A(v')}x_{e}(\lambda)-2 \leq0 \quad \hbox{and}\quad y_{v'}(\lambda)=1. $$

In this case we have \(\lambda_{v'}= \frac{1}{\delta_{v'}}\), and by Remark 3.3 a reduction of λ_{v′} is to be considered as a possible way to increase z. In fact, taking into account the subgradient (3.7), we have g_{v′} < 0; moving along −e_{v′}, the necessary ascent condition −g^T e_{v′} ≥ 0 is satisfied and −e_{v′} is a possible ascent direction for z.

Based on the above observations, we are now ready to provide the detailed description of our dual ascent procedure.

4 The algorithm

Our algorithm is an ascent method equipped with a line search along a direction that can only assume the form either of the unit vector e_v, for some v ∈ V′ (increase of just the multiplier λ_v), or of −e_v (decrease of just the multiplier λ_v).

As usual in nonsmooth optimization algorithms [15], the selected search direction cannot in general be guaranteed to be an ascent one, so possible failure of the line search has to be taken into account. More complex nonsmooth optimization algorithms capable of handling such a difficulty are described in [10, 11].

In particular we adopt a line search procedure of the Armijo type, which can provide two possible outputs:

  1. Serious step (ascent achieved), with consequent update of the current multiplier vector;

  2. Null step (no significant ascent achieved), with no update of the current multiplier vector.

We describe separately the dual ascent and the line search procedures.

Algorithm 4.1

(Dual ascent)

  1. Select λ_v in the interval \([0,\frac{1}{\delta_{v}}]\), ∀v ∈ V′.

  2. Solve (LR) for the initial setting of the multiplier vector λ; let (x(λ), y(λ)) be the optimal solution.

  3. Define V^+ = {v ∈ V′ | ∑_{e∈A(v)} x_e(λ) > 2 and y_v(λ) = 0}.

  4. If V^+ = ∅ go to step 6. Else randomly select v′ ∈ V^+ and perform the line search along the direction e_{v′}.

  5. If the line search exit is “Serious step”, update the multiplier vector by setting λ = λ^+, the output of the line search. Update also the solution (x(λ), y(λ)) of the current Lagrangian relaxation. Return to step 3. Else set V^+ = V^+∖{v′} and return to step 4.

  6. Define V^− = {v ∈ V′ | ∑_{e∈A(v)} x_e(λ) − 2 − δ_v y_v(λ) < 0}.

  7. If V^− = ∅ STOP. Else randomly select v′ ∈ V^− and perform the line search along the direction −e_{v′}.

  8. If the line search exit is “Serious step”, update the multiplier vector by setting λ = λ^+, the output of the line search. Update also the solution (x(λ), y(λ)) of the current Lagrangian relaxation. Return to step 3. Else set V^− = V^−∖{v′} and return to step 7.

We now state the line search algorithm. The input direction is either e_{v′} or −e_{v′}, according to whether the algorithm is called at step 4 or at step 7 of the dual ascent, respectively. We denote the search direction by d_{v′}. Observe that the maximum positive allowable stepsize t_max is equal to \(\frac{1}{\delta_{v'}}-\lambda_{v'}\) whenever d_{v′} = e_{v′}, while it is equal to λ_{v′} whenever d_{v′} = −e_{v′}.

The algorithm requires initialization of the following parameters:

  1. Null step threshold \(\bar{t}\);

  2. Ascent parameter 0 < p < 1;

  3. Step reduction parameter 0 < σ < 1.

Algorithm 4.2

(Line search)

  1. Set t = t_max. If \(t \leq\bar{t}\) declare “Null step” and exit. Else calculate z′, a majorizing estimate of the directional derivative of z(λ) along d_{v′}, according to the formula:

     $$z'= \begin{cases} \sum_{e \in A({v'})} x_e (\lambda)-2 &\mbox{if}\ d_{v'} = e_{v'} \\ 2+\delta_{v'}-\sum_{e \in A({v'})} x_e (\lambda) &\mbox{if}\ d_{v'} = -e_{v'} \end{cases} $$

  2. Set λ^+ = λ + t d_{v′}. Solve the Lagrangian relaxation for {z(λ^+), x(λ^+), y(λ^+)}.

  3. If

     $$z\bigl(\lambda^+\bigr) \geq z(\lambda)+ ptz' $$

     declare “Serious step” and exit, returning the updated multiplier vector λ^+.

  4. Set t = σt. If \(t < \bar{t}\) declare “Null step” and exit, else return to step 2.
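The line search can be sketched as follows; solve_lr, a stand-in for the oracle that evaluates z(λ^+) by solving the Lagrangian relaxation, is our assumption, and the default parameter values are those reported in Sect. 5:

```python
def line_search(lam, v, d, t_max, z_prime, z_lam, solve_lr,
                t_bar=0.02, p=0.1, sigma=0.05):
    """Armijo-type line search along d * e_v (d = +1 or -1).
    solve_lr(lam) must return z(lam) for the Lagrangian relaxation.
    Returns ('serious', updated lam) or ('null', unchanged lam)."""
    t = t_max
    if t <= t_bar:
        return "null", lam
    while True:
        candidate = dict(lam)
        candidate[v] = lam[v] + t * d   # lambda^+ = lambda + t * d * e_v
        if solve_lr(candidate) >= z_lam + p * t * z_prime:
            return "serious", candidate  # sufficient ascent achieved
        t *= sigma                       # step reduction
        if t < t_bar:
            return "null", lam
```

A serious step both updates the multiplier and guarantees the ascent of at least p·t·z′ used in the termination argument below.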

To verify that the proposed procedure terminates, we first observe that z′ ≥ 1; consequently, every time a serious step takes place, an increase in the objective function z of at least \(p\bar{t}\) is achieved. Since z(λ) is bounded from above, only a finite number of such bounded-away-from-zero increases may occur. Termination then follows by considering that, whenever ascent cannot be achieved, and consequently no multiplier vector update occurs, only a finite number of calls to the line search algorithm can take place.

Notice that at each iteration the solution of (LR) provides a spanning tree of the input graph and, therefore, an upper bound for the problem. Hence, at termination, the algorithm returns both a lower bound and the best upper bound among all those computed during the iterations.

5 Computational experiments and analysis of the results

We compare the bounds obtained by our Lagrangian approaches and by the relaxations of the proposed mathematical formulations on a wide set of experiments. We remark that the optimal solutions which are used for comparison purposes in this section are obtained by solving to optimality (when possible) the Single Commodity Flow formulation (2.7)–(2.15).

The main aim of our experiments is to gain insight into the trade-off between the quality of the returned solutions and the required computational time for all the proposed methodologies, which can be useful to decide which of them is preferable in different application scenarios (i.e., simply getting a lower bound, getting a good feasible solution in a reasonable amount of time, or embedding the relaxation into an exact algorithm such as Branch and Bound).

Our discussion is divided into two parts: the first one is devoted to the analysis of the procedures proposed to get a lower bound of the problem, while the second part is focused on the comparison of different algorithms able to return feasible solutions for the problem.

Hence, in the first part, we compare the subgradient (SG), the Dual Ascent (DA), the continuous and the mixed relaxations of the proposed models.

In the second part we compare the best feasible solutions returned during execution of both subgradient (SG) and Dual Ascent (DA), against those provided by three other heuristics drawn from the literature [8].

5.1 Scenarios generation and parameter settings

We generated sparse graphs of different size n and different number of edges m. We point out that the density of the graph is relevant to make the comparison among the different lower bounds meaningful. Indeed, we noticed that for instances where the density d of the graph was greater than 0.2 (where \(d=\frac{m}{\frac{n(n-1)}{2}}\)), the lower bounds obtained by the continuous LP relaxation of all the models were in most cases equal to zero, which means that the optimal solution is very probably equal to zero. Hence, our benchmark instances are sparse graphs (i.e. d<0.2), to guarantee a significant number of branch vertices in the optimal tree and therefore significant lower bound values. Parameter m has been generated according to the following formula: \(m= \lfloor(n-1)+ i\times1.5\times\lceil\sqrt{n}\,\rceil\rfloor\) with i=1,2,3,4,5. Small instances correspond to a range of n between 20 and 180, while large instances correspond to a range from 200 to 500. The total number of different scenarios is 80. We generated five instances for each scenario, hence our numerical experiments cover a total of 400 instances. All the instances, together with the detailed results, can be downloaded from the authors’ website [20]. Both the subgradient and the dual ascent algorithms have been coded in C and run on a 2.4 GHz Intel Core2 Q6600 processor. All the proposed formulations have been coded in AMPL and solved using CPLEX 11. Since the choice of the root node could affect the quality of the lower bounds, in order to make a safe comparison we always chose the same vertex, namely vertex 1, as the source vertex. A time limit of 3600 seconds has been set for solving the mathematical models. The parameters of the dual ascent algorithm have been set to p=0.1, σ=0.05 and \(\overline{t}=0.02\). The precision parameter ϵ_k of the subgradient algorithm has been fixed to 0.1 and the maximum number of iterations k_max to 1000.
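The scenario sizes implied by the formula above can be reproduced directly; a quick check (our code, not the generator used in the paper):

```python
import math

def num_edges(n, i):
    """m = floor((n-1) + i * 1.5 * ceil(sqrt(n))), i = 1..5."""
    return math.floor((n - 1) + i * 1.5 * math.ceil(math.sqrt(n)))

# For n = 100 the five scenarios use m = 114, 129, 144, 159, 174 edges;
# all satisfy the density requirement d < 0.2, i.e. m < 0.2 * n*(n-1)/2 = 990.
print([num_edges(100, i) for i in range(1, 6)])  # [114, 129, 144, 159, 174]
```

Even for the densest scenario (i = 5) the resulting graphs stay well below the d = 0.2 threshold for every n in the tested range.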

5.2 Lower bound analysis

All the results are given in Tables 1–7 and Figs. 3 and 4.

Fig. 3 Lower bound analysis: average and standard deviation of the relative gap from the optimum on the small instances

Fig. 4 Lower bound analysis: average and standard deviation of the relative gap from the optimum on the large instances

Table 1 Lower bound analysis on small instances

In particular, Tables 1 and 2 compare the quality of the lower bounds and the required computational time of SG, DA and of the different relaxations of the models, for small and large instances, respectively. In each table, the first three columns report the characteristics of each scenario: scenario ID, the number of nodes (n) and the number of edges (m). For each algorithm and each model, the tables report the average values, taken over the five tested instances, of LB (the calculated lower bound), Time (the computation time in seconds) and GAP (the relative gap between the returned lower bound and the optimum solution, when available). In particular, GAP on a given instance is computed as the ratio \(\frac{\mathit{Optimum} -\mathit{LB}}{\mathit{Optimum}}\), where Optimum is the optimum value of the problem. The average of the optimal values over the five instances is reported, when available, together with the average time. Whenever at least one instance of the scenario has not been solved to optimality by the CPLEX solver within the time limit of 3600 s, the term N.A. (Not Available) appears in the tables. In such a case, of course, no GAP value is reported for any of the tested algorithms. Since the optimum solution was not available for many of the large instances (and hence the GAP could not be computed for many scenarios), we report in Table 10 the optimum value (when available) and the best lower and upper bounds obtained on each of the 175 large instances. This table is useful for comparisons with other algorithms (the interested reader can also refer to the detailed results for the entire set of instances available on the authors’ website [20]).

Table 2 Lower bound analysis on large instances

We remark that the symbol N.A. is also used when a tested algorithm has been unable to compute the lower bound within the given time limit for at least one of the instances of the scenario. Such a case actually occurs only for the mixed continuous relaxation of MC. The results of the subgradient algorithm and of the dual ascent algorithm are reported as SG and DA, respectively. SC-Mixed and SC-Cont contain the results provided by the mixed and the continuous relaxation of the single-commodity formulation, respectively; MTZ-Mixed and MTZ-Cont refer to the mixed and the continuous relaxation of the Miller-Tucker-Zemlin formulation, respectively. Finally, MC-Cont and MC-Mixed report the results of the continuous and the mixed relaxation of the multicommodity-flow formulation, respectively, while SC gives the results of the exact solution (when available). To summarize the behavior of the solver CPLEX on the tested instances, we remark that all 225 small instances but only 128 out of 175 large ones have been solved to optimality within the time limit. As far as MC-Mixed is concerned, 209 of the small instances and only 44 out of 175 large ones have been solved to optimality within the time limit.

In Figs. 3 and 4 we report the average and the standard deviation of the relative gap, computed over all the instances for which the optimum was available.

Finally, Tables 3 and 4 (for small instances) and Tables 5 and 6 (for large instances) provide a complete comparison of the performances, both in terms of quality of the lower bound and of computational time. The generic cell (i,j) in these tables reports the percentage of instances for which the procedure associated with row i performed strictly better than the one associated with column j.

Table 3 Lower bound analysis: dominance with respect to quality of the solution on small instances. The generic cell (i,j) in the table reports the percentage of small instances for which the procedure associated with row i showed a strictly better bound than the one associated with column j
Table 4 Lower bound analysis: dominance with respect to computational time on small instances. The generic cell (i,j) in the table reports the percentage of small instances for which the procedure associated with row i showed a strictly better computational time than the one associated with column j
Table 5 Lower bound analysis: dominance with respect to quality of the solution on large instances. The generic cell (i,j) in the table reports the percentage of large instances for which the procedure associated with row i showed a strictly better bound than the one associated with column j
Table 6 Lower bound analysis: dominance with respect to computational time on large instances. The generic cell (i,j) in the table reports the percentage of large instances for which the procedure associated with row i showed a strictly better computational time than the one associated with column j
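The dominance percentages in these tables can be computed mechanically from the per-instance results. A minimal sketch (hypothetical names, not the code used in the paper) that builds such a matrix is:

```python
def dominance_matrix(results, better):
    # results: procedure name -> list of per-instance values, with the
    # same instance ordering for every procedure.
    # better(a, b): True when value a is strictly better than value b
    # (a > b when comparing lower bounds, a < b when comparing times).
    names = list(results)
    n_instances = len(next(iter(results.values())))
    table = {}
    for i in names:
        for j in names:
            wins = sum(better(a, b)
                       for a, b in zip(results[i], results[j]))
            # Cell (i, j): percentage of instances where i beat j.
            table[i, j] = 100.0 * wins / n_instances
    return table
```

With `better = lambda a, b: a > b` this yields the bound-quality tables (Tables 3 and 5), and with `a < b` the computational-time tables (Tables 4 and 6).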

Some comments on the results are in order.

We observe first that the best bounds are those returned by the mixed relaxations (as expected), which in many cases attain zero gap. More specifically, out of the 353 instances for which the optimal solution is available, zero gap has been achieved 343, 256 and 253 times by SC-Mixed, MTZ-Mixed and MC-Mixed, respectively.

Comparing the mixed relaxations with each other (refer to Tables 1 and 2), the SC-Mixed relaxation in the worst case returns a bound that is 3 % away from the optimum value on the small instances (see instances number 12 and 25) and 0 % on the large ones. The worst bound for the MTZ-Mixed relaxation is 7 % away from the optimum on the small instances (instances number 12 and 19) and 2 % on the large ones. The MC-Mixed relaxation, whenever it is solved to optimality within the given time limit, always provides zero gap.

Comparing the bounds provided by SG and DA, we observe that they are, as expected, comparable with those provided by the continuous relaxations SC-Cont and MTZ-Cont. On the other hand, SG (see Tables 3 and 5) works better than DA in 84 % of the experiments on the small instances and 98 % on the large ones.

The picture changes completely when the experiments are considered from the point of view of computational time. DA works uniformly better than SG and the continuous relaxations, and its computational time is, as expected, orders of magnitude lower than that of the mixed relaxations.

Finally, in order to evaluate the robustness of the bounds, we observe from Figs. 3 and 4 that the average value of the gap (whenever available) is relatively stable on both small and large instances in all cases, with the mixed relaxations also exhibiting quite small values of the standard deviation.

When this aspect is considered together with the corresponding computational time, the final conclusion is that SG and DA are preferable to the three continuous relaxations in terms of the trade-off between solution quality and computational time. More generally, we believe that SG and DA are the tools of choice whenever relaxed models must be run many times to obtain quick lower bounds (as is the case within a Branch and Bound framework). Whenever, instead, one needs a tighter lower bound, the use of any mixed relaxation is definitely advisable.

We have also evaluated the role of the constraints (2.16), (2.37), (2.38) and (2.39) in the improvement of the bounds. In particular, we have proceeded on a leave-one-out basis for the mixed and the continuous relaxations of both the single-commodity flow formulation and the Miller-Tucker-Zemlin formulation.

The results are reported in Table 7, where each row corresponds to a family of constraints, while the columns report the average and the standard deviation of the relative difference of the bounds, computed over all 400 instances, once those constraints are removed.
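A sketch of this leave-one-out evaluation follows. The exact normalization of the relative difference is not restated here, so the choice of dividing by the bound of the full model is an assumption, as are all names:

```python
import statistics

def constraint_effect(lb_full, lb_removed):
    # Relative difference of the bounds once a constraint family is
    # removed. The normalization (LB_full - LB_removed) / LB_full is an
    # assumption; strictly positive bounds are assumed as well.
    return [(f - r) / f for f, r in zip(lb_full, lb_removed)]

def table7_row(lb_full, lb_removed):
    # Average and standard deviation over all instances, as in Table 7.
    diffs = constraint_effect(lb_full, lb_removed)
    return statistics.mean(diffs), statistics.pstdev(diffs)
```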

Table 7 Lower bound analysis: evaluation of effectiveness of constraints (2.16), (2.37), (2.38) and (2.39) on the quality of the solution returned by SC-Cont, SC-Mixed, MTZ-Cont and MTZ-Mixed

The average improvement of the lower bound due to constraints (2.16) is higher for the continuous relaxations of both models than for the corresponding mixed ones, where, however, it plays an important role as well. Such an improvement is around 33 % on average for the continuous relaxations and 15 % for the mixed ones. On the MTZ relaxations, constraints (2.16) have the same effectiveness as constraints (2.37). Finally, constraints (2.38) and (2.39) are the least effective, yielding an average improvement of the bound on MTZ-Cont equal to 0.2 % and not affecting at all the lower bound returned by MTZ-Mixed.

5.3 Upper bound analysis

All the results are reported in Tables 8 and 9 and Figs. 5 and 6. Table 10 reports the best lower and upper bounds obtained on each of the 175 large instances.

Fig. 5

Upper bound analysis: average and standard deviation of the relative gap from the optimum on the small instances

Fig. 6

Upper bound analysis: average and standard deviation of the relative gap from the optimum on the large instances

Table 8 Upper bound analysis on small instances
Table 9 Upper bound analysis on large instances
Table 10 Optimum solution (whenever available) and best lower and upper bounds for each of the large instances

In particular, Tables 8 and 9 compare the quality of the upper bounds and the required computational time of SG, DA and of the three heuristics EWS, NCH and MIXED presented in [8], for small and large instances, respectively. An interesting property of the SG and DA algorithms is that they compute an upper bound of the problem at no additional CPU time. The structure of these tables is identical to that of Tables 1 and 2. In each table, the first three columns report the characteristics of each scenario: scenario ID, the number of nodes (n) and the number of arcs (m). For each algorithm, the tables report the average values, taken over the five tested instances, of UB (the computed upper bound), Time (the computation time in seconds) and GAP (the relative gap between the returned upper bound and the optimum solution, when available). In particular, GAP is computed as the ratio \(\frac{\mathit{UB}-\mathit{Optimum}}{\mathit{UB}}\). The average of the optimal values over the five instances is reported, when available, together with the average time. Whenever at least one instance of the scenario has not been solved to optimality by the solver CPLEX within the time limit of 3600 s, the term N.A. (Not Available) appears in the tables. In such a case, of course, no GAP value is reported for any of the tested algorithms.
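Note that the upper-bound gap mirrors the lower-bound one but is normalized by UB rather than by the optimum, so it likewise lies in [0,1). A one-line sketch (hypothetical name, not the authors' code):

```python
def upper_bound_gap(ub, optimum):
    # Per-instance gap as defined in the text: (UB - Optimum) / UB.
    # Assumes a strictly positive upper bound.
    return (ub - optimum) / ub
```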

First note that all five algorithms are very fast: every instance was solved within one second. The best upper bound is returned by SG. Figures 5 and 6 report the average and standard deviation of the relative gap between the upper bound and the optimum value (whenever available), for each algorithm on small and large instances, respectively. On the small instances, the average gap from the optimum is 0.27 for SG, 0.34 for DA, 0.38 for EWS, 0.40 for NCH and 0.38 for MIXED. The corresponding standard deviation varies between 0.24 and 0.25, revealing quite a stable behavior of all the algorithms in terms of bound quality on all the small instances. The same observation holds for the large instances.

6 Conclusions

We analyzed a relevant problem arising in telecommunication networks, namely, the problem of finding a spanning tree with the minimum number of branch vertices. Different procedures to determine both a lower bound and an upper bound on the optimum solution of the problem were proposed, tested and compared. A thorough analysis of the results was performed to gain insight into the trade-off between the quality of the returned solutions and the required computational time for all the proposed methodologies.

When analyzing the quality of the upper bound, we compared our Lagrangian approaches, SG and DA, with three heuristics existing in the literature. All the tested methodologies are very fast; however, SG and DA show better performance in terms of solution quality.

When analyzing the quality of the lower bounds, the final conclusion is that SG and DA are preferable to the three continuous relaxations in terms of the trade-off between solution quality and computational time. We point out that, among all the considered procedures, SG and DA are the only ones able to provide simultaneously, in very short computational time, both an upper bound (better than the existing heuristics) and a lower bound (whose quality is comparable to that of the continuous relaxations of the considered models). Our final conclusion is then that SG and DA are the tools of choice whenever relaxed models must be run many times to obtain quick lower and upper bounds (as is the case within a Branch and Bound framework). Whenever, instead, one needs a tighter lower bound, the use of any mixed relaxation is definitely advisable.