1 Introduction

A flexible and agile next-generation optical network architecture requires reconfigurable optical add-drop multiplexers (ROADMs) with colorless, directionless, and contentionless (CDC) features. Such an infrastructure, controlled via software, can facilitate rapid end-to-end service provisioning and restoration in the network [1, 2]. Contentionless ROADMs can handle the same wavelength at the same add/drop structure, avoiding the use of optoelectronic regenerators for wavelength conversion. ROADM nodes also allow optical bypass, which further reduces the number of optoelectronic regenerators, except where they are needed to maintain signal quality. To prevent signal deterioration caused by noise sources in optical communication systems, optical regenerators that perform signal re-amplification, re-shaping, and re-timing (3R) are used. Providing 3R optoelectronic regeneration capability at a selected set of nodes leads to a translucent optical network (TON) [3]. Deploying ROADMs with regenerators at selected locations reduces the time needed to provision a new communication request remotely and also enables recovery from network failures [2, 4]. This contrasts with the traditional approach of placing regenerators only when they are required to set up a lightpath whose route exceeds the optical reach (typically in the range of 350 km to 5000 km [5]). Although not all connections use the maximum optical reach, some must be regenerated before it is strictly required in order to improve load balancing. Deploying regenerators in advance at a few ROADM sites reduces the overall CAPEX and OPEX [4]. Moreover, managing a larger number of regenerators at a single site is more cost-effective than managing smaller numbers of regenerators at multiple sites [6]. In the upcoming elastic optical networks (EONs), the bandwidth availability on a fiber link and the selected modulation level determine the data transfer capacity [7].
The optical reach constraint limits the freedom that EONs offer in selecting the modulation scheme. The regenerator placement problem becomes even more relevant in EONs, since connections have variable optical reach. This study models a variable optical reach that is tuned by a multiplicative factor, termed the stretch factor.

In the literature, the regenerator placement (RP) problem has been considered with both topology-based (also called connectivity-dependent [8]) and traffic prediction-based (also called path-based) strategies. Topology-based algorithms include nodal degree first (NDF), centered node first (CNF) [9], and the analytical model (AM) [10]; traffic-based algorithms include traffic load prediction (TLP) and signal quality prediction (SQP) [7]. Topology-based RP solutions are easy to implement but do not give high network performance, because the network is subject to variable traffic patterns that were not considered when the regenerators were placed during the design phase. On the other hand, RP algorithms based on traffic load distributions are computationally intensive but have been found to provide better network performance for the load distribution used during the optimization process [11]. Knowing the exact traffic matrix is difficult, whereas the traffic flows at a node can be monitored with less effort. All traffic prediction-based RP solutions in the literature rely on knowledge of the exact traffic matrix, which motivates this study, where only the traffic flows at a node are considered.

The authors of [12] established that, if X, Y, and Z denote the number of regenerators required when lightpath routing considers (a) only simple paths, (b) all simple and non-simple paths, and (c) all simple paths and only those non-simple paths that do not share an edge traversed in the same direction, respectively, then \(X \ge Z \ge Y\). Considering type Y routes in lightpath establishment increases the availability of alternative paths but further adds to the complexity of the RP problem [13]. In our earlier study [14], we presented solutions to the routing and wavelength assignment problem considering type Y routes under sparse regenerator deployment.

TLP-based RP solutions consider either static (also called permanent lightpath demand) or dynamic traffic demands [15,16,17]. In the static scenario, the complete demand matrix is known a priori, whereas in the dynamic scenario the lightpath requests may be scheduled (also called scheduled lightpath demand, with the start/stop time of each demand being known) or ad hoc (also called ad hoc lightpath demand, with no knowledge of the start/stop times of a demand). Because of the variety of applications, such as video streams and P2P traffic with different traffic characteristics, conventional design methods that use a single traffic pattern are inadequate for unpredictable traffic patterns. The work in [18] studied a different version of the RP problem in which k possible traffic patterns are assumed to be available, and the objective is to place the minimum number of regenerators that satisfies each of these patterns.

In the pipe model of traffic, the full traffic matrix, that is, the exact demand between each node pair, is assumed to be known. However, since it is difficult to determine the exact traffic matrix in practice, the authors of [19, 20] used an uncertain demand model known as the hose model to account for traffic uncertainty. The hose model was shown to bring greater flexibility in designing network operations [21], since it only considers an upper bound on the total ingress/egress demand at each node of the network. We introduce a new perspective on the regenerator site selection problem in TONs with the hose traffic model. To the best of our knowledge, no earlier study has addressed the RP problem under the hose model of traffic.

It must be noted that, although earlier studies in conventional IP networks used the hose model, their results are not applicable to WDM networks because of the differences in multiplexing requirements [22]. In a WDM network, each wavelength must be used for a unique lightpath, and the choice of incoming and outgoing wavelengths at an intermediate node is governed by the wavelength continuity constraint. The availability of transponders also restricts the selection of outgoing and incoming wavelengths at the source and destination nodes. In line with the assumption of full wavelength conversion capability, we too rely on the availability of ROADM nodes to overcome this challenge [23].

The authors of [3] discussed the practical deployment issues to be considered in the RP problem. The suggested performance metrics take into account power, space, and workforce availability at the nodes. The generalized RP problem in [24, 25] cites the practical difficulty of placing regenerators at all network nodes, and the algorithms proposed there consider only a subset of nodes as candidate nodes. In contrast, we assume every node to be a candidate node, with its limitations defined by continuous values for the regeneration cost and regeneration capacity parameters. To simplify our experiments, the cost and capacity parameters are assumed to be random positive values. Choosing these parameters based on practical aspects of each site (i.e., defining a function with node-specific input parameters) is left for future work.

The main contributions of this paper are:

  • We address the RP problem in translucent optical networks with the regenerators restricted to a limited number of nodes.

  • Instead of using either network topology-based or traffic-based algorithms alone, we use a hybrid approach for the RP problem.

  • For the first time, the uncertainty of the traffic matrix is addressed by the use of the hose model for the RP problem.

  • We use ILP formulations that minimize the number of regenerator sites to lower CAPEX and OPEX.

  • We propose RP solutions that aid the subsequent allocation of regenerators (i.e., during the lightpath routing phase) by allowing non-simple paths to establish the lightpaths in TONs.

  • We show the efficacy of our proposed solution by comparing the results with another in the literature.

The rest of this paper is organized as follows. Section 2 discusses the literature on the RP problem. We present the ILP formulations in Sect. 3. In Sect. 4, we present our approaches to solving the problem, including the proposed exact and heuristic algorithms. Results from the numerical evaluation of the ILP formulations and the heuristic algorithm are presented in Sect. 5, where we also compare the performance of the proposed heuristic with another from the literature. Section 6 concludes the paper.

2 Literature review

There has been a lot of research in the area of translucent optical networks (TONs) [26]. For a survey of the literature on network planning and operation in TONs, the reader may refer to [27, 28]. Most of the research on the RP problem has presented heuristics for solving it [11, 29, 30]. There have also been a few works proposing exact algorithms for the RP problem [31, 32]. In this section, we review the literature related to our work on regenerator placement.

The RP problem has some similarity with the problems of wavelength converter placement [33, 34], relay network design [35] and hub location [36]. The RP problem can be formulated as a maximum leaf spanning tree problem (MLSTP), that aims to find a spanning tree of an undirected graph maximizing the number of leaves [37,38,39,40]. Alternately, the minimum connected dominating set problem (MCDSP), that aims to find a connected dominating set of a graph with minimum cardinality can also be used for this problem [41, 42]. The works in [24, 43] studied a set covering (SC) formulation of the RP problem and proposed heuristics with K-center and K-shortest path-based approaches. Under the hose traffic model, the RP problem may be formulated as an MCDSP (presented in Sect. 3).

The authors of [8, 41, 42, 44, 45] also formulated the RP problem as an MCDSP, but with an objective to ensure the existence of a single path between each source-destination pair and without considering the node-specific constraints. The exact ILP formulations were presented in [44, 45]. However, the regenerator deployment strategy used places regenerators only in the island boundaries. Furthermore, the work attempts an arbitrary degree of end-to-end connectivity by solving the K-connected K-dominating set problem, by finding 1-connected dominating 3R node-set. The number of nodes in this set is not predefined and is an output of our formulation. The mixed-ILP formulation in [45] is based on the arc-chain formulation to set up paths between all not-directly connected source-destination pairs in the reachability graph and solved using a branch-and-price algorithm.

Based on the categorization of RP algorithms discussed in [11], we identify two main approaches that are jointly used in the literature: node counter and ranking (NC&R) and transitional weight (TW) [46]. In the NC&R strategy, a dedicated counter is assigned to each network node. The counter is incremented depending on the heuristic used by the RP algorithm. At the end of the process, the nodes whose counters hold the highest values are selected for placement (ranking). The NX-policy [9, 46,47,48] stands for the class of RP algorithms that assume knowledge of the number of desired translucent nodes (N) and the number of regenerators to be placed in each selected node (X). The TW is often used as the counter increment strategy: it assumes a given routing algorithm, and each time a route passes through a given node, the counter of this node is incremented or not according to the heuristic used. There is also the topology-TW strategy, where only one route linking each possible pair of source-destination nodes is considered, and the traffic-TW strategy, where a given traffic matrix is considered during the algorithm evaluation.
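As a rough illustration of the NC&R idea combined with a topology-TW increment, the following Python sketch counts how often each node is traversed as an intermediate node and ranks the nodes by that count. The function names `shortest_path` and `nc_and_r`, the use of hop-count BFS routing, the "+1 per traversal" increment rule, and the five-node chain are all assumptions made for illustration; they are not taken from [11] or [46].

```python
from collections import deque
from itertools import combinations

def shortest_path(adj, src, dst):
    """Return one hop-count shortest path from src to dst via BFS, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def nc_and_r(adj, n_sites):
    """Node counter and ranking with one route per source-destination pair."""
    counter = {u: 0 for u in adj}
    for s, d in combinations(sorted(adj), 2):
        path = shortest_path(adj, s, d)
        if path is None:
            continue
        for node in path[1:-1]:  # assumed increment rule: +1 per traversal
            counter[node] += 1
    # Ranking step: highest counter first, node id breaks ties.
    ranked = sorted(counter, key=lambda u: (-counter[u], u))
    return ranked[:n_sites], counter

# Hypothetical five-node chain 0-1-2-3-4.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
sites, counts = nc_and_r(chain, n_sites=1)
print(sites, counts)  # [2] {0: 0, 1: 3, 2: 4, 3: 3, 4: 0}
```

In this toy chain the middle node is selected, which matches the intuition that a purely topology-driven counter favors central nodes regardless of the actual traffic.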

A recent paper on EONs [49] presented a new strategy for increasing regeneration capacity when required to accommodate additional dynamic demands. The authors of [50] present a deep learning model to predict the best regenerator placement in EONs and claim that a smaller set of network usage features is sufficient for acceptable QoS. The researchers in [51] propose machine learning approaches that use usage features in EONs to predict the optimal regenerator placement. A relatively new approach with predeployment of back-to-back tunable transponders in EONs is considered in [52]. That work proposes to use transponders for transmitting/receiving the signal at the source/destination nodes (add/drop) and for regenerating the signal at some intermediate nodes. In [53], a solution for the RP problem in IP-over-EON is presented, with results showing improved energy efficiency compared to other works in the literature. In [54], digital signal processing at the transceivers and hybrid Raman/erbium-doped-fiber amplifiers are shown to help reduce the need for OEO regenerators while supporting 200 Gbps any-to-any traffic. The authors of [55] propose strategies for optimizing regenerator locations and 3R units at each site in a reconfigurable TON. That work advocates the predeployment of regenerators for network failure scenarios. In [56], six heuristics for regenerator placement and allocation in EONs are compared by simulation.

In general, there is a trade-off between the number of regenerators placed and the resources used. The findings of [57] indicate that the suggested evolutionary algorithm may reduce spectrum use with fewer regeneration nodes. Depending on the network load, the number of regenerators required with complete knowledge of QoT (quality of transmitted signals, obtained using monitoring devices) can be further reduced by 20% to 50% [58]. Supplementary regenerators release spectrum resources and improve income for operators, since the network can then accommodate more requests [59]. According to recent research [60], the number of regenerators must be fixed depending on the energy efficiency of different network designs. More regenerators in the network are justifiable if the financial advantage of lowering the blocking ratio compensates for the higher cost due to energy consumption. Increasing the number of regenerators alleviates reach-blocking; with less reach-blocking, capacity-blocking becomes more noticeable as the load increases. The best trade-off between blocking probability and network cost when extra regenerators are used is obtained when adaptive methods are applied to choose modulation formats and routing paths [61]. A higher modulation level results in a shorter transmission reach, so a route of a given length needs additional regenerators. Furthermore, the maximum transmission reach rises as the optical transmission power increases, but at a higher cost in terms of power consumption. As a result, an optimal combination of transmission power and regenerator number is desired [62].

We identified the following limitations in the literature, which our work avoids. Existing RP approaches propose traffic-based or topology-based heuristics that fall under the NC&R scheme, the TW scheme, or a mix of both. The distribution of regenerator sites that satisfies projected regeneration demands (forming a connected regeneration backbone) so as to serve any future demand is not considered in any of the prior studies. Earlier works assumed either a static or a dynamic pipe model of traffic, and the regenerator nodes were selected purely on the basis of a ranking of nodes calculated from the number of regenerations. Node-specific cost factors and constraints were not considered in any of the earlier studies. All the earlier proposals use regenerators as late as possible, only when the optical reach is exhausted. We also found that none of the algorithms proposed earlier accepts non-simple lightpaths in the network.

3 Problem formulation

As part of the network survivability requirements, lightpaths are re-routed around failed nodes or edges. There is a limit on the number of regenerator units supported by each node of the network. As discussed earlier, the inclusion of type Y routes involving non-simple paths makes the selection of regenerator sites critical. A lightpath starting at any node should be able to reach its destination with only one regeneration in the network before the optical reach limit is exhausted. Determining the traffic demand in a dynamic network is also hard. Keeping all these constraints in mind, we formulate the RP problem with a different model. First, we summarize the notation used in the proposed optimization model in Table 1 and then discuss the details.

Table 1 Notations used in the optimization model

3.1 Network model

We consider a wide-area WDM network \(\mathcal {G = (N, E)}\) in which O-E-O regenerators are to be placed at a few ROADM nodes. We assume that the fiber links are bidirectional (i.e., for every link from node i to node j, there is also a link from j to i). All nodes in \(\mathcal {N}\) deploy ROADMs. The regeneration capacity \(\zeta _{i}\) of node i depends on the electric power availability, the workforce availability, and the size of the site. The regeneration demand \(\delta _{i}\) is derived from the projected volume of traffic flow at node i and the lightpath regeneration probability of node i (proposed in [10]).

The regenerator sites are satisfactorily selected if the subset of nodes \(\mathcal {N}^{\psi } \subset \mathcal {N}\) is reachable by \(\Re\). We say that \(\mathcal {N}^{\psi }\) is reachable by \(\Re\) if and only if the following conditions hold.

  i. For every \(i \in \mathcal {N}^{\psi }\), there exists a node \(j \in \mathcal {N}^{\psi }\) for which \(\Delta (i,j) \le \Re\). A lightpath starting from a regenerator site (with full signal strength after regeneration) can regenerate at another site within distance \(\Re\). This also ensures that the lightpath routes are not confined to one single site (or region).

  ii. For every \(i \in \mathcal {N}\), the total capacity of those nodes \(j \in \mathcal {N}^{\psi }\) for which \(\Delta (i,j) \le \Phi \times \Re\), with stretch factor \(0 \le \Phi \le 1\), is greater than or equal to \(\delta _{i}\). The regeneration demand (\(\delta _{i}\)) at a location i must be satisfied by the total regeneration capacity contributed by the regenerator sites located within distance \(\Phi \times \Re\). \(\Phi\) models a managerial decision parameter. In a flexible network, for a given communication demand, maintaining the required data rate using a higher modulation level needs less bandwidth but has a restricted optical reach. Considering the quality of service requirements, the network management unit may control the data transfer capacity. A lightpath can be transmitted for a maximum distance of \(\Re\) with a lower modulation level without much signal attenuation. A smaller value of \(\Phi\) models the higher modulation level required for bandwidth-efficient transmission in the network; this in turn implies that more regeneration sites should be available.

  iii. For every \(i,j \in \mathcal {N}^{\psi }\), if \(\omega _{ij}\) represents the hop count of the shortest path connecting nodes i and j in G, then \(\Delta (i,j) \le \omega _{ij} \times \Re\). The regenerator network, where each regenerator site is separated from another by at most a distance of \(\Re\), must span the whole network (i.e., graph \(\mathcal {G}\)). To model this, we should assign \(\Re\) a more conservative value (say, the maximum optical reach of a lightpath in heavy traffic conditions). This constraint guarantees that the serving area of the regenerator sites covers all possible lightpath demands in different traffic scenarios.
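Conditions (i) and (ii) can be checked directly for a candidate site set, as in the following minimal Python sketch. The function name `check_conditions`, the distance-matrix representation of \(\Delta\), and the four-node line instance are assumptions made for illustration. The sketch also assumes that the node j in condition (i) is a site different from i, and that a single-site set satisfies condition (i) vacuously.

```python
def check_conditions(dist, sites, capacity, demand, reach, stretch):
    """Return (cond_i, cond_ii) for a candidate set of regenerator sites.

    dist[i][j] plays the role of Delta(i, j); `reach` is the optical reach
    and `stretch` is the stretch factor Phi, with 0 <= Phi <= 1.
    """
    sites = sorted(set(sites))
    n = len(dist)
    # Condition (i): every site has another site within the optical reach.
    cond_i = len(sites) <= 1 or all(
        any(j != i and dist[i][j] <= reach for j in sites) for i in sites
    )
    # Condition (ii): the demand at every node is covered by the capacity
    # of the sites located within distance Phi * reach.
    radius = stretch * reach
    cond_ii = all(
        sum(capacity[j] for j in sites if dist[i][j] <= radius) >= demand[i]
        for i in range(n)
    )
    return cond_i, cond_ii

# Hypothetical instance: four nodes on a line, 300 km apart.
positions = [0, 300, 600, 900]
dist = [[abs(a - b) for b in positions] for a in positions]
capacity = [4, 4, 4, 4]
demand = [2, 3, 3, 2]

ok = check_conditions(dist, {1, 2}, capacity, demand, reach=600, stretch=0.5)
tight = check_conditions(dist, {1, 2}, capacity, demand, reach=600, stretch=0.4)
print(ok, tight)  # (True, True) (True, False)
```

Lowering the stretch factor from 0.5 to 0.4 shrinks the serving radius from 300 km to 240 km, so the end nodes lose coverage. This is the effect described in condition (ii): a smaller \(\Phi\) calls for more regeneration sites.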

3.2 Problem formulation: a mixed-integer quadratically constrained program (MIQCP)

We attempt to optimize the total cost (CAPEX & OPEX) by minimizing regeneration facilities in the network (i.e., \(\sum _{i = 1}^{\eta }{c_{i} \times \upsilon _{i}}\)). Let \(\mathcal {N}_{i}^{\Phi \Re } = \{j \in \mathcal {N} | \ \Delta (i,j) \le \Phi \times \Re \}\), represent the set of nodes reachable from i within distance of \(\Phi \times \Re\). We may equivalently write the condition (ii) as \(\sum _{j\in \mathcal {N}_{i}^{\Phi \Re }}\zeta _{j} \times \upsilon _{j} \ge \delta _{i}, \forall i \in \mathcal {N}\).

Let \({\tilde{\mathcal G}} = ({\tilde{\mathcal N}},{\tilde{\mathcal E}})\) represent the reachability graph (also called the connectivity graph) of \(\mathcal {G}\), where \({\tilde{\mathcal N}}=\mathcal {N}\) and \({\tilde{\mathcal E}}=\{ (i,j) | \Delta (i,j) \le \Re ;\; i,j \in \mathcal {N};\; i \ne j \}\). The connected dominating set of graph \({\tilde{\mathcal G}}\) may be represented by a subgraph \(\mathcal {S}\) of \({\tilde{\mathcal G}}\), where every \(i \in {\tilde{\mathcal N}}\) with \(\upsilon _{i} = 1\) denotes a node in \(\mathcal {S}\) (i.e., a regenerator site). In other words, \(\mathcal {S}\) represents a subgraph whose nodes are exactly the regenerator nodes of \(\mathcal {G}\). So, instead of formulating the problem on the original graph \(\mathcal {G}\), we may use graph \({\tilde{\mathcal G}}\) to identify the regenerator sites. Condition (iii) may be realized by the following network flow model.
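The construction of the reachability graph can be sketched in Python as follows. The sketch assumes that \(\Delta (i,j)\) is the shortest-path distance over the fiber links, which is an interpretation rather than something stated here, and computes it with the Floyd-Warshall algorithm. The function names and the four-node example topology are hypothetical.

```python
import math

def all_pairs_distance(n, links):
    """Floyd-Warshall over bidirectional links given as {(i, j): length_km}."""
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), length in links.items():
        dist[i][j] = min(dist[i][j], length)
        dist[j][i] = min(dist[j][i], length)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def reachability_edges(dist, reach):
    """Edge set of the reachability graph: pairs with Delta(i, j) <= reach."""
    n = len(dist)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and dist[i][j] <= reach}

# Hypothetical topology: a chain 0-1-2-3 with link lengths in km.
links = {(0, 1): 400, (1, 2): 500, (2, 3): 300}
dist = all_pairs_distance(4, links)
edges = reachability_edges(dist, reach=800)
print(sorted(edges))
```

With an optical reach of 800 km, nodes 1 and 3 become adjacent in the reachability graph even though they are not directly linked by a fiber, while nodes 0 and 2 (900 km apart) do not.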

A virtual flow (i.e., a lightpath with a source node and a destination node) exists if the source and destination are connected; otherwise, the flow fails to reach the destination. Suppose that there is a lightpath start node \(O_{i}\) attached to node i, and that it has \(\eta\) units of flow request to be forwarded on \({\tilde{\mathcal E}}\) through node i. The flow residue (not routed by the network) is represented as \(\upsilon _{i}^{o}\), with \(0 \le \upsilon _{i}^{o} \le \eta\). With \(x_{j}=1\) for every regenerator node j, we route one unit of flow to each such node. We use \(y^{jk}_{i}\) to denote the amount of flow on edge (j, k) originating from \(O_{i}\). Therefore, we can make sure that the flow routed from node \(i \in {\tilde{\mathcal G}}\) reaches all those nodes j with \(x_{j}=1\) if it satisfies the following constraints:

$$\begin{aligned}&\upsilon _{i}^{o} + y^{0i}_{i} =\; \eta \end{aligned}$$
(1)
$$\begin{aligned}&0\le y^{jk}_{i} \le \; \eta \times \upsilon _{k}, \forall (j,k)\in {\tilde{\mathcal E}}\cup (0_{i},i) \end{aligned}$$
(2)
$$\begin{aligned}&\sum _{j|(j,k)\in {\tilde{\mathcal E}}}y^{jk}_{i} = \; \upsilon _{k} + \sum _{l|(k,l)\in {\tilde{\mathcal E}}}y^{kl}_{i}, \forall k\in {\tilde{\mathcal N}} \end{aligned}$$
(3)
$$\begin{aligned}&\sum _{j\in {\tilde{\mathcal N}}}\upsilon _{j} = \; y^{0i}_{i} \end{aligned}$$
(4)
$$\begin{aligned}&0\le \; \upsilon _{i}^{o}. \end{aligned}$$
(5)

Equation (1) states that the total amount of flow \(y^{0i}_{i}\) moving out of the source \(O_{i}\) plus the residue flow \(\upsilon _{i}^{o}\) in \(O_{i}\) equals \(\eta\). Here \(\eta\) is the size of the node set of the network \(\mathcal {G}\), and it represents the upper bound on the flow that can be successfully routed in the network. For nodes with \(x_{k}=1\) (i.e., when \(k \in {\tilde{\mathcal N}}\) is a regenerator node), the maximum incoming flow can be \(\eta\); otherwise it is zero. Constraint (2) ensures that only a regenerator node (as a sink in \({\tilde{\mathcal G}}\)) can receive incoming flows. For each \(k \in {\tilde{\mathcal N}}\), equation (3) states that the total incoming flow to k must exactly match the sum of its outgoing flow and the sink amount. Equation (4) ensures that the total of all flows originating from \(i \in {\tilde{\mathcal N}}\) equals the total routed to the sinks. Finally, condition (5) says that the residue flow at a source node must be non-negative. The above conditions necessitate a regenerator site at node i; otherwise, no flow originating from \(O_{i}\) can be routed to the sinks.
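Read combinatorially, the flow constraints certify that every regenerator site can be reached from a given site by moving only through regenerator sites in the reachability graph. The Python sketch below checks that property with a breadth-first search instead of solving the flow model. It is an illustrative equivalent under that reading, not the formulation itself; the function name `sites_connected` and the example graph are hypothetical.

```python
from collections import deque

def sites_connected(edges, sites):
    """Return True if `sites` induce a connected subgraph.

    edges: set of directed pairs (j, k) of the reachability graph,
    assumed to contain both directions of every pair.
    """
    sites = set(sites)
    if len(sites) <= 1:
        return True  # assumption: an empty or single-site set is connected
    source = min(sites)
    reached = {source}
    queue = deque([source])
    while queue:
        j = queue.popleft()
        # Only regenerator sites may receive flow, as in constraint (2).
        for k in sites:
            if k not in reached and (j, k) in edges:
                reached.add(k)
                queue.append(k)
    # Analogue of equation (4): flow routed equals the number of sites.
    return len(reached) == len(sites)

undirected = [(0, 1), (1, 2), (1, 3), (2, 3)]
edges = {(a, b) for a, b in undirected} | {(b, a) for a, b in undirected}
print(sites_connected(edges, {0, 1, 3}))  # True
print(sites_connected(edges, {0, 2, 3}))  # False
```

In the second call, node 0 is adjacent only to node 1, which is not a site, so no flow leaving node 0 could reach the other two sites.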

We are now in a position to formalize the RP problem as follows.

Definition: Given an optical network with an average optical reach distance, regeneration demand, capacity, and cost of regeneration for each node, the RP problem is to determine the minimum number of network nodes for regenerator placement, such that demand at each node is satisfied, and there exist lightpath routes of which no sub-path without internal regenerators has a length greater than the optical reach and yet the overall cost of regeneration is minimum.

Optimization problem: Let \(\mathcal {S}\) denote the connected dominating set (CDS) of \(\mathcal {G}\) and \(\mathcal {N(S)}\) its node set. Each node \(i \in \mathcal {G}\) has a regeneration capacity \(\zeta _{i} \in \mathbb {Z^{+}}\), a regeneration demand \(\delta _{i} \in \mathbb {Z^{+}}\), and a node set \(\mathcal {N}_{i}^{\Phi \Re }\). Given \({\tilde{\mathcal G}} = ({\tilde{\mathcal N}},{\tilde{\mathcal E}})\) and the associated cost \(c_{i} \in \mathbb {Z^{+}}\) of every node i, the following formulation (which we name \(RP^{MIQCP}\)) defines the RP problem.

$$\begin{aligned}&\text{ minimize }\quad \sum _{i = 1}^{\eta }{c_{i}\times \upsilon _{i}} \end{aligned}$$
(6.1)
$$\begin{aligned}&\text{ subject } \text{ to }\sum _{j\in \mathcal N_{i}^{\Phi \Re }}\zeta _{j} \times \upsilon _{j} \ge \delta _{i}, \forall i \end{aligned}$$
(6.2)
$$\begin{aligned}&\upsilon _{i} \in \{0,1\}, \forall i \end{aligned}$$
(6.3)
$$\begin{aligned}&\upsilon _{i}^{o}+y^{0i}_{i} = \eta , \forall i\in {\tilde{\mathcal N}} \end{aligned}$$
(6.4)
$$\begin{aligned}&0\le y^{jk}_{i} \le \eta \times \upsilon _{i} \times \upsilon _{k}, \forall (j,k)\in {\tilde{\mathcal E}}\cup (0_{i},i), \nonumber \\&\quad \forall i\in {\tilde{\mathcal N}} \end{aligned}$$
(6.5)
$$\begin{aligned}&\sum _{j|(j,k)\in {\tilde{\mathcal E}}}y^{jk}_{i} = \upsilon _{i} \times \upsilon _{k} + \sum _{l|(k,l)\in {\tilde{\mathcal E}}}y^{kl}_{i}, \forall i,k\in {\tilde{\mathcal N}} \end{aligned}$$
(6.6)
$$\begin{aligned}&\upsilon _{i}\sum _{j\in {\tilde{\mathcal N}}}\upsilon _{j} = y^{0i}_{i},\forall i\in {\tilde{\mathcal N}} \end{aligned}$$
(6.7)
$$\begin{aligned}&0\le \upsilon _{i}^{o},\forall i\in {\tilde{\mathcal N}} . \end{aligned}$$
(6.8)

Equation (6.1) defines the objective function discussed above, whereas constraint (6.2) represents condition (ii) explained in Sect. 3.1. We define the boolean variable \(\upsilon _{i}\) in (6.3). Constraints (6.4) through (6.8) correspond to the connected dominating set, forming a subgraph by applying conditions (1) through (5) for every node. Constraints (6.5), (6.6), and (6.7) contain quadratic terms (products of the binary variables \(\upsilon _{i}\)), making the problem intractable.
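For very small instances, the intended optimum can be cross-checked by exhaustive enumeration. The Python sketch below is not the MIQCP itself: it enumerates site subsets, replaces the flow constraints (6.4) through (6.8) with a direct connectivity test on the reachability graph, and keeps the cheapest subset that also satisfies (6.2). All function names and the four-node instance are hypothetical, and the running time grows exponentially with the number of nodes.

```python
from itertools import combinations

def is_connected(sites, dist, reach):
    """Connectivity of `sites` in the reachability graph (Delta <= reach)."""
    sites = list(sites)
    seen, stack = {sites[0]}, [sites[0]]
    while stack:
        u = stack.pop()
        for v in sites:
            if v not in seen and dist[u][v] <= reach:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(sites)

def demand_met(sites, dist, capacity, demand, radius):
    """Constraint (6.2): capacity within `radius` covers every node's demand."""
    return all(
        sum(capacity[j] for j in sites if dist[i][j] <= radius) >= demand[i]
        for i in range(len(dist))
    )

def solve_rp_brute_force(dist, cost, capacity, demand, reach, stretch):
    """Return (total_cost, sites) of a cheapest feasible site set, or None."""
    n = len(dist)
    best = None
    for size in range(1, n + 1):
        for sites in combinations(range(n), size):
            total = sum(cost[i] for i in sites)
            if best is not None and total >= best[0]:
                continue  # cannot improve on the incumbent
            if (demand_met(sites, dist, capacity, demand, stretch * reach)
                    and is_connected(sites, dist, reach)):
                best = (total, sites)
    return best

# Hypothetical instance: four nodes on a line, 300 km apart.
positions = [0, 300, 600, 900]
dist = [[abs(a - b) for b in positions] for a in positions]
best = solve_rp_brute_force(dist, cost=[3, 1, 1, 3], capacity=[4, 4, 4, 4],
                            demand=[2, 3, 3, 2], reach=600, stretch=0.5)
print(best)  # (2, (1, 2))
```

Here no single site covers both end nodes within 300 km, and the pair of end nodes is too far apart to be connected, so the two inner nodes form the cheapest feasible selection.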

3.3 RP complexity analysis

The decision-making RP problem can be stated as follows. Given \({\tilde{\mathcal G}} = ({\tilde{\mathcal N}},{\tilde{\mathcal E}})\), does there exist a subgraph \(\mathcal {S}\) of \({\tilde{\mathcal G}}\) that satisfies the following three constraints? First, \(\sum _{j\in \mathcal N_{i}^{\Phi \Re } \cap \mathcal {N(S)}}\zeta _{j} \ge \delta _{i}, \forall i \in {\tilde{\mathcal N}}\); second, \(\mathcal {S}\) must be connected; and third, \(\sum _{i\in \mathcal {N(S)}} c_{i} \le C\), where \(C \in \mathbb {Z^{+}}\) represents a cost bound (i.e., the maximum permitted number of regenerator sites).

We show that the decision-making RP problem is in fact a non-deterministic polynomial-time complete (NP-complete) problem. With some resemblance to [13], we present a procedure to reduce the well-known vertex cover (VC) problem to the RP problem. Given a graph \({\overline{\mathcal G}} = ({\overline{\mathcal N}},{\overline{\mathcal E}})\), a VC solution is a subset \(\acute{\mathcal{N}} \subset {\overline{\mathcal N}}\) such that each edge \((l,m) \in {\overline{\mathcal E}}\) has at least one of its endpoints l, m in \(\acute{\mathcal{N}}\). The VC problem asks for a vertex cover \(\acute{\mathcal{N}}\) such that \(|\acute{\mathcal{N}}| \le C\).

We can transform \({\overline{\mathcal G}} = ({\overline{\mathcal N}},{\overline{\mathcal E}})\) into \({\tilde{\mathcal G}} = ({\tilde{\mathcal N}},{\tilde{\mathcal E}})\) using the following steps: 1) \({\tilde{\mathcal N}}\) includes \({\overline{\mathcal N}} \cup {\overline{\mathcal E}}\); 2) create an edge \((l,m) \in {\tilde{\mathcal E}}\) for all distinct \(l, m \in {\overline{\mathcal N}}\); 3) append (l, n) and (n, m) to \({\tilde{\mathcal E}},\; \forall n = (l,m) \in {\overline{\mathcal E}}\); 4) set cost \(c_i = 1\) for each \(i \in {\overline{\mathcal N}}\), and zero otherwise; 5) set regeneration capacity \(\zeta _{n}=1\) for each \(n \in {\overline{\mathcal E}}\), and zero otherwise; 6) set regeneration demand \(\delta _{i} = |{\overline{\mathcal E}}|\) and \(\mathcal N_{i}^{\Phi \Re } = {\overline{\mathcal E}}, \forall i \in {\tilde{\mathcal N}}\).
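The six construction steps can be written out as a short Python sketch. The function name `vc_to_rp`, the tuple encoding of vertex nodes as `("v", v)` and edge nodes as `("e", l, m)`, and the triangle example are assumptions made for illustration only.

```python
from itertools import combinations

def vc_to_rp(vertices, edges):
    """Build the RP instance from a vertex-cover instance (steps 1 to 6)."""
    vertex_nodes = [("v", v) for v in vertices]
    edge_nodes = [("e", l, m) for l, m in edges]
    nodes = vertex_nodes + edge_nodes              # step 1
    rp_edges = set(combinations(vertex_nodes, 2))  # step 2: clique on vertices
    for l, m in edges:                             # step 3: attach edge nodes
        n = ("e", l, m)
        rp_edges.add((("v", l), n))
        rp_edges.add((n, ("v", m)))
    cost = {x: 1 if x[0] == "v" else 0 for x in nodes}      # step 4
    capacity = {x: 1 if x[0] == "e" else 0 for x in nodes}  # step 5
    demand = {x: len(edges) for x in nodes}                 # step 6
    neighbourhood = {x: set(edge_nodes) for x in nodes}     # step 6
    return nodes, rp_edges, cost, capacity, demand, neighbourhood

# Hypothetical vertex-cover instance: a triangle.
nodes, rp_edges, cost, capacity, demand, hood = vc_to_rp(
    [0, 1, 2], [(0, 1), (1, 2), (0, 2)]
)
print(len(nodes), len(rp_edges))  # 6 9
```

The construction adds one node per edge and a number of edges quadratic in the number of vertices, so its size is polynomial in the size of the vertex-cover instance.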

We argue that a VC solution on \({\overline{\mathcal G}}\) has a max cost of C, if and only if the decision-making RP problem too has a feasible solution with a max cost of C.

Let us assume that subgraph \(\mathcal {S}\) is a solution to the RP problem, and assign \(\acute{\mathcal{N}} = \mathcal {N(S)} \cap {\overline{\mathcal N}}\). Since \(\zeta _{n}=1, \forall n \in {\overline{\mathcal E}}\) and \(\delta _{i} = |{\overline{\mathcal E}}|, \forall i \in {\tilde{\mathcal N}}\), the demand constraint forces \({\overline{\mathcal E}} \subset \mathcal {N(S)}\); thus, \(\mathcal {S}\) contains \({\overline{\mathcal E}}\). Further, as \(\mathcal {S}\) is a connected subgraph, every \(n \in {\overline{\mathcal E}}\) necessarily has an edge in \({\tilde{\mathcal G}}\) to some \(i \in \acute{\mathcal{N}}\), which is one of its endpoints. Since each node of \({\overline{\mathcal N}}\) has unit cost and the total cost is at most C, \(\acute{\mathcal{N}}\) contains a maximum of C nodes; therefore, \(\acute{\mathcal{N}}\) is a VC solution for \({\overline{\mathcal G}}\) satisfying \(|\acute{\mathcal{N}}| \le C\).

To verify the converse, suppose that \(\acute{\mathcal{N}}\) is a vertex cover of \({\overline{\mathcal G}}\) with \(|\acute{\mathcal{N}}| \le C\), and let \(\mathcal {S}\) be the subgraph of \({\tilde{\mathcal G}}\) containing the nodes \(\acute{\mathcal{N}} \cup {\overline{\mathcal E}}\). We can deduce that \(|\mathcal N_{i}^{\Phi \Re } \cap \mathcal {N(S)}| = |{\overline{\mathcal E}}|\), and so \(\sum _{j\in \mathcal N_{i}^{\Phi \Re } \cap \mathcal {N(S)}}\zeta _{j} = |{\overline{\mathcal E}}| = \delta _{i}\). Because \(\acute{\mathcal{N}}\) is a vertex cover, every \(n = (l,m) \in {\overline{\mathcal E}}\) necessarily has at least one of l and m in \(\acute{\mathcal{N}}\). This implies that \(\mathcal {S}\) includes an edge (n, t) for some \(t \in \acute{\mathcal{N}}\); since \(\acute{\mathcal{N}}\) forms a clique in \({\tilde{\mathcal G}}\), \(\mathcal {S}\) must be connected. Since every \(n \in {\overline{\mathcal E}} \subset {\tilde{\mathcal N}}\) has zero cost, \(\mathcal {S}\) has the same cost as \(\acute{\mathcal{N}}\) in \({\tilde{\mathcal G}}\). This shows that the RP problem has a solution with a maximum cost of C.

The VC problem is known to be NP-complete, and so is our decision-making RP problem.

3.4 Analytical model for regeneration demand predictions

Based on the work in [10], we identify the three most significant factors influencing the estimation of regeneration demand: the probability mass function of the link lengths in the network, the length of the links incident on a node, and the edgeness of a node. The edgeness refers to the closeness of a node to the edges of the topology. We applied the prediction formula from [10] to estimate the regeneration demand \(\delta _{i}, \;\forall i\in \mathcal {N}\). We express the regeneration demand as:

$$\begin{aligned}&\delta _{i} = \sigma _i \times \rho _{i}, \;\forall i\in \mathcal {N} \end{aligned}$$
(7)

We consciously avoided using similar stochastic strategies (routing-only and routing-and-reach) suggested in [2]; we believe that incorporating more information into the regenerator demand estimation would require more assumptions about the operating conditions, which may not be practical and would make the process computationally intensive. The regeneration arrival rate (say, \(\sigma _i\) at a node i) reflects the ratio of the regenerations in each node of the network. We still have to derive the maximum flow rates at a node i to arrive at the regeneration demand \(\delta _{i}\). The following section introduces the hose-traffic model and the method used for flow rate estimation (say, flow rate of \(\rho _i\) at a node i).

3.5 Hose model

The hose uncertainty traffic model only specifies the maximum ingress (traffic entering the network, say \(R_i\)) and egress (traffic leaving the network, say \(C_i\)) rates for each node i in the network. The point-to-point demand matrix is restricted by these ingress and egress bounds, which are the only known aspects of the traffic. Any feasible traffic matrix \(T = [T_{ij}]\) for the network is constrained by: \(\sum _{j \in \mathcal {N}} T_{ij} \le R_{i}\) and \(\sum _{j \in \mathcal {N}} T_{ji} \le C_{i}, \forall i \in \mathcal {N}\).
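As a minimal illustration of these two constraints, the following Python sketch checks whether a given traffic matrix is admissible under the hose model. The function name `is_hose_feasible` and the list-of-lists representation are assumptions made here for illustration only.

```python
def is_hose_feasible(T, R, C):
    """Return True if traffic matrix T respects the hose bounds.

    T[i][j] is the traffic from node i to node j, R[i] the ingress bound
    and C[i] the egress bound of node i (hypothetical representation).
    """
    n = len(T)
    for i in range(n):
        sent = sum(T[i][j] for j in range(n))      # row sum, bounded by R[i]
        received = sum(T[j][i] for j in range(n))  # column sum, bounded by C[i]
        if sent > R[i] or received > C[i]:
            return False
    return True
```

Any matrix accepted by this check is one of the many traffic patterns that the hose bounds allow.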

Assuming that the ingress traffic is distributed non-uniformly across all the nodes (according to a traffic distribution vector \(\alpha\)), we adapted the two-phase routing scheme proposed in [63] to derive the maximum flow rate at any node in the network. In the first phase, the ingress traffic at any node is distributed to every node i acting as an intermediate node, independent of the final destination of the traffic. At the end of the first phase, each node i has received traffic destined for different destinations. The actual routes to their respective destinations are decided in the second phase of the routing. The routes are generated based on the minimum path cost first (MPCF) scheme with path cost \(\Delta (i,j)\) for every node pair (i, j). Node i acts as the intermediate receiving node defining the \(|\mathcal {N}|\)-dimensional first-phase routing cost vector \(F_{C_i} = [\Delta (0,i),\Delta (1,i),\ldots ,\Delta (|\mathcal {N}|-1,i)]\), \(\forall i \in \mathcal {N}\) with \(\Delta (i,i)=0\). In the second phase, node i acts as the intermediate transmitting node defining the second-phase routing cost vector \(S_{C_i} = [\Delta (i,0),\Delta (i,1),\ldots ,\Delta (i,|\mathcal {N}|-1)]\), \(\forall i \in \mathcal {N}\) with \(\Delta (i,i)=0\). We then add the respective average route costs of both phases involving node i, giving the average total path cost \(Avg_{C_i} = \sum _{j \in \mathcal {N}}(\Delta (j,i) + \Delta (i,j))/|\mathcal {N}|, \forall i \in \mathcal {N}\). When the average path cost for the two-phase routing \(Avg_{C_i}\) is larger, the distribution fraction \(\alpha _i\) is smaller, and the opposite happens when \(Avg_{C_i}\) takes smaller values. The distribution fraction \(\alpha _i\) is calculated, considering that \(\sum _{i \in \mathcal {N}} \alpha _i = 1\), as:

$$\begin{aligned}&\alpha _i = (Avg_{C_i})^{-1}/ \sum _{j \in \mathcal {N}}(Avg_{C_j})^{-1} \end{aligned}$$
(8)
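Equation (8) can be sketched in a few lines of Python. The function name `distribution_fractions` and the matrix argument `delta_cost` (holding the path costs \(\Delta (i,j)\)) are hypothetical, and the sketch assumes a connected network with at least two nodes so that every average cost is positive.

```python
def distribution_fractions(delta_cost):
    """Compute alpha_i from path costs, following equation (8).

    delta_cost[i][j] holds the path cost Delta(i, j), with
    delta_cost[i][i] == 0 (hypothetical input format).
    """
    n = len(delta_cost)
    # Average total path cost of the two routing phases through node i.
    avg = [sum(delta_cost[j][i] + delta_cost[i][j] for j in range(n)) / n
           for i in range(n)]
    inverse = [1.0 / a for a in avg]  # assumes every avg value is positive
    total = sum(inverse)
    return [v / total for v in inverse]
```

Nodes with a smaller average path cost receive a larger fraction, and the fractions sum to one by construction.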

A maximum traffic demand of \(\alpha _j \times R_i\) is sent from node i to another node j during the first phase of the routing. At the end of the first phase, node i has received \(\alpha _i \times R_m\) traffic from any other node m; of this, the traffic destined for node j is \(\alpha _i \times l_{mj}\), provided that the traffic is initially distributed irrespective of its final destination. So, during the second phase of the routing process, \(\sum _{m \in \mathcal {N}} \alpha _i \times l_{mj} = \alpha _i \times C_j\) amount of traffic is sent from node i to node j. Considering both routing phases, a maximum traffic of \(\alpha _j R_i+\alpha _i C_j\) is routed from node i to node j. It is easy to deduce further that the maximum traffic passing through any node i is:

$$\begin{aligned}&\rho _i =\alpha _i \times \sum _{j \in \mathcal {N}, i\ne j} (R_j+C_j) \end{aligned}$$
(9)

It is worth mentioning that the \(\rho\) values do not depend on the individual traffic matrix T and satisfy the hose model constraints. Further, the estimated \(\rho\) is sensitive to the routing strategy considered, as the computed \(\alpha\) values are based on the routing algorithm.
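Equations (7) and (9) combine into a short computation, sketched below in Python. The function names are hypothetical, and the regeneration arrival rates \(\sigma _i\) are assumed to be supplied externally (they come from the prediction formula of [10], which is not reproduced here).

```python
def max_flow_rates(alpha, R, C):
    """rho_i = alpha_i * sum over j != i of (R_j + C_j), as in equation (9)."""
    n = len(alpha)
    total = sum(R[j] + C[j] for j in range(n))
    # Subtracting node i's own bounds leaves the sum over all j != i.
    return [alpha[i] * (total - R[i] - C[i]) for i in range(n)]


def regeneration_demands(sigma, rho):
    """delta_i = sigma_i * rho_i, as in equation (7)."""
    return [s * r for s, r in zip(sigma, rho)]
```

Only the hose bounds and the distribution fractions enter the computation, which reflects the independence from any particular traffic matrix T.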

3.6 Calculating node potential

We present an approach for calculating the node potential, quantifying each node’s importance in the network (or potential of a node to become a regenerator site in a given network topology). The work in [64] developed a node importance evaluation method for use in complex networks. With insight from their work, we define the following function for calculating the potential for each node focusing on an optical communication network. The node potential of a node i is determined by:

$$\begin{aligned}&I_i = E_i \times \sum _{j \in {\tilde{\mathcal N}}} \left(\frac{D_j \times E_j}{k^2}\right) \times (S_i + 1) \end{aligned}$$
(10)

where \(E_i\) represents the efficiency of node i and is defined as \(E_i = \frac{1}{n} \times \sum _{m=1, m \ne i}^n (\frac{1}{d_{im}})\). Efficiency defines the degree of influence of nodes on other relevant nodes in the network. n and k represent the order and average degree of the network, respectively. \(D_j\) denotes the degree of node j and \(d_{im}\) is the shortest path length from node i to node m. \(S_i = \frac{S_{i}^{{\tilde{\mathcal G}}}}{\sum _{j \in {\mathcal {N}_{i}^{\Phi \Re }}} S_j^{{\tilde{\mathcal G}}}}\) computes the relative surplus regenerator availability at node i with respect to its adjacent nodes j in the reachability graph \({\tilde{\mathcal G}}\). Here, \(S_{i}^{{\tilde{\mathcal G}}}\) represents the surplus regeneration \(\zeta _{i} - \delta _{i}\) at node i if \(\zeta _{i} - \delta _{i} \ge 0\), or 0 otherwise.
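A minimal Python sketch of equation (10) follows. The function name, the argument layout, and the choice of setting \(S_i = 0\) when no adjacent node has surplus capacity (to avoid a division by zero) are assumptions made for illustration; the shortest path lengths are assumed to be precomputed.

```python
def node_potentials(dist, degree, zeta, delta, neighbors):
    """Node potential I_i following equation (10).

    dist[i][m]   : shortest path length d_im (precomputed, positive for i != m)
    degree[j]    : degree D_j of node j
    zeta, delta  : regeneration capacity and demand per node
    neighbors[i] : nodes adjacent to i in the reachability graph
    """
    n = len(dist)
    k = sum(degree) / n  # average degree of the network
    eff = [sum(1.0 / dist[i][m] for m in range(n) if m != i) / n
           for i in range(n)]
    surplus = [max(zeta[i] - delta[i], 0) for i in range(n)]
    base = sum(degree[j] * eff[j] for j in range(n)) / k ** 2
    potentials = []
    for i in range(n):
        denom = sum(surplus[j] for j in neighbors[i])
        # Assumption: relative surplus is 0 when no neighbour has surplus.
        s_rel = surplus[i] / denom if denom > 0 else 0.0
        potentials.append(eff[i] * base * (s_rel + 1))
    return potentials
```

Because the surplus term depends on \(\delta _i\), the ranking produced by these values changes with the hose traffic inputs as well as with the topology.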

We use the node potential values in Algorithm 1 and Algorithm 3, for ranking nodes in the networks. Note that, unlike the traffic distribution fraction \(\alpha _i\), which is only topology dependent, the node potential \(I_i\) for a node i, is also sensitive to the variations in hose traffic inputs.

Fig. 1 RP problem: input/output system diagram

4 Solution approaches

Due to the NP-hard nature of the problem, there is no simple method to solve the RP problem in question. Since the constraints 6.6 and 6.7 are quadratic, neither a mixed-integer quadratic program (MIQP) nor an MILP solver can be used to solve the formulation \(RP^{MIQCP}\) directly. With some relaxations in the constraints, we could find methods to solve it, albeit not optimally. Here, we present two solution approaches, each having its distinctive characteristics. A block diagram of the overall input/output system of the RP problem is presented in Fig. 1.

4.1 Approach I: a mixed integer linear programming (MILP): an optimum approach

If one considers that a node i is among the nodes, where regeneration facility will ultimately be installed irrespective of the others (i.e., \(if \; \upsilon _{k} = 1\) for some \(k=1..\eta\)), the formulation \(RP^{MIQCP}\) can be restated as follows (named \(RP^{MILP}\) here).

$$\begin{aligned}&\text{ minimize } \sum _{k = 1}^{\eta }{c_{k} \times \upsilon _{k}} \end{aligned}$$
(11.1)
$$\begin{aligned}&\text{ subject } \text{ to }\sum _{j\in \mathcal N_{k}^{\Phi \Re }}\zeta _{j} \times \upsilon _{j} \ge \delta _{k}, \forall k\in {\tilde{\mathcal N}} \end{aligned}$$
(11.2)
$$\begin{aligned}&~\upsilon _{k} \in \{0,1\}, \forall k\in {\tilde{\mathcal N}} \end{aligned}$$
(11.3)
$$\begin{aligned}&~\upsilon _{i}^{o} + y^{0i}_{i} = \; \eta \end{aligned}$$
(11.4)
$$\begin{aligned}&0 \le \; y^{jk}_{i} \le \; \eta \times \upsilon _{k}, \forall (j,k)\in {\tilde{\mathcal E}}\cup (0_{i},i) \end{aligned}$$
(11.5)
$$\begin{aligned}&\sum _{j|(j,k)\in {\tilde{\mathcal E}}}y^{jk}_{i} = \; \upsilon _{k} + \sum _{l|(k,l)\in {\tilde{\mathcal E}}}y^{kl}_{i}, \forall k\in {\tilde{\mathcal N}} \end{aligned}$$
(11.6)
$$\begin{aligned}&\sum _{j\in {\tilde{\mathcal N}}}\upsilon _{j} = \; y^{0i}_{i} \end{aligned}$$
(11.7)
$$\begin{aligned}&0\le \upsilon _{i}^{o} \; \& \; \upsilon _{i} = 1 \end{aligned}$$
(11.8)

Having fixed \(\upsilon _{i} = 1\), the solution (subgraph \(\mathcal {S}\)) containing all nodes j with \(\upsilon _{j} = 1\) is guaranteed to form a CDS by equations (1) through (5). The formulation \(RP^{MIQCP}\) is thus reduced to an MILP formulation \(RP^{MILP}\) that may be solved using standard MILP solvers. Note that the quadratic equations 6.6 and 6.7 are now replaced with the linear equations 11.6 and 11.7, respectively.
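For very small instances, the feasibility conditions behind \(RP^{MILP}\) can be cross-checked by exhaustive enumeration. The Python sketch below is not the MILP and does not use the flow variables \(y^{jk}_{i}\); it simply enumerates node subsets, keeps those that satisfy the demand constraint (11.2) and induce a connected subgraph of the reachability graph, and returns the cheapest one. The function names are hypothetical, and treating each node as a member of its own neighbourhood \(\mathcal N_{k}^{\Phi \Re }\) is an assumption.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Depth-first check that `nodes` induce a connected subgraph of adj."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def brute_force_rp(adj, cost, zeta, delta):
    """Cheapest connected site set meeting every node's demand.

    adj[k] lists the neighbours of k in the reachability graph.
    Exponential in the number of nodes; intended for tiny instances only.
    """
    n = len(adj)
    best_cost, best_set = None, None
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            # Constraint (11.2): capacity in k's closed neighbourhood >= demand.
            covered = all(
                sum(zeta[j] for j in (set(adj[k]) | {k}) & chosen) >= delta[k]
                for k in range(n))
            if covered and is_connected(chosen, adj):
                total = sum(cost[j] for j in chosen)
                if best_cost is None or total < best_cost:
                    best_cost, best_set = total, chosen
    return best_cost, best_set
```

On a five-node path with unit costs, for example, the enumeration selects the three interior nodes, since no smaller set is both covering and connected.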

figure c

The primary concern of our approach lies with the choice of node i that is sure to be included in the final solution (i.e., \(\upsilon _{i} = 1\)). One method is to sequentially presume that a node \(i \in {\tilde{\mathcal N}}\) will host a regeneration facility and so enters the final solution (we set \(\upsilon _{i} = 1\) to ensure this). Thereafter, the MILP solver is run on the \(RP^{MILP}\) formulation, giving minimum cost regenerator sites among the rest of the nodes in \({\tilde{\mathcal G}}\). This is repeated \(\eta\) times and the final solution is the minimum among the \(\eta\) solutions generated (we call this iterative-\(RP^{MILP}\)). The running time of this algorithm depends on the network size (i.e., \(\eta\)) and the efficiency of the MILP solver used (for solving \(RP^{MILP}\)). The algorithm works for smaller problem instances, and the method guarantees an optimal solution provided the MILP solver returns optimal solutions too. For larger networks with increased \(\eta\), the number of MILP runs increases and the computation time required for each run grows exponentially, so the approach quickly becomes intractable.

If we could be sure of a single node i that belongs to the solution, we could greatly reduce the number of expensive MILP runs. We find that generating MILP solutions for every node i may be avoided if we assume a positive demand at every network node. In WDM wide-area networks with IP traffic, one may be assured of demands at all nodes. The RP solution must contain a node i where regenerators are placed; otherwise, we have an infeasible solution. Let one such node be i, where the facility can be set up (i.e., with \(\upsilon _{i} = 1\) in the optimal solution); then a node \(j \in {\mathcal {N}_{i}^{\Phi \Re }}\) must have been assigned to node i. Similarly, if \(\upsilon _{i} = 0\) in the optimal solution, then a node \(j \in {\mathcal {N}_{i}^{\Phi \Re }}\) must have \(\upsilon _{j} = 1\). Therefore, for any node \(i \in {\tilde{\mathcal N}}\) there exists another node \(j \in {\tilde{\mathcal N}}\) such that both i and j are in the node sets \(\mathcal {N}_{i}^{\Phi \Re }\) and \(\mathcal {N}_{j}^{\Phi \Re }\) (i.e., subgraph \({\tilde{\mathcal G}}\)). Given the condition \(0 \le \Phi \le 1\), nodes i and j must be a single hop apart in subgraph \({\tilde{\mathcal G}}\). We can now presume any node i to be a regenerator node and apply the formulated MILP (\(RP^{MILP}\)) with the node set \(\mathcal {N}_{i}^{\Phi \Re }\) only. Instead of \(\eta\) MILP iterations, we now need only \(|\mathcal {N}_{i}^{\Phi \Re }|\) iterations of MILP. We used SCIP (a Branch-and-Bound (B&B) platform) to solve the \(RP^{MILP}\) instances. The complete procedure is presented in Algorithm 1. Here, we use the procedure discussed in Sect. 3.6 to generate the node potential for each node and pick the node with the highest potential value as the primary regenerator node. Other methods (highest-degree node or random node) may be employed for this node selection process. 
However, by selecting a node \(n \in {\tilde{\mathcal G}}\) with the highest node potential, we could further optimize the objective of the proposed Algorithm 1. Experimental results comparing the methods are presented later (Table 5 in Sect. 5).

4.2 Approach II: heuristic approach for decision-making RP

This section proposes a heuristic algorithm for a solution to the regenerator site selection problem by computing the budgeted connected dominating set on the network. The proposed algorithm incorporates the following features: i) Each selected regenerator node must be connected to at least another within a maximum distance of \(\Phi \Re\). ii) The regeneration demand for each non-regenerator node is required to be accommodated by the selected regenerator nodes in the vicinity within a maximum distance of \(\Phi \Re\). iii) The site selection is made to minimize the total cost of the network’s regeneration facility (assumed to be equal for all nodes in the network). The budget refers to the maximum number of regenerator sites allowed to be set up, assuming construction cost of \(c_{i}=1\) for each node i.

4.2.1 Basic concepts: (\(\Phi \Re\)-connected \(\Phi \Re\)-distance dominating set problem)

Given a graph \(\mathcal{G}^{'} = (\mathcal{V}^{'}, \mathcal{E}^{'})\), a subset \(Z\subset \mathcal{V}^{'}\) is called a connected dominating set if every node in \(\mathcal{V}^{'}\) either belongs to Z or is adjacent to at least one node in Z, and the subgraph induced by Z is connected [41]. More precisely, given \(\mathcal{G}^{'} = (\mathcal{V}^{'}, \mathcal{E}^{'})\) and two input parameters \(\Phi \Re\) and k, the \(\Phi \Re\)-distance dominating set problem determines whether \(\mathcal{G}^{'}\) contains a set Z of at most k nodes such that every vertex in \(\mathcal{G}^{'}\) has a distance of at most \(\Phi \Re\) to a node in Z. When \(\Phi \Re\) equals edge-length (hop distance), it is just the dominating set problem. When there is a subset \(Z\subset \mathcal{V}^{'}\) with \(|Z|=k\) such that Z \(\Phi \Re\)-dominates \(\mathcal{G}^{'}\) and \(\mathcal{G}^{'}[Z]\) is connected, we have a solution for the connected \(\Phi \Re\)-distance dominating set problem for \(\mathcal{G}^{'}\) within a budget of k. A set is said to be \(\Re\)-connected in a graph \(\mathcal{G}^{'}\) if it induces a connected subgraph in \(\mathcal {G}^{'}_{\Re }\), the graph derived from \(\mathcal{G}^{'}\) by inserting an edge between any two nodes that have a distance of at most \(\Re\) in \(\mathcal{G}^{'}\).
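The derived graph and the distance-domination test described above can be sketched in Python as follows. The function names are hypothetical, and the use of the Floyd-Warshall all-pairs shortest path computation is a choice made here for brevity rather than a method stated in the text.

```python
def reachability_graph(n, edges, reach):
    """Adjacency sets linking node pairs whose shortest-path distance <= reach.

    edges is a list of (u, v, length) tuples for an undirected graph;
    reach plays the role of the product Phi * R.
    """
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    # Floyd-Warshall all-pairs shortest paths.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return [{j for j in range(n) if j != i and d[i][j] <= reach}
            for i in range(n)]


def is_distance_dominating(adj, Z):
    """True if every node is in Z or adjacent to a node of Z in adj."""
    Z = set(Z)
    return all(i in Z or bool(adj[i] & Z) for i in range(len(adj)))
```

With the reachability graph in hand, the distance version of the problem reduces to ordinary domination on the derived graph.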

We observed that the connected dominating set problem might be used as the main subproblem when solving the RP problem. In the following subsection, we describe a heuristic algorithm for the dominating set problem. We discuss our approach to finding a CDS solution in Sect. 4.2.3. In Sect. 4.2.4, we present the regenerator site selection algorithm solving the budgeted \(\Phi \Re\)-connected \(\Phi \Re\)-distance dominating set problem.

4.2.2 A dominating set (DS) algorithm

The DS heuristic presented in Algorithm 2, an adaptation of the work reported in [65], identifies the dominating nodes that cover each of the nodes in the network (with a maximum path length of \(\Phi \Re\)).

figure d

The main steps of the DS algorithm involve the following ideas. In Step I, for each \(i \in {\tilde{\mathcal N}}\), two count values are initialized: \(Deg^{Cnt}_i\) for the node degree count (i.e., \(deg(i) + 1\); node i is adjacent to itself) and \(Cov^{Cnt}_i\) for the number of adjacent nodes j covered by node i in the dynamic scenario (i.e., all adjacent j whose regeneration demand \(\delta _{j}\) may be accommodated with capacity \(\zeta _{i}\)). \(Cov^{Cnt}_i\) denotes the potential of a node to become a regenerator site (used for ranking nodes \(i \in {\tilde{\mathcal N}}\) in accordance with their \(Cov^{Cnt}_i\) values). We grow the dominating set X as late as possible; it is initialized to the empty set \(\varnothing\) in Step II of the algorithm.

Step III is the main step of the algorithm, repeated \(|{\tilde{\mathcal N}}|\) times. A node i with the least \(Cov^{Cnt}_i\) is selected in Step III-1, reflecting the idea that lower degree nodes should be tried first. We check all isolated nodes first, as they have lower \(Cov^{Cnt}_i\) values, and then proceed toward non-isolated nodes, which clearly have higher values for the two counts. If there exists a node j adjacent to i with \(Deg^{Cnt}_j = 1\), node i is added to the current dominating set X (i.e., node i is the only possible node left in \({\tilde{\mathcal G}}\) to cover node j). Once node i is included in X, \(Deg^{Cnt}_j\) for all nodes \(j \in {\mathcal {N}_{i}^{\Phi \Re }}\) is set to 0 to designate them as covered by node i (Step III-2). If no node j adjacent to i has \(Deg^{Cnt}_j = 1\), this information is used to update \(Deg^{Cnt}_j\) and \(Cov^{Cnt}_j\) to indicate that no node j is solely dependent on i for its regeneration requirements. \(Deg^{Cnt}_j\) is decremented since node i no longer covers j, and \(Cov^{Cnt}_j\) is incremented to improve the potential of node j, as it is now more worthy of being in X. Finally, at the end of the current loop, node i is marked as checked by setting \(Cov^{Cnt}_i\) to infinity. After \(|{\tilde{\mathcal N}}|\) executions of Step III, the dominating set X is returned (Step IV).
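The bookkeeping described in Steps I to IV can be sketched in Python as below. This is one reading of the prose description rather than a transcription of Algorithm 2: the function name, the tie-breaking by lowest node index, and the guard that skips already covered nodes when decrementing are assumptions.

```python
import math


def dominating_set(adj, zeta, delta):
    """Greedy dominating set following the counting scheme of Steps I to IV.

    adj[i] is the set of neighbours of i in the reachability graph.
    """
    n = len(adj)
    closed = [set(adj[i]) | {i} for i in range(n)]  # node i is adjacent to itself
    # Step I: initialise the two counters.
    deg_cnt = [len(closed[i]) for i in range(n)]    # deg(i) + 1
    cov_cnt = [sum(1 for j in adj[i] if delta[j] <= zeta[i]) for i in range(n)]
    X = set()                                       # Step II
    for _ in range(n):                              # Step III
        # Unchecked node with the least coverage count (ties: lowest index).
        i = min(range(n), key=lambda v: cov_cnt[v])
        if any(deg_cnt[j] == 1 for j in closed[i]):
            X.add(i)                                # i is the last candidate for some j
            for j in closed[i]:
                deg_cnt[j] = 0                      # mark as covered by i
        else:
            for j in closed[i]:
                if deg_cnt[j] > 0:                  # assumption: skip covered nodes
                    deg_cnt[j] -= 1
                    cov_cnt[j] += 1
        cov_cnt[i] = math.inf                       # mark i as checked
    return X                                        # Step IV
```

Under this reading, every node ends up dominated: a node's counter reaches 1 only when a single unchecked candidate remains in its closed neighbourhood, and that candidate is then forced into X.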

4.2.3 Generating a connected subgraph: (branch-and-cut algorithm for Steiner tree in a graph problem (SPG))

As a result of Algorithm 2, the node set \(\mathcal {N}(X)\) may not induce a connected subgraph. That being the case, we are required to include a few additional nodes to make the node set connected. One way to get a connected subgraph is to solve the Steiner tree problem in a graph \({\tilde{\mathcal G}} = ({\tilde{\mathcal N}},{\tilde{\mathcal E}})\). A few high potential nodes (also called Steiner nodes, \(Y \subset {\tilde{\mathcal N}} \setminus \mathcal {N}(X)\)) are added, forming a Steiner tree \(S=\mathcal {N}(X) \cup Y\) that spans the node set \(T = \mathcal {N}(X)\) of terminal nodes. A standard SPG problem finds a minimum-weighted tree \(S \subset {\tilde{\mathcal G}}\) (i.e., a subgraph S) for an undirected connected graph \({\tilde{\mathcal G}} = ({\tilde{\mathcal N}},{\tilde{\mathcal E}})\) with edge weights \(C : {\tilde{\mathcal E}} \rightarrow \mathbb {Q_{+}}\) and a set of terminal nodes \(T \subset {\tilde{\mathcal N}}\). We can easily see that a solution to the SPG corresponds to an equivalent CDS solution. In other words, the node set S represents a CDS solution Z discussed in Sect. 4.2.1 when \({\tilde{\mathcal G}} = {\mathcal {G}}^{'}\).

A variation of the SPG called the node-weighted Steiner tree problem (NWSTP) generalizes the SPG problem by adding weights to the vertices \(C : {\tilde{\mathcal N}} \rightarrow \mathbb {Q_{+}}\) in addition to the usual edge weights, and has the objective of finding the set \(S=(\tilde{\mathcal{N}_S},\tilde{\mathcal{E}_S})\) that spans the terminals T while minimizing \(C(S) = \sum _{v \in \tilde{\mathcal{N}_S}} C_v + \sum _{e \in \tilde{\mathcal{E}_S}} C_e\). The authors in [66] presented a formulation of the Steiner tree problem in a directed graph, considering it to be equivalent to the Steiner Arborescence Problem (SAP). Given a directed graph \(G = (V, A)\) with edge weights \(C : A \rightarrow \mathbb {Q_{+}}\), a set of terminal nodes \(T \subset V\) and a root \(r \in T\), a directed tree (or arborescence) \(R = (V_R, A_R) \subset G\) is sought that satisfies the following two requirements. First, \(\forall t \in T\) the tree R has exactly one directed path from r to t, and second, \(C(R) = \sum _{a \in A_R} C_a\) is minimized. The authors in [67] suggested a method to transform the NWSTP to an SAP by replacing each undirected edge \({\tilde{\mathcal E}}_{i,j}\) in \({\tilde{\mathcal G}}\) by two antiparallel arcs \(A_{i,j}\) and \(A_{j,i}\) of the same cost and designating an arbitrary terminal as the root r in the directed graph G. Further, the weight of each vertex is added to all of its entering incident arcs (now arc weight \(C_{a}^{'} = C_{i,j} + C_j\) for \(a=A_{i,j} \in A\)). Each solution of the SAP, \(R_G (T)\), yields an equivalent solution \(S_{{\tilde{\mathcal G}}} (T)\) by replacing each directed arc (\(A_{i,j}\) or \(A_{j,i}\)) by the corresponding undirected edge (\({\tilde{\mathcal E}}_{i,j}\) or \({\tilde{\mathcal E}}_{j,i}\)). We use the SCIP-Jack solver for an NWSTP solution and refer readers to [68] for the implementation details of the branch-and-cut algorithm based on the flow-balance-directed-cut NWSTP formulation first reported in [66].
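The arc-weight adjustment used in the NWSTP-to-SAP transformation can be illustrated with a short Python sketch. The function name and the dictionary-based graph representation are hypothetical; the sketch only builds the directed arc weights and does not solve the resulting SAP.

```python
def nwstp_to_sap(edges, node_cost):
    """Build SAP arc weights from a node-weighted undirected graph.

    edges     : dict mapping an undirected edge (i, j) to its cost C_ij
    node_cost : dict mapping a node to its cost C_j
    Returns a dict mapping each directed arc (i, j) to C_ij + C_j.
    """
    arcs = {}
    for (i, j), c in edges.items():
        arcs[(i, j)] = c + node_cost[j]  # arc entering j carries j's weight
        arcs[(j, i)] = c + node_cost[i]  # antiparallel arc carries i's weight
    return arcs
```

Since the root has no entering arc in an arborescence, its own node weight is not charged by this construction and would need to be accounted for separately when comparing tree costs.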

Fig. 2 Flowchart on working of heuristic (Algorithm 3)

4.2.4 Regenerator site selection algorithm

In this section, we present our approach (Algorithm 3) to solve the RP problem with the objective of opening at most k sites such that the selected sites form a connected subgraph and can serve the demands of each node of the network. A flowchart in Fig. 2 summarizes our proposed heuristic.

figure e

In Step-I of Algorithm 3, we construct the reachability graph \({\tilde{\mathcal G}}\) with an estimated optical reach of \(\Phi \times \Re\). Initially, when no regenerator sites have been selected, the node-set representing the solution vector \(S_0\) is empty. With the Step-I inputs, we identify the dominating nodes in the network by using Algorithm 2 in Step-II. The result of this step ensures that at least one of the dominating nodes in \(\mathcal {N}(X)\) is reachable from each dominated node. However, a dominating node may not be reachable from another (i.e., node-set X is not \(\Phi \Re\)-connected). Therefore, Step-IV of the RP Algorithm is invoked from Step-III to add additional nodes to X, if X is not connected and the solution so far (i.e., node-set X) is within a budget of k. The node-set size |X| quantifies the lower bound on the number of regenerator sites required, and so, if it is found to exceed the given budget, no feasible solution exists for the problem instance. Otherwise, Step-V is executed to check for the satisfiability of all the demands in the network. When both the budget constraint and the demands are met, Step-VI is invoked to declare the current feasible solution \(S_i\) as the final solution S. We declare problem infeasibility in two other cases. The first is the case in which X is not connected and the node-set size |X| does not allow for the addition of nodes to X to make it connected. The second arises in Step-V, if we find that not all demands are satisfied by the CDS X and the node-set size |X| does not allow for the addition of nodes to X to accommodate any additional demands. A node \(m \in {\tilde{\mathcal N}} \setminus X\) with the largest node potential score is added to X in case |X| is still within budget and there is scope for a better solution satisfying more demand. In Step-IV of Algorithm 3, after computing node potentials for each node in the network, a directed graph G is constructed to transform our NWSTP to an SAP solved by the Branch-and-cut algorithm. 
The arc weights are adjusted in the directed graph to include the node weights (node costs) of the undirected graph, making the weights of nodes and edges collectively decide the resulting Steiner tree’s cost. The dominating set X represents the terminal nodes to be spanned by a Steiner tree; |X| runs of the Branch-and-cut algorithm generate at most |X| distinct trees, assuming each node \(r \in \mathcal {N}(X)\) in turn as the root node. The Steiner tree with the fewest nodes is selected as the candidate solution \(S_i\).

5 Experimental results

In this section, we present the results of experiments conducted on two network topologies: the NSF-United States (Fig. 3) and Pan-European COST-266 (Fig. 4) networks [69], with topological details presented in Table 2. We analyze the results of iterative-\(RP^{MILP}\), Algorithm 1, and Algorithm 3 assuming that the regeneration demands are calculated based on the hose traffic input. We generate the hose traffic at random, with ingress/egress bounds (column 2 in Table 3) for each node i: \(R_i\) and \(C_i\) are assigned integer values following a uniform distribution in three (low, medium, and high) demand categories. We categorize the lightpath demands to study their effects on the RP solutions. We assumed that each edge in the network represents two fibers in opposite directions, each with a channel capacity of 96. Assuming that the lightpath requests from each node are destined for all possible \(|\mathcal {N}|-1\) nodes, we randomize the ingress/egress values. In the low category, they range from 0 to 0.0175 times the maximum number of channels per fiber, whereas in the medium category the range is 0 to 0.035 times, and in the high category it is 0 to 0.07 times. For instance, considering a node in the NSF-US topology, the node ingress (egress) for the medium demand category would range between 0 and 43 (i.e., \(\lfloor 13 \times 96 \times 0.035 \rfloor\)). Note that multiple requests are allowed between two nodes in our work.
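The bound computation in the worked example above, together with the random draw of the hose bounds, can be sketched in Python as follows. The function names and the use of Python's `random.Random` generator are assumptions made for illustration; the original experiments may have used a different generator.

```python
import math
import random


def ingress_upper_bound(num_nodes, channels_per_fiber, fraction):
    """Upper end of the ingress/egress range for one demand category."""
    return math.floor((num_nodes - 1) * channels_per_fiber * fraction)


def random_hose_bounds(num_nodes, channels_per_fiber, fraction, rng):
    """Draw integer R_i and C_i uniformly from 0 to the category bound."""
    upper = ingress_upper_bound(num_nodes, channels_per_fiber, fraction)
    R = [rng.randint(0, upper) for _ in range(num_nodes)]
    C = [rng.randint(0, upper) for _ in range(num_nodes)]
    return R, C
```

For the 14-node NSF topology with 96 channels per fiber, `ingress_upper_bound(14, 96, 0.035)` reproduces the medium-category bound of 43 quoted in the text.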

The above method of estimating the regeneration demands could be avoided by simulating pipe traffic on the respective topologies, assuming a suitable routing strategy, but doing so would not capture the worst-case network congestion ratio. In general, the hose model’s congestion ratio is larger than that of a pipe model since all possible traffic requests bounded by the hose model are considered in route selections.

We experimented with two sets of values for the traffic distribution vector \(\alpha\): in one, we kept it static with a uniform value of \(1/|\mathcal {N}|\); in the other, we calculated it using equation (8). Note that the two-phase MPCF routing is considered in the respective topologies to arrive at the \(\alpha\) values in equation (8). The characteristics of the demands sourced at a node (maximum, median and average counts) for the respective topologies observed in our experiments using MPCF routing are presented in the third column of Table 3. An equivalent maximum demand estimation for a pipe model is given in the fourth column of Table 3 for comparison. The data reported are averaged over 10 runs for each network. The cumulative traffic flow at any node i, \(\rho _i\), is computed using equation (9) and is then used to estimate the regeneration demands \(\delta _i\) using equation (7).

Table 2 Topological details
Fig. 3 NSF-US Network with edge lengths in kilometers

Fig. 4 COST-266 Network with edge lengths in kilometers

Table 3 Hose demand characteristics

We assumed a maximum optical reach \(\Re\) of 5000 km, referring to the normalized optical reach reported in [5] for the PM-BPSK modulation technique. The ILP simulation results of iterative-\(RP^{MILP}\) and Algorithm 1 were obtained using the SCIP 6.0.2 Optimization suite with ZIMPL 3.3.8 as the modeler and SoPlex 4.0.2 as the LP solver [70]. The heuristic Algorithm 3 is implemented in C++ using the Boost graph library and the SCIP-Jack solver for generating the NWSTP solutions.

The problem instances were generated considering five different \(\Phi\) values among \(\{\Phi _{min}, 0.4, 0.6, 0.8, 1\}\), where \(\Phi _{min}\) represents the smallest \(\Phi\) for which \(\Phi _{min} \times \Re \ge\) minimum optical reach required for maintaining connectivity of all nodes in respective topologies (i.e., 1350 (1209) km for NSF (COST)). The feasibility of our problem instances considering the maximum budget allowed (cumulative cost of regenerator sites) is directly influenced by the \(\Phi\) values. It is evident from our experimental results that the \(\Phi _{min}\) (0.27 for NSF and 0.24 for COST) limits the optical reachability of a lightpath in respective topologies, making problem instances harder to solve, and so are the best test cases for our algorithm analysis.

Table 4 Results of iterative-\(RP^{MILP}\) versus Algorithm 1 for 14-node NSF topology

Four different experiments were conducted in this study. The first compares the performance of Algorithm 1 to the optimum results of iterative-\(RP^{MILP}\) while changing \(\Phi\) and the traffic size (Sect. 5.1). In the second, we examine how the objective function of Algorithm 1 varies while using different criteria for selecting the most promising regenerator site for inclusion in the final solution (Sect. 5.2). In the third, the results of Algorithm 1 and Algorithm 3 are compared to demonstrate the efficacy of our proposed heuristic in a real-world network (Sect. 5.3). The traffic distribution vectors are calculated with both uniform (i.e., static) and non-uniform (i.e., MPCF) distribution methods. The demands are segregated into low, medium, and high categories, and observations were made to see their effects on the performance of the algorithms, keeping the regeneration capacities and cost of each node fixed. Section 5.4 summarizes the experimental setup and the essential differences found between Algorithm 3 and the COR2P [71] heuristic. The results of the experiments are discussed further in Sect. 5.5.

5.1 Algorithm 1 versus iterative-\(RP^{MILP}\)

We generated 20 feasible instances for each demand category (changing node ingress/egress bounds) randomly and used the smaller 14-node NSF topology for our first experiment. The regeneration capacity \(\zeta\) of each node is fixed at 96, and the regenerator site cost c for each node is assigned a random value in the range (0,1]. The traffic distribution based on the MPCF scheme is used for the results presented in Table 4. When both the iterative-\(RP^{MILP}\) and Algorithm 1 approaches arrive at the same objective function value (with exactly the same regenerator sets) for a feasible instance, we say that the solutions matched. Column 2 in Table 4 indicates the number of feasible and matched solutions obtained among the 20 instances. We observed that all instances were feasible when \(\Phi = 1\) and the node ingress/egress bounds were within 40. As the \(\Phi\) value decreases towards \(\Phi _{min}\), the number of feasible solutions kept decreasing as the constraint (6.2) becomes tighter. It is worth mentioning that if all the nodes in the given topology are assumed to be regenerator sites, each assigned the maximum regeneration capacity, and we still do not have a feasible solution, the problem instance is unsolvable. Columns 3 and 4 present the best objective function and computation times, respectively, for feasible instances. The results show that the iterative-\(RP^{MILP}\) took much longer because of its larger number of iterations (i.e., 14, instead of only 1 in the case of Algorithm 1) and, as expected, always gives optimum results. We could not find feasible solutions for cases with \(\Phi _{min}\) and ingress/egress bounds of 0 to 60 by either approach. The number of regeneration sites selected (shown in brackets) among the feasible cases matched for all the instances; however, we noticed a few variations in the objective function values generated. 
The variations arise due to the differences in the actual regeneration sites selected (thus changing the total regenerator site costs), though the total regeneration sites may be the same. The iterative-\(RP^{MILP}\) approach produced the best results but works only for smaller networks and quickly becomes intractable with an increase in the problem size.

Table 5 Comparative results of Algorithm 1 for 14-node NSF topology with non-uniform distribution of medium traffic demands and different primary node selection criteria

5.2 Primary regenerator selection criteria and Algorithm 1

In Table 5, we present the results to show the superiority of using the maximum node potential method in the primary node selection process required for running Algorithm 1. Three methods of primary node selection (i.e., random, maximum degree, and maximum potential) are compared, giving average values of the objective function and regenerator sites for ten runs with the NSF topology and random site costs. The max potential method fared well in both of the above parameters, at the cost of a slightly longer computation time. The random approach is a better choice than the max degree for smaller values of \(\Phi\).

Table 6 Experimental results of Algorithm 1 versus Algorithm 3 keeping site cost and regeneration capacity equal for all nodes in NSF and COST topology

5.3 Algorithm 1 versus Algorithm 3

In the third set of experiments, we kept regeneration costs (\(c=1\)) and capacities (\(\zeta =192\)) fixed at each node for the NSF and COST networks. We generated 10 random instances for each demand category considering the static and MPCF traffic distribution vector (\(\alpha\)); the average results of the feasible instances are presented in Table 6. Cases where either no feasible solution could be found or the process was killed due to insufficient resources (memory/time) are denoted as “–” in the table. As expected, Algorithm 1 always produced a better solution than Algorithm 3, irrespective of the variations in topology, traffic, and \(\Phi\). As traffic size increased, the ILP solver used in Algorithm 1 was soon overwhelmed, demonstrating its lack of scalability. The heuristic Algorithm 3 finds solutions very quickly, and its computation time grew linearly as traffic size increased, though at the cost of sub-optimal objective values.

5.4 Algorithm 3 versus COR2P [71]

In the final set of experiments, we compare the results of our heuristic approach (Algorithm 3) with another heuristic named Cross-Optimization for RWA and Regenerator placement (COR2P) [71]. COR2P is a three-step heuristic that follows the NC&R placement approach and uses pipe traffic inputs. In the first step of COR2P, the algorithm searches for preliminary routes for demands requiring limited resources, adhering to the wavelength continuity constraint. The second step finds prospective regenerator locations using QoT estimations of the routes generated by a bit-error-rate predictor. The third step performs the RWA and RP utilizing inputs from the previous two steps.

For comparison, we utilize the 14-node NSF topology under equivalent settings. The results reported are averaged over 20 runs of the algorithms. We build a traffic matrix (T) at random, with each element (\(T_{ij}\)) indicating the demand for pipe traffic between a given source(i)-destination(j) pair. All elements are uniformly distributed between 1 and 7 channels, generating fewer than 800 lightpath demands. The ingress/egress parameters for the equivalent hose model are calculated by adding all elements of the respective row and column in T, i.e., \(R_{i} = \sum _{j \in 1 .. 14} T_{ij}\) and \(C_{i} = \sum _{j \in 1..14} T_{ji}, i \ne j, \forall i \in 1..14\). The final value of the ingress/egress (\(R_{i}\)/\(C_{i}\)) for each node i is taken to be the peak among the twenty derived ingress/egress values, i.e., \(R_{i} = \max (R_{i}^1, R_{i}^2, \ldots , R_{i}^{20})\) and \(C_{i} = \max (C_{i}^1, C_{i}^2, \ldots , C_{i}^{20})\). To create a level playing field for comparing the two algorithms, the regeneration capacity of each node is fixed to 64. To favour regenerator concentration, the weight of the regeneration cost (used in calculating the global cost function for each route) is set to 0.9 in COR2P. We assign a value of 0.28 to the ratio (regeneration-sites/nodes-in-network; equivalent to the budget of five regenerator sites in Algorithm 3) during the first phase of COR2P. A ratio value less than one and a weight near-zero in COR2P urge it to place regenerators among a restricted number of sites. We generated six shortest paths for each demand in our experiments. In Algorithm 3, we set \(\Phi = 1\) and \(\Re = 3408\) km (max edge length for NSF) since COR2P uses hop distances. The number of wavelengths per fiber is set to 64 in both algorithms. Once we find the RP solution with Algorithm 3, we apply our proposed RWA heuristic [14] to arrive at the blocking ratio values. Note that the RWA heuristic in [14] uses non-simple routing and pipe traffic.
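The conversion from pipe traffic matrices to hose bounds described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in the experiments: the function names are hypothetical, demands are assumed to be integers drawn uniformly from 1 to 7, and the diagonal of T is assumed to be zero.

```python
import random
from typing import List, Tuple

Matrix = List[List[int]]


def random_pipe_matrix(n: int, rng: random.Random, lo: int = 1, hi: int = 7) -> Matrix:
    """Random pipe traffic matrix: T[i][j] channels from i to j, zero on the diagonal."""
    return [[0 if i == j else rng.randint(lo, hi) for j in range(n)] for i in range(n)]


def hose_bounds(t: Matrix) -> Tuple[List[int], List[int]]:
    """Return (R, C): R[i] is the sum of row i, C[i] the sum of column i, with i != j."""
    n = len(t)
    r = [sum(t[i][j] for j in range(n) if j != i) for i in range(n)]
    c = [sum(t[j][i] for j in range(n) if j != i) for i in range(n)]
    return r, c


def peak_hose_bounds(matrices: List[Matrix]) -> Tuple[List[int], List[int]]:
    """Per-node peak of R[i] and C[i] over all supplied traffic matrices."""
    n = len(matrices[0])
    bounds = [hose_bounds(t) for t in matrices]
    r_peak = [max(r[i] for r, _ in bounds) for i in range(n)]
    c_peak = [max(c[i] for _, c in bounds) for i in range(n)]
    return r_peak, c_peak
```

For example, `peak_hose_bounds([random_pipe_matrix(14, random.Random(s)) for s in range(20)])` would give one pair of bound vectors for twenty random 14-node matrices. For any single matrix the row sums and column sums add up to the same total; after taking per-node peaks over several matrices this equality no longer has to hold.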

5.5 Discussion

Concerning the traffic distribution method used, we experimented with two scenarios: the static distribution assumes an ideal load-balancing routing strategy whereby each node gets a uniform share of the network traffic, whereas in the other scenario the MPCF routing scheme (Sect. 3.5) is used to distribute the network traffic nonuniformly. In the COST network, the static distribution is seen to favour higher load scenarios (fewer regeneration sites required) in the outcomes of both Algorithm 1 and Algorithm 3. This is most evident from the observation that only a single instance was found feasible for the MPCF distribution, even with the use of Algorithm 3, for the high load category and \(\Phi = \Phi _{min}\), whereas seven instances were solvable when the traffic distribution was static. The opposite seemed to happen when the traffic size was smaller in the COST network. The static distribution in the NSF network showed no such trend with changing traffic size and consistently produced poorer results than the MPCF distribution using Algorithm 1 and Algorithm 3. The NSF and COST topologies are quite different and so may behave differently under variations in traffic distribution. First, we see this in the mean link lengths (1299 km in NSF vs. 625 km in COST); second, nodes are mostly located on the periphery in the NSF network, whereas in the COST network nodes are located closely in the core. In the NSF network, there are fewer alternative paths between each node pair than in the COST network, and so problem instances are comparatively harder to solve in the COST network than in the NSF network, especially for the high traffic scenarios. The larger mean link length also seems to favour the problem instances in the NSF network, as the optical reach constraints are stronger, making the ILP formulations easier to solve using the SoPlex solver.

Table 7 Experimental results of Algorithm 3: percentage of nodes appearing as regenerator sites with 28-node COST-266 topology and MPCF demand distribution

Table 7 presents detailed results for the ten separate instances involving the 28-node COST network and MPCF demand distribution given in Table 6 above. In each column (representing one of the 28 nodes), we show the percentage of the 10 instances in which that node is chosen as a regenerator site. In the first row, involving the high traffic category with \(\Phi = \Phi _{min}\), nodes 14, 20, and 28 were present in all feasible solutions among the 10 instances, whereas nodes 10, 17, and 23 were chosen only twice. There is a clear increasing trend in the number of regenerator sites chosen with increasing traffic size and decreasing \(\Phi\) values. Excluding some solutions with minor deviations from the trend, we observe a steady increase in the percentage values for selected nodes in the solutions. With decreasing \(\Phi\), the nodes' percentage values show fewer deviations as alternative solutions increase in number, and choosing an optimum solution among the feasible solutions becomes tougher. Therefore, from a network management perspective, a combination of solutions (each providing its preferences in regenerator sites and costs) involving various regeneration capacity assumptions for the nodes, the modulation level supported, and projected traffic conditions in the network must be considered while making the final regenerator placement decisions. Our experimental results for the COST topology show that nodes 3, 5, 14, 20, 24, and 28 are the most promising sites for regenerator placement and account for 37% of all our solutions.
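The per-node percentages reported in Table 7 can be computed along the following lines. This is a minimal sketch with hypothetical names and made-up site sets, not the experimental data; as an assumption, the denominator is the number of solutions supplied to the function.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def site_selection_percentages(
    solutions: List[Set[int]], nodes: Iterable[int]
) -> Dict[int, float]:
    """Percentage of the given solutions in which each node is a regenerator site."""
    if not solutions:
        raise ValueError("at least one solution is required")
    # Each solution is a set, so a node is counted at most once per solution.
    counts = Counter(node for sites in solutions for node in sites)
    return {node: 100.0 * counts[node] / len(solutions) for node in nodes}


# Made-up example with four solutions on a 28-node network (nodes numbered 1..28).
example = [{14, 20, 28}, {14, 20, 28, 10}, {3, 14, 20, 28}, {10, 14, 20, 28}]
percentages = site_selection_percentages(example, range(1, 29))
```

Nodes with a percentage of 100 appear in every supplied solution and are natural candidates for regenerator sites, while low percentages indicate sites that are interchangeable with alternatives.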

Fig. 5

Average values of objective function and computation time changing with ingress/egress bounds for NSF network

In Fig. 5, we plot average values of the objective function and computation time using the same settings as the first set of experiments conducted on the NSF network. We compare the results of iterative-\(RP^{MILP}\), Algorithm 1, and the heuristic Algorithm 3 to show the effectiveness of our proposed heuristic relative to the other two methods. Note that, for ingress/egress bounds of 0-to-60, the results are not shown for iterative-\(RP^{MILP}\) and Algorithm 1, as we could not run the ILP solver due to its high memory requirements. The computation time increases with traffic size; it grows fastest for iterative-\(RP^{MILP}\) and slowest for Algorithm 3, which, however, has the worst solution quality. Nevertheless, we can clearly see that the heuristic is scalable and capable of delivering good solutions in a reasonable amount of time.

Fig. 6

Regenerator distribution per site for NSF network: average values of twenty iterations using COR2P [71] versus single iteration using Algorithm 3

Fig. 7

Blocking ratio for NSF network: twenty iterations using COR2P versus (single iteration using Algorithm 3 for RP + twenty iterations of RWA Algorithm [14])

In Fig. 6, we plot the distribution of regenerators among network nodes using Algorithm 3 and COR2P [71] on the NSF topology. We found that Algorithm 3 places regenerators at five regenerating sites, whereas the twenty iterations of COR2P on various traffic matrices produced inconsistent site choices, each with a different number of regenerator units. The values attached to each node for COR2P in Fig. 6 represent the average number of regenerator units placed over the twenty algorithm runs. The total number of regenerator units placed using Algorithm 3 was 26 fewer than the cumulative average value of around 256 for COR2P. We find that Algorithm 3 optimizes the overall number of regenerators and produces consistent regenerator site suggestions for identical scenarios. The hose traffic model seems to capture the changes in traffic better than the pipe model. Once we find the regenerator sites and the number of units to deploy with Algorithm 3, we utilize our RWA heuristic [14] to calculate the blocking ratio for each traffic matrix. Figure 7 illustrates the blocking ratio over twenty iterations for both algorithms. We observed a significant reduction in the blocking ratio in all traffic scenarios. To conclude, the results of our experiments show a blocking ratio 11% better than that of COR2P. In addition to a better regenerator placement, this may be attributed to the fact that the RWA algorithm accepts non-simple routing, giving more room for lightpath demands. The study in [56] revealed that the blocking probabilities for the path requests depend on the possible synergies between the regenerator placement and allocation strategies. Our results appear to support their findings.

6 Summary and conclusions

In this paper, we addressed the problem of regenerator site selection considering uncertain traffic. The locations where regeneration demands are higher may not be the sole criterion for making the final placement decisions, since node-specific constraints like the cost of placement also play an important role. Consequently, to account for the regeneration capacity of a location, we took a hybrid approach, first utilizing the topological details for regeneration demand prediction and later using this prediction to make the final placement decision with a traffic-based approach. We have restated the regenerator placement problem to include heterogeneous node-specific constraints and added flexibility to the optimization model. The coexistence of signals with different modulation levels (each having a corresponding maximum optical reachability) for transmission also affects the optimal choice of regenerators in a flexible network. We keep the optical reach distance tuneable and control it through the stretch factor. Uncertainty in traffic is handled with the hose traffic model, which specifies the input traffic bounds at each node and does not require the source-destination pair to be specified explicitly for each flow.

We proposed mathematical formulations to model the problem and proved its NP-completeness. We proposed three approaches, exact and heuristic, to solve the problem, each having distinct characteristics that may be utilized as per the required quality of a solution, in terms of problem size, computational, and storage efficiency. Detailed experimental results show the efficacy of our algorithms. Our model would also help manage decision-making by solving some evolving issues of mixed-signal provisioning in flexible optical networks.