1 Introduction

In the late twentieth century, bandwidth reservation strategy was designed to provide quality of service (QoS) for real-time multimedia applications, such as satellite communication and video-conferencing. A number of bandwidth reservation protocols and models were then proposed, such as resource reservation protocol (RSVP) [1], asynchronous transfer mode (ATM) [2] and internet integrated service model [3, 4]. Because of its solid performance, bandwidth reservation has been increasingly used in recent years for the transfer of extremely large amounts of data [5,6,7,8,9]. For example, the Large Hadron Collider (LHC), the most well-known high-energy particle accelerator, can generate up to 30 petabytes of data per year [10], and the climate data in climate science is expected to exceed 100 exabytes by 2020 [11]. Such sheer volume of data is normally generated at one data center and then needs to be transferred to other geographically distributed data centers for collaboration [8, 12]. Bandwidth reservation on dedicated channels of high-performance networks (HPNs) has proven very effective for such extremely large amounts of data transfer. For example, the data generated by LHC has been transferred globally using the on-demand secure circuits and advance reservation system (OSCARS) deployed in the energy sciences network (ESnet) [13], one of the most widely used bandwidth reservation services in scientific area.

Besides ESnet, there are many other similar HPNs providing bandwidth reservation services, such as Internet2 ION [14], circuit-switched high-speed end-to-end transport architecture [15], user controlled light paths [16], Japanese Gigabit Network II [17], UltraScience Net [18] and Bandwidth on Demand in Geant2 network [19]. Considering the exponential growth of the data generated from the next generation scientific research applications, the bandwidth reservation service provided by dedicated HPNs is expected to be deployed and used by more and more applications across the globe.

To make the data transfers using bandwidth reservation, bandwidth reservation requests (BRRs) are firstly created, specifying properties of the data transfers as well as the scheduling constraints and performance requirements. One typical BRR specifies the following data property parameters: the source end-site, the destination end-site, size of the data to be transferred, the maximum local area network (LAN) bandwidth, the data available time, and the data transfer deadline. After the receipt of one BRR, the underlying scheduling network then employs scheduling algorithms to identify the data transfer path and allocate bandwidth resources on that path within a certain time interval. Only when all the constraints and requirements of the received BRR have been successfully satisfied can the corresponding data transfer start.

Although many different problems regarding bandwidth reservation have been studied in the past decades, there are still some critical problems to be investigated. One of them is the design of effective scheduling strategy to achieve the trade-off between data transfer cost and other data transfer performance parameters. Challenges of this problem arise from the requirements desired by both the users and the bandwidth reservation service providers. As for the users, the most common data transfer performance parameter is the data transfer completion time. However, many times the users require their data transfers to be finished not only before the specific deadlines, but also with the minimal costs charged by the bandwidth reservation service providers under certain data transfer cost model. While for the bandwidth reservation service providers, successfully transferring the data using the minimum bandwidth resources is highly desired to achieve high system throughput for the maximum profit and resource utilization.

In this paper, we focus on the trade-off between data transfer cost and the most common data transfer performance, i.e., data transfer completion time. We consider the scheduling of two types of BRRs regarding such trade-off: (1) to achieve the minimum data transfer cost given the data transfer deadline, referred to as MinC-TC (minimize data transfer cost with time constraint), and (2) to achieve the earliest data transfer completion time given the maximum data transfer cost, referred to MinT-CC (minimize data transfer completion time with cost constraint). We assume that the data transfer cost is mainly determined by the amount of the reserved bandwidth, time duration of the bandwidth reservation and length of the bandwidth reservation path. We propose two bandwidth reservation algorithms with rigorous optimality proofs, called Opt-MinC-TC and Opt-MinT-CC (“Opt” denotes “Optimal”), to optimize the scheduling of these two types of BRRs. We then compare the proposed algorithms with two scheduling algorithms originating from one widely used scheduling algorithm in production networks, called FBR-MinC-TC and FBR-MinT-CC (“FBR” denotes “Flexible Bandwidth Reservation”), from the perspective of various performance metrics. The efficacy of Opt-MinC-TC and Opt-MinT-CC is verified through extensive simulations on simulated ESnet.

The rest of this paper is organized as follows. We show the work related to bandwidth reservation in dedicated HPNs in Sect. 2. The mathematical models, concepts of bandwidth reservation and problem formulation are presented in Sect. 3. Detailed algorithm designs and illustrations of Opt-MinC-TC and FBR-MinC-TC, and Opt-MinT-CC and FBR-MinT-CC are given in Sects. 4 and 5, respectively. We conduct extensive simulations and results analysis in Sect. 6, and conclude our work in Sect. 7.

2 Related Work

Table 1 Bandwidth reservation related work in recent years

Because of the wide use of bandwidth reservation service for the immense data transfer to achieve the guaranteed performance, various problems have been investigated in the past few years. To show these related work in a clearer way, we tabulate them in Table 1. For each existing research shown in Table 1, if complexity of the studied problem is NP-complete, the corresponding NP-complete proof is given and heuristic algorithm is then proposed; otherwise, the optimal algorithm is designed and the corresponding optimality proof is given except [9]. Compared with existing work, our problems under investigation are very different and unique from both the perspectives of objectives and algorithm design.

3 Mathematical Models, Concepts of Bandwidth Reservation and Problem Formulation

In this section, we first show mathematical models and concepts of bandwidth reservation, followed by data transfer cost model and problem formulation.

3.1 Mathematical Models and Concepts of Bandwidth Reservation

A number of definitions and parameters will be introduced in this section to show the mathematical models and concepts of bandwidth reservation in a concise and clearer way. For convenience, several parameters are tabulated in Table 2.

The two types of BRRs considered in the paper, namely MinC-TC and MinT-CC, are described as follows:

  • \((v_s,v_d,B^{max},D,[t^S,t^E])\): Identify the bandwidth reservation option with the minimum data transfer cost under the constraint that the data transfer must be completed no later than the preset deadline \(t^E\);

  • \((v_s,v_d,B^{max},D,t^S,C^{max})\): Identify the bandwidth reservation option with the earliest data transfer completion time under the constraint that the data transfer cost must not exceed the preset maximum cost \(C^{max}\).

In the above notations, D denotes size of the data to be transferred from the source node \(v_s\) to the destination node \(v_d\), while \(t^S\) and \(B^{max}\) denote the earliest possible data transfer start time and the maximum LAN bandwidth constraint, respectively [10]. The reserved bandwidth for one BRR is upper limited by \(B^{max}\).

Table 2 Definitions of some parameters introduced in Sect. 3
Fig. 1
figure 1

Topology of an example scheduling network (left) and the available bandwidth table of each edge within time interval \([0, \infty )\) (right)

We model a scheduling network as a graph G(VE), where V and E represent the set of nodes and edges, respectively. Suppose we have an example scheduling network G shown on the left side of Fig. 1, we have \(V = \{v_s, v_1, v_2, v_d\}\) and \(E =\{v_s-v_1, v_s-v_2, v_1-v_2, v_2-v_d\}\). The available bandwidth table showing the available bandwidth of each edge within time interval \([0, \infty )\) is presented on the right side of Fig. 1. Bandwidth capacity of an edge is defined as the amount of available bandwidth of that edge without any bandwidth reservation on it. We suppose the bandwidth capacity of each edge of G is its maximum available bandwidth within time interval \([0,11\,\hbox {s}]\) and there is no bandwidth reservation on any edge of G after 11s. We suppose G has one MinC-TC BRR to schedule, and the BRR requires to transfer 36 Gb data from \(v_s\) to \(v_d\) within time interval \([0, 11\,\hbox {s}]\) with the minimum data transfer cost. Suppose the specified maximum LAN bandwidth of the received BRR is 12 Gb/s, we can represent the above MinC-TC BRR as \((v_s,v_d,12\,\hbox {Gb/s},36\,\hbox {Gb},[0,11\,\hbox {s}])\).

As we can see from Fig. 1, available bandwidth of an edge might change from time to time. Such bandwidth dynamism is caused by the dynamic bandwidth reservation and release on the edge [12]. Given time interval \([t^S,t^E]\) and scheduling network G, for any time point \(t \in [t^S,t^E]\), if any edge of G has different available bandwidths at time point \(t - \delta\) and \(t + \delta\), \(\delta \rightarrow 0\), we call time point t a time dot [5]. These two end points of the given interval \([t^S,t^E]\), \(t^S\) and \(t^E\), are also regarded as time dots. For example, G shown in Fig. 1 has three time dots within \([0,11\,\hbox {s}]\), i.e., \(\left\{ 0, 6\,\hbox {s}, 11\,\hbox {s} \right\}\), and four time dots within \([0,\infty )\), i.e., \(\left\{ 0, 6\,\hbox {s}, 11\,\hbox {s}, \infty \right\}\).

Given time dots list \(\left\{ td_0, td_1, \dots , td_{n-1} \right\}\), the time interval between any two adjacent time dots is called a time step while the time interval between any two different time dots is called a time window. Hence, a time step has the form \([td_i, td_{i+1}]\), \(0 \le i \le n-2\), while a time window has the form \([td_i, td_j]\), \(0 \le i < j \le n-1\). In the rest of the paper, we also denote time step i as \(ts_i=[ts_i^s, ts_i^e]\) and time window i as \(tw_i=[tw_i^s, tw_i^e]\). For example, within \([0,11\,\hbox {s}]\), G shown in Fig. 1 has two time steps, i.e., \(ts_0=[0, 6\,\hbox {s}]\) and \(ts_1=[6\,\hbox {s}, 11\,\hbox {s}]\), and three time windows, i.e., \(tw_0=[0, 6\,\hbox {s}]\), \(tw_1=[6\,\hbox {s}, 11\,\hbox {s}]\) and \(tw_2 = [0, 11\,\hbox {s}]\) (a time step is also a time window). These three time windows are shown in Fig. 1. It is not difficult to see that the available bandwidths of all edges of G within a time step do not change, and a time window consists of one time step or multiple consecutive time steps.

We represent the available bandwidth of edge e within time step \(ts_i\) as \(B(e, ts_i)\), which can be directly read from the available bandwidth table of G. For example, within time step \(ts_0=[0,6\,\hbox {s}]\), we have \(B(v_s-v_1, ts_0) = 10\,\hbox {Gb/s}\). The bandwidth resource of an edge within a time step is defined as the maximum amount of data that edge can transfer within that time step. So for edge e within time step \(ts_i = [ts_i^s, ts_i^e]\), its bandwidth resource equals \(B(e, ts_i) \cdot (ts_i^e - ts_i^s)\). For example, for edge \(v_s-v_1\) within time step \(ts_0 = [0, 6\,\hbox {s}]\), its bandwidth resource equals \(10\,\hbox {Gb/s} \cdot (6\,\hbox {s} - 0) = 60\,\hbox {Gb}\). The bandwidth resource of G within time interval \([t^S, t^E]\) equals sum of the bandwidth resources of all edges within all time steps contained in \([t^S, t^E]\).

Available bandwidth of edge e within time window \(tw_i\) equals the smallest available bandwidth of e among all time steps contained in \(tw_i\). For example, time window \(tw_2\) consists of two time steps \(ts_0=[0, 6\,\hbox {s}]\) and \(ts_1=[6\,\hbox {s}, 11\,\hbox {s}]\), so available bandwidth of edge \(v_s-v_1\) within time window \(tw_2\) equals \(min(10\,\hbox {Gb/s}, 11\,\hbox {Gb/s}) = 10\,\hbox {Gb/s}\). Similarly, available bandwidth of path p within time window \(tw_i\), denoted by \(B(p, tw_i)\), is limited by the bottleneck edge of p, namely, the edge with the minimum available bandwidth within \(tw_i\). For example, path \(v_s-v_2-v_d\) consists of two edges \(v_s-v_2\) and \(v_2-v_d\), and available bandwidths of these two edges within time window \(tw_2\) are \(3\,\hbox {Gb/s}\) and \(12\,\hbox {Gb/s}\), respectively, so \(B(v_s-v_2-v_d,tw_2) = min(3\,\hbox {Gb/s}, 12\,\hbox {Gb/s}) = 3\,\hbox {Gb/s}\). We use L(p) to denote length of path p, namely the number of edges p contains.

For a given BRR, we say it can be successfully scheduled if we can make a bandwidth reservation option on one path of G and this option can satisfy all the requirements and constraints of that BRR. For convenience, we call the above path a qualified path and the reservation option a qualified reservation (QR), denoted by \((p,b,[t^s,t^e],c)\), where p, b, \(t^s\), \(t^e\) and c denote the qualified path (also the data transfer path), the reserved bandwidth on the qualified path, data transfer start time, data transfer end time, and data transfer cost, respectively [10]. A path within a time window is called a qualified path only when we can make at least one QR on that path within that time window. Given a BRR, we normally can identify multiple qualified paths and make multiple QRs on these paths within different time windows. For example, for the given MinC-TC BRR \((v_s,v_d,12\,\hbox {Gb/s},36\,\hbox {Gb},[0,11\,\hbox {s}])\), we can make infinite QRs for it on paths \(v_s-v_2-v_d\) and \(v_s-v_1-v_2-v_d\) of G within time interval \([0,11\,\hbox {s}]\). Here we are interested in two special QRs: the one with the minimum data transfer cost, denoted as QRMC, and the one with the earliest completion time, denoted as QRECT. Focus of this paper is design of the scheduling algorithms to identify QRMC for the given MinC-TC BRR and QRECT for the given MinT-CC BRR.

Given a BRR, if we want to successfully schedule it within \(tw_i\), length of \(tw_i\) should be at least \(\frac{D}{B^{max}}\), i.e., \(tw_i^e-tw_i^s \ge \frac{D}{B^{max}}\), and the reserved bandwidth within \(tw_i\) should be at least \(\frac{D}{tw_i^e-tw_i^s}\). We further derive that we can remove those edges with available bandwidths less than \(\frac{D}{tw_i^e-tw_i^s}\) within \(tw_i\), and this pruning does not affect the scheduling feasibility of the given BRR within \(tw_i\). For example, for the given MinC-TC BRR \((v_s, v_d, 12\,\hbox {Gb/s}, 36\,\hbox {Gb}, [0,11\,\hbox {s}])\) within time window \([6\,\hbox {s},11\,\hbox {s}]\), the minimum bandwidth is \(\frac{36\,\hbox {Gb}}{(11\,\hbox {s} - 6\,\hbox {s})} = 7.2\,\hbox {Gb/s}\). As we can see from Fig. 1, neither edge \(v_1-v_2\) nor \(v_s-v_2\) has adequate available bandwidth, so both edges can be removed when we try to schedule the given BRR within \([6\,\hbox {s},11\,\hbox {s}]\). Such redundant edges pruning strategy gives us two benefits: (1) the efficiency of the data transfer path identification process can be greatly improved because of the reduced searching space using Dijkstra’s algorithm, and (2) the non-NULL path returned by Dijkstra’s algorithm can finish the data transfer of the corresponding BRR for sure, which improves the BRR scheduling ratio. Such benefits will be further explained in Sects. 46.

3.2 Data Transfer Cost Model and Problem Formulation

The total bandwidth resource consumed by a QR is defined as the total bandwidth resource consumed to transfer data of the corresponding BRR, which equals

$$\begin{aligned} L(p) \cdot b \cdot (t^e-t^s) = L(p) \cdot D, \end{aligned}$$
(1)

where D is size of the data to be transferred in the corresponding BRR.

For the bandwidth reservation service providers, the profit comes from transferring data of the BRRs from users. In this paper, we assume the profit of providing bandwidth reservation service for a BRR within a time interval, namely the data transfer cost of that BRR, equals the overall amount of bandwidth resource consumed by the BRR to transfer its data within that time interval. With this data transfer cost model, it is easy to see from Eq. 1 that the best case, namely the minimum data transfer cost, for one BRR is that length of the data transfer path is 1, namely the source node and the destination node are directly connected by one edge.

Based on our analysis, we further describe these two types of BRRs, MinC-TC and MinT-CC, as follows:

  • \((v_s,v_d,B^{max},D,[t^S,t^E])\): Identify the qualified path with the least length among all possible qualified paths within all possible time windows contained in \([t^S,t^E]\), and return the corresponding QR on it, which is the QRMC of the BRR;

  • \((v_s,v_d,B^{max},D,t^S,C^{max})\): Identify the qualified path satisfying the following two constraints, and return the corresponding QR on it, which is the QRECT of the BRR: (1) its length is no larger than \(\lfloor \frac{C^{max}}{D}\rfloor\), and (2) data transfer on it has the earliest completion time among all possible qualified paths within all possible time windows contained in \([t^S,\infty )\).

Now scheduling these two types of BRRs comes down to how to identify the optimal paths.

4 Algorithm Design and Analysis for MinC-TC

In this section, we focus on the algorithm design and analysis for MinC-TC. Its optimal algorithm, Opt-MinC-TC, is proposed followed by the comparison algorithm, FBR-MinC-TC. For each algorithm, we present the detailed algorithm design and brief explanation, and then illustrate it using an example.

4.1 Algorithm Design and Analysis of Opt-MinC-TC

4.1.1 Algorithm Design and Explanation

As mentioned in the data transfer cost model in Sect. 3.2, data transfer cost of one BRR is actually proportional to the length of the data transfer path, so to achieve the minimum data transfer cost, we should make the bandwidth reservation on the path with the least length from the source node to the destination node within a certain time interval. Please refer to Algorithm 1 for the detailed algorithm design and pseudocode of Opt-MinC-TC. Its complexity is \(O\left( |td|^2 \cdot \left( |E |+ |V |\cdot \log {|V |} \right) \right)\) in the worst case. In the algorithm, we use a list named ltw to store the time windows within time interval \([t^S,t^E]\). Opt-MinC-TC is briefly explained as follows:

Line 5–6: If length of time window \([td_i, td_j]\) is less than \(\frac{D}{B^{max}}\), we know that the given MinC-TC BRR cannot be successfully scheduled within it. Hence, we only need to consider those time windows with the lengths at least \(\frac{D}{B^{max}}\).

Line 7: We consider the scheduling of the given MinC-TC BRR within each time window in ltw.

Lines 8–14: Within time window \(tw_i \in ltw\), we remove those edges with available bandwidth less than \(\frac{D}{tw_i^e-tw_i^s}\). Such pruning technique is explained in Sect. 3. After the pruning, if remaining of G becomes disconnected, and \(v_s\) and \(v_d\) are in two different components, we continue to the next time window in ltw because no path in current time window has enough available bandwidth to finish the data transfer of the given BRR; otherwise, we employ Dijkstra’s algorithm to identify the path with the least length from \(v_s\) to \(v_d\). We use variable p to record the qualified path with the least length among all time windows in ltw, and tw to record the corresponding time window (Line 14).

Line 15–18: After the iteration of all time windows in ltw, \(p \ne NULL\) denotes that we could successfully identify the qualified path with the least length. We then make and return the QR on p within time window tw as shown in Line 16, which is the QRMC of the given BRR. While after the iteration of ltw, \(p == NULL\) denotes we could not identify any qualified path within any time window in ltw, the given BRR could not be successfully scheduled, we then return NULL.

figure a

4.1.2 Algorithm Illustration

We illustrate Opt-MinC-TC using the example scheduling network G shown in Fig. 1 and the example MinC-TC BRR \((v_s, v_d, 12\,\hbox {Gb/s}, 36\,\hbox {Gb}, [0, 11\,\hbox {s}])\).

Initialize parameters as shown in Line 1 of Algorithm 1. For the example network G, the identified time window list is \(ltw = \left\{ [0, 6\,\hbox {s}], [0, 11\,\hbox {s}], [6\,\hbox {s}, 11\,\hbox {s}] \right\}\). Iterate through ltw. Within time window \([0, 6\,\hbox {s}]\), we try to remove those edges with available bandwidths less than \(\frac{36\,\hbox {Gb}}{6\,\hbox {s}-0} = 6\,\hbox {Gb/s}\), and no edge will be removed. The path with the least length from \(v_s\) to \(v_d\) is \(v_s-v_2-v_d\) with available bandwidth of \(6\,\hbox {Gb/s}\) and length of 2. Within time window \([0, 11\,\hbox {s}]\), we use similar strategy and edge \(v_s-v_2\) will be removed, the path with the least length from \(v_s\) to \(v_d\) is \(v_s-v_1-v_2-v_d\) with available bandwidth of \(7\,\hbox {Gb/s}\) and length of 3. Within time window \([6\,\hbox {s}, 11\,\hbox {s}]\), we use similar strategy, and edges \(v_s-v_2\) and \(v_1-v_2\) will be removed. After the removal, \(G'\) becomes disconnected, and \(v_s\) and \(v_d\) are in two different components, time window iteration stops here. We have \(p = v_s-v_2-v_d\) and \(tw = [0, 6\,\hbox {s}]\). Following Line 16 of Algorithm 1, the corresponding QRMC is \((v_s-v_2-v_d,6\,\hbox {Gb/s},[0,6\,\hbox {s}],72)\) and will be returned.

4.1.3 Optimality Proof

Theorem 1

Opt-MinC-TC returns the QRMC, if it exists, for the input MinC-TC BRR.

Proof

We use proof-by-contradiction to prove the theorem. Suppose for the input BRR, its QRMC exists and is made on path \(p''\) within time window \(tw''\). As we will see, the QR identified using Opt-MinC-TC is not NULL, and we suppose this QR is made on path p and we have \(L(p'') < L(p)\), namely length of the data transfer path of the QR identified by Opt-MinC-TC is larger than that of the optimal data transfer path.

Let us consider the scheduling within \(tw''\) using Opt-MinC-TC. From our supposition, we know that \(p''\) can finish the data transfer of the input BRR. So after the edge pruning procedure stated in Line 8 of Algorithm 1, \(v_s\) and \(v_d\) are in the same component of \(G'\). Line 12 of Algorithm 1 denotes the path with the least length from \(v_s\) to \(v_d\) is \(p'\), hence we have \(L(p'') \ge L(p')\). During the time window iteration, we use variable p to record the qualified path with the least length among all qualified paths within all time windows in ltw, so after the time window iteration, we have \(L(p) \le L(p')\). Along with \(L(p'') \ge L(p')\), we have \(L(p'') \ge L(p)\), which contradicts our initial supposition that \(L(p'') < L(p)\). Proof ends.□

4.2 Algorithm Design and Analysis of FBR-MinC-TC

4.2.1 Algorithm Design and Explanation

Design of FBR-MinC-TC originates from the scheduling algorithm proposed by the scientists at Lawrence Berkeley National Laboratory, who is currently managing ESnet [9]. Please refer to Algorithm 2 for the detailed algorithm design and pseudocode of FBR-MinC-TC. In the worst case, its complexity is also \(O\left( |td|^2 \cdot \left( |E |+ |V |\cdot \log {|V |} \right) \right)\). Line 3 of Algorithm 2 denotes that within each time window in ltw, we directly employ Dijkstra’s algorithm to identify the path with the least length from \(v_s\) to \(v_d\).

figure b

4.2.2 Algorithm Illustration

We illustrate FBR-MinC-TC using G shown in Fig. 1, and the example MinC-TC BRR \((v_s, v_d, 12\,\hbox {Gb/s}, 36\,\hbox {Gb}, [0, 11\,\hbox {s}])\).

Among all three time windows in \(ltw = \left\{ [0, 6\,\hbox {s}], [0, 11\,\hbox {s}], [6\,\hbox {s}, 11\,\hbox {s}] \right\}\), the path with the least length from \(v_s\) and \(v_d\) is the same as \(v_s-v_2-v_d\). After time window iteration, the estimated QRMC will be made on the recorded path \(v_s-v_2-v_d\) within recorded time window \([0, 6\,\hbox {s}]\): \((v_s-v_2-v_d,6\,\hbox {Gb/s},[0, 6\,\hbox {s}],72)\).

5 Algorithm Design and Analysis for MinT-CC

In this section, we focus on the algorithm design and analysis for MinT-CC. Its optimal algorithm, Opt-MinT-CC, is proposed followed by the comparison algorithm, FBR-MinT-CC. For each algorithm, we present the detailed algorithm design and brief explanation, and then illustrate it using an example.

5.1 Algorithm Design and Analysis of Opt-MinT-CC

5.1.1 Algorithm Design and Explanation

Please refer to Algorithm 3 for the detailed algorithm design and pseudocode of Opt-MinT-CC. Its complexity is \(O\left( |td|^2 \cdot |E| \cdot \left( |E |+ |V |\cdot \log {|V |} \right) \right)\) in the worst case. Opt-MinT-CC is briefly explained as follows:

Lines 9–14: After the edge pruning process (Line 5), if \(v_s\) and \(v_d\) are in the same component of \(G'\), we do the operations stated in Line 9. Every time we add one edge to \(G'\), we run Dijkstra’s algorithm and try to identify the path with the least length from \(v_s\) to \(v_d\) through the newly added edge. If the returned path \(p' \ne NULL\) and its length is no larger than \(\lfloor \frac{C^{max}}{D}\rfloor\), then the required data transfer could be successfully finished on \(p'\) and the cost is no larger than the preset maximum cost \(C^{max}\). If the data transfer on path \(p'\) has earlier completion time than t, Line 14 is executed and the edge iteration (Line 10) stops here. Because the edges in \(E'\) are sorted by their available bandwidth in descending order, the other qualified paths, if there are, within current time window will not have more available bandwidth than \(p'\), namely the data transfers on these qualified paths will not have an earlier completion time. So it is not necessary to consider these qualified paths.

Lines 15–18: During the time window iteration, we use variable t to record the earliest data transfer completion time. After the time window iteration, \(p \ne NULL\) denotes that we can successfully identify at least one qualified path, we then make and return the corresponding QR on p following Line 16.

figure c

5.1.2 Algorithm Illustration

We illustrate Opt-MinT-CC using the example scheduling network G shown in Fig. 1 and the example MinT-CC BRR \((v_s, v_d, 12\,\hbox {Gb}/\hbox {s}, 54\,\hbox {Gb}, 0, 200)\).

Initialize parameters as shown in Line 1 of Algorithm 3, and the identified time window list is \(ltw = \{ [0, 6\,\hbox {s}], [0, 11\,\hbox {s}], [0, \infty ), [6\,\hbox {s}, 11\,\hbox {s}], [6\,\hbox {s}, \infty ),\)\([11\,\hbox {s}, \infty ) \}\). Iterate through ltw. Within time window \([0, 6\,\hbox {s}]\), edge \(v_s-v_2\) will be removed. The sorted remaining edge set becomes \(E'=\{v_2-v_d,v_s-v_1,v_1-v_2\}\). Remove all edges from \(G'\), and then add edges in \(E'\) to \(G'\) one at a time. The first path returned by Dijkstra’s algorithm is \(v_s-v_1-v_2-v_d\) with available bandwidth of \(9\,\hbox {Gb/s}\) and length of 3. We have \(3 \le \lfloor \frac{200}{54}\rfloor = 3\), and \(0 + \frac{54\,\hbox {Gb}}{9\,\hbox {Gb/s}} = 6\,\hbox {s}\). We then have \(t = 6\,\hbox {s}\), \(p = v_s-v_1-v_2-v_d\) and \(tw = [0, 6\,\hbox {s}]\), and the edge adding loop stops here. Within time window \([0, 11\,\hbox {s}]\), using similar strategy, we know that path \(v_s-v_1-v_2-v_d\) is also the first path returned by Dijkstra’s algorithm. However, completion time of the data transfer on that path is \(\frac{54\,\mathrm{Gb}}{7\,\mathrm{Gb/s}} \approx 7.71\,\hbox {s} > t = 6\,\hbox {s}\), so the time window iteration continues. Similarly, we could not identify any QR with an earlier data transfer completion time within rest of the time windows. After the time window iteration, we have \(p = v_s-v_1-v_2-v_d\) and \(tw = [0, 6\,\hbox {s}]\), so the corresponding QRECT is \((v_s-v_1-v_2-v_d,9\,\hbox {Gb/s},[0, 6\,\hbox {s}],162)\) and will be returned.

5.1.3 Optimality Proof

Theorem 2

Opt-MinT-CC returns the QRECT, if it exists, for the input MinT-CC BRR.

Proof

We use proof-by-contradiction to prove the theorem. Suppose for the input BRR, its QRECT exists and is made on path \(p''\) within time window \(tw''\) with the data transfer completion time of \(t''\). As we will see, the QR identified using Opt-MinT-CC is not NULL, and we suppose this QR is made on path p with the data transfer completion time of t. We assume \(t'' < t\).

Let us consider the scheduling process within time window \(tw''\). The edge pruning procedure will not remove any edge on path \(p''\) since it could finish the data transfer of the given BRR. We suppose the bottleneck edge of path \(p''\) is edge \(e''\), then we have \(e'' \in E'\). We have two cases for the edge adding loop (Line 10 of Algorithm 3): The loop stops before or after edge \(e''\) is added. We consider each case as follows:

Case (1): The edge adding loop stops before edge \(e''\) is added. Since \(e'' \in E'\), this case happens when the Break statement on Line 14 of Algorithm 3 is executed and we have found a qualified path \(p'\) before adding \(e''\) to \(G'\). Since all edges in \(E'\) is sorted by their available bandwidths in descending order and \(p'\) is found before adding \(e''\) to \(G'\), we have \(B(p', tw'') \ge B(e'', tw'') = B(p'', tw'')\). Then we have \(tw''^s + \frac{D}{min\left( B^{max},\,B(p', \,tw'')\right) } \le tw''^s + \frac{D}{min(B^{max},\,B(p'', \,tw''))} = t''\). As we can see from Lines 12 – 13 of Algorithm 3, we use path p to denote the qualified path with the earliest data transfer completion time among all qualified paths within all time windows, and Opt-MinT-CC returns the QR made on p at the end of the algorithm. Hence, we have \(t \le tw''^s + \frac{D}{min\left( B^{max}, B(p', \,tw'')\right) }\), we then have \(t \le t''\), which contradicts our initial assumption that \(t'' < t\).

Case (2): The edge adding loop stops after edge \(e''\) is added. Since path \(p''\) is the optimal data transfer path, we know that after we add edge \(e''\) to \(G'\), \(v_s\) and \(v_d\) are connected by at least one path for the first time within current time window. As shown in Line 11 of Algorithm 3, Dijkstra’s algorithm is used to identify the path with the least length from \(v_s\) to \(v_d\) through \(e''\), suppose this path is \(p'\). Since both \(p''\) and \(p'\) share the same bottleneck edge \(e''\), we know that \(B(p'', tw'') = B(p', tw'')\). From our analysis in Case (1), we also have the conclusion that \(t \le t''\), which also contradicts our initial assumption that \(t'' < t\).

In summary, each of the above two cases contradicts our initial assumption. Proof ends. □

5.2 Algorithm Design and Analysis of FBR-MinT-CC

5.2.1 Algorithm Design

Please refer to Algorithm 4 for the detailed algorithm design and pseudocode of FBR-MinT-CC. In the worst case, its complexity is \(O\left( |td|^2 \cdot \left( |E |+ |V |\cdot \log {|V |} \right) \right)\).

figure d

5.2.2 Algorithm Illustration

We illustrate FBR-MinT-CC using G shown in Fig. 1 and the example MinT-CC BRR \((v_s, v_d, 12\,\hbox {Gb/s}, 54\,\hbox {Gb}, 0, 200)\).

Iterate through \(ltw = \{ [0, 6\,\hbox {s}], [0, 11\,\hbox {s}], [0, \infty ), [6\,\hbox {s}, 11\,\hbox {s}], [6\,\hbox {s}, \infty ), [11\,\hbox {s}, \infty )\}\). Within time windows \([0,6\,\hbox {s}]\) and \([0, 11\,\hbox {s}]\), we know that the shortest path from \(v_s\) to \(v_d\), \(v_s-v_2-v_d\), could not finish the data transfer of the example BRR. Within time window \([0, \infty )\), the shortest path from \(v_s\) to \(v_d\) is \(v_s-v_2-v_d\) with the available bandwidth of \(3\,\hbox {Gb/s}\) and length of 2, which is less than \(\lfloor \frac{200}{54}\rfloor = 3\). The completion time of the data transfer on it is \(\frac{54\,\hbox {Gb}}{3\,\hbox {Gb/s}} = 18\,\hbox {s}\). After the iteration of the rest of the time windows, we know we cannot find any QR with earlier completion time, and we have \(t = 18\,\hbox {s}\), \(p = v_s-v_2-v_d\) and \(tw = [0, \infty )\). The corresponding QR is \((v_s-v_2-v_d,3\,\hbox {Gb/s},[0,18\,\hbox {s}],108)\), the estimated QRECT for the example BRR.

6 Performance Evaluation

One of the most widely used bandwidth reservation service in scientific area is the OSCARS of ESnet [28,29,30,31,32]. Currently more than 40 U.S. Department of Energy’s research sites, including all national laboratory systems, and another 140 commercial and research institutions around the world are using the service provided by ESnet for their daily large-scale data movement. To make our performance evaluation as real and accurate as possible, topology of ESnet is drawn using the data gathered from ESnet [10, 33] to mimic the real ESnet scenario, and we then conduct intense simulations on the drawn topology.

We run 10 sets of simulations and for simulation i, \(1 \le i \le 10\), 10 BRR batches containing \(i \times 200\) BRRs are randomly generated. For MinC-TC BRR \((v_s,v_d,B^{max},D,[t^S,t^E])\) and MinT-CC BRR \((v_s,v_d,B^{max},D,t^S,C^{max})\), \(v_s\) and \(v_d\) are two randomly selected nodes from the node set \((v_s \ne v_d)\), \(B^{max}\) is a random integer within 1 and 10,000, \(D \le B^{max} \cdot (t^E-t^S)\), \(t^S\) is a random integer within the range [0, 19] while \(t^E\) is a random integer from the range \((t^S,20]\), and \(C^{max}\) is the multiplication between D and a random integer within the range [1, 10]. All the proposed algorithms, Opt-MinC-TC, FBR-MinC-TC, Opt-MinT-CC and FBR-MinT-CC, are implemented to process the same batches of BRRs. Several performance metrics are collected after the BRR processing, and corresponding figures are drawn. To make our experiment results as accurate as possible, all figures in this section show both the average performance measurements of the performance metrics and the corresponding variances with the 95% confidence level across all the simulation sets.

6.1 Performance Analysis of Opt-MinC-TC and FBR-MinC-TC

Fig. 2
figure 2

Comparison of the BRR scheduling ratio by Opt-MinC-TC and FBR-MinC-TC

Fig. 3
figure 3

Comparison of the average length of the data transfer paths of the scheduled BRRs by Opt-MinC-TC and FBR-MinC-TC

After Opt-MinC-TC and FBR-MinC-TC finish the processing of all the BRRs in one batch, two performance metrics are collected: (1) BRR scheduling success ratio, defined as the percentage of BRRs that have been successfully scheduled within the BRR batch, and (2) average length of the data transfer paths of the successfully scheduled BRRs within the batch. The second metric is used to measure the data transfer cost of the scheduled BRRs based on our data transfer cost model proposed in Sect. 3.2.

After data analysis, we further plot the experimental results in Figs. 2 and 3. Suppose we use LBRR to denote one BRR batch, and s and \(s'\) to denote the set of MinC-TC BRRs within LBRR that can be successfully scheduled by Opt-MinC-TC and FBR-MinC-TC, respectively. Opt_MinC_TC_FBR in Figs. 2 and 3 denotes the ratio of the BRRs in one batch that can be successfully scheduled by both Opt-MinC-TC and FBR-MinC-TC, and the corresponding average length of the data transfer paths, respectively. We know that any \(brr \in s'\) can also be successfully scheduled by Opt-MinC-TC and its cost computed by Opt-MinC-TC and FBR-MinC-TC is identical. The data analysis shows that Opt-MinC-TC can successfully schedule 10.69% more BRRs averagely in one BRR batch than FBR-MinC-TC, namely \(\frac{|s| - |s'|}{|LBRR|} = 10.69\%\) in average as shown by Opt_MinC_TC_Extra in Fig. 2, and the average length of the data transfer paths of the scheduled BRRs by Opt-MinC-TC is 20.30% larger than that computed by FBR-MinC-TC (Fig. 3). From the above two parameters, we derive that \(s' \subset s\) and for those BRRs that Opt-MinC-TC can successfully schedule while FBR-MinC-TC cannot, namely the BRRs in set \(s - s'\), lengths of their data transfer paths are relatively longer than those of the BRRs both Opt-MinC-TC and FBR-MinC-TC can successfully schedule as shown by Opt_MinC_TC_Extra in Fig. 3.

In summary, Opt-MinC-TC successfully schedules much more BRRs in one batch with higher average length of the data transfer paths of the scheduled BRRs. From our above result analysis, we know that Opt-MinC-TC has a much better overall scheduling performance than FBR-MinC-TC.

6.2 Performance Analysis of Opt-MinT-CC and FBR-MinT-CC

After Opt-MinT-CC and FBR-MinT-CC finish the processing of all the BRRs in one batch, similar performance metrics are collected: (1) BRR scheduling success ratio, and (2) average data transfer completion time of the successfully scheduled BRRs within the batch.

Fig. 4
figure 4

Comparison of the BRR scheduling ratio by Opt-MinT-CC and FBR-MinT-CC

Fig. 5
figure 5

Comparison of the average data transfer completion time by Opt-MinT-CC and FBR-MinT-CC

After data analysis, we further plot the experimental results in Figs. 4 and 5. Similarly, we also use set s and \(s'\) to denote the set of MinT-CC BRRs within a batch LBRR that can be successfully scheduled by Opt-MinT-CC and FBR-MinT-TC, respectively. The data analysis shows that the average BRR scheduling ratio computed by Opt-MinT-CC and FBR-MinT-CC is identical (Fig. 4), namely \(s = s'\), and the average data transfer completion time of the successfully scheduled BRRs by FBR-MinT-CC is 7.89% higher than that computed by Opt-MinT-CC (Fig. 5).

From the experiment result, we know that if a MinT-CC BRR is schedulable, namely as long as the BRR can be successfully scheduled theoretically, both Opt-MinT-CC and FBR-MinT-CC can successfully schedule it. However, its data transfer completion time computed by Opt-MinT-CC and FBR-MinT-CC might be different, an example of which is shown in the illustrations of Opt-MinT-CC and FBR-MinT-CC.

In summary, Opt-MinT-CC schedules the same amount of BRRs in one batch as FBR-MinT-CC, but with less average data transfer completion time of the successfully scheduled BRRs. From our above result analysis, we know that Opt-MinT-CC has a much better overall scheduling performance than FBR-MinT-CC.

7 Conclusion and Future Work

In this paper, we focused on the trade-off between cost and performance of data transfers using bandwidth reservation in dedicated networks. The most common data transfer performance parameter, data transfer completion time, was specifically studied. We considered the scheduling of two types of BRRs regarding such trade-off: (1) to achieve the minimum data transfer cost given the data transfer deadline, and (2) to achieve the earliest data transfer completion time given the maximum data transfer cost. We suppose the data transfer cost is proportional to the length of the data transfer path. We then proposed two bandwidth reservation algorithms with rigorous optimality proofs, i.e., Opt-MinC-TC and Opt-MinT-CC, to optimize the scheduling of each MinC-TC and MinT-CC BRR. We compared the proposed algorithms with two scheduling algorithms originating from one widely used scheduling algorithm in production networks, i.e., FBR-MinC-TC and FBR-MinT-CC. Extensive simulations were conducted on the topology of ESnet drawn using the real data collected from online, and different performance metrics were used to evaluation the scheduling performance of the proposed algorithms. The extensive simulation results showed Opt-MinC-TC and Opt-MinT-CC have much better overall scheduling performance than FBR-MinC-TC and FBR-MinT-CC, respectively.

We plan to study the following issues in the near future: (1) BRRs with different priorities and how to break the bandwidth reservation of the BRRs with lower priorities to satisfy the requirements of those with higher priorities, (2) the bandwidth reservation service with guaranteed performance in Cloud environment.