1 Introduction

Wireless networks have experienced great success in the past decades which have totally revolutionized the world of communications. Compared with wired networks, wireless networks are capable of providing more freedom to mobile communications, though meanwhile some new challenges arise such as higher latency, lower bandwidth and higher burst error rate. Taking the cellular network, the traditional homogeneous wireless network as an example, due to the mobility of users and the limited coverage of base stations or access points, mobile terminals (MTs) may demand the cooperation of several base stations to accomplish a continuous voice service process. During this process, MTs have to decide whether the connection should use the current base station or switch to another base station for continuity. Nowadays with the development of network access technology and increasing demands of service types, highly integrated HWNs [13] come into stage which provide better and more comprehensive services. Figure 1 shows an example of typical HWNs consisting of 3G/4G cellular networks and the IEEE 802.11 wireless local area networks (WLAN). Compared with the traditional homogeneous wireless network in which services are coupled with certain network access technologies, HWNs support more comprehensive services such as video, voice and file transfer protocol (FTP) etc. It can also provide ubiquitous coverage and high quality of service (QoS) provisioning of applications. A key issue that aids in providing seamless connection of services is vertical handoff decision algorithm, that is, the ability to make correct decisions at any time whether to carry out vertical handoff or not, meanwhile to choose the appropriate candidate wireless access network with the best performance. Differing from horizontal handoff in traditional homogeneous wireless network where continuity of service is the main consideration, the purpose of vertical handoff in HWNs is to pursue better QoS guarantee for users although extra handoff cost may be included. Therefore for the design of vertical handoff algorithm, how to choose appropriate handoff criterion is the primary problem which can determine whether and how to perform the handoff.

Fig. 1
figure 1

A schematic diagram of HWNs

Recently lots of effort has been made to the criterion designing of vertical handoff algorithm in HWNs. For example, a vertical handoff algorithm is developed in [4] where a parameter named signal to interference and noise ratio (SINR) is proposed as a unique criterion to determine whether to trigger vertical handoff or not. This algorithm inevitably leads to frequent handoffs due to the fact that MTs always connect to the wireless network with the best SINR, and such frequent handoffs will decrease the performance of HWNs consequently. To improve the performance, [5] introduced a vertical handoff algorithm based on both analytic hierarchy process (AHP) and SINR. In this algorithm, more than one network parameters are taken into account such as required bandwidth, handoff cost and available bandwidth of the participating access networks. By mapping these parameters of each single network into a vector, the characteristics of HWNs are then described by a matrix composed of these vectors. Next a weight vector for network parameters is given via AHP method. Finally multiply the matrix by weight vector, then the performance vector is thus obtained. Based on this, a new criterion is developed where network with the largest value of performance vector will be the candidate network for handoff. The above mentioned vertical handoff algorithm mainly focuses on single service type in HWNs although the performance criterion may contain one or several parameters. However in reality, multiple services with different priorities take a larger proportion in HWNs. For multiple services, Cheng et al. [6] put forward an optimal vertical handoff mechanism called guard-channel-based incremental and dynamic optimization algorithm. In this work, channels are reversed according to different priorities of multiple service types and the available resource is thus different for each type of service. A newly-arrived call is accepted only if the available resource for this service type can meet the resource demand of this call. The criterion of such algorithm contains only bandwidth and fails to fully consider more factors such as delay, handoff cost and mobility of MTs, etc. To overcome the above disadvantages, some researchers adopted multiple attribute decision making [7], game theory [8, 9], adaptive fuzzy logic [10] and genetic algorithm [11] where multiple network parameters are included in criterion. Although experimental results show effectiveness, these algorithms are hard to be applied in practical environment due to high complexity. Actually the wireless network environment does not vary abruptly in most cases, and thus a vertical handoff algorithm based on Markov decision process (MDP) was proposed in [12] which reduced the complexity greatly. The criterion includes multiple parameters such as delay and network bandwidth. However, the different priorities of multiple services are still not considered in handoff algorithm. In such cases, valuable network bandwidth may be occupied by services with low priorities since first-come first-serve mechanism is adopted in HWNs. Consequently, traditional bandwidth reservation scheme will have disadvantages because all the reserved bandwidth resource is available to services despite of their priorities. To deal with this, new bandwidth reservation scheme should be established in which bandwidth should be reserved concerning with priorities. In a word, a new vertical handoff algorithm is demanded for HWNs in which different priorities of multiple services as well as various network parameters should be taken into account.

Motivated by this, we propose an optimal vertical handoff decision algorithm for multiple services with different priorities in HWNs. In our model, the vertical handoff decision problem can be formulated as an MDP. And the reward function of MDP consists of both QoS profit and handoff cost where multiple network parameters are included. In addition, a novel dynamic bandwidth reservation scheme is proposed taking into account different priorities. This scheme records the bandwidth request of all services over certain time slots in each candidate wireless network, and then the bandwidth request in the future time slots can be estimated. Via this estimation, multiple thresholds are set such that various bandwidths are reserved for different priorities. Moreover, the probability that actual bandwidth request in future time slots will exceed the pre-set thresholds can be ensured to be smaller than a desired value. The main contributions of this paper lie in the following three aspects:

  1. (1)

    An optimal handoff decision algorithm is proposed for multiple services with different priorities in which a unified criterion is introduced as a tradeoff of both QoS profit and handoff cost. Compared with existing achievements, multiple network parameters are contained in the criterion for a comprehensive evaluation of network utilization, e.g., time delay, relative received signal strength and available bandwidth.

  2. (2)

    Due to multiple services with different priorities, we propose an online dynamic multi-threshold bandwidth reservation scheme. By measuring the occupied bandwidth of different services over certain time slots, the possible bandwidth request of multiple services are estimated for the future time slots with the application of Large Deviation Principle. And multiple thresholds can be set consequently based on this estimation. Moreover, these thresholds are dynamically adjusted with the periodical variation of network load.

  3. (3)

    For the optimization of handoff criterion, MDP model is applied for the vertical handoff decision. By classifying the possible network parameters into several discrete sets, the dimension of state space is greatly reduced and the complexity of algorithm is largely decreased.

The rest of this paper is organized as follows. In Sect. 2, the concerned vertical handoff decision process in HWNs is modeled as an MDP, and an optimal vertical handoff decision algorithm is proposed based on this model. Section 3 proposes an online dynamic multi-threshold bandwidth reservation scheme based on Large Deviation Principle where different priorities of multiple types of services in HWNs are considered. Numerical simulations as well as comparison with existing results are presented in Sect. 4. Finally, a short conclusion is drawn in Sect. 5.

2 MDP Model and Vertical Handoff Decision Algorithm

In this section, we introduce the MDP model and the corresponding handoff decision algorithm. For simplicity, we will not take into account the different priorities of multiple services, i.e., the bandwidth is available to all the services despite of different priorities. The problem concerning about the multiple services with different priorities will be discussed in Sect. 3.

2.1 MDP Model

Suppose an MT moves across certain areas with \(M\) co-located wireless networks as shown in Fig. 1, and each wireless network has total \(C^i,(i=1,2,\ldots ,M, C^i \in Z)\) basic bandwidth units. The MT can perceive information such as available bandwidth, the average time delay and the relative received signal strength from wireless networks when the MT locates in the coverage area of these networks. If the MT switches from one network to another, it will achieve better QoS but cause extra signaling switching cost and handoff cost. A new criterion should be proposed to evaluate this handoff process which takes into account both QoS profit and handoff cost.

Fig. 2
figure 2

Time series of MDP model

The vertical handoff decision process can be described by a finite state, infinite horizon MDP model [13]. The MDP model consists of the following five elements: (1) decision epochs; (2) states; (3) actions; (4) transition probabilities; (5) rewards. The MT can make handoff decision at each time epoch \(t, (t=1,2,\ldots , N )\) as shown in Fig. 2, and the interval of two epochs is set as a fixed time interval \(\tau \). For a certain decision epoch \(t\), the state of MT is defined as \(s_t\), and the MT can take an action \(a_t\in A=\{1,2,\ldots ,M\}\) based on the current perceived information from wireless networks where \(A\) is an action set. The state \(s_t\) includes available bandwidth, time delay and the relative received signal strength of all candidate networks in HWNs, and \(s_t\in S\) with \(S\) as the state space. When an action \(a_t\) is taken with state \(s_t\), the MT get an instant reward \(r(s_t,a_t)\) at this epoch, and then transfers to next state with the transition probability \(P(s_{t+1}|s_t,a_t)\). The MT repeats the handoff decision process at each decision epoch to continue the session until the MT terminates its session at time epoch \(N\), here \(N\) is a random variable since each session may last different time slots. In this vertical handoff decision process, all the actions over the time interval \([0,N]\) compose a policy \(\pi =(a_1,a_2,\ldots ,a_N)\).

For simplicity, let \(s\) denote the initial state at first decision epoch \(t=0\), and \(\upsilon ^{\pi }(s)\) denotes the expected total reward along \([0,N]\) with the decision policy \(\pi \). And there is

$$\begin{aligned} {\upsilon ^\pi }(s) = E_s^\pi \left[{E_N}\left\{ \sum _{t=1}^N {r({s_t},{a_t})} \right\}\right] \end{aligned}$$
(1)

We assume that at each decision epoch, the MT continues to keep the connection with probability \(\lambda \), and terminates the connection with the probability \(1-\lambda \). It’s easy to infer that random variable \(N\) follows a geometrically distributed with mean \(1/(1 - \lambda )\), represented as

$$\begin{aligned} P(N=n)=\lambda ^{n-1}(1-\lambda ) \end{aligned}$$
(2)

Then, Eq. (1) can be represented by an infinite-horizon MDP with a discount factor \(\lambda \) as follows:

$$\begin{aligned} {\upsilon ^\pi }(s)&= E_s^\pi \left\{ \sum _{n =1}^\infty {\sum _{t = 1}^n {r({s_t},{a_t})} {\lambda ^{n - 1}}(1 -\lambda )} \right\} \end{aligned}$$
(3)

Our objective is to find an optimal MDP policy \(\pi ^*\) to maximize the expected reward function \(\upsilon ^*(s)\). That is to say, \(\upsilon ^{\pi ^*}(s)\ge \upsilon ^\pi (s)\) holds for all \(\pi \in \varPi \) where \(\varPi \) is the possible policy space. In next subsection we will introduce the mapping between parameters in HWNs and in MDP model.

2.2 Parameter Description in MDP Model

Considering the vertical handoff decision algorithm in HWNs, the state space \(S\) composes of states from current and candidate handoff network. For a comprehensive evaluation of network utilization, \(S\) includes the corresponding priority of the service, the available bandwidth, the time delay. Since the relative received signal strength [14] is an important indicator for the QoS, it is also included in the criterion of our model. And the state space \(S\) is then expressed as following:

$$\begin{aligned} S&= M\times K\times B^1\times D^1 \times P^1\times \cdots \times B^i \times D^i \times P^i\nonumber \\&\quad \times \cdots \times B^M \times D^M \times P^M \end{aligned}$$
(4)

where symbol \(\times \) denotes the Cartesian product, parameter \(M\) denotes the set of available candidate networks, and \(K\) denotes the service types. \(B^i\), \(D^i\) and \(P^i\,(i=1,2,\ldots ,M)\), represent the available bandwidth, time delay and the relative received signal strength of the each network \(i\) in HWNs respectively.

Obviously, the available bandwidth, time delay and the relative received signal strength of networks will have huge number of possible values, and this will cause the curse of dimensionality consequently. To reduce the size of state space \(S\), all the values of available bandwidth, time delay and the relative received signal strength are classified into several classes. The available bandwidth of network \(i\) can be represented as follows:

$$\begin{aligned} B^i=\{b_1^i,b_2^i,\ldots ,b_{max}^i\} \qquad i=1,2,\ldots ,M \end{aligned}$$
(5)

Here \(b_1^i<b_2^i<\cdots <b_{max}^i\) take values in integer sets. \( b_1^i\) represents the class of minimum available bandwidth and \(b_{max}^i\) represents the class of maximum available bandwidth provided by network \(i\).

Similarly, the delay and the relative received signal strength of network \(i\) can also be expressed via a discrete set as following:

$$\begin{aligned} D^i=\{d_1^i,d_2^i,\ldots ,d_{max}^i\} \qquad i=1,2,\ldots ,M \end{aligned}$$
(6)

and

$$\begin{aligned} P^i=\{p_1^i,p_2^i,\ldots ,p_{max}^i\} \qquad i=1,2,\ldots ,M \end{aligned}$$
(7)

where \(d_1^i<d_2^i<\cdots <d_{max}^i\) and \(p_1^i<p_2^i<\cdots <p_{max}^i\) take values in integer sets. \(d_1^i\) represents the class of minimum time delay and \(d_{max}^i\) represents the class of maximum time in network \(i\), while \(p_1^i\) and \(p_{max}^i\) represent the class of minimum and maximum relative received signal strength of network \(i\) respectively.

Considering service \(k\) of MT which is currently connected to network \(m\), its state can be expressed by vector \(s=[m,k,b_1,d_1,p_1,\ldots ,b_M,d_M,p_M]\). If the service \(k\) makes vertical handoff decision, i.e., takes an action \(a\), it receives a QoS profit \(f(s,a)\) and consumes handoff cost penalty. The QoS profit \(f(s,a)\) can be defined as:

$$\begin{aligned} f(s,a) = {\omega _b}{f_b}(s,a) + {\omega _d}{f_d}(s,a) + {\omega _p}{f_p}(s,a) \end{aligned}$$
(8)

where \(f_b(s,a)\) denotes the bandwidth profit function, \(f_d(s,a)\) denotes the delay profit function and \(f_p(s,a)\) denotes the relative received signal strength profit function; \(\omega _b\), \(\omega _d\) and \(\omega _p\) represent the weight factors for bandwidth, delay, and the relative received signal strength, respectively. The weight factors above are subject to \(\omega _b+\omega _d+\omega _d=1\) and \(0\le \omega _b\le 1\), \(0\le \omega _d\le 1\), \(0\le \omega _p\le 1\). Weight factors can be selected according to the relative importance of network parameters.

In this paper, the bandwidth profit function \(f_b(s,a)\) is defined as follows:

$$\begin{aligned} f_b(s,a)=\left\{ \begin{array}{ll} \frac{b_a-b_m}{{max_{i\in A,i\ne m}}(b_i-b_m)},&{} b_a>b_m ,\\ 0,&{} b_a=b_m,\\ \frac{-(b_a-b_m)}{{min_{i\in A,i\ne m}} (b_i-b_m)}, &{} b_a<b_m. \end{array} \right. \end{aligned}$$
(9)

In the above bandwidth profit function \(f_b(s,a)\), \(b_m\) represents the available bandwidth class which is provided by currently connected network \(m\), and \(b_i\) represents the available bandwidth class provided by candidate network \(i\) in HWNs. \(b_a\in \{b_1,b_2,\ldots ,b_M\}\) represents the possible available bandwidth class if action \(a\) is taken.

For the delay profit function \(f_d(s,a)\), its defined in another way as follows [9]:

$$\begin{aligned} f_d(s,a) =\left\{ \begin{array}{ll} 1,&{} 0<d_a\le L_D,\\ \frac{U_D-d_a}{U_D-L_D},&{} L_D<d_a<U_D,\\ 0,&{} d_a>U_D. \end{array} \right. \end{aligned}$$
(10)

Here the constants \(U_D\) and \(L_D\) in (10) represent the maximum and minimum delay class. \(d_a\in \{d_1,d_2,\ldots d_M\}\) represents the possible time delay class if action a is taken.

And the relative received signal strength profit function \(f_p(s,a)\) is given similar to the bandwidth profit function as follows:

$$\begin{aligned} f_p(s,a)=\left\{ \begin{array}{ll} \frac{p_a-p_m}{{max_{i\in A,i\ne m}}(p_i-p_m)},&{} p_a>p_m ,\\ 0,&{} p_a=p_m,\\ \frac{ -(p_a-p_m)}{min_{i\in A,i\ne m}(p_i-p_m)}, &{} p_a<p_m. \end{array} \right. \end{aligned}$$
(11)

where \(p_m\) represents the relative received signal strength class which is provided by currently connected network \(m\), and \(p_i\) represents therelative received signal strength class provided by candidate network \(i\) in HWNs. \(p_a\in \{p_1,p_2,\ldots ,p_M\}\) represents the possible relative received signal strength class if action \(a\) is taken.

Now we consider the handoff cost. During the vertical handoff process, it will incur extra signaling switch cost and handoff delay cost, therefore the handoff cost function \(g(s,a)\) include two parts: signaling switch cost function \(g_{cost}(s,a)\) and handoff cost function \(g_{delay}(s,a)\). The former reflects the complexity of signaling load incurred by exchange information with other wireless networks at the network layer when vertical handoff is performed [15]. And the latter, i.e., handoff delay, can result in packet loss that will lead to degradation of service quality [16]. Here \(g_{cost}(s,a)\) and \(g_{delay}(s,a)\) are given by

$$\begin{aligned} g_{cost}(s,a)=\left\{ \begin{array}{ll} K_{m,a}, &{} a\ne m,\\ 0,&{} a=m. \end{array} \right. \end{aligned}$$
(12)

and

$$\begin{aligned} g_{delay}(s,a)=\left\{ \begin{array}{ll} 0,&{} 0<{d_a}'<d_{min},\\ \frac{ d_{max}-{d_a}'}{d_{max}-d_{min}},&{} d_{min}<{d_a}'<d_{max},\\ 1,&{} {d_a}'\ge d_{max}. \end{array} \right. \end{aligned}$$
(13)

where \(K_{m,a}\) is the normalized signaling switch cost from the current network \(m\) to network \(a\). And there is \(K_{m,a}=K_{a,m}\) evidently. \({d_a}'\) represents the handoff delay when action \(a\) is taken by the MT. \(d_{max}\) and \(d_{min}\) are the maximum and minimum thresholds of the handoff delay, respectively. It is desired for the connection of service when the handoff delay is smaller than \(d_{min}\). If the handoff delay is between \(d_{min}\) and \(d_{max}\), the connection can be considered to be acceptable. However, when the delay exceeds \(d_{max}\), the quality of the connection of the service is unacceptable. Obviously the smaller \({d_a}'\) is, the less \(g_{delay}(s,a)\) will be.

Combine the signaling switch cost and handoff delay cost together, we get the total handoff cost function \(g(s,a)\) given as

$$\begin{aligned} g(s,a)=\alpha g_{cost}(s,a)+(1-\alpha )g_{delay}(s,a) \end{aligned}$$
(14)

where \(0\le \alpha \le 1\) is the weight factor and it can be selected according to the relative importance of signaling switch cost and handoff delay cost.

In MDP model, the handoff criterion is the tradeoff between profit function \(f(s,a)\) and handoff cost \(g(s,a)\), thus we define reward function \(r(s,a)\) as:

$$\begin{aligned} r(s,a)=f(s,a)-g(s,a) \end{aligned}$$
(15)

Considering a certain service \(k\) at decision epoch \(t\) in network \(j\), the current state is \(s=[j,k,b_1,d_1,p_1,\ldots ,b_M,d_M,p_M]\), the probability function of transferring from current state \(s\) to next state \(s'=[j,k,\acute{b_1},\acute{d_1},\acute{p_1},\ldots ,\acute{b_M},\acute{d_M},\acute{p_M}]\) is given by

$$\begin{aligned} P(s'|s,a)=\left\{ \begin{array}{ll} \prod \nolimits _{m=1}^{m=M}P(\acute{b_m},\acute{d_m},\acute{p_m}|b_m,d_m,p_m), &{}j=a, \\ 0,&{} j\ne a. \end{array} \right. \end{aligned}$$
(16)

Due to the fact that the different wireless networks are using different wireless access technologies and managed by different operators in co-coverage area, the probability function of the bandwidth, delay and the relative received signal strength in each wireless network will be independent reasonably. And according to Eq. (16), the transition probability of state in HWNs can be represented by the combination of transition probabilities of each network. It can be easily concluded that the state transition probability in HWNs only depends on the current state and action instead of the previous states, so it keeps the Markovian property.

2.3 Optimal Vertical Handoff Decision Algorithm

Now let \(\upsilon (s)\) denote the maximum expected total reward with respect to initial state \(s\). That is to say

$$\begin{aligned} \upsilon (s)= max_{\pi \in \varPi } \upsilon ^\pi (s) \end{aligned}$$
(17)

And according to [11], the optimality equation is given by

$$\begin{aligned} \upsilon (s)= max_{a\in A}\{r(s,a)+\sum _{s'\in S} \lambda P(s'|s,a)\upsilon (s')\} \end{aligned}$$
(18)

There are several algorithms to solve the optimization problem in (18) such as value iteration algorithm [11], action elimination algorithm and policy iteration algorithm etc. Here we adopt the value iteration algorithm (see Algorithm 1) to find a stationary \(\varepsilon \) optimal policy.

figure e

In the value iteration algorithm, the norm function \(\parallel \cdot \parallel _\propto \) is defined as \(\parallel \cdot \Vert _\propto =max |\upsilon (s)|\) for \(s\in S\), and the \(\varepsilon \) is a desired small value for the iteration ending.

Once the MT makes a vertical handoff, it will select the candidate wireless network with the best handoff criterion \(r(s,a)\) to continue its connection. Among all the possible network parameters which are included in the criterion for handoff, bandwidth is of the most importance. However, the scarce bandwidth may be occupied by services with low priorities since first-come first-serve mechanism is adopted in HWNs. To avoid this, bandwidth should be reserved differently for services with different priorities. In next session, we will introduce a dynamic bandwidth reservation scheme for multiple services with different priorities.

3 Dynamic Bandwidth Reservation Scheme

In this section, we aim at establishing an online dynamic multi-threshold bandwidth reservation scheme such that bandwidth can be reserved for services with high priorities. By measuring the occupied bandwidth of different services over certain time slots, the possible bandwidth request at the future \(L\)th time slots can be estimated via Large Deviation Principle. Based on this, multiple thresholds can be set. And we can guarantee that the actual bandwidth requirements of the future \(L\)th time slots for multiple types of services will not exceed these thresholds in probability sense.

3.1 Estimation Model of Future Bandwidth Request

Suppose that the HWNs support \(K\) different types of services and the priorities of these services are set from 1 to \(K\) with \(K\) representing the highest priority. Assume that for each type of service, the bandwidth request is the integer multiples of basic bandwidth unit (BU).

For each service, it has two possible calls: the new call and the handoff call, and we treat them equally for simplification. If the wireless network \(i\) with total bandwidth \(C^i\) can support \(K\) types of service, \(C^i\) BUs can then be divided into \(K\) regions by thresholds from \(h_1^i\) to \(h_K^i=C^i\). Here \(h_k^i, k=1,2,\ldots ,K\) is the threshold for the \(k\)th service type between the normal channels and guard channels.

Fig. 3
figure 3

Bandwidth thresholds in heterogeneous wireless networks

These thresholds works as follows: Considering the threshold \(h_k^i\), if the occupied bandwidth units is less than \(h_k^i\), which means there are spare bandwidth units, service \(k\) can be admitted as well as all the services whose priorities are higher than \(k\). Otherwise, service \(k\) and services with lower priorities will be declined. Only services with priority higher than \(k\) may be admitted which depends on new circle of judgments. If the wireless network only supports a part of the \(K\) types of services, the number of the thresholds correspondingly decreases. Without loss of generality, we consider the \(K\) different types of services division and the basic model of the proposed scheme is shown in Fig. 3.

For mathematical description, we use the \(K\)-dimension vector \(\mathbf{H }^i(t)=[h_1^i(t), h_2^i(t),\ldots , h_K^i(t)]'\in R^K\) to denote multiple thresholds in wireless network \(i\) and the elements in this vector satisfy \(h_1^i(t)\le h_2^i(t)\le \cdots \le h_K^i(t)\), where the notation “\('\)”denotes the transpose of a matrix or vector and “\(=\)” can’t be held for all \(h_k^i(t)\). If the “\(=\)” holds in the inequalities for all \(h_k^i(t)\), it means the wireless network only supports one type of service and this case can be included in our discussion.

For \(K\) different types of services in HWNs, the occupied bandwidth in wireless network \(i\) at time epoch \(t\) can be denoted by \(K\)-dimension vector \({{\mathbf{W}}^i}(t) = {\left[ {w_1^i(t), w_2^i(t),\ldots w_K^i(t)}\right] ^\prime }\in {R^K}\). For simplification of further discussion, we define a new vector \({{\mathbf{V}}^i}(t) = {[ {v_1^i(t), v_2^i(t), \ldots , v_K^i(t)} ]^\prime }\in {R^K}\) whose \(k\)th element is defined as \(v_k^i(t) = \sum \nolimits _{e = 1}^k {w_e^i(t)}\). It is obvious that at any decision time epoch \(t\), there is \(v_1^i(t) < v_2^i(t) < \cdots < v_K^i(t)\).

For a given time epoch \(t\), let \({{\mathbf{A}}^i}(t) \in {R^K}\) denote the increasing bandwidth request of the \(K\) different types services which is caused by services arrival in time slot \([t-1,t)\), and \({{\mathbf{D}}^i}(t) \in {R^K}\) denote the bandwidth releasing of the \(K\) different types of services caused by service departure in time slot \([t-1,t)\). Thus the changing bandwidth in network \(i\) at time epoch \(t\) is characterized by

$$\begin{aligned} {{\mathbf{I}}^i}(t) = {{\mathbf{A}}^i}(t) - {{\mathbf{D}}^i}(t) \end{aligned}$$
(19)

and we have the bandwidth occupation at time epoch \(t+1\) as

$$\begin{aligned} \mathbf{W }^i(t+1)=\mathbf{W }^i(t)+\mathbf{I }^i(t) \end{aligned}$$
(20)

The bandwidth changing during the time interval \([t,t+L)\) can then be expressed as

$$\begin{aligned} \mathbf{I }^{i,t + L}(t) = \sum _{l = 1}^{l=L} \mathbf{I }^i(t + l) \end{aligned}$$
(21)

Considering the wireless network \(i\), if the bandwidth is reserved appropriately, the current reserved bandwidth resource at decision epoch \(t\) will continue guaranteeing all the bandwidth request over the future \(L\) time slots, which means:

$$\begin{aligned} {{\mathbf{V}}^i}(t){\mathbf{+ }}{{\mathbf{I}}^{i,t + L}}(t) \prec {{\mathbf{H}}^i}(t) \end{aligned}$$
(22)

In above equation, operator “\(\prec \)” denotes that for all the elements in vectors \(\mathbf{V }^i(t)\), \(\mathbf{I }^{i,t+L}\), \(\mathbf{H }^i(t)\), there is \(v_k^i(t) + i_k^{i,t + L}(t) < h_k^i(t)\) for \(k=1,2,\ldots ,K\).

For simplicity, we define a mapping function \(\varPsi :\mathbf{R }^K\times \mathbf{R }^K\longrightarrow \mathbf{R }\) which is given as:

$$\begin{aligned} \varPsi ({\mathbf{V }^i(t)}+{\mathbf{I}^{i,t+L}(t)},{\mathbf{H }^i(t)})& ={ \prod _{k=1}^K} \left\{ {|v_k^i(t)+I_k^{i,t+L}(t)-h_k^i(t)|} - {\left[{v_k^i(t)+I_k^{i,t+L}(t)-h_k^i(t)}\right]}\right\}\end{aligned}$$
(23)

Obviously, if inequality (22) establishes, there is \(\varPsi (\mathbf{V }^i(t)+\mathbf{I }^{i,t+L}(t),\mathbf{H }^i(t))>0\) and this function will be applied to judge whether enough bandwidth resource has been reserved for the future \(L\) time slots.

According to function (23), there are two possible cases that will exist:

  1. 1)

    \(\varPsi (\mathbf{V }^i(t)+\mathbf{I }^{i,t+L}(t),\mathbf{H }^i(t))=0\): In this case, there exists a certain service \(k\) whose bandwidth request exceeds the pre-reserved bandwidth threshold vector during time interval \([t,t+L)\). The pre-reserved thresholds can not meet the bandwidth request and therefore must be updated. Assuming that the arrival rate and departure rate are constant during \([t,t+L)\), if we keep the current threshold \(\mathbf{H }^i(t)\) unchanged, there may be a high probability that the blocking/dropping will happen. To avoid this, we suggest the new thresholds set as:

    $$\begin{aligned} \mathbf{H }^i(t)\leftarrow \mathbf{V }^i(t)+\mathbf{I }^{(i,t+L)} \end{aligned}$$
    (24)
  2. 2)

    \(\varPsi (\mathbf{V }^i(t)+\mathbf{I }^{i,t+L}(t),\mathbf{H }^i(t))>0\): In this case, it means the threshold vector inequality (22) holds. And the bandwidth request of the \(K\) different types of service during the time interval \([t,t+L)\) will be less than the corresponding reserved bandwidth described by \(\mathbf{H }^i(t)\). However, this does not necessarily imply that enough bandwidth has been reserved for each time epoch during the future \(L\) time slots, i.e., for each time epoch \(t+l, l=1,\ldots ,L\), \(\varPsi (\mathbf{V }^i(t)+\mathbf{I }^{i,t+l}(t),\mathbf{H }^i(t))>0\) does not necessarily establish.

To make sure that inequality (22) stands for each time epoch during the time interval \([t,t+L)\), further restriction should be added. We define the average achievable bandwidth increasing during the future \(L\) time slots as follows:

$$\begin{aligned} \mathbf{a }^i(t,L) = \frac{\mathbf{H }^i(t)-\mathbf{V }^i(t)}{L} \end{aligned}$$
(25)

and the (22) can be rewritten as

$$\begin{aligned} \frac{{{{\mathbf{I}}^{i,t + L}}(t)}}{L} \prec \frac{{{{\mathbf{H}}^i}(t) - {{\mathbf{V}}^i}(t)}}{L} \end{aligned}$$
(26)

Combined with Eq. (25), there is

$$\begin{aligned} \frac{{\sum _{l = 1}^L {{{\mathbf{I}}^i}} (t + l)}}{L} \prec {\mathbf{a }^i}(t,L) \end{aligned}$$
(27)

Hence, there is the following inequality establishing

$$\begin{aligned} \frac{{\sum _{l = 1}^{L} {\left\langle {{{\mathbf{I}}^i}(t + l),{\mathbf{1}}} \right\rangle } }}{L} < \left\langle {{{\mathbf{a}}^i}(t,L),{\mathbf{1}}} \right\rangle \end{aligned}$$
(28)

where “\(\langle \cdot ,\cdot \rangle \)” is inner product of vectors and \(\mathbf{1 }=[1,1,\ldots ,1]'\in R^K\).

Next, let \(\langle {{{\mathbf{A}}^i}(t),{\mathbf{1}}} \rangle \in \{ 0,1,\ldots ,{s_A}\} \) denote the number of total bandwidth resource request of the \(K\) services arrival in time interval \([t-1,t)\) of network \(i\) and \(s_A\) is the maximum total number of bandwidth request of the \(K\) services in time interval \([t-1,t)\). Let \(\langle {\mathbf{D }^i}(t),\mathbf{1 } \rangle \in \{ 0,1,\ldots ,{s_D}\} \) denote the number of total bandwidth releasing of \(K\) types of services depart in time interval \([t-1,t)\) and \(s_D\) is the maximum number of bandwidth releasing in time interval \([t-1,t)\). Thus the bandwidth changing in time interval \([t-1,t)\) can be written as \(\langle \mathbf{I }^i(t),\mathbf{1 } \rangle \in \{ -{s_D},\ldots , -1,0,1,\ldots ,{s_A}\}\). And the bandwidth resource overflow probability at time epoch \((t+L )\)th can be characterized as

$$\begin{aligned} P^{t+L} = P \left(\frac{{\sum \nolimits _{l = 1}^L {\left\langle {{{\mathbf{I}}^i}(t + l),{\mathbf{1}}} \right\rangle } }}{L} > \left\langle {{{\mathbf{a}}^i}(t,L),{\mathbf{1}}} \right\rangle \right) \end{aligned}$$
(29)

Here the calculation of overflow probability \(P^{t + L}\) is given as follows:

It’s obvious that arrival services and department services are independent, and thus \(\langle \mathbf{A }^i(t),\mathbf{1 } \rangle \) and \(\langle \mathbf{D }^i(t),\mathbf{1 } \rangle \) \((t=1,2,\ldots )\) are independent and identically distributed process. Therefore \(\langle \mathbf{I }^i(t),\mathbf{1 }\rangle (t = 1,2,\ldots )\) is independent and identically distributed random variable, having a finite moment-generating function \(M(\theta ) = E{e^{\theta \langle {{{\mathbf{I}}^i}(t){\mathbf{,1}}} \rangle }}\) which is finite in a neighborhood of 0. If

$$\begin{aligned} \frac{{\sum \nolimits _{l = 1}^{L} {\left\langle {{{\mathbf{I}}^i}(t + l),{\mathbf{1}}} \right\rangle } }}{L} > \left\langle {{{\mathbf{a}}^i}(t,L),{\mathbf{1}}} \right\rangle \end{aligned}$$

according to Cram\(\acute{e}\)s Theorem in the context of the large deviation principle [18], we conclude

$$\begin{aligned} \mathop {Lim}\limits _{L \rightarrow \infty } \frac{1}{L}\log P\left(\frac{{\sum \nolimits _{l = 1}^L {\left\langle {{{\mathbf{I}}^i}(t + l),{\mathbf{1}}} \right\rangle } }}{L} > \left\langle {{{\mathbf{a}}^i}(t,L),{\mathbf{1}}} \right\rangle\right ) =-\Lambda (\langle \mathbf{a }^i(t,L),\mathbf{1 } \rangle ) \end{aligned}$$
(30)

where

$$\begin{aligned} \Lambda (\langle \mathbf{a }^i(t,L),\mathbf{1 } \rangle )=\mathop {\sup }\limits _{\theta > 0} \{ {\langle \mathbf{a }^i(t,L),\mathbf{1 }}\rangle \theta -\log M(\theta )\} \end{aligned}$$
(31)

and

$$\begin{aligned} \log M(\theta )=\log \left\{{\sum _{x=-{s_A}}^{s_D} {{\pi _x}\exp [x\theta ]} } \right\} \end{aligned}$$
(32)

Here \(\pi _x\) is the distribution of stochastic process \(\langle \mathbf{I }^i(t),\mathbf{1 }\rangle (t = 1,2,\ldots ,L)\), i.e., the probability of bandwidth changing.

Note that \(M(\theta )\) is a convex function, and the rate function of (31) is also convex. For sufficiently large values of \(N\), according to (30), the overflow probability is given by

$$\begin{aligned} P^{t + L} \approx \exp [-L\Lambda ( \langle {\mathbf{a }^i}(t,L),\mathbf{1 }\rangle ) \end{aligned}$$
(33)

For the appropriate reservation of bandwidth, a desired value \(P_{QoS}\) is introduced and the calculated bandwidth resource overflow probability \(P^{t+L}\) should not exceed \(P_{QoS}\). If after calculation, \(P^{t+L}\) will exceed \(P_{QoS}\) which means no enough bandwidth has been reserved, we should adjust the threshold vector \(\mathbf{H }^i(t)\) to lower the overflow probability \(P^{t+L}\):

$$\begin{aligned} \mathbf{H }^i(t+1)\leftarrow \mathbf{H }^i(t)+\Delta \end{aligned}$$
(34)

where \( \Delta = [\delta _1^i, \delta _2^i,\ldots , \delta _{K - 1}^i,\delta _K^i]' \in {R^K}\) and \(\delta _k^i, k=1,2,\ldots ,K\) is the basic bandwidth request for the \(k\)th service. If not, \(\mathbf{H }^i(t)\) remains unchanged.

Remark 1

When we calculate the bandwidth resource overflow probability \(P^{t+L}\), the information about the arrival service \(\mathbf{A }^i(t+L),l=1,2,\ldots ,L\) and departure service \(\mathbf{D }^i(t+L), l=1,2,\ldots ,L\) is needed, i.e., \(\langle \mathbf{I }^i(t),\mathbf{1 }\rangle , t = 1,2,\ldots ,L\) is required for calculation of \(P^{t+L}\). However, at time epoch \(t\), we can only get the prior knowledge of \(\mathbf{I }^i(t-l),l=1,2,\ldots ,L\) and information of current epoch \(\mathbf{I }^i(t)\). To deal with this, we propose a method based on sliding window which uses the knowing \(\mathbf{I }^i(t-l),l=1,2,\ldots ,L\) to substitute the unknown \(\langle \mathbf{I }^i(t),\mathbf{1 }\rangle , t = 1,2,\ldots ,L\) such that \(P^{t+L}\) can be calculated.

However, the probability distribution \(\pi _x\) is still unknown, and the following subsection will show us how to estimate \(\pi _x\). Based on this, detailed scheme for bandwidth reservation is proposed via Large Deviation Principle.

3.2 Bandwidth Reservation Scheme Via Large Deviation Principle

According to Eqs. (31) and (32), for the calculation of \(P^{t+L}\), we should know \(\langle \mathbf{a }^i(t,L),\mathbf{1 }\rangle \) and \(\pi _x\). Although we have suggested using \(\mathbf{I }^i(t-l)\) to substitute the unknown \(\langle \mathbf{I }^i(t),\mathbf{1 }\rangle \), there is no prior knowledge about \(\pi _x\), i.e., the probability distribution of \(\langle \mathbf{I }^i(t),\mathbf{1 }\rangle \). To solve this, a method based on sliding window can be applied to estimate \(\pi _x\).

At decision time epoch \(t\), We can get the sequence of bandwidth changing given by \([\langle \mathbf{I }^i(t-L),\mathbf{1 } \rangle , \langle \mathbf{I }^i(t-L+1),\mathbf{1 }\rangle ,\ldots , \langle \mathbf{I }^i(t-1),\mathbf{1 }\rangle ]\) which covers a window of \(L\) recent information. Define the observation vector at time epoch \(t\) as \(Win(t)=[\langle \mathbf{I }^i(t-L),\mathbf{1 } \rangle , \langle \mathbf{I }^i(t-L+1),\mathbf{1 }\rangle ,\ldots ,\langle \mathbf{I }^i(t-1),\mathbf{1 }\rangle ]\), and the sliding window moves forward as time \(t\) goes on. For time epoch \(t+1\), the sliding window is \(Win(t+1)=[\langle \mathbf{I }^i(t-L+1),\mathbf{1 } \rangle , \langle \mathbf{I }^i(t-L+2),\mathbf{1 }\rangle , \ldots , \langle \mathbf{I }^i(t),\mathbf{1 }\rangle ]\), now for each time epoch t, the parameter \(\pi _x(t)\) can be estimated by following ways:

Considering the slide window containing \(L\) time slots, if at certain slot \(l\), the bandwidth changing is \(x\), i.e.,\(\langle \mathbf{I }^i(l),\mathbf{1 }\rangle =x\), define an indicator function as,

$$\begin{aligned} \varPhi _x^i(l)=\left\{ \begin{array}{ll} 1, &{} \quad if \langle \mathbf{I }^i(l),\mathbf{1 } \rangle =x,\\ 0, &{} \quad otherwise. \end{array} \right. \end{aligned}$$
(35)

Let \(L_x\) be a counting process which denotes the number of time slots when \(\langle \mathbf{I }^i(l),\mathbf{1 }\rangle =x\) happens during a sliding window, that is to say

$$\begin{aligned} L_x=\varSigma _{l=t-L+1}^{t}\varPhi _x^i(l) \end{aligned}$$
(36)

And the frequency of \(\langle \mathbf{I }^i(l),\mathbf{1 }\rangle =x \) over time interval \([t-L+1,t)\) can be given by

$$\begin{aligned} \tilde{q}_x(t)=\frac{L_x}{L} \end{aligned}$$
(37)

If we use the frequency \(\tilde{q}_x(t)\) as the probability function \(\pi _x\) for the calculation of \(P^{t+L}\), this may result in estimation error and \(\pi _x\) may change un-smoothly. Thus we introduce a forgetting factor \(\rho \) \((0\le \rho \le 1)\) to smoothen \(\pi _x\) as follows:

$$\begin{aligned} \tilde{\pi }_x(t) = \rho {\tilde{q}_x}(t) + (1 - \rho ){\tilde{\pi }_x}(t - 1) \end{aligned}$$
(38)

Remark 2

According to the theory of probability, when the scale of the sliding window \(L\) is large enough, the frequency converges to probability. However, if \(L\) is taken too large, it will be difficult for information storage and calculation. Hence, we should set \(L\) as an appropriate value. Also the forgetting factor \(\rho \) can balance the influence of current data and past data. In [19], the range of \(\rho \) is from 0.7 to 0.9.

Finally, the online dynamic bandwidth reservation scheme based on Large Deviation Principle is summarized in Algorithm 2.

figure f

In Algorithm 2, the history information about bandwidth changing can be obtained and recorded as time goes. Based on these information, the possible bandwidth request in future time slots can be estimated and multiple thresholds for bandwidth reservation can then be set. This scheme ensures that \(P^{t+L}\), the overflow probability in future time slot, can be less than a desired small value \(P_{QoS}\). With this scheme, bandwidth resource can be reserved appropriately which takes into account different priorities of multiple services.

4 Simulation Results and Discussion

   In this section, we evaluate the performance of our proposed vertical handoff decision algorithm for multiple types of services with different priorities in HWNs and illustrate the performance by numerical examples. Our simulation environment, which is implemented in MATLAB, consists of a heterogeneous wireless network comprising three WLANs and several adjacent cellular networks. The cellular system covers all the object area seamlessly, while the WLAN is settled only in the “hot” area selectly. The experimental overlay structure is shown in Fig. 4.

Fig. 4
figure 4

The voice and data service in WLAN

We denote network 1 as WLAN and network 2 as cellular network, respectively. In our simulation environment, the coverage radius of the base station in cellular system and the access point in WLAN are 1000 and 100 m, respectively. For computation simplification, without loss of generality, we only focus on two types of services, namely, voice service and data service. The voice service is subjected to real time type of service and has fixed bandwidth requirement, while the data service is subjected to none-real time type of service and requires a range of variable bandwidth. It’s obvious the priority of voice service is prior to data service. Actually, the data connection would be served only when there’s bandwidth left. For simplicity, we let the voice service contain two priorities, namely \(\hbox {voice}_{1}\) and \(\hbox {voice}_{2}\), and the data service contain only one priority. We assume the requested bandwidth for a data connection is random, and takes value from 0 to 2 bandwidth units. The user connections in hot area or other area are randomly generated with a poisson process and the life time of a single connection subjects to exponential distribution. The arrival rate in hot area is three times than that in other area. Due to the mobility of users, there exists a certain probability that the MTs move out of the coverage area of current network. Thus in this situation, a vertical handoff may be carried out. The values of parameters used in simulation are summarized in Table 1.

Table 1 The simulation parameters In HWN
Fig. 5
figure 5

The voice service in WLAN

Fig. 6
figure 6

The voice service in cellular network

For the network service providers, they pay most attention to reducing the new call blocking probability to obtain more profits and handoff call dropping probability to improve the users’ service experience. We compare our work with the approaches revealed in recent years [11, 2024]. Here Figs. 5 and 6 illustrate the blocking dropping probability of voice traffic in WLAN and cellular network, respectively. According to Fig. 5, in WLAN, the dropping probability of \(\hbox {voice}_1\) with the handoff call type is lower than that of \(\hbox {voice}_2\), and so does the new call. This is due to that the priority setting of \(\hbox {voice}_1\) is prior to that of \(\hbox {voice}_2\). The similar situation can also be found in Fig. 6. These two figures show that the proposed dynamic bandwidth reservation scheme can fulfil the bandwidth request of services of different priorities. By comparing the two figures, it’s easy to find that the blocking/dropping probability of different services in cellular system is larger than that in WLAN, this is because the MTs in the collocated coverage area can access to both WLAN and cellular network, while the MTs in other area have only one choice, i.e., cellular network.

Fig. 7
figure 7

Structure of the MDP optimal policy on the available bandwidth when \(s=[1,1,b_1^1,2,0.5,b_2^2,2,0.7]\)

Fig. 8
figure 8

Structure of the MDP optimal policy on the available bandwidth when \(s=[1,1,b_1^1,2,0.5,b_2^2,2,0.7]\)

The optimal policy \(\pi ^*\) of MDP is computed by implementing value iteration algorithm and the state space \(S\) is \(8-D\). To plot the MDP policy, \(d_1\), \(d_2\), \(p_1\), \(p_2\) are all set to constants. Figure 7 shows the structures of the optimal policy of \(\hbox {voice}_1\) for the current state \(s=[1,1,b_1^1,2,0.5,b_2^2,2,0.7]\) where the discount factor \(\lambda \) is 0.9, i.e., the average connection time is 100s. Given a fixed value of \(b_1^1\) (e.g., \(b_1^1=4\) ), the optimal policy chooses network 1 if \(b_2^1\le 2\), and selects network 2 if \(b_2^2>2\). Figure 8 shows the structures of the optimal policy \(\pi ^*\) of \(\hbox {voice}_2\) for the current state \(s=[1,2,b_1^2,0.5,\,b_2^2,2,0.7]\).

In order to evaluate our proposed vertical handoff decision based on MDP algorithm conducted in HWNs, we compare results with BAN algorithm [11], SAW algorithm [20] and VHDA [21], which are typical decision algorithms discussed in the literature recently. The SAW algorithm selects the network based on the available bandwidth, the relative received signal strength and the average delay, the BAN algorithm always selects the wireless network which can offer the maximize available bandwidth, and the VHDA considers related parameters and applies rule-based system for vertical handoff decision.

Fig. 9
figure 9

The expected number of handoff call of \(\hbox {voice}_{1}\)

Fig. 10
figure 10

The expected number of handoff call of \(\hbox {voice}_{2}\)

Fig. 11
figure 11

The expected total reward of voice service with priority 1 and priority 2 in HWN

Fig. 12
figure 12

The expected total reward of data service in HWNs

Figures 9, 10, 11 and 12 show the expected number of handoff and the expected total reward of \(\hbox {voice}_1\) and \(\hbox {voice}_2\) under different discount factor \(\lambda \), respectively. When \(\lambda \) varies from 0.9 to 0.98, the corresponding expected connection time varies from 100s to 500s. It obviously shows that the proposed vertical handoff algorithm, compared with other methods, can achieve the most expected total reward as well as least numbers of handoff. It can be seen from the performance figures that the expected handoff number can be decreased by about 15–22 %.

5 Conclusion

In this paper, an optimal vertical handoff decision algorithm is proposed for multiple services with different priorities in HWNs. Considering the different priorities of services, we propose an online dynamic bandwidth reservation scheme based on Large Deviation Principle. This scheme is by using the history data of sliding window to estimate the bandwidth request in the future, and to keep the bandwidth overflow probability less than a desired value. Besides an optimal vertical handoff decision problem is modeled as an MDP, where the criterion of the reward function takes into account both QoS profit and handoff cost. This criterion consists of multiple network parameters and provides a more comprehensive and unified evaluation of the vertical handoff. Simulation results show that the proposed vertical handoff algorithm based on bandwidth reservation scheme not only decreases the expected handoff number by about 15–22 % than the existing algorithm e.g., SAW, BAN and VHDA, but also ensures the high utility of HWNs.