1 Introduction

Parallel-server queueing systems with customer abandonment serve as a building block for modeling complex service systems such as customer call centers. Gans et al. [15] argued that best practice companies should operate a service system in a quality- and efficiency-driven (QED) regime, also known as the Halfin–Whitt regime. In this regime, the service system provides high-quality service and at the same time achieves high server utilization. This is possible only when there are a large number of servers working in parallel on a common pool of customers.

This paper focuses on a subset of \(GI/GI/n+M\) queueing systems. In such a queueing system, there are \(n\) identical servers serving customers whose interarrival times are i.i.d. following a general distribution (the first \(GI\)) and whose service times are i.i.d. following a general distribution (the second \(GI\)). In addition, customers’ patience times are i.i.d. following an exponential distribution (\(+M\)): when a customer’s waiting time in queue exceeds his patience time, the customer abandons the system without service.

Despite decades of research, there is still no general analytical or numerical tool to efficiently and accurately predict the steady-state performance of a \(GI/GI/n+M\) system. Computer simulation is often the only remaining tool available, but it can be slow when the number of servers is large, particularly when the system is in the QED regime.

Recently, Dai and He [8] proposed diffusion models to approximate a \(GI/Ph/n+M\) system when the service time distribution is of phase type. Their approximations are rooted in many-server heavy-traffic limit theorems proved in [7]. The numerical examples in [8] demonstrate that the steady-state performance of the diffusion model provides a remarkably accurate estimate for the steady-state performance of the corresponding queueing system, even when the number of servers is moderate. In this paper, we provide a justification for the diffusion approximation procedure in [8].

We now describe our results and contributions more precisely. When the number of servers \(n\) is fixed, a \(GI/Ph/n+M\) system can be represented by a certain continuous-time Markov process \(\mathbb {X}^n=\{\mathbb {X}^n(t): t\ge 0\}\). Often \(\mathbb {X}^n(t)\) converges in distribution to \(\mathbb {X}^n(\infty )\) as time \(t\rightarrow \infty \), where the random variable \(\mathbb {X}^n(\infty )\) has the stationary distribution of \(\mathbb {X}^n\). On the other hand, Dai et al. [7] shows that \(\tilde{\mathbb {X}}^n\), as a sequence of stochastic processes that are centered and scaled versions of \(\mathbb {X}^n\), converges in distribution to some diffusion process \(\tilde{\mathbb {X}}\) as \(n\rightarrow \infty \) under a heavy-traffic condition. The limit process \(\tilde{\mathbb {X}}\) is a piecewise Ornstein–Uhlenbeck (OU) process. A similar result was proved earlier in [23] for \(GI/Ph/n\) systems without customer abandonment. The convergence proved in [7] implies that each finite \(t\ge 0,\,\tilde{\mathbb {X}}^n(t)\) converges in distribution to \(\tilde{\mathbb {X}}(t)\), but, as in almost all diffusion limits, does not cover the case when \(t=\infty \). This paper proves that \(\tilde{\mathbb {X}}^n(\infty )\) converges in distribution to \(\tilde{\mathbb {X}}(\infty )\) as \(n\rightarrow \infty \); see Theorem 1 and Corollary 1 in Sect. 3 below. Often, one can also prove that \(\tilde{\mathbb {X}}(t)\) converges in distribution to \(\tilde{\mathbb {X}}(\infty )\) as \(t\rightarrow \infty \). Formally, our results can be stated as

$$\begin{aligned} \lim _{n\rightarrow \infty } \lim _{t\rightarrow \infty } \tilde{\mathbb {X}}^n(t) \mathop {=}\limits ^{d} \lim _{t\rightarrow \infty } \lim _{n\rightarrow \infty } \tilde{\mathbb {X}}^n(t), \end{aligned}$$

which is known as the interchange limit theorem.

There has been a surge of interest in establishing interchange limit theorems in the last ten years in both the conventional heavy-traffic setting and many-server heavy-traffic setting. To prove an interchange limit theorem, when the stationary distribution of the limit process \(\tilde{\mathbb {X}}\) is unique, it is sufficient to prove that the sequence of random variables \(\{\tilde{\mathbb {X}}^n(\infty ): n\ge 1\}\) is tight. Gamarnik and Zeevi [11] pioneered an approach in proving tightness in the context of generalized Jackson networks in conventional heavy traffic when all distributions are assumed to have finite exponential moments. Key to their proof is the construction of a geometric Lyapunov function. Inspired by this work, Budhiraja and Lee [4] devised an alternative method to prove tightness along with some other results also in the context of generalized Jackson networks, but with the minimal two-moment assumption on all distributions. Budhiraja and Lee [4] did not construct a geometric Lyapunov function, but they cleverly utilized and sharpened a fluid limit approach introduced in [6], and their approach is potentially more general and obtains sharper results.

Our current paper follows the approach in [11] by constructing a geometric Lyapunov function; see Lemma 4 in Sect. 4 below. We heavily rely on a delicate analysis of the behavior of our Lyapunov functions applied to a fluid model for \(GI/Ph/n+M\) queues. In particular, when applied to the fluid model, we show that our Lyapunov function decreases at a rate that is proportional to the size of the fluid state when it is far away from origin; see part (b) of Lemma 7. Because of the customer abandonment in our model, when the waiting fluid is high, the decreasing rate in our Lyapunov function should be expected due to abandonment. However, when the waiting fluid is not high, but a large fluid state is due to the huge imbalance of servers among different service phases, the decreasing rate is by no means obvious. Our proof relies critically on a common quadratic function that was used in [10]. Using the common quadratic Lyapunov function, the authors were able to devise a Lyapunov function and prove the existence of the stationary distribution of the limit process \(\tilde{\mathbb {X}}\). The diffusion process \(\tilde{\mathbb {X}}\) has two different drifts in two separate regions of state space, and the Lyapunov function works well in these two regions. It remains an open problem whether the approach in [4] can be adapted to our setting. One major step in their approach is to obtain estimates on moments of the state process that are uniform in both time and the scaling parameter. In [4], the authors relied on the uniform Lipschitz property (with respect to time) of the Skorohod map to obtain such estimates. However, in our many-server queue setting, the map introduced in Lemma 2 does not have such property: it is Lipschitz continuous but the Lipschitz constant depends on time; see part (c) of Lemma 2.

In the many-server setting without customer abandonment, [17] is the first paper to establish a many-server diffusion limit for the \(GI/M/n\) model. In the same paper, they proved the tightness result. Gamarnik and Mom c̆ ilović [12] proves the tightness result where service time distribution is lattice-valued with a finite support. Gamarnik and Goldberg [14] has generalized this result to \(GI/GI/n\) queues. In a single class, multiple server pool model, Tezcan [26] proved asymptotic optimality of some routing policies and some interchange limit results.

In the many-server setting with customer abandonment, Gamarnik and Stolyar [13] proved a tightness result. In their model, customers have many classes. Each class has its own homogeneous Poisson arrival process, exponential service time distribution, and exponential patience time distribution with class-dependent rate. The service policy in choosing which class of customers to serve next can be arbitrary as long as it is non-idling. When the service policy is first-in-first-out across customer classes and the patience rate is independent of customer classes, their model reduces to a special case of our model considered in this paper. Their proof critically relies on the assumption that service time distributions are exponential and it remains an open problem whether their tightness result holds for non-exponential service time distributions. Empirical study in [2] suggests that the service time distributions are approximately lognormal, not exponential.

In the conventional heavy-traffic setting, Gurvich [16] systematically generalized the results in [11] to multiclass queueing networks. Katsuda [18] proves some interchange limit results for a multiclass single-server queue with feedback under various disciplines. Ye and Yao [27] studied interchange limit results in a head-of-line bandwidth sharing model with two customer classes and two servers.

The rest of the paper is organized as follows. Sect. 2 presents the background on \(GI/Ph/n+M \) queues and diffusion approximations. Assuming positive Harris recurrence, Sect. 3 summarizes our main results, Theorem 1 and Corollary 1. Section 4 introduces our Lyapunov function and a fluid model used to prove Lemma 4. Section 5 discusses a key lemma, Lemma 4, for proving Theorem 1. In Appendix 1, we prove the positive Harris recurrence of \(GI/Ph/n+M\) queues when \(n\) is fixed. Appendix 2 is devoted to the proof of a negative drift condition for the fluid model. Appendix 3 uses this negative drift condition for the fluid model to establish a negative drift condition for the diffusion-scaled processes.

1.1 Notation

All random variables and stochastic processes are defined on a common probability space \((\Omega ,\mathcal {F},\mathbb {P})\) unless otherwise specified. For a positive integer \(d,\,{\mathbb {R}}^{d}\) denotes the \(d\)-dimensional Euclidean space. Given a subset \(S\) of some Euclidean space, the space of right-continuous functions \(f :\mathbb {R}_+ \rightarrow S\) with left limits in \((0,\infty )\) is denoted by \(\mathbb {D}(\mathbb {R}_+,S)\) or simply \(\mathbb {D}(S)\). For a sequence of random elements \(\{X_n: n =1, 2, \ldots \}\) taking values in a metric space, we write \(X_n \Rightarrow X\) to denote the convergence of \(X_n\) to \(X\) in distribution. Each stochastic process with sample paths in \(\mathbb {D}(S)\) is considered to be a \(\mathbb {D}(S)\)-valued random element. The space \(\mathbb {D}(S)\) is assumed to be endowed with the Skorohod \(J_1\)-topology. Given \(x\in {\mathbb {R}}\), we set \(x^{+}=\max \{x,0\}\) and \(x^{-}=\max \{-x,0\}\). All vectors are envisioned as column vectors. For a \(K\)-dimensional vector \(u\), we write \(\left| u\right| \) for its Euclidean norm. Given \( y \in \mathbb {D}(S)\) and \(t>0\), we set \(\Vert y\Vert _{t} = \sup _{ 0 \le s \le t} |y(s)|\), where \(|\cdot |\) denotes the Euclidean norm in \(S\). Given a \(K \times K\) matrix \(M\), we use \(M^{\prime }\) to denote its transpose, and similarly for vector transposition. We write \(M_{ij}\) for its \((i,j)\)th entry. Let the matrix norm of \(M\) be \(|M|=\sum _{ij} |M_{i,j}|,\) where \(|M_{ij}|\) is the absolute value of \(M_{ij}\). We reserve \(I\) for the \(K\times K\) identity matrix and \(e\) for the \(K\)-dimensional vector of ones.

2 \(GI/Ph/n+M\) queues and diffusion approximations

In Sect. 2.1, we describe the \(GI/Ph/n+M\) queueing model and probabilistic assumptions and in Sect. 2.2 we state a diffusion limit under a many-server heavy-traffic condition.

2.1 Model description and probabilistic assumptions

In a \(GI/Ph/n+M\) system, there are \(n\) identical servers. We allow some of the system parameters to change with \(n\) as the number of servers increases. For the rest of this section, we keep \(n\) fixed. Customers arrive according to a delayed renewal process with interarrival times given by \(\{\xi ^n (i): i=0, 1, 2, \ldots \}\). We assume that \(\{\xi ^n(i): i=1,2, \ldots \}\) is a sequence of independent, identically distributed (i.i.d.) random variables with a general distribution and this sequence is independent of \(\xi ^n(0)\). Here, \(\xi ^n(0)\) is the time that the first customer after time \(0\) is to arrive at the system. Upon arrival, a customer enters service immediately if an idle server is available. Otherwise, he waits in a buffer with infinite waiting room that holds a first-in-first-out (FIFO) queue. When a server completes a service, the server picks the next customer from the FIFO queue if there is one waiting. Otherwise, the server becomes idle.

The service times are i.i.d. random variables, following a phase-type distribution which corresponds to the absorption time of a certain transient continuous-time Markov chain (the \(Ph\) in \(GI/Ph/n+M\) notation). Specifically, let \(p\) be a \(K\)-dimensional nonnegative vector with entries summing up to one, \(\nu \) be a \(K\)-dimensional positive vector, and \(P\) be a \(K \times K\) sub-stochastic matrix. We assume that the diagonal entries of \(P\) are zero and \(I-P\) is invertible. Consider a continuous-time Markov chain with \(K +1\) phases (or states) where phases \(1, 2, \ldots , K\) are transient and phase \(K +1\) is absorbing. The Markov chain has the initial distribution \(p\) on \(\{1, 2, \ldots , K\}\). Each time it enters phase \(k\), the amount of time it stays in phase \(k\) is exponentially distributed with mean \({1}/{\nu _k}\). Each time it leaves phase \(k\), the Markov chain enters phase \(j\) with probability \(P_{kj}\) or enters phase \(K+1\) with probability \(1-\sum _{j \le K}{P_{kj}}\). Once the Markov chain enters state \(K+1\), it stays there forever. A phase-type random variable with parameters \((p, \nu , P)\) is defined as the first time until the above continuous-time Markov chain reaches state \(K +1\).

As discussed in the introduction, each customer waiting in the queue has a patience time. The patience times are i.i.d. exponentially distributed with rate \(\alpha >0\). When a customer’s waiting time in queue exceeds his patience time, the customer leaves the system without service. Once a customer enters service, he does not abandon. The sequence of interarrival times, service times, and abandon times are assumed to be independent.

To describe the “state” of the system as time evolves, we let \(Z_{k}^{n}(t)\) denote the number of customers in phase \(k\) service in the \(n\)th system at time \(t\); service times in phase \(k\) are exponentially distributed with rate \(\nu _{k}\). We use \(Z^{n}(t)\) to denote the corresponding \(K\)-dimensional vector. We call \(Z^{n}=\{Z^{n}(t):t\ge 0\}\) the server-allocation process. Let \(N^{n}(t)\) denote the number of customers in the \(n\)th system at time \(t\), either in queue or in service. Let \(A^n(t)\) be the “age” of the interarrival time at time \(t\), i.e., the time between the arrival time of the most recent arrival before time \(t\) and time \(t\). Then, \((A^n,N^n, Z^n)\) is called the state processes for the \(n\)th system. It follows that for each \(n\) fixed the state process \((A^n,N^n, Z^n)\) has state space \(\mathcal{S} =\mathbb {R}_+\times \mathbb {Z}_+ \times \mathbb {Z}_+^K\). As time \(t\) goes on, \(A^n(t)\) increases linearly while \((N^n(t) , Z^n(t)) \) remains constant. When \(A^n(t)\) reaches the interarrival for the next arrival, it instantaneous jumps to zero. We adopt the convention that all processes are right continuous on \([0, \infty )\), having left limits in \((0, \infty )\). It follows that \((A^n,N^n, Z^n)\) is a piecewise deterministic Markov process that conforms to Assumption 3.1 of [9], and hence it is a strong Markov process [9, p. 362]. Throughout the paper, we make the following assumptions.

Assumption 1

The interarrival times \(\{\xi ^n (i): i=1, 2, \ldots \}\) from the second customer onwards have the following representation: \(\xi ^n(i)= \frac{1}{\lambda ^n} u(i),\,i\ge 1\), where \(\{u(i):i\ge 1\}\) is an i.i.d. sequence with \(\mathbb {E}(u(i))=1\) and \(\mathbb {E}(u(i))^2<\infty \).

Assumption 2

For each \(n\), the Markov process \((A^n, N^n, Z^n)\) is positive Harris recurrent and has stationary distribution \(\pi ^n\).

Assumption 1 appears to be restrictive. A consequence of the assumption is that the squared coefficient of variation (variance divided by squared mean) of interarrival times does not depend on the scaling parameter \(n\). All results in this paper can be shown to continue to hold if we adopt a triangular array of random variables to model the family of interarrival time sequences and impose some uniform integrability condition on the second moment; see, for example, (3.4) of [1]. As discussed in the introduction, the major motivation of this paper is to justify the diffusion model procedure in [8], which applies to any queueing system with a fixed arrival process that has no scaling parameter. We adopt the current assumption for notational simplicity without affecting the relevance of our results.

For the definition of positive Harris recurrence of a Markov process in Assumption 2, see, for example, [5]. Since our model allows customer abandonment, for each fixed \(n\), the queueing system should be stable in some sense. However, to prove that the Markov process is positive Harris recurrent, some conditions on the interarrival time distribution are needed. To keep our paper focused on the interchange of limits, we do not pursue the best sufficient conditions for positive Harris recurrence. Proposition 2 in Appendix 1 provides a sufficient condition on the interarrival distribution for Assumption 2 to hold.

2.2 Halfin–Whitt regime and diffusion limits

We describe a parameter regime, known as the Halfin–Whitt regime, that was first introduced in [17]. When the parameters are in this regime, certain diffusion processes serve as good approximations for \(GI/Ph/n+M\) systems. Most of the materials in this section can be found in [7], and readers are referred there for more details.

We consider a sequence of \(GI/Ph/n+M\) systems indexed by \(n\). The arrival rate \(\lambda ^n\) depends on \(n\). We assume that the service time and the patience time distributions do not depend on \(n\). Let \(1/\mu \) denote the mean service time. Since service time distribution is assumed to be phase type with parameters \((p, \nu , P)\) and the \(i\)th component of \((I-P')^{-1}p\) is the expected number of visits to phase \(i\) before the Markov chain is absorbed into state \(K+1\), the mean service time has the following formula:

$$\begin{aligned} \frac{1}{\mu } = \sum _{i=1}^K \frac{1}{\nu _i} [(I-P')^{-1}p]_i. \end{aligned}$$
(2.1)

In vector form, \(1/\mu = e' R^{-1}p\), where \(e\) is the (column) vector of ones and

$$\begin{aligned} R=(I-P^{\prime }){{\mathrm{diag}}}(\nu ). \end{aligned}$$
(2.2)

Define

$$\begin{aligned} \rho ^{n}=\frac{\lambda ^{n}}{n\mu } \quad \text {and} \quad \beta ^n=\sqrt{n} (1 - \rho ^n). \end{aligned}$$

The quantity \(\rho ^n\) is said to be the traffic intensity of the \(n\)th system. We assume that

$$\begin{aligned} \lim _{n\rightarrow \infty } \beta ^n=\beta \quad \text { for some }\beta \in {\mathbb {R}}. \end{aligned}$$
(2.3)

When condition (2.3) is satisfied, the sequence of systems is critically loaded, and is said to be in the QED regime or the Halfin–Whitt regime.

Setting

$$\begin{aligned} X^{n}(t)=N^{n}(t)-n\quad \ \text {for }t\ge 0, \end{aligned}$$
(2.4)

we call \(X^{n}=\{X^{n}(t):t\ge 0\}\) the centered total-customer-count process in the \(n\)th system. One can check that \((X^{n}(t))^{+}\) is the number of customers waiting in queue at time \(t\), and \((X^{n}(t))^{-}\) is the number of idle servers at time \(t\). Clearly,

$$\begin{aligned} e^{\prime }Z^{n}(t)=n-(X^{n}(t))^{-}\quad \ \text {for }t\ge 0. \end{aligned}$$

Now, we define the diffusion-scaled state processes:

$$\begin{aligned} \tilde{A}^{n}(t)=\frac{1}{\sqrt{n}}A^{n}(t),\qquad \tilde{X}^{n}(t)=\frac{1}{\sqrt{n}}X^{n}(t)\quad \ \text {for }t\ge 0. \end{aligned}$$
(2.5)

and

$$\begin{aligned} \tilde{Z}^{n}(t)=\frac{1}{\sqrt{n}}{( {Z}^{n}(t) -n\gamma )}\quad \ \text {for }t\ge 0, \end{aligned}$$
(2.6)

where

$$\begin{aligned} \gamma =\mu R^{-1}p, \end{aligned}$$
(2.7)

which is a \(K\)-dimensional vector since \(\mu \in \mathbb {R}_+\). It follows from (2.1) that \(\sum _{k=1}^{K}\gamma _{k}=1\). One interprets \(\gamma _{k}\) to be the fraction of busy servers in phase \(k\) service. We write \(\tilde{\mathcal{S}}^n\) for the state space of \((\tilde{A}^n,\tilde{X}^n,\tilde{Z}^n)\):

$$\begin{aligned} \tilde{\mathcal{S}}^n \!=\! \{(a,x,z) \in \mathbb {R}_+\times \mathbb {R}\times \mathbb {R}^K: \sqrt{n} x +n \in \mathbb {Z}_+,\sqrt{n} z + n\gamma \in \mathbb {Z}_+^K, e'z+x^-=0\}. \end{aligned}$$

The following result establishes a many-server diffusion limit for a sequence of \(GI/Ph/n+M\) systems that operate in the Halfin–Whitt regime. Since the abandonment rate \(\alpha \) is positive, the limit process is positive recurrent.

Proposition 1

Consider a sequence of \(GI/Ph/n+M\) systems satisfying Assumption 1 and (2.3). Assume that \(\xi ^n(0)=0\) and \((\tilde{X}^{n}(0),\tilde{Z}^{n}(0))\Rightarrow (\tilde{X}(0),\tilde{Z} (0))\) for a pair of random variables \((\tilde{X}(0),\tilde{Z}(0))\). We then have

  1. (a)
    $$\begin{aligned} (\tilde{X}^{n},\tilde{Z}^{n})\Rightarrow (\tilde{X},\tilde{Z})\quad \text { as }n\rightarrow \infty , \end{aligned}$$
    (2.8)

    where \((\tilde{X},\tilde{Z})\) is some continuous-time Markov process on

    $$\begin{aligned} \tilde{\mathcal{S}}= \{ (x,z)\in \mathbb {R}\times \mathbb {R}^K: e' z + {x}^-=0\}. \end{aligned}$$
    (2.9)
  2. (b)

    The Markov process \((\tilde{X}, \tilde{Z})\) given in (2.8) is positive recurrent and has a unique stationary distribution \(\pi \).

Proof

Part (a) follows from [7]. It is assumed in [7] that as \(n\rightarrow \infty \),

$$\begin{aligned} \tilde{E}^{n} \Rightarrow \tilde{E}\quad \mathrm{{for\;some\;Brownian\;motion}}\;\tilde{E}, \end{aligned}$$
(2.10)

where

$$\begin{aligned} \tilde{E}^{n}(t)=\frac{1}{\sqrt{n}} [E^{n}(t)-\lambda ^{n}t], \end{aligned}$$
(2.11)

with \({E}^{n}(t)\) being the number of arrivals in \((0, t]\). Under Assumption 1 on the interarrival times, it follows from the functional central limit theorem that (2.10) holds.

Part (b) is an immediate corollary of Theorem 3 in [10]. \(\square \)

3 The main result

In this section, we work under Assumption 2. Let \(\tilde{\pi }_n\) be the stationary distributions of the diffusion-scaled state process \((\tilde{A}^n, \tilde{X}^n, \tilde{Z}^n)\) defined in (2.5) and (2.6). Now, we state the main theorem of this paper.

Theorem 1

Suppose that Assumptions 1 and 2 and the many-server heavy-traffic condition (2.3) hold. Assume that the abandonment rate \(\alpha \) is strictly positive. Then, the sequence of probability distributions \(\{\tilde{\pi }^n: n \ge 1 \}\) is tight.

The following corollary states the validity of interchange of heavy-traffic and steady-state limits. Since [10] proved that \((\tilde{X}, \tilde{Z})\) has a unique stationary distribution, the corollary follows from Theorem 1 by a standard argument; see [4, 11].

Corollary 1

Under the assumptions of Theorem 1, the sequence of marginal distributions of \(\tilde{\pi }^n\) on \((\tilde{X}^{n},\tilde{Z}^n)\) converges weakly to \(\pi \), where \(\pi \) is the unique stationary distribution of \((\tilde{X},\tilde{Z})\) and \((\tilde{X},\tilde{Z})\) is the diffusion process from Proposition 1.

4 Our Lyapunov function and a fluid model

In this section, we first introduce the Lyapunov function that lies at the heart of this paper. We then introduce a fluid model associated with the sequence of \(GI/Ph/n+M\) systems and assert that our Lyapunov function is a geometric Lyapunov function for the fluid model. The proof of our main result relies on a comparison between the diffusion-scaled processes and the fluid model.

To define our Lyapunov function, we first introduce a lemma on so-called common quadratic Lyapunov functions. The lemma was established in Theorem 1 of [10]. Recall that \(p\) is a discrete probability distribution on \(\{1, \ldots , K\}\), representing the initial distribution of the phase-type service time distribution, \(e\) is a vector of ones, and \(R\) is a matrix given in (2.2) and is obtained from the parameters of the phase-type service time distribution.

Lemma 1

There exists a \(K\times K\) positive definite matrix \(Q\) such that

$$\begin{aligned}&QR+R'Q \quad \text {is positive definite}, \\&Q(I-pe')R+ R'(I - ep')Q \quad \text {is positive semi-definite}. \end{aligned}$$

Our Lyapunov function is the square root of a function \(g=g_\beta \) that depends on the “slack parameter” \(\beta \) in (2.3). It is defined as

$$\begin{aligned} g(x,z)= \left\{ \begin{array} {ll} (x+ \beta )^2 + \kappa (z+\beta \gamma )'Q(z+\beta \gamma ), &{} \text {if} \quad \beta \ge 0,\\ (\alpha x + \mu \beta )^2 + \kappa (z+\beta \gamma )'Q(z+\beta \gamma ), &{} \text {if} \quad \beta <0, \end{array}\right. \end{aligned}$$
(4.1)

where \(Q\) is given in Lemma 1, \(\gamma \) is defined in (2.7), and \(\kappa \) is a large positive constant to be determined later.

The quadratic function in (4.1) is slightly different from the one used in [10] in their Lyapunov function. Our modification is needed, because [10] focused on a \(K\)-dimensional state process, but the current paper focuses on the degenerate \((K+1)\)-dimensional state process \((\tilde{X}, \tilde{Z})\) on the manifold \(\tilde{\mathcal{S}}\); both state processes are equivalent. More importantly, the centering for \((x, z)\) in the two quadratic terms is different. Our centering allows us to obtain a stronger property for the fluid model; see part (a) of Lemma 7.

In addition, it is clear that when \(\beta \ge 0\), the function \(g\) achieves its minimum value 0 at \((x^*, z^*) = (-\beta , -\beta \gamma ) \in \tilde{\mathcal{S}}\). For \(\beta <0\), the function \(g\) achieves its minimum value \(\kappa \beta ^2 \gamma ' Q \gamma \) at \((\bar{x}^*, \bar{z}^*) = \left( -\frac{\mu \beta }{ \alpha }, 0\right) \in \tilde{\mathcal{S}}\). This can be verified by rewriting \(g(x,z)\) as follows:

$$\begin{aligned} g(x,z)&= (\alpha x + \mu \beta )^2 + \kappa z'Qz + \kappa \beta ^2 \gamma ' Q \gamma + 2 \kappa \beta z' Q\gamma \\&= (\alpha x + \mu \beta )^2 + \kappa z'Qz + \kappa \beta ^2 \gamma ' Q \gamma + 2 \kappa \beta b z'e \\&= (\alpha x + \mu \beta )^2 + \kappa z'Qz + \kappa \beta ^2 \gamma ' Q \gamma + 2 \kappa b (-\beta ) \cdot x^{-}, \end{aligned}$$

where the second equality follows from Lemma 8 with constant \(b>0\), and in the last equality we have used the fact that \((x, z) \in \tilde{\mathcal{S}}\). We remark that if \(\beta <0\), both \(\sqrt{g}\) and \(\sqrt{g} - \sqrt{\kappa \beta ^2 \gamma ' Q \gamma }\) can serve as our Lypaunov function. Those two functions only differ by a constant, and the function \(\sqrt{g} - \sqrt{\kappa \beta ^2 \gamma ' Q \gamma }\) has a minimum value of zero on the manifold \(\tilde{\mathcal{S}}\). We use \(\sqrt{g}\) as our Lyapunov function for the sake of clarity of the presentation.

We now introduce a fluid model associated with \(GI/Ph/n+M\) systems. This fluid model is defined through a map, which is established in a more general setting in [7]. Recall the definition of \(\tilde{\mathcal{S}}\) in (2.9) and write

$$\begin{aligned} \tilde{\mathcal{T}}= \{ (u,v)\in \mathbb {R}\times \mathbb {R}^K: e' v = 0\}. \end{aligned}$$

Lemma 2

Let \(\alpha >0\).

  1. (a)

    For each \((u,v)\in \mathbb {D}(\tilde{\mathcal{T}})\), there exists a unique \((x,z)\in \mathbb {D}(\tilde{\mathcal{S}})\) such that

    $$\begin{aligned} x(t)&= u(t)-\alpha \int \limits _{0}^{t}(x(s))^{+}\, ds-e^{\prime }R\int \limits _{0} ^{t}z(s)\,ds,\end{aligned}$$
    (4.2)
    $$\begin{aligned} z(t)&= v(t)-p(x(t))^{-}-(I-pe^{\prime })R\int \limits _{0}^{t}z(s)\,ds \end{aligned}$$
    (4.3)

    for \(t\ge 0\).

  2. (b)

    For each \((u,v)\in \mathbb {D}(\tilde{\mathcal{T}})\), define \((x,z)=\Psi (u,v)\in \mathbb {D}(\tilde{\mathcal{S}})\), where \((x,z)\) satisfies (4.2) and (4.3). The map \(\Psi \) is well-defined and is continuous when both the domain \(\mathbb {D}(\tilde{\mathcal{T}})\) and the range \(\mathbb {D}(\tilde{\mathcal{S}})\) are endowed with the standard Skorohod \(J_{1}\)-topology.

  3. (c)

    The map \(\Psi \) is Lipschitz continuous in the sense that for any \(T>0\), there exists a constant \(C=C(T)>0\) such that

    $$\begin{aligned} \Vert \Psi (y^{1})-\Psi (y^{2}) \Vert _T \le C \Vert y^{1}-y^{2}\Vert _T \quad \text { for any }y^{1},y^{2}\in \mathbb {D}(\tilde{\mathcal{T}}). \end{aligned}$$
  4. (d)

    The map \(\Psi \) is positively homogeneous in the sense that

    $$\begin{aligned} \Psi (by)=b\Psi (y)\quad \text { for each }b>0\text { and each }y\in \mathbb {D}(\tilde{\mathcal{T}}). \end{aligned}$$

We now define the fluid counterpart of the diffusion-scaled state processes \((\tilde{X}^n, \tilde{Z}^n)\). Fix an initial state \((\tilde{x}(0), \tilde{z}(0))\in \tilde{\mathcal{S}}\). For \(t \ge 0\), we set

$$\begin{aligned} \tilde{u}(t)=\tilde{x}(0)-\mu \beta t \quad \mathrm{{and}} \quad \tilde{v}(t)=(I-pe') \tilde{z}(0), \end{aligned}$$
(4.4)

and, after noting that \((\tilde{u}(0),\tilde{v}(0))\in \mathbb {D}(\tilde{\mathcal{T}})\), set

$$\begin{aligned} (\tilde{x}, \tilde{z}) = \Psi ( \tilde{u}, \tilde{v}). \end{aligned}$$
(4.5)

We call \((\tilde{x}, \tilde{z})\) the fluid model starting from \((\tilde{x}(0),\tilde{z}(0))\). The next lemma is a negative drift condition for the fluid model and states that the function \(\sqrt{g}\) is a geometric Lyapunov function for the fluid model. Appendix 2 is devoted to its proof.

Lemma 3

Fix some \(t_0>0\). There exists constants \(C=C(t_0)>0\) and \(\epsilon =\epsilon (t_0)\in (0,1)\) such that for each initial state \((\tilde{x}(0), \tilde{z}(0))\in \tilde{\mathcal{S}}\), we have

$$\begin{aligned} \sqrt{g(\tilde{x}(t_0),\tilde{z}({t_0}))} - \sqrt{g(\tilde{x}(0), \tilde{z}(0))} \le C-\epsilon \sqrt{g(\tilde{x}(0), \tilde{z}(0))}. \end{aligned}$$

5 Our Lyapunov function and diffusion-scaled processes

In this section, we present a negative drift condition for the diffusion-scaled processes, and we briefly outline how this leads to our main result. We use the same Lyapunov function \(\sqrt{g}\) as in the fluid model.

We are now ready to formulate our negative drift condition for the diffusion-scaled processes. Here and in the rest of this paper, we adopt the notational convention that

$$\begin{aligned} \mathbb {E}_{(a, x,z)}[ \quad \cdot \quad ] = \mathbb {E}[ \quad \cdot \quad |\tilde{A}^n(0)=a, \tilde{X}^n(0)= x, \tilde{Z}^n (0) =z ] \end{aligned}$$

for each initial state \((a,x,z)\in \tilde{\mathcal{S}}^n\).

Lemma 4

Fix any \(t_0>0\). Under the assumptions of Theorem 1, there exists a nonnegative function \(g\) on \(\tilde{\mathcal{S}}\), as well as two constants \(C=C(t_0)>0\) and \(\epsilon =\epsilon (t_0) \in (0,1)\), such that for each \(n\) and each feasible initial state \((a,x,z)\in \tilde{\mathcal{S}}^n\), we have

$$\begin{aligned} \mathbb {E}_{(a,x,z)} \left[ \sqrt{g (\tilde{X}^n(t_0), \tilde{Z}^n(t_0))} \right] - \sqrt{g(x,z)} \le C - \epsilon \sqrt{g(x,z)}. \end{aligned}$$
(5.1)

The function \(\sqrt{g}\) satisfying (5.1) is essentially a geometric Lyapunov function with a geometric drift size \(1-\epsilon \) and drift time \(t_0\), see for instance Sect. 3 in [11]. Readers are referred to [11, 22] for more details on the definition of a Lyapunov function and its application in deriving bounds for stationary distributions of Markov processes.

The proof of the negative drift condition in Lemma 4 is lengthy and will be given in Appendix 3. Assuming the lemma, the proof of Theorem 1 is standard. We end this section by giving a sketch of the proof, which is almost identical to the proof of Theorem 5 in [11].

Proof Sketch of theorem 1

In order to show that \(\tilde{\pi }^n\) is tight, since \(\tilde{A}^n\Rightarrow 0\) as \(n\rightarrow \infty \) and \(g\) has compact level sets (see Appendix 2), it is sufficient to show that for any given \(\delta >0\), there exists some large constant \(s\), such that for all \(n\) sufficiently large,

$$\begin{aligned} {P_{\tilde{\pi }^n} ( \sqrt{g( \tilde{X}^n(0), \tilde{Z}^n(0))}>s)}\le \delta . \end{aligned}$$

By Markov’s inequality, it suffices to show that, for some \(C<\infty \),

$$\begin{aligned} \mathbb {E}_{\tilde{\pi }^n} \left[ \sqrt{g( \tilde{X}^n(0), \tilde{Z}^n(0))}\right] \le C/\epsilon , \end{aligned}$$
(5.2)

where \(C\) and \(\epsilon \) are constants in (5.1). To prove (5.2), we use (5.1) in the following form:

$$\begin{aligned} \epsilon \sqrt{g(x,z)}-C \le \sqrt{g(x,z)} - \mathbb {E}_{(a,x,z)} [\sqrt{g (\tilde{X}^n(t_0), \tilde{Z}^n(t_0))} ]. \end{aligned}$$

We argue that the right-hand side is nonpositive after taking the expectation with respect to \(\tilde{\pi }^n\), and (5.2) then follows. For each integer \(k\ge 1\) and each \((x,z)\in \mathbb {R}\times \mathbb {R}^K\), set \(f_k(x, z) = \sqrt{g(x,z)} \wedge k\). It can be checked that \(f_k(x,z) - \mathbb {E}_{(a,x,z)} f_k ( \tilde{X}^n(t_0), \tilde{Z}^n(t_0) )\) is bounded below by \(-C\) for all \((a,x,z)\in \tilde{\mathcal{S}}^n\). Therefore, Fatou’s Lemma can be applied and we deduce that

$$\begin{aligned}&\int \limits _{\tilde{\mathcal{S}}} \left[ \sqrt{g(x,z)}- \mathbb {E}_{(a,x,z)} \sqrt{g( \tilde{X}^n(t_0), \tilde{Z}^n(t_0))} \right] d \tilde{\pi }^n(a,x,z)\\&\quad \le \mathop {\lim \inf }_{k \rightarrow \infty } \int \limits _{\tilde{\mathcal{S}}} \left[ f_k(x,z) - \mathbb {E}_{(a,x,z)} f_k ( \tilde{X}^n(t_0), \tilde{Z}^n(t_0) ) \right] d \tilde{\pi }^n(a,x,z) = 0, \end{aligned}$$

where the equality follows from stationarity of \(\tilde{\pi }^n\). \(\square \)