1 Introduction

We study a load balancing system also known as supermarket checkout system, that is, a single-hop stochastic processing network (SPN) where each server has its own queue. There is a single stream of arrivals and, right after arriving, the jobs are routed to one of the queues by a dispatcher. Popular routing algorithms are join the shortest queue (JSQ) and power-of-d choices. Under JSQ, the new arrival is immediately sent to the shortest queue. Under power-of-d choices, d queues are sampled uniformly at random and the job is routed to the shortest queue among these d.

Exact analysis of many SPNs (including load balancing systems) usually becomes intractable. Hence, a common practice is to study the SPNs in some asymptotic regime to gain insights about their behavior. A popular regime is heavy traffic, where the number of servers is constant and the load is increased to the maximum capacity. One of the advantages of studying systems in heavy traffic is that, in the limit, many systems behave as lower-dimensional systems, a phenomenon known as state space collapse (SSC). In other words, if an SPN experiences SSC in the heavy-traffic limit, it behaves as if the number of queues was smaller.

Another popular asymptotic regime is mean field, where the load is kept constant and the number of servers is increased to infinity. In this regime, the main idea is that, as the number of servers increases, one can isolate one queue and study its interactions with the rest of the system. Then, since all the queues are equivalent, one uses the analysis of this single queue to understand the behavior of the entire system.

In this paper we work with the many-server heavy-traffic regime, where both, the load and the number of servers, increase together. Specifically, we let N be the number of servers and we parametrize the arrival process so that the mean arrival rate per server is \(1-N^{-\alpha }\), where \(\alpha >0\). Then, the total arrival rate to the system is \(N(1-N^{-\alpha })\). In the many-server heavy-traffic regime, there are different phases depending on the value of \(\alpha \). As \(\alpha \downarrow 0\), we approximately approach the mean-field regime, \(\alpha =\frac{1}{2}\) represents the Halfin–Whitt regime [24], \(\alpha =1\) represents the nondegenerate-slowdown regime (NDS) [2], and \(\alpha \rightarrow \infty \) can be thought of as the classical heavy-traffic regime. In this paper, we look at all super-NDS regimes, i.e., the regimes with \(\alpha >1\). Hence, we study regimes where the systems are more heavily loaded than NDS. The main contributions of this paper are summarized below:

  1. (i)

    We show that the total queue length scaled by \(N^{-\alpha }\) (or, equivalently, the average queue length scaled by \(N^{1-\alpha }\)) converges in distribution to an exponential random variable if the load grows ‘fast enough’ with respect to the number of servers. In particular, under power-of-d choices with constant d, we show that this result is valid if \(\alpha >3\) (see Corollary 1); and under JSQ the same result holds for \(\alpha >2\) (see Corollary 2). Further, we show the condition that \(\alpha \) must satisfy under power-of-d choices when d is a function of the number of servers. Specifically, we show that if \(d{\mathop {=}\limits ^{\triangle }}\lceil cN^\beta \rceil \) for some \(c>0\) and \(\beta \ge 0\) such that \(d\in [N]\), the convergence to the exponential random variable is valid if \(\alpha +\beta >3\) (see Theorem 1). We provide two proofs to our result, which we explain in the next two contributions.

  2. (ii)

    We first show the result using one-sided Laplace transform (see Sect. 3). Specifically, we generalize the Transform method introduced in [30] for discrete-time systems, to a continuous-time model. We briefly describe the Transform method below, and we provide more details in Sect. 3.1.

  3. (iii)

    We compute the rate of convergence of the scaled total queue length to the exponential random variable in Wasserstein’s distance (see Theorem 3). This is a stronger version of Theorem 1, where we actually obtain the convergence in distribution as a consequence of the error bound. To show this result, we use Stein’s method (see Sect. 4).

  4. (iv)

    All these proofs are powered by a multiplicative SSC result that we show in Proposition 1. We show SSC to the line generated by the vector \({\varvec{1}}\), i.e., we show that all the queue lengths are similar in the limit. Specifically, we compute bounds for the moments of the norm of the difference between the queue length vector and its projection on the line generated by the vector \({\varvec{1}}\). Further, we compute a bound for the moment generating function (MGF) of its norm. These bounds grow to infinity as the number of servers increase. However, after scaling the total queue length by \(N^{-\alpha }\) they become negligible (see Sect. 2.1), hence the name multiplicative.

  5. (v)

    We compute the rate of convergence in expected value of the total average queue length scaled by \(N^{-\alpha }\). Specifically, we show that the rate of convergence is of order \(\log (N)N^{3-\beta }\). As a consequence, we prove the convergence of the expectation under the same conditions established in Theorem 1 (see Theorem 2). Similarly to Theorem 1, we explicitly show that the power-of-d choices algorithm with constant d and the JSQ algorithm are immediate consequences of Theorem 2. To prove this result, we use the Drift method (see 3.3), which we briefly explain in Sect. 1.1, and Stein’s method (see Sect. 4).

Before discussing the related work, we briefly summarize the Transform method and Stein’s method. The Transform method introduced in [30] is a two-step procedure to compute the distribution of the queue lengths in classical heavy-traffic regime, and it can be used for queueing systems that experience SSC to a one-dimensional subspace. The method is introduced in the context of a load balancing system and a generalized switch. Before using the method, positive recurrence and SSC to a one-dimensional subspace must be proved. The main idea is to consider an exponential test function such that, after setting its drift to zero, yields the MGF of the projection of the vector of queue lengths on the subspace where SSC occurs. Then, an implicit expression that is valid for all traffic is obtained. The last step is to take the heavy-traffic limit and prove that the terms depending on the queue lengths vanish, so that we obtain an explicit expression for the limiting MGF.

Stein’s method is based on the approach introduced by [46]. The main idea is to bound the Wasserstein’s distance between the pre-limit random variable and the limiting random variable. In the definition of Wasserstein’s distance, all the Lipschitz functions with constant 1 are considered (see Definition 3). Hence, one needs to compute a bound that is valid for a family of functions.

In both methods, the use of test functions is essential. In the Transform method, this function is exponential and part of the merit of [30] was to realize the right exponential test function. In fact, the entire proof depends on this specific test function. In Stein’s method, instead, one simultaneously considers a whole family of test functions. Then, by studying their drift we can get the bounds on the Wasserstein’s distance.

The rest of the paper is organized as follows. In the rest of this section, we discuss related work (Sect. 1.1) and we establish the notation (Sect. 1.2). In Sect. 2 we present the model, the main result and some essential results to our proofs (including SSC). In Sect. 3 we present our proof based on the Transform technique and we additionally show the rate of convergence of the expected queue length using the Drift method. In Sect. 4 we compute the rate of convergence in Wasserstein’s distance using Stein’s method, and we obtain a proof of the main result as a consequence. We additionally obtain the rate of convergence in mean as a consequence of the main theorem of Sect. 4. Finally, in Sect. 6 we present concluding remarks and future work.

1.1 Related work

JSQ was first proposed in [59], and since then, it has received plenty of attention [1, 3, 14, 15, 17, 30, 42, 54]. It is particularly interesting in the heavy-traffic regime, because it experiences SSC to a one-dimensional subspace. Hence, its queue length vector behaves as a single-server queue and the distribution of the queue lengths is known to be exponential. Specifically, it has been proven that the scaled vector of queue lengths converges in distribution to a vector of the form \(\Upsilon {\varvec{1}}\), where \(\Upsilon \) is an exponential random variable and \({\varvec{1}}\) is the vector of ones. This proof has been performed using the diffusion limits approach [17], the Drift method [15] and the Transform method [30].

A drawback of this policy is that it requires a large communication overhead and, hence, it is not practical in large-scale systems (assuming that the dispatcher does not use any state-information stored in its memory to make the routing decision). On the other extreme is random routing, where all the arriving jobs are routed to a queue selected uniformly at random and, hence, no communication overhead is required. Power-of-d choices can be considered in between these two, since it only requires scanning d queues before routing. It has been proven that, even if \(d=2\), the queue lengths decrease considerably when compared to random routing. This result has been shown in the mean-field regime [39, 40, 51] and in the classical heavy-traffic regime [29, 37]. An extensive list of the literature on these policies is presented in [50].

The mean-field regime has become popular after it was used to show that the power-of-2 choices algorithm yields queue lengths that are considerably smaller than random routing [39, 40, 51]. It was later proved that the JSQ system behaves as an \(M/M/\infty \) system in the mean-field regime [3]. In [42], it was shown that under power-of-d choices with d growing with N, the fluid limit does not depend on the growth rate and, hence, power-of-d and JSQ have the same fluid limit. More recently, it has been shown that, in this regime, there must always be a proportion of empty queues and, hence, any routing policy that prioritizes empty queues yields queue lengths of at most one job [21]. Under the same logic, the join the idle queue (JIQ) policy has become popular. It was proposed in [34] and the idea is that, whenever a server idles, it communicates its status to the dispatcher. Then, the arrivals are routed randomly to one of the empty queues. If none of the queues is empty, then a server is selected uniformly at random. This policy has been rigorously analyzed in [49] under exponential job sizes, and in [18] for general job-size distributions. In both cases, the authors show that the steady-state probability that an arriving job waits in line vanishes as the number of servers grows to infinity.

Among the many-server heavy-traffic regimes, one of the most popular is the Halfin-Whit regime, where the difference between the service and arrival rate per server is \(N^{-1/2}\), i.e., \(\alpha =\frac{1}{2}\). This regime was introduced in [24], where the authors present the classical analysis of the M/M/N queue. More recently, [16] shows that the number of empty queues and the number of queues with one customer in line are of order \(O(\sqrt{N})\). The authors use the diffusion limits approach, but interchange of limits is not proved. This step is completed in [7]. In [4, 5] the work of [16] is continued. Specifically, in [4] the authors study tail asymptotics of the stationary distribution, and in [5] they study the moments of the stationary distribution. In [43], the authors show that JIQ routing yields diffusion-level optimality in the Halfin–Whitt regime.

In [33], load balancing systems under several routing policies in the sub-Halfin–Whitt regime are studied, and in [32] the analysis is extended to the case when \(\alpha \in \left[ \frac{1}{2},1\right) \). In [55] a load balancing system operating under power-of-d, where jobs are batches of tasks, is analyzed. Specifically, the authors find conditions on the value of d (as a function of the number of servers, the load and the number of tasks per job) such that power-of-d choices achieves zero delay in sub-Halfin–Whitt regime.

The NDS regime was introduced in [2] in the context of an M/M/N queue, and the author shows that the regime yields new diffusion processes. More recently, it has been used to compare routing policies in load balancing systems [21]. Specifically, the authors in [21] characterize the diffusion approximation of JSQ and propose a new policy with less communication overhead, and that achieves JSQ optimality. This policy is called idle-one-first and prioritizes routing to servers that are idling or have one job.

The heavy-traffic asymptotics of several systems have been studied in the literature. Most of the work is on systems that satisfy the complete resource pooling (CRP) condition, i.e., that satisfy SSC into a one-dimensional subspace. For a formal definition of the CRP condition, the reader is referred to [12, 27, 58]. The vast majority of the work has been performed using the diffusion limits approach [19, 25,26,27, 47, 57, 58]. In this approach, the scaled queue lengths are shown to converge to a reflected Brownian motion (RBM) process, and then the steady-state behavior of this RBM is studied using SSC. The last step is to show interchange of limits, which is usually challenging.

Recently, three ‘direct methods’ have been proposed to perform heavy-traffic analysis [11]. In these approaches, there is no need to show interchange of limits, as one directly works with the queue length process instead of working with an RBM. The methods are: (i) the BAR approach, (ii) the Drift method and (iii) Stein’s method. BAR stands for basic adjoint relationship, and the main idea of the method is to use carefully chosen exponential functions to handle the jumps of a continuous-time process [10, 41].

In the Drift method, the main idea is to choose the right test function and set its drift to zero in steady state. Then, using SSC, one gets error bounds on a function of the queue lengths that are tight in heavy traffic. If we use polynomial test functions, we obtain moments of the queue lengths [15, 28, 35, 36, 53]. The Transform method introduced in [30] is based on the Drift method and uses exponential test functions along with SSC to obtain the limiting MGF of the scaled queue lengths. Both, the Transform method and the BAR approach use exponential test functions, but the Transform method is focused on using SSC to obtain a closed-form distribution for the scaled queue lengths in the limit.

Stein’s method for analyzing SPNs was first introduced in [22], and it has now emerged as a simple yet powerful method that can be used not only to show asymptotic convergence, but also to bound the rate of convergence in Wasserstein’s distance. It has become a popular approach for both, the mean-field, the classical heavy-traffic and the many-server heavy-traffic regimes [7,8,9, 33, 48, 60, 61]. A key component of our proof that is novel relative to prior literature, is the use of Stein’s method in the presence of a multiplicative SSC.

Multiplicative SSC has been used in a variety of contexts in the literature [6, 13, 31, 45, 52, 53, 57]. The most relevant work in our context are the results in [13, 53]. In [13] the authors study a parallel-server system in the Halfin–Whitt regime, and they propose a framework for establishing SSC in queueing systems with multiple server pools in parallel and different customer classes. They use the fluid dynamics to establish their result. In [53] the multiplicative SSC result is used in the context of the heavy-traffic analysis of a bandwidth sharing network, and they use the Drift method to analyze it. Their proof is based on bounding the drift of the error of approximating the actual vector of flows by its projection on the subspace where SSC occurs, which is traditional in the Drift method [15, 35, 36]. In the traditional Drift method technique, the SSC bounds are independent of the heavy-traffic parameter. However, in [53], the bounds depend on the heavy-traffic parameter and the authors show that they become negligible after scaling. In this paper, we adopt their technique to show SSC and we use it in the context of a load balancing system in the many-server heavy-traffic regime.

In Table 1 we show a summary of the related work presented above, classified according to the value of \(\alpha \).

Table 1 Literature review for asymptotic regimes depending on the value of \(\alpha \)

1.2 Notation

We use \({\mathbb {R}}\) and \({\mathbb {Z}}\) to denote the sets of real and integer numbers, respectively. We add a subscript \(+\) when we refer to nonnegative numbers, and a superscript \(n\in {\mathbb {Z}}_+\) when we mean vector spaces. We use bold letters to denote vectors. Given a vector \({\varvec{x}}\in {\mathbb {R}}^N\), we use \(x_{(i)}\) to its \(i{}^{\text {th}}\) smallest element.

For \({\varvec{x}}\in {\mathbb {R}}^n\), and \(p\in {\mathbb {Z}}_+\) with \(p\ge 1\) we use \(\Vert {\varvec{x}}\Vert _p\) to denote the p-norm of \({\varvec{x}}\), and we omit the index when we refer to Euclidean norm (i.e., when \(p=2\)). We use \({\varvec{1}}\) and \({\varvec{0}}\) to denote the vectors of all-one and all-zero elements, respectively. We use \({\varvec{e}}_n^{(i)}\) to denote the n-dimensional \(i{}^{\text {th}}\) canonical vector, i.e., an n-dimensional vector with a 1 in the \(i{}^{\text {th}}\) position and 0’s everywhere else. When the dimension is clear from the context, we may omit the subscript n.

Given a random variable X, we use \({\mathbb {E}}\left[ X \right] \) to denote its expected value and \(\text {Var}\left[ X \right] \) for its variance. For an event A we use \(\mathbbm {1}_{\left\{ A \right\} }\) to denote the indicator function of A. We use \(\Rightarrow \) to denote convergence in distribution.

Given two integers \(k,n\in {\mathbb {Z}}_+\), we use \(\left( {\begin{array}{c}n\\ k\end{array}}\right) \) for the binomial coefficient and we use the convention \(\left( {\begin{array}{c}n\\ k\end{array}}\right) =0\) if \(n<k\). We use [n] to the denote the set of positive integers that are smaller than or equal to n, i.e., \([n]{\mathop {=}\limits ^{\triangle }}\{1,\ldots ,n\}\).

For a function f with domain Dom(f), we denote \(\Vert f\Vert {\mathop {=}\limits ^{\triangle }}\sup _{x\in Dom(f)}|f(x)|\), and we use \(f'\), \(f''\) and \(f'''\) for its first, second and third derivative, respectively (provided their existence).

2 Model and asymptotic result

Consider a load balancing system operating in continuous time. Specifically, there are N parallel servers, and each of them has an infinite buffer. Arrivals to the system occur according to a Poisson process at rate \(\lambda N\), where \(\lambda \in (0,1)\). Upon arrival, a dispatcher immediately routes the new job to one of the servers, where they wait in line until the server can process them. All the servers are identical, and all the arriving jobs have exponential size with mean 1. Routing occurs according to power-of-d choices, where \(d\in {\mathbb {Z}}_+\) is of the form \(d{\mathop {=}\limits ^{\triangle }}\lceil cN^\beta \rceil \) for constants \(c>0\) and \(\beta \in [0,1]\). Specifically, upon arrival of a job, d servers are sampled uniformly at random and the new job is routed to the server with the shortest queue among those d. Ties are broken with the minimum index rule. Observe that if \(c=\beta =1\), then \(d=N\) and power-of-d choices is equivalent to JSQ.

For each \(t\in {\mathbb {R}}_+\) and each \(i\in [N]\), let \(q_i(t)\) be the number of jobs in queue i at time t, including the job in service (if any). Then, the queue length process \(\left\{ {\varvec{q}}(t):t\in {\mathbb {R}}_+ \right\} \) is a continuous-time Markov chain (CTMC) with the generator matrix G defined in (1). Let \({\varvec{q}}\in {\mathbb {Z}}_+^N\), and for each \(i\in [N]\) let \(\psi _{{\varvec{q}}}(i)\) be the index of the \(i{}^{\text {th}}\) smallest element of \({\varvec{q}}\), breaking ties by minimum index rule. Then, for any \({\varvec{q}}'\in {\mathbb {Z}}_+^N\) we have that the transition rate from state \({\varvec{q}}\) to state \({\varvec{q}}'\) is

$$\begin{aligned} G_{{\varvec{q}},{\varvec{q}}'}{\mathop {=}\limits ^{\triangle }}{\left\{ \begin{array}{ll} -\left( \lambda N + \sum _{i=1}^N \mathbbm {1}_{\left\{ q_i>0 \right\} }\right) &{} \text {if }{\varvec{q}}={\varvec{q}}', \\ \mathbbm {1}_{\left\{ q_i>0 \right\} } &{} \text {if }{\varvec{q}}'={\varvec{q}}-{\varvec{e}}^{(i)}, \text {with }i\in [N], \\ \lambda N \dfrac{\left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) }{\left( {\begin{array}{c}N\\ d\end{array}}\right) } &{} \text {if }{\varvec{q}}' = {\varvec{q}}+ {\varvec{e}}^{(\psi _{{\varvec{q}}}(i))}, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(1)

The first case is the additive inverse of the sum of the other cases; the second case corresponds to a departure from queue i, which occurs at rate 1 and it can only happen if the queue is nonempty; and the third case corresponds to an arrival to the \(i{}^{\text {th}}\) shortest queue. Arrivals occur at rate \(\lambda N\), and \(\frac{\left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) }{\left( {\begin{array}{c}N\\ d\end{array}}\right) }\) represents the probability that the new arrival is routed to the \(i{}^{\text {th}}\) shortest queue under power-of-d choices, for the following reason. Since the dispatcher samples d queues uniformly at random, there are \(\left( {\begin{array}{c}N\\ d\end{array}}\right) \) possible groups of d servers. One of the sampled servers needs to be the one labeled as \(\psi _{{\varvec{q}}}(i)\), and the other \(d-1\) servers need to have longer queues. Since \(\psi _{{\varvec{q}}}(i)\) represents the index of the \(i{}^{\text {th}}\) shortest queue, there are \(N-i\) queues that are longer and, hence, we there are \(\left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) \) possible groups of servers that ensure that routing occurs to the server labeled as \(\psi _{{\varvec{q}}}(i)\). Hence, the probability of sending the new arrival to the \(i{}^{\text {th}}\) shortest queue is \(\frac{\left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) }{\left( {\begin{array}{c}N\\ d\end{array}}\right) }\).

Remark 1

For ease of exposition, in this paper we assume that ties for the \(i{}^{\text {th}}\) shortest queue are broken deterministically, with the minimum index rule. However, all our results are valid for Markovian tie-breaking rules. It is true that the transition rate matrix G changes according to this rule. However, as the reader will notice later in this paper, we work with the total queue length and this object does not change if we change the deterministic tie-breaking rule described above by Markovian rules.

We are interested in the steady-state analysis of the load balancing system described above. First observe that the Markov chain is irreducible and nonexplosive. Additionally, the total arrival rate to the system \((\lambda N)\) is strictly smaller than the total service rate (N) for any \(\lambda \in (0,1)\). Then, the queue length process is also positive recurrent. Hence, stationary distribution exists and it is unique [23, Proposition 6.9b]. Let \({\overline{{\varvec{q}}}}\) be a steady-state random vector which is limit in distribution of \(\left\{ {\varvec{q}}(t):t\in {\mathbb {R}}_+ \right\} \), and define \({\overline{q}}_\Sigma {\mathop {=}\limits ^{\triangle }}\sum _{i=1}^N {\overline{q}}_i\).

We parametrize the system as follows. Consider \(\alpha >0\) and let \(\lambda ^{(N)}{\mathop {=}\limits ^{\triangle }}1-N^{-\alpha }\) be the arrival rate per server to the system. Then, the total arrival rate is \(\lambda ^{(N)}N\). Let \(\left\{ {\varvec{q}}^{(N)}(t):t\in {\mathbb {R}}_+ \right\} \) be the queue length process of the \(N{}^{\text {th}}\) system and \({\overline{{\varvec{q}}}}^{(N)}\) a steady-state random vector which is limit in distribution of \(\left\{ {\varvec{q}}^{(N)}(t):t\in {\mathbb {R}}_+ \right\} \). In the next theorem we present the main result of this paper.

Theorem 1

Consider a sequence of load balancing systems operating under power-of-d, parametrized by N as described above. If \(\alpha +\beta >3\), then \(N^{1-\alpha }\left( \frac{{\overline{q}}_\Sigma ^{(N)}}{N}\right) \Rightarrow \Upsilon \) as \(N\rightarrow \infty \), where \(\Upsilon \) is an exponential random variable with mean 1.

Immediate corollaries of Theorem 1 are the cases of power-of-d with constant d, and JSQ (which corresponds to \(d=N\)). We formally present these results below.

Corollary 1

Consider a sequence of load balancing systems operating under power-of-d choices, parametrized by N as described in Theorem 1. Suppose \(d=c\), where \(c\in {\mathbb {Z}}_+\) is a fixed parameter. If \(\alpha >3\), then \(N^{-\alpha }{\overline{q}}_\Sigma ^{(N)}\Rightarrow \Upsilon \) as \(N\rightarrow \infty \), where \(\Upsilon \) is an exponential random variable with mean 1.

The proof of Corollary 1 holds easily after setting \(\beta =0\) in Theorem 1. Now we present a result for the load balancing system under JSQ.

Corollary 2

Consider a sequence of load balancing systems operating under JSQ, parametrized by N as described in Theorem 1. If \(\alpha >2\), then \(N^{-\alpha }{\overline{q}}_\Sigma ^{(N)}\Rightarrow \Upsilon \) as \(N\rightarrow \infty \), where \(\Upsilon \) is an exponential random variable with mean 1.

The proof of Corollary 2 holds after letting \(c=\beta =1\) in Theorem 1.

Before proceeding with the proof of Theorem 1, we introduce the definition of the drift of a function, which is essential in our proofs. We use the definition provided in [53, Equation (14)].

Definition 1

Let \(\left\{ X(t): t\in {\mathbb {R}}_+ \right\} \) be a CTMC with countable state space \({\mathcal {X}}\) and transition rate matrix \(G^X\). For a function \(Z:{\mathcal {X}}\rightarrow {\mathbb {R}}_+\) and any \(x\in {\mathcal {X}}\), we define the drift of Z at x as

$$\begin{aligned} \Delta Z(x){\mathop {=}\limits ^{\triangle }}\sum _{x'\in {\mathcal {X}},\, x\ne x'} G^X_{x,x'} \left( Z(x')-Z(x)\right) . \end{aligned}$$

We write that we set the drift of Z to zero in steady state when we use the property \({\mathbb {E}}\left[ \Delta Z(X(t)) \right] =0\) under stationary distribution, provided that \({\mathbb {E}}\left[ Z(X(t)) \right] <\infty \) in steady state and \(\sup _{x\in {\mathcal {X}}} |G^X_{x,x}|<\infty \).

The drift of a function Z is also known as the generator applied to Z in the context of CTMCs. Here, we refer to it as drift, for consistency with [30, 53].

2.1 Essential results

The main difficulties in the proof of Theorem 1 are the dependency among queues, and handling the reflections due to the queue lengths being nonnegative. The latter is represented by the indicator functions in the transition rate matrix G. Both of these difficulties are handled with appropriate bounds, that are contributions by themselves.

We first present the SSC result, which establishes that all the queue lengths are approximately equal in the limit as \(N\rightarrow \infty \). Before stating the result formally, we introduce some notation. Given a vector \({\varvec{x}}\in {\mathbb {R}}_+^N\), define

$$\begin{aligned} {\varvec{x}}_\parallel {\mathop {=}\limits ^{\triangle }}{\varvec{1}}\left( \dfrac{\sum _{i=1}^N x_i}{N}\right) ,\;\text {and}\; {\varvec{x}}_\perp {\mathop {=}\limits ^{\triangle }}{\varvec{x}}-{\varvec{x}}_\parallel . \end{aligned}$$
(2)

Then, \({\varvec{x}}_\parallel \) is the projection of \({\varvec{x}}\) to the line generated by \({\varvec{1}}\), and \({\varvec{x}}_\perp \) represents the error of approximating \({\varvec{x}}\) by \({\varvec{x}}_\parallel \).

Our goal is to show that the queue lengths are similar in the limit. Using the notation above, we intuitively need to show that \({\overline{{\varvec{q}}}}^{(N)}\approx {\overline{{\varvec{q}}}}^{(N)}_\parallel \) asymptotically. We accomplish this goal by showing that \({\overline{{\varvec{q}}}}_\perp ^{(N)}\) is negligible as \(N\rightarrow \infty \), with the following result.

Proposition 1

Consider a load balancing system operating under power-of-d choices, as described in Sect. 2, and let \(\lambda _0\in (0,1)\). If c and \(\beta \) are such that \(d= \lceil c N^\beta \rceil \ge 2\), then for any \(\lambda \in (\lambda _0,1)\):

  1. 1.

    There exists a finite constant C, which is independent of \(\lambda \), \(\beta \), c and N, such that for any positive integer r we have

    $$\begin{aligned} {\mathbb {E}}\left[ \left\| {\overline{{\varvec{q}}}}_\perp \right\| ^r \right] \le C^r \left( \dfrac{N^2}{d-1}\right) ^r r^{r+\frac{1}{2}}. \end{aligned}$$
    (3)
  2. 2.

    Let e be Euler’s constant and \({\overline{C}}{\mathop {=}\limits ^{\triangle }}C \exp \left( {\scriptstyle \frac{1}{2e}}\right) \). Then, for any positive integer r we have

    $$\begin{aligned} {\mathbb {E}}\left[ \left\| {\overline{{\varvec{q}}}}_\perp \right\| ^r \right] ^{\frac{1}{r}}\le {\overline{C}}r \left( \dfrac{N^2}{d-1}\right) . \end{aligned}$$
    (4)
  3. 3.

    Let \(\theta ^*\in {\mathbb {R}}\) be such that \(|\theta ^*|<\frac{1}{2}\log \left( 1 + \frac{\lambda _0(d-1)}{2N^2} \right) \). Then,

    $$\begin{aligned} {\mathbb {E}}\left[ \exp \left( {\scriptstyle \theta ^* \left\| {\overline{{\varvec{q}}}}_\perp \right\| }\right) \right] \le \dfrac{\lambda _0 (d-1) \exp \left( {\scriptstyle \frac{2\theta ^* N^2}{\lambda _0(d-1)}}\right) }{\lambda _0(d-1) + 2N^2 \left( 1- \exp \left( {\scriptstyle 2\theta ^*}\right) \right) }. \end{aligned}$$
    (5)

We prove Proposition 1 in Sect. 5. Before ending this section, we discuss the result and prove a preliminary result that is essential for the rest of this paper.

The bounds (3), (4) and (5) clearly increase to infinity as \(N\rightarrow \infty \), unless \(\beta >2\). However, we are interested in a result that is valid for constant d as well as \(d=N\) (i.e., where \(\beta \in [0,1]\)), so we do not want to add conditions on \(\beta \). Instead, we must consider the scaling of the vector of queue lengths to argue that the bounds (3), (4) and (5) imply SSC. Indeed, Proposition 1 provides a multiplicative SSC result, which means collapse of the scaled vector of queue lengths. Observe that Theorem 1 provides convergence in distribution of the average queue length scaled by \(N^{1-\alpha }\). Therefore, the SSC result should imply that \(N^{1-\alpha }{\overline{{\varvec{q}}}}_\perp ^{(N)}\) is asymptotically negligible. In fact, using the bounds from Proposition 1 we obtain that \(N^{1-\alpha }{\overline{{\varvec{q}}}}_\perp ^{(N)}\) is negligible when \(\alpha +\beta >3\), which is the same condition from Theorem 1 and can be satisfied for constant d or \(d=N\).

We end this section with the result we use to handle the indicator function from the generator matrix G. When one computes the drift of any function, we get a term of the form \(\mathbbm {1}_{\left\{ q_i>0 \right\} }\) for each \(i\in [N]\). As mentioned above, this term represents that service cannot occur at empty queues and, therefore, the queue lengths cannot be negative. In this paper, we handle this indicator function using the property \(\mathbbm {1}_{\left\{ q_i>0 \right\} }=1-\mathbbm {1}_{\left\{ q_i=0 \right\} }\) for every \(i\in [N]\) and the following lemma. In fact, Lemma 1 is repeatedly used in the proof of SSC, in the two proofs that we provide for Theorem 1 and in the proof of Theorem 2.

The result presented in Lemma 1 is more general than the load balancing system described in Sect. 2, and holds for a variety of routing algorithms. All we need is throughput optimality, which we define below for clarity.

Definition 2

We say that a routing algorithm \({\mathcal {A}}\) is throughput optimal for the load balancing system described in Sect. 2 if the Markov chain \(\{{\varvec{q}}(t):t\in {\mathbb {R}}_+ \}\) operating under \({\mathcal {A}}\) is positive recurrent for all \(\lambda \in (0,1)\).

Now we present the result.

Lemma 1

Consider a load balancing system as described in Sect. 2, where the routing policy is throughput optimal. Let \(\lambda \in (0,1)\) be the arrival rate per server, and let \({\overline{{\varvec{q}}}}\) be a steady-state random vector which is limit in distribution of \(\left\{ {\varvec{q}}(t):t\in {\mathbb {R}}_+ \right\} \). Then,

$$\begin{aligned} {\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right] = N(1-\lambda ). \end{aligned}$$

Note that if we use \(\lambda ^{(N)}\) in this lemma, we obtain

$$\begin{aligned} {\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i^{(N)}=0 \right\} } \right] =N^{1-\alpha }. \end{aligned}$$

Now we prove the result.

Proof

(of Lemma 1) Take \(M\in {\mathbb {Z}}_+\), and consider the test function \(V_M({\varvec{q}})=\min \left\{ \sum _{i=1}^N q_i,\, M\right\} \). The proof follows after setting its drift to zero in steady state and letting \(M\rightarrow \infty \).

We first compute the drift, considering the three cases: (i) \(\sum _{i=1}^N q_i<M\), (ii) \(\sum _{i=1}^N q_i=M\), and (iii) \(\sum _{i=1}^N q_i>M\). Observing that \(\Delta V_M({\varvec{q}})\mathbbm {1}_{\left\{ \sum _{i=1}^N q_i>M \right\} }=0\), we obtain

$$\begin{aligned} \Delta V_M({\varvec{q}})&= \mathbbm {1}_{\left\{ \sum _{i=1}^N q_i<M \right\} } \left( \lambda N - \sum _{j=1}^N \mathbbm {1}_{\left\{ q_j>0 \right\} }\right) - \mathbbm {1}_{\left\{ \sum _{i=1}^N q_i =M \right\} } \left( \sum _{j=1}^N \mathbbm {1}_{\left\{ q_j>0 \right\} }\right) \nonumber \\&{\mathop {=}\limits ^{(a)}} \mathbbm {1}_{\left\{ \sum _{i=1}^N q_i <M \right\} }\lambda N - \mathbbm {1}_{\left\{ \sum _{i=1}^N q_i\le M \right\} } \left( \sum _{j=1}^N \mathbbm {1}_{\left\{ q_j>0 \right\} }\right) \end{aligned}$$
(6)

where (a) holds because \(\mathbbm {1}_{\left\{ \sum _{i=1}^N q_i=M \right\} }+\mathbbm {1}_{\left\{ \sum _{i=1}^N q_i<M \right\} } = \mathbbm {1}_{\left\{ \sum _{i=1}^N q_i\le M \right\} }\) by properties of indicator functions, and reorganizing terms.

Observe that \({\mathbb {E}}\left[ |V_M({\varvec{q}}(t))| \right] <\infty \) by definition of the test function \(V_M({\varvec{q}})\), and \(\sup _{{\varvec{q}}\in {\mathbb {Z}}_+^N}|G_{{\varvec{q}},{\varvec{q}}}| =N(\lambda +1)<\infty \) by definition of G in (1). Hence, \({\mathbb {E}}\left[ \Delta V_M({\overline{{\varvec{q}}}}) \right] =0\) in steady state. Applying this property to (6) and reorganizing terms, we obtain

$$\begin{aligned} \lambda N {\mathbb {P}}\left( \sum _{i=1}^N {\overline{q}}_i<M\right) = {\mathbb {E}}\left[ \mathbbm {1}_{\left\{ \sum _{i=1}^N {\overline{q}}_i\le M \right\} } \left( \sum _{j=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_j>0 \right\} }\right) \right] . \end{aligned}$$

Now we take the limit as \(M\rightarrow \infty \). Observe that \(\{{\varvec{q}}(t):t\in {\mathbb {R}}_+ \}\) is positive recurrent and, therefore, \( {\mathbb {P}}\left( \sum _{i=1}^N {\overline{q}}_i<\infty \right) =1\). Hence, we obtain

$$\begin{aligned} \lambda N&= {\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i>0 \right\} } \right] = N-{\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right] . \end{aligned}$$

The result holds after reorganizing terms. \(\square \)

Observe that in Lemma 1 we do not need to assume that routing occurs according to power-of-d choices. Even though the drift is defined in terms of the generator matrix G, and this matrix changes with the routing algorithm, the proof of Lemma 1 does not use the details of G. We only use that, if there is an arrival, the total queue length increases by 1 and, if there is a departure (when the system is not empty), the total queue length decreases by 1.

Remark 2

An alternative proof of Lemma 1 is using Little’s law as follows. The expected number of busy servers \({\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i>0 \right\} } \right] \) equals the expected arrival rate \(\lambda N\) multiplied by the expected time a job is processed, which is 1. We presented a proof using the Drift method above to highlight the similarities between the method in continuous and discrete-time systems.

Remark 3

In the proof of Lemma 1, we use the test function \(V_M({\varvec{q}})\), which is bounded by M with probability 1, even if the expected total queue length is infinite. If we additionally assume that \({\mathbb {E}}\left[ {\overline{q}}_\Sigma \right] <\infty \), we can instead use the linear test function \(V_\ell ({\varvec{q}})=\sum _{i=1}^N q_i\) to prove the result. Whether \({\mathbb {E}}\left[ {\overline{q}}_\Sigma \right] <\infty \) or not depends on the routing policy. For example, random routing, power-of-d choices and JSQ ensure that the expected total queue length is finite in steady state for any finite N.

In the last remark we claim that three routing policies yield finite expected total queue length for the load balancing system. The first observation for this proof is that it suffices to show that the expected total queue length is finite under random routing. Among the three routing policies, random routing results in the longest expected number of jobs in the system because it does not optimize the choice of the server where the new job goes.

To prove that random routing yields finite total expected queue length, there are several options. For brevity, we only discuss two methods here. One option is the Foster-Lyapunov approach, as indicated in [23, Corollaries 6.15 and 6.18]. In this approach we bound the drift of the test function \(V_2({\varvec{q}})=\Vert {\varvec{q}}\Vert ^2\) to obtain the result.

Another option is using the simplicity of random routing to compute the its total expected queue length. Since the arrivals to the system follow a Poisson process and routing is random (and independent of the arrival process), the splitting property of the Poisson process implies that arrivals to each queue also follow a Poisson process. Hence, random routing results in N M/M/1 queues, and the mean queue length of an M/M/1 queue is known (see [23, Section 6.6], for example).

Remark 4

In the classical heavy-traffic regime, one defines the heavy-traffic parameter as \(\epsilon {\mathop {=}\limits ^{\triangle }}\mu _\Sigma -\lambda _\Sigma \), where \(\mu _\Sigma \) is the total service rate (the sum of the mean service rate of each server) and \(\lambda _\Sigma \) is the total arrival rate. Then, one parametrizes the vector of queue lengths by \(\epsilon \) and can show that \(\epsilon \frac{1}{N}\sum _{i=1}^N {\overline{q}}^{(\epsilon )}_i\Rightarrow {\tilde{\Upsilon }}\), where \({\tilde{\Upsilon }}\) is an exponential random variable whose mean depends on the variance of the arrival and service processes. Further, one can show that \(\epsilon {\overline{{\varvec{q}}}}^{(\epsilon )}\Rightarrow {\varvec{1}}{\tilde{\Upsilon }}\) [15, 17, 30].

In this paper we study the many-server heavy-traffic regime, and our goal is to find the value of \(\alpha \) such that the scaled average queue length converges in distribution to an exponential random variable. In Theorem 1 we show convergence in distribution of the total queue length scaled by \(N^{-\alpha }\), which is equivalent to the average queue length scaled by \(N^{1-\alpha }\). Additionally, observe that the difference between the total service and arrival rate in this paper is \(N^{1-\alpha }\). In other words, in the many-server heavy-traffic regime, \(N^{1-\alpha }\) plays the role of the heavy-traffic parameter \(\epsilon \). Further, in the classical heavy-traffic regime there is an analogous result to Lemma 1, which is key to bound the so-called unused service.

Remark 5

As mentioned above, Lemma 1 is essential in the analysis. In discrete-time systems, there is an analogous of the indicator functions, known as unused service. In simple words, one models the number of jobs that each server processes in one time slot with the potential service, which is a random variable independent of the queue lengths and arrival processes. Then, the number of processed jobs is the minimum between the potential service and the number of jobs in the queue. The difference between the potential and actual service is the unused service. Hence, similarly to the indicator functions, the unused service prevents the queue lengths to go negative. The key property of the unused service is that, whenever it’s positive, the queue is empty in the next time slot. This property is analogous to the indicator function \(\mathbbm {1}_{\left\{ q_i=0 \right\} }\), which is positive only when the queues are empty and, hence, the server cannot process a job. This property of the unused service is key in the analysis of discrete-time systems, as it greatly facilitates the computation of performance measures such as the moments of the queue lengths and their distribution [15, 28, 30, 35, 36]. We will show that handling the indicator function \(\mathbbm {1}_{\left\{ q_i=0 \right\} }\) is key in the proofs we provide of Theorem 1.

3 Transform method: Proof of Theorem 1

The first proof of Theorem 1 that we present is motivated by the Transform method introduced in [30]. Before providing the proof, we briefly summarize the main idea and the steps of the method.

3.1 Review of the Transform method introduced in [30]

In [30] the authors introduce a Transform method based on the Drift method [15, 28, 35, 36], to compute the heavy-traffic distribution of the vector of queue lengths of SPNs that satisfy the CRP condition, i.e., that behave as a single-server queue in the limit.

3.1.1 Overview

The Transform method introduced in [30] uses an exponential transformation of the queue lengths to compute their heavy-traffic distribution. This exponential transformation can be the characteristic function, the MGF or the one-sided Laplace transform. In [30] the focus is on using the MGF. The authors propose a two-step procedure that yields the limiting MGF of the vector of queue lengths. In the first step, the goal is to obtain an implicit equation for the MGF of the queue lengths that is valid for all traffic. Then, in the second step, the heavy-traffic limit is taken and one needs to bound some terms to obtain an explicit limiting MGF. In the case of the SPNs studied in [30], this limit corresponds to the MGF of an exponential random variable.

3.1.2 Prerequisites for the MGF method

The two prerequisites of the MGF method are positive recurrence of the vector of queue lengths and SSC to a one-dimensional subspace. In the first step of the MGF method, the idea is to use the key unused service property described in Remark 5 to obtain an equation that is valid for all traffic. In this step, using SSC is key to obtain an expression that yields the exact MGF in the heavy-traffic limit. Then, one sets to zero the drift of the MGF and obtains an implicit equation, which depends on the MGF of the arrivals, the potential service and the unused service. In the second step of the MGF method, the goal is to bound the MGF of the unused service and compute the heavy-traffic limit of the MGF of the scaled queue lengths.

In Sect. 3.2 we use the key ideas of the MGF method described above, and we extend them to be used for the load balancing system modeled in continuous time in the many-server heavy-traffic regime (as described in Sect. 2). In this case, the role of the unused service is played by the indicator functions \(\mathbbm {1}_{\left\{ q_i(t)=0 \right\} }\) for \(i\in [N]\), since no job can be served from the \(i{}^{\text {th}}\) queue if \(q_i(t)=0\). In fact, this indicator function satisfies the key property \(\mathbbm {1}_{\left\{ q_i=0 \right\} }q_i=0\) with probability 1 for all \(i\in [N]\), which written in ‘exponential form’ yields \(\mathbbm {1}_{\left\{ q_i=0 \right\} }=\mathbbm {1}_{\left\{ q_i=0 \right\} }\exp \left( {\scriptstyle {\tilde{\theta }}q_i}\right) \), where \({\tilde{\theta }}\) is any real number. We present the details of the proof in Sect. 3.2.

3.2 Proof of Theorem 1 using the Transform method

In this proof we use one-sided Laplace transform to illustrate a different transform that falls in the scope of the work of [30]. The exponential equation required for Step 1 is accomplished by the following lemma.

Lemma 2

Consider a load balancing system operating under power-of-d choices, with \(d=\lceil cN^\beta \rceil \), as described in Theorem 1. Given a vector \({\varvec{q}}\in {\mathbb {Z}}_+^N\) and \(N\in {\mathbb {Z}}_+\), define

$$\begin{aligned} \phi ({\varvec{q}},N) {\mathop {=}\limits ^{\triangle }}\left( \exp \left( {\scriptstyle \theta N^{-\alpha } q_\Sigma }\right) -1 \right) \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ q_i=0 \right\} }\right) , \end{aligned}$$

where \(q_\Sigma {\mathop {=}\limits ^{\triangle }}\sum _{i=1}^N q_i\). For every \(N\in {\mathbb {Z}}_+\), if

$$\begin{aligned} |\theta | < \dfrac{1}{4\lceil \alpha -1\rceil \lceil \log (N)\rceil N^{1-\alpha }}\log \left( 1+\tfrac{\lambda _0(d-1)}{2N^2} \right) , \end{aligned}$$

then,

$$\begin{aligned}&\left| {\mathbb {E}}\left[ \phi \left( {\overline{{\varvec{q}}}}^{(N)},N\right) \right] \right| \\&\le 2{\overline{C}}\lambda _0 |\theta | \dfrac{N^{(1-\alpha )\left( 1-\frac{1}{r}\right) } \lceil \alpha -1\rceil \lceil \log (N)\rceil N^2\exp \left( {\scriptstyle \tfrac{4|\theta |\lceil \alpha -1\rceil \lceil \log (N)\rceil N^{3-\alpha }}{\lambda _0(d-1)}}\right) }{\lambda _0(d-1) + 2N^2\left( 1-\exp \left( {\scriptstyle 4|\theta | \lceil \alpha -1\rceil \lceil \log (N)\rceil N^{1-\alpha }}\right) \right) }. \end{aligned}$$

This implies that, if \(\alpha +\beta >3\), then \({\mathbb {E}}\left[ \phi \left( {\overline{{\varvec{q}}}}^{(N)},N\right) \right] \) is \(o(N^{1-\alpha })\).

The proof of Lemma 2 is presented in Appendix A.1, and heavily uses the SSC result presented in Proposition 1. Now we prove the theorem.

Proof

(of Theorem 1 using Transform method) We omit the dependence on N of the variables, and we work with d instead of \(\lceil cN^\beta \rceil \) for ease of exposition. This proof is based on the use of the test function \(V_{\exp }({\varvec{q}}){\mathop {=}\limits ^{\triangle }}\exp \left( {\scriptstyle \theta N^{-\alpha }q_\Sigma }\right) \), where \(\theta <0\). Using the definition of drift, we obtain that for any \({\varvec{q}}\in {\mathbb {Z}}_+^N\)

$$\begin{aligned}&\Delta V_{\exp }({\varvec{q}}) \\&= \exp \left( {\scriptstyle \theta N^{-\alpha }q_\Sigma }\right) \left( \sum _{i=1}^N \lambda N \dfrac{\left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) }{\left( {\begin{array}{c}N\\ d\end{array}}\right) } \left( \exp \left( {\scriptstyle \theta N^{-\alpha }}\right) -1\right) + N \left( \exp \left( {\scriptstyle -\theta N^{-\alpha }}\right) -1\right) \right) \\&\quad - \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ q_i=0 \right\} }\right) \exp \left( {\scriptstyle \theta N^{-\alpha }q_\Sigma }\right) \left( \exp \left( {\scriptstyle -\theta N^{-\alpha }}\right) -1\right) \\&{\mathop {=}\limits ^{(a)}} \exp \left( {\scriptstyle \theta N^{-\alpha }q_\Sigma }\right) \left( \lambda N(\exp \left( {\scriptstyle \theta N^{-\alpha }}\right) -1) + \left( N-\sum _{i=1}^N \mathbbm {1}_{\left\{ q_i=0 \right\} }\right) (\exp \left( {\scriptstyle -\theta N^{-\alpha }}\right) -1) \right) \\&{\mathop {=}\limits ^{(b)}} \left( \exp \left( {\scriptstyle -\theta N^{-\alpha }}\right) -1\right) \exp \left( {\scriptstyle \theta N^{-\alpha }{\overline{q}}_\Sigma }\right) \left( N \left( 1-\lambda \exp \left( {\scriptstyle \theta N^{-\alpha }}\right) \right) - \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right) \\&{\mathop {=}\limits ^{(c)}} \left( \exp \left( {\scriptstyle -\theta N^{-\alpha }}\right) -1\right) \left( \exp \left( {\scriptstyle \theta N^{-\alpha }{\overline{q}}_\Sigma }\right) N \left( 1-\lambda \exp \left( {\scriptstyle \theta N^{-\alpha }}\right) \right) \right. \\ {}&\quad \left. - \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }-\phi ({\varvec{q}},N) \right) , \end{aligned}$$

where (a) holds because \(\sum _{i=1}^N \left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) = \left( {\begin{array}{c}N\\ d\end{array}}\right) \); (b) holds by factorizing the term \((\exp \left( {\scriptstyle -\theta N^{-\alpha }}\right) -1)\) and rearranging terms; and (c) holds for the function \(\phi ({\varvec{q}},N)\) defined in Lemma 2.

Now we set the drift of \(V_{\exp }({\varvec{q}})\) to zero in steady state. Observe that, since \(\theta <0\), we have \({\mathbb {E}}\left[ \exp \left( {\scriptstyle \theta N^{-\alpha }{\overline{q}}_\Sigma }\right) \right] \le 1\). Additionally, \(G_{{\varvec{q}},{\varvec{q}}}\le N(\lambda +1)<\infty \). Then, we know \({\mathbb {E}}\left[ \Delta V_{\exp }({\overline{{\varvec{q}}}}) \right] =0\). Therefore, taking expected value with respect to stationary distribution in the expression above, replacing \(\lambda =1-N^{-\alpha }\), using Lemma 1 and rearranging terms we obtain

$$\begin{aligned} {\mathbb {E}}\left[ \theta N^{-\alpha }{\overline{q}}_\Sigma \right]&= \dfrac{N^{1-\alpha } + {\mathbb {E}}\left[ \phi ({\overline{{\varvec{q}}}},N) \right] }{N\left( 1-(1-N^{-\alpha })\exp \left( {\scriptstyle \theta N^{-\alpha }}\right) \right) }. \end{aligned}$$
(7)

This completes Step 1. Observe that (7) gives an expression for the one-sided Laplace transform of \(N^{-\alpha }{\overline{q}}_\Sigma \) that is valid for all N. However, the numerator depends on \({\mathbb {E}}\left[ \phi ({\overline{{\varvec{q}}}},N) \right] \), which depends on the queue lengths.

Now we move to the second step, where the goal is to take the many-server heavy-traffic limit. The fraction (7) is of the form \(\frac{0}{0}\) in the limit as \(N\rightarrow \infty \), so we take Taylor expansion of the exponential function in the denominator. Expanding up to second order and canceling the factor \(N^{1-\alpha }\) from the numerator and the denominator we obtain

$$\begin{aligned} {\mathbb {E}}\left[ \exp \left( {\scriptstyle \theta N^{-\alpha }{\overline{q}}_\Sigma }\right) \right]&= \dfrac{1+N^{\alpha -1}{\mathbb {E}}\left[ \phi ({\overline{{\varvec{q}}}},N) \right] }{1-\theta + O(N^{-\alpha })}. \end{aligned}$$

Finally, taking the limit as \(N\rightarrow \infty \) we obtain

$$\begin{aligned} \lim _{N\rightarrow \infty } {\mathbb {E}}\left[ \exp \left( {\scriptstyle \theta N^{-\alpha } {\overline{q}}_\Sigma }\right) \right]&= \dfrac{1}{1-\theta }, \end{aligned}$$

which is the one-sided Laplace transform of an exponential random variable with mean 1. \(\square \)

Observe that (7) is valid for all N. Hence, it can be used to obtain an error bound and rate of convergence between \({\mathbb {E}}\left[ \exp \left( {\scriptstyle \theta N^{-\alpha }{\overline{q}}_\Sigma }\right) \right] \) and \(\frac{1}{1-\theta }\), which is the limiting one-sided Laplace transform. In this paper we do not perform this step for brevity.

3.3 Rate of convergence of the first moment

In Theorem 1 we showed convergence in distribution of the average queue length scaled by \(N^{1-\alpha }\) (or, equivalently, the total queue length scaled by \(N^{-\alpha }\)). However, convergence in distribution is not a sufficient condition to conclude convergence of the expected value. In other words, in Theorem 1 we showed \(N^{-\alpha }{\overline{q}}_\Sigma ^{(N)}\Rightarrow \Upsilon \), where \(\Upsilon \) is an exponential random variable with mean 1. However, from this statement we cannot directly conclude that \(\lim _{N\rightarrow \infty }{\mathbb {E}}\left[ N^{-\alpha }{\overline{q}}_\Sigma ^{(N)} \right] =1\). In this section we show that the last result holds using the Drift method [15, 35, 36, 53]. We first state the result formally.

Theorem 2

Consider a sequence of load balancing systems operating under power-of-d with \(d=\lceil cN^\beta \rceil \), parametrized by N as described in Sect. 2. If \(d\ge 2\), then

$$\begin{aligned} \left| {\mathbb {E}}\left[ \sum _{i=1}^N {\overline{q}}_i \right] - N^{\alpha }\right| \le 1 +\left( \dfrac{{\overline{C}}e}{c}\right) \left\lceil \alpha -1\right\rceil \left\lceil \log (N)\right\rceil \left( \frac{\lceil cN^\beta \rceil }{\lceil cN^\beta \rceil -1}\right) N^{3-\beta }, \end{aligned}$$
(8)

where \({\overline{C}}\) is the constant from Proposition 1. Additionally, if \(\alpha +\beta >3\), then

$$\begin{aligned} \lim _{N\rightarrow \infty } N^{-\alpha }{\mathbb {E}}\left[ {\overline{q}}_\Sigma ^{(N)} \right] =1. \end{aligned}$$

Note that the second part of the theorem is an immediate consequence of the error bound because, after multiplying everything by \(N^{-\alpha }\), the right-hand side of (8) converges to zero as \(N\rightarrow \infty \).

Similarly to Theorem 1, the case of power-of-d choices and JSQ are immediate consequences of Theorem 2. We formally state them below.

Corollary 3

Consider a sequence of load balancing systems operating under power-of-d choices with constant d, parametrized by N as described in Sect. 2. If \(d\ge 2\) and \(\alpha >3\), then

$$\begin{aligned} \lim _{N\rightarrow \infty } N^{-\alpha }{\mathbb {E}}\left[ {\overline{q}}_\Sigma ^{(N)} \right] =1. \end{aligned}$$

The proof of Corollary 3 holds easily after letting \(\beta =0\) in Theorem 2. Now we present the formal result for JSQ routing.

Corollary 4

Consider a sequence of load balancing systems operating under JSQ, parametrized by N as described in Sect. 2. If \(\alpha >2\), then

$$\begin{aligned} \lim _{N\rightarrow \infty } N^{-\alpha }{\mathbb {E}}\left[ {\overline{q}}_\Sigma ^{(N)} \right] =1. \end{aligned}$$

The proof of Corollary 4 holds after realizing that JSQ is equivalent to power-of-d choices with \(d=N\). Hence, it suffices to replace \(c=\beta =1\) in Theorem 2.

In the rest of this section, we prove Theorem 2 using the Drift method. In the Drift method there are two main steps. First, one shows SSC (which we did in Proposition 1), and secondly, one sets to zero the drift of \(V_\parallel ({\varvec{q}})=\Vert {\varvec{q}}_\parallel \Vert ^2\) in steady state (provided that its expectation is finite). To perform the second step, we first compute the drift of the test function \(V_\parallel ({\varvec{q}}){\mathop {=}\limits ^{\triangle }}\left\| {\varvec{q}}_\parallel \right\| ^2\). We provide the result in the following auxiliary lemma.

Lemma 3

Consider a load balancing system as described in Sect. 2. Let \(V_\parallel ({\varvec{q}}){\mathop {=}\limits ^{\triangle }}\left\| {\varvec{q}}_\parallel \right\| ^2\). Then,

$$\begin{aligned} \Delta V_\parallel ({\varvec{q}}) = \lambda \sum _{i=1}^N \dfrac{\left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) }{\left( {\begin{array}{c}N\\ d\end{array}}\right) } \left( 1 + 2\sum _{j=1}^N q_j\right) + \frac{1}{N} \sum _{i=1}^N \left( 1-\mathbbm {1}_{\left\{ q_i=0 \right\} }\right) \left( 1-2\sum _{j=1}^N q_j \right) . \end{aligned}$$

We present the proof of Lemma 3 in Appendix A.2.

Proof

(of Theorem 2) Similarly to our previous proofs, we omit the dependence on N of the variables and we work with d instead of \(\lceil cN^\beta \rceil \) for ease of exposition. We start computing the drift of \(V_\parallel ({\varvec{q}})=\Vert {\varvec{q}}_\parallel \Vert ^2\). By Lemma 3, and since \(\sum _{i=1}^N \left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) =\left( {\begin{array}{c}N\\ d\end{array}}\right) \), we obtain

$$\begin{aligned} \Delta V_\parallel ({\varvec{q}})&= \lambda \left( 1 + 2\sum _{i=1}^N q_i\right) + \dfrac{1}{N}\left( N - \sum _{i=1}^N \mathbbm {1}_{\left\{ q_i=0 \right\} }\right) \left( 1-2\sum _{i=1}^N q_i\right) . \end{aligned}$$

Now we set the drift of \(V_\parallel ({\varvec{q}})\) to zero. We skip the proof of \({\mathbb {E}}\left[ V_\parallel ({\overline{{\varvec{q}}}}) \right] <\infty \) for ease of exposition. Taking expectation with respect to the stationary distribution, replacing \(\lambda =1-N^{-\alpha }\), using Lemma 1 to replace \({\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right] =N^{1-\alpha }\) and reorganizing terms, we obtain:

$$\begin{aligned} N^{-\alpha }{\mathbb {E}}\left[ \sum _{i=1}^N {\overline{q}}_i \right]&= 1-N^{-\alpha } + \frac{1}{N}{\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right) \left( \sum _{j=1}^N {\overline{q}}_j\right) \right] . \end{aligned}$$
(9)

We bound the last term of (9) using SSC. Specifically, we use (4), which establishes that for any positive integer r we have

$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\overline{{\varvec{q}}}}_\perp \right\| ^r \right] ^{\frac{1}{r}}\le {\overline{C}}r \left( \dfrac{N^2}{d-1}\right) , \end{aligned}$$

where \({\overline{C}}\) is a constant.

First, note \(\mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }{\overline{q}}_i=0\) with probability 1 for all \(i\in [N]\). Then,

$$\begin{aligned} \frac{1}{N}\left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right) \left( \sum _{j=1}^N {\overline{q}}_j\right)&= \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\left( \frac{1}{N}\sum _{j=1}^N {\overline{q}}_j - {\overline{q}}_i \right) \\&{\mathop {=}\limits ^{(a)}} - \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }{\overline{q}}_{\perp i}, \end{aligned}$$

where \({\overline{q}}_{\perp i}\) is the \(i{}^{\text {th}}\) element of \({\overline{{\varvec{q}}}}_\perp \) and (a) holds by the definition of \({\overline{{\varvec{q}}}}_\perp \) in (2). Then,

$$\begin{aligned} \frac{1}{N} \left| \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right) \left( \sum _{j=1}^N {\overline{q}}_j\right) \right|&= \left| {\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }{\overline{q}}_{\perp i} \right] \right| \\&{\mathop {\le }\limits ^{(a)}} {\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right] ^{1-\frac{1}{r}} {\mathbb {E}}\left[ \Vert {\overline{{\varvec{q}}}}_\perp \Vert _r^r \right] ^{\frac{1}{r}} \\&{\mathop {\le }\limits ^{(b)}} N^{(1-\alpha )\left( 1-\frac{1}{r}\right) } {\overline{C}}r \left( \dfrac{N^2}{\lceil cN^\beta \rceil -1}\right) \\&= N^{(1-\alpha )\left( 1-\frac{1}{r}\right) } \frac{{\overline{C}}}{c} r N^{2-\beta } \left( \dfrac{cN^\beta }{ \lceil cN^\beta \rceil -1}\right) \\&= \left( \frac{{\overline{C}}}{c}\right) r N^{3-\alpha -\beta } N^{\frac{\alpha -1}{r}}\left( \dfrac{cN^\beta }{ \lceil cN^\beta \rceil -1}\right) \\&{\mathop {\le }\limits ^{(c)}} \left( \dfrac{{\overline{C}}e}{c}\right) \left\lceil \alpha -1\right\rceil \left\lceil \log (N)\right\rceil \\&\quad \times \left( \frac{\lceil cN^\beta \rceil }{\lceil cN^\beta \rceil -1}\right) N^{3-\alpha -\beta }, \end{aligned}$$

where r is a positive integer. Here, (a) holds by Hölder’s inequality; (b) holds by Lemma 1 and by Proposition 1 for \(r\ge 2\) because of the inequalities of norms; and (c) holds by setting \(r=\left\lceil \alpha -1\right\rceil \left\lceil \log (N)\right\rceil \), because \(N^{\frac{\alpha -1}{\left\lceil \alpha -1\right\rceil \left\lceil \log (N)\right\rceil }}\le e\), and because \(cN^\beta \le \lceil cN^\beta \rceil \) by definition of the ceiling function. Using this result in (9), we obtain

$$\begin{aligned} \left| N^{-\alpha }{\mathbb {E}}\left[ \sum _{i=1}^N {\overline{q}}_i \right] - 1 \right| \le N^{-\alpha } +\left( \dfrac{{\overline{C}}e}{c}\right) \left\lceil \alpha -1\right\rceil \left\lceil \log (N)\right\rceil \left( \frac{\lceil cN^\beta \rceil }{\lceil cN^\beta \rceil -1}\right) N^{3-\alpha -\beta }. \end{aligned}$$

This proves the theorem. \(\square \)

As we show in Sect. 4, using Stein’s method one immediately obtains both: convergence of the queue lengths in distribution and in expected value. However, the Transform method we used in Sect. 3.2 only provides convergence in distribution. For completeness, we provided the proof of convergence in mean in the last subsection, and we used the Drift method for consistency. An alternate path to prove both types of convergence using Transform methods would be to consider \(\theta \in {\mathbb {R}}\), as opposed to \(\theta <0\), and obtain convergence of the MGF (also known as two-sided Laplace transform). The proof is essentially the same as we developed in this section, with the exception that, to set the drift to zero, one needs to show that \({\mathbb {E}}\left[ \exp \left( {\scriptstyle \theta N^{-\alpha }{\overline{q}}_\Sigma ^{(N)}}\right) \right] <\infty \) for \(|\theta |<\Theta \), where \(\Theta \) is a constant independent of N. This step might be challenging in some systems.

4 Rate of convergence in Wasserstein’s distance

The proof we provide in this section is based on bounding the Wasserstein’s distance between the scaled total queue length and an exponential random variable. We start with the definition of this metric as presented in [44].

Definition 3

For two probability measures \(\nu _1\) and \(\nu _2\), the Wasserstein’s distance between them is

$$\begin{aligned} d_W(\nu _1,\nu _2) {\mathop {=}\limits ^{\triangle }}\sup _{h\in \text {Lip(1)}} \left| \int h(x)\,\mathrm{d}\nu _1(x) - \int h(x)\,\mathrm{d}\nu _2(x) \right| , \end{aligned}$$

where Lip(1)\({\mathop {=}\limits ^{\triangle }}\left\{ h:{\mathbb {R}}\rightarrow {\mathbb {R}}\text { such that } |h(x)-h(y)|\le |x-y| \right\} \) is the set of Lipschitz functions with constant 1.

For random variables X and Y with laws \(\nu _1\) and \(\nu _2\), respectively, we write \(d_W(X,Y)\) instead of \(d_W(\nu _1,\nu _2)\), and when the measures are clear from the context we write

$$\begin{aligned} d_W(X,Y) = \sup _{h\in \text {Lip(1)}} \left| {\mathbb {E}}\left[ h(X) \right] - {\mathbb {E}}\left[ h(Y) \right] \right| . \end{aligned}$$

In the rest of this section we prove the following theorem.

Theorem 3

Consider a load balancing system operating under power-of-d choices with \(d=\lceil cN^\beta \rceil \), as described in Theorem 1. Let \(\Upsilon \) be an exponential random variable with mean 1. Then,

$$\begin{aligned} \begin{aligned}&d_W\left( (1-\lambda ){\overline{q}}_\Sigma , \Upsilon \right) \le {\overline{C}}e \left( \dfrac{N^3(1-\lambda )}{d -1}\right) \left\lceil \log \left( \tfrac{1}{N(1-\lambda )}\right) \right\rceil + \dfrac{5}{3}(1-\lambda ), \end{aligned} \end{aligned}$$
(10)

where \({\overline{C}}\) is the constant from Proposition 1.

Note that if we let \(\lambda =1-N^{-\alpha }\) and \(\alpha +\beta >3\), the right-hand side of (10) converges to zero as \(N\rightarrow \infty \). It is known that convergence to zero of the Wasserstein’s distance implies convergence in distribution [20, Theorem 2]. Therefore, since one of the assumptions of Theorem 1 is that \(\alpha +\beta >3\), we can prove Theorem 1 as a consequence of Theorem 3.

The Wasserstein’s distance considers the family of all Lipschitz-1 functions. Then, one can use specific functions to obtain the rate of convergence of a variety of functions of \((1-\lambda ){\overline{q}}_\Sigma \) and \(\Upsilon \). In the next corollary, we show how to obtain the rate of convergence of the mean as a consequence of Theorem 3.

Corollary 5

Consider a load balancing system as described in Theorem 3. Then,

$$\begin{aligned} \left| {\mathbb {E}}\left[ (1-\lambda ) {\overline{q}}_\Sigma \right] - 1\right| \le {\overline{C}}e \left( \dfrac{N^3(1-\lambda )}{d -1}\right) \left\lceil \log \left( \tfrac{1}{N(1-\lambda )}\right) \right\rceil + \dfrac{5}{3}(1-\lambda ). \end{aligned}$$

The proof of Corollary 5 holds by noticing that \(f(x)=x\) is a Lipschitz-1 function and, hence,

$$\begin{aligned} \left| {\mathbb {E}}\left[ (1-\lambda ){\overline{q}}_\Sigma \right] - 1 \right| \le d_W\left( (1-\lambda ){\overline{q}}_\Sigma , \Upsilon \right) . \end{aligned}$$

This approach is an alternate proof to Theorem 2, and shows the power of Stein’s method.

Now we prove Theorem 3. We start with a result presented in [44, Theorem 5.4 part 1].

Lemma 4

Let Y be a random variable with \({\mathbb {E}}\left[ Y \right] <\infty \), and let \(\Upsilon \) be an exponential random variable with mean 1. Define

$$\begin{aligned} {\mathcal {F}}_W {\mathop {=}\limits ^{\triangle }}\left\{ g:{\mathbb {R}}\rightarrow {\mathbb {R}}\text { such that } g(0)=0,\, \Vert g'\Vert \le 1,\, \Vert g''\Vert \le 2 \right\} . \end{aligned}$$

Then,

$$\begin{aligned} d_W(Y,\Upsilon )\le \sup _{g\in {\mathcal {F}}_W} \left| {\mathbb {E}}\left[ g'(Y)-g(Y) \right] \right| . \end{aligned}$$

Now we prove the theorem.

Proof

(of Theorem 3) Similarly to all our previous proofs, for ease of exposition we omit the dependence on N of the variables and we use d instead of \(\lceil cN^\beta \rceil \). We use Lemma 4 with \(Y=(1-\lambda ){\overline{q}}_\Sigma \). Let f be a differentiable function such that \(g=f'\in {\mathcal {F}}_W\). By assuming differentiability we do not lose generality, for the following reason. Observe that \(f'\in {\mathcal {F}}_W\) implies that \(f\in \text {Lip(1)}\) and, hence, it implies that f is integrable. Therefore, if \(f'\in {\mathcal {F}}_W\), then f is well defined [56, Theorem 7.2].

By definition of drift, for any vector \({\varvec{q}}\in {\mathbb {Z}}_+^N\), we have

$$\begin{aligned}&\Delta f\left( (1-\lambda )q_\Sigma \right) \\&= \sum _{i=1}^N \lambda N \dfrac{\left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) }{\left( {\begin{array}{c}N\\ d\end{array}}\right) } \bigg (f\big ((1-\lambda )q_\Sigma + 1-\lambda \big ) - f\big ((1-\lambda )q_\Sigma \big ) \bigg ) \\&\quad + \sum _{i=1}^N \left( 1-\mathbbm {1}_{\left\{ q_i=0 \right\} }\right) \bigg ( f\big ((1-\lambda )q_\Sigma - 1+\lambda \big ) - f\big ((1-\lambda )q_\Sigma \big ) \bigg ) \\&{\mathop {=}\limits ^{(a)}} \lambda N \bigg (f\big ((1-\lambda )q_\Sigma + 1-\lambda \big ) - f\big ((1-\lambda )q_\Sigma \big ) \bigg ) \\&\quad + \left( N-\sum _{i=1}^N \mathbbm {1}_{\left\{ q_i=0 \right\} }\right) \bigg (f\big ((1-\lambda ) q_\Sigma - (1-\lambda )\big ) - f\big ((1-\lambda )q_\Sigma \big ) \bigg ) \\&{\mathop {=}\limits ^{(b)}}\lambda N \left( (1-\lambda )f'\big ((1-\lambda )q_\Sigma \big ) + \dfrac{(1-\lambda )^2}{2}f''\big ((1-\lambda )q_\Sigma \big ) + \dfrac{(1-\lambda )^3}{6}f'''\left( \xi _1 \right) \right) \\&\quad + N\left( -(1-\lambda ) f'\big ((1-\lambda )q_\Sigma \big ) + \dfrac{(1-\lambda )^2}{2} f''\big ((1-\lambda )q_\Sigma \big ) - \dfrac{(1-\lambda )^3}{6}f'''(\xi _2) \right) \\&\quad + \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ q_i=0 \right\} }\right) \left( (1-\lambda ) f'\big ((1-\lambda )q_\Sigma \big ) - \dfrac{(1-\lambda )^2}{2} f''\big ((1-\lambda )q_\Sigma \big ) \right) \\&\quad +\left( \sum _{i=1}^N \mathbbm {1}_{\left\{ q_i=0 \right\} }\right) \dfrac{(1-\lambda )^3}{6}f'''(\xi _3) \end{aligned}$$

where \(\xi _1\) is between \((1-\lambda )q_\Sigma \) and \((1-\lambda )\left( q_\Sigma +1\right) \), and \(\xi _2,\xi _3\) are between \((1-\lambda )\) and \((1-\lambda )\left( q_\Sigma -1\right) \) Here, (a) holds because \(\sum _{i=1}^N \left( {\begin{array}{c}N-i\\ d-1\end{array}}\right) =\left( {\begin{array}{c}N\\ d\end{array}}\right) \); and (b) holds by taking Taylor approximation.

Since \(f'\in {\mathcal {F}}_W\), we know that f is integrable. Then, we can set its drift to zero in steady state. Taking expectation with respect to stationary distribution and reorganizing terms, we obtain

$$\begin{aligned}&{\mathbb {E}}\left[ f'\big ((1-\lambda ){\overline{q}}_\Sigma \big ) \right] \\&= \dfrac{1}{N(1-\lambda )} {\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) f'\big ((1-\lambda ){\overline{q}}_\Sigma \big ) \right] + \left( \dfrac{1+\lambda }{2}\right) {\mathbb {E}}\left[ f''\big ((1-\lambda ){\overline{q}}_\Sigma \big ) \right] \\&\quad -\dfrac{1}{2N} {\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) f''\big ((1-\lambda ){\overline{q}}_\Sigma \big ) \right] +\dfrac{\lambda (1-\lambda )}{6}{\mathbb {E}}\left[ f'''(\xi _1) \right] \\&\quad -\left( \dfrac{1-\lambda }{6} \right) {\mathbb {E}}\left[ f'''\left( \xi _2\right) \right] + \left( \dfrac{1-\lambda }{6N}\right) {\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) f'''(\xi _3) \right] . \end{aligned}$$

Using the last expression and the triangle inequality, we have

$$\begin{aligned}&\left| {\mathbb {E}}\left[ f'\big ((1-\lambda ){\overline{q}}_\Sigma \big ) - f''\big ((1-\lambda ){\overline{q}}_\Sigma \big ) \right] \right| \nonumber \\&\le \dfrac{1}{N(1-\lambda )} {\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) \left| f'\big ((1-\lambda ){\overline{q}}_\Sigma \big )\right| \right] + \left| \dfrac{\lambda -1}{2}\right| {\mathbb {E}}\left[ |f''\big ((1-\lambda ){\overline{q}}_\Sigma \big )| \right] \nonumber \\&\quad +\dfrac{1}{2N}{\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) \left| f''\big ((1-\lambda ){\overline{q}}_\Sigma \big )\right| \right] + \dfrac{\lambda (1-\lambda )}{6}{\mathbb {E}}\left[ \left| f'''(\xi _1)\right| \right] \nonumber \\&\quad +\left( \dfrac{1-\lambda }{6} \right) {\mathbb {E}}\left[ \left| f'''\left( \xi _2\right) \right| \right] + \left( \dfrac{1-\lambda }{6N}\right) {\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) \left| f'''(\xi _3)\right| \right] . \end{aligned}$$
(11)

We bound term by term. For the first term we expand \(f'\big ((1-\lambda ){\overline{q}}_\Sigma \big )\) in Taylor series up to first order, around 0. Since \(f'\in {\mathcal {F}}_W\), we know that \(f'(0)=0\). Then, \(f'\big (1-\lambda ){\overline{q}}_\Sigma \big ) = (1-\lambda ){\overline{q}}_\Sigma f''(\xi _4)\), where \(|\xi _4|\in \left( 0,{\overline{q}}_\Sigma \right) \). Therefore, we obtain

$$\begin{aligned}&\dfrac{1}{N(1-\lambda )}{\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) \left| f'\big ((1-\lambda ){\overline{q}}_\Sigma \big )\right| \right] \\&\quad = \dfrac{1}{N}{\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) {\overline{q}}_\Sigma \left| f''(\xi _4)\right| \right] \\&\quad {\mathop {\le }\limits ^{(a)}} \dfrac{1}{N}{\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) {\overline{q}}_\Sigma \right] \\&\quad {\mathop {=}\limits ^{(b)}} {\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\left( \dfrac{{\overline{q}}_\Sigma }{N}-{\overline{q}}_i\right) \right] \\&\quad {\mathop {=}\limits ^{(c)}} -{\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } {\overline{q}}_{\perp i} \right] \\&\quad {\mathop {\le }\limits ^{(d)}} {\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right] ^{1-\frac{1}{r}} {\mathbb {E}}\left[ \left\| {\overline{{\varvec{q}}}}_\perp \right\| _r^r \right] ^{\frac{1}{r}} \\&\quad {\mathop {\le }\limits ^{(e)}} (1-\lambda )^{1-\frac{1}{r}} N^{1-\frac{1}{r}} {\mathbb {E}}\left[ \left\| {\overline{{\varvec{q}}}}_\perp \right\| ^r \right] ^{\frac{1}{r}}\\&\quad {\mathop {\le }\limits ^{(f)}} {\overline{C}}(1-\lambda )^{1-\frac{1}{r}} r \left( \dfrac{N^{3-\frac{1}{r}}}{d-1}\right) , \end{aligned}$$

where \(r>1\). Here, (a) holds because \(f'\in {\mathcal {F}}_W\) and, hence, \(\left| f''(\xi _4)\right| \le 1\); (b) holds because \(\mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }{\overline{q}}_i=0\) for all \(i\in [N]\); (c) holds by definition of \({\overline{{\varvec{q}}}}_\perp \) in (2); (d) holds by Hölder’s inequality; (e) holds because for any \(r\ge 2\) the \(r{}^{\text {th}}\) norm lower bounds Euclidean norm, and because \({\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right] =N^{1-\alpha }=N(1-\lambda )\) by Lemma 1; and (f) holds because, by SSC in Proposition 1, we have \({\mathbb {E}}\left[ \left\| {\overline{{\varvec{q}}}}_\perp \right\| ^r \right] ^{\frac{1}{r}}\le {\overline{C}}r \left( \frac{N^2}{d-1}\right) \).

Taking \(r=\left\lceil \log \left( \frac{1}{N(1-\lambda )}\right) \right\rceil \), we obtain

$$\begin{aligned} \begin{aligned}&\dfrac{1}{N(1-\lambda )}{\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) \left| f'\big ((1-\lambda ){\overline{q}}_\Sigma \big )\right| \right] \\&\le {\overline{C}}e \left( \dfrac{N^3(1-\lambda )}{d-1}\right) \left\lceil \log \left( \tfrac{1}{N(1-\lambda )}\right) \right\rceil . \end{aligned} \end{aligned}$$
(12)

For the second term, since \(f'\in {\mathcal {F}}_W\) we obtain

$$\begin{aligned} \left( \dfrac{1-\lambda }{2}\right) {\mathbb {E}}\left[ \left| f''\big ((1-\lambda ){\overline{q}}_\Sigma \big ) \right| \right] \le \dfrac{1-\lambda }{2}. \end{aligned}$$
(13)

For the third term we obtain

$$\begin{aligned} \dfrac{1}{2N}{\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) \left| f''\big ((1-\lambda ){\overline{q}}_\Sigma \big )\right| \right]&{\mathop {\le }\limits ^{(a)}} \dfrac{1}{2N} {\mathbb {E}}\left[ \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} } \right] {\mathop {=}\limits ^{(b)}} \dfrac{1-\lambda }{2}, \end{aligned}$$
(14)

where (a) holds because \(f'\in {\mathcal {F}}_W\); and (b) holds by Lemma 1.

For the fourth term, since \(f'\in {\mathcal {F}}_W\), we obtain

$$\begin{aligned} \dfrac{\lambda (1-\lambda )}{6}{\mathbb {E}}\left[ \left| f'''(\xi _1) \right| \right] \le \dfrac{\lambda (1-\lambda )}{3}. \end{aligned}$$
(15)

Similarly, for the fifth term we have

$$\begin{aligned} \left( \dfrac{1-\lambda }{6}\right) {\mathbb {E}}\left[ \left| f'''(\xi _2) \right| \right] \le \dfrac{1-\lambda }{3}. \end{aligned}$$
(16)

For the last term, we obtain

$$\begin{aligned} \left( \dfrac{1-\lambda }{6N}\right) {\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) \left| f'''(\xi _3)\right| \right]&{\mathop {\le }\limits ^{(a)}} \left( \dfrac{1-\lambda }{3N}\right) {\mathbb {E}}\left[ \left( \sum _{i=1}^N \mathbbm {1}_{\left\{ {\overline{q}}_i=0 \right\} }\right) \right] \nonumber \\&{\mathop {=}\limits ^{(b)}} \dfrac{(1-\lambda )^2}{3} \end{aligned}$$
(17)

where (a) holds because \(f'\in {\mathcal {F}}_W\); and (b) holds by Lemma 1.

Using (12)–(17) in (11) and rearranging terms, we obtain

$$\begin{aligned}&{\mathbb {E}}\left[ \left| f'\big ((1-\lambda ){\overline{q}}_\Sigma \big ) - f''\big ((1-\lambda ){\overline{q}}_\Sigma \big ) \right| \right] \\&\le {\overline{C}}e \left( \dfrac{N^3(1-\lambda )}{d-1}\right) \left\lceil \log \left( \tfrac{1}{N(1-\lambda )}\right) \right\rceil + \dfrac{5}{3}(1-\lambda ). \end{aligned}$$

This proves the result. \(\square \)

5 Proof of state space collapse

Before ending this paper, we prove the SSC result presented in Proposition 1. The proof is based on [53, Lemma 10 in Appendix A], which we state below for completeness.

Lemma 5

Let \(\left\{ X(t):t\in {\mathbb {R}}_+ \right\} \) be a CTMC over a countable state space \({\mathcal {X}}\), with transition rate matrix \(G^X\). Suppose that it is irreducible, nonexplosive and positive recurrent, and it converges in distribution to a random variable \({\overline{X}}\) as \(t\rightarrow \infty \). Consider a Lyapunov function \(Z:{\mathcal {X}}\rightarrow {\mathbb {R}}_+\) and suppose its drift satisfies the following conditions:

  1. (C1)

    There exist constants \(\gamma >0\) and \(B>0\) such that \(\Delta Z(x)\le -\gamma \) for any \(x\in {\mathcal {X}}\) with \(Z(x)>B\)

  2. (C2)

    \(\nu _{\max }{\mathop {=}\limits ^{\triangle }}\sup \left\{ |Z(x')-Z(x)|: x,x'\in {\mathcal {X}}\text { and }G^X_{x,x'}>0 \right\} <\infty \).

  3. (C3)

    \({\overline{G}}{\mathop {=}\limits ^{\triangle }}\sup \left\{ -G^X_{x,x}: x\in {\mathcal {X}}\right\} <\infty \).

Then, for any nonnegative integer j, we have

$$\begin{aligned} {\mathbb {P}}\left( Z({\overline{X}})>B+2\nu _{\max }j\right) \le \left( \dfrac{G_{\max }\nu _{\max }}{G_{\max }\nu _{\max }+\gamma }\right) ^{j+1}, \end{aligned}$$
(18)

where

$$\begin{aligned} G_{\max }{\mathop {=}\limits ^{\triangle }}\sup \left\{ \sum _{x'\in {\mathcal {X}}: Z(x)<Z(x')}G^X_{x,x'} :x\in {\mathcal {X}}\right\} . \end{aligned}$$

As a result, for any positive integer r, the \(r{}^{\text {th}}\) moment of \(Z({\overline{X}})\) can be bounded as follows:

$$\begin{aligned} {\mathbb {E}}\left[ Z({\overline{X}})^r \right] \le (2B)^r + (4\nu _{\max })^r\left( \dfrac{G_{\max }\nu _{\max }+\gamma }{\gamma } \right) ^r r! \end{aligned}$$
(19)

In the proof of Proposition 1, we additionally use a bound on the moment generating function of \(Z({\overline{X}})\), which we present below.

Lemma 6

Let \(\left\{ X(t):t\in {\mathbb {R}}_+ \right\} \) be a CTMC as described in Lemma 5, and suppose it satisfies the three conditions therein. Let \(\theta \in {\mathbb {R}}\) be such that \(|\theta |<\frac{1}{2\nu _{\max }}\log \left( 1+\tfrac{\gamma }{G_{\max }\nu _{\max }}\right) \). Then,

$$\begin{aligned} {\mathbb {E}}\left[ \exp \left( {\scriptstyle \theta Z({\overline{X}})}\right) \right] \le \dfrac{\exp \left( {\scriptstyle \theta B}\right) \gamma }{\gamma + G_{\max }\nu _{\max }(1-e^{2\nu _{\max }\theta })}. \end{aligned}$$

The proof of Lemma 6 is presented in Appendix B.1.

In the next subsection we present a series of auxiliary lemmas that we use in the proof of Proposition 1 and we prove in the appendix.

5.1 Auxiliary lemmas to prove Proposition 1

In the proof of Proposition 1, we use Lemmas 5 and 6 with \(Z({\varvec{q}})=\Vert {\varvec{q}}_\perp \Vert \). To show that condition (C1) is satisfied, we need to bound the drift of \(Z({\varvec{q}})\) outside a bounded set. On this step of the proof, we use the definition of \({\varvec{q}}_\perp {\mathop {=}\limits ^{\triangle }}{\varvec{q}}-{\varvec{q}}_\parallel \) and compute a bound based on the properties of \({\varvec{q}}\) and \({\varvec{q}}_\parallel \). Before stating the result with these bounds, we introduce the following notation.

Given a vector \({\varvec{q}}\in {\mathbb {Z}}_+^N\), define the following functions:

$$\begin{aligned} V({\varvec{q}}){\mathop {=}\limits ^{\triangle }}\left\| {\varvec{q}}\right\| ^2,\quad V_\parallel ({\varvec{q}}){\mathop {=}\limits ^{\triangle }}\left\| {\varvec{q}}_\parallel \right\| ^2 ,\quad W_\perp ({\varvec{q}}){\mathop {=}\limits ^{\triangle }}\left\| {\varvec{q}}_\perp \right\| . \end{aligned}$$
(20)

Lemma 7

Given a vector \({\varvec{q}}\in {\mathbb {Z}}_+^{(N)}\), consider the functions \(V({\varvec{q}}),\, V_\parallel ({\varvec{q}})\) and \(W_\perp ({\varvec{q}})\) defined in (20). Then,

$$\begin{aligned} \Delta W_\perp ({\varvec{q}})\le \dfrac{1}{2\left\| {\varvec{q}}_\perp \right\| }\left( \Delta V({\varvec{q}}) - \Delta V_\parallel ({\varvec{q}}) \right) . \end{aligned}$$

We prove Lemma 7 in Appendix B.2. Using Lemma 7, the proof of (C1) reduces to computing an upper bound on \(\Delta V({\varvec{q}})\) and a lower bound on \(\Delta V_\parallel ({\varvec{q}})\), which we provide in the following lemmas.

Lemma 8

Consider a load balancing system operating under power-of-d choices, as described in Proposition 1. Let \(V({\varvec{q}})\) be as defined in (20). Then, for any vector of queue lengths \({\varvec{q}}\in {\mathbb {Z}}_+^{(N)}\) we have

$$\begin{aligned} \Delta V({\varvec{q}}) \le N(\lambda +1) - 2(1-\lambda )\sum _{i=1}^N q_i - 2\lambda \left( \dfrac{d-1}{N}\right) \left\| {\varvec{q}}_\perp \right\| . \end{aligned}$$

The proof of Lemma 8 is presented in Appendix B.3. In the next lemma we provide a lower bound to \(\Delta V_\parallel ({\varvec{q}})\).

Lemma 9

Consider a load balancing system as described in Sect. 2. Let \(V_\parallel ({\varvec{q}})\) be as defined in (20). Then, for any vector of queue lengths \({\varvec{q}}\in {\mathbb {Z}}^{(N)}_+\) we have

$$\begin{aligned} \Delta V_\parallel ({\varvec{q}})\ge -2(1-\lambda ) \sum _{i=1}^N q_i. \end{aligned}$$

Observe that we do not use the routing algorithm in the proof of Lemma 9. Indeed, the proof is based on the definition of the drift, properties of the Euclidean norm and the definition of indicator function. We present the proof of Lemma 9 in Appendix B.4.

5.2 Proof of Proposition 1

Using the lemmas proved in the previous subsection, we prove the SSC result stated in Proposition 1.

Proof

(of Proposition 1) We prove the proposition using d instead of \(\lceil cN^\beta \rceil \), for ease of exposition.

We show that each of the three conditions of Lemma 5 are satisfied. To show (C1), we use Lemmas 7, 8 and 9. Specifically, using the bounds from Lemmas 8 and 9 in Lemma 7 and canceling the term \(2(1-\lambda )\sum _{i=1}^N q_i\), we obtain

$$\begin{aligned} \Delta W_\perp ({\varvec{q}})&\le \dfrac{1}{2\left\| {\varvec{q}}_\perp \right\| }\left( N(\lambda +1) - 2\lambda \left( \dfrac{d-1}{N}\right) \left\| {\varvec{q}}_\perp \right\| \right) \\&= \dfrac{N(\lambda +1)}{2\left\| {\varvec{q}}_\perp \right\| } - \dfrac{\lambda (d-1)}{N} \\&{\mathop {\le }\limits ^{(a)}} \dfrac{N}{\left\| {\varvec{q}}_\perp \right\| } - \dfrac{\lambda _0(d-1)}{N}, \end{aligned}$$

where (a) holds because \(\lambda \in (\lambda _0,1)\). Therefore, (C1) is satisfied with

$$\begin{aligned} \gamma = \dfrac{\lambda _0(d-1)}{2N},\;\text {and}\; B=\dfrac{2N^2}{\lambda _0(d-1)}. \end{aligned}$$
(21)

Now we verify the (C2). Recall the definition of \(\nu _{\max }\):

$$\begin{aligned} \nu _{\max }{\mathop {=}\limits ^{\triangle }}\sup \left\{ \left| \left\| {\varvec{q}}_\perp \right\| -\left\| {\varvec{q}}'_\perp \right\| \right| :\; {\varvec{q}},{\varvec{q}}'\in {\mathbb {Z}}_+^{(N)}\; \text {and}\; G_{{\varvec{q}},{\varvec{q}}'}>0 \right\} . \end{aligned}$$

From the definition of the transition rate matrix G in (1), observe that if \({\varvec{q}},{\varvec{q}}'\in {\mathbb {Z}}_+^N\) are such that \(G_{{\varvec{q}},{\varvec{q}}'}>0\), then there are only two options: either \({\varvec{q}}'={\varvec{q}}+{\varvec{e}}^{(i)}\) or \({\varvec{q}}'={\varvec{q}}-{\varvec{e}}^{(i)}\) for some \(i\in [N]\). Then, by definition of \({\varvec{q}}_\perp \) and linearity of projection, we have, \({\varvec{q}}'_\perp = {\varvec{q}}_\perp \pm \left( {\varvec{e}}^{(i)} - \frac{1}{N}{\varvec{1}}\right) \). Then, by triangle inequality, we obtain

$$\begin{aligned} \left\| {\varvec{q}}'_\perp \right\|&\le \left\| {\varvec{q}}_\perp \right\| + \left\| {\varvec{e}}^{(i)} - \frac{1}{N}{\varvec{1}}\right\| . \end{aligned}$$

which implies

$$\begin{aligned} \left\| {\varvec{q}}_\perp \right\| - \left\| {\varvec{q}}'_\perp \right\|&\le \left\| {\varvec{e}}^{(i)} - \frac{1}{N}{\varvec{1}}\right\| \\&{\mathop {=}\limits ^{(a)}} \sqrt{\dfrac{N-1}{N^2} + \left( 1 - \dfrac{1}{N}\right) ^2 } \\&{\mathop {=}\limits ^{(b)}} \sqrt{1-\dfrac{1}{N}} \\&{\mathop {\le }\limits ^{(c)}} 1, \end{aligned}$$

where (a) holds by definition of the Euclidean norm; (b) holds after reorganizing terms; and (c) holds because if \(0\le x\le 1\) we have \(\sqrt{x}\le 1\). Since the inequality above holds for every pair of \({\varvec{q}},{\varvec{q}}'\in {\mathbb {Z}}_+^N\) such that \(G_{{\varvec{q}},{\varvec{q}}'}>0\), we obtain

$$\begin{aligned} \nu _{\max }\le 1. \end{aligned}$$
(22)

To verify condition (C3), observe from (1) that

$$\begin{aligned} {\overline{G}}&= N\left( \lambda +1\right) , \end{aligned}$$
(23)

where the maximum is attained when none of the queues is empty. Finally, observe that \(G_{\max }\le {\overline{G}}\) by definition, and the upper bound is indeed attained when \({\varvec{q}}={\varvec{q}}_\parallel \ne {\varvec{0}}\). Therefore,

$$\begin{aligned} G_{\max }= N\left( \lambda + 1\right) . \end{aligned}$$
(24)

We verified that all the conditions are satisfied. Then, using (21), (22) and (24) in (19) we obtain that for any positive integer r

$$\begin{aligned} {\mathbb {E}}\left[ \left\| {\varvec{q}}_\perp \right\| ^r \right]&\le \left( \dfrac{4N^2}{\lambda _0(d-1)}\right) ^r + \left( \dfrac{16 N^2+4\lambda _0(d-1)}{\lambda _0(d-1)} \right) ^r r! \\&{\mathop {\le }\limits ^{(a)}} C^r \left( \dfrac{N^2}{d-1}\right) ^r r! \\&{\mathop {\le }\limits ^{(b)}} C^r \left( \dfrac{N^2}{d-1}\right) ^r r^{r+\frac{1}{2}}, \end{aligned}$$

where (a) holds for some constant \(C\ge 16\) which is independent of \(\lambda \), d, N, c, and r; and (b) holds by Stirling’s approximation and because \(e^{1-r}\le 1\). This proves (3).

From (3), observe that \(r^{\frac{1}{2r}}\) is maximized at \(r=e\), so \(r^{\frac{1}{2r}}\le \exp \left( {\scriptstyle \frac{1}{2e}}\right) \). This completes the proof of (4).

To prove (5) we use Lemma 6 and that \(\lambda \in (0,1)\). Then, replacing \(d=\lceil cN^\beta \rceil \) we obtain the results. \(\square \)

6 Conclusion and future work

In this paper we study a supermarket checkout system in the many-server heavy-traffic regime. We parametrize the arrival rate so that the arrival rate per server is \(N^{-\alpha }\), for \(\alpha >0\) where N is the number of servers. Specifically, we answer the question: how fast should the number of servers grow with respect to the load to observe the classical heavy-traffic behavior of the scaled average queue lengths? We show that under power-of-d choices, where \(d=\lceil c N^\beta \rceil \), we need \(\alpha +\beta >3\). Then, the cases of constant d and JSQ routing are immediate consequences of our result, and we obtain \(\alpha >3\) and \(\alpha >2\), respectively. We use two proof techniques: one based on the Transform method proposed in [30] and the other one based on Stein’s method. We additionally show the rate of convergence of the expected value.

The case of \(\alpha \le 1\) is well studied in the literature. Then, there is a gap between our results and the literature. Future work is to explore how the system behaves if \(\alpha \in (1,3]\) for power-of-d choices and if \(\alpha \in (1,2]\) JSQ. We believe that there are only two phase transitions for \(\alpha \in (0,\infty )\): one at \(\alpha =\frac{1}{2}\) which corresponds to the Halfin–Whitt regime; and one at \(\alpha =1\) which corresponds to the NDS regime. Hence, we need to develop new proof techniques to close the gap.

Another line of future work is to create a unifying framework for all \(\alpha \in (0,\infty )\). As mentioned in Sect. 1, the cases of \(\alpha \in (0,1]\) are well-studied in the literature. However, the proof techniques are different for every phase of \(\alpha \). We believe there is a framework which gives a generic result, where we can obtain the results from the literature by simply plugging in the desired value of \(\alpha \).