1 Introduction

The \(n\times n\) switch is a model that has been widely used and studied to understand the behavior of Internet routers and data center switches. The importance of this model in the design of Internet routers is well-known. In the mid to late 1990s when the Internet was exploding, it served as an important model to study and design scheduling algorithms for the switch fabric of Internet routers. The model is now used to understand the design of data centers used for cloud-computing services. Today’s data centers consist of a massive number of servers organized in racks, which are interconnected through a data center network. An ideal data center network is a huge input-queued switch with one port for each server. However, real switches are much smaller and they have to be interconnected appropriately, and routing and scheduling algorithms have to be designed, so that the overall network emulates an \(n\times n\) switch. Designing such a network is a challenging and active area of research; see [1, 14] for example. Here, we do not explicitly consider a data center network, but only consider the \(n\times n\) switch which is the underlying model (see [14], which argues why the model is appropriate even for a data center network) and study the behavior of the well-known MaxWeight algorithm [22] for this model.

As mentioned in [15], the \(n\times n\) switch model also serves as a canonical theoretical example of a problem which exhibits the so-called multi-dimensional state space collapse, which makes it difficult to study using traditional heavy-traffic theory. Recently, it has been shown in [11] that the heavy-traffic behavior of the mean queue length in an \(n\times n\) switch operating under the MaxWeight scheduling algorithm can be precisely characterized using a nontrivial extension of a drift technique introduced in [6]. In particular, one of the key contributions of [11] is to extend the drift technique to cover the case of multi-dimensional state space collapse. The result in [11] also resolved an open question on the scaling behavior of the heavy-traffic queue length in a switch operating under the MaxWeight algorithm. In particular, it showed that the total heavy-traffic-scaled queue length is O(n) or the mean heavy-traffic-scaled delay experienced by a packet is O(1). While there have been other results establishing O(1) delay scaling, the significance of the result in [11] is that the result holds for the original MaxWeight algorithm introduced in [22], with no additional scheduling operations required.

The results in [11] were obtained under the assumption that every input and output port of the switch is saturated (i.e., close to capacity), and the arrival rates to each input port and each output port are close to capacity by the same amount. For the purposes of this paper, we call a switch where only some of ports are saturated an “incompletely saturated switch.” The main purpose of this paper is to show that the MaxWeight algorithm has order-optimal scaling (in the number of ports of the switch) for the case of an incompletely saturated switch and for the case where each port has a different rate of saturation. The results in [11] were obtained by setting the drift of a test function to zero in steady state. This function was carefully chosen based on the underlying symmetry when all ports are saturated. Since we do not have such a symmetry in the incompletely saturated switch, a new test function has to be used. The main reason for this is that the geometry of the state space collapse is different here than in the all-port saturated case considered in [11]. In this paper, we propose a novel test function to use for the incompletely saturated switch. Another major contribution of the paper is that we express this function in a simple form, so that it can be generalized for use in other problems that exhibit state space collapse into other regions. The case where each port has a different rate of saturation is of theoretical importance since it corresponds to the situation where the drift vector is not the identity matrix in the diffusion limit [8]. In fact, the diffusion limit is no longer symmetric in the components of the limiting stochastic process (i.e., the diffusion limit is not symmetric across the ports), but here we show that the technique in [11] works even in this asymmetric situation to produce an exact formula for a certain linear combination of the queue lengths. This result can be further used to show optimal queue length scaling under some conditions on the saturation rates.

The rest of the paper is organized as follows. The model of an \(n\times n\) switch and the MaxWeight algorithm are presented in Sect. 2. General results on queue lengths are presented in Sect. 3. In Sect. 4, we present various extensions and special cases in order to interpret the general result. Concluding remarks are provided in the last section.

We remark that a very preliminary version of this paper appeared in [10].

1.1 Related work

The MaxWeight algorithm for general stochastic networks, of which the \(n\times n\) switch is a special case, was presented in [22], where it was shown that algorithm is throughput-optimal. The special case of the switch was considered in [12], where it was shown that simpler algorithms such as MaxSize and maximal matchings are not throughput-optimal. The case of nonstochastic arrivals was considered in [24], where in addition to the throughput optimality of MaxWeight-type algorithms, a lower bound on the throughput loss of simpler algorithms such as maximal matching was established.

Here we are interested in performance metrics beyond throughput optimality. In particular, we are interested in understanding whether the MaxWeight matching algorithm for switches achieves small queue lengths, at least under a heavy-traffic scaling regime. Using diffusion limits, the heavy-traffic optimality of the MaxWeight algorithm in a switch where only one port is saturated was established in [21], although the final step of interchanging the order of the heavy-traffic scaling limit and letting time go to infinity was not undertaken there. Motivated by this result, [6] studied the switch directly in steady state, established heavy-traffic optimality, and introduced a new drift method of studying stochastic networks in heavy-traffic. However, it should be emphasized that the results in [6, 21] apply only to the case of a single saturated port since they both rely on state of the system collapsing to a single dimension in the heavy-traffic limit. In a recent development, the case where all ports are uniformly saturated (thus leading to the more difficult case of multi-dimensional state space collapse) was studied in [11], where an exact expression for the heavy-traffic scaled queue length under the MaxWeight algorithm is derived. Additionally, this expression shows that the algorithm has heavy-traffic optimal scaling in the size of the switch, resolving an open conjecture stated in [15]. The results in [11] use and significantly extend the drift technique presented in [6].

State collapse in the case where multiple ports are saturated has been established in [2, 18, 19] and, using the state space collapse result in [18], a diffusion limit was established in [8]. However, properties of the diffusion limit (such as its steady state distribution or mean queue lengths) were difficult to establish. The result in [11] can thus be interpreted as a derivation of the sum of the first moments of the limiting vector stochastic process, but obtained without going through the usual fluid/diffusion limit scaling arguments. An entirely different technique to study heavy-traffic optimality was presented in [17], where the authors approximate the scheduling decisions made by a switch which can change its schedule infinitely often to simulate a queueing network with product-form steady state distribution as in [4]. The resulting algorithm is heavy-traffic optimal, but has a very high computational complexity. The optimal scaling of the queue length as a function of the switch size in the nonheavy-traffic limit appears to be still open. Alternatively, one can consider asymptotic regimes other than the heavy-traffic limit. The best known results in this regard are the ones in [13, 16], but these require algorithms that are more involved than the original MaxWeight algorithm.

2 System model and background

In this section, we present the model of an input-queued switch, the MaxWeight scheduling algorithm, and some lemmas that will later play a key role in the results.

Note on notation For ease of understanding, and to allow the reader to compare and contrast with the results in [11], here we use definitions and notation consistent with [11]. Since an \(n\times n\) switch has \(n^2\) queues, we often deal with the Euclidean space \(\mathbb {R} ^{n^2}\). To describe the elements of the space \(\mathbb {R} ^{n^2}\), we will use the terms vector and matrix interchangeably, because these elements may be viewed either as \(n^2\)-dimensional vectors or as \(n \times n\) matrices. Both these views will be used in this paper for ease of exposition. We primarily use the inner product and norm of the vector space \(\mathbb {R} ^{n^2}\), but we represent the vectors in \(\mathbb {R} ^{n^2}\) as \(n\times n\) matrices for convenience. Thus, \(\mathbf{x} \) is a matrix with the \((i,j){\text {th}}\) component denoted by \(x_{ij} \), and for two vectors \(\mathbf{x} \) and \(\mathbf{y} \) in \(\mathbb {R} ^{n^2}\), their inner product \(\left\langle \mathbf{x} ,\mathbf{y} \right\rangle \) and Euclidean norm \(\Vert \mathbf{x} \Vert \) are defined by

$$\begin{aligned} \left\langle \mathbf{x} ,\mathbf{y} \right\rangle \triangleq \sum _{i=1}^n\sum _{j=1}^n x_{ij} y_{ij} , \quad \Vert \mathbf{x} \Vert \triangleq \sqrt{\left\langle \mathbf{x} ,\mathbf{x} \right\rangle } = \sqrt{\sum _{i=1}^n\sum _{j=1}^n x_{ij} ^2}. \end{aligned}$$

For two vectors \(\mathbf{x} \) and \(\mathbf{y} \) in \(\mathbb {R} ^{n^2}\), \(\mathbf{x} \le \mathbf{y} \) means \(x_{ij} \le y_{ij} \) for every (ij). We use \(\mathbf{1}\) to denote the all ones vector. Let \(\mathbf{e}^{(i)}\) denote the vector defined by \(e^{(i)} _{ij} =1\) for all j and \(e^{(i)} _{i'j}=0\) for all \(i' \ne i\) and for all j. Thus, \(\mathbf{e}^{(i)}\) is a matrix whose \(i{\text {th}}\) row is all ones, with zeros everywhere else. Similarly, let \(\widetilde{\mathbf{e}}^{(j)}\) denote the vector defined by \(\widetilde{ e}^{(j)} _{ij} =1\) for all i and \(\widetilde{ e}^{(j)} _{ij'}=0\) for all \(j' \ne j\) and for all i, i.e., it is a matrix whose \(j{\text {th}}\) column is all ones, with zeros everywhere else. We use \(\sum _i(.)\) without the limits on summation to denote \(\sum _{i=1}^{n}(.)\). For a random process \(\mathbf{q} (t)\) and a function V(.), we will sometimes use V(t) to denote \(V(\mathbf{q} (t))\). We use \(\text {Var}(.)\) to denote variance of a random variable and \(\text {Cov}(.)\) to denote covariance between two random variables.

2.1 The switch model and the MaxWeight algorithm

For the purposes of queueing-theoretical analysis, an \(n\times n\) switch can be treated as an \(n\times n\) matrix of queues operating in a time-slotted discrete-time fashion. Let \(a_{ij} (t)\) denote the number of packet arrivals to the \((i,j){\text {th}}\) queue, i.e., the queue in the \(i{\text {th}}\) row and \(j{\text {th}}\) column. We let \(\mathbf{a} \in \mathbb {R} ^{n^2}\) denote the vector \((a_{ij} )_{ij} \). For every queue (ij), the arrival process \(a_{ij} (t)\) is a stochastic process that is i.i.d. across time, with mean \(\mathbb {E} [a_{ij} (t)]=\lambda _{ij} \) and variance \(\text {Var}(a_{ij} (t))=\sigma ^2 _{ij} \) for any time t. We assume that the arrival processes are independent across queues (the processes \(a_{ij} (t)\) and \(a_{i'j'}(t)\) are independent for \((i,j)\ne (i',j')\)) and are also independent of the queue lengths or schedules chosen in the switch. We further assume that for all ijt, \(a_{ij} (t) \le a_{\max }\) for some \(a_{\max }\ge 1\). The arrival rate vector is denoted by \(\varvec{\lambda }= (\lambda _{ij} )_{ij} \) and the variance vector \((\sigma ^2 _{ij} )_{ij} \) is denoted by \((\varvec{\sigma })^2\) or \(\varvec{\sigma }^2\). We will use \(\varvec{\sigma }\) to denote \((\sigma _{ij} )_{ij} \). We denote the queue length of packets at input port i to be delivered at output port j at time t by \(q_{ij}(t) \). Let \(\mathbf{q} \in \mathbb {R} ^{n^2}\) denote the vector of all queue lengths.

The key scheduling constraints are that (i) at most one packet can be removed from each queue in each time slot and (ii) at most one queue can be served in each row and each column in each time slot. These constraints arise from technological constraints in a real switch, where each row represents an input port and each column represents an output port; see [20] for example. The scheduling constraints can be captured in graph-theoretical language as follows: Let G denote a complete \(n \times n \) bipartite graph with \(n^2\) edges. Each node on the left side of the bipartite graph can be thought of as representing a row in the matrix of switches and each node on the right side represents a column. The schedule in each time slot is a matching on this graph G. Let \(s_{ij} \) be the amount of service provided to queue (ij) in a given time slot. Thus, \(s_{ij} =1\) if the link between input port i and output port j is matched or scheduled, and \(s_{ij} =0\) otherwise, and we denote \(\mathbf{s} = (s_{ij} )_{ij} \). Then, the set of feasible schedules, \(\mathcal {S} \subset \{0,1\}^{n^{2}}\), is the set of all vectors \(\mathbf{s} \) which satisfy

$$\begin{aligned} \sum _{i=1}^{n}s_{ij} \le 1, \,\,\sum _{j=1}^{n}s_{ij} \le 1 \> \forall \> \,i,j\in \{1,2,\ldots ,n\}. \end{aligned}$$

Let \(\mathcal {S} ^*\) denote the set of maximal feasible schedules. Then, it is easy to see that \(\mathcal {S} ^*\) is the set of all vectors \(\mathbf{s} \) which satisfy

$$\begin{aligned} \sum _{i=1}^{n}s_{ij} = 1, \sum _{j=1}^{n}s_{ij} = 1 \> \forall \> i,j\in \{1,2,\ldots ,n\} . \end{aligned}$$
(1)

A scheduling policy or algorithm picks a schedule \(\mathbf{s} (t)\) in every time slot based on the current queue length vector, \(\mathbf{q} (t)\). In each time slot, the order of events is as follows: Queue lengths at the beginning of time slot t are \(\mathbf{q}(t) \). A schedule \(\mathbf{s} (t)\) is then picked for that time slot based on the queue lengths. Then, arrivals for that time \(\mathbf{a} (t)\) happen. Finally, the packets are served, and there is unused service if there are no packets in a scheduled queue. The queue lengths are then updated to give the queue lengths for the next time slot. The queue lengths therefore evolve as follows:

$$\begin{aligned} q_{ij} (t+1)= & {} \left[ q_{ij}(t) +a_{ij} (t)-s_{ij} (t)\right] ^+ \nonumber \\= & {} q_{ij}(t) +a_{ij} (t)-s_{ij} (t)+u_{ij} (t),\nonumber \\ \mathbf{q} (t+1)= & {} \mathbf{q} (t)+\mathbf{a} (t)-\mathbf{s} (t)+\mathbf{u} (t) , \end{aligned}$$
(2)

where \([x]^+ = \max (0,x)\) is the projection onto the positive real axis, and \(u_{ij} (t)\) is the unused service on link (ij). Unused service is 1 only when link (ij) is scheduled, but has zero queue length; and it is 0 in all other cases. Thus, when \(u_{ij} (t)=1\), we have \(q_{ij} (t)=0\), \(a_{ij} (t)=0\), \(s_{ij} (t)=1\) and \(q_{ij} (t+1)=0\). Therefore, we have \(u_{ij} (t)q_{ij}(t) =0\), \(u_{ij} (t)a_{ij} (t)=0\) and \(u_{ij} (t)q_{ij} (t+1)=0\). Also note that since \(u_{ij} (t)\le s_{ij} (t)\), we have that \(\sum _{i=1}^{n}u_{ij} \in \{0,1\}\) and \(\sum _{j=1}^{n}u_{ij} \in \{0,1\}\) for all ij.

The MaxWeight algorithm is a popular scheduling algorithm for switches. In every time slot t, each link (ij) is given a weight equal to its queue length \(q_{ij} (t)\) and the schedule with the maximum weight among the feasible schedules \(\mathcal {S}\) is chosen at that time slot. In other words, using queue lengths as the weights, the permutation matrix with the maximum weight is picked in every time slot. This algorithm is presented in Algorithm 1.

figure a

Note that there is always a maximum weight schedule that is maximal. If the MaxWeight schedule chosen at time t, \(\mathbf{s} \), is not maximal, there exists a maximal schedule \(\mathbf{s} ^*\in \mathcal {S} ^*\) such that \(\mathbf{s} \le \mathbf{s} ^*\). For any link (ij) such that \(s_{ij} =0\) and \(s^*_{ij} =1\), \(q_{ij}(t) =0\). If not, \(\mathbf{s} \) would not have been a maximum weight schedule. Therefore, we can pretend that the actual schedule chosen is \(\mathbf{s} ^*\) and the links (ij) that are in \(\mathbf{s} ^*\) but not in \(\mathbf{s} \) have an unused service of 1. Note that this does not change the scheduling algorithm, but it is just a notational convenience. Therefore, without loss of generality, we assume that the schedule chosen in each time slot is a maximal schedule, i.e.,

$$\begin{aligned} \mathbf{s} (t)\in \mathcal {S} ^* \text { for all time }t. \end{aligned}$$

Hence, the MaxWeight schedule picks one of the n! possible permutations from the set \(\mathcal {S} ^*\) in each time slot.

Under i.i.d. arrivals, the queue lengths process \(\mathbf{q} (t)\) is a Markov chain. The switch is said to be stable under a scheduling policy if the sum of all the queue lengths is finite in an appropriate stochastic sense (see [20] for example). The capacity region of the switch is the set of arrival rates \(\varvec{\lambda }\) for which the switch is stable under some scheduling policy. A policy that stabilizes the switch under any arrival rate in the capacity region is said to be throughput-optimal. It is well-known [12, 22] that the capacity region \(\mathcal {C}\) of the switch is convex hull of all feasible schedules:

$$\begin{aligned} \mathcal {C}= & {} \text {Conv}(\mathcal {S}) \nonumber \\= & {} \left\{ \varvec{\lambda } \in \mathbb {R} ^{n^2}_+ : \sum _{i=1}^{n} \lambda _{ij} \le 1 , \sum _{j=1}^{n} \lambda _{ij} \le 1 \> \forall \> i,j\in \{1,\ldots ,n\} \right\} \nonumber \\= & {} \left\{ \varvec{\lambda } \in \mathbb {R} ^{n^2}_+ : \left\langle \varvec{\lambda },\mathbf{e}^{(i)} \right\rangle \le 1, \left\langle \varvec{\lambda },\widetilde{\mathbf{e}}^{(j)} \right\rangle \le 1 \> \forall \> i,j\in \{1,\ldots ,n\} \right\} . \end{aligned}$$
(4)

For any arrival rate vector \(\varvec{\lambda }\), \(\rho \triangleq \max _{ij} \{\sum _i\lambda _{ij} ,\sum _j\lambda _{ij} \}\) is called the load. It is also known that the queue lengths process is positive recurrent under the MaxWeight algorithm whenever the arrival rate is in the capacity region \(\mathcal {C}\) (equivalently, load \(\rho <1\)) and therefore is throughput-optimal [12, 22, 20, Chap 4].

For any arrival rate in the capacity region \(\mathcal {C} \), due to positive recurrence of \(\mathbf{q} (t)\), we have that a steady state distribution exists under the MaxWeight policy. Let \(\overline{\mathbf{q}}\) denote the steady state random vector. In this paper, we focus on the weighted average queue length under the steady state distribution, i.e., \(\mathbb {E} [\sum _{i,j}\alpha _{ij} \overline{q}_{ij} ]\), for some weights \(\alpha _{ij} \), which can be shown to exist as in [11]. We consider a set of switch systems indexed by a parameter \(\epsilon \), with arrival rate \(\varvec{\lambda }^{\epsilon }\) so that the arrival rate approaches the vector \(\varvec{\nu }\) on the boundary of the capacity region \(\mathcal {C}\) in the limit as \(\epsilon \rightarrow 0\). This is called the heavy-traffic limit. We are interested in the weighted average queue length in the heavy-traffic limit, i.e., \(\lim _{\epsilon \rightarrow 0} \mathbb {E} [\sum _{i,j}\alpha _{ij} \overline{q}_{ij} ]\). In particular, in this paper, we will consider cases where the sum of the arrival rates at some rows and some columns approach 1,  and they may approach 1 at different rates at each column and row.

2.2 Kingman bound for a discrete-time queue

To establish our results, we later show that the total queue length along each row and each column is lower-bounded. For this purpose, we use a bound on the steady state queue length in a simple discrete-time queue [6]. While the well-known Kingman bound is for continuous-time G/G/1 queues, due to the similarity in establishing the result, the bound for the discrete-time case is also called the Kingman bound in [6] and we use the same terminology here. We state the version of the result for the special case where a queue can serve only one packet per slot here since this is what is used in this paper. In this special case, instead of a bound, one has an exact expression for the mean queue length which we present below.

Lemma 1

Consider a single server operating in discrete time. In each time slot, packets arrive according to an i.i.d. arrival process \(\alpha (t)\) with mean \(\lambda \) and variance \(\sigma ^2.\) Let q denote the queue length. Each packet needs exactly one time slot of service. The server operates according to any nonidling policy, serving one packet in every time slot whenever the queue is nonempty. Then, the queue is positive recurrent as long as \(\lambda <1\), and the steady state mean queue length is given by

$$\begin{aligned} \mathbb {E} [q]=\frac{\sigma ^2}{2(1-\lambda )}-\frac{\lambda }{2}. \end{aligned}$$

We note that the first term on the right-hand side of the above equation is what is referred to as the Kingman bound in [6].

2.3 Moment bounds from Lyapunov drift conditions

In later sections of the paper, we establish state space collapse results by obtaining moment bounds on certain quantities related to the queue length vector based on drift of a Lyapunov function. A key ingredient in this approach is to obtain moment bounds from drift conditions. A lemma from [7] was used in [6] to obtain these bounds, and a different result from [3] was used in [11] to obtain tighter bounds. Here we state [3, Theorem 1] in the form it was stated in [11].

Lemma 2

For an irreducible and aperiodic Markov chain \(\{X(t)\}_{t\ge 0}\) over a countable state space \(\mathcal {X},\) suppose \(Z:\mathcal {X}\rightarrow \mathbb {R} _+\) is a nonnegative-valued Lyapunov function. We define the drift of Z at X as

$$\begin{aligned} \Delta Z(X)\triangleq [Z(X(t+1))-Z(X(t))]\>\mathcal {I}(X(t)=X), \end{aligned}$$

where \(\mathcal {I}(.)\) is the indicator function. Thus, \(\Delta Z(X)\) is a random variable that measures the amount of change in the value of Z in one step, starting from state X. This drift is assumed to satisfy the following conditions:

  • C1     There exists an \(\eta >0,\) and a \(\zeta <\infty \) such that, for any \(t=1,2,\ldots \) and for all \(X\in \mathcal {X}\) with \( Z(X)\ge \zeta ,\)

    $$\begin{aligned} \mathbb {E} [\Delta Z(X) | X(t)=X] \le - \eta . \end{aligned}$$
  • C2     There exists a \(D < \infty \) such that, for all \(X \in \mathcal {X},\)

    $$\begin{aligned} \mathbb {P} \left( |\Delta Z(X)| \le D\right) = 1. \end{aligned}$$

Further assume that the Markov chain \(\{X(t)\}_t\) converges in distribution to a random variable \(\overline{X}\). Then, for any \(r=1,2,\ldots \),

$$\begin{aligned} \mathbb {E} [Z\left( \overline{X}\right) ^r]\le&\left( 2\zeta \right) ^r+ \left( 4D\right) ^r \left( \frac{D+\eta }{\eta } \right) ^r r!. \end{aligned}$$

3 Incompletely saturated switch

In this section, we will study the switch system when an arbitrary number of ports are saturated. We consider the switch where \(n_1 \le n\) input ports (rows) and \(n_2 \le n\) output ports (columns) are saturated. Without loss of generality, we assume that input ports (rows) \(1,2,\ldots ,n_1 \) and output ports (columns) \(1,2,\ldots ,n_2\) are saturated. Thus, we consider a point \(\varvec{\nu }\) on the boundary of the capacity region that lies in \(\text {Relint}(\mathcal {F} _{n_1n_2} )\), the relative interiorFootnote 1 of the face \(\mathcal {F} _{n_1n_2} \) defined by

$$\begin{aligned} \mathcal {F} _{n_1n_2}\triangleq & {} \left\{ \varvec{\nu } \in \mathcal {C} : \sum _{j=1}^{n} \nu _{ij} = 1 \> \forall \> i\in \{1,\ldots ,n_1\}, \sum _{i=1}^{n} \nu _{ij} = 1\> \forall \> j\in \{1,\ldots ,n_2\}\right\} \nonumber \\= & {} \left\{ \varvec{\nu }\in \mathcal {C} : \left\langle \varvec{\nu },\mathbf{e}^{(i)} \right\rangle = 1\> \forall \> i\in \{1,\ldots ,n_1\}, \right. \nonumber \\&\qquad \left. \phantom {\mathbb {R} ^{n^2}_+} \,\left\langle \varvec{\lambda },\widetilde{\mathbf{e}}^{(j)} \right\rangle =1 \> \forall \> j \in \{1,\ldots ,n_2\} \right\} . \end{aligned}$$
(5)

In other words, if we let \(\delta _i =1-\sum _j \nu _{ij} =1-\left\langle \varvec{\nu },\mathbf{e}^{(i)} \right\rangle \) and \(\tilde{\delta }_j =1-\sum _i \nu _{ij} =\left\langle \varvec{\nu },\widetilde{\mathbf{e}}^{(j)} \right\rangle \), we have that \(\delta _i =0\) for \(i=1,{\ldots }n_1\), \(\widetilde{\delta }_j =0\) for \(j=1,{\ldots }n_2\) and \(\delta _i >0\) for \(i>n_1\), \(\widetilde{\delta }_j >0\) for \(j>n_2\).

We consider a sequence of systems indexed by \(\epsilon \). In this section, we consider an i.i.d. arrival process \(\mathbf{a} ^{(\epsilon )} (t)\) with mean and variance given by

$$\begin{aligned} \mathbb {E} [\mathbf{a} ^{(\epsilon )} (t)]= & {} \varvec{\lambda }^{(\epsilon )} =\varvec{\nu } -\epsilon {{\varvec{k}}}, \\ Var[\mathbf{a} ^{(\epsilon )} (t)]= & {} \left( \varvec{\sigma }^{(\epsilon )}\right) ^2, \end{aligned}$$

such that as \(\epsilon \rightarrow 0\), \(\left( \varvec{\sigma }^{(\epsilon )}\right) ^2 \rightarrow \varvec{\sigma }^2\). Here \({{\varvec{k}}}\in \mathbb {R} _{+}^{n^2}\) is vector that represents the rates of saturation of different ports. Define

$$\begin{aligned} \kappa _i({{\varvec{k}}})\triangleq & {} \left\langle {{\varvec{k}}},\mathbf{e}^{(i)} \right\rangle = \sum _j k_{ij} \text { and } \nonumber \\ \widetilde{\kappa }_j({{\varvec{k}}})\triangleq & {} \left\langle {{\varvec{k}}},\widetilde{\mathbf{e}}^{(j)} \right\rangle =\sum _i k_{ij} . \end{aligned}$$
(6)

For simplicity of notation, we will suppress the dependence on \({{\varvec{k}}}\). Note that \(\sum _i \kappa _i=\sum _j \widetilde{\kappa }_j\). Let \(\kappa _{avg} \triangleq \sum _i \kappa _i/n=\sum _j \widetilde{\kappa }_j/n\), \(\kappa _{\min }=\min _i\kappa _i\), \(\widetilde{\kappa }_{\min }=\min _j \widetilde{\kappa }_j\) and similarly for \(\kappa _{\max },\widetilde{\kappa }_{\max }\). Note that in [11], the setting when \({{\varvec{k}}}=\varvec{\nu }\) is studied, in which case \(\kappa _i=1\) and \(\widetilde{\kappa _j}= 1\) for all ij. In order to make sure that the heavy-traffic parameter \(\epsilon \) is comparable to this case, we assume without loss of generality that \(\kappa _{avg}=1\). In other words, we normalize the vector \({{\varvec{k}}}\) by assuming that \(\left\langle {{\varvec{k}}},\mathbf{1} \right\rangle =n\). Such a normalized k is called the saturation rate vector. We will study the switch in the heavy-traffic limit as \(\epsilon \downarrow 0\). Define

$$\begin{aligned} \gamma _i ^{(\epsilon )}\triangleq & {} 1-\sum _j \lambda ^{(\epsilon )} _{ij} = \delta _i+\epsilon \kappa _i ,\\ \widetilde{\gamma }_j ^{(\epsilon )}\triangleq & {} 1-\sum _i \lambda ^{(\epsilon )} _{ij} = \widetilde{\delta }_j+\epsilon \widetilde{\kappa }_j . \end{aligned}$$

Note that \(\gamma _i ^{(\epsilon )} =\epsilon \kappa _i \) for \(i \le n_1\), and \(\widetilde{\gamma }_j ^{(\epsilon )} =\epsilon \widetilde{\kappa }_j \) for \(j\le n_2\). For the unsaturated ports, \(\lim _{\epsilon \downarrow 0}\gamma _i ^{(\epsilon )} = \delta _i>0\) for \(i > n_1\), and \(\lim _{\epsilon \downarrow 0}\widetilde{\gamma }_j ^{(\epsilon )} =\widetilde{\delta }_j>0 \) for \(j>n_2\).

3.1 Universal lower bound

We now present lower bounds on the steady state queue lengths that are satisfied by any scheduling algorithm.

Proposition 1

Consider a set of switch systems with the arrival processes \(\mathbf{a} ^{(\epsilon )} (t)\) described above, parameterized by \(0<\epsilon <1\), such that the mean arrival rate vector is \(\varvec{\lambda }^{\epsilon }=\varvec{\nu }-\epsilon {{\varvec{k}}}\) for some \(\varvec{\nu } \in \text {Relint}(\mathcal {F} _{n_1n_2} )\), and the variance is \(\left( \varvec{\sigma }^{(\epsilon )}\right) ^2\). For \(1\le i,j \le n\), \(\gamma _i\) , \(\widetilde{\gamma }_j\) are defined as above. Fix a scheduling policy under which the switch system is stable for any \(0<\epsilon <1\). Let \(\mathbf{q} ^{(\epsilon )}(t)\) denote the queue lengths process under this policy for each system. Suppose that this process converges in distribution to a steady state random vector \(\overline{\mathbf{q}} ^{(\epsilon )}\). Then, for each of these systems, the steady state mean queue lengths can be lower-bounded as follows:

$$\begin{aligned} \mathbb {E} \left[ \sum _j \overline{q}_{ij} ^{(\epsilon )} \right]\ge & {} \frac{\sum _j \left( \sigma ^{(\epsilon )} _{ij}\right) ^2 }{2\gamma _i }- \frac{1-\gamma _i }{2} \text { for all } 1\le i \le n, \end{aligned}$$
(7)
$$\begin{aligned} \mathbb {E} \left[ \sum _i \overline{q}_{ij} ^{(\epsilon )} \right]\ge & {} \frac{\sum _i \left( \sigma ^{(\epsilon )} _{ij}\right) ^2 }{2\widetilde{\gamma }_j }- \frac{1-\widetilde{\gamma }_j }{2} \text { for all } 1\le j \le n . \end{aligned}$$
(8)

Therefore, in the heavy-traffic limit as \(\epsilon \downarrow 0\), if \(\left( \varvec{\sigma }^{(\epsilon )}\right) ^2 \rightarrow \varvec{\sigma }^2\), for the saturated ports we have

$$\begin{aligned} \liminf _{\epsilon \downarrow 0} \epsilon \mathbb {E} \left[ \sum _j \overline{q}_{ij} ^{(\epsilon )} \right]\ge & {} \frac{\sum _j \sigma _{ij}^2 }{2\kappa _i } \text { for all } 1\le i \le n_1,\\ \liminf _{\epsilon \downarrow 0} \epsilon \mathbb {E} \left[ \sum _i \overline{q}_{ij} ^{(\epsilon )} \right]\ge & {} \frac{\sum _i \sigma _{ij}^2 }{2\widetilde{\kappa }_j } \text { for all } 1\le i \le n_2, \end{aligned}$$

and for the unsaturated ports we have

$$\begin{aligned} \liminf _{\epsilon \downarrow 0} \epsilon \mathbb {E} \left[ \sum _j \overline{q}_{ij} ^{(\epsilon )} \right]\ge & {} 0 \text { for all } i> n_1,\\ \liminf _{\epsilon \downarrow 0} \epsilon \mathbb {E} \left[ \sum _i \overline{q}_{ij} ^{(\epsilon )} \right]\ge & {} 0 \text { for all } i > n_2. \end{aligned}$$

Proof

The queue lengths at each port can be lower-bounded by a single-server queue as follows: Consider \(\sum _j q_{ij} ^{(\epsilon )} (t)\), the total queue length at input port (row) i. It can be lower-bounded sample path wise by a coupled single-server queue with arrival process \(\sum _j a_{ij} ^{(\epsilon )} (t)\) [11, Proposition 1]. The mean and variance of the arrival process for this single-server queue are then \((1-\gamma _i)\) and \(\left( \sigma ^{(\epsilon )} _{ij}\right) ^2\), respectively, because of the independence of the arrival processes across the queues in the matrix. Then, using the Kingman bound for a single-server queue in Lemma 1, we get (7). Similarly lower bounding the total queue length for output port (column) j, \(\sum _i q_{ij} ^{(\epsilon )} (t),\) by a single-server queue, we get (8). Taking the heavy-traffic limits using the facts that \(\gamma _i ^{(\epsilon )} =\epsilon \kappa _i \), \(\widetilde{\gamma }_j ^{(\epsilon )} =\epsilon \widetilde{\kappa }_j \) for saturated ports and \(\lim _{\epsilon \downarrow 0}\gamma _i ^{(\epsilon )} >0\), \(\lim _{\epsilon \downarrow 0}\widetilde{\gamma }_j ^{(\epsilon )} >0 \) for unsaturated ports, gives the heavy-traffic limits.\(\square \)

3.2 State space collapse

In this subsection, we will show that under the MaxWeight algorithm the queue lengths vector concentrates close to a lower-dimensional cone. In order to make this more precise, we need to first present the following definitions.

The heavy-traffic rate vector \(\varvec{\nu }\) lies in the relative interior of the face \(\mathcal {F} _{n_1n_2} \) which is at the intersection of hyperplanes with the \(n_1+n_2\) normal vectors, \(\{\mathbf{e}^{(i)} \text { for } 1\le i \le n_1\}\cup \{\widetilde{\mathbf{e}}^{(j)} \text { for } 1\le j \le n_2\}\). Call the cone spanned by these normal vector \(\mathcal {K} _{n_1n_2} \), and the subspace spanned by these normal vectors \(\mathcal {S} _{n_1n_2} \), i.e.,

$$\begin{aligned} \mathcal {K} _{n_1n_2}\triangleq & {} \Bigl \{ \mathbf{x} \in \mathbb {R} ^{n^2} : \mathbf{x} = \sum _{i=1}^{n_1} w_i \mathbf{e}^{(i)} + \sum _{j=1}^{n_2} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} \text { where }\\&\quad w_i \in \mathbb {R} _+ \text { for } 1\le i \le n_1, \widetilde{w}_j \in \mathbb {R} _+ \text { for } 1\le j \le n_2 \Bigr \} \\= & {} \Bigl \{ \mathbf{x} \in \mathbb {R} ^{n^2} : \mathbf{x} = \sum _{i=1}^{n} w_i \mathbf{e}^{(i)} + \sum _{j=1}^{n} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} \text { where }\\&\quad w_i \in \mathbb {R} _+ \text { for } 1\le i \le n_1 \text { and } w_i=0 \text { for } i>n_1,\\&\quad \widetilde{w}_j \in \mathbb {R} _+ \text { for } 1\le j \le n_2 \text { and } \widetilde{w}_j=0 \text { for } j>n_2\Bigr \}. \end{aligned}$$
$$\begin{aligned} \mathcal {S} _{n_1n_2}\triangleq & {} \Bigl \{ \mathbf{x} \in \mathbb {R} ^{n^2} : \mathbf{x} = \sum _{i=1}^{n_1} w_i \mathbf{e}^{(i)} + \sum _{j=1}^{n_2} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} \text { where } \nonumber \\&\quad w_i \in \mathbb {R} \text { for } 1\le i \le n_1, \widetilde{w}_j \in \mathbb {R} \text { for } 1\le j \le n_2 \Bigr \} \\= & {} \Bigl \{ \mathbf{x} \in \mathbb {R} ^{n^2} : \mathbf{x} = \sum _{i=1}^{n} w_i \mathbf{e}^{(i)} + \sum _{j=1}^{n} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} \text { where }\nonumber \\&\quad w_i \in \mathbb {R} \text { for } 1\le i \le n_1 \text { and } w_i=0 \text { for } i>n_1,\nonumber \\&\quad \widetilde{w}_j \in \mathbb {R} \text { for } 1\le j \le n_2 \text { and } \widetilde{w}_j=0 \text { for } j>n_2\Bigr \}.\nonumber \end{aligned}$$
(9)

The components of any vector \(\mathbf{x} \) in the subspace \(\mathcal {S} _{n_1n_2} \) can be written in the form \(x_{ij} =w_i+\widetilde{ w}_{j} \), where \(w_i \in \mathbb {R} \) for \(1\le i \le n_1\), \(w_i =0\) for \( i > n_1 \), \(\widetilde{ w}_{j} \in \mathbb {R} \) for \(1\le j\le n_2\), and \(\widetilde{ w}_{j} =0\) for \( j > n_1 \). The same is true for any vector \(\mathbf{x} \) in the cone \(\mathcal {K} _{n_1n_2} \) with the further restriction that \(w_i \ge 0 \) for \(1\le i \le n_1\) and \(\widetilde{ w}_{j} \ge 0\) for \(1\le j\le n_2\). This leads to the following lemma relating the structure of the cone \(\mathcal {K} _{n_1n_2} \) and the subspace \(\mathcal {S} _{n_1n_2} \).

Lemma 3

Let \(n_1 < n\) and \(n_2 < n\). The cone \(\mathcal {K} _{n_1n_2} \) is the intersection of the space \(\mathcal {S} _{n_1n_2} \) and the positive orthant, i.e.,

$$\begin{aligned} \mathcal {K} _{n_1n_2} = \mathcal {S} _{n_1n_2} \cap \mathbb {R} ^{n^2}_+. \end{aligned}$$

Proof

From the definitions above, it is clear that \(\mathcal {K} _{n_1n_2} \subseteq \mathcal {S} _{n_1n_2} \) and \(\mathcal {K} _{n_1n_2} \subseteq \mathbb {R} ^{n^2}_+\). Therefore, we have \(\mathcal {K} _{n_1n_2} \subseteq \mathcal {S} _{n_1n_2} \cap \mathbb {R} ^{n^2}_+\). Now suppose that \(\mathbf{x} \in \mathcal {S} _{n_1n_2} \cap \mathbb {R} ^{n^2}_+\). We have that \(x_{ij} = w_i+\widetilde{w}_j \ge 0 \), where \(w_i =0\) for \( i > n_1 \) and \(\widetilde{ w}_{j} =0\) for \( j > n_1 \). Since \(n_1 < n\), we have \(x_{in}=w_i \ge 0\), and so we get that \(w_i \ge 0 \) for \(1\le i \le n_1\). Similarly, we get that \(\widetilde{ w}_{j} \ge 0\) for \(1\le j\le n_2\), proving that \(\mathbf{x} \in \mathcal {K} _{n_1n_2} \) and so \( \mathcal {S} _{n_1n_2} \cap \mathbb {R} ^{n^2}_+ \subseteq \mathcal {K} _{n_1n_2} \). \(\square \)

Let \(\mathbf{x} _{\parallel \mathcal {K} } \) denote the projection of \(\mathbf{x}\) onto the convex cone \(\mathcal {K} _{n_1n_2} \), and let \(\mathbf{x} _{\perp \mathcal {K} } \triangleq \mathbf{x} -\mathbf{x} _{\parallel \mathcal {K} } \) be the perpendicular component. Similarly, let \(\mathbf{x} _{\parallel \mathcal {S} } \) denote the projection of \(\mathbf{x}\) onto the subspace \(\mathcal {S} _{n_1n_2} \), and let \(\mathbf{x} _{\perp \mathcal {S} } \triangleq \mathbf{x} -\mathbf{x} _{\parallel \mathcal {S} } \) be the perpendicular component. For simplicity of notation, we suppress the dependence on \(n_1\) and \(n_2\) . In [11], \(\mathbf{x} _{\parallel } \) and \(\mathbf{x} _{\perp } \) were used to denote the projections we denote here by \(\mathbf{x} _{\parallel \mathcal {K} } \) and \(\mathbf{x} _{\perp \mathcal {K} } \), respectively. We will show that under the MaxWeight algorithm, all the moments of \(\mathbf{q} _{\perp \mathcal {K} } \) are bounded in steady state independent of \(\epsilon \). Since the \(\ell _1\) norm of the queues length vector \(\Vert \mathbf{q} \Vert _1\), is \(\Omega (1/\epsilon ),\) as shown in the previous subsection, this establishes that the perpendicular component \(\mathbf{q} _{\perp \mathcal {K} } \) is a negligible part of the queue lengths vector \(\mathbf{q} \) for small \(\epsilon \). Thus, we establish state space collapse onto the cone \(\mathcal {K} _{n_1n_2} \).

For \(\varvec{\nu } \in \text {Relint}(\mathcal {F} _{n_1n_2} )\), the vector in the relative interior of the face \(\mathcal {F} _{n_1n_2} \), let \(\nu _{min}\triangleq \min _{i,j}\nu _{ij}\). We assume that \(\nu _{\min } > 0\). Then, \(\nu _{\min }' >0,\) where

$$\begin{aligned} \nu _{\min }' \triangleq \min \left\{ \nu _{min}, \min _{ i>n_1}\left\{ 1-\left\langle \varvec{\nu },\mathbf{e}^{(i)} \right\rangle \right\} , \min _{ j>n_2}\left\{ 1-\left\langle \varvec{\nu },\widetilde{\mathbf{e}}^{(j)} \right\rangle \right\} \right\} . \end{aligned}$$

Proposition 2

Consider a set of switch systems under the MaxWeight scheduling algorithm, with the arrival processes \(\mathbf{a} ^{(\epsilon )} (t)\), parameterized by \(0<\epsilon <1\), and maximum possible arrivals in any queue \(a_{\max }\). The mean arrival rate vector is \(\varvec{\lambda }^{\epsilon }=\varvec{\nu }-\epsilon {{\varvec{k}}}\) for some \(\varvec{\nu } \in \text {Relint}(\mathcal {F} _{n_1n_2} )\) such that \(\nu _{\min } \triangleq \min _{ij} \nu _{ij} >0\), and a normalized saturation rate vector \({{\varvec{k}}}\in \mathbb {R} ^{n^2}_+\) such that \(\left\langle {{\varvec{k}}},\mathbf{1} \right\rangle =n\). Let the variance \(\left( \varvec{\sigma }^{(\epsilon )}\right) ^2\) of the arrival process be such that \(\Vert \varvec{\sigma }^{(\epsilon )}\Vert ^2 \le \widetilde{\sigma }^2\) for some \(\widetilde{\sigma }^2\) not dependent on \(\epsilon \). Let \(\mathbf{q} ^{(\epsilon )}(t)\) denote the queue lengths process of each system, which is positive recurrent. Therefore, the process \(\mathbf{q} ^{(\epsilon )}(t)\) converges to a steady state random vector in distribution, which we denote by \(\overline{\mathbf{q}} ^{(\epsilon )}\). Then, for each system with \(0< \epsilon \le \nu _{\min }' /2\Vert {{\varvec{k}}}\Vert \), the steady state queue lengths vector satisfies

$$\begin{aligned} \mathbb {E} \left[ \Vert \overline{\mathbf{q}} _{\perp \mathcal {S} } ^{(\epsilon )} \Vert ^r\right] \le \mathbb {E} \left[ \Vert \overline{\mathbf{q}} _{\perp \mathcal {K} } ^{(\epsilon )} \Vert ^r\right] \le (M_r)^r \>\> \forall r \in \{1,2,\ldots \}, \end{aligned}$$

where \(\nu _{\min }' \) is defined as above and \(M_r \) is a function of \(r,\widetilde{\sigma },\varvec{\nu },a_{\max },\nu _{\min }' \) but independent of \(\epsilon \).

Proof

We omit the superscript \(^{(\epsilon )} \) in this proof for simplicity of notation. For the Markov chain \(\mathbf{q} \), consider the Lyapunov function \(W_{\perp \mathcal {K} }(\mathbf{q} ) \triangleq \Vert \mathbf{q} _{\perp \mathcal {K} } \Vert \). We will use Lemma 2 to obtain moment bounds from the drift of \(W_{\perp \mathcal {K} } (.)\). Similar to [11, Proposition 2], under the MaxWeight scheduling algorithm it can be shown that

$$\begin{aligned} \mathbb {E} \left[ \left. \Delta W_{\perp \mathcal {K} }(\mathbf{q} ) \right| \mathbf{q}(t)=\mathbf{q}\right]\le & {} \frac{1}{2\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }\left( \Vert \varvec{\lambda }\Vert ^2+\Vert \varvec{\sigma }\Vert ^2+n -2\epsilon \left\langle \mathbf{q} _{\perp \mathcal {K} } ,{{\varvec{k}}}\right\rangle \phantom {2\min _\mathbf{{r}\in \mathcal {C} }}\right. \nonumber \\&\left. +\, 2 \mathbb {E} \left[ \left. \left\langle \mathbf{q} _{\parallel \mathcal {K} } ,\mathbf{s} (t)-\varvec{\nu }\right\rangle \right| \mathbf{q}(t)=\mathbf{q}\right] +2\min _\mathbf{{r}\in \mathcal {C} } \left\langle \mathbf{q} ,\varvec{\nu }-\mathbf {r}\right\rangle \right) .\nonumber \\ . \end{aligned}$$
(10)

Recall that since \(\mathbf{s} \in \mathcal {S} ^*\), \(\left\langle \mathbf{e}^{(i)} ,\mathbf{s} (t)\right\rangle =1\) and \(\left\langle \widetilde{\mathbf{e}}^{(j)} ,\mathbf{s} (t)\right\rangle =1\) for all ijt. Similarly, since \(\varvec{\nu } \in \mathcal {F} _{n_1n_2} \), for \(1 \le i\le n_1\) and \(1 \le j\le n_2\), we have \(\left\langle \mathbf{e}^{(i)} ,\varvec{\nu }\right\rangle =1\) and \(\left\langle \widetilde{\mathbf{e}}^{(j)} ,\varvec{\nu } \right\rangle =1\). By the definition of the cone \(\mathcal {K} _{n_1n_2} \), the vector \(\mathbf{q} _{\parallel \mathcal {K} } \) can be written as \(\mathbf{q} _{\parallel \mathcal {K} } = \sum _{i=1}^{n_1} w_i \mathbf{e}^{(i)} + \sum _{j=1}^{n_2} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} \). Putting all these together, we get \(\left\langle \mathbf{q} _{\parallel \mathcal {K} } ,\mathbf{s} (t)-\varvec{\nu }\right\rangle =0\). We now use the following claim to bound the last term in (10).

Claim 1

For any \(\mathbf{q} \in \mathbb {R} ^{n^2}\) and \(\varvec{\nu } \in \text {Relint}(\mathcal {F} _{n_1n_2} )\) such that \(\nu _{\min } >0\),

$$\begin{aligned} \varvec{\nu }+\frac{\nu _{\min }' }{\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }\mathbf{q} _{\perp \mathcal {K} } \in \mathcal {C} . \end{aligned}$$

Proof

We will verify that \(\varvec{\nu }+\frac{\nu _{\min }' }{\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }\mathbf{q} _{\perp \mathcal {K} } \) satisfies all the conditions in the definition of \(\mathcal {C}\) in (4) Note that \(\frac{\mathbf{q} _{\perp \mathcal {K} } }{\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }\) is a unit vector along some direction. Since \(\nu _{ij} \ge \nu _{\min }' \), clearly \({\varvec{\nu }+\frac{\nu _{\min }' }{\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }\mathbf{q} _{\perp \mathcal {K} } \in \mathbb {R} ^{n^2}_+}\).

It is well-known that for any \(\mathbf{x} \in \mathcal {K} _{n_1n_2} \), \(\left\langle \mathbf{q} _{\perp \mathcal {K} } ,\mathbf{x} \right\rangle \le 0\). Since \(\mathbf{e}^{(i)} \in \mathcal {K} _{n_1n_2} \) for \(1\le i\le n_1\), we have \(\left\langle \mathbf{q} _{\perp \mathcal {K} } ,\mathbf{e}^{(i)} \right\rangle \le 0\). Then, using the fact that \(\varvec{\nu } \in \mathcal {F} _{n_1n_2} \), we have, for \(1\le i\le n_1\),

$$\begin{aligned} \left\langle \varvec{\nu }+\frac{\nu _{\min }' }{||\mathbf{q} _{\perp \mathcal {K} } ||}\mathbf{q} _{\perp \mathcal {K} } ,\mathbf{e}^{(i)} \right\rangle \le 1. \end{aligned}$$

For \(i>n_1\),

$$\begin{aligned} \left\langle \varvec{\nu }+\frac{\nu _{\min }' }{||\mathbf{q} _{\perp \mathcal {K} } ||}\mathbf{q} _{\perp \mathcal {K} } ,\mathbf{e}^{(i)} \right\rangle&= \left\langle \varvec{\nu }, \mathbf{e}^{(i)} \right\rangle +\nu _{\min }' \left\langle \frac{\mathbf{q} _{\perp \mathcal {K} } }{||\mathbf{q} _{\perp \mathcal {K} } ||},\mathbf{e}^{(i)} \right\rangle \\&\mathop {\le }\limits ^{(a)} \left\langle \varvec{\nu }, \mathbf{e}^{(i)} \right\rangle +\nu _{min}' \le 1, \end{aligned}$$

where (a) follows from the Cauchy–Schwartz inequality and the last inequality follows from the definition of \(\nu _{\min }' \). It can similarly be shown that \({\left\langle \varvec{\nu }+\frac{\nu _{\min } }{\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }\mathbf{q} _{\perp \mathcal {K} } ,\widetilde{\mathbf{e}}^{(j)} \right\rangle \le 1}\) for \(1 \le j \le n_2\) as well as \(j>n_2\), proving the claim. \(\square \)

Using the claim, the last term in (10) can be bounded as

$$\begin{aligned} 2\min _\mathbf{{r}\in \mathcal {C} } \left\langle \mathbf{q} ,\varvec{\nu }-\mathbf {r}\right\rangle\le & {} 2\left\langle \mathbf{q} ,\varvec{\nu }-\left( \varvec{\nu }+\frac{\nu _{\min }' }{\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }\mathbf{q} _{\perp \mathcal {K} } \right) \right\rangle \\= & {} -2\left\langle \mathbf{q} ,\frac{\nu _{\min }' }{\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }\mathbf{q} _{\perp \mathcal {K} } \right\rangle \\= & {} -2 \nu _{\min }' \Vert \mathbf{q} _{\perp \mathcal {K} } \Vert . \end{aligned}$$

Using this in (10) and bounding the \(-2\epsilon \left\langle \mathbf{q} _{\perp \mathcal {K} } ,{{\varvec{k}}}\right\rangle \) term using the Cauchy–Schwartz inequality, we get

$$\begin{aligned} \mathbb {E} \left[ \left. \Delta W_{\perp \mathcal {K} }(\mathbf{q} ) \right| \mathbf{q}(t)=\mathbf{q}\right]\le & {} \frac{\Vert \varvec{\lambda }\Vert ^2+\Vert \varvec{\sigma }\Vert ^2+n}{2\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }-\nu _{\min }' +\epsilon \Vert {{\varvec{k}}}\Vert \\\le & {} \frac{\Vert \varvec{\lambda }\Vert ^2+\Vert \varvec{\sigma }\Vert ^2+n}{2\Vert \mathbf{q} _{\perp \mathcal {K} } \Vert }-\frac{\nu _{\min }' }{2}, \text { whenever } \epsilon \le \frac{\nu _{\min }' }{2\Vert {{\varvec{k}}}\Vert } \\\le & {} - \frac{\nu _{\min }' }{4}, \text { for all } \mathbf{q} \text { such that } W_{\perp \mathcal {K} }(\mathbf{q} ) \ge \frac{2(\Vert \varvec{\lambda }\Vert ^2+\Vert \varvec{\sigma }\Vert ^2+n)}{\nu _{\min }' }. \end{aligned}$$

Thus, the conditions of Lemma 2 is valid with \(\zeta = \frac{2(\Vert \varvec{\lambda }\Vert ^2+\Vert \varvec{\sigma }\Vert ^2+n)}{\nu _{\min }' }\) and \(\eta = \frac{\nu _{\min }' }{4}\). Moreover, \(\zeta \) can be upper-bounded by \(\zeta \le \frac{2(\Vert \varvec{\nu }\Vert ^2+\Vert \varvec{\sigma }\Vert ^2+n)}{\nu _{\min }' }\), an expression that doesn’t contain \(\epsilon \). The conditions of Lemma 2 can be verified using nonexpansivity of projection and the fact that the maximum arrivals at every time are \(a_{\max }\) [11]. Then, from Lemma 2, we get the bound on \(\mathbb {E} \left[ \Vert \overline{\mathbf{q}} _{\perp \mathcal {K} } ^{(\epsilon )} \Vert ^r\right] \) in the proposition. Since \(\mathcal {K} _{n_1n_2} \subseteq \mathcal {S} _{n_1n_2} \), we have \(\Vert \overline{\mathbf{q}} _{\perp \mathcal {S} } ^{(\epsilon )} \Vert ^r \le \Vert \overline{\mathbf{q}} _{\perp \mathcal {K} } ^{(\epsilon )} \Vert ^r\), completing the proof. \(\square \)

3.3 Asymptotically tight upper and lower bounds under the MaxWeight policy

In this subsection, we will use the state space collapse result from the previous section to obtain lower and upper bounds on weighted sums of queue lengths under the MaxWeight algorithm that are tight in heavy-traffic limit. It turns out that the queue length behavior under the MaxWeight algorithm when there is at least one unsaturated port is qualitatively different from the case when all ports are saturated. The reason for this is discussed in Corollary 5 in Sect. 4. So, in this subsection, we focus on the case when at least one port in not saturated. If all the input ports are saturated, from the definition of the capacity region, it follows that all the output ports are also saturated, i.e., whenever \(n_1 =n\), we also have \(n_2 =n\). Similarly, if all the output ports are saturated, it again follows that all the input ports are saturated. Since we are interested in an incompletely saturated switch, in this section we assume \(n_1 < n\) and \(n_2 < n\).

The queue length bounds are obtained by setting the drift of the following function to zero in steady state:

$$\begin{aligned} V(\mathbf{q} ) =\left||\mathbf{q} _{\parallel \mathcal {S} } \right||^2. \end{aligned}$$

Its drift is defined as

$$\begin{aligned} \Delta V(\mathbf{q} ) \triangleq&[V(\mathbf{q} (t+1))-V(\mathbf{q} (t))]\>\mathcal {I}(\mathbf{q} (t)=\mathbf{q} ). \end{aligned}$$

We now state the main result of the paper in a general form. In Sect. 4, we will interpret this result as well as present various special cases.

Theorem 1

Consider a set of switch systems under the MaxWeight scheduling algorithm, with the arrival processes \(\mathbf{a} ^{(\epsilon )} (t)\), parameterized by \(0<\epsilon <1\), and maximum possible arrivals in any queue \(a_{\max }\). The mean arrival rate vector is \(\varvec{\lambda }^{\epsilon }=\varvec{\nu }-\epsilon {{\varvec{k}}}\) for some \(\varvec{\nu } \in \text {Relint}(\mathcal {F} _{n_1n_2} )\) such that \(\nu _{\min } \triangleq \min _{ij} \nu _{ij} >0\), and a normalized saturation rate vector \({{\varvec{k}}}\in \mathbb {R} ^{n^2}_+\) such that \(\left\langle {{\varvec{k}}},\mathbf{1} \right\rangle =n\). Let the variance \(\left( \varvec{\sigma }^{(\epsilon )}\right) ^2\) of the arrival process be such that \(\Vert \varvec{\sigma }^{(\epsilon )}\Vert ^2 \le \widetilde{\sigma }^2\) for some \(\widetilde{\sigma }^2\) not dependent on \(\epsilon \) and assume that \(n_1 <n\) and \(n_2 <n\). Let \(\mathbf{q} ^{(\epsilon )}(t)\) denote the queue lengths process of each system, which is positive recurrent. Therefore, the process \(\mathbf{q} ^{(\epsilon )}(t)\) converges to a steady state random vector in distribution, which we denote by \(\overline{\mathbf{q}} ^{(\epsilon )}\). Then, for each system with \(0< \epsilon \le \nu _{\min }' /2\Vert {{\varvec{k}}}\Vert \), the steady state queue lengths vector satisfies

$$\begin{aligned} \frac{1}{2\epsilon }\left\langle \left( \varvec{\sigma }^{(\epsilon )} \right) ^2, \varvec{\zeta } \right\rangle -B_1(\epsilon ) \le \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} ^{(\epsilon )} ,\varvec{\alpha } \right\rangle \right] \le \frac{1}{2\epsilon }\left\langle \left( \varvec{\sigma }^{(\epsilon )} \right) ^2, \varvec{\zeta } \right\rangle +B_2(\epsilon ) \end{aligned}$$

for any fixed weight vector \(\varvec{\alpha } \in \mathbb {R} ^{n^2}\) such that \(\left\langle \varvec{\alpha },\mathbf{e}^{(i)} \right\rangle =n\kappa _i \) for \( i \le n_1 \) and \(\left\langle \varvec{\alpha },\widetilde{\mathbf{e}}^{(j)} \right\rangle =n\widetilde{\kappa }_j \) for \(j \le n_2 \), where \(B_1(\epsilon )\) as well as \(B_2(\epsilon )\) are \(o(\frac{1}{\epsilon })\), i.e., \(\lim _{\epsilon \rightarrow 0} \epsilon B_1(\epsilon )=0\) and \(\lim _{\epsilon \rightarrow 0} \epsilon B_2(\epsilon )=0\). The vector \(\varvec{\zeta }\) is defined by

$$\begin{aligned} \zeta _{ij} \triangleq \left\{ \begin{array}{ll} 2-\frac{2n-n_1 -n_2 }{n^2-n_1 n_2 } &{}\quad \text {if}\, i\le n_1 \text { and } j\le n_2 \\ 1+\frac{n_2 }{n^2-n_1 n_2 } &{}\quad \text {if}\, i \le n_1 \text { and } j>n_2 \\ 1+\frac{n_1 }{n^2-n_1 n_2 } &{}\quad \text {if}\, i>n_1 \text { and } j \le n_2 \\ 0 &{}\quad \text {if}\ i> n_1 \text { and } j> n_2 . \end{array}\right. \end{aligned}$$
(11)

Thus, in the heavy-traffic limit as \(\epsilon \downarrow 0\), we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} ^{(\epsilon )} ,\varvec{\alpha } \right\rangle \right]&= \frac{1}{2}\left\langle \varvec{\sigma }^2, \varvec{\zeta } \right\rangle . \end{aligned}$$

Moreover, for any \(i>n_1\) and \(j>n_2\),

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \overline{q}_{ij} ^{(\epsilon )} \right] =0. \end{aligned}$$

Note that, in general, the weights \(\alpha _{ij} \) are allowed to be negative. We now present the proof of the theorem.

Proof

We consider the switch for a fixed \(0< \epsilon \le \nu _{\min }' /2\Vert {{\varvec{k}}}\Vert \). For simplicity of notation, we again omit the superscript \((\epsilon )\) in this proof. Similar to the notation in [11], we use \(\overline{\mathbf{q}} \) to denote the steady state queue length vector and \(\overline{\mathbf{a}} \) to denote the steady state arrival vector, which is identically distributed to the vector \(\mathbf{a} (t)\) at any time t. We use \(\mathbf{s} (\overline{\mathbf{q}} )\) and \(\mathbf{u} (\overline{\mathbf{q}} )\) for the schedule and unused service to explicitly show their dependence on the queue lengths. If the queue length at time t is \(\overline{\mathbf{q}} \), then the queue length at time \(t+1\), \(\overline{\mathbf{q}} +\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} )+\mathbf{u} (\overline{\mathbf{q}} )\), is denoted by \(\overline{\mathbf{q}} ^+\). Since \(\overline{\mathbf{q}} \) is the steady state queue length, it has the same distribution as \(\overline{\mathbf{q}} ^+\).

It can be easily shown using Lemma 2 that, in steady state, \(\mathbb {E} [\left||\overline{\mathbf{q}} \right||^2]\) is finite, and consequently we have

$$\begin{aligned} \mathbb {E} [V(\overline{\mathbf{q}} )]< \infty \quad \text { and } \quad \mathbb {E} [\left||\overline{\mathbf{q}} \right||_1] = \mathbb {E} [\sum _{ij} \overline{q}_{ij} ] < \infty , \end{aligned}$$
(12)

where \(\left||.\right||_1\) denotes the \(\ell _1\) norm. See Lemma 5 in [11] for details. Setting the drift of \(V(\overline{\mathbf{q}} )\) to zero in steady state, we get

$$\begin{aligned} 0&= \mathbb {E} [\Delta V (\overline{\mathbf{q}} )]\\&=\mathbb {E} [V (\overline{\mathbf{q}} +\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} )+\mathbf{u} (\overline{\mathbf{q}} ))-V (\overline{\mathbf{q}} )]\\&= \mathbb {E} \left[ \left||\left( \overline{\mathbf{q}} +\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} )+\mathbf{u} (\overline{\mathbf{q}} )\right) _{\parallel \mathcal {S} } \right||^2-\left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } \right||^2\right] \\&\mathop {=}\limits ^{(a)} \mathbb {E} \left[ \left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } +(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } +\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right||^2-\left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } \right||^2\right] \\&= \mathbb {E} \left[ \left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } +(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } \right||^2 +2\left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } +(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } , \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle \right] \\&\quad +\,\mathbb {E} \left[ \left||\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right||^2-\left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } \right||^2\right] \\&= \mathbb {E} \left[ \left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } +(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } \right||^2 +2\left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } +(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } +\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} ), \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle \right] \\&\quad -\,\mathbb {E} \left[ \left||\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right||^2+\left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } \right||^2\right] \\&\mathop {=}\limits ^{(b)} \mathbb {E} \left[ \left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } +(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } \right||^2 +2\left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle -\left||\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right||^2-\left||\overline{\mathbf{q}} _{\parallel \mathcal {S} } \right||^2\right] \\&= \mathbb {E} \left[ \left||(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } \right||^2+2\left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , (\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } \right\rangle -\left||\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right||^2+2\left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle \right] , \end{aligned}$$

where (a) and (b) follow from the fact that projection onto a subspace is linear. Therefore, we have

$$\begin{aligned} 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , (\mathbf{s} (\overline{\mathbf{q}} )-\overline{\mathbf{a}} )_{\parallel \mathcal {S} } \right\rangle \right]= & {} \mathbb {E} \left[ \left||(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } \right||^2\right] -\mathbb {E} \left[ \left||\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right||^2\right] \end{aligned}$$
(13)
$$\begin{aligned}&\quad +\,2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle \right] . \end{aligned}$$
(14)

We will now study each of the terms in this equation. Consider the LHS term in (13). Since any vector of the form \(\mathbf{x} _{\perp \mathcal {S} } \) is orthogonal to the space \(\mathcal {S} _{n_1n_2} \), we get

$$\begin{aligned}&2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , (\mathbf{s} (\overline{\mathbf{q}} )-\overline{\mathbf{a}} )_{\parallel \mathcal {S} } \right\rangle \right] \nonumber \\&\quad =2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , (\mathbf{s} (\overline{\mathbf{q}} )-\overline{\mathbf{a}} )_{\parallel \mathcal {S} } \right\rangle \right] +2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , (\mathbf{s} (\overline{\mathbf{q}} )-\overline{\mathbf{a}} )_{\perp \mathcal {S} } \right\rangle \right] \nonumber \\&\quad = 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , \mathbf{s} (\overline{\mathbf{q}} )-\overline{\mathbf{a}} \right\rangle \right] \nonumber \\&\quad {\mathop {=}\limits ^{(a)}} 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , \mathbf{s} (\overline{\mathbf{q}} )-\varvec{\lambda }\right\rangle \right] \nonumber \\&\quad = 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , \mathbf{s} (\overline{\mathbf{q}} )-(\varvec{\nu } -\epsilon {{\varvec{k}}})\right\rangle \right] \nonumber \\&\quad = 2\epsilon \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , {{\varvec{k}}} \right\rangle \right] +2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , \mathbf{s} (\overline{\mathbf{q}} )-\varvec{\nu } \right\rangle \right] \nonumber \\&\quad {\mathop {=}\limits ^{(b)}} 2\epsilon \mathbb {E} \left[ \left\langle \sum _{i=1}^{n_1} w_i \mathbf{e}^{(i)} + \sum _{j=1}^{n_2} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} , {{\varvec{k}}} \right\rangle \right] \nonumber \\&\qquad \quad +\,2\mathbb {E} \left[ \left\langle \sum _{i=1}^{n_1} w_i \mathbf{e}^{(i)} + \sum _{j=1}^{n_2} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} , \mathbf{s} (\overline{\mathbf{q}} )-\varvec{\nu } \right\rangle \right] \nonumber \\&\quad {\mathop {=}\limits ^{(c)}} 2\epsilon \mathbb {E} \left[ \sum _{i=1}^{n_1} w_i\left\langle \mathbf{e}^{(i)} ,{{\varvec{k}}}\right\rangle + \sum _{j=1}^{n_2} \widetilde{w}_j \left\langle \widetilde{\mathbf{e}}^{(j)} , {{\varvec{k}}} \right\rangle \right] \nonumber \\&\qquad \quad +\,2\mathbb {E} \left[ \sum _{i=1}^{n_1} w_i \left\langle \mathbf{e}^{(i)} ,\mathbf{s} (\overline{\mathbf{q}} )-\varvec{\nu } \right\rangle + \sum _{j=1}^{n_2} \widetilde{w}_j \left\langle \widetilde{\mathbf{e}}^{(j)} , \mathbf{s} (\overline{\mathbf{q}} )-\varvec{\nu } \right\rangle \right] \nonumber \\&\quad {\mathop {=}\limits ^{(d)}} \frac{2\epsilon }{n} \mathbb {E} \left[ \sum _{i=1}^{n_1} w_i\left\langle \mathbf{e}^{(i)} ,\varvec{\alpha }\right\rangle + \sum _{j=1}^{n_2} \widetilde{w}_j \left\langle \widetilde{\mathbf{e}}^{(j)} , \varvec{\alpha } \right\rangle \right] \nonumber \\&\quad = \frac{2\epsilon }{n} \mathbb {E} \left[ \left\langle \sum _{i=1}^{n_1} w_i \mathbf{e}^{(i)} +\sum _{j=1}^{n_2} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} , \varvec{\alpha } \right\rangle \right] \nonumber \\&\quad = \frac{2\epsilon }{n} \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , \varvec{\alpha } \right\rangle \right] \nonumber \\&\quad = \frac{2\epsilon }{n} \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} , \varvec{\alpha } \right\rangle \right] -\frac{2\epsilon }{n} \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\perp \mathcal {S} } , \varvec{\alpha } \right\rangle \right] , \end{aligned}$$
(15)

where (a) follows from the fact that the arrivals are independent of queue lengths. From the definition of the space \(\mathcal {S} _{n_1n_2} \) in (9), we know that the vector \(\overline{\mathbf{q}} _{\parallel \mathcal {S} } \) can be represented as \(\sum _{i=1}^{n_1} w_i \mathbf{e}^{(i)} + \sum _{j=1}^{n_2} \widetilde{w}_j \widetilde{\mathbf{e}}^{(j)} \) for some \(w_i\in \mathbb {R} \), \(1\le i \le n_1 \), and \(\widetilde{ w}_{j} \in \mathbb {R} \), \(1\le i \le n_1 \), giving (b). Since the schedule is always assumed to be maximal, from (1), we have that \(\left\langle \mathbf{e}^{(i)} ,\mathbf{s} \right\rangle =1\) for all i. Since the first \(n_1\) rows are saturated, we have from (5) that \(\left\langle \mathbf{e}^{(i)} ,\varvec{\nu }\right\rangle =1\) for \(1\le i\le n_1 \). Therefore, we get that \(\left\langle \mathbf{e}^{(i)} ,\mathbf{s} (\overline{\mathbf{q}} )-\varvec{\nu } \right\rangle =0\) for \(1\le i\le n_1 \), and similarly we have \(\left\langle \widetilde{\mathbf{e}}^{(j)} , \mathbf{s} (\overline{\mathbf{q}} )-\varvec{\nu } \right\rangle =0\) for \(1\le j\le n_2 \). Consequently, the second term in (c) vanishes. From the definition of \(\kappa _i\) in (6) and the assumption on \(\varvec{\alpha }\), we have that \(\left\langle \mathbf{e}^{(i)} ,{{\varvec{k}}}\right\rangle =\kappa _i=\left\langle \mathbf{e}^{(i)} ,\varvec{\alpha }\right\rangle /n\) for \(1\le i\le n_1 \). Similarly for \(1\le j\le n_2 \), we have \(\left\langle \widetilde{\mathbf{e}}^{(j)} ,{{\varvec{k}}}\right\rangle =\kappa _i=\left\langle \widetilde{\mathbf{e}}^{(j)} ,\varvec{\alpha }\right\rangle /n\), giving us (d). Using the Cauchy–Schwartz inequality, we can bound the last term in (15) as follows:

$$\begin{aligned} - \mathbb {E} \left[ \left|| \overline{\mathbf{q}} _{\perp \mathcal {S} } \right|| \right] \left||\varvec{\alpha }\right||\le & {} \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\perp \mathcal {S} } , \varvec{\alpha } \right\rangle \right] \le \mathbb {E} \left[ \left|| \overline{\mathbf{q}} _{\perp \mathcal {S} } \right|| \right] \left||\varvec{\alpha }\right|| ,\nonumber \\ -M_1 \left||\varvec{\alpha }\right||\le & {} \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\perp \mathcal {S} } , \varvec{\alpha } \right\rangle \right] \le M_1\left||\varvec{\alpha }\right||, \end{aligned}$$
(16)

where the last set of inequalities follow from the state space collapse in Proposition 2. Putting this back in (15), we get

$$\begin{aligned} -\frac{2\epsilon }{n} M_1\left||\varvec{\alpha }\right|| \le 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } , (\mathbf{s} (\overline{\mathbf{q}} )-\overline{\mathbf{a}} )_{\parallel \mathcal {S} } \right\rangle \right] -\frac{2\epsilon }{n} \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} , \varvec{\alpha } \right\rangle \right] \le \frac{2\epsilon }{n} M_1\left||\varvec{\alpha }\right||. \end{aligned}$$
(17)

We now consider the first term on the RHS of (13). Let \(\mathbf{f} _1, \mathbf{f} _2, \ldots , \mathbf{f} _L\) be an orthonormal basis of the space \(\mathcal {S} _{n_1n_2} \), where L is the dimension of the space \(\mathcal {S} _{n_1n_2} \). From the definition of the space \(\mathcal {S} _{n_1n_2} \), we know that each of these vectors \(\mathbf{f} _l\) can be written as \(f_{l_{ij} } = v_{l_i} +\widetilde{v}_{l_j} \) for some \(v_{l_i} , \widetilde{v}_{l_j} \in \mathbb {R} \) for all ij with \(v_{l_i} =0\) for \(i>n_1\), \(\widetilde{v}_{l_j} =0\) for \(j>n_2\). Then the norm of the projection onto the subspace \(\mathcal {S} _{n_1n_2} \) can be written in terms of the projections onto the basis vectors as follows:

$$\begin{aligned}&\mathbb {E} \left[ \left||(\overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ))_{\parallel \mathcal {S} } \right||^2\right] \nonumber \\&\quad = \mathbb {E} \left[ \sum _l\left\langle \overline{\mathbf{a}} -\mathbf{s} (\overline{\mathbf{q}} ), \mathbf{f} _l \right\rangle ^2\right] \nonumber \\&\quad = \sum _l \mathbb {E} \left[ \left( \sum _{ij}(\overline{a}_{ij} -s_{ij} (\overline{\mathbf{q}} ))f_{l_{ij} } \right) ^2\right] \nonumber \\&\quad = \sum _l \mathbb {E} \left[ \left( \sum _{ij}(\overline{a}_{ij} -s_{ij} (\overline{\mathbf{q}} ))(v_{l_i} +\widetilde{v}_{l_j} ) \right) ^2\right] \nonumber \\&\quad = \sum _l \mathbb {E} \left[ \left( \sum _{i}v_{l_i} \left( \sum _j (\overline{a}_{ij} -s_{ij} (\overline{\mathbf{q}} ))\right) +\sum _{j}\widetilde{v}_{l_j} \left( \sum _i(\overline{a}_{ij} -s_{ij} (\overline{\mathbf{q}} ))\right) \right) ^2\right] \nonumber \\&\quad {\mathop {=}\limits ^{(a)}} \sum _l \mathbb {E} \left[ \left( \sum _{i}v_{l_i} \left( \sum _j \overline{a}_{ij} -(1-\kappa _i \epsilon )\right) +\sum _{j}\widetilde{v}_{l_j} \left( \sum _i\overline{a}_{ij} -(1-\widetilde{\kappa }_j \epsilon )\right) \right. \right. \nonumber \\&\qquad \left. \left. -\epsilon \left( \sum _i \kappa _i v_{l_i} +\sum _j\widetilde{\kappa }_j \widetilde{v}_{l_j} \right) \right) ^2\right] \nonumber \\&\quad {\mathop {=}\limits ^{(b)}} \sum _lVar\left( \sum _i v_{l_i} \sum _j \overline{a}_{ij} +\sum _j \widetilde{v}_{l_j} \sum _i \overline{a}_{ij} \right) +\epsilon ^2\sum _l\left( \sum _i \kappa _i v_{l_i} +\sum _j\widetilde{\kappa }_j \widetilde{v}_{l_j} \right) ^2\nonumber \\&\quad {\mathop {=}\limits ^{(c)}} \sum _l\left[ Var\left( \sum _i v_{l_i} \sum _j \overline{a}_{ij} \right) + Var\left( \sum _j \widetilde{v}_{l_j} \sum _i \overline{a}_{ij} \right) \right. \nonumber \\&\qquad +\left. 2Cov\left( \sum _i v_{l_i} \sum _{j'} \overline{a}_{ij'}, \sum _j \widetilde{v}_{l_j} \sum _{i'} \overline{a}_{i'j} \right) \right] +\epsilon ^2\sum _l\left\langle {{\varvec{k}}},\mathbf{f} _l\right\rangle ^2\nonumber \\&\quad {\mathop {=}\limits ^{(d)}} \sum _l\left[ \sum _i v_{l_i} ^2 Var\left( \sum _j \overline{a}_{ij} \right) +\sum _j \widetilde{v}_{l_j} ^2 Var\left( \sum _i \overline{a}_{ij} \right) \right. \nonumber \\&\qquad +\left. 2\sum _{ij} v_{l_i} \widetilde{v}_{l_j} Cov\left( \sum _{j'}\overline{a}_{ij'}, \sum _{i'} \overline{a}_{i'j}\right) \right] +\epsilon ^2\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2\nonumber \\&\quad {\mathop {=}\limits ^{(e)}} \sum _l\left[ \sum _i v_{l_i} ^2 \sum _j \sigma _{ij} ^2+\sum _j \widetilde{v}_{l_j} ^2 \sum _i \sigma _{ij} ^2+2\sum _{ij}v_{l_i} \widetilde{v}_{l_j} \sigma _{ij} ^2\right] +\epsilon ^2\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2\nonumber \\&\quad = \sum _l\left[ \sum _{ij} (v_{l_i} +\widetilde{v}_{l_j} )^2\sigma _{ij} ^2 \right] +\epsilon ^2\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2\nonumber \\&\quad = \sum _{ij}\sum _l f_{l_{ij} } ^2\sigma _{ij} ^2+\epsilon ^2\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2\nonumber \\&\quad = \sum _{ij}\sum _l \left\langle \varvec{\chi }^{(ij)}, \mathbf{f} _l\right\rangle \sigma _{ij} ^2+\epsilon ^2\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2\nonumber \\&\quad = \sum _{ij}\left|| \varvec{\chi }^{(ij)}_{\parallel \mathcal {S} } \right||^2 \sigma _{ij} ^2+\epsilon ^2\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2\nonumber \\&\quad = \frac{1}{n}\left\langle \varvec{\sigma }^2, \varvec{\zeta } \right\rangle +\epsilon ^2\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2 , \end{aligned}$$
(18)

where \(\varvec{\chi }^{(ij)}\) is the matrix with 1 in the (ij)th position and 0 everywhere else. Since a maximal schedule is always picked, from (1), we get (a). Equation (b) follows from the fact that the total arrival rate for row i is \((1-\kappa _i \epsilon )\) and that for column j it is \((1-\widetilde{\kappa }_j \epsilon )\). Expanding the variance of two random variables along with the definition of \(\kappa _i\) and \(\widetilde{\kappa }_j\) in (6) gives (c). Independence of the arrival processes across the ports gives (d) and (e). The following lemma, which is proved in Appendix 1, gives (18).

Lemma 4

For all \(1\le i,j \le n\),

$$\begin{aligned} \left||\varvec{\chi }^{(ij)}_{\parallel \mathcal {S} } \right||^2 = \frac{\zeta _{ij} }{n}, \end{aligned}$$

where \(\varvec{\zeta }\) is defined in (11).

We will now focus on the second term on the RHS of (13). In order to do this, we will first obtain a bound on the unused service. We know from (12) that \(\mathbb {E} [\sum _{ij} \overline{q}_{ij} ]\) is finite and so \(\mathbb {E} [\sum _{j}\overline{q}_{ij} ]\) finite for each i. For \(i\le n_1 \), setting the drift of \(\sum _{j}\overline{q}_{ij} \) to zero in steady state, we get

$$\begin{aligned} \mathbb {E} \left[ \sum _j \overline{q}_{ij} \right]= & {} \mathbb {E} \left[ \sum _j \overline{q}_{ij} ^+\right] \\= & {} \mathbb {E} \left[ \sum _j\overline{q}_{ij} + \overline{a}_{ij} -s_{ij} (\overline{\mathbf{q}} )+u_{ij} (\overline{\mathbf{q}} ) \right] ,\\ 0= & {} \sum _j (\nu _{ij} -\epsilon k_{ij} )-1+\mathbb {E} \left[ \sum _j u_{ij} (\overline{\mathbf{q}} )\right] ,\\ \mathbb {E} \left[ \sum _ju_{ij} (\overline{\mathbf{q}} )\right]= & {} \epsilon \kappa _i , \end{aligned}$$

where the last equality follows from (5) since the row i is saturating. Similarly, for \(j\le n_2\), we have that \(\mathbb {E} \left[ \sum _i u_{ij} (\overline{\mathbf{q}} )\right] =\epsilon \widetilde{\kappa }_j \).

For any \(\mathbf{x} \in \mathbb {R} ^{n^2}\), let \(\widehat{\mathbf{x} }\in \mathbb {R} ^{n^2}\) denote its projection on to the space spanned by the vectors \(\varvec{\chi }^{(ij)}\) for \(i\le n_1 \) or \(j \le n_2 \). Call this space \(\mathcal {X}_{n_1 n_2 }\). Clearly, this space contains the space \(\mathcal {S} _{n_1n_2} \), i.e., \(\mathcal {S} _{n_1n_2} \subseteq \mathcal {X}_{n_1 n_2 }\). In other words, the vector \(\widehat{\mathbf{x} }\in \mathbb {R} ^{n^2}\) is obtained by replacing the \((n-n_1)(n-n_2)\) components with \(i>n_1\) and \(j>n_2\) with zeros, i.e.,

$$\begin{aligned} \widehat{x}_{ij} \triangleq \left\{ \begin{array}{ll} x_{ij} &{}\quad \text {if}\ i\le n_1 \text { or } j\le n_2\\ 0 &{}\quad \text {if}\ i> n_1 \text { and } j> n_2.\\ \end{array}\right. \end{aligned}$$

Moreover, for any \(\mathbf{x} \in \mathbb {R} ^{n^2}\), \(\mathbf{x} -\widehat{\mathbf{x} }\) is orthogonal to the space \(\mathcal {X}_{n_1 n_2 }\) and so is also orthogonal to the space \(\mathcal {S} _{n_1n_2} \).

Now the second term on the RHS of (13) can be upper-bounded as follows:

$$\begin{aligned} \mathbb {E} \left[ \left||\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right||^2\right]&= \mathbb {E} \left[ \left||\left( \widehat{\mathbf{u} }(\overline{\mathbf{q}} )+\mathbf{u} (\overline{\mathbf{q}} )-\widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right) _{\parallel \mathcal {S} } \right||^2\right] \nonumber \\&\mathop {=}\limits ^{(a)} \mathbb {E} \left[ \left||\left( \widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right) _{\parallel \mathcal {S} } +\left( \mathbf{u} (\overline{\mathbf{q}} )-\widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right) _{\parallel \mathcal {S} } \right||^2\right] \nonumber \\&\mathop {=}\limits ^{(b)} \mathbb {E} \left[ \left||\left( \widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right) _{\parallel \mathcal {S} } \right||^2\right] \nonumber \\&\mathop {\le }\limits ^{(c)} \mathbb {E} \left[ \left||\left( \widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right) \right||^2\right] \nonumber \\&= \mathbb {E} \left[ \sum _{ij} \widehat{u}^2_{ij} (\overline{\mathbf{q}} )\right] \nonumber \\&\mathop {=}\limits ^{(d)} \mathbb {E} \left[ \sum _{ij} \widehat{u}_{ij} (\overline{\mathbf{q}} )\right] \nonumber \\&\le \mathbb {E} \left[ \sum _{i=1}^{n_1}\sum _j u_{ij} (\overline{\mathbf{q}} )\right] +\mathbb {E} \left[ \sum _{j=1}^{n_2}\sum _i u_{ij} (\overline{\mathbf{q}} )\right] \nonumber \\&\mathop {=}\limits ^{(e)} \epsilon \sum _{i=1}^{n_1}\kappa _i +\epsilon \sum _{j=1}^{n_2} \widetilde{\kappa }_j \nonumber \\&\le 2\epsilon \left\langle {{\varvec{k}}},\mathbf{1} \right\rangle \nonumber \\&=2\epsilon n, \end{aligned}$$
(19)

where (a) follows from linearity of projection onto a subspace. Since, for any vector \(\mathbf{x} \), \(\mathbf{x} -\widehat{\mathbf{x} }\) is orthogonal to the space \(\mathcal {S} _{n_1n_2} \), we get (b). Inequality (c) is true due to the nonexpansive property of projection. Since, \(u_{ij} \in \{0,1\}\), we have (d). Since the saturation rate vector k is assumed to be normalized, we get (19). Using the trivial lower bound of zero, we have

$$\begin{aligned} 0\le \mathbb {E} \left[ \left||\mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right||^2\right] \le 2\epsilon n . \end{aligned}$$
(20)

We will now consider the final term, the one in (14).

$$\begin{aligned} 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle \right]&=2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \mathbf{u} (\overline{\mathbf{q}} )\right\rangle \right] \nonumber \\&\mathop {=}\limits ^{(a)} 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right\rangle \right] \nonumber \\&\mathop {=}\limits ^{(b)} 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} ^+, \widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right\rangle \right] - 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\perp \mathcal {S} } ^+, \widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right\rangle \right] \nonumber \\&= -2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\perp \mathcal {S} } ^+, \widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right\rangle \right] , \end{aligned}$$
(21)

where (a) follows from the fact that \(\mathbf{x} -\widehat{\mathbf{x} }\) is orthogonal to the space \(\mathcal {S} _{n_1n_2} \). Since, by the definition of unused service in (2), we have that \(\overline{q}_{ij} ^+=0\) whenever \(u_{ij} (\overline{\mathbf{q}} )>0\), the first term in (b) vanishes. Therefore, using the Cauchy–Schwartz inequality, we get

$$\begin{aligned} -2\sqrt{\mathbb {E} \left[ \left||\overline{\mathbf{q}} _{\perp \mathcal {S} } ^+\right||^2\right] \mathbb {E} \left[ \left||\widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right||^2\right] }\le & {} 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle \right] \le 2\sqrt{\mathbb {E} \left[ \left||\overline{\mathbf{q}} _{\perp \mathcal {S} } ^+\right||^2\right] \mathbb {E} \left[ \left||\widehat{\mathbf{u} }(\overline{\mathbf{q}} )\right||^2\right] }, \nonumber \\ -2M_2\sqrt{\mathbb {E} \left[ \left||\widehat{\mathbf{u} }\right||^2\right] }\le & {} 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle \right] \le 2M_2\sqrt{\mathbb {E} \left[ \left||\widehat{\mathbf{u} }\right||^2\right] }, \end{aligned}$$
(22)
$$\begin{aligned} -2M_2\sqrt{2\epsilon n}\le & {} 2\mathbb {E} \left[ \left\langle \overline{\mathbf{q}} _{\parallel \mathcal {S} } ^+, \mathbf{u} _{\parallel \mathcal {S} } (\overline{\mathbf{q}} )\right\rangle \right] \le 2M_2\sqrt{2\epsilon n}, \end{aligned}$$
(23)

where (22) is obtained by using the fact that, in steady state, \(\mathbb {E} \left[ \left||\overline{\mathbf{q}} _{\perp \mathcal {S} } ^+\right||^2\right] =\mathbb {E} \left[ \left||\overline{\mathbf{q}} _{\perp \mathcal {S} } \right||^2\right] \), which is bounded by \(M_2\) from state space collapse in Proposition 2. We get (23) from the bound in (19).

Substituting (17), (18), (20) and (23) in (13) and (14), we get the theorem with

$$\begin{aligned} B_1(\epsilon )= & {} M_1\left||\varvec{\alpha }\right||-\frac{n\epsilon }{2}\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2+n^2+2M_2\frac{n\sqrt{n}}{\sqrt{2\epsilon }},\\ B_2(\epsilon )= & {} M_1\left||\varvec{\alpha }\right||+\frac{n\epsilon }{2}\left||{{\varvec{k}}}_{\parallel \mathcal {S} } \right||^2+2M_2\frac{n\sqrt{n}}{\sqrt{2\epsilon }}. \end{aligned}$$

\(\square \)

The main idea of the proof is to set the drift of a carefully chosen test function V(.) to zero. The choice of this function is crucial to obtain tight heavy-traffic bounds. We will now briefly motivate our choice of the function V(.). For a discrete-time single-server (G/G/1) queue, with q(t) that evolves according to \(q(t+1)=q(t)+a(t)-s(t)+u(t)\), the right test function to obtain tight queue length bounds is \(q^2\). Such a bound is known as the Kingman bound [20, Section 10.1]. Next, consider a load balancing system under the ‘Join the shortest queue’(JSQ) policy operating in discrete time. There are a finite number of servers, each with a separate queue, similar to supermarket checkout lanes. Whenever a user arrives into the system, (s)he joins the queue with the shortest length, breaking ties uniformly at random. Tight heavy-traffic queue length bounds are obtained for this system in [6] by first showing that the queue lengths collapse to a single dimension where they are all equal. Then, tight bounds are obtained by setting the drift of the quadratic test function \(\mathbb {E} [(\sum _i \overline{q} _i)^2]\) to zero in steady state. This function is the same as \(\left||\mathbf{q} _{\parallel }\right||^2\) (up to a factor of n, which is not important), where \(\mathbf{q} _{\parallel }\) denotes the projection of the queue length vector \(\mathbf{q}\) onto the region of state space collapse, which is the line along the vector that has ones in all components.

These examples motivate us to choose the norm square of projection of the queue lengths vector onto the region of state space collapse as the test function in general. In the case of the switch, this would be \(\left||\mathbf{q} _{\parallel \mathcal {K} }\right||^2\). However, a projection operator onto a convex cone is difficult to study because it is not linear. Moreover, a closed form expression for projection onto a general cone is not known. Therefore, we relax this function and use \(\left||\mathbf{q} _{\parallel \mathcal {S} } \right||^2\) as the test function. Since we use the relaxed function, it is sufficient to use state space collapse into the space \(\mathcal {S} _{n_1n_2} \), as opposed to the stronger form of state space collapse into the cone \(\mathcal {K} _{n_1n_2} \). Note that, in the proof of Theorem 1, we use state space collapse only at two instances, viz., (16) and (22), and both these use the weaker result, \(\mathbb {E} \left[ \Vert \overline{\mathbf{q}} _{\perp \mathcal {S} } ^{(\epsilon )} \Vert ^r\right] \). In other words, we only use the fact that the state collapses to the space \(\mathcal {S} _{n_1n_2} \) and not the cone \(\mathcal {K} _{n_1n_2} \). Such a relaxation works because of the property of the cone \(\mathcal {K} _{n_1n_2} \) that it is just the intersection of the space it spans, \(\mathcal {S} _{n_1n_2} \), with the positive orthant \(\mathbb {R} ^{n^2}_+\), as proved in Lemma 3. Therefore, any positive queue length vector that is in the space \(\mathcal {S} _{n_1n_2} \) is also in the cone \(\mathcal {K} _{n_1n_2} \). Since queue length vectors are nonnegative, if we know that a queue length vector collapses onto the space \(\mathcal {S} _{n_1n_2} \), we know that it should collapse onto the cone \(\mathcal {K} _{n_1n_2} \).

Even though we only need the weaker version of state space collapse, viz., the bound on \(\mathbb {E} \left[ \Vert \overline{\mathbf{q}} _{\perp \mathcal {S} } ^{(\epsilon )} \Vert ^r\right] \), we proved a stronger version, viz., the bound on \(\mathbb {E} \left[ \Vert \overline{\mathbf{q}} _{\perp \mathcal {K} } ^{(\epsilon )} \Vert ^r\right] \) in Proposition 2 for completeness. The proof of weaker state space collapse can be much simpler because projection onto a subspace is a linear operator, while projection onto a convex cone is not. Therefore, we can write down the complete proof of Theorem 1 (including the weaker version of Proposition 2) without even referring to the cone \(\mathcal {K} _{n_1n_2} \).

The intuition behind our choice of the test function as discussed above may be applicable to other problem settings beyond just the switch system. For any problem that exhibits state space collapse into a cone that satisfies a property similar to Lemma 3, we may be able to use \(\left||\mathbf{q} _{\parallel \mathcal {S} } \right||^2\) as the test function with appropriately defined inner product and norm. Moreover, as illustrated in the proof of Theorem 1, we don’t need to know the explicit closed form of \(\mathbf{q} _{\parallel \mathcal {S} } \) to use this test function. Our presentation of the test function in the form \(\left||\mathbf{q} _{\parallel \mathcal {S} } \right||^2\) that is easily generalizable to other problem settings is a major contribution of this paper. Such a test function with a novel inner product has been successfully used in [9, 23]. The reason for the choice of \(\left||\mathbf{q} _{\parallel \mathcal {S} } \right||^2\) as a test function is the following: When a quadratic test function is chosen, and the drift is set to zero, we get several terms as in (14). Among these terms, the cross term between \(\overline{\mathbf{q}} ^+\) and \(\mathbf{u} (\overline{\mathbf{q}} )\), as in (14), is the crucial term to bound in heavy-traffic. Under state space collapse, this term should be small and can be bounded using an argument similar to that in the set of equations ending in (21).

We now make a short remark about the scheduling policy. In the proof of Theorem 1, we do not use any details about the scheduling policy, except for the fact that it is throughput-optimal and exhibits state space collapse as in Proposition 2. Therefore, the queue length bounds in Theorem 1 hold true under any throughput-optimal algorithm that exhibits state space collapse as in Proposition 2. Moreover, in the proof of state space collapse in Proposition 2, we use the details of the MaxWeight scheduling policy only in the Claim 1 in Sect. 3.2.

4 Discussion

In this section, we present various corollaries and extensions of Theorem 1, and interpret the results. The following corollary gives a bound on the sum of the queue lengths.

Corollary 1

Consider the set of switch systems operating under the MaxWeight algorithm as described in Theorem 1, with \(0\le n_1 ,n_2 <n\). Then, in the heavy-traffic limit, we have

$$\begin{aligned} \frac{1}{2\kappa _{\max }}\left\langle \varvec{\sigma }^2, \varvec{\zeta } \right\rangle \le \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q} _{ij} ^{(\epsilon )} \right] \le \frac{1}{2\kappa _{\min }}\left\langle \varvec{\sigma }^2, \varvec{\zeta } \right\rangle , \end{aligned}$$

where \(\kappa _{\max } =\max _{i\le n_1 ,j\le n_2 }\{\kappa _i ,\widetilde{\kappa }_j \}\) and \(\kappa _{i\le n_1 ,j\le n_2 } =\min _{ij} \{\kappa _i ,\widetilde{\kappa }_j \}\). Moreover, under any stable algorithm the sum of all the queue lengths is lower-bounded by

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q}_{ij} ^{(\epsilon )} \right] \ge \frac{1}{2}\max \left\{ \left\langle \varvec{\sigma }^2, \varvec{\zeta }' \right\rangle ,\left\langle \varvec{\sigma }^2, \varvec{\zeta }'' \right\rangle \right\} , \end{aligned}$$

where \(\zeta '_{ij} = 1/\kappa _i \) and \(\zeta '_{ij} = 1/\widetilde{\kappa }_j \).

Proof

It is easy to see that there exists a weight vector \(\varvec{\alpha } \in \mathbb {R} ^{n^2}_+\) such that \(\kappa _{\min }\le \alpha _{ij} \) for all ij, \(\left\langle \varvec{\alpha },\mathbf{e}^{(i)} \right\rangle =n\kappa _i \) for \( i \le n_1 \) and \(\left\langle \varvec{\alpha },\widetilde{\mathbf{e}}^{(j)} \right\rangle =n\widetilde{\kappa }_j \) for \(j \le n_2 \). Using such a weight vector in Theorem 1, we get the upper bound. Similarly, by picking an \(\varvec{\alpha }\) such that \(\kappa _{\max }\ge \alpha _{ij} \) for all ij we get the lower bound. \(\square \)

If the saturation rate vector \({{\varvec{k}}}\) is such that \(\kappa _{\max }/\kappa _{\min }\) is a constant that does not scale with n, it is easy to see that the sum of all the queue lengths under MaxWeight are within a constant factor of the universal lower bound after noting that \(\zeta _{ij} \le 2\) for all ij.

For some values of \({{\varvec{k}}}\), we get an exact expression for the sum of queue lengths in heavy-traffic.

Corollary 2

Suppose that \({{\varvec{k}}}=\varvec{\nu }\) in the set of switch systems described in Theorem 1, so that \(\varvec{\lambda }^{\epsilon }=\varvec{\nu }(1-\epsilon )\) for some \(\varvec{\nu } \in \text {Relint}(\mathcal {F} _{n_1n_2} )\) such that \(\nu _{\min } \triangleq \min _{ij} \nu _{ij} >0\) and \(0\le n_1 ,n_2 <n\). Then, in heavy-traffic we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q}_{ij} ^{(\epsilon )} \right]= & {} \frac{1}{2}\left\langle \varvec{\sigma }, \varvec{\zeta } \right\rangle \text { and }\\ \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \overline{q} _{ij}^{(\epsilon )} \right]= & {} 0 \ \ \forall \ \ i>n_1 ,j>n_2 . \end{aligned}$$

Proof

Since \(\kappa _i =1\) and \(\widetilde{\kappa }_j =1\) for \(i\le n_1 \) and \(j\le n_2 \), we pick \(\alpha _{ij} =1\) for all ij.

Since the weight vector \(\varvec{\alpha }\) satisfies the condition in Theorem 1, the corollary follows.

\(\square \)

The following corollary considers the case when exactly one port of the switch is saturated. Under this condition, the switch is said to satisfy the complete resource pooling condition and was studied in [21]. In this case, the state collapses onto a line. Without loss of generality, the corollary is stated when an input port is saturated.

Corollary 3

Consider the set of switch systems operating under the MaxWeight algorithm as described in Theorem 1, with \(n_1 =1\), \(n_2 =0\) and \({{\varvec{k}}}=\varvec{\nu }\) so that \(\varvec{\lambda }^{\epsilon }=\varvec{\nu }(1-\epsilon )\) for some \(\varvec{\nu } \in \text {Relint}(\mathcal {F} _{10})\) such that \(\nu _{\min } \triangleq \min _{ij} \nu _{ij} >0\). Then, in heavy-traffic we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{j} \overline{q} _{1j}^{(\epsilon )} \right]= & {} \frac{\sum _{1j}\sigma ^2_{1j}}{2} \text { and }\\ \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \overline{q} _{ij}^{(\epsilon )} \right]= & {} 0 \ \ \forall \ \ i>1,j. \end{aligned}$$

Moreover, MaxWeight is heavy-traffic optimal, i.e., \(\lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q} _{1j}^{(\epsilon )} \right] \) is minimized under the MaxWeight algorithm.

The proof follows directly from Theorem 1 and the universal lower bound in Proposition 1. In order to clearly see the scaling of queue lengths in terms of n, we state the following corollary under Bernoulli arrivals, which again follows from Corollary 2 and Proposition 1.

Corollary 4

Consider the set of switch systems operating under the MaxWeight algorithm, as described in Theorem 1. Suppose that the arrival process for each queue is Bernoulli with the arrival rate vector \(\varvec{\lambda }^{(\epsilon )} \), where \(\lambda ^{(\epsilon )} _{ij} =(1-\epsilon )/2n\) for \(i>n_1 , j>n_2 \) and \(\lambda ^{(\epsilon )} _{ij} =(1-\epsilon )/n\) otherwise, with \(0\le n_1 ,n_2 <n\) so that \(n_1\) inputs and \(n_2\) outputs are saturating. Then, in the heavy-traffic limit, we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q}_{ij} ^{(\epsilon )} \right] = \frac{n_1 +n_2 }{2}\left( 1-\frac{1}{n}\right) . \end{aligned}$$

Moreover, under any stable algorithm the sum of all the queue lengths is lower-bounded by

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q}_{ij} ^{(\epsilon )} \right] \ge \frac{\max \{n_1 ,n_2 \}}{2}\left( 1-\frac{1}{n}\right) . \end{aligned}$$

So, we know that MaxWeight is within less than a factor of two away from heavy-traffic optimality under incomplete saturation. A similar observation was made under the completely saturated case in [11]. It is not clear if the gap is because the lower bound is loose or because MaxWeight is a constant factor away from heavy-traffic optimality. Under the MaxWeight algorithm, the sum of all queue lengths is as if we have \((n_1 +n_2 )\) separate queues, each with variance \((1-\frac{1}{n})\), which is the total variance of the arrivals in each row or column. This is because there are \((n_1 +n_2 )\) independent constraints on the capacity region that are tight in the limit. This is the same reason why the state collapses to the \((n_1 +n_2 )\)-dimensional space \(\mathcal {S} _{n_1n_2} \). So, in general the number of tight constraints in the limit is important.

Theorem 1 is valid only for the incompletely saturated switch, \(n_1 <n\) and \(n_2 <n\). However, a similar result can be proved for the completely saturated case \(n_1 =n_2 =n\) as follows.

Corollary 5

Consider the set of switch systems operating under the MaxWeight algorithm, parameterized by \(0<\epsilon <1\), as described in Theorem 1, with the only difference being that, \(n_1 =n_2 =n\). Then, as long as \(0< \epsilon \le \nu _{\min }' /2\Vert {{\varvec{k}}}\Vert \), the steady state queue lengths vector satisfies

$$\begin{aligned} \left( 1-\frac{1}{2n}\right) \frac{\left\| \varvec{\sigma }^{(\epsilon )} \right\| ^2}{\epsilon } - B_3(\epsilon ,n) \le \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} ^{(\epsilon )} ,\varvec{\alpha } \right\rangle \right] \le \left( 1-\frac{1}{2n}\right) \left\| \varvec{\sigma }^{(\epsilon )} \right\| ^2 + B_4(\epsilon ,n) \end{aligned}$$

for any fixed weight vector \(\varvec{\alpha } \in \mathbb {R} ^{n^2}\) such that \(\left\langle \varvec{\alpha },\mathbf{e}^{(i)} \right\rangle =n\kappa _i \) and \(\left\langle \varvec{\alpha },\widetilde{\mathbf{e}}^{(j)} \right\rangle =n\widetilde{\kappa }_j \) for all \(1\le i,j \le n\), where \(B_3(\epsilon )\) and \(B_4(\epsilon )\) are \(o(\frac{1}{\epsilon })\). Thus, in the heavy-traffic limit as \(\epsilon \downarrow 0\), we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \left\langle \overline{\mathbf{q}} ^{(\epsilon )} ,\varvec{\alpha } \right\rangle \right] = \left( 1-\frac{1}{2n}\right) \left\| \varvec{\sigma }\right\| ^2. \end{aligned}$$

Proof

Note that Proposition 2 is valid in the case when \(n_1 =n_2 =n\). Most of the proof of Theorem 1 also holds true, except for Lemma 4. The norm of the projections of unit vectors \(\varvec{\chi }^{(ij)}\) onto the cone \(\mathcal {S} _{nn}\) is different from \(\zeta _{ij} \). This fundamental difference in behavior for the case \(n_1 =n_2 =n\) is for the following reason: For \(n_1 ,n_2 <n\), the cone \(\mathcal {S} _{n_1n_2} \) is spanned by the vectors \(\mathbf{e}^{(i)} ,\widetilde{\mathbf{e}}^{(j)} \) for \(i\le n_1 , j\le n_2 \), which are linearly independent, and so the cone \(\mathcal {S} _{n_1n_2} \) has dimension \((n_1 +n_2 )\). When \(n_1 =n_2 =n\), the vectors \(\mathbf{e}^{(i)} ,\widetilde{\mathbf{e}}^{(j)} \) for \(i\le n, j\le n\) are not linearly independent because, clearly, \(\sum _i \mathbf{e}^{(i)} = \sum _j \widetilde{\mathbf{e}}^{(j)} \), and so the dimension of the cone \(\mathcal {S} _{nn}\) is smaller than 2n. It can be shown that the cone \(\mathcal {S} _{nn}\) has dimension \((2n-1)\) [25, page 20]. The proof will be complete once we calculate \(\left||\varvec{\chi }^{(ij)}_{\parallel \mathcal {S} } \right||^2\) for all ij. By symmetry, we have that these norms have the same value for all ij, i.e., \(\left||\varvec{\chi }^{(ij)}_{\parallel \mathcal {S} } \right||^2 = \xi \) for some \(\xi \), which can be calculated as follows: Suppose \(\mathbf{f} _1, \mathbf{f} _2, \ldots , \mathbf{f} _{2n-1}\) is an orthonormal basis of \(\mathcal {S} _{nn}\). We have

$$\begin{aligned} n^2\xi&= \sum _{ij}\left||\varvec{\chi }^{(ij)}_{\parallel \mathcal {S} } \right||^2\\&= \sum _{ij}\sum _{l=1}^{2n-1}\left\langle \varvec{\chi }^{(ij)},\mathbf{f} _l\right\rangle ^2\\&= \sum _{l=1}^{2n-1}\sum _{ij}\left\langle \varvec{\chi }^{(ij)},\mathbf{f} _l\right\rangle ^2\\&\mathop {=}\limits ^{(a)} \sum _{l}^{2n-1}\left||\mathbf{f} _l\right||_{\parallel \mathcal {S} } ^2\\&= 2n-1, \end{aligned}$$

where (a) follows from the fact that \(\{\varvec{\chi }^{(ij)}\}_{ij}\) is an orthonormal basis of \(\mathbb {R} ^{(n^2)}\). Replacing \(\zeta _{ij} \) by \(\xi =(2n-1)/n^2\), we get the corollary.\(\square \)

Similar to Corollary 1, we can get lower and upper bounds on the sum of queue lengths in heavy-traffic using \(\kappa _{\max }\) and \(\kappa _{\min }\). We now state the following corollary to illustrate the use of the weight vectors \(\varvec{\alpha }\).

Corollary 6

Consider the completely saturated switch system in Corollary 5 with \({{\varvec{k}}}=\varvec{\nu }\). Then, in the heavy-traffic limit the queue lengths satisfy the following relations:

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q}_{ij} ^{(\epsilon )} \right]= & {} \left( 1-\frac{1}{2n}\right) \left\| \varvec{\sigma }\right\| ^2,\\ \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q}_{ij} \nu _{ij} ^{(\epsilon )} \right]= & {} \left( \frac{2n-1}{2n^2}\right) \left\| \varvec{\sigma }\right\| ^2,\\ \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q}_{i\pi (i)}^{(\epsilon )} \right]= & {} \left( \frac{2n-1}{2n^2}\right) \left\| \varvec{\sigma }\right\| ^2 \text { for any permutation } \pi . \end{aligned}$$

Proof

The proof follows directly from Corollary 5 by choosing \(\varvec{\alpha }=\mathbf{1} \), \(\varvec{\alpha }=n\varvec{\nu }\) and \(\varvec{\alpha }=nP_{\pi }\), respectively, where \(P_{\pi }\) is the permutation matrix corresponding to the permutation \(\pi \). \(\square \)

Note that the first result above is the main result of [11]. Even though the proof of Theorem 1 in [11] is written in a different style, it is equivalent to the proof presented here, because the function used there is equivalent to the norm \(\left||\mathbf{q} _{\parallel \mathcal {S} } \right||^2\). It can be shown (using the orthogonality principle) that, for any vector \(\mathbf{q} \in \mathbb {R} ^{n^2}\), its projection onto the space \(\mathcal {S} _{nn}\) is given by

$$\begin{aligned} q_{\parallel \mathcal {S} ij}=\frac{\sum _iq_{ij} }{n}+\frac{\sum _jq_{ij} }{n}-\frac{\sum _{ij} q_{ij} }{n^2}. \end{aligned}$$

Taking the norm, we get

$$\begin{aligned} \left||\mathbf{q} _{\parallel \mathcal {S} } \right||^2 = \frac{1}{n}\left( \sum _i \left( \sum _jq_{ij} \right) ^2 +\sum _j \left( \sum _i q_{ij} \right) ^2 - \frac{1}{n} \left( \sum _{ij} q_{ij} \right) ^2 \right) , \end{aligned}$$

which is the function used in [11] scaled by n. We now present a further special case, which was also studied in [11], to contrast it with the result in Corollary 4.

Corollary 7

Suppose that the traffic of the completely saturated systems in Corollary 5 is uniform Bernoulli with uniform saturation rate, i.e., \(\varvec{\lambda }^{(\epsilon )} =(1-\epsilon )/n\mathbf{1} \). In heavy-traffic we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \epsilon \mathbb {E} \left[ \sum _{ij} \overline{q}_{ij} ^{(\epsilon )} \right] = \frac{2n-1}{2}\left( 1-\frac{1}{n}\right) . \end{aligned}$$

Notice that the behavior of the queue length here is similar to \((2n-1)\) separate queues. This is consistent with the discussion after Corollary 4 because in the heavy-traffic limit, among the 2n constraints in the capacity region, we have only \((2n-1)\) linearly independent ones.

5 Conclusion

We consider the heavy-traffic queue length behavior in an input-queued switch operating under the MaxWeight algorithm. It was recently shown in [11] that, in the heavy-traffic regime, the queue length scales optimally with the size of the switch when all the ports in the switch saturate at the same rate. In this paper, we considered the case when an arbitrary set of ports saturate, and each port is allowed to saturate at different rate. We obtained an exact heavy-traffic characterization of a linear combination of queue lengths and showed that the MaxWeight algorithm achieves optimal scaling of the sum of all the queue lengths in heavy-traffic.