
1 Introduction

The online load balancing problem has been an active research topic since the last century. Instead of the traditional measure of algorithm performance, the competitive ratio (e.g., [2, 3, 5, 12]), we adopt another well-known measure, the regret, introduced by [6].

In this paper we define the online load balancing problem as follows. There are K parallel servers, and the protocol is defined as a game between the player and the environment. On each round \(t=1,\dots , T\), (i) the player selects a distribution \(\varvec{\alpha }_{t}\) over the K servers, which can be viewed as an allocation of data, and (ii) the environment then assigns a loaded condition \(l_{t,i}\) to each server i, so that the loss of server i is \(\alpha _{t,i}l_{t,i}\). The goal of the player is to minimize the makespan of the cumulative loss vector of all servers after T rounds, i.e., \(\max _{i=1,\dots ,K} \sum _{t=1}^T \alpha _{t,i}l_{t,i}\), relative to the makespan of the optimal static allocation \(\varvec{\alpha }^*\) in hindsight. More precisely, the goal is to minimize the regret, the difference between the player’s makespan and the optimal static makespan. The makespan can be viewed as the \(L_\infty \)-norm of the vector of cumulative losses of the servers (we give a formal definition of the problem in the next section).

Even-Dar et al. [6] gave an algorithm based on the regret minimization framework by employing an additional concept, Blackwell approachability [4] with respect to the \(L_{2}\)-norm, to a target set defined in the following section. Their algorithm achieves a regret bound of \(O(\sqrt{KT})\). In the same paper, another algorithm, DIFF, achieves a regret bound of \(O((\ln K)\sqrt{T})\). Rakhlin et al. [13] showed a theoretical result for the online load balancing problem: the optimal regret bound is \(O(\sqrt{(\ln K)T})\), rather than \(O((\ln K)\sqrt{T})\). However, no efficient algorithm achieving this regret bound was given in their paper.

Subsequently, the equivalence between Blackwell approachability and online linear optimization (OLO) was shown in [1], and the equivalence between approachability and online convex optimization (OCO), via the support function, was shown in [15].

These works [1, 15] imply that Blackwell approachability with respect to a general norm is guaranteed by the sublinearity of the regret of the OCO problem to which the Blackwell approaching game reduces. Building on this result, we give an efficient algorithm for the online load balancing problem that achieves the best known regret.

More specifically, we propose algorithms for online load balancing with respect to arbitrary norms under a natural assumption. Our algorithm is composed of three reductions. (i) The first reduction is from the load balancing problem to a Blackwell approaching game in a general metric. In this reduction we extend the \(L_{2}\)-norm setting of [6] to an arbitrary norm under a reasonable assumption; the metric of the reduced Blackwell approaching game is induced by the norm of the load balancing problem. This reduction implies that the regret of the load balancing problem can be bounded by the convergence rate of the corresponding Blackwell approaching game. (ii) The second reduction follows existing work: due to [15], we reduce the Blackwell approaching game to an OCO problem by showing that such a reduction exists. Thus we can bound the regret of online load balancing by the regret of the corresponding OCO problem. (iii) The last reduction is from the OCO problem to two OLO problems, so that we can predict with FTRL.

Consequently, we can compute the allocation over the servers on each round of online load balancing from the predictions of the two corresponding OLO problems, and we bound the regret of the online load balancing problem by the OLO regret.

Thus our technical contributions are the following:

  • We propose a new reduction technique from online load balancing to a Blackwell approaching game. This reduction allows us to use general norms, induced by the online load balancing problem, in the Blackwell approaching game, rather than the \(L_2\)-norm used in previous work [6]. Based on this reduction, online load balancing with a general norm can be reduced to an OCO problem by the reduction technique of Shimkin [15], and further to OLO problems. In conclusion, the online load balancing problem reduces to two OLO problems along our reduction route, and the regret of online load balancing is bounded by the regret of the corresponding OLO problems.

  • In particular, following the reduction route above, we give an efficient algorithm for online load balancing w.r.t. the \(L_\infty \)-norm, achieving the best known regret of \(O(\sqrt{T\ln K})\). The algorithm involves linear programming and second-order cone programming and runs in polynomial time per round. This is the first polynomial time algorithm achieving \(O(\sqrt{T\ln K})\) regret.

2 Preliminaries

First we give some notations. We use \(\left\Vert \cdot \right\Vert \) to denote a norm of a vector. Moreover, for a norm \(\left\Vert \cdot \right\Vert \), \(\left\Vert \varvec{x} \right\Vert _*\) denotes the dual norm of \(\left\Vert \varvec{x} \right\Vert .\) A norm \(\left\Vert \cdot \right\Vert \) over \(\mathbb {R}^d\) is monotone if \(\left\Vert \varvec{x} \right\Vert \le \left\Vert \varvec{y} \right\Vert \) whenever \(|x_i| \le |y_i|\) for every \(1 \le i \le d\).

2.1 Online Load Balancing

We begin with the standard (offline) load balancing problem. Suppose that we have K servers to perform a simple task over a large amount of data. The task can be easily parallelized in such a way that we can break the data into K pieces and assign them to the servers, and each server then processes its subtask in time proportional to the size of the data assigned. An example is finding blacklisted IP addresses in an access log. Each server is associated with a loaded condition, expressed in terms of “the computation time per unit of data”. The goal is to find a data assignment to the servers that equalizes the computation times of all servers. In other words, we want to minimize the makespan, defined as the maximum computation time over all servers.

Formally, the problem is described as follows: The input is a K-dimensional vector \(\varvec{l}=(l_1, l_2, \ldots , l_K) \in \mathbb {R}_+^K\), where each \(l_i\) represents the loaded condition of the i-th server. The output is a K-dimensional probability vector \(\varvec{\alpha }= (\alpha _1, \alpha _2, \ldots , \alpha _K) \in \varDelta (K) = \{ \varvec{\alpha }\in [0,1]^K \mid \sum _{i=1}^K \alpha _i = 1 \}\), where each \(\alpha _i\) represents the fraction of data assigned to the i-th server. The goal is to minimize the makespan \(\left\Vert \varvec{\alpha }\odot {\varvec{l}} \right\Vert _\infty \), where \(\varvec{\alpha }\odot {\varvec{l}}= (\alpha _1 l_1, \alpha _2 l_2, \ldots , \alpha _K l_K)\). Note that it is clear that the optimal solution is given by \(\alpha _i = l_i^{-1}/\sum _{j=1}^K l_j^{-1}\), which equalizes the computation time of every server as \( C_\infty ^*({\varvec{l}})\,{\mathop {=}\limits ^{\text {def}}}\, \min _{\varvec{\alpha }\in \varDelta (K)} \left\Vert \varvec{\alpha }\odot {\varvec{l}} \right\Vert _\infty = \frac{1}{\sum _{j=1}^K 1/l_j}.\)
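The closed form above is easy to check numerically. The sketch below (plain Python; the function names are ours, not from the paper) computes the optimal allocation for strictly positive loads and verifies that it equalizes the per-server computation times.

```python
# Optimal static allocation for the offline load balancing problem:
# alpha_i = (1/l_i) / sum_j (1/l_j), which equalizes alpha_i * l_i over i
# and attains C*_inf(l) = 1 / sum_j (1/l_j).  Assumes all loads > 0.

def optimal_allocation(loads):
    """Allocation minimizing the makespan max_i alpha_i * l_i."""
    inv_sum = sum(1.0 / l for l in loads)
    return [(1.0 / l) / inv_sum for l in loads]

def makespan(alpha, loads):
    return max(a * l for a, l in zip(alpha, loads))
```

For example, with loads (1, 2, 4) the allocation is proportional to (1, 1/2, 1/4) and every server finishes in the same time \(1/1.75\).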

Note also that the objective is generalized to the \(L_p\)-norm for any p in the literature.

In this paper, we consider a more general objective \(\left\Vert \varvec{\alpha }\odot {\varvec{l}} \right\Vert \) for an arbitrary norm that satisfies certain assumptions stated below. In the general case, the optimal value is denoted by \( C^*({\varvec{l}}) {\mathop {=}\limits ^{\text {def}}} \min _{\varvec{\alpha }\in \varDelta (K)} \left\Vert \varvec{\alpha }\odot {\varvec{l}} \right\Vert .\)

Assumption 1

Throughout the paper, we put the following assumptions on the norm. (i) The norm is monotone, and (ii) the function \(C^{*}\) is concave.

Note that the first assumption is natural for load balancing, and both assumptions are satisfied by the \(L_p\)-norm for \(p > 1\).

Now we proceed to the online load balancing problem with respect to a norm \(\left\Vert \cdot \right\Vert \) that satisfies Assumption 1. The problem is described as a repeated game between the learner and the environment who may behave adversarially. In each round \(t = 1, 2, \ldots , T\), the learner chooses an assignment vector \(\varvec{\alpha }_t \in \varDelta (K)\), and then receives from the environment a loaded condition vector \({\varvec{l}}_t \in [0,1]^K\), which may vary from round to round. After the final round is over, the performance of the learner is naturally measured by \(\left\Vert \sum _{t=1}^T \varvec{\alpha }_t \odot {\varvec{l}}_t \right\Vert \). We want to make the learner perform nearly as well as the performance of the best fixed assignment in hindsight (offline optimal solution), which is given by \(C^*(\sum _{t=1}^T {\varvec{l}}_t)\). To be more specific, the goal is to minimize the following regret:

$$\begin{aligned}\text {Regret}(T) = \left\Vert \sum _{t=1}^T \varvec{\alpha }_t \odot {\varvec{l}}_t \right\Vert - C^*\left( \sum _{t=1}^T {\varvec{l}}_t\right) .\end{aligned}$$
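As an illustration, the protocol and the regret for the \(L_\infty \)-norm can be simulated in a few lines. The sketch below is a toy harness, not part of our algorithmic results: the uniform player is only a placeholder, the function names are ours, and all loaded conditions are assumed strictly positive so that \(C_\infty ^*\) has the closed form given earlier in this section.

```python
# A minimal simulation of the online load balancing protocol (L_inf norm).
# `alloc_fn` is any player strategy mapping the current cumulative losses
# to a distribution over the K servers; `uniform` is just a placeholder.

def regret_makespan(alloc_fn, load_seq, K):
    cum = [0.0] * K        # cumulative loss alpha_{t,i} * l_{t,i} per server
    total = [0.0] * K      # cumulative loaded conditions sum_t l_t
    for loads in load_seq:
        alpha = alloc_fn(cum)                  # player moves first
        for i in range(K):
            cum[i] += alpha[i] * loads[i]
            total[i] += loads[i]
    player = max(cum)                               # makespan (L_inf norm)
    offline = 1.0 / sum(1.0 / t for t in total)     # C*(sum_t l_t)
    return player - offline

def uniform(_cum):
    return [1.0 / 3] * 3
```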

2.2 Repeated Game with Vector Payoffs and Approachability

We briefly review the notion of Blackwell’s approachability, which is defined for a repeated game with vector payoffs. The game is specified by a tuple \((A, B, r, S, \text {dist})\), where A and B are convex and compact sets, \(r:A \times B \rightarrow \mathbb {R}^d\) is a vector-valued payoff function, \(S \subseteq \mathbb {R}^d\) is a convex and closed set called the target set, and \(\text {dist}:\mathbb {R}^d \times \mathbb {R}^d \rightarrow \mathbb {R}_+\) is a metric. The protocol proceeds in trials: In each round \(t=1, 2, \ldots , T\), the learner chooses a vector \(\varvec{a}_t \in A\), the environment chooses a vector \(\varvec{b}_t \in B\), and then the learner obtains a vector payoff \(\varvec{r}_t \in \mathbb {R}^d\), given by \(\varvec{r}_t = r(\varvec{a}_t, \varvec{b}_t)\). The goal of the learner is to make the average payoff vector arbitrarily close to the target set S.

Definition 1 (Approachability)

For a game \((A,B,r,S,\mathrm {dist})\), the target set S is approachable with convergence rate \(\gamma (T)\) if there exists an algorithm for the learner such that the average payoff \(\bar{\varvec{r}}_T = (1/T) \sum _{t=1}^T \varvec{r}_t\) satisfies

$$ \mathrm {dist}(\bar{\varvec{r}}_T,S) {\mathop {=}\limits ^{\mathrm {def}}} \min _{\varvec{s}\in S} \mathrm {dist}(\bar{\varvec{r}}_T, \varvec{s}) \le \gamma (T) $$

against any environment. In particular, we simply say that S is approachable if it is approachable with some convergence rate \(\gamma (T) = o(1)\).

Blackwell characterized approachability in terms of the support function, as stated below.

Definition 2

For a set \(S \subseteq \mathbb {R}^d\), the support function \(h_S: \mathbb {R}^d \rightarrow \mathbb {R}\cup \{\infty \}\) is defined as

$$ h_S(\varvec{w}) = \sup _{\varvec{s}\in S} \langle \varvec{s}, \varvec{w} \rangle . $$

It is clear from definition that \(h_S\) is convex whenever S is convex.

Definition 3

(Blackwell [4]). A game \((A,B,r,S,\mathrm {dist})\) satisfies the Blackwell condition if and only if

$$\begin{aligned} \forall \varvec{w}\in \mathbb {R}^d \; \left( \min _{\varvec{a}\in A} \max _{\varvec{b}\in B} \langle \varvec{w}, r(\varvec{a}, \varvec{b}) \rangle \le h_S(\varvec{w}) \right) . \end{aligned}$$
(1)

Remark 1

In [4], Blackwell characterized the approachability of a target set for \(L_{2}\)-norm metric in terms of the Blackwell condition.

In what follows, we consider only norm metrics, i.e., \(\mathrm {dist}(\varvec{r},\varvec{s}) = \left\Vert \varvec{r}- \varvec{s} \right\Vert \) for some norm \(\left\Vert \cdot \right\Vert \) over \(\mathbb {R}^d\). The following proposition is useful.

Proposition 1

For any \(\varvec{w}\in \mathbb {R}^d\), \(\varvec{s}^* = \arg \max _{\varvec{s}\in S} \langle \varvec{s}, \varvec{w} \rangle \) is a subgradient of \(h_S\) at \(\varvec{w}\).

Proof

For any \(\varvec{w}, {\varvec{u}}\in \mathbb {R}^d\), let \(\varvec{s}^* = \arg \max _{\varvec{s}\in S} \langle \varvec{s}, \varvec{w} \rangle \) and \(\varvec{s}^{\varvec{u}} = \arg \max _{\varvec{s}\in S} \langle \varvec{s}, {\varvec{u}} \rangle \). Since \(\langle \varvec{s}^*, {\varvec{u}} \rangle \le \langle \varvec{s}^{\varvec{u}}, {\varvec{u}} \rangle \), we have

$$\begin{aligned} h_S(\varvec{w}) - h_S({\varvec{u}})&= \sup _{\varvec{s}\in S} \langle \varvec{s}, \varvec{w} \rangle - \sup _{\varvec{s}\in S}\langle \varvec{s}, {\varvec{u}} \rangle = \langle \varvec{s}^*, \varvec{w} \rangle - \langle \varvec{s}^{\varvec{u}}, {\varvec{u}} \rangle \\&\le \langle \varvec{s}^*, \varvec{w}- {\varvec{u}} \rangle , \end{aligned}$$

which implies the proposition.   \(\square \)
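For a polytope \(S = \mathrm {conv}(V)\) given by a finite vertex list, the sup in the support function is attained at a vertex, which makes Proposition 1 easy to check numerically. The sketch below (a toy example with names of our choosing) verifies the subgradient inequality \(h_S({\varvec{u}}) \ge h_S(\varvec{w}) + \langle \varvec{s}^*, {\varvec{u}}- \varvec{w} \rangle \) on a triangle.

```python
# Support function of a polytope S = conv(V) and the subgradient property
# of Proposition 1: s* = argmax_{s in S} <s, w> satisfies, for every u,
#   h_S(u) >= h_S(w) + <s*, u - w>.

def support(vertices, w):
    """Return (h_S(w), s*); for a polytope the sup is attained at a vertex."""
    best = max(vertices, key=lambda s: sum(si * wi for si, wi in zip(s, w)))
    return sum(si * wi for si, wi in zip(best, w)), best

V = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # a triangle in the plane
```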

2.3 Online Convex Optimization

In this subsection we briefly review online convex optimization with some known results. See, e.g., [7, 14] for more details.

An online convex optimization (OCO) problem is specified by a pair (W, F), where \(W \subseteq \mathbb {R}^d\) is a compact convex set called the decision set and \(F \subseteq \{f:W \rightarrow \mathbb {R}\}\) is a set of convex functions over W called the loss function set. The OCO problem (W, F) is described by the following protocol between the learner and the adversarial environment. For each round \(t=1,2,\ldots ,T\), the learner chooses a decision vector \(\varvec{w}_t \in W\) and then receives from the environment a loss function \(f_t \in F\). In this round, the learner incurs the loss \(f_t(\varvec{w}_t)\). The goal is to make the cumulative loss of the learner nearly as small as the cumulative loss of the best fixed decision. To be more specific, the goal is to minimize the following regret:

$$ \mathrm {Regret}_{(W,F)}(T) = \sum _{t=1}^T f_t(\varvec{w}_t) - \min _{\varvec{w}\in W} \sum _{t=1}^T f_t(\varvec{w}). $$

Here we add the subscript (W, F) to distinguish it from the regret for online load balancing.

Any OCO problem can be reduced to an online linear optimization (OLO) problem, which is an OCO problem with linear loss functions. More precisely, an OLO problem is specified by (W, G), where \(G \subseteq \mathbb {R}^d\) is the set of cost vectors, such that the loss function at round t is \(\langle \varvec{g}_t, \cdot \rangle \) for some cost vector \(\varvec{g}_t \in G\). For the OLO problem (W, G), the regret of the learner is thus given by

$$ \mathrm {Regret}_{(W,G)}(T) = \sum _{t=1}^T \langle \varvec{g}_t, \varvec{w}_t \rangle - \min _{\varvec{w}\in W} \sum _{t=1}^T \langle \varvec{g}_t, \varvec{w} \rangle . $$

The reduction from OCO to OLO is simple. Run any algorithm for the OLO problem (W, G) with \(\varvec{g}_t \in \partial f_t(\varvec{w}_t)\); it then achieves \(\mathrm {Regret}_{(W,F)}(T) \le \mathrm {Regret}_{(W,G)}(T)\), provided that G is large enough, i.e., \(G \supseteq \bigcup _{f \in F, \varvec{w}\in W} \partial f(\varvec{w})\). Indeed, by convexity, \(f_t(\varvec{w}_t) - f_t(\varvec{w}) \le \langle \varvec{g}_t, \varvec{w}_t - \varvec{w} \rangle \) for every \(\varvec{w}\in W\).
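The linearization step can be illustrated on a one-dimensional toy instance with \(f_t(w) = |w - b_t|\) on \(W = [-1,1]\): feeding \(g_t = \mathrm {sign}(w_t - b_t)\) to a projected-gradient OLO algorithm, the OCO regret never exceeds the OLO regret. The instance and names below are ours, chosen only for illustration.

```python
# Linearizing OCO losses: feed g_t, a subgradient of f_t at w_t, to any
# OLO algorithm.  With f_t(w) = |w - b_t| on W = [-1, 1], a subgradient
# is sign(w_t - b_t), and convexity gives Regret_(W,F) <= Regret_(W,G).

def sign(x):
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def run_linearized(b_seq, eta):
    w, ws, gs = 0.0, [], []
    for b in b_seq:
        ws.append(w)
        g = sign(w - b)                          # subgradient of f_t at w_t
        gs.append(g)
        w = max(-1.0, min(1.0, w - eta * g))     # projected OLO step
    return ws, gs
```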

A standard FTRL (follow-the-regularized-leader) strategy for the OLO problem (W, G) is to choose \(\varvec{w}_t\) as

$$\begin{aligned} \varvec{w}_t = \arg \min _{\varvec{w}\in W} \left( \sum _{s=1}^{t-1} \langle \varvec{g}_s, \varvec{w} \rangle + \eta _t R(\varvec{w}) \right) , \end{aligned}$$
(2)

where \(R:W \rightarrow \mathbb {R}\) is a strongly convex function called the regularizer and \(\eta _t \in \mathbb {R}_+\) is a parameter. For the strategy (2), the following regret bound is known.

Proposition 2

([14]). Suppose that the regularizer \(R:W \rightarrow \mathbb {R}\) is \(\sigma \)-strongly convex w.r.t. some norm \(\Vert \cdot \Vert \), i.e., for any \(\varvec{w}, {\varvec{u}}\in W\) and any \(\varvec{z}\in \partial R(\varvec{w})\), \(R({\varvec{u}}) \ge R(\varvec{w}) + \langle \varvec{z},{\varvec{u}}-\varvec{w}\rangle + \frac{\sigma }{2}\Vert {\varvec{u}}-\varvec{w}\Vert ^2 \). Then, for the OLO problem (W, G), the regret of the strategy (2) satisfies

$$ \mathrm {Regret}_{(W,G)}(T) = O(D_R L_G\sqrt{T/\sigma }), $$

where \(D_R = \sqrt{\max _{\varvec{w}\in W}R(\varvec{w})}\), \(L_G=\max _{\varvec{g}\in G}\Vert \varvec{g}\Vert _*\) and \(\eta _t=(L_G/D_R)\sqrt{T/\sigma }\).

Note, however, that the strategy does not consider computational feasibility at all. For an efficient reduction, we need an efficient algorithm that computes a subgradient \(\varvec{g}\in \partial f(\varvec{w})\) when given (a representation of) \(f \in F\) and \(\varvec{w}\in W\), and an efficient algorithm for solving the convex optimization problem (2).
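As an example of a case where (2) is trivially efficient: with the quadratic regularizer \(R(\varvec{w}) = \Vert \varvec{w}\Vert _2^2/2\) (1-strongly convex w.r.t. the \(L_2\)-norm) and an \(L_2\)-ball decision set, the minimizer of (2) is just the Euclidean projection of \(-\sum _{s<t} \varvec{g}_s/\eta _t\) onto the ball. A minimal sketch, with function names of our own:

```python
import math

# FTRL step (2) with R(w) = ||w||_2^2 / 2 on the L2 unit ball:
#   argmin_{||w||_2 <= 1}  <theta, w> + eta * ||w||_2^2 / 2
# equals the Euclidean projection of -theta/eta onto the ball,
# where theta is the sum of the past cost vectors.

def ftrl_l2_ball(theta, eta):
    w = [-x / eta for x in theta]
    norm = math.sqrt(sum(x * x for x in w))
    if norm > 1.0:                      # project onto the unit ball
        w = [x / norm for x in w]
    return w
```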

For the particular OLO problem (W, G) with the \(L_1\)-ball decision set \(W = \{ \varvec{w}\in \mathbb {R}^d \mid \left\Vert \varvec{w} \right\Vert _1 \le 1 \}\), an algorithm called EG\(^\pm \) [8] finds the optimal solution of (2) with an entropic regularizer in linear time and achieves the following regret bound.

Theorem 2

([9]). For the OLO problem (W, G) with \(W = \{ \varvec{w}\in \mathbb {R}^d \mid \left\Vert \varvec{w} \right\Vert _1 \le 1 \}\) and \(G = \{ \varvec{g}\in \mathbb {R}^d \mid \left\Vert \varvec{g} \right\Vert _\infty \le M \}\), EG\(^\pm \) achieves

$$ \mathrm {Regret}_{(W,G)}(T) \le M\sqrt{2T \ln (2d)}. $$
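A minimal sketch of EG\(^\pm \) in FTRL form (function names are ours): it maintains exponential weights over 2d coordinates, a positive and a negative copy per dimension, and the prediction is their normalized difference, which lies in the \(L_1\) ball. With the learning rate below, the bound of Theorem 2 holds.

```python
import math

# EG± on the L1 ball, FTRL form: with theta the cumulative cost vector,
#   w_i = (exp(-eta*theta_i) - exp(+eta*theta_i)) / Z,
# where Z normalizes the 2d exponential weights.  Returns the regret
# Regret_(W,G)(T) against the best fixed point of the L1 ball.

def eg_pm_regret(cost_seq, d, M):
    T = len(cost_seq)
    eta = math.sqrt(2.0 * math.log(2 * d) / T) / M
    theta = [0.0] * d
    loss = 0.0
    for g in cost_seq:
        pos = [math.exp(-eta * t) for t in theta]
        neg = [math.exp(+eta * t) for t in theta]
        Z = sum(pos) + sum(neg)
        w = [(p - n) / Z for p, n in zip(pos, neg)]   # ||w||_1 <= 1
        loss += sum(wi * gi for wi, gi in zip(w, g))
        theta = [t + gi for t, gi in zip(theta, g)]
    best = -max(abs(t) for t in theta)    # min over the L1 ball of <theta, w>
    return loss - best
```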

3 Main Result

In this section, we propose a meta-algorithm for online load balancing, which is obtained by combining a reduction to two independent OLO problems and an OLO algorithm (as an oracle) for the reduced problems. Note that the reduced OLO problems depend on the choice of norm for online load balancing, and the OLO problems are further reduced to some optimization problems defined in terms of the norm. For efficient implementation, we assume that the optimization problems are efficiently solved.

Now we consider the online load balancing problem on K servers with respect to a norm \(\left\Vert \cdot \right\Vert \) defined over \(\mathbb {R}^K\) that satisfies Assumption 1. The reduction we show consists of three reductions: the first is to a repeated game with vector payoffs, the second is to an OCO problem, and the last is to two OLO problems. In the subsequent subsections, we give these reductions, respectively.

3.1 Reduction to a Vector Payoff Game

We will show that the online load balancing problem can be reduced to the following repeated game with vector payoffs, denoted by \(P = (A,B,r,S,\mathrm {dist})\), where

  • \(A = \varDelta (K)\),    \(B = [0,1]^K\),

  • \(r: A \times B \rightarrow \mathbb {R}^K \times \mathbb {R}^K\) is the payoff function defined as \(r(\varvec{\alpha }, {\varvec{l}}) = (\varvec{\alpha }\odot {\varvec{l}}, {\varvec{l}})\),

  • \(S = \{ (\varvec{x},\varvec{y}) \in [0,1]^K \times [0,1]^K \mid \left\Vert \varvec{x} \right\Vert \le C^*(\varvec{y}) \}\), and

  • \(\mathrm {dist}\) is the metric over \(\mathbb {R}^K \times \mathbb {R}^K\) defined as \(\mathrm {dist}(\varvec{r}, \varvec{s}) = \left\Vert \varvec{r}- \varvec{s} \right\Vert ^+\), where \(\left\Vert \cdot \right\Vert ^+\) is the norm over \(\mathbb {R}^K \times \mathbb {R}^K\) defined as

    $$ \left\Vert (\varvec{x},\varvec{y}) \right\Vert ^+ = \left\Vert \varvec{x} \right\Vert + \left\Vert \varvec{y} \right\Vert . $$

Here we use the convention that \(\mathbb {R}^{2K} = \mathbb {R}^K \times \mathbb {R}^K\). Note that the target set S is convex since \(\left\Vert \cdot \right\Vert \) is convex and \(C^*\) is concave by our assumption. Note also that it is easy to verify that \(\left\Vert \cdot \right\Vert ^+\) is a norm whenever \(\left\Vert \cdot \right\Vert \) is a norm, and its dual is

$$\begin{aligned} \left\Vert (\varvec{x},\varvec{y}) \right\Vert ^+_* = \max \{ \left\Vert \varvec{x} \right\Vert _*, \left\Vert \varvec{y} \right\Vert _* \}. \end{aligned}$$
(3)

The reduction is similar to that in [6], but there the metric is defined by the fixed \(L_2\)-norm, regardless of the norm used for online load balancing.

Proposition 3

Assume that we have an algorithm for the repeated game P that achieves convergence rate \(\gamma (T)\). Then, the algorithm, when directly applied to the online load balancing problem, achieves

$$ \mathrm {Regret}(T) \le T \gamma (T). $$

Proof

Let \(\mathcal {A}\) denote an algorithm for the repeated game P with convergence rate \(\gamma (T)\). When running \(\mathcal {A}\) against the environment of online load balancing, we observe, on each round t, the output \(\varvec{\alpha }_t \in \varDelta (K)\) of \(\mathcal {A}\) and the output \({\varvec{l}}_t \in [0,1]^K\) of the environment.

Let \((\varvec{x},\varvec{y}) = \arg \min _{(\varvec{x},\varvec{y}) \in S} \left\Vert \bar{r}_T - (\varvec{x},\varvec{y}) \right\Vert ^+\), where \(\bar{r}_T = (1/T) \sum _{t=1}^T r(\varvec{\alpha }_t,{\varvec{l}}_t)\) is the average payoff. Note that by the assumption of \(\mathcal {A}\), we have \(\left\Vert \bar{r}_T - (\varvec{x},\varvec{y}) \right\Vert ^+ \le \gamma (T)\). For simplicity, let \( L^\mathcal {A}_T = (1/T) \sum _{t=1}^T \varvec{\alpha }_t \odot {\varvec{l}}_t\) and \(L_T = (1/T) \sum _{t=1}^T {\varvec{l}}_t.\)

Then, letting \(\varvec{\alpha }' = \arg \min _{\varvec{\alpha }\in \varDelta (K)} \left\Vert \varvec{\alpha }\odot L_T \right\Vert \), we have

$$\begin{aligned} \frac{1}{T}\mathrm {Regret}(T)&= \left\Vert L^\mathcal {A}_T \right\Vert - C^*(L_T) \\&\le \left\Vert L^\mathcal {A}_T - \varvec{x} \right\Vert + C^*(\varvec{y}) - C^*(L_T) \\&\le \left\Vert L^\mathcal {A}_T - \varvec{x} \right\Vert + \left\Vert \varvec{\alpha }' \odot \varvec{y} \right\Vert - \left\Vert \varvec{\alpha }' \odot L_T \right\Vert \\&\le \left\Vert L^\mathcal {A}_T - \varvec{x} \right\Vert + \left\Vert \varvec{\alpha }' \odot (\varvec{y}- L_T) \right\Vert \\&\le \left\Vert L^\mathcal {A}_T - \varvec{x} \right\Vert + \left\Vert \varvec{y}- L_T \right\Vert = \left\Vert \bar{r}_T - (\varvec{x},\varvec{y}) \right\Vert ^+ \le \gamma (T), \end{aligned}$$

where the first inequality is from the definition of S and the triangle inequality, the second is from \(C^*(\varvec{y}) \le \left\Vert \varvec{\alpha }' \odot \varvec{y} \right\Vert \) and \(C^*(L_T) = \left\Vert \varvec{\alpha }' \odot L_T \right\Vert \), the third inequality is from the triangle inequality, and the fourth inequality is from the monotonicity of the norm. Multiplying by T gives the claimed bound.   \(\square \)

3.2 Reduction to an OCO Problem

Next we give the second sub-reduction from the repeated game P to an OCO problem. We just follow a general reduction technique of Shimkin [15] as given in the next theorem.

Theorem 3

([15]). Let \((A,B,r,S,\mathrm {dist})\) be a repeated game with vector payoffs, where \(\mathrm {dist}(\varvec{r},\varvec{s}) = \left\Vert \varvec{r}- \varvec{s} \right\Vert \) for some norm \(\left\Vert \cdot \right\Vert \) over \(\mathbb {R}^d\). Assume that we have an algorithm \(\mathcal {A}\) that witnesses the Blackwell condition, i.e., when given \(\varvec{w}\in \mathbb {R}^d\), \(\mathcal {A}\) finds \(\varvec{a}\in A\) such that \(\langle \varvec{w}, r(\varvec{a},\varvec{b}) \rangle \le h_S(\varvec{w})\) for any \(\varvec{b}\in B\). Assume further that we have an algorithm \(\mathcal {B}\) for the OCO problem (W, F), where \(W = \{ \varvec{w}\in \mathbb {R}^d \mid \left\Vert \varvec{w} \right\Vert _* \le 1 \}\) and \(F = \{ f:\varvec{w}\mapsto \langle -r(\varvec{a},\varvec{b}), \varvec{w} \rangle + h_S(\varvec{w}) \mid \varvec{a}\in A, \varvec{b}\in B \}\). Then, we can construct an algorithm for the repeated game such that its convergence rate \(\gamma (T)\) satisfies

$$ \gamma (T) \le \frac{\mathrm {Regret}_{(W,F)}(T)}{T}. $$

Moreover, the algorithm runs in polynomial time (per round) if \(\mathcal {A}\) and \(\mathcal {B}\) are polynomial time algorithms.

For completeness, the reduction algorithm (Algorithm 1) is shown below.

[Algorithm 1]
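To make the reduction of Theorem 3 concrete, the sketch below runs the reduction loop on a toy scalar game of our own choosing (not an instance from the paper): \(A = B = [0,1]\), \(r(a,b) = a - b\), and target set \(S = \{0\}\), for which \(h_S \equiv 0\). The Blackwell oracle plays \(a = 0\) if \(w \ge 0\) and \(a = 1\) otherwise, so \(\langle w, r(a,b) \rangle \le 0\) for every b, and the OCO part is projected gradient descent on \(W = [-1,1]\) with losses \(f_t(w) = -r_t w\). All names are illustrative.

```python
import math
import random

# A toy run of the Theorem 3 reduction: the average payoff of the game
# approaches the target S = {0} at the rate Regret_(W,F)(T) / T.

def approach_zero(b_seq):
    T = len(b_seq)
    eta = 2.0 / math.sqrt(T)            # OGD step size
    w, total = 0.0, 0.0
    for b in b_seq:
        a = 0.0 if w >= 0 else 1.0      # Blackwell oracle: <w, r(a,b)> <= 0
        r = a - b                       # this round's payoff
        total += r
        w = max(-1.0, min(1.0, w + eta * r))   # OGD step on the gradient -r
    return abs(total / T)               # distance of the average payoff to S

random.seed(1)
b_seq = [random.random() for _ in range(400)]
```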

It remains to show in this subsection the existence of the algorithm \(\mathcal {A}\) required for the reduction stated in the theorem above. In other words, we show that the Blackwell condition holds for our game \(P = (\varDelta (K), [0,1]^K, r, S, \mathrm {dist})\), where \(r(\varvec{\alpha },{\varvec{l}}) = (\varvec{\alpha }\odot {\varvec{l}}, {\varvec{l}}) \in \mathbb {R}^K \times \mathbb {R}^K\), \(S = \{(\varvec{x},\varvec{y}) \in [0,1]^K \times [0,1]^K \mid \left\Vert \varvec{x} \right\Vert \le C^*(\varvec{y}) \}\), and \(\mathrm {dist}(\varvec{r},\varvec{s}) = \left\Vert \varvec{r}- \varvec{s} \right\Vert ^+\).

Lemma 1

The Blackwell condition holds for game P. That is, for any \(\varvec{w}\in \mathbb {R}^K \times \mathbb {R}^K\), we have

$$ \min _{\varvec{\alpha }\in \varDelta (K)} \max _{{\varvec{l}}\in [0,1]^K} \langle \varvec{w}, r(\varvec{\alpha }, {\varvec{l}}) \rangle \le h_S(\varvec{w}). $$

Proof

Let \(\varvec{w}= (\varvec{w}_1,\varvec{w}_2) \in \mathbb {R}^K \times \mathbb {R}^K\). By the definition of r, the inner product in the Blackwell condition can be rewritten as the bilinear function

$$ f(\varvec{\alpha },{\varvec{l}})=\langle \varvec{w}, r(\varvec{\alpha }, {\varvec{l}}) \rangle = \sum _{i=1}^K w_{1,i} \alpha _i l_i + \sum _{i=1}^{K} w_{2,i} l_i $$

over \(\varDelta (K) \times [0,1]^K\). Therefore, f satisfies the conditions of von Neumann's minimax theorem, and we have

$$ \min _{\varvec{\alpha }\in \varDelta (K)} \max _{{\varvec{l}}\in [0,1]^K} f(\varvec{\alpha }, {\varvec{l}}) = \max _{{\varvec{l}}\in [0,1]^K} \min _{\varvec{\alpha }\in \varDelta (K)} f(\varvec{\alpha }, {\varvec{l}}). $$

Let \({\varvec{l}}^* = \arg \max _{{\varvec{l}}\in [0,1]^K} \min _{\varvec{\alpha }\in \varDelta (K)} f(\varvec{\alpha },{\varvec{l}})\) and \(\varvec{\alpha }^* = \arg \min _{\varvec{\alpha }\in \varDelta (K)} \left\Vert \varvec{\alpha }\odot {\varvec{l}}^* \right\Vert \). Note that by the definition of S, we have \((\varvec{\alpha }^{*} \odot {\varvec{l}}^*, {\varvec{l}}^*) \in S\). Hence we get

$$\begin{aligned} \min _{\varvec{\alpha }\in \varDelta (K)} \max _{{\varvec{l}}\in [0,1]^K} f(\varvec{\alpha }, {\varvec{l}})&= \max _{{\varvec{l}}\in [0,1]^K} \min _{\varvec{\alpha }\in \varDelta (K)} f(\varvec{\alpha }, {\varvec{l}}) \\&= f(\varvec{\alpha }^*, {\varvec{l}}^*) \\&= \langle \varvec{w}, ((\varvec{\alpha }^{*} \odot {\varvec{l}}^*),{\varvec{l}}^*) \rangle \\&\le \sup _{\varvec{s}\in S} \langle \varvec{w}, \varvec{s} \rangle \\&= h_S(\varvec{w}), \end{aligned}$$

which completes the proof.    \(\square \)

The lemma ensures the existence of the algorithm \(\mathcal {A}\). On the other hand, for the algorithm \(\mathcal {B}\) we need to consider the OCO problem (W, F), where the decision set is

$$\begin{aligned} W = \{ \varvec{w}\in \mathbb {R}^K \times \mathbb {R}^K \mid \left\Vert \varvec{w} \right\Vert ^+_* \le 1 \}, \end{aligned}$$
(4)

and the loss function set is

$$\begin{aligned} F = \{ f:\varvec{w}\mapsto \langle -r(\varvec{\alpha },{\varvec{l}}), \varvec{w} \rangle + h_S(\varvec{w}) \mid \varvec{\alpha }\in \varDelta (K), {\varvec{l}}\in [0,1]^K \}. \end{aligned}$$
(5)

Since W is a compact convex set and F consists of convex functions, we can apply a number of existing OCO algorithms to obtain \(\mathrm {Regret}_{(W,F)}(T) = O(\sqrt{T})\). In the next subsection, we show that the problem can be simplified further to two OLO problems.

3.3 Reduction to Two OLO Problems

Consider the OCO problem (W, F) given by (4) and (5). Following the standard reduction technique from OCO to OLO stated in Sect. 2.3, we obtain an OLO problem (W, G) to cope with, where \(G \subseteq \mathbb {R}^K \times \mathbb {R}^K\) is any set of cost vectors that satisfies

(6)

By (3), the decision set W can be rewritten as \(W = B_*(K) \times B_*(K)\) where \(B_*(K) = \{ \varvec{w}\in \mathbb {R}^K \mid \left\Vert \varvec{w} \right\Vert _* \le 1\}\) is the K-dimensional unit ball with respect to the dual norm \(\left\Vert \cdot \right\Vert _*\). By Proposition 1, any \(\varvec{s}\in \partial h_S(\varvec{w})\) is in the target set S, which is a subset of \([0,1]^K \times [0,1]^K\). Moreover, \(r(\varvec{\alpha },{\varvec{l}}) = (\varvec{\alpha }\odot {\varvec{l}}, {\varvec{l}}) \in [0,1]^K \times [0,1]^K\) for any \(\varvec{\alpha }\in \varDelta (K)\) and \({\varvec{l}}\in [0,1]^K\). Therefore, \(G = [-1,1]^K \times [-1,1]^K\) satisfies (6).

Thus, \((B_*(K) \times B_*(K), [-1,1]^K \times [-1,1]^K)\) is a suitable OLO problem reduced from the OCO problem (W, F). Furthermore, we can break this OLO problem into two independent OLO problems \((B_*(K), [-1,1]^K)\) in a straightforward way: make two copies of an OLO algorithm \(\mathcal {C}\) for \((B_*(K), [-1,1]^K)\), denoted by \(\mathcal {C}_1\) and \(\mathcal {C}_2\), and use them for predicting the first-half and second-half decision vectors, respectively. More precisely, on each round t, (1) receive predictions \(\varvec{w}_{t,1} \in B_*(K)\) and \(\varvec{w}_{t,2} \in B_*(K)\) from \(\mathcal {C}_1\) and \(\mathcal {C}_2\), respectively, (2) output their concatenation \(\varvec{w}_t = (\varvec{w}_{t,1}, \varvec{w}_{t,2}) \in W\), (3) receive a cost vector \(\varvec{g}_t = (\varvec{g}_{t,1}, \varvec{g}_{t,2}) \in [-1,1]^K \times [-1,1]^K\) from the environment, and (4) feed \(\varvec{g}_{t,1}\) and \(\varvec{g}_{t,2}\) to \(\mathcal {C}_1\) and \(\mathcal {C}_2\), respectively, to make them proceed.
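The concatenation procedure can be sketched generically as follows; the sub-algorithm here is a simple projected-gradient method on the \(L_2\) ball standing in for \(\mathcal {C}\), and all class and function names are hypothetical.

```python
# Two independent copies of an OLO sub-algorithm on the product domain:
# predictions are concatenated, the cost vector is split into halves.

class OGDBall:
    """A simple OLO sub-algorithm on the L2 unit ball (a stand-in)."""
    def __init__(self, dim, eta):
        self.w = [0.0] * dim
        self.eta = eta
    def predict(self):
        return list(self.w)
    def update(self, g):
        w = [wi - self.eta * gi for wi, gi in zip(self.w, g)]
        norm = sum(x * x for x in w) ** 0.5
        self.w = [x / max(1.0, norm) for x in w]   # project onto the ball

def combined_round(c1, c2, g):
    """One round on the product domain: concatenate, then split the cost."""
    K = len(g) // 2
    w = c1.predict() + c2.predict()
    c1.update(g[:K])
    c2.update(g[K:])
    return w
```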

It is clear that the procedure above ensures the following lemma.

Lemma 2

The OCO problem (W, F) defined by (4) and (5) can be reduced to the OLO problem \((B_*(K),[-1,1]^K)\), and

$$ \mathrm {Regret}_{(W,F)}(T) \le 2 \mathrm {Regret}_{(B_*(K),[-1,1]^K)}(T). $$

3.4 Putting All the Pieces Together

Combining all reductions stated in the previous subsections, we get an all-in-one algorithm as described in Algorithm 2.

[Algorithm 2]

It is clear that combining Proposition 3, Theorem 3 and Lemma 2, we get the following regret bound of Algorithm 2.

Theorem 4

Algorithm 2 achieves

$$ \mathrm {Regret}(T) \le 2 \mathrm {Regret}_{(B_*(K),[-1,1]^K)}(T), $$

where the regret on the right-hand side is the regret of algorithm \(\mathcal {C}_1\) (and of \(\mathcal {C}_2\) as well). Moreover, if \(\mathcal {A}\), \(\mathcal {B}\) and \(\mathcal {C}_1\) run in polynomial time, then Algorithm 2 runs in polynomial time (per round).

By applying the FTRL as in (2) to the OLO problem \((B_*(K),[-1,1]^K)\) with a strongly convex regularizer R, Proposition 2 implies the following regret bound.

Corollary 1

Assume that there exists a regularizer \(R:B_*(K) \rightarrow \mathbb {R}\) that is \(\sigma \)-strongly convex w.r.t. \(L_1\)-norm. Then, there exists an algorithm for the online load balancing problem that achieves

$$\begin{aligned} \mathrm {Regret}(T) = O(D_R \sqrt{T/\sigma }), \end{aligned}$$

where \(D_R = \sqrt{\max _{\varvec{w}\in B_*(K)}R(\varvec{w})}\).

In particular, for the OLO problem \((B_1(K),[-1,1]^K)\), the algorithm EG\(^\pm \) achieves the \(\sqrt{2T \ln 4K}\) regret bound shown in Theorem 2. Thus we have an \(O(\sqrt{T \ln K})\) regret bound for online load balancing with respect to the \(L_\infty \)-norm (i.e., w.r.t. the makespan), which improves the bound of [6] by a factor of \(\sqrt{\ln K}\). Moreover, for the \(L_\infty \)-norm, it turns out that we have polynomial time algorithms for \(\mathcal {A}\) and \(\mathcal {B}\), which we give in the next section. We thus obtain the following corollary.

Corollary 2

There exists a polynomial time (per round) algorithm for the online load balancing problem with respect to \(L_\infty \)-norm that achieves

$$\begin{aligned}\mathrm {Regret}(T) \le 2\sqrt{2T \ln 4K}.\end{aligned}$$

4 Algorithmic Details for \(L_\infty \)-norm

In this section we give details of Algorithm 2 for the makespan problem, i.e., for \(L_\infty \)-norm.

4.1 Computing \(\alpha _t\)

First, we give the details of the implementation of \(\mathcal {A}\) in Algorithm 2. Specifically, on round t, we need to choose \(\varvec{\alpha }_{t}\) as the optimal solution of the problem in Lemma 1. That is,

$$\begin{aligned} \min _{\varvec{\alpha } \in \varDelta (K)}\max _{{\varvec{l}}\in [0,1]^{K}}\langle \varvec{w}_{1}, (\varvec{\alpha } \odot {\varvec{l}})\rangle +\langle \varvec{w}_{2},{\varvec{l}}\rangle , \end{aligned}$$
(7)

where \(\varvec{w}=(\varvec{w}_{1},\varvec{w}_{2})\) with \(\varvec{w}_{1}, \varvec{w}_{2} \in \mathbb {R}^K\). The inner maximum is attained by setting \(l_{i}=0\) if \(w_{1,i} \alpha _i+w_{2,i} \le 0\) and \(l_i=1\) otherwise. Hence the problem of choosing \(\varvec{\alpha }\) becomes

$$\min _{\varvec{\alpha } \in \varDelta (K)}\max _{\varvec{l} \in [0,1]^{K}}\langle \varvec{w}_{1}, (\varvec{\alpha } \odot \varvec{l})\rangle +\langle \varvec{w}_{2},\varvec{l} \rangle =\min _{\varvec{\alpha } \in \varDelta (K)}\sum _{i=1}^{K}\max \left\{ 0, \alpha _i w_{1,i}+w_{2,i}\right\} ,$$

which is equivalent to

$$\begin{aligned} \min _{\varvec{\alpha }\in \varDelta (K),\varvec{\beta }\ge \varvec{0}} \sum _{i=1}^{K} \beta _i \quad \mathrm {s{.}t.}~\beta _i \ge w_{1,i} \alpha _i+ w_{2,i} \quad \forall i=1,\dots ,K. \end{aligned}$$

The above problem is a linear program with O(K) variables and O(K) linear constraints. Thus \(\varvec{\alpha }_t\), the solution of problem (7), can be computed in polynomial time.
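Assuming SciPy is available, the linear program above can be solved directly with `scipy.optimize.linprog`; the sketch below (illustrative function names) stacks the variables as \((\varvec{\alpha }, \varvec{\beta }) \in \mathbb {R}^{2K}\).

```python
import numpy as np
from scipy.optimize import linprog

# The LP for alpha_t: minimize sum_i beta_i subject to
# beta_i >= w1_i * alpha_i + w2_i, beta >= 0, alpha in the simplex.

def compute_alpha(w1, w2):
    K = len(w1)
    c = np.concatenate([np.zeros(K), np.ones(K)])    # objective: sum of beta
    A_ub = np.hstack([np.diag(w1), -np.eye(K)])      # w1_i*alpha_i - beta_i <= -w2_i
    b_ub = -np.asarray(w2, dtype=float)
    A_eq = np.concatenate([np.ones(K), np.zeros(K)])[None, :]   # sum alpha = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0.0, 1.0)] * K + [(0.0, None)] * K)
    return res.x[:K], res.fun
```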

4.2 Computing Subgradients \(\varvec{s}_t\) for the \(L_\infty \)-norm

The second component of Algorithm 2 is the algorithm \(\mathcal {B}\), which computes subgradients \(\varvec{s}_t \in \partial h_S(\varvec{w}_t)\). By Proposition 1, we have \(\varvec{s}_t=\arg \max _{\varvec{s}\in S}\langle \varvec{s}, \varvec{w}_{t} \rangle .\) Recall that \(S=\{ (\varvec{x}, \varvec{y}) \in [0,1]^K \times [0,1]^K \mid \Vert \varvec{x}\Vert _\infty \le C_\infty ^*(\varvec{y})\}\). In particular, the condition that \(\Vert \varvec{x}\Vert _\infty \le C_\infty ^*(\varvec{y})\) can be represented as

$$\begin{aligned}&\max _i x_i \le \min _{\varvec{\alpha }\in \varDelta (K)}\Vert \varvec{\alpha } \odot \varvec{y}\Vert _{\infty } \iff x_i \le \frac{1}{\sum _{j=1}^K \frac{1}{y_j}}, {\forall i}. \end{aligned}$$
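The right-hand side is the harmonic-mean bound: the minimizing allocation equalizes the loads, i.e., \(\alpha _i \propto 1/y_i\). A quick numerical sanity check of this identity (plain NumPy; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0.2, 1.0, size=4)     # positive loads y_j
c = 1.0 / np.sum(1.0 / y)             # claimed minimum of ||alpha ⊙ y||_inf
alpha = c / y                         # balancing allocation alpha_i = c / y_i
assert np.isclose(alpha.sum(), 1.0)   # alpha lies on the simplex
assert np.allclose(alpha * y, c)      # every coordinate of alpha ⊙ y equals c
# Optimality: if max_i alpha'_i y_i < c for some simplex point alpha',
# then alpha'_i < c / y_i for all i and the alpha'_i could not sum to 1.
other = rng.dirichlet(np.ones(4))
assert np.max(other * y) >= c - 1e-12
```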

Therefore, the computation of the subgradient \(\varvec{s}_t\) is formulated as

$$\begin{aligned} \max _{\varvec{x},\varvec{y}\in [0,1]^K} \langle \varvec{w}_{1}, \varvec{x}\rangle + \langle \varvec{w}_{2}, \varvec{y} \rangle \quad \mathrm {s{.}t.}~x_{i} \le \frac{1}{\sum _{j}\frac{1}{y_{j}}}, \quad \forall i=1,\dots ,K. \end{aligned}$$
(8)

Now we show that there exists an equivalent second order cone programming (SOCP) formulation (e.g., [11]) for this problem.

First we give the definition of second order cone programming, and then a proposition that allows us to rewrite our optimization problem as an SOCP.

Definition 4

The standard form of the second order cone programming (SOCP) model is as follows:

$$\min _{\varvec{x}} \langle \varvec{c},\varvec{x}\rangle ~\text {s.t.}~A\varvec{x}=\varvec{b}, \Vert C_{i}\varvec{x}+\varvec{d}_{i} \Vert _{2} \le \varvec{e}_{i}^{\top } \varvec{x}+\varvec{f}_{i} \quad \text {for}~i=1, \cdots , m,$$

where the problem parameters are \(\varvec{c}\in \mathbb {R}^{n},\) \(C_{i} \in \mathbb {R}^{n_{i} \times n},\) \(\varvec{d}_{i} \in \mathbb {R}^{n_{i}},\) \(\varvec{e}_{i}\in \mathbb {R}^{n},\) \(f_{i}\in \mathbb {R},\) \(A\in \mathbb {R}^{p \times n},\) and \(\varvec{b}\in \mathbb {R}^{p},\) and \(\varvec{x}\in \mathbb {R}^{n}\) is the optimization variable.

Then we obtain the following proposition.

Proposition 4

Let \(x \ge 0\) and \(y_{i} > 0\) for \(i=1,\dots ,K\). Then \(\sum _{i=1}^{K}\frac{x^{2}}{y_{i}} \le x\) holds if and only if there exist \(z_{i} \ge 0\) with \(\sum _{i=1}^{K}z_{i}=x\) such that \(x^{2} \le y_{i}z_{i}\) for all \(i\).

Proof

For the direction “\(\Rightarrow \)”: if \(x=0\), we can take \(z_{i}=0\) for all i. Otherwise, from \(\sum _{i=1}^{K}\frac{x^{2}}{y_{i}} \le x\) we obtain \(\sum _{i=1}^{K}\frac{x}{y_{i}} \le 1.\) By setting

$$z_{i}=x \cdot \frac{\frac{1}{y_{i}}}{\sum _{j=1}^{K}\frac{1}{y_{j}}},$$

we have \(y_{i}z_{i}=x/\sum _{j=1}^{K}\frac{1}{y_{j}} \ge x \cdot x = x^{2}\) and \(\sum _{i=1}^{K}z_{i}=x.\)

For the direction “\(\Leftarrow \)”: from \(x^{2} \le y_{i}z_{i}\) we have \(\frac{x^{2}}{y_{i}} \le z_{i},\) so that

$$\sum _{i=1}^{K}\frac{x^{2}}{y_{i}} \le \sum _{i=1}^{K}z_{i} =x.$$

   \(\square \)
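The witness \(z_i\) constructed in the proof can be checked numerically. A small NumPy sketch on random data (all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.uniform(0.5, 1.0, size=5)      # y_i > 0
inv_sum = np.sum(1.0 / y)
x = 0.9 / inv_sum                      # ensures sum_i x^2 / y_i <= x
assert np.sum(x**2 / y) <= x + 1e-12
# Witness from the proof: z_i = x * (1/y_i) / sum_j (1/y_j).
z = x * (1.0 / y) / inv_sum
assert np.isclose(z.sum(), x)          # sum_i z_i = x
assert np.all(x**2 <= y * z + 1e-12)   # x^2 <= y_i z_i for every i
```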

Returning to our problem, we need to find the optimal vector \(\varvec{s}\in S\) that maximizes \(\langle \varvec{s}, \varvec{w}_{t}\rangle .\) We reduce this problem to an SOCP in the following theorem.

Theorem 5

The optimization problem (8) can be solved by the second order cone programming.

Proof

To prove this theorem we only need to represent the original problem (8) in the standard form of an SOCP. Note that we only consider the case where \(y_{i}\ne 0\) for all \(i=1,\dots ,K\); the case where \(y_i=0\) for some i is trivial, since by the definition of S we then have \(x_{i}=0\) for all i, and the resulting problem is a linear program, which is a special case of the SOCP. Now we assume that \(y_i \ne 0\) for \(i=1,\dots , K\). For \(x_{i} \le \frac{1}{\sum _{j}\frac{1}{y_{j}}},\) we multiply both sides by \(x_{i}\) and rearrange the inequality:

$$\sum _{j=1}^{K} \frac{x_{i}^{2}}{y_{j}} \le x_{i}.$$

By Proposition 4, this is equivalent to

$$y_{j}z_{i,j} \ge x_{i}^{2}, \quad y_{j},z_{i,j} \ge 0, \quad \sum _{j=1}^{K}z_{i,j} =x_{i}.$$

By [11], we may rewrite these hyperbolic constraints as follows: for each i,

$$\begin{aligned} x_{i}^{2} \le y_{j}z_{i,j}; \quad y_{j}, z_{i,j} \ge 0 \Longleftrightarrow \Vert (2x_{i}, y_{j}-z_{i,j})\Vert _{2} \le y_{j}+z_{i,j} \quad \forall j=1,\dots ,K. \end{aligned}$$
(9)

The above equivalence is straightforward to verify. On the other hand, since \(x_{i} \le \frac{1}{\sum _{j}\frac{1}{y_{j}}}\) and \(y_{i} \in [0,1],\) we automatically have \(x_{i} \in [0,1],\) so we need only constrain \(y_{i}\in [0,1].\) Here we can use the fact that \(\vert y_{i} \vert =y_{i}\) when \(y_{i}\ge 0\), so the constraint \(y_{i} \le 1\) can be written as \(\Vert y_{i}\Vert _{2} \le 1.\) Therefore each \(C_{i}\) in the SOCP is a \((K^{2}+2K) \times (K^{2}+2K)\) matrix, and the variable vector is composed as follows:

$$\begin{aligned} \varvec{\tilde{x}}=(x_{1}, \cdots , x_{K},y_{1}, \cdots , y_{K}, z_{1,1}, \cdots , z_{1,K}, \cdots , z_{K,1} \cdots , z_{K,K}), \end{aligned}$$
(10)

where the first index i of \(z_{i,j}\) corresponds to \(x_{i}.\)

Now we may give the second order cone programming of our target problem as follows:

$$\begin{aligned} \begin{aligned}&\min _{\varvec{\tilde{x}}} \langle -(\varvec{w}_{1},\varvec{w}_{2}, 0, \cdots ,0), \varvec{\tilde{x}} \rangle \\&\mathrm {s{.}t.}~\Vert C_{i} \varvec{\tilde{x}} \Vert _{2} \le \varvec{e}_{i}^{\top } \varvec{\tilde{x}} +f_{i} \quad \forall i=1, \cdots , K^{2}+2K, \\&A\varvec{\tilde{x}}=\varvec{b}, \end{aligned} \end{aligned}$$
(11)

where \(C_{i},\) \(\varvec{e}_{i},\) \(f_{i},\) A and \(\varvec{b}\) are defined as follows:

First, the matrices \(C_{i}\) for the hyperbolic constraints (9) are given as follows. Fix \(s\in [K],\) where \([K]=\{1, \cdots , K\},\) and write \(i=(s-1)K+j\) with \(j\in [K];\) constraint i couples \(x_{s},\) \(y_{j}\) and \(z_{s,j}.\) In \(C_{i}\) we let \((C_{i})_{1,s}=2,\) \((C_{i})_{2,K+j}=1\) and \((C_{i})_{2,2K+(s-1)K+j}=-1,\) with all other entries 0, so that the only nonzero components of \(C_{i}\varvec{\tilde{x}}\) are \(2x_{s}\) and \(y_{j}-z_{s,j}.\) The vector \(\varvec{e}_{i}\) is defined by \((\varvec{e}_{i})_{K+j}=1\) and \((\varvec{e}_{i})_{2K+(s-1)K+j}=1,\) with all other entries 0, and \(f_{i}=0,\) so that \(\varvec{e}_{i}^{\top }\varvec{\tilde{x}}+f_{i}=y_{j}+z_{s,j}.\)

Next we need to constrain that each \(y_{j}\) is at most 1. For \(i=K^{2}+j\) with \(j\in [K],\) we let \((C_{i})_{1,K+j}=1\) with all other entries 0, \(\varvec{e}_{i}=\varvec{0},\) and \(f_{i}=1;\) this encodes \(\Vert y_{j}\Vert _{2} \le 1.\) For \(i=K^{2}+K+j\) with \(j \in [K],\) we set \((C_{i})_{1,K+j}=1,\) \((\varvec{e}_{i})_{K+j}=1,\) and \(f_{i}=0,\) which encodes \(y_{j} \ge 0.\)

Finally we need to constrain that \(\sum _{j=1}^{K}z_{i,j}=x_{i}\) in Eq. 9. Let \(A \in \mathbb {R}^{K \times (K^{2}+2K)};\) for each row vector \(A_{j}\) with \(j \in [K],\) we set \((A_{j})_{j}=1\) and \((A_{j})_{2K+(j-1)K+m}=-1\) for all \(m=1,\cdots , K,\) with all other entries 0. The matrix A is composed of the row vectors \(A_{j},\) and \(\varvec{b}\) is the zero vector.    \(\square \)

5 Conclusion

In this paper we give a framework for the online load balancing problem by reducing it to two OLO problems. Moreover, for online load balancing with respect to the \(L_\infty \)-norm, we achieve the best known regret bound in polynomial time. First, we reduce online load balancing with a norm \(\Vert \cdot \Vert \) to a vector payoff game measured by the combination norm \(\Vert \cdot \Vert ^{+}.\) Next, following [15], this vector payoff game is reduced to an OCO problem. Finally, we reduce this OCO problem to two independent OLO problems. In particular, for the makespan, we give an efficient algorithm that achieves the best known regret bound \(O(\sqrt{T \ln K})\) by solving a linear program and a second order cone program in each trial. Recently, Kwon [10] proposed a similar reduction using another type of induced norm instead of our combination norm.

Several open problems remain in this topic. For instance, no efficient algorithm is known for online load balancing with respect to a general norm or the p-norm; Kwon's reduction [10] might be helpful for this question. Furthermore, lower bounds for online load balancing are still unknown.