Numerical Methods for the Resource Allocation Problem in a Computer Network

Vorontsova, E. A.; Gasnikov, A. V.; Dvurechensky, P. E.; Ivanova, A. S.; Pasechnyuk, D. A.

doi:10.1134/S0965542521020135

INFORMATION SCIENCE
Published: 07 April 2021

Volume 61, pages 297–328, (2021)

Computational Mathematics and Mathematical Physics

E. A. Vorontsova¹,
A. V. Gasnikov¹,²,
P. E. Dvurechensky³,²,
A. S. Ivanova⁴ &
D. A. Pasechnyuk¹

Abstract

The resource allocation problem in computer networks with a large number of links is considered. The links are used by consumers (users), whose number can also be very large. For the dual problem, numerical optimization methods are proposed, such as the fast gradient method, the stochastic projected subgradient method, the ellipsoid method, and the random gradient extrapolation method. A convergence rate estimate is obtained for each of the methods. Algorithms for distributed computation of steps in the considered methods as applied to computer networks are described. Special attention is given to the primal-dual property of the proposed algorithms.

1 INTRODUCTION

1.1 Motivation

In this paper, the problem of controlling modern communication networks is considered from the point of view of optimization and stochastic modeling. To solve problems of this type, we need to formulate and analyze the mathematical model that arises in the simulation of large-scale broadband networks. Future communication networks are expected to carry applications that can adapt their data transmission rates to the available network capacity. An example of such traffic is TCP traffic over the Internet.

The key issue addressed in this paper is how the available capacity of the network is to be allocated among competing flows. The use of available capacities by consumers is controlled by correcting the link prices.

Thus, we consider the problem of optimizing resource allocation in computer networks with a large number of links. The links are used by consumers (users), whose number can also be very large. The goal of this study is to determine a resource allocation mechanism, where the resources are understood as available capacities of network links. Additionally, it is necessary to ensure stable performance of the system and to prevent overloads. As an optimality criterion, we use the sum of the utilities of all users of the computer network.

Originally, standard resource allocation problems reducing to the maximization of the aggregate utility of users in the case of shared use of available resources were considered in [1]. Resource allocation in computer networks was investigated in the recent work [2]. Proposed in [3], the mechanisms of decentralized resource allocation drew much attention in economic studies (see, e.g., [4–6] and references therein). In this paper, following [7, 8], we consider various mechanisms of price adjustment. The proposed approaches are of practical importance due to their decentralized nature, which means that the price of an individual link is established and adjusted relying only on the reactions of users employing this link, rather than on the reactions of all network users. In the case of such an adjustment mechanism, all links perform independently.

Additionally, one of the approaches proposed in this paper relies on the stochastic projected subgradient method and overcomes the following difficulty arising in actual networks: data packets sent by users arrive at a link at different times, so the total traffic through the link is not known in practice. Stochastic methods avoid this difficulty: they do not need the exact value of the total traffic and work with an estimate of it, which can be obtained from the traffic of a single user. The idea of using the stochastic projected subgradient method for solving this problem was proposed in [2].

1.2 Content of This Paper

This paper is organized as follows. The formulation of the problem and the construction of its dual are described in Section 2. Additionally, we state all necessary assumptions for the primal problem. In Section 3, the problem is solved by applying Nesterov’s fast gradient method [9], whose complexity bound is found to be $O\left( {\tfrac{1}{{\sqrt \varepsilon }}} \right)$. In Section 4, this problem is solved using the stochastic projected subgradient method with $O\left( {\tfrac{1}{{{{\varepsilon }^{2}}}}} \right)$ complexity bounds.

In Section 5, the problem is solved by applying the ellipsoid method, which is well suited for low-dimensional problems, and an algorithm for constructing the accuracy certificate for this method is described. We present complexity bounds on the order of $O\left( {{{m}^{2}}\ln\tfrac{1}{\varepsilon }} \right)$, where $m$ is the number of links in the network. A regularization technique for recovering the solution of the primal problem from the solution of the dual one if the method is not primal-dual is described in Section 6. The regularized problem is solved using the random gradient extrapolation method in Section 7. Its complexity bounds are presented, which are on the order of $O\left( {\tfrac{1}{{\sqrt \varepsilon }}\ln\tfrac{1}{\varepsilon }} \right)$, where the logarithmic factor appears due to the regularization of the dual problem.

Numerical experiments supporting the theoretical results obtained in the preceding sections are presented in Section 8.

Additionally, for each algorithm, we describe its distributed computation in the context of the problem under consideration.

2 FORMULATION OF THE PROBLEM

Consider a computer network with $m$ links and $n$ users (or nodes), see Fig. 1.

The users exchange data packets through a fixed set of links. The network structure is specified by the routing matrix $C = (C_{i}^{j}) \in {{{\mathbf{R}}}^{{m \times n}}}$. The matrix columns ${{{\mathbf{C}}}_{i}} \ne 0$, $i = 1, \ldots ,n$, are $m$-dimensional Boolean vectors such that $C_{i}^{j} = 1$ if node $i$ uses link $j$ and $C_{i}^{j} = 0$ otherwise. The link capacities are described by a vector ${\mathbf{b}} \in {{{\mathbf{R}}}^{m}}$ with strictly positive components.

The users estimate the performance of the network with the help of utility functions ${{u}_{k}}({{x}_{k}})$, $k = 1, \ldots ,n$, where ${{x}_{k}} \in {{{\mathbf{R}}}_{ + }}$ is the rate of data transmission from the $k$th user. As an optimality criterion for the system, we use the sum of the utility functions for all users [1].

The problem of maximizing the aggregate utility of the network under constraints imposed on the link capacities is stated as follows:

$$\mathop {\max}\limits_{\left\{ {C{\mathbf{x}} = \sum\limits_{k = 1}^n \,{{{\mathbf{C}}}_{k}}{{x}_{k}}} \right\} \leqslant {\mathbf{b}}} \left\{ {U({\mathbf{x}}) = \sum\limits_{k = 1}^n \,{{u}_{k}}({{x}_{k}})} \right\},$$

(1)

where ${\mathbf{x}} = ({{x}_{1}}, \ldots ,{{x}_{n}}) \in \mathbb{R}_{ + }^{n}$. The solution of this problem is an optimal resource allocation ${\mathbf{x}}\text{*}$.

Consider the standard transition to the dual problem for (1). Given a vector of dual multipliers $\boldsymbol{\lambda} = ({{\lambda }_{1}}, \ldots ,{{\lambda }_{m}}) \in \mathbb{R}_{ + }^{m}$, which can be interpreted as the price vector of the links, the dual objective function is defined as

$$\varphi (\boldsymbol{\lambda} ) = \mathop {\max}\limits_{{\mathbf{x}} \in {\mathbf{R}}_{ + }^{n}} \left\{ {\sum\limits_{k = 1}^n \,{{u}_{k}}({{x}_{k}}) + \left\langle {\boldsymbol{\lambda} ,{\mathbf{b}} - \sum\limits_{k = 1}^n \,{{{\mathbf{C}}}_{k}}{{x}_{k}}} \right\rangle } \right\} = \left\langle {\boldsymbol{\lambda} ,{\mathbf{b}}} \right\rangle + \sum\limits_{k = 1}^n \,\left( {{{u}_{k}}({{x}_{k}}(\boldsymbol{\lambda} )) - \left\langle {\boldsymbol{\lambda} ,{{{\mathbf{C}}}_{k}}{{x}_{k}}(\boldsymbol{\lambda} )} \right\rangle } \right),$$

(2)

Here, the users choose optimal data transmission rates ${{x}_{k}}(\boldsymbol{\lambda} )$ by solving the optimization problem

$${{x}_{k}}(\boldsymbol{\lambda} ) = \mathop {\arg\,\max}\limits_{{{x}_{k}} \in {{{\mathbf{R}}}_{ + }}} \,\left\{ {{{u}_{k}}({{x}_{k}}) - {{x}_{k}}\left\langle {\boldsymbol{\lambda} ,{{{\mathbf{C}}}_{k}}} \right\rangle } \right\}.$$

(3)

Let ${\mathbf{x}}(\boldsymbol{\lambda} )$ denote the vector with components ${{x}_{k}}(\boldsymbol{\lambda} )$. Then, to find optimal prices $\boldsymbol{\lambda} \text{*}$, we need to solve the problem

$$\mathop {\min}\limits_{\boldsymbol{\lambda} \in {\mathbf{R}}_{ + }^{m}} \varphi (\boldsymbol{\lambda} ).$$

(4)
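To make formulas (2)–(4) concrete, the following minimal sketch evaluates the user responses, the dual objective, and the vector ${\mathbf{b}} - C{\mathbf{x}}(\boldsymbol{\lambda})$ on a toy network. The quadratic utilities $u_k(x) = a_k x - \tfrac{\mu}{2}x^2$, the function names, and the numerical data are assumptions made only for illustration; they are not taken from the paper.

```python
# Sketch of the dual objective (2) and the user responses (3) on a toy network.
# Assumed for illustration only: quadratic utilities u_k(x) = a_k*x - 0.5*mu*x**2.

def user_rate(a_k, mu, price):
    """Best response (3): argmax over x >= 0 of a_k*x - 0.5*mu*x**2 - price*x."""
    return max(0.0, (a_k - price) / mu)

def dual_value_and_grad(lam, C, b, a, mu):
    """Return phi(lam), the vector b - C x(lam), and x(lam).

    C is a list of m rows (links); each row has n Boolean entries (users).
    """
    m, n = len(C), len(C[0])
    # Price seen by user k: total price of the links on its route, <lam, C_k>.
    prices = [sum(lam[j] * C[j][k] for j in range(m)) for k in range(n)]
    x = [user_rate(a[k], mu, prices[k]) for k in range(n)]
    utility = sum(a[k] * x[k] - 0.5 * mu * x[k] ** 2 for k in range(n))
    load = [sum(C[j][k] * x[k] for k in range(n)) for j in range(m)]
    grad = [b[j] - load[j] for j in range(m)]
    phi = utility + sum(lam[j] * grad[j] for j in range(m))
    return phi, grad, x

# Toy instance: 2 links, 3 users; user 0 uses both links.
C = [[1, 1, 0],
     [1, 0, 1]]
b = [1.0, 2.0]
a = [4.0, 3.0, 5.0]
mu = 1.0
phi0, grad0, x0 = dual_value_and_grad([0.0, 0.0], C, b, a, mu)
```

At zero prices every user transmits at its unconstrained optimum, so both links are overloaded and the components of `grad0` are negative, which signals that the prices should increase.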

Assume that the Slater condition is satisfied for the primal problem. Then, by virtue of strong duality, both primal and dual problems have solutions. The Slater condition also makes it possible to bound the set of solutions of the dual problem. Assume that the solution of the dual problem satisfies the estimate

$${{\left\| {\boldsymbol{\lambda} \text{*}} \right\|}_{2}} \leqslant R.$$

Here, $R$ has no effect on the performance of the considered algorithms, but is only involved in their convergence rate estimates.

The basic idea of this paper is to apply various optimization methods to dual problem (4) and to supplement them with a primal-dual analysis, which makes it possible to recover the solution of primal problem (1). In this sense, we develop the approach addressed in our previous works [10–21]. The basic difference is that we consider inequality constraints and analyze stochastic algorithms in terms of estimates that hold with high probability, rather than on average.

2.1 Strongly Concave Utility Functions

In some sections, we assume that the utility functions ${{u}_{k}}({{x}_{k}})$, $k = 1, \ldots ,n$, are strongly concave with a constant $\mu $. In this subsection, we describe the properties of the dual problem under this assumption.

Proposition 1 (Demyanov–Danskin–Rubinov theorem, see [22, 23]). Suppose that, for any $\boldsymbol{\lambda} \in \mathbb{R}_{ + }^{m}$, it holds that $\varphi (\boldsymbol{\lambda} ) = \mathop {\max}\limits_{{\mathbf{x}} \in X} F({\mathbf{x}},\boldsymbol{\lambda} )$, where $F({\mathbf{x}},\boldsymbol{\lambda} )$ is a convex and smooth function of $\boldsymbol{\lambda} $ and the maximum is attained at a unique point ${\mathbf{x}}(\boldsymbol{\lambda} )$. Then $\nabla \varphi (\boldsymbol{\lambda} ) = {{\nabla }_{\boldsymbol{\lambda} }}F({\mathbf{x}}(\boldsymbol{\lambda} ),\boldsymbol{\lambda} )$.

Proposition 2 (see [24]). Suppose that the functions ${{u}_{k}}({{x}_{k}})$ are $\mu $-strongly concave for all $k = 1, \ldots ,n$. Then function (2), where ${{x}_{k}}(\boldsymbol{\lambda} ),$ $k = 1, \ldots ,n$, solve problem (3), is $n{{m}^{2}}{\text{/}}\mu $-smooth, i.e., the gradient of the function $\varphi (\boldsymbol{\lambda} )$ satisfies the Lipschitz condition with constant $L = n{{m}^{2}}{\text{/}}\mu $:

$${{\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{2}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{1}})} \right\|}_{2}} \leqslant L{{\left\| {{{\boldsymbol{\lambda} }^{2}} - {{\boldsymbol{\lambda} }^{1}}} \right\|}_{2}}.$$

The proof of the proposition can be found in the Appendix.
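The Lipschitz bound of Proposition 2 can be probed numerically. The sketch below samples random pairs of price vectors and records the largest observed ratio $\|\nabla\varphi(\boldsymbol{\lambda}^2) - \nabla\varphi(\boldsymbol{\lambda}^1)\|_2 / \|\boldsymbol{\lambda}^2 - \boldsymbol{\lambda}^1\|_2$. It assumes quadratic utilities $u_k(x) = a_k x - \tfrac{\mu}{2}x^2$ and a small hand-made network; the sampling range and all names are illustrative, and a finite sample can only support the bound, not prove it.

```python
import math
import random

def dual_grad(lam, C, b, a, mu):
    """b - C x(lam) for the assumed utilities u_k(x) = a_k*x - 0.5*mu*x**2."""
    m, n = len(C), len(C[0])
    x = [max(0.0, (a[k] - sum(lam[j] * C[j][k] for j in range(m))) / mu)
         for k in range(n)]
    return [b[j] - sum(C[j][k] * x[k] for k in range(n)) for j in range(m)]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def max_gradient_ratio(C, b, a, mu, trials=200, seed=0):
    """Largest observed ratio ||grad(l2) - grad(l1)|| / ||l2 - l1||."""
    rng = random.Random(seed)
    m = len(C)
    worst = 0.0
    for _ in range(trials):
        l1 = [rng.uniform(0.0, 5.0) for _ in range(m)]
        l2 = [rng.uniform(0.0, 5.0) for _ in range(m)]
        g1 = dual_grad(l1, C, b, a, mu)
        g2 = dual_grad(l2, C, b, a, mu)
        den = norm2([p - q for p, q in zip(l1, l2)])
        if den > 0.0:
            num = norm2([p - q for p, q in zip(g1, g2)])
            worst = max(worst, num / den)
    return worst

C = [[1, 1, 0],
     [1, 0, 1]]
b = [1.0, 2.0]
a = [4.0, 3.0, 5.0]
mu = 1.0
m, n = len(C), len(C[0])
L_bound = n * m ** 2 / mu          # constant from Proposition 2
observed = max_gradient_ratio(C, b, a, mu)
```

On such small instances the observed ratio is typically well below `L_bound`, which is consistent with the constant $n m^2/\mu$ being a conservative estimate.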

2.2 Concave Utility Functions

Now we assume that the utility functions ${{u}_{k}}({{x}_{k}})$, $k = 1, \ldots ,n$, are concave, but not strongly concave. Then the dual problem is not smooth. In this subsection, we describe some properties of subgradients of the dual problem under these assumptions.

A subgradient of the objective function of dual problem (4) is given by

$$\nabla \varphi (\boldsymbol{\lambda} ) = {\mathbf{b}} - C{\mathbf{x}}(\boldsymbol{\lambda} ).$$

Since the data transmission rates ${\mathbf{x}}$ are bounded and the vector ${\mathbf{b}}$ is also bounded, the subgradients of the dual problem are bounded. Thus, there exists a positive constant $M$ such that

$${{\left\| {\nabla \varphi (\boldsymbol{\lambda} )} \right\|}_{2}} \leqslant M.$$

(5)

As a rough estimate from above for the constant $M$ in (5), we can use $O(n\sqrt m )$. The multiplier $n$ appears because there are $n$ terms and $\sqrt m $ is used as an estimate for the dependence of the $2$-norm on the vector dimension $m$.

3 FAST GRADIENT METHOD

In this section, we assume that the utility functions ${{u}_{k}}({{x}_{k}})$, $k = 1, \ldots ,n$, are strongly concave with a constant $\mu $; therefore, the dual problem is smooth.

Dual problem (4) is solved by applying Nesterov’s fast gradient method (FGM) in the following version (PDFGM method, see Algorithm 1).

Algorithm 1. Primal-Dual Fast Gradient Method (PDFGM)

Input: ${{u}_{k}}({{x}_{k}}),$ $k = 1, \ldots ,n$, are strongly concave utility functions for each user; ${{\boldsymbol{\lambda} }^{0}}$ is the initial price vector, ${{\alpha }_{t}}: = \tfrac{{t + 1}}{2}$, ${{A}_{{ - 1}}}: = 0$, ${{A}_{t}}: = {{A}_{{t - 1}}} + {{\alpha }_{t}} = \tfrac{{(t + 1)(t + 2)}}{4}$, and ${{\tau }_{t}}: = \tfrac{{{{\alpha }_{{t + 1}}}}}{{{{A}_{{t + 1}}}}} = \tfrac{2}{{t + 3}}$, $t = 0,1, \ldots ,N - 1$.

1: for $t = 0,1, \ldots ,N - 1$

2: Compute $\varphi ({{\boldsymbol{\lambda} }^{t}})$, $\nabla \varphi ({{\boldsymbol{\lambda} }^{t}})$

3: ${{{\mathbf{y}}}^{t}}: = \mathop {\left[ {{{\boldsymbol{\lambda} }^{t}} - \tfrac{1}{L}\left( {{\mathbf{b}} - \sum\nolimits_{k = 1}^n {{{{\mathbf{C}}}_{k}}{{x}_{k}}({{\boldsymbol{\lambda} }^{t}})} } \right)} \right]}\nolimits_ + $

4: ${{{\mathbf{z}}}^{t}}: = \mathop {\left[ {{{\boldsymbol{\lambda} }^{0}} - \tfrac{1}{L}\sum\nolimits_{j = 0}^t {{{\alpha }_{j}}} \left( {{\mathbf{b}} - \sum\nolimits_{k = 1}^n {{{{\mathbf{C}}}_{k}}{{x}_{k}}({{\boldsymbol{\lambda} }^{j}})} } \right)} \right]}\nolimits_ + $

5: ${{\boldsymbol{\lambda} }^{{t + 1}}}: = {{\tau }_{t}}{{{\mathbf{z}}}^{t}} + (1 - {{\tau }_{t}}){{{\mathbf{y}}}^{t}}$

6: ${{{\mathbf{\hat {x}}}}^{{t + 1}}}: = \tfrac{1}{{{{A}_{{t + 1}}}}}\sum\nolimits_{j = 0}^{t + 1} {{{\alpha }_{j}}{\mathbf{x}}({{\boldsymbol{\lambda} }^{j}})} $

7: end for

8: return ${{\boldsymbol{\lambda} }^{N}}$, ${{{\mathbf{\hat {x}}}}^{N}}$
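The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the utilities are taken to be quadratic, $u_k(x) = a_k x - \tfrac{\mu}{2}x^2$, so that $x_k(\boldsymbol{\lambda})$ has a closed form, the Lipschitz constant is set to $L = nm^2/\mu$ from Proposition 2, and the network data are made up.

```python
def rates(lam, C, a, mu):
    """x(lam) for the assumed utilities u_k(x) = a_k*x - 0.5*mu*x**2."""
    m, n = len(C), len(C[0])
    return [max(0.0, (a[k] - sum(lam[j] * C[j][k] for j in range(m))) / mu)
            for k in range(n)]

def pdfgm(C, b, a, mu, L, lam0, N):
    """Primal-dual fast gradient method following the steps of Algorithm 1."""
    m, n = len(C), len(C[0])
    lam = list(lam0)
    x = rates(lam, C, a, mu)
    A = 0.5                                  # A_0 = alpha_0 = 1/2
    x_wsum = [0.5 * xk for xk in x]          # sum_j alpha_j * x(lam^j)
    g_wsum = [0.0] * m                       # sum_j alpha_j * (b - C x(lam^j))
    for t in range(N):
        alpha_t = (t + 1) / 2
        g = [b[j] - sum(C[j][k] * x[k] for k in range(n)) for j in range(m)]
        # Step 3: projected gradient step from the current point.
        y = [max(0.0, lam[j] - g[j] / L) for j in range(m)]
        # Step 4: projected step from lam^0 along the accumulated gradients.
        g_wsum = [g_wsum[j] + alpha_t * g[j] for j in range(m)]
        z = [max(0.0, lam0[j] - g_wsum[j] / L) for j in range(m)]
        # Step 5: convex combination with tau_t = 2 / (t + 3).
        tau = 2 / (t + 3)
        lam = [tau * z[j] + (1 - tau) * y[j] for j in range(m)]
        # Step 6: weighted average of the user responses, including x(lam^{t+1}).
        x = rates(lam, C, a, mu)
        alpha_next = (t + 2) / 2
        A += alpha_next
        x_wsum = [x_wsum[k] + alpha_next * x[k] for k in range(n)]
    return lam, [s / A for s in x_wsum]

# Toy instance: 2 links, 3 users; user 0 uses both links.
C = [[1, 1, 0],
     [1, 0, 1]]
b = [1.0, 2.0]
a = [4.0, 3.0, 5.0]
mu = 1.0
L = len(C[0]) * len(C) ** 2 / mu
lam_N, x_hat = pdfgm(C, b, a, mu, L, [0.0, 0.0], 2000)
```

For this toy instance the optimal allocation can be found by hand from the optimality conditions: ${\mathbf{x}}\text{*} = (0, 1, 2)$ with $U({\mathbf{x}}\text{*}) = 10.5$, so the returned `x_hat` can be compared against it.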

3.1 Distributed Method

The problem under consideration can also be solved using a distributed version of FGM, in which each link updates its price relying only on the reactions of the users that employ this link, without interacting with the other links.

The process occurring at the $t$th iteration for link $j$ can be described as follows.

1. Given information received from the users after the preceding iteration with index $t - 1$ (vector ${{{\mathbf{x}}}^{t}} = {\mathbf{x}}({{\boldsymbol{\lambda} }^{t}})$), the link $j$ computes

$$y_{j}^{t} = \mathop {\left[ {\lambda _{j}^{t} - \frac{1}{L}\left( {{{b}_{j}} - \sum\limits_{k = 1}^n \,C_{k}^{j}x_{k}^{t}} \right)} \right]}\nolimits_ + .$$

Here, $C_{k}^{j} \ne 0$ only for users employing the link $j$. Therefore, to compute this step, the link needs only information from the users employing this link.

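2. Similarly, the link $j$ computes

$$z_{j}^{t} = \mathop {\left[ {\lambda _{j}^{0} - \frac{1}{L}\sum\limits_{s = 0}^t \,{{\alpha }_{s}}\left( {{{b}_{j}} - \sum\limits_{k = 1}^n \,C_{k}^{j}x_{k}^{s}} \right)} \right]}\nolimits_ + ,$$

where the sum over $s$ accumulates the weighted residuals of link $j$ from all iterations up to $t$, as in step 4 of Algorithm 1.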

3. After obtaining values at two preceding steps, link $j$ computes the price for the next iteration $t + 1$:

$$\lambda _{j}^{{t + 1}} = {{\tau }_{t}}z_{j}^{t} + (1 - {{\tau }_{t}})y_{j}^{t}$$

and sends out this information to all users connected to it.

4. The users compute the optimal data transmission rates ${{{\mathbf{\hat {x}}}}^{{t + 1}}}$; specifically, for user $k$, we obtain

$${{x}_{k}}({{\boldsymbol{\lambda} }^{{t + 1}}}) = \mathop {\arg\,\max}\limits_{{{x}_{k}} \in {{{\mathbf{R}}}_{ + }}} \,\left( {{{u}_{k}}({{x}_{k}}) - {{x}_{k}}\sum\limits_{j = 1}^m \,\lambda _{j}^{{t + 1}}C_{k}^{j}} \right),$$

where, by the definition of the matrix $C$, the user needs only data from the links it employs. Next, the user updates its weighted average rate, with $x_{k}^{{t + 1}} = {{x}_{k}}({{\boldsymbol{\lambda} }^{{t + 1}}})$:

$$\hat {x}_{k}^{{t + 1}} = \frac{{{{A}_{t}}\hat {x}_{k}^{t} + {{\alpha }_{{t + 1}}}x_{k}^{{t + 1}}}}{{{{A}_{{t + 1}}}}}.$$

Remark 1. A disadvantage of this algorithm is that each link has to know the reactions of all users that employ it at every iteration step. Unfortunately, in actual networks, users do not transmit data simultaneously, so it is rather difficult to collect this information for the link. However, if complete information on the users is available, the link can establish an equilibrium price more quickly.
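The link-side part of the distributed scheme in Section 3.1 can be sketched as a small object that stores only local data. The class `Link` and its fields are hypothetical names introduced for illustration. Following step 4 of Algorithm 1, the sketch keeps a running weighted sum of the link's own residuals $b_j - \sum_k C_k^j x_k^s$.

```python
class Link:
    """Local state of one link (hypothetical container, not from the paper)."""

    def __init__(self, capacity, users, price0, L):
        self.b = capacity
        self.users = users        # indices k with C_k^j = 1
        self.price0 = price0      # lambda_j^0
        self.price = price0       # current lambda_j^t
        self.L = L
        self.grad_wsum = 0.0      # running sum of alpha_s * (b_j - load_j^s)

    def update(self, t, user_rates):
        """One price update at iteration t from the rates of this link's users."""
        load = sum(user_rates[k] for k in self.users)
        g = self.b - load                         # j-th component of b - C x^t
        alpha_t = (t + 1) / 2
        y = max(0.0, self.price - g / self.L)     # step 1
        self.grad_wsum += alpha_t * g
        z = max(0.0, self.price0 - self.grad_wsum / self.L)   # step 2
        tau = 2 / (t + 3)
        self.price = tau * z + (1 - tau) * y      # step 3
        return self.price

# A link of capacity 1 shared by users 0 and 1; user 2 does not use it.
link = Link(capacity=1.0, users=[0, 1], price0=0.0, L=12.0)
price_1 = link.update(0, [4.0, 3.0, 5.0])
```

Only the entries of `user_rates` listed in `users` are read, which mirrors the observation that a link needs information solely from the users that employ it.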

3.2 Estimation of the Convergence Rate of FGM

Before proving the convergence of FGM for the problem under consideration, we state the key lemma necessary for estimating the residuals in the constraints and the duality gap after running PDFGM.

Lemma 1. Suppose that Algorithm 1 starts at an initial point ${{\boldsymbol{\lambda} }^{0}}$ lying in the Euclidean ball of radius $R$ centered at the origin. Then, after performing $N$ iterations of Algorithm 1, it holds that

$${{A}_{N}}\varphi ({{{\mathbf{y}}}^{N}}) - {{A}_{N}}U({{{\mathbf{\hat {x}}}}^{N}}) + 2\hat {R}{{A}_{N}}\mathop {\left\| {\mathop {(C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}})}\nolimits_ + } \right\|}\nolimits_2 \leqslant \tfrac{{37L{{{\hat {R}}}^{2}}}}{9},$$

(6)

where ${{{\mathbf{\hat {x}}}}^{N}} = \tfrac{1}{{{{A}_{N}}}}\sum\nolimits_{t = 0}^{N - 1} {{{\alpha }_{t}}{\mathbf{x}}({{\boldsymbol{\lambda} }^{t}})} $ and $\hat {R} = 3R$.

The proof of the lemma can be found in the Appendix.

Now we formulate a theorem on the convergence rate estimate for Algorithm 1.

Theorem 1. Suppose that Algorithm 1 starts at an initial point ${{\boldsymbol{\lambda} }^{0}}$ lying in the Euclidean ball of radius $R$ centered at the origin. Then, after performing

$$N = \left\lceil {\frac{{2\hat {R}}}{3}\sqrt {\frac{{37L}}{\varepsilon }} } \right\rceil $$

iterations of Algorithm 1, it holds that

$$U({\mathbf{x}}\text{*}) - U({{{\mathbf{\hat {x}}}}^{N}}) \leqslant \varepsilon ,\quad \mathop {\left\| {\mathop {(C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}})}\nolimits_ + } \right\|}\nolimits_2 \leqslant \frac{\varepsilon }{{\hat {R}}},$$

where ${{{\mathbf{\hat {x}}}}^{N}} = \tfrac{1}{{{{A}_{N}}}}\sum\nolimits_{t = 0}^{N - 1} {{{\alpha }_{t}}{\mathbf{x}}({{\boldsymbol{\lambda} }^{t}})} $, ${\mathbf{x}}\text{*}$ is an optimal solution of problem (1), and $\hat {R} = 3R$.

Proof. Let $\operatorname{Opt} [P]$ denote the optimal value of the primal problem (1), and let $\operatorname{Opt} [D]$ denote the optimal value of the dual problem (4). By weak duality, we have

$$\operatorname{Opt} [D] \geqslant \operatorname{Opt} [P].$$

Moreover, for all ${\mathbf{x}} \in \mathbb{R}_{ + }^{n}$, the optimal solution $\boldsymbol{\lambda} \text{*}$ of dual problem (4) satisfies

$$\operatorname{Opt} [P] \geqslant U({\mathbf{x}}) - \left\langle {\boldsymbol{\lambda} \text{*},\mathop {\left( {\sum\limits_{k = 1}^n \,{{{\mathbf{C}}}_{k}}{{x}_{k}} - {\mathbf{b}}} \right)}\nolimits_ + } \right\rangle \geqslant U({\mathbf{x}}) - \hat {R}\mathop {\left\| {\mathop {(C{\mathbf{x}} - {\mathbf{b}})}\nolimits_ + } \right\|}\nolimits_2 .$$

(7)

Then

$$\begin{gathered} \varphi ({{{\mathbf{y}}}^{N}}) - U({{{{\mathbf{\hat {x}}}}}^{N}}) = \varphi ({{{\mathbf{y}}}^{N}}) - U({{{{\mathbf{\hat {x}}}}}^{N}}) + \operatorname{Opt} [P] - \operatorname{Opt} [P] + \operatorname{Opt} [D] - \operatorname{Opt} [D] = \underbrace {(\operatorname{Opt} [D] - \operatorname{Opt} [P])}_{ \geqslant 0} \\ \, + (\operatorname{Opt} [P] - U({{{{\mathbf{\hat {x}}}}}^{N}})) + \underbrace {(\varphi ({{{\mathbf{y}}}^{N}}) - \operatorname{Opt} [D])}_{ \geqslant 0}\;\mathop \geqslant \limits^{(7)} \; - {\kern 1pt} \left\langle {\boldsymbol{\lambda} \text{*},{{{(C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}})}}_{ + }}} \right\rangle \geqslant - \hat {R}{\kern 1pt} {{\left\| {{{{(C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}})}}_{ + }}} \right\|}_{2}}. \\ \end{gathered} $$

Substituting the last inequality into (6) yields the estimate

$$\hat {R}{{\left\| {{{{(C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}})}}_{ + }}} \right\|}_{2}} \leqslant \frac{{37L{{{\hat {R}}}^{2}}}}{{9{{A}_{N}}}}.$$

Consequently, $\varphi ({{{\mathbf{y}}}^{N}}) - U({{{\mathbf{\hat {x}}}}^{N}}) \geqslant - \tfrac{{37L{{{\hat {R}}}^{2}}}}{{9{{A}_{N}}}}$. On the other hand, it follows from (6) that $\varphi ({{{\mathbf{y}}}^{N}}) - U({{{\mathbf{\hat {x}}}}^{N}}) \leqslant \tfrac{{37L{{{\hat {R}}}^{2}}}}{{9{{A}_{N}}}}$. Therefore,

$$\left| {\varphi ({{{\mathbf{y}}}^{N}}) - U({{{{\mathbf{\hat {x}}}}}^{N}})} \right| \leqslant \frac{{37L{{{\hat {R}}}^{2}}}}{{9{{A}_{N}}}}.$$

Since $\varphi ({{{\mathbf{y}}}^{N}}) \geqslant \operatorname{Opt} [D] = \varphi (\boldsymbol{\lambda} \text{*}) \geqslant \operatorname{Opt} [P] = U({\mathbf{x}}\text{*})$, we have

$$U({\mathbf{x}}\text{*}) - U({{{\mathbf{\hat {x}}}}^{N}}) \leqslant \frac{{37L{{{\hat {R}}}^{2}}}}{{9{{A}_{N}}}} = \frac{{148L{{{\hat {R}}}^{2}}}}{{9(N + 1)(N + 2)}} \leqslant \frac{{148L{{{\hat {R}}}^{2}}}}{{9{{N}^{2}}}} \leqslant \varepsilon .$$

Solving the last inequality for $N$ gives the iteration count stated in the theorem.
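The iteration count of Theorem 1 is easy to evaluate for given constants. The helper below is a small illustrative sketch; the function name and the sample values of $R$, $L$, and $\varepsilon$ are assumptions, not data from the paper. It also evaluates the bound $148L\hat{R}^2/(9N^2)$ from the last inequality of the proof.

```python
import math

def pdfgm_iterations(R, L, eps):
    """N = ceil((2*R_hat/3) * sqrt(37*L/eps)) with R_hat = 3*R, as in Theorem 1."""
    R_hat = 3.0 * R
    return math.ceil((2.0 * R_hat / 3.0) * math.sqrt(37.0 * L / eps))

def gap_bound(R, L, N):
    """Right-hand side 148*L*R_hat**2 / (9*N**2) from the proof of Theorem 1."""
    R_hat = 3.0 * R
    return 148.0 * L * R_hat ** 2 / (9.0 * N ** 2)

# Sample constants (assumed): R = 1, L = 12, target accuracy eps = 0.01.
N_needed = pdfgm_iterations(1.0, 12.0, 0.01)
bound_at_N = gap_bound(1.0, 12.0, N_needed)
```

Since $N$ scales as $\sqrt{L/\varepsilon}$, reducing $\varepsilon$ by a factor of 100 increases the iteration count by roughly a factor of 10.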

4 STOCHASTIC PROJECTED SUBGRADIENT METHOD

Consider the original problem (1), now assuming that the utility functions ${{u}_{k}}({{x}_{k}})$, $k = 1, \ldots ,n$, are concave, but not strongly concave. In this case, dual problem (4) becomes nonsmooth. Accordingly, we propose to solve it with the stochastic projected subgradient method. The idea of using this method for the given problem was first proposed in [2].

Consider the probability space $\left( {\Omega ,\mathcal{F},\mathbb{P}} \right)$. Suppose that a sequence of independent random variables $\{ {{\xi }^{t}}\} _{{t = 0}}^{\infty }$ uniformly distributed on $\{ 1, \ldots ,n\} $ is defined on $\left( {\Omega ,\mathcal{F},\mathbb{P}} \right)$, i.e.,

$$\mathbb{P}({{\xi }^{t}} = i) = \frac{1}{n},\quad i \in \{ 1, \ldots ,n\} .$$

If there is an oracle producing the stochastic subgradient of the dual function $\nabla \varphi (\boldsymbol{\lambda} ,\xi )$, i.e.,

$$\nabla \varphi (\boldsymbol{\lambda} ,\xi ) = {\mathbf{b}} - n{{{\mathbf{C}}}_{\xi }}{{x}_{\xi }}(\boldsymbol{\lambda} ),$$

then

$$\mathbb{E}{\kern 1pt} [{\mathbf{b}} - n{{{\mathbf{C}}}_{{{{\xi }^{t}}}}}{{x}_{{{{\xi }^{t}}}}}({{\boldsymbol{\lambda} }^{t}})\,{\text{|}}\,{{\boldsymbol{\lambda} }^{t}}] = {\mathbf{b}} - \sum\limits_{k = 1}^n \,{{{\mathbf{C}}}_{k}}{{x}_{k}}({{\boldsymbol{\lambda} }^{t}}) = \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}).$$

Therefore, the stochastic subgradient is an unbiased estimator of the subgradient.

Algorithm 2. Primal-Dual Stochastic Projected Subgradient Method (PDSPSGM), Version 1

Input: ${{u}_{k}}({{x}_{k}}),$ $k = 1, \ldots ,n$, are concave utility functions for each user, and $\beta $ is the step of the method.

1: ${{\boldsymbol{\lambda} }^{0}}: = 0$

2: for $t = 1, \ldots ,N - 1$

3: Compute $\nabla \varphi ({{\boldsymbol{\lambda} }^{{t - 1}}},{{\xi }^{{t - 1}}})$

4: ${{\boldsymbol{\lambda} }^{t}}: = {{[{{\boldsymbol{\lambda} }^{{t - 1}}} - \beta ({\mathbf{b}} - n{{{\mathbf{C}}}_{{{{\xi }^{{t - 1}}}}}}{{x}_{{{{\xi }^{{t - 1}}}}}}({{\boldsymbol{\lambda} }^{{t - 1}}}))]}_{ + }}$

5: ${{{\mathbf{\hat {x}}}}^{{t + 1}}}: = \tfrac{1}{{t + 1}}\sum\nolimits_{j = 0}^t {{\mathbf{x}}({{\boldsymbol{\lambda} }^{j}})} $

6: ${{\hat {\boldsymbol{\lambda} }}^{{t + 1}}}: = \tfrac{1}{{t + 1}}\sum\nolimits_{j = 0}^t {{{\boldsymbol{\lambda} }^{j}}} $

7: end for

8: return ${{\hat {\boldsymbol{\lambda} }}^{N}},\;{{{\mathbf{\hat {x}}}}^{N}}$

An optimal solution of problem (2) is sought using PDSPSGM. We describe two versions of this method (see Algorithms 2 and 3). Algorithm 2 relies on a complete model of reconstructing the vector ${\mathbf{x}}(\boldsymbol{\lambda} )$ at every iteration. However, the computation of ${\mathbf{x}}(\boldsymbol{\lambda} )$ is nearly equivalent in complexity to the computation of a complete subgradient of $\varphi (\boldsymbol{\lambda} )$. Therefore, the basic algorithm is Algorithm 3, in which the vector ${\mathbf{x}}(\boldsymbol{\lambda} )$ is reconstructed using an incomplete stochastic model, which means that only one component of the vector ${\mathbf{x}}(\boldsymbol{\lambda} )$ is updated at every iteration step, while the others remain unchanged. In the proof of the convergence theorem, we first establish the convergence estimate for Algorithm 2 and then show that the approximate solution of the primal problem produced by Algorithm 3 is close in accuracy to the solution obtained using Algorithm 2.

Algorithm 3. Primal-Dual Stochastic Projected Subgradient Method (PDSPSGM), Version 2

Input: ${{u}_{k}}({{x}_{k}}),$ $k = 1, \ldots ,n$, are concave utility functions for each user, and $\beta $ is the step of the method.

1: ${{\boldsymbol{\lambda} }^{0}}: = 0$

2: for $t = 1, \ldots ,N - 1$

3: Compute $\nabla \varphi ({{\boldsymbol{\lambda} }^{{t - 1}}},{{\xi }^{{t - 1}}})$

4: ${{\boldsymbol{\lambda} }^{t}}: = {{[{{\boldsymbol{\lambda} }^{{t - 1}}} - \beta ({\mathbf{b}} - n{{{\mathbf{C}}}_{{{{\xi }^{{t - 1}}}}}}{{x}_{{{{\xi }^{{t - 1}}}}}}({{\boldsymbol{\lambda} }^{{t - 1}}}))]}_{ + }}$

5: ${\mathbf{\tilde {x}}}_{{{{\xi }^{t}}}}^{{t + 1}}: = \tfrac{t}{{t + 1}}{\mathbf{\tilde {x}}}_{{{{\xi }^{t}}}}^{t} + \tfrac{1}{{t + 1}}n{{x}_{{{{\xi }^{t}}}}}({{\boldsymbol{\lambda} }^{t}})$, ${\mathbf{\tilde {x}}}_{j}^{{t + 1}}: = {\mathbf{\tilde {x}}}_{j}^{t}$ for $j \ne {{\xi }^{t}}$

6: ${{\hat {\boldsymbol{\lambda} }}^{{t + 1}}}: = \tfrac{1}{{t + 1}}\sum\nolimits_{j = 0}^t {{{\boldsymbol{\lambda} }^{j}}} $

7: end for

8: return ${{\hat {\boldsymbol{\lambda} }}^{N}},\;{{{\mathbf{\tilde {x}}}}^{N}}$
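A minimal Python sketch of the stochastic scheme is given below. It is an illustration under several assumptions rather than the authors' implementation: the utilities are $u_k(x) = w_k\ln x$ restricted to a box $[x_{\mathrm{lo}}, x_{\mathrm{hi}}]$, the outputs are formed as the plain averages $\hat{\boldsymbol{\lambda}}^N = \tfrac{1}{N}\sum_{t=0}^{N-1}\boldsymbol{\lambda}^t$ and $\tilde{\mathbf{x}}^N = \tfrac{1}{N}\sum_{t=0}^{N-1}\mathbf{x}(\boldsymbol{\lambda}^t, \xi^t)$ with $\mathbf{x}(\boldsymbol{\lambda},\xi) = (0,\ldots,n x_\xi(\boldsymbol{\lambda}),\ldots,0)^{\text{T}}$, and the values of $R$ and $M$ used for the step are guesses.

```python
import math
import random

def user_rate(w_k, price, x_lo, x_hi):
    """Best response for the assumed utility w_k*ln(x) on the box [x_lo, x_hi]."""
    if price <= 0.0:
        return x_hi
    return min(x_hi, max(x_lo, w_k / price))

def pdspsgm_v2(C, b, w, beta, N, x_lo, x_hi, seed=0):
    """Stochastic projected subgradient steps using one random user per iteration."""
    rng = random.Random(seed)
    m, n = len(C), len(C[0])
    lam = [0.0] * m
    lam_sum = [0.0] * m
    x_sum = [0.0] * n                  # accumulates x(lam^t, xi^t)
    for _ in range(N):
        lam_sum = [lam_sum[j] + lam[j] for j in range(m)]
        xi = rng.randrange(n)          # uniformly chosen user
        price = sum(lam[j] * C[j][xi] for j in range(m))
        rate = user_rate(w[xi], price, x_lo, x_hi)
        x_sum[xi] += n * rate          # only component xi is touched
        # Projected step along the stochastic subgradient b - n * C_xi * x_xi.
        lam = [max(0.0, lam[j] - beta * (b[j] - n * C[j][xi] * rate))
               for j in range(m)]
    return [s / N for s in lam_sum], [s / N for s in x_sum]

# Toy instance: 2 links, 3 users; R and M below are assumed, not derived.
C = [[1, 1, 0],
     [1, 0, 1]]
b = [1.0, 2.0]
w = [1.0, 1.0, 1.0]
N = 5000
R, M = 5.0, 20.0
beta = R / (M * math.sqrt(N))          # step size as in Theorem 2
lam_hat, x_tilde = pdspsgm_v2(C, b, w, beta, N, x_lo=0.01, x_hi=5.0)
```

Each iteration queries a single user, so its cost does not grow with the number of users $n$; the price is paid in the slower $O(1/\varepsilon^2)$ rate.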

4.1 Distributed Method

Let us describe how the distributed version of the stochastic projected subgradient method can be applied for solving the problem under consideration.

The process occurring at the $t$th iteration for link $j$ is as follows:

1. Given the prices ${{\boldsymbol{\lambda} }^{t}}$ received from the links after the preceding iteration with index $t - 1$, the random user ${{\xi }^{t}}$ transmits data at the optimal rate

$$x_{{{{\xi }^{t}}}}^{t} = {{x}_{{{{\xi }^{t}}}}}({{\boldsymbol{\lambda} }^{t}}) = \mathop {\arg\,\max}\limits_{{{x}_{{{{\xi }^{t}}}}} \in {{{\mathbf{R}}}_{ + }}} \,\left( {{{u}_{{{{\xi }^{t}}}}}({{x}_{{{{\xi }^{t}}}}}) - {{x}_{{{{\xi }^{t}}}}}\sum\limits_{j = 1}^m \,\lambda _{j}^{t}C_{{{{\xi }^{t}}}}^{j}} \right),$$

where, by the definition of the matrix $C,$ the information required for the user is only from the links used by the user.

2. The link $j$ computes the price for the next iteration based on the reaction of this user:

$$\lambda _{j}^{{t + 1}} = {{[\lambda _{j}^{t} - \beta ({{b}_{j}} - nC_{{{{\xi }^{t}}}}^{j}x_{{{{\xi }^{t}}}}^{t})]}_{ + }}.$$

Here, $C_{{{{\xi }^{t}}}}^{j} \ne 0$ only for users employing link $j.$ Therefore, the price changes only for actual links of the user transmitting data.

Remark 2. The main advantage of this method is that the link changes the price relying only on the reactions of a single user, which makes the problem formulation much closer to the situation occurring in actual networks, where users do not transmit data simultaneously.

4.2 Estimation of the Convergence Rate of the Stochastic Projected Subgradient Method

Before proving the main theorem on convergence rate estimates for the proposed methods, we state the necessary assumptions for the problem under study. Assume that there exists a positive constant $M = O(n\sqrt m )$ such that

$${{\left\| {\nabla \varphi (\boldsymbol{\lambda} ,\xi )} \right\|}_{2}} \leqslant M.$$

(8)

This assumption holds, since the data transmission rate ${\mathbf{x}}$ is bounded and the capacity vector ${\mathbf{b}}$ is bounded as well in view of the physical considerations. Therefore, by its definition, the stochastic subgradient is also bounded.

Additionally, we assume that

$$\mathbb{E}\left[ {\exp\left( {\frac{{\mathop {\left\| {\nabla \varphi (\boldsymbol{\lambda} ,\xi ) - \nabla \varphi (\boldsymbol{\lambda} )} \right\|}\nolimits_2^2 }}{{{{\sigma }^{2}}}}} \right)} \right] \leqslant \exp(1),$$

where $\sigma $ is a positive numerical constant and the order of dependence on $n$ and $m$ is the same as for $M$.

To estimate the convergence rate of Algorithm 3, it is necessary to assume that ${{u}_{k}}({{x}_{k}}),k = 1, \ldots ,n$, are Lipschitz continuous functions with constant ${{M}_{{{{u}_{k}}}}}$. Then $U({\mathbf{x}})$ is a Lipschitz continuous function with a constant ${{M}_{U}}$:

$$\forall {\mathbf{x}},{\mathbf{y}}\quad \left| {U({\mathbf{x}}) - U({\mathbf{y}})} \right| \leqslant {{M}_{U}}{{\left\| {{\mathbf{x}} - {\mathbf{y}}} \right\|}_{2}},$$

where ${{M}_{U}} = O(\sqrt n )$. It may happen that ${{u}_{k}}({{x}_{k}})$ is a Lipschitz continuous function everywhere, except, for instance, the point $0$. An example of such a function is ${{u}_{k}}({{x}_{k}}) = \ln{{x}_{k}}$, which is one of the most widespread utility functions. However, by the specific features of the problem, there always exist $\bar {\varepsilon } > 0$ and $\underline \varepsilon > 0$ such that $x_{k}^{*} \geqslant \underline \varepsilon $ and $x_{k}^{*} \leqslant \bar {\varepsilon }$. Then the problem can be solved on the compact set $Q = \left\{ {{\mathbf{x}}:\underline \varepsilon \leqslant {{x}_{k}} \leqslant \bar {\varepsilon },\;k = 1, \ldots ,n} \right\}$, and the considered function ${{u}_{k}}({{x}_{k}}) = \ln{{x}_{k}}$ becomes Lipschitz continuous on $Q$. In the general case, a concave utility function $u(x)$ is Lipschitz continuous on a compact set lying in the relative interior of the domain of $u(x)$.

Suppose that

$$\mathbb{E}\left[ {\exp\left( {\frac{{\mathop {\left\| {{\mathbf{x}}(\boldsymbol{\lambda} ,\xi ) - {\mathbf{x}}(\boldsymbol{\lambda} )} \right\|}\nolimits_2^2 }}{{\sigma _{x}^{2}}}} \right)} \right] \leqslant \exp(1),$$

where ${{\sigma }_{x}} = O(\sqrt n )$ is a positive numerical constant and

$${\mathbf{x}}(\boldsymbol{\lambda} ,\xi ) = {{(0, \ldots ,n{{x}_{\xi }}(\boldsymbol{\lambda} ), \ldots ,0)}^{{\text{T}}}}.$$

Below is the key lemma necessary for obtaining convergence rate estimates for the residual in the constraints and the duality gap after running PDSPSGM.

Lemma 2. Suppose that Algorithm 3 starts at the initial point ${{\boldsymbol{\lambda} }^{0}} = 0$ with a step $\beta $. Then, after performing $N$ iterations of Algorithm 3, with probability at least $1 - 4\delta $,

$$\begin{gathered} \varphi ({{{\hat {\boldsymbol{\lambda} }}}^{N}}) - U({{{{\mathbf{\tilde {x}}}}}^{N}}) + 2R\mathop {\left\| {{{{[C{{{{\mathbf{\tilde {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 \leqslant {{C}_{1}}\frac{{{{R}^{2}}\sigma \sqrt {g(N)J} }}{{\sqrt N }} + \frac{{2{{R}^{2}}}}{{\beta N}} + \frac{{\beta {{M}^{2}}}}{2} \\ \, + \frac{{\sqrt 2 \left( {1 + \sqrt {3\ln\tfrac{1}{\delta }} } \right)}}{{\sqrt N }}\left( {{{M}_{U}}{{\sigma }_{x}} + 2R\left( {\sigma + {{\sigma }_{x}}\sqrt {{{\lambda }_{{{\text{max}}}}}({{C}^{{\text{T}}}}C)} } \right)} \right), \\ \end{gathered} $$

where

$${{\hat {\boldsymbol{\lambda} }}^{N}} = \frac{1}{N}\sum\limits_{t = 0}^{N - 1} \,{{\boldsymbol{\lambda} }^{t}},$$

$${{{\mathbf{\tilde {x}}}}^{N}} = \frac{1}{N}\sum\limits_{t = 0}^{N - 1} \,{\mathbf{x}}({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),$$

${{C}_{1}}$ is a positive numerical constant, $g(N) = \ln\left( {\tfrac{N}{\delta }} \right) + \ln\ln\left( {\tfrac{F}{f}} \right)$,

$$F = 2{{\sigma }^{2}}N{{(2\beta )}^{N}}\left( {2{{R}^{2}} + 2{{\beta }^{2}}{{M}^{2}} + \beta {{R}^{2}} + 24\ln\frac{N}{\delta }\beta {{\sigma }^{2}}N} \right),$$

$f = {{\sigma }^{2}}{{R}^{2}}$,

$$J = \max\left\{ {1,\quad \frac{1}{R}\beta {{C}_{1}}\sqrt {{{\sigma }^{2}}g(N)} + \sqrt {\frac{1}{{{{R}^{2}}}}{{\beta }^{2}}C_{1}^{2}{{\sigma }^{2}}g(N) + \frac{{2{{R}^{2}} + 2{{\beta }^{2}}{{M}^{2}}}}{{{{R}^{2}}}}} } \right\},$$

and $R$ is determined by the condition ${{\left\| {\boldsymbol{\lambda} \text{*}} \right\|}_{2}} \leqslant R$.

The proof of the lemma can be found in the Appendix.

Now we formulate a theorem on the convergence rate estimate for Algorithm 3.

Theorem 2. Suppose that Algorithm 3 starts at the initial point ${{\boldsymbol{\lambda} }^{0}} = 0$ with step $\beta = \tfrac{R}{{M\sqrt N }}$. Define

$$A = \sqrt 2 \left( {1 + \sqrt {3\ln\frac{1}{\delta }} } \right)\left( {{{M}_{U}}{{\sigma }_{x}} + 2R\left( {\sigma + {{\sigma }_{x}}\sqrt {{{\lambda }_{{{\text{max}}}}}({{C}^{{\text{T}}}}C)} } \right)} \right) + 2.5RM.$$

Then, after performing

$$N = O\left( {\left\lceil {\frac{{{{A}^{2}}}}{{{{\varepsilon }^{2}}}}\ln\left( {\frac{{MR}}{{\varepsilon \delta }}} \right)} \right\rceil } \right)$$

iterations of Algorithm 3, with probability at least $1 - 4\delta $,

$$U({\mathbf{x}}\text{*}) - U({{{\mathbf{\tilde {x}}}}^{N}}) \leqslant \varepsilon ,\quad \mathop {\left\| {{{{(C{{{{\mathbf{\tilde {x}}}}}^{N}} - {\mathbf{b}})}}_{ + }}} \right\|}\nolimits_2 \leqslant \frac{\varepsilon }{R},$$

where ${{{\mathbf{\tilde {x}}}}^{N}} = \tfrac{1}{N}\sum\nolimits_{t = 0}^{N - 1} {{\mathbf{x}}({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})} $ and ${\mathbf{x}}\text{*}$ is an optimal solution of problem (1).

Proof. The beginning of the proof is the same as for Theorem 1, but we use the estimate from Lemma 2. As a result, for the step $\beta = \tfrac{R}{{M\sqrt N }}$, we obtain

$$\frac{{\sqrt 2 \left( {1 + \sqrt {3\ln\tfrac{1}{\delta }} } \right)}}{{\sqrt N }}\left( {{{M}_{U}}{{\sigma }_{x}} + 2R\left( {\sigma + {{\sigma }_{x}}\sqrt {{{\lambda }_{{{\text{max}}}}}({{C}^{{\text{T}}}}C)} } \right)} \right) + \frac{{5RM}}{{2\sqrt N }} + {{C}_{1}}\frac{{{{R}^{2}}\sigma \sqrt {g(N)J} }}{{\sqrt N }},$$

moreover, up to constants, $g(N) \approx \ln\left( {\tfrac{N}{\delta }} \right)$ and $J \approx \max\left\{ {1,\beta \sqrt {g(N)} } \right\}$. Next, we find $N$ for which the estimate becomes less than $\varepsilon $.

We introduce the following notation:

$$A = \sqrt 2 \left( {1 + \sqrt {3\ln\frac{1}{\delta }} } \right)\left( {{{M}_{U}}{{\sigma }_{x}} + 2R\left( {\sigma + {{\sigma }_{x}}\sqrt {{{\lambda }_{{{\text{max}}}}}({{C}^{{\text{T}}}}C)} } \right)} \right) + 2.5RM,$$

$$B = {{C}_{1}}{{R}^{2}}\sigma .$$

It is necessary to obtain the minimum estimate on the iteration number $N$ required for achieving the prescribed accuracy $\varepsilon $. For $J = 1$, we obtain

$$\sqrt N = \left\lceil {\frac{{A + B\sqrt {\ln\left( {\tfrac{N}{\delta }} \right)} }}{\varepsilon }} \right\rceil .$$

(9)

Substituting $N$ recursively, we derive from (9) the complexity bound

$$N = O\left( {\left\lceil {\frac{{{{A}^{2}}}}{{{{\varepsilon }^{2}}}}\ln\left( {\frac{{MR}}{{\varepsilon \delta }}} \right)} \right\rceil } \right).$$
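Relation (9) defines $N$ only implicitly, since $N$ also appears under the logarithm. One possible way to evaluate it numerically is a fixed-point iteration, sketched below in Python. The function name, the starting guess, and the stopping rule are illustrative assumptions and are not taken from the paper.

```python
import math

def iterations_from_fixed_point(A, B, eps, delta, max_rounds=100):
    """Iterate sqrt(N) = ceil((A + B * sqrt(ln(N / delta))) / eps) until N stops changing."""
    # Start from the estimate without the logarithmic term (an assumed initial guess).
    N = max(1, math.ceil((A / eps) ** 2))
    for _ in range(max_rounds):
        root = math.ceil((A + B * math.sqrt(math.log(N / delta))) / eps)
        N_new = root * root
        if N_new == N:
            break
        N = N_new
    return N
```

Because the right-hand side grows only like $\sqrt {\ln N} $, the iteration typically stabilizes after a few rounds for $0 < \delta < 1$.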

For $J = \beta \sqrt {g(N)} = \tfrac{{R\sqrt {g(N)} }}{{M\sqrt N }}$, we assume that

$$\frac{A}{{\sqrt N }} + \frac{{Bg(N)R}}{{MN}} = \frac{A}{{\sqrt N }} + \frac{{\bar {B}g(N)}}{N} \leqslant \varepsilon .$$

Since the minimum $N$ is needed, replacing the last inequality with equality and solving the resulting equation, we obtain

$$\sqrt N = \left\lceil {\frac{{A + \sqrt {{{A}^{2}} + 4\varepsilon \bar {B}\ln\left( {\tfrac{N}{\delta }} \right)} }}{{2\varepsilon }}} \right\rceil .$$

By analogy with the case $J = 1$, this equality yields the estimate

$$N = O\left( {\left\lceil {\frac{{{{A}^{2}}}}{\varepsilon }\ln\left( {\frac{{MR}}{{\varepsilon \delta }}} \right)} \right\rceil } \right).$$

The worst of the complexity bounds for $J = 1$ and $J = \beta \sqrt {g(N)} $ is the estimate from the condition of the theorem.

5 ELLIPSOID METHOD

In this section, the original problem (1) is solved by applying the ellipsoid method [25]. This method can be used when the dual problem has a low dimension ($m$) or when high accuracy of the solution is required. The method is primal-dual, i.e., the solution of the primal problem can be recovered from the solution of the dual problem.

Consider the original problem (1) and its dual (2). As in the preceding section, the functions ${{u}_{k}}({{x}_{k}})$, $k = 1, \ldots ,n$, are assumed to be concave, but not strongly concave. Additionally, we assume that the solution of the dual problem lies in the Euclidean ball of radius $R$ centered at the origin, i.e., ${{\left\| {\boldsymbol{\lambda} \text{*}} \right\|}_{2}} \leqslant R$. As an initial point of the method, we use the zero vector ${{\boldsymbol{\lambda} }^{0}} = 0$. The problem is solved on the set

$${{\Lambda }_{{2R}}} = \left\{ {\boldsymbol{\lambda} \in \mathbb{R}_{ + }^{m}:{{{\left\| \boldsymbol{\lambda} \right\|}}_{2}} \leqslant 2R} \right\}.$$

Let us describe the ellipsoid method (Algorithm 4), which is used to solve the dual problem.

Algorithm 4. Ellipsoid Method

Input: ${{u}_{k}}({{x}_{k}}),k = 1, \ldots ,n$, are concave utility functions

1: ${{B}_{0}}: = 2R \cdot {{I}_{m}}$, where ${{I}_{m}}$ is the $m \times m$ identity matrix

2: for $t = 0, \ldots ,N - 1$

3: Compute $\nabla \varphi ({{\boldsymbol{\lambda} }^{t}})$

4: ${{{\mathbf{q}}}_{t}}: = B_{t}^{{\text{T}}}\nabla \varphi ({{\boldsymbol{\lambda} }^{t}})$

5: ${{{\mathbf{p}}}_{t}}: = \frac{{B_{t}^{{\text{T}}}{{{\mathbf{q}}}_{t}}}}{{\sqrt {{\mathbf{q}}_{t}^{{\text{T}}}{{B}_{t}}B_{t}^{{\text{T}}}{{{\mathbf{q}}}_{t}}} }}$

6: ${{B}_{{t + 1}}}: = \frac{m}{{\sqrt {{{m}^{2}} - 1} }}{{B}_{t}} + \left( {\frac{m}{{m + 1}} - \frac{m}{{\sqrt {{{m}^{2}} - 1} }}} \right){{B}_{t}}{{{\mathbf{p}}}_{t}}{\mathbf{p}}_{t}^{{\text{T}}}$

7: ${{\boldsymbol{\lambda} }^{{t + 1}}}: = {{\boldsymbol{\lambda} }^{t}} - \frac{1}{{m + 1}}{{B}_{t}}{{{\mathbf{p}}}_{t}}$

8: end for

9: return ${{\boldsymbol{\lambda} }^{N}}$
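The update of the center and of the matrix ${{B}_{t}}$ can be sketched in Python as follows. This is a minimal illustration under several assumptions: the direction is normalized in the textbook way, ${\mathbf{p}} = {{B}^{{\text{T}}}}{\mathbf{g}}{\text{/}}{{\left\| {{{B}^{{\text{T}}}}{\mathbf{g}}} \right\|}_{2}}$, which is one reading of Steps 4 and 5; cuts for points outside ${{\Lambda }_{{2R}}}$ are omitted; $m \geqslant 2$ is required; and the routine additionally tracks the best iterate, which Algorithm 4 does not do. The name `ellipsoid_method` and the `oracle` interface are hypothetical.

```python
import numpy as np

def ellipsoid_method(oracle, m, R, N):
    """Central-cut ellipsoid sketch; oracle(lam) returns (value, gradient)."""
    lam = np.zeros(m)                    # initial point lambda^0 = 0
    B = 2.0 * R * np.eye(m)              # initial ball of radius 2R
    best_lam, best_val = lam.copy(), np.inf
    a = m / np.sqrt(m**2 - 1.0)
    b = m / (m + 1.0) - a
    for _ in range(N):
        val, g = oracle(lam)
        if val < best_val:
            best_lam, best_val = lam.copy(), val
        q = B.T @ g
        norm_q = np.linalg.norm(q)
        if norm_q == 0.0:                # zero gradient: the center is optimal
            break
        p = q / norm_q                   # unit direction in the transformed space
        Bp = B @ p
        lam = lam - Bp / (m + 1.0)       # shift the center
        B = a * B + b * np.outer(Bp, p)  # B_{t+1} = a * B_t + b * B_t p p^T
    return best_lam, best_val
```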

To reconstruct the solution of the primal problem from the solution of the dual one, it is necessary to determine the accuracy certificate $\xi $ for the ellipsoid method. Recall that the accuracy certificate is a sequence of weights $\xi = \{ {{\xi }_{t}}\} _{{t = 0}}^{{N - 1}}$ such that

$${{\xi }_{t}} \geqslant 0,\quad \sum\limits_{t = 0}^{N - 1} \,{{\xi }_{t}} = 1.$$

In our case, the accuracy certificate is constructed in the course of running the ellipsoid method (see Algorithm 5); its general scheme can be described as follows [26].

1. Find the “narrowest strip” containing the ellipsoid ${{Q}_{N}}$ remaining after iteration $N$, i.e., a vector ${\mathbf{h}}$ such that the following inequality holds on ${{Q}_{N}}$:

$$\mathop {\max}\limits_{\boldsymbol{\lambda} \in {{Q}_{N}}} \left\langle {{\mathbf{h}},\boldsymbol{\lambda} } \right\rangle - \mathop {\min}\limits_{\boldsymbol{\lambda} \in {{Q}_{N}}} \left\langle {{\mathbf{h}},\boldsymbol{\lambda} } \right\rangle \leqslant 1.$$

(10)

For the ellipsoid method, all ${{Q}_{N}}$ are represented in the form

$${{Q}_{N}} = \{ {{B}_{N}}{\mathbf{z}} + {{\boldsymbol{\lambda} }^{N}}:{{{\mathbf{z}}}^{{\text{T}}}}{\mathbf{z}} \leqslant 1\} .$$

Then, to solve (10), we need to perform a singular value decomposition ${{B}_{N}} = UDV,$ where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix with positive diagonal elements. Next, the desired vector ${\mathbf{h}}$ is determined as ${\mathbf{h}} = 1{\text{/}}(2{{\sigma }_{{{i}_{*}}}}) \cdot U{{{\mathbf{e}}}^{{{i}_{*}}}}$, where ${{i}_{*}}$ is the index of the smallest diagonal element of $D$, ${{\sigma }_{{{i}_{*}}}}$ is the value of this element, and ${{{\mathbf{e}}}^{i}}$ are the vectors of the standard basis.

2. For the vectors ${{{\mathbf{h}}}^{ + }} = \left[ {{\mathbf{h}}, - \left\langle {{\mathbf{h}},{{\boldsymbol{\lambda} }^{N}}} \right\rangle } \right]$ and ${{{\mathbf{h}}}^{ - }} = - {{{\mathbf{h}}}^{ + }}$, find expansions of the form

$${{{\mathbf{h}}}^{ + }} = \sum\limits_{t = 0}^{N - 1} \,{{\nu }_{t}}\left[ {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}), - \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),{{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right] + {{{\mathbf{y}}}^{ + }},$$

$${{{\mathbf{h}}}^{ - }} = \sum\limits_{t = 0}^{N - 1} \,{{\mu }_{t}}\left[ {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}), - \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),{{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right] + {{{\mathbf{z}}}^{ + }};$$

their existence follows from Proposition 4.1 in [26]. This step is described by Steps 6–13 in Algorithm 5 (see below).

3. From the expansion coefficients ${{\nu }_{t}}$ and ${{\mu }_{t}}$ of the vectors ${{{\mathbf{h}}}^{ + }}$ and ${{{\mathbf{h}}}^{ - }}$, respectively, derive expressions for ${{\xi }_{t}},\;t \in {{I}_{N}}$, where

$${{I}_{N}} = \{ t \leqslant N - 1:{{\boldsymbol{\lambda} }^{t}} \in \operatorname{int} {{\Lambda }_{{2R}}}\} .$$

Expansion coefficients are determined only for feasible points obtained in the course of running Algorithm 5.
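The first step of this scheme, the choice of the narrowest strip, can be illustrated with a short Python sketch based on `numpy.linalg.svd`. The helper names `narrowest_strip` and `strip_width` are hypothetical. The width is evaluated through the identity $\max\left\langle {{\mathbf{h}},\boldsymbol{\lambda} } \right\rangle - \min\left\langle {{\mathbf{h}},\boldsymbol{\lambda} } \right\rangle = 2{{\left\| {{{B}^{{\text{T}}}}{\mathbf{h}}} \right\|}_{2}}$, which holds on an ellipsoid of the form $\{ B{\mathbf{z}} + {\mathbf{c}}:{{{\mathbf{z}}}^{{\text{T}}}}{\mathbf{z}} \leqslant 1\} $.

```python
import numpy as np

def narrowest_strip(B):
    """Return h = U e_{i*} / (2 sigma_{i*}), with sigma_{i*} the smallest singular value of B."""
    U, s, _ = np.linalg.svd(B)           # B = U @ diag(s) @ Vh
    i_star = int(np.argmin(s))
    return U[:, i_star] / (2.0 * s[i_star])

def strip_width(B, h):
    """Width max<h, lam> - min<h, lam> over the ellipsoid {B z + c : ||z||_2 <= 1}."""
    return 2.0 * np.linalg.norm(B.T @ h)
```

For a nonsingular matrix $B$, the resulting width equals 1, so inequality (10) holds with equality.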

Algorithm 5. Construction of the Accuracy Certificate for the Ellipsoid Method

Input: $N - 1$ is the number of the iteration at which the accuracy certificate is computed, and $\mathop {\left\{ {{{B}_{t}},{{\boldsymbol{\lambda} }^{t}},\nabla \varphi ({{\boldsymbol{\lambda} }^{t}})} \right\}}\nolimits_{t = 0}^{N - 1} $ is the work protocol of the ellipsoid method after $N$ iterations

1: if $\nabla \varphi ({{\boldsymbol{\lambda} }^{{N - 1}}}) = 0,$ then

2: ${{\xi }_{t}}: = 0$ for all $t = 0, \ldots ,N - 2$

3: ${{\xi }_{{N - 1}}}: = 1$

4: otherwise

5: ${\mathbf{h}}: = 1{\text{/}}(2{{\sigma }_{{{i}_{*}}}}) \cdot U{{{\mathbf{e}}}^{{{i}_{*}}}}$

6: ${{{\mathbf{g}}}_{\nu }}: = {\mathbf{h}},$ ${{{\mathbf{g}}}_{\mu }}: = - {\mathbf{h}}$

7: for $t = 0, \ldots ,N - 1$

8: ${\mathbf{q}}: = B_{t}^{{\text{T}}}\nabla \varphi ({{\boldsymbol{\lambda} }^{t}})$

9: ${{\nu }_{t}}: = {{[{\mathbf{g}}_{\nu }^{{\text{T}}}{{B}_{t}}{\mathbf{q}}]}_{ + }}{\text{/}}\left\| {\mathbf{q}} \right\|_{2}^{2}$

10: ${{{\mathbf{g}}}_{\nu }}: = {{{\mathbf{g}}}_{\nu }} - {{\nu }_{t}}\nabla \varphi ({{\boldsymbol{\lambda} }^{t}})$

11: ${{\mu }_{t}}: = {{[{\mathbf{g}}_{\mu }^{{\text{T}}}{{B}_{t}}{\mathbf{q}}]}_{ + }}{\text{/}}\left\| {\mathbf{q}} \right\|_{2}^{2}$

12: ${{{\mathbf{g}}}_{\mu }}: = {{{\mathbf{g}}}_{\mu }} - {{\mu }_{t}}\nabla \varphi ({{\boldsymbol{\lambda} }^{t}})$

13: end for

14: ${{\xi }_{t}}: = ({{\nu }_{t}} + {{\mu }_{t}}){\text{/}}\sum\nolimits_{i \in {{I}_{N}}}^{} {({{\nu }_{i}} + {{\mu }_{i}})} ,$ $t \in {{I}_{N}}$

15: end if

16: return $\mathop {\left\{ {{{\xi }_{t}}} \right\}}\nolimits_{t = 0}^{N - 1} $

Remark 3. In contrast to FGM and the stochastic projected subgradient method, the computation of Steps 4–6 of Algorithm 4 in the ellipsoid method requires information about all gradient components, i.e., information from all users. Accordingly, it is necessary to have a common center for all links that collects information from them and performs these computations.

An estimate for the convergence rate of the ellipsoid method for the problem under study is provided by the following result.

Theorem 3 (see [26]). Suppose that Algorithm 4 starts from the initial ball ${{B}_{0}} = \{ \boldsymbol{\lambda} \in {{\mathbb{R}}^{m}}:\left\| \boldsymbol{\lambda} \right\| \leqslant 2R\} $ and the accuracy certificate $\xi $ is produced by Algorithm 5. Then, after performing

$$N = 2m(m + 1)\left\lceil {\ln\left( {\frac{{32 \cdot 4MR}}{\varepsilon }} \right)} \right\rceil $$

(11)

iterations, it is true that

$$U({\mathbf{x}}\text{*}) - U({{{\mathbf{\hat {x}}}}^{N}}) \leqslant \varepsilon ,\quad {{\left\| {{{{[C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}_{2}} \leqslant \frac{\varepsilon }{R},$$

where

$${{{\mathbf{\hat {x}}}}^{N}} = \sum\limits_{t \in {{I}_{N}}} \,{{\xi }_{t}}{\mathbf{x}}({{\boldsymbol{\lambda} }^{t}}),\quad {{I}_{N}} = \left\{ {t \leqslant N - 1:{{\boldsymbol{\lambda} }^{t}} \in \operatorname{int} {{\Lambda }_{{2R}}}} \right\}.$$

The proof of this theorem can be found in the Appendix.
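The recovery of the primal point from the certificate can be sketched in Python as follows. The sketch assumes that the coefficients ${{\nu }_{t}}$, ${{\mu }_{t}}$ and the responses ${\mathbf{x}}({{\boldsymbol{\lambda} }^{t}})$ have already been stored; the function names and the array layout (one row per iteration) are hypothetical.

```python
import numpy as np

def certificate_weights(nu, mu, interior):
    """Normalize nu_t + mu_t over the index set I_N (Step 14 of Algorithm 5)."""
    raw = np.where(interior, nu + mu, 0.0)   # zero weight outside I_N
    total = raw.sum()
    if total <= 0.0:
        raise ValueError("no positive weight on interior iterates")
    return raw / total

def recover_primal(xi, responses):
    """Weighted average of x(lambda^t); row t of `responses` holds x(lambda^t)."""
    return xi @ responses
```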

6 REGULARIZATION OF THE DUAL PROBLEM

In previous sections, we considered primal-dual methods for solving the dual problem. However, there is a standard approach in which the solution of the primal problem can be recovered from the solution of the dual problem without using primal-dual methods. The key idea of this approach is a regularization of the dual problem such that the resulting regularized problem is strongly convex. In what follows, we describe this approach in detail and state lemmas relating the solutions of the primal and dual problems.

Functional (2) is regularized in the sense of Tikhonov:

$${{\varphi }_{\delta }}(\boldsymbol{\lambda} ) = \varphi (\boldsymbol{\lambda} ) + \frac{\delta }{2}\left\| \boldsymbol{\lambda} \right\|_{2}^{2}$$

and, instead of problem (4), we solve the regularized problem

$$\mathop {\min}\limits_{\boldsymbol{\lambda} \in {\mathbf{R}}_{ + }^{m}} {{\varphi }_{\delta }}(\boldsymbol{\lambda} ).$$

An optimal parameter $\delta $ will be specified later. As in Section 5, we assume that the problem is solved on the set

$${{\Lambda }_{{2R}}} = \{ \boldsymbol{\lambda} \in \mathbb{R}_{ + }^{m}:{{\left\| \boldsymbol{\lambda} \right\|}_{2}} \leqslant 2R\} .$$

For the resulting regularized function, we formulate the following lemma on the smoothness of the regularized problem.

Lemma 3. Suppose that the function $\varphi (\boldsymbol{\lambda} )$ is $L$-smooth. Then the regularized function ${{\varphi }_{\delta }}(\boldsymbol{\lambda} )$ is $(L + \delta )$-smooth, i.e., for any ${{\boldsymbol{\lambda} }^{1}}$, ${{\boldsymbol{\lambda} }^{2}} \in {\mathbf{R}}_{ + }^{m}$,

$${{\left\| {\nabla {{\varphi }_{\delta }}({{\boldsymbol{\lambda} }^{1}}) - \nabla {{\varphi }_{\delta }}({{\boldsymbol{\lambda} }^{2}})} \right\|}_{2}} \leqslant (L + \delta ){{\left\| {{{\boldsymbol{\lambda} }^{1}} - {{\boldsymbol{\lambda} }^{2}}} \right\|}_{2}}.$$

(12)

Proof. The gradient of the regularized function is given by

$$\nabla {{\varphi }_{\delta }}(\boldsymbol{\lambda} ) = \nabla \varphi (\boldsymbol{\lambda} ) + \delta \boldsymbol{\lambda} .$$

Therefore, we have

$${{\left\| {\nabla {{\varphi }_{\delta }}({{\boldsymbol{\lambda} }^{1}}) - \nabla {{\varphi }_{\delta }}({{\boldsymbol{\lambda} }^{2}})} \right\|}_{2}} = {{\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{1}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{2}}) + \delta ({{\boldsymbol{\lambda} }^{1}} - {{\boldsymbol{\lambda} }^{2}})} \right\|}_{2}} \leqslant {{\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{1}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{2}})} \right\|}_{2}} + \delta {{\left\| {{{\boldsymbol{\lambda} }^{1}} - {{\boldsymbol{\lambda} }^{2}}} \right\|}_{2}}.$$

By Proposition 2, this estimate implies (12).
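Inequality (12) can be checked numerically on a simple model. The Python sketch below uses a hypothetical quadratic $\varphi (\boldsymbol{\lambda} ) = \tfrac{1}{2}{{\boldsymbol{\lambda} }^{{\text{T}}}}H\boldsymbol{\lambda} $ in place of the actual dual function, so that $L$ is the largest eigenvalue of $H$; all names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, delta = 5, 0.1
G = rng.standard_normal((m, m))
H = G.T @ G                       # Hessian of the hypothetical quadratic phi
L = np.linalg.eigvalsh(H).max()   # gradient Lipschitz constant of phi

def grad_phi_delta(lam):
    """Gradient of the regularized function: grad phi(lam) + delta * lam."""
    return H @ lam + delta * lam

lam1, lam2 = rng.standard_normal(m), rng.standard_normal(m)
lhs = np.linalg.norm(grad_phi_delta(lam1) - grad_phi_delta(lam2))
rhs = (L + delta) * np.linalg.norm(lam1 - lam2)
```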

Additionally, to estimate the convergence of the algorithm for the primal problem, we need the following auxiliary lemma concerning the relationship between the gradient estimate for the dual problem and convergence estimates with respect to the function and the residual in the constraint for the primal problem.

Lemma 4 (see [10]). Let ${\mathbf{x}}\text{*}$ be a solution of primal problem (1). Then

$${{\left\| {C{\mathbf{x}}(\boldsymbol{\lambda} ) - {\mathbf{b}}} \right\|}_{2}} \leqslant {{\left\| {\nabla {{\varphi }_{\delta }}(\boldsymbol{\lambda} )} \right\|}_{2}} + \delta {{\left\| \boldsymbol{\lambda} \right\|}_{2}},$$

(13)

$$U({\mathbf{x}}\text{*}) - U({\mathbf{x}}(\boldsymbol{\lambda} )) \leqslant {{\left\| {\nabla {{\varphi }_{\delta }}(\boldsymbol{\lambda} )} \right\|}_{2}} \cdot {{\left\| \boldsymbol{\lambda} \right\|}_{2}} + \delta \left\| \boldsymbol{\lambda} \right\|_{2}^{2},$$

(14)

where ${\mathbf{x}}(\boldsymbol{\lambda} )$ is defined by (3).

Proof. By virtue of (3), we have

$$U({\mathbf{x}}(\boldsymbol{\lambda} )) + \left\langle {\boldsymbol{\lambda} ,{\mathbf{b}} - C{\mathbf{x}}(\boldsymbol{\lambda} )} \right\rangle \geqslant U({\mathbf{x}}\text{*}) + \left\langle {\boldsymbol{\lambda} ,{\mathbf{b}} - C{\mathbf{x}}\text{*}} \right\rangle \geqslant U({\mathbf{x}}\text{*}),$$

whence

$$U({\mathbf{x}}(\boldsymbol{\lambda} )) \geqslant U({\mathbf{x}}\text{*}) - \left\langle {\boldsymbol{\lambda} ,{\mathbf{b}} - C{\mathbf{x}}(\boldsymbol{\lambda} )} \right\rangle = U({\mathbf{x}}\text{*}) - \left\langle {\boldsymbol{\lambda} ,\nabla \varphi (\boldsymbol{\lambda} )} \right\rangle .$$

Since $\varphi (\boldsymbol{\lambda} ) = {{\varphi }_{\delta }}(\boldsymbol{\lambda} ) - \tfrac{\delta }{2}\left\| \boldsymbol{\lambda} \right\|_{2}^{2}$, it is true that

$${{\left\| {\nabla \varphi (\boldsymbol{\lambda} )} \right\|}_{2}} = {{\left\| {\nabla {{\varphi }_{\delta }}(\boldsymbol{\lambda} ) - \delta \boldsymbol{\lambda} } \right\|}_{2}} \leqslant {{\left\| {\nabla {{\varphi }_{\delta }}(\boldsymbol{\lambda} )} \right\|}_{2}} + \delta {{\left\| \boldsymbol{\lambda} \right\|}_{2}}.$$

Combining this inequality with the relation $\nabla \varphi (\boldsymbol{\lambda} ) = {\mathbf{b}} - C{\mathbf{x}}(\boldsymbol{\lambda} )$ yields (13).

Furthermore, estimate (14) follows from

$$\begin{gathered} U({\mathbf{x}}\text{*}) - U({\mathbf{x}}(\boldsymbol{\lambda} )) \leqslant \left\langle {\boldsymbol{\lambda} ,\nabla \varphi (\boldsymbol{\lambda} )} \right\rangle \leqslant {{\left\| {\nabla \varphi (\boldsymbol{\lambda} )} \right\|}_{2}} \cdot {{\left\| \boldsymbol{\lambda} \right\|}_{2}} \\ \, \leqslant {{\left\| \boldsymbol{\lambda} \right\|}_{2}} \cdot \left( {\mathop {\left\| {\nabla {{\varphi }_{\delta }}(\boldsymbol{\lambda} )} \right\|}\nolimits_2 + \delta \mathop {\left\| \boldsymbol{\lambda} \right\|}\nolimits_2 } \right) \leqslant {{\left\| {\nabla {{\varphi }_{\delta }}(\boldsymbol{\lambda} )} \right\|}_{2}} \cdot {{\left\| \boldsymbol{\lambda} \right\|}_{2}} + \delta \left\| \boldsymbol{\lambda} \right\|_{2}^{2}. \\ \end{gathered} $$

Additionally, we need the following result concerning convergence with respect to the gradient of the regularized function.

Lemma 5. Let $\boldsymbol{\lambda} _{\delta }^{*}$ be a solution of the regularized dual problem. Then

$$\mathop {\left\| {\nabla {{\varphi }_{\delta }}({{\boldsymbol{\lambda} }^{N}})} \right\|}\nolimits_2 \leqslant (L + \delta ){{\left\| {{{\boldsymbol{\lambda} }^{N}} - \boldsymbol{\lambda} _{\delta }^{*}} \right\|}_{2}}.$$

The proof follows immediately from Lemma 3 and the relation

$$\nabla {{\varphi }_{\delta }}(\boldsymbol{\lambda} _{\delta }^{*}) = 0.$$

We have formulated the lemmas necessary for the regularized problem. An example of applying this approach is considered in the next section.

7 RANDOM GRADIENT EXTRAPOLATION METHOD

Consider the random gradient extrapolation method [27]. Note that this method does not require updating the gradient at every iteration step. It is necessary to update only one of its components at every iteration, which considerably reduces the computations, especially for large-scale problems. Since this method is not primal-dual, Algorithm 6 has to be applied to the regularized problem.

The parameters $\alpha $, $\eta $, $\tau $, and ${{\theta }_{t}}$ are specified as

$$\bar {\alpha } = 1 - \frac{1}{{n + \sqrt {{{n}^{2}} + 16nL{\text{/}}\delta } }},$$

(15)

$$\alpha = n\bar {\alpha },\quad \eta = \frac{{\delta \bar {\alpha }}}{{1 - \bar {\alpha }}},\quad \tau = \frac{1}{{n(1 - \bar {\alpha })}} - 1,\quad {{\theta }_{t}} = \mathop {\bar {\alpha }}\nolimits^{ - t} .$$

(16)
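Formulas (15) and (16) translate directly into code. The following Python sketch computes the parameters for given $n$, $L$, and $\delta $; the function name `rgem_parameters` and the return layout are hypothetical.

```python
import math

def rgem_parameters(n, L, delta):
    """Return (alpha_bar, alpha, eta, tau, theta) according to (15) and (16)."""
    alpha_bar = 1.0 - 1.0 / (n + math.sqrt(n**2 + 16.0 * n * L / delta))
    alpha = n * alpha_bar
    eta = delta * alpha_bar / (1.0 - alpha_bar)
    tau = 1.0 / (n * (1.0 - alpha_bar)) - 1.0

    def theta(t):
        # Averaging weight theta_t = alpha_bar^(-t); grows geometrically in t.
        return alpha_bar ** (-t)

    return alpha_bar, alpha, eta, tau, theta
```

Note that $1{\text{/}}(1 - \bar {\alpha }) = n + \sqrt {{{n}^{2}} + 16nL{\text{/}}\delta } $, so $\tau = \sqrt {{{n}^{2}} + 16nL{\text{/}}\delta } {\text{/}}n \geqslant 1$.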

7.1 Distributed Method

This section presents a distributed version of the considered method. By way of introduction, we note that the vectors $\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_1^0 , \ldots ,\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_n^0 $ are stored by the corresponding users and influence the formation of optimal data traffic for the corresponding user. As was noted in the description of the distributed FGM, the optimal traffic for a user is influenced only by the prices of the links through which this user exchanges packets. Therefore, we can assume that the only nonzero components in the vector $\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_k^t $ are those whose indices coincide with the indices of the used links.

Algorithm 6. Random Gradient Extrapolation Method (RGEM)

Input: Parameters $\alpha $, $\eta $, $\tau $, $\{ {{\theta }_{t}}\} _{{t = 1}}^{N}$

1: ${{{\boldsymbol{\lambda}} }^{0}}: = {\mathbf{0}}$

2: $\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_i^0 : = {{{\boldsymbol{\lambda}} }^{0}}$, $i = 1, \ldots ,n$

3: ${\mathbf{y}}_{k}^{{ - 1}}: = {\mathbf{y}}_{k}^{0}: = {\mathbf{0}}$, $k = 1, \ldots ,n$

4: for $t = 1, \ldots ,N$

5: Choose ${{k}_{t}}$ at random from the set $\{ 1, \ldots ,n\} $ uniformly over all values

6: ${\mathbf{\tilde {y}}}_{k}^{t}: = {\mathbf{y}}_{k}^{{t - 1}} + \alpha ({\mathbf{y}}_{k}^{{t - 1}} - {\mathbf{y}}_{k}^{{t - 2}}),$ $k = 1, \ldots ,n$

7: ${{{\boldsymbol{\lambda}} }^{t}}: = \mathop {\left[ {\eta {{{\boldsymbol{\lambda}} }^{{t - 1}}} - \tfrac{1}{n}\sum\nolimits_{k = 1}^n {{\mathbf{\tilde {y}}}_{k}^{t}} } \right]}\nolimits_ + {\text{/}}(\delta + \eta )$

8:

9: $\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_{{{k}_{t}}}^t : = ({{{\boldsymbol{\lambda}} }^{t}} + \tau \mathop {\underline {\boldsymbol{\lambda}} }\nolimits_{{{k}_{t}}}^{t - 1} ){\text{/}}(1 + \tau )$

10: $\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_k^t : = \mathop {\underline {\boldsymbol{\lambda}} }\nolimits_k^{t - 1} $, $k \in \{ 1, \ldots ,n\} {{\backslash }}\{ {{k}_{t}}\} $

11:

12: ${\mathbf{y}}_{{{{k}_{t}}}}^{t}: = {\mathbf{b}} - n{{{\mathbf{C}}}_{{{{k}_{t}}}}}{{x}_{{{{k}_{t}}}}}(\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_{{{k}_{t}}}^t )$

13: ${\mathbf{y}}_{k}^{t}: = {\mathbf{y}}_{k}^{{t - 1}}$, $k \in \{ 1, \ldots ,n\} {{\backslash }}\{ {{k}_{t}}\} $

14: end for

15: ${{\bar {{\boldsymbol{\lambda}} }}^{N}}: = \left( {\sum\nolimits_{t = 1}^N {{{\theta }_{t}}{{{\boldsymbol{\lambda}} }^{t}}} } \right){\text{/}}\sum\nolimits_{t = 1}^N {{{\theta }_{t}}} $

16: return ${{\bar {{\boldsymbol{\lambda}} }}^{N}}$

Let us describe the distributed algorithm at the $t$th iteration.

1. Using information collected from the users at the preceding iteration, link $j$ computes

$$\tilde {y}_{{k,j}}^{t}: = y_{{k,j}}^{{t - 1}} + \alpha (y_{{k,j}}^{{t - 1}} - y_{{k,j}}^{{t - 2}}) = {{b}_{j}} - nC_{k}^{j}{{x}_{k}}(\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_k^{t - 1} ) + \alpha (nC_{k}^{j}{{x}_{k}}(\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_k^{t - 2} ) - nC_{k}^{j}{{x}_{k}}(\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_k^{t - 1} )).$$

Note that, by the definition of the matrix $C,$ link $j$ needs information only from the users exchanging packets through this link.

2. The price of link $j$ changes according to the rule

$$\lambda _{j}^{t} = {{\left[ {\eta \lambda _{j}^{{t - 1}} - \frac{1}{n}\sum\limits_{k = 1}^n \,\tilde {y}_{{k,j}}^{t}} \right]}_{ + }}{\text{/}}(\delta + \eta ).$$

3. One of the users, ${{k}_{t}}$, reacts to the price change and stores the local price vector

$$\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_{{{k}_{t}}}^t = ({{{\boldsymbol{\lambda}} }^{t}} + \tau \mathop {\underline {\boldsymbol{\lambda}} }\nolimits_{{{k}_{t}}}^{t - 1} ){\text{/}}(1 + \tau ),$$

while the local prices for the other users remain unchanged, i.e., $\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_k^t = \mathop {\underline {\boldsymbol{\lambda}} }\nolimits_k^{t - 1} $ for $k \ne {{k}_{t}}$.

4. The user ${{k}_{t}}$ computes

$$x_{k_t}(\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_{k_t}^t ) = \mathop {\arg\,\max}\limits_{x_{k_t} \in {\mathbf{R}}_{+}} \,\left( u_{k_t}(x_{k_t}) - x_{k_t}\sum\limits_{j = 1}^m \,\mathop {\underline \lambda }\nolimits_{k_t,j}^t C_{k_t}^{j} \right)$$

and transmits this information to the used links.

5. Link $j$ updates the information for the user ${{k}_{t}}$:

$$y_{{{{k}_{t}},j}}^{t} = {{b}_{j}} - nC_{{{{k}_{t}}}}^{j}{{x}_{{{{k}_{t}}}}}(\mathop {\underline {\boldsymbol{\lambda}} }\nolimits_{{{k}_{t}}}^t ).$$

This information is updated by the link only if the user ${{k}_{t}}$ exchanges packets through it.
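The five steps above can be sketched as a single loop in Python. This is a minimal illustration, not the implementation used in the experiments. It assumes quadratic utilities ${{u}_{k}}({{x}_{k}}) = {{a}_{k}}{{x}_{k}} - \tfrac{{\sigma n}}{2}x_{k}^{2}$ (so that Step 4 has a closed form), stores $C$ as an $m \times n$ array whose column $k$ is ${{{\mathbf{C}}}_{k}}$, and keeps all local vectors in one process instead of distributing them. The function name and all arguments are hypothetical.

```python
import numpy as np

def rgem_quadratic(C, a, b, sigma, L, delta, N, seed=0):
    """Sketch of Algorithm 6 for u_k(x) = a_k x - (sigma n / 2) x^2 (assumed utilities)."""
    m, n = C.shape
    rng = np.random.default_rng(seed)
    alpha_bar = 1.0 - 1.0 / (n + np.sqrt(n**2 + 16.0 * n * L / delta))
    alpha = n * alpha_bar
    eta = delta * alpha_bar / (1.0 - alpha_bar)
    tau = 1.0 / (n * (1.0 - alpha_bar)) - 1.0

    lam = np.zeros(m)               # link prices lambda^t
    lam_local = np.zeros((n, m))    # row k: local price vector stored by user k
    y_old = np.zeros((n, m))        # y_k^{t-2}
    y_cur = np.zeros((n, m))        # y_k^{t-1}
    weighted_sum, weight_total = np.zeros(m), 0.0

    for t in range(1, N + 1):
        k = rng.integers(n)                                    # random user k_t
        y_tilde = y_cur + alpha * (y_cur - y_old)              # Step 1: extrapolation
        lam = np.maximum(eta * lam - y_tilde.mean(axis=0), 0.0) / (delta + eta)  # Step 2
        lam_local[k] = (lam + tau * lam_local[k]) / (1.0 + tau)                  # Step 3
        x_k = max(a[k] - C[:, k] @ lam_local[k], 0.0) / (n * sigma)              # Step 4
        y_old = y_cur.copy()
        y_cur[k] = b - n * C[:, k] * x_k                       # Step 5 (as in line 12)
        theta = alpha_bar ** (-t)                              # averaging weight theta_t
        weighted_sum += theta * lam
        weight_total += theta
    return lam, weighted_sum / weight_total
```

The sketch recomputes the average of all $\tilde {y}_{k}^{t}$ at every iteration for readability; since only one row changes per iteration, this average can be maintained incrementally.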

7.2 Estimation of the Convergence Rate of RGEM

Following Section 3, we consider problem (2) with $\mu $-strongly concave utility functions ${{u}_{k}}({{x}_{k}}),k = 1, \ldots ,n$. Recall that, since the utility functions are strongly concave, the objective function of the dual problem (4) has a Lipschitz continuous gradient with constant $L = \tfrac{{n{{m}^{2}}}}{\mu }$.

To estimate the convergence rate of the method, we need the estimate for the residual with respect to the argument from Theorem 2.1 in [27], namely,

$$\mathbb{E}\left[ {\left\| {{\boldsymbol{\lambda}} _{\delta }^{*} - {{{\boldsymbol{\lambda}} }^{N}}} \right\|_{2}^{2}} \right] \leqslant \frac{{4\Delta {{{(\bar {\alpha })}}^{N}}}}{\delta },$$

(17)

where $\Delta = \delta \left\| {{\boldsymbol{\lambda}} _{\delta }^{*} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2} + \tfrac{B}{{n\delta }} + {{\varphi }_{\delta }}({{{\boldsymbol{\lambda}} }^{0}}) - {{\varphi }_{\delta }}({\boldsymbol{\lambda}} _{\delta }^{*})$ and $B = \left\| {\mathbf{b}} \right\|_{2}^{2}$.

By using (17), it is possible to prove the following convergence estimate theorem for the method as applied to the regularized dual problem.

Theorem 4. Suppose that the regularized dual problem is solved by applying RGEM with parameters (15), (16), and $\delta = \tfrac{\varepsilon }{{8{{R}^{2}}}}$ and with

$$N = \left\lceil {2\left( {n + \sqrt {{{n}^{2}} + \frac{{128nL{{R}^{2}}}}{\varepsilon }} } \right)\ln\left( {\frac{{4RA}}{\varepsilon }} \right)} \right\rceil $$

iterations, where $A = 2\left( {LR + \tfrac{\varepsilon }{{8R}}} \right)\sqrt {6 + \tfrac{{16L{{R}^{2}}n + 8B}}{{n\varepsilon }}} $. Then

$$\mathbb{E}[U({\mathbf{x}}\text{*}) - U({\mathbf{x}}({{{\boldsymbol{\lambda}} }^{N}}))] \leqslant \varepsilon ,\quad \mathbb{E}\left[ {\mathop {\left\| {C{\mathbf{x}}({{{\boldsymbol{\lambda}} }^{N}}) - {\mathbf{b}}} \right\|}\nolimits_2 } \right] \leqslant \frac{\varepsilon }{{2R}}.$$

Proof. Lemma 4 implies estimate (13) for the residual with respect to the constraints and estimate (14) for the residual with respect to the objective function. By the assumption ${\boldsymbol{\lambda}} \in {{\Lambda }_{{2R}}}$, we have

$${{\left\| {C{{{\mathbf{x}}}^{N}} - {\mathbf{b}}} \right\|}_{2}} \leqslant {{\left\| {\nabla {{\varphi }_{\delta }}({{{\boldsymbol{\lambda}} }^{N}})} \right\|}_{2}} + 2\delta R,$$

(18)

$$U({\mathbf{x}}\text{*}) - U({{{\mathbf{x}}}^{N}}) \leqslant 2R{{\left\| {\nabla {{\varphi }_{\delta }}({{{\boldsymbol{\lambda}} }^{N}})} \right\|}_{2}} + 4\delta {{R}^{2}},$$

(19)

where ${{{\mathbf{x}}}^{N}} = {\mathbf{x}}({{{\boldsymbol{\lambda}} }^{N}})$. Combining Lemma 5 with inequality (17) yields the following estimate for ${{\left\| {\nabla {{\varphi }_{\delta }}({{{\boldsymbol{\lambda}} }^{N}})} \right\|}_{2}}$:

$$\mathbb{E}\left[ {{{{\left\| {\nabla {{\varphi }_{\delta }}({{{\boldsymbol{\lambda}} }^{N}})} \right\|}}_{2}}} \right] \leqslant 2(L + \delta )\sqrt {\frac{\Delta }{\delta }} {{(\bar {\alpha })}^{{N/2}}}.$$

Let us estimate $\Delta $. The function ${{\varphi }_{\delta }}$ with a Lipschitz continuous gradient satisfies the inequality

$${{\varphi }_{\delta }}({{{\boldsymbol{\lambda}} }^{0}}) - {{\varphi }_{\delta }}({\boldsymbol{\lambda}} _{\delta }^{*}) \leqslant \left\langle {\nabla {{\varphi }_{\delta }}({\boldsymbol{\lambda}} _{\delta }^{*}),{{{\boldsymbol{\lambda}} }^{0}} - {\boldsymbol{\lambda}} _{\delta }^{*}} \right\rangle + \frac{{L + \delta }}{2}\left\| {{\boldsymbol{\lambda}} _{\delta }^{*} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2}.$$

Since $\nabla {{\varphi }_{\delta }}({\boldsymbol{\lambda}} _{\delta }^{*}) = 0$, we obtain

$$\Delta \leqslant \delta \left\| {{\boldsymbol{\lambda}} _{\delta }^{*} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2} + \frac{B}{{n\delta }} + \frac{{L + \delta }}{2}\left\| {{\boldsymbol{\lambda}} _{\delta }^{*} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2} \leqslant (6\delta + 2L){{R}^{2}} + \frac{B}{{n\delta }}.$$

Suppose that $\delta $ is chosen so that $4\delta {{R}^{2}} = \tfrac{\varepsilon }{2}$. Then $\delta = \tfrac{\varepsilon }{{8{{R}^{2}}}}$. It follows that

$$4\delta {{R}^{2}} = \frac{\varepsilon }{2},\quad 2\delta R = \frac{\varepsilon }{{4R}}.$$

To ensure that $U({\mathbf{x}}\text{*}) - U({\mathbf{x}}({{{\boldsymbol{\lambda}} }^{N}})) \leqslant \varepsilon $, by virtue of (18), (19), and the above choice of $\delta $, it suffices to have

$${{\left\| {\nabla {{\varphi }_{\delta }}({{{\boldsymbol{\lambda}} }^{N}})} \right\|}_{2}} \leqslant \frac{\varepsilon }{{4R}},$$

which holds in expectation if

$$2(L + \delta )\sqrt {\frac{\Delta }{\delta }} {{(\bar {\alpha })}^{{N/2}}} \leqslant \frac{\varepsilon }{{4R}}.$$

Taking into account

$$2(L + \delta )\sqrt {\frac{\Delta }{\delta }} \leqslant 2\left( {LR + \frac{\varepsilon }{{8R}}} \right)\sqrt {6 + \frac{{16L{{R}^{2}}n + 8B}}{{n\varepsilon }}} ,$$

we obtain the following estimate for the number of iterations:

$$N = \left\lceil {2\left( {n + \sqrt {{{n}^{2}} + \frac{{128nL{{R}^{2}}}}{\varepsilon }} } \right)\ln\left( {\frac{{4RA}}{\varepsilon }} \right)} \right\rceil ,$$

where $A = 2\left( {LR + \tfrac{\varepsilon }{{8R}}} \right)\sqrt {6 + \tfrac{{16L{{R}^{2}}n + 8B}}{{n\varepsilon }}} $.
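The regularization parameter and the iteration count from Theorem 4 can be evaluated with a few lines of Python. The sketch below only transcribes the formulas for $\delta $, $A$, and $N$; the function name `rgem_iteration_count` and the sample values used with it are hypothetical.

```python
import math

def rgem_iteration_count(n, L, R, eps, B):
    """Return (delta, N) from Theorem 4, where B = ||b||_2^2."""
    delta = eps / (8.0 * R**2)
    A = 2.0 * (L * R + eps / (8.0 * R)) * math.sqrt(
        6.0 + (16.0 * L * R**2 * n + 8.0 * B) / (n * eps)
    )
    N = math.ceil(
        2.0 * (n + math.sqrt(n**2 + 128.0 * n * L * R**2 / eps))
        * math.log(4.0 * R * A / eps)
    )
    return delta, N
```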

Remark 4. The complexity bound for Algorithm 6 can also be represented in the form $O\left( {\max\left\{ {n,\sqrt {nL{{R}^{2}}{\text{/}}\varepsilon } } \right\}\ln\left( {\tfrac{1}{\varepsilon }} \right)} \right)$, where the logarithmic factor appears due to the necessity of regularization of the dual problem. At every iteration, only one component of the user reaction vector to changed prices is computed; accordingly, the arithmetic complexity of the operation is better than that in the case of computing all components of these vectors. For FGM, the assumptions made about the objective function are similar, but, since the complete gradient has to be computed at every iteration step, the complexity bound for the algorithm is $O\left( {n\sqrt {L{{R}^{2}}{\text{/}}\varepsilon } } \right)$. Thus, although the theoretical convergence estimate for RGEM has the same order as for FGM, in practice the gain is obtained due to the cheaper computations within a single iteration.

8 NUMERICAL EXPERIMENTS

The software code for numerical experiments was written in Python 3.6 and C++14. The source code for experiments and the methods considered in this paper is available at https://github.com/dmivilensky/network-resource-allocation. The running time was measured on a computer with a 2-core Intel Core i5-5250U 1.6 GHz processor and 8 GB RAM.

8.1 Strongly Convex (Quadratic) Utility Functions

Consider problem (1) for utility functions of the form

$${{u}_{k}}({{x}_{k}}) = {{a}_{k}}{{x}_{k}} - \frac{{\sigma n}}{2}x_{k}^{2},\quad {{a}_{k}} \sim \mathcal{U}(0,100),\quad \sigma = 0.1,$$

where ${{a}_{k}}$ are independent identically distributed random variables. Then problem (3) can be solved explicitly:

$${\mathbf{x}}({\boldsymbol{\lambda}} ) = \frac{[{\mathbf{a}} - C^{\text{T}}{\boldsymbol{\lambda}} ]_{+}}{n\sigma }.$$

For a small number of users ($n = 1500$), the link capacities are chosen identical (in this case, ${\mathbf{b}} = (5, \ldots ,5)^{\text{T}}$), and the demand for data transmission is uniform (${{c}_{{ij}}} = 1$ for any $i,j$). For a larger number of users, the capacity vector is generated at random, so that ${{b}_{i}} \sim \mathcal{U}(1,6)$. The elements of the demand matrix are also chosen randomly and independently, so that ${{c}_{{ij}}} = 1$ with probability $p = 0.5$ and ${{c}_{{ij}}} = 0$ with probability $q = 0.5$.
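A random test instance of this kind and the closed-form user response can be sketched in Python as follows. The sketch assumes that $C$ is stored as an $m \times n$ array, so that the total price seen by the users is ${{C}^{{\text{T}}}}{\boldsymbol{\lambda}} $; the function names and the seed are hypothetical and do not refer to the code in the repository mentioned above.

```python
import numpy as np

def random_instance(m, n, seed=0):
    """Generate C (m x n, entries 0/1 with probability 0.5), b ~ U(1, 6), a ~ U(0, 100)."""
    rng = np.random.default_rng(seed)
    C = (rng.random((m, n)) < 0.5).astype(float)
    b = rng.uniform(1.0, 6.0, size=m)
    a = rng.uniform(0.0, 100.0, size=n)
    return C, b, a

def quadratic_response(C, a, lam, sigma=0.1):
    """Closed-form maximizer of a_k x - (sigma n / 2) x^2 - x (C^T lam)_k over x >= 0."""
    n = a.shape[0]
    return np.maximum(a - C.T @ lam, 0.0) / (n * sigma)

def dual_gradient(C, b, x):
    """Gradient of the dual function, b - C x(lam)."""
    return b - C @ x
```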

Table 1 presents the number of iterations and the running times of the fast gradient method (FGM) and the random gradient extrapolation method (RGEM) for various network configurations (with $m$ links), various numbers of users $n$, and various values of the required accuracy $\varepsilon $. The cases in which RGEM converges to the solution faster than FGM, despite requiring more iterations, are highlighted in the table. Indeed, for large $n$, RGEM sends users fewer queries for the optimal response ${{x}_{k}}({\boldsymbol{\lambda}} )$ than the other algorithms, since each RGEM iteration queries only one randomly chosen user.

Table 1. Comparison of the number of iterations and the running time of FGM and RGEM for strongly convex (quadratic) utility functions


8.2 Convex (Logarithmic) Utility Functions

Consider the performance of the stochastic subgradient method (Algorithm 2) and the ellipsoid method (Algorithm 4) for the utility function

$${{u}_{k}}({{x}_{k}}) = \ln{{x}_{k}}.$$

In this case, an explicit solution of problem (3) is given by

$${\mathbf{x}}({\boldsymbol{\lambda}} ) = \frac{1}{{C{\boldsymbol{\lambda}} }}$$

(the operation $1/ \cdot $ as applied to a vector is understood elementwise). For a small number of users ($n = 1500$), the link capacities are chosen identical (in this case, ${\mathbf{b}} = {{(5, \ldots ,5)}^{{\text{T}}}}$) and the demand for data transmission is uniform (${{c}_{{ij}}} = 1$ for any $i,j$). For a larger number of users, the capacity vector is randomly generated, so that ${{b}_{i}} \sim \mathcal{U}(1,6)$. The elements of the demand matrix are also chosen randomly and independently, so that ${{c}_{{ij}}} = 1$ with probability $p = 0.5$ and ${{c}_{{ij}}} = 0$ with probability $q = 0.5$.
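A minimal sketch of the elementwise response for logarithmic utilities is given below. As before, it assumes that $C$ is stored as an $m \times n$ array with columns ${{{\mathbf{C}}}_{k}}$; the name `log_response` is hypothetical, and returning `inf` for a user whose total price is zero is a choice made here for illustration.

```python
import numpy as np

def log_response(C, lam):
    """Best response for u_k(x) = ln(x): x_k = 1 / <lam, C_k>.
    A user whose total price is zero has an unbounded response (inf)."""
    price = C.T @ lam
    with np.errstate(divide="ignore"):
        return 1.0 / price

C = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
lam = np.array([0.5, 0.25])
x = log_response(C, lam)
```

The first-order condition $1/{{x}_{k}} = \left\langle {{\boldsymbol{\lambda}} ,{{{\mathbf{C}}}_{k}}} \right\rangle $ can be checked directly on the returned vector.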

Table 2 presents the number of iterations and the running times of the stochastic subgradient method (SGM) and the ellipsoid method for various network configurations, various numbers of users, and various values of the required accuracy. The cases in which SGM converges to the solution faster than the ellipsoid method are highlighted in the table.

Table 2. Comparison of the number of iterations and the running time of the stochastic subgradient method and the ellipsoid method for convex (logarithmic) utility functions


Note that, as in RGEM, only one component of the user reaction vector ${\mathbf{x}}({\boldsymbol{\lambda}} )$ to established prices has to be computed at every iteration in SGM. Thus, when the number of iterations of the method is large, the number of computed components ${{x}_{k}}({{{\boldsymbol{\lambda}} }^{t}})$ is smaller than in other algorithms, for example, in the ellipsoid method, and the same is true of the communication complexity in the case of distributed implementation.
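The saving comes from the additive structure of the dual gradient, $\nabla \varphi ({\boldsymbol{\lambda}} ) = \sum\nolimits_{k = 1}^n {\left( {\tfrac{1}{n}{\mathbf{b}} - {{{\mathbf{C}}}_{k}}{{x}_{k}}({\boldsymbol{\lambda}} )} \right)} $. The sketch below, for logarithmic utilities, computes one term using only the response of user $k$ and checks that the estimate $n\nabla {{\varphi }_{k}}$ with uniformly sampled $k$ is unbiased. The scaling by $n$ and the function names are assumptions made for illustration; the exact form used in Algorithm 2 may differ.

```python
import numpy as np

def component_gradient(k, C, b, lam):
    """Gradient of the k-th dual term for u_k(x) = ln(x):
    b / n - C_k * x_k(lam). Only user k's response is computed."""
    n = C.shape[1]
    x_k = 1.0 / (C[:, k] @ lam)
    return b / n - C[:, k] * x_k

def stochastic_gradient(C, b, lam, rng):
    """One-user estimate n * grad(phi_k), k sampled uniformly (assumed scaling)."""
    n = C.shape[1]
    k = rng.integers(n)
    return n * component_gradient(k, C, b, lam)

rng = np.random.default_rng(0)
m, n = 4, 50
C = rng.binomial(1, 0.5, size=(m, n)).astype(float)
C[0, :] = 1.0          # every user uses link 0, so all prices are positive
b = rng.uniform(1.0, 6.0, size=m)
lam = rng.uniform(0.5, 1.5, size=m)

full_gradient = b - C @ (1.0 / (C.T @ lam))
# Averaging the estimate over all possible k reproduces the full gradient.
mean_estimate = np.mean(
    [n * component_gradient(k, C, b, lam) for k in range(n)], axis=0
)
sample = stochastic_gradient(C, b, lam, rng)
```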

9 CONCLUSIONS

To conclude, we note some possible directions of development of this work and briefly describe suitable methods without detailed analysis of their convergence estimates.

In Section 5, as applied to low-dimensional problems, we considered the ellipsoid method, which is primal-dual. There are other methods that are highly accurate and well suited for low-dimensional problems. An example is Vaidya’s cutting plane method [28]. However, to recover the solution of the primal problem when the dual one is solved using Vaidya’s method, we need convergence in the gradient norm for the dual problem. For this purpose, the dual problem has to be smooth, which is ensured by the strong convexity of the objective function in the primal problem (Proposition 2). If the primal problem is not strongly convex, it can be regularized as described in Section 6, but the convergence estimate will then degrade logarithmically.

Additionally, if the dual problem is sufficiently smooth, it can be solved by applying high-order methods [29, 30]. The steps of these methods can be computed in a distributed manner, since the problem has a centralized architecture in terms of the interaction between a link and the users of that link. Note, however, that optimal high-order methods that require line search and do not have the primal-dual property apply only to dual problems that have been regularized beforehand.

Another direction is represented by variance reduced methods (see, e.g., [31, 32]), which are intermediate between the stochastic gradient method and FGM. However, these methods are not primal-dual either, so they apply only to dual problems that have been regularized beforehand.

Of special interest are the Hogwild! method [33] and minibatching techniques. In this case, data are sent neither by all users simultaneously nor by a single user, as in stochastic methods, but by a group of several users. By setting the size of the batch equal to the number of users transmitting data at a time, one can take into account the specific features of actual networks.

REFERENCES

F. P. Kelly, A. K. Maulloo, and D. K. H. Tan, “Rate control for communication networks: Shadow prices, proportional fairness, and stability,” J. Oper. Res. Soc. 49, 237–252 (1998).
D. B. Rokhlin, “Resource allocation in communication networks with large number of users: The stochastic gradient descent method” (2019). https://arxiv.org/abs/1905.04382
K. J. Arrow and L. Hurwicz, Decentralization and Computation in Resource Allocation (Department of Economics, Stanford Univ., Stanford, CA, 1958).
A. Kakhbod, Resource Allocation in Decentralized Systems with Strategic Agents: An Implementation Theory Approach (Springer Science & Business Media, New York, 2013).
D. E. Campbell, Resource Allocation Mechanisms (Cambridge Univ. Press, Cambridge, 1987).
E. J. Friedman and S. S. Oren, “The complexity of resource allocation and price mechanisms under bounded rationality,” Econ. Theory 6, 225–250 (1995).
Yu. Nesterov and V. Shikhman, “Dual subgradient method with averaging for optimal resource allocation,” Eur. J. Oper. Res. 270, 907–916 (2018).
A. Ivanova, P. Dvurechensky, A. Gasnikov, and D. Kamzolov, “Composite optimization for the resource allocation problem” (2018). arXiv preprint arXiv:1810.00595
Yu. E. Nesterov, “Method of minimizing convex functions with convergence rate $O(1{\text{/}}{{k}^{2}})$,” Dokl. Akad. Nauk SSSR 269 (3), 543–547 (1983).
A. V. Gasnikov, E. V. Gasnikova, Yu. E. Nesterov, and A. V. Chernov, “Efficient numerical methods for entropy-linear programming problems,” Comput. Math. Math. Phys. 56, 514–524 (2016).
A. Chernov, P. Dvurechensky, and A. Gasnikov, “Fast primal-dual gradient method for strongly convex minimization problems with linear constraints,” Discrete Optimization and Operations Research: Proceedings of the 9th International Conference, DOOR 2016, Vladivostok, Russia, September 19–23, 2016 (Springer International, Berlin, 2016), pp. 391–403.
P. Dvurechensky, A. Gasnikov, E. Gasnikova, S. Matsievsky, A. Rodomanov, and I. Usik, “Primal-dual method for searching equilibrium in hierarchical congestion population games,” Supplementary Proceedings of the 9th International Conference on Discrete Optimization and Operations Research and Scientific School (DOOR 2016) Vladivostok, Russia, September 19–23, 2016, pp. 584–595. arXiv:1606.08988
A. Anikin, A. Gasnikov, A. Turin, and A. Chernov, “Dual approaches to the minimization of strongly convex functionals with a simple structure under affine constraints,” Comput. Math. Math. Phys. 57, 1262–1276 (2017).
P. Dvurechensky, A. Gasnikov, and A. Kroshnin, “Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn’s algorithm,” Proceedings of the 35th International Conference on Machine Learning (2018), Vol. 80, pp. 1367–1376. arXiv:1802.04367
Yu. Nesterov, A. Gasnikov, S. Guminov, and P. Dvurechensky, “Primal-dual accelerated gradient methods with small-dimensional relaxation oracle” (2018). arXiv:1809.05895
S. Guminov, P. Dvurechensky, and A. Gasnikov, “On accelerated alternating minimization” (2019). arXiv:1906.03622
S. V. Guminov, Yu. E. Nesterov, P. E. Dvurechensky, and A. V. Gasnikov, “Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems,” Dokl. Math. 99, 125–128 (2019).
A. Kroshnin, N. Tupitsa, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, and C. A. Uribe, “On the complexity of approximating Wasserstein barycenters,” Proceedings of the 36th International Conference on Machine Learning, Ed. by K. Chaudhuri and R. Salakhutdinov (PMLR, California, US, 2019), Vol. 97, pp. 3530–3540. arXiv:1901.08686
C. A. Uribe, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, and A. Nedich, “Distributed computation of Wasserstein barycenters over networks,” 2018 IEEE Conference on Decision and Control (CDC) (2018), pp. 6544–6549. arXiv:1803.02933
D. Dvinskikh, E. Gorbunov, A. Gasnikov, P. Dvurechensky, and C. A. Uribe, “On primal and dual approaches for distributed stochastic convex optimization over networks,” 2019 IEEE Conference on Decision and Control (CDC) (2019). arXiv:1903.09844
P. Dvurechensky, D. Dvinskikh, A. Gasnikov, C. A. Uribe, and A. Nedic, “Decentralize and randomize: Faster algorithm for Wasserstein barycenters,” Adv. Neural Inf. Process. Syst. 31, 10783–10793 (2018). arXiv:1806.03915
J. M. Danskin, Theory of Maximin (Sovetskoe Radio, Moscow, 1970) [in Russian].
V. F. Demyanov and V. N. Malozemov, Introduction to Minimax (Nauka, Moscow, 1972; Wiley, New York, 1974).
Yu. Nesterov, “Smooth minimization of nonsmooth functions,” Math. Program. 103, 127–152 (2005).
D. B. Yudin and A. S. Nemirovski, “Information complexity and efficient methods for solving convex optimization problems,” Ekon. Mat. Metody, No. 2, 357–369 (1976).
A. Nemirovski, S. Onn, and U. G. Rothblum, “Accuracy certificates for computational problems with convex structure,” Math. Oper. Res. 35, 52–78 (2010).
G. Lan and Y. Zhou, “Random gradient extrapolation for distributed and stochastic optimization,” SIAM J. Optim. 28, 2753–2782 (2018).
S. Bubeck, “Convex optimization: Algorithms and complexity,” Found. Trends Mach. Learn. 8 (3–4), 231–357 (2015).
Yu. Nesterov, “Implementable tensor methods in unconstrained convex optimization,” Tech. Rep. (Universite catholique de Louvain, Center for Operations Research and Econometrics (CORE), 2018).
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, C. A. Uribe, B. Jiang, H. Wang, S. Zhang, S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford, “Near optimal methods for minimizing convex functions with Lipschitz pth derivatives,” Proceedings of the Thirty-Second Conference on Learning Theory (2019), Vol. 99, pp. 1392–1393. http://proceedings.mlr.press/v99/gasnikov19b.html
K. Zhou, F. Shang, and J. Cheng, “A simple stochastic variance reduced algorithm with fast convergence rates” (2018). arXiv preprint arXiv:1806.11027
K. Zhou, “Direct acceleration of SAGA using sampled negative momentum” (2018). arXiv preprint arXiv:1806.11048
F. Niu, B. Recht, C. Re, and S. J. Wright, “Hogwild: A lock-free approach to parallelizing stochastic gradient descent,” in Advances in Neural Information Processing Systems (2011), pp. 693–701.
C. Jin, P. Netrapalli, R. Ge, S. M. Kakade, and M. I. Jordan, “A short note on concentration inequalities for random vectors with subgaussian norm” (2019). arXiv preprint arXiv:1902.03736
A. Juditsky and A. Nemirovski, “Large deviations of vector-valued martingales in 2-smooth normed spaces,” Tech. Rep. (2008). http://hal.archives-ouvertes.fr/hal-00318071
Yu. Nesterov, Lectures on Convex Optimization, 2nd ed. (Springer, 2018).
Yu. E. Nesterov, Doctoral Dissertation in Mathematics and Physics (Moscow Inst. of Physics and Technology, Dolgoprudnyi, 2013).
A. Beck, Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB (SIAM, Philadelphia, 2014).


Funding

Gasnikov’s research was supported by the Russian Foundation for Basic Research, grant nos. 18-31-20005 mol_a_ved and 19-31-51001 Scientific mentoring. Dvurechensky’s research was supported by the Russian Foundation for Basic Research, grant no. 18-29-03071 mk. Vorontsova’s research was supported by the Ministry of Science and Higher Education of the Russian Federation (state assignment no. 075-00337-20-03), project no. 0714-2020-0005.

Author information

Authors and Affiliations

Moscow Institute of Physics and Technology (National Research University), 141701, Dolgoprudnyi, Moscow oblast, Russia
E. A. Vorontsova, A. V. Gasnikov & D. A. Pasechnyuk
Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 127051, Moscow, Russia
A. V. Gasnikov & P. E. Dvurechensky
Weierstrass Institute for Applied Analysis and Stochastics, 10117, Berlin, Germany
P. E. Dvurechensky
National Research University Higher School of Economics, 109028, Moscow, Russia
A. S. Ivanova


Corresponding author

Correspondence to A. S. Ivanova.

Additional information

Translated by I. Ruzanova

APPENDIX

1.1 Auxiliary Results

Below are some lemmas from other works that are used in the proofs. Additionally, we prove assertions concerning the properties of the dual function that are used in the proof of the main theorems.

Lemma 6 (see [34, Lemma 2]). For a random vector $\xi \in {{{\mathbf{R}}}^{n}}$, the following assertions are equivalent up to a constant multiplying $\sigma $:

1. Tails: $\mathbb{P}\left\{ {{{{\left\| \xi \right\|}}_{2}} \geqslant \gamma } \right\} \leqslant 2\exp\left( { - \tfrac{{{{\gamma }^{2}}}}{{2{{\sigma }^{2}}}}} \right)$ $\forall \gamma \geqslant 0$.

2. Moments: ${{(\mathbb{E}{\kern 1pt} [\left\| \xi \right\|_{2}^{p}])}^{{1/p}}} \leqslant \sigma \sqrt p $ for any positive integer $p$.

3. Light-tail assumption: $\mathbb{E}\left[ {\exp\left( {\tfrac{{\left\| \xi \right\|_{2}^{2}}}{{{{\sigma }^{2}}}}} \right)} \right] \leqslant \exp(1)$.

Lemma 7 (see [34, Corollary 8]). Let $\{ {{\xi }^{k}}\} _{{k = 1}}^{N}$ be a sequence of random vectors from ${{{\mathbf{R}}}^{n}}$ such that for $k = 1, \ldots ,N$ and any $\gamma \geqslant 0$,

$$\mathbb{E}{\kern 1pt} [{{\xi }^{k}}\,{\text{|}}\,{{\xi }^{1}}, \ldots ,{{\xi }^{{k - 1}}}] = 0,\quad \mathbb{P}\left\{ {{{{\left\| {{{\xi }^{k}}} \right\|}}_{2}} \geqslant \gamma \,{\text{|}}\,{{\xi }^{1}}, \ldots ,{{\xi }^{{k - 1}}}} \right\} \leqslant \exp\left( { - \frac{{{{\gamma }^{2}}}}{{2\sigma _{k}^{2}}}} \right)\quad \text{almost surely},$$

where $\sigma _{k}^{2}$ is measurable with respect to the $\sigma $-algebra $\sigma ({{\xi }^{1}}, \ldots ,{{\xi }^{{k - 1}}})$ for all $k = 1, \ldots ,N.$ Let ${{S}_{N}} = \sum\nolimits_{k = 1}^N {{{\xi }^{k}}} .$ Then there exists a constant ${{C}_{1}}$ such that, for any fixed $\delta > 0$ and $B > b > 0$, with probability at least $1 - \delta $,

$$\text{either}\quad \sum\limits_{k = 1}^N \,\sigma _{k}^{2} \geqslant B$$

$$\text{or}\quad {{\left\| {{{S}_{N}}} \right\|}_{2}} \leqslant {{C}_{1}}\sqrt {\max\left\{ {\sum\limits_{k = 1}^N \,\sigma _{k}^{2},b} \right\}\left( {\ln\frac{{2n}}{\delta } + \ln\ln\frac{B}{b}} \right)} .$$

Lemma 8 (see [35, corollary to Theorem 2.1, case (ii)]). Suppose that a sequence $\{ {{\xi }^{k}}\} _{{k = 1}}^{N}$ of random vectors from ${{{\mathbf{R}}}^{n}}$ satisfies the condition

$$\mathbb{E}{\kern 1pt} [{{\xi }^{k}}\,{\text{|}}\,{{\xi }^{1}}, \ldots ,{{\xi }^{{k - 1}}}] = 0\quad \text{almost surely},\quad k = 1, \ldots ,N,$$

and let ${{S}_{N}} = \sum\nolimits_{k = 1}^N {{{\xi }^{k}}} $. Assume that the sequence $\{ {{\xi }^{k}}\} _{{k = 1}}^{N}$ satisfies the light-tail assumption

$$\mathbb{E}\left[ {\exp\left( {\frac{{\left\| {{{\xi }^{k}}} \right\|_{2}^{2}}}{{\sigma _{k}^{2}}}} \right)\,{\text{|}}\,{{\xi }^{1}}, \ldots ,{{\xi }^{{k - 1}}}} \right] \leqslant \exp(1)\quad \text{almost surely},\quad k = 1, \ldots ,N,$$

where ${{\sigma }_{1}}, \ldots ,{{\sigma }_{N}}$ are positive numbers. Then, for all $\gamma \geqslant 0$,

$$\mathbb{P}\left\{ {{{\left\| {{{S}_{N}}} \right\|}_{2}} \geqslant (\sqrt 2 + \sqrt 2 \gamma )\sqrt {\sum\limits_{k = 1}^N \,\sigma _{k}^{2}} } \right\} \leqslant \exp\left( { - \frac{{{{\gamma }^{2}}}}{3}} \right).$$

Proof of Proposition 2. The dual function is represented in the form

$$\varphi ({\mathbf{{\boldsymbol{\lambda}} }}) = \sum\limits_{k = 1}^n \,\left\{ {{{u}_{k}}({{x}_{k}}({\boldsymbol{\lambda}} )) - \left\langle {{\boldsymbol{\lambda}} ,{{{\mathbf{C}}}_{k}}} \right\rangle {{x}_{k}}({\boldsymbol{\lambda}} ) + \frac{1}{n}\left\langle {{\boldsymbol{\lambda}} ,{\mathbf{b}}} \right\rangle } \right\} = \sum\limits_{k = 1}^n \,{{\varphi }_{k}}({\boldsymbol{\lambda}} ).$$

Proposition 1 implies that

$$\nabla \varphi ({\boldsymbol{\lambda}} ) = \sum\limits_{k = 1}^n \,\nabla {{\varphi }_{k}}({\boldsymbol{\lambda}} ) = \sum\limits_{k = 1}^n \,\left( {\frac{1}{n}{\mathbf{b}} - {{{\mathbf{C}}}_{k}}{{x}_{k}}({\boldsymbol{\lambda}} )} \right).$$
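For the quadratic utilities of Section 8.1, this gradient formula can be sanity-checked against central finite differences of the dual function. The sketch below is illustrative only: the function name, the instance sizes, and the $m \times n$ storage of $C$ are assumptions.

```python
import numpy as np

def dual_value_and_grad(lam, a, b, C, sigma=0.1):
    """Dual function and its gradient for u_k(x) = a_k x - (sigma n / 2) x^2.
    phi(lam) = sum_k [u_k(x_k) - <lam, C_k> x_k] + <lam, b>, grad = b - C x(lam)."""
    n = a.shape[0]
    price = C.T @ lam
    x = np.maximum(a - price, 0.0) / (n * sigma)
    u = a * x - 0.5 * sigma * n * x**2
    phi = np.sum(u - price * x) + lam @ b
    grad = b - C @ x
    return phi, grad

rng = np.random.default_rng(1)
m, n = 5, 40
a = rng.uniform(0.0, 100.0, size=n)
b = rng.uniform(1.0, 6.0, size=m)
C = rng.binomial(1, 0.5, size=(m, n)).astype(float)
lam = rng.uniform(0.0, 5.0, size=m)

phi, grad = dual_value_and_grad(lam, a, b, C)

# Central finite differences along each coordinate direction.
eps = 1e-5
fd = np.array([
    (dual_value_and_grad(lam + eps * e, a, b, C)[0]
     - dual_value_and_grad(lam - eps * e, a, b, C)[0]) / (2.0 * eps)
    for e in np.eye(m)
])
```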

Define

$$\begin{gathered} {{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}}) = \mathop {\arg\,\max\,}\limits_{{{x}_{k}} \in {{{\mathbf{R}}}_{ + }}} \left\{ {{{u}_{k}}({{x}_{k}}) - {{x}_{k}}\left\langle {{{{\boldsymbol{\lambda}} }^{1}},{{{\mathbf{C}}}_{k}}} \right\rangle } \right\}, \\ {{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}}) = \mathop {\arg\,\max}\limits_{{{x}_{k}} \in {{{\mathbf{R}}}_{ + }}} \,\left\{ {{{u}_{k}}({{x}_{k}}) - {{x}_{k}}\left\langle {{{{\boldsymbol{\lambda}} }^{2}},{{{\mathbf{C}}}_{k}}} \right\rangle } \right\}. \\ \end{gathered} $$

The first-order necessary conditions for these maxima are written as

$$\begin{gathered} \left\langle {\nabla {{u}_{k}}({{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}})) - \left\langle {{{{\boldsymbol{\lambda}} }^{1}},{{{\mathbf{C}}}_{k}}} \right\rangle ,{{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - {{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\rangle \geqslant 0, \\ \left\langle {\nabla {{u}_{k}}({{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})) - \left\langle {{{{\boldsymbol{\lambda}} }^{2}},{{{\mathbf{C}}}_{k}}} \right\rangle ,{{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}}) - {{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}})} \right\rangle \geqslant 0. \\ \end{gathered} $$

Adding these inequalities yields

$$\left\langle {\nabla {{u}_{k}}({{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})) - \nabla {{u}_{k}}({{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}})),{{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - {{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\rangle \leqslant \left\langle {\left\langle {{{{\boldsymbol{\lambda}} }^{2}},{{{\mathbf{C}}}_{k}}} \right\rangle - \left\langle {{{{\boldsymbol{\lambda}} }^{1}},{{{\mathbf{C}}}_{k}}} \right\rangle ,{{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - {{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\rangle .$$

Since ${{u}_{k}}({{x}_{k}})$ is strongly concave, for any $x_{k}^{1}$ and $x_{k}^{2}$, $k = 1, \ldots ,n$, we have

$$\left\langle {\nabla {{u}_{k}}(x_{k}^{2}) - \nabla {{u}_{k}}(x_{k}^{1}),x_{k}^{1} - x_{k}^{2}} \right\rangle \geqslant \mu \left\| {x_{k}^{1} - x_{k}^{2}} \right\|_{2}^{2},$$

whence

$$\mu \left\| {{{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - {{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\|_{2}^{2} \leqslant \left\langle {\left\langle {{{{\boldsymbol{\lambda}} }^{2}},{{{\mathbf{C}}}_{k}}} \right\rangle - \left\langle {{{{\boldsymbol{\lambda}} }^{1}},{{{\mathbf{C}}}_{k}}} \right\rangle ,{{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - {{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\rangle \leqslant {{\left\| {{{{\mathbf{C}}}_{k}}} \right\|}_{2}} \cdot {{\left\| {{{{\boldsymbol{\lambda}} }^{1}} - {{{\boldsymbol{\lambda}} }^{2}}} \right\|}_{2}} \cdot {{\left\| {{{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - {{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\|}_{2}}.$$

Then the following estimate can be obtained for all gradient components $\nabla {{\varphi }_{k}}$:

$${{\left\| {\nabla {{\varphi }_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - \nabla {{\varphi }_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\|}_{2}} \leqslant {{\left\| {{{{\mathbf{C}}}_{k}}} \right\|}_{2}} \cdot {{\left\| {{{x}_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - {{x}_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\|}_{2}} \leqslant \frac{1}{\mu }\left\| {{{{\mathbf{C}}}_{k}}} \right\|_{2}^{2} \cdot {{\left\| {{{{\boldsymbol{\lambda}} }^{1}} - {{{\boldsymbol{\lambda}} }^{2}}} \right\|}_{2}}.$$

The matrix $C$, in view of its structure, satisfies the estimate ${{\left\| {{{{\mathbf{C}}}_{k}}} \right\|}_{2}} \leqslant m$. Then the gradient of the dual function satisfies

$${{\left\| {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{1}}) - \nabla \varphi ({{{\boldsymbol{\lambda}} }^{2}})} \right\|}_{2}} \leqslant \sum\limits_{k = 1}^n \,{{\left\| {\nabla {{\varphi }_{k}}({{{\boldsymbol{\lambda}} }^{1}}) - \nabla {{\varphi }_{k}}({{{\boldsymbol{\lambda}} }^{2}})} \right\|}_{2}} \leqslant \frac{{{{m}^{2}}n}}{\mu }{{\left\| {{{{\boldsymbol{\lambda}} }^{1}} - {{{\boldsymbol{\lambda}} }^{2}}} \right\|}_{2}}.$$
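The bound can be checked empirically for the quadratic utilities of Section 8.1, for which $\mu = \sigma n$. The following sketch samples random pairs of dual points and compares the largest observed gradient ratio with $m^{2}n/\mu $. The instance sizes and sampling ranges are arbitrary assumptions, and a sampled ratio only gives a lower estimate of the true Lipschitz constant.

```python
import numpy as np

def dual_grad(lam, a, b, C, sigma):
    """Gradient b - C x(lam) of the dual function for quadratic utilities."""
    n = a.shape[0]
    x = np.maximum(a - C.T @ lam, 0.0) / (n * sigma)
    return b - C @ x

rng = np.random.default_rng(2)
m, n, sigma = 6, 200, 0.1
a = rng.uniform(0.0, 100.0, size=n)
b = rng.uniform(1.0, 6.0, size=m)
C = rng.binomial(1, 0.5, size=(m, n)).astype(float)

mu = sigma * n                    # strong concavity constant of u_k
L_bound = m**2 * n / mu           # Lipschitz bound from the estimate above

worst_ratio = 0.0
for _ in range(1000):
    lam1 = rng.uniform(0.0, 20.0, size=m)
    lam2 = rng.uniform(0.0, 20.0, size=m)
    num = np.linalg.norm(dual_grad(lam1, a, b, C, sigma)
                         - dual_grad(lam2, a, b, C, sigma))
    worst_ratio = max(worst_ratio, num / np.linalg.norm(lam1 - lam2))
```

In experiments of this kind the observed ratio is typically far below the bound, which is consistent with the coarse estimate ${{\left\| {{{{\mathbf{C}}}_{k}}} \right\|}_{2}} \leqslant m$ used in the proof.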

Proof of Lemma 1. First, we state and prove a technical lemma.

Define ${{d}_{L}}({\boldsymbol{\lambda}} ) = \tfrac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2}$ and consider the sequences

$${{l}_{t}}({\boldsymbol{\lambda}} ) = \sum\limits_{j = 0}^t \,{{\alpha }_{j}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{j}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{j}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{j}}} \right\rangle } \right]$$

and

$${{\psi }_{t}}({\boldsymbol{\lambda}} ) = {{l}_{t}}({\boldsymbol{\lambda}} ) + {{d}_{L}}({\boldsymbol{\lambda}} ),\quad t = 0,1, \ldots ,$$

where ${{\{ {{{\boldsymbol{\lambda}} }^{j}}\} }_{{j \geqslant 0}}}$ is the sequence of points generated by Algorithm 1.

Lemma 9. After executing $N$ steps of Algorithm 1, it is true that

$${{A}_{N}}\varphi ({{{\mathbf{y}}}^{N}}) \leqslant \mathop {\min}\limits_{{\boldsymbol{\lambda}} \in {\mathbf{R}}_{ + }^{m}} {{\psi }_{N}}({\boldsymbol{\lambda}} ) = {{\psi }_{N}}({{{\mathbf{z}}}^{N}}).$$

(A.20)

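Proof. Inequality (A.20) is proved by induction. At $t = 0$, (A.20) is true. Indeed,

$${{\psi }_{0}}({{{\mathbf{z}}}^{0}}) = \mathop {\min}\limits_{{\boldsymbol{\lambda}} \in {\mathbf{R}}_{ + }^{m}} \left\{ {{{\alpha }_{0}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{0}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{0}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\rangle } \right] + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2}} \right\}\;\overset{\text{①}}{\geqslant}\;{{\alpha }_{0}}\mathop {\min}\limits_{{\boldsymbol{\lambda}} \in {\mathbf{R}}_{ + }^{m}} \left\{ {\varphi ({{{\boldsymbol{\lambda}} }^{0}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{0}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\rangle + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2}} \right\}\;\overset{\text{②}}{\geqslant}\;{{\alpha }_{0}}\varphi ({{{\mathbf{y}}}^{0}}),$$

where ① holds, since ${{\alpha }_{0}} = 1{\text{/}}2 \leqslant 1$, while ② holds, since the function $\varphi ({\boldsymbol{\lambda}} )$ has a Lipschitz continuous gradient (see Proposition 2 and [36, Lemma 1.2.3]). Thus, ${{A}_{0}}\varphi ({{{\mathbf{y}}}^{0}}) = \tfrac{1}{2}\varphi ({{{\mathbf{y}}}^{0}}) \leqslant {{\psi }_{0}}({{{\mathbf{z}}}^{0}})$.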

Assume that (A.20) holds for $t$:

$${{A}_{t}}\varphi ({{{\mathbf{y}}}^{t}}) \leqslant {{\psi }_{t}}({{{\mathbf{z}}}^{t}}).$$

(A.21)

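Let us prove that (A.20) holds for $t + 1$. Indeed, we have

$$\begin{gathered} {{\psi }_{{t + 1}}}({{{\mathbf{z}}}^{{t + 1}}}) = \mathop {\min}\limits_{{\boldsymbol{\lambda}} \in {\mathbf{R}}_{ + }^{m}} \left\{ {{{\psi }_{t}}({\boldsymbol{\lambda}} ) + {{\alpha }_{{t + 1}}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle } \right]} \right\} \\ \overset{\text{①}}{\geqslant}\;\mathop {\min}\limits_{{\boldsymbol{\lambda}} \in {\mathbf{R}}_{ + }^{m}} \left\{ {{{\psi }_{t}}({{{\mathbf{z}}}^{t}}) + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} + {{\alpha }_{{t + 1}}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle } \right]} \right\} \\ \overset{\text{②}}{\geqslant}\;\mathop {\min}\limits_{{\boldsymbol{\lambda}} \in {\mathbf{R}}_{ + }^{m}} \left\{ {{{A}_{t}}\varphi ({{{\mathbf{y}}}^{t}}) + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} + {{\alpha }_{{t + 1}}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle } \right]} \right\} \\ \overset{\text{③}}{\geqslant}\;\mathop {\min}\limits_{{\boldsymbol{\lambda}} \in {\mathbf{R}}_{ + }^{m}} \left\{ {{{A}_{t}}\left( {\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{{{\mathbf{y}}}^{t}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle } \right) + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} + {{\alpha }_{{t + 1}}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle } \right]} \right\}, \\ \end{gathered} $$

(A.22)

where ① holds, since the prox-function $\tfrac{1}{2}\left\| {{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2}$ is strongly convex and in view of the properties of the extremum at the point ${{{\mathbf{z}}}^{t}}$; ② follows from (A.21); and ③ holds in view of the convexity of the function $\varphi ({\boldsymbol{\lambda}} )$.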

Since the FGM coefficients ${{A}_{t}}$ and ${{\alpha }_{t}}$ are related by the equalities ${{A}_{{t + 1}}} = \sum\nolimits_{j = 0}^{t + 1} {{{\alpha }_{j}}} = {{A}_{t}} + {{\alpha }_{{t + 1}}}$ and ${{\tau }_{t}} = {{\alpha }_{{t + 1}}}{\text{/}}{{A}_{{t + 1}}}$, the relation ${{{\boldsymbol{\lambda}} }^{{t + 1}}} = {{\tau }_{t}}{{{\mathbf{z}}}^{t}} + (1 - {{\tau }_{t}}){{{\mathbf{y}}}^{t}}$ from Algorithm 1 can be rewritten as

$${{A}_{{t + 1}}}{{{\boldsymbol{\lambda}} }^{{t + 1}}} = {{\alpha }_{{t + 1}}}{{{\mathbf{z}}}^{t}} + {{A}_{t}}{{{\mathbf{y}}}^{t}}.$$

Using the last relations, we can make the following transformations:

$$\begin{gathered} {{A}_{t}}\left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{{{\mathbf{y}}}^{t}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle + {{\alpha }_{{t + 1}}}\left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle = - {{A}_{{t + 1}}}\left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle \\ \, + {{\alpha }_{{t + 1}}}\left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} } \right\rangle + {{A}_{t}}\left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{{{\mathbf{y}}}^{t}}} \right\rangle = {{\alpha }_{{t + 1}}}\left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\rangle . \\ \end{gathered} $$

Then

$$\begin{gathered} {{A}_{t}}\left( {\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{{{\mathbf{y}}}^{t}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle } \right) + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} + {{\alpha }_{{t + 1}}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle } \right] \\ \, = {{A}_{{t + 1}}}\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} + {{\alpha }_{{t + 1}}}\left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\rangle . \\ \end{gathered} $$

(A.23)

After replacing the last expression in (A.22) by (A.23), we can use an extended version of the Fenchel inequality for conjugate functions [37], namely,

$$\left\langle {{\mathbf{g}},{\mathbf{s}}} \right\rangle + \frac{\xi }{2}{{\left\| {\mathbf{s}} \right\|}^{2}} \geqslant - \frac{1}{{2\xi }}\left\| {\mathbf{g}} \right\|_{*}^{2},\quad {\mathbf{g}} \in \mathbb{E}\text{*},\quad {\mathbf{s}} \in \mathbb{E},$$

where $\mathbb{E}$ is a finite-dimensional real vector space, $\mathbb{E}\text{*}$ is the space of linear functionals on $\mathbb{E}$ (dual space), and the norm in the dual space is given by ${{\left\| {\mathbf{g}} \right\|}_{*}} = \mathop {\max}\limits_{\mathbf{x}} \{ \left\langle {{\mathbf{g}},{\mathbf{x}}} \right\rangle \,{\text{|}}\,{{\left\| {\mathbf{x}} \right\|}_{E}} = 1\} $. In our case, ${\mathbf{g}} = \nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}})$, ${\mathbf{s}} = {\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}$, $\xi = \tfrac{L}{{{{\alpha }_{{t + 1}}}}}$. Therefore,

$${{\psi }_{{t + 1}}}({{{\mathbf{z}}}^{{t + 1}}}) \geqslant {{A}_{{t + 1}}}\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) - \frac{{\alpha _{{t + 1}}^{2}}}{{2L}}\left\| {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}})} \right\|_{2}^{2}.$$

(A.24)
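For the Euclidean norm, the Fenchel-type inequality used to obtain (A.24) can be verified directly by completing the square. The short derivation below is included only as a sanity check and assumes $\left\| \cdot \right\| = {{\left\| \cdot \right\|}_{*}} = {{\left\| \cdot \right\|}_{2}}$ and $\xi > 0$:

$$\left\langle {{\mathbf{g}},{\mathbf{s}}} \right\rangle + \frac{\xi }{2}\left\| {\mathbf{s}} \right\|_{2}^{2} = \frac{\xi }{2}\left\| {{\mathbf{s}} + \frac{1}{\xi }{\mathbf{g}}} \right\|_{2}^{2} - \frac{1}{{2\xi }}\left\| {\mathbf{g}} \right\|_{2}^{2} \geqslant - \frac{1}{{2\xi }}\left\| {\mathbf{g}} \right\|_{2}^{2}.$$

Multiplying by ${{\alpha }_{{t + 1}}}$ and substituting $\xi = L{\text{/}}{{\alpha }_{{t + 1}}}$ gives

$${{\alpha }_{{t + 1}}}\left\langle {{\mathbf{g}},{\mathbf{s}}} \right\rangle + \frac{L}{2}\left\| {\mathbf{s}} \right\|_{2}^{2} \geqslant - \frac{{\alpha _{{t + 1}}^{2}}}{{2L}}\left\| {\mathbf{g}} \right\|_{2}^{2},$$

which is the term subtracted on the right-hand side of (A.24).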

To complete the proof of the lemma, we need to show that ${{A}_{{t + 1}}}\varphi ({{{\mathbf{y}}}^{{t + 1}}})$ does not exceed the right-hand side of inequality (A.24).

Since the function $\varphi ({\boldsymbol{\lambda}} )$ is $L$-smooth (see Proposition 2),

$$\begin{gathered} \varphi ({{{\mathbf{y}}}^{{t + 1}}}) \leqslant \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{{{\mathbf{y}}}^{{t + 1}}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle + \frac{L}{2}\left\| {{{{\mathbf{y}}}^{{t + 1}}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\|_{2}^{2} \\ \, = \mathop {\min}\limits_{\boldsymbol{\lambda}} \left\{ {\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\rangle + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{{t + 1}}}} \right\|_{2}^{2}} \right\} = \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) - \frac{1}{{2L}}\left\| {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}})} \right\|_{2}^{2}. \\ \end{gathered} $$

Multiplying both sides of the resulting inequality by ${{A}_{{t + 1}}}$ yields

$${{A}_{{t + 1}}}\varphi ({{{\mathbf{y}}}^{{t + 1}}}) \leqslant {{A}_{{t + 1}}}\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) - \frac{{{{A}_{{t + 1}}}}}{{2L}}\left\| {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}})} \right\|_{2}^{2}.$$

Since the FGM coefficients satisfy $\alpha _{{t + 1}}^{2} \leqslant {{A}_{{t + 1}}}$, we obtain

$${{A}_{{t + 1}}}\varphi ({{{\mathbf{y}}}^{{t + 1}}}) \leqslant {{A}_{{t + 1}}}\varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}}) - \frac{{\alpha _{{t + 1}}^{2}}}{{2L}}\left\| {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{{t + 1}}})} \right\|_{2}^{2}.$$

(A.25)

Therefore, by virtue of (A.24) and (A.25), ${{A}_{{t + 1}}}\varphi ({{{\mathbf{y}}}^{{t + 1}}}) \leqslant {{\psi }_{{t + 1}}}({{{\mathbf{z}}}^{{t + 1}}})$, as required.
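The coefficient relations used in this proof can be checked with exact arithmetic. The sketch below assumes the schedule ${{\alpha }_{t}} = (t + 1){\text{/}}2$, which agrees with ${{\alpha }_{0}} = 1{\text{/}}2$ above but is otherwise an assumption, since Algorithm 1 may define the coefficients differently; the name `fgm_coefficients` is hypothetical.

```python
from fractions import Fraction

def fgm_coefficients(num_steps):
    """Assumed schedule alpha_t = (t + 1) / 2 with A_t = alpha_0 + ... + alpha_t."""
    alphas = [Fraction(t + 1, 2) for t in range(num_steps + 1)]
    A, total = [], Fraction(0)
    for alpha in alphas:
        total += alpha
        A.append(total)
    return alphas, A

num_steps = 50
alphas, A = fgm_coefficients(num_steps)

# tau_t = alpha_{t+1} / A_{t+1} is the weight of z^t in lambda^{t+1}.
taus = [alphas[t + 1] / A[t + 1] for t in range(num_steps)]

# Conditions used in the proof: alpha_{t+1}^2 <= A_{t+1} and
# 1 - tau_t = A_t / A_{t+1}, i.e. A_{t+1} lambda^{t+1} = alpha_{t+1} z^t + A_t y^t.
square_condition = all(alphas[t] ** 2 <= A[t] for t in range(num_steps + 1))
convex_weights = all(1 - taus[t] == A[t] / A[t + 1] for t in range(num_steps))
```

Under this assumed schedule, ${{A}_{t}} = (t + 1)(t + 2){\text{/}}4$, so ${{A}_{t}}$ grows quadratically in $t$.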

Proof of Lemma 1. Define the set

$${{\Lambda }_{{2\hat {R}}}} = \left\{ {{\boldsymbol{\lambda}} \in \mathbb{R}_{ + }^{m}:{{{\left\| {\boldsymbol{\lambda}} \right\|}}_{2}} \leqslant 2\hat {R}} \right\},$$

where $\hat {R}$ is defined by the inequalities

$${{\left\| {{{{\boldsymbol{\lambda}} }^{0}} - {\boldsymbol{\lambda}} \text{*}} \right\|}_{2}} + {{\left\| {{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}} \leqslant {{\left\| {{\boldsymbol{\lambda}} \text{*}} \right\|}_{2}} + 2{{\left\| {{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}} \leqslant 3R = \hat {R}.$$

All ${{{\boldsymbol{\lambda}} }^{t}}$ belong to ${{\Lambda }_{{2\hat {R}}}}$, since

$${{\left\| {{{{\boldsymbol{\lambda}} }^{t}}} \right\|}_{2}} \leqslant {{\left\| {{{{\boldsymbol{\lambda}} }^{t}} - {\boldsymbol{\lambda}} \text{*}} \right\|}_{2}} + {{\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}} + {{\left\| {{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}} \leqslant 2{{\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}} + {{\left\| {{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}} \leqslant 2{{\left\| {{\boldsymbol{\lambda}} \text{*}} \right\|}_{2}} + 3{{\left\| {{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}} \leqslant 5R \leqslant 2\hat {R},$$

where the second inequality was obtained taking into account that ${{\left\| {{{{\boldsymbol{\lambda}} }^{t}} - {\boldsymbol{\lambda}} \text{*}} \right\|}_{2}} \leqslant {{\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}}$ for $t = 0,1, \ldots $ .

The last inequality can be proved as follows. For any ${\boldsymbol{\lambda}} \in {\mathbf{R}}_{ + }^{m}$, by Lemma 9 and the strong convexity of the function ${{\psi }_{t}}({\boldsymbol{\lambda}} )$ with a constant $L$, it is true that

$$\begin{gathered} {{A}_{t}}\varphi ({{{\mathbf{y}}}^{t}}) + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} \leqslant {{\psi }_{t}}({{{\mathbf{z}}}^{t}}) + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} \leqslant {{\psi }_{t}}({\boldsymbol{\lambda}} ) \\ = \sum\limits_{j = 0}^t \,{{\alpha }_{j}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{j}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{j}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{j}}} \right\rangle } \right] + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2}. \\ \end{gathered} $$

(A.26)

Since the function $\varphi ({\boldsymbol{\lambda}} )$ is convex, the last expression in (A.26) can be estimated from above as ${{A}_{t}}\varphi ({\boldsymbol{\lambda}} ) + \tfrac{L}{2}\left\| {{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2}$. Then, for ${\boldsymbol{\lambda}} = {\boldsymbol{\lambda}} \text{*}$,

$$\frac{L}{2}\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} \leqslant {{A}_{t}}(\varphi ({{{\mathbf{y}}}^{t}}) - \varphi ({\boldsymbol{\lambda}} \text{*})) + \frac{L}{2}\left\| {{\boldsymbol{\lambda}} {\kern 1pt} {\text{*}} - {{{\mathbf{z}}}^{t}}} \right\|_{2}^{2} \leqslant \frac{L}{2}\left\| {{\boldsymbol{\lambda}} {\kern 1pt} {\text{*}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|_{2}^{2}.$$

Therefore,

$${{\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\mathbf{z}}}^{t}}} \right\|}_{2}} \leqslant {{\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}}.$$

(A.27)

Since ${{{\mathbf{y}}}^{t}}$ in Algorithm 1 is determined by a gradient projection step for a convex function $\varphi ({\boldsymbol{\lambda}} )$, the sequence of points ${{{\mathbf{y}}}^{t}}$, $t = 0,1, \ldots $, generated by the algorithm is bounded (the proof of this fact can be found, e.g., in [38, Lemma 9.17, p. 183] or in [28, p. 265]):

$${{\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\mathbf{y}}}^{t}}} \right\|}_{2}} \leqslant {{\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}}.$$

(A.28)

Furthermore,

$${{\left\| {{{{\boldsymbol{\lambda}} }^{{t + 1}}} - {\boldsymbol{\lambda}} \text{*}} \right\|}_{2}} = {{\left\| {{{\tau }_{t}}({{{\mathbf{z}}}^{t}} - {\boldsymbol{\lambda}} \text{*}) + (1 - {{\tau }_{t}})({{{\mathbf{y}}}^{t}} - {\boldsymbol{\lambda}} \text{*})} \right\|}_{2}} \leqslant {{\tau }_{t}}{{\left\| {{{{\mathbf{z}}}^{t}} - {\boldsymbol{\lambda}} \text{*}} \right\|}_{2}} + (1 - {{\tau }_{t}}){{\left\| {{{{\mathbf{y}}}^{t}} - {\boldsymbol{\lambda}} \text{*}} \right\|}_{2}}.$$

Combining this inequality with (A.27) and (A.28) yields the required result:

$${{\left\| {{{{\boldsymbol{\lambda}} }^{{t + 1}}} - {\boldsymbol{\lambda}} \text{*}} \right\|}_{2}} \leqslant {{\left\| {{\boldsymbol{\lambda}} \text{*} - \;{{{\boldsymbol{\lambda}} }^{0}}} \right\|}_{2}},\quad t = - 1,0,1, \ldots {\kern 1pt} {\kern 1pt} \,.$$

By Lemma 9,

where ① holds, since

$${{\left\| {{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{0}}} \right\|}^{2}} \leqslant 2{{\left\| {\boldsymbol{\lambda}} \right\|}^{2}} + 2{{\left\| {{{{\boldsymbol{\lambda}} }^{0}}} \right\|}^{2}} \leqslant 8{{\hat {R}}^{2}} + \frac{2}{9}{{\hat {R}}^{2}} = \frac{{74}}{9}{{\hat {R}}^{2}}.$$

(A.29)

Applying the definitions of the dual objective function $\varphi ({{{\boldsymbol{\lambda}} }^{t}})$ (see (2)) and of its gradient $\nabla \varphi ({{{\boldsymbol{\lambda}} }^{t}})$ (see Proposition 1) yields

$$\begin{gathered} \sum\limits_{t = 0}^N \,{{\alpha }_{t}}\left[ {\varphi ({{{\boldsymbol{\lambda}} }^{t}}) + \left\langle {\nabla \varphi ({{{\boldsymbol{\lambda}} }^{t}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{t}}} \right\rangle } \right] \\ \, = \sum\limits_{t = 0}^N \,{{\alpha }_{t}}\left( {\left\langle {{{{\boldsymbol{\lambda}} }^{t}},{\mathbf{b}}} \right\rangle + \sum\limits_{k = 1}^n \,\left( {{{u}_{k}}(x_{k}^{t}({{{\boldsymbol{\lambda}} }^{t}})) - \left\langle {{{{\boldsymbol{\lambda}} }^{t}},{{{\mathbf{C}}}_{k}}x_{k}^{t}({{{\boldsymbol{\lambda}} }^{t}})} \right\rangle } \right) + \left\langle {{\mathbf{b}} - \sum\limits_{k = 1}^n \,{{{\mathbf{C}}}_{k}}x_{k}^{t}({{{\boldsymbol{\lambda}} }^{t}}),{\boldsymbol{\lambda}} - {{{\boldsymbol{\lambda}} }^{t}}} \right\rangle } \right) \\ \, = \sum\limits_{t = 0}^N \,{{\alpha }_{t}}\left( {\sum\limits_{k = 1}^n \,{{u}_{k}}(x_{k}^{t}({{{\boldsymbol{\lambda}} }^{t}})) + \left\langle {{\boldsymbol{\lambda}} ,{\mathbf{b}} - \sum\limits_{k = 1}^n \,{{{\mathbf{C}}}_{k}}x_{k}^{t}({{{\boldsymbol{\lambda}} }^{t}})} \right\rangle } \right) \leqslant {{A}_{N}}\left( {U({{{{\mathbf{\hat {x}}}}}^{N}}) + \left\langle {{\boldsymbol{\lambda}} ,{\mathbf{b}} - C{{{{\mathbf{\hat {x}}}}}^{N}}} \right\rangle } \right), \\ \end{gathered} $$

where the last inequality holds, since the utility functions are concave.

Thus,

$$\begin{gathered} {{A}_{N}}\varphi ({{{\mathbf{y}}}^{N}}) \leqslant {{A}_{N}}U({{{{\mathbf{\hat {x}}}}}^{N}}) + \frac{{37L{{{\hat {R}}}^{2}}}}{9} + {{A}_{N}}\mathop {\min}\limits_{{\boldsymbol{\lambda}} \in {{\Lambda }_{{2\hat {R}}}}} \left\{ {\left\langle {{\boldsymbol{\lambda}} ,{\mathbf{b}} - C{{{{\mathbf{\hat {x}}}}}^{N}}} \right\rangle } \right\} = {{A}_{N}}U({{{{\mathbf{\hat {x}}}}}^{N}}) + \frac{{37L{{{\hat {R}}}^{2}}}}{9} \\ \, - {{A}_{N}}\mathop {\max}\limits_{{\boldsymbol{\lambda}} \in {{\Lambda }_{{2\hat {R}}}}} \left\{ {\left\langle {{\boldsymbol{\lambda}} ,C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}} \right\rangle } \right\} = {{A}_{N}}U({{{{\mathbf{\hat {x}}}}}^{N}}) + \frac{{37L{{{\hat {R}}}^{2}}}}{9} - 2\hat {R}{{A}_{N}}\mathop {\left\| {\mathop {\left( {C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}} \right)}\nolimits_ + } \right\|}\nolimits_2 , \\ \end{gathered} $$

which yields estimate (6).
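The last equality in the chain above uses the identity $\max_{{\boldsymbol{\lambda}} \in {{\Lambda }_{{2\hat {R}}}}} \left\langle {{\boldsymbol{\lambda}} ,{\mathbf{v}}} \right\rangle = 2\hat {R}{{\left\| {{{({\mathbf{v}})}_{ + }}} \right\|}_{2}}$ with ${\mathbf{v}} = C{{{\mathbf{\hat {x}}}}^{N}} - {\mathbf{b}}$. The following minimal sketch checks this identity numerically; the function names and the random test vector are illustrative assumptions and are not part of the paper.

```python
import math
import random


def positive_part(v):
    """Componentwise positive part (v)_+."""
    return [max(x, 0.0) for x in v]


def norm2(v):
    return math.sqrt(sum(x * x for x in v))


def inner(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_value(v, radius):
    """Closed form of max <lam, v> over {lam >= 0, ||lam||_2 <= radius}."""
    return radius * norm2(positive_part(v))


def maximizer(v, radius):
    """A point attaining the maximum: (v)_+ rescaled to the given radius."""
    vp = positive_part(v)
    n = norm2(vp)
    if n == 0.0:
        return [0.0] * len(v)  # all components of v are nonpositive
    return [radius * x / n for x in vp]


def random_feasible_point(dim, radius, rng):
    """Random nonnegative vector with Euclidean norm at most radius."""
    lam = [abs(rng.gauss(0.0, 1.0)) for _ in range(dim)]
    n = norm2(lam)
    if n == 0.0:
        return lam
    scale = radius * rng.random() / n
    return [scale * x for x in lam]


if __name__ == "__main__":
    rng = random.Random(0)
    v = [rng.gauss(0.0, 1.0) for _ in range(6)]  # stands in for C x - b
    print(max_value(v, 2.0), inner(maximizer(v, 2.0), v))
```

The maximizer places all of the available norm on the violated constraints, which is why only the positive part of $C{{{\mathbf{\hat {x}}}}^{N}} - {\mathbf{b}}$ appears in the estimate.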

Proof of Lemma 2. First, we prove several auxiliary technical lemmas.

Lemma 10. Let $A,B$, and $\{ {{r}_{t}}\} _{{t = 0}}^{N}$ be nonnegative numbers such that, for any $l = 1, \ldots ,N,$

$$\frac{1}{2}r_{l}^{2} \leqslant Ar_{0}^{2} + B{{r}_{0}}\sqrt {\sum\limits_{t = 0}^{l - 1} \,r_{t}^{2}} .$$

(A.30)

Then

$${{r}_{l}} \leqslant C{{r}_{0}},$$

(A.31)

where $C$ is a positive constant satisfying ${{C}^{2}} \geqslant \max\{ 1,2A + 2BC\sqrt N \} $; for example, one can take

$$C = \max\left\{ {1,B\sqrt N + \sqrt {{{B}^{2}}N + 2A} } \right\}.$$

Proof. Relation (A.31) is proved by induction. For $l = 0$, this inequality holds, since $C \geqslant 1$. Assuming that (A.31) holds for all indices $0, \ldots ,l$ with $l < N,$ we prove that it holds for $l + 1$ as well. Indeed,

$${{r}_{{l + 1}}}\mathop \leqslant \limits^{({\text{A}}.30)} \sqrt 2 \sqrt {Ar_{0}^{2} + B{{r}_{0}}\sqrt {\sum\limits_{t = 0}^l \,r_{t}^{2}} } \mathop \leqslant \limits^{({\text{A}}.31)} {{r}_{0}}\sqrt 2 \sqrt {A + BC\sqrt N } = {{r}_{0}}\underbrace {\sqrt {2A + 2BC\sqrt N } }_{ \leqslant C} \leqslant C{{r}_{0}}.$$
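Lemma 10 can be checked numerically on the sequence that turns (A.30) into an equality at every step. The sketch below is only an illustration under that assumption (the function names and parameter values are hypothetical); it computes the constant $C$ suggested in the lemma and confirms that ${{r}_{l}} \leqslant C{{r}_{0}}$ along this sequence.

```python
import math


def lemma10_constant(A, B, N):
    """C = max{1, B*sqrt(N) + sqrt(B^2*N + 2A)} from Lemma 10."""
    return max(1.0, B * math.sqrt(N) + math.sqrt(B * B * N + 2.0 * A))


def extremal_sequence(A, B, N, r0):
    """Sequence r_0, ..., r_N for which (A.30) holds with equality."""
    r = [r0]
    for _ in range(1, N + 1):
        partial = math.sqrt(sum(x * x for x in r))  # sqrt of sum_{t<l} r_t^2
        r.append(math.sqrt(2.0 * (A * r0 * r0 + B * r0 * partial)))
    return r


def satisfies_bound(A, B, N, r0):
    """True if r_l <= C * r_0 for every l = 0, ..., N."""
    C = lemma10_constant(A, B, N)
    tol = 1e-12 * max(1.0, C * r0)
    return all(x <= C * r0 + tol for x in extremal_sequence(A, B, N, r0))


if __name__ == "__main__":
    print(lemma10_constant(0.5, 0.3, 50), satisfies_bound(0.5, 0.3, 50, 2.0))
```

The constant is the larger root of ${{C}^{2}} - 2B\sqrt N C - 2A = 0$, so the condition ${{C}^{2}} \geqslant 2A + 2BC\sqrt N $ holds with equality whenever this root exceeds $1$.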

Lemma 11. Suppose that sequences of nonnegative coefficients ${{\{ {{R}_{t}}\} }_{{t \geqslant 0}}}$ and random vectors ${{\{ {{{\boldsymbol{\eta }}}^{t}}\} }_{{t \geqslant 0}}}$ and ${{\{ {{{\mathbf{a}}}^{t}}\} }_{{t \geqslant 0}}}$ are such that, for all $l = 1, \ldots ,N,$

$$\frac{1}{2}R_{l}^{2} \leqslant A + u{\kern 1pt} \sum\limits_{t = 0}^{l - 1} \,\left\langle {{{{\boldsymbol{\eta }}}^{t}},{{{\mathbf{a}}}^{t}}} \right\rangle ,$$

(A.32)

where $A$ is a nonnegative constant, $d \geqslant 1$ is a positive constant, ${{\left\| {{{{\mathbf{a}}}^{t}}} \right\|}_{2}} \leqslant {{\widetilde R}_{t}}d$ and ${{\widetilde R}_{t}} = \max\{ {{\widetilde R}_{{t - 1}}},{{R}_{t}}\} $ for all $t \geqslant 1$, ${{\widetilde R}_{0}} = {{R}_{0}}$, and ${{\widetilde R}_{t}}$ depends only on ${{{\boldsymbol{\eta }}}^{0}}, \ldots ,{{{\boldsymbol{\eta }}}^{t}}$. Additionally, suppose that ${{{\mathbf{a}}}^{t}}$ is a function of ${{{\boldsymbol{\eta }}}^{0}}, \ldots ,{{{\boldsymbol{\eta }}}^{{t - 1}}}$ $\forall t \geqslant 1$, ${{{\mathbf{a}}}^{0}}$ is a constant vector, and, for any $t \geqslant 0$,

$$\mathbb{E}{\kern 1pt} [{{{\boldsymbol{\eta }}}^{t}}\,{\text{|}}\,\{ {{{\boldsymbol{\eta }}}^{k}}\} _{{k = 0}}^{{t - 1}}] = 0,\quad \mathbb{E}\left[ {\exp\left( {\left\| {{{{\boldsymbol{\eta }}}^{t}}} \right\|_{2}^{2}{{\sigma }^{{ - 2}}}} \right)\,{\text{|}}\,\{ {{{\boldsymbol{\eta }}}^{k}}\} _{{k = 0}}^{{t - 1}}} \right] \leqslant \exp(1).$$

Then, with probability $1 - 2\delta $, the inequalities

$${{\widetilde R}_{l}} \leqslant J{{R}_{0}}\quad {\text{and}}\quad A + u\sum\limits_{t = 0}^{l - 1} \,\left\langle {{{{\boldsymbol{\eta }}}^{t}},{{{\mathbf{a}}}^{t}}} \right\rangle \leqslant A + udD\sqrt {{{\sigma }^{2}}g(N)NJ} \widetilde R_{0}^{2}$$

hold for all $l = 1, \ldots ,N$ simultaneously, where $D$ is a positive constant,

$$F = 2{{\sigma }^{2}}{{d}^{2}}N{{(2ud)}^{N}}\left( {2A + ud\widetilde R_{0}^{2} + 24ud\ln\frac{N}{\delta }{{\sigma }^{2}}N} \right),$$

$f = {{d}^{2}}{{\sigma }^{2}}\widetilde R_{0}^{2}$, $g(N) = \ln\left( {\tfrac{N}{\delta }} \right) + \ln\ln\left( {\tfrac{F}{f}} \right)$, and

$$J = \max\left\{ {1,\frac{1}{{{{{\widetilde R}}_{0}}}}udD\sqrt {{{\sigma }^{2}}g(N)} + \sqrt {\frac{1}{{\widetilde R_{0}^{2}}}{{u}^{2}}{{d}^{2}}{{D}^{2}}{{\sigma }^{2}}g(N) + \frac{{2A}}{{R_{0}^{2}}}} } \right\}.$$

Proof. The Cauchy–Schwarz inequality is applied to the second term on the right-hand side of (A.32):

$$\frac{1}{2}R_{l}^{2} \leqslant A + ud\sum\limits_{t = 0}^{l - 1} \,{{\left\| {{{{\boldsymbol{\eta }}}^{t}}} \right\|}_{2}}{{\widetilde R}_{t}} \leqslant A + \frac{{ud}}{2}\sum\limits_{t = 0}^{l - 1} \,\widetilde R_{t}^{2} + \frac{{ud}}{2}\sum\limits_{t = 0}^{l - 1} \,\left\| {{{{\boldsymbol{\eta }}}^{t}}} \right\|_{2}^{2}.$$

(A.33)

By Theorem 2.1 from [35], we have

$$(\forall N \geqslant 1,\;\forall \gamma \geqslant 0)\,:\quad \mathbb{P}\left\{ {{{{\left\| {\sum\limits_{t = 0}^{N - 1} \,{{{\boldsymbol{\eta }}}^{t}}} \right\|}}_{2}} \geqslant (\sqrt 2 + \sqrt 2 \gamma )\sqrt {\sum\limits_{t = 0}^{N - 1} \,\sigma _{t}^{2}} } \right\} \leqslant \exp\left( { - \frac{{{{\gamma }^{2}}}}{3}} \right).$$

(A.34)

Then, with probability at least

$$1 - \frac{\delta }{N} = 1 - \exp\left( { - \frac{{{{\gamma }^{2}}}}{3}} \right),$$

(A.35)

it holds that

$${{\left\| {{{{\boldsymbol{\eta }}}^{t}}} \right\|}_{2}} \leqslant \sqrt 2 \left( {1 + \sqrt {3\ln\frac{N}{\delta }} } \right)\sigma \leqslant 2\sqrt {6\ln\frac{N}{\delta }} \sigma .$$

(A.36)

Indeed, expressing $\gamma $ from (A.35) yields $\gamma = \sqrt {3\ln\tfrac{N}{\delta }} $. Plugging this expression into (A.34) and substituting a unified $\sigma \in {{{\mathbf{R}}}_{ + }}$ for the sequence ${{\sigma }_{t}}$, $t = 0, \ldots ,N - 1$, we obtain estimate (A.36).
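As a small numerical illustration of the step from (A.35) to (A.36), the sketch below (with hypothetical function names and sample values) recovers $\gamma $ from $\tfrac{\delta }{N} = \exp\left( { - \tfrac{{{{\gamma }^{2}}}}{3}} \right)$ and evaluates both bounds in (A.36). Since $2\sqrt {6\ln\tfrac{N}{\delta }} = 2\sqrt 2 \gamma $, the second inequality in (A.36) is equivalent to $\gamma \geqslant 1$, i.e., to $\ln\tfrac{N}{\delta } \geqslant \tfrac{1}{3}$.

```python
import math


def gamma_for(N, delta):
    """Solve delta / N = exp(-gamma^2 / 3) for gamma >= 0."""
    return math.sqrt(3.0 * math.log(N / delta))


def norm_bounds(N, delta, sigma):
    """Return the two right-hand sides of (A.36): (tight, simplified)."""
    g = gamma_for(N, delta)
    tight = math.sqrt(2.0) * (1.0 + g) * sigma
    simplified = 2.0 * math.sqrt(6.0 * math.log(N / delta)) * sigma
    return tight, simplified


if __name__ == "__main__":
    N, delta, sigma = 100, 0.05, 1.0  # sample values, chosen arbitrarily
    print(gamma_for(N, delta), norm_bounds(N, delta, sigma))
```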

Combining the resulting inequalities, we see that, with probability greater than or equal to $1 - \delta $, the inequality

$$\frac{1}{2}R_{l}^{2} \leqslant A + \frac{{ud}}{2}\sum\limits_{t = 0}^{l - 1} \,\widetilde R_{t}^{2} + 12ud\ln\frac{N}{\delta }{{\sigma }^{2}}l$$

holds for all $l = 1, \ldots ,N$ simultaneously. Note that the last term in this estimate is a nondecreasing function of $l$. Define $\hat {l}$ as the largest integer for which $\hat {l} \leqslant l$ and ${{\widetilde R}_{{\hat {l}}}} = {{R}_{{\hat {l}}}}$. Then ${{R}_{{\hat {l}}}} = {{\widetilde R}_{{\hat {l}}}} = {{\widetilde R}_{{\hat {l} + 1}}} = \ldots = {{\widetilde R}_{l}}$ and, hence, with probability $ \geqslant {\kern 1pt} 1 - \delta $,

$$\frac{1}{2}\widetilde R_{l}^{2} \leqslant A + \frac{{ud}}{2}\sum\limits_{t = 0}^{\hat {l} - 1} \,\widetilde R_{t}^{2} + 12ud\ln\frac{N}{\delta }{{\sigma }^{2}}\hat {l} \leqslant A + \frac{{ud}}{2}\sum\limits_{t = 0}^{l - 1} \,\widetilde R_{t}^{2} + 12ud\ln\frac{N}{\delta }{{\sigma }^{2}}l\quad \forall l = 1, \ldots ,N.$$

As a result, with probability $ \geqslant {\kern 1pt} 1 - \delta $, we have the estimate

$$\begin{gathered} \widetilde R_{l}^{2} \leqslant 2A + ud\sum\limits_{t = 0}^{l - 1} \,\widetilde R_{t}^{2} + 24ud\ln\frac{N}{\delta }{{\sigma }^{2}}l \leqslant 2A\underbrace {(1 + ud)}_{ \leqslant 2ud} + \underbrace {(ud + {{u}^{2}}{{d}^{2}})}_{ \leqslant 2{{u}^{2}}{{d}^{2}}}\sum\limits_{t = 0}^{l - 2} \,\widetilde R_{t}^{2} \\ + \;24ud\ln\frac{N}{\delta }{{\sigma }^{2}}\underbrace {(l + ud(l - 1))}_{ \leqslant 2udl} \leqslant 2ud\left( {2A + ud\sum\limits_{t = 0}^{l - 2} \,\widetilde R_{t}^{2} + 24ud\ln\frac{N}{\delta }{{\sigma }^{2}}l} \right)\quad \forall l = 1, \ldots ,N. \\ \end{gathered} $$

Applying this estimate recursively, we conclude that, with probability $ \geqslant {\kern 1pt} 1 - \delta $,

$$\widetilde R_{l}^{2} \leqslant {{(2ud)}^{l}}\left( {2A + ud\widetilde R_{0}^{2} + 24ud\ln\frac{N}{\delta }{{\sigma }^{2}}l} \right).$$
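The unrolled bound can be sanity-checked on the sequence for which the first inequality of the preceding estimate holds with equality. The sketch below is an illustration under that assumption and under the additional assumption $ud \geqslant 1$ (which the underbraced inequalities require); the function names and sample values are hypothetical.

```python
import math


def noise_term(u, d, sigma, N, delta):
    """Coefficient 24 * u * d * ln(N / delta) * sigma^2 multiplying l."""
    return 24.0 * u * d * math.log(N / delta) * sigma ** 2


def extremal_radii_sq(A, u, d, sigma, N, delta, R0_sq):
    """S_0 = R0_sq and S_l = 2A + ud * sum_{t<l} S_t + c * l for l = 1..N."""
    c = noise_term(u, d, sigma, N, delta)
    S = [R0_sq]
    for l in range(1, N + 1):
        S.append(2.0 * A + u * d * sum(S) + c * l)
    return S


def unrolled_bound(A, u, d, sigma, N, delta, R0_sq, l):
    """(2ud)^l * (2A + ud * R0_sq + c * l), the right-hand side above."""
    c = noise_term(u, d, sigma, N, delta)
    return (2.0 * u * d) ** l * (2.0 * A + u * d * R0_sq + c * l)


if __name__ == "__main__":
    params = dict(A=1.0, u=1.5, d=1.0, sigma=0.5, N=10, delta=0.1, R0_sq=1.0)
    S = extremal_radii_sq(**params)
    for l in range(1, params["N"] + 1):
        print(l, S[l] <= unrolled_bound(l=l, **params))
```

The factor ${{(2ud)}^{l}}$ grows exponentially in $l$, so this bound is crude; in the argument it enters the final estimate only through $\ln\ln\left( {\tfrac{F}{f}} \right)$.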

Next, consider the sequence of random variables ${{\xi }^{t}} = \left\langle {{{{\boldsymbol{\eta }}}^{t}},{{{\mathbf{a}}}^{t}}} \right\rangle $. Note that $\mathbb{E}{\kern 1pt} [{{\xi }^{t}}\,{\text{|}}\,{{\xi }^{0}}, \ldots ,{{\xi }^{{t - 1}}}] = \left\langle {\mathbb{E}{\kern 1pt} [{{{\boldsymbol{\eta }}}^{t}}\,{\text{|}}\,{{{\boldsymbol{\eta }}}^{0}}, \ldots ,{{{\boldsymbol{\eta }}}^{{t - 1}}}],{{{\mathbf{a}}}^{t}}} \right\rangle = 0$. Then, using the Cauchy–Schwarz inequality yields

$$\begin{gathered} \mathbb{E}\left[ {\exp\left( {\frac{{{{{({{\xi }^{t}})}}^{2}}}}{{{{\sigma }^{2}}{{d}^{2}}\widetilde R_{t}^{2}}}} \right)\,{\text{|}}\,{{\xi }^{0}}, \ldots ,{{\xi }^{{t - 1}}}} \right] \leqslant \mathbb{E}\left[ {\exp\left( {\frac{{\left\| {{{{\boldsymbol{\eta }}}^{t}}} \right\|_{2}^{2}{{d}^{2}}\widetilde R_{t}^{2}}}{{{{\sigma }^{2}}{{d}^{2}}\widetilde R_{t}^{2}}}} \right)\,{\text{|}}\,{{{\boldsymbol{\eta }}}^{0}}, \ldots ,{{{\boldsymbol{\eta }}}^{{t - 1}}}} \right] \\ = \mathbb{E}\left[ {\exp\left( {\frac{{\left\| {{{{\boldsymbol{\eta }}}^{t}}} \right\|_{2}^{2}}}{{{{\sigma }^{2}}}}} \right)\,{\text{|}}\,{{{\boldsymbol{\eta }}}^{0}}, \ldots ,{{{\boldsymbol{\eta }}}^{{t - 1}}}} \right] \leqslant \exp(1). \\ \end{gathered} $$

Define $\hat {\sigma }_{t}^{2} = {{\sigma }^{2}}{{d}^{2}}\widetilde R_{t}^{2}$. Then, with probability $ \geqslant 1 - \delta $, it is true that

$$\begin{gathered} \sum\limits_{t = 0}^{l - 1} \,\hat {\sigma }_{t}^{2} \leqslant {{\sigma }^{2}}{{d}^{2}}l{{(2ud)}^{l}}\left( {2A + ud\widetilde R_{0}^{2} + 24ud\ln\frac{N}{\delta }{{\sigma }^{2}}l} \right) \\ \, \leqslant {{\sigma }^{2}}{{d}^{2}}N{{(2ud)}^{N}}\left( {2A + ud\widetilde R_{0}^{2} + 24ud\ln\frac{N}{\delta }{{\sigma }^{2}}N} \right): = \frac{F}{2} \\ \end{gathered} $$

for all $l = 1, \ldots ,N$ simultaneously, where

$$F = 2{{\sigma }^{2}}{{d}^{2}}N{{(2ud)}^{N}}\left( {2A + ud\widetilde R_{0}^{2} + 24ud\ln\frac{N}{\delta }{{\sigma }^{2}}N} \right).$$

Using Corollary 8 from [34] for $b = \hat {\sigma }_{0}^{2}$, we see that, for any $l = 1, \ldots ,N$ with probability $ \geqslant {\kern 1pt} 1 - \tfrac{\delta }{N}$,

$${\text{either }}\;\;\sum\limits_{t = 0}^{l - 1} \,\hat {\sigma }_{t}^{2} \geqslant F\quad {\text{or }}\;\;\left| {\sum\limits_{t = 0}^{l - 1} \,{{\xi }^{t}}} \right| \leqslant {{C}_{1}}\sqrt {\sum\limits_{t = 0}^{l - 1} \,\hat {\sigma }_{t}^{2}\left( {\ln\left( {\frac{N}{\delta }} \right) + \ln\ln\left( {\frac{F}{f}} \right)} \right)} ,$$

(A.37)

where ${{C}_{1}} > 0$ is a constant independent of $F$ or $f$.

Combining the resulting estimates, we conclude that, with probability $ \geqslant {\kern 1pt} 1 - \delta $, estimate (A.37) holds for all $l = 1, \ldots ,N$ simultaneously.

Taking into account the choice of $F$, with probability $ \geqslant {\kern 1pt} 1 - 2\delta $, the estimate

$$\left| {\sum\limits_{t = 0}^{l - 1} \,{{\xi }^{t}}} \right| \leqslant {{C}_{1}}\sqrt {\sum\limits_{t = 0}^{l - 1} \,\hat {\sigma }_{t}^{2}\left( {\ln\left( {\frac{N}{\delta }} \right) + \ln\ln\left( {\frac{F}{f}} \right)} \right)} $$

holds for all $l = 1, \ldots ,N$ simultaneously.

For convenience in what follows, we introduce $g(N): = \ln\left( {\tfrac{N}{\delta }} \right) + \ln\ln\left( {\tfrac{F}{f}} \right) \approx \ln\left( {\tfrac{N}{\delta }} \right)$, neglecting the constant. Using $\hat {\sigma }_{t}^{2} = {{\sigma }^{2}}{{d}^{2}}\widetilde R_{t}^{2}$, we find that, with probability $ \geqslant {\kern 1pt} 1 - 2\delta $, the estimate

$$\frac{1}{2}\widetilde R_{l}^{2} \leqslant A + u\sum\limits_{t = 0}^{l - 1} \,\underbrace {\left\langle {{{{\boldsymbol{\eta }}}^{t}},{{{\mathbf{a}}}^{t}}} \right\rangle }_{{{\xi }^{t}}} \leqslant A + ud{{C}_{1}}\sqrt {{{\sigma }^{2}}g(N)} \sqrt {\sum\limits_{t = 0}^{l - 1} \,\widetilde R_{t}^{2}} $$

(A.38)

holds for all $l = 1, \ldots ,N$ simultaneously. Applying Lemma 10 with $\tfrac{A}{{\widetilde R_{0}^{2}}}$ in place of $A$, with $B = \tfrac{1}{{{{{\widetilde R}}_{0}}}}ud{{C}_{1}}\sqrt {{{\sigma }^{2}}g(N)} $, and with ${{r}_{t}} = {{\widetilde R}_{t}}$, we conclude that, with probability $1 - 2\delta $,

$${{\widetilde R}_{l}} \leqslant J{{R}_{0}}$$

for all $l = 1, \ldots ,N$ simultaneously, where

$$J = \max\left\{ {1,\frac{1}{{{{{\widetilde R}}_{0}}}}ud{{C}_{1}}\sqrt {{{\sigma }^{2}}g(N)} + \sqrt {\frac{1}{{\widetilde R_{0}^{2}}}{{u}^{2}}{{d}^{2}}C_{1}^{2}{{\sigma }^{2}}g(N) + \frac{{2A}}{{R_{0}^{2}}}} } \right\}.$$

It follows that, with probability $1 - 2\delta $, the estimate

$$A + u\sum\limits_{t = 0}^{l - 1} \,\left\langle {{{{\boldsymbol{\eta }}}^{t}},{{{\mathbf{a}}}^{t}}} \right\rangle \leqslant A + ud{{C}_{1}}\sqrt {{{\sigma }^{2}}g(N)lJ} \widetilde R_{0}^{2} \leqslant A + ud{{C}_{1}}\sqrt {{{\sigma }^{2}}g(N)NJ} \widetilde R_{0}^{2}$$

holds for all $l = 1, \ldots ,N$ simultaneously.

Proof of Lemma 2. For ${\boldsymbol{\lambda }} \in {\mathbf{R}}_{ + }^{m}$

$$\left\| {{{\boldsymbol{\lambda} }^{{t + 1}}} - \boldsymbol{\lambda} } \right\|_{2}^{2} = \left\| {{{{[{{\boldsymbol{\lambda} }^{t}} - \beta \nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})]}}_{ + }} - \boldsymbol{\lambda} } \right\|_{2}^{2} \leqslant \left\| {{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} } \right\|_{2}^{2} - 2\beta \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} } \right\rangle + {{\beta }^{2}}\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})} \right\|_{2}^{2},$$

i.e.,

$$0 \leqslant \frac{1}{{2\beta }}\left( {\left\| {{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} } \right\|_{2}^{2} - \left\| {{{\boldsymbol{\lambda} }^{{t + 1}}} - \boldsymbol{\lambda} } \right\|_{2}^{2}} \right) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle + \frac{\beta }{2}\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})} \right\|_{2}^{2}.$$

(A.39)
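Inequality (A.39) relies only on the fact that the projection onto ${\mathbf{R}}_{ + }^{m}$ is nonexpansive and that ${\boldsymbol{\lambda }} \in {\mathbf{R}}_{ + }^{m}$. The following sketch checks the sign of the right-hand side of (A.39) on random data; the function names are hypothetical, and an arbitrary random vector stands in for the stochastic gradient $\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})$, which is not modeled here.

```python
import random


def projected_step(lam, grad, beta):
    """One step lam_next = [lam - beta * grad]_+ (projection onto R_+^m)."""
    return [max(l - beta * g, 0.0) for l, g in zip(lam, grad)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def a39_right_hand_side(lam, mu, grad, beta):
    """Right-hand side of (A.39) for a comparison point mu >= 0."""
    nxt = projected_step(lam, grad, beta)
    distance_term = (sq_dist(lam, mu) - sq_dist(nxt, mu)) / (2.0 * beta)
    linear_term = sum(g * (m - l) for g, m, l in zip(grad, mu, lam))
    quadratic_term = 0.5 * beta * sum(g * g for g in grad)
    return distance_term + linear_term + quadratic_term


if __name__ == "__main__":
    rng = random.Random(0)
    lam = [abs(rng.gauss(0.0, 1.0)) for _ in range(5)]
    mu = [abs(rng.gauss(0.0, 1.0)) for _ in range(5)]
    grad = [rng.gauss(0.0, 1.0) for _ in range(5)]  # stand-in gradient
    print(a39_right_hand_side(lam, mu, grad, beta=0.1) >= 0.0)
```

The check uses no property of the gradient itself, which is consistent with the fact that convexity of $\varphi $ enters the argument only in the subsequent steps.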

Adding $\varphi ({{\boldsymbol{\lambda} }^{t}})$ to both sides of inequality (A.39), summing the result over $t$ from $0$ to $N - 1$, and dividing by $N$, we obtain

$$\begin{gathered} \frac{1}{N}\sum\limits_{t = 0}^{N - 1} \,\varphi ({{\boldsymbol{\lambda} }^{t}}) \leqslant \frac{1}{N}\sum\limits_{t = 0}^{N - 1} \,\left\{ {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle + \frac{\beta }{2}\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})} \right\|_{2}^{2} } \right. \\ \left. {\, + \frac{1}{{2\beta }}\left( {\left\| {{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} } \right\|_{2}^{2} - \left\| {{{\boldsymbol{\lambda} }^{{t + 1}}} - \boldsymbol{\lambda} } \right\|_{2}^{2}} \right)} \right\}. \\ \end{gathered} $$

(A.40)

Since $\varphi (\boldsymbol{\lambda} )$ is convex, for ${{\hat {\boldsymbol{\lambda} }}^{N}} = \tfrac{1}{N}\sum\limits_{t = 0}^{N - 1} \,{{\boldsymbol{\lambda} }^{t}}$, we have

$$N\varphi ({{\hat {\boldsymbol{\lambda} }}^{N}}) \leqslant \sum\limits_{t = 0}^{N - 1} \,\left\{ {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right\} + \frac{\beta }{2}\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})} \right\|_{2}^{2} + \frac{1}{{2\beta }}\left( {\left\| {{{\boldsymbol{\lambda} }^{0}} - \boldsymbol{\lambda} } \right\|_{2}^{2} - \left\| {{{\boldsymbol{\lambda} }^{N}} - \boldsymbol{\lambda} } \right\|_{2}^{2}} \right).$$

(A.41)

Setting $\boldsymbol{\lambda} = \boldsymbol{\lambda} \text{*}$ and adding and subtracting $\sum\nolimits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} \text{*} - \;{{\boldsymbol{\lambda} }^{t}}} \right\rangle $ on the right-hand side, we obtain

$$\begin{gathered} N\varphi ({{{\hat {\boldsymbol{\lambda} }}}^{N}}) \leqslant \sum\limits_{t = 0}^{N - 1} \,\left\{ {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} \text{*} - \;{{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right\} + \frac{\beta }{2}\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})} \right\|_{2}^{2} \\ \, + \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} \text{*} - \;{{\boldsymbol{\lambda} }^{t}}} \right\rangle + \frac{1}{{2\beta }}\left( {\left\| {{{\boldsymbol{\lambda} }^{0}} - \boldsymbol{\lambda} \text{*}} \right\|_{2}^{2} - \left\| {{{\boldsymbol{\lambda} }^{N}} - \boldsymbol{\lambda} \text{*}} \right\|_{2}^{2}} \right). \\ \end{gathered} $$

(A.42)

The convexity of $\varphi (\boldsymbol{\lambda} )$ implies that

$$\sum\limits_{t = 0}^{N - 1} \,\left\{ {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} {\kern 1pt} {\text{*}} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right\} \leqslant \sum\limits_{t = 0}^{N - 1} \,\left\{ {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \varphi (\boldsymbol{\lambda} \text{*}{\kern 1pt} ) - \varphi ({{\boldsymbol{\lambda} }^{t}})} \right\} \leqslant \sum\limits_{t = 0}^{N - 1} \,\varphi (\boldsymbol{\lambda} \text{*}{\kern 1pt} ) \leqslant N\varphi (\boldsymbol{\lambda} \text{*}{\kern 1pt} ).$$

Substituting this estimate into (A.42) yields

$$\frac{1}{{2\beta }}\left\| {{{\boldsymbol{\lambda} }^{N}} - \boldsymbol{\lambda} \text{*}} \right\|_{2}^{2} \leqslant \frac{1}{{2\beta }}\left\| {{{\boldsymbol{\lambda} }^{0}} - \boldsymbol{\lambda} \text{*}} \right\|_{2}^{2} + \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} \text{*} - \;{{\boldsymbol{\lambda} }^{t}}} \right\rangle + \frac{\beta }{2}\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})} \right\|_{2}^{2}.$$

(A.43)

Define ${{R}_{t}} = {{\left\| {{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} \text{*}} \right\|}_{2}}$ and ${{\widetilde R}_{t}} = \max \{ {{\widetilde R}_{{t - 1}}},{{R}_{t}}\} $, where ${{R}_{0}} = {{\widetilde R}_{0}}$. Since ${{\boldsymbol{\lambda} }^{0}} = {\mathbf{0}}$ and ${{\left\| {\boldsymbol{\lambda} \text{*}} \right\|}_{2}} \leqslant R$, we have ${{R}_{0}} \leqslant R$. Moreover, by construction, ${{\boldsymbol{\lambda} }^{t}} \in {{B}_{{{{{\widetilde R}}_{t}}}}}(\boldsymbol{\lambda} \text{*})$. In a similar manner, we define ${{\left\| {{{{\mathbf{a}}}^{t}}} \right\|}_{2}} = {{\left\| {{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} \text{*}} \right\|}_{2}} \leqslant {{\widetilde R}_{t}}$. Then (A.43) can be rewritten as

$$\frac{1}{{2\beta }}\widetilde R_{N}^{2} \leqslant \frac{1}{{2\beta }}\widetilde R_{0}^{2} + \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),{{{\mathbf{a}}}^{t}}} \right\rangle + \frac{\beta }{2}\left\| {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}})} \right\|_{2}^{2}.$$

Define ${{{\boldsymbol{\eta }}}^{t}} = \nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}})$. By Theorem 2.1 from [35],

$$\mathbb{P}\left\{ {{{{\left\| {\sum\limits_{t = 0}^{N - 1} \,{{{\boldsymbol{\eta }}}^{t}}} \right\|}}_{2}} \geqslant (\sqrt 2 + \sqrt 2 \gamma )\sqrt {\sum\limits_{t = 0}^{N - 1} \,\sigma _{t}^{2}} \,{\text{|}}\,\{ {{\xi }^{t}}\} _{{t = 0}}^{{N - 1}}} \right\} \leqslant \exp\left( { - \frac{{{{\gamma }^{2}}}}{3}} \right).$$

(A.44)

Using Lemma 2 from [34], we obtain

$$\mathbb{E}\left[ {\exp\left( {\frac{{\left\| {{{{\boldsymbol{\eta }}}^{t}}} \right\|_{2}^{2}}}{{{{\sigma }^{2}}}}} \right)\,{\text{|}}\,\{ {{\xi }^{k}}\} _{{k = 0}}^{{t - 1}}} \right] \leqslant \exp(1),$$

where ${{{\boldsymbol{\eta }}}^{t}}$ depends only on ${{\xi }^{{t - 1}}}, \ldots ,{{\xi }^{0}}$. Using the new notation and (8), we have

$$\widetilde R_{N}^{2} \leqslant \widetilde R_{0}^{2} + 2\beta \sum\limits_{t = 0}^{N - 1} \,\left\langle {{{{\boldsymbol{\eta }}}^{t}},{{{\mathbf{a}}}^{t}}} \right\rangle + {{\beta }^{2}}{{M}^{2}}.$$

Then, by Lemma 11 with constants $A = \widetilde R_{0}^{2} + {{\beta }^{2}}{{M}^{2}}$, $d = 1$, and $u = \beta $, we conclude that, with probability $1 - 2\delta $, where $\tfrac{\delta }{N} = \exp\left( { - \tfrac{{{{\gamma }^{2}}}}{3}} \right)$, the estimates

$${{\widetilde R}_{l}} \leqslant J{{R}_{0}}\quad {\text{and}}\quad \sum\limits_{t = 0}^{l - 1} \,\left\langle {{{{\boldsymbol{\eta }}}^{t}},{{{\mathbf{a}}}^{t}}} \right\rangle \leqslant D\sqrt {{{\sigma }^{2}}g(N)NJ} \widetilde R_{0}^{2}$$

(A.45)

hold for all $l = 1, \ldots ,N$ simultaneously, where $D$ is a positive constant,

$$F = 2{{\sigma }^{2}}N{{(2\beta )}^{N}}\left( {2A + \beta \widetilde R_{0}^{2} + 24\ln\frac{N}{\delta }\beta {{\sigma }^{2}}N} \right),$$

$f = {{\sigma }^{2}}\widetilde R_{0}^{2}$, $g(N) = \ln\left( {\tfrac{N}{\delta }} \right) + \ln\ln\left( {\tfrac{F}{f}} \right)$, and

$$J = \max\left\{ {1,\frac{1}{{{{{\widetilde R}}_{0}}}}\beta {{C}_{1}}\sqrt {{{\sigma }^{2}}g(N)} + \sqrt {\frac{1}{{\widetilde R_{0}^{2}}}{{\beta }^{2}}C_{1}^{2}{{\sigma }^{2}}g(N) + \frac{{2A}}{{R_{0}^{2}}}} } \right\}.$$

To estimate the duality gap, we use (A.41), noting that this estimate holds for any $\boldsymbol{\lambda} \in {\mathbf{R}}_{ + }^{m}$. Therefore, taking the minimum over all $\boldsymbol{\lambda} $ from the set ${{\Lambda }_{{2R}}} = \left\{ {\boldsymbol{\lambda} \in \mathbb{R}_{ + }^{m}:{{{\left\| \boldsymbol{\lambda} \right\|}}_{2}} \leqslant 2R} \right\}$ yields

$$N\varphi ({{\hat {\boldsymbol{\lambda} }}^{N}}) \leqslant \mathop {\min}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left( {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right) + \frac{1}{{2\beta }}\left\| {{{\boldsymbol{\lambda} }^{0}} - \boldsymbol{\lambda} } \right\|_{2}^{2}} \right\} + \frac{{N\beta {{M}^{2}}}}{2},$$

where the last term was estimated using assumption (8). Additionally, the inequality $\left\| {{{\boldsymbol{\lambda} }^{N}} - \boldsymbol{\lambda} } \right\|_{2}^{2} \geqslant 0$ was taken into account. By virtue of (A.29), we obtain the estimate

$$\varphi ({{\hat {\boldsymbol{\lambda} }}^{N}}) \leqslant \frac{1}{N}\mathop {\min}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left( {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right)} \right\} + \frac{{2{{R}^{2}}}}{{\beta N}} + \frac{{\beta {{M}^{2}}}}{2}.$$

Adding and subtracting $\sum\nolimits_{t = 0}^{N - 1} {\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } $ from the expression under the minimum sign yields

$$\begin{gathered} \mathop {\min}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left( {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right)} \right\} \leqslant \mathop {\min}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left( {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right)} \right\} \\ \, + \mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} } \right\rangle } \right\} + \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}), - {{\boldsymbol{\lambda} }^{t}}} \right\rangle . \\ \end{gathered} $$

Note that $ - \boldsymbol{\lambda} \text{*} \in {{\Lambda }_{{2R}}}$. Then we have

$$\begin{gathered} \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}), - {{\boldsymbol{\lambda} }^{t}}} \right\rangle = \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} \text{*} - \;{{\boldsymbol{\lambda} }^{t}}} \right\rangle \\ \, + \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}), - \boldsymbol{\lambda} \text{*}} \right\rangle \leqslant \mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} } \right\rangle \\ \, + \sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} \text{*} - \;{{\boldsymbol{\lambda} }^{t}}} \right\rangle . \\ \end{gathered} $$

It follows that

$$\begin{gathered} \varphi ({{{\hat {\boldsymbol{\lambda} }}}^{N}}) \leqslant \frac{1}{N}\mathop {\min}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left( {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right)} \right\} + \frac{{2{{R}^{2}}}}{{\beta N}} + \frac{{\beta {{M}^{2}}}}{2} \\ \, \leqslant \frac{1}{N}\mathop {\min}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left( {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right)} \right\} + \frac{1}{N}\sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} \text{*} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle \\ \, + \frac{2}{N}\mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} } \right\rangle } \right\} + \frac{{2{{R}^{2}}}}{{\beta N}} + \frac{{\beta {{M}^{2}}}}{2}. \\ \end{gathered} $$

The definition of the norm implies that

$$\mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} {\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} } \right\rangle } } \right\} \leqslant 2R{\kern 1pt} \mathop {\left\| {\sum\limits_{t = 0}^{N - 1} {\left( {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}})} \right)} } \right\|}\nolimits_2 .$$

(A.46)

Using (A.44), we conclude that, with probability $1 - \delta $,

$$\mathop {\left\| {\sum\limits_{t = 0}^{N - 1} \,\left( {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}})} \right)} \right\|}\nolimits_2 \leqslant \sigma \sqrt {2N} \left( {1 + \sqrt {3\ln\frac{1}{\delta }} } \right).$$

(A.47)

Substituting the values of $\varphi ({{\boldsymbol{\lambda} }^{t}})$ and $\nabla \varphi ({{\boldsymbol{\lambda} }^{t}})$ into the expression $\sum\limits_{t = 0}^{N - 1} {\left( {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right)} $ in (A.46) yields

$$\begin{gathered} \sum\limits_{t = 0}^{N - 1} \,\left( {\left\langle {{{\boldsymbol{\lambda} }^{t}},{\mathbf{b}}} \right\rangle + \sum\limits_{k = 1}^n \,\left( {{{u}_{k}}({{x}_{k}}({{\boldsymbol{\lambda} }^{t}})) - \left\langle {{{\boldsymbol{\lambda} }^{t}},{{{\mathbf{C}}}_{k}}{{x}_{k}}({{\boldsymbol{\lambda} }^{t}})} \right\rangle } \right) + \left\langle {{\mathbf{b}} - C{{{\mathbf{x}}}^{t}}({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right) \\ \, = \sum\limits_{t = 0}^{N - 1} \,\left( {\sum\limits_{k = 1}^n \,{{u}_{k}}({{x}_{k}}({{\boldsymbol{\lambda} }^{t}})) + \left\langle {{\mathbf{b}} - C{{{\mathbf{x}}}^{t}}({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} } \right\rangle } \right). \\ \end{gathered} $$

Then, since the functions ${{u}_{k}}({{x}_{k}})$ are concave, it holds that

$$\frac{1}{N}\mathop {\min}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left( {\varphi ({{\boldsymbol{\lambda} }^{t}}) + \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} - {{\boldsymbol{\lambda} }^{t}}} \right\rangle } \right)} \right\} \leqslant U({{{\mathbf{\hat {x}}}}^{N}}) - \frac{1}{N}\mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left\langle {C{{{\mathbf{x}}}^{t}}({{\boldsymbol{\lambda} }^{t}}) - {\mathbf{b}},\boldsymbol{\lambda} } \right\rangle } \right\}.$$

Combining this inequality with (A.46) gives

$$\begin{gathered} \varphi ({{{\hat {\boldsymbol{\lambda} }}}^{N}}) \leqslant U({{{{\mathbf{\hat {x}}}}}^{N}}) - \frac{1}{N}\mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\sum\limits_{t = 0}^{N - 1} \,\left\langle {C{{{\mathbf{x}}}^{t}}({{\boldsymbol{\lambda} }^{t}}) - {\mathbf{b}},\boldsymbol{\lambda} } \right\rangle } \right\} + \frac{{2{{R}^{2}}}}{{\beta N}} + \frac{{\beta {{M}^{2}}}}{2} \\ + \;\frac{{2R}}{N}\mathop {\left\| {\sum\limits_{t = 0}^{N - 1} \,\left( {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}})} \right)} \right\|}\nolimits_2 + \frac{1}{N}\sum\limits_{t = 0}^{N - 1} \,\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - \nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),\boldsymbol{\lambda} \text{*} - \;{{\boldsymbol{\lambda} }^{t}}} \right\rangle . \\ \end{gathered} $$

From this, taking into account estimate (A.47) and result (A.44), we see that, with probability $1 - 3\delta $,

$$\varphi ({{\hat {\boldsymbol{\lambda} }}^{N}}) - U({{{\mathbf{\hat {x}}}}^{N}}) + 2R\mathop {\left\| {{{{[C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 \leqslant \frac{{2R\sigma \sqrt 2 \left( {1 + \sqrt {3\ln\tfrac{1}{\delta }} } \right)}}{{\sqrt N }} + \frac{{2{{R}^{2}}}}{{\beta N}} + \frac{{\beta {{M}^{2}}}}{2} + {{C}_{1}}\frac{{\sigma \sqrt {g(N)J} {{R}^{2}}}}{{\sqrt N }}.$$

(A.48)

By Theorem 2.1 in [35], for all $\gamma > 0$, it is true that

$$P\left\{ {\mathop {\left\| {\sum\limits_{t = 0}^{N - 1} \,\left( {{\mathbf{x}}({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - {\mathbf{x}}({{\boldsymbol{\lambda} }^{t}})} \right)} \right\|}\nolimits_2 \geqslant (\sqrt 2 + \sqrt 2 \gamma )\sqrt {\sum\limits_{t = 0}^{N - 1} \,\sigma _{x}^{2}} \,{\text{|}}\,\{ {{\xi }^{t}}\} _{{t = 0}}^{{N - 1}}} \right\} \leqslant \exp\left( { - \frac{{{{\gamma }^{2}}}}{3}} \right).$$

Setting $\gamma = \sqrt {3\ln\tfrac{1}{\delta }} $, we conclude that, with probability $1 - \delta $,

$${{\left\| {{{{{\mathbf{\tilde {x}}}}}^{N}} - {{{{\mathbf{\hat {x}}}}}^{N}}} \right\|}_{2}} = \frac{1}{N}\mathop {\left\| {\sum\limits_{t = 0}^{N - 1} \,\left( {{\mathbf{x}}({{\boldsymbol{\lambda} }^{t}},{{\xi }^{t}}) - {\mathbf{x}}({{\boldsymbol{\lambda} }^{t}})} \right)} \right\|}\nolimits_2 \leqslant {{\sigma }_{x}}\sqrt {\frac{2}{N}} \left( {1 + \sqrt {3\ln\frac{1}{\delta }} } \right).$$

Then, with probability $1 - \delta $,

$${{\left\| {C{{{{\mathbf{\tilde {x}}}}}^{N}} - C{{{{\mathbf{\hat {x}}}}}^{N}}} \right\|}_{2}} \leqslant {{\left\| C \right\|}_{2}} \cdot {{\left\| {{{{{\mathbf{\tilde {x}}}}}^{N}} - {{{{\mathbf{\hat {x}}}}}^{N}}} \right\|}_{2}} \leqslant {{\sigma }_{x}}\sqrt {\frac{{2{{\lambda }_{{{\text{max}}}}}({{C}^{{\text{T}}}}C)}}{N}} \left( {1 + \sqrt {3\ln\frac{1}{\delta }} } \right).$$
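As a quick numerical illustration of the two bounds above, the following sketch evaluates the deviation radius for given $\sigma_x$, $N$, and $\delta$, and rescales it by $\sqrt{\lambda_{\max}(C^{\text{T}}C)}$. The function names and all sample values are illustrative assumptions and are not taken from the paper.

```python
import math


def deviation_radius(sigma_x: float, n_iters: int, delta: float) -> float:
    """Bound on ||x_tilde^N - x_hat^N||_2 that holds with probability >= 1 - delta."""
    # gamma is chosen so that exp(-gamma^2 / 3) = delta
    gamma = math.sqrt(3.0 * math.log(1.0 / delta))
    return sigma_x * math.sqrt(2.0 / n_iters) * (1.0 + gamma)


def constraint_deviation_radius(sigma_x: float, n_iters: int, delta: float,
                                lambda_max_ctc: float) -> float:
    """Bound on ||C x_tilde^N - C x_hat^N||_2, using ||C||_2 = sqrt(lambda_max(C^T C))."""
    return math.sqrt(lambda_max_ctc) * deviation_radius(sigma_x, n_iters, delta)


if __name__ == "__main__":
    # Hypothetical values: sigma_x = 1, delta = 0.05, lambda_max(C^T C) = 4.
    for n in (100, 10_000):
        print(n, deviation_radius(1.0, n, 0.05),
              constraint_deviation_radius(1.0, n, 0.05, 4.0))
```

The radius decays as $1/\sqrt N$ and depends on $\delta$ only through $\sqrt{\ln(1/\delta)}$, so tightening the confidence level is cheap compared with increasing $N$.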

Note that

$$\begin{gathered} 2R\mathop {\left\| {{{{[C{{{{\mathbf{\tilde {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 = \mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\left\langle {C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}},\boldsymbol{\lambda} } \right\rangle + \left\langle {C{{{{\mathbf{\tilde {x}}}}}^{N}} - C{{{{\mathbf{\hat {x}}}}}^{N}},\boldsymbol{\lambda} } \right\rangle } \right\} \\ \leqslant \;\mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\left\langle {C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}},\boldsymbol{\lambda} } \right\rangle } \right\} + \mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\{ {\left\langle {C{{{{\mathbf{\tilde {x}}}}}^{N}} - C{{{{\mathbf{\hat {x}}}}}^{N}},\boldsymbol{\lambda} } \right\rangle } \right\} \leqslant 2R\mathop {\left\| {{{{[C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 + 2R{{\left\| {C{{{{\mathbf{\tilde {x}}}}}^{N}} - C{{{{\mathbf{\hat {x}}}}}^{N}}} \right\|}_{2}} \\ \, \leqslant 2R{\kern 1pt} \mathop {\left\| {{{{[C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 + 2R{{\sigma }_{x}}\sqrt {\frac{{2{{\lambda }_{{{\text{max}}}}}({{C}^{{\text{T}}}}C)}}{N}} \left( {1 + \sqrt {3\ln\frac{1}{\delta }} } \right). \\ \end{gathered} $$

(A.49)
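The first equality in (A.49) rests on the identity $\max_{\boldsymbol{\lambda} \in \Lambda_{2R}} \langle \mathbf{y}, \boldsymbol{\lambda} \rangle = 2R\|[\mathbf{y}]_+\|_2$. The sketch below checks it numerically under the assumption that $\Lambda_{2R} = \{\boldsymbol{\lambda} \geqslant 0 : \|\boldsymbol{\lambda}\|_2 \leqslant 2R\}$, which is what the identity requires. The vector `y`, which stands in for $C\mathbf{x} - \mathbf{b}$, and all helper names are hypothetical.

```python
import math
import random


def positive_part_norm(y):
    """Euclidean norm of [y]_+, the componentwise positive part of y."""
    return math.sqrt(sum(max(v, 0.0) ** 2 for v in y))


def support_value(y, radius):
    """Closed-form value of max <y, lam> over lam >= 0, ||lam||_2 <= radius."""
    return radius * positive_part_norm(y)


def maximizer(y, radius):
    """A point attaining the maximum: radius * [y]_+ / ||[y]_+||_2 (zero if [y]_+ = 0)."""
    norm = positive_part_norm(y)
    if norm == 0.0:
        return [0.0] * len(y)
    return [radius * max(v, 0.0) / norm for v in y]


def random_feasible_point(dim, radius, rng):
    """Random lam >= 0 with ||lam||_2 <= radius (not uniform; used only as a sanity check)."""
    lam = [abs(rng.gauss(0.0, 1.0)) for _ in range(dim)]
    scale = radius * rng.random() / math.sqrt(sum(v * v for v in lam))
    return [scale * v for v in lam]


def inner(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    y = [1.5, -2.0, 0.5]   # hypothetical residual C x - b
    radius = 2.0 * 3.0     # 2R with an assumed R = 3
    sampled = max(inner(y, random_feasible_point(3, radius, rng)) for _ in range(10_000))
    print(support_value(y, radius), inner(y, maximizer(y, radius)), sampled)
```

Negative components of the residual never contribute to the maximum, because the best feasible $\boldsymbol{\lambda}$ puts zero weight on them; this is why only the violated constraints enter the bound.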

Since the function $U$ is Lipschitz continuous, we obtain

$$\left| {U({{{{\mathbf{\tilde {x}}}}}^{N}}) - U({{{{\mathbf{\hat {x}}}}}^{N}})} \right| \leqslant {{M}_{U}}{{\left\| {{{{{\mathbf{\tilde {x}}}}}^{N}} - {{{{\mathbf{\hat {x}}}}}^{N}}} \right\|}_{2}} \leqslant {{M}_{U}}{{\sigma }_{x}}\sqrt {\frac{2}{N}} \left( {1 + \sqrt {3\ln\frac{1}{\delta }} } \right).$$

Then

$$U({{{\mathbf{\hat {x}}}}^{N}}) = U({{{\mathbf{\tilde {x}}}}^{N}}) + (U({{{\mathbf{\hat {x}}}}^{N}}) - U({{{\mathbf{\tilde {x}}}}^{N}})) \geqslant U({{{\mathbf{\tilde {x}}}}^{N}}) - {{M}_{U}}{{\sigma }_{x}}\sqrt {\frac{2}{N}} \left( {1 + \sqrt {3\ln\frac{1}{\delta }} } \right).$$

(A.50)

Substituting (A.49) and (A.50) into (A.48), we conclude that, with probability $1 - 4\delta $,

$$\begin{gathered} \varphi ({{{\hat {\boldsymbol{\lambda} }}}^{N}}) - U({{{{\mathbf{\tilde {x}}}}}^{N}}) + 2R{\kern 1pt} \mathop {\left\| {{{{[C{{{{\mathbf{\tilde {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 \leqslant {{C}_{1}}\frac{{\sigma \sqrt {g(N)J} {{R}^{2}}}}{{\sqrt N }} + \frac{{2{{R}^{2}}}}{{\beta N}} + \frac{{\beta {{M}^{2}}}}{2} \\ \, + \frac{{\sqrt 2 \left( {1 + \sqrt {3\ln\tfrac{1}{\delta }} } \right)}}{{\sqrt N }}\left( {{{M}_{U}}{{\sigma }_{x}} + 2R\left( {\sigma + {{\sigma }_{x}}\sqrt {{{{\lambda} }_{{{\text{max}}}}}({{C}^{{\text{T}}}}C)} } \right)} \right). \\ \end{gathered} $$

Proof of Theorem 3. Since ${{\left\| {\nabla \varphi (\boldsymbol{\lambda} )} \right\|}_{2}} \leqslant M$ for any $\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}$ (see (5)), we have the estimate

$$\mathop {\sup}\limits_{{{\boldsymbol{\lambda} }^{1}},{{\boldsymbol{\lambda} }^{2}} \in {{\Lambda }_{{2R}}}} \left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{1}}),{{\boldsymbol{\lambda} }^{2}} - {{\boldsymbol{\lambda} }^{1}}} \right\rangle \leqslant M \cdot 4R.$$

Theorem 4.1 from [26] yields

$$\mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \sum\limits_{t = 1}^N \,{{\xi }^{t}}\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} } \right\rangle \leqslant {{\varepsilon }_{N}},$$

where ${{\varepsilon }_{N}} = 32 \times 4MR\exp\left\{ { - \tfrac{N}{{2m(m + 1)}}} \right\}$. Then

$$\forall \boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}\sum\limits_{t \in {{I}_{N}}} \,{{\xi }^{t}}\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} } \right\rangle \leqslant \sum\limits_{t = 1}^N \,{{\xi }^{t}}\left\langle {\nabla \varphi ({{\boldsymbol{\lambda} }^{t}}),{{\boldsymbol{\lambda} }^{t}} - \boldsymbol{\lambda} } \right\rangle \leqslant {{\varepsilon }_{N}}.$$

It follows that

$$\sum\limits_{t \in {{I}_{N}}} \,{{\xi }^{t}}\left\langle {{\mathbf{b}} - C{{{\mathbf{x}}}^{t}},{{\boldsymbol{\lambda} }^{t}}} \right\rangle + \mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{{2R}}}} \left\langle { - \sum\limits_{t \in {{I}_{N}}} \,{{\xi }^{t}}({\mathbf{b}} - C{{{\mathbf{x}}}^{t}}),\boldsymbol{\lambda} } \right\rangle \leqslant {{\varepsilon }_{N}},$$

which can be rewritten as

$$\sum\limits_{t \in {{I}_{N}}} \,{{\xi }^{t}}\left\langle {{\mathbf{b}} - C{{{\mathbf{x}}}^{t}},{{\boldsymbol{\lambda} }^{t}}} \right\rangle \leqslant {{\varepsilon }_{N}} - 2R\mathop {\left\| {{{{[C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 .$$

(A.51)

Next, by virtue of (3), for each ${\mathbf{x}} \geqslant 0$ and $t \in {{I}_{N}}$, we have

$$U({{{\mathbf{x}}}^{t}}({{\boldsymbol{\lambda} }^{t}})) - \left\langle {C{{{\mathbf{x}}}^{t}}({{\boldsymbol{\lambda} }^{t}}) - {\mathbf{b}},{{\boldsymbol{\lambda} }^{t}}} \right\rangle \geqslant U({\mathbf{x}}) - \left\langle {C{\mathbf{x}} - {\mathbf{b}},{{\boldsymbol{\lambda} }^{t}}} \right\rangle .$$

Multiplying the $t$th inequality by ${{\xi }^{t}}$, summing the result over all indices from ${{I}_{N}}$, and taking into account that, since the functions ${{u}_{k}}({{x}_{k}})$, $k = 1, \ldots ,n$, are concave, $\sum\nolimits_{t \in {{I}_{N}}} {{{\xi }^{t}}U({{{\mathbf{x}}}^{t}}) \leqslant U({{{{\mathbf{\hat {x}}}}}^{N}})} $, we obtain

$$U({\mathbf{x}}) - U({{{\mathbf{\hat {x}}}}^{N}}) + \left\langle {{\mathbf{b}} - C{\mathbf{x}},{{{\hat {\boldsymbol{\lambda} }}}^{N}}} \right\rangle \leqslant \sum\limits_{t \in {{I}_{N}}} \,{{\xi }^{t}}\left\langle {{\mathbf{b}} - C{{{\mathbf{x}}}^{t}},{{\boldsymbol{\lambda} }^{t}}} \right\rangle ,$$

where ${{\hat {\boldsymbol{\lambda} }}^{N}} = \sum\limits_{t \in {{I}_{N}}} \,{{\xi }^{t}}{{\boldsymbol{\lambda} }^{t}}$. Using estimate (A.51), we derive

$$2R{\kern 1pt} \mathop {\left\| {{{{[C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 + U({\mathbf{x}}\text{*}) - U({{{\mathbf{\hat {x}}}}^{N}}) + \left\langle {{\mathbf{b}} - C{\mathbf{x}}\text{*},{{{\hat {\boldsymbol{\lambda} }}}^{N}}} \right\rangle \leqslant {{\varepsilon }_{N}}.$$

(A.52)

Since ${{\hat {\boldsymbol{\lambda} }}^{N}} \in {{\Lambda }_{{2R}}}$, we have ${{\hat {\boldsymbol{\lambda} }}^{N}} \geqslant 0$ and hence $\left\langle {{\mathbf{b}} - C{\mathbf{x}}\text{*},{{{\hat {\boldsymbol{\lambda} }}}^{N}}} \right\rangle \geqslant 0$ (because $C{\mathbf{x}}\text{*} \leqslant {\mathbf{b}}$), so it follows from (A.52) that $U({\mathbf{x}}\text{*}) - U({{{\mathbf{\hat {x}}}}^{N}}) \leqslant {{\varepsilon }_{N}}$. Furthermore, since, by the definition of $\boldsymbol{\lambda} \text{*}$, $U({\mathbf{x}}\text{*}) \geqslant U({\mathbf{x}}) - \left\langle {\boldsymbol{\lambda} \text{*},C{\mathbf{x}} - {\mathbf{b}}} \right\rangle $ for all ${\mathbf{x}} \geqslant 0$, we obtain

$$\begin{gathered} U({{{{\mathbf{\hat {x}}}}}^{N}}) \leqslant U({\mathbf{x}}\text{*}) - \left\langle {\boldsymbol{\lambda} \text{*},{\mathbf{b}} - C{{{{\mathbf{\hat {x}}}}}^{N}}} \right\rangle \leqslant U({\mathbf{x}}\text{*}) - \mathop {\min}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{R}}} \left\{ {\left\langle {\boldsymbol{\lambda} ,{\mathbf{b}} - C{{{{\mathbf{\hat {x}}}}}^{N}}} \right\rangle } \right\} \\ \, = U({\mathbf{x}}\text{*}) + \mathop {\max}\limits_{\boldsymbol{\lambda} \in {{\Lambda }_{R}}} \left\{ {\left\langle {\boldsymbol{\lambda} ,C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}} \right\rangle } \right\} \leqslant U({\mathbf{x}}\text{*}) + R{\kern 1pt} \mathop {\left\| {{{{[C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}\nolimits_2 . \\ \end{gathered} $$

Combining this relation with (A.52) yields $R{{\left\| {{{{[C{{{{\mathbf{\hat {x}}}}}^{N}} - {\mathbf{b}}]}}_{ + }}} \right\|}_{2}} \leqslant {{\varepsilon }_{N}}$. Estimate (11) for the number of iterations of the method follows from the chain of implications

$$\begin{gathered} {{\varepsilon }_{N}} = 32 \times 4MR\exp\left\{ { - \frac{N}{{2m(m + 1)}}} \right\} \leqslant \varepsilon \Rightarrow - \frac{N}{{2m(m + 1)}} \\ \, \leqslant \ln\left( {\frac{\varepsilon }{{32 \times 4MR}}} \right) \Rightarrow N \geqslant 2m(m + 1)\ln\left( {\frac{{32 \times 4MR}}{\varepsilon }} \right). \\ \end{gathered} $$
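The last implication can be turned into a small calculator for the iteration count. The sketch below is only an illustration of the formula $N \geqslant 2m(m + 1)\ln(32 \times 4MR/\varepsilon)$; the function names and the sample values of $m$, $M$, $R$, and $\varepsilon$ are assumptions made for the example.

```python
import math


def iterations_for_accuracy(m: int, grad_bound: float, radius: float, eps: float) -> int:
    """Smallest integer N with 32 * 4 * M * R * exp(-N / (2 m (m + 1))) <= eps."""
    constant = 32.0 * 4.0 * grad_bound * radius
    if eps >= constant:
        return 0  # the bound already holds at N = 0
    return math.ceil(2 * m * (m + 1) * math.log(constant / eps))


def accuracy_after(n_iters: int, m: int, grad_bound: float, radius: float) -> float:
    """Value of eps_N after n_iters iterations."""
    return 32.0 * 4.0 * grad_bound * radius * math.exp(-n_iters / (2 * m * (m + 1)))


if __name__ == "__main__":
    # Hypothetical problem data: m = 10, M = 5, R = 2, target accuracy 1e-3.
    n = iterations_for_accuracy(10, 5.0, 2.0, 1e-3)
    print(n, accuracy_after(n, 10, 5.0, 2.0))
```

The count grows quadratically in $m$ but only logarithmically in $MR/\varepsilon$, which is why the estimate is attractive when $m$ is small and high accuracy is needed.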

About this article

Cite this article

Vorontsova, E.A., Gasnikov, A.V., Dvurechensky, P.E. et al. Numerical Methods for the Resource Allocation Problem in a Computer Network. Comput. Math. and Math. Phys. 61, 297–328 (2021). https://doi.org/10.1134/S0965542521020135

Received: 29 November 2019
Revised: 10 September 2020
Accepted: 16 September 2020
Published: 07 April 2021
Issue Date: February 2021
DOI: https://doi.org/10.1134/S0965542521020135
