1 Introduction

In recent years, distributed optimization has attracted extensive attention [1,2,3,4,5,6], owing to its wide practicability in numerous fields such as distributed resource allocation [1], distributed economic dispatch [2], distributed machine learning [3], and distributed coordination control [4].

Distributed optimization in static environments has been widely studied [7,8,9]. However, in practical applications, the scenarios in which distributed optimization arises are often dynamic. In recent years, online distributed optimization has therefore been extensively studied [10,11,12,13,14]. For example, in [10], an online distributed push-sum algorithm is proposed for the unconstrained problem, and an online distributed coordination algorithm based on the gradient descent method is developed in [11]. In [12], an online distributed saddle-point algorithm is developed for optimization problems with a global set constraint, an online distributed mirror descent algorithm is proposed in [13], and an online distributed dual averaging algorithm is designed in [14].

It is worth pointing out that all the above works rely on real gradient information. However, in practical applications it is often infeasible or costly to compute gradients exactly. For example, in the Internet of Things [15], fog computing cannot obtain a closed-form expression of the delay, since online decision-making must adapt to user preferences and the availability of resources is temporarily unpredictable. Moreover, in bandit optimization [16], agents can only observe the values of the cost functions, not the cost functions themselves. In these cases, using zeroth-order information is desirable for distributed optimization. Recently, zeroth-order random online distributed optimization has been investigated in [17,18,19,20]. In [17, 18], zeroth-order random online distributed algorithms are proposed, where only the zeroth-order information of the cost functions is used. Furthermore, in [19, 20], two different random gradient-free algorithms are proposed for online distributed optimization under time-varying networks.

It is worth noting that most of the aforementioned works address convex optimization problems. However, pseudoconvex optimization problems arise widely in practice. For example, in computer vision [21], geometric quantities are usually modeled by pseudoconvex functions. Also, in portfolio planning [22] and fractional programming [23], cost functions are commonly formulated as pseudoconvex functions. Pseudoconvex optimization has a wider application range than convex optimization, as it also covers some nonconvex cases. In fact, distributed optimization problems with pseudoconvex cost functions have only been studied in [24, 25], where real gradient information of the cost functions is required.

Motivated by [19, 20, 24,25,26], in this paper we study online distributed optimization problems with strongly pseudoconvex cost functions via a random gradient-free method. Compared with [24, 25], where agents need access to real gradient information, here agents make decisions using only gradient estimates instead of real gradients. To solve this problem, an online distributed algorithm based on the random gradient-free method is proposed, where a multi-point gradient estimator is used to estimate the gradients of the local cost functions. Different from [11,12,13,14,15,16,17], which rely on the convexity of the cost functions, here the cost functions are assumed to be strongly pseudoconvex. In [19, 20], the gradient-free method and the convexity of the cost functions are used to analyze the convergence of the proposed algorithms. Different from them, we employ the strong pseudomonotonicity of the cost functions and the Karush–Kuhn–Tucker (KKT) condition associated with pseudoconvex optimization to guarantee effectiveness. We prove that if the graph is B-strongly connected and the cumulative deviation of the minimizer sequence grows at a certain rate, then the expectation of the dynamic regret increases sublinearly.

This paper is organized as follows. In Sect. 2, we formulate the problem and propose an algorithm. In Sect. 3, the main results are presented and detailed proofs are given. A simulation example is given in Sect. 4. Section 5 concludes the paper.

Notations   We use \(\nabla \psi (u)\) to denote the gradient of function \(\psi \) at point u. \(\lfloor T\rfloor \) is defined as the set \(\{0, 1, \ldots , T\}\) for any \(T\in {\mathbb {N}}.\) For vectors \({{\varvec{u}}},{{\varvec{v}}}\in {\mathbb {R}}^{m}\) and matrix \({{\varvec{W}}}\in {\mathbb {R}}^{m\times m},\) \([{{\varvec{u}}}]_i\) denotes the ith element of \({{\varvec{u}}},\) \( \Vert {{{\varvec{u}}}} \Vert =\sqrt{{{\varvec{u}}}^{\textrm{T}}{{\varvec{u}}}},\) \(\langle {{{\varvec{u}}},{{\varvec{v}}}}\rangle _{{\varvec{W}}}=\langle {{{\varvec{W}}}{{\varvec{u}}}}, {{\varvec{v}}}\rangle ,\) and \(\Vert {{\varvec{u}}}\Vert _{{\varvec{W}}} ^{2}= {{\varvec{u}}}^{\textrm{T}} {{{\varvec{W}}}{{\varvec{u}}}}.\) We use \({\mathbb {E}}\{{{\varvec{u}}}\}\) to denote the expectation of random variable \({{\varvec{u}}}.\)

2 Problem formulation

2.1 Basic graph theory

A time-varying directed communication graph is defined as \({\mathcal {G}}(t)=({\mathcal {V}}, {\mathcal {E}}(t), {{\varvec{W}}}(t)),\) where \({\mathcal {V}}=\{1, \ldots ,\) \( n\}\) is the vertex set, \({\mathcal {E}}(t) \subset {\mathcal {V}}\times {\mathcal {V}}\) is the edge set, and \({{\varvec{W}}}(t)=(w_{ij}(t))_{n\times n}\) is a non-negative matrix representing the weights of the edges. The neighbor set of agent i is defined as \({\mathcal {N}}_{i}(t)=\{j\in {\mathcal {V}}\vert \) \(( j,i)\in {\mathcal {E}}(t)\},\) i.e., agent i can receive information from agent j. For digraph \({\mathcal {G}}(t)\) and some integer \(B>0,\) \({\mathcal {E}}_B(t)=\) \(\bigcup _{k=tB}^{(t+1)B-1}{{\mathcal {E}}}(k)\) denotes the B-edge set. If the graph \(({\mathcal {V}},{\mathcal {E}}_B(t))\) is strongly connected for any \(t\ge 0,\) then \({\mathcal {G}}(t)\) is called a B-strongly connected graph.

In this paper, the following assumption is made for the communication graph \({\mathcal {G}}(t).\)

Assumption 1

For any \(t\ge 0,\) \({\mathcal {G}}(t)\) is a B-strongly connected graph and the matrix \({{\varvec{W}}}(t)\) is doubly stochastic.

For distributed optimization, \({{\varvec{W}}}(t)\) plays an essential role in enabling agents to reach consensus [25]. For any \(t\ge k\ge 0,\) we define

$$\begin{aligned} \left\{ {\begin{array}{ll} \varPhi (t,k)={{\varvec{W}}}(t\!-\!1)\cdots {{\varvec{W}}}(k\!+\!1){{\varvec{W}}}(k), &{} \text{ if } t>k;\\ \varPhi (t,k)={{\varvec{I}}}_n,&{} \text{ if } t=k. \end{array}} \right. \end{aligned}$$
(1)

Lemma 1

[25] Based on Assumption 1, for any \(i,j \in {\mathcal {V}}\) and \(t\ge k,\) there exist constants \(D>0\) and \(0< \rho <1\) such that

$$\begin{aligned} \big \vert [ \varPhi (t,k)]_{ij} -\dfrac{1}{n} \big \vert \le D \rho ^{t-k}. \end{aligned}$$
(2)
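The geometric mixing in (2) is easy to check numerically. The following minimal Python sketch is an illustration only: the Sinkhorn-style construction of doubly stochastic matrices and all parameter values are our own assumptions, not part of the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

def doubly_stochastic(n, rng):
    """Generate an (approximately) doubly stochastic matrix via Sinkhorn normalization."""
    W = rng.random((n, n)) + 0.1          # strictly positive entries (complete graph here)
    for _ in range(200):                  # alternately normalize rows and columns
        W /= W.sum(axis=1, keepdims=True)
        W /= W.sum(axis=0, keepdims=True)
    return W

# Phi(t, 0) = W(t-1) ... W(0); by Lemma 1 it approaches (1/n) * ones geometrically
Phi = np.eye(n)
for t in range(1, 16):
    Phi = doubly_stochastic(n, rng) @ Phi
    gap = np.abs(Phi - 1.0 / n).max()
    print(f"t = {t:2d}, max_ij |[Phi(t,0)]_ij - 1/n| = {gap:.2e}")
```

The printed gap decays roughly like \(D\rho ^{t},\) in line with (2).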

2.2 Online distributed optimization

In this paper, we consider a multi-agent system with n agents, where agents exchange local information via the graph \({{\mathcal {G}}(t)}.\) For any \(i\in {\mathcal {V}},\) a sequence of cost functions \(\{\psi _i^0,\ldots ,\psi _i^{T}\}\) is given, where T is a finite time horizon that is unknown to the agents. For any \( t\in \lfloor T\rfloor ,\) \(\psi ^t_i: \varOmega \rightarrow {\mathbb {R}}\) is the cost function of agent i at time t and \(\varOmega \subset {\mathbb {R}}^m.\) All agents aim to collaboratively solve the following optimization problem with a set constraint:

$$\begin{aligned}&\min ~ \psi ^t(z)=\textstyle \sum \limits _{i=1}^n \psi ^t_i(z)~~ \text{ s.t. } z\in \varOmega . \end{aligned}$$
(3)

Here, we make some basic assumptions for the problem.

Assumption 2

Constraint set \(\varOmega \) is nonempty, convex, and compact.

Assumption 3

(Strong pseudomonotonicity) For any \(u,~v\in \varOmega ,\) if \(\langle \nabla {\psi ^t(v)},u-v\rangle \ge 0,\) then \(\langle \nabla {\psi ^t(u)},u-v\rangle \) \(\ge \) \(\beta \Vert u-v\Vert ^2\) for some \(\beta >0.\)

It follows from [27] that if \(\nabla \psi ^t\) is strongly pseudomonotone, then \(\psi ^t\) is strongly pseudoconvex. The definition of a strongly pseudoconvex function is given as follows.

Definition 1

For a differentiable function \(\psi (\cdot ):{\mathbb {R}}^m\rightarrow {\mathbb {R}}\) and a convex set \(\varOmega \subset {\mathbb {R}}^m,\) \(\psi (\cdot )\) is called a pseudoconvex function if \(\langle \nabla \psi (v),u-v\rangle \ge 0\) implies \(\psi (u)-\psi (v)\ge 0\) for each pair of distinct points \(u,v\in \varOmega .\) Moreover, if \(\langle \nabla \psi (v),u-v\rangle \ge 0\) implies \(\psi (u)-\psi (v)\ge \beta \Vert u-v\Vert ^2\) for some \(\beta >0\) and each pair of distinct points \(u,v\in \varOmega ,\) then \(\psi (\cdot )\) is called a strongly pseudoconvex function on \(\varOmega .\)
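As a quick illustration of Definition 1, the sketch below numerically checks the strong pseudoconvexity implication for \(\psi (z)=z^3\) on \([-10,-1]\) (the shape of the local costs used in Sect. 4). The function is nonconvex there, and the candidate constant \(\beta =3\) is our own guess for this interval, not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
psi = lambda z: z**3        # nonconvex on [-10, -1], yet strongly pseudoconvex there
dpsi = lambda z: 3 * z**2

beta = 3.0                  # candidate constant (assumption, valid on this interval)
violations = 0
for _ in range(100_000):
    u, v = rng.uniform(-10.0, -1.0, size=2)
    if u != v and dpsi(v) * (u - v) >= 0:          # premise: <grad psi(v), u - v> >= 0
        if psi(u) - psi(v) < beta * (u - v) ** 2:  # does the quadratic growth fail?
            violations += 1
print("violations:", violations)                   # expected: 0
```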

Assumption 4

(Lipschitz continuous gradient) \(\Vert \nabla \psi _i^t(u)-\nabla \psi _i^t(v)\Vert \le L_1\Vert u-v\Vert , ~\forall u, v\in \varOmega \) for some \(L_1>0.\)

In this paper, we are committed to developing a gradient-free method for solving problem (3). Following [28], the smoothed version of \(\psi ^t\) is defined as

$$\begin{aligned} \psi _{{\mu }}^t(z)=\dfrac{1}{(2\pi )^{m/2}}\int _{{\mathbb {R}}^m} \psi ^t(z+\mu \xi ){\textrm{e}}^{-\frac{1}{2}\Vert \xi \Vert ^2} {\textrm{d}}\xi , \end{aligned}$$
(4)

where \(\mu >0 \) is the smoothing parameter of function \(\psi _{\mu }^t(z).\) The multi-point gradient estimator is defined as

$$\begin{aligned}&\hat{{{\varvec{g}}}}(z_i(t))\nonumber \\&\quad =\dfrac{1}{Q_i}\textstyle \sum \limits _{q_i=1}^{Q_i} \dfrac{\psi _i^t\big (z_i(t)+\mu _i\xi _{q_i}(t)\big )-\psi _i^t\big ( z_i(t)\big )}{\mu _i}\xi _{q_i}(t), \end{aligned}$$
(5)

where \(\hat{{{\varvec{g}}}}(z_i(t))\) is an unbiased estimator of \(\nabla \psi _{\mu _i}^t(z_i(t)),\) \(\mu _i>0\) is the smoothing parameter, and the random vectors \(\xi _{q_i}(t)\) are locally generated i.i.d. samples from the standard Gaussian distribution for any \(q_i\in \{1,\ldots ,Q_i\},\) where \(Q_i\in {\mathbb {N}}^+\) is the number of sampled points. Based on the results in [29], we know that if \(\psi ^t(\cdot )\) is differentiable, then \(\psi _{\mu }^t(\cdot )\) is also differentiable, and its gradient is given by

$$\begin{aligned}&\nabla \psi _{\mu }^t(z)\nonumber \\&\quad =\dfrac{1}{(2\pi )^{m/2}}\int _{{\mathbb {R}}^m} \dfrac{\psi ^t\big (z+\mu \xi \big )-\psi ^t\big ( z\big )}{\mu } \xi {\textrm{e}}^{-\frac{1}{2}\Vert \xi \Vert ^2} {\textrm{d}}\xi . \end{aligned}$$
(6)
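To make (5)–(6) concrete, the following sketch implements the multi-point estimator for a toy quadratic cost; the test function and the choices of \(Q\), \(\mu\), and the dimension are our own assumptions. Averaging many independent estimates should recover \(\nabla \psi _{\mu }^t,\) which for a quadratic coincides with the true gradient.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
A = rng.random((m, m))
P = A @ A.T + np.eye(m)                      # hypothetical quadratic test cost
psi = lambda z: 0.5 * z @ P @ z
grad = lambda z: P @ z

def g_hat(z, mu=1e-3, Q=50):
    """Multi-point Gaussian-smoothing gradient estimator from (5)."""
    est = np.zeros_like(z)
    for _ in range(Q):
        xi = rng.standard_normal(m)          # i.i.d. standard Gaussian direction
        est += (psi(z + mu * xi) - psi(z)) / mu * xi
    return est / Q

z = rng.standard_normal(m)
avg = np.mean([g_hat(z) for _ in range(200)], axis=0)   # Monte Carlo average
print("relative error:", np.linalg.norm(avg - grad(z)) / np.linalg.norm(grad(z)))
```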

Any online algorithm should mimic the performance of its offline counterpart, and the gap between them is called the regret [25]. The regret with the most stringent offline benchmark is the dynamic regret, whose offline benchmark minimizes \(\psi ^t(z)\) at each time. The dynamic regret is defined as

$$\begin{aligned} {\mathcal {R}}_i^d(T)=\textstyle \sum \limits _{t=0}^T \psi ^t(z_i(t))-\textstyle \sum \limits _{t=0}^T \psi ^t(z^*(t)), ~i\in {\mathcal {V}}, \end{aligned}$$
(7)

where \(z^*(t) =\text {argmin}_{z\in \varOmega }\psi ^t(z)\) for any \(t\in \lfloor T\rfloor .\) An online optimization algorithm is called “good” if regret (7) increases sublinearly, i.e., \(\lim _{T\rightarrow \infty }{\mathcal {R}}_i^d(T)/T=0.\) Unfortunately, sublinear dynamic regret may be unattainable when the minimizer of the cost function changes rapidly. Inspired by [25], this difficulty is quantified by the deviation of the minimizer sequence \(\{z^* (t)\}_{t=0}^{T}\):

$$\begin{aligned} {\varTheta }_T=\textstyle \sum \limits _{t=0}^T \Vert z^* (t+1)-z^* (t)\Vert . \end{aligned}$$
(8)
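For later reference, both performance measures are straightforward to evaluate on recorded trajectories. The sketch below (the helper names are ours) computes (7) and (8) given a state trajectory, the per-step minimizers, and a callable global cost:

```python
import numpy as np

def dynamic_regret(psi, z_traj, z_star_traj):
    """R^d(T) from (7): cumulative cost along z_traj minus the clairvoyant cost."""
    return sum(psi(t, z) - psi(t, zs)
               for t, (z, zs) in enumerate(zip(z_traj, z_star_traj)))

def minimizer_deviation(z_star_traj):
    """Theta_T from (8): cumulative path length of the minimizer sequence."""
    zs = np.asarray(z_star_traj, dtype=float).reshape(len(z_star_traj), -1)
    return np.linalg.norm(np.diff(zs, axis=0), axis=1).sum()
```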

2.3 Online distributed gradient-free algorithm

Now, we consider an offline and centralized optimization problem, defined as

$$\begin{aligned} \min ~ \psi (z), \text{ s.t. } z\in \varOmega , \end{aligned}$$
(9)

where the objective function \(\psi \) is strongly pseudoconvex and the constraint set \(\varOmega \) satisfies Assumption 2. The KKT condition of pseudoconvex optimization, stated in terms of a variational inequality, is given in the following lemma.

Lemma 2

[25] Suppose the function \(\psi : {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) is pseudoconvex and differentiable and the constraint set \(\varOmega \) is convex. Then,  \(z^*\) is a minimum point of \(\psi \) on \(\varOmega \) if it satisfies the following variational inequality

$$\begin{aligned} \langle \nabla \psi (z^*),z-z^*\rangle \ge 0, \quad \forall ~z\in \varOmega . \end{aligned}$$
(10)

Based on Lemma 2, problem (9) is solved if there exists a point \(z \in \varOmega \) satisfying

$$\begin{aligned} \langle \nabla \psi (z),u-z\rangle \ge 0,\quad \forall ~ u\in \varOmega . \end{aligned}$$
(11)

Combining this with the multi-point gradient estimator, we construct an auxiliary problem, defined as

$$\begin{aligned} \min ~z^{\textrm{T}}Pz+\langle \alpha \hat{{{\varvec{g}}}}(z_0) -2Pz_0,z\rangle ,\quad \text{ s.t. }\ z\in \varOmega , \end{aligned}$$
(12)

where \(z_0\in \varOmega ,\) \(\alpha >0,\) and the matrix \(P\in {\mathbb {R}}^{m\times m}\) is symmetric positive definite. Based on the KKT condition,  \(z^*\in \varOmega \) is the solution to (12) if and only if

$$\begin{aligned} \langle 2Pz^* +\alpha \hat{{{\varvec{g}}}}(z_0)-2Pz_0,u-z^*\rangle \ge 0,\quad \forall u\in \varOmega . \end{aligned}$$
(13)

By comparing (11) and (13), we conclude that \(z^*\) is also a solution to (9) when \(z^* =z_0.\) By replacing \(z_0\) with z(t) and \(z^*\) with \(z(t+1),\) we propose the following auxiliary optimization strategy:

$$\begin{aligned}&z(t+1)\nonumber \\&\quad =\arg \min \limits _{z\in \varOmega }\Big \{ z^{\textrm{T}}Pz+\langle \alpha (t) \hat{{{\varvec{g}}}}(z(t))-2Pz(t),z\rangle \Big \}. \end{aligned}$$
(14)

Note that if \(z(t+1)=z(t)\) in (14), then problem (9) is solved,  which means that the solution of problem (9) is an equilibrium point of iteration (14). Let \(P={{\varvec{I}}}_m;\) then \(\arg \min _{z\in \varOmega }\{ z^{\textrm{T}}Pz+\langle \alpha (t)\hat{{{\varvec{g}}}}(z(t))-2Pz(t), z\rangle \}=P_{\varOmega }\big [z(t)-\frac{\alpha (t)}{2}\hat{{{\varvec{g}}}}(z(t))\big ],\) where \(P_{\varOmega }\) denotes the Euclidean projection onto \(\varOmega .\) Thus,  (14) can be regarded as an extension of the projected gradient descent algorithm. Detailed convergence proofs for algorithm (14) can be found in [30, 31].
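For a general symmetric positive definite P, the same interpretation follows by completing the square in the P-weighted norm; this is a one-line algebraic identity, not an additional assumption:

$$\begin{aligned}&z^{\textrm{T}}Pz+\langle \alpha (t)\hat{{{\varvec{g}}}}(z(t))-2Pz(t),z\rangle \nonumber \\&\quad =\Big \Vert z-\Big (z(t)-\dfrac{\alpha (t)}{2}P^{-1}\hat{{{\varvec{g}}}}(z(t))\Big )\Big \Vert _{{\varvec{P}}}^{2}+\text {const}, \end{aligned}$$

where the constant does not depend on z. Hence \(z(t+1)\) in (14) is the P-weighted projection onto \(\varOmega \) of the step \(z(t)-\frac{\alpha (t)}{2}P^{-1}\hat{{{\varvec{g}}}}(z(t)).\)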

To solve problem (3), a random gradient-free online distributed algorithm is proposed

$$\begin{aligned} \left\{ {\begin{array}{l} z_i(t+1)\\ =\arg \min \limits _{z\in \varOmega }\Big \{z^{\textrm{T}}Pz+\langle \alpha (t)\hat{{{\varvec{g}}}}(z_i(t))-2P\nu _i(t),z\rangle \Big \},\\ \nu _i(t)=\textstyle \sum \limits _{j\in {\mathcal {N}}_i(t)}w_{ij}(t)z_{j}(t),\\ \hat{{{\varvec{g}}}}(z_i(t))\\ =\dfrac{1}{Q_i}\textstyle \sum \limits _{q_i=1}^{Q_i}\dfrac{\psi ^t_i (z_i(t)+\mu _i\xi _{q_i}(t))-\psi ^t_i(z_i(t))}{\mu _i}\xi _{q_i}(t), \end{array}} \right. \end{aligned}$$
(15)

where \(z_i(t)\) is the state of agent i at time t with \(z_i(0)\in \varOmega ,\) and \(\alpha (t)>0\) is a diminishing learning rate with initial value \(\alpha (0)=\alpha _0.\) Algorithm (15) combines the multi-point gradient estimation method, a consensus step, and the auxiliary optimization strategy (14). When running algorithm (15), each agent updates its state at each time using only the information received from its neighbors and its own past gradient estimates. Therefore, (15) is an online distributed algorithm.
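The following Python sketch shows one full round of (15) with \(P={{\varvec{I}}}_m\) and a box constraint, in which case the inner argmin reduces to a projected step (see the identity above). The cost interface, the fixed averaging matrix, and all parameter values are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, Q, mu = 6, 3, 2, 0.05              # agents, dimension, sample points, smoothing
lo, hi = -10.0, -1.0                     # box constraint Omega = [lo, hi]^m

def g_hat(psi_it, z):
    """Multi-point gradient estimator (5) for one agent's current cost psi_i^t."""
    est = np.zeros_like(z)
    for _ in range(Q):
        xi = rng.standard_normal(m)
        est += (psi_it(z + mu * xi) - psi_it(z)) / mu * xi
    return est / Q

def step(psi_t, z, W, alpha):
    """One round of algorithm (15) with P = I: consensus, estimation, projected step."""
    nu = W @ z                                        # nu_i = sum_j w_ij z_j
    g = np.stack([g_hat(psi_t[i], z[i]) for i in range(n)])
    return np.clip(nu - 0.5 * alpha * g, lo, hi)      # Euclidean projection onto box

# usage: psi_t is a list of n callables (hypothetical local costs at time t)
psi_t = [lambda x, c=i + 1: c * np.sum(x**2) for i in range(n)]
z = rng.uniform(lo, hi, (n, m))
z = step(psi_t, z, np.full((n, n), 1.0 / n), alpha=0.1)
print(np.round(z, 3))
```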

3 Main results

In this section, we present the main results of this paper and give their detailed proofs.

Theorem 1

Based on Assumptions 1–4, if the learning rate is \(\alpha (t)=\frac{c}{\sqrt{t+1}}\) for some \(c >0,\) then for any \(i\in {\mathcal {V}}\) and \(T\in {\mathbb {N}},\) the following inequality holds:

$$\begin{aligned} {\mathbb {E}}\{{\mathcal {R}}_i^d(T)\}&\le nL_0\sqrt{\varGamma +\dfrac{F_1T}{\upsilon \ln (T+1)}\textstyle \sum \limits _{i=1}^n\mu _i+\dfrac{16hM\varTheta _T}{c\upsilon \ln 2}}\nonumber \\&\quad \cdot \Big ((T+1)^{\frac{3}{4}}\sqrt{\ln (T+1)}\Big ), \end{aligned}$$
(16)

where \(\varGamma =\frac{(4{\hat{\rho }}_1/(c \upsilon )+ 2\hat{{\mathcal {K}}}_1)+3c^2(4{\hat{\rho }}_2/(c \upsilon )+2\hat{{\mathcal {K}}}_2)}{\rho (1-\rho )\ln 2}+ \frac{6nc L_0^2}{\upsilon \varepsilon }\) \(+\frac{4d}{c \upsilon \ln 2},\) \(\hat{{\mathcal {K}}}_1={\mathcal {C}}^2+\frac{{\mathcal {C}}L_0 Dn\sqrt{m}}{\varepsilon (1-\rho )} (m+ 4)L_0,\) \({\hat{\rho }}_2= \frac{n(5M{\mathcal {K}}_2+\sqrt{m+4}L_0(kL_1+L_0)Dn\sqrt{m})}{\varepsilon },\) \(\hat{{\mathcal {K}}}_2\!=\!\frac{mn^2 H^2}{4\varepsilon ^2(1\!-\!\rho )}(m\) \(+4)^2L_0^2,\) \({\hat{\rho }}_1= n(5M\hat{{\mathcal {K}}}_1+(kL_1+L_0)c {\mathcal {C}}),\) \(F_1=L_1 kc(m+3)^{\frac{3}{2}},\) \(L_0=\sup _{t\in \lfloor T\rfloor ,i\in {\mathcal {V}},z\in \varOmega }\Vert \nabla \psi _i^t(z)\Vert ,\) \(d=Mk^2,\) \(\varepsilon =\lambda _{\min }( {P}),\) \({\mathcal {C}}=D\sqrt{m} \textstyle \sum \limits _{i=1}^{n}\Vert z_i(0)\Vert _1,\) \(\beta _1=\sup _{t\in \lfloor T\rfloor ,i\in {\mathcal {V}},z\in \varOmega }\Vert \hat{{{\varvec{g}}}}(z_i(t))\Vert ,\) \(M=n\lambda _{\max }({P}),\) \(L_1 =\sup _{t\in \lfloor T\rfloor ,i\in {\mathcal {V}},z\in \varOmega }\Vert \nabla ^2 \psi _i^t(z)\Vert ,\) and \(h=\sup _{z\in \varOmega }\Vert z\Vert .\)

By Theorem 1, both \(\varTheta _T\) and \(\mu _i\) play important roles in the bound on the expected dynamic regret. Note that if \(\mu _i=T^{-\frac{1}{2}-\varrho }\) for some \(\varrho >0\) and \(\varTheta _T\) grows more slowly than \(\frac{\sqrt{T+1}}{\ln (T+1)},\) then \({\mathbb {E}}\{{\mathcal {R}}_i^d(T)\}\) increases sublinearly in T,  which implies that the performance of online distributed algorithm (15) is “good”. Therefore, algorithm (15) can be employed to solve strongly pseudoconvex optimization problems in which exact gradient information is unavailable or expensive to obtain. If the minimizer sequence \(\{z^* (t)\}_{t=0}^{T}\) fluctuates dramatically, \({\varTheta }_T\) may grow at least proportionally to \(\frac{\sqrt{T+1}}{\ln (T+1)},\) and the problem becomes unsolvable. This is natural and occurs even in online convex optimization.
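To make the rate condition explicit (a restatement of the bound (16) up to constants, not a new result): with \(\mu _i=T^{-\frac{1}{2}-\varrho },\) the \(\mu \)-term under the square root in (16) is \(O\big (T^{\frac{1}{2}-\varrho }/\ln (T+1)\big ),\) so

$$\begin{aligned} {\mathbb {E}}\{{\mathcal {R}}_i^d(T)\}=O\Big (\sqrt{1+T^{\frac{1}{2}-\varrho }+\varTheta _T}~(T+1)^{\frac{3}{4}}\sqrt{\ln (T+1)}\Big ), \end{aligned}$$

which is \(o(T)\) provided \(\varTheta _T=o\big (\frac{\sqrt{T+1}}{\ln (T+1)}\big ),\) since \(\sqrt{T^{\frac{1}{2}-\varrho }}\,(T+1)^{\frac{3}{4}}\sqrt{\ln (T+1)}=O\big (T^{1-\frac{\varrho }{2}}\sqrt{\ln (T+1)}\big )=o(T).\)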

Before proving Theorem 1, some useful lemmas need to be presented. First, we use \({\mathcal {F}}_t\) to denote the \(\sigma \)-field generated by the entire history of the random variables, i.e.,

$$\begin{aligned} {\mathcal {F}}_t=\{\text{ col } (z_i(s) )_{i\in {\mathcal {V}}}, \text{ col }(\varsigma _i(s))_{i\in {\mathcal {V}}}, s=0,\ldots , t\}. \end{aligned}$$
(17)

Based on the above definition, the following lemma can be obtained.

Lemma 3

[19, 28] If \(\nabla \psi _i^t\) is \(L_1\)-Lipschitz continuous on \(\varOmega ,\) then

$$\begin{aligned}&(\textrm{a})~~{\mathbb {E}}\{\Vert \hat{{{\varvec{g}}}}(z_i(t))\Vert ^2\vert {\mathcal {F}}_t\}\le (m+4)^2 L_0^2.\nonumber \\&(\textrm{b})~~\Vert \nabla \psi _{\mu _i}^t(z_i(t)) -\nabla \psi _i^t(z_i(t))\Vert \le \dfrac{\mu _i}{2}L_1 (m+3)^{\frac{3}{2}}.\nonumber \\&(\textrm{c})~~{\mathbb {E}}\{\hat{{{\varvec{g}}}}(z_i(t))-\nabla \psi _{\mu _i}^t(z_i(t)) \vert {\mathcal {F}}_{t}\}={{\varvec{0}}}. \end{aligned}$$
(18)

To prove Theorem 1, we first present the following lemma, which bounds the discrepancy between each agent's state and the average state at each update.

Lemma 4

Based on Assumptions 1–4, for any \(i \in {\mathcal {V}},\)

$$\begin{aligned} \Vert z_i(t)-{\bar{z}}(t)\Vert \le {\mathcal {C}}\rho ^t+\dfrac{n D\sqrt{m}\beta _1}{\varepsilon }\textstyle \sum \limits _{\gamma =0}^t \rho ^{t-\gamma }\alpha (\gamma ) \end{aligned}$$
(19)

and

$$\begin{aligned} \Vert z_i(t)-{\bar{z}}(t)\Vert ^2\le {\mathcal {K}}_1\rho ^t+{\mathcal {K}}_2 \textstyle \sum \limits _{\gamma =0}^t \rho ^{t-\gamma }(\alpha (\gamma ))^2, \end{aligned}$$
(20)

where \({\bar{z}}(t)=\frac{1}{n} \textstyle \sum \limits _{i=1}^n z_i(t) ,\) \({\mathcal {K}}_1={\mathcal {C}}^2+\frac{{\mathcal {C}}Dn\sqrt{m}\beta _1}{\varepsilon (1-\rho )},\) \({\mathcal {K}}_2=\frac{m(n D\beta _1)^2}{4\varepsilon ^2(1-\rho )}.\)

Proof

See Appendix 1. \(\square \)

Next, we bound the expected cumulative squared error between the agents' average state and the optimal state at each iteration.

Lemma 5

Under Assumptions 1–4, we have

$$\begin{aligned}&{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert {\bar{z}}(t)-z^* (t)\Vert ^2\Big \}\nonumber \\&\quad \le \dfrac{2d}{\upsilon \alpha (T)}+\dfrac{8hM\varTheta _T}{\upsilon \alpha (T)}+\dfrac{2{\hat{\rho }}_1}{\upsilon \alpha (T)}\textstyle \sum \limits _{t=0}^T \rho ^t\nonumber \\&\qquad +\dfrac{2{\hat{\rho }}_2}{\upsilon \alpha (T)}\textstyle \sum \limits _{t=0}^T \textstyle \sum \limits _{\gamma =0}^{t+1}\rho ^{t-\gamma }(\alpha (\gamma ))^2\nonumber \\&\qquad +\dfrac{F_1T\textstyle \sum \limits _{i=1}^n\mu _i}{\upsilon \alpha (T)} +\dfrac{nL_0}{\varepsilon \upsilon \alpha (T)}\textstyle \sum \limits _{t=0}^T(\alpha (t))^2. \end{aligned}$$
(21)

Proof

See Appendix 1. \(\square \)

Proof of Theorem 1

Based on Lemmas 4–5, we obtain

$$\begin{aligned}&{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert z_i(t)-z^* (t)\Vert ^2\Big \}\nonumber \\&\quad \le 2{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T \Vert {\bar{z}}(t)\!-\!z^* (t)\Vert ^2\Big \} +2{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert z_i (t)\!-\!{\bar{z}}(t)\Vert ^2\Big \} \nonumber \\&\quad \le \left( \dfrac{4{\hat{\rho }}_1}{\upsilon \alpha (T)}+2\hat{{\mathcal {K}}}_1\right) \textstyle \sum \limits _{t=0}^T \rho ^t+\dfrac{4d}{\upsilon \alpha (T)}+\dfrac{16hM\varTheta _T}{\upsilon \alpha (T)}\nonumber \\&\qquad +\left( \dfrac{4{\hat{\rho }} _2}{\upsilon \alpha (T)}+2\hat{{\mathcal {K}}}_2\right) \textstyle \sum \limits _{t=0}^T \textstyle \sum \limits _{\gamma =0}^{t+1}\rho ^{t-\gamma }(\alpha (\gamma ))^2\nonumber \\&\qquad +\dfrac{2F_1T\textstyle \sum \limits _{i=1}^n\mu _i}{\upsilon \alpha (T)}+\dfrac{2nL_0}{\varepsilon \upsilon \alpha (T)}\textstyle \sum \limits _{t=0}^T(\alpha (t))^2. \end{aligned}$$
(22)

With \(\alpha (t)=\frac{c}{\sqrt{t+1}},\) it holds that

$$\begin{aligned}&\textstyle \sum \limits _{t=0}^T \textstyle \sum \limits _{\gamma =0}^{t+1}\rho ^{t-\gamma }(\alpha (\gamma ))^2\nonumber \\&\quad =\textstyle \sum \limits _{\gamma =1}^{T+1} \textstyle \sum \limits _{t=\gamma -1}^{T}\rho ^{t-\gamma }(\alpha (\gamma ))^2+\textstyle \sum \limits _{t=0}^T\rho ^t(\alpha (0))^2\nonumber \\&\quad \le \Big (\textstyle \sum \limits _{\gamma =1}^{T+1}(\alpha (\gamma ))^2\Big )\Big (\textstyle \sum \limits _{s=-1}^{\infty }\rho ^{s} \Big )+\textstyle \sum \limits _{t=0}^T\rho ^t (\alpha (0))^2 \nonumber \\&\quad \le \dfrac{c ^2 (1+\rho +\ln (T+1))}{ (1-\rho )\rho } \le \dfrac{3c ^2 \ln (T+1)}{ (1-\rho )\rho \ln 2}. \end{aligned}$$
(23)

Based on Jensen’s inequality, we have

$$\begin{aligned} \Big (\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert \Big )^2 \le (T+1)\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert ^2. \end{aligned}$$
(24)

Now, by taking expectation on both sides of inequality (24), we can obtain

$$\begin{aligned}&{\mathbb {E}}\Big \{\Big (\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert \Big )^2 \Big \}\nonumber \\&\quad \le (T+1){\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert ^2\Big \}. \end{aligned}$$
(25)

Based on (22), (23), and (25), we have

$$\begin{aligned}&{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert \Big \}\nonumber \\&\quad \le \sqrt{(T+1){\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert ^2\Big \}}\nonumber \\&\quad \le \sqrt{\varGamma +\dfrac{2(T+1)F_1\textstyle \sum \limits _{i=1}^n\mu _i}{\upsilon c\ln (T+1)}+\dfrac{16hM\varTheta _T}{c\upsilon \ln 2}}\nonumber \\&\qquad \cdot \big (T+1\big )^{\frac{3}{4}}\sqrt{\ln (T+1)}. \end{aligned}$$
(26)

Note that \(\nabla \psi _i ^t\) is bounded by \(L_0\) for any \(i \in {\mathcal {V}}\) and \(t \in \lfloor T\rfloor .\) Thus,

$$\begin{aligned} {\mathbb {E}}\Big \{{\mathcal {R}}_i ^d (T)\Big \}&={\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\big (\psi ^t(z_i (t))-\psi ^t(z^* (t))\big )\Big \} \nonumber \\&\le {\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T \textstyle \sum \limits _{j=1}^n \big \vert \psi ^t_j(z_i (t))-\psi ^t_j(z^* (t))\big \vert \Big \}\nonumber \\&\le {\mathbb {E}}\Big \{nL_0 \textstyle \sum \limits _{t=0}^T\Vert z_i (t)-z^* (t)\Vert \Big \}. \end{aligned}$$
(27)

Substituting (26) into (27) yields (16). Thus, Theorem 1 is proved. \(\square \)

4 A simulation example

In this section, we illustrate the validity of the proposed algorithm with a numerical example. Consider a multi-agent system with six agents, labeled \({\mathcal {V}}=\{1,2,\ldots ,6\}.\) Each agent exchanges information with its neighbors via a time-varying digraph, as shown in Fig. 1. For any \(i\in {\mathcal {V}},\) the cost function of agent i is given by

$$\begin{aligned} \psi _i^t(z)=p_i z^3+q_i(t)z, \end{aligned}$$

where \(z\in \varOmega .\) In this simulation, the parameters are selected as \(p_1=0.6,\) \(p_2=0.8,\) \(p_3=p_6=1,\) \(p_4=0.5,\) \(p_5=0.1,\) and \(q_i(t)\) is drawn uniformly at random from \([-i,i].\) Moreover, \(\varOmega =\{z\vert -10\le z\le -1\}\) is the constraint set. The initial states of the agents are selected as \(z_1(0)=0.3,\) \(z_2(0)=-0.5,\) \( z_3(0)=-0.5,\) \(z_4(0)=0.4,\) \(z_5(0)=-0.1,\) and \(z_6(0)=-0.2.\) Algorithm (15) is applied to solve this problem with \(\alpha (t)={1}/{\sqrt{200t+500}},\) \(\mu _i =0.05,\) and \(Q_i=1.\) The trajectories of \(z_i(t), i=1,\ldots ,6,\) are shown in Fig. 2, and Fig. 3 displays the trajectories of the average regrets of all agents. From Fig. 2, it can be clearly seen that all agents' states gradually approach the optimal solution \(z^*(t).\) Furthermore, from Fig. 3, one can see that the average regret of each agent gradually diminishes to zero. These observations confirm the theoretical results.
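A compact reproduction of this experiment is sketched below. A fixed averaging matrix replaces the time-varying digraph of Fig. 1, the initial states are projected into \(\varOmega ,\) and \(z^*(t)\) is located by grid search, so this is an illustration of the setup rather than an exact replica.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, mu = 6, 3000, 0.05
p = np.array([0.6, 0.8, 1.0, 0.5, 0.1, 1.0])
lo, hi = -10.0, -1.0
W = np.full((n, n), 1.0 / n)             # stand-in for the time-varying digraph
grid = np.linspace(lo, hi, 2001)         # for locating z*(t) by grid search

z = np.clip([0.3, -0.5, -0.5, 0.4, -0.1, -0.2], lo, hi)  # projected initial states
regret = np.zeros(n)
for t in range(T):
    q = rng.uniform(-np.arange(1, n + 1), np.arange(1, n + 1))   # q_i(t) ~ U[-i, i]
    alpha = 1.0 / np.sqrt(200 * t + 500)
    z_star = grid[np.argmin(p.sum() * grid**3 + q.sum() * grid)] # minimizer of psi^t
    # accumulate psi^t(z_i(t)) - psi^t(z*(t)) for each agent i, as in (7)
    regret += (p.sum() * z**3 + q.sum() * z) - (p.sum() * z_star**3 + q.sum() * z_star)
    nu = W @ z                                                   # consensus step
    xi = rng.standard_normal(n)
    g = ((p * (z + mu * xi)**3 + q * (z + mu * xi)) - (p * z**3 + q * z)) / mu * xi
    z = np.clip(nu - 0.5 * alpha * g, lo, hi)                    # Q_i = 1 update
print("average regrets R_i^d(T)/T:", np.round(regret / T, 4))
```

The printed average regrets play the role of Fig. 3: they should shrink toward zero as T grows.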

Fig. 1
figure 1

The time-varying directed graph sequence

Fig. 2
figure 2

The state trajectories of all agents under algorithm (15)

Fig. 3
figure 3

The average regret trajectories of all agents under algorithm (15)

5 Conclusions

This paper deals with the zeroth-order online distributed optimization problem in which the sum of the local cost functions is strongly pseudoconvex. To solve this problem, a random gradient-free algorithm based on the multi-point gradient estimation method and an auxiliary optimization strategy is proposed. Under this algorithm, each agent updates its state at each time using only the information received from its neighbors at the current moment and its own gradient estimates from previous times. The proposed algorithm is measured by the dynamic regret. The results indicate that if the communication graph is B-strongly connected and the deviation of the minimizer sequence grows at a certain rate, then the expectation of the dynamic regret increases with a sublinear bound. The simulation example in the previous section verifies its validity. The cases of communication delays and packet losses, which introduce new difficulties for online distributed pseudoconvex optimization, will be our future work.