1 Introduction

In recent years, distributed optimization has attracted extensive attention [1,2,3,4,5,6], owing to its wide practicability in numerous fields such as distributed resource allocation [1], distributed economic dispatch [2], distributed machine learning [3], and distributed coordination control [4].

Distributed optimization in static environments has been widely studied [7,8,9]. However, in practical applications, the scenarios in which distributed optimization arises are often dynamic. In recent years, online distributed optimization has therefore been extensively studied [10,11,12,13,14]. For example, in [10], an online distributed push-sum algorithm is proposed for the unconstrained problem, and an online distributed coordination algorithm based on the gradient descent method is developed in [11]. In [12], an online distributed saddle-point algorithm is developed for optimization problems with a global set constraint, an online distributed mirror descent algorithm is proposed in [13], and an online distributed dual averaging algorithm is designed in [14].

It is worth pointing out that all the above works rely on real gradient information. However, in practical applications it is often infeasible or costly to compute gradients exactly. For example, in the Internet of Things [15], fog computing cannot obtain a closed-form expression of the delay, since online decision-making must adapt to user preferences and the availability of resources is temporarily unpredictable. Moreover, in bandit optimization [16], agents can only observe the values of the cost functions, not the cost functions themselves. In these cases, using zeroth-order information is desirable for distributed optimization. Recently, zeroth-order random online distributed optimization has been investigated in [17,18,19,20]. In [17, 18], zeroth-order random online distributed algorithms are proposed, where only the zeroth-order information of the cost functions is used. Furthermore, in [19, 20], two different random gradient-free algorithms are proposed for online distributed optimization under time-varying networks.

It is worth noting that most of the aforementioned works address convex optimization problems. However, pseudoconvex optimization problems arise widely in practice. For example, in computer vision [21], geometric quantities are usually modeled by pseudoconvex functions. Also, in portfolio planning [22] and fractional programming [23], cost functions are commonly formulated as pseudoconvex functions. Pseudoconvex optimization has a wider application range than convex optimization, as it also covers some nonconvex cases. In fact, distributed optimization problems with pseudoconvex cost functions have only been studied in [24, 25], where real gradient information of the cost functions is required.

Motivated by [19, 20, 24,25,26], in this paper we study online distributed optimization problems with strongly pseudoconvex cost functions via a random gradient-free method. Compared with [24, 25], where agents need access to real gradient information, here agents make decisions using only gradient estimates instead of real gradients. To solve this problem, an online distributed algorithm based on the random gradient-free method is proposed, where a multi-point gradient estimator is used to estimate the gradients of the local cost functions. Different from [11,12,13,14,15,16,17], which rely on the convexity of the cost functions, here the cost functions are assumed to be strongly pseudoconvex. In [19, 20], the gradient-free method and the convexity of the cost functions are used to analyze the convergence of the proposed algorithms. Different from them, we employ the strong pseudomonotonicity of the cost functions and the Karush–Kuhn–Tucker (KKT) condition associated with pseudoconvex optimization to guarantee effectiveness. We prove that if the graph is B-strongly connected and the cumulative deviation of the minimizer sequence grows at a certain rate, then the expectation of the dynamic regret increases sublinearly.

This paper is organized as follows. In Sect. 2, we formulate the problem and propose an algorithm. In Sect. 3, the main results are presented and detailed proofs are given. A simulation example is given in Sect. 4. Section 5 concludes the paper.

Notations   We use \(\nabla \psi (u)\) to denote the gradient of function \(\psi \) at point u. \(\lfloor T\rfloor \) is defined as the set \(\{0, 1, \ldots , T\}\) for any \(T\in {\mathbb {N}}.\) For vectors \({{\varvec{u}}},{{\varvec{v}}}\in {\mathbb {R}}^{m}\) and matrix \({{\varvec{W}}}\in {\mathbb {R}}^{m\times m},\) \([{{\varvec{u}}}]_i\) denotes the ith element of \({{\varvec{u}}},\) \( \Vert {{{\varvec{u}}}} \Vert =\sqrt{{{\varvec{u}}}^{\textrm{T}}{{\varvec{u}}}},\) \(\langle {{{\varvec{u}}},{{\varvec{v}}}}\rangle _{{\varvec{W}}}=\langle {{{\varvec{W}}}{{\varvec{u}}}}, {{\varvec{v}}}\rangle ,\) and \(\Vert {{\varvec{u}}}\Vert _{{\varvec{W}}} ^{2}= {{\varvec{u}}}^{\textrm{T}} {{{\varvec{W}}}{{\varvec{u}}}}.\) We use \({\mathbb {E}}\{{{\varvec{u}}}\}\) to denote the expectation of random variable \({{\varvec{u}}}.\)

2 Problem formulation

2.1 Basic graph theory

A time-varying directed communication graph is defined as \({\mathcal {G}}(t)=({\mathcal {V}}, {\mathcal {E}}(t), {{\varvec{W}}}(t)),\) where \({\mathcal {V}}=\{1, \ldots ,\) \( n\}\) is the vertex set, \({\mathcal {E}}(t) \subset {\mathcal {V}}\times {\mathcal {V}}\) is the edge set, and \({{\varvec{W}}}(t)=(w_{ij}(t))_{n\times n}\) is a non-negative matrix representing the weights of the edges. The neighbor set of agent i is defined as \({\mathcal {N}}_{i}(t)=\{j\in {\mathcal {V}}\vert \) \(( j,i)\in {\mathcal {E}}(t)\},\) i.e., agent i can receive information from agent j. For digraph \({\mathcal {G}}(t)\) and some integer \(B>0,\) \({\mathcal {E}}_B(t)=\) \(\bigcup _{k=tB}^{(t+1)B-1}{{\mathcal {E}}}(k)\) denotes the B-edge set. If the graph \(({\mathcal {V}},{\mathcal {E}}_B(t))\) is strongly connected for any \(t\ge 0,\) then \({\mathcal {G}}(t)\) is called a B-strongly connected graph.

In this paper, the following assumption is made for the communication graph \({\mathcal {G}}(t).\)

Assumption 1

For any \(t\ge 0,\) \({\mathcal {G}}(t)\) is a B-strongly connected graph and the matrix \({{\varvec{W}}}(t)\) is doubly stochastic.

For distributed optimization, \({{\varvec{W}}}(t)\) plays an essential role in enabling agents to reach consensus [25]. For any \(t\ge k\ge 0,\) we define

$$\begin{aligned} \left\{ {\begin{array}{ll} \varPhi (t,k)={{\varvec{W}}}(t\!-\!1)\cdots {{\varvec{W}}}(k\!+\!1){{\varvec{W}}}(k), &{} \text{ if } t>k;\\ \varPhi (t,k)={{\varvec{I}}}_n,&{} \text{ if } t=k. \end{array}} \right. \end{aligned}$$
(1)

Lemma 1

[25] Based on Assumption 1, for any \(i,j \in {\mathcal {V}}\) and \(t\ge k,\) there exist constants \(D>0\) and \(0< \rho <1\) such that

$$\begin{aligned} \big \vert [ \varPhi (t,k)]_{ij} -\dfrac{1}{n} \big \vert \le D \rho ^{t-k}. \end{aligned}$$
(2)
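The geometric mixing in (2) is easy to check numerically. The following minimal Python sketch is an illustration only: the Sinkhorn-style construction of doubly stochastic matrices and all parameter values are our own assumptions, not part of the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

def doubly_stochastic(n, rng):
    """Generate an (approximately) doubly stochastic matrix via Sinkhorn normalization."""
    W = rng.random((n, n)) + 0.1          # strictly positive entries (complete graph here)
    for _ in range(200):                  # alternately normalize rows and columns
        W /= W.sum(axis=1, keepdims=True)
        W /= W.sum(axis=0, keepdims=True)
    return W

# Phi(t, 0) = W(t-1) ... W(0); by Lemma 1 it approaches (1/n) * ones geometrically
Phi = np.eye(n)
for t in range(1, 16):
    Phi = doubly_stochastic(n, rng) @ Phi
    gap = np.abs(Phi - 1.0 / n).max()
    print(f"t = {t:2d}, max_ij |[Phi(t,0)]_ij - 1/n| = {gap:.2e}")
```

The printed gap decays roughly like \(D\rho ^{t},\) in line with (2).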

2.2 Online distributed optimization

In this paper, we consider a multi-agent system with n agents, where agents exchange local information via the graph \({{\mathcal {G}}(t)}.\) For any \(i\in {\mathcal {V}},\) a sequence of cost functions \(\{\psi _i^0,\ldots ,\psi _i^{T}\}\) is given, where T is a finite time horizon that is unknown to the agents. For any \( t\in \lfloor T\rfloor ,\) \(\psi ^t_i: \varOmega \rightarrow {\mathbb {R}}\) is the cost function of agent i at time t and \(\varOmega \subset {\mathbb {R}}^m.\) All agents aim to collaboratively solve the following optimization problem with a set constraint:

$$\begin{aligned}&\min ~ \psi ^t(z)=\textstyle \sum \limits _{i=1}^n \psi ^t_i(z)~~ \text{ s.t. } z\in \varOmega . \end{aligned}$$
(3)

Here, we make some basic assumptions for the problem.

Assumption 2

Constraint set \(\varOmega \) is nonempty, convex, and compact.

Assumption 3

(Strong pseudomonotonicity) For any \(u,~v\in \varOmega ,\) if \(\langle \nabla {\psi ^t(v)},u-v\rangle \ge 0,\) then \(\langle \nabla {\psi ^t(u)},u-v\rangle \) \(\ge \) \(\beta \Vert u-v\Vert ^2\) for some \(\beta >0.\)

It follows from [27] that if \(\nabla \psi ^t\) is strongly pseudomonotone, then \(\psi ^t\) is strongly pseudoconvex. The definition of a strongly pseudoconvex function is given as follows.

Definition 1

For a differentiable function \(\psi (\cdot ):{\mathbb {R}}^m\rightarrow {\mathbb {R}}\) and a convex set \(\varOmega \subset {\mathbb {R}}^m,\) \(\psi (\cdot )\) is called a pseudoconvex function if \(\langle \nabla \psi (v),u-v\rangle \ge 0\) implies \(\psi (u)-\psi (v)\ge 0\) for each pair of distinct points \(u,v\in \varOmega .\) Moreover, if \(\langle \nabla \psi (v),u-v\rangle \ge 0\) implies \(\psi (u)-\psi (v)\ge \beta \Vert u-v\Vert ^2\) for some \(\beta >0\) and each pair of distinct points \(u,v\in \varOmega ,\) then \(\psi (\cdot )\) is called a strongly pseudoconvex function on \(\varOmega .\)
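As a quick illustration of Definition 1, the sketch below numerically checks the strong pseudoconvexity implication for \(\psi (z)=z^3\) on \([-10,-1]\) (the shape of the local costs used in Sect. 4). The function is nonconvex there, and the candidate constant \(\beta =3\) is our own guess for this interval, not a value from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
psi = lambda z: z**3        # nonconvex on [-10, -1], yet strongly pseudoconvex there
dpsi = lambda z: 3 * z**2

beta = 3.0                  # candidate constant (assumption, valid on this interval)
violations = 0
for _ in range(100_000):
    u, v = rng.uniform(-10.0, -1.0, size=2)
    if u != v and dpsi(v) * (u - v) >= 0:          # premise: <grad psi(v), u - v> >= 0
        if psi(u) - psi(v) < beta * (u - v) ** 2:  # does the quadratic growth fail?
            violations += 1
print("violations:", violations)                   # expected: 0
```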

Assumption 4

(Lipschitz continuous gradient) \(\Vert \nabla \psi _i^t(u)-\nabla \psi _i^t(v)\Vert \le L_1\Vert u-v\Vert , ~\forall u, v\in \varOmega \) for some \(L_1>0.\)

In this paper, we are committed to developing a gradient-free method for solving problem (3). Following [28], the smoothed version of \(\psi ^t\) is defined as

$$\begin{aligned} \psi _{{\mu }}^t(z)=\dfrac{1}{(2\pi )^{m/2}}\int _{{\mathbb {R}}^m} \psi ^t(z+\mu \xi ){\textrm{e}}^{-\frac{1}{2}\Vert \xi \Vert ^2} {\textrm{d}}\xi , \end{aligned}$$
(4)

where \(\mu >0 \) is the smoothing parameter of function \(\psi _{\mu }^t(z).\) The multi-point gradient estimator is defined as

$$\begin{aligned}&\hat{{{\varvec{g}}}}(z_i(t))\nonumber \\&\quad =\dfrac{1}{Q_i}\textstyle \sum \limits _{q_i=1}^{Q_i} \dfrac{\psi _i^t\big (z_i(t)+\mu _i\xi _{q_i}(t)\big )-\psi _i^t\big ( z_i(t)\big )}{\mu _i}\xi _{q_i}(t), \end{aligned}$$
(5)

where \(\hat{{{\varvec{g}}}}(z_i(t))\) is an unbiased estimator of \(\nabla \psi _{\mu _i}^t(z_i(t)),\) \(\mu _i>0\) is the smoothing parameter, and the random vectors \(\xi _{q_i}(t)\) are locally generated i.i.d. samples from the standard Gaussian distribution for any \(q_i\in \{1,\ldots ,Q_i\},\) where \(Q_i\in {\mathbb {N}}^+\) is the number of sampled points. Based on the results in [29], we know that if \(\psi ^t(\cdot )\) is differentiable, then \(\psi _{\mu }^t(\cdot )\) is also differentiable, and its gradient is given by

$$\begin{aligned}&\nabla \psi _{\mu }^t(z)\nonumber \\&\quad =\dfrac{1}{(2\pi )^{m/2}}\int _{{\mathbb {R}}^m} \dfrac{\psi ^t\big (z+\mu \xi \big )-\psi ^t\big ( z\big )}{\mu } \xi {\textrm{e}}^{-\frac{1}{2}\Vert \xi \Vert ^2} {\textrm{d}}\xi . \end{aligned}$$
(6)
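To make (5)–(6) concrete, the following sketch implements the multi-point estimator for a toy quadratic cost; the test function and the choices of \(Q\), \(\mu\), and the dimension are our own assumptions. Averaging many independent estimates should recover \(\nabla \psi _{\mu }^t,\) which for a quadratic coincides with the true gradient.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
A = rng.random((m, m))
P = A @ A.T + np.eye(m)                      # hypothetical quadratic test cost
psi = lambda z: 0.5 * z @ P @ z
grad = lambda z: P @ z

def g_hat(z, mu=1e-3, Q=50):
    """Multi-point Gaussian-smoothing gradient estimator from (5)."""
    est = np.zeros_like(z)
    for _ in range(Q):
        xi = rng.standard_normal(m)          # i.i.d. standard Gaussian direction
        est += (psi(z + mu * xi) - psi(z)) / mu * xi
    return est / Q

z = rng.standard_normal(m)
avg = np.mean([g_hat(z) for _ in range(200)], axis=0)   # Monte Carlo average
print("relative error:", np.linalg.norm(avg - grad(z)) / np.linalg.norm(grad(z)))
```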

Any online algorithm should mimic the performance of its offline counterpart, and the gap between them is called the regret [25]. The regret with the most stringent offline benchmark is the dynamic regret, whose offline benchmark minimizes \(\psi ^t(z)\) at each time. The dynamic regret is defined as

$$\begin{aligned} {\mathcal {R}}_i^d(T)=\textstyle \sum \limits _{t=0}^T \psi ^t(z_i(t))-\textstyle \sum \limits _{t=0}^T \psi ^t(z^*(t)), ~i\in {\mathcal {V}}, \end{aligned}$$
(7)

where \(z^*(t) =\text {argmin}_{z\in \varOmega }\psi ^t(z)\) for any \(t\in \lfloor T\rfloor .\) An online optimization algorithm is called “good” if regret (7) increases sublinearly, i.e., \(\lim _{T\rightarrow \infty }{\mathcal {R}}_i^d(T)/T=0.\) Unfortunately, sublinear dynamic regret may be unattainable when the minimizer of the cost function changes rapidly. Inspired by [25], this difficulty is quantified by the deviation of the minimizer sequence \(\{z^* (t)\}_{t=0}^{T}\):

$$\begin{aligned} {\varTheta }_T=\textstyle \sum \limits _{t=0}^T \Vert z^* (t+1)-z^* (t)\Vert . \end{aligned}$$
(8)
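For later reference, both performance measures are straightforward to evaluate on recorded trajectories. The sketch below (the helper names are ours) computes (7) and (8) given a state trajectory, the per-step minimizers, and a callable global cost:

```python
import numpy as np

def dynamic_regret(psi, z_traj, z_star_traj):
    """R^d(T) from (7): cumulative cost along z_traj minus the clairvoyant cost."""
    return sum(psi(t, z) - psi(t, zs)
               for t, (z, zs) in enumerate(zip(z_traj, z_star_traj)))

def minimizer_deviation(z_star_traj):
    """Theta_T from (8): cumulative path length of the minimizer sequence."""
    zs = np.asarray(z_star_traj, dtype=float).reshape(len(z_star_traj), -1)
    return np.linalg.norm(np.diff(zs, axis=0), axis=1).sum()
```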

2.3 Online distributed gradient-free algorithm

Now, we consider an offline and centralized optimization problem, defined as

$$\begin{aligned} \min ~ \psi (z), \text{ s.t. } z\in \varOmega , \end{aligned}$$
(9)

where the objective function \(\psi \) is strongly pseudoconvex and the constraint set \(\varOmega \) satisfies Assumption 2. The KKT condition of pseudoconvex optimization, stated in terms of a variational inequality, is given in the following lemma.

Lemma 2

[25] Suppose the function \(\psi : {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) is pseudoconvex and differentiable and the constraint set \(\varOmega \) is convex. Then,  \(z^*\) is a minimum point of \(\psi \) on \(\varOmega \) if it satisfies the following variational inequality

$$\begin{aligned} \langle \nabla \psi (z^*),z-z^*\rangle \ge 0, \quad \forall ~z\in \varOmega . \end{aligned}$$
(10)

Based on Lemma 2, problem (9) is solved if there exists a point \(z \in \varOmega \) satisfying

$$\begin{aligned} \langle \nabla \psi (z),u-z\rangle \ge 0,\quad \forall ~ u\in \varOmega . \end{aligned}$$
(11)

Combining this with the multi-point gradient estimator, we construct an auxiliary problem, defined as

$$\begin{aligned} \min ~z^{\textrm{T}}Pz+\langle \alpha \hat{{{\varvec{g}}}}(z_0) -2Pz_0,z\rangle ,\quad \text{ s.t. }\ z\in \varOmega , \end{aligned}$$
(12)

where \(z_0\in \varOmega ,\) \(\alpha >0,\) and the matrix \(P\in {\mathbb {R}}^{m\times m}\) is symmetric positive definite. Based on the KKT condition,  \(z^*\in \varOmega \) is the solution to (12) if and only if

$$\begin{aligned} \langle 2Pz^* +\alpha \hat{{{\varvec{g}}}}(z_0)-2Pz_0,u-z^*\rangle \ge 0,\quad \forall u\in \varOmega . \end{aligned}$$
(13)

By comparing (11) and (13), we conclude that \(z^*\) is also a solution to (9) when \(z^* =z_0.\) By replacing \(z_0\) with z(t) and \(z^*\) with \(z(t+1),\) we propose the following auxiliary optimization strategy:

$$\begin{aligned}&z(t+1)\nonumber \\&\quad =\arg \min \limits _{z\in \varOmega }\Big \{ z^{\textrm{T}}Pz+\langle \alpha (t) \hat{{{\varvec{g}}}}(z(t))-2Pz(t),z\rangle \Big \}. \end{aligned}$$
(14)

Note that if \(z(t+1)=z(t)\) in (14), then problem (9) is solved,  which means that the solution of problem (9) is an equilibrium point of iteration (14). Let \(P={{\varvec{I}}}_m;\) then \(\arg \min _{z\in \varOmega }\{ z^{\textrm{T}}Pz+\langle \alpha (t)\hat{{{\varvec{g}}}}(z(t))-2Pz(t), z\rangle \}=P_{\varOmega }\big [z(t)-\frac{\alpha (t)}{2}\hat{{{\varvec{g}}}}(z(t))\big ],\) where \(P_{\varOmega }\) denotes the Euclidean projection onto \(\varOmega .\) Thus,  (14) can be regarded as an extension of the projected gradient descent algorithm. Detailed convergence proofs for algorithm (14) can be found in [30, 31].
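For a general symmetric positive definite P, the same interpretation follows by completing the square in the P-weighted norm; this is a one-line algebraic identity, not an additional assumption:

$$\begin{aligned}&z^{\textrm{T}}Pz+\langle \alpha (t)\hat{{{\varvec{g}}}}(z(t))-2Pz(t),z\rangle \nonumber \\&\quad =\Big \Vert z-\Big (z(t)-\dfrac{\alpha (t)}{2}P^{-1}\hat{{{\varvec{g}}}}(z(t))\Big )\Big \Vert _{{\varvec{P}}}^{2}+\text {const}, \end{aligned}$$

where the constant does not depend on z. Hence \(z(t+1)\) in (14) is the P-weighted projection onto \(\varOmega \) of the step \(z(t)-\frac{\alpha (t)}{2}P^{-1}\hat{{{\varvec{g}}}}(z(t)).\)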

To solve problem (3), a random gradient-free online distributed algorithm is proposed

$$\begin{aligned} \left\{ {\begin{array}{l} z_i(t+1)\\ =\arg \min \limits _{z\in \varOmega }\Big \{z^{\textrm{T}}Pz+\langle \alpha (t)\hat{{{\varvec{g}}}}(z_i(t))-2P\nu _i(t),z\rangle \Big \},\\ \nu _i(t)=\textstyle \sum \limits _{j\in {\mathcal {N}}_i(t)}w_{ij}(t)z_{j}(t),\\ \hat{{{\varvec{g}}}}(z_i(t))\\ =\dfrac{1}{Q_i}\textstyle \sum \limits _{q_i=1}^{Q_i}\dfrac{\psi ^t_i (z_i(t)+\mu _i\xi _{q_i}(t))-\psi ^t_i(z_i(t))}{\mu _i}\xi _{q_i}(t), \end{array}} \right. \end{aligned}$$
(15)

where \(z_i(t)\) is the state of agent i at time t with \(z_i(0)\in \varOmega ,\) and \(\alpha (t)>0\) is a diminishing learning rate with initial value \(\alpha (0)=\alpha _0.\) Algorithm (15) combines the multi-point gradient estimation method, a consensus step, and the auxiliary optimization strategy (14). When running algorithm (15), each agent updates its state at each time using only the information received from its neighbors and its own past gradient estimates. Therefore, (15) is an online distributed algorithm.
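The following Python sketch shows one full round of (15) with \(P={{\varvec{I}}}_m\) and a box constraint, in which case the inner argmin reduces to a projected step (see the identity above). The cost interface, the fixed averaging matrix, and all parameter values are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, Q, mu = 6, 3, 2, 0.05              # agents, dimension, sample points, smoothing
lo, hi = -10.0, -1.0                     # box constraint Omega = [lo, hi]^m

def g_hat(psi_it, z):
    """Multi-point gradient estimator (5) for one agent's current cost psi_i^t."""
    est = np.zeros_like(z)
    for _ in range(Q):
        xi = rng.standard_normal(m)
        est += (psi_it(z + mu * xi) - psi_it(z)) / mu * xi
    return est / Q

def step(psi_t, z, W, alpha):
    """One round of algorithm (15) with P = I: consensus, estimation, projected step."""
    nu = W @ z                                        # nu_i = sum_j w_ij z_j
    g = np.stack([g_hat(psi_t[i], z[i]) for i in range(n)])
    return np.clip(nu - 0.5 * alpha * g, lo, hi)      # Euclidean projection onto box

# usage: psi_t is a list of n callables (hypothetical local costs at time t)
psi_t = [lambda x, c=i + 1: c * np.sum(x**2) for i in range(n)]
z = rng.uniform(lo, hi, (n, m))
z = step(psi_t, z, np.full((n, n), 1.0 / n), alpha=0.1)
print(np.round(z, 3))
```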

3 Main results

In this section, we present the main results of this paper and give their detailed proofs.

Theorem 1

Based on Assumptions 1–4, if the learning rate is \(\alpha (t)=\frac{c}{\sqrt{t+1}}\) for some \(c >0,\) then for any \(i\in {\mathcal {V}}\) and \(T\in {\mathbb {N}},\) the following inequality holds:

$$\begin{aligned} {\mathbb {E}}\{{\mathcal {R}}_i^d(T)\}&\le nL_0\sqrt{\varGamma +\dfrac{F_1T}{\upsilon \ln (T+1)}\textstyle \sum \limits _{i=1}^n\mu _i+\dfrac{16hM\varTheta _T}{c\upsilon \ln 2}}\nonumber \\&\quad \cdot \Big ((T+1)^{\frac{3}{4}}\sqrt{\ln (T+1)}\Big ), \end{aligned}$$
(16)

where \(\varGamma =\frac{(4{\hat{\rho }}_1/(c \upsilon )+ 2\hat{{\mathcal {K}}}_1)+3c^2(4{\hat{\rho }}_2/(c \upsilon )+2\hat{{\mathcal {K}}}_2)}{\rho (1-\rho )\ln 2}+ \frac{6nc L_0^2}{\upsilon \varepsilon }\) \(+\frac{4d}{c \upsilon \ln 2},\) \(\hat{{\mathcal {K}}}_1={\mathcal {C}}^2+\frac{{\mathcal {C}}L_0 Dn\sqrt{m}}{\varepsilon (1-\rho )} (m+ 4)L_0,\) \({\hat{\rho }}_2= \frac{n(5M{\mathcal {K}}_2+\sqrt{m+4}L_0(kL_1+L_0)Dn\sqrt{m})}{\varepsilon },\) \(\hat{{\mathcal {K}}}_2\!=\!\frac{mn^2 H^2}{4\varepsilon ^2(1\!-\!\rho )}(m\) \(+4)^2L_0^2,\) \({\hat{\rho }}_1= n(5M\hat{{\mathcal {K}}}_1+(kL_1+L_0)c {\mathcal {C}}),\) \(F_1=L_1 kc(m+3)^{\frac{3}{2}},\) \(L_0=\sup _{t\in \lfloor T\rfloor ,i\in {\mathcal {V}},z\in \varOmega }\Vert \nabla \psi _i^t(z)\Vert ,\) \(d=Mk^2,\) \(\varepsilon =\lambda _{\min }( {P}),\) \({\mathcal {C}}=D\sqrt{m} \textstyle \sum \limits _{i=1}^{n}\Vert z_i(0)\Vert _1,\) \(\beta _1=\sup _{t\in \lfloor T\rfloor ,i\in {\mathcal {V}},z\in \varOmega }\Vert \hat{{{\varvec{g}}}}(z_i(t))\Vert ,\) \(M=n\lambda _{\max }({P}),\) \(L_1 =\sup _{t\in \lfloor T\rfloor ,i\in {\mathcal {V}},z\in \varOmega }\Vert \nabla ^2 \psi _i^t(z)\Vert ,\) and \(h=\sup _{z\in \varOmega }\Vert z\Vert .\)

By Theorem 1, both \(\varTheta _T\) and \(\mu _i\) play important roles in the bound on the expected dynamic regret. Note that if \(\mu _i=T^{-\frac{1}{2}-\varrho }\) for some \(\varrho >0\) and \(\varTheta _T\) grows more slowly than \(\frac{\sqrt{T+1}}{\ln (T+1)},\) then \({\mathbb {E}}\{{\mathcal {R}}_i^d(T)\}\) increases sublinearly in T,  which implies that the performance of online distributed algorithm (15) is “good”. Therefore, algorithm (15) can be employed to solve strongly pseudoconvex optimization problems in which exact gradient information is unavailable or expensive to obtain. If the minimizer sequence \(\{z^* (t)\}_{t=0}^{T}\) fluctuates dramatically, \({\varTheta }_T\) may grow at least proportionally to \(\frac{\sqrt{T+1}}{\ln (T+1)},\) and the problem becomes unsolvable. This is natural and occurs even in online convex optimization.
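To make the rate condition explicit (a restatement of the bound (16) up to constants, not a new result): with \(\mu _i=T^{-\frac{1}{2}-\varrho },\) the \(\mu \)-term under the square root in (16) is \(O\big (T^{\frac{1}{2}-\varrho }/\ln (T+1)\big ),\) so

$$\begin{aligned} {\mathbb {E}}\{{\mathcal {R}}_i^d(T)\}=O\Big (\sqrt{1+T^{\frac{1}{2}-\varrho }+\varTheta _T}~(T+1)^{\frac{3}{4}}\sqrt{\ln (T+1)}\Big ), \end{aligned}$$

which is \(o(T)\) provided \(\varTheta _T=o\big (\frac{\sqrt{T+1}}{\ln (T+1)}\big ),\) since \(\sqrt{T^{\frac{1}{2}-\varrho }}\,(T+1)^{\frac{3}{4}}\sqrt{\ln (T+1)}=O\big (T^{1-\frac{\varrho }{2}}\sqrt{\ln (T+1)}\big )=o(T).\)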

Before proving Theorem 1, some useful lemmas need to be presented. First, we use \({\mathcal {F}}_t\) to denote the \(\sigma \)-field generated by the entire history of the random variables, i.e.,

$$\begin{aligned} {\mathcal {F}}_t=\{\text{ col } (z_i(s) )_{i\in {\mathcal {V}}}, \text{ col }(\varsigma _i(s))_{i\in {\mathcal {V}}}, s=0,\ldots , t\}. \end{aligned}$$
(17)

Based on the above definition, the following lemma can be obtained.

Lemma 3

[19, 28] If \(\nabla \psi _i^t\) is \(L_1\)-Lipschitz continuous on \(\varOmega ,\) then

$$\begin{aligned}&(\textrm{a})~~{\mathbb {E}}\{\Vert \hat{{{\varvec{g}}}}(z_i(t))\Vert ^2\vert {\mathcal {F}}_t\}\le (m+4)^2 L_0^2.\nonumber \\&(\textrm{b})~~\Vert \nabla \psi _{\mu _i}^t(z_i(t)) -\nabla \psi _i^t(z_i(t))\Vert \le \dfrac{\mu _i}{2}L_1 (m+3)^{\frac{3}{2}}.\nonumber \\&(\textrm{c})~~{\mathbb {E}}\{\hat{{{\varvec{g}}}}(z_i(t))-\nabla \psi _{\mu _i}^t(z_i(t)) \vert {\mathcal {F}}_{t}\}={{\varvec{0}}}. \end{aligned}$$
(18)

To prove Theorem 1, we first present the following lemma, which bounds the discrepancy between each agent's state and the average state at each update.

Lemma 4

Based on Assumptions 1–4, for any \(i \in {\mathcal {V}},\)

$$\begin{aligned} \Vert z_i(t)-{\bar{z}}(t)\Vert \le {\mathcal {C}}\rho ^t+\dfrac{n D\sqrt{m}\beta _1}{\varepsilon }\textstyle \sum \limits _{\gamma =0}^t \rho ^{t-\gamma }\alpha (\gamma ) \end{aligned}$$
(19)

and

$$\begin{aligned} \Vert z_i(t)-{\bar{z}}(t)\Vert ^2\le {\mathcal {K}}_1\rho ^t+{\mathcal {K}}_2 \textstyle \sum \limits _{\gamma =0}^t \rho ^{t-\gamma }(\alpha (\gamma ))^2, \end{aligned}$$
(20)

where \({\bar{z}}(t)=\frac{1}{n} \textstyle \sum \limits _{i=1}^n z_i(t) ,\) \({\mathcal {K}}_1={\mathcal {C}}^2+\frac{{\mathcal {C}}Dn\sqrt{m}\beta _1}{\varepsilon (1-\rho )},\) \({\mathcal {K}}_2=\frac{m(n D\beta _1)^2}{4\varepsilon ^2(1-\rho )}.\)

Proof

See Appendix 1. \(\square \)

Next, we bound the expected cumulative squared error between the agents' average state and the optimal state at each iteration.

Lemma 5

Under Assumptions 1–4, we have

$$\begin{aligned}&{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert {\bar{z}}(t)-z^* (t)\Vert ^2\Big \}\nonumber \\&\quad \le \dfrac{2d}{\upsilon \alpha (T)}+\dfrac{8hM\varTheta _T}{\upsilon \alpha (T)}+\dfrac{2{\hat{\rho }}_1}{\upsilon \alpha (T)}\textstyle \sum \limits _{t=0}^T \rho ^t\nonumber \\&\qquad +\dfrac{2{\hat{\rho }}_2}{\upsilon \alpha (T)}\textstyle \sum \limits _{t=0}^T \textstyle \sum \limits _{\gamma =0}^{t+1}\rho ^{t-\gamma }(\alpha (\gamma ))^2\nonumber \\&\qquad +\dfrac{F_1T\textstyle \sum \limits _{i=1}^n\mu _i}{\upsilon \alpha (T)} +\dfrac{nL_0}{\varepsilon \upsilon \alpha (T)}\textstyle \sum \limits _{t=0}^T(\alpha (t))^2. \end{aligned}$$
(21)

Proof

See Appendix 1. \(\square \)

Proof of Theorem 1

Based on Lemmas 4–5, we obtain

$$\begin{aligned}&{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert z_i(t)-z^* (t)\Vert ^2\Big \}\nonumber \\&\quad \le 2{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T \Vert {\bar{z}}(t)\!-\!z^* (t)\Vert ^2\Big \} +2{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert z_i (t)\!-\!{\bar{z}}(t)\Vert ^2\Big \} \nonumber \\&\quad \le \left( \dfrac{4{\hat{\rho }}_1}{\upsilon \alpha (T)}+2\hat{{\mathcal {K}}}_1\right) \textstyle \sum \limits _{t=0}^T \rho ^t+\dfrac{4d}{\upsilon \alpha (T)}+\dfrac{16hM\varTheta _T}{\upsilon \alpha (T)}\nonumber \\&\qquad +\left( \dfrac{4{\hat{\rho }} _2}{\upsilon \alpha (T)}+2\hat{{\mathcal {K}}}_2\right) \textstyle \sum \limits _{t=0}^T \textstyle \sum \limits _{\gamma =0}^{t+1}\rho ^{t-\gamma }(\alpha (\gamma ))^2\nonumber \\&\qquad +\dfrac{2F_1T\textstyle \sum \limits _{i=1}^n\mu _i}{\upsilon \alpha (T)}+\dfrac{2nL_0}{\varepsilon \upsilon \alpha (T)}\textstyle \sum \limits _{t=0}^T(\alpha (t))^2. \end{aligned}$$
(22)

With \(\alpha (t)=\frac{c}{\sqrt{t+1}},\) it holds that

$$\begin{aligned}&\textstyle \sum \limits _{t=0}^T \textstyle \sum \limits _{\gamma =0}^{t+1}\rho ^{t-\gamma }(\alpha (\gamma ))^2\nonumber \\&\quad =\textstyle \sum \limits _{\gamma =1}^{T+1} \textstyle \sum \limits _{t=\gamma -1}^{T}\rho ^{t-\gamma }(\alpha (\gamma ))^2+\textstyle \sum \limits _{t=0}^T\rho ^t(\alpha (0))^2\nonumber \\&\quad \le \Big (\textstyle \sum \limits _{\gamma =1}^{T+1}(\alpha (\gamma ))^2\Big )\Big (\textstyle \sum \limits _{s=-1}^{\infty }\rho ^{s} \Big )+\textstyle \sum \limits _{t=0}^T\rho ^t (\alpha (0))^2 \nonumber \\&\quad \le \dfrac{c ^2 (1+\rho +\ln (T+1))}{ (1-\rho )\rho } \le \dfrac{3c ^2 \ln (T+1)}{ (1-\rho )\rho \ln 2}. \end{aligned}$$
(23)

Based on Jensen’s inequality, we have

$$\begin{aligned} \Big (\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert \Big )^2 \le (T+1)\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert ^2. \end{aligned}$$
(24)

Now, by taking expectation on both sides of inequality (24), we can obtain

$$\begin{aligned}&{\mathbb {E}}\Big \{\Big (\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert \Big )^2 \Big \}\nonumber \\&\quad \le (T+1){\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert ^2\Big \}. \end{aligned}$$
(25)

Based on (22), (23), and (25), we have

$$\begin{aligned}&{\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert \Big \}\nonumber \\&\quad \le \sqrt{(T+1){\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\Vert {z}_i (t)-z^* (t)\Vert ^2\Big \}}\nonumber \\&\quad \le \sqrt{\varGamma +\dfrac{2(T+1)F_1\textstyle \sum \limits _{i=1}^n\mu _i}{\upsilon c\ln (T+1)}+\dfrac{16hM\varTheta _T}{c\upsilon \ln 2}}\nonumber \\&\qquad \cdot \big (T+1\big )^{\frac{3}{4}}\sqrt{\ln (T+1)}. \end{aligned}$$
(26)

Note that \(\nabla \psi _i ^t\) is bounded by \(L_0\) for any \(i \in {\mathcal {V}}\) and \(t \in \lfloor T\rfloor .\) Thus,

$$\begin{aligned} {\mathbb {E}}\Big \{{\mathcal {R}}_i ^d (T)\Big \}&={\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T\big (\psi ^t(z_i (t))-\psi ^t(z^* (t))\big )\Big \} \nonumber \\&\le {\mathbb {E}}\Big \{\textstyle \sum \limits _{t=0}^T \textstyle \sum \limits _{j=1}^n \big \vert \psi ^t_j(z_i (t))-\psi ^t_j(z^* (t))\big \vert \Big \}\nonumber \\&\le {\mathbb {E}}\Big \{nL_0 \textstyle \sum \limits _{t=0}^T\Vert z_i (t)-z^* (t)\Vert \Big \}. \end{aligned}$$
(27)

Substituting (26) into (27) yields (16). Thus, Theorem 1 is proved. \(\square \)

4 A simulation example

In this section, we illustrate the validity of the proposed algorithm with a numerical example. Consider a multi-agent system with six agents, labeled \({\mathcal {V}}=\{1,2,\ldots ,6\}.\) Each agent exchanges information with its neighbors via a time-varying digraph, as shown in Fig. 1. For any \(i\in {\mathcal {V}},\) the cost function of agent i is given by

$$\begin{aligned} \psi _i^t(z)=p_i z^3+q_i(t)z, \end{aligned}$$

where \(z\in \varOmega .\) In this simulation, the parameters are selected as \(p_1=0.6,\) \(p_2=0.8,\) \(p_3=p_6=1,\) \(p_4=0.5,\) \(p_5=0.1,\) and \(q_i(t)\) is drawn uniformly at random from \([-i,i].\) Moreover, \(\varOmega =\{z\vert -10\le z\le -1\}\) is the constraint set. The initial states of the agents are selected as \(z_1(0)=0.3,\) \(z_2(0)=-0.5,\) \( z_3(0)=-0.5,\) \(z_4(0)=0.4,\) \(z_5(0)=-0.1,\) and \(z_6(0)=-0.2.\) Algorithm (15) is applied to solve this problem with \(\alpha (t)={1}/{\sqrt{200t+500}},\) \(\mu _i =0.05,\) and \(Q_i=1.\) The trajectories of \(z_i(t), i=1,\ldots ,6,\) are shown in Fig. 2, and Fig. 3 displays the trajectories of the average regrets of all agents. From Fig. 2, it can be clearly seen that all agents' states gradually approach the optimal solution \(z^*(t).\) Furthermore, from Fig. 3, one can see that the average regret of each agent gradually diminishes to zero. These observations confirm the theoretical results.
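A compact reproduction of this experiment is sketched below. A fixed averaging matrix replaces the time-varying digraph of Fig. 1, the initial states are projected into \(\varOmega ,\) and \(z^*(t)\) is located by grid search, so this is an illustration of the setup rather than an exact replica.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, mu = 6, 3000, 0.05
p = np.array([0.6, 0.8, 1.0, 0.5, 0.1, 1.0])
lo, hi = -10.0, -1.0
W = np.full((n, n), 1.0 / n)             # stand-in for the time-varying digraph
grid = np.linspace(lo, hi, 2001)         # for locating z*(t) by grid search

z = np.clip([0.3, -0.5, -0.5, 0.4, -0.1, -0.2], lo, hi)  # projected initial states
regret = np.zeros(n)
for t in range(T):
    q = rng.uniform(-np.arange(1, n + 1), np.arange(1, n + 1))   # q_i(t) ~ U[-i, i]
    alpha = 1.0 / np.sqrt(200 * t + 500)
    z_star = grid[np.argmin(p.sum() * grid**3 + q.sum() * grid)] # minimizer of psi^t
    # accumulate psi^t(z_i(t)) - psi^t(z*(t)) for each agent i, as in (7)
    regret += (p.sum() * z**3 + q.sum() * z) - (p.sum() * z_star**3 + q.sum() * z_star)
    nu = W @ z                                                   # consensus step
    xi = rng.standard_normal(n)
    g = ((p * (z + mu * xi)**3 + q * (z + mu * xi)) - (p * z**3 + q * z)) / mu * xi
    z = np.clip(nu - 0.5 * alpha * g, lo, hi)                    # Q_i = 1 update
print("average regrets R_i^d(T)/T:", np.round(regret / T, 4))
```

The printed average regrets play the role of Fig. 3: they should shrink toward zero as T grows.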

Fig. 1
figure 1

The time-varying directed graph sequence

Fig. 2
figure 2

The state trajectories of all agents under algorithm (15)

Fig. 3
figure 3

The average regret trajectories of all agents under algorithm (15)

5 Conclusions

This paper deals with the zeroth-order online distributed optimization problem in which the sum of the local cost functions is strongly pseudoconvex. To solve this problem, a random gradient-free algorithm based on the multi-point gradient estimation method and an auxiliary optimization strategy is proposed. Under this algorithm, each agent updates its state at each time using only the information received from its neighbors at the current moment and its own gradient estimates from previous times. The proposed algorithm is measured by the dynamic regret. The results indicate that if the communication graph is B-strongly connected and the deviation of the minimizer sequence grows at a certain rate, then the expectation of the dynamic regret increases with a sublinear bound. The simulation example in the previous section verifies its validity. The cases of communication delays and packet losses, which introduce new difficulties for online distributed pseudoconvex optimization, will be our future work.