1 Introduction

The alternating direction method of multipliers (ADMM), proposed by Gabay and Mercier [30] and Glowinski and Marroco [31], has been shown to be an effective approach for solving linearly constrained separable optimization problems. It is closely related to the Douglas-Rachford [25] and Peaceman-Rachford [58] operator splitting methods, which date back to the 1950s. Due to its success in solving structured convex optimization problems, ADMM has been widely used in various applications such as machine learning [14, 24, 55], compressive sensing [19, 68, 69], image processing [65, 66, 72,73,74] and reconstruction [18, 36, 71], and sparse and low-rank optimization [49, 56].

Theoretical analysis of ADMM and its variants, including linearized ADMM and proximal ADMM, has been established extensively in the context of convex optimization. For example, Boley [8] studied the local linear convergence for solving quadratic and linear programs. Deng and Yin [23] studied the convergence rate of a general ADMM method in which a proximal term is added to each subproblem. Eckstein and Bertsekas [26] proved linear convergence for solving linear programs, depending on a bound on the largest iterate arising in the course of the ADMM algorithm. Hager, Yashtini, and Zhang [37] established the ergodic convergence rate of a proximal linearized ADMM method [18], in which the proximal parameter is updated through a backtracking line search strategy.

Lions and Mercier [53] showed that the Douglas–Rachford operator splitting method converges linearly under the assumption that some involved monotone operators are both coercive and Lipschitz. When both functions in the objective are convex, smooth, and have Lipschitz continuous gradients, the work [32] establishes the O(1/k) and \(O(1/k^2)\) convergence rates, based on the iteration number k, for a Jacobi variant of ADMM and its accelerated variant, respectively. Similar results were also established in [33] for a Gauss-Seidel variant of ADMM and its accelerated variant, which requires only one of the two functions to be smooth with Lipschitz continuous gradient. Based on a variational inequality framework, He and Yuan established the O(1/k) convergence rate of a Douglas-Rachford variant of ADMM in the ergodic [43] and non-ergodic [44] senses. Goldstein et al. [34] proved the O(1/k) rate for ADMM and the \(O(1/k^2)\) rate for its accelerated variant, based on the dual objective function, under strong convexity assumptions. Davis and Yin [21] showed that linear and sublinear rates of ADMM can be obtained as an extension of the Douglas-Rachford splitting method in the sense of the fixed-point residual and the objective error. The R-linear convergence rate of ADMM under a certain error bound condition was proved in [45]. In [75], the authors consider a proximal ADMM with Fortin and Glowinski's larger step size, and by using variational analysis they prove that local linear convergence can be guaranteed without the strong convexity of the objective functions together with the full rank assumption on the coefficient matrices, or the full polyhedricity assumption on their subdifferentials.

The convergence of multi-block ADMM for minimizing the sum of more than two functions requires strong convexity assumptions on some or all functions in the objective [15, 17, 23, 38, 51, 52]. Without strong convexity assumptions, Chen et al. [16] showed that the multi-block ADMM is not necessarily convergent unless there exist at least two orthogonal coefficient matrices. Deng et al. [22] showed that the multi-block Jacobi ADMM converges with the o(1/k) rate when all coefficient matrices are mutually near-orthogonal and have full column rank, or when proximal terms are added to the subproblems. In [40, 41] the authors combined ADMM with either a forward or a backward substitution procedure and proved convergence results from a contraction perspective. He, Hou, and Yuan [42] showed that a local linear convergence rate is provable if a certain standard error bound condition is assumed.

He et al. [39] combined the Jacobi ADMM with a relaxation step and proved convergence and the O(1/k) rate in both the ergodic and non-ergodic senses. In [20], the authors propose updating the Lagrange multiplier sequentially once each decomposed subproblem over the primal variables is solved; their scheme updates both the primal and dual variables in a Gauss-Seidel fashion. Recently, a generalized symmetric ADMM was proposed in [7], which updates the Lagrange multiplier twice with suitable stepsizes to solve multi-block separable convex programs; the authors proved a worst-case O(1/k) ergodic convergence rate. In [6], the work [7] was extended to a stochastic ADMM-type version, and the so-called over-relaxation stepsize in \((0, (1 + \sqrt{5})/2]\) was used in an accelerated stochastic ADMM [5].

The ADMM algorithm generally fails to solve nonconvex, possibly nonsmooth, optimization problems. However, its strong performance in practical applications such as phase retrieval [64], distributed clustering [28], sparse zero variance discriminant analysis [1], matrix separation [60], imaging [72,73,74], and sparse feedback control [50] has encouraged researchers to study conditions under which nonconvex nonsmooth ADMM converges. Wang and Yin [63] established the convergence of nonconvex nonsmooth multi-block ADMM for both separable and non-separable objective functions; for the separable case they assumed that the majority of the functions in the objective are prox-regular. When the functions are either smooth and nonconvex or convex and nonsmooth, Hong et al. [46] proved the convergence of ADMM, provided that the penalty parameter in the augmented Lagrangian is chosen sufficiently large.

Wang et al. [62] analyzed the so-called Bregman ADMM under the assumption that the objective function is subanalytic; their analysis includes the standard ADMM as a special case. In [54] the authors studied the convergence of a fully linearized ADMM, given that one of the coefficient matrices has full column rank and the penalty parameter in the augmented Lagrangian is sufficiently large. The work [47] considered a variant of multi-block proximal ADMM, where proximal ADMM updates are implemented for all block variables except the last, for which either a gradient step or a majorization–minimization step is taken; the authors analyzed its iteration complexity. In [35], Guo et al. considered a two-block ADMM to minimize a sum of two nonconvex functions with linear constraints and proved convergence by assuming that the generated sequence is bounded. Under some assumptions on the coefficient matrices, Yang, Pong, and Chen [67] studied a three-block ADMM to solve a special class of nonconvex and nonsmooth problems with applications to background/foreground extraction. Li and Pong [48] proved the global convergence of a two-block ADMM under the assumption that one of the functions in the objective is twice differentiable with bounded Hessian. Under the Kurdyka-Łojasiewicz property, Bot and Nguyen [13] and Yashtini [70] established the convergence and convergence rates of proximal variants of ADMM. The iteration complexity of two classes of fully and partially linearized multi-block ADMM, with a relaxation parameter in the multiplier update, was established in [57]. In [61] the authors showed that ADMM is closely related to Douglas–Rachford splitting and Peaceman–Rachford splitting, and established a unifying global convergence result under the sole assumption of Lipschitz differentiability of one function.

In this paper, we study the global convergence of a variant of the ADMM algorithm for solving the linearly constrained nonconvex and nonsmooth minimization problem

$$\begin{aligned} \begin{array}{ccc} &{}\displaystyle {\min _{x,y}}&{} {\mathcal {F}}(x,y):= f(x)+g(x)+h(y)\\ &{}\mathrm{s.t.}&{} Ax+By+c=0, \end{array} \end{aligned}$$
(1)

where \(x\in {\mathbb {R}}^n\) and \(y\in {\mathbb {R}}^m\) are the unknown variables, \(A\in {\mathbb {R}}^{p\times n}\), \(B\in {\mathbb {R}}^{p\times m}\), and \(c\in {\mathbb {R}}^p\). The function \(f:{\mathbb {R}}^n\rightarrow \bar{{\mathbb {R}}}:={\mathbb {R}} \cup \{+\infty \}\) is proper and lower-semicontinuous, while \(g:{\mathbb {R}}^n\rightarrow \bar{{\mathbb {R}}}\) and \(h: {\mathbb {R}}^m\rightarrow \bar{{\mathbb {R}}}\) are proper smooth functions. We do not impose any convexity assumption on f, g, or h. The augmented Lagrangian function \({\mathcal {L}}^{\alpha } (x,y,z)\) associated with problem (1) is defined by

$$\begin{aligned}&{\mathcal {L}}^{\alpha }: {\mathbb {R}}^n\times {\mathbb {R}}^m\times {\mathbb {R}}^p\rightarrow {\mathbb {R}}\nonumber \\&{\mathcal {L}}^{\alpha } (x,y,z)=f(x)+g(x)+h(y)+\langle z, Ax+By+c\rangle +\frac{\alpha }{2}\big \Vert Ax+By+c\big \Vert ^2, \end{aligned}$$
(2)

where \(\alpha >0\) and \(z\in {\mathbb {R}}^p\) is the Lagrange multiplier associated with the linear constraint \(Ax+By+c=0\). Let \(\{Q_1^k\}_{k\ge 0}\subseteq {\mathbb {R}}^{n\times n}\) and \(\{Q_2^k\}_{k\ge 0}\subseteq {\mathbb {R}}^{m\times m}\) be two sequences of symmetric and positive semidefinite matrices. Given an initial vector \((x^0,y^0,z^0)\), for \(k=1,2, \dots \) until some stopping criterion is satisfied, the variable metric proximal ADMM generates the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) recursively as follows:

$$\begin{aligned} \begin{array}{cll} x^{k+1}&{}\in &{}\displaystyle {\arg \min _{x}}\; {\mathcal {L}}^{\alpha } (x,y^k,z^k)+\frac{1}{2} \Vert x-x^k\Vert ^2_{Q_1^k},\\ y^{k+1}&{}=&{}\displaystyle {\arg \min _{y}}\; {\mathcal {L}}^{\alpha } (x^{k+1},y,z^k)+\frac{1}{2} \Vert y-y^k\Vert ^2_{Q_2^k},\\ z^{k+1}&{}=&{}z^k+\alpha (Ax^{k+1}+By^{k+1}+c), \end{array} \end{aligned}$$
(3)

where \(\Vert v\Vert ^2_{Q}=\langle v, Q v\rangle \) for any \(v\in {\mathbb {R}}^d\) and symmetric positive semidefinite \(Q\in {\mathbb {R}}^{d\times d}\), \(\langle \cdot , \cdot \rangle \) denotes the Euclidean inner product, and \(\Vert \cdot \Vert =\sqrt{\langle \cdot ,\cdot \rangle }\) denotes the \(\ell _2\) norm. The algorithm (3) can be written equivalently as follows:

$$\begin{aligned} \begin{array}{lll} x^{k+1}&{}\in &{}\displaystyle {\arg \min _{x\in {\mathbb {R}}^n} } f(x)+g(x)+\langle z^k,Ax\rangle +\frac{\alpha }{2}\Vert Ax+By^k+c\Vert ^2+\frac{1}{2} \Vert x-x^k\Vert ^2_{Q_1^k}, \\ y^{k+1}&{}=&{}\displaystyle {\arg \min _{y\in {\mathbb {R}}^m} } h(y)+ \langle z^k,By\rangle +\frac{\alpha }{2}\Vert By+Ax^{k+1}+c\Vert ^2+\frac{1}{2} \Vert y-y^k\Vert ^2_{Q_2^k}, \\ z^{k+1}&{}=&{}z^k+\alpha (Ax^{k+1}+By^{k+1}+c). \end{array} \end{aligned}$$

For efficiency, we take advantage of the smooth structure of \(g(\cdot )\), \(\frac{\alpha }{2}\Vert A \cdot +By^k+c\Vert ^2\), and \(h(\cdot )\), replacing them by proper linearizations to obtain the variable metric Proximal Linearized ADMM (PL-ADMM):

$$\begin{aligned} \begin{array}{l} x^{k+1}\in \displaystyle {\arg \min _{x\in {\mathbb {R}}^n}\; {{\hat{f}}}^k(x)}\\ {{\hat{f}}}^k(x):= f(x)+ \big \langle \nabla g(x^k)+\alpha A^\mathsf{T}\big (Ax^k+By^k+c+{\alpha }^{-1} z^k\big ),x-x^k\big \rangle +\frac{1}{2}\Vert x-x^k\Vert ^2_{Q_1^k}, \\ y^{k+1}=\displaystyle { \arg \min _{y\in {\mathbb {R}}^m}}\;{{\hat{h}}}^k(y) \\ {{\hat{h}}}^k(y):= \big \langle \nabla h(y^k)+B^\mathsf{T}z^k, y-y^k\big \rangle +\frac{\alpha }{2}\Vert By+Ax^{k+1}+c\Vert ^2+\frac{1}{2} \Vert y-y^k\Vert ^2_{Q_2^k}, \\ z^{k+1}=z^k+\alpha \beta (Ax^{k+1}+By^{k+1}+c), \end{array} \end{aligned}$$
(4)

where \(\beta \in (0,2)\) is an over-relaxation parameter. In most applications, \(B^\mathsf{T}B\) is a diagonal, nearly diagonal, or nearly orthogonal matrix, so the y-subproblem with a proper choice of \(Q_2^k\) can be solved efficiently; hence linearizing \(\frac{\alpha }{2}\Vert By+Ax^{k+1}+c\Vert ^2\) may be unnecessary. If \(B^\mathsf{T}B \) is nearly a diagonal matrix (or nearly an orthogonal matrix), one can replace \(B^\mathsf{T}B\) by a symmetric diagonal (orthogonal) matrix \(D\approx B^\mathsf{T}B\). This replacement approximates \(\frac{\alpha }{2} y^\mathsf{T} B^\mathsf{T}B y\) by \(\frac{\alpha }{2} y^\mathsf{T} D y\), and one can then choose \(Q_2^k= \alpha (D-B^\mathsf{T}B)\). In general, wise choices of the proximal matrices \(\{Q_i^k\}_{k\ge 0}\), \(i=1,2\), can lead to much easier computations of \(x^{k+1}\) and \(y^{k+1}\), and consequently a more efficient scheme. For instance, by setting \(Q_1^k=\frac{1}{t_k} I_n\), where \(\{t_k\}_{k\ge 0}\) is a positive sequence and \(I_n\) is the \(n\times n\) identity matrix, the x-subproblem in (4) becomes the following prox-linear problem

$$\begin{aligned} x^{k+1}:= & {} \arg \min _{x\in {\mathbb {R}}^n} \Big \{f(x)+\big \langle p^k,x-x^k\big \rangle +\frac{1}{2t_k}\Vert x-x^k\Vert ^2\Big \}\\= & {} \arg \min _{x\in {\mathbb {R}}^n} \Big \{f(x)+\frac{1}{2t_k}\Vert x-x^k+t_kp^k\Vert ^2\Big \}, \end{aligned}$$

where \(p^k:=\nabla g(x^k)+\alpha A^\mathsf{T}(Ax^k+By^k+c+\alpha ^{-1}z^k)\). Prox-linear subproblems can be easier to compute, especially when f is a separable function.
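To make the scheme concrete, the following sketch implements the PL-ADMM iteration (4) for a hypothetical instance with \(f(x)=\lambda \Vert x\Vert _1\) (so the prox-linear x-step above is solved in closed form by soft-thresholding), convex quadratic g and h, \(Q_1^k=\frac{1}{t}I_n\), and \(Q_2^k=q_2 I_m\). All problem data and parameter values below are illustrative choices of ours, not prescriptions from the analysis.

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau * ||.||_1, applied componentwise
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pl_admm_step(x, y, z, A, B, c, grad_g, grad_h, alpha, beta, t, q2, lam):
    """One PL-ADMM iteration (4) with f = lam*||.||_1,
    Q_1^k = (1/t) I and Q_2^k = q2 * I (hypothetical choices)."""
    # x-step: prox-linear update, solved in closed form by soft-thresholding
    p = grad_g(x) + alpha * A.T @ (A @ x + B @ y + c + z / alpha)
    x_new = soft_threshold(x - t * p, t * lam)
    # y-step: strongly convex quadratic, solved by one linear system
    rhs = q2 * y - grad_h(y) - B.T @ z - alpha * B.T @ (A @ x_new + c)
    y_new = np.linalg.solve(alpha * B.T @ B + q2 * np.eye(len(y)), rhs)
    # z-step: over-relaxed multiplier update
    z_new = z + alpha * beta * (A @ x_new + B @ y_new + c)
    return x_new, y_new, z_new

# illustrative data: n = 4, m = p = 3, B = I, convex quadratic g and h
rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 4)), np.eye(3)
c = rng.standard_normal(3)
G = rng.standard_normal((4, 4)); G = 0.1 * G.T @ G   # g(x) = 0.5 x^T G x
H = rng.standard_normal((3, 3)); H = 0.1 * H.T @ H   # h(y) = 0.5 y^T H y
grad_g, grad_h = (lambda x: G @ x), (lambda y: H @ y)

x, y, z = np.zeros(4), np.zeros(3), np.zeros(3)
for _ in range(2000):
    x, y, z = pl_admm_step(x, y, z, A, B, c, grad_g, grad_h,
                           alpha=1.0, beta=1.0, t=0.02, q2=2.0, lam=0.1)
print(np.linalg.norm(A @ x + B @ y + c))  # constraint residual
```

The stepsize t is chosen conservatively, roughly below \(1/(L_g+\alpha \Vert A\Vert ^2)\), and \(q_2\) above \(L_h\), in the spirit of the conditions developed later in the paper.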

The PL-ADMM (4) is related to, but different from, [13, 54]. The work [54] considers the fixed proximal terms \(\frac{L_x}{2}\Vert x-x^k\Vert ^2\) and \(\frac{L_y}{2}\Vert y-y^k\Vert ^2\), where \(L_x>0\) and \(L_y>0\) are constants, and \(\beta =1\). Algorithm 2 in [13] does not exploit the linearization of \(\frac{\alpha }{2} \Vert A\cdot +By^k+c\Vert ^2\) in the x-subproblem, and solves (1) with \(g(x)=0\), \(A=-I_{n}\) and \(c=0\).

The main contribution of this paper is to analyze the theoretical properties of the PL-ADMM (4). In Theorem 1 we prove that the sequence generated by PL-ADMM is bounded, and in Theorem 2 we show that it is in fact Cauchy, and thus converges. In Theorem 3 we establish the rate of convergence for the error of the regularized augmented Lagrangian, which, by Lemma 11, yields the rate of convergence for the error of the objective function. Finally, in Theorem 4 we analyze the convergence rate for the sequential error.

We now present the structure of the paper. In Sect. 2, we recall some definitions and well-known facts, and set the notation. In Sect. 3, we derive some fundamental properties, based on the augmented Lagrangian, of the sequence generated by PL-ADMM. Sections 4 and 5 contain the main theoretical results of the paper. More specifically, in Sect. 4 we introduce the regularized augmented Lagrangian and a sufficient decrease condition that enables us to prove the boundedness of the PL-ADMM sequence. In Sect. 5 we prove general convergence and the convergence rates. Section 6 concludes the paper.

2 Preliminaries

Throughout this paper, \({\mathbb {R}}\) denotes the set of real numbers and \({\mathbb {Z}}\) the set of integers. The set \(\bar{{\mathbb {R}}}:={\mathbb {R}}\cup \{+\infty \}\) is the extended real line, \({\mathbb {R}}_+\) is the set of positive real numbers, and \({\mathbb {Z}}_+\) is the set of positive integers. The domain of \(\Phi :{\mathbb {R}}^d\rightarrow \bar{{\mathbb {R}}}={\mathbb {R}}\cup \{+\infty \}\), denoted \(\mathrm{dom}\; \Phi \), is defined by \( \mathrm{dom}\; \Phi :=\{x:\; \Phi (x)<+\infty \}. \) We say that \(x^k\) \(\Phi \)-converges to x, written \(x^k\xrightarrow {\Phi } x\), iff \(x^k\rightarrow x\) and \(\Phi (x^k)\rightarrow \Phi (x)\). Given a matrix X, \(\mathrm{Im}(X)\) denotes its image. We denote by \(I_n\) the \(n\times n\) identity matrix for \(n\in {\mathbb {Z}}_+\). The minimum, maximum, and smallest positive eigenvalues of a matrix \(X\in {\mathbb {R}}^{n\times n}\) are denoted by \(\lambda _{\min }^X\), \(\lambda _{\max }^X\), and \(\lambda _+^X\), respectively. The Euclidean scalar product on \({\mathbb {R}}^n\) and its corresponding norm are denoted by \(\langle \cdot , \cdot \rangle \) and \(\Vert \cdot \Vert =\sqrt{\langle \cdot ,\cdot \rangle }\), respectively. If \(p, n_1,\dots , n_p\in {\mathbb {Z}}_+\), then for any \(v:=(v_1,\dots ,v_p)\) and \(v':=(v'_1,\dots ,v'_p)\) in \({\mathbb {R}}^{n_1}\times {\mathbb {R}}^{n_2}\times \dots \times {\mathbb {R}}^{n_p}\), the inner product and norm on the Cartesian product satisfy

$$\begin{aligned} \ll v,v'\gg =\sum _{i=1}^{p}\langle v_i,v_i'\rangle \quad \quad \frac{1}{\sqrt{p}} \sum _{i=1}^{p} \Vert v_i\Vert \le |||v||| = \sqrt{\sum _{i=1}^{p} \Vert v_i\Vert ^2}\le \sum _{i=1}^{p} \Vert v_i\Vert . \end{aligned}$$
(5)

For the sequence \(\{u^k\}_{k\ge 1}\), \(\Delta u^{k}:=u^k-u^{k-1}\), for all \(k\ge 1\).
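As a quick sanity check of the inequalities in (5) (the block dimensions below are arbitrary choices of ours): the upper bound is the triangle inequality, and the lower bound follows from the Cauchy–Schwarz inequality applied to the vector of block norms.

```python
import numpy as np

rng = np.random.default_rng(1)
# an arbitrary block vector v = (v_1, v_2, v_3) with n_1 = 4, n_2 = 7, n_3 = 2
blocks = [rng.standard_normal(d) for d in (4, 7, 2)]
p = len(blocks)
sum_of_norms = sum(np.linalg.norm(v) for v in blocks)
triple_norm = np.sqrt(sum(np.linalg.norm(v) ** 2 for v in blocks))
# (1/sqrt(p)) * sum ||v_i||  <=  |||v|||  <=  sum ||v_i||
assert sum_of_norms / np.sqrt(p) <= triple_norm <= sum_of_norms
print("inequalities in (5) hold")
```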

2.1 Subdifferential and critical points

Let \(\Phi :{\mathbb {R}}^d\rightarrow \bar{{\mathbb {R}}}\). The Fréchet subdifferential of \(\Phi \) at \(x\in \mathrm{dom}\, \Phi \) is the set \(\partial _F\Phi (x)\) of those elements \(s\in {\mathbb {R}}^d\) such that

$$\begin{aligned} \liminf _{y\rightarrow x,\; y\ne x} \frac{\Phi (y)-\Phi (x)-\langle s, y-x\rangle }{\Vert y-x\Vert }\ge 0. \end{aligned}$$

For any \(x\notin \mathrm{dom}\, \Phi \), we set \(\partial _F\Phi (x):= \emptyset \). The (limiting) subdifferential of \(\Phi \) at \(x\in \mathrm{dom}\;\Phi \) is the set \(\partial \Phi (x)\) of elements \(s\in {\mathbb {R}}^d\) for which there exist sequences \(\{x^k\}_{k\in {\mathbb {N}}}\) and \(\{s^k\}_{k\in {\mathbb {N}}}\) in \({\mathbb {R}}^d\) such that \(x^k\xrightarrow {\Phi } x\), \(s^k\rightarrow s\), and \(s^k\in \partial _F\Phi (x^k)\). As before, \(\partial \Phi (x):=\emptyset \) for \(x\notin \mathrm{dom}\, \Phi \), and its domain is \(\mathrm{dom}\, \partial \Phi :=\{x\in {\mathbb {R}}^d: \partial \Phi (x)\ne \emptyset \}.\) Note that \(\partial _F\Phi (x) \subset \partial \Phi (x)\), where the first set is convex and closed while the second is closed ([59], Theorem 8.6). When \(\Phi \) is convex, the two sets coincide and \( \partial _F\Phi (x)=\partial \Phi (x) =\{s\in {\mathbb {R}}^d: \Phi (y)\ge \Phi (x) +\langle s, y-x\rangle \; \forall y\in {\mathbb {R}}^d\}. \) Let \(F:{\mathbb {R}}^n\times {\mathbb {R}}^m\rightarrow (-\infty ,+\infty ]\) be a lower semicontinuous function; the subdifferential of F(x, y) at \(({{\hat{x}}},{{\hat{y}}})\) is \( \partial F({{\hat{x}}}, {{\hat{y}}})=\big (\partial _x F({{\hat{x}}},{{\hat{y}}}), \partial _y F({{\hat{x}}},{{\hat{y}}})\big ), \) where \(\partial _x F\) and \(\partial _y F\) are, respectively, the subdifferentials of \(F(\cdot ,y)\) with \(y\in {\mathbb {R}}^m\) fixed and of \(F(x,\cdot )\) with x fixed. The lazy slope of \(\Phi \) at x is \(\Vert \partial \Phi (x)\Vert _-:=\inf _{s\in \partial \Phi (x)} \Vert s\Vert \) if \(x\in \mathrm{dom}\, \partial \Phi \), and \(+\infty \) otherwise. We say that \(x\in {\mathbb {R}}^d\) is a critical point of \(\Phi \) iff \(0\in \partial \Phi (x)\). We denote the set of critical points of \(\Phi \) by \(\mathrm{crit}\; \Phi \).
It follows from these definitions that if \(x^k\xrightarrow {\Phi } x\) with \(\liminf _{k\rightarrow \infty }\Vert \partial \Phi (x^k)\Vert _-=0\), then x is a critical point.
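A standard one-dimensional example (ours, not from the text) shows that the inclusion \(\partial _F\Phi (x)\subset \partial \Phi (x)\) can be strict. Take \(\Phi (x)=-|x|\) at \(x=0\):

$$\begin{aligned} \partial _F\Phi (0)=\emptyset , \qquad \partial \Phi (0)=\{-1,+1\}, \end{aligned}$$

since no \(s\in {\mathbb {R}}\) satisfies \(\liminf _{y\rightarrow 0}\frac{-|y|-sy}{|y|}\ge 0\) for both signs of y, while the sequences \(x^k\downarrow 0\) with \(s^k=\nabla \Phi (x^k)=-1\), and \(x^k\uparrow 0\) with \(s^k=+1\), realize the two limiting subgradients. In particular \(0\notin \partial \Phi (0)\), so 0 is not a critical point of this \(\Phi \).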

Suppose \(\Phi :{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) is Fréchet differentiable and its gradient is Lipschitz continuous with constant \(L_{\Phi }>0\). Then for every \(u,v\in {\mathbb {R}}^d\) and every \(\xi \in [u,v]=\{(1-t)u+tv:\;t\in [0,1]\}\) it holds that

$$\begin{aligned} \Phi (v)\le \Phi (u)+\langle \nabla \Phi (\xi ),v-u\rangle +\frac{L_{\Phi }}{2}\Vert v-u\Vert ^2. \end{aligned}$$
(6)

Moreover, if \(\Phi \) is bounded from below, by setting \(\xi =u\) and \(v=u-\delta \nabla \Phi (u)\) in (6), it is easy to show that the term \(\Phi (u)-(\delta -\frac{L_{\Phi } \delta ^2}{2})\Vert \nabla \Phi (u)\Vert ^2\) is bounded from below for every \(\delta >0\).
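For intuition, the following check (with a least-squares \(\Phi \) of our choosing, bounded below by 0) verifies numerically that \(\Phi (u)-(\delta -\frac{L_{\Phi }\delta ^2}{2})\Vert \nabla \Phi (u)\Vert ^2\) dominates \(\Phi \) at the gradient step \(v=u-\delta \nabla \Phi (u)\), hence stays above \(\inf \Phi \).

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 4)); b = rng.standard_normal(6)
Phi = lambda x: 0.5 * np.sum((M @ x - b) ** 2)  # bounded below by 0
grad = lambda x: M.T @ (M @ x - b)
L = np.linalg.eigvalsh(M.T @ M).max()           # Lipschitz constant of grad Phi

u = rng.standard_normal(4)
for delta in (0.01, 0.1, 1.0, 2.0 / L):
    v = u - delta * grad(u)                     # gradient step
    bound = Phi(u) - (delta - L * delta**2 / 2) * np.linalg.norm(grad(u)) ** 2
    assert bound >= Phi(v) - 1e-9 >= -1e-9      # bound >= Phi(v) >= inf Phi = 0
print("descent-lemma bound verified")
```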

2.2 The Kurdyka-Łojasiewicz properties

Let \(\eta \in (0,+\infty ]\), and let \(\psi :[0,\eta ]\rightarrow [0,+\infty )\) be a continuous concave function such that \(\psi (0)=0\) and \(\psi \) is continuously differentiable on \((0,\eta )\), with \(\psi '(t)>0\) for all \(t\in (0,\eta )\).

A proper lower semi-continuous function \(\Phi :{\mathbb {R}}^d\rightarrow \bar{{\mathbb {R}}}\) has the Kurdyka-Łojasiewicz (KŁ) property at \(x^*\in \mathrm{dom} \;\partial \Phi \) with desingularising function \(\psi \), iff there exists \(\epsilon >0\) such that the Kurdyka-Łojasiewicz inequality

$$\begin{aligned} \psi '\big (\Phi (x)-\Phi (x^*)\big ) \Vert \partial \Phi (x)\Vert _- \ge 1 \end{aligned}$$
(7)

holds for all x in the strict local upper level set

$$\begin{aligned} \Gamma _{\eta }(x^*, \epsilon ) =\{x\in {\mathbb {R}}^d: \Vert x-x^*\Vert<\epsilon \;\; \mathrm{and} \;\; \Phi (x^*)<\Phi (x)<\Phi (x^*)+\eta \}. \end{aligned}$$

A proper lower semi-continuous function having the Kurdyka-Łojasiewicz property at each point of \(\mathrm{dom}\, \partial \Phi \) is called a KŁ function. When \(\Phi \) is of class \(C^1\), (7) becomes \(\Vert \nabla (\psi \circ \Phi )(x)\Vert \ge 1\). This means that the flatter \(\Phi \) is around its critical points, the steeper \(\psi \) has to be around 0, which justifies the term "desingularising". Let \(\Omega \subset {\mathbb {R}}^d\) and assume that \(\Phi \) is constant on \(\Omega \). The function \(\Phi \) satisfies the KŁ property on \(\Omega \) if there exists \(\epsilon >0\) such that (7) holds for every \(x^*\in \Omega \) and every x in the set

$$\begin{aligned} \Gamma _{\eta ,\Omega }(x^*, \epsilon ) =\{x\in {\mathbb {R}}^d: \mathrm{dist} (x,\Omega )<\epsilon \;\; \mathrm{and} \;\; \Phi (x^*)<\Phi (x)<\Phi (x^*)+\eta \}. \end{aligned}$$

We will see that the growth of \(\psi \) has a direct impact on the convergence rate of PL-ADMM. If \(\psi (s)={{\bar{c}}} s^{1-\theta }\) for some \({{\bar{c}}}>0\) and \(\theta \in [0,1)\), then the Kurdyka-Łojasiewicz inequality (7) becomes

$$\begin{aligned} \big (\Phi (x)-\Phi (x^*)\big )^{\theta }\le c \Vert \partial \Phi (x)\Vert _- \end{aligned}$$
(8)

for any \(x\in \Gamma _{\eta }(x^*, \epsilon )\) where \(c=(1-\theta ){{\bar{c}}}\). In this case, we say \(\Phi \) has the KŁ property at \(x^*\) with an exponent \(\theta \). This asserts that \((\Phi (x)-\Phi (x^*))^{\theta }/\Vert \partial \Phi (x)\Vert _- \) remains bounded around \(x^*\).

This definition encompasses broad classes of functions arising in practical optimization problems. For example, if f is a proper closed semi-algebraic function, then f is a KŁ function with exponent \(\theta \in [0,1)\) [4]. The function l(Ax), where l is strongly convex on any compact set and twice differentiable, and \(A\in {\mathbb {R}}^{m\times n}\), is a KŁ function. Convex piecewise linear-quadratic functions, \(\Vert x\Vert _1\), \(\Vert x\Vert _0\), \(\gamma \sum _{i=1}^k |x_{[i]}|\), where \(|x_{[i]}|\) is the ith largest (in magnitude) entry of x, \(k\le n\), and \(\gamma \in (0,1]\), \(\delta _{\Delta }(x)\), where \(\Delta =\{x\in {\mathbb {R}}^n: e^\mathsf{T}x=1, x\ge 0\}\), and least squares problems with SCAD [27] or MCP [76] regularizers are all KŁ functions. The KŁ property characterizes the local geometry of a function around the set of critical points. If \(\Phi \) is differentiable, then \(\partial \Phi (x)=\{\nabla \Phi (x)\}\) and the inequality (8) becomes \( \Phi (x)-\Phi (x^*)\le c^{1/\theta } \Vert \nabla \Phi (x)\Vert ^{\frac{1}{\theta }}, \) which generalizes the Polyak-Łojasiewicz condition \(\Phi (x)-\Phi (x^*)\le {\mathcal {O}}(\Vert \nabla \Phi (x)\Vert ^2)\) (\(\theta =\frac{1}{2}\)). We refer interested readers to [2,3,4, 9,10,11,12, 29] for more properties of KŁ functions and illustrative examples.
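As a concrete instance (our example, not from the text), a strongly convex quadratic satisfies (8) with exponent \(\theta =\frac{1}{2}\): for \(\Phi (x)=\frac{1}{2} x^\mathsf{T}Hx\) with \(H\succ 0\), one has \(\Phi (x)-\Phi (x^*)\le \frac{1}{2\lambda _{\min }^H}\Vert \nabla \Phi (x)\Vert ^2\), since \(\Vert Hx\Vert ^2\ge \lambda _{\min }^H\, x^\mathsf{T}Hx\). The following snippet verifies this numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((5, 5)); H = H.T @ H + 0.1 * np.eye(5)  # H positive definite
lam_min = np.linalg.eigvalsh(H).min()
Phi = lambda x: 0.5 * x @ H @ x      # global minimizer x* = 0, Phi(x*) = 0
grad = lambda x: H @ x
c = 1.0 / (2.0 * lam_min)            # PL constant for this quadratic
for _ in range(100):
    x = rng.standard_normal(5)
    # KL inequality (8) with theta = 1/2, squared on both sides
    assert Phi(x) <= c * np.linalg.norm(grad(x)) ** 2 + 1e-9
print("PL inequality verified on 100 random points")
```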

3 Augmented Lagrangian-based properties

In this section, we establish some important properties of the PL-ADMM algorithm (4), based on the augmented Lagrangian function (2).

Assumption 1

We begin by making some assumptions.

A1. The function f is lower-semicontinuous and coercive;

A2. g is coercive and h is bounded from below;

A3. The matrix B has full rank, \(\mathrm{Im}({A})\subseteq \mathrm{Im}(B)\), and \(c\in \mathrm{Im}(B)\);

A4. g and h have respectively \(L_g\) and \(L_h\) Lipschitz continuous gradients;

A5. \(\alpha >0\), \(\beta \in (0,2)\), \({q_i}^-:=\inf _{k\ge 0} \Vert Q_i^k\Vert \ge 0\), and \(q_i:=\sup _{k\ge 0} \Vert Q_i^k\Vert <+\infty \) for \(i=1,2\).

Lemma 1

(Subgradient bound) Suppose that Assumption 1 holds, and let \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) be a sequence generated by the PL-ADMM algorithm (4). Then there exist a constant \(\rho >0\) and \(d^{k+1}:=(d^{k+1}_x,d^{k+1}_y,d^{k+1}_z)\in \partial {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1})\), where

$$\begin{aligned} d^{k+1}_x= & {} \nabla g(x^{k+1})-\nabla g(x^k) +A^\mathsf{T}\Delta z^{k+1} +\alpha A^\mathsf{T}B\Delta y^{k+1}+\alpha A^\mathsf{T}A\Delta x^{k+1} - Q_1^k \Delta x^{k+1}, \\ d^{k+1}_y= & {} \nabla h(y^{k+1})-\nabla h(y^k) +B^\mathsf{T}\Delta z^{k+1}- {Q_2^k} \Delta y^{k+1}, \quad \quad d^{k+1}_z = \frac{1}{\alpha \beta } \Delta z^{k+1} \end{aligned}$$

such that

$$\begin{aligned} |||d^{k+1}|||\le & {} \rho \Big (\Vert \Delta x^{k+1}\Vert + \Vert \Delta y^{k+1}\Vert + \Vert \Delta z^{k+1}\Vert \Big ), \end{aligned}$$
(9)

where for any sequence \(\{u^k\}_{k\ge 0}\), \(\Delta u^{k+1}=u^{k+1}-u^{k}\), and

$$\begin{aligned} \rho :=\max \{ q_1+L_g+\alpha \Vert A\Vert ^2, \alpha \Vert A\Vert \Vert B\Vert +L_h+q_2, \Vert A\Vert +\Vert B\Vert +\frac{1}{\alpha \beta }\}. \end{aligned}$$
(10)

Proof

Let \(k\ge 0\) be fixed. Taking the partial subdifferential of \({\mathcal {L}}^{\alpha }\) with respect to x and evaluating the result at the point \((x^{k+1},y^{k+1},z^{k+1})\) yields

$$\begin{aligned} \partial _x {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1})= \partial f(x^{k+1})+\nabla g(x^{k+1})+ \alpha A^\mathsf{T}\big (Ax^{k+1}+By^{k+1} +\alpha ^{-1}z^{k+1}+ c\big ). \end{aligned}$$

By the optimality condition of x subproblem in (4) we have

$$\begin{aligned} -\nabla g(x^k)-\alpha A^\mathsf{T}\big (Ax^{k}+By^k+\alpha ^{-1}z^k+ c\big ) - Q_1^k \Delta x^{k+1} \in \partial f(x^{k+1}). \end{aligned}$$

Therefore, we obtain

$$\begin{aligned} d_x^{k+1}:= & {} \nabla g(x^{k+1})-\nabla g(x^k) +A^\mathsf{T}\Delta z^{k+1} +\alpha A^\mathsf{T}B\Delta y^{k+1}+\alpha A^\mathsf{T}A\Delta x^{k+1} - Q_1^k \Delta x^{k+1} \nonumber \\\in & {} \;\partial _x {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1}). \end{aligned}$$
(11)

Taking the partial derivative of \({\mathcal {L}}^{\alpha }\) with respect to y and evaluating the result at \((x^{k+1},y^{k+1},z^{k+1}) \) gives \( \nabla _y {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1}) = \nabla h(y^{k+1})+\alpha B^\mathsf{T}\big (Ax^{k+1}+By^{k+1}+\alpha ^{-1}z^{k+1}+c\big ). \) The optimality condition of the y-subproblem in (4) reads

$$\begin{aligned} \nabla h(y^k) + \alpha B^\mathsf{T}(Ax^{k+1}+By^{k+1}+\alpha ^{-1}z^k+c) + {Q_2^k} \Delta y^{k+1} =0. \end{aligned}$$

Thus, we have

$$\begin{aligned} d_y^{k+1}:=\nabla h(y^{k+1})-\nabla h(y^k) +B^\mathsf{T}\Delta z^{k+1}- {Q_2^k} \Delta y^{k+1} \in \nabla _y {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1}).\nonumber \\ \end{aligned}$$
(12)

By the z subproblem in the algorithm (4) it is easy to see that

$$\begin{aligned} d_z^{k+1}:=Ax^{k+1}+By^{k+1}+c=\frac{1}{\alpha \beta } \Delta z^{k+1}\in \nabla _z {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1}). \end{aligned}$$
(13)

Hence, by (11), (12), and (13), we then have \( d^{k+1}:=(d^{k+1}_x,d^{k+1}_y,d^{k+1}_z)\in \partial {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1}).\)

From (11), by using the triangle inequality we have

$$\begin{aligned} \Vert d_x^{k+1}\Vert\le & {} \Vert \nabla g(x^{k+1})-\nabla g(x^k)\Vert +\Vert A\Vert \cdot \Vert \Delta z^{k+1}\Vert +\alpha \Vert A\Vert \cdot \Vert B\Vert \Vert \Delta y^{k+1}\Vert \\&+ \alpha \Vert A\Vert ^2\Vert \Delta x^{k+1}\Vert + \Vert Q_1^k\Vert \Vert \Delta x^{k+1}\Vert . \end{aligned}$$

Since \(\nabla g\) is \(L_g\) Lipschitz continuous and \(q_1=\sup _{k\ge 0} \Vert Q_1^k\Vert <+\infty \) we then have

$$\begin{aligned} \Vert d_x^{k+1}\Vert \le \alpha \Vert A\Vert \Vert B\Vert \Vert \Delta y^{k+1}\Vert +(q_1+L_g+\alpha \Vert A\Vert ^2)\Vert \Delta x^{k+1}\Vert + \Vert A\Vert \Vert \Delta z^{k+1}\Vert . \end{aligned}$$
(14)

From (12), by the triangle inequality, \( \Vert d_y^{k+1}\Vert \le \Vert \nabla h(y^{k+1})-\nabla h(y^k)\Vert +\Vert B\Vert \Vert \Delta z^{k+1}\Vert + \Vert {Q_2^k}\Vert \Vert \Delta y^{k+1}\Vert . \) Since \(\nabla h\) is \(L_h\) Lipschitz continuous and \(q_2=\sup _{k\ge 0} \Vert Q_2^k\Vert <+\infty \) we get

$$\begin{aligned} \Vert d_y^{k+1}\Vert \le (L_h+q_2) \Vert \Delta y^{k+1}\Vert + \Vert B\Vert \Vert \Delta z^{k+1}\Vert . \end{aligned}$$
(15)

We also have

$$\begin{aligned} \Vert d_z^{k+1}\Vert =\frac{1}{\alpha \beta } \Vert \Delta z^{k+1}\Vert . \end{aligned}$$
(16)

By (14), (15), and (16) we obtain

$$\begin{aligned} |||d^{k+1}|||\le & {} \Vert d_x^{k+1}\Vert + \Vert d_y^{k+1}\Vert +\Vert d_z^{k+1}\Vert \le \rho \big (\Vert \Delta x^{k+1}\Vert + \Vert \Delta y^{k+1}\Vert + \Vert \Delta z^{k+1}\Vert \big ), \end{aligned}$$

where \(\rho \) is defined in (10). \(\square \)

Lemma 2

(Limiting continuity) Suppose that Assumption 1 holds. If \((x^*, y^*, z^*)\) is the limit point of a convergent subsequence \(\{(x^{k_j},y^{k_j},z^{k_j})\}_{j\ge 0}\) generated by PL-ADMM, then \( {\mathcal {L}}^{\alpha } (x^*, y^*, z^*)=\displaystyle {\lim _{j\rightarrow \infty }} {\mathcal {L}}^{\alpha } (x^{k_j},y^{k_j},z^{k_j}).\)

Proof

Let \(\{(x^{k_j},y^{k_j},z^{k_j})\}_{j\ge 0}\) be a subsequence of the sequence generated by the PL-ADMM algorithm such that \( \displaystyle {\lim _{j\rightarrow \infty }} (x^{k_j},y^{k_j},z^{k_j})=(x^*, y^*, z^*). \) The function f is lower semicontinuous hence we have

$$\begin{aligned} f(x^*)\le \liminf _{j\rightarrow \infty } f(x^{k_j}). \end{aligned}$$
(17)

From the x-subproblem in (4), we have \({{\hat{f}}}^k(x^{k+1})\le {{\hat{f}}}^k(x)\) for any \(x\in {\mathbb {R}}^n\). Choosing \(k=k_j\) for all \(j\ge 0\) and setting \(x=x^*\) gives

$$\begin{aligned}&f(x^{k_j+1})+\Big \langle \nabla g(x^{k_j})+\alpha A^\mathsf{T}\big (Ax^{k_j} +By^{k_j}+\alpha ^{-1}z^{k_j}+c\big ),\Delta x^{k_j+1}\Big \rangle +\frac{1}{2}\Vert \Delta x^{k_j+1}\Vert ^2_{Q_1^{k_j}}&\\&\le f(x^*)+\Big \langle \nabla g(x^{k_j}) +\alpha A^\mathsf{T}(Ax^{k_j}+By^{k_j}+\alpha ^{-1}z^{k_j}+c),x^*-x^{k_j}\Big \rangle +\frac{1}{2}\Vert x^*-x^{k_j}\Vert ^2_{Q_1^{k_j}}.&\end{aligned}$$

By the continuity of \(\nabla g\) and the fact that \(\Delta x^{k_j+1}\) approaches zero, taking the limit superior on both sides leads to

$$\begin{aligned} \begin{array}{lll} \displaystyle {\limsup _{j\rightarrow \infty }} f(x^{k_j+1}) \le f(x^*)+\\ \displaystyle { \limsup _{j\rightarrow \infty }} \Big \{\Big \langle \nabla g(x^{k_j})+\alpha A^\mathsf{T}(Ax^{k_j}+By^{k_j}+\alpha ^{-1}z^{k_j}+c),x^*-x^{k_j}\Big \rangle +\frac{1}{2}\Vert x^*-x^{k_j}\Vert ^2_{Q_1^{k_j}}\Big \}. \end{array} \end{aligned}$$

Since \(x^{k_j}\rightarrow x^*\) as \(j\rightarrow \infty \), the latter inequality reduces to \( \displaystyle {\limsup _{j\rightarrow \infty }} \;f(x^{k_j+1})\le f(x^*).\) Thus, in view of (17), we have \(\displaystyle {\lim _{j\rightarrow \infty }} f(x^{k_j})=f(x^*).\)

Since the functions \(h(\cdot )\) and \(g(\cdot )\) are smooth we further have \(\lim _{j\rightarrow \infty } g(x^{k_j})= g(x^*)\) and \(\lim _{j\rightarrow \infty } h(y^{k_j})= h(y^*)\). Thus

$$\begin{aligned} \begin{array}{l} \displaystyle {\lim _{j\rightarrow \infty }} {\mathcal {L}}^{\alpha } (x^{k_j},y^{k_j},z^{k_j})=\\ \displaystyle {\lim _{j\rightarrow \infty }} \Big \{ f(x^{k_j})+g(x^{k_j})+h(y^{k_j}) + \langle z^{k_j}, Ax^{k_j}+By^{k_j}+ c\rangle +\frac{\alpha }{2}\Vert Ax^{k_j}+By^{k_j}+c\Vert ^2\Big \}\\ = f(x^*)+g(x^*)+h(y^*) +\langle z^*, Ax^*+By^*+c\rangle +\frac{\alpha }{2}\Vert Ax^*+By^*+c\Vert ^2 = {\mathcal {L}}^{\alpha } (x^*, y^*, z^*). \end{array} \end{aligned}$$

That completes the proof. \(\square \)

Lemma 3

(Limit point is critical point) Suppose that Assumption 1 holds. Any limit point \((x^*,y^*,z^*)\) of the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) generated by the PL-ADMM (4) is a stationary point; that is, \(0\in \partial {\mathcal {L}}^{\alpha } (x^*,y^*,z^*)\), or equivalently

$$\begin{aligned} 0\in&\partial f(x^*)+\nabla g(x^*)+A^\mathsf{T}z^*,\quad 0=\nabla h(y^*)+ B^\mathsf{T}z^*, \quad 0=Ax^*+By^*+c. \end{aligned}$$

Proof

Let \(\{(x^{k_j},y^{k_j},z^{k_j})\}_{j\ge 0}\) be a subsequence of \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) such that \((x^*, y^*, z^*)=\lim _{j\rightarrow \infty } (x^{k_j},y^{k_j},z^{k_j}).\) It follows that \(\Vert \Delta x^{k_j}\Vert \rightarrow 0\), \(\Vert \Delta y^{k_j}\Vert \rightarrow 0\), and \(\Vert \Delta z^{k_j}\Vert \rightarrow 0\) as \(j\rightarrow \infty \). By Lemma 2, \( {\mathcal {L}}^{\alpha } (x^{k_j},y^{k_j},z^{k_j})\rightarrow {\mathcal {L}}^{\alpha }(x^*, y^*, z^*)\) as \(j\rightarrow +\infty \). Let \(d^{k_j}\in \partial {\mathcal {L}}^{\alpha } (x^{k_j},y^{k_j},z^{k_j})\); by Lemma 1 we have \(|||d^{k_j}|||\le \rho (\Vert \Delta x^{k_j}\Vert +\Vert \Delta y^{k_j}\Vert +\Vert \Delta z^{k_j}\Vert )\), where \(\rho >0\) is a scalar. Hence \(|||d^{k_j}|||\rightarrow 0\) as \(j\rightarrow \infty \), that is, \(d^{k_j}\rightarrow 0\). By the closedness criterion of the limiting subdifferential we then have \(0\in \partial {\mathcal {L}}^{\alpha }(x^*, y^*, z^*)\), or equivalently, \((x^*, y^*, z^*)\in \mathrm{crit} ({\mathcal {L}}^{\alpha })\).

\(\square \)

Lemma 4

(Descent of \({\mathcal {L}}^{\alpha }\) during the x update) Suppose that Assumption 1 holds. For the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) generated by the PL-ADMM (4) we have

$$\begin{aligned} {\mathcal {L}}^{\alpha }(x^k,y^k,z^k)-{\mathcal {L}}^{\alpha }(x^{k+1},y^{k},z^{k}) \ge \frac{1}{2} \Vert \Delta x^{k+1}\Vert ^2_{A^k}, \end{aligned}$$
(18)

where \(A^k:= { Q_1^k} + \alpha A^\mathsf{T}A - L_g {I_n}\).

Proof

Let \(k\ge 0\) be fixed. From the x iterate of (4) it is clear that \({{\hat{f}}}^k(x^{k+1})\le {{\hat{f}}}^k(x)\) for any \(x\in {\mathbb {R}}^n.\) Setting \(x=x^k\) gives

$$\begin{aligned}&f(x^{k+1})-f(x^k)+\big \langle \nabla g(x^k),\Delta x^{k+1}\big \rangle +\frac{1}{2} \Vert \Delta x^{k+1}\Vert ^2_{Q_1^k} \nonumber&\\&\le -\alpha \big \langle A^\mathsf{T}(Ax^k+By^k+\alpha ^{-1}z^k+c),\Delta x^{k+1}\big \rangle .&\end{aligned}$$
(19)

We next consider

$$\begin{aligned}&{\mathcal {L}}^{\alpha }(x^k,y^k,z^k)-{\mathcal {L}}^{\alpha }(x^{k+1},y^{k},z^{k}) = f({x^k})-f(x^{k+1}) + g(x^k)-g(x^{k+1})&\\&-\langle z^k,A\Delta x^{k+1}\rangle +\frac{\alpha }{2}\Vert Ax^k+By^k+c\Vert ^2 -\frac{\alpha }{2}\Vert Ax^{k+1}+By^k+c\Vert ^2&\\&=f({x^k})-f(x^{k+1}) +g(x^k)-g(x^{k+1}) -\langle z^k,A\Delta x^{k+1}\rangle&\\&+\frac{\alpha }{2}\Vert A(x^{k}-x^{k+1})\Vert ^2 -\alpha \big \langle A^\mathsf{T}(Ax^{k+1}+By^k+c),\Delta x^{k+1}\big \rangle .&\end{aligned}$$

By (19) we then have

$$\begin{aligned}&{\mathcal {L}}^{\alpha }(x^k,y^k,z^k)-{\mathcal {L}}^{\alpha }(x^{k+1},y^{k},z^{k}) \ge&\\&g(x^k)-g(x^{k+1}) +\big \langle \nabla g(x^k),\Delta x^{k+1}\big \rangle +\frac{\alpha }{2}\Vert A\Delta x^{k+1}\Vert ^2 +\frac{1}{2} \Vert \Delta x^{k+1}\Vert ^2_{Q_1^k}. \end{aligned}$$

Since \(\nabla g\) is \(L_g\)-Lipschitz continuous, this yields

$$\begin{aligned} {\mathcal {L}}^{\alpha }(x^k,y^k,z^k)-{\mathcal {L}}^{\alpha }(x^{k+1},y^{k},z^{k}) \ge \frac{1}{2} \Vert \Delta x^{k+1}\Vert ^2_{Q_1^k}+\frac{\alpha }{2}\Vert A\Delta x^{k+1}\Vert ^2 -\frac{L_g}{2}\Vert \Delta x^{k+1}\Vert ^2=\frac{1}{2} \Vert \Delta x^{k+1}\Vert ^2_{A^k}. \end{aligned}$$

That completes the proof. \(\square \)

Lemma 5

(Descent of \({\mathcal {L}}^{\alpha }\) during the y update) Suppose that Assumption 1 holds. For the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) generated by the PL-ADMM algorithm (4) we have

$$\begin{aligned} {\mathcal {L}}^{\alpha }(x^{k+1},y^k,z^k)-{\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k}) \ge \Vert \Delta y^{k+1}\Vert _{B^k}^2, \end{aligned}$$
(20)

where \({B^k}= {Q_2^k} + \alpha B^\mathsf{T}B -L_h{I_m}\).

Proof

Let \(k\ge 1\) be fixed. From the optimality condition of the y subproblem in (4) we have

$$\begin{aligned} \nabla h(y^{k})+\alpha B^\mathsf{T}(By^{k+1}+Ax^{k+1}+\alpha ^{-1}z^k+c)+Q_2^k\Delta y^{k+1}=0. \end{aligned}$$

Taking the inner product of this equation with \(\Delta y^{k+1}\) and rearranging, we obtain

$$\begin{aligned} - \alpha \langle B^\mathsf{T}(By^{k+1}+Ax^{k+1}+\alpha ^{-1}z^k+c),\Delta y^{k+1}\rangle =\langle \nabla h(y^{k}),\Delta y^{k+1}\rangle +\Vert \Delta y^{k+1}\Vert _{Q_2^k}^2.\nonumber \\ \end{aligned}$$
(21)

We next consider

$$\begin{aligned}&{\mathcal {L}}^{\alpha }(x^{k+1},y^k,z^k)-{\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k})\\&\quad = h(y^k)-h(y^{k+1}) +\frac{\alpha }{2}\Vert By^k+Ax^{k+1}+c\Vert ^2 -\frac{\alpha }{2}\Vert By^{k+1}+Ax^{k+1}+c\Vert ^2-\langle z^k, B\Delta y^{k+1} \rangle \\&\quad = h(y^k)-h(y^{k+1}) +\frac{\alpha }{2}\Vert B\Delta y^{k+1}\Vert ^2 -\alpha \langle \Delta y^{k+1}, B^\mathsf{T}(By^{k+1}+Ax^{k+1}+\alpha ^{-1}z^k+c)\rangle \\&\quad = h(y^k)-h(y^{k+1})+\frac{\alpha }{2}\Vert B\Delta y^{k+1}\Vert ^2 +\langle \nabla h(y^{k}),\Delta y^{k+1}\rangle +\Vert \Delta y^{k+1}\Vert _{Q_2^k}^2, \end{aligned}$$

where the last equality follows from (21). Since \(\nabla h\) is \(L_h\)-Lipschitz continuous, we then get

$$\begin{aligned} {\mathcal {L}}^{\alpha }(x^{k+1},y^k,z^k){-}{\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k}) {\ge } \frac{1}{2} \Vert \Delta y^{k+1}\Vert ^2_{Q_2^k} {+}\frac{\alpha }{2}\Vert B\Delta y^{k+1}\Vert ^2 {-}\frac{L_h}{2}\Vert \Delta y^{k+1}\Vert ^2 {=} \frac{1}{2} \Vert \Delta y^{k+1}\Vert ^2_{B^k}. \end{aligned}$$

This completes the proof. \(\square \)

Lemma 6

Suppose that Assumption 1 holds. For the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) generated by the PL-ADMM (4) we have

$$\begin{aligned} {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1}) +\Big \Vert \Delta x^{k+1}\Big \Vert ^2_{ A^k} +\Big \Vert \Delta y^{k+1}\Big \Vert ^2_{B^k} \le {\mathcal {L}}^{\alpha }(x^{k},y^k,z^k) +\frac{1}{\alpha \beta }\Big \Vert \Delta z^{k+1}\Big \Vert ^2,\nonumber \\ \end{aligned}$$
(22)

where \(A^k\) and \({B^k}\) are defined respectively in Lemmas 4 and 5.

Proof

Let \(k\ge 1\) be fixed. By the z update in (4) and Lemmas 4 and 5 we have

$$\begin{aligned} {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k+1})= & {} {\mathcal {L}}^{\alpha }(x^{k+1},y^{k+1},z^{k}) + \frac{1}{\alpha \beta }\Vert \Delta z^{k+1}\Vert ^2\\\le & {} {\mathcal {L}}^{\alpha }(x^{k+1},y^{k},z^{k}) - \Vert \Delta y^{k+1}\Vert ^2_{B^k} + \frac{1}{\alpha \beta }\Vert \Delta z^{k+1}\Vert ^2 \\\le & {} {\mathcal {L}}^{\alpha }(x^{k},y^{k},z^{k})- \Vert \Delta x^{k+1}\Vert ^2_{A^k} -\Vert \Delta y^{k+1}\Vert ^2_{B^k} + \frac{1}{\alpha \beta }\Vert \Delta z^{k+1}\Vert ^2. \end{aligned}$$

Rearranging yields (22). \(\square \)

Lemma 7

(Monotonicity of \({\mathcal {L}}^{\alpha }\)) Suppose that Assumption 1 holds. For the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) generated by the PL-ADMM (4) the following inequalities hold:

$$\begin{aligned}&(a)\quad \quad \quad \frac{1}{\alpha \beta }\Big \Vert \Delta z^{k+1}\Big \Vert ^2 \le \theta _0 \Vert \Delta y^k\Vert ^2 + \theta _1\Vert \Delta y^{k+1}\Vert ^2 +\theta _2 \Big \Vert B^\mathsf{T}\Delta z^{k}\Big \Vert ^2 -\theta _2\Big \Vert B^\mathsf{T}\Delta z^{k+1}\Big \Vert ^2 \\&(b)\quad \quad \quad {\mathcal {L}}^{\alpha }\big (x^{k+1},y^{k+1},z^{k+1}\big ) +\big \Vert \Delta x^{k+1}\big \Vert ^2_{{A^k}} +\big \Vert \Delta y^{k+1}\big \Vert ^2_{B^k-r\theta _1{I_m}} +\displaystyle {\frac{r-1}{\alpha \beta }}\big \Vert \Delta z^{k+1}\big \Vert ^2 \\&+r\theta _2 \big \Vert B^\mathsf{T}\Delta z^{k+1}\big \Vert ^2 \le {\mathcal {L}}^{\alpha }\big (x^{k},y^k,z^k\big ) +r\theta _0 \big \Vert \Delta y^k\big \Vert ^2 +r\theta _2 \big \Vert B^\mathsf{T}\Delta z^k\big \Vert ^2 \end{aligned}$$

where \(r >1\), \(\lambda _+^{B^\mathsf{T}B}\) is the smallest positive eigenvalue of \(B^\mathsf{T}B\), \(\rho (\beta )=1-|1-\beta |\), and

$$\begin{aligned} \theta _0:= \frac{2\beta (L_h+q_2)^2}{\alpha \lambda _{+}^{B^\mathsf{T}B} \big (\rho (\beta ))^2},\; \theta _1:= \frac{2\beta q_2^2}{\alpha \lambda _{+}^{B^\mathsf{T}B} \big (\rho (\beta ))^2 },\; \theta _2:=\frac{ |1-\beta |}{\alpha \beta \lambda _{+}^{B^\mathsf{T}B} \rho (\beta )}. \end{aligned}$$
(23)

Proof

Part (a). Let \(k\ge 1\) be fixed. We define the vector

$$\begin{aligned} w^{k+1}:=-{Q_2^k}\Delta y^{k+1}-\nabla h(y^k). \end{aligned}$$
(24)

Then \( \Delta w^{k+1}= Q_2^{k-1}\Delta y^{k}-{Q_2^k}\Delta y^{k+1}+ \nabla h(y^{k-1})-\nabla h(y^k), \) and by the triangle inequality

$$\begin{aligned} \Vert \Delta w^{k+1}\Vert\le & {} \Vert \nabla h(y^{k})-\nabla h(y^{k-1})\Vert +\Vert {Q_2^k}\Vert \Vert \Delta y^{k+1}\Vert +\Vert Q_2^{k-1}\Vert \Vert \Delta y^{k}\Vert \end{aligned}$$
(25)

By the fact that \(\nabla h\) is \(L_h\) Lipschitz continuous and \(q_2=\sup _{k\ge 0}\Vert {Q_2^k}\Vert <+\infty \) we obtain

$$\begin{aligned} \Vert \Delta w^{k+1}\Vert \le (L_h+q_2) \Vert \Delta y^k\Vert + q_2 \Vert \Delta y^{k+1}\Vert . \end{aligned}$$
(26)

and hence

$$\begin{aligned} \Vert \Delta w^{k+1}\Vert ^2 \le 2 (L_h+q_2)^2 \Vert \Delta y^k\Vert ^2 +2 q_2^2 \Vert \Delta y^{k+1}\Vert ^2. \end{aligned}$$
(27)

Expressing the optimality condition of the y subproblem in terms of \(w^{k+1}\) gives

$$\begin{aligned} w^{k+1}=\alpha B^\mathsf{T}(Ax^{k+1}+By^{k+1}+c+\alpha ^{-1}z^k) \end{aligned}$$

Combining this with the z iterate in (4) yields

$$\begin{aligned} B ^\mathsf{T}z^{k+1}=\beta w^{k+1}+(1-\beta ) B^\mathsf{T}z^k. \end{aligned}$$
(28)

It follows that \( B^\mathsf{T}\Delta z^{k+1}=\beta \Delta w^{k+1}+(1-\beta ) B^\mathsf{T}\Delta z^{k}. \) Since \(\beta \in (0,2)\), we can equivalently write this as follows:

$$\begin{aligned} B^\mathsf{T}\Delta z^{k+1}= \rho (\beta )\cdot \frac{\beta \Delta w^{k+1}}{\rho (\beta )} +|1-\beta |\cdot (\mathrm{sign}(1-\beta ) B^\mathsf{T}\Delta z^k), \end{aligned}$$

where \(\mathrm{sign} (\lambda )=1\) if \(\lambda \ge 0\) and \(\mathrm{sign} (\lambda )=-1\) if \(\lambda <0\). By the convexity of \(\Vert \cdot \Vert ^2\) we have

$$\begin{aligned} \lambda _{+}^{B^\mathsf{T}B} \rho (\beta )\Big \Vert \Delta z^{k+1}\Big \Vert ^2\le & {} \frac{\beta ^2}{ \rho (\beta )} \Big \Vert \Delta w^{k+1}\Big \Vert ^2+|1-\beta | \Big \Vert B^\mathsf{T}\Delta z^{k}\Big \Vert ^2-|1-\beta | \Big \Vert B^\mathsf{T}\Delta z^{k+1}\Big \Vert ^2. \end{aligned}$$

Dividing both sides of the latter inequality by \(\alpha \beta \lambda _{+}^{B^\mathsf{T}B} \rho (\beta )\) and using (27) yields (a).

Part (b). Multiplying (a) by \(r>1\) and combining the result with (22) gives (b). \(\square \)
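As a numerical aid, the constants in (23) can be tabulated directly. The following sketch (the function name and the sample inputs are illustrative, not taken from the text) computes \(\theta _0,\theta _1,\theta _2\) and exhibits the \(1/\alpha \) scaling of \(\theta _0\) and \(\theta _1\) invoked later in Remark 1 (ii):

```python
def theta_constants(alpha, beta, L_h, q2, lam_plus):
    """Constants theta_0, theta_1, theta_2 of (23).

    alpha    : penalty parameter, alpha > 0
    beta     : relaxation parameter, beta in (0, 2)
    L_h      : Lipschitz constant of grad h
    q2       : sup_k ||Q_2^k||
    lam_plus : smallest positive eigenvalue of B^T B
    """
    rho = 1.0 - abs(1.0 - beta)  # rho(beta) = 1 - |1 - beta|
    theta0 = 2.0 * beta * (L_h + q2) ** 2 / (alpha * lam_plus * rho ** 2)
    theta1 = 2.0 * beta * q2 ** 2 / (alpha * lam_plus * rho ** 2)
    theta2 = abs(1.0 - beta) / (alpha * beta * lam_plus * rho)
    return theta0, theta1, theta2
```

For \(\beta =1\) (no relaxation) this gives \(\theta _2=0\), while \(\theta _0\) and \(\theta _1\) decay like \(1/\alpha \).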

4 Regularized augmented Lagrangian

The regularized augmented Lagrangian function \({\mathcal {R}}:{\mathbb {R}}^n\times {\mathbb {R}}^m\times {\mathbb {R}}^p\times {\mathbb {R}}^m\times {\mathbb {R}}^p \rightarrow (-\infty ,+\infty ]\) is defined by

$$\begin{aligned} {\mathcal {R}}(x,y,z,y',z')={\mathcal {L}}^{\alpha }(x,y,z)+ r \theta _0 \Vert y-y'\Vert ^2+ r \theta _2 \Vert B^\mathsf{T}(z-z')\Vert ^2, \end{aligned}$$
(29)

where \(r>1\), and \(\theta _0\) and \(\theta _2\) are as in (23).

For any \(k\ge 1\), we denote

$$\begin{aligned} {\mathcal {R}}_k:={\mathcal {R}}(x^k,y^k,z^k,y^{k-1},z^{k-1}) ={\mathcal {L}}^{\alpha }(x^k,y^k,z^k)+ r\theta _0 \Vert \Delta y^k\Vert ^2+ r \theta _2 \Vert B^\mathsf{T}\Delta z^k\Vert ^2.\qquad \end{aligned}$$
(30)

By Lemma 7 (b) and (30) we then have

$$\begin{aligned} {\mathcal {R}}_{k+1} +\Vert \Delta x^{k+1}\Vert ^2_{A^k}+\Vert \Delta y^{k+1}\Vert ^2_{D^k}+\frac{r-1}{\alpha \beta }\Vert \Delta z^{k+1}\Vert ^2 \le {\mathcal {R}}_k. \end{aligned}$$
(31)

where \(D^k:= B^k-r(\theta _0+\theta _1) {I_m}.\)
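To make the definition (29) concrete, here is a small sketch (all names and the toy data are illustrative). Since \(r\theta _0\ge 0\) and \(r\theta _2\ge 0\), one always has \({\mathcal {R}}(x,y,z,y',z')\ge {\mathcal {L}}^{\alpha }(x,y,z)\), with equality when \((y,z)=(y',z')\):

```python
import numpy as np

def aug_lagrangian(x, y, z, A, B, c, f, g, h, alpha):
    """L^alpha(x,y,z) = f(x) + g(x) + h(y) + <z, Ax+By+c> + (alpha/2)||Ax+By+c||^2."""
    res = A @ x + B @ y + c
    return f(x) + g(x) + h(y) + z @ res + 0.5 * alpha * (res @ res)

def reg_aug_lagrangian(x, y, z, y_prev, z_prev, A, B, c, f, g, h,
                       alpha, r, theta0, theta2):
    """R(x,y,z,y',z') of (29): L^alpha plus two nonnegative history terms."""
    val = aug_lagrangian(x, y, z, A, B, c, f, g, h, alpha)
    val += r * theta0 * np.linalg.norm(y - y_prev) ** 2
    val += r * theta2 * np.linalg.norm(B.T @ (z - z_prev)) ** 2
    return val
```

In an actual run one would evaluate \({\mathcal {R}}_k\) at \((x^k,y^k,z^k,y^{k-1},z^{k-1})\) as in (30) and monitor the monotone decrease (31).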

Assumption 2

(Sufficient decrease condition)

  1. (i)

    The parameters \(\alpha>0, \beta \in (0,2), r>1\), and the symmetric positive semidefinite matrices \(\{Q_1^k\}_{k\ge 0}\) and \(\{Q_2^k\}_{k\ge 0}\) are chosen such that there exist \(\sigma _1>0\) and \(\sigma _2>0\) with

    $$\begin{aligned} Q_1^{k}+\alpha {A^\mathsf{T}A}-L_g I_n \succeq \sigma _1 I_n \quad \mathrm{and} \quad Q_2^k+\alpha B^\mathsf{T}B - (L_h+r\theta _0+r\theta _1)I_m\succeq \sigma _2 I_m. \end{aligned}$$

    We then let \( \sigma =\min \{\sigma _1,\sigma _2, \frac{r-1}{\alpha \beta }\};\) for all \(k\ge 1\) we have

    $$\begin{aligned} {\mathcal {R}}_{k+1} +\sigma \Big (\Vert \Delta x^{k+1}\Vert ^2+\Vert \Delta y^{k+1}\Vert ^2+\Vert \Delta z^{k+1}\Vert ^2\Big ) \le {\mathcal {R}}_k\le {\mathcal {R}}_1. \end{aligned}$$
    (32)
  2. (ii)

    (Relaxed version) The conditions in part (i) can be relaxed. Given the parameters \(\alpha>0, \beta \in (0,2), r>1\), suppose that the symmetric positive semidefinite matrices \(\{Q_1^k\}_{k\ge 0}\) and \(\{Q_2^k\}_{k\ge 0}\) are chosen such that after a finite number of iterations \(k_0\ge 0\) we have

    $$\begin{aligned} \sigma _k=\inf _{k\ge k_0}\Big \{\Vert Q_1^k\Vert +\alpha \Vert A\Vert ^2-L_g, \Vert Q_2^k\Vert +\alpha \Vert B\Vert ^2 - (L_h+r\theta _0+r\theta _1), \frac{r-1}{\alpha \beta } \Big \}>\sigma >0. \end{aligned}$$

    With this, we then have

    $$\begin{aligned} {\mathcal {R}}_{k+1} +\sigma \Big (\Vert \Delta x^{k+1}\Vert ^2+\Vert \Delta y^{k+1}\Vert ^2+\Vert \Delta z^{k+1}\Vert ^2\Big ) \le {\mathcal {R}}_k\le {\mathcal {R}}_{k_0},\quad \forall k\ge k_0. \end{aligned}$$
    (33)

Remark 1

We make a few remarks regarding Assumption 2.

  1. (i)

    By setting \(Q_1^k=0\) and \(Q_2^k=0\) for all k, the PL-ADMM reduces to the linearized ADMM. In this case, for any \(\beta \in (0,2)\) and \(r>1\), \(\alpha \) needs to be chosen such that

    $$\begin{aligned} \alpha >\max \Big \{ L_g/\lambda _{\max }^{B^\mathsf{T}B}, \Big (1+\sqrt{1+8rd\beta /\rho (\beta )^2}\Big )\frac{L_h}{2\lambda _{\max }^{B^\mathsf{T}B}}\Big \} \end{aligned}$$

    where \(d=\lambda _{\max }^{B^\mathsf{T}B}/\lambda _+^{B^\mathsf{T}B}\).

  2. (ii)

    If the y subproblem in PL-ADMM contains the second order term \(\frac{\alpha }{2}y^\mathsf{T}B^\mathsf{T}By\) and if \(B^\mathsf{T}B\) is nearly diagonal (or orthogonal), we can replace \(B^\mathsf{T}B\) by a certain symmetric diagonal (orthogonal) matrix D that approximates \(B^\mathsf{T}B\). By choosing \(Q_2^k=\alpha (D-B^\mathsf{T}B)\), the second order term \(\frac{\alpha }{2}y^\mathsf{T}B^\mathsf{T}By\) is replaced by \(\frac{\alpha }{2}y^\mathsf{T}Dy\). Note that by Assumption (A5) we have \(q_2^-=\alpha \Vert D-B^\mathsf{T}B\Vert \), and by (23) we have \(\theta _0\propto \frac{1}{\alpha }\) and \(\theta _1\propto \frac{1}{\alpha }\). Hence, for fixed \(\beta \in (0,2)\) and \(r>1\), if \(\alpha \) is chosen large enough, the relation \(q_2^-+\alpha \lambda _+^{B^\mathsf{T}B}-(L_h+r\theta _0+r\theta _1)>0\) holds, and hence Assumption 2 (i) holds.

  3. (iii)

    (Application example) Consider the problem

    $$\begin{aligned} \min _{x\in {\mathbb {R}}^n}\;\; r(x)+\ell ({\mathcal {A}} x-b) \end{aligned}$$

    where \({\mathcal {A}}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^m\) and \(b\in {\mathbb {R}}^m\). The first term is the regularization term and the second term is the fitting term. Regularizers can be sparsity-inducing functions, \(\ell _q\) quasi-norms for \(0<q\le 1\), indicator functions, or any lower-semicontinuous (not necessarily convex) functions satisfying the KŁ condition. The fitting term can be a least-squares or logistic loss, or another smooth function. In order to solve this problem, we formulate it as

    $$\begin{aligned} \min _{x,y\in {\mathbb {R}}^n}\;\; r(x)+\ell ({\mathcal {A}} y-b)\quad \mathrm{s.t.} \;\;\; x-y=0 \end{aligned}$$

    Comparing with (1), \(h(y)=\ell ({\mathcal {A}} y-b)\), \(g(x)=0\), \(A=I\), and \(B=-I\). Note that A1–A4 hold. We set \(Q_1^k=\eta I\) where \(\eta >0\). Since B is a multiple of the identity, we set \(Q_2^k=0\). The PL-ADMM algorithm for this problem reads as follows:

    $$\begin{aligned} x^{k+1}= & {} \arg \min _{x}\;\; r(x)+ \frac{\eta }{2} \Vert \alpha \eta ^{-1}(x^k-y^k+\alpha ^{-1}z^k)+x-x^k\Vert ^2,\\ y^{k+1}= & {} x^{k+1}-\frac{1}{\alpha }\big ( {\mathcal {A}}^\mathsf{T}\nabla \ell ({\mathcal {A}}y^k-b)+B^\mathsf{T}z^k \big ),\\ z^{k+1}= & {} z^k+\alpha \beta (x^{k+1}-y^{k+1}). \end{aligned}$$

    Note that \(B^\mathsf{T}B=I\) and \(\lambda _+^{B^\mathsf{T}B}=1=\lambda _{\max }^{B^\mathsf{T}B}\). If \(\ell (\cdot )=\frac{1}{2}\Vert \cdot \Vert ^2\), then \(L_h=\Vert {\mathcal {A}}\Vert ^2\). To guarantee Assumption 2 (i), we need to choose \( \alpha >\Vert {\mathcal {A}}\Vert ^2 \big (1+\sqrt{1+8\beta r/\rho (\beta )^2}\big ),\) for any \(r>1\) and \(\beta \in (0,2)\).
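A minimal numerical sketch of this example, taking the regularizer \(r(x)=\lambda \Vert x\Vert _1\), whose proximal map is soft-thresholding. The data, the value of \(\lambda \), and the choice \(\eta =\alpha +1\) are illustrative assumptions (the text only requires \(\eta >0\); a large \(\eta \) is used here so that the linearized x step is numerically stable):

```python
import numpy as np

def pl_admm_lasso(Aop, b, lam=0.1, beta=1.0, r=1.1, iters=500):
    """PL-ADMM of Remark 1 (iii) applied to
        min_x  lam*||x||_1 + 0.5*||Aop @ x - b||^2,
    split as r(x) + l(Aop @ y - b) with the constraint x - y = 0.
    beta must lie in (0, 2); lam, beta, r, and eta are illustrative choices.
    """
    n = Aop.shape[1]
    L_h = np.linalg.norm(Aop, 2) ** 2        # Lipschitz constant of grad h
    rho = 1.0 - abs(1.0 - beta)              # rho(beta)
    alpha = L_h * (1.0 + np.sqrt(1.0 + 8.0 * beta * r / rho ** 2))
    eta = alpha + 1.0                        # eta > alpha: conservative, stable x step
    x = np.zeros(n); y = np.zeros(n); z = np.zeros(n)
    for _ in range(iters):
        # x update: prox of (lam/eta)*||.||_1, i.e. soft-thresholding
        v = x - (alpha / eta) * (x - y + z / alpha)
        x = np.sign(v) * np.maximum(np.abs(v) - lam / eta, 0.0)
        # y update: gradient step on h; here B = -I, so B^T z^k = -z^k
        y = x - (Aop.T @ (Aop @ y - b) - z) / alpha
        # z update: z^{k+1} = z^k + alpha*beta*(x^{k+1} - y^{k+1})
        z = z + alpha * beta * (x - y)
    return x, y, z
```

On a small least-squares instance the iterates settle with \(x^k\approx y^k\), recovering the constraint \(x-y=0\) of the splitting.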

Theorem 1

(Bounded sequence) Suppose that Assumptions 1 and 2 (ii) hold. Then the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) generated by the PL-ADMM algorithm (4) is bounded.

Proof

Let \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) be a sequence generated by the PL-ADMM algorithm. By (33) there exists a \(k_0\ge 1\) such that \({\mathcal {R}}_{k+1}\le {\mathcal {R}}_{k_0}\) for all \(k\ge k_0\). Hence

$$\begin{aligned}&f(x^{k+1})+g(x^{k+1})+h(y^{k+1}) +\frac{\alpha }{2}\Big \Vert Ax^{k+1}+By^{k+1} +\alpha ^{-1}z^{k+1}+c\Big \Vert ^2-\frac{1}{2\alpha }\Big \Vert z^{k+1}\Big \Vert ^2 \nonumber \\&\quad +\sigma \Vert \Delta x^{k+1}\Vert ^2+ (\sigma +r\theta _0) \Vert \Delta y^{k+1}\Vert ^2+ \sigma \Vert \Delta z^{k+1}\Vert ^2 +r\theta _2\Vert B^\mathsf{T}\Delta z^{k+1}\Vert ^2 \le {\mathcal {R}}_{k_0}.\quad \end{aligned}$$
(34)

We will next find a lower bound for \(-\frac{1}{2\alpha }\Vert z^{k+1}\Vert ^2\). Given \(w^{k+1}\) as in (24), we rearrange (28) to obtain \( \beta B^\mathsf{T}z^{k+1}=\beta w^{k+1}+(1-\beta ) B^\mathsf{T}(z^k-z^{k+1}).\) Since \(\beta \in (0,2)\) we can rewrite this equation as follows

$$\begin{aligned} \beta B^\mathsf{T}z^{k+1}= \rho (\beta ) \Big (\frac{\beta w^{k+1}}{\rho (\beta )}\Big ) +(|1-\beta |)\Big (\mathrm{sign}(1-\beta ) B^\mathsf{T}(z^k-z^{k+1})\Big ). \end{aligned}$$

By the convexity of \(\Vert \cdot \Vert ^2\) we have

$$\begin{aligned} \lambda _+^{B^\mathsf{T}B}\beta ^2 \Big \Vert z^{k+1}\Big \Vert ^2 \le \frac{\beta ^2}{\rho (\beta )} \Big \Vert w^{k+1}\Big \Vert ^2 +|1-\beta | \Big \Vert B^\mathsf{T}\Delta z^{k+1}\Big \Vert ^2. \end{aligned}$$
(35)

By (24) we have \(\Vert w^{k+1}\Vert ^2\le 2(q_2+L_h)^2\Vert \Delta y^{k+1}\Vert ^2+2\Vert \nabla h(y^{k+1})\Vert ^2\). Using this in (35) and then dividing both sides of the resulting inequality by \(-2\alpha \beta ^2\lambda _{+}^{B^\mathsf{T}B}\), we get

$$\begin{aligned} -\frac{1}{2\alpha }\Big \Vert z^{k+1}\Big \Vert ^2 \ge -\theta _3 \Vert \nabla h(y^{k+1})\Vert ^2 -\theta _4 \Vert \Delta y^{k+1}\Vert ^2 -\theta _5 \Big \Vert B^\mathsf{T}\Delta z^{k+1}\Big \Vert ^2, \end{aligned}$$

where

$$\begin{aligned} \theta _3:= \frac{1}{\alpha \rho (\beta )\lambda _{+}^{B^\mathsf{T}B}},\quad \theta _4:=\frac{(q_2+L_h)^2}{\alpha \rho (\beta ) \lambda _{+}^{B^\mathsf{T}B}},\quad \theta _5:=\frac{|1-\beta |}{2\alpha \beta ^2 \lambda _{+}^{B^\mathsf{T}B}}. \end{aligned}$$

Using the latter inequality, (34) leads to

$$\begin{aligned}&f(x^{k+1})+g(x^{k+1})+\frac{\alpha }{2}\Big \Vert Ax^{k+1}+By^{k+1}+\alpha ^{-1}z^{k+1}+c\Big \Vert ^2 +(r\theta _0+\sigma -\theta _4)\Big \Vert \Delta y^{k+1}\Big \Vert ^2\nonumber \\&\quad +\sigma \Big \Vert \Delta x^{k+1}\Big \Vert ^2 + (r\theta _2-\theta _5) \Big \Vert B^\mathsf{T}\Delta z^{k+1}\Big \Vert ^2 +\sigma \Vert \Delta z^{k+1}\Vert ^2 \le {\mathcal {R}}_{k_0} -\inf _y\Big \{ h(y)-\theta _3 \Big \Vert \nabla h(y)\Big \Vert ^2 \Big \}. \nonumber \\ \end{aligned}$$
(36)

By Assumption 1, \(\nabla h\) is \(L_h\)-Lipschitz continuous, so for any \(k\ge k_0\ge 1\) and \(y\in {\mathbb {R}}^m\) it holds that \(h(y)\le h(y^k)+\langle \nabla h(y^k),y-y^k\rangle +\frac{L_h}{2}\Vert y-y^k\Vert ^2.\) For a scalar \(\delta >0\), setting \(y=y^k-\delta \nabla h(y^k)\) yields \( h\Big (y^k-\delta \nabla h(y^k)\Big )\le h(y^k)- \Big (\delta -\frac{L_h\delta ^2}{2}\Big )\Vert \nabla h(y^k)\Vert ^2. \) Since h is bounded from below, we have

$$\begin{aligned} -\infty <\inf \{h(y)- \Big (\delta -\frac{L_h\delta ^2}{2}\Big )\Vert \nabla h(y)\Vert ^2: \; y\in {\mathbb {R}}^m\} \end{aligned}$$
(37)

We choose \(\delta >0\) such that \(\theta _3=\delta -\frac{L_h\delta ^2}{2}\). Then by (37), the right-hand side of (36) is finite. It is easy to verify that for any \(r>1\) and any \(\beta \in (0,2)\) we also have \(r\theta _2-\theta _5>0\) and \(r\theta _0+\sigma -\theta _4>0\). Hence

$$\begin{aligned} f(x^{k+1})+g(x^{k+1})+\Big \Vert Ax^{k+1}+By^{k+1}+\alpha ^{-1}z^{k+1}+c\Big \Vert ^2 +\Big \Vert \Delta y^{k+1}\Big \Vert ^2 +\Vert \Delta z^k\Vert ^2<+\infty .\nonumber \\ \end{aligned}$$
(38)

Since f and g are coercive, the sequence \(\{x^k\}_{k\ge k_0}\), and consequently \(\{Ax^k\}_{k\ge k_0}\), is bounded. By the z iterate of (4) we have \( By^{k+1} = \frac{1}{\alpha \beta } \Delta z^{k+1} -Ax^{k+1}-c. \) By (38), \(\{\Delta z^k\}_{k\ge 0}\) is bounded, and since \(B^\mathsf{T}B\) is invertible, \(\{y^k\}_{k\ge k_0}\) is bounded. Finally, since \(\{Ax^k\}_{k\ge k_0}\) and \(\{By^k\}_{k\ge k_0}\) are bounded and \(\{Ax^{k}+By^{k}+\frac{1}{\alpha }z^{k}\}_{k\ge k_0}\) is also bounded, \(\{z^k\}_{k\ge k_0}\) is bounded.

We have shown that the sequences \(\{x^k\}_{k\ge k_0}\), \(\{y^k\}_{k\ge k_0}\), and \(\{z^k\}_{k\ge k_0}\) are bounded. Hence, there exist positive scalars \(M_x\), \(M_y\), and \(M_z\) such that

$$\begin{aligned} \Vert x^k\Vert \le M_x, \quad \Vert y^k\Vert \le M_y, \quad \Vert z^k\Vert \le M_z, \quad \forall k\ge k_0. \end{aligned}$$
(39)

Let \( {{\hat{M}}}_x=\displaystyle {\max _{0\le k\le k_0-1}} \Vert x^k\Vert ,\; {{\hat{M}}}_y=\displaystyle {\max _{0\le k\le k_0-1}} \Vert y^k\Vert ,\; {{\hat{M}}}_z=\displaystyle {\max _{0\le k\le k_0-1}} \Vert z^k\Vert . \) Then \( \Vert x^k\Vert \le \max \{M_x,{{\hat{M}}}_x\}, \; \Vert y^k\Vert \le \max \{M_y,{{\hat{M}}}_y\}, \; \Vert z^k\Vert \le \max \{M_z,{{\hat{M}}}_z\}, \; \forall k\ge 1. \) This concludes the proof. \(\square \)

Remark 2

Theorem 1 was established under Assumption 2 (ii). Under Assumption 2 (i), (39) holds for all \(k\ge 1\).

Lemma 8

(Convergence of \({\mathcal {R}}_k\)) Suppose that Assumptions 1 and 2 (i) hold, and let \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) be a sequence generated by the PL-ADMM algorithm (4), which is assumed to be bounded. Then the sequence \(\{{\mathcal {R}}_{k}\}_{k\ge 1}\) defined in (30) is bounded from below and converges.

Proof

Let \(k\ge 0\) be fixed. By (30), we only need to show that

$$\begin{aligned} {\mathcal {L}}^{\alpha }(x^k,y^k,z^k)=f(x^k)+g(x^k)+h(y^k) +\langle z^k, Ax^k+By^k+c\rangle +\frac{\alpha }{2}\Vert Ax^k+By^k+c\Vert ^2 \end{aligned}$$

is bounded from below. Since \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) is a bounded sequence by Theorem 1, clearly \(\langle z^k, Ax^k+By^k+c\rangle \) and \(\Vert Ax^k+By^k+c\Vert ^2\) are bounded for \(k\ge 0\). By Assumption 1, h is bounded from below, and since f and g are coercive and \(\{x^k\}_{k\ge 0}\) is bounded, \(\{f(x^k)\}_{k\ge 0}\) and \(\{g(x^k)\}_{k\ge 0}\) are bounded. Therefore \(\{{\mathcal {L}}^{\alpha }(x^k,y^k,z^k)\}_{k\ge 0}\) is bounded from below, and consequently \( -\infty <\inf \{{\mathcal {R}}_{k}:\; k\ge 1\}. \) By Assumption 2 (i), \(\{{\mathcal {R}}_k\}_{k\ge 1}\) is monotonically decreasing. Together with the fact that \(\{{\mathcal {R}}_k\}_{k\ge 1}\) is bounded from below, this implies that \(\{{\mathcal {R}}_k\}_{k\ge 1}\) is convergent.

\(\square \)

Lemma 9

Suppose that Assumptions 1 and 2 (ii) hold. If \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) is a sequence generated by the PL-ADMM algorithm (4), which is assumed to be bounded, then

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert \Delta x^{k+1}\Vert =0,\quad \lim _{k\rightarrow \infty } \Vert \Delta y^{k+1}\Vert =0,\quad \lim _{k\rightarrow \infty } \Vert \Delta z^{k+1}\Vert =0. \end{aligned}$$

Proof

Summing (33) from \(k=k_0\) to some \(K\ge k_0\), we have

$$\begin{aligned} \sum _{k=k_0}^{K} \Vert \Delta x^{k+1}\Vert ^2+\Vert \Delta y^{k+1}\Vert ^2 +\Vert \Delta z^{k+1}\Vert ^2\le \frac{1}{\sigma }({\mathcal {R}}_{k_0} - \inf _{k\ge 1} {\mathcal {R}}_k) <+\infty . \end{aligned}$$

Letting K approach infinity, and since \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) is bounded, we have

$$\begin{aligned} \sum _{k\ge 0} \Vert \Delta x^{k+1}\Vert ^2+\Vert \Delta y^{k+1}\Vert ^2+\Vert \Delta z^{k+1}\Vert ^2<+\infty . \end{aligned}$$

It follows that \(\Vert \Delta x^{k+1}\Vert ^2+\Vert \Delta y^{k+1}\Vert ^2+\Vert \Delta z^{k+1}\Vert ^2\rightarrow 0\) as \(k\rightarrow \infty \), and thus

$$\begin{aligned}\lim _{k\rightarrow \infty } \Big (\Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \Big ) = 0. \end{aligned}$$

This finishes the proof.

\(\square \)

Lemma 10

(Properties of limit point set) Let Assumptions 1 and 2 hold. For a bounded sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) generated by the PL-ADMM algorithm (4) the following statements hold:

  1. (i)

    The limit point set of the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\), denoted \(\omega \Big (\{(x^k,y^k,z^k)\}_{k\ge 0}\Big )\), is nonempty, connected, and compact.

  2. (ii)

    \(\displaystyle {\lim _{k\rightarrow \infty } \mathrm{dist} \Big [(x^k,y^k,z^k), \omega \Big (\{(x^k,y^k,z^k)\}_{k\ge 0}\Big )\Big ] =0}\).

  3. (iii)

    \(\omega \Big (\{(x^k,y^k,z^k)\}_{k\ge 0}\Big ) \subseteq \mathrm{crit}\; {\mathcal {L}}^{\alpha } \).

Proof

These results follow from Lemma 9. We omit the proof. \(\square \)

Lemma 11

Suppose that Assumption 1 holds. If \((x^*,y^*,z^*)\) is the limit point of a convergent subsequence \(\{(x^{k_j},y^{k_j},z^{k_j})\}_{j\ge 0}\) of the PL-ADMM algorithm, then

$$\begin{aligned} \lim _{j\rightarrow \infty } {\mathcal {R}}_{k_j} ={\mathcal {R}}(x^*,y^*,z^*, y^*,z^*) = {\mathcal {L}}^{\alpha } (x^*,y^*,z^*) = f(x^*)+g(x^*)+h(y^*). \end{aligned}$$

Proof

Let \(\{(x^{k_j},y^{k_j},z^{k_j})\}_{j\ge 0}\) be a PL-ADMM subsequence such that \((x^{k_j},y^{k_j},z^{k_j})\rightarrow (x^*,y^*,z^*)\) as \(j\rightarrow \infty \). Hence \(\Vert \Delta y^{k_j}\Vert \rightarrow 0\) and \(\Vert B^\mathsf{T}\Delta z^{k_j}\Vert \le \Vert B\Vert \Vert \Delta z^{k_j}\Vert \rightarrow 0\) as \(j\rightarrow \infty \), and therefore

$$\begin{aligned} \lim _{j\rightarrow \infty } {\mathcal {R}}_{k_j}= & {} \lim _{j\rightarrow \infty } {\mathcal {L}}^{\alpha } (x^{k_j},y^{k_j},z^{k_j}) \\= & {} \lim _{j\rightarrow \infty } f(x^{k_j})+g(x^{k_j})+h(y^{k_j}) +\langle z^{k_j}, Ax^{k_j}+By^{k_j}+c\rangle +\frac{\alpha }{2}\Vert Ax^{k_j}+By^{k_j}+c\Vert ^2. \end{aligned}$$

By the z iterate of the PL-ADMM (4), as \(j\rightarrow \infty \) we have \(\Vert \Delta z^{k_j+1}\Vert \rightarrow 0\) and hence \(\Vert Ax^{k_j}+By^{k_j}+c\Vert \rightarrow 0\). Since \(\{z^{k_j}\}_{j\ge 0}\) is a bounded sequence, we also have \(\langle z^{k_j}, Ax^{k_j}+By^{k_j}+c\rangle \rightarrow 0\) as \(j\rightarrow \infty \). As g and h are smooth and f is lower semicontinuous, we obtain

$$\begin{aligned} \lim _{j\rightarrow \infty } {\mathcal {R}}(x^{k_j},y^{k_j},z^{k_j},y^{k_j},z^{k_j})= & {} \lim _{j\rightarrow \infty } {\mathcal {L}}^{\alpha } (x^{k_j},y^{k_j},z^{k_j})\\= & {} \lim _{j\rightarrow \infty } f(x^{k_j})+g(x^{k_j})+h(y^{k_j}) =f(x^*)+g(x^*)+h(y^*). \end{aligned}$$

This concludes the proof. \(\square \)

5 Convergence and convergence rates

In this section, we establish the main theoretical results, including convergence and convergence rates for the sequence generated by the PL-ADMM and for the regularized augmented Lagrangian. We begin with some important lemmas.

Lemma 12

Suppose that Assumption 1 holds. Let \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) be a sequence generated by the PL-ADMM algorithm (4). Define

$$\begin{aligned}&s_x^{k}:=d^{k}_x,\quad s_y^{k}:=d^{k}_y+2r\theta _0 \Delta y^{k},\quad s_z^{k}:=d^{k}_z+2r\theta _2BB^\mathsf{T}\Delta z^{k},&\\&s_{y'}^{k}:=-2r\theta _0 \Delta y^{k},\quad s_{z'}^{k}:=-2r\theta _2BB^\mathsf{T}\Delta z^{k}&\end{aligned}$$

where \((d_x^k,d_y^k,d_z^k)\in \partial {\mathcal {L}}^{\alpha }(x^k,y^k,z^k)\). Then \( s^{k}:=(s^{k}_x, s^{k}_y, s^{k}_z, s^{k}_{y'},s^{k}_{z'})\in \partial {\mathcal {R}}(x^{k},y^{k},z^{k},y^{k-1},z^{k-1}) \) for \(k\ge 1\), and it holds that

$$\begin{aligned} |||s^{k}|||\le {{\tilde{\rho }}} \Big (\Vert \Delta x^{k}\Vert +\Vert \Delta y^{k}\Vert +\Vert \Delta z^{k}\Vert \Big ), \end{aligned}$$
(40)

where

$$\begin{aligned} {{\tilde{\rho }}}=\sqrt{3}\rho +4r\max \{\theta _0, \theta _2\Vert B\Vert ^2\}, \end{aligned}$$
(41)

\(r>1\), \(\rho \) is given in (10), and \(\theta _0\) and \(\theta _2\) are as in (23).

Proof

Let \(k\ge 1\) be fixed, and let \((d_x^{k}, d_y^{k}, d_z^{k})\in \partial {\mathcal {L}}^{\alpha }(x^k,y^k,z^k)\). By taking partial derivatives of \({\mathcal {R}}_k\) with respect to \(x,y,z,y',z'\) we obtain

$$\begin{aligned}&s_x^{k}:= \partial _x {\mathcal {R}}(x^{k},y^{k},z^{k},y^{k-1},z^{k-1}) =\partial _x {\mathcal {L}}^{\alpha }(x^{k},y^{k},z^{k})=d_x^{k}, \\&s_y^{k}:=\nabla _y {\mathcal {R}} (x^{k},y^{k},z^{k},y^{k-1},z^{k-1}) =\nabla _y {\mathcal {L}}^{\alpha }(x^{k},y^{k},z^{k})+2r\theta _0 \Delta y^k= d_y^{k}+2r\theta _0 \Delta y^k, \\&s_z^{k}:=\nabla _z {\mathcal {R}}(x^{k},y^{k},z^{k},y^{k-1},z^{k-1}) =\nabla _z {\mathcal {L}}^{\alpha }(x^{k},y^{k},z^{k}) +2r\theta _2 BB^\mathsf{T}\Delta z^{k}=d_z^{k}+2r\theta _2 BB^\mathsf{T}\Delta z^{k}, \\&s_{y'}^{k}:=\nabla _{y'} {\mathcal {R}}(x^{k},y^{k},z^{k},y^{k-1},z^{k-1}) =-2r\theta _0\Delta y^k, \\&s_{z'}^{k}:=\nabla _{z'} {\mathcal {R}}(x^{k},y^{k},z^{k},y^{k-1},z^{k-1}) =-2r \theta _2 BB^\mathsf{T}\Delta z^k. \end{aligned}$$

By the triangle inequality we obtain

$$\begin{aligned}&\Vert s_x^{k}\Vert =\Vert d^{k}_x\Vert ,\quad \Vert s_y^{k}\Vert \le \Vert d^{k}_y\Vert +2r\theta _0\Vert \Delta y^k\Vert ,\\&\Vert s_z^{k}\Vert \le \Vert d^{k}_z\Vert +2r \theta _2\Vert B\Vert ^2 \Vert \Delta z^k\Vert ,\quad \Vert s_{y'}^{k}\Vert = 2r\theta _0\Vert \Delta y^k\Vert ,\quad \Vert s_{z'}^{k}\Vert =2r\theta _2 \Vert B\Vert ^2 \Vert \Delta z^k\Vert . \end{aligned}$$

By Lemma 1, it follows that

$$\begin{aligned} |||s^{k}|||\le & {} \Vert s_x^{k}\Vert +\Vert s_y^{k}\Vert +\Vert s_z^{k}\Vert + \Vert s_{y'}^{k}\Vert +\Vert s_{z'}^{k}\Vert \\\le & {} \Vert d^{k}_x\Vert + \Vert d^{k}_y\Vert +\Vert d^{k}_z\Vert + 4r\theta _0\Vert \Delta y^{k}\Vert + 4r\theta _2 \Vert B\Vert ^2 \Vert \Delta z^k\Vert \\\le & {} \sqrt{3} |||d^k|||+4r\theta _0\Vert \Delta y^{k}\Vert + 4r \theta _2 \Vert B\Vert ^2 \Vert \Delta z^k\Vert \\\le & {} \sqrt{3}\rho \Vert \Delta x^{k}\Vert +( \sqrt{3}\rho +4r\theta _0)\Vert \Delta y^{k}\Vert +( \sqrt{3}\rho +4r\theta _2\Vert B\Vert ^2)\Vert \Delta z^{k}\Vert \\\le & {} {{\tilde{\rho }}} \Big (\Vert \Delta x^{k}\Vert +\Vert \Delta y^{k}\Vert +\Vert \Delta z^{k}\Vert \Big ). \end{aligned}$$

This concludes the proof. \(\square \)

Lemma 13

Let Assumptions 1 and 2 hold. If \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) is a sequence generated by the PL-ADMM algorithm (4), then the following statements hold.

  1. (i)

    The set \(\omega \Big (\{(x^k,y^k,z^k,y^{k-1},z^{k-1})\}_{k\ge 1}\Big )\) is nonempty, connected, and compact.

  2. (ii)

    \(\Omega :=\omega \Big (\{(x^k,y^k,z^k,y^{k-1},z^{k-1})\}_{k\ge 1}\Big )\subseteq \Big \{(x, y, z, y, z) \in {\mathbb {R}}^n\times {\mathbb {R}}^m\times {\mathbb {R}}^p\times {\mathbb {R}}^m\times {\mathbb {R}}^p: \;(x, y, z)\in \mathrm{crit} ({\mathcal {L}}^{\alpha })\Big \}.\)

  3. (iii)

    \(\lim _{k\rightarrow \infty } \mathrm{dist} \Big [ (x^k, y^k, z^k, y^{k-1}, z^{k-1}), \Omega \Big ]=0\);

  4. (iv)

    The sequences \(\{{\mathcal {R}}_k\}_{k\ge 1}\), \(\{{\mathcal {L}}^{\alpha }(x^k,y^k,z^k)\}_{k\ge 0}\), and \(\{{\mathcal {F}}(x^k,y^k)\}_{k\ge 0}\) approach the same limit, and if \((x^*, y^*, z^*, y^*, z^*)\in \Omega \), then

    $$\begin{aligned} {\mathcal {R}} (x^*, y^*, z^*, y^*, z^*)= {\mathcal {L}}^{\alpha }(x^*, y^*, z^*) = {\mathcal {F}}(x^*,y^*). \end{aligned}$$

Proof

These results follow immediately from Lemmas 3, 8, and 12. \(\square \)

Theorem 2

(Convergence) Suppose that Assumptions 1 and 2 (ii) hold and \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) is a sequence generated by the PL-ADMM (4), which is assumed to be bounded. If the regularized augmented Lagrangian \({\mathcal {R}}\) defined in (29) satisfies the KŁ property on the limit point set \( \Omega :=\omega \big (\{(x^k,y^k,z^k,y^{k-1},z^{k-1})\}_{k\ge 1}\big );\) that is, for every \(v^*:=(x^*, y^*, z^*, y^*, z^*) \in \Omega \) there exist \(\epsilon >0\) and a desingularizing function \(\psi :[0,\eta ]\rightarrow [0,+\infty )\), for some \(\eta \in [0,+\infty )\) (see Sect. 2.2), such that for all \(v=(x,y,z,y',z')\in {\mathbb {R}}^n\times {\mathbb {R}}^m\times {\mathbb {R}}^p\times {\mathbb {R}}^m\times {\mathbb {R}}^p\) belonging to the following set

$$\begin{aligned} {\mathcal {S}}:=\Big \{v: \;\mathrm{dist}(v,\Omega )<\epsilon \;\mathrm{and} \; {\mathcal {R}}(v^*)<{\mathcal {R}}(v)< {\mathcal {R}}(v^*)+\eta \Big \}, \end{aligned}$$
(42)

the inequality \( \psi '\Big ({\mathcal {R}}(v)-{\mathcal {R}}(v^*)\Big ) \Vert \partial {\mathcal {R}}(v)\Vert _-\ge 1 \) holds. Then \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) satisfies the finite length property

$$\begin{aligned} \sum _{k=0}^{\infty } \Vert \Delta x^{k}\Vert + \Vert \Delta y^{k}\Vert + \Vert \Delta z^{k}\Vert <+\infty , \end{aligned}$$

and consequently converges to a stationary point of (1).

Proof

By Lemma 8, the monotonically decreasing sequence \(\{{\mathcal {R}}_k\}_{k\ge k_0}\), for some \(k_0\ge 1 \), converges; let \(\lim _{k\rightarrow \infty }{\mathcal {R}}_k:={\mathcal {R}}_{\infty }\). It immediately follows that the sequence \(\{{\mathcal {E}}_k\}_{k\ge k_0}\) defined by

$$\begin{aligned} {\mathcal {E}}_k:={\mathcal {R}}_k-{\mathcal {R}}_{\infty }, \end{aligned}$$
(43)

is non-negative, monotonically decreasing, and converges to 0. Let us consider two scenarios:

Scenario 1. There is \(k_1\ge k_0\) such that \({\mathcal {E}}_{k_1}=0\). It follows that \({\mathcal {E}}_k=0\) for all \(k\ge k_1\). By (5), (33), and the fact that the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) is bounded, we then have

$$\begin{aligned}{}&\sum _{k\ge 0}\Big (\Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \Big ) \le \\&\sum _{k=1}^{k_1} \Big (\Vert \Delta x^{k}\Vert +\Vert \Delta y^{k}\Vert +\Vert \Delta z^{k}\Vert \Big ) + \sqrt{3}\sum _{k=k_1}^{\infty } \sqrt{\Big (\Vert \Delta x^{k}\Vert ^2 +\Vert \Delta y^{k}\Vert ^2 +\Vert \Delta z^{k}\Vert ^2 \Big )} \le \\&\sum _{k=1}^{k_1} \underbrace{\Big (\Vert \Delta x^{k}\Vert +\Vert \Delta y^{k}\Vert +\Vert \Delta z^{k}\Vert \Big )}_{<+\infty } +\frac{\sqrt{3}}{\sqrt{\sigma }} \sum _{k=k_1}^{\infty }\underbrace{\sqrt{{\mathcal {E}}_k-{\mathcal {E}}_{k+1}}}_{=0} \;<+\infty . \end{aligned}$$

Scenario 2. The sequence \(\{{\mathcal {E}}_k\}_{k\ge k_0}\) is strictly positive. Since the sequence \(\{{\mathcal {R}}_{k}\}_{k\ge k_0}\) is monotonically decreasing to \({\mathcal {R}}_{\infty }\), there exists \( k_1\ge k_0\ge 1\) such that

$$\begin{aligned} {\mathcal {R}}_{\infty }<{\mathcal {R}}_k< {\mathcal {R}}_{\infty }+\eta ,\quad \forall k\ge k_1. \end{aligned}$$

Note that by Lemma 13, \(\Omega \) is nonempty, compact, and connected, and \({\mathcal {R}}\) takes the constant value \({\mathcal {R}}_{\infty }\) on \(\Omega \). Since, by Lemma 13, \(\lim _{k\rightarrow \infty } \mathrm{dist} \Big [ (x^k,y^k,z^k,y^{k-1},z^{k-1}), \Omega \Big ]=0\), there exists \(k_2\ge 1\) such that

$$\begin{aligned} \mathrm{dist} \Big [ (x^k,y^k,z^k,y^{k-1},z^{k-1}), \Omega \Big ]<\epsilon , \quad \forall k\ge k_2. \end{aligned}$$

Choose \({{\tilde{k}}} =\max \{ k_1,k_2,3 \}\). Then for \(k\ge {{\tilde{k}}}\), we have \((x^k,y^k,z^k,y^{k-1},z^{k-1})\in {\mathcal {S}}\), where \({\mathcal {S}}\) is defined in (42). It follows that

$$\begin{aligned} 1\le \psi '( {\mathcal {E}}_k)\cdot \Vert \partial {\mathcal {R}}_k\Vert _-. \end{aligned}$$
(44)

Multiplying both sides of this inequality by \(|||\Delta u^{k+1}|||^2=\Vert \Delta x^{k+1}\Vert ^2+\Vert \Delta y^{k+1}\Vert ^2+\Vert \Delta z^{k+1}\Vert ^2\), and using (33) and the concavity of \(\psi \) (recall: \(\psi ({\mathcal {E}}_{k}) -\psi ({\mathcal {E}}_{k+1})\ge \psi '({\mathcal {E}}_{k}) ({\mathcal {E}}_{k} - {\mathcal {E}}_{k+1})\)), we obtain

$$\begin{aligned} |||\Delta u^{k+1}|||^2\le & {} \psi '( {\mathcal {E}}_k) \cdot |||\Delta u^{k+1}|||^2 \cdot \Vert \partial {\mathcal {R}}_k\Vert _-\\\le & {} \psi '( {\mathcal {E}}_k)\cdot \frac{1}{\sigma } ({\mathcal {E}}_{k} - {\mathcal {E}}_{k+1}) \cdot \Vert \partial {\mathcal {R}}_k\Vert _- \\\le & {} \frac{1}{\sigma } \Big (\psi ({\mathcal {E}}_{k}) -\psi ({\mathcal {E}}_{k+1})\Big ) \cdot \Vert \partial {\mathcal {R}}_k \Vert _-. \end{aligned}$$

Taking square roots and applying the arithmetic mean-geometric mean inequality \(\sqrt{ab}\le \frac{\gamma }{2}a+\frac{1}{2\gamma }b\), which holds for any \(a,b\ge 0\) and \(\gamma >0\), yields

$$\begin{aligned} |||\Delta u^{k+1}|||\le \frac{\gamma }{2\sigma } \Big (\psi ({\mathcal {E}}_{k}) -\psi ({\mathcal {E}}_{k+1})\Big ) +\frac{1}{2\gamma } \Vert \partial {\mathcal {R}}_k\Vert _-. \end{aligned}$$
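The arithmetic mean-geometric mean step used here is the elementary bound \(\sqrt{ab}\le \frac{\gamma }{2}a+\frac{1}{2\gamma }b\) for \(a,b\ge 0\) and \(\gamma >0\), obtained by applying \(\sqrt{xy}\le \frac{1}{2}(x+y)\) to \(x=\gamma a\) and \(y=b/\gamma \). A quick numerical check (the random data below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random(1000) * 10.0   # arbitrary nonnegative test values
b = rng.random(1000) * 10.0

for gamma in (0.1, 1.0, 7.5):
    # sqrt(a*b) = sqrt((gamma*a) * (b/gamma)) <= ((gamma*a) + (b/gamma)) / 2
    bound = gamma * a / 2.0 + b / (2.0 * gamma)
    assert np.all(np.sqrt(a * b) <= bound + 1e-12)
```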

By Lemma 12 and (5) we then obtain

$$\begin{aligned} \Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \le \frac{\sqrt{3}\gamma }{2\sigma } \Big (\psi ({\mathcal {E}}_{k}) -\psi ({\mathcal {E}}_{k+1})\Big ) + \frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma }\Big (\Vert \Delta x^{k}\Vert +\Vert \Delta y^{k}\Vert +\Vert \Delta z^{k}\Vert \Big ).\nonumber \\ \end{aligned}$$
(45)

It is easy to validate the following identity for any sequence \(\{w^k\}_{k\ge 0}\) and any two integers \(i,I\in {\mathbb {Z}}_+\) with \(I\ge i\):

$$\begin{aligned} \sum _{k=i}^I\Vert \Delta w^{k}\Vert = \sum _{k=i}^{I} \Vert \Delta w^{k+1}\Vert +\Vert \Delta w^i\Vert -\Vert \Delta w^{I+1}\Vert . \end{aligned}$$
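This index-shift identity can be verified numerically for an arbitrary scalar sequence (a sketch; the sequence `w` below is random test data, not algorithm iterates):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(20)      # arbitrary sequence w^0, ..., w^19
dw = np.abs(np.diff(w))          # dw[k-1] = ||Delta w^k|| = |w^k - w^{k-1}|, k = 1..19

i, I = 3, 15                     # any 0 < i <= I with I + 1 still in range
lhs = dw[i - 1:I].sum()                          # sum_{k=i}^{I} ||Delta w^k||
rhs = dw[i:I + 1].sum() + dw[i - 1] - dw[I]      # sum_{k=i}^{I} ||Delta w^{k+1}|| + ||Delta w^i|| - ||Delta w^{I+1}||
assert np.isclose(lhs, rhs)
```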

Summing inequality (45) from \(k=i\ge {{\tilde{k}}} \) to \(k=I\ge i\), choosing \(\gamma \) large enough that \(1>\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma }\), defining \(\delta _0:=1-\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma }\), and rearranging, we obtain

$$\begin{aligned}&\sum _{k=i}^I\Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \le \frac{\sqrt{3}\gamma }{2\sigma \delta _0} \big (\psi ({\mathcal {E}}_{i}) -\psi ({\mathcal {E}}_{I+1})\big ) \\&\quad +\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma \delta _0} \Big (\Vert \Delta x^{i}\Vert +\Vert \Delta y^{i}\Vert +\Vert \Delta z^{i}\Vert \Big ) -\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma \delta _0} \Big (\Vert \Delta x^{I+1}\Vert +\Vert \Delta y^{I+1}\Vert +\Vert \Delta z^{I+1}\Vert \Big ). \end{aligned}$$

Since \({\mathcal {E}}_k\) is a monotonically decreasing positive sequence, \(\psi ({\mathcal {E}}_k)\ge \psi ({\mathcal {E}}_{k+1})>0\), and hence

$$\begin{aligned}&\sum _{k=i}^I\Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \le \frac{\sqrt{3}\gamma }{2\sigma \delta _0} \psi ({\mathcal {E}}_{i})+\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma \delta _0} \big (\Vert \Delta x^{i}\Vert +\Vert \Delta y^{i}\Vert +\Vert \Delta z^{i}\Vert \big ). \end{aligned}$$

The right-hand side of the latter inequality is finite; therefore the left-hand side sum is bounded for any \(I\ge i\). Letting \(I\) approach infinity, we obtain

$$\begin{aligned} \sum _{k\ge i}\Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \le \frac{\sqrt{3}\gamma }{2\sigma \delta _0} \psi ({\mathcal {E}}_{i})+\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma \delta _0} \Big (\Vert \Delta x^{i}\Vert +\Vert \Delta y^{i}\Vert +\Vert \Delta z^{i}\Vert \Big ).\nonumber \\ \end{aligned}$$
(46)

Since \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) is a bounded sequence, for any \(i\in {\mathbb {Z}}_+\) we clearly have

$$\begin{aligned} \lambda ({i}):=\sum _{k=1}^{i} \Vert \Delta x^{k}\Vert +\Vert \Delta y^{k}\Vert +\Vert \Delta z^{k}\Vert <+\infty . \end{aligned}$$
(47)

Thus, by combining (46) and (47) we conclude that \(\displaystyle {\sum _{k\ge 0} \Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert }\) is finite.

Now for any \(p,q\in {\mathbb {Z}}_+\) with \(q\ge p>0\), by (5) and the triangle inequality we have

$$\begin{aligned} |||u^q-u^p||| =||| \sum _{k=p}^{q-1} \Delta u^{k+1}||| \le \sum _{k=p}^{q-1} |||\Delta u^{k+1}|||\le & {} \sum _{k=p}^{q-1} \Big (\Vert \Delta x^{k+1}\Vert + \Vert \Delta y^{k+1}\Vert + \Vert \Delta z^{k+1}\Vert \Big )\\\le & {} \sum _{k\ge 0} \Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert < \infty . \end{aligned}$$

This implies that \(\{u^k\}_{k\ge 0}=\{(x^k,y^k,z^k)\}_{k\ge 0}\) is a Cauchy sequence and converges. Moreover, by Lemma 3, it converges to a stationary point. \(\square \)
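The closing step, finite length implies Cauchy, can be illustrated with a synthetic sequence whose increment norms are summable (the \(1/k^2\) decay and the direction vector below are illustrative, not derived from the algorithm):

```python
import numpy as np

# iterates u^k in R^3 whose increments Delta u^k have norms ~ 1/k^2 (summable)
ks = np.arange(1, 2001)
increments = (1.0 / ks**2)[:, None] * np.array([1.0, -0.5, 0.25])
u = np.cumsum(increments, axis=0)    # u^k = u^0 + sum of increments (u^0 = 0)

p, q = 100, 1500
# triangle inequality: ||u^q - u^p|| <= sum_{k=p}^{q-1} ||Delta u^{k+1}||
tail = np.linalg.norm(increments[p + 1:q + 1], axis=1).sum()
assert np.linalg.norm(u[q] - u[p]) <= tail + 1e-12
# summability makes the tails small, so the sequence is Cauchy
assert tail < 0.02
```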

Remark 3

Theorem 2 implies that the limit point set \(\omega (\{(x^k,y^k,z^k)\}_{k\ge 0})\) is a singleton. Let us denote by \((x^{\infty }, y^{\infty }, z^{\infty })\) the unique limit point of the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\).

Theorem 3

(Convergence rate of \({\mathcal {R}}_k\)) Suppose that Assumptions 1 and 2 (ii) hold. Let \((x^{\infty }, y^{\infty }, z^{\infty })\) be the unique limit point of the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) and denote \(v^{\infty }:=(x^{\infty }, y^{\infty }, z^{\infty },y^{\infty }, z^{\infty })\), \({\mathcal {E}}_k:={\mathcal {R}}_k-{\mathcal {R}}_{\infty }\), where \({\mathcal {R}}_{\infty }:={\mathcal {R}}(v^{\infty })=\lim _{k\rightarrow \infty } {\mathcal {R}}_k\). Suppose the regularized augmented Lagrangian \({\mathcal {R}}\) defined in (29) satisfies the KŁ property at \(v^{\infty }\) (see Sect. 2.2); that is, there exist an exponent \(\theta \in [0,1)\), \(C_L>0\), and \(\epsilon >0\) such that for all \(v:=(x,y,z,y',z')\) with \(\mathrm{dist}(v,v^{\infty })<\epsilon \) the following inequality holds

$$\begin{aligned} |{\mathcal {R}}(v)-{\mathcal {R}}(v^{\infty })|^{\theta }\le C_L \Vert \partial {\mathcal {R}}(v)\Vert _-. \end{aligned}$$
(48)

Then there exists \(K\ge 1\) such that

$$\begin{aligned} {{\bar{\alpha }}} {\mathcal {E}}_{k}^{2\theta }\le {\mathcal {E}}_{k-1}-{\mathcal {E}}_{k},\quad \forall k\ge K \end{aligned}$$
(49)

where \({{\bar{\alpha }}}>0\). Moreover,

  1. (a)

    if \(\theta =0\), then \({\mathcal {E}}_k\) converges to zero in a finite number of iterations.

  2. (b)

    if \(\theta \in (0, \frac{1}{2}]\), then for all \(k\ge K\) it holds

    $$\begin{aligned} {\mathcal {E}}_k\le \displaystyle {\frac{ \max \{{\mathcal {E}}_i:1\le i\le K\}}{(1+{{\bar{\alpha }}} {\mathcal {E}}_K^{2\theta -1})^{k-K+1} }}, \end{aligned}$$
    (50)
  3. (c)

    if \(\theta \in (\frac{1}{2},1)\) then there is a \(\mu >0\) such that for all \(k\ge K\) it holds

$$\begin{aligned} {\mathcal {E}}_{k}\le \Big (\frac{1}{\mu (k-K)+ {\mathcal {E}}_K^{1-2\theta }} \Big )^{\frac{1}{2\theta -1}}. \end{aligned}$$

Proof

By (40) and (5), for any \(k\ge 1\) we have

$$\begin{aligned} \frac{1}{3{{\tilde{\rho }}}^2}|||s^k|||^2 \le \Vert \Delta x^k\Vert ^2+\Vert \Delta y^k\Vert ^2+ \Vert \Delta z^k\Vert ^2. \end{aligned}$$
(51)

By Assumption 2(ii), there exists a \(k_0\ge 1\) such that for any \(k\ge k_0\) we have

$$\begin{aligned} \Vert \Delta x^{k}\Vert ^2+\Vert \Delta y^{k}\Vert ^2+\Vert \Delta z^{k}\Vert ^2 \le \frac{1}{\sigma } ({\mathcal {E}}_{k-1}-{\mathcal {E}}_{k}). \end{aligned}$$
(52)

Combining (51) and (52) leads to

$$\begin{aligned} \frac{1}{3{{\tilde{\rho }}}^2}|||s^k|||^2 \le \frac{1}{\sigma } ({\mathcal {E}}_{k-1}-{\mathcal {E}}_{k}). \end{aligned}$$
(53)

By assumption, \({\mathcal {R}}\) satisfies the Łojasiewicz property at \(v^{\infty }\), where \(v^{\infty }=\lim _{k\rightarrow \infty } v^k\). Together with the fact that \(\{{\mathcal {R}}_k\}_{k\ge k_0}\) is monotonically decreasing with \({\mathcal {R}}_{\infty }=\lim _{k\rightarrow \infty } {\mathcal {R}}_k\), this implies that there exist \(K\ge k_0\), \(\epsilon >0\), \(\theta \in [0,1)\), and \(C_L>0\) such that for all \(k\ge K\) we have \(\mathrm{dist}(v^k,v^{\infty })<\epsilon \) and

$$\begin{aligned} {\mathcal {E}}_k^{\theta }=|{\mathcal {R}}_k-{\mathcal {R}}_{\infty }|^{\theta }\le C_L\; \Vert \partial {\mathcal {R}}_k\Vert _- \le C_{L}|||s^k||| \end{aligned}$$

for some \(s^k\in \partial {\mathcal {R}}_k\). Squaring both sides gives

$$\begin{aligned} {\mathcal {E}}_k^{2\theta }\le C_L^2 |||s^k|||^2, \quad \forall k\ge K. \end{aligned}$$

This, together with (53), yields (49) with \({{\bar{\alpha }}}=\frac{\sigma }{3C_L^2{{\tilde{\rho }}}^2}>0\).

(a) Let \(\theta =0\). If \({\mathcal {E}}_k>0\) for all \(k\ge K\), we would have \(0<{{\bar{\alpha }}} \le {\mathcal {E}}_{k-1}-{\mathcal {E}}_k\). As k approaches infinity, the right-hand side approaches zero, which leads to a contradiction. Hence \({\mathcal {E}}_k\) must be equal to zero for some \(k\ge K\). Since \({\mathcal {E}}_k\) is monotonically decreasing and nonnegative, there is thus a \({{\tilde{k}}}\ge K\) such that \({\mathcal {E}}_k=0\) for all \(k\ge {{\tilde{k}}}\).

(b) If \(\theta \in (0,\frac{1}{2}]\), then \(2\theta -1<0\). Let \(k\ge K\) be fixed. The sequence \(\{{\mathcal {E}}_k\}_{k\ge K}\) is monotonically decreasing thus

$$\begin{aligned} {{\bar{\alpha }}} {\mathcal {E}}_K^{2\theta -1} {\mathcal {E}}_k\le {\mathcal {E}}_{k-1}-{\mathcal {E}}_{k},\quad \forall k \ge K. \end{aligned}$$

We rearrange this to obtain

$$\begin{aligned} {\mathcal {E}}_k \le \frac{ {\mathcal {E}}_{k-1}}{1+{{\bar{\alpha }}} {\mathcal {E}}_K^{2\theta -1}} \le \frac{ {\mathcal {E}}_{k-2}}{(1+{{\bar{\alpha }}} {\mathcal {E}}_K^{2\theta -1})^2} \le \dots \le \frac{{\mathcal {E}}_{K}}{(1+{{\bar{\alpha }}} {\mathcal {E}}_{K}^{2\theta -1})^{k-K}}, \quad k\ge K. \end{aligned}$$

Hence

$$\begin{aligned} {\mathcal {E}}_k \le \frac{\max \{{\mathcal {E}}_i: 1\le i\le K\}}{(1+{{\bar{\alpha }}} {\mathcal {E}}_K^{2\theta -1})^{k-K}}, \quad k\ge K. \end{aligned}$$
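The geometric decay in part (b) can be checked numerically by iterating the worst case of (49), where each step satisfies \({\mathcal {E}}_{k-1}-{\mathcal {E}}_k={{\bar{\alpha }}}{\mathcal {E}}_k^{2\theta }\). A minimal sketch; the values of \(\theta \), \({{\bar{\alpha }}}\), and the starting error are illustrative, not taken from the paper:

```python
def step(E_prev, alpha_bar, theta):
    """Solve E + alpha_bar * E**(2*theta) = E_prev for E in (0, E_prev) by bisection."""
    lo, hi = 0.0, E_prev
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid + alpha_bar * mid ** (2.0 * theta) < E_prev:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta, alpha_bar = 0.4, 0.1          # illustrative: theta in (0, 1/2]
E = [1.0]                            # E_K: illustrative starting error <= 1
for _ in range(50):
    E.append(step(E[-1], alpha_bar, theta))

# contraction factor from the displayed recursion, with E_K = E[0]
q = 1.0 / (1.0 + alpha_bar * E[0] ** (2.0 * theta - 1.0))
assert all(E[k] <= E[0] * q**k + 1e-9 for k in range(len(E)))
```

Since \({\mathcal {E}}_k\le {\mathcal {E}}_K\) and \(2\theta -1<0\), each exact step contracts at least as fast as the linear bound, so the iterates sit below the geometric envelope \({\mathcal {E}}_K q^{k}\).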

(c) Let \(\theta \in (\frac{1}{2},1)\). Rearrange (49) to obtain

$$\begin{aligned} {{\bar{\alpha }}} \le ({\mathcal {E}}_{k-1}-{\mathcal {E}}_{k}) {\mathcal {E}}_{k}^{-2\theta },\quad \forall k\ge K \end{aligned}$$
(54)

Let \(h:{\mathbb {R}}_+\rightarrow {\mathbb {R}}\) be defined by \(h(s)=s^{-2\theta }\) for \(s\in {\mathbb {R}}_+\). Clearly, h is monotonically decreasing (\(h'(s)=-2\theta s^{-(1+2\theta )}<0\)). Since \({\mathcal {E}}_k\le {\mathcal {E}}_{k-1}\) for all \(k\ge K\), we have \(h({\mathcal {E}}_{k-1}) \le h({\mathcal {E}}_{k})\) for all \(k\ge K\). We consider two cases. First, suppose there exists \(r_0\in (1,+\infty )\) such that

$$\begin{aligned} h({\mathcal {E}}_{k}) \le r_0 h({\mathcal {E}}_{k-1}),\quad \forall k>K. \end{aligned}$$

Hence, by (54) we obtain

$$\begin{aligned}&{{\bar{\alpha }}} \le r_0 ({\mathcal {E}}_{k-1}-{\mathcal {E}}_{k}) h({\mathcal {E}}_{k-1}) \le r_0 h({\mathcal {E}}_{k-1}) \int ^{{\mathcal {E}}_{k-1}}_{{\mathcal {E}}_{k}} 1ds \\&\le r_0 \int ^{{\mathcal {E}}_{k-1}}_{{\mathcal {E}}_{k}} h(s) ds = r_0 \int ^{{\mathcal {E}}_{k-1}}_{{\mathcal {E}}_{k}} s^{-2\theta } ds = \frac{r_0}{1-2\theta }[{\mathcal {E}}_{k-1} ^{1-2\theta } - {\mathcal {E}}_{k}^{1-2\theta }]. \end{aligned}$$

Note that \(1-2\theta <0\), so rearrange to get \( 0< \frac{{{\bar{\alpha }}}(2\theta -1)}{r_0} \le {\mathcal {E}}_{k}^{1-2\theta }-{\mathcal {E}}_{k-1} ^{1-2\theta }. \) Setting \({{\hat{\mu }}}=\frac{{{\bar{\alpha }}}(2\theta -1)}{r_0}>0\) and \(\nu :={1-2\theta }<0 \), one then obtains

$$\begin{aligned} 0<{{\hat{\mu }}}\le {\mathcal {E}}_{k}^{\nu }-{\mathcal {E}}_{k-1}^{\nu },\quad \forall k> K. \end{aligned}$$
(55)

Next, we consider the case where \(h({\mathcal {E}}_{k}) \ge r_0 h({\mathcal {E}}_{k-1})\). This yields \({\mathcal {E}}_{k}^{-2\theta } \ge r_0 {\mathcal {E}}_{k-1}^{-2\theta }\). Rearranging this gives \(r_0^{-1}{\mathcal {E}}_{k-1}^{2\theta } \ge {\mathcal {E}}_{k}^{2\theta }\), which by raising both sides to the power \(1/2\theta \) and setting \(q:=r_0^{-\frac{1}{2\theta }}\in (0,1)\) leads to \( q{\mathcal {E}}_{k-1}\ge {\mathcal {E}}_{k}. \) Since \(\nu =1-2\theta <0\), \(q^{\nu } {\mathcal {E}}_{k-1}^{\nu }\le {\mathcal {E}}_{k}^{\nu }\), which implies that

$$\begin{aligned} (q^{\nu }-1) {\mathcal {E}}_{k-1}^{\nu }\le {\mathcal {E}}_{k}^{\nu } -{\mathcal {E}}_{k-1}^{\nu }. \end{aligned}$$

By the fact that \(q^{\nu }-1>0\) and \({\mathcal {E}}_p\rightarrow 0^+\) as \(p\rightarrow \infty \), there exists \({{\bar{\mu }}}>0\) such that \((q^{\nu }-1) {\mathcal {E}}_{k-1}^{\nu }> {{\bar{\mu }}}\) for all \(k> K\). Therefore we obtain

$$\begin{aligned} 0<{{\bar{\mu }}} \le {\mathcal {E}}_{k}^{\nu } -{\mathcal {E}}_{k-1}^{\nu }. \end{aligned}$$
(56)

Choosing \(\mu =\min \{{{\hat{\mu }}},{{\bar{\mu }}}\}>0\), one can combine (55) and (56) to obtain

$$\begin{aligned} 0<\mu \le {\mathcal {E}}_{k}^{\nu } -{\mathcal {E}}_{k-1}^{\nu },\quad \forall k> K. \end{aligned}$$

Summing this inequality from \(K+1\) to some \(k\ge K+1\) gives \( \mu (k-K)+ {\mathcal {E}}_K^{\nu } \le {\mathcal {E}}_{k}^{\nu } . \) Hence \( \displaystyle { {\mathcal {E}}_{k}\le ( \mu (k-K)+ {\mathcal {E}}_K^{\nu } )^{1/\nu } =( \mu (k-K)+ {\mathcal {E}}_K^{1-2\theta } )^{1/ (1-2\theta )}.}\) This concludes the proof. \(\square \)
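Part (c) can be illustrated in the same way: with \(\theta \in (\frac{1}{2},1)\), iterating the worst case of (49) produces errors obeying the sublinear bound \({\mathcal {E}}_k\le (\mu k+{\mathcal {E}}_0^{\nu })^{1/\nu }\) with \(\nu =1-2\theta \). A hedged sketch with illustrative constants:

```python
def step(E_prev, alpha_bar, theta):
    """Solve E + alpha_bar * E**(2*theta) = E_prev for E in (0, E_prev) by bisection."""
    lo, hi = 0.0, E_prev
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mid + alpha_bar * mid ** (2.0 * theta) < E_prev:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

theta, alpha_bar = 0.75, 0.5         # illustrative: theta in (1/2, 1)
nu = 1.0 - 2.0 * theta               # nu < 0, as in the proof
E = [1.0]
for _ in range(200):
    E.append(step(E[-1], alpha_bar, theta))

# as in the proof, the gaps E_k^nu - E_{k-1}^nu are positive and bounded below
gaps = [E[k] ** nu - E[k - 1] ** nu for k in range(1, len(E))]
mu = min(gaps)
assert mu > 0.0
# summing the gaps yields the sublinear rate E_k <= (mu*k + E_0^nu)**(1/nu)
assert all(E[k] <= (mu * k + E[0] ** nu) ** (1.0 / nu) + 1e-9 for k in range(1, len(E)))
```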

Theorem 4

(Convergence rate of the PL-ADMM sequence) Suppose that Assumptions 1 and 2(ii) hold and \(u^{\infty }:=(x^{\infty }, y^{\infty }, z^{\infty })\) is the unique limit point of the sequence \(\{(x^k,y^k,z^k)\}_{k\ge 0}\) generated by the PL-ADMM algorithm. If the regularized augmented Lagrangian \({\mathcal {R}}\) (29) satisfies the KŁ property at \(v^{\infty }:=(x^{\infty }, y^{\infty }, z^{\infty },y^{\infty }, z^{\infty })\), then the following assertions hold:

  1. (a)

    There exists a \(K\ge 1\) such that for all \(k\ge K\) we have

    $$\begin{aligned} |||u^k-u^{\infty }|||\le c \max \{ \psi ({\mathcal {E}}_{k}), \sqrt{{\mathcal {E}}_{k-1}} \}, \end{aligned}$$
    (57)

where \(c>0\) is a constant, \({\mathcal {E}}_k:={\mathcal {R}}_k-{\mathcal {R}}_{\infty }\), \({\mathcal {R}}_{\infty }:=\displaystyle {\lim _{k\rightarrow \infty } {\mathcal {R}}_k}\), and \(\psi :[0,\eta )\rightarrow [0,+\infty )\) where \(\eta >0\) denotes a desingularizing function (see Sect. 2.2).

  2. (b)

    If \(\psi :[0,\eta )\rightarrow [0,+\infty )\) is defined by \(\psi (s)=s^{1-\theta }\) where \(\eta >0\) and \(\theta \in [0,1)\) then

    1. (i)

      If \(\theta =0\), then \(u^k\) converges to \(u^{\infty }\) in a finite number of iterations.

    2. (ii)

      If \(\theta \in (0, \frac{1}{2}]\), then for all \(k\ge K\) it holds

      $$\begin{aligned} |||u^k-u^{\infty }||| \le \frac{ \max \{\sqrt{{\mathcal {E}}_i}:1\le i\le K\} }{\sqrt{(1+{{\bar{\alpha }}} {\mathcal {E}}_K^{2\theta -1})^{k-K+1}}},\quad \quad {{\bar{\alpha }}}>0. \end{aligned}$$
    3. (iii)

      If \(\theta \in (\frac{1}{2},1)\) then

$$\begin{aligned} |||u^k-u^{\infty }||| \le \Big (\frac{1}{\mu (k-K-1)+{{\mathcal {E}}_K}^{1-2\theta }} \Big )^{\frac{1-\theta }{2\theta -1}},\quad \forall k\ge K. \end{aligned}$$

Proof

Part (a). By Assumption 2(ii), let \(k_0\ge 1\) be such that the sequence \(\{{\mathcal {E}}_k\}_{k\ge k_0}\) is monotonically decreasing. By (33) and (5), for all \(k\ge k_0+1\) we have

$$\begin{aligned} \Vert \Delta x^k\Vert +\Vert \Delta y^k\Vert +\Vert \Delta z^k\Vert \le \frac{\sqrt{3}}{\sqrt{\sigma }} \sqrt{{\mathcal {E}}_{k-1}-{\mathcal {E}}_{k}} \le \frac{\sqrt{3}}{\sqrt{\sigma }} \sqrt{{\mathcal {E}}_{k-1}}. \end{aligned}$$
(58)

Since \({\mathcal {R}}_k\) and \(v^k\) converge to \({\mathcal {R}}_{\infty }\) and \(v^{\infty }\), respectively, and \({\mathcal {R}}\) satisfies the KŁ property at \(v^{\infty }\), there exist \(\epsilon >0\), \(\eta >0\), a desingularizing function \(\psi :[0,\eta )\rightarrow [0,+\infty )\), and \(K\ge k_0+1\) such that for all \(k\ge K\) we have \(\mathrm{dist}(v^k, v^{\infty }) <\epsilon \), \({\mathcal {R}}_{\infty }<{\mathcal {R}}_k< {\mathcal {R}}_{\infty }+\eta \), and the following KŁ property holds

$$\begin{aligned} \psi '\big ({\mathcal {E}}_k\big )\cdot \Vert \partial {\mathcal {R}}_k\Vert _-\ge 1. \end{aligned}$$
(59)

By (59), the concavity of \(\psi \), and (33) we have \( |||\Delta u^{k+1}|||^2 \le \frac{1}{\sigma } \big (\psi ({\mathcal {E}}_{k}) -\psi ({\mathcal {E}}_{k+1})\big ) \cdot \Vert \partial {\mathcal {R}}_k\Vert _-. \) Taking square roots and applying the arithmetic mean-geometric mean inequality, for any \(\gamma >0\) we have

$$\begin{aligned} \Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \le \frac{\sqrt{3}\gamma }{2\sigma } \big (\psi ({\mathcal {E}}_{k}) -\psi ({\mathcal {E}}_{k+1})\big ) +\frac{\sqrt{3}}{2\gamma } \Vert \partial {\mathcal {R}}_k\Vert _-. \end{aligned}$$

Lemma 12 leads us to

$$\begin{aligned} \Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \le \frac{\sqrt{3}\gamma }{2\sigma } \psi ({\mathcal {E}}_{k}) + \frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma }\Big (\Vert \Delta x^{k}\Vert +\Vert \Delta y^{k}\Vert +\Vert \Delta z^{k}\Vert \Big ).\nonumber \\ \end{aligned}$$
(60)

Let \(\gamma >0\) large enough such that \(1>\sqrt{3}{{\tilde{\rho }}}/2\gamma \). Denote \(\delta _0:=1-\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma }\), then sum up the latter inequality over \(k\ge K\) to get

$$\begin{aligned}&\sum _{k\ge K}\Vert \Delta x^{k+1}\Vert +\Vert \Delta y^{k+1}\Vert +\Vert \Delta z^{k+1}\Vert \le \frac{\sqrt{3}\gamma }{2\sigma \delta _0} \psi ({\mathcal {E}}_K)+\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma \delta _0} \Big (\Vert \Delta x^K\Vert +\Vert \Delta y^K\Vert +\Vert \Delta z^K\Vert \Big ). \end{aligned}$$

Hence by the triangle inequality for any \(k\ge K\) it holds

$$\begin{aligned} |||u^k-u^{\infty }|||\le & {} \sum _{p\ge k}|||\Delta u^{p+1}|||\\\le & {} \sum _{p\ge k} \Vert \Delta x^{p+1}\Vert +\Vert \Delta y^{p+1}\Vert +\Vert \Delta z^{p+1}\Vert \\\le & {} \frac{\sqrt{3}\gamma }{2\sigma \delta _0} \psi ({\mathcal {E}}_k) +\frac{\sqrt{3}{{\tilde{\rho }}}}{2\gamma \delta _0} \Big (\Vert \Delta x^{k}\Vert +\Vert \Delta y^{k}\Vert +\Vert \Delta z^{k}\Vert \Big ). \end{aligned}$$

Use (58) to get

$$\begin{aligned} |||u^k-u^{\infty }|||\le \frac{\sqrt{3}\gamma }{2\sigma \delta _0} \psi ({\mathcal {E}}_k) +\frac{3{{\tilde{\rho }}}}{2\gamma \delta _0\sqrt{\sigma }} \sqrt{{\mathcal {E}}_{k-1}} \le c\max \{ \psi ({\mathcal {E}}_{k}), \sqrt{{\mathcal {E}}_{k-1}} \}, \end{aligned}$$

where \( c=\frac{\sqrt{3}\gamma }{2\sigma \delta _0}+ \frac{3{{\tilde{\rho }}}}{2\gamma \delta _0\sqrt{\sigma }}. \)

Part (b). Let \(\theta \in [0,1)\) and \(\psi (s)=s^{1-\theta }\). By part (a) we have

$$\begin{aligned} |||u^k-u^{\infty }||| \le c\max \{ {\mathcal {E}}_{k}^{1-\theta }, \sqrt{{\mathcal {E}}_{k-1}} \},\quad k\ge K. \end{aligned}$$
(61)

Exploiting \(\psi '(s)=(1-\theta )s^{-\theta }\) in (59), we obtain \( {{\mathcal {E}}_k}^{\theta }\le (1-\theta )\Vert \partial {\mathcal {R}}_k\Vert _-,\;\forall k\ge K. \) This implies that \({\mathcal {R}}\) satisfies the Łojasiewicz inequality (48) at \(v^{\infty }\) for all \(k\ge K\) with \(C_L=1-\theta \).

(i) If \(\theta =0\), by Theorem 3(a) \({\mathcal {E}}_k\) goes to zero in a finite number of iterations. Hence by (61) \(u^k\) must converge to \(u^{\infty }\) in a finite number of iterations. (ii) If \(\theta \in (0,\frac{1}{2}]\), then for k large enough (so that \({\mathcal {E}}_{k-1}<1\)) we have \(\max \{{\mathcal {E}}_{k}^{1-\theta }, \sqrt{{\mathcal {E}}_{k-1}} \} =\sqrt{{\mathcal {E}}_{k-1}}\). By Theorem 3(b) we have

$$\begin{aligned} |||u^k-u^{\infty }||| \le \frac{ \max \{\sqrt{{\mathcal {E}}_i}:1\le i\le K\} }{\sqrt{(1+{{\bar{\alpha }}} {\mathcal {E}}_K^{2\theta -1})^{k-K+1} }},\quad \forall k\ge K \end{aligned}$$

where \({{\bar{\alpha }}} =\frac{\sigma }{3C_L^2{{\tilde{\rho }}}^2}\) with \(C_L=1-\theta \). (iii) If \(\theta \in (1/2,1)\), then for k large enough \(\max \{{\mathcal {E}}_{k}^{1-\theta }, \sqrt{{\mathcal {E}}_{k-1}} \} \le {\mathcal {E}}_{k-1}^{1-\theta }\). By Theorem 3(c) we have

$$\begin{aligned} |||u^k-u^{\infty }||| \le \Big (\frac{1}{\mu (k-K-1)+{{\mathcal {E}}_K}^{1-2\theta }} \Big )^{\frac{1-\theta }{2\theta -1}},\quad \forall k\ge K. \end{aligned}$$

This completes the proof. \(\square \)

6 Concluding remarks

In this paper, we considered the variable metric proximal linearized ADMM (PL-ADMM) method described in (4) and established its theoretical analysis. This algorithm solves a broad class of linearly constrained nonconvex and nonsmooth optimization problems of the form (1). We proved the algorithm's convergence by showing that the sequence generated by the PL-ADMM has finite length, and thus is Cauchy. Under the powerful Kurdyka-Łojasiewicz (KŁ) property, we established convergence rates for the regularized augmented Lagrangian function value as well as for the PL-ADMM sequence. We showed that different values of the KŁ exponent of the desingularizing function associated with the objective function yield different convergence rates for PL-ADMM. For instance, we proved three rates for the desingularizing function \(\psi (s)=s^{1-\theta }\), with \(\theta \in [0,1)\). If \(\theta =0\), the sequence generated by PL-ADMM converges in a finite number of iterations. If \(\theta \in (0,1/2]\), then the sequential rate of convergence is \(cQ^{k}\) where \(c>0\), \(Q\in (0,1)\), and \(k\in {\mathbb {N}}\) is the iteration number. If \(\theta \in (1/2,1)\), then the rate \({\mathcal {O}}(1/k^{r})\) where \(r=(1-\theta )/(2\theta -1)\) is achieved.