Abstract
In this paper, we propose an alternating linearization bundle method for minimizing the sum of a nonconvex function and a convex function, neither of which is necessarily differentiable. The nonconvex function is first locally “convexified” by adding a quadratic term, and a cutting-planes model of this local convexification is then generated. The convex function is assumed to be “simple” in the sense that finding its proximal-like point is relatively easy. At each iteration, the method solves two subproblems whose objectives alternately employ linearizations of the cutting-planes model and of the convex objective function. It is proved that the sequence of iteration points converges to a stationary point. Numerical results demonstrate the good performance of the method.
1 Introduction
In this paper, we consider the structured nonconvex minimization problem
where \(f: \mathbb{R}^{n}\rightarrow\mathbb{R}\) is possibly a nonconvex nonsmooth function and \(h: \mathbb{R}^{n}\rightarrow(-\infty,\infty]\) is a closed proper convex function.
Problems of the form (1) often arise in practice, for example in signal processing, image reconstruction, engineering design, and optimal control. Three typical examples are given below.
Example 1
(Unconstrained transformation of a constrained problem)
Consider the constrained problem
where f is possibly a nonsmooth nonconvex function and C is a convex subset of \(\mathbb{R}^{n}\). Problem (2) can be written equivalently as
where \(\imath_{C}\) is the indicator function of C, i.e., \(\imath _{C}(x)\) equals 0 on C and infinity elsewhere. Clearly, problem (3) is a special case of problem (1) with \(h(x)=\imath_{C}(x)\). We note that the proximal point of \(\imath_{C}\) can easily be calculated or even has a closed-form solution if C has some special structure.
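For instance, when C is a Euclidean ball, the proximal point of \(\imath_{C}\) is simply the projection onto the ball. A minimal sketch in Python (the function name and list-based vectors are our own illustrative choices):

```python
import math

def prox_indicator_ball(x, a, b):
    """Proximal point of the indicator of C = {w : ||w - a|| <= b}.
    Minimizing i_C(w) + (R/2)||w - x||^2 over w is, for any R > 0,
    exactly the Euclidean projection of x onto the ball."""
    d = [xi - ai for xi, ai in zip(x, a)]
    norm = math.sqrt(sum(di * di for di in d))
    if norm <= b:
        return list(x)          # x already lies in C
    t = b / norm                # scale the offset back onto the sphere
    return [ai + t * di for ai, di in zip(a, d)]
```

For example, projecting \((3,0)^{T}\) onto the unit ball centered at the origin gives \((1,0)^{T}\).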
Example 2
(Nonconvex regularization of a convex function)
Consider the \(l_{q}\ (0< q<1)\) regularization problem
which has many practical applications in compressed sensing and imaging science (see e.g., [1]), where \(\Vert x \Vert _{q}=(\sum_{i=1}^{n} \vert x_{i} \vert ^{q})^{1/q}\). The objective function of problem (4) is also the sum of a convex function and a nonconvex function.
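A quick numerical check (in Python; illustrative only) confirms the nonconvexity of the regularizer \(\sum_{i}\vert x_{i}\vert^{q}\) for \(q=1/2\): the value at a midpoint exceeds the average of the endpoint values, which a convex function never allows.

```python
def lq_power(x, q):
    """Evaluate sum_i |x_i|^q, the q-th power of the l_q quasi-norm (0 < q < 1)."""
    return sum(abs(xi) ** q for xi in x)

q = 0.5
x, y = [1.0, 0.0], [0.0, 1.0]
mid = lq_power([(xi + yi) / 2 for xi, yi in zip(x, y)], q)  # value at the midpoint
avg = (lq_power(x, q) + lq_power(y, q)) / 2                 # average of the endpoints
# mid = 2 * 0.5**0.5 (about 1.414) exceeds avg = 1.0, violating convexity
```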
Example 3
(Convex regularization of a nonconvex function)
Hare et al. [2] studied the function of the form
where \(f_{i}(x): \mathbb{R}^{n}\rightarrow\mathbb{R}, i=1,\ldots,n\) are Ferrier polynomials defined as
It is well known that \(f(x)=\sum^{n}_{i=1} \vert f_{i}(x) \vert \) is a nonconvex nonsmooth function, and \(h(x)=\frac{1}{2} \Vert x \Vert ^{2}\) is a simple convex function.
The methods for minimizing the sum of two functions have been well studied during the past several decades, and different methods have been developed depending on the properties of the two functions; see e.g., [3–12]. In particular, Kiwiel [9] proposed an alternating linearization bundle method for the sum of two convex functions, one of which is “simple” (i.e., minimizing this function plus a separable convex quadratic function is “easy”). Goldfarb et al. [7] proposed fast alternating linearization methods for the sum of two convex functions, both of which are “simple”. Li et al. [10] presented a proximal alternating linearization method for the sum of two nonconvex functions, based on the assumption that the proximal points of the two functions can easily be calculated. Attouch et al. [3] and Bolte et al. [4] considered a broad class of nonconvex and nonsmooth minimization problems, which includes minimizing the sum of two nonconvex functions as a special case; there the proximal alternating minimization technique is used and the Kurdyka–Łojasiewicz property is assumed.
In this paper, we consider minimizing the sum of a nonconvex function and a convex function in the form of (1). In particular, we assume that f is lower-\(C^{2}\) and h is “simple” in the sense that minimizing h plus a quadratic term is relatively easy. The method presented in this paper can be viewed as a generalized version of the methods given in [9] and [13]. On the one hand, we generalize the method of [9] from minimizing the sum of two convex functions to minimizing the sum of a nonconvex function and a convex function. On the other hand, we generalize the method of [13] from minimizing a single nonconvex nonsmooth function to minimizing the sum of two functions.
Our method will produce three sequences of points: \(\{z^{\ell}\}\), \(\{ y^{\ell}\}\) and \(\{x^{k(\ell)}\}\), where \(\{z^{\ell}\}\) is the sequence of proximal points, \(\{y^{\ell}\}\) is the sequence of trial points, and \(\{x^{k(\ell)}\}\) is the sequence of stability centers (i.e., \(x^{k(\ell)}\in\{y^{\ell}\}\) is the “best” point obtained so far for iteration ℓ, which will be abbreviated as \(x^{k}\) if there is no confusion). More precisely, our method will alternately solve the following two subproblems:
where \(\check{\varphi}_{\ell}\) is a cutting-planes model [14, 15] of the local convexification function of f at iteration ℓ, which is based on the idea of the redistributed proximal bundle method in [13] and will be made more precise later; \(\bar{h}_{\ell-1}\) is a linearization of h at iteration \(\ell-1\); \(\bar{\varphi}_{\ell}\) is a linearization of \(\check{\varphi }_{\ell}\); \(\mu_{\ell}\) is the proximal parameter. Our convergence analysis shows that, under suitable assumptions, any accumulation point of the sequence \(\{x^{k}\}\) is a stationary point of F if there is an infinite number of serious steps; otherwise, the last stability center is a stationary point of F.
This paper is organized as follows. In Sect. 2, we review some basic definitions and results required for this work. In Sect. 3, we present the alternating linearization bundle method for problem (1). Section 4 examines the convergence properties of the algorithm. Some preliminary numerical results are given in Sect. 5. The Euclidean inner product in \(\mathbb{R}^{n}\) is denoted by \(\langle x, y\rangle=x^{T}y\), and the associated norm by \(\Vert \cdot \Vert \).
2 Preliminaries
In this section, we recall some basic definitions and results that are closely relevant to our method, which can be found in [13, 16, 17].
-
The limiting subdifferential of f at x̄ is defined by
$$\partial f(\bar{x}):=\limsup_{x\rightarrow\bar{x},\, f(x)\rightarrow f(\bar{x})} \hat{\partial}f(x), $$where \(\hat{\partial}f(\bar{x})\) is the regular subdifferential defined by
$$\hat{\partial}f(\bar{x}):=\biggl\{ g\in\mathbb{R}^{n}: \liminf_{x\rightarrow\bar{x},\, x\neq\bar{x}}\frac{f(x)-f(\bar{x})-\langle g,x-\bar {x}\rangle}{ \Vert x-\bar{x} \Vert }\geq0\biggr\} . $$An element \(g\in\partial f(\bar{x})\) is called a subgradient of f at x̄.
-
The function f is prox-bounded if there exists \(R\geq0\) such that the function \(f(\cdot)+\frac{1}{2}R \Vert \cdot \Vert ^{2}\) is bounded below. The corresponding threshold \(r_{pb}\geq0\) is the smallest value such that \(f(\cdot)+\frac{1}{2}R \Vert \cdot \Vert ^{2}\) is bounded below for every \(R>r_{pb}\).
-
The function f is lower-\(C^{2}\) on an open set V if for each \(\bar{x}\in V\) there is a neighborhood \(V'\) of x̄ upon which a representation \(f(x)=\max_{t\in T} f_{t}(x)\) holds, where T is a compact set and the functions \(f_{t}\) are of class \(C^{2}\) on V such that \(f_{t}\), \(\nabla f_{t}\) and \(\nabla^{2}f_{t}\) depend continuously on \((t,x)\in T\times V\).
-
The proximal point mapping of the function f at the point \(x\in\mathbb{R}^{n}\) is defined by
$$p_{R}f(x):=\arg\min_{\omega\in\mathbb{R}^{n}} \biggl\{ f(\omega )+ \frac{1}{2}R \Vert \omega-x \Vert ^{2} \biggr\} . $$
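For a concrete 1-D instance, the proximal point of \(f(\omega)=\vert\omega\vert\) has the familiar soft-threshold form. The sketch below (illustrative Python; names are our own) computes it and verifies it against the definition by brute force:

```python
def prox_abs(x, R):
    """p_R f(x) for f(w) = |w|: argmin_w |w| + (R/2)(w - x)^2,
    which is the soft-threshold of x by 1/R."""
    t = 1.0 / R
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

# Brute-force check against the definition on a fine grid.
R, x = 2.0, 1.3
grid = [i / 10000.0 for i in range(-30000, 30001)]
w_star = min(grid, key=lambda w: abs(w) + 0.5 * R * (w - x) ** 2)
```

Here `prox_abs(1.3, 2.0)` gives 0.8, matching the grid minimizer `w_star`.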
Lemma 1
([16])
Suppose that the function f is lower-\(C^{2}\) on V and \(\bar{x}\in V\). Then there exist \(\varepsilon>0, K>0\), and \(\rho>0\) such that
-
(i)
for any point \(x^{0}\) and parameter \(R\geq\rho\) the function \(f+\frac{1}{2}R \Vert \cdot-x^{0} \Vert ^{2}\) is convex and finite valued on the closed ball \(\bar{B}_{\varepsilon }(\bar{x})\), and
-
(ii)
the function f is Lipschitz continuous with constant K on \(\bar{B}_{\varepsilon}(\bar{x})\).
Theorem 1
([16])
Suppose that the lower semicontinuous function f is prox-bounded with threshold \(r_{pb}\) and lower-\(C^{2}\) on V. Let \(\bar{x}\in V\) and let \(\varepsilon>0,K>0\) and \(\rho>0\) be given by Lemma 1. Then x̄ is a stationary point of f if and only if \(\bar {x}=p_{R}f(\bar{x})\) for any \(R>R_{\bar{x}}:=\max\{4K/\varepsilon, \rho, r_{pb}\}\).
Assumption 1
([13])
Given \(x^{0}\in\mathbb{R}^{n}\) and \(M_{0}\geq0\), there exist an open bounded set \(\mathcal{O}\) and a function H such that \(\mathcal{L}_{0}:=\{x\in\mathbb{R}^{n}: f(x)\leq f(x^{0})+M_{0}\}\subset\mathcal{O}\), and H is lower-\(C^{2}\) on \(\mathcal{O}\) with \(H\equiv f\) on \(\mathcal{L}_{0}\).
Theorem 2
([13])
For a function f satisfying Assumption 1, the following results hold:
-
(i)
The level set \(\mathcal{L}_{0}\) is nonempty and compact.
-
(ii)
There exists \(\rho^{\mathrm{id}}>0\) such that, for any \(\rho\geq\rho^{\mathrm{id}}\) and any given \(y\in\mathcal{L}_{0}\), the function \(f+\frac{1}{2}\rho \Vert \cdot-y \Vert ^{2}\) is convex on \(\mathcal{L}_{0}\).
-
(iii)
The function f is Lipschitz continuous on \(\mathcal{L}_{0}\).
3 The alternating linearization bundle method
3.1 Motivation and framework
The classic proximal point algorithm (see e.g. [18]) for solving problem (1) generates the new iterate by
where \(R_{\ell}>0\) is the proximal parameter.
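Conceptually, the scheme reads \(y^{\ell+1}=\arg\min_{\omega}\{F(\omega)+\frac{1}{2}R_{\ell}\Vert\omega-y^{\ell}\Vert^{2}\}\). A toy 1-D sketch in Python illustrates this (brute-force subproblem solves on a grid, with the simple convex choice \(F(\omega)=\vert\omega\vert\); purely illustrative, not the method proposed in this paper):

```python
def ppa_step(y, R, F, grid):
    """One proximal point iteration: minimize F(w) + (R/2)(w - y)^2,
    here by brute force on a grid (the conceptual scheme, not a practical solver)."""
    return min(grid, key=lambda w: F(w) + 0.5 * R * (w - y) ** 2)

F = lambda w: abs(w)                          # simple convex model problem
grid = [i / 100.0 for i in range(-500, 501)]
traj, y = [3.0], 3.0
for _ in range(4):
    y = ppa_step(y, 1.0, F, grid)
    traj.append(y)
# each iteration soft-thresholds by 1/R = 1: 3 -> 2 -> 1 -> 0 -> 0
```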
However, since f is a nonconvex function, solving problem (8) may not be easy and is usually as difficult as the original problem (1). Therefore, we will tackle the difficulty via the following three steps.
1. Generate the local convexification function of the nonconvex function f. Following the redistribution idea of [13], we split the prox-parameter \(R_{\ell}\) into two nonnegative dynamic parameters \(\eta_{\ell}\) and \(\mu_{\ell}\) satisfying \(R_{\ell}=\eta_{\ell}+\mu_{\ell}\). Then problem (8) (after replacing \(y^{\ell}\) by \(x^{k}\)) can be written as
where
is called the local convexification function of f, since it is convex whenever \(\eta_{\ell}\) is large enough (see Theorem 2).
2. Generate the cutting-planes model of \(\varphi_{\ell}\). Let ℓ be the current iteration index, \(y^{i}\), \(i\in J_{\ell }\subseteq\{0,1,\ldots,\ell\}\) be trial points generated in the previous iterations, and \(g_{f}^{i}\in\partial f(y^{i})\). Define the cutting-planes model of \(\varphi_{\ell}\) by
where \(g_{f}^{i}=g_{f}(y^{i})\in\partial f(y^{i})\). Therefore, we obtain an approximate version of problem (9) as follows:
3. Apply the alternating linearization strategy to solve problem (12). Since problem (12) may still be difficult, motivated by the idea of the alternating linearization bundle method [9], we alternately solve the following two subproblems:
The above two subproblems are much easier to solve; their objective functions alternately employ linear models of \(h(\cdot)\) and \(\check{\varphi}_{\ell}(\cdot)\), respectively.
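To make the cutting-planes construction concrete, consider a 1-D sketch in Python (illustrative names; each bundle element contributes the tangent plane of the convexified function at a trial point, and the model is the pointwise maximum of these planes):

```python
def cp_model(w, xk, bundle, eta):
    """Cutting-planes model of phi(w) = f(w) + (eta/2)(w - xk)^2 (1-D sketch).
    bundle: list of (y, fy, g) with fy = f(y) and g a subgradient of f at y;
    the model is the max of the tangent planes of phi at the trial points."""
    return max(fy + 0.5 * eta * (y - xk) ** 2
               + (g + eta * (y - xk)) * (w - y)
               for y, fy, g in bundle)

# Nonconvex f(x) = -x^2 with center xk = 0: choosing eta = 3 (above the
# convexity threshold 2) makes phi(w) = 0.5 * w^2 convex, so the tangent
# planes, and hence the model, minorize phi.
f = lambda x: -x * x
g = lambda x: -2 * x          # derivative of f
xk, eta = 0.0, 3.0
bundle = [(y, f(y), g(y)) for y in (-1.0, 0.0, 1.0)]
```

For example, `cp_model(0.0, xk, bundle, eta)` returns \(f(x^{k})=0\), and the model stays below \(\varphi(w)=0.5w^{2}\) at every point tested.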
3.2 Further description via bundle terminologies
Bundle methods [19–21] are among the most robust and reliable methods for general nonsmooth optimization problems, and can be considered stabilized variants of the cutting-planes method [14, 15]. In general, for a convex function h, bundle methods store the trial points \(y^{i}\), \(i\in J_{\ell}\), together with their function values and subgradients, in a bundle of information:
and a point \(x^{k}:=x^{k(\ell)}\) (called the stability center), which is the “best” point obtained so far. A storage-saving form of (15) (referred to the current stability center \(x^{k}\)) is given by
where \(\partial_{e}h\) is the e-subdifferential of h in convex analysis, and \(e^{i,k}_{h}\) are the linearization errors of h defined by
Following the notation above, the bundle information of the function \(\varphi_{\ell}(\cdot)\) can be written as (see also [13]):
where \(e^{i,k}_{f}\) and \(d^{k}_{i}\) are the linearization errors of f and \(\frac{1}{2} \Vert \cdot-x^{k} \Vert ^{2}\), respectively, \(g^{i}_{f}\) is a subgradient of f at \(y^{i}\), and \(\Delta^{k}_{i}\) is the gradient of \(\frac{1}{2} \Vert \cdot-x^{k} \Vert ^{2}\) at \(y^{i}\). From (17), we know that \(e^{i,k}_{f}\), \(d^{k}_{i}\) and \(\Delta^{k}_{i}\) depend on the point \(x^{k}\), so they should be updated whenever a new stability center is generated (details are given below).
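In a 1-D setting, the three bundle quantities in (17) can be computed directly. The sketch below (illustrative names) also shows the point made later in this section: \(e^{i,k}_{f}\) may be negative when f is nonconvex, and the convexification term compensates for this.

```python
def bundle_elements(f_xk, xk, y, fy, g):
    """Bundle triple (e_f, d, Delta) of a trial point y w.r.t. center xk (1-D):
    e_f   -- linearization error of f at y (can be negative when f is nonconvex),
    d     -- linearization error of (1/2)(. - xk)^2, always (1/2)(y - xk)^2 >= 0,
    Delta -- gradient of (1/2)(. - xk)^2 at y."""
    e_f = f_xk - fy - g * (xk - y)
    d = 0.5 * (y - xk) ** 2
    delta = y - xk
    return e_f, d, delta

# f(x) = -x^2, center xk = 0, trial point y = 1 with subgradient g = f'(1) = -2:
e_f, d, delta = bundle_elements(0.0, 0.0, 1.0, -1.0, -2.0)
# e_f = -1 < 0, but e_f + eta * d = 0.5 >= 0 once eta = 3 exceeds the
# convexity threshold of f.
```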
By the optimality conditions of subproblem (13), there exists a multiplier vector \((\alpha^{\ell}_{i}, i\in J_{\ell})\in S^{\ell}\) such that
where \(S^{\ell}\) denotes the unit simplex in \(\mathbb{R}^{ \vert J_{\ell } \vert }\) and \(g^{\ell-1}_{h}=\nabla\bar{h}_{\ell-1}(z^{\ell+1})\).
As the iterations proceed, the number of elements in the bundle may grow without bound, which could lead to serious storage and computation problems. The subgradient aggregation strategy [22] is the most popular and efficient way to overcome this difficulty. We use the notation \(g^{-\ell}_{\eta_{\ell}}\) to denote the aggregate subgradient, i.e.,
Define the strongly active set of subgradients by
Then the corresponding aggregate bundle elements are given by
Therefore
Here, as in [13, 23], we use negative index −ℓ to express the aggregate bundle elements, hence \(J_{\ell}\subseteq\{ -\ell,-\ell+1,\dots,0,1,\dots,\ell-1,\ell\}\) in general.
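The effect of aggregation can be seen on affine planes directly: any convex combination of cutting planes is again a single plane that minorizes the pointwise maximum of the originals. A small illustrative check (our own naming):

```python
def aggregate_plane(planes, alphas):
    """Convex combination of affine planes, each given as (intercept, slope).
    The multipliers alphas lie in the unit simplex, so the aggregate is a
    single plane lying below the pointwise maximum of the originals."""
    assert abs(sum(alphas) - 1.0) < 1e-12 and all(a >= 0 for a in alphas)
    b = sum(a * p[0] for a, p in zip(alphas, planes))
    m = sum(a * p[1] for a, p in zip(alphas, planes))
    return b, m

planes = [(0.0, 1.0), (1.0, -1.0)]          # model(w) = max(w, 1 - w)
b, m = aggregate_plane(planes, [0.5, 0.5])  # aggregate plane: 0.5 + 0 * w
```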
By making use of the notations above, the cutting-planes model \(\check {\varphi}_{\ell}\) in (11) can be rewritten as:
Note that, for all \(j\in J^{\mathrm{act}}_{\ell}\) we have
and the aggregate model of \(\check{\varphi}_{\ell}\) in \(J_{\ell}\) is
For a new stability center \(x^{k+1}\), the bundle elements can be updated by (see [13])
On the other hand, since f is possibly nonconvex, the linearization errors \(e^{i,k}_{f}+\eta_{\ell}d^{k}_{i}, i\in J_{\ell}\) may be negative, and therefore the model \(\check{\varphi}_{\ell}\) is not necessarily a lower approximation to \(\varphi_{\ell}\). In the case of \(e^{i,k}_{f}+\eta_{\ell}d^{k}_{i}\geq0\), one has
In order to ensure that the linearization errors are all nonnegative, the convexification parameter \(\eta_{\ell}\) should be adjusted to asymptotically estimate the ideal convexity threshold \(\rho^{\mathrm{id}}\) in Theorem 2. Hare et al. [2] suggested a lower bound for \(\eta_{\ell }\) as follows:
which guarantees that \(e^{i,k}_{f}+\eta d^{k}_{i}\geq0\) for all \(i\in J_{\ell}\) whenever \(\eta\geq\eta^{\min}_{\ell}\).
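One concrete realization of this requirement (a sketch of our own, not necessarily the exact update used in the algorithm) simply takes the smallest η making every shifted linearization error nonnegative:

```python
def eta_lower_bound(bundle):
    """Smallest eta >= 0 such that e_f + eta * d >= 0 for every bundle
    element (e_f, d); elements with d = 0 are assumed to have e_f >= 0.
    A concrete realization of the lower bound described above."""
    bound = 0.0
    for e_f, d in bundle:
        if d > 0:
            bound = max(bound, -e_f / d)
    return bound

bundle = [(-1.0, 0.5), (0.2, 1.0), (0.0, 0.0)]
eta = eta_lower_bound(bundle)   # 2.0, driven by the negative error -1.0
```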
Finally, in our algorithm, we define the predicted descent \(\delta _{\ell}\) and the linearization error \(\varepsilon_{\ell}\) as follows:
For a fixed parameter \(\kappa\in(0,1)\), a descent step is taken if
holds, and the stability center is updated by \(x^{k+1}=y^{\ell+1}\). Otherwise, a null step occurs, and the aggregate linearization and the new linearization are used to produce a better model \(\check{\varphi}_{\ell+1}\).
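In code, the serious/null decision is a one-line test. The sketch below transcribes the descent inequality in the form it appears later in the convergence analysis, \(F(y^{\ell+1})\leq F(x^{k})-\kappa\delta_{\ell}\) (function and argument names are our own):

```python
def descent_test(F_new, F_center, delta, kappa):
    """Serious-step test: accept y as the new stability center iff the
    achieved decrease reaches the fraction kappa of the predicted descent:
        F(y) <= F(x_k) - kappa * delta,   kappa in (0, 1)."""
    return F_new <= F_center - kappa * delta

# predicted descent delta = 1.0, kappa = 0.1:
serious = descent_test(0.5, 1.0, 1.0, 0.1)   # decrease 0.5 >= 0.1 -> serious step
null = descent_test(0.95, 1.0, 1.0, 0.1)     # decrease 0.05 < 0.1 -> null step
```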
3.3 The algorithm
Algorithm 1
Step 0. (Initialization). Select a starting point \(y^{0}\) and set \(x^{0}=y^{0}\). Set parameters \(M>0\), \(R_{0}>0\), \(\kappa\in(0,1)\), \(\epsilon\geq0\), and \(\Gamma\geq1\). Initialize the iteration counter \(\ell=0\), the descent step counter \(k:=k(\ell)=0\) with \(i_{0}=0\). Set \((\mu_{0}, \eta_{0})=(R_{0}, 0)\) and \(J_{0}:=\{0\}\). Compute \(f(x^{0}), g^{0}_{f}\in\partial f(x^{0})\) and the bundle information \((e^{0,0}_{f}, d^{0}_{0}, \triangle^{0}_{0}):=(0, 0, 0)\). Set \(s^{-1}_{h}=g^{0}_{h}\in\partial h(x^{0})\).
Step 1. Find \(z^{\ell+1}\) by solving subproblem (13), and set
Step 2. Find \(y^{\ell+1}\) by solving subproblem (14), and set
Step 3. (Stopping criterion). Compute \(f(y^{\ell +1})\), \(h(y^{\ell+1})\), \(g_{f}^{\ell+1}\in\partial f(y^{\ell+1})\) and \(g_{h}^{\ell+1}\in\partial h(y^{\ell+1})\). If \(\delta_{\ell} \leq \epsilon\), then STOP. Otherwise, compute the new bundle elements by
Select a new index set \(J_{\ell+1}\) satisfying
Step 4. (Descent test). If (28) holds, declare a descent step, set \(k(\ell +1)=k+1, i_{k+1}=\ell+1, x^{k+1}=y^{\ell+1}\), and update the bundle elements by (23). Otherwise, declare a null step, and set \(k(\ell+1)=k(\ell)\).
Step 5. (Update η). Update the convexification parameter by
Step 6. (Update μ). If \(F(y^{\ell +1})>F(x^{k})+M\), then the objective increase is unacceptable, let \(\mu_{\ell+1}:=\Gamma\mu_{\ell}\) and loop to Step 1; otherwise, set \(\mu_{\ell+1}:=\mu_{\ell}\).
Step 7. (Loop). Increase ℓ by 1 and go to Step 1.
Remark 1
(1) The predicted descent \(\delta_{\ell}\) and the linearization error \(\varepsilon_{\ell}\) are nonnegative (details are given below); (2) in Step 6, if \(F(y^{\ell+1})>F(x^{k})+M\) holds, then the current model is considered “bad”, so we become more “conservative” and increase the proximal parameter μ by setting \(\mu_{\ell+1}=\Gamma\mu_{\ell}\). In the next section, we prove that μ is increased only finitely many times; (3) the parameters \(\mu_{\ell}\) and \(\eta_{\ell}\) eventually remain fixed.
Lemma 2
The predicted descent \(\delta_{\ell}\) and the linearization error \(\varepsilon_{\ell}\) are nonnegative, and satisfy
Proof
In Step 2, from (30) and (18) we know \(g^{\ell-1}_{h}=s^{\ell-1}_{h}\), hence
Next, we prove \(\delta_{\ell}\geq0\) and \(\varepsilon_{\ell}\geq 0\). From (26) and (29) we have
Letting \(j=-\ell\) in (22) and using (34), we obtain
On the other hand, from (30), we have
Hence, by combining (36), (37) and (30), (35) can be written as
In Step 5, the update for \(\eta_{\ell}\) is done to ensure \(\eta _{\ell}\geq\eta^{\min}_{\ell}\) for all iterations, so that \(e^{-\ell}_{f}+\eta_{\ell}d^{k}_{-\ell}\geq0\). Therefore, the predicted descent \(\delta_{\ell}\geq0\) since h is convex.
For \(\varepsilon_{\ell}\), from (27), one has
Similar to (36), we have
Thus, we have
Equation (33) follows immediately from (39) and (42). □
From (33), we know that \(\delta_{\ell}\geq\varepsilon_{\ell}\). Therefore, if \(\delta_{\ell}\leq\epsilon\), then \(\varepsilon_{\ell}\leq\epsilon\), so we only use \(\delta_{\ell}\leq\epsilon\) as the termination criterion in Step 3.
Lemma 3
The vectors \(s_{\varphi}^{\ell}\) and \(s_{h}^{\ell}\) of (30) and (29) are in fact subgradients, i.e.,
Furthermore, we have
Proof
Let \(\phi^{\ell}_{f}\) and \(\phi^{\ell}_{h}\) denote the objectives of (13) and (14), respectively, i.e.,
By (13), (29) and the optimality condition of (45), we have
which implies \(s_{\varphi}^{\ell}\in\partial\check{\varphi}_{\ell }(z^{\ell+1})\). Similarly, by (14) and the optimality condition of (46), we obtain
which implies \(s_{h}^{\ell}\in\partial h_{\ell}(y^{\ell+1})\). So (43) holds.
4 Convergence
In this section, we will study the convergence properties of Algorithm 1. Firstly, based on the objective function of problem (1), we need to slightly modify Assumption 1 as follows.
Assumption 2
Given \(x^{0}\in\mathbb{R}^{n}\) and \(M_{0}\geq0\), there exist an open bounded set \(\mathcal{O}\) and a function H such that \(\mathcal{L}_{0}:=\{x\in\mathbb{R}^{n}: F(x)\leq F(x^{0})+M_{0}\}\subset\mathcal{O}\), and H is lower-\(C^{2}\) on \(\mathcal{O}\) with \(H\equiv f\) on \(\mathcal{L}_{0}\).
For convenience, we assume that Assumption 2 holds throughout the rest of convergence analysis.
In addition, from [24] we know that, if f is a locally Lipschitz continuous function, then the subgradients of f are locally bounded, i.e.,
Further, as in [9], it follows that the model subgradients \(s^{\ell}_{\varphi}\) in (43) satisfy
Remark 2
Note that (47) implies that \(\{g^{\ell}_{\varphi}:=g^{\ell}_{f}+\eta_{\ell}\triangle^{\ell}_{i}\}\) \((g^{\ell}_{\varphi}\in\partial\check{\varphi}_{\ell})\) is bounded if \(\{y^{\ell}\}\) is bounded, since \(\{\triangle^{\ell}_{i}\}\) in (17) is bounded whenever \(\{y^{\ell}\}\) is bounded. Since \(s^{\ell}_{\varphi}\in\partial\check{\varphi}_{\ell}\), we have \(s^{\ell}_{\varphi}\in \operatorname{conv}\{g^{j}_{\varphi}\}_{j\in J_{\ell}}\), and thus \(\Vert s^{\ell}_{\varphi} \Vert \leq\max^{\ell}_{j=1} \Vert g^{j}_{\varphi} \Vert \); hence the model \(\check{\varphi}_{\ell}\) satisfies condition (48) automatically when (47) holds.
The following lemma shows the properties of the model function \(\check {\varphi}_{\ell}\), whose proof can be found in [13, 16].
Lemma 4
For the model function \(\check{\varphi}_{\ell}\) and convexification parameter \(\eta_{\ell}\), we have
-
(i)
\(\check{\varphi}_{\ell}\) is a convex function.
-
(ii)
If \(\eta_{\ell}\geq\eta_{\ell}^{\min}\), then
$$ \check{\varphi}_{\ell}\bigl(x^{k}\bigr)\leq f \bigl(x^{k}\bigr). $$(49) -
(iii)
If \(\eta_{\ell+1}=\eta_{\ell}\), and either \(J_{\ell +1}\supseteq J_{\ell}^{\mathrm{act}}\) or \(J_{\ell+1}\supseteq\{-\ell\}\), then
$$\check{\varphi}_{\ell+1}(\cdot)\geq\check{\varphi}_{\ell } \bigl(z^{\ell+1}\bigr)+\bigl\langle s^{\ell}_{\varphi}, \cdot-z^{\ell+1}\bigr\rangle $$if \(y^{\ell+1}\) is a null step.
-
(iv)
If \(J_{\ell}\supseteq\{\ell\}\), then
$$\check{\varphi}_{\ell}(\cdot)\geq f\bigl(y^{\ell}\bigr)+ \frac{1}{2}\eta _{\ell} \bigl\Vert y^{\ell}-x^{k} \bigr\Vert ^{2}+\bigl\langle g^{\ell}_{f}+ \eta_{\ell }\bigl(y^{\ell}-x^{k}\bigr), \cdot-y^{\ell}\bigr\rangle , $$for some \(g^{\ell}_{f}\in\partial f(y^{\ell})\).
-
(v)
If \(\eta_{\ell}\geq\rho^{\mathrm{id}}\), then
$$ \check{\varphi}_{\ell}(\omega)\leq f(\omega)+ \frac{1}{2}\eta _{\ell} \bigl\Vert \omega-x^{k} \bigr\Vert ^{2} \quad\textit{for all } \omega\in\mathcal{L}_{0}. $$(50)
From the updating rule in Step 5 of Algorithm 1, the convexification parameter \(\eta_{\ell}\) is either unchanged or increased. The following lemma, whose proof can be found in [13], shows that \(\eta_{\ell}\) becomes fixed after a finite number of iterations.
Lemma 5
There exist an index \(\ell_{1}\) and a positive constant \(\overline {\eta}>0\) such that
Lemma 6
Suppose that there exists an integer K such that, for all \(\ell\geq K\), only null steps occur without increasing μ. Then the following results hold:
-
(i)
The sequences
$$\begin{aligned} &\biggl\{ \phi^{\ell}_{f}\bigl(z^{\ell+1}\bigr)=\check{ \varphi}_{\ell}\bigl(z^{\ell +1}\bigr)+\bar{h}_{\ell-1} \bigl(z^{\ell+1}\bigr)+ \frac{1}{2}\mu_{\ell} \bigl\Vert z^{\ell+1}-x^{k} \bigr\Vert ^{2}\biggr\} _{\ell\geq K}, \\ &\biggl\{ \phi^{\ell}_{h}\bigl(y^{\ell+1}\bigr)=\bar{ \varphi}_{\ell}\bigl(y^{\ell +1}\bigr)+h\bigl(y^{\ell+1}\bigr)+ \frac{1}{2}\mu_{\ell} \bigl\Vert y^{\ell+1}-x^{k} \bigr\Vert ^{2}\biggr\} _{\ell\geq K} \end{aligned}$$are nondecreasing and convergent.
-
(ii)
The sequences \(\{y^{\ell+1}\}\) and \(\{z^{\ell+1}\}\) are bounded, \(\Vert z^{\ell+1}-y^{\ell+1} \Vert \rightarrow0\) and \(\Vert z^{\ell+2}-y^{\ell+1} \Vert \rightarrow0\) as \(\ell\rightarrow \infty\).
Proof
First, we use partial linearizations of the subproblems to show that (i) holds. Fix \(\ell\geq K\). By the definitions in (13) and (29), we have \(\bar{\varphi}_{\ell}(z^{\ell+1})=\check{\varphi}_{\ell }(z^{\ell+1})\) and
from \(\nabla\bar{\phi}^{\ell}_{f}(z^{\ell+1})=0\). Since \(\bar {\phi}^{\ell}_{f}\) is quadratic and \(\bar{\phi}^{\ell}_{f}(z^{\ell+1})=\phi^{\ell}_{f}(z^{\ell +1})\), by Taylor’s expansion
Similarly, by the definitions in (14) and (30), we have \(\bar{h}_{\ell}(y^{\ell+1})=h(y^{\ell +1})\), and
Next, to bound the objective values of the linearized subproblem (51) and (53) from above, we use \(\bar{\varphi}_{\ell}\leq\check{\varphi}_{\ell}\) and \(\bar{h}_{\ell-1}\leq h\), \(\bar{h}_{\ell}\leq h\) of (44) and \(\check{\varphi}_{\ell}(x^{k})\leq f(x^{k})\) in (ii) of Lemma 4
From (14) and (51), we have \(\bar{\phi}_{f}^{\ell}\leq\phi^{\ell}_{h}\). On the other hand, since only null steps occur, \(x^{k+1}=x^{k}\); the algorithm then ensures that \(\mu_{\ell}=\mu_{\ell+1}\), and \(\bar{\varphi}_{\ell}\leq\check{\varphi}_{\ell+1}\) by (iii) of Lemma 4, so we obtain \(\bar{\phi}^{\ell}_{h}\leq\phi^{\ell+1}_{f}\). By (52) and (54), we see that
In particular, from (57) and (58), we have the relation
which implies that \(\{\phi^{\ell}_{f}(z^{\ell+1})\}_{\ell\geq K}\) and \(\{\phi^{\ell}_{h}(y^{\ell+1})\}_{\ell\geq K}\) are nondecreasing sequences. Together with the bound of \(F(x^{k})\) from (55) and (56), the convergence is established.
For (ii), we proved in (i) the convergence of \(\{\phi^{\ell}_{f}(z^{\ell+1})\}\) and \(\{\phi^{\ell}_{h}(y^{\ell+1})\}\) for \(\ell\geq K\), so they have a common limit, say \(\phi_{\infty}\leq F(x^{k})\), such that
and \(\Vert z^{\ell+1}-y^{\ell+1} \Vert \rightarrow0\) and \(\Vert z^{\ell+2}-y^{\ell+1} \Vert \rightarrow0\) follow from (57) and (58), while \(\{y^{\ell+1}\}\) and \(\{z^{\ell+1}\}\) are bounded by (55) and (56). Then the sequences \(\{g^{\ell}_{f}\}\) and \(\{s^{\ell}_{\varphi}\}\) are bounded by (47) and (48). □
The following lemma shows that μ is increased only finitely many times.
Lemma 7
Suppose that \(i_{k}\in J_{\ell}\), and let \(N_{\ell}\) be the number of times that μ has been increased up to iteration ℓ. Then there exists a positive constant L such that
where \(\lceil a\rceil\) is the smallest integer greater than or equal to a. As a result, there exists an index \(\ell_{2}\) such that
Proof
Let \(\ell_{r}\) be the index corresponding to the rth increase of μ; then, for \(\ell_{r}+1\leq\ell<\ell_{r+1}\), we have
Since \(i_{k}\in J_{\ell}\), from (24), we obtain \(g^{i_{k}}\in\partial\check{\varphi}(x^{k})\) by writing \(i=i_{k}\), and it also holds that \(g_{f}^{i_{k}}\in\partial f(x^{k})\) from (21), so \(\Vert g_{f}^{i_{k}} \Vert \) is bounded. Hence
For \(\ell_{r}+1\leq\ell<\ell_{r+1}\), if \(y^{\ell+1}\) is a null step, then \(g^{\ell}_{f}\) is bounded by Lemma 6; otherwise \(y^{\ell+1}\) is a descent step, and the corresponding subgradient \(g^{i_{k(\ell+1)}}\in\partial f(x^{k+1})\) is also bounded. Therefore, there exists a constant \(L>0\) such that \(\max\{ \Vert g_{f}^{i_{k}} \Vert , \Vert g_{f}^{\ell+1} \Vert , \Vert g^{k}_{h} \Vert , \Vert g^{\ell+1}_{h} \Vert \}\leq\frac{L}{2}\). Thus we have
This together with (61) shows that
Thus, if
then
This means that the number \(N_{\ell}\) of increases of μ satisfies (60). The latter part of the lemma follows immediately. □
Theorem 3
If \(\delta_{\ell}=0\) and \(\eta_{\ell}\geq\rho^{\mathrm{id}}\), then \(x^{k}\) is a stationary point of F.
Proof
From (33), we have the relation
If \(\delta_{\ell}=0\), then \(\varepsilon_{\ell}=0\) and \(s^{\ell }=0\), and therefore
From the last result of Theorem 1, we know that \(x^{k}\) is a stationary point of \(\bar{\varphi}_{\ell}+h\). In addition, from \(\varepsilon_{\ell}=0\) in (27), we have
This together with \(\bar{h}_{\ell}(x^{k})\leq h(x^{k})\) shows that
On one hand, for \(\omega\in\mathcal{L}_{0}\), if \(\eta_{\ell}\geq \rho^{\mathrm{id}}\), we obtain by (62) and (63)
From the convexity of \(\check{\varphi}_{\ell}\) and (50), we have
So, (64) can be written as
On the other hand, for \(\omega\notin\mathcal{L}_{0}\), from (28) we can obtain
Combining (64) and (66), we have
Hence
which together with Theorem 1 shows that \(x^{k}\) is a stationary point of F. □
We are now in a position to present the main convergence result of our algorithm. As usual in bundle methods, two cases are considered: the algorithm generates a finite number of descent steps, or it generates an infinite number of descent steps. We set the stopping parameter \(\epsilon=0\).
Theorem 4
Let η̄ be the stabilized value of the convexification parameter sequence and assume \(\bar{\eta}\geq\rho^{\mathrm{id}}\). Then exactly one of the following mutually exclusive situations holds:
-
(i)
Algorithm 1 generates a finite number of descent steps followed by infinitely many null steps. Let x̄ be the last stability center. Then \(y^{\ell+1}\rightarrow\bar{x}\), and x̄ is a stationary point of F.
-
(ii)
Algorithm 1 generates an infinite sequence \(\{ x^{k}\}\) of stability centers. Then any accumulation point of \(\{x^{k}\}\) is a stationary point of F.
Proof
For (i), without loss of generality, we may assume \(\eta_{\ell}=\overline{\eta}\), \(\mu_{\ell}=\overline{\mu}\), and \(R_{\ell}=\overline{R}\) throughout. In Lemma 6 we showed that the sequences \(\{y^{\ell}\}\) and \(\{z^{\ell}\}\) are bounded with \(\Vert y^{\ell}-z^{\ell} \Vert \rightarrow0\) and \(\Vert z^{\ell+1}-y^{\ell} \Vert \rightarrow0\) as \(\ell\rightarrow\infty\). Therefore \(y^{\ell_{i}}\rightarrow p\) as \(i\rightarrow\infty\) implies \(z^{\ell_{i}}\rightarrow p\) and \(z^{\ell_{i}+1}\rightarrow p\) as \(i\rightarrow\infty\). For \(\omega\in\mathcal{L}_{0}\) near p, by (50) and the convexity of h, we have
Let \(x^{k}=\bar{x}\) and \(\mu_{\ell}=\bar{\mu}\); from (29) and the boundedness of \(s^{\ell}_{\varphi}+s^{\ell-1}_{h}=\overline{\mu}(\bar{x}-z^{\ell+1})\), we know
Note that
Combining (68) and (70), (67) can be written as
By Claim (iv) of Lemma 4 written with \(\ell=\ell_{i}\) for \(\omega=z^{\ell_{i}+1}\), we have the following inequality:
Then
From \(s^{\ell-1}_{h}=\overline{\mu}(\bar{x}-z^{\ell+1})-s^{\ell}_{\varphi}\) and the boundedness of the sequences \(\{\overline{\mu}(\bar{x}-z^{\ell+1})\}\), \(\{s^{\ell}_{\varphi}\}\), \(\{g^{\ell}_{f}\}\) and \(\{y^{\ell}\}\), we know that \(\{s^{\ell-1}_{h}\}\) and \(\{g^{\ell_{i}}_{f}+ \overline{\eta}(y^{\ell_{i}}-\bar{x})\}\) are bounded. Taking the limit as \(i\rightarrow\infty\) and using the continuity of f at p, we obtain
for all \(\omega\in\mathcal{L}_{0}\) near p. Since \(\frac {1}{2}\overline{\eta} \Vert \omega-p \Vert ^{2}=o( \Vert \omega-p \Vert )\), the last inequality means that \(\overline{R}(\bar{x}-p)\in\partial F(p)\) by Definition 8.3 in [17], which implies \(p=p_{\bar{R}}F(\bar{x})\) by Theorem 1. Since f is continuous and condition (50) holds at all accumulation points of \(\{y^{\ell}\}\), the entire sequence \(\{y^{\ell}\}\) converges to the proximal point \(p_{\bar{R}}F(\bar{x})\). Furthermore, evaluating the relations at \(\omega=p\) shows that the following equation holds for the entire sequence by Theorem 2 in [16]:
So as \(\ell\rightarrow\infty\), the whole sequence
Thus, from (26) we have
Since a null step does not satisfy the descent test in Step 4 of the algorithm, we have \(F(y^{\ell+1})> F(\bar{x})-\kappa\delta_{\ell}\). Taking the limit as \(\ell\rightarrow\infty\) gives \(F(p)\geq F(\bar{x})-\kappa(F(\bar{x})-F(p))\), so \(F(\bar{x})\leq F(p)\) since \(\kappa\in(0,1)\). But \(p=p_{\bar{R}}F(\bar{x})\) implies
which shows that \(\bar{x}=p\). That is, \(\bar{x}=p_{\bar{R}}F(\bar {x})\), so x̄ is a stationary point of F from Theorem 1.
For (ii), \(\mathcal{L}_{0}\) is compact and \(\{x^{k}\}\subset\mathcal{L}_{0}\), so the sequence has an accumulation point; that is, there exists an infinite set K such that \(x^{k}\rightarrow\hat{x}\in\mathcal{L}_{0}\) as \(K\ni k\rightarrow\infty\). Since \(x^{k+1}=y^{i_{k+1}}\), let \(j_{k}=i_{k+1}-1\), so that \(x^{k+1}=p_{\bar{\mu}}(\bar{\varphi}_{j_{k}}+h)(x^{k})\). The descent test
implies that, as \(k\rightarrow\infty\), either \(F(x^{k})\searrow -\infty\), or \(\delta_{j_{k}}\rightarrow0\). By Assumption 2, \(F(x^{k})\) is bounded below, therefore, \(\delta _{j_{k}}\rightarrow0\). From (39), this means that
must converge to 0. By
and
in Lemma 6, we have
By (21), \(\check{\varphi}_{j_{k}}(z^{j_{k}+1})-f(x^{k})\rightarrow 0\) as \(k\rightarrow\infty\), from (29) we know that
Therefore,
as \(k\rightarrow\infty\). Consider now \(k\in K\). Since \(\Vert x^{k+1}-x^{k} \Vert = \Vert y^{j_{k}+1}-x^{k} \Vert \rightarrow0\), both \(x^{k+1}\) and \(x^{k}\) converge to \(\hat{x}\) as \(K\ni k\rightarrow \infty\) with
From \(x^{k+1}=p_{\bar{\mu}}(\bar{\varphi}_{j_{k}}+h)(x^{k})\), \(\bar{\eta}\geq\rho^{\mathrm{id}}\), and (50), for all \(\omega\in\mathcal{L}_{0}\),
Therefore, taking the limit \(k\in K\), we have
On the other hand, \(\hat{x}\in\mathcal{L}_{0}\), and for any \(\omega\notin\mathcal{L}_{0}\) it follows that
Hence,
Therefore, \(\hat{x}=p_{\bar{R}}F(\hat{x})\) with \(\bar{R}\geq\rho ^{\mathrm{id}}\), hence x̂ is a stationary point of F from Theorem 1. □
5 Numerical results
This section tests the practical effectiveness of Algorithm 1 on a set of nine problems. The first seven are generalized from the unconstrained versions in [25] by imposing suitable constraints; the remaining two are nonconvex unconstrained problems taken from [13, 26], each of which is the sum of a nonconvex function and a convex function.
All numerical experiments were implemented in MATLAB R2014a on a ThinkPad laptop running Windows 7. The first seven problems have the form of (2) with \(C=\{ x: \Vert x-a \Vert \leq b\}\), where \(a\in\mathbb{R}^{n}\) and \(b>0\) are given below. These problems are transformed into the form of (3) via the indicator function. The detailed data for the seven problems are listed below. For simplicity, we use MATLAB notation: ones(p,q) and zeros(p,q) denote the p-by-q matrices of ones and zeros, respectively.
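Since \(h=\imath_{C}\) with \(C\) a Euclidean ball in these seven problems, the proximal point of \(h\) reduces to the projection onto the ball, which has a closed form. A minimal sketch (the function name is ours):

```python
import numpy as np

def proj_ball(x, a, b):
    """Euclidean projection onto C = {x : ||x - a|| <= b}.
    This is the proximal point of the indicator function i_C at x:
    points inside the ball are unchanged; points outside are pulled
    radially back onto the boundary."""
    d = x - a
    nd = np.linalg.norm(d)
    return x if nd <= b else a + (b / nd) * d
```

For example, projecting the CB2 starting point \(y^{0}=(3,3)^{T}\) onto the unit ball centered at the origin gives \((1/\sqrt{2},1/\sqrt{2})^{T}\).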
CB2: \(f(x)=\max\{x_{1}^{2}+x_{2}^{4}, (2-x_{1})^{2}+(2-x_{2})^{2}, 2e^{x_{2}-x_{1}}\}\), \(y^{0}=(3,3)^{T}\), \(a=(0,0)^{T}\), \(b=1\).
CB3: \(f(x)=\max\{x_{1}^{4}+x_{2}^{2}, (2-x_{1})^{2}+(2-x_{2})^{2}, 2e^{x_{2}-x_{1}}\}\), \(y^{0}=(3,3)^{T}\), \(a=(3,3)^{T}\), \(b=1\).
LQ: \(f(x)=\max\{-x_{1}-x_{2},-x_{1}-x_{2}+x_{1}^{2}+x_{2}^{2}-1\}\), \(y^{0}=(1,1)^{T}\), \(a=(1,-1)^{T}\), \(b=1\).
Mifflin1: \(f(x)=-x_{1}+20\max\{x_{1}^{2}+x_{2}^{2}-1, 0\}\), \(y^{0}=(1.5,0.5)^{T}\), \(a=(-2,2)^{T}\), \(b=1\).
Rosen–Suzuki: \(f(x)=\max_{1\leq i\leq4}f_{i}(x)\), \(y^{0}=(1,2.1,-3,-0.9)^{T}\), \(a=(1,2,3,4)^{T}\), \(b=2\), with
Shor: \(f(x)=\max_{1\leq i\leq10}\{d_{i}\sum^{5}_{j=1}(x_{j}-c_{ij})^{2}\}\), \(y^{0}=\texttt {zeros(10,1)}\), \(a=\texttt {zeros(10,1)}\), \(b=3\), \(d=(1,5,10, 2, 4, 3, 1.7, 2.5, 6, 3.5)^{T}\),
MAXL: \(f(x)=\max_{1\leq i\leq20} \vert x_{i} \vert \), \(a = (-\texttt {ones(1,10)}, \texttt {ones(1,10)})^{T}\), \(b=4\), \(y^{0}=(1,1.1,3, 1.1,5,1.1,7,1.1,9,1.1,-11,0.1,-13,0.1,-15,0.1,-17,0.1,-19,0.1)^{T}\).
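Each of the max-type objectives above supplies, at any point, both a function value and a subgradient (the gradient of a maximizing piece), which is what a bundle method requires from its oracle. A minimal sketch for CB2 (the function name is ours):

```python
import numpy as np

def cb2(x):
    """CB2: f(x) = max{x1^2 + x2^4, (2-x1)^2 + (2-x2)^2, 2*exp(x2-x1)}.
    Returns the value and a subgradient (gradient of a maximizing piece)."""
    x1, x2 = x
    vals = [x1**2 + x2**4,
            (2 - x1)**2 + (2 - x2)**2,
            2 * np.exp(x2 - x1)]
    grads = [np.array([2 * x1, 4 * x2**3]),
             np.array([2 * (x1 - 2), 2 * (x2 - 2)]),
             np.array([-2 * np.exp(x2 - x1), 2 * np.exp(x2 - x1)])]
    i = int(np.argmax(vals))          # index of an active piece
    return vals[i], grads[i]
```

At the starting point \(y^{0}=(3,3)^{T}\) the first piece is active, giving \(f(y^{0})=90\) with subgradient \((6,108)^{T}\).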
The second set of two problems are:
Regular: \(F(x)=\sum_{i=1}^{n} \vert f_{i}(x) \vert +\frac{1}{2} \Vert x \Vert ^{2}\), where the \(f_{i}\) are the Ferrier polynomials.
L-Mifflin: \(F(x)=2(x_{1}^{2}+x_{2}^{2}-1)+1.75 \vert x_{1}^{2}+x_{2}^{2}-1 \vert \).
The nonconvexity of the above two problems can be seen from Fig. 1.
In the tests, the parameters are selected as \(M=5\), \(R_{0}=10\), \(\kappa =0.3\), \(\epsilon=10^{-5}\), \(\Gamma=2\). The numerical results are reported in Tables 1, 2 and 3, using the following notation: n, the dimension of the problem; NI, the number of iterations; ND, the number of descent steps; NF, the number of function evaluations; \(x^{*}\), the approximate optimal solution; \(F^{*}\), the approximate optimal objective value. Table 1 compares Algorithm 1 with PPBM (the algorithm in [27]) on the first seven problems; it shows that Algorithm 1 performs better than PPBM. In Table 2, we compare our algorithm with RedistProx [13] on problem Regular for various n; under the same stopping condition (terminate once NF exceeds 300), the approximate optimal values and accuracies of Algorithm 1 are better than those of RedistProx. Finally, Table 3 reports the approximate optimal solutions and values for problem L-Mifflin with different starting points.
References
Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 14(10), 707–710 (2007)
Hare, W., Sagastizábal, C., Solodov, M.V.: A proximal bundle method for nonsmooth nonconvex functions with inexact information. Comput. Optim. Appl. 63(1), 1–28 (2016)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Dinh, Q.T., Diehl, M.: Proximal methods for minimizing the sum of a convex function and a composite function (2011). arXiv:1105.0276
Eckstein, J., Svaiter, B.F.: A family of projective splitting methods for the sum of two maximal monotone operators. Math. Program. 111(1), 173–199 (2007)
Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Program. 141, 349–382 (2013)
Kiwiel, K.C.: A method for minimizing the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 48(3), 437–449 (1986)
Kiwiel, K.C.: An alternating linearization bundle method for convex optimization and nonlinear multicommodity flow problems. Math. Program. 130(1), 59–84 (2011)
Li, D., Pang, L., Chen, S.: A proximal alternating linearization method for nonconvex optimization problems. Optim. Methods Softw. 29(4), 771–785 (2014)
Mine, H., Fukushima, M.: A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl. 33(1), 9–23 (1981)
Tuy, H., Tam, B.T., Dan, N.D.: Minimizing the sum of a convex function and a specially structured nonconvex function. Optimization 28, 237–248 (1994)
Hare, W.L., Sagastizábal, C.: A redistributed proximal bundle method for nonconvex optimization. SIAM J. Optim. 20(5), 2442–2473 (2010)
Cheney, E.W., Goldstein, A.A.: Newton’s method for convex programming and Tchebycheff approximations. Numer. Math. 1, 253–268 (1959)
Kelley, J.E.: The cutting-plane method for solving convex programs. J. Soc. Ind. Appl. Math. 8, 703–712 (1960)
Hare, W., Sagastizábal, C.A.: Computing proximal points of nonconvex functions. Math. Program. 116(1), 221–258 (2009)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Lemaréchal, C.: An extension of Davidon methods to nondifferentiable problems. Math. Program. Stud. 3, 95–109 (1975)
Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. Math. Program. Stud. 3, 145–173 (1975)
Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.: Numerical Optimization: Theoretical and Practical Aspects, 2nd edn. Springer, Berlin (2006)
Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics, vol. 1133. Springer, Berlin (1985)
Kiwiel, K.C.: A method of centers with approximate subgradient linearizations for nonsmooth convex optimization. SIAM J. Optim. 18(4), 1467–1489 (2008)
Kiwiel, K.C.: An algorithm for linearly constrained convex nondifferentiable minimization problems. J. Math. Anal. Appl. 105(2), 452–465 (1985)
Lukšan, L., Vlček, J.: Test problems for nonsmooth unconstrained and linearly constrained optimization. Technical Report No. 798, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague (2000)
Bagirov, A.M., Karmitsa, N., Mäkelä, M.: Introduction to Nonsmooth Optimization: Theory, Practice and Software. Springer, Cham (2014)
Kiwiel, K.C.: A proximal-projection bundle method for Lagrangian relaxation, including semidefinite programming. SIAM J. Optim. 17(4), 1015–1034 (2006)
Acknowledgements
This work was supported by the National Natural Science Foundation of China (11761013, 11771383) and the Guangxi Natural Science Foundation (2013GXNSFAA019013, 2014GXNSFFA118001, 2016GXNSFDA380019).
Author information
Contributions
All authors read and approved the final manuscript. CT mainly contributed to the algorithm design and convergence analysis; JL mainly contributed to the convergence analysis and numerical results; and JJ mainly contributed to the idea of the method and algorithm design.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Tang, C., Lv, J. & Jian, J. An alternating linearization bundle method for a class of nonconvex nonsmooth optimization problems. J Inequal Appl 2018, 101 (2018). https://doi.org/10.1186/s13660-018-1683-1