1 Introduction

The Minkowski sum of sets (also known as the dilation or sumset) is a long-standing and fundamental notion in computational geometry. It plays an important role in fields such as mathematical morphology in image processing [44], collision detection in robotics [29], configuration space computation in mechanics [48], numerical control machining in computer-aided manufacturing [45], and aggregation theory in economics [33]. Given two sets \({\mathcal {A}}\subset {{\mathbb {R}}}^n\) and \({\mathcal {B}}\subset {{\mathbb {R}}}^n\) (henceforth, sets are denoted by capital calligraphic letters), the Minkowski sum of \({\mathcal {A}}\) and \({\mathcal {B}}\), denoted by \({\mathcal {A}}\oplus {\mathcal {B}}\), is defined as

$$\begin{aligned} {\mathcal {A}}\oplus {\mathcal {B}}:=\left\{ a+b \mid a\in {\mathcal {A}},\;b\in {\mathcal {B}}\right\} . \end{aligned}$$
(1.1)

In particular, if one of the sets consists of a single element, e.g., \({\mathcal {A}}=\{a\}\), we adopt the shorthand \(a\oplus {\mathcal {B}}\) to indicate the translation of the set \({\mathcal {B}}\) by a. It follows immediately from (1.1) that the zero set \(\{0\}\) plays the role of the identity element under the additive operation \(\oplus\). Figure 1 displays diagrams of the Minkowski sum of a ball, a box and a convex polytope in two and three dimensions.
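To make definition (1.1) concrete, the following MATLAB sketch (our own illustration; all variable names are ours) approximates the Minkowski sum of the box \([-1,1]^2\) and the unit disc by summing the box vertices with sampled boundary points of the disc, which yields the familiar "rounded box".

```matlab
% A minimal sketch visualizing definition (1.1) (all names ours): the
% Minkowski sum of the box [-1,1]^2 and the unit disc, approximated by
% summing every box vertex with boundary samples of the disc.
t = linspace(0, 2*pi, 64)';
B = [cos(t), sin(t)];                     % boundary samples of the unit disc
A = [-1 -1; 1 -1; 1 1; -1 1];             % vertices of the box [-1,1]^2
S = [];                                   % pairwise sums a + b, cf. (1.1)
for i = 1:size(A,1)
    S = [S; A(i,1) + B(:,1), A(i,2) + B(:,2)]; %#ok<AGROW>
end
k = convhull(S(:,1), S(:,2));             % both summands convex => sum convex
plot(S(k,1), S(k,2), '-'); axis equal;    % the "rounded box" offset of A
```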

Fig. 1

Diagram representations of the Minkowski sum of \(\Omega _1=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _1\le 1, x_n\ge 0\}\), \(\Omega _2=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le 1\}\) and \(\Omega _3=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _{\infty }\le 1\}\). Top row: the case of \(n=2\). Bottom row: the case of \(n=3\)

Let \({\mathcal {F}}\subset {{\mathbb {R}}}^n\) be a nonempty set. The support function of \({\mathcal {F}}\), denoted by \(\sigma _{\mathcal {F}}:{{\mathbb {R}}}^n\rightarrow (-\infty , +\infty ]\), is defined as

$$\begin{aligned} \sigma _{\mathcal {F}}(x):=\sup \left\{ \langle x,y\rangle \mid y\in {\mathcal {F}} \right\} , \end{aligned}$$
(1.2)

where \(\langle x,y\rangle :=x^\top y\) is the inner product endowed on \({{\mathbb {R}}}^n\). Because of the closedness and sublinearity of \(\sigma _{\mathcal {F}}\) (see Proposition 2.3 in Sect. 2), it induces a norm on \({{\mathbb {R}}}^n\) whenever \({\mathcal {F}}\) satisfies certain prerequisites (e.g., compactness, convexity, and symmetry). Accordingly, the generalized distance from a point \(x^0\in {{\mathbb {R}}}^n\) to a set \({\mathcal {C}}\subset {{\mathbb {R}}}^n\), denoted by \(d_{\mathcal {F}}:{{\mathbb {R}}}^n\rightarrow [0,+\infty ]\), can be quantified via \(\sigma _{\mathcal {F}}\) as follows

$$\begin{aligned} d_{\mathcal {F}}(x^0;{\mathcal {C}}):=\inf \left\{ \sigma _{\mathcal {F}}(x-x^0) \mid x\in {\mathcal {C}}\right\} . \end{aligned}$$
(1.3)

Note that the optima of (1.3) (if they exist) are referred to as the generalized projection of \(x^0\) onto \({\mathcal {C}}\), denoted by \(\Pi _{\mathcal {F}}(x^0;{\mathcal {C}})\). In particular, if \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le 1\}\) is the origin-centered unit ball of the \(\ell ^2\)-norm, then \(\sigma _{\mathcal {F}}\), \(d_{\mathcal {F}}\) and \(\Pi _{\mathcal {F}}\) reduce to the canonical Euclidean norm, Euclidean distance and Euclidean projection (also known as orthogonal projection), respectively. The (generalized) projection onto a set, especially a convex set, is crucial for both theoretical and practical purposes in numerical optimization and computational geometry. For some “simple” sets such as the \(\ell ^2\)-norm ball, hyperplane, second-order cone and spectral set, the Euclidean projections admit closed-form formulae (see e.g., [2, Chapter 28] and the references therein). There is also abundant literature on computing projections onto, e.g., the \(\ell ^1\)-norm ball [12, 49], manifolds [30], the \(\ell ^{1,q}\)-norm ball [47], and the isotone projection cone [35]. In other words, the distance (1.3) can be efficiently computed for the above “simple” sets. However, computing the distance (1.3) may be significantly more demanding if \({\mathcal {F}}\) and \({\mathcal {C}}\) are generic convex sets.
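As a toy illustration of (1.2) (our own example): for the unit \(\ell ^1\)-norm ball, the supremum in (1.2) is attained at a signed vertex, so the support function coincides with the dual norm \(\Vert \cdot \Vert _\infty\) (cf. Example 2.1 i) in Sect. 2).

```matlab
% Toy check of (1.2) for F = unit l1-ball (names ours): since F is the
% convex hull of the signed canonical vectors, the sup in (1.2) is attained
% at a vertex, so sigma_F(x) coincides with the dual norm ||x||_inf.
n = 5;
x = randn(n,1);
V = [eye(n), -eye(n)];                   % the 2n vertices of the l1-ball
sigma_def = max(x' * V);                 % sup of <x,y> over y in F
fprintf('sigma_F(x) = %.6f, ||x||_inf = %.6f\n', sigma_def, norm(x, inf));
```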

In this paper, our particular interest is to compute the generalized distance from a point \(x^0\in {{\mathbb {R}}}^n\) to a convex set with Minkowski sum structure, i.e.,

$$\begin{aligned} d_{\mathcal {F}}(x^0;\mathcal {C})=\inf \left\{ \sigma _{\mathcal {F}}(x-x^0) \mid x\in {\mathcal {C}}:=T_1(\Omega _1)\oplus T_2(\Omega _2)\oplus \cdots \oplus T_m(\Omega _m)\right\} , \end{aligned}$$
(1.4)

where \(\Omega _i\subset {{\mathbb {R}}}^{l_i}\) \((i = 1, 2,\ldots ,m)\) are nonempty closed convex sets with \(\sum _{i=1}^ml_i=l\), and \(T_i(\Omega _i)\subset {{\mathbb {R}}}^n\) denotes the image of \(\Omega _i\) under the affine mapping \(T_i\), i.e.,

$$\begin{aligned} T_i(\Omega _i)=\{M_i\omega +a_i\mid \omega \in \Omega _i\},\;\;i=1,2,\ldots ,m \end{aligned}$$
(1.5)

with \(M_i\in {{\mathbb {R}}}^{n\times l_i}\) and \(a_i\in {{\mathbb {R}}}^n\). For brevity, \(-\Omega _i\) denotes the opposite set of \(\Omega _i\), i.e., the case of \(M_i=-I_n\) and \(a_i={\mathbf {0}}\) in (1.5). Due to the closedness and convexity of the \(\Omega _i\)’s, it follows from Lemma 2.2 (see Sect. 2) that the sets \(T_i(\Omega _i)\) in (1.5) and \(\mathcal {C}\) in (1.4) are closed and convex. Problem (1.4) has plenty of applications in robotics. For example, to guarantee that a robot operates freely in a workspace with obstacles, collision detection is conducted to check whether the robot and the obstacles overlap [17]. Concretely, let \(\Omega _i\subset {{\mathbb {R}}}^n\) (\(i=1,2\)) be convex sets. Collision detection, which is measured by the distance between \(\Omega _1\) and \(\Omega _2\), can be formalized as (see e.g., [8, 51] for details)

$$\begin{aligned} d_{\mathcal {F}}(0;{\mathcal {C}})=\inf \left\{ \sigma _{\mathcal {F}}(x) \mid x\in {\mathcal {C}}:=\Omega _1\oplus (-\Omega _2)\right\} , \end{aligned}$$
(1.6)

which is a special case of (1.4) with the specifications \(m=2\), \(x^0=0\), \(M_1=-M_2=I_n\) and \(a_1=a_2=0\).

However, (1.4) is nontrivial to tackle, even if all the \(T_i\)’s are identity mappings. Taking the sets in Fig. 1 as examples: although each \(\Omega _i\) is simple, in the sense that the projection onto the individual set admits a closed-form solution or can be computed to high precision, the projection onto their Minkowski sum (i.e., \(\Omega _1\oplus \Omega _2\oplus \Omega _3\) in Fig. 1) is challenging to acquire. Typically, the projection onto a set with Minkowski sum structure does not possess an explicit formula and must be approximated numerically. A large number of methods have been devised over the past decades for solving (1.6) (a special case of (1.4)) with compact convex sets (especially polytopes). The existing methods for solving (1.6) can be roughly classified into two categories: optimization-based methods [1, 13, 32, 34, 43, 50, 52] and geometry-based methods [8, 19–21, 31, 39, 51]. Empirically, the optimization-based methods can adapt to (1.6) with generic convex sets in high dimensions but may be computationally intensive, whilst the geometry-based ones usually perform efficiently but are confined to (1.6) with low-dimensional structured sets (e.g., 2D/3D polytopes).

Gilbert et al. [21] proposed a recursion (known as the GJK method) for solving (1.6) by approximating \({\mathcal {C}}\) with simplices. It can also be extended to solve (1.4) with \(n=3\) and \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _2\). Improvements of the GJK method were explored in, e.g., [8, 20, 39]. Recently, Qin and An [40] proposed a fast recursion for solving the dual of (1.4) by using Nesterov’s gradient-like method [36] and the smoothing technique [37]. Our paper is devoted to tackling (1.4) with generic component sets by reformulating it as a separable convex optimization problem and solving it by recent state-of-the-art first-order optimization methods with O(1/k) convergence rate. Moreover, the proposed methods favour parallel computing systems [4, 6] when the number of component sets \(\Omega _i\) in (1.4) is large.

The rest of the paper is organized as follows. In Sect. 2, some basic definitions and propositions are stated for the sequel discussion. In Sect. 3, reformulations of (1.4) in primal, dual, and primal-dual fashions are presented, followed by the implementation details on handling (1.4) in Sect. 4. In Sect. 5, numerical simulations are conducted to illustrate the performances of the alternating direction method of multipliers (ADMM) and the primal-dual hybrid gradient method (PDHG). Finally, some concluding remarks are drawn in Sect. 6.

2 Preliminaries

In this section, we summarize some basic concepts and their properties that will be useful for the sequel discussions.

For any vector \(x=(x_1,x_2,\ldots ,x_n)\in {{\mathbb {R}}}^n\), let \(\Vert x\Vert _p:=(\sum _{i=1}^n|x_i|^p)^{1/p}\) (\(1\le p<\infty\)) denote the \(\ell ^p\)-norm and \(\Vert x\Vert _\infty :=\max _{i=1,\ldots , n}|x_i|\) the \(\ell ^\infty\)-norm. Given a symmetric matrix \(Q\in {{\mathbb {R}}}^{n\times n}\), we write \(Q\succ 0\) (resp. \(Q\succeq 0\)) if Q is positive definite (resp. positive semidefinite). For any \(Q\succ 0\), we denote by \(\Vert x\Vert _Q:=(x^\top Qx)^{1/2}\) the Q-norm of x. We use \(\hbox {diag}(x)\) to represent a diagonal matrix whose diagonal elements are the \(x_i\)’s. Let \({\mathbf {1}}\) and I denote the all-one vector and the identity matrix (whose dimensions will be clear from the context), respectively.

We now review some basic definitions and properties from convex analysis (see e.g., the monograph [2]).

Definition 2.1

Let \({\mathcal {C}}\subset {{\mathbb {R}}}^n\) be a set. Then

  1. i)

    \({\mathcal {C}}\) is closed if the limit of any convergent sequence in \({\mathcal {C}}\) belongs to \({\mathcal {C}}\).

  2. ii)

    \({\mathcal {C}}\) is compact if any sequence in \({\mathcal {C}}\) has a cluster point in \({\mathcal {C}}\) (or equivalently, \(\mathcal {C}\) is closed and bounded).

  3. iii)

    The convex hull of \({\mathcal {C}}\), denoted by \(\mathrm{conv}({\mathcal {C}})\), is the intersection of all convex sets containing \({\mathcal {C}}\) (or equivalently, the smallest convex set containing \(\mathcal {C}\)). In particular, \(\mathrm{conv}({\mathcal {C}})\) is a polytope if \({\mathcal {C}}\) comprises a finite number of elements.

  4. iv)

    The decomposition of \({\mathcal {C}}\), denoted by \({\mathfrak {D}}({\mathcal {C}})\), is defined as

$$\begin{aligned} \begin{array}{c} {\mathfrak {D}}({\mathcal {C}}):=\left\{ {\mathcal {C}}_i\subset {{\mathbb {R}}}^n\mid \bigcup _{i=1}^m{\mathcal {C}}_i={\mathcal {C}},\; \hbox {and}\; \; {\mathcal {C}}_i\bigcap \mathcal {C}_j=\emptyset \;\hbox {for any}\; i\ne j\right\} . \end{array} \end{aligned}$$

Furthermore, the convex decomposition of \({\mathcal {C}}\), denoted by \(\widehat{\mathfrak {D}}(\mathcal {C})\), is a decomposition of \({\mathcal {C}}\) with only convex component sets, i.e.,

$$\begin{aligned} \widehat{\mathfrak {D}}({\mathcal {C}}):=\{{\mathcal {C}}_i\subset {{\mathbb {R}}}^n\mid {\mathcal {C}}_i\in {\mathfrak {D}}({\mathcal {C}}),\;\hbox {and all}\;\;{\mathcal {C}}_i\hbox {'s} \;\hbox {are convex sets}\}. \end{aligned}$$

Besides commutativity and associativity, the Minkowski sum operation “\(\oplus\)” enjoys the following important properties.

Lemma 2.2

Let \({\mathcal {C}}_i\subset {{\mathbb {R}}}^{l_i}\) \((i=1,2)\) be nonempty sets (possibly nonconvex), with \(l_1=l_2\) in assertion i).

  1. i)

    \(\mathrm{conv}({\mathcal {C}}_1\oplus {\mathcal {C}}_2)=\mathrm{conv}({\mathcal {C}}_1)\oplus \mathrm{conv}({\mathcal {C}}_2)\).

  2. ii)

    If \({\mathcal {C}}_i\) \((i=1,2)\) are convex (resp. compact) sets and \(T_i:{{\mathbb {R}}}^{l_i}\rightarrow {{\mathbb {R}}}^n\) (\(i=1,2\)) are affine mappings, then \(T_1({\mathcal {C}}_1)\oplus T_2({\mathcal {C}}_2)\) is a convex (resp. compact) set.

More generally, the above lemma remains true for a finite number of sets and affine mappings.

Let \(f:{{\mathbb {R}}}^n\rightarrow (-\infty ,+\infty ]\). The domain and epigraph of f are defined by \(\hbox {dom}(f):=\{x\in {{\mathbb {R}}}^n \mid f(x)<+\infty \}\) and \(\hbox {epi}(f):=\{(x,y)\in {{\mathbb {R}}}^n\times {{\mathbb {R}}}\mid f(x)\le y\}\), respectively. If \(\hbox {dom}(f)\) is nonempty, then f is said to be proper; if \(\hbox {epi}(f)\) is closed, then f is said to be closed. The indicator function of a set \({\mathcal {C}}\subset {{\mathbb {R}}}^n\), denoted by \(\iota _{\mathcal {C}}\), is defined as

$$\begin{aligned} \iota _{\mathcal {C}}(x):= {\left\{ \begin{array}{ll} 0, &\quad{} \hbox {if}\;\;x\in {\mathcal {C}},\\ +\infty , &\quad{}\hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(2.1)

The following properties and examples of support function \(\sigma _{\mathcal {F}}\) can be referred to, e.g., [2].

Proposition 2.3

Let \({\mathcal {F}}\subset {{\mathbb {R}}}^n\) be a nonempty set. Then,

  1. i)

    \(\sigma _{\mathcal {F}}\) is closed and convex.

  2. ii)

    \(\sigma _{\mathcal {F}}\) is finite everywhere if and only if \({\mathcal {F}}\) is a bounded set.

  3. iii)

    \(\sigma _{\mathcal {F}}\) is Lipschitz continuous with \(\sup _{x\in {\mathcal {F}}}\Vert x\Vert\) as Lipschitz constant if \({\mathcal {F}}\) is a compact convex set.

  4. iv)

    \(\sigma _{\mathcal {F}}\) is sublinear, i.e., \(\sigma _{\mathcal {F}}(\alpha _1x_1+\alpha _2x_2)\le \alpha _1\sigma _{\mathcal {F}}(x_1)+\alpha _2\sigma _{\mathcal {F}}(x_2)\) for any scalars \(\alpha _i\ge 0\) and vectors \(x_i\in {{\mathbb {R}}}^n\) (\(i=1,2\)).

  5. v)

    \(\sigma _{\mathcal {F}}\) is a norm on \({{\mathbb {R}}}^n\) if \({\mathcal {F}}\) is compact, convex, symmetric (i.e., \({\mathcal {F}}=-{\mathcal {F}}\)) and contains the origin as an interior point.

Example 2.1

Let \({\mathcal {F}}\subset {{\mathbb {R}}}^n\) be a nonempty set.

  1. i)

    If \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _\dagger \le 1\}\) is a unit ball associated with a norm \(\Vert \cdot \Vert _\dagger\), then the support function \(\sigma _{{\mathcal {F}}}(x)=\Vert x\Vert _{\ddagger }\), where \(\Vert \cdot \Vert _{\ddagger }\) is the dual norm of \(\Vert \cdot \Vert _\dagger\). For instance, if \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _Q\le 1\}\) with \(Q\succ 0\), then \(\sigma _{{\mathcal {F}}}(x)=\Vert x\Vert _{Q^{-1}}\); if \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _p\le 1\}\) with \(p\in [1,\infty ]\), then \(\sigma _{{\mathcal {F}}}(x)=\Vert x\Vert _{\frac{p}{p-1}}\). Herein, \(\frac{p}{p-1}\) takes the value \(\infty\) (resp. 1) when \(p=1\) (resp. \(p=\infty\)).

  2. ii)

    If \({\mathcal {F}}\) is a closed convex cone, then the support function \(\sigma _{{\mathcal {F}}}(x)=\iota _{{\mathcal {F}}^\circ }(x)\), where \({\mathcal {F}}^\circ :=\{x\in {{\mathbb {R}}}^n\mid x^\top y\le 0, \;\forall y\in {\mathcal {F}}\}\) denotes the polar cone of \({\mathcal {F}}\).

  3. iii)

    If \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid Ax=b\}\) is an affine set with \(A\in {{\mathbb {R}}}^{m\times n}\) and \(b\in {{\mathbb {R}}}^m\), then the support function

$$\begin{aligned} \sigma _{{\mathcal {F}}}(x)={\left\{ \begin{array}{ll}b^\top z, &\quad{} \hbox {if}\;\; x=A^\top z,\\ +\infty , &\quad{} \hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(2.2)

As aforementioned, the generalized projection \(\Pi _{\mathcal {F}}(\cdot ;{\mathcal {C}})\) in (1.3) reduces to the canonical Euclidean projection when \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le 1\}\). For notational brevity, we use \(\Pi (\cdot ;{\mathcal {C}})\) for the Euclidean projection operator. The following are some examples of Euclidean projections possessing closed-form formulae. The interested reader is referred to, e.g., [2, Chapter 28], for more examples.

Example 2.2

Let \({\mathcal {C}}\subset {{\mathbb {R}}}^n\) be a nonempty closed convex set.

  1. i)

    If \({\mathcal {C}}=\{x\in {{\mathbb {R}}}^n\mid a\le x\le b\}\) is a box with \(a\in {{\mathbb {R}}}^n\) and \(b\in {{\mathbb {R}}}^n\), then \(\left[ \Pi (x;{\mathcal {C}})\right] _i=\mathrm {median}\{a_i,x_i,b_i\}\) for \(i=1,2,\ldots ,n\).

  2. ii)

    If \({\mathcal {C}}=\left\{ x\in {{\mathbb {R}}}^n \mid \Vert x\Vert _2\le \alpha \right\}\) is a ball with \(\alpha >0\), then \(\Pi (x;{\mathcal {C}})=\min \left( 1,\frac{\alpha }{\Vert x\Vert _2}\right) x\).

  3. iii)

    If \({\mathcal {C}}=\{x\in {{\mathbb {R}}}^n\mid Ax=b\}\) is an affine set with \(A\in {{\mathbb {R}}}^{m\times n}\) and \(b\in {{\mathbb {R}}}^m\), then \(\Pi (x;{\mathcal {C}})=x-A^\dagger (Ax-b)\), where \(A^\dagger\) is the Moore–Penrose pseudo-inverse of A. In particular, if \({\mathcal {C}}=\{x\in {{\mathbb {R}}}^n\mid a^\top x=b\}\) is a hyperplane with \(a\ne {0}\in {{\mathbb {R}}}^n\), then \(\Pi (x;{\mathcal {C}})=x-\frac{a^\top x-b}{\Vert a\Vert _2^2}a\).

  4. iv)

    If \({\mathcal {C}}=\{(x,t)\in {{\mathbb {R}}}^n\times {{\mathbb {R}}}\mid \Vert x\Vert _2\le \alpha t\}\) is a second-order cone with \(\alpha >0\), then

    $$\begin{aligned} \Pi ((x,t);{\mathcal {C}})=\left\{ \begin{array}{ll}(x,t),&\quad{} \hbox {if}\;\;\Vert x\Vert _2\le \alpha t;\\ ({0},0),&\quad{}\hbox {if}\;\;\Vert x\Vert _2\le -t/\alpha ;\\ \frac{\alpha \Vert x\Vert _2+t}{(\alpha ^2+1)\Vert x\Vert _2}(\alpha x,\Vert x\Vert _2),&\quad{}\hbox {otherwise}. \end{array}\right. \end{aligned}$$
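For reference, the four projections of Example 2.2 can be coded in a few lines of MATLAB. The sketch below uses our own function names and assumes a MATLAB version that allows local functions in scripts.

```matlab
% Minimal sketches of the closed-form projections in Example 2.2 (names ours).
x = randn(4,1);
disp(min(max(x, -1), 1));                 % i)  box [-1,1]^4: median{a_i,x_i,b_i}
disp(min(1, 2/norm(x)) * x);              % ii) l2-ball of radius alpha = 2
A = randn(2,4); b = randn(2,1);
disp(x - pinv(A)*(A*x - b));              % iii) affine set {x : Ax = b}
disp(proj_soc(x, 0.5, 1));                % iv) second-order cone, alpha = 1

function p = proj_soc(x, t, alpha)        % projection onto {(x,t): ||x||_2 <= alpha*t}
nx = norm(x);
if nx <= alpha*t
    p = [x; t];
elseif nx <= -t/alpha
    p = zeros(numel(x)+1, 1);
else
    c = (alpha*nx + t)/((alpha^2 + 1)*nx);
    p = [c*alpha*x; c*nx];
end
end
```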

The following are some propositions on the Euclidean projection onto convex sets with Minkowski sum structure.

Proposition 2.4

Let \(\alpha >0\) be a scalar, \({\mathcal {A}}\) and \({\mathcal {C}}\) be nonempty closed convex sets in \({{\mathbb {R}}}^n\), and x, \(x^0\) be vectors in \({{\mathbb {R}}}^n\).

  1. i)

    If \({\mathcal {A}}=x^0\oplus {\mathcal {C}}\), then \(\Pi (x;{\mathcal {A}})=x^0+\Pi (x-x^0;{\mathcal {C}})\).

  2. ii)

    If \({\mathcal {A}}\perp {\mathcal {C}}\) (i.e., \(a^\top c=0\) for any \(a\in {\mathcal {A}}\) and \(c\in {\mathcal {C}}\)), then \(\Pi (x;{\mathcal {A}}\oplus {\mathcal {C}})=\Pi (x;{\mathcal {A}})+\Pi (x;{\mathcal {C}})\).

  3. iii)

    If \({\mathcal {A}}={\mathcal {B}}\oplus {\mathcal {C}}\) with \({\mathcal {B}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le \alpha \}\), then \(\Pi (x;{\mathcal {A}})=\Pi (x;{\mathcal {C}})+\min \left( 1,\frac{\alpha }{\Vert x-\Pi (x;{\mathcal {C}})\Vert _2}\right) (x-\Pi (x;{\mathcal {C}}))\).

Proof

The proofs of i) and ii) can be found in [2, Propositions 28.1 and 28.6], respectively. Assertion iii) can be proved as follows.

By combining the definition of Euclidean projection with the Minkowski sum structure of \({\mathcal {A}}\), we have \(\Pi (x;{\mathcal {A}})=\hbox {arg}\min \limits _{y\in {\mathcal {A}}}\Vert y-x\Vert _2=w_1^*+w_2^*\), where \((w_1^*,w_2^*)\) is an optimum of

$$\begin{aligned} \min \limits _{w_1\in \mathcal {B},w_2\in \mathcal {C}}\;\Vert w_1+w_2-x\Vert _2^2. \end{aligned}$$

By invoking the first-order optimality conditions, we have

$$\begin{aligned} w_1^*&= \Pi \left( w_1^*-\beta _1(w_1^*+w_2^*-x);{\mathcal {B}}\right) ,\;\;\forall \;\beta _1>0, \end{aligned}$$
(2.3)
$$\begin{aligned} w_2^*&= \Pi \left( w_2^*-\beta _2(w_1^*+w_2^*-x);{\mathcal {C}}\right) ,\;\;\forall \;\beta _2>0. \end{aligned}$$
(2.4)

By setting \(\beta _1=1\) in (2.3) (without loss of generality), it follows from Example 2.2 ii) that

$$\begin{aligned} w_1^*=\Pi \left( x-w_2^*;{\mathcal {B}}\right) ={\left\{ \begin{array}{ll} \frac{\alpha (x-w_2^*)}{\Vert x-w_2^*\Vert _2},&{}\;\;\hbox {if}\;\;\Vert x-w_2^*\Vert _2>\alpha ,\\ x-w_2^*,&{}\;\;\hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(2.5)

In the case of \(\Vert x-w_2^*\Vert _2>\alpha\), by substituting \(w_1^*=\frac{\alpha (x-w_2^*)}{\Vert x-w_2^*\Vert _2}\) into (2.4) and setting \(\beta _2=(1-\frac{\alpha }{\Vert x-w_2^*\Vert _2})^{-1}\), we obtain \(w_2^*=\Pi (x;{\mathcal {C}})\), and thus \(w_1^*=\frac{\alpha (x-\Pi (x;{\mathcal {C}}))}{\Vert x-\Pi (x;{\mathcal {C}})\Vert _2}\). Finally, we get

$$\begin{aligned} w_1^*+w_2^*={\left\{ \begin{array}{ll} \Pi (x;{\mathcal {C}})+\frac{\alpha (x-\Pi (x;{\mathcal {C}}))}{\Vert x-\Pi (x;{\mathcal {C}})\Vert _2},&{}\;\;\hbox {if}\;\;\Vert x-\Pi (x;{\mathcal {C}})\Vert _2>\alpha ,\\ x,&{}\;\;\hbox {otherwise}, \end{array}\right. } \end{aligned}$$
(2.6)

which completes the proof. \(\square\)

The set \({\mathcal {A}}\) in Proposition 2.4 iii) is widely known as the offsetting of \({\mathcal {C}}\) by a radius \(\alpha\). As a special case of Proposition 2.4 i), if \({\mathcal {C}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le \alpha \}\), then \({\mathcal {A}}\) is an \(x^0\)-centred \(\ell ^2\)-norm ball with radius \(\alpha\). Accordingly, the following corollary holds.

Corollary 2.5

If \({\mathcal {A}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x-x^0\Vert _2\le \alpha \}\) with \(\alpha >0\), then \(\Pi (x;{\mathcal {A}})=x^0+\min \left( 1,\frac{\alpha }{\Vert x-x^0\Vert _2}\right) (x-x^0)\).
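Proposition 2.4 iii) is easy to validate numerically. The following crude sketch (our own construction; it assumes a recent MATLAB with implicit expansion and vecnorm, and the ball sampling is dense rather than uniform) compares the closed-form projection onto an offset box with a random search.

```matlab
% Quick check of Proposition 2.4 iii) (names ours): project x onto the
% offset A = B + C of the box C = [0,1]^2 by the alpha-ball B, and compare
% the closed-form answer with a crude random search over points of B + C.
alpha = 0.5;  x = [2.3; -1.7];
PiC = min(max(x, 0), 1);                  % Pi(x;C), box formula of Example 2.2 i)
d   = x - PiC;
PiA = PiC + min(1, alpha/norm(d))*d;      % formula of Proposition 2.4 iii)
w2  = rand(2, 1e5);                       % random points of the box C
w1  = randn(2, 1e5);
w1  = (alpha*rand(1, 1e5)) .* w1 ./ vecnorm(w1);  % dense (non-uniform) points of B
P   = w1 + w2;                            % points of A = B + C
[~, k] = min(vecnorm(P - x));
fprintf('closed form: (%.3f, %.3f); sampled: (%.3f, %.3f)\n', PiA, P(:,k));
```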

We now present some preliminaries involving functions and operators, which will be useful in the sequel discussion.

The subdifferential of f, denoted by \(\partial f: {{\mathbb {R}}}^n\rightarrow 2^{{{\mathbb {R}}}^n}\), is defined as

$$\begin{aligned} \begin{array}{c} \partial f(x):=\left\{ \xi \in {{\mathbb {R}}}^n \mid f(y)\ge f(x)+\xi ^\top (y-x), \;\;\forall y\in {{\mathbb {R}}}^n\right\} ,\;\;\forall x\in \hbox {dom}(f). \end{array} \end{aligned}$$
(2.7)

If \(f:{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}\) is a convex function that is finite everywhere, then the subdifferential \(\partial f(x)\) is a nonempty compact convex set for every \(x\in {{\mathbb {R}}}^n\) (see e.g., [2] for details). We state below some examples of subdifferentials.

Example 2.3

Let \({\mathcal {C}}\subset {{\mathbb {R}}}^n\) be a nonempty closed convex set. Then,

$$\begin{aligned} \partial \iota _{\mathcal {C}}(x)&={\left\{ \begin{array}{ll} N_{{\mathcal {C}}}(x):=\{\xi \in {{\mathbb {R}}}^n\mid \xi ^\top (y-x)\le 0,\;\forall y\in {\mathcal {C}}\},&{}\;\;\hbox {if}\;\;x\in {\mathcal {C}}, \\ \emptyset , &{}\;\;\hbox {otherwise}, \end{array}\right. }\\ \partial \sigma _{\mathcal {C}}(x)&={\left\{ \begin{array}{ll} {\mathcal {C}},&{}\;\;\hbox {if}\;\;x=0, \\ F_{\mathcal {C}}(x):=\{\xi \in {\mathcal {C}}\mid \xi ^\top x=\sigma _{\mathcal {C}}(x)\}, &{}\;\;\hbox {otherwise}, \end{array}\right. } \end{aligned}$$

where \(N_{\mathcal {C}}\) and \(F_{\mathcal {C}}\) are the so-called normal cone and exposed face of \({\mathcal {C}}\).

Example 2.4

Let \(\Vert \cdot \Vert _{\dagger }\) be a norm of \({{\mathbb {R}}}^n\) and \(\Vert \cdot \Vert _{\ddagger }\) be its dual norm. Then

$$\begin{aligned} \partial \Vert x\Vert _{\dagger }={\left\{ \begin{array}{ll} \{\xi \in {{\mathbb {R}}}^n\mid \Vert \xi \Vert _{\ddagger }\le 1\}, &{}\;\hbox {if}\;\;x={0},\\ \{\xi \in {{\mathbb {R}}}^n\mid \Vert \xi \Vert _{\ddagger }\le 1,\;\xi ^\top x=\Vert x\Vert _{\dagger }\},&{}\;\hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(2.8)

In particular, the subdifferential of the \(\ell ^p\)-norm (\(p\in [1,+\infty ]\)) at the origin is the unit ball associated with its dual norm, i.e., \(\partial \Vert 0\Vert _p=\{\xi \in {{\mathbb {R}}}^n\mid \Vert \xi \Vert _{\frac{p}{p-1}}\le 1\}\), whilst the subdifferential of the \(\ell ^p\)-norm at \(x\ne {0}\) can be stated as

$$\begin{aligned}&\partial \Vert x\Vert _1=\Big \{\xi \in {{\mathbb {R}}}^n\mid \xi _i=\frac{x_i}{|x_i|}\;\;\hbox {for}\;\; x_i\ne 0,\;\;\hbox {and}\;\;\xi _i\in [-1,1]\;\;\hbox {for}\; x_i=0\Big \},\\&\partial \Vert x\Vert _p=\Big \{\xi \in {{\mathbb {R}}}^n\mid \xi _i=\frac{x_i|x_i|^{p-2}}{\Vert x\Vert _p^{p-1}}\;\;\hbox {for}\;\; x_i\ne 0,\;\;\hbox {and}\;\;\xi _i=0\;\;\hbox {for}\; x_i=0\Big \},\\&\partial \Vert x\Vert _\infty ={\mathrm conv}\Big (\frac{x_ie_i}{|x_i|}\mid i\in {\mathcal {I}}(x)\Big ), \end{aligned}$$

where \({\mathcal {I}}(x):=\{i\mid |x_i|=\Vert x\Vert _\infty \}\) and \(e_i\in {{\mathbb {R}}}^n\) denotes the ith canonical basis vector (i.e., the vector whose only nonzero element is 1 in the ith coordinate).

The conjugate of f, denoted by \(f^*\), is defined as

$$\begin{aligned} \begin{array}{c} f^*(\lambda ):=\sup \big \{x^\top \lambda -f(x)\mid x\in \hbox {dom}(f)\big \}, \;\; \forall \lambda \in {{\mathbb {R}}}^n. \end{array} \end{aligned}$$
(2.9)

Note that \(f^*\) is always a closed convex function (even if f is neither closed nor convex). Furthermore, if f is a closed convex proper function, then \(f^{**}=f\). It follows from (2.9) that the indicator function \(\iota _{\mathcal {C}}\) and the support function \(\sigma _{\mathcal {C}}\) are conjugate to each other. Let \(f:{{\mathbb {R}}}^n\rightarrow (-\infty ,+\infty ]\) be a closed convex proper function. The proximity operator of f, denoted by \(\hbox {prox}_f\), is defined as

$$\begin{aligned} \begin{array}{c} \hbox {prox}_f(x):=\underset{y\in {{\mathbb {R}}}^n}{\hbox {argmin}}\left\{ f(y)+\frac{1}{2}\Vert y-x\Vert ^2_2\right\} , \;\; \forall x\in {{\mathbb {R}}}^n. \end{array} \end{aligned}$$
(2.10)

The following are some properties of the proximity operator which will be useful for the sequel analysis. The interested reader is referred to [11, Table 10.2] for more instances and properties of proximity operators in Hilbert spaces.

Proposition 2.6

Let \(\alpha >0\) and \(\gamma \in {{\mathbb {R}}}\) be scalars, \(b\in {{\mathbb {R}}}^n\) be a vector, \(A\in {{\mathbb {R}}}^{n\times m}\) be a matrix, \({\mathcal {C}}\subset {{\mathbb {R}}}^n\) be a nonempty closed convex set, and \(\varphi :{{\mathbb {R}}}^n\rightarrow (-\infty ,+\infty ]\) be a closed convex proper function.

  1. i)

    If \(f(x)=\varphi (Ax+b)\) with A satisfying \(AA^\top =\alpha I_n\), then

    $$\begin{aligned} \hbox {prox}_f(x)=x-\alpha ^{-1}A^\top \left( Ax+b-\hbox {prox}_{\alpha \varphi }(Ax+b)\right) . \end{aligned}$$
    (2.11)
  2. ii)

    If \(f(x)=\varphi (x)+\frac{\alpha }{2}\Vert x\Vert _2^2+b^\top x+\gamma\), then \(\hbox {prox}_f(x)=\hbox {prox}_{\frac{\varphi }{\alpha +1}}\left( \frac{x-b}{\alpha +1}\right)\).

  3. iii)

    Recalling that \(\varphi ^*\) is the conjugate of \(\varphi\), we have \(x=\hbox {prox}_{\alpha \varphi }(x)+\alpha \hbox {prox}_{\frac{\varphi ^*}{\alpha }}(\frac{x}{\alpha })\) for any \(x\in {{\mathbb {R}}}^n\).

  4. iv)

    If \(f(x)=\iota _{{\mathcal {C}}}(x)\) is an indicator function, then \(\hbox {prox}_f(x)=\Pi (x;{\mathcal {C}})\); if \(f(x)=\sigma _{{\mathcal {C}}}(x)\) is a support function, then \(\hbox {prox}_f(x)=x-\Pi (x;{\mathcal {C}})\).

Note that Proposition 2.6 iii) is the celebrated Moreau identity, which relates the proximity operators of \(\varphi\) and its conjugate.
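As a quick sanity check of Proposition 2.6 iii) and iv) (our own toy example): taking \(\varphi =\sigma _{\mathcal {F}}=\Vert \cdot \Vert _2\) with \({\mathcal {F}}\) the unit \(\ell ^2\)-ball, \(\hbox {prox}_\varphi\) is block soft-thresholding, \(\hbox {prox}_{\varphi ^*}\) is the projection onto \({\mathcal {F}}\), and the two must sum to the identity.

```matlab
% Toy verification of Moreau's identity with alpha = 1 (names ours):
% phi = sigma_F = ||.||_2 (F the unit l2-ball), phi* = indicator of F.
x      = randn(5,1);
PiF    = min(1, 1/norm(x)) * x;           % prox of the indicator = Pi(x;F)
prox_s = max(1 - 1/norm(x), 0) * x;       % prox of ||.||_2 (block soft-threshold)
fprintf('Moreau residual: %.2e\n', norm(x - (prox_s + PiF)));         % iii)
fprintf('Prop. 2.6 iv) residual: %.2e\n', norm(prox_s - (x - PiF)));  % iv)
```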

3 Reformulations of (1.4) into various forms

In this section, we first reformulate (1.4) into various forms, and then deduce the equivalent monotone inclusion problems by employing the first-order optimality conditions. It will be demonstrated that problem (1.4) can be tractably solved by a large family of splitting methods.

We start the discussion by restating (1.4) as follows

$$\begin{aligned} \min&\;\;\;\sigma _{\mathcal {F}}(x-x^0) \end{aligned}$$
(3.1a)
$$\begin{aligned} \hbox {subject to}&\;\;\; x\in T_1(\Omega _1)\oplus T_2(\Omega _2)\oplus \cdots \oplus T_m(\Omega _m). \end{aligned}$$
(3.1b)

The existence of optima of (3.1) follows immediately from Weierstrass’ theorem (see e.g., [2, Chapter 11]).

Proposition 3.1

Problem (3.1) admits at least one optimum if one of the following conditions is satisfied: (i) \(T_i(\Omega _i)\) (\(i=1,2,\ldots ,m\)) are compact sets; (ii) \(0\in \mathrm{int}({\mathcal {F}})\).

Proof

  1. i)

    By the compactness of the \(T_i(\Omega _i)\)’s and Lemma 2.2 ii), the Minkowski sum of the \(T_i(\Omega _i)\)’s is a compact set. Moreover, as \(\sigma _{\mathcal {F}}\) is closed (see Proposition 2.3), the assertion holds by Weierstrass’ theorem.

  2. ii)

    If \(0\in \mathrm{int}({\mathcal {F}})\), there exists a neighbourhood of 0, denoted by \({\mathcal {N}}_\delta (0)\), satisfying \({\mathcal {N}}_\delta (0)\subset {\mathcal {F}}\). For any \(x\ne x^0\), we have \(y^0:=\frac{\delta (x-x^0)}{2\Vert x-x^0\Vert _2}\in {\mathcal {N}}_\delta (0)\). Accordingly,

$$\begin{aligned} \sigma _{\mathcal {F}}(x-x^0)=\sup _{y\in {\mathcal {F}}} y^\top (x-x^0)\ge \sup _{y\in {\mathcal {N}}_\delta (0)} y^\top (x-x^0)\ge (y^0)^\top (x-x^0)=\frac{\delta }{2}\Vert x-x^0\Vert _2, \end{aligned}$$

which indicates that \(\sigma _{\mathcal {F}}(\cdot -x^0)\) is coercive (i.e., \(\lim \limits _{\Vert x\Vert _2\rightarrow \infty }\sigma _{\mathcal {F}}(x-x^0)=+\infty\)). Again, the assertion (ii) follows from the Weierstrass’ theorem. \(\square\)

Because of the affineness of the \(T_i\)’s, Proposition 3.1 (i) obviously holds if all the \(\Omega _i\)’s are compact sets. Under more stringent premises, the uniqueness of the optimum of (3.1) can be guaranteed. For instance, if \(\Omega\) and \({\mathcal {F}}\) in (3.1) are convex, normally smooth and round, then (3.1) admits a unique optimum [40].

By recalling the affine mappings \(T_i\) in (1.5), the constraint (3.1b) can be described as

$$\begin{aligned} \begin{array}{c} x=\sum \limits _{i=1}^m(M_i\omega _i+a_i), \;\;\omega _i\in \Omega _i, \;\;i=1,2,\ldots ,m. \end{array} \end{aligned}$$
(3.2)

Hereafter, for notational convenience, we denote \(\omega \in {{\mathbb {R}}}^l\), \(a\in {{\mathbb {R}}}^n\), \(M\in {{\mathbb {R}}}^{n\times l}\) and \(\Omega \subset {{\mathbb {R}}}^l\) as follows

$$\begin{aligned} \begin{array}{c} \omega :=\begin{pmatrix} \omega _1\\ \omega _2\\ \vdots \\ \omega _m\\ \end{pmatrix}, \;\; a:=\sum \limits _{i=1}^ma_i, \;\;M:=\begin{pmatrix}M_1,M_2,\ldots ,M_m\end{pmatrix}\;\; \hbox {and}\;\; \Omega :=\Omega _1\times \cdots \times \Omega _m. \end{array} \end{aligned}$$
(3.3)

3.1 Primal reformulation

With the notations in (3.3), the constraint (3.2) can be condensed to \(x=M\omega +a\) with \(\omega \in \Omega\). Accordingly, problem (3.1) can be cast as

$$\begin{aligned} \begin{array}{rl} \min \;\sigma _{\mathcal {F}}(x-x^0)\;\;\;\;\;\; \hbox {subject to}\;\;\;x=M\omega +a,\;\omega \in \Omega , \end{array} \end{aligned}$$
(3.4)

or concisely,

$$\begin{aligned} \min _{\omega \in {{\mathbb {R}}}^l}\;\;\sigma _{\mathcal {F}}(M\omega +a-x^0)+\iota _{\Omega }(\omega ), \end{aligned}$$
(3.5)

where \(\iota _{\Omega }\) is the indicator function defined in (2.1). For simplicity, (3.5) is termed the primal reformulation of (1.4). The first-order optimality conditions of (3.5) read

$$\begin{aligned} 0\in M^\top \partial \sigma _{\mathcal {F}}(M\omega +a-x^0)+\partial \iota _{\Omega }(\omega ), \end{aligned}$$
(3.6)

where \(\partial \iota _{\Omega }\) and \(\partial \sigma _{\mathcal {F}}\) are the subdifferential operators (see Example 2.3) of \(\iota _{\Omega }\) and \(\sigma _{\mathcal {F}}\), respectively. In particular, if the dynamic set \(\mathcal {F}=\{x\in {\mathbb {R}}^n\mid \Vert x\Vert _{Q^{-1}}\le 1\}\) with \(Q\succ 0\) (or equivalently, \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _Q\)), then (3.5) can be equivalently cast as

$$\begin{aligned} \begin{array}{c} \min \limits _{\omega \in {\mathbb {R}}^l}\;\;\frac{1}{2}\Vert M\omega +a-x^0\Vert _Q^2+\iota _{\Omega }(\omega ), \end{array} \end{aligned}$$
(3.7)

which implies that the projection onto the Minkowski sum set in (1.4) is essentially a constrained least-squares problem. Obviously, the objective function of (3.7) is strongly convex if M has full column rank, in which case (3.7) admits a unique optimum. Moreover, a corollary can be immediately deduced for particular M.

Corollary 3.2

If \({\mathcal {F}}=\{x\in {\mathbb {R}}^n\mid \Vert x\Vert _{Q^{-1}}\le 1\}\) with \(Q\succ 0\), i.e., \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _Q\), and \(M\in {\mathbb {R}}^{n\times l}\) satisfies \(M^\top QM=\mathrm{diag}\{\frac{1}{\nu _1}I_{l_1},\frac{1}{\nu _2}I_{l_2},\ldots ,\frac{1}{\nu _m}I_{l_m}\}\) with \(\nu _i>0\) for all \(i=1,2,\ldots ,m\) (i.e., the columns of M are Q-conjugate vectors), then the optimum of (3.5), denoted by \(\omega ^*=(\omega _1^*,\omega _2^*,\ldots ,\omega _m^*)\), admits the explicit formula

$$\begin{aligned} \begin{array}{c} \omega _i^*=\Pi \left( \nu _iM_i^\top Q(x^0-a);\Omega _i\right) ,\;\; i=1,2,\ldots ,m. \end{array} \end{aligned}$$
(3.8)

Proof

By the first-order optimality conditions of (3.7), we derive

$$\begin{aligned} 0\in M^\top Q(M\omega ^*+a-x^0)+\partial \iota _{\Omega }(\omega ^*). \end{aligned}$$

Furthermore, it follows from the assumption \(M_i^\top QM_i=\frac{1}{\nu _i}I_{l_i}\) and the fact \(\Omega =\Omega _1\times \cdots \times \Omega _m\) that

$$\begin{aligned} \nu _iM_i^\top Q(x^0-a)\in \omega _i^*+\nu _i\partial \iota _{\Omega _i}(\omega _i^*),\;\; i=1,2, \ldots ,m, \end{aligned}$$

which indicates that \(\omega _i^*=(I_{l_i}+\nu _i\partial \iota _{\Omega _i})^{-1}(\nu _iM_i^\top Q(x^0-a))=\Pi (\nu _iM_i^\top Q(x^0-a);\Omega _i)\). \(\square\)

The corollary above indicates that if the column vectors of the \(M_i\)’s are Q-conjugate, then the projection onto a convex set with Minkowski sum structure admits a closed-form solution.
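The following MATLAB sketch (our own toy check, with Q = I and M built from an orthogonal matrix so that \(\nu _i=1\)) validates formula (3.8) by testing the variational characterization \((x^0-x^*)^\top (x-x^*)\le 0\) of the Euclidean projection on random feasible points.

```matlab
% A sketch checking Corollary 3.2 (all names ours): take Q = I and M with
% orthonormal columns (so nu_i = 1); then (3.8) should yield the Euclidean
% projection of x0 onto C = M1(Omega_1) + M2(Omega_2) + a with unit-ball
% Omega_i, tested via the optimality inequality (x0-x*)'(x-x*) <= 0.
n = 6;  x0 = randn(n,1);  a = randn(n,1);
[Mf, ~] = qr(randn(n));  M1 = Mf(:,1:3);  M2 = Mf(:,4:6);
proj_ball = @(y) min(1, 1/norm(y)) * y;
w1 = proj_ball(M1'*(x0 - a));            % formula (3.8) with nu_i = 1
w2 = proj_ball(M2'*(x0 - a));
xs = M1*w1 + M2*w2 + a;                  % candidate projection of x0 onto C
worst = -inf;
for trial = 1:1e4                        % random feasible points x of C
    u1 = randn(3,1);  u1 = rand^(1/3)*u1/norm(u1);   % uniform in unit ball
    u2 = randn(3,1);  u2 = rand^(1/3)*u2/norm(u2);
    xf = M1*u1 + M2*u2 + a;
    worst = max(worst, (x0 - xs)'*(xf - xs));
end
fprintf('max (x0-x*)''(x-x*) over samples: %.3e (<= 0 expected)\n', worst);
```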

3.2 Dual reformulation

On the other hand, by deploying Fenchel–Rockafellar duality (see e.g., [2, Chapter 15]), the dual of (3.5) can be formulated as

$$\begin{aligned} \max _{\lambda \in {\mathbb {R}}^n}\;\;\lambda ^\top (a-x^0)-\iota _{\mathcal {F}}(\lambda )-\sigma _{\Omega }(-M^\top \lambda ). \end{aligned}$$
(3.9)

Accordingly, the first-order optimality conditions of (3.9) read

$$\begin{aligned} 0\in a- x^0-\partial \iota _{\mathcal {F}}(\lambda )+M\partial \sigma _{\Omega }(-M^\top \lambda ). \end{aligned}$$
(3.10)

An explicit formula for \(\sigma _{\Omega }\) is available if \(\Omega\) is a special set, e.g., a ball, a closed convex cone, or an affine set (see Example 2.1). For example, if \(\Omega\) is a convex cone, then (3.9) can be cast as \(\max _{\lambda \in {\mathbb {R}}^n}\{\lambda ^\top (a-x^0)-\iota _{\mathcal {F}}(\lambda )-\iota _{\Omega ^\circ }(-M^\top \lambda )\}\). Furthermore, if \({\mathcal {F}}=\{x\in {\mathbb {R}}^n\mid \Vert x\Vert _{\infty }\le \alpha \}\) and \(\Omega =\{x\in {\mathbb {R}}^l\mid Ax\le 0\}\) is a polyhedral cone, then \(\Omega ^\circ =\{x\in {\mathbb {R}}^l\mid x=A^\top z,\; z\ge 0\}\) is exactly the cone finitely generated by the columns of \(A^\top\). Thus, (3.9) reduces to

$$\begin{aligned} \begin{array}{rl} \max \limits _{\lambda \in {\mathbb {R}}^n}\;\;\lambda ^\top (a-x^0)\;\;\;\;\;\; \hbox {subject to}\;\;\;\Vert \lambda \Vert _\infty \le \alpha ,\; M^\top \lambda +A^\top z=0,\; z\ge 0, \end{array} \end{aligned}$$

which is essentially a linear program in the variables \((\lambda ,z)\).
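Such a linear program can be handed to an off-the-shelf solver. A hedged sketch with random illustrative data, assuming MATLAB's Optimization Toolbox function linprog, reads:

```matlab
% Sketch of the LP above via linprog (Optimization Toolbox assumed; the
% random data and all variable names are purely illustrative).
n = 4;  l = 3;  q = 5;  alpha = 1;
M = randn(n,l);  A = randn(q,l);  a = randn(n,1);  x0 = randn(n,1);
f   = [-(a - x0); zeros(q,1)];           % maximize lambda'(a-x0) <=> minimize -(...)
Aeq = [M', A'];  beq = zeros(l,1);       % M'*lambda + A'*z = 0
lb  = [-alpha*ones(n,1); zeros(q,1)];    % ||lambda||_inf <= alpha and z >= 0
ub  = [ alpha*ones(n,1); inf(q,1)];
y   = linprog(f, [], [], Aeq, beq, lb, ub);
lambda = y(1:n);  z = y(n+1:end);        % recover the two variable blocks
```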

3.3 Primal-dual reformulation

Moreover, by applying the conjugate (see (2.9) for definition) of \(\sigma _{\mathcal {F}}\) to (3.5), we derive a saddle point problem

$$\begin{aligned} \min _{\omega \in {\mathbb {R}}^l}\max _{\lambda \in {\mathbb {R}}^n}\;\psi (\omega ,\lambda ):=\lambda ^\top (M\omega +a-x^0)-\iota _{\mathcal {F}}(\lambda )+\iota _{\Omega }(\omega ). \end{aligned}$$
(3.11)

Again, by the first-order optimality conditions, the saddle point problem (3.11) amounts to

$$\begin{aligned} {\left\{ \begin{array}{ll} 0\in \partial \iota _{\Omega }(\omega )+M^\top \lambda ,\\ 0\in \partial \iota _{\mathcal {F}}(\lambda )-M\omega -a+x^0, \end{array}\right. } \end{aligned}$$

or equivalently,

$$\begin{aligned} \begin{pmatrix} 0\\ a-x^0 \end{pmatrix}\in \begin{pmatrix} \partial \iota _{\Omega } &{}M^\top \\ -M&{} \partial \iota _{\mathcal {F}} \end{pmatrix}\begin{pmatrix} \omega \\ \lambda \end{pmatrix}. \end{aligned}$$
(3.12)

As (3.11) is linearly coupled in \(\omega\) and \(\lambda\), there always exists a saddle point \((\omega ^*,\lambda ^*)\) satisfying (3.12), where \(\omega ^*\) and \(\lambda ^*\) are solutions of the primal and dual problems, respectively (see e.g., [3, Chapter 3]).

Remark 3.3

In a word, (1.4) can be equivalently reformulated in various optimization forms, whose optimality conditions correspond to finding zeros of a series of monotone inclusion problems, e.g., (3.6), (3.10) and (3.12). To solve those monotone inclusion problems, numerous splitting schemes can be employed, e.g., the proximal point algorithm (PPA), forward-backward splitting (FBS), forward-backward-forward splitting (FBFS), Douglas–Rachford splitting (DRS) and Peaceman–Rachford splitting (PRS), to name a few. The interested reader is referred to, e.g., [15, 38, 42], for overviews and recent advances in splitting schemes. Among those splitting schemes for tackling monotone inclusion problems, the most popular ones include the application of PPA to (3.12), which results in the well-known PDHG method [9, 24]; the application of DRS to (3.10), which results in ADMM [18, 23]; and the application of DRS to (3.6), which results in Spingarn’s method [46]. In a nutshell, (1.4) can be tractably solved by a large family of first-order optimization methods in addition to the computational-geometry-based algorithms.

4 Solvers for projection onto Minkowski sum set

As discussed in Sect. 3, (1.4) can be reformulated in various ways and then solved by numerous state-of-the-art methods. Analysis and numerical comparison of all those methods are beyond the scope of this paper. Herein, we focus on ADMM [23] and PDHG [24] for solving (1.4).

4.1 ADMM based solver

We first consider a generic linearly constrained convex optimization problem with separable structure

$$\begin{aligned} \min&\;\;\theta _1(x_1)+\theta _2(x_2) \end{aligned}$$
(4.1a)
$$\begin{aligned} \hbox {subject to}&\;\; A_1x_1+A_2x_2=b,\;x_1\in \mathcal {X}_1,\;x_2\in \mathcal {X}_2, \end{aligned}$$
(4.1b)

where \(\theta _i:{\mathbb {R}}^{n_i}\rightarrow (-\infty ,+\infty ]\) (\(i=1,2\)) are closed convex proper functions (possibly nonsmooth); \(\mathcal {X}_i\subseteq {\mathbb {R}}^{n_i}\) (\(i=1,2\)) are nonempty closed convex sets; \(A_i\in {\mathbb {R}}^{l\times n_i}\) (\(i=1,2\)) are full column rank matrices; \(b\in {\mathbb {R}}^l\); and \(n_1+n_2=n\). A large family of instances arising from areas such as compressive sensing, image processing, computer vision and machine learning can be reduced to the abstract model (4.1). A state-of-the-art algorithm for solving (4.1) is the ADMM, which originated in [18, 23]. Given \((x_2^k,\lambda ^k)\in {\mathbb {R}}^{n_2}\times {\mathbb {R}}^l\), the recursion of ADMM for solving the abstract model (4.1) can be stated as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} x_1^{k+1}=\underset{x_1\in {\mathcal {X}}_1}{\hbox {argmin}}L_{\beta }(x_1,x_2^k,\lambda ^k), \\ x_2^{k+1}=\underset{x_2\in {\mathcal {X}}_2}{\hbox {argmin}}L_{\beta }(x_1^{k+1},x_2,\lambda ^k), \\ \lambda ^{k+1}=\lambda ^k-\gamma \beta (A_1x_1^{k+1}+A_2x_2^{k+1}-b), \end{array}\right. } \end{aligned}$$
(4.2)

where \(\beta >0\) is a penalty parameter and \(\gamma \in (0,\frac{\sqrt{5}+1}{2})\) is a relaxation factor. Herein, \(L_{\beta }:{\mathbb {R}}^{n_1}\times {\mathbb {R}}^{n_2}\times {\mathbb {R}}^l\rightarrow (-\infty ,+\infty ]\) denotes the augmented Lagrangian function of (4.1), which is defined as

$$\begin{aligned} L_{\beta }(x_1,x_2,\lambda ):=\theta _1(x_1)+\theta _2(x_2)-\lambda ^\top (A_1x_1+A_2x_2-b)+\frac{\beta }{2}\Vert A_1x_1+A_2x_2-b\Vert _2^2, \end{aligned}$$
(4.3)

where \(\lambda \in {\mathbb {R}}^l\) is the Lagrange multiplier. As ADMM copes with each sub-function \(\theta _i\) individually with the aid of the augmented Lagrangian function, it has gained much popularity in diverse applications (see, e.g., the overviews [6, 22] and references therein). The convergence of ADMM is restated in the tutorial [14], and its O(1/k) convergence rate was established in [25].

Now, we present the implementation details of ADMM on solving (1.4). Essentially, the primal reformulation of (3.1) falls into the framework of (4.1) with the following specifications:

  • the variables \(x_1:=x\), \(x_2:=\omega\), and the abstract sets \({\mathcal {X}}_1={\mathbb {R}}^n\) and \({\mathcal {X}}_2=\Omega\);

  • the objective functions \(\theta _1(x_1):=\sigma _{\mathcal {F}}(x-x^0)\) and \(\theta _2(x_2):=0\);

  • the matrices and vector in linear constraint (4.1b) are \(A_1:=I\), \(A_2:=-M\) and \(b:=a\).

Therefore, the resulting subproblems when applying ADMM (or rather, the iterative scheme (4.2)) to (3.4) read

$$\begin{aligned} {\left\{ \begin{array}{ll} x^{k+1}=\underset{x\in {\mathbb {R}}^n}{\hbox {argmin}}\big \{\sigma _{\mathcal {F}}(x-x^0)+\frac{\beta }{2}\Vert x-M\omega ^k-a-{\beta }^{-1}\lambda ^k\Vert _2^2\big \}, \\ \omega ^{k+1}=\underset{\omega \in \Omega }{\hbox {argmin}}\Vert x^{k+1}-M\omega -a-{\beta }^{-1}\lambda ^k\Vert _2^2, \\ \lambda ^{k+1}=\lambda ^k-\beta (x^{k+1}-M\omega ^{k+1}-a). \end{array}\right. } \end{aligned}$$
(4.4)

Unfortunately, the \(\omega\)-subproblem in (4.4) is a least-squares problem over the constraint set \(\Omega =\Omega _1\times \cdots \times \Omega _m\) whose blocks are coupled through M, and it generally requires an internally nested iteration to derive an approximate solution (albeit the projection onto each individual \(\Omega _i\) admits a closed-form solution). Note that the compelling numerical performance of ADMM is established in the sense that all resulting subproblems have closed-form solutions or can be efficiently approximated by some subroutines. Accordingly, it is sensible to reformulate (3.4) in an alternative fashion. Concretely, by further introducing an auxiliary variable \(z\in {\mathbb {R}}^l\), problem (3.4) is equivalent to

$$\begin{aligned} \begin{array}{c} \min \;\sigma _{\mathcal {F}}(x-x^0)\;\;\;\;\; \hbox {subject to}\;\;x=Mz+a, \;\omega =z, \;\omega \in \Omega . \end{array} \end{aligned}$$
(4.5)

Likewise, the problem (4.5) falls into the abstract model (4.1) with the following specifications:

  • the variables \(x_1:=(x,\omega )\), \(x_2:=z\), and the abstract sets \({\mathcal {X}}_1:={\mathbb {R}}^n\times \Omega\), \({\mathcal {X}}_2:={\mathbb {R}}^l\);

  • the objective functions \(\theta _1(x_1):=\sigma _{\mathcal {F}}(x-x^0)\) and \(\theta _2(x_2):=0\);

  • The matrices and vector with respect to the linear constraint (4.1b) are

    $$\begin{aligned} A_1:=\begin{pmatrix}I&{}0\\ 0&{}I\end{pmatrix}, \;\; A_2:=\begin{pmatrix}-M\\ -I\end{pmatrix}\;\;\hbox {and}\;\;b:=\begin{pmatrix}a\\ 0\end{pmatrix}. \end{aligned}$$

ADMM is thus applicable to (4.5) with above specifications. More precisely, the augmented Lagrangian function of (4.5) can be specified as

$$\begin{aligned} L_{\beta }(x_1,x_2,\lambda ) =&\sigma _{\mathcal {F}}(x-x^0)-\lambda _1^\top (x-Mz-a)+\frac{\beta }{2}\Vert x-Mz-a\Vert _2^2\\&-\lambda _2^\top (\omega -z)+\frac{\beta }{2}\Vert \omega -z\Vert ^2_2, \end{aligned}$$

where \(\lambda =(\lambda _1,\lambda _2)\in {\mathbb {R}}^n\times {\mathbb {R}}^l\) is the Lagrange multiplier. Accordingly, the resulting subproblems when applying the ADMM recursion (4.2) to (4.5) read

$$\begin{aligned} {\left\{ \begin{array}{ll} x^{k+1}=\underset{x\in {\mathbb {R}}^n}{\hbox {argmin}}\left\{ \sigma _{\mathcal {F}}(x-x^0)+\frac{\beta }{2}\Vert x-Mz^k-a-\beta ^{-1}\lambda _1^k\Vert _2^2\right\} , \\ \omega ^{k+1}=\underset{\omega \in \Omega }{\hbox {argmin}}\;\Vert \omega -z^k-\beta ^{-1}\lambda _2^k\Vert ^2_2, \\ z^{k+1}=\underset{z\in {\mathbb {R}}^l}{\hbox {argmin}}\left\{ \Vert x^{k+1}-Mz-a-\beta ^{-1}\lambda _1^k\Vert _2^2+\Vert \omega ^{k+1}-z-\beta ^{-1}\lambda _2^k\Vert ^2_2\right\} , \\ \lambda _1^{k+1}=\lambda _1^k-\beta (x^{k+1}-Mz^{k+1}-a),\\ \lambda _2^{k+1}=\lambda _2^k-\beta (\omega ^{k+1}-z^{k+1}). \end{array}\right. } \end{aligned}$$
(4.6)

These subproblems (namely, the x-, \(\omega\)-, and z-subproblems) can be further delineated as follows:

  • The x-subproblem can be equivalently solved by the proximity operator

    $$\begin{aligned} \begin{array}{rl} x^{k+1}&{}=\underset{x\in {\mathbb {R}}^n}{\hbox {argmin}}\left\{ \sigma _{\mathcal {F}}(x-x^0)+\frac{\beta }{2}\Vert x-r^k\Vert _2^2\right\} \\ \hbox {[By (2.10)]}\;\;\;\;\;&{}=\hbox {prox}_{\frac{1}{\beta }\sigma _{\mathcal {F}}(\cdot -x^0)}(r^k)\\ \hbox {[Proposition 2.6 i)]}\;\;\;\;\;&{}=x^0+\hbox {prox}_{\frac{\sigma _{\mathcal {F}}}{\beta }}(r^k-x^0)\\ \hbox {[Proposition 2.6 iii)]}\;\;\;\;\;&{}=r^k-\frac{1}{\beta }\Pi (\beta (r^k-x^0);{\mathcal {F}}), \end{array} \end{aligned}$$
    (4.7)

    where \(r^k:=Mz^k+a+\lambda _1^k/\beta\). Accordingly, the computational effort of solving (4.7) is dominated by computing the Euclidean projection onto the dynamic set \(\mathcal {F}\). In the upcoming numerical simulations, we shall consider the cases where \(\mathcal {F}\subset {\mathbb {R}}^n\) is the \(\ell ^1\)-, \(\ell ^2\)- or \(\ell ^\infty\)-norm unit ball.

  • The \(\omega\)-subproblem is equivalent to

    $$\begin{aligned} \begin{array}{rcl} \omega ^{k+1}=\underset{\omega \in \Omega }{\hbox {argmin}}\;\Vert \omega -z^k-\beta ^{-1}\lambda _2^k\Vert ^2_2=\Pi (z^k+\beta ^{-1}\lambda _2^k;\Omega ), \end{array} \end{aligned}$$
    (4.8)

    which corresponds to the Euclidean projection onto \(\Omega\). Owing to the Cartesian product structure \(\Omega =\Omega _1\times \Omega _2\times \cdots \times \Omega _m\), the components of \(\omega ^{k+1}=(\omega _1^{k+1},\omega _2^{k+1},\ldots ,\omega _m^{k+1})\) can be derived simultaneously by

    $$\begin{aligned} \begin{array}{rcl} \omega _i^{k+1}&{}=&{}\underset{\omega _i\in \Omega _i}{\hbox {argmin}}\;\Vert \omega _i-z_i^k-\beta ^{-1}(\lambda _2^k)_i\Vert ^2_2\\ &{}=&{}\Pi (z_i^k+\beta ^{-1}(\lambda _2^k)_i;\Omega _i),\;\;i=1,2,\ldots ,m. \end{array} \end{aligned}$$
    (4.9)
  • The z-subproblem amounts to

    $$\begin{aligned} \begin{array}{c} z^{k+1}=\underset{z\in {\mathbb {R}}^l}{\hbox {argmin}}\left\{ \Vert x^{k+1}-Mz-a-\beta ^{-1}\lambda _1^k\Vert _2^2+\Vert \omega ^{k+1}-z-\beta ^{-1}\lambda _2^k\Vert ^2_2\right\} , \end{array} \end{aligned}$$

    which is a positive definite linear system

    $$\begin{aligned} \begin{array}{l} (M^\top M+I)z=M^\top (x^{k+1}-a-\beta ^{-1}\lambda _1^k)+\omega ^{k+1}-\beta ^{-1}\lambda _2^k. \end{array} \end{aligned}$$
    (4.10)

    Numerous algorithms in numerical computation cater for the above linear system, e.g., the preconditioned conjugate gradient (PCG) method.

In summary, the updates of x, \(\omega\) and z when implementing ADMM can be readily conducted. The projections onto the dynamic set \({\mathcal {F}}\) in (4.7) and onto the component sets \(\Omega _i\) in (4.9) can be handled simultaneously. Moreover, a pleasing property of ADMM is that only one user-dependent parameter \(\beta\) needs to be tuned throughout the numerical implementation.
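Putting the pieces together, a compact MATLAB sketch of the recursion (4.6) is given below. Purely for illustration, it assumes that \({\mathcal {F}}\) and all \(\Omega _i\)'s are unit \(\ell ^2\)-balls (so every projection is the ball formula of Example 2.2 ii)); it is a minimal demonstration, not a tuned solver.

```matlab
% A compact sketch of the ADMM recursion (4.6) (all data and names ours):
% F and every Omega_i are taken as unit l2-balls for illustration.
n = 10;  m = 5;  li = n;  l = m*li;  beta = 1;
M = randn(n,l);  a = randn(n,1);  x0 = randn(n,1);
proj_ball = @(y) min(1, 1/norm(y)) * y;
x = zeros(n,1);  w = zeros(l,1);  z = zeros(l,1);
lam1 = zeros(n,1);  lam2 = zeros(l,1);
R = chol(M'*M + eye(l));                   % factorize (4.10) once, reuse below
for k = 1:500
    r = M*z + a + lam1/beta;               % x-subproblem via (4.7)
    x = r - proj_ball(beta*(r - x0))/beta;
    s = z + lam2/beta;                     % w-subproblem via (4.9), blockwise
    for i = 1:m
        idx = (i-1)*li+1 : i*li;
        w(idx) = proj_ball(s(idx));
    end
    rhs = M'*(x - a - lam1/beta) + w - lam2/beta;
    z   = R \ (R' \ rhs);                  % z-subproblem: solve (4.10)
    lam1 = lam1 - beta*(x - M*z - a);      % multiplier updates
    lam2 = lam2 - beta*(w - z);
end
fprintf('distance estimate sigma_F(x - x0) = %.6f\n', norm(x - x0));
```

Note that the Cholesky factor of \(M^\top M+I\) is computed once and reused in every iteration, which is the standard way to amortize the cost of (4.10).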

4.2 PDHG solver

As discussed in Sect. 4.1, ADMM can perform efficiently on (1.4) in lower dimensions, e.g., for collision detection in robotics. However, its drawbacks become increasingly evident as the dimension of (1.4) grows: the computational effort of each ADMM iteration is possibly dominated by solving the linear system (4.10). An alternative is the PDHG method explored in [9, 10, 16, 24, 26], which can be viewed as the application of PPA to the monotone inclusion problem (3.12). Recently, it was shown that PDHG can also be interpreted as the application of DRS to a reformulation of the primal problem [38].

Given \((\omega ^k,\lambda ^k)\in {\mathbb {R}}^l\times {\mathbb {R}}^n\), the recursion of PDHG [24] for solving the primal-dual reformulation (3.11) of (1.4) is specified as follows

$$\begin{aligned} {\left\{ \begin{array}{ll} \tilde{\omega }^k=\underset{\omega \in {\mathbb {R}}^l}{\hbox {argmin}}\left\{ \psi (\omega ,\lambda ^k)+\frac{1}{2\tau _1}\Vert \omega -\omega ^k\Vert _2^2\right\} ,\\ \tilde{\lambda }^k=\underset{\lambda \in {\mathbb {R}}^n}{\hbox {argmax}}\left\{ \psi (2\tilde{\omega }^k-\omega ^k,\lambda )-\frac{1}{2\tau _2}\Vert \lambda -\lambda ^k\Vert _2^2\right\} ,\\ \omega ^{k+1}=\omega ^k+\rho (\tilde{\omega }^k-\omega ^k),\\ \lambda ^{k+1}=\lambda ^k+\rho (\tilde{\lambda }^k-\lambda ^k), \end{array}\right. } \end{aligned}$$
(4.11)

where \(\psi\) is the saddle point function defined in (3.11), \(\tau _1>0\) and \(\tau _2>0\) are parameters satisfying \(\tau _1\tau _2\Vert M^\top M\Vert _2\le 1\), and \(\rho \in (0,2)\) is a relaxation factor. By elaborating the subproblems in (4.11), we obtain the following concrete recursion for solving (1.4) by PDHG

$$\begin{aligned} {\left\{ \begin{array}{ll} \tilde{\omega }_i^k=\Pi (\omega _i^k-\tau _1M_i^\top \lambda ^k;\Omega _i),\;\;i=1,2,\ldots ,m\\ \tilde{\lambda }^k=\Pi (\lambda ^k+\tau _2(M(2\tilde{\omega }^k-\omega ^k)+a-x^0);{\mathcal {F}}),\\ \omega ^{k+1}=\omega ^k+\rho (\tilde{\omega }^k-\omega ^k),\\ \lambda ^{k+1}=\lambda ^k+\rho (\tilde{\lambda }^k-\lambda ^k). \end{array}\right. } \end{aligned}$$
(4.12)

Intuitively, the recursion of PDHG for solving (1.4) is much simpler than that of ADMM. The computational effort of (4.12) is dominated merely by the projections onto the dynamic set \({\mathcal {F}}\) and the component sets \(\Omega _i\). Compared to ADMM, PDHG is fairly favourable for handling large-scale problems, as no linear systems are involved throughout the iterations. The only drawback of PDHG may be the challenge of tuning the parameters \((\tau _1,\tau _2)\) to obtain compelling numerical performance.
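For completeness, a matching MATLAB sketch of the recursion (4.12), under the same illustrative assumptions as the ADMM sketch in Sect. 4.1 (\({\mathcal {F}}\) and all \(\Omega _i\)'s are unit \(\ell ^2\)-balls), is given below; only matrix-vector products and projections appear.

```matlab
% A compact sketch of the PDHG recursion (4.12) (data and names ours):
% F and every Omega_i are unit l2-balls; tau1*tau2*||M'M||_2 <= 1 is enforced.
n = 10;  m = 5;  li = n;  l = m*li;
M = randn(n,l);  a = randn(n,1);  x0 = randn(n,1);
tau1 = 1;  tau2 = 1/(tau1*norm(M'*M, 2));  rho = 1.5;
proj_ball = @(y) min(1, 1/norm(y)) * y;
w = zeros(l,1);  lam = zeros(n,1);
for k = 1:500
    g  = w - tau1*(M'*lam);
    wt = zeros(l,1);
    for i = 1:m                            % blockwise projections onto Omega_i
        idx = (i-1)*li+1 : i*li;
        wt(idx) = proj_ball(g(idx));
    end
    lt  = proj_ball(lam + tau2*(M*(2*wt - w) + a - x0));   % Pi(. ;F)
    w   = w + rho*(wt - w);                % relaxation steps
    lam = lam + rho*(lt - lam);
end
fprintf('distance estimate sigma_F(M*w + a - x0) = %.6f\n', norm(M*w + a - x0));
```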

Remark 4.1

In contrast, Gilbert’s method [19] seeks the optima of a convex quadratic function f on a compact set \({\mathcal {C}}\), i.e., \(\min _{x\in {\mathcal {C}}}f(x)\). The recursion reads

$$\begin{aligned} x^{k+1}=x^k+\alpha _k[s_{\mathcal {C}}(-\nabla f(x^k))-x^k], \end{aligned}$$
(4.13)

where \(\alpha _k\in (0,1]\) is a stepsize computed by an a priori criterion, and \(s_{\mathcal {C}}(y):=\{x\in {\mathcal {C}}\mid x^\top y=\sup _{z\in {\mathcal {C}}} z^\top y\}\) denotes the contact mapping of \({\mathcal {C}}\). The recursion (4.13) amounts to minimizing f on a polyhedral approximation of \({\mathcal {C}}\) by the projected gradient method, or can be deemed a variant of the Frank–Wolfe method (see e.g., [40] for discussion). An improvement of (4.13) for solving problem (1.4) in the 3D case can be found in, e.g., [21]. The worst-case iteration complexity of (4.13) is proven to be \(O(\frac{1}{\epsilon })\). Alternatively, Qin and An [40] deployed Nesterov’s gradient-like method [36, Chapter 2.2] on the dual reformulation of (1.4), which admits a worst-case \(O(\frac{1}{\sqrt{\epsilon }}\ln (\frac{1}{\epsilon }))\) iteration complexity. Both methods are essentially gradient-like methods in either the primal or the dual variable. Numerically, as shown in [40], Gilbert’s method may converge slowly near optima due to the zigzagging phenomenon.
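To convey the flavour of (4.13), the toy sketch below (our own construction) minimizes \(f(x)=\frac{1}{2}\Vert x-x^0\Vert _2^2\) over the unit \(\ell ^1\)-ball, whose contact mapping picks a signed vertex; the open-loop stepsize \(2/(k+2)\) is a standard Frank–Wolfe choice and not the stepsize criterion of [19].

```matlab
% Toy Frank-Wolfe/Gilbert-style recursion (4.13) on the unit l1-ball (names
% and the 2/(k+2) stepsize are ours): minimize f(x) = 0.5*||x - x0||^2.
n = 8;  x0 = randn(n,1);
x = zeros(n,1);
for k = 0:200
    g = x - x0;                           % gradient of f at x
    [~, j] = max(abs(g));                 % contact point of the l1-ball at -g:
    s = zeros(n,1);  s(j) = -sign(g(j));  % signed vertex maximizing <s, -g>
    x = x + (2/(k+2))*(s - x);            % open-loop stepsize alpha_k = 2/(k+2)
end
fprintf('f(x) = %.6f\n', 0.5*norm(x - x0)^2);
```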

ADMM and PDHG are state-of-the-art splitting methods involving both primal and dual variables in their recursions, and admit a worst-case \(O(\frac{1}{\epsilon })\) iteration complexity for generic convex programming. PDHG is typically more popular for solving large-scale practical problems as its recursion involves only matrix-vector operations.

5 Numerical simulations

In this section, we conduct numerical experiments involving the abstract model (1.4) so as to demonstrate the performances of PDHG and ADMM. Essentially, the model (1.4) captures numerous important real-world applications in robotics [29] and mechanics [48]. We test a 2D path planning problem in robotics and some high-dimensional synthetic problems involving (1.4). All codes are written in MATLAB 7.9, and all experiments are conducted on a Lenovo personal computer with an Intel Core(TM) 2.30 GHz CPU and 8 GB of memory.

5.1 Path planning

Path planning plays a fundamental role in robotics and has been extensively studied in real-world applications. It aims to ensure that agents (e.g., manipulators, robots) move freely in a (static or dynamic) workspace without hitting obstacles. We herein consider the scenarios of a manipulator with a single link and with m degree-of-freedom (DoF) links in a static workspace.

Let \(\Omega \subset {\mathbb {R}}^n\) denote an n-dimensional (\(n\in \{2,3\}\) for practical applications in robotics) workspace and let \({\mathcal {O}}_i\subset \Omega\) (\(i=1,2,\ldots ,l\)) be closed sets representing stationary obstacles. Without loss of generality, all the \({\mathcal {O}}_i\)’s are assumed to be convex. A manipulator with m DoF can be described by a collection of rigid bodies \({\mathcal {A}}_i\subset {\mathbb {R}}^n\) (\(i=1,2,\ldots ,m\)). Figure 2 shows diagram representations of some manipulators with various shapes. The links (also called arms) of a manipulator, which are concatenated by the joints (green points in Fig. 2), are typically mimicked by simple sets such as line segments, polytopes, and ellipsoids.

In path planning, a probabilistic roadmap is constructed as an undirected graph [27], and a collision-free path is then found by repetitively checking the distances between obstacles and links. A mathematical model can be formulated by calculating the distance function

$$\begin{aligned} d({\mathcal {O}},{\mathcal {A}}):=\inf \left\{ \sigma _{\mathcal {F}}(x-y)\mid x\in {\mathcal {O}},\; y\in {\mathcal {A}}\right\} , \end{aligned}$$
(5.1)

where \(\sigma _{\mathcal {F}}\) is the support function defined in (1.2), serving as the generalized distance metric. Here \({\mathcal {O}}:=\bigcup _{i=1}^l{\mathcal {O}}_i\) and \({\mathcal {A}}:=\bigcup _{i=1}^m{\mathcal {A}}_i\) denote the collections of stationary obstacles and rigid links, respectively. Unfortunately, problem (5.1) is prohibitively difficult to tackle by numerical optimization methods because the constraint set \({\mathcal {O}}\) or \({\mathcal {A}}\) is typically nonconvex, even if all the component sets \({\mathcal {O}}_i\) or \({\mathcal {A}}_i\) are convex. An equivalent formulation of (5.1) is to compute the distances between a series of component sets, i.e.,

$$\begin{aligned} d({\mathcal {O}},{\mathcal {A}})=\min \{d({\mathcal {O}}_i,{\mathcal {A}}_j)\mid i=1,2,\ldots ,l;\;j=1,2,\ldots ,m\}. \end{aligned}$$
(5.2)

Under the convexity assumption on the \({\mathcal {O}}_i\)’s and \({\mathcal {A}}_i\)’s, the nonconvex optimization problem (5.1) can thus be handled by solving lm convex optimization problems. There is abundant literature on solving (5.2) with the \({\mathcal {O}}_i\)’s and \({\mathcal {A}}_i\)’s being ellipsoids or polytopes. Those special shapes typically result from approximating obstacles or links via principal component analysis and linearization techniques. However, the ability to cope with only ellipsoidal and polytopal obstacles may not be accurate enough. Herein, we further assume that the component sets \({\mathcal {O}}_i\) and \({\mathcal {A}}_i\) can be described as Minkowski sums of convex sets. This assumption is admissible as manipulators typically admit special geometric shapes, e.g., polyhedra and ellipsoids (which can easily be described as the Minkowski sum of a simplex and a ball). In order to formulate a generic (possibly nonconvex) shape as the Minkowski sum of “simple” sets, a possible idea is to first approximate the shape by a union of polyhedra (e.g., [7, 41]) or ellipsoids (e.g., by using principal component analysis), and then derive the Minkowski sum description for those polyhedra and ellipsoids.

Fig. 2

Left: Diagram representations of manipulators with various shapes of links. Right: A link \({\mathcal {A}}(P,\theta )\) with Minkowski sum structure

Figure 3 shows an environment of the 2D workspace \(\Omega =[0,75]^2\) with four stationary obstacles. Manipulators with single- and 4-DoF links are illustrated in Fig. 3a and b, respectively. The base of the 4-DoF manipulator is fixed at the point (25, 20) and all links of the manipulator admit the same shape. Accordingly, each link \({\mathcal {A}}_i\) can be described as a rigid transformation of a stencil body, denoted by \({\mathcal {A}}(P,\theta )\) (see Fig. 2b). The stencil body is hinged at the location of the joint (denoted by P) with the phase of the dominant axis (denoted by \(\theta\)). The data for reproducing the obstacles \({\mathcal {O}}_i\) and the stencil body \({\mathcal {A}}(P,\theta )\) are reported in Table 1. All obstacles and links are represented as Minkowski sums of sets \(T_i(\Omega _i)\), where the sets \(\Omega _i\) (\(i=1,2,3\)) are defined by \(\Omega _1=\{x\in {\mathbb {R}}^2\mid \Vert x\Vert _1\le 1, x_2\ge 0\}\), \(\Omega _2=\{x\in {\mathbb {R}}^2\mid \Vert x\Vert _2\le 1\}\) and \(\Omega _3=\{x\in {\mathbb {R}}^2\mid \Vert x\Vert _{\infty }\le 1\}\). In Table 1, the parameter w for \(M_9\) indicates the length of a link. We take w \(=3\) and w \(=5\) for the single- and 4-DoF manipulators, respectively. The aim of the manipulator in Fig. 4 is to move freely from the initial configuration (red location) to the goal configuration (green location) without hitting obstacles.

Fig. 3
figure 3

Examples of path planning in a 2D workspace with four obstacles. a Single-link manipulator; b 4-DoF manipulator

Fig. 4
figure 4

The motion tracks of manipulators from initial to goal configurations. a Single-link manipulator; b 4-DoF manipulator

Table 1 Data of the obstacles \({\mathcal {O}}_i\)\((i=1,2,3,4)\) and links for path planning problem

In the numerical simulation, we first generate a probabilistic roadmap by the method in [27]; collision-free paths for moving the manipulators from initial to goal configurations are then probed by repeatedly computing the distances \(d({\mathcal {O}}_i,{\mathcal {A}}_j)\) in (5.2) (more precisely, the \(d({\mathcal {O}}_i,{\mathcal {A}}(P,\theta ))\) with the data in Table 1); a sketch of such a collision check is given below. The dynamic set \({\mathcal {F}}\) is taken as \(\mathcal {F}=\{x\in {\mathbb {R}}^2\mid \Vert x\Vert _2\le 1\}\) for this real-world application, which is equivalent to choosing \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _2\). Throughout, we choose the penalty parameter \(\beta \equiv 1\) for ADMM, and \((\tau _1,\tau _2)=(1,1.1/\Vert M^\top M\Vert _2)\) for PDHG. The maximum number of iterations is set to 100. The motion tracks of the manipulators, indicated by the colour changes from initial to goal configurations, are shown in Fig. 4. The computational time to realize this path planning problem is less than 5 s for the single-link manipulator and 10 s for the 4-DoF manipulator on our personal laptop computing system.
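As a toy illustration of the collision check behind the roadmap probing (hypothetical shapes and clearances, not the data of Table 1), the sketch below declares a configuration collision-free when every obstacle/link pair keeps positive clearance in the sense of (5.2); for brevity each body is summarized by a bounding disc, so each pairwise distance takes the closed form \(\Vert c_1-c_2\Vert _2-r_1-r_2\).

```python
# Toy validity test for a roadmap configuration: every obstacle/link pair
# must keep a positive distance, cf. (5.2). Bodies are crudely summarized by
# bounding discs so that pairwise distances are closed-form.
import numpy as np

obstacle_discs = [(np.array([60.0, 60.0]), 6.0),   # (center, radius)
                  (np.array([15.0, 55.0]), 7.0)]

def link_discs(base, angles, w=5.0):
    """Bounding discs of a chain of links of length w hinged at `base`."""
    discs, joint = [], np.asarray(base, dtype=float)
    for th in np.cumsum(angles):                      # absolute joint phases
        tip = joint + w * np.array([np.cos(th), np.sin(th)])
        discs.append(((joint + tip) / 2.0, w / 2.0))  # disc covering the link
        joint = tip
    return discs

def config_is_free(base, angles, clearance=0.5):
    return all(np.linalg.norm(c1 - c2) - r1 - r2 > clearance
               for (c1, r1) in obstacle_discs
               for (c2, r2) in link_discs(base, angles))

print(config_is_free((25.0, 20.0), [0.3, 0.2, -0.1, 0.4]))
```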

5.2 High-dimensional problems

We now consider the problem (1.4)–(1.5) with synthetic data in high dimensions so as to demonstrate the numerical performance of ADMM and PDHG. Numerical comparisons with some benchmark methods from the literature are also conducted, including Gilbert’s method [19] and Qin and An’s method [40]. The datasets of (1.4)–(1.5) are synthesized as follows:

  • All \(M_i\)’s in (1.5) are taken as square matrices, i.e., \(l_i\equiv n\) for all \(i=1,2,\ldots ,m\).

  • The \(M_i\) and \(a_i\) in (1.5) are randomly generated with entries being independent and identically distributed (i.i.d.) random variables. Concretely, the entries of \(M_i\) obey the standard normal distribution, and the entries of \(a_i\) follow the uniform distribution on \([mn,mn+10]\) (a generation sketch is given after this list).

  • The component sets \(\{\Omega _i\}_{i=1}^m\) in (1.4) are randomly taken among some abstract sets, including the unit \(\ell ^p\)-norm balls with \(p\in \{1,2,\infty \}\), the nonnegative orthant, and the second-order cone with \(\alpha =1\) (see Example 2.5(iv) for the definition).
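The following sketch reflects our reading of the synthesis recipe above (it is not the authors’ script): each block \(M_i\) is an n-by-n standard normal matrix, each \(a_i\) has entries uniform on \([mn,mn+10]\), and the blocks are assembled into the affine map \(x=Mw+a\) of (1.5) with \(w=(w_1,\ldots ,w_m)\).

```python
# Synthetic data generation (our reading of the recipe above): square blocks
# M_i with N(0,1) entries and shifts a_i with entries uniform on [mn, mn+10],
# assembled so that Omega_1 (+) ... (+) Omega_m maps through x = M w + a.
import numpy as np

def synthesize(n, m, seed=0):
    rng = np.random.default_rng(seed)
    M_blocks = [rng.standard_normal((n, n)) for _ in range(m)]
    a_blocks = [rng.uniform(m * n, m * n + 10, size=n) for _ in range(m)]
    M = np.hstack(M_blocks)           # n-by-nm: x = sum_i M_i w_i + a
    a = np.sum(a_blocks, axis=0)      # aggregated shift (our convention)
    return M, a

M, a = synthesize(n=100, m=500)
print(M.shape, a.shape)               # (100, 50000) (100,)
```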

We test (1.4)–(1.5) on scenarios of \(\sigma_{\mathcal{F}}(\cdot )=\Vert \cdot \Vert _p\) (\(p\in \{1,2,\infty \}\)) with different numbers of component sets (i.e., m) and various dimensions of variables (i.e., n). Both the ADMM and PDHG methods are implemented with the details elucidated in Sect. 4. As Gilbert’s method [19] and Qin and An’s method [40] are also applicable to solving (1.4) with \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _2\), we test both methods on (1.4)–(1.5) with the Euclidean norm. For the parameters involved in the test methods, we choose \(\alpha =0.5\) for Gilbert’s method; \(\mu =0.001\) for Qin and An’s method; \(\beta =0.8\) and \(\gamma =1.6\) for ADMM; \(\tau _1=1\), \(\tau _2=1.1/\Vert M^\top M\Vert _2\) and \(\rho =1.5\) for PDHG. As for the nm-by-nm linear system (4.10) in the ADMM recursion, we first economize the arithmetic operations for inverting the coefficient matrix \(I+M^\top M\) with the aid of the Sherman-Morrison-Woodbury formula, and then cope with an n-by-n linear system by using the built-in matlab function linsolve (a sketch of this reduction is given after (5.3)). All initial points required by the test methods are taken as zero vectors. The stopping criterion is measured by the primal-dual relative gap, which is defined as

$$\begin{aligned} \hbox {relgap}=\frac{|{\texttt {Pobj}}-{\texttt {Dobj}}|}{1+|{\texttt {Pobj}}|+|{\texttt {Dobj}}|}<\hbox {Tol}, \end{aligned}$$
(5.3)

where Pobj and Dobj represent the primal and dual objective function values of (1.4), respectively. According to the discussion in Sect. 3, the values of Pobj and Dobj can be respectively computed by \(\sigma _{\mathcal {F}}(x-x^0)\) and \(\lambda _1^\top (a-x^0)-\sigma _{\Omega }(\lambda _2)\) when implementing ADMM recursion, whilst the values of Pobj and Dobj can be respectively computed by \(\sigma _{\mathcal {F}}(Mw+a-x^0)\) and \(\lambda ^\top (a-x^0)-\sigma _{\Omega }(-M^\top \lambda )\) when implementing PDHG recursion.
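For concreteness, the following sketch assembles the stopping test (5.3) with the PDHG objective pairing quoted above, where \(\sigma _{\mathcal {F}}=\Vert \cdot \Vert _2\) and \(\sigma _{\Omega }\) is supplied as an oracle; the arrays are placeholders rather than data from the experiments.

```python
# Stopping test (5.3) with the PDHG objective pairing quoted above.
import numpy as np

def relgap(pobj, dobj):
    return abs(pobj - dobj) / (1.0 + abs(pobj) + abs(dobj))

def pdhg_objectives(M, a, x0, w, lam, sigma_omega):
    pobj = np.linalg.norm(M @ w + a - x0)            # sigma_F(Mw + a - x0)
    dobj = lam @ (a - x0) - sigma_omega(-M.T @ lam)  # dual objective value
    return pobj, dobj

# Placeholder data; with Omega the unit l2-ball, sigma_Omega(y) = ||y||_2.
rng = np.random.default_rng(0)
n = 5
M, a, x0 = np.eye(n), rng.uniform(1.0, 2.0, n), np.zeros(n)
w, lam = np.zeros(n), rng.standard_normal(n)

pobj, dobj = pdhg_objectives(M, a, x0, w, lam, np.linalg.norm)
print(relgap(pobj, dobj) < 1e-6)     # stop once below the tolerance Tol
```

The linear-system reduction mentioned before (5.3) can be sketched likewise: with M of size n-by-nm, the Sherman-Morrison-Woodbury identity \((I+M^\top M)^{-1}=I-M^\top (I+MM^\top )^{-1}M\) trades the nm-by-nm solve in (4.10) for an n-by-n one. The snippet below is our Python illustration (the experiments themselves use matlab’s linsolve).

```python
# Solve (I + M^T M) z = r via an n-by-n system only, where M is n-by-nm.
import numpy as np

def solve_identity_plus_gram(M, r):
    n = M.shape[0]
    small = np.eye(n) + M @ M.T                    # n-by-n coefficient matrix
    return r - M.T @ np.linalg.solve(small, M @ r)

rng = np.random.default_rng(0)
n, m = 50, 20
M = rng.standard_normal((n, n * m))
r = rng.standard_normal(n * m)

z = solve_identity_plus_gram(M, r)
z_direct = np.linalg.solve(np.eye(n * m) + M.T @ M, r)  # expensive reference
print(np.allclose(z, z_direct))                          # True
```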

We report in Tables 2, 3 and 4 the number of iterations and the computing time in seconds (“No. Iteration(CPU)”) when the stopping criterion (5.3) reaches the pre-determined tolerances. Note that the numbers of iterations and computing times reported in Tables 2, 3 and 4 are averaged over 50 i.i.d. random datasets. Table 2 shows that: (i) ADMM and PDHG substantially outperform Gilbert’s method and Qin and An’s method; (ii) ADMM is competitive with PDHG when the dimensionality (i.e., n) and the number of component sets (i.e., m) in problem (1.4) are relatively small, whereas the superiority of PDHG becomes more evident as the dimensionality or the number of component sets increases. The numbers of iterations of ADMM and PDHG differ only slightly, whilst the computing times of the two methods are quite disparate. The reason is that the ADMM recursion involves solving generic linear systems, which is usually time-consuming in high-dimensional scenarios, whilst the PDHG recursion merely involves matrix-vector multiplications, whose arithmetic cost is relatively low. Both ADMM and PDHG are also tested for solving problem (1.4) with nonsmooth distance merits, e.g., \(\sigma_{\mathcal{F}}(x)=\Vert x\Vert _1\) and \(\Vert x\Vert _\infty\). The numerical results are reported in Tables 3 and 4, which demonstrate the compelling performance of PDHG. In Fig. 5, we plot the evolutions of the primal-dual relative gap with respect to the number of iterations and the computing time in seconds for some test datasets with various support functions \(\sigma _{\mathcal {F}}\). The figures show that both ADMM and PDHG perform well for solving problem (1.4), and that PDHG is numerically preferable for large-scale instances.

Table 2 Numerical results on solving (1.4)–(1.5) with distance merit \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _2\)
Table 3 Numerical results on solving (1.4)–(1.5) with distance merit \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _1\)
Table 4 Numerical results on solving (1.4)–(1.5) with distance merit \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _\infty\)
Fig. 5
figure 5

Evolutions of primal-dual gap values w.r.t. iterations and computing time for solving (1.4). Top row: the case of \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _2\) with \((n,m)=(100,500)\). Center row: the case of \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _1\) with \((n,m)=(100,500)\). Bottom row: the case of \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _\infty\) with \((n,m)=(500,500)\)

6 Concluding remarks

In this paper, we explore the distance between convex sets with Minkowski sum structure (or equivalently, the projection onto the Minkowski sum of convex sets), which is a fundamental notion in computational geometry. By reformulating the problem as a separable convex optimization problem with linear constraints, we cope with the resulting model by some recently advanced optimization methods. Thanks to these first-order optimization methods, the distance can be efficiently computed in primal, dual, or primal-dual form. Moreover, those methods are also favourable for solving large-scale computational geometry problems with the aid of parallel computing techniques.