1 Introduction

The Minkowski sum of sets (also known as the dilation or sumset) is a long-standing and fundamental notion in computational geometry. It plays an important role in fields such as mathematical morphology in image processing [44], collision detection in robotics [29], configuration space computation in mechanics [48], numerical control machining in computer-aided manufacturing [45], and aggregation theory in economics [33]. Given two sets \({\mathcal {A}}\subset {{\mathbb {R}}}^n\) and \({\mathcal {B}}\subset {{\mathbb {R}}}^n\) (henceforth, sets are denoted by capital calligraphic letters), the Minkowski sum of \({\mathcal {A}}\) and \({\mathcal {B}}\), denoted by \({\mathcal {A}}\oplus {\mathcal {B}}\), is defined as

$$\begin{aligned} {\mathcal {A}}\oplus {\mathcal {B}}:=\left\{ a+b \mid a\in {\mathcal {A}},\;b\in {\mathcal {B}}\right\} . \end{aligned}$$
(1.1)

In particular, if one of the sets consists of a single element, e.g., \({\mathcal {A}}=\{a\}\), we adopt the shorthand \(a\oplus {\mathcal {B}}\) to indicate the translation of the set \({\mathcal {B}}\) by a. It follows immediately from (1.1) that the zero set \(\{0\}\) plays the role of the identity element under the additive operation \(\oplus\). Figure 1 displays diagrams of the Minkowski sum of a ball, a box and a convex polytope in two and three dimensions.
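To make definition (1.1) concrete, the following MATLAB sketch (our own illustration; all variable names are ours) approximates the Minkowski sum of the box \([-1,1]^2\) and the unit disc by summing the box vertices with sampled boundary points of the disc, which yields the familiar "rounded box".

```matlab
% A minimal sketch visualizing definition (1.1) (all names ours): the
% Minkowski sum of the box [-1,1]^2 and the unit disc, approximated by
% summing every box vertex with boundary samples of the disc.
t = linspace(0, 2*pi, 64)';
B = [cos(t), sin(t)];                     % boundary samples of the unit disc
A = [-1 -1; 1 -1; 1 1; -1 1];             % vertices of the box [-1,1]^2
S = [];                                   % pairwise sums a + b, cf. (1.1)
for i = 1:size(A,1)
    S = [S; A(i,1) + B(:,1), A(i,2) + B(:,2)]; %#ok<AGROW>
end
k = convhull(S(:,1), S(:,2));             % both summands convex => sum convex
plot(S(k,1), S(k,2), '-'); axis equal;    % the "rounded box" offset of A
```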

Fig. 1

Diagram representations of the Minkowski sum of \(\Omega _1=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _1\le 1, x_n\ge 0\}\), \(\Omega _2=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le 1\}\) and \(\Omega _3=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _{\infty }\le 1\}\). Top row: the case of \(n=2\). Bottom row: the case of \(n=3\)

Let \({\mathcal {F}}\subset {{\mathbb {R}}}^n\) be a nonempty set. The support function of \({\mathcal {F}}\), denoted by \(\sigma _{\mathcal {F}}:{{\mathbb {R}}}^n\rightarrow (-\infty , +\infty ]\), is defined as

$$\begin{aligned} \sigma _{\mathcal {F}}(x):=\sup \left\{ \langle x,y\rangle \mid y\in {\mathcal {F}} \right\} , \end{aligned}$$
(1.2)

where \(\langle x,y\rangle :=x^\top y\) is the inner product endowed on \({{\mathbb {R}}}^n\). Because of the closedness and sublinearity of \(\sigma _{\mathcal {F}}\) (see Proposition 2.3 in Sect. 2), it induces a norm on \({{\mathbb {R}}}^n\) whenever \({\mathcal {F}}\) satisfies certain prerequisites (e.g., compactness, convexity, and symmetry). Accordingly, the generalized distance from a point \(x^0\in {{\mathbb {R}}}^n\) to a set \({\mathcal {C}}\subset {{\mathbb {R}}}^n\), denoted by \(d_{\mathcal {F}}:{{\mathbb {R}}}^n\rightarrow [0,+\infty ]\), can be quantified via \(\sigma _{\mathcal {F}}\) as follows

$$\begin{aligned} d_{\mathcal {F}}(x^0;{\mathcal {C}}):=\inf \left\{ \sigma _{\mathcal {F}}(x-x^0) \mid x\in {\mathcal {C}}\right\} . \end{aligned}$$
(1.3)

Note that the optima of (1.3) (if they exist) are referred to as the generalized projection of \(x^0\) onto \({\mathcal {C}}\), denoted by \(\Pi _{\mathcal {F}}(x^0;{\mathcal {C}})\). In particular, if \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le 1\}\) is the origin-centered unit ball of the \(\ell ^2\)-norm, then \(\sigma _{\mathcal {F}}\), \(d_{\mathcal {F}}\) and \(\Pi _{\mathcal {F}}\) reduce to the canonical Euclidean norm, Euclidean distance and Euclidean projection (also known as orthogonal projection), respectively. The (generalized) projection onto a set, especially a convex set, is crucial for both theoretical and practical purposes in numerical optimization and computational geometry. For some “simple” sets such as the \(\ell ^2\)-norm ball, hyperplane, second-order cone and spectral set, the Euclidean projections admit closed-form formulae (see e.g., [2, Chapter 28] and the references therein). There is also abundant literature on computing projections onto, e.g., the \(\ell ^1\)-norm ball [12, 49], manifolds [30], the \(\ell ^{1,q}\)-norm ball [47], and the isotone projection cone [35]. In other words, the distance (1.3) can be efficiently computed for the above “simple” sets. However, computing the distance (1.3) may be significantly more demanding if \({\mathcal {F}}\) and \({\mathcal {C}}\) are generic convex sets.
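As a toy illustration of (1.2) (our own example): for the unit \(\ell ^1\)-norm ball, the supremum in (1.2) is attained at a signed vertex, so the support function coincides with the dual norm \(\Vert \cdot \Vert _\infty\) (cf. Example 2.1 i) in Sect. 2).

```matlab
% Toy check of (1.2) for F = unit l1-ball (names ours): since F is the
% convex hull of the signed canonical vectors, the sup in (1.2) is attained
% at a vertex, so sigma_F(x) coincides with the dual norm ||x||_inf.
n = 5;
x = randn(n,1);
V = [eye(n), -eye(n)];                   % the 2n vertices of the l1-ball
sigma_def = max(x' * V);                 % sup of <x,y> over y in F
fprintf('sigma_F(x) = %.6f, ||x||_inf = %.6f\n', sigma_def, norm(x, inf));
```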

In this paper, our particular interest is to compute the generalized distance from a point \(x^0\in {{\mathbb {R}}}^n\) to a convex set with Minkowski sum structure, i.e.,

$$\begin{aligned} d_{\mathcal {F}}(x^0;\mathcal {C})=\inf \left\{ \sigma _{\mathcal {F}}(x-x^0) \mid x\in {\mathcal {C}}:=T_1(\Omega _1)\oplus T_2(\Omega _2)\oplus \cdots \oplus T_m(\Omega _m)\right\} , \end{aligned}$$
(1.4)

where \(\Omega _i\subset {{\mathbb {R}}}^{l_i}\) \((i = 1, 2,\ldots ,m)\) are nonempty closed convex sets with \(\sum _{i=1}^ml_i=l\), and \(T_i(\Omega _i)\subset {{\mathbb {R}}}^n\) denotes the image of \(\Omega _i\) under the affine mapping \(T_i\), i.e.,

$$\begin{aligned} T_i(\Omega _i)=\{M_i\omega +a_i\mid \omega \in \Omega _i\},\;\;i=1,2,\ldots ,m \end{aligned}$$
(1.5)

with \(M_i\in {{\mathbb {R}}}^{n\times l_i}\) and \(a_i\in {{\mathbb {R}}}^n\). For brevity, \(-\Omega _i\) denotes the opposite set of \(\Omega _i\), i.e., the case of \(M_i=-I_n\) and \(a_i={\mathbf {0}}\) in (1.5). Due to the closedness and convexity of the \(\Omega _i\)’s, it follows from Lemma 2.2 (see Sect. 2) that the sets \(T_i(\Omega _i)\) in (1.5) and \(\mathcal {C}\) in (1.4) are closed and convex. Problem (1.4) has plenty of applications in robotics. For example, to guarantee that a robot operates freely in a workspace with obstacles, collision detection is conducted to check whether the robot and the obstacles overlap [17]. Concretely, let \(\Omega _i\subset {{\mathbb {R}}}^n\) (\(i=1,2\)) be convex sets. Collision detection, which is measured by the distance between \(\Omega _1\) and \(\Omega _2\), can be formalized as (see e.g., [8, 51] for details)

$$\begin{aligned} d_{\mathcal {F}}(0;{\mathcal {C}})=\inf \left\{ \sigma _{\mathcal {F}}(x) \mid x\in {\mathcal {C}}:=\Omega _1\oplus (-\Omega _2)\right\} , \end{aligned}$$
(1.6)

which is a special case of (1.4) with the specifications \(m=2\), \(x^0=0\), \(M_1=-M_2=I_n\) and \(a_1=a_2=0\).

However, (1.4) is nontrivial to tackle, even if all the \(T_i\)’s are identity mappings. Taking the sets in Fig. 1 as examples: although each \(\Omega _i\) is simple, in the sense that the projection onto the individual set admits a closed-form solution or can be computed to high precision, the projection onto their Minkowski sum (i.e., \(\Omega _1\oplus \Omega _2\oplus \Omega _3\) in Fig. 1) is challenging to acquire. Typically, the projection onto a set with Minkowski sum structure does not possess an explicit formula and must be approximated numerically. A large number of methods have been devised over the past decades for solving (1.6) (a special case of (1.4)) with compact convex sets (especially polytopes). The existing methods for solving (1.6) can be roughly classified into two categories: optimization-based methods [1, 13, 32, 34, 43, 50, 52] and geometry-based methods [8, 19–21, 31, 39, 51]. Empirically, the optimization-based methods can adapt to (1.6) with generic convex sets in high dimensions but may be computationally intensive, whilst the geometry-based ones usually perform efficiently but are confined to (1.6) with low-dimensional structured sets (e.g., 2D/3D polytopes).

Gilbert et al. [21] proposed a recursion (known as the GJK method) for solving (1.6) by approximating \({\mathcal {C}}\) with simplices. It can also be extended to solve (1.4) with \(n=3\) and \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _2\). Improvements of the GJK method were explored in, e.g., [8, 20, 39]. Recently, Qin and An [40] proposed a fast recursion for solving the dual of (1.4) by using Nesterov’s gradient-like method [36] and the smoothing technique [37]. Our paper is devoted to tackling (1.4) with generic component sets by reformulating it as a separable convex optimization problem and solving it by recent state-of-the-art first-order optimization methods with O(1/k) convergence rate. Moreover, the proposed methods favour parallel computing systems [4, 6] when the number of component sets \(\Omega _i\) in (1.4) is large.

The rest of the paper is organized as follows. In Sect. 2, some basic definitions and propositions are stated for the sequel discussion. In Sect. 3, reformulations of (1.4) in primal, dual, and primal-dual fashions are presented, followed by the implementation details on handling (1.4) in Sect. 4. In Sect. 5, numerical simulations are conducted to illustrate the performances of the alternating direction method of multipliers (ADMM) and the primal-dual hybrid gradient method (PDHG). Finally, some concluding remarks are drawn in Sect. 6.

2 Preliminaries

In this section, we summarize some basic concepts and their properties that will be useful for the sequel discussions.

For any vector \(x=(x_1,x_2,\ldots ,x_n)\in {{\mathbb {R}}}^n\), let \(\Vert x\Vert _p:=(\sum _{i=1}^n|x_i|^p)^{1/p}\) (\(1\le p<\infty\)) denote the \(\ell ^p\)-norm and \(\Vert x\Vert _\infty :=\max _{i=1,\ldots , n}|x_i|\) the \(\ell ^\infty\)-norm. Given a symmetric matrix \(Q\in {{\mathbb {R}}}^{n\times n}\), we write \(Q\succ 0\) (resp. \(Q\succeq 0\)) if Q is positive definite (resp. positive semidefinite). For any \(Q\succ 0\), we denote by \(\Vert x\Vert _Q:=(x^\top Qx)^{1/2}\) the Q-norm of x. We use \(\hbox {diag}(x)\) to represent a diagonal matrix whose diagonal elements are the \(x_i\)’s. Let \({\mathbf {1}}\) and I denote the all-one vector and the identity matrix (whose dimensions will be clear from the context), respectively.

We now review some basic definitions and properties from convex analysis (see e.g., the monograph [2]).

Definition 2.1

Let \({\mathcal {C}}\subset {{\mathbb {R}}}^n\) be a set. Then

  1. i)

    \({\mathcal {C}}\) is closed if the limit of any convergent sequence in \({\mathcal {C}}\) belongs to \({\mathcal {C}}\).

  2. ii)

    \({\mathcal {C}}\) is compact if any sequence in \({\mathcal {C}}\) has a cluster point in \({\mathcal {C}}\) (or equivalently, \(\mathcal {C}\) is closed and bounded).

  3. iii)

    The convex hull of \({\mathcal {C}}\), denoted by \(\mathrm{conv}({\mathcal {C}})\), is the intersection of all convex sets containing \({\mathcal {C}}\) (or equivalently, the smallest convex set containing \(\mathcal {C}\)). In particular, \(\mathrm{conv}({\mathcal {C}})\) is a polytope if \({\mathcal {C}}\) comprises a finite number of elements.

  4. iv)

    The decomposition of \({\mathcal {C}}\), denoted by \({\mathfrak {D}}({\mathcal {C}})\), is defined as

$$\begin{aligned} \begin{array}{c} {\mathfrak {D}}({\mathcal {C}}):=\left\{ {\mathcal {C}}_i\subset {{\mathbb {R}}}^n\mid \bigcup _{i=1}^m{\mathcal {C}}_i={\mathcal {C}},\; \hbox {and}\; \; {\mathcal {C}}_i\bigcap \mathcal {C}_j=\emptyset \;\hbox {for any}\; i\ne j\right\} . \end{array} \end{aligned}$$

Furthermore, the convex decomposition of \({\mathcal {C}}\), denoted by \(\widehat{\mathfrak {D}}(\mathcal {C})\), is a decomposition of \({\mathcal {C}}\) with only convex component sets, i.e.,

$$\begin{aligned} \widehat{\mathfrak {D}}({\mathcal {C}}):=\{{\mathcal {C}}_i\subset {{\mathbb {R}}}^n\mid {\mathcal {C}}_i\in {\mathfrak {D}}({\mathcal {C}}),\;\hbox {and all}\;\;{\mathcal {C}}_i\hbox {'s} \;\hbox {are convex sets}\}. \end{aligned}$$

Besides commutativity and associativity, the Minkowski sum operation “\(\oplus\)” enjoys the following important properties.

Lemma 2.2

Let \({\mathcal {C}}_i\subset {{\mathbb {R}}}^{l_i}\) \((i=1,2)\) be nonempty sets (possibly nonconvex), with \(l_1=l_2\) in assertion i).

  1. i)

    \(\mathrm{conv}({\mathcal {C}}_1\oplus {\mathcal {C}}_2)=\mathrm{conv}({\mathcal {C}}_1)\oplus \mathrm{conv}({\mathcal {C}}_2)\).

  2. ii)

    If \({\mathcal {C}}_i\) \((i=1,2)\) are convex (resp. compact) sets and \(T_i:{{\mathbb {R}}}^{l_i}\rightarrow {{\mathbb {R}}}^n\) (\(i=1,2\)) are affine mappings, then \(T_1({\mathcal {C}}_1)\oplus T_2({\mathcal {C}}_2)\) is a convex (resp. compact) set.

More generally, the above lemma remains true for a finite number of sets and affine mappings.

Let \(f:{{\mathbb {R}}}^n\rightarrow (-\infty ,+\infty ]\). The domain and epigraph of f are defined by \(\hbox {dom}(f):=\{x\in {{\mathbb {R}}}^n \mid f(x)<+\infty \}\) and \(\hbox {epi}(f):=\{(x,y)\in {{\mathbb {R}}}^n\times {{\mathbb {R}}}\mid f(x)\le y\}\), respectively. If \(\hbox {dom}(f)\) is nonempty, then f is said to be proper; if \(\hbox {epi}(f)\) is closed, then f is said to be closed. The indicator function of a set \({\mathcal {C}}\subset {{\mathbb {R}}}^n\), denoted by \(\iota _{\mathcal {C}}\), is defined as

$$\begin{aligned} \iota _{\mathcal {C}}(x):= {\left\{ \begin{array}{ll} 0, &\quad{} \hbox {if}\;\;x\in {\mathcal {C}},\\ +\infty , &\quad{}\hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(2.1)

The following properties and examples of support function \(\sigma _{\mathcal {F}}\) can be referred to, e.g., [2].

Proposition 2.3

Let \({\mathcal {F}}\subset {{\mathbb {R}}}^n\) be a nonempty set. Then,

  1. i)

    \(\sigma _{\mathcal {F}}\) is closed and convex.

  2. ii)

    \(\sigma _{\mathcal {F}}\) is finite everywhere if and only if \({\mathcal {F}}\) is a bounded set.

  3. iii)

    \(\sigma _{\mathcal {F}}\) is Lipschitz continuous with \(\sup _{x\in {\mathcal {F}}}\Vert x\Vert\) as Lipschitz constant if \({\mathcal {F}}\) is a compact convex set.

  4. iv)

    \(\sigma _{\mathcal {F}}\) is sublinear, i.e., \(\sigma _{\mathcal {F}}(\alpha _1x_1+\alpha _2x_2)\le \alpha _1\sigma _{\mathcal {F}}(x_1)+\alpha _2\sigma _{\mathcal {F}}(x_2)\) for any scalars \(\alpha _i\ge 0\) and vectors \(x_i\in {{\mathbb {R}}}^n\) (\(i=1,2\)).

  5. v)

    \(\sigma _{\mathcal {F}}\) is a norm on \({{\mathbb {R}}}^n\) if \({\mathcal {F}}\) is compact, convex, symmetric (i.e., \({\mathcal {F}}=-{\mathcal {F}}\)) and contains the origin as an interior point.

Example 2.1

Let \({\mathcal {F}}\subset {{\mathbb {R}}}^n\) be a nonempty set.

  1. i)

    If \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _\dagger \le 1\}\) is a unit ball associated with a norm \(\Vert \cdot \Vert _\dagger\), then the support function \(\sigma _{{\mathcal {F}}}(x)=\Vert x\Vert _{\ddagger }\), where \(\Vert \cdot \Vert _{\ddagger }\) is the dual norm of \(\Vert \cdot \Vert _\dagger\). For instance, if \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _Q\le 1\}\) with \(Q\succ 0\), then \(\sigma _{{\mathcal {F}}}(x)=\Vert x\Vert _{Q^{-1}}\); if \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _p\le 1\}\) with \(p\in [1,\infty ]\), then \(\sigma _{{\mathcal {F}}}(x)=\Vert x\Vert _{\frac{p}{p-1}}\). Herein, \(\frac{p}{p-1}\) takes the value \(\infty\) (resp. 1) when \(p=1\) (resp. \(p=\infty\)).

  2. ii)

    If \({\mathcal {F}}\) is a closed convex cone, then the support function \(\sigma _{{\mathcal {F}}}(x)=\iota _{{\mathcal {F}}^\circ }(x)\), where \({\mathcal {F}}^\circ :=\{x\in {{\mathbb {R}}}^n\mid x^\top y\le 0, \;\forall y\in {\mathcal {F}}\}\) denotes the polar cone of \({\mathcal {F}}\).

  3. iii)

    If \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid Ax=b\}\) is an affine set with \(A\in {{\mathbb {R}}}^{m\times n}\) and \(b\in {{\mathbb {R}}}^m\), then the support function

$$\begin{aligned} \sigma _{{\mathcal {F}}}(x)={\left\{ \begin{array}{ll}b^\top z, &\quad{} \hbox {if}\;\; x=A^\top z,\\ +\infty , &\quad{} \hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(2.2)

As aforementioned, the generalized projection \(\Pi _{\mathcal {F}}(\cdot ;{\mathcal {C}})\) in (1.3) reduces to the canonical Euclidean projection when \({\mathcal {F}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le 1\}\). For notational brevity, we use \(\Pi (\cdot ;{\mathcal {C}})\) for the Euclidean projection operator. The following are some examples of Euclidean projections possessing closed-form formulae. The interested reader is referred to, e.g., [2, Chapter 28], for more examples.

Example 2.2

Let \({\mathcal {C}}\subset {{\mathbb {R}}}^n\) be a nonempty closed convex set.

  1. i)

    If \({\mathcal {C}}=\{x\in {{\mathbb {R}}}^n\mid a\le x\le b\}\) is a box with \(a\in {{\mathbb {R}}}^n\) and \(b\in {{\mathbb {R}}}^n\), then \(\left[ \Pi (x;{\mathcal {C}})\right] _i=\mathrm {median}\{a_i,x_i,b_i\}\) for \(i=1,2,\ldots ,n\).

  2. ii)

    If \({\mathcal {C}}=\left\{ x\in {{\mathbb {R}}}^n \mid \Vert x\Vert _2\le \alpha \right\}\) is a ball with \(\alpha >0\), then \(\Pi (x;{\mathcal {C}})=\min \left( 1,\frac{\alpha }{\Vert x\Vert _2}\right) x\).

  3. iii)

    If \({\mathcal {C}}=\{x\in {{\mathbb {R}}}^n\mid Ax=b\}\) is an affine set with \(A\in {{\mathbb {R}}}^{m\times n}\) and \(b\in {{\mathbb {R}}}^m\), then \(\Pi (x;{\mathcal {C}})=x-A^\dagger (Ax-b)\), where \(A^\dagger\) is the Moore–Penrose pseudo-inverse of A. In particular, if \({\mathcal {C}}=\{x\in {{\mathbb {R}}}^n\mid a^\top x=b\}\) is a hyperplane with \(a\ne {0}\in {{\mathbb {R}}}^n\), then \(\Pi (x;{\mathcal {C}})=x-\frac{a^\top x-b}{\Vert a\Vert _2^2}a\).

  4. iv)

    If \({\mathcal {C}}=\{(x,t)\in {{\mathbb {R}}}^n\times {{\mathbb {R}}}\mid \Vert x\Vert _2\le \alpha t\}\) is a second-order cone with \(\alpha >0\), then

    $$\begin{aligned} \Pi ((x,t);{\mathcal {C}})=\left\{ \begin{array}{ll}(x,t),&\quad{} \hbox {if}\;\;\Vert x\Vert _2\le \alpha t;\\ ({0},0),&\quad{}\hbox {if}\;\;\Vert x\Vert _2\le -t/\alpha ;\\ \frac{\alpha \Vert x\Vert _2+t}{(\alpha ^2+1)\Vert x\Vert _2}(\alpha x,\Vert x\Vert _2),&\quad{}\hbox {otherwise}. \end{array}\right. \end{aligned}$$
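For reference, the four projections of Example 2.2 can be coded in a few lines of MATLAB. The sketch below uses our own function names and assumes a MATLAB version that allows local functions in scripts.

```matlab
% Minimal sketches of the closed-form projections in Example 2.2 (names ours).
x = randn(4,1);
disp(min(max(x, -1), 1));                 % i)  box [-1,1]^4: median{a_i,x_i,b_i}
disp(min(1, 2/norm(x)) * x);              % ii) l2-ball of radius alpha = 2
A = randn(2,4); b = randn(2,1);
disp(x - pinv(A)*(A*x - b));              % iii) affine set {x : Ax = b}
disp(proj_soc(x, 0.5, 1));                % iv) second-order cone, alpha = 1

function p = proj_soc(x, t, alpha)        % projection onto {(x,t): ||x||_2 <= alpha*t}
nx = norm(x);
if nx <= alpha*t
    p = [x; t];
elseif nx <= -t/alpha
    p = zeros(numel(x)+1, 1);
else
    c = (alpha*nx + t)/((alpha^2 + 1)*nx);
    p = [c*alpha*x; c*nx];
end
end
```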

The following are some propositions on the Euclidean projection onto convex sets with Minkowski sum structure.

Proposition 2.4

Let \(\alpha >0\) be a scalar, \({\mathcal {A}}\) and \({\mathcal {C}}\) be nonempty closed convex sets in \({{\mathbb {R}}}^n\), and x, \(x^0\) be vectors in \({{\mathbb {R}}}^n\).

  1. i)

    If \({\mathcal {A}}=x^0\oplus {\mathcal {C}}\), then \(\Pi (x;{\mathcal {A}})=x^0+\Pi (x-x^0;{\mathcal {C}})\).

  2. ii)

    If \({\mathcal {A}}\perp {\mathcal {C}}\) (i.e., \(a^\top c=0\) for any \(a\in {\mathcal {A}}\) and \(c\in {\mathcal {C}}\)), then \(\Pi (x;{\mathcal {A}}\oplus {\mathcal {C}})=\Pi (x;{\mathcal {A}})+\Pi (x;{\mathcal {C}})\).

  3. iii)

    If \({\mathcal {A}}={\mathcal {B}}\oplus {\mathcal {C}}\) with \({\mathcal {B}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le \alpha \}\), then \(\Pi (x;{\mathcal {A}})=\Pi (x;{\mathcal {C}})+\min \left( 1,\frac{\alpha }{\Vert x-\Pi (x;{\mathcal {C}})\Vert _2}\right) (x-\Pi (x;{\mathcal {C}}))\).

Proof

The proofs of i) and ii) can be found in [2, Propositions 28.1 and 28.6], respectively. Assertion iii) can be proved as follows.

By combining the definition of Euclidean projection with the Minkowski sum structure of \({\mathcal {A}}\), we have \(\Pi (x;{\mathcal {A}})=\hbox {arg}\min \limits _{y\in {\mathcal {A}}}\Vert y-x\Vert _2=w_1^*+w_2^*\), where \((w_1^*,w_2^*)\) is an optimum of

$$\begin{aligned} \min \limits _{w_1\in \mathcal {B},w_2\in \mathcal {C}}\;\Vert w_1+w_2-x\Vert _2^2. \end{aligned}$$

By invoking the first-order optimality conditions, we have

$$\begin{aligned} w_1^*&= \Pi \left( w_1^*-\beta _1(w_1^*+w_2^*-x);{\mathcal {B}}\right) ,\;\;\forall \;\beta _1>0, \end{aligned}$$
(2.3)
$$\begin{aligned} w_2^*&= \Pi \left( w_2^*-\beta _2(w_1^*+w_2^*-x);{\mathcal {C}}\right) ,\;\;\forall \;\beta _2>0. \end{aligned}$$
(2.4)

By setting \(\beta _1=1\) in (2.3) (without loss of generality), it follows from Example 2.2 ii) that

$$\begin{aligned} w_1^*=\Pi \left( x-w_2^*;{\mathcal {B}}\right) ={\left\{ \begin{array}{ll} \frac{\alpha (x-w_2^*)}{\Vert x-w_2^*\Vert _2},&{}\;\;\hbox {if}\;\;\Vert x-w_2^*\Vert _2>\alpha ,\\ x-w_2^*,&{}\;\;\hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(2.5)

In the case of \(\Vert x-w_2^*\Vert _2>\alpha\), by substituting \(w_1^*=\frac{\alpha (x-w_2^*)}{\Vert x-w_2^*\Vert _2}\) into (2.4) and setting \(\beta _2=(1-\frac{\alpha }{\Vert x-w_2^*\Vert _2})^{-1}\), we obtain \(w_2^*=\Pi (x;{\mathcal {C}})\), and thus \(w_1^*=\frac{\alpha (x-\Pi (x;{\mathcal {C}}))}{\Vert x-\Pi (x;{\mathcal {C}})\Vert _2}\). Finally, we get

$$\begin{aligned} w_1^*+w_2^*={\left\{ \begin{array}{ll} \Pi (x;{\mathcal {C}})+\frac{\alpha (x-\Pi (x;{\mathcal {C}}))}{\Vert x-\Pi (x;{\mathcal {C}})\Vert _2},&{}\;\;\hbox {if}\;\;\Vert x-\Pi (x;{\mathcal {C}})\Vert _2>\alpha ,\\ x,&{}\;\;\hbox {otherwise}, \end{array}\right. } \end{aligned}$$
(2.6)

which completes the proof. \(\square\)

The set \({\mathcal {A}}\) in Proposition 2.4 iii) is widely known as the offsetting of \({\mathcal {C}}\) by a radius \(\alpha\). As a special case of Proposition 2.4 i), if \({\mathcal {C}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x\Vert _2\le \alpha \}\), then \({\mathcal {A}}\) is an \(x^0\)-centred \(\ell ^2\)-norm ball with radius \(\alpha\). Accordingly, the following corollary holds.

Corollary 2.5

If \({\mathcal {A}}=\{x\in {{\mathbb {R}}}^n\mid \Vert x-x^0\Vert _2\le \alpha \}\) with \(\alpha >0\), then \(\Pi (x;{\mathcal {A}})=x^0+\min \left( 1,\frac{\alpha }{\Vert x-x^0\Vert _2}\right) (x-x^0)\).
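Proposition 2.4 iii) is easy to validate numerically. The following crude sketch (our own construction; it assumes a recent MATLAB with implicit expansion and vecnorm, and the ball sampling is dense rather than uniform) compares the closed-form projection onto an offset box with a random search.

```matlab
% Quick check of Proposition 2.4 iii) (names ours): project x onto the
% offset A = B + C of the box C = [0,1]^2 by the alpha-ball B, and compare
% the closed-form answer with a crude random search over points of B + C.
alpha = 0.5;  x = [2.3; -1.7];
PiC = min(max(x, 0), 1);                  % Pi(x;C), box formula of Example 2.2 i)
d   = x - PiC;
PiA = PiC + min(1, alpha/norm(d))*d;      % formula of Proposition 2.4 iii)
w2  = rand(2, 1e5);                       % random points of the box C
w1  = randn(2, 1e5);
w1  = (alpha*rand(1, 1e5)) .* w1 ./ vecnorm(w1);  % dense (non-uniform) points of B
P   = w1 + w2;                            % points of A = B + C
[~, k] = min(vecnorm(P - x));
fprintf('closed form: (%.3f, %.3f); sampled: (%.3f, %.3f)\n', PiA, P(:,k));
```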

We now present some preliminaries involving functions and operators, which will be useful in the sequel discussion.

The subdifferential of f, denoted by \(\partial f: {{\mathbb {R}}}^n\rightarrow 2^{{{\mathbb {R}}}^n}\), is defined as

$$\begin{aligned} \begin{array}{c} \partial f(x):=\left\{ \xi \in {{\mathbb {R}}}^n \mid f(y)\ge f(x)+\xi ^\top (y-x), \;\;\forall y\in {{\mathbb {R}}}^n\right\} ,\;\;\forall x\in \hbox {dom}(f). \end{array} \end{aligned}$$
(2.7)

If \(f:{{\mathbb {R}}}^n\rightarrow {{\mathbb {R}}}\) is a convex function that is finite everywhere, then the subdifferential \(\partial f(x)\) is a nonempty compact convex set for every \(x\in {{\mathbb {R}}}^n\) (see e.g., [2] for details). We state below some examples of subdifferentials.

Example 2.3

Let \({\mathcal {C}}\subset {{\mathbb {R}}}^n\) be a nonempty closed convex set. Then,

$$\begin{aligned} \partial \iota _{\mathcal {C}}(x)&={\left\{ \begin{array}{ll} N_{{\mathcal {C}}}(x):=\{\xi \in {{\mathbb {R}}}^n\mid \xi ^\top (y-x)\le 0,\;\forall y\in {\mathcal {C}}\},&{}\;\;\hbox {if}\;\;x\in {\mathcal {C}}, \\ \emptyset , &{}\;\;\hbox {otherwise}, \end{array}\right. }\\ \partial \sigma _{\mathcal {C}}(x)&={\left\{ \begin{array}{ll} {\mathcal {C}},&{}\;\;\hbox {if}\;\;x=0, \\ F_{\mathcal {C}}(x):=\{\xi \in {\mathcal {C}}\mid \xi ^\top x=\sigma _{\mathcal {C}}(x)\}, &{}\;\;\hbox {otherwise}, \end{array}\right. } \end{aligned}$$

where \(N_{\mathcal {C}}\) and \(F_{\mathcal {C}}\) are the so-called normal cone and exposed face of \({\mathcal {C}}\).

Example 2.4

Let \(\Vert \cdot \Vert _{\dagger }\) be a norm of \({{\mathbb {R}}}^n\) and \(\Vert \cdot \Vert _{\ddagger }\) be its dual norm. Then

$$\begin{aligned} \partial \Vert x\Vert _{\dagger }={\left\{ \begin{array}{ll} \{\xi \in {{\mathbb {R}}}^n\mid \Vert \xi \Vert _{\ddagger }\le 1\}, &{}\;\hbox {if}\;\;x={0},\\ \{\xi \in {{\mathbb {R}}}^n\mid \Vert \xi \Vert _{\ddagger }\le 1,\;\xi ^\top x=\Vert x\Vert _{\dagger }\},&{}\;\hbox {otherwise}. \end{array}\right. } \end{aligned}$$
(2.8)

In particular, the subdifferential of the \(\ell ^p\)-norm (\(p\in [1,+\infty ]\)) at the origin is the unit ball associated with its dual norm, i.e., \(\partial \Vert 0\Vert _p=\{\xi \in {{\mathbb {R}}}^n\mid \Vert \xi \Vert _{\frac{p}{p-1}}\le 1\}\), whilst the subdifferential of the \(\ell ^p\)-norm at \(x\ne {0}\) can be stated as

$$\begin{aligned}&\partial \Vert x\Vert _1=\Big \{\xi \in {{\mathbb {R}}}^n\mid \xi _i=\frac{x_i}{|x_i|}\;\;\hbox {for}\;\; x_i\ne 0,\;\;\hbox {and}\;\;\xi _i\in [-1,1]\;\;\hbox {for}\; x_i=0\Big \},\\&\partial \Vert x\Vert _p=\Big \{\xi \in {{\mathbb {R}}}^n\mid \xi _i=\frac{x_i|x_i|^{p-2}}{\Vert x\Vert _p^{p-1}}\;\;\hbox {for}\;\; x_i\ne 0,\;\;\hbox {and}\;\;\xi _i=0\;\;\hbox {for}\; x_i=0\Big \},\\&\partial \Vert x\Vert _\infty ={\mathrm conv}\Big (\frac{x_ie_i}{|x_i|}\mid i\in {\mathcal {I}}(x)\Big ), \end{aligned}$$

where \({\mathcal {I}}(x):=\{i\mid |x_i|=\Vert x\Vert _\infty \}\) and \(e_i\in {{\mathbb {R}}}^n\) denotes the ith canonical basis vector (i.e., the vector whose only nonzero element is 1 in the ith coordinate).

The conjugate of f, denoted by \(f^*\), is defined as

$$\begin{aligned} \begin{array}{c} f^*(\lambda ):=\sup \big \{x^\top \lambda -f(x)\mid x\in \hbox {dom}(f)\big \}, \;\; \forall \lambda \in {{\mathbb {R}}}^n. \end{array} \end{aligned}$$
(2.9)

Note that \(f^*\) is always a closed convex function (even if f is neither closed nor convex). Furthermore, if f is a closed convex proper function, then \(f^{**}=f\). It follows from (2.9) that the indicator function \(\iota _{\mathcal {C}}\) and the support function \(\sigma _{\mathcal {C}}\) are conjugate to each other. Let \(f:{{\mathbb {R}}}^n\rightarrow (-\infty ,+\infty ]\) be a closed convex proper function. The proximity operator of f, denoted by \(\hbox {prox}_f\), is defined as

$$\begin{aligned} \begin{array}{c} \hbox {prox}_f(x):=\underset{y\in {{\mathbb {R}}}^n}{\hbox {argmin}}\left\{ f(y)+\frac{1}{2}\Vert y-x\Vert ^2_2\right\} , \;\; \forall x\in {{\mathbb {R}}}^n. \end{array} \end{aligned}$$
(2.10)

The following are some properties of the proximity operator which will be useful for the sequel analysis. The interested reader is referred to [11, Table 10.2] for more instances and properties of proximity operators in Hilbert spaces.

Proposition 2.6

Let \(\alpha >0\) and \(\gamma \in {{\mathbb {R}}}\) be scalars, \(b\in {{\mathbb {R}}}^n\) be a vector, \(A\in {{\mathbb {R}}}^{n\times m}\) be a matrix, \({\mathcal {C}}\subset {{\mathbb {R}}}^n\) be a nonempty closed convex set, and \(\varphi :{{\mathbb {R}}}^n\rightarrow (-\infty ,+\infty ]\) be a closed convex proper function.

  1. i)

    If \(f(x)=\varphi (Ax+b)\) with A satisfying \(AA^\top =\alpha I_n\), then

    $$\begin{aligned} \hbox {prox}_f(x)=x-\alpha ^{-1}A^\top \left( Ax+b-\hbox {prox}_{\alpha \varphi }(Ax+b)\right) . \end{aligned}$$
    (2.11)
  2. ii)

    If \(f(x)=\varphi (x)+\frac{\alpha }{2}\Vert x\Vert _2^2+b^\top x+\gamma\), then \(\hbox {prox}_f(x)=\hbox {prox}_{\frac{\varphi }{\alpha +1}}\left( \frac{x-b}{\alpha +1}\right)\).

  3. iii)

    Recalling that \(\varphi ^*\) is the conjugate of \(\varphi\), we have \(x=\hbox {prox}_{\alpha \varphi }(x)+\alpha \hbox {prox}_{\frac{\varphi ^*}{\alpha }}(\frac{x}{\alpha })\) for any \(x\in {{\mathbb {R}}}^n\).

  4. iv)

    If \(f(x)=\iota _{{\mathcal {C}}}(x)\) is an indicator function, then \(\hbox {prox}_f(x)=\Pi (x;{\mathcal {C}})\); if \(f(x)=\sigma _{{\mathcal {C}}}(x)\) is a support function, then \(\hbox {prox}_f(x)=x-\Pi (x;{\mathcal {C}})\).

Note that Proposition 2.6 iii) is the celebrated Moreau identity, which relates the proximity operators of \(\varphi\) and its conjugate.
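As a quick sanity check of Proposition 2.6 iii) and iv) (our own toy example): taking \(\varphi =\sigma _{\mathcal {F}}=\Vert \cdot \Vert _2\) with \({\mathcal {F}}\) the unit \(\ell ^2\)-ball, \(\hbox {prox}_\varphi\) is block soft-thresholding, \(\hbox {prox}_{\varphi ^*}\) is the projection onto \({\mathcal {F}}\), and the two must sum to the identity.

```matlab
% Toy verification of Moreau's identity with alpha = 1 (names ours):
% phi = sigma_F = ||.||_2 (F the unit l2-ball), phi* = indicator of F.
x      = randn(5,1);
PiF    = min(1, 1/norm(x)) * x;           % prox of the indicator = Pi(x;F)
prox_s = max(1 - 1/norm(x), 0) * x;       % prox of ||.||_2 (block soft-threshold)
fprintf('Moreau residual: %.2e\n', norm(x - (prox_s + PiF)));         % iii)
fprintf('Prop. 2.6 iv) residual: %.2e\n', norm(prox_s - (x - PiF)));  % iv)
```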

3 Reformulations of (1.4) into various forms

In this section, we first reformulate (1.4) into various forms, and then deduce the equivalent monotone inclusion problems by employing the first-order optimality conditions. It will be demonstrated that problem (1.4) can be tractably solved by a large family of splitting methods.

We start the discussion by restating (1.4) as follows

$$\begin{aligned} \min&\;\;\;\sigma _{\mathcal {F}}(x-x^0) \end{aligned}$$
(3.1a)
$$\begin{aligned} \hbox {subject to}&\;\;\; x\in T_1(\Omega _1)\oplus T_2(\Omega _2)\oplus \cdots \oplus T_m(\Omega _m). \end{aligned}$$
(3.1b)

The existence of optima of (3.1) follows immediately from Weierstrass’ theorem (see e.g., [2, Chapter 11]).

Proposition 3.1

Problem (3.1) admits at least one optimum if one of the following conditions is satisfied: (i) \(T_i(\Omega _i)\) (\(i=1,2,\ldots ,m\)) are compact sets; (ii) \(0\in \mathrm{int}({\mathcal {F}})\).

Proof

  1. i)

    By the compactness of the \(T_i(\Omega _i)\)’s and Lemma 2.2 ii), the Minkowski sum of the \(T_i(\Omega _i)\)’s is a compact set. Moreover, as \(\sigma _{\mathcal {F}}\) is closed (see Proposition 2.3), the assertion holds by Weierstrass’ theorem.

  2. ii)

    If \(0\in \mathrm{int}({\mathcal {F}})\), there exists a neighbourhood of 0, denoted by \({\mathcal {N}}_\delta (0)\), satisfying \({\mathcal {N}}_\delta (0)\subset {\mathcal {F}}\). For any \(x\ne x^0\), we have \(y^0:=\frac{\delta (x-x^0)}{2\Vert x-x^0\Vert _2}\in {\mathcal {N}}_\delta (0)\). Accordingly,

$$\begin{aligned} \sigma _{\mathcal {F}}(x-x^0)=\sup _{y\in {\mathcal {F}}} y^\top (x-x^0)\ge \sup _{y\in {\mathcal {N}}_\delta (0)} y^\top (x-x^0)\ge (y^0)^\top (x-x^0)=\frac{\delta }{2}\Vert x-x^0\Vert _2, \end{aligned}$$

which indicates that \(\sigma _{\mathcal {F}}(\cdot -x^0)\) is coercive (i.e., \(\lim \limits _{\Vert x\Vert _2\rightarrow \infty }\sigma _{\mathcal {F}}(x-x^0)=+\infty\)). Again, the assertion (ii) follows from the Weierstrass’ theorem. \(\square\)

Because of the affineness of the \(T_i\)’s, Proposition 3.1 (i) obviously holds if all the \(\Omega _i\)’s are compact sets. Under more stringent premises, the uniqueness of the optimum of (3.1) can be guaranteed. For instance, if \(\Omega\) and \({\mathcal {F}}\) in (3.1) are convex, normally smooth and round, then (3.1) admits a unique optimum [40].

By recalling the affine mappings \(T_i\) in (1.5), the constraint (3.1b) can be described as

$$\begin{aligned} \begin{array}{c} x=\sum \limits _{i=1}^m(M_i\omega _i+a_i), \;\;\omega _i\in \Omega _i, \;\;i=1,2,\ldots ,m. \end{array} \end{aligned}$$
(3.2)

Hereafter, for notational convenience, we denote \(\omega \in {{\mathbb {R}}}^l\), \(a\in {{\mathbb {R}}}^n\), \(M\in {{\mathbb {R}}}^{n\times l}\) and \(\Omega \subset {{\mathbb {R}}}^l\) as follows

$$\begin{aligned} \begin{array}{c} \omega :=\begin{pmatrix} \omega _1\\ \omega _2\\ \vdots \\ \omega _m\\ \end{pmatrix}, \;\; a:=\sum \limits _{i=1}^ma_i, \;\;M:=\begin{pmatrix}M_1,M_2,\ldots ,M_m\end{pmatrix}\;\; \hbox {and}\;\; \Omega :=\Omega _1\times \cdots \times \Omega _m. \end{array} \end{aligned}$$
(3.3)

3.1 Primal reformulation

With the notations in (3.3), the constraint (3.2) can be condensed to \(x=M\omega +a\) with \(\omega \in \Omega\). Accordingly, problem (3.1) can be cast as

$$\begin{aligned} \begin{array}{rl} \min \;\sigma _{\mathcal {F}}(x-x^0)\;\;\;\;\;\; \hbox {subject to}\;\;\;x=M\omega +a,\;\omega \in \Omega , \end{array} \end{aligned}$$
(3.4)

or concisely,

$$\begin{aligned} \min _{\omega \in {{\mathbb {R}}}^l}\;\;\sigma _{\mathcal {F}}(M\omega +a-x^0)+\iota _{\Omega }(\omega ), \end{aligned}$$
(3.5)

where \(\iota _{\Omega }\) is the indicator function defined in (2.1). For simplicity, (3.5) is termed the primal reformulation of (1.4). The first-order optimality conditions of (3.5) read

$$\begin{aligned} 0\in M^\top \partial \sigma _{\mathcal {F}}(M\omega +a-x^0)+\partial \iota _{\Omega }(\omega ), \end{aligned}$$
(3.6)

where \(\partial \iota _{\Omega }\) and \(\partial \sigma _{\mathcal {F}}\) are the subdifferential operators (see Example 2.3) of \(\iota _{\Omega }\) and \(\sigma _{\mathcal {F}}\), respectively. In particular, if the dynamic set \(\mathcal {F}=\{x\in {\mathbb {R}}^n\mid \Vert x\Vert _{Q^{-1}}\le 1\}\) with \(Q\succ 0\) (or equivalently, \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _Q\)), then (3.5) can be equivalently cast as

$$\begin{aligned} \begin{array}{c} \min \limits _{\omega \in {\mathbb {R}}^l}\;\;\frac{1}{2}\Vert M\omega +a-x^0\Vert _Q^2+\iota _{\Omega }(\omega ), \end{array} \end{aligned}$$
(3.7)

which implies that the projection onto the Minkowski sum set in (1.4) is essentially a constrained least-squares problem. Obviously, the objective function of (3.7) is strongly convex if M has full column rank, in which case (3.7) admits a unique optimum. Moreover, a corollary can be immediately deduced for particular M.

Corollary 3.2

If \({\mathcal {F}}=\{x\in {\mathbb {R}}^n\mid \Vert x\Vert _{Q^{-1}}\le 1\}\) with \(Q\succ 0\), i.e., \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _Q\), and \(M\in {\mathbb {R}}^{n\times l}\) satisfies \(M^\top QM=\mathrm{diag}\{\frac{1}{\nu _1}I_{l_1},\frac{1}{\nu _2}I_{l_2},\ldots ,\frac{1}{\nu _m}I_{l_m}\}\) with \(\nu _i>0\) for all \(i=1,2,\ldots ,m\) (i.e., the columns of M are Q-conjugate vectors), then the optimum of (3.5), denoted by \(\omega ^*=(\omega _1^*,\omega _2^*,\ldots ,\omega _m^*)\), admits the explicit formula

$$\begin{aligned} \begin{array}{c} \omega _i^*=\Pi \left( \nu _iM_i^\top Q(x^0-a);\Omega _i\right) ,\;\; i=1,2,\ldots ,m. \end{array} \end{aligned}$$
(3.8)

Proof

By the first-order optimality conditions of (3.7), we derive

$$\begin{aligned} 0\in M^\top Q(M\omega ^*+a-x^0)+\partial \iota _{\Omega }(\omega ^*). \end{aligned}$$

Furthermore, it follows from the assumption \(M_i^\top QM_i=\frac{1}{\nu _i}I_{l_i}\) and the fact \(\Omega =\Omega _1\times \cdots \times \Omega _m\) that

$$\begin{aligned} \nu _iM_i^\top Q(x^0-a)\in \omega _i^*+\nu _i\partial \iota _{\Omega _i}(\omega _i^*),\;\; i=1,2, \ldots ,m, \end{aligned}$$

which indicates that \(\omega _i^*=(I_{l_i}+\nu _i\partial \iota _{\Omega _i})^{-1}(\nu _iM_i^\top Q(x^0-a))=\Pi (\nu _iM_i^\top Q(x^0-a);\Omega _i)\). \(\square\)

The corollary above indicates that if the column vectors of the \(M_i\)’s are Q-conjugate, then the projection onto a convex set with Minkowski sum structure admits a closed-form solution.
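The following MATLAB sketch (our own toy check, with Q = I and M built from an orthogonal matrix so that \(\nu _i=1\)) validates formula (3.8) by testing the variational characterization \((x^0-x^*)^\top (x-x^*)\le 0\) of the Euclidean projection on random feasible points.

```matlab
% A sketch checking Corollary 3.2 (all names ours): take Q = I and M with
% orthonormal columns (so nu_i = 1); then (3.8) should yield the Euclidean
% projection of x0 onto C = M1(Omega_1) + M2(Omega_2) + a with unit-ball
% Omega_i, tested via the optimality inequality (x0-x*)'(x-x*) <= 0.
n = 6;  x0 = randn(n,1);  a = randn(n,1);
[Mf, ~] = qr(randn(n));  M1 = Mf(:,1:3);  M2 = Mf(:,4:6);
proj_ball = @(y) min(1, 1/norm(y)) * y;
w1 = proj_ball(M1'*(x0 - a));            % formula (3.8) with nu_i = 1
w2 = proj_ball(M2'*(x0 - a));
xs = M1*w1 + M2*w2 + a;                  % candidate projection of x0 onto C
worst = -inf;
for trial = 1:1e4                        % random feasible points x of C
    u1 = randn(3,1);  u1 = rand^(1/3)*u1/norm(u1);   % uniform in unit ball
    u2 = randn(3,1);  u2 = rand^(1/3)*u2/norm(u2);
    xf = M1*u1 + M2*u2 + a;
    worst = max(worst, (x0 - xs)'*(xf - xs));
end
fprintf('max (x0-x*)''(x-x*) over samples: %.3e (<= 0 expected)\n', worst);
```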

3.2 Dual reformulation

On the other hand, by deploying Fenchel–Rockafellar duality (see e.g., [2, Chapter 15]), the dual of (3.5) can be formulated as

$$\begin{aligned} \max _{\lambda \in {\mathbb {R}}^n}\;\;\lambda ^\top (a-x^0)-\iota _{\mathcal {F}}(\lambda )-\sigma _{\Omega }(-M^\top \lambda ). \end{aligned}$$
(3.9)

Accordingly, the first-order optimality conditions of (3.9) read

$$\begin{aligned} 0\in a- x^0-\partial \iota _{\mathcal {F}}(\lambda )+M\partial \sigma _{\Omega }(-M^\top \lambda ). \end{aligned}$$
(3.10)

An explicit formula for \(\sigma _{\Omega }\) is available if \(\Omega\) is a special set, e.g., a ball, a closed convex cone, or an affine set (see Example 2.1). For example, if \(\Omega\) is a convex cone, then (3.9) can be cast as \(\max _{\lambda \in {\mathbb {R}}^n}\{\lambda ^\top (a-x^0)-\iota _{\mathcal {F}}(\lambda )-\iota _{\Omega ^\circ }(-M^\top \lambda )\}\). Furthermore, if \({\mathcal {F}}=\{x\in {\mathbb {R}}^n\mid \Vert x\Vert _{\infty }\le \alpha \}\) and \(\Omega =\{x\in {\mathbb {R}}^l\mid Ax\le 0\}\) is a polyhedral cone, then \(\Omega ^\circ =\{x\in {\mathbb {R}}^l\mid x=A^\top z,\; z\ge 0\}\) is exactly the cone finitely generated by the columns of \(A^\top\). Thus, (3.9) reduces to

$$\begin{aligned} \begin{array}{rl} \max \limits _{\lambda \in {\mathbb {R}}^n}\;\;\lambda ^\top (a-x^0)\;\;\;\;\;\; \hbox {subject to}\;\;\;\Vert \lambda \Vert _\infty \le \alpha ,\; M^\top \lambda +A^\top z=0,\; z\ge 0, \end{array} \end{aligned}$$

which is essentially a linear program in the variables \((\lambda ,z)\).
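Such a linear program can be handed to an off-the-shelf solver. A hedged sketch with random illustrative data, assuming MATLAB's Optimization Toolbox function linprog, reads:

```matlab
% Sketch of the LP above via linprog (Optimization Toolbox assumed; the
% random data and all variable names are purely illustrative).
n = 4;  l = 3;  q = 5;  alpha = 1;
M = randn(n,l);  A = randn(q,l);  a = randn(n,1);  x0 = randn(n,1);
f   = [-(a - x0); zeros(q,1)];           % maximize lambda'(a-x0) <=> minimize -(...)
Aeq = [M', A'];  beq = zeros(l,1);       % M'*lambda + A'*z = 0
lb  = [-alpha*ones(n,1); zeros(q,1)];    % ||lambda||_inf <= alpha and z >= 0
ub  = [ alpha*ones(n,1); inf(q,1)];
y   = linprog(f, [], [], Aeq, beq, lb, ub);
lambda = y(1:n);  z = y(n+1:end);        % recover the two variable blocks
```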

3.3 Primal-dual reformulation

Moreover, by applying the conjugate (see (2.9) for definition) of \(\sigma _{\mathcal {F}}\) to (3.5), we derive a saddle point problem

$$\begin{aligned} \min _{\omega \in {\mathbb {R}}^l}\max _{\lambda \in {\mathbb {R}}^n}\;\psi (\omega ,\lambda ):=\lambda ^\top (M\omega +a-x^0)-\iota _{\mathcal {F}}(\lambda )+\iota _{\Omega }(\omega ). \end{aligned}$$
(3.11)

Again, by the first-order optimality conditions, the saddle point problem (3.11) amounts to

$$\begin{aligned} {\left\{ \begin{array}{ll} 0\in \partial \iota _{\Omega }(\omega )+M^\top \lambda ,\\ 0\in \partial \iota _{\mathcal {F}}(\lambda )-M\omega -a+x^0, \end{array}\right. } \end{aligned}$$

or equivalently,

$$\begin{aligned} \begin{pmatrix} 0\\ a-x^0 \end{pmatrix}\in \begin{pmatrix} \partial \iota _{\Omega } &{}M^\top \\ -M&{} \partial \iota _{\mathcal {F}} \end{pmatrix}\begin{pmatrix} \omega \\ \lambda \end{pmatrix}. \end{aligned}$$
(3.12)

As (3.11) is linearly coupled in \(\omega\) and \(\lambda\), there always exists a saddle point \((\omega ^*,\lambda ^*)\) satisfying (3.12), where \(\omega ^*\) and \(\lambda ^*\) are solutions of the primal and dual problems, respectively (see e.g., [3, Chapter 3]).

Remark 3.3

In a word, (1.4) can be equivalently reformulated in various optimization forms, whose optimality conditions correspond to finding zeros of a series of monotone inclusion problems, e.g., (3.6), (3.10) and (3.12). To solve those monotone inclusion problems, numerous splitting schemes can be employed, e.g., the proximal point algorithm (PPA), forward-backward splitting (FBS), forward-backward-forward splitting (FBFS), Douglas–Rachford splitting (DRS) and Peaceman–Rachford splitting (PRS), to name a few. The interested reader is referred to, e.g., [15, 38, 42], for overviews and recent advances in splitting schemes. Among those splitting schemes for tackling monotone inclusion problems, the most popular ones include the application of PPA to (3.12), which results in the well-known PDHG method [9, 24]; the application of DRS to (3.10), which results in ADMM [18, 23]; and the application of DRS to (3.6), which results in Spingarn’s method [46]. In a nutshell, (1.4) can be tractably solved by a large family of first-order optimization methods in addition to the computational-geometry-based algorithms.

4 Solvers for projection onto Minkowski sum set

As discussed in Sect. 3, (1.4) can be reformulated in various ways and then solved by numerous state-of-the-art methods. Analysis and numerical comparison of all those methods are beyond the scope of this paper. Herein, we focus on ADMM [23] and PDHG [24] for solving (1.4).

4.1 ADMM based solver

We first consider a generic linearly constrained convex optimization problem with separable structure

$$\begin{aligned} \min&\;\;\theta _1(x_1)+\theta _2(x_2) \end{aligned}$$
(4.1a)
$$\begin{aligned} \hbox {subject to}&\;\; A_1x_1+A_2x_2=b,\;x_1\in \mathcal {X}_1,\;x_2\in \mathcal {X}_2, \end{aligned}$$
(4.1b)

where \(\theta _i:{\mathbb {R}}^{n_i}\rightarrow (-\infty ,+\infty ]\) (\(i=1,2\)) are closed convex proper functions (possibly nonsmooth); \(\mathcal {X}_i\subseteq {\mathbb {R}}^{n_i}\) (\(i=1,2\)) are nonempty closed convex sets; \(A_i\in {\mathbb {R}}^{l\times n_i}\) (\(i=1,2\)) are full column rank matrices; \(b\in {\mathbb {R}}^l\); and \(n_1+n_2=n\). A large family of instances arising from areas such as compressive sensing, image processing, computer vision and machine learning can be reduced to the abstract model (4.1). A state-of-the-art algorithm for solving (4.1) is the ADMM, which originated in [18, 23]. Given \((x_2^k,\lambda ^k)\in {\mathbb {R}}^{n_2}\times {\mathbb {R}}^l\), the recursion of ADMM for solving the abstract model (4.1) can be stated as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} x_1^{k+1}=\underset{x_1\in {\mathcal {X}}_1}{\hbox {argmin}}L_{\beta }(x_1,x_2^k,\lambda ^k), \\ x_2^{k+1}=\underset{x_2\in {\mathcal {X}}_2}{\hbox {argmin}}L_{\beta }(x_1^{k+1},x_2,\lambda ^k), \\ \lambda ^{k+1}=\lambda ^k-\gamma \beta (A_1x_1^{k+1}+A_2x_2^{k+1}-b), \end{array}\right. } \end{aligned}$$
(4.2)

where \(\beta >0\) is a penalty parameter and \(\gamma \in (0,\frac{\sqrt{5}+1}{2})\) is a relaxation factor. Herein, \(L_{\beta }:{\mathbb {R}}^{n_1}\times {\mathbb {R}}^{n_2}\times {\mathbb {R}}^l\rightarrow (-\infty ,+\infty ]\) denotes the augmented Lagrangian function of (4.1), which is defined as

$$\begin{aligned} L_{\beta }(x_1,x_2,\lambda ):=\theta _1(x_1)+\theta _2(x_2)-\lambda ^\top (A_1x_1+A_2x_2-b)+\frac{\beta }{2}\Vert A_1x_1+A_2x_2-b\Vert _2^2, \end{aligned}$$
(4.3)

where \(\lambda \in {\mathbb {R}}^l\) is the Lagrange multiplier. As ADMM copes with each sub-function \(\theta _i\) individually with the aid of the augmented Lagrangian function, it has gained much popularity in diverse applications (see, e.g., the overviews [6, 22] and references therein). The convergence of ADMM is restated in the tutorial [14], and its O(1/k) convergence rate was established in [25].

Now, we present the implementation details of ADMM on solving (1.4). Essentially, the primal reformulation of (3.1) falls into the framework of (4.1) with the following specifications:

  • the variables \(x_1:=x\), \(x_2:=\omega\), and the abstract sets \({\mathcal {X}}_1={\mathbb {R}}^n\) and \({\mathcal {X}}_2=\Omega\);

  • the objective functions \(\theta _1(x_1):=\sigma _{\mathcal {F}}(x-x^0)\) and \(\theta _2(x_2):=0\);

  • the matrices and vector in linear constraint (4.1b) are \(A_1:=I\), \(A_2:=-M\) and \(b:=a\).

Therefore, the resulting subproblems when applying ADMM (or rather, the iterative scheme (4.2)) to (3.4) read

$$\begin{aligned} {\left\{ \begin{array}{ll} x^{k+1}=\underset{x\in {\mathbb {R}}^n}{\hbox {argmin}}\big \{\sigma _{\mathcal {F}}(x-x^0)+\frac{\beta }{2}\Vert x-M\omega ^k-a-{\beta }^{-1}\lambda ^k\Vert _2^2\big \}, \\ \omega ^{k+1}=\underset{\omega \in \Omega }{\hbox {argmin}}\Vert x^{k+1}-M\omega -a-{\beta }^{-1}\lambda ^k\Vert _2^2, \\ \lambda ^{k+1}=\lambda ^k-\beta (x^{k+1}-M\omega ^{k+1}-a). \end{array}\right. } \end{aligned}$$
(4.4)

Unfortunately, the \(\omega\)-subproblem in (4.4) is a least-squares problem over the constraint set \(\Omega =\Omega _1\times \cdots \times \Omega _m\) whose blocks are coupled through M, and it generally requires an internally nested iteration to derive an approximate solution (albeit the projection onto each individual \(\Omega _i\) admits a closed-form solution). Note that the compelling numerical performance of ADMM is established in the sense that all resulting subproblems have closed-form solutions or can be efficiently approximated by some subroutines. Accordingly, it is sensible to reformulate (3.4) in an alternative fashion. Concretely, by further introducing an auxiliary variable \(z\in {\mathbb {R}}^l\), problem (3.4) is equivalent to

$$\begin{aligned} \begin{array}{c} \min \;\sigma _{\mathcal {F}}(x-x^0)\;\;\;\;\; \hbox {subject to}\;\;x=Mz+a, \;\omega =z, \;\omega \in \Omega . \end{array} \end{aligned}$$
(4.5)

Likewise, the problem (4.5) falls into the abstract model (4.1) with the following specifications:

  • the variables \(x_1:=(x,\omega )\), \(x_2:=z\), and the abstract sets \({\mathcal {X}}_1:={\mathbb {R}}^n\times \Omega\), \({\mathcal {X}}_2:={\mathbb {R}}^l\);

  • the objective functions \(\theta _1(x_1):=\sigma _{\mathcal {F}}(x-x^0)\) and \(\theta _2(x_2):=0\);

  • The matrices and vector with respect to the linear constraint (4.1b) are

    $$\begin{aligned} A_1:=\begin{pmatrix}I&{}0\\ 0&{}I\end{pmatrix}, \;\; A_2:=\begin{pmatrix}-M\\ -I\end{pmatrix}\;\;\hbox {and}\;\;b:=\begin{pmatrix}a\\ 0\end{pmatrix}. \end{aligned}$$

ADMM is thus applicable to (4.5) with above specifications. More precisely, the augmented Lagrangian function of (4.5) can be specified as

$$\begin{aligned} L_{\beta }(x_1,x_2,\lambda ) =&\sigma _{\mathcal {F}}(x-x^0)-\lambda _1^\top (x-Mz-a)+\frac{\beta }{2}\Vert x-Mz-a\Vert _2^2\\&-\lambda _2^\top (\omega -z)+\frac{\beta }{2}\Vert \omega -z\Vert ^2_2, \end{aligned}$$

where \(\lambda =(\lambda _1,\lambda _2)\in {\mathbb {R}}^n\times {\mathbb {R}}^l\) is the Lagrange multiplier. Accordingly, the resulting subproblems when applying the ADMM recursion (4.2) to (4.5) read

$$\begin{aligned} {\left\{ \begin{array}{ll} x^{k+1}=\underset{x\in {\mathbb {R}}^n}{\hbox {argmin}}\left\{ \sigma _{\mathcal {F}}(x-x^0)+\frac{\beta }{2}\Vert x-Mz^k-a-\beta ^{-1}\lambda _1^k\Vert _2^2\right\} , \\ \omega ^{k+1}=\underset{\omega \in \Omega }{\hbox {argmin}}\;\Vert \omega -z^k-\beta ^{-1}\lambda _2^k\Vert ^2_2, \\ z^{k+1}=\underset{z\in {\mathbb {R}}^l}{\hbox {argmin}}\left\{ \Vert x^{k+1}-Mz-a-\beta ^{-1}\lambda _1^k\Vert _2^2+\Vert \omega ^{k+1}-z-\beta ^{-1}\lambda _2^k\Vert ^2_2\right\} , \\ \lambda _1^{k+1}=\lambda _1^k-\beta (x^{k+1}-Mz^{k+1}-a),\\ \lambda _2^{k+1}=\lambda _2^k-\beta (\omega ^{k+1}-z^{k+1}). \end{array}\right. } \end{aligned}$$
(4.6)

These subproblems (namely, the x-, \(\omega\)-, and z-subproblems) can be further delineated as follows:

  • The x-subproblem can be equivalently solved by the proximity operator

    $$\begin{aligned} \begin{array}{rl} x^{k+1}&{}=\underset{x\in {\mathbb {R}}^n}{\hbox {argmin}}\left\{ \sigma _{\mathcal {F}}(x-x^0)+\frac{\beta }{2}\Vert x-r^k\Vert _2^2\right\} \\ \hbox {[By (2.10)]}\;\;\;\;\;&{}=\hbox {prox}_{\frac{1}{\beta }\sigma _{\mathcal {F}}(\cdot -x^0)}(r^k)\\ \hbox {[Proposition 2.6 i)]}\;\;\;\;\;&{}=x^0+\hbox {prox}_{\frac{\sigma _{\mathcal {F}}}{\beta }}(r^k-x^0)\\ \hbox {[Proposition 2.6 iii)]}\;\;\;\;\;&{}=r^k-\frac{1}{\beta }\Pi (\beta (r^k-x^0);{\mathcal {F}}), \end{array} \end{aligned}$$
    (4.7)

    where \(r^k:=Mz^k+a+\lambda _1^k/\beta\). Accordingly, the computational effort of solving (4.7) is dominated by computing the Euclidean projection onto the dynamic set \(\mathcal {F}\). In the upcoming numerical simulations, we shall consider the cases where \(\mathcal {F}\subset {\mathbb {R}}^n\) is the \(\ell ^1\)-, \(\ell ^2\)- or \(\ell ^\infty\)-norm unit ball.

  • The \(\omega\)-subproblem is equivalent to

    $$\begin{aligned} \begin{array}{rcl} \omega ^{k+1}=\underset{\omega \in \Omega }{\hbox {argmin}}\;\Vert \omega -z^k-\beta ^{-1}\lambda _2^k\Vert ^2_2=\Pi (z^k+\beta ^{-1}\lambda _2^k;\Omega ), \end{array} \end{aligned}$$
    (4.8)

    which corresponds to the Euclidean projection onto \(\Omega\). Owing to the Cartesian product structure \(\Omega =\Omega _1\times \Omega _2\times \cdots \times \Omega _m\), the components of \(\omega ^{k+1}=(\omega _1^{k+1},\omega _2^{k+1},\ldots ,\omega _m^{k+1})\) can be derived simultaneously by

    $$\begin{aligned} \begin{array}{rcl} \omega _i^{k+1}&{}=&{}\underset{\omega _i\in \Omega _i}{\hbox {argmin}}\;\Vert \omega _i-z_i^k-\beta ^{-1}(\lambda _2^k)_i\Vert ^2_2\\ &{}=&{}\Pi (z_i^k+\beta ^{-1}(\lambda _2^k)_i;\Omega _i),\;\;i=1,2,\ldots ,m. \end{array} \end{aligned}$$
    (4.9)
  • The z-subproblem amounts to

    $$\begin{aligned} \begin{array}{c} z^{k+1}=\underset{z\in {\mathbb {R}}^l}{\hbox {argmin}}\left\{ \Vert x^{k+1}-Mz-a-\beta ^{-1}\lambda _1^k\Vert _2^2+\Vert \omega ^{k+1}-z-\beta ^{-1}\lambda _2^k\Vert ^2_2\right\} , \end{array} \end{aligned}$$

    which is a positive definite linear system

    $$\begin{aligned} \begin{array}{l} (M^\top M+I)z=M^\top (x^{k+1}-a-\beta ^{-1}\lambda _1^k)+\omega ^{k+1}-\beta ^{-1}\lambda _2^k. \end{array} \end{aligned}$$
    (4.10)

    Numerous algorithms in numerical computation cater for the above linear system, e.g., the preconditioned conjugate gradient (PCG) method.

In summary, the updates of x, \(\omega\) and z when implementing ADMM can be readily conducted. The projections onto the dynamic set \({\mathcal {F}}\) in (4.7) and onto the component sets \(\Omega _i\) in (4.9) can be handled simultaneously. Moreover, a pleasing property of ADMM is that only one user-dependent parameter \(\beta\) needs to be tuned throughout the numerical implementation.
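Putting the pieces together, a compact MATLAB sketch of the recursion (4.6) is given below. Purely for illustration, it assumes that \({\mathcal {F}}\) and all \(\Omega _i\)'s are unit \(\ell ^2\)-balls (so every projection is the ball formula of Example 2.2 ii)); it is a minimal demonstration, not a tuned solver.

```matlab
% A compact sketch of the ADMM recursion (4.6) (all data and names ours):
% F and every Omega_i are taken as unit l2-balls for illustration.
n = 10;  m = 5;  li = n;  l = m*li;  beta = 1;
M = randn(n,l);  a = randn(n,1);  x0 = randn(n,1);
proj_ball = @(y) min(1, 1/norm(y)) * y;
x = zeros(n,1);  w = zeros(l,1);  z = zeros(l,1);
lam1 = zeros(n,1);  lam2 = zeros(l,1);
R = chol(M'*M + eye(l));                   % factorize (4.10) once, reuse below
for k = 1:500
    r = M*z + a + lam1/beta;               % x-subproblem via (4.7)
    x = r - proj_ball(beta*(r - x0))/beta;
    s = z + lam2/beta;                     % w-subproblem via (4.9), blockwise
    for i = 1:m
        idx = (i-1)*li+1 : i*li;
        w(idx) = proj_ball(s(idx));
    end
    rhs = M'*(x - a - lam1/beta) + w - lam2/beta;
    z   = R \ (R' \ rhs);                  % z-subproblem: solve (4.10)
    lam1 = lam1 - beta*(x - M*z - a);      % multiplier updates
    lam2 = lam2 - beta*(w - z);
end
fprintf('distance estimate sigma_F(x - x0) = %.6f\n', norm(x - x0));
```

Note that the Cholesky factor of \(M^\top M+I\) is computed once and reused in every iteration, which is the standard way to amortize the cost of (4.10).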

4.2 PDHG solver

As discussed in Sect. 4.1, ADMM can perform efficiently on (1.4) in lower dimensions, e.g., for collision detection in robotics. However, its drawbacks become increasingly evident as the dimension of (1.4) grows: the computational effort of each ADMM iteration is possibly dominated by solving the linear system (4.10). An alternative is the PDHG method explored in [9, 10, 16, 24, 26], which can be viewed as the application of PPA to the monotone inclusion problem (3.12). Recently, it was shown that PDHG can also be interpreted as the application of DRS to a reformulation of the primal problem [38].

Given \((\omega ^k,\lambda ^k)\in {\mathbb {R}}^l\times {\mathbb {R}}^n\), the recursion of PDHG [24] for solving the primal-dual reformulation (3.11) of (1.4) is specified as follows

$$\begin{aligned} {\left\{ \begin{array}{ll} \tilde{\omega }^k=\underset{\omega \in {\mathbb {R}}^l}{\hbox {argmin}}\left\{ \psi (\omega ,\lambda ^k)+\frac{1}{2\tau _1}\Vert \omega -\omega ^k\Vert _2^2\right\} ,\\ \tilde{\lambda }^k=\underset{\lambda \in {\mathbb {R}}^n}{\hbox {argmax}}\left\{ \psi (2\tilde{\omega }^k-\omega ^k,\lambda )-\frac{1}{2\tau _2}\Vert \lambda -\lambda ^k\Vert _2^2\right\} ,\\ \omega ^{k+1}=\omega ^k+\rho (\tilde{\omega }^k-\omega ^k),\\ \lambda ^{k+1}=\lambda ^k+\rho (\tilde{\lambda }^k-\lambda ^k), \end{array}\right. } \end{aligned}$$
(4.11)

where \(\psi\) is the saddle point function defined in (3.11), \(\tau _1>0\) and \(\tau _2>0\) are parameters satisfying \(\tau _1\tau _2\Vert M^\top M\Vert _2\le 1\), and \(\rho \in (0,2)\) is a relaxation factor. By elaborating the subproblems in (4.11), we obtain the following concrete recursion for solving (1.4) by PDHG

$$\begin{aligned} {\left\{ \begin{array}{ll} \tilde{\omega }_i^k=\Pi (\omega _i^k-\tau _1M_i^\top \lambda ^k;\Omega _i),\;\;i=1,2,\ldots ,m\\ \tilde{\lambda }^k=\Pi (\lambda ^k+\tau _2(M(2\tilde{\omega }^k-\omega ^k)+a-x^0);{\mathcal {F}}),\\ \omega ^{k+1}=\omega ^k+\rho (\tilde{\omega }^k-\omega ^k),\\ \lambda ^{k+1}=\lambda ^k+\rho (\tilde{\lambda }^k-\lambda ^k). \end{array}\right. } \end{aligned}$$
(4.12)

Intuitively, the recursion of PDHG for solving (1.4) is much simpler than that of ADMM. The computational effort of (4.12) is dominated merely by the projections onto the dynamic set \({\mathcal {F}}\) and the component sets \(\Omega _i\). Compared to ADMM, PDHG is fairly favourable for handling large-scale problems, as no linear systems are involved throughout the iterations. The only drawback of PDHG may be the challenge of tuning the parameters \((\tau _1,\tau _2)\) to obtain compelling numerical performance.
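For completeness, a matching MATLAB sketch of the recursion (4.12), under the same illustrative assumptions as the ADMM sketch in Sect. 4.1 (\({\mathcal {F}}\) and all \(\Omega _i\)'s are unit \(\ell ^2\)-balls), is given below; only matrix-vector products and projections appear.

```matlab
% A compact sketch of the PDHG recursion (4.12) (data and names ours):
% F and every Omega_i are unit l2-balls; tau1*tau2*||M'M||_2 <= 1 is enforced.
n = 10;  m = 5;  li = n;  l = m*li;
M = randn(n,l);  a = randn(n,1);  x0 = randn(n,1);
tau1 = 1;  tau2 = 1/(tau1*norm(M'*M, 2));  rho = 1.5;
proj_ball = @(y) min(1, 1/norm(y)) * y;
w = zeros(l,1);  lam = zeros(n,1);
for k = 1:500
    g  = w - tau1*(M'*lam);
    wt = zeros(l,1);
    for i = 1:m                            % blockwise projections onto Omega_i
        idx = (i-1)*li+1 : i*li;
        wt(idx) = proj_ball(g(idx));
    end
    lt  = proj_ball(lam + tau2*(M*(2*wt - w) + a - x0));   % Pi(. ;F)
    w   = w + rho*(wt - w);                % relaxation steps
    lam = lam + rho*(lt - lam);
end
fprintf('distance estimate sigma_F(M*w + a - x0) = %.6f\n', norm(M*w + a - x0));
```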

Remark 4.1

In contrast, Gilbert’s method [19] seeks the optima of a convex quadratic function f on a compact set \({\mathcal {C}}\), i.e., \(\min _{x\in {\mathcal {C}}}f(x)\). The recursion reads

$$\begin{aligned} x^{k+1}=x^k+\alpha _k[s_{\mathcal {C}}(-\nabla f(x^k))-x^k], \end{aligned}$$
(4.13)

where \(\alpha _k\in (0,1]\) is a stepsize computed by an a priori criterion, and \(s_{\mathcal {C}}(y):=\{x\in {\mathcal {C}}\mid x^\top y=\sup _{z\in {\mathcal {C}}} z^\top y\}\) denotes the contact mapping of \({\mathcal {C}}\). The recursion (4.13) amounts to minimizing f on a polyhedral approximation of \({\mathcal {C}}\) by the projected gradient method, or can be deemed a variant of the Frank–Wolfe method (see e.g., [40] for discussion). An improvement of (4.13) for solving problem (1.4) in the 3D case can be found in, e.g., [21]. The worst-case iteration complexity of (4.13) is proven to be \(O(\frac{1}{\epsilon })\). Alternatively, Qin and An [40] deployed Nesterov’s gradient-like method [36, Chapter 2.2] on the dual reformulation of (1.4), which admits a worst-case \(O(\frac{1}{\sqrt{\epsilon }}\ln (\frac{1}{\epsilon }))\) iteration complexity. Both methods are essentially gradient-like methods in either the primal or the dual variable. Numerically, as shown in [40], Gilbert’s method may converge slowly near optima due to the zigzagging phenomenon.
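To convey the flavour of (4.13), the toy sketch below (our own construction) minimizes \(f(x)=\frac{1}{2}\Vert x-x^0\Vert _2^2\) over the unit \(\ell ^1\)-ball, whose contact mapping picks a signed vertex; the open-loop stepsize \(2/(k+2)\) is a standard Frank–Wolfe choice and not the stepsize criterion of [19].

```matlab
% Toy Frank-Wolfe/Gilbert-style recursion (4.13) on the unit l1-ball (names
% and the 2/(k+2) stepsize are ours): minimize f(x) = 0.5*||x - x0||^2.
n = 8;  x0 = randn(n,1);
x = zeros(n,1);
for k = 0:200
    g = x - x0;                           % gradient of f at x
    [~, j] = max(abs(g));                 % contact point of the l1-ball at -g:
    s = zeros(n,1);  s(j) = -sign(g(j));  % signed vertex maximizing <s, -g>
    x = x + (2/(k+2))*(s - x);            % open-loop stepsize alpha_k = 2/(k+2)
end
fprintf('f(x) = %.6f\n', 0.5*norm(x - x0)^2);
```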

ADMM and PDHG are state-of-the-art splitting methods involving both primal and dual variables in their recursions, and admit a worst-case \(O(\frac{1}{\epsilon })\) iteration complexity for generic convex programming. PDHG is typically more popular for solving large-scale practical problems as its recursion involves only matrix-vector operations.

5 Numerical simulations

In this section, we conduct numerical experiments involving the abstract model (1.4) so as to demonstrate the performances of PDHG and ADMM. Essentially, the model (1.4) captures numerous important real-world applications in robotics [29] and mechanics [48]. We test a 2D path planning problem in robotics and some high-dimensional synthetic problems involving (1.4). All codes are written in MATLAB 7.9, and all experiments are conducted on a Lenovo personal computer with an Intel Core(TM) 2.30 GHz CPU and 8 GB of memory.

5.1 Path planning

Path planning plays a fundamental role in robotics and has been extensively studied in real-world applications. It aims to ensure that agents (e.g., manipulators, robots) move freely in a (static or dynamic) workspace without hitting obstacles. We herein consider the scenarios of a manipulator with a single link and with m degree-of-freedom (DoF) links in a static workspace.

Let \(\Omega \subset {\mathbb {R}}^n\) denote an n-dimensional (\(n\in \{2,3\}\) for practical applications in robotics) workspace and let \({\mathcal {O}}_i\subset \Omega\) (\(i=1,2,\ldots ,l\)) be closed sets representing stationary obstacles. Without loss of generality, all the \({\mathcal {O}}_i\)’s are assumed to be convex. A manipulator with m DoF can be described by a collection of rigid bodies \({\mathcal {A}}_i\subset {\mathbb {R}}^n\) (\(i=1,2,\ldots ,m\)). Figure 2 shows diagram representations of some manipulators with various shapes. The links (also called arms) of a manipulator, which are concatenated by the joints (green points in Fig. 2), are typically mimicked by simple sets such as line segments, polytopes, and ellipsoids.

In path planning, a probabilistic roadmap is constructed as an undirected graph [27], and a collision-free path is then found by repetitively checking the distances between obstacles and links. A mathematical model can be formulated by calculating the distance function

$$\begin{aligned} d({\mathcal {O}},{\mathcal {A}}):=\inf \left\{ \sigma _{\mathcal {F}}(x-y)\mid x\in {\mathcal {O}},\; y\in {\mathcal {A}}\right\} , \end{aligned}$$
(5.1)

where \(\sigma _{\mathcal {F}}\) is the support function defined in (1.2), serving as the generalized distance metric. Here \({\mathcal {O}}:=\bigcup _{i=1}^l{\mathcal {O}}_i\) and \({\mathcal {A}}:=\bigcup _{i=1}^m{\mathcal {A}}_i\) denote the collections of stationary obstacles and rigid links, respectively. Unfortunately, problem (5.1) is prohibitively difficult to tackle by numerical optimization methods because the constraint set \({\mathcal {O}}\) or \({\mathcal {A}}\) is typically nonconvex, even if all the component sets \({\mathcal {O}}_i\) or \({\mathcal {A}}_i\) are convex. An equivalent formulation of (5.1) is to compute the distances between a series of component sets, i.e.,

$$\begin{aligned} d({\mathcal {O}},{\mathcal {A}})=\min \{d({\mathcal {O}}_i,{\mathcal {A}}_j)\mid i=1,2,\ldots ,l;\;j=1,2,\ldots ,m\}. \end{aligned}$$
(5.2)

Under the convexity assumption on the \({\mathcal {O}}_i\)’s and \({\mathcal {A}}_i\)’s, the nonconvex optimization problem (5.1) can thus be handled by solving lm convex optimization problems. There is abundant literature on solving (5.2) with the \({\mathcal {O}}_i\)’s and \({\mathcal {A}}_i\)’s being ellipsoids or polytopes. Those special shapes typically result from approximating obstacles or links via principal component analysis and linearization techniques. However, the ability to cope with only ellipsoidal and polytopal obstacles may not be accurate enough. Herein, we further assume that the component sets \({\mathcal {O}}_i\) and \({\mathcal {A}}_i\) can be described as Minkowski sums of convex sets. This assumption is admissible as manipulators typically admit special geometric shapes, e.g., polyhedra and ellipsoids (which can easily be described as the Minkowski sum of a simplex and a ball). In order to formulate a generic (possibly nonconvex) shape as the Minkowski sum of “simple” sets, a possible idea is to first approximate the shape by a union of polyhedra (e.g., [7, 41]) or ellipsoids (e.g., by using principal component analysis), and then derive the Minkowski sum description for those polyhedra and ellipsoids.

Fig. 2

Left: Diagram representations of manipulators with various shapes of links. Right: A link \({\mathcal {A}}(P,\theta )\) with Minkowski sum structure

Figure 3 shows an environment of the 2D workspace \(\Omega =[0,75]^2\) with four stationary obstacles. Manipulators with single- and 4-DoF links are illustrated in Fig. 3a and b, respectively. The base of the 4-DoF manipulator is fixed at the point (25, 20) and all links of the manipulator admit the same shape. Accordingly, each link \({\mathcal {A}}_i\) can be described as a rigid transformation of a stencil body, denoted by \({\mathcal {A}}(P,\theta )\) (see Fig. 2b). The stencil body is hinged at the location of the joint (denoted by P) with the phase of the dominant axis (denoted by \(\theta\)). The data for reproducing the obstacles \({\mathcal {O}}_i\) and the stencil body \({\mathcal {A}}(P,\theta )\) are reported in Table 1. All obstacles and links are represented as Minkowski sums of sets \(T_i(\Omega _i)\), where the sets \(\Omega _i\) (\(i=1,2,3\)) are defined by \(\Omega _1=\{x\in {\mathbb {R}}^2\mid \Vert x\Vert _1\le 1, x_2\ge 0\}\), \(\Omega _2=\{x\in {\mathbb {R}}^2\mid \Vert x\Vert _2\le 1\}\) and \(\Omega _3=\{x\in {\mathbb {R}}^2\mid \Vert x\Vert _{\infty }\le 1\}\). In Table 1, the parameter w for \(M_9\) indicates the length of a link. We take w \(=3\) and w \(=5\) for the single- and 4-DoF manipulators, respectively. The aim of the manipulator in Fig. 4 is to move freely from the initial configuration (red location) to the goal configuration (green location) without hitting obstacles.

Fig. 3
figure 3

Examples of path planning in a 2D workspace with four obstacles. a Single-link manipulator; b 4-DoF manipulator

Fig. 4
figure 4

The motion tracks of manipulators from initial to goal configurations. a Single-link manipulator; b 4-DoF manipulator

Table 1 Data of the obstacles \({\mathcal {O}}_i\)\((i=1,2,3,4)\) and links for path planning problem

In the numerical simulation, we first generate a probabilistic roadmap by the method in [27]; collision-free paths for moving the manipulators from initial to goal configurations are then probed by repeatedly computing the distances \(d({\mathcal {O}}_i,{\mathcal {A}}_j)\) in (5.2) (more precisely, the \(d({\mathcal {O}}_i,{\mathcal {A}}(P,\theta ))\) with the data in Table 1); a sketch of such a collision check is given below. The dynamic set \({\mathcal {F}}\) is taken as \(\mathcal {F}=\{x\in {\mathbb {R}}^2\mid \Vert x\Vert _2\le 1\}\) for this real-world application, which is equivalent to choosing \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _2\). Throughout, we choose the penalty parameter \(\beta \equiv 1\) for ADMM, and \((\tau _1,\tau _2)=(1,1.1/\Vert M^\top M\Vert _2)\) for PDHG. The maximum number of iterations is set to 100. The motion tracks of the manipulators, indicated by the colour changes from initial to goal configurations, are shown in Fig. 4. The computational time to realize this path planning problem is less than 5 s for the single-link manipulator and 10 s for the 4-DoF manipulator on our personal laptop computing system.
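As a toy illustration of the collision check behind the roadmap probing (hypothetical shapes and clearances, not the data of Table 1), the sketch below declares a configuration collision-free when every obstacle/link pair keeps positive clearance in the sense of (5.2); for brevity each body is summarized by a bounding disc, so each pairwise distance takes the closed form \(\Vert c_1-c_2\Vert _2-r_1-r_2\).

```python
# Toy validity test for a roadmap configuration: every obstacle/link pair
# must keep a positive distance, cf. (5.2). Bodies are crudely summarized by
# bounding discs so that pairwise distances are closed-form.
import numpy as np

obstacle_discs = [(np.array([60.0, 60.0]), 6.0),   # (center, radius)
                  (np.array([15.0, 55.0]), 7.0)]

def link_discs(base, angles, w=5.0):
    """Bounding discs of a chain of links of length w hinged at `base`."""
    discs, joint = [], np.asarray(base, dtype=float)
    for th in np.cumsum(angles):                      # absolute joint phases
        tip = joint + w * np.array([np.cos(th), np.sin(th)])
        discs.append(((joint + tip) / 2.0, w / 2.0))  # disc covering the link
        joint = tip
    return discs

def config_is_free(base, angles, clearance=0.5):
    return all(np.linalg.norm(c1 - c2) - r1 - r2 > clearance
               for (c1, r1) in obstacle_discs
               for (c2, r2) in link_discs(base, angles))

print(config_is_free((25.0, 20.0), [0.3, 0.2, -0.1, 0.4]))
```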

5.2 High-dimensional problems

We now consider the problem (1.4)–(1.5) with synthetic data in high dimensions so as to demonstrate the numerical performance of ADMM and PDHG. Numerical comparisons with some benchmark methods from the literature are also conducted, including Gilbert’s method [19] and Qin and An’s method [40]. The datasets of (1.4)–(1.5) are synthesized as follows:

  • All \(M_i\)’s in (1.5) are taken as square matrices, i.e., \(l_i\equiv n\) for all \(i=1,2,\ldots ,m\).

  • The \(M_i\) and \(a_i\) in (1.5) are randomly generated with entries being independent and identically distributed (i.i.d.) random variables. Concretely, the entries of \(M_i\) obey the standard normal distribution, and the entries of \(a_i\) follow the uniform distribution on \([mn,mn+10]\) (a generation sketch is given after this list).

  • The component sets \(\{\Omega _i\}_{i=1}^m\) in (1.4) are randomly taken among some abstract sets, including the unit \(\ell ^p\)-norm balls with \(p\in \{1,2,\infty \}\), the nonnegative orthant, and the second-order cone with \(\alpha =1\) (see Example 2.5(iv) for the definition).
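The following sketch reflects our reading of the synthesis recipe above (it is not the authors’ script): each block \(M_i\) is an n-by-n standard normal matrix, each \(a_i\) has entries uniform on \([mn,mn+10]\), and the blocks are assembled into the affine map \(x=Mw+a\) of (1.5) with \(w=(w_1,\ldots ,w_m)\).

```python
# Synthetic data generation (our reading of the recipe above): square blocks
# M_i with N(0,1) entries and shifts a_i with entries uniform on [mn, mn+10],
# assembled so that Omega_1 (+) ... (+) Omega_m maps through x = M w + a.
import numpy as np

def synthesize(n, m, seed=0):
    rng = np.random.default_rng(seed)
    M_blocks = [rng.standard_normal((n, n)) for _ in range(m)]
    a_blocks = [rng.uniform(m * n, m * n + 10, size=n) for _ in range(m)]
    M = np.hstack(M_blocks)           # n-by-nm: x = sum_i M_i w_i + a
    a = np.sum(a_blocks, axis=0)      # aggregated shift (our convention)
    return M, a

M, a = synthesize(n=100, m=500)
print(M.shape, a.shape)               # (100, 50000) (100,)
```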

We test (1.4)–(1.5) on scenarios of \(\sigma_{\mathcal{F}}(\cdot )=\Vert \cdot \Vert _p\) (\(p\in \{1,2,\infty \}\)) with different numbers of component sets (i.e., m) and various dimensions of variables (i.e., n). Both the ADMM and PDHG methods are implemented with the details elucidated in Sect. 4. As Gilbert’s method [19] and Qin and An’s method [40] are also applicable to solving (1.4) with \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _2\), we test both methods on (1.4)–(1.5) with the Euclidean norm. For the parameters involved in the test methods, we choose \(\alpha =0.5\) for Gilbert’s method; \(\mu =0.001\) for Qin and An’s method; \(\beta =0.8\) and \(\gamma =1.6\) for ADMM; \(\tau _1=1\), \(\tau _2=1.1/\Vert M^\top M\Vert _2\) and \(\rho =1.5\) for PDHG. As for the nm-by-nm linear system (4.10) in the ADMM recursion, we first economize the arithmetic operations for inverting the coefficient matrix \(I+M^\top M\) with the aid of the Sherman-Morrison-Woodbury formula, and then cope with an n-by-n linear system by using the built-in matlab function linsolve (a sketch of this reduction is given after (5.3)). All initial points required by the test methods are taken as zero vectors. The stopping criterion is measured by the primal-dual relative gap, which is defined as

$$\begin{aligned} \hbox {relgap}=\frac{|{\texttt {Pobj}}-{\texttt {Dobj}}|}{1+|{\texttt {Pobj}}|+|{\texttt {Dobj}}|}<\hbox {Tol}, \end{aligned}$$
(5.3)

where Pobj and Dobj represent the primal and dual objective function values of (1.4), respectively. According to the discussion in Sect. 3, the values of Pobj and Dobj can be respectively computed by \(\sigma _{\mathcal {F}}(x-x^0)\) and \(\lambda _1^\top (a-x^0)-\sigma _{\Omega }(\lambda _2)\) when implementing ADMM recursion, whilst the values of Pobj and Dobj can be respectively computed by \(\sigma _{\mathcal {F}}(Mw+a-x^0)\) and \(\lambda ^\top (a-x^0)-\sigma _{\Omega }(-M^\top \lambda )\) when implementing PDHG recursion.
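For concreteness, the following sketch assembles the stopping test (5.3) with the PDHG objective pairing quoted above, where \(\sigma _{\mathcal {F}}=\Vert \cdot \Vert _2\) and \(\sigma _{\Omega }\) is supplied as an oracle; the arrays are placeholders rather than data from the experiments.

```python
# Stopping test (5.3) with the PDHG objective pairing quoted above.
import numpy as np

def relgap(pobj, dobj):
    return abs(pobj - dobj) / (1.0 + abs(pobj) + abs(dobj))

def pdhg_objectives(M, a, x0, w, lam, sigma_omega):
    pobj = np.linalg.norm(M @ w + a - x0)            # sigma_F(Mw + a - x0)
    dobj = lam @ (a - x0) - sigma_omega(-M.T @ lam)  # dual objective value
    return pobj, dobj

# Placeholder data; with Omega the unit l2-ball, sigma_Omega(y) = ||y||_2.
rng = np.random.default_rng(0)
n = 5
M, a, x0 = np.eye(n), rng.uniform(1.0, 2.0, n), np.zeros(n)
w, lam = np.zeros(n), rng.standard_normal(n)

pobj, dobj = pdhg_objectives(M, a, x0, w, lam, np.linalg.norm)
print(relgap(pobj, dobj) < 1e-6)     # stop once below the tolerance Tol
```

The linear-system reduction mentioned before (5.3) can be sketched likewise: with M of size n-by-nm, the Sherman-Morrison-Woodbury identity \((I+M^\top M)^{-1}=I-M^\top (I+MM^\top )^{-1}M\) trades the nm-by-nm solve in (4.10) for an n-by-n one. The snippet below is our Python illustration (the experiments themselves use matlab’s linsolve).

```python
# Solve (I + M^T M) z = r via an n-by-n system only, where M is n-by-nm.
import numpy as np

def solve_identity_plus_gram(M, r):
    n = M.shape[0]
    small = np.eye(n) + M @ M.T                    # n-by-n coefficient matrix
    return r - M.T @ np.linalg.solve(small, M @ r)

rng = np.random.default_rng(0)
n, m = 50, 20
M = rng.standard_normal((n, n * m))
r = rng.standard_normal(n * m)

z = solve_identity_plus_gram(M, r)
z_direct = np.linalg.solve(np.eye(n * m) + M.T @ M, r)  # expensive reference
print(np.allclose(z, z_direct))                          # True
```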

We report in Tables 2, 3 and 4 the number of iterations and the computing time in seconds (“No. Iteration(CPU)”) when the stopping criterion (5.3) reaches the pre-determined tolerances. Note that the numbers of iterations and computing times reported in Tables 2, 3 and 4 are averaged over 50 i.i.d. random datasets. Table 2 shows that: (i) ADMM and PDHG substantially outperform Gilbert’s method and Qin and An’s method; (ii) ADMM is competitive with PDHG when the dimensionality (i.e., n) and the number of component sets (i.e., m) in problem (1.4) are relatively small, whereas the superiority of PDHG becomes more evident as the dimensionality or the number of component sets increases. The numbers of iterations of ADMM and PDHG differ only slightly, whilst the computing times of the two methods are quite disparate. The reason is that the ADMM recursion involves solving generic linear systems, which is usually time-consuming in high-dimensional scenarios, whilst the PDHG recursion merely involves matrix-vector multiplications, whose arithmetic cost is relatively low. Both ADMM and PDHG are also tested for solving problem (1.4) with nonsmooth distance merits, e.g., \(\sigma_{\mathcal{F}}(x)=\Vert x\Vert _1\) and \(\Vert x\Vert _\infty\). The numerical results are reported in Tables 3 and 4, which demonstrate the compelling performance of PDHG. In Fig. 5, we plot the evolutions of the primal-dual relative gap with respect to the number of iterations and the computing time in seconds for some test datasets with various support functions \(\sigma _{\mathcal {F}}\). The figures show that both ADMM and PDHG perform well for solving problem (1.4), and that PDHG is numerically preferable for large-scale instances.

Table 2 Numerical results on solving (1.4)–(1.5) with distance merit \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _2\)
Table 3 Numerical results on solving (1.4)–(1.5) with distance merit \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _1\)
Table 4 Numerical results on solving (1.4)–(1.5) with distance merit \(\sigma _{\mathcal {F}}(\cdot )=\Vert \cdot \Vert _\infty\)
Fig. 5
figure 5

Evolutions of primal-dual gap values w.r.t. iterations and computing time for solving (1.4). Top row: the case of \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _2\) with \((n,m)=(100,500)\). Center row: the case of \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _1\) with \((n,m)=(100,500)\). Bottom row: the case of \(\sigma _{\mathcal {F}}(x)=\Vert x\Vert _\infty\) with \((n,m)=(500,500)\)

6 Concluding remarks

In this paper, we explore the distance between convex sets with Minkowski sum structure (or equivalently, the projection onto the Minkowski sum of convex sets), which is a fundamental notion in computational geometry. By reformulating the problem as a separable convex optimization problem with linear constraints, we cope with the resulting model by some recently advanced optimization methods. Thanks to these first-order optimization methods, the distance can be efficiently computed in primal, dual, or primal-dual form. Moreover, those methods are also favourable for solving large-scale computational geometry problems with the aid of parallel computing techniques.