1 Introduction

The method of alternating projections is a classical tool to solve the following feasibility problem: Given closed sets \(A,B\) in \({\mathbb {R}}^n\), find a point \(x^*\in A \cap B\). Alternating projections can be traced back to the work of Schwarz [26] in 1869 and were popularized in lecture notes of von Neumann [23] since the 1930s. The method generates sequences \(a_k\in P_A(b_{k-1})\), \(b_k\in P_B(a_k)\), where \(P_A\), \(P_B\) are the set-valued orthogonal projection operators on \(A\) and \(B\). If the alternating sequence \(a_k,b_k\) is bounded and satisfies \(a_k-b_k\rightarrow 0\), then each of its accumulation points is a solution of the feasibility problem. The fundamental question is when such a sequence converges to a single limit point \(x^*\in A \cap B\).

For convex sets, alternating projections are globally convergent as soon as \(A \cap B \not =\emptyset \), and the survey [2] gives an excellent state of art of the convex theory. In one of the earliest contributions to the non-convex case, Combettes and Trussell [11] proved in 1990 that the set of accumulation points of a bounded sequence of alternating projections with \(a_k-b_k\rightarrow 0\) is either a singleton or a nontrivial compact continuum. In 2013, it was shown in [6] by way of an example that the continuum case may indeed occur. This shows that without convexity, a sequence of alternating projections \(a_k,b_k\) may fail to converge even when it is bounded and satisfies \(a_k-b_k\rightarrow 0\).

In 2008, Lewis and Malick [20] proved that a sequence \(a_k,b_k\) of alternating projections converges locally linearly if \(A,B\) are \(C^2\)-manifolds intersecting transversally. Expanding on this in 2009, Lewis et al. [21] proved local linear convergence for general \(A,B\) intersecting non-tangentially in the sense of linear regularity, where one of the sets is superregular. In 2013, Bauschke et al. [4, 5] investigated the case of non-tangential intersection further and proved linear convergence under weaker regularity and transversality hypotheses.

Here we prove local convergence under less restrictive conditions, where \(A,B\) may also intersect tangentially. We propose a new geometric concept, called separable intersection, which gives local convergence of alternating projections when combined with Hölder regularity, a mild hypothesis less restrictive than prox-regularity.

Separable intersection has wide scope for applications, as it not only includes non-tangential intersection, but goes beyond and allows also a large variety of cases where \(A, B\) intersect tangentially. In particular, we prove that closed subanalytic sets \(A,B\) always intersect separably. This leads to the central result that alternating projections between subanalytic sets converge locally with rate \({\mathcal {O}}(k^{-\rho })\) for some \(\rho \in (0,\infty )\) if one of the sets is Hölder regular with respect to the other. As these hypotheses are satisfied in practical situations, we obtain a theoretical explanation for the fact, observed in practice, that even without convexity, alternating projections converge well in the neighborhood of \(A\cap B\). As an application, we obtain a local convergence proof for the classical Gerchberg–Saxton error reduction algorithm in phase retrieval.

The structure of the paper is as follows: Sect. 3 introduces the concept of separable intersection of two closed sets. Then \(0\)-separability is related to existing transversality concepts. In Sect. 4, we discuss Hölder regularity and compare it to older regularity concepts like prox-regularity, Clarke regularity, and superregularity. The central Sect. 5 gives the convergence proof with rate for sets intersecting separably. In Sect. 6, we show that subanalytic sets intersect separably and then deduce the convergence result for subanalytic sets. Section 6 gives also some applications indicating the versatility of our convergence test. In particular, we prove local convergence of an averaged projection method related to in [1, Corollary 12], where the authors use the Kurdyka-Łojasiewicz inequality. The final Sect. 7 gives limiting examples.

After the initial version of this article appeared, a concept related to our notion of \(0\)-separability, called intrinsic transversality, was announced in [12]. We compare this to our own transversality and regularity concepts in Sects. 3 and 4.

2 Preparation

Given a non-empty closed subset \(A\) of \({\mathbb {R}}^n\), the projection onto \(A\) is the set-valued mapping \(P_A\) associating with \(x\in {\mathbb {R}}^n\) the non-empty set

$$\begin{aligned} P_A(x)=\left\{ a\in A: \Vert x-a\Vert =d_A(x) \right\} , \end{aligned}$$

where \(\Vert \cdot \Vert \) is the Euclidean norm, induced by the scalar product \(\langle \cdot ,\cdot \rangle \), and where \(d_A(x)=\min \{\Vert x-a\Vert : a\in A\}\). The closed Euclidean ball with center \(x\) and radius \(r\) is denoted \(\mathcal {B}(x,r)\). We write \(a\in P_A(b)\) since the projection is potentially set-valued, while \(a=P_A(b)\) means it is unique.

A sequence of alternating projections between non-empty closed sets \(A,B\) satisfies \(b_k\in P_B(a_k), a_{k+1}\in P_A(b_k)\), \(k\in {\mathbb {N}}\). We occasionally switch to the following index-free notation, which is standard in optimization:

$$\begin{aligned} b\in P_B(a), a^+\in P_A(b), b^+\in P_B(a^+), \mathrm{etc.} \end{aligned}$$

The sequence of alternating projections is then \(\dots , a,b,a^+,b^+,a^{++},b^{++},\dots \). We refer to \(a \rightarrow b \rightarrow a^+\), respectively, \(b\rightarrow a^+\rightarrow b^+\), as the building blocks of the sequence, where it is always understood that \(b\in P_B(a)\), \(a^+\in P_A(b)\), \(b^+\in P_B(a^+)\), etc.

Notions from non-smooth analysis are covered by [22, 25]. The proximal normal cone to \(A\) at \(a\in A\) is the set \(N_A^p(a)=\{\lambda u: \lambda \ge 0, a \in P_A(a+u)\}\). The normal cone to \(A\) at \(a\in A\) is the set \(N_A(a)\) of vectors \(v\) for which there exist \(a_k\in A\) with \(a_k \rightarrow a\) and \(v_k\in N_A^p(a_k)\) such that \(v_k\rightarrow v\). The Fréchet normal cone \(\widehat{N}_A(a)\) to \(A\) at \(a\in A\) is the set of \(v\) for which \(\limsup _{A \ni a'\rightarrow a} \frac{\langle v,a'-a\rangle }{\Vert a'-a\Vert }\le 0\); cf. [22, (1.2)]. We have the inclusions \(N_A^p(a)\subset \widehat{N}_A(a)\subset N_A(a)\); cf. [22, Chapter 2.D and (1.6)] or [4, Lemma 2.4]. For any function \(f:\mathbb {R}^n\rightarrow \mathbb {R}\cup \{\infty \}\), the epigraph of \(f\) is the set \(\mathrm{epi}f=\{(x,\xi )\in \mathbb {R}^n\times \mathbb {R} : \xi \ge f(x)\}\). The proximal subdifferential \(\partial _pf(x)\) of a lower semi-continuous function \(f\) at \(x\in \mathrm{dom}f\) is the set of vectors \(v\in {\mathbb {R}}^n\) such that \((v,-1)\in N^p_{\mathrm{epi}f}(x,f(x))\); [22, (2.81)]. The subdifferential \(\partial f(x)\) of \(f\) at \(x\in \mathrm{dom} f\) is the set of \(v\) satisfying \((v,-1)\in N_{\mathrm{epi} f}(x,f(x))\). The Fréchet subdifferential \(\widehat{\partial }f(x)\) at \(x\in \mathrm{dom} f\) is the set of \(v\in {\mathbb {R}}^n\) such that \((v,-1)\in \widehat{N}_{\mathrm{epi}f}(x,f(x))\), cf. [22, (1.51)].

3 Tangential and Non-tangential Intersection

In this section, we introduce the fundamental concept of separable intersection of sets \(A,B\), which plays the central role in our convergence theory.

Definition 1

(Separable intersection). We say that \(B\) intersects \(A\) separably at \(x^*\in A \cap B\) with exponent \(\omega \in [0,2)\) and constant \(\gamma >0\) if there exists a neighborhood \(U\) of \(x^*\) such that for every building block \(b\rightarrow a^+\rightarrow b^+\) in \(U\), the condition

$$\begin{aligned} \langle b-a^+,b^+-a^+\rangle \le \left( 1-\gamma \Vert b^+-a^+\Vert ^\omega \right) \Vert b-a^+\Vert \Vert b^+-a^+\Vert \end{aligned}$$
(1)

is satisfied.

We say that \(B\) intersects \(A\) separably at \(x^*\) if (1) holds for some \(\omega \in [0,2)\), \(\gamma > 0\). If it is also true that \(A\) intersects \(B\) separably, that is, if the analogue of (1) holds for building blocks \(a \rightarrow b \rightarrow a^+\), then we obtain a symmetric condition, and in that case, we say that \(A,B\) intersect separably at \(x^*\).

Remark 1

Condition (1) discloses itself if we introduce the angle \(\alpha = \angle (b-a^+,b^+-a^+)\) and rewrite (1) in the more suggestive form

$$\begin{aligned} \displaystyle \frac{1-\cos \alpha }{\Vert a^+-b^+\Vert ^\omega } \ge \gamma , \end{aligned}$$
(1′)

calling this the angle condition for the building block \(b\rightarrow a^+\rightarrow b^+\). For \(\omega \in (0,2)\) the interpretation of (1), or \((1')\), is that if the angle \(\alpha \) between \(b-a^+\) and \(b^+-a^+\) for two consecutive projection steps \(b\rightarrow a^+ \rightarrow b^+\) shrinks down to \(0\) as the alternating sequence approaches \(x^*\), then \(\alpha \) should not shrink too fast. Namely, through \((1')\), the angle is linked to the shrinking distance between the sets. For \(\omega =0\), the meaning of \((1')\) is that the angle \(\alpha \) stays away from \(0\).

Remark 2

Suppose \(B\) intersects \(A\) separably with exponent \(\omega \in [0,2)\) and constant \(\gamma >0\) at \(x^*\). Let \(\omega '\in (\omega ,2)\) and \(\gamma ' \in ( 0,\gamma ]\). Then \(B\) intersects \(A\) also \(\omega '\)-separably with constant \(\gamma '\). In consequence, \(0\)-separability is the severest condition, while \(\omega \)-separability gets less restrictive as \(\omega \) increases.

As we shall see, for \(\omega \ge 2\), property (1) can still be formulated, but turns out too weak to be meaningful. For an illustration, see Example 7.8.

Remark 3

Informally, when the angle \(\alpha = \angle (b-a^+,b^+-a^+)\) between two consecutive projection steps shrinks to zero, \(A,B\) must in some sense intersect tangentially at \(x^*\). In contrast, when \(\alpha \) stays away from 0, the case of \(0\)-separability, one could say that \(A,B\) intersect transversally, or at an angle. In that case, alternating projections are expected to behave well and converge linearly. Tangential intersection is the more embarrassing case, where convergence could be slowed down or even fail. Our concept of \(\omega \)-separability gives new insight into the case of tangential intersection.

There has been considerable effort in the literature to avoid tangential intersection by making transversality assumptions. We mention transversal intersection in [20], the generalized non-separation property in [22], linearly regular intersection in [21], or the notion of constraint qualification in [4]. In the following, we relate these notions to \(0\)-separability.

Bauschke et al. [4, Definition 2.1] introduce an extension of the Mordukhovich normal cone called the \(B\)-restricted normal cone \(N_A^B(x^*)\) to \(A\) at \(x^*\in A\). They define \(u\in N_A^B(x^*)\) if there exist \(a_n\in A\), \(a_n\rightarrow x^*\), and \(u_n \rightarrow u\) such that

$$\begin{aligned} u_n = \lambda _n \left( b_n-a_n \right) \end{aligned}$$

for some \(\lambda _n > 0\) and \(b_n\in B\) with \(a_n\in P_A(b_n)\). They then establish basic inclusions between the restricted normal cone and various classical cones [4, Lemma 2.4]. In particular for any \(a\in A\) and \(B\), one has \(N_A^B(a)\subset N_A(a)\).

Now let \(\widetilde{A}\) and \(\widetilde{B}\) be non-empty subsets of \(\mathbb {R}^n\). In [4, Definition 6.6] the authors say that \((A,\widetilde{A},B,\widetilde{B})\) satisfies the CQ-condition at \(x^*\in A\cap B\) if

$$\begin{aligned} N_A^{\widetilde{B}}(x^*) \cap \left( -N_B^{\widetilde{A}}(x^*) \right) \subset \{0\}. \end{aligned}$$
(2)

This condition is to be understood as a transversality hypothesis, because we have the following

Proposition 1

(CQ implies 0-separability). Let \(P_A(\partial B{\setminus } A) \subset \widetilde{A},~P_B(\partial A{\setminus } B)\subset \widetilde{B}\), and suppose \((A,\widetilde{A}, B, \widetilde{B})\) satisfies the CQ-condition at \(x^*\in A \cap B\). Then \(A,B\) intersect \(0\)-separably at \(x^*\).

Proof

According to [4, Definition 2.1] the \(\widetilde{B}\)-restricted proximal normal cone \(\widehat{N}_A^{\tilde{B}}(a)\) of \(A\) at \(a\in A\) is the set of vectors \(u\) of the form \(u = \lambda (\tilde{b}-a)\) for some \(\lambda > 0\) and some \(\widetilde{b}\in \widetilde{B}\) satisfying \(a\in P_A(\widetilde{b})\). The cone \(\widehat{N}_B^{\tilde{A}}(b)\) at \(b\in B\) is defined analogously. Then by [4, Definition 6.1], specialized to the case of two sets, the CQ-number at \(x^*\) associated with \((A,\widetilde{A},B,\widetilde{B})\) is

$$\begin{aligned}&\theta _\delta (A,\widetilde{A},B,\widetilde{B})\\&\quad = \sup \left\{ \langle u,v\rangle : u\in \widehat{N}_A^{\widetilde{B}}(a), -v\in \widehat{N}_B^{\widetilde{A}}(b), \Vert u\Vert \le 1, \Vert v\Vert \le 1, a,b \in \mathcal {B}(x^*,\delta )\right\} \end{aligned}$$

and the limiting CQ-number is

$$\begin{aligned} \theta (A,\widetilde{A},B,\widetilde{B})=\lim _{\delta \rightarrow 0^+} \theta _\delta (A,\widetilde{A},B,\widetilde{B}). \end{aligned}$$

The authors show in [4, Theorem 6.8] that for two sets the CQ-condition \(N_A^{\widetilde{B}}(x^*)\cap (-N_B^{\widetilde{A}}(x^*)) \subset \{0\}\), implies \(\theta (A,\widetilde{A},B,\widetilde{B}) < 1\).

Using this, pick \(\delta > 0\) such that \(\theta _\delta (A,\widetilde{A},B,\widetilde{B}) =: 1-\gamma < 1\). Consider a building block \(b\rightarrow a^+\rightarrow b^+\) as in Definition 1 with \(b,a^+,b^+\in U:=\mathcal {B}(x^*,\delta )\). Then we have \(b\in \widetilde{B}\) and \(a^+\in \widetilde{A}\). Hence, \(b-a^+\in \widehat{N}_A^{\widetilde{B}}(a^+)\) and \(a^+-b^+\in \widehat{N}_B^{\widetilde{A}}(b^+)\), and also

$$\begin{aligned} u=(b-a^+)/\Vert b-a^+\Vert \in \widehat{N}_A^{\widetilde{B}}(a^+) \text{ and } v=(b^+-a^+)/\Vert b^+-a^+\Vert \in -\widehat{N}_B^{\widetilde{A}}(b^+). \end{aligned}$$

Therefore, if \(\alpha = \angle (b-a^+,b^+-a^+)\), then \(\cos \alpha =\langle u,v\rangle \le \theta _\delta (A,\widetilde{A},B,\widetilde{B})=1- \gamma \) by the definition of \(\theta _\delta \), because \(b,a^+,b^+\in \mathcal {B}(x^*,\delta )\) and \(\Vert u\Vert =\Vert v\Vert =1\). That shows \(1-\cos \alpha \ge \gamma > 0\) and proves that \(B\) intersects \(A\) \(0\)-separably at \(x^*\) with constant \(\gamma \). The estimate for building blocks \(a\rightarrow b\rightarrow a^+\) is analogous. \(\square \)

Example 7.3 shows that the converse of proposition 1 is not true. In fact, \(0\)-separability seems more versatile in applications, while still guaranteeing linear convergence. We conclude by noting that linearly regular intersection in the sense of [21], and transversality in the sense of [20], imply \(0\)-separability.

Following [21, section 2,(2.2)], \(A\) and \(B\) have linearly regular intersection at \(x^*\in A\cap B\) if

$$\begin{aligned} N_A(x^*)\cap \left( -N_B(x^*)\right) =\{0\}. \end{aligned}$$
(3)

This property is called strong regularity in [18] and the basic qualification condition for sets in [22, Definition 3.2 (i)]. As a consequence of \(N_A^{\widetilde{B}}(a)\subset N_A(a)\), \(N_B^{\widetilde{A}}(b)\subset N_B(b)\), linearly regular intersection implies that \((A,\widetilde{A}, B, \widetilde{B}\)) satisfies the CQ-condition at \(x^*\) for any non-empty \(\widetilde{A}\) and \(\widetilde{B}\) in \(\mathbb {R}^n\); cf. [5]. By Proposition 1, we therefore have:

Corollary 1

(Linear regularity implies 0-separability). Suppose \(A,B\) intersect linearly regularly at \(x^*\in A\cap B\). Then they intersect \(0\)-separably at \(x^*\). \(\square \)

As we mentioned before, in the context of alternating projections, linear regularity and the CQ-condition are to be understood as transversality type hypotheses, indicating that the sets \(A,B\) intersect at an angle at \(x^*\), as opposed to intersecting tangentially. This is confirmed by relating \(0\)-separability to the classical notion of transversality. Following [20, def.3], two \(C^2\)-manifolds \(A,B\) in \({\mathbb {R}}^n\) intersect transversally at \(x^*\in A \cap B\) if

$$\begin{aligned} T_A(x^*) + T_B(x^*) = {\mathbb {R}}^n, \end{aligned}$$
(4)

where \(T_M(x^*)\) is the tangent space to \(M\) at \(x^*\in M\). We then have the following

Corollary 2

Let \(A,B\) be \(C^2\)-manifolds which intersect transversally at \(x^*\). Then \(A\) and \(B\) intersect \(0\)-separably at \(x^*\).

Proof

Indeed, as shown in [20, Theorem 18], classical transversality (4) implies linear regular intersection (3), and hence, we can apply Corollary 1. \(\square \)

After the initial version of this work was published, a related concept termed ‘intrinsic transversality” was proposed in [12]. Following [12, Def. 2.2], \(A,B\) are intrinsically transversal at \(x^*\in A \cap B\) with constant \(\kappa \in (0,1]\) if there exists a neighborhood \(U\) of \(x^*\) such that for every \(a^+\in A \cap U {\setminus } B\) and every \(b^+\in B \cap U {\setminus } A\) the estimate

$$\begin{aligned} \max \left\{ d\left( \frac{a^+-b^+}{\Vert a^+-b^+\Vert },N_B^p(b^+) \right) , d\left( \frac{a^+-b^+}{\Vert a^+-b^+\Vert },-N_A^p(a^+) \right) \right\} \ge \kappa >0 \end{aligned}$$
(5)

is satisfied. This relates to \(0\)-separability as follows:

Proposition 2

(Intrinsic transversality implies 0-separability). Suppose \(A,B\) are intrinsically transversal at \(x^*\in A \cap B\) with transversality constant \(\kappa \in (0,1]\). Then they intersect \(0\)-separably at \(x^*\) with constant \(\gamma =\kappa ^2/2\).

Proof

By assumption, there exists a neighborhood \(U\) of \(x^*\) on which (5) is satisfied. Now let \(b^+\in P_B(a^+)\cap U\), \(a^+\not = b^+\), \(a^+\in A \cap U\). Then since \(a^+-b^+\in N^p_B(b^+)\), we have \(d\left( \frac{a^+-b^+}{\Vert a^+-b^+\Vert }, N^p_B(b^+) \right) =0\), so by (5) we must have \(d\left( \frac{a^+-b^+}{\Vert a^+-b^+\Vert }, -N^p_A (a^+)\right) \ge \kappa \). Since \(b-a^+\in N^p_A(a^+)\), we obtain

$$\begin{aligned} 2-2 \frac{\langle b^+-a^+,b-a^+\rangle }{\Vert a^+-b^+\Vert \Vert b-a^+\Vert } \ge \kappa ^2, \end{aligned}$$

and this readily gives

$$\begin{aligned} \langle b^+-a^+,b-a^+\rangle \le \left( 1-\frac{\kappa ^2}{2} \right) \Vert a^+-b^+\Vert \Vert b-a^+\Vert , \end{aligned}$$

which is (1) for the case \(\omega = 0\) and \(\gamma =\kappa ^2/2\), as claimed. \(\square \)

We will resume the discussion of separable intersection of sets in Sect. 6.

4 Hölder Regularity

In this section, we introduce the concept of Hölder regularity. We then relate it to other regularity notions like Clarke regularity, prox-regularity, superregularity in the sense of [21], and its extension in [4].

Definition 2

(Hölder regularity). Let \(\sigma \in [0,1)\). The set \(B\) is \(\sigma \)-Hölder regular with respect to \(A\) at \(b^*\in A \cap B\) if there exists a neighborhood \(U\) of \(b^*\) and a constant \(c>0\) such that for every \(a^+\in A\cap U\), and every \(b^+\in P_B(a^+) \cap U\) one has

$$\begin{aligned} \mathcal {B}(a^+,(1+c)r) \cap \{b\in P_A(a^+)^{-1}: \langle a^+-b^+,b-b^+\rangle > \sqrt{c} r^{\sigma +1}\Vert b-b^+\Vert \} \cap B = \emptyset , \end{aligned}$$
(6)

where \(r = \Vert a^+-b^+\Vert \). We say that \(B\) is Hölder regular with respect to \(A\) if it is \(\sigma \)-Hölder regular with respect to \(A\) for every \(\sigma \in [0,1)\).

Remark 4

Using the angle \(\beta =\angle (a^+-b^+,b-b^+)\) and \(r=\Vert a^+-b^+\Vert \), condition (6) can be rewritten in the following more suggestive form

$$\begin{aligned} \displaystyle \mathcal {B}(a^+,(1+c)r) \cap \{b\in P_A(a^+)^{-1} : \cos \beta > \sqrt{c} r^\sigma \} \cap B = \emptyset . \end{aligned}$$
(6′)

Geometrically, this means that the right circular cone with axis \(a^+-b^+\) and aperture \(\beta =\arccos \sqrt{c}r^\sigma \) truncated by the ball \(\mathcal {B}(a^+,(1+c)r)\) and the \(B\)-restricted proximal normal cone \(\widehat{N}_A^B(a^+)\) contains no points of \(B\) other than \(b^+\).

In the remainder of this section, we relate Hölder regularity to older geometric and analytic regularity concepts. We first consider notions related to \(0\)-Hölder regularity. The case of \(\sigma \)-Hölder regularity with \(\sigma > 0\) will be considered later.

Definition 3

(Superregularity [21, Proposition 4.4]). A closed set \(B\) in \({\mathbb {R}}^n\) is called superregular at \(b^*\in B\) if for every \(\epsilon > 0\) there exists \(\delta > 0\) such that for all \(b,b^+\in \mathcal {B}(b^*,\delta )\cap B\) and \(u\in N_B^p(b^+)\), the estimate \(\langle u,b-b^+\rangle \le \epsilon \Vert u\Vert \Vert b-b^+\Vert \) is satisfied.

Definition 4

(\((A,\epsilon ,\delta )\) -regularity [4, Definition 8.1, (i)]). Let \(A,B\) be closed sets in \({\mathbb {R}}^n\). \(B\) is called \((A,\epsilon ,\delta )\)-regular at \(b^*\in A \cap B\) if for all \(b,b^+\in \mathcal {B}(b^*,\delta )\cap B\) and every \(u\in \widehat{N}_B^A(b^+)\), the estimate \(\langle u,b-b^+\rangle \le \epsilon \Vert u\Vert \Vert b-b^+\Vert \) is satisfied.

The two concepts are linked as follows: \(B\) is superregular at \(b^*\in B\) if and only if for every \(\epsilon > 0\) there exists \(\delta >0\) such that \(B\) is \(({\mathbb {R}}^n,\epsilon ,\delta )\)-regular at \(b^*\) in the sense of Definition 4, see [4, Definition 8.1].

Proposition 3

(0-Hölder regularity from superregularity). Suppose \(B\) is \((A,\epsilon ,\delta )\)-regular at \(b^*\in A \cap B\). Then \(B\) is \(0\)-Hölder regular at \(b^*\) with respect to \(A\) with constant \(c=\epsilon ^2\). In particular, if \(B\) is superregular at \(b^*\), then \(B\) is \(0\)-Hölder regular with respect to \(A\) with constant \(c\) that may be chosen arbitrarily small.

Proof

Since superregularity of \(B\) at \(b^*\) implies that for every \(\epsilon >0\) there exists \(\delta >0\) such that \(B\) is \((A,\epsilon ,\delta )\)-regular at \(b^*\), [4], it remains to prove the first part of the statement.

In order to check \(0\)-Hölder regularity, we have to provide a neighborhood \(U\) of \(b^*\) and \(c>0\) such that (6) is satisfied with \(\sigma =0\). We choose \(U = \mathcal {B}(b^*,\frac{\delta }{4(1+\epsilon ^2)})\) and put \(c=\epsilon ^2\). To check (6) pick \(a^+,b^+\in U\) such that \(b^+\in P_B(a^+)\), \(a^+\in A\). That gives \(r=\Vert b^+-a^+\Vert \le \frac{\delta }{2(1+c)}\). By the definition of the restricted normal cone, we have \(u:=a^+-b^+\in \widehat{N}_B^A(b^+)\). Now let \(b\in B\), \(b\not = b^+\). We have to show that \(b\) is not an element of the set in (6) for \(\sigma =0\). Suppose \(b\in \mathcal {B}(a^+,(1+c )r)\). Then we have to show \(\langle a^+-b^+,b-b^+\rangle \le \sqrt{c}r\Vert b-b^+\Vert \). Observe that \(\Vert b-b^*\Vert \le \Vert b-a^+\Vert +\Vert a^+-b^*\Vert \le (1+c )r + \frac{\delta }{{4}(1+c)}\le (1+c ) \frac{\delta }{2(1+c)}+ \frac{\delta }{4(1+c)} < \delta \). Hence, \((A,\epsilon ,\delta )\)-regularity implies \(\langle u,b-b^+\rangle \le \epsilon \Vert u\Vert \Vert b-b^+\Vert = \sqrt{c}\Vert u\Vert \Vert b-b^+\Vert \), and the claim follows. \(\square \)

Remark 5

Example 7.1 shows that the converse of proposition 3 is not true. The difference between superregularity and its extension \((A,\epsilon ,\delta )\)-regularity on the one hand, and \(0\)-Hölder regularity on the other, is the following: In (6), we exclude points in the intersection of a restricted right circular cone with vertex \(b^+\), axis \(a^+-b^+\), and aperture \(\beta = \arccos \sqrt{c}r^\sigma \) and the shrinking ball \(\mathcal {B}(a^+,(1+c)r)\). In contrast, \((A,\epsilon ,\delta )\)-regularity forbids many more points, namely all points in that same cone, but within the fixed ball \(\mathcal {B}(b^*,\delta )\). In consequence, this type of regularity is not suited to deal with singularities pointing inwards, like the prototype in Example 7.1.

Remark 6

If \(B\) is \(\sigma \)-Hölder regular at \(b^*\) with respect to \(A\) with constant \(c>0\) on the neighborhood \(U\) of \(b^*\), and if \(\sigma ' < \sigma \), then for every \(c'\in (0,c)\), there exists a neighborhood \(V \subset U\) of \(b^*\) such that \(B\) is \(\sigma '\)-Hölder regular at \(b^*\) with constant \(c'\). Indeed, if \(b\in \mathcal {B}(a^+,(1+c')r)\) in (6), then also \(b\in \mathcal {B}(a^+,(1+c)r)\), hence by assumption \(\cos \beta \le \sqrt{c}r^\sigma = \sqrt{c} r^{\sigma -\sigma '}r^{\sigma '} \le \sqrt{c'}r^{\sigma '}\) if \(V\) is chosen so that \(\sqrt{c}r^{\sigma -\sigma '} < \sqrt{c'}\).

We next justify our notion of Hölder regularity by proving that prox-regular sets are \(\sigma \)-Hölder regular for every \(\sigma \in (0,1]\). Recall that a set \(B\) in \({\mathbb {R}}^n\) is prox-regular at \(b^*\in B\) if there exists a neighborhood \(U\) of \(b^*\) such that \(P_B(y)\) is single-valued for every \(y\in U\), cf. [25, Chapter 13].

Consider \(b\in B\) and let \(d\in N_B^p(b)\) be a unit proximal normal to \(B\) at \(b\). Define the reach of \(B\) at \(b\) along \(d\) as

$$\begin{aligned} R(b,d) = \sup \{R \ge 0: b = P_B(b+td) \text{ for } \text{ every } 0 \le t \le R\}. \end{aligned}$$
(7)

Then \(R(b,d) \in (0,\infty ]\), and the case \(R(b,d)=\infty \) occurs, e.g., if \(B\) is convex and \(b\) a boundary point of \(B\). We can say that \(\mathcal {B}(b+R(b,d)d,R(b,d))\) is the largest ball with its center on \(b+{\mathbb {R}}_+d\) which touches \(B\) in \(b\) from outside, i.e., has no points from \(B\) in its interior.

It was shown in [24], Thm. 1.3\((h)\)] that \(B\) is prox-regular at \(b^*\in B\) if and only if there exists \(r>0\) and a neighborhood \(U\) of \(b^*\) such that \(R(b,d)\ge r\) for every \(b\in U \cap B\) and every \(d\in N_B^p(b)\) with \(\Vert d\Vert =1\). An immediate consequence is that sets of positive reach in the sense of Federer [15] are prox-regular; see, e.g., [24, Theorem 1.3]. Therefore, prox-regularity is a local version of positive, or non-vanishing, reach.

We now relax the concept of non-vanishing reach to sets where the reach may vanish at some boundary points, but slowly so.

Definition 5

Let \(\sigma \in (0,1]\). The set \(B\) has \(\sigma \)-slowly vanishing reach with respect to the set \(A\) at \(b^*\in A\cap B\) if there exists \(0 \le \tau < 1\) such that

$$\begin{aligned} \limsup _{A \ni a\rightarrow b^*, b\in P_B(a)} \frac{\Vert a-b\Vert ^\sigma }{R(b,d)} \le \tau , \end{aligned}$$
(8)

where \(d=(a-b)/\Vert a-b\Vert \). We say that the reach vanishes with exponent \(\sigma \) and rate \(\tau \).

Proposition 4

If \(B\) is prox-regular at \(b^*\in A\cap B\), then it has slowly vanishing reach at \(b^*\) with respect to \(A\) with rate \(\tau =0\) and arbitrary exponent \(\sigma \in (0,1]\).

Proof

Let \(\tau ' > 0\). By [24], Thm. 1.3\((h)\)] prox-regularity at \(b^*\) implies that there exist \(\epsilon > 0\) and \(r > 0\) such that \(R(b,d) \ge r\) for every \(b\in B\) with \(\Vert b-b^*\Vert \le \epsilon \) and every \(d\in N_B^p(b)\) with \(\Vert d\Vert =1\). By shrinking \(\epsilon \) if necessary, we may assume \(\epsilon ^\sigma /(2^\sigma r) < \tau '\). Now let \(a\in A \cap \mathcal {B}(b^*,\frac{\epsilon }{2})\) be arbitrary, choose \(b\in P_B(a)\), and let \(d=(a-b)/\Vert a-b\Vert \). Then as \(b^*\in A \cap B\), we have \(\Vert b-b^*\Vert \le \Vert b-a\Vert +\Vert a-b^*\Vert \le 2\Vert a-b^*\Vert \le \epsilon \). Since \(d\in N_B^p(b)\), the above gives us \(R(b,d)\ge r\). Therefore, since \(\Vert a-b\Vert = d_B(a) \le \Vert a-b^*\Vert \), the quotient in (8) satisfies

$$\begin{aligned} \frac{\Vert a-b\Vert ^\sigma }{R(b,d)} \le \frac{\epsilon ^\sigma }{2^\sigma r} < \tau ', \end{aligned}$$

and since \(\tau ' > 0\) was arbitrary, this shows that (8) is satisfied with \(\tau =0\). \(\square \)

Proposition 5

(Hölder regularity from slowly vanishing reach). Let \(\sigma \in {(0,1)}\). Suppose \(B\) has \(\sigma \)-slowly vanishing reach with rate \(\tau \in [0,1)\) with respect to \(A\) at \(b^*\in A\cap B\). Then \(B\) is \((1-\sigma )\)-Hölder regular with respect to \(A\) with any constant \(c>0\) satisfying

$$\begin{aligned} \frac{\tau }{2} \sqrt{2+c} < 1. \end{aligned}$$
(9)

In particular, \(c\) may be chosen arbitrarily small.

Proof

1) We have to show that there exists a neighborhood \(U\) of \(b^*\) such that (6) is satisfied with \(c\) as in (9) and with exponent \(1-\sigma \).

By condition (9), we can choose \(\tau ' > \tau \) and \(\epsilon > 0\) such that

$$\begin{aligned} \frac{\tau '}{2} \left( \epsilon + \sqrt{\epsilon ^2 + 2 + c}\right) < 1. \end{aligned}$$

By condition (8), and since \(\tau < \tau '\), there exists a neighborhood \(U\) of \(b^*\) such that whenever \(a^+,b^+\in U\), \(a^+\in A\), \(b^+\in P_B(a^+)\) and \(d=(a^+-b^+)/\Vert a^+-b^+\Vert \), then \(r^\sigma /R(b^+,d) < \tau '\), where \(r:= \Vert a^+-b^+\Vert \). On shrinking \(U\) further if necessary, we may arrange that \(a^+,b^+\in U\) implies \(r^{1-\sigma }=\Vert a^+-b^+\Vert ^{1-\sigma } < \epsilon \). We will show that \(U\) is the neighborhood we need in condition (6).

2) To prove this, pick \(a^+,b^+\in U\), \(a^+\in A\), \(b^+\in P_B(a^+)\), put \(r=\Vert a^+-b^+\Vert \), and let \(b\in B\), \(b\not = b^+\). We have to show that \(b\) is not an element of the set (6 \(^\prime \)). To check this, let \(\beta \) be the angle \(\beta = \angle (a^+-b^+,b-b^+)\). Since there is nothing to prove for \(b\not \in \mathcal {B}(a^+,(1+c)r)\), we assume \(b\in \mathcal {B}(a^+,(1+c)r)\). Now we have to show that \(\cos \beta \le \sqrt{c}r^{1-\sigma }\). As this is clear for \(\cos \beta \le 0\), we may assume \(\cos \beta > 0\).

Let us define

$$\begin{aligned} R := \frac{r}{2} \left( 1 + \sqrt{1+\frac{2c+c^2}{\cos ^2\beta }} \right) , \end{aligned}$$
(10)

where \(r,\beta \) are as before. We claim that the ball \(\mathcal {B}(b^++Rd,R)\) contains \(b\), where as above \(d=(a^+-b^+)/\Vert a^+-b^+\Vert \). To prove this, note that by the cosine theorem, applied in the triangle \(a^+,b^+,b\), we have

$$\begin{aligned} \Vert a^+-b\Vert ^2 = r^2 + \Vert b-b^+\Vert ^2 - 2 r \Vert b-b^+\Vert \cos \beta . \end{aligned}$$

Since \(\Vert a^+-b\Vert \le (1+c)r\), we obtain

$$\begin{aligned} r^2 + \Vert b-b^+\Vert ^2 - 2r\Vert b-b^+\Vert \cos \beta \le (1+c)^2 r^2, \end{aligned}$$

which on completing squares turns out to be the same as

$$\begin{aligned} \Vert b-b^+\Vert \le r \left( \cos \beta + \sqrt{2c+c^2+\cos ^2\beta } \right) =2R\cos \beta . \end{aligned}$$

Here, the last equality uses the definition (10) of \(R\). We therefore obtain

$$\begin{aligned} \Vert b-b^+\Vert ^2 \le 2R \cos \beta \Vert b-b^+\Vert , \end{aligned}$$

and using the cosine theorem again, now in the triangle \(b^++Rd,b^+,b\), we deduce

$$\begin{aligned} \Vert b^++Rd - b\Vert ^2 = R^2 + \Vert b-b^+\Vert ^2 -2R\Vert b-b^+\Vert \cos \beta \le R^2. \end{aligned}$$

This gives \(b\in \mathcal {B}(b^++Rd,R)\) as claimed.

3) By the definition (7) of \(R(b^+,d)\), any radius \(R' < R(b^+,d)\) must give rise to a ball with \(\mathcal {B}(b^++R'd,R') \cap B = \{b^+\}\). But as we have shown in part 2), the ball \(\mathcal {B}(b^++Rd,R)\) contains \(b\), so necessarily \(R \ge R(b^+,d)\). Hence, by the choice of \(U\) in part 1), \(r^\sigma /R \le r^\sigma /R(b^+,d) < \tau '\), or what is the same, \(r^\sigma < R\tau '\). Substituting the definition (10) of \(R\) and multiplying by \(r^{-\sigma }\), we deduce

$$\begin{aligned} 1 < r^{1-\sigma } \tau ' \left( \frac{1}{2}+\frac{1}{2}\sqrt{1+ \frac{2c+c^2}{\cos ^2\beta }} \right) . \end{aligned}$$

Now suppose that \(\cos \beta > \sqrt{c}r^{1-\sigma }\), contrary to what we wish to show. Then

$$\begin{aligned} 1&< r^{1-\sigma } \tau '\left( \frac{1}{2} + \frac{1}{2} \sqrt{1+ \frac{2c +c^2}{c r^{2(1-\sigma )}}} \right) \\&= \frac{\tau '}{2} \left( r^{1-\sigma } + \sqrt{r^{2(1-\sigma )}+2+c}\right) \\&< \frac{\tau '}{2} \left( \epsilon + \sqrt{\epsilon ^2+2+c}\right) < 1, \end{aligned}$$

a contradiction. That proves the result. \(\square \)

Since prox-regularity at \(b^*\in B\) implies slowly vanishing reach at \(b^*\) with respect to any closed set \(A\) containing \(b^*\), we have the following immediate consequence.

Corollary 3

(Hölder regularity from prox-regularity). Let \(B\) be prox-regular. Then \(B\) is \(\sigma \)-Hölder regular with respect to \(A\) for every \(\sigma \in [0,1)\) with a constant \(c>0\) that may be chosen arbitrarily small.

Proof

For \(\sigma =0\), this follows from Proposition 3, because prox-regularity implies superregularity. For \(\sigma \in (0,1)\), we obtain it by combining Propositions 4 and 5. \(\square \)

Consider the case of a Lipschitz domain \(B\). Here Hölder regularity may be related to a property of the boundary \(\partial B\) of \(B\).

Proposition 6

Let \(\sigma \in (0,1)\). Suppose \(B\) is the epigraph of a locally Lipschitz function \(f:{\mathbb {R}}^{n-1} \rightarrow {\mathbb {R}}\). Let \(x^*\in {\mathbb {R}}^{n-1}\) and suppose there exists a neighborhood \(V\) of \(x^*\) and \(\mu >0\) such that for every \(x_0\in V\) and every proximal subgradient \(g\in \partial _pf(x_0)\) the one-sided Hölder estimate \(f(x_0)+\langle g,x-x_0\rangle - \mu \Vert x-x_0\Vert ^{1+\sigma }\le f(x)\) is satisfied for all \(x\in V\). Then \(B\) is \(\sigma \)-Hölder regular at \((x^*,f(x^*))\in B\) with respect to every closed set \(A\) containing \((x^*,f(x^*))\), and for every constant \(c>0\) satisfying \(\mu \le \sqrt{c}/(2+c)^\sigma \).

Proof

We have to find a neighborhood \(U\) of \(b^*=(x^*,f(x^*))\in B\) such that (6) is satisfied with exponent \(\sigma \) and constant \(c\) satisfying \(\mu \le \sqrt{c}/(2+c)^\sigma \). Choose \(\epsilon > 0\) such that \(\mathcal {B}(x^*,\epsilon ) \subset V\). Now choose \(\delta >0\) with \(\delta < \epsilon /(2+c)\) and define \(U = \mathcal {B}(b^*,\delta )\). We will show that \(U\) is as required.

In order to check (6), choose \(a^+\in A\backslash B\) and \(b^+\in P_B(a^+)\) such that \(a^+,b^+\in U\). As \(b^+\in P_B(a^+)\cap U\), we get \(b^+ =(x_0,f(x_0))\in B\) for some \(x_0\in V\subset \) \({\mathbb {R}}^{n-1}\), while \(a^+\not \in B= \mathrm{epi} f\) implies \(a^+=(x_1,\xi _1)\) for some \(\xi _1 < f(x_1)\). Since \(a^+-b^+\) is a proximal normal to \(B\) at \(b^+\), there exists a proximal subgradient \(g\in \partial _p f(x_0)\) such that \((x_1,\xi _1) = (x_0,f(x_0)) + t(g,-1)\) for some \(t>0\). Using \(\Vert a^+-b^+\Vert =r\), we can therefore write

$$\begin{aligned} a^+-b^+ =\left( \frac{rg}{\sqrt{1+\Vert g\Vert ^2}}, -\frac{r}{\sqrt{1+\Vert g\Vert ^2}} \right) . \end{aligned}$$

Now consider \(b\in B\), \(b\ne b^+\) such that \(\Vert a^+-b\Vert \le (1+c)r\). To verify (6 \(^\prime \)) we have to show that \(\cos \beta \le \sqrt{c} r^\sigma \), where \(\beta =\angle (a^+-b^+,b-b^+)\). Since there is nothing to prove for \(\cos \beta \le 0\), we assume \(\cos \beta > 0\). By the definition of \(B\), we have \(b = (x,\xi )\) for some \(x\in {\mathbb {R}}^{n-1}\) and \(\xi \ge f(x)\). Now

$$\begin{aligned} \cos \beta&= \frac{\langle g,x-x_0\rangle - \xi +f(x_0)}{\sqrt{1+\Vert g\Vert ^2}\sqrt{\Vert x-x_0\Vert ^2 + \left( \xi -f(x_0)\right) ^2}}\\&\le \frac{\langle g,x-x_0\rangle - f(x)+f(x_0)}{\Vert x-x_0\Vert } \\&\le \frac{\mu \Vert x-x_0\Vert ^{1+\sigma }}{\Vert x-x_0\Vert }= \mu \Vert x-x_0\Vert ^\sigma \le \sqrt{c} r^\sigma . \end{aligned}$$

Here the first inequality uses the fact that \(\xi \ge f(x)\). The second inequality uses the one-sided Hölder estimate from the hypothesis. In order to be allowed to use this estimate, we have to assure that \(x\in V\). This follows from

$$\begin{aligned} \Vert x-x^*\Vert&\le \Vert b-b^*\Vert \le \Vert b-a^+\Vert + \Vert a^+-b^*\Vert \\&\le (1+c)\Vert a^+-b^+\Vert + \Vert a^+-b^*\Vert \\&\le (2+c)\Vert a^+-b^*\Vert \le (2+c)\delta < \epsilon . \end{aligned}$$

The third inequality can be seen as follows: We have

$$\begin{aligned} \Vert x-x_0\Vert&\le \Vert b-b^+\Vert \le \Vert b-a^+\Vert + \Vert a^+-b^+\Vert \\&\le (2+c) \Vert a^+-b^+\Vert = (2+c)r. \end{aligned}$$

Hence,

$$\begin{aligned} \mu \Vert x-x_0\Vert ^\sigma \le \mu (2+c)^\sigma r^\sigma \le \sqrt{c}r^\sigma \end{aligned}$$

by the choice of \(c\). That completes the argument. \(\square \)

Remark 7

The nomenclature in Proposition 6 can be explained as follows: Lipschitz smoothness [14] of \(-f\) at \(x_0\) is a well-known second-order property equivalent to the second difference quotient

$$\begin{aligned} \Delta _2(x) = \frac{f(x) - f(x_0)-\langle g,x-x_0\rangle }{\Vert x-x_0\Vert ^2} \ge -\mu > -\infty \end{aligned}$$

being bounded below for \(g\in \partial f(x_0)\) and \(x\) in a neighborhood of \(x_0\). The Hölder estimate in Proposition 6 is the analogous but weaker condition \(\Delta _{1+\sigma }(\cdot ) \ge -\mu > -\infty \) for some \(\sigma \in (0,1)\). In analogy with [14] one could call this \(\sigma \)-Hölder smoothness of \(-f\) at \(x_0\).

We consider the following natural modification of amenability from [25]. The set \(B\subset {\mathbb {R}}^n\) is called \(\sigma \)-Hölder amenable at \(x^*\in B\) if there exists a neighborhood \(U\) of \(x^*\), a class \(C^{1,\sigma }\)-mapping \(G:{\mathbb {R}}^n \rightarrow {\mathbb {R}}^m\), and a closed convex set \(C\subset {\mathbb {R}}^m\), such that \(B \cap U =\{x\in U: G(x) \in C\}\) and \(N_C\left( G(x^*) \right) \cap \mathrm{ker}\left( DG(x^*)^T \right) = \{0\}\), where \(DG(x^*)\) denotes the first-order differential of \(G\) at \(x^*\). A typical example in optimization is when \(B\) is defined by \(C^{1,\sigma }\) equality and inequality constraints, where the Mangasarian–Fromowitz constraint qualification holds at \(x^*\), see [24, Prop. 2.3], [25, Chap10,F.], [21, Prop. 4.8].

Proposition 7

(Hölder regularity from Hölder amenability). Suppose the closed set \(B\) is \(\sigma '\)-Hölder amenable at \(x^*\) for some \(\sigma '\in (0,1]\). Then \(B\) is \(\sigma \)-Hölder regular at \(x^*\) with respect to any closed set \(A\) containing \(x^*\) for every \(\sigma \in (0,\sigma ')\), and with arbitrary constant \(c\).

The proof may be adopted from on [21, Prop. 4.8] with minor changes, and we skip the details. This result suggests that Hölder regularity is settled between the weaker superregularity and the stronger prox-regularity. This is true as long as we consider this type of regularity as a property of \(B\) alone. We stress, however, that it is the combination with \(A\) and the shrinking distance between the sets in (6) which makes our definition 3 truly versatile in applications. This is corroborated by the following observation.

Proposition 8

(Hölder regularity from intrinsic transversality). Suppose \(A\), \(B\) are intrinsically transversal at \(x^*\) with constant of transversality \(\kappa \in (0,1]\). Then \(B\) is \(\sigma \)-Hölder regular at \(x^*\) with respect to \(A\) for every \(\sigma \in [0,1)\), with any constant \(c>0\) satisfying \(c < \frac{\kappa ^2}{1-\kappa ^2}\).

Proof

1) By [12, Def. 2.2] and [12, Prop. 6.9] there exists a neighborhood \(V\) of \(x^*\) such that max\(\left\{ \mathrm{dist}(u, N_B(b)),\mathrm{dist}(u,- N_A(a^+))\right\} \ge \kappa >0\) for all \(a^+\in A\cap V {\setminus } B\), \(b\in B\cap V{\setminus } A\), and \(u=(a^+-b)/\Vert a^+-b\Vert \). Following entirely the argument in [12, page 6], one can now find a smaller neighborhood \(U\) of \(x^*\) such that the following is true: If \(b\in B \cap U\) and \(a^+\in P_A(b)\cap U\), then even \(d_B(a^+) \le (1-\kappa ^2) \Vert b-a^+\Vert \).

2) We claim that \(U\) is the neighborhood required in \(\sigma \)-Hölder regularity with constant \(c\). To check this, we have to show that the set (6) is empty. We assume that \(b\in U\) is an element of that set. Then \(b\in P_A(a^+)^{-1}\cap B\) and \(b\in \mathcal {B}(a^+,(1+c)r)\). Hence, we are in the situation of part 1), which means \(r=d_B(a^+) \le (1-\kappa ^2)\Vert b-a^+\Vert \le (1-\kappa ^2)(1+c) r < r\), a contradiction. Hence, the set (6) is empty. \(\square \)

5 Convergence

In this section, we prove the main convergence result. Alternating projections converge locally for sets which intersect separably, if one of the sets is Hölder regular with respect to the other. The proof requires the following preparatory lemma.

Lemma 1

(Three-point estimate). Suppose \(B\) intersects \(A\) separably at \(x^*\in A\cap B\) with exponent \(\omega \in [0,2)\) and constant \(\gamma >0\) on the neighborhood \(U\) of \(x^*\). Suppose \(B\) is also \(\omega /2\)-Hölder regular at \(x^*\in A\cap B\) with respect to \(A\) on \(U\) with constant \(c>0\) satisfying \(c < \frac{\gamma }{2}\). Then there exists \(0<\ell < 1\), depending only on \(\gamma ,c\) and \(U\), such that

$$\begin{aligned} \Vert a^+-b^+\Vert ^2 + \ell \Vert b-b^+\Vert ^2 \le \Vert a^+-b\Vert ^2 \end{aligned}$$
(11)

for every building block \(b\rightarrow a^+\rightarrow b^+\) in \(U\).

Proof

1) By the cosine theorem, we have

$$\begin{aligned} \Vert a^+-b\Vert ^2 = \Vert a^+-b^+\Vert ^2 + \Vert b-b^+\Vert ^2 - 2 \Vert a^+-b^+\Vert \Vert b-b^+\Vert \cos \beta , \end{aligned}$$

where \(\beta = \angle (b-b^+,a^+-b^+)\). Hence, in order to assure (11) we have to find \(\ell \in (0,1)\) such that

$$\begin{aligned} \textstyle \frac{1-\ell }{2} \Vert b-b^+\Vert \ge \Vert a^+-b^+\Vert \cos \beta \end{aligned}$$
(12)

for all building blocks \(b\rightarrow a^+\rightarrow b^+\) in \(U\). We consider two cases. Case I is when \(\beta \in ( \frac{\pi }{2},\pi ]\). Case II is \(\beta \in [0,\frac{\pi }{2}]\).

2) We start by discussing case I. For angles \(\beta \in (\frac{\pi }{2},\pi ]\), we have \(\cos \beta < 0\), and hence (12) is trivially true if we choose any \(0 < \ell < 1\). For instance, \(\ell = \frac{1}{2}\) will do. To establish (12), we may now concentrate on case II, where \(\beta \in [0,\frac{\pi }{2}]\).

3) In case II, we want to use \(\omega /2\)-Hölder regularity of \(B\) with respect to \(A\). We subdivide case II in two subcases. Case IIa is when \(\cos \beta \le \sqrt{c}\Vert a^+-b^+\Vert ^{\frac{\omega }{2}}\). Case IIb is when \(\cos \beta >\sqrt{c}\Vert a^+-b^+\Vert ^{\frac{\omega }{2}}\).

Let us start with case IIa, where \(\cos \beta \le \sqrt{c}\Vert a^+-b^+\Vert ^{\frac{\omega }{2}}\). Observe that

$$\begin{aligned} \Vert b-b^+\Vert ^2&= \Vert b-a^+\Vert ^2 +\Vert a^+-b^+\Vert ^2-2\langle b-a^+,b^+-a^+\rangle \\&= (\Vert b-a^+\Vert -\Vert a^+-b^+\Vert )^2 +2\Vert b-a^+\Vert \Vert a^+-b^+\Vert \left( 1-\cos \alpha \right) \\&\ge 2\Vert b-a^+\Vert \Vert a^+-b^+\Vert \left( 1-\cos \alpha \right) , \end{aligned}$$

where \(\alpha = \angle (b-a^+,b^+-a^+)\). By the angle condition (1 \(^\prime \)), we have \(1-\cos \alpha \ge \gamma \Vert a^+-b^+\Vert ^\omega \) for every building block \(b\rightarrow a^+\rightarrow b^+\) in \(U\). Hence,

$$\begin{aligned} \Vert b-b^+\Vert ^2 \ge 2\gamma \Vert b-a^+\Vert \Vert a^+-b^+\Vert ^{1+\omega } \ge 2\gamma \Vert a^+-b^+\Vert ^{2+\omega }, \end{aligned}$$

where the last estimate uses \(b^+\in P_B(a^+)\). Altogether we obtain

$$\begin{aligned} \Vert b-b^+\Vert \ge \sqrt{2\gamma } \Vert a^+-b^+\Vert ^{1+\frac{\omega }{2}} \ge \sqrt{\frac{{2\gamma }}{c}} \Vert a^+-b^+\Vert \cos \beta , \end{aligned}$$

bearing in mind that we are in case IIa. To assure (12) we put \(\ell = 1 - \sqrt{\frac{2c}{\gamma }}\). Then \(\ell \in (0,1)\), because of the hypothesis \(c<\frac{\gamma }{2}\).

4) Let us now deal with case IIb, where \(\cos \beta >\sqrt{c}\Vert a^+-b^+\Vert ^{\frac{\omega }{2}}\). By Hölder regularity (6) of \(B\) with respect to \(A\) and since \(a^+\in P_A(b)\), we have \(b\not \in \mathcal {B}(a^+,(1+c )r)\), where \(r=\Vert a^+-b^+\Vert \). In other words, \(\Vert b-a^+\Vert > (1+c)\Vert a^+-b^+\Vert \). Using this and the cosine theorem again, we have

$$\begin{aligned} \Vert b-b^+\Vert ^2&= \Vert b-a^+\Vert ^2 - \Vert a^+-b^+\Vert ^2+2 \Vert b-b^+\Vert \Vert a^+-b^+\Vert \cos \beta \\&\ge c (c +2)\Vert a^+-b^+\Vert ^2+2 \Vert b-b^+\Vert \Vert a^+-b^+\Vert \cos \beta . \end{aligned}$$

Since \(a^+\not = b^+\), this may be rearranged as

$$\begin{aligned} \displaystyle \frac{\Vert b-b^+\Vert ^2}{\Vert a^+-b^+\Vert ^2} -2\frac{\Vert b-b^+\Vert }{\Vert a^+-b^+\Vert }\cos \beta - c (c +2)\ge 0. \end{aligned}$$
(13)

Hence, (13) implies that the polynomial \(P(X)=X^2-2X\cos \beta -c (c +2)\) is non-negative at \(X=\frac{\Vert b-b^+\Vert }{\Vert a^+-b^+\Vert }\). But for non-negative \(X\), non-negativity \(P(X)\ge 0\) is equivalent to

$$\begin{aligned} X\ge \cos \beta +\sqrt{\cos ^2 \beta +c (c +2)} = \cos \beta \left( 1+\sqrt{1+\frac{c (c +2)}{\cos ^2 \beta }}\right) . \end{aligned}$$

Hence, for \(X=\frac{\Vert b-b^+\Vert }{\Vert a^+-b^+\Vert }\) we know that

$$\begin{aligned} \frac{\Vert b-b^+\Vert }{\Vert a^+-b^+\Vert } \ge \cos \beta \left( 1+\sqrt{1+\frac{c (c +2)}{\cos ^2\beta }}\right) . \end{aligned}$$
(14)

Put \(\Theta _{r,\beta } = 1+ \sqrt{1+\frac{c(c +2)}{\cos ^2\beta }}\), then \(\Theta _{r,\beta } \ge c+2\). Hence, \(\ell = \frac{c}{2+c}\) assures (12) in case IIb.

5) In conclusion, if we put \(\ell = \min \left\{ \frac{1}{2}, 1-\sqrt{\frac{2c}{\gamma }} ,\frac{c}{2+c} \right\} \), with \(c<\frac{\gamma }{2}\), then (12) is satisfied in all cases. \(\square \)

Theorem 1

(Local convergence). Suppose \(B\) intersects \(A\) separably at \(x^*\in A\cap B\) with exponent \(\omega \in [0,2)\) and constant \(\gamma \) and is \(\omega /2\)-Hölder regular at \(x^*\) with respect to \(A\) and constant \(c < \frac{\gamma }{2}\). Then there exists a neighborhood \(V\) of \(x^*\) such that every sequence of alternating projections between \(A\) and \(B\) which enters \(V\) converges to a point \(b^*\in A \cap B\).

Proof

1) By hypothesis, there exists a neighborhood \(U=\mathcal {B}(x^*,4\epsilon )\) of \(x^*\in A \cap B\) such that every building block \(b\rightarrow a^+\rightarrow b^+\) with \(b,a^+,b^+\in U\) satisfies the angle condition \(1-\cos \alpha \ge \gamma \Vert b^+-a^+\Vert ^\omega \), where \(\alpha = \angle (b-a^+,b^+-a^+)\). In addition, by shrinking \(U\) if necessary, we may assume that \(B\) is \(\omega /2\)-Hölder regular at \(x^*\) on \(U\) with constant \(c < \frac{\gamma }{2}\). Then by the three-point estimate (Lemma 1), there exists \(\ell \in (0,1)\) depending only on \(c,\gamma \) and \(U\), such that \(\Vert a^+-b^+\Vert ^2 + \ell \Vert b-b^+\Vert ^2 \le \Vert a^+-b\Vert ^2\) for every such building block. Since \(\Vert a^+-b\Vert \le \Vert a-b\Vert \), we deduce the following four-point estimate

$$\begin{aligned} d_B(a^+)^2 + \ell \Vert b-b^+\Vert ^2 \le d_B(a)^2 \end{aligned}$$
(15)

for building blocks \(a\rightarrow b\rightarrow a^+\rightarrow b^+\) with \(b,a^+,b^+\in U\).

2) Define the constants \(\theta = (\omega +2)/4\) and \(C=1/((1-\theta )\ell \sqrt{2\gamma }))\). Choose \(\delta > 0\) such that

$$\begin{aligned} 9\delta +C 2^{2(1-\theta )}\delta ^{2(1-\theta )} < \frac{\epsilon }{4}, \end{aligned}$$

which implies \(16\delta <\epsilon \). Then define the neighborhood \(V\) as \(V=\mathcal {B}(x^*,\delta )\). We have to show that if the alternating sequence enters \(V\), then it converges to a unique limit \(b^*\in A \cap B\). By relabeling the sequence, we may without loss of generality assume that \(b_0\in V = \mathcal {B}(x^*,\delta )\). The case where the \(a_k\)’s reach \(V\) first is treated analogously.

We shall prove by induction that for every \(k\ge 1\), we have

$$\begin{aligned} b_k, a_{k+1}, b_{k+1} \in \mathcal {B}(x^*,\epsilon ) \end{aligned}$$
(16)

and

$$\begin{aligned} \sum _{{j=1}}^k \Vert b_j-b_{j+1}\Vert \le \frac{1}{2} \sum _{{j=1}}^k \Vert b_{j-1}-b_j\Vert + \frac{C}{2} \left( d_B({a_{1}})^{2(1-\theta )} - d_B(a_{k+1})^{2(1-\theta )} \right) .\nonumber \\ \end{aligned}$$
(17)

Let us first do the induction step and suppose that hypotheses (16), (17) are true at \(k-1\) for some \(k\ge 2\). We have to show that they also hold at \(k\).

2.1) Firstly we check (16) at \(k\). By (16) at \(k-1\) we know that \(b_{k-1},a_k,b_k\in \mathcal {B}(x^*,\epsilon )\). So it remains to prove \(a_{k+1},b_{k+1}\in \mathcal {B}(x^*,\epsilon )\). We claim that \(b_k\in \mathcal {B}(x^*,\frac{\epsilon }{4})\). Indeed, using the induction hypothesis (17) at \(k-1\) gives

$$\begin{aligned} \displaystyle \sum _{{j=1}}^{k-1} \Vert b_j-b_{j+1}\Vert \le \frac{1}{2}\sum _{{j=1}}^{k-1} \Vert b_j-b_{j-1}\Vert +\frac{C}{2} \left( d_B({a_{1}})^{2(1-\theta )} - d_B(a_{k})^{2(1-\theta )} \right) . \end{aligned}$$

Hence,

$$\begin{aligned} \displaystyle \sum _{{j=1}}^{k-1} \Vert b_j-b_{j+1}\Vert&\le \Vert {b_{0}}-{b_{1}}\Vert + C\,d_B({a_{1}})^{2(1-\theta )} - C\,d_B(a_{k})^{2(1-\theta )} -\Vert b_{k-1}-b_{k}\Vert \nonumber \\&\le \Vert {b_{0}-b_{1}}\Vert + C\,d_B({a_{1}})^{2(1-\theta )}.\\ \nonumber \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert b_{k} - x^*\Vert&\le \Vert b_{k}- {b_{1}}\Vert + \Vert {b_{1}}-x^*\Vert \nonumber \le \sum _{{j= 1}}^{k-1} \Vert b_{j+1}-b_{j}\Vert + \Vert {b_{1}}-x^*\Vert \\&\le \Vert {b_{0}-b_{1}}\Vert + C\,d_B({a_{1}})^{2(1-\theta )}+ \Vert {b_{1}}-x^*\Vert . \end{aligned}$$
(18)

Since \(b_0\in \mathcal {B}(x^*,\delta )\), we have \(\Vert b_{0}-a_{1}\Vert \le \Vert b_0-x^*\Vert \le \delta \), hence \(a_{1}\in \mathcal {B}(x^*,2\delta ).\) Then \(\Vert {a_{1}-b_{1}}\Vert \le \Vert {a_{1}}-x^*\Vert \le 2 \delta \), which gives \(\Vert {b_{1}} - x^*\Vert \le 4 \delta .\) It follows that \(\Vert {b_{0}-b_{1}}\Vert \le 5 \delta .\) Now since \(d_B({a_{1}})=\Vert {a_{1}-b_{1}}\Vert \le 2\delta \), going back to (18), we obtain

$$\begin{aligned} \Vert b_{k} - x^*\Vert \le 9\delta + C2^{2(1-\theta )}\delta ^{2(1-\theta )}<\frac{\epsilon }{4}, \end{aligned}$$

which is our above claim. Now this implies

$$\begin{aligned} \Vert a_{k+1}-x^*\Vert \le \Vert a_{k+1}-b_{k}\Vert +\Vert b_k-x^*\Vert \le 2\Vert b_k-x^*\Vert <\frac{\epsilon }{2}<\epsilon , \end{aligned}$$

and

$$\begin{aligned} \Vert b_{k+1}-x^*\Vert \le \Vert a_{k+1}-b_{k+1}\Vert +\Vert a_{k+1}-x^*\Vert \le 2\Vert a_{k+1}-x^*\Vert <\epsilon . \end{aligned}$$

This proves \(a_{k+1}\in \mathcal {B}(x^*,\epsilon )\) and \(b_{k+1}\in \mathcal {B}(x^*,\epsilon )\) and therefore (16) at \(k\).

2.2) Let us now prove that (17) is true at \(k\). Using the induction hypothesis (16) at \(k-1\), that is, \(b_{k-1}\), \(a_k\), \(b_k\in \mathcal {B}(x^*,\epsilon )\), we apply part 1) of the proof to the building block \(b_{k-1}\rightarrow a_k\rightarrow b_k\), which gives

$$\begin{aligned} \frac{1-\cos \alpha _k}{\Vert a_k-b_k\Vert ^\omega }\ge \gamma , \end{aligned}$$
(19)

where \(\alpha _k=\angle (b_{k-1}-a_k,b_k-a_k)\). By part 2.1), which is already proved, we have \(b_k\), \(a_{k+1}\), \(b_{k+1}\in \mathcal {B}(x^*,\epsilon )\) and \(\mathcal {B}(x^*,\epsilon )\subset U\), so that we can apply the four-point estimate of part 1) to the building block \(b_k\rightarrow a_{k+1}\rightarrow b_{k+1}\). This gives

$$\begin{aligned} d_B(a_{k+1})^2 + \ell \Vert b_k-b_{k+1}\Vert ^2 \le d_B(a_k)^2. \end{aligned}$$
(20)

Now using the cosine theorem and (19), we obtain

$$\begin{aligned} \Vert b_{k-1}-b_k\Vert ^2&= \Vert b_{k-1}-a_k\Vert ^2+\Vert a_k-b_k\Vert ^2 -2\Vert b_{k-1}-a_k\Vert \Vert a_k-b_k\Vert \cos \alpha _k\\&= \left( \Vert b_{k-1}-a_k\Vert -\Vert a_k-b_k\Vert \right) ^2+2 \Vert b_{k-1}-a_k\Vert \Vert a_k-b_k\Vert (1\!-\!\cos \alpha _k)\\&\ge \left( \Vert b_{k-1}-a_k\Vert -\Vert a_k-b_k\Vert \right) ^2+2 \gamma \Vert b_{k-1}-a_k\Vert \Vert a_k-b_k\Vert ^{\omega +1}\\&\ge 2 \gamma \Vert b_{k-1}-a_k\Vert \Vert a_k-b_k\Vert ^{\omega +1}. \end{aligned}$$

Since \(b_k\in P_B(a_k)\) and \(b_{k-1}\in B\), we have \(\Vert b_{k-1}-a_k\Vert \ge \Vert a_k-b_k\Vert =d_B(a_k)\). Hence, \(\Vert b_{k-1}-b_k\Vert ^2 \ge 2 \gamma d_B(a_k)^{\omega +2}\), or what is the same

$$\begin{aligned} \Vert a_k-b_k\Vert ^{-{(\omega +2)}/{2}}\Vert b_{k-1}-b_k\Vert \ge \sqrt{2\gamma }. \end{aligned}$$
(21)

Recalling that \(\theta = {(\omega +2)}/{4}\), we have \(\theta \in [\frac{1}{2},1)\), because \(\omega \in [0,2)\). By concavity of the function \(s\mapsto s^{1-\theta }/(1-\theta )\), we have \(s_1^{1-\theta }-s_2^{1-\theta } \ge (1-\theta ) s_1^{-\theta } (s_1-s_2)\). Applying this to \(s_1 = d_B(a_k)^2\) and \(s_2 = d_B(a_{k+1})^2\), we obtain

$$\begin{aligned} d_B(a_k)^{2(1-\theta )}-d_B(a_{k+1})^{2(1-\theta )}&\ge (1-\theta ) d_B(a_k)^{-2\theta } \left( d_B(a_k)^2-d_B(a_{k+1})^2 \right) \\&= (1\!-\!\theta ) \Vert a_k\!-\!b_k\Vert ^{{-}2\theta } \left( \Vert a_k\!-\!b_k\Vert ^2 \!{-}\! \Vert a_{k+1}\!-\!b_{k+1}\Vert ^2 \right) \\&\ge (1-\theta ) \ell \sqrt{2\gamma }\, \frac{\Vert b_k-b_{k+1}\Vert ^2}{\Vert b_{k-1}-b_k\Vert }, \end{aligned}$$

where the last estimate uses (20) and (21). Multiplying by \(\Vert b_k-b_{k-1}\Vert \) and recalling that \(C=1/((1-\theta )\ell \sqrt{2\gamma })\), this gives

$$\begin{aligned} C\,\left( d_B(a_k)^{2(1-\theta )} - d_B(a_{k+1})^{2(1-\theta )} \right) \Vert b_k-b_{k-1}\Vert \ge \Vert b_k-b_{k+1}\Vert ^2. \end{aligned}$$

By comparison of the arithmetic and geometric mean, \({a^2} \le bc\) implies \(a\le \frac{1}{2}b + \frac{1}{2}c\) for positive \(a,b,c\), hence we obtain

$$\begin{aligned} \Vert b_k-b_{k+1}\Vert \le \frac{1}{2}\Vert b_k-b_{k-1}\Vert + \frac{C}{2} \left( d_B(a_k)^{2(1-\theta )} - d_B(a_{k+1})^{2(1-\theta )} \right) . \end{aligned}$$
(22)

By the induction hypothesis, we have (17) at \(k-1\), that is,

$$\begin{aligned} \sum _{{j=1}}^{k-1} \Vert b_j-b_{j+1}\Vert \le \frac{1}{2} \sum _{{j=1}}^{k-1} \Vert b_{j-1}-b_j\Vert + \frac{C}{2} \left( d_B({a_{1}})^{2(1-\theta )} -d_B(a_k)^{2(1-\theta )}\right) . \end{aligned}$$

Adding this and (22) gives (17) at index \(k\).

2.3) Let us now prove that the hypotheses (16) and (17) hold at \({k=1}\). Concerning (16), since \({b_{1}}\in \mathcal {B}(x^*,4\delta )\) and \(\Vert {b_{1}-a_{2}}\Vert \le 4\delta \le \frac{\epsilon }{4}\), we have \({a_{2}}\in \mathcal {B}(x^*,\frac{\epsilon }{2})\). Then using \(\Vert {a_{2}-b_{2}}\Vert \le \frac{\epsilon }{2}\) gives \({b_{2}}\in \mathcal {B}(x^*,\epsilon )\), so (16) is true at \({k=1}\).

Concerning the validity of (17) at \({k=1}\), observe that using \({b_{0},b_{1},a_{1}}\in \mathcal {B}(x^*,\frac{\epsilon }{4})\), we may repeat the argument in the induction step starting before formula (20) with \({k=1}\) in the place of \(k\). The conclusion is formula (22) at \({k=1}\), that is,

$$\begin{aligned} \Vert {b_{1}-b_{2}}\Vert \le \frac{1}{2} \Vert {b_{1}-b_{0}}\Vert + \frac{C}{2} \left( d_B({a_{1}})^{2(1-\theta ))} - d_B({a_{2}})^{2(1-\theta )} \right) , \end{aligned}$$

and this is precisely (17) at \({k=1}\). This concludes the induction argument.

3) Having proved (16), (17) for all indices \({k \ge 1}\), we see from (18) that the series \(\sum _{{j=1}}^\infty \Vert b_j-b_{j+1}\Vert \) converges, which means \(b_k\) is a Cauchy sequence, which converges to a limit \(b^*\in B \cap \mathcal {B}(x^*,\epsilon )\). Using relation (21) we conclude that \(a_k\) converges to the same limit \(b^*\in A \cap B\). \(\square \)

Our next result gives the convergence rate for \(\omega \in (0,2)\). The case \(\omega =0\), where linear convergence is obtained, will be treated separately in Theorem 3.

Corollary 4

(Rate of convergence). Under the hypotheses of Theorem 1, with \(\omega \in (0,2)\), the convergence rates are \(\Vert b_k - b^*\Vert ={\mathcal {O}}\left( k^{-\frac{2-\omega }{2\omega }} \right) \) and \(\Vert a_k-b^*\Vert = {\mathcal {O}} \left( k^{- \frac{2-\omega }{2\omega }} \right) \).

Proof

Summing (22) from \(k=N\) to \(k=M\) gives

$$\begin{aligned}&-\frac{1}{2} \Vert b_N-b_{N-1}\Vert + \frac{1}{2} \sum _{k=N}^{M-1} \Vert b_k-b_{k+1}\Vert + \Vert b_M-b_{M+1}\Vert \\&\quad \le \frac{C}{2} \left( d_B(a_N)^{2(1-\theta )} - d_B(a_{M+1})^{2(1-\theta )} \right) . \end{aligned}$$

Passing to the limit \(M\rightarrow \infty \) gives

$$\begin{aligned} -\frac{1}{2} \Vert b_N-b_{N-1}\Vert + \frac{1}{2} \sum _{k=N}^\infty \Vert b_k-b_{k+1}\Vert \le \frac{C}{2} d_B(a_N)^{2(1-\theta )}. \end{aligned}$$

Introducing \(S_N=\sum _{k=N}^\infty \Vert b_k-b_{k+1}\Vert \), this becomes

$$\begin{aligned} -\frac{1}{2} \left( S_{N-1}-S_N \right) + \frac{1}{2}S_N \le \frac{C}{2} d_B(a_N)^{2(1-\theta )}. \end{aligned}$$

Consequently,

$$\begin{aligned} \frac{1}{2}S_N \le \frac{1}{2} (S_{N-1}-S_N) + {C}\,d_B(a_N)^{2(1-\theta )}. \end{aligned}$$

Now using estimate (21), we have \(d_B(a_N)^{2(1-\theta )} \le (2\gamma )^{-\frac{1-\theta }{2\theta }} \Vert b_{N-1}-b_N\Vert ^\frac{1-\theta }{\theta }\). Putting \(C' := C (2\gamma )^{-\frac{1-\theta }{2\theta }}\) and substituting this gives

$$\begin{aligned} \frac{1}{2} S_N \le \frac{1}{2}( S_{N-1}-S_N) + C' \left( S_{N-1}-S_N \right) ^{\frac{1-\theta }{\theta }}. \end{aligned}$$
(23)

Since \(\theta \in \left( \frac{1}{2},1\right) \), we have \((1-\theta )/\theta \in (0,1)\). Moreover, \(S_N\rightarrow 0\) implies \(S_{N-1}-S_N\rightarrow 0\), so the second term \((S_{N-1}-S_N)^{\frac{1-\theta }{\theta }}\) on the right of (23) dominates the first term \(\frac{1}{2}(S_{N-1}-S_N)\). That means, there exists another constant \(C'' >0\) such that

$$\begin{aligned} S_N^{\frac{\theta }{1-\theta }} \le C'' (S_{N-1}-S_N) \end{aligned}$$

for all \(N\in {\mathbb {N}}\). We claim that there exists yet another constant \(C'''\) such that

$$\begin{aligned} 1 \le C'' (S_{N-1}-S_N) S_N^{-\frac{\theta }{1-\theta }} \le C''' \left( S_{N}^{-\frac{2\theta -1}{1-\theta }}-S_{N-1}^{-\frac{2\theta -1}{1-\theta }} \right) . \end{aligned}$$
(24)

Assuming this proved, summation of (24) from \(N=1\) to \(N=M\) gives

$$\begin{aligned} M \le C''' \left( S_{M}^{-\frac{2\theta -1}{1-\theta }}-S_1^{-\frac{2\theta -1}{1-\theta }}\right) . \end{aligned}$$

Hence, for yet two other constants \(C'''',C'''''\),

$$\begin{aligned} S_M \le C'''' \left[ S_1^{-\frac{2\theta -1}{1-\theta }} + M \right] ^{-\frac{1-\theta }{2\theta -1}} \le C''''' M^{-\frac{1-\theta }{2\theta -1}}. \end{aligned}$$

Since \(\Vert b_M-b^*\Vert \le S_M\) by the triangle inequality, that proves the claimed speed of convergence.

In order to prove (24), we divide the set of indices into \({\mathcal {I}} = \{N: 2S_N \ge S_{N-1}\}\) and \({\mathcal {J}}=\{N: 2S_N < S_{N-1}\}\). For \(N\in {\mathcal {I}}\), we have

$$\begin{aligned} (S_{N-1}-S_N) S_N^{-\frac{\theta }{1-\theta }}&\le 2^{\frac{\theta }{1-\theta }} (S_{N-1} - S_N) S_{N-1}^{-\frac{\theta }{1-\theta }}\\&\le 2^{\frac{\theta }{1-\theta }} \int _{S_N}^{S_{N-1}} S^{-\frac{\theta }{1-\theta }}\mathrm{d}S \\&= 2^{\frac{\theta }{1-\theta }} \frac{1-\theta }{2\theta -1} \left( S_N^{-\frac{2\theta -1}{1-\theta }} - S_{N-1}^{-\frac{2\theta -1}{1-\theta }}\right) , \end{aligned}$$

proving (24). In contrast, for \(N\in {\mathcal {J}}\) we have

$$\begin{aligned} S_N^{-\frac{2\theta -1}{1-\theta }} -S_{N-1}^{-\frac{2\theta -1}{1-\theta }} \ge 2^{\frac{2\theta -1}{1-\theta }} S_{N-1}^{-\frac{2\theta -1}{1-\theta }}-S_{N-1}^{-\frac{2\theta -1}{1-\theta }} =\left( 2^{\frac{2\theta -1}{1-\theta }}-1\right) S_{N-1}^{-\frac{2\theta -1}{1-\theta }} \rightarrow \infty \end{aligned}$$

in view of \(S_{N-1}\rightarrow 0\), \(\frac{2\theta -1}{1-\theta }>0\) and \(2^{\frac{2\theta -1}{1-\theta }}-1 >0\). So on the set \({\mathcal {J}}\) estimate (24) is also satisfied. Finally, the same estimate for \(a_k\) follows from \(\Vert a_{k+1}-b^*\Vert \le \Vert a_{k+1}-b_k\Vert + \Vert b_k-b^*\Vert \le 2 \Vert b_k-b^*\Vert \). \(\square \)

Theorem 2

(Local convergence with linear rate). Let \(A,B\) intersect \(0\)-separably at \(x^*\) with constant \(\gamma \in (0,2)\). Suppose \(B\) is \(0\)-Hölder regular at \(x^*\) with respect to \(A\) with constant \(c < \frac{\gamma }{2}\). Then there exists a neighborhood \(V\) of \(x^*\) such that every sequence of alternating projections that enters \(V\) converges R-linearly to a point \(b^*\in A\cap B\).

Proof

Applying Lemma 1 and Theorem 1 in the case \(\omega =0\), we obtain convergence of the sequence \(a_k,b_k\) to a point \(b^*\in A \cap B\) from summability of \(\sum _k \Vert b_{k-1}-b_k\Vert \). Now from the Proof of Corollary 4, we see that in the case \(\theta =\frac{1}{2}\) equation (23) simplifies to

$$\begin{aligned} \frac{1}{2} S_N\le \frac{1}{2}(S_{N-1}-S_N) + C' (S_{N-1}-S_N), \end{aligned}$$

or what is the same

$$\begin{aligned} S_N \le \frac{1+2C'}{2+2C'}\,S_{N-1}. \end{aligned}$$

This proves Q-linear convergence of \(S_N\) to \(0\), hence R-linear convergence of \(b_N\rightarrow b^*\). \(\square \)

Remark 8

Theorem 2 extends the results in [21, Thm. 5.2] and [4, Thm. 3.14] in two ways. Firstly, as seen in Example 7.1, \(0\)-Hölder regularity includes sets \(B\) which have singularities pointing inwards, where superregularity [21] and its extension in [4] fail. Secondly, \(0\)-separability is weaker than linear regularity or the CQ in [4], see Example 7.4.

We now obtain the following consequence of Theorem 1, originally proved in [5] for more general families of sets. When specialized to the case of two sets we have

Corollary 5

(Bauschke et al. [5, Theorem 3.14]). Suppose \((A,\tilde{A},B,\tilde{B})\) satisfies the CQ-condition (2) at \(x^*\in A \cap B\), where \(P_A(\partial B{\setminus } A)\subset \tilde{A}\), \(P_B(\partial A{\setminus } B)\subset \tilde{B}\). Moreover, suppose for every \(\epsilon > 0\) there exists \(\delta > 0\) such that \(B\) is \((A,\epsilon ,\delta )\) regular at \(x^*\). Then there exists a neighborhood \(V\) of \(x^*\) such that every alternating sequence which enters \(V\) converges R-linearly to a point in \(A\cap B\). \(\square \)

Remark 9

While [4, Thm.3.14] is slightly more restrictive than Theorem 2 as far as the regularity and transversality hypotheses are concerned, the authors of [4, 5] go on the other hand further than our present contribution in two ways. They discuss the case of more than two sets, and they quantify the size of the neighborhood on which the observed convergence rate is attained. A fine analysis of the Proof of Lemma 1 and Corollary 4 should in principle allow a similar more quantitative version of Theorem 2.

As already shown in [5] one readily derives

Corollary 6

(Lewis, Luke, Malick [21]). Suppose \(A,B\) intersect linearly regularly and \(B\) is superregular. Then alternating projections converge locally R-linearly to a point in the intersection. \(\square \)

The following is now a consequence of Theorem 2, using Propositions 2 and 8.

Corollary 7

(Drusvyatskiy, Ioffe, Lewis [12]). Suppose \(A,B\) intersect intrinsically transversally at \(x^*\). Then there exists a neighborhood \(U\) of \(x^*\) such that every sequence of alternating projections entering \(U\) converges R-linearly to a point in the intersection.

Proof

By Proposition 2, the sets \(A,B\) intersect \(0\)-separably at \(x^*\) with constant \(\gamma =\kappa ^2/2\in (0,1]\). By Proposition 8, \(B\) is \(0\)-Hölder regular with respect to \(A\) at \(x^*\) with any constant \(c < \kappa ^2/(1-\kappa ^2)=\gamma /(2-\gamma )\). Choosing \(c<\gamma /2\) allows us therefore to apply Theorem 2. \(\square \)

Remark 10

Drusvyatskiy et al. [12] stress that their approach gives local R-linear convergence under a transversality hypothesis alone, while the older [5, 21] as well as our present approach still need regularity assumptions. However, this statement should be read with care, because Propositions 2 and 8 show that intrinsic transversality amalgamates transversality and regularity aspects. In particular, it is more restrictive than \(0\)-Hölder regularity in tandem with \(0\)-separability, so that Theorem 2 is stronger than the main result in [12].

6 Subanalytic Sets

Following [8], a subset \(A\) of \({\mathbb {R}}^n\) is called semianalytic if for every \(x\in {\mathbb {R}}^n\) there exists an open neighborhood \(V\) of \(x\) such that

$$\begin{aligned} A \cap V = \bigcup _{i\in I} \bigcap _{j\in J} \{x\in V: \phi _{ij}(x)=0, \psi _{ij}(x) > 0\} \end{aligned}$$
(25)

for finite sets \(I,J\) and real-analytic functions \(\phi _{ij},\psi _{ij} :V \rightarrow {\mathbb {R}}\). The set \(B\) in \({\mathbb {R}}^n\) is called subanalytic if for every \(x\in {\mathbb {R}}^n\) there exist a neighborhood \(V\) of \(x\) and a bounded semianalytic subset \(A\) of some \({\mathbb {R}}^n \times {\mathbb {R}}^m\), \(m\ge 1\), such that \(B\cap V =\{x\in {\mathbb {R}}^n: \exists y\in {\mathbb {R}}^m \text{ such } \text{ that } (x,y)\in A\}\). Finally, an extended real-valued function \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}} \cup \{\infty \}\) is called subanalytic if its graph is a subanalytic subset of \({\mathbb {R}}^n\times {\mathbb {R}}\).

We consider the function \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}} \cup \{\infty \}\) defined as \(f(x) = i_A(x) + \frac{1}{2} d_B(x)^2\), where \(i_A\) is the indicator function of \(A\); cf. [22, p.84].

Lemma 2

Let \(f = i_A + \frac{1}{2}d_B^2\). Let \(a^+\in A\) be projected from \(b\in B\) and \(v= \lambda (b-a^+)\in {N^p_A(a^+)}\), where \(\lambda \ge 0\). Then \(v + a^+ - P_B(a^+) \subset \widehat{\partial } f(a^+)\).

Proof

Note that \(a^+- P_B(a^+)\subset \widehat{\partial } \left( \frac{1}{2} d_B^2\right) (a^+)\) by [22, 1.3.3] or [25, Example 8.53]. Next observe that \(\widehat{\partial } i_A(a^+) = \widehat{N}_A(a^+)\) by [22, Proposition 1.79] or [25, Exercice 8.14]. Hence, \(v \in N_A^p(a^+) \subset \widehat{N}_A(a^+) = \widehat{\partial }i_A(a^+)\) using [4, Lemma 2.4]. Finally, by the sum rule [19, Prop. 1.12], we have \(\widehat{\partial }i_A(a^+)+\widehat{\partial }\frac{1}{2}d_B^2(a^+) \subset \widehat{\partial }f(a^+)\), which completes the proof. \(\square \)

Definition 6

Let \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}} \cup \{\infty \}\) be lower semi-continuous with closed domain such that \(f|_{\mathrm{dom} f}\) is continuous. We say that \(f\) satisfies the Łojasiewicz inequality with exponent \(\theta \in [0,1)\) at the critical point \(x^*\) of \(f\) if there exists \(\gamma > 0\) and a neighborhood \(U\) of \(x^*\) such that \(|f(x)-f(x^*)|^{-\theta } \Vert g\Vert \ge \gamma \) for every \(x\in U\) and every \(g\in \widehat{\partial } f(x)\).

Here \(x^*\) is critical if \(0\in \partial f(x^*)\), see [22, 25].

Lemma 3

Suppose \(f = i_A+\frac{1}{2}d_B^2\) satisfies the Łojasiewicz inequality with exponent \(\theta \in [0,1)\) at \(a^*=b^*\in A \cap B\) and constant \(\gamma >0\). Then in fact \(\theta > \frac{1}{2}\). Moreover, \(B\) intersects \(A\) separably with exponent \(\omega =4\theta -2\in (0,2)\) and constant \(\gamma '=2^{-2\theta -1}\gamma ^2\).

Proof

Note that \(f(a^*)=0\). Therefore, there exists a neighborhood \(U\) of \(a^*\in A \cap B\) such that

$$\begin{aligned} f(a^+)^{-\theta } \Vert g\Vert \ge \gamma >0 \end{aligned}$$
(26)

for every \(a^+\in A\cap U\) and every \(g\in \widehat{\partial } f(a^+)\). Now let \(a\rightarrow b\rightarrow a^+\rightarrow b^+\) be a building block with \(a,b,a^+,b^+\in U\). From Lemma 2, \(g = v + a^+-b^+\in \widehat{\partial } f(a^+)\) for every \(v\in N_A^p(a^+)\) of the form \(v=\lambda (b-a^+)\). This uses the fact that \(a^+\in P_A(b)\). Hence, by (26) we have

$$\begin{aligned} 2^\theta d_B(a^+)^{-2\theta } \Vert \lambda (b-a^+) + a^+ - b^+\Vert \ge \gamma > 0 \end{aligned}$$

for every \(\lambda \ge 0\). We deduce

$$\begin{aligned} d_B(a^+)^{-2\theta } \min _{\lambda \ge 0} \Vert \lambda (b-a^+) + a^+ - b^+\Vert \ge 2^{-\theta }\gamma . \end{aligned}$$
(27)

Let us for the time being consider angles \(\alpha = \angle (b-a^+,b^+-a^+)\) smaller than \(90^\circ \). Then the minimum value in (27) is \(d_B(a^+)^{-2\theta }\Vert a^+-b^+\Vert \sin \alpha \). Therefore,

$$\begin{aligned} \frac{\sin \alpha }{d_B(a^+)^{2\theta -1}} \ge 2^{-\theta } \gamma . \end{aligned}$$
(28)

Since \(1-\cos \alpha \ge \frac{1}{2}\sin ^2\alpha \), estimate (28) implies

$$\begin{aligned} \frac{1-\cos \alpha }{d_B(a^+)^{4\theta -2}} \ge 2^{-2\theta -1}\gamma ^2. \end{aligned}$$
(29)

This shows that we must have \(\theta > \frac{1}{2}\), because the numerator tends to 0, so the denominator has to go to zero, too, which it does for \(4\theta -2>0\).

Let us now discuss the case where \(\alpha \ge 90^\circ \). We claim that the same estimate (29) is still satisfied. Since \(\cos \alpha < 0\), the numerator \(1-\cos \alpha \) in (29) is \(\ge 1\). Moreover, the infimum in (27) is now attained at \(\lambda =0\) with the value \(\Vert a^+-b^+\Vert =d_B(a^+)\). Hence, estimate (27) implies \(d_B(a^+)^{1-2\theta } \ge 2^{-\theta }\gamma \), hence \(d_B(a^+)^{2-4\theta } \ge 2^{-2\theta }\gamma ^2> 2^{-2\theta -1}\gamma ^2\), so that (29) is satisfied. This completes the proof. \(\square \)

Theorem 3

Let \(A,B\) be closed subanalytic sets. Then \(A,B\) intersect separably.

Proof

We assume \(A\cap B\not =\emptyset \), otherwise there is nothing to prove. Consider the function \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}} \cup \{\infty \}\) defined as \(f(x) = i_A(x) + \frac{1}{2}d_B(x)^2\). Then \(f\) has closed domain \(A\) and is continuous on \(A\), which makes it amenable to Definition 6. Every \(x^*\in A \cap B\) is a critical point of \(f\). Since \(A,B\) are subanalytic sets, \(f\) is subanalytic. That can be seen as follows: First observe that \(d_B\) is subanalytic by [27, I.2.1.11]. Then \(d_B^2\) is subanalytic as the product of two subanalytic functions by [27, I.2.1.9]. Finally, graph\((f) = (A \times {\mathbb {R}}) \cap \mathrm{graph}\left( \frac{1}{2}d_B^2\right) \) shows that \(f\) is subanalytic.

Now we invoke Theorem 3.1 of [9], which asserts that \(f\) satisfies the Łojasiewicz inequality at \(x^*\) for some \(\theta \in (0,1)\). Hence, (26) is true for every \(g\in \partial f(a^+)\), and therefore also for every \(g\in \widehat{\partial }f(a^+)\). Applying Lemma 3, we deduce that \(B\) intersects \(A\) separably with \(\omega =4\theta -2\). Interchanging the roles of \(A\) and \(B\), it follows also that \(A\) intersects \(B\) separably. \(\square \)

Corollary 8

(Local convergence for subanalytic sets). Let \(A,B\) be subanalytic. Suppose \(B\) is Hölder regular at \(x^*\in A \cap B\) with respect to \(A\). Then there exists a neighborhood \(V\) of \(x^*\) such that every sequence of alternating projections \(a_k,b_k\) which enters \(V\) converges to some \(b^*\in A \cap B\) with rate \(\Vert b_k-b^*\Vert ={\mathcal {O}}(k^{-\rho })\) for some \(\rho \in (0,\infty )\). \(\square \)

Corollary 9

Let \(A,B\) be closed subanalytic sets and suppose \(B\) has slowly vanishing reach with respect to \(A\). Let \(x^*\in A \cap B\), then there exists a neighborhood \(U\) of \(x^*\) such that every sequence of alternating projections \(a_k,b_k\) which enters \(U\) converges to some \(b^*\in A \cap B\) with rate \(\Vert b_k-b^*\Vert = {\mathcal {O}}(k^{-\rho })\) for some \(\rho \in (0,\infty )\). \(\square \)

Recall from [8, 27] that a subset \(A\) of \({\mathbb {R}}^n\) is called semialgebraic if for every \(x\in {\mathbb {R}}^n\) there exists a neighborhood \(V\) of \(x\) such that (25) is satisfied with \(\phi _{ij},\psi _{ij}\) polynomials. Naturally, this means that every semialgebraic set is semianalytic, hence subanalytic. By combining Theorems 1 and 3, we therefore obtain the following result.

Corollary 10

(Borwein, Li, Yao [10]). Let \(A,B\) be closed convex semialgebraic sets with non-empty intersection. Then there exists \(\rho \in (0,\infty )\) such that every sequence of alternating projections converges with rate \(\Vert b_k-b^*\Vert = {\mathcal {O}}(k^{-\rho })\). \(\square \)

As a variant of the method of alternating projects, consider the averaged projection method. Given closed sets \(C_1,\ldots ,C_m\), the method generates a sequence \(x_n\) by the recursion \(x_{n+1}\in \frac{1}{m}\left( P_{C_1}(x_n)+\cdots +P_{C_m}(x_n)\right) \).

Corollary 11

Let \(C_1,\ldots ,C_m\) be subanalytic sets in \({\mathbb {R}}^d\), and let \(c^*\in C_1 \cap \cdots \cap C_m\). Then there exists a neighborhood \(U\) of \(c^*\) such that whenever a sequence \(x_n\) of averaged projections enters \(U\), then it converges to some \(x^*\in C_1 \cap \cdots \cap C_m\) with rate \({\Vert x_k-x^*\Vert }={\mathcal {O}}(k^{-\rho })\) for some \(\rho \in (0,\infty )\).

Proof

We follow a standard procedure and define closed sets in the product space \({\mathbb {R}}^d \times \cdots \times {\mathbb {R}}^d\) (\(m\) times) as \(B = \{(x,\cdots ,x): x\in {\mathbb {R}}^d\}\), and \(A=C_1\times \cdots \times C_m\). Note that \(A\) is again subanalytic by [27, I.2.1.1], whereas \(B\) is convex and subanalytic. We have \(P_B(x_1,\cdots ,x_m)= \left( \frac{1}{m}(x_1+\cdots +x_m),\ldots , \frac{1}{m}(x_1+\cdots +x_m)\right) \), while \(P_A (x_1,\ldots ,x_m) =(P_{C_1}(x_1),\ldots ,P_{C_m}(x_m))\). Therefore, a sequence of averaged projections between \(C_1,\ldots ,C_m\) generates a sequence of alternating projections between \(A,B\).

Since \(B\) is convex, it is prox-regular hence Hölder regular with respect to \(A\), so by Corollary 8 there exists a neighborhood \({\mathcal {U}}=U\times \cdots \times U\) of \((c^*,\ldots ,c^*)\in A \cap B\) such that every alternating sequence which enters \({\mathcal {U}}\) converges to some \((x^*,\ldots ,x^*)\in A\cap B\) with rate \({\mathcal {O}}(k^{-\rho })\) for some \(\rho \in (0,\infty )\). Now consider an averaged projection sequence \(x_k\) entering \(U\). It follows that \((x_k,\ldots ,x_k)\in {\mathcal {U}}\), hence \(x_k\) converges to \(x^*\) with that same rate. \(\square \)

Remark 11

We mention a related averaged projection method in [1, Corollary 12], where the authors use the Kurdyka-Łojasiewicz inequality. The employed technique indicates that results in the spirit of Theorem 3 could be obtained for more general classes of sets definable in an o-minimal structure [1].

We conclude this section with an application of Theorem 1, demonstrating its versatility as a convergence test in practical situations. Let \({\mathbb {C}}^N\) be a finite dimensional unitary space, and consider the discrete Fourier transform

$$\begin{aligned} \widehat{x}(\omega )= \frac{1}{\sqrt{N}} \sum _{t=0}^{N-1} e^{2\pi i t\cdot \omega /N} x(t), \quad \omega =0,\ldots ,N-1 \end{aligned}$$

as a unitary linear operator \(x \rightarrow \widehat{x}\) of \({\mathbb {C}}^N\). The phase retrieval problem [13, 16] consist in estimating an unknown signal \(x\in {\mathbb {C}}^N\) whose Fourier amplitude \(| \widehat{x}(\omega )|=a(\omega )\), \(\omega =0,\ldots ,N-1\), is known. In physical terminology, identifying \(x\) means retrieving its unknown phase \(\widehat{x}(\omega )/|\widehat{x}(\omega )|\) in frequency domain.

Formally, given a function \(a(\cdot ):\{0,\ldots ,N-1\} \rightarrow [0,\infty )\), we have to find an element of the set \(B=\{x\in {\mathbb {C}}^N: |\widehat{x}(\omega )| = a(\omega ) \text{ for } \text{ all } \omega =0,\ldots ,N-1\}\). Since this problem is underdetermined, additional information about \(x\) in a different Fourier plane or in the time domain is added. We represent it in the abstract form \(x\in A\) for a closed set \(A\). Then the phase retrieval problem is to find \(x\in A \cap B\).

The famous Gerchberg–Saxton error reduction algorithm [16] computes a solution of the phase retrieval problem by generating a sequence of estimates as follows: Given \(x\in {\mathbb {C}}^N\), compute \(\widehat{x}\) and correct its Fourier amplitude by putting \({y}(\omega ) = a(\omega )\, \widehat{x}(\omega )/|\widehat{x}(\omega )|\) if \(\widehat{x}(\omega )\not =0\), and \({y}(\omega )=a(\omega )\) if there is extinction \(\widehat{x}(\omega )=0\). For short, \({y}=a\widehat{x}/|\widehat{x}|\) with the convention \(0/|0|=1\). Then compute the inverse discrete Fourier transform \(\widetilde{{y}}\) of \(y\) and build the new iterate \(x^+\) by projecting \(\widetilde{y}\) on the set \(A\), that is \(x^+\in P_A(\widetilde{y})\). In condensed notation:

$$\begin{aligned} x^+ \in P_A\left( (a\widehat{x}/|\widehat{x}|)^\sim \right) . \end{aligned}$$
(30)

Corollary 12

(Gerchberg–Saxton error reduction). Suppose the constraint \(x\in A\) is represented by a closed subanalytic set \(A\). Let \(x^*\in A \cap B\) be a solution of the phase retrieval problem. Then there exists \(\epsilon > 0\) such that whenever a Gerchberg–Saxton sequence \(x_k\) enters \(\mathcal {B}(x^*,\epsilon )\), then it converges to a solution \(\bar{x}\in A \cap B\) of the phase retrieval problem with rate of convergence \(\Vert x_k-\bar{x}\Vert ={\mathcal {O}}(k^{-\rho })\) for some \(\rho \in (0,\infty )\).

Proof

With the convention \(0/|0|=1\), the mapping \(x \mapsto \left( a\, \widehat{x}/|\widehat{x}| \right) ^\sim \) is an orthogonal projection on the set \(B=\{x\in {\mathbb {C}}^N: |\widehat{x}(\cdot )|=a(\cdot )\}\). (See for instance [3, (8),(10)], where the authors consider even the function space case). Therefore, the Gerchberg–Saxton algorithm (30) is an instance of the alternating projection methods between the subanalytic set \(A\) and the Fourier amplitude set \(B\). We show that \(B\) is subanalytic and prox-regular. Local convergence with rate \({\mathcal {O}}(k^{-\rho })\) then follows from Corollary 9.

As far as subanalyticity of \(B\) is concerned, observe that on identifying \({\mathbb {C}}^N\) with \({\mathbb {R}}^{2N}\) via \(\widehat{x}(\omega ) = \widehat{x}_1(\omega ) + i\, \widehat{x}_2(\omega )\), we have

$$\begin{aligned} B=\bigcap _{\omega =0}^{N-1}\left\{ x\in {\mathbb {C}}^N: \widehat{x}_1(\omega )^2 + \widehat{x}_2(\omega )^2 - a(\omega )^2=0\right\} , \end{aligned}$$

which is clearly a representation of the form (25), since the discrete Fourier transform \(x \mapsto \widehat{x}\) is analytic.

To show prox-regularity of \(B\), we have to show that the projection on \(B\) is single-valued in a neighborhood of \(B\). With the same identification \({\mathbb {C}}^N \cong {\mathbb {R}}^{2N}\) evoked before, the projection on \(B\) splits into \(N\) projections in \({\mathbb {R}}^2\), given as \((\widehat{x}_1(\omega ),\widehat{x}_2(\omega ))\rightarrow a(\omega ) \left( \frac{\widehat{x}_1(\omega )}{\sqrt{\widehat{x}_1(\omega )^2+\widehat{x}_2(\omega )^2}}, \frac{\widehat{x}_2(\omega )}{\sqrt{\widehat{x}_1(\omega )^2+\widehat{x}_2(\omega )^2} }\right) \). In the case \(a(\omega ) = 0\) this is the projection onto the origin, which is clearly single-valued. For \(a(\omega ) > 0\) this is the orthogonal projection onto the sphere of radius \(a(\omega )\) in \({\mathbb {R}}^2\), which is single-valued except at the origin \((\widehat{x}_1(\omega ),\widehat{x}_2(\omega ))=(0,0)\). This means the projection on \(B\) is unique on the neighborhood \(U=\{x\in {\mathbb {C}}^N: |\widehat{x}(\cdot )| \ge a(\cdot )/2\}\) of \(B\), proving prox-regularity of \(B\).

\(\square \)

Remark 12

The constraint \(x\in A\) may represent additional measurements, or it may include prior information about the unknown image. In the original work [16], \(x\in A\) represents Fourier amplitude information from a second Fourier plane, which is a constraint analogous to \(x\in B\). The constraint \(x\in A\) may also represent prior information about the support supp\((x)\) of the unknown signal \(x\) in physical domain. It may for instance be known that supp\((x)\subset S\), where \(S\) is a subset of \(\{0,\ldots ,N-1\}\) with card\((S)\ll N\), or with a periodic structure. This is known as an atomicity constraint in crystallographic phase retrieval [13]. For \(A=\{x\in {\mathbb {C}}^N: x(t)=0 \text{ for } t\not \in S\}\), \(P_A\) is simply truncation \(y \rightarrow y \cdot 1_S\). Here the Gerchberg–Saxton error correction method has the explicit form

$$\begin{aligned} x^+(t) = \left\{ \begin{array}{l@{\quad }l} \left( a\widehat{x}/|\widehat{x}| \right) ^\sim (t) &{}\text{ if } t\in S\\ 0&{}\text{ else } \end{array} \right. \end{aligned}$$

Other choices of the constraint \(x\in A\) have been discussed in the literature, see, e.g., [13]. Our convergence result requires only subanalyticity of \(A\), a condition which is always satisfied in practice.

7 Examples

Example 7.1

(Packman gulping an ice-cream cone the wrong way). Consider packman \(B=\{(x,y)\in {\mathbb {R}}^2: x^2+y^2 \le 1, x \le |y|\}\) the instant before it scarfs down the ice-cream cone section \(A=\{(x,y) \in {\mathbb {R}}^2: 0\le x^2+y^2 \le 1, 2|y| \le x\}\) fitting symmetrically into its notch. We have \(A \cap B = \{(0,0)\}\), leaving an angular gap of \(15^\circ \) on both sides.

Let \(a^+=(x,\frac{1}{2}x)\in \partial A\), then \(b^+=P_B(a^+)=\left( \frac{3}{4}x,\frac{3}{4}x\right) \in \partial B\), which means \(r=\Vert b^+-a^+\Vert =\frac{\sqrt{2}}{4}x\). It is easy to see that condition (6) is satisfied for every \(c<1\) and arbitrary \(\sigma \in [0,1)\), i.e., \(B\) is Hölder regular with respect to \(A\). This example shows that Hölder regularity applies to sets which have inward corners and fail Clarke regularity.

Note that since \(B\) is not Clarke regular at \(x^*=(0,0)\), it is not superregular in the sense of [21]. What is more, \(B\) is not \((A,\epsilon ,\delta )\)-regular in the sense of [4] at \(x^*=(0,0)\), regardless how \(\epsilon ,\delta > 0\) are chosen, because the cone \(b^++\{v: \langle a^+-b^+,v\rangle \le \epsilon \Vert a^+-b^+\Vert \Vert v\Vert \}\) with vertex at the projected point \(b^+=\left( \frac{3}{4}x,\frac{3}{4}x\right) \in B\) hits \(B\) at points \(b'\in B\) other than \(b\) on the opposite side of \(A\), regardless how small \(\epsilon \) is chosen. And this cannot be prevented by shrinking the neighborhood \(\mathcal {B}(x^*,\delta )\). Note that \(A,B\) intersect \(0\)-separably at \((0,0)\), hence alternating projections converge linearly by Theorem 2. This cannot be obtained from the results in [4, 21].

In [17, Def. 2.9], Hesse and Luke discuss a related concept called \((\epsilon ,\delta )\)-subregularity. In [17, Thm 3.11] they prove linear convergence under \((\epsilon ,\delta )\)-subregularity of \(B\) with respect to \(A\cap B\), in tandem with a transversality assumption at \(x^*\), called local linear regularity. Note that \(B\) above is \((0,1)\)-subregular with respect to \(A \cap B=\{x^*\}\) at \(x^*=(0,0)\). However, if we change the set \(A\) to \(A= \{(x,y): 0 \le x^2+y^2 \le 1, -\frac{1}{2}x \le y \le x, x\ge 0\}\), then \(B\) is no longer \((\epsilon ,\delta )\)-subregular with respect to \(A \cap B\) at \(x^*\), regardless how small \(\epsilon \in (0,1)\), \(\delta >0\) are chosen. Since now \(S=A \cap B=\{(x,y): x^2+y^2 \le 1, y = x, x\ge 0\}\), fixing \(\delta > 0\), we can find \(x\in B\cap \mathcal {B}(x^*,\delta )\), \(x=(\xi ,-\xi )\) with \(\xi >0\), and \(\bar{x}=(\eta ,\eta )\in S \cap \mathcal {B}(x^*,\delta )\) with \(\eta > 0\) such that \(\cos \angle (v_x,\bar{x}-x)\) is arbitrarily close to 1. Note that \(B\) is still \(0\)-Hölder regular at \(x^*\) with respect to \(A\), so Theorem 2 is still applicable.

Example 7.2

(Regularity cannot be dispensed with). Following [6], consider the spiral \(z(\phi )=(1+e^{-\phi })e^{i\phi }\), \(\phi \in [0, \infty )\) in the complex plane which approaches the unit circle \(S=\{|z|=1\}\) form outside. Define a sequence \(z_n = z(\phi _n)\) with \(\phi _1 < \phi _2 < \cdots \rightarrow \infty \) such that \(\Vert z_{n+1}-z_n\Vert < \Vert z_n-z_{n-1}\Vert \rightarrow 0\), \(P_{\{z_k:k\not =n\} \cup S}(z_n)=z_{n+1}\), and such that every \(z\in S\) is an accumulation point of the \(z_n\). In [6] an explicit construction with these properties is obtained recursively as

$$\begin{aligned} z_n = z(\phi _n),\quad \Vert z(\phi _{n+1})-z(\phi _n)\Vert =r_n, \quad r_{n+1}=e^{-\phi _{n+1} } \frac{1-e^{-2\pi }}{2}. \end{aligned}$$
(31)

Let \(A = \{z_{2n}:n\in {\mathbb {N}}\}\cup S\), \(B=\{z_{2n-1}: n\in {\mathbb {N}}\} \cup S\), then \(A \cap B=S\). Note that for starting points \(|z_0|>1\), the sequence of alternating projections between \(A\) and \(B\) is a tail of the sequence \(z_n\), so none of the alternating sequences converges. Note that \(\angle (z_{n+1}-z_n,z_{n-1}-z_n) \rightarrow \pi \), hence \(A,B\) intersect \(0\)-separably at every \(x^*\in S=A\cap B\). The CQ in the sense of [4] is satisfied at every \(x^*\in A \cap B\). Namely, for \(z\in S\), \(N_A^B(z)=N_B^A(z)={\mathbb {R}}_+(-i{z})\). Indeed, as \(a_n= P_A(b_n)\) approaches \(z\), the direction \(u_n= (b_n-a_n)/\Vert b_n-a_n\Vert \) approaches a direction perpendicular to \(z\), and since the spiral turns counterclockwise, this direction is \(-i{z}\). Therefore, \(N_A^B(z) \cap (-N_B^A(z))=\{0\}\) for every \(z\in S\).

Since the sequence \(z_n\) fails to converge, we conclude that this must be due to the lack of regularity at points in \(S\). Indeed, Hölder regularity fails for every \(0\le \sigma < 1\). This can be seen as follows: Since the angle \(\angle (b-a^+,b^+-a^+)\) for the building block \(b \rightarrow a^+ \rightarrow b^+\) approaches \(\pi \), the corresponding angle \(\beta = \angle (b-b^+,a^+-b^+)\) goes to 0, so \(\cos \beta \rightarrow 1\), and for \(\sigma \in (0,1)\) we cannot find \(c>0\) such that \(\cos \beta \le \sqrt{c}r^\sigma \). Therefore, in order to assure (6), we would need \(b\not \in \mathcal {B}(a^+,(1+c)r)\), where \(r=\Vert a^+-b^+\Vert \). This, however, would imply linear convergence of the alternating sequence, which fails. As a consequence of Proposition 3, the other regularity concepts fail, as does intrinsic transversality.

Example 7.3

(Discrete spiral I). We consider a discrete approximation of the logarithmic spiral, generated by \(8\) equally spaced rays emanating from the origin. Starting on one of the rays, we project perpendicularly on the neighboring ray, going counterclockwise. We label the projected points \(a_1,b_1,a_2,\ldots \). This defines two sets \(A=\{a_i: i\in {\mathbb {N}}\}\cup \{(0,0)\}\) and \(B=\{b_i:i\in {\mathbb {N}}\} \cup \{(0,0)\}\) with \(A\cap B=\{(0,0)\}\) such that \(P_B(a_i)=b_i\) and \(P_A(b_i)=a_{i+1}\). Every sequence of alternating projections between \(A\) and \(B\) not starting at the origin is a tail of the sequence \(a_n,b_n\) and converges to \((0,0)\).

Since \(\alpha = \angle (b-a^+,b^+-a^+)=135^\circ \), \(A,B\) intersect \(0\)-separably at \(x^*=(0,0)\). We check whether the intersection satisfies the CQ in the sense on [4]. Consider one of the rays on which a point \(a^+\) is situated. Then \(u=b-a^+ \in \widehat{N}_A^B(a^+)\) is perpendicular to \(a^+-x^*\), i.e., perpendicular to the ray in question. As \(u\) is the same for all \(a^+\) on that ray, we have \(u\in N_A^B(0,0)\). Altogether, \(N_A^B(0,0)=\mathrm{lin}\{u_1,u_3,-u_1,-u_3\}\) for four directions spaced \(90^\circ \). Similarly, \(N_B^A(0,0)=\{u_2,u_4,-u_2,-u_4\}\) spaced \(90^\circ \), and intertwined with the directions of \(N_A^B(0,0)\). We have \(N_A^B(0,0)=-N_A^B(0,0)\), and similarly for \(N_B^A(0,0)\), and since \(N_B^A(0,0)\cap N_A^B(0,0)=\{(0,0)\}\), the intersection does indeed satisfy the CQ in the sense of [4] for \(\widetilde{A}=A\), \(\widetilde{B}=B\).

How about regularity at \((0,0)\)? Naturally, \(A,B\) are not superregular at \((0,0)\), because they are not Clarke regular. Concerning \((A,\epsilon ,\delta )\)-regularity of \(B\) in the sense of [4], suppose in a building block \(b\rightarrow a^+\rightarrow b^+\) we wish to set up a cone with apex \(b^+\) and axis \(b^++{\mathbb {R}}_+(a^+-b^+)\) by choosing its aperture small enough through the choice of \(\epsilon \) such that all previous points of \(A\) are avoided, then we have to choose smaller and smaller angles \(\beta \) to do this, so this type of regularity fails.

On the other hand, we have \(\sigma \)-Hölder regularity for every \(\sigma \in [0,1)\). Suppose we start at \(a_1=(1,0)\), then \(b_1=\left( \frac{1}{2},\frac{1}{2}\right) \) and \(a_2=(0,\frac{1}{2})\). After a tour of \(360^\circ \), the spiral comes back to the horizontal ray \({\mathbb {R}}^+(1,0)\) at \(a_5=\left( \frac{1}{16},0\right) \). So while at the beginning the spiral turns within the square \([-1,1]^2\), from the second tour onward it will stay in the square \([-\frac{1}{16},\frac{1}{16}]^2\). As the circle \(\mathcal {B}(b_1,\frac{7}{16}\sqrt{2})\) contains no points of the small square in its interior, the distance of \(b_1\) to the small square being \(R=\frac{7}{16}\sqrt{2}\), writing \(R=(1+c)r\), we conclude that we can take \(c=\frac{7}{8}\sqrt{2}-1>0\) in (6). Now up to a scaling and a rotation, the situation is precisely the same for every building block \(a\rightarrow b \rightarrow a^+\) starting in a square of length \(2 \Vert a\Vert \). After one \(360^\circ \)-tour, we end up at \(a^{++++}\) on the same ray as \(a\), and from there on, the spiral will stay in that smaller square of length \(2 \frac{1}{16}\Vert a\Vert = 2\Vert a^{++++}\Vert \). As a consequence of theorem 2, the sequence converges to \((0,0)\) with linear rate. None of the approaches of [4, 20, 21] allows to derive this.

Example 7.4

(Discrete spiral II). We can modify the above construction by fixing \(\phi \in (0, \frac{\pi }{4})\) and generating rays \(k\phi \), \(k\in {\mathbb {N}}\). Turning counterclockwise, and keeping only the projected points, we generate iterates \(a_k,b_k\) with the property that \(a_k\) has angle \(2k\phi \) mod \(2\pi \) with the horizontal, \(b_k\) has angle \((2k+1)\phi \) mod \(2\pi \). We put \(A=\{a_k: k\in {\mathbb {N}} \} \cup \{(0,0)\}\), \(B=\{b_k: k\in {\mathbb {N}} \} \cup \{(0,0)\}\), then \(A \cap B = \{(0,0)\}\) and \(P_B(a_k)=b_k\), \(P_A(b_k)=a_{k+1}\) by adapting the argument in Example 7.3. The sequence represents again a discrete version of the logarithmic spiral, turning inwards counterclockwise. However, if we now choose \(\phi \) such that \(\phi /(2\pi )\) is irrational, there will be no periodicity, and the set of directions \(a_k/\Vert a_k\Vert \) will be dense in \({\mathbb {S}}^1\), and so for \(b_k/\Vert b_k\Vert \). We have \(\angle (b^+-a^+,b-a^+)=\pi -\phi \), which means \(A,B\) intersect \(0\)-separably at \((0,0)\). They intersect at an angle, this angle being \(\pi -\phi \). However, \(A,B\) do not intersect linearly regularly in the sense of [4, 21]. Indeed, let us fix \(\psi \in [0,2\pi )\) and \(u=(\cos \psi ,\sin \psi )\). Then there exist rays \(2k\phi \) arbitrarily close to \(\psi \) and \(a_k\) on these rays, projected from \(b_{k-1}\) on ray \((2k-1)\phi \). That means, \(u_k= (b_k-a_k)/\Vert b_k-a_k\Vert \) gets arbitrarily close to the direction \(u^\perp = (-\sin \psi ,\cos \psi )\), so \(u^\perp \in N_A^B(0,0)\). This shows \(N_A^{\widetilde{B}}(0,0) = {\mathbb {R}}^2\) and \(N_B^{\widetilde{A}}(0,0)={\mathbb {R}}^2\) for \(\widetilde{A}=P_A(\partial B{\setminus } A)\), \(\widetilde{B}=P_B(\partial A{\setminus } B)\), so linear regularity and extensions fail.

Example 7.5

(Spiral and cylinder [7]). Consider the cylinder \(B=\{(x_1,x_2,x_3): x_1^2+x_2^2\le 1, 0 \le x_3\le 1\}\) and the spiral \(A = \{((1+e^{-t})\cos t,(1+e^{-t})\sin t, e^{-t/2}): t \ge 0 \} \cup S\), where \(S=\{(\cos \alpha ,\sin \alpha ,0): \alpha \in [0,2\pi ]\}\). Clearly \(A \cap B = S\). As shown in [7], any sequence of alternating projections between \(A\) and \(B\) started at \(a\in A{\setminus } S\) wanders down following the spiral, turning infinitely often around the cylinder with shrinking \(a_n-b_n\rightarrow 0\). In particular, every \(x^*\in S\) is an accumulation point of \(a_n,b_n\), so convergence fails. Since \(B\), being convex, is clearly Hölder regular with respect to \(A\), we deduce that the angle condition (1) must fail, so in particular \(A\) is not subanalytic. This is interesting, as \(A\) is the projection of an unbounded semianalytic set in \({\mathbb {R}}^4\). For a picture, see [7].

Example 7.6

(Failure of intrinsic transversality). We consider the sets \(A=\{2^{-2n}: n\in {\mathbb {N}}\}\cup \{0\}\), \(B = \{2^{-2n+1}: n \in {\mathbb {N}}\}\cup \{0\}\) in \({\mathbb {R}}\), so that \(A \cap B=\{0\}\). The sequence of alternating projections is \(1,\frac{1}{2},\frac{1}{4},\ldots \) and converges Q-linearly to 0. We have \(N_A^B(0)=N_B^A(0)={\mathbb {R}}_+\), hence \(A,B\) intersect with the CQ in the sense of [4] at 0, hence also \(0\)-separably. Note that \(B\) is not \((A,\epsilon ,\delta )\)-regular at \(0\) in the sense of [4], but it is \(\sigma \)-Hölder regular for every \(\sigma \in [0,1)\). Note that intrinsic transversality fails here, because it uses the cones \(N_A(a)\), \(N_B(b)\), which in this case are too large because they coincide with the whole line.

We modify this example as follows: Let \(a_n=2^{-n}\), \(A=\{a_n: n\in {\mathbb {N}}\}\cup \{0\}\), \(b_n = \frac{1}{2}(a_n+a_{n+1})-\delta _n\), \(B = \{b_n: n\in {\mathbb {N}}\}\cup \{0\}\), where \(\delta _n < 2^{-n}(a_n-b_n)\). Then \(\Vert a_{n+1}-b_n\Vert \) shrinks only by a factor \(1-\delta _n\rightarrow 1\) with respect to \(\Vert b_n-a_n\Vert \), while shrinkage between \(\Vert a_{n+1}-b_n\Vert \) and \(\Vert a_{n+1}-b_{n+1}\Vert \) is by a factor close to \(\frac{1}{2}\). This shows that an alternating sequence may converge R-linearly without a fixed shrinkage factor \(1-\kappa ^2\) in every half step. Note that Theorem 2 still applies in this case.

Example 7.7

We give an example where \(A,B\) intersect tangentially, but not \(\omega \)-separably for any \(\omega \in [0,2)\). Let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be differentiable with \(f'\) continuous at \(0\), \(f(0)=0\), \(f(x)> 0\) for \(x\not =0\), and define \(A=\mathrm{epi}f=\{(x,y)\in {\mathbb {R}}^2: y \ge f(x)\}\), \(B=\{(x,y)\in {\mathbb {R}}^2: y \le 0\}\), then \(A \cap B=\{(0,0)\}\). We consider a building block \(b\rightarrow a^+\rightarrow b^+\). Let \(b^+=(x,0)\), then \(a^+=(x,f(x))\). Suppose \(b=(y,0)\), then \(y = x + f(x)f'(x)\). Then the quotient \(q\) in (1 \(^\prime \)) reads

$$\begin{aligned} q(x)= \frac{\sqrt{1+f'(x)^2}-1}{f(x)^\omega \sqrt{1+f'(x)^2}} = \frac{f'(x)^2}{f(x)^\omega \sqrt{1+f'(x)^2} (\sqrt{1+f'(x)^2}+1)} \le \frac{f'(x)^2}{2 f(x)^\omega }. \end{aligned}$$

Therefore, if the angle condition (1) is to hold for some \(\omega \), then \(\liminf _{x\rightarrow 0} \frac{f'(x)^2}{f(x)^{\omega }}\ge \gamma >0.\) It is possible to construct \(f\) such that this fails for every \(\omega \in [0,2)\). Take for instance

$$\begin{aligned} f(x) =\left\{ \begin{array}{l@{\quad }l} e^{-\frac{1}{x^2}} &{}\text{ if } x\ne 0\\ 0&{}\text{ if } x=0 \end{array}\right. , \end{aligned}$$

then \(q(x) \le 2 x^{-6} \exp (-x^{-2}(2-\omega )) \rightarrow 0\) as \(x\rightarrow 0\) for \(0 < \omega < 2\). Separability with \(\omega =0\) is also impossible because \(f'(x)\rightarrow 0\) as \(x\rightarrow 0\). In conclusion, the sets \(A\) and \(B\) intersects tangentially, but not separably for any \(\omega \in [0,2)\).

Example 7.8

Using the same function \(f\) and \(A,B\), observe that for \(\omega \ge 2\) the quotient \(q(x)\) stays away from 0, so that condition (1 \(^\prime \)) is satisfied. This explains why values \(\omega \ge 2\) are not meaningful in Definition 1.