1 Tranversality

The classical idea of transversality at an intersection point of two smooth manifolds in a Euclidean space \(\mathbf{E}\) means that the two tangent spaces at the intersection point sum to \(\mathbf{E}\): equivalently, their two orthogonal complements (the normal spaces) intersect trivially. Transversality assumptions are ubiquitous throughout analysis, guaranteeing in particular some stability for intersections under small perturbations. For two convex sets, for example, this stability requires that the intersection point lies on no separating hyperplane, a property also easy to express using normal vectors.

For more general sets that may be neither convex nor smooth, the correct generalization of transversality involves normals rather than tangents. Finite-dimensional variational analysis allows a unified treatment through a single, fundamental geometric construction: the normal cone (the monographs [8, 10, 30, 35] are good references). In the space \(\mathbf{E}\), we denote the inner product \(\langle \cdot ,\cdot \rangle \) and the norm \( |\cdot |\). The set of points in a set \(X \subset \mathbf{E}\) at minimum distance from a point \(y \in \mathbf{E}\) is the projection \(P_X(y)\), and that minimum distance is d(yX). For any point \(x \in X\), vectors in the cone

$$\begin{aligned} N^p_X(x)=\big \{\lambda u:\lambda \in \mathbf{R}_+,\,u \in \mathbf{E},\,x \in P_X(x+u) \big \} \end{aligned}$$

are called proximal normals to X at x. Limits of proximal normals to X at points \(x_n \in X\) approaching x are called normals and comprise the normal cone \(N_X(x)\). When X is a smooth manifold, \(N_X(x)\) is the classical normal space; when X is closed and convex, on the other hand, \(N_X(x)\) consists of those linear functionals maximized over X by x.

By analogy with the classical case of manifolds, we say that two closed sets \(X,Y \subset \mathbf{E}\) intersect transversally at a point \(z \in X \cap Y\) when

$$\begin{aligned} N_X(z) \cap -N_Y(z) = \{0\}. \end{aligned}$$
(1.1)

Central to many developments of variational analysis, this property is the pre-eminent theme, in particular, in [30, 31]. The terminology, highlighting a fresh understanding of the classical analogy, is more recent: an early example is [10, p. 99].

In our entirely finite-dimensional setting, this definition of transversality is equivalent to several metric regularity properties [35], with profound consequences for stability, sensitivity, and robustness (and for algorithms [27]). Metric regularity of set-valued mappings is not our focus, but standard “coderivative” computations [23, 28] prove the equivalence of property (1.1) with each of the following properties.

  • The mapping \(\Phi :\mathbf{E}^2 \rightrightarrows \mathbf{E}\) defined by \(\Phi (x,y) = x-y\) for \(x \in X\) and \(y \in Y\) (and empty otherwise) is metrically regular at (zz) for 0.

  • The mapping \(\Psi :\mathbf{E}\rightrightarrows \mathbf{E}^2 \) defined by \(\Psi (w) = (X-w) \times (Y-w)\) is metrically regular at z for (0, 0).

  • There exists a constant \(\tau > 0\) such that

    $$\begin{aligned} d(w,X' \cap Y') ~\le ~ \tau \big (d(w,X') + d(w,Y')\big ) ~~~\text{ for } \text{ all } w \in \mathbf{E} \text{ near } \text{ z } \end{aligned}$$
    (1.2)

    and all small translations \(X',Y'\) of XY.

Transversality implies in particular the existence of a constant \(\tau > 0\) such that (1.2) holds with \(X'=X\) and \(Y'=Y\). We call this weaker property subtransversality, because it corresponds to metric subregularity [35] for the mappings \(\Phi \) and \(\Psi \). For an extended discussion of metric regularity and transversality, see [23].

Transversality is not just geometrically fundamental; it is also broad in applicability. As we discuss below, the property is strikingly unrestrictive in the class of sets to which it potentially applies, and, in concrete settings, it holds generically.

First, note that the transversality condition (1.1) implies little about the local structure of either of the two sets X and Y individually. In particular, it may hold without either set, considered alone, being “regular” in the standard single-set senses in the literature: for example, local convexity or smoothness, prox- or Clarke regularity [8, 10, 30, 35], or super-regularity [28].Footnote 1 The epigraph

$$\begin{aligned} \{ (x,\tau ) \in \mathbf{E}\times \mathbf{R}: \tau \ge f(x) \} \end{aligned}$$

of any locally Lipschitz function f intersects itself transversally at the point (0, f(0)), so in particular transversality is potentially applicable to any “epi-Lipschitz” set. In Sect. 3, we extend the reach of transversality further, to sets with empty interior.

Secondly, consider the broad but concrete class of semi-algebraic sets in the space \(\mathbf{R}^n\). Such sets are finite unions of basic sets, each of which is a finite intersection of polynomial level sets \(\{x : p(x) \le 0 \}\) and their complements [6, 12]. Given any two closed semi-algebraic sets \(X,Y \subset \mathbf{R}^n\), as we later prove, for almost all vectors \(e \in \mathbf{E}\), transversality holds at every point in the intersection of the sets X and \(Y-e\).

To summarize, transversality is a concise, fundamental, and widely studied geometric property, elegantly unifying classical notions, applicable to broad classes of sets, and commonly holding in concrete settings. Among its many powerful consequences, our aim here is to describe the first proof that transversality alone also guarantees local linear convergence for the method of alternating projections.

2 Alternating Projections

The method of alternating projections for finding a point in the intersection of two nonempty closed sets \(X,Y \subset \mathbf{E}\) iterates the following pair of steps: given a current point \(x_n \in X\),

$$\begin{aligned} \text{ choose }&y_n \in P_Y(x_n) \\ \text{ choose }&x_{n+1} \in P_X(y_n). \end{aligned}$$

This conceptually simple and widely used method for solving feasibility problems has been discovered and rediscovered by a number of authors, notably von Neumann [36]. We are particularly interested in linear convergence of the sequence of iterates, by which we mean that the sequence of distances from the iterates to the limit point is bounded above by a geometric sequence.

Convergence of the method for two intersecting closed convex sets is always guaranteed [9]. Linear convergence holds whenever the relative interiors of X and Y intersect [16], and more generally assuming just the global version of subtransversality [3].

Even for nonconvex sets, the method is popular: the first linear convergence guarantees in this setting appeared in [28, 29], with extensions in [4, 5, 17] and another discussion in [1]. In particular, the main contribution in [28] was a proof that transversality at the point z suffices to guarantee linear convergence, starting sufficiently close to z, providing at least one of the sets X and Y is “super-regular” at z. Super-regularity is a nontrivial assumption, requiring the set to be not too nonconvex, and in particular implying Clarke regularity: it opens the door to a simple geometric proof. Super-regularity is, however, superfluous. Using a very different proof technique—the main topic of this paper—the authors derived the following striking theorem. We first discuss its provenance.

Theorem 2.1

If two closed sets in a Euclidean space intersect transversally at a point, then the method of alternating projections, started nearby, converges linearly to a point in the intersection.

The August 2013 PhD thesis of Drusvyatskiy seems to be the first reference for this result and its proof [13, Theorem 3.2.3], as well as the definition of “intrinsic” transversality on which it relies [13, Definition 3.2.1]. The result (without proof) was published in August 2014, in Lewis’s lecture for the International Congress of Mathematicians [27, Theorem 2.1]. The initial version of the current work [14] was made public and submitted in January 2014.

A new and general framework for analyzing the convergence of nonconvex alternating projections (using properties of short sequences of iterates) was submitted for publication and disseminated in December 2013—see [32]. That manuscript did not show that transversality implies linear convergence (Theorem 2.1 above). Although result statements in a later revision [33, Corollary 7] (and the final published version [34]) do subsume Theorem 2.1, they do so by explicitly relying on our core argument in [14].Footnote 2 It is that argument that we present here.

In general, for two intersecting sets, no matter how close an initial point is to the intersection, the iterates generated by the method of alternating projections may not tend to the intersection. However, such pathologies rarely appear in practice. To end, we show how our basic technique proves that, if the two sets X and Y are semi-algebraic and one of the sets is compact, then the method of alternating projections initiated sufficiently close to the intersection always generates iterates whose distance to the intersection tends to zero. Further regularity guarantees convergence of the sequence of iterates [34]. An analogous but independent result for averaged projections appears in [2].

3 Intrinsic Transversality

Theorem 2.1 is not the last word on the local linear convergence of the method of alternating projections. It does not cover classically well-understood cases like two distinct intersecting lines in \(\mathbf{R}^3\), a gap discussed in some recent work [1, 4, 5, 17, 34].

The approach in [17] depends, among other assumptions, on subtransversality (called there “local linear regularity”). Using a very different proof technique, we rely only on the following property, which we show to be intermediate in strength between transversality and subtransversality.

Definition 3.1

Two closed sets \(X,Y \subset \mathbf{E}\) are intrinsically transversal at a common point if there exists an angle \(\alpha > 0\) such that, nearby, any two points \(x \in X \cap Y^c\) and \(y \in Y \cap X^c\) cannot have difference \(x-y\) simultaneously making an angle strictly less than \(\alpha \) with both the cones \(N_Y(y)\) and \(-N_X(x)\).

(The normal cones may equivalently be replaced by proximal normal cones [14].)

While intrinsic transversality is widely applicable, we make no attempt at a detailed comparison with other variants of transversality. Some relationships may be found in [14], and extensive discussions appear in [34] and [24, 25]. Our focus here is rather a clear and complete presentation of the new and subtle geometric argument opened up by the idea of intrinsic transversality, proving linear convergence.

Intrinsic transversality, as described in Definition 3.1, is a geometric idea. However, for the purposes of our proof, a variational perspective is more useful.

Using the normalization map \(\widehat{ } :\mathbf{E}{\setminus } \{0\} \rightarrow \mathbf{E}\), defined by \(\widehat{z} = \frac{z}{|z|}\), we can rephrase the definition as follows. Two sets \(X,Y \subset \mathbf{E}\) are intrinsically transversal at a common point if there exists a constant \(\kappa \in (0,1]\) (called a constant of transversality) such that nearby, any two points \(x \in X \cap Y^c\) and \(y \in Y \cap X^c\) have normalized displacement \(u = \widehat{x-y}\) satisfying

$$\begin{aligned} \max \Big \{ d\big (u,N_Y(y)\big ), d\big (u,-N_X(x)\big ) \Big \}\ge \kappa . \end{aligned}$$
(3.1)

With this in mind, we have the following easy observation.

Proposition 3.2

Transversality implies intrinsic transversality. Indeed, if two closed sets \(X,Y \subset \mathbf{E}\) intersect transversally at a point \(z \in X \cap Y\), and \(\theta \in (0,\pi ]\) is the minimal angle between vectors in the normal cones \(N_Y(z)\) and \(-N_X(z)\), then intrinsic transversality holds with any constant in the interval \((0,\sin \frac{\theta }{2})\).

Proof

Clearly we have

$$\begin{aligned} \min _{|u| = 1} \max \Big \{ d\big (u,N_Y(z)\big ), d\big (u,-N_X(z)\big ) \Big \} ~=~ \sin \frac{\theta }{2}. \end{aligned}$$

(We define \(\theta = \pi \) in the trivial case when either normal cone is \(\{0\}\)). Transversality simply means \(\theta > 0\), and in that case an easy limiting argument shows that inequality (3.1) holds providing that \(0 < \kappa < \sin \frac{\theta }{2}\). \(\square \)

Several other common special cases are also covered under the umbrella of intrinsic transversality. Examples include two closed convex sets with intersecting relative interiors, and two arbitrary intersecting sets that are polyhedra (or finite unions of polyhedra).

We mention one particular classical generalization of transversality in this context (see [1]). Two smooth manifolds \(X,Y \subset \mathbf{E}\) intersect cleanly at a point \(z \in X \cap Y\) (see [18]) if the intersection \(X \cap Y\) is a manifold and the tangent spaces at z satisfy

$$\begin{aligned} T_{X \cap Y}(z) = T_{X}(z) \cap T_{Y}(z). \end{aligned}$$

One consequence is local linear convergence of the method of alternating projections, since, in fact, clean intersection implies intrinsic transversality. To see this, we note that, assuming clean intersection at z, there exists a diffeomorphism mapping z to zero and the two manifolds onto two linear subspaces [18, Proposition C.3.1]. However, intrinsic transversality always holds at zero for subspaces, and an easy exercise shows that it is preserved under diffeomorphisms, so the result follows.

4 Slope and The Coupling Function

Intrinsic transversality has its roots in a fundamental variational-analytic idea: slope. Denoting the extended reals \([-\infty ,+\infty ]\) by \({\overline{\mathbf{R}}}\), if a function \(f :\mathbf{E}\rightarrow {\overline{\mathbf{R}}}\) is finite at a point \(x \in \mathbf{E}\), then its slope there is

$$\begin{aligned} |\nabla f|(x) = {\mathop {\mathop {\limsup }\limits _{w \rightarrow x}}\limits _{w \ne x}}\frac{f(x)-f(w)}{|x-w|}, \end{aligned}$$

unless x is a local minimizer, in which case we set \(|\nabla f|(x) = 0\). The limiting slope at x is

$$\begin{aligned} \overline{|\nabla f|}(x) = {\mathop {\mathop {\liminf }\limits _{w \rightarrow x}}\limits _{f(w) \rightarrow f(x)}} |\nabla f|(w), \end{aligned}$$

and when f is lower semi-continuous, we have the following relationship:

$$\begin{aligned} |\nabla f|(x)\ge \overline{|\nabla f|}(x)=d\big (0,\partial f(x)\big ), \end{aligned}$$
(4.1)

where \(\partial f\) denotes the set of subgradients [35]. For a proof, see [19, Propositions 1 and 2, Chapter 3], or, for example, [15, Proposition 4.6].

Now consider a “coupling” function for the two nonempty closed sets \(X,Y \subset \mathbf{E}\). We define a function \(\phi :\mathbf{E}^2 \rightarrow {\overline{\mathbf{R}}}\) by

$$\begin{aligned} \phi (x,y) = \delta _X(x) + |x-y| + \delta _Y(y), \end{aligned}$$

where \(\delta _X\) and \(\delta _Y\) denote the indicator functions of the sets X and Y, respectively. We also consider the marginal function \(\phi _y :\mathbf{E}\rightarrow {\overline{\mathbf{R}}}\) (for any fixed point \(y \in \mathbf{E}\)) defined by \(\phi _y(x) = \phi (x,y)\), and the marginal function \(\phi _x :\mathbf{E}\rightarrow {\overline{\mathbf{R}}}\) (for any fixed point \(x \in \mathbf{E}\)) defined by \(\phi _x(y) = \phi (x,y)\). Standard subdifferential calculus applied to the coupling function and its marginals shows, for distinct points \(x \in X\) and \(y \in Y\), with normalized displacement \(u = \widehat{x-y}\),

$$\begin{aligned} \partial \phi _y(x) = u + N_X(x) \quad \text{ and }\quad \partial \phi _x(y) = -u + N_Y(y), \end{aligned}$$

and furthermore

$$\begin{aligned} \partial \phi (x,y) = \partial \phi _y(x) \times \partial \phi _x(y). \end{aligned}$$

Using the relationship (4.1), we arrive at the following expressions for the limiting slopes:

$$\begin{aligned} \overline{|\nabla \phi _y|}(x) = d\big (u,-N_X(x)\big ) \quad \text{ and }\quad \overline{|\nabla \phi _x|}(y) = d\big (u,N_Y(y)\big ). \end{aligned}$$

Hence, the sets X and Y are intrinsically transversal at a common point z exactly when there exists a constant \(\kappa \in (0,1]\) such that, near z, for any two points \(x \in X \cap Y^c\) and \(y \in Y \cap X^c\), the limiting slopes \(\overline{|\nabla \phi _y|}(x)\) and \(\overline{|\nabla \phi _x|}(y)\) cannot both be strictly less than \(\kappa \). The following result is an interesting alternative view.

Proposition 4.1

Two closed sets \(X,Y \subset \mathbf{E}\) are intrinsically transversal at a point \(z \in X \cap Y\) if and only if the coupling function has slope uniformly bounded away from zero for pairs (xy) with \(x \in X \cap Y^c\) and \(y \in Y \cap X^c\), both near z.

Proof

We note the relationship

$$\begin{aligned} \big (\overline{|\nabla \phi |}(x,y) \big )^2=\big ( \overline{|\nabla \phi _y|}(x) \big )^2 + \big (\overline{|\nabla \phi _x|}(y) \big )^2, \end{aligned}$$
(4.2)

from which the result follows immediately. \(\square \)

5 Error Bounds and Level Sets

We use the following key tool [19, Basic Lemma, Chapter 1], giving a slope-based criterion for a lower semi-continuous function f to have a nonempty level set

$$\begin{aligned}{}[f \le \alpha ]=\{u \in \mathbf{E}: f(u) \le \alpha \}. \end{aligned}$$

We note in passing that the result (and the proof we present) holds more generally, with \(\mathbf{E}\) replaced by an arbitrary complete metric space.

Theorem 5.1

(Error bound) Consider any lower semi-continuous function \(f:\mathbf{E}\rightarrow \overline{\mathbf{R}}\), finite at the point \(x \in \mathbf{E}\), and constants \(\delta >0\) and \(\alpha < f(x)\) satisfying

$$\begin{aligned} \inf _{w \in \mathbf{E}} \big \{|\nabla f|(w) : \alpha <f(w)\le f(x),\,|w-x| \le \delta \big \}>\frac{f(x)-\alpha }{\delta }. \end{aligned}$$

Then, the level set \([f \le \alpha ]\) is nonempty, and furthermore, its distance from x is no more than \(\frac{1}{K}\big (f(x)-\alpha \big )\), where K denotes the left-hand side of the inequality above.

Proof

We present two proofs. The first, suitable in any complete metric space, applies the Ekeland variational principle to the function \(g :\mathbf{E}\rightarrow \overline{\mathbf{R}}\), defined as the positive part of the function \(f-\alpha \). We deduce, for any constant \(\gamma \in \big (\frac{1}{\delta }(f(x)-\alpha ),K\big )\), the existence of a point v minimizing the function \(g + \gamma |\cdot - v|\) and satisfying the properties \(g(v) \le g(x)\) and \(|v - x| \le \frac{1}{\gamma }g(x) < \delta \). The minimizing property of v shows \(|\nabla g|(v) \le \gamma < K\). Hence, \(g(v) = 0\), since otherwise \(|\nabla g|(v) = |\nabla f|(v) \ge K\). The result now follows by letting \(\gamma \) approach K.

A second, more elementary proof dispenses with the Ekeland principle when \(\mathbf{E}\) is a Euclidean space. Assuming as before \(\frac{1}{\delta }(f(x)-\alpha ) < \gamma < K\), define a set

$$\begin{aligned} W=\{w \in \mathbf{E}: f(w) \le f(x) - \gamma |w-x| \big \}. \end{aligned}$$

The value \(\beta = \inf _W f\) satisfies \(\beta \le f(x)\). Any sequence \((w_r)\) in W with \(f(w_r) \rightarrow \beta \) must be bounded, and any limit point \(\bar{w}\) satisfies \(\beta = f(\bar{w}) \le f(x) - \gamma |\bar{w} - x|\).

We claim \(\beta \le \alpha \). If not, then \(\alpha < f(\bar{w}) \le f(x)\) and

$$\begin{aligned} |\bar{w} - x| \le \frac{f(x) - f(\bar{w})}{\gamma } < \frac{f(x) - \alpha }{\gamma } < \delta , \end{aligned}$$

so by definition \(K \le |\nabla f|(\bar{w})\). Hence, by definition of the slope, there is a point \(w \in \mathbf{E}\) satisfying

$$\begin{aligned} f(w) < f(\bar{w}) - \gamma |\bar{w} - w| < \beta . \end{aligned}$$

Furthermore, the central expression is bounded above by \(f(x) - \gamma (|\bar{w} - x| + |\bar{w} - w|)\), and hence by \(f(x) - \gamma |w - x|\), so \(w \in W\), contradicting the definition of \(\beta \).

We conclude \(\beta \le \alpha \), so we have constructed a point \(\bar{w}\) in the level set \([f \le \alpha ]\), as required. To see the distance estimate, notice (as in the first proof) that we can assume \(f \ge \alpha \), by replacing f by \(\max \{f, \alpha \}\). In that case, we have in fact proved \(\beta = \alpha \) and hence \(|\bar{w} - x| \le \frac{1}{\gamma }(f(x) - \alpha )\). Now let \(\gamma \) approach K as before. \(\square \)

Our main result follows quickly from the following appealing geometric consequence of Theorem 5.1. This tool concerns the distance from a point y to a set X. Given a trial point \(x \in X\), it shows how to improve the upper bound \(|x-y|\) under the assumption of a uniform lower bound on the angle between normal vectors to X at points w near x and line segments from w to y.

Theorem 5.2

(Distance decrease) Consider a closed set \(X \subset \mathbf{E}\) and points \(x \in X\) and \(y \not \in X\) with \(\rho :=|y-x|\). Given any constant \(\delta >0\), if

$$\begin{aligned} \inf _{w \in X} \big \{ d \big ( \widehat{y-w}, N_X(w) \big ) : w \in B_{ \rho }(y)\cap B_{\delta }(x) \big \} =\mu >0, \end{aligned}$$

then

$$\begin{aligned} d(y,X) \le |y-x| - \mu \delta . \end{aligned}$$

Proof

We apply Theorem 5.1 (Error bound) to the function \(f = |\cdot - y| + \delta _X\). Inequality (4.1) and a repeat of our calculation of \(\overline{|\nabla \phi _y|}\) in the previous section show

$$\begin{aligned} |\nabla f|(w)\ge \overline{|\nabla f|}(w)=d\big (\widehat{y-w},N_X(w)\big ). \end{aligned}$$

In the notation of Theorem 5.1, assuming \(\alpha < f(x)\), we have

$$\begin{aligned} \mu \le K = \inf _{w \in X} \big \{ |\nabla f|(w) : \alpha < |w-y| \le |x-y|,|w-x| \le \delta \big \}. \end{aligned}$$

Hence, providing \(\alpha \) satisfies \(\mu > \frac{1}{\delta }(|x-y| - \alpha )\), we deduce the first conclusion of Theorem 5.1: the level set \([f \le \alpha ]\) is nonempty, or in other words \(d(y,X) \le \alpha \). The result now follows. \(\square \)

6 Main Result

The following is our main result.

Theorem 6.1

(Linear convergence) If two closed sets \(X,Y \subset \mathbf{E}\) are intrinsically transversal at a point \(z \in X \cap Y\), with constant \(\kappa > 0\), then, for any constant c in the interval \((0,\kappa )\), the method of alternating projections, initiated sufficiently near z, converges to a point in the intersection \(X \cap Y\) with R-linear rate \(1-c^2\).

Proof

There exists a constant \(\epsilon > 0\) such that for any distinct points \(x \in X\) and \(y \in Y\), both lying in the open ball \(B_{\epsilon }(z) \), we have

$$\begin{aligned} \max \big \{ d\big (u,N_Y(y)\big ), d\big (u,-N_X(x)\big ) \big \}\ge \kappa \quad \text{ for }\,u=\widehat{x-y}. \end{aligned}$$

Now consider any point \(x \in X\), close to z, and \(y \in P_Y(x)\). We aim to deduce the inequality

$$\begin{aligned} d(y,X) \le (1-c^2)|y-x|. \end{aligned}$$

The result will then follow by a routine induction. We can assume \(y \not \in X\), since otherwise there is nothing to prove.

Define \(\rho = |x-y|\) and assume

$$\begin{aligned} x,y \in B_{\epsilon - \kappa \rho }(z), \end{aligned}$$

as certainly holds if x is sufficiently close to z. Consider any point \(w \in X \cap B_{\kappa \rho }(x)\). Since \(|w-x| < \kappa |x-y| \le d_Y(x)\), we see \(w \not \in Y\), and in particular \(w \ne y\). We have

$$\begin{aligned} d\big (\widehat{w-y}, N_Y(y)\big ) \le d\big (\widehat{w-y}, \mathbf{R}_+(x-y)\big ), \end{aligned}$$

since \(x-y \in N_Y(y)\). To bound the right-hand side, note that any nonzero vectors \(p,q \in \mathbf{E}\) satisfy the inequalities

$$\begin{aligned} d(\widehat{p},\mathbf{R}_+q)\le \big |\widehat{p}-\langle \widehat{p},\widehat{q} \rangle \widehat{q} \big |\le \frac{|p-q|}{|q|}. \end{aligned}$$

Setting \(p=w-y\) and \(q=x-y\) shows, for any \(w \in X \cap B_{\kappa \rho }(x)\), the inequality

$$\begin{aligned} d\big (\widehat{w-y}, N_Y(y)\big ) < \kappa , \end{aligned}$$

and hence, using intrinsic transversality

$$\begin{aligned} d\big (\widehat{y-w}, N_X(w)\big ) \ge \kappa . \end{aligned}$$

We next apply Theorem 5.2 with \(\delta = c\rho \). In the notation of that result, we have \(\mu \ge \kappa > c\), so we deduce

$$\begin{aligned} d(y,X) \le \rho - \mu c \rho \le (1-c^2)|y-x|, \end{aligned}$$

as required. The result now follows. \(\square \)

Theorem 2.1 now follows, by applying Proposition 3.2. Indeed, if two closed sets \(X,Y \subset \mathbf{E}\) intersect transversally at a point \(z \in X \cap Y\), and the minimal angle between vectors in the closed cones \(N_Y(z)\) and \(-N_X(z)\) is \(\theta \in (0,\pi ]\), then for any constant \(r \in (\cos ^2 \frac{\theta }{2}, 1)\), the method of alternating projections, initiated sufficiently near z, must converge to a point in \(X \cap Y\) with R-linear rate r.

Another consequence of the proof technique is the following result.

Theorem 6.2

Intrinsic transversality implies subtransversality.

Proof

Under the assumptions of Theorem 6.1, consider any point \(x \in X\) near the intersection point z, and any point \(y \in P_Y(x)\). We proved \(d(y,X) \le (1-c^2)d(x,Y)\), so by induction, alternating projections converges to a limit in the intersection \(X \cap Y\) whose distance from x is no more than

$$\begin{aligned} d(x,Y) \sum _{n=0}^{\infty } (1-c^2)^n. \end{aligned}$$

Hence, \(d(x,X \cap Y) \le c^{-2} d(x,Y)\).

Now consider any point \(w \in \mathbf{E}\) near the intersection point z, and any point \(x \in P_X(w)\). The triangle inequality implies

$$\begin{aligned} d(x,Y) \le |x-w| + d(w,Y) = d(w,X) + d(w,Y). \end{aligned}$$

and hence

$$\begin{aligned} d(w,X\cap Y)\le & {} |w-x| + d(x,X \cap Y) \\\le & {} d(w,X) + c^{-2}d(x,Y) \\\le & {} (1+c^{-2})\big (d(w,X) + d(w,Y)\big ), \end{aligned}$$

which proves subtransversality. \(\square \)

Transversality is a strictly stronger condition than intrinsic transversality, as shown by two intersecting lines in \(\mathbf{R}^3\). In turn, intrinsic transversality is strictly stronger than subtransversality. Indeed, subtransversality alone does not even guarantee local convergence of alternating projections, as the following example shows.

Consider the two closed sets in \(\mathbf{R}^2\)

$$\begin{aligned} X= & {} \big \{ (x,0) : x \ge 0 \big \} \\ Y= & {} \big \{(0,0)\big \} \cup \bigcup _{n=-\infty }^{\infty } \Big \{ \big ( x,\frac{1}{2}(3^{n+1}-x)(3^{-n}x-1) \big ) : 3^n \le x \le 3^{n+1} \Big \}. \end{aligned}$$

Geometrically, X is just the nonnegative x-axis, and Y consists of sequence of parabolic arches based on the x-axis and scaling linearly down toward the origin. Subtransversality at the origin is routine to check. However, intrinsic transversality fails: for any integer n, the point \((2\cdot 3^n,0) \in X\) projects onto the point \((2\cdot 3^n,\frac{1}{2}3^n) \in Y\) and vice versa, so alternating projections cycles at points arbitrarily close to the origin.

7 Semi-algebraic Intersections

In this last section, we turn to the case where the two sets are semi-algebraic. Alternating projections on algebraic varieties, in particular, was considered in [1].

Classically (and reassuringly) two smooth manifolds “typically” intersect transversally: almost all translations to the manifolds satisfy transversality throughout the intersection. We first show the same result for semi-algebraic sets. See [22] for the role of semi-algebraic ideas in nonsmooth optimization, and for the more general “tame” setting, to which all results of this section extend.

Theorem 7.1

(Generic transversality) Consider two closed semi-algebraic sets \(X,Y \subset \mathbf{E}\). Then, for almost every vector e in \(\mathbf{E}\), transversality holds at every point in the (possibly empty) intersection of the sets X and \(Y-e\).

Proof

Consider the set-valued mapping F from \(\mathbf{E}\) to \(\mathbf{E}^2\) defined by

$$\begin{aligned} F(z)=(X-z) \times (Y-z). \end{aligned}$$

In standard variational-analytic language [20, 21], we define a value of this mapping \((a,b) \in \mathbf{E}^2\) as critical when F is not metrically regular for (ab) at some point

$$\begin{aligned} z \in F^{-1}(a,b) = (X-a) \cap (Y-b). \end{aligned}$$

By a simple coderivative calculation (cf. [28, p. 491]), this is equivalent to the sets \(X-a\) and \(Y-b\) not intersecting transversally at z. The semi-algebraic Sard theorem [20, 21] now shows that the semi-algebraic set of critical values \((a,b) \in \mathbf{E}^2\) has dimension strictly less than \(2\dim \mathbf{E}\). The result now follows by Fubini’s theorem. \(\square \)

It is interesting to ask whether convergence of alternating projections is guaranteed (albeit sublinear) even without intrinsic transversality. This is particularly important, since in practice to check whether transversality holds requires knowledge of a point in the intersection. As the example in the previous section shows, without intrinsic transversality, even limit points of alternating projection iterates may not lie in the intersection. Nevertheless, such pathologies do not occur in the semi-algebraic setting. A key tool for establishing this result is the following theorem based on [7, Theorem 14], a result growing out of the work of [26] extending the classical Łojasiewicz inequality for analytic functions to a nonsmooth semi-algebraic setting. See [22, Corollary 6.2] for a related version.

Theorem 7.2

(Kurdyka–Łojasiewicz inequality) Given a semi-algebraic lower semi-continuous function \(f:\mathbf{E}\rightarrow \overline{\mathbf{R}}\), consider a bounded open set \(U \subset \mathbf{E}\). Then, for any value \(\tau \in \mathbf{R}\), there exists a constant \(\rho >0\) and a continuous strictly increasing semi-algebraic function \(\theta :(\tau ,\tau +\rho ) \rightarrow (0,+\infty )\), such that the inequality

$$\begin{aligned} \overline{|\nabla f|}(x)\ge \theta \big ((f(x)\big ) \end{aligned}$$

holds for all \(x\in U\) with \(\tau <f(x)<\tau +\rho \).

We now arrive at the main result of this section.

Theorem 7.3

(Semi-algebraic intersections) Consider two nonempty closed semi-algebraic sets \(X,Y \subset \mathbf{E}\), with X bounded. If the method of alternating projections starts in Y and near X, then the distance of the iterates to the intersection \(X \cap Y\) converges to zero, and hence every limit point lies in \(X \cap Y\).

Proof

Define a bounded open set

$$\begin{aligned} U=\{z:d(z,X)<1\}. \end{aligned}$$

The coupling function \(\phi \) from Sect. 4 is semi-algebraic, so the Kurdyka-Łojasiewicz inequality above implies that there exists a constant \(\rho \in (0,1)\) and a continuous function \(h :(0,\rho ) \rightarrow (0,+\infty )\) such that all points \(x \in X\) and \(y \in Y \cap U\) with \(0 < |x-y| < \rho \) satisfy the inequality

$$\begin{aligned} \overline{|\nabla \phi |}(x,y)\ge \sqrt{2}\cdot h(|x-y|). \end{aligned}$$

From Eq. (4.2), we deduce

$$\begin{aligned} \max \big \{\overline{|\nabla \phi _x|}(y), \overline{|\nabla \phi _y|}(x)\big \} ~\ge ~ h(|x-y|). \end{aligned}$$
(7.1)

Suppose the initial point \(y_0 \in Y\) satisfies \(d(y_0,X) < \rho \). The distance between successive iterates \(|x_n - y_n|\) is decreasing: our aim is to prove its limit is zero. By way of contradiction, suppose in fact \(|x_n-y_n| \downarrow \alpha \in (0,\rho )\).

Associated with the continuous function h, around any nonzero vector v, we define a radially symmetric open “cusp”

$$\begin{aligned} \mathcal {K}(v):=\big \{z\in \mathbf{E}: 0<|z|<\rho ~\text { and }~ d\big (\widehat{z},\mathbf{R}_+v\big )< h\big (|z|\big )\big \}, \end{aligned}$$

where \(\mathbf{R}_+v\) is the ray generated by v. For each iteration \(n=1,2,3,\ldots \), consider the normal vector \(v_n = x_n-y_n \in N_Y(y_n)\). Note \(|v_n| > \alpha \). Obviously the open set \({\mathcal {K}}(v_n)\) contains the point \(v_n\): we next observe that in fact it also contains a uniform neighborhood. Specifically, we claim there exists a constant \(\epsilon > 0\) such that \(x-y_n \in {\mathcal {K}}(v_n)\) whenever \(|x-x_n| \le \epsilon \). Otherwise, there would exist a sequence \((x'_n)\) in \(\mathbf{E}\) satisfying

$$\begin{aligned} |x'_n - x_n| \rightarrow 0 ~~\text{ and }~~ x'_n - y_n \not \in {\mathcal {K}}(v_n). \end{aligned}$$

Notice that the distance between the vector \(x'_n - y_n\) and the vector \(v_n\) converges to 0, so the same is true after we normalize them (normalization being a Lipschitz map on the set \(\{v \in \mathbf{E}: |v| > \frac{\alpha }{2}\}\)). But continuity of h now gives the contradiction

$$\begin{aligned} \big | \widehat{x'_n - y_n} - \widehat{v}_n \big | \ge d \big ( \widehat{x'_n - y_n}, \mathbf{R}_+ v_n \big ) \ge h\big (|x'_n - y_n|\big ) \rightarrow h(\alpha )>0. \end{aligned}$$

We deduce that any point \(x \in X\) with \(|x-y_n| < \rho \) and \(|x-x_n| \le \epsilon \) satisfies

$$\begin{aligned} \overline{|\nabla \phi _{x}|}(y_n) = d\big (\widehat{x-y_n},N_Y(y_n)\big ) \le d\big (\widehat{x-y_n},\mathbf{R}_+v_n\}\big ) < h\big (|x-y_n|\big ), \end{aligned}$$

and hence, using inequality (7.1),

$$\begin{aligned} \overline{|\nabla \phi _{y_n}|}(x) \ge h\big (|x-y_{n}|\big ). \end{aligned}$$
(7.2)

Denote by \(\beta \) the minimum value of the strictly positive continuous function h on the interval \(\big [\alpha , d(y_0,X)\big ]\), so \(\beta > 0\). Inequality (7.2) implies

$$\begin{aligned} d \big (\widehat{y_n - x}, N_X(x) \big ) \ge \beta , \end{aligned}$$

and hence, using Theorem 5.2 (Distance decrease),

$$\begin{aligned} |x_{n+1} - y_{n+1}| \le |x_{n+1} - y_n| \le |x_n - y_n| - \epsilon \beta , \end{aligned}$$

giving a contradiction for large n. The result follows. \(\square \)

We remark that the technique we present here applies to much broader classes of sets—those “definable in an o-minimal structure” [11]. Under a further regularity condition, [34, Corollary 9] proves convergence of the full alternating projection sequence in the semi-algebraic (or, more broadly, subanalytic) setting.