1 Introduction

Many models in the sciences and engineering are expressed as sets of real solutions to systems of polynomial equations in \(n\) unknowns. For such a real algebraic variety \(X \subset \mathbb {R}^n\), we consider the following problem: given \(u \in \mathbb {R}^n\), compute \(u^* \in X\) that minimizes the squared Euclidean distance \(d_u(x) = \sum _{i=1}^n (u_i-x_i)^2\) from the given point \(u\). This optimization problem arises in a wide range of applications. For instance, if \(u\) is a noisy sample from \(X\), where the error model is a standard Gaussian in \(\mathbb {R}^n\), then \(u^*\) is the maximum likelihood estimate for \(u\).

In order to find \(u^*\) algebraically, we consider the set of solutions in \(\mathbb {C}^n\) to the equations defining \(X\). In this manner, we regard \(X\) as a complex variety in \(\mathbb {C}^n\), and we examine all complex critical points of the squared distance function \(d_u(x) = \sum _{i=1}^n (u_i-x_i)^2\) on \(X\). Here, we only allow those critical points \(x\) that are non-singular on \(X\). The number of such critical points is constant on a dense open subset of data \(u \in \mathbb {R}^n \). That number is called the Euclidean distance degree (or ED degree) of the variety \(X\) and denoted as EDdegree(X).

The ED degree of a variety \(X\) measures the algebraic complexity of writing the optimal solution \(u^*\) of \(d_u(x)\) over \(X\). It is a function of the input data and an important invariant of the optimization problem. This paper describes the basic properties of the ED degree using tools from computational and classical algebraic geometry. In many situations, our techniques offer formulas for this invariant. Our goal is to establish the foundations of ED degree so that it can be specialized in specific instances to solve the optimization problem.

Using Lagrange multipliers, and the observation that \(\nabla d_u = 2(u-x) \), our problem amounts to computing all regular points \(x \in X\) such that \(u-x = (u_1-x_1,\ldots ,u_n-x_n) \) is perpendicular to the tangent space \(T_x X \) of \(X\) at \(x\). Thus, we seek to solve the constraints

$$\begin{aligned} x \in X , \quad x \not \in X_\mathrm{sing} \quad \mathrm{and} \quad u-x \perp T_x X, \end{aligned}$$
(1.1)

where \(X_\mathrm{sing}\) denotes the singular locus of \(X\). The ED degree of \(X\) counts the solutions \(x\).

Example 1.1

We illustrate our problem for a plane curve. Figure 1 shows the cardioid

$$\begin{aligned} X = \bigl \{ (x,y) \in \mathbb {R}^2:(x^2+y^2+x)^2 = x^2 + y^2\bigr \} . \end{aligned}$$

For general data \((u,v)\) in \(\mathbb {R}^2\), the cardioid \(X\) contains precisely three points \((x,y) \) whose tangent line is perpendicular to \((u-x,v-y)\). Thus, \(\mathrm{EDdegree}(X) = 3\). All three critical points \((x,y)\) are real, provided \((u,v)\) lies outside the evolute, which is the small inner cardioid

$$\begin{aligned} \bigl \{(u,v) \in \mathbb {R}^2\,:27 u^4+54 u^2v^2+27 v^4+54 u^3+54 uv^2+36 u^2+9v^2+8u = 0 \bigr \} . \end{aligned}$$
(1.2)

The evolute is called the ED discriminant in this paper. If \((u,v)\) lies inside the evolute then two of the critical points are complex, and the unique real solution maximizes \(d_u\). \(\diamondsuit \)

Fig. 1
figure 1

The cardioid has ED degree three. The inner cardioid is the ED discriminant

Readers familiar with algebraic statistics [12] may note that the ED degree of a variety \(X\) is an additive analog of its ML degree (maximum likelihood degree). Indeed, if \(X\) represents a statistical model for discrete data then maximum likelihood estimation leads to polynomial equations which we can write in a form that looks like (1.1), with \(u/x = (u_1/x_1,\ldots , u_n/x_n)\):

$$\begin{aligned} x \in X , \quad x \not \in X_\mathrm{sing} \quad \mathrm{and} \quad u/x \perp T_x(X). \end{aligned}$$
(1.3)

See Example 2.4 and [26, 27] for details. Here, the optimal solution \(\hat{u}\) minimizes the Kullback–Leibler distance from the distribution \(u\) to the model \(X\). Thus, ED degree and ML degree are close cousins.

For most varieties \(X\) and most data \(u\), the number of real critical points is much smaller than \(\mathrm{EDdegree}(X)\). To quantify that difference, we also study the expected number of real critical points of \(d_u\) on \(X\). This number, denoted \(\mathrm{aEDdegree}(X)\) and called the average ED degree, depends on the underlying probability distribution on \(\mathbb {R}^n\). For instance, for the cardioid \(X\) in Example 1.1, the invariant \(\mathrm{aEDdegree}(X)\) can be any real number between \(1\) and \(3\). The specific value depends on how we sample the data points \((u,v)\) from \(\mathbb {R}^2\).

This paper is organized as follows. In Sect. 2, we rigorously define ED degree for affine and projective varieties and show how the ED degree of \(X\) and all critical points of \(d_u\) can be computed in practice. The projective case is important because many varieties in applications are defined by homogenous equations. For the most part, our exposition assumes no prerequisites beyond undergraduate mathematics. We follow the book by Cox, Little and O’Shea [9], and we illustrate the main concepts with code in Macaulay2 [20].

Section 3 is devoted to case studies in control theory, geometric modeling, computer vision, and low-rank matrix completion. New results include formulas for the ED degree for the Hurwitz stability problem and for the number of critical formations on the line, as in [2].

In Sect. 4, we introduce the ED correspondence \(\mathcal {E}_X\), which is the variety of pairs \((x,u)\) where \(x \in X\) is critical for \(d_u\). The ED correspondence is of vital importance for the computation of average ED degrees, in that same section. We show how to derive parametric representations of \(\mathcal {E}_X\), and how these translate into integral representations for \(\mathrm{aEDdegree}(X)\).

Duality plays a key role in both algebraic geometry and optimization theory [38]. Every projective variety \(X \subset \mathbb {P}^n\) has a dual variety \(X^* \subset \mathbb {P}^n\), whose points are the hyperplanes tangent to \(X\). In Sect. 5, we prove that \(\mathrm{EDdegree}(X) = \mathrm{EDdegree}(X^*)\), we express this number as the sum of the classical polar classes [25], and we lift the ED correspondence to the conormal variety of \((X,X^*)\). When \(X\) is smooth and toric, we obtain a combinatorial formula for \(\mathrm{EDdegree}(X)\) in terms of the volumes of faces of the corresponding polytope.

In Sect. 6, we study the behavior of the ED degree under linear projections and under intersections with linear subspaces. We also examine the fact that the ED degree can go up or can go down when passing from an affine variety in \(\mathbb {C}^n\) to its projective closure in \(\mathbb {P}^n\).

In Sect. 7, we express \(\mathrm{EDdegree}(X)\) in terms of Chern classes when \(X\) is smooth and projective, and we apply this to classical Segre and Veronese varieties. We also study the ED discriminant which is the locus of all data points \(u\) where two critical points of \(d_u\) coincide. For instance, in Example 1.1, the ED discriminant is the inner cardioid. Work of Catanese, Trifogli and others [7, 29] offers degree formulas for ED discriminants in various situations.

As we will see in Example 2.3, the ED degree of the variety of bounded-rank matrices can be derived from the Eckart–Young Theorem. The singular value decomposition furnishes the critical points. The case of multidimensional tensors, while of equally fundamental importance, is much more involved. In Sect. 8, following [10, 14, 15], we give an account of recent results on the ordinary and average ED degree of the variety of rank one tensors.

Even though the problem of minimizing Euclidean distance to a variety arises in a number of applications, there seems to be no systematic study of this problem in the generality that we address here. This paper aims to lay down the foundations for solving this problem using tools from algebraic geometry and computational algebra. In Sect. 3, we will see many formulas for the ED degree in specific instances, some of which are new while others reprove existing results in the literature using our tools and uniform framework. The remaining sections explore several different aspects of the ED degree and offer many possible directions in which the general theory can be further developed or tailored to particular applications. Our ED degree umbrella brings under it a variety of applications and theoretical tools, and draws on previous work that relates to subjects such as the Catanese–Trifogli formula for the ED discriminant (Sect. 7) and the work of Piene and Holme on duality (Sect. 5).

2 Equations Defining Critical Points

An algebraic variety \(X\) in \(\mathbb {R}^n\) can be described either implicitly, by a system of polynomial equations in \(n\) variables, or parametrically, as the closure of the image of a polynomial map \(\psi : \mathbb {R}^m \rightarrow \mathbb {R}^n\). The second representation arises frequently in applications, but it is restricted to varieties \(X\) that are unirational. The first representation exists for any variety \(X\). In what follows we start with the implicit representation, and we derive the polynomial equations that characterize the critical points of the squared distance function \(d_u = \sum _{i=1}^n(x_i-u_i)^2\) on \(X\). The function \(d_u\) extends to a polynomial function on \(\mathbb {C}^n\). So, if \(x\) is a complex point in \(X\) then \(d_u(x)\) is usually a complex number, and that number can be zero even if \(x \not = u\). The Hermitian inner product and its induced metric on \(\mathbb {C}^n\) will not appear in this paper.

Fix a radical ideal \(I_X =\langle f_1, \ldots ,f_s \rangle \subset \mathbb {R}[x_1,\ldots ,x_n]\) and \(X = V(I_X)\) its variety in \(\mathbb {C}^n\). Since ED degree is additive over the components of \(X\), we may assume that \(X\) is irreducible and that \(I_X\) is a prime ideal. The formulation (1.1) translates into a system of polynomial equations as follows. We write \(J(f)\) for the \(s \times n\) Jacobian matrix, whose entry in row \(i\) and column \(j\) is the partial derivative \(\partial f_i /\partial x_j\). The singular locus \(X_\mathrm{sing}\) of \(X\) is defined by

$$\begin{aligned} I_{X_\mathrm{sing}} = I_X + \bigl \langle c \times c\hbox {-minors of }J(f) \bigr \rangle , \end{aligned}$$

where \(c\) is the codimension of \(X\). The ideal \(I_{X_\mathrm {sing}}\) can in fact be non-radical, but that does not matter for our purposes. We now augment the Jacobian matrix \(J(f)\) with the row vector \(u-x\) to get an \((s+1) \times n\)-matrix. That matrix has rank \(\le c\) on the critical points of \(d_u\) on \(X\). From the subvariety of \(X\) defined by these rank constraints, we must remove contributions from the singular locus \(X_\mathrm{sing}\). Thus, the critical ideal for \(u \in \mathbb {C}^n\) is the following saturation:

$$\begin{aligned} \biggl ( I_X + \biggl \langle (c+1) \times (c+1) \hbox { -minors of } \begin{pmatrix} u - x \\ J(f) \end{pmatrix} \biggr \rangle \biggr ) : \left( I_{X_\mathrm{sing}} \,\right) ^\infty . \end{aligned}$$
(2.1)

Note that if \(I_X\) were not radical, then the above ideal could have an empty variety.

Lemma 2.1

For general \(u \in \mathbb {C}^n\), the variety of the critical ideal in \(\mathbb {C}^n\) is finite. It consists precisely of the critical points of the squared distance function \(d_u\) on the manifold \(X \backslash X_\mathrm{sing}\).

Proof

For fixed \(x \in X \backslash X_\mathrm{sing}\), the Jacobian \(J(f)\) has rank \(c\), so the data points \(u\) where the \((c{+}1) \times (c{+}1)\)-minors of \(\begin{pmatrix} u-x \\ J(f) \end{pmatrix}\) vanish form an affine-linear subspace in \(\mathbb {C}^n\) of dimension \(c\). Hence, the variety of pairs \((x,u) \in X \times \mathbb {C}^n\) that are zeros of (2.1) is irreducible of dimension \(n\). The fiber of its projection into the second factor over a general point \(u \in \mathbb {C}^n\) must hence be finite.\(\square \)

The ED degree of \(X\) is defined to be the number of critical points in Lemma 2.1. It is the same as the normal class of \(X\) defined in [30], where it is studied in detail for curves and surfaces. We start with two examples that are familiar to all students of applied mathematics.

Example 2.2

(Linear regression) Every linear space \(X\) has ED degree \(1\). Here, the critical equations (1.1) take the form \(x \in X\) and \(u-x \perp X\). These linear equations have a unique solution \(u^*\). If \(u\) and \(X\) are real then \(u^*\) is the unique point in \(X\) that is closest to \(u\). \(\diamondsuit \)

Example 2.3

(The Eckart–Young Theorem) Fix positive integers \(r \le s \le t\) and set \(n = st\). Let \(X_r\) be the variety of \(s \times t\)-matrices of rank \(\le r\). This determinantal variety has

$$\begin{aligned} \mathrm{EDdegree}(X_r) = \left( {\begin{array}{c}s\\ r\end{array}}\right) . \end{aligned}$$
(2.2)

To see this, we consider a general real \(s\times t\)-matrix \(U\) and its singular value decomposition

$$\begin{aligned} U = T_1 \cdot \mathrm{diag}(\sigma _1, \sigma _2, \ldots , \sigma _s) \cdot T_2. \end{aligned}$$
(2.3)

Here, \( \sigma _1 > \sigma _2 > \cdots > \sigma _s\) are the singular values of \(U\), and \(T_1\) and \(T_2\) are orthogonal matrices of format \(s \times s\) and \(t \times t\), respectively. According to the Eckart–Young Theorem,

$$\begin{aligned} U^* = T_1 \cdot \mathrm{diag}(\sigma _1, \ldots , \sigma _r,0, \ldots , 0) \cdot T_2 \end{aligned}$$

is the closest rank \(r\) matrix to \(U\). More generally, the critical points of \(d_U\) are

$$\begin{aligned} T_1 \cdot \mathrm{diag}(0,\ldots ,0,\sigma _{i_1},0,\ldots ,0,\sigma _{i_r},0,\ldots ,0) \cdot T_2 \end{aligned}$$

where \(I=\{i_1<\cdots <i_r\}\) runs over all \(r\)-element subsets of \(\{1,\ldots ,s\}\). This yields the formula (2.2). The case \(r=1,s=t=2\) was featured in Example 2.4. \(\diamondsuit \)

Example 2.4

To compare the ED degree with the ML degree, we consider the algebraic function that takes a \(2 \times 2\)-matrix \(u\) to its closest rank one matrix. By Example 2.3, we have \(\mathrm{EDdegree}(X) = 2\), while \(\mathrm{MLdegree}(X) = 1\). To see what this means, consider the instance

$$\begin{aligned} u = \begin{pmatrix} 3 &{} 5 \\ 7 &{} 11 \end{pmatrix} . \end{aligned}$$

The closest rank \(1\) matrix in the maximum likelihood sense of [12, 27] has rational entries:

$$\begin{aligned} \hat{u} = \frac{1}{3 {+} 5 {+} 7 {+} 11} \begin{pmatrix} (3{+}5)(3{+}7) &{} (3{+}5)(5{+}11) \\ (7{+}11)(3{+}7) &{} (7{+}11)(5{+}11) \end{pmatrix}; \end{aligned}$$

it is the unique rank one matrix with the same row sums and the same column sums as \(u\). By contrast, when minimizing the Euclidean distance, we must solve a quadratic equation:

$$\begin{aligned} u^*&= \begin{pmatrix} v_{11} \!&{} \! v_{12} \\ v_{21} \! &{} \! v_{22} \end{pmatrix} \,\hbox {where } \, v_{11}^2{-}3 v_{11}{-}\frac{437}{1300} = 0, v_{12}= \frac{62}{41} v_{11}{+}\frac{19}{82} ,\,\\&v_{21} = \frac{88}{41} v_{11}{+} \frac{23}{82} ,\, v_{22} = \frac{141}{41} v_{11}{+}\frac{14}{41} . \end{aligned}$$

This rank \(1\) matrix arises from the singular value decomposition, as seen in Example 2.3. \(\diamondsuit \)

Example 2.5

The following Macaulay2 code computes the ED degree of a variety in \(\mathbb {R}^3\):

figure a

We chose a random vector \(\mathtt{u}\) as input for the above computation. The output reveals that the Fermat quintic cone \(\{(x_1,x_2,x_3) \in \mathbb {R}^3 : x_1^5+x_2^5+x_3^5 = 0\}\) has ED degree \(23\). \(\diamondsuit \)

Here, is a general upper bound on the ED degree in terms of the given polynomials \(f_i\).

Proposition 2.6

Let \(X \) be a variety of codimension \(c\) in \(\mathbb {C}^n\) that is cut out by polynomials \(f_1,f_2, \ldots ,f_c, \ldots , f_s\) of degrees \(d_1\ge d_2\ge \cdots \ge d_c\ge \cdots \ge d_s\). Then

$$\begin{aligned} \mathrm{EDdegree}(X) \le d_1 d_2 \cdots d_c \cdot \!\!\!\!\!\!\!\!\!\!\!\! \sum _{i_1+i_2+\cdots +i_c \le n-c} \!\!\! \!\! (d_1-1)^{i_1} (d_2-1)^{i_2} \cdots (d_c-1)^{i_c}. \end{aligned}$$

Equality holds when \(X\) is a general complete intersection of codimension \(c\) (hence \(c=s\)).

In Sect. 7, this result will be derived from our Chern class formula given in Theorem 7.8 and from Theorem 6.11. The latter relates the ED degree of an affine variety and of its projective closure. A similar bound for the ML degree appears in [26, Theorem 5].

Many varieties arising in applications are rational and they are presented by a parametrization \(\psi : \mathbb {R}^m \rightarrow \mathbb {R}^n\) whose coordinates \(\psi _i\) are rational functions in \(m\) unknowns \(t = (t_1,\ldots ,t_m)\). Instead of first computing the ideal of \(X\) by implicitization and then following the approach above, we can use the parametrization directly to compute the ED degree of \(X\).

The squared distance function in terms of the parameters equals

$$\begin{aligned} D_u(t) = \sum _{i=1}^n (\psi _i(t) - u_i)^2 . \end{aligned}$$

The equations we need to solve are given by \(m\) rational functions in \(m\) unknowns:

$$\begin{aligned} \frac{\partial D_u}{\partial t_1} = \cdots = \frac{\partial D_u}{\partial t_m} = 0 . \end{aligned}$$
(2.4)

The critical locus in \(\mathbb {C}^m\) is the set of all solutions to (2.4) at which the Jacobian of \(\psi \) has maximal rank. The closure of the image of this set under \(\psi \) coincides with the variety of (2.1). Hence, if the parametrization \(\psi \) is generically finite-to-one of degree \(k\), then the critical locus in \(\mathbb {C}^m\) is finite, by Lemma 2.1, and its cardinality equals \(k \cdot \mathrm{EDdegree}(X)\).

In analogy to Proposition 2.6, we can ask for the ED degree when general polynomials are used in the parametrization of \(X\). Suppose that \(n-m\) of the \(n\) polynomials \(\psi _i(t)\) have degree \(\le d\), while the remaining \(m\) polynomials are general of degree \(d\). Since \(u\) is general, (2.4) has no solutions at infinity, and all its solutions have multiplicity one. Hence, Bézout’s Theorem implies

$$\begin{aligned} \mathrm{EDdegree}(X) = (2d-1)^m . \end{aligned}$$
(2.5)

For specific, rather than general, parametrizations, the right-hand side is just an upper bound on the ED degree of \(X\). As the following example shows, the true value can be smaller for several different reasons.

Example 2.7

Let \( m = 2, n = 4\) and consider the map \(\psi (t_1,t_2) = (t_1^3, t_1^2 t_2, t_1 t_2^2, t_2^3)\), which has degree \(k=3\). Its image \(X \subset \mathbb {C}^4\) is the cone over the twisted cubic curve. The system (2.4) consists of two quintics in \(t_1,t_2\), so Bézout’s Theorem predicts \(25 = 5 \times 5\) solutions. The origin is a solution of multiplicity \(4\) and maps to a singular point of \(X\), hence does not contribute to the ED degree. The critical locus in \(\mathbb {C}^2\) consists of \(21 = 25 - 4 \) points. We conclude that the toric surface \(X\) has \(\mathrm{EDdegree}(X) = 21/k = 7\).

Next, we change the parametrization by scaling the middle two monomials as follows:

$$\begin{aligned} \widetilde{\psi }(t_1,t_2) = \left( t_1^3, \sqrt{3} t_1^2 t_2 , \sqrt{3} t_1 t_2^2 , t_2^3 \right) . \end{aligned}$$
(2.6)

We still have \(k=3\). This scaling is special in that \(||\widetilde{\psi }(t_1,t_2)||^2=(t_1^2+t_2^2)^3\), and this causes the ED degree to drop. The function whose critical points we are counting has the form

$$\begin{aligned} \widetilde{D}(t_1,t_2) = (t_1^3 - a)^2 + 3(t_1^2 t_2 - b)^2 + 3(t_1 t_2^2 - c)^2 + (t_2^3 - d)^2 , \end{aligned}$$

where \(a,b,c,d\) are random scalars. A computation shows that the number of complex critical points of \(\widetilde{D}\) equals \(9\). So, the corresponding toric surface \(\widetilde{X}\) has \(\mathrm{EDdegree}(\widetilde{X}) = 9/k = 3\). This is a special case of Corollary 8.7 on Veronese varieties that are scaled such that the norm on the ambient space is a power of the norm on the parametrizing space. \(\diamondsuit \)

The variety \(X \subset \mathbb {C}^n\) is an affine cone if \(x \in X\) implies \(\lambda x \in X\) for all \(\lambda \in \mathbb {C}\). This means that \(I_X\) is a homogeneous ideal in \(\mathbb {R}[x_1,\ldots ,x_n]\). By slight abuse of notation, we identify \(X\) with the projective variety given by \(I_X\) in \( \mathbb {P}^{n-1}\). The former is the affine cone over the latter.

We define the ED degree of a projective variety in \(\mathbb {P}^{n-1}\) to be the ED degree of the corresponding affine cone in \(\mathbb {C}^n\). For instance, in Example 2.6, we considered two twisted cubic curves \(X\) and \(\widetilde{X}\) that lie in \(\mathbb {P}^3\). These curves have ED degrees \(3\) and \(7\), respectively.

To take advantage of the homogeneity of the generators of \(I_X\), and of the geometry of projective space \(\mathbb {P}^{n-1}\), we replace (2.1) with the following homogeneous ideal in \(\mathbb {R}[x_1,\ldots ,x_n]\):

$$\begin{aligned} \biggl ( I_X + \biggl \langle (c+2) \times (c+2)\hbox { -minors of } \begin{pmatrix} u \\ x \\ J(f) \end{pmatrix} \biggr \rangle \biggr ) : \left( I_{X_\mathrm{sing}} \cdot \langle x_1^2 + \cdots + x_n^2 \rangle \right) ^\infty . \end{aligned}$$
(2.7)

The singular locus of an affine cone is the cone over the singular locus of the projective variety. They are defined by the same ideal \(I_{X_\mathrm{sing}}\). The isotropic quadric \(Q = \{ x \in \mathbb {P}^{n-1}: x_1^2 + \cdots + x_n^2 = 0\}\) plays a special role, seen clearly in the proof of Lemma 2.8. In particular, the role of \(Q\) exhibits that the computation of ED degree is a metric problem. Note that \(Q\) has no real points. The Macaulay2 code in Example 2.5 can be adapted to verify \(\mathrm{EDdegree}(Q) = 0\).

The following lemma concerns the transition between affine cones and projective varieties.

Lemma 2.8

Fix an affine cone \(X \subset \mathbb {C}^n\) and a data point \(u \in \mathbb {C}^n \backslash X\). Let \(x \in X \backslash \{0\}\) be such that the corresponding point \([x]\) in \(\mathbb {P}^{n-1}\) does not lie in the isotropic quadric \(Q\). Then \([x]\) lies in the projective variety of (2.7) if and only if some scalar multiple \(\lambda x\) of \(x\) lies in the affine variety of (2.1). In that case, the scalar \(\lambda \) is unique.

Proof

Since both ideals are saturated with respect to \(I_{X_\mathrm{sing}}\), it suffices to prove this under the assumption that \(x \in X \backslash X_\mathrm{sing}\), so that the Jacobian \(J(f)\) at \(x\) has rank \(c\). If \(u- \lambda x\) lies in the row space of \(J(f)\), then the span of \(u,x,\) and the rows of \(J(f)\) has dimension at most \(c+1\). This proves the only-if direction. Conversely, suppose that \([x]\) lies in the variety of (2.7). First assume that \(x\) lies in the row span of \(J(f)\). Then \(x = \sum \lambda _i \nabla f_i(x)\) for some \(\lambda _i \in \mathbb {C}\). Now recall that if \(f\) is a homogeneous polynomial in \(\mathbb {R}[x_1,\ldots ,x_n]\) of degree \(d\), then \(x \cdot \nabla f(x) = d \,f(x)\). Since \(f_i(x)=0\) for all \(i\), we find that \(x \cdot \nabla f_i(x)=0\) for all \(i\), which implies that \(x \cdot x=0\), i.e., \([x] \in Q\). This contradicts the hypothesis, so the matrix \(\begin{pmatrix} x \\ J(f) \end{pmatrix}\) has rank \(c+1\). But then \(u- \lambda x\) lies in the row span of \(J(f)\) for a unique \(\lambda \in \mathbb {C}\).\(\square \)

The condition on \(X\) in the following corollary is fulfilled by any projective variety that contains at least one real point.

Corollary 2.9

Let \(X\) be a variety in \( \mathbb {P}^{n-1}\) that is not contained in the isotropic quadric \(Q\), and let \(u\) be general. Then \(\mathrm{EDdegree}(X)\) is equal to the number of zeros of (2.7) in \(\mathbb {P}^{n-1}\).

Proof

Since \(X \not \subseteq Q\) and \(u\) is general, none of the critical points of \(d_u\) in \(X \backslash X_\mathrm{sing}\) will lie in \(Q\). The claim follows from Lemma 2.8. For further details, see Theorems 4.1 and 4.4.\(\square \)

Corollary 2.9 implies that Proposition 2.6 holds almost verbatim for projective varieties.

Corollary 2.10

Let \(X \) be a variety of codimension \(c\) in \(\mathbb {P}^{n-1}\) that is cut out by homogeneous polynomials \(F_1,F_2, \ldots ,F_c, \ldots , F_s\) of degrees \(d_1\ge d_2\ge \cdots \ge d_c\ge \cdots \ge d_s\). Then

$$\begin{aligned} \mathrm{EDdegree}(X) \le d_1 d_2 \cdots d_c \cdot \!\!\!\!\!\!\!\!\!\!\!\! \sum _{i_1+i_2+\cdots +i_c \le n-c-1} \!\!\! \!\! (d_1-1)^{i_1} (d_2-1)^{i_2} \cdots (d_c-1)^{i_c}. \end{aligned}$$
(2.8)

Equality holds when \(X\) is a general complete intersection of codimension \(c\) in \(\mathbb {P}^{n-1}\).

Fixing the codimension \(c\) of \(X\) is essential in Proposition 2.6 and Corollary 2.10. Without this hypothesis, the bounds do not hold. In Example 5.10, we display homogeneous polynomials \(F_1,\ldots ,F_c\) of degrees \(d_1,\ldots ,d_c\) whose variety has ED degree larger than (2.8).

Example 2.11

The following Macaulay2 code computes the ED degree of a curve in \(\mathbb {P}^2\):

figure b

The output confirms that the Fermat quintic curve given by \(x_1^5+x_2^5+x_3^5 = 0\) has ED degree \(23\). By contrast, as seen from Corollary 2.10, a general curve of degree five in \(\mathbb {P}^2\) has ED degree \(25\). Saturating with \(I_{X_\mathrm{sing}}\) alone in the fourth line of the code would yield \(25\). \(\diamondsuit \)

It should be stressed that the ideals (2.1) and (2.7), and our two Macaulay2 code fragments, are blueprints for first computations. In order to succeed with larger examples, it is essential that these formulations be refined. For instance, to express rank conditions on a polynomial matrix \(M\), the determinantal constraints are often too large, and it is better to add a matrix equation of the form \(\Lambda \cdot M = 0\), where \(\Lambda \) is a matrix filled with new unknowns. This leads to a system of bilinear equations, so the methods of Faugère et al. [16] can be used. We also recommend trying tools from numerical algebraic geometry, such as Bertini [3].

3 First Applications

The problem of computing the closest point on a variety arises in numerous applications. In this section, we discuss some concrete instances, and we explore the ED degree in each case.

Example 3.1

(Geometric modeling) Thomassen et al. [44] study the nearest point problem for a parametrized surface \(X\) in \(\mathbb {R}^3\). The three coordinates of their birational map \(\psi : \mathbb {R}^2 \rightarrow X \subseteq \mathbb {R}^3\) are polynomials in the parameters \((t_1,t_2)\) that have degree \(d_1\) in \(t_1\) and degree \(d_2\) in \(t_2\). The image \(X = \overline{\psi (\mathbb {R}^2)}\) is a Bézier surface of bidegree \((d_1,d_2)\). It is shown in [44, §3] that

$$\begin{aligned} \mathrm{EDdegree}(X) = 4d_1 d_2 + (2d_1-1)(2d_2-1). \end{aligned}$$

This refinement of the Bézout bound in (2.5) is the intersection number in \(\mathbb {P}^1 \times \mathbb {P}^1\) of a curve of bidegree \((2d_1-1,d_2)\) with a curve of bidegree \((d_1,2d_2-1)\). The authors of [44] demonstrate how to solve the critical equations \(\partial D_u /\partial t_1 = \partial D_u/ \partial t_2 = 0\) with resultants based on moving surfaces. \(\diamondsuit \)

Example 3.2

(The closest symmetric matrix) Let \(X\) denote the variety of symmetric \(s \times s\)-matrices of rank \(\le r\). The nearest point problem for \(X\) asks the following question: given a symmetric \(s \times s\)-matrix \(U = (U_{ij})\), find the symmetric rank \(r\) matrix \(U^*\) that is closest to \(U\). There are two natural interpretations of this question in the Euclidean distance context. The difference lies in which of the following two functions we are minimizing:

$$\begin{aligned} D_U = \sum _{i=1}^s \sum _{j=1}^s \left( U_{ij} - \sum _{k=1}^r t_{ik} t_{kj} \right) ^2 \qquad \mathrm{or} \qquad D_U = \sum _{1 \le i \le j \le s} \left( U_{ij} - \sum _{k=1}^r t_{ik} t_{kj} \right) ^2. \end{aligned}$$
(3.1)

These unconstrained optimization problems use the parametrization of symmetric \(s \times s\)-matrices of rank \(r\) that comes from multiplying an \(s \times r\) matrix \(T = (t_{ij})\) with its transpose. The two formulations are dramatically different as far as the ED degree is concerned. On the left side, the Eckart–Young Theorem applies, and \(\mathrm{EDdegree}(X) = \left( {\begin{array}{c}s\\ r\end{array}}\right) \) as in Example 2.3. On the right side, \(\mathrm{EDdegree}(X)\) is much larger than \(\left( {\begin{array}{c}s\\ r\end{array}}\right) \). For instance, for \(s=3\) and \(r= 1 \ \mathrm{or} \ 2\),

$$\begin{aligned} \qquad \mathrm{EDdegree}(X) = 3 \qquad \hbox {and} \qquad \mathrm{EDdegree}(X) = 13. \end{aligned}$$
(3.2)

The two ideals that represent the constrained optimization problems equivalent to (3.1) are

$$\begin{aligned}&\biggl \langle 2 {\times } 2\hbox {-minors of } \begin{pmatrix} \!\!\sqrt{2}x_{11} &{}\quad \! x_{12} &{}\quad \! x_{13} \\ \!\!x_{12} &{}\quad \! \sqrt{2}x_{22} &{}\quad \! x_{23} \\ \!\!x_{13} &{} \quad \! x_{23} &{}\quad \! \sqrt{2} x_{33} \end{pmatrix} \biggr \rangle \hbox { and } \biggl \langle 2 {\times } 2\hbox {-minors of } \begin{pmatrix} x_{11} &{} \quad \! x_{12} &{} \quad \! x_{13} \\ x_{12} &{} \quad \! x_{22} &{} \quad \! x_{23} \\ x_{13} &{} \quad \! x_{23} &{} \quad \! x_{33} \end{pmatrix} \biggr \rangle . \end{aligned}$$
(3.3)

These equivalences can be seen via a change of variables. For example, for the left ideal in (3.3), the constrained optimization problem is to minimize \(\sum _{1 \le i \le j \le 3} (u_{ij}-x_{ij})^2\) subject to the nine quadratic equations \(2x_{11}x_{22}=x_{12}^2, \sqrt{2}x_{11}x_{23}=x_{12}x_{13}, \ldots ,2x_{22}x_{33}=x_{23}^2\). Now making the change of variables \(x_{ii} = X_{ii}\) for \( i=1,2,3\) and \(x_{ij} = \sqrt{2}X_{ij}\) for \(1 \le i < j \le 3\), and similarly, \(u_{ii} = U_{ii}\) for \(i=1,2,3\) and \(u_{ij} = \sqrt{2}U_{ij}\) for \(1 \le i < j \le 3\), we get the problem

$$\begin{aligned} \begin{array}{l@{\quad }l} \hbox {minimize} &{}\quad \sum _{i=1}^{3} (U_{ii}-X_{ii})^2 + \sum _{1 \le i < j \le 3} 2(U_{ij}-X_{ij})^2\\ \hbox {subject to} &{} \quad X_{ik}X_{jl} = X_{il}X_{jk} \quad \mathrm{for}\,\, 1 \le i < j \le 3, 1 \le k < l \le 3. \end{array} \end{aligned}$$

This is equivalent to the left problem in (3.1) for \(r=1\) via the parametrization \(X_{ij} = t_it_j\). The appearance of \(\sqrt{2}\) in the left matrix \(M(x)\) in (3.3) is analogous to the appearance of \(\sqrt{3}\) in Example 2.7: it is the special scaling that relates the natural squared matrix norm \({\mathrm {tr}} M(x)^T M(x)\) on the ambient space to (two times) the squared norm \(||x||^2\), and this puts the variety defined by the \(2 \times 2\)-minors of \(M(x)\) into special position relative to the isotropic quadric \(Q\). In Example 5.6, we discuss a general ED degree formula for symmetric \(s \times s\)-matrices of rank \(\le r\) that works for the version on the right. The same issue for ML degrees is the difference between “scaled” and “unscaled” in the table at the end of [26, §5]. \(\diamondsuit \)

Example 3.3

(Computer vision) This article got started with the following problem from [1, 22, 41]. A general projective camera is a \(3 \times 4\) real matrix of rank three, that defines a linear map from \(\mathbb {P}^3\) to \(\mathbb {P}^2\) sending a “world point” \(y \in \mathbb {P}^3\) to its image \(Ay \in \mathbb {P}^2\). This map is well-defined everywhere except at the kernel of \(A\), which is called the center of the camera.

The multiview variety associated to \(n\) cameras \(A_1,A_2,\ldots ,A_n\) is the closure of the image of the map \(\mathbb {P}^3 \dashrightarrow (\mathbb {P}^2)^n , y \mapsto (A_1 y, A_2 y, \ldots , A_n y)\). This is an irreducible threefold in \((\mathbb {P}^2)^n\) and its defining prime ideal \(I_n\) is multi-homogeneous and lives in the polynomial ring \(\mathbb {R}[x_{ij}: i=1,\ldots ,n, j=0,1,2]\), where \((x_{i0} : x_{i1}:x_{i2})\) are homogeneous coordinates of the \(i\)-th plane \(\mathbb {P}^2\). Explicit determinantal generators and Gröbner bases for \(I_n\) are derived in [1]. If we dehomogenize \(I_n\) by setting \(x_{i0} = 1\) for \(i=1,2,\ldots ,n\), then we get a three-dimensional affine variety \(X_n\) in \(\mathbb {R}^{2n} = (\mathbb {R}^2)^n\) that is the space of dehomogenized images under the \(n\) cameras. Note that \(I_n\) and \(X_n\) depend on the choice of the matrices \(A_1,A_2,\ldots ,A_n\). This dependence is governed by the Hilbert scheme in [1].

The Euclidean distance problem for \(X_n\) is known in computer vision as \(n\) -view triangulation. Following [22] and [41], the data \(u \in \mathbb {R}^{2n}\) are \(n\) noisy images of a point in \(\mathbb {R}^3\) taken by the \(n\) cameras. The maximum likelihood solution of the recovery problem with Gaussian noise is the configuration \(u^* \in X_n\) of minimum distance to \(u\). For \(n=2\), the variety \(X_2\) is a hypersurface cut out by a bilinear polynomial \((1,x_{11}, x_{12}) M (1,x_{21},x_{22})^T\), where \(M\) is a \(3 \times 3\)-matrix of rank \(2\). Hartley and Sturm [22] studied the critical equations and found that \(\mathrm{EDdegree}(X_2) = 6\). Their computations were extended by Stewénius et al. [41] up to \(n=7\):

$$\begin{aligned} \begin{array}{r|@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r} n &{} 2 &{} 3 &{} 4 &{} 5 &{} 6 &{} 7 \\ \hline \mathrm{EDdegree}(X_n) &{} 6 &{} 47 &{} 148 &{} 336 &{} 638 &{} 1081 \end{array} \end{aligned}$$

This table suggests the conjecture that these ED degrees grow as a cubic polynomial:

Conjecture 3.4

The Euclidean distance degree of the affine multiview variety \(X_n\) equals

$$\begin{aligned} \mathrm{EDdegree}(X_n) \quad = \quad \frac{9}{2} n^3 - \frac{21}{2} n^2 + 8 n - 4. \end{aligned}$$

At present, we do not know how to prove this. Our first idea was to replace the affine threefold \(X_n\) by a projective variety. For instance, consider the closure \(\overline{X}_n\) of \(X_n\) in \(\mathbb {P}^{2n}\). Alternatively, we can regard \(I_n\) as a homogeneous ideal in the usual \(\mathbb {Z}\)-grading, thus defining a projective variety \(Y_n\) in \(\mathbb {P}^{3n-1}\). However, for \(n \ge 3\), the ED degrees of both \(\overline{X}_n\) and \(Y_n\) are larger than the ED degree of \(X_n\). For instance, in the case of three cameras, we have

$$\begin{aligned} \mathrm{EDdegree}(X_3) = 47 < \mathrm{EDdegree}(\overline{X}_3) = 112 < \mathrm{EDdegree}(Y_3) = 148. \end{aligned}$$

Can one find a natural reformulation of Conjecture 3.4 in terms of projective geometry? \(\diamondsuit \)

Many problems in engineering lead to minimizing the distance from a given point \(u\) to an algebraic variety. One such problem is detecting voltage collapse and blackouts in electrical power systems [36, p. 94]. It is typical to model a power system as a differential equation \(\dot{x} = f(x,\lambda )\) where \(x\) is the state and \(\lambda \) is the parameter vector of load powers. As \(\lambda \) varies, the state moves around. At critical load powers, the system can lose equilibrium and this results in a blackout due to voltage collapse. The set of critical \(\lambda \)’s form an algebraic variety \(X\) that one wants to stay away from. This is done by calculating the closest point on \(X\) to the current set of parameters \(\lambda _0\) used by the power system. A similar, and very well-known, problem from control theory is to ensure the stability of a univariate polynomial.

Example 3.5

(Hurwitz stability) Consider a univariate polynomial with real coefficients,

$$\begin{aligned} u(z) = u_0 z^n + u_1 z^{n-1} + u_2 z^{n-2} + \cdots + u_{n-1} z + u_n. \end{aligned}$$

We say that \(u(z)\) is stable if each of its \(n\) complex zeros has negative real part. It is an important problem in control theory to check whether a given polynomial \(u(z)\) is stable, and, if not, to find a polynomial \(x(z)\) in the closure of the stable locus that is closest to \(u(z)\).

The stability of \(x(z) = \sum _{i=0}^n x_i z^i\) is characterized by the following Hurwitz test. The \(n\)th Hurwitz matrix is an \(n \times n\) matrix with \(x_1,\ldots ,x_n\) on the diagonal. Above the diagonal entry \(x_i\) in column \(i\), we stack as much of \(x_{i+1}, x_{i+2}, \ldots ,x_n\) consecutively, followed by zeros if there is extra room. Similarly, below \(x_i\), we stack as much of \(x_{i-1}, x_{i-2}, \ldots ,x_1,x_0\) consecutively, followed by zeros if there is extra room. The Hurwitz test says that \(x(z)\) is stable if and only if every leading principal minor of \(H_n\) is positive. For instance, for \(n = 5\) we have

$$\begin{aligned} H_5 = \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} x_1 &{} x_3 &{} x_5 &{} 0 &{} 0\\ x_0 &{} x_2 &{} x_4 &{} 0 &{} 0 \\ 0 &{} x_1 &{} x_3 &{} x_5 &{} 0 \\ 0 &{} x_0 &{} x_2 &{} x_4 &{} 0 \\ 0 &{} 0 &{} x_1 &{} x_3 &{} x_5 \end{array} \right) . \end{aligned}$$

The ratio \(\bar{\varGamma }_n = \mathrm{det}(H_n)/x_n\), which is the \((n-1)\)st leading principal minor of \(H_n\), is a homogeneous polynomial in the variables \(x_0,\ldots , x_{n-1}\) of degree \(n-1\). Let \(\varGamma _n\) denote the non-homogeneous polynomial obtained by setting \(x_0 = 1\) in \(\bar{\varGamma }_n\). We refer to \(\varGamma _n\) and \(\bar{\varGamma }_n\) as the non-homogeneous and homogeneous Hurwitz determinant, respectively. Table 1 shows the ED degrees and the average ED degrees of both \(\varGamma _n\) and \(\bar{\varGamma }_n\) for some small values of \(n\). The average ED degree was computed with respect to the standard multivariate Gaussian distribution in \(\mathbb {R}^n\) or \(\mathbb {R}^{n+1}\) centered at the origin. For the formal definition of \(\mathrm {aEDdegree}( \cdot )\) see Sect. 4. The first two columns in Table 1 seem to be oscillating by parity. Theorem 3.6 explains this. Interestingly, the oscillating behavior does not occur for average ED degree. \(\diamondsuit \)

Table 1 ED degrees and average and ED degrees of small Hurwitz determinants

Theorem 3.6

The ED degrees of the Hurwitz determinants are given by the following table:

Proof

The hypersurface \(X = V(\bar{\varGamma }_n)\) defines the boundary of the stability region. If a polynomial \(x(z)\) lies on \(X\), then it has a complex root on the imaginary axis, so it admits a factorization \(x(z) = (c z^2+d)(b_0 z^{n-2}+\cdots +b_{n-2})\). This representation yields a parametrization of the hypersurface \(X \subset \mathbb {P}^n\) with parameters \(b_0,\ldots ,b_{n-2},c,d\). We can rewrite this as

$$\begin{aligned} x :=\begin{bmatrix} x_0 \\ x_{1} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} c&\quad \!&\quad \!&\quad \!&\quad \!\\ 0&\quad \! c&\quad \!&\quad \!&\quad \!\\ d&\quad \! 0&\quad \! c&\quad \!&\quad \!\\&\quad \! \ddots&\quad \! \ddots&\quad \! \ddots&\quad \!\\&\quad \!&\quad \! d&\quad \! 0&\quad \! c \\&\quad \!&\quad \!&\quad \! d&\quad \! 0 \\&\quad \!&\quad \!&\quad \!&\quad \! d \end{bmatrix} \cdot \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-2} \end{bmatrix} =: C \cdot b. \end{aligned}$$

where this parametrization is regular and \(x\) is a smooth point of \(X\), the tangent space \(T_x X\) is spanned by the columns of \(C\) and the vectors \(b',b''\) obtained by appending or prepending two zeros to \(b\), respectively. Thus, for \(u \in \mathbb {C}^{n+1}\), the condition \(u-Cb \perp T_x X\) translates into

$$\begin{aligned} C^T (u-Cb)=0 \text { and } (b')^T (u-Cb)=0 \text { and } (b'')^T (u-Cb)=0. \end{aligned}$$

The first equation expresses \(b\) as a rational homogenous function in \(c,d\), namely, \(b=b(c,d)=(C^T C)^{-1} C^T u\). By Cramer’s rule, the entries of the inverse of a matrix are homogeneous rational functions of degree \(-1\). Hence, the degree of \(b(c,d)\) equals \(-2+1=-1\). Let \(\gamma =\gamma (c,d)\) be the denominator of \((C^T C)^{-1}\), i.e., the lowest-degree polynomial in \(c,d\) for which \(\gamma \cdot (C^T C)^{-1}\) is polynomial; and let \(N\) be the degree of \(\gamma \). Then, \(\gamma b'\) has entries that are homogeneous polynomials in \(c,d\) of degree \(N-1\). Similarly, \(\gamma u-\gamma Cb\) has degree \(N\). Hence,

$$\begin{aligned} p(c,d):=(\gamma b') \cdot (\gamma u- \gamma Cb) \quad \text { and } \quad q(c,d) :=(\gamma b'') \cdot (\gamma u- \gamma Cb) \end{aligned}$$

are homogeneous polynomials of degree \(2N-1\) that vanish on the desired points \((c:d) \in \mathbb {P}^1\). Indeed, if \(p\) and \(q\) vanish on \((c:d)\) and \(\gamma (c,d)\) is nonzero, then there is a unique \(b\) that makes \((b,c,d)\) critical for the data \(u\). It turns out that \(p\) is divisible by \(d\), that \(q\) is divisible by \(c\), and that \(p/d=q/c\). Thus, \(2N-2\) is an upper bound for \(\mathrm {EDdegree}(X)\).

To compute \(\gamma \), note that \(C^T C\) decomposes into two blocks, corresponding to even and odd indices. When \(n=2m+1\) is odd, these two blocks are identical, and \(\gamma \) equals their determinant, which is \(c^{2m} + c^{2m-2}d^2 + \cdots + d^{2m}\). Hence \(N = 2m\). When \(n=2m\) is even, the two blocks are distinct, and \(\gamma \) equals the product of their determinants, which is \((c^{2m} + c^{2m-2}d^2 + \cdots + d^{2m})(c^{2m-2}+\cdots +d^{2m-2})\). Hence \(N = 4m-2\). In both cases, one can check that \(p/d\) is irreducible, and this implies that \(\mathrm {EDdegree}(X)=2N-2\). This establishes the stated formula for \(\mathrm{EDdegree}(\bar{\varGamma }_n)\). A similar computation can be performed in the non-homogeneous case, by setting \(x_0 = b_0 = c = 1\), leading to the formula for \(\mathrm{EDdegree}(\varGamma _n)\).\(\square \)

Example 3.7

(Interacting agents) This concerns a problem we learned from work of Anderson and Helmke [2]. Let \(X\) denote the variety in \(\mathbb {R}^{\left( {\begin{array}{c}p\\ 2\end{array}}\right) }\) with parametric representation

$$\begin{aligned} d_{ij} = (z_i-z_j)^2 \quad \hbox {for} \quad 1 \le i < j \le p. \end{aligned}$$
(3.4)

Thus, the points in \(X\) record the squared distances among \(p\) interacting agents with coordinates \(z_1,z_2,\ldots ,z_p\) on the line \(\mathbb {R}^1\). Note that \(X\) is the cone over a projective variety in \(\mathbb {P}^{{\left( {\begin{array}{c}p\\ 2\end{array}}\right) }-1}\). The prime ideal of \(X\) is given by the \(2 \times 2\)-minors of the Cayley–Menger matrix

$$\begin{aligned} \begin{bmatrix} 2 d_{1p}&\quad \! d_{1p} {+} d_{2p} {-} d_{12}&\quad \! d_{1p} {+} d_{3p} {-} d_{13}&\quad \! \cdots&\quad \! d_{1p} {+} d_{p-1,p} {-} d_{1,p-1} \\ d_{1p} {+} d_{2p} {-} d_{12}&\quad \! 2 d_{2p}&\quad \! d_{2p} {+} d_{3p} {-} d_{23}&\quad \! \cdots&\quad \! d_{2p} {+} d_{p-1,p} {-} d_{2,p-1} \\ d_{1p} {+} d_{3p} {-} d_{13}&\quad \! d_{2p} {+} d_{3p} {-} d_{23}&\quad \! 2 d_{3p}&\quad \! \cdots&\quad \! d_{3p} {+} d_{p-1,p} {-} d_{3,p-1} \\ \vdots&\quad \! \vdots&\quad \! \vdots&\quad \! \ddots&\quad \!\vdots \\ d_{1p} {+} d_{p-1,p} {-} d_{1,p-1} \!&\quad \! \! d_{2p} {+} d_{p-1,p} {-} d_{2,p-1} \!&\quad \! \! d_{3p} {+} d_{p-1,p} {-} d_{3,p-1} \!&\quad \! \! \cdots&\quad \! 2 d_{p-1,p} \end{bmatrix} \end{aligned}$$
(3.5)

Indeed, under the parametrization (3.4), the \((p-1) \times (p-1)\) matrix (3.5) factors as \(2Z^T Z\), where \(Z\) is the row vector \((z_1{-}z_p,z_2{-}z_p, z_3{-}z_p,\ldots ,z_{p-1}{-}z_p)\). We can form the Cayley–Menger matrix (3.5) for any finite metric space on \(p\) points. The metric space can be embedded in a Euclidean space if and only if (3.5) is positive semidefinite [31, (8)]. That Euclidean embedding is possible in dimension \(r\) if and only if the rank of (3.5) is at most \(r\).

The following theorem is inspired by [2] and provides a refinement of results therein. In particular, it explains the findings in [2, §4] for \(p \le 4\). There is an extra factor of \(1/2\) because of the involution \(z \mapsto -z\) on the fibers of the map (3.4). For instance, for \(p = 4\), our formula gives \(\mathrm{EDdegree}(X) = 13\) while [2, Theorem 13] reports \(26\) nonzero critical points. The most interesting case occurs when \(p\) is divisible by \(3\), and this will be explained in the proof.

Theorem 3.8

The ED degree of the Cayley–Menger variety \(X \subset \mathbb {P}^{\left( {\begin{array}{c}p\\ 2\end{array}}\right) -1}\) equals

$$\begin{aligned} \mathrm{EDdegree}(X) = {\left\{ \begin{array}{ll} \frac{3^{p-1} - 1}{2} &{} \hbox { if }\,p \equiv 1,2 \!\mod 3 \\ \frac{3^{p-1} - 1}{2} - \frac{p!}{3 ((p/3)!)^3}&{} \hbox { if }\,p \equiv 0 \! \mod 3 \end{array}\right. } \end{aligned}$$
(3.6)

Proof

After the linear change of coordinates given by \(x_{ii} = 2 d_{ip} \) and \(x_{ij} = d_{ip} + d_{jp} - d_{ij}\), the Cayley–Menger variety \(X\) agrees with the variety of symmetric \((p-1) \times (p-1)\)-matrices of rank \(1\). This is the Veronese variety of order \(d=2\). The number \((3^{p-1}-1)/2\) is a special instance of the formula in Proposition 7.10. To show that it is valid here, we need to prove that \(X\) intersects the isotropic quadric \(Q\) transversally, i.e., the intersection \(X \cap Q\) is non-singular. If there are isolated nodal singular points, then their number gets subtracted.

The parametrization (3.4) defines the second Veronese embedding \(\mathbb {P}^{p-2} \rightarrow X \subset \mathbb {P}^{\left( {\begin{array}{c}p\\ 2\end{array}}\right) -1}\), where \(\mathbb {P}^{p-2}\) is the projective space of the quotient \(\mathbb {C}^p/\mathbb {C}\cdot (1,\ldots ,1)\). So \(X \cap Q\) is isomorphic to its inverse image in \(\mathbb {P}^{p-2}\) under this map. That inverse image is the hypersurface in \(\mathbb {P}^{p-2}\) defined by the homogeneous quartic \( f = \sum _{1 \le i < j \le p} (z_i - z_j)^4\). We need to analyze the singular locus of the hypersurface \(V(f)\) in \(\mathbb {P}^{p-2}\), which is the variety defined by all partial derivatives of \(f\). Arguing modulo \(3\) one finds that if \(p\) is not divisible by \(3\) then \(V(f)\) is smooth, and then we have \(\mathrm{EDdegree}(X) =(3^{p-1}-1)/2\). If \(p\) is divisible by \(3\) then \(V(f)\) is not smooth, but \(V(f)_{\mathrm sing}\) consists of isolated nodes that form one orbit under permuting coordinates. One representative is the point in \(\mathbb {P}^{p-2}\) represented by the vector

$$\begin{aligned} \left( 0,0, \ldots , 0, 1,1, \ldots ,1, \, \xi , \xi , \ldots , \xi \right) \in \mathbb {C}^p \quad \hbox {where } \xi ^2-\xi +1 = 0 . \end{aligned}$$

The number of singular points of the quartic hypersurface \(V(f)\) is equal to

$$\begin{aligned} \frac{p!}{3 \cdot ((p/3)!)^3}. \end{aligned}$$

For \(p>0\), this is the number of words that start with the first letter of the ternary alphabet \(\{0,1,\xi \}\) and that contain each letter exactly \(p\) times; see [34, A208881].\(\square \)

4 ED Correspondence and Average ED Degree

The ED correspondence arises when the variety \(X\) is fixed but the data point \(u\) varies. After studying this, we restrict to the real numbers, and we introduce the average ED degree, making precise a notion that was hinted at in Example 3.5. The ED correspondence yields an integral formula for \(\mathrm {aEDdegree}(X)\). This integral can sometimes be evaluated in closed form. In other cases, experiments show that evaluating the integral numerically is more efficient than estimating \(\mathrm {aEDdegree}(X)\) by sampling \(u\) and counting real critical points.

We start with an irreducible affine variety \(X \subset \mathbb {C}^n\) of codimension \(c\) that is defined over \(\mathbb {R}\), with prime ideal \(I_X = \langle f_1,\ldots ,f_s \rangle \) in \(\mathbb {R}[x_1,\ldots ,x_n]\). The ED correspondence \(\mathcal {E}_X\) is the subvariety of \(\mathbb {C}^n \times \mathbb {C}^n\) defined by the ideal (2.1) in the polynomial ring \(\mathbb {R}[x_1,\ldots ,x_n,u_1,\ldots ,u_n]\). Now, the \(u_i\) are unknowns that serve as coordinates on the second factor in \(\mathbb {C}^n \times \mathbb {C}^n\). Geometrically, \(\mathcal {E}_X\) is the topological closure in \(\mathbb {C}^n \times \mathbb {C}^n\) of the set of pairs \((x,u)\) such that \(x \in X \backslash X_\mathrm{sing}\) is a critical point of \(d_u\). The following theorem implies and enriches Lemma 2.1.

Theorem 4.1

The ED correspondence \(\mathcal {E}_X\) of an irreducible subvariety \(X \subseteq \mathbb {C}^n\) of codimension \(c\) is an irreducible variety of dimension \(n\) inside \(\mathbb {C}^n \times \mathbb {C}^n\). The first projection \( \pi _1 : \mathcal {E}_X \rightarrow X \subset \mathbb {C}^n\) is an affine vector bundle of rank \(c\) over \(X \backslash X_\mathrm{sing}\). Over general data points \(u \in \mathbb {C}^n\), the second projection \(\pi _2 : \mathcal {E}_X \rightarrow \mathbb {C}^n\) has finite fibers \(\pi _2^{-1}(u)\) of cardinality equal to \(\mathrm {EDdegree}(X)\). If, moreover, we have \(T_x X \cap (T_x X)^\perp = \{0\}\) at some point \(x \in X \backslash X_\mathrm {sing}\), then \(\pi _2\) is a dominant map and \(\mathrm {EDdegree}(X)\) is positive.

In our applications, the variety \(X\) always has real points that are smooth, i.e., in \(X\backslash X_\mathrm{sing}\). If this holds, then the last condition in Theorem 4.1 is automatically satisfied: the tangent space at such a point is real and intersects its orthogonal complement trivially. But, for instance, the hypersurface \( Q = V(x_1^2 + \cdots + x_n^2)\) does not satisfy this condition: at any point \(x \in Q\) the tangent space \(T_x Q=x^\perp \) intersects its orthogonal complement \(\mathbb {C}x\) in all of \(\mathbb {C}x\).

Proof

The affine vector bundle property follows directly from the system (1.1) or, alternatively, from the matrix representation (2.1): fixing \(x \in X \backslash X_\mathrm{sing}\), the fiber \(\pi _1^{-1}(x)\) equals \(\{x\} \times (x+(T_x X)^\perp )\), where the second factor is an affine space of dimension \(c\) varying smoothly with \(x\). Since \(X\) is irreducible, so is \(\mathcal {E}_X\), and its dimension equals \((n-c)+c=n\). For dimension reasons, the projection \(\pi _2\) cannot have positive-dimensional fibers over general data points \(u\), so those fibers are generically finite sets, of cardinality equal to \(\mathrm {EDdegree}(X)\).

For the last statement, note that the diagonal \(\Delta (X):=\{(x,x) \in \mathbb {C}^n \times \mathbb {C}^n \mid x \in X\}\) is contained in \(\mathcal {E}_X\). Fix a point \(x \in X \backslash X_\mathrm {sing}\) for which \(T_x X \cap (T_x X)^\perp = \{0\}\). Being an affine bundle over \(X \backslash X_\mathrm {sing}\), \(\mathcal {E}_X\) is smooth at the point \((x,x)\). The tangent space \(T_{(x,x)} \mathcal {E}_X\) contains both the tangent space \(T_{(x,x)} \Delta (X)= \Delta (T_x X)\) and \(\{0\} \times (T_xX)^\perp \). Thus, the image of the derivative at \(x\) of \(\pi _2: \mathcal {E}_X \rightarrow \mathbb {C}^n\) contains both \(T_x X\) and \((T_x X)^\perp \). Since these spaces have complementary dimensions and intersect trivially by assumption, they span all of \(\mathbb {C}^n\). Thus, the derivative of \(\pi _2\) at \((x,x)\) is surjective onto \(\mathbb {C}^n\), and this implies that \(\pi _2\) is dominant.\(\square \)

Corollary 4.2

If \(X\) is (uni-)rational then so is the ED correspondence \(\mathcal {E}_X\).

Proof

Let \(\psi : \mathbb {C}^{m} \rightarrow \mathbb {C}^n\) be a rational map that parametrizes \(X\), where \(m=\dim X=n-c\). Its Jacobian \(J(\psi )\) is an \(n \times m\)-matrix of rational functions in the standard coordinates \(t_1,\ldots ,t_m\) on \(\mathbb {C}^m\). The columns of \(J(\psi )\) span the tangent space of \(X\) at the point \(\psi (t)\) for general \(t \in \mathbb {C}^m\). The left kernel of \( J(\psi )\) is a linear space of dimension \(c\). We can write down a basis \(\{\beta _1 (t), \ldots ,\beta _{c}(t)\}\) of that kernel by applying Cramer’s rule to the matrix \(J(\psi )\). In particular, the \(\beta _j\) will also be rational functions in the \(t_i\). Now the map

$$\begin{aligned} \mathbb {C}^{m} \times \mathbb {C}^{c} \rightarrow \mathcal {E}_X,\ (t,s) \mapsto \biggl (\psi (t), \psi (t)+\sum _{i=1}^{c} s_i \beta _i(t) \biggr ) \end{aligned}$$

is a parametrization of \(\mathcal {E}_X\), which is birational if and only if \(\psi \) is birational.\(\square \)

Example 4.3

The twisted cubic cone \(X\) from Example 2.7 has the parametrization \(\psi : \mathbb {C}^2 \rightarrow \mathbb {C}^4,\, (t_1,t_2) \mapsto (t_1^3,t_1^2t_2,t_1t_2^2,t_2^3)\). We saw that \(\mathrm{EDdegree}(X) = 7\). Here is a parametrization of the ED correspondence \(\mathcal {E}_X\) that is induced by the construction in the proof above:

$$\begin{aligned}&\mathbb {C}^2 \times \mathbb {C}^2 \,\rightarrow \, \mathbb {C}^4 \times \mathbb {C}^4 , \quad ((t_1,t_2),(s_1,s_2)) \, \mapsto \big ( (t_1^3 , t_1^2 t_2 , t_1 t_2^2 , t_2^3), \\&\quad (t_1^3 + s_1 t_2^2, t_1^2 t_2 - 2 s_1 t_1 t_2 + s_2 t_2^2, t_1 t_2^2 + s_1 t_1^2 - 2 s_2 t_1 t_2, t_2^3 + s_2 t_1^2) \big ). \end{aligned}$$

The prime ideal of \(\mathcal {E}_X\) in \(\mathbb {R}[x_1,x_2,x_3,x_4,u_1,u_2,u_3,u_4]\) can be computed from (2.1). It is minimally generated by seven quadrics and one quartic. It is important to note that these generators are homogeneous with respect to the usual \(\mathbb {Z}\)-grading but not bi-homogeneous.

The formulation (2.7) leads to the subideal generated by all bi-homogeneous polynomials that vanish on \(\mathcal {E}_X\). It has six minimal generators, three of degree \((2,0)\) and three of degree \((3,1)\). Geometrically, this corresponds to the variety \(\mathcal {P} \mathcal {E}_X \subset \mathbb {P}^3 \times \mathbb {C}^4\) we introduce next. \(\diamondsuit \)

If \(X\) is an affine cone in \(\mathbb {C}^n\), we consider the closure of the image of \(\mathcal {E}_X \cap ((\mathbb {C}^n \backslash \{0\}) \times \mathbb {C}^n)\) under the map \((\mathbb {C}^n \backslash \{0\}) \times \mathbb {C}^n \rightarrow \mathbb {P}^{n-1} \times \mathbb {C}^n,\ (x,u) \mapsto ([x],u)\). This closure is called the projective ED correspondence of \(X\), and it is denoted \(\mathcal {P} \mathcal {E}_X\). It has the following properties.

Theorem 4.4

Let \(X \subseteq \mathbb {C}^n\) be an irreducible affine cone not contained in the isotropic quadric \(Q\). Then, the projective ED correspondence \(\mathcal {P} \mathcal {E}_X\) of \(X\) is an \(n\)-dimensional irreducible variety in \(\mathbb {P}^{n-1} \times \mathbb {C}^n\). It is the zero set of the ideal (2.7). Its projection onto the projective variety in \(\mathbb {P}^{n-1}\) given by \(X\) is a vector bundle over \(X \backslash (X_\mathrm {sing}\cup Q)\) of rank \(c+1\). The fibers over general data points \(u\) of its projection onto \(\mathbb {C}^n\) are finite of cardinality equal to \(\mathrm {EDdegree}(X)\).

Proof

The first statement follows from Lemma 2.8: let \(x \in X \backslash (X_\mathrm {sing}\cup Q)\) and \(u \in \mathbb {C}^n\). First, if \((x,u) \in \mathcal {E}_X\), then certainly \(([x],u)\) lies in the variety of the ideal (2.1). Conversely, if \(([x],u)\) lies in the variety of that ideal, then there exists a (unique) \(\lambda \) such that \((\lambda x, u) \in \mathcal {E}_X\). If \(\lambda \) is nonzero, then this means that \(([x],u)\) lies in the projection of \(\mathcal {E}_X\). If \(\lambda \) is zero, then \(u \perp T_x X\) and hence \((\epsilon x, \epsilon x+u) \in \mathcal {E}_X\) for all \(\epsilon \in \mathbb {C}\). The limit of \(([\epsilon x],\epsilon x + u)\) for \(\epsilon \rightarrow 0\) equals \(([x],u)\), so the latter point still lies in the closure of the image of \(\mathcal {E}_X\), i.e., in the projective ED correspondence. The remaining statements are proved as in the proof of Theorem 4.1.\(\square \)

We now turn our attention to the average ED degree of a real affine variety \(X\) in \(\mathbb {R}^n\). In applications, the data point \(u\) also lies in \(\mathbb {R}^n\), and \(u^*\) is the unique closest point to \(u\) in \(X\). The quantity \(\mathrm {EDdegree}(X)\) measures the algebraic complexity of writing the optimal solution \(u^*\) as a function of the data \(u\). But when applying other, non-algebraic methods for finding \(u^*\), the number of real-valued critical points of \(d_u\) for randomly sampled data \(u\) is of equal interest. In contrast with the number of complex-valued critical points, this number is typically not constant for all general \(u\), but rather constant on the connected components of the complement of an algebraic hypersurface \(\Sigma _X \subset \mathbb {R}^n\), which we call the ED discriminant. To get, nevertheless, a meaningful count of the critical points, we propose to average over all \(u\) with respect to a measure on \(\mathbb {R}^n\). In the remainder of this section, we describe how to compute that average using the ED correspondence. Our method is particularly useful in the setting of Corollary 4.2, i.e., when \(X\) and hence \(\mathcal {E}_X\) have rational parametrizations.

We equip data space \(\mathbb {R}^n\) with a volume form \(\omega \) whose associated density \(|\omega |\) satisfies \(\int _{\mathbb {R}^n} |\omega |=1\). A common choice for \(\omega \) is the standard multivariate Gaussian \(\frac{1}{(2\pi )^{n/2}} e^{-||x||^2/2}\, \mathrm {d}x_1 \wedge \cdots \wedge \mathrm {d}x_n\). This choice is natural when \(X\) is an affine cone: in that case, the origin \(0\) is a distinguished point in \(\mathbb {R}^n\), and the number of real critical points will be invariant under scaling \(u\). Now we ask for the expected number of critical points of \(d_u\) when \(u\) is drawn from the probability distribution on \(\mathbb {R}^n\) with density \(|\omega |\). This average ED degree of the pair \((X,\omega )\) is

$$\begin{aligned} \mathrm {aEDdegree}(X,\omega ) := \int _{\mathbb {R}^n} \#\{\text {real critical points of } d_u \text { on } X \} \cdot |\omega |. \end{aligned}$$
(4.1)

In the formulas below, we write \(\mathcal {E}_X\) for the set of real points of the ED correspondence. Using the substitution rule from multivariate calculus, we rewrite the integral in (4.1) as follows:

$$\begin{aligned} \mathrm {aEDdegree}(X,\omega ) = \int _{\mathbb {R}^n} \# \pi _2^{-1}(u) \cdot |\omega | = \int _{\mathcal {E}_X} |\pi _2^*(\omega )|, \end{aligned}$$
(4.2)

where \(\pi _2^*(\omega )\) is the pullback of the volume form \(\omega \) along the derivative of the map \(\pi _2\).

See Fig. 2 for a cartoon illustrating the computation in (4.2). Note that \(\pi _2^*(\omega )\) need not be a volume form since it may vanish at some points—namely, at the ramification locus of \(\pi _2\), i.e., at points where the derivative of \(\pi _2\) is not of full rank. This ramification locus is typically an algebraic hypersurface in \(\mathcal {E}_X\) and equal to the inverse image of the ED discriminant \(\Sigma _X\). The usefulness of the formula (4.2), and a more explicit version of it to be derived below, will depend on whether the strata in the complement of the branch locus of \(\pi _2\) are easy to describe. We need such a description because the integrand will be a function “without absolute values in it” only on such open strata that lie over the complement of \(\Sigma _X\).

Fig. 2
figure 2

The map from the ED correspondence \(\mathcal {E}_X\) to data space has four branch points. The weighted average of the fiber sizes \(1,3,5,3,1\) can be expressed as an integral over \(\mathcal {E}_X\)

Suppose that we have a parametrization \(\phi :\mathbb {R}^n \rightarrow \mathcal {E}_X\) of the ED correspondence that is generically one-to-one. For instance, if \(X\) itself is given by a birational parametrization \(\psi \), then \(\phi \) can be derived from \(\psi \) using the method in the proof of Corollary 4.2. We can then write the integral over \(\mathcal {E}_X\) in (4.2) more concretely as

$$\begin{aligned} \int _{\mathcal {E}_X} |\pi _2^*(\omega )| \!=\! \int _{\mathbb {R}^n} |\phi ^* \pi _2^*(\omega )| \!=\! \int _{\mathbb {R}^n} |\det J_t (\pi _2 \circ \phi )| \cdot f(\pi _2(\phi (t))) \cdot \mathrm {d}t_1 \wedge \cdots \wedge \mathrm {d}t_n. \end{aligned}$$
(4.3)

Here, \(f\) is the smooth (density) function on \(\mathbb {R}^n\) such that \(\omega _u=f(u) \cdot \mathrm {d}u_1 \wedge \cdots \wedge \mathrm {d}u_n\). In the standard Gaussian case, this would be \(f(u) = e^{-||u||^2/2}/(2\pi )^{n/2}\). The determinant in (4.3) is taken of the differential of \(\pi _2 \circ \phi \). To be fully explicit, the composition \(\pi _2 \circ \phi \) is a map from \(\mathbb {R}^n\) to \(\mathbb {R}^n\), and \(J_t(\pi _2 \circ \phi )\) denotes its \(n \times n\) Jacobian matrix at a point \(t\) in the domain of \(\phi \).

Example 4.5

(ED and average ED degree of an ellipse) Let \(X \) denote the ellipse in \(\mathbb {R}^2\) with equation \(x^2 + 4 y^2=4\). We first compute \(\mathrm {EDdegree}(X)\). Let \((u,v) \in \mathbb {R}^2\) be a data point. The tangent line to the ellipse \(X\) at \((x,y) \) has direction \((-4y,x)\). Hence, the condition that \((x,y) \in X\) is critical for \(d_{(u,v)}\) translates into the equation \((u-x,v-y) \cdot (-4y,x)=0\), i.e., into \(3xy+vx-4uy=0\). For general \((u,v)\), the curve defined by the latter equation and the ellipse intersect in \(4\) points in \(\mathbb {C}^2\), so \(\mathrm {EDdegree}(X)=4\).

Now we consider \(\mathrm {aEDdegree}(X,\omega )\) where \(\omega =\frac{1}{2\pi }e^{-(u^2+v^2)/2} \mathrm {d}u \wedge \mathrm {d}v\) is the standard Gaussians centered at the midpoint \((0,0)\) of the ellipse. Given \((x,y) \in X\), the \((u,v)\) for which \((x,y)\) is critical are precisely those on the normal line. This is the line through \((x,y)\) with direction \((x,4y)\). In Fig. 3, we plotted some of these normal lines. A colorful dynamic version of the same picture can be seen at http://en.wikipedia.org/wiki/Evolute. The evolute of the ellipse \(X\) is what we named the ED discriminant. It is the sextic Lamé curve

$$\begin{aligned} \Sigma _X&= V( 64 u^6+48 u^4 v^2+12 u^2 v^4+v^6 -432 u^4\\&+756 u^2 v^2-27 v^4+ 972 u^2+243 v^2-729). \end{aligned}$$

Consider the rational parametrization of \(X\) given by \(\psi (t) = \left( \frac{8t}{1+4t^2}, \frac{4t^2-1}{1+4t^2} \right) \), \(t \in \mathbb R\). From \(\psi \) we construct a parametrization \(\phi \) of the surface \(\mathcal {E}_X\) as in Corollary 4.2, so that

$$\begin{aligned} \pi _2 \circ \phi : \mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}^2,(t,s) \mapsto \left( (s+1)\frac{8t}{1+4t^2} ,(4s+1)\frac{4t^2-1}{1+4t^2} \right) . \end{aligned}$$

The Jacobian determinant of \(\pi _2 \circ \phi \) equals \(\frac{-32(1+s+4(2s-1)t^2+16(1+s)t^4)}{(1+4t^2)^3}\), so \(\mathrm {aEDdegree}(X)\) is

$$\begin{aligned}&\frac{1}{2\pi } \int _{-\infty }^\infty \left( \int _{-\infty }^\infty \left| \frac{-32(1+s+4(2s{-}1)t^2+16(1{+}s)t^4)}{(1+4t^2)^3}\right| \right. \nonumber \\&\qquad \left. e^{\frac{-(1+4s)^2-8(7-8(-1+s)s)t^2-16(1+4s)^2t^4}{2(1+4t^2)^2}} \mathrm {d}t \right) \mathrm {d}s. \end{aligned}$$

Numerical integration (using Mathematica 9) finds the value \(3.04658...\) in \(0.2\) s.

Fig. 3
figure 3

Computing the average ED of an ellipse: the evolute divides the plane into an inside region, where fibers or \(\pi _2\) have cardinality \(4\), and an outside region, where fibers of \(\pi _2\) have cardinality \(2\). The average ED of the ellipse is a weighted average of these numbers

The following experiment independently validates this average ED degree calculation. We sample data points \((u,v)\) randomly from Gaussian distribution. For each \((u,v)\) we compute the number of real critical points, which is either \(2\) or \(4\), and we average these numbers. The average value approaches \(3.05...\), but it requires \(10^5\) samples to get two digits of accuracy. The total running time is \(38.7\) s, so much longer than the numerical integration. \(\diamondsuit \)

Example 4.6

The cardioid \(X\) from Example 1.1 can be parametrized by

$$\begin{aligned} \psi :\mathbb {R}\rightarrow \mathbb {R}^2,\ t \mapsto \left( \frac{2 t^2-2}{(1 + t^2)^2},\ \frac{-4 t}{(1 + t^2)^2}\right) . \end{aligned}$$

From this, we derive the following parametrization of the ED correspondence \(\mathcal {E}_X\):

$$\begin{aligned} \phi :\mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}^2 \times \mathbb {R}^2, (t,s) \mapsto&\biggl (\psi (t), \frac{2 (t^4 -1 + 4 s ( 3 t^2-1)}{(1 + t^2)^3},\\&\qquad \frac{4 t (-1 - 6 s + ( 2 s-1) t^2)}{(1 + t^2)^3}\biggr ). \end{aligned}$$

Fixing the standard Gaussian centered at \((0,0)\), the integral (4.3) now evaluates as follows:

$$\begin{aligned} \mathrm {aEDdegree}(X,\omega )= \frac{1}{2\pi } \int _{\mathbb {R}^2} |\det J_{t,s} (\pi _2 \circ \phi )| e^{-\frac{||\pi _2\circ \phi (t,s)||^2}{2}}\mathrm {d}t\mathrm {d}s \approx 2.8375. \end{aligned}$$

Thus, our measure gives little mass to the region inside the smaller cardioid in Fig. 1. \(\diamondsuit \)

Example 4.7

Some varieties \(X\) have the property that, for all real data \(u\), all the complex critical points have real coordinates. If this holds then \(\mathrm {aEDdegree}(X,\omega ) = \mathrm {EDdegree}(X)\), for any measure \(|\omega |\) on data space. One instance is the variety \(X_r\) of real \(s\times t\) matrices of rank \(\le r\), by Example 2.3. We shall discuss the ED discriminant of \(X_r\) in Example 7.6. \(\diamondsuit \)

We next present a family of examples where the integral (4.3) can be computed exactly.

Example 4.8

We take \(X\) as the cone over the rational normal curve, in a special coordinate system, as in Example 2.7 and Corollary 8.7. Fix \(\mathbb {R}^2\) with the standard orthonormal basis \(e_1,e_2\). Let \(S^n \mathbb {R}^2\) be the space of homogeneous polynomials of degree \(n\) in the variables \(e_1,e_2\). We identify this space with \(\mathbb {R}^{n+1}\) by fixing the basis \(f_i:=\sqrt{\left( {\begin{array}{c}n\\ i\end{array}}\right) }\cdot e_1^ie_2^{n-i}\) for \( i=0,\ldots ,n\). This ensures that the natural action of the orthogonal group \(\mathrm {O}_2(\mathbb {R})\) on polynomials in \(e_1,e_2\) is by transformations that are orthogonal with respect to the standard inner product on \(\mathbb {R}^{n+1}\).

Define \(v,w:\mathbb {R}^2\rightarrow \mathbb {R}^2\) by \(v(t_1,t_2):=t_1e_1+t_2e_2\) and \(w(t_1,t_2):=t_2e_1-t_1e_2\). These two vectors form an orthogonal basis of \(\mathbb {R}^2\) for \((t_1,t_2) \ne (0,0)\). Our surface \(X\) is parametrized by

$$\begin{aligned} \psi : \mathbb {R}^2 \,\rightarrow \, S^n \mathbb {R}^2 =\mathbb {R}^{n+1},\quad (t_1,t_2) \,\mapsto \, v(t_1,t_2)^n = \sum _{i=0}^n t_1^i t_2^{n-i} \sqrt{\left( {\begin{array}{c}n\\ i\end{array}}\right) } f_i. \end{aligned}$$

For \(n=3\), this parametrization specializes to the second parametrization in Example 2.7. Fix the standard Gaussian centered at the origin in \(\mathbb {R}^{n+1}\). In what follows, we shall prove

$$\begin{aligned} \mathrm {aEDdegree}(X) = \sqrt{3n-2}. \end{aligned}$$
(4.4)

We begin by parametrizing the ED correspondence, as suggested in the proof of Corollary 4.2. For \((t_1,t_2) \ne (0,0)\), the tangent space \(T_{\psi (t_1,t_2)} X\) is spanned by \(v(t_1,t_2)^n\) and \(v(t_1,t_2)^{n-1}\cdot w(t_1,t_2)\). Since, by the choice of scaling, the vectors \(v^n,v^{n-1}w,\ldots ,w^n\) form an orthogonal basis of \(\mathbb {R}^{n+1}\), we find that the orthogonal complement \((T_{\psi (t_1,t_2)} X)^\perp \) has the orthogonal basis

$$\begin{aligned} w(t_1,t_2)^n,\ v(t_1,t_2)\cdot w(t_1,t_2)^{n-1},\ldots ,\ v(t_1,t_2)^{n-2}\cdot w(t_1,t_2)^2. \end{aligned}$$

The resulting parametrization \( \phi : \mathbb {R}^2 \times \mathbb {R}^{n-1} \rightarrow \mathcal {E}_X\) of the ED correspondence equals

$$\begin{aligned} (t_1,t_2,s_0, \ldots ,s_{n-2}) \,\mapsto \,&\left( \psi (t_1,t_2),\ v(t_1,t_2)^n + s_0 w(t_1,t_2)^n \right. \\&\quad \left. + \cdots + s_{n-2} v(t_1,t_2)^{n-2}\cdot w(t_1,t_2)^2 \right) . \end{aligned}$$

Next, we determine the Jacobian \(J=J(\pi _2 \circ \phi )\) at the point \(\psi (t_1,t_2)\). It is most convenient to do so relative to the orthogonal basis \(v(t_1,t_2), w(t_1,t_2), (1,0,\ldots ,0), \ldots , (0,\ldots ,0,1)\) of \(\mathbb {R}^2 \times \mathbb {R}^{n-1}\) and the orthogonal basis \(w(t_1,t_2)^n,\ldots ,v(t_1,t_2)^n\) of \(\mathbb {R}^{n+1}\). Relative to these bases,

$$\begin{aligned} J \quad = \quad \begin{bmatrix} *&\quad \! *&\quad \! 1&\quad \! 0&\quad \! \cdots&\quad \! 0\\ *&\quad \! *&\quad \! 0&\quad \! 1&\quad \! \cdots&\quad \! 0\\ \vdots&\quad \! \vdots&\quad \! \vdots&\quad \! \vdots&\quad \! \ddots&\quad \! \vdots \\ *&\quad \! *&\quad \! 0&\quad \! 0&\quad \! \cdots&\quad \! 1\\ 0&\quad \! n-2s_{n-2}&\quad \! 0&\quad \! 0&\quad \! \cdots&\quad \! 0\\ n&\quad \! *&\quad \! 0&\quad \! 0&\quad \! \cdots&\quad \! 0 \end{bmatrix}, \end{aligned}$$

where the stars are irrelevant for \(\mathrm{det}(J)\). For instance, an infinitesimal change \(v(t_1,t_2) \mapsto v(t_1,t_2)+\epsilon w(t_1,t_2)\) leads to a change \(w(t_1,t_2) \mapsto w(t_1,t_2)-\epsilon v(t_1,t_2)\) and to a change of \(\pi _2 \circ \phi \) in which the coefficient of \(\epsilon v(t_1,t_2)^{n-1}\cdot w(t_1,t_2)\) equals \(n-2s_{n-2}\). When computing the determinant of \(J\), we must consider that the chosen bases are orthogonal but not orthonormal: the norm of \(v(t_1,t_2)^i\cdot w(t_1,t_2)^{n-i}\), corresponding to the \(i\)-th row, equals \(\sqrt{(t_1^2 + t_2^2)^n}\left( {\begin{array}{c}n\\ i\end{array}}\right) ^{-1/2}\); and the norm of \(v(t_1,t_2)\) and \(w(t_1,t_2)\), corresponding to the first and second column, equals \(\sqrt{t_1^2+t_2^2}\). Multiplying the determinant of the matrix above with the product of these scalars and dividing by the square of \(\sqrt{t_1^2+t_2^2}\) for the first two columns, we obtain the formula

$$\begin{aligned} |\det J(\pi _2 \circ \phi )|= n \cdot |n-2s_{n-2}| \cdot (t_1^2+t_2^2)^{n(n+1)/2 - 1} \cdot \prod \limits _{i=0}^n \left( {\begin{array}{c}n\\ i\end{array}}\right) ^{-1/2}\!\!\!. \end{aligned}$$

Next, the squared norm of \(u=\pi _2 \circ \psi (t_1,t_2,s_0,\ldots ,s_{n-2})\) equals

$$\begin{aligned} ||u||^2=(t_1^2+t_2^2)^n \cdot \left( 1 + \sum _{i=0}^{n-2} s_i^2 \left( {\begin{array}{c}n\\ i\end{array}}\right) ^{-1} \right) . \end{aligned}$$

The average ED degree of \(X\) relative to the standard Gaussian equals

$$\begin{aligned} \mathrm {aEDdegree}(X)=\frac{1}{(2\pi )^{(n+1)/2}} \int |\det J(\pi _2 \circ \psi )| e^{-||u||^2/2} \mathrm {d}v_1 \mathrm {d}v_2 \mathrm {d}s_0 \cdots \mathrm {d}s_{n-2}. \end{aligned}$$

parametrizing the regions where \(\det J(\pi _2 \circ \psi )\) is positive or negative by \(s_{n-2} \in (-\infty ,n/2)\) or \(s_{n-2} \in (n/2,\infty )\), this integral can be computed in closed form. Its value equals \(\sqrt{3n-2}\). Interestingly, this value is the square root of the general ED degree in Example 5.12. For more information see Sect. 8 and the article [10] where tensors of rank \(1\) are treated. \(\diamondsuit \)

We close this section with the remark that different applications require different choices of the measure \(|\omega |\) on data space. For instance, one might want to draw \(u\) from a product of intervals equipped with the uniform distribution, or to concentrate the measure near \(X\).

5 Duality

This section deals exclusively with irreducible affine cones \(X \subset \mathbb {C}^n\), or, equivalently, with their corresponding projective varieties \(X \subset \mathbb {P}^{n-1}\). Such a variety has a dual variety \(Y:=X^* \subset \mathbb {C}^n\), which is defined as follows, where the line indicates the topological closure:

$$\begin{aligned} Y := \overline{ \bigl \{y \in {\mathbb {C}}^n \mid \exists x \in X \backslash X_\mathrm {sing}: y \perp T_x X \bigr \} }. \end{aligned}$$

See [38, Section 5.2.4] for an introduction to this duality in the context of optimization. Algorithm 5.1 in [38] explains how to compute the ideal of \(Y\) from that of \(X\).

The variety \(Y\) is an irreducible affine cone, so we can regard it as an irreducible projective variety in \(\mathbb {P}^{n-1}\). That projective variety parametrizes hyperplanes tangent to \(X\) at non-singular points, if one uses the standard bilinear form on \(\mathbb {C}^n\) to identify hyperplanes with points in \(\mathbb {P}^{n-1}\). We will prove \(\mathrm {EDdegree}(X)=\mathrm {EDdegree}(Y)\). Moreover, for general data \(u \in \mathbb {C}^n\), there is a natural bijection between the critical points of \(d_u\) on the cone \(X\) and the critical points of \(d_u\) on the cone \(Y\). We then link this result to the literature on the conormal variety (cf. [25]) which gives powerful techniques for computing ED degrees of smooth varieties that intersect the isotropic quadric \(Q = V(x_1^2+\cdots + x_n^2)\) transversally. Before dealing with the general case, we revisit the example of the Eckart–Young Theorem.

Example 5.1

For the variety \(X_r\) of \(s\times t\) matrices (\(s\le t\)) of rank \(\le r\), we have \(X_r^*=X_{s-r}\) [19, Chap. 1, Prop. 4.11]. From Example 2.3, we see that \(\mathrm{EDdegree}(X_r) = \mathrm{EDdegree}(X_{s-r})\). There is a bijection between the critical points of \(d_U\) on \(X_r\) and on \(X_{s-r}\). To see this, consider the singular value decomposition (2.3). For a subset \(I=\{i_1,\ldots , i_r\}\) of \(\{1,\ldots , s\}\), we set

$$\begin{aligned} U_I = T_1 \cdot \mathrm{diag}(\ldots ,\sigma _{i_1},\ldots , \sigma _{i_2}, \ldots , \sigma _{i_r},\ldots ) \cdot T_2, \end{aligned}$$

where the places of \(\sigma _j\) for \(j \not \in I\) have been filled with zeros in the diagonal matrix. Writing \(I^c\) for the complementary subset of size \(s-r\), we have \(U=U_I+U_{I^c}\). This decomposition is orthogonal in the sense that \(\langle U_I,U_{I^c}\rangle =\mathrm tr (U_I^tU_{I^c})=0\). It follows that, if \(U\) is real, then \(|U|^2=|U_I|^2+|U_{I^c}|^2\), where \(|U|^2=tr(U^tU)\). As \(I\) ranges over all \(r\)-subsets, \(U_I\) runs through the critical points of \(d_U\) on the variety \(X_r\), and \(U_{I^c}\) runs through the critical points of \(d_U\) on the dual variety \(X_{s-r}\). Since the formula above reads as \(|U|^2=|U-U_{I^c}|^2+|U-U_{I}|^2\), we conclude that the proximity of the real critical points reverses under this bijection. For instance, if \(U_I\) is the real point on \(X_r\) closest to \(U\), then \(U_{I^c}\) is the real point on \(X_{s-r}\) farthest from \(U\). For a similar result in the multiplicative context of maximum likelihood see [11]. \( \diamondsuit \)

The following theorem shows that the duality seen in Example 5.1 holds in general.

Theorem 5.2

Let \(X \subset \mathbb {C}^n\) be an irreducible affine cone, \(Y \subset \mathbb {C}^n\) its dual variety, and \(u \in \mathbb {C}^n\) a general data point. The map \(x \mapsto u-x\) gives a bijection from the critical points of \(d_u\) on \(X\) to the critical points of \(d_u\) on \(Y\). Consequently, \(\mathrm {EDdegree}(X)=\mathrm {EDdegree}(Y)\). Moreover, if \(u\) is real, then the map sends real critical points to real critical points, and hence \(\mathrm {aEDdegree}(X,\omega )=\mathrm {aEDdegree}(Y,\omega )\) for any volume form \(\omega \). The map is proximity-reversing: the closer a real critical point \(x\) is to the data point \(u\), the further \(u-x\) is from \(u\).

The statement of Theorem 5.2 is illustrated in Fig. 4. On the left, the variety \(X\) is a one-dimensional affine cone in \(\mathbb {R}^2\). This \(X\) is not irreducible but it visualizes our duality in the simplest possible case. The right picture shows the same scenario in one dimension higher. Here, \(X\) and \(X^*\) are quadratic cones in \(\mathbb {R}^3\), corresponding to a dual pair of conics in \(\mathbb {P}^2\).

Fig. 4
figure 4

The bijection between critical points on \(X\) and critical points on \(X^*\)

The proof of Theorem 5.2 uses properties of the conormal variety, which is defined as

$$\begin{aligned} \mathcal {N}_X := \overline{ \bigl \{(x,y) \in \mathbb {C}^n \times \mathbb {C}^n \mid x \in X \backslash X_\mathrm {sing}\,\,\mathrm{and} \,\, y \perp T_x X \bigr \}}. \end{aligned}$$

The conormal variety is the zero set of the following ideal in \(\mathbb {R}[x,y]\):

$$\begin{aligned} N_X := \biggl ( I_X \,+ \, \biggl \langle (c+1) \times (c+1)\hbox {-minors of } \begin{pmatrix} y \\ J(f) \end{pmatrix} \biggr \rangle \biggr ) : \left( \, I_{X_\mathrm{sing}}\,\right) ^\infty , \end{aligned}$$
(5.1)

where \(f=(f_1,\ldots ,f_s)\) is a system of homogeneous generators of \(I_X\). It is known that \(\mathcal {N}_X\) is irreducible of dimension \(n-1\). The projection of \(\mathcal {N}_X\) into the second factor \(\mathbb {C}^n\) is the dual variety \(Y = X^*\). Its ideal \(I_Y\) is computed by elimination, namely, by intersecting (5.1) with \(\mathbb {R}[y]\). An important property of the conormal variety is the Biduality Theorem [19, Chapter 1], which states that \(\mathcal {N}_X\) equals \(\mathcal {N}_Y\) up to swapping the two factors. In symbols, we have

$$\begin{aligned} \mathcal {N}_X \,=\, \mathcal {N}_Y \,= \,\overline{\bigl \{(x,y) \in \mathbb {C}^n \times \mathbb {C}^n \mid \,y \in Y \backslash Y_\mathrm {sing}\,\, \mathrm{and} \,\, x \perp T_y Y \bigr \}}. \end{aligned}$$

This implies \((X^*)^* = Y^*=X\). Thus, the biduality relation in [38, Theorem 5.13] holds. To keep the symmetry in our notation, we will henceforth write \(\mathcal {N}_{X,Y}\) for \(\mathcal {N}_X\) and \(N_{X,Y}\) for \(N_X\).

Proof of Theorem 5.2

The following is illustrated in Fig. 4. If \(x\) is a critical point of \(d_u\) on \(X\), then \(y:=u-x\) is orthogonal to \(T_x X\), and hence \((x,y) \in \mathcal {N}_{X,Y}\). Since \(u\) is general, all \(y\) thus obtained from critical points \(x\) of \(d_u\) are non-singular points on \(Y\). By the Biduality Theorem, we have \(u-y=x \perp T_y Y\), i.e., \(y\) is a critical point of \(d_u\) on \(Y\). This shows that \(x \mapsto u-x\) maps critical points of \(d_u\) on \(X\) into critical points of \(d_u\) on \(Y\). Applying the same argument to \(Y\), and using that \(Y^*=X\), we find that, conversely, \(y \mapsto u-y\) maps critical points of \(d_u\) on \(Y\) to critical points of \(d_u\) on \(X\). This establishes the bijection.

The consequences for \(\mathrm {EDdegree}(X)\) and \(\mathrm {aEDdegree}(X,\omega )\) are straightforward. For the last statement we observe that \(u-x \perp x \in T_x X\) for critical \(x\). For \(y = u-x\), this implies

$$\begin{aligned} ||u-x||^2 + ||u-y||^2 = ||u-x||^2 + ||x||^2 = ||u||^2 . \end{aligned}$$

Hence, the assignments that take real data points \(u\) to \(X\) and \(X^*\) are proximity-reversing.\(\square \)

Duality leads us to define the joint ED correspondence of the cone \(X\) and its dual \(Y\) as

$$\begin{aligned} \mathcal {E}_{X,Y} :&= \overline{\, \bigl \{(x,u-x,u) \in \mathbb {C}^n_x \times \mathbb {C}^n_y \times \mathbb {C}^n_u \mid x \in X \backslash X_\mathrm {sing}\,\, \mathrm{and} \,\, u-x \perp T_x X \bigr \}}\\&= \,\overline{\bigl \{(u-y,y,u) \in \mathbb {C}^n_x \times \mathbb {C}^n_y \times \mathbb {C}^n_u \mid \, y \in Y \backslash Y_\mathrm {sing}\,\, \mathrm{and} \,\, u-y \perp T_y Y \bigr \}}. \end{aligned}$$

The projection of \(\mathcal {E}_{X,Y}\) into \(\mathbb {C}^n_x \times \mathbb {C}^n_u\) is the ED correspondence \(\mathcal {E}_X\) of \(X\), its projection into \(\mathbb {C}^n_y \times \mathbb {C}^n_u\) is \(\mathcal {E}_Y\), and its projection into \(\mathbb {C}^n_x \times \mathbb {C}^n_y\) is the conormal variety \(\mathcal {N}_{X,Y}\). The affine variety \(\mathcal {E}_{X,Y}\) is irreducible of dimension \(n\), since \(\mathcal {E}_X\) has these properties (by Theorem 4.1), and the projection \(\mathcal {E}_{X,Y} \rightarrow \mathcal {E}_X\) is birational with inverse \((x,u) \mapsto (x,u-x,u)\).

Following Theorem 4.4, we also introduce the projective joint ED correspondence \(\mathcal {P} \mathcal {E}_{X,Y}\). By definition, \(\mathcal {P} \mathcal {E}_{X,Y}\) is the closure of the image of \(\,\mathcal {E}_{X,Y} \cap \left( (\mathbb {C}^n \backslash \{0\})^2 \times \mathbb {C}^n \right) \) in \(\mathbb {P}^{n-1}_x \times \mathbb {P}^{n-1}_y \times \mathbb {C}^n_u\).

Proposition 5.3

Let \(X \subset \mathbb {C}^n\) be an irreducible affine cone, let \(Y \subset \mathbb {C}^n\) be the dual variety of \(X\), and assume that neither \(X\) nor \(Y\) is contained in \(Q = V(q)\), where \(q = x_1^2+\cdots +x_n^2\). Then \(\mathcal {P} \mathcal {E}_{X,Y}\) is an irreducible \(n\)-dimensional variety in \(\mathbb {P}^{n-1}_x \times \mathbb {P}^{n-1}_y \times \mathbb {C}^n_u\).

It is the zero set of the tri-homogeneous ideal

$$\begin{aligned} \biggl ( N_{X,Y} \!+ \! \biggl \langle 3 \!\times \! 3\hbox {-minors of the}\,3 \times n\, \hbox {-matrix} \begin{pmatrix} \,\,u\,\, \\ x \\ y \end{pmatrix} \biggr \rangle \biggr ) :\, \bigl \langle q(x) \cdot q(y) \bigr \rangle ^\infty \!\subset \! \mathbb {R}[x,y,u]. \end{aligned}$$
(5.2)

Proof

The irreducibility of \(\mathcal {P} \mathcal {E}_{X,Y}\) follows from that of \(\mathcal {E}_{X,Y}\) which has the same dimension.

To see that \(\mathcal {P} \mathcal {E}_{X,Y}\) is defined by the ideal (5.2), note first that any point \((x,y,u)\) with \(x \in X \backslash X_\mathrm {sing}\) and \(y \perp T_xX\) and \(x+y=u\) has \((x,y) \in \mathcal {N}_{X,Y}\) and \(\dim \langle x,y,u \rangle \le 2\), so that \(([x],[y],u)\) is a zero of (5.2). This shows that \(\mathcal {P}\mathcal {E}_{X,Y}\) is contained in the variety of (5.2).

Conversely, let \(([x],[y],u)\) be in the variety of (5.2). The points with \(q(x)q(y) \ne 0\) are dense in the variety of (5.2), so we may assume \(x,y \not \in Q\). Moreover, since \((x,y) \in \mathcal {N}_{X,Y}\), we may assume that \(x,y\) are non-singular points of \(X\) and \(Y\), and that \(x \perp T_y Y\) and \(y \perp T_x X\). This implies \(x \perp y\). Since \(x,y\) are not isotropic, they are linearly independent. Then, \(u=c x + d y\) for unique constants \(c,d \in \mathbb {C}\). If \(c,d \ne 0\), then we find that \((cx,dy,u) \in \mathcal {E}_{X,Y} \cap ((\mathbb {C}^n \backslash \{0\})^2 \times \mathbb {C}^n_u)\) and hence \(([x],[y],u) \in \mathcal {P} \mathcal {E}_{X,Y}\). If \(c \ne 0\) but \(d=0\), then \((cx, \epsilon y, u+\epsilon y) \in \mathcal {E}_{X,Y}\) for all \(\epsilon \ne 0\), so that the limit of \(([cx],[\epsilon y],u+\epsilon y)\) for \(\epsilon \rightarrow 0\), which is \(([x],[y],u)\), lies in \(\mathcal {P} \mathcal {E}_{X,Y}\). Similar arguments apply when \(d \ne 0\) but \(c=0\) or when \(c=d=0\).\(\square \)

Our next result gives a formula for \(\mathrm {EDdegree}(X)\) in terms of the polar classes of classical algebraic geometry [37]. These non-negative integers \(\delta _i(X)\) are the coefficients of the class

$$\begin{aligned}{}[\mathcal {N}_{X,Y}] \quad = \quad \delta _0(X) s^{n-1} t + \delta _1(X) s^{n-2} t^2 + \cdots + \delta _{n-2}(X) s t^{n-1} \end{aligned}$$
(5.3)

of the conormal variety, when regarded as a subvariety of \(\mathbb {P}^{n-1} \times \mathbb {P}^{n-1}\). For topologists, the polynomial (5.3) is the class representing \(\mathcal {N}_{X,Y}\) in the cohomology ring \(H^*(\mathbb {P}^{n-1} {\times } \mathbb {P}^{n-1}) = \mathbb {Z}[s,t]/\langle s^n,t^n \rangle \). For commutative algebraists, it is the multidegree of the \(\mathbb {Z}^2\)-graded ring \(\mathbb {R}[x,y]/N_{X,Y}\). This is explained in [33, Section 8.5] and is implemented in Macaulay2 with the command multidegree. For geometers, the polar classes \(\delta _i(X)\) have the following definition: intersecting the \((n-2)\)-dimensional subvariety \(\mathcal {N}_{X,Y} \subset \mathbb {P}^{n-1} \times \mathbb {P}^{n-1}\) with an \(n\)-dimensional subvariety \( L \times M \) where \(L,M\) are general linear subspaces of \(\mathbb {P}^{n-1}\) of dimensions \(n-j\) and \(j\), respectively, one gets a finite number of simple points. The number \(\delta _{j-1}(X)\) counts these points. The shift by one is to ensure compatibility with Holme’s paper [25].

So, for example, \(\delta _0(X)\) counts the number of intersections of \(\mathcal {N}_{X,Y}\) with \(\mathbb {P}^{n-1} \times M\) where \(M\) is a general projective line. These are the intersections of the dual variety \(Y\) with \(M\). Thus, if \(Y\) is a hypersurface, then \(\delta _0(X)\) is the degree of \(Y\), and otherwise \(\delta _0(X)\) is zero. In general, the first nonzero coefficient of (5.3) is the degree of \(Y\) and the last nonzero coefficient is the degree of \(X\). For all \(i\), we have \(\delta _i(Y)=\delta _{n-2-i}(X)\); see [25, Theorem 2.3].

Theorem 5.4

If \(\mathcal {N}_{X,Y}\) does not intersect the diagonal \(\Delta (\mathbb {P}^{n-1}) \subset \mathbb {P}^{n-1} \times \mathbb {P}^{n-1}\), then

$$\begin{aligned} \mathrm {EDdegree}(X) = \delta _0(X)+\cdots +\delta _{n-2}(X) = \delta _{n-2}(Y) + \cdots + \delta _0(Y) = \mathrm {EDdegree}(Y). \end{aligned}$$

A sufficient condition for \(\mathcal {N}_{X,Y}\) not to intersect \(\Delta (\mathbb {P}^{n-1})\) is that \(X \cap Q\) is a transversal intersection everywhere (i.e., \(X \cap Q\) is smooth) and disjoint from \(X_\mathrm {sing}\). Indeed, suppose that \((x,x) \in \mathcal {N}_{X,Y}\) for some \(x \in X\). There exists a sequence of points \((x_i,y_i) \in \mathcal {N}_{X,Y}\) with \(x_i \in X \backslash X_\mathrm {sing}\), \(y_i \perp T_{x_i} X\), such that \(\lim _{i \rightarrow \infty }(x_i,y_i) \rightarrow (x,x)\). Then \(y_i \perp x_i\), so taking the limit we find \(x \in Q\). If, moreover, \(X\) is smooth at \(x\), then \(T_{x_i} X\) converges to the tangent space \(T_x X\). We conclude that \(x \perp T_x X\), which means that \(X\) is tangent to \(Q\) at \(x\).

Proof of Theorem 5.4

Denote by \(Z\) the variety of linearly dependent triples \((x,y,u) \in \mathbb {P}^{n-1}_x \times \mathbb {P}^{n-1}_y \times \mathbb {C}^n_u\). By Proposition 5.3, the intersection \((\mathcal {N}_{X,Y} \times \mathbb {C}^n) \cap Z\) contains the projective ED correspondence \(\mathcal {P} \mathcal {E}_{X,Y}\) as a component. The two are equal because \((\mathcal {N}_{X,Y} \times \mathbb {C}^n) \cap Z\) is swept out by the two-dimensional vector spaces \(\{(x,y)\} \times \langle x,y \rangle \), as \((x,y)\) runs through the irreducible variety \(\mathcal {N}_{X,Y}\), and hence, it is irreducible. Here, we are using that \(\mathcal {N}_{X,Y} \cap \Delta (\mathbb {P}^{n-1}) = \emptyset \).

Hence, \(\mathrm {EDdegree}(X)\) is the length of a general fiber of the map \(\pi _3:(\mathcal {N}_{X,Y} \times \mathbb {C}^n) \cap Z \rightarrow \mathbb {C}^n\). Next, a tangent space computation shows that the intersection \((\mathcal {N}_{X,Y} \times \mathbb {C}^n) \cap Z\) is transversal, so an open dense subset of it is a smooth scheme. By generic smoothness [23, Corollary III.10.7], the fiber \(\pi _3^{-1}(u)\) over a general data point \(u\) consists of simple points only. This fiber is scheme-theoretically the same as \(\mathcal {N}_{X,Y} \cap Z_u\), where \(Z_u\) is the fiber in \(Z\) over \(u\). The cardinality of this intersection is the coefficient of \(s^{n-1} t^{n-1}\) in the product \([\mathcal {N}_{X,Y}] \cdot [Z_u]\) in \(H^*(\mathbb {P}^{n-1} {\times } \mathbb {P}^{n-1}) = \mathbb {Z}[s,t]/\langle s^n,t^n \rangle \). The determinantal variety \(Z_u\) has codimension \(n-2\), and

$$\begin{aligned}{}[Z_u] = s^{n-2}+s^{n-3}t+s^{n-4} t^2 + \cdots +s t^{n-3}+t^{n-2} . \end{aligned}$$

This is a very special case of [33, Corollary 16.27]. By computing modulo \(\langle s^n, t^n \rangle \), we find

$$\begin{aligned}{}[\mathcal {N}_{X,Y}] \cdot [Z_u] \,&= \, (\delta _0(X) s^{n-1} t + \cdots + \delta _{n-2}(X) s t^{n-1}) \cdot \\&[Z_u] \,= \,(\delta _0(X) + \cdots + \delta _{n-2}(X)) s^{n-1} t^{n-1}. \end{aligned}$$

This establishes the desired identity.\(\square \)

Remark 5.5

If \(X\) and \(Y\) are smooth then \(X \cap Q\) is smooth if and only if \(\Delta (\mathbb {P}^{n-1}) \cap \mathcal {N}_{X,Y} = \emptyset \) if and only if \(Y \cap Q\) is smooth. We do not know whether this holds when \(X\) or \(Y\) is singular. Unfortunately, it happens very rarely that \(X\) and \(Y\) are both smooth (see [13]).

Example 5.6

Let \(X\) be the variety of symmetric \(s \times s\)-matrices \(x\) of rank \(\le r\) and \(Y\) the variety of symmetric \(s \times s\)-matrices \(y\) of rank \(\le s-r\). These two determinantal varieties form a dual pair [38, Example 5.15]. Their conormal ideal \(N_{X,Y}\) is generated by the relevant minors of \(x\) and \(y\) and the entries of the matrix product \(xy\). The class \([N_{X,Y}]\) records the algebraic degree of semidefinite programming. A formula was found by von Bothmer and Ranestad in [4]. Using the package Schubert2 in Macaulay2 [20, 21], and summing over the index \(m\) in [4, Proposition 4.1], we obtain the following table of values for \(\mathrm {EDdegree}(X)\):

$$\begin{aligned} \begin{array}{r@{\quad }|r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r} &{}s&{}=&{}2&{}3&{}4&{}5&{}6&{}7\\ \hline r=1&{}&{}&{}4&{}13&{}40&{}121&{}364&{}1093\\ r=2&{}&{}&{}&{}13&{}122&{}1042&{}8683&{}72271\\ r=3&{}&{}&{}&{}&{}40&{}1042&{}23544&{}510835\\ r=4&{}&{}&{}&{}&{}&{}121&{}8683&{}510835\\ r=5&{}&{}&{}&{}&{}&{}&{}364&{}72271\\ r=6&{}&{}&{}&{}&{}&{}&{}&{}1093\\ \end{array} \end{aligned}$$

In order for \(X\) to satisfy the hypothesis in Theorem 5.4, it is essential that the coordinates are sufficiently general, so that \(X \cap Q\) is smooth. The usual coordinates in \(\mathbb {C}^{\left( {\begin{array}{c}s+1\\ 2\end{array}}\right) }\) enjoy this property, and the table above records the ED degree for the second interpretation in Example 3.2. Specifically, our number \(13\) for \(s=3\) and \(r=2\) appeared on the right in (3.2). The symmetry in the columns of our table reflects the duality result in Theorem 5.2. \(\diamondsuit \)

Example 5.7

Following [38, Ex. 5.44], Cayley’s cubic surface \(X = V(f) \subset \mathbb {P}^3_x\) is given by

$$\begin{aligned} f(x) = \mathrm{det} \begin{pmatrix} x_0 &{}\quad \! x_1 &{}\quad \! x_2 \\ x_1 &{}\quad \! x_0 &{}\quad \! x_3 \\ x_2 &{}\quad \! x_3 &{}\quad \! x_0 \end{pmatrix}. \end{aligned}$$

Its dual in \(\mathbb {P}^3_y\) is the quartic Steiner surface \(Y = V(g)\), with \( g\, = \, y_1^2 y_2^2+y_1^2 y_3^2+y_2^2 y_3^2 -2y_0 y_1 y_2y_3\). The conormal ideal \(N_{X,Y}\) is minimally generated by \(18\) bihomogeneous polynomials in \(\mathbb {R}[x,y]\):

$$\begin{aligned}&f \hbox { of degree }(3,0); g \hbox { of degree }(0,4); \,q(x,y) = x_0 y_0+x_1 y_1+x_2 y_2\nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad +x_3 y_3\hbox { of degree } (1,1);\\&\qquad \hbox {six generators of degree }(1,2),\hbox { such as }\,\,x_2 y_1 y_2+x_3 y_1 y_3+x_0 y_2 y_3 ;\hbox { and}\\&\qquad \hbox {nine generators of degree }(2,1),\hbox { such as }\,x_0 x_1 y_2-x_2 x_3 y_2+x_0^2y_3-x_3^2 y_3. \end{aligned}$$

The conormal variety \(\mathcal {N}_{X,Y}\) is a surface in \(\mathbb {P}^3_x \times \mathbb {P}^3_y\) with class \(\,4 s^3 t + 6 s^2 t^2 + 3 s t^3\), and hence

$$\begin{aligned} \mathrm{EDdegree}(X) \, = \, \mathrm{EDdegree}(Y) \, = \, 4+6+3 \, = \,13. \end{aligned}$$

Corollary 6.4 relates this to the number \(13\) in (3.2). The projective joint ED correspondence \(\mathcal {P}\mathcal {E}_{X,Y}\) is defined by the above equations together with the four \(3 \times 3\)-minors of the matrix

$$\begin{aligned} \begin{pmatrix} u \\ x \\ y \end{pmatrix} = \begin{pmatrix} u_0 &{}\quad \! u_1 &{}\quad \! u_2 &{}\quad \! u_3 \\ x_0 &{}\quad \! x_1 &{}\quad \! x_2 &{}\quad \! x_3 \\ y_0 &{}\quad \! y_1 &{}\quad \! y_2 &{}\quad \! y_3 \end{pmatrix}. \end{aligned}$$

For fixed scalars \(u_0,u_1,u_2,u_3 \in \mathbb {R}\), this imposes a codimension \(2\) condition. This cuts out \(13\) points in \(\,\mathcal {N}_{X,Y} \subset X \times Y \subset \mathbb {P}^3_x \times \mathbb {P}^3_y\). These represent the critical points of \(d_u\) on \(X\) or \(Y\). \(\diamondsuit \)

Armed with Theorem 5.4, we can now use the results described in Holme’s article [25] to express the ED degree of a smooth projective variety \(X\) in terms of its Chern classes.

Theorem 5.8

Let \(X\) be a smooth irreducible subvariety of dimension \(m\) in \( \mathbb {P}^{n-1}\), and suppose that \(X\) is transversal to the isotropic quadric \(Q\). Then

$$\begin{aligned} \mathrm {EDdegree}(X) = \sum _{i=0}^{m} (-1)^i \cdot (2^{m+1-i}-1) \cdot \deg (c_i(X)). \end{aligned}$$
(5.4)

Here, \(c_i(X)\) is the \(i\)th Chern class of the tangent bundle of \(X\). For more information on Chern classes, and alternative formulations of Theorem 5.8, we refer the reader to Sect. 7.

Proof

By Theorem 5.4, we have \(\mathrm {EDdegree}(X)=\sum _{i=0}^{n-2} \delta _i(X)\). We also saw that \(\delta _i(X)=0\) for \(i>m\), so we may let \(i\) run from \(0\) to \(m\) instead. Substituting the expression

$$\begin{aligned} \delta _i(X) = \sum _{j=i}^{m} (-1)^{m-j} \left( {\begin{array}{c}j+1\\ i+1\end{array}}\right) \deg (c_{m-j}(X)) \end{aligned}$$

from [25, p. 150], and summing over all values of the index \(i\), yields the theorem.\(\square \)

Corollary 5.9

Let \(X\) be a smooth irreducible curve of degree \(d\) and genus \(g\) in \( \mathbb {P}^{n-1}\), and suppose that \(X\) is transversal to \(Q\). Then

$$\begin{aligned} \mathrm {EDdegree}(X) = 3d+2g-2. \end{aligned}$$
(5.5)

Proof

We have from [23, App. A §3] that \(\deg (c_0(X))=d\) and \(\deg (c_1(X))=2-2g\).\(\square \)

Example 5.10

Consider a \(2 \times 3\) matrix with entries in \(\mathbb {R}[x_1,x_2,x_3,x_4]\) where the first row contains general linear forms, and the second row contains general quadratic forms. The ideal \(I\) generated by its three maximal minors defines a smooth irreducible curve in \(\mathbb {P}^3\) of degree \(7\) and genus \(5\), so Corollary 5.9 gives \(EDdegree (V(I)) = 3\cdot 7 + 2\cdot 5 - 2 = 29\). This exceeds the bound of \(27\) we would get by taking \(n=4, c=3, d_1=d_2=d_3=3\) in (2.8). However, while ideal \(I\) has \(s=3\) generators, the codimension of its variety \(V(I)\) is \(c=2\). Applying Corollary 2.10 to \(c=2, d_1=d_2=3\), we get the correct bound of \(45\). This is the ED degree for the complete intersection of two cubics in \(\mathbb {P}^3\), and it exceeds \(29\) as desired. \(\diamondsuit \)

The formula (5.4) is particularly nice for a (projectively normal) smooth toric variety \(X\) in \(\mathbb {P}^{n-1}\). According to [17], this can be represented by a simple lattice polytope \(P \subset \mathbb {R}^m\) with \(|P \cap \mathbb {Z}^m| = n\), and \(c_{m-j}(X)\) is the sum of classes corresponding to all \(j\)-dimensional faces of \(P\). The degree of this class is its normalized volume. Therefore, Theorem 5.8 implies

Corollary 5.11

Let \(X \subset \mathbb {P}^{n-1}\) be an \(m\)-dimensional smooth projective toric variety, with coordinates such that \(X\) is transversal to \(Q\). If \(\,V_j\) denotes the sum of the normalized volumes of all \(j\)-dimensional faces of the simple lattice polytope \(P\) associated with \(X\), then

$$\begin{aligned} \mathrm {EDdegree}(X) = \sum _{j=0}^m (-1)^{m-j} \cdot (2^{j+1}-1) \cdot V_j. \end{aligned}$$

Example 5.12

Consider a rational normal curve \(X\) in \(\mathbb {P}^{n}\) in general coordinates (we denote the ambient space as \(\mathbb {P}^{n}\) instead of \(\mathbb {P}^{n-1}\), to compare with Example 4.8). The associated polytope \(P\) is a segment of integer length \(n\). The formula above yields

$$\begin{aligned} \mathrm {EDdegree}(X) = (2^2-1) \cdot V_1 -(2^1-1) \cdot V_0 = 3 n -2. \end{aligned}$$

In special coordinates, the ED degree can drop to \(n\); see Corollary 8.7. Interestingly, in those special coordinates, the square root of \(3n-2\) is the average ED degree, by Example 4.8.

All Segre varieties and Veronese varieties are smooth toric varieties, so we can compute their ED degrees (in general coordinates) using Corollary 5.11. For Veronese varieties, this can be used to verify the \(r=1\) row in the table of Example 5.6. For instance, for \(s=3\), the toric variety \(X\) is the Veronese surface in \(\mathbb {P}^5\), and the polytope is a regular triangle with sides of lattice length \(2\). Here, \(\mathrm {EDdegree}(X) = 7 \cdot V_2 - 3 \cdot V_1 + V_0 = 7 \cdot 4 - 3 \cdot 6 + 3 \, = \, 13 \). \(\diamondsuit \)

6 Geometric Operations

Following up on our discussion of duality, this section studies the behavior of the ED degree of a variety under other natural operations. We begin with the dual operations of projecting from a point and intersecting with a hyperplane. Thereafter, we discuss homogenizing and dehomogenizing. Geometrically, these correspond to passing from an affine variety to its projective closure and vice versa. We saw in the examples of Sect. 3 that the ED degree can go up or go down under homogenization. We aim to explain that phenomenon.

Our next two results are corollaries to Theorem 5.4 and results of Piene in [37]. We work in the setting of Sect. 5, so \(X\) is an irreducible projective variety in \(\mathbb {P}^{n-1}\) and \(X^*\) is its dual, embedded into the same \(\mathbb {P}^{n-1}\) by way of the quadratic form \(q(x,y) = x_1 y_1 + \cdots + x_n y_n\). The polar classes satisfy \(\delta _i(X) = \delta _{n-2-i}(X^*)\). These integers are zero for \(i \ge \mathrm{dim}(X)\) and \(i \le \mathrm{codim}(X^*)-2\), and they are strictly positive for all other values of the index \(i\). The first positive \(\delta _i(X)\) is the degree of \(X^*\), and the last positive \(\delta _i(X)\) is the degree of \(X\). The sum of all \(\delta _i(X)\) is the common ED degree of \(X\) and \(X^*\). See [25] and our discussion above.

Fix a general linear map \(\pi :\mathbb {C}^n \rightarrow \mathbb {C}^{n-1}\). This induces a rational map \(\pi : \mathbb {P}^{n-1} \dashrightarrow \mathbb {P}^{n-2}\), whose base point lies outside \(X\). The image \(\pi (X)\) is an irreducible closed subvariety in \(\mathbb {P}^{n-2}\). Since the projective space \(\mathbb {P}^{n-2}\) comes with a coordinate system \((x_1:x_2:\cdots :x_{n-1})\), the ED degree of \(\pi (X)\) is well-defined. If \(\mathrm{codim}(X) =1\) then \(\pi (X) = \mathbb {P}^{n-2}\) has ED degree \(1\) for trivial reasons. Otherwise, \(X\) maps birationally onto \(\pi (X)\), and the ED degree is preserved:

Corollary 6.1

Let \(X\) satisfy the assumptions of Theorem 5.4. If \(\mathrm{codim}(X) \ge 2\) then

$$\begin{aligned} \mathrm {EDdegree}(\pi (X)) = \mathrm {EDdegree}(X) . \end{aligned}$$
(6.1)

Proof

Piene [37] showed that \(\delta _i(\pi (X)) = \delta _i(X)\) for all \(i\). Now use Theorem 5.4.\(\square \)

Example 6.2

Let \(I\) be the prime ideal generated by the \(2 \times 2\)-minors of the symmetric \(3 \times 3\)-matrix whose six entries are general linear forms in \(\mathbb {R}[x_1,x_2,x_3,x_4,x_5,x_6]\). The elimination ideal \(\,J = I \cap \mathbb {R}[x_1,x_2,x_3,x_4,x_5]\) is minimally generated by seven cubics. Its variety \(\pi (X) = V(J)\) is a random projection of the Veronese surface \(X=V(I)\) from \(\mathbb {P}^5\) into \(\mathbb {P}^4\). Example 5.6 tells us that \(\mathrm {EDdegree}(X) = 13\). By plugging \(J = I_{\pi (X)}\) into the formula (2.7), and running Macaulay2 as in Example 2.11, we verify \(\mathrm {EDdegree}(\pi (X)) = 13\). \(\diamondsuit \)

If \(X\) is a variety of high codimension, then Corollary 6.1 can be applied repeatedly until the image \(\pi (X)\) is a hypersurface. In other words, we can take \(\pi \) to be a general linear projection \(\mathbb {P}^{n-1} \dashrightarrow \mathbb {P}^{d}\) provided \(d > \mathrm{dim}(X)\). Then, \(\pi (X)\) also satisfies the assumptions of Theorem 5.4, and the formula (6.1) remains valid. This technique is particularly useful when \(X\) is a smooth toric variety as in Corollary 5.11. Here, \(X\) is parametrized by certain monomials, and \(\pi (X)\) is parametrized by general linear combinations of those monomials.

Example 6.3

Consider a surface in \(\mathbb {P}^3\) that is parametrized by four homogeneous polynomials of degree \(d\) in three variables. That surface can be represented as \(\pi (X)\) where \(X\) is the \(d\)-fold Veronese embedding of \(\mathbb {P}^2\) into \(\mathbb {P}^{\left( {\begin{array}{c}d+2\\ 2\end{array}}\right) -1}\), and \(\pi \) is a random projection into \(\mathbb {P}^3\). By applying Corollary 5.11 to the associated lattice triangle \(P = \mathrm{conv}\{ (0,0), (0,d), (d,0)\}\), and using Corollary 6.1, we find \(\, \mathrm {EDdegree}(\pi (X)) = \mathrm {EDdegree}(X) = 7 d^2 -9 d + 3\). This is to be compared to the number \(4 d^2 - 4 d + 1\), which is the ED degree in (2.5) for the affine surface in \(\mathbb {C}^3\) parametrized by three inhomogeneous polynomials of degree \(d\) in two variables.

A similar distinction arises for Bézier surfaces in \(3\)-space. The ED degree of the affine surface in Example 3.1 is \(8d_1d_2 - 2d_1 - 2d_2 + 1\), while \(\mathrm {EDdegree}(\pi (X)) = 14 d_1 d_2 - 6 d_1 - 6 d_1 + 4\) for the projective surface \(\pi (X)\) that is given by four bihomogeneous polynomials \(\psi _i\) of degree \((d_1,d_2)\) in \(2+2\) parameters. Here, the toric surface is \(X = \mathbb {P}^1 \times \mathbb {P}^1\), embedded in \(\mathbb {P}^{(d_1+1)(d_2+1)-1}\) by the line bundle \(\mathcal {O}(d_1,d_2)\), and the lattice polygon is the square \(P = [0,d_1] \times [0,d_2]\). \(\diamondsuit \)

In the previous example, we computed the ED degree of a variety by expressing it as a linear projection from a high-dimensional space with desirable combinatorial properties. This is reminiscent of the technique of lifting in optimization theory, where one simplifies a problem instance by optimizing over a higher-dimensional constraint set that projects onto the given constraint set. It would be desirable to develop this connection further, and to find a more direct proof of Corollary 6.1 that works for both projective and affine varieties.

The operation dual to projection is taking linear sections. Let \(H\) be a general hyperplane in \(\mathbb {P}^{n-1}\). Then, \(X \cap H\) is a subvariety of codimension \(1\) in \(X\). In particular, it lives in the same ambient space \( \mathbb {P}^{n-1}\), with the same coordinates \((x_1:\cdots :x_n)\), and this defines the ED degree of \(X \cap H\). By Bertini’s Theorem, the variety \(X \cap H\) is irreducible provided \(\mathrm{dim}(X) \ge 2\).

Corollary 6.4

Let \(X \subset \mathbb {P}^{n-1}\) satisfy the assumptions of Theorem 5.4. Then

$$\begin{aligned} \mathrm {EDdegree}(X\cap H) = {\left\{ \begin{array}{ll} \mathrm {EDdegree}(X) - \mathrm{degree}(X^*) &{} \mathrm{if} \,\, \mathrm{codim}(X^*) = 1,\\ \qquad \mathrm {EDdegree}(X) &{} \mathrm{if} \,\, \mathrm{codim}(X^*) \ge 2. \end{array}\right. } \end{aligned}$$

Proof

Piene [37] showed that \(\delta _i(X \cap H)=\delta _{i+1}(X)\) for all \(i \ge 0\). By Theorem 5.4, the desired ED degree is the sum of these numbers, so it equals \(\mathrm {EDdegree}(X) - \delta _0(X)\). However, we know that \(\delta _0(X)\) equals the degree of \(X^*\) if \(X^*\) is a hypersurface and it is zero otherwise.\(\square \)

Example 6.5

Let \(X_r\) be the projective variety of symmetric \(3 \times 3\)-matrices of rank \(\le r\). We know that \(\,X_r^* = X_{3-r}\,\) and \(\,\mathrm {EDdegree}(X_2) = \mathrm {EDdegree}(X_1) = 13\). If \(H\) is a general hyperplane in \(\mathbb {P}^5\) then \(\, \mathrm {EDdegree}(X_2 \cap H) = 13\,\) but \(\, \mathrm {EDdegree}(X_1 \cap H) = 13-3 = 10 \). \(\diamondsuit \)

If \(X\) is a variety of high dimension in \(\mathbb {P}^{n-1}\), then Corollary 6.4 can be applied repeatedly until a general linear section is a curve. This motivates the following definition which parallels its analog in the multiplicative setting of likelihood geometry [27, §3]. The sectional ED degree of the variety \(X\) is the following binary form of degree \(n-1\) in \((x,u)\):

$$\begin{aligned} \sum _{i=0}^{\mathrm{dim}(X)-1} \!\! \mathrm {EDdegree}(X \cap L_i) \cdot x^{i} \cdot u^{n-1-i} \end{aligned}$$
(6.2)

where \(L_i\) is a general linear section of codimension \(i\). Corollary 6.4 implies that, for varieties in general coordinates as in Theorem 5.4, this equals

$$\begin{aligned} \sum _{0 \le i \le j < \mathrm{dim}(X)} \!\! \delta _j(X) \cdot x^{i} \cdot u^{n-1-i} . \end{aligned}$$

It is desirable to get a better understanding of the sectional ED degree also for varieties in special coordinates. For instance, in light of [27, Conjecture 3.19], we may ask how (6.2) is related to the bidegree of the projective ED correspondence, or to the tridegree of the joint projective ED correspondence. For a concrete application, suppose that \(X\) is a determinantal variety, in the special coordinates of the Eckart–Young Theorem (Example 2.3). Minimizing the squared distance function \(d_u\) over a linear section \(X \cap L_i\) is known as structured low-rank matrix approximation. This problem has numerous applications in engineering; see [8]. After this paper had been written, a study, including computation of EDdegree for low-rank matrices constrained in linear or affine subspaces, was published in [35].

We now change the topic to homogenization. Geometrically, this is the passage from an affine variety \(X \subset \mathbb {C}^n\) to its projective closure \(\overline{X} \subset \mathbb {P}^n\). This is a standard operation in algebraic geometry [9, §8.4]. Homogenization often preserves the solution set to a given geometric problem, but the analysis is simpler in \(\mathbb {P}^n\) since projective space is compact. Algebraically, we proceed as follows. Given the ideal \(I_X = \langle f_1,\ldots ,f_s \rangle \subset \mathbb {R}[x_1,\ldots ,x_n]\), we introduce a new variable \(x_0\), representing the hyperplane at infinity, \(H_\infty = \mathbb {P}^n \backslash \mathbb {C}^n = V(x_0)\). Given a polynomial \(f \in \mathbb {R}[x_1,\ldots ,x_n]\) of degree \(d\), its homogenization \(\overline{f} \in \mathbb {R}[x_0,x_1,\ldots ,x_n]\) is defined by \(\overline{f}(x_0,\ldots ,x_n)=x_0^d \cdot f(x_1/x_0,\ldots ,x_n/x_0)\). The ideal \(I_{\overline{X}}\) of the projective variety \(\overline{X} \) is generated by \(\{ \overline{f} \,: \,f \in I_X \}\). It can be computed (e.g., in Macaulay2) by saturation:

$$\begin{aligned} I_{\overline{X}} = \langle \overline{f}_1,\ldots , \overline{f}_s \rangle : \langle x_0 \rangle ^\infty \quad \subseteq \mathbb {R}[x_0,x_1,\ldots ,x_n]. \end{aligned}$$

One might naively hope that \(\mathrm {EDdegree}(X)=\mathrm {EDdegree}(\overline{X})\). But this is false in general:

Example 6.6

Let \(X\) be the cardioid in Example 1.1. Written in the notation above, its projective closure is the quartic curve \(\overline{X} \subset \mathbb {P}^2\) whose defining homogeneous ideal equals

$$\begin{aligned} I_{\overline{X}} = \langle \, x_0^2 x_2^2 - 2 x_0 x_1^3 - 2 x_0 x_1 x_2^2 -x_1^4 - 2x_1^2x_2^2 -x_2^4 \, \rangle . \end{aligned}$$

For this curve we have

$$\begin{aligned} \mathrm {EDdegree}(X)\,=\, 3 < 7\, = \, \mathrm {EDdegree}(\overline{X}). \end{aligned}$$

By contrast, consider the affine surface \(Y = V(x_1 x_2 - x_3) \subset \mathbb {C}^3\). Its projective closure is the \(2 {\times } 2\)-determinant \(\overline{Y} = V(x_1 x_2 - x_0 x_3) \subset \mathbb {P}^3\). Here, the inequality goes in the other direction:

$$\begin{aligned} \mathrm {EDdegree}(Y)\,=\, 5 > 2\, = \, \mathrm {EDdegree}(\overline{Y}). \end{aligned}$$
(6.3)

The same phenomenon was seen in our study of Hurwitz determinants in Theorem 3.6. \(\diamondsuit \)

To explain what is going on here, we recall that \(\mathrm {EDdegree}(\overline{X})\) is defined as the ED degree of the affine cone over the projective variety \(\overline{X}\subset \mathbb {P}^n\), which we also denote by \(\overline{X}\). Explicitly,

$$\begin{aligned} \overline{X}= \{\,(t,tx) \mid x \in X , t \in \mathbb {C}\,\} \quad \subset \mathbb {C}^{n+1}. \end{aligned}$$

The ED degree of \(\overline{X}\) is for the fixed quadratic form \(x_0^2 +x_1^2 + \cdots + x_n^2\) that cuts out the isotropic quadric \(Q \subset \mathbb {P}^n\). This is just one of the infinitely many quadratic forms on \(\mathbb {C}^{n+1}\) that restrict to the given form \(x \cdot x = x_1^2 + \cdots + x_n^2\) on \(\mathbb {C}^n\). That is one reason why the ED degrees of \(X\) and of \(\overline{X}\) are not as closely related as one might hope. Nevertheless, we will now make the relation more explicit. The affine variety \(X\) is identified with the intersection of the cone \(\overline{X}\) with the hyperplane \(\{x_0=1\}\). Its part at infinity is denoted \(\,X_\infty :=\overline{X}\cap H_\infty \).

The data point \((1,0) \in \mathbb {C}^{n+1}\) plays a special role, since it is the orthogonal projection of the vertex \((0,0)\) of the cone \(\overline{X}\) onto the affine hyperplane \(\{x_0=1\}\). The following lemma relates the critical points for \(u = 0\) on \(X\) to the critical points for \(u = (1,0)\) on \(\overline{X}\).

Lemma 6.7

Assume that all critical points of \(d_0\) on \(X\) satisfy \(x \cdot x \ne -1\). Then, the map

$$\begin{aligned} x \,\mapsto \, \left( \frac{1}{1+(x \cdot x)}, \frac{1}{1+(x \cdot x)} x\right) \end{aligned}$$

is a bijection from the critical points of \(d_0\) on \(X\) to the critical points of \(d_{(1,0)}\) on \(\overline{X}\backslash X_\infty \).

Proof

Let \(t \in \mathbb {C}\backslash \{0\}\) and \(x \in X \backslash X_\mathrm {sing}\). The point \((t,tx) \in \overline{X}\) is critical for \(d_{(1,0)}\) if and only if \((1-t,-tx)\) is perpendicular to \(T_{(t,tx)}\overline{X}\). That space is spanned by \(\{0\} \times T_x X\) and \((1,x)\). Hence, \((1-t,-tx)\) is perpendicular to \(T_{(t,tx)}\overline{X}\) if and only if \(x \perp T_x X\) and \((1-t) - t(x\cdot x)=0\). The first condition says that \(x\) is critical for \(d_0\), and the second gives \(t = 1/(1+(x \cdot x))\).\(\square \)

If under the assumptions in Lemma 6.7, the number of critical points of \(d_0\) equals the ED degree of \(X\), then we can conclude \(\mathrm {EDdegree}(X) \le \mathrm {EDdegree}(\overline{X})\), with equality if none of the critical points of \(d_{(1,0)}\) on \(\overline{X}\) lies at infinity. To formulate a condition that guarantees equality, we fix the isotropic quadric \(\,Q_\infty = \{ x_1^2 + \cdots + x_n^2=0\}\,\) in \( H_\infty \). Our condition is:

$$\begin{aligned} \hbox {The intersections }\,X_\infty =\overline{X}\cap H_\infty \, \hbox {and }\,X_\infty \cap Q_\infty \,\hbox { are both transversal.} \end{aligned}$$
(6.4)

Lemma 6.8

If (6.4) holds then none of the critical points of \(d_{(1,0)}\) on \(\overline{X}\) lies in \(X_\infty \).

Proof

Arguing by contradiction, suppose that \((0,x_\infty ) \in X_\infty \) is a critical point of \(d_{(1,0)}\) on \(\overline{X}\). Then \((1,-x_\infty )\) is perpendicular to \(T_{(0,x_\infty )} \overline{X}\), and hence \((0,x_\infty )\) is perpendicular to \(H_\infty \cap T_{(0,x_\infty )} \overline{X}\). By transversality of \(\overline{X}\) and \(H_\infty \), the latter is the tangent space to \(X_\infty \) at \((0,x_\infty )\). Hence, \(T_{(0,x_\infty )} X_\infty \) is contained in \((0,x_\infty )^\perp \), and \(X_\infty \) is tangent to \(Q_\infty \) at \((0,x_\infty )\).\(\square \)

Fix \(v \in \mathbb {C}^n\) and consider the affine translate \(X_v:=X - v=\{x-v \mid x \in X\}\). Its projective closure \(\overline{X}_v\) is isomorphic to \(\overline{X}\) as a projective variety in \(\mathbb {P}^n\). However, the metric properties of the corresponding cones in \(\mathbb {C}^{n+1}\) are rather different. While \(\mathrm {EDdegree}(X_v)=\mathrm {EDdegree}(X)\) holds trivially, it is possible that \(\mathrm {EDdegree}(\overline{X}_v) \ne \mathrm {EDdegree}(\overline{X})\). Here is a simple example:

Example 6.9

Consider the unit circle \(X = \{x_1^2+x_2^2=1\}\) in the plane. Then, \(\mathrm {EDdegree}(X)=\mathrm {EDdegree}(\overline{X})=2\). For general \(v \in \mathbb {R}^2\), the translated circle \(X_v\) has \(\mathrm {EDdegree}(\overline{X}_v)=4\). \(\diamondsuit \)

Affine translation sheds light on the behavior of the ED degree under homogenization.

Proposition 6.10

Let \(X\) be an irreducible variety in \(\mathbb {C}^n\), and let \(v \in \mathbb {C}^n\) be a general vector. Then \(\mathrm {EDdegree}(X) \le \mathrm {EDdegree}(\overline{X}_v)\), and equality holds if the hypothesis (6.4) is satisfied.

The hypothesis (6.4) simply says that \(X_\infty \) and \(X_\infty \cap Q_\infty \) are smooth. Note that this does not depend on the extension of the quadric \(Q_\infty \) to \(\mathbb {C}^{n+1}\).

Proof

Since translation of affine varieties preserves ED degree, the inequality follows from Lemma 6.7 provided \(x'\cdot x' \not = -1\) for all critical points \(x'\) for \(d_0\) on \(X_v\). These are the points \(x'=x-v\) with \(x\) critical for \(d_v\), i.e., with \((x,v) \in \mathcal {E}_X\). The expression \( (x-v) \cdot (x-v)\) is not constant \(-1\) on the irreducible variety \(\mathcal {E}_X\), because it is zero on the diagonal \(\Delta (X) \subset \mathcal {E}_X\). As a consequence, the variety of pairs \((x,v) \in \mathcal {E}_X\) with \((x-v) \cdot (x-v)=-1\) has dimension \(\le n-1\). In particular, it does not project dominantly onto the second factor \(\mathbb {C}^n\). Taking \(v\) outside that projection, and such that the number of critical points of \(d_v\) on \(X\) is equal to \(\mathrm {EDdegree}(X)\), ensures that we can apply Lemma 6.7. The second statement follows from Lemma 6.8 applied to \(X_v\) and the fact that \(X\) and \(X_v\) have the same behavior at infinity.\(\square \)

Our main result on homogenization links the discussion above to the polar classes of \(\overline{X}\).

Theorem 6.11

For any irreducible affine variety \(X\) in \(\mathbb {C}^n\) we have the two inequalities

$$\begin{aligned} \mathrm {EDdegree}(X) \le \sum _{i=0}^{n-1} \delta _i(\overline{X}) \quad \text { and } \quad \mathrm {EDdegree}(\overline{X}) \le \sum _{i=0}^{n-1} \delta _i(\overline{X}), \end{aligned}$$

with equality on the left if (6.4) holds, and equality on the right if the conormal variety \(\mathcal {N}_{\overline{X}}\) is disjoint from the diagonal \(\Delta (\mathbb {P}^n)\) in \(\mathbb {P}^n_x \times \mathbb {P}^n_y\). The equality on the right holds in particular if \(X\cap Q\) is smooth and disjoint from \(X_{sing}\) (see the statement after Theorem 5.4).

Proof

We claim that for general \(v \in \mathbb {C}^n\) the conormal variety \(\mathcal {N}_{\overline{X}_v}\) does not intersect \(\Delta (\mathbb {P}^n)\). For this, we need to understand how \(\mathcal {N}_{\overline{X}_v}\) changes with \(v\). The \((1+n) \times (1+n)\) matrix

$$\begin{aligned} A_v:=\begin{pmatrix} 1 &{}\quad \! 0 \\ -v &{}\quad \! I_n \end{pmatrix} \end{aligned}$$

defines an automorphism \(\mathbb {P}^n_x \rightarrow \mathbb {P}^n_x\) that maps \(\overline{X}\) isomorphically onto \(\overline{X}_v\). The second factor \(\mathbb {P}^n_y\) is the dual of \(\mathbb {P}^n_x\) and hence transforms contragradiently, i.e., by the matrix \(A_v^{-T}\). Hence, the pair of matrices \((A_v,A_v^{-T})\) maps \(\mathcal {N}_{\overline{X}}\) isomorphically onto \(\mathcal {N}_{\overline{X}_v}\). Consider the variety

$$\begin{aligned} Z\,:=\, \bigl \{\,(x,y,v) \in \mathcal {N}_{\overline{X}} \times \mathbb {C}^n \mid A_v x = A_v^{-T} y \bigr \}. \end{aligned}$$

For fixed \((x,y)=((x_0:x_\infty ),(y_0:y_\infty )) \in \mathcal {N}_{\overline{X}}\) with \(x_0 \ne 0\), the equations defining \(Z\) read

$$\begin{aligned} x_0 = c (y_0+v^T y_\infty ) \quad \text { and } \quad -x_0 v+x_\infty =c y_\infty , \end{aligned}$$

for \(v \in \mathbb {C}^n\) and a scalar \(c\) reflecting that we work in projective space. The second equation expresses \(v\) in \(c,x,y\). Substituting that expression into the first equation gives a system for \(c\) with at most two solutions. This shows that \(\dim Z\) is at most \(\dim \mathcal {N}_{\overline{X}}=n-1\), so the image of \(Z\) in \(\mathbb {C}^n\) is contained in a proper subvariety of \(\mathbb {C}^n\). For any \(v\) outside that subvariety, \(\mathcal {N}_{\overline{X}_v} \cap \Delta (\mathbb {P}^n) = \emptyset \). For those \(v\), Theorem 5.4 implies that \(\mathrm {EDdegree}(\overline{X}_v)\) is the sum of the polar classes of \(\overline{X}_v\), which are also those of \(\overline{X}\) since they are projective invariants. Since \(\mathrm {EDdegree}(\overline{X}_v)\) can only go down as \(v\) approaches a limit point, this yields the second inequality, as well as the sufficient condition for equality there. By applying Proposition 6.10, we establish the first inequality, as well as the sufficient condition (6.4) for equality.\(\square \)

Example 6.12

Consider the quadric surface \(\overline{Y} = V(x_0 x_3-x_1 x_2) \subset \mathbb {P}^3\) from Example 6.6. This is the toric variety whose polytope \(P\) is the unit square. By Corollary 5.11, the sum of the polar classes equals \(7 V_2 - 3 V_1 + V_0 = 14-12+4=6\). Comparing this with (6.3), we find that neither of the two inequalities in Theorem 6.11 is an equality. This is consistent with the fact that \(Y_\infty :=\overline{Y} \cap H_\infty = V(x_1 x_2)\) is not smooth at the point \((0:0:0:1)\) and the fact that \(\overline{Y}\) and \(Q\) are tangent at the four points \((1:a_1:a_2:a_1 a_2)\) with \(a_1,a_2 = \pm i\). \(\diamondsuit \)

Example 6.13

Consider the threefold \(\,\overline{Z} = V(x_1 x_4 - x_2 x_3 - x_0^2 - x_0x_1)\,\) in \(\mathbb {P}^4\). Then, \(Z_\infty \) is isomorphic to \(\overline{Y}\) from the previous example and smooth in \(\mathbb {P}^3\), but \(Z_\infty \cap Q_\infty \) is isomorphic to the \(\overline{Y} \cap Q\) from the previous example and hence has four non-reduced points. Here, we have \(\,\mathrm {EDdegree}(Z) = 4 < 8 = \mathrm {EDdegree}(\overline{Z}) = \sum _{i=0}^3 \delta _i(\overline{Z})\). If we replace \(x_1 x_4\) by \(2 x_1 x_4\) in the equation defining \(\overline{Z}\), then the four non-reduced points disappear. Now \(Z_\infty \cap Q_\infty \) is smooth, we have \(\mathrm {EDdegree}(Z) = 8\), and both inequalities in Theorem 6.11 hold with equality. \(\diamondsuit \)

Example 6.14

Let \(X\) be the cardioid from Examples 1.1 and 6.6. This curve violates both conditions for equality in Theorem 6.11. Here, \(X_\infty = V(x_1^4+2x_1^2x_2^2+x_2^4) \) agrees with \(Q_\infty = V(x_1^2+x_2^2)\) as a subset of \( H_\infty \simeq \mathbb {P}^1\), but it has multiplicity two at the two points. \(\diamondsuit \)

7 ED discriminant and Chern Classes

Catenese and Trifogli [7, 45] studied ED discriminants under their classical name focal loci. We present some of their results, including a formula for the ED degree in terms of Chern classes, and we discuss a range of applications. We work in the projective setting, so \(X\) is a subvariety of \( \mathbb {P}^{n-1}\), equipped with homogeneous coordinates \((x_1:\ldots :x_n)\) and \(\mathcal {P}\mathcal {E}_X \subset \mathbb {P}^{n-1}_x \times \mathbb {C}^{n}_u\) is its projective ED correspondence. By Theorem 4.4, the ED degree is the size of the general fiber of the map \(\mathcal {P}\mathcal {E}_X \rightarrow \mathbb {C}^{n}_u\). The branch locus of this map is the closure of the set of data points \(u\) for which there are fewer than \(\mathrm {EDdegree}(X)\) complex critical points. Since the variety \(\mathcal {P}\mathcal {E}_X \subset \mathbb {P}^{n-1}_x \times \mathbb {C}^{n}_u\) is defined by bihomogeneous equations in \(x\), \(u\), also the branch locus is defined by homogeneous equations and it is a cone in \(\mathbb {C}^{n}_u\). Hence, the branch locus defines a projective variety \(\Sigma _X\subset \mathbb {P}^{n-1}_u\), which we call the ED discriminant. The ED discriminant \(\Sigma _X\) is typically a hypersurface, by the Nagata–Zariski Purity Theorem, and we are interested in its degree and defining polynomial.

Remark 7.1

In applications, the uniqueness of the closest real-valued point \(u^* \in X\) to a given data point \(u\) is relevant. In many cases, e.g., for symmetric tensors of rank one [14], this closest point is unique for \(u\) outside an algebraic hypersurface that strictly contains \(\Sigma _X\).

Example 7.2

Let \(n=4\) and consider the quadric surface \(X = V(x_1x_4- 2 x_2 x_3) \subset \mathbb {P}^3_x\). This is the \(2 \times 2\)-determinant in general coordinates, so \(\mathrm {EDdegree}(X) = 6\). The ED discriminant is a irreducible surface of degree \(12\) in \(\mathbb {P}^3_u\). Its defining polynomial has \(119\) terms:

$$\begin{aligned} \begin{matrix} \Sigma _X\, = \, 65536u_1^{12}+835584 u_1^{10} u_2^2+835584 u_1^{10} u_3^2 -835584 u_1^{10} u_4^2 +9707520 u_1^9 u_2 u_3 u_4 \\ +3747840 u_1^8 u_2^4 -7294464 u_1^8 u_2^2 u_3^2 + \,\cdots \, +835584 u_3^2 u_4^{10}+65536 u_4^{12}. \end{matrix} \end{aligned}$$

This ED discriminant can be computed using the following Macaulay2 code:

figure c

Here, EX is the ideal of the ED correspondence in \(\mathbb {P}^3_x \times \mathbb {P}^3_u\). The command eliminate maps that threefold into \(\mathbb {P}^1_{(x_1:x_2)} \times \mathbb {P}^3_u\). We print the discriminant of that hypersurface over \(\mathbb {P}^3\). \(\diamondsuit \)

If \(X\) is a general hypersurface of degree \(d\) in \(\mathbb {P}^{n-1}\) then, by Corollary 2.10,

$$\begin{aligned} \mathrm {EDdegree}(X) = d \cdot \frac{(d-1)^{n-1}-1}{d-2} . \end{aligned}$$
(7.1)

Trifogli [45] determined the degree of the ED discriminant \(\Sigma _X\) for such a hypersurface \(X\):

Theorem 7.3

(Trifogli) If \(X\) is a general hypersurface of degree \(d\) in \(\mathbb {P}^{n-1}\) then

$$\begin{aligned} \mathrm{degree}(\Sigma _X) = d(n-2)(d-1)^{n-2} \,+\, 2d(d-1) \frac{(d-1)^{n-2}-1}{d-2} . \end{aligned}$$
(7.2)

Example 7.4

A general plane curve \(X\) has \(\mathrm {EDdegree}(X) = d^2\) and \(\mathrm{degree}(\Sigma _X) = 3 d(d-1)\). These are the numbers seen for the ellipse \((d=2)\) in Example 4.5. For a plane quartic \(X\), we expect \(\mathrm {EDdegree}(X) = 16\) and \(\mathrm{degree}(\Sigma _X)= 36\), in contrast to the numbers \(3\) and \(4\) for the cardioid in Example 1.1. A general surface in \(\mathbb {P}^3\) has \(\mathrm {EDdegree}(X) = d(d^2-d+1)\) and \(\mathrm{degree}(\Sigma _X) = 2d(d-1)(2d-1)\). For quadrics \((d=2)\) we get \(6\) and \(12\), as in Example 7.2\(\diamondsuit \)

Example 7.5

The ED discriminant \(\Sigma _X\) of a plane curve \(X\) was already studied in the nineteenth century under the name evolute.

Salmon [39, p. 96, art. 112] showed that a curve \(X \subset \mathbb {P}^2\) of degree \(d\) with \(\delta \) ordinary nodes and \(k\) ordinary cusps has \(\mathrm{degree}(\Sigma _X) = 3 d^2-3d-6\delta -8k\). For affine \(X \subset \mathbb {C}^2\), the same holds provided that \(\overline{X} \subseteq \mathbb {P}^2\) is not tangent to the line \(H_\infty \) and neither of the two isotropic points on \(H_\infty \) is on \(\overline{X}\). Curves with more general singularities are considered in [6, 29] in the context of caustics, which are closely related to evolutes.\(\diamondsuit \)

Example 7.6

Let \(X_r\) be the determinantal variety of Examples 2.3, 5.1 and 4.7. The ED discriminant \(\Sigma _{X_r}\) does not depend on \(r\) and equals the discriminant of the characteristic polynomial of the symmetric matrix \(UU^t\). This polynomial has been expressed as a sum of squares in [28]. The set of real points in the hypersurface \(V(\Sigma _{Xr})\) has codimension two in the space of real \(s\times t\) matrices; see [42, §7.5]. This explains why the complement of this ED discriminant in the space of real matrices is connected. In particular, if \(U\) is real then all critical points are real, hence \(\mathrm {aEDdegree}(X_r)={s\atopwithdelims ()r}\). A computation reveals that \(\Sigma _{X_r}\) is reducible for \(s = 2\). It has two components if \(t \ge 3\), and it has four components if \(t=2\). \(\diamondsuit \)

We comment on the relation between duality and the ED discriminant \(\Sigma _X\). Recall that \(\Sigma _X\) is the projectivization of the branch locus of the covering \(\mathcal {P}\mathcal {E}_X \rightarrow \mathbb {C}^n_u\). By the results in Sect. 5, this is also the branch locus of \(\mathcal {P}\mathcal {E}_{X,Y} \rightarrow \mathbb {C}^n_u\), and hence also of \(\mathcal {P}\mathcal {E}_Y \rightarrow \mathbb {C}^n_u\). This implies that the ED discriminant of a variety \(X\) agrees with that of its dual variety \(Y=X^*\).

Example 7.7

Let \(X \subset \mathbb {P}^2_x\) denote the cubic Fermat curve given by \(x_0^3+x_1^3+x_2^3 = 0 \). Its dual \(Y\) is the sextic curve in \(\mathbb {P}^2_y\) that is defined by \(\, y_0^6 +y_1^6 + y_2^6 -2 y_0^3 y_1^3 -2 y_0^3 y_2^3 -2 y_1^3 y_2^3 \). This pair of curves satisfies \(\mathrm{EDdegree}(X) = \mathrm{EDdegree}(Y) = 9\). The ED discriminant \(\Sigma _X = \Sigma _Y\) is an irreducible curve of degree \(18\) in \(\mathbb {P}^2_u\). Its defining polynomial has \(184\) terms:

$$\begin{aligned} \Sigma _X&= 4 u_0^{18}-204 u_0^{16} u_1^2+588 u_0^{15} u_1^3 -495 u_0^{14} u_1^4+2040 u_0^{13} u_1^5\\&\quad -2254 u_0^{12} u_1^6+2622 u_0^{11} u_1^7 + \cdots + 4 u_2^{18}. \end{aligned}$$

The computation of the ED discriminant for larger examples in Macaulay2 is difficult. \(\diamondsuit \)

The formulas (7.1) and (7.2) are best understood and derived using modern intersection theory; see [18] or [23, Appendix A]. That theory goes far beyond the techniques from [9] used in the earlier sections but is indispensable for more general formulas, especially for varieties \(X\) of codimension \(\ge 2\). We briefly sketch some of the required vector bundle techniques.

A vector bundle \(\mathcal {E}\rightarrow X\) on a smooth, \(m\)-dimensional projective variety \(X\) has a total Chern class \(c(\mathcal {E})=c_0(\mathcal {E}) + \ldots + c_m(\mathcal {E})\), which resides in the cohomology ring \(H^*(X)=\bigoplus _{i=0}^{m} H^{2i}(X)\). In particular, the top Chern class \(c_m(\mathcal {E})\) is an integer scalar multiple of the class of a point, and that integer is commonly denoted \(\int c(\mathcal {E})\). If \(\mathcal {E}\) has rank equal to \(\dim X=m\), and if \(s:X \rightarrow \mathcal {E}\) is a global section for which \(V(s):=\{x \in X \mid s(x)=0\}\) consists of finitely many simple points, then the cardinality of \(V(s)\) equals \(\int c(\mathcal {E})\). To apply this to the computation of ED degrees, we shall find \(\mathcal {E}\) and \(s\) such that the variety \(V(s)\) is the set of critical points of \(d_u\), and then compute \(\int c(\mathcal {E})\) using vector bundle tools. Among these tools are Whitney’s sum formula \(c(\mathcal {E})=c(\mathcal {E}') \cdot c(\mathcal {E}'')\) for any exact sequence \(0 \rightarrow \mathcal {E}' \rightarrow \mathcal {E}\rightarrow \mathcal {E}'' \rightarrow 0\) of vector bundles on \(X\), and the fact that the total Chern class of the pullback of \(\mathcal {E}\) under a morphism \(X' \rightarrow X\) is the image of \(c(\mathcal {E})\) under the ring homomorphism \(H^*(X) \rightarrow H^*(X')\).

Here is our repertoire of vector bundles on \(X\): the trivial bundle \(X \times \mathbb {C}^n\) of rank \(n\); the tautological line bundle pulled back from \(\mathbb {P}^{n-1}\), which is \(\mathcal {R}_X := \{(x,v) \in X \times \mathbb {C}^{n} \mid v \in x\}\) (also often denoted by \(\mathcal {O}_X(-1)\), while the dual \(\mathcal {R}_X^*\) is denoted by \(\mathcal {O}_X(1)\)); the tangent bundle \(TX\) whose fibers are the tangent spaces \(T_xX\); the cotangent bundle \(T^*X\) whose fibers are their duals \((T_xX)^*\); and the normal bundle \(N_X\) whose fibers are the quotient \(T_x\mathbb {P}^n/T_xX\). From these building blocks, we can construct new vector bundles using direct sums, tensor products, quotients, duals, and orthogonal complements inside the trivial bundle \(X \times \mathbb {C}^n\).

Theorem 7.8

(Catanese–Trifogli) Let \(X\) be an irreducible smooth subvariety of \(\mathbb {P}^{n-1}\) and assume that \(X\) intersects the isotropic quadric \(Q= V(x_1^2+\cdots +x_{n}^2)\) transversally, i.e., \(X \cap Q\) is smooth. Then, the EDdegree of \(X\) can be computed in \(H^*(X)\) by either of the expressions

$$\begin{aligned} \mathrm{EDdegree}(X) \!=\! \int \frac{c(\mathcal {R}^*_X) \cdot c(T^*X \otimes \mathcal {R}^*_X)}{c(\mathcal {R}_X)} \quad =\! \quad \! \int \frac{1}{c(\mathcal {R}_X) \cdot c(N^*_X\otimes \mathcal {R}^*_X)}.\quad \end{aligned}$$
(7.3)

Proof

The first expression is stated after Remark 3 on page 6026 in [7], as a formula for the inverse of the total Chern class of what they call Euclidean normal bundle (for simplicity we tensor it by \(\mathcal {R}_X^*\), differently from [7]). The total space of that bundle, called normal variety in [7, 45], is precisely our projective ED correspondence \(\mathcal {P}\mathcal {E}_X\) from Theorem 4.4.

A general data point \(u \in \mathbb {C}^n\) gives rise to a section \(x \mapsto [(x,u)]\) of the quotient bundle \((X \times \mathbb {C}^n) / \mathcal {P}\mathcal {E}_X\), whose zero set is exactly the set of critical points of \(d_u\). By Whitney’s sum formula, the total Chern class of this quotient is \(1/c(\mathcal {P}\mathcal {E}_X)\). This explains the inverse and the first formula. The second formula is seen using the identity

$$\begin{aligned} \frac{1}{c(N^*_X\otimes \mathcal {R}_X^*)}\,=\,\frac{c(T^* X \otimes \mathcal {R}_X^*)}{c(T^*\mathbb {P}^{n-1}\otimes \mathcal {R}_X^*)}\,=\,c(\mathcal {R}_X^*)\cdot c(T^* X \otimes \mathcal {R}_X^*), \end{aligned}$$
(7.4)

where the second equality follows from the Euler sequence [23, Example II.8.20.1].

\(\square \)

Remark 7.9

The ED degree of a smooth projective variety \(X\) can also be interpreted as the top Segre class [18] of the Euclidean normal bundle of \(X\).

We shall now relate this discussion to the earlier formula in Sect. 5, by offering a second proof of Theorem 7.8. This proof is based on a Chern class computation and Theorem 5.8, and hence independent of the proof by Catanese and Trifogli.

Second proof of Theorem 7.8

If \(\mathcal {E}\) is a vector bundle of rank \(m\) and \(\mathcal {L}\) is a line bundle, then

$$\begin{aligned} c_k(\mathcal {E}\otimes \mathcal {L})\!=\! \sum _{i=0}^k{{r-i}\atopwithdelims (){k-i}}c_i(\mathcal {E})c_1(\mathcal {L})^{k-i}. \end{aligned}$$
(7.5)

This formula is [18, Example 3.2.2]. By definition, we have \(c_i(X) = (-1)^i c_i(T^*X)\). Setting \(c_1(\mathcal {R}^*_X)=h\), the formula (7.5) implies

$$\begin{aligned} c(T^*X \otimes \mathcal {R}^*_X) = \sum _{k=0}^m\sum _{i=0}^k {{m-i}\atopwithdelims (){k-i}}(-1)^ic_i(X)h^{k-i} = \sum _{i=0}^m(-1)^ic_i(X)\sum _{t=0}^{m-i} {{m-i}\atopwithdelims ()t}h^t. \end{aligned}$$

We have \(c(\mathcal {R}^*_X)=1+h\) and \(1/c(\mathcal {R}_X)=1/(1-h) = \sum _{i=0}^mh^i\). The equation above implies

$$\begin{aligned} \frac{c(\mathcal {R}^*_X) \cdot c(T^*X \otimes \mathcal {R}^*_X)}{c(\mathcal {R}_X)} = \sum _{i=0}^m(-1)^ic_i(X)\left( \sum _{t=0}^{m-i} {{m-i}\atopwithdelims ()t}h^t\right) \left( 1+2\sum _{j=1}^mh^j\right) . \end{aligned}$$

The integral on the left-hand side in (7.3) is the coefficient of \(h^{m-i}\) in the polynomial in \(h\) that is obtained by multiplying the two parenthesized sums. That coefficient equals

$$\begin{aligned} 1+2\sum _{j=0}^{m-i-1}{{m-i}\atopwithdelims ()j} = \, 2^{m-i+1}-1. \end{aligned}$$

We conclude that Theorem 5.8 is in fact equivalent to the first formula in Theorem 7.8. The second formula follows from (7.4), as argued above.\(\square \)

The Catanese–Trifogli formula in (7.3) is most useful when \(X\) has low codimension. In that case, we compute the relevant class in the cohomology ring of the ambient projective space \(\mathbb {P}^n\), and pull back to \(X\). This yields the following proof of Proposition 2.6.

Proof of Proposition 2.6

First consider the case where \(X \) is a general hypersurface of degree \(d\) in \(\mathbb {P}^{n-1}\). We compute in \(H^*(\mathbb {P}^{n-1})=\mathbb {Z}[h]/\langle h^{n} \rangle \). The line bundle \(\mathcal {R}_X\) is the pullback of \(\mathcal {R}_{\mathbb {P}^{n-1}}\), whose total Chern class is \(1-h\). Since \(\mathrm{codim}(X)=1\), the vector bundle \(N_X\) is a line bundle. By [23, Example II.8.20.3], we have \(N_X=(\mathcal {R}_X^*)^{\otimes d}\), so that \(N^*_X\otimes \mathcal {R}_X^*=(\mathcal {R}_X)^{\otimes (d-1)}\). In \(H^*(\mathbb {P}^{n-1})\) we have

$$\begin{aligned} \frac{1}{c(\mathcal {R}_{\mathbb {P}^{n-1}}) \cdot c(\mathcal {R}_{\mathbb {P}^{n-1}}^{\otimes d-1})} = \frac{1}{(1-h)(1-(d-1)h)}. \end{aligned}$$

The coefficient of \(h^{n-2}\) in this expression equals \(\sum _{i=0}^{n-2} (d-1)^i\), and since the image of \(h^{n-2}\) in \(H^*(X)\) under pullback equals \(d = \mathrm{degree}(X)\) times the class of a point, we find

$$\begin{aligned} \mathrm {EDdegree}(X)= \int \frac{1}{c(\mathcal {R}_X) \cdot c(\mathcal {N}^*_X)} = d \cdot \sum _{i=0}^{n-2} (d-1)^i. \end{aligned}$$

A similar reasoning applies when \(X\) is a general complete intersection of \(c\) hypersurfaces of degrees \(d_1,\ldots ,d_c\). Again, by working in \(H^*(\mathbb {P}^{n-1}) = \mathbb {Z}[h]/\langle h^{n} \rangle \), we evaluate

$$\begin{aligned} \mathrm{EDdegree}(X) = \int \frac{1}{(1-h) \prod \limits _{i=1}^c (1-(d_i-1)h) }, \end{aligned}$$

where \(\int \) refers to the coefficient of the point class in the pullback to \(X\). To compute this, we expand the integrand as a series in \(h\). The coefficient of \(h^{n-c-1}\) in that series, multiplied by \(\mathrm{degree}(X) = d_1 \cdots d_c\), is the formula in (2.8). Proposition 2.6 then follows from Theorem 6.11. Here is the argument. After a transformation (if necessary) of the given equations \(f_1,\ldots ,f_s\), the variety \(X'\) cut out by the first \(c\) of them is a complete intersection. Then, \(X\) is an irreducible component of \(X'\). This implies \(\mathrm{EDdegree}(X)\le \mathrm{EDdegree}(X')\). Now, by semicontinuity, \(\mathrm{EDdegree}(X')\) is at most the value for a general complete intersection.\(\square \)

If \(X\) is a low-dimensional variety then Theorem 5.8 may be more useful, especially if \(X\) is a variety whose cohomology ring we understand well. We illustrate this scenario with a computation that generalizes Example 5.12 from \(X \simeq \mathbb {P}^1\) to higher dimensions.

Proposition 7.10

After a change of coordinates that creates a transverse intersection with the isotropic quadric \(Q\) in \(\mathbb {P}^{{{m+d}\atopwithdelims ()d}-1}\), the \(d\)-th Veronese embedding of \(\,\mathbb {P}^{m}\) has ED degree

$$\begin{aligned} \frac{(2d-1)^{m+1} - (d-1)^{m+1}}{d} . \end{aligned}$$
(7.6)

Proof

We write \(i_d:\mathbb {P}^{m-1} \rightarrow X\) for the \(d\)th-Veronese embedding in question. So, \(X\) denotes the image of \(\mathbb {P}^{m-1}\) in \( \mathbb {P}^{{{m+d-1}\atopwithdelims ()d}-1}\) under the map given by a sufficiently general basis for the space of homogeneous polynomials of degree \(d\) in \(m\) variables. We have \(c_i(X)={m + 1\atopwithdelims ()i}h^i\), so that \(\deg c_i(X)\,=\, \int (dh)^{m-i} c_i(X) \,=\, {m\atopwithdelims ()i}d^{m-i}\). From Theorem 5.8, we now get

$$\begin{aligned} \mathrm {EDdegree}(X) = \sum _{i=0}^{m}(-1)^i(2^{m+1-i}-1){m+1\atopwithdelims ()i}d^{m-i}. \end{aligned}$$

Using the Binomial Theorem, we see that this alternating sum is equal to (7.6).\(\square \)

Theorem 7.8 requires \(X\) to be smooth. Varieties with favorable desingularizations are also amenable to Chern class computations, but the computations become more technical.

Example 7.11

Let \(X_r\) denote the variety of \(s\times t\) matrices of rank \(\le r\), in general coordinates so that \(X_r\) intersects \(Q\) transversally. Its ED degree can be computed by the desingularization in [46, Proposition 6.1.1.a]. The Chern class formula amounts to a nontrivial computation in the ring of symmetric functions. We implemented this in Macaulay2 as follows:

figure d

The first values of \(\mathrm {EDdegree}(X_r)\) are summarized in the following table

$$\begin{aligned}\begin{array}{r@{\quad }|r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r} &{}(s,t)&{}=&{}(2,2)&{}(2,3)&{}(2,4)&{}(2,5)&{}(3,3)&{}(3,4)&{}(3,5)&{}(4,4)&{}(4,5)&{}(5,5)\\ \hline r=1&{}&{}&{}6&{}10&{}14&{}18&{}39&{}83&{}143&{}284&{}676&{}2205\\ r=2&{}&{}&{}&{}&{}&{}&{}39&{}83&{}143&{}1350&{}4806&{}55010\\ r=3&{}&{}&{}&{}&{}&{}&{}&{}&{}&{}284&{}676&{}55010\\ r=4&{}&{}&{}&{}&{}&{}&{}&{}&{}&{}&{}&{}2205\\ \end{array} \end{aligned}$$

The \(r=1\) row can also be computed with \(P\) a product of two simplices in Corollary 5.11. \(\diamondsuit \)

Using the formalism of Chern classes, Catanese and Trifogli [7, p. 6030] derive a general formula for the degree of the ED discriminant \(\Sigma _X\). Their formula is a complicated expression in terms of the Chow ring of the ED correspondence \(\mathcal {P} \mathcal {E}_X\). Here are two easier special cases.

Example 7.12

If \(X\) is a general smooth curve in \(\mathbb {P}^n\) of degree \(d\) and genus \(g\) then

$$\begin{aligned} \mathrm{degree}(\Sigma _X) = 6 (d+g-1) . \end{aligned}$$

For instance, the rational normal curve \(X\) in general coordinates in \(\mathbb {P}^n\), as discussed in Example 5.12, has \(\,\mathrm{degree}(X) = n ,\, \mathrm {EDdegree}(X) = 3n-2\), and \(\, \mathrm{degree}(\Sigma _X) = 6n-6 \).

If \(X\) is a general smooth surface in \(\mathbb {P}^n\) of degree \(d\), with Chern classes \(c_1(X), c_2(X)\), then

$$\begin{aligned} \mathrm{degree}(\Sigma _X) = 2 \cdot \left( \,15 \cdot d + c_1(X)^2 + c_2(X) - 9 \cdot \mathrm{deg} \,c_1(X) \,\right) . \end{aligned}$$

The formulas in Example 7.4 can be derived from these expressions, as in [7, p. 6034]. \(\diamondsuit \)

8 Tensors of Rank One

In this section, we present a brief account of recent work on multidimensional tensors of rank one [14]. For these, the ED degree is computed in [15], and the average ED degree is computed in [10]. Our discussion includes partially symmetric tensors, and it represents a step toward extending the Eckart–Young theorem from matrices to tensors.

We consider real tensors \(x = (x_{i_1 i_2 \cdots i_p})\) of format \(m_1 \times m_2 \times \cdots \times m_p\). The space of such tensors is the tensor product \(\mathbb {R}^{m_1} \otimes \mathbb {R}^{m_2} \otimes \cdots \otimes \mathbb {R}^{m_p}\), which we identify with \(\mathbb {R}^{m_1 m_2 \cdots m_p}\). The corresponding projective space \(\mathbb {P}(\mathbb {R}^{m_1} \otimes \cdots \otimes \mathbb {R}^{m_p})\) is likewise identified with \( \mathbb {P}^{m_1m_2 \cdots m_p-1}\).

A tensor \(x\) has rank one if \(x = t_1 \otimes t_2 \otimes \cdots \otimes t_p\) for some vectors \(t_i \in \mathbb {R}^{m_i}\). In coordinates,

$$\begin{aligned} \qquad x_{i_1 i_2 \cdots i_p} = t_{1 i_1} t_{2 i_2} \cdots t_{p i_p} \qquad \hbox {for} \quad 1 \le i_1 \le m_1, \ldots , 1 \le i_p \le m_p. \end{aligned}$$
(8.1)

The set \(X\) of all tensors of rank one is an algebraic variety in \(\mathbb {R}^{m_1 m_2 \cdots m_p}\). It is the cone over the Segre variety \( \mathbb {P}(\mathbb {R}^{m_1})\times \cdots \times \mathbb {P}(\mathbb {R}^{m_p}) = \mathbb {P}^{m_1-1} \times \cdots \times \mathbb {P}^{m_p-1}\) in its natural embedding in \(\mathbb {P}^{m_1m_2 \cdots m_p-1}\). By slight abuse of notation, we use the symbol \(X\) also for that Segre variety.

Theorem 8.1

(Draisma and Horobeţ [15, Theorem 4]) The ED degree of the Segre variety \(X \) of rank \(1\) tensors of format \(m_1 {\times } \cdots {\times } m_p\) equals the coefficient of the monomial \(z_1^{m_1-1}\cdots z_p^{m_p-1}\) in the polynomial

$$\begin{aligned} \prod \limits _{i=1}^p \frac{(\widehat{z}_i)^{m_i}-z_i^{m_i}}{\, \widehat{z}_i\,-\,z_i} \qquad \hbox { where } \,\widehat{z}_i \, = \, z_1 {+} \cdots {+} z_{i-1} + z_{i+1} {+} \cdots {+} z_p. \end{aligned}$$

The embedding (8.1) of the Segre variety \(X\) into \(\mathbb {P}^{m_1 m_2 \cdots m_p-1}\) is not transversal to the isotropic quadric \(Q\), so our earlier formulas do not apply. However, it is natural in the following sense. The Euclidean distance on each factor \(\mathbb {R}^{m_i}\) is preserved under the action by the rotation group \(SO(m_i)\). The product group \(SO(m_1)\times \cdots \times SO(m_p)\) embeds in the group \(SO(m_1\cdots m_p)\), which acts by rotations on the tensor space \(\mathbb {R}^{m_1 m_2 \cdots m_p}\). The Segre map (8.1) from \(\mathbb {R}^{m_1} \times \cdots \times \mathbb {R}^{m_p}\) to \(\mathbb {R}^{m_1 m_2 \cdots m_p}\) is \(SO(m_1)\times \cdots \times SO(m_p)\)-equivariant. This group invariance becomes crucial when, in a short while, we pass to partially symmetric tensors.

For \(p=2\), when the given tensor \(u\) is a matrix, Theorem 8.1 gives the Eckart–Young formula \(\mathrm{EDdegree}(X) = \mathrm{min}(m_1,m_2)\). The fact that singular vectors are the eigenvectors of \(u^T u\) or \(u u^T\), can be interpreted as a characterization of the ED correspondence \(\mathcal {E}_X\). The following generalization to arbitrary tensors, due to Lim [32], is the key ingredient used in [15]. Suppose that \(u = (u_{i_1 i_2 \cdots i_p})\) is a given tensor, and we seek to find its best rank one approximation \(x^* = (x^*_{i_1 i_2 \cdots i_p}) = (t^*_{1 i_1} t^*_{2 i_2} \cdots t^*_{p i_p})\). Then, we have the singular vector equations

$$\begin{aligned} u \cdot (t^*_1 \otimes \cdots \otimes t^*_{i-1} \otimes t^*_{i+1} \otimes \cdots \otimes t^*_p) \, =\, \lambda t^*_i, \end{aligned}$$
(8.2)

where the scalars \(\lambda \)’s are the singular values of the tensor \(u\). The dot in (8.2) denotes tensor contraction. In the special case \(p=2\), these are the equations, familiar from linear algebra, that characterize the singular vector pairs of a rectangular matrix [15, (1.1)]. Theorem 8.1 is proved in [15] by counting the number of solutions to (8.2). The arguments used are based on Chern class techniques as described in Sect. 6.

Consider the ED correspondence \( \mathcal {P}\mathcal {E}_X \), introduced before Theorem 4.4, but now regarded as a subvariety of \( \mathbb {P}^{m_1 \cdots m_p-1} \times \mathbb {P}^{m_1 \cdots m_p-1} \). Its equations can be derived as follows. The proportionality conditions of (8.2) are expressed as quadratic equations given by \(2\times 2\) minors. This leads to a system of bilinear equations in \((x,u)\). These equations, together with the quadratic binomials in \(x\) for the Segre variety \(X\), define the ED correspondence \(\mathcal {P}\mathcal {E}_X\).

Example 8.2

Let \(p = 3\), \(m_1 = m_2 = m_3 = 2\), and abbreviate \(a=t^*_1,b=t^*_2,c= t^*_3\), for the Segre embedding of \(X = \mathbb {P}^1 \times \mathbb {P}^1 \times \mathbb {P}^1 \) into \(\mathbb {P}^7\). This toric threefold is defined by the ideal

$$\begin{aligned} \begin{matrix} \langle \,x_{101} x_{110} - x_{100} x_{111} , x_{011} x_{110} - x_{010} x_{111}, x_{011} x_{101} - x_{001} x_{111} \\ \ x_{010} x_{100} - x_{000} x_{110}, x_{001} x_{100} - x_{000} x_{101}, x_{001} x_{010} - x_{000} x_{011} \\ \quad x_{010} x_{101} - x_{000} x_{111}, x_{011} x_{100} - x_{000} x_{111}, x_{001} x_{110} - x_{000} x_{111} \,\rangle . \end{matrix} \end{aligned}$$
(8.3)

The six singular vector equations (8.2) for the \(2 {\times } 2 {\times } 2\)-tensor \(x\) reduce to the proportionality between the columns of the following three matrices

$$\begin{aligned}&\begin{pmatrix} u_{000} b_0 c_0 + u_{001} b_0 c_1 + u_{010} b_1 c_0+ u_{011} b_1 c_1 &{} a_0 \\ u_{100} b_0 c_0 + u_{101} b_0 c_1 + u_{110} b_1 c_0 +u_{111} b_1 c_1 &{} a_1 \end{pmatrix}\\&\begin{pmatrix}u_{000} a_0 c_0 + u_{001} a_0 c_1 + u_{100} a_1 c_0 + u_{101} a_1 c_1 &{} b_0 \\ u_{010} a_0 c_0 + u_{011} a_0 c_1 + u_{110} a_1 c_0 + u_{111} a_1 c_1 &{} b_1 \end{pmatrix}\\&\begin{pmatrix}u_{000} a_0 b_0 + u_{010} a_0 b_1 + u_{100} a_1 b_0 + u_{110} a_1 b_1 &{} c_0 \\ u_{001} a_0 b_0 + u_{011} a_0 b_1 + u_{101} a_1 b_0 + u_{111} a_1 b_1 &{} c_1 \end{pmatrix} \end{aligned}$$

We now take the three determinants, by using \(a_i b_j c_k = x_{ijk}\), this gives the bilinear equations

$$\begin{aligned} u_{000} x_{100} + u_{001} x_{101} + u_{010} x_{110} + u_{011} x_{111}&= u_{100} x_{000} + u_{101} x_{001} \nonumber \\&+ u_{110} x_{010} + u_{111} x_{011}, \nonumber \\ u_{000} x_{010} + u_{001} x_{011}+u_{100} x_{110} + u_{101} x_{111}&= u_{010} x_{000}+u_{011} x_{001}\nonumber \\&+u_{110} x_{100}+u_{111} x_{101}, \nonumber \\ u_{000} x_{001} + u_{010} x_{011} + u_{100} x_{101} + u_{110} x_{111}&= u_{001} x_{000} + u_{011} x_{010}\nonumber \\&+ u_{101} x_{100}+u_{111} x_{110}. \end{aligned}$$
(8.4)

The ED correspondence \(\,\mathcal {P}\mathcal {E}_X \subset \mathbb {P}^7 \times \mathbb {P}^7\,\) of \(\,X = \mathbb {P}^1 {\times } \mathbb {P}^1 {\times } \mathbb {P}^1\,\) is defined by (8.3) and (8.4).

By plugging the binomials (8.3) into (2.7), we verify \(\mathrm{EDdegree}(X) = 6\), the number from Theorem 8.1. By contrast, if we scale the \(x_{ijk}\) so that \(X\) meets the isotropic quadric \(Q\) transversally, then \(\mathrm{EDdegree}(X) = 15 \cdot 6 - 7 \cdot 12 + 3 \cdot 12 - 1 \cdot 8 = 34 \), by Corollary 5.11. \(\diamondsuit \)

Our duality results in Sect. 5 have nice consequences for rank one tensor approximation. It is known [19, Chapter XIV] that the dual variety \(Y = X^*\) is a hypersurface if and only if

$$\begin{aligned} 2 \cdot \max (m_1,m_2,\ldots ,m_p) \le m_1 + m_2 + \cdots + m_p - p + 2 . \end{aligned}$$
(8.5)

In that case, the polynomial defining \(Y\) is the hyperdeterminant of format \(m_1 \times m_2 \times \cdots \times m_p\). For instance, in Example 8.2, where \(P\) is the \(3\)-cube, we get the \(2 \times 2 \times 2\) -hyperdeterminant

$$\begin{aligned} \begin{matrix} Y &{}=&{} V \left( x_{000}^2 x_{111}^2-2 x_{000} x_{001} x_{110} x_{111}-2x_{000} x_{010} x_{101} x_{111} -2 x_{000} x_{011} x_{100} x_{111} \right. \\ &{}&{} + 4 x_{000} x_{011} x_{101} x_{110} + x_{001}^2 x_{110}^2+4 x_{001} x_{010} x_{100} x_{111} - 2 x_{001} x_{010} x_{101} x_{110} \\ &{}&{}\left. - 2 x_{001} x_{011} x_{100} x_{110} + x_{010}^2 x_{101}^2- 2 x_{010} x_{011} x_{100} x_{101} + x_{011}^2 x_{100}^2 \, \right) . \end{matrix} \end{aligned}$$

The following result was proved for \(2 \times 2 \times 2\)-tensors by Stegeman and Comon [40]. However, it holds for arbitrary \(m_1,\ldots ,m_p\). The proof is an immediate consequence of Theorem 5.2.

Corollary 8.3

Let \(u\) be a tensor and \(u^*\) its best rank one approximation. Then, \(u-u^*\) is in the dual variety \(Y\). In particular, if (8.5) holds then the hyperdeterminant of \(u-u^*\) is zero.

This result explains the fact, well known in the numerical multilinear algebra community, that tensor decomposition and best rank one approximation are unrelated for \(p\ge 3\). The same argument gives the following generalization to arbitrary toric varieties \(X_A\). Following [19], here \(A\) is a point configuration, whose convex hull is the polytope \(P\) in Corollary 5.11. Fix a projective toric variety \(X_A \subset \mathbb {P}^n\) whose dual variety \((X_A)^*\) is a hypersurface. The defining polynomial of that hypersurface is the A-discriminant \(\Delta _A\). See [19] for details.

Corollary 8.4

Given a general point \(u \in \mathbb {R}^{n+1}\), let \(x\) be a point in the cone over \(X_A\) which is critical for the squared distance function \(d_u\). The \(A\)-discriminant \(\Delta _A\) vanishes at \(u-x\).

The construction of singular vectors and the ED degree formula in Theorem 8.1 generalizes to partially symmetric tensors. Corollary 8.4 continues to apply in this setting. We denote by \(S^a\mathbb {R}^m\) the \(a\)-th symmetric power of \(\mathbb {R}^m\). Fix positive integers \(\omega _1, \ldots , \omega _p\). We consider the embedding of the Segre variety \(X=\mathbb {P}(\mathbb {R}^{m_1})\times \cdots \times \mathbb {P}(\mathbb {R}^{m_p})\) into the space of tensors \(\mathbb {P}(S^{\omega _1}\mathbb {R}^{m_1}\otimes \cdots \otimes S^{\omega _p}\mathbb {R}^{m_p})\), sending \((v_1,\ldots , v_p)\) to \(v_1^{\omega _1}\otimes \cdots \otimes v_p^{\omega _p}\). The image is called a Segre–Veronese variety. When \(p=1\) we get the classical Veronese variety whose points are symmetric decomposable tensors in \(\mathbb {P}(S^{\omega _1}\mathbb {R}^{m_1})\). A symmetric tensor \(x \in S^{\omega _1}\mathbb {R}^{m_1}\) corresponds to a homogeneous polynomial of degree \(\omega _1\) in \(m_1\) indeterminates. Such a polynomial sits in the Veronese variety \(X\) if it can be expressed as the power of a linear form.

At this point, it is extremely important to note the correct choice of coordinates on the space \(S^{\omega _1}\mathbb {R}^{m_1}\otimes \cdots \otimes S^{\omega _p}\mathbb {R}^{m_p}\). We want the group \(SO(m_1) \times \cdots \times SO(m_p)\) to act by rotations on that space, and our Euclidean distance must be compatible with that action. In order for this to happen, we must include square roots of appropriate multinomial coefficients in the parametrization of the Segre–Veronese variety. We saw this Example 2.7 for the twisted cubic curve (\(p=1, m_1 = 2, \omega _1 = 3\)) and in Example 3.2 for symmetric matrices (\(p=1, \omega _2=3\)). In both examples, the Euclidean distances come from the ambient space of all tensors.

Example 8.5

Let \(p = 2, m_1 = 2, m_2 = 3, \omega _1 = 3, \omega _2 = 2\). The corresponding space \(S^3 \mathbb {R}^2 \otimes S^2 \mathbb {R}^3\) of partially symmetric tensors has dimension \(24\). We regard this as a subspace in the \(72\)-dimensional space of \(2 {\times } 2 {\times } 2 {\times } 3 {\times } 3\)-tensors. With this, the coordinates on \(S^3 \mathbb {R}^2 \otimes S^2 \mathbb {R}^3\) are \(x_{ijklm}\) where \(1 \le i \le j \le k \le 2\) and \(1 \le l \le m \le 3\), and the squared distance function is

$$\begin{aligned} \begin{matrix} d_u(x) &{} = &{} \!\!\!\!\!\!\!\!\!\!\!(u_{11111} - x_{11111})^2 + 2 (u_{11112} - x_{11112})^2 + \cdots + (u_{11133} - x_{11133})^2 \\ &{} &{} + \, 3 (u_{12111} - x_{12111})^2 + 6 (u_{12112} - x_{12112})^2 + \cdots + (u_{22233} - x_{22233})^2. \end{matrix} \end{aligned}$$

In the corresponding projective space \(\mathbb {P}^{23} = \mathbb {P}(S^3 \mathbb {R}^2 \otimes S^2 \mathbb {R}^3)\), the threefold \(X = \mathbb {P}^1 \times \mathbb {P}^2\) is embedded by the line bundle \(\mathcal {O}(3,2)\). It is cut out by scaled binomial equations such as \(3 x_{11111} x_{22111} - x_{12111} x_{12111}\). The ED degree of this Segre–Veronese variety \(X\) equals \(27\). \(\diamondsuit \)

Theorem 8.6

(Friedland and Ottaviani [15, Theorem 5]) Let \(X \subset \mathbb {P}(S^{\omega _1}\mathbb {C}^{m_1}\otimes \cdots \otimes S^{\omega _p}\mathbb {C}^{m_p})\) be the Segre–Veronese variety of partially symmetric tensors of rank one. In the invariant coordinates described above, the ED degree of \(X\) is the coefficient of the monomial \(z_1^{m_1-1}\cdots z_p^{m_p-1}\) in the polynomial

$$\begin{aligned} \prod \limits _{i=1}^p \frac{(\widehat{z}_i)^{m_i}-z_i^{m_i}}{\, \widehat{z}_i\,-\,z_i} \qquad \hbox { where } \,\widehat{z}_i \, = \, \left( \sum _{j=1}^p \omega _j z_j\right) - z_i. \end{aligned}$$

The critical points of \(d_u\) on \(X\) are characterized by the singular vector equations (8.2), obtained by restricting from ordinary tensors to partially symmetric tensors. Of special interest is the case \(p = 1\), with \(m_1 = m \) and \(\omega _1 =\omega \). Here, \(X\) is the Veronese variety of symmetric \(m \times m \times \cdots \times m\) tensors with \(\omega \) factors that have rank one.

Corollary 8.7

The Veronese variety \(X \subset \mathbb {P}(S^\omega \mathbb {C}^m)\), with \(SO(m)\) invariant coordinates, has

$$\begin{aligned} \mathrm{EDdegree}(X) = \frac{(\omega -1)^m - 1}{\omega -2}. \end{aligned}$$

This is the formula in [5] for the number of eigenvalues of a tensor. Indeed, for symmetric tensors, the eigenvector equations of [5] translate into (8.2). This is well known in the matrix case \((\omega =2)\): computing eigenvalues and computing singular values is essentially equivalent. At present, we do not know how to extend our results to tensors of rank \(r \ge 2\).

We now shift gears and examine the average ED degrees of rank one tensors. As above, we write \(X\) for the cone over the Segre variety, given by its distinguished embedding (8.1) into \(\mathbb {R}^{m_1 m_2 \cdots m_p}\). We fix the standard Gaussian distribution \(\omega \) centered at the origin in \(\mathbb {R}^{m_1 m_2 \cdots m_p}\).

In [10], the average ED degree of \(X\) is expressed in terms of the average absolute value of the determinant on a Gaussian-type matrix ensemble constructed as follows. Set \(m:=\sum _i (m_i-1)\) and let \(A=(a_{k\ell })\) be the symmetric \(m \times m\)-matrix with \(p \times p\)-block division into blocks of sizes \(m_1-1,\ldots ,m_p-1\) whose upper triangular entries \(a_{k\ell },\,1 \le k \le \ell \le m\), are

$$\begin{aligned} a_{k\ell }={\left\{ \begin{array}{ll} U_{k\ell } &{}\hbox {if }k,\ell \hbox { are from distinct blocks,}\\ U_0 &{}\hbox {if }k=\ell ,\hbox { and}\\ 0 &{}\hbox {otherwise}. \end{array}\right. } \end{aligned}$$

Here, \(U_0\) and the \(U_{k\ell }\) with \(k<\ell \) in distinct blocks are independent normally distributed scalar random variables. For instance, if \(p=3\) and \((m_1,m_2,m_3)=(2,2,3)\), then

$$\begin{aligned} A=\begin{bmatrix} U_0&U_{12}&\quad \! U_{13}&\quad \! U_{14} \\ U_{12}&\quad \! U_0&\quad \! U_{23}&\quad \! U_{24} \\ U_{13}&\quad \! U_{23}&\quad \! U_0&\quad \! 0 \\ U_{14}&\quad \! U_{24}&\quad \! 0&\quad \! U_0 \end{bmatrix} \end{aligned}$$

with \(U_0, U_{12}, U_{13}, U_{14}, U_{23},U_{24} \sim N(0,1)\) independent.

Theorem 8.8

(Draisma and Horobeţ [10]) The average ED degree of the Segre variety \(X\) relative to the standard Gaussian distribution on \(\mathbb {R}^{m_1 m_2 \cdots m_p}\) equals

$$\begin{aligned} \mathrm {aEDdegree}(X) = \frac{\pi ^{p/2}}{2^{m/2} \cdot \prod _{i=1}^p \varGamma \left( \frac{m_i}{2}\right) } \cdot \mathbb {E}(|\det (A)|), \end{aligned}$$

where \(\mathbb {E}(|\det (A)|)\) is the expected absolute determinant of the random matrix \(A\).

The proof of this theorem, which can be seen as a first step in random tensor theory, is a computation similar to that in Example 4.8, though technically more difficult. Note the dramatic decrease in dimension: instead of sampling tensors \(u\) from an \(m_1 \cdots m_p\)-dimensional space and computing the critical points of \(d_u\), the theorem allows us to compute the average ED degree by sampling \(m \times m\)-matrices and computing their determinants. Unlike in Example 4.8, we do not expect that there exists a closed form expression for \(\mathbb {E}(|\det (A)|)\), but existing asymptotic results on the expected absolute determinant, e.g. from [43], should still help in comparing \(\mathrm {aEDdegree}(X)\) with \(\mathrm {EDdegree}(X)\) for large \(p\). The following table from [10] gives some values for the average ED degree and compares them with Theorem 8.1:

Tensor format

Average ED degree

ED degree

\(n\times m\)

\(\min (n,m)\)

\(\min (n,m)\)

\(2^3=2\times 2\times 2\)

4.287

6

\(2^4\)

11.06

24

\(2^5\)

31.56

120

\(2^6\)

98.82

720

\(2^7\)

333.9

5,040

\(2^8\)

\(1.206\times 10^3\)

40,320

\(2^9\)

\(4.611\times 10^3\)

362,880

\(2^{10}\)

\(1.843\times 10^4\)

3,628,800

\(2\times 2 \times 3\)

5.604

8

\(2\times 2 \times 4\)

5.556

8

\(2\times 2 \times 5\)

5.536

8

\(2\times 3 \times 3\)

8.817

15

\(2\times 3 \times 4\)

10.39

18

\(2\times 3 \times 5\)

10.28

18

\(3\times 3\times 3\)

16.03

37

\(3\times 3\times 4\)

21.28

55

\(3\times 3\times 5\)

23.13

61

It is known from [15] that \(\mathrm {EDdegree}(X)\) stabilizes outside the range (8.5), i.e., if the \(m_i\) are ordered increasingly, for \(m_p-1 \ge \sum _{i=1}^{p-1} (m_i-1)\). This can be derived from Theorem 8.1. For \(\mathrm {aEDdegree}(X)\) we observe a similar behavior experimentally, except that the average seems to slightly decrease with \(m_p-1\) beyond this bound. At present, we have neither a geometric explanation for this phenomenon nor a proof using the formula in Theorem 8.8.

9 Epilogue

We conclude our investigation of the Euclidean distance degree by loosely paraphrasing Hilbert and Cohn-Vossen in their famous book Anschauliche Geometrie [24, Chapter I, §1]:

The simplest curves are the planar curves. Among them, the simplest one is the line (ED degree 1). The next simplest curve is the circle (ED degree 2). After that come the parabola (ED degree 3), and, finally, general conics (ED degree 4).