6.1 Introduction

The paper aims to introduce some basic geometric methods for the study of the decompositions of tensors. It is mainly devoted to symmetric decompositions of symmetric tensors, which can be identified with homogeneous polynomials, i.e. forms.

Decomposing a form F as a sum of powers (Waring decomposition) is a crucial step to understand the complexity of F. The complexity, or (Waring) rank, of F is indeed given by the minimal number of summands which are necessary to express F as a sum of powers.

In many effective cases, it turns out that one has one decomposition of F as a sum of powers, and the problem is to determine if the given decomposition has minimal length or it is unique (up to trivialities). Just to give a couple of examples:

  • in the Strassen problem, one has a form which is a sum F = F 1 + F 2 where F 1, F 2 are forms defined over two different, disjoint sets of variables. Then one can assume to have a minimal decomposition of both F 1 and F 2. The problem is to determine if the sum of the two decompositions gives a decomposition of F of minimal length. See [10] and [27], for recent accounts on the theory.

  • in the application of tensor analysis to signal processing, there are computational methods which can determine (an approximation of) one decomposition of a tensor F. Since one aims to reconstruct the original components of a mixed signal, the uniqueness of the decomposition is crucial to guarantee that the computed decomposition is (in a small neighborhood of) the correct one (see e.g. [26]).

For the identifiability problem, i.e. in order to determine that a decomposition is unique (up to trivialities), the most popular criterion is the Kruskal’s criterion (see Theorem 6.9 below), which requires the calculation of the Kruskal rank of a set of points (see Definition 6.1). Kruskal’s criterion only works for small values of the rank. Recently, for symmetric tensors, there is a series of results which show how the Kruskal’s criterion can be modified, to widen slightly the range of application (see [2, 4, 5, 14]). These extensions of Kruskal’s criterion are mainly based on methods of algebraic geometry for the study of finite sets in projective spaces.

Since we believe that geometric tools for the study of finite projective sets can contribute to many other aspects of the theory of symmetric (and maybe also non-symmetric) tensors, and we feel that several tools are not widely known in the community of researchers in tensor analysis, we provide here an account of methods which constitute the background for the theory developed in [14] and [2].

As a by-product, we show how similar argument yield a slight extension of the results of [2], for forms of degree 4, even to the case in which the Kruskal rank of a given decomposition is not maximal (see Theorem 6.12).

We hope, in this way, to contribute to the propagation of geometric tools which can help a lot our insight into the analysis of decompositions of specific tensors.

The structure of the paper is the following. The first section contains some basic definitions, basic results and remarks which are useful in the theory. The second section contains a list of results on tensors which are proved by means of the Hilbert function. The third section is devoted to prove a new result, which extends a recent criterion, proved by Angelini, Vannieuwenhoven and the author [2], for the (symmetric) identifiability of a symmetric tensor in a range where the Kruskal’s criterion does not apply. The result requires a deep analysis of the Hilbert function of a finite set in a projective space. In the last section there is a short list of possible developments of the theory and open problems.

6.2 Tensors and Projective Geometry

Since the study of tensors under a geometric point of view is strictly related with systems of homogeneous polynomials and their solutions, it is natural, from a mathematical point of view, to treat tensors defined over an algebraically closed field, as the complex field \(\mathbb {C}\).

At the risks of losing a strict connection with experience, yet the choice of working over \(\mathbb {C}\) will not sound so odd to specialists of quantum information theory, where the algebraic properties of complex numbers play a primary role in many quantum manipulations.

Less familiar is the choice of working on projective spaces of tensors. The idea behind using the projective setting is that the phenomena encoded in a tensor T are as well encoded in its multiples aT, for \(a\in \mathbb {C}\) a non-zero constant. In projective spaces, a point P is an equivalence class containing a vector and its multiples. At the cost of dropping the one-to-one correspondence between points and coordinates (which are defined up to scaling), projective geometry provides a compact algebraic ambient where some operations, like linear dependence, have a natural interpretation.

Thus, we drop the probabilistic approach, in which the sum of some entries of the tensors are forced to be 1, since they represent the probabilities of some event, and we will freely multiply tensors by (complex) scalars. It is an ubiquitous fact that all the results that we obtain can be translated in the probabilistic language, without any loss of validity. The main, non-trivial aspect of the projective point of view is the notion of product of projective spaces, which does not produce a linear variety.

So, we consider a complex vector space V  of dimension n + 1, which we will often identify with \(\mathbb {C}^{n+1}\), thanks to the choice of a basis. We will think of V  as the space of linear forms a 0 x 0 + a 1 x 1 + ⋯ + a n x n, where x 1, …, x n can be identified with the elements of the chosen basis or with variables. Consequently, the space \(Sym^d(V)=Sym^d(\mathbb {C}^{n+1})\) will be identified with the space of homogeneous polynomials (forms) of degree d in the n + 1 variables x 0, …, x n.

Instead of considering directly symmetric tensors as vectors of Sym d(V ), we consider the projective space \(\mathbb {P}(Sym^d(V))\) and consider points T in this space. Thus T corresponds to a symmetric tensor or a form, modulo scaling. Any representative for the equivalence of class of T is a set of coordinates for T. As Sym d(V ) has dimension \(\binom {n+d}d\), the space \(\mathbb {P}(Sym^d(V))\) has projective dimension

$$\displaystyle \begin{aligned} N(d,n):=\binom{n+d}d-1. \end{aligned}$$

The next step is the definition of a (non-linear) map from \(\mathbb {P}(V)=\mathbb {P}^n\) to the space \(\mathbb {P}(Sym^d(V))=\mathbb {P}^{N(d,n)}\): the Veronese map.

To do that, choose an order for the monomials of degree d in n + 1 variables M 0, …, M N, N = N(d, n). One of the most popular order is the lexicographic one, and we will opt for it for the rest of the paper.

Then, use the coordinates to define a map ν d,n as follows. Let a point \(P\in \mathbb {P}(V)\) have coordinates a 0 x 0 + ⋯ + a n x n. We will write:

$$\displaystyle \begin{aligned}P=[a_0x_0+\dots+a_nx_n].\end{aligned}$$

We define ν d,n by sending P to the equivalence class

$$\displaystyle \begin{aligned}\nu_{d,n}(P)=[(a_0x_0+\dots+a_nx_n)^d].\end{aligned}$$

The class ν d,n(P) does not depend on the choice of a representative for the class P, so we get a well defined projective map. We will refer to this map as the Veronese map of degree d in n + 1 variables. We will often write ν d for the Veronese map, when there is no confusion on the number of variables.

We notice that the Veronese maps are embeddings.

Proposition 6.1

Every Veronese map ν d, n is injective.

Proof

Assume that two points \(P,Q\in \mathbb {P}^n\) have the same image in ν d,n. Choose coordinates in \(\mathbb {P}(V)\) and let a 0 x 0 + ⋯ + a n x n be coordinates for P and b 0 x 0 + ⋯ + b n x n be coordinates for Q. Then (b 0 x 0 + ⋯ + b n x n)d is equal to α(a 0 x 0 + ⋯ + a n x n)d, for some \(\alpha \in \mathbb {C}\setminus \{0\}\). Since \(\mathbb {C}\) is algebraically closed, then, after scaling a 0 x 0 + ⋯ + a n x n by a d-root of α, we may assume (b 0 x 0 + ⋯ + b n x n)d = (a 0 x 0 + ⋯ + a n x n)d. Thus b i = 𝜖 i a i, for some choice of the d-roots of unit 𝜖 i, i = 0, …, n. We want to prove that the 𝜖 i’s are all equal, so that P = Q. Indeed, since \(\epsilon _0^{(d-j)}\epsilon _i^j=1\) for all i, j, multiplying by \(\epsilon _0^j\) it follows \(\epsilon _0^j=\epsilon _i^j\) for any j, hence 𝜖 0 = 𝜖 i for all i.

Notice that the previous construction is not the unique way to define a Veronese map. Often v d,n(P) is defined by computing b i = M i(a 0, …, a n) for i = 0, …, N and sending P to the equivalence class [b 0 M 0 + ⋯ + b N M N]. We made our choice in order to make it obvious that the image of the Veronese map is the set of forms which are a power of a linear forms. Since the two choices differ only by the multiplication by a non-singular diagonal matrix, the geometric properties will not be affected after taking any of the choices.

Next, we need to fix some notation for finite subsets of a projective space.

Let \(A\subset \mathbb {P}^n\) be a non-empty finite set. We denote by (A) the cardinality of A. We will say that A is linearly independent when choosing a set of coordinates for each point of A we get a set of linearly independent vectors. This definition does not depend on the choice of the coordinates for each point.

We will denote with 〈A〉 the linear span of A.

Remark 6.1

The projective dimension of 〈A〉 is at most (A) − 1. The dimension of 〈A〉 is equal to (A) − 1 precisely when A is linearly independent.

Notice that, by elementary linear algebra, for any finite set \(A\subset \mathbb {P}^n=\mathbb {P}(V)\) the dimension of the linear span 〈A〉 is equal to n minus the dimension of the space of linear forms that vanish at the points of A.

Definition 6.1

Let \( A \subset \mathbb {P}^n\) be a finite set. The Kruskal rank is the maximum integer k A such that any subset B ⊂ A of cardinality (B) ≤ k A is linearly independent.

Notice that k A is at most equal to (A), and k A = (A) if and only if A is linearly independent. Unless A is a singleton, then k A is always bigger than 1. Moreover k A = 2 exactly when A is aligned.

Obviously the Kruskal rank of a set of points \(A\subset \mathbb {P}^n\) cannot exceed neither n + 1, nor the cardinality of A. We have indeed:

$$\displaystyle \begin{aligned}k_A\leq \dim\langle A \rangle + 1 \leq \ell(A).\end{aligned}$$

Next definition concerns the case where the Kruskal rank is maximal.

Definition 6.2

A finite set \( A \subset \mathbb {P}^n \) is in linear general position (LGP) if the Kruskal rank of A is maximal, i.e. the Kruskal rank is equal to \(\min \{\ell (A), n+1\}\). This is equivalent to say that for any a ≤ n + 1, any subset of A of cardinality a is linearly independent.

Next, we come to the definition of decomposition of a (symmetric) tensor.

Definition 6.3

Let \(A \subset \mathbb {P}^n=\mathbb {P}(V)\) be a finite set. We say that A is a decomposition of the tensor \(T\in \mathbb {P}(Sym^d(V))\), or equivalently that A computes T, if T belongs to the span 〈ν d(A)〉.

Definition 6.4

Let \( A \subset \mathbb {P}^n \) be a decomposition of T. A is minimal if we cannot find a proper subset A′ of A such that T ∈〈ν d(A′)〉.

Remark 6.2

If \( A \subset \mathbb {P}^n \) is a decomposition of T and satisfies the minimality property, then in particular the points of ν d(A) are linearly independent, i.e.,

$$\displaystyle \begin{aligned}\dim(\langle\nu_{d}(A)\rangle) = \ell(A) -1.\end{aligned}$$

6.2.1 The Hilbert Function of Finite Sets in Projective Spaces

We collect in this section a series of definitions and propositions which are well known to people working in algebraic geometry, but maybe not so familiar to other people working in tensor analysis. The main definition is the Hilbert function of a finite set in a projective space, which is a basic tool for our results on the decompositions of symmetric tensors.

Definition 6.5

Let \(Y\subset \mathbb {C}^{n+1} \) be an ordered, finite set of cardinality of vectors. Fix an integer \( d \in \mathbb {N} \).

The evaluation map of degree d on Y  is the linear map

$$\displaystyle \begin{aligned}ev_{Y}(d): Sym^d(\mathbb{C}^{n+1}) \to \mathbb{C}^\ell\end{aligned}$$

which sends \( F \in Sym^d(\mathbb {C}^{n+1}) \) to the evaluation of F at the vectors of Y .

We will use the evaluation map to define the Hilbert function of a finite set \( Z \subset \mathbb {P}^n \).

Remark 6.3

Let \(A \subset \mathbb {P}^n \) be a finite set, with a definite order. Choose a set of homogeneous coordinates for the points of A. We get an ordered set of vectors \(Y\subset \mathbb {C}^{n+1} \), for which the evaluation map ev Y(d) is defined for every d.

If we change the choice of the homogeneous coordinates for the points of the fixed set A, we get another ordered set \( Y' \subset \mathbb {C}^{n+1} \) and the evaluation map \(ev_{Y'}(j)\) differs from ev Y(j) for the multiplication by a non-singular diagonal matrix. Thus the rank of ev Y(j) and \(ev_{Y'}(j)\) are the same for all j.

It is also clear that the rank of ev Y(j) does not depend on how we ordered the points of A.

Let \(f:\mathbb {C}^{n+1}\to \mathbb {C}^{n+1}\) be an automorphism and consider the associated change of coordinates \(\mathbb {P}^n\to \mathbb {P}^n\), that we call again f, by abuse. Then the evaluation on Y  and f(Y ) differ by the multiplication by a non-singular matrix. Thus for any d the maps ev Y(d) and ev f(Y )(d) have the same rank.

Definition 6.6

Let \(Z\subset \mathbb {P}^n \) be a finite set. Choose an order and an ordered set of homogeneous coordinates Y  for the points of A. Define the Hilbert function of Z as the map

$$\displaystyle \begin{aligned}h_Z : \mathbb{Z} \to \mathbb{N} \qquad h_Z(d) = \operatorname{rank}(ev_{Y}(d)) .\end{aligned}$$

By the previous remark, the Hilbert function does not depend on the choice of the coordinates, as well as it does not vary after a change of coordinates in \(\mathbb {P}^n\).

People who are expert of algebraic geometry may wonder why we did not define the Hilbert function as the rank of the restriction maps \(H^0(\mathscr {O}(d))\to H^0(\mathscr {O}_Z(d))\), where \(\mathscr {O},\mathscr {O}_Z\) indicate respectively the structure sheaves of \(\mathbb {P}^n\) and A. This would simplify the notation, since the restriction is well defined, regardless of a choice of coordinates for the points of A. On the other hand, our definition is immediately accessible also to readers who are not expert about cohomology, structure sheaves and so on. We preferred to make our basic definition more familiar and easily computable for a wider audience. We based our definition on the choice of coordinates because only after a choice of coordinates for the points of A one has a natural identification of \( H^0(\mathscr {O}_Z(d))\) with \(\mathbb {C}^{\ell }\).

There is a different notation for the Hilbert function, which is widely used in algebraic geometry. Since it clarifies some aspects, we introduce it.

Remark 6.4

Recall that the homogeneous ideal I Z of the set Z in the polynomial ring \(\mathbb {C}[t_0,\dots ,t_n]\) is the ideal generated by all the homogeneous polynomials (forms) which vanish at all the points of Z. Thus, I Z is a graded ideal. Its degree d summand I Z(d) is exactly the kernel of the evaluation map ev Z(d).

Notice that, indeed, the kernel does not depend on the choice of homogeneous coordinates for the points of Z, because the vanishing of a form at a projective point P is independent from the choice of a specific set of homogeneous coordinates for P.

Thus, recalling that the vector space of forms of degree d we have

$$\displaystyle \begin{aligned} h_Z(d)= \dim(Sym^d(\mathbb{C}^{n+1}))- \dim(I_Z(d))=\binom{n+d}n-\dim(I_Z(d)). \end{aligned}$$

Consequently, we introduce the following notation:

Definition 6.7

Let Z be a finite subset of the projective space \(\mathbb {P}^n\) and let h Z be its Hilbert function. For any d ≥ 0, the value h Z(d) is also called the number of conditions that Z imposes to forms of degree d.

We say that Z imposes independent conditions to forms of degree d, or also that the points of Z are separated by forms of degree d, if h Z(j) = (Z). This happens exactly when, for (any choice of) a set Y  of homogeneous coordinates for the points of Z, the evaluation map ev Y(d) surjects.

Remark 6.5

Let us explain in more details the last definition. Set  = (Z), and fix an order for the points of Z.

Take a vector e j = (0, …, 0, 1, 0, …, 0) (1 is in the j-th position) of the natural basis of \(\mathbb {C}^\ell \), which corresponds to the j-th point P j of Z in the given order. We say that P j is separated in Z by forms of degree d if e j belongs to the image of the evaluation map ev Y(d). Indeed, in this case, e j is the evaluation of a form F of degree d. Thus there exists a form F which vanishes at all the points of Z, but P j. Notice that this is independent on the choice of the homogeneous coordinates Y .

If h Z(j) = (Z), i.e. if the evaluation map ev Y(d) surjects, then any point of Z is separated.

The link between the Hilbert function of finite sets and the decompositions of symmetric tensors is mainly based on the following formula, which gives a different, geometric interpretation of the values h Z(d).

Proposition 6.2

Let \(\nu _{d,n}: \mathbb {P}^n\to \mathbb {P}^N\) , N = N(d, n), be the d-th Veronese embedding of \(\mathbb {P}^n\) . For any finite set \(Z\subset \mathbb {P}^n\) , and for any d ≥ 0, the value h Z(d) determines the dimension of the span of ν d(Z). I.e.:

$$\displaystyle \begin{aligned}h_Z(d) = \dim (\langle \nu_{d,n}(Z) \rangle) +1.\end{aligned}$$

Proof

We know that the value h Z(d) is equal to the dimension of \(Sym^d(\mathbb {C}^{n+1})\) minus the dimension of the space I Z(d), where I Z is the homogeneous ideal of Z in \(\mathbb {C}[t_0,\dots ,t_n]\). If we identify the coordinates in \(\mathbb {P}^{N(d,n)}=\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) with the monic monomials M j’s of degree d in \(\mathbb {C}[t_0,\dots ,t_n]\) (say with the lexicographic order), then any element of I Z(d) corresponds to a linear form in \(\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\). The claim follows by Remark 6.1.

Definition 6.8

We define the first difference of the Hilbert function Dh Z of Z as:

$$\displaystyle \begin{aligned}Dh_Z(j) = h_Z(j)-h_Z(j-1),\quad j \in \mathbb{Z} .\end{aligned}$$

The set of non-zero values of Dh Z is called the h-vector of Z.

The following properties of h A and Dh A are elementary and well-known in algebraic geometry. We recall them because they will be useful throughout the paper.

Lemma 6.1

Set ℓ = ℓ(Z). Then we have:

  1. (i)

    h Z(d) ≤ ℓ for all d;

  2. (ii)

    Dh Z(d) = 0 for d < 0;

  3. (iii)

    h Z(0) = Dh Z(0) = 1;

  4. (iv)

    Dh Z(d) ≥ 0 for all d;

  5. (v)

    h Z(d) = ℓ(Z) for all d  ℓ(Z) − 1;

  6. (vi)

    h Z(i) =∑0≤di Dh Z(d);

  7. (vii)

    Dh Z(d) = 0 for d ≫ 0 andd Dh Z(d) = ℓ(Z);

  8. (viii)

    if h Z(d) = ℓ(Z), then Dh Z(d + 1) = 0.

Proof

(i) is a consequence of the definition. (ii) follows immediately since the space \(Sym^d(\mathbb {C}^{n+1})\) is (0) for d negative. (iii) follows since \(Sym^0(\mathbb {C}^{n+1})=\mathbb {C}\) and the evaluation of a constant form c is equal to \(c(1,\dots ,1)\in \mathbb {C}^\ell \).

To see (iv), fix an ordered set of coordinates Y  for the points of Z and fix a linear form Λ which does not vanish at any vector of the finite set Y . Then for any form F of degree d, the evaluation of ΛF at Y  is equal to the evaluation of F at Y  multiplied by a fixed non-singular diagonal matrix, whose entries are the evaluations of Λ at the vectors of Y . Thus the image of ev Y(d + 1) contains a subspace isomorphic to the image of ev Y(d). It follows that h Z(d + 1) ≥ h Z(d), hence Dh Z(d) ≥ 0.

To see (v), choose for each point P j ∈ Z a linear form L j which vanishes at P j and does not vanish at any other point P k ∈ Z. Then for any j call F j the product of the linear forms L k, k ≠ j. F j is a form of degree  − 1, which vanishes at all the points of Z, except P j. Thus, the evaluation of F j at an ordered set of coordinates Y  for the points of Z is a vector (c 1, …, c ) with c k = 0 for k ≠ j and c j ≠ 0. It follows that ev Y( − 1) is surjective. Then, by (iv), ev Y(d) surjects for all d ≥  − 1.

(vi) is a triviality. (vii) and (viii) are obvious consequences of (v) and (vi).

Proposition 6.3

With the previous notation, if Z′ Z, then, for every \(d \in \mathbb {Z} \) , we have \(h_{Z'}(d) \leq h_Z(d) \) and \(Dh_{Z'}(d) \leq Dh_Z(d).\)

Proof

Fix, as usual, an ordered set of coordinates Y, Y  for the points of Z′, Z respectively. Then we have an obvious forgetful map \(f:C^\ell \to \mathbb {C}^{\ell '}\), where ℓ′ = (Z′), such that \(ev_{Y'}(d)=f\circ ev_Y(d)\) for all d. This implies that \(h_{Z'}(d) \leq h_Z(d) \).

The second inequality is less trivial, and we will need some algebra. Write R for the polynomial ring \(\mathbb {C}[t_0,\dots ,t_n]\) and call I Z the ideal generated by forms which vanish at the points of Z. The inclusion I Y ⊂ R determines, for every \(d\in \mathbb {Z}\) an exact sequence of vector spaces:

$$\displaystyle \begin{aligned}0\to I_Y(d)\to R(d)\to (R/I)(d)\to 0,\end{aligned}$$

where R(d), RI Z(d) are the graded pieces of the rings R, RI respectively, in degree d. It follows by Remark 6.3 that for any d:

$$\displaystyle \begin{aligned}h_Z(d)=\dim(R/I_Z(d)).\end{aligned}$$

The natural inclusion \(I_Z\subset I_{Z'}\) induces a surjection \(R/I_Z(d)\to R/I_{Z'}(d)\) for all d. Let Λ be a linear form in \(\mathbb {C}[t_0,\dots ,t_n]\), which does not vanish at any point of Z. The multiplication by Λ induces an inclusion RI Z(d) → RI Z(d + 1). Indeed if F ∈ R(d) is a form which does not vanish at some point P ∈ Z, then LF cannot vanish at P, i.e. the class of LF is non-zero in RI Z(d + 1). Call J Z the ideal generated by I Z and Λ. We have an exact sequence:

$$\displaystyle \begin{aligned}0\to R/I_Z(d)\to R/I_Z(d+1)\to R/J_Z(d+1)\to 0\end{aligned}$$

which proves that

$$\displaystyle \begin{aligned}Dh_Z(d)= \dim(R/J_Z(d+1)).\end{aligned}$$

Similarly Λ induces an embedding \(R/I_{Z'}(d)\to R/I_{Z'}(d+1)\) and \(Dh_{Z'}(d)= \dim (R/J_{Z'}(d)).\) Now look at the commutative diagram:

$$\displaystyle \begin{aligned}\begin{matrix} 0 & \to & R/I_Z(d) &\stackrel{L}\longrightarrow & R/I_Z(d+1) & \to & R/J_Z(d+1) &\to &0 \\ & & \downarrow & & \downarrow & & \downarrow & & \\ 0 & \to & R/I_{Z'}(d) &\stackrel{L}\longrightarrow & R/I_{Z'}(d+1) & \to & R/J_{Z'}(d+1) &\to &0 \end{matrix} \end{aligned}$$

Since the central vertical map \(R/I_Z(d+1)\to R/I_{Z'}(d+1)\) surjects, by the snake’s lemma also the map \(R/J_Z(d+1)\to R/J_{Z'}(d+1)\) surjects. Then \(Dh_Z(d)=\dim ( R/J_Z(d+1))\geq \dim (R/J_{Z'}(d+1))=Dh_{Z'}(d)\). This proves the second claim.

Perhaps, the most important algebraic result on Hilbert functions of finite sets is the maximal growth principle found by Macaulay. Roughly speaking, the maximal growth principle gives an upper bound for the value of h A(i + 1) in terms of h Z(i) and the dimension of the ambient space. We list below the most relevant consequences for the application to the study of tensors and forms.

Proposition 6.4

Assume that for some j > 0 we have Dh Z(j) ≤ j. Then:

$$\displaystyle \begin{aligned}Dh_Z(j) \geq Dh_Z(j+1).\end{aligned}$$

In particular, if for some j > 0, Dh Z(j) = 0, then Dh Z(i) = 0 for all i  j.

Proof

See Section 3 of [8].

Example 6.1

Let us see what happens for h Z(1). Since for i = 1 the domain of the evaluation map is \(Sym^1(\mathbb {C}^{n+1})=\mathbb {C}^{n+1}\), then clearly h Z(1) ≤ n + 1. So h Z(1) = 0 can hold only if (Z) ≤ n + 1. Moreover the kernel of the evaluation map ev Z(1) is isomorphic to the space of linear forms in \(\mathbb {P}^n\) which vanish at Z. Thus:

$$\displaystyle \begin{aligned}h_Z(1)= 1+\dim(\langle Z \rangle).\end{aligned}$$

In particular, h Z(1) = 0 if and only if Z is linearly independent.

Remark 6.6

Assume that for some j we have Dh Z(j) = 0, so that h Z(j − 1) = h Z(j). By Proposition 6.4, for any i ≥ j also Dh Z(i) = 0, i.e., h Z(j − 1) = h Z(i) for any i ≥ j. Therefore, by part (v) of Lemma 6.1, h Z(j − 1) is equal to the cardinality of Z, i.e., the evaluation map in degree j − 1 surjects and Z imposes independent conditions to hypersurfaces of degree j − 1.

Remark 6.7

Assume h Z(i) = (Z) − 1. Then h Z(i + 1) > h Z(i), by Remark 6.6. Thus, if h Z(i) = (Z) − 1, then necessarily h Z(i + 1) = (Z).

Hilbert functions of finite sets share many other properties. One can find an accurate account of the theory in the book of Iarrobino and Kanev [20] and in the book of Migliore [24].

We will focus on the Cayley-Bacharach property, which is defined as follows:

Definition 6.9

A finite set \(Z\subset \mathbb {P}^n\) satisfies the Cayley-Bacharach property in degree i, abbreviated as CB(i), if, for any P ∈ Z, every form of degree i vanishing at Z ∖{P} also vanishes at P.

Remark 6.8

One should compare CB with the property of separating points. In a sort of sense, the CB property is the contrary of the separation property.

  • Z is separated in degree i if for all P ∈ Z, there exists a form of degree i vanishing at Z ∖{P} and not vanishing at P.

  • Z does not satisfy CB if there exists P ∈ Z and there exists a form of degree i vanishing at Z ∖{P} and not vanishing at P.

In particular, if Z satisfies CB(i), then hypersurfaces of degree i cannot separate the points of Z, i.e. h Z(i) < (Z).

Example 6.2

The set Z consisting of four points in \( \mathbb {P}^2 \), three of them aligned, does not satisfy CB(1), and h Z(1) < 4.

Let Z be a set of 6 points in \(\mathbb {P}^{2} \).

If the 6 points are general, then Dh Z = (1, 2, 3), and Z satisfies CB(1). Since h Z(2) = 6, Z does not satisfy CB(2).

If Z lies on an irreducible conic, then Dh Z = (1, 2, 2, 1), and Z satisfies CB(2), and, hence, CB(1).

If Z has 5 points on a line plus one point off the line, then Dh Z = (1, 2, 1, 1, 1), and Z does not satisfy CB(1).

Remark 6.9

If Z satisfies CB(i), then it satisfies CB(i − 1) too. Otherwise, one could find P ∈ Z and a hypersurface \( F \subset \mathbb {P}^{n} \) of degree (i − 1) such that Z ∖{P}⊂ F and PF. Therefore, if \( H_{P} \subset \mathbb {P}^{n} \) is a hyperplane not containing P, then F ∪ H P ∈ H 0(J Z∖{P}(i)) ∖ H 0(J Z(i)), which contradicts the hypothesis.

Remark 6.10

Assume that Z satisfies CB(i). Call I Z the homogeneous ideal of Z. For any P ∈ Z call I Z∖{P} the homogeneous ideal of Z ∖{P}. Then for all j ≤ i we have I Z and I Z∖{P} are equal in degree j. It follows that:

$$\displaystyle \begin{aligned} h_Z(j)=h_{Z\setminus\{P\}}(j) \quad \mbox{ and }\quad Dh_Z(j)=Dh_{Z\setminus\{P\}}(j) \quad \forall j\leq i. \end{aligned} $$
(6.1)

The following proposition, which gives a strong bound on the Hilbert function of sets with a Cayley-Bacharach property, is a refinement of a result due to Geramita, Kreuzer, and Robbiano (see Corollary 3.7 part (b) and (c) of [18]).

Theorem 6.1

If a finite set \( Z \subset \mathbb {P}^{n} \) satisfies CB(i), then for any j such that 0 ≤ j  i + 1 we have

$$\displaystyle \begin{aligned}Dh_{Z}(0)+Dh_{Z}(1)+\cdots + Dh_{Z}(j) \leq Dh_{Z}(i+1-j)+\cdots +Dh_{Z}(i+1).\end{aligned}$$

Proof

See Theorem 4.9 of [2].

Finally, let us point out the relation between the Hilbert functions of a finite set Z and of its image in a Veronese map ν d(Z).

Remark 6.11

Let \(Z\subset \mathbb {P}^n\) be a finite set and let \(\nu _d(Z)\subset \mathbb {P}^N\) be its image in the d-th Veronese map. Then

$$\displaystyle \begin{aligned}h_Z(d)=h_{\nu_d(Z)}(1).\end{aligned}$$

Namely the inverse image in ν d of a linear form Λ in \(\mathbb {P}^N\) corresponds to a form of degree d in \(\mathbb {P}^n\), and the consequent map \(\mathbb {C}^{N+1}\to Sym^d(\mathbb {C}^{n+1})\) surjects. Moreover it is easy to see that, for any choice of coordinates Y  for the points of Z in \(\mathbb {P}^n\) and the consequent choice ν d(Y ) of coordinates for the points of ν d(Z), one has \(ev_{Y'}(L)= ev_Y(\nu _d^{-1}(L))\), so that the claim follows.

In particular, since ν d is a bijection, then h Z(d) = (Z) if and only if \(h_{\nu _d(Z)}(1)=\ell (\nu _d(Z))\), i.e. if and only if ν d(Z) is linearly independent (see Example 6.1).

The following result will be useful in the proof of Theorem 6.12

Proposition 6.5

Let Z be a finite set in \(\mathbb {P}^n\) . Call k the Kruskal rank of Z. If ℓ(Z) ≤ 2k − 1, then Z is separated by forms of degree 2. Hence v 2(Z) is linearly independent.

Proof

We know that k ≤ n + 1. For any point P ∈ Z, consider a partition of the residue Z ∖{P} in two disjoint sets Z 1, Z 2, each of cardinality at most k − 1. Since k − 1 ≤ n, then the span L i of Z i has dimension strictly smaller than n. Moreover, L i does not contain P, for otherwise Z has k linearly dependent points, which contradicts the assumption on the Kruskal rank of Z. Thus, there are hyperplanes H 1, H 2 containing Z 1 and Z 2 respectively and both missing P. The union Q = H 1 ∪ H 2 is a quadric which misses P and contains the remaining points of Z.

6.3 Results on Tensors from Classical Projective Geometry

The section is devoted to list a series of results on tensors whose proof is based on the study of the Hilbert function of finite sets. In many cases we omit the proof, or give only a short draft it.

Remark 6.12

Fix integers d, n > 1 and consider symmetric tensors in the space \(\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\). In [1] Alexander and Hirschowitz determined the unique value r d,n such that the set of tensors of rank r d,n is dense in \(\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\). It turns out that r d,n coincides with the expected value, except for a short list of exceptions.

We will call r d,n the generic rank.

Definition 6.10

We say that a tensor \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) of rank r is identifiable if T has only one minimal decomposition A with (A) = r, up to scaling and permutations of the summands.

Identifiability is a relevant property for tensors for many applications, as explained in Sect. 6.1.

If we fix a subgeneric value of the rank r < r d,n, then the set of tensors of rank ≤ r in \(\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) is irreducible and its general element has rank r, so we can talk about a general tensor of rank r. For general tensors of rank r < r d,n, thanks to the fundamental preparatory works [1, 11], and [3], the situation with respect to the identifiability property has been completely described in [13].

Theorem 6.2

Let d, r ≥ 2. The general tensor in \(\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) of subgeneric rank r < r d,n is identifiable, unless it is one of the following cases:

  1. 1.

    d = 2;

  2. 2.

    d = 6, n = 2, and r = 9;

  3. 3.

    d = 4, n = 3, and r = 8;

  4. 4.

    d = 3, n = 5, and r = 9.

In the first case there are infinitely many decompositions. In the three last exceptional cases, there are exactly two decompositions.

Proof

See Theorem 1.1 of [13].

Remark 6.13

On the contrary, when r = r d,n, there are very few cases in which a general tensor of rank r is identifiable. The classification has been proved by Galuppi and Mella, see [17].

When r > r d,n, the situation is less known. It is not even obvious what is the meaning of generic tensors, since the set of tensors of given rank can have many components.

In any case, one expects that a sufficiently general tensor is not identifiable, though for r > r n,d very few things are known.

For the case r = r n,d, the situation is completely described in [1, 23] and mainly in [17]: there are many decompositions, unless d, n are included in a short list of cases.

Let us turn to the problem of the identifiability of one specific given tensor \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\), of which we know a minimal decomposition \(A\subset \mathbb {P}^n=\mathbb {P}(\mathbb {C}^{n+1})\) with (A) = r.

Recall that minimal means that the set v d(A) is linearly independent. We do not assume that (A) is actually the rank of T, i.e. we do not know if T has some other decomposition with smaller cardinality.

Let us start recalling the following, classical result of Sylvester, which disposes of the case n = 1, the case of binary forms:

Theorem 6.3

Assume n = 1, i.e. consider the space of tensor \(\mathbb {P}(Sym^d(\mathbb {C}^{2}))\) . Then r 2,d = (d + 1)∕2 if d is odd, r 2,d = (d + 2)∕2 if d is even. Moreover every tensor of rank r < r 2,d is identifiable.

Proof

See [28].

Indeed, to be precise, when n = 1 and d is odd, also tensors of rank r 2,d are identifiable. See Theorem 6.4 below.

So, we restrict ourselves to the case n > 1.

The reason why an analysis of the Hilbert functions is relevant for the identifiability property is expressed in the following lemma, which can be found in [4]:

Lemma 6.2

Consider two different minimal decompositions A, B of a tensor \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) . In other words, we have:

$$\displaystyle \begin{aligned}T\in\langle \nu_d(A)\rangle \cap \langle \nu_d(B)\rangle.\end{aligned}$$

Then if Z = A  B, we get h Z(d) < ℓ(Z), so that Dh Z(d + 1) > 0.

Proof

Set Z = A ∪ B. First assume that A, B are disjoint. The existence of T implies that ν d(Z) is not linearly independent. By Example 6.1, this implies that linear forms in the space \(\mathbb {P}^N\) spanned by \(\nu _d(\mathbb {P}^n)\) do not separate the points of ν d(Z). By Remark 6.11, this implies that forms of degree d in \(\mathbb {P}^n\) do not separate the points of Z. The claims follow by part viii) of Lemma 6.1 and Proposition 6.4.

If A ∩ B ≠ ∅, define B′ = A ∖ B, so that Z is the disjoint union of A and B′. By elementary linear algebra, 〈ν d(A)〉∩〈ν d(B)〉 is also spanned by ν d(A ∩ B) and 〈ν d(A)〉∩〈ν d(B′)〉. By the minimality of A, T cannot belong to the span of ν d(A ∩ B). Thus 〈ν d(A)〉∩〈ν d(B′)〉 is non empty, and the claim follows again, as above, by part viii) of Lemma 6.1 and Proposition 6.4.

We can be more precise about the dimension of the intersection of the span of ν d(A) and ν d(B).

Lemma 6.3

Let \(A,B\subset \mathbb {P}^n\) be disjoint finite sets. Set Z = A  B. Then:

$$\displaystyle \begin{aligned}\dim(\langle \nu_d(A)\rangle\cap\langle\nu_d(B)\rangle) = \ell(Z)-h_Z(d)-1. \end{aligned}$$

If A  B  ∅, then:

$$\displaystyle \begin{aligned}\dim(\langle \nu_d(A)\rangle\cap\langle\nu_d(B)\rangle)\leq \dim(\nu_d(A\cap B))+\ell(Z)-h_Z(d).\end{aligned}$$

Proof

The first formula in an exercise for the application of the Grassmann intersection formula. The second formula follows since, setting B 0 = B ∖ A so that A, B 0 are disjoint and Z = A ∪ B 0, by elementary linear algebra 〈ν d(A)〉∩〈ν d(B)〉 is spanned by ν d(A ∩ B)) and 〈ν d(A)〉∩〈ν d(B 0)〉.

An extension of Sylvester’s theorem, which works for all symmetric tensors in \(\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\), is possible for n > 1 only for small values of the rank. The following statement is proved in Theorem 1.5.1 of [9]. We give here an alternative proof, in terms of the Hilbert function of decompositions.

Theorem 6.4

Assume that a tensor \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) has a decomposition A with ℓ(A) ≤ (d + 1)∕2. Then T has rank ℓ(A) and it is identifiable.

Proof

Assume on the contrary that T has a second decomposition B with (B) ≤ (A), and take the union Z = A ∪ B. Then (Z) ≤ 2(A) ≤ d + 1. By Lemma 6.2 we have Dh Z(d + 1) > 0. Thus by Proposition 6.4 and by point iii) of Lemma 6.1 we get Dh Z(j) > 0 for j = 0, …, d + 1. Hence ∑j Dh Z(j) ≥ d + 2, which contradicts point vii) of Lemma 6.1.

An easy extension of Theorem 6.4 is given by the following result.

Theorem 6.5

Assume that a tensor \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) has a decomposition A with ℓ(A) ≤ (d + n)∕2, such that \(\langle A\rangle =\mathbb {P}^n\) . Then T has rank ℓ(A) and it is identifiable.

Proof

Assume on the contrary that T has a second decomposition B with (B) ≤ (A), and take the union Z = A ∪ B. Then (Z) ≤ 2(A) ≤ d + n. By Lemma 6.2 we have Dh Z(d + 1) > 0. Thus by Proposition 6.4 and by point iii) of Lemma 6.1 we get Dh Z(j) > 0 for j = 0, …, d + 1. By Example 6.6 and by Proposition 6.3 we get h Z(1) = n + 1, so that Dh Z(1) = n. Hence ∑j Dh Z(j) ≥ d + n + 1, which contradicts point vii) of Lemma 6.1.

A tensor \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) is concise if there exist no linear subspaces \(W\subset \mathbb {C}^{n+1}\), of codimension 1, such that T belongs to \(\mathbb {P}(Sym^d(W))\).

The previous statement implies that when T is concise and it has a decomposition of cardinality ≤ (d + n)∕2, then T is identifiable.

To go further, we may assume some restrictions on the geometry of a decomposition A of T.

Lemma 6.4

Let \(Z\subset \mathbb {P}^n\) be a finite set and assume that for some j ≥ 1: Dh Z(j + 1) = Dh Z(j) = 1. Then Z contains an aligned subset Z′ of cardinality ℓ(Z′) = j + 2, and \(Dh_Z(i)=Dh_{Z'}(i)\) for all i  j.

Proof

See Lemma 2 of [7].

The following result gives a further extension of Theorem 6.4 (compare with Theorem 2 of [4]).

Proposition 6.6

Fix a form \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) and a minimal decomposition \(A\subset \mathbb {P}^n\) of T. Assume that ℓ(A) ≤ d and A does not contain an aligned subset of cardinality d∕2. Then T has rank ℓ(A) and it is identifiable.

Proof

Assume there exists another decomposition B of T with (B) ≤ d and call Z the union Z = A ∪ B. Then (Z) ≤ 2d, moreover, by Lemma 6.2, Dh Z(d + 1) > 0, which implies Dh Z(d) > 0. By Example 6.1 we get that h A(1) = 2, hence also h Z(1) = 2, by Now assume Dh Z(d) ≥ 2. Then Dh Z(j) ≥ 2 for j = 1, …, d, by Proposition 6.4, so that ∑j Dh Z(j) ≥ 2d + 2, which contradicts point vii) of Lemma 6.1. Then for some j ≥ 1, j ≤ d, we have Dh Z(j) < 2. By Proposition 6.4 again, this implies Dh Z(d) = Dh Z(d + 1) = 1. Hence by Lemma 6.4, Z contains an aligned subset Z′ with (Z′) ≥ d + 2, and \(Dh_Z(i)=Dh_{Z'}(i)\) for i > d. Since Z′ cannot contain A, then there exists a proper subset A′⊂ A and a subset B′⊂ B such that Z′ = A′∪ B′. Shrinking B′, if necessary, we may assume that B′∩ A = ∅, so that also A′∩ B′ = ∅. Then by (6.3):

$$\displaystyle \begin{aligned}\displaystyle \dim(\langle \nu_d(A')\rangle\cap\langle\nu_d(B')\rangle) = \ell(Z')-h_{Z'}(d)-1= \sum_{i>d}h_{Z'}(i) =\\\displaystyle = \sum_{i>d}h_Z(i) = \dim(\langle \nu_d(A)\rangle\cap\langle\nu_d(B_0)\rangle), \end{aligned} $$

where B 0 = B ∖ A. Thus:

$$\displaystyle \begin{aligned}\langle \nu_d(A')\rangle\cap\langle\nu_d(B')\rangle = \langle \nu_d(A)\rangle\cap\langle\nu_d(B_0)\rangle \end{aligned}$$

Since, as in the proof of Lemma 6.3, the intersection 〈ν d(A)〉∩〈ν d(B)〉 is spanned by ν d(A ∩ B) and 〈ν d(A)〉∩〈ν d(B 0)〉, it follows that T belongs to the span of ν d((A ∩ B) ∪ A′). The minimality of A implies A = (A ∩ B) ∪ A′, so the points of A which are not contained in B are aligned. By assumption (A′) ≤ d∕2 and (A′) + (B′) = (Z′) ≥ d + 2, it follows that (B′) ≥ 2 + d∕2. Thus (A ∩ B) ≤ (B) − 2 − d∕2 ≤ (A) − 2 − d∕2. Then

$$\displaystyle \begin{aligned}\ell(A)\leq \ell(A')+\ell (A\cap B)\leq d/2+\ell(A)-2-d/2=\ell(A)-2,\end{aligned}$$

a contradiction.

In order to go further in the study of the identifiability of symmetric tensors, one needs a refinement of Lemma 6.4. The refinement is provided by the following, strong result of Bigatti, Geramita and Migliore (for the case n = 2 the result has been proved by Davis).

Theorem 6.6

Let \(Z\subset \mathbb {P}^n\) be a finite set. Assume that for some s  j, Dh Z(j) = Dh Z(j + 1) = s. Then there exists a reduced curve C of degree s such that, setting Z′ = Z  C and Z″ = Z  Z′:

  1. 1.

    for i  j − 1, \(h_{Z'}(i)=h_Z(i)-\ell (Z'')\) ;

  2. 2.

    for i  j, h Z(i) = h C(i);

  3. 3.

    \(Dh_{Z'}(i)=\begin {cases} Dh_C(i) \mathit{\mbox{ for }} i\leq j+1; \\ Dh_Z(i) \mathit{\mbox{ for }} i\geq j.\end {cases}\)

In particular, \(Dh_{Z'}(i)=s\) for s  i  j + 1.

For n = 2, i.e. when \(Z\subset \mathbb {P}^2\) , we also have:

$$\displaystyle \begin{aligned}h_{Z''}(j-1)=\ell(Z'') \quad \mathit{\mbox{ and }}\quad Dh_{Z''}(i)=Dh_Z(i+s)-s \mathit{\mbox{ for }} i+s\leq j.\end{aligned}$$

Proof

See Theorem 3.6 of [8], and [15] for the case n = 2.

Thanks to Theorem 6.6, for the case n = 2 one can prove an extension of Proposition 6.6:

Theorem 6.7

Fix a form \(T\in \mathbb {P}(Sym^d(\mathbb {C}^3))\) and a minimal decomposition \(A\subset \mathbb {P}^n\) of T. Assume that for all j the Kruskal rank of v j(A) is maximal, i.e. it is equal to the minimum between ℓ(A) and \(\binom {j+2}2\) . If

$$\displaystyle \begin{aligned}\ell(A)< \frac{d^2+d}8,\end{aligned}$$

then T has rank ℓ(A) and it is identifiable.

Proof

See Theorem 1.4 of [5], in which the general uniform position (GUP) assumption is equivalent to the condition that the Kruskal rank of v j(A) is maximal for all j.

One aspect of the study of decomposition which has not been developed appropriately derives from the observation that Sylvester Theorem 6.3 can be sharpened as follows.

Theorem 6.8

Assume n = 1. Assume that \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{2}))\) has a minimal decomposition A with ℓ(A) < d + 1. Then for any other minimal decomposition B of T one has ℓ(A) + ℓ(B) ≥ d + 2.

Proof

Assume on the contrary that T has a second decomposition B with (B) + (A) ≤ d + 1, and take the union Z = A ∪ B. Then (Z) ≤ d + 1. Then we conclude as in the proof of Theorem 6.4.

Remark 6.14

One can prove a statement similar to Theorem 6.5 under the assumption that \(\langle A\rangle =\mathbb {P}^n\). Namely in this case for any other minimal decomposition B of T one has (A) + (B) ≥ d + n. Details are left to the reader.

6.4 Kruskal’s Criterion and Terracini’s Criterion

The most famous and most used criterion for detecting the identifiability of a given tensor was proved by Kruskal in 1977 (see [21]). Kruskal’s criterion was originally proved for 3way, non necessarily symmetric, tensors. The application to symmetric tensors of any size is described e.g. in [14]. We recall the result here, rephrased in terms of the geometric language.

Theorem 6.9 (Reshaped Kruskal’s Criterion)

Let \(T\in Sym^d(\mathbb {C}^{n+1})\) and let \(A\subset \mathbb {P}^n\) be a minimal decomposition of T. Fix a partition d = a + b + c, with 0 < a  b  c. Write k a, k b, k c for the Kruskal ranks of v a(A), v b(A), v c(A) respectively. If

$$\displaystyle \begin{aligned}\ell(A)\leq \frac {k_a+k_b+k_c-2}2\end{aligned}$$

then T has rank ℓ(A) and it is identifiable.

Of course the efficiency of the previous criterion depends on the choice of the partition. One should observe that computing the Kruskal ranks can be demanding, for large values of d, unless the coordinates matrices of v a(A), v b(A), v c(A) have full rank. For that reason, and also for widening the range in which Kruskal’s criterion applies, it is usually convenient to us a maximally unbalanced partition

Example 6.3

Consider the case d = 4. The unique partition is a = b = 1, c = 2.

If 2 ≤ (A) ≤ n + 1, in the most favorable case in which k a = k b = k c = (A), then the condition (A) ≤ (k a + k b + k c − 2)∕2 is automatically satisfied and Kruskal’s criterion applies.

If \(n+1<\ell (A)\leq \binom {n+2}2\), then the most favorable case is k a = k b = n + 1 and k c = (A). In this situation (A) ≤ (k a + k b + k c − 2)∕2 is equivalent to (A) ≤ 2n.

So, one cannot hope to apply directly Kruskal’s criterion, for d = 4, as soon as (A) > 2n.

A direct improvement of Kruskal’s criterion is impossible, unless one adds some extra test on the tensor T. Namely Kruskal’s criterion (even in its reshaped version) is known to be sharp, in its maximal range.

Theorem 6.10

For any n, d there exist a, b, c and a tensor \(T\in Sym^d(\mathbb {C}^{n+1})\) with a minimal decomposition A such that the Kruskal’s ranks k a, k b, k c are maximal (i.e. \(k_a=\min \{\ell (A),\binom {n+a}a\) , and a similar equality holds for b and c), with

$$\displaystyle \begin{aligned}\ell(A)=\frac{k_a+k_b+k_c}2\end{aligned}$$

and such that T is not identifiable.

Proof

The proof is essentially due to Derksen ([16]), who proved the result in the non symmetric case. Remark 1.1 of [2] contains the observation that, when T is symmetric, then Derksen’s construction provides several symmetric decompositions of T.

Thus, given a decomposition A of a fixed symmetric tensor T, one can test the identifiability (and the rank) of T by computing the Kruskal ranks k a of the images of A in suitable Veronese embeddings, hoping to obtain k a + k b + k c ≥ 2(A) + 2. If the inequality holds, Kruskal’s theorem guarantees the identifiability of T.

Typically, the reshaped Kruskal’s criterion works for small values of (A). To study the identifiability of tensors in a wider range, one needs to add some new test for T.

An example of a test that, together with Kruskal’s test, can provide an affirmative answer for the identifiability of T, is provided by an observation which comes out from the Terracini’s description of the tangent space to the set of tensors of fixed rank.

In the space \(\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\), call Σ r the set of tensors of rank r.

For small values of r, i.e. for \(r(n+1)\leq \binom {n+d}d\), Σ r is locally closed in the Zariski topology, i.e. it is an open subset of a projective subvariety (the r-th secant variety of the Veronese image \(v_d(\mathbb {P}^n)\)).

Consider the symmetric product \((\mathbb {P}^n)^{(r)}\). In the product

$$\displaystyle \begin{aligned}\mathbb{P}(Sym^d(\mathbb{C}^{n+1}))\times (\mathbb{P}^n)^{(r)}\end{aligned}$$

consider the subvariety r of pairs (T, [{P 1, …, P r}]) such that the set A = {P 1, …, P r} is mapped by v d to a finite set which spans a subspace of dimension r − 1 in \(\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) (i.e. v d(A) is linearly independent) and T belongs to the span of v d(A).

The set r, which is a quasi-projective variety, is called the abstract secant variety of \(v_d(\mathbb {P}^n)\). The projection to the first factor maps r surjectively to Σ r.

Definition 6.11

Define the r-th secant map s r as the map projection to the first factor

$$\displaystyle \begin{aligned}s_r: A\varSigma_r\to \mathbb{P}(Sym^d(\mathbb{C}^{n+1})).\end{aligned}$$

The image of the secant map is Σ r. The inverse image of a tensor T of rank r in the secant map is the set of decompositions of T.

Since \(v_d(\mathbb {P}^n)\) is a smooth variety, then \((\mathbb {P}^n)^{(r)}\) is smooth, outside the diagonals. Thus also r, which is a \(\mathbb {P}^{r-1}\) bundle over a subset of \((\mathbb {P}^n)^{(r)}\) which does not meet the diagonals, is smooth.

Definition 6.12

The tangent space to r at a point (T, [{P 1, …, P r}]) maps, in the differential of s r, to the space \(\mathscr {T}\) in \(\mathbb {P}^N=\mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) spanned by the tangent spaces to \(v_d(\mathbb {P}^n)\) at the points v d(P 1), …, v d(P n). We call this space the Terracini space of the decomposition A = {P 1, …, P r} of T.

The name of Terracini space comes from the celebrated Terracini’s Lemma, which says that, for a general choice of T ∈ Σ r and for r ≤ N, the Terracini space is the tangent space to Σ r at T. Thus, a computation of the dimension of the Terracini space at a general point corresponds to compute the dimension of the set Σ r of tensors of rank r ≤ N.

Remark 6.15

The dimension of the Terracini space \(\mathscr {T}\) is naturally bounded:

$$\displaystyle \begin{aligned}\dim(\mathscr{T})\leq (n+1)r-1,\end{aligned}$$

and the equality means that the tangent spaces to \(v_d(\mathbb {P}^n)\) at the points v d(P i)’s are linearly independent.

Since r is a \(\mathbb {P}^{r-1}\) bundle over a quasi-projective variety of dimension nr, then \((n+1)r-1=\dim (A\varSigma _r)\). It follows that the dimension of the Terracini space equals (n + 1)r − 1 when the differential of s r has maximal rank.

Remark 6.16

The decomposition A of \(T\in \mathbb {P}(Sym^d(\mathbb {C}^{n+1}))\) corresponds to the datum of r linear forms L 1, …, L r in the polynomial ring \(R=\mathbb {C}[x_0,\dots ,x_n]\).

The Terracini space can be naturally identified with the degree d homogeneous piece of the ideal in R spanned by

$$\displaystyle \begin{aligned}L_1^{d-1}m+ \dots +L_r^{d-1}m,\end{aligned}$$

where m is the ideal generated by the variables.

It follows that the computation of the dimension of the Terracini space at a decomposition of T is a straightforward application of simple algorithm of linear algebra.

We refer to the book [20] for the (elementary) proof of this statement.

The use of the Terracini space in the computation of the identifiability of a form T is meaningful in the following situation.

Proposition 6.7

Let A be a decomposition of T of length r and assume that there exists a non trivial family A t of decompositions of T, such that A 0 = A. Then the Terracini space of A has dimension strictly smaller than (n + 1)r − 1.

Proof

A t determines a positive dimensional subvariety W in the fiber of s r over T. Thus, there exists a tangent vector to r at (T, [A]), where [A] is the point of the symmetric product corresponding to A, which is killed by the differential of s r at (T, [A]). Then use Remark 6.15.

Now, we can introduce our strategy in finding criteria for the identifiability of symmetric tensors, which works in a range slightly wider than the Kruskal’s one.

If we can prove that tensors T which are non identifiable must have a positive dimensional family of different decompositions, containing the given decomposition A, then we can check the identifiability of T by computing the dimension of the Terracini space.

The fact that non identifiable tensors have indeed a positive dimensional family of different decompositions, is false in general. It turns out, however, that this fact holds in some cases, especially when we are outside the Kruskal’s numerical range, but very close to it.

A way to produce positive dimensional family of different decompositions is explained in the following:

Proposition 6.8

Assume that a decomposition A of length r of T is contained in a projective curve \(C\subset \mathbb {P}^n\) which is mapped by v d to a space \(\mathbb {P}^m\) , with m < 2r − 1. Then there exists positive dimensional family of different decompositions A t of T, such that A 0 = A.

Proof

T belongs to the span of v d(A), which is contained in the span of v d(C), which is contained in \(\mathbb {P}^m\). The condition m < 2r + 1 implies that there is a positive dimensional family of subsets A t ⊂ C such that T ∈〈v d(A t)〉. Namely, the abstract r secant variety \(A\varSigma ^C_r\) of C has dimension 2r − 1, thus all the components of the fibers of the map \(A\varSigma ^C_r\to \mathbb {P}^m\) are positive dimensional.

Now we can mix together the analysis of the Hilbert function, the Cayley-Bacharach condition and the computation of the dimension of the Terracini space, to produce a criterion for the identifiability of T.

Theorem 6.11 (See [2])

Let T be a quartic form in n + 1 variables, and consider a decomposition A of T of length 2n + 1.

Assume that:

  1. a)

    the Kruskal rank of A is n + 1;

  2. b)

    the Terracini space at A has (the maximal) dimension (2n + 1)(n + 1) − 1.

Then T has rank 2n + 1 and it is identifiable.

Notice that conditions a) and b) are expected to hold for a general quartic, i.e. outside a proper Zariski closed subset (of measure 0) in the space of quartics. Thus the previous theorem provides a criterion to prove the identifiability of T, except for very special tensors.

Proof

We give a sketch of the proof.

First notice that, by Proposition 6.5, the set v 2(A) is linearly independent, i.e. it has Kruskal rank 2n + 1.

Call B a different decomposition of length ≤ 2n + 1 for T, which we want to exclude. Call Z the union Z = A ∪ B and consider the Hilbert function of Z.

First step is to prove that Z has the Cayley-Bacharach property CB(4). This is almost clear when A ∩ B = ∅, while if A ∩ B ≠ ∅ the claim follows from Kruskal’s theorem.

Next, since Z has the property CB(4), by Theorem 6.1 it follows soon that Dh Z(3) + Dh Z(4) + Dh Z(5) ≥ h A(2) = 2n + 1, so that h Z(2) = h A(2) = 2n + 1. Then one invokes the following extension of the classical Castelnuovo’s Lemma:

Lemma 6.5 (See [2], Lemma 5.4)

Let Z be a set of r ≥ 2n + 3 points in \(\mathbb {P}^n\) which impose at most 2n + 1 conditions to quadrics. Assume that Z has a subset Z′ of 2n + 1 points in LGP. Then the entire Z is in LGP and it is contained in an irreducible rational normal curve.

The classical formulation of Castelnuovo’s lemma required that the whole set Z is in LGP, which we cannot assume in our setting, because we only know the position of A, which contains 2n + 1 points of Z, while we have no control of the points of B. Fortunately, the extension matches exactly our requirements. Now we can turn back to the proof of the Theorem.

Since Z has a subset, namely A, which is in LGP, then it follows that Z, hence also A, sits in a rational normal curve C of \(\mathbb {P}^n\). The image of C in the Veronese map v 4 spans a \(\mathbb {P}^{4n}\). Hence the claim follows by Proposition 6.8.

6.5 A New Result on the Decomposition of Tensors

In this section we improve slightly Theorem 6.11, by removing the assumption that the Kruskal rank of A is n + 1, and replacing it by a numerical assumption on (A). At a certain point of the proof we will need the cohomological properties of the residue of a finite set with respect to a hyperplane. This is the unique passage in which some sophisticated algebraic machinery enters into the proof.

Let Z be a finite set in \(\mathbb {P}^n\), and let H be a hyperplane. Call Z 1 the intersection Z 1 = Z ∩ H and call Z 2 the set:

$$\displaystyle \begin{aligned}Z_2= Z\setminus Z_1 = Z\setminus (Z\cap H).\end{aligned}$$

For obvious reasons, Z 2 is called the residue of Z with respect to H.

If \(I_Z,I_{Z_2}\) denote the homogeneous ideals of Z, Z 2 respectively, the multiplication by an equation of H determines an exact sequence of graded modules:

$$\displaystyle \begin{aligned} 0\to I_{Z_2}(1) \to I_Z(2) \stackrel{\rho}\longrightarrow I_{Z_1,H}(2) \end{aligned} $$
(6.2)

in which the rightmost ideal \(I_{Z_1,H}\) is the homogeneous ideal of Z 1 in H.

The following result is a straightforward application of the cohomology of maps of sheaves:

Lemma 6.6

Assume that Z 2 is linearly independent. Then the rightmost map ρ in sequence (6.2) is surjective.

Proof

The cokernel of ρ is contained in the cohomology group \(H^1(\mathscr {I}_{Z_2}(1))\), where \(\mathscr {I}_{Z_2}\) is the ideal sheaf of Z 2. Moreover \(H^1(\mathscr {I}_{Z_2}(1))\) vanishes if Z 2 is linearly independent, because in this case the evaluation map ev(1) on Z 2 determines a surjective map \(\mathbb {C}^{n+1}\to \mathbb {C}^{\ell (Z_2)}\).

Remark 6.17

With the same trick, one can prove the following general statement:

Assume that the residue Z 2 of a finite set Z, with respect to a hyperplane H, is separated by forms of degree d − 1. Then any form of degree d in H which contains Z 1 = Z ∖ Z 2 can be lifted to a form of degree d in \(\mathbb {P}^n\) which contains Z.

As a consequence, in the hypothesis of Lemma 6.6, it turns out that every quadric of the hyperplane H that contains Z 2 can be lifted to a quadric of \(\mathbb {P}^n\) that contains Z.

We will need the following, well known remark for linearly independent sets W in a projective space \(\mathbb {P}^n\):

Lemma 6.7

Let W be a linearly independent finite set in \(\mathbb {P}^n\) . Then for any QW, there exists a quadric of \(\mathbb {P}^n\) containing W and missing Q. In other words, the ideal of W is generated by quadrics.

Proof

The proof is an easy argument of linear algebra. After shrinking n we may always assume W = {P 1, …, P n+1}. If Q does not belong to the span of any proper subset of W, just by taking two hyperplanes containing two proper subsets we get the claim. Thus, reorder the points of W so that P 1, …, P s (s ≥ 2, s ≤ n) is a minimal subset whose span L contains Q. Since the points are linearly independent, the span M of P 1, …, P s−1, P s+1 intersects L in the span of P 1, …, P s−1, hence by minimality it does not contain Q. Similarly, the span M′ of P s, P s+2, …, P n+1 intersects L only in P n. The union of a general hyperplane containing M and a general hyperplane containing M′ provides a quadric containing W and missing Q.

Now we are ready to state and proof our result.

Theorem 6.12

Let T be a quartic form in n + 1 variables, and consider a decomposition A of T. Call k the Kruskal rank of A and assume that ℓ(A) ≤ 2k − 1. Assume that the Terracini space at A has (maximal) dimension (2k − 1)(n + 1) − 1.

Then T has rank 2k − 1 and it is identifiable.

Notice that since k ≤ n + 1, it follows (A) ≤ 2n + 1. Moreover, by Proposition 6.5, we know that A is separated by quadrics, i.e. v 2(A) is linearly independent. This implies immediately that also v 4(A) is linearly independent.

Notice also that if (A) < 2k − 1, then A satisfies the hypothesis of the reshaped Kruskal’s criterion, because in this case

$$\displaystyle \begin{aligned}\ell(A)\leq \frac{k+k+\ell(A)-2}2,\end{aligned}$$

so that the identifiability of A follows immediately.

Thus the Theorem produces a new criterion only for (A) = 2k − 1. Hence we assume, in the proof, that (A) = 2k − 1.

Proof

As in the proof of Theorem 6.11, we will prove that, under the assumptions, if another decomposition B of cardinality (B) ≤ 2k − 1 exists, then there exists a curve C containing A and such that v 4(C) spans a space of dimension ≤ 4k − 4, which contradicts the assumption 2).

Of course, we may assume that A spans \(\mathbb {P}^n\), otherwise we simply decrease n. It follows that 2k − 1 > n and the difference of the Hilbert function of A is:

$$\displaystyle \begin{aligned}Dh_A(0)=1,\quad Dh_A(1)=n, \quad Dh_A(2)=2k-2-n.\end{aligned}$$

Assume that a second decomposition B exists. The first step is to prove that Z = A ∪ B satisfies the Cayley Bacharach property CB(4), which holds by following verbatim the proof of the similar statement in Theorem 6.2 of [2].

It follows then, by Theorem 6.1, that the difference of the Hilbert function of Z satisfies Dh Z(3) + Dh Z(4) + Dh Z(5) = 2k − 1, so that in particular (B) = 2k − 1, A, B are disjoint and the difference of the Hilbert function of Z satisfies:

$$\displaystyle \begin{aligned}Dh_Z(0)=1,\quad Dh_Z(1)=n, \quad Dh_Z(2)=2k-2-n.\end{aligned}$$

Thus, summing up, one gets h Z(2) = h A(2), i.e. all the quadrics that contain A must contain Z.

The assumption that k is the Kruskal rank of A means that any subset of k points in A is linearly independent, while there exists a subset of k + 1 points which generates a subspace \(\varLambda =\mathbb {P}^{k-1}\). After rearranging the points, we may assume that P 1, …, P k+1 generate Λ, P k+1, …, P k+q are also contained in Λ, and P k+q+1, …, P 2k−1 are outside Λ. Notice that we may always assume that A is non degenerate, thus k + q < 2k − 1. Call Λ′ the space generated by P k+q+1, …, P 2k−1. Any pair of hyperplanes H, H′ which contain Λ, Λ′ respectively, determine a quadric which contains A. It follows that all the points of B are contained either in Λ or in Λ′.

Let Q be a point of B which lies in Λ. For any subset W of k − 1 points among P 1, …, P k+q consider the hyperplane L W of Λ spanned by W. If Q belongs to no hyperplanes L W, then there are quadrics in Λ which contain P 1, …, P k+q. Thus if H is a general hyperplane containing Λ then there are quadrics in H which contain P 1, …, P k+q and miss Q. Since the set P k+q+1, …, P 2k−1 is linearly independent, by our assumption on the Kruskal rank of A, then by Lemma 6.6 one finds a quadric of \(\mathbb {P}^n\) which contains A and misses Q, contradicting h Z(2) = h A(2).

Hence, there exists a set W of k − 1 points among P 1, …, P k+q which spans a hyperplane L W of Λ containing Q. Since W is linearly independent, by Lemma 6.7 one can find a quadric K in L W that contains W and misses Q. Since, by our assumption on the Kruskal rank of A, also {P 1, …, P k+q}∖ W, which contains at most k points, is linearly independent, then by Lemma 6.6 we can lift K to a quadric K′ of Λ which misses Q and contains P 1, …, P k+q. As above, K′ lifts to a quadric K″ which contains A and misses Q. Thus we have a contradiction with h Z(2) = h A(2).

It follows that all the points of B belong to Λ′. In particular, the form T does not involve all the variables. After choosing carefully the coordinates x 0, …, x n in \(\mathbb {P}^n\), we may assume that T does not involve x n. But then, by replacing x n with tx n in the points of A (actually in the points of A ∩ Λ), as t varies we get a family of decompositions of T which coincides with A for t = 1. By Proposition 6.7, this contradicts the assumption that the Terracini space has maximal dimension.

Remark 6.18

As in Section 6 of [2], one can create an algorithm that uses Theorem 6.12 to detect the identifiability of quartics of low rank. Given a symmetric decomposition of length r of a quartic

$$\displaystyle \begin{aligned}T = \sum_{i=1}^r \nu_{4}( P_i ),\end{aligned}$$

in the form of the collection of points \(A = \{ P_i = [{\mathbf {m}}_i] \}_{i=1}^r \subset \mathbb {P}^n\), we can apply the following algorithm for verifying that the given decomposition of T is identifiable:

  1. 1)

    Kruskal’s test: compute the Kruskal rank k of A;

    1. S1.

      If r > 2k − 1, the criterion cannot be applied.

    2. S2.

      If r < 2k − 1, use the reshaped Kruskal criterion from [13, Section 6.2].

    3. S3.

      If r = 2k − 1, perform the:

  2. 2)

    Terracini’s test: check that the dimension of \(\langle T_{{\mathbf {m}}_1}{\nu _4(\mathbb {C}^{n+1})}, \ldots , T_{{\mathbf {m}}_r} {\nu _4(\mathbb {C}^{n+1})} \rangle \) is (2k − 1)(n + 1) − 1.

If all these tests are successful, then T is of rank r and is identifiable.

Notice that the computation of the Kruskal rank of A turns out to be the heaviest step of the algorithm.

6.6 Final Remarks and Open Problems

  1. 1.

    We believe that the range in which the non-identifiability of tensors implies the existence of a positive dimensional family of decompositions (which can be detected by the computation of the Terracini space) goes beyond the numerical bounds given in Theorems 6.11 and 6.12.

    In order to extend the previous results, however, one needs extensions of the basic Castelnuovo’s Lemma 6.5. What we would need is to replace the existence of a rational normal curve, predicted by Lemma 6.5 for sets of points with special Hilbert functions, with the existence of other types of curves (elliptic, or even of higher genera), when the number of points increases.

    Similar results are known in some cases (see e.g. [19, 25]), but not in a form that can be immediately applied to our situation.

    We would like to stimulate further researches on the geometry of sets of points with special Hilbert functions, with the final target of an application to tensor analysis.

  2. 2.

    The geometric methods known so far for the study of the identifiability of specific tensors, as the Kruskal’s criterion and the extension given in the previous sections, are based on the study of the geometry of a given decomposition. The idea has a basic bug: once the identifiability follows from geometric properties of a given decomposition A, then it must hold for all the tensors which lie in the span of v d(A) (at least those for which A is minimal), regardless of the coefficients that are used to produce the form T. Of course, we can expect that a similar uniform behavior holds only for small values of the rank r. When r increases, then it is natural to expect that the space 〈v d(A)〉 contains both identifiable and non-identifiable points.

    As a consequence, we need criteria for identifiability which are able to distinguish between different points of the span 〈v d(A)〉 of a given decomposition A.

    We believe that a geometric analysis of A and of its linked sets of points can produce geometric criteria which reach much further than the range of application of Kruskal’s criterion.

  3. 3.

    A different approach to the study of the identifiability of tensors is contained in the paper [22]. The authors prove that when the space spanned by partial derivatives of the form T (the catalecticant space, in the terminology of [20]) meets the corresponding variety in a finite set of the expected length r, then r is the rank of T and the tensor is identifiable.

    The method of partial derivatives has the advantage that it does not need to start with a given decomposition. On the other hand, for special tensors, it does not describe the geometric situation which yields the non-uniqueness of the decomposition. Furthermore, the method relies on the computation of an intersection of algebraic varieties, i.e. on methods of computer algebra, which usually cost a lot in terms of computational complexity.

    We believe that a mix of the two methods, which will be the target of a forthcoming paper, will produce new, interesting developments in the theory.

  4. 4.

    We wonder if the analysis of tensor decomposition by means of geometric methods, related with the study of finite sets in projective spaces, can be extended beyond the case of symmetric tensors. For general tensors, the natural substitute for the Hilbert function is the multigraded Hilbert function. Indeed, for general tensors, one has only to consider the first piece of the multigraded Hilbert function, i.e. the piece bounded by the origin and the multidegree (1, …, 1). For this piece of the Hilbert function, which is basically the Segre function, in the terminology of [12] and [6], very few is known. For instance, we do not know an analogue of Lemma 6.1, which lists the most elementary properties. A study of the Segre function, aimed to an application to tensor analysis, will probably yield several new, valuable results on the theory.