1 Introduction and Overview

Suppose \(f(x_{1},\ldots,x_{n})\) is a form of degree d with coefficients in a field \(K \subseteq \mathbb{C}\). The K-length of f, \(L_{K}(f)\), is the smallest r for which there exist \(\lambda _{j},\alpha _{jk} \in K\) so that

$$\displaystyle{ f(x_{1},\ldots,x_{n}) =\sum _{ j=1}^{r}\lambda _{ j}{\bigl (\alpha _{j1}x_{1} +\ldots +\alpha _{jn}x_{n}\bigr )}^{d}. }$$
(1.1)

In this paper, we consider the K-length of a fixed form f as K varies; this is apparently an open question in the literature, even for binary forms (n = 2). Sylvester [53, 54] explained how to compute \(L_{\mathbb{C}}(f)\) for binary forms in 1851 and gave a lower bound for \(L_{\mathbb{R}}(f)\) for binary forms in 1864. Except for a few remarks, we shall restrict our attention to binary forms.

It is trivially true that \(L_{K}(f) = 1\) for linear f. For d = 2, \(L_{K}(f)\) equals the rank of f: a representation over K can be found by completing the square, and this length cannot be shortened by enlarging the field. Accordingly, we shall also assume that d ≥ 3.

When \(K = \mathbb{C}\), the \(\lambda _{j}\)’s in (1.1) are superfluous. The computation of \(L_{\mathbb{C}}(f)\) is a huge, venerable and active subject, and difficult when n ≥ 3. The interested reader is directed to [2, 7, 8, 14, 17, 18, 22, 25, 28, 34, 44–46] as representative recent works. Even for small n, d ≥ 3, there are still many open questions. Landsberg and Teitler [34] complete a classification of \(L_{\mathbb{C}}(f)\) for ternary cubics f and also discuss \(L_{\mathbb{C}}(x_{1}x_{2}\cdots x_{n})\), among other topics. Historically, much attention has centered on the \(\mathbb{C}\)-length of a general form of degree d. In 1995, Alexander and Hirschowitz [1] (see also [5, 36]) established that for n, d ≥ 3, this length is \(\lceil \frac{1} {n}\left ({ n+d-1 \atop n-1} \right )\rceil \), the constant-counting value, with the four exceptions known since the nineteenth century – (n, d) = (3, 4), (4, 4), (5, 4), (5, 3) – in which the length is \(\lceil \frac{1} {n}\left ({ n+d-1 \atop n-1} \right )\rceil + 1\). There has been a recent series of papers studying \(L_{\mathbb{R}}(f)\) [4, 9, 15]; these study the length in greater depth than we do here.

Two central examples illustrate the phenomenon of multiple lengths over different fields.

Example 1.1.

Suppose \(f(x,y) = {(x + \sqrt{2}y)}^{d} + {(x -\sqrt{2}y)}^{d} \in \mathbb{Q}[x,y]\). Then \(L_{K}(f)\) is 2 (if \(\sqrt{2} \in K\)) and d (otherwise). This example first appeared in [47, p. 137]. (See Theorem 4.6 for a generalization.)
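
As a quick sanity check for d = 3 (a computation of our own, not taken from [47]), expanding the two cubes shows that the coefficients are indeed rational, and one rational representation of length 3 can be written down by hand, consistent with the stated length d over \(\mathbb{Q}\):

$$\displaystyle{ {(x + \sqrt{2}y)}^{3} + {(x -\sqrt{2}y)}^{3} = 2{x}^{3} + 12x{y}^{2} = 2{(x + y)}^{3} + 2{(x - y)}^{3} - 2{x}^{3}. }$$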

Example 1.2.

If \(\phi (x,y) = 3{x}^{5} - 20{x}^{3}{y}^{2} + 10x{y}^{4}\), then \(L_{K}(\phi ) = 3\) if and only if \(\sqrt{-1} \in K\), \(L_{K}(\phi ) = 4\) for \(K = \mathbb{Q}(\sqrt{-2}), \mathbb{Q}(\sqrt{-3}), \mathbb{Q}(\sqrt{-5}), \mathbb{Q}(\sqrt{-6})\) (at least) and \(L_{\mathbb{R}}(\phi ) = 5\). (We give proofs of these assertions in Examples 2.1 and 3.1.)

The following simple definitions and remarks apply in the obvious way to forms in n ≥ 3 variables, but for simplicity are given for binary forms. A representation such as (1.1) is called K-minimal if \(r = L_{K}(f)\). Two linear forms are called distinct if they (or their d-th powers) are not proportional. A representation is honest if the summands are pairwise distinct. Any minimal representation is honest. Two honest representations are different if the ordered sets of summands are not rearrangements of each other; we shall not distinguish between \({\ell}^{d}\) and \({(\zeta _{d}^{k}\ell)}^{d}\) where \(\zeta _{d} = {e}^{2\pi i/d}\). If g is obtained from f by an invertible linear change of variables over K, then \(L_{K}(f) = L_{K}(g)\).

Given a form \(f \in \mathbb{C}[x,y]\), we let \(E_{f}\) denote the field generated by the coefficients of f over \(\mathbb{Q}\); \(L_{K}(f)\) is defined for fields K satisfying \(E_{f} \subseteq K \subseteq \mathbb{C}\). The following implication is immediate:

$$\displaystyle{ K_{1} \subset K_{2}\Rightarrow L_{K_{1}}(f) \geq L_{K_{2}}(f). }$$
(1.2)

Strict inequality in (1.2) is possible, as shown by the two examples. Finally, we define the cabinet of f, \(\mathcal{C}(f)\), to be the set of all possible lengths for f.

There is a natural alternative definition of length in which sums of powers are considered without coefficients. This makes no difference when \(K = \mathbb{C}\) or \(K = \mathbb{R}\) and d is odd, but in other cases, a form might not even be a sum of powers. For example, \(\sqrt{2}\) is not totally positive in \(K = \mathbb{Q}(\sqrt{2})\), so \(\sqrt{2} \cdot {x}^{2}\) is not a sum of squares in K[x], and \({x}^{4} +\lambda {x}^{2}{y}^{2} + {y}^{4}\) is a sum of fourth powers of real linear forms if and only if 0 ≤ λ ≤ 6. This alternative definition was studied by Ellison [19] in the special cases \(K = \mathbb{C}, \mathbb{R}, \mathbb{Q}\). Newman and Slater [39] do not restrict to homogeneous polynomials. They write x as a sum of d d-th powers of linear polynomials; by substitution, any polynomial is a sum of at most d d-th powers of polynomials. They also show that the minimum number of d-th powers in this formulation is \(\geq \sqrt{d}\). Because of the degrees of the summands, these methods do not homogenize to forms. Mordell [37] showed that a polynomial that is a sum of cubes of linear forms over \(\mathbb{Z}\) is also a sum of at most eight such cubes. More generally, if R is a commutative ring, then its d-Pythagoras number, \(P_{d}(R)\), is the smallest integer k so that any sum of d-th powers in R is a sum of k d-th powers. Helmke [25] uses both definitions for length for forms, but is mainly concerned with the alternative definition in the case when K is an algebraically closed (or a real closed) field of characteristic zero, not necessarily a subset of \(\mathbb{C}\). This subject is closely related to Hilbert’s 17th Problem; see [10–12]. In [47], a principal object of study is \(Q_{n,2k}\), the (closed convex) cone of real forms which are a sum of 2k-th powers of real linear forms.

We now outline the remainder of the paper.

In Sect. 2, we give a self-contained proof of Sylvester’s 1851 Theorem (Theorem 2.1). Although originally given over \(\mathbb{C}\), it adapts easily to any \(K \subset \mathbb{C}\) (Corollary 2.2). If f is a binary form, then \(L_{K}(f) \leq r\) if and only if a certain subspace of the binary forms of degree r (a subspace determined by f) contains a form that splits into distinct factors over K. We illustrate the algorithm by proving the assertions of lengths 3 and 4 for ϕ in Example 1.2.

In Sect. 3, we prove (Theorem 3.2) a homogenized version of Sylvester’s 1864 Theorem (Theorem 3.1), which implies that if real f has r linear factors over \(\mathbb{R}\) (counting multiplicity), then \(L_{\mathbb{R}}(f) \geq r\). As far as we can tell, Sylvester did not connect his two theorems: perhaps because he presented the second one for non-homogeneous polynomials in one variable.

We apply these theorems and some other simple observations in Sects. 4 and 5. We first show that if \(L_{\mathbb{C}}(f) = 1\), then \(L_{E_{f}}(f) = 1\) as well (Theorem 4.1). Any set of d + 1 d-th powers of pairwise distinct linear forms is linearly independent (Theorem 4.2). It follows quickly that if f(x, y) has two different honest representations of length r and s, then \(r + s \geq d + 2\) (Corollary 4.3), and so if \(L_{E_{f}}(f) = r \leq \frac{d+1} {2}\), then the representation over \(E_{f}\) is the unique minimal \(\mathbb{C}\)-representation (Corollary 4.4). We show that Example 1.1 gives a template for forms f satisfying \(L_{\mathbb{C}}(f) = 2 < L_{E_{f}}(f)\) (see Theorem 4.6), and give two generalizations which provide other types of constructions (Corollaries 4.7 and 4.8) of forms with multiple lengths. We apply Sylvester’s 1851 Theorem to give an easy proof of the known result that \(L_{\mathbb{C}}(f) \leq d\) (Theorem 4.9) and a slightly trickier proof of the probably-known result that \(L_{K}(f) \leq d\) as well (Theorem 4.10). Theorem 4.10 combines with Theorem 3.2 into Corollary 4.11: if \(f \in \mathbb{R}[x,y]\) is a product of d linear factors and not a d-th power, then \(L_{\mathbb{R}}(f) = d\). Conjecture 4.12 asserts that \(f \in \mathbb{R}[x,y]\) is a product of d linear factors if and only if \(L_{\mathbb{R}}(f) = d\). This conjecture has recently been proven by Comon-Ottaviani-Causa-Re [9, 15] when the factors of f are distinct.

In Corollary 5.1, we discuss the various possible cabinets when d = 3, 4, and give examples for each one not already ruled out. We then completely classify binary cubics; the key point of Theorem 5.2 is that if the cubic f has no repeated factors, then \(L_{K}(f) = 2\) if and only if \(E_{f}(\sqrt{-3\Delta (f)}) \subseteq K\); this significance of the discriminant \(\Delta (f)\) can already be found in Salmon [52, Sect. 167]. This proves Conjecture 4.12 for d = 3. In Theorem 5.3, we show that Conjecture 4.12 also holds for d = 4. We then show (Theorem 5.4) that \(L_{\mathbb{C}}(f) = d\) if and only if there are distinct linear forms \(\ell\), \(\ell^{\prime}\) so that \(f ={\ell}^{d-1}\ell^{\prime}\). (One direction of this result is well-known; the other has recently been proved by Białynicki-Birula and Schinzel [2].) The minimal representations of \({x}^{k}{y}^{k}\) are parameterized (Theorem 5.5), and in Corollary 5.6, we show that \(L_{K}({({x}^{2} + {y}^{2})}^{k}) \geq k + 1\), with equality if and only if \(\tan \frac{\pi }{k+1} \in K\). In particular, \(L_{\mathbb{Q}}({({x}^{2} + {y}^{2})}^{2}) = 4\). Theorem 5.7 shows that \(L_{\mathbb{Q}}({x}^{4} + 6\lambda {x}^{2}{y}^{2} + {y}^{4}) = 3\) if and only if a certain quartic diophantine equation over \(\mathbb{Z}\) has a non-zero solution.

Section 6 lists some open questions.

We would like to express our appreciation to the organizers of the Higher Degree Forms conference in Gainesville in May 2009 for offering the opportunities to speak on these topics, and to write this article for its Proceedings. We also thank Mike Bennett, Tony Geramita, Giorgio Ottaviani, Joe Rotman and Zach Teitler for helpful conversations and correspondence.

2 Sylvester’s 1851 Theorem

Modern proofs of Theorem 2.1 can be found in the work of Kung and Rota: [33, Sect. 5], with further discussion in [30–32, 49]. We present here a very elementary proof showing the connection with constant coefficient linear recurrences, in the hopes that this remarkable theorem might become better known to the modern reader.

Theorem 2.1 (Sylvester). 

Suppose

$$\displaystyle{ f(x,y) =\sum _{ j=0}^{d}\left ({ d \atop j} \right )a_{j}{x}^{d-j}{y}^{j} }$$
(2.1)

and suppose

$$\displaystyle{ h(x,y) =\sum _{ t=0}^{r}c_{ t}{x}^{r-t}{y}^{t} =\prod _{ j=1}^{r}(-\beta _{ j}x +\alpha _{j}y) }$$
(2.2)

is a product of pairwise distinct linear factors. Then there exist \(\lambda _{k} \in \mathbb{C}\) so that

$$\displaystyle{ f(x,y) =\sum _{ k=1}^{r}\lambda _{ k}{(\alpha _{k}x +\beta _{k}y)}^{d} }$$
(2.3)

if and only if

$$\displaystyle{ \left (\begin{array}{cccc} a_{0} & a_{1} & \cdots & a_{r} \\ a_{1} & a_{2} & \cdots &a_{r+1}\\ \vdots & \vdots & \ddots & \vdots \\ a_{d-r}&a_{d-r+1} & \cdots & a_{d} \end{array} \right )\cdot \left (\begin{array}{c} c_{0} \\ c_{1}\\ \vdots \\ c_{r} \end{array} \right ) = \left (\begin{array}{c} 0\\ 0\\ \vdots \\ 0 \end{array} \right ); }$$
(2.4)

that is, if and only if

$$\displaystyle{ \sum \limits _{t=0}^{r}a_{\ell +t}c_{t} = 0,\qquad \ell = 0,1,\ldots,d - r. }$$
(2.5)

Proof.

First suppose that (2.3) holds. Then for 0 ≤ j ≤ d,

$$\displaystyle\begin{array}{rcl} a_{j}& =& \sum _{k=1}^{r}\lambda _{ k}\alpha _{k}^{d-j}\beta _{ k}^{j}\Rightarrow\sum _{ t=0}^{r}a_{\ell +t}c_{t} =\sum _{ k=1}^{r}\sum _{ t=0}^{r}\lambda _{ k}\alpha _{k}^{d-\ell-t}\beta _{ k}^{\ell+t}c_{ t} {}\\ & =& \sum _{k=1}^{r}\lambda _{ k}\alpha _{k}^{d-\ell-r}\beta _{ k}^{\ell}\sum _{ t=0}^{r}\alpha _{ k}^{r-t}\beta _{ k}^{t}c_{ t} =\sum _{ k=1}^{r}\lambda _{ k}\alpha _{k}^{d-\ell-r}\beta _{ k}^{\ell}\ h(\alpha _{ k},\beta _{k}) = 0. {}\\ \end{array}$$

Now suppose that (2.4) holds and suppose first that \(c_{r}\neq 0\). We may assume without loss of generality that \(c_{r} = 1\) and that \(\alpha _{j} = 1\) in (2.2), so that the \(\beta _{j}\)’s are distinct. Define the infinite sequence \((\tilde{a}_{j})\), j ≥ 0, by:

$$\displaystyle{ \tilde{a}_{j} = a_{j}\quad \text{if}\quad 0 \leq j \leq r - 1;\quad \tilde{a}_{r+\ell} = -\sum _{t=0}^{r-1}\tilde{a}_{ t+\ell}c_{t}\quad \text{for}\quad \ell \geq 0. }$$
(2.6)

This sequence satisfies the recurrence of (2.5), so that

$$\displaystyle{ \tilde{a}_{j} = a_{j}\quad \text{for}\quad j \leq d. }$$
(2.7)

Since \(\vert \tilde{a}_{j}\vert \leq \gamma \cdot {M}^{j}\) for suitable γ, M by induction, the generating function

$$\displaystyle{ \Phi (T) =\sum _{ j=0}^{\infty }\tilde{a}_{ j}{T}^{j} }$$

converges in a neighborhood of 0. We have

$$\displaystyle{ \left (\sum _{t=0}^{r}c_{ r-t}{T}^{t}\right )\Phi (T) =\sum _{ n=0}^{r-1}\left (\sum _{ j=0}^{n}c_{ r-(n-j)}\tilde{a}_{j}\right ){T}^{n} +\sum _{ n=r}^{\infty }\left (\sum _{ t=0}^{r}c_{ r-t}\tilde{a}_{n-t}\right ){T}^{n}. }$$

It follows from (2.6) that the second sum vanishes, and hence \(\Phi (T)\) is a rational function with denominator

$$\displaystyle{ \sum _{t=0}^{r}c_{ r-t}{T}^{t} = h(T,1) =\prod _{ j=1}^{r}(1 -\beta _{ j}T). }$$

By the method of partial fractions, there exist \(\lambda _{k} \in \mathbb{C}\) so that

$$\displaystyle{ \sum _{j=0}^{\infty }\tilde{a}_{ j}{T}^{j} = \Phi (T) =\sum _{ k=1}^{r} \frac{\lambda _{k}} {1 -\beta _{k}T}\Rightarrow\tilde{a}_{j} =\sum _{ k=1}^{r}\lambda _{ k}\beta _{k}^{j}. }$$
(2.8)

A comparison of (2.8) and (2.7) with (2.1) shows that

$$\displaystyle{ f(x,y) =\sum _{ j=0}^{d}\left ({ d \atop j} \right )a_{j}{x}^{d-j}{y}^{j} =\sum _{ k=1}^{r}\lambda _{ k}\sum _{j=0}^{d}\left ({ d \atop j} \right )\beta _{k}^{j}{x}^{d-j}{y}^{j} =\sum _{ k=1}^{r}\lambda _{ k}{(x +\beta _{k}y)}^{d}, }$$
(2.9)

as claimed in (2.3).

If \(c_{r} = 0\), then \(c_{r-1}\neq 0\), because h has distinct factors. We may proceed as before, replacing r by r − 1 and taking \(c_{r-1} = 1\), so that (2.2) becomes

$$\displaystyle{ h(x,y) =\sum _{ t=0}^{r-1}c_{ t}{x}^{r-t}{y}^{t} = x\prod _{ j=1}^{r-1}(y -\beta _{ j}x). }$$
(2.10)

Since \(c_{r} = 0\), the system (2.4) can be rewritten as

$$\displaystyle{ \left (\begin{array}{cccc} a_{0} & a_{1} & \cdots &a_{r-1} \\ a_{1} & a_{2} & \cdots & a_{r}\\ \vdots & \vdots & \ddots & \vdots \\ a_{d-r}&a_{d-r+1} & \cdots &a_{d-1} \end{array} \right )\cdot \left (\begin{array}{c} c_{0} \\ c_{1}\\ \vdots \\ c_{r-1} \end{array} \right ) = \left (\begin{array}{c} 0\\ 0\\ \vdots \\ 0 \end{array} \right ). }$$

We may now argue as before, except that (2.7) becomes

$$\displaystyle{ \tilde{a}_{j} = a_{j}\quad \text{for}\quad j \leq d - 1,\quad a_{d} =\tilde{ a}_{d} +\lambda _{r} }$$
(2.11)

for some \(\lambda _{r}\), and (2.9) becomes

$$\displaystyle\begin{array}{rcl} f(x,y)& =& \sum _{j=0}^{d}\left ({ d \atop j} \right )a_{j}{x}^{d-j}{y}^{j} =\lambda _{ r}{y}^{d} +\sum _{ k=1}^{r-1}\lambda _{ k}\sum _{j=0}^{d}\left ({ d \atop j} \right )\beta _{k}^{j}{x}^{d-j}{y}^{j} \\ & =& \lambda _{r}{y}^{d} +\sum _{ k=1}^{r-1}\lambda _{ k}{(x +\beta _{k}y)}^{d}. {}\end{array}$$
(2.12)

By (2.10), (2.12) meets the description of (2.3), completing the proof.
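
The mechanism of the proof can be traced on a small instance. The following is a minimal sketch in Python; the function name `extend_by_recurrence` and the sample data are our own choices, not part of the original argument. For \(h(x,y) = (y - x)(y - 2x)\), so that \((c_{0},c_{1},c_{2}) = (2,-3,1)\) and \(\beta _{1} = 1\), \(\beta _{2} = 2\), the sequence defined by (2.6) from \(a_{0} = 2\), \(a_{1} = 3\) should agree with \(\lambda _{1}\beta _{1}^{j} +\lambda _{2}\beta _{2}^{j}\) for \(\lambda _{1} =\lambda _{2} = 1\), as (2.8) predicts.

```python
def extend_by_recurrence(initial, c, length):
    """Extend a_0, ..., a_{r-1} to `length` terms using the recurrence (2.6).

    `c` = [c_0, ..., c_r] must be normalized so that c_r == 1.
    """
    r = len(c) - 1
    if c[r] != 1 or len(initial) != r:
        raise ValueError("need c_r == 1 and exactly r initial terms")
    seq = list(initial)
    while len(seq) < length:
        ell = len(seq) - r  # the new term is a_{r + ell}
        seq.append(-sum(seq[ell + t] * c[t] for t in range(r)))
    return seq


# h(x, y) = (y - x)(y - 2x) = 2x^2 - 3xy + y^2, i.e. c = (2, -3, 1)
c = [2, -3, 1]
from_recurrence = extend_by_recurrence([2, 3], c, 6)

# Closed form from (2.8) with lambda_1 = lambda_2 = 1, beta_1 = 1, beta_2 = 2
closed_form = [1**j + 2**j for j in range(6)]

print(from_recurrence == closed_form)
```

With d = 5, these six values are the \(a_{j}\) of \(f(x,y) = {(x + y)}^{5} + {(x + 2y)}^{5}\) in the normalization (2.1).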

The \((d - r + 1) \times (r + 1)\) Hankel matrix in (2.4) will be denoted \(H_{r}(f)\). If (f, h) satisfy the criterion of this theorem, we shall say that h is a Sylvester form for f. If the only Sylvester forms of degree r are \(\lambda h\) for \(\lambda \in \mathbb{C}\), we say that h is the unique Sylvester form for f. Any polynomial multiple of a Sylvester form that has no repeated factors is also a Sylvester form, since there is no requirement that \(\lambda _{k}\neq 0\) in (2.3). If f has a unique Sylvester form of degree r, then \(L_{\mathbb{C}}(f) = r\) and \(L_{K}(f) \geq r\).

The proof of Theorem 2.1 in [49] is based on apolarity. If f and h are given by (2.1) and (2.2), and \(h(D) =\prod _{ j=1}^{r}(-\beta _{j} \frac{\partial } {\partial x} +\alpha _{j} \frac{\partial } {\partial y})\), then

$$\displaystyle{ h(D)f =\sum _{ m=0}^{d-r} \frac{d!} {(d - r - m)!m!}\left (\sum _{i=0}^{r}a_{ i+m}c_{i}\right ){x}^{d-r-m}{y}^{m}. }$$

Thus, (2.4) is equivalent to h(D)f = 0. One can then argue that each linear factor in h(D) kills a different summand, and dimension counting takes care of the rest. In particular, if deg h > d, then h(D)f = 0 automatically, and this implies that \(L_{\mathbb{C}}(f) \leq d + 1\). Theorem 4.2 gives another explanation of this fact.

If h has repeated factors, a condition of interest in [30–33, 49], then Gundelfinger’s Theorem [23], first proved in 1886, shows that a factor \({(-\beta x +\alpha y)}^{\ell}\) of h corresponds to a summand \(q(x,y){(\alpha x +\beta y)}^{d+1-\ell}\) in f, where q is an arbitrary form of degree ℓ − 1. (We are not interested in such summands when ℓ > 1. For more discussion of this case, see [49].)

If \(d = 2s - 1\) and r = s, then \(H_{s}(f)\) is s × (s + 1) and has a non-trivial null-vector; for a general f, the resulting form h has distinct factors, and so is a unique Sylvester form. (The coefficients of h, and its discriminant, are polynomials in the coefficients of f.) This is how Sylvester proved that a general binary form of degree 2s − 1 is a sum of s powers of linear forms over \(\mathbb{C}\), and the minimal representation is unique.

If d = 2s and r = s, then \(H_{s}(f)\) is square; \(\det (H_{s}(f))\) is the catalecticant of f. (For more on the term “catalecticant”, see [47, pp. 49–50] and [22, pp. 104–105].) In general, there exists λ so that the catalecticant of \(f(x,y) -\lambda {x}^{2s}\) vanishes, and the resulting non-trivial null vector is generally a Sylvester form (no repeated factors). Thus, a general binary form of degree 2s is a sum of \(\lambda {x}^{2s}\) plus s powers of linear forms over \(\mathbb{C}\).

Sylvester’s Theorem can also be adapted to compute K-length when \(K \subsetneq \mathbb{C}\), with the understanding that a Sylvester form of minimal degree might not split over K.

Corollary 2.2.

Given f ∈ K[x,y], \(L_{K}(f)\) is the minimal degree of a Sylvester form for f which splits completely over K.

Proof.

If (2.3) is a minimal representation for f over K, where \(\lambda _{k},\alpha _{k},\beta _{k} \in K\), then h(x, y) ∈ K[x, y] splits over K by (2.2). Conversely, if h is a Sylvester form for f satisfying (2.2) with \(\alpha _{k},\beta _{k} \in K\), then (2.3) holds for some \(\lambda _{k} \in \mathbb{C}\). This is equivalent to saying that the linear system

$$\displaystyle{ a_{j} =\sum _{ k=1}^{r}\alpha _{ k}^{d-j}\beta _{ k}^{j}X_{ k},\quad (0 \leq j \leq d) }$$
(2.13)

has a solution \(\{X_{k} =\lambda _{k}\}\) over \(\mathbb{C}\). Since \(a_{j},\alpha _{k}^{d-j}\beta _{k}^{j} \in K\) and (2.13) has a solution over \(\mathbb{C}\), it also has a solution over K. Thus, f has a K-representation of length r.

Example 2.1 (Continuing Example 1.2). 

Note that

$$\displaystyle\begin{array}{rcl} & & \phi (x,y) = 3{x}^{5} - 20{x}^{3}{y}^{2} + 10x{y}^{4} = \left ({ 5 \atop 0} \right ) \cdot 3\ {x}^{5} + \left ({ 5 \atop 1} \right ) \cdot 0\ {x}^{4}y {}\\ & +& \left ({ 5 \atop 2} \right ) \cdot (-2)\ {x}^{3}{y}^{2} + \left ({ 5 \atop 3} \right ) \cdot 0\ {x}^{2}{y}^{3} + \left ({ 5 \atop 4} \right ) \cdot 2\ x{y}^{4} + \left ({ 5 \atop 5} \right ) \cdot 0\ {y}^{5}. {}\\ \end{array}$$

Since

$$\displaystyle{ \left (\begin{array}{cccc} 3 & 0 & -2 & 0\\ 0 & -2 & 0 & 2\\ -2 & 0 & 2 & 0 \end{array} \right )\cdot \left (\begin{array}{c} c_{0} \\ c_{1} \\ c_{2} \\ c_{3} \end{array} \right ) = \left (\begin{array}{c} 0\\ 0 \\ 0 \end{array} \right )\;\Longleftrightarrow\;(c_{0},c_{1},c_{2},c_{3}) = r(0,1,0,1), }$$

ϕ has a unique Sylvester form of degree 3: \(h(x,y) = y({x}^{2} + {y}^{2}) = y(y - ix)(y + ix)\). Accordingly, there exist \(\lambda _{k} \in \mathbb{C}\) so that

$$\displaystyle{ \phi (x,y) =\lambda _{1}{x}^{5} +\lambda _{ 2}{(x + iy)}^{5} +\lambda _{ 3}{(x - iy)}^{5}. }$$

Indeed, \(\lambda _{1} =\lambda _{2} =\lambda _{3} = 1\), as may be checked. It follows that \(L_{K}(\phi ) = 3\) if and only if \(i \in K\). (There is no representation of length two.)
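
The null-space computation in this example can be reproduced with exact rational arithmetic. The sketch below is ours: the helper names `hankel` and `nullspace` are hypothetical, and plain Gauss-Jordan elimination over \(\mathbb{Q}\) is just one convenient choice. It builds \(H_{r}(\phi )\) from the normalized coefficients \(a_{j}\) of (2.1) and returns a basis of the solutions of (2.4).

```python
from fractions import Fraction


def hankel(a, r):
    """Rows of H_r(f), where a = [a_0, ..., a_d] are the coefficients in (2.1)."""
    d = len(a) - 1
    return [[Fraction(a[l + t]) for t in range(r + 1)] for l in range(d - r + 1)]


def nullspace(rows):
    """Basis of the right null space over Q, via Gauss-Jordan elimination."""
    m = [row[:] for row in rows]
    ncols = len(m[0])
    pivots = []  # pivots[k] is the pivot column of reduced row k
    rank = 0
    for col in range(ncols):
        pivot_row = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot_row is None:
            continue  # no pivot here: this is a free column
        m[rank], m[pivot_row] = m[pivot_row], m[rank]
        pivot = m[rank][col]
        m[rank] = [x / pivot for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[rank])]
        pivots.append(col)
        rank += 1
        if rank == len(m):
            break  # every row has a pivot; remaining columns are free
    basis = []
    for free_col in (c for c in range(ncols) if c not in pivots):
        vec = [Fraction(0)] * ncols
        vec[free_col] = Fraction(1)
        for row_idx, pivot_col in enumerate(pivots):
            vec[pivot_col] = -m[row_idx][free_col]
        basis.append(vec)
    return basis


# Normalized coefficients of phi = 3x^5 - 20x^3y^2 + 10xy^4
a_phi = [3, 0, -2, 0, 2, 0]
basis_r3 = nullspace(hankel(a_phi, 3))
print([[int(x) for x in v] for v in basis_r3])
```

For r = 3 the basis has the single vector (0, 1, 0, 1), matching the unique Sylvester form above; for r = 2 the same routine returns an empty basis, which is the computational content of the remark that there is no representation of length two.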

To find representations for ϕ of length 4, we consider (2.4) for ϕ with r = 4:

$$\displaystyle\begin{array}{rcl} H_{4}(\phi ) \cdot {(c_{0},c_{1},c_{2},c_{3},c_{4})}^{t}& =& {(0,0)}^{t}\;\Longleftrightarrow\;3c_{ 0} - 2c_{2} + 2c_{4} = -2c_{1} + 2c_{3} = 0 {}\\ \;\Longleftrightarrow\;(c_{0},c_{1},c_{2},c_{3},c_{4})& =& r_{1}(2,0,3,0,0) + r_{2}(0,1,0,1,0) + r_{3}(0,0,1,0,1), {}\\ \end{array}$$

hence \(h(x,y) = r_{1}{x}^{2}(2{x}^{2} + 3{y}^{2}) + y({x}^{2} + {y}^{2})(r_{2}x + r_{3}y)\). Given a field K, it is unclear whether there exist \(\{r_{\ell}\}\) so that h splits into distinct factors over K. We have found such \(\{r_{\ell}\}\) for small imaginary quadratic fields.

The choice \((r_{1},r_{2},r_{3}) = (1,0,2)\) gives \(h(x,y) = (2{x}^{2} + {y}^{2})({x}^{2} + 2{y}^{2})\) and

$$\displaystyle{ 24\phi (x,y) = 4{(x + \sqrt{-2}y)}^{5} + 4{(x -\sqrt{-2}y)}^{5} + {(2x + \sqrt{-2}y)}^{5} + {(2x -\sqrt{-2}y)}^{5}. }$$

Similarly, \((r_{1},r_{2},r_{3}) = (2,0,9)\) and \((2,0,-5)\) give \(h(x,y) = ({x}^{2} + 3{y}^{2})(4{x}^{2} + 3{y}^{2})\) and \(({x}^{2} - {y}^{2})(4{x}^{2} + 5{y}^{2})\), leading to representations for ϕ of length 4 over \(\mathbb{Q}(\sqrt{-3})\) and \(\mathbb{Q}(\sqrt{-5})\). The simplest such representation we have found for \(\mathbb{Q}(\sqrt{-6})\) uses \((r_{1},r_{2},r_{3}) = (12675,0,-156816)\) and

$$\displaystyle{ h(x,y) = (5x + 12y)(5x - 12y)(6 \cdot 1{3}^{2}{x}^{2} + 3{3}^{2}{y}^{2}). }$$

We conjecture that \(L_{\mathbb{Q}(\sqrt{-m})}(\phi ) = 4\) for all non-square m ≥ 2. In Example 3.1, we shall show that there is no choice of \((r_{1},r_{2},r_{3})\) for which h splits into distinct factors over any subfield of \(\mathbb{R}\).
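
The four quartics used in this example can be checked mechanically against the recurrence (2.5). The following is a sketch under our own naming (`poly_mul` and `is_sylvester_vector` are hypothetical helpers); it verifies only the linear conditions, while the fact that each h has distinct factors splitting over the stated field is read off from the factorizations given above.

```python
def poly_mul(p, q):
    """Multiply binary forms given as coefficient lists [c_0, ..., c_r] of x^(r-t) y^t."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def is_sylvester_vector(a, c):
    """Check (2.5): sum_t a[l + t] * c[t] == 0 for l = 0, ..., d - r."""
    d, r = len(a) - 1, len(c) - 1
    return all(
        sum(a[l + t] * c[t] for t in range(r + 1)) == 0 for l in range(d - r + 1)
    )


a_phi = [3, 0, -2, 0, 2, 0]  # normalized coefficients of phi

quartics = {
    "sqrt(-2)": poly_mul([2, 0, 1], [1, 0, 2]),    # (2x^2 + y^2)(x^2 + 2y^2)
    "sqrt(-3)": poly_mul([1, 0, 3], [4, 0, 3]),    # (x^2 + 3y^2)(4x^2 + 3y^2)
    "sqrt(-5)": poly_mul([1, 0, -1], [4, 0, 5]),   # (x^2 - y^2)(4x^2 + 5y^2)
    "sqrt(-6)": poly_mul([25, 0, -144], [6 * 13**2, 0, 33**2]),
}

for field, c in quartics.items():
    # h = r1*x^2(2x^2 + 3y^2) + y(x^2 + y^2)(r2*x + r3*y) has c_0 = 2*r1, c_4 = r3
    print(field, is_sylvester_vector(a_phi, c), "r1 =", c[0] // 2, "r3 =", c[4])
```

Reading \(r_{1} = c_{0}/2\) and \(r_{3} = c_{4}\) off the expanded product is a convenient way to recover the parameters from a factored h.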

3 Sylvester’s 1864 Theorem

Theorem 3.1 was discovered by Sylvester [55] in 1864 while proving Isaac Newton’s conjectural variation on Descartes’ Rule of Signs, see [27, 56]. This theorem appeared in Pólya-Szegö [42, Chap. 5, Problem 79], and has been used by Pólya and Schoenberg [41] and Karlin [29, p. 466]. The (dehomogenized) version proved in [42] is:

Theorem 3.1 (Sylvester). 

Suppose \(\lambda _{k}\neq 0\) for all k and \(\gamma _{1} <\ldots <\gamma _{r}\), r ≥ 2, are real numbers such that

$$\displaystyle{ Q(t) =\sum _{ k=1}^{r}\lambda _{ k}{(t -\gamma _{k})}^{d} }$$

does not vanish identically. Suppose the sequence \((\lambda _{1},\ldots,\lambda _{r},{(-1)}^{d}\lambda _{1})\) has C changes of sign and Q has Z zeros, counting multiplicity. Then Z ≤ C.

We shall prove an equivalent version which exploits the homogeneity of f to avoid discussion of zeros at infinity in the proof. (The equivalence is discussed in [50].)

Theorem 3.2.

Suppose f(x,y) is a non-zero real form of degree d with τ real linear factors (counting multiplicity) and

$$\displaystyle{ f(x,y) =\sum _{ j=1}^{r}\lambda _{ j}{(\cos \theta _{j}x +\sin \theta _{j}y)}^{d} }$$
(3.1)

where \(-\frac{\pi }{2} <\theta _{1} <\ldots <\theta _{r} \leq \frac{\pi } {2}\), r ≥ 2 and \(\lambda _{j}\neq 0\). If there are σ sign changes in the tuple \((\lambda _{1},\lambda _{2},\ldots,\lambda _{r},{(-1)}^{d}\lambda _{1})\), then τ ≤ σ. In particular, τ ≤ r.

Example 3.1 (Examples 1.2 and 2.1 concluded). 

Since

$$\displaystyle{ \phi (x,y) = 3x{\bigl ({x}^{2} -\tfrac{10-\sqrt{70}} {3} {y}^{2}\bigr )}{\bigl ({x}^{2} -\tfrac{10+\sqrt{70}} {3} {y}^{2}\bigr )} }$$

is a product of five linear factors over \(\mathbb{R}\), \(L_{\mathbb{R}}(\phi ) \geq 5\). The representation

$$\displaystyle{ 6\phi (x,y) = 36{x}^{5} - 10{(x + y)}^{5} - 10{(x - y)}^{5} + {(x + 2y)}^{5} + {(x - 2y)}^{5} }$$

over \(\mathbb{Q}\) implies that \(L_{\mathbb{R}}(\phi ) = 5\). It will follow from Theorem 4.10 that \(\mathcal{C}(\phi ) =\{ 3,4,5\}\).
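
Both the rational representation and the sign-change count of Theorem 3.2 can be checked directly for this example. The sketch below is ours (the helper names `expand`, `sign_changes` and `sigma` are hypothetical). It assumes every summand is written as \(\lambda {(x + my)}^{d}\) with real slope m, which covers the representation above but not a summand proportional to \({y}^{d}\).

```python
from math import comb


def expand(terms, d):
    """Coefficients of x^(d-j) y^j in the sum of lam * (x + m*y)^d over terms."""
    return [sum(lam * comb(d, j) * m**j for lam, m in terms) for j in range(d + 1)]


def sign_changes(seq):
    """Number of sign changes in a sequence, ignoring zeros."""
    signs = [v > 0 for v in seq if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def sigma(terms, d):
    """Sign changes in (lam_1, ..., lam_r, (-1)^d lam_1), summands ordered by angle.

    x + m*y is a positive multiple of cos(theta) x + sin(theta) y with
    tan(theta) = m, so sorting by m sorts by theta in (-pi/2, pi/2) and
    rescaling does not change the sign of any lam.
    """
    lams = [lam for lam, m in sorted(terms, key=lambda term: term[1])]
    return sign_changes(lams + [(-1) ** d * lams[0]])


# 6*phi = 36x^5 - 10(x+y)^5 - 10(x-y)^5 + (x+2y)^5 + (x-2y)^5
terms = [(36, 0), (-10, 1), (-10, -1), (1, 2), (1, -2)]

six_phi = [6 * c for c in [3, 0, -20, 0, 10, 0]]  # plain coefficients of 6*phi
print(expand(terms, 5) == six_phi, sigma(terms, 5))
```

Here σ = 5 equals the number τ = 5 of real linear factors of ϕ, so the bound τ ≤ σ of Theorem 3.2 is attained by this representation.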

Proof of Theorem 3.2.

We first rewrite (3.1):

$$\displaystyle\begin{array}{rcl} 2f(x,y)& =& \sum _{j=1}^{r}\lambda _{ j}{(\cos \theta _{j}x +\sin \theta _{j}y)}^{d} + \\ & & \sum _{j=1}^{r}{(-1)}^{d}\lambda _{ j}{(\cos (\theta _{j}+\pi )x +\sin (\theta _{j}+\pi )y)}^{d}.{}\end{array}$$
(3.2)

View the sequence \((\lambda _{1},\lambda _{2},\ldots,\lambda _{r},{(-1)}^{d}\lambda _{1},{(-1)}^{d}\lambda _{2},\ldots,{(-1)}^{d}\lambda _{r},\lambda _{1})\) cyclically, identifying the first and last term. There are 2σ pairs of consecutive terms with a negative product. This count is independent of the starting point, so if we make any invertible change of variables \((x,y)\mapsto (\cos \theta x +\sin \theta y,-\sin \theta x +\cos \theta y)\) in (3.1) (which doesn’t affect τ, and which “dials” the angles by θ), and reorder the “main” angles to \((-\tfrac{\pi }{2}, \tfrac{\pi } {2}]\), the value of σ is unchanged. We may therefore assume that neither x nor y divides f, that \({x}^{d}\) and \({y}^{d}\) are not summands in (3.2) (i.e., \(\theta _{j}\) is not a multiple of \(\frac{\pi }{2}\)), and that if there is a sign change in \((\lambda _{1},\lambda _{2},\ldots,\lambda _{r})\), then \(\theta _{u} < 0 <\theta _{u+1}\) implies \(\lambda _{u}\lambda _{u+1} < 0\). Under these hypotheses, we may safely dehomogenize f by setting either x = 1 or y = 1, avoid zeros at infinity, and know that τ is the number of zeros of the resulting polynomial. The rest of the proof generally follows [42].

Let \(\bar{\sigma }\) denote the number of sign changes in \((\lambda _{1},\lambda _{2},\ldots,\lambda _{r})\). We induct on \(\bar{\sigma }\). The base case is \(\bar{\sigma }= 0\) (and \(\lambda _{j} > 0\) without loss of generality). If d is even, then σ = 0 and

$$\displaystyle{ f(x,y) =\sum _{ j=1}^{r}\lambda _{ j}{(\cos \theta _{j}x +\sin \theta _{j}y)}^{d} }$$

is definite, so τ = 0. If d is odd, then σ = 1. Let g(t) = f(t, 1), so that

$$\displaystyle{ g^{\prime}(t) =\sum _{ j=1}^{r}d\left (\lambda _{ j}\cos \theta _{j}\right ){(\cos \theta _{j}t +\sin \theta _{j})}^{d-1}. }$$

Since d − 1 is even, \(\cos \theta _{j} > 0\) and \(\lambda _{j} > 0\), \(g^{\prime}\) is definite and \(g^{\prime}\neq 0\). Rolle’s Theorem implies that g has at most one zero; that is, τ ≤ 1 = σ.

Suppose the theorem is valid for \(\bar{\sigma }= m \geq 0\) and suppose that \(\bar{\sigma }= m + 1\) in (3.1). Now let h(t) = f(1, t). We have

$$\displaystyle{ h^{\prime}(t) =\sum _{ j=1}^{r}d\left (\lambda _{ j}\sin \theta _{j}\right ){(\cos \theta _{j} +\sin \theta _{j}t)}^{d-1}. }$$

Note that \(h^{\prime}(t) = q(1,t)\), where

$$\displaystyle{ q(x,y) =\sum _{ j=1}^{r}d\left (\lambda _{ j}\sin \theta _{j}\right ){(\cos \theta _{j}x +\sin \theta _{j}y)}^{d-1}. }$$

Since \(\bar{\sigma }\geq 1\), \(\theta _{u} < 0 <\theta _{u+1}\) implies that \(\lambda _{u}\lambda _{u+1} < 0\), so that the number of sign changes in \((d\lambda _{1}\sin \theta _{1},d\lambda _{2}\sin \theta _{2},\ldots,d\lambda _{r}\sin \theta _{r})\) is m, as the sign change at the u-th consecutive pair has been removed, and no other possible sign changes are introduced. The induction hypothesis implies that q(x, y) has at most m linear factors, hence \(q(1,t) = h^{\prime}(t)\) has ≤ m zeros (counting multiplicity) and Rolle’s Theorem implies that h has ≤ m + 1 zeros, completing the induction.

4 Applications to Forms of General Degree

We begin with a folklore result: the vector space of complex forms f in n variables of degree d is spanned by the set of linear forms taken to the d-th power. It follows from a 1903 theorem of Biermann (see [47, Proposition 2.11] or [51] for a proof) that a canonical set of the “correct” number of d-th powers over \(\mathbb{Z}\) forms a basis:

$$\displaystyle{ \left \{{(i_{1}x_{1} +\ldots +i_{n}x_{n})}^{d}\ :\ 0 \leq i_{ k} \in \mathbb{Z},\ i_{1} + \cdots + i_{n} = d\right \}. }$$
(4.1)

If \(f \in K[x_{1},\ldots,x_{n}]\), then f is a K-linear combination of these forms and so \(L_{K}(f) \leq \left ({ n+d-1 \atop n-1} \right )\). We show below (Theorems 4.10 and 5.4) that when n = 2, the bound for \(L_{K}(f)\) can be improved from d + 1 to d, but this is best possible.

The first two results are presented explicitly for completeness.

Theorem 4.1.

If f ∈ K[x,y], then \(L_{K}(f) = 1\) if and only if \(L_{\mathbb{C}}(f) = 1\).

Proof.

One direction is immediate from (1.2). For the other, suppose \(f(x,y) = {(\alpha x +\beta y)}^{d}\) with \(\alpha,\beta \in \mathbb{C}\). If α = 0, then \(f(x,y) =\beta ^{d}{y}^{d}\), with \(\beta ^{d} \in K\). If α ≠ 0, then \(f(x,y) =\alpha ^{d}{(x + (\beta /\alpha )y)}^{d}\). Since the coefficients of \({x}^{d}\) and \(d{x}^{d-1}y\) in f are \(\alpha ^{d}\) and \(\alpha ^{d-1}\beta \), it follows that \(\alpha ^{d}\) and \(\beta /\alpha = (\alpha ^{d-1}\beta )/\alpha ^{d}\) are both in K, so this is a representation of length 1 over K.

Theorem 4.2.

Any set \(\{{(\alpha _{j}x +\beta _{j}y)}^{d}: 0 \leq j \leq d\}\) of pairwise distinct d-th powers is linearly independent and spans the binary forms of degree d.

Proof.

The matrix of this set with respect to the basis \(\left ({ d \atop i} \right ){x}^{d-i}{y}^{i}\) is \([\alpha _{j}^{d-i}\beta _{j}^{i}]\), whose determinant is Vandermonde:

$$\displaystyle{ \prod _{0\leq j<k\leq d}\left \vert \begin{array}{cc} \alpha _{j} & \beta _{j}\\ \alpha _{k } &\beta _{k} \end{array} \right \vert. }$$

This determinant is a product of non-zero terms by hypothesis.

By considering the difference of two representations of a given form, we obtain an immediate corollary about different representations of the same form. Trivial counterexamples, formed by splitting summands, occur in non-honest representations.

Corollary 4.3.

If f has two different honest representations:

$$\displaystyle{ f(x,y) =\sum _{ i=1}^{s}\lambda _{ i}{(\alpha _{i}x +\beta _{i}y)}^{d} =\sum _{ j=1}^{t}\mu _{ j}{(\gamma _{j}x +\delta _{j}y)}^{d}, }$$
(4.2)

then \(s + t \geq d + 2\). If \(s + t = d + 2\) in (4.2), then the combined set of linear forms, \(\{\alpha _{i}x +\beta _{i}y,\gamma _{j}x +\delta _{j}y\}\), is pairwise distinct.

The next result collects some consequences of Corollary 4.3.

Corollary 4.4.

Let \(E = E_{f}\).

(1) If \(L_{E}(f) = r \leq \frac{d} {2} + 1\), then \(L_{\mathbb{C}}(f) = r\), so \(\mathcal{C}(f) =\{ r\}\).

(2) If, further, \(L_{E}(f) = r \leq \frac{d} {2} + \frac{1} {2}\), then f has a unique \(\mathbb{C}\)-minimal representation.

(3) If \(d = 2s - 1\) and \(H_{s}(f)\) has full rank, f has a unique Sylvester form h of degree s and \(E_{f} \subseteq K\), then \(L_{K}(f) \geq s\), with equality if and only if h splits in K.

Proof.

We take the parts in turn.

(1) A different representation of f over \(\mathbb{C}\) must have length \(\geq d + 2 - r \geq \frac{d} {2} + 1 \geq r\) by Corollary 4.3, and so \(L_{\mathbb{C}}(f) = r\).

(2) If \(r \leq \frac{d} {2} + \frac{1} {2}\), then any other representation has length \(\geq \frac{d} {2} + \frac{3} {2} > r\), and so cannot be minimal.

(3) If \(d = 2s - 1\) and r = s, then the last case applies, so f has a unique \(\mathbb{C}\)-minimal representation, and by Corollary 2.2, this representation can be expressed in K if and only if the Sylvester form splits over K.

We now give some more explicit constructions of forms with multiple lengths. We first need a lemma about cubics.

Lemma 4.5.

If f is a cubic given by (2.1) and \(H_{2}(f) = \left (\begin{array}{ccc} a_{0} & a_{1} & a_{2} \\ a_{1} & a_{2} & a_{3}\\ \end{array} \right )\) has rank ≤ 1, then f is a cube.

Proof.

If \(a_{0} = 0\), then \(a_{1} = 0\), so \(a_{2} = 0\) and f is a cube. If \(a_{0}\neq 0\), then \(a_{2} = a_{1}^{2}/a_{0}\) and \(a_{3} = a_{1}a_{2}/a_{0} = a_{1}^{3}/a_{0}^{2}\) and \(f(x,y) = a_{0}{(x + \frac{a_{1}} {a_{0}} y)}^{3}\) is again a cube.

Theorem 4.6.

Suppose d ≥ 3 and there exist \(\alpha _{i},\beta _{i} \in \mathbb{C}\) so that

$$\displaystyle{ f(x,y) =\sum _{ i=0}^{d}\left ({ d \atop i} \right )a_{i}{x}^{d-i}{y}^{i} = {(\alpha _{ 1}x +\beta _{1}y)}^{d} + {(\alpha _{ 2}x +\beta _{2}y)}^{d} \in K[x,y]. }$$
(4.3)

If (4.3) is honest and \(L_{K}(f) > 2\), then there exists u ∈ K with \(\sqrt{u}\notin K\) so that \(L_{K(\sqrt{u})}(f) = 2\). The summands in (4.3) are conjugates of each other in \(K(\sqrt{u})\).

Proof.

First observe that if \(\alpha _{2} = 0\), then \(\alpha _{2}\beta _{1}\neq \alpha _{1}\beta _{2}\) implies that \(\alpha _{1}\neq 0\). But then \(a_{0} =\alpha _{ 1}^{d}\neq 0\) and \(a_{1} =\alpha _{ 1}^{d-1}\beta _{1}\) imply that \(\alpha _{1}^{d},\beta _{1}/\alpha _{1} \in K\) as in Theorem 4.1, and so

$$\displaystyle{ f(x,y) -\alpha _{1}^{d}{(x + (\beta _{ 1}/\alpha _{1})y)}^{d} = {(\beta _{ 2}y)}^{d} =\beta _{ 2}^{d}{y}^{d} \in K[x,y]. }$$

This contradicts \(L_{K}(f) > 2\), so \(\alpha _{2}\neq 0\); similarly, \(\alpha _{1}\neq 0\). Let \(\lambda _{i} =\alpha _{ i}^{d}\) and \(\gamma _{i} =\beta _{i}/\alpha _{i}\) for i = 1, 2, so \(\lambda _{1}\lambda _{2}\neq 0\) and \(\gamma _{1}\neq \gamma _{2}\). We have

$$\displaystyle{ f(x,y) =\lambda _{1}{(x +\gamma _{1}y)}^{d} +\lambda _{ 2}{(x +\gamma _{2}y)}^{d}\Rightarrow a_{ i} =\lambda _{1}\gamma _{1}^{i} +\lambda _{ 2}\gamma _{2}^{i}. }$$

Now let

$$\displaystyle{ g(x,y) =\lambda _{1}{(x +\gamma _{1}y)}^{3} +\lambda _{ 2}{(x +\gamma _{2}y)}^{3} = a_{ 0}{x}^{3} + 3a_{ 1}{x}^{2}y + 3a_{ 2}x{y}^{2} + a_{ 3}{y}^{3}. }$$

Since λ i ≠ 0 and (4.3) is honest, Corollary 4.3 implies that \(L_{\mathbb{C}}(g) = 2\), so H 2 (g) has full rank by Lemma 4.5. It can be checked directly that

$$\displaystyle{ \left (\begin{array}{ccc} a_{0} & a_{1} & a_{2} \\ a_{1} & a_{2} & a_{3}\\ \end{array} \right )\cdot \left (\begin{array}{c} \gamma _{1}\gamma _{2} \\ - (\gamma _{1} +\gamma _{2}) \\ 1\\ \end{array} \right ) = \left (\begin{array}{c} 0\\ 0 \end{array} \right ), }$$

and this gives \(h(x,y) = (y -\gamma _{1}x)(y -\gamma _{2}x)\) as the unique Sylvester form for g. Since \(H_{2}(g)\) has entries in K and hence has a null vector in K, we must have \(h \in K[x,y]\). By hypothesis, h does not split over K; it must do so over \(K(\sqrt{u})\), where \(u = {(\gamma _{1} -\gamma _{2})}^{2} = {(\gamma _{1} +\gamma _{2})}^{2} - 4\gamma _{1}\gamma _{2} \in K\). Moreover, if σ denotes conjugation with respect to \(\sqrt{u}\), then \(\gamma _{2} =\sigma (\gamma _{1})\) and since \(\lambda _{1} +\lambda _{2} \in K\), \(\lambda _{2} =\sigma (\lambda _{1})\) as well. Note that \(\lambda _{i} =\alpha _{ i}^{d}\) and \(\gamma _{i} =\beta _{i}/\alpha _{i} \in K(\sqrt{u})\), but this is not necessarily true for \(\alpha _{i}\) and \(\beta _{i}\) themselves.
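The computation in this proof can be illustrated by a minimal Python sketch using exact rational arithmetic. The function name `splitting_data` and the sample cubic are illustrative assumptions, not part of the text; the code takes the null vector of \(H_{2}(g)\) as the cross product of its two rows and normalizes its last entry to 1.

```python
from fractions import Fraction

def splitting_data(a0, a1, a2, a3):
    """Return (gamma1*gamma2, gamma1+gamma2, u) for the cubic
    a0 x^3 + 3 a1 x^2 y + 3 a2 x y^2 + a3 y^3 (hypothetical helper)."""
    a0, a1, a2, a3 = (Fraction(v) for v in (a0, a1, a2, a3))
    # A null vector (c0, c1, c2) of H_2 is the cross product of its rows.
    c0 = a1 * a3 - a2 * a2
    c1 = a1 * a2 - a0 * a3
    c2 = a0 * a2 - a1 * a1
    if c2 == 0:
        # In the proof both alpha_i are non-zero, which forces c2 != 0.
        raise ValueError("cannot normalize the null vector to c2 = 1")
    prod = c0 / c2                  # gamma1 * gamma2
    total = -c1 / c2                # gamma1 + gamma2
    u = total * total - 4 * prod    # (gamma1 - gamma2)^2
    return prod, total, u

# (x + sqrt(2) y)^3 + (x - sqrt(2) y)^3 = 2x^3 + 12xy^2, so (a0, a1, a2, a3) = (2, 0, 4, 0).
print(splitting_data(2, 0, 4, 0))
```

For this sample, u = 8, so the two summands become available over \(\mathbb{Q}(\sqrt{8}) = \mathbb{Q}(\sqrt{2})\), as expected.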

Corollary 4.7.

Suppose g ∈ E[x,y] does not split over E, but factors into distinct linear factors \(g(x,y) =\prod _{ j=1}^{r}(x +\alpha _{j}y)\) over an extension field K of E. If d > 2r − 1, then for each ℓ ≥ 0,

$$\displaystyle{f_{\ell}(x,y) =\sum _{ j=1}^{r}\alpha _{ j}^{\ell}{(x +\alpha _{ j}y)}^{d} \in E[x,y],}$$

and \(L_{K}(f_{\ell}) = r < d + 2 - r \leq L_{E}(f_{\ell})\) .

Proof.

The coefficient of \(\left ({ d \atop k} \right ){x}^{d-k}{y}^{k}\) in \(f_{\ell}\) is \(\sum _{j=1}^{r}\alpha _{j}^{\ell+k}\). Each such power-sum belongs to E by Newton’s Theorem on Symmetric Forms. If \(\alpha _{s}\notin E\) (which must hold for at least one \(\alpha _{s}\neq 0\)), then \(\alpha _{s}^{\ell}{(x +\alpha _{s}y)}^{d}\notin E[x,y]\). Apply Corollary 4.3.
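The power-sum step can be made concrete with Newton's identities. The following is a sketch only: the helper names `power_sums` and `f_ell_coefficients` are hypothetical, and g is assumed to be given by its elementary symmetric coefficients, \(g = {x}^{r} + e_{1}{x}^{r-1}y + \cdots + e_{r}{y}^{r}\) with \(e_{i} \in \mathbb{Q}\).

```python
from fractions import Fraction

def power_sums(e, count):
    """Power sums p_0, ..., p_{count-1} of the alpha_j, computed from
    e = [e_1, ..., e_r] by Newton's identities (no roots are extracted)."""
    r = len(e)
    e = [Fraction(v) for v in e]
    p = [Fraction(r)]
    for m in range(1, count):
        s = Fraction(0)
        for i in range(1, min(m - 1, r) + 1):
            s += (-1) ** (i - 1) * e[i - 1] * p[m - i]
        if m <= r:
            s += (-1) ** (m - 1) * m * e[m - 1]
        p.append(s)
    return p

def f_ell_coefficients(e, d, ell):
    """Coefficients a_0, ..., a_d of f_ell in the normalization
    f = sum_k binom(d, k) a_k x^(d-k) y^k, using a_k = p_{ell+k}."""
    return power_sums(e, ell + d + 1)[ell:]

# g = x^2 - 2y^2 = (x + sqrt(2) y)(x - sqrt(2) y): e_1 = 0, e_2 = -2, r = 2, d = 5 > 2r - 1.
print(f_ell_coefficients([0, -2], 5, 0))
```

All coefficients come out rational even though the individual summands involve \(\sqrt{2}\).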

Corollary 4.8.

Suppose K is an extension field of E f, \(r \leq \frac{d+1} {2}\) , and

$$\displaystyle{ f(x,y) =\sum _{ i=1}^{r}\lambda _{ i}{(\alpha _{i}x +\beta _{i}y)}^{d} }$$

with \(\lambda _{i},\alpha _{i},\beta _{i} \in K\) . Then every automorphism of K which fixes E f permutes the summands of the representation of f.

Proof.

We interpret \(\sigma (\lambda {(\alpha x +\beta y)}^{d}) =\sigma (\lambda ){(\sigma (\alpha )x +\sigma (\beta )y)}^{d}\). Since σ(f) = f, the action of σ is to give another representation of f. Corollary 4.4(2) implies that this is the same representation, perhaps reordered.

Theorem 4.9.

If f ∈ K[x,y], then \(L_{\mathbb{C}}(f) \leq \deg f\).

Proof.

By a change of variables, which does not affect the length, we may assume that neither x nor y divides f, hence \(a_{0}a_{d}\neq 0\) and \(h = a_{d}{x}^{d} - a_{0}{y}^{d}\) is a Sylvester form which splits over \(\mathbb{C}\).

We have been unable to find an “original” citation for Theorem 4.9. It appears as an exercise in Harris [24, Exercise 11.35], with the (dehomogenized) maximal length occurring at \({x}^{d-1}(x + 1)\) (see Theorem 5.4). Landsberg and Teitler [34, Corollary 5.2] prove that \(L_{\mathbb{C}}(f) \leq \left ({ n+d-1 \atop n-1} \right ) - (n - 1)\), which reduces to Theorem 4.9 for n = 2. The proof of Theorem 4.9 will not apply to L K (f) for \(K\neq \mathbb{C}\), because \(a_{d}{x}^{d} - a_{0}{y}^{d}\) usually does not split over K. A more careful argument is required, constructing an explicit Sylvester form of degree d for f which splits over K.

Theorem 4.10.

If f ∈ K[x,y], then L K (f) ≤ deg f.

Proof.

Write f as in (2.1). If f is identically zero, there is nothing to prove. Otherwise, we may assume that f(1, 0) = a 0 ≠ 0 after a change of variables if necessary. By Corollary 2.2, it suffices to find \(h(x,y) =\sum _{ k=0}^{d}c_{k}{x}^{d-k}{y}^{k}\) which splits into distinct linear factors over K and satisfies \(\sum _{k=0}^{d}a_{k}c_{k} = 0\).

Let e 0 = 1 and \(e_{k}(t_{1},\ldots,t_{d-1})\) denote the usual k-th elementary symmetric functions. We make a number of definitions:

$$\displaystyle\begin{array}{rcl} h_{0}(t_{1},\ldots,t_{d-1};x,y)&:=& \sum _{k=0}^{d-1}e_{ k}(t_{1},\ldots,t_{d-1}){x}^{d-1-k}{y}^{k} =\prod _{ j=1}^{d-1}(x + t_{ j}y), {}\\ \beta (t_{1},\ldots,t_{d-1})&:=& -\sum _{k=0}^{d-1}a_{ k}e_{k}(t_{1},\ldots,t_{d-1}), {}\\ \alpha (t_{1},\ldots,t_{d-1})&:=& \sum _{k=0}^{d-1}a_{ k+1}e_{k}(t_{1},\ldots,t_{d-1}), {}\\ \Phi (t_{1},\ldots,t_{d-1})&:=& \prod _{j=1}^{d-1}(\alpha (t_{ 1},\ldots,t_{d-1})t_{j} -\beta (t_{1},\ldots,t_{d-1})), {}\\ \Psi (t_{1},\ldots,t_{d-1})&:=& \Phi (t_{1},\ldots,t_{d-1}) \times \prod _{1\leq i<j\leq d-1}(t_{i} - t_{j}). {}\\ \end{array}$$

Then \(\beta (0,\ldots,0) = -a_{0}e_{0} = -a_{0}\neq 0\), so \(\Phi (0,\ldots,0) = a_{0}^{d-1}\neq 0\) and \(\Phi \) is not the zero polynomial, and thus neither is \(\Psi \). Choose \(\gamma _{j} \in K\), \(1 \leq j \leq d - 1\), so that \(\Psi (\gamma _{1},\ldots,\gamma _{d-1})\neq 0\). It follows that the \(\gamma _{j}\)’s are distinct, and \(\alpha \gamma _{j}\neq \beta\), where \(\alpha =\alpha (\gamma _{1},\ldots,\gamma _{d-1})\) and \(\beta =\beta (\gamma _{1},\ldots,\gamma _{d-1})\). Let \(\tilde{e}_{k} = e_{k}(\gamma _{1},\ldots,\gamma _{d-1})\). We claim that

$$\displaystyle\begin{array}{rcl} h(x,y)& =& \sum _{i=0}^{d}c_{ i}{x}^{d-i}{y}^{i}:= (\alpha x+\beta y)h_{ 0}(\gamma _{1},\ldots,\gamma _{d-1};x,y) = (\alpha x+\beta y)\prod _{j=1}^{d-1}(x+\gamma _{ j}y) {}\\ & =& (\alpha x+\beta y)\sum _{k=0}^{d-1}\tilde{e}_{ k}{x}^{d-1-k}{y}^{k} =\alpha \tilde{ e}_{ 0}{x}^{d}+\sum _{ k=1}^{d-1}(\alpha \tilde{e}_{ k} +\beta \tilde{ e}_{k-1}){x}^{d-k}{y}^{k}+\beta \tilde{e}_{ d-1}{y}^{d}{}\\ \end{array}$$

is a Sylvester form for f. Note that the \(\gamma _{j}\)’s are distinct and \(\alpha \gamma _{j}\neq \beta\), \(1 \leq j \leq d - 1\), so that h is a product of distinct linear factors. Finally,

$$\displaystyle\begin{array}{rcl} \sum _{k=0}^{d}a_{ k}c_{k}& =& \alpha \tilde{e}_{0}a_{0} +\sum _{ k=1}^{d-1}(\alpha \tilde{e}_{ k} +\beta \tilde{ e}_{k-1})a_{k} +\beta \tilde{ e}_{d-1}a_{d} = {}\\ & & \alpha \sum _{k=0}^{d-1}\tilde{e}_{ k}a_{k} +\beta \sum _{ k=0}^{d-1}\tilde{e}_{ k}a_{k+1} =\alpha (-\beta )+\beta \alpha = 0. {}\\ \end{array}$$

This completes the proof.
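The construction in this proof is effective, and a minimal sketch over \(K = \mathbb{Q}\) may help. The function names below are hypothetical, the \(\gamma _{j}\) are simply searched among small non-negative integers (an assumption made for illustration; any choice with \(\Psi \neq 0\) works), and the input list holds \(a_{0},\ldots,a_{d}\) in the normalization of (2.1).

```python
from fractions import Fraction
from itertools import combinations

def elementary(ts):
    """Return [e_0, ..., e_n] for the list ts, expanding prod (1 + t z)."""
    e = [Fraction(1)]
    for t in ts:
        e = [(e[k] if k < len(e) else 0) + (t * e[k - 1] if k > 0 else 0)
             for k in range(len(e) + 1)]
    return e

def sylvester_form(a, gammas):
    """Coefficients c_0..c_d of h = (alpha x + beta y) prod (x + gamma_j y),
    or None when the chosen gammas make Psi vanish."""
    d = len(a) - 1
    e = elementary(gammas)                       # e_0, ..., e_{d-1}
    beta = -sum(a[k] * e[k] for k in range(d))
    alpha = sum(a[k + 1] * e[k] for k in range(d))
    if len(set(gammas)) < d - 1 or any(alpha * g == beta for g in gammas):
        return None
    c = [alpha * e[0]]
    c += [alpha * e[k] + beta * e[k - 1] for k in range(1, d)]
    c.append(beta * e[d - 1])
    return c

def find_sylvester_form(a, bound=10):
    """Try distinct integer gammas in [0, bound) until Psi is non-zero."""
    d = len(a) - 1
    for gammas in combinations(range(bound), d - 1):
        c = sylvester_form(a, [Fraction(g) for g in gammas])
        if c is not None:
            return gammas, c
    raise ValueError("no suitable gammas found; increase bound")

# f = x^4 + 3x^2y^2 + y^4, i.e. (a_0, ..., a_4) = (1, 0, 1/2, 0, 1).
a = [Fraction(1), Fraction(0), Fraction(1, 2), Fraction(0), Fraction(1)]
gammas, c = find_sylvester_form(a)
print(gammas, c, sum(ak * ck for ak, ck in zip(a, c)))
```

The last printed value is \(\sum a_{k}c_{k}\), which vanishes as in the final display of the proof.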

Corollary 4.11.

If f is a product of d real linear forms and not a d-th power, then \(L_{\mathbb{R}}(f) = d\) .

Proof.

Write f as a sum of \(r = L_{\mathbb{R}}(f) \leq d\) d-th powers and rescale into the shape (3.1). Taking τ = d in Theorem 3.2, we see that \(d \leq \sigma \leq r\), and hence r = d.

Conjecture 4.12.

If \(f \in \mathbb{R}[x,y]\) is a form of degree d ≥ 3, then \(L_{\mathbb{R}}(f) = d\) if and only if f is a product of d linear forms.

We shall see in Theorems 5.2 and 5.3 that this conjecture is true for d = 3, 4.

After a preprint of this paper was distributed, Giorgio Ottaviani pointed out that in the case that the roots of f are distinct, Conjecture 4.12 has been proved very recently by Comon and Ottaviani [15] and by Causa and Re [9].

5 Applications to Forms of Particular Degree

Corollary 4.3 and Theorem 4.10 impose some immediate restrictions on the possible cabinets of a form of degree d.

Corollary 5.1.

Suppose deg f = d.

  1. (1)

    If \(L_{\mathbb{C}}(f) = r\) , then \(\mathcal{C}(f) \subseteq \{ r\} \cup \{ d - i: 0 \leq i \leq r - 2\}\) .

  2. (2)

    If \(L_{\mathbb{C}}(f) = 2\) , then \(\mathcal{C}(f)\) is either {2} or {2,d}.

  3. (3)

    If f has k different lengths, then d ≥ 2k − 1.

  4. (4)

    If f is cubic, then \(\mathcal{C}(f) =\{ 1\},\{2\},\{3\}\) or {2,3}.

  5. (5)

    If f is quartic, then \(\mathcal{C}(f) =\{ 1\},\{2\},\{3\},\{4\},\{2,4\}\) or {3,4}.
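Part (1) of the corollary is easy to tabulate. The one-line helper below (the name `possible_lengths` is hypothetical) lists the lengths that part (1) permits for given d and \(r = L_{\mathbb{C}}(f)\); it gives an upper bound on the cabinet, not the cabinet itself.

```python
def possible_lengths(d, r):
    """Lengths permitted by Corollary 5.1(1) when deg f = d and L_C(f) = r."""
    return sorted({r} | {d - i for i in range(0, r - 1)})

for d, r in [(3, 2), (4, 2), (4, 3), (7, 4)]:
    print(d, r, possible_lengths(d, r))
```

In particular, four distinct lengths first become possible at d = 7 with r = 4, in line with part (3).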

We now completely classify L K (f) when f is a binary cubic.

Theorem 5.2.

Suppose f(x,y) ∈ E f [x,y] is a cubic form with discriminant \(\Delta \) and suppose \(E_{f} \subseteq K \subseteq \mathbb{C}\) .

  1. (1)

    If f is a cube, then L K (f) = 1 and \(\mathcal{C}(f) =\{ 1\}\) .

  2. (2)

    If f has a repeated linear factor, but is not a cube, then L K (f) = 3 and \(\mathcal{C}(f) =\{ 3\}\) .

  3. (3)

    If f does not have a repeated factor, then \(L_{K}(f) = 2\) if \(\sqrt{-3\Delta } \in K\) and \(L_{K}(f) = 3\) otherwise, so either \(\mathcal{C}(f) =\{ 2\}\) or \(\mathcal{C}(f) =\{ 2,3\}\) .

Proof.

The first case follows from Theorem 4.1. In the second case, after an invertible linear change of variables, we may assume that \(f(x,y) = 3{x}^{2}y\), and apply Theorem 2.1 to test for representations of length 2. But

$$\displaystyle{ \left (\begin{array}{ccc} 0&1&0\\ 1 &0 &0\\ \end{array} \right )\cdot \left (\begin{array}{c} c_{0} \\ c_{1} \\ c_{2} \end{array} \right ) = \left (\begin{array}{c} 0\\ 0 \end{array} \right )\Rightarrow c_{0} = c_{1} = 0, }$$
(5.1)

so h has repeated factors. Hence \(L_{K}({x}^{2}y) \geq 3\) and by Theorem 4.10, \(L_{K}({x}^{2}y)\,=\,3\).

Finally, suppose

$$\displaystyle{ f(x,y) = a_{0}{x}^{3} + 3a_{ 1}{x}^{2}y + 3a_{ 2}x{y}^{2} + a_{ 3}{y}^{3} =\prod _{ j=1}^{3}(r_{ j}x + s_{j}y) }$$

does not have repeated factors, so that

$$\displaystyle{ 0\neq \Delta (f) =\prod _{j<k}{(r_{j}s_{k} - r_{k}s_{j})}^{2}, }$$

and consider the system:

$$\displaystyle{ \left (\begin{array}{ccc} a_{0} & a_{1} & a_{2} \\ a_{1} & a_{2} & a_{3}\\ \end{array} \right )\cdot \left (\begin{array}{c} c_{0} \\ c_{1} \\ c_{2} \end{array} \right ) = \left (\begin{array}{c} 0\\ 0 \end{array} \right ). }$$

By Lemma 4.5, this system has rank 2; the unique Sylvester form is

$$\displaystyle{ h(x,y) = (a_{1}a_{3} - a_{2}^{2}){x}^{2} + (a_{ 1}a_{2} - a_{0}a_{3})xy + (a_{0}a_{2} - a_{1}^{2}){y}^{2}, }$$

which happens to be the Hessian of f. Since \(h \in E_{f}[x,y] \subseteq K[x,y]\), it splits over K if and only if its discriminant is a square in K. A computation shows that

$$\displaystyle{ {(a_{1}a_{2} - a_{0}a_{3})}^{2} - 4(a_{ 1}a_{3} - a_{2}^{2})(a_{ 0}a_{2} - a_{1}^{2}) = -\frac{\Delta (f)} {27} = -\frac{3\Delta (f)} {{9}^{2}}. }$$

Thus, \(L_{K}(f) = 2\) if and only if \(\sqrt{-3\Delta (f)} \in K\). If h does not split over K, then \(L_{K}(f) = 3\) by Theorem 4.10.

In particular, \({x}^{3}\), \({x}^{3} + {y}^{3}\), \({x}^{2}y\) and \({(x + iy)}^{3} + {(x - iy)}^{3}\) have the cabinets enumerated in Corollary 5.1(4). If f has three distinct real linear factors, then \(\Delta (f) > 0\), so \(\sqrt{-3\Delta (f)}\notin \mathbb{R}\) and \(L_{\mathbb{R}}(f) = 3\). If f is real and has one real and two conjugate complex linear factors, then \(\Delta (f) < 0\), so \(L_{\mathbb{R}}(f) = 2\). Counting repeated roots, we see that if f is a real cubic, and not a cube, then \(L_{\mathbb{R}}(f) = 3\) if and only if it has three real factors, thus proving Conjecture 4.12 when d = 3.
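The classification of cubics can be tested mechanically over \(K = \mathbb{Q}\). The sketch below is an illustration under stated assumptions: the function names are hypothetical, f is assumed non-zero with rational coefficients in the normalization of (2.1), and \(\Delta (f)\) is computed from the standard formula for the discriminant of a binary cubic.

```python
from fractions import Fraction
from math import isqrt

def is_rational_square(q):
    """True when the rational number q is the square of a rational."""
    q = Fraction(q)
    if q < 0:
        return False
    n, m = q.numerator, q.denominator
    return isqrt(n) ** 2 == n and isqrt(m) ** 2 == m

def cubic_data(a0, a1, a2, a3):
    """Hessian coefficients, the Hessian's discriminant, and Delta(f) for
    f = a0 x^3 + 3 a1 x^2 y + 3 a2 x y^2 + a3 y^3."""
    a0, a1, a2, a3 = (Fraction(v) for v in (a0, a1, a2, a3))
    h = (a1 * a3 - a2 ** 2, a1 * a2 - a0 * a3, a0 * a2 - a1 ** 2)
    disc_h = h[1] ** 2 - 4 * h[0] * h[2]
    A, B, C, D = a0, 3 * a1, 3 * a2, a3
    delta = (B * B * C * C - 4 * A * C ** 3 - 4 * B ** 3 * D
             - 27 * A * A * D * D + 18 * A * B * C * D)
    return h, disc_h, delta

def rational_length(a0, a1, a2, a3):
    """L_Q(f) for a non-zero rational cubic, following the three cases above."""
    h, disc_h, delta = cubic_data(a0, a1, a2, a3)
    if h == (0, 0, 0):
        return 1        # H_2(f) has rank <= 1, so f is a cube (Lemma 4.5)
    if delta == 0:
        return 3        # repeated factor, not a cube
    return 2 if is_rational_square(-3 * delta) else 3

# x^3, x^3 + y^3, 3x^2y and (x + iy)^3 + (x - iy)^3 = 2x^3 - 6xy^2
for coeffs in [(1, 0, 0, 0), (1, 0, 0, 1), (0, 1, 0, 0), (2, 0, -2, 0)]:
    print(coeffs, rational_length(*coeffs))
```

The four sample cubics have \(\mathbb{Q}\)-lengths 1, 2, 3 and 3 respectively; the last one has length 2 over \(\mathbb{Q}(i)\), since \(-3\Delta = -5184 = {(72i)}^{2}\).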

Example 5.1.

We find all representations of \(3{x}^{2}y\) of length 3. Note that

$$\displaystyle{ H_{3}(f) \cdot {(c_{0},c_{1},c_{2},c_{3})}^{t} = (0)\;\Longleftrightarrow\;c_{ 1} = 0\;\Longleftrightarrow\;h(x,y) = c_{0}{x}^{3} + c_{ 2}x{y}^{2} + c_{ 3}{y}^{3}. }$$

If \(c_{0} = 0\), then \({y}^{2}\mid h\), which is to be avoided, so we scale and assume \(c_{0} = 1\). We can parameterize the Sylvester forms as \(h(x,y) = (x - ay)(x - by)(x + (a + b)y)\) with \(a,b,-(a + b)\) distinct. This leads to an easily checked general formula

$$\displaystyle\begin{array}{rcl} 3(a - b)(a + 2b)(2a + b){x}^{2}y& =& \\ (a + 2b){(ax + y)}^{3}& -& (2a + b){(bx + y)}^{3} + (a - b){(-(a + b)x + y)}^{3}.{}\end{array}$$
(5.2)
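Formula (5.2) can be checked by expanding both sides exactly. In the sketch below, forms of degree 3 are stored as lists of the plain coefficients of \({x}^{3-k}{y}^{k}\); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def power_of_linear(p, q, d):
    """Coefficients of x^(d-k) y^k, k = 0..d, in (p x + q y)^d."""
    return [comb(d, k) * p ** (d - k) * q ** k for k in range(d + 1)]

def rhs_5_2(a, b):
    """Right-hand side of (5.2) as a coefficient list."""
    terms = [(a + 2 * b, a), (-(2 * a + b), b), (a - b, -(a + b))]
    total = [Fraction(0)] * 4
    for lam, p in terms:
        for k, coeff in enumerate(power_of_linear(p, 1, 3)):
            total[k] += lam * coeff
    return total

def lhs_5_2(a, b):
    """Left-hand side of (5.2): a multiple of x^2 y."""
    return [0, 3 * (a - b) * (a + 2 * b) * (2 * a + b), 0, 0]

print(rhs_5_2(1, 2), lhs_5_2(1, 2))
```

With a = 1 and b = 2, both sides equal \(-60{x}^{2}y\), so dividing by \(-20\) gives an explicit representation of \(3{x}^{2}y\) over \(\mathbb{Q}\) of length 3.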

Białynicki-Birula and Schinzel [2, Lemma 7.1] give the general formula for \(d{x}^{d-1}y\) as a sum of d d-th powers of linear forms.

Theorem 5.3.

If f is a real quartic form, then \(L_{\mathbb{R}}(f) = 4\) if and only if f is a product of four linear factors.

Proof.

Factor ± f as a product of k positive definite quadratic forms and 4 − 2k linear forms. If k = 0, then Corollary 4.11 implies that \(L_{\mathbb{R}}(f) = 4\). We must show that if k = 1 or k = 2, then f has a representation over \(\mathbb{R}\) as a sum of ≤ 3 fourth powers.

If k = 2, then f is positive definite and by [43, Theorem 6], after an invertible linear change of variables, \(f(x,y) = {x}^{4} + 6\lambda {x}^{2}{y}^{2} + {y}^{4}\), with 6λ ∈ ( − 2, 2]. (This is also proved in [51].) If r ≠ 1, then

$$\displaystyle\begin{array}{rcl} & & {(rx + y)}^{4} + {(x + ry)}^{4} - ({r}^{3} + r){(x + y)}^{4} \\ & & = {(r - 1)}^{2}({r}^{2} + r + 1)\left ({x}^{4} -\left ( \tfrac{6r} {{r}^{2}+r+1}\right ){x}^{2}{y}^{2} + {y}^{4}\right ).{}\end{array}$$
(5.3)

Let \(\phi (r) = - \frac{6r} {{r}^{2}+r+1}\). Then \(\phi (-2 + \sqrt{3}) = 2\) and \(\phi (1) = -2\), and since ϕ is continuous, it maps \([-2 + \sqrt{3},1)\) onto ( − 2, 2], and (5.3) shows that \(L_{\mathbb{R}}(f) \leq 3\).

If k = 1, there are two cases, depending on whether the linear factors are distinct. Suppose that after a linear change, \(f(x,y) = {x}^{2}h(x,y)\), where h is positive definite, and so for some λ > 0 and linear form ℓ, \(h(x,y) =\lambda {x}^{2} {+\ell }^{2}\). After another linear change,

$$\displaystyle{ f(x,y) = {x}^{2}(2{x}^{2} + 12{y}^{2}) = {(x + y)}^{4} + {(x - y)}^{4} - 2{y}^{4}, }$$
(5.4)

and (5.4) shows that \(L_{\mathbb{R}}(f) \leq 3\).

If the linear factors are distinct, then after a linear change,

$$\displaystyle{ f(x,y) = xy(a{x}^{2} + 2bxy + c{y}^{2}), }$$

where a > 0, c > 0, \({b}^{2} < ac\). After a scaling, \(f(x,y) = xy({x}^{2} + dxy + {y}^{2})\), | d | < 2, and by taking ± f(x, ± y), we may assume d ∈ [0, 2). If r ≠ 1, then

$$\displaystyle\begin{array}{rcl} & & ({r}^{4} + 1){(x + y)}^{4} - {(rx + y)}^{4} - {(x + ry)}^{4} \\ & =& 4{(r - 1)}^{2}({r}^{2} + r + 1)\left ({x}^{3}y + \left ( \tfrac{3{(1+r)}^{2}} {2({r}^{2}+r+1)}\right ){x}^{2}{y}^{2} + x{y}^{3}\right ).{}\end{array}$$
(5.5)

Let \(\psi (r) = \frac{3{(1+r)}^{2}} {2({r}^{2}+r+1)}\). Since \(\psi (-1) = 0\), ψ(1) = 2 and ψ is continuous, it maps [ − 1, 1) onto [0, 2), and (5.5) shows that \(L_{\mathbb{R}}(f) \leq 3\).
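Identities (5.3) and (5.5) are polynomial identities in r, so they can be confirmed by exact expansion at rational values of r. The following sketch uses hypothetical helper names and clears the denominator \({r}^{2} + r + 1\) on the right-hand sides to avoid division.

```python
from fractions import Fraction
from math import comb

def combine(terms):
    """Coefficients of x^(4-k) y^k in sum of lam * (p x + q y)^4 over terms (lam, p, q)."""
    total = [Fraction(0)] * 5
    for lam, p, q in terms:
        for k in range(5):
            total[k] += lam * comb(4, k) * p ** (4 - k) * q ** k
    return total

def check_5_3(r):
    lhs = combine([(1, r, 1), (1, 1, r), (-(r ** 3 + r), 1, 1)])
    s = (r - 1) ** 2 * (r * r + r + 1)
    rhs = [s, 0, -6 * r * (r - 1) ** 2, 0, s]
    return lhs == rhs

def check_5_5(r):
    lhs = combine([(r ** 4 + 1, 1, 1), (-1, r, 1), (-1, 1, r)])
    s = (r - 1) ** 2 * (r * r + r + 1)
    rhs = [0, 4 * s, 6 * (r - 1) ** 2 * (1 + r) ** 2, 4 * s, 0]
    return lhs == rhs

samples = [Fraction(n, 7) for n in range(-20, 21)]
print(all(check_5_3(r) for r in samples), all(check_5_5(r) for r in samples))
```

Since both sides have degree at most 4 in r, agreement at more than five sample points already forces the identities to hold for all r.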

The next result may be very old; \(L_{\mathbb{C}}({x}^{d-1}y) = d\) seems well known, but the only reference we have seen for the converse is the very recent [2, Corollary 3]. Białynicki-Birula and Schinzel also classify all binary p with deg p = d and \(L_{\mathbb{C}}(p) = d - k\) for 1 ≤ k ≤ 3 and sufficiently large d. Landsberg and Teitler [34, Corollary 4.5] and Boij, Carlini and Geramita [4] have both shown that \(L_{\mathbb{C}}({x}^{a}{y}^{b}) =\max (a + 1,b + 1)\) if a, b ≥ 1.

Theorem 5.4.

If d ≥ 3, then \(L_{\mathbb{C}}(f) = d\) if and only if there are two distinct linear forms ℓ and ℓ′ so that \(f {=\ell }^{d-1}\ell^{\prime}\) .

Proof.

If \(f {=\ell }^{d-1}\ell^{\prime}\), then after an invertible linear change, we may assume that \(f(x,y) = d{x}^{d-1}y\). If \(L_{\mathbb{C}}(d{x}^{d-1}y) \leq d - 1\), then f would have a Sylvester form of degree d − 1. But then, as in (5.1), (2.4) becomes

$$\displaystyle{ \left (\begin{array}{cccc} 0&1&\cdots &0\\ 1 &0 &\cdots &0 \end{array} \right )\cdot \left (\begin{array}{c} c_{0} \\ c_{1}\\ \vdots \\ c_{d-1} \end{array} \right ) = \left (\begin{array}{c} 0\\ 0 \end{array} \right )\Rightarrow c_{0} = c_{1} = 0, }$$

so \(h(x,y) =\sum _{ t=0}^{d-1}c_{t}{x}^{d-1-t}{y}^{t}\) does not have distinct factors. Thus, \(L_{\mathbb{C}}(d{x}^{d-1}y) = d\).

Conversely, suppose \(L_{\mathbb{C}}(f) = d\). Factor \(f =\prod \ell_{ j}^{m_{j}}\) as a product of pairwise distinct linear forms, with \(\sum m_{j} = d\), \(m_{1} \geq m_{2}\ldots \geq m_{s} \geq 1\), and s > 1 (otherwise, \(L_{\mathbb{C}}(f) = 1\).) Make an invertible linear change taking \((\ell_{1},\ell_{2})\mapsto (x,y)\), and call the new form g; \(L_{\mathbb{C}}(g) = d\) as well. If \(g(x,y) =\sum _{ \ell=0}^{d}\left ({ d \atop \ell} \right )b_{\ell}{x}^{d-\ell}{y}^{\ell}\), then \(b_{0} = b_{d} = 0\). By hypothesis, there does not exist a Sylvester form of degree d − 1 for g. Consider Theorem 2.1 in this case. We have

$$\displaystyle{ \left (\begin{array}{ccccc} 0 &b_{1} & \cdots &b_{d-2} & b_{d-1} \\ b_{1} & b_{2} & \cdots &b_{d-1} & 0 \end{array} \right )\cdot \left (\begin{array}{c} c_{0} \\ c_{1}\\ \vdots \\ c_{d-1} \end{array} \right ) = \left (\begin{array}{c} 0\\ 0 \end{array} \right ). }$$

If \(m_{1} \geq m_{2} \geq 2\), then \({x}^{2},{y}^{2}\mid g(x,y)\), so \(b_{1} = b_{d-1} = 0\) and \({x}^{d-1} - {y}^{d-1}\) is a Sylvester form of degree d − 1 for g, a contradiction. Thus \(m_{2} = 1\), and so \({y}^{2}\) does not divide g and \(b_{1}\neq 0\). Let \(q(t) =\sum _{ i=0}^{d-2}b_{i+1}{t}^{i}\) (note the absence of binomial coefficients!) and suppose q is not a constant polynomial. Then there exists \(t_{0} \in \mathbb{C}\) so that \(q(t_{0}) = 0\). Since \(q(0) = b_{1}\neq 0\), \(t_{0}\neq 0\). We have

$$\displaystyle{ \left (\begin{array}{ccccc} 0 &b_{1} & \cdots &b_{d-2} & b_{d-1} \\ b_{1} & b_{2} & \cdots &b_{d-1} & 0 \end{array} \right )\cdot \left (\begin{array}{c} 1\\ t_{ 0}\\ \vdots \\ t_{0}^{d-1} \end{array} \right ) = \left (\begin{array}{c} t_{0}q(t_{0}) \\ q(t_{0}) \end{array} \right ) = \left (\begin{array}{c} 0\\ 0 \end{array} \right ). }$$

Since

$$\displaystyle{ h(x,y) =\sum _{ i=0}^{d-1}t_{ 0}^{i}{x}^{d-1-i}{y}^{i} = \frac{{x}^{d} - t_{ 0}^{d}{y}^{d}} {x - t_{0}y} =\prod _{ k=1}^{d-1}(x -\zeta _{ d}^{k}t_{ 0}y) }$$

has distinct linear factors, it is a Sylvester form for g, and \(L_{\mathbb{C}}(g) \leq d - 1\). This contradiction implies that q has no zeros, so q(t) = b 1 must be a constant. It follows that \(g(x,y) = db_{1}{x}^{d-1}y\), as promised.

By Corollaries 4.4 and 5.1, instances of the first five cabinets in Corollary 5.1(5) are: \({x}^{4}\), \({x}^{4} + {y}^{4}\), \({x}^{4} + {y}^{4} + {(x + y)}^{4}\), \({x}^{3}y\) and \({(x + iy)}^{4} + {(x - iy)}^{4}\). It will follow from the next results that \(\mathcal{C}({({x}^{2} + {y}^{2})}^{2}) =\{ 3,4\}\).

Theorem 5.5.

If d = 2k and \(f(x,y) = \left ({ 2k \atop k} \right ){x}^{k}{y}^{k}\) , then \(L_{\mathbb{C}}(f) = k + 1\) . The minimal \(\mathbb{C}\) -representations of f are given by

$$\displaystyle{ (k + 1)\left ({ 2k \atop k} \right ){x}^{k}{y}^{k} =\sum _{ j=0}^{k}{(\zeta _{ 2k+2}^{j}wx +\zeta _{ 2k+2}^{-j}{w}^{-1}y)}^{2k},\qquad 0\neq w \in \mathbb{C}. }$$
(5.6)

Proof.

We first evaluate the right-hand side of (5.6) by expanding the powers:

$$\displaystyle\begin{array}{rcl} & \sum _{j=0}^{k}{(\zeta _{ 2k+2}^{j}wx +\zeta _{ 2k+2}^{-j}{w}^{-1}y)}^{2k} =\sum _{ j=0}^{k}\sum _{ t=0}^{2k}\left ({ 2k \atop t} \right )\zeta _{2k+2}^{j(2k-t)-jt}{w}^{(2k-t)-t}{x}^{2k-t}{y}^{t}& \\ & =\sum _{ t=0}^{2k}\left ({ 2k \atop t} \right ){w}^{2k-2t}{x}^{2k-t}{y}^{t}\left (\sum _{ j=0}^{k}\zeta _{ k+1}^{j(k-t)}\right ). &{}\end{array}$$
(5.7)

But \(\sum _{j=0}^{m-1}\zeta _{m}^{rj} = 0\) unless \(m\mid r\), in which case it equals m. Since the only multiple of k + 1 in the set \(\{k - t: 0 \leq t \leq 2k\}\) occurs for t = k, (5.7) reduces to the left-hand side of (5.6). We now show that these are all the minimal \(\mathbb{C}\)-representations of f.

Since \(H_{k}({x}^{k}{y}^{k})\) has 1’s on the NE-SW diagonal, it is non-singular, so \(L_{\mathbb{C}}({x}^{k}{y}^{k}) > k\), and \(L_{\mathbb{C}}({x}^{k}{y}^{k}) = k + 1\) by (5.6). By Corollary 4.3, any minimal \(\mathbb{C}\)-representation not given by (5.6) can only use powers of forms which are distinct from any \(wx + {w}^{-1}y\). If \(ab = {c}^{2}\neq 0\), then ax + by is a multiple of \(\frac{a} {c}x + \frac{c} {a}y\). This leaves only \({x}^{2k}\) and \({y}^{2k}\), and there is no linear combination of these giving \({x}^{k}{y}^{k}\).

The representations in (5.6) arise because the null-vectors of \(H_{k+1}({x}^{k}{y}^{k})\) can only be \({(c_{0},0,\ldots,0,c_{k+1})}^{t}\) and \(c_{0}{x}^{k+1} + c_{k+1}{y}^{k+1}\) is a Sylvester form when \(c_{0}c_{k+1}\neq 0\).
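Identity (5.6) can also be checked numerically. The sketch below uses floating-point complex arithmetic, so it is a sanity check rather than a proof; the function name `rhs_5_6` and the sample value of w are arbitrary choices made for illustration.

```python
import cmath
from math import comb

def rhs_5_6(k, w):
    """Coefficients of x^(2k-t) y^t, t = 0..2k, of the right-hand side of (5.6)."""
    d = 2 * k
    zeta = cmath.exp(2j * cmath.pi / (2 * k + 2))   # primitive (2k+2)-th root of unity
    coeffs = [0j] * (d + 1)
    for j in range(k + 1):
        p, q = zeta ** j * w, zeta ** (-j) / w
        for t in range(d + 1):
            coeffs[t] += comb(d, t) * p ** (d - t) * q ** t
    return coeffs

k, w = 3, 0.7 + 0.4j
for t, value in enumerate(rhs_5_6(k, w)):
    target = (k + 1) * comb(2 * k, k) if t == k else 0
    print(t, abs(value - target) < 1e-8)
```

Up to rounding error, only the coefficient of \({x}^{k}{y}^{k}\) survives, and it equals \((k + 1)\left ({ 2k \atop k} \right )\) for every non-zero w tried.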

Corollary 5.6.

For k ≥ 2, \(L_{\mathbb{C}}({({x}^{2} + {y}^{2})}^{k}) = k + 1\), and \(L_{K}({({x}^{2} + {y}^{2})}^{k}) = k + 1\) iff \(\tan \frac{\pi }{k+1} \in K\). The \(\mathbb{C}\)-minimal representations of \({({x}^{2} + {y}^{2})}^{k}\) are given by

$$\displaystyle{ \left ({ 2k \atop k} \right ){({x}^{2} + {y}^{2})}^{k} = \frac{{2}^{2k}} {k + 1}\sum _{j=0}^{k}{\left (\cos ( \tfrac{j\pi } {k+1}+\theta )x +\sin ( \tfrac{j\pi } {k+1}+\theta )y\right )}^{2k},\quad \theta \in \mathbb{C}. }$$
(5.8)

Proof.

The invertible map \((x,y)\mapsto (x - iy,x + iy)\) takes \({x}^{k}{y}^{k}\) into \({({x}^{2} + {y}^{2})}^{k}\). Setting \(0\neq w = {e}^{i\theta }\) in (5.6) gives (5.8). If \(\cos \alpha \neq 0\), then

$$\displaystyle{ {(\cos \alpha \ x +\sin \alpha \ y)}^{2k} {=\cos }^{2k}\alpha \cdot {(x +\tan \alpha \ y)}^{2k} = {(1 {+\tan }^{2}\alpha )}^{-k}{(x +\tan \alpha \ y)}^{2k}. }$$

Thus, if \(\tan \alpha \in K\), then \({(\cos \alpha \ x +\sin \alpha \ y)}^{2k} \in K[x,y]\). Further, if \(\cos \alpha = 0\), then \({(\cos \alpha \ x +\sin \alpha \ y)}^{2k} = {y}^{2k} \in K[x,y]\). Conversely, if \({(\cos \alpha \ x +\sin \alpha \ y)}^{2k} \in K[x,y]\) and \(\cos \alpha \neq 0\), then the ratio of the coefficients of \({x}^{2k-1}y\) and \({x}^{2k}\) equals \(2k\tan \alpha\), which must be in K. It follows that \(L_{K}({({x}^{2} + {y}^{2})}^{k}) = k + 1\) if and only if there exists \(\theta \in \mathbb{C}\) so that for each \(0 \leq j \leq k\), either \(\cos ( \tfrac{j\pi } {k+1}+\theta ) = 0\) or \(\tan ( \tfrac{j\pi } {k+1}+\theta ) \in K\). Since \(\tan \alpha,\tan \beta \in K\) imply \(\tan (\alpha -\beta ) \in K\) and k ≥ 2, we see that (5.8) is a representation over K if and only if \(\tan \frac{\pi }{k+1} \in K\).

In particular, since \(\tan \frac{\pi }{3} = \sqrt{3}\notin \mathbb{Q}\), \(L_{\mathbb{Q}}({({x}^{2} + {y}^{2})}^{2}) > 3\) and so must equal 4. Thus, \(\mathcal{C}({({x}^{2} + {y}^{2})}^{2}) =\{ 3,4\}\), as promised. Since \(\tan \frac{\pi }{m}\) is irrational for m ≥ 5 (see e.g. [40, Corollary 3.12]), it follows that \(L_{\mathbb{Q}}({({x}^{2} + {y}^{2})}^{k}) = k + 1\) only for k = 1, 3.

It is worth remarking that \({x}^{k}{y}^{k}\) is a highly singular complex form, as is \({({x}^{2} + {y}^{2})}^{k}\). However, as a real form, \({({x}^{2} + {y}^{2})}^{k}\) is interior to the real convex cone \(Q_{2,2k}\). For real θ, the formula in (5.8) goes back at least to Friedman [21] in 1957. It was shown in [47] that all minimal real representations of \({({x}^{2} + {y}^{2})}^{k}\) have this shape. There is an equivalence between representations of \({({x}^{2} + {y}^{2})}^{k}\) as a real sum of 2k-th powers and quadrature formulas on the circle; see [47]. In this sense, (5.8) can be traced back to Mehler [35] in 1864.

A real representation (1.1) of \({(\sum x_{i}^{2})}^{k}\) (with positive real coefficients \(\lambda _{j}\)) is called a Hilbert Identity; Hilbert [20, 26] used such representations with rational coefficients to solve Waring’s problem. Hilbert Identities have been important in studying quadrature problems on \({S}^{n-1}\), the Delsarte-Goethals-Seidel theory of spherical designs in combinatorics and for embedding questions in Banach spaces [47, Chaps. 8 and 9], as well as for explicit computations in Hilbert’s 17th problem [48]. It can be shown that any such representation requires at least \(\left ({ n+k-1 \atop n-1} \right )\) summands, and this bound also applies if negative coefficients \(\lambda _{j}\) are allowed. It is not known whether allowing negative coefficients can reduce the total number of summands. However, Blekherman [3] has recently constructed \(f \in Q_{6,4}\) which has a smaller length if one allows negative \(\lambda _{j}\) in a real representation. When \({(\sum x_{i}^{2})}^{k}\) is a sum of exactly \(\left ({ n+k-1 \atop n-1} \right )\) 2k-th powers, the coordinates of minimal representations can be used to produce tight spherical designs. Such representations exist when n = 2, 2k = 2, (n, 2k) = (3, 4), \((n,2k) = ({u}^{2} - 2,4)\) (u = 3, 5), \((n,2k) = (3{v}^{2} - 4,6)\) (v = 2, 3), (n, 2k) = (24, 10). It has been proved that they do not exist otherwise, unless possibly \((n,2k) = ({u}^{2} - 2,4)\) for some odd integer u ≥ 7 or \((n,2k) = (3{v}^{2} - 4,6)\) for some integer v ≥ 4. These questions have been largely open for more than thirty years. It is also not known whether there exist (k, n) so that \(L_{\mathbb{R}}({(\sum x_{i}^{2})}^{k}) > L_{\mathbb{C}}({(\sum x_{i}^{2})}^{k})\), although this cannot happen for n = 2. For that matter, it is not known whether there exists any \(f \in Q_{n,d}\) so that \(L_{\mathbb{R}}(f) > L_{\mathbb{C}}(f)\).

We conclude this section with a related question: if \(f_{\lambda }(x,y) = {x}^{4} + 6\lambda {x}^{2}{y}^{2} + {y}^{4}\) for \(\lambda \in \mathbb{Q}\), what is \(L_{\mathbb{Q}}(f_{\lambda })\)? If \(\lambda \leq -\frac{1} {3}\), then f λ has four real factors, so \(L_{\mathbb{Q}}(f_{\lambda }) = 4\). Since \(\det H_{2}(f_{\lambda }) =\lambda {-\lambda }^{3}\), \(L_{\mathbb{C}}(f_{\lambda }) = 2\) for \(\lambda = 0,1,-1\). The formula

$$\displaystyle{ ({x}^{4} + 6\lambda {x}^{2}{y}^{2} + {y}^{4}) = \tfrac{\lambda } {2}\left ({(x + y)}^{4} + {(x - y)}^{4}\right ) + (1-\lambda )({x}^{4} + {y}^{4}) }$$

shows that \(L_{\mathbb{Q}}(f_{0}) = L_{\mathbb{Q}}(f_{1}) = 2\); \(2f_{-1}(x,y) = {(x+iy)}^{4}+{(x-iy)}^{4}\) has \(\mathbb{Q}\)-length 4.

Theorem 5.7.

Suppose \(\lambda = \frac{a} {b} \in \mathbb{Q}\), \({\lambda }^{3}\neq \lambda\). Then \(L_{\mathbb{Q}}({x}^{4} + 6\lambda {x}^{2}{y}^{2} + {y}^{4}) = 3\) if and only if there exist integers (m,n) ≠ (0,0) so that

$$\displaystyle{ \Gamma (a,b,m,n) = 4{a}^{3}b\ {m}^{4} + ({b}^{4} - 6{a}^{2}{b}^{2} - 3{a}^{4}){m}^{2}{n}^{2} + 4{a}^{3}b\ {n}^{4} }$$
(5.9)

is a non-zero square.

Proof.

By Corollary 2.2, such a representation occurs if and only if there is a cubic \(h(x,y) =\sum _{ i=0}^{3}c_{i}{x}^{3-i}{y}^{i}\) which splits over \(\mathbb{Q}\) and satisfies

$$\displaystyle{ c_{0} +\lambda c_{2} =\lambda c_{1} + c_{3} = 0. }$$
(5.10)

Assume that \(h(x,y) = (mx + ny)g(x,y)\), (m, n) ≠ (0, 0) with \(m,n \in \mathbb{Z}\). If \(g(x,y) = r{x}^{2} + sxy + t{y}^{2}\), then \(c_{0} = mr,c_{1} = ms + nr,c_{2} = mt + ns,c_{3} = nt\) and (5.10) becomes

$$\displaystyle{ \left (\begin{array}{ccc} m& \lambda n &\lambda m\\ \lambda n &\lambda m & n\\ \end{array} \right )\cdot \left (\begin{array}{c} r\\ s \\ t \end{array} \right ) = \left (\begin{array}{c} 0\\ 0 \end{array} \right ) }$$
(5.11)

If m = 0, then the general solution to (5.11) is \((r,s,t) = (r,0,-\lambda r)\) and \(r{x}^{2} -\lambda r{y}^{2}\) splits over \(\mathbb{Q}\) into distinct factors iff λ is a non-zero square; that is, iff ab is a square, and similarly if n = 0. Otherwise, the system has full rank since \({\lambda }^{2}\neq 1\) and any solution is a multiple of

$$\displaystyle{ r{x}^{2} + sxy + t{y}^{2} = (\lambda {n}^{2} {-\lambda }^{2}{m}^{2}){x}^{2} + {(\lambda }^{2} - 1)mnxy + (\lambda {m}^{2} {-\lambda }^{2}{n}^{2}){y}^{2}. }$$
(5.12)

The quadratic in (5.12) splits over \(\mathbb{Q}\) into distinct factors iff its discriminant

$$\displaystyle{ {4\lambda }^{3}{m}^{4} + (1 - {6\lambda }^{2} - {3\lambda }^{4}){m}^{2}{n}^{2} + {4\lambda }^{3}{n}^{4} = {b}^{-4}\Gamma (a,b,m,n) }$$
(5.13)

is a non-zero square in \(\mathbb{Q}\).

In particular, we have the following identities: \(\Gamma ({u}^{2},{v}^{2},v,u) = {({u}^{5}v - u{v}^{5})}^{2}\) and \(\Gamma (uv,{u}^{2} - uv + {v}^{2},1,1) = {(u - v)}^{6}{(u + v)}^{2}\), hence \(L_{\mathbb{Q}}(f_{\lambda }) = 3\) for \(\lambda = {\tau }^{2}\) and \(\lambda = \frac{\tau } {{\tau }^{2}-\tau +1}\), where \(\tau = \frac{u} {v} \in \mathbb{Q}\), τ ≠ ± 1. These show that \(L_{\mathbb{Q}}(f_{\lambda }) = 3\) for a dense set of rationals in \([-\frac{1} {3},\infty )\). These families do not exhaust the possibilities. If \(\lambda = \frac{38} {3}\), so \(f_{\lambda }(x,y) = {x}^{4} + 76{x}^{2}{y}^{2} + {y}^{4}\), then λ is expressible neither as \({\tau }^{2}\) nor as \(\frac{\tau } {{\tau }^{2}-\tau +1}\) for \(\tau \in \mathbb{Q}\), but \(\Gamma (38,3,2,19) = 276{,}{906}^{2}\).
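The criterion of Theorem 5.7 lends itself to a small search. The sketch below uses hypothetical function names; the search box is an arbitrary choice, so a result of `None` only means that no witness (m, n) exists within the bound. It uses the facts that \(\Gamma \) is even in m and n and symmetric under exchanging them.

```python
from math import isqrt

def gamma(a, b, m, n):
    """The quartic Gamma(a, b, m, n) of (5.9), for integer arguments."""
    return (4 * a ** 3 * b * m ** 4
            + (b ** 4 - 6 * a * a * b * b - 3 * a ** 4) * m * m * n * n
            + 4 * a ** 3 * b * n ** 4)

def is_nonzero_square(v):
    return v > 0 and isqrt(v) ** 2 == v

def find_witness(a, b, bound):
    """Return some (m, n) with 0 <= m <= n <= bound making Gamma a non-zero
    square, or None if the box contains no such pair."""
    for n in range(1, bound + 1):
        for m in range(0, n + 1):
            if is_nonzero_square(gamma(a, b, m, n)):
                return m, n
    return None

print(gamma(38, 3, 2, 19) == 276906 ** 2)   # the example lambda = 38/3
print(find_witness(38, 3, 20))               # some witness inside the box
print(find_witness(1, 3, 30))                # lambda = 1/3: no witness found
```

The two parametric identities can be confirmed in the same way by looping over small integers u and v.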

We mention two negative cases: if \(\lambda = \frac{1} {3}\), \(\Gamma (1,3,m,n) = 12{({m}^{2} + {n}^{2})}^{2}\), which is never a square, giving another proof that \(L_{\mathbb{Q}}({({x}^{2} + {y}^{2})}^{2}) = 4\). If \(\lambda = \frac{1} {2}\), then

$$\displaystyle{ \Gamma (1,2,m,n) = 8{m}^{4} - 11{m}^{2}{n}^{2} + 8{n}^{4} = \tfrac{27} {4} {({m}^{2} - {n}^{2})}^{2} + \tfrac{5} {4}{({m}^{2} + {n}^{2})}^{2}, }$$

hence if \(L_{\mathbb{Q}}({x}^{4} + 3{x}^{2}{y}^{2} + {y}^{4}) = 3\), then there is a solution to the Diophantine equation \(27{X}^{2} + 5{Y }^{2} = {Z}^{2}\). A simple descent shows that this has no non-zero solutions: working mod 5, we see that \(2{X}^{2} \equiv {Z}^{2}\pmod{5}\); since 2 is not a quadratic residue mod 5, it follows that \(5\mid X,Z\), and these imply that \(5\mid Y\) as well. It follows that \(L_{\mathbb{Q}}({x}^{4} + 3{x}^{2}{y}^{2} + {y}^{4}) = 4\).

Solutions of the Diophantine equation \(A{m}^{4} + B{m}^{2}{n}^{2} + C{n}^{4} = {r}^{2}\) were first studied by Euler; see [16, pp. 634–639] and [38, pp. 16–29] for more on this topic. This equation has not yet been completely solved; see [6, 13]. We hope to return to the analysis of (5.9) in a future publication.

6 Open Questions

We are confident that Conjecture 4.12 can be completely settled. This raises the question of whether there exist other fields besides \(\mathbb{C}\) (and possibly \(\mathbb{R}\)) for which there is a simple description of {f : L K (f) = deg f}.

Which cabinets are possible for binary forms? Are there other restrictions beyond Corollary 5.1(1)? How many different lengths are possible? If \(\vert \mathcal{C}(f)\vert \geq 4\), then d ≥ 7. Can anything more be said about forms in n ≥ 3 variables?

Can f have more than one, but a finite number, of K-minimal representations, where K is not necessarily equal to \(E_{f}\)? Theorem 5.7 might be a way to find such examples.

Length is generic over \(\mathbb{C}\), but not over \(\mathbb{R}\). For d = 2r, the \(\mathbb{R}\)-length of a real form is always 2r in a small neighborhood of \(\prod _{j=1}^{d}(x - jy)\), but the \(\mathbb{R}\)-length is always r + 1 in a small neighborhood of \({({x}^{2} + {y}^{2})}^{r}\) [47]. Which combinations of degrees and lengths have interior? Does the parity of d matter? This question is explored in much greater detail in [15].