
The Fundamental Theorem of Algebra (stated below) provides an ideal case study for illustrating the roles of alternative proofs in mathematical practice. Like the Pythagorean Theorem, the Fundamental Theorem of Algebra has been proved in many different ways since its enunciation by Euler in 1739. Unlike the Pythagorean Theorem, however, early attempts to prove the Fundamental Theorem of Algebra are not shrouded in the mists of antiquity, so we know how the adequacy of those attempts was evaluated by mathematicians of the time. We can see how criticisms of earlier efforts to prove the theorem led to alternative proof strategies, and we can analyze why the proof given by Gauss in his 1799 inaugural dissertation was the first to be accorded general acceptance, though it too would later be deemed not fully rigorous.

As with the theorems considered in earlier chapters, besides questions of rigor there have been other impetuses for devising alternative proofs of the Fundamental Theorem of Algebra: issues of perspicuity, simplicity, generality, purity of method and constructivity have also been matters of concern; and in a pedagogical context, different proofs of the Fundamental Theorem have been employed as a vehicle for introducing a variety of topics in higher-level mathematics (complex line integrals, field extensions, Galois theory, and notions from algebraic topology) in a text designed for a capstone course for senior mathematics majors (Fine and Rosenberger 1997).

8.1 Alternative formulations of the theorem

In its earliest and simplest form, the Fundamental Theorem of Algebra was the conjecture that every polynomial with real coefficients can be expressed as a product of linear and quadratic polynomials with real coefficients. The question whether that is so arose in connection with Leibniz’s attempts to integrate functions by the method of partial fractions, and Leibniz himself believed the conjecture to be false. Euler, however, showed that putative counterexamples put forward by Leibniz and by Nikolaus Bernoulli did, in fact, possess factorizations of the stated form, and it was he, in a letter to Bernoulli of 1 October 1742, who first asserted the truth of the statement. Two months later, however, in a letter of 15 December to his friend Goldbach, he confessed that he was unable to produce a fully satisfactory proof of the theorem.Footnote 1

Today the Fundamental Theorem of Algebra is more often stated in the form “Every polynomial p(X) of degree n with complex coefficients possesses exactly n complex roots, counting multiplicities.” The equivalence of that statement with the one given above rests not only upon recognizing complex numbers as meaningful entities, but upon the quadratic formula (which shows how to express any quadratic polynomial as a product of two complex linear factors), upon the factor theorem of Descartes (that a is a root of a polynomial p(X) if and only if \(X - a\) is a factor of p(X)), and upon the observation (made by Bombelli around 1560, and again by Euler in his 1742 letter to Goldbach) that the complex roots of any polynomial with real coefficients always occur in conjugate pairs, so that the product of the corresponding linear factors guaranteed by Descartes’s theorem is a real quadratic polynomial.
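To illustrate the last observation with a minimal worked example (an addition for concreteness, not part of the original discussion): if \(a + bi\) and \(a - bi\) are conjugate roots, then the product of the corresponding linear factors guaranteed by Descartes’s theorem is

$$\displaystyle{(X - (a + bi))(X - (a - bi)) = X^{2} - 2aX + (a^{2} + b^{2}),}$$

a quadratic polynomial with real coefficients.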

Consideration of the properties of complex conjugates shows that if p(X) is a polynomial with complex coefficients and \(\overline{p}(X)\) is the polynomial whose coefficients are the conjugates of those of p(X), then the product \(p(X)\overline{p}(X)\) is a polynomial with real coefficients. If \(z_{0}\) is a complex root of \(p(X)\overline{p}(X)\), then it must either be a root of p(X) or of \(\overline{p}(X)\); and in the latter case, again using the overbar to denote complex conjugation, \(\overline{\overline{p}(z_{0})} = \overline{\overline{p}}(\overline{z_{0}}) = p(\overline{z_{0}}) = 0\), so \(\overline{z_{0}}\) is a root of p(X). To prove the Fundamental Theorem it therefore suffices to establish it for polynomials p(X) whose coefficients are real numbers.
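A minimal instance of this reduction (supplied here for illustration): if \(p(X) = X - i\), then \(\overline{p}(X) = X + i\) and

$$\displaystyle{p(X)\overline{p}(X) = (X - i)(X + i) = X^{2} + 1,}$$

a polynomial with real coefficients whose roots, ±i, are precisely the root of p(X) and its conjugate.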

8.2 Early attempts to prove the theorem

The task of showing that every polynomial with real coefficients possesses at least one complex root involves two separate aspects: showing (1) that a root of some definite sort exists, and (2) that any such root must in fact be of the form a + bi (in modern terms, proving the existence of a splitting field K over \(\mathbb{R}\) for p(X), and then showing that K must be isomorphic to \(\mathbb{C}\)). Before Gauss, however, those who endeavored to prove the existence of complex roots either explicitly assumed (1) to be true (in part, perhaps, because it was believed that formulas for the roots of polynomials of degree ≥ 5 similar to those obtained by Ferro, Tartaglia, Ferrari, and Bombelli for cubic and quartic polynomials would eventually be foundFootnote 2), or else unwittingly employed arguments that smuggled in that assumption. Their efforts focused instead on establishing (2). But, as Gauss trenchantly observed in the critique of prior proof attempts that he gave in his dissertation, there was need not only to justify the existence of roots, but, if algebraic operations were to be applied to them, to characterize their structure; for it made no sense to attempt to manipulate hypothetical quantities that were mere “shadows of shadows.”

The remainder of this section is devoted to outlining the strategies employed by D’Alembert, Euler, Lagrange, and Laplace in their attempted proofs of the Fundamental Theorem (published in 1746, 1749, 1772 and 1795, respectively) and to analyzing the deficiencies in their arguments.

d’Alembert’s ‘proof’:

Jean Le Rond d’Alembert is generally credited with having made the first serious attempt to prove the Fundamental Theorem of Algebra, in his memoir ‘Recherches sur le calcul intégral’ (d’Alembert 1746). The memoir was apparently hastily written, however, and is not notable for its clarity. Indeed, there is marked disparity among the descriptions given by modern commentators both of the mechanics of d’Alembert’s argument and of the extent of its deficiencies. (My own reading of d’Alembert’s text is in accord with the descriptions of it in Gilain (1991) and Baltus (2004), but at variance with that in Remmert (1990).) d’Alembert began by noting that if \(p(X) = X^{m} + c_{m-1}X^{m-1} +\ldots +c_{1}X + c_{0}\) is a monic polynomial with real coefficients of degree m ≥ 1, then p(0) = 0 if \(c_{0} = 0\). He then replaced the constant term \(c_{0}\) by the parameter z and set the resulting function F(X, z) equal to 0, so that the Fundamental Theorem became the statement that for any real value of z, there is a (possibly complex) value x for which F(x, z) = 0. To establish that, d’Alembert first claimed that for any real number \(z_{0}\), if \((x_{0},z_{0})\) is a point for which \(F(x_{0},z_{0}) = 0\) — in particular, if \(x_{0} = z_{0} = 0\) — then for all real z sufficiently close to \(z_{0}\), there is a complex value x for which F(x, z) = 0. He then went on to claim that for any real value \(z^{\ast}\) of z, an overlapping chain of discs can be found, starting at (0, 0), yielding a sequence of points \((x_{n},z_{n})\) such that \(F(x_{n},z_{n}) = 0\) for each n, the values \(x_{n}\) converge to a complex number \(x^{\ast}\) and the (real) values \(z_{n}\) to \(z^{\ast}\), with \(F(x^{\ast},z^{\ast}) = 0\).

To establish the first claim d’Alembert alleged, without proof, that if

$$\displaystyle{F(x_{0},z_{0}) = 0,}$$

then for all z sufficiently close to \(z_{0}\), there is a natural number q and a convergent series of fractional powers of \(z - z_{0}\) such that

$$\displaystyle{x = x_{0} +\sum _{ i=1}^{\infty }c_{ i}(z - z_{0})^{i/q}}$$

satisfies \(F(x,z) = 0\). More than a century later that fact was finally proved by Victor Puiseux, as a consequence of the Fundamental Theorem (which by then had been rigorously proved by other means); so d’Alembert’s argument was circular. There were difficulties, too, with his second claim: What ensures that the radii of the discs are such that the \(z_{n}\) converge to \(z^{\ast}\)? And even if they do, why does the fact that \(F(x_{n},z_{n}) = 0\) for each n entail that \(F(x^{\ast},z^{\ast}) = 0\)? Those and other criticisms were lodged against d’Alembert’s argument by Gauss, who nevertheless thought that it might be possible to repair its defects. But he and others chose instead to seek different ways to establish the Fundamental Theorem.Footnote 3

Euler’s attack on the theorem:

Three years after the appearance of d’Alembert’s memoir, Euler attempted to prove the Fundamental Theorem in its original formulation. By invoking the fact that a real polynomial of odd degree must have a real root (a consequence of the intermediate-value theorem, a principle generally accepted at the time, but first rigorously proved by Bolzano around 1816), he argued that a real quintic polynomial must have at least one real linear factor, and then went on to show how any real quartic polynomial could be expressed as a product of two real quadratic factors. Having thus established the truth of the Fundamental Theorem for polynomials of degree ≤ 5, he attempted to extend the proof to polynomials of higher degree, but was unable to do so.

At first glance it might appear that Euler had made but a minor advance beyond the work of Ferrari and Bombelli two centuries earlier, since their explicit formulas for the roots of real quartic polynomials, in which the complex roots occur in conjugate pairs, immediately entail that all such quartics can be factored into a product of real polynomials of degree at most 2. But in the formulas obtained by the Italians for the roots of cubic and quartic polynomials, complex numbers play an essential role. In its original form, however, the Fundamental Theorem makes no reference to complex numbers, so, as noted in Remmert (1990), p. 117, their employment in proofs thereof appears to invoke a deus ex machina. Euler’s method of factoring quartics, however, made no use of complex numbers, so from the standpoint of purity of method it was superior.

A detailed and very readable discussion of what Euler did in his paper Euler (1749) is given in Dunham (1991). Following the lead of the Italian school, Euler noted that any monic polynomial of degree 4 in the variable x with real coefficients can be converted into an equivalent quartic in the variable y that lacks a cubic term (via the substitution \(x = y - c_{3}/4\), where \(c_{3}\) is the coefficient of \(x^{3}\) in the original polynomial). The factorization of the resulting quartic \(y^{4} + \mathit{By}^{2} + \mathit{Cy} + D\) then depends on the values of the coefficients B, C and D. If C = 0, the quartic is a quadratic in \(y^{2}\), which, if \(B^{2} - 4D \geq 0\), factors as

$$\displaystyle{\left [y^{2} + \frac{B + \sqrt{B^{2 } - 4D}} {2} \right ]\left [y^{2} + \frac{B -\sqrt{B^{2 } - 4D}} {2} \right ].}$$

On the other hand, if \(B^{2} - 4D < 0\), then both \(\sqrt{D} > 0\) and \(\sqrt{2\sqrt{D} - B} > 0\), so

$$\displaystyle{y^{4} + \mathit{By}^{2} + D = \left [y^{2} + \sqrt{D}\right ]^{2} -\left [y\sqrt{2\sqrt{D} - B}\right ]^{2},}$$

a difference of two squares that factors once again into the product of two quadratics. If C ≠ 0, the absence of the \(y^{3}\) term in \(y^{4} + \mathit{By}^{2} + \mathit{Cy} + D\) implies that any factorization of that quartic into quadratic factors must be of the form \((y^{2} + \mathit{uy}+\alpha )(y^{2} -\mathit{uy}+\beta )\) for some constants u, α and β. Expanding that product, setting it equal to \(y^{4} + \mathit{By}^{2} + \mathit{Cy} + D\) and equating coefficients of like powers of y, Euler obtained three equations in the unknowns u, α and β, from which after further algebra he deduced the equation \(u^{6} + 2\mathit{Bu}^{4} + (B^{2} - 4D)u^{2} - C^{2} = 0\), in which all the powers of u are even. The graph of \(Y = u^{6} + 2\mathit{Bu}^{4} + (B^{2} - 4D)u^{2} - C^{2}\) is therefore symmetric about the Y-axis, Y approaches +∞ as u approaches ±∞, and Y (0) < 0 (see Figure 8.1), so by the intermediate-value principle, the equation \(u^{6} + 2\mathit{Bu}^{4} + (B^{2} - 4D)u^{2} - C^{2} = 0\) must have real roots \(\pm u_{0}\), either of which can be substituted back into the earlier equations to find real values for α and β.

Fig. 8.1 Graph of a sixth-degree polynomial with no odd powers
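The “further algebra” by which Euler arrived at his sextic resolvent can be reconstructed as follows (a standard elimination, sketched here rather than quoted from Euler). Equating coefficients in \((y^{2} + \mathit{uy}+\alpha )(y^{2} -\mathit{uy}+\beta ) = y^{4} + \mathit{By}^{2} + \mathit{Cy} + D\) gives

$$\displaystyle{\alpha +\beta -u^{2} = B,\qquad u(\beta -\alpha ) = C,\qquad \alpha \beta = D,}$$

so that \(\beta +\alpha = B + u^{2}\) and \(\beta -\alpha = C/u\). Substituting those values into \(4\alpha \beta = (\beta +\alpha )^{2} - (\beta -\alpha )^{2} = 4D\) and multiplying through by \(u^{2}\) yields \(u^{2}(B + u^{2})^{2} - C^{2} = 4\mathit{Du}^{2}\), that is, \(u^{6} + 2\mathit{Bu}^{4} + (B^{2} - 4D)u^{2} - C^{2} = 0\).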

To extend to polynomials of arbitrary degree, Euler noted that by multiplying, if necessary, by some positive integral power of X, any polynomial p(X) of degree d could be converted into a polynomial q(X) of degree \(2^{m}\), for some m > 0. He then attempted to mimic the procedure he had employed for factoring quartics. A direct approach led to systems of equations too complex to allow derivation of an equation for u, but he showed that an alternative approach, avoiding the need to find an explicit equation for \(u_{0}\), was also possible in the quartic case. However, to extend that approach to polynomials of degree \(2^{m}\) for m > 2 it was necessary to assume that \(2^{m}\) roots of some sort existed; and without specifying the nature of those roots, Euler’s attempt to show that algebraic combinations of them would yield real coefficients for the putative factors of q(X) was doomed to failure (as Gauss was to point out).

Lagrange’s improvement of Euler’s argument:

In a long and important paper that appeared in 1770/1771,Footnote 4 Joseph Louis Lagrange investigated the properties of symmetric polynomials and established the result now known as the fundamental theorem about them: that any polynomial \(S(X_{1},\ldots,X_{n})\) symmetric in \(X_{1},\ldots,X_{n}\) has a unique representation as a polynomial \(P(s_{1},\ldots,s_{n})\), where \(s_{1},\ldots,\,s_{n}\) are the elementary symmetric polynomials in \(X_{1},\ldots,X_{n}\). Using that result, and assuming that a real polynomial p(X) of degree \(2^{m}\) had \(2^{m}\) roots that could be manipulated like ordinary real numbers (in modern terms, that p(X) had roots in some field extending \(\mathbb{R}\)), Lagrange was able to establish (even to Gauss’s satisfaction) that the factors Euler had sought for p(X) would indeed have real coefficients. Only the existence of such roots remained to be justified.

Laplace’s proof:

Under the same basic assumptions that Euler and Lagrange had made (the existence of a splitting field and the intermediate-value principle), together with DeMoivre’s theorem on roots of complex numbers, proved earlier in the eighteenth century, Pierre Simon de Laplace employed Lagrange’s theorem on symmetric polynomials to prove the Fundamental Theorem of Algebra in its second formulation. A version of his proof in modern terminology, as given in Remmert (1990), pp. 120–122, goes as follows.Footnote 5

Suppose a monic polynomial p(X) of degree n ≥ 1 with real coefficients has roots \(r_{1},\ldots,r_{n}\) in some splitting field F over \(\mathbb{R}\), and rewrite n as \(2^{m}q\), where q is odd. The proof proceeds by induction on m. If m = 0, p(X) has a real root by the intermediate-value principle. For m ≥ 1, suppose that every real polynomial whose degree is of the form \(2^{k}q^{\prime}\), with \(q^{\prime}\) odd and k < m, has a complex root. Laplace then considered the symmetric polynomials over F given by

$$\displaystyle{L_{t}(X) =\prod _{1\leq i<j\leq n}(X - r_{i} - r_{j} -\mathit{tr}_{i}r_{j}),}$$

for each positive integer t. By the Fundamental Theorem on Symmetric Polynomials, each \(L_{t}\), when written as a polynomial in powers of X, has coefficients that are polynomials in the elementary symmetric polynomials of the roots of the real polynomial p(X). But the coefficient of each power \(X^{n-j}\) in p(X) is just \((-1)^{j}s_{j}(r_{1},\ldots,r_{n})\), where \(s_{j}(r_{1},\ldots,r_{n})\) is the jth elementary symmetric polynomial in \(r_{1},\ldots,r_{n}\); so each coefficient of \(L_{t}(X)\) is a real number. Moreover, each \(L_{t}(X)\) has degree

$$\displaystyle{\binom{n}{2} = \binom{2^{m}q}{2} = 2^{m-1}q[2^{m}q - 1],}$$

where \(q[2^{m}q - 1]\) is odd. By the induction hypothesis, each \(L_{t}(X)\) thus has a complex root \(c_{t}\), so for some pair \((r_{i},r_{j})\) with \(1 \leq i < j \leq n\), \(c_{t} = r_{i} + r_{j} + \mathit{tr}_{i}r_{j}\); and since there are infinitely many integers t but only finitely many pairs \((r_{i},r_{j})\) with 1 ≤ i < j ≤ n, there must be distinct integers \(t_{1}\) and \(t_{2}\) such that for the same i and j, \(c_{t_{1}} = r_{i} + r_{j} + t_{1}r_{i}r_{j}\) and \(c_{t_{2}} = r_{i} + r_{j} + t_{2}r_{i}r_{j}\) are both complex numbers. The difference \(c_{t_{1}} - c_{t_{2}} = (t_{1} - t_{2})r_{i}r_{j}\) is then also a complex number, whence so is \(r_{i}r_{j}\). Therefore \(c_{t_{1}} - t_{1}r_{i}r_{j} = r_{i} + r_{j}\) is a complex number as well. So by DeMoivre’s theorem and the quadratic formula, the roots of the polynomial \(X^{2} - (r_{i} + r_{j})X + r_{i}r_{j} = (X - r_{i})(X - r_{j})\), that is, the roots \(r_{i}\) and \(r_{j}\) of the original polynomial p(X), must be complex numbers. q.e.d.
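The degree bookkeeping in this induction can be checked in the smallest nontrivial case (an illustration added here, not part of Laplace’s argument): for a quartic, \(n = 4 = 2^{2}\cdot 1\), so m = 2, and each \(L_{t}(X)\) has degree

$$\displaystyle{\binom{4}{2} = 6 = 2^{1}\cdot 3,}$$

to which the induction hypothesis (with k = 1 < m) applies.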

8.3 Gauss’s first proof

Gauss’s doctoral dissertation, submitted to the University of Helmstedt in 1799 and written in Latin, was entitled Demonstratio nova theorematis omnem functionem algebraicam rationalem integram unius variabilis in factores reales primi vel secundi gradus resolvi posse Footnote 6 — that is, “New proof of the theorem that every rational integral algebraic function [i.e., polynomial] of one variable can be resolved into real factors of first or second degree.” Gauss thus stated the Fundamental Theorem in its original formulation, and declared his aim to be that of giving “a new and stronger proof” of that result. On the second page of the dissertation he noted the equivalent formulation in terms of complex roots, but stated that he would eschew the use of complex numbers in his demonstration. He went on to criticize earlier proofs, all of which he faulted for presuming without justification that a polynomial of degree m must possess m roots of some (unspecified) sort. To avoid that presumption, he gave a geometric argument to establish the desired factorization.

Outline of proof:

Gauss begins by proving two lemmas.

Lemma 1:

If m is any positive integer, then \(x^{2} - 2r\cos \phi x + r^{2}\) is a factor of \(\sin \phi x^{m} - r^{m-1}\sin (m\phi )x + r^{m}\sin (m - 1)\phi\).

(The latter expression is 0 if m = 1. If m = 2, the other factor is sinϕ, and if m > 2, \(\sum _{i=1}^{m-1}\sin (i\phi )r^{i-1}x^{m-i-1}\) is the other factor.)
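For m = 2 the lemma can be verified at a glance (a check added for the reader), using the identity \(\sin (2\phi ) = 2\sin \phi \cos \phi \):

$$\displaystyle{\sin \phi \,x^{2} - r\sin (2\phi )x + r^{2}\sin \phi =\sin \phi \,(x^{2} - 2r\cos \phi \,x + r^{2}).}$$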

Lemma 2:

If r and ϕ satisfy the equations

$$\displaystyle\begin{array}{rcl} r^{m}\cos (m\phi ) + \mathit{Ar}^{m-1}\cos (m - 1)\phi & +& \mathit{Br}^{m-2}\cos (m - 2)\phi +\ldots \\ & +& \mathit{Kr}^{2}\cos (2\phi ) + \mathit{Lr}\cos \phi + M = 0{}\end{array}$$
(8)

and

$$\displaystyle\begin{array}{rcl} & & r^{m}\sin (m\phi ) + \mathit{Ar}^{m-1}\sin (m - 1)\phi + \mathit{Br}^{m-2}\sin (m - 2)\phi +\ldots \\ & & \phantom{r^{m}\sin (m\phi ) + \mathit{Ar}^{m-1}\sin (m - 1)\phi + \mathit{Br}^{m-2}} + \mathit{Kr}^{2}\sin (2\phi ) + \mathit{Lr}\sin \phi = 0,{}\end{array}$$
(9)

then the expression \(x^{m} + \mathit{Ax}^{m-1} + \mathit{Bx}^{m-2} +\ldots +\mathit{Kx}^{2} + \mathit{Lx} + M\) has the factor \(x - r\cos \phi \) if \(r\sin \phi = 0\) and the factor \(x^{2} - (2r\cos \phi )x + r^{2}\) if \(r\sin \phi \neq 0\).

(Gauss notes that complex numbers are usually invoked to prove Lemma 2, but he gives an alternative proof that avoids them, based on Lemma 1.)

To prove the Fundamental Theorem it therefore suffices to show that r and ϕ can be found that satisfy the two equations of Lemma 2.Footnote 7
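In modern terms (a gloss that Gauss, who eschewed complex numbers, would not have used): by DeMoivre’s formula each power of \(x = r(\cos \phi +i\sin \phi )\) satisfies

$$\displaystyle{x^{k} = r^{k}(\cos (k\phi ) + i\sin (k\phi )),}$$

so equations (8) and (9) are precisely the real and imaginary parts of the assertion that x is a root of the polynomial; a simultaneous solution (r, ϕ) thus encodes a complex root.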

Toward that end, Gauss considers the surfaces generated by the functions

$$\displaystyle\begin{array}{rcl} & & T = r^{m}\sin (m\phi ) + \mathit{Ar}^{m-1}\sin (m - 1)\phi + \mathit{Br}^{m-2}\sin (m - 2)\phi +\ldots {}\\ & & \phantom{T = r^{m}\sin (m\phi ) + \mathit{Ar}^{m-1}\sin (m - 1)\phi + \mathit{Br}^{m-2}\sin (}+\mathit{Kr}^{2}\sin (2\phi ) + \mathit{Lr}\sin \phi {}\\ \end{array}$$

and

$$\displaystyle\begin{array}{rcl} & & U = r^{m}\cos (m\phi ) + \mathit{Ar}^{m-1}\cos (m - 1)\phi + \mathit{Br}^{m-2}\cos (m - 2)\phi +\ldots {}\\ & & \phantom{U = r^{m}\cos (m\phi ) + \mathit{Ar}^{m-1}\cos (m - 1)\phi + \mathit{Br}} + \mathit{Kr}^{2}\cos (2\phi ) + \mathit{Lr}\cos \phi + M {}\\ \end{array}$$

above and below the (r, ϕ)-plane and the traces in that plane where those surfaces intersect it.Footnote 8 The problem then becomes that of showing that there is at least one point in the (r, ϕ)-plane where the T-trace and the U-trace themselves intersect.

Further analysis shows that the T- and U-traces each contain 2m arcs that extend to infinity. Two arcs of the T-trace join to form the horizontal axis. The other arcs are each asymptotic to lines where \(\sin (m\phi ) = 0\), that is, to lines through the origin that are inclined to the axis at one of the angles \(k\pi /m\), for 0 < k < m. The arcs of the U-trace are likewise asymptotic to lines where \(\cos (m\phi ) = 0\), that is, to lines through the origin that are inclined to the axis at one of the angles \((2k - 1)\pi /2m\), for 0 < k ≤ m. Accordingly, each trace will intersect a circle of sufficiently large radius at 2m points; the 2m points of the T-trace divide its circumference into 2m intervals in which T is alternately positive and negative, and around the circle the intersection points alternate between those where a T-arc crosses and those where a U-arc does. (See Figure 8.2, based on Gauss’s own illustration for the quartic polynomial \(X^{4} - 2X^{2} + 3X + 10\). The solid curves there represent the T-arcs and the dotted ones the U-arcs.)

Fig. 8.2 Alternating T- and U-arcs

Assuming that for at least one k the arc of the T-trace that intersects the circle at point k and the arc of the T-trace that intersects the circle at point k + 2 are both part of the same continuous T-branch, and likewise that the arc of the U-trace that intersects the circle at point k + 1 and the arc of the U-trace that intersects the circle at point k + 3 are both part of the same continuous U-branch, then that U-branch, in passing from one intersection point where T is positive to another where it is negative, must at some point within the circle cross that T-branch. (See Figure 8.3, also taken from Gauss’s original text.) The theorem would thereby be proved.

Fig. 8.3 Intersecting T- and U-branches

Gauss declared that the assumption involved could be justified in “many different ways,” one of which he endeavored to outline. But ultimately he had to rely on his geometric intuition that “an algebraic curve can neither suddenly end abruptly … nor lose itself, so to speak, … after an infinity of circuits (as in the case of a logarithmic spiral)” — a fact that he believed could “be taken as [having been] sufficiently securely established” (Gauss 1890, footnote to p. 33).

Gauss’s contemporaries evidently agreed, for they found no fault with his proof. Only much later — long after Bolzano’s proof of the intermediate-value theorem, and after Kronecker, Dedekind and others, by relatively straightforward means, had shown how to construct splitting fields and thereby justified the earlier proofs of the Fundamental Theorem that had relied on those facts — did mathematicians come to regard the principle that Gauss had relied on (that a non-compact branch of an algebraic curve that enters a bounded space must eventually emerge from it) as a statement (like the Jordan Curve Theorem) that required more rigorous demonstration. The principle was finally proved rigorously by Alexander Ostrowski in 1920, using sophisticated topological notions. (See Ostrowski 1983.)

Fifty years after his receipt of the doctorate, Gauss gave another proof of the Fundamental Theorem (his fourth) that was a minor variant of the one given in his dissertation.Footnote 9 Since (as he remarked) complex numbers had by then come to be generally accepted by the mathematical community (in large part due to the Fundamental Theorem itself), he felt free to employ them in the revised version of his proof; and there he also allowed the polynomials to have complex coefficients.

8.4 Argand’s proof

The assertion that every nonconstant polynomial with complex coefficients must have a complex root was first made not by Gauss, but by Jean Robert Argand, a Paris bookkeeper who introduced the planar representation of complex numbers now named after him;Footnote 10 and in 1814 Argand gave his own, very different proof of that result (Argand 1814), for whose understanding nothing beyond some basic knowledge of advanced calculus is required.

Specifically, Argand’s proof depends on knowing (1) that polynomials are continuous functions; (2) that every continuous function defined on a closed disc | z | ≤ R assumes a minimum in that disc; and (3) that every complex number has a kth root for each integer k > 1 (an immediate consequence of DeMoivre’s Theorem).Footnote 11 The proof then proceeds as follows:

Given a polynomial \(p(z) = a_{n}z^{n} + a_{n-1}z^{n-1} +\ldots +a_{1}z + a_{0}\) with n ≥ 1, |p(z)| approaches ∞ as |z| does (for \(\vert z\vert \geq 1\), \(\vert p(z)\vert \geq \vert a_{n}\vert \vert z\vert ^{n} - (\vert a_{n-1}\vert +\ldots +\vert a_{0}\vert )\vert z\vert ^{n-1}\)), so for any positive constant C, there is an R > 0 such that |p(z)| > C for |z| > R. Taking \(C =\inf _{z\in \mathbb{C}}\vert p(z)\vert \), it follows that \(\inf _{z\in \mathbb{C}}\vert p(z)\vert =\inf _{\vert z\vert \leq R}\vert p(z)\vert \), so by (2), |p(z)| assumes a minimum for some \(z_{0}\) with \(\vert z_{0}\vert \leq R\). That \(p(z_{0}) = 0\) then follows directly from

Argand’s inequality:

For any polynomial p(z) of degree n ≥ 1, if \(p(z_{0})\neq 0\) then there is a \(z_{1} \in \mathbb{C}\) for which \(\vert p(z_{1})\vert < \vert p(z_{0})\vert \).

Sketch of proof: Since \(p(z_{0})\neq 0\), we can divide p(z) by \(p(z_{0})\) to obtain a polynomial q(z) of the same degree for which \(q(z_{0}) = 1\); and if we define \(h(z) = q(z + z_{0})\), then h(0) = 1, so h(z) may be written as

$$\displaystyle\begin{array}{rcl} 1 + b_{1}z + b_{2}z^{2} +\ldots +b_{ n}z^{n}& =& 1 + b_{ k}z^{k} + b_{ k+1}z^{k+1} +\ldots +b_{ n}z^{n} {}\\ & =& 1 + b_{k}z^{k} + z^{k}[b_{ k+1}z +\ldots +b_{n}z^{n-k}], {}\\ \end{array}$$

where k is the least index i for which \(b_{i}\neq 0\). The expression in brackets is a polynomial that has z = 0 as a root, so it is continuous at z = 0 and therefore can be made arbitrarily small for z sufficiently close to 0. If r is any kth root of \(-1/b_{k}\), the triangle inequality then shows that |h(rt)| < 1 for sufficiently small positive real numbers t. For any such t, setting \(z_{1} = z_{0} + \mathit{rt}\) yields \(\vert p(z_{1})\vert < \vert p(z_{0})\vert \).
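The final estimate can be spelled out (an expansion of the sketch, in the same notation): since \(b_{k}r^{k} = -1\), for 0 < t < 1 we have

$$\displaystyle{\vert h(\mathit{rt})\vert = \vert 1 - t^{k} + (\mathit{rt})^{k}[b_{k+1}\mathit{rt} +\ldots +b_{n}(\mathit{rt})^{n-k}]\vert \leq 1 - t^{k} + t^{k}\vert r\vert ^{k}\,\vert b_{k+1}\mathit{rt} +\ldots \vert,}$$

which is less than 1 as soon as t is small enough that the bracketed expression has modulus less than \(1/\vert r\vert ^{k}\).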

8.5 Gauss’s second proof

Despite the acceptance of his 1799 proof by other mathematicians of his time, Gauss published a second proof in 1815Footnote 12 that was based on algebraic rather than geometric principles. In his opening remarks, he maintained that his first proof “probably [sic] leaves nothing more to be desired with respect to rigor or simplicity”.Footnote 13 He deemed his new proof to be “no less rigorous” than the first, but he did not claim it was simpler (as it certainly was not). Why then did he offer it?

Perhaps, as the qualifying wohl might be taken to suggest, he did after all harbor some doubts about whether the geometric principles he had invoked in his first proof had been rigorously established; but at least ostensibly, he was concerned about purity of method.

Like the proofs of Laplace and Lagrange, Gauss’s 1815 proof was by induction on the highest power of 2 dividing the degree of the polynomial Y (x). But Gauss avoided having to assume that Y (x) had any roots by working in terms of indeterminates \(a_{1},a_{2},\ldots,a_{n}\). He defined various auxiliary polynomials symmetric in those indeterminates, and applied the Fundamental Theorem on Symmetric Polynomials to each.Footnote 14

The very long proof is divided into twenty numbered sections. To make the argument self-contained, Gauss first established a number of preliminary results: Given two polynomials \(Y _{1}(x)\) and \(Y _{2}(x)\), whose coefficients might include other indeterminates in addition to x, he defined their greatest common divisor and used the Euclidean algorithm to show that the g.c.d. must be a linear combination of \(Y _{1}(x)\) and \(Y _{2}(x)\), so that, in particular, if \(Y _{1}(x)\) and \(Y _{2}(x)\) have no common divisor of positive degree, there must be polynomials \(Z_{1}(x)\) and \(Z_{2}(x)\) such that \(Z_{1}(x)Y _{1}(x) + Z_{2}(x)Y _{2}(x) = 1\), and conversely. In section 3, for any positive integer m he defined the elementary symmetric polynomials in the indeterminates \(a_{1},a_{2},\ldots,a_{m}\) and noted that any polynomial function of them (again, possibly containing other indeterminates as well) must also be symmetric in \(a_{1},a_{2},\ldots,a_{m}\). Section 4 was devoted to proving the converse (the fundamental theorem), and section 5 to establishing the uniqueness of that representation.

Then, for any integer m ≥ 1, he defined \(\pi _{m}\) to be the symmetric polynomial in the indeterminates \(a_{1},a_{2},\ldots,a_{m}\) given by

$$\displaystyle{ \pi _{m} =\prod _{\begin{array}{c}1\leq i,j\leq m \\ i\neq j \end{array}}(a_{i} - a_{j}) = (-1)^{\binom{m}{2}}\prod _{ \begin{array}{c}1\leq i,j\leq m\\ i<j\end{array}}(a_{i} - a_{j})^{2}. }$$
(10)

By the fundamental theorem, \(\pi _{m}\) can be represented as a polynomial in the elementary symmetric polynomials \(s_{1},s_{2},\ldots,s_{m}\) of the indeterminates \(a_{1},a_{2},\ldots,a_{m}\). Let \(p_{m}\) be the polynomial in the indeterminates \(i_{1},i_{2},\ldots,i_{m}\) obtained from the latter polynomial by replacing each occurrence of \(s_{j}\) therein by \(i_{j}\), for 1 ≤ j ≤ m.Footnote 15 Then, given a monic polynomial \(Y (x) = x^{m} - C_{1}x^{m-1} + C_{2}x^{m-2} -\ldots \pm C_{m}\) of degree m with real \(C_{1},C_{2},\ldots,C_{m}\), let \(P_{Y }\) be the real number obtained by substituting for each \(i_{j}\) in \(p_{m}\) the value \(C_{j}\) from Y (that is, \(P_{Y }\) is the value of \(p_{m}(i_{1},i_{2},\ldots,i_{m})\) at \((C_{1},C_{2},\ldots,C_{m})\)).
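A worked instance for m = 2 (added for concreteness) makes the construction transparent: \(\pi _{2} = (a_{1} - a_{2})(a_{2} - a_{1}) = -(a_{1} - a_{2})^{2} = 4s_{2} - s_{1}^{2}\), so \(p_{2}(i_{1},i_{2}) = 4i_{2} - i_{1}^{2}\), and for \(Y (x) = x^{2} - C_{1}x + C_{2}\),

$$\displaystyle{P_{Y } = 4C_{2} - C_{1}^{2},}$$

which vanishes exactly when Y has a repeated root, in accordance with Lemma 1 below.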

Sections 6–9 of Gauss’s paper are devoted to proving

Lemma 1:

If \(Y ^{{\prime}}(x)\) is the (formal) derivative of Y (x), then Y (x) and \(Y ^{{\prime}}(x)\) have a nonconstant common factor if and only if \(P_{Y } = 0\).

Gauss began by noting that if Y (x) could be decomposed into (not necessarily distinct) linear factors, say

$$\displaystyle{Y (x) = (x - A_{1})(x - A_{2})\cdots (x - A_{m}),}$$

then

$$\displaystyle{Y (x) = x^{m} - (\sum _{ 1\leq i\leq m}A_{i})x^{m-1} + (\sum _{ \begin{array}{c}1\leq i<j\leq m\end{array}}A_{i}A_{j})x^{m-2} -\ldots \pm \prod _{ 1\leq i\leq m}A_{i}\,;}$$

that is, for each 1 ≤ i ≤ m, \(C_{i} = s_{i}(A_{1},A_{2},\ldots,A_{m})\). Hence

$$\displaystyle\begin{array}{rcl} P_{Y }& =& p_{m}\,(s_{1}(A_{1},A_{2},\ldots,A_{m}),s_{2}(A_{1},A_{2},\ldots,A_{m}),\ldots,s_{m}(A_{1},A_{2},\ldots,A_{m})) {}\\ & =& \prod _{\begin{array}{c}1\leq i,j\leq m \\ i\neq j \end{array}}(A_{i} - A_{j}), {}\\ \end{array}$$

so \(P_{Y } = 0\) if and only if for some distinct i, j, \(A_{i} = A_{j}\). It then follows directly from the product rule, applied to the factored form of Y (x), that the latter condition holds if and only if Y (x) and \(Y ^{{\prime}}(x)\) have some factor \((x - A_{i})\) in common.

To avoid circularity in proving the Fundamental Theorem, however, it was necessary to prove the Lemma without presuming that Y (x) could be decomposed into linear factors. To do so, Gauss noted that any monic polynomial

$$\displaystyle{Y (x) = x^{m} - C_{ 1}x^{m-1} + C_{ 2}x^{m-2} -\ldots \pm C_{ m}}$$

in the indeterminate x could be regarded as a substitution instance of the polynomial

$$\displaystyle{ y(x,i_{1},\ldots,i_{m}) = x^{m} - i_{ 1}x^{m-1} + i_{ 2}x^{m-2} -\ldots \pm i_{ m}, }$$
(11)

in which the indeterminates \(i_{1},\ldots,i_{m}\) had been replaced by the real numbers \(C_{1},\ldots,C_{m}\). If, on the other hand, those indeterminates were replaced by the elementary symmetric polynomials \(s_{1},\ldots,s_{m}\) of the indeterminates \(a_{1},\ldots,a_{m}\), the resulting polynomial ν could be written as

$$\displaystyle{ \nu =\prod _{ i=1}^{m}(x - a_{ i}). }$$
(12)

As a tool for proving the ‘only if’ direction of the Lemma, Gauss considered the polynomial in the indeterminates \(x,a_{1},a_{2},\ldots,a_{m}\), symmetric in \(a_{1},a_{2},\ldots,a_{m}\), given by

$$\displaystyle{\rho = \pi _{m}\sum _{i=1}^{m}\ \prod _{ \begin{array}{c}1\leq j\leq m\\ j\neq i\end{array}} \frac{(x - a_{j})} {(a_{i} - a_{j})^{2}}}$$

(That ρ is a polynomial in \(x,a_{1},a_{2},\ldots,a_{m}\) follows from the rightmost member of equation (10), which shows that the denominator in each summand of ρ divides \(\pi _{m}\).) If 1 ≤ k ≤ m and \(x = a_{k}\), then only one summand of ρ and one summand of the derivative \(\nu ^{{\prime}}\) are non-zero (in each case, that for which i = k), and \(\rho \nu ^{{\prime}} = \pi _{m}\). Thus for each integer k between 1 and m, \(x - a_{k}\) is a factor of \(\pi _{m} -\rho \nu ^{{\prime}}\), so ν divides \(\pi _{m} -\rho \nu ^{{\prime}}\).

The quotient, σ, is then another polynomial in \(x,a_{1},a_{2},\ldots,a_{m}\) symmetric in all the a i . Applying the Fundamental Theorem on Symmetric Polynomials to each term in the equation \(\pi _{m} =\sigma \nu +\rho \nu ^{{\prime}}\) and replacing each elementary symmetric polynomial s j therein by the indeterminate i j produces the equation

$$\displaystyle{ p_{m} = s(x)y(x) + r(x)y^{{\prime}}(x), }$$
(13)

where r(x) and s(x) are polynomials in \(x,i_{1},i_{2},\ldots,i_{m}\), and y(x) is the polynomial defined in (11). Replacing each \(i_{j}\) in (13) by the real number \(C_{j}\) then yields \(P_{Y } = S(x)Y (x) + R(x)Y ^{{\prime}}(x)\), whose left member is a real number and whose right member is a polynomial in x. If \(P_{Y }\neq 0\), division by \(P_{Y }\) yields \(1 = \dfrac{S(x)} {P_{Y }} Y (x) + \dfrac{R(x)} {P_{Y }} Y ^{{\prime}}(x)\); so Y (x) and \(Y ^{{\prime}}(x)\) have no nonconstant common factor unless \(P_{Y } = 0\).

Conversely, if Y (x) and \(Y ^{{\prime}}(x)\) have no nonconstant common factor, there are polynomials f(x) and ϕ(x) such that \(f(x)Y (x) +\phi (x)Y ^{{\prime}}(x) = 1\). Moving the left member of that equation to the right and adding \(f(x)\nu +\phi (x)\nu ^{{\prime}}\) to both sides then gives

$$\displaystyle{ f(x)\nu +\phi (x)\nu ^{{\prime}} = 1 + f(x)(\nu -Y (x)) +\phi (x)(\nu -Y (x))^{{\prime}}. }$$
(14)

Gauss abbreviates the expression \(f(x)(y(x) - Y (x)) +\phi (x)(y(x) - Y (x))^{{\prime}}\), a polynomial in the indeterminates \(x,i_{1},\ldots,i_{m}\), by \(F(x,i_{1},\ldots,i_{m})\), and the right member of (14), regarded as a polynomial in x and the elementary symmetric polynomials \(s_{1},\ldots,s_{m}\), by \(1 + F(x,s_{1},\ldots,s_{m})\). Similarly, he uses \(F(x,C_{1},\ldots,C_{m})\) to denote the result of replacing each \(i_{k}\) in \(F(x,i_{1},\ldots,i_{m})\) by \(C_{k}\); and since y(x) is thereby transformed into Y (x), it follows that for any value of x,

$$\displaystyle{ F(x,C_{1},\ldots,C_{m}) = 0. }$$
(15)

Gauss next applies the product rule to ν, which, for any j between 1 and m, yields

$$\displaystyle{ \nu ^{{\prime}} =\prod _{ \begin{array}{c}1\leq i\leq m\\ i\neq j\end{array}}(x - a_{i}) + (x - a_{j})\bigg(\prod _{\begin{array}{c}1\leq i\leq m \\ i\neq j \end{array}}(x - a_{i})\bigg)^{{\prime}}. }$$
(16)

Replacing \(\nu ^{{\prime}}\) in (14) by the expression on the right of (16) and setting \(x = a_{j}\) successively for each j between 1 and m then gives the sequence of equations

$$\displaystyle{\phi (a_{j})\prod _{\begin{array}{c}1\leq i\leq m \\ i\neq j \end{array}}(a_{j} - a_{i}) = 1 + F(a_{j},s_{1},\ldots,s_{m})\quad \quad (1 \leq j \leq m).}$$

The expressions on either side of each equation in that sequence are polynomials in the indeterminates \(a_{1},\ldots,a_{m}\), and each side of the product of those m equations is symmetric in \(a_{1},\ldots,a_{m}\):

$$\displaystyle{ \pi _{m}\phi (a_{1})\phi (a_{2})\ldots \phi (a_{m}) =\prod _{ i=1}^{m}(1 + F(a_{ i},s_{1},\ldots,s_{m})). }$$
(17)

The Fundamental Theorem on Symmetric Polynomials thus ensures that there are polynomials in the indeterminates \(i_{1},\ldots,i_{m}\), say t and ψ, such that t is transformed into \(\phi (a_{1})\phi (a_{2})\ldots \phi (a_{m})\) and ψ into \(\prod _{i=1}^{m}(1 + F(a_{i},s_{1},\ldots,s_{m}))\) when each \(i_{j}\) is replaced by \(s_{j}\). From (17) it then follows that \(p_{m}t =\psi \).

Replacing each indeterminate \(i_{j}\) in this last equation by the real number \(C_{j}\), and writing T for the resulting value of t, we conclude from (15) that \(P_{Y }T = 1\). Hence \(P_{Y }\neq 0\).

Lemma 1, just proved, implies that if \(P_{Y } = 0\), then Y (x) must have a nontrivial factor; so repeating that argument, if need be, shows that it must in fact have a factor Q for which \(P_{Q}\neq 0\). Hence without loss of generality we may assume that \(P_{Y }\neq 0\). Moreover, if Y (x) has degree \(m = k2^{\mu }\), where k is odd, then at least one factor of Y (x) must be of degree \(l2^{\tau }\), where l is odd and τ ≤ μ, for otherwise the power of 2 in m would exceed μ.

Gauss went on in section 12 to consider the polynomial

$$\displaystyle{ \zeta (u,x) =\prod [u - (a_{i} + a_{j})x + a_{i}a_{j}], }$$
(18)

where the product is taken over the \(m(m - 1)/2\) unordered pairs \(\{a_{i},a_{j}\},i\neq j\), of the indeterminates \(a_{1},\ldots,a_{m}\). Once again, ζ is symmetric in those indeterminates, so there is a function z(u, x) in the indeterminates \(i_{1},\ldots,i_{m}\) that transforms to ζ when the latter indeterminates are replaced by \(s_{1},\ldots,s_{m}\), and a function Z(u, x) that results from z(u, x) when each \(i_{j}\) is replaced by \(C_{j}\) (the jth coefficient of Y ). Regarded as functions of u alone, ζ and z are monic polynomials of degree \(n = m(m - 1)/2\), with coefficients that are polynomials in \(x,a_{1},\ldots,a_{m}\) and in \(x,i_{1},\ldots,i_{m}\), respectively, and Z is a monic polynomial of degree n in u whose coefficients are polynomials in x, say \(c_{1}(x),\ldots,c_{n}(x)\). We may then consider the discriminant \(P_{Z}(x)\) of Z, that is, the function \(p_{n}(c_{1}(x),\ldots,c_{n}(x))\). (See the last footnote above.)
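Here too the case m = 2 (a small illustration, not in Gauss’s paper) shows what the definitions amount to: there is only the single pair \(\{a_{1},a_{2}\}\), so n = 1 and

$$\displaystyle{\zeta (u,x) = u - (a_{1} + a_{2})x + a_{1}a_{2} = u - s_{1}x + s_{2},}$$

whence \(z(u,x) = u - i_{1}x + i_{2}\) and \(Z(u,x) = u - C_{1}x + C_{2}\).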

Gauss’s next task was to prove

Lemma 2:

If \(P_{Y }\neq 0\), then \(P_{Z}(x)\) cannot be identically zero.

He noted that, once again, that would be straightforward if Y (x) were a product of linear factors. To establish the result without that assumption, he observed that the discriminant of ζ is the product of all the n(n − 1) non-zero differences of distinct pairs of the n expressions \((a_{i} + a_{j})x - a_{i}a_{j}\). Hence the discriminants of ζ and of z, regarded as polynomials in x, each have degree \(d = n(n - 1) = \dfrac{1} {4}m(m - 1)(m + 1)(m - 2)\), while the discriminant \(P_{Z}(x)\) of Z may have lesser degree if the particular values of the \(C_{i}\) cause the coefficient of \(x^{d}\) in \(P_{Z}(x)\) to vanish. The problem is to show that not all the coefficients of \(P_{Z}(x)\) will vanish.

Closer examination of the discriminant of ζ reveals that that product may be split into two groups of factors, the first consisting of those differences of the form

$$\displaystyle{[(a_{i} + a_{j})x - a_{i}a_{j}] - [(a_{i} + a_{k})x - a_{i}a_{k}] = (a_{j} - a_{k})(x - a_{i}),}$$

for distinct i, j, k, and the second of those differences of the form

$$\displaystyle{ [(a_{i} + a_{j})x - a_{i}a_{j}] - [(a_{k} + a_{l})x - a_{k}a_{l}] = (a_{i} + a_{j} - a_{k} - a_{l})x - a_{i}a_{j} + a_{k}a_{l}, }$$
(19)

for distinct i, j, k, l. In the first group, each factor \((a_{j} - a_{k})\) will occur m − 2 times (once for each value of i distinct from j and k), whereas each factor \((x - a_{i})\) will occur \((m - 1)(m - 2)\) times (once for every ordered pair of distinct values j, k different from i). If the product of the second group of factors (a polynomial symmetric in \(a_{1},\ldots,a_{m}\)) be denoted by κ, then from (10) and (12), the discriminant of ζ is \(\pi _{m}^{m-2}\nu ^{(m-1)(m-2)}\kappa\).

Likewise, if \(k(x,i_{1},\ldots,i_{m})\) is the function that transforms into κ when each \(i_{j}\) in it is replaced by \(s_{j}\), and K(x) is the result of replacing each such \(i_{j}\) by \(C_{j}\), then the discriminant of z is \(p_{m}^{m-2}y^{(m-1)(m-2)}k\) and that of Z, \(P_{Z}\), is \(P_{Y }^{m-2}Y ^{(m-1)(m-2)}K\). Since \(P_{Y }\neq 0\) by assumption, it remains to show that K is not identically zero.

Toward that end Gauss introduced the function of the new indeterminate w given by

$$\displaystyle{ \prod [(a_{i} + a_{j} - a_{k} - a_{l})w + (a_{i} - a_{k})(a_{i} - a_{l})], }$$
(20)

where i, j, k, l are distinct integers between 1 and m and no factors are repeated. (Note that each factor is invariant under the interchange of \(a_{k}\) and \(a_{l}\), and so would appear twice in the product if repeated factors were allowed.) That product is symmetric in the indeterminates \(a_{1},\ldots,a_{m}\), so it is uniquely expressible as a polynomial function \(f(w,s_{1},\ldots,s_{m})\) of the elementary symmetric polynomials and w. Since the number of factors in the product is \(\dfrac{1} {2}m(m - 1)(m - 2)(m - 3)\), the degree of each substitution instance of \(f(w,s_{1},\ldots,s_{m})\) is at most that. Also,

$$\displaystyle\begin{array}{rcl} f(0,s_{1},\ldots,s_{m})& =& \pi _{m}^{(m-2)(m-3)}, {}\\ f(0,i_{1},\ldots,i_{m})& =& p_{m}^{(m-2)(m-3)},\quad \text{and} {}\\ f(0,C_{1},\ldots,C_{m})& =& P_{Y }^{(m-2)(m-3)}. {}\\ \end{array}$$

In particular, the last equation shows that the constant term of

$$\displaystyle{ f(w,C_{1},\ldots,C_{m}) }$$

does not vanish.

Let the non-zero term of highest degree in \(f(w,C_{1},\ldots,C_{m})\) be \(\mathit{Nw}^{\gamma }\). Then for each j between 1 and m, if w be replaced by \(x - a_{j}\), \(f(x - a_{j},C_{1},\ldots,C_{m})\) may be regarded as a polynomial in x whose leading term is \(\mathit{Nx}^{\gamma }\) and whose other coefficients depend upon \(a_{j}\). Consequently

$$\displaystyle{ \prod _{j=1}^{m}f(x - a_{ j},C_{1},\ldots,C_{m}) }$$
(21)

is a polynomial in x with leading term \(N^{m}x^{m\gamma }\), in which the coefficients of the remaining terms are functions of \(a_{1},\ldots,a_{m}\).

Similarly,

$$\displaystyle{\prod _{j=1}^{m}f(x - a_{ j},i_{1},\ldots,i_{m})}$$

is a polynomial function of \(x,a_{1},\ldots,a_{m},i_{1},\ldots,i_{m}\) symmetric in \(a_{1},\ldots,a_{m}\), which by the Fundamental Theorem on Symmetric Polynomials may be rewritten as a polynomial \(\varphi (x,s_{1},\ldots,s_{m},i_{1},\ldots,i_{m})\). Replacing each i j by s j then yields

$$\displaystyle{\varphi (x,s_{1},\ldots,s_{m},s_{1},\ldots,s_{m}) =\prod _{ j=1}^{m}f(x - a_{ j},s_{1},\ldots,s_{m}).}$$

Now, for any fixed value of i, when w is replaced by \(x - a_{i}\), the factor

$$\displaystyle{(a_{i} + a_{j} - a_{k} - a_{l})w + (a_{i} - a_{k})(a_{i} - a_{l})}$$

in (20) reduces after cancellation of like terms to

$$\displaystyle{(a_{i} + a_{j} - a_{k} - a_{l})x - a_{i}a_{j} + a_{k}a_{l},}$$

which is the same as the right member of (19). So each factor of κ is also a factor of \(\varphi (x,s_{1},\ldots,s_{m},s_{1},\ldots,s_{m})\); that is, κ divides \(\varphi (x,s_{1},\ldots,s_{m},s_{1},\ldots,s_{m})\), say \(\varphi (x,s_{1},\ldots,s_{m},s_{1},\ldots,s_{m}) =\kappa \chi (x,s_{1},\ldots,s_{m})\). Therefore also

$$\displaystyle{\varphi (x,C_{1},\ldots,C_{m},C_{1},\ldots,C_{m}) = K\chi (x,C_{1},\ldots,C_{m}).}$$

But \(\varphi (x,s_{1},\ldots,s_{m},C_{1},\ldots,C_{m})\) is the product in (21), which has leading term \(N^{m}x^{m\gamma }\), not involving any of \(s_{1},\ldots,s_{m}\); so \(N^{m}x^{m\gamma }\) must also be the leading term of \(\varphi (x,C_{1},\ldots,C_{m},C_{1},\ldots,C_{m})\). In particular, \(\varphi (x,C_{1},\ldots,C_{m},C_{1},\ldots,C_{m})\) does not vanish identically. Therefore neither does K, as was to be shown.

Before beginning the induction that lay at the heart of his second proof, Gauss stated and proved one final lemma.

Lemma 3:

Let \(\Phi (u,x)\) denote the product \(\prod _{i=1}^{k}(\alpha _{i} +\beta _{i}u +\gamma _{i}x)\) of any number of factors linear in the indeterminates u and x, and let v be another indeterminate. Then the function

$$\displaystyle{\Omega = \Phi (u + v\dfrac{\partial \Phi } {\partial x},x - v\frac{\partial \Phi } {\partial u} )}$$

is divisible by \(\Phi (u,x)\).

Proof:

For each i between 1 and k, we have

$$\displaystyle{\Phi (u,x) = (\alpha _{i} +\beta _{i}u +\gamma _{i}x)Q_{i},}$$

where \(Q_{i}\) denotes the product of all the factors \(\alpha _{j} +\beta _{j}u +\gamma _{j}x\) with \(j\neq i\). (Each \(Q_{i}\) is thus a polynomial in \(u,x,\alpha _{j},\beta _{j},\gamma _{j}\), for \(1 \leq j \leq k,j\neq i\).) So

$$\displaystyle{\frac{\partial \Phi } {\partial x} = \gamma _{i}Q_{i} + (\alpha _{i} +\beta _{i}u +\gamma _{i}x)\frac{\partial Q_{i}} {\partial x} \quad \text{ and }\quad \frac{\partial \Phi } {\partial u} = \beta _{i}Q_{i} + (\alpha _{i} +\beta _{i}u +\gamma _{i}x)\frac{\partial Q_{i}} {\partial u}.}$$

Substituting the expressions on the right sides of those equations into the corresponding factor

$$\displaystyle{\alpha _{i} +\beta _{i}u +\gamma _{i}x +\beta _{i}v\frac{\partial \Phi } {\partial x} -\gamma _{i}v\frac{\partial \Phi } {\partial u}}$$

of \(\Omega \) yields

$$\displaystyle{(\alpha _{i} +\beta _{i}u +\gamma _{i}x)(1 +\beta _{i}v\frac{\partial Q_{i}} {\partial x} -\gamma _{i}v\frac{\partial Q_{i}} {\partial u} ),}$$

and consequently,

$$\displaystyle{\Omega = \Phi (u,x)\prod _{i=1}^{k}(1 +\beta _{ i}v\frac{\partial Q_{i}} {\partial x} -\gamma _{i}v\frac{\partial Q_{i}} {\partial u} ).\qquad \qquad \text{q.e.d.}}$$

When applied to \(\zeta (u,x) = z(u,x,s_{1},\ldots,s_{m})\), Lemma 3 shows that ζ divides

$$\displaystyle{z(u + v \frac{\partial \zeta } {\partial x},x - v \frac{\partial \zeta } {\partial u},s_{1},\ldots,s_{m}),}$$

say with quotient \(\Psi (u,x,v,s_{1},\ldots,s_{m})\). Likewise,

$$\displaystyle{z(u + v\frac{\partial z} {\partial x},x - v\frac{\partial z} {\partial u},i_{1},\ldots,i_{m}) = z(u,x)\Psi (u,x,v,i_{1},\ldots,i_{m})}$$

and

$$\displaystyle{Z(u + v\frac{\partial Z} {\partial x},x - v\frac{\partial Z} {\partial u},C_{1},\ldots,C_{m}) = Z(u,x)\Psi (u,x,v,C_{1},\ldots,C_{m}).}$$

Now, given definite values U and X for the indeterminates u and x, let \(U^{{\prime}}\) and \(X^{{\prime}}\) denote

$$\displaystyle{ \left.\dfrac{\partial Z} {\partial u} \right \vert _{(U,X)}\quad \text{and}\qquad \left.\dfrac{\partial Z} {\partial x} \right \vert _{(U,X)}, }$$

respectively. Then

$$\displaystyle{ Z(U + \mathit{vX}^{{\prime}},X -\mathit{vU}^{{\prime}}) = Z(U,X)\Psi (U,X,v,C_{ 1},\ldots,C_{m}). }$$

If \(U^{{\prime}}\neq 0\), replacing v by \(\dfrac{X - x} {U^{{\prime}}}\) yields

$$\displaystyle{ Z(U + \frac{\mathit{XX}^{{\prime}}} {U^{{\prime}}} -\frac{\mathit{xX}^{{\prime}}} {U^{{\prime}}},x) = Z(U,X)\Psi (U,X, \frac{X - x} {U^{{\prime}}},C_{1},\ldots,C_{m}). }$$
(22)

In other words, setting \(u = U + \dfrac{X - x} {U^{{\prime}}} X^{{\prime}}\) transforms Z(u, x) into

$$\displaystyle{ Z(U,X)\Psi (U,X, \dfrac{X - x} {U^{{\prime}}},C_{1},\ldots,C_{m}). }$$

By Lemma 2, the assumption that \(P_{Y }\neq 0\) implies that the polynomial \(P_{Z}\) does not identically vanish. Therefore \(P_{Z}(x) = 0\) for only finitely many values of x, so a real number X may be chosen for which \(P_{Z}(X)\neq 0\); that is, the discriminant of the function Z(u, X) is non-zero. By Lemma 1, the polynomial Z(u, X) and its derivative \(\dfrac{\mathit{dZ}} {\mathit{du}}\) thus have no common factor. Also, as noted earlier, Z(u, X) has degree \(n = m(m - 1)/2\) in u, where \(m = k2^{\mu }\), k odd, is the degree of Y (x). Hence \(n = (k2^{\mu })(k2^{\mu } - 1)/2 = (k^{2})2^{2\mu -1} - k2^{\mu -1} = [k(k2^{\mu } - 1)]2^{\mu -1}\). The quantity in brackets in the last member of that equation is odd, so the power of 2 in n is less than the power of 2 in m. Therefore we may assume by induction that there is a real or complex value U for which Z(U, X) = 0.Footnote 16 By the factor theorem, \(u - U\) must then be a factor of Z(u, X), but, ipso facto, not a factor of \(\dfrac{\mathit{dZ}} {\mathit{du}}\). So \(U^{{\prime}}\neq 0\), again by the factor theorem.

For those particular values of X and U, the right-hand member of (22) is identically zero, independent of the value of x. Thus Z(u, x), regarded as a polynomial in u with coefficients that are polynomials in x, vanishes when \(u = U + \dfrac{X - x} {U^{{\prime}}} X^{{\prime}}\), and so has \(u - U -\dfrac{X - x} {U^{{\prime}}} X^{{\prime}}\) as a factor. If we then let \(u = x^{2}\), the polynomial \(Z(x^{2},x)\) must have the quadratic polynomial

$$\displaystyle{x^{2} - U -\dfrac{X - x} {U^{{\prime}}} X^{{\prime}} = x^{2} + \frac{X^{{\prime}}} {U^{{\prime}}}x - (U + \frac{\mathit{XX}^{{\prime}}} {U^{{\prime}}} )}$$

as a factor. The quadratic formula then provides a real or complex root of \(Z(x^{2},x)\).

Finally, recall that \(z(x^{2},x,i_{1},\ldots,i_{m})\) is the unique polynomial that transforms into \(\zeta (x^{2},x)\) when each \(i_{j}\) is replaced by the elementary symmetric polynomial \(s_{j}\), and that \(Z(x^{2},x)\) is obtained from \(z(x^{2},x,i_{1},\ldots,i_{m})\) by replacing each \(i_{j}\) by the coefficient \(C_{j}\) of Y (x). But

$$\displaystyle\begin{array}{rcl} & & \zeta (x^{2},x) =\prod [x^{2} - (a_{ i} + a_{j})x + a_{i}a_{j}] =\prod (x - a_{i})(x - a_{j}) {}\\ & & \phantom{\zeta (x^{2},x) =\prod [x^{2} - (a_{ i} + a_{j})x + a_{i}a_{j}] =\prod (x-} =\prod _{ i=1}^{m}(x - a_{ i})^{m-1} = \nu ^{m-1} {}\\ \end{array}$$

(where the first two products are taken over all unordered pairs \(\{a_{i},a_{j}\},i\neq j\); cf. (18) and (12) above), and the unique polynomial in \(x,i_{1},\ldots,i_{m}\) that transforms into \(\nu ^{m-1}\) when each \(i_{j}\) is replaced by \(s_{j}\) is \((y(x))^{m-1}\) (cf. (11)). Therefore \(z(x^{2},x) = (y(x))^{m-1}\) and \(Z(x^{2},x) = (Y (x))^{m-1}\), so the root found for \(Z(x^{2},x)\) is also a root of Y (x), completing the proof of the theorem.

The proof just given is remarkable not only for purity of method but for its economy of means. The principal tool invoked is the Fundamental Theorem on Symmetric Polynomials, applied over and over again to a sequence of carefully chosen polynomials, and the only non-algebraic principle used is the intermediate-value theorem. As such it is a tour de force. Its length, however, is a pedagogical deterrent, and the justifications for some of the definitions and substitutions employed become apparent only with hindsight; it is thus less perspicuous than Argand’s nearly contemporaneous argument.

8.6 Proofs based on integration

The proofs of the Fundamental Theorem of Algebra discussed above are all direct proofs. There are indirect proofs as well, several of which are based on the theory of integration. Among them is another by Gauss, published just one year after his second.Footnote 17

Gauss’s third proof:

As in his first proof, given a monic polynomial

$$\displaystyle{ Y = x^{m} + A_{ 1}x^{m-1} +\ldots +A_{ m-1}x + A_{m} }$$

with real coefficients, Gauss began by writing the variable x in polar form as \(x = r(\cos \phi +i\sin \phi )\) and considered the real and imaginary parts of Y, which he denoted by t and u. Thus (replacing Gauss’s \(A,B,\ldots,M\) by \(A_{1},\ldots,A_{m}\))

$$\displaystyle{t =\sum _{ j=0}^{m}A_{ j}r^{m-j}\cos (m - j)\phi \quad \text{and}\quad u =\sum _{ j=0}^{m}A_{ j}r^{m-j}\sin (m - j)\phi,}$$

where \(A_{0} = 1\) and, for 1 ≤ j ≤ m, the coefficients \(A_{j}\) are arbitrary real numbers. He further defined

$$\displaystyle\begin{array}{rcl} t^{{\prime}}& =& \sum _{ j=0}^{m}(m - j)A_{ j}r^{m-j}\cos (m - j)\phi, {}\\ u^{{\prime}}& =& \sum _{ j=0}^{m}(m - j)A_{ j}r^{m-j}\sin (m - j)\phi, {}\\ t^{{\prime\prime}}& =& \sum _{ j=0}^{m}(m - j)^{2}A_{ j}r^{m-j}\cos (m - j)\phi, {}\\ u^{{\prime\prime}}& =& \sum _{ j=0}^{m}(m - j)^{2}A_{ j}r^{m-j}\sin (m - j)\phi,\quad \text{and} {}\\ y\ & =& \frac{(t^{2} + u^{2})(\mathit{tt}^{{\prime\prime}} + \mathit{uu}^{{\prime\prime}}) + (\mathit{tu}^{{\prime}}-\mathit{ut}^{{\prime}})^{2} - (\mathit{tt}^{{\prime}} + \mathit{uu}^{{\prime}})^{2}} {r(t^{2} + u^{2})^{2}} \,, {}\\ \end{array}$$

and stipulated that R should be a real number greater than the largest of the numbers \((m\vert A_{j}\vert \sqrt{2})^{1/j}\), for 1 ≤ j ≤ m. He then claimed that setting r = R would ensure that \(\mathit{tt}^{{\prime}} + \mathit{uu}^{{\prime}}\) was positive, for any angle ϕ.

Proof of claim:

Corresponding to the definitions of t, u, \(t^{{\prime}}\) and \(u^{{\prime}}\), let

$$\displaystyle\begin{array}{rcl} T& =& \sum _{j=0}^{m}A_{ j}R^{m-j}\cos ( \frac{\pi } {4} + j\phi ), {}\\ U& =& \sum _{j=0}^{m}A_{ j}R^{m-j}\sin ( \frac{\pi } {4} + j\phi ), {}\\ T^{{\prime}}& =& \sum _{ j=0}^{m}(m - j)A_{ j}R^{m-j}\cos ( \frac{\pi } {4} + j\phi ),\quad \text{and} {}\\ U^{{\prime}}& =& \sum _{ j=0}^{m}(m - j)A_{ j}R^{m-j}\sin ( \frac{\pi } {4} + j\phi ). {}\\ \end{array}$$

Gauss observed that T could be rewritten as

$$\displaystyle{ \sum _{j=1}^{m}\frac{R^{m-j}} {m\sqrt{2}} (R^{j} + \mathit{mA}_{ j}\sqrt{2}\cos ( \frac{\pi } {4} + j\phi )), }$$

and similarly for U, \(T^{{\prime}}\) and \(U^{{\prime}}\); so, by the stipulation on R (which ensures that \(R^{j} > m\vert A_{j}\vert \sqrt{2}\) for 1 ≤ j ≤ m, making each summand in the displayed rewriting positive), those four quantities, and hence \(\mathit{TT}^{{\prime}} + \mathit{UU}^{{\prime}}\), must all be positive. But when r = R, \(\mathit{tt}^{{\prime}} + \mathit{uu}^{{\prime}} = \mathit{TT}^{{\prime}} + \mathit{UU}^{{\prime}}\).

To see that, note first that when r = R, the quantity t is equal to

$$\displaystyle{ T\cos ( \frac{\pi } {4} + m\phi ) + U\sin ( \frac{\pi } {4} + m\phi ). }$$
(23)

For, by the definitions of T and U, each term of (23) is of the form

$$\displaystyle\begin{array}{rcl} & & \qquad A_{j}R^{m-j}[\cos ( \frac{\pi } {4} + j\phi )\cos ( \frac{\pi } {4} + m\phi ) +\sin ( \frac{\pi } {4} + j\phi )\sin ( \frac{\pi } {4} + m\phi )] \\ & & = \frac{A_{j}R^{m-j}} {2} [\cos ((m - j)\phi ) +\cos ( \frac{\pi } {2} + (m + j)\phi ) +\cos ((m - j)\phi ) \\ & & \qquad \qquad \qquad \qquad \quad -\cos ( \frac{\pi } {2} + (m + j)\phi )] = A_{j}R^{m-j}\cos ((m - j)\phi ), {}\end{array}$$
(24)

the corresponding term of t. Likewise, when r = R, the quantities u, \(t^{{\prime}}\) and \(u^{{\prime}}\) are equal to

$$\displaystyle\begin{array}{rcl} & & T\sin ( \frac{\pi } {4} + m\phi ) - U\cos ( \frac{\pi } {4} + m\phi ), {}\\ & & T^{{\prime}}\cos ( \frac{\pi } {4} + m\phi ) + U^{{\prime}}\sin ( \frac{\pi } {4} + m\phi ),\quad \text{and} {}\\ & & T^{{\prime}}\sin ( \frac{\pi } {4} + m\phi ) - U^{{\prime}}\cos ( \frac{\pi } {4} + m\phi ), {}\\ \end{array}$$

respectively. Then, letting \(A = \dfrac{\pi } {4} + m\phi\), we have

$$\displaystyle\begin{array}{rcl} \mathit{tt}^{{\prime}}& =& \mathit{TT}^{{\prime}}\cos ^{2}A + T^{{\prime}}U\sin A\cos A + \mathit{TU}^{{\prime}}\sin A\cos A + \mathit{UU}^{{\prime}}\sin ^{2}A\quad \text{and} {}\\ \mathit{uu}^{{\prime}}& =& \mathit{TT}^{{\prime}}\sin ^{2}A - T^{{\prime}}U\sin A\cos A -\mathit{TU}^{{\prime}}\sin A\cos A + \mathit{UU}^{{\prime}}\cos ^{2}A, {}\\ \end{array}$$

so \(\mathit{tt}^{{\prime}} + \mathit{uu}^{{\prime}} = \mathit{TT}^{{\prime}} + \mathit{UU}^{{\prime}} > 0\) when r = R, as claimed.

In addition, when r = R,

$$\displaystyle\begin{array}{rcl} t^{2}& =& T^{2}\cos ^{2}A + 2\mathit{TU}\sin A\cos A + U^{2}\sin ^{2}A\quad \text{and} {}\\ u^{2}& =& T^{2}\sin ^{2}A - 2\mathit{TU}\sin A\cos A + U^{2}\cos ^{2}A, {}\\ \end{array}$$

so \(t^{2} + u^{2} = T^{2} + U^{2}\). Consequently, for any r satisfying the stipulations on R, \(t^{2} + u^{2}\) must be positive, whence t and u cannot simultaneously equal 0.

On the other hand, within the circle C of radius R centered at the origin there must be a point (r, ϕ) where both t = 0 and u = 0 (and thus a point \(x = r(\cos \phi +i\,\sin \phi )\) where Y = 0, proving the theorem). For suppose not. Then let

$$\displaystyle{ \Omega =\int \int _{\begin{array}{c}C\end{array}}y\,\mathit{dA} =\int _{ 0}^{R}\int _{ 0}^{2\pi }y\,d\phi \,\mathit{dr} =\int _{ 0}^{2\pi }\int _{ 0}^{R}y\,\mathit{dr}\,d\phi. }$$

Note that \(\dfrac{\partial t} {\partial \phi } = -u^{{\prime}}\), \(\dfrac{\partial u} {\partial \phi } = t^{{\prime}}\), \(\dfrac{\partial t^{{\prime}}} {\partial \phi } = -u^{{\prime\prime}}\quad \text{and}\quad \dfrac{\partial u^{{\prime}}} {\partial \phi } = t^{{\prime\prime}}\). Using those relations, one computes that

$$\displaystyle{ \frac{\partial } {\partial \phi }\left [ \frac{tu^{{\prime}}-\mathit{ut}^{{\prime}}} {r(t^{2} + u^{2})}\right ] = y,\quad \text{that is,}\quad \int y\,d\phi = \frac{\mathit{tu}^{{\prime}}-\mathit{ut}^{{\prime}}} {r(t^{2} + u^{2})}. }$$
(25)
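Each of the four relations invoked above can be verified term by term (a routine check, supplied for completeness); for the first, for example,

$$\displaystyle{ \frac{\partial } {\partial \phi }\left [A_{j}r^{m-j}\cos (m - j)\phi \right ] = -(m - j)A_{j}r^{m-j}\sin (m - j)\phi,}$$

the negative of the corresponding term of \(u^{{\prime}}\).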

Since u and \(u^{{\prime}}\) both equal 0 when ϕ = 0 or ϕ = 2π, the last expression above is also zero for those values of ϕ, whence

$$\displaystyle{ \int _{0}^{2\pi }y\,d\phi = 0,\quad \text{and so}\quad \Omega =\int _{ 0}^{R}\int _{ 0}^{2\pi }y\,d\phi \,\mathit{dr} = 0. }$$
(26)

Similarly, from \(r \dfrac{\partial t} {\partial r} = t^{{\prime}}\), \(r\dfrac{\partial u} {\partial r} = u^{{\prime}}\), \(r\dfrac{\partial t^{{\prime}}} {\partial r} = t^{{\prime\prime}}\), and \(r\dfrac{\partial u^{{\prime}}} {\partial r} = u^{{\prime\prime}}\), one computes that

$$\displaystyle{ \frac{\partial } {\partial r}\left [\frac{\mathit{tt}^{{\prime}} + \mathit{uu}^{{\prime}}} {t^{2} + u^{2}} \right ] = y,\quad \text{that is,}\quad \int y\,\mathit{dr} = \frac{\mathit{tt}^{{\prime}} + \mathit{uu}^{{\prime}}} {t^{2} + u^{2}}. }$$
(27)

Consequently,

$$\displaystyle{\int _{0}^{R}y\,\mathit{dr} = \left.\frac{\mathit{tt}^{{\prime}} + \mathit{uu}^{{\prime}}} {t^{2} + u^{2}} \right \vert _{0}^{R} = \frac{\mathit{TT}^{{\prime}} + \mathit{UU}^{{\prime}}} {T^{2} + U^{2}} > 0\quad \text{by the claim proved earlier.}}$$

But then

$$\displaystyle{\Omega =\int _{ 0}^{2\pi }\int _{ 0}^{R}y\,\mathit{dr}\,d\phi =\int _{ 0}^{2\pi }\frac{\mathit{TT}^{{\prime}} + \mathit{UU}^{{\prime}}} {T^{2} + U^{2}} \,d\phi > 0,\quad \text{contrary to}\ \text{(26)}.}$$

In his prefatory remarks, Gauss said merely that continued reflection on the Fundamental Theorem had led him to this third proof, which, like the second, was “purely analytic,” but was based on entirely different principles and far surpassed the second in simplicity. And indeed, like Argand’s proof, the argument just given requires nothing beyond advanced calculus. However, several of the functions used in the proof, especially y, are introduced seemingly out of the blue, and it seems almost miraculous that the partial derivatives in (25) and (27) turn out to equal y. Thus, though succinct and requiring minimal prerequisites, the proof is not perspicuous: it provides convincing verification that the Fundamental Theorem is true, but it is not explanatory, since it does not convey understanding of why it is true.

Other indirect proofs of the Fundamental Theorem are based on Cauchy’s theory of complex contour integration. The best known is perhaps that based on Liouville’s Theorem (that a bounded entire function must be constant): For if the polynomial p(z) had no zero in the complex plane, then \(\dfrac{1} {p(z)}\) would be an entire function; and as in Argand’s proof, for any positive constant C there is an R > 0 such that | p(z) |  > C whenever | z |  > R. Thus \(\left \vert \dfrac{1} {p(z)}\right \vert < \dfrac{1} {C}\) for | z |  > R, and as a continuous function, \(\dfrac{1} {p(z)}\) would also be bounded within the disc | z | ≤ R. Thus \(\dfrac{1} {p(z)}\) would be bounded throughout the complex plane, and hence a constant by Liouville’s Theorem — a contradiction for any p(z) of positive degree.

Liouville’s Theorem is itself a consequence of Cauchy’s integral formula, which asserts that if f(z) is any function analytic in a simply connected domain containing the simple closed curve γ, then for any point z 0 inside γ,

$$\displaystyle{f(z_{0}) = \dfrac{1} {2\pi i}\int _{\gamma } \dfrac{f(z)} {z - z_{0}}\,\mathit{dz};}$$

and, as first noted in Zalcman (1978) (see also Lax and Zalcman 2012), Cauchy’s formula may be applied directly to yield an even simpler proof of the Fundamental Theorem of Algebra. For if | p(z) | ≠ 0 throughout the complex plane, then \(q(z) = \dfrac{1} {p(z)}\) is entire and \(\dfrac{1} {p(0)} = q(0)\neq 0\). Hence

$$\displaystyle{q(0) = \frac{1} {2\pi i}\int _{\vert z\vert =R}\frac{q(z)} {z} \,\mathit{dz} = \frac{1} {2\pi }\int _{0}^{2\pi }q(\mathit{Re}^{i\theta })\,d\theta,}$$

for any R > 0. But as R approaches \(\infty \), the last integral approaches zero (since | p(z) | approaches \(\infty \) as | z | does, the integrand tends uniformly to zero), contrary to q(0) ≠ 0.
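
The decisive estimate in this argument, that the circle average of q tends to zero as R grows, is easy to check numerically. The sketch below is ours, not Zalcman’s: the polynomial is an arbitrary example, and since no polynomial actually lacks roots, q fails to be entire here; the computation therefore illustrates only the decay of the bound \(\max _{\vert z\vert =R}\vert q(z)\vert \) on which the proof turns.

```python
# Numerical illustration (ours, not part of Zalcman's proof): the circle
# average of q = 1/p is bounded in modulus by max |q| on the circle,
# which decays like 1/R^n for a polynomial of degree n.
import numpy as np

p = np.polynomial.Polynomial([2, -2, 0, 1])      # p(z) = z^3 - 2z + 2, an example

for R in (10.0, 100.0, 1000.0):
    theta = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
    q = 1.0 / p(R * np.exp(1j * theta))          # q sampled on the circle |z| = R
    print(f"R = {R:6.0f}   max|q| = {np.abs(q).max():.2e}   "
          f"|circle average| = {abs(q.mean()):.2e}")
# max|q| shrinks like R**(-3), forcing the circle average toward zero.
```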

Alternatively, a proof of the Fundamental Theorem may be couched in terms of winding numbers, where the winding number of a continuously differentiable closed curve γ about the origin is given by

$$\displaystyle{ \frac{1} {2\pi i}\int _{\gamma }\frac{\mathit{dz}} {z} \,,}$$

if γ does not pass through the origin. That notion can, however, also be defined without reference to line integrals: Less formally, and more generally, if f(z) is a continuous function that is never zero, the winding number of f(z) around the origin as z traces out a continuously differentiable closed curve γ may be defined, as in Courant and Robbins (1941), as “the net number of complete revolutions made by a vector joining the origin to f(γ(z)) as z traces out γ.” Courant and Robbins then offer the following indirect proof of the Fundamental Theorem.

Suppose that the monic polynomial \(p(z) = z^{n} + a_{n-1}z^{n-1} + \cdots + a_{0}\) of degree n > 0 is never zero. Let C t be the circle about the origin of radius t, given by the equation \(z = te^{i\theta }\), and let ϕ(t) be the winding number of p(z) around the origin as z traces out C t . Then ϕ is a continuous, integer-valued function of t, and so must be a constant; and since ϕ(0) = 0, we must have ϕ(t) = 0 for all t.

But

$$\displaystyle{ \vert z^{n} - p(z)\vert \leq \vert a_{n-1}\vert \vert z\vert ^{n-1} +\ldots +\vert a_{0}\vert = \vert z\vert ^{n-1}\left [\vert a_{n-1}\vert +\ldots + \frac{\vert a_{0}\vert } {\vert z\vert ^{n-1}}\right ], }$$

so for values of t greater than \(\vert a_{0}\vert + \vert a_{1}\vert +\ldots +\vert a_{n-1}\vert + 1,\) the length \(\vert z^{n} - p(z)\vert \) of the vector from p(z) to z n will be less than or equal to

$$\displaystyle{t^{n-1}[\vert a_{ n-1}\vert +\ldots + \frac{\vert a_{0}\vert } {t^{n-1}}] < t^{n} = \vert z^{n}\vert,\ \text{the distance from}\ z^{n}\ \text{to the origin.}}$$

(See Figure 8.4.)

Fig. 8.4

A winding-number proof. Adapted from Figure 150, p. 270 in What is Mathematics?, 2nd. ed. (1996), by Richard Courant and Herbert Robbins. By permission of Oxford University Press, U.S.A.

The segment joining p(z) to z n thus cannot pass through the origin when z is on C t . Deforming the curve traced by p(z) to the circle traced by z n by shrinking each such line segment to zero will thus not alter the value of ϕ(t), which must be the same as the winding number of z n around the origin as z traces out C t .Footnote 18 But that number is n, which is greater than zero, contrary to what was found above.
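
Although Courant and Robbins argue purely geometrically, the winding number also lends itself to direct computation. The following sketch is our illustration (the polynomial, radii, and function names are our own choices): it tracks the net change in arg p(z) around C t and exhibits the jump from 0 to n that, by the argument just given, could not occur for a root-free polynomial.

```python
# Numerical winding numbers phi(t) of p(z) about the origin as z traces C_t.
# For a polynomial that does have roots, phi(t) jumps from 0 to n = deg p
# as t passes the moduli of the roots; the indirect proof shows that a
# root-free polynomial would have to yield both phi(t) = 0 and phi(t) = n.
import numpy as np

def winding_number(coeffs, t, samples=4096):
    """Net revolutions of p(t e^{i theta}) about 0 as theta runs from 0 to 2 pi."""
    theta = np.linspace(0.0, 2 * np.pi, samples)
    values = np.polyval(coeffs, t * np.exp(1j * theta))
    angles = np.unwrap(np.angle(values))         # continuous branch of arg p
    return round((angles[-1] - angles[0]) / (2 * np.pi))

coeffs = [1, 0, -1, 2]                           # p(z) = z^3 - z + 2, an example
for t in (0.5, 1.0, 2.0, 5.0):
    print(f"t = {t}:  phi(t) = {winding_number(coeffs, t)}")
# prints 0, 0, 3, 3: the three roots all have moduli between 1.1 and 1.6
```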

The preface to the first edition of What is Mathematics? states that that book “presupposes only knowledge that a good high school course could impart,” and the proof just given is an excellent example of a perspicuous informal proof. By dispensing with reference to line integrals, Courant and Robbins succeeded in offering a proof of the Fundamental Theorem that should be both convincing and understandable to mathematically inclined high school students — a remarkable achievement from a pedagogical standpoint, since it is at that level that students first encounter the Fundamental Theorem, and all other proofs known to this author presume at least knowledge of advanced calculus.Footnote 19

8.7 Other modern proofs

As part of the development of field theory in the nineteenth century, Kronecker proved the basic result needed to establish the existence of splitting fields: that if p(x) is an irreducible polynomial with coefficients in a field F, then there is a field \(F^{{\prime}}\) extending F, of finite degree over F, in which p(x) has a root. The earlier arguments put forward as proofs of the Fundamental Theorem of Algebra by Lagrange and Laplace were thereby validated.

Later in the nineteenth century the work of Évariste Galois was belatedly published, and in 1872 Peter Ludvig Sylow proved the theorem in group theory now bearing his name, according to which for any prime number p, if \(p^{m}\) is the largest power of p dividing the order of a finite group G, then G must possess subgroups of order \(p^{k}\) for each k ≤ m. In addition, the number of subgroups of G of order \(p^{m}\) divides the order of G and is congruent to 1 modulo p. Emil Artin then gave a proof of the Fundamental Theorem in terms of those concepts.

Proof via Galois theory:

Footnote 20 Let K be a splitting field for the polynomial p(x), of degree n > 1 with real coefficients. K is a finite extension of the real field \(\mathbb{R}\), say of degree \(d = 2^{m}q\) over \(\mathbb{R}\), where q is odd. Since \(\mathbb{R}\) has characteristic 0, K is a simple extension \(\mathbb{R}(\alpha )\) of \(\mathbb{R}\), where α is a root of a unique monic irreducible polynomial r(x) of degree d with real coefficients. If m = 0, r(x) must have degree q = 1, since every real polynomial of odd degree has a real root. So in that case \(K = \mathbb{R}\). If m = 1 and q = 1, r(x) is a quadratic polynomial, whose roots must lie in \(\mathbb{C}\), whence \(K = \mathbb{C}\). So suppose m > 0, and let G be the Galois group of K over \(\mathbb{R}\). By the fundamental theorem of Galois theory, G must have order d, and by Sylow’s theorem, G therefore has a subgroup of order \(2^{m}\) and index q. Again by the fundamental theorem of Galois theory, there must then be an extension field E of \(\mathbb{R}\) intermediate between \(\mathbb{R}\) and K, having degree \(q\) over \(\mathbb{R}\). Then as above, since q is odd, q = 1 and \(E = \mathbb{R}\). Accordingly, K must have degree \(2^{m}\) over \(\mathbb{R}\), so the order of G is \(2^{m}\). By Sylow’s Theorem, G has a subgroup H 1 of order \(2^{m-1}\). Let L 1 be the fixed field of H 1. Then L 1 is an extension of \(\mathbb{R}\) of degree 2, so L 1 is isomorphic to \(\mathbb{C}\) (since L 1 is obtained from \(\mathbb{R}\) by adjoining a root of a quadratic polynomial), and K is an extension of L 1 of degree \(2^{m-1}\). If m > 1, let H 2 be a subgroup of H 1 of order \(2^{m-2}\), and let L 2 be the fixed field of H 2. Then L 2 is an extension of L 1 (and so is isomorphic to an extension of \(\mathbb{C}\)) of degree 2. But that is impossible, since quadratic polynomials with complex coefficients always have complex roots. Thus m = 1 and K has degree \(2^{0} = 1\) over L 1; that is, \(K \simeq L_{1} \simeq \mathbb{C}\), where ≃ denotes isomorphism. q.e.d.
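
In schematic form, once q = 1 has been forced, the tower of fields pinned down by the argument is

$$\displaystyle{\mathbb{R}\ \subset \ L_{1} \simeq \mathbb{C}\ \subset \ K,\qquad [L_{1}: \mathbb{R}] = 2,\qquad [K: L_{1}] = 2^{m-1},}$$

and since \(\mathbb{C}\) admits no extensions of degree 2, the tower must collapse: m = 1 and \(K \simeq \mathbb{C}\).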

The prerequisites for understanding the proof just given are substantially greater than those for the other proofs so far considered. In particular, few undergraduates would be able to comprehend it. Nonetheless, since Galois theory was developed to answer questions about the solvability of polynomials by radicals, it is appropriate to use it to prove the Fundamental Theorem of Algebra; and like Gauss’s second proof, it is as methodologically pure as possible, invoking nothing non-algebraic except the intermediate-value theorem.

There are also proofs of the Fundamental Theorem based on sophisticated notions from algebraic topology (Brouwer degree, homotopy theory, simplicial complexes, homology theory), three of which are sketched in chapter 9 and appendix D of Fine and Rosenberger (1997). They are not discussed further here, since the means they employ were developed to address very different concerns and go far beyond what is required to establish the Fundamental Theorem. Nevertheless, they demonstrate the power of topological techniques (another example of benchmarking) and illustrate the coherence of disparate mathematical theories.

A much simpler proof, based on notions from point-set topology, establishes the Fundamental Theorem in the equivalent form: Every polynomial of non-zero degree with complex coefficients, regarded as a mapping with domain \(\mathbb{C}\), has range all of \(\mathbb{C}\).Footnote 21

A topological proof:

Footnote 22 Let p(z) be a polynomial of degree n > 1 with complex coefficients. Since p is continuous and | p(z) | approaches infinity as | z | does, the range R of p must be closed (by Weierstrass’s theorem that every bounded sequence has a convergent subsequence). Consider then the set T of all points p(z) where \(p^{{\prime}}(z) = 0\). Since R is closed, \(\mathbb{C} - R\) is open, so it suffices to show that R − T is open as well. For if so, then \((\mathbb{C} - R) \cup (R - T) = \mathbb{C} - T\). Since T is finite, \(\mathbb{C} - T\) is a connected set, which cannot be the union of two disjoint nonempty open sets; and since \(p^{-1}(T)\) is finite and n > 1, R − T is nonempty. Thus \(\mathbb{C} - R\) must be empty.

The proof is completed by noting that for every p(z 0) in R − T, \(p^{{\prime}}(z_{0})\neq 0\), so the inverse function theorem implies that there are neighborhoods U of z 0 and V of p(z 0) such that p maps U one-to-one onto V. Hence every point of R − T is an interior point.

There remains the purity question broached earlier in connection with Euler’s failed proof attempt: Can the original statement of the Fundamental Theorem, that any non-constant polynomial with real coefficients can be expressed as a product of real linear and quadratic factors, be proved without reference to complex numbers or splitting fields?

Such a proof has been given, based on concepts from linear algebra. The proof is by induction on the degree of the polynomial and uses properties of the so-called Bezoutian resultant (the determinant of an n × n symmetric matrix defined for any pair of polynomials p, q, where n is the maximum of the degrees of p and q). Specifically, writing p as a function of the variable x and q as a function of the variable w, the factor theorem implies that xw must be a factor of the polynomial \(P(x,w) = p(x)q(w) - p(w)q(x)\), say \(P(x,w) = (x - w)b(x,w)\), where b is a polynomial each of whose terms is of the form \(b_{\mathit{ij}}x^{i-1}w^{j-1}\), for 1 ≤ i, j ≤ n. The coefficients b ij are the entries of the Bezoutian matrix B(p, q), which is nonsingular if and only if p and q have no common root.Footnote 23
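
The construction is concrete enough to compute directly. The sketch below (ours; the helper name and the example polynomials are our own choices, and sympy is assumed) builds b(x, w) by the exact division guaranteed by the factor theorem and confirms that the determinant of B(p, q) detects common roots.

```python
# A sketch of the Bezoutian construction described above (our code):
# expand P(x,w) = p(x)q(w) - p(w)q(x), divide exactly by (x - w), and
# read off the matrix of coefficients b_ij.
import sympy as sp

x, w = sp.symbols('x w')

def bezoutian(p, q):
    """Bezoutian matrix of p and q: P(x,w) = (x - w) * b(x,w)."""
    n = int(max(sp.degree(p, x), sp.degree(q, x)))
    P = sp.expand(p * q.subs(x, w) - p.subs(x, w) * q)
    b = sp.expand(sp.cancel(P / (x - w)))        # exact, by the factor theorem
    poly = sp.Poly(b, x, w)
    return sp.Matrix(n, n, lambda i, j: poly.coeff_monomial(x**i * w**j))

p = x**2 - 3*x + 2                   # roots 1 and 2 (example polynomials)
print(bezoutian(p, x**2 - 1).det())  # 0: p and x^2 - 1 share the root 1
print(bezoutian(p, x**2 + 1).det())  # -10, nonzero: no common root
```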

In outline, the proof then proceeds as follows: Given a real polynomial \(p(x) = a_{n}x^{n} + a_{n-1}x^{n-1} +\ldots +a_{0}\) with a 0 ≠ 0 and a n ≠ 0, if n is odd, then p(x) has a real root x 0 by the intermediate-value theorem, so \(p(x) = (x - x_{0})q(x)\), where q(x) has degree n − 1. By the induction hypothesis, q(x) is then a product of real linear and quadratic factors, whence so is p(x). If n is even, say n = 2m, then let r be a real parameter and define auxiliary polynomials p 1(x) and p 2(x) by

$$\displaystyle\begin{array}{rcl} & p_{1}(x) = p(\mathit{rx}) = a_{n}r^{n}x^{n} + a_{n-1}r^{n-1}x^{n-1} +\ldots +a_{0}\quad \text{and}& {}\\ & p_{2}(x) = a_{0}x^{n} + a_{1}\mathit{rx}^{n-1} +\ldots +a_{n}r^{n}\,. & {}\\ \end{array}$$

The intermediate-value theorem, together with properties of principal minors, can then be invoked to show that there must be a value r 0 of r for which the determinant of B(p 1, p 2) vanishes; so when r = r 0, p 1(x) and p 2(x) will have a common factor f(x) of maximal degree. Writing \(p_{1}(x) = f(x)Q_{1}(x)\) and p 2(x) = f(x)Q 2(x) yields \(p_{1}(x)Q_{2}(x) = p_{2}(x)Q_{1}(x)\), where Q 1 and Q 2 are relatively prime polynomials of degree less than n. By the induction hypothesis, if Q 2 is not constant, it must be a product of linear and quadratic factors, all of which divide p 1. The quotient \(p_{1}(x)/Q_{2}(x)\) then also has degree less than n, so it too must be a product of linear and quadratic factors. Hence p 1(x) is a product of linear and quadratic factors, and replacing x in each of those factors by x∕r 0 produces a similar factorization of p(x).

On the other hand, if Q 2(x) is a constant, then f(x) has degree n, so Q 1(x) is also a constant. Hence \(p_{1}(x) = \mathit{cp}_{2}(x)\) for some constant c, so for each i between 0 and n, the coefficient of x i in p 1(x) must equal c times the coefficient of x i in p 2(x); that is, \(a_{i}r_{0}^{i} = \mathit{ca}_{n-i}r_{0}^{n-i}\) for each such i. In particular, for i = m, \(a_{m}r_{0}^{m} = \mathit{ca}_{m}r_{0}^{m}\), so c = 1 and \(a_{j}r_{0}^{j} = a_{n-j}r_{0}^{n-j}\) for 0 ≤ j ≤ n, which means that p 1(x) is a palindromic (self-reciprocal) polynomial. But then either x + 1 is a factor of p 1(x), or \(p_{1}(x) = x^{m}q(X)\), where q is a polynomial of degree m and \(X = x + x^{-1}\).
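
For instance (a worked illustration of the substitution, for m = 2 and generic coefficients a and b):

$$\displaystyle{p_{1}(x) = x^{4} + ax^{3} + bx^{2} + ax + 1 = x^{2}\left [(x^{2} + x^{-2}) + a(x + x^{-1}) + b\right ] = x^{2}\left [X^{2} + aX + (b - 2)\right ],}$$

since \(X^{2} = x^{2} + 2 + x^{-2}\); here \(q(X) = X^{2} + aX + (b - 2)\).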

By the induction hypothesis, q is a product of real linear and quadratic polynomials in X, so p 1(x) is a product of real linear, quadratic, and quartic polynomials in x. Since quartics can always be factored (e.g., by Euler’s technique) into products of real linear and quadratic expressions, so can p 1(x). The desired factorization of p(x) is then obtained once again by replacing x everywhere by x∕r 0.

The proof just given (published in Eaton 1960) employs techniques very different from those of the other proofs considered here. However, the idea of using the resultant of two polynomials as a tool in proving the Fundamental Theorem of Algebra is not new. Indeed, the British mathematician James Wood gave an incomplete proof of that sort in 1798, the year before Gauss’s first proof.Footnote 24 A full proof using resultants was later published by Paul Gordan, who presented it as a simpler alternative to Gauss’s second proof (Gordan 1876).

8.8 Constructive proofs

Although the foregoing proofs establish the existence of a root for any nonconstant polynomial, they provide no means for actually finding one. One may therefore wonder, as Weierstrass did in 1891, whether it is possible “for any given polynomial f in \(\mathbb{C}(Z)\), to produce a sequence z n of complex numbers by an effectively defined procedure, so that | f(z n ) | is sufficiently small in relation to | f(z n−1) | that it converges to a zero of f?”Footnote 25 Beyond that, one might ask whether the proof that such a procedure converges can be done constructively, and how computationally efficient the procedure is. The importance of such questions to mathematical practice is indicated by the appearance in 1969 of a volume entitled Constructive Aspects of the Fundamental Theorem of Algebra (Dejon and Henrici 1969), containing the proceedings of a symposium on that topic held two years before at the IBM Research Laboratory in Zürich.

In 1924 Hermann Weyl gave an intuitionistic proof of the Fundamental Theorem of Algebra that invoked winding numbers (Weyl 1924, pp. 142–146). Later Hellmuth Kneser presented a modification of Argand’s proof in which, given a nonconstant polynomial p(x), he defined a sequence of complex numbers and proved, also by means acceptable to intuitionists, that it converged to a root of p (H. Kneser 1940). Specifically, given a monic polynomial \(p(x) = x^{n} + a_{n-1}x^{n-1} +\ldots +a_{0}\) of degree n > 1 with complex coefficients, Kneser defined a sequence of complex numbers x m designed to make the ratio \(\vert p(x_{m+1})\vert /\vert p(x_{m})\vert \) as small as possible. Toward that end he expressed p(x m + y) as \(p(x_{m}) +\sum _{ i=1}^{n}b_{i}y^{i}\), chose y so that one of the terms in the sum would have the same argument as − p(x m ) and would strongly dominate the other terms, and set \(x_{m+1} = x_{m} + y\). To find such a y he employed a lemma whose proof was lengthy and delicate. Forty-one years later his son Martin published a simplified version of the proof (M. Kneser 1981) based on the following much simpler lemma:

Given a monic polynomial p(x) of degree n > 1 with constant term a 0 , there is a positive number q < 1, depending only on n, such that if |a 0 |≤ c, a complex number y can be specified for which \(\vert y\vert \leq c^{1/n}\) and |p(y)|≤ qc.

The sequence {x m } may then be defined inductively, as follows, starting with x 0 = 0. Suppose that x m has already been defined and satisfies \(\vert p(x_{m})\vert \leq q^{m}c\). Apply the lemma to \(f(x) = p(x_{m} + x)\), which has constant term p(x m ), and set \(x_{m+1} = x_{m} + y\). Then \(\vert x_{m+1} - x_{m}\vert = \vert y\vert \leq (q^{m}c)^{1/n}\) and \(\vert p(x_{m+1})\vert \leq q(q^{m}c) = q^{m+1}c\). The first of those inequalities shows that the sequence {x m } converges to some value \(x^{*}\), and the second that the inductive hypothesis is satisfied by x m+1, so that p(x m ) converges to 0. Since p is continuous, it follows that \(p(x^{*}) = 0\). The proof of the lemma is constructive, and can be made to satisfy intuitionistic demands as well.
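
The convergence asserted in the first claim is the usual geometric-series estimate: for k > m,

$$\displaystyle{\vert x_{k} - x_{m}\vert \leq \sum _{j=m}^{k-1}\vert x_{j+1} - x_{j}\vert \leq \sum _{j=m}^{k-1}(q^{j}c)^{1/n} = c^{1/n}\sum _{j=m}^{k-1}\left (q^{1/n}\right )^{j},}$$

and since \(q^{1/n} < 1\), the tails of the geometric series tend to zero, so {x m } is a Cauchy sequence.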

Two years before Martin Kneser’s paper appeared, Morris Hirsch and Stephen Smale also defined a sequence of values guaranteed to converge to a root of any given polynomial of degree n > 0 (Hirsch and Smale 1979, pp. 303–309). Their procedure was based on a modification of Newton’s method, and again involved an auxiliary proposition, namely:

For any positive integer n there exist real numbers \(\sigma _{1},\ldots,\sigma _{n}\) with 0 < σ k ≤ 1 for \(k = 1,\ldots,n\) and K n satisfying 0 < K n < 1, such that if \(h_{k} = \sigma _{k}e^{i\pi /k}\) , then for any n-tuple \((\nu _{1},\ldots,\nu _{n})\) in \(\mathbb{C}^{n} - 0\) there is an m for which

$$\displaystyle{\left \vert 1 +\sum _{ k=1}^{n}( \frac{\nu _{k}} {\nu _{m}}h_{m})^{k}\right \vert < K_{ n}.}$$

Suppose then that p(x) is a polynomial of degree n > 0 and z is any complex number. If p(z) = 0, let \(z^{{\prime}} = z\). Otherwise, for each positive integer k ≤ n, let ν k be a kth root of \(p^{(k)}(z)/(k!\,p(z))\). Since n > 0, \(p^{(n)}(z)\neq 0\), so \((\nu _{1},\ldots,\nu _{n})\) is in \(\mathbb{C}^{n} - 0\). The auxiliary proposition then yields a ν m ≠ 0, whence \(z^{{\prime}}\) can be taken to be \(z + h_{m}/\nu _{m}\). Then by Taylor’s theorem,

$$\displaystyle\begin{array}{rcl} \vert p(z^{{\prime}})\vert & =& \left \vert p(z) +\sum _{ k=1}^{n}\frac{p^{(k)}(z)} {k!} \left (\frac{h_{m}} {\nu _{m}} \right )^{k}\right \vert {}\\ & =& \left \vert p(z)\left [1 +\sum _{ k=1}^{n}\frac{p^{(k)}(z)} {p(z)k!} \left (\frac{h_{m}} {\nu _{m}} \right )^{k}\right ]\right \vert {}\\ & =& \vert p(z)\vert \left \vert 1 +\sum _{ k=1}^{n}\left ( \frac{\nu _{k}} {\nu _{m}}h_{m}\right )^{k}\right \vert < K_{ n}\vert p(z)\vert. {}\\ \end{array}$$

To prove the Fundamental Theorem of Algebra, let z 0 be any complex number, and assume by induction that \(z_{0},\ldots,z_{j}\) have already been defined. If p(z j ) = 0, then z j is a root of p, so put \(z_{j+1} = z_{j}\). Otherwise define z j+1 to be \(z_{j} + h_{m_{0}}/\nu _{m_{0}}\), where m 0 is the least value of m that satisfies the inequality given in the auxiliary proposition. Then the equations and inequality displayed above show that \(\vert p(z_{j+1})\vert < K_{n}\vert p(z_{j})\vert \). So \(\vert p(z_{j})\vert < (K_{n})^{j}\vert p(z_{0})\vert \) for every j. Since \(K_{n} < 1\), \(p(z_{j})\) therefore converges to 0 as j approaches \(\infty \). The convergence may be very slow, however, since the authors note that for large values of n, K n is very close to 1.Footnote 26
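
Although the constants \(\sigma _{k}\) and K n from the auxiliary proposition are not reproduced here, the shape of the iteration is easy to convey in code. The sketch below is a simplified variant of ours, not the authors’ exact procedure: in place of their constants it tries the candidate steps \(z + \sigma e^{i\pi /m}/\nu _{m}\) for each m, halving σ until |p| decreases.

```python
# A simplified sketch in the spirit of the Hirsch-Smale iteration (our
# variant: the constants sigma_k and K_n of the auxiliary proposition are
# replaced by a step-halving search over the candidate directions).
import numpy as np

def hs_step(p, z):
    """One step: try z + sigma * e^{i pi/m} / nu_m until |p| decreases."""
    pz = p(z)
    if pz == 0:
        return z
    n = p.degree()
    nus, fact = [], 1.0
    for k in range(1, n + 1):
        fact *= k
        nus.append((p.deriv(k)(z) / (fact * pz)) ** (1.0 / k))  # a k-th root
    for sigma in (1.0, 0.5, 0.25, 0.125, 0.0625):
        for m, nu in enumerate(nus, start=1):
            if nu != 0:
                candidate = z + sigma * np.exp(1j * np.pi / m) / nu
                if abs(p(candidate)) < abs(pz):
                    return candidate
    return z                     # no progress found; retain z

p = np.polynomial.Polynomial([2, 0, 1, 1])       # p(z) = z^3 + z^2 + 2, an example
z = 0j
for _ in range(80):
    z = hs_step(p, z)
print(z, abs(p(z)))              # |p(z)| is driven toward 0, approximating a root
```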

That a subsequence of {z j } must converge to a root of p follows by a compactness argument:Footnote 27 For since | p(z) | approaches \(\infty \) as | z | does, and since \(\vert p(z_{j})\vert < K_{n}\vert p(z_{0})\vert \) for all j, all the values z j must lie within some disc | z | ≤ R. If there are infinitely many distinct values z j , a subsequence of them must approach some point within that disc, since otherwise for each point z with | z | ≤ R some disc D z centered at z would contain at most one of the z j (namely, z j itself, when z = z j ). But since the disc | z | ≤ R is compact, a finite number of those D z would cover it, so there could be only finitely many z j . Thus either the sequence {z j } is eventually constant, say \(z_{k} = z_{j}\) for all k ≥ j (which by construction implies that p(z j ) = 0), or else every neighborhood of some \(z^{*}\) in \(\mathbb{C}\) contains infinitely many of the z j . In the latter case, the continuity of p implies that \(p(z^{*}) = 0\), so \(z^{*}\) is a root of p.

Further algorithms and constructive proofs of the Fundamental Theorem of Algebra may be found in the volume (Dejon and Henrici 1969) cited earlier. See in particular the article “A never failing fast convergent root-finding algorithm,” by Bruno Dejon and Karl Nickel, pp. 1–35.