Keywords

1.1 Sets

To study analysis successfully, the reader must be conversant with some of the basic concepts of mathematics. Foremost among these is the idea of a set. For our purposes a set may be thought of as a collection of objects. This statement is too imprecise to be regarded as a definition, and in fact it leads to logical difficulties, but it does convey a mental image of a set that is satisfactory for our purposes. The reader who wishes to delve into the nature of this concept more deeply is referred to [10].

Sets are important in that they can be used to construct a host of mathematical concepts. In fact, every mathematical object studied in this book can be constructed from sets. We therefore begin this introductory chapter with some basic properties of sets. Proofs are omitted because most of the properties are evident and their proofs are straightforward.

First, the objects in a set X are called its elements or members. They are said to be contained in X and to belong to X. If an object x is contained in X, then we write x ∈ X; otherwise we write xX.

We shall assume the existence of a set with no elements. This set is denoted by ∅, and it is unique. It is said to be empty.

If X and Y are sets such that every element of X is also an element of Y, then we say that X is a subset of Y and that it is included in Y. In this case we write X ⊆ Y; otherwise we write X ⊈ Y. Note that ∅ ⊆ X for every set X. The reasoning is that since ∅ has no elements at all, it certainly has no elements that are not in X. Therefore we can safely say, without fear of contradiction, that each of its elements does belong to X. In particular, ∅ is a subset of itself, and in fact it is its only subset. Observe also that every set includes itself. Moreover, if X, Y, Z are sets such that X ⊆ Y and Y ⊆ Z, then X ⊆ Z. In this case we write X ⊆ Y ⊆ Z.

If X and Y are sets such that X ⊆ Y ⊆ X, then X and Y contain exactly the same elements. In this case we say that these sets are equal, and we write X = Y. Otherwise X and Y are distinct, and we write XY. Any set is equal to itself, and if X = Y, then Y = X. Furthermore, if X, Y, Z are sets such that X = Y and Y = Z, then X = Z. In this case we write \(X = Y = Z\). Equal sets are treated as identical since they contain the same elements.

The collection of all subsets of a given set X is another set, called the power set of X. It is denoted by \(\mathcal{P}(X)\). For example, \(\mathcal{P}(\emptyset )\) is a set having ∅ as its only element. This set is denoted by {∅}. Moreover, \(\mathcal{P}(\{\emptyset \})\) is a set containing only the elements ∅ and {∅} and is denoted by {∅, {∅}}.

If X is any set, we may replace the elements of X by other objects and thereby construct a new set. For example, we may replace the unique element ∅ of the set {∅} by any object Y. We then have a new set whose only element is Y. This set is denoted by {Y }. Similarly, if we replace the elements ∅ and {∅} of the set {∅, {∅}} by objects Y and Z, respectively, then we obtain a new set whose only elements are Y and Z. This set is denoted by {Y, Z}. The notation may be extended to an arbitrary number of objects.

If X is a set and P is a property that may be satisfied by some elements of X, then we can construct a subset of X whose elements are precisely the members of X that do satisfy P. This set is denoted by {x ∈ XP}. For example, let X and Y be sets, and let P be the property that x ∈ Y, where x ∈ X. Then {x ∈ XP} is the set whose elements are the objects that are in both X and Y. This set is called the intersection of X and Y and is denoted by XY. If this intersection happens to be empty, then the sets X and Y are disjoint. If \(\mathcal{S}\) is a collection of sets (in other words, a set whose elements are themselves sets), then the sets in \(\mathcal{S}\) are said to be mutually disjoint if the sets A and B are disjoint whenever \(A \in \mathcal{S}\) and \(B \in \mathcal{S}\).

On the other hand, if P is the property that xY, then {x ∈ XP} is the set whose elements are the members of X that are not in Y. This set is denoted by XY and is called the complement of Y with respect to X.

Let \(\mathcal{S}\) be a collection of sets. Then we may construct another set whose elements are the objects that belong to at least one member of \(\mathcal{S}\). This set is called the union of \(\mathcal{S}\). For example, if \(\mathcal{S} =\{ X,Y \}\) for some sets X and Y, then the union of \(\mathcal{S}\) is the set of all objects that are in X or Y. In particular, it contains all the objects that are in both of those sets. It is denoted by XY.

We may define the intersection of \(\mathcal{S}\) as the set of all objects in the union of \(\mathcal{S}\) that belong to every set in \(\mathcal{S}\). If \(\mathcal{S} =\{ X,Y \}\) for some sets X and Y, then the intersection of \(\mathcal{S}\) is the set of all objects that are in both X and Y. Thus it is equal to XY.

1.2 Ordered Pairs, Relations, and Functions

As we said before, sets can be used to construct a large number of mathematical concepts. In this section we show how to construct ordered pairs, relations, and functions from sets, but once again the reader is referred to [10] for the details.

If x and y are any objects, then the ordered pair (x, y) is defined as the set {{x}, {x, y}}. We refer to x and y as its components, x being the first component and y the second. The important observation to be made here is that the definition does not treat x and y similarly (if in fact they are distinct objects). Instead, we are given a way of distinguishing them: y is a member of just one of the two sets in (x, y), but x belongs to both sets. From this observation it is easy to deduce that the ordered pairs (x, y) and (a, b) are equal if and only if x = a and y = b. In other words, for equality to hold it is not sufficient for the sets {x, y} and {a, b} to be equal. Their elements must also be listed in the same order. This is the only property of ordered pairs that is important to remember. Once it is grasped, the definition may be forgotten. The definition can be extended to ordered triples by defining

$$\displaystyle{(x,y,z) = ((x,y),z)}$$

for all objects x, y, z. Thus (x, y, z) technically is an ordered pair whose first component is itself an ordered pair. This notation may be extended to an arbitrary number of objects, as we shall see later.

If X and Y are sets, we may construct a set X × Y whose elements are the ordered pairs (x, y) such that x ∈ X and y ∈ Y. This set is called the Cartesian product of X and Y. A subset of X × Y is called a relation from X to Y. Thus a relation is just a set of ordered pairs. A relation from X to X is sometimes called a relation on X. An example is the relation R on X such that (x, y) ∈ R if and only if x = y. This relation is called the equality relation. Similarly, we may define the inclusion relation by specifying that it contains the ordered pair (x, y) if and only if x and y are sets such that x ⊆ y.

If R is a relation and x and y are objects, we often write xRy to indicate that (x, y) ∈ R. For instance, we are already accustomed to using = to denote the equality relation and writing x = y instead of (x, y) ∈  = . We also write x​​​​y instead of (x, y) ∉ R.

If R and S are relations and x, y, z are objects, then we write xRySz if xRy and ySz. This notation may be extended to arbitrarily long chains of relations.

Some kinds of relations are of particular importance, and we discuss them now. If R is a relation on a set X, then R is reflexive if xRx for each x ∈ X. Examples include the equality and inclusion relations. The relation R is symmetric if yRx whenever x and y are members of X satisfying xRy. Equality has this property, but inclusion does not. We also say that R is transitive if xRy whenever there is a z ∈ X for which xRzRy. Equality and inclusion both exhibit this property. A relation with all three of these properties is an equivalence relation. Equality is such a relation, but inclusion is not.

Equivalence relations are closely linked to partitions. A partition of a set X is a set of nonempty, mutually disjoint subsets of X whose union is X. The elements of a partition are sometimes called its cells. It can be shown that with any equivalence relation R on a set X there is an associated unique partition P of X with the property that elements x and y of X belong to the same cell of P if and only if xRy. Conversely, any partition P of X has associated with it a unique equivalence relation R on X constructed in the same way: xRy if and only if x and y belong to the same cell of P. The cells of P are called the equivalence classes of R. For each x in X, we denote by [x] the unique equivalence class to which it belongs. Thus [x] = [y] if and only if y is an element of X that belongs to the equivalence class [x].

There is yet another kind of relation that is of paramount importance. A relation f from a set X to a set Y is called a function from X into Y if for each x ∈ X there is a unique y ∈ Y for which (x, y) ∈ f. We usually write y as f(x). We say that f maps x to y and that y is the image of x under f and corresponds to x under f. We think of the function f as providing a rule for associating with each x in X a unique corresponding element y of Y. It is this aspect of the concept that is usually important for our purposes, but it is of interest to see how to construct the idea of a function from sets, as we have done.

Given sets X and Y, we sometimes write f: X → Y to indicate that f is a function from X into Y. The set X is called the domain of the function f. The subset of Y consisting of the images of the elements of X is the range of f. In this book the range of f will usually be a set of real or complex numbers, in which case we describe the function as real-valued or complex-valued, respectively. The range of f is not necessarily the whole of Y: Some elements of Y might not correspond to any element of X. The domain of f is denoted by \(\mathcal{D}_{f}\) and the range of f by \(\mathcal{R}_{f}\). If \(\mathcal{R}_{f} = Y\), then f is described as surjective and called a surjection from X onto Y. In the case of a surjection, each member of Y does correspond to an element of X, but this element need not be unique.

Suppose on the other hand that each element of \(\mathcal{R}_{f}\) does correspond to a unique element of X. Then f is injective and an injection from X into Y. Thus f is injective if and only if w = x whenever f(w) = f(x): Distinct elements of X must be mapped to distinct elements of Y. An injective function is also described as one-to-one.

Perhaps f is both injective and a surjection from X onto Y. Then f is a bijection from X onto Y and described as bijective. In this case each member of Y corresponds to a unique element of X. Thus there exists a function g from Y into X such that g(f(x)) = x for all x ∈ X. This function is called the inverse of f and is denoted by f −1. It is a bijection from Y onto X. Note that \(f^{-1}(f(x)) = x\) for all x ∈ X. Moreover \(f(f^{-1}(y)) = y\) for each y ∈ Y, and \((f^{-1})^{-1} = f.\)

A function f with domain X is said to be constant (on X) if f(w) = f(x) for each w and x in X. The range of a constant function with nonempty domain therefore consists of a single element.

Now let f be a function from a set X into a set Y and g a function from Y into a set Z. Then g(f(x)) is defined for each x ∈ X. Letting h(x) = g(f(x)) for each such x, we see that h is a function from X into Z. We call it the composition of g and f and denote it by gf. Thus

$$\displaystyle{(g \circ f)(x) = g(f(x))}$$

for each x ∈ X. For instance, if f is a bijection from X onto Y, it follows that \((f^{-1} \circ f)(x) = x\) for each x ∈ X and \((f \circ f^{-1})(y) = y\) for each y ∈ Y. It is easily checked that a composition of injections is injective and that a composition of surjections is surjective. It follows that a composition of bijections is bijective.

If f is a function and \(X \subseteq \mathcal{D}_{f}\), then we write

$$\displaystyle{f(X) =\{ f(x) \in \mathcal{R}_{f}\mid x \in X\}.}$$

Thus \(f(\mathcal{D}_{f}) = \mathcal{R}_{f}\).

If f is a real-valued function whose domain is a set of real numbers, then the graph of f is the set of all points (x, f(x)) in the Cartesian plane, where \(x \in \mathcal{D}_{f}\). It is also the graph of the equation f(x) = y.

1.3 Induction and Inequalities

Natural numbers are those used to count. The set {1, 2, } of natural numbers is denoted by \(\mathbb{N}\). We assume familiarity with the basic properties of these numbers and simply highlight those that are of particular importance for our development of analysis. The details of the development of natural numbers, integers, rational numbers, and real numbers in terms of sets are given in [10] and will not be repeated here.

The main properties of \(\mathbb{N}\) that we require are that it contains the number 1 and that every natural number has a successor in \(\mathbb{N}\). The successor of a natural number n is denoted by n + 1. For example, the successor of 1 is \(2 = 1 + 1\) and that of 2 is \(3 = 2 + 1\). This notion of a successor for each natural number enables us to list the natural numbers in order, beginning with 1. In fact, it can be shown that each natural number n ≠ 1 is the successor of a unique natural number n − 1. If Y is a set of natural numbers such that 1 ∈ Y and n + 1 ∈ Y for each n ∈ Y, then \(Y = \mathbb{N}\).

This observation can be applied to yield an important technique, called induction, for proving theorems about natural numbers. In this application, Y is the set of all natural numbers for which the desired result is true. First, prove the desired theorem for the natural number 1. (In other words, prove that 1 ∈ Y.) Then assume that it is true for a particular natural number n (so that n ∈ Y ) and prove it for n + 1 under this assumption. The conclusion that \(Y = \mathbb{N}\) then shows that the theorem is indeed true for all natural numbers. This idea is perhaps most easily visualized as follows. Imagine a line of dominoes standing on end and close together so that the line begins with a particular domino and extends indefinitely to the right of that first domino. Knock the first domino onto the second. Then it is easy to see that all the dominoes will fall over, because the first domino will fall and every other domino has one before it that will eventually knock it over. This picture captures the essence of induction.

Before giving examples of induction at work, let us make some observations about the pattern of a proof by induction. One approach to the use of induction to prove a theorem about a natural number n, as we have seen, is to prove the theorem for the natural number 1, then assume it for a particular natural number n (that is, for a particular integer n ≥ 1), and finally prove it for n + 1. The assumption that the theorem holds for a particular n is commonly called the inductive hypothesis. We can in fact strengthen it by assuming that the theorem holds for all natural numbers less than n as well. Equivalently, one could prove the theorem first for 1, then assume as an inductive hypothesis that it holds for a particular integer n − 1 (or for all natural numbers less than n) where n ≥ 2, and finally prove it for n under this assumption. Another observation is that the process of induction need not begin with the natural number 1. In fact, it should begin with the smallest integer for which the theorem is to be proved. In other words, suppose we wish to prove a theorem for all integers n ≥ a for some fixed integer a. We start an inductive proof in this case by proving the theorem for a. Then there are two equivalent ways to continue. One is to assume as an inductive hypothesis that the theorem holds for some particular integer n ≥ a (or for all integers k such that a ≤ k ≤ n) and then prove it for n + 1. The alternative is to assume that it holds for n − 1 for a particular integer n > a (or for all integers k such that a ≤ k < n) and then prove it for n under this assumption.

Because of its importance, we now state the principle of induction formally as a theorem. We denote the set of all integers by \(\mathbb{Z}\), so that

$$\displaystyle{\mathbb{Z} =\{\ldots,-2,-1,0,1,2,\ldots \}.}$$

Theorem 1.3.1.

Let \(Y \subseteq \mathbb{Z}\) and a ∈ Y. Suppose also that y + 1 ∈ Y whenever y ∈ Y. Then Y contains every integer greater than a.

In applications of Theorem 1.3.1 to prove a given assertion about integers, Y is the set of all integers for which the assertion in question is true.

We now offer an example of an inductive proof.

Example 1.3.1.

We shall prove by induction that if x is a nonnegative real number, then

$$\displaystyle{ (1 + x)^{n} \geq 1 + \mathit{nx} + \frac{n(n - 1)} {2} x^{2} }$$
(1.1)

for all integers n ≥ 0. It is easy to see that equality holds for n = 0. Assume that inequality (1.1) holds for a particular integer n ≥ 0. Then

$$\displaystyle\begin{array}{rcl} (1 + x)^{n+1}& =& (1 + x)^{n}(1 + x) {}\\ & \geq & \left (1 + \mathit{nx} + \frac{n(n - 1)} {2} x^{2}\right )(1 + x) {}\\ & =& 1 + (n + 1)x + \left (n + \frac{n(n - 1)} {2} \right )x^{2} + \frac{n(n - 1)} {2} x^{3} {}\\ & \geq & 1 + (n + 1)x + \frac{2n + n^{2} - n} {2} x^{2} {}\\ & =& 1 + (n + 1)x + \frac{n(n + 1)} {2} x^{2}, {}\\ \end{array}$$

since \(n(n - 1)x^{3}/2 \geq 0\). The proof by induction is now complete.

It follows that

$$\displaystyle{ (1 + x)^{n} \geq 1 + nx }$$
(1.2)

and

$$\displaystyle{ (1 + x)^{n} > \frac{n(n - 1)} {2} x^{2} }$$
(1.3)

hold for every nonnegative real number x and nonnegative integer n. Inequality (1.2) is known as Bernoulli’s inequality. Both of these inequalities will be found to be useful later. △

Example 1.3.1 involved an inequality. It is assumed that the reader is conversant with inequalities and can work with them comfortably. One of the salient points to remember is that multiplication of both sides of an inequality by a negative number causes a change in the direction of the inequality. For instance, if x < y, then \(-x > -y\). Similarly, if both sides of an inequality have the same sign, then taking the reciprocals of both sides again induces a change in the direction of the inequality. Thus \(1/x > 1/y\) if either x < y < 0 or 0 < x < y.

Induction can be used to make definitions as well as to prove theorems. Let us illustrate this point by defining finite sums inductively. Let m and n be integers with n > m. If a m , a m+1, , a n are numbers, we define

$$\displaystyle{\sum _{j=m}^{m}a_{ j} = a_{m}}$$

and

$$\displaystyle{\sum _{j=m}^{n}a_{ j} =\sum _{ j=m}^{n-1}a_{ j} + a_{n}.}$$

Often we write

$$\displaystyle{\sum _{j=m}^{n}a_{ j} = a_{m} + a_{m+1} + \cdots + a_{n}.}$$

This quantity is called the sum of a m , a m+1, , a n , and those numbers are its terms. We also define

$$\displaystyle{\sum _{j=m}^{n}a_{ j} = 0}$$

if m > n, and we regard this expression as a sum with no terms.

We may define the product of the same numbers in a similar way. Thus

$$\displaystyle{\prod _{j=m}^{m}a_{ j} = a_{m}}$$

and

$$\displaystyle{\prod _{j=m}^{n}a_{ j} = a_{n}\prod _{j=m}^{n-1}a_{ j}}$$

for all n > m. We often write

$$\displaystyle{\prod _{j=m}^{n}a_{ j} = a_{m}a_{m+1}\ldots a_{n}.}$$

This number is the product of a m , a m+1, , a n , and those numbers are its factors. If m > n, then we define

$$\displaystyle{\prod _{j=m}^{n}a_{ j} = 1}$$

and regard this expression as a product with no factors.

Similarly, we obtain analogous expressions for unions and intersections of sets by replacing with \(\bigcup\) and \(\bigcap\), respectively.

Induction may also be used to define exponentiation for powers that are natural numbers: Given a real number a, set a 1 = a and if a n has been defined for a specific natural number n, let

$$\displaystyle{a^{n+1} = a^{n} \cdot a.}$$

We may also define a 0 = 1, though this definition is normally made only if a ≠ 0. Furthermore we define

$$\displaystyle{a^{-n} = \frac{1} {a^{n}}}$$

if \(n \in \mathbb{N}\) and a ≠ 0.

One can easily prove by induction that

$$\displaystyle{ (\mathit{ab})^{n} = a^{n}b^{n} }$$
(1.4)

for any real numbers a and b and natural number n, and similarly that

$$\displaystyle{ \left (\frac{a} {b}\right )^{n} = \frac{a^{n}} {b^{n}} }$$
(1.5)

if b ≠ 0. If ab ≠ 0, then these two equations extend to the case where n is any integer. For each fixed \(m \in \mathbb{N}\), it is also easy to prove by induction on n that

$$\displaystyle{ a^{m}a^{n} = a^{m+n} }$$
(1.6)

for all \(n \in \mathbb{N}\) and then that

$$\displaystyle{ (a^{m})^{n} = a^{\mathit{mn}} }$$
(1.7)

for all \(n \in \mathbb{N}\). If a ≠ 0, then these two rules may also be extended to the case where m and n are any integers.

In order to extend these ideas, we need the following definition.

Definition 1.3.1.

If n is a nonnegative integer and a 0, a 1, , a n are constants, then the function given by

$$\displaystyle{ \sum _{j=0}^{n}a_{ j}x^{j} }$$
(1.8)

for all numbers x is called a polynomial. Its degree is n if a n ≠ 0. The numbers a 0, a 1, , a n are its coefficients. If

$$\displaystyle{\sum _{j=0}^{n}a_{ j}c^{j} = 0,}$$

then c is called a root of the polynomial (1.8).

It is shown in [10] that every nonnegative number a has a unique nonnegative square root. This square root is denoted by a 1∕2 or \(\sqrt{a}\). Thus we have \((a^{1/2})^{2} = a\). It is also shown in [10] that every polynomial of odd degree whose coefficients are real has a real root. For example, if m is an odd positive integer and a is a real number, then there is a real number c satisfying the equation c m = a. The uniqueness of c follows from the fact that if x and y are real numbers such that x < y, then x m < y m since m is odd. (This fact is obvious if x ≤ 0 and y ≥ 0, for then x m ≤ 0, y m ≥ 0, and at least one of x m and y m is nonzero. If 0 < x < y, then use induction to prove it for all positive integers m. If x < y < 0, then note that \(0 < -y < -x\) and apply the previous case with m odd.) We write \(a^{1/m} = c\). In the case where m = 1, this definition is consistent with the equation a 1 = a. In any case, we have \((a^{1/m})^{m} = a\), and so we refer to a 1∕m as the mth root of a. Note that \(0^{1/m} = 0\) and that a 1∕m has the same sign as a if a ≠ 0.

An arbitrary positive integer can be written in the form 2k m, where k and m are nonnegative integers and m is odd. If n = 2k m, then we can show by induction on k that each a ≥ 0 has a unique nonnegative nth root. Indeed, we have already observed this fact if k = 0. Suppose that k > 0 and that a has a unique nonnegative rth root c, where \(r = 2^{k-1}m = n/2\). Then

$$\displaystyle{a = c^{r} = \left (\left (c^{\frac{1} {2} }\right )^{2}\right )^{r} = \left (c^{\frac{1} {2} }\right )^{2r} = \left (c^{\frac{1} {2} }\right )^{n}.}$$

On the other hand, if \(a = b^{n} = b^{2r} = (b^{2})^{r}\), where b ≥ 0, then b 2 = c by the uniqueness of c, and so \(b = c^{1/2}\). We conclude that \(\sqrt{c}\) is the unique nonnegative nth root of a. We write it as a 1∕n. Hence

$$\displaystyle{ \left (a^{\frac{1} {n} }\right )^{n} = a. }$$
(1.9)

Moreover \(0^{1/n} = 0\) and \(1^{1/n} = 1\).

An alternative proof of the existence of the nonnegative nth root of a will be given in Example 5.3.5.

Let 0 ≤ a < b, and for some positive integer n let \(c = a^{1/n}\) and \(d = b^{1/n}\). If c ≥ d, then we should have the contradiction that \(a = c^{n} \geq d^{n} = b\). We conclude that c < d. For instance, if a > 1, then a 1∕n > 1.

Let a ≥ 0, let r and s be positive integers, and let \(c = a^{1/r}\). Then

$$\displaystyle{\left (c^{\frac{1} {s} }\right )^{\mathit{rs}} = \left (\left (c^{\frac{1} {s} }\right )^{s}\right )^{r} = c^{r} = a,}$$

so that

$$\displaystyle{a^{ \frac{1} {\mathit{rs}} } = c^{\frac{1} {s} } = \left (a^{\frac{1} {r} }\right )^{\frac{1} {s} }.}$$

Next, let a ≥ 0, let \(n \in \mathbb{N}\), and let m be a nonnegative integer. We then define

$$\displaystyle{a^{\frac{m} {n} } = \left (a^{\frac{1} {n} }\right )^{m}.}$$

This definition generalizes Eq. (1.9) and is consistent with the equation a 1 = a in the case where m = 1 or n = 1. It is also consistent with the equation a 0 = 1 in the case where m = 0. If m = jk and n = jl for some positive integers j, k, l, then

$$\displaystyle{a^{\frac{\mathit{jk}} {\mathit{jl}} } = \left (a^{\frac{1} {\mathit{jl}} }\right )^{\mathit{jk}} = \left (\left (\left (a^{\frac{1} {l} }\right )^{\frac{1} {j} }\right )^{j}\right )^{k} = \left (a^{\frac{1} {l} }\right )^{k} = a^{\frac{k} {l} },}$$

a result that is consistent with the equation \(\mathit{jk}/(jl) = k/l\). If a > 0, then we define

$$\displaystyle{a^{-\frac{m} {n} } = \frac{1} {a^{\frac{m} {n} }} = \frac{1} {\left (a^{\frac{1} {n} }\right )^{m}} = \left (a^{\frac{1} {n} }\right )^{-m}.}$$

We proceed to the generalization of Eqs. (1.4)–(1.7). Let a ≥ 0, let m be an integer, and let \(n \in \mathbb{N}\). Since

$$\displaystyle{\left (a^{\frac{m} {n} }\right )^{n} = \left (\left (a^{\frac{1} {n} }\right )^{m}\right )^{n} = \left (a^{\frac{1} {n} }\right )^{\mathit{nm}} = \left (\left (a^{\frac{1} {n} }\right )^{n}\right )^{m} = a^{m},}$$

it follows that

$$\displaystyle{(a^{m})^{ \frac{1} {n} } = a^{\frac{m} {n} } = \left (a^{\frac{1} {n} }\right )^{m}.}$$

Thus if p and q are also integers and q > 0, then

$$\displaystyle{\left (a^{\frac{m} {n} }\right )^{\frac{p} {q} } = \left (\left (\left (a^{\frac{1} {n} }\right )^{m}\right )^{\frac{1} {q} }\right )^{p} = \left (\left (\left (a^{\frac{1} {n} }\right )^{\frac{1} {q} }\right )^{m}\right )^{p} = \left (a^{ \frac{1} {\mathit{nq}} }\right )^{\mathit{mp}} = a^{\frac{\mathit{mp}} {\mathit{nq}} }.}$$

This result generalizes Eq. (1.7). Equation (1.6) also generalizes, since

$$\displaystyle{a^{\frac{m} {n} }a^{\frac{p} {q} } = a^{\frac{\mathit{qm}} {\mathit{qn}} }a^{\frac{\mathit{pn}} {\mathit{qn}} } = \left (a^{ \frac{1} {\mathit{qn}} }\right )^{\mathit{qm}}\left (a^{ \frac{1} {\mathit{qn}} }\right )^{\mathit{pn}} = \left (a^{ \frac{1} {\mathit{qn}} }\right )^{\mathit{qm}+\mathit{pn}} = a^{\frac{\mathit{qm}+\mathit{pn}} {\mathit{qn}} } = a^{\frac{m} {n} +\frac{p} {q} }.}$$

Now suppose that b is also a nonnegative real number. As

$$\displaystyle{\mathit{ab} = \left (a^{\frac{1} {n} }\right )^{n}\left (b^{\frac{1} {n} }\right )^{n} = \left (a^{\frac{1} {n} }b^{\frac{1} {n} }\right )^{n},}$$

it then follows that

$$\displaystyle{(ab)^{ \frac{1} {n} } = a^{\frac{1} {n} }b^{\frac{1} {n} }.}$$

Therefore

$$\displaystyle{(ab)^{\frac{m} {n} } = ((\mathit{ab})^{m})^{ \frac{1} {n} } = (a^{m}b^{m})^{ \frac{1} {n} } = (a^{m})^{ \frac{1} {n} }(b^{m})^{ \frac{1} {n} } = a^{\frac{m} {n} }b^{\frac{m} {n} },}$$

a result that generalizes Eq. (1.4). Thus if b > 0, then

$$\displaystyle{\left (\frac{a} {b}\right )^{\frac{m} {n} } = (\mathit{ab}^{-1})^{\frac{m} {n} } = a^{\frac{m} {n} }(b^{-1})^{\frac{m} {n} } = a^{\frac{m} {n} }b^{-\frac{m} {n} } = \frac{a^{\frac{m} {n} }} {b^{\frac{m} {n} }},}$$

and so Eq. (1.5) generalizes as well.

We have now defined the real number a x for every number a > 0 and every rational power x. Later we shall extend this definition to the case where x is any real number. In fact, similar ideas may be used to define w z for any complex numbers w and z provided that if w is real, then it is positive.

As another example of an inductive definition, we may define the factorial n! of a nonnegative integer n by writing 0!  = 1 and

$$\displaystyle{n! = n(n - 1)!}$$

for all \(n \in \mathbb{N}\).

Given objects x 1, x 2, , x n , where n > 1, we may define the ordered set

$$\displaystyle{(x_{1},x_{2},\ldots,x_{n})}$$

by induction: The ordered pair (x 1, x 2) has already been defined, and for each n > 2 we let

$$\displaystyle{(x_{1},x_{2},\ldots,x_{n}) = ((x_{1},x_{2},\ldots,x_{n-1}),x_{n}).}$$

Let X 1, X 2, , X n be sets, where n > 1. We can also define the Cartesian product X 1 × X 2 ×⋯ × X n of X 1, X 2, , X n by induction: The definition has already been made for n = 2, and for each n > 2 define

$$\displaystyle{X_{1} \times X_{2} \times \cdots \times X_{n} = (X_{1} \times X_{2} \times \cdots \times X_{n-1}) \times X_{n}.}$$

The result is the collection of all ordered sets (x 1, x 2, , x n ) such that x j  ∈ X j for each j. If X j  = X for each j, then we write

$$\displaystyle{X^{n} = X_{ 1} \times X_{2} \times \cdots \times X_{n}.}$$

The absolute value of a real number x plays a prominent role in analysis. Denoted by | x | , it is defined as x if x ≥ 0 and − x otherwise. Thus it is always the case that | x | ≥ 0 and that

$$\displaystyle{\vert x\vert = \sqrt{x^{2}} = \vert - x\vert.}$$

For every a > 0 it is also easy to see that | x |  < a if and only if − a < x < a. From these inequalities it follows that | x |  > a if and only if x > a or x < −a, and that | xy |  < a if and only if \(y - a < x < y + a\), where y is another real number. In addition, observe that | x | ≥ x and | x | ≥ −x for all real x. Thus \(\vert x\vert + \vert y\vert \geq x + y\) and \(\vert x\vert + \vert y\vert \geq -(x + y)\), so that

$$\displaystyle{\vert x + y\vert \leq \vert x\vert + \vert y\vert.}$$

This result is known as the triangle inequality.

Inequalities can be used to define intervals on the real line. If a and b are real numbers with a < b, then we write

$$\displaystyle\begin{array}{rcl} (a,b)& =& \{x\mid a < x < b\}, {}\\ {}[a,b]& =& \{x\mid a \leq x \leq b\}, {}\\ {}(a,b]& =& \{x\mid a < x \leq b\}, {}\\ {}[a,b)& =& \{x\mid a \leq x < b\}. {}\\ \end{array}$$

These sets are called intervals. The first is open and the second closed; the others are half-open. It should be clear from the context whether the notation (a, b) specifies an open interval or an ordered pair. The numbers a and b are called the ends of each of these intervals. In addition, we write

$$\displaystyle\begin{array}{rcl} (a,\infty )& =& \{x\mid a < x\}, {}\\ {}[a,\infty )& =& \{x\mid a \leq x\}, {}\\ {}(-\infty,a)& =& \{x\mid x < a\}, {}\\ (-\infty,a]& =& \{x\mid x \leq a\}. {}\\ \end{array}$$

These sets are also reckoned as intervals, and a is regarded as an end of each.

Let f be a real-valued function whose domain includes an interval I. Then f is said to be increasing on I if f(x 1) < f(x 2) whenever x 1 and x 2 are numbers in I such that x 1 < x 2. If, on the other hand, f(x 1) ≤ f(x 2) for each such x 1 and x 2, then f is nondecreasing on I. We define functions that are decreasing or nonincreasing on I similarly. All these functions are said to be monotonic on I, and those that are increasing on I or decreasing on I are strictly monotonic on I.

Exercises 1.1.

 

  1. 1.

    Show that if a < b, then

    $$\displaystyle{a \leq \alpha a + (1-\alpha )b \leq b}$$

    for all α such that 0 ≤ α ≤ 1.

  2. 2.

    Prove that

    $$\displaystyle{\vert xy\vert = \vert x\vert \vert y\vert }$$

    for all real numbers x and y. If y ≠ 0, prove also that

    $$\displaystyle{\left \vert \frac{x} {y}\right \vert = \frac{\vert x\vert } {\vert y\vert }.}$$
  3. 3.

    Use the triangle inequality to prove that

    $$\displaystyle{\vert \vert x\vert -\vert y\vert \vert \leq \vert x - y\vert,}$$

    where x and y are real numbers.

  4. 4.

    Solve the following equations and inequalities, where x is real:

    $$\displaystyle{\begin{array}{llll} \mbox{ (a)} &\vert 3x - 2\vert = \vert 5x + 4\vert;&\mbox{ (d)}&\vert 3x + 4\vert \geq 2; \\ \mbox{ (b)}&\vert x + 4\vert = -4x; &\mbox{ (e)} &\vert 3x + 2\vert < 3\vert x\vert. \\ \mbox{ (c)} &\vert 2x - 3\vert < 4; & &\end{array} }$$
  5. 5.

    Prove the following by induction for all positive integers n and real numbers a, a 1, a 2, , a n :

    1. (a)

       | a n |  =  | a | n.

    2. (b)

      \(\left \vert \prod _{j=1}^{n}a_{j}\right \vert =\prod _{ j=1}^{n}\vert a_{j}\vert \).

    3. (c)

      \(\left \vert \sum _{j=1}^{n}a_{j}\right \vert \leq \sum _{j=1}^{n}\vert a_{j}\vert \).

  6. 6.

    Prove that \(\sqrt{2}\) is irrational by writing \(\sqrt{2} = a/b\), where a and b are positive integers with no common factor, and obtaining a contradiction by showing that a and b must both be even.

1.4 Complex Numbers

Throughout this book we will assume that any numbers we are working with are complex unless an indication to the contrary is given by either the context or an explicit statement. For example, the numbers a 0, a 1, , a n , x in the definition of a polynomial (Definition 1.3.1) need not be real. As complex numbers might not be as familiar to the reader as real numbers, we define them here and prove some basic properties.

The equation

$$\displaystyle{ x^{2} = -1 }$$
(1.10)

has no real solution. We seek to extend the real number system to include a number i such that \(i^{2} = -1\) while preserving familiar operations such as addition and multiplication. If we put \(x + \mathit{iy} = 0\), where x and y are real numbers, then \(x = -iy\), so that \(x^{2} = i^{2}y^{2} = -y^{2}\) and hence \(x = y = 0\). Now if

$$\displaystyle{ x + \mathit{iy} = a + \mathit{ib}, }$$
(1.11)

where a and b are also real, then

$$\displaystyle{x - a + i(y - b) = 0.}$$

The argument above shows that \(x - a = y - b = 0\) and we conclude that Eq. (1.11) holds if and only if x = a and y = b. Consequently x +iy may be identified with the ordered pair (x, y) of real numbers.

We therefore define a complex number as an ordered pair of real numbers. Hence every complex number can be visualized as a point in the Cartesian plane. If z = (x, y), where x and y are real, then we define Re (z) = x, and Im (z) = y, and we refer to x and y as the real and imaginary parts, respectively, of z. We define addition and multiplication by the rules

$$\displaystyle{(u,v) + (x,y) = (u + x,v + y)}$$

and

$$\displaystyle{(u,v)(x,y) = (\mathit{ux} -\mathit{vy},\mathit{uy} + \mathit{vx}),}$$

respectively, where u, v, x, y are all real. These rules are motivated by the calculations

$$\displaystyle{(u + \mathit{iv}) + (x + \mathit{iy}) = u + x + i(v + y)}$$

and

$$\displaystyle{(u + \mathit{iv})(x + \mathit{iy}) = \mathit{ux} -\mathit{vy} + i(\mathit{uy} + \mathit{vx}).}$$

Thus

$$\displaystyle{(u,0) + (x,0) = (u + x,0)}$$

and

$$\displaystyle{(u,0)(x,0) = (\mathit{ux},0).}$$

Therefore we may identify (x, 0) with the real number x. In particular, it follows that (0, 0) = 0 and (1, 0) = 1. Note also that

$$\displaystyle{\mathrm{Re}\,(w + z) =\mathrm{ Re}\,(w) +\mathrm{ Re}\,(z)}$$

and

$$\displaystyle{\mathrm{Im}\,(w + z) =\mathrm{ Im}\,(w) +\mathrm{ Im}\,(z)}$$

for any two complex numbers w and z. It is also worth observing that the addition of complex numbers may be interpreted geometrically as vector addition. In Chap. 6 we will give a geometric interpretation of the multiplication of complex numbers.

The required solution i to Eq. (1.10) is identified with the ordered pair (0, 1). According to our definition of multiplication, it has the desired property:

$$\displaystyle{i^{2} = (0,1)(0,1) = (-1,0) = -1.}$$

Note also that if x and y are real, then the definitions of addition and multiplication do indeed give

$$\displaystyle{x + \mathit{iy} = (x,0) + (0,1)(y,0) = (x,0) + (0,y) = (x,y).}$$

These results show that the algebraic manipulation of complex numbers is identical to that of real numbers with the additional rule that \(i^{2} = -1\). In particular, we have the laws that

$$\displaystyle{ a + b = b + a, }$$
(1.12)
$$\displaystyle{ a + (b + c) = (a + b) + c, }$$
(1.13)

and

$$\displaystyle{ a(b + c) = \mathit{ab} + \mathit{ac} }$$
(1.14)

for all complex numbers a, b, c. We summarise Eqs. (1.12) and (1.13) by asserting that the addition of complex numbers is commutative and associative, respectively. Similarly, multiplication of complex numbers is commutative and associative. Equation (1.14) asserts that multiplication is distributive over addition.

We also define

$$\displaystyle{-(x + \mathit{iy}) = -x -\mathit{iy}.}$$

Then subtraction is defined by the equation

$$\displaystyle{w - z = w + (-z)}$$

for all complex numbers w and z.

We have now extended the real number system to include a number i that gives a solution to the equation \(x^{2} = -1\). More generally, \(i\sqrt{c}\) gives a solution to the equation \(x^{2} = -c\), where c ≥ 0. In fact, it can be shown (see [10]) that every nonconstant polynomial with complex coefficients has at least one complex root. This result is known as the fundamental theorem of algebra. For example, suppose that

$$\displaystyle{\mathit{az}^{2} + \mathit{bz} + c = 0,}$$

where a, b, c, z are all complex numbers and a ≠ 0. As the left-hand side of this equation is a polynomial of degree 2, the equation is said to be quadratic. It can be solved for z by the following procedure. Since a ≠ 0, we have

$$\displaystyle\begin{array}{rcl} 0& =& z^{2} + \frac{\mathit{bz}} {a} + \frac{c} {a} {}\\ & =& \left (z + \frac{b} {2a}\right )^{2} - \frac{b^{2}} {4a^{2}} + \frac{c} {a} {}\\ & =& \left (z + \frac{b} {2a}\right )^{2} -\frac{b^{2} - 4\mathit{ac}} {4a^{2}}, {}\\ \end{array}$$

so that

$$\displaystyle{\left (z + \frac{b} {2a}\right )^{2} = \frac{b^{2} - 4\mathit{ac}} {4a^{2}}.}$$

Thus

$$\displaystyle{z + \frac{b} {2a} = \pm \frac{1} {2a}\sqrt{b^{2 } - 4\mathit{ac}},}$$

and we conclude that

$$\displaystyle{z = \frac{-b \pm \sqrt{b^{2 } - 4\mathit{ac}}} {2a}.}$$

The expression b 2 − 4ac is called the discriminant of the polynomial \(\mathit{az}^{2} + \mathit{bz} + c\). If a, b, c are real, then the polynomial has just two real roots if its discriminant is positive, just one if its discriminant is 0, and none otherwise.

We proceed to the exponentiation of complex numbers. As in the case of real numbers, we define z 1 = z for every complex number z, and if z n has been defined for a specific natural number n, we write \(z^{n+1} = z^{n} \cdot z\). We also define z 0 = 1 if z ≠ 0.

Next, let

$$\displaystyle{z = x + \mathit{iy}\neq 0,}$$

where x and y are real. Then

$$\displaystyle{(x + \mathit{iy})(x -\mathit{iy}) = x^{2} + y^{2}\neq 0,}$$

and so

$$\displaystyle{\frac{1} {z} = \frac{1} {x + \mathit{iy}} = \frac{x -\mathit{iy}} {(x + \mathit{iy})(x -\mathit{iy})} = \frac{x -\mathit{iy}} {x^{2} + y^{2}}.}$$

Thus if we define

$$\displaystyle{z^{-1} = \frac{x -\mathit{iy}} {x^{2} + y^{2}},}$$

then we have \(\mathit{zz}^{-1} = 1\). For every positive integer n we now define \(z^{-n} = (z^{-1})^{n}\) if z ≠ 0. This definition agrees with the equation z 1 = z in the case where n = 1. We also define \(w/z = wz^{-1}\) for every complex number w.

We turn our attention now to two numbers that are associated with a given complex number. The first is the conjugate of a complex number \(z = x + iy\), where x and y are real. The conjugate of z is defined as xiy and is denoted by \(\bar{z}\). Thus \(\overline{\bar{z}} = z\), and \(z =\bar{ z}\) if and only if z is real. Geometrically, the function that maps z to \(\bar{z}\) is a reflection about the x-axis.

If we also have \(w = u + iv\), where u and v are real, then

$$\displaystyle{\overline{w + z} = \overline{u + x + i(v + y)} = u + x - i(v + y) = u -\mathit{iv} + x -\mathit{iy} =\bar{ w} +\bar{ z}.}$$

Since

$$\displaystyle{\overline{ - z} = \overline{ - x -\mathit{iy}} = -x + \mathit{iy} = -(x -\mathit{iy}) = -\bar{z},}$$

it also follows that

$$\displaystyle{\overline{w - z} = \overline{w + (-z)} =\bar{ w} + \overline{ - z} =\bar{ w} + (-\bar{z}) =\bar{ w} -\bar{ z}.}$$

Moreover

$$\displaystyle{\overline{wz} = \overline{\mathit{ux} -\mathit{vy} + i(\mathit{uy} + \mathit{vx})} = \mathit{ux} -\mathit{vy} - i(\mathit{uy} + \mathit{vx}),}$$

and so

$$\displaystyle{\bar{w} \cdot \bar{ z} = (u -\mathit{iv})(x -\mathit{iy}) = \mathit{ux} -\mathit{vy} + i(-\mathit{uy} -\mathit{vx}) = \overline{wz}.}$$

If z ≠ 0, then

$$\displaystyle{\overline{1/z} = \frac{x + \mathit{iy}} {x^{2} + y^{2}} = 1/\bar{z},}$$

and it follows that

$$\displaystyle{\overline{w/z} = \overline{wz^{-1}} =\bar{ w}\overline{z^{-1}} =\bar{ w} \cdot \overline{1/z} = \frac{\bar{w}} {\bar{z}}.}$$

Note also that

$$\displaystyle{z +\bar{ z} = 2x = 2\mathrm{Re}\,(z)}$$

and, similarly,

$$\displaystyle{z -\bar{ z} = 2i\mathrm{Im}\,(z);}$$

hence

$$\displaystyle{\mathrm{Re}\,(z) = \frac{z +\bar{ z}} {2} }$$

and

$$\displaystyle{\mathrm{Im}\,(z) = \frac{z -\bar{ z}} {2i}.}$$

A further observation is that

$$\displaystyle{z\bar{z} = (x + \mathit{iy})(x -\mathit{iy}) = x^{2} + y^{2}.}$$

We now define

$$\displaystyle{\vert z\vert = \sqrt{x^{2 } + y^{2}}.}$$

This number is called the modulus of z. For example, | i |  = 1. We note that

$$\displaystyle{z\bar{z} = \vert z\vert ^{2},}$$
$$\displaystyle{\vert z\vert \geq \sqrt{x^{2}} = \vert x\vert = \vert \mathrm{Re}\,(z)\vert,}$$

and, similarly,

$$\displaystyle{\vert z\vert \geq \sqrt{y^{2}} = \vert \mathrm{Im}\,(z)\vert.}$$

Moreover if z = x, then

$$\displaystyle{\vert z\vert = \sqrt{x^{2}} = \vert x\vert.}$$

This observation shows that | z | is a generalization of the absolute value of a real number. It follows that

$$\displaystyle{z +\bar{ z} = 2\mathrm{Re}\,(z) \leq 2\vert \mathrm{Re}\,(z)\vert \leq 2\vert z\vert.}$$

If w is as defined in the previous paragraph, then

$$\displaystyle{\vert z - w\vert = \vert x + \mathit{iy} - u -\mathit{iv}\vert = \vert x - u + i(y - v)\vert = \sqrt{(x - u)^{2 } + (y - v)^{2}};}$$

hence

$$\displaystyle{\vert z - w\vert = \vert w - z\vert }$$

and we perceive | zw | geometrically as the distance between z and w. In particular, | z | is the distance between z and the origin. Note also that

$$\displaystyle{\vert z - c\vert = r,}$$

where c is a complex number and r ≥ 0 is the equation of a circle with center c and radius r.

We also have | z | ≥ 0 for all z, and | z |  = 0 if and only if z = 0. In addition,

$$\displaystyle{\vert z\vert = \vert - z\vert = \vert \bar{z}\vert.}$$

Since

$$\displaystyle{\vert \mathit{wz}\vert ^{2} = \mathit{wz} \cdot \overline{\mathit{wz}} = \mathit{wz}\bar{w}\bar{z} = w\bar{w}z\bar{z} = \vert w\vert ^{2}\vert z\vert ^{2} = (\vert w\vert \vert z\vert )^{2},}$$

we deduce that

$$\displaystyle{\vert \mathit{wz}\vert = \vert w\vert \vert z\vert.}$$

If z ≠ 0, it follows that

$$\displaystyle{\vert w\vert = \left \vert \frac{\mathit{wz}} {z} \right \vert = \left \vert \frac{w} {z} \right \vert \vert z\vert,}$$

and so

$$\displaystyle{\left \vert \frac{w} {z} \right \vert = \frac{\vert w\vert } {\vert z\vert }.}$$

The triangle inequality

$$\displaystyle{\vert w + z\vert \leq \vert w\vert + \vert z\vert }$$

also holds, as can be inferred from the calculation

$$\displaystyle\begin{array}{rcl} \vert w + z\vert ^{2}& =& (w + z)\overline{w + z} {}\\ & =& (w + z)(\bar{w} +\bar{ z}) {}\\ & =& w\bar{w} + w\bar{z} + z\bar{w} + z\bar{z} {}\\ & =& \vert w\vert ^{2} + w\bar{z} + \overline{w\bar{z}} + \vert z\vert ^{2} {}\\ & \leq & \vert w\vert ^{2} + 2\vert w\bar{z}\vert + \vert z\vert ^{2} {}\\ & =& \vert w\vert ^{2} + 2\vert w\vert \vert \bar{z}\vert + \vert z\vert ^{2} {}\\ & =& \vert w\vert ^{2} + 2\vert w\vert \vert z\vert + \vert z\vert ^{2} {}\\ & =& (\vert w\vert + \vert z\vert )^{2}. {}\\ \end{array}$$

Thus

$$\displaystyle{\vert w\vert = \vert w - z + z\vert \leq \vert w - z\vert + \vert z\vert,}$$

so that

$$\displaystyle{\vert w - z\vert \geq \vert w\vert -\vert z\vert.}$$

But we also have

$$\displaystyle{\vert w - z\vert = \vert z - w\vert \geq \vert z\vert -\vert w\vert = -(\vert w\vert -\vert z\vert ),}$$

and so we conclude that

$$\displaystyle{\vert w - z\vert \geq \left \vert \vert w\vert -\vert z\vert \right \vert.}$$

Finally, the triangle inequality also shows that

$$\displaystyle{\vert z\vert \leq \vert x\vert + \vert \mathit{iy}\vert = \vert x\vert + \vert y\vert = \vert \mathrm{Re}\,(z)\vert + \vert \mathrm{Im}\,(z)\vert.}$$

The notion of an inequality for real numbers does not extend to complex numbers. We have x 2 ≥ 0 for all real x, but i 2 < 0. It is meaningless to write w < z if either w or z is not real. The inequality | z |  < a is equivalent to − a < z < a if and only if a and z are both real.

Throughout the book we will denote by \(\mathbb{Z}\), \(\mathbb{Q}\), \(\mathbb{R}\), and \(\mathbb{C}\) the sets of integers, rational numbers, real numbers, and complex numbers, respectively.

Let f and g be functions whose domains are subsets of \(\mathbb{C}\). For all \(z \in \mathcal{D}_{f} \cap \mathcal{D}_{g}\), we define

$$\displaystyle{(f + g)(z) = f(z) + g(z),}$$
$$\displaystyle{(f - g)(z) = f(z) - g(z),}$$

and

$$\displaystyle{(fg)(z) = f(z)g(z).}$$

We also define

$$\displaystyle{\left (\frac{f} {g}\right )(z) = \frac{f(z)} {g(z)}}$$

for all \(z \in \mathcal{D}_{f} \cap \mathcal{D}_{g}\) such that g(z) ≠ 0. Thus f + g, fg, fg, and fg are all functions. In addition, if \(c \in \mathbb{C}\), then we define cf to be the function such that

$$\displaystyle{(cf)(z) = cf(z)}$$

for all \(z \in \mathcal{D}_{f}\).

Exercises 1.2.

 

  1. 1.

    Express in the form x +iy, where x and y are real,

    1. (a)

      \(3 + 4i + (1 - i)(1 + i)\);

    2. (b)

      (2 − 5i)2;

    3. (c)

      \(\frac{1-2i} {3+4i}\).

  2. 2.

    Compute i n for every integer n.

  3. 3.

    For every complex number z, find the real and imaginary parts of the following expressions:

    1. (a)

      \(\frac{z+2} {z-2i}\);

    2. (b)

      iz.

  4. 4.

    Let \(z = x + \mathit{iy}\) and \(w = a + \mathit{ib}\), where x, y, a, b are real. Suppose that z 2 = w. Show that

    $$\displaystyle{x^{2} = \frac{a + \vert w\vert } {2} }$$

    and

    $$\displaystyle{y^{2} = \frac{-a + \vert w\vert } {2}.}$$

    Deduce that

    $$\displaystyle{z = \pm (\alpha +\lambda \beta i),}$$

    where

    $$\displaystyle{\alpha = \sqrt{\frac{a + \vert w\vert } {2}},}$$
    $$\displaystyle{\beta = \sqrt{\frac{-a + \vert w\vert } {2}},}$$

    λ = 1 if b ≥ 0 and \(\lambda = -1\) if b < 0. Hence, conclude that the square root of a complex number is real if and only if the complex number is real and positive.

  5. 5.

    Solve the following equations:

    1. (a)

      \(z^{2} = 1 - i\);

    2. (b)

      z 4 = i.

  6. 6.

    Show that

    $$\displaystyle{ \frac{\vert z\vert } {\vert u + v\vert } \leq \frac{\vert z\vert } {\vert \vert u\vert -\vert v\vert \vert },}$$

    where u, v, z are complex numbers and | u | ≠ | v | .

  7. 7.

    If z and w are complex numbers, show that

    $$\displaystyle{\vert z + w\vert = \vert z\vert + \vert w\vert }$$

    if and only if z = α w for some real number α.

  8. 8.

    Recall that the function \(d(z,w) = \vert z - w\vert \) measures the distance between the points representing the complex numbers z and w. Prove the inequality

    $$\displaystyle{d(z,w) \leq d(z,v) + d(v,w),}$$

    where w, v, z are complex numbers. More generally, let z 1, z 2, , z n be complex numbers. Show that

    $$\displaystyle{d(z_{1},z_{n}) \leq d(z_{1},z_{2}) + d(z_{2},z_{3}) +\ldots +d(z_{n-1},z_{n}).}$$
  9. 9.

    Show that

    $$\displaystyle{\vert d(z,v) - d(v,w)\vert \leq d(z,w)}$$

    for complex numbers v, w, z.

  10. 10.

    Give a condition for

    $$\displaystyle{d(z,w) = d(z,v) + d(v,w),}$$

    where v, w, z are complex numbers.

1.5 Finite Sums

Our development of analysis is based on the concepts of sequences and series. In order to be able to deal with series, we need to be familiar with the properties of finite sums. This section is therefore devoted to the development of their basic properties.

Let m and n be integers with n ≥ m. Recall that

$$\displaystyle{\sum _{j=m}^{n}a_{ j} = a_{m} + a_{m+1} + \cdots + a_{n},}$$

where a m , a m+1, , a n are numbers. Observe that j is a dummy variable. In other words, the sum is independent of j, so that if k is another letter, then

$$\displaystyle{ \sum _{k=m}^{n}a_{ k} =\sum _{ j=m}^{n}a_{ j}. }$$
(1.15)

(Sometimes j is referred to as an index.) More generally, let r be any integer and put \(k = j + r\). Then \(j = k - r\). Moreover \(k = m + r\) when j = m, and \(k = n + r\) when j = n, and so we may write

$$\displaystyle{\sum _{j=m}^{n}a_{ j} =\sum _{ k=m+r}^{n+r}a_{ k-r}.}$$

Using Eq. (1.15), we therefore obtain the following result.

Proposition 1.5.1.

If m,n,r are integers with m ≤ n and a j is a number for each integer j such that m ≤ j ≤ n, then

$$\displaystyle{\sum _{j=m+r}^{n+r}a_{ j-r} =\sum _{ j=m}^{n}a_{ j}.}$$

Moreover the associativity of addition shows that if m, n and r are integers such that m ≤ r < n, then

$$\displaystyle{\sum _{j=m}^{n}a_{ j} =\sum _{ j=m}^{r}a_{ j} +\sum _{ j=r+1}^{n}a_{ j}.}$$

The next theorem is a basic property of finite sums.

Theorem 1.5.2.

Let m,n be integers with m ≤ n, and let a j and b j be numbers for each integer j for which m ≤ j ≤ n. For all numbers s,t, we have

$$\displaystyle{ \sum _{j=m}^{n}(\mathit{sa}_{ j} + \mathit{tb}_{j}) = s\sum _{j=m}^{n}a_{ j} + t\sum _{j=m}^{n}b_{ j}. }$$
(1.16)

Proof.

We fix m and use induction on n. For n = m we have

$$\displaystyle\begin{array}{rcl} \sum _{j=m}^{m}(\mathit{sa}_{ j} + \mathit{tb}_{j})& =& \mathit{sa}_{m} + \mathit{tb}_{m} {}\\ & =& s\sum _{j=m}^{m}a_{ j} + t\sum _{j=m}^{m}b_{ j}. {}\\ \end{array}$$

Now suppose that Eq. (1.16) holds for some integer n ≥ m. Then

$$\displaystyle\begin{array}{rcl} \sum _{j=m}^{n+1}(\mathit{sa}_{ j} + \mathit{tb}_{j})& =& \sum _{j=m}^{n}(\mathit{sa}_{ j} + \mathit{tb}_{j}) + \mathit{sa}_{n+1} + \mathit{tb}_{n+1} {}\\ & =& s\sum _{j=m}^{n}a_{ j} + t\sum _{j=m}^{n}b_{ j} + \mathit{sa}_{n+1} + \mathit{tb}_{n+1} {}\\ & =& s\left (\sum _{j=m}^{n}a_{ j} + a_{n+1}\right ) + t\left (\sum _{j=m}^{n}b_{ j} + b_{n+1}\right ) {}\\ & =& s\sum _{j=m}^{n+1}a_{ j} + t\sum _{j=m}^{n+1}b_{ j}, {}\\ \end{array}$$

and the proof by induction is complete. □ 

For example, by taking t = 0, we obtain the distributive law:

$$\displaystyle{ \sum _{j=m}^{n}\mathit{sa}_{ j} = s\sum _{j=m}^{n}a_{ j}. }$$
(1.17)

Similarly, putting \(s = t = 1\), we find that

$$\displaystyle{\sum _{j=m}^{n}(a_{ j} + b_{j}) =\sum _{ j=m}^{n}a_{ j} +\sum _{ j=m}^{n}b_{ j},}$$

and by setting s = 1 and \(t = -1\), we have

$$\displaystyle{\sum _{j=m}^{n}(a_{ j} - b_{j}) =\sum _{ j=m}^{n}a_{ j} -\sum _{j=m}^{n}b_{ j}.}$$

Thus if m ≤ n and \(a_{m},a_{m+1},\ldots,a_{n+1}\) are numbers, then

$$\displaystyle\begin{array}{rcl} \sum _{j=m}^{n}(a_{ j+1} - a_{j})& =& \sum _{j=m}^{n}a_{ j+1} -\sum _{j=m}^{n}a_{ j} {}\\ & =& \sum _{j=m+1}^{n+1}a_{ j} -\sum _{j=m}^{n}a_{ j} {}\\ & =& \sum _{j=m+1}^{n}a_{ j} + a_{n+1} -\left (a_{m} +\sum _{ j=m+1}^{n}a_{ j}\right ) {}\\ & =& a_{n+1} - a_{m}, {}\\ \end{array}$$

a result known as the telescoping property. We state it as a theorem.

Theorem 1.5.3.

Let m,n be integers such that m ≤ n, and let a j be a number for each j such that m ≤ j ≤ n + 1. Then

$$\displaystyle{\sum _{j=m}^{n}(a_{ j+1} - a_{j}) = a_{n+1} - a_{m}.}$$

This theorem in fact is intuitively clear, since the sum can be written as

$$\displaystyle{(a_{n+1} - a_{n}) + (a_{n} - a_{n-1}) + \cdots + (a_{m+1} - a_{m})}$$

and cancellation yields \(a_{n+1} - a_{m}\). Note also that

$$\displaystyle{\sum _{j=m}^{n}(a_{ j} - a_{j+1}) = -\sum _{j=m}^{n}(a_{ j+1} - a_{j}) = -(a_{n+1} - a_{m}) = a_{m} - a_{n+1}.}$$

As an example, if a j  = j for each j, we obtain

$$\displaystyle{\sum _{j=m}^{n}1 =\sum _{ j=m}^{n}(j + 1 - j) = n + 1 - m}$$

by the telescoping property. In particular, if m = 1 and n is a positive integer, then

$$\displaystyle{\sum _{j=1}^{n}1 = n + 1 - 1 = n,}$$

as expected, because we are simply adding n copies of the number 1.

For a less trivial application of the telescoping property, let us evaluate

$$\displaystyle{\sum _{j=1}^{n}(2j + 1).}$$

Note first that

$$\displaystyle\begin{array}{rcl} (j + 1)^{2} - j^{2}& =& j^{2} + 2j + 1 - j^{2} {}\\ & =& 2j + 1. {}\\ \end{array}$$

In the telescoping property we therefore take a j  = j 2 for each j and thereby obtain

$$\displaystyle\begin{array}{rcl} \sum _{j=1}^{n}(2j + 1)& =& \sum _{ j=1}^{n}((j + 1)^{2} - j^{2}) {}\\ & =& (n + 1)^{2} - 1 {}\\ & =& n^{2} + 2n. {}\\ \end{array}$$

But we also have

$$\displaystyle\begin{array}{rcl} \sum _{j=1}^{n}(2j + 1)& =& 2\sum _{ j=1}^{n}j +\sum _{ j=1}^{n}1 {}\\ & =& 2\sum _{j=1}^{n}j + n, {}\\ \end{array}$$

and so

$$\displaystyle\begin{array}{rcl} 2\sum _{j=1}^{n}j& =& \sum _{ j=1}^{n}(2j + 1) - n {}\\ & =& n^{2} + 2n - n {}\\ & =& n^{2} + n. {}\\ \end{array}$$

Hence we obtain the following theorem.

Theorem 1.5.4.

For every positive integer n

$$\displaystyle{\sum _{j=1}^{n}j = \frac{n(n + 1)} {2}.}$$

A similar argument can be used to show that

$$\displaystyle{\sum _{j=1}^{n}j^{2} = \frac{n(n + 1)(2n + 1)} {6}.}$$

The next theorem also illustrates the telescoping property in action.

Theorem 1.5.5.

If n is a nonnegative integer and a ≠ b, then

$$\displaystyle{\sum _{j=0}^{n}a^{j}b^{n-j} = \frac{a^{n+1} - b^{n+1}} {a - b} = \frac{b^{n+1} - a^{n+1}} {b - a},}$$

where the convention that 0 0 = 1 is used.

Proof.

By distributivity and the telescoping property, we have

$$\displaystyle\begin{array}{rcl} (a - b)\sum _{j=0}^{n}a^{j}b^{n-j}& =& \sum _{ j=0}^{n}(a^{j+1}b^{n-j} - a^{j}b^{n-j+1}) {}\\ & =& a^{n+1} - b^{n+1}, {}\\ \end{array}$$

and the result follows. □ 

Putting b = 1, we draw the following conclusion.

Corollary 1.5.6.

If a ≠ 1, then

$$\displaystyle{\sum _{j=0}^{n}a^{j} = \frac{a^{n+1} - 1} {a - 1} = \frac{1 - a^{n+1}} {1 - a},}$$

where 0 0 = 1.

There is one further theorem concerning finite sums that we include in this section. It is called the binomial theorem and gives a formula for (a + b)n for every nonnegative integer n. It furnishes another example of a proof by induction.

First we introduce some new notation. If n and r are integers such that 0 ≤ r ≤ n, then we define

$$\displaystyle\begin{array}{rcl}{ n\choose r}& =& \frac{n!} {r!(n - r)!} {}\\ & =& \frac{n(n - 1)\cdots (n - r + 1)} {r!}. {}\\ \end{array}$$

For example,

$$\displaystyle{{n\choose 0} = \frac{n!} {0!n!} = 1.}$$

Similarly,

$$\displaystyle{{n\choose n} = 1.}$$

Because of their role in the binomial theorem, these numbers are often called the binomial coefficients. They satisfy the following lemma.

Lemma 1.5.7.

If n and r are positive integers and r ≤ n, then

$$\displaystyle{{n + 1\choose r} ={ n\choose r} +{ n\choose r - 1}.}$$

Proof.

By direct calculation, we have

$$\displaystyle\begin{array}{rcl}{ n\choose r} +{ n\choose r - 1}& =& \frac{n!} {r!(n - r)!} + \frac{n!} {(r - 1)!(n - r + 1)!} {}\\ & =& \frac{n!(n - r + 1 + r)} {r!(n - r + 1)!} {}\\ & =& \frac{(n + 1)!} {r!(n - r + 1)!} {}\\ & =&{ n + 1\choose r}. {}\\ \end{array}$$

 □ 

Theorem 1.5.8 (Binomial Theorem).

Let a and b be numbers and n a nonnegative integer. Then

$$\displaystyle{ (a + b)^{n} =\sum _{ j=0}^{n}{n\choose j}a^{j}b^{n-j}, }$$
(1.18)

where the convention that 0 0 = 1 is used.

Proof.

Both sides are equal to 1 if n = 0. Assume that the theorem holds for some integer n ≥ 0. Then

$$\displaystyle\begin{array}{rcl} (a + b)^{n+1}& =& (a + b)(a + b)^{n} {}\\ & =& (a + b)\sum _{j=0}^{n}{n\choose j}a^{j}b^{n-j} {}\\ & =& \sum _{j=0}^{n}{n\choose j}a^{j+1}b^{n-j} +\sum _{ j=0}^{n}{n\choose j}a^{j}b^{n-j+1} {}\\ & =& \sum _{j=1}^{n+1}{n\choose j - 1}a^{j}b^{n-j+1} +\sum _{ j=0}^{n}{n\choose j}a^{j}b^{n-j+1} {}\\ & =& \sum _{j=1}^{n}{n\choose j - 1}a^{j}b^{n+1-j} + a^{n+1} + b^{n+1} +\sum _{ j=1}^{n}{n\choose j}a^{j}b^{n+1-j} {}\\ & =& b^{n+1} +\sum _{ j=1}^{n}\left ({n\choose j - 1} +{ n\choose j}\right )a^{j}b^{n+1-j} + a^{n+1} {}\\ & =& b^{n+1} +\sum _{ j=1}^{n}{n + 1\choose j}a^{j}b^{n+1-j} + a^{n+1} {}\\ & =& \sum _{j=0}^{n+1}{n + 1\choose j}a^{j}b^{n+1-j}. {}\\ \end{array}$$

The result follows by induction. □ 

The sum on the right-hand side of Eq. (1.18) is often referred to as the binomial expansion of (a + b)n. Note that the result of Example 1.3.1 can easily be deduced by considering three terms of the binomial expansion with a = 1 and b = x.

Exercises 1.3.

  1. 1.

    Use induction to prove the following formulas for all positive integers n:

    1. (a)

      \(\sum _{j=1}^{n}j = \frac{n(n+1)} {2}\);

    2. (b)

      \(\sum _{j=1}^{n}j^{2} = \frac{n(n+1)(2n+1)} {6}\);

    3. (c)

      \(\sum _{j=1}^{n} \frac{1} {j(j+1)} = \frac{n} {n+1}\);

    4. (d)

      \(\sum _{j=1}^{n} \frac{j} {(j+1)!} = 1 - \frac{1} {(n+1)!}\);

    5. (e)

      \(\sum _{j=1}^{n}j^{3} = \left (\sum _{j=1}^{n}j\right )^{2}\);

    6. (f)

      \(\sum _{j=1}^{n}j^{4} = \frac{n(n+1)(2n+1)(3n^{2}+3n-1)} {30}\);

    7. (g)

      \(\sum _{j=1}^{n-1}\mathit{jx}^{j} = \frac{x-\mathit{nx}^{n}+(n-1)x^{n+1}} {1-x^{2}}\) for every real x ≠ ± 1.

  2. 2.

    Let \(n \in \mathbb{N}\) and let a 1, a 2, , a n be real numbers such that 0 ≤ a j  ≤ 1 for all j. Prove that

    $$\displaystyle{\prod _{j=1}^{n}(1 - a_{ j}) \geq 1 -\sum _{j=1}^{n}a_{ j}.}$$
  3. 3.

    Discover and prove a theorem about the relative sizes of 3n and n! , where n is a positive integer.

  4. 4.

    A function \(f: \mathbb{R} \rightarrow \mathbb{R}\) is said to be convex if for all nonnegative real numbers α 1 and α 2 with sum equal to 1 we have

    $$\displaystyle{f(\alpha _{1}x_{1} +\alpha _{2}x_{2}) \leq \alpha _{1}f(x_{1}) +\alpha _{2}f(x_{2})}$$

    for each \(x_{1} \in \mathbb{R}\) and \(x_{2} \in \mathbb{R}\). Prove that

    $$\displaystyle{f\left (\sum _{j=1}^{n}\alpha _{ j}x_{j}\right ) \leq \sum _{j=1}^{n}\alpha _{ j}f(x_{j})}$$

    whenever n is an integer greater than 1 and α 1, α 2, , α n are nonnegative real numbers with sum 1. This result is known as Jensen’s inequality. More generally, show that

    $$\displaystyle{f\left (\sum _{j=1}^{n}(\alpha _{ j}x_{j})\bigg/\sum _{k=1}^{n}\alpha _{ k}\right ) \leq \sum _{j=1}^{n}(\alpha _{ j}f(x_{j}))\bigg/\sum _{k=1}^{n}\alpha _{ k}.}$$
  5. 5.

    Use the telescoping property to prove the following identities for all \(n \in \mathbb{N}\):

    1. (a)

      \(\sum _{j=1}^{n} \frac{1} {j(j+2)} = \frac{3} {4} -\frac{1} {2}\left ( \frac{1} {n+1} + \frac{1} {n+2}\right )\);

    2. (b)

      \(\sum _{j=1}^{n} \frac{1} {\sqrt{j}+\sqrt{j+1}} = -1 + \sqrt{n + 1}\);

    3. (c)

      \(\sum _{j=1}^{n}j \cdot j! = (n + 1)! - 1\);

    4. (d)

      \(\sum _{j=1}^{n} \frac{6^{j}} {(3^{j+1}-2^{j+1})(3^{j}-2^{j})} = 3 - \frac{3^{n+1}} {3^{n+1}-2^{n+1}}\);

    5. (e)

      \(\sum _{j=1}^{n} \frac{1} {j^{2}+4j+3} = \frac{5} {12} -\frac{1} {2}\left ( \frac{1} {n+2} - \frac{1} {n+3}\right )\);

    6. (f)

      \(\sum _{j=1}^{n} \frac{1} {(j+1)\sqrt{j}+j\sqrt{j+1}} = 1 - \frac{1} {\sqrt{n+1}}\);

    7. (g)

      \(\sum _{j=1}^{n} \frac{j} {j^{4}+j^{2}+1} = \frac{1} {2} - \frac{1} {2n(n+1)+2}\);

    8. (h)

      \(\sum _{j=1}^{n} \frac{j} {4j^{4}+1} = \frac{1} {4} - \frac{1} {8n^{2}+8n+4}\).

  6. 6.

    Let d and a 1 be numbers. Define a j for each integer j > 1 by the equation \(a_{j+1} = a_{j} + d\), and suppose that a j ≠ 0 for all \(j \in \mathbb{N}\). Find the sum

    $$\displaystyle{\sum _{j=1}^{n} \frac{1} {a_{j}a_{j+1}}}$$

    for all \(n \in \mathbb{N}\).

  7. 7.

    Prove that

    $$\displaystyle{\sum _{j=1}^{n}j!(j^{2} + j + 1) = n!(n + 1)^{2} - 1}$$

    for all \(n \in \mathbb{N}\).

  8. 8.

    Evaluate the sum

    $$\displaystyle{\sum _{j=3}^{n} \frac{1} {j^{2} - 4}}$$

    for all integers n ≥ 3.

  9. 9.

    Suppose that | na n  | ≤ M for all n ≥ 0. Show that

    $$\displaystyle{S_{n} -\sigma _{n} = \frac{1} {n + 1}\sum _{j=1}^{n}ja_{ j} \leq M}$$

    for all n, where \(S_{n} =\sum _{ j=0}^{n}a_{j}\) and \(\sigma _{n} =\sum _{ j=0}^{n}S_{j}\) for all n.