This chapter delves into the fundamental mathematical structures of normed linear spaces and inner product spaces. Normed spaces are vector spaces equipped with a norm function that quantifies the magnitude, or length, of a vector. Several examples, such as Euclidean space with the familiar Euclidean norm, illustrate the use of normed spaces. Building on this, inner product spaces are investigated, which refine normed spaces by adding an inner product that generalizes the dot product. Euclidean space is again an example, where the inner product characterizes orthogonality and angle measurements. The chapter then develops the importance of orthogonality in inner product spaces, providing insight into geometric relationships and applications in a variety of domains. The Gram–Schmidt orthogonalization technique is introduced, which provides a mechanism for constructing orthogonal bases from arbitrary bases of an inner product space. The concepts of orthogonal complement and projection onto subspaces broaden this understanding by demonstrating the geometric interpretation and practical application of these fundamental constructs. Proficiency in these topics is essential for advanced mathematical study and for a wide range of real-world applications.

1 Normed Linear Spaces

 

In this section, we will introduce a metric structure called a norm on a vector space and then study the resulting space in detail. A vector space with a norm defined on it is called a normed linear space. A norm, which intuitively measures the magnitude or size of a vector, enables the definition of distance and convergence. Normed spaces provide a versatile framework for various mathematical and scientific applications, deepening our understanding of vector spaces and accommodating numerous norm functions to meet various needs. Let us start with the following definition.

Definition 5.1

(Normed linear space) Let V be a vector space over the field \(\mathbb {K}\), where \(\mathbb {K}\) is either \(\mathbb {R}\) or \(\mathbb {C}\). A norm on V is a real-valued function \(\left\| .\right\| :V \rightarrow \mathbb {R}\) satisfying the following three conditions for all \(u,v \in V\) and \(\lambda \in \mathbb {K}\):

  1. (N1)

    \(\left\| v \right\| \ge 0\), and \(\left\| v \right\| =0\) if and only if \(v=0\)

  2. (N2)

    \(\left\| \lambda v \right\| = |\lambda |\left\| v \right\| \)

  3. (N3)

\(\left\| u+v \right\| \le \left\| u \right\| + \left\| v \right\| \). (Triangle Inequality)

Then V together with a norm defined on it, denoted by \((V,\left\| .\right\| )\), is called a normed linear space.

Example 5.1

Consider the vector space \(\mathbb {R}\) over \(\mathbb {R}\). Define \(\left\| v \right\| _{0} = |v |\) for \(v \in \mathbb {R}\). Then by the properties of modulus function, \(\left\| . \right\| _{0}\) is a norm on \(\mathbb {R}\).

Example 5.2

Consider the vector space \(\mathbb {R}^n\) over \(\mathbb {R}\). For \(v=(v_1,v_2,\ldots ,v_n)\) in \(\mathbb {R}^n\), define \(\left\| v \right\| _2 = \left( \sum _{i=1}^n |v_i |^2 \right) ^{\frac{1}{2}} \). This norm is called the 2-norm.

  1. (N1)

    Clearly \(\left\| v \right\| _2 = \left( \sum _{i=1}^n |v_i |^2 \right) ^{\frac{1}{2}} \ge 0\) and \(\left\| v \right\| _2 = \left( \sum _{i=1}^n |v_i |^2 \right) ^{\frac{1}{2}} =0 \Leftrightarrow \)

    \(|v_i |^2=0 \) for all \(i=1,2, \ldots ,n \Leftrightarrow v=0\).

  2. (N2)

    For \(\lambda \in \mathbb {R}\) and \(v \in \mathbb {R}^n\),

    $$\begin{aligned} \left\| \lambda v \right\| _2 = \left( \sum _{i=1}^n |\lambda v_i |^2 \right) ^{\frac{1}{2}} = \left( \sum _{i=1}^n |\lambda |^2 |v_i |^2 \right) ^{\frac{1}{2}}=\left( |\lambda |^2 \sum _{i=1}^n |v_i |^2 \right) ^{\frac{1}{2}}\\ = |\lambda |\left( \sum _{i=1}^n |v_i |^2 \right) ^{\frac{1}{2}} = |\lambda |\left\| v \right\| _2 \end{aligned}$$
  3. (N3)

For \(u,v \in \mathbb {R}^n\), by the Cauchy–Schwarz inequality,

    $$\begin{aligned} \sum _{i=1}^n \left( |u_i |+ |v_i |\right) ^2 &= \sum _{i=1}^n \left( |u_i |+ |v_i |\right) \left( |u_i |+ |v_i |\right) \\ &= \sum _{i=1}^n |u_i |\left( |u_i |+ |v_i |\right) + \sum _{i=1}^n |v_i |\left( |u_i |+ |v_i |\right) \\ & \le \left( \sum _{i=1}^n |u_i |^2 \right) ^ {\frac{1}{2}} \left( \sum _{i=1}^n \left( |u_i |+ |v_i |\right) ^2 \right) ^ {\frac{1}{2}} \\ {} &+\left( \sum _{i=1}^n |v_i |^2 \right) ^ {\frac{1}{2}} \left( \sum _{i=1}^n \left( |u_i |+ |v_i |\right) ^2 \right) ^ {\frac{1}{2}} \\ &= \left( \sum _{i=1}^n \left( |u_i |+ |v_i |\right) ^2 \right) ^ {\frac{1}{2}} \left[ \left( \sum _{i=1}^n |u_i |^2 \right) ^ {\frac{1}{2}} +\left( \sum _{i=1}^n |v_i |^2 \right) ^ {\frac{1}{2}} \right] \end{aligned}$$

    which implies

    $$\begin{aligned}\left( \sum _{i=1}^n \left( |u_i |+ |v_i |\right) ^2 \right) ^ {\frac{1}{2}} \le \left( \sum _{i=1}^n |u_i |^2 \right) ^ {\frac{1}{2}} +\left( \sum _{i=1}^n |v_i |^2 \right) ^ {\frac{1}{2}}\end{aligned}$$

    Since \( |u_i +v_i |\le |u_i |+ |v_i |\), we have

    $$\begin{aligned}\left( \sum _{i=1}^n |u_i + v_i |^2 \right) ^ {\frac{1}{2}} \le \left( \sum _{i=1}^n |u_i |^2 \right) ^ {\frac{1}{2}} +\left( \sum _{i=1}^n |v_i |^2 \right) ^ {\frac{1}{2}}\end{aligned}$$

Therefore \(\mathbb {R}^n\) is a normed linear space with respect to the 2-norm. In general, \(\mathbb {R}^n\) is a normed linear space with respect to the p-norm defined by \(\left\| v \right\| _p = \left( \sum _{i=1}^n |v_i |^p \right) ^{\frac{1}{p}}\), \(p\ge 1\) (verify!).
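These verifications are easy to check numerically. Here is a minimal Python sketch (using NumPy; the helper name p_norm is our own, not a library routine) that evaluates the p-norm and spot-checks the triangle inequality (N3) on random vectors.

```python
import numpy as np

def p_norm(v, p):
    """Return the p-norm (sum_i |v_i|^p)^(1/p) of a vector v, for p >= 1."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)

for p in [1, 2, 3, 10]:
    lhs = p_norm(u + v, p)             # ||u + v||_p
    rhs = p_norm(u, p) + p_norm(v, p)  # ||u||_p + ||v||_p
    assert lhs <= rhs + 1e-12          # (N3), up to floating-point tolerance
    print(f"p = {p}: {lhs:.4f} <= {rhs:.4f}")
```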

Example 5.3

Consider the vector space \(\mathbb {R}^n\) over \(\mathbb {R}\). For \(v=(v_1,v_2,\ldots ,v_n)\) in \(\mathbb {R}^n\), define \(\left\| v \right\| _{\infty } = \max \left\{ |v_1 |, |v_2 |, \ldots ,|v_n |\right\} =\max \limits _{i \in \lbrace 1,\ldots ,n \rbrace } \{ |v_i |\} \). This norm is called the infinity norm.

Example 5.4

Let \(V=\mathcal {C}[a,b]\), the space of continuous real-valued functions on [a, b]. For \(f \in V\), define \(\left\| f \right\| = \max \limits _{x \in [a,b]} |f(x) |\). This norm is called the supremum norm.

  1. (N1)

    Clearly \(\left\| f \right\| = \max \limits _{x \in [a,b]} |f(x) |\ge 0 \). Also, \(\left\| f \right\| = \max \limits _{x \in [a,b]} |f(x) |= 0 \Leftrightarrow \) \(|f(x) |= 0\) for all \(x \in [a,b]\Leftrightarrow f(x)=0\) for all \(x \in [a,b]\).

  2. (N2)

    For \(\lambda \in \mathbb {R}\) and \(f \in \mathcal {C}[a,b]\),

    $$\begin{aligned} \left\| \lambda f \right\| &= \max \limits _{x \in [a,b]} |(\lambda f)(x) |=\max \limits _{x \in [a,b]} |\lambda \left( f(x)\right) |= \max \limits _{x \in [a,b]} |\lambda ||f(x) |= |\lambda |\max \limits _{x \in [a,b]} |f(x) |= |\lambda |\left\| f \right\| \end{aligned}$$
  3. (N3)

    Since \( |a+b |\le |a |+ |b |\), for \(f,g \in \mathcal {C}[a,b]\) we have

    $$\begin{aligned} \left\| f+g \right\| &= \max \limits _{x \in [a,b]} |(f+g)(x) |= \max \limits _{x \in [a,b]} |f(x)+g(x) |\le \max \limits _{x \in [a,b]} |f(x) |+\max \limits _{x \in [a,b]} |g(x) |=\left\| f \right\| + \left\| g \right\| \end{aligned}$$

Then \(\mathcal {C}[a,b]\) is a normed linear space with the supremum norm (Fig. 5.1).

Fig. 5.1

Consider the functions \(f(x)=x^2\) and \(g(x)=\cos x\) in \(\mathcal {C}[-4,4]\). Then \(\left\| f \right\| = \max \limits _{x \in [-4,4]} |x^2 |=16\) and \(\left\| g \right\| = \max \limits _{x \in [-4,4]} |\cos x |=1\).

We have shown that \(\left\| f \right\| = \max \limits _{x \in [a,b]} |f(x) |\) defines a norm on \(\mathcal {C}[a,b]\). Now let us define \(\left\| f \right\| = \min \limits _{x \in [a,b]} |f(x) |\). Does this function define a norm on \(\mathcal {C}[a,b]\)? No, it doesn’t! We can observe that \(\left\| f \right\| =0\) does not imply \(f=0\). For example, consider the function \(f(x)=x^2\) in \(\mathcal {C}[-4,4]\). Then \(\left\| f \right\| = \min \limits _{x \in [-4,4]} |f(x) |=0\), but \(f\ne 0\). As (N1) is violated, \(\left\| f \right\| = \min \limits _{x \in [-4,4]} |f(x) |\) does not define a norm on \(\mathcal {C}[-4,4]\).

Definition 5.2

(Subspace) Let \(\left( V,\left\| . \right\| \right) \) be a normed linear space. A subspace of V is a vector subspace W of V with the same norm as that of V. The norm on W is said to be induced by the norm on V.

Example 5.5

Consider \(\mathcal {C}[a,b]\) with the supremum norm; then \(\mathbb {P}[a,b]\), the space of polynomials on [a, b], is a subspace of \(\mathcal {C}[a,b]\) with the supremum norm as the induced norm.

We will now show that every normed linear space is a metric space. Consider the following theorem.

Theorem 5.1

Let \(\left( V, \left\| . \right\| \right) \) be a normed linear space. Then \(d(v_1,v_2)= \left\| v_1-v_2 \right\| \) is a metric on V.

Proof

Let \(v_1,v_2,v_3 \in V\). Then

  1. (M1)

    By (N1), we have

    $$\begin{aligned}d(v_1,v_2)=\left\| v_1-v_2 \right\| \ge 0\end{aligned}$$

    and

    $$\begin{aligned}d(v_1,v_2)= \left\| v_1-v_2 \right\| =0 \Leftrightarrow v_1-v_2=0\Leftrightarrow v_1=v_2\end{aligned}$$
  2. (M2)

    By (N2), we have

    $$\begin{aligned}d(v_1,v_2)= \left\| v_1-v_2 \right\| = \left\| v_2-v_1 \right\| =d(v_2,v_1) \end{aligned}$$
  3. (M3)

    Now we have to prove the triangle inequality.

$$\begin{aligned} d(v_1,v_2)&=\left\| v_1-v_2 \right\| \\ &=\left\| v_1-v_3+v_3-v_2 \right\| \\ &\le \left\| v_1-v_3 \right\| + \left\| v_3-v_2 \right\| \ \ \ \left( \text {by (N3)} \right) \\ &= d(v_1,v_3) + d(v_3,v_2) \end{aligned}$$

The metric defined in the above theorem is called the metric induced by the norm. The above theorem implies that every normed linear space is a metric space with respect to the induced metric. Is the converse true? Consider the following example.

Example 5.6

In Example 1.25, we have seen that for any non-empty set X, the function d defined by

$$\begin{aligned}d(x,y)={\left\{ \begin{array}{ll} 1\ , \ \ x \ne y \\ 0\ , \ \ x =y \end{array}\right. }\end{aligned}$$

defines a metric on X. Let V be a vector space over the field \(\mathbb {K}\). Clearly (V, d) is a metric space. If this metric were induced by a norm as in Theorem 5.1, we would have

$$\begin{aligned}\left\| v \right\| =d(v,0)= {\left\{ \begin{array}{ll} 1\ , \ \ v \ne 0 \\ 0\ , \ \ v =0 \end{array}\right. }\end{aligned}$$

Observe that, for any \(\lambda \in \mathbb {K}\) with \(\lambda \ne 0\) and \(|\lambda |\ne 1\),

$$\begin{aligned}\left\| \lambda v \right\| = {\left\{ \begin{array}{ll} 1\ , \ \ v \ne 0 \\ 0\ , \ \ v =0 \end{array}\right. } \ne |\lambda |\left\| v \right\| = {\left\{ \begin{array}{ll} |\lambda |\ , \ \ v \ne 0 \\ 0\ , \ \ v =0 \end{array}\right. } \end{aligned}$$

Hence the discrete metric cannot be obtained from any norm. Therefore, a metric space need not be a normed linear space.

Now that you have understood the link between normed spaces and metric spaces, let us discuss a bit more in detail about defining a distance notion on vector spaces. In Example 5.2, we have defined a number of norms on \(\mathbb {R}^n\). What is the significance of defining several norms on a vector space? Consider a simple example as depicted in Fig. 5.2.

Fig. 5.2

Suppose that you have to move a chess piece from A1 to D4 in the least number of moves. If the piece is a bishop, we can move it directly from A1 to D4. If the piece is a rook, we first have to move it either to A4 or to D1 and then to D4. Now, if the piece is the king, the least number of moves is 3 \((A1 \rightarrow B2 \rightarrow C3 \rightarrow D4)\). Observe that the paths chosen by the different pieces to move from A1 to D4 in the least number of moves are different. Now try to calculate the distance traveled by the piece in each of these cases. Are they the same? We need different notions of distance, right? Interestingly, the metric induced by the infinity norm, \(d(u,v)=\max _i \{ |u_i -v_i |\}\), is known as the chess distance or Chebyshev distance (in honor of the Russian mathematician Pafnuty Chebyshev (1821–1894)), as the Chebyshev distance between two squares on a chessboard gives the minimum number of moves required by the king to move between them.
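The king's move count is simply the Chebyshev distance between squares. Here is a small Python sketch (the square encoding 'A1', 'D4' is our own convention for illustration):

```python
def king_moves(sq1, sq2):
    """Minimum number of king moves between two squares, e.g. 'A1' and 'D4'."""
    file1, rank1 = ord(sq1[0]) - ord('A'), int(sq1[1:])
    file2, rank2 = ord(sq2[0]) - ord('A'), int(sq2[1:])
    # Chebyshev distance: the metric induced by the infinity norm
    return max(abs(file1 - file2), abs(rank1 - rank2))

print(king_moves('A1', 'D4'))  # 3, matching A1 -> B2 -> C3 -> D4
```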

In real life, many practical applications justify the significance of defining various notions of distance on vector spaces. Therefore, while dealing with a normed linear space, we choose the norm which meets our needs accordingly (Fig. 5.3).

Fig. 5.3

Consider \(\mathbb {R}^2\) with different norms defined on it. If we are using the 2-norm, the distance from the origin to the point (1, 2) is \(\sqrt{|1 |^2 + |2 |^2} = \sqrt{5} \), as it is the length of the hypotenuse of a right triangle with base 1 and height 2. If we are using the 1-norm, the distance is 3, the sum of the absolute values of the coordinates; and if we are using the infinity norm, the distance is 2, the maximum of the absolute values of the coordinates.

Now we understand that different norms on a vector space can give rise to different geometrical and analytical structures. Next, we will discuss whether these structures are related. As a prerequisite for the discussion, let us define the “fundamental sets” of a normed linear space.

Definition 5.3

(Open ball) Let \(\left( V, \left\| . \right\| \right) \) be a normed linear space. For any point \(v_0 \in V\) and \(\epsilon \in \mathbb {R}^{+}\),

$$\begin{aligned}B_{\epsilon }(v_0) = \left\{ v \in V \mid \ \left\| v - v_0 \right\| < \epsilon \right\} \end{aligned}$$

is called an open ball centered at \(v_0\) with radius \(\epsilon \). The set \( \left\{ v \in V \mid \ \left\| v \right\| = 1 \right\} \) is called the unit sphere in V.

We can see that this definition follows from Definition 1.23 of an open ball in a metric space.

Example 5.7

Consider \((\mathbb {R},\left\| . \right\| _0)\). In Example 1.26, we have seen that the open balls in \((\mathbb {R},\left\| . \right\| _0)\) are open intervals in the real line. Now, consider the set \(S=\lbrace (v_1,0) \mid v_1 \in \mathbb {R},1<v_1<4 \rbrace \) in \((\mathbb {R}^2,\left\| . \right\| _2)\). Is S an open ball in \((\mathbb {R}^2,\left\| . \right\| _2)\)? Is there a way to describe the open balls in \((\mathbb {R}^2,\left\| . \right\| _2)\) in general? Yes, there is! Take an arbitrary point \(w=(w_1,w_2) \in \mathbb {R}^2\) and \(\epsilon \in \mathbb {R}^+\). Then

$$\begin{aligned} B_{\epsilon }(w)&= \lbrace v=(v_1,v_2) \in \mathbb {R}^2 \mid \left\| v-w \right\| < \epsilon \rbrace \\ &= \lbrace v=(v_1,v_2) \in \mathbb {R}^2 \mid (v_1-w_1)^2+(v_2-w_2)^2 < \epsilon ^2 \rbrace \end{aligned}$$

That is, open balls in \((\mathbb {R}^2,\left\| . \right\| _2)\) are “open circles” (Fig. 5.4).

Fig. 5.4
(a) The open ball \(B_{\epsilon }(w)\): a dashed circle with center w. (b) The set S: an open segment on the horizontal axis.

The open balls in \((\mathbb {R}^2,\left\| . \right\| _2)\) are open circles as given in (a). Clearly, S is not an open ball in \((\mathbb {R}^2,\left\| . \right\| _2)\).

Example 5.8

Let us compute the open unit balls centered at the origin in \(\mathbb {R}^2\) with respect to the 1-norm, 2-norm and infinity norm. Let \(B_{\epsilon }^p\) denote the open ball of radius \(\epsilon \) centered at the origin in \(\left( \mathbb {R}^2,\left\| .\right\| _p \right) \). Then

$$\begin{aligned} B_{1}^1 &= \left\{ (v_1,v_2) \in \mathbb {R}^2 \mid |v_1 |+ |v_2 |< 1 \right\} \\ B_{1}^2 &= \left\{ (v_1,v_2) \in \mathbb {R}^2 \mid |v_1 |^2 + |v_2 |^2 < 1 \right\} \end{aligned}$$

and (Fig. 5.5)

$$\begin{aligned}B_{1}^{\infty } = \left\{ (v_1,v_2) \in \mathbb {R}^2 \mid \max \{|v_1 |, |v_2 |\} < 1 \right\} \end{aligned}$$
Fig. 5.5

Unit spheres in \(\mathbb {R}^2\) with respect to the 1-norm, 2-norm and infinity norm. Observe that the interior portion of each unit sphere represents the open unit ball \(B_1(0)=\{ v \in V \mid \ \left\| v \right\| < 1 \}\) in the corresponding norm.

Observe that the open balls in \(\mathbb {R}^2\) corresponding to different norms may not have the same shape even if the center and radius are the same. Now, let us give an example of an open ball in \(\mathcal {C}[-4,4]\) with the supremum norm (Fig. 5.6).

Fig. 5.6

Consider a function f in \(\mathcal {C}[-4,4]\) with the supremum norm. The continuous functions that lie between the dotted lines constitute \(B_1(f)= \lbrace g \in \mathcal {C}[-4,4] \mid \ \left\| f-g \right\| <1 \rbrace \).
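To make this concrete, here is a small Python sketch that approximates the supremum norm on a fine grid (an approximation of the true maximum, with \(f(x)=\cos x\) chosen for illustration) and tests membership in \(B_1(f)\).

```python
import numpy as np

x = np.linspace(-4.0, 4.0, 10001)   # fine grid on [-4, 4]
f = np.cos(x)

def sup_dist(g):
    """Grid approximation of the supremum norm ||f - g||."""
    return np.max(np.abs(f - g))

print(sup_dist(np.cos(x) + 0.5) < 1)  # True: cos(x) + 0.5 lies in B_1(f)
print(sup_dist(np.cos(x) + 1.5) < 1)  # False: cos(x) + 1.5 does not
```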

Earlier, we posed a question: is there any link between the topologies generated by the different norms defined on a vector space? It is interesting to note that all norms on a finite-dimensional space generate the same topology. That is, the open sets defined by these norms are topologically the same. The following figure illustrates this idea by taking the open balls in \(\mathbb {R}^2\) generated by the infinity norm and the 2-norm as an example (Fig. 5.7).

Fig. 5.7

Clearly, we can observe that every point in an open ball generated by the infinity norm is inside an open ball generated by the 2-norm, and vice versa.

Now we will prove algebraically that, in a finite-dimensional space, the open sets generated by any norms are topologically the same. For that, we need the following definition.

Definition 5.4

(Equivalence of norms) A norm \(\left\| . \right\| \) on a vector space V is equivalent to a norm \(\left\| . \right\| _0 \) on V if there exist positive scalars \(\lambda \) and \(\mu \) such that for all \(v \in V\), we have

$$\begin{aligned} \lambda \left\| v \right\| _0 \le \left\| v \right\| \le \mu \left\| v \right\| _0 \end{aligned}$$

Example 5.9

Let us consider the 1-norm, 2-norm and infinity norm in \(\mathbb {R}^n\). For any element \(v=(v_1,v_2,\ldots ,v_n) \in \mathbb {R}^n\), we have

$$\begin{aligned}\left\| v \right\| _{\infty }= \max \limits _{ i \in \{ 1,2, \ldots , n \}} \{ |v_{i} |\} \le |v_{1} |+ |v_{2} |+ \cdots + |v_{n} |= \left\| v \right\| _1 \end{aligned}$$

Also, by Hölder’s inequality (Exercise 5, Chap. 1), we have

$$\begin{aligned}\left\| v \right\| _1 = \sum _{i=1}^n |v_i |= \sum _{i=1}^n |v_i |.1 \le \left( \sum _{i=1}^{n}|v_{i} |^2\right) ^{\frac{1}{2}}\left( \sum _{i=1}^n 1^2 \right) ^{\frac{1}{2}} =\sqrt{n}\left\| v \right\| _2\end{aligned}$$

and finally,

$$\begin{aligned}\left\| v \right\| _2 = \left( \sum _{i=1}^{n}|v_{i} |^2\right) ^{\frac{1}{2}} \le \left( \sum _{i=1}^{n}\left( \max \limits _{j \in \{ 1,2, \ldots , n \}} \{ |v_{j} |\} \right) ^2\right) ^{\frac{1}{2}} = \left( n \left\| v \right\| _{\infty }^2 \right) ^{\frac{1}{2}}=\sqrt{n}\left\| v \right\| _{\infty } \end{aligned}$$

Thus the 1-norm, 2-norm and infinity norm on \(\mathbb {R}^n\) are equivalent.
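A quick numerical spot-check of the three inequalities derived above (a Python sketch over random vectors; evidence, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 7
for _ in range(1000):
    v = rng.standard_normal(n)
    inf_norm = np.max(np.abs(v))        # ||v||_inf
    one_norm = np.sum(np.abs(v))        # ||v||_1
    two_norm = np.sqrt(np.sum(v ** 2))  # ||v||_2
    assert inf_norm <= one_norm + 1e-12
    assert one_norm <= np.sqrt(n) * two_norm + 1e-12
    assert two_norm <= np.sqrt(n) * inf_norm + 1e-12
print("all three inequalities hold on the samples")
```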

In fact, we can prove that any two norms on a finite-dimensional space are equivalent. But this is not the case if the space is infinite-dimensional. Consider the following example.

Example 5.10

Consider the linear space \(\mathcal {C}[0,1]\) over the field \(\mathbb {R}\). In Example 5.4, we have seen that \(\left\| f \right\| = \max \limits _{x \in [0,1]} |f(x) |\) defines a norm on \(\mathcal {C}[0,1]\), called the supremum norm. Also, we can show that \(\left\| f \right\| _1= \int _0^1 |f(x) |dx\) defines a norm on \(\mathcal {C}[0,1]\) (Verify!). We will show that there doesn’t exist any scalar \(\lambda \) such that \(\left\| f \right\| \le \lambda \left\| f \right\| _1\) for all \(f \in \mathcal {C}[0,1]\). Consider the functions \(f_n\) defined as in Fig. 5.8. Then we can observe that \(\left\| f_n \right\| =1\) and \( \left\| f_n \right\| _1 = \frac{1}{2n}\) (How?). Clearly, there doesn’t exist any scalar \(\lambda \) such that \(1 \le \frac{\lambda }{2n}\) for all n.

Fig. 5.8
The graph of \(f_n\): a line segment from \((0, 1)\) down to \(\left( \frac{1}{n}, 0\right) \), after which \(f_n\) remains 0 up to \(x=1\).

Define \(f_n\) as shown in the figure. Clearly \(f_n\) belongs to \(\mathcal {C}[0,1]\) for all n.
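The two norms of \(f_n\) can be checked numerically; a crude Python sketch (grid-based Riemann sum, assuming the spike-shaped \(f_n\) shown in the figure):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
dx = x[1] - x[0]

for n in [1, 10, 100]:
    f = np.maximum(1.0 - n * x, 0.0)   # f_n(x) = 1 - nx on [0, 1/n], 0 afterwards
    sup_norm = np.max(np.abs(f))       # supremum norm: exactly 1
    one_norm = np.sum(np.abs(f)) * dx  # Riemann approximation of ||f_n||_1
    print(n, sup_norm, one_norm, 1 / (2 * n))  # one_norm is close to 1/(2n)
```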

We have discussed the equivalence of norms in terms of defining topologically identical open sets. This can also be discussed in terms of sequences. In Chap. 1, we have seen that the addition of metric structure to an arbitrary set enables us to discuss the convergence or divergence of sequences, limit and continuity of functions, etc., in detail. The same happens with normed linear spaces also. The difference is that we are adding the metric structure not just to any set, but a vector space. All these notions can be discussed in terms of induced metric as well as norm. We will start by defining a Cauchy sequence in a normed linear space.

Definition 5.5

(Cauchy Sequence) A sequence \(\lbrace v_n \rbrace \) in a normed linear space \(\left( V,\left\| . \right\| \right) \) is said to be Cauchy if for every \(\epsilon > 0\) there exists an \(N_{\epsilon } \in \mathbb {N}\) such that \(\left\| v_n - v_m \right\| <\epsilon \) for all \(m,n >N_{\epsilon }\).

Definition 5.6

(Convergence) Let \(\lbrace v_n \rbrace \) be a sequence in \(\left( V,\left\| . \right\| \right) \), then \(v_n \rightarrow v\) in V if and only if \(\left\| v_n-v \right\| \rightarrow 0\) as \(n \rightarrow \infty \).

In Chap. 1, we have seen that in a metric space every Cauchy sequence need not be convergent. Now an important question pops up: is every Cauchy sequence in a normed linear space convergent? The following example gives us an answer.

Example 5.11

Consider the normed linear space \(\mathbb {P}[0,1]\) over \(\mathbb {R}\) with the supremum norm. Consider the sequence \(\left\{ p_n(x) \right\} \), where

$$\begin{aligned}p_n(x)=1+\frac{x}{1!}+\frac{x^2}{2!}+\cdots + \frac{x^n}{n!}\end{aligned}$$

Is the sequence convergent? If so, is the limit function a polynomial? Clearly not! We know that \(p_n(x) \rightarrow e^x\), \(x\in [0,1]\) (Verify!), and \(e^x\) is not a polynomial. Is it the only sequence in \(\mathbb {P}[0,1]\) over \(\mathbb {R}\) that converges to a function which is not a polynomial? Let us consider another sequence \(\left\{ q_n(x) \right\} \), where

$$\begin{aligned}q_n(x)=1+\frac{x}{2}+\frac{x^2}{4}+\cdots + \frac{x^n}{2^n}\end{aligned}$$

First we will prove that \(\left\{ q_n \right\} \) is a Cauchy sequence. For \(n>m\),

$$\begin{aligned} \left\| q_n(x) -q_m(x)\right\| &= \max \limits _{x \in [0,1]}\left| \sum _{i=0}^{n} \frac{x^i}{2^i} -\sum _{i=0}^{m} \frac{x^i}{2^i} \right| \\ &= \max \limits _{x \in [0,1]}\left| \sum _{i=m+1}^{n} \frac{x^i}{2^i} \right| \\ &\le \frac{1}{2^{m}} \end{aligned}$$

which shows that \(\left\{ q_n(x) \right\} \) is a Cauchy sequence. Now for any \(x \in [0,1]\), we have \(q_n(x) \rightarrow q(x)\) as \(n \rightarrow \infty \), where \(q(x)=\dfrac{1}{1-\frac{x}{2}}\) (How?), and clearly \(q \notin \mathbb {P}[0,1]\) as it is not a polynomial function. Hence \(\left\{ q_n(x) \right\} \) is not convergent in \(\mathbb {P}[0,1]\). What about \(\mathbb {P}_n[0,1]\)? Is it complete?

Here is another example of an incomplete normed linear space.

Example 5.12

Consider \(\mathcal {C}[0,1]\) with \(\left\| f \right\| = \int _0^1 |f(x)|dx\) for \(f \in \mathcal {C}[0,1]\). Consider the sequence of functions \(f_n \in \mathcal {C}[0,1]\) where

$$\begin{aligned}f_n(x)={\left\{ \begin{array}{ll} nx,\ x\in \left[ 0,\frac{1}{n} \right] \\ 1,\ \ x\in \left[ \frac{1}{n},1 \right] \end{array}\right. }\end{aligned}$$

We will show that \(\lbrace f_n \rbrace \) is Cauchy but not convergent (Fig. 5.9).

Fig. 5.9

As \(\left\| f_n -f_m \right\| \) is the area of the triangle depicted in the figure, it is easy to observe that \(\{f_n \}\) is Cauchy.

For \(n>m\),

$$\begin{aligned}|f_n(x) - f_m(x) |= {\left\{ \begin{array}{ll} nx-mx,\ x\in \left[ 0,\frac{1}{n} \right] \\ 1-mx,\ x\in \left[ \frac{1}{n},\frac{1}{m} \right] \\ 0,\ x \in \left[ \frac{1}{m},1 \right] \end{array}\right. }\end{aligned}$$

Then

$$\begin{aligned} \int _0^1 |f_n(x) - f_m(x) |dx &= \int _0^{\frac{1}{n}} \left( n-m \right) x.dx + \int _{\frac{1}{n}}^{\frac{1}{m}} \left( 1-mx\right) dx \\ &= (n-m)\frac{1}{2n^2}+\frac{1}{m}-\frac{1}{n}-\frac{1}{2m}+\frac{m}{2n^2}\\ &= \frac{1}{2}\left[ \frac{1}{m}-\frac{1}{n} \right] \end{aligned}$$

Now for any \(\epsilon >0\), take \(N > \frac{2}{\epsilon }\). Then for \(m,n > N\),

$$\begin{aligned}\int _0^1 |f_n(x) - f_m(x) |dx= \frac{1}{2}\left[ \frac{1}{m}-\frac{1}{n} \right] < \frac{1}{m}+\frac{1}{n} < \frac{\epsilon }{2}+\frac{\epsilon }{2} =\epsilon \end{aligned}$$

Therefore the sequence is Cauchy. Now consider

$$\begin{aligned}f(x)={\left\{ \begin{array}{ll} 0,\ x=0 \\ 1,\ x \in \left( 0,1 \right] \end{array}\right. }\end{aligned}$$

Then \(\left\| f_n -f \right\| =\frac{1}{2n} \rightarrow 0 \) as \(n \rightarrow \infty \). That is, \(f_n\) converges to f, but \(f \notin \mathcal {C}[0,1]\); hence \(\lbrace f_n \rbrace \) does not converge in \(\mathcal {C}[0,1]\).
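The distance computation above is easy to reproduce numerically; a short Python sketch (Riemann-sum approximation of the 1-norm):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]

def f(n):
    return np.minimum(n * x, 1.0)  # f_n(x) = nx on [0, 1/n], then 1

for m, n in [(2, 5), (10, 100)]:
    dist = np.sum(np.abs(f(n) - f(m))) * dx  # approximates ||f_n - f_m||
    print(dist, 0.5 * (1 / m - 1 / n))       # matches (1/2)(1/m - 1/n)
```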

Normed linear spaces in which every Cauchy sequence is convergent are of great importance in mathematics. Such spaces are named after the famous Polish mathematician Stefan Banach (1892–1945), who started a systematic study in this area.

Definition 5.7

(Banach Space) A complete normed linear space is called a Banach space. 

Example 5.13

Consider the normed linear space \(\mathbb {R}^n\) over \(\mathbb {R}\) with 2-norm. We will show that this space is a Banach space. Let \(\lbrace v_k \rbrace \) be a Cauchy sequence in \(\mathbb {R}^n\). As \(v_k \in \mathbb {R}^n\), we can take \(v_k = \left( v_1^{k},v_2^{k},\ldots ,v_n^{k} \right) \) for each k. Since \(\lbrace v_k \rbrace \) is a Cauchy sequence, for every \(\epsilon >0\) there exists an N such that

$$\begin{aligned}\left\| v_k-v_m \right\| ^2 = \sum _{i=1}^n \left( v_i^k-v_i^m\right) ^2<\epsilon ^2 \end{aligned}$$

for all \(k,m \ge N\). This implies that \(\left( v_i^k-v_i^m\right) ^2<\epsilon ^2\) for each \(i=1,2,\ldots , n\) and \(k,m\ge N\) and hence \(|v_i^k-v_i^m|<\epsilon \) for each \(i=1,2,\ldots , n\) and \(k,m\ge N\). Thus for a fixed i, the sequence \(v_i^1,v_i^2,\ldots \) forms a Cauchy sequence of real numbers. Since \(\mathbb {R}\) is complete, \(v_i^k\rightarrow v_i\) as \(k\rightarrow \infty \) for each i. Take \(v=\left( v_1,v_2,\ldots ,v_n \right) \in \mathbb {R}^n\). Then

$$\begin{aligned}\left\| v_k-v \right\| ^2 = \sum _{i=1}^n \left( v_i^k-v_i\right) ^2 \rightarrow 0\ as\ k\rightarrow \infty \end{aligned}$$

Hence, \(\left\| v_k-v \right\| \rightarrow 0\) as \(k\rightarrow \infty \). Therefore \(\mathbb {R}^n\) over \(\mathbb {R}\) with the 2-norm is a Banach space. What about \(\mathbb {C}^n\) over \(\mathbb {C}\) with the 2-norm?

In fact, we can prove that every finite-dimensional normed linear space is complete. We have seen that this is not true when the normed linear space is infinite-dimensional. Here is an example of an infinite-dimensional Banach space.

Example 5.14

Consider \(\mathcal {C}[a,b]\) with supremum norm. Let \(\{ f_n \}\) be a Cauchy sequence in \(\mathcal {C}[a,b]\). Then for every \(\epsilon >0\) there exists an N such that

$$\begin{aligned} \left\| f_n - f_m \right\| = \max \limits _{x \in [a,b]} |f_n(x) - f_m(x) |< \epsilon \end{aligned}$$
(5.1)

Hence for any fixed \(x_0 \in [a,b]\), we have

$$\begin{aligned}|f_n(x_0)-f_m(x_0) |< \epsilon \end{aligned}$$

for all \(m,n > N\). This implies that \(f_1(x_0),f_2(x_0),f_3(x_0),\ldots \) is a Cauchy sequence of real numbers. Since \(\mathbb {R}\) is complete (by Theorem 1.2), this sequence converges, say \(f_n(x_0) \rightarrow f(x_0)\) as \(n \rightarrow \infty \). Proceeding like this for each point in [a, b], we can define a function f(x) on [a, b]. Now we have to prove that \(f_n \rightarrow f\) and \(f \in \mathcal {C}[a,b]\). Letting \(n \rightarrow \infty \) in Equation (5.1), we have

$$\begin{aligned}\max \limits _{x \in [a,b]} |f_m(x) -f(x) |\le \epsilon \end{aligned}$$

for all \(m>N\). Hence for every \(x \in [a,b]\),

$$\begin{aligned} |f_m(x)-f(x) |\le \epsilon \end{aligned}$$

for all \(m >N\). This implies that \(\{ f_m(x) \}\) converges to f(x) uniformly on [a, b]. Since each \(f_m\) is continuous on [a, b] and the convergence is uniform, the limit function is continuous on [a, b] (see Exercise 12, Chap. 1). Thus \(f\in \mathcal {C}[a,b]\) and \(f_n \rightarrow f\). Therefore \(\mathcal {C}[a,b]\) is complete.

2 Inner Product Spaces

 

In the previous section, we added a metric structure to vector spaces, which enabled us to find the distance between any two vectors. Now we want to study the geometry of vector spaces, which is useful in many practical applications. In this section, we introduce another abstract structure that helps us study the orthogonality of vectors, the projection of one vector onto another, etc. First we will discuss the properties of the dot product in the space \(\mathbb {R}^2\) and then generalize these ideas to abstract vector spaces.

Definition 5.8

(Dot Product)  Let \(v=(v_1,v_2),w=(w_1,w_2) \in \mathbb {R}^2 \). The dot product of v and w is denoted by \('\ v.w\ '\) and is given by

$$\begin{aligned}v.w=v_1w_1+v_2w_2\end{aligned}$$

Theorem 5.2

For \(u,v,w \in \mathbb {R}^2 \) and \(\lambda \in \mathbb {R}\),

  1. (a)

    \(v.v \ge 0\) and \(v.v=0\) if and only if \(v=0\).

  2. (b)

    \(u. (v+w)=u.v +u.w\) (distributivity of dot product over addition)

  3. (c)

    \((\lambda u).v= \lambda (u.v)\)

  4. (d)

    \(u.v=v.u\) (commutative)

Proof

  1. (a)

    Let \(v=(v_1,v_2) \in \mathbb {R}^2 \). Clearly, \(v.v=v_1^2+v_2^2 \ge 0\) and

    $$\begin{aligned}v.v=v_1^2+v_2^2=0\Leftrightarrow v_1=v_2=0 \Leftrightarrow v=0 \end{aligned}$$
  2. (b)

    For \(u=(u_1,u_2),v=(v_1,v_2),w=(w_1,w_2) \in \mathbb {R}^2 \),

    $$\begin{aligned} u. (v+w) &= u_1(v_1+w_1)+u_2(v_2+w_2)\\ &=u_1v_1+u_2v_2+u_1w_1+u_2w_2 \\ &=u.v+u.w \end{aligned}$$
  3. (c)

For \(u=(u_1,u_2),v=(v_1,v_2) \in \mathbb {R}^2 \) and \(\lambda \in \mathbb {R}\),

    $$\begin{aligned} (\lambda u).v &= (\lambda u_1, \lambda u_2).(v_1,v_2) \\ &=\lambda u_1v_1 + \lambda u_2v_2 \\ &=\lambda (u_1v_1 + u_2v_2) = \lambda (u.v) \end{aligned}$$
  4. (d)

    For \(u=(u_1,u_2),v=(v_1,v_2) \in \mathbb {R}^2 \),

    $$\begin{aligned}u.v=u_1v_1+u_2v_2=v_1u_1+v_2u_2=v.u\end{aligned}$$

Definition 5.9

(Length of a vector) Let \(v=(v_1,v_2) \in \mathbb {R}^2\). The length of v is denoted by \(|v |\) and is defined by \(|v |=\sqrt{v.v}= \sqrt{v_1^2+v_2^2}\).

Theorem 5.3

Let \(u,v \in \mathbb {R}^2\), then \(u.v= |u ||v |\cos \theta \) where \(0 \le \theta \le \pi \) is the angle between u and v.

Proof

Let \(u=(u_1,u_2),v=(v_1,v_2) \in \mathbb {R}^2\). If either u or v is the zero vector, say \(u=0\), then

$$\begin{aligned}u.v=0v_1+0v_2=0\end{aligned}$$

Then as \(|u |=0\), \(|u ||v |\cos \theta =0\). Therefore, the theorem holds. Now suppose that both \(u,v \ne 0\). Consider the triangle with sides u, v and w. Then \(w=v-u\) and, by the law of cosines,

$$\begin{aligned} |w |^2=|u |^2 + |v |^2 - 2 |u ||v |\cos \theta \end{aligned}$$
(5.2)

where \(0 \le \theta \le \pi \) is the angle between u and v. Also,

$$\begin{aligned} |w |^2 = w.w = (v-u).(v-u)=(v-u).v-(v-u).u=v.v+u.u-2u.v \end{aligned}$$
(5.3)

Then equating (5.2) and (5.3), we get, \(u.v= |u ||v |\cos \theta \).

Remark 5.1

Let u and v be two vectors in \(\mathbb {R}^2\) and let \(\theta \) be the angle between u and v. Then

  1. 1.

\(\theta =\cos ^{-1}\left( \dfrac{u.v}{|u ||v |}\right) \).

  2. 2.

If \(\theta =\frac{\pi }{2}\), then \(u.v=0\). In this case, we say that u is orthogonal to v, written \(u \perp v\).

Let \(v \in \mathbb {R}^2\) be any vector and \(u\in \mathbb {R}^2\) be a vector of unit length. We want to find the vector in \(span\left( \lbrace u \rbrace \right) \) that is nearer to v than any other vector in \(span\left( \lbrace u \rbrace \right) \) (Fig. 5.10). We know that the shortest distance from a point to a line is along the segment perpendicular to the line from the point. We will proceed using this intuition. From Fig. 5.10, we get

Fig. 5.10

Orthogonal projection of v on u

$$\begin{aligned}\pi _u(v)=\left( |v |\cos \theta \right) u\end{aligned}$$

From Theorem 5.3, \(\cos \theta =\frac{u.v}{|u ||v |}\). Substituting this in the above equation and using \(|u |=1\), we get \(\pi _u(v)=(u.v)u\). The vector \(\pi _u(v)\) is called the orthogonal projection of v on u, as \(v-\pi _u(v)\) is perpendicular to \(span\left( \lbrace u \rbrace \right) \).

Definition 5.10

(Projection)  Let \(v\in \mathbb {R}^2\) be any vector and \(u\in \mathbb {R}^2\) be a vector of unit length. Then the projection of v onto \(span\left( \lbrace u \rbrace \right) \) (a line passing through origin) is defined by \(\pi _u(v)=(u.v)u\).
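A direct Python transcription of this definition (a sketch using NumPy; u must be a unit vector):

```python
import numpy as np

def project(v, u):
    """Orthogonal projection pi_u(v) = (u.v) u of v onto span({u}), ||u|| = 1."""
    return np.dot(u, v) * u

v = np.array([1.0, 2.0])
u = np.array([1.0, 0.0])     # unit vector along the first axis
p = project(v, u)
print(p)                     # [1. 0.]
print(np.dot(v - p, u))      # 0.0: v - pi_u(v) is orthogonal to u
```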

The norm defined on a vector space generalizes the idea of the length of a vector in \(\mathbb {R}^2\). Likewise, we will generalize the idea of the dot product in \(\mathbb {R}^2\) to arbitrary vector spaces to obtain a more useful structure, in which we can discuss the ideas of orthogonality, projection, etc.

Definition 5.11

(Inner product space)  Let V be a vector space over a field \(\mathbb {K}\). An inner product on V is a function that assigns, to every ordered pair of vectors \(u,v \in V\), a scalar in \(\mathbb {K}\), denoted by \(\langle u,v \rangle \), such that for all u, v and w in V and all \(\lambda \in \mathbb {K}\), the following hold:

  1. (IP1)

    \(\langle v,v \rangle \ge 0\) and \(\langle v,v \rangle = 0\Leftrightarrow v=0\)

  2. (IP2)

    \(\langle u+w,v \rangle = \langle u,v \rangle +\langle w,v \rangle \)

  3. (IP3)

    \(\langle \lambda u,v \rangle =\lambda \langle u,v \rangle \)

  4. (IP4)

    \(\overline{\langle u,v \rangle }=\langle v,u \rangle \), where the bar denotes complex conjugation.

Then V together with an inner product defined on it is called an Inner product space. If \(\mathbb {K}=\mathbb {R}\), then (IP4) changes to \(\langle u,v \rangle =\langle v,u \rangle \).

Remark 5.2

  1. 1.

    If \(\lambda _1,\lambda _2,\ldots , \lambda _n \in \mathbb {K}\) and \(w,v_1,v_2,\ldots ,v_n \in V\), then

    $$\begin{aligned}\left\langle \sum _{i=1}^{n}\lambda _iv_i,w \right\rangle =\sum _{i=1}^{n}\lambda _i \langle v_i,w \rangle \end{aligned}$$
  2. 2.

By (IP2) and (IP3), for a fixed \(v \in V\), the map \(u \mapsto \langle u,v \rangle \) is linear; that is, the inner product is linear in its first variable.

  3. 3.

    Dot product is an inner product on the vector space \(\mathbb {R}^2\) over \(\mathbb {R}\).

Example 5.15

Consider the vector space \(\mathbb {K}^n\) over \(\mathbb {K}\). For \(u=(u_1,u_2,\ldots ,u_n)\) and \(v=(v_1,v_2,\ldots ,v_n)\) in \(\mathbb {K}^n\), define \(\langle u,v \rangle = \sum _{i=1}^n u_i\overline{v_i}\), where \(\overline{v_i}\) denotes the complex conjugate of \(v_i\). This inner product is called the standard inner product on \(\mathbb {K}^n\).

  1. (IP1)

    We have

    $$\begin{aligned}\langle u,u \rangle = \sum _{i=1}^n u_i\overline{u_i}=\sum _{i=1}^n |u_i |^2 \ge 0\end{aligned}$$

    and

    $$\begin{aligned}\langle u,u \rangle = \sum _{i=1}^n |u_i |^2 = 0 \Leftrightarrow |u_i |^2 = 0,\forall i=1,2, \ldots ,n \Leftrightarrow u_i=0 ,\forall i=1,2, \ldots ,n \Leftrightarrow u=0\end{aligned}$$
  2. (IP2)

For \(w=(w_1,w_2,\ldots ,w_n) \in \mathbb {K}^n\),

$$\begin{aligned} \langle u+w,v \rangle &= \sum _{i=1}^n (u_i+w_i) \overline{v_i} \\ &=\sum _{i=1}^n u_i\overline{v_i}+\sum _{i=1}^n w_i\overline{v_i} =\langle u,v \rangle + \langle w,v \rangle \end{aligned}$$
  3. (IP3)

    \(\langle \lambda u,v \rangle = \sum _{i=1}^n \lambda u_i \overline{v_i}=\lambda \sum _{i=1}^n u_i\overline{v_i}= \lambda \langle u,v \rangle \), where \(\lambda \in \mathbb {K}\).

  4. (IP4)

    \(\overline{\langle u,v \rangle }= \overline{\sum _{i=1}^n u_i\overline{v_i}}=\sum _{i=1}^n \overline{u_i\overline{v_i}}=\sum _{i=1}^n v_i\overline{u_i}=\langle v,u \rangle \)

Therefore \(\mathbb {K}^n\) is an inner product space with respect to the standard inner product. Observe that if \(\mathbb {K}=\mathbb {R}\), the inner product, \(\langle u,v \rangle = \sum _{i=1}^n u_iv_i\) is the usual dot product in \(\mathbb {R}^n\).
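A minimal Python sketch of the standard inner product (note the complex conjugate on the second argument; the helper name inner is our own):

```python
import numpy as np

def inner(u, v):
    """Standard inner product on K^n: sum_i u_i * conj(v_i)."""
    return np.sum(u * np.conj(v))

u = np.array([1 + 1j, 2 + 0j])
v = np.array([3 + 0j, 1 - 1j])

print(inner(u, v))                           # (5+5j)
print(np.conj(inner(u, v)) == inner(v, u))   # True, illustrating (IP4); exact here
print(inner(u, u).real >= 0)                 # True, illustrating (IP1)
```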

Example 5.16

Let \(V=\mathcal {C}[a,b]\), the space of continuous real-valued functions on [a, b]. For \(f,g \in V\), define \(\langle f,g \rangle = \int _a^bf(x)g(x)dx\). Then V is an inner product space with the defined inner product.

  1. (IP1)

    We have

    $$\begin{aligned}\langle f,f \rangle = \int _a^bf(x)f(x)dx= \int _a^b \left[ f(x) \right] ^2dx \ge 0 \end{aligned}$$

    and

    $$\begin{aligned}\langle f,f \rangle = \int _a^b \left[ f(x) \right] ^2dx = 0 \Leftrightarrow f(x)=0, \forall \ x \in [a,b]\end{aligned}$$
  2. (IP2)

For \(h \in \mathcal {C}[a,b]\),

    $$\begin{aligned} \langle f+h,g \rangle &= \int _a^b \left[ f(x)+h(x) \right] g(x)dx \\ &= \int _a^b f(x)g(x)dx+ \int _a^b h(x)g(x)dx =\langle f,g \rangle + \langle h,g \rangle \end{aligned}$$
  3. (IP3)

    \(\langle \lambda f,g \rangle = \int _a^b \lambda f(x)g(x)dx= \lambda \int _a^b f(x)g(x)dx =\lambda \langle f,g \rangle \) where \(\lambda \in \mathbb {R}\).

  4. (IP4)

    \(\langle f,g \rangle = \int _a^bf(x)g(x)dx = \int _a^bg(x)f(x)dx = \langle g,f \rangle \).

Thus \(\mathcal {C}[a,b]\) is an inner product space with respect to the inner product \(\langle f,g \rangle = \int _a^bf(x)g(x)dx\). Let us consider a numerical example here for better understanding. Consider \(f(x)=x^2-1,\ g(x)=x+1 \in \mathcal {C}[0,1]\). Then

$$\begin{aligned}\langle f,g \rangle = \int _0^1(x^3+x^2-x-1)dx=\left[ \frac{x^4}{4}+\frac{x^3}{3}-\frac{x^2}{2}-x\right] _0^1 =\frac{-11}{12} \end{aligned}$$
$$\begin{aligned}\langle f,f \rangle = \int _0^1(x^4-2x^2+1)dx=\left[ \frac{x^5}{5}-2\frac{x^3}{3}+x\right] _0^1 =\frac{8}{15} \end{aligned}$$

and

$$\begin{aligned}\langle g,g \rangle = \int _0^1(x^2+2x+1)dx=\left[ \frac{x^3}{3}+x^2+x\right] _0^1 =\frac{7}{3} \end{aligned}$$

What if we define \(\langle f,g \rangle = \int _0^1f(x)g(x)dx-1\) for \(f,g \in \mathcal {C}[0,1]\)? Does it define an inner product on \(\mathcal {C}[0,1]\)? No, it doesn’t! Observe that, for \(f(x)=x^2-1\), we get \(\langle f,f \rangle = \frac{8}{15}-1=\frac{-7}{15}<0\). This is not possible for an inner product as it violates (IP1).
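The three integrals above can be verified with symbolic integration; a short sketch using SymPy (one possible tool; any computer algebra system would do):

```python
import sympy as sp

x = sp.symbols('x')
f = x**2 - 1
g = x + 1

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [0, 1]."""
    return sp.integrate(p * q, (x, 0, 1))

print(inner(f, g), inner(f, f), inner(g, g))  # -11/12, 8/15, 7/3
```

Now, let us discuss some of the basic properties of inner product spaces.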

Theorem 5.4

Let V be an inner product space. Then for \(u,v,w \in V\) and \(\lambda \in \mathbb {K}\), the following statements are true.

  1. (a)

    \(\langle u,v+w \rangle =\langle u,v \rangle + \langle u,w \rangle \)

  2. (b)

    \(\langle u, \lambda v \rangle = \overline{\lambda } \langle u,v \rangle \)

  3. (c)

    \(\langle u,0 \rangle =\langle 0,u \rangle =0\)

  4. (d)

    If \(\langle u,v \rangle = \langle u,w \rangle \) for all \(u \in V\), then \(v=w\).

Proof

For \(u,v,w \in V\) and \(\lambda \in \mathbb {K}\),

  1. (a)

    \(\langle u,v+w \rangle = \overline{\langle v+w,u \rangle }=\overline{\langle v,u\rangle + \langle w,u \rangle }=\overline{\langle v,u\rangle }+\overline{\langle w,u \rangle }=\langle u,v \rangle + \langle u,w \rangle \)

  2. (b)

    \(\langle u, \lambda v \rangle = \overline{\langle \lambda v, u \rangle }=\overline{\lambda \langle v, u \rangle } =\overline{\lambda }\ \overline{\langle v, u \rangle }=\overline{\lambda } \langle u,v \rangle \)

  3. (c)

    \(\langle u,0 \rangle =\langle u,0+0 \rangle =\langle u,0 \rangle +\langle u,0 \rangle \Rightarrow \langle u,0 \rangle =0\). Similarly \(\langle 0,u \rangle = \langle 0+0,u \rangle = \langle 0,u \rangle + \langle 0,u \rangle =0\).

  4. (d)

    Suppose that \(\langle u,v \rangle = \langle u,w \rangle \) for all \(u \in V\).

    $$\begin{aligned}\langle u,v \rangle = \langle u,w \rangle \Rightarrow \langle u,v \rangle - \langle u,w \rangle =0 \Rightarrow \langle u,v - w \rangle =0\end{aligned}$$

    That is, \(\langle u,v \rangle = \langle u,w \rangle \) for all \(u \in V\) implies that \(\langle u,v - w \rangle =0\ \forall \ u \in V \). In particular, \(\langle v-w,v-w \rangle =0 \). This implies \(v-w=0\). That is, \(v=w\).

The following theorem gives one of the most important and widely used inequalities in mathematics, called the Cauchy-Schwarz Inequality, named after the French mathematician Augustin-Louis Cauchy (1789–1857) and the German mathematician Hermann Schwarz (1843–1921).

Theorem 5.5

(Cauchy-Schwarz Inequality)  Let V be an inner product space. For \(v,w \in V\),

$$\begin{aligned}|\langle v,w \rangle |^2 \le \langle v,v \rangle \langle w,w \rangle \end{aligned}$$

where equality holds if and only if \(\lbrace v,w \rbrace \) is linearly dependent.

Proof

Let \(v,w \in V\). Consider

$$\begin{aligned}u= \langle w,w \rangle v - \langle v,w \rangle w\end{aligned}$$

Then

$$\begin{aligned} 0 \le \langle u,u \rangle &= \langle \langle w,w \rangle v - \langle v,w \rangle w,\langle w,w \rangle v - \langle v,w \rangle w \rangle \\ &= |\langle w,w \rangle |^2 \langle v,v \rangle - \langle w,w \rangle |\langle v,w \rangle |^2 - \langle w,w \rangle |\langle v,w \rangle |^2 + \langle w,w \rangle |\langle v,w \rangle |^2 \\ &= \langle w,w \rangle \left[ \langle v,v \rangle \langle w,w \rangle - |\langle v,w \rangle |^2 \right] \end{aligned}$$

Now suppose that \(\langle w,w \rangle >0\); then \(\langle v,v \rangle \langle w,w \rangle - |\langle v,w \rangle |^2 \ge 0\), which implies that \(|\langle v,w \rangle |^2 \le \langle v,v \rangle \langle w,w \rangle \). If \(\langle w,w \rangle =0\), then by (IP1), \(w=0\). Therefore by Theorem 5.4(c), \(\langle v,w \rangle =0\) and hence \(\langle v,v \rangle \langle w,w \rangle =0= |\langle v,w \rangle |^2\).

Now suppose that equality holds, that is, \(|\langle v,w \rangle |^2 = \langle v,v \rangle \langle w,w \rangle \). Then \(\langle u,u \rangle =0\), which gives \(u=0\); that is, \( \langle w,w \rangle v = \langle v,w \rangle w\), and hence \(\lbrace v,w \rbrace \) is linearly dependent. Conversely, suppose that \(\lbrace v,w \rbrace \) is linearly dependent. Then by Corollary 2.1, one is a scalar multiple of the other. That is, there exists \(\lambda \in \mathbb {K}\) such that \(v = \lambda w\) or \(w= \lambda v\). If \(v=\lambda w\), then

$$\begin{aligned}\langle v,v \rangle \langle w,w \rangle = \langle \lambda w,\lambda w \rangle \langle w,w \rangle =|\lambda |^2 |\langle w,w \rangle |^2 =|\langle v,w \rangle |^2 \end{aligned}$$

The case \(w= \lambda v\) is similar. Hence the proof.

Example 5.17

Consider \(\mathbb {R}^n\) with standard inner product. For \((u_1,\ldots ,u_n),(v_1,\ldots ,v_n) \in \mathbb {R}^n\), by Cauchy-Schwarz inequality, we have

$$\begin{aligned}(u_1v_1+u_2v_2+\cdots +u_nv_n)^2 \le (u_1^2+u_2^2+\cdots +u_n^2)(v_1^2+v_2^2+\cdots +v_n^2) \end{aligned}$$

That is, \(\left( \sum _{i=1}^{n}u_iv_i \right) ^2 \le \left( \sum _{i=1}^{n}u_i^2 \right) \left( \sum _{i=1}^{n}v_i^2 \right) \). If we consider \(\mathcal {C}[a,b]\) with the inner product \(\langle f,g \rangle = \int _a^bf(x)g(x)dx\), then by the Cauchy-Schwarz inequality, we have

$$\begin{aligned}\left[ \int _a^b f(x)g(x)dx\right] ^2 \le \int _a^b f^2(x)dx \int _a^b g^2(x)dx \end{aligned}$$

That is, \(|\langle f,g \rangle |^2 \le \langle f, f \rangle \langle g,g \rangle \). Consider \(f,g \in \mathcal {C}[0,1]\) as defined in Example 5.16. We have seen that \(\langle f,g \rangle = \frac{-11}{12}, \langle f,f \rangle = \frac{8}{15}\) and \(\langle g,g \rangle = \frac{7}{3}\). Clearly,

$$\begin{aligned}|\langle f,g \rangle |^2 =\frac{121}{144} \le \frac{56}{45} =\langle f, f \rangle \langle g,g \rangle \end{aligned}$$

In the previous section, we have seen that every normed linear space is a metric space. Now, we will show that every inner product space is a normed linear space. The following theorem gives a method to define a norm on an inner product space using the inner product.

Theorem 5.6

Let V be an inner product space. For \(v \in V\), \( \left\| v \right\| =\sqrt{\langle v,v \rangle }\) is a norm on V.

Proof

  1. (N1)

    Let \(v \in V\). Since \(\langle v,v \rangle \ge 0\), we have \(\left\| v \right\| =\sqrt{\langle v,v \rangle } \ge 0\). Also \(\langle v,v \rangle =0 \Leftrightarrow v=0\), implies that \(\left\| v\right\| =\sqrt{\langle v,v \rangle }=0 \Leftrightarrow v=0 \).

  2. (N2)

    \( \left\| \lambda v \right\| =\sqrt{\langle \lambda v,\lambda v \rangle }=\sqrt{ \lambda \overline{\lambda }\langle v,v \rangle } =\sqrt{|\lambda |^2 \left\| v \right\| ^2}=|\lambda |\left\| v \right\| \), where \(\lambda \in \mathbb {K}\).

  3. (N3)

    For \(u,v \in V\),

    $$\begin{aligned} \left\| u+v \right\| ^2 &= \langle u+v,u+v \rangle \\ &= \langle u,u \rangle + \langle u,v \rangle +\langle v,u \rangle + \langle v,v \rangle \\ &= \left\| u \right\| ^2 +\left\| v \right\| ^2 +2Re (\langle u,v \rangle )\\ &\le \left\| u \right\| ^2 +\left\| v \right\| ^2 +2|\langle u,v \rangle |\\ &\le \left\| u \right\| ^2 +\left\| v \right\| ^2 +2 \left\| u \right\| \left\| v \right\| \ (Cauchy-Schwarz) \\ &= \left( \left\| u \right\| + \left\| v \right\| \right) ^2 \end{aligned}$$

    Hence \(\left\| u+v \right\| \le \left\| u \right\| + \left\| v \right\| \).

Therefore \( \left\| v \right\| =\sqrt{\langle v,v \rangle }\) is a norm on V.

Remark 5.3

The norm defined in the above theorem is called the norm induced by the inner product. Every inner product space is a normed linear space with respect to the induced norm. 

Example 5.18

Consider \(\mathbb {R}^n\) with standard inner product. Observe that for \(v=(v_1,v_2,\ldots ,v_n) \in \mathbb {R}^n\), we get

$$\begin{aligned}\left\| v \right\| =\sqrt{\langle v,v \rangle } = \left( \sum _{i=1}^{n}v_i^2\right) ^{\frac{1}{2}}=\left\| v \right\| _2 \end{aligned}$$

Thus the standard inner product on \(\mathbb {R}^n\) induces the 2-norm. Similarly, the inner product \(\langle f,g\rangle =\int _a^b f(x)g(x) dx\) on \(\mathcal {C}[a,b]\) induces the norm,

$$\begin{aligned}\left\| f \right\| =\sqrt{\langle f,f\rangle }=\left( \int _a^b f^2(x)dx \right) ^{\frac{1}{2}} \end{aligned}$$

This norm is called the energy norm.

The following inclusion can be derived between the collections of these abstract spaces.

$$\begin{aligned}\lbrace \text {Inner\ product\ spaces} \rbrace \subset \lbrace \text {Normed\ spaces} \rbrace \subset \lbrace \text {Metric\ spaces} \rbrace \end{aligned}$$

Now we have to check whether the reverse inclusions are true or not. The following theorem gives a necessary condition for a norm to be induced by an inner product.

Theorem 5.7

(Parallelogram Law)  Let V be an inner product space. Then for all \(u,v \in V\),

$$\begin{aligned} \left\| u+v \right\| ^2 + \left\| u-v \right\| ^2 =2\left( \left\| u \right\| ^2 +\left\| v \right\| ^2 \right) \end{aligned}$$

Proof

For all \(u,v \in V\),

$$\begin{aligned} \left\| u+v \right\| ^2 =\langle u+v ,u+v \rangle = \langle u,u \rangle + \langle u,v \rangle +\langle v,u \rangle + \langle v,v \rangle \\ \left\| u-v \right\| ^2 = \langle u-v ,u-v \rangle = \langle u,u \rangle - \langle u,v \rangle -\langle v,u \rangle + \langle v,v \rangle \end{aligned}$$

Therefore \( \left\| u+v \right\| ^2 + \left\| u-v \right\| ^2 =2\left( \left\| u \right\| ^2 +\left\| v \right\| ^2 \right) \) (Fig. 5.11).

Fig. 5.11

Parallelogram law

Example 5.19

In Example 5.4, we have seen that \(\mathcal {C}[a,b]\), the space of continuous real-valued functions on [a, b], is a normed linear space with the supremum norm given by \(\left\| f \right\| = \max \limits _{x \in [a,b]} |f(x) |\), where \(f \in \mathcal {C}[a,b]\). This space gives an example of a normed linear space which is not an inner product space. Consider the elements \(f_1(x)=1\) and \(f_2(x)=\dfrac{(x-a)}{(b-a)}\) in \(\mathcal {C}[a,b]\). Then \(\left\| f_1 \right\| = 1\) and \(\left\| f_2 \right\| = 1\). We have

$$\begin{aligned}(f_1+f_2)(x) = 1+ \dfrac{(x-a)}{(b-a)}\ \text {and}\ (f_1-f_2)(x) = 1- \dfrac{(x-a)}{(b-a)}\end{aligned}$$

Hence \(\left\| f_1 +f_2 \right\| = 2\) and \(\left\| f_1 -f_2 \right\| = 1\). Now

$$\begin{aligned}\left\| f_1 +f_2 \right\| ^2 + \left\| f_1 - f_2 \right\| ^2 =5\ \text {but} \ 2\left( \left\| f_1 \right\| ^2 +\left\| f_2 \right\| ^2 \right) =4\end{aligned}$$

Clearly, the parallelogram law is not satisfied. Thus the supremum norm on \(\mathcal {C}[a,b]\) cannot be obtained from an inner product.
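We can confirm this failure numerically; a short Python sketch taking \([a,b]=[0,1]\), so \(f_1(x)=1\) and \(f_2(x)=x\) (the supremum norm approximated on a grid):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)    # [a, b] = [0, 1]
f1 = np.ones_like(x)                # f1(x) = 1
f2 = x                              # f2(x) = (x - a)/(b - a)

def sup(h):
    return np.max(np.abs(h))        # grid approximation of the supremum norm

lhs = sup(f1 + f2) ** 2 + sup(f1 - f2) ** 2
rhs = 2 * (sup(f1) ** 2 + sup(f2) ** 2)
print(lhs, rhs)                     # 5.0 vs 4.0: the parallelogram law fails
```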

From this example, we can conclude that not all normed linear spaces are inner product spaces. Now, we will prove that a normed linear space is an inner product space if and only if the norm satisfies the parallelogram law.

Theorem 5.8

Let \(\left( V, \left\| . \right\| \right) \) be a normed linear space. Then there exists an inner product \(\langle , \rangle \) on V such that \(\langle v,v \rangle = \left\| v \right\| ^2\) for all \(v\in V\) if and only if the norm satisfies the parallelogram law.

Proof

Suppose that we have an inner product on V with \(\langle v,v \rangle = \left\| v \right\| ^2\) for all \(v\in V\). Then by Theorem 5.7, the parallelogram law is satisfied.

Conversely, suppose that the norm on V satisfies the parallelogram law. For any \(u,v \in V\), define

$$\begin{aligned}4 \langle u,v \rangle = \left\| u+v \right\| ^2 - \left\| u-v \right\| ^2+i \left\| u+iv \right\| ^2 -i \left\| u-iv \right\| ^2\end{aligned}$$

Now we will prove that the function defined above satisfies the conditions \((IP1)-(IP4)\).

  1. (IP1)

    For any \(v\in V\), we have

    $$\begin{aligned} 4 \langle v,v \rangle &= \left\| v+v \right\| ^2 - \left\| v-v \right\| ^2+i \left\| v(1+i) \right\| ^2 -i \left\| v(1-i) \right\| ^2 \\ &= 4 \left\| v \right\| ^2 + i |1+i |^2 \left\| v \right\| ^2 -i|1-i |^2 \left\| v \right\| ^2 \\ &= 4 \left\| v \right\| ^2 + 2i \left\| v \right\| ^2 -2i\left\| v \right\| ^2 \\ &= 4\left\| v \right\| ^2 \end{aligned}$$

    This implies that \(\langle v,v \rangle = \left\| v \right\| ^2\) for all \(v\in V\). Hence \(\langle v,v \rangle \ge 0\) for all \(v \in V\) and \(\langle v,v \rangle = 0\) if and only if \(v=0\).

  2. (IP2)

    For any \(u,v,w\in V\), we have

    $$\begin{aligned} 4 \langle u+w,v \rangle &= \left\| (u+w)+v \right\| ^2 - \left\| (u+w)-v \right\| ^2+i \left\| (u+w)+iv \right\| ^2 -i \left\| (u+w)-iv \right\| ^2 \\ \end{aligned}$$

Rewriting \(u+w+v\) as \(\left( u+\frac{v}{2}\right) + \left( w+\frac{v}{2}\right) \) and applying the parallelogram law, we have

    $$\begin{aligned} \left| \left| \left( u+\frac{v}{2}\right) + \left( w+\frac{v}{2}\right) \right| \right| ^2+ \left| \left| \left( u+\frac{v}{2}\right) - \left( w+\frac{v}{2}\right) \right| \right| ^2= 2 \left| \left| u+\frac{v}{2} \right| \right| ^2 + 2 \left| \left| w+\frac{v}{2}\right| \right| ^2 \end{aligned}$$

    This implies

    $$\begin{aligned}\left\| u+w+v \right\| ^2 = 2 \left| \left| u+\frac{v}{2} \right| \right| ^2 + 2 \left| \left| w+\frac{v}{2}\right| \right| ^2 - \left\| u-w \right\| ^2 \end{aligned}$$

    Similarly,

    $$\begin{aligned}\left\| u+w-v \right\| ^2 = 2 \left| \left| u-\frac{v}{2} \right| \right| ^2 + 2 \left| \left| w-\frac{v}{2}\right| \right| ^2 - \left\| u-w \right\| ^2 \end{aligned}$$

    Then

    $$\begin{aligned} \left\| u+w+v \right\| ^2 - \left\| u+w-v \right\| ^2= 2 \left[ \left| \left| u+\frac{v}{2} \right| \right| ^2 - \left| \left| u-\frac{v}{2} \right| \right| ^2 + \left| \left| w+\frac{v}{2}\right| \right| ^2 - \left| \left| w-\frac{v}{2}\right| \right| ^2 \right] \end{aligned}$$
    (5.4)

    Multiplying both sides by i and replacing v by iv in the above equation,

    $$\begin{aligned} i \left[ \left\| u+w+iv \right\| ^2 - \left\| u+w-iv \right\| ^2 \right] =2i \left[ \left| \left| u+\frac{iv}{2} \right| \right| ^2 - \left| \left| u-\frac{iv}{2} \right| \right| ^2 + \left| \left| w+\frac{iv}{2}\right| \right| ^2 - \left| \left| w-\frac{iv}{2}\right| \right| ^2 \right] \end{aligned}$$
    (5.5)

Adding (5.4) and (5.5), we get

    $$\begin{aligned} 4 \langle u+w, v \rangle &= 2 \left[ \left| \left| u+\frac{v}{2} \right| \right| ^2 - \left| \left| u- \frac{v}{2} \right| \right| ^2 +i \left| \left| u+ \frac{iv}{2} \right| \right| ^2 - i \left| \left| u- \frac{iv}{2} \right| \right| ^2 \right] \\ &\ + 2 \left[ \left| \left| w +\frac{v}{2} \right| \right| ^2 - \left| \left| w- \frac{v}{2} \right| \right| ^2 +i \left| \left| w+ \frac{iv}{2} \right| \right| ^2 - i \left| \left| w- \frac{iv}{2} \right| \right| ^2 \right] \\ &= 8\left[ \left\langle u, \frac{v}{2} \right\rangle + \left\langle w, \frac{v}{2} \right\rangle \right] \end{aligned}$$

Now taking \(w=0\) and then \(u=0\) separately in the above equation, we get \(\langle u,v \rangle =2 \left\langle u, \frac{v}{2} \right\rangle \) and \(\langle w,v \rangle =2 \left\langle w, \frac{v}{2} \right\rangle \). Thus we get \(4 \langle u+w,v \rangle =4 \langle u,v \rangle + 4 \langle w,v \rangle \) for all \(u,v,w \in V\).

  3. (IP3)

Now we will prove that \(\langle \lambda u,v \rangle = \lambda \langle u, v \rangle \). We will prove this in four separate cases.

    1. (a)

      \(\lambda \) is an integer.

      For all \(u,v,w \in V\), we have

      $$\begin{aligned}\langle u+w,v \rangle =\langle u,v \rangle + \langle w,v \rangle \end{aligned}$$

      Replacing w by u, we get \(\langle 2u,v \rangle = 2 \langle u,v \rangle \). Thus the result is true for \(\lambda =2\). Suppose that the result is true for any positive integer n. That is, \(\langle nu,v \rangle = n \langle u,v \rangle \) for all \(u,v \in V\). Now

      $$\begin{aligned}\langle (n+1)u,v \rangle =\langle nu+u,v \rangle =\langle nu,v \rangle + \langle u,v \rangle =(n+1) \langle u,v \rangle \end{aligned}$$

hence by the principle of mathematical induction, the result is true for all positive integers n (for \(\lambda =0\), both sides are zero). Now, to prove this for any negative integer, first we prove that \(\langle -u,v \rangle =- \langle u,v \rangle \) for any \(u,v \in V\). We have

      $$\begin{aligned}4 \langle u,v \rangle = \left\| u+v \right\| ^2 - \left\| u-v \right\| ^2+i \left\| u+iv \right\| ^2 -i \left\| u-iv \right\| ^2\end{aligned}$$

      Replacing u by \(-u\), we get

      $$\begin{aligned} 4 \langle -u,v \rangle &= \left\| -u+v \right\| ^2 - \left\| -u-v \right\| ^2+i \left\| -u+iv \right\| ^2 -i \left\| -u-iv \right\| ^2 \\ &= \left\| -(u-v) \right\| ^2 - \left\| -(u+v) \right\| ^2+i \left\| -(u-iv) \right\| ^2 -i \left\| -(u+iv) \right\| ^2 \\ &= \left\| u-v \right\| ^2 - \left\| u+v \right\| ^2+i \left\| u-iv \right\| ^2 -i \left\| u+iv \right\| ^2 \\ &=-4 \langle u,v \rangle \end{aligned}$$

      Thus we have \(\langle -u,v \rangle =- \langle u,v \rangle \) for any \(u,v \in V\). Let \(\lambda = -\mu \) be any negative integer, where \(\mu >0\). Then we have,

      $$\begin{aligned}\langle \lambda u, v \rangle = \langle -\mu u,v \rangle = \langle -(\mu u),v \rangle =-\langle \mu u,v \rangle = -\mu \langle u,v \rangle = \lambda \langle u,v \rangle \end{aligned}$$

      Thus the result is true for any integer \(\lambda \).

    2. (b)

\(\lambda = \frac{p}{q}\) is a rational number, where p and q are integers with \(q \ne 0\).

      Then we have

      $$\begin{aligned}p \langle u, v \rangle = \langle pu,v \rangle = \left\langle q \left( \frac{p}{q} \right) u,v \right\rangle = q \left\langle \frac{p}{q} u,v \right\rangle \end{aligned}$$

      Thus we have \(\left\langle \frac{p}{q} u,v \right\rangle = \frac{p}{q} \langle u,v \rangle \) for all \(u,v \in V\). Thus the result is true for all rational numbers.

    3. (c)

      \(\lambda \) is a real number.

      Then there exists a sequence of rational numbers \(\{ \lambda _n \}\) such that \(\lambda _n \rightarrow \lambda \) as \(n \rightarrow \infty \) (See Exercise 13, Chap. 1). Observe that, as \(n \rightarrow \infty \)

      $$\begin{aligned} |\lambda _n \langle u,v \rangle - \lambda \langle u,v \rangle |= |(\lambda _n - \lambda ) \langle u,v \rangle |= |\lambda _n-\lambda ||\langle u,v \rangle |\rightarrow 0\end{aligned}$$

Hence, \(\lambda _n \langle u,v \rangle \rightarrow \lambda \langle u,v \rangle \) as \(n \rightarrow \infty \). Now, by (b), \(\lambda _n \langle u,v \rangle = \langle \lambda _n u,v \rangle \). Also,

      $$\begin{aligned} 4 \langle \lambda _n u,v \rangle &= \left\| \lambda _n u+v \right\| ^2 - \left\| \lambda _n u-v \right\| ^2+i \left\| \lambda _n u+iv \right\| ^2 -i \left\| \lambda _n u-iv \right\| ^2 \\ &\rightarrow \left\| \lambda u+v \right\| ^2 - \left\| \lambda u-v \right\| ^2+i \left\| \lambda u+iv \right\| ^2 -i \left\| \lambda u-iv \right\| ^2 \\ &= 4 \langle \lambda u,v \rangle \end{aligned}$$

      That is, \(\langle \lambda _n u,v \rangle \rightarrow \langle \lambda u,v \rangle \) as \(n \rightarrow \infty \). This implies that \( \langle \lambda u,v \rangle =\lambda \langle u,v \rangle \) for any \(u,v \in V\).

    4. (d)

      \(\lambda \) is a complex number.

      First we will show that \(\langle iu,v \rangle =i\langle u,v \rangle \). We have

      $$\begin{aligned}4 \langle u,v \rangle = \left\| u+v \right\| ^2 - \left\| u-v \right\| ^2+i \left\| u+iv \right\| ^2 -i \left\| u-iv \right\| ^2\end{aligned}$$

      Replacing u by iu, we have

      $$\begin{aligned} 4 \langle iu,v \rangle &= \left\| iu+v \right\| ^2 - \left\| iu-v \right\| ^2+i \left\| iu+iv \right\| ^2 -i \left\| iu-iv \right\| ^2 \\ &= \left\| i(u-iv) \right\| ^2 - \left\| i(u+iv) \right\| ^2+i \left\| i(u+v) \right\| ^2 -i \left\| i(u-v) \right\| ^2\\ &= \left\| u-iv \right\| ^2 - \left\| u+iv \right\| ^2+i \left\| u+v \right\| ^2 -i \left\| u-v \right\| ^2\\ &= -i^2 \left\| u-iv \right\| ^2 +i^2 \left\| u+iv \right\| ^2+i \left\| u+v \right\| ^2 -i \left\| u-v \right\| ^2\\ &= i \left[ \left\| u+v \right\| ^2 - \left\| u-v \right\| ^2+i \left\| u+iv \right\| ^2 -i \left\| u-iv \right\| ^2 \right] \\ &= i 4\langle u,v \rangle \end{aligned}$$

which implies that \(\langle iu,v \rangle =i\langle u,v \rangle \). Now let \(\lambda =a+ib\) be any complex number, where \(a,b \in \mathbb {R}\). Then

      $$\begin{aligned} \langle \lambda u,v \rangle &= \langle (a+ib) u,v \rangle \\ &= \langle au+ibu,v \rangle \\ &=\langle au,v \rangle + \langle ibu,v \rangle \\ &= a \langle u,v \rangle + i b \langle u,v \rangle \\ &= (a+ib) \langle u,v \rangle =\lambda \langle u,v \rangle \end{aligned}$$

      Thus \(\langle \lambda u,v \rangle = \lambda \langle u,v \rangle \) for all \(u,v \in V\) and for all scalars \(\lambda \).

  4. (IP4)

    For any \(u,v\in V\), we have

$$\begin{aligned} 4 \langle u,v \rangle &= \left\| u+v \right\| ^2 - \left\| u-v \right\| ^2+i \left\| u+iv \right\| ^2 -i \left\| u-iv \right\| ^2 \\ &= \left\| v+u \right\| ^2 - \left\| v-u \right\| ^2 + i \left\| i(v-iu) \right\| ^2 - i \left\| (-i)(v+iu) \right\| ^2 \\ &= \left\| v+u \right\| ^2 - \left\| v-u \right\| ^2 + i |i |^2 \left\| v-iu \right\| ^2 -i |-i |^2 \left\| v+iu \right\| ^2 \\ &= \overline{ \left\| v+u \right\| ^2 - \left\| v-u \right\| ^2 + i \left\| v+iu \right\| ^2 -i \left\| v-iu \right\| ^2 } \\ &= 4\overline{\langle v,u \rangle } \end{aligned}$$

Hence, \(\langle u,v \rangle = \overline{\langle v,u \rangle }\), that is, \(\overline{\langle u,v \rangle }= \langle v,u \rangle \) for all \(u,v \in V\).

Thus all the conditions for an inner product are satisfied and hence \((V,\langle , \rangle )\) is an inner product space.
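The polarization identity that drives this proof is easy to check numerically. Below is a minimal Python sketch, assuming the standard inner product on \(\mathbb {C}^3\); note that NumPy's vdot conjugates its first argument, so \(\langle u,v \rangle \) in this chapter's convention corresponds to np.vdot(v, u).

```python
import numpy as np

# Numerical check of the polarization identity used throughout the proof:
# 4<u,v> = ||u+v||^2 - ||u-v||^2 + i||u+iv||^2 - i||u-iv||^2
rng = np.random.default_rng(0)
u = rng.normal(size=3) + 1j * rng.normal(size=3)
v = rng.normal(size=3) + 1j * rng.normal(size=3)

sq = lambda x: np.linalg.norm(x) ** 2          # ||x||^2
rhs = sq(u + v) - sq(u - v) + 1j * sq(u + 1j * v) - 1j * sq(u - 1j * v)
print(np.allclose(rhs / 4, np.vdot(v, u)))     # True
```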

Similar to what we have done in normed linear spaces, the concept of convergence of sequences in inner product spaces follows from the definition of convergence in metric spaces as given below.

Definition 5.12

(Convergence) Let \(\lbrace v_n \rbrace \) be a sequence in an inner product space V. Then \(v_n \rightarrow v\) if and only if \(\left\| v_n - v \right\| = \sqrt{ \langle v_n - v , v_n - v \rangle } \rightarrow 0\) as \(n \rightarrow \infty \).

Again the question of completeness arises. The following example shows that an inner product space need not be complete.

Example 5.20

Consider \(\mathcal {C}[0,1]\) with the inner product \(\langle f,g \rangle = \int _0^1 f(x)g(x) dx\). We have already seen that \(\mathcal {C}[0,1]\) is an inner product space with respect to the given inner product. Now, consider the sequence,

$$\begin{aligned}f_n = {\left\{ \begin{array}{ll} 0,\ x\in \left[ 0,\frac{1}{2} \right] \\ n\left( x- \frac{1}{2} \right) ,\ x\in \left[ \frac{1}{2},\frac{1}{2}+\frac{1}{n} \right] \\ 1,\ x\in \left[ \frac{1}{2}+\frac{1}{n},1 \right] \end{array}\right. }\end{aligned}$$

If we proceed as in Example 5.12, we can show that \(\{f_n\}\) is Cauchy but not convergent.
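The Cauchy property can also be illustrated numerically by approximating the induced norm with a Riemann sum on a fine grid; a sketch (the grid size is an arbitrary choice):

```python
import numpy as np

# f_n from Example 5.20: 0 on [0,1/2], a ramp of slope n, then 1
def f(n, x):
    return np.clip(n * (x - 0.5), 0.0, 1.0)

x = np.linspace(0.0, 1.0, 200_001)
for n, m in [(10, 20), (100, 200), (1000, 2000)]:
    # Riemann-sum approximation of ||f_n - f_m|| = (int (f_n - f_m)^2 dx)^(1/2)
    dist = np.sqrt(np.mean((f(n, x) - f(m, x)) ** 2))
    print(n, m, round(dist, 5))   # the distances shrink: the sequence looks Cauchy
```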

Complete inner product spaces are named after the famous German mathematician David Hilbert (1862–1943) who started a systematic study in this area.

Definition 5.13

(Hilbert Space) A complete inner product space is called a Hilbert space. 

Example 5.21

Consider \(\mathbb {K}^n\) over \(\mathbb {K}\) with the standard inner product. Then \(\left\| v \right\| = \sqrt{\langle v,v \rangle }= \left( \sum _{i=1}^n |v_i |^2\right) ^{\frac{1}{2}} \) for \(v=\left( v_1,v_2, \ldots ,v_n \right) \in \mathbb {K}^n\). From Example 5.13, \(\mathbb {K}^n\) over \(\mathbb {K}\) with the standard inner product is a Hilbert space. In fact, we can prove that every finite-dimensional inner product space over \(\mathbb {R}\) or \(\mathbb {C}\) is complete (prove!). Is \(\mathbb {Q}\) over the field \(\mathbb {Q}\) complete?

3 Orthogonality of Vectors and Orthonormal Sets

Orthogonality of vectors in vector spaces is one of the important basic concepts in mathematics, generalizing the familiar fact that two vectors in \(\mathbb {R}^2\) are perpendicular exactly when their dot product is zero (Fig. 5.12).

Fig. 5.12 Example of orthogonal vectors in \(\mathbb {R}^2\)

Orthogonal/orthonormal bases are of great importance in functional analysis, which we will be discussing in the coming sections. We will start with the definition of an orthogonal set.

Definition 5.14

(Orthogonal set)  Let V be an inner product space. Vectors \(v ,w \in V\) are orthogonal if \(\langle v,w \rangle =0\). A subset S of V is orthogonal if any two distinct vectors in S are orthogonal.

We are all familiar with the fundamental relation from Euclidean geometry that, “in a right-angled triangle, the square of the hypotenuse is equal to the sum of squares of the other two sides”, named after the famous Greek mathematician Pythagoras (570–495 BC) (Fig. 5.13).

Fig. 5.13 Pythagoras theorem illustrated in \(\mathbb {R}^2\)

This relation can be generalized to higher-dimensional spaces, to spaces that are not Euclidean, to objects that are not right triangles, and to objects that are not even triangles. Consider the following theorem.

Theorem 5.9

(Pythagoras Theorem) Let V be an inner product space and \(\lbrace v_1,v_2, \ldots ,v_n \rbrace \) be an orthogonal set in V. Then

$$\begin{aligned}\left\| v_1+v_2+ \cdots + v_n \right\| ^2 = \left\| v_1 \right\| ^2 + \left\| v_2 \right\| ^2 + \cdots + \left\| v_n \right\| ^2 \end{aligned}$$

Proof

As \(\lbrace v_1,v_2, \ldots ,v_n \rbrace \) is an orthogonal set in V, we have \(\langle v_i,v_j \rangle =0,\ \forall i \ne j\). Then

$$\begin{aligned} \left\| v_1+v_2+ \cdots + v_n \right\| ^2 &= \langle v_1+v_2+ \cdots + v_n,v_1+v_2+ \cdots + v_n \rangle \\ &= \sum _{i,j=1}^n \langle v_i,v_j \rangle \\ &= \sum _{i=1}^n \langle v_i,v_i \rangle \\ &= \left\| v_1 \right\| ^2 + \left\| v_2 \right\| ^2 + \cdots + \left\| v_n \right\| ^2 \end{aligned}$$
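The theorem is easy to test numerically; a small sketch using an orthogonal triple in \(\mathbb {R}^3\) (the same triple reappears in Example 5.24 below):

```python
import numpy as np

# An orthogonal set in R^3 with the standard inner product
v1, v2, v3 = np.array([2., 1., 2.]), np.array([-2., 2., 1.]), np.array([1., 2., -2.])
s = v1 + v2 + v3

# ||v1 + v2 + v3||^2 versus ||v1||^2 + ||v2||^2 + ||v3||^2
print(np.linalg.norm(s) ** 2)                              # 27.0
print(sum(np.linalg.norm(v) ** 2 for v in (v1, v2, v3)))   # 27.0
```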

Definition 5.15

(Orthonormal set)  A vector \(v \in V\) is a unit vector if \(\left\| v \right\| =1\). A subset S of V is orthonormal if S is orthogonal and consists entirely of unit vectors. A subset of V is an orthonormal basis for V if it is an ordered basis that is orthonormal.

Example 5.22

Consider the set \(S=\lbrace v_1,v_2,v_3 \rbrace \) in \(\mathcal {C}[-1,1]\), where

$$\begin{aligned}v_1= \frac{1}{\sqrt{2}},v_2=\sqrt{\frac{3}{2}}x\ \text {and} \ v_3=\sqrt{\frac{5}{8}}(3x^2-1)\end{aligned}$$

Then

$$\begin{aligned}\langle v_1,v_1 \rangle = \int _{-1}^1\frac{1}{2} dx=1, \langle v_2,v_2 \rangle = \frac{3}{2}\int _{-1}^1x^2 dx=1, \langle v_3,v_3 \rangle = \frac{5}{8}\int _{-1}^1 (9x^4-6x^2+1)dx=1\end{aligned}$$

and

$$\begin{aligned}\langle v_1,v_2 \rangle = \frac{\sqrt{3}}{2}\int _{-1}^1 xdx=0, \langle v_1,v_3 \rangle = \frac{\sqrt{5}}{4}\int _{-1}^1 (3x^2-1)dx=0,\end{aligned}$$
$$\begin{aligned}\langle v_2,v_3 \rangle = \frac{\sqrt{15}}{4}\int _{-1}^1 (3x^3-x)dx=0\end{aligned}$$

Thus S is an orthonormal set in \(\mathcal {C}[-1,1]\). As \(\mathbb {P}_2[-1,1]\) is a subspace of \(\mathcal {C}[-1,1]\) with dimension 3, S can be considered as an orthonormal basis for \(\mathbb {P}_2[-1,1]\).

Example 5.23

Consider the standard ordered basis \(S= \lbrace e_1,e_2, \ldots ,e_n \rbrace \) in \(\mathbb {R}^n\) with standard inner product. Clearly \(\langle e_i,e_j \rangle = 0\) for \(i \ne j\) and \(\left\| e_i \right\| =\sqrt{\langle e_i,e_i \rangle }=1\) for all \(i=1,2, \ldots ,n\). Therefore the standard ordered basis of \(\mathbb {R}^n\) is an orthonormal basis.

In the previous chapters, we have seen that bases are the building blocks of a vector space. Now, suppose that this basis is orthogonal. Do we have any advantage? Consider the following example.

Example 5.24

Consider the vectors \(v_1=(2,1,2),v_2=(-2,2,1)\) and \(v_3=(1,2,-2)\) in \(\mathbb {R}^3\). Clearly, we can see that \(\{ v_1,v_2,v_3\}\) is an orthogonal basis for \(\mathbb {R}^3\)(verify). Then we know that any non-zero vector in \(\mathbb {R}^3\) can be written as a linear combination of \(\{ v_1,v_2,v_3\}\) in a unique way. That is, any \(v \in \mathbb {R}^3\) can be expressed as \(v=\lambda _1 v_1+\lambda _2v_2+\lambda _3v_3\) for some scalars \(\lambda _1,\lambda _2,\lambda _3\). Because of the orthogonality of basis vectors, here we can observe that,

$$\begin{aligned}\langle v,v_1 \rangle = \langle \lambda _1 v_1+\lambda _2v_2+\lambda _3v_3,v_1 \rangle = \lambda _1 \langle v_1,v_1 \rangle = \lambda _1 \left\| v_1 \right\| ^2 \end{aligned}$$

Hence, \(\lambda _1 = \frac{\langle v,v_1 \rangle }{\left\| v_1 \right\| ^2}\). Similarly, we can compute \(\lambda _2\) and \(\lambda _3\) as \(\frac{\langle v,v_2 \rangle }{\left\| v_2 \right\| ^2}\) and \(\frac{\langle v,v_3 \rangle }{\left\| v_3 \right\| ^2}\), respectively. This is interesting, right? Let us consider a numerical example. Take \(v=(6,12,-3) \in \mathbb {R}^3\). We have

$$\begin{aligned}(6,12,-3)=2(2,1,2)+1(-2,2,1)+4(1,2,-2)\end{aligned}$$

Observe that \(\frac{\langle v,v_1 \rangle }{\left\| v_1 \right\| ^2}=2,\frac{\langle v,v_2 \rangle }{\left\| v_2 \right\| ^2}=1\) and \(\frac{\langle v,v_3 \rangle }{\left\| v_3 \right\| ^2}=4\). Is this possible in any arbitrary inner product space? Yes, it is possible!! That is, if we have an orthogonal basis for an inner product space V, it is easy to represent any vector \(v\in V\) as a linear combination of the basis vectors. For, if \(\{ v_1,v_2, \ldots ,v_n \}\) is an orthogonal basis for an inner product space V, then for any \(v\in V\), we have

$$\begin{aligned}v=\frac{\langle v,v_1 \rangle }{\left\| v_1 \right\| ^2}v_1+\frac{\langle v,v_2 \rangle }{\left\| v_2 \right\| ^2}v_2+ \cdots + \frac{\langle v,v_n \rangle }{\left\| v_n \right\| ^2}v_n\end{aligned}$$

and if \(\{ v_1,v_2, \ldots ,v_n \}\) is an orthonormal basis for V, we have

$$\begin{aligned}v=\langle v,v_1 \rangle v_1+ \langle v,v_2 \rangle v_2 + \cdots + \langle v,v_n \rangle v_n\end{aligned}$$
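This computation is straightforward to reproduce numerically. A minimal sketch for the basis of Example 5.24, computing the coefficients \(\langle v,v_i \rangle /\left\| v_i \right\| ^2\) directly:

```python
import numpy as np

# Orthogonal basis of R^3 from Example 5.24
basis = [np.array([2., 1., 2.]), np.array([-2., 2., 1.]), np.array([1., 2., -2.])]
v = np.array([6., 12., -3.])

# Fourier coefficients <v, v_i> / ||v_i||^2
coeffs = [np.dot(v, b) / np.dot(b, b) for b in basis]
print(coeffs)                                                        # [2.0, 1.0, 4.0]

# The expansion recovers v exactly
print(np.allclose(sum(c * b for c, b in zip(coeffs, basis)), v))     # True
```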

This fact is formulated as the following theorem.

Theorem 5.10

Let V be an inner product space and \(S=\lbrace v_1,v_2, \ldots ,v_n \rbrace \) be an orthogonal subset of V consisting of non-zero vectors. If \(w \in span (S)\), then

$$\begin{aligned}w=\sum _{i=1}^n \dfrac{\langle w,v_i \rangle }{\left\| v_i \right\| ^2}v_i\end{aligned}$$

Further if S is an orthonormal set,

$$\begin{aligned}w=\sum _{i=1}^n \langle w,v_i \rangle v_i\end{aligned}$$

Proof

Since \(w \in span (S)\), there exist scalars \(\lambda _1 , \lambda _2 , \ldots , \lambda _n \in \mathbb {K}\) such that \(w= \lambda _1v_1 + \lambda _2v_2 + \cdots + \lambda _n v_n\). Now for \(i=1,2, \ldots ,n\), we have

$$\begin{aligned} \langle w,v_i \rangle &= \langle \lambda _1v_1 + \lambda _2v_2 + \cdots + \lambda _n v_n ,v_i \rangle \\ &=\lambda _1 \langle v_1,v_i \rangle + \lambda _2 \langle v_2,v_i \rangle + \cdots + \lambda _n \langle v_n ,v_i \rangle \end{aligned}$$

Since \(S=\lbrace v_1,v_2, \ldots ,v_n \rbrace \) is an orthogonal set, \(\langle v_i,v_j \rangle =0\) for all \(i \ne j\) and \(\langle v_i,v_i \rangle =\left\| v_i \right\| ^2 \ne 0\). Therefore

$$\begin{aligned}\langle w,v_i \rangle = \lambda _i \left\| v_i \right\| ^2 \end{aligned}$$

and hence \( \lambda _i = \dfrac{\langle w,v_i \rangle }{\left\| v_i \right\| ^2}\) for \(i=1,2, \ldots ,n\). This implies that \(w=\sum \limits _{i=1}^n \dfrac{\langle w,v_i \rangle }{\left\| v_i \right\| ^2}v_i\). If S is orthonormal, \(v_1,v_2, \ldots ,v_n\) are unit vectors and hence \(\left\| v_i \right\| =1\) for \(i=1,2, \ldots ,n\). Therefore \(w=\sum _{i=1}^n \langle w,v_i \rangle v_i\).

Remark 5.4

The coefficients \(\dfrac{\langle w,v_i \rangle }{\left\| v_i \right\| ^2}\) are called the Fourier coefficients of w with respect to the basis \(\lbrace v_1,v_2, \ldots ,v_n \rbrace \), named after the French mathematician Jean-Baptiste Joseph Fourier (1768–1830).

The following corollary shows that the matrix representation of a linear operator defined on a finite-dimensional vector space with orthonormal basis can be easily computed using the idea of an inner product.

Corollary 5.1

Let V be an inner product space, and let \(B= \lbrace v_1,v_2, \ldots ,v_n \rbrace \) be an orthonormal basis of V. If T is a linear operator on V and \(A=\left[ T \right] _B\), then \(A_{ij}= \langle T(v_j),v_i \rangle \), where \(1 \le i,j \le n\).

Proof

Since B is an orthonormal basis of V and \(T(v_j) \in V\), the above theorem gives

$$\begin{aligned}T(v_j)=\sum _{i=1}^n \langle T(v_j),v_i \rangle v_i\end{aligned}$$

which clearly implies that \(A_{ij}= \langle T(v_j),v_i \rangle \), where \(1 \le i,j \le n\).

Example 5.25

Consider \(\mathbb {P}_2[-1,1]\) with the basis defined in Example 5.22. Take an arbitrary element, say \(w=x^2+2x+3 \in \mathbb {P}_2[-1,1]\). Then we have,

$$\begin{aligned}\langle w,v_1 \rangle = \frac{1}{\sqrt{2}}\int _{-1}^1 (x^2+2x+3) dx = \frac{10\sqrt{2}}{3}\end{aligned}$$
$$\begin{aligned}\langle w,v_2 \rangle = \sqrt{\frac{3}{2}}\int _{-1}^1 (x^3+2x^2+3x) dx = \frac{2\sqrt{6}}{3}\end{aligned}$$

and

$$\begin{aligned}\langle w,v_3 \rangle = \frac{\sqrt{5}}{2\sqrt{2}}\int _{-1}^1 (3x^2-1)(x^2+2x+3) dx = \frac{\sqrt{40}}{15}\end{aligned}$$

Observe that \(w= \frac{10\sqrt{2}}{3}v_1+\frac{2\sqrt{6}}{3}v_2+\frac{\sqrt{40}}{15}v_3\).

Define \(T:V\rightarrow V\) by

$$\begin{aligned}\left( Tp\right) (x) =p'(x)\end{aligned}$$

Then

$$\begin{aligned}T(v_1)=0,T(v_2)=\sqrt{\frac{3}{2}}\ \text {and}\ T(v_3)=\frac{3\sqrt{10}}{2}x\end{aligned}$$

Clearly \(\langle T(v_1),v_i \rangle =0\) where \(i=1,2,3\). Also

$$\begin{aligned}\langle T(v_2),v_1 \rangle = \frac{\sqrt{3}}{2}\int _{-1}^1dx=\sqrt{3},\langle T(v_2),v_2 \rangle = \frac{3}{2}\int _{-1}^1xdx=0,\end{aligned}$$
$$\begin{aligned}\langle T(v_2),v_3 \rangle = \frac{\sqrt{15}}{4}\int _{-1}^1(3x^2-1)dx=0\end{aligned}$$

And

$$\begin{aligned}\langle T(v_3),v_1 \rangle = \frac{3\sqrt{5}}{2}\int _{-1}^1xdx=0,\langle T(v_3),v_2 \rangle = \frac{3\sqrt{15}}{2}\int _{-1}^1x^2dx=\sqrt{15},\end{aligned}$$
$$\begin{aligned}\langle T(v_3),v_3 \rangle = \frac{15}{4}\int _{-1}^1(3x^3-x)dx=0\end{aligned}$$

Therefore

$$\begin{aligned} \left[ T \right] _B= \begin{bmatrix} 0 & \sqrt{3} & 0 \\ 0 & 0 & \sqrt{15} \\ 0 & 0 & 0 \end{bmatrix}\end{aligned}$$
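The entries of this matrix can be reproduced numerically by approximating each integral \(\langle T(v_j),v_i \rangle \) with a Riemann sum; a sketch (the grid resolution is an arbitrary choice):

```python
import numpy as np

# Orthonormal basis of P2[-1,1] from Example 5.22 and its derivatives
x = np.linspace(-1.0, 1.0, 400_001)
dx = x[1] - x[0]
v  = [np.full_like(x, 1/np.sqrt(2)), np.sqrt(3/2)*x, np.sqrt(5/8)*(3*x**2 - 1)]
Tv = [np.zeros_like(x), np.full_like(x, np.sqrt(3/2)), np.sqrt(5/8)*6*x]

# A_ij = <T(v_j), v_i>, each inner product approximated by sum(...)*dx
A = np.array([[np.sum(Tv[j] * v[i]) * dx for j in range(3)] for i in range(3)])
print(np.round(A, 3))   # [[0. 1.732 0.], [0. 0. 3.873], [0. 0. 0.]], i.e. [T]_B
```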

Corollary 5.2

Let V be an inner product space, and \(S= \lbrace v_1,v_2, \ldots ,v_k \rbrace \) be an orthogonal subset of V consisting of non-zero vectors. Then S is linearly independent.

Proof

Let \( \lambda _1,\lambda _2, \ldots ,\lambda _k \in \mathbb {K} \) be such that \(\sum _{i=1}^k \lambda _iv_i=0\). Then for \(v_j \in S\),

$$\begin{aligned}0=\left\langle \sum _{i=1}^k \lambda _iv_i ,v_j \right\rangle = \lambda _j \left\| v_j \right\| ^2 \end{aligned}$$

Since S is a collection of non-zero vectors, this implies that \(\lambda _j =0\) for all \(j=1,2, \ldots ,k\). Therefore S is linearly independent.

Gram–Schmidt Orthonormalization

Corollary 5.2 shows that any orthogonal set of non-zero vectors is linearly independent. In this section, we will show that from a linearly independent set we can construct an orthogonal set, and in fact an orthonormal set with the same span, using the Gram–Schmidt orthonormalization process. The process is named after the Danish mathematician Jørgen Pedersen Gram (1850–1916) and the Baltic-German mathematician Erhard Schmidt (1876–1959).

Theorem 5.11

(Gram–Schmidt Orthonormalization) Let \(\lbrace v_1,v_2, \ldots v_n \rbrace \) be a linearly independent subset of an inner product space V. Define

$$\begin{aligned}w_1=v_1,\ u_1=\dfrac{w_1}{\left\| w_1 \right\| }\end{aligned}$$
$$\begin{aligned}w_2=v_2-\langle v_2,u_1 \rangle u_1,\ u_2=\dfrac{w_2}{\left\| w_2 \right\| }\end{aligned}$$
$$\begin{aligned}w_3=v_3-\langle v_3,u_1 \rangle u_1- \langle v_3,u_2 \rangle u_2, u_3=\dfrac{w_3}{\left\| w_3 \right\| }\end{aligned}$$
$$\begin{aligned}\vdots \end{aligned}$$
$$\begin{aligned}w_n=v_n-\langle v_n,u_1 \rangle u_1- \cdots -\langle v_n,u_{n-1} \rangle u_{n-1},\ \ u_n=\dfrac{w_n}{\left\| w_n \right\| } \end{aligned}$$

Then \(\lbrace u_1,u_2, \ldots u_n \rbrace \) is an orthonormal set in V and

$$\begin{aligned}span\lbrace u_1,u_2, \ldots ,u_n \rbrace = span \lbrace v_1,v_2, \ldots ,v_n \rbrace \end{aligned}$$

Proof

Since \(\lbrace v_1,v_2, \ldots v_n \rbrace \) is linearly independent, \(v_i \ne 0\) for all \(i=1,2, \ldots ,n\). We prove by induction on i. Consider \(\lbrace v_1 \rbrace \). Clearly \(\lbrace v_1 \rbrace \) is linearly independent.  Take \(w_1=v_1\) and \(u_1=\dfrac{w_1}{\left\| w_1 \right\| }\). Then \(\left\| u_1 \right\| =\dfrac{\left\| w_1 \right\| }{\left\| w_1 \right\| }=1\) and \(span \lbrace u_1 \rbrace =span \lbrace v_1 \rbrace \) (Fig. 5.14).

Fig. 5.14 Geometrical representation of the first two steps of the Gram–Schmidt process

For \(2 \le i \le n-1 \), define

$$\begin{aligned}w_i=v_i-\langle v_i,u_1 \rangle u_1- \cdots -\langle v_i,u_{i-1} \rangle u_{i-1},\ \ u_i=\dfrac{w_i}{\left\| w_i \right\| } \end{aligned}$$

and suppose that \(\lbrace u_1,u_2, \ldots u_{n-1} \rbrace \) is an orthonormal set with

$$\begin{aligned}span\lbrace u_1,u_2, \ldots ,u_{n-1} \rbrace = span \lbrace v_1,v_2, \ldots ,v_{n-1} \rbrace \end{aligned}$$

Now define,

$$\begin{aligned}w_n=v_n-\langle v_n,u_1 \rangle u_1- \cdots -\langle v_n,u_{n-1} \rangle u_{n-1}\end{aligned}$$

Since \(\lbrace v_1,v_2, \ldots ,v_n \rbrace \) is a linearly independent set, \(v_n \notin span \lbrace v_1,v_2, \ldots ,v_{n-1} \rbrace = span\lbrace u_1,u_2, \ldots ,u_{n-1} \rbrace \), and hence \(w_n \ne 0\). Take \(u_n=\dfrac{w_n}{\left\| w_n \right\| } \). Then clearly \(\left\| u_n \right\| =1\). Now for \(i \le n-1\), we have

$$\begin{aligned} \langle w_n,u_i \rangle &= \langle v_n-\langle v_n,u_1 \rangle u_1- \cdots -\langle v_n,u_{n-1} \rangle u_{n-1} ,u_i \rangle \\ &= \langle v_n ,u_i \rangle -\langle v_n,u_1 \rangle \langle u_1 , u_i \rangle - \cdots - \langle v_n,u_{n-1} \rangle \langle u_{n-1} , u_i \rangle \\ &= \langle v_n ,u_i \rangle - \langle v_n ,u_i \rangle \\ &=0 \end{aligned}$$

as \(\lbrace u_1,u_2, \ldots ,u_{n-1} \rbrace \) is an orthonormal set. Therefore \(\langle u_n,u_i \rangle =0 \) for \(1 \le i \le n-1\) and hence \(\lbrace u_1,u_2, \ldots ,u_n \rbrace \) is an orthonormal set. Also

$$\begin{aligned} span \lbrace u_1,u_2, \ldots u_n \rbrace &= span \lbrace v_1,v_2, \ldots ,v_{n-1} ,u_n \rbrace \\ &= span \left\{ v_1,v_2, \ldots ,v_{n-1} ,\dfrac{w_n}{\left\| w_n \right\| }\right\} \\ &= span \lbrace v_1,v_2, \ldots ,v_n \rbrace \end{aligned}$$

Hence the proof.
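The construction translates directly into code. Below is a minimal Python sketch for \(\mathbb {R}^n\) with the standard inner product, assuming the input vectors are linearly independent; it is applied to the data of Example 5.26, which follows.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list (Theorem 5.11)."""
    ortho = []
    for v in vectors:
        # w_i = v_i - sum_j <v_i, u_j> u_j
        w = v - sum(np.dot(v, u) * u for u in ortho)
        ortho.append(w / np.linalg.norm(w))   # u_i = w_i / ||w_i||
    return ortho

u1, u2, u3 = gram_schmidt([np.array([0., 1., 1., 0.]),
                           np.array([1., 2., 1., 0.]),
                           np.array([1., 0., 0., 1.])])
print(np.round(u3 * 2 * np.sqrt(3), 6))   # recovers (1, -1, 1, 3)
```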

Example 5.26

Let \(V=\mathbb {R}^4\) and

$$\begin{aligned}S= \lbrace v_1=(0,1,1,0) , v_2=(1,2,1,0),v_3=(1,0,0,1) \rbrace \end{aligned}$$

Since \(\begin{bmatrix} 0 & 1 & 1 & 0\\ 1 & 2 & 1 & 0\\ 1 & 0 & 0 & 1 \end{bmatrix}\) is of rank 3, S is linearly independent. Also, as \(\langle v_1,v_2 \rangle =2+1=3\), S is not orthogonal. Now we may apply the Gram–Schmidt process to obtain an orthonormal set. Take \(w_1=v_1=(0,1,1,0)\). Then \(u_1=\dfrac{w_1}{\left\| w_1 \right\| }=\frac{1}{\sqrt{2}}(0,1,1,0)\).

Now

$$\begin{aligned} w_2&=v_2-\langle v_2,u_1 \rangle u_1 \\ &=(1,2,1,0)-\langle (1,2,1,0), \frac{1}{\sqrt{2}}(0,1,1,0) \rangle \frac{1}{\sqrt{2}}(0,1,1,0) \\ &= \frac{1}{2} \left( 2,1,-1,0 \right) \end{aligned}$$

and hence \(u_2=\dfrac{w_2}{\left\| w_2 \right\| }=\dfrac{1}{\sqrt{6}}(2,1,-1,0) \). Finally,

$$\begin{aligned} w_3&=v_3-\langle v_3,u_1 \rangle u_1-\langle v_3,u_2 \rangle u_2 \\ &=(1,0,0,1)-\left\langle (1,0,0,1), \frac{1}{\sqrt{2}}(0,1,1,0) \right\rangle \frac{1}{\sqrt{2}}(0,1,1,0)\\ &\qquad -\left\langle (1,0,0,1), \dfrac{1}{\sqrt{6}}(2,1,-1,0) \right\rangle \dfrac{1}{\sqrt{6}}(2,1,-1,0) \\ &= \dfrac{1}{3}(1,-1,1,3) \end{aligned}$$

and hence \(u_3=\dfrac{w_3}{\left\| w_3 \right\| }=\frac{1}{2\sqrt{3}}(1,-1,1,3) \). The set \(\lbrace u_1,u_2,u_3 \rbrace \) is an orthonormal set and \(span\lbrace u_1,u_2,u_3 \rbrace = span \lbrace v_1,v_2,v_3 \rbrace \).

Remark 5.5

Consider a matrix A with columns \(v_1,v_2,v_3\) from the above example. That is, \(A=\begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\). Then

$$\begin{aligned}A=\begin{bmatrix} 0 & \sqrt{\frac{2}{3}} & \frac{\sqrt{3}}{6} \\ \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{6}} & -\frac{\sqrt{3}}{6} \\ \frac{\sqrt{2}}{2} & -\frac{1}{\sqrt{6}} & \frac{\sqrt{3}}{6} \\ 0 & 0 & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} \sqrt{2} & \frac{3\sqrt{2}}{2} & 0 \\ 0 & \sqrt{\frac{3}{2}} & \sqrt{\frac{2}{3}} \\ 0 & 0 & \frac{2\sqrt{3}}{3} \end{bmatrix}=QR\end{aligned}$$

Clearly, the columns of the matrix Q form an orthonormal set, and R is an upper triangular matrix with entries \(R_{ii}=\left\| w_i \right\| \ \forall \ i=1,2,3\) and \(R_{ij}=\langle v_j,u_i \rangle \ \forall \ j>i\ (i,j=1,2,3) \). This decomposition of a matrix with linearly independent columns into the product of a matrix whose columns form an orthonormal set and an upper triangular matrix is called the QR-decomposition.
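NumPy computes the same factorization; a quick sketch (np.linalg.qr may flip the signs of some columns of Q and rows of R, which is the only ambiguity in the decomposition):

```python
import numpy as np

# Columns are v1, v2, v3 from Example 5.26
A = np.array([[0., 1., 1.],
              [1., 2., 0.],
              [1., 1., 0.],
              [0., 0., 1.]])

Q, R = np.linalg.qr(A)          # Q: orthonormal columns, R: upper triangular
print(np.allclose(Q @ R, A))    # True
print(np.round(R, 4))           # matches the R above, up to row signs
```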

Example 5.27

Consider \(V=\mathbb {P}_2[-1,1]\) and \(S=\left\{ 1,x,x^2 \right\} \). We have already seen that S is a basis of V and hence is linearly independent. Also, as \(\int _{-1}^1 1\cdot x^2\, dx = \frac{2}{3} \ne 0\), S is not orthogonal. Therefore take \(w_1=1\). As \(\left\| w_1 \right\| ^2 = \int _{-1}^1 1\, dx=2\), we get \(u_1=\frac{1}{\sqrt{2}}\).

Now

$$\begin{aligned} w_2=v_2-\langle v_2,u_1 \rangle u_1 =x-\frac{1}{2} \int _{-1}^1 xdx = x ,\ u_2=\sqrt{\frac{3}{2}}x\end{aligned}$$

and

$$\begin{aligned}w_3=v_3-\langle v_3,u_1 \rangle u_1-\langle v_3,u_2 \rangle u_2=x^2-\frac{1}{2}\int _{-1}^1x^2dx=x^2-\frac{1}{3},\ u_3=\sqrt{\frac{5}{8}}(3x^2-1)\end{aligned}$$

Thus \(\left\{ \frac{1}{\sqrt{2}},\sqrt{\frac{3}{2}}x, \sqrt{\frac{5}{8}}(3x^2-1) \right\} \) is an orthonormal basis for \(\mathbb {P}_2[-1,1]\).

The above example makes it clear that, given a basis, one can construct an orthonormal basis from it. Hence, we can assert that “Every finite-dimensional inner product space has an orthonormal basis”.

4 Orthogonal Complement and Projection

In Sect. 5.2, we discussed orthogonal projection in \(\mathbb {R}^2\). Here we will extend this idea to general inner product spaces. Representing an inner product space as the direct sum of a closed subspace and its orthogonal complement has many useful applications in mathematics.

Definition 5.16

Let S be a non-empty subset of an inner product space V. Then the set \(\lbrace v \in V \mid \langle v,s \rangle =0,\ \forall s \in S \rbrace \), i.e., the set of all vectors of V that are orthogonal to every vector in S, is called the orthogonal complement of S and is denoted by \(S^{\perp }\). Clearly \(\lbrace 0 \rbrace ^{\perp }=V\) and \(V^{\perp }=\lbrace 0 \rbrace \). Also \(S \cap S^{\perp } \subseteq \lbrace 0 \rbrace \).

Remark 5.6

\(S^{\perp }\) is a subspace of V for any subset S of V. For

$$\begin{aligned}\langle \lambda s_1+s_2 ,s \rangle =\lambda \langle s_1 ,s \rangle + \langle s_2 ,s \rangle =0\end{aligned}$$

for every \(s \in S\), \(s_1,s_2 \in S^{\perp }\), and \(\lambda \in \mathbb {K}\) (Fig. 5.15).

Fig. 5.15 Suppose v is a non-zero vector in \(\mathbb {R}^3\). Then \(v^{\perp }\) is the plane passing through the origin \(\mathcal {O}\) and perpendicular to the vector v

Example 5.28

Consider \(V=\mathbb {R}^3\) and let \(S_1=\left\{ (1,2,3) \right\} \). Then

$$\begin{aligned} S_1^{\perp } &= \lbrace (v_1,v_2,v_3) \in \mathbb {R}^3 \mid \langle (v_1,v_2,v_3),(1,2,3) \rangle =0 \rbrace \\ &= \lbrace (v_1,v_2,v_3) \in \mathbb {R}^3 \mid v_1+2v_2+3v_3 =0 \rbrace \\ &= \text {the\ plane\ passing\ through\ the\ origin\ and\ perpendicular\ to\ the\ vector}\ (1,2,3) \end{aligned}$$

Take \(S_2= \left\{ (1,0,1),(1,2,3) \right\} \). Then

$$\begin{aligned} S_2^{\perp } &= \lbrace (v_1,v_2,v_3) \in \mathbb {R}^3 \mid \langle (v_1,v_2,v_3),(1,0,1) \rangle =0 , \langle (v_1,v_2,v_3),(1,2,3) \rangle =0 \rbrace \\ &= \lbrace (v_1,v_2,v_3) \in \mathbb {R}^3 \mid v_1+v_3=0,v_1+2v_2+3v_3 =0 \rbrace \\ &= \lbrace (v_1,v_2,v_3) \in \mathbb {R}^3 \mid v_1=v_2=-v_3 \rbrace \\ &= \text {the\ line\ through\ the\ origin\ and\ the\ point}\ (1,1,-1) \end{aligned}$$

Observe that if S is a singleton set (with a non-zero element), \(S^{\perp }\) will be a plane passing through the origin, as we have to solve one homogeneous equation in three variables to find \(S^{\perp }\). Similarly, if S is a set with two linearly independent elements, \(S^{\perp }\) will be a line passing through the origin.
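Numerically, \(S^{\perp }\) in \(\mathbb {R}^n\) is the null space of the matrix whose rows are the vectors of S. A sketch using the SVD (the tolerance 1e-12 is an arbitrary choice, and the basis vectors are returned up to sign):

```python
import numpy as np

def orthogonal_complement(S):
    """Rows of the result form an orthonormal basis of S-perp in R^n."""
    A = np.atleast_2d(np.array(S, dtype=float))
    _, sing, Vt = np.linalg.svd(A)
    rank = int(np.sum(sing > 1e-12))
    return Vt[rank:]          # right singular vectors with zero singular value

# S2 from Example 5.28: the complement is the line through (1, 1, -1)
print(orthogonal_complement([[1., 0., 1.], [1., 2., 3.]]))
```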

Example 5.29

Consider \(V=\mathbb {P}_2[0,1]\) and let \(S=\left\{ x \right\} \). Then

$$\begin{aligned} S^{\perp } &= \lbrace ax^2+bx+c \in \mathbb {P}_2[0,1] \mid \langle x,ax^2+bx+c \rangle =0 \rbrace \\ &= \lbrace ax^2+bx+c \in \mathbb {P}_2[0,1] \mid \int _0^{1}(ax^3+bx^2+cx)dx=0 \rbrace \\ &= \lbrace ax^2+bx+c \in \mathbb {P}_2[0,1] \mid 3a+4b+6c=0 \rbrace \end{aligned}$$

Given a subspace of an inner product space V, it is not always easy to find the orthogonal complement. The following theorem simplifies our effort in finding the orthogonal complement of a subspace.

Theorem 5.12

Let V be an inner product space and W be a finite-dimensional subspace of V. Then for any \(v \in V\), \(v \in W^{\perp }\) if and only if \(\langle v,w_i \rangle =0\) for all \(w_i \in B\), where B is a basis for W.

Proof

Let \(B=\lbrace w_1,w_2, \ldots , w_k \rbrace \) be a basis for W. Then for \(w \in W\), there exist scalars \(\lambda _1,\lambda _2, \ldots ,\lambda _k\) such that \(w=\lambda _1w_1+\lambda _2w_2+ \cdots +\lambda _kw_k\). Then for any \(v \in V\),

$$\begin{aligned} \langle v,w \rangle &=\langle v, \lambda _1w_1+\lambda _2w_2+ \cdots +\lambda _kw_k \rangle \\ &= \sum _{i=1}^k \lambda _i \langle v ,w_i \rangle \end{aligned}$$

Therefore \(\langle v,w_i \rangle =0\) for all \(w_i \in B\) implies that \(\langle v,w \rangle =0\). Hence, \(v \in W^{\perp }\). Conversely, suppose that \(v \in W^{\perp }\). Then by the definition of orthogonal complement \(\langle v,w_i \rangle =0\) for all \(w_i \in B\).

In Sect. 5.2, we have introduced the concept of projection of a vector to a one-dimensional subspace of \(\mathbb {R}^2\). We have seen that a vector \(v \in \mathbb {R}^2\) can be written as a sum of vectors, \((u.v)u \in span \lbrace u \rbrace \) where u is a unit vector and \(v-(u.v)u \) which is orthogonal to (u.v)u. That is, \(v-(u.v)u\) is an element of \(span \lbrace u \rbrace ^{\perp }\). The vector (u.v)u is called the projection of v on \(span \lbrace u \rbrace \). We will extend this result to any finite-dimensional subspace W of an inner product space V. We will proceed by considering an orthonormal basis \(\lbrace w_1,w_2, \ldots , w_k \rbrace \) for W, projecting \(v \in V\) on each one-dimensional subspace \(span\lbrace w_i \rbrace \) of W and taking the sum. That is, the projection of \(v\in V\) on W will be \(w= \sum \limits _{i=1}^k \langle v,w_i \rangle w_i\).

Theorem 5.13

Let V be an inner product space and W be a finite-dimensional subspace of V. Then for any \(v \in V\), there exist unique vectors \(w \in W\) and \(\tilde{w} \in W^{\perp }\) such that \(v= w+ \tilde{w}\). Furthermore, \(w \in W\) is the unique vector that has the shortest distance from v.

Proof

Let \(B= \lbrace w_1,w_2, \ldots , w_k \rbrace \) be an orthonormal basis for W and consider \(w= \sum _{i=1}^k \langle v,w_i \rangle w_i \in W\). Take \(\tilde{w}=v-w\). Then for any \(w_j \in B\),

$$\begin{aligned} \langle \tilde{w},w_j \rangle &= \left\langle v-\sum _{i=1}^k \langle v,w_i \rangle w_i ,w_j \right\rangle \\ &= \langle v,w_j \rangle - \sum _{i=1}^k \langle v,w_i \rangle \langle w_i , w_j \rangle \\ &= \langle v,w_j \rangle - \langle v,w_j \rangle =0 \end{aligned}$$

That is, \( \langle \tilde{w},w_j \rangle =0 \) for all \(w_j \in B\). Then by Theorem 5.12, \( \tilde{w} \in W^{\perp }\). Also, \(v = w + \tilde{w}\). To prove the uniqueness of w and \(\tilde{w}\), suppose that \(v=w+ \tilde{w} = u + \tilde{u}\), where \(u \in W\) and \(\tilde{u} \in W^{\perp }\). This implies that \(w-u= \tilde{u}-\tilde{w}\). Then as \(w-u \in W\) and \(\tilde{u}-\tilde{w} \in W^{\perp }\), we get \(w-u \in W \cap W^{\perp }=\lbrace 0 \rbrace \). Hence, \(w=u\) and \(\tilde{w}=\tilde{u}\).

Now we have to prove that \(w= \sum _{i=1}^k \langle v,w_i \rangle w_i\) in W is the unique vector that has the shortest distance from v. Now for any \(w' \in W\),

$$\begin{aligned} \left\| v-w' \right\| ^2 = \left\| w+\tilde{w}-w' \right\| ^2= \left\| (w-w')+\tilde{w} \right\| ^2 \end{aligned}$$

As \(w-w' \in W\) and \(\tilde{w} \in W^{\perp }\), by Pythagoras theorem,

$$\begin{aligned} \left\| v-w' \right\| ^2 =\left\| (w-w') \right\| ^2 + \left\| \tilde{w} \right\| ^2 \ge \left\| \tilde{w} \right\| ^2 =\left\| v-w \right\| ^2 \end{aligned}$$

Thus for any \(w' \in W\), we get \( \left\| v-w' \right\| \ge \left\| \tilde{w} \right\| =\left\| v-w \right\| \), with equality if and only if \(\left\| w-w' \right\| =0\), that is, \(w'=w\).

Corollary 5.3

Let V be an inner product space and W be a finite-dimensional subspace of V. Then \(V= W \oplus W^{\perp }\).

Proof

From the above theorem, clearly \(V= W + W^{\perp }\). Also, \(W \cap W^{\perp }=\lbrace 0 \rbrace \). Then by Theorem 2.20, \(V= W \oplus W^{\perp }\).

The above decomposition is called the orthogonal decomposition of V with respect to the subspace W. In general, when V is a Hilbert space, W can be any closed subspace of V.

Definition 5.17

(Orthogonal Projection) Let V be an inner product space and W be a finite-dimensional subspace of V. Then the orthogonal projection \(\pi _W\) of V onto W is the function \(\pi _W(v)=w\), where \(v=w+\tilde{w}\) is the orthogonal decomposition of v with respect to W.

Example 5.30

Consider \(\mathbb {R}^3\) over \(\mathbb {R}\) with standard inner product. Let

$$\begin{aligned}W= \lbrace \left( v_1,v_2,v_3 \right) \in \mathbb {R}^3 \mid v_1=0 \rbrace \end{aligned}$$

That is, the yz-plane. Consider the vector \(u=(2,4,5) \in \mathbb {R}^3\). Now we will find the projection of u on W. Clearly \(\lbrace \left( 0,1,0 \right) , \left( 0,0,1 \right) \rbrace \) is an orthonormal basis for W. Then the projection of u on W is given by

$$\begin{aligned}\pi _W(u)= \langle (2,4,5), (0,1,0) \rangle (0,1,0) +\langle (2,4,5), (0,0,1) \rangle (0,0,1)=(0,4,5) \end{aligned}$$

For an arbitrary vector \(v=(a,b,c) \in \mathbb {R}^3 \)

$$\begin{aligned}\pi _W(v)= \langle (a,b,c), (0,1,0) \rangle (0,1,0) +\langle (a,b,c), (0,0,1) \rangle (0,0,1)=(0,b,c) \end{aligned}$$

Also observe that \(W^{\perp } = \lbrace \left( v_1,v_2,v_3 \right) \in \mathbb {R}^3 \mid v_2=v_3=0 \rbrace \), i.e., the x-axis, and hence \((a,b,c)=(0,b,c)+(a,0,0)\) is the orthogonal decomposition of v with respect to W.
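The projection formula \(\pi _W(v)= \sum _i \langle v,w_i \rangle w_i\) is a one-liner once an orthonormal basis is available; a sketch for this example:

```python
import numpy as np

def project(v, B):
    """Orthogonal projection of v onto the span of the rows of B,
    assuming those rows are an orthonormal basis of the subspace."""
    return sum(np.dot(v, b) * b for b in B)

B = np.array([[0., 1., 0.],     # orthonormal basis of the yz-plane W
              [0., 0., 1.]])
print(project(np.array([2., 4., 5.]), B))   # [0. 4. 5.]
```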

Example 5.31

Consider \(\mathbb {P}_2[-1,1]\). Let \(W= \lbrace a+bx \mid a,b \in \mathbb {R} \rbrace \). Clearly W is a subspace of \(\mathbb {P}_2[-1,1]\) and we have already seen that \(\left\{ \frac{1}{\sqrt{2}},\sqrt{\frac{3}{2}}x \right\} \) is an orthonormal basis for W. Consider the element \(v=x^2+2x+3 \in \mathbb {P}_2[-1,1]\). Then from Example 5.25,

$$\begin{aligned}\left\langle \frac{1}{\sqrt{2}} , x^2+2x+3 \right\rangle = \frac{10\sqrt{2}}{3}\ \text {and}\ \left\langle \sqrt{\frac{3}{2}}x , x^2+2x+3 \right\rangle = \frac{2\sqrt{6}}{3}\end{aligned}$$

Therefore the projection of v on W is \(\pi _W(v)=\dfrac{10}{3}+2x\).

Now we will discuss some of the important properties of projection map in the following theorem.

Theorem 5.14

Let W be a finite-dimensional subspace of an inner product space V and let \(\pi _W\) be the orthogonal projection of V onto W. Then

  1. (a)

    \(\pi _W\) is linear.

  2. (b)

    \(\mathcal {R}\left( \pi _W \right) =W\) and \(\mathcal {N}\left( \pi _W \right) =W^{\perp }\)

  3. (c)

    \(\pi _W^2=\pi _W\)

Proof

  1. (a)

Let \(v_1,v_2 \in V\). Then by Theorem 5.13, there exist unique vectors \(w_1,w_2 \in W\) and \(\tilde{w_1},\tilde{w_2} \in W^{\perp } \) such that \(v_1=w_1+\tilde{w_1}\) and \(v_2=w_2+\tilde{w_2}\). Then \(\pi _W(v_1)=w_1\) and \(\pi _W(v_2)=w_2\). Now for \(\lambda \in \mathbb {K}\),

    $$\begin{aligned}\lambda v_1 +v_2= \lambda \left( w_1+\tilde{w_1} \right) +\left( w_2+\tilde{w_2} \right) =\left( \lambda w_1 +w_2 \right) + \left( \lambda \tilde{w_1} +\tilde{w_2} \right) \end{aligned}$$

    where \(\lambda w_1 +w_2 \in W\) and \( \lambda \tilde{w_1} +\tilde{w_2} \in W^{\perp }\) as W and \( W^{\perp }\) are subspaces of V. Therefore

    $$\begin{aligned}\pi _W \left( \lambda v_1 +v_2 \right) = \lambda w_1 +w_2 = \lambda \pi _W(v_1) + \pi _W(v_2) \end{aligned}$$

Hence \(\pi _W\) is linear.

  2. (b)

From Theorem 5.13, we have \(V=W \oplus W^{\perp }\) and any vector \(v \in V\) can be written as \(v=\pi _W(v) + \left( v-\pi _W(v) \right) \). Clearly \(\mathcal {R}\left( \pi _W \right) \subseteq W\). Now we prove the reverse inclusion. Let \(w \in W\); then \(\pi _W(w)=w\) as \(w=w+0 \in W + W^{\perp }\). Therefore \(\mathcal {R}\left( \pi _W \right) =W\). Similarly, it is clear that \(\mathcal {N}\left( \pi _W \right) \subseteq W^{\perp }\). Now let \(\tilde{w} \in W^{\perp }\). As \(\tilde{w}=0+ \tilde{w}\), we have \(\pi _W\left( \tilde{w} \right) =0\) and hence \(\mathcal {N}\left( \pi _W \right) =W^{\perp }\).

  3. (c)

Take any \(v \in V\). By Theorem 5.13, there exist unique vectors \(w \in W\) and \(\tilde{w} \in W^{\perp } \) such that \(v=w+\tilde{w}\). Then

    $$\begin{aligned} \pi _W^2(v)=\pi _W\left( \pi _W(v)\right) = \pi _W(w)=w=\pi _W(v)\end{aligned}$$

    Therefore \(\pi _W^2=\pi _W\).
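For a subspace of \(\mathbb {R}^n\) with orthonormal basis vectors collected as the rows of a matrix B, the projection has matrix \(P=B^TB\), and properties (b) and (c) can be checked directly; a small sketch:

```python
import numpy as np

B = np.array([[0., 1., 0.],     # orthonormal basis of the yz-plane
              [0., 0., 1.]])
P = B.T @ B                     # matrix of pi_W, since P v = sum_i <v, b_i> b_i

print(np.allclose(P @ P, P))    # (c): pi_W^2 = pi_W
v = np.array([2., 4., 5.])
w, w_tilde = P @ v, v - P @ v   # v = w + w~ with w in W and w~ in W-perp
print(w, w_tilde, np.dot(w, w_tilde))   # [0. 4. 5.] [2. 0. 0.] 0.0
```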

In Theorem 5.13, we decomposed V as the direct sum of two subspaces where one is the orthogonal complement of the other. There may exist decompositions of V as the direct sum of two subspaces where one subspace is not the orthogonal complement of the other. For example, consider \(\mathbb {R}^3\). Let \(W_1 = span \lbrace (1,0,0),(0,1,0) \rbrace \) and \(W_2 = span \lbrace (1,1,1) \rbrace \). Observe that \(V=W_1 \oplus W_2\) and \(W_1 \not \perp W_2\). In such cases also we can define a linear map.

Theorem 5.15

Let V be an inner product space and \(W_1,W_2\) be subspaces of V with \(V=W_1 \oplus W_2\). Then the map P defined by \(P(v)=w_1\), where \(v=w_1+w_2\) with \(w_1 \in W_1\) and \(w_2 \in W_2\) is the unique representation of \(v\in V\), is linear.

Proof

Similar to the proof of Theorem 5.14(a).

The map P defined above is called a projection map. Observe that an orthogonal projection map is a projection map P with \(\left[ \mathcal {R}(P) \right] ^{\perp }= \mathcal {N}\left( P \right) \).

Example 5.32

Consider \(\mathbb {R}^2\) over \(\mathbb {R}\) with standard inner product. Let \(P_1: \mathbb {R}^2 \rightarrow \mathbb {R}^2\) be a linear map defined by

$$\begin{aligned}P_1(v_1,v_2)=(v_1,v_1)\end{aligned}$$

Observe that \(\mathcal {R}(P_1)\) is the straight line \(y=x\) and \(\mathcal {N}(P_1)\) is the y-axis. Clearly, \(\mathbb {R}^2= \mathcal {R}(P_1) \oplus \mathcal {N}(P_1)\). Thus \(P_1\) is a projection but not an orthogonal projection (Fig. 5.16).

Fig. 5.16 Observe that \(\mathcal {R}(P_1) \not \perp \mathcal {N}(P_1)\). Therefore \(P_1\) is not an orthogonal projection

Example 5.33

Consider \(\mathbb {P}_2[0,1]\) with the inner product \(\langle p,q \rangle = \int _0^1 p(x)q(x) dx\). Let \(P_2: \mathbb {P}_2[0,1] \rightarrow \mathbb {P}_2[0,1] \) be a linear map defined by

$$\begin{aligned}P_2(a_0 + a_1x +a_2 x^2 ) =a_1 x\end{aligned}$$

We have \(\mathcal {R}(P_2)=span \lbrace x \rbrace \) and \(\mathcal {N}(P_2)=span \lbrace 1,x^2 \rbrace \). Observe that \(\mathbb {P}_2[0,1]= \mathcal {R}(P_2) \oplus \mathcal {N}(P_2)\), but \(\mathcal {R}(P_2) \not \perp \mathcal {N}(P_2)\). Therefore \(P_2\) is a projection but not an orthogonal projection.

The following theorem gives an algebraic method to check whether a linear operator is a projection map or not.

Theorem 5.16

Let V be a finite-dimensional inner product space and T be a linear operator on V. Then T is a projection of V if and only if \(T^2=T\).

Proof

Suppose that T is a projection on V; then clearly \(T^2=T\) by definition. Conversely, suppose that T is a linear operator on V with \(T^2=T\). We will show that \(V = \mathcal {R}(T) \oplus \mathcal {N}(T)\). For any \(v \in V\), we can write \(v=T(v)+ \left( v-T(v) \right) \), where \(T(v) \in \mathcal {R}(T)\) and \(T\left( v-T(v) \right) =T(v)-T^2(v)=0\), so that \(v-T(v) \in \mathcal {N}(T)\). Thus \(V = \mathcal {R}(T) + \mathcal {N}(T)\). Now let \(v \in \mathcal {R}(T) \cap \mathcal {N}(T)\). Then there exists \(\tilde{v} \in V\) such that \(T(\tilde{v})=v\), and \(T(v)=0\). Since \(T^2=T\), we get \(v=T(\tilde{v})=T^2(\tilde{v})=T\left( T(\tilde{v}) \right) =T(v)=0\). Hence \(\mathcal {R}(T) \cap \mathcal {N}(T)=\lbrace 0 \rbrace \) and \(V = \mathcal {R}(T) \oplus \mathcal {N}(T)\). Thus T is a projection on V.

Example 5.34

Consider the linear operators \(P_1\) and \(P_2\) from Examples 5.32 and 5.33 respectively. Clearly, we can see that \(P_1^2=P_1\) and \(P_2^2=P_2\).
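The check of Theorem 5.16 for \(P_1\) is immediate in coordinates; a sketch using its matrix in the standard basis:

```python
import numpy as np

# Matrix of P1(v1, v2) = (v1, v1) in the standard basis of R^2
P1 = np.array([[1., 0.],
               [1., 0.]])

print(np.allclose(P1 @ P1, P1))   # True: P1^2 = P1, so P1 is a projection
# ...but not an orthogonal one: range and null space are not perpendicular
r, n = P1 @ np.array([1., 0.]), np.array([0., 1.])   # r spans R(P1), n spans N(P1)
print(np.dot(r, n))               # 1.0, not 0
```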

5 Exercises

  1. 1.

    Show that \((\mathbb {R},d)\) is a metric space, where \(d:\mathbb {R} \times \mathbb {R} \rightarrow \mathbb {R}\) is defined by

    1. (a)

      \(d(x,y)=|e^x-e^y |\) for \(x,y \in \mathbb {R}\).

    2. (b)

      \(d(x,y)=\frac{|x-y |}{1+|x-y |}\) for \(x,y \in \mathbb {R}\).

    Check whether d is induced by any norm on \(\mathbb {R}\)?

  2. 2.

    Let \(v=(v_1,v_2, \ldots v_n ) \in \mathbb {R}^n\). Show that

    1. (a)

      \(\left\| v \right\| _{\infty } = max \lbrace |v_1 |, \ldots , |v_n |\rbrace \) defines a norm on \(\mathbb {R}^n\) called infinity norm.

    2. (b)

      for \(p \ge 1\), \(\left\| v \right\| _p = \left( \sum _{i=1}^n |v_i |^p \right) ^{\frac{1}{p}}\) defines a norm on \(\mathbb {R}^n\) called p-norm.

  3. 3.

    Show that the following functions define a norm on \(\mathbb {M}_{m \times n} \left( \mathbb {R} \right) \). Let \(A=\left[ a_{ij} \right] \in \mathbb {M}_{m \times n} \left( \mathbb {R} \right) \).

    1. (a)

      \(\left\| A \right\| _1 = \max \limits _{1 \le j \le n} \sum _{i=1}^m |a_{ij} |\)

    2. (b)

      \(\left\| A \right\| _\infty = \max \limits _{1 \le i \le m} \sum _{j=1}^n |a_{ij} |\)

    3. (c)

\(\left\| A \right\| _2 = \sqrt{\lambda _{max}(A^TA)}\), where \(\lambda _{max}(A^TA) \) denotes the largest eigenvalue of \(A^TA\).

  4. 4.

Show that in a finite-dimensional space V, any two norms defined on it are equivalent.

  5. 5.

    Show that every finite-dimensional normed linear space is complete.

  6. 6.

    Show that

    1. (a)

      \(\left\| v \right\| _p =\left( \sum _{i=1}^{\infty } |v_i |^p \right) ^{\frac{1}{p}} \) defines a norm on \( l ^p\).

    2. (b)

      \(\left\| v \right\| _{\infty } = \sup \limits _{i \in \mathbb {N}} |v_i |\) defines a norm on \( l ^{\infty }\).

    3. (c)

      for \(1 \le p < r < \infty \), \( l ^p \subset l ^r\). Also \( l ^p \subset l ^{\infty }\).

  7. 7.

    Show that the following collections

    $$\begin{aligned} c&= \lbrace v=(v_1,v_2, \ldots ) \in l ^{\infty } \mid v_i \rightarrow \lambda \in \mathbb {K}\ \text {as}\ i \rightarrow \infty \rbrace \\ c_0&= \lbrace v=(v_1,v_2, \ldots ) \in l ^{\infty } \mid v_i \rightarrow 0 \ \text {as}\ i \rightarrow \infty \rbrace \\ c_{00}&= \lbrace v=(v_1,v_2, \ldots ) \in l ^{\infty } \mid \text {all\ but\ finitely\ many}\ v_i's\ \text {are\ equal\ to}\ 0 \rbrace \end{aligned}$$

    are subspaces of \( l ^{\infty }\).

  8. 8.

    Show that \(c,c_0\) are complete, whereas \(c_{00}\) is not complete with respect to the norm defined on \( l ^{\infty }\).

  9. 9.

    Let V be a vector space over a field \(\mathbb {K}\). A set \(B \subset V\) is a Hamel basis for V if \(span(B)=V\) and any finite subset of B is linearly independent. Show that if \(\left( V, \left\| . \right\| \right) \) is an infinite-dimensional Banach space with a Hamel basis B, then B is uncountable. (Hint: Use Baire’s Category theorem.)

  10. 10.

    Let \(u=(u_1,u_2),v=(v_1,v_2)\in \mathbb {R}^2\). Check whether the following defines an inner product on \(\mathbb {R}^2\) or not.

    1. (a)

\(\langle u,v \rangle =v_1(u_1+2u_2)+v_2(2u_1+5u_2)\)

    2. (b)

      \(\langle u,v \rangle =v_1(2u_1+u_2)+v_2(u_1+v_2)\)

  11. 11.

    Show that \(\langle z_1,z_2 \rangle =Re(z_1\overline{z_2})\) defines an inner product on \(\mathbb {C}\), where Re(z) denotes the real part of the complex number \(z=a+ib\).

  12. 12.

    Show that \(\langle A,B \rangle =Tr\left( B^*A \right) \) defines an inner product on \(\mathbb {M}_{m \times n} \left( \mathbb {K} \right) \).

  13. 13.

    Prove or disprove:

    1. (a)

      The sequence spaces \( l ^p\) with \(p \ne 2\) are not inner product spaces.

    2. (b)

      \( l ^2\) with \(\langle u,v \rangle = \sum _{i=1}^{\infty } u_i \overline{v_i}\), where \(u=(u_1,u_2, \ldots ),v=(v_1,v_2, \ldots ) \in l ^2\) is a Hilbert space.

  14. 14.

Let \(\left( V, \langle , \rangle \right) \) be an inner product space. Then show that for all \(u,v \in V\)

$$\begin{aligned}\langle u,v \rangle = \frac{1}{4}\left[ \langle u+v,u+v \rangle - \langle u-v,u-v \rangle \right] \end{aligned}$$

    if \(\mathbb {K}=\mathbb {R}\). Also show that if \(\mathbb {K}=\mathbb {C}\), we have

    $$\begin{aligned}\langle u,v \rangle = \frac{1}{4}\left[ \langle u+v,u+v \rangle - \langle u-v,u-v \rangle + i \langle u+iv,u+iv \rangle - i \langle u-iv,u-iv \rangle \right] \end{aligned}$$
  15. 15.

    Show that in an inner product space V, \(u_n \rightarrow u \) and \(v_n \rightarrow v \) implies that \(\langle u_n,v_n \rangle \rightarrow \langle u,v \rangle \).

  16. 16.

Show that real \( l ^2\) with \(\langle u,v \rangle = \sum _{n=1}^{\infty } u_nv_n\) is a Hilbert space.

  17. 17.

    Let V be an inner product space with an orthonormal basis \(\lbrace v_1,v_2, \ldots ,v_n \rbrace \). Then for any \(v\in V\), show that \(\left\| v \right\| ^2 = \sum _{i=1}^n |\langle v,v_i \rangle |^2 \).

  18. 18.

    (Bessel’s Inequality) Let S be a countable orthonormal set in an inner product space V. Then for every \(v\in V\), show that \(\sum _{u_i \in S} |\langle v,u_i \rangle |^2 \le \left\| v \right\| ^2 \)

  19. 19.

Let S be an orthonormal set in an inner product space V. Then for every \(v\in V\), show that the set \(S_v=\lbrace u \in S \mid \langle v,u \rangle \ne 0 \rbrace \) is a countable set. (Hint: Use Bessel’s Inequality)

  20. 20.

    Construct an orthonormal basis using Gram–Schmidt orthonormalization process

    1. (a)

      for \(\mathbb {R}^3\) with standard inner product, using the basis \(\left\{ \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix},\begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\} \)

    2. (b)

for \(\mathbb {P}_2[0,1]\) with \(\langle f,g \rangle = \int _0^1 f(x)g(x)dx\), using the basis \(\lbrace 1,x,x^2 \rbrace \)

  21. 21.

    Show that, for \(A \in \mathbb {M}_{n \times n}\left( \mathbb {R} \right) \), \(AA^T=I\) if and only if the rows of A form an orthonormal basis for \(\mathbb {R}^n\).

  22. 22.

    Consider \(\mathbb {R}^2\) with standard inner product. Find \(S^{\perp }\), when S is

    1. (a)

      \(\lbrace u \rbrace \), where \(u=(u_1,u_2) \ne 0\)

    2. (b)

      \(\lbrace u,v \rbrace \), where u,v are two linearly independent vectors.

  23. 23.

    Let \(S_1,S_2\) be two non-empty subsets of an inner product space V, with \(S_1 \subset S_2\). Then show that

    (a) \(S_1 \subset S_1^{\perp \perp }\)          (b) \(S_2^{\perp } \subset S_1^{\perp }\)          (c) \( S_1^{\perp \perp \perp }=S_1^{\perp }\)

  24. 24.

    Let \(S= \lbrace (3,5,-1) \rbrace \subset \mathbb {R}^3 \).

    1. (a)

      Find an orthonormal basis B for \(S^{\perp }\).

    2. (b)

      Find the projection of \((2,3,-1) \) onto \(S^{\perp }\).

    3. (c)

      Extend B to an orthonormal basis of \(\mathbb {R}^3 \).

  25. 25.

    Let V be a finite-dimensional inner product space. Let \(W_1,W_2\) be subspaces of V. Then show that

    1. (a)

      \( \left( W_1 + W_2 \right) ^{\perp } = W_1^{\perp } \cap W_2^{\perp }\)

    2. (b)

      \( \left( W_1 \cap W_2 \right) ^{\perp } = W_1^{\perp } + W_2^{\perp }\)

  26. 26.

Prove or disprove: Let W be any subspace of \(\mathbb {R}^n\) and let \(S \subset \mathbb {R}^n\) span W. Consider a matrix A with the elements of S as columns. Then \(W^{\perp } = ker(A)\).

  27. 27.

    Find the orthogonal projection of the given vector v onto the given subspace W of an inner product space V.

    1. (a)

      \(v=(1,2)\), \(W= \lbrace (x_1,x_2) \in \mathbb {R}^2 \mid x_1+x_2 =0 \rbrace \)

    2. (b)

      \(v=(3,1,2)\), \(W= \lbrace (x_1,x_2,x_3) \in \mathbb {R}^3 \mid x_3=2x_1+x_2 \rbrace \)

    3. (c)

      \(v=1+2x+x^2\), \(W= \lbrace a_0 + a_1 x+ a_2 x^2 \in \mathbb {P}_2[0,1] \mid a_2 = 0 \rbrace \)

  28. 28.

    Let V be an inner product space and W be a finite-dimensional subspace of V. If T is an orthogonal projection of V onto W, then \(I-T\) is the orthogonal projection of V onto \(W^{\perp }\).

  29. 29.

Consider \(\mathcal {C}[-1,1]\) with the inner product \(\langle f,g \rangle = \int _{-1}^1 f(s)g(s) ds\), for all \(f,g \in \mathcal {C}[-1,1]\). Let W be the subspace of \(\mathcal {C}[-1,1]\) spanned by \( \lbrace x+1 , x^2 +x \rbrace \).

    1. (a)

Find an orthonormal basis for W.

    2. (b)

What will be the projection of \(x^3\) onto W?

  30. 30.

Show that a bounded linear operator P on a Hilbert space V is an orthogonal projection if and only if P is self-adjoint and idempotent \((P^2=P)\).

Solved Questions related to this chapter are provided in Chap. 11.