We introduce a wide range of fundamental mathematical concepts and structures in this chapter on the foundations of mathematics. We start with sets and functions, studying their fundamental operations and attributes. We then delve into the universe of metric spaces, which offers a framework for comprehending distance and convergence. Moving on to algebraic structures, we examine the distinctive qualities and illustrative instances of groups, rings, and fields. Polynomial rings and their essential properties are introduced, as are matrices and their rank, trace, and determinant, all of which are highlighted as they have vital roles in the coming chapters. The latter sections of the chapter provide an overview of Euclidean space and demonstrate how to solve systems of linear equations using techniques such as Cramer’s rule, LU decomposition, and Gauss elimination. These fundamental ideas in mathematics serve as the building blocks for more advanced mathematical research and have numerous applications in science and engineering.

1 Sets and Functions

Set theory is the core of modern mathematics and serves as a language for mathematicians to discuss and organize their ideas. At its core is a crucial and elegant concept: a set is simply a collection of objects, similar to a bag containing multiple objects. These objects can be anything: numbers, characters, shapes, or even other sets. The way set theory lets us classify, compare, and evaluate these collections is what makes it so powerful. This section will discuss some of the essential concepts in set theory. Though the notion of a set is not well-defined in wide generality, as it leads to paradoxes like Russell’s Paradox, published by Bertrand Russell (1872–1970) in 1901, we start with the following simple definition for a preliminary understanding of a set.

Definition 1.1

(Set) A set is a well-defined collection of objects. That is, to define a set X, we must know for sure whether an element x belongs to X or not. If x is an element of X, then it is denoted by \(x \in X\) and if x is not an element of X, then it is denoted by \(x \notin X\). Two sets X and Y are said to be equal if they have the same elements.

Definition 1.2

(Subset) Let X and Y be any two sets, then X is a subset of Y, denoted by \(X \subseteq Y\), if every element of X is also an element of Y. Two sets X and Y are equal if and only if \(X \subseteq Y\) and \(Y \subseteq X\).

A set can be defined in a number of ways. Commonly, a set is defined by either listing all the entries explicitly, called the Roster form, or by stating the properties that are meaningful and unambiguous for elements of the set, called the Set builder form.

Example 1.1

Here are some familiar collections of numbers.

  • \(\mathbb {N}-\)the set of all natural numbers \(- \lbrace 1,2,3, \ldots \rbrace \)

  • \(\mathbb {W}-\)the set of all whole numbers \(- \lbrace 0,1,2, \ldots \rbrace \)

  • \(\mathbb {Z}-\)the set of all integers \(- \lbrace \ldots , -3,-2,-1,0,1,2,3, \ldots \rbrace \)

  • \(\mathbb {Q}-\)the set of all rational numbers \(- \lbrace \frac{p}{q} \mid p,q \in \mathbb {Z},\ q \ne 0 \rbrace \)

  • \(\mathbb {R}-\)the set of all real numbers

  • \(\mathbb {C}-\)the set of all complex numbers.

Usually, in a particular context, we have to deal with the elements and subsets of a basic set relevant to that context. This basic set is called the “Universal Set” and is denoted by \(\mathcal {U}\). For example, while studying the number system, we are interested in the set of natural numbers, \(\mathbb {N}\), and its subsets such as the set of all prime numbers, the set of all odd numbers, and so forth. In this case, \(\mathbb {N}\) is the universal set. The null set, also known as the empty set, is another fundamental object in set theory. It is a set with no elements, that is, it has no objects or members. In set notation, the null set is commonly represented by \(\Phi \) or \(\{\}\) (an empty pair of curly braces).

Definition 1.3

(Cardinality) The cardinality of a set X is the number of elements in X. A set X can be finite or infinite depending on the number of elements in X. Cardinality of X is denoted by \(|X |\).

Example 1.2

All the sets mentioned in Example 1.1 are infinite sets. The set of letters in the English alphabet is a finite set.

Set Operations

Set operations are fundamental mathematical methods for constructing, manipulating, and analyzing sets. They enable the combination, comparison, and modification of sets in order to acquire insights and solve various mathematical and real-life problems. Union (combining items from several sets), intersection (finding common elements between sets), complement (identifying elements not in a set), and set difference (removing elements from one set based on another) are the fundamental set operations.

Definition 1.4

(Union and Intersection) Let X and Y be two sets. The union of X and Y, denoted by \(X \cup Y\), is the set of all elements that belong to either X or Y. The intersection of X and Y, denoted by \(X \cap Y\), is the set of all elements that belong to both X and Y.

The relationship between sets can be illustrated with the use of diagrams, known as Venn diagrams. They were popularized by the famous mathematician John Venn (1834–1923). In a Venn diagram, a rectangle is used to represent the universal set and circles are used to represent its subsets. For example, the union and intersection of two sets are represented in Fig. 1.1.

Fig. 1.1
Two Venn diagrams. The circles are labeled X and Y. A, both the circles are shaded. B, the area of intersection between the circles is shaded.

The shaded portions in a and b represent the union and intersection of the sets X and Y, respectively

Definition 1.5

(Difference of Y related to X) Let X and Y be two sets. The difference of Y related to X, denoted by \(X\setminus Y\), is the set of all elements in X which are not in Y. The difference of a set X related to its universal set \(\mathcal {U}\) is called the complement of X and is denoted by \(X^{c}\). That is, \(X^{c}=\mathcal {U}\setminus X\). Keep in mind that \(\mathcal {U}^c=\Phi \) and \(\Phi ^c=\mathcal {U}\) (Fig. 1.2).

Fig. 1.2
Two Venn diagrams. The circles are labeled X and Y. A, the circle X except for the area of intersection, is shaded. B, the circle Y except the area of intersection, and the outer rectangle are shaded.

The shaded portion in a represents the difference of Y related to X and the shaded portion in b represents the complement of a set

Definition 1.6

(Cartesian Product) Let X and Y be two sets. The Cartesian product of X and Y, denoted by \(X \times Y\), is the set of all ordered pairs (x, y) such that x belongs to X and y belongs to Y. That is, \(X \times Y= \lbrace (x,y) \mid x \in X,y \in Y \rbrace \).

Example 1.3

Let \(X= \lbrace 1,2,3 \rbrace \) and \(Y = \lbrace 3,4,5 \rbrace \). Then the union and intersection of X and Y are \( X \cup Y =\lbrace 1,2,3,4,5 \rbrace \) and \( X \cap Y =\lbrace 3 \rbrace \), respectively. The difference of Y related to X is \( X \setminus Y =\lbrace 1,2 \rbrace \), and the Cartesian product of X and Y is \(X \times Y =\lbrace (1,3),(1,4),(1,5),(2,3),(2,4),(2,5),(3,3),(3,4),(3,5)\rbrace \).
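The computations in Example 1.3 can be reproduced with Python's built-in `set` type; the following is a quick sketch (the set comprehension builds the Cartesian product, for which there is no built-in operator):

```python
# Reproducing Example 1.3 with Python's built-in set operations.
X = {1, 2, 3}
Y = {3, 4, 5}

union = X | Y                 # X ∪ Y: elements in X or Y
intersection = X & Y          # X ∩ Y: elements in both X and Y
difference = X - Y            # X \ Y: elements of X not in Y
cartesian = {(x, y) for x in X for y in Y}   # X × Y: all ordered pairs

print(sorted(union))         # [1, 2, 3, 4, 5]
print(sorted(intersection))  # [3]
print(sorted(difference))    # [1, 2]
print(len(cartesian))        # 9 ordered pairs
```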

Remark 1.1

Two sets X and Y are said to be disjoint, if their intersection is empty. That is, if \(X \cap Y = \Phi \).

We will now try to “connect” elements of distinct sets using the concept of “Relations”. A relation between two sets allows for the exploration and quantification of links and relationships between elements of various sets. It essentially acts as a link between elements of one set and elements of another, exposing patterns, dependencies, or correspondences.

Definition 1.7

(Relation) A relation R from a non-empty set X to a non-empty set Y is a subset of the Cartesian product \(X \times Y\). It is obtained by defining a relationship between the first element and second element (called the “image” of first element) of the ordered pairs in \(X \times Y\).

The set of all first elements in a relation R is called the domain of the relation R, and the set of all second elements is called the range of R. As we represent sets, a relation may be represented either in the roster form or in the set builder form. In the case of finite sets, a visual representation by an arrow diagram is also possible.

Example 1.4

Consider the sets X and Y from Example 1.3 and their Cartesian product \(X \times Y\). Then \(R=\lbrace (1,3),(2,4),(3,5) \rbrace \) is a relation between X and Y. The set builder form of the given relation can be given by \(R= \lbrace (x,y) \mid y=x+2,x \in X,y\in Y\rbrace \) (Fig. 1.3).

Fig. 1.3
Two ellipses. The elements 1, 2, and 3 on the left ellipse are mapped to the elements 3, 4, and 5, respectively, on the right ellipse with right arrows.

Arrow diagram for R

Remark 1.2

If \(|X |=m\) and \( |Y |=n\), then \(|X \times Y |= mn\) and the number of possible relations from set X to set Y is \(2^{mn}\).
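Remark 1.2 is easy to verify computationally for the small sets of Example 1.3: a relation is, by Definition 1.7, any subset of \(X \times Y\), so counting subsets counts relations. A sketch:

```python
from itertools import product

X = {1, 2, 3}
Y = {3, 4, 5}

pairs = list(product(X, Y))            # the Cartesian product X × Y
assert len(pairs) == len(X) * len(Y)   # |X × Y| = mn = 9

# Every relation from X to Y is a subset of X × Y,
# so there are 2^(mn) possible relations.
num_relations = 2 ** len(pairs)
print(num_relations)  # 512
```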

Definition 1.8

(Equivalence Relations) A relation R on a set X is said to be an equivalence relation if and only if the following conditions are satisfied:

  1. (a)

    \((x,x)\in R\) for all \(x \in X\) ( Reflexive)

  2. (b)

    \((x,y) \in R\) implies \((y,x) \in R\) ( Symmetric)

  3. (c)

    \((x,y) \in R\) and \((y,z) \in R\) implies \((x,z) \in R\) ( Transitive).

Example 1.5

Consider \(\mathbb {N}\) with the relation R, where \((x,y) \in R\) if and only if \(x - y\) is divisible by n, where n is a positive integer. We will show that R is an equivalence relation on \(\mathbb {N}\). For,

  1. (a)

    \((x,x)\in R\) for all \(x \in \mathbb {N}\). For, \(x-x=0\) is divisible by n for all \(x\in \mathbb {N}\).

  2. (b)

    \((x,y) \in R \) implies \((y,x) \in R\). For,

    $$\begin{aligned} (x,y) \in R &\Rightarrow x-y\ \text {is divisible by}\ n \\ {} &\Rightarrow -(x-y)\ \text {is divisible by}\ n \\ {} &\Rightarrow y-x\ \text {is divisible by}\ n \\ {} &\Rightarrow (y,x) \in R \end{aligned}$$
  3. (c)

    \((x,y) \in R\) and \((y,z) \in R\) implies \((x,z) \in R\). For,

$$\begin{aligned} (x,y),(y,z) \in R &\Rightarrow x-y\ \text {and}\ y-z\ \text {are divisible by}\ n \\ &\Rightarrow (x-y)+(y-z)\ \text {is divisible by}\ n \\ &\Rightarrow x-z\ \text {is divisible by}\ n \\ &\Rightarrow (x,z) \in R \end{aligned}$$

Thus, R is reflexive, symmetric, and transitive. Hence, R is an equivalence relation.
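The three properties verified in Example 1.5 can also be checked mechanically on a finite slice of \(\mathbb {N}\). The helper below is an illustrative sketch (the name `is_equivalence` is ours, not the book's); it treats the relation as a predicate `related(x, y)` meaning \((x,y) \in R\):

```python
def is_equivalence(elements, related):
    """Check reflexivity, symmetry, and transitivity of a relation
    given as a predicate related(x, y) on a finite set of elements."""
    elements = list(elements)
    reflexive = all(related(x, x) for x in elements)
    symmetric = all(related(y, x)
                    for x in elements for y in elements if related(x, y))
    transitive = all(related(x, z)
                     for x in elements for y in elements for z in elements
                     if related(x, y) and related(y, z))
    return reflexive and symmetric and transitive

n = 3
divisible = lambda x, y: (x - y) % n == 0   # x ~ y iff n divides x - y
print(is_equivalence(range(1, 21), divisible))  # True
```

A non-example: the relation \(x \le y\) fails symmetry, and the same helper reports it is not an equivalence relation.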

Example 1.6

Consider the set \(X= \lbrace 1,2,3 \rbrace \). Define a relation R on X by \(R=\lbrace (1,1),(2,2),(1,2),(2,3) \rbrace \). Is R an equivalence relation? Clearly, not! We can observe that R is not reflexive as \((3,3) \notin R\). R is not symmetric as \((1,2) \in R\) but \((2,1) \notin R\). Also R is not transitive as \((1,2),(2,3) \in R\) but \((1,3) \notin R\). What if we include all the missing pairs and redefine R as \(\tilde{R}=\lbrace (1,1),(2,2),(3,3),(1,2),(2,1),(2,3),(3,2),(1,3),(3,1) \rbrace \)? Then \(\tilde{R}\) is an equivalence relation on X.

Relations define how elements from one set correspond to elements from another, allowing for a broad range of relationships. However, there are specialized relations in which each element of the first set relates to exactly one element of the second. This precision makes them crucial for modeling exact transformations and dependencies in various mathematical disciplines, ranging from algebra to calculus. We refer to such relations as functions.

Functions

A function in mathematics is a rule or an expression that relates how one quantity (the dependent variable) varies with respect to another quantity (the independent variable) associated with it. Functions are ubiquitous in mathematics and serve many purposes.

Definition 1.9

(Function) A function f from a set X to a set Y, denoted by \(f:X \rightarrow Y\), is a relation that assigns to each element \(x \in X\) exactly one element \(y \in Y\). Then y is called the image of x under f and is denoted by f(x). The set X is called the domain of f and Y is called the co-domain of f. The collection of all images of elements in X is called the range of f.

Example 1.7

Consider the sets X and Y from Example 1.3. Define a relation R from the set X to the set Y as \(R= \lbrace (1,3),(2,3),(3,5) \rbrace \). Then the relation R is a function from X to Y (Fig. 1.4).

Fig. 1.4
Two ellipses with elements 1, 2, and 3 on the left and elements 3, 4, and 5 on the right. 1 and 2 on the left are mapped to 3 on the right, and 3 is mapped to 5 using right arrows. X at the top left is mapped to Y at the top right with a curved arrow labeled f.

Observe that each element from set X is mapped to exactly one element in set Y. Therefore the given relation is a function. X is called the domain of f and Y is called the co-domain of f. The element 4 does not belong to the range set of f, as it does not have a pre-image. The range set of f is \(\lbrace 3,5 \rbrace \)

From Definition 1.9, it is clear that any function from a set X to a set Y is a relation from X to Y. But the converse need not be true. Consider the following example.

Example 1.8

Consider the sets X and Y from Example 1.3. Then the relation \(R= \lbrace (1,3),(1,4),(2,3) \rbrace \) from the set X to the set Y is not a function as two distinct elements of the set Y are assigned to the element 1 in X (Fig. 1.5).

Fig. 1.5
Two ellipses with elements 1, 2, and 3 on the left and elements 3, 4, and 5 on the right. 1 is mapped to 3 and 4, and 3 on the left is mapped to 5 on the right using right arrows. X at the top left is mapped to Y at the top right with a curved arrow labeled R.

Observe that 1 is mapped to both 3 and 4. Thus \(R= \lbrace (1,3),(1,4),(2,3) \rbrace \) is not a function
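The defining test — each element of X has exactly one image — can be applied directly to a relation given as a set of pairs. A sketch using the relations of Examples 1.7 and 1.8 (the helper name `is_function` is ours):

```python
def is_function(relation, domain):
    """A relation (a set of ordered pairs) is a function on domain iff
    every element of the domain occurs as a first coordinate exactly once."""
    return all(sum(1 for (x, _) in relation if x == d) == 1 for d in domain)

X = {1, 2, 3}
f = {(1, 3), (2, 3), (3, 5)}   # Example 1.7: a function
R = {(1, 3), (1, 4), (2, 3)}   # Example 1.8: not a function

print(is_function(f, X))  # True
print(is_function(R, X))  # False: 1 has two images, and 3 has none
```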

It would be easier to understand the dependence between the elements if we could geometrically represent a function. As a convention, the visual representation is done by plotting the elements in the domain along the horizontal axis and the  corresponding images along the vertical axis.

Definition 1.10

(Graph of a Function) Let \(f:X \rightarrow Y\) be a function. The set \(\lbrace \left( x,f(x) \right) \in X \times Y \mid x \in X \rbrace \) is called the graph of f.

Observe that the above-defined set is exactly the same as f, by Definition 1.9. Also keep in mind that not all graphs represent a function. If any vertical line intersects a graph at more than one point, the relation represented by the graph is not a function. This is known as the vertical line test (Fig. 1.6).

Fig. 1.6
Two graphs of y versus x. A, function has an S-shaped curve between the first and third quadrants through the origin and a vertical line between the second and third quadrants. B, not a function has a circle with its center at the origin and a vertical line between the first and fourth quadrants.

Observe that in the first graph any vertical line drawn in the domain will touch exactly one point of the graph. However, in the second graph it may touch more than one point

Definition 1.11

(One-one function and Onto function) A function f from a set X to a set Y is called a one-one (injective) function if distinct elements in the domain have distinct images, that is, for every \(x_1,x_2 \in X\), \(f(x_1)=f(x_2)\) implies \(x_1=x_2\). f is called onto (surjective) if every element of Y is the image of at least one element of X, that is, for every \(y \in Y\), \(\exists x \in X\) such that \(f(x)=y\). A function which is both one-one and onto is called a bijective function.

Example 1.9

Consider the function \(f: \mathbb {R} \rightarrow \mathbb {R}\) defined by \(f(x)=x+5\) for \(x \in \mathbb {R}\). First, we will check whether the function is one-one or not. We will start by assuming \(f(x_1)=f(x_2)\) for some \(x_1,x_2 \in \mathbb {R}\). Then

$$\begin{aligned} f(x_1)=f(x_2)&\Rightarrow x_1+5=x_2+5 \\ &\Rightarrow x_1=x_2 \end{aligned}$$

Therefore f is one-one. Now to check whether the function is onto, take any \(y \in \mathbb {R}\), then \(y-5 \in \mathbb {R}\) with \(f(y-5)=y-5+5=y\). That is, every element in \(\mathbb {R}\) (co-domain) has a pre-image in \(\mathbb {R}\) (domain). Thus, f is onto and hence f is a bijective function.

The graph of a function can also be used to check whether a function is one-one. If any horizontal line intersects the graph more than once, then the graph does not represent a one-one function as it implies that two different elements in the domain have the same image. This is known as the horizontal line test (Fig. 1.7).

Fig. 1.7
Two graphs. A, a line between the first and third quadrants via the second quadrant intersects a dashed horizontal line parallel to the horizontal axis. f 1 of x equals x + 5. B, an upward open parabola with its vertex at the origin intersects the dashed horizontal line. f 2 of x equals x squared.

Consider the graphs of the functions \(f_1,f_2{:}\mathbb {R}\rightarrow \mathbb {R}\) defined by \(f_1(x)=x+5\) and \(f_2(x)=x^2\). Observe that if we draw a horizontal line parallel to the x-axis, it will touch exactly one point on the graph of the function \(f_1\). But on the graph of the function \(f_2\), it touches two points. Then by horizontal line test, the first function is one-one whereas the second one is not a one-one function

Definition 1.12

(Composition of two functions) Let \(f:X \rightarrow Y\) and \(g:Y \rightarrow Z\) be any two functions, then the composition \(g \circ f\) is a function from X to Z, defined by \((g \circ f)(x)=g\left( f(x) \right) \) (Fig. 1.8).

Fig. 1.8
Three ellipses in a row. Ellipse x on the left leads to f of x at the center which leads to g of f of x on the right. x is mapped with g of f of x via g o f. X is mapped to Y and Y is mapped to Z via curved arrows.

It is clear that the range set of f must be a subset of the domain of g, for the composition function to be defined

Properties

Let \(f:X \rightarrow Y,g:Y \rightarrow Z\), and \(h: Z \rightarrow W\), then

  1. (a)

    \(h \circ (g \circ f)=(h \circ g) \circ f\) (Associative).

  2. (b)

    If f and g are one-one, then \(g \circ f\) is one-one.

  3. (c)

    If f and g are onto, then \(g \circ f\) is onto.

Example 1.10

Consider the functions \(f,g: \mathbb {R} \rightarrow \mathbb {R}\) defined by \(f(x)=x^2\) and \(g(x)=2x+1\). Then \((f \circ g)(x)=f(g(x))=f(2x+1)=(2x+1)^2=4x^2+4x+1\) and \((g \circ f)(x)=g(f(x))=g(x^2)=2x^2+1\). Observe that \(f \circ g \ne g \circ f \). Therefore function composition need not necessarily be commutative.
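Example 1.10 can be checked numerically; the sketch below builds the two compositions and confirms that \(f \circ g\) and \(g \circ f\) disagree (the helper `compose` is ours):

```python
def compose(g, f):
    """Return the composition g ∘ f, i.e. the map x ↦ g(f(x))."""
    return lambda x: g(f(x))

f = lambda x: x ** 2        # f(x) = x^2
g = lambda x: 2 * x + 1     # g(x) = 2x + 1

f_of_g = compose(f, g)      # (f ∘ g)(x) = (2x + 1)^2 = 4x^2 + 4x + 1
g_of_f = compose(g, f)      # (g ∘ f)(x) = 2x^2 + 1

print(f_of_g(1), g_of_f(1))  # 9 3 — so f ∘ g ≠ g ∘ f
```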

Definition 1.13

(Inverse of a function) A function \(f:X \rightarrow Y\) is said to be invertible if there exists a function \(g:Y \rightarrow X\) such that \(g\left( f(x)\right) =x\) for all \(x \in X\) and \(f\left( g(y)\right) =y\) for all \(y \in Y\). The inverse function of f is denoted by \(f^{-1}\).

The function f is invertible if and only if f is a bijective function. For, suppose there exists an inverse function g for f. Then

$$\begin{aligned}f(x_1)=f(x_2) \Rightarrow g\left( f(x_1)\right) =g\left( f(x_2)\right) \Rightarrow x_1=x_2 \end{aligned}$$

That is, f is injective. And \(f\left( g(y)\right) =y\) for all \(y \in Y\) implies that f is onto. Conversely, if f is bijective, then each \(y \in Y\) has a unique pre-image \(x \in X\), and defining \(g(y)=x\) gives the required inverse function.

Example 1.11

Consider a function f, defined as in Fig. 1.4. Indeed, from the figure itself, it’s evident that function f is not bijective. Thus f is not invertible. Observe that if we define \(f(2)=4\), then f is both one-one and onto. Then define a function, \(g:Y \rightarrow X\) by \(g(3)=1\), \(g(4)=2\), and \(g(5)=3\). Now \(g\left( f(1)\right) =g\left( 3\right) =1\), \(g\left( f(2)\right) =g\left( 4\right) =2\), and \(g\left( f(3)\right) =g\left( 5\right) =3\). That is, \(g\left( f(x)\right) =x\) for all \(x \in X\). Similarly, we can prove that \(f\left( g(y)\right) =y\) for all \(y \in Y\).

Example 1.12

Consider the function \(f(x)=x+5\), defined as in Example 1.9. We have already shown that the function is bijective. Now, we will find the inverse of f. By definition, \(f^{-1}\) is the function that undoes the operation of f. That is, if a function f maps an element x from set X to y in set Y, its inverse function \(f^{-1}\) reverses this mapping, taking y from Y back to x in X. In this case, \(X=Y=\mathbb {R}\). If we consider a \(y \in \mathbb {R}\) (co-domain), then there exists \(x \in \mathbb {R}\) (domain) such that \(y=x+5\) (Why?). Then \(x=y-5\). Thus, the function \(g(y)=y-5\) will undo the action of f. We can verify this algebraically as follows:

$$\begin{aligned}g\left( f(x)\right) = g(x+5)=x+5-5=x \end{aligned}$$

and

$$\begin{aligned}f\left( g(x)\right) = f(x-5)=x-5+5=x \end{aligned}$$

Thus \(f^{-1}(x)=x-5\).
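The verification at the end of Example 1.12 is also easy to run; a sketch checking that g undoes f (and vice versa) at a few sample points:

```python
f = lambda x: x + 5        # the bijection of Example 1.9
g = lambda y: y - 5        # candidate inverse f^{-1}

# g(f(x)) = x and f(g(y)) = y must hold at every point tested.
for t in [-10, -1, 0, 2.5, 100]:
    assert g(f(t)) == t
    assert f(g(t)) == t
print("g undoes f at all sampled points")
```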

Example 1.13

Now consider \(f: \mathbb {R} \rightarrow \mathbb {R}\) defined by \(f(x)=x^2\). From Fig. 1.7, we can clearly say that f is not bijective. Thus f does not have an inverse on \(\mathbb {R}\). But, if we restrict both the domain and the co-domain of f to \([0, \infty )\), f is a bijective function. Then the inverse of f is the function \(f^{-1}(x)= \sqrt{x}\). For, with \(g(x)=\sqrt{x}\), we have \(g\left( f(x)\right) = g(x^2)=\sqrt{x^2}=x\) for \(x \ge 0\) and \(f\left( g(x)\right) = f(\sqrt{x})=(\sqrt{x})^2=x\).

It is easy to check whether a real function is invertible or not, by just looking at its graph. Consider Fig. 1.9.

Fig. 1.9
Two graphs. The left graph has 2 parallel lines on either side given by f of x = x + 5 and a dashed line at the center given by f inverse of x = x minus 5. The right graph has a dashed line through the origin, f inverse of x = square root of x prime, and a leaf-shaped curve, f of x = x squared.

Observe that the graph of \(f^{-1}(x)\) is the graph of f(x) reflected about the line \(y=x\) (represented by the dotted line)

Now we will discuss some of the important concepts related to functions defined on the set of all real numbers to itself.

Definition 1.14

(Continuity at a point) Let \(X \subset \mathbb {R}\) and \(f: X \rightarrow \mathbb {R}\) be a function. We say that f is continuous at \(x_0 \in X\), if given any \(\epsilon >0\) there exists a \(\delta >0\) such that if x is any point in X satisfying \(|x- x_0 |<\delta \), then \(|f(x) - f(x_0) |< \epsilon \). Otherwise, f is said to be discontinuous at \(x_0\).

A function is continuous if it is continuous at each point of its domain. In graphical terms, the continuity of a function on the set of all real numbers means that the graph does not have any gaps or breaks. From Fig. 1.7, it is clear that both the functions \(f(x)=x+5\) and \(f(x)=x^2\) are continuous. Figure 1.10 gives an example for a discontinuous function.

Fig. 1.10
A graph of y versus x has two horizontal lines. The line at y equals 1 extends in the first quadrant. The line at y equals negative 1 extends in the third quadrant.

Consider the signum function, defined by \(f(x)= {\left\{ \begin{array}{ll} 1,\ x>0\\ 0,\ x=0\\ -1,\ x<0 \end{array}\right. }\). Clearly, f is not continuous at \(x=0\)

Observe that in the definition of continuity of a function at a point, the value of \(\delta \) depends on both \(x_0\) and \(\epsilon \). If \(\delta \) can be chosen independently of the point \(x_0\), then the continuity is called uniform continuity. In other words, a function f is uniformly continuous on a set X, if for every \(\epsilon > 0\), there exists \(\delta > 0\), such that for every pair of elements \(x,y \in X\), \(|f(x) - f(y) |< \epsilon \) whenever \(|x -y |< \delta \). Graphically, this means that for any given \(\epsilon \) there is a single window of width \(\delta \) and height \(\epsilon \) such that, wherever the window is placed along the graph, points within \(\delta \) units of each other on the x-axis map to points within \(\epsilon \) units of each other on the y-axis. Consider the following example.

Example 1.14

Consider the function \(f_1(x)=x+5\). We will show that \(f_1\) is uniformly continuous. For, given any \(\epsilon >0\), choose \(\delta =\epsilon \). Then, for any \(x,y \in \mathbb {R}\) with \(|x-y |< \delta \), we have

$$\begin{aligned} |f_1(x) - f_1(y) |= |x+5-(y+5) |= |x-y |< \delta = \epsilon \end{aligned}$$

Thus \(f_1(x)=x+5\) is uniformly continuous over \(\mathbb {R}\). However, the function \(f_2(x)=x^2\) is not uniformly continuous on \(\mathbb {R}\). Suppose on the contrary that \(f_2\) is uniformly continuous. Fix \(\epsilon =1\). Then, there exists \(\delta _0 > 0\), such that for every element \(x,y \in \mathbb {R}\), \(|f(x) - f(y) |< 1 \) whenever \(|x -y |< \delta _0\). Now, take \(y=x+\frac{\delta _0}{2}\). Then,

$$\begin{aligned}|f(x)-f(y) |= \left|x^2-\left( x+\frac{\delta _0}{2} \right) ^2 \right|= \left|x\delta _0 + \frac{\delta _0^2}{4} \right|<1 \end{aligned}$$

which is a contradiction, as x can be chosen arbitrarily large.
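The contradiction in Example 1.14 can be made visible numerically: with \(\delta \) fixed, \(|f_2(x+\delta /2) - f_2(x) |= |x\delta + \delta ^2/4 |\) grows without bound in x. A sketch (the choice \(\delta = 0.1\) is arbitrary):

```python
f2 = lambda x: x ** 2
delta = 0.1                      # any fixed candidate delta

# |f2(x + delta/2) - f2(x)| = x*delta + delta^2/4 grows linearly in x,
# so no single delta can work for every x: f2 is not uniformly continuous.
gaps = [abs(f2(x + delta / 2) - f2(x)) for x in (1, 10, 100, 1000)]
print(gaps)  # strictly increasing, eventually exceeding any fixed epsilon
```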

Now, we will define continuity of a function using the notion of sequences of real numbers.

Definition 1.15

(Real Sequence) A real sequence \(\lbrace x_n \rbrace \) is a function whose domain is the set \(\mathbb {N}\) of natural numbers and co-domain is the set of all real numbers \(\mathbb {R}\). In other words, a sequence in \(\mathbb {R}\) assigns to each natural number \(n= 1, 2, \ldots \) a uniquely determined real number. For example, the function \(f:\mathbb {N} \rightarrow \mathbb {R}\) defined by \(f(n)=\frac{1}{n}\) determines the sequence \(\left\{ 1,\frac{1}{2},\frac{1}{3}, \ldots \right\} \).

Example 1.15

The list of numbers \(\{r,r,r,\ldots \}\), where r is any real number, is a sequence called a constant sequence, as we can define a function, \(f: \mathbb {N} \rightarrow \mathbb {R}\), by \(f(n)=r\).

Example 1.16

The list of numbers \(\{r,r^2,r^3,\ldots \}\), where r is any real number, is a sequence called a geometric sequence, as we can define a function, \(f: \mathbb {N} \rightarrow \mathbb {R}\), by \(f(n)=r^n\).

Definition 1.16

(Convergent Sequence) A real sequence \(\lbrace x_n \rbrace \) is said to converge to \(x \in \mathbb {R}\), or x is said to be a limit of \(\lbrace x_n \rbrace \), denoted by \(x_n \rightarrow x\) or \(\lim \limits _{n \rightarrow \infty }x_n =x\), if for every \(\epsilon >0\), there exists a natural number N such that \(|x_n-x |< \epsilon \) for all \(n \ge N\). Otherwise, we say that \(\lbrace x_n \rbrace \) is divergent.

Theorem 1.1

A real sequence \(\lbrace x_n \rbrace \) can have at most one limit.

Example 1.17

Consider the sequence \(\lbrace x_n \rbrace \), where \(x_n=\frac{1}{n}\). Clearly, \(x_n \rightarrow 0\). For, given any \(\epsilon >0\), we have \(|x_n - 0 |= \left|\frac{1}{n} \right|\). If we take \(n> \frac{1}{\epsilon }\), we have \(|1/n|< \epsilon \). Thus \(\lim \limits _{n \rightarrow \infty }\frac{1}{n} =0\).
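The choice \(n > \frac{1}{\epsilon }\) in Example 1.17 can be checked directly; a sketch that picks an integer \(N > 1/\epsilon \) and verifies the tail of the sequence (the helper name `tail_within` is ours, and the finite cutoff `terms` is a practical stand-in for "all \(n \ge N\)"):

```python
import math

def tail_within(epsilon, terms=10_000):
    """Pick an integer N > 1/epsilon and verify |1/n - 0| < epsilon
    for every n from N up to the finite cutoff terms."""
    N = math.floor(1 / epsilon) + 1
    return all(abs(1 / n) < epsilon for n in range(N, terms + 1))

print(tail_within(0.01))  # True: beyond N = 101, every term is within 0.01 of 0
```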

Example 1.18

Consider the sequence \(\lbrace x_n \rbrace \), defined as in Example 1.15. It is easy to observe that \( x_n \rightarrow r\) as \(|x_n - r |=0\) for all \(n \in \mathbb {N}\).

Example 1.19

Consider the sequence \(\lbrace x_n \rbrace \), defined as in Example 1.16. We can observe that the convergence of this sequence depends on the value of r. First of all, by the above example, for \(r=0\) and \(r=1\), \(\lbrace x_n \rbrace \) converges to 0 and 1, respectively. Now let \(0<r<1\). Then \( x_n \rightarrow 0\). For any \(\epsilon >0\), if we take \(N>\frac{\ln \epsilon }{\ln r}\), we have \(|x_n - 0 |=r^n < \epsilon \) for all \(n >N\). Similarly, for \(-1<r<0\), \( x_n \rightarrow 0\).
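The threshold \(N > \frac{\ln \epsilon }{\ln r}\) used above can be computed explicitly; a sketch for \(0<r<1\) (note \(\ln r < 0\), which is what flips \(r^n < \epsilon \) into \(n > \ln \epsilon / \ln r\); the helper name `geometric_N` is ours):

```python
import math

def geometric_N(r, epsilon):
    """An integer N such that r^n < epsilon for all n > N, for 0 < r < 1.
    Since ln r < 0, r^n < epsilon is equivalent to n > ln(epsilon)/ln(r)."""
    return math.floor(math.log(epsilon) / math.log(r))

r, eps = 0.5, 0.01
N = geometric_N(r, eps)
print(N, all(r ** n < eps for n in range(N + 1, N + 100)))  # 6 True
```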

Now for \(r=-1\), the given sequence becomes \(x_n=(-1)^n\). Take \(\epsilon =\frac{1}{3}\). Then there does not exist any point \(x \in \mathbb {R}\) such that \(|x_n -x |< \frac{1}{3} \) as the interval \(\left( x-\frac{1}{3},x+\frac{1}{3}\right) \) must contain both 1 and \(-1\). Therefore \(\lbrace x_n\rbrace \) with \(x_n=(-1)^n\) does not converge. Similarly, we can prove that the sequence \(\lbrace x_n\rbrace \) with \(x_n=r^n\) does not converge outside the interval \((-1,1]\).

As we have discussed convergent sequences, we must also introduce Cauchy sequences, a class of sequences in which the terms become arbitrarily close to each other as the index increases, rather than necessarily approaching a fixed limit.

Definition 1.17

(Cauchy Sequence) A real sequence \(\lbrace x_n \rbrace \) is said to be a Cauchy sequence, if for any \(\epsilon >0\), there exists a natural number N such that \(|x_m-x_n |< \epsilon \) for all \(m,n \ge N\).

For a real sequence, the notions of convergent sequence and Cauchy sequence coincide. We have the following theorem stating this fact.

Theorem 1.2

A real sequence \(\lbrace x_n \rbrace \) is convergent if and only if it is Cauchy.

However, this may not be true if we are considering sequences in the set of rational numbers, \(\mathbb {Q}\). That is, there exist sequences of rational numbers that are Cauchy but not convergent in \(\mathbb {Q}\) (the sequence may not converge to a rational number). For example, consider the sequence \(1.4, 1.41, 1.414, 1.4142, \ldots \) of decimal truncations of \(\sqrt{2}\). This sequence converges to \(\sqrt{2}\), which is not a rational number (also, see Exercise 13 of this chapter). Now, we will introduce the sequential definition for continuity.

Definition 1.18

(Sequential Continuity) A function \(f:X \subseteq \mathbb {R} \rightarrow \mathbb {R}\) is said to be sequentially continuous at a point \(x_0 \in X\) if for every sequence \(\{x_n \}\) in X with \(x_n \rightarrow x_0\), we have \(f(x_n) \rightarrow f(x_0)\). That is, \(\lim \limits _{n \rightarrow \infty }x_n =x_0 \Rightarrow \lim \limits _{n \rightarrow \infty }f(x_n) =f(x_0)\).

Then, we have the following result which asserts that sequential continuity and continuity of a real function are the same.

Theorem 1.3

A function \(f:X \subseteq \mathbb {R} \rightarrow \mathbb {R}\) is continuous if and only if it is sequentially continuous.

Example 1.20

Consider the signum function as defined in Fig. 1.10. We know that f is not continuous at \(x=0\). We can use the definition of sequential continuity to prove this fact. Consider the sequence \(\left\{ \frac{1}{n} \right\} \). In Example 1.17, we have seen that \(\frac{1}{n} \rightarrow 0\). However, observe that \(f\left( \frac{1}{n} \right) =1 \rightarrow 1\ne f(0)\). Thus f is not sequentially continuous at 0 and hence f is not continuous at 0.

Now, consider the function \(f(x)=x+5\). We have already seen that f is continuous on \(\mathbb {R}\) as its graph does not have any gaps or breaks. Let us check whether f is sequentially continuous or not. Consider any real number \(r \in \mathbb {R}\) and a sequence \(\{r_n\}\) with \(r_n \rightarrow r\) as \(n \rightarrow \infty \). For sequential continuity, \(f(r_n)\) must converge to f(r). Observe that \(f(r_n)=r_n+5 \rightarrow r+5\) as \(n \rightarrow \infty \). Thus f is sequentially continuous.
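Both checks can be mimicked numerically; the sketch below evaluates the signum function along the sequence \(x_n = 1/n\) (assuming the usual definition with \(f(0)=0\)) and sees the images stuck at \(1 \ne f(0)\):

```python
def sgn(x):
    """Signum function: 1 for x > 0, 0 at x = 0, -1 for x < 0."""
    return (x > 0) - (x < 0)

# x_n = 1/n converges to 0, but sgn(1/n) = 1 for every n, and 1 != sgn(0) = 0,
# so sgn is not sequentially continuous (hence not continuous) at 0.
values = [sgn(1 / n) for n in (1, 10, 100, 1000)]
print(values, sgn(0))  # [1, 1, 1, 1] 0

# By contrast, for f(x) = x + 5 the images f(1/n) = 1/n + 5 approach f(0) = 5.
f = lambda x: x + 5
print(abs(f(1 / 1000) - f(0)))  # small, and it shrinks as n grows
```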

Remark 1.3

A set S is said to be countably infinite if there exists a bijective function from \(\mathbb {N}\) to S. A set which is empty, finite, or countably infinite is called a countable set. Otherwise, it is called an uncountable set. For example, \(\mathbb {Z}\) is countable and \(\mathbb {R}\) is uncountable.

Sequence of Functions

Now, we will combine the ideas of functions and sequences discussed so far and define “sequence of functions”.

Definition 1.19

(Sequence of Functions) Let \(f_n\) be real-valued functions defined on an interval [a, b] for each \(n \in \mathbb {N}\). Then \(\{f_1,f_2,f_3, \ldots \}\) is called a sequence of real-valued functions on [a, b], and is denoted by \(\{f_n \}\).

Example 1.21

For each \(n \in \mathbb {N}\), let \(f_n\) be defined on [0, 1] by \(f_n(x)=x^n\). Then \(\{x,x^2,x^3,\ldots \}\) is a sequence of real-valued functions on [0, 1].

For a sequence of functions, we have two types of convergences, namely point-wise convergence and uniform convergence. We will discuss these concepts briefly in this section.

Let \(\{f_n \}\) be a sequence of functions on [a, b] and \(x_0 \in [a,b]\). Then the sequence of real numbers \(\{ f_n(x_0) \}\) may be convergent. In fact, this may be true for all points in [a, b]. The limiting values of the sequences of real numbers corresponding to each point \(x \in [a,b]\) define a function called the limit function, or simply the limit, of the sequence \(\{ f_n \}\) of functions on [a, b].

Definition 1.20

(Point-wise convergence) Let \(\{f_n\}\) be a sequence of real-valued functions defined on an interval [a, b]. If for each \(x \in [a,b]\) and each \(\epsilon >0\), there exists an \(N \in \mathbb {N}\) such that \(|f_n(x)-f(x) |< \epsilon \) for all \(n >N\), then we say that \(\{f_n \}\) converges point-wise to the function f on [a, b], denoted by \(\lim \limits _{n \rightarrow \infty }f_n(x)=f(x),\ \forall \ x\in [a,b]\).

Example 1.22

Let \(f_n(x)=x^n\) be defined on [0, 1]. By Example 1.19, the limit function f(x) is given by

$$\begin{aligned}f(x)=\lim \limits _{n \rightarrow \infty }f_n(x)= {\left\{ \begin{array}{ll} 0,\ x\in [0,1)\\ 1,\ x=1 \end{array}\right. } \end{aligned}$$

Let \(\epsilon = \frac{1}{2}\). Then for each \(x \in [0,1]\), there exists a positive integer N such that \(|f_n(x)-f(x) |< \frac{1}{2}\) for all \(n >N\). If \(x=0\), then \(f(x)=0\) and \(f_n(x)=0\) for all n, so \(|f_n(x)-f(x) |< \frac{1}{2}\) is true for all \(n >1\).

If \(x=1,f(x)=1\) and \(f_n(x)=1\) for all n. \(|f_n(x)-f(x) |< \frac{1}{2}\) is true for all \(n >1\).

If \(x=\frac{3}{4},f(x)=0\) and \(f_n(x)=\left( \frac{3}{4} \right) ^n \) for all n. Then

$$\begin{aligned}|f_n(x)-f(x) |=\left( \frac{3}{4} \right) ^n < \frac{1}{2} \end{aligned}$$

is true for all \(n >2\).

If \(x=\frac{9}{10},f(x)=0\) and \(f_n(x)=\left( \frac{9}{10} \right) ^n \) for all n. Then

$$\begin{aligned}|f_n(x)-f(x) |=\left( \frac{9}{10} \right) ^n < \frac{1}{2} \end{aligned}$$

is true for all \(n >6\) (Fig. 1.11).

Fig. 1.11
Point-wise convergence of \(\{f_n \}\), where \(f_n(x)=x^n,\ x \in [0,1]\)

Observe that there is no value of N for which \(|f_n(x)-f(x) |< \frac{1}{2}\) is true for all \(x \in [0,1]\). N depends on both x and \(\epsilon \). But, this is not the case for the following example.
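The dependence of N on x can be made concrete. For \(\epsilon = \frac{1}{2}\) and \(0<x<1\), the least admissible N is \(\lfloor \log \epsilon / \log x \rfloor \). The helper below (our own, for illustration) reproduces the values worked out above and shows N growing without bound as x approaches 1:

```python
import math

# Least N such that x**n < eps for all n > N, for 0 < x < 1
# (an illustrative helper, not from the text).
def least_N(x, eps=0.5):
    return math.floor(math.log(eps) / math.log(x))

for x in (0.75, 0.9, 0.99):
    print(x, least_N(x))
# 0.75 2   -> matches "true for all n > 2"
# 0.9 6    -> matches "true for all n > 6"
# 0.99 68  -> N grows as x -> 1, so no single N works for every x
```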

Example 1.23

Consider \(f_n(x)=\dfrac{x}{1+nx},x\ge 0\). Clearly,

$$\begin{aligned}\lim \limits _{n \rightarrow \infty } f_n(x)=f(x)=0, \ \forall \ x\ge 0 \end{aligned}$$

Also, for \(x > 0\), we have

$$\begin{aligned}0 \le \frac{x}{1+nx} \le \frac{x}{nx} =\frac{1}{n} \end{aligned}$$

Therefore, as \(f_n(0)=0\), we get \(|f_n(x)-f(x) |=|f_n(x) |\le \frac{1}{n} < \epsilon \) for all \(x\ge 0\), provided \(N>\frac{1}{\epsilon }\). That is, if \(N>\frac{1}{\epsilon }\), then \(|f_n(x)-f(x) |< \epsilon \) for all \(n >N\) and for all \(x \ge 0\). Here N depends only on \(\epsilon \). This type of convergence is called uniform convergence (Fig. 1.12).
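The independence from x can be checked numerically: for each n, the largest value of \(|f_n(x)|\) over a grid of x values stays below the bound \(\frac{1}{n}\). A short Python sketch (the grid and names are our choices):

```python
# Estimate sup |f_n(x)| for f_n(x) = x/(1 + n*x) on a grid of x >= 0
# and compare with the uniform bound 1/n derived above.
def f(n, x):
    return x / (1 + n * x)

grid = [i * 0.1 for i in range(0, 10001)]   # x in [0, 1000]
for n in (1, 10, 100):
    sup_est = max(f(n, x) for x in grid)
    print(n, sup_est <= 1 / n)   # True: the bound 1/n is never exceeded
```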

Fig. 1.12
Uniform convergence of \(\{f_n \}\), where \(f_n(x)=\dfrac{x}{1+nx},\ x\ge 0\)

Definition 1.21

(Uniform convergence) Let \(\{f_n \}\) be a sequence of real-valued functions defined on an interval [a, b]. Then \(\{f_n \}\) is said to converge uniformly to the function f on [a, b] if, for each \(\epsilon >0\), there exists an integer N (dependent on \(\epsilon \) and independent of x) such that for all \(x \in [a,b]\), \(|f_n(x)-f(x) |< \epsilon \) for all \(n>N\) (Fig. 1.13).

Fig. 1.13
If \(\{f_n\}\) converges uniformly to a function f on [a, b], for a given \(\epsilon >0\), there exists a positive integer N such that the graph of \(f_n(x)\) for all \(n >N\) and for all \(x \in [a,b]\) lies between the graphs of \(f(x)- \epsilon \) and \(f(x)+ \epsilon \)

Clearly, uniform convergence implies point-wise convergence, but the converse does not always hold. Also observe that, in Example 1.22, all the functions in \(\{f_n \}\) were continuous, yet their point-wise limit was not continuous. This cannot happen with uniform convergence. That is, if \(\{f_n\}\) is a sequence of continuous functions and \(f_n \rightarrow f\) uniformly, then f is continuous.

2 Metric Spaces

In \(\mathbb {R}\), we have the notion of usual distance, provided by the modulus function, to discuss ideas like continuity of a function, convergence of a sequence, etc. These concepts can be extended to a wide range of sets by generalizing the notion of “distance” by means of a function called a metric. A set with such a distance notion defined on it is called a metric space. Consider the following definition.

Definition 1.22

(Metric Space) Let X be any non-empty set. A metric (or distance function) on X is a function \(d: X \times X \rightarrow \mathbb {R}\) which satisfies the following properties for all \(x,y,z \in X\):

(M1):

\(d(x,y) \ge 0\) and \(d(x,y) = 0\) if and only if \(x=y\). (Non-negativity)

(M2):

\(d(x,y)=d(y,x)\). (Symmetry)

(M3):

\(d(x,z) \le d(x,y)+d(y,z)\). (Triangle Inequality)

If d is a metric on X, we say that (X, d) is a metric space.

Example 1.24

Consider the set of all real numbers, \(\mathbb {R}\). For \(x,y \in \mathbb {R}\), the function defined by

$$\begin{aligned}d(x,y)=|x-y | \end{aligned}$$

is the usual distance between two points on the real line.

(M1):

Clearly \(d(x,y)=|x-y |\ge 0\) and \(d(x,y)=|x-y |= 0\) if and only if \(x-y=0\). That is, if and only if \(x=y\).

(M2):

\(d(x,y)=|x-y |=|y-x |=d(y,x)\)

(M3):

Also, by the properties of modulus

$$\begin{aligned} d(x,z)&=|x-z |\\ &= |x-y+y-z |\\ &\le |x-y |+ |y-z |\\ &=d(x,y)+d(y,z) \end{aligned}$$

Thus all the conditions for a metric are satisfied and hence \(\left( \mathbb {R},|. |\right) \) is a metric space. This metric is known as the usual metric or the Euclidean distance.

Example 1.25

For any non-empty set X, define a function d by

$$\begin{aligned}d(x,y)={\left\{ \begin{array}{ll} 1\ , \ \ x \ne y \\ 0\ , \ \ x =y \end{array}\right. } \end{aligned}$$

Clearly conditions (M1) and (M2) are satisfied. Now we will check (M3),

  1. Case 1

    \(x\ne y = z\). Then \(d(x,y)=1,d(x,z)=1\), and \(d(y,z)=0\).

  2. Case 2

    \(x=y \ne z\). Then \(d(x,y)=0,d(x,z)=1\), and \(d(y,z)=1\).

  3. Case 3

    \(x=y=z\). Then \(d(x,y)=0,d(x,z)=0\), and \(d(y,z)=0\).

  4. Case 4

    \(x=z \ne y\). Then \(d(x,y)=1,d(x,z)=0\), and \(d(y,z)=1\).

  5. Case 5

    x, y, z pairwise distinct. Then \(d(x,y)=1,d(x,z)=1\), and \(d(y,z)=1\).

In all five cases, condition (M3) is clearly satisfied. Hence \(\left( X ,d \right) \) is a metric space for any non-empty set X. The metric d is known as the discrete metric.
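Since the discrete metric takes only the values 0 and 1, the case analysis above can also be verified exhaustively on a small set. The following sketch (set and names our own) checks the triangle inequality for every ordered triple:

```python
from itertools import product

# Discrete metric: distance 1 between distinct points, 0 otherwise.
def d(x, y):
    return 0 if x == y else 1

X = ["a", "b", "c", "d"]
# Check the triangle inequality (M3) for every ordered triple in X.
ok = all(d(x, z) <= d(x, y) + d(y, z) for x, y, z in product(X, repeat=3))
print(ok)   # True
```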

Definition 1.23

(Open Ball) Let (X, d) be a metric space. For any point \(x_0 \in X\) and \(\epsilon \in \mathbb {R}^+\),

$$\begin{aligned}B_{\epsilon }(x_0)=\lbrace x \in X \mid d(x,x_0) <\epsilon \rbrace \end{aligned}$$

is called an open ball centered at \(x_0\) with radius \(\epsilon \).

Definition 1.24

(Open Set and Closed Set) Let (X, d) be a metric space. A subset \(Y \subseteq X\) is said to be open if it contains an open ball about each of its elements. \(Y \subseteq X\) is said to be closed if its complement \(Y^c\) is open.

Example 1.26

Consider the metric space \(\left( \mathbb {R},|. |\right) \). Then we can verify that every open interval in the real line is an open set (see Exercise 8 of this chapter). Consider an arbitrary open interval \((a,b) \subset \mathbb {R}\) and choose an arbitrary element \(c \in (a,b)\). We have to show that there exists \(\epsilon > 0\) such that \(B_{\epsilon }(c) \subset (a,b)\) (Fig. 1.14).

Fig. 1.14
Observe that if we take \(\epsilon \) less than both \(c-a\) and \(b-c\), then \(B_{\epsilon }(c) \subset (a,b)\)

From Fig. 1.14, if we take \(\epsilon < \min \lbrace c-a,b-c \rbrace \), it is clear that \(B_{\epsilon }(c) \subset (a,b)\) for any \(c \in (a,b)\). Similarly, we can prove that a union of open intervals is also an open set in \(\mathbb {R}\). But a closed interval \([a,b] \subset \mathbb {R}\) is not an open set, as \(B_{\epsilon }(a) \nsubseteq [a,b]\) for any \(\epsilon >0\) (Fig. 1.15).

Fig. 1.15
Clearly \(B_{\epsilon }(a) \nsubseteq [a,b]\) for any \(\epsilon >0 \). Also, any open interval containing b is not a subset of [a, b]

As \([a,b]^c = \left( - \infty , a\right) \cup \left( b, \infty \right) \) is an open set, [a, b] is a closed set.

Example 1.27

Every singleton set in a discrete metric space X is an open set. It is obvious from the fact that for any \(x \in X\), we have \(B_{\epsilon }(x)= \lbrace x \rbrace \) when \(\epsilon <1\). Also, it is interesting to observe that every subset of a discrete metric space is open, as every subset can be written as a union of singleton sets. Taking complements, every subset of a discrete metric space X is also a closed set.

As we have defined sequences on \(\mathbb {R}\), we can define sequences on an arbitrary metric space (X, d) as a function from the set of all natural numbers taking values in X, and we can discuss their convergence based on the distance function d.

Definition 1.25

(Convergent Sequence) A sequence \(\lbrace x_n \rbrace \) in a metric space (X, d) converges to \(x \in X\) if for every \(\epsilon >0 \) there exists \(N\in \mathbb {N}\) such that \(x_n \in B_{\epsilon }(x)\) for all \(n >N\), and x is called the limit of the sequence \(\lbrace x_n \rbrace \). We denote this by \(x_n \rightarrow x\) or \(\lim \limits _{n\rightarrow \infty }x_n=x\). In other words, we can say that \(d(x_n,x)\rightarrow 0\) as \(n\rightarrow \infty \).

Example 1.28

Consider the sequence \(\lbrace x_n \rbrace \), where \(x_n=r+\frac{1}{n},n\in \mathbb {N}\) in the metric space \(\left( \mathbb {R},|. |\right) \) for some \(r \in \mathbb {R}\). We will show that \(x_n \rightarrow r\) in \(\left( \mathbb {R},|. |\right) \). For any \(\epsilon >0\), if we take \(N > \frac{1}{\epsilon }\)

$$\begin{aligned}d(x_n,r)= \left| r + \frac{1}{n} -r \right| = \left| \frac{1}{n} \right| < \epsilon \ \forall \ n > N \end{aligned}$$

That is, \(x_n \in B_{\epsilon }(r)\) for all \(n>N\). Therefore \(x_n \rightarrow r\) in \(\left( \mathbb {R},|. |\right) \).

Example 1.29

Let \(\lbrace x_n \rbrace \) be a sequence in a metric space (X, d), where d is the discrete metric. We have seen in Example 1.27 that every singleton set in a discrete metric space is open. Therefore, for the sequence \(\lbrace x_n \rbrace \) to converge to a point \(x \in X\), the open set \(\lbrace x \rbrace \) must contain all but finitely many terms of the sequence. In other words, a sequence \(\lbrace x_n \rbrace \) in a discrete metric space converges if and only if it is of the form \(x_1,x_2,\ldots ,x_{N},x,x,\ldots \). That is, if and only if \(\lbrace x_n \rbrace \) is eventually constant.

Definition 1.26

(Cauchy Sequence) A sequence of points \(\lbrace x_n \rbrace \) in a metric space (X, d) is said to be a Cauchy sequence if for every \(\epsilon >0\), there exists an \(N_{\epsilon } \in \mathbb {N}\) such that \(d(x_n,x_m)< \epsilon \) for every \(m,n > N_{\epsilon }\).

Theorem 1.4

In a metric space, every convergent sequence is Cauchy.

The converse of the above theorem need not be true. That is, there exist metric spaces in which some Cauchy sequences are not convergent.

Example 1.30

Consider the sequence \(\lbrace x_n \rbrace \) with \(x_n = a + \frac{1}{n}\) in the metric space \(\left( (a,b),|. |\right) \), where (a, b) is any open interval in \(\mathbb {R}\). We will show that this sequence is Cauchy but not convergent. For an \(\epsilon >0\), if we choose \(N>\frac{2}{\epsilon }\)

$$\begin{aligned}d(x_n,x_m)=\left| \frac{1}{n}-\frac{1}{m} \right| \le \left| \frac{1}{n} \right| + \left| \frac{1}{m}\right| < \frac{\epsilon }{2}+\frac{\epsilon }{2}=\epsilon \ \forall \ m,n > N \end{aligned}$$

That is, the given sequence is a Cauchy sequence. As we have seen in Example 1.28, \(x_n = a + \frac{1}{n} \rightarrow a\) in \(\mathbb {R}\) as \(n \rightarrow \infty \). As \(a \notin (a,b)\), the sequence \(\lbrace x_n \rbrace \) with \(x_n = a + \frac{1}{n}\) is not convergent in \(\left( (a,b),|. |\right) \).

Definition 1.27

(Complete Metric Space) A metric space in which every Cauchy sequence is convergent is called a complete metric space.

Example 1.31

By Theorem 1.2, \(\left( \mathbb {R},|. |\right) \) is a complete metric space and from Example 1.30, \(\left( (a,b),|. |\right) \) is an incomplete metric space.

Definition 1.28

(Continuous Function) Let \(\left( X,d_1 \right) \) and \(\left( Y,d_2 \right) \) be two metric spaces. A function \(f: X \rightarrow Y\) is said to be continuous at a point \(x_0\in X\) if for every \(\epsilon >0\) there is a \(\delta >0\) such that \(d_2\left( f(x),f(x_0) \right) < \epsilon \) whenever \(d_1\left( x,x_0 \right) <\delta \). f is said to be continuous on X if f is continuous at every point of X.

Theorem 1.5

Let \(\left( X,d_1 \right) \) and \(\left( Y,d_2 \right) \) be two metric spaces. Then a function \(f: X \rightarrow Y\) is continuous if and only if the inverse image of every open set in \(\left( Y,d_2 \right) \) is open in \(\left( X,d_1 \right) \).

The continuity of a function in metric spaces can also be discussed in terms of sequences. Consider the following definition.

Definition 1.29

(Sequential Continuity) Let \(\left( X,d_1 \right) \) and \(\left( Y,d_2 \right) \) be two metric spaces. A function \(f: X \rightarrow Y\) is said to be sequentially continuous at a point \(x_0\in X\) if whenever \(\lbrace x_n \rbrace \) is a sequence in \(\left( X,d_1 \right) \) with \(x_n \rightarrow x_0\), we have \(f(x_n) \rightarrow f(x_0)\) in \(\left( Y,d_2 \right) \).

Theorem 1.6

Let \(\left( X,d_1 \right) \) and \(\left( Y,d_2 \right) \) be two metric spaces. Then a function \(f: X \rightarrow Y\) is continuous on X, if and only if it is sequentially continuous.

3 Some Important Algebraic Structures

An algebraic structure consists of a non-empty set together with a collection of operations defined on it, satisfying certain conditions or axioms specified by the context under discussion. Of particular importance are operations for which the result of combining two elements of the set again belongs to the same set.

Definition 1.30

(Binary Operation) Let G be any set. A binary operation \('*'\) on G is a function \(*:G \times G \rightarrow G\) defined by

$$\begin{aligned}*(g_1,g_2)=g_1*g_2 \end{aligned}$$

Example 1.32

Let \(G= \mathbb {R}\), the set of all real numbers, and let \( +\) be the usual addition of real numbers. Now \(+:\mathbb {R} \times \mathbb {R} \rightarrow \mathbb {R}\) such that \(+(a,b)=a+b \in \mathbb {R}\) defines a binary operation. Similarly, the usual multiplication and subtraction of real numbers are also binary operations on \(\mathbb {R}\). But as division of a real number by 0 is not defined, division is not a binary operation on \(\mathbb {R}\).

Definition 1.31

(Group) A non-empty set G together with a binary operation \('*'\) is said to be a group, denoted by \((G,*)\), if \('*'\) satisfies the following properties:

  1. (a)

    \(g_1 * (g_2 * g_3)=(g_1 * g_2)*g_3 \ \forall \ g_1,g_2,g_3 \in G\) (Associative property)

  2. (b)

    There exists \(e \in G\), such that \(e*g=g=g*e\ \forall \ g \in G\) (Existence of Identity)

  3. (c)

    For each \(g \in G\), there exists \(g^{-1} \in G\) such that \(g*g^{-1}=e=g^{-1}*g\). (Existence of Inverse)

If \('*'\) satisfies \(g_1 * g_2 =g_2 *g_1\ \forall \ g_1,g_2 \in G\) (Commutative property) also, then \((G,*)\) is called an Abelian group.

Example 1.33

Consider \(\mathbb {R}\) together with the binary operation \('+'\). Then \(\mathbb {R}\) is an Abelian group under the operation \('+'\). For,

  1. (a)

    Addition is associative over \(\mathbb {R}\).

  2. (b)

    For all \(r \in \mathbb {R}\), there exists \(0 \in \mathbb {R}\) such that \(r+0=r=0+r\).

  3. (c)

    For all \(r \in \mathbb {R}\), there exists \(-r \in \mathbb {R}\) such that \(r+(-r)=0=(-r)+r\).

  4. (d)

    Addition is commutative over \(\mathbb {R}\).

Similarly, \(\mathbb {C}\), the set of all complex numbers, \(\mathbb {Q}\), the set of all rational numbers, and \(\mathbb {Z}\), the set of all integers, together with the binary operation \('+'\), are Abelian groups. But \((\mathbb {R}, . )\) is not a group, where ‘.’ denotes usual multiplication, as there does not exist any inverse element for 0.

Example 1.34

Consider \(\mathbb {R}^{*}= \mathbb {R} \setminus \lbrace 0 \rbrace \) under usual multiplication. We can show that \(\left( \mathbb {R}^{*}, . \right) \) is an Abelian group. Similarly, we can show that \(\left( \mathbb {Q}^{*}, . \right) \) and \(\left( \mathbb {C}^{*}, . \right) \) are also Abelian groups, where \(\mathbb {Q}^*=\mathbb {Q}\setminus \{0\}\) and \(\mathbb {C}^*=\mathbb {C}\setminus \{0\}\). Observe that \(\mathbb {Z}^{*}=\mathbb {Z}\setminus \{0\}\) with usual multiplication is not a group, as elements other than 1 and \(-1\) have no multiplicative inverse in \(\mathbb {Z}^{*}\).

Example 1.35

Consider \(\mathbb {R}^{+}\), the set of all positive real numbers under usual multiplication. We can show that \(\left( \mathbb {R}^{+},. \right) \) is an Abelian group. Similarly, we can show that \(\left( \mathbb {Q}^{+}, . \right) \), where \(\mathbb {Q}^{+}\) denotes the set of all positive rational numbers, is an Abelian group.

Example 1.36

The set \(\mathbb {Z}_n = \lbrace 0,1,2, \ldots ,n-1 \rbrace \), for \(n \ge 1\), is a group under the operation addition modulo n, denoted by \(+_n\). The basic operation is usual addition of elements, which ends by reducing the sum of the elements modulo n, that is, taking the integer remainder when the sum of the elements is divided by n. This group is usually referred to as the group of integers modulo n. Consider, for example, the operation table of \(\left( \mathbb {Z}_4,+_4 \right) \):

$$\begin{aligned}\begin{array}{c|cccc} +_4 & 0 & 1 & 2 & 3 \\ \hline 0 & 0 & 1 & 2 & 3 \\ 1 & 1 & 2 & 3 & 0 \\ 2 & 2 & 3 & 0 & 1 \\ 3 & 3 & 0 & 1 & 2 \end{array}\end{aligned}$$

Such a group operation table is called a Cayley table. A Cayley table, named after the British mathematician Arthur Cayley (1821–1895), illustrates the structure of a finite group by arranging all the possible products of the group’s elements in a square table resembling an addition or multiplication table.
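A Cayley table for \(\left( \mathbb {Z}_n,+_n \right) \) is easy to generate programmatically. The sketch below (helper name ours) prints the table for \(n=4\):

```python
# Cayley table of (Z_n, +_n): entry (i, j) is (i + j) mod n.
def cayley_table(n):
    return [[(i + j) % n for j in range(n)] for i in range(n)]

for row in cayley_table(4):
    print(row)
# [0, 1, 2, 3]
# [1, 2, 3, 0]
# [2, 3, 0, 1]
# [3, 0, 1, 2]
```

Note that each row and column is a permutation of the group elements, a property every Cayley table shares.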

Example 1.37

A one-one function from a set S onto itself is called a permutation. Consider the set \(S= \lbrace 1,2, \ldots ,n \rbrace \). Let \(S_n\) denote the set of all permutations of S. Then \(S_n\) is a group under the operation of function composition, called the symmetric group on n letters; it is non-Abelian for \(n \ge 3\). Permutations of finite sets are represented by an explicit listing of each element of the domain and its corresponding image. For example, the elements of \(S_3\) can be listed as \(\left\{ \rho _0=\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \rho _1=\begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \rho _2=\begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \mu _1=\begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \mu _2=\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \mu _3=\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix} \right\} \)
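Composition in \(S_3\) can be carried out mechanically. In the sketch below a permutation is encoded as the tuple of images \((p(1),p(2),p(3))\) (an encoding we choose for illustration); composing \(\rho _1\) and \(\mu _1\) in the two possible orders gives different results, confirming that \(S_3\) is non-Abelian:

```python
# Permutations of {1, 2, 3} as tuples of images: p = (p(1), p(2), p(3)).
rho1 = (2, 3, 1)
mu1 = (1, 3, 2)

def compose(f, g):
    # (f o g)(x) = f(g(x)); tuples hold 1-based images.
    return tuple(f[g[x - 1] - 1] for x in (1, 2, 3))

print(compose(rho1, mu1))   # (2, 1, 3), i.e. mu_3
print(compose(mu1, rho1))   # (3, 2, 1), i.e. mu_2
# The two results differ, so composition in S_3 is not commutative.
```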

Theorem 1.7

Let \((G,*)\) be a group. Then

  1. (a)

    the identity element is unique.

  2. (b)

    each element in G has a unique inverse.

Definition 1.32

(Subgroup) A subset H of a group \((G,*)\) is said to be a subgroup of G, if H is a group with respect to the operation \(*\) in G. Let \(H \le G\) denote that H is a subgroup of G and \(H<G\) denote that H is a subgroup of G, but \(H \ne G\).

Example 1.38

We have \(\left( \mathbb {Z},+ \right) < \left( \mathbb {Q},+ \right) < \left( \mathbb {R},+ \right) \). But \(\left( \mathbb {Z}_n,+_n \right) \) is not a subgroup of \(\left( \mathbb {R},+ \right) \), even though \(\mathbb {Z}_n \subset \mathbb {R}\) as sets, since the operations used are different.

Example 1.39

Consider the permutation  group \(S_3\). Then \(\lbrace \rho _0 \rbrace , \lbrace \rho _0,\mu _1 \rbrace , \lbrace \rho _0,\mu _2 \rbrace ,\lbrace \rho _0,\mu _3 \rbrace \) and \( \lbrace \rho _0,\rho _1 ,\rho _2 \rbrace \) are subgroups of \(S_3\).

Definition 1.33

(Order of a Group) Let \((G,*)\) be a group, then the order of G is the number of elements in G.

Example 1.40

Observe that \(\left( \mathbb {Z},+ \right) , \left( \mathbb {Q},+ \right) , \left( \mathbb {R},+ \right) \), and \(\left( \mathbb {C},+ \right) \) are groups of infinite order and \(\left( \mathbb {Z}_n,+_n \right) \) is a group of order n. Also observe that \(S_n\) has order n!.

Definition 1.34

(Order of an element) Let \((G,*)\) be a group. Then the order of an element \(g \in G\), denoted by \(\mathcal {O}(g)\), is the least positive integer n such that \(g^n=e\), where e is the identity in G; if no such n exists, g is said to have infinite order. Clearly, the identity element in a group G has order 1.

Example 1.41

Consider the group \(\left( \mathbb {R},+ \right) \). Then no element other than 0 in \(\mathbb {R}\) has finite order. This is because repeated addition of a non-zero real number will never give us 0.

Example 1.42

Consider a finite group, say \(\left( \mathbb {Z}_4,+_4 \right) \). Then \(\mathcal {O}(0)=1,\mathcal {O}(1)=4,\mathcal {O}(2)=2\), and \(\mathcal {O}(3)=4\). It is easy to observe that, in a finite group G, every element has finite order. Consider another example, \(S_3\). Then \(\mathcal {O}\left( \rho _0 \right) =1,\mathcal {O}\left( \rho _1 \right) =\mathcal {O}\left( \rho _2 \right) =3\), and \(\mathcal {O}\left( \mu _1 \right) =\mathcal {O}\left( \mu _2 \right) = \mathcal {O}\left( \mu _3 \right) =2\).
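The orders in \(\left( \mathbb {Z}_4,+_4 \right) \) can be computed by repeated addition; a small sketch (helper name ours):

```python
# Order of g in (Z_n, +_n): least k >= 1 with k*g congruent to 0 mod n.
def order_mod(g, n):
    k, acc = 1, g % n
    while acc != 0:
        acc = (acc + g) % n
        k += 1
    return k

print([order_mod(g, 4) for g in range(4)])   # [1, 4, 2, 4]
```

The output matches the orders \(\mathcal {O}(0)=1,\mathcal {O}(1)=4,\mathcal {O}(2)=2\), and \(\mathcal {O}(3)=4\) computed above.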

Remark 1.4

A set G together with a binary operation \('*'\) defined on it is called a Groupoid or Magma. If \('*'\) satisfies associative property also, then \((G,*)\) is called a Semi-group. A semi-group  containing an identity element is called a Monoid.

Definition 1.35

(Group Homomorphism) Let \(\left( G, *\right) \) and \(\left( G',*'\right) \) be any two groups. A map \(\phi \) from G to \(G'\) satisfying \(\phi \left( g_1 * g_2\right) = \phi \left( g_1\right) *' \phi \left( g_2\right) ,\ \forall \ g_1,g_2 \in G\) is called a group homomorphism. If \(\phi \) is one-one and onto, we say that \(\phi \) is an isomorphism or \(\left( G, *\right) \) and \(\left( G',*'\right) \) are isomorphic, denoted by \(G \cong G'\).

Definition 1.36

(Kernel of a Homomorphism) The kernel of a homomorphism of a group G to a group \(G'\) with identity \(e'\) is the set of all elements in G which are mapped to \(e'\). That is, \(Ker\left( \phi \right) = \lbrace g \in G \mid \phi \left( g\right) =e' \rbrace \).

Example 1.43

Consider the groups \(\left( \mathbb {R},+ \right) \) and \(\left( \mathbb {R}^{+}, . \right) \). We will show that they are isomorphic. Define \(\phi : \mathbb {R} \rightarrow \mathbb {R}^{+}\) by \(\phi (x) = e^x\). Then for \(x_1,x_2 \in \mathbb {R}\),

$$\begin{aligned} \phi \left( x_1+x_2 \right) =e^{x_1+x_2}=e^{x_1}. e^{x_2}= \phi (x_1). \phi (x_2) \end{aligned}$$

Therefore \(\phi \) is a homomorphism from \(\mathbb {R}\) to \(\mathbb {R}^{+}\). Also, we can easily verify that \(\phi \) is one-one and onto \(\mathbb {R}^{+}\) (note that \(\phi \) is not onto \(\mathbb {R}^{*}\), as negative numbers have no pre-image). Thus \(\left( \mathbb {R},+ \right) \cong \left( \mathbb {R}^{+}, . \right) \). Now let us find the kernel of \(\phi \). By definition, \(Ker\left( \phi \right) \) is the set of all elements of the domain which are mapped to the identity element in the co-domain, in this case, 1. Therefore \(Ker\left( \phi \right) = \lbrace x \in \mathbb {R} \mid \phi (x)=e^x=1 \rbrace = \lbrace 0 \rbrace \).

Example 1.44

Consider \(\left( \mathbb {Z},+ \right) \) and \(\left( \mathbb {Z}_n,+_n \right) \). Define \(\phi : \mathbb {Z} \rightarrow \mathbb {Z}_n\) by \(\phi (m)=r\), where r is the remainder when m is divided by n. Let us check whether \(\phi \) is a homomorphism or not. Take two elements \(m_1,m_2 \in \mathbb {Z}\). By division algorithm, we can write \(m_i=q_in+r_i\) with \(0 \le r_i <n\), where \(i=1,2\) and hence \(\phi (m_1)=r_1\) and \(\phi (m_2)=r_2\). Observe that \(m_1+m_2=(q_1+q_2)n+r_1+r_2\). Therefore, we can say that \(\phi (m_1+m_2)\) is the remainder when \(r_1+r_2\) is divided by n. That is, \(\phi (m_1+m_2)=r_1 +_n r_2\). Also \(\phi (m_1) +_n \phi (m_2)=r_1 +_n r_2\). Thus \(\phi \) is a homomorphism. Now the set of all elements mapped to \(0 \in \mathbb {Z}_n\) are integer multiples of n. That is, \(Ker\left( \phi \right) = <n>\).
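The homomorphism property of \(\phi (m)= m \bmod n\) can be spot-checked over a finite range of integers (a sketch; the range and the choice \(n=5\) are ours):

```python
from itertools import product

n = 5
phi = lambda m: m % n

# phi(m1 + m2) must equal phi(m1) +_n phi(m2) for every pair checked.
ok = all(phi(m1 + m2) == (phi(m1) + phi(m2)) % n
         for m1, m2 in product(range(-20, 21), repeat=2))
print(ok)   # True
```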

Example 1.45

Consider the map \(\phi : (\mathbb {R},+) \rightarrow (\mathbb {R}^{*},.)\) defined by \(\phi (x) =x^2\). Then for \(x_1,x_2 \in \mathbb {R}\), in general,

$$\begin{aligned} \phi \left( x_1+x_2 \right) = \left( x_1+x_2 \right) ^2 \ne x_1^2.x_2^2 = \phi \left( x_1 \right) .\phi \left( x_2 \right) \end{aligned}$$

For instance, taking \(x_1=x_2=1\), we get \(\phi \left( x_1+x_2 \right) =4\), while \(\phi \left( x_1 \right) .\phi \left( x_2 \right) =1\). Thus \(\phi \) is not a homomorphism.

Example 1.46

Consider \(\left( \mathbb {R}^{*}, . \right) \). Define a map \(\phi : \mathbb {R}^{*} \rightarrow \mathbb {R}^{*}\) by \(\phi (x)= |x |\). Then for \(x_1,x_2 \in \mathbb {R}^*\) , we have

$$\begin{aligned}\phi \left( x_1x_2 \right) =|x_1x_2 |= |x_1 ||x_2 |=\phi \left( x_1 \right) \phi \left( x_2 \right) \end{aligned}$$

Thus \(\phi \) is a homomorphism from \(\mathbb {R}^{*}\) to itself. Observe that \(Ker\left( \phi \right) = \lbrace x \in \mathbb {R}^* \mid \left| x \right| =1 \rbrace = \lbrace -1,1 \rbrace \). Thus \(\phi \) is not one-one (Why?). Also \(\phi \) is not onto as only positive real numbers have pre-images. Therefore \(\phi \) is not an isomorphism.

Theorem 1.8

Let \(\phi \) be a homomorphism from a group \( \left( G,*\right) \) to \(\left( G',*'\right) \). Then

  1. (a)

    if e is the identity element in G, \(\phi (e)\) is the identity element in \( G'\).

  2. (b)

    \(Ker \left( \phi \right) \) is a subgroup of G.

  3. (c)

    for any \(g \in G\), if \(\mathcal {O}(g)\) is finite \(\mathcal {O}\left( \phi (g) \right) \) divides \(\mathcal {O}(g)\).

  4. (d)

    for any subgroup H of G, \(\phi \left( H \right) \) is a subgroup of \(\phi \left( G \right) \) and if H is Abelian, \(\phi \left( H \right) \) is also Abelian.

Two algebraic structures \(\left( G,* \right) \) and \(\left( G',*' \right) \) are isomorphic if there exists a one-one, onto homomorphism from G to \(G'\). But it can be difficult to show that \(\left( G,* \right) \) and \(\left( G',*' \right) \) are not isomorphic directly from the definition, as this means showing that no one-one homomorphism from G onto \(G'\) exists, and we cannot examine every possible function. In such cases, we use structural properties of an algebraic structure, which are properties that must be shared by any isomorphic structure. Cardinality is an example of a structural property.

Example 1.47

In Remark 1.3, we have seen that \(\mathbb {R}\) is an uncountable set and \(\mathbb {Z}\) is a countable set. Hence \((\mathbb {R},+)\) and \((\mathbb {Z},+)\) are not isomorphic.

Theorem 1.9

(Cyclic subgroup) Let \((G,*)\) be a group and \(g \in G\). Then the set \(\lbrace g^n \mid n \in \mathbb {Z} \rbrace \) is a subgroup of G, called the cyclic subgroup of G generated by g, denoted by \(<g>\).

If the group \(G=<g>\) for some \(g \in G\), then G is called a cyclic group and g is called a generator of G.

Example 1.48

\(\left( \mathbb {Z},+ \right) \) is a cyclic group with two generators, 1 and \(-1\).

Example 1.49

\(\left( \mathbb {Z}_n,+_n \right) \) is a cyclic group. The generators are the elements \(m\in \mathbb {Z}_n \) with \(gcd(m,n)=1\), where gcd(m, n) denotes the greatest common divisor of m and n (verify).
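This characterization of the generators can be verified directly. The sketch below (helper names ours) lists the elements of \(\mathbb {Z}_{12}\) coprime to 12 and confirms that one of them generates all of \(\mathbb {Z}_{12}\) while a non-coprime element does not:

```python
from math import gcd

# Candidate generators of (Z_n, +_n): elements coprime to n.
def generators(n):
    return [m for m in range(1, n) if gcd(m, n) == 1]

# Cyclic subgroup of (Z_n, +_n) generated by m.
def generated(m, n):
    return sorted({(k * m) % n for k in range(n)})

print(generators(12))                        # [1, 5, 7, 11]
print(generated(5, 12) == list(range(12)))   # True: 5 generates Z_12
print(generated(4, 12))                      # [0, 4, 8]: 4 does not
```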

Theorem 1.10

Let \((G,*)\) be a cyclic group with generator g. If \(\mathcal {O}(G)\) is finite, then \((G,*) \cong \left( \mathbb {Z}_n,+_n \right) \) and if \(\mathcal {O}(G)\) is infinite, then \((G,*) \cong \left( \mathbb {Z},+\right) \).

Example 1.50

By Example 1.47, \((\mathbb {R},+)\) is not a cyclic group.

Definition 1.37

(Coset) Let \((G,*)\) be a group and H a subgroup of G. Then \(gH=\lbrace g*h \mid h \in H \rbrace \) is called the left coset of H in G containing g, and \(Hg=\lbrace h*g \mid h \in H \rbrace \) is called the right coset of H in G containing g.

Example 1.51

Consider \(\left( \mathbb {Z}_8,+_8 \right) \) and the subgroup \(H= \lbrace 0,2,4,6 \rbrace \) of \(\mathbb {Z}_8\). Then

$$\begin{aligned}0H = \lbrace 0,2,4,6 \rbrace = 2H = 4H =6H \end{aligned}$$
$$\begin{aligned}1H = \lbrace 1,3,5,7 \rbrace = 3H = 5H =7H \end{aligned}$$

Also observe that as \(\left( \mathbb {Z}_8,+_8 \right) \) is an Abelian group, the left and right cosets of each element coincide.
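The cosets above can be recomputed in a few lines; in additive notation the coset gH is \(\lbrace (g+h) \bmod 8 \mid h \in H \rbrace \) (a sketch):

```python
# Cosets of H = {0, 2, 4, 6} in (Z_8, +_8).
H = {0, 2, 4, 6}
cosets = {frozenset((g + h) % 8 for h in H) for g in range(8)}
for c in sorted(cosets, key=min):
    print(sorted(c))
# [0, 2, 4, 6]
# [1, 3, 5, 7]
```

Only two distinct cosets appear, and together they partition \(\mathbb {Z}_8\).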

Example 1.52

Consider the subgroup \(H= \lbrace \rho _0,\mu _1 \rbrace \) in \(S_3\). Then

$$\begin{aligned}\rho _0 H = \lbrace \rho _0,\mu _1 \rbrace = \mu _1 H \end{aligned}$$
$$\begin{aligned}\rho _1 H = \lbrace \rho _1,\mu _3 \rbrace = \mu _3 H \end{aligned}$$
$$\begin{aligned}\rho _2 H = \lbrace \rho _2,\mu _2 \rbrace = \mu _2 H \end{aligned}$$

are the distinct left cosets of H in G and

$$\begin{aligned} H \rho _0 = \lbrace \rho _0,\mu _1 \rbrace = H \mu _1 \end{aligned}$$
$$\begin{aligned} H \rho _1 = \lbrace \rho _1,\mu _2 \rbrace = H \mu _2 \end{aligned}$$
$$\begin{aligned} H \rho _2 = \lbrace \rho _2,\mu _3 \rbrace = H \mu _3 \end{aligned}$$

are the distinct right cosets of H in G.

Theorem 1.11

(Lagrange’s Theorem) Let G be a finite group and H be a subgroup of G, then \(\mathcal {O}(H)\) divides \(\mathcal {O}(G)\). Moreover, the number of distinct left/right cosets of H in G is \(\dfrac{\mathcal {O}(G)}{\mathcal {O}(H)}\).

Example 1.53

In Example 1.51, \(H= \lbrace 0,2,4,6 \rbrace \) and \(G=\mathbb {Z}_8\). We have \(\mathcal {O}\left( H \right) =4\) and \(\mathcal {O}\left( G \right) =8\). Clearly, \(\mathcal {O}(H)\) divides \(\mathcal {O}(G)\) and the number of distinct left/right cosets of H in G is \(\dfrac{\mathcal {O}(G)}{\mathcal {O}(H)}=2\)

Example 1.54

In Example 1.52, \(H= \lbrace \rho _0,\mu _1 \rbrace \) and \(G=S_3\). We have \(\mathcal {O}\left( H \right) =2\) and \(\mathcal {O}\left( G \right) =6\). Clearly, \(\mathcal {O}(H)\) divides \(\mathcal {O}(G)\) and the number of distinct left/right cosets of H in G is \(\dfrac{\mathcal {O}(G)}{\mathcal {O}(H)}=3\).

Definition 1.38

(Normal Subgroup) A subgroup H of G is called a normal subgroup of G if \(gH=Hg\) for all \(g \in G\).

Example 1.55

From Example 1.51, \(H= \lbrace 0,2,4,6 \rbrace \) is a normal subgroup of \(\left( \mathbb {Z}_8,+_8 \right) \). In fact, every subgroup of an Abelian group is a normal subgroup (verify).

Example 1.56

From Example 1.52, \(H= \lbrace \rho _0,\mu _1 \rbrace \) is not a normal subgroup of \(S_3\).

Theorem 1.12

(Factor Group) Let \((G,*)\) be a group and H be a normal subgroup. Then the set \(G/H= \lbrace gH \mid g \in G \rbrace \) is a group under the operation \(*'\), where \(*'\) is defined by \(\left( g_1H \right) *' \left( g_2H \right) = (g_1*g_2)H \).

Example 1.57

In Example 1.55 we have seen that \(H= \lbrace 0,2,4,6 \rbrace \) is a normal subgroup of \(\left( \mathbb {Z}_8,+_8 \right) \). From Example 1.51, \(G/H = \lbrace 0H,1H \rbrace \). Then G/H is a group, with the operation \(*'\) defined as \(\left( 0H \right) *' \left( 0H \right) =\left( 0H \right) ,\left( 0H \right) *'\left( 1H \right) =\left( 1H \right) *'\left( 0H \right) =\left( 1H \right) \), and \(\left( 1H \right) *'\left( 1H \right) =\left( 0H \right) \).

Example 1.58

Consider the group \(\left( \mathbb {Z},+ \right) \). Clearly \(3\mathbb {Z} = \lbrace \ldots , -6,-3,0,3,6, \ldots \rbrace \) is a normal subgroup of \(\mathbb {Z}\). Then \(\mathbb {Z}/3\mathbb {Z}= \lbrace 0\left( 3 \mathbb {Z} \right) ,1\left( 3 \mathbb {Z} \right) ,2 \left( 3 \mathbb {Z} \right) \rbrace \) is a group, with the operation \(*'\) defined as follows: \(0\left( 3 \mathbb {Z} \right) *' g\left( 3 \mathbb {Z} \right) = g\left( 3 \mathbb {Z} \right) *' 0\left( 3 \mathbb {Z} \right) = g\left( 3 \mathbb {Z} \right) \) for each g, \(1\left( 3 \mathbb {Z} \right) *' 1\left( 3 \mathbb {Z} \right) =2\left( 3 \mathbb {Z} \right) \), \(1\left( 3 \mathbb {Z} \right) *' 2\left( 3 \mathbb {Z} \right) = 2\left( 3 \mathbb {Z} \right) *' 1\left( 3 \mathbb {Z} \right) = 0\left( 3 \mathbb {Z} \right) \), and \(2\left( 3 \mathbb {Z} \right) *' 2\left( 3 \mathbb {Z} \right) = 1\left( 3 \mathbb {Z} \right) \).

Theorem 1.13

(First Isomorphism Theorem) Let \(\phi \) be a homomorphism from a group G to a group \(G'\). Then the mapping \(\Psi : G/Ker \left( \phi \right) \rightarrow G'\) given by \(\Psi \left( g Ker ( \phi ) \right) = \phi \left( g \right) \) is an isomorphism. That is, \( G/Ker \left( \phi \right) \cong \phi \left( G \right) \).

Example 1.59

In Example 1.44, we have seen that \(\phi (m)=m\ mod\ n\) is a homomorphism from \(\left( \mathbb {Z},+ \right) \) to \(\left( \mathbb {Z}_n,+_n \right) \) with \(Ker\left( \phi \right) = <n>\). Therefore, by Theorem 1.13, \(\mathbb {Z}/<n> \cong \mathbb {Z}_n\).

Definition 1.39

(Ring) A non-empty set \(\mathcal {R}\) together with two operations \('+'\) and \('.'\), known as addition and multiplication, respectively, is called a ring (denoted by \(\langle \mathcal {R},+,. \rangle \)) if the following conditions are satisfied:

  1. (a)

    \((\mathcal {R},+)\) is an Abelian group.

  2. (b)

    \((\mathcal {R},.)\) is a semi-group.

  3. (c)

    For all \(r_1,r_2,r_3 \in \mathcal {R}\)

    $$\begin{aligned}r_1.(r_2+r_3)=r_1.r_2+r_1.r_3\ \text {(left\ distributive\ law)} \end{aligned}$$
    $$\begin{aligned}(r_1+r_2).r_3=r_1.r_3+r_2.r_3\ \text {(right\ distributive\ law)} \end{aligned}$$

If there exists a non-zero element \(1\in \mathcal {R}\) such that for every element \(r \in \mathcal {R}\), \(r.1=r=1.r\), then \(\langle \mathcal {R},+,. \rangle \) is called a ring with unity and if multiplication is also commutative, then the ring is called a commutative ring.

Example 1.60

The set of all real numbers under usual addition and multiplication is a commutative ring with unity. From Example 1.33, we have \((\mathbb {R},+)\) is an Abelian group. Clearly, the usual multiplication \(' . '\) is closed, associative, and commutative over \(\mathbb {R}\). Also \(1 \in \mathbb {R}\) acts as unity and the distributive laws are satisfied. Similarly, \(\langle \mathbb {C},+, . \rangle \), \(\langle \mathbb {Q},+, . \rangle \), and \(\langle \mathbb {Z},+, . \rangle \) are commutative rings with unity.

Example 1.61

The set \(\mathbb {Z}_n = \lbrace 0,1,2, \ldots ,n-1 \rbrace \), for \(n \ge 1\), under the operations addition and multiplication modulo n (taking the integer remainder when the product is divided by n) is a ring with unity 1.

Definition 1.40

(Sub-Ring) A sub-ring of a ring \(\mathcal {R}\) is a subset of \(\mathcal {R}\) that is a ring under the induced operations from \(\mathcal {R}\).

Example 1.62

Clearly \(\langle \mathbb {Z},+, . \rangle \) is a sub-ring of \(\langle \mathbb {Q},+, . \rangle \). Also \(\langle \mathbb {Q},+, . \rangle \) is a sub-ring of \(\langle \mathbb {R},+, . \rangle \), which is again a sub-ring of \(\langle \mathbb {C},+, . \rangle \).

Example 1.63

\(\mathbb {Z}_n\), for \(n \ge 1\), is a ring under the operations addition modulo n and multiplication modulo n (denoted by \(\times _n\)). As with \(+_n\), the operation \(\times _n\) multiplies two elements as integers and then reduces the result modulo n; that is, it takes the integer remainder when the product is divided by n.

Definition 1.41

(Division Ring) Let \(\langle \mathcal {R},+,. \rangle \) be a ring with unity \('1'\). An element \(r \in \mathcal {R}\) is a unit of \(\mathcal {R}\) if it has multiplicative inverse in \(\mathcal {R}\). That is, if there exists an element \(r^{-1} \in \mathcal {R}\) such that \(r.r^{-1}=1=r^{-1}.r\). If every non-zero element in \(\mathcal {R}\) is a unit, then \(\mathcal {R}\) is called a division ring or skew-field.

Example 1.64

\(\langle \mathbb {R},+, . \rangle \) is a division ring as for any \(r(\ne 0 ) \in \mathbb {R}\), there exists \(\frac{1}{r} \in \mathbb {R}\) such that \(r \cdot \frac{1}{r} =1=\frac{1}{r} \cdot r\).

Theorem 1.14

An element \(m \in \mathbb {Z}_n\) is a unit if and only if \(gcd(m,n)=1\).
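Theorem 1.14 is easy to test computationally; the following minimal Python sketch (an illustrative aside, not from the text) finds the units of \(\mathbb {Z}_n\) by brute force and compares them with the gcd criterion:

```python
from math import gcd

def units(n):
    """Elements of Z_n that have a multiplicative inverse mod n."""
    return [m for m in range(1, n) if any((m * x) % n == 1 for x in range(1, n))]

# Theorem 1.14: m is a unit of Z_n exactly when gcd(m, n) = 1.
n = 12
assert units(n) == [m for m in range(1, n) if gcd(m, n) == 1]
print(units(12))  # [1, 5, 7, 11]
```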

Corollary 1.1

\(\mathbb {Z}_n\) is a division ring if and only if n is a prime.

Definition 1.42

(Field) A field is a commutative division ring. In other words, \(\langle \mathcal {R},+,. \rangle \) is a field if the following conditions are satisfied:

  1. (a)

    \((\mathcal {R},+)\) is an Abelian group.

  2. (b)

    \(\left( \mathcal {R}\setminus \lbrace 0 \rbrace ,.\right) \) is an Abelian group.

Example 1.65

The set of all real numbers \(\mathbb {R}\) under usual addition and multiplication is a field. Similarly, the set of all complex numbers \(\mathbb {C}\) and the set of all rational numbers \(\mathbb {Q}\) under usual addition and multiplication are fields.

Example 1.66

From Corollary 1.1, the set \(\mathbb {Z}_n\) is a field under the operations addition and multiplication modulo n, if and only if n is a prime (Why?). Thus, when n is a prime, \(\langle \mathbb {Z}_n,+_n, \times _{n} \rangle \) is an example of a finite field.

Example 1.67

The set of all integers \(\mathbb {Z}\) under usual addition and multiplication is not a field as it is not a division ring. But \(\mathbb {Z}\) is a commutative ring with unity.

Definition 1.43

(Sub-Field) A sub-field of a field is a subset of the field that is a field under the induced operations from the field.

Example 1.68

Clearly \(\langle \mathbb {Q},+, . \rangle \) is a sub-field of \(\langle \mathbb {R},+, . \rangle \) which is again a sub-field of \(\langle \mathbb {C},+, . \rangle \).

4 Polynomials

Polynomials are a type of mathematical expression built by combining variables and constants using the operations of addition, subtraction, and multiplication. They are an important tool in mathematics, as many mathematical problems can be encoded into polynomial equations. In this section, we will discuss some of the important properties of polynomials in one variable.

Definition 1.44

(Ring of polynomials) Let \(\mathbb {K}\) be a field. Consider the set

$$\begin{aligned}\mathbb {K}[x ]=\lbrace a_0+a_1x +\cdots +a_{n-1}x^{n-1}+a_nx^n \mid a_i \in \mathbb {K}, n\in \mathbb {Z}^{+} \rbrace \end{aligned}$$

\(a_i \in \mathbb {K}\) are called the coefficients of the polynomial, and the highest power of x with a non-zero coefficient is called the degree of the polynomial. For \(f(x )=a_0+a_1x +\cdots +a_nx^n,g(x )=b_0+b_1x +\cdots +b_mx^m \in \mathbb {K}[x ] \), define

$$\begin{aligned}f(x )+g(x )= (a_0+b_0)+(a_1+b_1)x +\cdots +(a_{k-1}+b_{k-1})x^{k-1}+(a_k+b_k)x^k \end{aligned}$$

where \(k=max(m,n)\), \(a_i=0\) for \(i>n\) and \(b_i=0\) for \(i>m\). Also

$$\begin{aligned} f(x )g(x )= c_0+c_1x +\cdots +c_{m+n-1}x^{m+n-1}+c_{m+n}x^{m+n} \end{aligned}$$

where \(c_k=a_kb_0+a_{k-1}b_1+\cdots +a_1b_{k-1}+a_0b_k\) for \(k=0,1,\ldots ,m+n\). Then \(\mathbb {K}[x ]\) forms a ring with respect to the operations defined above, called the ring of polynomials over \(\mathbb {K}\) in the indeterminate x.
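The two operations in Definition 1.44 translate directly into code on coefficient lists; the following minimal Python sketch (an illustrative aside, with integer coefficients standing in for elements of \(\mathbb {K}\)) implements them:

```python
def poly_add(f, g):
    """Add coefficient lists (index i holds the coefficient of x^i)."""
    k = max(len(f), len(g))
    f = f + [0] * (k - len(f))   # pad with a_i = 0 for i > n
    g = g + [0] * (k - len(g))   # pad with b_i = 0 for i > m
    return [a + b for a, b in zip(f, g)]

def poly_mul(f, g):
    """Multiply coefficient lists: c_k is the sum of a_i * b_j over i + j = k."""
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return c

f = [1, 2]       # f(x) = 1 + 2x
g = [3, 0, 1]    # g(x) = 3 + x^2
print(poly_add(f, g))  # [4, 2, 1]    i.e. 4 + 2x + x^2
print(poly_mul(f, g))  # [3, 6, 1, 2] i.e. 3 + 6x + x^2 + 2x^3
```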

Remark 1.5

If the coefficient of the highest power of x is the multiplicative identity of \(\mathbb {K}\), then the polynomial is called a monic polynomial. Two elements in \(\mathbb {K}[x]\) are equal if and only if they have the same coefficients for all powers of x.

Theorem 1.15

(Division Algorithm) Let \(\mathbb {K}\) be a field and let \(f(x ),g(x ) \in \mathbb {K}[x ]\) with \(g(x ) \ne 0\). Then there exist unique polynomials \(q(x ),r(x ) \in \mathbb {K}[x ]\) such that \( f(x )=g(x )q(x )+r(x )\) and either \(r(x )=0\) or \(deg[r(x )]<deg[g(x )]\). If \(r(x )=0\), we have \( f(x )=g(x )q(x )\) and we say that g(x) is a factor of f(x).
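The division algorithm is constructive, and the construction can be sketched in Python (an illustrative aside over \(\mathbb {K}=\mathbb {Q}\), using exact rational arithmetic):

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide f by g over Q: return (q, r) with f = g*q + r, deg r < deg g.

    Coefficient lists: index i holds the coefficient of x^i.
    """
    f = [Fraction(a) for a in f]
    g = [Fraction(b) for b in g]
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while len(r) >= len(g) and any(r):
        d = len(r) - len(g)            # degree gap
        c = r[-1] / g[-1]              # leading coefficient of the quotient term
        q[d] = c
        for i, b in enumerate(g):      # subtract c * x^d * g(x)
            r[i + d] -= c * b
        while r and r[-1] == 0:        # trim trailing zero coefficients
            r.pop()
    return q, r

# x^2 + 1 divided by x - 1 gives q = x + 1 and r = 2.
q, r = poly_divmod([1, 0, 1], [-1, 1])
assert q == [1, 1] and r == [2]
```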

Theorem 1.16

Let \(\mathbb {K}\) be a field and let \(f(x ),g(x ) \in \mathbb {K}[x ]\). The greatest common divisor of f(x) and g(x), denoted by \(\left( f(x ),g(x )\right) \), is the unique monic polynomial \(r(x ) \in \mathbb {K}[x ]\) such that

  1. 1.

    r(x) is a factor of both f(x) and g(x).

  2. 2.

    if \(q(x )\in \mathbb {K}[x ]\) is a factor of both f(x) and g(x), then q(x) is a factor of r(x).

Moreover, there exist polynomials \(l(x ),m(x ) \in \mathbb {K}[x]\) such that

$$\begin{aligned}r(x)=l(x )f(x )+m(x )g(x ) \end{aligned}$$

Remark 1.6

If \(\left( f(x ),g(x )\right) =1 \), then we say that \(f(x ),g(x ) \in \mathbb {K}[ x ]\) are relatively prime.

Definition 1.45

(Zero of a polynomial) Let \(f(x ) \in \mathbb {K}[x ]\); an element \(\mu \in \mathbb {K}\) is called a zero (or a root) of f(x) if \(f(\mu )=0\).

Theorem 1.17

(Factor Theorem) Let \(\mathbb {K}\) be a field and \(f(x ) \in \mathbb {K}[x ]\). Then \(\mu \in \mathbb {K}\) is a zero of f(x) if and only if \(x - \mu \) is a factor of f(x).

Definition 1.46

(Algebraically Closed Field) A field \(\mathbb {K}\) is said to be an algebraically closed field, if every non-constant polynomial in \(\mathbb {K}[x ]\) has a root in \(\mathbb {K}\).

Theorem 1.18

(Fundamental Theorem of Algebra) The field of complex numbers is algebraically closed. In other words, every non-constant polynomial in \(\mathbb {C}[x]\) has at least one root in \(\mathbb {C}\).

From the above theorem, we can infer that every polynomial of degree \(n \ge 1\) in \(\mathbb {C}[x ]\) has exactly n roots in \(\mathbb {C}\), counted with multiplicity.

Example 1.69

Consider \(x^2+1 \in \mathbb {R}[x ]\). As the given polynomial has no root in \(\mathbb {R}\), the field of real numbers is not algebraically closed, whereas if we consider \(x^2+1\) as a polynomial in \(\mathbb {C}[x]\), it has roots in \(\mathbb {C}\).

Remark 1.7

(Vieta’s Formula) Let \(f(x )=a_0+a_1x +\cdots +a_nx^n \in \mathbb {K}[x ]\) with roots \(x_1 ,x_2, \ldots ,x_n \), then

$$\begin{aligned}x_1 + x_2+ \cdots +x_n = -\frac{a_{n-1}}{a_n} \end{aligned}$$
$$\begin{aligned}x_1 x_2 \cdots x_n = (-1)^n \frac{a_0}{a_n} \end{aligned}$$

It is named after the French mathematician François Viète (1540–1603).
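As a quick numerical illustration (not from the text), Vieta's formulas can be verified for \(f(x)=(x-1)(x-2)(x-3)=x^3-6x^2+11x-6\):

```python
# f(x) = -6 + 11x - 6x^2 + x^3 has roots 1, 2, 3.
roots = [1, 2, 3]
a0, a1, a2, a3 = -6, 11, -6, 1

# Sum of roots: x1 + x2 + x3 = -a_{n-1}/a_n = 6.
assert sum(roots) == -a2 / a3

# Product of roots: x1 x2 x3 = (-1)^n a_0/a_n = 6.
prod = 1
for x in roots:
    prod *= x
assert prod == (-1) ** 3 * a0 / a3
```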

5 Matrices

A matrix in mathematics is a rectangular arrangement of numbers, symbols, or functions in rows and columns. They are of great importance in mathematics and are widely used in linear algebra to study linear transformations which will be discussed in later chapters.

Definition 1.47

An \(m \times n\) matrix A over a field \(\mathbb {K}\) is a rectangular array of m rows and n columns of entries from \(\mathbb {K}\):

$$\begin{aligned}A=\begin{pmatrix} a_{11} &{} a_{12} &{} \ldots &{} a_{1n} \\ a_{21} &{} a_{22} &{} \ldots &{} a_{2n} \\ \vdots &{} \vdots &{} \vdots &{} \vdots \\ a_{m1} &{} a_{m2} &{} \ldots &{} a_{mn} \\ \end{pmatrix} \end{aligned}$$

Such a matrix, written as \(A =\left( a_{ij} \right) \), where \(1 \le i \le m\), \(1\le j \le n\) is said to be of size (or order) \(m \times n\). Two matrices are considered to be equal if they have the same size and same corresponding entries in all positions. \(\mathbb {M}_{m \times n}\left( \mathbb {K} \right) \) denotes the set of all \(m \times n\) matrices with entries from \(\mathbb {K}\).

Matrix Operations

Let us discuss some of the important operations defined on the collection of all matrices.

Definition 1.48

(Matrix Addition) Let \(A=\left( a_{ij}\right) \) and \(B=\left( b_{ij}\right) \), where \(1 \le i \le m\), \(1\le j \le n\) be any two elements of \(\mathbb {M}_{m \times n}\left( \mathbb {K} \right) \). Then \(A+B=\left( a_{ij}+b_{ij}\right) \in \mathbb {M}_{m \times n}\left( \mathbb {K} \right) \). Two matrices must have an equal number of rows and columns to be added.

Properties

For any matrices \(A, B\), and \(C\in \mathbb {M}_{m \times n}\left( \mathbb {K} \right) \)

  1. 1.

    \(A+B=B+A\). (Commutativity)

  2. 2.

    \(A+(B+C)=(A+B)+C\). (Associativity)

  3. 3.

    There exists a matrix \(O \in \mathbb {M}_{m \times n}\left( \mathbb {K} \right) \) with all entries 0 such that \(A+O=A\). (Existence of Identity)

  4. 4.

    There exists a matrix \(-A\) such that \(A+(-A)=O\). (Existence of Inverse)

Remark 1.8

\(\mathbb {M}_{m \times n}\left( \mathbb {K} \right) \) with matrix addition defined on it forms  an Abelian group.

Definition 1.49

(Matrix Multiplication) Let \(A=\left( a_{ij}\right) _{m \times n}\) and \(B=\left( b_{ij}\right) _{n \times p}\). Then their product \(AB \in \mathbb {M}_{m \times p}\) and its \((i,j)\textrm{th}\) entry is given by

$$\begin{aligned}a_{i1}b_{1j}+a_{i2}b_{2j}+\cdots +a_{in}b_{nj} \end{aligned}$$

For AB to make sense, the number of columns of A must equal the number of rows of B. Then we say that the size of matrices A and B are compatible for multiplication.
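The entry formula in Definition 1.49 can be written out directly; the following minimal Python sketch (an illustrative aside) implements it and reproduces the first product from Remark 1.9:

```python
def matmul(A, B):
    """Product of an m x n and an n x p matrix: the (i, j) entry of AB is
    a_i1 b_1j + a_i2 b_2j + ... + a_in b_nj."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "sizes not compatible"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, -1], [0, 2]]
B = [[3, 4, 5], [6, 0, 8]]
print(matmul(A, B))  # [[-3, 4, -3], [12, 0, 16]]
```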

Properties

For any matrices \(A, B\), and \(C\in \mathbb {M}_{n \times n}\left( \mathbb {K} \right) \)

  1. 1.

    \(A(BC)=(AB)C\) (Associativity)

  2. 2.

    \(A(B+C)=AB+AC\) and \((A+B)C=AC+BC\). (Distributive laws)

Remark 1.9

  1. 1.

    Matrix multiplication need not be commutative. For example, if \(A=\begin{pmatrix} 1 &{} -1 \\ 0 &{} 2 \end{pmatrix}\) and \(B=\begin{pmatrix} 3 &{} 4 &{} 5 \\ 6 &{} 0 &{} 8 \end{pmatrix}\) then \(AB= \begin{pmatrix} -3 &{} 4 &{} -3 \\ 12 &{} 0 &{} 16 \end{pmatrix}\). Note that BA is undefined. It need not be commutative even if BA is defined. For example, if \(A=\begin{pmatrix} 1 &{} -1 \\ 0 &{} 2 \end{pmatrix}\) and \(B=\begin{pmatrix} 3 &{} 4 \\ 6 &{} 0 \end{pmatrix}\) then \(AB= \begin{pmatrix} -3 &{} 4 \\ 12 &{} 0 \end{pmatrix}\) and \(BA= \begin{pmatrix} 3 &{} 5 \\ 6 &{} -6 \end{pmatrix}\).

  2. 2.

    The set of all invertible matrices over the field \(\mathbb {K}\) under matrix multiplication forms a non-Abelian group, denoted by \(GL_n\left( \mathbb {K} \right) \). Also observe that \(\mathbb {M}_{n \times n}\left( \mathbb {K} \right) \) forms a ring under the operations matrix addition and multiplication.

Definition 1.50

(Scalar Multiplication) Let \(A=\left[ a_{ij}\right] \in \mathbb {M}_{m \times n}\left( \mathbb {K} \right) \) and \(\lambda \in \mathbb {K}\), then \(\lambda A= \left[ \lambda a_{ij}\right] \in \mathbb {M}_{m \times n}\left( \mathbb {K} \right) \).

Properties

For any matrices \(A,B \in \mathbb {M}_{m \times n}\left( \mathbb {K} \right) \) and \(\lambda , \mu \in \mathbb {K}\)

  1. 1.

    \(\lambda (A+B) =\lambda A +\lambda B\)

  2. 2.

    \((\lambda + \mu )A=\lambda A+ \mu A\)

  3. 3.

    \(\lambda (\mu A)=(\lambda \mu )A\)

  4. 4.

    \(A(\lambda B)=\lambda (AB)=(\lambda A)B\).

Definition 1.51

(Transpose of a matrix) The transpose of an \(m \times n\) matrix \(A=\left[ a_{ij} \right] \) is the \(n \times m\) matrix (denoted by \(A^T\)), given by \(A^T=\left[ a_{ji} \right] \).

Properties

Let A and B be matrices of appropriate order, then

  1. 1.

    \(\left( A^T \right) ^T =A \)

  2. 2.

    \(\left( A+B \right) ^T =A^T+B^T \)

  3. 3.

    \(\left( AB \right) ^T = B^TA^T \)

  4. 4.

    \(\left( kA \right) ^T=kA^T \).

Definition 1.52

(Conjugate transpose of a matrix) The conjugate transpose of an \(m \times n\) matrix \(A=\left[ a_{ij} \right] \) is the \(n \times m\) matrix (denoted by \(A^{*}\)) given by \(A^{*}=\left[ \overline{a_{ji}} \right] \) where bar denotes complex conjugation (if \(a_{ij}=c+id\), then \(\overline{a_{ij}}=c-id\)).

Properties

Let A and B be matrices of appropriate orders and \(\lambda \) be a scalar, then

  1. 1.

    \(\left( A^{*} \right) ^{*} =A \)

  2. 2.

    \(\left( A+B \right) ^{*} =A^{*}+B^{*} \)

  3. 3.

    \(\left( AB \right) ^{*} = B^{*}A^{*} \)

  4. 4.

    \(\left( \lambda A \right) ^{*}=\overline{\lambda }A^{*} \), where \(\overline{\lambda }\) is the conjugate of \(\lambda \).

Definition 1.53

(Trace of a matrix) Let \(A=\left[ a_{ij} \right] \) be an \(n \times n\) matrix. The trace of A, denoted by tr(A), is the sum of diagonal entries; that is \(tr(A)= \sum _{i=1}^{n}a_{ii}\).

Properties

For any \(n \times n\) matrices A, B, C, and D and \(\lambda \in \mathbb {R}\), we have the following properties:

  1. 1.

    Trace is a linear function.

    \(tr(A+B) = tr(A) + tr(B)\)

    \(tr(\lambda A) = \lambda \ tr(A)\)

  2. 2.

    \(tr(A^T) = tr(A)\) and \(tr(A^{*}) = \overline{(trA)}\)

  3. 3.

    \(tr(AB) = tr(BA)\)

  4. 4.

    \(tr(ABCD)= tr(DABC)=tr(CDAB)= tr(BCDA)\)

  5. 5.

    \(tr(ABC) \ne tr(ACB)\) in general.

  6. 6.

    \(tr(AB) \ne tr(A).tr(B)\) in general.
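Properties 3 and 6 can be checked quickly; a minimal Python sketch (an illustrative aside) using the \(2 \times 2\) matrices from Remark 1.9:

```python
def matmul(A, B):
    """Product of two compatible matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    """Trace: sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

A = [[1, -1], [0, 2]]
B = [[3, 4], [6, 0]]

assert tr(matmul(A, B)) == tr(matmul(B, A))   # tr(AB) = tr(BA) = -3
assert tr(matmul(A, B)) != tr(A) * tr(B)      # -3, whereas tr(A) tr(B) = 9
```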

Definition 1.54

(Determinant of a matrix) For each square matrix A with entries in \(\mathbb {K}\) (\(\mathbb {K}=\mathbb {R}\) or \(\mathbb {C}\)), we can associate a single element of \(\mathbb {K}\), called the determinant of A, denoted by \(det\ (A)\).

If A is a \(1 \times 1\) matrix, i.e., \(A=\left[ a_{11} \right] \), then its determinant is defined by \(det(A)=a_{11}\). If A is a \(2 \times 2\) matrix, say \(A=\begin{bmatrix} a_{11} &{} a_{12} \\ a_{21} &{} a_{22} \end{bmatrix}\), then its determinant is defined by

$$\begin{aligned}det(A)=a_{11}a_{22}-a_{21}a_{12} \end{aligned}$$

The determinant for a square matrix with higher dimension n may be defined inductively as follows:

$$\begin{aligned}det\ (A)=\sum _{i=1}^{n}(-1)^{i+j}a_{ij}M_{ij} \end{aligned}$$

for a fixed j, where \(M_{ij}\) is the determinant of the \((n-1)\times (n-1)\) matrix obtained from A by deleting \(i\textrm{th}\) row and \(j\textrm{th}\) column, called minor of the element \(a_{ij}\).
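This inductive definition translates directly into a recursive procedure; the following minimal Python sketch (an illustrative aside) expands along the first column, i.e., fixes \(j=1\):

```python
def det(A):
    """Determinant by cofactor expansion along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # Minor M_{i1}: delete row i and the first column.
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det(minor)
    return total

assert det([[1, -1], [0, 2]]) == 2
assert det([[2, 0, 1], [1, 3, 2], [0, 1, 1]]) == 3
```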

Properties

Let A and B be any \(n \times n \) matrices and \(\lambda \) be any scalar, then

  1. 1.

    \(det\ (I_n)=\ 1\), where \(I_n\) is the \(n \times n\) identity matrix.

  2. 2.

    \(det\ (A^T) = det\ (A)\) and \(det\ (A^{*}) = \overline{det\ (A)}\).

  3. 3.

    \(det\ (AB)=det\ (A)\ det\ (B)\).

  4. 4.

    \(det\ (\lambda A)=\lambda ^n\ det\ (A)\).

  5. 5.

    If B is a matrix obtained from A by multiplying one row (or column) by a scalar \(\lambda \), then \(det\ (B)=\lambda \ det\ (A)\).

  6. 6.

    If B is a matrix obtained from A by interchanging any two rows (or columns) of A then \(det\ (B)=-\ det\ (A)\).

  7. 7.

    If two rows of a matrix are identical then the matrix has determinant zero.

  8. 8.

    If B is a matrix obtained from A by adding \(\lambda \) times one row (or column) of A to another row (or column) of A, then \(det\ (B)= det\ (A)\).

Remark 1.10

An \(n \times n\) matrix with determinant zero is called a singular matrix; otherwise, it is called a non-singular matrix.

Definition 1.55

(Adjoint of a Matrix) The adjoint of a matrix \(A=\left[ a_{ij}\right] _{n \times n} \) (denoted by \(adj\ (A)\)) is the transpose of the co-factor matrix, where co-factor matrix of \(A=\left[ a_{ij}\right] _{n \times n} \) is \(\left[ (-1)^{i+j}M_{ij}\right] _{n \times n} \), where \(M_{ij}\) is the determinant of the \((n-1)\times (n-1)\) matrix obtained from A by deleting \(i\textrm{th}\) row and \(j\textrm{th}\) column, called minor of the \(ij\textrm{th}\) element.

Properties

Let A and B be any \(n \times n \) matrices, then

  1. 1.

    \(adj(I_n)=I_n\)

  2. 2.

    \(adj(AB)=adj(B)\ adj(A)\)

  3. 3.

    \(adj(kA)=k^{n-1}adj(A)\)

  4. 4.

    \(adj(A^m)=(adj(A))^m\)

  5. 5.

    \(adj(A^T)=(adj(A))^T\)

  6. 6.

    \(A\ adj(A)= det(A)\ I = adj(A)\ A\)

  7. 7.

    \(det\ \left( adj(A)\right) =\left( det(A)\right) ^{n-1} \)

  8. 8.

    \(adj\ \left( adj(A) \right) =\left( det(A)\right) ^{n-2}A \).

Definition 1.56

(Inverse of a matrix) The inverse of a square matrix \(A_{n \times n}\), if it exists, is the matrix \(A^{-1}_{n \times n}\) such that \(AA^{-1}=I_n=A^{-1}A\), and is given by \(A^{-1}=\frac{1}{det(A)}adj(A)\).

Properties

Let A and B be any \(n \times n \) matrices and \(\lambda \) be any scalar, then

  1. 1.

    The inverse of a matrix, if it exists, is unique.

  2. 2.

    A is invertible if and only if \(det\ A \ne 0\).

  3. 3.

    \(\left( A^{-1}\right) ^{-1} =A\).

  4. 4.

    \( (kA)^{-1}=k^{-1}A^{-1}\), where \(k\ne 0\) is any scalar.

  5. 5.

    \(det\ (A^{-1})=\frac{1}{det\ (A)}\).

  6. 6.

    \(\left( AB \right) ^{-1}=B^{-1}A^{-1} \).

  7. 7.

    \(\left( A^T \right) ^{-1}=\left( A^{-1} \right) ^T \).

Remark 1.11

  1. 1.

    There are matrices for which \(AB=I\) but \(BA \ne I\). For example take \( A= \begin{bmatrix} 1 &2 \end{bmatrix}\) and \(B=\) \(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\). Then \(AB=\) \(\begin{bmatrix} 1 \end{bmatrix}\) \(=I\) and \(BA=\) \(\begin{bmatrix} 1 &{}2 \\ 0 &{}0 \end{bmatrix}\) \(\ne I\).

  2. 2.

    If \(A=\) \(\begin{bmatrix} a &{}b \\ c &{}d \end{bmatrix}\) is invertible, then \(A^{-1}\) is given by \(A^{-1}=\frac{1}{ad-bc}\) \(\begin{bmatrix} d &{}-b \\ -c &{} a \end{bmatrix}\).

  3. 3.

    Set of all \(n \times n\) non-singular matrices with entries from the field \(\mathbb {K}\) under matrix multiplication forms a non-Abelian group called general linear group, and is denoted by \(GL_n\left( \mathbb {K} \right) \).

    1. 1.

      For any matrices \(A,B \in GL_n\left( \mathbb {K} \right) \), \(AB \in GL_n\left( \mathbb {K} \right) \) (\( det(A),det(B) \ne 0 \Rightarrow det(AB) \ne 0 \)). (Closure property)

    2. 2.

      Matrix multiplication is associative.

    3. 3.

      \(I_n\in GL_n\left( \mathbb {K} \right) \) acts as identity matrix.

    4. 4.

      For each \(A \in GL_n\left( \mathbb {K} \right) \), we have \(det(A) \ne 0\) and hence \(A^{-1}\) exists. Also, \(det\ (A^{-1})=\frac{1}{det\ (A)}\), and thus \(A^{-1} \in GL_n\left( \mathbb {K} \right) \).
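The \(2 \times 2\) inverse formula from Remark 1.11 (2) can be checked with a short Python sketch (an illustrative aside, using exact rational arithmetic):

```python
from fractions import Fraction

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate: (1/(ad-bc)) [[d,-b],[-c,a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: no inverse")
    f = Fraction(1, det)
    return [[f * d, -f * b], [-f * c, f * a]]

A = [[1, -1], [0, 2]]
Ainv = inv2(A)
# Verify A * A^{-1} = I entry by entry.
I = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
assert I == [[1, 0], [0, 1]]
```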

Definition 1.57

(Rank of a matrix) The rank of a matrix is the order of the largest square sub-matrix having a non-zero determinant.

Properties

  1. 1.

    Let A be an \(m \times n\) matrix. Then \(Rank(A)\le min\lbrace m,n\rbrace \).

  2. 2.

    Only the zero matrix has rank zero.

  3. 3.

    A square matrix \(A_{n \times n}\) is invertible if and only if \(Rank(A)=n\).

  4. 4.

    Sylvester’s Inequality: If A is an \(m \times n\) matrix and B is an \(n \times p\) matrix, then 

    $$\begin{aligned}Rank(A) + Rank(B) - n \le Rank(AB) \le min\lbrace Rank(A),Rank(B)\rbrace \end{aligned}$$

    This result is named after the famous English mathematician James Joseph Sylvester (1814–1897).

  5. 5.

    Frobenius Inequality: Let \(A, B\), and C be any matrices such that AB, BC, and ABC exist, then

    $$\begin{aligned}Rank(AB)+ Rank(BC) \le Rank(ABC)+Rank(B) \end{aligned}$$

      This result is named after the famous German mathematician Ferdinand Georg Frobenius (1849–1917).

  6. 6.

    Rank is sub-additive. That is, \(Rank(A+B) \le Rank(A) + Rank(B)\).

  7. 7.

    \(Rank(A)=Rank(A^T)=Rank(A^TA)\).

  8. 8.

    \(Rank(kA)= Rank(A)\) if \(k \ne 0\).

Definition 1.58

(Block Matrix) A block matrix or a partitioned matrix is a matrix that is defined using smaller matrices called blocks.

Example 1.70

Consider \(X=\) \(\begin{bmatrix} A &{}B \\ C &{}D \end{bmatrix}_{5 \times 5}\) where \(A=\) \(\begin{bmatrix} 2 &{}0 \\ 0 &{}2 \end{bmatrix}_{2 \times 2}\), \(B=\) \(\begin{bmatrix} 2 &{}1 &{}3 \\ 6 &{}2 &{}7 \end{bmatrix}_{2 \times 3}\), \(C=\) \(\begin{bmatrix} 1 &{}0 \\ 5 &{}2 \\ 7 &{}3 \end{bmatrix}_{3 \times 2}\), and \(D=\) \(\begin{bmatrix} 1 &{}9 &{}8 \\ 4 &{}2 &{}1\\ 7 &{}0 &{}1 \end{bmatrix}_{3 \times 3}\).

Properties

  1. 1.

    Let \(X=\) \(\begin{bmatrix} A &{}B \\ C &{}D \end{bmatrix}\) where \(A_{n \times n} , B_{n \times m},C_{m \times n}\), and \(D_{m \times m}\) are matrices. If A is invertible, then

    $$\begin{aligned}det\ (X)=(det\ (A))\left( det\ (D-CA^{-1}B) \right) \end{aligned}$$
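A minimal numerical check of this formula (an illustrative aside, using \(1 \times 1\) blocks so that every determinant is just a scalar):

```python
from fractions import Fraction

# With 1x1 blocks A=[a], B=[b], C=[c], D=[d], the formula
# det(X) = det(A) * det(D - C A^{-1} B) should equal ad - bc.
a, b, c, d = 3, 4, 5, 7
lhs = a * d - b * c                               # det of the full 2x2 matrix X
rhs = a * (Fraction(d) - Fraction(c, a) * b)      # det(A) * det(D - C A^{-1} B)
assert lhs == rhs == 1
```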

Definition 1.59

(Block Diagonal Matrix) A block diagonal matrix is a block matrix which is a square matrix such that all blocks except the diagonal ones are zero.

Properties

  1. 1.

    Consider a block diagonal matrix of the form \(A=\begin{bmatrix} A_1 &{} 0 &{} \cdots &{} 0 \\ 0 &{} A_2 &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} A_n \\ \end{bmatrix}\), where each \(A_i\) is a square matrix. Then

    1. (a)

      \(det(A)=det(A_1)det(A_2) \cdots det(A_n)\)

    2. (b)

      \(Tr(A)=Tr(A_1)+Tr(A_2)+ \cdots +Tr(A_n) \)

    3. (c)

      \(Rank(A)=Rank(A_1)+Rank(A_2)+ \cdots +Rank(A_n) \).

Definition 1.60

(Elementary Operations) There are three kinds of elementary matrix operations:

  1. (1)

    Interchanging two rows (or columns).

  2. (2)

    Multiplying each element in a row (or column) by a non-zero number.

  3. (3)

    Multiplying a row (or column) by a non-zero number and adding the result to another row (or column).

When these operations are performed on rows, they are called elementary row operations; and when they are performed on columns, they are called elementary column operations.

Definition 1.61

(Equivalent matrices) Two matrices A and B are said to be row (column) equivalent if there is a sequence of elementary row (column) operations that transforms A into B, denoted by \(A \sim B\).

Definition 1.62

(Row Echelon form of a matrix) A matrix is said to be in row echelon form when it satisfies the following conditions:

  1. (a)

    Each leading entry (the first non-zero entry in a row) is in a column to the right of the leading entry in the previous row.

  2. (b)

    Rows with all zero elements, if any, are below rows having a non-zero element.

If the matrix also satisfies the condition

  1. (c)

    The first non-zero element in each row, called the leading entry or pivot, is 1.

then the matrix is in reduced row echelon form.

Example 1.71

Consider the matrix \(A=\) \(\begin{bmatrix} 3 &{}2 &{}1 &{}4 \\ 1 &{}2 &{}3 &{}4 \\ 1 &{}6 &{}11 &{}12 \end{bmatrix}\). Now

$$\begin{aligned} A &= \begin{bmatrix} 3 &{}2 &{}1 &{}4 \\ 1 &{}2 &{}3 &{}4 \\ 1 &{}6 &{}11 &{}12 \end{bmatrix} \ \ \ \ \ \ \ \ \ \ \begin{matrix} R_1 \leftrightarrow R_2 \end{matrix} \\ &\sim \begin{bmatrix} 1 &{}2 &{}3 &{}4 \\ 3 &{}2 &{}1 &{}4 \\ 1 &{}6 &{}11 &{}12 \end{bmatrix}\ \ \ \ \ \ \ \ \ \ \begin{matrix} R_2 \rightarrow R_2 -3R_1 \\ R_3 \rightarrow R_3 - R_1 \end{matrix} \\ &\sim \begin{bmatrix} 1 &{}2 &{}3 &{}4 \\ 0 &{}-4 &{}-8 &{}-8 \\ 0 &{}4 &{}8 &{}8 \end{bmatrix} \ \ \ \ \ \ \ \ \ \ \begin{matrix} R_3 \rightarrow R_3 + R_2 \end{matrix} \\ &\sim \begin{bmatrix} 1 &{}2 &{}3 &{}4 \\ 0 &{}-4 &{}-8 &{}-8 \\ 0 &{}0 &{}0 &{}0 \end{bmatrix} \ \ \ \ \ \ \ \ \ \ \begin{matrix} R_2 \rightarrow -\frac{1}{4} R_2 \end{matrix} \\ &\sim \begin{bmatrix} 1 &{}2 &{}3 &{}4 \\ 0 &{}1 &{}2 &{}2 \\ 0 &{}0 &{}0 &{}0 \end{bmatrix}=B \end{aligned}$$

Then B is called the reduced row echelon form of A.
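The elimination in this example can be automated; the following Python sketch (an illustrative aside) applies elementary row operations to bring a matrix to a row echelon form and returns the number of non-zero rows:

```python
from fractions import Fraction

def row_echelon(M):
    """Reduce M to a row echelon form; return (echelon form, rank)."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        # Find a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]            # interchange two rows
        M[r] = [x / M[r][c] for x in M[r]]         # scale so the pivot is 1
        for i in range(r + 1, rows):               # eliminate below the pivot
            M[i] = [x - M[i][c] * y for x, y in zip(M[i], M[r])]
        r += 1
    return M, r

E, rank = row_echelon([[3, 2, 1, 4], [1, 2, 3, 4], [1, 6, 11, 12]])
assert rank == 2 and E[2] == [0, 0, 0, 0]
```

The echelon form produced here differs from B above, since a different sequence of row operations is used and row echelon forms are not unique, but the rank it reports, 2, agrees with Remark 1.12.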

Remark 1.12

  1. 1.

    A matrix is equivalent to any of its row echelon form and reduced row echelon form. The reduced row echelon form of A is unique.

  2. 2.

    The rank of a matrix is equal to the number of non-zero rows in its row echelon form. For example, the matrix \(A= \begin{bmatrix} 3 &{}2 &{}1 &{}4 \\ 1 &{}2 &{}3 &{}4 \\ 1 &{}6 &{}11 &{}12 \end{bmatrix}\) has rank 2 as it is equivalent to \(B=\begin{bmatrix} 1 &{}2 &{}3 &{}4 \\ 0 &{}1 &{}2 &{}2 \\ 0 &{}0 &{}0 &{}0 \end{bmatrix}\), which is in the row echelon form.

6 Euclidean Space \(\mathbb {R}^n\)

In a mathematical environment, Euclidean space is a geometric concept that contains all conceivable positions and locations. It provides the theoretical framework for many other mathematical fields, including classical geometry. We can use well-defined connections and rules to describe points, lines, angles, and distances inside this space. It acts as a foundational tool and gives a framework for comprehending spatial relationships. Any point in \(\mathbb {R}^n\) is a list of n real numbers, denoted as \(v=(v_1,v_2, \ldots ,v_n)\). For convenience, we may write this list as a matrix with one column or one row, called a column vector or row vector, respectively. In the physical world, a vector is a quantity which has both magnitude and direction, which can be easily visualized when we work on \(\mathbb {R}^2\) or \(\mathbb {R}^3\).

Vectors in \(\mathbb {R}^2\)

Algebraically, a vector in \(\mathbb {R}^2\) is simply an ordered pair of real numbers. That is, \(\mathbb {R}^2= \lbrace (v_1,v_2) \mid v_1,v_2 \in \mathbb {R} \rbrace \). Two vectors \((u_1,u_2)\) and \((v_1,v_2)\) are equal if and only if the corresponding components are equal. That is, if and only if \(u_1=v_1\) and \(u_2=v_2\). Now we can define some operations on \(\mathbb {R}^2\).

Definition 1.63

(Vector Addition) The sum of two vectors \(u=(u_1,u_2)\) and \(v=(v_1,v_2)\), denoted by \(u+v\), is given by \(u+v=(u_1+v_1,u_2+v_2)\in \mathbb {R}^2\).

Properties

Let \(u=(u_1,u_2),v=(v_1,v_2),w=(w_1,w_2)\in \mathbb {R}^2\). Then

  1. 1.

    \(u+v=(u_1+v_1,u_2+v_2)=(v_1+u_1,v_2+u_2)=v+u\). (Commutative)

  2. 2.

    \(u+(v+w)=(u_1+(v_1+w_1),u_2+(v_2+w_2))=((u_1+v_1)+w_1,(u_2+v_2)+w_2)=(u+v)+w\). (Associative)

  3. 3.

    There exists \({\textbf {0}}=(0,0)\) such that \(v+{\textbf {0}}=v\) for all v. (Existence of identity element)

  4. 4.

    For each \(v \in \mathbb {R}^2\), there exists \(-v=(-v_1,-v_2) \in \mathbb {R}^2\) such that \(v+(-v)={\textbf {0}}\). (Existence of inverse)

Remark 1.13

The set \(\mathbb {R}^2\) with vector addition forms an Abelian group.

Definition 1.64

(Scalar Multiplication) Let \(v=(v_1,v_2) \in \mathbb {R}^2\) and \(\lambda \in \mathbb {R}\), then \(\lambda v=(\lambda v_1, \lambda v_2) \in \mathbb {R}^2\).

Properties

Let \(u=(u_1,u_2),v=(v_1,v_2)\in \mathbb {R}^2\) and \(\lambda ,\mu \in \mathbb {R}\). Then

  1. 1.

    \(\lambda (u+v)=(\lambda (u_1+v_1),\lambda (u_2+v_2))=\lambda (u_1,u_2)+\lambda (v_1,v_2)=\lambda u + \lambda v\)

  2. 2.

    \((\lambda + \mu )v=((\lambda + \mu )v_1,(\lambda + \mu )v_2)=\lambda (v_1,v_2)+ \mu (v_1,v_2)=\lambda v + \mu v\)

  3. 3.

    \(\lambda (\mu v)=(\lambda \mu )v=\mu (\lambda v)\).

From the above properties, it is clear that \(0v={\textbf {0}}\) for any \(v \in \mathbb {R}^2\) and \(0 \in \mathbb {R}\). Also, \((-1)v=-v\) for any \(v \in \mathbb {R}^2\).

The Geometric Notion of Vectors in \(\mathbb {R}^2\)

Corresponding to every vector in \(\mathbb {R}^2\), there exists a point in the Cartesian plane, and each point in the Cartesian plane represents a vector in \(\mathbb {R}^2\). But the representation of vectors in \(\mathbb {R}^2\) as points of the Cartesian plane does not provide much information about operations like vector addition and scalar multiplication. So it is better to represent a vector in \(\mathbb {R}^2\) as a directed line segment which begins at the origin and ends at the point. Such a visualization of a vector v is called the position vector of v. Then, as in the physical world, the vector possesses both magnitude and direction. However, to represent a vector in \(\mathbb {R}^2\), the directed line segment need not start from the origin; it may start at any point in \(\mathbb {R}^2\), but its magnitude and direction cannot vary. For convenience, the directed line segment is considered to be starting from the origin.

Theorem 1.19

(Triangle Law of Vector Addition) If two vectors are represented in magnitude and direction by the two sides of a triangle, taken in order, then their sum is represented in magnitude and direction by the third side of the triangle, taken in the reverse order (Fig. 1.16).

Fig. 1.16

Triangle law of vector addition

Theorem 1.20

(Parallelogram Law of Vector Addition) If two vectors are represented in magnitude and direction by the two adjacent sides of a parallelogram, then their sum is represented in magnitude and direction by the diagonal of the parallelogram through their common point (Fig. 1.17).

Fig. 1.17

Parallelogram law of vector addition

These ideas of vectors and vector operations in \(\mathbb {R}^2\) can be extended to general Euclidean space \(\mathbb {R}^n\).

7 System of Linear Equations

Solving simultaneous linear equations is one of the central problems in algebra. In this section, we will get to know some of the methods that are used to solve a system of linear equations. Let us start by discussing the solution of a system having n equations in n unknowns. Consider the basic problem with \(n=1\), i.e., consider an equation of the form \(ax=b\). We know that there are three possible numerical realizations for this equation:

  1. (1)

    \(a \ne 0:\) In this case, we know that the equation has a unique solution, which is \(x=\frac{b}{a}\).

  2. (2)

    \(a = 0,\ b=0:\) Any numerical value of x is a solution of this equation. That is, there are infinitely many solutions.

  3. (3)

    \(a=0,\ b \ne 0:\) Then it is clear that no numerical value of x satisfies the equation. That is, the equation has no solution.

Now consider a system of two equations in two unknowns \(x_1\) and \(x_2\):

$$\begin{aligned}a_1x_1+a_2x_2=b_1 \end{aligned}$$
$$\begin{aligned}a_3x_1+a_4x_2=b_2 \end{aligned}$$

We know that these equations represent two lines in a plane, and the solutions of this system, if they exist, are the points of intersection of these two lines. If the lines intersect, there is either a unique intersection point or an infinite number of intersection points (when the lines coincide); if the lines do not intersect, they must be parallel to each other. Thus, here also, there are only three possibilities, and the possibilities are the same for a system of n equations in n unknowns. The three possibilities are illustrated in Fig. 1.18.

Fig. 1.18

Observe that in a, the lines \(x_1+x_2=5\) and \(x_1-x_2=3\) have the unique intersection point (4, 1); in b, the equations \(x_1+x_2=5\) and \(2x_1+2x_2=10\) represent the same line; and in c, the lines \(x_1+x_2=5\) and \(x_1+x_2=2\) are parallel to each other.

Now that we have seen the possibilities for the number of solutions of a system of equations, we have to find a method to solve a system of linear equations. Consider a system of n equations in n unknowns \(x_1,x_2, \ldots ,x_n\) given by

$$\begin{aligned}a_{11}x_1+a_{12}x_2+\cdots +a_{1n}x_n \ =b_1 \end{aligned}$$
$$\begin{aligned}a_{21}x_1+a_{22}x_2+\cdots +a_{2n}x_n \ =b_2 \end{aligned}$$
$$\begin{aligned} \vdots \ \ \ \ \ \ \ \ \ \ \ \ \vdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \vdots \ \ \ \ \ \ \ \ \ \ \vdots \end{aligned}$$
$$\begin{aligned}a_{n1}x_1+a_{n2}x_2+\cdots +a_{nn}x_n=b_n \end{aligned}$$

The system can be written in the form \(Ax=b\), where

\(A=\) \(\begin{bmatrix} a_{11} &{} a_{12} &{} \cdots &{} a_{1n} \\ a_{21} &{} a_{22} &{} \cdots &{} a_{2n} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ a_{n1} &{} a_{n2} &{} \cdots &{} a_{nn} \end{bmatrix}\), \(x=\) \(\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\) and \(b=\) \(\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}\).

The matrix A is called the coefficient matrix. A method to solve this system, given by Gabriel Cramer (1704–1752), uses the determinant of the coefficient matrix and the determinants of matrices obtained from it by replacing one column with the column vector of the right-hand sides of the equations. Cramer’s rule states that if \(x=\left( x_1,x_2, \ldots ,x_n \right) \) is a solution of the system, then \(x_i=\frac{det(A_i)}{det(A)},\ i=1,2,\ldots ,n,\) where \(A_i\) is the matrix obtained by replacing the \(i\textrm{th}\) column of A by the column vector b. Observe that this rule is applicable only if \(det(A) \ne 0\). For example, consider the equations \(x_1+x_2=5\) and \(x_1-x_2=3\). The system can be expressed in the form

$$\begin{aligned}\begin{bmatrix} 1 &{} 1 \\ 1 &{} -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \end{bmatrix} \end{aligned}$$

As \(det(A)=-2 \ne 0\), we have

$$\begin{aligned}x_1=\dfrac{det\left( \begin{bmatrix} 5 &{} 1 \\ 3 &{} -1 \end{bmatrix} \right) }{det\left( \begin{bmatrix} 1 &{} 1 \\ 1 &{} -1 \end{bmatrix} \right) }= 4\ \ \text {and}\ \ x_2=\dfrac{det\left( \begin{bmatrix} 1 &{} 5 \\ 1 &{} 3 \end{bmatrix} \right) }{det\left( \begin{bmatrix} 1 &{} 1 \\ 1 &{} -1 \end{bmatrix} \right) }=1 \end{aligned}$$

As we can see, Cramer’s rule is applicable only if the determinant of A is non-zero. Even when the determinant of A is non-zero, the rule becomes computationally expensive for large n. Also, it cannot be applied to a system of m equations in n unknowns. Another method to find the solution of a system of equations is elimination, in which multiples of one equation are added to or subtracted from the other equations so as to remove unknowns until only one equation in one unknown remains, which can be solved easily. We can then use the value of this unknown to find the values of the remaining ones.
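For small systems, Cramer’s rule is easy to implement directly. The sketch below assumes \(det(A) \ne 0\) and computes determinants by cofactor expansion, which is practical only for very small n; the function names are illustrative:

```python
# Cramer's rule for a small n x n system.  The determinant is computed
# by cofactor expansion along the first row (exponential cost, so this
# is a teaching sketch, not a production solver).

def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cramer(A, b):
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule requires det(A) != 0")
    n = len(A)
    # A_i is A with its i-th column replaced by b; x_i = det(A_i)/det(A).
    return [det([[b[r] if c == i else A[r][c] for c in range(n)]
                 for r in range(n)]) / d
            for i in range(n)]

# The worked example: x1 + x2 = 5, x1 - x2 = 3 has solution (4, 1).
print(cramer([[1, 1], [1, -1]], [5, 3]))   # [4.0, 1.0]
```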

Consider a system of m equations in n unknowns \(x_1,x_2, \ldots ,x_n\) given by

$$\begin{aligned}a_{11}x_1+a_{12}x_2+\cdots +a_{1n}x_n\ =b_1 \end{aligned}$$
$$\begin{aligned}a_{21}x_1+a_{22}x_2+\cdots +a_{2n}x_n\ =b_2 \end{aligned}$$
$$\begin{aligned} \vdots \ \ \ \ \ \ \ \ \ \ \ \ \vdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \vdots \ \ \ \ \ \ \ \ \ \ \vdots \end{aligned}$$
$$\begin{aligned}a_{m1}x_1+a_{m2}x_2+\cdots +a_{mn}x_n=b_m \end{aligned}$$

The system can be written in the form \(Ax=b\), where \(A=\) \(\begin{bmatrix} a_{11} &{} a_{12} &{} \cdots &{} a_{1n} \\ a_{21} &{} a_{22} &{} \cdots &{} a_{2n} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ a_{m1} &{} a_{m2} &{} \cdots &{} a_{mn} \end{bmatrix}\), \(x=\) \(\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}\) and \(b=\) \(\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}\). The matrix A is called the coefficient matrix, and the matrix \(\left[ A\mid b\right] = \) \(\begin{bmatrix} a_{11} &{} a_{12} &{} \cdots &{} a_{1n} &{} b_1 \\ a_{21} &{} a_{22} &{} \cdots &{} a_{2n} &{} b_2\\ \vdots &{} \vdots &{} \ddots &{} \vdots &{} \vdots \\ a_{m1} &{} a_{m2} &{} \cdots &{} a_{mn} &{} b_m \end{bmatrix}\) is called the augmented matrix of the system. If \(b=0\), then the system is called a homogeneous system. Otherwise, it is called a non-homogeneous system. A system is said to be consistent if it has a solution; otherwise, it is called inconsistent. We will see that a homogeneous system is always consistent, whereas a non-homogeneous system can be inconsistent (as in Fig. 1.18c).

Gauss Elimination Method Consider a system of equations given by \(Ax=b\). We can solve the system using the following method, called the Gauss elimination method after the famous German mathematician Carl Friedrich Gauss (1777–1855).

  1. 1.

    Construct the augmented matrix for the given system of equations.

  2. 2.

    Use elementary row operations to transform the augmented matrix to its row echelon form.

  3. 3.

    The system

    • is consistent if and only if \(Rank\left[ A\mid b\right] = Rank(A) \). In this case, the system

      \(\diamond \) has a unique solution if and only if \(Rank\left[ A\mid b\right] = Rank(A)=n \).

      \(\diamond \) has an infinite number of solutions if \(Rank\left[ A\mid b\right] = Rank(A)=r < n \).

    • is inconsistent if and only if \(Rank\left[ A\mid b\right] \ne Rank(A) \).

  4. 4.

    If the system is consistent, write and solve the new set of equations corresponding to the row echelon form of the augmented matrix.

If reduced row echelon form is used, the method is called Gauss–Jordan method.
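The rank comparison in step 3 can be carried out mechanically. Below is a small illustrative row-reduction routine in Python (not the book’s code), using exact rational arithmetic so that ranks are computed without rounding error:

```python
# Rank of a matrix via forward elimination to row echelon form.
# Exact fractions avoid the pitfalls of floating-point pivots.

from fractions import Fraction

def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    r = 0                                    # number of pivots found so far
    for c in range(len(M[0])):
        # Find a row at or below r with a non-zero entry in column c.
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]          # move the pivot row up
        for i in range(r + 1, len(M)):       # eliminate below the pivot
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[1, 1], [1, 1]]
Ab = [[1, 1, 5], [1, 1, 2]]   # augmented matrix of the parallel lines in Fig. 1.18c
print(rank(A), rank(Ab))      # 1 2, so Rank[A|b] != Rank(A): inconsistent
```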

Remark 1.14

A homogeneous system \(Ax=0\) is always consistent (since \(Rank\left[ A\mid 0 \right] = Rank(A)\) always). The system

  • has a unique solution (the zero vector) if and only if \(Rank(A)=n\).

  • has an infinite number of solutions if and only if \(Rank(A)=r<n\).

Example 1.72

Consider the system of equations

$$\begin{aligned}2x_1+3x_2+5x_3=9 \end{aligned}$$
$$\begin{aligned}7x_1+3x_2-2x_3=8 \end{aligned}$$
$$\begin{aligned}2x_1+3x_2+\lambda _1 x_3=\lambda _2 \end{aligned}$$

where \(\lambda _1 \) and \(\lambda _2 \) are some real numbers.

The above system can be written in the matrix form \(Ax=b\) as

$$\begin{aligned}\begin{bmatrix} 2 &{} 3 &{} 5 \\ 7 &{} 3 &{} -2 \\ 2 &{} 3 &{} \lambda _1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 9 \\ 8 \\ \lambda _2 \end{bmatrix} \end{aligned}$$

Now the augmented matrix \([A \mid b]\) is given by

$$\begin{aligned}{}[A \mid b] &= \begin{bmatrix} 2 &{} 3 &{} 5 &{} 9\\ 7 &{} 3 &{} -2 &{} 8 \\ 2 &{} 3 &{} \lambda _1 &{} \lambda _2 \end{bmatrix}\ \ \ \ \ \ \ \ \ \ \begin{matrix} R_2 \rightarrow R_2 -\frac{7}{2}R_1 \\ R_3 \rightarrow R_3 - R_1 \end{matrix} \\ &\sim \begin{bmatrix} 2 &{} 3 &{} 5 &{} 9\\ 0 &{} \frac{-15}{2} &{} \frac{-39}{2} &{} \frac{-47}{2} \\ 0 &{} 0 &{} \lambda _1-5 &{} \lambda _2-9 \end{bmatrix} \end{aligned}$$

As the first two rows in the reduced form are non-zero, both Rank(A) and \(Rank\left[ A \mid b \right] \) are greater than or equal to 2.

  • The system has a unique solution if and only if \(Rank\left[ A\mid b\right] = Rank(A)=3\). That is, if \(\lambda _1 \ne 5\), for arbitrary values of \(\lambda _2\).

  • The system has an infinite number of solutions if \(Rank\left[ A\mid b\right] = Rank(A) < 3\). If \(\lambda _1 =5\) and \(\lambda _2 =9\), we have \(Rank\left[ A\mid b\right] = Rank(A)=2 < 3\).

  • The system has no solution when \(Rank\left[ A\mid b\right] \ne Rank(A)\). That is, if \(\lambda _1=5\) and \(\lambda _2 \ne 9\).

If \(b=0\) in the above system, then

  • The homogeneous system has a unique solution if and only if \( Rank(A)=3\). That is, if \(\lambda _1 \ne 5\), the given system has only the zero vector as a solution.

  • If \(\lambda _1 =5\), then \(Rank(A)=2 < 3 \), and hence the given system has an infinite number of solutions.

As we have identified the values of \(\lambda _1\) and \(\lambda _2\) for which the given system is consistent, let us try to compute the solutions of the given system for some particular values of \(\lambda _1\) and \(\lambda _2\). Take \(\lambda _1=1\) and \(\lambda _2=9\). Then,

$$\begin{aligned}{}[A \mid b] \sim \begin{bmatrix} 2 &{} 3 &{} 5 &{} 9\\ 0 &{} \frac{-15}{2} &{} \frac{-39}{2} &{} \frac{-47}{2} \\ 0 &{} 0 &{} -4 &{} 0 \end{bmatrix} \end{aligned}$$

That is, the given system is reduced to the following equivalent form:

$$\begin{aligned} 2x_1+3x_2+5x_3 &=9 \\ \frac{15}{2}x_2+\frac{39}{2}x_3&=\frac{47}{2} \\ -4 x_3&=0 \end{aligned}$$

Thus, we have \(x=\begin{bmatrix} \frac{-1}{5} \\ \frac{47}{15} \\ 0 \end{bmatrix}\) as the unique solution of the given system. Similarly, if we take \(\lambda _1=5\) and \(\lambda _2=9\), we can show that the set of all solutions of the given system is \(\left\{ (x_1,x_2,x_3) \mid x_3 \in \mathbb {R},\ x_1=\frac{14x_3-2}{10}\ \text {and}\ x_2=\frac{47-39x_3}{15} \right\} \) (Verify!).
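The solution obtained above can be verified by substituting it back into the system; a short Python check with exact fractions (illustrative only):

```python
# Verify the unique solution of Example 1.72 for lambda1 = 1, lambda2 = 9
# by substituting x = (-1/5, 47/15, 0) into A x = b with exact arithmetic.

from fractions import Fraction as F

A = [[2, 3, 5], [7, 3, -2], [2, 3, 1]]   # coefficient matrix with lambda1 = 1
b = [9, 8, 9]                            # right-hand side with lambda2 = 9
x = [F(-1, 5), F(47, 15), F(0)]

residual = [sum(a * xi for a, xi in zip(row, x)) - bi
            for row, bi in zip(A, b)]
print(all(r == 0 for r in residual))     # True: x solves the system exactly
```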

Remark 1.15

If the coefficient matrix A is an \(n \times n\) non-singular matrix, then the system \(Ax=b\) has a unique solution \(x=A^{-1}b\).

LU Decomposition The LU decomposition method consists of factorizing A into a product of two triangular matrices

$$\begin{aligned}A = LU \end{aligned}$$

where L is lower triangular and U is upper triangular. We use the Doolittle method to convert A into the form \(A=LU\). We initialize the process by writing \(A=IA\) and apply the Gaussian elimination procedure to achieve the desired form. The pivot element is identified in each column and, if necessary, rows are interchanged. For each column, we update the entries of both factors: row operations eliminate the elements below the main diagonal, and the multipliers used are recorded to generate L. After iterating over all the columns, we obtain a lower triangular matrix L with ones on its principal diagonal and an upper triangular matrix U. This decomposition reduces the solution of the system \(Ax = b\) to solving two triangular systems, \(Ly = b\) and \(Ux = y\). In general, there are many such factorizations; if L is required to have all diagonal elements equal to 1, then the decomposition, when it exists, is unique. This method was introduced by the Polish mathematician Tadeusz Julian Banachiewicz (1882–1954).

Example 1.73

Consider the system of equations

$$\begin{aligned}2x_1-x_2+3x_3=9 \end{aligned}$$
$$\begin{aligned}4x_1+2x_2+x_3=9 \end{aligned}$$
$$\begin{aligned}-6x_1-x_2+2x_3=12 \end{aligned}$$

The above system can be written in the matrix form \(Ax=b\) as

$$\begin{aligned}\begin{bmatrix} 2 &{} -1 &{} 3 \\ 4 &{} 2 &{} 1 \\ -6 &{} -1 &{} 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 9 \\ 9 \\ 12 \end{bmatrix} \end{aligned}$$

Consider the coefficient matrix A. We will use elementary row transformations to convert A into the form LU. We have

$$\begin{aligned} A=\begin{bmatrix} 2 &{} -1 &{} 3 \\ 4 &{} 2 &{} 1 \\ -6 &{} -1 &{} 2 \end{bmatrix} &= \begin{bmatrix} 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \end{bmatrix} \begin{bmatrix} 2 &{} -1 &{} 3 \\ 4 &{} 2 &{} 1 \\ -6 &{} -1 &{} 2 \end{bmatrix}\ \ \ \begin{matrix} R_2 \rightarrow R_2-(2)R_1 \\ R_3 \rightarrow R_3- (-3)R_1 \end{matrix} \\ &= \begin{bmatrix} 1 &{} 0 &{} 0 \\ 2 &{} 1 &{} 0 \\ -3 &{} 0 &{} 1 \end{bmatrix} \begin{bmatrix} 2 &{} -1 &{} 3 \\ 0 &{} 4 &{} -5 \\ 0 &{} -4 &{} 11 \end{bmatrix}\ \ \ \begin{matrix} R_3 \rightarrow R_3- (-1)R_1 \end{matrix} \\ &= \begin{bmatrix} 1 &{} 0 &{} 0 \\ 2 &{} 1 &{} 0 \\ -3 &{} -1 &{} 1 \end{bmatrix} \begin{bmatrix} 2 &{} -1 &{} 3 \\ 0 &{} 4 &{} -5 \\ 0 &{} 0 &{} 6 \end{bmatrix}= LU \end{aligned}$$

Now \(Ly=b\) implies

$$\begin{aligned}\begin{bmatrix} 1 &{} 0 &{} 0 \\ 2 &{} 1 &{} 0 \\ -3 &{} -1 &{} 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}=\begin{bmatrix} 9 \\ 9 \\ 12 \end{bmatrix} \end{aligned}$$

Solving the system, we get \(y_1=9,y_2=-9\), and \(y_3=30\). Now consider the system \(Ux=y\)

$$\begin{aligned}\begin{bmatrix} 2 &{} -1 &{} 3 \\ 0 &{} 4 &{} -5 \\ 0 &{} 0 &{} 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}=\begin{bmatrix} 9 \\ -9 \\ 30 \end{bmatrix} \end{aligned}$$

Solving the system, we get \(x_1=-1,x_2=4\), and \(x_3=5\).
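A compact Python sketch of the Doolittle factorization without row interchanges (so it assumes every pivot is non-zero, as happens here), applied to the system of Example 1.73; the function name lu_solve is illustrative:

```python
# Doolittle LU factorization and the two triangular solves.
# L keeps ones on its diagonal and stores the elimination multipliers;
# U is the row echelon form produced by the eliminations.

def lu_solve(A, b):
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in A]
    for k in range(n):                       # eliminate below pivot U[k][k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]            # multiplier, recorded in L
            L[i][k] = m
            U[i] = [u - m * v for u, v in zip(U[i], U[k])]
    y = [0.0] * n                            # forward solve L y = b
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n                            # back solve U x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

print(lu_solve([[2, -1, 3], [4, 2, 1], [-6, -1, 2]], [9, 9, 12]))
# [-1.0, 4.0, 5.0]
```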

Theorem 1.21

If y and z are two distinct solutions of \(Ax=b\), then \(\lambda y + \mu z\) is also a solution of \(Ax=b\), for any scalars \(\lambda , \mu \in \mathbb {K}\) with \(\lambda + \mu =1\). If \(b=0\), \(\lambda y + \mu z\) is a solution of \(Ax=0\), for any scalars \(\lambda , \mu \in \mathbb {K}\).

Proof

Suppose that \(b \ne 0\), and let y and z be two given solutions of \(Ax=b\). Then \(Ay=b\) and \(Az=b\). Let \(\lambda , \mu \in \mathbb {K}\) be such that \(\lambda + \mu =1\). Then

$$\begin{aligned}A(\lambda y + \mu z)=\lambda Ay + \mu Az=\lambda b + \mu b=(\lambda + \mu )b=b \end{aligned}$$

Now let \(b=0\). If y and z are two given solutions of \(Ax=0\), then \(Ay=0\) and \(Az=0\). Then

$$\begin{aligned}A(\lambda y + \mu z)=\lambda Ay + \mu Az=0 \end{aligned}$$

Hence the proof.
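Theorem 1.21 can be illustrated numerically with the system of Example 1.72 for \(\lambda _1=5\) and \(\lambda _2=9\), which has infinitely many solutions; the sketch below (all names illustrative) builds two solutions from the parametric family and checks an affine combination of them:

```python
# Check of Theorem 1.21: if y and z solve A x = b, then so does
# lam*y + mu*z whenever lam + mu = 1.  Exact fractions keep the
# verification free of rounding error.

from fractions import Fraction as F

A = [[2, 3, 5], [7, 3, -2], [2, 3, 5]]   # lambda1 = 5 in the third row
b = [9, 8, 9]                            # lambda2 = 9

def solution(x3):
    # Parametric family of solutions from Example 1.72.
    return [F(14 * x3 - 2, 10), F(47 - 39 * x3, 15), F(x3)]

def apply_matrix(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

y, z = solution(0), solution(1)
lam, mu = F(3), F(-2)                    # an affine pair: lam + mu = 1
w = [lam * yi + mu * zi for yi, zi in zip(y, z)]
print(apply_matrix(A, w) == b)           # True
```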

8 Exercises

  1. 1.

    For any sets A and B, show that

    1. (a)

      \(A \cap B \subseteq A,B \subseteq A \cup B\).

    2. (b)

      \(A \subseteq B\) if and only if \(A \cap B =A\).

  2. 2.

    Consider the relation \(R= \lbrace (0,1),(0,2),(1,2) \rbrace \) on \(X= \lbrace 0,1,2 \rbrace \). Check whether R is an equivalence relation.

  3. 3.

    Let \(f : X \rightarrow Y \) and \(g : Y \rightarrow Z \) be any two functions. Then show that

    1. (a)

      if f and g are one-one, then \(g \circ f\) is one-one.

    2. (b)

      if f and g are onto, then \(g \circ f\) is onto.

  4. 4.

    Check whether the following functions are bijective or not.

    1. (a)

      \(f : \mathbb {R} \rightarrow \mathbb {R}\) defined by \(f(x)=x^2+1\)

    2. (b)

      \(f : [0, \pi ] \rightarrow [-1,1]\) defined by \(f(x)=\sin x\)

    3. (c)

      \(f : \mathbb {R}^* \rightarrow \mathbb {R}^*\) defined by \(f(x)=\frac{1}{x}\)

    4. (d)

      \(f : \mathbb {C} \rightarrow \mathbb {C}\) defined by \(f(z)=\overline{z}\).

  5. 5.

    Let \(\lambda _i , \mu _i \in \mathbb {K},i \in \mathbb {N}\). Then show that

    1. (a)

      for \( 1 < p < \infty \) and \(\frac{1}{p} + \frac{1}{q} = 1\) , we have

      $$\begin{aligned}\sum _{i=1}^{\infty } |\lambda _i \mu _i |\le \left( \sum _{i=1}^{\infty } |\lambda _i |^p \right) ^{\frac{1}{p}} \left( \sum _{i=1}^{\infty } |\mu _i |^q\right) ^{\frac{1}{q}} \end{aligned}$$
    2. (b)

      for \( 1 < p < \infty \), we have

      $$\begin{aligned}\left( \sum _{i=1}^{\infty } |\lambda _i + \mu _i |^p \right) ^{\frac{1}{p}} \le \left( \sum _{i=1}^{\infty } |\lambda _i |^p \right) ^{\frac{1}{p}} + \left( \sum _{i=1}^{\infty } |\mu _i |^p\right) ^{\frac{1}{p}} \end{aligned}$$

    These inequalities are called Hölder’s inequality and Minkowski’s inequality, respectively.

  6. 6.

    For \(1<p< \infty \), consider the following collections of sequences.

    $$\begin{aligned} l ^p = \left\{ v=(v_1,v_2, \ldots )\mid v_i \in \mathbb {K}\ \text {and}\ \sum _{i=1}^{\infty }|v_i |^p < \infty \right\} \end{aligned}$$

    and

    $$\begin{aligned} l ^{\infty } = \left\{ v=(v_1,v_2, \ldots )\mid v_i \in \mathbb {K}\ \text {and}\ \sup \limits _{i\in \mathbb {N}} |v_i |< \infty \right\} \end{aligned}$$

    Show that for \(u=(u_1,u_2, \ldots ), v=(v_1,v_2, \ldots ) \in l ^p \)

    $$\begin{aligned}d_p(u,v) = \left( \sum _{i=1}^{\infty } |u_i-v_i |^p \right) ^{\frac{1}{p}} \end{aligned}$$

    defines a metric on \( l ^p\) and for \(u=(u_1,u_2, \ldots ), v=(v_1,v_2, \ldots ) \in l ^{\infty } \),

    $$\begin{aligned}d_{\infty }(u,v) = \sup \limits _{i \in \mathbb {N}} |u_i-v_i | \end{aligned}$$

    defines a metric on \( l ^{\infty }\).

  7. 7.

    Let X be a metric space with respect to each of the metrics \(d_1\) and \(d_2\). Show that each of the following:

    1. (a)

      \(d(x,y)=d_1(x,y)+d_2(x,y)\)

    2. (b)

      \(d(x,y)= \dfrac{d_1(x,y)}{1+d_1(x,y)} \)

    3. (c)

      \(d(x,y)=\max \lbrace d_1(x,y), d_2(x,y) \rbrace \)

    also defines a metric on X.

  8. 8.

    Let (X, d) be a metric space. Show that

    1. (a)

      union of any number of open sets is open.

    2. (b)

      finite intersection of open sets is open.

    Also give an example to show that arbitrary intersection of open sets need not necessarily be open.

  9. 9.

    Show that a set is closed if and only if it contains all its limit points.

  10. 10.

    Show that \( \left( l ^p,d_p \right) \) and \(\left( l ^{\infty },d_{\infty } \right) \) are complete metric spaces.

  11. 11.

    Show that a closed subspace of a complete metric space is complete.

  12. 12.

    Prove that if a sequence of continuous functions on [a, b] converges on [a, b] and the convergence is uniform on [a, b], then the limit function f is continuous on [a, b].

  13. 13.

    Let \(x \in \mathbb {R}\). Show that the sequence \(\{ x_n \}\), where \(x_n=\frac{\lfloor nx \rfloor }{n}\), is a rational sequence that converges to x. (\(\lfloor x \rfloor \) denotes the greatest integer less than or equal to x.)

  14. 14.

    Let \((G,*)\) be a group. Then show that

    1. (a)

      the identity element in G is unique.

    2. (b)

      each element in G has a unique inverse.

  15. 15.

    Center of a group: Let \(\left( G,* \right) \) be a group. The center of G, denoted by \(\mathcal {Z}(G)\), is the set of all elements of G that commute with every other element of G.

    1. (a)

      Show that \(\mathcal {Z}(G)\) is a subgroup of G.

    2. (b)

      Show that \(\mathcal {Z}(G)=G\) for an Abelian group.

    3. (c)

      Find the center of \(GL_2\left( \mathbb {K} \right) \) and \(S_3\).

  16. 16.

    Find the order of the following elements in \(GL_2\left( \mathbb {K} \right) \)

    1. (a)

      \(\begin{bmatrix} 1 &{} 0 \\ 0 &{} -1 \end{bmatrix}\)

    2. (b)

      \(\begin{bmatrix} 1 &{} 0 \\ 1 &{} 1 \end{bmatrix}\).

  17. 17.

    Let \(\phi : \left( G,*\right) \rightarrow \left( G',*'\right) \) be a homomorphism. Then, prove the following:

    1. (a)

      if e is the identity element in G, \(\phi (e)\) is the identity element in \( G'\).

    2. (b)

      \(Ker \left( \phi \right) \) is a subgroup of G.

    3. (c)

      for any \(g \in G\), if \(\mathcal {O}(g)\) is finite, then \(\mathcal {O}\left( \phi (g) \right) \) divides \(\mathcal {O}(g)\).

    4. (d)

      for any subgroup H of G, \(\phi \left( H \right) \) is a subgroup of \(\phi \left( G \right) \) and if H is Abelian, \(\phi \left( H \right) \) is also Abelian.

  18. 18.

    Consider \(\phi : GL_n\left( \mathbb {K} \right) \rightarrow \left( \mathbb {R}^* ,. \right) \), defined by \(\phi (A)=det(A)\).

    1. (a)

      Show that \(\phi \) is a homomorphism.

    2. (b)

      Find \(Ker \left( \phi \right) \).

  19. 19.

    Show that every cyclic group is Abelian.

  20. 20.

    Find the normal subgroups of \(S_3\).

  21. 21.

    Prove that \(\langle \mathbb {Q},+, . \rangle \) , \(\langle \mathbb {R},+, . \rangle \), and \(\langle \mathbb {C},+, . \rangle \) are fields with respect to the given algebraic operations. Also show that \(\langle \mathbb {Z},+,. \rangle \) is not a field.

  22. 22.

    Give an example of a finite field.

  23. 23.

    Show that \(\mathbb {K}[x ]=\lbrace a_0+a_1x +\cdots +a_{n-1}x^{n-1}+a_nx^n \mid a_i \in \mathbb {K}, n\in \mathbb {Z}^{+} \rbrace \) forms a ring with respect to the operations defined in Definition 1.44.

  24. 24.

    Prove the Fundamental Theorem of Algebra.

  25. 25.

    Show that the set of all \(n \times n\) matrices with entries in \(\mathbb {K}\), denoted by \(M_n\left( \mathbb {K} \right) \), with matrix addition and matrix multiplication forms a ring with unity.

  26. 26.

    Find the rank of the matrix \(A= \begin{bmatrix} 1 &{} 2 &{} -1 &{} 3 \\ 4 &{} 5 &{} 3 &{} 6 \\ 0 &{} 1 &{} 2 &{} -1 \end{bmatrix}\) using row reduced echelon form.

  27. 27.

    Show that the set of all solutions of a homogeneous system of equations forms a group with respect to coordinate-wise addition and scalar multiplication.

  28. 28.

    Consider the system of equations

    $$\begin{aligned}2x_1+x_2+3x_3=9 \end{aligned}$$
    $$\begin{aligned}3x_1+2x_2+5x_3=15 \end{aligned}$$
    $$\begin{aligned}4x_1-2x_2+7x_3=16 \end{aligned}$$

    Solve the above system of equations using

    1. (a)

      Gauss Elimination method

    2. (b)

      LU Decomposition method.

    Is it possible to solve this system using Cramer’s rule? If yes, find the solution using Cramer’s rule.

Solved Questions related to this chapter are provided in Chap. 8.