
1 Introduction

Linear algebra is the language describing systems in finite (or countably infinite) dimensions, where dimension represents the number of variables at hand. This appears naturally in systems with more than one degree of freedom or in approximate descriptions of complex systems in a discrete set of variables. Linear algebra also gives the basic framework of quantum mechanics, describing observables in terms of eigenvalues. In two and more dimensions, much of linear algebra is illustrated by vectors and their linear transformations.

Angular momentum \(\mathbf{J}\) is a vector that appears in numerous problems. Like energy and linear momentum, total angular momentum is a conserved quantity. For freely rotating rigid bodies, angular momentum is proportional to the angular velocity

$$\begin{aligned} \varvec{\Omega }= \Omega \, \mathbf{n}. \end{aligned}$$
(3.1)

The length \(\Omega \) denotes the rate of rotation and the direction denotes its orientation. According to Mach’s principle, angular velocity is commonly defined relative to the distant stars, where most of the mass is. For periodic motion with period P, the angular velocity satisfies

$$\begin{aligned} \Omega = \frac{2\pi }{P}. \end{aligned}$$
(3.2)

For motion at a separation \(\mathbf{r}\) about a given axis, the instantaneous velocity is tangent,

$$\begin{aligned} \mathbf{v} = \frac{d \mathbf r}{dt} = \varvec{\Omega }\times \mathbf{r}. \end{aligned}$$
(3.3)

The associated linear momentum is the vector

$$\begin{aligned} \mathbf{p} = \frac{d}{dt} m\mathbf{r} = m \mathbf{v}, \end{aligned}$$
(3.4)

when the mass m is time-independent. The associated angular momentum of a particle with linear momentum \(\mathbf{p}\) is

$$\begin{aligned} \mathbf{J } = \mathbf{r} \times \mathbf{p}. \end{aligned}$$
(3.5)

Thus, \(\mathbf{J}\) is a vector formed out of \(\mathbf{r}\) and \(\mathbf{p}\). Transformations of \(\mathbf J\) follow the rules for vectors, e.g., when considering translation or rotation of a coordinate system.
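As a quick numerical check of (3.5), the cross product and its orthogonality to both \(\mathbf{r}\) and \(\mathbf{p}\) can be verified directly. A minimal sketch in Python with NumPy (illustrative values only):

```python
import numpy as np

# Illustrative position and momentum of a particle (arbitrary units)
r = np.array([1.0, 2.0, 0.0])
p = np.array([0.0, 3.0, 1.0])

# Angular momentum J = r x p, Eq. (3.5)
J = np.cross(r, p)

print(J)             # [ 2. -1.  3.]
print(np.dot(J, r))  # 0.0: J is orthogonal to r
print(np.dot(J, p))  # 0.0: J is orthogonal to p
```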

Maps in linear algebra are represented by \(n\times m\) matrices, from a linear vector space of dimension m to a linear vector space of dimension n. These vector spaces are often over the real or complex numbers, e.g., \(\mathbb {R}^n\) or \(\mathbb {C}^n\). As such, a matrix is composed of row vectors of length m and column vectors of length n. An \(n\times m\) matrix is said to be of dimension \(n\times m\).

Consider, for instance, a \(2\times 2\) matrix

$$\begin{aligned} C=\left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 0 \end{array} \right) . \end{aligned}$$
(3.6)

Its rows \(\mathbf{r}_{(i)}\) and columns \(\mathbf{c}_{(i)}\) \((i=1,2)\) can be schematically indicated as

$$\begin{aligned} C=\left( \begin{array}{c} \mathbf{r}_{1}^T \\ \mathbf{r}_{2}^T \end{array}\right) = \left( \mathbf{c}_{1} ~~ \mathbf{c}_{2} \right) , \end{aligned}$$
(3.7)

with

$$\begin{aligned} \mathbf{r}_{1}^T=(1 ~~ 2), ~\mathbf{r}_{2}^T=(2 ~~ 0),~ \mathbf{c}_{1}=\left( \begin{array}{c}1 \\ 2 \end{array}\right) ,~\mathbf{c}_{2}=\left( \begin{array}{c}2 \\ 0 \end{array}\right) , \end{aligned}$$
(3.8)

where we explicitly include the transpose T to denote row vectors according to

$$\begin{aligned} \left( \begin{array}{c} x \\ y \end{array}\right) ^T = (x ~y). \end{aligned}$$
(3.9)

These \(2\times 2\) matrices describe various transformations in the two-dimensional plane, such as reflections, rotations and coordinate permutations. Key properties are eigenvalues and the associated eigenvectors, much of which depends on their determinants and symmetry properties.

2 Inner and Outer Products

Two linearly independent vectors \(\mathbf{a}\) and \(\mathbf{b}\) span a parallelogram. The projection of \(\mathbf{a}\) onto \(\mathbf{b}\) defines the inner product (further Sect. 3.5) \(\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}| |\mathbf{b}| \cos \theta \) with

$$\begin{aligned} \cos \theta =\angle (\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|} \end{aligned}$$
(3.10)

denoting the cosine of the angle between the two, where \(|\mathbf{a}|\) refers to the length of \(\mathbf{a}\) satisfying \(\left| \mathbf{a}\right| = \sqrt{\mathbf{a}\cdot \mathbf{a} }\). Referenced to a Cartesian coordinate system with basis vectors \(\{\mathbf{i}, \mathbf{j},\mathbf{k}\}\), expressions obtain in component form. In three dimensions, we have

$$\begin{aligned} \mathbf{a} = a_1 \mathbf{i}+a_2\mathbf{j}+a_3\mathbf{k},~~\mathbf{b}=b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k} \end{aligned}$$
(3.11)

and so

$$\begin{aligned} \mathbf{a}\cdot \mathbf{b} = a_1b_1+a_2b_2+a_3b_3. \end{aligned}$$
(3.12)

The outer product represents the area element, in area and orientation, of the parallelogram, represented by the normal vector

$$\begin{aligned} \mathbf{a}\times \mathbf{b} = (a_2b_3-a_3b_2) \mathbf{i} + (a_3b_1-a_1b_3) \mathbf{j} + (a_1 b_2 -a_2 b_1) \mathbf{k}=\left( \begin{array}{c} a_2b_3 - a_3 b_2 \\ a_3 b_1 -a_1b_3 \\ a_1 b_2 - a_2b_1 \end{array}\right) , \end{aligned}$$
(3.13)

where we used the right-hand rule in the direction of movement of a corkscrew turned from \(\mathbf{a}\) to \(\mathbf{b}\). Its length equals the area of the parallelogram

$$\begin{aligned} |\mathbf{a}\times \mathbf{b}| = |\mathbf{a}||\mathbf{b}|\sin \theta . \end{aligned}$$
(3.14)
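Since \(\cos ^2\theta +\sin ^2\theta =1\), (3.10) and (3.14) combine into the Lagrange identity \((\mathbf{a}\cdot \mathbf{b})^2 + |\mathbf{a}\times \mathbf{b}|^2 = |\mathbf{a}|^2|\mathbf{b}|^2\), which makes a convenient numerical check of the two products. A sketch in Python with NumPy (illustrative vectors):

```python
import numpy as np

# Illustrative vectors (arbitrary values)
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 0.0, 4.0])

dot = np.dot(a, b)            # |a||b| cos(theta), Eq. (3.10)
cross = np.cross(a, b)        # normal vector, Eq. (3.13)
area = np.linalg.norm(cross)  # |a||b| sin(theta), Eq. (3.14)

# cos^2 + sin^2 = 1 implies (a.b)^2 + |a x b|^2 = |a|^2 |b|^2
lhs = dot**2 + area**2
rhs = np.dot(a, a) * np.dot(b, b)
print(np.isclose(lhs, rhs))   # True
```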

3 Angular Momentum Vector

In circular motion, angular momentum \(\mathbf{J}\) is a vector with the same orientation as the angular velocity \({\varvec{\Omega }}\). By the vector identity

$$\begin{aligned} \mathbf{a } \times (\mathbf{b} \times \mathbf{c}) = \mathbf{b} (\mathbf{a}\cdot \mathbf{c}) - \mathbf{c} (\mathbf a\cdot \mathbf b) \end{aligned}$$
(3.15)

between vectors \(\mathbf{a,b,c}\), circular motion gives the specific angular momentum (angular momentum per unit mass)

$$\begin{aligned} \mathbf{j}=\mathbf{r}\times \mathbf{v} = \mathbf{r} \times ( {\varvec{\Omega }}\times \mathbf{r}) = r^2\varvec{\Omega }= r^2\Omega \mathbf{n}, \end{aligned}$$
(3.16)

since \(\mathbf{r}\cdot \mathbf{r}=r^2\) and \(\mathbf{r}\cdot \varvec{\Omega }=0\). With (3.4, 3.5), our model problem of circular motion, therefore, implies

$$\begin{aligned} \mathbf{j} = r^2\frac{2\pi }{P}\, \mathbf{n} = 2\frac{\pi r^2}{P}\, \mathbf{n} = 2 \frac{dA}{dt}\, \mathbf{n}, \end{aligned}$$
(3.17)

that is, \(\mathbf{j}\) represents twice the rate-of-change of surface area traced out by the radius \(\mathbf{r}\) in the orbital motion. Based on (3.4, 3.17), this is a geometrical identity, not restricted to circular motion, familiar as Kepler’s second law in planetary motion.
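The chain of identities in (3.16) is readily verified numerically for a concrete circular orbit; a minimal sketch in Python with NumPy (illustrative period and radius):

```python
import numpy as np

# Circular motion about the z-axis at angular velocity Omega
Omega = 2 * np.pi / 10.0            # period P = 10 s, Eq. (3.2)
Omega_vec = np.array([0.0, 0.0, Omega])
r = np.array([3.0, 0.0, 0.0])       # radius 3 in the orbital plane

v = np.cross(Omega_vec, r)          # instantaneous velocity, Eq. (3.3)
j = np.cross(r, v)                  # specific angular momentum, Eq. (3.16)

# j = r^2 Omega n with n along the z-axis
print(np.allclose(j, [0.0, 0.0, 9.0 * Omega]))   # True
```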

Fig. 3.1

Rotation in the (x, y)-plane over an angle \(\varphi \) obtains by multiplication of vectors \(\mathbf{z}=x\mathbf{i}_x+y\mathbf{i}_y\) by a \(2\times 2\) matrix \(R(\varphi )\). In the complex plane, it corresponds to multiplication by \(e^{i\varphi }\)

3.1 Rotations

If \(z=re^{i\theta }\), then

$$\begin{aligned} w=ze^{i\varphi } = re^{i(\theta +\varphi )} = r \left( \cos (\theta +\varphi ) + i \sin (\theta +\varphi ) \right) , \end{aligned}$$
(3.18)

as illustrated in Fig. 3.1. That is

$$\begin{aligned} w=re^{i\theta }e^{i\varphi } = r \left[ \cos \theta \cos \varphi - \sin \theta \sin \varphi + i (\sin \theta \cos \varphi +\cos \theta \sin \varphi ) \right] . \end{aligned}$$
(3.19)

Applied to the basis vectors \(\{\mathbf{i}_x,\mathbf{i}_y\}\), we have

$$\begin{aligned} \mathbf{i}_x^\prime = \mathbf{i}_x \cos \theta + \mathbf{i}_y\sin \theta ,~~\mathbf{i}_y^\prime = - \mathbf{i}_x \sin \theta + \mathbf{i}_y\cos \theta . \end{aligned}$$
(3.20)

Example 3.1. Consider a basis \(\{ \mathbf{i},\mathbf{j}\}\) rotating along with a point (x, y) over the unit circle \(S^1\). That is, \(\mathbf{i}\) points to (x, y) with local tangent \(\mathbf{j}\) to \(S^1\) with counter-clockwise orientation. Moving along \(S^1\) at constant angular velocity \(\omega \), \(\theta =\omega t\) as a function of time t and (3.20) implies

$$\begin{aligned} \frac{d\mathbf{i}}{dt} = \omega \left[ - \mathbf{i} \sin \theta + \mathbf{j}\cos \theta \right] ,~~ \frac{d\mathbf{j}}{dt} = \omega \left[ - \mathbf{i} \cos \theta - \mathbf{j}\sin \theta \right] , \end{aligned}$$
(3.21)

and so

$$\begin{aligned} \frac{d\mathbf{i}}{dt} = \omega \mathbf{j},~~ \frac{d\mathbf{j}}{dt} = - \omega \mathbf{i}. \end{aligned}$$
(3.22)
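The derivatives (3.22) of the corotating basis can be checked by finite differencing; a short sketch in Python with NumPy (illustrative values of \(\omega \) and t):

```python
import numpy as np

# Basis {i, j} corotating on the unit circle at angular velocity omega
omega, t, dt = 0.7, 1.3, 1e-6

def basis(t):
    theta = omega * t
    i = np.array([np.cos(theta), np.sin(theta)])   # points to (x, y)
    j = np.array([-np.sin(theta), np.cos(theta)])  # local tangent
    return i, j

i0, j0 = basis(t)
i1, j1 = basis(t + dt)

di_dt = (i1 - i0) / dt   # finite-difference approximations
dj_dt = (j1 - j0) / dt

# Eq. (3.22): di/dt = omega j, dj/dt = -omega i
print(np.allclose(di_dt, omega * j0, atol=1e-4))   # True
print(np.allclose(dj_dt, -omega * i0, atol=1e-4))  # True
```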

3.2 Angular Momentum and Mach’s Principle

Following (3.5) and (3.16), circular particle motion satisfies

$$\begin{aligned} \mathbf{J} = I\Omega \, \mathbf{n},~~I= m r^2, \end{aligned}$$
(3.23)

where I denotes the moment of inertia about \(\mathbf{n}\). Evidently, (3.23) implies that \(J=0\) whenever \(\Omega =0\) and vice versa. In an astronomical context, we may follow Mach and define the angular velocity as the rate of change of angles measured relative to the distant stars. Does (3.23) hold in general?

It turns out that angular momentum is sensitive to matter anywhere in the universe. While (3.23) holds true to great precision under ordinary circumstances when \(\Omega \) is defined relative to the distant stars, deviations appear in the proximity of massive rotating bodies. This can be detected by tracking the orientation \(\mathbf{n}\) of a freely suspended gyroscope relative to a distant star. Recently, the NASA satellite Gravity Probe B did just that, and measured an angular velocity in \(\mathbf{n}\) at a minute rate of

$$\begin{aligned} \omega = - 39 \,\text{ mas } \text{ yr }^{-1} = -6\times 10^{-15} \,\text{ rad } \text{ s }^{-1}. \end{aligned}$$
(3.24)

It agrees within a 20% window of uncertainty with the frame-dragging angular velocity of space-time around the earth, induced by Earth’s angular momentum according to the theory of general relativity. According to the exact solution of rotating black holes in general relativity [3], (3.24) is the frame-dragging angular velocity at about 5 million Schwarzschild radii around a maximally spinning black hole with the same angular momentum as the Earth (and 27 times its mass).
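The unit conversion in (3.24) is a one-liner, using 1 mas \(= \pi /(180\cdot 3600\cdot 1000)\) rad and a Julian year of 365.25 days:

```python
import numpy as np

# Gravity Probe B drift rate: convert mas/yr to rad/s
mas = np.pi / (180 * 3600 * 1000)  # one milliarcsecond in radians
year = 365.25 * 24 * 3600          # one Julian year in seconds

omega = -39 * mas / year
print(f"{omega:.1e} rad/s")        # -6.0e-15 rad/s
```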

Though small, (3.24) defines a key result in our views on the relation between rotation and angular momentum, that comes out non-trivially in the curved space-time predicted by the theory of general relativity. In particular, it changes our perception of the ballerina effect (Fig. 3.2). In reality, a ballerina standing still with respect to the distant stars experiences a slight lifting of her arms, due to her non-zero angular momentum imparted by frame-dragging around Earth. In a twist to the original formulation of Mach’s principle, she would experience co-rotation with an angular velocity (3.24) for her arms to be down in a fully relaxed state.

Fig. 3.2

(Left) In flat space-time, the ballerina effect expresses a correspondence between zero angular velocity \(\Omega \) relative to the distant stars and zero angular momentum J (Mach’s principle). (Right) In curved space-time, the ballerina effect is different. Here, \(J=(\Omega -\omega )I\), where I denotes the moment of inertia and \(\omega \) is the frame-dragging angular velocity along the angular momentum \(J_{M}\) of a massive object nearby. As a result, \(\Omega =\omega \) for \(J=0\) and \(J<0\) when \(\Omega =0\). Mach’s principle is to be generalized to include all matter, including massive objects in a local neighborhood

Frame dragging (3.24) induced by the angular momentum of the Earth is manifest also in energetic spin-spin interactions. In response, particles with angular momentum \(J_p\) about the spin axis of the Earth experience a potential energy [4]

$$\begin{aligned} E=\omega J_p, \end{aligned}$$
(3.25)

that represents a line-integral of Papapetrou forces [5] mediated by \(\omega \). The energy (3.25) is notoriously small for \(J_p\) of classical objects. However, for charged particles like electrons or protons in magnetic fields around black holes, E can be huge and reach energies on the scale of Ultra High Energy Cosmic Rays (UHECRs). Measurement of (3.25) around the Earth awaits future satellite experiments.

3.3 Energy and Torque

Angular momentum \(\mathbf{J}=J \mathbf{n}\) can be changed by application of a torque, defined as

$$\begin{aligned} \mathbf{T} = \frac{d}{dt} \mathbf{J} = \mathbf{n} \frac{d}{dt} J + J \frac{d}{dt} \mathbf{n}. \end{aligned}$$
(3.26)

The dimension of torque is energy, as follows from [J] = g cm\(^2\) s\(^{-1}\) (mass times rate of change of area). Because angular momentum is a vector, (3.26) shows that a torque appears already when changing its orientation, even when keeping its magnitude constant. In this case, (3.26) may be due to a rotation, i.e.,

Fig. 3.3

Changing the orientation \(\mathbf{n}\) of the angular momentum of a spinning wheel by a rotation introduces a component in an orthogonal direction, here along the vertical direction. Since angular momentum is conserved, a corresponding negative amount of angular momentum along the vertical direction is imparted by the person holding the wheel. The person will experience a counter-torque along the vertical axis

$$\begin{aligned} \Delta \mathbf{T} = J \left( \mathcal{R} -I\right) \mathbf{n} \end{aligned}$$
(3.27)

where \(\mathcal{R}\) is a rotation matrix. (More on matrices in Sect. 3.5) For a rotation over an angle \(\varphi \) about the x-axis, for example, we have (Sect. 3.4.2)

$$\begin{aligned} \mathcal{R} = \left( \begin{array}{ccc} 1 &{} 0 &{} 0 \\ 0 &{} \cos \varphi &{} -\sin \varphi \\ 0 &{} \sin \varphi &{} \cos \varphi \end{array}\right) . \end{aligned}$$
(3.28)

Feynman [6] gives an illustrative set-up that can be performed using a bicycle wheel attached freely to a rod. In this event, \(\mathbf{n}\) is along the y-axis when the rod is initially held horizontally. Attempting to rotate the rod about the x-axis in an effort to move the wheel overhead is described by (3.27), see Fig. 3.3. By (3.28), it introduces a component of \(\Delta \mathbf{T}\) along the z-axis. The person performing the rotation will experience a tendency to start rotating in the opposite direction to the angular momentum of the wheel, by conservation of total angular momentum in all three dimensions (in each of the three components x, y and z), i.e.,

$$\begin{aligned} \mathbf{J}_{wheel}+\mathbf{J}_{person}=\mathbf{0}. \end{aligned}$$
(3.29)

Since power is a scalar of dimension energy s\(^{-1}\), the power delivered to or extracted from a rotating object is given by the inner product of torque and angular velocity, i.e.,

$$\begin{aligned} P={\varvec{\Omega }} \cdot \mathbf{T}. \end{aligned}$$
(3.30)

For our circular motion, we have \(\mathbf{T}=\frac{d}{dt}{} \mathbf{J}=I\frac{d}{dt}{\varvec{\Omega }}\), and hence

$$\begin{aligned} P = \frac{d}{dt}\left( \frac{1}{2}I\Omega ^2\right) . \end{aligned}$$
(3.31)

It follows that the rotational energy in case of \(\mathbf{J}=I{\varvec{\Omega }}\) satisfies

$$\begin{aligned} E_{rot}=\frac{1}{2}\Omega ^2 I = \frac{1}{2} {\varvec{\Omega }}\cdot \mathbf{J}. \end{aligned}$$
(3.32)

Although (3.32) applies to non-relativistic mechanics such as spinning tops, somewhat remarkably it gives a fairly good approximation also to the rotational energy \(E_{rot} = k \, {\varvec{\Omega }}\cdot \mathbf{J}\), \(k^{-1}= {2\cos ^2(\lambda /4)}\), of a rotating black hole with non-dimensional angular momentum \(\sin \lambda \), since

$$\begin{aligned} \frac{1}{2} \le k \le 0.5858. \end{aligned}$$
(3.33)
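The bounds (3.33) follow by evaluating \(k(\lambda )\) over the full range of spin, \(0\le \lambda \le \pi /2\); a quick check in Python with NumPy:

```python
import numpy as np

# Rotational-energy coefficient of a rotating black hole,
# k^(-1) = 2 cos^2(lam/4), with dimensionless spin sin(lam)
lam = np.linspace(0.0, np.pi / 2, 1001)
k = 1.0 / (2.0 * np.cos(lam / 4) ** 2)

# Eq. (3.33): k ranges from 1/2 (no spin) to ~0.5858 (maximal spin)
print(round(k.min(), 4), round(k.max(), 4))   # 0.5 0.5858
```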

To exemplify angular momentum conservation, consider the problem of the Moon’s migration, in absorbing angular momentum in the Earth’s spin due to a gravitational tidal torque.

Example 3.2. Some 4.52 Gyr ago, before the Moon was born, the Earth’s spin period was \(P=5.4\) h. The Earth’s normalized angular velocity

$$\begin{aligned} A_0=\left( \frac{\Omega }{\Omega _b}\right) _\oplus \end{aligned}$$
(3.34)

then (4.52 Gyr ago) was very similar to the same for Jupiter today, where

$$\begin{aligned} \Omega =\frac{2\pi }{P},~~ \Omega _b=\sqrt{\frac{GM}{R^3}} \end{aligned}$$
(3.35)

denote the actual and, respectively, break-up angular velocity for a planet of mass M and radius R, and G is Newton’s constant. Some data:

$$\begin{aligned} \begin{array}{ll} \text{ Earth: } &{} M_\oplus =5.97\times 10^{27}\,\text{ g },~R_\oplus =6000\,\text{ km },~~P_\oplus =24\,\text{ h },\\ \\ \text{ Jupiter: } &{} M\simeq 320M_\oplus ,~~ R\simeq 11R_\oplus ,~~P\simeq 0.5P_\oplus . \end{array} \end{aligned}$$
(3.36)

This may be seen as follows.

  • For the Earth’s \(\Omega _{\oplus ,b}\) and today’s value \(\Omega _\oplus =2\pi /P_\oplus \), we have

    $$\begin{aligned} A_1=\left( \frac{\Omega }{\Omega _b}\right) _{\oplus }. \end{aligned}$$
    (3.37)
  • The change in \(P_\oplus \) from 24 h today to 5.4 h at birth satisfies the scaling

    $$\begin{aligned} \left( \frac{\Omega }{\Omega _b}\right) _\oplus \propto P^{-1}_\oplus . \end{aligned}$$
    (3.38)
  • Consequently, the spin angular velocity relative to break up at birth satisfies

    $$\begin{aligned} A_0=\left( \frac{5.4\,\text{ h }}{24\,\text{ h }}\right) ^{-1} A_1. \end{aligned}$$
    (3.39)

    This may be compared to the same ratio for Jupiter today.

With Newton’s constant \(G=6.67\times 10^{-8}\) g\(^{-1}\) cm\(^3\) s\(^{-2}\) (recall that \(G\rho \) has dimension of angular velocity squared, i.e., s\(^{-2}\)), we have by explicit calculation

$$\begin{aligned} \Omega _\oplus = 7.27\times 10^{-5}\,\text{ rad } \text{ s }^{-1},~~\Omega _{\oplus ,b} = 1.36\times 10^{-3}\,\text{ rad } \text{ s }^{-1} \end{aligned}$$
(3.40)

and hence the ratio

$$\begin{aligned} A_1 = 0.0536. \end{aligned}$$
(3.41)

By aforementioned scaling with \(P_\oplus \), we have

$$\begin{aligned} A_0=\left( \frac{24\,\text{ h }}{5.4\,\text{ h }}\right) A_1 = 4.44A_1\simeq 0.2380. \end{aligned}$$
(3.42)

Repeating the above for Jupiter,

$$\begin{aligned} B_1=\left( \frac{\Omega }{\Omega _b}\right) _{J} = 0.2185, \end{aligned}$$
(3.43)

that is, our \(A_0\) of 4.52 Gyr ago and Jupiter’s \(B_1\) today are very similar. As a consequence, we expect the weather of the Earth at birth to have been very similar to that of Jupiter today, essentially a permanent storm driven by exceedingly large Coriolis forces. Recall that Coriolis forces scale with \(\Omega _{\oplus }^2\propto P_\oplus ^{-2}\). They were initially some 20 times stronger than they are now. Thanks, in part, to spin down by the Moon, we can enjoy today’s clement climate [7].
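The numbers (3.40)–(3.43) in this example reduce to a few lines of arithmetic in CGS units, using the data (3.36); a sketch in Python with NumPy:

```python
import numpy as np

# Spin relative to break-up, Omega/Omega_b = (2*pi/P) / sqrt(G*M/R^3),
# in CGS units with the data of Eq. (3.36)
G = 6.67e-8                                    # cm^3 g^-1 s^-2
M_E, R_E, P_E = 5.97e27, 6.0e8, 24 * 3600.0    # Earth
M_J, R_J, P_J = 320 * M_E, 11 * R_E, 0.5 * P_E # Jupiter

def spin_ratio(M, R, P):
    return (2 * np.pi / P) / np.sqrt(G * M / R**3)

A1 = spin_ratio(M_E, R_E, P_E)   # Earth today, Eq. (3.41)
A0 = (24.0 / 5.4) * A1           # Earth at birth (P = 5.4 h), Eq. (3.42)
B1 = spin_ratio(M_J, R_J, P_J)   # Jupiter today, Eq. (3.43)

print(round(A1, 4), round(A0, 4), round(B1, 4))   # 0.0536 0.238 0.2185
```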

3.4 Coriolis Forces

Conservation of angular momentum gives rise to apparent forces when moving things around by external forces that leave the angular momentum invariant, as in the absence of any frictional forces. The specific angular momentum in the presence of an angular velocity \(\omega \) is

$$\begin{aligned} j = \omega \sigma ^2:~\omega =\frac{j}{\sigma ^2}, \end{aligned}$$
(3.44)

where \(\sigma \) denotes the distance to the axis of rotation. Moving a fluid element along the radial direction changes \(\omega \), as when the ballerina moves stretched arms inwards, according to \(\delta \omega =-2 {j}{\sigma ^{-3}}\delta \sigma \). It comes with a change in azimuthal velocity \(\delta v_\varphi =\sigma \delta \omega \) seen in a corotating frame, satisfying

$$\begin{aligned} \delta v_\varphi =- 2\omega \delta \sigma . \end{aligned}$$
(3.45)

In vector form, (3.45) is

$$\begin{aligned} \frac{d}{dt} \mathbf{v}_\varphi = -2{\varvec{\omega }}\times \mathbf{v}_\sigma . \end{aligned}$$
(3.46)

This result is commonly expressed in terms of the Coriolis force

$$\begin{aligned} \mathbf{F}_c = m\frac{d}{dt} \mathbf{v}_\varphi = 2m \mathbf{v} \times {\varvec{\omega }}. \end{aligned}$$
(3.47)

Coriolis forces are particularly relevant when working in a rotating frame of reference. In particular, this applies to all of us terrestrial inhabitants, living in the rotating frame fixed to the Earth’s surface. Air moving to a different latitude is subject to (3.47), since it changes the distance \(\sigma \) to the Earth’s axis of rotation, which is approximately polar. Let \(\Omega \) denote the absolute angular velocity of the Earth (relative to the distant stars), and express the angular velocity of the air as \(\omega ^\prime = \omega -\Omega \) relative to it, as measured in this rotating frame. Since \(\delta \omega ^\prime =\delta \omega \), moving air, say, in the direction of the equator produces a retrograde azimuthal velocity (rotation at an angular velocity \(\omega <\Omega \)). Moving it at a constant velocity towards the equator produces a curved trajectory in response to the (retrograde) constant Coriolis force (3.47). This may give rise to large scale circulation patterns in combination with pressure gradients.
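For an order-of-magnitude feel of (3.47) in the terrestrial setting, consider air moving at 10 m s\(^{-1}\) (an illustrative wind speed) with the Earth's \(\Omega \) from (3.40); a sketch in Python with NumPy:

```python
import numpy as np

# Coriolis acceleration for air moving at 10 m/s in the frame
# rotating with the Earth (illustrative values, SI units)
Omega = 7.27e-5                         # Earth's angular velocity, rad/s
omega_vec = np.array([0.0, 0.0, Omega])
v = np.array([10.0, 0.0, 0.0])          # wind velocity, m/s

a_c = 2 * np.cross(v, omega_vec)        # acceleration from Eq. (3.47)
print(np.linalg.norm(a_c))              # ~1.45e-3 m/s^2
```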

Fig. 3.4

Precession of the spinning top causes a velocity \(\dot{\mathbf{n}}\) in the orientation \(\mathbf{n}\) of the angular momentum, such that \(d\mathbf{J}/dt=J\dot{\mathbf{n}}\) absorbs the torque due to the gravitational force \(\mathbf{F}_g\) applied at its center of mass CM. In the idealized friction-free set-up, this process involves no exchange of energy or dissipation

3.5 Spinning Top

The motion of a spinning top tilted at an angle \(\theta \) exemplifies the interaction of angular momentum as a vector with a torque, \(\mathbf{T}\), applied continuously by the Earth’s gravitational force \(\mathbf{F}_g\) as illustrated in Fig. 3.4. In general, we have the relations

$$\begin{aligned} \mathbf{T}=\frac{d}{dt} \mathbf{J} = \mathbf{r} \times \frac{d}{dt} \mathbf{p} = \mathbf{r}\times \mathbf{F}_g. \end{aligned}$$
(3.48)

For a top that spins with no friction, the magnitude of its angular momentum vector is conserved. By (3.26, 3.27), the top precesses at an angular velocity \({\varvec{\Omega }}_p\) about the z-axis, \(\Omega _p = d\phi /dt\), satisfying

$$\begin{aligned} \frac{d}{dt}{} \mathbf{J} = J \frac{d}{dt}{} \mathbf{n} = J{\varvec{\Omega }}_p\times \mathbf{n} = {\varvec{\Omega }}_p\times \mathbf{J}. \end{aligned}$$
(3.49)

By (3.48), \(T = \Omega _p J \sin \theta = r W \sin \theta \), and hence the angular velocity of precession about the vertical axis satisfies

$$\begin{aligned} \Omega _pJ = rW, \end{aligned}$$
(3.50)

where W denotes the weight of the top and r the distance of its center of mass away from its pivot on the table.
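As a worked illustration of (3.50), consider a small top modeled as a uniform disk (all numbers illustrative; the disk moment of inertia \(I=\frac{1}{2}mR^2\) is quoted, not derived here):

```python
# Precession rate of a spinning top, Eq. (3.50): Omega_p = r W / J
# Illustrative values: a 0.5 kg uniform disk of radius 0.05 m,
# spinning at 100 rad/s, center of mass 0.04 m from the pivot
m, R, omega_spin, r, g = 0.5, 0.05, 100.0, 0.04, 9.81

I = 0.5 * m * R**2        # moment of inertia of a uniform disk
J = I * omega_spin        # spin angular momentum, kg m^2/s
W = m * g                 # weight, N

Omega_p = r * W / J       # precession angular velocity, Eq. (3.50)
print(round(Omega_p, 3))  # 3.139 rad/s
```

Note that \(\Omega _p \ll \omega _{\rm spin}\) here, consistent with the gyroscopic approximation underlying (3.49).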

Fig. 3.5

Shown is a ring of radius R rotating at an angular velocity \(\omega \) about a horizontal axis of length l, supported by a pivot that allows rotation at a precession angular velocity \(\omega _p\) about the vertical axis. Upon translation of the center of mass (CM) to the origin of a spherical coordinate system, mass elements on the ring move over the surface of a sphere of radius R, parameterized by a polar and azimuthal angle \(\theta \) and, respectively, \(\varphi \), wherein \(d\theta /dt=\omega \) and \(d\varphi /dt=\omega _p\)

Example 3.3. Illustrative of some vector calculations is a more explicit calculation of the precession frequency (3.50). To this end, Fig. 3.5 shows a massive ring of radius R spinning at an angular velocity \(\omega \), whereby it attains an angular momentum per unit mass \(j=\omega R^2\). Suppose it is mounted to one end of a rod that is suspended at a pivot at the other end. An approximately horizontal rod hereby precesses with an angular velocity \(\omega _p\) about the vertical axis without dropping to a vertical position, satisfying (3.49). This result is invariant under linear translation of the CM. Precession is entirely due to the motion of mass-elements about the ring’s CM, allowing us to place the CM at the origin of a spherical coordinate system \((r,\theta ,\varphi )\), as if the CM were placed at the pivot.

With \(\varvec{\omega }=\omega \mathbf{i}_z\), the outer product \(\varvec{\omega }\times \mathbf{r}\) is the rotational velocity \(\mathbf{v}_\varphi \) of the end point of a vector \(\mathbf{r}\), with magnitude \(v_\varphi = \omega \sigma \), where \(\sigma =b\sin \theta \) is the distance to the axis of rotation, writing \(b=R\) for the ring radius. A mass element \(\delta m=(M/2\pi )\delta \theta \) in the ring herein assumes an angular momentum \(\delta \mathbf{J} = \mathbf{r} \times \delta \mathbf{p} = \delta m\, \mathbf{r}\times \mathbf{v}\) with position vector

$$\begin{aligned} \mathbf{r} = b\left( \begin{array}{c} \sin \theta \cos \varphi \\ \sin \theta \sin \varphi \\ \cos \theta \end{array} \right) ,~~\omega =\frac{d\theta }{dt},~~\omega _p = \frac{d\varphi }{dt}, \end{aligned}$$
(3.51)

and associated velocity \(\mathbf{v}=d\mathbf{r}/dt\),

$$\begin{aligned} \mathbf{v} = b\left( \begin{array}{c} \cos \theta \cos \varphi \\ \cos \theta \sin \varphi \\ -\sin \theta \end{array} \right) \omega + b\left( \begin{array}{c} -\sin \theta \sin \varphi \\ \sin \theta \cos \varphi \\ 0\end{array} \right) \omega _p, \end{aligned}$$
(3.52)

and acceleration \(\mathbf{a}=d\mathbf{v}/dt\),

$$\begin{aligned} \mathbf{a} = - b\left( \begin{array}{c} \sin \theta \cos \varphi \\ \sin \theta \sin \varphi \\ \cos \theta \end{array} \right) \omega ^2 - b\left( \begin{array}{c} \sin \theta \cos \varphi \\ \sin \theta \sin \varphi \\ 0\end{array} \right) \omega _p^2 \end{aligned}$$
(3.53)
$$\begin{aligned} + 2b\left( \begin{array}{c} -\cos \theta \sin \varphi \\ \cos \theta \cos \varphi \\ 0\end{array} \right) \omega \omega _p. \end{aligned}$$
(3.54)

Its inertia introduces a torque

$$\begin{aligned} \delta \mathbf{T} = \frac{d\delta \mathbf{J}}{dt} = \delta m \left( \frac{d}{dt}{} \mathbf{r}\times \mathbf{v} + \mathbf{r} \times \frac{d}{dt}{} \mathbf{v}\right) = \delta m\, \mathbf{r}\times \mathbf{a} \end{aligned}$$
(3.55)

that evaluates to

$$\begin{aligned} \delta \mathbf{T}=\delta m b\left( \omega _p^2\mathbf{r}\times \mathbf{i}_z + 2\omega \omega _p \mathbf{r}\times \left( \begin{array}{c}-\cos \theta \sin \varphi \\ \cos \theta \cos \varphi \\ 0 \end{array}\right) \right) \end{aligned}$$
(3.56)

To finalize, we integrate (3.56) over all mass elements \(\delta m\). Making use of the following averages over the fast angle \(\theta \),

$$\begin{aligned} \left\langle \sin \theta \right\rangle = \left\langle \cos \theta \right\rangle = \left\langle \sin \theta \cos \theta \right\rangle = 0,~~ \left\langle \cos ^2\theta \right\rangle = \frac{1}{2}, \end{aligned}$$
(3.57)

and

$$\begin{aligned} \left\langle \mathbf{r}\times \mathbf{i}_z \right\rangle = \mathbf{0}, \end{aligned}$$
(3.58)
$$\begin{aligned} \left\langle \mathbf{r}\times \left( \begin{array}{c}-\cos \theta \sin \varphi \\ \cos \theta \cos \varphi \\ 0 \end{array}\right) \right\rangle = -\frac{b}{2}\left( \begin{array}{c} \cos \varphi \\ \sin \varphi \\ 0 \end{array}\right) , \end{aligned}$$
(3.59)

we arrive at a total inertial torque \(\mathbf{T}=\int _0^{2\pi } \delta \mathbf{T}\),

$$\begin{aligned} \mathbf{T} = - Mb^2 \omega \omega _p \left( \begin{array}{c} \cos \varphi \\ \sin \varphi \\ 0 \end{array}\right) . \end{aligned}$$
(3.60)

With \(J=I\omega \) expressed in the moment of inertia \(I=Mb^2\), the latter reduces to \(T=\omega \omega _pMb^2=\omega _pJ\), i.e., our vector identity (3.49).

In Fig. 3.5, if the bar holding the rotating wheel is initially suspended horizontally at the pivot with zero angular momentum about the z-axis, then the onset of precession \(\omega _p\)—balancing inertial to gravitational torque \(gM\sigma \)—produces a finite angular momentum \(J_z = M\sigma ^2\omega _p\) about the z-axis (upwards, say), where \(\sigma = l\cos \alpha \) is the arm length to the z-axis, now at a dip angle \(\alpha \). Since the total angular momentum about the z-axis remains zero, \(J_z=J\sin \theta \) (pointing downwards). Given \(\omega _pJ=Mg\sigma \), it follows that (cf. Exercise 3.3)

$$\begin{aligned} \tan \alpha = \left( \omega _p/\Omega \right) ^{2}, \end{aligned}$$
(3.61)

where \(\Omega = \sqrt{g/\sigma }\).

4 Elementary Transformations in the Plane

In the two-dimensional plane with Cartesian coordinates (x, y), transformations describe a map

$$\begin{aligned} \mathbf{z}= x\mathbf{i}_x + y \mathbf{i}_y \rightarrow \mathbf{w}=x^\prime \mathbf{i}_x + y^\prime \mathbf{i}_y. \end{aligned}$$
(3.62)

When linear, such a map is a matrix multiplication \(\mathbf{w} = C\mathbf{z}\),

$$\begin{aligned} \mathbf{z} = \left( \begin{array}{c} x \\ y \end{array}\right) = x \left( \begin{array}{c} 1 \\ 0 \end{array}\right) + y \left( \begin{array}{c} 0 \\ 1 \end{array}\right) \end{aligned}$$
(3.63)

with

$$\begin{aligned} C = \left( \mathbf{c}_{1} ~~ \mathbf{c}_{2}\right) :~~C\mathbf{z} = x\, \mathbf{c}_{1} + y\, \mathbf{c}_{2}, \end{aligned}$$
(3.64)

and

$$\begin{aligned} \mathbf{r}^T = (a~b):~~\mathbf{r}^T \mathbf{z} = ax + by. \end{aligned}$$
(3.65)

Equivalently, we have

$$\begin{aligned} C = \left( \begin{array}{c} \mathbf{r}_{1}^T \\ \mathbf{r}_{2}^T \end{array}\right) :~~C\mathbf{z} = \left( \begin{array}{c} \mathbf{r}_{1}^T \mathbf{z} \\ \mathbf{r}_{2}^T \mathbf{z} \end{array}\right) . \end{aligned}$$
(3.66)

These two views (3.64, 3.66) explicitly bring out the linearity in the row and column vectors of C.

When working in the two-dimensional plane, we note that (3.62) is equivalent to a map of complex numbers \(z= x+iy \rightarrow w=x^\prime + i y^\prime \), that is occasionally useful when working with conformal transformations \(w=w(z)\) (\(w^\prime (z)\ne 0\)).

4.1 Reflection Matrix

Figure 3.6 illustrates reflections in the two-dimensional plane about the x-axis, the y-axis and through the origin, \(\mathcal{O}=(0,0)\). Reflection about the x-axis is described by

$$\begin{aligned} \mathbf{z}= x\mathbf{i}_x + y \mathbf{i}_y \rightarrow \mathbf{w}=x \mathbf{i}_x - y \mathbf{i}_y. \end{aligned}$$
(3.67)

The same transformation can be written as a matrix equation for the equations \(x^\prime = x\) and \(y^\prime = -y\) as follows

$$\begin{aligned} \left( \begin{array}{c} x^\prime \\ y^\prime \end{array} \right) = \left( \begin{array}{rr} 1 &{} 0 \\ 0 &{} -1 \end{array}\right) \left( \begin{array}{c} x \\ y \end{array}\right) . \end{aligned}$$
(3.68)

Reflection about the y-axis is described by

Fig. 3.6

Reflections in the (xy)-plane about the x-axis, the y-axis and through the origin, take \(z=(x,y)\) to, respectively, \(w_1=(x,-y)\), \(w_2=(-x,y)\) and \(w_3=-z\). Each transformation is described by a \(2\times 2\) matrix acting on the vector \(\mathbf{z}=x\mathbf{i}_x+y\mathbf{i}_y\)

$$\begin{aligned} \mathbf{z}= x\mathbf{i}_x + y \mathbf{i}_y \rightarrow \mathbf{w}=- x \mathbf{i}_x + y \mathbf{i}_y. \end{aligned}$$
(3.69)

The same transformation can be written as a matrix equation for \(x^\prime = -x\) and \(y^\prime = y\) as follows

$$\begin{aligned} \left( \begin{array}{c} x^\prime \\ y^\prime \end{array} \right) = \left( \begin{array}{rr} -1 &{} 0 \\ 0 &{} 1 \end{array}\right) \left( \begin{array}{c} x \\ y \end{array}\right) . \end{aligned}$$
(3.70)

As mentioned above, (3.67, 3.69) are equivalent to taking \(z\in \mathbb {C}\) into, respectively,

$$\begin{aligned} w_1=\bar{z}=x-iy,~~w_2=-\bar{z}=-x+iy. \end{aligned}$$
(3.71)

Reflection about the origin is described by

$$\begin{aligned} \mathbf{z}= x\mathbf{i}_x + y \mathbf{i}_y \rightarrow \mathbf{w}=- x \mathbf{i}_x - y\mathbf{i}_y,~~w_3=-z. \end{aligned}$$
(3.72)

The same transformation can be written as a matrix equation for the equations \(x^\prime =- x\) and \(y^\prime = -y\) as follows

$$\begin{aligned} \left( \begin{array}{c} x^\prime \\ y^\prime \end{array} \right) = \left( \begin{array}{rr} -1 &{} 0 \\ 0 &{} -1 \end{array}\right) \left( \begin{array}{c} x \\ y \end{array}\right) . \end{aligned}$$
(3.73)

The identity matrix is defined by the transformation which leaves \(\mathbf{z}\) the same, i.e.,

$$\begin{aligned} I=\left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} 1 \end{array}\right) . \end{aligned}$$
(3.74)
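The reflection matrices (3.68), (3.70) and (3.73) and their composition are easily explored numerically; a sketch in Python with NumPy:

```python
import numpy as np

# Reflection matrices of Eqs. (3.68), (3.70) and (3.73)
Mx = np.array([[1, 0], [0, -1]])    # reflection about the x-axis
My = np.array([[-1, 0], [0, 1]])    # reflection about the y-axis
Mo = np.array([[-1, 0], [0, -1]])   # reflection through the origin

z = np.array([2, 3])
print(Mx @ z, My @ z, Mo @ z)       # [ 2 -3] [-2  3] [-2 -3]

# Composing the two axis reflections gives reflection through the origin
print(np.array_equal(Mx @ My, Mo))  # True
```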

4.2 Rotation Matrix

The above can be extended to continuous transformations such as rotations. The rotation matrix can be derived from the multiplication of complex numbers following (3.18) and (3.62). With \(\mathbf{z}=r\cos \theta \mathbf{i}_x + r \sin \theta \mathbf{i}_y\), we have

$$\begin{aligned} \left( \begin{array}{c} x^\prime \\ y^\prime \end{array}\right) = R(\varphi )\left( \begin{array}{c} x \\ y \end{array}\right) , \end{aligned}$$
(3.75)

in terms of the rotation matrix

$$\begin{aligned} R(\varphi ) = \left( \begin{array}{cr} \cos \varphi &{} -\sin \varphi \\ \sin \varphi &{} \cos \varphi \end{array}\right) . \end{aligned}$$
(3.76)

Evidently, it satisfies

$$\begin{aligned} \text{ det }\!R=1,~~R(-\varphi )=R^{-1}(\varphi ) = R^T(\varphi ), \end{aligned}$$
(3.77)

where \(R^{-1}\) refers to the inverse of R, \(R^T\) refers to the transpose and

$$\begin{aligned} \text{ det } \left( \begin{array}{rr} a_{11} &{} a_{12} \\ a_{21} &{} a_{22} \end{array} \right) = a_{11}a_{22} - a_{12}a_{21} \end{aligned}$$
(3.78)

defines the determinant of a \(2\times 2\) matrix.
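Properties (3.77) of the rotation matrix can be confirmed numerically; a sketch in Python with NumPy (illustrative angle):

```python
import numpy as np

# Rotation matrix R(phi) of Eq. (3.76)
phi = 0.3
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

print(np.isclose(np.linalg.det(R), 1.0))   # True: det R = 1
print(np.allclose(np.linalg.inv(R), R.T))  # True: R^(-1) = R^T

R_minus = np.array([[np.cos(-phi), -np.sin(-phi)],
                    [np.sin(-phi),  np.cos(-phi)]])
print(np.allclose(R_minus, R.T))           # True: R(-phi) = R^T(phi)
```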

For what follows, we shall generalize (3.9) to matrices. For a square \(n\times n\) matrix A, the transpose obtains by interchanging the off-diagonal components \(a_{ij}\) \((i\ne j)\) about the principal diagonal containing the \(a_{ii}\). Schematically, if U refers to the upper off-diagonal elements and L refers to the lower off-diagonal elements, then

$$\begin{aligned} A = \left( \begin{array}{ccccc} a_{11} &{} &{} &{} U \\ &{} a_{22} &{} \\ &{} &{} \cdots &{} \\ L &{} &{} &{} a_{nn} \end{array}\right) \rightarrow A^T = \left( \begin{array}{ccccc} a_{11} &{} &{} &{} L \\ &{} a_{22} &{} \\ &{} &{} \cdots &{} \\ U &{} &{} &{} a_{nn} \end{array}\right) . \end{aligned}$$
(3.79)

The rotation matrix \(R(\varphi )\) in (3.76) is anti-symmetric in its off-diagonal elements, i.e., \(U=-L\). A square matrix is said to be anti-symmetric, if \(U=-L\) and the elements on the principal diagonal are zero. Since the diagonal elements in (3.76) are non-zero, \(R(\varphi )\) is not an anti-symmetric matrix.

Example 3.4. A symmetric matrix, satisfying \(U=L\) as defined in (3.79), is the Lorentz boost

$$\begin{aligned} \Lambda (\mu ) = \left( \begin{array}{cr} \cosh \mu &{} \sinh \mu \\ \sinh \mu &{} \cosh \mu \end{array}\right) , \end{aligned}$$
(3.80)

that appears in the transformation of four-momenta in Minkowski space. Both \(R(\varphi )\) and \(\Lambda (\mu )\) have determinant one,

$$\begin{aligned} \text{ det }\,R(\varphi ) = \cos ^2\varphi + \sin ^2\varphi = 1,~~\text{ det }\, \Lambda (\mu ) = \cosh ^2\mu - \sinh ^2\mu = 1. \end{aligned}$$
(3.81)
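The determinant identities (3.81) can likewise be checked numerically; a small sketch (the helper name `boost` is ours):

```python
import numpy as np

def boost(mu):
    # Symmetric Lorentz boost Lambda(mu) of (3.80).
    return np.array([[np.cosh(mu), np.sinh(mu)],
                     [np.sinh(mu), np.cosh(mu)]])

# det Lambda = cosh^2(mu) - sinh^2(mu) = 1 for any rapidity mu, cf. (3.81).
for mu in (0.0, 0.3, 1.5):
    assert np.isclose(np.linalg.det(boost(mu)), 1.0)
```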

5 Matrix Algebra

Multiplication of two matrices A of dimension \(p\times m\) and B of dimension \(m\times q\) produces a new matrix \(C=AB\) of dimension \(p\times q\). Each entry of C is the inner product of a row from A and a column from B. Schematically, the product C of two \(2\times 2\) matrices is

$$\begin{aligned} C = AB = \left( \begin{array}{c} \mathbf{a}_1 \\ \mathbf{a}_2 \end{array}\right) \left( \begin{array}{cc} \mathbf{b}_1&\mathbf{b}_2 \end{array}\right) = \left( \begin{array}{cc} \mathbf{a}_1\mathbf{b}_1 &{} \mathbf{a}_1\mathbf{b}_2 \\ \mathbf{a}_2\mathbf{b}_1 &{} \mathbf{a}_2\mathbf{b}_2 \end{array}\right) \end{aligned}$$
(3.82)

upon considering A in terms of its rows and B in terms of its columns. The entries of C satisfy

$$\begin{aligned} c_{ij} = (a_{i1} ~ a_{i2}) \left( \begin{array}{c} b_{1j} \\ b_{2j} \end{array} \right) = a_{i1}b_{1j}+a_{i2}b_{2j}. \end{aligned}$$
(3.83)

The product \(D=BA\) of the same \(2\times 2\) matrices satisfies

$$\begin{aligned} D = BA = \left( \begin{array}{c} \mathbf{b}_1 \\ \mathbf{b}_2 \end{array}\right) \left( \begin{array}{cc} \mathbf{a}_1&\mathbf{a}_2 \end{array}\right) = \left( \begin{array}{cc} \mathbf{b}_1\mathbf{a}_1 &{} \mathbf{b}_1\mathbf{a}_2 \\ \mathbf{b}_2\mathbf{a}_1 &{} \mathbf{b}_2\mathbf{a}_2 \end{array}\right) \end{aligned}$$
(3.84)

upon considering B in terms of its rows and A in terms of its columns, so that

$$\begin{aligned} d_{ij} = (b_{i1} ~ b_{i2}) \left( \begin{array}{c} a_{1j} \\ a_{2j} \end{array} \right) = b_{i1}a_{1j}+b_{i2}a_{2j}. \end{aligned}$$
(3.85)

It is easy to see that in general \(D\ne C\), i.e., matrix multiplication does not commute,

$$\begin{aligned}{}[A,B]=AB-BA\ne 0, \end{aligned}$$
(3.86)

where the notation \([\cdot ,\cdot ]\) refers to the commutator.

Example 3.3. To illustrate, consider the two matrices

$$\begin{aligned} A=\left( \begin{array}{rr} 0 &{} -1 \\ 1 &{} 0 \end{array}\right) ,~~B=\left( \begin{array}{rr} 0 &{} 1 \\ 1 &{} 0 \end{array}\right) . \end{aligned}$$
(3.87)

The commutator \([A,B]\) then evaluates to

$$\begin{aligned} \left( \begin{array}{rr} 0 &{} -1 \\ 1 &{} 0 \end{array}\right) \left( \begin{array}{rr} 0 &{} 1 \\ 1 &{} 0 \end{array}\right) - \left( \begin{array}{rr} 0 &{} 1 \\ 1 &{} 0 \end{array}\right) \left( \begin{array}{rr} 0 &{} -1 \\ 1 &{} 0 \end{array}\right) = 2\left( \begin{array}{rr} -1 &{} 0 \\ 0 &{} 1 \end{array}\right) . \end{aligned}$$
(3.88)
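The non-commutativity (3.86) and the result (3.88) can be reproduced directly with matrix products; a brief NumPy check:

```python
import numpy as np

A = np.array([[0, -1],
              [1,  0]])
B = np.array([[0, 1],
              [1, 0]])

# The commutator [A, B] = AB - BA of (3.86); compare with (3.88).
C = A @ B - B @ A
assert np.array_equal(C, 2 * np.array([[-1, 0],
                                       [ 0, 1]]))
```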

6 Eigenvalue Problems

Eigenvalue problems are defined by the equation

$$\begin{aligned} A\mathbf{a} = \lambda \mathbf{a}, \end{aligned}$$
(3.89)

where \(\mathbf{a}\) refers to an eigenvector associated with the eigenvalue \(\lambda \). Equivalently, \(\mathbf{a}\) is in the null-space (is a right null-vector) of \(A-\lambda I\):

$$\begin{aligned} \left( A - \lambda I \right) \mathbf{a} =\mathbf{0}. \end{aligned}$$
(3.90)

For (3.90) to have a non-trivial solution \(\mathbf{a}\), we must have

$$\begin{aligned} \text{ det } \left( A - \lambda I \right) = 0. \end{aligned}$$
(3.91)

6.1 Eigenvalues of \(R(\varphi )\)

Let us explore (3.90, 3.91) for the rotation matrix \(R(\varphi )\),

$$\begin{aligned} 0=\left| R - \lambda I \right| = (\cos \varphi - \lambda )^2 + \sin ^2\varphi :~\lambda _\pm = \cos \varphi \pm i \sin \varphi = e^{\pm i\varphi }. \end{aligned}$$
(3.92)

The eigenvalues lie on the unit circle \(S^1\). This is a consequence of the fact that rotation is unitary (see Sect. 3.7). Also, the eigenvalues satisfyFootnote 6

$$\begin{aligned} \lambda _1\lambda _2 = | R | = 1. \end{aligned}$$
(3.93)

The associated eigenvectors

$$\begin{aligned} \mathbf{a} = \left( \begin{array}{c} \alpha _1 \\ \alpha _2 \end{array} \right) \end{aligned}$$
(3.94)

satisfy (3.90). To be definite, (3.90) defines two homogeneous equations in the two unknown coefficients \((\alpha _1,\alpha _2)\),

$$\begin{aligned} \left\{ \begin{array}{l} \alpha _1 \cos \varphi - \alpha _2 \sin \varphi - \lambda \alpha _1=0,\\ \alpha _1 \sin \varphi + \alpha _2 \cos \varphi - \lambda \alpha _2=0. \end{array}\right. \end{aligned}$$
(3.95)

For the eigenvalues satisfying (3.93), these two equations are linearly dependent. It suffices to take one of them, to solve for \(\alpha _1\) and \(\alpha _2\),

$$\begin{aligned} \alpha _1 (\cos \varphi - \lambda ) - \alpha _2 \sin \varphi =0: ~ \alpha _1 = i \alpha _2,~~\alpha _1=-i\alpha _2 \end{aligned}$$
(3.96)

for \(\lambda =e^{i\varphi }\) and, respectively, \(\lambda =e^{-i\varphi }\). We thus arrive at the eigenvector-eigenvalue pairs

$$\begin{aligned} \left\{ e^{i\varphi },\left( \begin{array}{c} 1 \\ -i \end{array} \right) \right\} , \left\{ e^{-i\varphi },\left( \begin{array}{c} 1 \\ i \end{array} \right) \right\} . \end{aligned}$$
(3.97)

These two pairs are complex conjugates. This is no surprise since \(R(\varphi )\) is a real matrix, whose determinant \(\left| R-\lambda I \right| \) defines a quadratic polynomial in \(\lambda \). With real coefficients, its roots are either both real or a pair of complex conjugates.
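A numerical eigenvalue solver reproduces the conjugate pair (3.97); a sketch using `numpy.linalg.eigvals`:

```python
import numpy as np

phi = 0.4
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

# The eigenvalues come out as the complex-conjugate pair e^{-i phi}, e^{+i phi}.
lam = np.linalg.eigvals(R)
assert np.allclose(sorted(lam, key=np.imag),
                   [np.exp(-1j * phi), np.exp(1j * phi)])

# Both lie on the unit circle and their product equals det R = 1, cf. (3.93).
assert np.allclose(np.abs(lam), 1.0)
assert np.isclose(np.prod(lam), 1.0)
```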

6.2 Eigenvalues of a Real-Symmetric Matrix

The matrix

$$\begin{aligned} A = \left( \begin{array}{cc} 2 &{} 1 \\ 1 &{} 0 \end{array}\right) \end{aligned}$$
(3.98)

is real-symmetric with eigenvalue-eigenvector pairs \((\lambda _\pm ,\mathbf{x}_\pm )\)

$$\begin{aligned} \left\{ 1+\sqrt{2},\left( \begin{array}{c} 1+\sqrt{2} \\ 1 \end{array} \right) \right\} , \left\{ 1-\sqrt{2} ,\left( \begin{array}{c} 1-\sqrt{2} \\ 1 \end{array} \right) \right\} . \end{aligned}$$
(3.99)

It is readily seen that \(\mathbf{x}_\pm \) are orthogonal:

$$\begin{aligned} \mathbf{x}_+^T\mathbf{x}_-=0. \end{aligned}$$
(3.100)

We can normalize the eigenvectors to

$$\begin{aligned} \mathbf{e}_+ = \frac{1}{\sqrt{4+2\sqrt{2}}} \left( \begin{array}{c} 1+\sqrt{2} \\ 1 \end{array} \right) ,~~ \mathbf{e}_- = \frac{1}{\sqrt{4-2\sqrt{2}}} \left( \begin{array}{c} 1-\sqrt{2} \\ 1 \end{array} \right) , \end{aligned}$$
(3.101)

so that \((\mathbf{e}_+,\mathbf{e}_-)\) forms a new orthonormal basis set complementary to \((\mathbf{i},\mathbf{j})\) along the x- and y-axis. Hence, we have the general decompositions

$$\begin{aligned} \mathbf{x} = x \mathbf{i} + y \mathbf{j} = a \mathbf{e}_+ + b \mathbf{e}_-, \end{aligned}$$
(3.102)

where \(x=\mathbf{i}\cdot \mathbf{x}\) and \(y=\mathbf{j}\cdot \mathbf{x}\). The coefficients a and b can be read off using multiplication by \(\mathbf{e}_\pm \):

$$\begin{aligned} a = x\, \mathbf{i}\cdot \mathbf{e}_+ + y\, \mathbf{j }\cdot \mathbf{e}_+,~~b = x\, \mathbf{i}\cdot \mathbf{e}_- + y\, \mathbf{j }\cdot \mathbf{e}_-. \end{aligned}$$
(3.103)

Note that (3.89) defines the eigenvectors as invariant subspaces. We now arrive at a new look at A as an operator on \(\mathbf{x}\), in terms of multiplications by the eigenvalues along the directions given by the associated eigenvectors,

$$\begin{aligned} A\mathbf{x} = a\, \lambda _+ \mathbf{e}_+ + b\, \lambda _- \mathbf{e}_-. \end{aligned}$$
(3.104)
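The decomposition (3.102)–(3.104) can be checked numerically for the matrix (3.98); a short sketch (variable names ours):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 0.0]])

lam_p, lam_m = 1 + np.sqrt(2), 1 - np.sqrt(2)
e_p = np.array([1 + np.sqrt(2), 1.0]) / np.sqrt(4 + 2 * np.sqrt(2))
e_m = np.array([1 - np.sqrt(2), 1.0]) / np.sqrt(4 - 2 * np.sqrt(2))

# Orthonormal eigenvectors, cf. (3.100) and (3.101).
assert np.isclose(e_p @ e_m, 0.0)
assert np.isclose(e_p @ e_p, 1.0) and np.isclose(e_m @ e_m, 1.0)

# A acts by scaling the coefficients along e_+ and e_-, cf. (3.104).
x = np.array([0.3, -1.2])
a, b = e_p @ x, e_m @ x
assert np.allclose(A @ x, a * lam_p * e_p + b * lam_m * e_m)
```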

6.3 Hermitian Matrices

Let \(^\dagger \) denote the Hermitian conjugate,Footnote 7 defined as the complex conjugate of the transpose of a matrix element, a column or row vector or a matrix. We define the scalar product of two vectors \(\mathbf{a}\) and \(\mathbf{b}\) in an n-dimensional vector space by

$$\begin{aligned} \mathbf{a}^\dagger \mathbf{b} = \bar{a}_1b_1+\bar{a}_2b_2+\cdots +\bar{a}_n b_n. \end{aligned}$$
(3.105)

Real-symmetric matrices generalize to complex valued matrices with the same properties of having real eigenvalues and mutually orthogonal eigenvectors associated with different eigenvalues according to (3.116) and, respectively, (3.119). Following the steps of the previous section, these are the self-adjoint or Hermitian matrices satisfying

$$\begin{aligned} H^\dagger = H, \end{aligned}$$
(3.106)

defined by transformation of the entries \(H^\dagger _{ij}=\bar{H}_{ji}\). Note that applying \(\dagger \) twice is an identity operation, i.e., \((A^\dagger )^\dagger =A\) for any \(n\times m\) matrix A. Hence, if H is an \(n\times n\) matrix, we have

$$\begin{aligned} H = \left( \begin{array}{ccccc} a_{11} &{} &{} &{} L^\dagger \\ &{} a_{22} &{} \\ &{} &{} \cdots &{} \\ L &{} &{} &{} a_{nn} \end{array}\right) \end{aligned}$$
(3.107)

with real diagonal elements \(a_{ii}\) \((i=1,2,\cdots n\)).

Example 3.5. For instance, the rotation matrix \(R(i\mu )\) with imaginary angle \(\varphi = i\mu \),

$$\begin{aligned} H=\left( \begin{array}{cc} \cosh \mu &{} -i\sinh \mu \\ i\sinh \mu &{} \cosh \mu \end{array}\right) , \end{aligned}$$
(3.108)

is Hermitian. Since \(|R(\varphi )|=1\) for all \(\varphi \), we have \(|H|=1\) by analytic continuation, which also follows by inspection,

$$\begin{aligned} |H|=\cosh ^2\mu - \sinh ^2\mu =1. \end{aligned}$$
(3.109)

The eigenvalue-eigenvectors obtain by analytic continuation of (3.97), i.e.,

$$\begin{aligned} \left\{ e^{-\mu },\left( \begin{array}{c} 1 \\ -i \end{array} \right) \right\} , \left\{ e^{\mu },\left( \begin{array}{c} 1 \\ i \end{array} \right) \right\} . \end{aligned}$$
(3.110)

According to (3.105), the scalar product between the two eigenvectors satisfies

$$\begin{aligned} \left( \begin{array}{c} 1 \\ -i \end{array} \right) ^\dagger \left( \begin{array}{c} 1 \\ i\end{array} \right) = \left( \begin{array}{cc} 1&i \end{array} \right) \left( \begin{array}{c} 1 \\ i\end{array} \right) = 1 + i^2 = 0. \end{aligned}$$
(3.111)

This result of Example 3.5 is expected, since (3.117)–(3.119) continue to hold upon replacing T by \(\dagger \), i.e.,

$$\begin{aligned} \lambda _1= \lambda _2~\text{ or }~\mathbf{a}_1^\dagger \mathbf{a}_2=0. \end{aligned}$$
(3.112)

For a Hermitian matrix, the eigenvectors of distinct eigenvalues are mutually orthogonal, where orthogonality is defined according to the inner product (3.105).
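These properties are visible numerically for the matrix (3.108); a sketch using `numpy.linalg.eigvalsh`, which assumes a Hermitian input:

```python
import numpy as np

mu = 0.8
H = np.array([[np.cosh(mu), -1j * np.sinh(mu)],
              [1j * np.sinh(mu), np.cosh(mu)]])

# H is Hermitian and its eigenvalues e^{-mu}, e^{mu} are real, cf. (3.110).
assert np.allclose(H, H.conj().T)
lam = np.linalg.eigvalsh(H)        # returned in ascending order
assert np.allclose(lam, [np.exp(-mu), np.exp(mu)])

# The eigenvectors (1, -i) and (1, i) are orthogonal under (3.105).
a1 = np.array([1.0, -1j])
a2 = np.array([1.0, 1j])
assert np.isclose(a1.conj() @ a2, 0.0)
assert np.allclose(H @ a1, np.exp(-mu) * a1)
```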

Very similar properties of the eigenvalue problem (3.89) appear in the real-symmetric matrix \(\Lambda (\mu )\) of (3.80). Again, we will find that the eigenvalues are real and distinct, whose accompanying eigenvectors are mutually orthogonal. These properties hold true for all real-symmetric matrices, as shown by the following.

Consider an eigenvalue-eigenvector pair \((\lambda ,\mathbf{a})\) to a Hermitian matrix A. Then

$$\begin{aligned} \mathbf{a}^\dagger A\mathbf{a} = \lambda \mathbf{a}^\dagger \mathbf{a}. \end{aligned}$$
(3.113)

Here, \(\mathbf{a}^\dagger \mathbf{a}\) is real, obtained from the summation of the squared norms of the entries of \(\mathbf{a}\). For (3.94), for example, we have

$$\begin{aligned} \mathbf{a}^\dagger \mathbf{a} = \overline{\alpha }_1\alpha _1 +\overline{\alpha }_2\alpha _2\ge 0. \end{aligned}$$
(3.114)

The transpose of the left hand side of (3.113) satisfies

$$\begin{aligned} \lambda \mathbf{a}^\dagger \mathbf{a} = \mathbf{a}^\dagger A\mathbf{a} = \left( \mathbf{a}^\dagger A\mathbf{a}\right) ^T = \mathbf{a}^T A^T \bar{\mathbf{a}} = \mathbf{a}^T \overline{A^\dagger \mathbf{a}}= \mathbf{a}^T \overline{A\mathbf{a}} =\overline{\lambda } \mathbf{a}^T\overline{\mathbf{a}}, \end{aligned}$$
(3.115)

and hence \(\lambda \mathbf{a}^\dagger \mathbf{a} = \overline{\lambda }~\overline{\mathbf{a}^\dagger \mathbf a} = \overline{\lambda }{\mathbf{a}^\dagger \mathbf a}\). It follows that the eigenvalues of a Hermitian matrix are real:

$$\begin{aligned} \bar{\lambda } = \lambda , \end{aligned}$$
(3.116)

since \(\mathbf{a}^T\overline{\mathbf{a}} \equiv \mathbf{a}^\dagger \mathbf{a}\).

Following similar arguments, consider

$$\begin{aligned} A\mathbf{a}_2 = \lambda _2 \mathbf{a}_2:~\mathbf{a}^\dagger _1 A\mathbf{a}_2 = \lambda _2\mathbf{a}^\dagger _1\mathbf{a}_2. \end{aligned}$$
(3.117)

For a Hermitian A, we have

$$\begin{aligned} \left( \mathbf{a}^\dagger _1 A\mathbf{a}_2\right) ^\dagger = \mathbf{a}^\dagger _2 A^\dagger \mathbf{a}_1 = \mathbf{a}^\dagger _2 A\mathbf{a}_1 = \lambda _1 \mathbf{a}^\dagger _2\mathbf{a}_1. \end{aligned}$$
(3.118)

By (3.117, 3.118) and the reality (3.116) of the eigenvalues, taking the Hermitian conjugate of (3.117) gives \(\lambda _2 \overline{\mathbf{a}^\dagger _1\mathbf{a}_2} = \lambda _1 \mathbf{a}^\dagger _2 \mathbf{a}_1\). Since \(\overline{\mathbf{a}^\dagger _1\mathbf{a}_2}=\mathbf{a}^\dagger _2\mathbf{a}_1\), it follows that

$$\begin{aligned} \lambda _1= \lambda _2~\text{ or }~\mathbf{a}_1^\dagger \mathbf{a}_2=0. \end{aligned}$$
(3.119)

For a Hermitian matrix, the eigenvectors of distinct eigenvalues are mutually orthogonal.

Let us now turn to the example matrix \(\Lambda (\mu )\) in (3.80). Its eigenvalues are defined by (3.91) with \(A=\Lambda \), that is,

$$\begin{aligned} 0=\left| \Lambda - \lambda I \right| = (\cosh \mu - \lambda )^2 - \sinh ^2\mu , \end{aligned}$$
(3.120)

whereby

$$\begin{aligned} \lambda _\pm = \cosh \mu \pm \sinh \mu = e^{\pm \mu }. \end{aligned}$$
(3.121)

Similar to (3.92), we note

$$\begin{aligned} \lambda _1\lambda _2 = | \Lambda | = 1. \end{aligned}$$
(3.122)
Fig. 3.7

The matrix \(\Lambda \) in (3.80) is real-symmetric. With two distinct eigenvalues, its eigenvectors are orthogonal. As shown, a second eigenvector \(\mathbf{a}_{2}=\mathbf{i}_x-\mathbf{i}_y\) hereby follows immediately from orthogonality to the first \(\mathbf{a}_{1}=\mathbf{i}_x+\mathbf{i}_y\)

The equations (3.95) for the eigenvector components \((\alpha _1,\alpha _2)\), with the entries of \(\Lambda \) in place of those of R, are again a linearly dependent system when \(\lambda \) assumes one of the eigenvalues (3.121). Considering the first of these with \(\lambda =e^\mu \),

$$\begin{aligned} \alpha _1 (\cosh \mu - e^\mu ) + \alpha _2 \sinh \mu = 0: ~ \alpha _1 = \alpha _2, \end{aligned}$$
(3.123)

we obtain the eigenvalue-eigenvector pair

$$\begin{aligned} \left\{ e^{\mu },\left( \begin{array}{c} 1 \\ 1\end{array} \right) \right\} . \end{aligned}$$
(3.124)

According to (3.119), the eigenvector associated with \(\lambda =e^{-\mu }\) is orthogonal to that of (3.124). Since we are working in two dimensions, the second eigenvalue-eigenvector pair is therefore

$$\begin{aligned} \left\{ e^{-\mu },\left( \begin{array}{c} 1 \\ -1\end{array} \right) \right\} \end{aligned}$$
(3.125)

as illustrated in Fig. 3.7. The same obtains by solving (3.123) with \(e^{\mu }\) replaced by \(e^{-\mu }\).
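The eigenpairs (3.124) and (3.125) are quickly confirmed numerically; a minimal check (variable names ours):

```python
import numpy as np

mu = 0.5
Lam = np.array([[np.cosh(mu), np.sinh(mu)],
                [np.sinh(mu), np.cosh(mu)]])

# Eigenpairs {e^mu, (1,1)} and {e^{-mu}, (1,-1)}, cf. (3.124) and (3.125).
a1 = np.array([1.0, 1.0])
a2 = np.array([1.0, -1.0])
assert np.allclose(Lam @ a1, np.exp(mu) * a1)
assert np.allclose(Lam @ a2, np.exp(-mu) * a2)
assert np.isclose(a1 @ a2, 0.0)  # orthogonality, cf. (3.119)
```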

7 Unitary Matrices and Invariants

The reflections and rotations shown in Fig. 3.6 preserve norm and angles. If \(\mathbf{a}\) and \(\mathbf{b}\) are two real vectors and \(\mathbf{a}^\prime \) and \(\mathbf{b}^\prime \) are their images, e.g.,

$$\begin{aligned} \mathbf{a}^\prime = R(\varphi ) \mathbf{a},~~\mathbf{b}^\prime = R(\varphi ) \mathbf{b} \end{aligned}$$
(3.126)

then the inner product

$$\begin{aligned} \rho =\mathbf{a}^T \mathbf{b} \end{aligned}$$
(3.127)

is preserved, since

$$\begin{aligned} (\mathbf{a}^\prime )^T \mathbf{b}^\prime = \left( R(\varphi ) \mathbf{a} \right) ^T R(\varphi ) \mathbf{b} = \mathbf{a}^T R(\varphi )^T R(\varphi ) \mathbf{b} = \mathbf{a}^T\mathbf{b} \end{aligned}$$
(3.128)

by the property of unitarity

$$\begin{aligned} R(\varphi )^T R(\varphi ) = R(-\varphi ) R(\varphi ) = I. \end{aligned}$$
(3.129)

In particular, \(|\mathbf{a}^\prime |^2=(\mathbf{a}^\prime )^T\mathbf{a}^\prime =|\mathbf{a}|^2\) and, likewise, \(|\mathbf{b}^\prime |^2=|\mathbf{b}|^2\), showing that their norms are preserved. If \(\theta \) and \(\theta ^\prime \) refer to the angle between \((\mathbf{a},\mathbf{b})\) and, respectively, \((\mathbf{a}^\prime , \mathbf{b}^\prime )\), then

$$\begin{aligned} |\mathbf{a}| | \mathbf{b}| \cos \theta ^\prime = |\mathbf{a}^\prime | | \mathbf{b}^\prime | \cos \theta ^\prime = (\mathbf{a}^\prime )^T\mathbf{b}^\prime =\mathbf{a}^T\mathbf{b}= |\mathbf{a}| | \mathbf{b}| \cos \theta , \end{aligned}$$
(3.130)

which shows that \(\cos \theta ^\prime = \cos \theta \). Since the norms and angles (between two vectors) are invariant under rotations, we say that \(R(\varphi )\) is unitary, defined by the property (3.129).

Generalized to complex valued matrices, we say that A is unitary if

$$\begin{aligned} A^\dagger A = I, \end{aligned}$$
(3.131)

by which A is norm and angle preserving following (3.126)–(3.130) with \(\dagger \) replacing T. In a unitary matrix, therefore, the columns and rows form orthonormal sets. This is evident by inspection in the rotation matrix \(R(\varphi )\): its rows

$$\begin{aligned} \mathbf{r}_{1}=(\cos \varphi \,\, -\sin \varphi ),~~ \mathbf{r}_{2}=(\sin \varphi \,\, \cos \varphi ) \end{aligned}$$
(3.132)

and column vectors

$$\begin{aligned} \mathbf{c}_{1}=\left( \begin{array}{r}\cos \varphi \\ \sin \varphi \end{array}\right) ,\mathbf{c}_{2}=\left( \begin{array}{r}-\sin \varphi \\ \cos \varphi \end{array}\right) \end{aligned}$$
(3.133)

satisfy

$$\begin{aligned} \mathbf{r}_i\mathbf{r}_j^T = \delta _{ij},~~\mathbf{c}_i^T\mathbf{c}_j=\delta _{ij}, \end{aligned}$$
(3.134)

where \(\delta _{ij}\) denotes the Kronecker delta symbol (\(\delta _{ij}=1\) \((i=j)\), \(\delta _{ij}=0\) (\(i\ne j\))).
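The orthonormality (3.134) of rows and columns is equivalent to \(R^TR=RR^T=I\); a two-line numerical check:

```python
import numpy as np

phi = 1.1
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

# Unitarity R^T R = I means the rows and columns are orthonormal, cf. (3.134).
assert np.allclose(R.T @ R, np.eye(2))
assert np.allclose(R @ R.T, np.eye(2))
```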

The eigenvalues of a unitary matrix (3.131) are on the unit circle, as follows from

$$\begin{aligned} \mathbf{a}^\dagger \mathbf{a} = \mathbf{a}^\dagger A^\dagger A \mathbf{a} = (\lambda \mathbf{a})^\dagger (\lambda \mathbf{a}) = |\lambda |^2 \mathbf{a}^\dagger \mathbf{a}, \end{aligned}$$
(3.135)

where \(\mathbf{a}\) denotes an eigenvector of the eigenvalue \(\lambda \).

Fig. 3.8

Shown are the eigenvalues \(\lambda _\pm = e^{\pm i\varphi }\) of the rotation matrix \(R(\varphi )\) on the unit circle \(S^1\). Since R has a complete orthonormal set of eigenvectors, it is unitary. Shown are also \(\lambda _\pm = e^{\pm \mu }\) of \(\Lambda \). Away from \(S^1\), \(\Lambda \) is not unitary

The \(n\times n\) unitary matrices form the group U(n): (i) \(C=AB\) is in U(n) for any \(A,B\in U(n)\), (ii) every \(A\in U(n)\) has an inverse \(A^{-1}\in U(n)\) and (hence) (iii) U(n) contains the identity matrix I. (Specifically, \(AI=IA=A\).) For any two matrices A and B, we have

$$\begin{aligned} \text{ det }\, AB = \text{ det }\, A \,\text{ det }\, B = \text{ det }\, BA, \end{aligned}$$
(3.136)

that we state here without proof. (It may be seen from the fact that the determinant equals the product of eigenvalues.) According to (3.131), unitary matrices hereby satisfy

$$\begin{aligned} \left| \text{ det } A\right| = 1. \end{aligned}$$
(3.137)

Elements of U(n) have a complete set of orthonormal eigenvectors with eigenvalues on the unit circle (Fig. 3.8). The special unitary group \(SU(n)\,\subset \, U(n)\) consists of the elements with unit determinant,

$$\begin{aligned} \text{ det } A = 1, \end{aligned}$$
(3.138)

exemplified by the rotation matrices \(R(\varphi )\) in (3.129).

In contrast, Hermitian matrices have a complete set of orthonormal eigenvectors with eigenvalues on the real axis. A matrix can be both unitary and Hermitian only if its eigenvalues are \(\pm 1\). Examples are \(n\times n\) Householder matrices representing reflections across the plane normal to \(\mathbf{u}\) in \(\mathbb R^n\),

$$\begin{aligned} H=I - 2\mathbf{u}{} \mathbf{u}^\dagger , ~\mathbf{u}^\dagger \mathbf{u}=1. \end{aligned}$$
(3.139)
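A Householder matrix illustrates the coexistence of both properties; a sketch for a particular unit vector \(\mathbf{u}\) of our choosing:

```python
import numpy as np

# A Householder reflection H = I - 2 u u^dagger for a unit vector u, cf. (3.139).
u = np.array([1.0, 2.0, 2.0]) / 3.0            # |u| = 1
H = np.eye(3) - 2 * np.outer(u, u.conj())

# H is both Hermitian and unitary, so its eigenvalues are +/-1.
assert np.allclose(H, H.conj().T)
assert np.allclose(H.conj().T @ H, np.eye(3))
lam = np.sort(np.linalg.eigvalsh(H))
assert np.allclose(lam, [-1.0, 1.0, 1.0])      # -1 along u, +1 in its plane
```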

8 Hermitian Structure of Minkowski Spacetime

A Hermitian \(n\times n\) matrix A on \(\mathbb {C}^n\) introduces a metric structure through an inner product weighted by its real eigenvalues \(\lambda _i\). In terms of the components \(a_i\) and \(b_i\) of \(\mathbf{a}\) and \(\mathbf{b}\) in the orthonormal eigenbasis of A,

$$\begin{aligned} \left( \mathbf{a},\mathbf{b}\right) = \mathbf{a}^\dagger A\mathbf{b} = \sum _{i=1}^n \lambda _i \bar{a}_i b_i. \end{aligned}$$
(3.140)

If all eigenvalues are positive, this metric structure introduces a norm equivalent to the Euclidean norm on \(\mathbf{R}^{2n}\),

$$\begin{aligned} \left| a \right| _* = \sqrt{\left( \mathbf{a},\mathbf{a}\right) }= \sqrt{ \sum _{i=1}^n \lambda _i \left| a_i\right| ^2}. \end{aligned}$$
(3.141)

The Lorentz metric of Sect. 1.5 is an example of a real-symmetric matrix on \(\mathbb {R}^4\) with signature \((1,-1,-1,-1)\), referring to one positive and three negative eigenvalues. The metric structure it introduces follows (3.140) with A given by \(\eta _{ab}\), that is referred to as hyperbolic rather than Euclidean. We next strengthen this association to Hermitian matrices with some interesting consequences.

By dimension, we are at liberty to introduce complex combinations of the real 3+1 space-time components of a vector in terms of two complex-valued component vectors. Embedding the latter into a \(2\times 2\) Hermitian matrix, (a) the Lorentz metric obtains by the determinant of the matrix and (b) Lorentz transformations correspond to unitary transformations by unimodular matrices from SL(2,\(\mathbb {C}\)). Remarkably, the unimodular matrices giving a unitary transformation are effectively square roots of Lorentz transformations of four-vectors. For rotations on the unit sphere,Footnote 8 it gives a double cover of the rotations on the unit sphere \(S^2\).Footnote 9

The causal structure of Minkowski spacetime, defined by the Lorentz metric, is given geometrically by Lorentz invariant light cones. The generators of light cones are light rays. Light rays are integral curves of null-vectors with length zero. This refers to the fact that the change in total phase along a light ray is zero by definition—light rays define the propagation of wave fronts carrying constant total phase of electromagnetic radiation. They carry information on direction, but not distance. Projection of light rays onto the celestial sphere defines a one-to-one map of directions onto \(S^2\). Light rays hereby have two degrees of freedom,Footnote 10 and are either future- or past-oriented with opposite signs of their angular velocity in the propagation of an electromagnetic wave. Since the dimension of Minkowski space is four, this suggests a formulation in two null-vectors.

Expressed in terms of complex variables, a \(2\times 2\) formulation is realized by spinors of \(\left( \epsilon _{AB},\mathbb {C}^2\right) \), where \(\epsilon _{AB}\) refers to the metric spinor as follows.

Given a four-vector \(k^b=(k^t,k^x,k^y,k^z)\), consider the Hermitian matrix

$$\begin{aligned} K = \left( \begin{array}{rr} k^t+k^z &{} k^x-ik^y \\ k^x+ik^y &{} k^t-k^z \end{array}\right) = k^t \sigma _t+ k^x\sigma _x+k^y\sigma _y+k^z\sigma _z, \end{aligned}$$
(3.142)

expanded in terms of the Hermitian Pauli spin matricesFootnote 11

$$\begin{aligned} \sigma _t = I,~ \sigma _x = \left( \begin{array}{rr} 0 &{} 1 \\ 1 &{} 0 \end{array}\right) ,~ \sigma _y = \left( \begin{array}{rr} 0 &{} -i \\ i &{} 0 \end{array}\right) ,~ \sigma _z = \left( \begin{array}{rr} 1 &{} 0 \\ 0 &{} -1 \end{array}\right) \end{aligned}$$
(3.143)

with eigenvalues \(\lambda = 1\) (twice) for \(\sigma _t\) and \(\lambda =\pm 1\) for each of \(\sigma _x\), \(\sigma _y\) and \(\sigma _z\). The Pauli spin matrices embed the basis vectors of Minkowski space,

$$\begin{aligned} \begin{array}{l} \sigma _t = K\left\{ \left( \begin{array}{c} 1 \\ 0 \\ 0 \\ 0 \end{array}\right) \right\} ,~~ \sigma _x = K\left\{ \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ 0 \end{array}\right) \right\} ,\\ \\ \sigma _y = K\left\{ \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array}\right) \right\} ,~~ \sigma _z = K\left\{ \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right) \right\} . \end{array} \end{aligned}$$
(3.144)

Notice that the \(\sigma _i\) \((i=x,y,z)\) are trace-free. From the determinant of K,

$$\begin{aligned} \text{ det }\, K = \left( k^t\right) ^2 -\left( k^x\right) ^2-\left( k^y\right) ^2 -\left( k^z\right) ^2, \end{aligned}$$
(3.145)

the length of \(k^b\) in Minkowski space satisfies

$$\begin{aligned} s^2=\text{ det }\, K, \end{aligned}$$
(3.146)

incorporating the line-element \(s^2 = \eta _{ab}k^ak^b\) with Minkowski metric

$$\begin{aligned} \eta _{ab} = \left( \begin{array}{cccc} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} -1 &{} 0 &{} 0 \\ 0 &{} 0 &{} -1 &{} 0 \\ 0 &{} 0 &{} 0 &{} -1 \end{array}\right) . \end{aligned}$$
(3.147)

Here, we use the Einstein summation convention of summing over all index values \(a=t,x,y,z\) in combinations of covariant and contravariant indices. In Exercise 1.11, we noticed that \(\eta _{ab}\) reduced to 1+1 is invariant under Lorentz boosts. It is not difficult to ascertain that \(\eta _{ab}\) is invariant under general Lorentz transformations including rotations. As such, \(\eta _{ab}\) is a Lorentz invariant tensor.
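The correspondence between \(\text{det}\,K\) and the line-element can be verified for an arbitrary four-vector; a sketch of the embedding (3.142) (the function name `embed` is ours):

```python
import numpy as np

sigma_t = np.eye(2)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]])
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

def embed(k):
    # Hermitian embedding (3.142) of a four-vector k = (k^t, k^x, k^y, k^z).
    kt, kx, ky, kz = k
    return kt * sigma_t + kx * sigma_x + ky * sigma_y + kz * sigma_z

k = np.array([2.0, 0.3, -0.4, 1.0])
K = embed(k)
eta = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric (3.147)

# det K reproduces the Minkowski line-element s^2 = eta_ab k^a k^b, cf. (3.146).
assert np.allclose(K, K.conj().T)
assert np.isclose(np.linalg.det(K).real, k @ eta @ k)
```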

Consider an element \(L\in \) SL(2,\(\mathbb {C}\)), mentioned above. These unimodular elements have \(8-2=6\) degrees of freedom. Then

$$\begin{aligned} K\rightarrow L^\dagger KL \end{aligned}$$
(3.148)

preserves (3.146), since

$$\begin{aligned} \text{ det }\, L^\dagger KL = \text{ det }\,K. \end{aligned}$$
(3.149)

Notice that (3.149) holds true also for \(L=-I\), showing that the sign of L is not determined by a given Lorentz transformation of \(k^b\). Even so, a given L from SL(2,\(\mathbb {C}\)) in (3.148) defines a Lorentz transformation of \(k^b\).

Example 3.6. Consider a Lorentz boost with rapidity \(\mu \) of \(k^b=(1,0,0,0)^T\) to \((\cosh \mu ,0,0,-\sinh \mu )^T\) along the z-axis,

$$\begin{aligned} \left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} 1 \end{array}\right) \rightarrow \left( \begin{array}{cc} \cosh \mu -\sinh \mu &{} 0 \\ 0 &{} \cosh \mu +\sinh \mu \end{array}\right) = \left( \begin{array}{cc} e^{-\mu } &{} 0 \\ 0 &{} e^{\mu } \end{array}\right) . \end{aligned}$$
(3.150)

It obtains by a boost

$$\begin{aligned} L(\mu ) = \left( \begin{array}{rr} e^{-\frac{1}{2}\mu } &{} 0 \\ 0 &{} e^{\frac{1}{2}\mu } \end{array}\right) , \end{aligned}$$
(3.151)

whereas a rotation of \(k^b=(0,1,0,0)\) to \((0,\cos \theta ,\sin \theta ,0)\) about the z-axis,

$$\begin{aligned} \left( \begin{array}{cc} 0 &{} 1 \\ 1 &{} 0 \end{array}\right) \rightarrow \left( \begin{array}{cc} 0 &{} \cos \theta -i\sin \theta \\ \cos \theta +i\sin \theta &{} 0 \end{array}\right) = \left( \begin{array}{cc} 0 &{} e^{-i\theta } \\ e^{i\theta } &{} 0\end{array}\right) , \end{aligned}$$
(3.152)

obtains by a rotation

$$\begin{aligned} L(\theta ) = \frac{1}{\sqrt{2}} \left( \begin{array}{rr} e^{i\frac{1}{2}\theta } &{} ie^{-i\frac{1}{2}\theta } \\ ie^{i\frac{1}{2}\theta } &{} e^{-i\frac{1}{2}\theta } \end{array}\right) . \end{aligned}$$
(3.153)
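Both parts of Example 3.6 can be verified numerically, assuming the boost element is taken with \(\det L = 1\) (both diagonal signs positive); a NumPy sketch:

```python
import numpy as np

mu, theta = 0.6, 0.9

# Boost element of SL(2,C): diag(e^{-mu/2}, e^{mu/2}), taken with det L = 1.
L_boost = np.diag([np.exp(-mu / 2), np.exp(mu / 2)])
K = np.eye(2)                      # embedding (3.142) of k^b = (1,0,0,0)
K_b = L_boost.conj().T @ K @ L_boost
assert np.allclose(K_b, np.diag([np.exp(-mu), np.exp(mu)]))

# Rotation element (3.153): unitary with unit determinant.
L_rot = np.array([[np.exp(1j * theta / 2), 1j * np.exp(-1j * theta / 2)],
                  [1j * np.exp(1j * theta / 2), np.exp(-1j * theta / 2)]]) / np.sqrt(2)
assert np.allclose(L_rot.conj().T @ L_rot, np.eye(2))
assert np.isclose(np.linalg.det(L_rot), 1.0)

# Acting on the embedding sigma_x of k^b = (0,1,0,0) rotates k about the z-axis.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
K_r = L_rot.conj().T @ sigma_x @ L_rot
assert np.allclose(K_r, [[0, np.exp(-1j * theta)],
                         [np.exp(1j * theta), 0]])
```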

Viewed by continuation starting from the identity matrix, (3.153) goes to the heart of spinors to be introduced below: a rotation over \(2\pi \) in physical space gives rise to a change in sign in \(L(\theta )\). A continuing rotation over \(4\pi \) restores the original sign. The \(L(\theta )\) in (3.153) are elements of SU(2), since \(L^\dagger L=I\). Accordingly, SU(2) is a two-fold cover of SO(3).

Light cones are described by null-rays \(k^b\), satisfying

$$\begin{aligned} s^2 = \text{ det } K = 0. \end{aligned}$$
(3.154)

Their embedding (3.142) is therefore in rank-one matricesFootnote 12 of the form

$$\begin{aligned} Z =\left( \begin{array}{rr} \xi \bar{\xi } &{} \eta \bar{\xi } \\ \bar{\eta }\xi &{} \eta \bar{\eta } \end{array}\right) = \left( \begin{array}{rr} \bar{\xi } \\ \bar{\eta } \end{array}\right) \left( \begin{array}{rr} \xi&\eta \end{array}\right) , \end{aligned}$$
(3.155)

whose determinant is identically equal to zero. Here, the right hand side expresses the spinor and its transpose, indicated by a primed index

$$\begin{aligned} \kappa ^A = \left( \begin{array}{rr} \xi&\eta \end{array}\right) ,~~ \kappa ^{A^\prime } = \left( \begin{array}{rr} {\xi } \\ {\eta } \end{array}\right) ~~\left( \xi , \eta \in \mathbb {C}\right) \end{aligned}$$
(3.156)

following the convention of using unprimed and primed indices for a row and, respectively, column vector notation. Accordingly, we write

$$\begin{aligned} Z^{AA'}=\kappa ^A\bar{\kappa }^{A'}, \end{aligned}$$
(3.157)

where \(\bar{\kappa }^{A'}\) is the Hermitian transpose of \(\kappa ^A\).
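The rank-one structure (3.155)–(3.157) is easily demonstrated; a sketch for an arbitrary spinor \((\xi ,\eta )\) of our choosing:

```python
import numpy as np

# Null-vector embedding (3.155): Z is the rank-one outer product of a spinor.
xi, eta = 1.2 + 0.5j, -0.3 + 2.0j
kappa = np.array([xi, eta])                # row spinor kappa^A
Z = np.outer(kappa.conj(), kappa)          # conjugated column factor, cf. (3.155)

assert np.allclose(Z, Z.conj().T)          # Hermitian
assert np.isclose(np.linalg.det(Z), 0.0)   # null: s^2 = det Z = 0, cf. (3.154)
assert np.linalg.matrix_rank(Z) == 1
```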

Let \(\kappa \) denote the row vector \(\kappa ^A\) in (3.156). Then (3.148) implies a corresponding Lorentz transformation of a spinor \(\kappa \),

$$\begin{aligned} Z\rightarrow \left( \kappa L\right) ^\dagger \left( \kappa L\right) :~~\kappa \rightarrow \kappa L. \end{aligned}$$
(3.158)

Rotation over \(2\pi \) in real space now has a corresponding sign change in the spinor.

Now write the determinant of \(K=K^{AA'}\) in (3.142) as

$$\begin{aligned} \text{ det }\,K = K^{11}K^{22}-K^{21}K^{12} \equiv \epsilon _{AB}\epsilon _{A'B'}K^{AA'}K^{BB'} \end{aligned}$$
(3.159)

in terms of the anti-symmetric metric spinor \(\epsilon _{AB}=-\epsilon _{BA}\), \(\epsilon _{A'B'}=-\epsilon _{B'A'}\) with \(\epsilon _{01}=\epsilon _{0'1'}=1\), i.e.,

$$\begin{aligned} \epsilon _{AB} = \left( \begin{array}{cc} 0 &{} 1 \\ -1 &{} 0 \end{array}\right) ,~~\epsilon _{A'B'} = \left( \begin{array}{cc} 0 &{} 1 \\ -1 &{} 0 \end{array}\right) . \end{aligned}$$
(3.160)

Then

$$\begin{aligned} \epsilon _{AB}\epsilon _{A'B'}K^{AA'}K^{BB'} = g_{ab}k^ak^b:~~\epsilon _{AB}\epsilon _{A'B'} = g_{ab} \end{aligned}$$
(3.161)

with implicit reference to the basis elements (3.170) to be discussed below. In (3.159), note that the incomplete contraction \(\epsilon _{A'B'}K^{AA'}K^{BB'}\) is an antisymmetric tensor in our two-dimensional spinor space \(\mathbb {C}^2\). Since the antisymmetric \(2\times 2\) matrices are spanned by \(\epsilon ^{AB}\),

$$\begin{aligned} \epsilon _{A'B'}K^{AA'}K^{BB'}=\frac{1}{2}\epsilon ^{AB}\text{ det }\,K, \end{aligned}$$
(3.162)

taking into account (3.159) and \(\epsilon _{AB}\epsilon ^{AB}=2\).

The metric spinor allows lowering and raising indices

$$\begin{aligned} \kappa _A = \kappa ^B\epsilon _{BA},~~\kappa ^B = \epsilon ^{BA}\kappa _A. \end{aligned}$$
(3.163)

Lowering and raising is by multiplication from the left and, respectively, right. The same rules apply to \(A'\). Since the metric spinor is skew symmetric, we automatically have that spinors are null,

$$\begin{aligned} \kappa ^A\kappa _A = \kappa ^A\kappa ^B \epsilon _{BA}= \kappa ^2\kappa ^1-\kappa ^1\kappa ^2=0. \end{aligned}$$
(3.164)

In practical terms, the spinor \(\kappa ^A\) is a square root of a null-vector \(k^b\) in \(Z^{AA'}\).

Consider two null-vectors \(k^b\) and \(l^b\) represented by spinors \(o^A\) and \(\iota ^A\). Then

$$\begin{aligned} k^cl_c=\left( o^A\bar{o}^{A'}\right) \left( \iota _A\bar{\iota }_{A'}\right) = \left( o^A\iota _A\right) \left( o^{A}{\iota }_{A}\right) ^\dagger = \left| o^A\iota _A\right| ^2\ge 0, \end{aligned}$$
(3.165)

whereby \(k^cl_c\ge 0\), i.e., \(k^b\) and \(l^b\) share the same direction in time, e.g., are future-oriented. Choosing two distinct null-vectors, we may insist

$$\begin{aligned} k^cl_c=1:~~o^A\iota _A=1. \end{aligned}$$
(3.166)

As members of \(\mathbb {C}^2\), choosing such pair as a basis gives

$$\begin{aligned} \epsilon _{AB} = o_A\iota _B - \iota _Ao_B=\left( \begin{array}{cc} 0 &{} 1 \\ -1 &{} 0 \end{array}\right) . \end{aligned}$$
(3.167)

To be explicit, consider

$$\begin{aligned} o^A =\left( \begin{array}{cc} 1&0\end{array}\right) ,~~\iota ^A =\left( \begin{array}{cc} 0&1\end{array}\right) . \end{aligned}$$
(3.168)

Then \(\iota _A=\epsilon _{AB}\iota ^B=\left( \begin{array}{cc} 1&0\end{array}\right) \), whereby (3.166) is satisfied, and

$$\begin{aligned} \begin{array}{lll} o^A \bar{o}^{A'}= \left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} 0 \end{array}\right) , \iota ^A\bar{\iota }^{A'} = \left( \begin{array}{cc} 0 &{} 0 \\ 0 &{} 1 \end{array}\right) ,\\ \\ o^A \bar{\iota }^{A'}= \left( \begin{array}{cc} 0 &{} 0 \\ 1 &{} 0 \end{array}\right) , \iota ^A\bar{o}^{A'} = \left( \begin{array}{cc} 0 &{} 1 \\ 0 &{} 0 \end{array}\right) . \end{array} \end{aligned}$$
(3.169)

The result identifies the Pauli spin-matrices with metric spin-tensors

$$\begin{aligned} \begin{array}{lll} \sigma _t^{AA'} = o^A \bar{o}^{A'} + \iota ^A\bar{\iota }^{A'},&{} \sigma _x^{AA'} = o^A \bar{\iota }^{A'} + \iota ^A\bar{o}^{A'},\\ \\ \sigma _y^{AA'} = i\left( o^A \bar{\iota }^{A'} - \iota ^A\bar{o}^{A'}\right) ,&{} \sigma _z^{AA'} = o^A \bar{o}^{A'} - \iota ^A\bar{\iota }^{A'}. \end{array} \end{aligned}$$
(3.170)

In the notation of linear algebra, note that \(\bar{o}^{A'} = (o^A)^\dagger \), etc. This recovers the Pauli matrices in (3.142) as a basis of four-vectors in 3+1 Minkowski space. Note that (3.170) also introduces an algebraic map of a complex second-rank spinor, e.g., \(\phi _{AA'}\), to four-vectors with possibly complex-valued components.
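For reference, (3.169) and (3.170) can be verified numerically. The sketch below (Python with NumPy; variable names are chosen for this illustration only) forms the dyads as outer products and assembles the matrices of (3.170). The overall sign of \(\sigma _y\) obtained this way depends on the spinor convention and may differ from other texts.

```python
# Numerical sketch of (3.169)-(3.170): Pauli matrices from the spinor
# dyads of the basis o^A = (1, 0), iota^A = (0, 1). Names are illustrative.
import numpy as np

o = np.array([1, 0], dtype=complex)
iota = np.array([0, 1], dtype=complex)

# Dyads such as o^A conj(o)^A', formed as outer products as in (3.169).
oo = np.outer(o, o.conj())
ii = np.outer(iota, iota.conj())
oi = np.outer(o, iota.conj())
io = np.outer(iota, o.conj())

sigma_t = oo + ii            # the identity
sigma_x = oi + io
sigma_y = 1j * (oi - io)     # equals the usual sigma_y up to a sign convention
sigma_z = oo - ii

assert np.allclose(sigma_t, np.eye(2))
assert np.allclose(sigma_x, [[0, 1], [1, 0]])
assert np.allclose(sigma_y, [[0, 1j], [-1j, 0]])
assert np.allclose(sigma_z, [[1, 0], [0, -1]])
```

Each matrix is Hermitian and squares to the identity, as expected of a basis of Hermitian \(2\times 2\) matrices.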

9 Eigenvectors of Hermitian Matrices

For an \(n\times n\) Hermitian matrix A, \(A^\dagger = A\), let \((\lambda ,\mathbf{a})\) denote one of its eigenvalue-eigenvector pairs. Such a pair always exists by virtue of the eigenvalue solutions to (3.91). Let

$$\begin{aligned} \hat{\mathbf{a}} = \frac{\mathbf{a}}{\sqrt{\mathbf{a}^\dagger \mathbf{a}}} \end{aligned}$$
(3.171)

denote the normalized eigenvector, satisfying \(\hat{\mathbf{a}}^\dagger \hat{\mathbf{a}}=1\). For instance, we have

$$\begin{aligned} \mathbf{a} = \left( \begin{array}{c} 1 \\ 1 \end{array}\right) \rightarrow \hat{\mathbf{a}} = \frac{1}{\sqrt{2}} \left( \begin{array}{c} 1 \\ 1 \end{array}\right) . \end{aligned}$$
(3.172)

Let \(\mathbf{u}\) be any vector. We may decompose it orthogonally as

$$\begin{aligned} \mathbf{u} = \mathbf{u}_{||}+\mathbf{u}_\perp . \end{aligned}$$
(3.173)

Here, \(\mathbf{u}_{||}\) and \(\mathbf{u}_\perp \) are parallel and orthogonal to \(\mathbf{a}\), obtained from the projection operator

$$\begin{aligned} P = I - \hat{\mathbf{a}}\hat{\mathbf{a}}^\dagger :~\mathbf{u}_\perp =P\mathbf{u},~\mathbf{u}_{||}=(I-P)\mathbf{u}. \end{aligned}$$
(3.174)

Geometrically, the image space of PA consists of all vectors orthogonal to \(\mathbf{u}_{||}\),

$$\begin{aligned} \text{ Im }\,PA = \left( \mathbf{u}_{||}\right) ^\perp . \end{aligned}$$
(3.175)

We also note that \(\mathbf{u}_\perp \) is in the plane with normal \(\mathbf{a}\). The expansion (3.173) hereby satisfies

$$\begin{aligned} \mathbf{u} = (I-P)\mathbf{u} + P\mathbf{u}, \end{aligned}$$
(3.176)

and hence

$$\begin{aligned} A\mathbf{u} = \lambda (I-P)\mathbf{u} + AP\mathbf{u}. \end{aligned}$$
(3.177)

Since \(\mathbf{u}_{||}=(I-P)\mathbf{u}\) is parallel to \(\mathbf{a}\), it is an eigenvector of A with eigenvalue \(\lambda \). Since \(\mathbf{u}\) in (3.177) is arbitrary, it follows that

$$\begin{aligned} A = \lambda (I-P) + AP. \end{aligned}$$
(3.178)

Since A is Hermitian with \(\lambda \) real, and \(P^\dagger =P\), we have

$$\begin{aligned} \lambda (I-P)+AP = A = A^\dagger = \lambda (I-P) + PA^\dagger = \lambda (I-P)+PA. \end{aligned}$$
(3.179)

We thus find that A and P commute, i.e.,

$$\begin{aligned} AP = PA. \end{aligned}$$
(3.180)

In particular, it follows that

$$\begin{aligned} \text{ Im }\, PA = \text{ Im }\, AP = \left( \mathbf{u}_{||}\right) ^\perp . \end{aligned}$$
(3.181)

Since I and A commute trivially, \((I-P)\) and A also commute: \([(I-P),A]=0\). The Hermitian matrix A hereby operates independently on the one-dimensional subspace of vectors along the eigenvector \(\mathbf{a}\) and on the subspace of vectors in the \(n-1\) dimensional hyperplane normal to \(\mathbf{a}\). Equation (3.180) also shows that PA is Hermitian:

$$\begin{aligned} (PA)^\dagger = A^\dagger P^\dagger = AP = PA. \end{aligned}$$
(3.182)

Therefore, we can repeat all the steps (3.171)–(3.180) for an eigenvector \(\mathbf{a}^\prime \) of \(A_1=AP\). Since \(AP\mathbf{a}^\prime = PA \mathbf{a}^\prime \), this eigenvector is in the image space of P, and hence it is orthogonal to \(\mathbf{a}\). By this orthogonality, the projector \(P^\prime \) associated with \(\mathbf{a}^\prime \) commutes with P. It follows that \(A_1P^\prime =APP^\prime \) is Hermitian. Continuing in this fashion, we ultimately arrive at n mutually orthogonal eigenvectors \(\mathbf{a}\), \(\mathbf{a}^\prime , \ldots , \mathbf{a}^{\prime \prime \cdots \prime }.\)
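The deflation argument above can be sketched numerically. The Python function below is illustrative only: it assumes the eigenvalues of A are distinct and nonzero, and it uses `numpy.linalg.eigh` merely to extract one eigenpair per step while the deflation \(A \rightarrow AP\) itself follows the text.

```python
# Sketch of the deflation A -> AP for a Hermitian matrix: project out
# each eigenvector with P = I - a a^dagger and continue on A P.
import numpy as np

def orthogonal_eigenvectors(A):
    """Mutually orthogonal unit eigenvectors of Hermitian A by deflation.
    Assumes distinct, nonzero eigenvalues (sketch, not production code)."""
    n = A.shape[0]
    B = np.array(A, dtype=complex)     # current deflated matrix A, AP, APP', ...
    vecs = []
    for _ in range(n):
        w, V = np.linalg.eigh(B)
        j = int(np.argmax(np.abs(w)))  # eigenpair surviving the deflation
        a = V[:, j] / np.linalg.norm(V[:, j])
        vecs.append(a)
        P = np.eye(n) - np.outer(a, a.conj())  # projector onto a^perp
        B = B @ P                              # next matrix A_1 = A P
    return np.column_stack(vecs)

A = np.array([[2.0, 1.0], [1.0, 0.0]])  # the matrix of Example 3.7 below
V = orthogonal_eigenvectors(A)
assert np.allclose(V.conj().T @ V, np.eye(2))  # columns are orthonormal
```

Because AP equals PA for the projector of an eigenvector, each deflated matrix remains Hermitian and its remaining eigenvectors stay orthogonal to those already found.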

Example 3.7. The \(2\times 2\) Hermitian matrix

$$\begin{aligned} A=\left( \begin{array}{ll} 2 &{} 1 \\ 1 &{} 0 \end{array}\right) \end{aligned}$$
(3.183)

has eigenvalue-eigenvector pairs \(\left\{ \left( \lambda _1, \mathbf{a}_1\right) ,~\left( \lambda _2,\mathbf{a}_2\right) \right\} \) with

$$\begin{aligned} \left\{ 1\pm \sqrt{2}, \left( \begin{array}{c} 1\pm \sqrt{2} \\ 1\end{array}\right) \right\} . \end{aligned}$$
(3.184)

We wish to view the operation of A on a vector \(\mathbf{u}\) as the sum of linearly independent operations associated with the directions \(\mathbf{a}_1\) and \(\mathbf{a}_2\). We first rewrite (3.184) in terms of the equivalent orthonormal pair

$$\begin{aligned} \hat{\mathbf{a}}_1 = \frac{\mathbf{a}_1}{\sqrt{\mathbf{a}^\dagger _1\mathbf{a}_1}}, ~\hat{\mathbf{a}}_2 = \frac{\mathbf{a}_2}{\sqrt{\mathbf{a}_2^\dagger \mathbf{a}_2}} \end{aligned}$$
(3.185)

following (3.171). The \(\left\{ \hat{\mathbf{a}}_1, \hat{\mathbf{a}}_2\right\} \) form an orthonormal basis (a complete set of orthonormal vectors) for vectors \(\mathbf{u}\) in our two-dimensional space. They satisfy the property

$$\begin{aligned} \hat{\mathbf{a}}_i^\dagger \hat{\mathbf{a}}_j=\delta _{ij} = \left\{ \begin{array}{cc} 1 &{} (i=j) \\ 0 &{} (i\ne j) \end{array} \right. \end{aligned}$$
(3.186)

Here, \(\delta _{ij}\) is the commonly used Kronecker delta symbol. For (3.183), we have

$$\begin{aligned} \hat{\mathbf{a}}_{1,2} = \frac{1}{\sqrt{2(2\pm \sqrt{2})}} \left( \begin{array}{c} 1\pm \sqrt{2} \\ 1\end{array}\right) . \end{aligned}$$
(3.187)

For an arbitrary vector, we can write

$$\begin{aligned} \mathbf{u} = \alpha \hat{\mathbf{a}}_1 + \beta \hat{\mathbf{a}}_2. \end{aligned}$$
(3.188)

Multiplication by \(\hat{\mathbf{a}}_{1,2}^\dagger \) from the left obtains

$$\begin{aligned} \hat{\mathbf{a}}_1^\dagger \mathbf{u} = \alpha \hat{\mathbf{a}}_1^\dagger \hat{\mathbf{a}}_1 + \beta \hat{\mathbf{a}}_1^\dagger \hat{\mathbf{a}}_2 = \alpha \end{aligned}$$
(3.189)
$$\begin{aligned} \hat{\mathbf{a}}_2^\dagger \mathbf{u} = \alpha \hat{\mathbf{a}}_2^\dagger \hat{\mathbf{a}}_1 + \beta \hat{\mathbf{a}}_2^\dagger \hat{\mathbf{a}}_2 = \beta . \end{aligned}$$
(3.190)

Substitution of (3.189, 3.190) into (3.188) gives the explicit expression

$$\begin{aligned} \mathbf{u} = \hat{\mathbf{a}}_1\hat{\mathbf{a}}^\dagger _1 \mathbf{u} + \hat{\mathbf{a}}_2\hat{\mathbf{a}}^\dagger _2 \mathbf{u}. \end{aligned}$$
(3.191)

This represents the orthogonal decomposition of \(\mathbf{u}\) with respect to the eigenvectors of A. Accordingly, \(A\mathbf{u}\) satisfies

$$\begin{aligned} A\mathbf{u} = \alpha A \hat{\mathbf{a}}_1 + \beta A \hat{\mathbf{a}}_2 = \alpha \lambda _1 \hat{\mathbf{a}}_1 + \beta \lambda _2 \hat{\mathbf{a}}_2 = \lambda _1\hat{\mathbf{a}}_1\hat{\mathbf{a}}^\dagger _1 \mathbf{u} + \lambda _2 \hat{\mathbf{a}}_2\hat{\mathbf{a}}^\dagger _2 \mathbf{u}. \end{aligned}$$
(3.192)

Since \(\mathbf{u}\) is arbitrary, we conclude

$$\begin{aligned} A = \lambda _1\hat{\mathbf{a}}_1\hat{\mathbf{a}}^\dagger _1 + \lambda _2 \hat{\mathbf{a}}_2\hat{\mathbf{a}}^\dagger _2. \end{aligned}$$
(3.193)

The same follows from \(A=AI\), \(I= \hat{\mathbf{a}}_1\hat{\mathbf{a}}^\dagger _1 + \hat{\mathbf{a}}_2\hat{\mathbf{a}}^\dagger _2.\)
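The decomposition (3.193) is readily checked numerically for Example 3.7, e.g., in Python with NumPy (a sketch; names are chosen for this illustration):

```python
# Check of (3.193): A equals the sum of its eigenvalues times the
# projectors onto its normalized eigenvectors, for A of (3.183).
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 0.0]])
s = np.sqrt(2.0)
pairs = [(1 + s, np.array([1 + s, 1.0])),   # (lambda_1, a_1) of (3.184)
         (1 - s, np.array([1 - s, 1.0]))]   # (lambda_2, a_2)

S = np.zeros((2, 2))
for lam, a in pairs:
    a_hat = a / np.linalg.norm(a)           # (3.171): normalize
    S += lam * np.outer(a_hat, a_hat)       # lambda_i a_i a_i^dagger
assert np.allclose(S, A)
```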

Example 3.8. For \(\Lambda \) in (3.80) we have, according to (3.124, 3.125), the normalized eigenvalue-eigenvector pairs

$$\begin{aligned} \left\{ e^{\pm \mu },\frac{1}{\sqrt{2}}\left( \begin{array}{c} 1 \\ \pm 1\end{array} \right) \right\} . \end{aligned}$$
(3.194)

Following (3.194), we consider

$$\begin{aligned} \begin{array}{l} \lambda _1\hat{\mathbf{a}}_1\hat{\mathbf{a}}^\dagger _1 = \frac{e^\mu }{\sqrt{2}}\left( \begin{array}{r} 1 \\ 1 \end{array} \right) \frac{1}{\sqrt{2}}\,(1~~1) =\frac{e^\mu }{{2}} \left( \begin{array}{rr} 1 &{} 1 \\ 1 &{} 1 \end{array} \right) ,\\ \\ \lambda _2\hat{\mathbf{a}}_2\hat{\mathbf{a}}^\dagger _2 = \frac{e^{-\mu }}{\sqrt{2}} \left( \begin{array}{r} 1 \\ -1 \end{array} \right) \frac{1}{\sqrt{2}}\,(1~~-1) = \frac{e^{-\mu }}{{2}} \left( \begin{array}{rr} 1 &{} -1 \\ -1 &{} 1 \end{array} \right) . \end{array} \end{aligned}$$
(3.195)

Adding these expressions gives

$$\begin{aligned} \frac{e^\mu }{{2}} \left( \begin{array}{rr} 1 &{} 1 \\ 1 &{} 1 \end{array} \right) + \frac{e^{-\mu }}{{2}} \left( \begin{array}{rr} 1 &{} -1 \\ -1 &{} 1 \end{array} \right) = \frac{1}{2} \left( \begin{array}{rr} e^{\mu }+e^{-\mu } &{} e^{\mu }-e^{-\mu } \\ e^{\mu }-e^{-\mu } &{} e^\mu +e^{-\mu } \end{array} \right) , \end{aligned}$$
(3.196)

i.e., we recover our definition of a Lorentz boost,

$$\begin{aligned} \Lambda = \left( \begin{array}{rr} \cosh \mu &{} \sinh \mu \\ \sinh \mu &{} \cosh \mu \end{array} \right) . \end{aligned}$$
(3.197)
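Example 3.8 can likewise be checked numerically. The sketch below rebuilds the boost from its eigenpairs \(\{e^{\pm \mu }, (1,\pm 1)/\sqrt{2}\}\) for an arbitrarily chosen rapidity:

```python
# Rebuilding the Lorentz boost Lambda(mu) from its eigenpairs,
# following Example 3.8; mu = 0.7 is an arbitrary choice.
import numpy as np

mu = 0.7
pairs = [(np.exp(mu), np.array([1.0, 1.0]) / np.sqrt(2)),
         (np.exp(-mu), np.array([1.0, -1.0]) / np.sqrt(2))]

L = sum(lam * np.outer(v, v) for lam, v in pairs)
boost = np.array([[np.cosh(mu), np.sinh(mu)],
                  [np.sinh(mu), np.cosh(mu)]])
assert np.allclose(L, boost)
```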

10 QR Factorization

In viewing an \(n\times m\) matrix A as a linear map from the vector space \(\mathbf{\mathbb C}^m\) to \(\mathbf{\mathbb C}^n\), we frequently encounter the question whether A maps \(\mathbf{\mathbb C}^m\) onto all of \(\mathbf{\mathbb C}^n\) or just onto a linear subspace of it. Similarly, A may map all nonzero vectors of \(\mathbf{\mathbb C}^m\) to nonzero vectors, or map some linear subspace of \(\mathbf{\mathbb C}^m\) to the origin \(\mathbf{0}\) in \(\mathbf{\mathbb C}^n\).

To streamline this discussion, we introduce the image of A, defined by the linear vector space

$$\begin{aligned} \text{ Im }\, A = \{ \mathbf{v}\,|\, \mathbf{v}=A\mathbf{u}\,, \mathbf{u}\in \mathbf{\mathbb C}^m \} \end{aligned}$$
(3.198)

and the kernel of A, also known as the null space of A, defined by the linear vector space

$$\begin{aligned} \text{ Ker } \, A = \{ \mathbf{u}\, |\, A\mathbf{u} = \mathbf{0}, \mathbf{u}\in \mathbf{\mathbb C}^m \}. \end{aligned}$$
(3.199)

The image space is supported by the column vectors \(\mathbf{a}_1\), \(\mathbf{a}_2\), \(\ldots ,\) \(\mathbf{a}_m\) of A, e.g., for a \(2\times 2\) matrix

$$\begin{aligned} A = \left( \mathbf{a}_1~\mathbf{a}_2\right) = \left( \begin{array}{cc} a_{11} &{} a_{12} \\ a_{21} &{} a_{22} \end{array}\right) ,~~ \mathbf{a}_1 = \left( \begin{array}{c} a_{11} \\ a_{21} \end{array}\right) ,~~ \mathbf{a}_2 = \left( \begin{array}{c} a_{12} \\ a_{22} \end{array}\right) . \end{aligned}$$
(3.200)

The image space forms out of linear combinations

$$\begin{aligned} A\mathbf{u} = u_1\mathbf{a}_1 + u_2 \mathbf{a}_2 + \cdots + u_m \mathbf{a}_m \end{aligned}$$
(3.201)

by choice of \(\mathbf{u}\) in \(\mathbb C^m\),

$$\begin{aligned} \mathbf{u} = \left( \begin{array}{c} u_1 \\ u_2 \\ \cdot \\ u_m\end{array}\right) . \end{aligned}$$
(3.202)

The row space is supported by the row vectors \(\mathbf{b}_1\), \(\mathbf{b}_2\), \(\cdots ,\) \(\mathbf{b}_n\) of A, e.g., for a \(2\times 2\) matrix

$$\begin{aligned} A = \left( \begin{array}{c} \mathbf{b}_1 \\ \mathbf{b}_2 \end{array}\right) ,~~ \mathbf{b}_1 = \left( a_{11}~a_{12}\right) ,~~ \mathbf{b}_2 = \left( a_{21}~a_{22}\right) \end{aligned}$$
(3.203)

that forms out of the linear combinations

$$\begin{aligned} \mathbf{r} = v_1\mathbf{b}_1 + v_2 \mathbf{b}_2 + \cdots + v_n \mathbf{b}_n \end{aligned}$$
(3.204)

by choice of \(\mathbf{v}\) in \(\mathbb C^n\),

$$\begin{aligned} \mathbf{v} = \left( \begin{array}{c} v_1 \\ v_2 \\ \cdot \\ v_n\end{array}\right) . \end{aligned}$$
(3.205)

Following (3.199), the kernel of A consists of the vectors that are orthogonal to all of its row vectors \(\mathbf{b}_i\) \((i=1,2,\ldots ,n)\), commonly written as

$$\begin{aligned} \text{ Ker }\,A = \left( \text{ Im }\,A^\dagger \right) ^\perp , \end{aligned}$$
(3.206)

where \(\perp \) refers to orthogonality with respect to the inner product \(\mathbf{a}\cdot \mathbf{b}=\mathbf{a}^\dagger \mathbf{b}\) for vectors in \(\mathbb C^m\).
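Relation (3.206) can be illustrated with a small singular example (the same matrix reappears as B in (3.210) below); a sketch in Python with NumPy:

```python
# Illustration of Ker A = (Im A^dagger)^perp for a singular 2x2 matrix:
# a null vector is orthogonal to every row.
import numpy as np

B = np.array([[1.0, 2.0], [2.0, 4.0]])
u = np.array([-2.0, 1.0])               # candidate null vector

assert np.allclose(B[0] @ u, 0)         # orthogonal to the first row
assert np.allclose(B[1] @ u, 0)         # orthogonal to the second row
assert np.allclose(B @ u, 0)            # hence u is in Ker B
```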

10.1 Examples of Image and Null Space

Let A be a nonsingular \(2\times 2\) matrix. We read off the columns vectors following (3.200), e.g.,

$$\begin{aligned} A=\left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 0 \end{array} \right) :~~ \mathbf{a}_1 = \left( \begin{array}{c} 1 \\ 2 \end{array}\right) ,~~ \mathbf{a}_2 = \left( \begin{array}{c} 2 \\ 0 \end{array}\right) . \end{aligned}$$
(3.207)

Hence, \(\text{ Im }\,A\) is defined by all vectors obtained from linear combinations of the \(\mathbf{a}_1\) and \(\mathbf{a}_2\) in (3.207). We say

$$\begin{aligned} \text{ Im }\,A=\text{ span } \{ \mathbf{a}_1, \mathbf{a}_2\} = \text{ span } \left\{ \left( \begin{array}{c} 1 \\ 2 \end{array} \right) , \left( \begin{array}{c} 2 \\ 0 \end{array} \right) \right\} . \end{aligned}$$
(3.208)

Evidently, we have

$$\begin{aligned} \text{ Im }\,A = \mathbb R^2 \end{aligned}$$
(3.209)

(or \(\mathbb {C}^2\)) since the \(\mathbf{a}_1\) and \(\mathbf{a}_2\) in (3.207) point in different directions, whereby they are linearly independent. This may also be inferred from the fact that \(\text{ det }\,A\ne 0.\)

Alternatively, consider the singular matrix

$$\begin{aligned} B=\left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 4 \end{array} \right) :~~ \mathbf{a}_1 = \left( \begin{array}{c} 1 \\ 2 \end{array}\right) ,~~ \mathbf{a}_2 = \left( \begin{array}{c} 2 \\ 4 \end{array}\right) . \end{aligned}$$
(3.210)

In this event, the second column satisfies

$$\begin{aligned} \mathbf{a}_2 = 2\mathbf{a}_1 \end{aligned}$$
(3.211)

and the two columns are linearly dependent, as follows also from the fact that

$$\begin{aligned} \text{ det }\,B=0. \end{aligned}$$
(3.212)

Consequently, the image space is the one dimensional subspace given by

$$\begin{aligned} \text{ Im }\,B=\text{ span } \{ \mathbf{a}_1, \mathbf{a}_2\} = \text{ span } \left\{ \left( \begin{array}{c} 1 \\ 2 \end{array} \right) \right\} . \end{aligned}$$
(3.213)

Proceeding with (3.207) above, we have, following (3.203), the row vectors

$$\begin{aligned} A=\left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 0 \end{array} \right) :~~ \mathbf{b}_1 = \left( \begin{array}{c} 1 \\ 2 \end{array}\right) ,~~ \mathbf{b}_2 = \left( \begin{array}{c} 2 \\ 0 \end{array}\right) . \end{aligned}$$
(3.214)

They happen to be the same as the column vectors since A is real-symmetric. \(\text{ Ker }\,A\) is defined by vectors that are orthogonal to both \(\mathbf{b}_1\) and \(\mathbf{b}_2\). Since \(\mathbf{b}_1\) and \(\mathbf{b}_2\) are linearly independent, we have

$$\begin{aligned} \text{ Ker }\,A=\mathbf{0}. \end{aligned}$$
(3.215)

For the alternative (3.210), we have the row vectors

$$\begin{aligned} B=\left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 4 \end{array} \right) :~~ \mathbf{b}_1 = \left( \begin{array}{c} 1 \\ 2 \end{array}\right) ,~~ \mathbf{b}_2 = \left( \begin{array}{c} 2 \\ 4 \end{array}\right) . \end{aligned}$$
(3.216)

In this event, the second row satisfies \(\mathbf{b}_2 = 2\mathbf{b}_1\), whereby they are linearly dependent. Consequently, the null space of B is the one-dimensional subspace given by the vectors orthogonal to \(\mathbf{b}_1\), i.e.,

$$\begin{aligned} \text{ Ker }\,B= \left( \begin{array}{c} 1 \\ 2 \end{array} \right) ^\perp = \text{ span }~\left\{ \left( \begin{array}{r} -2 \\ 1 \end{array} \right) \right\} . \end{aligned}$$
(3.217)

The matrices (3.207) and (3.210) satisfy, respectively,

$$\begin{aligned} \begin{array}{l} \text{ dim }\,\text{ Im }\,A + \text{ dim }\,\text{ Ker }\,A=2+0=2,\\ \\ \text{ dim }\,\text{ Im }\,B + \text{ dim }\,\text{ Ker }\,B=1+1=2. \end{array} \end{aligned}$$
(3.218)
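The two balances in (3.218) can be confirmed numerically, e.g., with `numpy.linalg.matrix_rank` (a sketch):

```python
# Rank-nullity check for the nonsingular A of (3.207) and the
# singular B of (3.210): dim Im + dim Ker = 2 in both cases.
import numpy as np

for M in (np.array([[1.0, 2.0], [2.0, 0.0]]),   # A of (3.207)
          np.array([[1.0, 2.0], [2.0, 4.0]])):  # B of (3.210)
    rank = np.linalg.matrix_rank(M)             # dim Im M
    nullity = M.shape[1] - rank                 # dim Ker M
    assert rank + nullity == 2
```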

10.2 Dimensions of Image and Null Space

In what follows, we restrict the discussion to square matrices of size \(n\times n\). In this event, the dimension \(\text{ dim }\,\text{ Im }\, A\) of (3.198) is n whenever A is of full rank, i.e., when \(\text{ det }\,A\ne 0\); complementary to this, \(\text{ dim }\,\text{ Ker }\, A\) of (3.199) is then 0. However, \(\text{ dim }\,\text{ Im }\, A<n\) and \(\text{ dim }\,\text{ Ker }\, A>0\) when \(\text{ det }\,A=0\).

The matrices (3.207) and (3.210) exemplify a general relationship of \(n\times n\) matrices, satisfying

$$\begin{aligned} \text{ dim }\,\text{ Im }\,A + \text{ dim }\,\text{ Ker }\,A=n \end{aligned}$$
(3.219)

To derive this relationship, we begin by observing the invariance

$$\begin{aligned} \text{ Im }\,A = \text{ Im }\,A^\prime \end{aligned}$$
(3.220)

for \(A^\prime =\left( \mathbf{a}_1^\prime ~\mathbf{a}_2^\prime ~\ldots ~\mathbf{a}_n^\prime \right) \) obtained from \(A=\left( \mathbf{a}_1~\mathbf{a}_2~\ldots ~\mathbf{a}_n\right) \) by replacing a column vector \(\mathbf{a}_j\) with a linear superposition involving the other column vectors. Specifically, this may be by choice of \(1\le j\le n\) and the linear superposition

$$\begin{aligned} \mathbf{a}_i^\prime = \mathbf{a}_i~(i\ne j),~~\mathbf{a}_j^\prime = \mathbf{a}_j - \sum _{i=1}^{j-1} \mu _i \mathbf{a}_i. \end{aligned}$$
(3.221)

This transformation has a corresponding upper triangular transformation matrix U such that \(A^\prime = AU\). For instance, when \(n=3\) and \(j=2,3\)

$$\begin{aligned} U_2= \left( \begin{array}{ccc} 1 &{} -\mu _1 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \end{array} \right) ,~~ U_3= \left( \begin{array}{ccc} 1 &{} 0 &{} -\mu _1^\prime \\ 0 &{} 1 &{} -\mu _2^\prime \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(3.222)

Since the columns of \(AU_2\) and \(AU_2U_3\) are superpositions of the column vectors of A, the image space remains \(\text{ Im }\,A\).

The Gram-Schmidt orthogonalization of \(\mathbf{a}_j\) against the previously orthogonalized \(\mathbf{a}_i\) \((1\le i \le j-1)\) satisfies

$$\begin{aligned} \mathbf{a}_{j}^\prime = \mathbf{a}_{j} - \sum _{i=1}^{j-1}\mu _i \mathbf{a}_i,~~\mu _i= \frac{\mathbf{a}_i^\dagger \mathbf{a}_{j}}{\mathbf{a}_i^\dagger \mathbf{a}_i}. \end{aligned}$$
(3.223)

Here, we omit projections \(\mu _i \mathbf{a}_i\) whenever \(\mathbf{a}_i=\mathbf{0}\). Performing (3.223) consecutively for each \(j=2,3,\ldots ,n\) produces \(A^{\prime \prime \cdots \prime }\) with mutually orthogonal column vectors. For a \(2\times 2\) matrix A, the Gram-Schmidt orthogonalization of its column vectors obtains in one step

$$\begin{aligned} A^\prime = AU_2 = \left( \mathbf{a}_1~~\mathbf{a}_2 - \mu _1 \mathbf{a}_1\right) ,~~ U_2=\left( \begin{array}{cc} 1 &{} -\mu _1 \\ 0 &{} 1 \end{array}\right) ,~~\mu _1 = \frac{\mathbf{a}_1^\dagger \mathbf{a}_2}{\mathbf{a}_1^\dagger \mathbf{a}_1}. \end{aligned}$$
(3.224)

If A has full rank, then so does \(A^\prime \). This may also be seen from the product rule

$$\begin{aligned} \text{ det }\,A^\prime = \text{ det } \, A ~\text{ det }\,U = \text{ det }\,A, \end{aligned}$$
(3.225)

since \(\text{ det }\,U=1\). The determinant of \(A^\prime \) is nonzero iff the determinant of A is nonzero. For a \(3\times 3\) matrix, we apply (3.222). The product \(U=U_2U_3\) is upper triangular,

$$\begin{aligned} U = U_2U_3= \left( \begin{array}{ccc} 1 &{} -\mu _1 &{} -\mu _1^\prime +\mu _1\mu _2^\prime \\ 0 &{} 1 &{} -\mu _2^\prime \\ 0 &{} 0 &{} 1 \end{array} \right) . \end{aligned}$$
(3.226)

The above is readily extended to \(n\times n\) matrices

$$\begin{aligned} A^\prime = AU = AU_2U_3\cdots U_n \end{aligned}$$
(3.227)

whose columns are mutually orthogonal, where U is upper triangular with unit determinant.

If \(\text{ det }~A=0\), some of the columns of \(A^\prime \) are zero, i.e.,

$$\begin{aligned} A^\prime = \left( \mathbf{a}_1 ~ \mathbf{a}_2 ~ \dots ~ \mathbf{0}~ \dots ~ \mathbf{a}_j ~\dots ~\mathbf{0} ~\dots ~ \mathbf{a}_n\right) . \end{aligned}$$
(3.228)

The null vectors of \(A^\prime \) are of the form \(\mathbf{u}^\prime =(0~\cdots ~1~\cdots ~0)^T\), where the 1 appears at a position j for which the j-th column of \(A^\prime \) vanishes. Since U is invertible, \(A=A^\prime U^{-1}\), and hence \(\mathbf{u} =U\mathbf{u}^\prime \) is a null vector of A. Our theorem (3.219) now readily follows: the number of nonzero columns of \(A^\prime \) defines the dimension of the image space of A, and the number of zero columns of \(A^\prime \) defines the dimension of the null space of A.
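The procedure (3.221)–(3.227) translates directly into code. The sketch below (function name chosen for this illustration) accumulates the upper triangular factors and is checked against the matrix of (3.229):

```python
# Sketch of (3.221)-(3.227): orthogonalize the columns of A by right
# multiplication with upper triangular matrices, A' = A U, det U = 1.
import numpy as np

def column_orthogonalize(A):
    """Return (A_prime, U) with A_prime = A @ U, mutually orthogonal
    columns of A_prime, and U upper triangular with unit determinant."""
    n = A.shape[1]
    Ap = np.array(A, dtype=float)
    U = np.eye(n)
    for j in range(1, n):
        Uj = np.eye(n)
        for i in range(j):
            d = Ap[:, i] @ Ap[:, i]
            if d > 1e-14:                              # skip zero columns
                Uj[i, j] = -(Ap[:, i] @ Ap[:, j]) / d  # -mu_i of (3.223)
        Ap = Ap @ Uj
        U = U @ Uj
    return Ap, U

A = np.array([[1.0, 2, 1], [2, 1, 0], [0, 2, 3]])      # the matrix (3.229)
Ap, U = column_orthogonalize(A)
assert np.allclose(Ap, A @ U)
# off-diagonal inner products vanish: columns are mutually orthogonal
G = Ap.T @ Ap
assert np.allclose(G, np.diag(np.diag(G)))
```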

Example 3.9. Consider the non-singular matrix

$$\begin{aligned} A= \left( \begin{array}{ccc} 1&{}2&{}1\\ 2&{}1&{}0 \\ 0&{}2&{}3\end{array} \right) . \end{aligned}$$
(3.229)

The first and second steps in (3.223) produce, respectively,

$$\begin{aligned} A^\prime = AU_2 = \left( \begin{array}{ccc} 1&{}6/5&{}1\\ 2&{}-3/5&{}0 \\ 0&{}2&{}3\end{array} \right) ,~U_2= \left( \begin{array}{ccc} 1&{}-4/5&{}0\\ 0&{}1&{}0 \\ 0&{}0&{}1\end{array} \right) , \end{aligned}$$
(3.230)
$$\begin{aligned} A^{\prime \prime }= A^\prime U_3 = \left( \begin{array}{ccc} 1&{}6/5&{}-{\frac{20}{29}} \\ 2&{}-3/5&{}{\frac{10}{29}}\\ 0&{}2&{}{ \frac{15}{29}}\end{array} \right) ,~~U_3= \left( \begin{array}{ccc} 1&{}0&{}-1/5\\ 0&{}1&{}-{\frac{36}{29}}\\ 0&{}0&{}1\end{array} \right) . \end{aligned}$$
(3.231)

It follows that

$$\begin{aligned} \left( \begin{array}{ccc} 1&{}6/5&{}-{\frac{20}{29}} \\ 2&{}-3/5&{}{\frac{10}{29}}\\ 0&{}2&{}{ \frac{15}{29}}\end{array} \right) = \left( \begin{array}{ccc} 1&{}2&{}1\\ 2&{}1&{}0 \\ 0&{}2&{}3\end{array} \right) \left( \begin{array}{ccc} 1&{}-4/5&{}{\frac{23}{29}} \\ 0&{}1&{}-{\frac{36}{29}}\\ 0&{}0&{}1 \end{array} \right) , \end{aligned}$$
(3.232)

where the second matrix on the right hand side is \(U=U_2U_3\). Similarly, consider the singular matrix

$$\begin{aligned} B=\left( \begin{array}{ccc} 1&{}2&{}1\\ 2&{}4&{}0 \\ 0&{}0&{}3\end{array} \right) . \end{aligned}$$
(3.233)

The first and second step in (3.223) produce, respectively,

$$\begin{aligned} B^\prime = BU_2 = \left( \begin{array}{ccc} 1&{}0&{}1\\ 2&{}0&{}0 \\ 0&{}0&{}3\end{array} \right) ,~ U_2=\left( \begin{array}{ccc} 1&{}-2&{}0\\ 0&{}1&{}0 \\ 0&{}0&{}1\end{array} \right) ,~ \end{aligned}$$
(3.234)
$$\begin{aligned} B^{\prime \prime } = B^\prime U_3 = \left( \begin{array}{ccc} 1&{}0&{}4/5\\ 2&{}0&{}-2/5 \\ 0&{}0&{}3\end{array} \right) ,~ U_3 = \left( \begin{array}{ccc} 1&{}0&{}-1/5\\ 0&{}1&{}0 \\ 0&{}0&{}1\end{array} \right) . \end{aligned}$$
(3.235)

It follows that

$$\begin{aligned} \left( \begin{array}{ccc} 1&{}0&{}4/5\\ 2&{}0&{}-2/5 \\ 0&{}0&{}3\end{array} \right) =\left( \begin{array}{ccc} 1&{}2&{}1\\ 2&{}4&{}0 \\ 0&{}0&{}3\end{array} \right) \left( \begin{array}{ccc} 1&{}-2&{}-1/5\\ 0&{}1&{}0 \\ 0&{}0&{}1\end{array} \right) , \end{aligned}$$
(3.236)

where the second matrix on the right hand side is \(U=U_2U_3\).

Comparing (3.232) and (3.236), we see that A in (3.229) has full rank with the trivial null space \(\text{ Ker }\,A=\mathbf{0}\), whereas B in (3.233) is of rank 2 with the nontrivial null space given by the second column of \(U=U_2U_3\), i.e.,

$$\begin{aligned} \text{ Ker }\,B = \text{ span }~U\left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array}\right) = \text{ span } \left( \begin{array}{r} -2 \\ 1 \\ 0 \end{array}\right) . \end{aligned}$$
(3.237)

10.3 QR Factorization by Gram-Schmidt

The above is more commonly used to derive the QR factorization of a matrix upon including normalization in each step of the Gram-Schmidt procedure,

$$\begin{aligned} \mathbf{a}_j^\prime \rightarrow \hat{\mathbf{a}}_j = \frac{\mathbf{a}_j^\prime }{\sqrt{(\mathbf{a}_j^\prime )^\dagger \mathbf{a}_j^\prime }} \end{aligned}$$
(3.238)

if \(\mathbf{a}_j^\prime \ne \mathbf{0}\) (otherwise, we skip this step). The result \(A=QR\) has the column vectors of Q forming an orthonormal basis for \(\text{ Im }\,A\) and R upper triangular. If A is square and invertible, then Q is unitary, \(Q^\dagger Q=I\), whereby \(R=Q^\dagger A\).
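This construction is the classical Gram-Schmidt QR algorithm. The sketch below is illustrative rather than the numerically preferred method (modified Gram-Schmidt or Householder reflections are more stable in floating point) and is checked on the matrix of (3.239):

```python
# Classical Gram-Schmidt QR, following (3.238); zero columns of a
# singular matrix are skipped, leaving a zero column in Q.
import numpy as np

def qr_gram_schmidt(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # projection coefficient
            v = v - R[i, j] * Q[:, i]     # Gram-Schmidt step (3.223)
        R[j, j] = np.linalg.norm(v)
        if R[j, j] > 1e-14:               # normalization (3.238)
            Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 2.0], [2.0, 1.0]])    # the matrix of (3.239)
Q, R = qr_gram_schmidt(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(2))    # Q unitary since A is invertible
```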

Example 3.10. Consider the QR factorization of a non-singular \(2\times 2\) matrix. Let \(A^\prime = AU_2\) be the outcome of the Gram-Schmidt procedure and let \(D_2\) denote the diagonal matrix containing the norms of its column vectors. Then \(A^\prime = QD_2\) defines

$$\begin{aligned} \left( \begin{array}{cc} 1&{}2\\ 2&{}1\end{array} \right) = \left( \begin{array}{cc} \frac{1}{\sqrt{5}}&{}\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}}&{}-\frac{1}{\sqrt{5}}\end{array} \right) \left( \begin{array}{cc} \sqrt{5}&{}\frac{4}{\sqrt{5}}\\ 0&{}\frac{3}{\sqrt{5}}\end{array} \right) \equiv QR. \end{aligned}$$
(3.239)

For a singular matrix, we similarly obtain

$$\begin{aligned} \left( \begin{array}{cc} 1&{}2\\ 2&{}4\end{array} \right) = \left( \begin{array}{cc} \frac{1}{\sqrt{5}}&{}\frac{2}{\sqrt{5}}\\ \frac{2}{\sqrt{5}}&{}-\frac{1}{\sqrt{5}}\end{array} \right) \left( \begin{array}{cc} \sqrt{5}&{}2\,\sqrt{5}\\ 0&{}0\end{array} \right) . \end{aligned}$$
(3.240)

For the A and B in (3.229) and (3.233), the QR factorizations are

$$\begin{aligned} A= \left( \begin{array}{ccc} 1/5\,\sqrt{5}&{}{\frac{6}{145}}\,\sqrt{145}&{}-{\frac{4}{29}}\,\sqrt{29}\\ 2/5\,\sqrt{5}&{}- {\frac{3}{145}}\,\sqrt{145}&{}{\frac{2}{29}}\,\sqrt{29} \\ 0&{}{\frac{2}{29}}\,\sqrt{145}&{}{\frac{3}{29}}\,\sqrt{29}\end{array} \right) \left( \begin{array}{ccc} \sqrt{5}&{}4/\sqrt{5}&{}1/\sqrt{5}\\ 0&{}{\frac{1}{5}}\,\sqrt{145}&{}{\frac{36}{145}}\,\sqrt{145} \\ 0&{}0&{}{\frac{5}{29}}\,\sqrt{29} \end{array} \right) , \end{aligned}$$
(3.241)
$$\begin{aligned} B = \left( \begin{array}{ccc} 1/\sqrt{5}&{}2/\sqrt{5}&{}0 \\ 2/\sqrt{5}&{}-1/\sqrt{5}&{}0 \\ 0&{}0&{}1\end{array} \right) \left( \begin{array}{ccc} \sqrt{5}&{}2\,\sqrt{5}&{}1/\sqrt{5} \\ 0&{}0&{}2/\sqrt{5} \\ 0&{}0&{}3\end{array} \right) . \end{aligned}$$
(3.242)

Table 3.1 summarizes this discussion.

Table 3.1 Matrices and some symmetry properties

11 Exercises

3.1. Let

$$\begin{aligned} \mathbf{a} = \left( \begin{array}{c} 1 \\ 2 \\ 0 \end{array} \right) ,~ \mathbf{b} = \left( \begin{array}{c} 0 \\ 1 \\ 2 \end{array} \right) ,~ \mathbf{c} = \left( \begin{array}{c} 2 \\ 0 \\ 1 \end{array} \right) . \end{aligned}$$
(3.243)

Calculate

$$\begin{aligned} (i):~\mathbf{a}\times \mathbf{b}\cdot \mathbf{c},~~\mathbf{a}\cdot \mathbf{b}\times \mathbf{c};~~ (ii):~\mathbf{a}\times (\mathbf{b}\times \mathbf{c}),~~ (\mathbf{a}\times \mathbf{b})\times \mathbf{c}. \end{aligned}$$
(3.244)

Compare your answers to (i) and explain.

3.2. Consider a Cartesian coordinate system (x, y, z) and rotation of a vector \(\mathbf{r}\) about the z-axis with angular velocity \(\varvec{\omega }=\Omega \,\mathbf{i}_z\), where \(\mathbf{i}_z\) denotes the unit vector along the z-axis. The velocity of \(\mathbf{r}\) satisfies \(\mathbf{v}=\varvec{\omega }\times \mathbf{r}\), where \(\times \) denotes the outer product.

(i) If \(\mathbf{r} = 2 \mathbf{i}_x + 3 \mathbf{i}_z\), calculate the velocity \(\mathbf{v}\).

(ii) Show that \(\left| \mathbf{v }\right| = \Omega \sigma \), where \(\sigma \) denotes the distance to the axis of rotation.

3.3. Derive the equivalent expression for the dip angle, given by

$$\begin{aligned} \sin \theta = \frac{M\sigma ^2\omega _p}{J}=\frac{M^2g\sigma ^3}{J^2} = \left( \frac{g}{\sigma }\right) \left( \frac{\sigma }{b}\right) ^4\frac{1}{\omega ^2}. \end{aligned}$$
(3.245)

3.4. For each of the following transformations in the two-dimensional plane, state which are projections, reflections or rotations:

$$\begin{aligned} \begin{array}{l} A_1=\left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} 0 \end{array}\right) ,~~ A_2=\left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} 1 \end{array}\right) ,~~ A_3=\left( \begin{array}{cc} 1 &{} 0 \\ 0 &{} -1 \end{array}\right) ,~~ A_4=\left( \begin{array}{cc} 0 &{} 1 \\ 1 &{} 0 \end{array}\right) ,\\ \\ A_5=\frac{1}{\sqrt{2}} \left( \begin{array}{cc} 1 &{} -1 \\ 1 &{} 1 \end{array}\right) . \end{array} \end{aligned}$$
(3.246)

3.5. Show that complex numbers \(z=x+iy\) can be written in terms of the matrices

$$\begin{aligned} A(z)=\left( \begin{array}{rr} x &{} -y \\ y &{} x \end{array} \right) \end{aligned}$$
(3.247)

satisfying \(A(z)A(w)=A(zw)\) by the rules of matrix multiplication. In particular, show that for \(z=i\), (3.247) satisfies \(I + A^2=0\), where I denotes the identity matrix (\(z=1\)).

3.6. Consider the matrix

$$\begin{aligned} A= \left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 0 \end{array}\right) . \end{aligned}$$
(3.248)

Compute the determinant, the eigenvalues and the eigenvectors.

3.7. Permutation of the x- and y-coordinates is described by

$$\begin{aligned} \mathbf{z}= x\mathbf{i}_x + y \mathbf{i}_y \rightarrow \mathbf{w}= y \mathbf{i}_x + x \mathbf{i}_y ,~~w=y+ix. \end{aligned}$$
(3.249)

Show that w(z) is not analytic in z, i.e., the Cauchy-Riemann relations are not satisfied. Derive the equivalent \(2\times 2\) matrix equation for \(x^\prime = y\) and \(y^\prime = x\).

3.8. Consider the matrix

$$\begin{aligned} A= \left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 1 \end{array}\right) . \end{aligned}$$
(3.250)

Obtain the eigenvalues \(\lambda _{i}\) and eigenvectors \(\mathbf{a}_{i}\) (\(i=1,2)\) and decompose A in the form

$$\begin{aligned} A=\lambda _1 A_1 + \lambda _2A_2,~~A_i=\hat{\mathbf{a}}_i\hat{\mathbf{a}}_i^\dagger , \end{aligned}$$
(3.251)

where hat refers to normalization to unit norm.

3.9. Show the orthogonality (3.119).

3.10. If A is both unitary and Hermitian, show that \(A=A^{-1}\).

3.11. Consider the Householder matrix (3.139). Show that H is Hermitian (\(H^\dagger = H)\) and unitary \((H^\dagger H=I)\), whence it is involutory (H is its own inverse): \(H^2=I\). In two dimensions, determine its eigenvalue-eigenvector pairs for a general direction \(\mathbf{u}\) and interpret the result geometrically. What happens in three dimensions to the multiplicity of the eigenvalues?

3.12. Let A be Hermitian, i.e., \(A^\dagger = A\). Show that A is diagonalizable according to

$$\begin{aligned} A= U \Lambda U^\dagger , \end{aligned}$$
(3.252)

where U is the unitary matrix satisfying \(U^\dagger U = I\). Compute U for A in (3.250). [Hint: Compose U from the eigenvectors of A.]

3.13. Consider the matrix

$$\begin{aligned} A= \left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 4+a \end{array}\right) . \end{aligned}$$
(3.253)

Compute the determinant and determine the condition number of A, defined by the ratio of the maximal to the minimal square root of the eigenvalues of \(A^\dagger A\). [Hint: use (3.252).] What happens when a approaches zero? Compute the solution to the system of equations

$$\begin{aligned} A\mathbf{u}=\mathbf{v}. \end{aligned}$$
(3.254)

Show that the solution is regular, respectively, ill-behaved as a approaches zero when

$$\begin{aligned} \mathbf{v} = \left( \begin{array}{r} 1 \\ 2 \end{array}\right) , ~ \left( \begin{array}{r} -2 \\ 1 \end{array}\right) . \end{aligned}$$
(3.255)

What is the condition number of \(A^2\)? How does it generalize to \(A^n\) \((n\ge 3)\)?

3.14. Show that U(1) can be identified with the tangents of complex numbers on the unit circle \(S^1\). [Hint: Express elements of \(S^1\) by \(e^{i\theta }\) and generalize the Taylor expansion in \(\theta \),

$$\begin{aligned} e^{i\theta } = 1 + i \theta +O\left( \theta ^2\right) , \end{aligned}$$
(3.256)

about the identity \(\theta =0\) to arbitrary \(\theta \).]Footnote 13

3.15. Show that U(1) is abelian:

$$\begin{aligned} AB-BA=0 ~~\left( A,B\in U(1)\right) . \end{aligned}$$
(3.257)

3.16. Show that the elements of U(2) are of the form

$$\begin{aligned} A=e^{i\theta } \left( \begin{array}{rr} z &{} -\bar{w} \\ w &{} \bar{z} \end{array}\right) ,~~z\bar{z}+w\bar{w}=1 \end{aligned}$$
(3.258)

and that they are in general not Hermitian.

3.17. Following (3.258), specialize to \(\text{ det }\,A=1\).Footnote 14 Give a general representation of SU(2) in terms of traceless \(2\times 2\) matrices (the sum of diagonal elements being zero). Determine the number of degrees of freedom in view of the conditions \(A^\dagger A=I\) and det A = 1. Show that these traceless matrices are Hermitian and derive their eigenvalues.

3.18. Illustrate a double cover of \(S^1\) by way of a curve on a two-torus.

3.19. From the definition of the inner product of two arbitrary spinors \(o^A\) and \(\iota ^A\) and the definition of lowering indices, show that

$$\begin{aligned} o^A\iota _A = - o_A\iota ^A. \end{aligned}$$
(3.259)

3.20. In (3.165), obtain \(\iota ^A\) from a rotation of \(o^A\) and evaluate their inner product as a function of the rotation angle. Next, consider a spinor basis

$$\begin{aligned} o^A=\frac{1}{\sqrt{2}}\left( \begin{array}{cc} 1&i \end{array}\right) ,~~\iota ^A=\frac{1}{\sqrt{2}}\left( \begin{array}{cc} i&1 \end{array}\right) . \end{aligned}$$
(3.260)

Show that \(o^A\iota _A=1\) in (3.166). Rank-one matrices of the form \(o^A\bar{\iota }^{A'}\) may be expanded asFootnote 15

$$\begin{aligned} o^A\bar{\iota }^{A'}=\frac{1}{2}\left[ \left( \begin{array}{c} -i \\ 1 \end{array}\right) \left( \begin{array}{cc} 1&i \end{array}\right) \right] ^T = \frac{1}{2}\left( \begin{array}{cc} -i &{} 1 \\ 1 &{} i \end{array}\right) ^T = \frac{1}{2}\left( \begin{array}{cc} -i &{} 1 \\ 1 &{} i \end{array}\right) , \end{aligned}$$
(3.261)

where T denotes the ordinary matrix transpose. Express the Pauli spin matrices in this new basis similar to (3.170).

3.21. Occasionally, we allow coordinates to become complex. For reference, recall the line-element

$$\begin{aligned} ds^2 = dx^2 + dy^2= dr^2+r^2d\theta ^2 \end{aligned}$$
(3.262)

of the Euclidean plane, expressed in Cartesian and, respectively, polar coordinates. The Euclidean plane is flat, like an ordinary sheet of paper. ConsiderFootnote 16

$$\begin{aligned} ds^2 = -x^2 dt^2 + dx^2. \end{aligned}$$
(3.263)

Show that (3.263) again is the line-element of a flat two-surface using analytic continuation in t.Footnote 17

3.22. Consider the matrix

$$\begin{aligned} A = \left( \begin{array}{cc} 1 &{} 2 \\ 2 &{} 4+a\end{array}\right) . \end{aligned}$$
(3.264)

(i) Obtain the image space and the null space of the matrix for all a. [Hint: Distinguish between \(a=0\) and \(a\ne 0\).];

(ii) Apply Gram-Schmidt orthogonalization to obtain \(A^\prime = AU\), where the column vectors of \(A^\prime \) are orthogonal and U is upper triangular;

(iii) Obtain the QR factorization of A.

3.23. Following (3.239), obtain the QR factorizations of the \(2\times 2\) rotation matrix \(R(\varphi )\) and the Lorentz boost \(\Lambda (\mu )\) of Example 3.8.

3.24. Obtain the QR factorizations of general \(2\times 2\) matrices that are (a) Hermitian or (b) unitary.

3.25. Let \(i=0,1,2,3\) correspond to (txyz). For the Pauli spin matrices (3.143), show or calculate

(i) The \(\sigma _i\) are involutory: \(\sigma _1^2=\sigma _2^2=\sigma _3^2=-i\sigma _1\sigma _2\sigma _3 = I\); det \(\sigma _i=-1\) for \(i=1,2,3\) and they are trace-free, Tr\((\sigma _i)=0\).

(ii) The \(\sigma _i\) satisfy \(\sigma _i\sigma _j+\sigma _j\sigma _i = 2 \delta _{ij}I\) \((i,j=1,2,3)\).

(iii) The \(\sigma _a\) \((a=0,1,2,3)\) form a basis of the \(2\times 2\) Hermitian matrices.

(iv) The eigenvalues and eigenvectors of the \(\sigma _i\) \((i=1,2,3)\).

(v) The commutator \([\sigma _i,\sigma _j]\) for all \(i,j=1,2,3\).