
Acronyms

FNS: Fundamental Numerical Scheme

GPS: Global Positioning System

SVD: Singular Value Decomposition

1 Introduction

Computing 3D pose from camera images and 3D sensor data is one of the most fundamental problems of 3D analysis, arising in computer vision and robot control. The problem is usually formulated as minimization of a function of the form

$$\displaystyle \begin{aligned} J=J(\ldots, \boldsymbol{R}_{1},\boldsymbol{R}_{2}, \ldots, \boldsymbol{R}_{M}), \end{aligned} $$
(10.1)

where R 1, R 2, …, R M are rotation matrices, and “ … ” denotes other parameters that specify translations, object shapes, and other properties. Hereafter, we use bold uppercases to denote matrices (3 × 3 unless otherwise specified) and bold lowercases to denote vectors (3D unless otherwise specified). For a matrix A, we write its determinant and Frobenius norm as |A| and ∥A∥, respectively. For vectors a and b, we write 〈a, b〉 for their inner product and a ×b for their vector product.

For minimizing a function J in the form of Eq. (10.1), the standard approach one can immediately think of is: we first parameterize the rotation matrices in terms of, say, axis-angle, Euler angles, or quaternions; then we differentiate J with respect to the parameters and increment them so that J decreases; we iterate this. This approach is generally known as the “gradient method,” and many variations have been proposed for improving convergence, including “steepest descent,” “conjugate gradient,” “Newton iterations,” “Gauss–Newton iterations,” and the “Levenberg–Marquardt method.”

The purpose of this chapter is to show that for this type of optimization, parameterization of rotation is not necessary. After all, “differentiation” means evaluation of the change of the function value for a small variation of the variable. Hence, for differentiation with respect to rotation R, we only need to evaluate the change of the function value when a small rotation is added to R. To do this, it is sufficient to parameterize a small rotation. To be specific, we compute a small rotation that reduces the function J, add it to the current rotation R, regard the resulting rotation as a new current rotation R, and iterate this process. As a result, the matrix R is updated at each iteration in the computer memory, so that there is no need to parameterize the matrix R itself. We call this the “Lie algebra method” (this terminology is explained later).

This method has a big advantage over the parameterization approach, because any parameterization of rotation, such as axis-angle, Euler angles, and quaternions, has some singularities; if the parameter values happen to be at singularities, though very rarely, computational problems such as numerical instability may occur. Using the Lie algebra method, we need not worry about any singularities of the parameterization, because all we do is to update the current rotation by adding a small rotation. In a sense, this is obvious, but not many people understand this fact.

We first study the relationship between small rotations and angular velocities. Then, we derive the exponential expression of rotation and formalize the concept of “Lie algebra.” We describe the actual computational procedure of some computer vision problems to demonstrate how the Lie algebra method works in practice. Finally, we overview the role of Lie algebra in various computer vision applications.

2 Small Rotations and Angular Velocity

If R represents a small rotation around some axis by a small angle Δ Ω, we can Taylor-expand it in the form

$$\displaystyle \begin{aligned} \boldsymbol{R}=\boldsymbol{I}+\boldsymbol{A}\Delta\Omega+O(\Delta\Omega)^{2}, \end{aligned} $$
(10.2)

for some matrix A, where I is the identity and O( Δ Ω)2 denotes terms of second or higher orders in Δ Ω. Since R is a rotation matrix,

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \boldsymbol{R}\boldsymbol{R}^{\top}&\displaystyle =&\displaystyle (\boldsymbol{I}+\boldsymbol{A}\Delta\Omega+O(\Delta\Omega)^{2}) (\boldsymbol{I}+\boldsymbol{A}\Delta\Omega+O(\Delta\Omega)^{2})^{\top}\\ &\displaystyle =&\displaystyle \boldsymbol{I}+(\boldsymbol{A}+\boldsymbol{A}^{\top})\Delta\Omega+O(\Delta\Omega)^{2} \end{array} \end{aligned} $$
(10.3)

must be identically equal to I for any Δ Ω. Hence, A + A⊤ = O, or

$$\displaystyle \begin{aligned} \boldsymbol{A}^{\top}=-\boldsymbol{A}. \end{aligned} $$
(10.4)

This means that A is an antisymmetric matrix, so we can write it as

$$\displaystyle \begin{aligned} \boldsymbol{A}=\left(\begin{array}{ccc} 0 & -l_{3} & l_{2} \\ l_{3} & 0 & -l_{1} \\ -l_{2} & l_{1} & 0 \end{array}\right) \end{aligned} $$
(10.5)

for some l 1, l 2, and l 3. If a vector a = \(\Bigl (a_{i}\Bigr )\) (abbreviation of a vector whose ith component is a i) is rotated to a′ by the rotation of Eq. (10.2), we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{a}' &=& \left(\boldsymbol{I}+\boldsymbol{A}\Delta\Omega+O(\Delta\Omega)^{2}\right)\boldsymbol{a}= \boldsymbol{a}+ \left(\begin{array}{ccc} 0 & -l_{3} & l_{2} \\ l_{3} & 0 & -l_{1} \\ -l_{2} & l_{1} & 0 \end{array}\right) \left(\begin{array}{c} a_{1} \\ a_{2} \\ a_{3} \end{array}\right)\Delta\Omega +O(\Delta\Omega)^{2} \\ &=& \boldsymbol{a}+\left(\begin{array}{c} l_{2}a_{3}-l_{3}a_{2} \\ l_{3}a_{1}-l_{1}a_{3} \\ l_{1}a_{2}-l_{2}a_{1} \end{array}\right) \Delta\Omega+O(\Delta\Omega)^{2}= \boldsymbol{a}+\boldsymbol{l}\times\boldsymbol{a}\Delta\Omega+O(\Delta\Omega)^{2}, {} \end{array} \end{aligned} $$
(10.6)

where we let l = \(\Bigl (l_{i}\Bigr )\). Suppose this describes a continuous rotational motion over a small time interval Δt. Its velocity is given by

$$\displaystyle \begin{aligned} \dot{\boldsymbol{a}}=\lim_{\Delta t\rightarrow 0} \frac{\boldsymbol{a}'-\boldsymbol{a}}{\Delta t}=\omega\boldsymbol{l}\times\boldsymbol{a}, \end{aligned} $$
(10.7)

where we define the angular velocity ω by

$$\displaystyle \begin{aligned} \omega=\lim_{\Delta t\rightarrow 0}\frac{\Delta\Omega}{\Delta t}. \end{aligned} $$
(10.8)

Equation (10.7) states that the velocity \(\dot {\boldsymbol {a}}\) is orthogonal to both l and a and that its magnitude equals ω times the area of the parallelogram made by l and a. From geometric considerations, the velocity \(\dot {\boldsymbol {a}}\) is orthogonal to the axis of rotation and to a itself (Fig. 10.1). If we let θ be the angle made by a and that axis, the distance of the endpoint of a from the axis is \(\|\boldsymbol {a}\|\sin \theta \), and the definition of the angular velocity ω implies \(\|\dot {\boldsymbol {a}}\|\) = \(\omega \|\boldsymbol {a}\|\sin \theta \). Since \(\dot {\boldsymbol {a}}\) is orthogonal to l and a and since \(\|\dot {\boldsymbol {a}}\|\) = \(\omega \|\boldsymbol {a}\|\sin \theta \) equals ω times the area of the parallelogram made by l and a, we conclude that l is the unit vector along the axis of rotation. In physics, the vector ω = ω l is known as the angular velocity vector. Using this notation, we can write Eq. (10.7) as

$$\displaystyle \begin{aligned} \dot{\boldsymbol{a}}=\boldsymbol{\omega}\times\boldsymbol{a}. \end{aligned} $$
(10.9)
Fig. 10.1

Vector a is rotating around an axis in the direction of the unit vector l with angular velocity ω. Its velocity vector is \(\dot { \boldsymbol {a}}\).

3 Exponential Expression of Rotation

If we write R l( Ω) to denote the rotation around axis l (unit vector) by angle Ω, Eq. (10.2) equals R l( Δ Ω). If we add it to rotation R l( Ω), their composition is R l( Δ Ω)R l( Ω) = R l( Ω +  Δ Ω). Hence, the derivative of R l( Ω) with respect to Ω is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{d\boldsymbol{R}_{\boldsymbol{l}}(\Omega)}{d\Omega} &\displaystyle =&\displaystyle \lim_{\Delta\Omega\rightarrow0} \frac{\boldsymbol{R}_{\boldsymbol{l}}(\Omega+\Delta\Omega)-\boldsymbol{R}_{\boldsymbol{l}}(\Omega)}{\Delta\Omega}= \lim_{\Delta\Omega\rightarrow0} \frac{\boldsymbol{R}_{\boldsymbol{l}}(\Delta\Omega)\boldsymbol{R}_{\boldsymbol{l}}(\Omega)-\boldsymbol{R}_{\boldsymbol{l}}(\Omega)}{\Delta\Omega} \\ &\displaystyle =&\displaystyle \lim_{\Delta\Omega\rightarrow0} \frac{\boldsymbol{R}_{\boldsymbol{l}}(\Delta\Omega)-\boldsymbol{I}}{\Delta\Omega}\boldsymbol{R}_{\boldsymbol{l}}(\Omega)= \boldsymbol{A}\boldsymbol{R}_{\boldsymbol{l}}(\Omega). \end{array} \end{aligned} $$
(10.10)

Differentiating this repeatedly, we obtain

$$\displaystyle \begin{aligned} \frac{d^{2}\boldsymbol{R}_{\boldsymbol{l}}}{d\Omega^{2}}=\boldsymbol{A}^{2}\boldsymbol{R}_{\boldsymbol{l}},\qquad \frac{d^{3}\boldsymbol{R}_{\boldsymbol{l}}}{d\Omega^{3}}=\boldsymbol{A}^{3}\boldsymbol{R}_{\boldsymbol{l}},\qquad \ldots, \end{aligned} $$
(10.11)

where the argument ( Ω) is omitted. Since R l(0) = I, the Taylor expansion of R l( Ω) around Ω = 0 is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \boldsymbol{R}_{\boldsymbol{l}}(\Omega) &\displaystyle =&\displaystyle \boldsymbol{I}+\ \frac{d\boldsymbol{R}}{d\Omega}\Big|{}_{\Omega=0}\Omega+ \frac{1}{2}\frac{d^{2}\boldsymbol{R}}{d\Omega^{2}}\Big|{}_{\Omega=0}\Omega^{2}+ \frac{1}{3!}\frac{d^{3}\boldsymbol{R}}{d\Omega^{3}}\Big|{}_{\Omega=0}\Omega^{3}+\ldots \\ &\displaystyle =&\displaystyle \boldsymbol{I}+\Omega\boldsymbol{A}+\frac{\Omega^{2}}{2}\boldsymbol{A}^{2}+ \frac{\Omega^{3}}{3!}\boldsymbol{A}^{3}+\ldots =e^{\Omega\boldsymbol{A}}, {} \end{array} \end{aligned} $$
(10.12)

where we define the exponential of matrix by the following series expansion:

$$\displaystyle \begin{aligned} e^{\boldsymbol{X}}=\sum_{k=0}^{\infty}\frac{\boldsymbol{X}^{k}}{k!}. \end{aligned} $$
(10.13)

In Eq. (10.12), the matrix A specifies the axis direction in the form of Eq. (10.5). Hence, Eq. (10.12) expresses the rotation R l( Ω) in terms of its axis l and angle Ω. An explicit expression for such a rotation, called the Rodrigues formula, is well known (see, e.g., [11, 14]):

$$\displaystyle \begin{aligned} \begin{array}{rcl} &&{\boldsymbol{R}_{\boldsymbol{l}}(\Omega)} \\ &&\ = \left(\begin{array}{ccc} \cos\Omega+l_{1}^{2}(1-\cos\Omega) & l_{1}l_{2}(1-\cos\Omega)-l_{3}\sin\Omega & l_{1}l_{3}(1-\cos\Omega)+l_{2}\sin\Omega \\ l_{2}l_{1}(1-\cos\Omega)+l_{3}\sin\Omega & \cos\Omega+l_{2}^{2}(1-\cos\Omega) & l_{2}l_{3}(1-\cos\Omega)-l_{1}\sin\Omega \\ l_{3}l_{1}(1-\cos\Omega)-l_{2}\sin\Omega & l_{3}l_{2}(1-\cos\Omega)+l_{1}\sin\Omega & \cos\Omega+l_{3}^{2}(1-\cos\Omega) \end{array}\right). {} \end{array} \end{aligned} $$
(10.14)
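
For illustration, Eq. (10.14) can be evaluated with a few lines of NumPy. The sketch below uses the equivalent compact form R l( Ω) = I + sin Ω A + (1 − cos Ω)A², where A is the matrix of Eq. (10.5) built from the unit axis l; the function names skew and rodrigues are our own illustrative choices.

```python
import numpy as np

def skew(v):
    """Antisymmetric matrix A(v) of Eqs. (10.5)/(10.27); skew(v) @ a equals np.cross(v, a)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rodrigues(l, Omega):
    """Rotation around the unit axis l by angle Omega (Eq. (10.14))."""
    A = skew(np.asarray(l, dtype=float))
    return np.eye(3) + np.sin(Omega) * A + (1.0 - np.cos(Omega)) * (A @ A)
```

The same matrix is obtained from the exponential series of Eqs. (10.12) and (10.13); for example, scipy.linalg.expm(Omega * skew(l)) agrees with rodrigues(l, Omega) up to rounding.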

In the following, we combine the axis l and angle Ω, as in the case of the angular velocity vector, as a single vector in the form of

$$\displaystyle \begin{aligned} \boldsymbol{\Omega}=\Omega\boldsymbol{l}, \end{aligned} $$
(10.15)

which we call the rotation vector. We also write the matrix that represents the corresponding rotation as R( Ω). Since Ω1 = Ωl 1, Ω2 = Ωl 2, and Ω3 = Ωl 3, Eq. (10.5) is rewritten as

$$\displaystyle \begin{aligned} \Omega\boldsymbol{A}=\Omega_{1}\boldsymbol{A}_{1}+\Omega_{2}\boldsymbol{A}_{2}+\Omega_{3}\boldsymbol{A}_{3}, \end{aligned} $$
(10.16)

where we define the matrices A 1, A 2, and A 3 by

$$\displaystyle \begin{aligned} \boldsymbol{A}_{1}=\left(\begin{array}{ccc} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{array}\right),\qquad \boldsymbol{A}_{2}=\left(\begin{array}{ccc} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{array}\right),\qquad \boldsymbol{A}_{3}=\left(\begin{array}{ccc} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right). \end{aligned} $$
(10.17)

Hence, Eq. (10.12) is also written as

$$\displaystyle \begin{aligned} \boldsymbol{R}(\boldsymbol{\Omega})=e^{\Omega_{1}\boldsymbol{A}_{1}+\Omega_{2}\boldsymbol{A}_{2}+\Omega_{3}\boldsymbol{A}_{3}}, \end{aligned} $$
(10.18)

which expresses the Rodrigues formula of Eq. (10.14).

4 Lie Algebra of Infinitesimal Rotations

Consider a rotation R(t) continuously changing with a parameter t, which can be interpreted as time, the angle of rotation, or some control parameter; we regard it as a dimensionless parameter with appropriate normalization. We regard t = 0 as corresponding to the identity I. We call a “linear” variation of R(t) around t = 0 an infinitesimal rotation. To be specific, we expand R(t) for a small change δt of t and ignore terms of order two and higher in δt. From Eq. (10.2), we see that an infinitesimal rotation is expressed in the form

$$\displaystyle \begin{aligned} \boldsymbol{I}+\boldsymbol{A}\delta t, \end{aligned} $$
(10.19)

for some antisymmetric matrix A, which we call the generator of the infinitesimal rotation. If we accumulate this infinitesimal rotation continuously, we obtain a finite rotation \(e^{t\boldsymbol{A}}\) as shown in the preceding section.

Note that any multiple of an infinitesimal rotation is also an infinitesimal rotation. This may sound counterintuitive, but this is the consequence of our defining infinitesimal rotations as “linear” variations of rotations. If the parameter t is regarded as time, multiplication of a generator by a constant c means multiplication of the instantaneous velocity by c.

We also see that the composition of infinitesimal rotations is also an infinitesimal rotation. In fact, if infinitesimal rotations I + A δt and I + A ′δt are composed, we obtain

$$\displaystyle \begin{aligned} (\boldsymbol{I}+\boldsymbol{A}\delta t)(\boldsymbol{I}+\boldsymbol{A}'\delta t)=\boldsymbol{I}+(\boldsymbol{A}+\boldsymbol{A}')\delta t+\boldsymbol{A}\boldsymbol{A}'\delta t^{2}=\boldsymbol{I}+(\boldsymbol{A}+\boldsymbol{A}')\delta t. \end{aligned} $$
(10.20)

Recall that terms of order two and higher in δt are always ignored. From this, we see that, unlike finite rotations, the composition of infinitesimal rotations is commutative, i.e., it does not depend on the order of composition; the generator of the composed infinitesimal rotation is the sum of their generators. If we identify an infinitesimal rotation with its generator, we see that the set of infinitesimal rotations constitutes a linear space.

A linear space is called an algebra if it is closed under some product operation. The set of all the generators of infinitesimal rotations can be regarded as an algebra if we define a product of generators A and B by

$$\displaystyle \begin{aligned}{}[\boldsymbol{A},\boldsymbol{B}]=\boldsymbol{A}\boldsymbol{B}-\boldsymbol{B}\boldsymbol{A}, \end{aligned} $$
(10.21)

called the commutator of A and B. By definition, this is anticommutative:

$$\displaystyle \begin{aligned}{}[\boldsymbol{A},\boldsymbol{B}]=-[\boldsymbol{B},\boldsymbol{A}]. \end{aligned} $$
(10.22)

The commutator is bilinear:

$$\displaystyle \begin{aligned}{}[c\boldsymbol{A}+c'\boldsymbol{A}',\boldsymbol{B}]=c[\boldsymbol{A},\boldsymbol{B}]+c'[\boldsymbol{A}',\boldsymbol{B}],\qquad [\boldsymbol{A},c\boldsymbol{B}+c'\boldsymbol{B}']=c[\boldsymbol{A},\boldsymbol{B}]+c'[\boldsymbol{A},\boldsymbol{B}'], \end{aligned} $$
(10.23)

and the following Jacobi identity holds:

$$\displaystyle \begin{aligned}{}[\boldsymbol{A},[\boldsymbol{B},\boldsymbol{C}]]+[\boldsymbol{B},[\boldsymbol{C},\boldsymbol{A}]]+[\boldsymbol{C},[\boldsymbol{A},\boldsymbol{B}]]=\boldsymbol{O}. \end{aligned} $$
(10.24)

An operation [ ⋅ , ⋅ ] which maps two elements to another element is called a Lie bracket if the identities of Eqs. (10.22), (10.23), and (10.24) hold. Evidently, the commutator of Eq. (10.21) defines a Lie bracket. An algebra equipped with a Lie bracket is called a Lie algebra.

Thus, the set of infinitesimal rotations is a Lie algebra under the commutator. Since the generator A is an antisymmetric matrix, it has three degrees of freedom. Hence, the Lie algebra of infinitesimal rotations is three-dimensional with the matrices A 1, A 2, and A 3 in Eq. (10.17) as its basis. It is easy to see that they satisfy

$$\displaystyle \begin{aligned}{}[\boldsymbol{A}_{2},\boldsymbol{A}_{3}]=\boldsymbol{A}_{1},\qquad [\boldsymbol{A}_{3},\boldsymbol{A}_{1}]=\boldsymbol{A}_{2},\qquad [\boldsymbol{A}_{1},\boldsymbol{A}_{2}]=\boldsymbol{A}_{3}. \end{aligned} $$
(10.25)

In terms of this basis, an arbitrary generator A is expressed in the form

$$\displaystyle \begin{aligned} \boldsymbol{A}=\omega_{1}\boldsymbol{A}_{1}+\omega_{2}\boldsymbol{A}_{2}+\omega_{3}\boldsymbol{A}_{3}, \end{aligned} $$
(10.26)

for some ω 1, ω 2, and ω 3. From the definition of A 1, A 2, and A 3 in Eq. (10.17), Eq. (10.26) is rewritten as

$$\displaystyle \begin{aligned} \boldsymbol{A}=\left(\begin{array}{ccc} 0 & -\omega_{3} & \omega_{2} \\ \omega_{3} & 0 & -\omega_{1} \\ -\omega_{2} & \omega_{1} & 0 \end{array}\right). \end{aligned} $$
(10.27)

This defines a 1-to-1 correspondence between a generator A and a vector ω = \(\Bigl (\omega _{i}\Bigr )\). Let ω′ = \(\Bigl (\omega _{i}^{\prime }\Bigr )\) be the vector that corresponds to the generator A′. Then, the commutator of A and A′ is

$$\displaystyle \begin{aligned} \begin{array}{rcl} [\boldsymbol{A},\boldsymbol{A}'] &=& \left(\begin{array}{ccc} 0 & -\omega_{3} & \omega_{2} \\ \omega_{3} & 0 & -\omega_{1} \\ -\omega_{2} & \omega_{1} & 0 \end{array}\right) \left(\begin{array}{ccc} 0 & -\omega_{3}^{\prime} & \omega_{2}^{\prime} \\ \omega_{3}^{\prime} & 0 & -\omega_{1}^{\prime} \\ -\omega_{2}^{\prime} & \omega_{1}^{\prime} & 0 \end{array}\right) \\ &&- \left(\begin{array}{ccc} 0 & -\omega_{3}^{\prime} & \omega_{2}^{\prime} \\ \omega_{3}^{\prime} & 0 & -\omega_{1}^{\prime} \\ -\omega_{2}^{\prime} & \omega_{1}^{\prime} & 0 \end{array}\right) \left(\begin{array}{ccc} 0 & -\omega_{3} & \omega_{2} \\ \omega_{3} & 0 & -\omega_{1} \\ -\omega_{2} & \omega_{1} & 0 \end{array}\right) \\ &=& \left(\begin{array}{ccc} 0 & -(\omega_{1}\omega_{2}^{\prime}-\omega_{2}\omega_{1}^{\prime}) & \omega_{3}\omega_{1}^{\prime}-\omega_{1} \omega_{3}^{\prime} \\ \omega_{1}\omega_{2}^{\prime}-\omega_{2}\omega_{1}^{\prime} & 0 & -(\omega_{2}\omega_{3}^{\prime}-\omega_{3}\omega_{2}^{\prime}) \\ -(\omega_{3}\omega_{1}^{\prime}-\omega_{1} \omega_{3}^{\prime}) & \omega_{2}\omega_{3}^{\prime}-\omega_{3}\omega_{2}^{\prime} & 0 \end{array}\right),\qquad \;\; \end{array} \end{aligned} $$
(10.28)

which shows that the vector product ω ×ω′ corresponds to the commutator [A, A′].

Evidently, all the relations of Eqs. (10.22), (10.23), and (10.24) hold if the commutator [A, B] is replaced by the vector product a ×b. In other words, the vector product is a Lie bracket, and the set of vectors is also a Lie algebra under the Lie bracket [a, b] = a ×b. As shown above, the Lie algebra of vectors is the same as or, to be precise, isomorphic to the Lie algebra of infinitesimal rotations. Indeed, the matrices A 1, A 2, and A 3 in Eq. (10.17) represent infinitesimal rotations around the x-, y-, and z-axes, respectively, and Eq. (10.25) corresponds to the relationships e 2 ×e 3 = e 1, e 3 ×e 1 = e 2, and e 1 ×e 2 = e 3 among the coordinate basis vectors e 1 = (1, 0, 0), e 2 = (0, 1, 0), and e 3 = (0, 0, 1). The argument in Sects. 10.2 and 10.3 implies that identifying the generator A of Eq. (10.27) with the vector ω = \(\Bigl (\omega _{i}\Bigr )\) is nothing but identifying an infinitesimal rotation with an instantaneous angular velocity vector. In other words, we can think of the Lie algebra of infinitesimal rotations as the set of all angular velocity vectors. For more general treatments of Lie algebras, see [11].
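
This isomorphism can also be checked numerically. The following small sketch verifies that the commutator of two generators corresponds to the vector product of the corresponding vectors, as in Eq. (10.28).

```python
import numpy as np

def skew(v):
    # generator A(v) of Eq. (10.27); skew(v) @ a equals np.cross(v, a)
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

rng = np.random.default_rng(0)
w, wp = rng.normal(size=3), rng.normal(size=3)

commutator = skew(w) @ skew(wp) - skew(wp) @ skew(w)    # [A, A'] of Eq. (10.21)
assert np.allclose(commutator, skew(np.cross(w, wp)))   # corresponds to w x w' (Eq. (10.28))
```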

5 Optimization of Rotation

Given a function J(R) of rotation R, we now consider how to minimize it, assuming that the minimum exists. In general, the solution can be obtained by differentiating J(R) with respect to R and finding the value of R for which the derivative vanishes. But how should we interpret differentiating with respect to R?

As is well known, the derivative of a function f(x) is the rate of change of the function value f(x) when the argument x is infinitesimally incremented to x + δx. By “infinitesimal increment,” we mean considering the “linear” variation, ignoring higher order terms in δx. In other words, if the function value changes to f(x + δx) = f(x) + aδx + …, we call the coefficient a of δx the differential coefficient, or the derivative, of f(x) with respect to x and write a = f′(x). This is equivalently written as a = limδx→0(f(x + δx) − f(x))∕δx. Evidently, if a function f(x) takes its minimum at x, the function value does not change by infinitesimally incrementing x; the resulting change is of a high order in the increment. This is the principle of how we can minimize (or maximize) a function by finding the zero of its derivative. Thus, in order to minimize J(R), we only need to find an R such that its infinitesimal variation does not change the value of J(R) except for high order terms.

This consideration implies that “differentiation” of J(R) with respect to R means evaluation of the rate of the change of J(R) when an infinitesimal rotation is added to R. If an infinitesimal rotation of Eq. (10.19) is added to R, we obtain

$$\displaystyle \begin{aligned} (\boldsymbol{I}+\boldsymbol{A}\delta t)\boldsymbol{R}=\boldsymbol{R}+\boldsymbol{A}\boldsymbol{R}\delta t. \end{aligned} $$
(10.29)

The generator A is represented by a vector ω via Eq. (10.27). In the following, we combine the vector ω and the parameter δt of infinitesimal variation as a single vector

$$\displaystyle \begin{aligned} \Delta\boldsymbol{\omega}=\boldsymbol{\omega}\delta t, \end{aligned} $$
(10.30)

which we call the small rotation vector, an infinitesimal version of the finite rotation vector Ω of Eq. (10.15). We also denote the antisymmetric matrix A corresponding to the vector ω = (ω 1, ω 2, ω 3) via Eq. (10.27) by A(ω). As shown in Eq. (10.6), the following identity holds for an arbitrary vector a:

$$\displaystyle \begin{aligned} \boldsymbol{A}(\boldsymbol{\omega})\boldsymbol{a}=\boldsymbol{\omega}\times\boldsymbol{a}. \end{aligned} $$
(10.31)

Using this notation, we can write Eq. (10.29) as R + A( Δω)R in terms of a small rotation vector Δω. We substitute this into J(R). If J(R + A( Δω)R) is written in the form

$$\displaystyle \begin{aligned} J(\boldsymbol{R}+\boldsymbol{A}(\Delta\boldsymbol{\omega})\boldsymbol{R})= J(\boldsymbol{R})+\langle\boldsymbol{g},\Delta\boldsymbol{\omega}\rangle, \end{aligned} $$
(10.32)

for some vector g by ignoring higher order terms in Δω (recall that 〈a, b〉 denotes the inner product of vectors a and b), we call g the gradient, or the first derivative, of J(R) with respect to R.

Since g should vanish at the R for which J(R) takes its minimum, we need to solve g = 0, but this is not easy in general. So, we perform an iterative search, starting from an initial value of R and successively modifying it so that J(R) decreases. Note that the value of the gradient g depends on R, i.e., g is a function of R. If, after substituting R + A( Δω)R for R in g(R), we can write

$$\displaystyle \begin{aligned} \boldsymbol{g}(\boldsymbol{R}+\boldsymbol{A}(\Delta\boldsymbol{\omega})\boldsymbol{R})=\boldsymbol{g}(\boldsymbol{R})+\boldsymbol{H}\Delta\boldsymbol{\omega}, \end{aligned} $$
(10.33)

for some symmetric matrix H by ignoring higher order terms in Δω, we call the matrix H the Hessian, or the second derivative, of J(R) with respect to R. If the gradient g and the Hessian H are given, the value of J(R + A( Δω)R) is approximated in the form

$$\displaystyle \begin{aligned} J(\boldsymbol{R}+\boldsymbol{A}(\Delta\boldsymbol{\omega})\boldsymbol{R})=J(\boldsymbol{R})+ \langle\boldsymbol{g},\Delta\boldsymbol{\omega}\rangle+\frac{1}{2}\langle\Delta\boldsymbol{\omega},\boldsymbol{H}\Delta\boldsymbol{\omega}\rangle \end{aligned} $$
(10.34)

by ignoring higher order terms in Δω.

Now, we regard the “current” R as a fixed constant and regard the above expression as a function of Δω. Since this is a quadratic polynomial in Δω, its derivative with respect to Δω is g + H Δω. Hence, this polynomial in Δω takes its minimum for

$$\displaystyle \begin{aligned} \Delta\boldsymbol{\omega}=-\boldsymbol{H}^{-1}\boldsymbol{g}. \end{aligned} $$
(10.35)

Namely, the rotation for which Eq. (10.34) takes its minimum is approximately (I + A( Δω))R for that Δω (recall that the current value R is regarded as a fixed constant). However, I + A( Δω) is not an exact rotation matrix, although the discrepancy is of higher order in δt. To make it an exact rotation matrix, we add higher order correction terms as an infinite series expansion in the form of Eq. (10.12). Thus, the rotation matrix for which Eq. (10.34) takes its minimum is approximated by \(e^{\boldsymbol{A}(\Delta\boldsymbol{\omega})}\boldsymbol{R}\). Regarding this as the “new” value of the current rotation, we repeat this process. The procedure is described as follows.

  1. 1.

    Provide an initial value for R.

  2. 2.

    Compute the gradient g and the Hessian H of J(R).

  3. 3.

    Solve the following linear equation in Δω:

    $$\displaystyle \begin{aligned} \boldsymbol{H}\Delta\boldsymbol{\omega}=-\boldsymbol{g}. \end{aligned} $$
    (10.36)
  4. 4.

    Update R in the form

    $$\displaystyle \begin{aligned} \boldsymbol{R}\leftarrow e^{\boldsymbol{A}(\Delta\boldsymbol{\omega})}\boldsymbol{R}. \end{aligned} $$
    (10.37)
  5. 5.

    If ∥ Δω∥ ≈ 0, return R and stop. Else, go back to Step 2.

This is nothing but the well-known Newton iterations. For Newton iterations, we approximate the objective function by a quadratic polynomial in the neighborhood of the current argument, proceed to the value that gives the minimum of that quadratic approximation, and repeat this. The difference of the above procedure from the usual Newton iterations is that we analyze the minimum of the quadratic approximation not in the space of the rotation R but in the Lie algebra of infinitesimal rotations. As we noted earlier, the space of R and its Lie algebra are not the same, having higher order discrepancies.

We can think of this situation as follows. Imagine the set of all rotations, defined by the “nonlinear” constraints R R⊤ = I and |R| = 1 (recall that |R| denotes the determinant), which is called the special orthogonal group of dimension 3, or the group of rotations for short, and denoted by SO(3). This is a “curved space” in the 9-dimensional space of the elements of R. The Lie algebra of infinitesimal rotations defined by the “linear” constraint A + A⊤ = O can be thought of as a “flat” tangent space to it at the current R, which we denote by T R(SO(3)), parameterized by ( Δω 1, Δω 2, Δω 3) with the origin (0, 0, 0) at R. We “project” a point in the Lie algebra T R(SO(3)) to a nearby point of SO(3) by the exponential mapping \(e^{\boldsymbol{A}(\Delta\boldsymbol{\omega})}\) of Eq. (10.12) (Fig. 10.2) (see, e.g., [11]). Hereafter, we call this scheme of optimization the Lie algebra method.

Fig. 10.2

The Lie algebra of infinitesimal rotations can be thought of as the tangent space T R(SO(3)) to the group of rotations SO(3) at R. The increment Δω in the Lie algebra is projected to the point \(e^{\boldsymbol{A}(\Delta\boldsymbol{\omega})}\boldsymbol{R}\) of SO(3)

Note that in actual computation, we need not compute the series expansion of Eq. (10.12) in Eq. (10.37). Let Δ Ω = ∥ Δω∥ and l = \(\mathcal { N}[\Delta \boldsymbol {\omega }]\), where \(\mathcal {N}[\boldsymbol {a}]\) denotes normalization to unit norm: \(\mathcal {N}[\boldsymbol {a}]\) ≡ a∕∥a∥. As mentioned in Sect. 10.3, we can write \(e^{\boldsymbol{A}(\Delta\boldsymbol{\omega})}\) = R l( Δ Ω), i.e., the rotation of angle Δ Ω around axis l, which can be computed using the Rodrigues formula of Eq. (10.14).

The convergence criterion ∥ Δω∥ ≈ 0 is judged using a predetermined threshold. If Δω is 0, Eq. (10.35) implies g = 0, so that the returned R is a local minimum of J(R). In general, iterative methods of this type are not necessarily guaranteed to converge when started from an arbitrary initial value (some methods are guaranteed, though). Hence, we need to start the iterations from a value close to the desired solution.
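
A schematic sketch of Steps 1–5 is shown below; grad and hess are assumed to be user-supplied functions returning the problem-specific g of Eq. (10.32) and H of Eq. (10.33), and skew and rodrigues are the helpers of the sketch in Sect. 10.3.

```python
import numpy as np

def exp_so3(dw):
    """e^{A(dw)} for a small rotation vector dw, computed via the Rodrigues formula (Eq. (10.14))."""
    theta = np.linalg.norm(dw)
    if theta < 1e-12:
        return np.eye(3) + skew(dw)        # first-order expansion near the identity
    return rodrigues(dw / theta, theta)

def minimize_over_rotation(grad, hess, R0, tol=1e-8, max_iter=100):
    """Newton iterations of Steps 1-5; grad(R) and hess(R) return Eqs. (10.32) and (10.33)."""
    R = np.asarray(R0, dtype=float)
    for _ in range(max_iter):
        g, H = grad(R), hess(R)
        dw = np.linalg.solve(H, -g)        # Eq. (10.36)
        R = exp_so3(dw) @ R                # Eq. (10.37)
        if np.linalg.norm(dw) < tol:       # Step 5
            break
    return R
```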

6 Rotation Estimation by Maximum Likelihood

Given two sets of 3D points x 1, …, x N and \(\boldsymbol {x}_{1}^{\prime }\), …, \(\boldsymbol {x}_{N}^{\prime }\) obtained by 3D sensing, we want to know the rigid (or Euclidean) motion between them (Fig. 10.3). A rigid motion consists of a translation t and a rotation R. Translation is easily computed by comparing the centroids of the N points before and after the motion:

$$\displaystyle \begin{aligned} \boldsymbol{x}_{C}=\frac{1}{N}\sum_{\alpha=1}^{N}\boldsymbol{x}_{\alpha},\qquad \boldsymbol{x}_{C}^{\prime}=\frac{1}{N}\sum_{\alpha=1}^{N}\boldsymbol{x}_{\alpha}^{\prime}. \end{aligned} $$
(10.38)

Let a α and \(\boldsymbol {a}_{\alpha }^{\prime }\) be the displacements of x α and \(\boldsymbol {x}_{\alpha }^{\prime }\) from their respective centroids:

$$\displaystyle \begin{aligned} \boldsymbol{a}_{\alpha}=\boldsymbol{x}_{\alpha}-\boldsymbol{x}_{C},\qquad \boldsymbol{a}_{\alpha}^{\prime}=\boldsymbol{x}_{\alpha}^{\prime}-\boldsymbol{x}_{C}^{\prime}. \end{aligned} $$
(10.39)

The translation is given by t = \(\boldsymbol {x}_{C}^{\prime }-\boldsymbol {x}_{C}\), and the rotation R is estimated so that \(\boldsymbol {a}_{\alpha }^{\prime }\) ≈ Ra α, α = 1, …, N, holds as accurately as possible. We formulate this problem as follows.

Fig. 10.3

Observing N points {x α} moving to {\( \boldsymbol {x}_{\alpha }^{\prime }\)}, we want to know their translation t and the rotation R

We regard the data vectors a α and \(\boldsymbol {a}_{\alpha }^{\prime }\) as displaced from their true values \(\bar {\boldsymbol {a}}_{\alpha }\) and \(\bar {\boldsymbol {a}}_{\alpha }^{\prime }\) by noise and write

$$\displaystyle \begin{aligned} \boldsymbol{a}_{\alpha}=\bar{\boldsymbol{a}}_{\alpha}+\Delta\boldsymbol{a}_{\alpha},\qquad \boldsymbol{a}_{\alpha}^{\prime}=\bar{\boldsymbol{a}}_{\alpha}^{\prime}+\Delta\boldsymbol{a}_{\alpha}^{\prime}. \end{aligned} $$
(10.40)

We view Δa α and \(\Delta \boldsymbol {a}_{\alpha }^{\prime }\) as independent Gaussian random variables with mean 0 and covariance matrices V [a α] and \(V[\boldsymbol {a}_{\alpha }^{\prime }]\), respectively. We write

$$\displaystyle \begin{aligned} V[\boldsymbol{a}_{\alpha}]=\sigma^{2}V_{0}[\boldsymbol{a}_{\alpha}],\qquad V[\boldsymbol{a}_{\alpha}^{\prime}]=\sigma^{2}V_{0}[\boldsymbol{a}_{\alpha}^{\prime}], \end{aligned} $$
(10.41)

and call V 0[a α] and \(V_{0}[\boldsymbol {a}_{\alpha }^{\prime }]\) the normalized covariance matrices and σ the noise level. The normalized covariance matrices describe the directional noise properties that reflect the characteristics of the 3D sensing, which we assume is known, while the noise level, which indicates the absolute noise magnitude, is unknown. Thus, the probability density of Δa α, \(\Delta \boldsymbol {a}_{\alpha }^{\prime }\), α = 1, .., N, is written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} p &\displaystyle =&\displaystyle \prod_{\alpha=1}^{N} \frac{e^{-\langle\Delta\boldsymbol{a}_{\alpha},V_{0}[\boldsymbol{a}_{\alpha}]^{-1}\Delta\boldsymbol{a}_{\alpha}\rangle/2\sigma^{2}}} {\sqrt{(2\pi)^{3}|V_{0}[\boldsymbol{a}_{\alpha}]|}\sigma^{3}} \frac{e^{-\langle\Delta\boldsymbol{a}_{\alpha}^{\prime},V_{0}[\boldsymbol{a}_{\alpha}^{\prime}]^{-1}\Delta\boldsymbol{a}_{\alpha}^{\prime}\rangle/2\sigma^{2}}} {\sqrt{(2\pi)^{3}|V_{0}[\boldsymbol{a}_{\alpha}^{\prime}]|}\sigma^{3}} \\ &\displaystyle =&\displaystyle \frac{e^{-\sum_{\alpha=1}^{N}( \langle\boldsymbol{a}_{\alpha}-\bar{\boldsymbol{a}}_{\alpha},V_{0}[\boldsymbol{a}_{\alpha}]^{-1}(\boldsymbol{a}_{\alpha}-\bar{\boldsymbol{a}}_{\alpha})\rangle+ \langle\boldsymbol{a}_{\alpha}^{\prime}-\bar{\boldsymbol{a}}_{\alpha}^{\prime},V_{0}[\boldsymbol{a}_{\alpha}^{\prime}]^{-1}(\boldsymbol{a}_{\alpha}^{\prime}-\bar{\boldsymbol{a}}_{\alpha}^{\prime})\rangle )/2\sigma^{2}}}{ \prod_{\alpha=1}^{N}(2\pi)^{3}\sqrt{|V_{0}[\boldsymbol{a}_{\alpha}]||V_{0}[\boldsymbol{a}_{\alpha}^{\prime}]|}\sigma^{6}}. \end{array} \end{aligned} $$
(10.42)

When regarded as a function of the observations a α, \(\boldsymbol {a}_{\alpha }^{\prime }\), α = 1, …, N, this expression is called their likelihood. Maximum likelihood estimation means computing the values \(\bar {\boldsymbol {a}}_{\alpha }\), \(\bar {\boldsymbol {a}}_{\alpha }^{\prime }\), α = 1, …, N, and R that maximize this subject to

$$\displaystyle \begin{aligned} \bar{\boldsymbol{a}}_{\alpha}^{\prime}=\boldsymbol{R}\bar{\boldsymbol{a}}_{\alpha},\qquad \alpha=1,\ldots,N. \end{aligned} $$
(10.43)

This is equivalent to minimizing

$$\displaystyle \begin{aligned} J=\frac{1}{2} \sum_{\alpha=1}^{N}( \langle\boldsymbol{a}_{\alpha}-\bar{\boldsymbol{a}}_{\alpha},V_{0}[\boldsymbol{a}_{\alpha}]^{-1}(\boldsymbol{a}_{\alpha}-\bar{\boldsymbol{a}}_{\alpha})\rangle+ \langle\boldsymbol{a}_{\alpha}^{\prime}-\bar{\boldsymbol{a}}_{\alpha}^{\prime},V_{0}[\boldsymbol{a}_{\alpha}^{\prime}]^{-1}(\boldsymbol{a}_{\alpha}^{\prime}-\bar{\boldsymbol{a}}_{\alpha}^{\prime})\rangle ), \end{aligned} $$
(10.44)

which is called the Mahalanobis distance, often called the reprojection error in the computer vision community, subject to Eq. (10.43). Introducing Lagrange multipliers for the constraint of Eq. (10.43) and eliminating \(\bar {\boldsymbol {a}}_{\alpha }\) and \(\bar {\boldsymbol {a}}_{\alpha }^{\prime }\), we can rewrite Eq. (10.44) in the form

$$\displaystyle \begin{aligned} J=\frac{1}{2}\sum_{\alpha=1}^{N}\langle\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha},\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle, \end{aligned} $$
(10.45)

where we put

$$\displaystyle \begin{aligned} \boldsymbol{V}_{\alpha}=\boldsymbol{R} V_{0}[\boldsymbol{a}_{\alpha}]\boldsymbol{R}^{\top}+V_{0}[\boldsymbol{a}_{\alpha}^{\prime}], \end{aligned} $$
(10.46)

and define the matrix W α by

$$\displaystyle \begin{aligned} \boldsymbol{W}_{\alpha}=\boldsymbol{V}_{\alpha}^{-1}. \end{aligned} $$
(10.47)

We see that for maximum likelihood estimation, we need not know the unknown noise level σ, i.e., it is sufficient to know the covariance matrices up to scale.

If the noise characteristics are the same for all the data, the distribution is said to be homogeneous, otherwise inhomogeneous. If the noise properties are the same in all directions, the distribution is said to be isotropic, otherwise anisotropic. When the noise distribution is homogeneous and isotropic, we can let \(V_{0}[\boldsymbol {a}_{\alpha }] = V_{0}[\boldsymbol {a}_{\alpha }^{\prime }] = \boldsymbol {I}\), which means V α = 2I and W α = I∕2. Hence, minimizing Eq. (10.45) is equivalent to minimizing \(\sum _{\alpha =1}^{N}\|\boldsymbol {a}_{\alpha }^{\prime }-\boldsymbol {R}\boldsymbol {a}_{\alpha }\|{ }^{2}\), which is known as least-squares estimation or the Procrustes problem. In this case, the solution can be obtained analytically. For nondegenerate data distributions, Arun et al. [1] showed that the solution is directly given using the singular value decomposition (SVD), and Kanatani [12] generalized it to include degenerate distributions. Horn [10] showed an alternative method, using the quaternion representation of rotations, which also works for degenerate distributions.
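
For the homogeneous and isotropic case, the SVD-based solution of Arun et al. [1] can be sketched as follows; this minimal version assumes a nondegenerate point distribution, and the determinant check guards against a reflection.

```python
import numpy as np

def procrustes_rotation(a, a_prime):
    """Least-squares rotation minimizing sum_alpha ||a'_alpha - R a_alpha||^2.
    a, a_prime: N x 3 arrays of the centroid-subtracted points a_alpha, a'_alpha."""
    M = a_prime.T @ a                       # correlation matrix sum_alpha a'_alpha a_alpha^T
    U, s, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))      # +1 gives a rotation, -1 would give a reflection
    return U @ np.diag([1.0, 1.0, d]) @ Vt
```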

However, the noise distribution of 3D sensing for computer vision applications is hardly homogeneous or isotropic. Today, various types of 3D sensor are available, including stereo vision and laser or ultrasonic emission, and they are used in such applications as manufacturing inspection, human body measurement, archeological measurement, camera autofocusing, and autonomous navigation [3, 20, 21]. Recently, an easy-to-use device called “Kinect” has become popular. For all such devices, the accuracy in the depth direction (e.g., the direction of the camera lens axis or of laser/ultrasonic emission) is different from that in the directions orthogonal to it. The covariance matrix of 3D sensing by stereo vision can be analytically evaluated from the camera configuration. Many 3D sensor manufacturers provide the covariance matrices of their devices. Here, we consider minimization of Eq. (10.45) for an inhomogeneous and anisotropic noise distribution with known (up to scale) covariance matrices.

This problem was first solved by Ohta and Kanatani [18] by combining the quaternion representation of rotations with a scheme of iterated eigenvalue computation called renormalization. Later, Kanatani and Matsunaga [15] solved the same problem by a method called extended FNS (Fundamental Numerical Scheme), which also iterates eigenvalue computation but can be applied not just to rotations but to all subgroups of affine transformations, including rigid motions and similarities. They used their scheme for land deformation analysis using GPS measurements. The GPS land surface measurement data and their covariance matrices are available on the websites of government agencies. Here, we show how the Lie algebra method works for minimizing Eq. (10.45).

Replacing R by R + A( Δω)R in Eq. (10.45), we see that the linear increment of J is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \Delta J&\displaystyle =&\displaystyle - \sum_{\alpha=1}^{N}\langle\boldsymbol{A}(\Delta\boldsymbol{\omega})\boldsymbol{R}\boldsymbol{a}_{\alpha},\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle \\ &\displaystyle &\displaystyle +\frac{1}{2}\sum_{\alpha=1}^{N}\langle\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha},\Delta\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle, \end{array} \end{aligned} $$
(10.48)

where we have noted that the right side of Eq. (10.45) is symmetric with respect to the two R’s in the expression so that we only need to consider the increment of one R and multiply the result by 2. Using the identity of Eq. (10.31), we can write the first term on the right side of Eq. (10.48) as

$$\displaystyle \begin{aligned} -\sum_{\alpha=1}^{N}\langle\Delta\boldsymbol{\omega}\times\boldsymbol{R}\boldsymbol{a}_{\alpha},\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle =-\langle\Delta\boldsymbol{\omega},\sum_{\alpha=1}^{N} (\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle, \end{aligned} $$
(10.49)

where we have used the identity 〈a ×b, c〉 = 〈a, b ×c〉. For evaluating ΔW α in the second term on the right side of Eq. (10.48), we rewrite Eq. (10.47) as W α V α = I, from which we obtain ΔW α V α + W α ΔV α = O. Using Eq. (10.47) again, we can write ΔW α as

$$\displaystyle \begin{aligned} \Delta\boldsymbol{W}_{\alpha}=-\boldsymbol{W}_{\alpha}\Delta\boldsymbol{V}_{\alpha}\boldsymbol{W}_{\alpha}. \end{aligned} $$
(10.50)

From Eq. (10.46), we obtain

$$\displaystyle \begin{aligned} \Delta\boldsymbol{W}_{\alpha}=-\boldsymbol{W}_{\alpha}( \boldsymbol{A}(\Delta\boldsymbol{\omega})\boldsymbol{R} V_{0}[\boldsymbol{a}_{\alpha}]\boldsymbol{R}^{\top}+ \boldsymbol{R} V_{0}[\boldsymbol{a}_{\alpha}](\boldsymbol{A}(\Delta\boldsymbol{\omega})\boldsymbol{R})^{\top} )\boldsymbol{W}_{\alpha}, \end{aligned} $$
(10.51)

which we substitute into the second term on the right side of Eq. (10.48). Note that the two terms on the right side of Eq. (10.51) are transposes of each other and that the second term on the right side of Eq. (10.48) is a quadratic form in \(\boldsymbol {a}_{\alpha }^{\prime }-\boldsymbol {R}\boldsymbol {a}_{\alpha }\). Hence, we only need to consider one term of Eq. (10.51) and multiply the result by 2. Then, the second term on the right side of Eq. (10.48) is written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle -\sum_{\alpha=1}^{N}\langle\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha}, \boldsymbol{W}_{\alpha}\boldsymbol{A}(\Delta\boldsymbol{\omega})\boldsymbol{R} V_{0}[\boldsymbol{a}_{\alpha}]\boldsymbol{R}^{\top}\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle \\ &\displaystyle =&\displaystyle -\sum_{\alpha=1}^{N}\langle\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha}), \Delta\boldsymbol{\omega}\times\boldsymbol{R} V_{0}[\boldsymbol{a}_{\alpha}]\boldsymbol{R}^{\top}\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle \\ &\displaystyle =&\displaystyle \sum_{\alpha=1}^{N}\langle\Delta\boldsymbol{\omega},(\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha}))\times \boldsymbol{R} V_{0}[\boldsymbol{a}_{\alpha}]\boldsymbol{R}^{\top}\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle. \end{array} \end{aligned} $$
(10.52)

Combining this with Eq. (10.49), we can write Eq. (10.48) as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta J &\displaystyle =&\displaystyle -\sum_{\alpha=1}^{N}\langle\Delta\boldsymbol{\omega}, (\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha}) \\ &\displaystyle &\displaystyle -(\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha}))\times \boldsymbol{R} V_{0}[\boldsymbol{a}_{\alpha}]\boldsymbol{R}^{\top}\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\rangle. \end{array} \end{aligned} $$
(10.53)

Hence, from Eq. (10.32), the gradient of the function J(R) of Eq. (10.45) is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} \boldsymbol{g}&\displaystyle =&\displaystyle -\sum_{\alpha=1}^{N}\left((\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})- (\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha}))\right. \\ &\displaystyle &\displaystyle \left.\qquad \quad \times \boldsymbol{R} V_{0}[\boldsymbol{a}_{\alpha}]\boldsymbol{R}^{\top}\boldsymbol{W}_{\alpha}(\boldsymbol{a}_{\alpha}^{\prime}-\boldsymbol{R}\boldsymbol{a}_{\alpha})\right). \end{array} \end{aligned} $$
(10.54)

Next, we consider the linear increment resulting from replacing R by R + A( Δω)R in this equation. Since we are computing an R such that \(\boldsymbol {a}_{\alpha }^{\prime }-\boldsymbol {R}\boldsymbol {a}_{\alpha }\) ≈ 0, we can ignore the increment of the first R in the first term on the right side of Eq. (10.54), assuming that \(\boldsymbol {a}_{\alpha }^{\prime }-\boldsymbol {R}\boldsymbol {a}_{\alpha }\) ≈ 0 as the iterations proceed. The second term is quadratic in \(\boldsymbol {a}_{\alpha }^{\prime }-\boldsymbol {R}\boldsymbol {a}_{\alpha }\), so we can ignore it. Only considering the increment of the second R in the first term, we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta\boldsymbol{g} &\displaystyle =&\displaystyle \sum_{\alpha=1}^{N} (\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}(\boldsymbol{A}(\Delta\boldsymbol{\omega})\boldsymbol{R}\boldsymbol{a}_{\alpha})= \sum_{\alpha=1}^{N} (\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}(\Delta\boldsymbol{\omega}\times(\boldsymbol{R}\boldsymbol{a}_{\alpha})) \\ &\displaystyle =&\displaystyle -\sum_{\alpha=1}^{N} (\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}((\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\Delta\boldsymbol{\omega}). {} \end{array} \end{aligned} $$
(10.55)

Now, we introduce new notations. For a vector ω and a matrix T, we define

(10.56)

The last one is the combination of the first two; whichever × we evaluate first, we obtain the same result. From Eq. (10.31), it is easily seen that ω ×T is “the matrix whose columns are the vector products of ω and the three columns of T” and that T ×ω is “the matrix whose rows are the vector products of the three rows of T and ω” (see [13, 16] for more about this notation). Using this notation and Eq. (10.31), we can write Eq. (10.55) as

$$\displaystyle \begin{aligned} \Delta\boldsymbol{g}= -\sum_{\alpha=1}^{N} (\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}\boldsymbol{A}(\boldsymbol{R}\boldsymbol{a}_{\alpha})\Delta\boldsymbol{\omega}= \sum_{\alpha=1}^{N} (\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}\times(\boldsymbol{R}\boldsymbol{a}_{\alpha})\Delta\boldsymbol{\omega}, \end{aligned} $$
(10.57)

where we have noted that A(ω) is antisymmetric: A(ω)⊤ = −A(ω). Comparing this and Eq. (10.33), we obtain the Hessian in the form

$$\displaystyle \begin{aligned} \boldsymbol{H}=\sum_{\alpha=1}^{N}(\boldsymbol{R}\boldsymbol{a}_{\alpha})\times\boldsymbol{W}_{\alpha}\times(\boldsymbol{R}\boldsymbol{a}_{\alpha}). \end{aligned} $$
(10.58)

Now that the gradient g and the Hessian H are given by Eqs. (10.54) and (10.58), we can minimize J(R) by Newton iterations as described in the preceding section.

However, we have approximated the Hessian H by setting to zero, in the course of the computation, some of the quantities that the minimization drives to zero. This convention is called the Gauss–Newton approximation, and the Newton iterations using the Gauss–Newton approximation are called Gauss–Newton iterations. From Eq. (10.35), we see that if Δω is 0 at the time of convergence, g = 0 holds irrespective of the value of H, returning an exact solution. In other words, as long as the gradient g is correctly computed, the Hessian H need not be exact. However, the value of H affects the speed of convergence.
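
A direct NumPy transcription of Eqs. (10.54) and (10.58) might look as follows. This is a sketch under the assumption that the product (Ra α) ×W α× (Ra α) of Eq. (10.58) expands to A(Ra α)W α A(Ra α)⊤, which follows from Eqs. (10.55)–(10.57); skew is the helper of the sketch in Sect. 10.3.

```python
import numpy as np

def grad_hess_rotation(R, a, ap, V0, V0p):
    """Gradient g of Eq. (10.54) and Gauss-Newton Hessian H of Eq. (10.58).
    a, ap: N x 3 arrays of a_alpha, a'_alpha; V0, V0p: N x 3 x 3 normalized covariances."""
    g = np.zeros(3)
    H = np.zeros((3, 3))
    for aa, aap, V, Vp in zip(a, ap, V0, V0p):
        Ra = R @ aa
        W = np.linalg.inv(R @ V @ R.T + Vp)              # Eqs. (10.46) and (10.47)
        e = aap - Ra                                     # residual a'_alpha - R a_alpha
        g += -np.cross(Ra, W @ e) + np.cross(W @ e, R @ V @ R.T @ W @ e)   # Eq. (10.54)
        H += skew(Ra) @ W @ skew(Ra).T                   # Eq. (10.58)
    return g, H
```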

If the Hessian H is not appropriate, we may overstep the minimum of J(R) so that the value of J(R) increases, or we may proceed too slowly to reduce J(R) meaningfully. A well-known measure to cope with this is to add to H a multiple of the identity matrix I and adjust the constant c of H + c I. To be specific, we decrease c as long as J(R) decreases and increase c if J(R) increases. This modification is known as the Levenberg–Marquardt method. The procedure is written as follows (see, e.g., [19]).

  1. 1.

    Initialize R, and let c = 0.0001.

  2. 2.

    Compute the gradient g and the (Gauss–Newton approximated) Hessian H of J(R).

  3. 3.

    Solve the following linear equation in Δω:

    $$\displaystyle \begin{aligned} (\boldsymbol{H}+c\boldsymbol{I})\Delta\boldsymbol{\omega}=-\boldsymbol{g}. \end{aligned} $$
    (10.59)
  4. 4.

    Tentatively update R to

    $$\displaystyle \begin{aligned} \tilde{\boldsymbol{R}}=e^{\boldsymbol{A}(\Delta\boldsymbol{\omega})}\boldsymbol{R}. \end{aligned} $$
    (10.60)
  5. 5.

    If \(J(\tilde {\boldsymbol {R}})\) < J(R) or \(J(\tilde {\boldsymbol {R}})\) ≈ J(R) is not satisfied, let c ← 10c and go back to Step 3.

  6. 6.

    If ∥ Δω∥ ≈ 0, return \(\tilde {\boldsymbol {R}}\) and stop. Else, update \(\boldsymbol {R} \leftarrow \tilde {\boldsymbol {R}}, c \leftarrow c/10\) and go back to Step 2.

If we let c = 0, this reduces to Gauss–Newton iterations. In Steps 1, 5, and 6, the values 0.0001, 10c, and c∕10 are all empirical. To start the iterations, we need appropriate initial values, for which we can use the analytical homogeneous and isotropic noise solution [1, 10, 12]. The initial solution is sufficiently accurate in most practical applications, so the above Levenberg-Marquardt iterations usually converge after a few iterations.
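
Putting the pieces together, the Levenberg–Marquardt procedure of Steps 1–6 can be sketched as follows, reusing exp_so3 and grad_hess_rotation from the earlier sketches; J_value evaluates Eq. (10.45).

```python
import numpy as np

def J_value(R, a, ap, V0, V0p):
    """Mahalanobis residual J(R) of Eq. (10.45)."""
    J = 0.0
    for aa, aap, V, Vp in zip(a, ap, V0, V0p):
        e = aap - R @ aa
        W = np.linalg.inv(R @ V @ R.T + Vp)
        J += 0.5 * e @ W @ e
    return J

def estimate_rotation_lm(a, ap, V0, V0p, R0, tol=1e-8, max_iter=100):
    """Levenberg-Marquardt iterations of Steps 1-6 in Sect. 10.6."""
    R, c = np.asarray(R0, dtype=float), 1e-4
    J = J_value(R, a, ap, V0, V0p)
    for _ in range(max_iter):
        g, H = grad_hess_rotation(R, a, ap, V0, V0p)
        while True:
            dw = np.linalg.solve(H + c * np.eye(3), -g)   # Eq. (10.59)
            R_new = exp_so3(dw) @ R                       # Eq. (10.60)
            J_new = J_value(R_new, a, ap, V0, V0p)
            if J_new <= J:                                # Step 5: accept, otherwise inflate c
                break
            c *= 10.0
        R, J, c = R_new, J_new, c / 10.0
        if np.linalg.norm(dw) < tol:                      # Step 6
            return R
    return R
```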

7 Fundamental Matrix Computation

Consider two images of the scene taken by two cameras. Suppose a point in the scene is imaged at (x, y) in the first camera image and at (x′, y′) in the second camera image. From the geometry of perspective imaging, the following epipolar equation holds [9]:

$$\displaystyle \begin{aligned} \left\langle\left(\begin{array}{c} x/f_{0} \\ y/f_{0} \\ 1 \end{array}\right),\boldsymbol{F}\left(\begin{array}{c} x'/f_{0} \\ y'/f_{0} \\ 1 \end{array}\right)\right\rangle=0, \end{aligned} $$
(10.61)

where f 0 is an arbitrary scale constant; theoretically, we could set it to one, but it is better to let it have the order of magnitude of x and y for the numerical stability of finite-length computation [8]. The matrix F is called the fundamental matrix and is determined by the relative configuration of the two cameras and their internal parameters such as the focal length.

Computing the fundamental matrix F from point correspondences (x α, y α) and \((x_{\alpha }^{\prime },y_{\alpha }^{\prime })\), α = 1, …, N, is one of the most fundamental steps of computer vision (Fig. 10.4). From the computed F, we can reconstruct the 3D structure of the scene (see, e.g., [9, 16]). The basic principle of its computation is minimizing the following function:

$$\displaystyle \begin{aligned} J(\boldsymbol{F})= \frac{f_{0}^{2}}{2}\sum_{\alpha=1}^{N}\frac{\langle\boldsymbol{x}_{\alpha},\boldsymbol{F}\boldsymbol{x}_{\alpha}^{\prime}\rangle^{2}} {\|\boldsymbol{P}_{\boldsymbol{k}}\boldsymbol{F}\boldsymbol{x}_{\alpha}^{\prime}\|{}^{2}+\|\boldsymbol{P}_{\boldsymbol{k}}\boldsymbol{F}^{\top}\boldsymbol{x}_{\alpha}\|{}^{2}}, \end{aligned} $$
(10.62)

where we define

$$\displaystyle \begin{aligned} \boldsymbol{x}_{\alpha}=\left(\begin{array}{c} x_{\alpha}/f_{0} \\ y_{\alpha}/f_{0} \\ 1 \end{array}\right),\qquad \boldsymbol{x}_{\alpha}^{\prime}=\left(\begin{array}{c} x_{\alpha}^{\prime}/f_{0} \\ y_{\alpha}^{\prime}/f_{0} \\ 1 \end{array}\right),\qquad \boldsymbol{P}_{\boldsymbol{k}}=\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{array}\right). \end{aligned} $$
(10.63)

By minimizing Eq. (10.62), we can obtain a maximum likelihood solution to a high accuracy, assuming that the noise terms Δx α, Δy α, \(\Delta x_{\alpha }^{\prime }\), and \(\Delta y_{\alpha }^{\prime }\) in the coordinates (x α, y α) and \((x_{\alpha }^{\prime },y_{\alpha }^{\prime })\) are Gaussian variables of mean 0 with a constant variance. The function J(F) of Eq. (10.62) is called the Sampson error [9, 16].
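
For reference, the Sampson error of Eq. (10.62) can be evaluated in a vectorized way as in the following sketch; the vectors x α, x′ α and the matrix P k are those of Eq. (10.63), and the value f0 = 600 is only an illustrative choice of the scale constant.

```python
import numpy as np

def sampson_error(F, x, y, xp, yp, f0=600.0):
    """Sampson error J(F) of Eq. (10.62); x, y, xp, yp are arrays of N matched coordinates."""
    X = np.stack([x / f0, y / f0, np.ones_like(x)], axis=1)      # x_alpha of Eq. (10.63)
    Xp = np.stack([xp / f0, yp / f0, np.ones_like(xp)], axis=1)  # x'_alpha of Eq. (10.63)
    Pk = np.diag([1.0, 1.0, 0.0])
    num = np.einsum('ni,ij,nj->n', X, F, Xp) ** 2                # <x_alpha, F x'_alpha>^2
    den = (np.linalg.norm(Xp @ F.T @ Pk, axis=1) ** 2            # ||Pk F x'_alpha||^2
           + np.linalg.norm(X @ F @ Pk, axis=1) ** 2)            # ||Pk F^T x_alpha||^2
    return 0.5 * f0 ** 2 * np.sum(num / den)
```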

Fig. 10.4

We compute the fundamental matrix F from point correspondences of two images

Evidently, the fundamental matrix F has scale indeterminacy: Eqs. (10.61) and (10.62) are unchanged if F is multiplied by an arbitrary nonzero constant. We normalize it to ∥F∥² (≡ \(\sum _{i,j=1}^{3}F_{ij}^{2}\)) = 1. Besides, there is an important requirement, called the rank constraint ([9, 16]): F must have rank 2. Many strategies have been proposed to impose this constraint (see [16]), but the most straightforward one is to express F via the SVD in the form

$$\displaystyle \begin{aligned} \boldsymbol{F}=\boldsymbol{U}\left(\begin{array}{ccc} \sigma_{1} & 0 & 0 \\ 0 & \sigma_{2} & 0 \\ 0 & 0 & 0 \end{array}\right)\boldsymbol{V}^{\top}, \end{aligned} $$
(10.64)

where U and V are orthogonal matrices, and σ 1 ≥ σ 2 (> 0) are the singular values; letting the third singular value σ 3 be 0 is the rank constraint. From the normalization ∥F∥² = 1, we have \(\sigma _{1}^{2}+\sigma _{2}^{2}\) = 1, so we can let

$$\displaystyle \begin{aligned} \sigma_{1}=\cos\phi,\qquad \sigma_{2}=\sin\phi. \end{aligned} $$
(10.65)

Substituting Eq. (10.64) into Eq. (10.62), we minimize J(F) with respect to U, V , and ϕ. This parameterization was first proposed by Bartoli and Sturm [2], to which Sugaya and Kanatani [25] applied the Lie algebra method.

Note that U and V are orthogonal matrices; they may not represent rotations depending on the sign of the determinant. However, a small variation of an orthogonal matrix is a small rotation. Hence, we can express the small variations of U and V in the form

$$\displaystyle \begin{aligned} \Delta\boldsymbol{U}=\boldsymbol{A}(\Delta\boldsymbol{\omega}_{U})\boldsymbol{U},\qquad \Delta\boldsymbol{V}=\boldsymbol{A}(\Delta\boldsymbol{\omega}_{V})\boldsymbol{V}, \end{aligned} $$
(10.66)

in terms of small rotation vectors Δω U = \(\Bigl (\Delta \omega _{iU}\Bigr )\) and Δω V = \(\Bigl (\Delta \omega _{iV}\Bigr )\). Incrementing U, V , and ϕ to U +  ΔU, V +  ΔV , and ϕ +  Δϕ in Eq. (10.64), we can write the linear increment of F, ignoring higher order terms, in the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta\boldsymbol{F} &\displaystyle =&\displaystyle \boldsymbol{A}(\Delta\boldsymbol{\omega}_{U})\boldsymbol{U}\mathrm{diag}(\cos\phi,\sin\phi,0)\boldsymbol{V}^{\top}+ \boldsymbol{U}\mathrm{diag}(\cos\phi,\sin\phi,0)(\boldsymbol{A}(\Delta\boldsymbol{\omega}_{V})\boldsymbol{V})^{\top} \\ &\displaystyle &\displaystyle +\boldsymbol{U}\mathrm{diag}(-\sin\phi,\cos\phi,0)\boldsymbol{V}^{\top}\Delta\phi. \end{array} \end{aligned} $$
(10.67)

Taking out individual elements, we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta F_{11} &\displaystyle =&\displaystyle \Delta\omega_{2U}F_{31}-\Delta\omega_{3U}F_{21}+ \Delta\omega_{2V}F_{13}-\Delta\omega_{3V}F_{12} \\ &\displaystyle &\displaystyle + (U_{12}V_{12}\cos\phi-U_{11}V_{11}\sin\phi)\Delta\phi, \\ \Delta F_{12} &\displaystyle =&\displaystyle \Delta\omega_{2U}F_{32}-\Delta\omega_{3U}F_{22}+ \Delta\omega_{3V}F_{11}-\Delta\omega_{1V}F_{13} \\ &\displaystyle &\displaystyle + (U_{12}V_{22}\cos\phi-U_{11}V_{21}\sin\phi)\Delta\phi, \\ &\displaystyle &\displaystyle \vdots \\ \Delta F_{33} &\displaystyle =&\displaystyle \Delta\omega_{1U}F_{23}-\Delta\omega_{2U}F_{13}+ \Delta\omega_{1V}F_{32}-\Delta\omega_{2V}F_{31} \\ &\displaystyle &\displaystyle + (U_{32}V_{32}\cos\phi-U_{31}V_{31}\sin\phi)\Delta\phi. \end{array} \end{aligned} $$
(10.68)

We identify ΔF with a 9-dimensional vector consisting of components ΔF 11, ΔF 12, …, ΔF 33 and write

$$\displaystyle \begin{aligned} \Delta\boldsymbol{F}=\boldsymbol{F}_{U}\Delta\boldsymbol{\omega}_{U}+\boldsymbol{F}_{V}\Delta\boldsymbol{\omega}_{V}+\boldsymbol{\theta}_{\phi}\Delta\phi, \end{aligned} $$
(10.69)

where we define the 9 × 3 matrices F U and F V and the 9-dimensional vector θ ϕ by

$$\displaystyle \begin{aligned} \boldsymbol{F}_{U}=\left(\begin{array}{ccc} 0 & F_{31} & -F_{21} \\ 0 & F_{32} & -F_{22} \\ 0 & F_{33} & -F_{23} \\ -F_{31} & 0 & F_{11} \\ -F_{32} & 0 & F_{12} \\ -F_{33} & 0 & F_{13} \\ F_{21} & -F_{11} & 0 \\ F_{22} & -F_{12} & 0 \\ F_{23} & -F_{13} & 0 \end{array}\right),\qquad \boldsymbol{F}_{V}=\left(\begin{array}{ccc} 0 & F_{13} & -F_{12} \\ -F_{13} & 0 & F_{11} \\ F_{12} & -F_{11} & 0 \\ 0 & F_{23} & -F_{22} \\ -F_{23} & 0 & F_{21} \\ F_{22} & -F_{21} & 0 \\ 0 & F_{33} & -F_{32} \\ -F_{33} & 0 & F_{31} \\ F_{32} & -F_{31} & 0 \end{array}\right), \end{aligned} $$
(10.70)
$$\displaystyle \begin{aligned} \boldsymbol{\theta}_{\phi}=\left(\begin{array}{c} \sigma_{1}U_{12}V_{12}-\sigma_{2}U_{11}V_{11} \\ \sigma_{1}U_{12}V_{22}-\sigma_{2}U_{11}V_{21} \\ \sigma_{1}U_{12}V_{32}-\sigma_{2}U_{11}V_{31} \\ \sigma_{1}U_{22}V_{12}-\sigma_{2}U_{21}V_{11} \\ \sigma_{1}U_{22}V_{22}-\sigma_{2}U_{21}V_{21} \\ \sigma_{1}U_{22}V_{32}-\sigma_{2}U_{21}V_{31} \\ \sigma_{1}U_{32}V_{12}-\sigma_{2}U_{31}V_{11} \\ \sigma_{1}U_{32}V_{22}-\sigma_{2}U_{31}V_{21} \\ \sigma_{1}U_{32}V_{32}-\sigma_{2}U_{31}V_{31} \end{array}\right). \end{aligned} $$
(10.71)
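
Rather than typing out Eqs. (10.70) and (10.71) entry by entry, F U, F V, and θ ϕ can be assembled by vectorizing the three terms of Eq. (10.67), as in the following sketch (row-major ordering ΔF 11, ΔF 12, …, ΔF 33; skew is the helper of the sketch in Sect. 10.3).

```python
import numpy as np

def fu_fv_theta(U, V, phi):
    """9x3 matrices F_U, F_V and 9-vector theta_phi of Eqs. (10.70) and (10.71)."""
    F = U @ np.diag([np.cos(phi), np.sin(phi), 0.0]) @ V.T          # Eqs. (10.64), (10.65)
    basis = [skew(e) for e in np.eye(3)]                            # generators A_1, A_2, A_3
    FU = np.column_stack([(A @ F).ravel() for A in basis])          # from A(dw_U) F
    FV = np.column_stack([(F @ A.T).ravel() for A in basis])        # from F A(dw_V)^T
    theta = (U @ np.diag([-np.sin(phi), np.cos(phi), 0.0]) @ V.T).ravel()
    return FU, FV, theta
```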

Then, the linear increment ΔJ of the function J(F) of Eq. (10.62) is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta J &\displaystyle =&\displaystyle \langle\nabla_{\boldsymbol{F}}J,\Delta\boldsymbol{F}\rangle= \langle\nabla_{\boldsymbol{F}}J,\boldsymbol{F}_{U}\Delta\boldsymbol{\omega}_{U}\rangle+ \langle\nabla_{\boldsymbol{F}}J,\boldsymbol{F}_{V}\Delta\boldsymbol{\omega}_{V}\rangle+ \langle\nabla_{\boldsymbol{F}}J,\boldsymbol{\theta}_{\phi}\Delta\phi\rangle \\ &\displaystyle =&\displaystyle \langle \boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}J,\Delta\boldsymbol{\omega}_{U}\rangle+ \langle \boldsymbol{F}_{V}^{\top}\nabla_{\boldsymbol{F}}J,\Delta\boldsymbol{\omega}_{V}\rangle+ \langle\nabla_{\boldsymbol{F}}J,\boldsymbol{\theta}_{\phi}\rangle\Delta\phi, {} \end{array} \end{aligned} $$
(10.72)

where ∇F J is the 9-dimensional vector consisting of the components ∂J∕∂F ij. From this, we obtain the gradients of J with respect to ω U, ω V, and ϕ as follows:

$$\displaystyle \begin{aligned} \nabla_{\boldsymbol{\omega}_{U}}J=\boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}J,\qquad \nabla_{\boldsymbol{\omega}_{V}}J=\boldsymbol{F}_{V}^{\top}\nabla_{\boldsymbol{F}}J,\qquad \frac{\partial J}{\partial\phi}=\langle\nabla_{\boldsymbol{F}}J,\boldsymbol{\theta}_{\phi}\rangle. \end{aligned} $$
(10.73)

Next, consider the second derivatives 2 J∂F ij ∂F kl of Eq. (10.62). We adopt the Gauss–Newton approximation of ignoring terms containing \(\langle \boldsymbol {x}_{\alpha },\boldsymbol {F}\boldsymbol {x}_{\alpha }^{\prime }\rangle \), i.e., the left side of the epipolar equation of Eq. (10.61). It follows that we need not consider terms containing \(\langle \boldsymbol {x}_{\alpha },\boldsymbol {F}\boldsymbol {x}_{\alpha }^{\prime }\rangle ^{2}\) in the first derivative, i.e., we need not differentiate the denominator in Eq. (10.62). Hence, the first derivative is approximated to be

$$\displaystyle \begin{aligned} \frac{\partial J}{\partial F_{ij}}\approx \sum_{\alpha=1}^{N}\frac{f_{0}^{2}x_{i\alpha}x_{j\alpha}^{\prime}\langle\boldsymbol{x}_{\alpha},\boldsymbol{F}\boldsymbol{x}_{\alpha}^{\prime}\rangle} {\|\boldsymbol{P}_{\boldsymbol{k}}\boldsymbol{F}\boldsymbol{x}_{\alpha}^{\prime}\|{}^{2}+\|\boldsymbol{P}_{\boldsymbol{k}}\boldsymbol{F}^{\top}\boldsymbol{x}_{\alpha}\|{}^{2}}, \end{aligned} $$
(10.74)

where x iα and \(x_{j\alpha }^{\prime }\) denote the ith component of x α and the jth component of \(\boldsymbol {x}_{\alpha }^{\prime }\), respectively. For differentiating this with respect to F kl, we need not differentiate the denominator because the numerator contains \(\langle \boldsymbol {x}_{\alpha },\boldsymbol {F}\boldsymbol {x}_{\alpha }^{\prime }\rangle \). Differentiating only the numerator, we obtain

$$\displaystyle \begin{aligned} \frac{\partial^{2}J}{\partial F_{ij}\partial F_{kl}}\approx \sum_{\alpha=1}^{N}\frac{f_{0}^{2}x_{i\alpha}x_{j\alpha}^{\prime}x_{k\alpha}x_{l\alpha}^{\prime}} {\|\boldsymbol{P}_{\boldsymbol{k}}\boldsymbol{F}\boldsymbol{x}_{\alpha}^{\prime}\|{}^{2}+\|\boldsymbol{P}_{\boldsymbol{k}}\boldsymbol{F}^{\top}\boldsymbol{x}_{\alpha}\|{}^{2}}. \end{aligned} $$
(10.75)

Let us label the pairs of indices (i, j) = (1,1), (1,2), …, (3,3) by a single running index I = 1, …, 9. Similarly, we use a single running index J = 1, …, 9 for the pairs (k, l) and regard the right side of the above equation as the (I, J) element of a 9 × 9 matrix, which we write as \(\nabla _{\boldsymbol {F}}^{2}J\). Then, as in Eq. (10.72), we can write, using Eq. (10.69), the second derivative of J with respect to U, V , and ϕ in the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta^{2}J &\displaystyle =&\displaystyle \langle\Delta\boldsymbol{F},\nabla_{\boldsymbol{F}}^{2}J\Delta\boldsymbol{F}\rangle \\ &\displaystyle =&\displaystyle \langle\boldsymbol{F}_{U}\Delta\boldsymbol{\omega}_{U}+\boldsymbol{F}_{V}\Delta\boldsymbol{\omega}_{V}+\boldsymbol{\theta}_{\phi}\Delta\phi,\nabla_{\boldsymbol{F}}^{2}J( \boldsymbol{F}_{U}\Delta\boldsymbol{\omega}_{U}+\boldsymbol{F}_{V}\Delta\boldsymbol{\omega}_{V}+\boldsymbol{\theta}_{\phi}\Delta\phi)\rangle \\ &\displaystyle =&\displaystyle \langle\Delta\boldsymbol{\omega}_{U},\boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{F}_{U}\Delta\boldsymbol{\omega}_{U}\rangle+ \langle\Delta\boldsymbol{\omega}_{U},\boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{F}_{V}\Delta\boldsymbol{\omega}_{V}\rangle \\ &\displaystyle &\displaystyle + \langle\Delta\boldsymbol{\omega}_{V},\boldsymbol{F}_{V}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{F}_{U}\Delta\boldsymbol{\omega}_{U}\rangle + \langle\Delta\boldsymbol{\omega}_{V},\boldsymbol{F}_{V}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{F}_{V}\Delta\boldsymbol{\omega}_{V}\rangle \\ &\displaystyle &\displaystyle + \langle\Delta\boldsymbol{\omega}_{U},\boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{\theta}_{\phi}\rangle\Delta\phi + \langle\Delta\boldsymbol{\omega}_{V},\boldsymbol{F}_{V}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{\theta}_{\phi}\rangle\Delta\phi \\ &\displaystyle &\displaystyle + \langle\Delta\boldsymbol{\omega}_{U},\boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{\theta}_{\phi}\rangle\Delta\phi+ \langle\Delta\boldsymbol{\omega}_{V},\boldsymbol{F}_{V}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{\theta}_{\phi}\rangle\Delta\phi \\ &\displaystyle &\displaystyle + \langle\boldsymbol{\theta}_{\phi},\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{\theta}_{\phi}\rangle\Delta\phi^{2}, \end{array} \end{aligned} $$
(10.76)

from which we obtain the following second derivatives of J:

$$\displaystyle \begin{aligned} \begin{array}{c} \nabla_{\boldsymbol{\omega}_{U}\boldsymbol{\omega}_{U}}J=\boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{F}_{U},\qquad \nabla_{\boldsymbol{\omega}_{V}\boldsymbol{\omega}_{V}}J=\boldsymbol{F}_{V}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{F}_{V},\qquad \nabla_{\boldsymbol{\omega}_{U}\boldsymbol{\omega}_{V}}J=\boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{F}_{V}, \\ \frac{\partial\nabla_{\boldsymbol{\omega}_{U}}J}{\partial\phi}=\boldsymbol{F}_{U}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{\theta}_{\phi},\qquad \frac{\partial\nabla_{\boldsymbol{\omega}_{V}}J}{\partial\phi}=\boldsymbol{F}_{V}^{\top}\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{\theta}_{\phi},\qquad \frac{\partial^{2}J}{\partial\phi^{2}}=\langle\boldsymbol{\theta}_{\phi},\nabla_{\boldsymbol{F}}^{2}J\boldsymbol{\theta}_{\phi}\rangle. \end{array} \end{aligned} $$
(10.77)

Now that the first and second derivatives are given, the Levenberg–Marquardt procedure for minimizing J goes as follows:

  1. 1.

    Provide an initial value of F such that |F| = 0 and ∥F∥ = 1, and compute the SVD of Eq. (10.64). Evaluate the value J of Eq. (10.62), and let c = 0.0001.

  2. 2.

    Compute the first and second derivatives ∇F J and (Gauss–Newton approximated) \(\nabla _{\boldsymbol {F}}^{2}J\) of J with respect to F.

  3. 3.

    Compute the 9 × 3 matrices F U and F V of Eq. (10.70) and the 9-dimensional vector θ ϕ of Eq. (10.71).

  4. 4.

    Compute the first derivatives \(\nabla _{\boldsymbol {\omega }_{U}}J\), \(\nabla _{\boldsymbol {\omega }_{V}}J\), and ∂J∂ϕ in Eq. (10.73) and the second derivatives \(\nabla _{\boldsymbol {\omega }_{U}\boldsymbol {\omega }_{U}}J\), \(\nabla _{\boldsymbol {\omega }_{V}\boldsymbol {\omega }_{V}}J\), \(\nabla _{\boldsymbol {\omega }_{U}\boldsymbol {\omega }_{V}}J\), \(\partial \nabla _{\boldsymbol {\omega }_{U}}J/\partial \phi \), \(\partial \nabla _{\boldsymbol {\omega }_{V}}J/\partial \phi \), and 2 J∂ϕ 2 in Eq. (10.77) of J.

  5.

    Solve the following linear equation in Δω U, Δω V, and Δϕ (a code sketch of Steps 5–7 is given after this procedure):

    $$\displaystyle \begin{aligned} &\left( \left(\begin{array}{ccc} \nabla_{\boldsymbol{\omega}_{U}\boldsymbol{\omega}_{U}}J & \nabla_{\boldsymbol{\omega}_{U}\boldsymbol{\omega}_{V}}J & \partial\nabla_{\boldsymbol{\omega}_{U}}J/\partial\phi \\ (\nabla_{\boldsymbol{\omega}_{U}\boldsymbol{\omega}_{V}}J)^{\top} & \nabla_{\boldsymbol{\omega}_{V}\boldsymbol{\omega}_{V}}J & \partial\nabla_{\boldsymbol{\omega}_{V}}J/\partial\phi \\ (\partial\nabla_{\boldsymbol{\omega}_{U}}J/\partial\phi)^{\top} & (\partial\nabla_{\boldsymbol{\omega}_{V}}J/\partial\phi)^{\top} & \partial^{2}J/\partial\phi^{2} \end{array}\right) +c\boldsymbol{I}\right)\left(\begin{array}{c} \Delta\boldsymbol{\omega}_{U} \\ \Delta\boldsymbol{\omega}_{V} \\ \Delta\phi \end{array}\right) \\ &\quad =- \left(\begin{array}{c} \nabla_{\boldsymbol{\omega}_{U}}J \\\nabla_{\boldsymbol{\omega}_{V}}J \\\partial J/\partial\phi \end{array}\right). \end{aligned} $$
    (10.78)
  6.

    Tentatively update U, V , and ϕ to

    $$\displaystyle \begin{aligned} \tilde{\boldsymbol{U}}=e^{\boldsymbol{A}(\Delta\boldsymbol{\omega}_{U})}\boldsymbol{U}, \qquad \tilde{\boldsymbol{V}}=e^{\boldsymbol{A}(\Delta\boldsymbol{\omega}_{V})}\boldsymbol{V}, \qquad \tilde{\phi}=\phi+\Delta\phi. \end{aligned} $$
    (10.79)
  7.

    Tentatively update F to

    $$\displaystyle \begin{aligned} \tilde{\boldsymbol{F}}=\tilde{\boldsymbol{U}}\left(\begin{array}{ccc} \cos\tilde{\phi} & 0 & 0 \\ 0 & \sin\tilde{\phi} & 0 \\ 0 & 0 & 0 \end{array}\right) \tilde{\boldsymbol{V}}^{\top}. \end{aligned} $$
    (10.80)
  8.

    Let \(\tilde {J}\) be the value of Eq. (10.62) for \(\tilde {\boldsymbol {F}}\).

  9.

    If neither \(\tilde {J} < J\) nor \(\tilde {J} \approx J\) holds, let c ← 10c and go back to Step 5.

  10.

    If \(\tilde {\boldsymbol {F}} \approx \boldsymbol {F}\), return \(\tilde {\boldsymbol {F}}\) and stop. Else, update \(\boldsymbol {F} \leftarrow \tilde {\boldsymbol {F}}, \boldsymbol {U} \leftarrow \tilde {\boldsymbol {U}}, \boldsymbol {V} \leftarrow \tilde {\boldsymbol {V}}, \phi \leftarrow \tilde {\phi }, c \leftarrow c/10\), and \(J \leftarrow \tilde {J}\), and go back to Step 2.
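For concreteness, Steps 5–7 (solving Eq. (10.78) and rebuilding the tentative F by Eqs. (10.79) and (10.80)) can be written as a short NumPy sketch. The 7 × 7 Gauss–Newton Hessian and the 7-dimensional gradient are assumed to have already been assembled from Eqs. (10.73) and (10.77); the function names and the argument layout are illustrative only.

```python
import numpy as np

def skew(w):
    """A(w): the matrix such that A(w)x = w x x (vector product)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rot_exp(w):
    """e^{A(w)}: rotation about axis w by angle ||w|| (Rodrigues formula)."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3) + skew(w)          # first-order approximation for tiny w
    A = skew(w / th)
    return np.eye(3) + np.sin(th) * A + (1.0 - np.cos(th)) * (A @ A)

def lm_tentative_update(U, V, phi, H, g, c):
    """Steps 5-7: solve Eq. (10.78) and rebuild the tentative fundamental matrix.

    H : 7x7 Gauss-Newton Hessian (blocks of Eq. (10.77) arranged as in Eq. (10.78))
    g : 7-vector (grad_{omega_U} J, grad_{omega_V} J, dJ/dphi)
    c : Levenberg-Marquardt constant
    """
    d = np.linalg.solve(H + c * np.eye(7), -g)
    dwU, dwV, dphi = d[:3], d[3:6], d[6]
    U_t = rot_exp(dwU) @ U                   # Eq. (10.79): small rotation applied to U
    V_t = rot_exp(dwV) @ V
    phi_t = phi + dphi
    F_t = U_t @ np.diag([np.cos(phi_t), np.sin(phi_t), 0.0]) @ V_t.T   # Eq. (10.80)
    return U_t, V_t, phi_t, F_t
```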

We need an initial value of F to start these iterations. Various simple schemes are known. The simplest is the “least squares” that minimizes the square sum of the left side of the epipolar equation of Eq. (10.61), which is equivalent to ignoring the denominator on the left side of Eq. (10.62). Since the square sum is quadratic in F, the solution is immediately obtained by eigenanalysis if the rank constraint is not considered. The rank constraint can then be imposed by computing the SVD of the resulting F and replacing the smallest singular value by 0. This scheme is known as Hartley’s 8-point method [8]. It is sufficiently accurate in most practical applications, so the above iterations usually converge after a few iterations. See Kanatani et al. [16] for experimental comparisons of how the above method improves the accuracy over Hartley’s 8-point method; the number of significant digits often increases by at least one.
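For illustration, the following is a minimal NumPy sketch of this initialization. The arrangement of the 9-dimensional data vectors assumes that the epipolar equation of Eq. (10.61) is written with the image coordinates scaled by f 0; the function name and the default value of f 0 are placeholders.

```python
import numpy as np

def initial_F_8point(p1, p2, f0=600.0):
    """Least-squares initial value of F with the rank constraint imposed afterwards
    (Hartley-type 8-point scheme, sketched under the assumptions stated above).

    p1, p2 : (N, 2) arrays of corresponding points (x, y) and (x', y')
    f0     : scale constant playing the role of f_0 in Eq. (10.61)
    """
    x, y = p1[:, 0].astype(float), p1[:, 1].astype(float)
    xp, yp = p2[:, 0].astype(float), p2[:, 1].astype(float)
    f = np.full(x.shape, float(f0))
    # 9-dimensional data vectors xi so that the epipolar equation reads <xi, F> = 0,
    # with F regarded as a 9-dimensional vector.
    Xi = np.stack([x * xp, x * yp, x * f,
                   y * xp, y * yp, y * f,
                   f * xp, f * yp, f * f], axis=1)
    # Minimize sum_alpha <xi_alpha, F>^2 subject to ||F|| = 1:
    # the solution is the right singular vector of Xi for the smallest singular value.
    F = np.linalg.svd(Xi)[2][-1].reshape(3, 3)
    # Impose the rank constraint |F| = 0: replace the smallest singular value by 0.
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt
    return F / np.linalg.norm(F)             # renormalize so that ||F|| = 1
```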

8 Bundle Adjustment

We consider the problem of reconstructing the 3D structure of a scene from multiple images taken by multiple cameras. One of the most fundamental methods is bundle adjustment: we optimally estimate all the 3D positions of the points we are viewing and all the postures of the cameras, as well as their internal parameters, in such a way that the bundle of rays, or lines of sight, pierces the images consistently.

Consider points (X α, Y α, Z α), α = 1, …, N, in the scene. Suppose the αth point is viewed at (x ακ, y ακ) in the image of the κth camera, κ = 1, …, M (Fig. 10.5). The imaging geometry of most of today’s cameras is sufficiently modeled by perspective projection, for which the following relations hold [9]:

$$\displaystyle \begin{aligned} \begin{array}{rcl} x_{\alpha\kappa} &\displaystyle =&\displaystyle f_{0}\frac{ P_{\kappa(11)}X_{\alpha}+P_{\kappa(12)}Y_{\alpha}+P_{\kappa(13)}Z_{\alpha}+ P_{\kappa(14)} }{ P_{\kappa(31)}X_{\alpha}+P_{\kappa(32)}Y_{\alpha}+P_{\kappa(33)}Z_{\alpha}+ P_{\kappa(34)}}, \\ y_{\alpha\kappa} &\displaystyle =&\displaystyle f_{0}\frac{ P_{\kappa(21)}X_{\alpha}+P_{\kappa(22)}Y_{\alpha}+P_{\kappa(23)}Z_{\alpha}+ P_{\kappa(24)} }{ P_{\kappa(31)}X_{\alpha}+P_{\kappa(32)}Y_{\alpha}+P_{\kappa(33)}Z_{\alpha}+ P_{\kappa(34)}}, {} \end{array} \end{aligned} $$
(10.81)

where f 0 is the scale constant we used in Eq. (10.61), and P κ(ij) are constants determined by the position, orientation, and internal parameters (e.g., the focal length, the principal point position, and the image distortion description) of the κth camera. We write the 3 × 4 matrix whose (i, j) element is P κ(ij) as P κ and call it the camera matrix of the κth camera. From the geometry of perspective projection, we can write this in the form

$$\displaystyle \begin{aligned} \boldsymbol{P}_{\kappa}=\boldsymbol{K}_{\kappa}\boldsymbol{R}_{\kappa}^{\top} \left(\begin{array}{cc} \boldsymbol{I} & -\boldsymbol{t}_{\kappa} \end{array}\right), \end{aligned} $$
(10.82)

where K κ is the 3 × 3 matrix, called the intrinsic parameter matrix, consisting of the internal parameters of the κth camera [9]. The matrix R κ specifies the rotation of the κth camera relative to the world coordinate system fixed to the scene, and t κ is the position of the lens center of the κth camera. The principle of bundle adjustment is to minimize

$$\displaystyle \begin{aligned} \begin{array}{rcl} E &\displaystyle =&\displaystyle \sum_{\alpha=1}^{N}\sum_{\kappa=1}^{M}\left( \Bigl( \frac{x_{\alpha\kappa}}{f_{0}}-\frac{ P_{\kappa(11)}X_{\alpha}+P_{\kappa(12)}Y_{\alpha}+P_{\kappa(13)}Z_{\alpha}+ P_{\kappa(14)} }{ P_{\kappa(31)}X_{\alpha}+P_{\kappa(32)}Y_{\alpha}+P_{\kappa(33)}Z_{\alpha}+ P_{\kappa(34)}} \Bigr)^{2} \right.\\ &\displaystyle &\displaystyle \left.+\Bigl( \frac{y_{\alpha\kappa}}{f_{0}}-\frac{ P_{\kappa(21)}X_{\alpha}+P_{\kappa(22)}Y_{\alpha}+P_{\kappa(23)}Z_{\alpha}+ P_{\kappa(24)} }{ P_{\kappa(31)}X_{\alpha}+P_{\kappa(32)}Y_{\alpha}+P_{\kappa(33)}Z_{\alpha}+ P_{\kappa(34)}} \Bigr)^{2}\right), {} \end{array} \end{aligned} $$
(10.83)

with respect to all the 3D positions (X α, Y α, Z α) and all the camera matrices P κ, taking the observed (x ακ, y ακ), α = 1, …, N, κ = 1, …, M, as the input, so that Eq. (10.81) holds as accurately as possible. The expression E, called the reprojection error [9], is the square sum of the discrepancies between the image positions predicted by the perspective projection geometry and the actually observed image positions.
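For concreteness, the camera matrix of Eq. (10.82) and the projection of Eq. (10.81) can be coded as follows; this is a minimal NumPy sketch with illustrative function names, and the default value of f 0 is a placeholder.

```python
import numpy as np

def camera_matrix(K, R, t):
    """3x4 camera matrix P = K R^T (I | -t) of Eq. (10.82).

    K : (3, 3) intrinsic parameter matrix
    R : (3, 3) rotation of the camera relative to the world coordinate system
    t : (3,)   position of the lens center
    """
    return K @ R.T @ np.hstack([np.eye(3), -t.reshape(3, 1)])

def project(P, X, f0=600.0):
    """Perspective projection of Eq. (10.81): world point X -> image point (x, y)."""
    p, q, r = P @ np.append(X, 1.0)          # numerators and the common denominator
    return f0 * p / r, f0 * q / r
```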

Fig. 10.5
N points in the scene are viewed by M cameras. The αth point (X α, Y α, Z α) is imaged at point (x ακ, y ακ) in the κth camera image

Various algorithms have been proposed for bundle adjustment and are now available on the Web. The best known is the SBA of Lourakis and Argyros [17]. Snavely et al. [23, 24] combined it with an image correspondence extraction process and offered a tool called bundler. Here, we slightly modify these algorithms, based on Kanatani et al. [16], to explicitly use the Lie algebra method for camera rotation optimization.

Letting

$$\displaystyle \begin{aligned} \begin{array}{rcl} p_{\alpha\kappa} &\displaystyle =&\displaystyle P_{\kappa(11)}X_{\alpha}+P_{\kappa(12)}Y_{\alpha}+P_{\kappa(13)}Z_{\alpha} +P_{\kappa(14)}, \\ q_{\alpha\kappa} &\displaystyle =&\displaystyle P_{\kappa(21)}X_{\alpha}+P_{\kappa(22)}Y_{\alpha}+P_{\kappa(23)}Z_{\alpha} +P_{\kappa(24)}, \\ r_{\alpha\kappa} &\displaystyle =&\displaystyle P_{\kappa(31)}X_{\alpha}+P_{\kappa(32)}Y_{\alpha}+P_{\kappa(33)}Z_{\alpha} +P_{\kappa(34)}, {} \end{array} \end{aligned} $$
(10.84)

we rewrite Eq. (10.83) in the form

$$\displaystyle \begin{aligned} E=\sum_{\alpha=1}^{N}\sum_{\kappa=1}^{M} \Bigl( \Bigl( \frac{p_{\alpha\kappa}}{r_{\alpha\kappa}} -\frac{x_{\alpha\kappa}}{f_{0}}\Bigr)^{2}+ \Bigl( \frac{q_{\alpha\kappa}}{r_{\alpha\kappa}} -\frac{y_{\alpha\kappa}}{f_{0}}\Bigr)^{2}\Bigr). \end{aligned} $$
(10.85)
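As a small sketch (assuming, purely for illustration, that the observations are stored in a dictionary keyed by (α, κ) so that unobserved pairs are simply skipped), Eq. (10.85) can be evaluated as follows.

```python
import numpy as np

def reprojection_error(points3d, P_list, obs, f0=600.0):
    """Reprojection error E of Eq. (10.85).

    points3d : (N, 3) array of scene points (X, Y, Z)
    P_list   : list of M camera matrices, each (3, 4)
    obs      : dict mapping (alpha, kappa) -> observed image point (x, y)
    """
    E = 0.0
    for (a, k), (x, y) in obs.items():
        p, q, r = P_list[k] @ np.append(points3d[a], 1.0)   # Eq. (10.84)
        E += (p / r - x / f0) ** 2 + (q / r - y / f0) ** 2
    return E
```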

Using a single running index k = 1, 2, … for all the unknowns, i.e., all the 3D positions (X α, Y α, Z α), α = 1, …, N, and all the camera matrices P κ, κ = 1, …, M, we write all the unknowns as ξ 1, ξ 2, …. The first derivative of the reprojection error E with respect to ξ k is

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\partial E}{\partial\xi_{k}} &\displaystyle =&\displaystyle \sum_{\alpha=1}^{N}\sum_{\kappa=1}^{M} \frac{2}{r_{\alpha\kappa}^{2}} \left(\Bigl( \frac{p_{\alpha\kappa}}{r_{\alpha\kappa}} -\frac{x_{\alpha\kappa}}{f_{0}}\Bigr) \Bigl( r_{\alpha\kappa}\frac{\partial p_{\alpha\kappa}}{\partial\xi_{k}} -p_{\alpha\kappa}\frac{\partial r_{\alpha\kappa}}{\partial\xi_{k}} \Bigr)\right. \\ &\displaystyle &\displaystyle \left.+\Bigl( \frac{q_{\alpha\kappa}}{r_{\alpha\kappa}} -\frac{y_{\alpha\kappa}}{f_{0}}\Bigr) \Bigl( r_{\alpha\kappa}\frac{\partial q_{\alpha\kappa}}{\partial\xi_{k}} -q_{\alpha\kappa}\frac{\partial r_{\alpha\kappa}}{\partial\xi_{k}} \Bigr)\right). {} \end{array} \end{aligned} $$
(10.86)

Next, we consider the second derivatives. Noting that Eq. (10.85) decreases in the course of the iterations, we expect that p ακ/r ακ − x ακ/f 0 ≈ 0 and q ακ/r ακ − y ακ/f 0 ≈ 0. So, we adopt the Gauss–Newton approximation of ignoring these terms. Then, the second derivative of E is written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \frac{\partial^{2}E}{\partial\xi_{k}\partial\xi_{l}} &\displaystyle =&\displaystyle 2\sum_{\alpha=1}^{N}\sum_{\kappa=1}^{M} \frac{1}{r_{\alpha\kappa}^{4}} \left( \Bigl( r_{\alpha\kappa}\frac{\partial p_{\alpha\kappa}}{\partial\xi_{k}} -p_{\alpha\kappa}\frac{\partial r_{\alpha\kappa}}{\partial\xi_{k}} \Bigr) \Bigl( r_{\alpha\kappa}\frac{\partial p_{\alpha\kappa}}{\partial\xi_{l}} -p_{\alpha\kappa}\frac{\partial r_{\alpha\kappa}}{\partial\xi_{l}} \Bigr)\right. \\ &\displaystyle &\displaystyle \left.+\Bigl( r_{\alpha\kappa}\frac{\partial q_{\alpha\kappa}}{\partial\xi_{k}} -q_{\alpha\kappa}\frac{\partial r_{\alpha\kappa}}{\partial\xi_{k}} \Bigr) \Bigl( r_{\alpha\kappa}\frac{\partial q_{\alpha\kappa}}{\partial\xi_{l}} -q_{\alpha\kappa}\frac{\partial r_{\alpha\kappa}}{\partial\xi_{l}} \Bigr) \right). {} \end{array} \end{aligned} $$
(10.87)

As a result, for computing the first and second derivatives ∂E/∂ξ k and ∂ 2 E/∂ξ k ∂ξ l of E, we only need to evaluate the first derivatives ∂p ακ/∂ξ k, ∂q ακ/∂ξ k, and ∂r ακ/∂ξ k of p ακ, q ακ, and r ακ.
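The following NumPy sketch accumulates Eqs. (10.86) and (10.87) term by term. The data layout, one tuple of values and derivative vectors per visible point–camera pair, is an assumption made for illustration.

```python
import numpy as np

def accumulate_E_derivatives(terms, n_params, f0=600.0):
    """Gradient (Eq. (10.86)) and Gauss-Newton Hessian (Eq. (10.87)) of E.

    terms : iterable of (p, q, r, x, y, dp, dq, dr) per point-camera pair, where
            dp, dq, dr are (n_params,) arrays of the derivatives of p, q, r with
            respect to all unknowns xi_k (zero where there is no dependence).
    """
    grad = np.zeros(n_params)
    hess = np.zeros((n_params, n_params))
    for p, q, r, x, y, dp, dq, dr in terms:
        u = r * dp - p * dr                  # r dp/dxi_k - p dr/dxi_k
        v = r * dq - q * dr                  # r dq/dxi_k - q dr/dxi_k
        grad += (2.0 / r**2) * ((p / r - x / f0) * u + (q / r - y / f0) * v)
        hess += (2.0 / r**4) * (np.outer(u, u) + np.outer(v, v))
    return grad, hess
```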

Now, we apply the Lie algebra method to differentiation with respect to the rotation R κ in Eq. (10.82); to the other unknowns (the 3D positions (X α, Y α, Z α), the camera positions t κ, and all the parameters contained in the intrinsic parameter matrix K κ), we can apply the usual chain rule straightforwardly.

The linear increment ΔP κ of Eq. (10.82) caused by a small change A(Δ ω κ)R κ of R κ is written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} \Delta\boldsymbol{P}_{\kappa} &=& \boldsymbol{K}_{\kappa}(\boldsymbol{A}(\Delta\boldsymbol{\omega}_{\kappa}) \boldsymbol{R}_{\kappa})^{\top} \left(\begin{array}{cc} \boldsymbol{I} & -\boldsymbol{t}_{\kappa} \end{array}\right) =\boldsymbol{K}_{\kappa}\boldsymbol{R}_{\kappa}^{\top} \left(\begin{array}{cc} \boldsymbol{A}(\Delta\boldsymbol{\omega}_{\kappa})^{\top} & -\boldsymbol{A}(\Delta\boldsymbol{\omega}_{\kappa})^{\top}\boldsymbol{t}_{\kappa} \end{array}\right) \\ &=& \boldsymbol{K}_{\kappa}\boldsymbol{R}_{\kappa}^{\top} \left(\begin{array}{cccc} 0 & \Delta\omega_{\kappa3} & -\Delta\omega_{\kappa2} & \Delta\omega_{\kappa2}t_{\kappa3}-\Delta\omega_{\kappa3}t_{\kappa2} \\ -\Delta\omega_{\kappa3} & 0 & \Delta\omega_{\kappa1} & \Delta\omega_{\kappa3}t_{\kappa1}-\Delta\omega_{\kappa1}t_{\kappa3} \\ \Delta\omega_{\kappa2} & -\Delta\omega_{\kappa1} & 0 & \Delta\omega_{\kappa1}t_{\kappa2}-\Delta\omega_{\kappa2}t_{\kappa1} \end{array}\right), \end{array} \end{aligned} $$
(10.88)

where Δω κi and t κi are the ith components of Δω κ and t κ, respectively. Rewriting the above equation in the form

$$\displaystyle \begin{aligned} \Delta\boldsymbol{P}_{\kappa}= \frac{\partial\boldsymbol{P}_{\kappa}}{\partial\omega_{\kappa1}}\Delta\omega_{\kappa1}+ \frac{\partial\boldsymbol{P}_{\kappa}}{\partial\omega_{\kappa2}}\Delta\omega_{\kappa2}+ \frac{\partial\boldsymbol{P}_{\kappa}}{\partial\omega_{\kappa3}}\Delta\omega_{\kappa3}, \end{aligned} $$
(10.89)

we obtain the gradients ∂P κ/∂ω κ1, ∂P κ/∂ω κ2, and ∂P κ/∂ω κ3 of P κ with respect to the small rotation vector Δω κ. Letting the components of the vector ω κ be included in the set of unknowns ξ k, we obtain the first derivatives ∂p ακ/∂ξ k, ∂q ακ/∂ξ k, and ∂r ακ/∂ξ k of Eq. (10.84) for the rotation. Note that the value of ω κ itself is not defined; only its differential is defined. Using Eqs. (10.86) and (10.87), we can then compute the first and second derivatives ∂E/∂ξ k and ∂ 2 E/∂ξ k ∂ξ l of the reprojection error E.
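A small NumPy sketch of this step is given below; the function names are illustrative, and A(ω) denotes the vector-product matrix used throughout the chapter.

```python
import numpy as np

def skew(w):
    """A(w): the matrix such that A(w)x = w x x."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def dP_domega(K, R, t):
    """Gradients dP/domega_i of the camera matrix with respect to a small rotation,
    read off from Eq. (10.88): the increment is K R^T ( A(dw)^T | -A(dw)^T t )."""
    grads = []
    for i in range(3):
        e = np.zeros(3)
        e[i] = 1.0                            # unit increment of the ith component
        At = skew(e).T
        grads.append(K @ R.T @ np.hstack([At, (-At @ t).reshape(3, 1)]))
    return grads                              # [dP/domega_1, dP/domega_2, dP/domega_3]
```

The Levenberg–Marquardt bundle adjustment procedure then has the following form: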

  1.

    Initialize the 3D positions (X α, Y α, Z α) and the camera matrices P κ, and compute the associated reprojection error E. Let c = 0.0001.

  2.

    Compute the first and second derivatives ∂E/∂ξ k and ∂ 2 E/∂ξ k ∂ξ l for all the unknowns.

  3.

    Solve the following linear equation for Δξ k, k = 1, 2, …:

    $$\displaystyle \begin{aligned} \left(\begin{array}{cccc} \partial^{2}E/\partial\xi_{1}^{2}+c & \partial^{2}E/\partial\xi_{1}\partial\xi_{2} & \partial^{2}E/\partial\xi_{1}\partial\xi_{3} & \ldots \\ \partial^{2}E/\partial\xi_{2}\partial\xi_{1} & \partial^{2}E/\partial\xi_{2}^{2}+c & \partial^{2}E/\partial\xi_{2}\partial\xi_{3} & \ldots \\ \partial^{2}E/\partial\xi_{3}\partial\xi_{1} & \partial^{2}E/\partial\xi_{3}\partial\xi_{2} & \partial^{2}E/\partial\xi_{3}^{2}+c & \ldots \\ \vdots & \vdots & \vdots & \ddots \end{array}\right) \left(\begin{array}{c} \Delta\xi_{1} \\ \Delta\xi_{2} \\ \Delta\xi_{3} \\ \vdots \end{array}\right)= -\left(\begin{array}{c} \partial E/\partial\xi_{1} \\ \partial E/\partial\xi_{2} \\ \partial E/\partial\xi_{3} \\ \vdots \end{array}\right). \end{aligned} $$
    (10.90)
  4.

    Tentatively update the unknowns ξ k to \(\tilde {\xi }_{k}\) = ξ k +  Δξ k except the rotations R κ, which are updated to \(\tilde {\boldsymbol {R}}_{\kappa }\) = \(e^{\boldsymbol {A}(\Delta \boldsymbol {\omega }_{\kappa })}\boldsymbol {R}_{\kappa }\).

  5.

    Compute the corresponding reprojection error \(\tilde {E}\). If \(\tilde {E}\) > E, let c ← 10c and go back to Step 3.

  6.

    Update the unknowns to ξ k ← \(\tilde {\xi }_{k}\). If \(|\tilde {E}-E| \leq \delta \), then stop (δ is a small constant). Else, let E ← \(\tilde {E}\) and c ← c∕10 and go back to Step 2.

In usual numerical iterations, the variables are successively updated until they no longer change. However, the number of unknowns in bundle adjustment is thousands or even tens of thousands, so an impractically long computation time would be necessary if all the variables were required to converge to many significant digits. On the other hand, the purpose of bundle adjustment is to find a solution with a small reprojection error. So, it is a practical compromise to stop when the reprojection error almost ceases to decrease, as in the above procedure.

For actual implementation, many issues arise. One of them is the scale and orientation indeterminacy. This is a consequence of the fact that the world coordinate system can be defined arbitrarily and that imaging a small object with a nearby camera produces the same image as imaging a large object with a faraway camera. To resolve this indeterminacy, we usually define the world coordinate system so that it coincides with the first camera frame and fix the scale so that the distance between the first and second cameras is unity. Normalization like this reduces the number of unknowns in Eq. (10.90). Also, not all the points in the scene are necessarily seen in all the images, so we must adjust the number of equations and unknowns of Eq. (10.90) accordingly.

Another issue is the computation time. Directly solving Eq. (10.90) would require hours or days of computation. A well-known technique for reducing it is to separate the unknowns into the 3D point part and the camera matrix part: we solve for the unknowns of one part in terms of the unknowns of the other part and substitute the result into the remaining linear equations, which yields a much smaller coefficient matrix known as the Schur complement [26]. Memory space is another issue; we need to retain all relevant information in the course of the iterations without storing all intermediate values in memory arrays, which might exhaust the memory resources. See [16] for implementation details and numerical examples using real image data.
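As a rough NumPy sketch of this elimination (ignoring the sparsity bookkeeping and assuming the Levenberg–Marquardt constant c has already been added to the diagonal blocks), the point unknowns can be eliminated as follows; the block layout is an assumption made for illustration.

```python
import numpy as np

def solve_point_camera_system(Hpp_blocks, Hpc, Hcc, gp, gc):
    """Solve the normal equations of Eq. (10.90), split into a point part and a
    camera part,
        [ Hpp  Hpc ] [d_point ]     [gp]
        [ Hpc' Hcc ] [d_camera] = - [gc],
    by eliminating the point part first (Schur complement). Hpp is block-diagonal
    with one 3x3 block per 3D point, so applying its inverse is cheap.
    """
    N = Hpp_blocks.shape[0]                   # number of 3D points
    Hpp_inv = np.zeros((3 * N, 3 * N))
    for a in range(N):                        # invert each 3x3 block separately
        Hpp_inv[3*a:3*a+3, 3*a:3*a+3] = np.linalg.inv(Hpp_blocks[a])
    # In practice Hpp_inv is applied blockwise without forming this dense matrix.
    S = Hcc - Hpc.T @ Hpp_inv @ Hpc           # Schur complement (camera-size system)
    d_cam = np.linalg.solve(S, -gc + Hpc.T @ Hpp_inv @ gp)
    d_pt = -Hpp_inv @ (gp + Hpc @ d_cam)
    return d_pt, d_cam
```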

9 Summary

We have described how pose computations involving rotations can be optimized from image and sensor data. We have pointed out that we do not need any parameterization of the rotation (axis-angle, Euler angles, quaternions, etc.); we only need to parameterize infinitesimal rotations, which form a linear space called the Lie algebra. We have shown how the rotation matrix R is successively updated without any parameterization in the Levenberg–Marquardt framework. We have demonstrated our Lie algebra method for maximum likelihood rotation estimation, fundamental matrix computation, and bundle adjustment for 3D reconstruction.

The problems we have shown here are well known and have been solved by many other methods, often with heuristics and ad hoc treatments. Software tools for them are available on the Web, and their performance is usually satisfactory. We are not asserting that the use of Lie algebra greatly improves their performance. Our aim here is to emphasize the role Lie algebra plays in vision applications, because it is a fundamental mathematical principle that can be applied to a wide range of nonlinear optimization problems.

Lie algebra has been used in robotics for controlling continuously changing 3D postures [4, 6]. Recently, some researchers have been using the Lie algebra method for “motion averaging”: the 3D posture is computed by different methods and sensors, resulting in different values, and their best average is computed by iterative optimization [5, 7]. A similar approach was used to create a seamless circular panorama by optimizing the camera orientations [22]. In Sect. 10.7, we showed how to optimally compute the fundamental matrix. If the camera internal parameters are all known, the fundamental matrix is called the “essential matrix,” and the Lie algebra method can also be used to optimize it [27].

Thus, Lie algebra plays an important role in a wide range of computer vision problems. This chapter is aimed at helping to deepen the understanding of it.