11.1 Vector Spaces

In Virtual Reality, we are concerned with the real space that surrounds us. It is helpful to model this space with mathematical methods, e.g., to be able to make exact, formal, mathematically provable statements or to perform computations. In VR, we use a vector space, a construct from linear algebra (a branch of mathematics), for this modeling.

Each vector space is formed over a field G. The elements of G are called scalars and we denote them by small Latin letters. Being a field in the sense of algebra means that G is a set with the two binary operations “+” (addition) and “·” (multiplication), which combine two elements of G and as a result give an element of G. Moreover, there is an element 0 in G, called the additive identity, and an element 1 in G, called the multiplicative identity. Finally, the elements of G satisfy the following field axioms. For any scalar a, b, c, d (with d ≠ 0):

$$ a+\left(b+c\right)=\left(a+b\right)+c\kern1.6em \left(\mathrm{associativity}\ \mathrm{of}\ \mathrm{addition}\right) $$
$$ a+b=b+a\kern1.6em \left(\mathrm{commutativity}\ \mathrm{of}\ \mathrm{addition}\right) $$
$$ 0+a=a\kern1.6em \left(\mathrm{additive}\ \mathrm{identity}\right) $$

For each a ∈ G there exists a −a ∈ G with −a + a = 0 (additive inverses)

$$ a\cdotp \left(b\cdotp c\right)=\left(a\cdotp b\right)\cdotp c\kern1.6em \left(\mathrm{associativity}\ \mathrm{of}\ \mathrm{multiplication}\right) $$
$$ a\cdotp b=b\cdotp a\kern1.6em \left(\mathrm{commutativity}\ \mathrm{of}\ \mathrm{multiplication}\right) $$
$$ 1\cdotp d=d\kern1.6em \left(\mathrm{multiplicative}\ \mathrm{identity}\right) $$

For each d ∈ G \ {0} there exists a d⁻¹ ∈ G with d⁻¹ · d = 1 (multiplicative inverses)

$$ a\cdotp \left(b+c\right)=a\cdotp b+a\cdotp c\kern1.6em \left(\mathrm{distributivity}\right) $$

The set of real numbers ℝ, which comprises the natural numbers (e.g., 1, 2, 3, …), the integers, the rational numbers and the irrational numbers (e.g., π), fulfills the field axioms and is the usual choice in VR.

The elements of a vector space V over a field G are called vectors. We denote them by Latin letters with an arrow placed above. Two operations are defined on vectors. First, vector addition takes two vectors and assigns to them a third vector. We write this operation as “+” (not to be confused with the addition of scalars). Vector addition adheres to associativity and commutativity of addition. There also exists an identity element of addition, the zero vector \( \overrightarrow{0} \). For each vector \( \overrightarrow{u} \) there exists an additive inverse \( -\overrightarrow{u} \) in V. Secondly, scalar multiplication takes a scalar and a vector and assigns to them a vector. We write it as “·”. Scalar multiplication adheres to distributivity:

$$ \forall a,b\in G,\forall \overrightarrow{u},\overrightarrow{v}\in V:a\cdotp \left(\overrightarrow{u}+\overrightarrow{v}\right)=a\cdotp \overrightarrow{u}+a\cdotp \overrightarrow{v}\kern0.24em \mathrm{and}\kern0.24em \left(a+b\right)\cdotp \overrightarrow{u}=a\cdotp \overrightarrow{u}+b\cdotp \overrightarrow{u} $$

An example of a set V that fulfills these properties of a vector space is the set of 3-tuples over the real numbers, i.e., the set of all lists of real numbers of length 3. We call this set ℝ3. The 3-tuple (5, –2, 3), for example, is an element from the set ℝ3. In the following, we will not write the elements of ℝ3 as a list next to each other but on top of each other:

$$ \overrightarrow{u}=\left(\begin{array}{c}5\\ {}-2\\ {}3\end{array}\right) $$

To specify the set ℝ3 completely as a vector space, we still have to specify the two operations “+” and “·” of the vector space. We do this by defining these operations based on the addition and multiplication of the real numbers (i.e., the field over which ℝ3 was formed).

$$ a\in \mathrm{\mathbb{R}},\overrightarrow{u},\overrightarrow{v}\in {\mathrm{\mathbb{R}}}^3: $$
$$ \overrightarrow{u}+\overrightarrow{v}:= \left(\begin{array}{c}{u}_1\\ {}{u}_2\\ {}{u}_3\end{array}\right)+\left(\begin{array}{c}{v}_1\\ {}{v}_2\\ {}{v}_3\end{array}\right)=\left(\begin{array}{c}{u}_1+{v}_1\\ {}{u}_2+{v}_2\\ {}{u}_3+{v}_3\end{array}\right)\ \mathrm{and}\ a\cdotp \overrightarrow{u}:= a\cdotp \left(\begin{array}{c}{u}_1\\ {}{u}_2\\ {}{u}_3\end{array}\right)=\left(\begin{array}{c}a\cdotp {u}_1\\ {}a\cdotp {u}_2\\ {}a\cdotp {u}_3\end{array}\right) $$
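These componentwise definitions translate directly into code. A minimal sketch, representing vectors of ℝ3 as plain Python 3-tuples (a representation chosen for this example, not a notation from the text):

```python
def vec_add(u, v):
    """Vector addition in R^3: add corresponding components."""
    return (u[0] + v[0], u[1] + v[1], u[2] + v[2])

def scalar_mul(a, u):
    """Scalar multiplication in R^3: scale each component by the scalar a."""
    return (a * u[0], a * u[1], a * u[2])

u = (5, -2, 3)
v = (1, 4, -1)
print(vec_add(u, v))     # -> (6, 2, 2)
print(scalar_mul(2, u))  # -> (10, -4, 6)
```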

In vector spaces, vector addition and scalar multiplication are generally used to define a linear combination of n scalars and n vectors:

$$ \overrightarrow{u}={a}_1\cdotp {\overrightarrow{u}}_1+{a}_2\cdotp {\overrightarrow{u}}_2+\dots +{a}_n\cdotp {\overrightarrow{u}}_n $$

If the linear combination yields the zero vector only when all n scalars have the value 0, the n vectors of the linear combination are called linearly independent. If one finds at most d linearly independent vectors in a vector space V, then d is the dimension of the vector space V. In our example, the vector space ℝ3 has dimension 3. By the way, it is not only the set of all 3-tuples that forms a vector space: if k is a natural number, then the set of all k-tuples of real numbers forms a vector space ℝk, which has dimension k.

If V is a vector space of dimension n and we find n linearly independent vectors, these vectors are called a basis of V. We can then represent each vector of V as a linear combination of these basis vectors. The n scalars that occur in this linear combination are called the components or coordinates of the vector.
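A common way to test three vectors of ℝ3 for linear independence is the determinant of the 3 × 3 matrix that has them as columns: it is nonzero exactly when the only linear combination yielding the zero vector is the one with all scalars 0. A minimal sketch (determinants themselves are not introduced in this section):

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w.
    It is nonzero iff u, v, w are linearly independent."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

# The three unit vectors are linearly independent ...
print(det3((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # -> 1
# ... while a triple containing two parallel vectors is dependent:
print(det3((1, 0, 0), (2, 0, 0), (0, 0, 1)))  # -> 0
```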

11.2 Geometry and Vector Spaces

In geometry, directed line segments are called geometric vectors. You can visualize them as arrows, each having a length and a direction. The beginning of a geometric vector is called the tail, and its end is called the tip. We define an addition operation on two geometric vectors as follows: we place the tail of the second vector at the tip of the first vector; the result of the addition is the geometric vector that runs from the tail of the first vector to the tip of the second vector. We also define a scalar multiplication, where we choose the real numbers ℝ as scalars (see Fig. 11.1). If we multiply a scalar a by a geometric vector, we get as a result a geometric vector with |a| times the length of the original geometric vector. If a is positive, the resulting vector points in the same direction; if a is negative, it points in the opposite direction. With these two operations, the set of geometric vectors forms a vector space over ℝ.

Fig. 11.1 Vector addition and scalar multiplication of geometric vectors

Directed line segments are useful constructs when we want to model the space surrounding us. However, performing computations with them directly proves to be difficult. Therefore, we take a basis from the space of geometric vectors; if we are in three-dimensional space, it consists of three basis vectors. We can represent each geometric vector as a linear combination of these three basis vectors. The coordinates in this linear combination are three real numbers, which in turn we can understand as a 3-tuple, i.e., an element of the vector space ℝ3.

We can proceed as follows. With the help of a basis, we assign a vector from ℝ3 to each directed line segment, i.e., to each geometric vector. In ℝ3 we can calculate with vectors based on the addition and multiplication of real numbers. The result of the calculation is then transferred into the space of geometric vectors by inserting the computed coordinates as scalars into the linear combination of the basis vectors. If, for example, we want to add two geometric vectors, then we assign two vectors from ℝ3, the “world of numbers”, to these two vectors from the “world of geometry”. In the “world of numbers” we can calculate the result vector. We transfer this result vector back into the “world of geometry” and thus have determined the geometric vector resulting from the addition by computation.

11.3 Points and Affine Spaces

However, the usefulness of our mathematical model is still limited: geometric vectors possess only length and direction, but no fixed position in space. This also means that we cannot yet model essential concepts of the real world, such as positions and the distances between them. Therefore, we introduce the term point in addition to scalar and vector. We write points with capital Latin letters. Points have no length and no direction, but a position. Let P and Q be two elements from the set of points. Then we define an operation “–”, called point-point subtraction, which takes two points and results in a vector:

$$ P-Q=\overrightarrow{u}\iff P=\overrightarrow{u}+Q $$

With this we also define an addition between a point and a vector (called point-vector addition), where the result is a point. Thus, we can represent any point P in three-dimensional space as the addition of a point O (called the origin) and a linear combination of three linearly independent geometric vectors \( \overrightarrow{u},\overrightarrow{v},\overrightarrow{w} \), the basis vectors:

$$ P=O+a\cdot \overrightarrow{u}+b\cdot \overrightarrow{v}+c\cdot \overrightarrow{w}=O+\overrightarrow{p} $$

We call these three basis vectors, together with O, a coordinate system K. We call the 3-tuple (a, b, c) the coordinates of P with respect to K. Thus, for a given K, every point in our “world of geometry” can be represented by an element from ℝ3, our “world of numbers”. So, we can “calculate” not only with vectors, but also with points, i.e., with fixed positions in our world. We call \( \overrightarrow{p} \) the position vector belonging to P.

A vector space that has been extended by a set of points and an operation, the point-point subtraction, is called an affine space in mathematics. Geometrically, we can interpret point-point subtraction like this: P – Q is a vector that we get when we choose a directional path with starting point Q and final point P.

11.4 Euclidean Space

We add the concept of distance to our existing mathematical model of the space surrounding us. For this purpose, we introduce another operation, which we denote by “·” and which takes two vectors and results in a scalar. We call this operation the scalar product (not to be confused with scalar multiplication, which takes a scalar and a vector and results in a vector; even though we write both operations with “·”, the types of the two operands always tell us which operation is meant). The scalar product must be commutative and adhere to the following axioms for scalars a, b, vectors \( \overrightarrow{u},\overrightarrow{v},\overrightarrow{w} \) and the zero vector \( \overrightarrow{0} \):

$$ \left(a\cdot \overrightarrow{u}+b\cdot \overrightarrow{v}\right)\cdot \overrightarrow{w}=a\cdot \overrightarrow{u}\cdot \overrightarrow{w}+b\cdot \overrightarrow{v}\cdot \overrightarrow{w} $$
$$ \overrightarrow{u}\cdotp \overrightarrow{u}>0\kern0.62em \mathrm{if}\kern0.62em \overrightarrow{u}\ne \overrightarrow{0} $$
$$ \overrightarrow{0}\cdotp \overrightarrow{0}=0 $$

In our vector space ℝ3, we can define a scalar product as follows so that all the above conditions are fulfilled:

$$ \overrightarrow{u}\cdotp \overrightarrow{v}=\left(\begin{array}{c}{u}_1\\ {}{u}_2\\ {}{u}_3\end{array}\right)\cdotp \left(\begin{array}{c}{v}_1\\ {}{v}_2\\ {}{v}_3\end{array}\right):= {u}_1\cdotp {v}_1+{u}_2\cdotp {v}_2+{u}_3\cdotp {v}_3 $$

In honor of the ancient Greek mathematician Euclid of Alexandria, an affine space supplemented by the scalar product operation is called a Euclidean point space. Using the scalar product, we define the magnitude (or norm) of a vector as follows:

$$ \left|\overrightarrow{u}\right|=\sqrt{\overrightarrow{u}\cdot \overrightarrow{u}} $$

In our three-dimensional space, the magnitude of a vector is equal to its length. Thus, we can also determine the distance d between two points P and Q as

$$ d=\left|P-Q\right|=\sqrt{\left(P-Q\right)\cdot \left(P-Q\right)} $$

The angle α enclosed by two vectors can be determined from the following equation:

$$ \overrightarrow{u}\cdot \overrightarrow{v}=\left|\overrightarrow{u}\right|\cdot \left|\overrightarrow{v}\right|\cdot \cos \alpha $$
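With the scalar product of ℝ3 defined above, magnitude, point distance and enclosed angle can be computed directly. A sketch in Python, with points and vectors again represented as 3-tuples:

```python
import math

def dot(u, v):
    """Scalar product on R^3."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def magnitude(u):
    """|u| = sqrt(u . u)."""
    return math.sqrt(dot(u, u))

def distance(p, q):
    """Distance |P - Q| between two points given by their coordinates."""
    d = (p[0] - q[0], p[1] - q[1], p[2] - q[2])
    return magnitude(d)

def angle(u, v):
    """Angle (in radians) enclosed by the vectors u and v."""
    return math.acos(dot(u, v) / (magnitude(u) * magnitude(v)))

print(distance((1, 2, 3), (1, 2, 7)))             # -> 4.0
print(math.degrees(angle((1, 0, 0), (0, 1, 0))))  # -> 90.0
```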

In the case α = 90° (i.e., the two vectors are perpendicular to each other), the scalar product of the two vectors is 0. Two vectors whose scalar product is 0 are called orthogonal. If the two vectors also have length 1, they are called orthonormal. For the basis of our space, we want to use orthonormal vectors in the following. A corresponding coordinate system (basis vectors are perpendicular to each other and have length 1) is called a Cartesian coordinate system. In the case of ℝ3, we take the three unit vectors

$$ {\overrightarrow{e}}_x=\left(\begin{array}{c}1\\ {}0\\ {}0\end{array}\right),{\overrightarrow{e}}_y=\left(\begin{array}{c}0\\ {}1\\ {}0\end{array}\right),{\overrightarrow{e}}_z=\left(\begin{array}{c}0\\ {}0\\ {}1\end{array}\right) $$

in the given order and the point O as the origin point, whose position vector is the zero vector.

To be able to easily find a vector orthogonal to two vectors in ℝ3, we define an operator “×”, which we call the cross product and which takes two vectors and results in one vector:

$$ \overrightarrow{n}=\overrightarrow{u}\times \overrightarrow{v}=\left(\begin{array}{c}{u}_1\\ {}{u}_2\\ {}{u}_3\end{array}\right)\times \left(\begin{array}{c}{v}_1\\ {}{v}_2\\ {}{v}_3\end{array}\right):= \left(\begin{array}{c}{u}_2\cdotp {v}_3-{u}_3\cdotp {v}_2\\ {}{u}_3\cdotp {v}_1-{u}_1\cdotp {v}_3\\ {}{u}_1\cdotp {v}_2-{u}_2\cdotp {v}_1\end{array}\right)=-1\cdotp \left(\overrightarrow{v}\times \overrightarrow{u}\right) $$

The resulting vector is called a normal vector. In this order, the vectors \( \overrightarrow{u},\overrightarrow{v},\overrightarrow{n} \) form a right-handed system, i.e., if you take them as geometric vectors and place their tails at a common point, the vectors are oriented like the thumb, index finger and middle finger of the right hand. The cross product is not commutative; as the equation above shows, it is anticommutative. While one can generalize our definition of the scalar product from ℝ3 to ℝn and thus obtain Euclidean point spaces of dimension n, the cross product is defined exclusively in ℝ3.
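A direct transcription of the componentwise definition above, together with a check that the result is indeed orthogonal to both inputs:

```python
def dot(u, v):
    """Scalar product on R^3."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    """Cross product on R^3 as defined componentwise in the text."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

n = cross((1, 0, 0), (0, 1, 0))
print(n)                                     # -> (0, 0, 1), a right-handed system
print(dot(n, (1, 0, 0)), dot(n, (0, 1, 0)))  # -> 0 0 : n is orthogonal to both
```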

11.5 Analytical Geometry in ℝ3

In ℝ3, our mathematical model of the space surrounding us, we can solve geometric problems by computation, e.g., finding the intersection of lines or determining the distance of a point from a plane. A line is a generalization of a directed line segment: it has no direction and infinite length. A line is defined by two points. Mathematically, we model a line g through the points P and Q as the subset of ℝ3 that includes all points X whose position vector \( \overrightarrow{x} \) satisfies the equation of the line, using the position vectors associated with P and Q:

$$ g=\left\{\overrightarrow{x}\in {\mathrm{\mathbb{R}}}^3|\exists t\in \mathrm{\mathbb{R}},\overrightarrow{x}=\overrightarrow{p}+t\cdotp \left(\overrightarrow{q}-\overrightarrow{p}\right)\right\} $$

The scalar t is called the parameter and the equation above is also called the vector equation of a line. The vector that is multiplied by t is called the directional vector of the line g. Similarly, we can model a plane E as a subset of ℝ3. It is defined by three points P, Q, R and the equation of the plane contains two parameters and two directional vectors:

$$ E=\left\{\overrightarrow{x}\in {\mathrm{\mathbb{R}}}^3|\exists t,s\in \mathrm{\mathbb{R}},\kern0.62em \overrightarrow{x}=\overrightarrow{p}+t\kern0.36em \cdotp \kern0.36em \left(\overrightarrow{q}-\overrightarrow{p}\right)+s\kern0.36em \cdotp \kern0.36em \left(\overrightarrow{r}-\overrightarrow{p}\right)\right\} $$

By means of the cross product, we can compute from the two directional vectors the normal vector \( \overrightarrow{n} \), which is perpendicular to E. For the distance d of a point X from a plane E, linear algebra provides the following equation, where the sign of the scalar product (before taking the absolute value) indicates on which side of E the point X is located:

$$ d=\left|\frac{\overrightarrow{n}}{\left|\overrightarrow{n}\right|}\cdot \left(\overrightarrow{x}-\overrightarrow{p}\right)\right| $$

Thus, we can reformulate the condition that points X belong to the subset E. This is because all points X that have the distance 0 from E lie on the plane E. Thus, we obtain the point-normal form of a plane:

$$ E=\left\{\overrightarrow{x}\in {\mathrm{\mathbb{R}}}^3|\overrightarrow{n}\cdotp \left(\overrightarrow{x}-\overrightarrow{p}\right)=0\right\} $$
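The distance formula and the point-normal form can be combined into a signed distance (positive on the side of E that the normal vector points to); its absolute value is the distance d from above, and a value of 0 means the point lies on the plane. A sketch, with the plane given by a point p on E and a normal vector n:

```python
import math

def dot(u, v):
    """Scalar product on R^3."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def signed_distance(x, p, n):
    """Signed distance of the point x from the plane through p with
    normal vector n. The sign tells on which side of the plane x lies;
    the absolute value is the distance d from the text."""
    diff = (x[0] - p[0], x[1] - p[1], x[2] - p[2])
    return dot(n, diff) / math.sqrt(dot(n, n))

# Plane z = 0: a point on it is (0, 0, 0), a normal vector is (0, 0, 1).
print(signed_distance((1, 2, 5), (0, 0, 0), (0, 0, 1)))   # -> 5.0
print(signed_distance((1, 2, -5), (0, 0, 0), (0, 0, 1)))  # -> -5.0
```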

With these definitions you can compute intersections between lines and between a line and a plane as well as intersections between planes. The first step is to equate the equations that define the set of points that form a line or a plane. Alternatively, substitution can sometimes be used. This results in either an equation to be solved or a linear system of equations, the solution of which can be computed by mathematical methods (for example, Gaussian elimination).
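As an example of the substitution approach: inserting the vector equation of a line into the point-normal form of a plane yields a single linear equation in the parameter t. A sketch (the helper names are chosen for this example):

```python
def dot(u, v):
    """Scalar product on R^3."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def line_plane_intersection(p, d, p0, n):
    """Intersect the line x = p + t*d with the plane n . (x - p0) = 0.
    Substituting the line equation gives n.(p - p0) + t * (n.d) = 0,
    which is solved for t."""
    denom = dot(n, d)
    if denom == 0:
        return None  # line is parallel to the plane (or lies in it)
    t = dot(n, (p0[0] - p[0], p0[1] - p[1], p0[2] - p[2])) / denom
    return (p[0] + t * d[0], p[1] + t * d[1], p[2] + t * d[2])

# A line through (0, 0, 5) pointing straight down meets the plane z = 0:
print(line_plane_intersection((0, 0, 5), (0, 0, -1), (0, 0, 0), (0, 0, 1)))
# -> (0.0, 0.0, 0.0)
```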

11.6 Matrices

In virtual reality, another mathematical construct is often used to compute transformations such as rotations or translations in three-dimensional space: the matrix (plural: matrices). A matrix is a table of n rows and m columns in which each entry is a scalar. In the following, we will always assume that the entries are real numbers. We find the scalar aij in row i and column j of the matrix. It is called the entry in place (i, j). We write matrices with bold capital letters: A = [aij], and say A is an n × m matrix. The matrix M in our example has two rows and four columns, so it is a 2 × 4 matrix, and the entry m1,3 has the value 5:

$$ \boldsymbol{M}=\left[\begin{array}{llll}1& 0& 5& 3\\ {}1& 9& 2& 0\end{array}\right] $$

For matrices, we define three operations. First, scalar-matrix multiplication, denoted by “·”, combines a scalar s and an n × m matrix A = [aij] to form an n × m matrix: s·A = s·[aij] := [s·aij]. This operation adheres to associativity. Secondly, matrix-matrix addition, denoted by “+”, links two matrices A and B of the same size n × m to form a matrix of size n × m: A + B = [aij] + [bij] := [aij + bij]. This operation adheres to associativity and commutativity. Third, matrix-matrix multiplication, denoted by “·”, combines a matrix A of size n × k and a matrix B of size k × m to form a matrix of size n × m:

$$ \boldsymbol{A}\cdotp \boldsymbol{B}:= \left[{c}_{ij}\right]\kern1.6em \mathrm{with}\kern1.6em {c}_{ij}=\sum \limits_{l=1}^k{a}_{il}\cdotp {b}_{lj} $$

This operation adheres to associativity. It should be emphasized that commutativity does not apply to matrix-matrix multiplication: A·B does not always equal B·A.
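The summation formula and the failure of commutativity can be checked directly. A sketch, representing matrices as nested Python lists:

```python
def mat_mul(A, B):
    """Multiply an n x k matrix A by a k x m matrix B,
    using c_ij = sum over l of a_il * b_lj."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][l] * B[l][j] for l in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
print(mat_mul(A, B))  # -> [[2, 1], [4, 3]]
print(mat_mul(B, A))  # -> [[3, 4], [1, 2]]  (A.B != B.A in general)
```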

If we swap the rows and columns of a matrix, we get the transposed matrix. The transposed matrix of a matrix A = [aij] is AT = [aji]. The following applies: (A·B)T = BT·AT. A special case are matrices that have the same number of rows and columns. These are called square matrices. The square matrix I for which the following applies

$$ \boldsymbol{I}=\left[{a}_{ij}\right],\kern1.6em {a}_{ij}=\left\{\begin{array}{cc}1& \mathrm{if}\kern0.62em i=j\\ {}0& \mathrm{otherwise}\end{array}\right. $$

is called the identity matrix (or unit matrix). The following applies: A·I = I·A = A, where A and I are both n × n matrices. If for an n × n matrix A there exists a matrix A–1 of the same size such that A·A–1 = A–1·A = I, then A–1 is called the inverse matrix of A, and A is called invertible. The following applies: (A·B)–1 = B–1·A–1. If A–1 = AT applies to a matrix A, then A is called orthogonal.

11.7 Affine Transformations

Assume that the point P has coordinates (x, y, z) with respect to a Cartesian coordinate system. If we translate P by tx in the x-direction, by ty in the y-direction and by tz in the z-direction, we map point P to a new point P′. What are its coordinates? To calculate such transformations, we utilize matrices. We introduce a special notation for matrices that consist of only one column: we write them with small bold letters and call them column matrices. Now we want to represent the point P by the column matrix p. We do this as follows:

$$ \boldsymbol{p}=\left[\begin{array}{c}w\kern0.36em \cdotp \kern0.36em x\\ {}w\kern0.36em \cdotp \kern0.36em y\\ {}w\kern0.36em \cdotp \kern0.36em z\\ {}w\end{array}\right],\mathrm{for}\;\mathrm{any}\;\mathrm{real}\ \mathrm{number}\;w\;\mathrm{with}\;w\ne 0 $$

We call (w·x, w·y, w·z, w) the homogeneous coordinates of P. In practice, for the sake of simplicity, usually w = 1 is chosen. If one chooses w = 0, one can represent a vector in a column matrix instead of a point by means of homogeneous coordinates:

$$ \overrightarrow{v}=\left(\begin{array}{c}x\\ {}y\\ {}z\end{array}\right)\kern1.6em \equiv \kern1.72em \mathbf{v}=\left[\begin{array}{c}x\\ {}y\\ {}z\\ {}0\end{array}\right] $$

The translation from P to P′ can be described by a matrix M. The following simple equation applies:

$$ {\boldsymbol{p}}^{\prime }=\boldsymbol{M}\cdot \boldsymbol{p} $$

In our translation example, this equation looks like this:

$$ {\boldsymbol{p}}^{\prime }=\left[\begin{array}{cccc}1& 0& 0& {t}_x\\ {}0& 1& 0& {t}_y\\ {}0& 0& 1& {t}_z\\ {}0& 0& 0& 1\end{array}\right]\cdotp \left[\begin{array}{c}w\cdotp x\\ {}w\cdotp y\\ {}w\cdotp z\\ {}w\end{array}\right]=\left[\begin{array}{c}w\cdotp \left(x+{t}_x\right)\\ {}w\cdotp \left(y+{t}_y\right)\\ {}w\cdotp \left(z+{t}_z\right)\\ {}w\end{array}\right] $$

From the resulting column matrix p′, we can obtain the coordinates of point P′ after division by w: (x + tx, y + ty, z + tz). If instead of p, which represents a point, we were to use the column matrix v, which represents a vector, in the above equation, then v would be mapped exactly back to v. This is also what we expect: since a vector has no fixed position in space, it is not changed by a displacement. As we will see below, the transformation of a vector by a more complex transformation is slightly more complicated.
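The behavior just described (points are shifted, vectors are unchanged) can be sketched with homogeneous coordinates, using w = 1 for points and w = 0 for vectors:

```python
def mat_vec(M, p):
    """Multiply a 4x4 matrix M by a column matrix p of size 4."""
    return [sum(M[i][j] * p[j] for j in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    """4x4 matrix of the translation by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

M = translation(10, 0, 0)
print(mat_vec(M, [2, 3, 4, 1]))  # point  (w = 1) -> [12, 3, 4, 1], shifted
print(mat_vec(M, [2, 3, 4, 0]))  # vector (w = 0) -> [2, 3, 4, 0], unchanged
```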

Let us take a closer look at the matrix M that represents this translation. You can think of its four columns as column matrices. The first three columns represent vectors, because the value in the fourth row is zero. In fact, they are the basis vectors of our three-dimensional space after we apply the translation to them. They do not change, because a translation changes neither the length nor the direction of a vector. The fourth column represents a point, because the value in the fourth row is not zero. This column represents the origin after the translation is applied to it: as a result of the translation, the origin (0, 0, 0) is mapped to (tx, ty, tz). Therefore, this transformation can be seen as a change from one coordinate system of our three-dimensional space to another. In fact, mathematicians have shown that every change of coordinate systems can be represented by a matrix M. With 4 × 4 matrices M, not only translations can be computed, but also the other affine transformations that map one affine space into another. Besides translation, these include the following geometric transformations: rotation, scaling, reflection and shearing. If you invert the matrix M, you get the matrix M–1, which represents the inverse mapping of M, i.e., it reverses the mapping represented by M.

Let us assume that we perform n geometric transformations of the point P. We represent the transformation performed first by M1, the second by M2 and so on, until finally the transformation performed last is represented by Mn. This allows us to determine the coordinates of the point P′ resulting from the back-to-back execution (concatenation ) of these transformations as follows:

$$ {\boldsymbol{p}}^{\prime }=\left({\boldsymbol{M}}_n\cdotp \kern0.36em \dots \cdotp \kern0.36em {\boldsymbol{M}}_3\cdotp \kern0.36em {\boldsymbol{M}}_2\cdotp \kern0.36em {\boldsymbol{M}}_1\right)\kern0.36em \cdotp \kern0.36em \boldsymbol{p} $$

Note the order of the matrices and keep in mind that matrix multiplication is not commutative. If you perform the computation as indicated by the brackets, you only need to compute the product of all n matrices once, even if you transform hundreds of points with the same transformation. For a large number of points to be transformed, this results in a considerable saving of computing time. Matrix operations for 4 × 4 matrices are implemented directly in hardware in graphics processors, which leads to another reduction in computing time.
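That the order of the factors matters can be seen by concatenating a translation with a uniform scaling (one of the affine transformations listed above). A sketch:

```python
def mat_mul(A, B):
    """Multiply two 4x4 matrices."""
    return [[sum(A[i][l] * B[l][j] for l in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(M, p):
    """Multiply a 4x4 matrix M by a column matrix p of size 4."""
    return [sum(M[i][j] * p[j] for j in range(4)) for i in range(4)]

T = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # translate by (1,0,0)
S = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]  # scale by factor 2
p = [1, 0, 0, 1]  # homogeneous coordinates of the point (1, 0, 0)

# "translate first, then scale": the matrix applied first stands on the right.
print(mat_vec(mat_mul(S, T), p))  # -> [4, 0, 0, 1]
# The reversed order gives a different result:
print(mat_vec(mat_mul(T, S), p))  # -> [3, 0, 0, 1]
```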

Besides points, vectors can also be transformed by a matrix M that describes an affine transformation. If we want to know where the vector \( \overrightarrow{v} \) is mapped to by the transformation described by M, we represent the vector in the column matrix v. We compute v′ = (M–1)T · v, and the first three rows of the column matrix v′ contain the coordinates of the transformed vector.

11.8 Determination of Transformation Matrices

To calculate geometric transformations or to perform a change between coordinate systems, we need a matrix M that represents this transformation, as described in the last section. But how do we determine this matrix M? In principle there are two ways.

The first alternative is to know formulas for these matrices for certain standard cases. The formula for translation has already been given in Sect. 11.7. For a rotation by an angle α around the x-axis through the origin, the following formula gives the matrix M:

$$ \mathbf{M}=\left[\begin{array}{cccc}1& 0& 0& 0\\ {}0& \cos \alpha & -\sin \alpha & 0\\ {}0& \sin \alpha & \cos \alpha & 0\\ {}0& 0& 0& 1\end{array}\right] $$
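A sketch of building this rotation matrix and applying it: rotating the point (0, 1, 0) by 90° about the x-axis should yield (0, 0, 1):

```python
import math

def rot_x(alpha):
    """4x4 matrix of the rotation by alpha (in radians) about the x-axis."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1, 0, 0, 0],
            [0, c, -s, 0],
            [0, s, c, 0],
            [0, 0, 0, 1]]

M = rot_x(math.pi / 2)
p = [0, 1, 0, 1]  # homogeneous coordinates of the point (0, 1, 0)
q = [sum(M[i][j] * p[j] for j in range(4)) for i in range(4)]
# Round away floating-point noise (cos(pi/2) is not exactly 0 in floats):
print([round(float(x), 10) for x in q])  # -> [0.0, 0.0, 1.0, 1.0]
```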

Accordingly, one can also find formulas in computer graphics textbooks for transformation matrices for rotation around the y-axis, around the z-axis or around any other axis, for reflection, or for scaling. From these standard cases, more complex transformations can be computed by concatenation (see Sect. 11.7). For example, if you want to calculate a rotation of 30° about an axis that is parallel to the x-axis and passes through the center of rotation (1, 2, 3), you divide this transformation into three transformations for which a formula is known: first, you perform a translation by (–1, –2, –3), which takes the center of rotation to the origin (because we only know the formula for rotations around the origin). Then you rotate 30° around the x-axis through the origin, and finally you reverse the first translation with the inverse translation. The matrix for the entire transformation is obtained by multiplying the three matrices for the standard cases (note the order):

$$ \mathbf{M}=\left[\begin{array}{cccc}1& 0& 0& 1\\ {}0& 1& 0& 2\\ {}0& 0& 1& 3\\ {}0& 0& 0& 1\end{array}\right]\cdotp \left[\begin{array}{cccc}1& 0& 0& 0\\ {}0& \cos {30}^{\circ}& -\sin {30}^{\circ}& 0\\ {}0& \sin {30}^{\circ}& \cos {30}^{\circ}& 0\\ {}0& 0& 0& 1\end{array}\right]\cdotp \left[\begin{array}{cccc}1& 0& 0& -1\\ {}0& 1& 0& -2\\ {}0& 0& 1& -3\\ {}0& 0& 0& 1\end{array}\right] $$
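One way to check such a composite matrix is to verify that the center of rotation is a fixed point of the whole transformation. A sketch of the concatenation above:

```python
import math

def mat_mul(A, B):
    """Multiply two 4x4 matrices."""
    return [[sum(A[i][l] * B[l][j] for l in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(M, p):
    """Multiply a 4x4 matrix M by a column matrix p of size 4."""
    return [sum(M[i][j] * p[j] for j in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

a = math.radians(30)
R = [[1, 0, 0, 0],
     [0, math.cos(a), -math.sin(a), 0],
     [0, math.sin(a), math.cos(a), 0],
     [0, 0, 0, 1]]

# Note the order: the translation performed first stands on the right.
M = mat_mul(mat_mul(translation(1, 2, 3), R), translation(-1, -2, -3))

c = mat_vec(M, [1, 2, 3, 1])  # apply M to the center of rotation
print([round(float(x), 10) for x in c])  # -> [1.0, 2.0, 3.0, 1.0], a fixed point
```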

The second alternative to determine the matrix M, which we need according to the formula p′ = M·p to compute a transformation or to change coordinate systems, is to construct M directly:

  • We start with our coordinate system K, which consists of three basis vectors and the origin point. We also need to know the target coordinate system K′ after the transformation, which results from applying the geometric transformation to the three basis vectors and the origin point of K. Let M be the matrix that changes coordinates from coordinate system K to K′, i.e., M computes the geometric transformation from K to K′.

  • We represent the first basis vector of K′ as a column matrix of size 4 by entering its three coordinates with respect to K in the first three rows of the column matrix and a zero in the fourth row. Analogously, we obtain column matrices for the second and third basis vector of K′. We represent the origin point of K′ by entering its coordinates with respect to K in the first three rows of a column matrix of size 4 and a one in the fourth row. From these four column matrices, we form the matrix M–1 of size 4 × 4 by writing them next to each other in the above order. By inverting M–1 we obtain the matrix M that we are looking for.
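If the basis of K′ is orthonormal (the Cartesian case used in this chapter), the inversion in the last step has a simple closed form: writing M–1 with rotation part B (the basis vectors as columns) and origin column o, the inverse M has rotation part BT and origin column –BT·o. A sketch under that assumption, which builds M directly without a general matrix inversion:

```python
def change_of_coords(b1, b2, b3, origin):
    """Matrix M mapping K-coordinates to K'-coordinates, where b1, b2, b3
    are the three ORTHONORMAL basis vectors of K' and origin is the origin
    point of K', all given in coordinates of K. Uses the closed-form
    inverse (rotation part B^T, translation part -B^T * origin) that holds
    only for an orthonormal basis -- an assumption of this sketch."""
    rows = [b1, b2, b3]  # the rows of B^T are exactly the basis vectors
    t = [-sum(rows[i][j] * origin[j] for j in range(3)) for i in range(3)]
    return [list(rows[0]) + [t[0]],
            list(rows[1]) + [t[1]],
            list(rows[2]) + [t[2]],
            [0, 0, 0, 1]]

def mat_vec(M, p):
    return [sum(M[i][j] * p[j] for j in range(4)) for i in range(4)]

# K' basis: K's y-axis, K's negative x-axis, K's z-axis; K' origin at (5, 0, 0).
M = change_of_coords((0, 1, 0), (-1, 0, 0), (0, 0, 1), (5, 0, 0))

print(mat_vec(M, [5, 0, 0, 1]))  # the origin of K'      -> [0, 0, 0, 1]
print(mat_vec(M, [5, 1, 0, 1]))  # one step along b1     -> [1, 0, 0, 1]
```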

If a point P has coordinates (x, y, z) with respect to the old coordinate system K, its new coordinates with respect to K′ are calculated with the matrix M as follows:

  • We represent P as a column matrix p with the homogeneous coordinates (x, y, z, 1).

  • We calculate the matrix product p′ = M·p.

  • The values in the first three rows of p′ are the coordinates of P with respect to the new coordinate system K′.