Keywords

1 Introduction

Augmented reality is a technology which superimposes computer generated images on top of a user’s perception of the real world in real time [7]. AR has important applications in fields such as videogamming, interactive marketing and advertising, instructional aids and how-to for use, construction and maintenance, and navigation [7].

Perhaps one of most successful application of AR could be in education [3]. As a new technology, AR could help to students to learn more effectively and increase knowledge retention, relative to traditional 2D desktop interfaces. With the aid of AR could be build the virtual interaction with complex phenomena (showing the magnetic field, or the Earth layers, for example).

In [4] authors analyze an AR application on sixty nine middle-school students. Their results show an increase in student’s motivation, attention and motivation factors or the learning environment based on augmented reality technology compared with a more traditional learning environment. Authors also mention that AR is not mature enough to be used massively in education.

Users in AR application need to learn how to build 3D scenes, how to draw virtual objects, and how to manage, although amazing and exciting, a technology that is complex.

OpenCV [2] is the open tool to learn Image Processing, Computer Vision, and 3D user interfaces. This is the facto standard in this field. This library has more than 2500 optimized algorithms, which includes a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. But is the OpenCV library is far to be an easy tool, in its homepage there are listed 52 books about how to learn it.

The idea of this paper is to offer a simpler way to process fiducial markers. In Sect. 2 is presented the necessary components to build an AR application. Sect. 3 describes our proposed library that can be used in computationally restricted devices. Sect. 4 shows an AR application build with our library and running on the Raspberry Pi 3, a SBC that uses a 1.2 GHz 64-bits quad-core ARM Cortex-A53 CPU and 1 GB of RAM. Finally, in Sect. 5, some conclusions are drawn.

2 Augmented Reality Application

The following steps are performed in a typical AR application: (1) An image processing step to recognize a fiducial marker, (2) A computer vision step to calibrate the camera, this is, to obtain the projection matrix that transforms the 3D world viewing by the camera to a 2D plane that forms the viewed image; this step also obtains the pose, this is, the rotation and translation of the marker with to a virtual and global coordinate system. And finally, the step (3) that draws a virtual object over the coordinate system fixed in the marker. The augmented world is visualized only on the display, where it is possible to see the projection of the virtual objects on the image generated by the camera.

Two very simple fiducial markers are shown in Fig. 1: one is a simple black square, and the other is a pattern of two concentric circles [10]. The white area in the background of both markers help them to be recognized by a computer: it is a high contrast zone on the image. The black zone is supposed to be one of the biggest black objects on the scene. Marker detection is performed with image processing techniques: (1) a global threshold is applied to the input image which produce a binary (blank and white) image, (2) some morphological erosion and dilatation operations are applied to remove noise on the binarized image, and a labeling of the all black components in the image is performed. These image processing tasks were applied using library in [6].

Fig. 1.
figure 1

Images of two very simple fiducial markers in use.

Once we have the markers, the vertices positions must be calculated with the marker in Fig. 1(a), and the points of each ellipse must the extracted from marker in 1(b). With these data the homography can be estimated. Our library helps to perform easily these steps. Now, these Computer Vision steps will be described in detail.

3 Description of the Lightweight Library

Homography calculation. To obtain a homography by the normalized DTL algorithm [9] it is necessary to solve an overdetermined system of equations that forms a matrix of size \(2n \times 8\), where n is the number of point correspondences between the marker model and an image of the same model. The easiest way to solve this problem is using the QR decomposition computed with the modified Gram?Schmidt algorithm [8]. In fact, this is the shortest code and easiest codification to solve the QR decomposition. If the overdetermined system of equations is expressed as \(A \mathbf {h}= \mathbf {b}\), \(\mathbf {h}\) is found using normal equations doing:

$$\begin{aligned} \begin{aligned} Q R \mathbf {h}&= \mathbf {b}, \\ (Q R)^{\text {T}} Q R \mathbf {h}&= (Q R)^{\text {T}} \mathbf {b}, \\ R^{\text {T}} Q^{\text {T}} Q R \mathbf {h}&= R^{\text {T}} Q^{\text {T}} \mathbf {b}, \\ R \mathbf {h}&= Q^{\text {T}} \mathbf {b}= \mathbf {c}. \end{aligned} \end{aligned}$$
(1)

Here the system is already solved because matrix R is upper triangular and \(\mathbf {h}\) values are found by back-substitution, staring with \(h_{n-1} = c_{a}n / r_{nn}\).

We use here a further way to compact the calculations, such as it was suggested also by Golub and Van Loan [8]: the QR decomposition is calculated to the extended matrix \([ A \; | \; b ]\), of size \(2n \times 9\), then one obtains \(Q [ R \; | Q^\text {T}b ]\), thus the product \(Q^\text {T}b\) is obtained in the last column. Notice here that matrix Q is not needed, thus the subroutine to solve the \(A \mathbf {h}= \mathbf {b}\) is as (using the same notation that in [8]):

figure a

A notice here that is important. The calculation of the homography is a very well conditioned problem, then the use of the QR algorithm is justified. The only way to get a pour conditioned problem is to use repeated points correspondences to try to calculate the homography.

Eigendecomposition of a symmetric matrix of size \(3 \times 3\). The eigendecomposition of a symmetric matrix A results in the matrices \(V \; D \; V^\text {T}\), where D is a diagonal matrix former with the eigenvalues of A, and V is a orthogonal matrix where its columns are the eigenvectors corresponding to each eigenvalue in D. This problem is simplified with matrices of size \(3 \times 3\) because is equivalent to find the roots of a cubic equation [6, 12].

SVD of a matrix of size \(3 \times 3\). This problem is used to solve the orthogonalization of a matrix, specifically when this matrix is a matrix obtained with a linear method using a plane [13] or a circular marker [10]. The calibration of a camera with both linear methods produces a rotation matrix R far of being orthogonal, then to become orthogonal such matrix its Singular Value Decomposition is applied: \(R = U D_1 V^\text {T}\), and \(R' = U V^\text {T}\) is obtained, where \(R'\) is already orthogonal. In [13] it is demonstrated that the obtained \(R'\) is the best orthogonal matrix which minimizes the Frobenius norm of \(R'-R\).

Fitting an ellipse to a set of points. The fastest algorithm to fit a set of point to an ellipse is by solving a least square problem that minimizes the sum of squared algebraic distances [5]. Instead to solve this problem as an eigendecomposition of a \(6 \times 6\) matrix, as it is in [5], this problem can be transformed easily to solve three times a eigendecomposition of \(3 \times 3\) matrices, or to find three times the roots of three cubic equations [6].

Camera calibration using a marker of two concentric circles. This marker is presented in [10]. First, it is necessary to recognize two ellipses. For this task the previous method in the last paragraph can be used. With the two recognized ellipses, the homography can be obtained directly as it is explained in [10]. Here it is necessary to invert a \(3 \times 3\) matrix, thus its eigendecomposition can be used because the used matrices in this method are the representation of a conic equation, which is symmetric and of \(3 \times 3\) size.

4 Experiments and Results

To test our library against the results produced with a high level language, such as Octave, to demonstrate the correctness of its calculations. We generate two simulated images, each one from a square and two ellipses as the planar models of the markers. From these models we use the pinhole camera model in (2) with a known camera position to generate the two images shown in Fig. 2(b) and (c). The used pinhole camera model is

$$\begin{aligned} \lambda \mathbf {p}= K R [ I | -\mathbf {c}] \mathbf {P}, \end{aligned}$$
(2)

where \(\mathbf {p}= [u, v, 1]^\text {T}\) is a point over the image, \(\mathbf {P}= [x, y, z, 1]\text {T}\) is a point in the 3D scene, \(\mathbf {c}\) is the position of the camera, and R is a rotation matrix. Markers are situated on the xy-plane, then \(\mathbf {P}= [x, y, 0, 1]^\text {T}\) and the camera model is reduced as:

$$ \begin{aligned} \lambda \mathbf {p}&= K R [ \mathbf {e}_1, \; \mathbf {e}_2, \; -\mathbf {c}] \mathbf {P},\\ \lambda \mathbf {p}&= K [ \mathbf {r}_1, \; \mathbf {r}_2, \; -R\mathbf {c}] \mathbf {P},\\ \lambda \mathbf {p}&= H \mathbf {P}, \end{aligned} $$

where I is the identity matrix \(I = [\mathbf {e}_1, \; \mathbf {e}_2, \; \mathbf {e}_3]\), and \(R = [\mathbf {r}_1, \; \mathbf {r}_2, \; \mathbf {r}_3]\), and we are abusing the notation on \(\mathbf {P}\) to still denote a homogeneous 2D point on the model \(\mathbf {P}= [x, y, 1]\text {T}\). The used camera intrinsic parameters are in matrix K:

$$ K = \begin{bmatrix} 1000&0&-300\\ 0&1000&-200\\ 0&0&-1 \end{bmatrix}, $$

to generate images of size \(600 \times 400\) pixels, the principal point is situated in its center at (300, 200).

Table 1. Results of camera calibration and pose estimation using the markers in Fig. 2
Fig. 2.
figure 2

The square and two concentric circles markers and their projections used in the experiment. Images in (b) and (d) are generated with the parameters detailed in text.

Then we apply both codes, in C with out light library and with Octave, to estimate both homographies H, and recover the rotation matrix and camera position. Results are shown in Table 1.

Details for this first experiment are as follows: The square model has vertices \(\{ (10,10)\), \((-10,10)\), \((-10,-10)\), \((10,-10) \}\). The two circles models have radius 10 and 5. Both markers are shown in Fig. 2(a) and (c), respectively. Both codes, in C and Octave, for this experiment are available in http://cs.cinvestav.mx/~fraga/LightLib.tar.gz.

The results shown in Table 1 are not the best with respect to the ground truth. This is because two reasons: (1) The marker images in Fig. 2(a) and (c) where generated rounding the pixels locations to the nearest integers, and (2) The pose calculated using the homography is based in a linear method which is not the best solution. The solutions obtained and shown in Table 1 must be refined with a non-linear method to improve their values. An alternative that will be explored as future work is to use a PnP algorithm such as the one in [11, 14] to calculate the pose. The inconvenience of using a PnP algorithm is that the camera must be calibrated in advance.

With the library were programmed two augmented reality applications shown in Fig. 3. A virtual object is shown above the corresponding marker. It is possible to move the marker in the real world, or to move, carefully, the camera. The virtual object follows the marker on the monitor screen. We used images with a resolution of \(640 \times 480\) pixels, and then we obtained a processing frame rate of 30 frames per second. The Raspberry Pi 3 camera has a buildin autofocus feature [1], thus camera must be recalibrated at time to time. In our applications we recalibrate the camera at every frame because it is possible that focus change due this autofocus characteristic. In applications in Fig. 3 OpenGL 2.1 was used (this is the version that supports the Raspberry Pi 3) using glut for input/output. The glut function glutTimerFunc implements the timer that guide the main loop at 30 fps. Still could be possible to add a intelligent behavior to the application: perhaps it is not necessary to recalibrate if the reprojection error of the marker vertices is not to high. We are going to check this last possibility in the near future.

Fig. 3.
figure 3

Two pictures from two applications of augmented reality working on the Raspberry Pi 3

In OpenCV documentation in [2] it is available the SVD but not the QR decomposition. It seems now that OpenCV uses its own SVD implementation. In previous versions it used old Blas/Lapack numerical libraries (these libraries are used also by Octave). These old numerical libraries are written in Fortran language. We believe our small library uses memory more efficiently than Blas/Lapack and because of this reason, it can run very fast.

5 Conclusions

We have developed a light numerical library which can be applied in augmented reality application. We can calibrate a camera and calculate the pose for two different markers.

This library is programmed in C language and is intended for computing restricted devices such as smartphones, tablets and single board computers.

Results with our library are almost the same that the ones obtained with Octave that is a high level numerical language.

We tested our library in the Raspberry Pi 3 SBC. A frame rate of 30 frames per second with a camera resolution of \(640 \times 480\) pixels was obtained.

As a future work we think it is necessary a function to calculate the marker pose solving the PnP problem [14]. PnP problem solves the pose if camera intrinsic parameters are known. The method to obtain the pose based in the homography, described in this article, is a linear method which has not the best results. Otherwise, it is necessary to refine the solution using a non-linear method, which could be prohibitive in a real time application. It is necessary to investigate more about this problem of pose detection.