
4.1 Introduction

In the rapidly developing field of computer vision, geometric information about objects in 3D space (such as position, shape, and location) can be extracted from the images a camera captures, and objects can then be reconstructed and perceived from this information [1]. A point on the surface of a three-dimensional object is mapped to the corresponding point on the image by the geometric model of the camera (such as the pinhole model). The parameters of this model are called camera parameters [2]. In most cases they are determined by experiment, and this process is camera calibration [3].

Current camera calibration techniques can be roughly classified into two categories: traditional calibration and self-calibration [4].

Traditional calibration is based on a camera imaging model (such as the pinhole or fish-eye model) and a calibration target whose shape and size are fixed and known; the camera parameters are calculated through image processing and a series of mathematical transformations (linear calculation and nonlinear optimization) [5]. Self-calibration completes the calibration by using the correspondences between specific quantities in two images taken before and after the camera rotates or translates. The camera's principal point and effective focal length obey inherent constraints under a given imaging model, and these constraints are usually independent of the surrounding environment and of the camera's motion, so self-calibration can take advantage of them [6].

Camera calibration has a long history. As early as 1986, R. Tsai [7] created the classical Tsai camera model and put forward the two-step calibration strategy, which belongs to traditional calibration. This camera model can compensate for radial distortion. The two-step strategy [8] establishes equations using the radial alignment constraint (RAC) and solves for the internal and external parameters through direct linear computation followed by nonlinear optimization. However, this method involves a complex calculation process and demands high equipment accuracy, so it is not suited to simple experimental conditions. Moreover, its feature points are hard to detect and its data are hard to measure [9].

Professor Zhang [10] improved the two-step calibration strategy and proposed a method based on a planar template. First, a set of images is obtained by observing the planar template at a few (at least two) different orientations. The procedure then consists of a closed-form solution, followed by a nonlinear refinement based on the maximum likelihood criterion.

Comparing the above-mentioned methods, we adopt the second. This paper is organized as follows: Sect. 4.2 describes the basic principle of camera calibration. Section 4.3 describes the calibration procedure: the planar template in front of the camera is rotated or translated two or more times, or the camera is rotated or translated two or more times while the planar template is fixed; the specific motion parameters of the template need not be known. Section 4.4 provides the experimental results. Finally, Sect. 4.5 presents a brief summary.

4.2 The Basic Principle of Camera Calibration

4.2.1 Four Kinds of Coordinate Systems

There are many kinds of coordinate systems in computer vision; the following four are commonly used in calibration.

  1. Image pixel coordinate system

    As shown in Fig. 4.1, a two-dimensional orthogonal coordinate system \(O_0uv\) is defined on the digital image, with its origin \(O_0\) at the top-left corner of the image [2]. The coordinate (u, v) of a pixel indicates the v-th row and the u-th column of the image array. \(O_0uv\) is called the image pixel coordinate system [11].

    Fig. 4.1 Image pixel coordinate system and image physical coordinate system

  2. Image physical coordinate system

    As shown in Fig. 4.1, an orthogonal coordinate system \(O_1xy\) is built on the image plane, with its origin \(O_1\) at the principal point of the image [1]. Its x and y axes are parallel to the u and v axes, respectively. Assume the image pixel coordinate of the origin \(O_1\) is \((u_0, v_0)\), and the physical distances between adjacent pixels along the x and y directions are dx and dy, respectively (in mm, say). Then the relation between the image pixel coordinate (u, v) and the image physical coordinate (x, y) of any point in the image is:

    $$ u = \frac{x}{dx} + u_{0}\quad v = \frac{y}{dy} + v_{0} $$
    (4.1)

    In homogeneous coordinates this relation can be written in matrix form:

    $$ \left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {1/dx} & 0 & {u_{0} } \\ 0 & {1/dy} & {v_{0} } \\ 0 & 0 & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} x \\ y \\ 1 \\ \end{array} } \right] $$
    (4.2)
  3. Camera coordinate system

    The geometric relationship of camera imaging is shown in Fig. 4.2. The camera coordinate system \(O_CX_CY_CZ_C\) has its origin \(O_C\) at the center of projection, its \(Z_C\) axis along the optical axis, and its \(X_C\) and \(Y_C\) axes parallel to the x and y axes of the image physical coordinate system, respectively [5]. The distance \(O_CO_1\) from the optical center to the image plane is the camera's focal length f [5].

    Fig. 4.2 Camera coordinate system and world coordinate system

  4. World coordinate system

    Because the positions of the camera and the target in space are not fixed, their relative position can only be described by establishing a reference coordinate system. This is the world coordinate system \(O_WX_WY_WZ_W\), shown in Fig. 4.2 [2]. The relationship between camera coordinates and world coordinates can be represented by a rotation matrix R and a translation vector t [2]. For a point P in 3D space with camera coordinate \((X_C, Y_C, Z_C)^T\) and world coordinate \((X_W, Y_W, Z_W)^T\), the relationship between them is:

    $$ \left[ {\begin{array}{*{20}c} {X_{C} } \\ {Y_{C} } \\ {Z_{C} } \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} \user2{R} & \user2{t} \\ {{\mathbf{0}}^{T} } & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{W} } \\ {Y_{W} } \\ {Z_{W} } \\ 1 \\ \end{array} } \right] = \user2{M}_{\text{2}} \left[ {\begin{array}{*{20}c} {X_{W} } \\ {Y_{W} } \\ {Z_{W} } \\ 1 \\ \end{array} } \right] $$
    (4.3)

    R is a 3 × 3 orthonormal rotation matrix, t is the translation vector, 0 = (0, 0, 0)^T, and \(\user2{M}_2\) is the 4 × 4 matrix representing the relationship between the camera coordinate system and the world coordinate system.
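To make Eqs. 4.2 and 4.3 concrete, here is a minimal Python/NumPy sketch of the two coordinate conversions; the numeric values of dx, dy, u0, v0, R and t are illustrative placeholders, not results from this chapter.

```python
import numpy as np

# Illustrative intrinsic quantities (placeholders, not from this chapter).
dx, dy = 0.0032, 0.0032      # pixel pitch in mm along x and y
u0, v0 = 320.0, 240.0        # principal point in pixels

# Eq. 4.2: image physical coordinate (x, y) -> image pixel coordinate (u, v)
K_pix = np.array([[1 / dx, 0,      u0],
                  [0,      1 / dy, v0],
                  [0,      0,      1 ]])
x, y = 0.16, -0.08           # point on the image plane, in mm
u, v, _ = K_pix @ np.array([x, y, 1.0])
print(u, v)                  # pixel coordinates

# Eq. 4.3: world coordinate -> camera coordinate with M2 = [R t; 0^T 1]
R = np.eye(3)                            # illustrative rotation
t = np.array([[0.0], [0.0], [1000.0]])   # illustrative translation (mm)
M2 = np.block([[R, t], [np.zeros((1, 3)), np.ones((1, 1))]])
P_w = np.array([20.0, 40.0, 0.0, 1.0])   # homogeneous world point
P_c = M2 @ P_w                           # homogeneous camera coordinates
print(P_c[:3])
```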

4.2.2 The Camera Model

4.2.2.1 The Linear Camera Model

The projection of a 3D point P to the corresponding point p on the image plane can be approximated by the pinhole model. As shown in Fig. 4.2, the image point p is the intersection of the image plane with the line joining the optical center \(O_C\) and P [2]. This model is called central projection or perspective projection [2]. From the principle of similar triangles we get:

$$ x = \frac{{fX_{C} }}{{Z_{C} }}\quad y = \frac{{fY_{C} }}{{Z_{C} }} $$
(4.4)

The relationship between image physical coordinate and camera coordinate is:

$$ Z_{C} \left[ {\begin{array}{*{20}c} x \\ y \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{C} } \\ {Y_{C} } \\ {Z_{C} } \\ 1 \\ \end{array} } \right] $$
(4.5)

Substituting Eqs. 4.2 and 4.3 into Eq. 4.5 gives the relationship between the world coordinate and the image pixel coordinate:

$$ Z_{C} \left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {f/dx} & 0 & {u_{0} } & 0 \\ 0 & {f/dy} & {v_{0} } & 0 \\ 0 & 0 & 1 & 0 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} \user2{R} & \user2{t} \\ {{\mathbf{0}}^{T} } & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{W} } \\ {Y_{W} } \\ {Z_{W} } \\ 1 \\ \end{array} } \right] = \user2{M}_{\text{1}} \user2{M}_{\text{2}} \left[ {\begin{array}{*{20}c} {X_{W} } \\ {Y_{W} } \\ {Z_{W} } \\ 1 \\ \end{array} } \right] = \user2{M}\left[ {\begin{array}{*{20}c} {X_{W} } \\ {Y_{W} } \\ {Z_{W} } \\ 1 \\ \end{array} } \right] $$
(4.6)

Here \(f_x = f/dx\) and \(f_y = f/dy\) are the effective focal lengths of the camera along the x and y axes. M is a 3 × 4 matrix. The matrix \(\user2{M}_1\), defined by \(f_x\), \(f_y\), \(u_0\), \(v_0\), depends only on the camera's internal structure and is called the camera intrinsic parameters matrix, while \(\user2{M}_2\) depends only on the camera's pose and is called the camera extrinsic parameters matrix.
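As a small illustration of the composition in Eq. 4.6, the following sketch (again with placeholder parameter values) builds \(\user2{M}_1\) and \(\user2{M}_2\) and projects a world point to pixel coordinates.

```python
import numpy as np

fx, fy = 3750.0, 3750.0      # effective focal lengths in pixels (placeholders)
u0, v0 = 320.0, 240.0        # principal point (placeholder)

# Intrinsic matrix M1 (3 x 4) and extrinsic matrix M2 (4 x 4), as in Eq. 4.6
M1 = np.array([[fx, 0,  u0, 0],
               [0,  fy, v0, 0],
               [0,  0,  1,  0]], dtype=float)
R = np.eye(3)                            # placeholder rotation
t = np.array([[0.0], [0.0], [1200.0]])   # placeholder translation (mm)
M2 = np.block([[R, t], [np.zeros((1, 3)), np.ones((1, 1))]])

P_w = np.array([20.0, 40.0, 0.0, 1.0])   # homogeneous world point (mm)
uvw = M1 @ M2 @ P_w                      # equals Z_C * [u, v, 1]^T
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]  # divide by Z_C to obtain pixels
print(u, v)
```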

4.2.2.2 The Non-Linear Camera Model

When the camera uses a wide-angle lens or the camera is not manufactured to standard, distortion appears at the edges of the image, and the closer a point lies to the edge, the more severe the distortion. If a linear model were used to calibrate such a camera, the image point p would deviate from its true position and produce a large error, so a non-linear camera model is used instead [2]:

$$ \left\{ {\begin{array}{*{20}l} {\updelta_{x} = k_{1} x(x^{2} + y^{2} ) + k_{2} x(x^{2} + y^{2} )^{2} + k_{3} x(x^{2} + y^{2} )^{3} + p_{2} (3x^{2} + y^{2} ) + 2p_{1} xy} \hfill \\ {\updelta_{y} = k_{1} y(x^{2} + y^{2} ) + k_{2} y(x^{2} + y^{2} )^{2} + k_{3} y(x^{2} + y^{2} )^{3} + p_{1} (x^{2} + 3y^{2} ) + 2p_{2} xy} \hfill \\ \end{array} } \right. $$
(4.7)

where \(k_1\), \(k_2\), \(k_3\) are radial distortion coefficients, \(p_1\), \(p_2\) are tangential distortion coefficients, and \(\updelta_x\), \(\updelta_y\) are the distortion errors along the x and y axes.
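A minimal sketch of Eq. 4.7, evaluating the distortion errors at a normalized image point, might look as follows; the coefficient values are made up for illustration.

```python
import numpy as np

def distortion_error(x, y, k1, k2, k3, p1, p2):
    """Eq. 4.7: radial + tangential distortion errors at image point (x, y)."""
    r2 = x * x + y * y
    radial = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    dx = x * radial + 2 * p1 * x * y + p2 * (3 * x * x + y * y)
    dy = y * radial + p1 * (x * x + 3 * y * y) + 2 * p2 * x * y
    return dx, dy

# Illustrative coefficients (not calibration results from this chapter).
print(distortion_error(0.2, -0.1, k1=-0.3, k2=0.1, k3=0.0, p1=1e-3, p2=-5e-4))
```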

4.3 Camera Calibration Method

4.3.1 Solving Internal and External Parameters

First, capture 3 or more images of the planar template within the camera's field of view. The template may be translated or rotated arbitrarily, but for each image the world coordinate system is chosen with its \(Z_W\) axis perpendicular to the template plane and its origin at the first detected corner, so that \(Z_W = 0\) for all points on the plane.

Let us denote the i-th column of the rotation matrix R by \(r_i\). Then Eq. 4.6 can be written as:

$$ \left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ \end{array} } \right] = s\user2{M}_{\text{1}} \left[ {\begin{array}{*{20}c} {r_{1} } & {r_{2} } & {r_{3} } & \user2{t} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{W} } \\ {Y_{W} } \\ 0 \\ 1 \\ \end{array} } \right] = s\user2{M}_{\text{1}} \left[ {\begin{array}{*{20}c} {r_{1} } & {r_{2} } & \user2{t} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{W} } \\ {Y_{W} } \\ 1 \\ \end{array} } \right] = \user2{H}\left[ {\begin{array}{*{20}c} {X_{W} } \\ {Y_{W} } \\ 1 \\ \end{array} } \right] $$
(4.8)

where s is an arbitrary scale factor. Let

$$ \user2{H} = \left[ {\begin{array}{*{20}c} {h_{11} } & {h_{12} } & {h_{13} } \\ {h_{21} } & {h_{22} } & {h_{23} } \\ {h_{31} } & {h_{32} } & 1 \\ \end{array} } \right] $$
(4.9)
$$ \user2{h} = \left[ {\begin{array}{*{20}c} {h_{11} } & {h_{12} } & {h_{13} } & {h_{21} } & {h_{22} } & {h_{23} } & {h_{31} } & {h_{32} } & 1 \\ \end{array} } \right]^{T} $$
(4.10)

Then obtain:

$$ \left( {\begin{array}{*{20}c} {X_{W} } & {Y_{W} } & 1 & 0 & 0 & 0 & { - uX_{W} } & { - uY_{W} } & { - u} \\ 0 & 0 & 0 & {X_{W} } & {Y_{W} } & 1 & { - vX_{W} } & { - vY_{W} } & { - v} \\ \end{array} } \right)\user2{h} = 0 $$
(4.11)

Each point provides these two equations. An image with N points therefore yields 2N equations, written as Sh = 0. The solution h is the eigenvector of \(S^TS\) corresponding to its smallest eigenvalue [2]; H is obtained after normalizing the vector h. The maximum likelihood estimate of H is then refined by nonlinear least squares, here using the Levenberg-Marquardt algorithm [12].
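The linear step just described can be sketched in Python as below: each point pair contributes the two rows of Eq. 4.11, the stacked system Sh = 0 is solved with an SVD (equivalently, taking the eigenvector of \(S^TS\) with the smallest eigenvalue), and h is normalized so that \(h_{33} = 1\). This is an illustrative sketch, not the exact implementation used for the experiments.

```python
import numpy as np

def estimate_homography(world_xy, pixels_uv):
    """Linear (DLT) estimate of H from planar points (X_W, Y_W) and pixels (u, v)."""
    rows = []
    for (X, Y), (u, v) in zip(world_xy, pixels_uv):
        rows.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])   # Eq. 4.11, row 1
        rows.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])   # Eq. 4.11, row 2
    S = np.asarray(rows, dtype=float)
    # h is the right singular vector of S for the smallest singular value,
    # i.e. the eigenvector of S^T S with the smallest eigenvalue.
    _, _, Vt = np.linalg.svd(S)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]            # normalize so that h_33 = 1
```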

Each image has a homography matrix H. Writing it in column vectors, \(\user2{H} = [h_1\;h_2\;h_3]\), where each \(h_i\) is a 3 × 1 vector, the relation \(\user2{H} = [h_1\;h_2\;h_3] = s\user2{M}_1[r_1\;r_2\;\user2{t}]\) can be decomposed as:

$$ h_{i} = s\user2{M}_{\text{1}} r_{i} {\text{ or }}r_{i} =\uplambda\user2{M}_{\text{1}}^{ - 1} h_{i} $$
(4.12)

where λ = 1/s and i = 1, 2; the third column gives \(\user2{t} = \uplambda\user2{M}_1^{-1}h_3\).

The rotation matrix R is orthonormal, so \(r_1\) and \(r_2\) are orthonormal. This gives two constraints:

$$ h_{1}^{T} \user2{M}_{\text{1}}^{ - T} \user2{M}_{\text{1}}^{ - 1} h_{2} = 0 $$
(4.13)
$$ h_{1}^{T} \user2{M}_{\text{1}}^{ - T} \user2{M}_{\text{1}}^{ - 1} h_{1} = h_{2}^{T} \user2{M}_{\text{1}}^{ - T} \user2{M}_{\text{1}}^{ - 1} h_{2} $$
(4.14)

Let \( \user2{B} = \user2{M}_{\text{1}}^{ - T} \user2{M}_{\text{1}}^{ - 1} \), we can get:

$$ \user2{B} = \user2{M}_{\text{1}}^{ - T} \user2{M}_{\text{1}}^{ - 1} = \left[ {\begin{array}{*{20}c} {B_{11} } & {B_{12} } & {B_{13} } \\ {B_{21} } & {B_{22} } & {B_{23} } \\ {B_{31} } & {B_{32} } & {B_{33} } \\ \end{array} } \right] $$
(4.15)

In fact, B has the general closed form:

$$ \user2{B} = \left[ {\begin{array}{*{20}c} {\frac{1}{{f_{x}^{2} }}} & 0 & {\frac{{ - u_{0} }}{{f_{x}^{2} }}} \\ 0 & {\frac{1}{{f_{y}^{2} }}} & {\frac{{ - v_{0} }}{{f_{y}^{2} }}} \\ {\frac{{ - u_{0} }}{{f_{x}^{2} }}} & {\frac{{ - v_{0} }}{{f_{y}^{2} }}} & {\frac{{u_{0}^{2} }}{{f_{x}^{2} }} + \frac{{v_{0}^{2} }}{{f_{y}^{2} }} + 1} \\ \end{array} } \right] $$
(4.16)

Both constraints have the general form \( h_{i}^{T} \user2{B}h_{j} \). Since B is a symmetric matrix, it is completely determined by six elements, which we collect into a column vector b, so that:

$$ h_{i}^{T} \user2{B}h_{j} = v_{ij}^{T} \user2{b} = \left[ {\begin{array}{*{20}c} {h_{i1} h_{j1} } & {h_{i1} h_{j2} + h_{i2} h_{j1} } & {h_{i2} h_{j2} } & {h_{i3} h_{j1} + h_{i1} h_{j3} } & {h_{i3} h_{j2} + h_{i2} h_{j3} } & {h_{i3} h_{j3} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {B_{11} } \\ {B_{12} } \\ {B_{22} } \\ {B_{13} } \\ {B_{23} } \\ {B_{33} } \\ \end{array} } \right] $$
(4.17)

Using this definition of \(v_{ij}^T\), the two constraints Eqs. 4.13 and 4.14 can be written as Eq. 4.18:

$$ \left[ {\begin{array}{*{20}c} {v_{12}^{T} } \\ {(v_{11} - v_{22} )^{T} } \\ \end{array} } \right]\user2{b} = 0 $$
(4.18)

If K images are captured, stacking the constraints from all of them gives Vb = 0, where V is a 2K × 6 matrix.

If K ≥ 2, the system has a solution for b (up to a scale factor). Finally, we compute the internal parameters:

$$ f_{x} = \sqrt {\uplambda/B_{11} } $$
(4.19)
$$ f_{y} = \sqrt {\lambda B_{11} /(B_{11} B_{22} - B_{12}^{2} )} $$
(4.20)
$$ u_{0} = - B_{13} f_{x}^{2} /\uplambda $$
(4.21)
$$ v_{0} = (B_{12} B_{13} - B_{11} B_{23} )/(B_{11} B_{22} - B_{12}^{2} ) $$
(4.22)
$$ \uplambda = B_{33} - \left[ {B_{13}^{2} + v_{0} (B_{12} B_{13} - B_{11} B_{23} )} \right]/B_{11} $$
(4.23)
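Assuming zero skew, as in this chapter, Eqs. 4.17–4.23 can be sketched as follows; H_list is assumed to hold the homographies estimated from the K images.

```python
import numpy as np

def v_ij(H, i, j):
    """Vector v_ij of Eq. 4.17, built from columns i and j of H (0-based)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def intrinsics_from_homographies(H_list):
    """Stack Vb = 0 (Eq. 4.18) over all images and apply Eqs. 4.19-4.23."""
    V = []
    for H in H_list:
        V.append(v_ij(H, 0, 1))                      # constraint Eq. 4.13
        V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))      # constraint Eq. 4.14
    _, _, Vt = np.linalg.svd(np.asarray(V))
    b = Vt[-1]
    if b[0] < 0:                  # b is defined only up to sign; make B11 > 0
        b = -b
    B11, B12, B22, B13, B23, B33 = b
    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12 ** 2)            # Eq. 4.22
    lam = B33 - (B13 ** 2 + v0 * (B12 * B13 - B11 * B23)) / B11      # Eq. 4.23
    fx = np.sqrt(lam / B11)                                          # Eq. 4.19
    fy = np.sqrt(lam * B11 / (B11 * B22 - B12 ** 2))                 # Eq. 4.20
    u0 = -B13 * fx ** 2 / lam                                        # Eq. 4.21
    return fx, fy, u0, v0
```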

The external parameters are then computed from the homography and \(\user2{M}_1\):

$$ r_{1} =\uplambda\user2{M}_{1}^{ - 1} h_{1} $$
(4.24)
$$ r_{2} =\uplambda\user2{M}_{1}^{ - 1} h_{2} $$
(4.25)
$$ r_{3} = r_{1} \times r_{2} $$
(4.26)
$$ t =\uplambda\user2{M}_{1}^{ - 1} h_{3} $$
(4.27)
$$ \uplambda = 1/\left\| {\user2{M}_{1}^{ - 1} h_{1} } \right\| $$
(4.28)

However, if \(r_1\), \(r_2\), \(r_3\) are simply assembled into a rotation matrix R, a large error may result, because in practice R is not exactly orthonormal; that is, \(R^TR = RR^T = I\) does not hold.

To solve this problem, we use the singular value decomposition (SVD) \(R = UDV^T\), where U and V are orthogonal matrices and D is a diagonal matrix. Because the columns of a true rotation matrix are orthonormal, D should be the identity matrix I, so that \(R = UIV^T\). We therefore compute the SVD of R, set D to the identity matrix, and multiply U and \(V^T\) to obtain the rotation matrix R′ that satisfies the requirement.
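A sketch of Eqs. 4.24–4.28 together with the SVD-based correction of R described above (here \(\user2{M}_1\) denotes the 3 × 3 intrinsic matrix, i.e. the left 3 × 3 block of the matrix in Eq. 4.6):

```python
import numpy as np

def extrinsics_from_homography(M1, H):
    """Recover R and t from a homography H and the 3x3 intrinsic matrix M1
    (Eqs. 4.24-4.28), then project R onto the nearest rotation matrix."""
    M1_inv = np.linalg.inv(M1)
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    lam = 1.0 / np.linalg.norm(M1_inv @ h1)    # Eq. 4.28
    r1 = lam * M1_inv @ h1                     # Eq. 4.24
    r2 = lam * M1_inv @ h2                     # Eq. 4.25
    r3 = np.cross(r1, r2)                      # Eq. 4.26
    t = lam * M1_inv @ h3                      # Eq. 4.27
    R = np.column_stack([r1, r2, r3])
    # Because of noise, R is generally not exactly orthonormal; replacing the
    # singular values with 1 yields the closest true rotation matrix R'.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```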

4.3.2 Maximum Likelihood Estimation

After the parameters are solved, we refine them with the Levenberg-Marquardt algorithm. The evaluation function is expressed as:

$$ C = \sum\limits_{i = 1}^{N} {\sum\limits_{j = 1}^{K} {\left\| {m_{ij} - m(\user2{M}_{1} ,\user2{R}_{i} ,\user2{t}_{i} ,\user2{M}_{ij} )} \right\|^{2} } } $$
(4.29)

where N is the total number of images, K is the number of points in each image, \(m_{ij}\) is the detected image pixel coordinate, \(\user2{M}_{ij}\) is the world coordinate, and \(m(\user2{M}_1, \user2{R}_i, \user2{t}_i, \user2{M}_{ij})\) is the image pixel coordinate computed from the estimated parameters.
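One possible sketch of this refinement for a single image uses scipy.optimize.least_squares; the parameterization below, with a Rodrigues rotation vector, is an assumption made for brevity and is not specified in the chapter.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, obj_points, img_points):
    """Residuals of the cost function above for a single image.

    params = [fx, fy, u0, v0, rx, ry, rz, tx, ty, tz]; the rotation is
    parameterized as a Rodrigues vector (an assumption, not from the chapter).
    obj_points: (K, 3) world coordinates; img_points: (K, 2) detected pixels.
    """
    fx, fy, u0, v0 = params[:4]
    R = Rotation.from_rotvec(params[4:7]).as_matrix()
    t = params[7:10]
    cam = obj_points @ R.T + t                 # world -> camera coordinates
    u = fx * cam[:, 0] / cam[:, 2] + u0        # pinhole projection (Eq. 4.6)
    v = fy * cam[:, 1] / cam[:, 2] + v0
    return np.concatenate([u - img_points[:, 0], v - img_points[:, 1]])

# Usage sketch: refine an initial estimate x0 with Levenberg-Marquardt.
# result = least_squares(reprojection_residuals, x0, method="lm",
#                        args=(obj_points, img_points))
```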

4.3.3 Solving Camera Distortion

Camera distortion has not been dealt with so far. We take the camera's internal and external parameters obtained above, with all distortion coefficients set to zero, as initial values. Let \((x_p, y_p)\) be the ideal (undistorted) location of a point and \((x_d, y_d)\) its distorted location; then

$$ \left[ {\begin{array}{*{20}c} {x_{p} } \\ {y_{p} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {f_{x} X_{C} /Z_{C} + u_{0} } \\ {f_{y} Y_{C} /Z_{C} + v_{0} } \\ \end{array} } \right] $$
    (4.30)

Combining this with Eq. 4.7, we get

$$ \left[ {\begin{array}{*{20}c} {x_{p} } \\ {y_{p} } \\ \end{array} } \right] = (1 + k_{1} r^{2} + k_{2} r^{4} + k_{3} r^{6} )\left[ {\begin{array}{*{20}c} {x_{d} } \\ {y_{d} } \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} {2p_{1} x_{d} y_{d} + p_{2} (r^{2} + 2x_{d}^{2} )} \\ {p_{1} (r^{2} + 2y_{d}^{2} ) + 2p_{2} x_{d} y_{d} } \\ \end{array} } \right] $$
(4.31)

where \(r^2 = x_d^2 + y_d^2\). Collecting these equations over many points gives a large system, which is solved to obtain the distortion coefficients.
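Because the relation above is linear in \(k_1, k_2, k_3, p_1, p_2\) once the ideal and distorted point locations are known, the coefficients can be estimated by ordinary least squares. A rough sketch, with hypothetical arrays ideal and distorted holding corresponding points:

```python
import numpy as np

def solve_distortion(ideal, distorted):
    """Least-squares estimate of (k1, k2, k3, p1, p2) from corresponding points.

    ideal:     (K, 2) undistorted locations (x_p, y_p) predicted by the pinhole model
    distorted: (K, 2) observed distorted locations (x_d, y_d)
    """
    xd, yd = distorted[:, 0], distorted[:, 1]
    r2 = xd ** 2 + yd ** 2
    # Rows for the x and y components of the distortion relation.
    A_x = np.column_stack([xd * r2, xd * r2 ** 2, xd * r2 ** 3,
                           2 * xd * yd, r2 + 2 * xd ** 2])
    A_y = np.column_stack([yd * r2, yd * r2 ** 2, yd * r2 ** 3,
                           r2 + 2 * yd ** 2, 2 * xd * yd])
    lhs = np.vstack([A_x, A_y])
    rhs = np.concatenate([ideal[:, 0] - xd, ideal[:, 1] - yd])
    coeffs, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return coeffs          # k1, k2, k3, p1, p2
```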

4.4 Analysis of Experimental Results

4.4.1 Error Assessment Method

It is difficult to judge whether camera calibration results are accurate, since there is no fully objective criterion. One approach is to take the world coordinate \((X_W, Y_W, Z_W)\) of a 3D point P and the camera parameter matrices, compute the back-projected value \( (\tilde{u},\tilde{v}) \) by matrix multiplication, and compare it with the actually detected value (u, v); from this we obtain the average error \(E_{uv}\), the standard deviation \(e_{uv}\), and the maximum error e in the image pixel coordinate system.

$$ E_{uv} = \frac{{\sum\limits_{i = 1}^{K} {(\left| {\widetilde{{u_{i} }} - u_{i} } \right| + \left| {\widetilde{{v_{i} }} - v_{i} } \right|)} }}{K} $$
(4.32)
$$ e_{uv} = \sqrt {\frac{{\sum\limits_{i = 1}^{K} {(\left| {\widetilde{{u_{i} }} - u_{i} } \right|^{2} + \left| {\widetilde{{v_{i} }} - v_{i} } \right|^{2} )} }}{K}} $$
(4.33)
$$ e = \mathop {\hbox{max} }\limits_{1 \le i \le K} \left( {\left| {\widetilde{{u_{i} }} - u_{i} } \right| + \left| {\widetilde{{v_{i} }} - v_{i} } \right|} \right) $$
(4.34)
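The three error measures above can be computed directly from the detected and back-projected pixel coordinates, for example:

```python
import numpy as np

def reprojection_errors(u, v, u_hat, v_hat):
    """Average error, deviation, and maximum error (in pixels), as defined above.

    u, v:         detected pixel coordinates (1-D arrays of equal length)
    u_hat, v_hat: back-projected pixel coordinates
    """
    du, dv = np.abs(u_hat - u), np.abs(v_hat - v)
    E_uv = np.mean(du + dv)                    # average error
    e_uv = np.sqrt(np.mean(du ** 2 + dv ** 2)) # standard deviation as defined above
    e = np.max(du + dv)                        # maximum error
    return E_uv, e_uv, e
```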

4.4.2 The Introduction of Experiment System

The calibration template is a checkerboard with nine corners along its length and six along its width, giving 54 corner points in total. Each black square is 20 mm × 20 mm. The first corner detected in each image is taken as the origin of the world coordinate system, and the world coordinates of the other corners are computed from the known square size.
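For reference, the corner detection and the world coordinates of the 54 corners for this 9 × 6, 20 mm checkerboard could be prepared with OpenCV roughly as follows; the use of OpenCV and the file name are assumptions for illustration, since the chapter does not state which corner detector was used.

```python
import numpy as np
import cv2

PATTERN = (9, 6)          # inner corners along length and width
SQUARE_MM = 20.0          # size of each checkerboard square

# World coordinates of the 54 corners: Z_W = 0 on the template plane,
# origin at the first detected corner.
object_points = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
object_points[:, :2] = (np.mgrid[0:PATTERN[0], 0:PATTERN[1]]
                        .T.reshape(-1, 2) * SQUARE_MM)

gray = cv2.imread("calib_01.bmp", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
found, corners = cv2.findChessboardCorners(gray, PATTERN)
if found:
    # Refine the detected corner locations to sub-pixel accuracy.
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
```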

A miniature industrial camera is adopted, with a resolution of 640 × 480 pixels, a focal length of 12 mm, a 1/3″ sensor, and a pixel pitch of 3.2 μm. The flow chart of the camera calibration algorithm is shown in Fig. 4.3.

Fig. 4.3 The flow chart of camera calibration

4.4.3 Analysis of Experimental Results

Calibration is performed with 5, 10, 15 and 20 images respectively, and Eqs. 4.32–4.34 are used to compute the average error, standard deviation and maximum error (see Fig. 4.4).

Fig. 4.4 The scatter plot of back-projection error

As the number of calibration images increases, the average error and standard deviation gradually decrease and stabilize at a certain value, so at least 20–25 images should be used. To make the errors easier to compare, they are shown as a scatter plot (see Fig. 4.4).

As an example of the per-image results, the rotation matrix and translation vector of the 13th image are:

$$ \begin{aligned} \user2{R} & = \left[ {\begin{array}{*{20}c} { - 0.614031} & {0.789249} & { - 0.007166} \\ {0.766653} & {0.598560} & {0.232309} \\ {0.187639} & {0.137151} & { - 0.972615} \\ \end{array} } \right] \\ \user2{t} & = \left[ {\begin{array}{*{20}c} { - 99.62857} & { - 138.42137} & {1409.07778} \\ \end{array} } \right] \\ \end{aligned} $$

To verify the accuracy of the calibration results, the 20 images are also calibrated with the MATLAB calibration toolbox (see Tables 4.1 and 4.2).

Table 4.1 The error comparison of calibration precision
Table 4.2 The comparison of calibration results

4.5 Conclusion

Experiments show that Zhang Zhengyou's planar calibration method not only places low demands on the experimental equipment, requiring just a camera and a planar calibration template, but also achieves high precision; it can be regarded as a transition between traditional calibration and self-calibration. Under simple experimental conditions we can accurately obtain the camera parameters and then use the theory of binocular vision to compute depth of field. At the same time, our software implementation is simpler than the MATLAB toolbox and does not need manual extraction of corner points.