1 Introduction

Consumer-level RGBD cameras such as the Kinect [1] have become popular in computer vision in recent years, and a large number of Kinect performance tests have been carried out [2,3,4]. The Kinect is applicable to many fields, such as robotics and 3D reconstruction [5]. Recently, with the decrease in thermal camera prices, infrared imaging has become a useful tool in computer vision, and the idea of combining an RGBD camera with a thermal camera has become more popular. More and more multi camera systems combining a Kinect and a thermal camera are being designed to perform 3D reconstruction of a thermal scene [6]. The 3D reconstruction of a thermal scene can be applied to building inspection [7] and fire rescue [8].

Geometric calibration and temporal calibration are preconditions for a multi camera system; without them, the image information from the different cameras cannot be integrated. An infrared image represents the temperature information of the objects in the scene, so traditional calibration boards cannot be used to calibrate thermal cameras. In this paper, we design a novel calibration board to calibrate RGBD and thermal cameras. Temporal calibration means synchronizing the scene images from different cameras; without it, the scene information becomes confusing. We adopt the method of nearest adjacent time to synchronize the RGBD and thermal cameras. The contributions of this paper are as follows:

  1. We designed a novel calibration board which can calibrate both RGBD and thermal cameras, and we compared the advantages and disadvantages of the circle and checkerboard calibration patterns.

  2. We considered the effect of the interruptions caused by the non-uniformity corrections of the thermal camera on synchronization, and we adopted the method of nearest adjacent time, based on timestamps, to synchronize the cameras.

The rest of the paper is organized as follows. Sect. 2 provides an overview of prior work on camera calibration and synchronization of multi camera systems. Sects. 3 and 4 introduce our methods of camera calibration and synchronization in detail. Sect. 5 presents experiments and evaluation, and Sect. 6 concludes with future work.

2 Related Work

Camera calibration is a core problem in the fields of 3D reconstruction and robot navigation [9]. The most widely accepted calibration method is the strategy of Zhang [10]. Traditional calibration boards usually use printed chessboards, whose calibration points are easily located in a visible image but cannot be accurately located in an infrared image. This is because the thermal camera mainly acquires the temperature information of objects, and a printed chessboard generally maintains a near-uniform temperature with low contrast. In recent years, many studies have designed different calibration boards in order to calibrate thermal cameras. In [11,12,13], the authors cut rectangular holes through a board and placed it in front of objects at a different temperature. This method can calibrate both RGBD and thermal cameras, but the thickness of the holes results in inaccurate calibration. In [14], a thermal calibration rig with 42 small LED lights located on the intersections of a conventional checkerboard was designed, which can calibrate thermal cameras when the LED lights are on. However, the light affects the precise positioning of feature points in infrared images. Kim [15] presented a line-based grid pattern board, which consists of a line-based grid of regularly sized squares. Michael [16] adopted a black and white checkerboard with one resistor mounted in the center of each checkerboard square. In [17], the authors used a rubber heater to warm up a plastic mask with a grid of holes for thermal camera calibration. It worked well, but the calibration board is relatively expensive. We propose a novel calibration board made of aluminum foil and cardboard, which is easy to make and low in cost. Besides, the calibration board only needs to be heated for about one minute with a hair dryer before calibrating the cameras. We designed square and circle pattern calibration boards and carried out experiments to study which one is more accurate.

The process of assigning corresponding timestamps to all images from all sensors of a multi camera system is called camera temporal calibration, or synchronization. Many previous studies adopted hardware to solve synchronization. Soonmin [18] used the master-slave technique to synchronize visible and thermal cameras, where trigger signals are sent from the master to the slaves. In [19], the authors adopted a beam splitter to synchronize cameras. Hardware synchronization yields ideal results, but most cameras do not have trigger generators or beam splitters, so the cost of synchronization increases. Some other works such as [20] deal with extrinsic calibration and synchronization jointly.

Previous works have taken little account of the impact of non-uniformity corrections of thermal cameras on synchronization. Most thermal cameras perform regular non-uniformity corrections, which interrupt the data stream for about 1.5 to 2 s. To overcome this problem, we adopt the method of nearest adjacent time, which can quickly synchronize the cameras and handles the interruptions caused by non-uniformity corrections.

3 Calibration

3.1 Intrinsic Calibration

The pinhole model is one of the simplest camera models; it mathematically describes the projection of points in 3D space onto an image plane. \( P \) is a point in 3D space, and \( p \) is its perspective projection onto the 2D image plane. In homogeneous coordinates, the 3D world point \( P \) and its projection \( p \) can be represented as:

$$ P = \left[ {\begin{array}{*{20}c} X & Y & Z & 1 \\ \end{array} } \right] $$
(1)
$$ p = \left[ {\begin{array}{*{20}c} u & v & 1 \\ \end{array} } \right] $$
(2)

The geometric relation between \( P \) and \( p \) can be written as:

$$ z\left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {f_{x} } & 0 & {u_{0} } & 0 \\ 0 & {f_{y} } & {v_{0} } & 0 \\ 0 & 0 & 1 & 0 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} R & T \\ {0^{T} } & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} X \\ Y \\ Z \\ 1 \\ \end{array} } \right] $$
(3)

where \( z \) is the distance from the point \( P \) to the camera along the optical axis, and \( R \) and \( T \) are the rotation matrix and translation vector relating the world coordinate system to the camera coordinate system. The intrinsic matrix consists of the focal lengths \( f_{x} ,\,\,f_{y} \) and the principal point \( \left( {u_{0} ,v_{0} } \right) \).
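As a concrete illustration of Eq. (3), the following minimal Python sketch projects a 3D point onto the image plane; the intrinsic values are arbitrary examples, not calibration results from this paper.

```python
import numpy as np

def project_point(P_world, K, R, T):
    """Project a 3D world point onto the image plane following Eq. (3)."""
    P_cam = R @ P_world + T          # world -> camera coordinates
    z = P_cam[2]                     # distance along the optical axis
    uv = (K @ P_cam) / z             # perspective division
    return uv[:2], z                 # pixel coordinates (u, v) and depth z

# Illustrative intrinsics (fx, fy, u0, v0 chosen arbitrarily)
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.zeros(3)        # camera aligned with the world frame
uv, z = project_point(np.array([0.1, 0.2, 1.5]), K, R, T)
```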

Traditional calibration boards cannot calibrate the thermal camera. The thermal camera acquires scene information from the infrared radiation reflected and emitted by objects. Materials with low emissivity, such as aluminum foil, reflect most of the incident infrared radiation and emit little of their own, while high-emissivity materials such as cardboard radiate according to their own temperature. Therefore we can calibrate the thermal camera with a calibration board made of materials with contrasting emissivities. We use the second generation Kinect as the RGBD camera. The second generation Kinect contains both a color camera and a near-infrared camera, and the spatial depth information is derived from the near-infrared camera. This means that the depth camera and the near-infrared camera are the same sensor, so calibrating the depth camera is equivalent to calibrating the near-infrared camera.

In this paper, we propose two calibration pattern boards that can calibrate both thermal cameras and RGBD cameras: a square planar pattern and a circle pattern. The square planar pattern calibration board consists of white cardboard squares and aluminum foil squares, made by sticking aluminum foil onto white cardboard squares. The circle pattern calibration board is made of circular aluminum foil patches on a thick white cardboard. Both boards are practical, convenient and low in cost. In order to determine which pattern yields higher calibration accuracy, tests were carried out using OpenCV. Figure 1 shows the two types of calibration boards and the calibration results. The calibration accuracy can be analyzed by computing the mean reprojection error (MRE), which is the distance, in pixels, between a detected point and its reprojected point in an image.
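The following OpenCV sketch outlines how such a test can be run for the circle pattern and how the MRE is computed; the board geometry (a 4x11 asymmetric circle grid with 20 mm spacing) and the image folder are illustrative assumptions, not the exact setup used in the paper.

```python
import glob
import numpy as np
import cv2

# Assumed board geometry (illustrative): 4x11 asymmetric circle grid, 20 mm spacing
pattern_size = (4, 11)
objp = np.array([[(2 * j + i % 2) * 0.02, i * 0.02, 0.0]
                 for i in range(pattern_size[1])
                 for j in range(pattern_size[0])], dtype=np.float32)

obj_points, img_points = [], []
for path in sorted(glob.glob("calib/*.png")):        # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, centers = cv2.findCirclesGrid(gray, pattern_size,
                                         flags=cv2.CALIB_CB_ASYMMETRIC_GRID)
    if found:
        obj_points.append(objp)
        img_points.append(centers)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Mean reprojection error (MRE) in pixels over all detected calibration points
errs = []
for op, ip, rv, tv in zip(obj_points, img_points, rvecs, tvecs):
    proj, _ = cv2.projectPoints(op, rv, tv, K, dist)
    errs.append(np.linalg.norm(ip.reshape(-1, 2) - proj.reshape(-1, 2), axis=1))
mre = float(np.mean(np.concatenate(errs)))
```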

Fig. 1.

Calibration boards and the results of calibration. First column: square and circle planar pattern calibration boards. Second column: RGB camera. Third column: near-infrared (depth) camera. Last column: thermal camera.

Table 1 shows a comparison of the proposed calibration patterns and the pattern presented by Skala et al. [13]. From Table 1 we can see that our method is more accurate than the method of Skala et al. Moreover, the MRE of the circle pattern is lower than that of the square pattern, which suggests that the circle pattern is more accurate.

Table 1. The MRE of different calibration patterns.

The main factor affecting calibration precision is the accurate localization of feature points, which is difficult when the calibration board is not fronto-parallel to the camera imaging plane [21]. The calibration board is usually not parallel to the imaging plane, so a square in the photo becomes a diamond and a circle becomes an ellipse. The algorithm for detecting the center of a circle consists of three steps. First, the input image is converted into a set of binary images using a series of increasing threshold values. Second, the connected regions of each binary image are extracted by detecting their boundaries; each connected region is a blob of the corresponding binary image. Finally, each blob is fitted to a circle and the center of the circle is calculated. The same procedure can detect the center of an ellipse by fitting the ellipse edge. The method is simple and efficient for detecting the center of a circle or an ellipse. For the square pattern, the algorithm detects the corners of a chessboard; when a square in the photo becomes a diamond, the localization error of the feature points becomes larger. According to Table 1, the calibration accuracy of the circle pattern is higher than that of the square pattern, so we adopted the circle pattern calibration board to calibrate the cameras.
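OpenCV's SimpleBlobDetector implements essentially this multi-threshold blob extraction; the sketch below shows an illustrative configuration (the parameter values and file name are assumptions). The resulting detector can also be passed to findCirclesGrid as the blob detector for the circle pattern.

```python
import cv2

# SimpleBlobDetector follows the steps described above: multiple binarizations,
# boundary-based blob extraction, and center fitting.
params = cv2.SimpleBlobDetector_Params()
params.minThreshold = 50           # first binarization threshold (illustrative)
params.maxThreshold = 220          # last binarization threshold
params.thresholdStep = 10          # step between successive thresholds
params.filterByArea = True
params.minArea = 100               # reject tiny blobs caused by noise
params.filterByCircularity = True
params.minCircularity = 0.6        # accept ellipses produced by perspective distortion

detector = cv2.SimpleBlobDetector_create(params)
gray = cv2.imread("thermal_board.png", cv2.IMREAD_GRAYSCALE)  # hypothetical image
keypoints = detector.detect(gray)                  # blob keypoints
centers = [kp.pt for kp in keypoints]              # circle/ellipse centers in pixels
```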

3.2 Stereo Calibration of Thermal and RGBD Cameras

In order to fuse information from the Kinect and the thermal camera, they must be placed in a common geometric coordinate system. The process of acquiring the rotation and translation relationships between different cameras is known as stereo calibration. The calibration board presented in this paper can calibrate the thermal camera, the RGB camera and the depth camera, so we are able to carry out calibration-based stereo calibration.

According to the pinhole projection model, a spatial point in the scene is projected onto the camera image plane, and an infinite number of spatial points project onto the same point on the image plane. All these spatial points lie on a straight line through the camera center. Points on the camera image plane cannot be reprojected into the spatial scene because the distance information is lost. However, the pixels of a depth image represent the distance between the spatial object and the depth camera, so the depth information can be used to reproject points from the image plane back into the spatial scene.

$$ z_{d} p_{depth} = K_{depth} \left[ {\begin{array}{*{20}c} {R_{depth} } & {t_{depth} } \\ \end{array} } \right]P $$
(4)
$$ z_{t} p_{thermal} = K_{thermal} \left[ {\begin{array}{*{20}c} {R_{thermal} } & {t_{thermal} } \\ \end{array} } \right]P $$
(5)

where \( P \) is a point in the 3D scene, \( p_{depth} \) and \( p_{thermal} \) are the projections of \( P \) onto the image planes of the depth camera and the thermal camera, \( K_{depth} \) and \( K_{thermal} \) are the intrinsic matrices of the depth camera and the thermal camera, and \( \left[ {\begin{array}{*{20}c} {R_{depth} } & {t_{depth} } \\ \end{array} } \right] \) and \( \left[ {\begin{array}{*{20}c} {R_{thermal} } & {t_{thermal} } \\ \end{array} } \right] \) are their extrinsic matrices, each consisting of a rotation matrix and a translation vector. \( z \) represents the distance from \( P \) to the camera in the corresponding camera coordinate system. The point \( P_{depth} \), i.e. \( P \) expressed in the depth camera coordinate system, can be written as:

$$ P_{depth} = z_{d} K_{depth}^{ - 1} p_{depth} = R_{depth} P + t_{depth} $$
(6)

The spatial position relationship between the thermal camera and the depth camera can be written as:

$$ P_{thermal} = RP_{depth} + T $$
(7)

where \( R \) is the rotation and \( T \) the translation between the two cameras. The following equation can be obtained by combining Eqs. (4), (5) and (6).

$$ P_{thermal} = R_{thermal} R_{depth}^{ - 1} P_{depth} + t_{thermal} - R_{thermal} R_{depth}^{ - 1} t_{depth} $$
(8)

By comparing Eqs. (7) and (8), \( R \) and \( T \) can be obtained:

$$ R = R_{thermal} R_{depth}^{ - 1} $$
(9)
$$ T = t_{thermal} - Rt_{depth} $$
(10)

From Eqs. (9) and (10), we conclude that we only need to obtain the extrinsic matrices of the thermal camera and the depth camera with respect to the same scene, and the transformation between the two cameras can then be calculated. Similarly, the relationship between the visible and depth cameras can be obtained with the same method.
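A minimal numpy sketch of Eqs. (6), (7), (9) and (10) is given below; the function and variable names are ours, and \( z \) is assumed to be the metric depth read from the depth image.

```python
import numpy as np

def stereo_from_extrinsics(R_depth, t_depth, R_thermal, t_thermal):
    """Relative pose between the depth and thermal cameras from their
    extrinsics with respect to the same board pose (Eqs. (9) and (10))."""
    R = R_thermal @ np.linalg.inv(R_depth)        # Eq. (9)
    T = t_thermal - R @ t_depth                   # Eq. (10)
    return R, T

def depth_pixel_to_thermal(u, v, z, K_depth, K_thermal, R, T):
    """Back-project a depth pixel with depth z (Eq. (6)), transfer it to the
    thermal camera frame (Eq. (7)), and reproject it onto the thermal image."""
    P_depth = z * np.linalg.inv(K_depth) @ np.array([u, v, 1.0])
    P_thermal = R @ P_depth + T
    p = K_thermal @ P_thermal
    return p[:2] / p[2]                           # pixel coordinates in the thermal image
```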

4 Synchronization

Synchronization is essential for a multi camera system to integrate information from different sensors. The authors in [12] adopted motion statistics alignment to synchronize cameras: they used TV-L1 optical flow to calculate the average flow vector magnitude of the RGBD and thermal image streams. However, this cannot be used with low-frame-rate cameras or with thermal cameras that perform non-uniformity corrections (NUCs). In general, most thermal cameras need regular intensity corrections, known as NUCs, in order to improve the accuracy of the measured infrared radiation. When the thermal camera performs a NUC, a baffle of uniform temperature temporarily blocks the imaging sensor, which interrupts the data stream for about 1.5 to 2 s. NUCs are important for thermal imaging: without them, the pixel values of thermal images would not represent an accurate estimate of the infrared radiance in the scene. In addition, the sampling frequencies of the Kinect and the thermal camera are different, so a thermal frame may have no exactly matching depth frame.

In order to solve the above problems, we improved the method from [22]. We use the nearest adjacent time between a thermal frame and a depth frame to quickly find corresponding frames between the Kinect and the thermal camera. The RGB and depth image frames from the Kinect are already synchronized, so we only need to synchronize the depth and thermal cameras. We first obtain the timestamps of the depth and thermal frame streams \( \left\{ {t_{n}^{d} ,n = 1,2,3 \cdots } \right\} \) and \( \left\{ {t_{m}^{th} ,m = 1,2,3 \cdots } \right\} \). We then calculate the time difference between depth frames and the corresponding thermal frames:

$$ \Delta t = \hbox{min} \left\{ {\left| {t_{n - 1}^{d} - t_{m}^{th} } \right|,\left| {t_{n}^{d} - t_{m}^{th} } \right|,\left| {t_{n + 1}^{d} - t_{m}^{th} } \right|} \right\} $$
(11)

The frame rate of the Kinect is about 30 frames per second and that of the thermal camera is about 20 frames per second, so each thermal frame may correspond to more than one depth frame. Considering this, we select the three depth frames adjacent to the thermal frame and keep the minimum time difference as \( \Delta t \). The threshold \( \sigma \) is half of the inter-frame interval of the camera with the slowest frame rate, so \( \sigma \) is set to 25 ms. If \( \Delta t \) is less than the threshold \( \sigma \), the depth frame and the corresponding thermal frame are considered synchronized and both frames are kept; otherwise, they are temporarily abandoned. Because of the NUCs of the thermal camera, many depth frames have no corresponding thermal frame during a NUC. When \( \Delta t \) exceeds 1.5 s, we consider that the thermal camera is carrying out a NUC; the depth frame is abandoned and the thermal frame is kept to be matched against subsequent depth frames. If \( \Delta t \) is greater than 25 ms but no more than 1.5 s, the corresponding thermal and depth frames are both abandoned.
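A simplified Python sketch of this matching rule is given below, assuming timestamps in seconds; the exact index bookkeeping of our implementation may differ.

```python
def synchronize(depth_ts, thermal_ts, sigma=0.025, nuc_gap=1.5):
    """Nearest adjacent time matching of depth and thermal timestamps (seconds)."""
    pairs, n = [], 0
    for m, t_th in enumerate(thermal_ts):
        # advance to the depth frame closest in time to this thermal frame
        while n + 1 < len(depth_ts) and abs(depth_ts[n + 1] - t_th) <= abs(depth_ts[n] - t_th):
            n += 1
        # Eq. (11): minimum difference over the adjacent depth frames n-1, n, n+1
        candidates = [i for i in (n - 1, n, n + 1) if 0 <= i < len(depth_ts)]
        i_best = min(candidates, key=lambda i: abs(depth_ts[i] - t_th))
        dt = abs(depth_ts[i_best] - t_th)
        if dt <= sigma:
            pairs.append((i_best, m))   # synchronized depth/thermal pair: keep both frames
        elif dt > nuc_gap:
            pass                        # gap caused by a NUC: unmatched depth frames are
                                        # dropped; this thermal frame waits for later depth frames
        # sigma < dt <= nuc_gap: both frames are discarded
    return pairs
```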

5 Experiment and Evaluation

In our experiment, we adopted the second generation Kinect as the RGBD camera and a FLIR T420 as the thermal camera. We first carried out the camera calibration experiment, including single camera calibration and stereo camera calibration. For the visible camera, the calibration needs to be carried out in a dim environment, with a light source arranged on one side of the calibration board. After obtaining the intrinsic and extrinsic parameters of the cameras, we registered the depth images with the infrared images and the RGB images. Figure 2 shows the registration results, which are satisfactory and indicate that an accurate camera calibration can be obtained with our calibration board.

Fig. 2.

The results of registration. First row: left column is original infrared images, middle column is original depth images, right column is registered images. Second row: left column is original color images, middle column is original depth images, right column is registered images.

We then evaluated the efficacy of synchronization using the thermal and depth image streams. Figure 3 shows the offset between the thermal camera and the depth camera, and Fig. 4 shows the synchronization results. We selected a key frame as the start frame. In Fig. 3, the offset increases over time because of the different frame rates of the two cameras, and two pulses in the curve correspond to NUC operations of the thermal camera. In Fig. 4, the depth frames that fall within the NUC periods are abandoned. Comparing Figs. 3 and 4 shows that the nearest adjacent time method synchronizes the streams effectively.

Fig. 3.

The offset between the thermal camera and the depth camera. Left image shows the timestamps of two cameras. Right image shows the time difference of thermal frames and depth frames.

Fig. 4.

The results of synchronization using our method. First row: left image shows the result of depth frames after synchronization; right image shows the result of thermal frames after synchronization. Second row shows the time difference between thermal frames and depth frames after synchronization.

6 Conclusion

In this paper we presented methods for the geometric and temporal calibration of the Kinect and a thermal camera. We designed a new calibration board which can be used to calibrate the thermal camera, the depth camera and the RGB camera at the same time. Compared with existing methods that are costly or complex to operate, our calibration board is easy to produce and use: it consists only of aluminum foil and cardboard, so its manufacturing cost is lower than that of other calibration boards. In addition, the nearest adjacent time method we adopted can synchronize the image streams of a multi camera system; by calculating time differences and eliminating invalid frames, it quickly handles the interruptions caused by NUCs.

In the future, we will focus on the fusion of depth, thermal and RGB data from the Kinect and the thermal camera. Besides, we will study indoor three-dimensional temperature field reconstruction and explore a real-time reconstruction system using the Kinect and the thermal camera.