
1 Introduction

Blind spots are one of the major causes of car accidents. Helping drivers perceive the situation around the vehicle while driving is the main goal of intelligent driving assistance systems. Toward this goal, many kinds of driving assistance systems have been developed. For example, parking sensors measure the distances from obstacles to the vehicle and warn the driver with beepers. Automatic braking systems can stop or slow down the vehicle to prevent collisions. Rear-view cameras capture the view behind the vehicle and help drivers back up. Because camera prices keep dropping, it has become practical to mount multiple cameras around the car and provide drivers with a surrounding view of the vehicle for better perception of the situation around it.

Ehlgen and Pajdla [1] proposed to use four omnidirectional cameras around a truck. They developed several ways to split the overlap area and provided a bird's-eye view image. However, their integrated results are discontinuous at the seams between different cameras. In [2], Liu et al. mounted six fisheye cameras around the vehicle and stitched all six images together to provide a bird's-eye view of the vehicle surrounding environment. In their method, they applied one-dimensional stitching to improve the discontinuity at the seams, at the expense of a large computational cost. Recently, Nissan [3] developed an assistance system, Around View Monitor, which can render the surrounding view of the vehicle by using four wide-angle cameras. This system provides drivers with both the aerial view and the original images captured by the four cameras. The Eagle Eye system from Luxgen [4] and the Multi-View Camera System from Honda [5] used similar approaches to provide drivers with better visual perception of the surroundings. These two systems project the captured and undistorted images onto a ground plane and leave four black seams between adjacent cameras. Delphi Automotive also proposed a parking system, the \(360^\circ \) Surround View System with Parking Guidance [6], which applies a blending algorithm to provide smooth visual results from the aerial view. In driving situations such as lane changing and passing, however, the visual information from the aerial view alone may be inadequate for drivers. Fujitsu also released a system, the \(360^\circ \) wrap-around video imaging system [7], which projects the acquired images onto a 3D curved plane and provides drivers with a third-person view.

In this paper, we propose a monitoring system for driving assistance; its flowchart is illustrated in Fig. 1. First, we placed a fisheye camera on a rotating device controlled by a stepping motor, as shown in Fig. 4(a), to automatically calibrate the camera. Next, we mounted four such fisheye cameras around the vehicle and acquired distorted images from them. We then applied Zhang's method [8] to estimate the intrinsic parameters of each camera. Once the intrinsic parameters were obtained, the distorted images were dewarped into perspective ones and mapped onto the ground plane by a homography transformation. The extrinsic parameters of each camera were then calculated from its homography matrix and intrinsic parameters. Finally, we projected the undistorted images onto the proposed hybrid model and rendered the images from a driver-selected viewpoint. We used a look-up table approach so that the whole image mapping process finishes in real time.

Fig. 1. Flowchart of the proposed system.

2 Correction of Image Distortion and Estimation of Intrinsic Parameters

This section describes the proposed methods for camera calibration and image dewarping. Because the field of view of the fisheye lens we adopted is very large (\(183^\circ \)), the widely used FOV (field of view) fisheye lens model cannot fit it well. Therefore, instead of a full calibration procedure that determines the complete set of camera parameters, we developed a simple method to dewarp fisheye camera images according to the relationship between the incident angle of a point and its image formation distance. Radial distortion of a fisheye lens causes an inward or outward displacement of a given image point from its ideal location. There are many existing models for the calibration of fisheye cameras [9–12]. Devernay and Faugeras [13] used the FOV model to calibrate fisheye cameras, based on the idea that the distance between the ideal image point and the principal point is roughly proportional to the angle of incidence, as shown in Fig. 2(b). However, the FOV model is not suitable for all fisheye cameras. Figure 2(a) shows the difference between the perspective model, the FOV model, and our measurement. As described in Sect. 2.3, we conducted a simple experiment to determine how the radial distortion changes as the incident angle increases. In Sect. 2.4, we describe how to apply Zhang's method [8] to obtain the intrinsic parameters of the virtual camera, which is the undistorted counterpart of the fisheye camera. Combined with the homography matrix, these allow us to estimate the extrinsic parameters of the virtual camera, as described in Sect. 3.

Fig. 2. (a) Relationship between incident angle and displacement of image point for the perspective model, the FOV model, and a real fisheye camera. (b) Concept of the FOV model.

2.1 Aspect Ratio of Pixel

Because the pixels of the fisheye camera are not square, the circular border of the fisheye lens may form an elliptic shape in the captured image. The first step of camera calibration is therefore to correct the pixel aspect ratio. As shown in Fig. 3, we performed ellipse fitting [14] to obtain the major and minor axes and the center of the ellipse. We then rescaled the images along the minor axis so that its length equals that of the major axis. Once the circular images are obtained, we assume that the lens distortion is identical along any axis crossing the image center.
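As an illustration, the following sketch shows how this aspect-ratio correction could be carried out with OpenCV's ellipse fitting on the white-wall image of Fig. 3(b); the threshold value and the assumption that the minor axis is roughly vertical are ours, not the exact implementation.

```python
# Minimal sketch of the aspect-ratio correction in Sect. 2.1, assuming a
# bright-on-dark fisheye image of a white wall (as in Fig. 3(b)).
import cv2
import numpy as np

def correct_aspect_ratio(wall_img_gray):
    # Segment the illuminated fisheye circle from the black surround.
    _, mask = cv2.threshold(wall_img_gray, 30, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    border = max(contours, key=cv2.contourArea)

    # Fit an ellipse to the lens border; axis lengths come back as full lengths.
    (cx, cy), (w, h), angle = cv2.fitEllipse(border)
    major, minor = max(w, h), min(w, h)
    alpha = major / minor          # pixel aspect ratio, used in Eq. (1)

    # Rescale along the minor axis so the border becomes a circle.
    # Here we assume the minor axis is (roughly) the vertical image axis.
    resized = cv2.resize(wall_img_gray, None, fx=1.0, fy=alpha,
                         interpolation=cv2.INTER_LINEAR)
    return resized, alpha, (cx, cy)
```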

Fig. 3. (a) An image captured by a fisheye camera. (b) The image of a white wall captured by a fisheye camera. (c) Ellipse fitting result.

2.2 Feature Point Acquisition

In order to dewarp the fisheye image, we have to estimate the relationship between the incident angle and a distortion measure. First, the fisheye camera was mounted at the center of a rotating table, with its optical axis perpendicular to the normal vector of the table. The table was controlled by a stepping motor, as shown in Fig. 4(a). Next, we designated a feature point, a red circle pattern on a wall, and adjusted the rotating table so that this point was located at the image center of the camera, as shown in Fig. 4(b). In our experiments, the distance between the wall and the camera was 5 m, and we took a sequence of images while rotating the table, one image per \(0.9\) degrees. Because the fisheye camera was mounted on the table with a slight slant, the feature point moved along an oblique line in the image as the table rotated, as shown in Fig. 4(c). Taking the pixel aspect ratio \(\alpha \) into account, the distance between the ellipse center \((x_C,y_C)\) and the feature point \((x_F,y_F)\) is:

$$\begin{aligned} D((x_C,y_C ),(x_F,y_F ))=\sqrt{(x_C-x_F)^{2}+(\alpha (y_C-y_F))^{2}} {.} \end{aligned}$$
(1)

As a result, we can construct a table recording the mapping between the rotation angle, which equals the incident angle, and the distance from the feature point to the ellipse center. We denote this mapping as:

$$\begin{aligned} T(D_{f,i})= \varTheta _i {,} \end{aligned}$$
(2)

where \(D_{f,i}\) is the distance from point \(i\) on the fisheye image to the ellipse center and \(\varTheta _i\) is the corresponding incident angle.
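The table of Eq. (2) can be built directly from the rotation sequence. The sketch below assumes the feature point has already been detected in every frame (e.g., as the centroid of the red circle); the function and variable names are ours.

```python
# Sketch of the incident-angle table in Eqs. (1)-(2).
import numpy as np

def build_angle_table(feature_points, ellipse_center, alpha, step_deg=0.9):
    """feature_points: list of (x_F, y_F), one per rotation step,
    starting from the frame where the point sits at the image center."""
    x_c, y_c = ellipse_center
    distances, angles = [], []
    for i, (x_f, y_f) in enumerate(feature_points):
        # Eq. (1): distance corrected by the pixel aspect ratio alpha.
        d = np.hypot(x_c - x_f, alpha * (y_c - y_f))
        distances.append(d)
        # Rotation angle equals the incident angle (in radians here).
        angles.append(np.deg2rad(i * step_deg))
    # Eq. (2): T maps distance on the fisheye image to incident angle.
    return np.asarray(distances), np.asarray(angles)
```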

Fig. 4. (a) The rotating table controlled by a stepping motor. (b) The feature point on a wall five meters away from the camera. (c) Locus of feature points.

2.3 Mapping Rule

According to the perspective projection model, the incident angle \(\varTheta _i\) and the distance \(D_{p,i}\) from the projected point on the projection plane to the center of the plane are related by:

$$\begin{aligned} f\tan (\varTheta _{i}) = D_{p,i} {,} \end{aligned}$$
(3)

where \(f\) is the focal length of the perspective camera model, \(FOV\) is the maximum incident angle that can be projected onto the plane, and \(D_{FOV}\) is the value of \(D_{p,i}\) when point \(i\) is projected with incident angle \(FOV\); it serves as a normalization term. We can then eliminate \(f\) by substitution and obtain the following equation:

$$\begin{aligned} \frac{D_{p,i}}{D_{FOV}} = \frac{\tan (\varTheta _i)}{\tan (FOV)} {.} \end{aligned}$$
(4)

By substituting (2) into (4), the relationship between \(D_{f,i}\) and \(D_{p,i}\) is derived as:

$$\begin{aligned} D_{p,i}=\frac{D_{FOV}}{\tan (FOV)}\tan (T^{-1}(D_{f,i})) {.} \end{aligned}$$
(5)

We can use this relationship to undistort a fisheye image into a perspective projection one. Two examples of undistorted fisheye images are shown in Fig. 5.
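A minimal sketch of this dewarping step is given below. It builds a remap grid for the output perspective image by inverting Eq. (4) to recover the incident angle of each destination pixel and then looking up the fisheye distance through the measured table of Sect. 2.2; the output size and the handling of pixels beyond the FOV are our assumptions.

```python
# Dewarping sketch based on Eqs. (4)-(5), using the table from build_angle_table.
import cv2
import numpy as np

def dewarp_fisheye(fish_img, center, distances, angles, fov_rad, out_size=800):
    x_c, y_c = center
    half = out_size / 2.0
    d_fov_p = half                       # D_FOV on the perspective plane

    ys, xs = np.indices((out_size, out_size), dtype=np.float32)
    dx, dy = xs - half, ys - half
    d_p = np.hypot(dx, dy)               # distance of each output pixel to the center

    # Invert Eq. (4): incident angle for each destination pixel.
    theta = np.arctan(d_p * np.tan(fov_rad) / d_fov_p)
    # Invert Eq. (2): fisheye distance D_f for that angle (table lookup);
    # angles beyond the table (image corners) are clamped to the last entry.
    d_f = np.interp(theta, angles, distances)

    scale = np.divide(d_f, d_p, out=np.zeros_like(d_p), where=d_p > 0)
    map_x = (x_c + dx * scale).astype(np.float32)
    map_y = (y_c + dy * scale).astype(np.float32)
    return cv2.remap(fish_img, map_x, map_y, cv2.INTER_LINEAR)
```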

Fig. 5. (a)(c) The images captured by a fisheye camera. (b)(d) The corresponding undistorted results.

2.4 Intrinsic Parameter

In order to relate the world coordinate system to the image coordinate system, we must find the intrinsic and extrinsic parameters of our fisheye cameras. As mentioned before, the field of view of the fisheye lens we adopted is too large for the FOV model to be used for calibration. Therefore, we estimate the parameters of the virtual camera, which is the undistorted counterpart of the fisheye camera. Following the standard procedure of Zhang's calibration method [8], we made a chessboard calibration pattern and took multiple pictures of it in different positions and poses. Next, we undistorted the fisheye images by using the mapping rule described above. Finally, we obtained the camera intrinsic parameters by using the OpenCV implementation of Zhang's method [8], as shown in Fig. 6.
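For reference, the sketch below shows the standard OpenCV pipeline for Zhang's method applied to the already-dewarped chessboard images; the board dimensions and square size are illustrative values.

```python
# Hedged sketch of the intrinsic-parameter step (Sect. 2.4).
import cv2
import numpy as np

def calibrate_virtual_camera(undistorted_imgs, board=(9, 6), square=0.025):
    # 3D chessboard corner positions in the board's own plane (Z = 0).
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square

    obj_pts, img_pts = [], []
    for img in undistorted_imgs:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)

    # Intrinsics of the virtual (perspective) camera.
    rms, M_int, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    return M_int, dist
```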

Fig. 6. (a) Chess board calibration pattern, (b)(e) the original fisheye images, (c)(f) undistorted images, and (d)(e) images in the calibration procedure.

3 Homography Transform and Extrinsic Parameters

3.1 Integration of Camera Views by Homography

After fisheye camera calibration, the next step is to stitch the four camera images to construct a monitoring image of the vehicle surroundings. We mounted four fisheye cameras around the vehicle to capture images in the front, back, left, and right directions, as shown in Fig. 7(a). To stitch the four images, we use homography transformations to map the dewarped images onto a reference ground plane. In this work, the reference ground plane is a bird's-eye view image of a plaza captured from the roof of a building, as shown in Fig. 8(b). Forty-eight red circle patterns placed on the plaza were used as features for registering the dewarped fisheye images to the reference ground plane image, as shown in Fig. 7(a). At least four pairs of corresponding feature points were identified on both the ground plane image and each undistorted fisheye image. The homography and the coordinate transformation between these two images were then estimated from these correspondences. RANSAC (RANdom SAmple Consensus) [15] was applied for better estimation results.
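A minimal sketch of this registration step is shown below, assuming the circle centers have already been matched between the dewarped camera image and the ground-plane reference image; the RANSAC reprojection threshold is an illustrative choice.

```python
# Sketch of the homography estimation in Sect. 3.1.
import cv2
import numpy as np

def estimate_ground_homography(pts_camera, pts_ground):
    """Both inputs: Nx2 arrays of corresponding points (N >= 4)."""
    H, inliers = cv2.findHomography(
        np.asarray(pts_camera, np.float32),
        np.asarray(pts_ground, np.float32),
        method=cv2.RANSAC, ransacReprojThreshold=3.0)
    return H, inliers

# Mapping a dewarped image onto the reference ground plane (as in Fig. 8(d)),
# where ground_w and ground_h are the reference image dimensions:
#   warped = cv2.warpPerspective(undistorted_img, H, (ground_w, ground_h))
```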

Fig. 7. (a) Four fisheye cameras mounted around the vehicle and 48 circle patterns for calibration. (b) Red circles for the estimation of homography transformation.

3.2 Extrinsic Parameters

Once the homography matrix is obtained, we know the relationship between each fisheye image and the ground plane image. The extrinsic parameters, which describe the coordinate transformation from the fisheye camera coordinate system to the world coordinate system, are calculated by the following equations [8]:

$$\begin{aligned} M_{ext}&=\begin{bmatrix} R_{1}&R_{2}&R_{3}&T \end{bmatrix}\end{aligned}$$
(6)
$$\begin{aligned} H=\begin{bmatrix} h_{1}&h_{2}&h_{3} \end{bmatrix}&=s\cdot M_{int}\cdot \begin{bmatrix} R_{1}&R_{2}&T \end{bmatrix}\end{aligned}$$
(7)
$$\begin{aligned} R_{1}&=\frac{1}{s}\cdot M_{int}^{-1}\cdot h_{1}\end{aligned}$$
(8)
$$\begin{aligned} R_{2}&=\frac{1}{s}\cdot M_{int}^{-1}\cdot h_{2}\end{aligned}$$
(9)
$$\begin{aligned} T&=\frac{1}{s}\cdot M_{int}^{-1}\cdot h_{3} \end{aligned}$$
(10)

and

$$\begin{aligned} \quad R_{3}= R_{1}\times R_{2} {,} \end{aligned}$$
(11)

where \(h_{1}\), \(h_{2}\), \(h_{3}\) are the columns of the homography matrix \(H\), \(s\) is a scaling factor, \(M_{int}\) is the intrinsic matrix, and \(M_{ext}\) is the extrinsic matrix, which is composed of the rotation columns \(R_{1}\), \(R_{2}\), \(R_{3}\) and the translation vector \(T\).
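The following sketch implements Eqs. (6)–(11) directly; fixing the scale \(s\) by normalizing \(M_{int}^{-1}h_{1}\) to unit length is a common convention that the equations leave implicit.

```python
# Sketch of Eqs. (6)-(11): extrinsics from the homography H and intrinsics M_int.
import numpy as np

def extrinsics_from_homography(H, M_int):
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    M_inv = np.linalg.inv(M_int)

    s = np.linalg.norm(M_inv @ h1)            # scale factor implied by Eq. (7)
    R1 = (M_inv @ h1) / s                     # Eq. (8)
    R2 = (M_inv @ h2) / s                     # Eq. (9)
    T = (M_inv @ h3) / s                      # Eq. (10)
    R3 = np.cross(R1, R2)                     # Eq. (11)

    M_ext = np.column_stack([R1, R2, R3, T])  # Eq. (6), a 3x4 matrix
    return M_ext
```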

4 Brightness Uniformity and Image Blending

4.1 Brightness Uniformity

Once the transformation matrix \(H\) is obtained, we use it to transform and map the fisheye image onto the ground plane image, as shown in Fig. 8(d). By repeating this process for each of the four fisheye cameras, we can stitch all of the fisheye images into a panoramic one. However, brightness variations between cameras may cause obvious edges in the overlapping regions, as shown in Fig. 9(a). In this work, we applied the following algorithm to tackle this problem.

Fig. 8. (a) Homography mapping. (b) The ground reference plane. (c) Undistorted image captured by front fisheye camera. (d) Stitched images of (c) and (b) using homography transform.

Fig. 9. (a) Before and (b) after brightness uniformity correction.

Step 1: Separate the overlapping regions from the stitched image.

Step 2: Transform from RGB to YUV color space and use the Y value as luminance.

Step 3: Find the sub-region gain of each pixel in the overlapping region by the following equation:

$$\begin{aligned} G_{i,x,y}=\frac{L_{i,x,y}+L_{j,x,y}}{2\times L_{i,x,y}} {,} G_{j,x,y}=\frac{L_{i,x,y}+L_{j,x,y}}{2\times L_{j,x,y}} {,} \end{aligned}$$
(12)

where \(i, j\) are the indices of the cameras and \(L_{i,x,y}\) is the mean luminance of the sub-region in camera \(i\) whose centroid coordinate is \((x,y)\). The size of each sub-region is \(25\times 25\).

Step 4: Smooth the computed gains with a Gaussian kernel of size \(25\times 25\).

Step 5: Use polynomial fitting to obtain the remaining gains, i.e., those outside the overlapping region, along each horizontal or vertical scan line of the stitched image. In order not to interfere with the automatic gain control of the cameras, we set the gain at the center of each scan line close to unity so as to preserve the original information. In the polynomial fitting shown in Fig. 10, \(B\) and \(F\) are the two interpolation boundaries, called the beginning and ending positions, and \(M\) is the middle position of the scan line. The gain value at position \(M\) is calculated by the following averaging equation:

$$\begin{aligned} G_M = \frac{2+G_B+G_F}{4} {.} \end{aligned}$$
(13)
Fig. 10. (a) For each scan line, we first obtain the two boundary points named B and F. The green points indicate the gain values at B and F. (b) We add a middle position named M, whose gain value is calculated by averaging the gains of B and F with unity. (c) Polynomial fitting to obtain the remaining gains.

4.2 Linear Blending

Because of the wide FOV of the fisheye cameras, each corner of the stitched image lies in a region that overlaps the view of another camera. We applied a linear blending process to smooth the stitching. As shown below, linear blending is a linear combination of the overlapped pixels:

$$\begin{aligned} I^{linear}(x,y)=\frac{\sum _{i=1}^{n}I^{i}(x,y)W^{i}(x,y)}{\sum _{i=1}^{n}W^{i}(x,y)} {,} \end{aligned}$$
(14)

where \(I^i (x,y)\) is the image intensity at coordinate \((x,y)\) of image \(i\) and \(W^i (x,y)\) is the weight at coordinate \((x,y)\) of image \(i\). The shorter the distance between an overlapped pixel and the image center, the larger the weight. After the blending process, we obtain smoothly stitched images, as shown in Fig. 11. Figure 12 shows the overall result of dewarping and stitching the images captured from the four fisheye cameras.
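The sketch below illustrates Eq. (14) for two overlapping images; the exact form of the distance-to-center weighting function is our assumption.

```python
# Sketch of the linear blending of Eq. (14) for two overlapping images.
import numpy as np

def center_weight(shape):
    """Weight map that decreases with distance from the image center."""
    h, w = shape[:2]
    ys, xs = np.indices((h, w), dtype=np.float32)
    d = np.hypot(xs - w / 2.0, ys - h / 2.0)
    return 1.0 - d / d.max()

def linear_blend(img_i, img_j, W_i, W_j):
    num = img_i.astype(np.float32) * W_i[..., None] + \
          img_j.astype(np.float32) * W_j[..., None]
    den = (W_i + W_j)[..., None] + 1e-6
    return np.clip(num / den, 0, 255).astype(np.uint8)
```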

5 Hybrid Model Projection

To build a system that allows the driver to select different third-person viewpoints and synthesizes the result from the four images acquired by the fisheye cameras, an intuitive way is to perform view interpolation as proposed by Chen [16]. However, in the vehicle surrounding monitoring application, the computational cost of view interpolation is too large when moving objects exist. Therefore, we design a 3D hybrid model and project the four undistorted images onto it. Then we use the intrinsic and extrinsic parameters of the selected viewpoint to synthesize the result. The concept of view interpolation is shown in Fig. 13(a).

Fig. 11. (a) Stitched image without blending. (b) Stitched image with linear blending.

Fig. 12. The stitched result of four images from fisheye cameras.

Fig. 13. (a) View interpolation. (b) Variable viewpoint.

5.1 View Interpolation Using Hybrid Model

Like the other vehicle surrounding systems mentioned in Sect. 1, we can stitch the four dewarped images onto the ground plane to see the objects around the vehicle. However, this approach has two major problems. First, distortion in the image is inevitable. Second, image textures above the vanishing point would be projected to an infinite position, so the driver cannot see the scene above the vanishing line from the selected viewpoint. To solve these problems, we calculate the camera extrinsic parameters from the homography matrix and the camera intrinsic parameters and then project the acquired images onto a 3D model. We use back-projection to find the texture of the 3D model. The projection equation is shown in (15):

$$\begin{aligned} \begin{bmatrix} x&y&1 \end{bmatrix}^{^{T}} = \frac{1}{w}\cdot M_{int}\cdot M_{ext}\cdot \begin{bmatrix} X&Y&Z&1 \end{bmatrix}^{^{T}} {,} \end{aligned}$$
(15)

where \(\begin{bmatrix} x&y&1 \end{bmatrix}^{^{T}}\) is the coordinate in the dewarped image, \(w\) is the homogeneous factor, \(M_{int}\) and \(M_{ext}\) are the intrinsic and extrinsic parameters of the fisheye camera, respectively, and \(\begin{bmatrix} X&Y&Z&1 \end{bmatrix}^{^{T}}\) is the coordinate in the 3D model.
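The back-projection of Eq. (15) amounts to a single matrix product and a homogeneous division, as in the following sketch; bounds checking and the choice of source camera are omitted.

```python
# Sketch of the texture lookup in Eq. (15): project a 3D point on the hybrid
# model into one dewarped camera image to fetch its colour.
import numpy as np

def project_model_point(X, M_int, M_ext):
    """X: 3D point (X, Y, Z) on the hybrid model, in world coordinates."""
    Xh = np.append(np.asarray(X, np.float64), 1.0)   # homogeneous [X Y Z 1]^T
    p = M_int @ (M_ext @ Xh)                          # 3-vector [w*x, w*y, w]
    w = p[2]
    return p[0] / w, p[1] / w                         # pixel (x, y) in the dewarped image
```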

We use four parameters to define the viewpoint, as shown in Fig. 13(b). \(Pan\) ranges from \(-\pi \) to \(\pi \) and describes the main direction specified by the driver. \(Tilt\), the angle between the virtual camera and the normal vector of the ground, is set to \(\frac{\pi }{3}\). The elevation angle \(\gamma \) is set to \(\frac{\pi }{12}\). The distance from the virtual camera to the centroid of the curved surface is 5 m.
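One possible way to turn these parameters into a virtual-camera pose is the look-at construction sketched below, in which the camera is placed on a sphere around the model centroid and oriented toward it; this construction is our interpretation, not the exact formulation used in the system.

```python
# Hedged sketch: virtual-camera extrinsics from the viewpoint of Fig. 13(b).
import numpy as np

def virtual_camera_extrinsics(pan, tilt, dist=5.0, centroid=(0.0, 0.0, 0.0)):
    c = np.asarray(centroid, np.float64)
    # Camera position in world coordinates (z is the ground normal).
    pos = c + dist * np.array([np.sin(tilt) * np.cos(pan),
                               np.sin(tilt) * np.sin(pan),
                               np.cos(tilt)])
    # Look-at rotation: camera z-axis points from the camera to the centroid.
    z = c - pos
    z /= np.linalg.norm(z)
    x = np.cross(np.array([0.0, 0.0, 1.0]), z)   # degenerate only for tilt = 0
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])                       # world-to-camera rotation
    T = -R @ pos
    return np.column_stack([R, T])                # 3x4 extrinsic matrix
```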

5.2 Model Comparison

We compared the result images obtained by projecting the acquired images onto four different 3D models. The equations of these models are:

Model 1: Ground plane and cylinder surface:

$$\begin{aligned} \left\{ \begin{matrix} z=0, &{} \quad \text {while} ~~ x^2+y^2< d^2\\ z\ge 0,&{} \quad \text {while} ~~ x^2+y^2= d^2 \end{matrix}\right. \end{aligned}$$
(16)

Model 2: Second degree paraboloid:

$$\begin{aligned} z=\frac{(x^2+ y^2)}{a^2} \end{aligned}$$
(17)

Model 3: Fourth degree paraboloid:

$$\begin{aligned} z=\frac{(x^4+ y^4)}{a^4} \end{aligned}$$
(18)

Model 4: Hybrid model:

$$\begin{aligned} \left\{ \begin{matrix} z=\frac{(x^4+ y^4)}{a^4},&{} \quad \text {while} ~~ x^4+y^4< c^4\\ z \ge \frac{(x^4+ y^4)}{a^4},&{} \quad \text {while} ~~ x^4+y^4= c^4 \end{matrix}\right. {,} \end{aligned}$$
(19)

where \(a\), \(c\), and \(d\) are the coefficients of the paraboloid, columnar surface, and cylinder, respectively. Figure 14 shows an example of the comparison results. After undistorting the images, we project them onto the ground plane by using the homography matrix, as shown in Fig. 14(b). The distortion is large for tall objects such as trees and buildings. Another problem is that the scene above the vanishing point does not appear on the ground plane. Therefore, we project the acquired images onto a 3D surface to reduce the distortions. In Model 1 (Fig. 14(c)), because the ground plane and the cylinder are perpendicular, objects bend severely in the image. Although the scene looks smooth in Models 2 and 3 (Figs. 14(d)(e)), there is an expansion effect in the surroundings.

Fig. 14. The results using different 3D models. (a) Acquired image from fisheye camera. (b) After homography transformation. (c) Use ground plane and cylinder model. (d) Use second degree paraboloid. (e) Use fourth degree paraboloid. (f) Use hybrid model.

There are two reasons why we chose the hybrid model as our 3D model. First, the columnar surface makes the view more realistic in all four directions. Second, the transition between the fourth-degree paraboloid and the columnar surface is smoother. As shown in Fig. 14(f), the hybrid model gives a better result with less distortion.
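For illustration, the sketch below samples points on the hybrid model of Eq. (19): a fourth-degree paraboloid floor bounded by a columnar wall whose cross-section is \(x^4+y^4=c^4\); the coefficients, sampling resolution, and wall height are arbitrary choices, not the values used in the system.

```python
# Sketch of the hybrid model surface of Eq. (19).
import numpy as np

def hybrid_model_height(x, y, a):
    """Height of the paraboloid part; valid where x^4 + y^4 < c^4."""
    return (x**4 + y**4) / a**4

def hybrid_model_points(a=10.0, c=8.0, n=200, wall_height=5.0):
    # Paraboloid floor inside the superellipse x^4 + y^4 < c^4.
    xs = np.linspace(-c, c, n)
    X, Y = np.meshgrid(xs, xs)
    inside = X**4 + Y**4 < c**4
    floor = np.column_stack([X[inside], Y[inside],
                             hybrid_model_height(X[inside], Y[inside], a)])

    # Columnar wall on x^4 + y^4 = c^4, extended upward from the rim.
    t = np.linspace(0, 2 * np.pi, 4 * n, endpoint=False)
    wx = c * np.sign(np.cos(t)) * np.abs(np.cos(t))**0.5   # |x|^4 + |y|^4 = c^4
    wy = c * np.sign(np.sin(t)) * np.abs(np.sin(t))**0.5
    rim = hybrid_model_height(wx, wy, a)
    heights = np.linspace(0.0, wall_height, 20)
    wall = np.concatenate([np.column_stack([wx, wy, rim + h]) for h in heights])
    return np.vstack([floor, wall])
```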

5.3 Lookup Table

Instead of warping and stitching the whole images for each frame, we use a lookup table to decrease the computational cost. With the lookup table, we can obtain each pixel value of the output image from the original fisheye camera images with at most seven additions and six multiplications.

For pixels whose information can be retrieved from a single camera frame, each entry of the lookup table contains three data values: the camera ID and the coordinates \((x, y)\) in that camera image. For pixels in the overlapping area, the entry contains seven data values: the camera IDs and coordinates of the two overlapping camera images, plus one linear blending weight. The structure of the lookup table is shown in Fig. 15.
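A hedged sketch of the per-frame rendering loop is given below. The entry layout follows the text and Fig. 15, but the dictionary-based table representation and the blending convention are our own simplifications.

```python
# Sketch of lookup-table rendering (Sect. 5.3).
import numpy as np

def render_from_lut(lut, frames):
    """lut: {'shape': (h, w), 'entries': {(row, col): entry}};
    frames: dict mapping camera ID -> current camera image."""
    h, w = lut['shape']
    out = np.zeros((h, w, 3), np.uint8)
    for (r, col), entry in lut['entries'].items():
        if len(entry) == 3:                       # single-camera pixel
            cam, x, y = entry
            out[r, col] = frames[cam][y, x]
        else:                                     # overlap pixel: blend two cameras
            cam1, x1, y1, cam2, x2, y2, wgt = entry
            p1 = frames[cam1][y1, x1].astype(np.float32)
            p2 = frames[cam2][y2, x2].astype(np.float32)
            out[r, col] = (wgt * p1 + (1.0 - wgt) * p2).astype(np.uint8)
    return out
```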

Fig. 15. Structure of the lookup table.

Fig. 16. The results of different viewpoints. (a) Front view. (b) Left view. (c) Top view. (d) Right view. (e) Rear view.

Fig. 17. Ghost effect and image distortion. The colored line shows the seam between the paraboloid model and the columnar model, and the blue line shows the overlapping area. (a) Ghost effect. (b) Image distortion.

6 Experimental Results and Discussion

Figure 16 shows the results of our system from different viewpoints. We developed our system on a PC with an Intel Xeon 3.3 GHz CPU and 16 GB of RAM. The software was developed with Visual Studio 2010 in Release mode with full code optimization. In our experiments, four fisheye cameras were mounted on a car, the system was tested on a campus, and about 300 s of video were recorded. The computation time for each frame was 0.7 s. With this system, drivers can check every side of the vehicle carefully, making it safer to drive through a small lane, change lanes on the highway, turn right or left at an intersection, or back up in a narrow area.

The major problem of the Fujitsu system is that an object may disappear in one view and show up again in another view, because that system shrinks the overlapping area between two neighbouring cameras. To tackle this problem, we chose wide-angle fisheye cameras to enlarge the overlapping area. Inside the overlapping area, objects appear in two different camera images simultaneously. Because the depths of objects are unknown, the same person seen by different cameras may be projected to different positions, causing a ghost effect in our system, as shown in Fig. 17(a). Moreover, because we use one static 3D model for all vehicle surrounding scenes, both passing pedestrians and trees 15 m away are projected onto the same 3D model, which may cause distortions, as shown in Fig. 17(b). In the future, a 3D model that can adjust its shape according to the 3D positions of the surroundings is needed to solve these problems.

7 Conclusions

In this work, we have developed a real-time vehicle surrounding monitoring system. It provides a realistic and intuitive surrounding scene for drivers. To construct the system, a simple and precise method for fisheye camera calibration and image undistortion is proposed. Furthermore, a novel hybrid projection model, which contains a paraboloid surface and a columnar surface, is used; it makes the final rendered view look more realistic. With this monitoring system, drivers can use a third-person viewpoint to watch the surroundings of their vehicle. The proposed system provides drivers with a good comprehension of the surrounding situation and reduces the risk of car accidents.