1 Introduction

Camera calibration and stereo matching for binocular stereo vision systems are active areas of research. In camera calibration, the camera parameters must be estimated; accurate estimates are essential for good 3D reconstruction accuracy. This paper investigates the camera parameters that influence the reconstruction accuracy of such a system. The camera parameters of a stereo imaging setup fall into two main categories: (i) intrinsic camera parameters and (ii) extrinsic camera parameters. The effect of the extrinsic camera parameters in a stereo imaging system can be handled through the epipolar geometry constraint. The intrinsic camera parameters, however, such as focal length, lens distortion and image resolution, enter the 3D reconstruction directly. In this context, it is worth studying the effect of these parameters on the 3D reconstruction.

Given a set of 2D digital images (arbitrary perspective views) and the parameters of the viewing geometry as input, the output is a 3D object defined by the structural parameters of a model. This type of reconstruction is performed on input in the form of individual points (pixels) or groups of pixels representing geometric features in the image, such as lines and curves. The problem is difficult because relative depth is lost when the 3D scene is projected onto the image plane. Traditional methods extract real-time depth information with active sensors, whereas passive stereo vision models incur significant error in textureless regions, which are frequent in indoor scenes.

A 3D point lies on the ray that starts at the center of projection and passes through its projected point in the image plane. From a single image or view, the corresponding 3D point cannot be reconstructed uniquely, because every 3D point on that ray projects to the same image point. A unique reconstruction therefore requires at least two images: the intersection of the two projection rays, one from each view, yields the unique 3D point.

The stereo-view 3D reconstruction process consists of the following basic steps:

  • Detect the 2D features of the projected 3D scene or object in the view planes.

  • Estimate the parameters of these projected 2D geometric features in both view planes.

  • Establish the correspondence between these projected 2D features.

  • Use inverse perspective equations for computing the 3D structural parameters.

  • Display the 3D object using line drawings or a shading model.

The 2D projections that form the input images are what make the reconstruction of a 3D object difficult. These projections depend mainly on the relative position of the object with respect to the cameras, the lines of sight and the other camera parameters. The 3D reconstruction problem is divided into two subproblems: correspondence and triangulation. The correspondence problem is to match features across images. It is formulated as follows: given different images or views of a 3D object, find the points or features in one view that correspond to the same points or features in the other views. The triangulation problem then yields the location of a 3D point from two or more of its projections. The main difficulties in multiview reconstruction are establishing the correspondence between pairs of images and obtaining inverse mathematical functions based on camera models (perspective, weak perspective, affine, etc.). A minimal triangulation sketch is given below.
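To make the triangulation subproblem concrete, the sketch below triangulates a single 3D point from its two projections using the standard linear (DLT) method. This is an illustrative sketch, not the paper's implementation; the 3 × 4 projection matrices `P1` and `P2` are assumed to be known, e.g., from calibration.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2 : 3x4 projection matrices of the two cameras.
    x1, x2 : (u, v) pixel coordinates of the matched point in each image.
    """
    # Each image point contributes two linear constraints on the homogeneous
    # world point W: u*(P[2] @ W) - P[0] @ W = 0 and v*(P[2] @ W) - P[1] @ W = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    W = Vt[-1]
    return W[:3] / W[3]  # de-homogenize to (x, y, z)
```

With noisy correspondences the four constraints hold only approximately, so the SVD-based least-squares solution is preferred over intersecting the two rays exactly.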

Every object in the real world is 3D, and acquiring 3D information about a real-world scene from 2D images is one of the central problems in computer vision. Acquisition methods are divided into two classes: active and passive. A passive method solves the inverse problem of projecting a 3D scene onto a 2D image plane and has the advantage of recovering three-dimensional information without affecting the scene. An active method, in contrast, emits radio or light energy from a source. Passive techniques include motion, stereo, shading, texture, contours and so on. Broadly, there are two categories of inverse problem, photometric (optical) and geometrical, and both invert the image formation process.

1.1 Binocular Stereo

Binocular stereo, or two-view stereo, is modeled on human stereo vision and is a non-contact passive sensing method. A pair of images of a 3D object (scene) is obtained from two different viewpoints under perspective projection, as illustrated in Fig. 1. To recover the 3D object from these perspective images, a distance (depth) from a known reference coordinate system is computed by triangulation. The main problem in binocular stereo is finding the correspondences in the stereo pair, called the stereo correspondence problem. In this section, some concepts related to the stereo correspondence problem (camera model, epipolar geometry, rectification and disparity map) are introduced.

Fig. 1: Binocular stereo image formation

2 Literature Review

Several studies [1, 13, 16, 20] have addressed surface reconstruction from scanned data. These studies split into two families of methods: Voronoi-based and mesh-free. Voronoi-based methods use Delaunay triangulation [5] to reconstruct surfaces, while mesh-free algorithms reconstruct surfaces using B-splines [22], radial basis functions [18], and PDE- and MLS-based [9] techniques. Alongside these techniques, data point acquisition has proved to be a very useful step. Among hardware systems [7, 21], a light wave technique was utilized in [7] for 3D surface point reconstruction: a Microsoft Xbox Kinect sensor was used to obtain a volumetric representation, and a similar Kinect sensor camera was used in [21] for 3D indoor scene reconstruction. Computational geometry tools were developed in [1] for surface modeling from the acquired data. To reconstruct surfaces and obtain correspondences simultaneously, several well-known algorithms [6, 11, 12] based on volumetric reconstruction from 2D images exist in the literature. A strategy for space carving and voxel coloring based on the visibility and stability of voxels in the image was developed in [12]. Direct discrete minimization (DDM) and graph-cut methodologies were likewise proposed in [8, 11, 17] for establishing correspondence between the images. All of these methods obtain disparity maps with accurate shapes [2, 19] but limited depth precision; this limitation was removed in [4]. Another algorithm for the reconstruction of 3D scenes from just two views, based on a minimal number of line correspondences, was introduced in [15]. A method for precise 3D shape measurement of multiple separate objects using a stereo vision approach was presented in [10]; the strength of that system is its ability to perform full-field three-dimensional shape measurement with high precision even in the presence of discontinuities and multiple separate regions. Many applications, such as recognition, robot vision and animation, require an alternative representation of three-dimensional objects in a compact form. The curve skeleton is such an alternative to the medial surface representation [3]; its applications include virtual navigation, registration, animation, morphing, scientific analysis, shape recognition and shape retrieval. In 2D the skeleton is called the medial axis, while in 3D it is called the medial surface. The curve skeleton is a special 1D representation of a 3D object whose reconstruction is also a difficult subject, since it is an ill-posed problem. Apart from the above representation techniques, attempts have also been made to reconstruct the 3D surface using skeletons of the object [7, 14].

The Pinhole Camera Model:

The pinhole camera model describes the mathematical relationship between the coordinates of a 3D point and its projection onto the image plane.

As shown in Fig. 2, the model basically consists of two planes: the retinal plane and the focal plane. The retinal plane (Re) is where the 2D image is formed; the center of the focal plane (F) is called the optical center C. Here f is the focal length of the camera, the distance between the two parallel planes. A world point W is mapped to the 2D image by perspective projection along the straight line connecting W to the optical center. Since the retinal plane and the focal plane are parallel, points lying on the focal plane have no image on the retinal plane.
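For completeness (the original states this relation only in words), the standard pinhole projection of a point (x_c, y_c, z_c) in the camera frame onto the retinal plane is

$$ X = f\frac{x_{c}}{z_{c}}, \qquad Y = f\frac{y_{c}}{z_{c}} $$

which also shows why points with z_c = 0, i.e., points on the focal plane, have no image.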

Fig. 2: Pinhole camera model

Epipolar Constraints:

The epipolar constraint ensures that corresponding points lie on conjugate epipolar lines. After rectification, these epipolar lines coincide with the horizontal scan lines, so corresponding points in the two images are shifted only horizontally; this reduces the two-dimensional search for correspondences to a one-dimensional search along each scan line.
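In its standard algebraic form (given here as background; the original does not write it out), the epipolar constraint states that corresponding homogeneous image points w_l and w_r satisfy

$$ w_{r}^{T} F\, w_{l} = 0 $$

where F is the 3 × 3 fundamental matrix; F w_l is the epipolar line in the right image on which the match of w_l must lie.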

If there are two cameras, every point W = [x_w, y_w, z_w, 1]^T of the real-world reference frame can be projected into the respective image frames (w_l and w_r) using the transformation matrix P = T_iT_l, composed of the intrinsic and extrinsic transformations T_i and T_l introduced in Sect. 4:

$$ \begin{gathered} w_{r} = P_{r} W, \hfill \\ w_{l} = P_{l} W \hfill \\ \end{gathered} $$

In order to rectify the images according to the epipolar constraint, the left projection matrix P_l and the right projection matrix P_r must satisfy the following conditions:

  1. Both cameras must have the same focal length.

  2. Both cameras must have the same focal plane.

  3. The optical centers of the cameras must remain fixed.

  4. For each pair of corresponding points in the left and right images, the vertical coordinates must be equal.

Using these conditions, P_l and P_r can be computed, and the obtained images can then be transformed so that they satisfy the epipolar constraint.

Rectification Process:

The rectification process supports the determination of object distance in triangulation-based stereo vision. Binocular disparity relates the depth of an object to the change in its image position between the two camera views when the relative position of the cameras is known.

Rectification is a transformation that maps two or more images onto a common image plane. It is generally used for correspondence analysis: after rectification, the correspondence search becomes very simple. Rectification defines two new perspective projection matrices that keep the optical centers fixed while placing the baseline in the focal planes. This ensures that the epipoles are at infinity and, hence, that the epipolar lines are parallel.
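As an illustration, the following sketch rectifies a calibrated stereo pair with OpenCV. The calibration outputs are assumed to be available, and all numeric values (image size, intrinsics, baseline) are placeholder assumptions, not values from the paper.

```python
import cv2
import numpy as np

# Placeholder calibration results (assumed, for illustration only).
size = (640, 480)
K1 = K2 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
d1 = d2 = np.zeros(5)                    # no lens distortion, for simplicity
R = np.eye(3)                            # cameras already parallel
T = np.array([0.06, 0.0, 0.0])           # 6 cm baseline along x
left_img = right_img = np.zeros((480, 640), np.uint8)   # stand-in images

# New rotations R1, R2 and projection matrices P1, P2 placing both image
# planes on a common plane with the baseline contained in the focal planes.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)

# Per-camera lookup maps, then warp both images onto the common plane.
m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
left_rect = cv2.remap(left_img, m1x, m1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, m2x, m2y, cv2.INTER_LINEAR)
```

After this step, corresponding points share the same image row, so the disparity search of Sect. 3 runs along single scan lines.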

3 Disparity for Three-dimensional Reconstructions

The disparity is the distance between the image positions of corresponding points in the rectified image pair. It is the quantity actually used in three-dimensional reconstruction, because depth is inversely proportional to disparity and proportional to the focal length and to the distance between the cameras (the baseline).
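Explicitly, for a rectified pair with focal length f and baseline b (the original states this relation only in words), a point at depth z produces the disparity

$$ d = X_{l} - X_{r} = \frac{fb}{z}, \qquad {\text{so}} \qquad z = \frac{fb}{d} $$

i.e., disparity grows with the focal length and the baseline and shrinks with depth.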

3.1 Disparity Map

Once both stereo images are rectified, the next step is to find the correspondence between them. For this, similarity or dissimilarity measures are computed between windows of the left and right images. Commonly used window-based measures are the Sum of Squared Differences (SSD), the Sum of Absolute Differences (SAD), the Normalized Cross-Correlation (NCC) and the census transform. With window coordinates (i, j) around a point (x, y), these are defined as

$$ {\text{SSD}}\left( d \right) = \sum_{i}\sum_{j} \left[ I_{l}\left( x + d + i, y + j \right) - I_{r}\left( x + i, y + j \right) \right]^{2} $$
$$ {\text{SAD}}\left( d \right) = \sum_{i}\sum_{j} \left| I_{l}\left( x + d + i, y + j \right) - I_{r}\left( x + i, y + j \right) \right| $$
$$ {\text{NCC}}\left( d \right) = \frac{C(I_{l}, I_{r}) - \sum_{i}\sum_{j} \mu_{l}\mu_{r}}{\sum_{i}\sum_{j} \sigma_{l}\sigma_{r}} $$

Here d represents the disparity between the left and right images, I_l and I_r denote the intensities of the left and right images of the stereo pair, and (x, y) is a point in the right image. µ_l and µ_r are the mean intensities of the left and right images in the corresponding windows, and σ_l and σ_r are the respective standard deviations. \(C(I_{l}, I_{r})\) is the cross-correlation between the corresponding windows:

$$ C(I_{l}, I_{r}) = \sum_{i}\sum_{j} I_{l}\left( x + d + i, y + j \right) I_{r}\left( x + i, y + j \right) $$

As illustrated in Fig. 3, the disparity d can be found by minimizing a dissimilarity measure or by maximizing a similarity measure; however, due to the ill-posed nature of the problem, a unique correspondence cannot always be found.

Fig. 3: Disparity
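As an illustration of window-based matching, the sketch below computes a dense disparity map by brute-force SSD block matching on a rectified grayscale pair. The window size and disparity range are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

def ssd_disparity(left, right, max_disp=64, win=5):
    """Naive SSD block matching on a rectified grayscale stereo pair."""
    h, w = left.shape
    half = win // 2
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch_r = right[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # Epipolar constraint: search only along the same scan line.
            for d in range(0, min(max_disp, w - 1 - half - x) + 1):
                patch_l = left[y - half:y + half + 1,
                               x + d - half:x + d + half + 1]
                cost = np.sum((patch_l - patch_r) ** 2)   # SSD(d) as above
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

Replacing the squared difference with an absolute difference gives SAD, and normalizing by the window means and standard deviations gives NCC.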

4 Camera Parameters

To transform a point from the 3D world coordinate frame into a 2D point on the image plane, knowledge of the camera parameters is needed. There are two types of camera parameters, intrinsic and extrinsic, also called internal and external, respectively. The intrinsic parameters describe the internal geometric and optical characteristics of the camera, while the extrinsic parameters describe the position and orientation of the camera in the world frame. Their consequences for the reconstruction accuracy of the system are analyzed using simulation results: at the end of the paper, we analyze the parameter errors together with the system reconstruction errors and report the resulting accuracy. The intrinsic parameters, such as focal length, lens distortion and image resolution, affect the 3D reconstruction accuracy directly, and their relationship to it is immediate, while errors in the extrinsic parameters influence reconstruction accuracy indirectly, by affecting the baseline distance and the angle between the cameras.

As shown in Fig. 4, a system with two or more cameras involves three separate coordinate systems: (x_w, y_w, z_w) is the world reference frame, (x_c, y_c, z_c) is the camera frame with its origin at the optical center, and (X, Y) is the imaging frame. A 3D point given in world coordinates can be converted into the camera frame by a rotation r_ij and a translation t_j, which is expressed by the extrinsic parameters as

Fig. 4: Intrinsic and extrinsic parameters of the camera

$$ \begin{pmatrix} x_{c} \\ y_{c} \\ z_{c} \end{pmatrix} = T_{l} \begin{pmatrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{pmatrix}, \quad {\text{where}} \quad T_{l} = \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_{1} \\ r_{21} & r_{22} & r_{23} & t_{2} \\ r_{31} & r_{32} & r_{33} & t_{3} \end{pmatrix} $$

(I) Intrinsic Parameters:

The intrinsic parameters are the most important parameters of the camera: for each camera, they characterize the transformation from image plane coordinates to pixel coordinates. If the intrinsic camera parameters are known, a metric reconstruction can be obtained; conversely, a metric reconstruction requires knowledge of the intrinsic parameters. The intrinsic matrix has five parameters, which can be obtained through off-line calibration with a calibration object.

A point in the camera frame is then converted to the two-dimensional image plane using the intrinsic parameters. These are the focal length f; the principal point (u_0, v_0), which is the center of the image plane; and (k_0, k_1), the pixel size in millimeters (mm), or equivalently α = f/k_0 and β = f/k_1. The transformation using the intrinsic parameters is as follows:

$$ \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = T_{i} \begin{pmatrix} x_{c} \\ y_{c} \\ z_{c} \end{pmatrix}, \quad {\text{where}} \quad T_{i} = \begin{pmatrix} \alpha & 0 & u_{0} \\ 0 & \beta & v_{0} \\ 0 & 0 & 1 \end{pmatrix} $$

Since (X, Y, Z) is homogeneous, X and Y are divided by Z to obtain the pixel coordinates X' and Y'. Points on the focal plane have z_c = 0 and hence Z = 0; such points cannot be converted to image plane coordinates, because the division by zero is undefined. Geometrically, the ray through such a point and the optical center is parallel to the image plane and never intersects it. Combining both transformations, a point in world coordinates is converted into the 2D image plane by the following equation:

$$ \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = T_{i} T_{l} \begin{pmatrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{pmatrix} $$

The information given by the internal and external camera parameters permits rectification of the images and enforcement of the epipolar constraint.
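To make the two-stage transformation concrete, the sketch below pushes one world point through hypothetical extrinsic and intrinsic matrices; all numbers are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical intrinsics: alpha = beta = 800 px, principal point (320, 240).
T_i = np.array([[800.0,   0.0, 320.0],
                [  0.0, 800.0, 240.0],
                [  0.0,   0.0,   1.0]])

# Hypothetical extrinsics: identity rotation, world origin 2 m in front.
T_l = np.hstack([np.eye(3), np.array([[0.0], [0.0], [2.0]])])

W = np.array([0.5, 0.25, 3.0, 1.0])   # homogeneous world point (x_w, y_w, z_w, 1)
X, Y, Z = T_i @ (T_l @ W)             # homogeneous image point
u, v = X / Z, Y / Z                   # divide by Z to get pixel coordinates
print(u, v)                           # -> 400.0 280.0
```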

(II) Extrinsic Parameters (R, T)

The extrinsic parameters describe the relative position and orientation of the two cameras. They define the transformation from 3D world coordinates to 3D camera coordinates, i.e., the position and heading of the camera in world coordinates. R is the rotation matrix, and T gives the position of the origin of the world coordinate frame in the camera frame.

5 Calibration

Stereo vision recovers depth from camera images by comparing two or more views of the same scene. In the computed result, each reconstructed 3D point corresponds to a pixel in one of the images. Binocular stereo processing applies to exactly two images, taken with cameras separated by a horizontal distance called the "baseline." Calibrating the stereo camera expresses 3D world points in actual units, such as millimeters, relative to the cameras. Applications of this process include measurement, aerial photogrammetry and robot navigation. The calibrated stereo workflow consists of the following steps (a code sketch follows the list):

  1. Calibrate the stereo camera system.

  2. Rectify a pair of stereo images.

  3. Compute the disparity.

  4. Reconstruct the three-dimensional point cloud.
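As a sketch of this workflow, the code below strings together OpenCV's standard calibration and matching APIs. File names, checkerboard size and matcher settings are our illustrative assumptions, not the paper's setup.

```python
import cv2
import numpy as np

# Step 1: calibrate from checkerboard views (hypothetical file names).
pattern = (9, 6)                              # inner corners of the board
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, left_pts, right_pts = [], [], []
for i in range(15):
    left = cv2.imread(f"left_{i:02d}.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread(f"right_{i:02d}.png", cv2.IMREAD_GRAYSCALE)
    okL, cL = cv2.findChessboardCorners(left, pattern)
    okR, cR = cv2.findChessboardCorners(right, pattern)
    if okL and okR:
        obj_pts.append(objp); left_pts.append(cL); right_pts.append(cR)

size = left.shape[::-1]
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
_, K1, d1, K2, d2, R, T, _, _ = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, d1, K2, d2, size,
    flags=cv2.CALIB_FIX_INTRINSIC)

# Step 2: rectify (see the rectification sketch above).
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, T)
m1x, m1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
m2x, m2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
left_rect = cv2.remap(left, m1x, m1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, m2x, m2y, cv2.INTER_LINEAR)

# Steps 3-4: disparity, then reprojection to a 3D point cloud.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = matcher.compute(left_rect, right_rect).astype(np.float32) / 16.0
points_3d = cv2.reprojectImageTo3D(disparity, Q)
```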

5.1 Calibration of a Stereo Camera

Stereo camera calibration estimates the intrinsic parameters of each camera as well as the translation and rotation of the second camera relative to the first. These camera parameters are used to rectify a stereo image pair and make the two images parallel; the rectified images are then used to compute the disparity map required to reconstruct the 3D view. To calibrate the stereo system, multiple pairs of images of a calibration pattern are captured from different views, with the whole calibration pattern visible in every image. The images should be saved in a format that does not use lossy compression, such as PNG; the JPEG format is lossy, and lossy compression reduces the calibration accuracy. When the cameras are set in a parallel position, the epipolar lines become parallel; this means that a point in the left image lies on the same horizontal line in the right image (Fig. 5).

Fig. 5: Calibrated image

When calibrating a stereo camera, the camera lenses introduce some distortion. This distortion makes straight lines in the real world appear curved in the image (Fig. 6).

Fig. 6: Uncalibrated stereo images

The two images above were taken with an uncalibrated stereo rig; the cameras are only approximately parallel, not perfectly so.
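A standard model for this lens distortion (stated here for completeness; the paper does not write it out) is the radial polynomial model, in which an ideal image point (x, y) is displaced to

$$ x_{d} = x\left( 1 + k_{1} r^{2} + k_{2} r^{4} \right), \qquad y_{d} = y\left( 1 + k_{1} r^{2} + k_{2} r^{4} \right), \qquad r^{2} = x^{2} + y^{2} $$

where k_1 and k_2 are radial distortion coefficients estimated during calibration and removed during rectification.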

6 Experimental Results

A stereo imaging system was simulated in order to observe the effect of the stereo rig parameters on the reconstruction error. We placed our emphasis mainly on two parameters:

  1. Baseline distance

  2. Focal length.

To see the effect of these two parameters, we considered the reconstruction of a quadratic curve and perturbed the parameters to observe the effect on the 3D shape of the curve in stereo reconstruction.

The original 3D plots of the synthetic shapes are shown in Fig. 7. In general, the projected images in the two views correspond to different samplings of the 3D object. We therefore evaluated the reconstruction algorithm on synthetic objects whose data points correspond to different samplings of 3D free-form objects. We added white Gaussian noise with varying standard deviations to the original data and slightly perturbed the focal length and baseline distance in order to assess the effect on the reconstructed result. A simulation sketch in this spirit is given below.
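The sketch below reproduces the spirit of this experiment: it samples an ellipse in space, projects it into a rectified stereo pair with pixel noise, reconstructs it with a slightly wrong focal length, and reports the mean absolute 3D error. All numeric values (focal length, baseline, noise level, 2% perturbation) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
f, b = 800.0, 0.1                     # true focal length (px) and baseline (m)
t = np.linspace(0, 2 * np.pi, 200)

# Synthetic elliptical curve 2-3 m in front of the rectified cameras.
X, Y, Z = 0.5 * np.cos(t), 0.3 * np.sin(t), 2.5 + 0.5 * np.sin(t)

# Project into the left/right cameras and add white Gaussian pixel noise.
xl = f * (X + b / 2) / Z + rng.normal(0, 0.2, t.size)
xr = f * (X - b / 2) / Z + rng.normal(0, 0.2, t.size)
yv = f * Y / Z + rng.normal(0, 0.2, t.size)

# Reconstruct with a perturbed focal length (+2%).
f_hat = 1.02 * f
d = xl - xr                           # disparity
Z_hat = f_hat * b / d                 # depth from disparity
X_hat = Z_hat * (xl + xr) / (2 * f_hat)
Y_hat = Z_hat * yv / f_hat

err = np.mean(np.abs(np.stack([X_hat - X, Y_hat - Y, Z_hat - Z])))
print(f"mean absolute reconstruction error: {err:.4f} m")
```

Repeating this while sweeping the perturbation of f or b traces out curves analogous to Tables 1 and 2.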

Fig. 7: Original and reconstructed elliptical curve in space with perturbed stereo parameters

In this experiment, the original and reconstructed objects are found to have similar shapes. The reconstruction algorithm remains equally effective for curve reconstruction under small variations in the stereo rig parameters. Furthermore, from a quantitative point of view, we use the following two tables to evaluate the effect of the stereo parameters on the reconstruction of an ellipse and a helix using the stereo vision technique.

Table 1 reports the absolute error in the reconstruction of an ellipse and a helix when the focal length parameter of the camera is changed (Fig. 8).

Table 1: Effect of focal length

Fig. 8: Focal length versus change in focal length percentage

Table 2 reports the mean error in the reconstruction of an ellipse and a helix when the baseline distance parameter of the camera is changed (Fig. 9).

Table 2: Effect of baseline distance

Fig. 9: Baseline distance versus change in baseline distance percentage

7 Conclusion

In this paper, we studied the effect of varying the camera parameters on the matching and reconstruction process and presented results. The paper shows that if recent inverse geometric reconstruction methods are adopted, the problem can be modeled as stereo vision in which the effect of the stereo rig parameters is negligible as long as the deviation in the parameters is small. The error, however, increases significantly under large perturbations of the stereo rig parameters. In the future, this problem can be studied in real-time stereo; such an implementation would be useful in many applications, such as ball trajectory tracking in sports events, missile guidance and robot path planning.