
1 Introduction

3D reconstruction from images is a key component of many problems in computer vision, including object and scene recognition and obstacle detection for mobile robots [1–3]. This paper presents an innovatively configured catadioptric sensor: an accurate omnidirectional stereo vision optical device [4] based on a single perspective camera coupled with two hyperbolic mirrors. Because hyperbolic mirrors ensure a single viewpoint (SVP), the incident light ray corresponding to each image point is easily recovered. In this system, developed in [5–8], the two hyperbolic mirrors are aligned coaxially and separately, and share one focus that coincides with the camera center. The geometry of the system therefore guarantees that epipolar lines in the two images of the scene are aligned. The separation between the two mirrors provides a large baseline, which ultimately leads to precise results. The paper provides two stereo matching methods for the three-dimensional calculation.

Edge detection [9] is a process that simplifies image analysis by greatly reducing the quantity of data to be processed while preserving useful structural information about object boundaries. The method classifies a pixel as an edge if its gradient magnitude is larger than those of the pixels on both sides of it along the direction of greatest intensity change. In summary, the performance criteria are good detection, good localization, and a single response to a single edge.

The feature extraction method follows [10]. The major stages of computation used to generate the set of image features are: 1. scale-space extrema detection; 2. keypoint localization; 3. orientation assignment; 4. keypoint descriptor.
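As a concrete illustration (not part of the original pipeline description), both extractors are available in OpenCV. The minimal sketch below assumes a grayscale panoramic image on disk; the file name and the Canny thresholds are illustrative placeholders.

    import cv2

    # Load one unwrapped panoramic image (file name is hypothetical).
    img = cv2.imread("panorama_lower.png", cv2.IMREAD_GRAYSCALE)

    # Canny edge detection: gradient computation, non-maximum suppression,
    # and hysteresis thresholding (the 100/200 thresholds are assumed).
    edges = cv2.Canny(img, 100, 200)

    # SIFT: scale-space extrema detection, keypoint localization,
    # orientation assignment, and descriptor computation (the four
    # stages listed above).
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)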

The main contributions of this paper are the use of Canny edges and SIFT features for stereo matching and a novel procedure for 3D reconstruction.

The structure of this paper is as follows. In Sect. 2, the method for 3D reconstruction is briefly sketched; since stereo matching is the most important part of the paper, the procedure for reliable stereo matching is analyzed in detail. The experimental analysis is presented in Sect. 3. Finally, the different abilities of the two methods to identify obstacles are shown in Tables 1 and 2.

2 Stereo Matching and 3D Reconstruction

An image collected by the system (OSVOD) is shown in Fig. 1. Due to the hyperbolic mirror reflection, the image is distorted, and the farther a scene point is from the system, the more serious the imaging distortion becomes. The system [11, 12] can unwrap the whole image into one that accords with perspective projection. The image covers nearly a 360° field of view; the imaging region of the upper mirror is the circle of 300 pixels radius around the image center, and that of the lower mirror is the annulus from 300 to 700 pixels. The imaging scheme [13] of 3D reconstruction is improved upon in [14–19].
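A minimal sketch of this unwrapping step, assuming the mirror images are concentric annuli around the image center as described above; the panorama width, the image center, the file name, and the usable inner radius of the upper-mirror region are assumptions.

    import cv2
    import numpy as np

    def unwrap(img, cx, cy, r_min, r_max, width=1440):
        """Resample an annular mirror region into a panoramic image."""
        theta = np.linspace(0.0, 2.0 * np.pi, width, endpoint=False)
        r = np.arange(r_min, r_max, dtype=np.float32)
        # Each panorama pixel (row = radius, col = angle) maps back to a
        # polar location in the omnidirectional image.
        map_x = (cx + np.outer(r, np.cos(theta))).astype(np.float32)
        map_y = (cy + np.outer(r, np.sin(theta))).astype(np.float32)
        return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)

    omni = cv2.imread("omni.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
    cx, cy = omni.shape[1] // 2, omni.shape[0] // 2      # assumed image center
    # The lower mirror occupies the 300-700 pixel annulus; the upper mirror
    # the disc of radius 300 (its usable inner radius of 50 is an assumption).
    pano_lower = unwrap(omni, cx, cy, r_min=300, r_max=700)
    pano_upper = unwrap(omni, cx, cy, r_min=50, r_max=300)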

Figure 1 shows the appearance of the vision system (OSVOD), and Fig. 2 shows the original image it captured.

A procedure for 3D reconstruction from omnidirectional images is proposed. Its steps can be outlined as follows:

  1. Capture an omnidirectional image.

  2. Unwrap the omnidirectional image into panoramic images.

  3. Perform Canny edge detection and SIFT point extraction on the panoramic image converted from the image formed by the lower hyperbolic mirror.

  4. Find the matched edge and SIFT points between the images obtained in Step 3.

  5. Perform the 3D calculation based on the matched points and the system model.

The most important portion of the paper is the stereo matching. Its steps are as follows.

Fig. 1.

The appearance of the vision system.

Fig. 2.

The original image captured by OSVOD.

2.1 Select the Size of Matching Template

Jasmine Banks [8] analyzed the relation between template size and matching quality, and found that larger templates have a stronger smoothing effect. Considering that the noise in these images is strong, a 25 × 25 template is chosen in this paper.

2.2 Choose Similar Measure

For the matching of catadioptric images, Wei [20] found that the matching accuracy of the NCC (normalized cross-correlation) measure is higher than that of other measures. However, NCC has difficulty directly measuring the similarity between corresponding points when their mean intensities differ, so the more intuitive zero-mean variant, ZNCC, is applied in this paper.

NCC:

$$ C_{NCC} \left( {p,d} \right) = \frac{{\sum\limits_{{\left( {x,y} \right) \in W_{P} }} {I_{a} \left( {x,y} \right) \bullet I_{b} \left( {x + d,y} \right)} }}{{\sqrt {\sum\limits_{{\left( {x,y} \right) \in W_{P} }} {I_{a}^{2} \left( {x,y} \right)\sum\limits_{{\left( {x,y} \right) \in W_{P} }} {I_{b}^{2} \left( {x + d,y} \right)} } } }} $$
(1)

ZNCC:

$$ C_{ZNCC} \left( {p,d} \right) = \frac{{\sum\limits_{{\left( {x,y} \right) \in W_{P} }} {\left( {I_{a} \left( {x,y} \right) - \bar{I}_{a} } \right) \bullet \left( {I_{b} \left( {x + d,y} \right) - \bar{I}_{b} } \right)} }}{{\sqrt {\sum\limits_{{\left( {x,y} \right) \in W_{P} }} {\left( {I_{a} \left( {x,y} \right) - \bar{I}_{a} } \right)^{2} \sum\limits_{{\left( {x,y} \right) \in W_{P} }} {\left( {I_{b} \left( {x + d,y} \right) - \bar{I}_{b} } \right)^{2} } } } }} $$
(2)

where \( \bar{I} \) denotes the average gray level over the template window \( W_{P} \), and the subscripts a and b denote the reference image and the matching image, respectively.
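A direct NumPy transcription of Eq. (2), given as a minimal sketch; the function name and the zero-denominator guard are our own.

    import numpy as np

    def zncc(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
        """Zero-mean normalized cross-correlation of two equal-size patches."""
        a = patch_a.astype(np.float64) - patch_a.mean()
        b = patch_b.astype(np.float64) - patch_b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return float((a * b).sum() / denom) if denom > 0 else 0.0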

It is supposed that the reference point for matching is I(i, j) from the lower mirror, and that \( i_{1}(i, j) \) and \( i_{2}(i, j) \) are the corresponding points in the cylindrical and perspective projections, respectively.

2.3 The Stereo Matching of Cylindrical Projection

First, to search for the corresponding point \( i_{1}(i, j) \) in the cylindrical projection, a 25 × 25 template named \( T_{a} \) is chosen for the lower mirror in the upper image of Fig. 3. If \( x_{ij} \) is the gray value of each pixel in the template, the mean gray level \( \mu_{a} \) and the variance \( \sigma_{a}^{2} \) over the template are calculated as:

$$ \mu_{a} = \frac{1}{625}\sum\limits_{i = 1}^{25} {\sum\limits_{j = 1}^{25} {x_{ij} } } $$
(3)
$$ \sigma_{a}^{2} = \frac{1}{625}\sum\limits_{i = 1}^{25} {\sum\limits_{j = 1}^{25} {(x_{ij} - \mu_{a} )^{2} } } $$
(4)

A 25 × 25 template named \( T_{b} \) is then set for the upper mirror in the lower image of Fig. 3 in order to search for the corresponding point in the same column. If \( y_{ij} \) is the gray value of each pixel in this template, the mean \( \mu_{b} \) and variance \( \sigma_{b}^{2} \) of the candidate matching template are calculated according to Eqs. (3) and (4), and the best match position is:

$$ i_{1} (i^{*} ,j^{*} ) = \arg \hbox{max} \{ C_{ZNCC} (p,d)|(i,j) \in T_{b} \} $$
(5)

The corresponding measure value is:

$$ C_{ZNCC}^{ 1} \left( {p,d} \right) = \hbox{max} \{ C_{ZNCC} (p,d)|(i,j) \in T_{b} \} $$
(6)
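The search of Eqs. (5) and (6) can be sketched as follows, reusing the zncc function given earlier; since epipolar lines are vertical in the panoramas, only rows of the same column are scanned. All names and the border handling are assumptions.

    HALF = 12  # half-width of the 25 x 25 template

    def best_match_in_column(pano_ref, pano_search, i, j):
        """Return (i*, C_ZNCC) for the best match in column j (Eqs. 5-6)."""
        ref = pano_ref[i - HALF:i + HALF + 1, j - HALF:j + HALF + 1]
        best_row, best_score = -1, -1.0
        for r in range(HALF, pano_search.shape[0] - HALF):
            cand = pano_search[r - HALF:r + HALF + 1, j - HALF:j + HALF + 1]
            score = zncc(ref, cand)
            if score > best_score:
                best_row, best_score = r, score
        return best_row, best_score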
Fig. 3.

The sketch of the consistency check of the stereo matching.

2.4 The Corresponding Points Matched in Perspective Projection

The matched reference point \( i_{1}(i, j) \) in the cylindrical image is projected to the corresponding point \( i_{2}(i, j) \) in the perspective image. The search area of the cylindrical image is likewise projected into the perspective image, and the same search process as in the cylindrical projection is carried out there, with the similarity measure now computed between the matched reference point and the candidate points of the perspective image. The candidate with the largest measure value \( C_{ZNCC}^{2} \left( {p,d} \right) \) gives the best corresponding point \( i_{2}(i^{*}, j^{*}) \). The final measure value is

$$ C = \hbox{max} (C_{ZNCC}^{1} (p,d),C_{ZNCC}^{2} (p,d)) $$

The point with the larger of the two measure values is selected as the matched point, which is then mapped back to the original image to obtain the best matched reference point.

To prevent mismatches, the match is rejected if the measure value C is less than a threshold K; otherwise, a consistency check is performed.
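In code, the selection and rejection rule reads roughly as below; the threshold value is a tunable placeholder, as the paper does not state K.

    K = 0.8  # assumed value of the rejection threshold K

    def select_match(c1, match1, c2, match2, k=K):
        """Keep the candidate with the larger measure, reject weak matches."""
        c = max(c1, c2)            # C = max(C1_ZNCC, C2_ZNCC)
        if c < k:
            return None            # measure too low: treat as a mismatch
        return match1 if c1 >= c2 else match2  # pending the consistency check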

2.5 Consistency Check

From the steps above, we obtain the best matched point \( I^{*}(i, j) \) for the point I(i, j) in the lower-mirror image. In the consistency check, \( I^{*}(i, j) \) is taken as the reference in the upper-mirror image, and its corresponding point is searched for in the lower-mirror image. If the best point found is I(i, j), the pair passes the consistency check; otherwise, the two points are not accepted as corresponding matches. Combining the perspective and cylindrical projections lets more points pass the consistency check, and thus generates more 3D reconstruction points.

After the consistency check, the three-dimensional coordinates of all the accepted matching points are calculated and the depth map is produced.
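A sketch of this left-right check, built on the hypothetical best_match_in_column helper above: match from the lower-mirror panorama to the upper one, match back, and require the round trip to return to the starting row.

    def consistent(pano_lower, pano_upper, i, j):
        """True if matching lower -> upper -> lower returns to (i, j)."""
        i_star, _ = best_match_in_column(pano_lower, pano_upper, i, j)
        i_back, _ = best_match_in_column(pano_upper, pano_lower, i_star, j)
        return i_back == i  # exact round trip; a 1-pixel tolerance is common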

3 Experiment Results

This paper presents a method of 3D reconstruction and depth map building based on edge and SIFT feature points. Figure 4 shows the original omnidirectional image captured by OSVOD.

The input images are converted to panoramic images, in which the epipolar lines become vertical and parallel. Therefore, efficient stereo matching algorithms for conventional stereo images can be applied in this study. Figure 5 shows the panoramic images unwrapped from the original image. The panoramic image after Canny edge detection is shown in Fig. 6, and the panoramic image after SIFT detection is shown in Fig. 7. Epipolar geometry makes stereo matching easier by reducing the 2D search for corresponding pixels to a 1D search along the same epipolar line in both images (Figs. 8, 9, 10 and 11).

Fig. 4.

The original omnidirectional image of stereo matching and 3D reconstruction captured by OSVOD.

Fig. 5.

The cylinder panoramic images generated from the low hyperbolic mirror.

Fig. 6.

The Canny edge detection result based on the cylinder panoramic images generated from the low hyperbolic mirror.

Fig. 7.

The SIFT feature extraction result based on the cylinder panoramic images generated from the low hyperbolic mirror.

Fig. 8.

The 3D reconstruction result based on Canny edge detection.

Fig. 9.

The 3D reconstruction result based on SIFT features points.

Fig. 10.

The depth map (the mobile robot’s obstacle map, in which red represents obstacles) calculated from the 3D reconstruction of the Canny edge points (Color figure online).

Fig. 11.

The depth map (the mobile robot’s obstacle map, in which red represents obstacles) calculated from the 3D reconstruction of the SIFT feature points (Color figure online).

Section 3 shows the results of the matching and 3D calculation. Blue points represent objects whose z coordinates lie between 30 mm and 60 mm; red points denote objects whose z coordinates exceed 60 mm, and these can be regarded as obstacles.
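This color coding amounts to a simple threshold on the reconstructed z coordinate, sketched below; the function and its class labels are illustrative only.

    def classify_point(z_mm: float) -> str:
        """Map a reconstructed z coordinate (mm) to the display classes above."""
        if z_mm > 60.0:
            return "red"    # treated as an obstacle
        if 30.0 <= z_mm <= 60.0:
            return "blue"
        return "unlabeled"  # points below 30 mm are not color-coded in the text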

Table 1. The detailed distribution of stereo matching errors for Canny
Table 2. The detailed distribution of stereo matching errors for SIFT

Table 1 shows the detailed distribution of stereo matching errors for Canny, and Table 2 shows the corresponding distribution for SIFT. From Tables 1 and 2, the total number of points produced by edge detection is larger than that produced by feature detection. Canny is therefore stronger than SIFT at detecting objects, but weaker at identifying obstacles.

4 Conclusion

In this work we present a 3D reconstruction and depth map method based on stereo matching and 3D calculation of edge and SIFT feature points. The experimental results demonstrate that it is practicable to install the omnidirectional stereo vision system on mobile robots to detect obstacles around the robot. Moreover, the stereo vision system can provide omnidirectional range measurements with the benefits of stereo vision while needing only one image. Compared with the single-camera omnidirectional stereo vision systems previously reported, our stereo vision system has the significant advantage that its geometric calculation is easier and faster, and the 3D reconstruction accuracy is quite good except in occluded and ambiguous regions, as the depth map shows.