
1 Introduction

In this work, we investigate three-dimensional reconstruction, a technique that produces a 3D representation of an object from a sequence of images of that object taken from different views. Most 3D reconstruction techniques rely on calibration or auto-calibration methods.

In this paper, we present a new approach to reconstructing three-dimensional scenes from a method of autocalibration of cameras characterized by variable parameters. The determination of the 3D scene is based on the Euclidean reconstruction of the interest points detected and matched by the ORB descriptor [20]. The intrinsic parameters of the cameras are estimated by solving a system of nonlinear equations with the Levenberg-Marquardt algorithm [18]. Together with the fundamental matrices (estimated from eight matches between each image pair by the RANSAC algorithm [11]), these parameters determine the extrinsic camera parameters and, finally, the projection matrices (expressed in terms of the intrinsic and extrinsic parameters of the cameras used). The relationships between the camera parameters, the elements of the projection matrices, the coordinates of the matches, and the coordinates of the 3D points yield a linear equation system whose resolution produces a cloud of 3D points.

This introduction has presented the general ideas investigated in this paper. The rest of this work is organized as follows:

A diagram of the different steps of our method is presented in the second part; the scene and the camera model are presented in the third part; the fourth part treats the autocalibration of the cameras; the fifth part explains the reconstruction of the 3D scene; the experiments are discussed in the sixth part; and the conclusion is presented in the last part.

2 Diagram of Different Steps of Our Method

Figure 1 below shows a diagram of the different steps of the reconstruction of the 3D scene:

Fig. 1. Diagram of the reconstruction of the 3D scene

3 Scene and Camera Model

3.1 Presentation of the Scene

We consider two points \( {\text{S}}_{1} \) and \( {\text{S}}_{2} \) of the 3D scene; there is a single point \( {\text{S}}_{3} \) such that \( {\text{S}}_{1} {\text{S}}_{2} {\text{S}}_{3} \) is an equilateral triangle. \( {\text{R}}_{\text{e}} \left( {{\text{OX}}_{\text{e}} {\text{Y}}_{\text{e}} {\text{Z}}_{\text{e}} } \right) \) is the Euclidean reference frame associated with the triangle, where \( {\text{O}} \) is its center and \( {\text{b}} \) its side.

3.2 Model of the Camera

We use the pinhole camera model (Fig. 2) to project the points of the 3D scene into the image planes. This model is characterized by a matrix \( {\text{K}}_{\text{i}} \left( {{\text{R}}_{\text{i}} \;{\text{t}}_{\text{i}} } \right) \) of size 3 × 4, with:

Fig. 2. Representation of the scene

  • \( {\text{R}}_{\text{i}} : \) the rotation matrix

  • \( {\text{t}}_{\text{i}} : \) the translation vector

  • \( {\text{K}}_{\text{i}} : \) The matrix of intrinsic parameters defined by:

    $$ {\text{K}}_{\text{i}} = \left( {\begin{array}{*{20}c} {{\text{f}}_{\text{i}} } & {{\text{s}}_{\text{i}} } & {{\text{u}}_{{0{\text{i}}}} } \\ 0 & {\upvarepsilon_{\text{i}} {\text{f}}_{\text{i}} } & {{\text{v}}_{{0{\text{i}}}} } \\ 0 & 0 & 1 \\ \end{array} } \right) $$
    (1)
  • \( {\text{f}}_{\text{i}} : \) the focal length

  • \( \upvarepsilon_{\text{i}} : \) the scaling factor

  • \( {\text{s}}_{\text{i}} : \) the skew factor

  • \( \left( {{\text{u}}_{{0{\text{i}}}} , {\text{v}}_{{0{\text{i}}}} } \right): \) the coordinates of the principal point.
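
As a minimal illustration of Eq. (1), the intrinsic matrix can be assembled as follows in Python with NumPy; the numeric values are hypothetical, not calibration results from the paper:

```python
import numpy as np

def intrinsic_matrix(f, eps, s, u0, v0):
    # Intrinsic matrix K_i of Eq. (1).
    return np.array([[f,   s,       u0],
                     [0.0, eps * f, v0],
                     [0.0, 0.0,     1.0]])

# Hypothetical values: square pixels (eps = 1), zero skew (s = 0),
# principal point at the center of a 512 x 512 image.
K = intrinsic_matrix(f=1000.0, eps=1.0, s=0.0, u0=256.0, v0=256.0)
```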

4 Camera Autocalibration

Autocalibration [1,2,3,4,5,6,7,8,9,10] is a technique that allows us to estimate the parameters of the cameras without any prior knowledge of the scene.

4.1 ORB Descriptor: Oriented FAST and Rotated BRIEF

The detection [12,13,14] and matching [15,16,17] of interest points are important steps in the autocalibration and reconstruction of 3D scenes. In this paper we rely on the ORB descriptor (Oriented FAST and Rotated BRIEF) [21], a fast and robust local feature detector first presented by Rublee et al. in 2011 [20], which can be used in computer vision tasks such as object recognition or 3D reconstruction. It is a fusion of the FAST keypoint detector and the BRIEF descriptor with some modifications [9]. To determine the keypoints, ORB first uses FAST, then applies a Harris corner measure to retain the top N points. Since FAST does not compute orientation and is therefore rotation variant, ORB computes the intensity-weighted centroid of the patch with the detected corner at its center; the direction of the vector from the corner point to this centroid gives the orientation, and moments are computed to improve the rotation invariance. The BRIEF descriptor performs poorly under in-plane rotation, so in ORB a rotation matrix is computed from the orientation of the patch and the BRIEF descriptors are steered according to this orientation.
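
To make this step concrete, here is a minimal OpenCV sketch of ORB detection and matching as described above; the file names and feature budget are illustrative assumptions, not the paper's actual settings:

```python
import cv2

# Load the two views in grayscale (file names are illustrative).
img_i = cv2.imread("view_i.png", cv2.IMREAD_GRAYSCALE)
img_j = cv2.imread("view_j.png", cv2.IMREAD_GRAYSCALE)

# ORB = oriented FAST keypoints + steered BRIEF descriptors.
orb = cv2.ORB_create(nfeatures=2000)
kp_i, des_i = orb.detectAndCompute(img_i, None)
kp_j, des_j = orb.detectAndCompute(img_j, None)

# Hamming distance is the natural metric for binary descriptors;
# cross-checking keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_i, des_j), key=lambda m: m.distance)
```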

The ORB descriptor is similar in spirit to BRIEF: it does not use an elaborate sampling pattern such as BRISK [26] or FREAK [27]. However, there are two main differences between ORB and BRIEF:

  1. ORB uses an orientation compensation mechanism, making it rotation invariant.

  2. ORB learns the optimal sampling pairs, whereas BRIEF uses randomly chosen sampling pairs.

ORB uses a simple measure of corner orientation, the intensity centroid [28]. First, the moments of a patch are defined as:

$$ \forall \,{\text{p}},{\text{q}} \in \{ 0,1\} :\quad {\text{m}}_{\text{pq}} = \sum\nolimits_{{\text{x}},{\text{y}}} {{\text{x}}^{\text{p}} {\text{y}}^{\text{q}} \,{\text{I}}({\text{x}},{\text{y}})} $$
(2)

With:

  • \( {\text{p}}, {\text{q}} \in \{ 0,1\} \): binary selectors for the x and y directions

  • \( {\text{x}}, {\text{y}} \): pixel coordinates ranging over a circular window

  • \( {\text{x}}^{\text{p}} {\text{y}}^{\text{q}} \): the coordinate weighting

  • \( {\text{I}}\left( {{\text{x}},{\text{y}}} \right) \): the image function

Image moments help us to calculate features such as the center of mass of an object, its area, etc.

With these moments we can find the centroid, the “center of mass” of the patch, as:

$$ {\text{C}} = \left( { \frac{{{\text{m}}_{10} }}{{{\text{m}}_{00} }} , \frac{{{\text{m}}_{01} }}{{{\text{m}}_{00} }} } \right) $$
(3)

and by constructing a vector from the patch center O to the centroid C, we can define the relative orientation of the patch as:

$$ \uptheta = {\text{atan}}2\left( {{\text{m}}_{01} ,\; {\text{m}}_{10} } \right) $$
(4)

ORB discretizes the angle into increments of \( \frac{{2\uppi}}{30} \) (12°) and constructs a lookup table of precomputed BRIEF patterns. As long as the keypoint orientation \( \uptheta \) is consistent across views, the correct set of points will be used to compute its descriptor.
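
As a sketch of Eqs. (2)–(4), the orientation of a patch can be computed as follows, assuming a NumPy grayscale patch; the window handling is simplified to a square patch rather than ORB's circular one:

```python
import numpy as np

def patch_orientation(patch):
    # Orientation of a patch by the intensity centroid, Eqs. (2)-(4).
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Coordinates taken relative to the patch center O.
    xs -= (w - 1) / 2.0
    ys -= (h - 1) / 2.0
    m10 = (xs * patch).sum()          # Eq. (2) with p = 1, q = 0
    m01 = (ys * patch).sum()          # Eq. (2) with p = 0, q = 1
    return np.arctan2(m01, m10)       # Eq. (4): theta = atan2(m01, m10)
```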

To conclude, ORB is a binary descriptor similar to BRIEF, with the added advantages of rotation invariance and learned sampling pairs. How does ORB perform in comparison to BRIEF? Under non-geometric transformations (those that are image-capture dependent and do not rely on the viewpoint, such as blur, JPEG compression, exposure, and illumination), BRIEF actually outperforms ORB. Under affine transformations, BRIEF performs poorly for large rotation or scale changes, as it is not designed to handle them. Under perspective transformations, which result from viewpoint changes, BRIEF surprisingly still slightly outperforms ORB.

4.2 The Projection Matrix

We consider \( {\text{S}}_{1} \) and \( {\text{S}}_{2} \), two points of the 3D scene, and π the plane that contains these two points.

\( {\text{R}}_{\text{e}} \left( {{\text{O X}}_{\text{e}} {\text{Y}}_{\text{e}} {\text{Z}}_{\text{e}} } \right) \) is the Euclidean reference frame associated with the triangle of center O and side b.

The coordinates of the points \( {\text{S}}_{1} \), \( {\text{S}}_{2} \) and \( {\text{S}}_{3} \) (Fig. 3) are given below:

$$ {\text{S}}_{1} = \left( {\frac{\text{b}}{2},\frac{\surd 3}{2}{\text{b}},1} \right)^{\text{T}} $$
$$ {\text{S}}_{2} = \left( {{\text{b}},0,1} \right)^{\text{T}} $$
$$ {\text{S}}_{3} = \left( {0,1,1} \right)^{\text{T}} $$
Fig. 3. Representation of points \( {\text{S}}_{1} \), \( {\text{S}}_{2} \) and \( {\text{S}}_{3} \) in the two images i and j.

We consider the two homographies \( {\text{H}}_{\text{i}} \) and \( {\text{H}}_{\text{j}} \) that project the plane π into the images i and j, so the projections of the two points can be represented by the following expressions:

$$ {\text{s}}_{\text{im}} { \sim }{\text{H}}_{\text{i}} {{\rm S}}_{\text{m}} $$
(5)
$$ {\text{s}}_{\text{jm}} { \sim }{\text{H}}_{\text{j}} {{\rm S}}_{\text{m}} $$
(6)

With \( {\text{m}} = 1,2 \); \( {\text{s}}_{\text{im}} \) and \( {\text{s}}_{\text{jm}} \) represent respectively the points in the images \( {\text{i}} \) and \( {\text{j}} \) that are the projections of the two vertices \( {\text{S}}_{1} \) and \( {\text{S}}_{2} \) of the 3D scene, and \( {\text{H}}_{\text{n}} \) represents the homography matrix defined by:

$$ {\text{H}}_{\text{n}} = {\text{K}}_{\text{n}} {\text{R}}_{\text{n}} \left( {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ \end{array} \quad {\text{R}}_{\text{n}}^{\text{T}} {\text{t}}_{\text{n}} } \right);\; {\text{n}} = {\text{i}}, {\text{j}} $$
(7)
With:

  • \( {\text{R}}_{\text{n}} : \) the rotation matrix

  • \( {\text{t}}_{\text{n}} : \) the translation vector

  • \( {\text{K}}_{\text{n}} : \) The matrix of intrinsic parameters.

The expressions (5) and (6) can be written as:

$$ {\text{s}}_{\text{im}} { \sim }{\text{H}}_{\text{i}} {{\rm BS'}}_{\text{m}} $$
(8)
$$ {\text{s}}_{\text{jm}} { \sim }{\text{H}}_{\text{j}} {{\rm BS'}}_{\text{m}} $$
(9)

With: \( {\text{B}} = \left( {\begin{array}{*{20}c} {\text{b}} & {\frac{\text{b}}{2}} & 0 \\ 0 & {\frac{\surd 3}{2}{\text{b}}} & 0 \\ 0 & 0 & 1 \\ \end{array} } \right) \)

$$ {\text{S'}}_{\text{m}} = \left( {\begin{array}{*{20}c} {\text{a}} \\ {\text{b}} \\ 1 \\ \end{array} } \right) $$

For:

$$ \left\{ {\begin{array}{*{20}c} {{\text{m}} = 1 \Leftrightarrow {\text{a}} = 0 \;{\text{and}}\; {\text{b}} = 1} \\ {{\text{m}} = 2 \Leftrightarrow {\text{a}} = 1 \;{\text{and}}\; {\text{b}} = 0} \\ \end{array} } \right. $$

We set:

$$ {\text{P}}_{\text{n}} { \sim }{\text{H}}_{\text{n}} {{\rm B }}; \,{\text{n}} = {\text{i}}, {\text{j }} $$
(10)

where \( {\text{P}}_{\text{i}} \) and \( {\text{P}}_{\text{j}} \) are the projection matrices of the two points \( {\text{S}}_{1}^{ '} \) and \( {\text{S}}_{2}^{ '} \) in the images i and j (Figs. 3 and 6).

From Eq. (10) we have:

$$ {\text{P}}_{\text{j}} { \sim } {\text{H}}_{\text{ij}} {{\rm P}}_{\text{i}} $$
(11)

With:

$$ {\text{H}}_{\text{ij}} { \sim }{\text{H}}_{\text{j}} {{\rm H}}_{\text{i}}^{ - 1} $$
(12)

\( {\text{H}}_{\text{ij}} \) is the homography between the images \( {\text{i }} \) and \( {\text{j}} \).

Equations (8), (9) and (10) give:

$$ {\text{s}}_{\text{im}} { \sim }{\text{P}}_{\text{i}} {{\rm S}}_{\text{m}}^{ '} $$
(13)
$$ {\text{s}}_{\text{jm}} { \sim }{\text{P}}_{\text{j}} {{\rm S}}_{\text{m}}^{ '} $$
(14)

From Eqs. (11) and (14) we have:

$$ {\text{s}}_{\text{jm}} { \sim }{\text{H}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{\text{m }}^{ '} $$
(15)

Multiplying Eq. (15) by \( {\text{e}}_{\text{j}}^{ '} \) gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{s}}_{\text{jm}} { \sim }{\text{e}}_{\text{j}}^{ '} {\text{H}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{\text{m}}^{ '} $$
(16)

Since \( {\text{e}}_{\text{j}}^{ '} {\text{H}}_{\text{ij}} = {\text{F}}_{\text{ij}} \), the latter gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{s}}_{\text{jm}} { \sim }{\text{F}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{\text{m}}^{ '} $$
(17)

where \( {\text{F}}_{\text{ij}} \) is the fundamental matrix between the images i and j, and:

$$ {\text{e}}_{\text{j}}^{ '} = \left( {\begin{array}{*{20}c} 0 & { - {\text{e}}_{{{\text{j}}3}} } & {{\text{e}}_{{{\text{j}}2}} } \\ {{\text{e}}_{{{\text{j}}3}} } & 0 & { - {\text{e}}_{{{\text{j}}1}} } \\ { - {\text{e}}_{{{\text{j}}2}} } & {{\text{e}}_{{{\text{j}}1}} } & 0 \\ \end{array} } \right) $$

\( \left( {{\text{e}}_{{{\text{j}}1}} \;{\text{e}}_{{{\text{j}}2}} \;{\text{e}}_{{{\text{j}}3}} } \right)^{\text{T}} \) are the coordinates of the epipole of the right image; this epipole can be estimated from the fundamental matrix.

The expression (13) gives:

$$ {\text{s}}_{{{\text{i}}1}} \sim{\text{P}}_{\text{i}} {{\rm S}}_{1}^{ '} $$
(18)
$$ {\text{s}}_{{{\text{i}}2}} \sim{\text{P}}_{\text{i}} {{\rm S}}_{2}^{ '} $$
(19)

From these two last relationships, we get four equations in eight unknowns, the elements of \( {\text{P}}_{\text{i}} \).

The expression (17) gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{s}}_{{{\text{j}}1}} \sim{\text{F}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{1}^{ '} $$
(20)
$$ {\text{e}}_{\text{j}}^{ '} {\text{s}}_{{{\text{j}}2}} \sim{\text{F}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{2}^{ '} $$
(21)

From these two last relationships, we get four more equations in the same eight unknowns, the parameters of \( {\text{P}}_{\text{i}} \). We can therefore estimate the parameters of \( {\text{P}}_{\text{i}} \), since we have a total of eight equations in the eight unknown elements of \( {\text{P}}_{\text{i}} \).

Multiplying Eq. (11) by \( {\text{e}}_{\text{j}}^{ '} \) gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{P}}_{\text{j}} { \sim }{\text{e}}_{\text{j}}^{ '} {{\rm H}}_{\text{ij}} {{\rm P}}_{\text{i}} $$
(22)

That gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{P}}_{\text{j}} { \sim }{\text{F}}_{\text{ij}} {{\rm P}}_{\text{i}} $$
(23)

The previous expression gives eight equations in eight unknowns, the elements of \( {\text{P}}_{\text{j}} \).

So we can estimate the parameters of \( {\text{P}}_{\text{j}} \) from these eight equations.
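
The eight constraints of Eqs. (18)–(21) are linear in the nine homogeneous elements of \( {\text{P}}_{\text{i}} \). As an illustration of how such a system could be assembled and solved, the following NumPy sketch stacks the cross-product form of each proportionality and takes the smallest right singular vector; this is one standard way to solve homogeneous linear systems, not necessarily the paper's exact solver:

```python
import numpy as np

def cross_mat(v):
    # Skew-symmetric [v]_x such that cross_mat(v) @ u = v x u.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rows_for(w, M, Sp):
    # The proportionality  w ~ M @ P @ Sp  is equivalent to
    # [w]_x @ M @ P @ Sp = 0.  Using vec(P @ Sp) = (Sp^T kron I3) vec(P)
    # (column-major vec), this is linear in the 9 elements of P.
    return cross_mat(w) @ M @ np.kron(Sp.reshape(1, 3), np.eye(3))

def estimate_Pi(s_i, s_j, e_prime_j, F_ij, S_prime):
    # Eqs. (18)-(19): s_im ~ P_i S'_m ; Eqs. (20)-(21):
    # e'_j s_jm ~ F_ij P_i S'_m.  Each proportionality contributes
    # two independent rows; eight constraints fix P_i up to scale.
    A = []
    for m in range(2):
        A.append(rows_for(s_i[m], np.eye(3), S_prime[m]))
        A.append(rows_for(e_prime_j @ s_j[m], F_ij, S_prime[m]))
    A = np.vstack(A)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3, order="F")  # smallest right singular vector
```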

4.3 Autocalibration Equations

In this part, we determine the relationship between the images of the absolute conic (\( \upomega_{\text{i}} \) and \( \upomega_{\text{j}} \)), and a relationship between the two points \( \left( {{\text{S}}_{1} , {\text{S}}_{2} } \right) \) of the 3D scene and their projections \( \left( {{\text{s}}_{{{\text{i}}1}} , {\text{s}}_{{{\text{i}}2}} } \right) \) and \( \left( {{\text{s}}_{{{\text{j}}1}} , {\text{s}}_{{{\text{j}}2}} } \right) \) in the planes of the left and right images respectively. These relationships are established using techniques of projective geometry. A nonlinear cost function is then defined from these relationships and minimized by the Levenberg-Marquardt algorithm [18] to estimate \( \upomega_{\text{i}} \) and \( \upomega_{\text{j}} \), and finally the intrinsic parameters of the cameras used [24].

Equation (13) gives:

$$ \uplambda_{\text{im}} {{\rm s}}_{\text{im}} = {\text{P}}_{\text{i}} {{\rm S}}_{\text{m}}^{ '} $$
(24)

With: \( {\text{P}}_{\text{i}} = \left( {\begin{array}{*{20}c} {{\text{P}}_{11} } & {{\text{P}}_{12} } & {{\text{P}}_{13} } \\ {{\text{P}}_{21} } & {{\text{P}}_{22} } & {{\text{P}}_{23} } \\ {{\text{P}}_{31} } & {{\text{P}}_{32} } & {{\text{P}}_{33} } \\ \end{array} } \right) \)

$$ {\text{s}}_{\text{im}} = \left( {\begin{array}{*{20}c} {{\text{x}}_{\text{im}} } \\ {{\text{y}}_{\text{im}} } \\ 1 \\ \end{array} } \right) $$
From Eqs. (7) and (10) we obtain:

$$ {\text{P}}_{\text{i}}^{\text{T}}\upomega_{\text{i}} {\text{P}}_{\text{i}} \sim\left( {\begin{array}{*{20}c} {{\text{B}}^{{{\prime }{\text{T}}}} {\text{B}}^{{\prime }} } & {{\text{B}}^{{{\prime }{\text{T}}}} {\text{R}}_{\text{i}}^{\text{T}} {\text{t}}_{\text{i}} } \\ {{\text{t}}_{\text{i}}^{\text{T}} {\text{R}}_{\text{i}} {\text{B}}^{{\prime }} } & {{\text{t}}_{\text{i}}^{\text{T}} {\text{t}}_{\text{i}} } \\ \end{array} } \right) $$
(25)

With:

$$ {\text{B}}^{\prime } = \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} {\text{b}} & {\frac{\text{b}}{2}} \\ 0 & {\frac{\surd 3}{2}{\text{b}}} \\ 0 & 0 \\ \end{array} } \\ \end{array} } \right) $$
(26)

\( K_{i} \) is an upper-triangular matrix normalized such that \( \det {\text{K}}_{\text{i}} = 1 \).

\( \upomega_{\text{i}} = \left( {{\text{K}}_{\text{i}} {{\rm K}}_{\text{i}}^{\text{T}} } \right)^{ - 1} \) is the image of the absolute conic.

Similarly, for \( {\text{P}}_{\text{j}} \):

$$ {\text{P}}_{\text{j}}^{\text{T}}\upomega_{\text{j}} {{\rm P}}_{\text{j}} \sim \left( {\begin{array}{*{20}c} {{\text{B}}^{{{\prime }{\text{T}}}} {{\rm B}}^{{\prime }} } & {{\text{B}}^{{{\prime }{\text{T}}}} {{\rm R}}_{\text{j}}^{\text{T}} {{\rm t}}_{\text{j}} } \\ {{\text{t}}_{\text{j}}^{\text{T}} {{\rm R}}_{\text{j}} {{\rm B}}^{{\prime }} } & {{\text{t}}_{\text{j}}^{\text{T}} {{\rm t}}_{\text{j}} } \\ \end{array} } \right) $$
(27)

We can deduce that the first two rows and columns of the matrices \( {\text{P}}_{\text{i}}^{\text{T}}\upomega_{\text{i}} {\text{P}}_{\text{i}} \) and \( {\text{P}}_{\text{j}}^{\text{T}}\upomega_{\text{j}} {\text{P}}_{\text{j}} \) are identical up to a scale factor, since both equal the common block \( {\text{B}}^{{{\prime }{\text{T}}}} {\text{B}}^{{\prime }} \).

We denote by \( {\text{X}}_{\text{i}} \) and \( {\text{X}}_{\text{j}} \) the two matrices corresponding respectively to the first two rows and columns of the two previous matrices.

\( {\text{X}}_{\text{m}} = \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} {{\text{x}}_{{1{\text{m}}}} } & {{\text{x}}_{{3{\text{m}}}} } \\ {{\text{x}}_{{3{\text{m}}}} } & {{\text{x}}_{{2{\text{m}}}} } \\ \end{array} } \\ \end{array} } \right) \), with \( {\text{m}} = {\text{i}}, {\text{j}}. \)

We thus obtain the three following equations:

$$ \left\{ {\begin{array}{*{20}c} {{\text{x}}_{{1{\text{i}}}} = {\text{x}}_{{2{\text{i}}}} } \\ {{\text{x}}_{{1{\text{j}}}} = {\text{x}}_{{2{\text{j}}}} } \\ {{\text{x}}_{{1{\text{i}}}} {\text{x}}_{{3{\text{j}}}} = {\text{x}}_{{1{\text{j}}}} {\text{x}}_{{3{\text{i}}}} } \\ \end{array} } \right. $$
(28)

Each image pair gives a system of 3 equations with 8 unknowns (4 unknowns for \( \upomega_{\text{i}} \) and 4 unknowns for \( \upomega_{\text{j}} \)); to solve the equation system (28), we therefore need at least 4 images.

The equation system (28) is nonlinear; to solve it, we minimize the following nonlinear cost function:

$$ { \hbox{min} }_{{\upomega_{\text{k}} }} \sum\nolimits_{{{\text{j}} = {\text{i}} + 1}}^{\text{n}} {\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{n}} - 1}} {\left( {\upalpha_{\text{ij}}^{2} +\upbeta_{\text{ij}}^{2} +\upgamma_{\text{ij}}^{2} } \right)} } $$
(29)

With: \( \upalpha_{\text{ij}} = {\text{x}}_{{1{\text{i}}}} - {\text{x}}_{{2{\text{i}}}} ;\;\upbeta_{\text{ij}} = {\text{x}}_{{1{\text{j}}}} - {\text{x}}_{{2{\text{j}}}} ;\;\upgamma_{\text{ij}} = {\text{x}}_{{1{\text{i}}}} {\text{x}}_{{3{\text{j}}}} - {\text{x}}_{{1{\text{j}}}} {\text{x}}_{{3{\text{i}}}} \), where n is the number of images.

Equation (29) is minimized by the Levenberg-Marquardt algorithm [18], which requires an initialization step. The camera parameters are initialized as follows:

Pixels are square, so \( \upvarepsilon_{\text{i}} =\upvarepsilon_{\text{j}} = 1 \) and \( {\text{s}}_{\text{i}} = {\text{s}}_{\text{j}} = 0 \).

The principal point is at the center of the image, so \( {\text{x}}_{{0{\text{i}}}} = {\text{y}}_{{0{\text{i}}}} = {\text{x}}_{{0{\text{j}}}} = {\text{y}}_{{0{\text{j}}}} = 256 \) (the images used are of size 512 × 512). The focal lengths \( {\text{f}}_{\text{i}} \) and \( {\text{f}}_{\text{j}} \) are then obtained by minimizing the cost function (29).
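
As an illustrative sketch of this minimization using SciPy's Levenberg-Marquardt implementation (the parameterization below, one focal length per image with the principal point fixed at (256, 256), matches the initialization above but is otherwise an assumption):

```python
import numpy as np
from scipy.optimize import least_squares

def omega(f, u0, v0):
    # Image of the absolute conic, omega = (K K^T)^(-1), assuming
    # square pixels (eps = 1) and zero skew (s = 0).
    K = np.array([[f, 0.0, u0],
                  [0.0, f, v0],
                  [0.0, 0.0, 1.0]])
    return np.linalg.inv(K @ K.T)

def residuals(focals, P):
    # Residuals (alpha_ij, beta_ij, gamma_ij) of Eqs. (28)-(29) over
    # all image pairs; P[i] is the 3x3 projection matrix of image i
    # estimated in Sect. 4.2.
    res = []
    n = len(P)
    for i in range(n - 1):
        for j in range(i + 1, n):
            Xi = P[i].T @ omega(focals[i], 256.0, 256.0) @ P[i]
            Xj = P[j].T @ omega(focals[j], 256.0, 256.0) @ P[j]
            res += [Xi[0, 0] - Xi[1, 1],                        # alpha_ij
                    Xj[0, 0] - Xj[1, 1],                        # beta_ij
                    Xi[0, 0] * Xj[0, 1] - Xj[0, 0] * Xi[0, 1]]  # gamma_ij
    return np.array(res)

# Levenberg-Marquardt minimization; the focal lengths are initialized
# with a plausible guess such as the image width:
# sol = least_squares(residuals, np.full(len(P), 512.0), args=(P,), method="lm")
```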

4.4 General Algorithm

(Figure: general algorithm of the proposed method.)

5 Reconstruction of the 3D Scene

This part is dedicated to the 3D reconstruction: determining a cloud of 3D points from the matches between the pairs of images [19, 22, 23, 25]. In theory, recovering the position of a 3D point from its projections in the images is straightforward: a pair of matched 2D points must be the projections of the same 3D point in the two images.

This reconstruction is possible when the geometric relationship between the cameras is known and when the projection of the same point is measured in the images.

The reconstruction of points of the 3D scene requires the estimation of the projection matrices of this scene in the different images.

We have \( {\text{P}}_{0} \) and \( {\text{P}}_{1} \), the two projection matrices of the 3D scene in the planes of the two images, such that:

$$ {\text{s}}_{{0{\text{m}}}} { \sim }{\text{P}}_{0} {\text{S}}_{\text{m}} $$
(30)
$$ {\text{s}}_{{1{\text{m}}}} { \sim }{\text{P}}_{1} {\text{S}}_{\text{m}} $$

We have \( {\text{P}}{ \sim }{\text{K}}\left( {{\text{R}}\;{\text{t}}} \right) \), so:

$$ {\text{P}}_{0} { \sim }{\text{K}}_{0} \left( {{\text{I}}_{3} \;\;0} \right) $$
(31)
$$ {\text{P}}_{1} { \sim }{\text{K}}_{1} \left( {{\text{R}}_{1} {\text{t}}_{1} } \right) $$

The essential matrix [29] is the specialization of the fundamental matrix to the case of normalized image coordinates. Historically, the essential matrix was introduced (by Longuet-Higgins) before the fundamental matrix, and the fundamental matrix may be thought of as the generalization of the essential matrix in which the (inessential) assumption of calibrated cameras is removed. The essential matrix has fewer degrees of freedom, and additional properties, compared to the fundamental matrix.

The defining equation for the essential matrix is:

$$ \widehat{X}_{1}^{T} E \widehat{X}_{0} = 0 $$

where \( \widehat{X} = K^{ - 1} X \) denotes normalized image coordinates, for corresponding points \( X_{0} \leftrightarrow X_{1} \).

Substituting for \( \widehat{X}_{0} \) and \( \widehat{X}_{1} \) gives \( X_{1}^{T} K_{1}^{ - T} E K_{0}^{ - 1} X_{0} = 0 \). Comparing this with the relation \( X_{1}^{T} F_{12} X_{0} = 0 \) for the fundamental matrix, it follows that the relationship between the fundamental and essential matrices is

$$ {\text{E}}_{12} = {\text{K}}_{1}^{\text{T}} {{\rm F}}_{12} {\text{K}}_{0} $$
(32)

where \( {\text{F}}_{12} \) represents the fundamental matrix between the first and second images; it is estimated from 8 matches between this pair of images by the RANSAC algorithm [11].
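
A minimal sketch of this step with OpenCV; the RANSAC threshold and confidence values are illustrative assumptions:

```python
import cv2
import numpy as np

def essential_from_matches(pts_i, pts_j, K0, K1):
    # pts_i, pts_j: (N, 2) arrays of matched keypoint coordinates.
    # Fundamental matrix by RANSAC over 8-point solutions [11].
    F, inliers = cv2.findFundamentalMat(pts_i, pts_j, cv2.FM_RANSAC,
                                        ransacReprojThreshold=1.0,
                                        confidence=0.99)
    # Eq. (32): E_12 = K_1^T F_12 K_0.
    E = K1.T @ F @ K0
    return E, F, inliers
```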

\( {\text{E}}_{12} \) is decomposed by singular value decomposition in the following equation:

$$ {\text{E}}_{12} =\uplambda {\text{L}}_{1} {\text{U}}\left( {\begin{array}{*{20}c} 1 & 1 & 0 \\ \end{array} } \right){\text{L}}_{2}^{\text{T}} $$
(33)

where λ is a non-zero scalar and \( {\text{U}}\left( {\begin{array}{*{20}c} 1 & 1 & 0 \\ \end{array} } \right) \) denotes the diagonal matrix \( {\text{diag}}\left( {1, 1, 0} \right) \), which can be written in the following form:

$$ {\text{U}}\left( {\begin{array}{*{20}c} 1 & 1 & 0 \\ \end{array} } \right) = {\text{N}}_{1} {\text{N}}_{2} = - {\text{N}}_{1} {\text{N}}_{2}^{\text{T}} $$
(34)
$$ {\text{N}}_{1} = \left( {\begin{array}{*{20}c} 0 & 1 & 0 \\ { - 1} & 0 & 0 \\ 0 & 0 & 0 \\ \end{array} } \right),\, {\text{N}}_{2} = \left( {\begin{array}{*{20}c} 0 & { - 1} & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ \end{array} } \right) $$
(35)

From (33) and (34), we have:

$$ {\text{E}}_{12} =\uplambda {\text{L}}_{1} {\text{N}}_{1} {\text{N}}_{2} {\text{L}}_{2}^{\text{T}} = -\uplambda {\text{L}}_{1} {\text{N}}_{1} {\text{N}}_{2}^{\text{T}} {\text{L}}_{2}^{\text{T}} $$
(36)

\( {\text{L}}_{1} \) is orthogonal, so the matrix \( {\text{E}}_{12} \) can be written in the following form:

$$ {\text{E}}_{12} { \sim } {\text{L}}_{1} {\text{N}}_{1} {\text{L}}_{1}^{\text{T}} \left( { \mp {\text{L}}_{1} {\text{N}}_{2} {\text{L}}_{2}^{\text{T}} } \right){ \sim } - {\text{L}}_{1} {\text{N}}_{1} {\text{L}}_{1}^{\text{T}} \left( { \mp {\text{L}}_{1} {\text{N}}_{2}^{\text{T}} {\text{L}}_{2}^{\text{T}} } \right) $$
(37)

On the other hand, \( {\text{E}}_{12} \) is expressed as follows:

$$ {\text{E}}_{12} { \sim }\left[ {{\text{t}}_{1} } \right]_{{ \wedge }} {\text{R}}_{1} $$
(38)
$$ \left[ {{\text{t}}_{1} } \right]_{{ \wedge }} = \left[ {\begin{array}{*{20}c} 0 & { - {\text{t}}_{13} } & {{\text{t}}_{12} } \\ {{\text{t}}_{13} } & 0 & { - {\text{t}}_{11} } \\ { - {\text{t}}_{12} } & {{\text{t}}_{11} } & 0 \\ \end{array} } \right] $$
(39)

where \( \left( {{\text{t}}_{11}\; {\text{t}}_{12}\; {\text{t}}_{13} } \right)^{\text{T}} \) are the coordinates of the translation vector \( {\text{t}}_{1} \).

From the two last expressions, we conclude that the vector \( {\text{t}}_{1} \) admits a unique solution (up to scale):

$$ \left[ {{\text{t}}_{1} } \right]_{{ \wedge }} { \sim }{\text{L}}_{1} {\text{N}}_{1} {\text{L}}_{1}^{\text{T}} $$
(40)

The rotation matrix \( {\text{R}}_{1} \) admits 4 solutions:

$$ {\text{R}}_{1} { \sim } \mp {\text{L}}_{1} {\text{N}}_{2} {\text{L}}_{2 }^{\text{T}}\, {\text{ or R}}_{1} { \sim } \mp {\text{L}}_{1} {\text{N}}_{2}^{\text{T}} {{\rm L}}_{2 }^{\text{T }} $$
(41)

But the determinant of a rotation matrix must be equal to 1, which fixes the sign of each of the two matrices:

$$ \mp {\text{L}}_{1} {\text{N}}_{2} {\text{L}}_{2 }^{\text{T}} \;{\text{and }} \mp {\text{L}}_{1} {\text{N}}_{2}^{\text{T}} {{\rm L}}_{2 }^{\text{T}} $$

So the number of solutions for \( {\text{R}}_{1} \) is reduced to 2.

We use the two solutions to reconstruct the 3D scene, and we finally choose the one that gives the best Euclidean reconstruction.
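
A NumPy sketch of this decomposition, following Eqs. (33)–(41); the selection among candidate poses mentioned above is only indicated in a comment:

```python
import numpy as np

def decompose_essential(E):
    # Factor E ~ [t]_x R via SVD, Eqs. (33)-(41).
    L1, _, L2t = np.linalg.svd(E)
    N2 = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
    # [t]_x ~ L1 N1 L1^T (Eq. (40)); its null direction is the
    # third column of L1, so t is unique up to scale and sign.
    t = L1[:, 2]
    R_candidates = []
    for R in (L1 @ N2 @ L2t, L1 @ N2.T @ L2t):   # Eq. (41)
        if np.linalg.det(R) < 0:                  # enforce det R = +1
            R = -R
        R_candidates.append(R)
    return t, R_candidates

# Each (R, +/-t) pair would then be tested by triangulating a few
# matches and keeping the solution that reconstructs points in
# front of both cameras.
```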

From the Eq. (30), we obtain the following linear system of equations:

$$ {\text{M}}\left( {\text{X Y Z}} \right)^{\text{T}} = {\text{N}} $$
(42)
  • \( {\text{M}} \): matrix of size 4 × 3

  • \( {\text{N}} \): vector of size 4

M and N are expressed in terms of the elements of the projection matrices and the coordinates of the matches.

\( \left( {\text{X Y Z}} \right)^{\text{T}} \): The vector of the coordinates of the searched 3D point.

The coordinates of the 3D point (the solution of Eq. (42)) are obtained by the following expression, valid since \( \det \left( {{\text{M}}^{\text{T}} {\text{M}}} \right) \ne 0 \), i.e., \( {\text{M}}^{\text{T}} {\text{M}} \) is non-singular:
$$ \left( {\text{X Y Z}} \right)^{\text{T}} = \left( {{\text{M}}^{\text{T}} {{\rm M}}} \right)^{ - 1} {\text{M}}^{\text{T}} {{\rm N}} $$
(43)
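
As a sketch of Eqs. (42)–(43) for one match, assuming 3 × 4 projection matrices and pixel coordinates; np.linalg.lstsq computes the same least-squares solution as the closed form of Eq. (43):

```python
import numpy as np

def triangulate(P0, P1, x0, x1):
    # Linear triangulation, Eqs. (42)-(43).  P0, P1: 3x4 projection
    # matrices; x0, x1: matched pixel coordinates (u, v).  Each image
    # contributes two rows of M (X Y Z)^T = N.
    rows, rhs = [], []
    for P, (u, v) in ((P0, x0), (P1, x1)):
        rows.append(u * P[2, :3] - P[0, :3]); rhs.append(P[0, 3] - u * P[2, 3])
        rows.append(v * P[2, :3] - P[1, :3]); rhs.append(P[1, 3] - v * P[2, 3])
    M, N = np.array(rows), np.array(rhs)
    # Least-squares solution (X Y Z)^T = (M^T M)^(-1) M^T N.
    return np.linalg.lstsq(M, N, rcond=None)[0]
```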

6 Experiments

In this part, we take two images of an unknown three-dimensional scene with a CCD camera characterized by variable intrinsic parameters (Fig. 4). In the first step, we apply the ORB descriptor to detect the interest points (Fig. 5) and to match them between the two selected images (Fig. 6). Subsequently, after implementing the RANSAC and Levenberg-Marquardt algorithms in the Python programming language, we obtain the 3D reconstruction shown below (Fig. 7):

Fig. 4. Two images of an unknown 3D scene

Fig. 5. The interest points in the two images (blue color) (Color figure online)

Fig. 6. The matches between the two images

Fig. 7. The reconstructed 3D scene

The detection of interest points (Fig. 5) and the matching (Fig. 6) are carried out by the ORB descriptor [20]. The determination of the relationships between the matches and the camera parameters permits the formulation of a system of nonlinear equations, which is embedded in a nonlinear cost function. The minimization of this function by the Levenberg-Marquardt algorithm [18] gives an optimal solution for the camera parameters. These parameters are used with the matches to obtain an initial point cloud (Fig. 7).

Many values must be estimated: the intrinsic camera parameters (focal lengths, coordinates of the principal points, scale factors, skew factors) and the rotation matrices. These parameters are chosen so that each one belongs to a specific interval (Table 1).

Table 1. Intervals of camera parameters

The usefulness of our contribution is that a 3D scene is reconstructed from only two images taken by an uncalibrated camera with variable intrinsic parameters. The next step will be 3D modeling, in order to finalize our work and obtain robust results and a well-reconstructed 3D scene based on triangulation and texture mapping.

7 Conclusion

In this work we have presented a new approach to the reconstruction of three-dimensional scenes from a method of autocalibration of cameras characterized by variable intrinsic parameters. The interest points are detected and matched by the ORB descriptor; the matches are then used with the projection matrices (expressed in terms of the camera parameters) of the scene in the image planes to determine the coordinates of the point cloud, from which the scene is reconstructed.