
1 Introduction

In this work, we investigate three-dimensional reconstruction, a technique that produces a 3D representation of an object from a sequence of images of that object taken from different views. Most 3D reconstruction techniques rely on calibration or auto-calibration methods.

In this paper, we present a new approach to reconstructing three-dimensional scenes from a method of autocalibration of cameras characterized by variable parameters. The determination of the 3D scene is based on the Euclidean reconstruction of the interest points detected and matched by the ORB descriptor [20]. The intrinsic parameters of the cameras are estimated by solving a system of nonlinear equations with the Levenberg-Marquardt algorithm [18]. Together with the fundamental matrices (estimated from eight matches between each image pair by the RANSAC algorithm [11]), these parameters determine the extrinsic camera parameters and, finally, the projection matrices (expressed in terms of the intrinsic and extrinsic parameters of the cameras used). The relationships between the camera parameters, the elements of the projection matrices, the coordinates of the matches, and the coordinates of the 3D points yield a linear equation system whose resolution produces a cloud of 3D points.

This introduction has presented the general ideas investigated in this paper. The rest of this work is organized as follows:

A diagram of the different steps of our method is presented in the second part; the scene and the camera model are presented in the third part; the fourth part treats the autocalibration of the cameras; the fifth part explains the reconstruction of the 3D scene; the experiments are discussed in the sixth part; and the conclusion is presented in the last part.

2 Diagram of Different Steps of Our Method

Figure 1 below shows a diagram of the different steps of the reconstruction of the 3D scene:

Fig. 1. Diagram of the reconstruction of the 3D scene

3 Scene and Camera Model

3.1 Presentation of the Scene

We consider two points \( {\text{S}}_{1} \) and \( {\text{S}}_{2} \) of the 3D scene; there is a single point \( {\text{S}}_{3} \) such that \( {\text{S}}_{1} {\text{S}}_{2} {\text{S}}_{3} \) is an equilateral triangle. \( {\text{R}}_{\text{e}} \left( {{\text{OX}}_{\text{e}} {\text{Y}}_{\text{e}} {\text{Z}}_{\text{e}} } \right) \) is the Euclidean reference frame associated with the triangle, where \( {\text{O}} \) is its center and \( {\text{b}} \) its side.

3.2 Model of the Camera

We use the pinhole camera model (Fig. 2) to project the points of the 3D scene into the image planes. This model is characterized by a matrix \( {\text{K}}_{\text{i}} \left( {{\text{R}}_{\text{i}} \;{\text{t}}_{\text{i}} } \right) \) of size 3 × 4, with:

Fig. 2. Representation of the scene

  • \( {\text{R}}_{\text{i}} : \) the rotation matrix

  • \( {\text{t}}_{\text{i}} : \) the translation vector

  • \( {\text{K}}_{\text{i}} : \) The matrix of intrinsic parameters defined by:

    $$ {\text{K}}_{\text{i}} = \left( {\begin{array}{*{20}c} {{\text{f}}_{\text{i}} } & {{\text{s}}_{\text{i}} } & {{\text{u}}_{{0{\text{i}}}} } \\ 0 & {\upvarepsilon_{\text{i}} {\text{f}}_{\text{i}} } & {{\text{v}}_{{0{\text{i}}}} } \\ 0 & 0 & 1 \\ \end{array} } \right) $$
    (1)
  • \( {\text{f}}_{\text{i}} : \) the focal length

  • \( \upvarepsilon_{\text{i}} : \) the scaling factor

  • \( {\text{s}}_{\text{i}} : \) the skew factor

  • \( \left( {{\text{u}}_{{0{\text{i}}}} , {\text{v}}_{{0{\text{i}}}} } \right): \) the coordinates of the principal point.
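
As a minimal illustration of Eq. (1), the intrinsic matrix can be assembled as follows in Python with NumPy; the numeric values are hypothetical, not calibration results from the paper:

```python
import numpy as np

def intrinsic_matrix(f, eps, s, u0, v0):
    # Intrinsic matrix K_i of Eq. (1).
    return np.array([[f,   s,       u0],
                     [0.0, eps * f, v0],
                     [0.0, 0.0,     1.0]])

# Hypothetical values: square pixels (eps = 1), zero skew (s = 0),
# principal point at the center of a 512 x 512 image.
K = intrinsic_matrix(f=1000.0, eps=1.0, s=0.0, u0=256.0, v0=256.0)
```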

4 Camera Autocalibration

Autocalibration [1,2,3,4,5,6,7,8,9,10] is a technique that allows us to estimate the parameters of the cameras without any prior knowledge of the scene.

4.1 ORB Descriptor: Oriented FAST and Rotated BRIEF

The detection [12,13,14] and matching [15,16,17] of interest points are important steps in the autocalibration and reconstruction of 3D scenes. In this paper we rely on the ORB descriptor (Oriented FAST and Rotated BRIEF) [21], a fast and robust local feature detector first presented by Rublee et al. in 2011 [20], which can be used in computer vision tasks such as object recognition or 3D reconstruction. It is a fusion of the FAST keypoint detector and the BRIEF descriptor with some modifications [9]. To determine the keypoints, ORB first uses FAST, then applies a Harris corner measure to retain the top N points. Since FAST does not compute orientation and is therefore rotation variant, ORB computes the intensity-weighted centroid of the patch with the detected corner at its center; the direction of the vector from the corner point to this centroid gives the orientation, and moments are computed to improve the rotation invariance. The BRIEF descriptor performs poorly under in-plane rotation, so in ORB a rotation matrix is computed from the orientation of the patch and the BRIEF descriptors are steered according to this orientation.
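
To make this step concrete, here is a minimal OpenCV sketch of ORB detection and matching as described above; the file names and feature budget are illustrative assumptions, not the paper's actual settings:

```python
import cv2

# Load the two views in grayscale (file names are illustrative).
img_i = cv2.imread("view_i.png", cv2.IMREAD_GRAYSCALE)
img_j = cv2.imread("view_j.png", cv2.IMREAD_GRAYSCALE)

# ORB = oriented FAST keypoints + steered BRIEF descriptors.
orb = cv2.ORB_create(nfeatures=2000)
kp_i, des_i = orb.detectAndCompute(img_i, None)
kp_j, des_j = orb.detectAndCompute(img_j, None)

# Hamming distance is the natural metric for binary descriptors;
# cross-checking keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_i, des_j), key=lambda m: m.distance)
```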

The ORB descriptor is similar in spirit to BRIEF: it does not use an elaborate sampling pattern such as BRISK [26] or FREAK [27]. However, there are two main differences between ORB and BRIEF:

  1. ORB uses an orientation compensation mechanism, making it rotation invariant.

  2. ORB learns the optimal sampling pairs, whereas BRIEF uses randomly chosen sampling pairs.

ORB uses a simple measure of corner orientation, the intensity centroid [28]. First, the moments of a patch are defined as:

$$ \forall \,{\text{p}},{\text{q}} \in \{ 0,1\} :\quad {\text{m}}_{\text{pq}} = \sum\nolimits_{{\text{x}},{\text{y}}} {{\text{x}}^{\text{p}} {\text{y}}^{\text{q}} \,{\text{I}}({\text{x}},{\text{y}})} $$
(2)

With:

  • \( {\text{p}}, {\text{q}} \in \{ 0,1\} \): binary selectors for the x and y directions

  • \( {\text{x}}, {\text{y}} \): pixel coordinates ranging over a circular window

  • \( {\text{x}}^{\text{p}} {\text{y}}^{\text{q}} \): the coordinate weighting

  • \( {\text{I}}\left( {{\text{x}},{\text{y}}} \right) \): the image function

Image moments help us to calculate features such as the center of mass of an object, its area, etc.

With these moments we can find the centroid, the “center of mass” of the patch, as:

$$ {\text{C}} = \left( { \frac{{{\text{m}}_{10} }}{{{\text{m}}_{00} }} , \frac{{{\text{m}}_{01} }}{{{\text{m}}_{00} }} } \right) $$
(3)

and by constructing a vector from the patch center O to the centroid C, we can define the relative orientation of the patch as:

$$ \uptheta = {\text{atan}}2\left( {{\text{m}}_{01} ,\; {\text{m}}_{10} } \right) $$
(4)

ORB discretizes the angle into increments of \( \frac{{2\uppi}}{30} \) (12°) and constructs a lookup table of precomputed BRIEF patterns. As long as the keypoint orientation \( \uptheta \) is consistent across views, the correct set of points will be used to compute its descriptor.
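
As a sketch of Eqs. (2)–(4), the orientation of a patch can be computed as follows, assuming a NumPy grayscale patch; the window handling is simplified to a square patch rather than ORB's circular one:

```python
import numpy as np

def patch_orientation(patch):
    # Orientation of a patch by the intensity centroid, Eqs. (2)-(4).
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Coordinates taken relative to the patch center O.
    xs -= (w - 1) / 2.0
    ys -= (h - 1) / 2.0
    m10 = (xs * patch).sum()          # Eq. (2) with p = 1, q = 0
    m01 = (ys * patch).sum()          # Eq. (2) with p = 0, q = 1
    return np.arctan2(m01, m10)       # Eq. (4): theta = atan2(m01, m10)
```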

To conclude, ORB is a binary descriptor similar to BRIEF, with the added advantages of rotation invariance and learned sampling pairs. How does ORB perform in comparison to BRIEF? Under non-geometric transformations (those that are image-capture dependent and do not rely on the viewpoint, such as blur, JPEG compression, exposure, and illumination), BRIEF actually outperforms ORB. Under affine transformations, BRIEF performs poorly for large rotation or scale changes, as it is not designed to handle them. Under perspective transformations, which result from viewpoint changes, BRIEF surprisingly still slightly outperforms ORB.

4.2 The Projection Matrix

We consider \( {\text{S}}_{1} \) and \( {\text{S}}_{2} \), two points of the 3D scene, and π the plane that contains these two points.

\( {\text{R}}_{\text{e}} \left( {{\text{O X}}_{\text{e}} {\text{Y}}_{\text{e}} {\text{Z}}_{\text{e}} } \right) \) is the Euclidean reference frame associated with the triangle of center O and side b.

The coordinates of the points \( {\text{S}}_{1} \), \( {\text{S}}_{2} \) and \( {\text{S}}_{3} \) (Fig. 3) are given below:

$$ {\text{S}}_{1} = \left( {\frac{\text{b}}{2},\frac{\surd 3}{2}{\text{b}},1} \right)^{\text{T}} $$
$$ {\text{S}}_{2} = \left( {{\text{b}},0,1} \right)^{\text{T}} $$
$$ {\text{S}}_{3} = \left( {0,1,1} \right)^{\text{T}} $$
Fig. 3. Representation of points \( {\text{S}}_{1} \), \( {\text{S}}_{2} \) and \( {\text{S}}_{3} \) in the two images i and j.

We consider the two homographies \( {\text{H}}_{\text{i}} \) and \( {\text{H}}_{\text{j}} \) that project the plane π into the images i and j, so the projections of the two points can be represented by the following expressions:

$$ {\text{s}}_{\text{im}} { \sim }{\text{H}}_{\text{i}} {{\rm S}}_{\text{m}} $$
(5)
$$ {\text{s}}_{\text{jm}} { \sim }{\text{H}}_{\text{j}} {{\rm S}}_{\text{m}} $$
(6)

With \( {\text{m}} = 1,2 \); \( {\text{s}}_{\text{im}} \) and \( {\text{s}}_{\text{jm}} \) represent respectively the points in the images \( {\text{i}} \) and \( {\text{j}} \) that are the projections of the two vertices \( {\text{S}}_{1} \) and \( {\text{S}}_{2} \) of the 3D scene, and \( {\text{H}}_{\text{n}} \) represents the homography matrix defined by:

$$ {\text{H}}_{\text{n}} = {\text{K}}_{\text{n}} {\text{R}}_{\text{n}} \left( {\begin{array}{*{20}c} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ \end{array} \quad {\text{R}}_{\text{n}}^{\text{T}} {\text{t}}_{\text{n}} } \right);\; {\text{n}} = {\text{i}}, {\text{j}} $$
(7)
With:

  • \( {\text{R}}_{\text{n}} : \) the rotation matrix

  • \( {\text{t}}_{\text{n}} : \) the translation vector

  • \( {\text{K}}_{\text{n}} : \) The matrix of intrinsic parameters.

The expressions (5) and (6) can be written as:

$$ {\text{s}}_{\text{im}} { \sim }{\text{H}}_{\text{i}} {{\rm BS'}}_{\text{m}} $$
(8)
$$ {\text{s}}_{\text{jm}} { \sim }{\text{H}}_{\text{j}} {{\rm BS'}}_{\text{m}} $$
(9)

With: \( {\text{B}} = \left( {\begin{array}{*{20}c} {\text{b}} & {\frac{\text{b}}{2}} & 0 \\ 0 & {\frac{\surd 3}{2}{\text{b}}} & 0 \\ 0 & 0 & 1 \\ \end{array} } \right) \)

$$ {\text{S'}}_{\text{m}} = \left( {\begin{array}{*{20}c} {\text{a}} \\ {\text{b}} \\ 1 \\ \end{array} } \right) $$

For:

$$ \left\{ {\begin{array}{*{20}c} {{\text{m}} = 1 \Leftrightarrow {\text{a}} = 0 \;{\text{and}}\; {\text{b}} = 1} \\ {{\text{m}} = 2 \Leftrightarrow {\text{a}} = 1 \;{\text{and}}\; {\text{b}} = 0} \\ \end{array} } \right. $$

We set:

$$ {\text{P}}_{\text{n}} { \sim }{\text{H}}_{\text{n}} {{\rm B }}; \,{\text{n}} = {\text{i}}, {\text{j }} $$
(10)

where \( {\text{P}}_{\text{i}} \) and \( {\text{P}}_{\text{j}} \) are the projection matrices of the two points \( {\text{S}}_{1}^{ '} \) and \( {\text{S}}_{2}^{ '} \) in the images i and j (Figs. 3 and 6).

From Eq. (10) we have:

$$ {\text{P}}_{\text{j}} { \sim } {\text{H}}_{\text{ij}} {{\rm P}}_{\text{i}} $$
(11)

With:

$$ {\text{H}}_{\text{ij}} { \sim }{\text{H}}_{\text{j}} {{\rm H}}_{\text{i}}^{ - 1} $$
(12)

\( {\text{H}}_{\text{ij}} \) is the homography between the images \( {\text{i }} \) and \( {\text{j}} \).

Equations (8), (9) and (10) give:

$$ {\text{s}}_{\text{im}} { \sim }{\text{P}}_{\text{i}} {{\rm S}}_{\text{m}}^{ '} $$
(13)
$$ {\text{s}}_{\text{jm}} { \sim }{\text{P}}_{\text{j}} {{\rm S}}_{\text{m}}^{ '} $$
(14)

From Eqs. (11) and (14) we have:

$$ {\text{s}}_{\text{jm}} { \sim }{\text{H}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{\text{m }}^{ '} $$
(15)

Multiplying Eq. (15) by \( {\text{e}}_{\text{j}}^{ '} \) gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{s}}_{\text{jm}} { \sim }{\text{e}}_{\text{j}}^{ '} {\text{H}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{\text{m}}^{ '} $$
(16)

Since \( {\text{e}}_{\text{j}}^{ '} {\text{H}}_{\text{ij}} = {\text{F}}_{\text{ij}} \), the latter gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{s}}_{\text{jm}} { \sim }{\text{F}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{\text{m}}^{ '} $$
(17)

where \( {\text{F}}_{\text{ij}} \) is the fundamental matrix between the images i and j, and:

$$ {\text{e}}_{\text{j}}^{ '} = \left( {\begin{array}{*{20}c} 0 & { - {\text{e}}_{{{\text{j}}3}} } & {{\text{e}}_{{{\text{j}}2}} } \\ {{\text{e}}_{{{\text{j}}3}} } & 0 & { - {\text{e}}_{{{\text{j}}1}} } \\ { - {\text{e}}_{{{\text{j}}2}} } & {{\text{e}}_{{{\text{j}}1}} } & 0 \\ \end{array} } \right) $$

\( \left( {{\text{e}}_{{{\text{j}}1}} \;{\text{e}}_{{{\text{j}}2}} \;{\text{e}}_{{{\text{j}}3}} } \right)^{\text{T}} \) are the coordinates of the epipole of the right image; this epipole can be estimated from the fundamental matrix.

The expression (13) gives:

$$ {\text{s}}_{{{\text{i}}1}} \sim{\text{P}}_{\text{i}} {{\rm S}}_{1}^{ '} $$
(18)
$$ {\text{s}}_{{{\text{i}}2}} \sim{\text{P}}_{\text{i}} {{\rm S}}_{2}^{ '} $$
(19)

From these two last relationships, we get four equations in eight unknowns, the elements of \( {\text{P}}_{\text{i}} \).

The expression (17) gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{s}}_{{{\text{j}}1}} \sim{\text{F}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{1}^{ '} $$
(20)
$$ {\text{e}}_{\text{j}}^{ '} {\text{s}}_{{{\text{j}}2}} \sim{\text{F}}_{\text{ij}} {{\rm P}}_{\text{i}} {{\rm S}}_{2}^{ '} $$
(21)

From these two last relationships, we get four more equations in the same eight unknowns, the parameters of \( {\text{P}}_{\text{i}} \). We can therefore estimate the parameters of \( {\text{P}}_{\text{i}} \), since we have a total of eight equations in the eight unknown elements of \( {\text{P}}_{\text{i}} \).

Multiplying Eq. (11) by \( {\text{e}}_{\text{j}}^{ '} \) gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{P}}_{\text{j}} { \sim }{\text{e}}_{\text{j}}^{ '} {{\rm H}}_{\text{ij}} {{\rm P}}_{\text{i}} $$
(22)

That gives:

$$ {\text{e}}_{\text{j}}^{ '} {\text{P}}_{\text{j}} { \sim }{\text{F}}_{\text{ij}} {{\rm P}}_{\text{i}} $$
(23)

The previous expression gives eight equations in eight unknowns, the elements of \( {\text{P}}_{\text{j}} \).

So we can estimate the parameters of \( {\text{P}}_{\text{j}} \) from these eight equations.
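
The eight constraints of Eqs. (18)–(21) are linear in the nine homogeneous elements of \( {\text{P}}_{\text{i}} \). As an illustration of how such a system could be assembled and solved, the following NumPy sketch stacks the cross-product form of each proportionality and takes the smallest right singular vector; this is one standard way to solve homogeneous linear systems, not necessarily the paper's exact solver:

```python
import numpy as np

def cross_mat(v):
    # Skew-symmetric [v]_x such that cross_mat(v) @ u = v x u.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rows_for(w, M, Sp):
    # The proportionality  w ~ M @ P @ Sp  is equivalent to
    # [w]_x @ M @ P @ Sp = 0.  Using vec(P @ Sp) = (Sp^T kron I3) vec(P)
    # (column-major vec), this is linear in the 9 elements of P.
    return cross_mat(w) @ M @ np.kron(Sp.reshape(1, 3), np.eye(3))

def estimate_Pi(s_i, s_j, e_prime_j, F_ij, S_prime):
    # Eqs. (18)-(19): s_im ~ P_i S'_m ; Eqs. (20)-(21):
    # e'_j s_jm ~ F_ij P_i S'_m.  Each proportionality contributes
    # two independent rows; eight constraints fix P_i up to scale.
    A = []
    for m in range(2):
        A.append(rows_for(s_i[m], np.eye(3), S_prime[m]))
        A.append(rows_for(e_prime_j @ s_j[m], F_ij, S_prime[m]))
    A = np.vstack(A)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3, order="F")  # smallest right singular vector
```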

4.3 Autocalibration Equations

In this part, we determine the relationship between the images of the absolute conic (\( \upomega_{\text{i}} \) and \( \upomega_{\text{j}} \)), and a relationship between the two points \( \left( {{\text{S}}_{1} , {\text{S}}_{2} } \right) \) of the 3D scene and their projections \( \left( {{\text{s}}_{{{\text{i}}1}} , {\text{s}}_{{{\text{i}}2}} } \right) \) and \( \left( {{\text{s}}_{{{\text{j}}1}} , {\text{s}}_{{{\text{j}}2}} } \right) \) in the planes of the left and right images respectively. These relationships are established using techniques of projective geometry. A nonlinear cost function is then defined from these relationships and minimized by the Levenberg-Marquardt algorithm [18] to estimate \( \upomega_{\text{i}} \) and \( \upomega_{\text{j}} \), and finally the intrinsic parameters of the cameras used [24].

Equation (13) gives:

$$ \uplambda_{\text{im}} {{\rm s}}_{\text{im}} = {\text{P}}_{\text{i}} {{\rm S}}_{\text{m}}^{ '} $$
(24)

With: \( {\text{P}}_{\text{i}} = \left( {\begin{array}{*{20}c} {{\text{P}}_{11} } & {{\text{P}}_{12} } & {{\text{P}}_{13} } \\ {{\text{P}}_{21} } & {{\text{P}}_{22} } & {{\text{P}}_{23} } \\ {{\text{P}}_{31} } & {{\text{P}}_{32} } & {{\text{P}}_{33} } \\ \end{array} } \right) \)

$$ {\text{s}}_{\text{im}} = \left( {\begin{array}{*{20}c} {{\text{x}}_{\text{im}} } \\ {{\text{y}}_{\text{im}} } \\ 1 \\ \end{array} } \right) $$
From Eqs. (7) and (10) we obtain:

$$ {\text{P}}_{\text{i}}^{\text{T}}\upomega_{\text{i}} {\text{P}}_{\text{i}} \sim\left( {\begin{array}{*{20}c} {{\text{B}}^{{{\prime }{\text{T}}}} {\text{B}}^{{\prime }} } & {{\text{B}}^{{{\prime }{\text{T}}}} {\text{R}}_{\text{i}}^{\text{T}} {\text{t}}_{\text{i}} } \\ {{\text{t}}_{\text{i}}^{\text{T}} {\text{R}}_{\text{i}} {\text{B}}^{{\prime }} } & {{\text{t}}_{\text{i}}^{\text{T}} {\text{t}}_{\text{i}} } \\ \end{array} } \right) $$
(25)

With:

$$ {\text{B}}^{\prime } = \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} {\text{b}} & {\frac{\text{b}}{2}} \\ 0 & {\frac{\surd 3}{2}{\text{b}}} \\ 0 & 0 \\ \end{array} } \\ \end{array} } \right) $$
(26)

\( K_{i} \) is an upper-triangular matrix normalized such that \( \det {\text{K}}_{\text{i}} = 1 \).

\( \upomega_{\text{i}} = \left( {{\text{K}}_{\text{i}} {{\rm K}}_{\text{i}}^{\text{T}} } \right)^{ - 1} \) is the image of the absolute conic.

Similarly, for \( {\text{P}}_{\text{j}} \):

$$ {\text{P}}_{\text{j}}^{\text{T}}\upomega_{\text{j}} {{\rm P}}_{\text{j}} \sim \left( {\begin{array}{*{20}c} {{\text{B}}^{{{\prime }{\text{T}}}} {{\rm B}}^{{\prime }} } & {{\text{B}}^{{{\prime }{\text{T}}}} {{\rm R}}_{\text{j}}^{\text{T}} {{\rm t}}_{\text{j}} } \\ {{\text{t}}_{\text{j}}^{\text{T}} {{\rm R}}_{\text{j}} {{\rm B}}^{{\prime }} } & {{\text{t}}_{\text{j}}^{\text{T}} {{\rm t}}_{\text{j}} } \\ \end{array} } \right) $$
(27)

We can deduce that the first two rows and columns of the matrices \( {\text{P}}_{\text{i}}^{\text{T}}\upomega_{\text{i}} {\text{P}}_{\text{i}} \) and \( {\text{P}}_{\text{j}}^{\text{T}}\upomega_{\text{j}} {\text{P}}_{\text{j}} \) are identical up to a scale factor, since both equal the common block \( {\text{B}}^{{{\prime }{\text{T}}}} {\text{B}}^{{\prime }} \).

We denote by \( {\text{X}}_{\text{i}} \) and \( {\text{X}}_{\text{j}} \) the two matrices corresponding respectively to the first two rows and columns of the two previous matrices.

\( {\text{X}}_{\text{m}} = \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} {{\text{x}}_{{1{\text{m}}}} } & {{\text{x}}_{{3{\text{m}}}} } \\ {{\text{x}}_{{3{\text{m}}}} } & {{\text{x}}_{{2{\text{m}}}} } \\ \end{array} } \\ \end{array} } \right) \), with \( {\text{m}} = {\text{i}}, {\text{j}}. \)

We thus obtain the three following equations:

$$ \left\{ {\begin{array}{*{20}c} {{\text{x}}_{{1{\text{i}}}} = {\text{x}}_{{2{\text{i}}}} } \\ {{\text{x}}_{{1{\text{j}}}} = {\text{x}}_{{2{\text{j}}}} } \\ {{\text{x}}_{{1{\text{i}}}} {\text{x}}_{{3{\text{j}}}} = {\text{x}}_{{1{\text{j}}}} {\text{x}}_{{3{\text{i}}}} } \\ \end{array} } \right. $$
(28)

Each image pair gives a system of 3 equations with 8 unknowns (4 unknowns for \( \upomega_{\text{i}} \) and 4 unknowns for \( \upomega_{\text{j}} \)); to solve the equation system (28), we therefore need at least 4 images.

The equation system (28) is nonlinear; to solve it, we minimize the following nonlinear cost function:

$$ { \hbox{min} }_{{\upomega_{\text{k}} }} \sum\nolimits_{{{\text{j}} = {\text{i}} + 1}}^{\text{n}} {\sum\nolimits_{{{\text{i}} = 1}}^{{{\text{n}} - 1}} {\left( {\upalpha_{\text{ij}}^{2} +\upbeta_{\text{ij}}^{2} +\upgamma_{\text{ij}}^{2} } \right)} } $$
(29)

With: \( \upalpha_{\text{ij}} = {\text{x}}_{{1{\text{i}}}} - {\text{x}}_{{2{\text{i}}}} ;\;\upbeta_{\text{ij}} = {\text{x}}_{{1{\text{j}}}} - {\text{x}}_{{2{\text{j}}}} ;\;\upgamma_{\text{ij}} = {\text{x}}_{{1{\text{i}}}} {\text{x}}_{{3{\text{j}}}} - {\text{x}}_{{1{\text{j}}}} {\text{x}}_{{3{\text{i}}}} \), where n is the number of images.

Equation (29) is minimized by the Levenberg-Marquardt algorithm [18], which requires an initialization step. The camera parameters are initialized as follows:

Pixels are square, so \( \upvarepsilon_{\text{i}} =\upvarepsilon_{\text{j}} = 1 \) and \( {\text{s}}_{\text{i}} = {\text{s}}_{\text{j}} = 0 \).

The principal point is at the center of the image, so \( {\text{x}}_{{0{\text{i}}}} = {\text{y}}_{{0{\text{i}}}} = {\text{x}}_{{0{\text{j}}}} = {\text{y}}_{{0{\text{j}}}} = 256 \) (the images used are of size 512 × 512). The focal lengths \( {\text{f}}_{\text{i}} \) and \( {\text{f}}_{\text{j}} \) are then obtained by minimizing the cost function (29).
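
As an illustrative sketch of this minimization using SciPy's Levenberg-Marquardt implementation (the parameterization below, one focal length per image with the principal point fixed at (256, 256), matches the initialization above but is otherwise an assumption):

```python
import numpy as np
from scipy.optimize import least_squares

def omega(f, u0, v0):
    # Image of the absolute conic, omega = (K K^T)^(-1), assuming
    # square pixels (eps = 1) and zero skew (s = 0).
    K = np.array([[f, 0.0, u0],
                  [0.0, f, v0],
                  [0.0, 0.0, 1.0]])
    return np.linalg.inv(K @ K.T)

def residuals(focals, P):
    # Residuals (alpha_ij, beta_ij, gamma_ij) of Eqs. (28)-(29) over
    # all image pairs; P[i] is the 3x3 projection matrix of image i
    # estimated in Sect. 4.2.
    res = []
    n = len(P)
    for i in range(n - 1):
        for j in range(i + 1, n):
            Xi = P[i].T @ omega(focals[i], 256.0, 256.0) @ P[i]
            Xj = P[j].T @ omega(focals[j], 256.0, 256.0) @ P[j]
            res += [Xi[0, 0] - Xi[1, 1],                        # alpha_ij
                    Xj[0, 0] - Xj[1, 1],                        # beta_ij
                    Xi[0, 0] * Xj[0, 1] - Xj[0, 0] * Xi[0, 1]]  # gamma_ij
    return np.array(res)

# Levenberg-Marquardt minimization; the focal lengths are initialized
# with a plausible guess such as the image width:
# sol = least_squares(residuals, np.full(len(P), 512.0), args=(P,), method="lm")
```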

4.4 General Algorithm

(Figure: general algorithm of the proposed method.)

5 Reconstruction of the 3D Scene

This part is dedicated to the 3D reconstruction: determining a cloud of 3D points from the matches between the pairs of images [19, 22, 23, 25]. In theory, recovering the position of a 3D point from its projections in the images is straightforward: a pair of matched 2D points must be the projections of the same 3D point in the two images.

This reconstruction is possible when the geometric relationship between the cameras is known and when the projection of the same point is measured in the images.

The reconstruction of points of the 3D scene requires the estimation of the projection matrices of this scene in the different images.

We have \( {\text{P}}_{0} \) and \( {\text{P}}_{1} \), the two projection matrices of the 3D scene in the planes of the two images, such that:

$$ {\text{s}}_{{0{\text{m}}}} { \sim }{\text{P}}_{0} {\text{S}}_{\text{m}} $$
(30)
$$ {\text{s}}_{{1{\text{m}}}} { \sim }{\text{P}}_{1} {\text{S}}_{\text{m}} $$

We have \( {\text{P}}{ \sim }{\text{K}}\left( {{\text{R}}\;{\text{t}}} \right) \), so:

$$ {\text{P}}_{0} { \sim }{\text{K}}_{0} \left( {{\text{I}}_{3} \;\;0} \right) $$
(31)
$$ {\text{P}}_{1} { \sim }{\text{K}}_{1} \left( {{\text{R}}_{1} {\text{t}}_{1} } \right) $$

The essential matrix [29] is the specialization of the fundamental matrix to the case of normalized image coordinates. Historically, the essential matrix was introduced (by Longuet-Higgins) before the fundamental matrix, and the fundamental matrix may be thought of as the generalization of the essential matrix in which the (inessential) assumption of calibrated cameras is removed. The essential matrix has fewer degrees of freedom, and additional properties, compared to the fundamental matrix.

The defining equation for the essential matrix is:

$$ \widehat{X}_{1}^{T} E \widehat{X}_{0} = 0 $$

where \( \widehat{X} = K^{ - 1} X \) denotes normalized image coordinates, for corresponding points \( X_{0} \leftrightarrow X_{1} \).

Substituting for \( \widehat{X}_{0} \) and \( \widehat{X}_{1} \) gives \( X_{1}^{T} K_{1}^{ - T} E K_{0}^{ - 1} X_{0} = 0 \). Comparing this with the relation \( X_{1}^{T} F_{12} X_{0} = 0 \) for the fundamental matrix, it follows that the relationship between the fundamental and essential matrices is

$$ {\text{E}}_{12} = {\text{K}}_{1}^{\text{T}} {{\rm F}}_{12} {\text{K}}_{0} $$
(32)

where \( {\text{F}}_{12} \) represents the fundamental matrix between the first and second images; it is estimated from 8 matches between this pair of images by the RANSAC algorithm [11].
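
A minimal sketch of this step with OpenCV; the RANSAC threshold and confidence values are illustrative assumptions:

```python
import cv2
import numpy as np

def essential_from_matches(pts_i, pts_j, K0, K1):
    # pts_i, pts_j: (N, 2) arrays of matched keypoint coordinates.
    # Fundamental matrix by RANSAC over 8-point solutions [11].
    F, inliers = cv2.findFundamentalMat(pts_i, pts_j, cv2.FM_RANSAC,
                                        ransacReprojThreshold=1.0,
                                        confidence=0.99)
    # Eq. (32): E_12 = K_1^T F_12 K_0.
    E = K1.T @ F @ K0
    return E, F, inliers
```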

\( {\text{E}}_{12} \) is decomposed by singular value decomposition in the following equation:

$$ {\text{E}}_{12} =\uplambda {\text{L}}_{1} {\text{U}}\left( {\begin{array}{*{20}c} 1 & 1 & 0 \\ \end{array} } \right){\text{L}}_{2}^{\text{T}} $$
(33)

where λ is a non-zero scalar and \( {\text{U}}\left( {\begin{array}{*{20}c} 1 & 1 & 0 \\ \end{array} } \right) \) denotes the diagonal matrix \( {\text{diag}}\left( {1, 1, 0} \right) \), which can be written in the following form:

$$ {\text{U}}\left( {\begin{array}{*{20}c} 1 & 1 & 0 \\ \end{array} } \right) = {\text{N}}_{1} {\text{N}}_{2} = - {\text{N}}_{1} {\text{N}}_{2}^{\text{T}} $$
(34)
$$ {\text{N}}_{1} = \left( {\begin{array}{*{20}c} 0 & 1 & 0 \\ { - 1} & 0 & 0 \\ 0 & 0 & 0 \\ \end{array} } \right),\, {\text{N}}_{2} = \left( {\begin{array}{*{20}c} 0 & { - 1} & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ \end{array} } \right) $$
(35)

From (33) and (34), we have:

$$ {\text{E}}_{12} =\uplambda {\text{L}}_{1} {\text{N}}_{1} {\text{N}}_{2} {\text{L}}_{2}^{\text{T}} = -\uplambda {\text{L}}_{1} {\text{N}}_{1} {\text{N}}_{2}^{\text{T}} {\text{L}}_{2}^{\text{T}} $$
(36)

\( {\text{L}}_{1} \) is orthogonal, so the matrix \( {\text{E}}_{12} \) can be written in the following form:

$$ {\text{E}}_{12} { \sim } {\text{L}}_{1} {\text{N}}_{1} {\text{L}}_{1}^{\text{T}} \left( { \mp {\text{L}}_{1} {\text{N}}_{2} {\text{L}}_{2}^{\text{T}} } \right){ \sim } - {\text{L}}_{1} {\text{N}}_{1} {\text{L}}_{1}^{\text{T}} \left( { \mp {\text{L}}_{1} {\text{N}}_{2}^{\text{T}} {\text{L}}_{2}^{\text{T}} } \right) $$
(37)

On the other hand, \( {\text{E}}_{12} \) is expressed as follows:

$$ {\text{E}}_{12} { \sim }\left[ {{\text{t}}_{1} } \right]_{{ \wedge }} {\text{R}}_{1} $$
(38)
$$ \left[ {{\text{t}}_{1} } \right]_{{ \wedge }} = \left[ {\begin{array}{*{20}c} 0 & { - {\text{t}}_{13} } & {{\text{t}}_{12} } \\ {{\text{t}}_{13} } & 0 & { - {\text{t}}_{11} } \\ { - {\text{t}}_{12} } & {{\text{t}}_{11} } & 0 \\ \end{array} } \right] $$
(39)

where \( \left( {{\text{t}}_{11}\; {\text{t}}_{12}\; {\text{t}}_{13} } \right)^{\text{T}} \) are the coordinates of the translation vector \( {\text{t}}_{1} \).

From the two last expressions, we conclude that the vector \( {\text{t}}_{1} \) admits a unique solution (up to scale):

$$ \left[ {{\text{t}}_{1} } \right]_{{ \wedge }} { \sim }{\text{L}}_{1} {\text{N}}_{1} {\text{L}}_{1}^{\text{T}} $$
(40)

The rotation matrix \( {\text{R}}_{1} \) admits 4 solutions:

$$ {\text{R}}_{1} { \sim } \mp {\text{L}}_{1} {\text{N}}_{2} {\text{L}}_{2 }^{\text{T}}\, {\text{ or R}}_{1} { \sim } \mp {\text{L}}_{1} {\text{N}}_{2}^{\text{T}} {{\rm L}}_{2 }^{\text{T }} $$
(41)

But the determinant of a rotation matrix must be equal to 1, which fixes the sign of each of the two matrices:

$$ \mp {\text{L}}_{1} {\text{N}}_{2} {\text{L}}_{2 }^{\text{T}} \;{\text{and }} \mp {\text{L}}_{1} {\text{N}}_{2}^{\text{T}} {{\rm L}}_{2 }^{\text{T}} $$

So the number of solutions for \( {\text{R}}_{1} \) is reduced to 2.

We use the two solutions to reconstruct the 3D scene, and we finally choose the one that gives the best Euclidean reconstruction.
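
A NumPy sketch of this decomposition, following Eqs. (33)–(41); the selection among candidate poses mentioned above is only indicated in a comment:

```python
import numpy as np

def decompose_essential(E):
    # Factor E ~ [t]_x R via SVD, Eqs. (33)-(41).
    L1, _, L2t = np.linalg.svd(E)
    N2 = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
    # [t]_x ~ L1 N1 L1^T (Eq. (40)); its null direction is the
    # third column of L1, so t is unique up to scale and sign.
    t = L1[:, 2]
    R_candidates = []
    for R in (L1 @ N2 @ L2t, L1 @ N2.T @ L2t):   # Eq. (41)
        if np.linalg.det(R) < 0:                  # enforce det R = +1
            R = -R
        R_candidates.append(R)
    return t, R_candidates

# Each (R, +/-t) pair would then be tested by triangulating a few
# matches and keeping the solution that reconstructs points in
# front of both cameras.
```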

From the Eq. (30), we obtain the following linear system of equations:

$$ {\text{M}}\left( {\text{X Y Z}} \right)^{\text{T}} = {\text{N}} $$
(42)
  • \( {\text{M}} \): matrix of size 4 × 3

  • \( {\text{N}} \): vector of size 4

M and N are expressed in terms of the elements of the projection matrices and the coordinates of the matches.

\( \left( {\text{X Y Z}} \right)^{\text{T}} \): The vector of the coordinates of the searched 3D point.

The coordinates of the 3D point (the solution of Eq. (42)) are obtained by the following expression, valid since \( \det \left( {{\text{M}}^{\text{T}} {\text{M}}} \right) \ne 0 \), i.e., \( {\text{M}}^{\text{T}} {\text{M}} \) is non-singular:
$$ \left( {\text{X Y Z}} \right)^{\text{T}} = \left( {{\text{M}}^{\text{T}} {{\rm M}}} \right)^{ - 1} {\text{M}}^{\text{T}} {{\rm N}} $$
(43)
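
As a sketch of Eqs. (42)–(43) for one match, assuming 3 × 4 projection matrices and pixel coordinates; np.linalg.lstsq computes the same least-squares solution as the closed form of Eq. (43):

```python
import numpy as np

def triangulate(P0, P1, x0, x1):
    # Linear triangulation, Eqs. (42)-(43).  P0, P1: 3x4 projection
    # matrices; x0, x1: matched pixel coordinates (u, v).  Each image
    # contributes two rows of M (X Y Z)^T = N.
    rows, rhs = [], []
    for P, (u, v) in ((P0, x0), (P1, x1)):
        rows.append(u * P[2, :3] - P[0, :3]); rhs.append(P[0, 3] - u * P[2, 3])
        rows.append(v * P[2, :3] - P[1, :3]); rhs.append(P[1, 3] - v * P[2, 3])
    M, N = np.array(rows), np.array(rhs)
    # Least-squares solution (X Y Z)^T = (M^T M)^(-1) M^T N.
    return np.linalg.lstsq(M, N, rcond=None)[0]
```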

6 Experiments

In this part, we take two images of an unknown three-dimensional scene with a CCD camera characterized by variable intrinsic parameters (Fig. 4). In the first step, we apply the ORB descriptor to detect the interest points (Fig. 5) and to match them between the two selected images (Fig. 6). Subsequently, after implementing the RANSAC and Levenberg-Marquardt algorithms in the Python programming language, we obtain the 3D reconstruction shown below (Fig. 7):

Fig. 4. Two images of an unknown 3D scene

Fig. 5. The interest points in the two images (blue color) (Color figure online)

Fig. 6. The matches between the two images

Fig. 7. The reconstructed 3D scene

The detection of interest points (Fig. 5) and the matching (Fig. 6) are carried out by the ORB descriptor [20]. The determination of the relationships between the matches and the camera parameters permits the formulation of a system of nonlinear equations, which is embedded in a nonlinear cost function. The minimization of this function by the Levenberg-Marquardt algorithm [18] gives an optimal solution for the camera parameters. These parameters are used with the matches to obtain an initial point cloud (Fig. 7).

Many values must be estimated: the intrinsic camera parameters (focal lengths, coordinates of the principal points, scale factors, skew factors) and the rotation matrices. These parameters are chosen so that each one belongs to a specific interval (Table 1).

Table 1. Intervals of camera parameters

The usefulness of our contribution is that a 3D scene is reconstructed from only two images taken by an uncalibrated camera with variable intrinsic parameters. The next step will be 3D modeling, in order to finalize our work and obtain robust results and a well-reconstructed 3D scene based on triangulation and texture mapping.

7 Conclusion

In this work we have presented a new approach to the reconstruction of three-dimensional scenes from a method of autocalibration of cameras characterized by variable intrinsic parameters. The interest points are detected and matched by the ORB descriptor; the matches are then used with the projection matrices (expressed in terms of the camera parameters) of the scene in the image planes to determine the coordinates of the point cloud, from which the scene is reconstructed.