Abstract
A new approach to binocular vision system calibration and 3D re-construction is proposed. For calibration, the objective function is the sum of squared distances to a fitted hyperplane from the vectors formed by combining the coordinates of feature points in the world frame with those in the image frames. An orthogonal learning neural network is designed, in which a self-adaptive minor-component extraction method is adopted. When the network reaches equilibrium, the projective matrices of the two cameras are obtained from the eigenvectors of the autocorrelation matrix corresponding to the minimum eigenvalues, and the calibration of the binocular vision system is thus achieved. For 3D re-construction, an autocorrelation matrix is built from the feature-point coordinates in the image planes and the calibration data, and another orthogonal learning network is designed. After this network is trained, the eigenvector of the autocorrelation matrix corresponding to the minimum eigenvalue is obtained, from which the 3D coordinates are recovered. The proposed approach is a novel application of minor component analysis and orthogonal learning networks to binocular vision system calibration and 3D re-construction.
1 Introduction
Camera calibration is the process of determining the internal camera geometric and optical characteristics (intrinsic parameters) and/or the 3D position and orientation of the camera frame relative to a certain world frame (extrinsic parameters) [6, 12]. Applications of vision systems include 3D sensing and measurement, precision manufacturing, automated assembly, monitoring and tracking, etc. Once a binocular vision system is calibrated, the scene's 3D geometric information can be reconstructed from two digital images taken from different angles. There are many research reports on camera calibration. For example, MA obtained the intrinsic parameters of the camera by designing two sets of three pure orthogonal translation movements, and the orientations of the camera with respect to the hand frame with a set of pairwise orthogonal translation movements [11]. ZHANG obtained the intrinsic and extrinsic parameters via the homography matrix in the light of the orthogonality of the rotation matrix, with the homography computed from the 3D feature coordinates on a target block and their 2D image coordinates; in general, however, the calculated rotation matrix did not satisfy the orthogonality properties well [19]. Rahman and Krouglicof proposed a quaternion representation of spatial orientation, which resulted in a system of equations that was minimally redundant and free of singularities, and applied a technique to minimize the error between the reconstructed image points and their experimentally determined counterparts in the "distortion free" space, so the technique facilitated the incorporation of an exact lens distortion model rather than relying on an approximate one [15].
Recently, we proposed a method for camera calibration with an adaptive principal-component extraction network, in which the sum of squared distances from the combined coordinate vectors of the feature points to a fitted hyperplane is taken as the objective function, and the eigenvector of the autocorrelation matrix corresponding to the minimal eigenvalue is taken as the projective matrix; however, the calibration of a binocular vision system and 3D re-construction were not addressed [5]. Chen presented a novel method to analyze the blur distribution in an image and find the optimal focusing distance, so that additional constraints could be used to generate absolute measurements of the models [4]. ZHENG proposed a minimum calibration condition consisting of two vanishing points and a vanishing line to estimate the camera intrinsic parameters (including the principal point coordinates) and rotation angles, adopting least-squares optimization instead of closed-form computation; the method is practical and suitable for many traffic scenes in which a roadside camera must be calibrated [21]. YIN presented a semi-automatic scene calibration method that combined tracked blobs with user-selected line features to recover the homographies between camera views, so the system could map a network of cameras with overlapping fields of view into a single ground-plane view, even when the overlap was not substantial [17]. To our knowledge, no similar work has addressed binocular vision system calibration and 3D re-construction by means of minor component analysis and an adaptive orthogonal learning network. Therefore, based on our previous work on neuro-calibration techniques, in this study we put forward a novel method in which a self-adaptive orthogonal learning network is used to calibrate the binocular vision system and perform 3D measurement.
2 Model of the binocular vision system
In the binocular vision system shown in Fig. 1, the camera frames are \(C_1\) and \(C_2\); \(o_1u_1v_1\) and \(o_2u_2v_2\) are the image coordinate systems measured in pixels; and \(O_wX_wY_wZ_w\) is the world frame measured in mm. The homogeneous coordinates of a feature point \(P\) in the world frame are \(\left(X_{wi}, Y_{wi}, Z_{wi}, 1\right)\); it is projected into the image planes to give \(p_1\) and \(p_2\), whose homogeneous coordinates are \(\left(u_{1i}, v_{1i}, 1\right)\) and \(\left(u_{2i}, v_{2i}, 1\right)\) respectively. The projective matrices of the left and right cameras are \(\mathbf{M}_1\) and \(\mathbf{M}_2\) respectively, and the transformation relations between \(o_1u_1v_1\) or \(o_2u_2v_2\) and \(O_wX_wY_wZ_w\) can be described as follows:
where \( {m}_{11}^{(1)}, \cdots, {m}_{34}^{(1)}, {m}_{11}^{(2)}, \cdots, {m}_{34}^{(2)} \) are the elements of the projection matrices of the left and right cameras.
If \( {Z}_{ci}^{(1)} \) and \( {Z}_{ci}^{(2)} \) in Eqs. (1) and (2) are cancelled respectively, then we can obtain
At the same time, Eqs. (3), (4), (5) and (6) can be divided by \(-u_{1i}\), \(-v_{1i}\), \(-u_{2i}\) and \(-v_{2i}\) respectively, which does not change the transformation relation between the two sides of Eqs. (3)–(6). When the binocular vision system is calibrated, if the coordinates of \(n\) feature points in the world frame and in the image frames are obtained, a linear equation can be formed according to Eqs. (3) and (4), that is
where \( {\mathbf{A}}_1=\left[\begin{array}{cccccccccccc} -{X}_{w1}/{u}_{11} & -{Y}_{w1}/{u}_{11} & -{Z}_{w1}/{u}_{11} & -1/{u}_{11} & 0 & 0 & 0 & 0 & {X}_{w1} & {Y}_{w1} & {Z}_{w1} & 1 \\ 0 & 0 & 0 & 0 & -{X}_{w1}/{v}_{11} & -{Y}_{w1}/{v}_{11} & -{Z}_{w1}/{v}_{11} & -1/{v}_{11} & {X}_{w1} & {Y}_{w1} & {Z}_{w1} & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -{X}_{wn}/{u}_{1n} & -{Y}_{wn}/{u}_{1n} & -{Z}_{wn}/{u}_{1n} & -1/{u}_{1n} & 0 & 0 & 0 & 0 & {X}_{wn} & {Y}_{wn} & {Z}_{wn} & 1 \\ 0 & 0 & 0 & 0 & -{X}_{wn}/{v}_{1n} & -{Y}_{wn}/{v}_{1n} & -{Z}_{wn}/{v}_{1n} & -1/{v}_{1n} & {X}_{wn} & {Y}_{wn} & {Z}_{wn} & 1 \end{array}\right] \), which is a 2n × 12 matrix; \( {\mathbf{n}}_1 \) is a column vector consisting of \( {m}_{11}^{(1)}, \cdots, {m}_{14}^{(1)}, {m}_{21}^{(1)}, \cdots, {m}_{24}^{(1)}, {m}_{31}^{(1)}, \cdots, {m}_{34}^{(1)} \); and \( {\mathbf{0}}^{(1)} \) is a zero vector of dimension 2n.
As for the right camera, a similar equation to Eq. (7) can be obtained from Eqs. (5) and (6), that is
According to Eqs. (7) and (8), we can get an overdetermined equation, that is
where \( \mathbf{A}=\left[\begin{array}{cc} {\mathbf{A}}_1 & {\mathbf{0}}_1 \\ {\mathbf{0}}_2 & {\mathbf{A}}_2 \end{array}\right] \), which is a 4n × 24 matrix; \( \mathbf{n}={\left[{\mathbf{n}}_1, {\mathbf{n}}_2\right]}^{\mathrm{T}} \), a column vector consisting of the elements of \( {\mathbf{n}}_1 \) and \( {\mathbf{n}}_2 \); and \( {\mathbf{0}}_1 \) and \( {\mathbf{0}}_2 \) are 2n × 12 zero matrices.
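For concreteness, the block structure of Eq. (9) can be sketched as follows (a minimal illustration in Python with NumPy; the function name is ours, not from the paper):

```python
import numpy as np

def stack_block_system(A1, A2):
    """Assemble the 4n x 24 coefficient matrix A of Eq. (9) from the
    2n x 12 single-camera matrices A1 and A2 (an illustrative sketch)."""
    Z = np.zeros_like(A1)                # 2n x 12 zero block
    return np.block([[A1, Z], [Z, A2]])
```

With this layout, the left-camera rows constrain only \( {\mathbf{n}}_1 \) and the right-camera rows only \( {\mathbf{n}}_2 \), which is what makes the autocorrelation matrix of Section 3 block diagonal.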
3 Minor component analysis and solving algorithm
When the binocular vision system is calibrated, the 22 elements of the fitting projective matrices except \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) in Eq. (9) are taken as the coefficients of a hyperplane, which constitute the fitting vector \( \mathbf{m}={\left[{m}_{11}^{(1)}, \cdots, {m}_{33}^{(1)}, {m}_{11}^{(2)}, \cdots, {m}_{33}^{(2)}\right]}^{\mathrm{T}} \). The coordinates of the sampled points in the world frame and in the image frames are transformed into vector points \( {\mathbf{x}}_i \), and the adopted algorithm minimizes the sum of the squared distances between all the vector points (i.e. the combined coordinates) \( {\mathbf{x}}_i \) and the fitting hyperplane; thus the objective function is
where \( {e}_j={\mathbf{m}}^{\mathrm{T}}{\mathbf{x}}_j/{\left\Vert \mathbf{m}\right\Vert}_2 \).
Let \( {\mathbf{x}}_{4i-3}={\left[-{X}_{wi}/{u}_{1i},\ -{Y}_{wi}/{u}_{1i},\ -{Z}_{wi}/{u}_{1i},\ -1/{u}_{1i},\ {\mathbf{0}}^{(1)},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi},\ \mathbf{0}\right]}^{\mathrm{T}} \), \( {\mathbf{x}}_{4i-2}={\left[{\mathbf{0}}^{(1)},\ -{X}_{wi}/{v}_{1i},\ -{Y}_{wi}/{v}_{1i},\ -{Z}_{wi}/{v}_{1i},\ -1/{v}_{1i},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi},\ \mathbf{0}\right]}^{\mathrm{T}} \), \( {\mathbf{x}}_{4i-1}={\left[\mathbf{0},\ -{X}_{wi}/{u}_{2i},\ -{Y}_{wi}/{u}_{2i},\ -{Z}_{wi}/{u}_{2i},\ -1/{u}_{2i},\ {\mathbf{0}}^{(1)},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi}\right]}^{\mathrm{T}} \), \( {\mathbf{x}}_{4i}={\left[\mathbf{0},\ {\mathbf{0}}^{(1)},\ -{X}_{wi}/{v}_{2i},\ -{Y}_{wi}/{v}_{2i},\ -{Z}_{wi}/{v}_{2i},\ -1/{v}_{2i},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi}\right]}^{\mathrm{T}} \), where \( \mathbf{0} \) is a 1 × 11 zero row vector and \( {\mathbf{0}}^{(1)} \) is a 1 × 4 zero row vector. Thus
where \( \mathbf{R}={\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}}+{\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}} \), which can be written as \( {\mathbf{R}}_1+{\mathbf{R}}_2 \); \( {\mathbf{R}}_1 \) and \( {\mathbf{R}}_2 \) are 22 × 22 real symmetric matrices, i.e., \( {\mathbf{R}}_1={\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}} \) and \( {\mathbf{R}}_2={\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}} \); \( {\mathbf{b}}_1={\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{x}}_j \); and \( {\mathbf{b}}_2={\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{x}}_j \).
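As a concrete illustration of how the combined coordinate vectors and the autocorrelation matrix are assembled, the following sketch (Python with NumPy; all names are ours, not from the paper) builds \( \mathbf{R}={\mathbf{R}}_1+{\mathbf{R}}_2 \) and the sums \( {\mathbf{b}}_1 \), \( {\mathbf{b}}_2 \) from N feature correspondences:

```python
import numpy as np

def build_autocorrelation(world_pts, img1, img2):
    """Assemble R = R1 + R2 and the sums b1, b2 of Section 3 from N
    feature correspondences (a sketch; argument names are illustrative).

    world_pts: (N, 3) world coordinates (Xw, Yw, Zw) of the feature points
    img1, img2: (N, 2) pixel coordinates (u, v) in the left/right images
    """
    R1 = np.zeros((22, 22)); R2 = np.zeros((22, 22))
    b1 = np.zeros(22); b2 = np.zeros(22)
    for (X, Y, Z), (u1, v1), (u2, v2) in zip(world_pts, img1, img2):
        # x_{4i-3}, x_{4i-2}: left-camera rows (Eqs. (3)-(4) divided by -u1, -v1)
        xa = np.zeros(22); xb = np.zeros(22)
        xa[0:4] = [-X / u1, -Y / u1, -Z / u1, -1.0 / u1]; xa[8:11] = [X, Y, Z]
        xb[4:8] = [-X / v1, -Y / v1, -Z / v1, -1.0 / v1]; xb[8:11] = [X, Y, Z]
        # x_{4i-1}, x_{4i}: right-camera rows (Eqs. (5)-(6) divided by -u2, -v2)
        xc = np.zeros(22); xd = np.zeros(22)
        xc[11:15] = [-X / u2, -Y / u2, -Z / u2, -1.0 / u2]; xc[19:22] = [X, Y, Z]
        xd[15:19] = [-X / v2, -Y / v2, -Z / v2, -1.0 / v2]; xd[19:22] = [X, Y, Z]
        for x in (xa, xb):
            R1 += np.outer(x, x); b1 += x
        for x in (xc, xd):
            R2 += np.outer(x, x); b2 += x
    return R1 + R2, b1, b2
```

The left-camera vectors occupy the first 11 components and the right-camera vectors the last 11, so R comes out block diagonal.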
In order to obtain the minimum, the critical point can be found from \( \mathrm{d}E/\mathrm{d}\mathbf{m}=\mathbf{0} \), that is
Let \( \lambda =\frac{{\mathbf{m}}^{\mathrm{T}}\mathbf{R}\mathbf{m}+2{m}_{34}^{(1)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_1+2{m}_{34}^{(2)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_2+2N{\left({m}_{34}^{(1)}+{m}_{34}^{(2)}\right)}^2}{{\left\Vert \mathbf{m}\right\Vert}_2^2} \). According to the expected values of \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) from Eqs. (7) and (8), and assuming \( {m}_{34}^{(1)}=-{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_1/2N \) and \( {m}_{34}^{(2)}=-{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_2/2N \), we have
where \( \mathbf{M}=\mathbf{R}-\mathbf{B} \), \( \lambda =\frac{{\mathbf{m}}^{\mathrm{T}}\mathbf{Mm}}{{\mathbf{m}}^{\mathrm{T}}\mathbf{m}} \), \( \mathbf{B}={\mathbf{B}}_1+{\mathbf{B}}_2 \), \( {\mathbf{B}}_1={\mathbf{b}}_1{\mathbf{b}}_1^{\mathrm{T}}/2N \), \( {\mathbf{B}}_2={\mathbf{b}}_2{\mathbf{b}}_2^{\mathrm{T}}/2N \), and \( \mathbf{M}=\left[\begin{array}{cc} {\mathbf{M}}_1 & \mathbf{0} \\ \mathbf{0} & {\mathbf{M}}_2 \end{array}\right] \). Thus λ is an eigenvalue of \( \mathbf{M} \), and \( \mathbf{m} \) is its corresponding eigenvector [1].
At the same time, assuming the eigenvalues of \( {\mathbf{M}}_1 \) and \( {\mathbf{M}}_2 \) to be \( {\lambda}_1, {\lambda}_2, \cdots, {\lambda}_{11} \) and \( {\mu}_1, {\mu}_2, \cdots, {\mu}_{11} \) respectively, there are orthogonal matrices \( \mathbf{P} \) and \( \mathbf{Q} \) satisfying \( {\mathbf{M}}_1=\mathbf{P}{\boldsymbol{\Lambda}}_1{\mathbf{P}}^{-1} \) and \( {\mathbf{M}}_2=\mathbf{Q}{\boldsymbol{\Lambda}}_2{\mathbf{Q}}^{-1} \), where \( {\boldsymbol{\Lambda}}_1=\mathrm{diag}\left({\lambda}_1, {\lambda}_2, \cdots, {\lambda}_{11}\right) \) and \( {\boldsymbol{\Lambda}}_2=\mathrm{diag}\left({\mu}_1, {\mu}_2, \cdots, {\mu}_{11}\right) \). Thus
The eigenvalues of \( \mathbf{M} \) are sorted in descending order, for example \( \boldsymbol{\Lambda} =\mathrm{diag}\left({\mu}_1, {\lambda}_1, \cdots, {\lambda}_{10}, {\lambda}_{11}, {\mu}_{11}\right) \) or \( \boldsymbol{\Lambda} =\mathrm{diag}\left({\lambda}_1, {\mu}_1, \cdots, {\mu}_{11}, {\lambda}_{10}, {\lambda}_{11}\right) \), and so on. Thus \( \mathbf{M} \) can be written as \( \mathbf{M}=\mathbf{B}\boldsymbol{\Lambda} {\mathbf{B}}^{-1} \), where \( \mathbf{B} \) consists of 22 column vectors of the form \( {\left[{\mathbf{m}}_1, \mathbf{0}\right]}^{\mathrm{T}} \), \( {\left[\mathbf{0}, {\mathbf{m}}_2\right]}^{\mathrm{T}} \), \( {\left[{\mathbf{m}}_3, \mathbf{0}\right]}^{\mathrm{T}} \), …, \( {\left[\mathbf{0}, {\mathbf{m}}_j\right]}^{\mathrm{T}} \), …, \( {\left[{\mathbf{m}}_{22}, \mathbf{0}\right]}^{\mathrm{T}} \), which are the orthogonal eigenvectors of \( \mathbf{M} \) corresponding to these eigenvalues; the projective matrices of the binocular vision system are obtained from the normalized eigenvectors of \( \mathbf{M} \) corresponding to the minimum eigenvalues.
Since the projective matrices of the left and right cameras in the binocular vision system are different, if the left camera's projective matrix is obtained from the eigenvector \( \mathbf{m}={\left[{\mathbf{m}}_{22}, \mathbf{0}\right]}^{\mathrm{T}} \) of the autocorrelation matrix corresponding to the minimal eigenvalue \( {\lambda}_{11} \), then the right camera's projective matrix can be obtained from the eigenvector \( \mathbf{m}={\left[\mathbf{0}, {\mathbf{m}}_{21}\right]}^{\mathrm{T}} \) corresponding to the minimal eigenvalue \( {\mu}_{11} \), where \( \mathbf{0} \) is a 1 × 11 row vector. And \( {m}_{34}^{(1)}=-{\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{m}}_{22}^{\mathrm{T}}{\mathbf{x}}_j/2N \) and \( {m}_{34}^{(2)}=-{\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{m}}_{21}^{\mathrm{T}}{\mathbf{x}}_j/2N \). Thus the projective matrix of the left camera is \( {\mathbf{m}}_L={\left[{\mathbf{m}}_{22}, {m}_{34}^{(1)}\right]}^{\mathrm{T}} \), and that of the right camera is \( {\mathbf{m}}_R={\left[{\mathbf{m}}_{21}, {m}_{34}^{(2)}\right]}^{\mathrm{T}} \).
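The network of Section 4 extracts these minor components adaptively; as an offline cross-check, the same projective matrices can be recovered by a standard eigendecomposition of the diagonal blocks \( {\mathbf{M}}_1 \) and \( {\mathbf{M}}_2 \). A sketch under the assumptions above (names are illustrative; R, b1, b2 are assembled from the combined coordinate vectors as in Section 3):

```python
import numpy as np

def minor_component_projections(R, b1, b2, N):
    """Closed-form counterpart of the network's output (illustrative sketch):
    form M = R - b1 b1^T/(2N) - b2 b2^T/(2N), take the eigenvectors of its
    two 11x11 diagonal blocks for the smallest eigenvalues, and append the
    recovered m34 elements to obtain the 3x4 projective matrices."""
    M = R - np.outer(b1, b1) / (2 * N) - np.outer(b2, b2) / (2 * N)
    M1, M2 = M[:11, :11], M[11:, 11:]       # left/right camera blocks
    _, V1 = np.linalg.eigh(M1)              # eigh sorts eigenvalues ascending
    _, V2 = np.linalg.eigh(M2)
    m22, m21 = V1[:, 0], V2[:, 0]           # minor components
    m34_1 = -m22 @ b1[:11] / (2 * N)        # m34 = -m^T b / 2N, as above
    m34_2 = -m21 @ b2[11:] / (2 * N)
    mL = np.append(m22, m34_1)              # 12 elements, row-major 3x4
    mR = np.append(m21, m34_2)
    return mL.reshape(3, 4), mR.reshape(3, 4)
```

For noise-free correspondences the minimal eigenvalue of each block is zero and the recovered matrices equal the true ones up to a common scale, which is all a projective matrix is defined up to.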
4 Design of the self-adaptive orthogonal learning neural network
An orthogonal learning neural network with lateral connections proposed by KUNG [10] was adopted in the experiments. The structure is shown in Fig. 2, and its input data are the row vectors of the autocorrelation matrix \( \mathbf{M} \). There are 22 neurons in the output layer. The 1st neuron connects to the input neurons with weights \( {\mathbf{m}}_1={\left[{m}_1^{(1)}, {m}_1^{(2)}, \cdots, {m}_1^{(22)}\right]}^{\mathrm{T}} \), and has no lateral connection while being trained. The j-th neuron connects to both the input neurons, with \( {\mathbf{m}}_j={\left[{m}_j^{(1)}, {m}_j^{(2)}, \cdots, {m}_j^{(22)}\right]}^{\mathrm{T}} \), and the previous (j−1) outputs, through the lateral weight vector \( {\mathbf{W}}_j={\left[{w}_j^{(1)}, {w}_j^{(2)}, \ldots, {w}_j^{(j-1)}\right]}^{\mathrm{T}} \). The network is trained step by step: while the j-th neuron is being trained, the 1st, 2nd, …, and (j−1)-th neurons have already been trained, i.e. the stable values \( {\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{j-1} \), all mutually orthogonal, have been obtained. When the j-th neuron is completely trained, its lateral connection weights approximate 0, and \( {\mathbf{m}}_j \) is perpendicular to each of \( {\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{j-1} \) [13].
The output of the 1st neuron is
Since there is no lateral connection, the learning algorithm for the 1st neuron is as follows,
The 1st term of Eq. (16) is the Hebbian learning rule, which represents a self-strengthening function. When the network reaches the stable state, i.e. \( \Delta {\mathbf{m}}_1\to \mathbf{0} \), we have \( {O}_1{\mathbf{M}}_i-\frac{O_1^2}{{\mathbf{m}}_1^{\mathrm{T}}{\mathbf{m}}_1}{\mathbf{m}}_1=\mathbf{0} \). According to Eqs. (15) and (16), \( \mathbf{M}{\mathbf{m}}_1-{\lambda}_1{\mathbf{m}}_1=\mathbf{0} \) and \( {\lambda}_1=\frac{{\mathbf{m}}_1^{\mathrm{T}}\mathbf{M}{\mathbf{m}}_1}{{\mathbf{m}}_1^{\mathrm{T}}{\mathbf{m}}_1} \), where \( {\mathbf{m}}_1 \) is the eigenvector of the autocorrelation matrix \( \mathbf{M} \) corresponding to the maximum eigenvalue \( {\lambda}_1 \).
The training of the j-th neuron is similar to the above, i.e.
where \( {\mathbf{O}}_j={\left[{O}_1, {O}_2, \cdots, {O}_{j-1}\right]}^{\mathrm{T}} \) is the output of the previous (j−1) neurons; \( {\mathbf{V}}_j={\left[{\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{j-1}\right]}^{\mathrm{T}} \) is a weight matrix; and \( {\mathbf{W}}_j={\left[{w}_j^{(1)}, {w}_j^{(2)}, \cdots, {w}_j^{(j-1)}\right]}^{\mathrm{T}} \) is the vector of lateral connection weights for the j-th neuron.
After normalization, the learning rule for the j-th neuron is as follows [8, 18],
where β and γ are positive parameters that determine the learning rates; their values are set according to the corresponding autocorrelation matrix so that training is fast and free of oscillations. The 1st term of Eq. (19) is the Hebbian learning rule, which represents a self-strengthening function; the 2nd terms in Eqs. (19) and (20) play a stabilizing role for the system; and the 1st term in Eq. (20) is the anti-Hebbian learning rule, which causes an inhibition function and makes the outputs of the network uncorrelated even if the input signals are correlated. That is, the weight \( {\mathbf{W}}_j \) plays the role of "subtracting" the first (j−1) components from the j-th neuron, i.e. the first principal component \( {\mathbf{m}}_1 \) of \( \mathbf{M} \), the second principal component \( {\mathbf{m}}_2 \) of \( \mathbf{M} \), …, and the (j−1)-th principal component \( {\mathbf{m}}_{j-1} \) of \( \mathbf{M} \) are subtracted. Thus the j-th vector \( {\mathbf{m}}_j \) tends to become orthogonal to all the previous components \( {\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{j-1} \) when the training of the j-th neuron is over. Hence the orthogonal learning rule constitutes an anti-Hebbian rule.
The iteration algorithms for \( {\mathbf{m}}_j\left(t+1\right) \) and \( {\mathbf{W}}_j\left(t+1\right) \) are \( {\mathbf{m}}_j\left(t+1\right)={\mathbf{m}}_j(t)+\Delta {\mathbf{m}}_j(t) \) and \( {\mathbf{W}}_j\left(t+1\right)={\mathbf{W}}_j(t)+\Delta {\mathbf{W}}_j(t) \), respectively. We assume that β and γ are sufficiently small that \( {\mathbf{m}}_j\left(t+1\right) \) and \( {\mathbf{W}}_j\left(t+1\right) \) remain approximately constant while an average of the variables is taken over one sweep of the training data (one sweep means one round of training involving all the given sample input patterns). To facilitate the proof, we assume that β = γ. Therefore, according to Eqs. (19) and (20), the weight iteration in one sweep of the orthogonal learning network can be rewritten in state-transition-matrix form as follows,
where \( {\mathbf{M}}_{11}={\mathbf{E}}_{22}+\gamma \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}-\frac{\sigma (t)}{{\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j}{\mathbf{E}}_{22}\right) \), \( {\mathbf{M}}_{12}=\gamma \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right){\mathbf{V}}_j^{\mathrm{T}} \), \( {\mathbf{M}}_{21}=-\gamma {\mathbf{V}}_j\left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right) \), \( {\mathbf{M}}_{22}={\mathbf{E}}_{j-1}-\gamma \left({\mathbf{V}}_j\left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right){\mathbf{V}}_j^{\mathrm{T}}+\sigma (t){\mathbf{E}}_{j-1}\right) \), and \( \sigma (t)=E\left\{{O}_j^2(t)\right\} \), where \( {\mathbf{E}}_{22} \) is the 22 × 22 identity matrix and \( {\mathbf{E}}_{j-1} \) is the (j−1) × (j−1) identity matrix.
In the experiment, \( {\mathbf{m}}_j \) is first initialized at random and normalized to a unit vector; at every iteration, the updated vector \( {\mathbf{m}}_j \) is normalized to a unit vector again. Thus \( {\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j=1 \), i.e. \( {\mathbf{m}}_j \) remains a unit vector throughout the iteration process. On the other hand, if \( {\mathbf{m}}_j\left(t+1\right) \) in Eq. (21) is left-multiplied by \( {\mathbf{V}}_j \) and \( {\mathbf{W}}_j\left(t+1\right) \) is added, we have
As 1 − γσ(t) < 1, when t → ∞, we have
At the same time, according to Eq. (19), when the system reaches the steady state, i.e. \( \Delta {\mathbf{m}}_j\left(t+1\right)\to \mathbf{0} \), we have \( \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}-{\lambda}_j{\mathbf{E}}_{22}\right){\mathbf{m}}_j+\left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right){\mathbf{V}}_j^{\mathrm{T}}{\mathbf{W}}_j\to \mathbf{0} \). Since \( \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}-{\lambda}_j{\mathbf{E}}_{22}\right){\mathbf{m}}_j\to \mathbf{0} \), we have \( \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right){\mathbf{V}}_j^{\mathrm{T}}{\mathbf{W}}_j\to \mathbf{0} \), and \( {\mathbf{W}}_j\to \mathbf{0} \) (with probability 1). From Eq. (23) we know \( {\mathbf{V}}_j{\mathbf{m}}_j\left(t+1\right)\to \mathbf{0} \), that is, \( {\mathbf{m}}_j\left(t+1\right) \) is orthogonal to the vector elements of \( {\mathbf{V}}_j \) (i.e. \( {\mathbf{m}}_1, {\mathbf{m}}_2, \ldots, {\mathbf{m}}_{j-1} \)) if the number of iterations is sufficiently large.
Assuming the learning rates β and γ decrease to zero at a proper speed (for example, let β = Δt), Eq. (19) can be written in differential form, that is,
When the system reaches a stable equilibrium, i.e. \( \mathrm{d}{\mathbf{m}}_j/\mathrm{d}t\to \mathbf{0} \), the lateral connections approximate zero. Thus \( {O}_j={\mathbf{m}}_j^{\mathrm{T}}{\mathbf{M}}_i \). According to Eq. (24), the asymptotically stable solution is obtained as follows,
where \( \mathbf{M}={\sum}_{i=1}^{22}{\mathbf{M}}_i^{\mathrm{T}}{\mathbf{M}}_i \), which is a symmetric autocorrelation matrix.
The flow of the solving program is shown in Fig. 3, where \( {\varepsilon}_1 \) and \( {\varepsilon}_2 \) are the stopping thresholds for the program iteration.
When the network reaches the stable state, \( {\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{22} \) converge to the eigenvectors of the autocorrelation matrix \( \mathbf{M} \), that is \( \underset{n\to \infty }{ \lim }{\lambda}_j=\left({\mathbf{m}}_j^{\mathrm{T}}\mathbf{M}{\mathbf{m}}_j\right)/\left({\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j\right) \), which is equivalent to the Lagrange multiplier in Eq. (13). Thus the eigenvectors of \( \mathbf{M} \) corresponding to the minimum eigenvalues \( {\lambda}_{11} \) and \( {\mu}_{11} \), i.e. \( \left[{\mathbf{m}}_1\ \mathbf{0}\right] \) and \( \left[\mathbf{0}\ {\mathbf{m}}_2\right] \), are obtained, whose elements can be taken as the fitting coefficients of the projective matrices of the cameras in the binocular vision system.
5 Results of system calibration and 3D re-construction
5.1 Binocular vision system calibration experiment
As shown in Fig. 4, the experimental platform is a precision robot consisting of a servomechanism, motion controllers, a mechanical body, the binocular vision system and so on. In the vision system, the two cameras are mounted at the end of the manipulator and move together with the end-effector (eye-in-hand), so the transformation relation between the end-effector and the cameras remains constant while the manipulator moves. To calibrate the vision system, the 3D coordinates of the feature points are first measured with a three-dimensional coordinate measuring machine [16, 22]. In order to obtain variations of the feature-point coordinates in the Z-axis direction, images of the target block are sampled with the cameras at different positions by moving the manipulator vertically. Then the corresponding 2D coordinates are estimated with sub-pixel accuracy using an improved Canny edge detector algorithm [2, 9].
In the program, the forward and lateral connection weights are initialized at random, and we let \( {\varepsilon}_1=0.05 \) and \( {\varepsilon}_2=0.005 \). After the 22nd neuron is trained, the eigenvectors of the autocorrelation matrix of the input signals corresponding to the minimal eigenvalues are obtained, namely \( {\mathbf{v}}_{22}={\left[{\mathbf{m}}^{(1)}\ \mathbf{0}\right]}^{\mathrm{T}} \) and \( {\mathbf{v}}_{21}={\left[\mathbf{0}\ {\mathbf{m}}^{(2)}\right]}^{\mathrm{T}} \). The parameters \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) can then be obtained, i.e. \( {m}_{34}^{(1)}=-{\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{m}}_{22}^{\mathrm{T}}{\mathbf{x}}_j/2N \) and \( {m}_{34}^{(2)}=-{\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{m}}_{21}^{\mathrm{T}}{\mathbf{x}}_j/2N \), where \( {\mathbf{m}}_{22} \) and \( {\mathbf{m}}_{21} \) correspond to the eigenvalues \( {\lambda}_{11} \) and \( {\mu}_{11} \) respectively. The projective matrices of the left and right cameras in the binocular vision system are then obtained from these eigenvectors together with \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \), as shown in Table 1.
The elements in Table 1 constitute the projective matrices of the left and right cameras, so the transformation relations between the image frames and the world frame in the binocular vision system are estimated directly.
5.2 3D re-construction by the self-adaptive orthogonal learning network
In the binocular vision system, 3D re-construction can be carried out by means of the self-adaptive orthogonal learning network with lateral restraint. Assume the homogeneous coordinates of a feature point in the world frame to be (X, Y, Z, 1), whose projective coordinates in the left and right camera image planes are \( \left({u}_{1i}, {v}_{1i}, 1\right) \) and \( \left({u}_{2i}, {v}_{2i}, 1\right) \); according to the camera's mathematical model, 4 equations can be obtained from Eqs. (3)–(6) as follows:
According to analytic geometry, Eqs. (26) and (27) (or Eqs. (28) and (29)) denote the line \( O_1P_1 \) (or \( O_2P_2 \)) as shown in Fig. 1. In order to obtain the 3D information, the coordinates of \( P \) can be obtained from the intersection point of \( O_1P_1 \) and \( O_2P_2 \) [20]. In the solving algorithm, from Eqs. (26)–(29), we let \( \mathbf{N}=\left[\begin{array}{ccc} \frac{u_1{m}_{31}^{(1)}-{m}_{11}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}} & \frac{u_1{m}_{32}^{(1)}-{m}_{12}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}} & \frac{u_1{m}_{33}^{(1)}-{m}_{13}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}} \\ \frac{v_1{m}_{31}^{(1)}-{m}_{21}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}} & \frac{v_1{m}_{32}^{(1)}-{m}_{22}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}} & \frac{v_1{m}_{33}^{(1)}-{m}_{23}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}} \\ \frac{u_2{m}_{31}^{(2)}-{m}_{11}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}} & \frac{u_2{m}_{32}^{(2)}-{m}_{12}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}} & \frac{u_2{m}_{33}^{(2)}-{m}_{13}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}} \\ \frac{v_2{m}_{31}^{(2)}-{m}_{21}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}} & \frac{v_2{m}_{32}^{(2)}-{m}_{22}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}} & \frac{v_2{m}_{33}^{(2)}-{m}_{23}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}} \end{array}\right] \), \( \mathbf{d}={\left[x, y, z\right]}^{\mathrm{T}} \), and \( \mathbf{c}={\left[-1, -1, -1, -1\right]}^{\mathrm{T}} \), so we have
According to the above methods, let the objective function be as follows,
where \( {r}_i=\frac{\left|{\mathbf{N}}_i\mathbf{d}+{c}_i\right|}{{\left\Vert \mathbf{d}\right\Vert}_2} \), and \( {\mathbf{N}}_i \) is the i-th row vector of \( \mathbf{N} \). In the algorithm we let \( {c}_i=c\ \left(i=1,2,3,4\right) \), where c is a constant whose particular value has no influence on the solving iteration of the 3D re-construction. For example, if a fitting coefficient vector \( \mathbf{d} \) minimizes the sum of the squared distances from all the vector points \( {\mathbf{N}}_i \) to the fitting hyperplane in Eq. (31) with the constant equal to 1, the same vector does so for Eq. (31) with an arbitrary constant c; only the value of Eq. (31) is offset. Thus Eq. (31) can be rewritten as follows,
where \( \mathbf{s}={\sum}_{i=1}^4{\mathbf{N}}_i^{\mathrm{T}}{\mathbf{N}}_i \) and \( \mathbf{t}={\sum}_{i=1}^4{\mathbf{N}}_i \), \( {\mathbf{N}}_i \) being the i-th row vector of \( \mathbf{N} \).
In order to minimize L, the critical points of Eq. (32) can be obtained by letting \( \mathrm{d}L/\mathrm{d}\mathbf{d}=\mathbf{0} \), that is
According to Eq. (30), the expected value of \( c=-{\mathbf{N}}_i\mathbf{d}=-\mathbf{t}\mathbf{d}/4 \), thus
where \( \mathbf{T}=\mathbf{s}-{\mathbf{t}}^{\mathrm{T}}\mathbf{t}/4 \) and \( \lambda =\frac{{\mathbf{d}}^{\mathrm{T}}\mathbf{T}\mathbf{d}}{{\left\Vert \mathbf{d}\right\Vert}_2^2} \).
Thus \( \mathbf{d} \) is the eigenvector of \( \mathbf{T} \) corresponding to the eigenvalue λ, and the 3D coordinates of the feature points in the world frame can be obtained from the eigenvector of \( \mathbf{T} \) corresponding to the minimum eigenvalue.
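The reconstruction step can be sketched as follows, with a direct eigendecomposition of \( \mathbf{T} \) standing in for the trained three-neuron network and the scale recovered from the relation \( c=-\mathbf{t}\mathbf{d}/4 \) (all names are illustrative):

```python
import numpy as np

def reconstruct_point(mL, mR, uv1, uv2):
    """Recover (X, Y, Z) for one stereo correspondence as the minor
    component of T = s - t^T t / 4; a sketch with illustrative names.
    mL, mR: 3x4 projective matrices; uv1, uv2: left/right pixel coords."""
    rows = []
    for m, (u, v) in ((mL, uv1), (mR, uv2)):
        # Eqs. (26)-(29) rearranged so that each row N_i satisfies N_i d = 1
        rows.append((u * m[2, :3] - m[0, :3]) / (m[0, 3] - u * m[2, 3]))
        rows.append((v * m[2, :3] - m[1, :3]) / (m[1, 3] - v * m[2, 3]))
    Nmat = np.array(rows)                 # the 4x3 matrix N of Eq. (30)
    s = Nmat.T @ Nmat                     # s = sum_i N_i^T N_i
    t = Nmat.sum(axis=0)                  # t = sum_i N_i
    T = s - np.outer(t, t) / 4.0
    w, V = np.linalg.eigh(T)              # eigenvalues in ascending order
    d = V[:, 0]                           # eigenvector of the minimum eigenvalue
    return 4.0 * d / (t @ d)              # rescale so that N d = 1 (c = -1)
```

For noise-free correspondences the minimum eigenvalue is zero and the rescaled minor component reproduces the world coordinates of the point exactly.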
For the 3D re-construction, the experiment is carried out on the high-precision robot, as can be seen in Fig. 4. First, the manipulator is moved in the vertical direction through the servosystem controller, and images of the target block are sampled by the two cameras of the stereo vision system. Then 6 points are chosen at random for precision analysis at several positions, and the 2D coordinates of the feature points projected onto the left and right camera image planes are obtained, as shown in Table 2.
In the 3D re-construction program, the adaptive orthogonal learning network is designed as in Fig. 2, and the program flow chart is similar to Fig. 3, but with three input neurons and three output neurons. The input signals are the row vectors of \( \mathbf{T} \). After the 3rd neuron is trained, the system reaches an equilibrium state and the scale is obtained according to Eq. (35), so the world coordinates of the feature points can be obtained from the weight vector connecting the input neurons according to Eq. (36), and the 3D re-construction is achieved.
where \( {\mathbf{d}}^{(3)} \) is the eigenvector of \( \mathbf{T} \) corresponding to the minimal eigenvalue, and \( \mathbf{D} \) is the solved 3D coordinate vector of the feature point in the world frame.
On the other hand, if the least squares method (LSM) is adopted, the projective matrices of the left and right cameras are obtained as shown in Table 3.
For the precision analysis experiment, the actual coordinates (AC) of the feature points in the world frame are measured and shown in the 1st line of Table 4. After the 3D re-construction, the coordinates obtained with the adaptive orthogonal learning algorithm (abbreviated as CwAOL) are shown in the 2nd line of Table 4, and the 3D coordinates obtained with least-squares calibration (abbreviated as CwLSM) are shown in the 3rd line of Table 4.
The difference between the actual coordinates in the world frame and the solved coordinates is taken as the precision performance index [3, 7, 14], that is
where \( \left({X}_i^{(a)}, {Y}_i^{(a)}, {Z}_i^{(a)}\right) \) are the actual coordinates in the world frame, and \( \left({X}_i^{(s)}, {Y}_i^{(s)}, {Z}_i^{(s)}\right) \) are the 3D coordinates solved by the proposed technique or by other data processing methods such as LSM.
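One plausible reading of this index, the per-point Euclidean distance between actual and solved coordinates, can be sketched as (the helper name is ours, not the paper's code):

```python
import numpy as np

def precision_index(actual, solved):
    """Per-point Euclidean distance between actual and reconstructed world
    coordinates; one plausible form of the index in Eq. (37).
    actual, solved: (n, 3) arrays of (X, Y, Z) coordinates."""
    actual = np.asarray(actual, dtype=float)
    solved = np.asarray(solved, dtype=float)
    return np.linalg.norm(actual - solved, axis=1)
```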
The precision performance indexes of the two algorithms are computed according to Eq. (37) and shown in the 4th and 5th lines of Table 4, where PIwAOL denotes the precision index of the proposed technique, i.e. the self-adaptive orthogonal learning network, and PIwLSM denotes the precision index of the least squares method. Table 4 demonstrates that the proposed approach has higher precision and can meet the precision requirements of engineering practice. From the above calculation, we find the results interesting: with patterns of lateral inhibition, the orthogonal learning neural network can achieve fast self-organization under relatively loose structural constraints according to a simple anti-Hebbian rule, or a slight modification of it. The proposed technique is useful for precision manufacture and measurement, such as the precision machining of micro-drills, gears, and other work-pieces.
6 Conclusions and future work
Compared with other techniques [4–6, 11, 12, 15, 17, 19, 21], the approach proposed in this paper has the following key features: 1) the fitting projective matrixes of the left and right cameras in the binocular vision system are obtained from the eigen-vectors of an autocorrelation matrix corresponding to the minimal eigen-values, which minimize the sum of squared distances from the combined vector coordinates of the feature points to the fitting hyperplane; 2) a self-adaptive orthogonal learning neural network is designed to obtain these eigen-vectors, in which, when the j-th neuron is trained, its weight vector is kept perpendicular to the (j−1) vectors already obtained; 3) 3D re-construction is carried out according to the proposed technique, with advantages such as easy programming and high precision. The study thus provides a new and applicable data processing technique for calibrating binocular vision systems and for 3D re-construction. In future work, we will study how to set the learning rate of the orthogonal learning neural network so that training converges as fast as possible.
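The core idea behind features 1) and 2) — that the normal of the best-fit hyperplane is the minor component of the data's autocorrelation matrix — can be checked in batch form with a plain eigendecomposition. This is a simplified analogue of the adaptive network, under the assumption that the minor component is extracted offline; the plane, noise level, and variable names are illustrative.

```python
import numpy as np

# Batch analogue of minor-component extraction: the normal of the
# best-fit hyperplane through centered data is the eigenvector of the
# autocorrelation (scatter) matrix with the smallest eigenvalue.
rng = np.random.default_rng(1)

# Points lying (nearly) on the plane x + 2y - z = 0, plus tiny noise.
basis = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 2.0]])   # spans the plane
coeffs = rng.normal(size=(200, 2))
pts = coeffs @ basis + 1e-4 * rng.normal(size=(200, 3))

centered = pts - pts.mean(axis=0)
R = centered.T @ centered / len(pts)     # autocorrelation matrix
eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
normal = eigvecs[:, 0]                   # minor component = plane normal

expected = np.array([1.0, 2.0, -1.0])
expected /= np.linalg.norm(expected)
if normal @ expected < 0:                # align sign before comparing
    normal = -normal
```

The adaptive network in the paper reaches the same minor component iteratively with an anti-Hebbian update instead of an explicit eigendecomposition, which avoids forming and factoring R when data arrive sequentially.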
References
Božić B, Ristić K, Pejić M (2014) Parameter estimation and accuracy analysis of the free geodetic network adjustment using singular value decomposition. Technical Gazette 21(2):451–456
Brandusa PA, Catalin B (2011) Development of an embedded artificial vision system for an autonomous robot. Int J Innov Comput Inf Control 7(2):745–762
Chatterjee C, Roychowdhury VP, Chong EKP (1997) A nonlinear gauss-seidel algorithm for noncoplanar and coplanar camera calibration with convergence analysis. Comp Vision Image Underst (CVIU) 67(1):58–80
Chen SY, Li YF (2013) Finding optimal focusing distance and edge blur distribution for weakly calibrated 3-D vision. IEEE Trans Ind Inf 9(3):1680–1687
Ge DY, Yao XF (2010) Application of the Sanger operator with lateral connection in camera calibration. J Optoelectron Laser 21(11):1720–1724
Guillemaut JY, Aguado AS, Illingworth J (2005) Using points at infinity for parameter decoupling in camera calibration. IEEE Trans Pattern Anal Mach Intell 27(2):265–370
Kang JH, Kang SJ, Kim S (2015) Line recognition algorithm for 3D polygonal model using a parallel computing platform. Multimed Tools Appl 74(1):259–270
Khan MAU, Khan TM, Khan RB, Kiyani A, Khan MA (2012) Noise characterization in web cameras using independent component analysis. Int J Innov Comput Inf Control 7(2):302–311
Khan SA, Usman M, Riaz N (2015) Face recognition via optimized features fusion. J Intell Fuzzy Syst 28(4):1819–1828
Kung SY, Diamautaras KI (1990) A neural network learning algorithm for adaptive principal components extraction (APEX). Proc Int Conf Acoust Speech Signal Process 2:861–864
Ma SD (1996) A self-calibration technique for active vision system. IEEE Trans Robot Autom 12(1):114–120
Miyagawa I, Arai H, Koike H (2010) Simple camera calibration from a single image using five points on two orthogonal 1-d objects. IEEE Trans Image Process 19(6):1528–1538
Palmieri F, Zhu J, Chang CH (1993) Anti-Hebbian learning in topologically constrained linear networks: a tutorial. IEEE Trans Neural Netw 4(5):748–761
Perez U, Cho SH, Asfour S (2009) Volumetric calibration of stereo camera in visual servo based robot control. Int J Adv Robot Syst 6(1):35–42
Rahman T, Krouglicof N (2012) An efficient camera calibration technique offering robustness and accuracy over a wide range of lens distortion. IEEE Trans Image Process 21(2):626–637
Wang B, Liu W, Jia Z et al (2011) Dimensional measurement of hot, large forgings with stereo vision structured light system. Proc IMechE B J Eng Manuf 225(6):901–908
Yin F, Makris D, Velastin SA, Ellis T (2015) Calibration and object correspondence in camera networks with widely separated overlapping views. IET Comput Vis 9(3):354–367
Yuan YM, Fang YH, Cui FX, Li DC (2011) Research on preprocessing algorithm for differential polarization spectrum of oil spills on water. Acta Opt Sin 31(11):1128001-1–1128001-7
Zhang Z (2000) A flexible new technique for camera calibration. IEEE Trans Pattern Anal Mach Intell 22(11):1330–1334
Zhao ZX, Wen GJ, Zhang X, Li DR (2012) Model-based estimation for pose, velocity of projectile from stereo linear array image. Meas Sci Rev 12(3):104–110
Zheng Y, Peng SL (2014) A practical roadside camera calibration method based on least squares optimization. IEEE Trans Intell Transp Syst 15(2):831–843
Zhou JM, Zhou QX, Shu LL, Xu DD (2011) Research on thermal image enhancement for detecting early mechanical damage in apple. Sens Lett 9(3):1031–1036
Acknowledgments
Foundation item: The work described in this paper is partially supported by the National Natural Science Foundation of China (51175187), the Hunan Provincial Natural Science Foundation of China (09JJ6092), and the Science & Technology Foundation of Guangdong Province under grant Nos. 2013B021300023 and 2013B090600112. The authors also gratefully acknowledge the suggestions of the reviewers, which helped to improve the presentation.
Ge, Dy., Yao, Xf. & Lian, Zt. Binocular vision calibration and 3D re-construction with an orthogonal learning neural network. Multimed Tools Appl 75, 15635–15650 (2016). https://doi.org/10.1007/s11042-015-2845-5