1 Introduction

Camera calibration is the process of determining the internal geometric and optical characteristics of a camera (intrinsic parameters) and/or the 3D position and orientation of the camera frame relative to a certain world frame (extrinsic parameters) [6, 12]. Applications of vision systems include 3D sensing and measurement, precision manufacturing, automated assembly, and monitoring and tracking. Once a binocular vision system is calibrated, the 3D geometric information of a scene can be reconstructed from two digital images taken from different angles. There are many research reports on camera calibration. For example, MA obtained the intrinsic parameters of the camera by designing two sets of three pure orthogonal translation movements, and the orientation of the camera with respect to the hand frame with a set of pairwise orthogonal translation movements [11]. ZHANG obtained the intrinsic and extrinsic parameters via the homography matrix, exploiting the orthogonality of the rotation matrix together with the homography computed from the 3D coordinates of feature points on a target block and their 2D image coordinates; in general, however, the calculated rotation matrix did not satisfy the orthogonality properties well [19]. Rahman and Krouglicof proposed a quaternion representation of spatial orientation, which resulted in a system of equations that was minimally redundant and free of singularities, and applied a technique that minimizes the error between the reconstructed image points and their experimentally determined counterparts in the “distortion free” space; the technique thus facilitated the incorporation of an exact lens distortion model as opposed to an approximate one [15]. Recently, we proposed a camera calibration method based on an adaptive principal component extraction network, in which the sum of squared distances from the vector coordinates of the feature points to the fitting hyperplane is taken as the objective function, and the eigenvector of the autocorrelation matrix corresponding to the minimal eigenvalue forms the projection matrix; however, the calibration of a binocular vision system and 3D reconstruction were not addressed [5]. Chen presented a novel method to analyze the blur distribution in an image and find the optimal focusing distance, so that additional constraints could be used to generate absolute measurements of the models [4]. ZHENG proposed a minimum calibration condition consisting of two vanishing points and a vanishing line for estimating the camera intrinsic parameters (including the principal point coordinates) and rotation angles, adopting least squares optimization instead of a closed-form computation; the method is practical and suitable for many traffic scenes in which roadside cameras must be calibrated [21]. YIN presented a semi-automatic scene calibration method that combines tracked blobs with user-selected line features to recover the homographies between camera views, so that a network of cameras with overlapping fields of view can be mapped into a single ground-plane view even when the overlap is not substantial [17]. To our knowledge, no similar work has addressed binocular vision system calibration and 3D reconstruction by means of minor component analysis and an adaptive orthogonal learning network. Therefore, based on our previous work on neuro-calibration techniques, in this study we put forward a novel method in which a self-adaptive orthogonal learning network is used to calibrate a binocular vision system and perform 3D measurement.

2 Model of the binocular vision system

In the binocular vision system shown in Fig. 1, the camera frames are \( C_1 \) and \( C_2 \); \( o_1u_1v_1 \) and \( o_2u_2v_2 \) are the image coordinate systems measured in pixels; and \( O_wX_wY_wZ_w \) is the world frame measured in mm. The homogeneous coordinates of a feature point P in the world frame are \( \left({X}_{wi},{Y}_{wi},{Z}_{wi},1\right) \); it is projected onto the image planes at \( p_1 \) and \( p_2 \), whose homogeneous coordinates are \( \left({u}_{1i},{v}_{1i},1\right) \) and \( \left({u}_{2i},{v}_{2i},1\right) \) respectively. The projection matrices of the left and right cameras are \( {\mathbf{M}}_1 \) and \( {\mathbf{M}}_2 \) respectively, and the transformations between \( o_1u_1v_1 \) or \( o_2u_2v_2 \) and \( O_wX_wY_wZ_w \) can be described as follows:

$$ {Z}_{ci}^{(1)}\left[\begin{array}{l}{u}_{1\mathbf{i}}\\ {}{v}_{1\mathbf{i}}\\ {}1\end{array}\right]=\left[\begin{array}{cccc}\hfill {m}_{11}^{(1)}\hfill & \hfill {m}_{12}^{(1)}\hfill & \hfill {m}_{13}^{(1)}\hfill & \hfill {m}_{14}^{(1)}\hfill \\ {}\hfill {m}_{21}^{(1)}\hfill & \hfill {m}_{22}^{(1)}\hfill & \hfill {m}_{23}^{(1)}\hfill & \hfill {m}_{24}^{(1)}\hfill \\ {}\hfill {m}_{31}^{(1)}\hfill & \hfill {m}_{32}^{(1)}\hfill & \hfill {m}_{33}^{(1)}\hfill & \hfill {m}_{34}^{(1)}\hfill \end{array}\right]\left[\begin{array}{l}{X}_{\mathbf{wi}}\\ {}{Y}_{\mathbf{wi}}\\ {}{Z}_{\mathbf{wi}}\\ {}1\end{array}\right] $$
(1)
$$ {Z}_{ci}^{(2)}\left[\begin{array}{l}{u}_{2\mathbf{i}}\\ {}{v}_{2\mathbf{i}}\\ {}1\end{array}\right]=\left[\begin{array}{cccc}\hfill {m}_{11}^{(2)}\hfill & \hfill {m}_{12}^{(2)}\hfill & \hfill {m}_{13}^{(2)}\hfill & \hfill {m}_{14}^{(2)}\hfill \\ {}\hfill {m}_{21}^{(2)}\hfill & \hfill {m}_{22}^{(2)}\hfill & \hfill {m}_{23}^{(2)}\hfill & \hfill {m}_{24}^{(2)}\hfill \\ {}\hfill {m}_{31}^{(2)}\hfill & \hfill {m}_{32}^{(2)}\hfill & \hfill {m}_{33}^{(2)}\hfill & \hfill {m}_{34}^{(2)}\hfill \end{array}\right]\left[\begin{array}{l}{X}_{\mathbf{wi}}\\ {}{Y}_{\mathbf{wi}}\\ {}{Z}_{\mathbf{wi}}\\ {}1\end{array}\right] $$
(2)

where \( {m}_{11}^{(1)},\cdots, {m}_{34}^{(1)},{m}_{11}^{(2)},\cdots, {m}_{34}^{(2)} \) are the elements of the projection matrices of the left and right cameras.

Fig. 1 Binocular vision system

If \( {Z}_{ci}^{(1)} \) and \( {Z}_{ci}^{(2)} \) are eliminated from Eqs. (1) and (2) respectively, then we obtain

$$ {X}_{wi}{m}_{11}^{(1)}+{Y}_{wi}{m}_{12}^{(1)}+{Z}_{wi}{m}_{13}^{(1)}+{m}_{14}^{(1)}-{u}_{1i}{X}_{wi}{m}_{31}^{(1)}-{u}_{1i}{Y}_{wi}{m}_{32}^{(1)}-{u}_{1i}{Z}_{wi}{m}_{33}^{(1)}-{u}_{1i}{m}_{34}^{(1)}=0 $$
(3)
$$ {X}_{wi}{m}_{21}^{(1)}+{Y}_{wi}{m}_{22}^{(1)}+{Z}_{wi}{m}_{23}^{(1)}+{m}_{24}^{(1)}-{v}_{1i}{X}_{wi}{m}_{31}^{(1)}-{v}_{1i}{Y}_{wi}{m}_{32}^{(1)}-{v}_{1i}{Z}_{wi}{m}_{33}^{(1)}-{v}_{1i}{m}_{34}^{(1)}=0 $$
(4)
$$ {X}_{wi}{m}_{11}^{(2)}+{Y}_{wi}{m}_{12}^{(2)}+{Z}_{wi}{m}_{13}^{(2)}+{m}_{14}^{(2)}-{u}_{2i}{X}_{wi}{m}_{31}^{(2)}-{u}_{2i}{Y}_{wi}{m}_{32}^{(2)}-{u}_{2i}{Z}_{wi}{m}_{33}^{(2)}-{u}_{2i}{m}_{34}^{(2)}=0 $$
(5)
$$ {X}_{wi}{m}_{21}^{(2)}+{Y}_{wi}{m}_{22}^{(2)}+{Z}_{wi}{m}_{23}^{(2)}+{m}_{24}^{(2)}-{v}_{2i}{X}_{wi}{m}_{31}^{(2)}-{v}_{2i}{Y}_{wi}{m}_{32}^{(2)}-{v}_{2i}{Z}_{wi}{m}_{33}^{(2)}-{v}_{2i}{m}_{34}^{(2)}=0 $$
(6)

At the same time, Eqs. (3), (4), (5) and (6) can be divided by \( -{u}_{1i} \), \( -{v}_{1i} \), \( -{u}_{2i} \) and \( -{v}_{2i} \) respectively, which does not change the relation between the two sides of Eqs. (3)–(6). When the binocular vision system is calibrated, if the coordinates of n feature points in the world frame and in the image frames are obtained, a system of linear equations follows from Eqs. (3) and (4), that is

$$ {\mathbf{A}}_1{\mathbf{n}}_1={\mathbf{0}}^{(1)} $$
(7)

where \( {\mathbf{A}}_1=\left[\begin{array}{cccccccccccc} -{X}_{w1}/{u}_{11} & -{Y}_{w1}/{u}_{11} & -{Z}_{w1}/{u}_{11} & -1/{u}_{11} & 0 & 0 & 0 & 0 & {X}_{w1} & {Y}_{w1} & {Z}_{w1} & 1\\ {}0 & 0 & 0 & 0 & -{X}_{w1}/{v}_{11} & -{Y}_{w1}/{v}_{11} & -{Z}_{w1}/{v}_{11} & -1/{v}_{11} & {X}_{w1} & {Y}_{w1} & {Z}_{w1} & 1\\ {}\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ {}-{X}_{wn}/{u}_{1n} & -{Y}_{wn}/{u}_{1n} & -{Z}_{wn}/{u}_{1n} & -1/{u}_{1n} & 0 & 0 & 0 & 0 & {X}_{wn} & {Y}_{wn} & {Z}_{wn} & 1\\ {}0 & 0 & 0 & 0 & -{X}_{wn}/{v}_{1n} & -{Y}_{wn}/{v}_{1n} & -{Z}_{wn}/{v}_{1n} & -1/{v}_{1n} & {X}_{wn} & {Y}_{wn} & {Z}_{wn} & 1\end{array}\right] \), which is a 2n × 12 matrix; \( {\mathbf{n}}_1 \) is the column vector \( {\left[{m}_{11}^{(1)},\cdots, {m}_{14}^{(1)},{m}_{21}^{(1)},\cdots, {m}_{24}^{(1)},{m}_{31}^{(1)},\cdots, {m}_{34}^{(1)}\right]}^{\mathrm{T}} \); and \( {\mathbf{0}}^{(1)} \) is a 2n-dimensional zero vector.

For the right camera, a similar equation can be obtained from Eqs. (5) and (6), that is

$$ {\mathbf{A}}_2{\mathbf{n}}_2={\mathbf{0}}^{(2)} $$
(8)

Combining Eqs. (7) and (8), we obtain an overdetermined system of equations, that is

$$ \mathbf{An}=\mathbf{0} $$
(9)

where \( \mathbf{A}=\left[\begin{array}{cc}{\mathbf{A}}_1 & {\mathbf{0}}_1\\ {}{\mathbf{0}}_2 & {\mathbf{A}}_2\end{array}\right] \), which is a 4n × 24 matrix; \( \mathbf{n}={\left[{\mathbf{n}}_1^{\mathrm{T}},{\mathbf{n}}_2^{\mathrm{T}}\right]}^{\mathrm{T}} \) is the column vector consisting of the elements of \( {\mathbf{n}}_1 \) and \( {\mathbf{n}}_2 \); and \( {\mathbf{0}}_1 \) and \( {\mathbf{0}}_2 \) are 2n × 12 zero matrices.
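
As an illustration, the assembly of A can be sketched in a few lines of numpy; the arrays world_pts, left_px and right_px are hypothetical inputs, not quantities from the paper:

```python
import numpy as np

def camera_rows(world_pts, pixels):
    """Rows of A_k from Eqs. (3)-(4) (left camera) or Eqs. (5)-(6)
    (right camera), each divided by -u or -v as described above."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, pixels):
        rows.append([-X/u, -Y/u, -Z/u, -1/u, 0, 0, 0, 0, X, Y, Z, 1])
        rows.append([0, 0, 0, 0, -X/v, -Y/v, -Z/v, -1/v, X, Y, Z, 1])
    return np.asarray(rows)                    # shape (2n, 12)

# world_pts: (n, 3) world coordinates; left_px, right_px: (n, 2) pixels
A1 = camera_rows(world_pts, left_px)
A2 = camera_rows(world_pts, right_px)
zero = np.zeros_like(A1)
A = np.block([[A1, zero], [zero, A2]])         # the 4n x 24 matrix of Eq. (9)
```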

3 Minor component analysis and solving algorithm

When the binocular vision system is calibrated, the 22 elements of the fitting projection matrices other than \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) in Eq. (9) are taken as the coefficients of a hyperplane, forming the fitting vector \( \mathbf{m}={\left[{m}_{11}^{(1)},\cdots, {m}_{33}^{(1)},{m}_{11}^{(2)},\cdots, {m}_{33}^{(2)}\right]}^{\mathrm{T}} \). The coordinates of the sampled points in the world frame and in the image frames are transformed into vector points \( {\mathbf{x}}_i \), and the adopted algorithm minimizes the sum of the squared distances between all the vector points (i.e. combined coordinates) \( {\mathbf{x}}_i \) and the fitting hyperplane. Thus the objective function is

$$ \underset{\mathbf{m}}{ \min }E\left(\mathbf{m}\right)={\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-3}^{4i}{e}_j^2}} $$
(10)

where \( {e}_j=\left({\mathbf{m}}^{\mathrm{T}}{\mathbf{x}}_j+{m}_{34}^{(k)}\right)/{\left\Vert \mathbf{m}\right\Vert}_2 \), with k = 1 for the left-camera terms (j = 4i − 3, 4i − 2) and k = 2 for the right-camera terms (j = 4i − 1, 4i), as written out in Eq. (11).

Let \( {\mathbf{x}}_{4i-3}={\left[-{X}_{wi}/{u}_{1i},\ -{Y}_{wi}/{u}_{1i},\ -{Z}_{wi}/{u}_{1i},\ -1/{u}_{1i},\ {\mathbf{0}}^{(1)},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi},\ \mathbf{0}\right]}^{\mathrm{T}} \), \( {\mathbf{x}}_{4i-2}={\left[{\mathbf{0}}^{(1)},\ -{X}_{wi}/{v}_{1i},\ -{Y}_{wi}/{v}_{1i},\ -{Z}_{wi}/{v}_{1i},\ -1/{v}_{1i},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi},\ \mathbf{0}\right]}^{\mathrm{T}} \), \( {\mathbf{x}}_{4i-1}={\left[\mathbf{0},\ -{X}_{wi}/{u}_{2i},\ -{Y}_{wi}/{u}_{2i},\ -{Z}_{wi}/{u}_{2i},\ -1/{u}_{2i},\ {\mathbf{0}}^{(1)},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi}\right]}^{\mathrm{T}} \), and \( {\mathbf{x}}_{4i}={\left[\mathbf{0},\ {\mathbf{0}}^{(1)},\ -{X}_{wi}/{v}_{2i},\ -{Y}_{wi}/{v}_{2i},\ -{Z}_{wi}/{v}_{2i},\ -1/{v}_{2i},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi}\right]}^{\mathrm{T}} \), where \( \mathbf{0} \) is a 1 × 11 row vector of zeros and \( {\mathbf{0}}^{(1)} \) is a 1 × 4 row vector of zeros. Thus

$$ \begin{array}{c}\hfill E\left(\mathbf{m}\right)={\displaystyle \sum_{i=1}^N\frac{{\displaystyle \sum_{j=4i-3}^{4i-2}{\left({\mathbf{m}}^{\mathrm{T}}{\mathbf{x}}_j+{m}_{34}^{(1)}\right)}^2}+{\displaystyle \sum_{j=4i-1}^{4i}{\left({\mathbf{m}}^{\mathrm{T}}{\mathbf{x}}_j+{m}_{34}^{(2)}\right)}^2}}{\left|\right|\mathbf{m}\left|\right|{}_2^2}}\hfill \\ {}\hfill =\frac{{\mathbf{m}}^{\mathrm{T}}\mathbf{R}\mathbf{m}+2{m}_{34}^{(1)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_1+2{m}_{34}^{(2)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_2+2N\left({\left({m}_{34}^{(1)}\right)}^2+{\left({m}_{34}^{(2)}\right)}^2\right)}{\left|\right|\mathbf{m}\left|\right|{}_2^2}\hfill \end{array} $$
(11)

where \( \mathbf{R}={\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-3}^{4i-2}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}}}}+{\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-1}^{4i}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}}}} \), which can be written as R 1 + R 2; R 1 and R 2 are 22 × 22 real symmetric matrices, i.e. \( {\mathbf{R}}_1={\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-3}^{4i-2}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}}}} \) and \( {\mathbf{R}}_2={\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-1}^{4i}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}}}} \); \( {\mathbf{b}}_1={\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-3}^{4i-2}{\mathbf{x}}_j}} \); and \( {\mathbf{b}}_2={\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-1}^{4i}{\mathbf{x}}_j}} \).
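
For concreteness, the combined vectors can be formed per feature point as in the following sketch; the helper name and argument layout are illustrative assumptions:

```python
import numpy as np

def combined_vectors(Xw, Yw, Zw, u1, v1, u2, v2):
    """The four 22-dimensional combined vectors x_{4i-3} .. x_{4i}
    for one feature point: left-camera block (11 entries) followed
    by right-camera block (11 entries)."""
    z4, z11 = np.zeros(4), np.zeros(11)
    lu = np.r_[-Xw/u1, -Yw/u1, -Zw/u1, -1/u1, z4, Xw, Yw, Zw]
    lv = np.r_[z4, -Xw/v1, -Yw/v1, -Zw/v1, -1/v1, Xw, Yw, Zw]
    ru = np.r_[-Xw/u2, -Yw/u2, -Zw/u2, -1/u2, z4, Xw, Yw, Zw]
    rv = np.r_[z4, -Xw/v2, -Yw/v2, -Zw/v2, -1/v2, Xw, Yw, Zw]
    return np.r_[lu, z11], np.r_[lv, z11], np.r_[z11, ru], np.r_[z11, rv]
```

Stacking the four vectors of every point gives a (4N, 22) array whose rows are the \( {\mathbf{x}}_j \) used to accumulate R, b 1 and b 2.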

To find the minimum, the critical point is obtained from \( dE/d\mathbf{m}=\mathbf{0} \), that is

$$ \mathbf{R}\mathbf{m}+{m}_{34}^{(1)}{\mathbf{b}}_1+{m}_{34}^{(2)}{\mathbf{b}}_2-\frac{{\mathbf{m}}^{\mathrm{T}}\mathbf{R}\mathbf{m}+2{m}_{34}^{(1)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_1+2{m}_{34}^{(2)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_2+2N\left({\left({m}_{34}^{(1)}\right)}^2+{\left({m}_{34}^{(2)}\right)}^2\right)}{\left\Vert \mathbf{m}\right\Vert_2^2}\mathbf{m}=\mathbf{0} $$
(12)

Let \( \lambda =\frac{{\mathbf{m}}^{\mathrm{T}}\mathbf{R}\mathbf{m}+2{m}_{34}^{(1)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_1+2{m}_{34}^{(2)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_2+2N\left({\left({m}_{34}^{(1)}\right)}^2+{\left({m}_{34}^{(2)}\right)}^2\right)}{\left\Vert \mathbf{m}\right\Vert_2^2} \). Setting \( \partial E/\partial {m}_{34}^{(1)}=\partial E/\partial {m}_{34}^{(2)}=0 \) gives the expected values of \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) implied by Eqs. (7) and (8), namely \( {m}_{34}^{(1)}=-{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_1/2N \) and \( {m}_{34}^{(2)}=-{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_2/2N \); substituting them into Eq. (12), we have

$$ \mathbf{Mm}-\lambda \mathbf{m}=\mathbf{0} $$
(13)

where M = R − B, \( \lambda =\frac{{\mathbf{m}}^{\mathrm{T}}\mathbf{Mm}}{{\mathbf{m}}^{\mathrm{T}}\mathbf{m}} \), \( \mathbf{B}={\mathbf{B}}_1+{\mathbf{B}}_2 \), \( {\mathbf{B}}_1={\mathbf{b}}_1{\mathbf{b}}_1^{\mathrm{T}}/2N \), and \( {\mathbf{B}}_2={\mathbf{b}}_2{\mathbf{b}}_2^{\mathrm{T}}/2N \). Since the left-camera vectors \( {\mathbf{x}}_{4i-3},{\mathbf{x}}_{4i-2} \) are zero in their last 11 components and the right-camera vectors \( {\mathbf{x}}_{4i-1},{\mathbf{x}}_{4i} \) are zero in their first 11 components, M is block diagonal, \( \mathbf{M}=\left[\begin{array}{cc}{\mathbf{M}}_1 & \mathbf{0}\\ {}\mathbf{0} & {\mathbf{M}}_2\end{array}\right] \), with 11 × 11 blocks M 1 and M 2. Thus λ is an eigenvalue of M, and m is its corresponding eigenvector [1].
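
Before turning to the neural solution, Eq. (13) can be verified directly with a standard symmetric eigensolver; the following sketch assumes a hypothetical array xs stacking the combined vectors of all N feature points:

```python
import numpy as np

# xs: (4N, 22) array stacking x_{4i-3} .. x_{4i} for all N feature points
N = len(xs) // 4
left = np.arange(len(xs)) % 4 < 2         # rows j = 4i-3, 4i-2 (left camera)

R  = xs.T @ xs                            # R = R1 + R2
b1 = xs[left].sum(axis=0)                 # b1: sum over left-camera rows
b2 = xs[~left].sum(axis=0)                # b2: sum over right-camera rows
B  = (np.outer(b1, b1) + np.outer(b2, b2)) / (2 * N)
M  = R - B                                # block-diagonal matrix of Eq. (13)

w, V = np.linalg.eigh(M)                  # eigenvalues in ascending order
m_min = V[:, 0]                           # eigenvector of the smallest eigenvalue
```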

At the same time, let the eigenvalues of M 1 and M 2 be λ 1, λ 2, ⋯, λ 11 and μ 1, μ 2, ⋯, μ 11, respectively. Then there are orthogonal matrices P and Q such that \( {\mathbf{M}}_1=\mathbf{P}{\boldsymbol{\Lambda}}_1{\mathbf{P}}^{-1} \) and \( {\mathbf{M}}_2=\mathbf{Q}{\boldsymbol{\Lambda}}_2{\mathbf{Q}}^{-1} \), where \( {\boldsymbol{\Lambda}}_1= diag\left({\lambda}_1,{\lambda}_2,\cdots, {\lambda}_{11}\right) \) and \( {\boldsymbol{\Lambda}}_2= diag\left({\mu}_1,{\mu}_2,\cdots, {\mu}_{11}\right) \). Thus

$$ \mathbf{M}=\left[\begin{array}{cc}\hfill \mathbf{P}{\boldsymbol{\Lambda}}_1{\mathbf{P}}^{-1}\hfill & \hfill \mathbf{0}\hfill \\ {}\hfill \mathbf{0}\hfill & \hfill \mathbf{Q}{\boldsymbol{\Lambda}}_2{\mathbf{Q}}^{-1}\hfill \end{array}\right]=\left[\begin{array}{cc}\hfill \mathbf{P}\hfill & \hfill \mathbf{0}\hfill \\ {}\hfill \mathbf{0}\hfill & \hfill \mathbf{Q}\hfill \end{array}\right]\boldsymbol{\Lambda} {\left[\begin{array}{cc}\hfill \mathbf{P}\hfill & \hfill \mathbf{0}\hfill \\ {}\hfill \mathbf{0}\hfill & \hfill \mathbf{Q}\hfill \end{array}\right]}^{-1} $$
(14)

The eigenvalues of M are sorted in descending order; depending on how they interleave, for example \( \boldsymbol{\Lambda} = diag\left({\mu}_1,{\lambda}_1,\cdots, {\lambda}_{10},{\lambda}_{11},{\mu}_{11}\right) \), or \( \boldsymbol{\Lambda} = diag\left({\lambda}_1,{\mu}_1,\cdots, {\mu}_{11},{\lambda}_{10},{\lambda}_{11}\right) \), and so on. Thus M can be written as \( \mathbf{M}=\mathbf{U}\boldsymbol{\Lambda} {\mathbf{U}}^{-1} \), where U consists of the 22 orthogonal column vectors \( {\left[{\mathbf{m}}_1,\mathbf{0}\right]}^{\mathrm{T}} \), \( {\left[\mathbf{0},{\mathbf{m}}_2\right]}^{\mathrm{T}} \), \( {\left[{\mathbf{m}}_3,\mathbf{0}\right]}^{\mathrm{T}} \), …, \( {\left[\mathbf{0},{\mathbf{m}}_j\right]}^{\mathrm{T}} \), …, \( {\left[{\mathbf{m}}_{22},\mathbf{0}\right]}^{\mathrm{T}} \), the eigenvectors of M corresponding to the respective eigenvalues. The projection matrices of the binocular vision system are obtained from the normalized eigenvectors of M corresponding to the minimal eigenvalues.

Since the projection matrices of the left and right cameras of the binocular vision system differ, the left camera's projection matrix is obtained from the eigenvector \( \mathbf{m}={\left[{\mathbf{m}}_{22},\mathbf{0}\right]}^{\mathrm{T}} \) of the autocorrelation matrix corresponding to the minimal eigenvalue λ 11, and the right camera's from the eigenvector \( \mathbf{m}={\left[\mathbf{0},{\mathbf{m}}_{21}\right]}^{\mathrm{T}} \) corresponding to the minimal eigenvalue μ 11, where 0 is a 1 × 11 zero vector. Moreover, \( {m}_{34}^{(1)}=-{\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-3}^{4i-2}{\mathbf{m}}_{22}^{\mathrm{T}}{\mathbf{x}}_j}}/2N \) and \( {m}_{34}^{(2)}=-{\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-1}^{4i}{\mathbf{m}}_{21}^{\mathrm{T}}{\mathbf{x}}_j}}/2N \). Thus the fitted parameters of the left camera are \( {\mathbf{m}}_L={\left[{\mathbf{m}}_{22},{m}_{34}^{(1)}\right]}^{\mathrm{T}} \), and those of the right camera are \( {\mathbf{m}}_R={\left[{\mathbf{m}}_{21},{m}_{34}^{(2)}\right]}^{\mathrm{T}} \).
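
A minimal sketch of this assembly step, reusing the xs, left and N placeholders from the previous snippet (the helper itself is hypothetical):

```python
import numpy as np

def assemble_projection(m_cam, xs_cam, N):
    """m_cam: the 11 fitted coefficients [m11..m14, m21..m24, m31..m33]
    of one camera; xs_cam: the (2N, 11) block of that camera's combined
    vectors. Returns the full 3 x 4 projection matrix."""
    m34 = -(xs_cam @ m_cam).sum() / (2 * N)   # m34 = -m^T b / (2N)
    return np.append(m_cam, m34).reshape(3, 4)

# m22, m21: the 11-dimensional eigenvectors for lambda_11 and mu_11
M_left  = assemble_projection(m22, xs[left][:, :11], N)
M_right = assemble_projection(m21, xs[~left][:, 11:], N)
```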

4 Design of the self-adaptive orthogonal learning neural network

An orthogonal learning neural network with lateral connections, proposed by KUNG [10], was adopted in the experiments. Its structure is shown in Fig. 2; the input data are the row vectors of the autocorrelation matrix M. There are 22 neurons in the output layer. The 1st neuron connects to the input neurons with the weight vector \( {\mathbf{m}}_1={\left[{m}_1^{(1)},{m}_1^{(2)},\cdots, {m}_1^{(22)}\right]}^{\mathrm{T}} \) and has no lateral connections during training. The j th neuron connects both to the input neurons, with \( {\mathbf{m}}_j={\left[{m}_j^{(1)},{m}_j^{(2)},\cdots, {m}_j^{(22)}\right]}^{\mathrm{T}} \), and to the front (j − 1) outputs through the lateral weight vector \( {\mathbf{W}}_j={\left[{w}_j^{(1)},{w}_j^{(2)},\cdots, {w}_j^{(j-1)}\right]}^{\mathrm{T}} \). The network is trained neuron by neuron: while the j th neuron is being trained, the 1st through (j − 1)th neurons have already been trained, i.e. the stable values m 1, m 2, ⋯, m j − 1, which are mutually orthogonal, have been obtained. When the training of the j th neuron is complete, its lateral connection weights approach 0 and m j is perpendicular to each of m 1, m 2, ⋯, m j − 1 [13].

Fig. 2 Structure of an orthogonal learning network

The output of the 1st neuron is

$$ {O}_1={\mathbf{m}}_1^{\mathrm{T}}{\mathbf{M}}_i $$
(15)

Since the 1st neuron has no lateral connection, its learning algorithm is as follows,

$$ \varDelta {\mathbf{m}}_1=\beta \left({O}_1{\mathbf{M}}_i-\frac{O_1^2}{{\mathbf{m}}_1^{\mathrm{T}}{\mathbf{m}}_1}{\mathbf{m}}_1\right) $$
(16)

The 1st term of Eq. (16) is the Hebbian learning term, which represents a self-strengthening function. When the network reaches the stable state, i.e. Δ m 1 → 0, we have \( {O}_1{\mathbf{M}}_i-\frac{O_1^2}{{\mathbf{m}}_1^{\mathrm{T}}{\mathbf{m}}_1}{\mathbf{m}}_1=\mathbf{0} \). According to Eqs. (15) and (16), \( \mathbf{M}{\mathbf{m}}_1-{\lambda}_1{\mathbf{m}}_1=\mathbf{0} \) with \( {\lambda}_1=\frac{{\mathbf{m}}_1^{\mathrm{T}}\mathbf{M}{\mathbf{m}}_1}{{\mathbf{m}}_1^{\mathrm{T}}{\mathbf{m}}_1} \), so m 1 is the eigenvector of the autocorrelation matrix M corresponding to the maximal eigenvalue λ 1.
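
A minimal sketch of this training rule, under the assumption that the inputs are presented as the rows M_i of the autocorrelation matrix; the learning rate and sweep count are illustrative:

```python
import numpy as np

def train_first_neuron(M_rows, beta=0.01, n_sweeps=200):
    """Hebbian training of the 1st neuron, Eqs. (15)-(16); m1 converges
    to the principal eigenvector of the input autocorrelation."""
    m1 = np.random.randn(M_rows.shape[1])
    m1 /= np.linalg.norm(m1)
    for _ in range(n_sweeps):
        for Mi in M_rows:                                        # one sweep
            O1 = m1 @ Mi                                         # Eq. (15)
            m1 = m1 + beta * (O1*Mi - O1**2 * m1 / (m1 @ m1))    # Eq. (16)
        m1 /= np.linalg.norm(m1)             # keep m1 a unit vector
    return m1
```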

The training for the j th neuron is similar to the above method, i.e.

$$ {\mathbf{O}}_j={\mathbf{V}}_j{\mathbf{M}}_i $$
(17)
$$ {O}_j={\mathbf{m}}_j^T{\mathbf{M}}_i+{\mathbf{W}}_j^{\mathrm{T}}{\mathbf{O}}_j $$
(18)

where \( {\mathbf{O}}_j={\left[{O}_1,{O}_2,\cdots, {O}_{j-1}\right]}^{\mathrm{T}} \) is the output vector of the front (j − 1) neurons; \( {\mathbf{V}}_j={\left[{\mathbf{m}}_1,{\mathbf{m}}_2,\cdots, {\mathbf{m}}_{j-1}\right]}^{\mathrm{T}} \) is their weight matrix; and \( {\mathbf{W}}_j={\left[{w}_j^{(1)},{w}_j^{(2)},\cdots, {w}_j^{(j-1)}\right]}^{\mathrm{T}} \) is the vector of lateral connection weights of the j th neuron.

After normalization, the learning rules for the j th neuron are as follows [8, 18],

$$ \varDelta {\mathbf{m}}_j=\beta \left({O}_j{\mathbf{M}}_i-\frac{O_j^2}{{\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j}{\mathbf{m}}_j\right) $$
(19)
$$ \varDelta {\mathbf{W}}_j=-\gamma \left({O}_j{\mathbf{O}}_j+{O}_j^2{\mathbf{W}}_j\right) $$
(20)

where β and γ are positive learning-rate parameters, whose values are set according to the corresponding autocorrelation matrix so that training is fast without oscillation. The 1st term of Eq. (19) is the Hebbian learning term, which represents a self-strengthening function; the 2nd terms in Eqs. (19) and (20) stabilize the system; and the 1st term in Eq. (20) is the anti-Hebbian learning term, which acts as an inhibition and decorrelates the network outputs even when the input signals are correlated. That is, the weight W j "subtracts" the first (j − 1) components from the j th neuron: the first principal component m 1 of M, the second principal component m 2, …, and the (j − 1)th principal component m j − 1 are removed. Thus the vector m j becomes orthogonal to all the previous components m 1, m 2, ⋯, m j − 1 when the training of the j th neuron is complete. Hence the orthogonal learning rule constitutes an anti-Hebbian rule.
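
The corresponding sketch for the j th neuron, with the same illustrative assumptions as above:

```python
import numpy as np

def train_jth_neuron(M_rows, V_prev, beta=0.01, gamma=0.01, n_sweeps=200):
    """Training of the j th neuron, Eqs. (17)-(20). V_prev stacks the
    already-trained m_1 .. m_{j-1} as rows; the anti-Hebbian update
    drives the lateral weights W_j to zero as m_j becomes orthogonal
    to the previous components."""
    mj = np.random.randn(M_rows.shape[1])
    mj /= np.linalg.norm(mj)
    Wj = np.zeros(len(V_prev))
    for _ in range(n_sweeps):
        for Mi in M_rows:
            O_prev = V_prev @ Mi                               # Eq. (17)
            Oj = mj @ Mi + Wj @ O_prev                         # Eq. (18)
            mj = mj + beta * (Oj*Mi - Oj**2 * mj / (mj @ mj))  # Eq. (19)
            Wj = Wj - gamma * (Oj*O_prev + Oj**2 * Wj)         # Eq. (20)
        mj /= np.linalg.norm(mj)
    return mj, Wj
```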

The iteration rules for \( {\mathbf{m}}_j\left(t+1\right) \) and \( {\mathbf{W}}_j\left(t+1\right) \) are \( {\mathbf{m}}_j\left(t+1\right)={\mathbf{m}}_j(t)+\varDelta {\mathbf{m}}_j(t) \) and \( {\mathbf{W}}_j\left(t+1\right)={\mathbf{W}}_j(t)+\varDelta {\mathbf{W}}_j(t) \), respectively. Assume that β and γ are sufficiently small that \( {\mathbf{m}}_j\left(t+1\right) \) and \( {\mathbf{W}}_j\left(t+1\right) \) remain approximately constant while the variables in an equation are averaged over one sweep of the training data (one sweep means one round of training over all the given sample input patterns). To facilitate the proof, we further assume β = γ. Then, according to Eqs. (19) and (20), the weight iteration over one sweep of the orthogonal learning network can be rewritten in state-transition form as follows,

$$ \left[\begin{array}{c}\hfill {\mathbf{m}}_j\left(t+1\right)\hfill \\ {}\hfill {\mathbf{W}}_j\left(t+1\right)\hfill \end{array}\right]=\left[\begin{array}{cc}\hfill {\mathbf{M}}_{11}\hfill & \hfill {\mathbf{M}}_{12}\hfill \\ {}\hfill {\mathbf{M}}_{21}\hfill & \hfill {\mathbf{M}}_{22}\hfill \end{array}\right]\left[\begin{array}{l}{\mathbf{m}}_j(t)\\ {}{\mathbf{W}}_j(t)\end{array}\right] $$
(21)

where \( {\mathbf{M}}_{11}={\mathbf{E}}_{22}+\gamma \left(\left({\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}}\right)-\frac{\sigma (t)}{{\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j}{\mathbf{E}}_{22}\right) \), \( {\mathbf{M}}_{12}=\gamma \left({\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}}\right){\mathbf{V}}_j^{\mathrm{T}} \), \( {\mathbf{M}}_{21}=-\gamma {\mathbf{V}}_j\left({\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}}\right) \), \( {\mathbf{M}}_{22}={\mathbf{E}}_{j-1}-\gamma \left({\mathbf{V}}_j\left({\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}}\right){\mathbf{V}}_j^{\mathrm{T}}+\sigma (t){\mathbf{E}}_{j-1}\right) \), and \( \sigma (t)=E\left\{{O}_j^2(t)\right\} \), where \( {\mathbf{E}}_{22} \) is the 22 × 22 identity matrix and \( {\mathbf{E}}_{j-1} \) the (j − 1) × (j − 1) identity matrix.

In the experiment, \( {\mathbf{m}}_j \) is first initialized at random and normalized to a unit vector, and after every iteration the tuned \( {\mathbf{m}}_j \) is normalized again, so that \( {\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j=1 \) throughout the iteration process. On the other hand, left-multiplying \( {\mathbf{m}}_j\left(t+1\right) \) in Eq. (21) by \( {\mathbf{V}}_j \) and adding \( {\mathbf{W}}_j\left(t+1\right) \) gives

$$ {\mathbf{V}}_j{\mathbf{m}}_j\left(t+1\right)+{\mathbf{W}}_j\left(t+1\right)=\left(1-\gamma \sigma (t)\right)\left({\mathbf{V}}_j{\mathbf{m}}_j(t)+{\mathbf{W}}_j(t)\right) $$
(22)

Since 0 < 1 − γσ(t) < 1, as t → ∞ we have

$$ {\mathbf{V}}_j{\mathbf{m}}_j\left(t+1\right)+{\mathbf{W}}_j\left(t+1\right)\to \mathbf{0} $$
(23)

At the same time, according to Eq. (19), when the system reaches the steady state, i.e. Δ m j (t + 1) → 0, we have \( \left(\left({\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^T}\right)-{\lambda}_j{\mathbf{E}}_{22}\right){\mathbf{m}}_j+\left({\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^T}\right){\mathbf{V}}_j^T{\mathbf{W}}_j\to \mathbf{0} \). Since \( \left(\left({\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^T}\right)-{\lambda}_j{\mathbf{E}}_{22}\right){\mathbf{m}}_j\to \mathbf{0} \), it follows that \( \left({\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^T}\right){\mathbf{V}}_j^T{\mathbf{W}}_j\to \mathbf{0} \), and hence W j → 0 (with probability 1). From Eq. (23) we then know V j m j (t + 1) → 0, i.e. m j (t + 1) becomes orthogonal to the row vectors of V j (i.e. m 1, m 2, …, m j − 1) when the number of iterations is sufficiently large.

Assuming the learning rates β and γ decrease to zero at a proper rate (for example, β = Δt), Eq. (19) can be written in continuous-time form, that is,

$$ \frac{d{\mathbf{m}}_j}{dt}={O}_j{\mathbf{M}}_i-\frac{O_j^2}{{\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j}{\mathbf{m}}_j $$
(24)

When the system reaches a stable equilibrium, i.e. \( d{\mathbf{m}}_j/dt\to \mathbf{0} \), the lateral connections approach zero, so \( {O}_j={\mathbf{m}}_j^{\mathrm{T}}{\mathbf{M}}_i \). According to Eq. (24), the asymptotically stable solution is obtained as follows,

$$ \mathbf{M}{\mathbf{m}}_j=\frac{{\mathbf{m}}_j^{\mathrm{T}}\mathbf{M}{\mathbf{m}}_j}{{\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j}{\mathbf{m}}_j $$
(25)

where \( \mathbf{M}={\displaystyle \sum_{i=1}^{22}{\mathbf{M}}_i^{\mathrm{T}}{\mathbf{M}}_i} \) is the autocorrelation matrix, which is symmetric.

The flow of the solving program is shown in Fig. 3, where ε 1 and ε 2 are the termination conditions of the program iteration.

Fig. 3 Flow chart of the program

When the network reaches the stable state, m 1, m 2, ⋯, m 22 converge to the eigenvectors of the autocorrelation matrix M, with \( \underset{n\to \infty }{ \lim }{\lambda}_j=\left({\mathbf{m}}_j^{\mathrm{T}}\mathbf{M}{\mathbf{m}}_j\right)/\left({\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j\right) \), which is equivalent to the Lagrange multiplier in Eq. (13). Thus the eigenvectors of M corresponding to the minimal eigenvalues λ 11 and μ 11, i.e. \( \left[\begin{array}{cc}{\mathbf{m}}_1 & \mathbf{0}\end{array}\right] \) and \( \left[\begin{array}{cc}\mathbf{0} & {\mathbf{m}}_2\end{array}\right] \), are obtained, whose elements are taken as the fitted coefficients of the projection matrices of the cameras in the binocular vision system.

5 Results of system calibration and 3D reconstruction

5.1 Binocular vision system calibration experiment

The experimental setup, shown in Fig. 4, is a high-precision robot consisting of a servomechanism, motion controllers, a mechanical body, a binocular vision system and so on. In the vision system, the two cameras are mounted at the ends of the manipulator and move together with the end-effector (eye-in-hand), so the transformation between the end-effector and the cameras remains constant while the manipulator moves. To calibrate the vision system, the 3D coordinates of the feature points are first measured with a three-dimensional coordinate measuring machine [16, 22]. To obtain variation of the feature-point coordinates along the Z axis, images of the target block are sampled with the cameras at different positions by moving the manipulator vertically. The corresponding 2D coordinates are then estimated with sub-pixel accuracy using an improved Canny edge-detector algorithm [2, 9].

Fig. 4 Binocular vision system and manipulators

In the program, the forward and lateral connection weights are initialized at random, and we set ε 1 = 0.05 and ε 2 = 0.005. After the 22nd neuron is trained, the eigenvectors of the autocorrelation matrix of the input signals corresponding to the minimal eigenvalues are obtained, namely \( {\mathbf{v}}_{22}={\left[\begin{array}{cc}{\mathbf{m}}^{(1)} & \mathbf{0}\end{array}\right]}^{\mathrm{T}} \) and \( {\mathbf{v}}_{21}={\left[\begin{array}{cc}\mathbf{0} & {\mathbf{m}}^{(2)}\end{array}\right]}^{\mathrm{T}} \). The parameters \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) then follow, i.e. \( {m}_{34}^{(1)}=-{\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-3}^{4i-2}{\mathbf{m}}_{22}^{\mathrm{T}}{\mathbf{x}}_j}}/2N \) and \( {m}_{34}^{(2)}=-{\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=4i-1}^{4i}{\mathbf{m}}_{21}^{\mathrm{T}}{\mathbf{x}}_j}}/2N \), where m 22 and m 21 correspond to the eigenvalues λ 11 and μ 11 respectively. The projection matrices of the left and right cameras of the binocular vision system are then assembled from these eigenvectors together with \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \); the results are shown in Table 1.

Table 1 Projection matrices of the binocular vision system obtained with the proposed approach

The elements in Table 1 constitute the projection matrices of the left and right cameras, which fully determine the transformations between the image frames and the world frame of the binocular vision system.
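
For reference, a driver for the whole training procedure might look as follows, reusing the hypothetical train_first_neuron and train_jth_neuron sketches from Section 4; the mapping of the last two components to λ 11 and μ 11 follows Section 3 and depends on how the eigenvalues interleave:

```python
import numpy as np

# M_rows: the 22 row vectors of the autocorrelation matrix M (Section 4)
ms = [train_first_neuron(M_rows)]
for j in range(2, 23):                   # train neurons 2 .. 22 in turn
    mj, _ = train_jth_neuron(M_rows, np.asarray(ms))
    ms.append(mj)

# The last two components correspond to the two smallest eigenvalues
# (lambda_11 and mu_11); their nonzero halves give m22 and m21.
v22, v21 = ms[21], ms[20]
```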

5.2 3D reconstruction by the self-adaptive orthogonal learning network

In the binocular vision system, 3D reconstruction can be carried out with the same self-adaptive orthogonal learning network with lateral inhibition. Let the homogeneous coordinates of a feature point in the world frame be (X, Y, Z, 1), and let its projections onto the left and right camera image planes be \( \left({u}_1,{v}_1,1\right) \) and \( \left({u}_2,{v}_2,1\right) \). According to the camera model, four equations follow from Eqs. (3)–(6):

$$ \left({u}_1{m}_{31}^{(1)}-{m}_{11}^{(1)}\right)X+\left({u}_1{m}_{32}^{(1)}-{m}_{12}^{(1)}\right)Y+\left({u}_1{m}_{33}^{(1)}-{m}_{13}^{(1)}\right)Z={m}_{14}^{(1)}-{u}_1{m}_{34}^{(1)} $$
(26)
$$ \left({v}_1{m}_{31}^{(1)}-{m}_{21}^{(1)}\right)X+\left({v}_1{m}_{32}^{(1)}-{m}_{22}^{(1)}\right)Y+\left({v}_1{m}_{33}^{(1)}-{m}_{23}^{(1)}\right)Z={m}_{24}^{(1)}-{v}_1{m}_{34}^{(1)} $$
(27)
$$ \left({u}_2{m}_{31}^{(2)}-{m}_{11}^{(2)}\right)X+\left({u}_2{m}_{32}^{(2)}-{m}_{12}^{(2)}\right)Y+\left({u}_2{m}_{33}^{(2)}-{m}_{13}^{(2)}\right)Z={m}_{14}^{(2)}-{u}_2{m}_{34}^{(2)} $$
(28)
$$ \left({v}_2{m}_{31}^{(2)}-{m}_{21}^{(2)}\right)X+\left({v}_2{m}_{32}^{(2)}-{m}_{22}^{(2)}\right)Y+\left({v}_2{m}_{33}^{(2)}-{m}_{23}^{(2)}\right)Z={m}_{24}^{(2)}-{v}_2{m}_{34}^{(2)} $$
(29)

According to analytic geometry, Eqs. (26) and (27) define the line \( O_1P_1 \), and Eqs. (28) and (29) the line \( O_2P_2 \), as shown in Fig. 1. To obtain the 3D information, the coordinates of P can be computed from the intersection of \( O_1P_1 \) and \( O_2P_2 \) [20]. In the solving algorithm, from Eqs. (26)–(29), we let \( \mathbf{N}=\left[\begin{array}{ccc}\frac{{u}_1{m}_{31}^{(1)}-{m}_{11}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}} & \frac{u_1{m}_{32}^{(1)}-{m}_{12}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}} & \frac{u_1{m}_{33}^{(1)}-{m}_{13}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}}\\ {}\frac{v_1{m}_{31}^{(1)}-{m}_{21}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}} & \frac{v_1{m}_{32}^{(1)}-{m}_{22}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}} & \frac{v_1{m}_{33}^{(1)}-{m}_{23}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}}\\ {}\frac{u_2{m}_{31}^{(2)}-{m}_{11}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}} & \frac{u_2{m}_{32}^{(2)}-{m}_{12}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}} & \frac{u_2{m}_{33}^{(2)}-{m}_{13}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}}\\ {}\frac{v_2{m}_{31}^{(2)}-{m}_{21}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}} & \frac{v_2{m}_{32}^{(2)}-{m}_{22}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}} & \frac{v_2{m}_{33}^{(2)}-{m}_{23}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}}\end{array}\right] \), \( \mathbf{d}={\left[\begin{array}{ccc}X & Y & Z\end{array}\right]}^{\mathrm{T}} \), and \( \mathbf{c}={\left[\begin{array}{cccc}-1 & -1 & -1 & -1\end{array}\right]}^{\mathrm{T}} \), so we have

$$ \mathbf{N}\mathbf{d}+\mathbf{c}=\mathbf{0} $$
(30)
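
A sketch of this construction in numpy; M_left and M_right are the 3 × 4 projection matrices from calibration, u1, v1, u2, v2 are the measured pixel coordinates of the point, and all names are placeholders:

```python
import numpy as np

def line_constraints(P, u, v):
    """Two rows of N for one camera: Eqs. (26)-(27) (or (28)-(29)) for
    projection matrix P (3 x 4) and pixel (u, v), each row divided by
    its right-hand side so the system reads N d + c = 0 with c = -1."""
    rows = []
    for px, r in ((u, 0), (v, 1)):
        lhs = px * P[2, :3] - P[r, :3]
        rhs = P[r, 3] - px * P[2, 3]
        rows.append(lhs / rhs)
    return np.asarray(rows)

N_mat = np.vstack([line_constraints(M_left, u1, v1),
                   line_constraints(M_right, u2, v2)])   # the 4 x 3 matrix N
c = -np.ones(4)                                          # Eq. (30)
```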

Following the same approach as above, let the objective function be as follows,

$$ \underset{\mathbf{d}}{ \min }L\left(\mathbf{d}\right)={\displaystyle \sum_{i=1}^4{r}_i^2} $$
(31)

where \( {r}_i=\frac{\left|{\mathbf{N}}_i\mathbf{d}+{c}_i\right|}{{\left\Vert \mathbf{d}\right\Vert}_2} \) and \( {\mathbf{N}}_i \) is the i th row vector of N. In the algorithm, let \( {c}_i=c\ \left(i=1,2,3,4\right) \), where c is a constant; its particular value has no influence on the iterative solution of the 3D reconstruction. Indeed, a fitting coefficient vector d that minimizes the sum of the squared distances from all the vector points \( {\mathbf{N}}_i \) to the fitting hyperplane in Eq. (31) with the constant equal to 1 also minimizes Eq. (31) for an arbitrary constant c, which only changes the objective by a constant factor. Thus Eq. (31) can be rewritten as follows,

$$ L=\frac{{\displaystyle \sum_{i=1}^4{\mathbf{d}}^{\mathrm{T}}{\mathbf{N}}_i^{\mathrm{T}}{\mathbf{N}}_i\mathbf{d}}+2c{\displaystyle \sum_{i=1}^4{\mathbf{N}}_i\mathbf{d}}+4{c}^2}{{\left\Vert \mathbf{d}\right\Vert}_2^2}=\frac{{\mathbf{d}}^{\mathrm{T}}\mathbf{s}\mathbf{d}+2c\mathbf{t}\mathbf{d}+4{c}^2}{{\left\Vert \mathbf{d}\right\Vert}_2^2} $$
(32)

where \( \mathbf{s}={\displaystyle \sum_{i=1}^4{\mathbf{N}}_i^{\mathrm{T}}{\mathbf{N}}_i} \) and \( \mathbf{t}={\displaystyle \sum_{i=1}^4{\mathbf{N}}_i} \).

To minimize L, the critical points of Eq. (32) are obtained by setting \( dL/d\mathbf{d}=\mathbf{0} \), that is

$$ \mathbf{s}\mathbf{d}+c{\mathbf{t}}^{\mathrm{T}}-\frac{{\mathbf{d}}^{\mathrm{T}}\mathbf{s}\mathbf{d}+2c\mathbf{t}\mathbf{d}+4{c}^2}{{\left\Vert \mathbf{d}\right\Vert}_2^2}\mathbf{d}=0 $$
(33)

According to Eq. (30), we know the expected value of \( c=-{\mathbf{N}}_i\mathbf{d}=-\mathbf{t}\mathbf{d}/4 \), thus

$$ \mathbf{T}\mathbf{d}-\lambda \mathbf{d}=0 $$
(34)

where \( \mathbf{T}=\mathbf{s}-{\mathbf{t}}^{\mathrm{T}}\mathbf{t}/4 \), and \( \lambda =\frac{{\mathbf{d}}^{\mathrm{T}}\mathbf{T}\mathbf{d}}{{\left\Vert \mathbf{d}\right\Vert}_2^2} \).

Thus d is an eigenvector of T with eigenvalue λ, and the 3D coordinates of the feature point in the world frame are obtained from the eigenvector of T corresponding to the minimal eigenvalue.

The 3D reconstruction experiment is carried out on the high-precision robot shown in Fig. 4. First, the manipulator is moved in the vertical direction through the servo controller, and images of the target block are sampled by the two cameras of the stereo vision system. Then 6 points are chosen at random at several positions for precision analysis, and the 2D coordinates of the feature points projected onto the left and right camera image planes are obtained, as shown in Table 2.

Table 2 Coordinates of the feature points projected onto the left and right image planes (in pixels)

In the 3D reconstruction program, the adaptive orthogonal learning network is designed as in Fig. 2 and the program flow follows Fig. 3, except that there are three input neurons and three output neurons. The input signals are the row vectors of T. After the 3rd neuron is trained, the system reaches an equilibrium state; the scale is obtained according to Eq. (35), the world coordinates of the feature point are recovered from the weight vector connecting the input neurons according to Eq. (36), and the 3D reconstruction is achieved.

$$ s=\mathbf{t}\mathbf{d}/4 $$
(35)
$$ \mathbf{D}=\raisebox{1ex}{${\mathbf{d}}^{(3)}$}\!\left/ \!\raisebox{-1ex}{$s$}\right. $$
(36)

where \( {\mathbf{d}}^{(3)} \) is the eigenvector of T corresponding to the minimal eigenvalue, and D is the solved 3D coordinate vector of the feature point in the world frame.
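
Putting Eqs. (32)–(36) together, the reconstruction of one point can be sketched as follows, continuing from the N_mat placeholder above:

```python
import numpy as np

s = N_mat.T @ N_mat                      # Eq. (32): s = sum N_i^T N_i
t = N_mat.sum(axis=0)                    # t = sum N_i
T = s - np.outer(t, t) / 4               # Eq. (34)

w, V = np.linalg.eigh(T)                 # eigenvalues in ascending order
d = V[:, 0]                              # eigenvector of the minimal eigenvalue
scale = t @ d / 4                        # Eq. (35)
D = d / scale                            # Eq. (36): reconstructed (X, Y, Z)
```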

For comparison, when the least squares method (LSM) is adopted, the projection matrices of the left and right cameras are obtained as shown in Table 3.

Table 3 Projection matrices obtained with LSM

In the precision analysis experiment, the actual coordinates (AC) of the feature points in the world frame are measured and shown in the 1st row of Table 4. The 3D coordinates reconstructed with the adaptive orthogonal learning algorithm (abbreviated as CwAOL) are shown in the 2nd row of Table 4, and those obtained when the system is calibrated with the least squares method (abbreviated as CwLSM) are shown in the 3rd row.

Table 4 3D coordinates and precision performance indexes (in mm)

The distance between the actual coordinates in the world frame and the solved coordinates is taken as the precision performance index [3, 7, 14], that is

$$ {d}_i=\sqrt{{\left({X}_i^{(a)}-{X}_i^{(s)}\right)}^2+{\left({Y}_i^{(a)}-{Y}_i^{(s)}\right)}^2+{\left({Z}_i^{(a)}-{Z}_i^{(s)}\right)}^2} $$
(37)

where \( \left({X}_i^{(a)},{Y}_i^{(a)},{Z}_i^{(a)}\right) \) are the actual coordinates in the world frame, and \( \left({X}_i^{(s)},{Y}_i^{(s)},{Z}_i^{(s)}\right) \) are the 3D coordinates solved with the proposed technique or with another data-processing method such as LSM.
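
For completeness, a one-line numpy version of this index; the array names are placeholders:

```python
import numpy as np

def precision_index(actual, solved):
    """Per-point Euclidean error d_i of Eq. (37); actual and solved
    are (n, 3) arrays of world coordinates."""
    return np.linalg.norm(np.asarray(actual) - np.asarray(solved), axis=1)
```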

The precision performance indexes of the two algorithms, computed according to Eq. (37), are shown in the 4th and 5th rows of Table 4. PIwAOL denotes the precision index of the proposed technique, i.e. the self-adaptive orthogonal learning network, and PIwLSM denotes the precision index of the least squares method. Table 4 demonstrates that the proposed approach has higher precision and can meet the precision requirements of engineering practice. From the above calculation we also found an interesting result: with lateral inhibition, the orthogonal learning neural network achieves fast self-organization under relatively loose structural constraints using a simple anti-Hebbian rule or a slight modification of it. The proposed technique is therefore helpful for precision manufacturing and measurement, such as the precision machining of micro-drills, gears and other workpieces.

6 Conclusions and future works

Compared with other techniques [4–6, 11, 12, 15, 17, 19, 21], the approach proposed in this paper has the following key features: 1) The fitting projection matrices of the left and right cameras in the binocular vision system are obtained from the eigenvectors of an autocorrelation matrix corresponding to the minimal eigenvalues, which minimize the sum of the squared distances from the combined vector coordinates of the feature points to the fitting hyperplane; 2) A self-adaptive orthogonal learning neural network is designed to obtain these eigenvectors, in which, when the j th neuron is trained, its weight vector is perpendicular to the (j − 1) vectors already obtained; 3) 3D reconstruction is carried out with the proposed technique, which offers advantages such as easy programming and high precision. This study thus provides a new and applicable data-processing technique for calibrating binocular vision systems and for 3D reconstruction. In future work we will study how to set the learning rates of the orthogonal learning neural network so as to make training as fast as possible.