Abstract
A new approach to binocular vision system calibration and 3D re-construction is proposed. For calibration, the objective function is the sum of squared distances to a fitted hyperplane from the vectors formed by combining the coordinates of feature points in the world frame with those in the image frames. An orthogonal learning neural network is designed, in which a self-adaptive minor-component extraction method is adopted. When the network reaches equilibrium, the projective matrices of the two cameras are obtained from the eigenvectors of the autocorrelation matrix corresponding to the minimum eigenvalues, and the calibration of the binocular vision system is thus achieved. For 3D re-construction, an autocorrelation matrix is built from the feature-point coordinates in the image planes and the calibration data, and another orthogonal learning network is designed. After this network is trained, the eigenvector of the autocorrelation matrix corresponding to the minimum eigenvalue is obtained, from which the 3D coordinates are recovered. The proposed approach is a novel application of minor component analysis and orthogonal learning networks to binocular vision system calibration and 3D re-construction.
1 Introduction
Camera calibration is the process of determining the internal camera geometric and optical characteristics (intrinsic parameters) and/or the 3D position and orientation of the camera frame relative to a certain world frame (extrinsic parameters) [6, 12]. Applications of vision systems include 3D sensing and measurement, precision manufacturing, automated assembly, monitoring and tracking, etc. Once a binocular vision system is calibrated, the scene's 3D geometric information can be reconstructed from two digital images taken from different angles. There are many research reports on camera calibration. For example, MA obtained the intrinsic parameters of the camera by designing two sets of three pure orthogonal translation movements, and the orientations of the camera with respect to the hand frame with a set of pairwise orthogonal translation movements [11]. ZHANG obtained the intrinsic and extrinsic parameters via the homography matrix in the light of the orthogonality of the rotation matrix, with the homography computed from the 3D feature coordinates on a target block and their 2D image coordinates; in general, however, the calculated rotation matrix did not satisfy the orthogonality properties well [19]. Rahman and Krouglicof proposed a quaternion representation of spatial orientation, which resulted in a system of equations that was minimally redundant and free of singularities, and applied a technique to minimize the error between the reconstructed image points and their experimentally determined counterparts in the "distortion free" space, so the technique facilitated the incorporation of an exact lens distortion model rather than relying on an approximate one [15].
Recently, we proposed a method for camera calibration with an adaptive principal-component extraction network, in which the sum of squared distances from the combined coordinate vectors of the feature points to a fitted hyperplane is taken as the objective function, and the eigenvector of the autocorrelation matrix corresponding to the minimal eigenvalue is taken as the projective matrix; however, the calibration of a binocular vision system and 3D re-construction were not addressed [5]. Chen presented a novel method to analyze the blur distribution in an image and find the optimal focusing distance, so that additional constraints could be used to generate absolute measurements of the models [4]. ZHENG proposed a minimum calibration condition consisting of two vanishing points and a vanishing line to estimate the camera intrinsic parameters (including the principal point coordinates) and rotation angles, adopting least-squares optimization instead of closed-form computation; the method is practical and suitable for many traffic scenes in which a roadside camera must be calibrated [21]. YIN presented a semi-automatic scene calibration method that combined tracked blobs with user-selected line features to recover the homographies between camera views, so the system could map a network of cameras with overlapping fields of view into a single ground-plane view, even when the overlap was not substantial [17]. To our knowledge, no similar work has addressed binocular vision system calibration and 3D re-construction by means of minor component analysis and an adaptive orthogonal learning network. Therefore, based on our previous work on neuro-calibration techniques, in this study we put forward a novel method in which a self-adaptive orthogonal learning network is used to calibrate the binocular vision system and perform 3D measurement.
2 Model of the binocular vision system
In the binocular vision system shown in Fig. 1, the camera frames are \(C_1\) and \(C_2\); \(o_1u_1v_1\) and \(o_2u_2v_2\) are the image coordinate systems measured in pixels; and \(O_wX_wY_wZ_w\) is the world frame measured in mm. The homogeneous coordinates of a feature point \(P\) in the world frame are \(\left(X_{wi}, Y_{wi}, Z_{wi}, 1\right)\); it is projected into the image planes to give \(p_1\) and \(p_2\), whose homogeneous coordinates are \(\left(u_{1i}, v_{1i}, 1\right)\) and \(\left(u_{2i}, v_{2i}, 1\right)\) respectively. The projective matrices of the left and right cameras are \(\mathbf{M}_1\) and \(\mathbf{M}_2\) respectively, and the transformation relations between \(o_1u_1v_1\) or \(o_2u_2v_2\) and \(O_wX_wY_wZ_w\) can be described as follows:
where \( {m}_{11}^{(1)}, \cdots, {m}_{34}^{(1)}, {m}_{11}^{(2)}, \cdots, {m}_{34}^{(2)} \) are the elements of the projection matrices of the left and right cameras.
If \( {Z}_{ci}^{(1)} \) and \( {Z}_{ci}^{(2)} \) in Eqs. (1) and (2) are cancelled respectively, then we can obtain
At the same time, Eqs. (3), (4), (5) and (6) can be divided by \(-u_{1i}\), \(-v_{1i}\), \(-u_{2i}\) and \(-v_{2i}\) respectively, which does not change the transformation relation between the two sides of Eqs. (3)–(6). When the binocular vision system is calibrated, if the coordinates of \(n\) feature points in the world frame and in the image frames are obtained, a linear equation can be formed according to Eqs. (3) and (4), that is
where \( {\mathbf{A}}_1=\left[\begin{array}{cccccccccccc} -{X}_{w1}/{u}_{11} & -{Y}_{w1}/{u}_{11} & -{Z}_{w1}/{u}_{11} & -1/{u}_{11} & 0 & 0 & 0 & 0 & {X}_{w1} & {Y}_{w1} & {Z}_{w1} & 1 \\ 0 & 0 & 0 & 0 & -{X}_{w1}/{v}_{11} & -{Y}_{w1}/{v}_{11} & -{Z}_{w1}/{v}_{11} & -1/{v}_{11} & {X}_{w1} & {Y}_{w1} & {Z}_{w1} & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ -{X}_{wn}/{u}_{1n} & -{Y}_{wn}/{u}_{1n} & -{Z}_{wn}/{u}_{1n} & -1/{u}_{1n} & 0 & 0 & 0 & 0 & {X}_{wn} & {Y}_{wn} & {Z}_{wn} & 1 \\ 0 & 0 & 0 & 0 & -{X}_{wn}/{v}_{1n} & -{Y}_{wn}/{v}_{1n} & -{Z}_{wn}/{v}_{1n} & -1/{v}_{1n} & {X}_{wn} & {Y}_{wn} & {Z}_{wn} & 1 \end{array}\right] \), which is a 2n × 12 matrix; \( {\mathbf{n}}_1 \) is a column vector consisting of \( {m}_{11}^{(1)}, \cdots, {m}_{14}^{(1)}, {m}_{21}^{(1)}, \cdots, {m}_{24}^{(1)}, {m}_{31}^{(1)}, \cdots, {m}_{34}^{(1)} \); and \( {\mathbf{0}}^{(1)} \) is a zero vector of dimension 2n.
As for the right camera, a similar equation to Eq. (7) can be obtained from Eqs. (5) and (6), that is
According to Eqs. (7) and (8), we can get an overdetermined equation, that is
where \( \mathbf{A}=\left[\begin{array}{cc} {\mathbf{A}}_1 & {\mathbf{0}}_1 \\ {\mathbf{0}}_2 & {\mathbf{A}}_2 \end{array}\right] \), which is a 4n × 24 matrix; \( \mathbf{n}={\left[{\mathbf{n}}_1, {\mathbf{n}}_2\right]}^{\mathrm{T}} \), a column vector consisting of the elements of \( {\mathbf{n}}_1 \) and \( {\mathbf{n}}_2 \); and \( {\mathbf{0}}_1 \) and \( {\mathbf{0}}_2 \) are 2n × 12 zero matrices.
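For concreteness, the block structure of Eq. (9) can be sketched as follows (a minimal illustration in Python with NumPy; the function name is ours, not from the paper):

```python
import numpy as np

def stack_block_system(A1, A2):
    """Assemble the 4n x 24 coefficient matrix A of Eq. (9) from the
    2n x 12 single-camera matrices A1 and A2 (an illustrative sketch)."""
    Z = np.zeros_like(A1)                # 2n x 12 zero block
    return np.block([[A1, Z], [Z, A2]])
```

With this layout, the left-camera rows constrain only \( {\mathbf{n}}_1 \) and the right-camera rows only \( {\mathbf{n}}_2 \), which is what makes the autocorrelation matrix of Section 3 block diagonal.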
3 Minor component analysis and solving algorithm
When the binocular vision system is calibrated, the 22 elements of the fitting projective matrices except \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) in Eq. (9) are taken as the coefficients of a hyperplane, which constitute the fitting vector \( \mathbf{m}={\left[{m}_{11}^{(1)}, \cdots, {m}_{33}^{(1)}, {m}_{11}^{(2)}, \cdots, {m}_{33}^{(2)}\right]}^{\mathrm{T}} \). The coordinates of the sampled points in the world frame and in the image frames are transformed into vector points \( {\mathbf{x}}_i \), and the adopted algorithm minimizes the sum of the squared distances between all the vector points (i.e. the combined coordinates) \( {\mathbf{x}}_i \) and the fitting hyperplane; thus the objective function is
where \( {e}_j={\mathbf{m}}^{\mathrm{T}}{\mathbf{x}}_j/{\left\Vert \mathbf{m}\right\Vert}_2 \).
Let \( {\mathbf{x}}_{4i-3}={\left[-{X}_{wi}/{u}_{1i},\ -{Y}_{wi}/{u}_{1i},\ -{Z}_{wi}/{u}_{1i},\ -1/{u}_{1i},\ {\mathbf{0}}^{(1)},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi},\ \mathbf{0}\right]}^{\mathrm{T}} \), \( {\mathbf{x}}_{4i-2}={\left[{\mathbf{0}}^{(1)},\ -{X}_{wi}/{v}_{1i},\ -{Y}_{wi}/{v}_{1i},\ -{Z}_{wi}/{v}_{1i},\ -1/{v}_{1i},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi},\ \mathbf{0}\right]}^{\mathrm{T}} \), \( {\mathbf{x}}_{4i-1}={\left[\mathbf{0},\ -{X}_{wi}/{u}_{2i},\ -{Y}_{wi}/{u}_{2i},\ -{Z}_{wi}/{u}_{2i},\ -1/{u}_{2i},\ {\mathbf{0}}^{(1)},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi}\right]}^{\mathrm{T}} \), \( {\mathbf{x}}_{4i}={\left[\mathbf{0},\ {\mathbf{0}}^{(1)},\ -{X}_{wi}/{v}_{2i},\ -{Y}_{wi}/{v}_{2i},\ -{Z}_{wi}/{v}_{2i},\ -1/{v}_{2i},\ {X}_{wi},\ {Y}_{wi},\ {Z}_{wi}\right]}^{\mathrm{T}} \), where \( \mathbf{0} \) is a 1 × 11 zero row vector and \( {\mathbf{0}}^{(1)} \) is a 1 × 4 zero row vector. Thus
where \( \mathbf{R}={\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}}+{\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}} \), which can be written as \( {\mathbf{R}}_1+{\mathbf{R}}_2 \); \( {\mathbf{R}}_1 \) and \( {\mathbf{R}}_2 \) are 22 × 22 real symmetric matrices, i.e., \( {\mathbf{R}}_1={\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}} \) and \( {\mathbf{R}}_2={\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{x}}_j{\mathbf{x}}_j^{\mathrm{T}} \); \( {\mathbf{b}}_1={\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{x}}_j \); and \( {\mathbf{b}}_2={\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{x}}_j \).
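As a concrete illustration of how the combined coordinate vectors and the autocorrelation matrix are assembled, the following sketch (Python with NumPy; all names are ours, not from the paper) builds \( \mathbf{R}={\mathbf{R}}_1+{\mathbf{R}}_2 \) and the sums \( {\mathbf{b}}_1 \), \( {\mathbf{b}}_2 \) from N feature correspondences:

```python
import numpy as np

def build_autocorrelation(world_pts, img1, img2):
    """Assemble R = R1 + R2 and the sums b1, b2 of Section 3 from N
    feature correspondences (a sketch; argument names are illustrative).

    world_pts: (N, 3) world coordinates (Xw, Yw, Zw) of the feature points
    img1, img2: (N, 2) pixel coordinates (u, v) in the left/right images
    """
    R1 = np.zeros((22, 22)); R2 = np.zeros((22, 22))
    b1 = np.zeros(22); b2 = np.zeros(22)
    for (X, Y, Z), (u1, v1), (u2, v2) in zip(world_pts, img1, img2):
        # x_{4i-3}, x_{4i-2}: left-camera rows (Eqs. (3)-(4) divided by -u1, -v1)
        xa = np.zeros(22); xb = np.zeros(22)
        xa[0:4] = [-X / u1, -Y / u1, -Z / u1, -1.0 / u1]; xa[8:11] = [X, Y, Z]
        xb[4:8] = [-X / v1, -Y / v1, -Z / v1, -1.0 / v1]; xb[8:11] = [X, Y, Z]
        # x_{4i-1}, x_{4i}: right-camera rows (Eqs. (5)-(6) divided by -u2, -v2)
        xc = np.zeros(22); xd = np.zeros(22)
        xc[11:15] = [-X / u2, -Y / u2, -Z / u2, -1.0 / u2]; xc[19:22] = [X, Y, Z]
        xd[15:19] = [-X / v2, -Y / v2, -Z / v2, -1.0 / v2]; xd[19:22] = [X, Y, Z]
        for x in (xa, xb):
            R1 += np.outer(x, x); b1 += x
        for x in (xc, xd):
            R2 += np.outer(x, x); b2 += x
    return R1 + R2, b1, b2
```

The left-camera vectors occupy the first 11 components and the right-camera vectors the last 11, so R comes out block diagonal.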
In order to obtain the minimum, the critical point can be found from \( \mathrm{d}E/\mathrm{d}\mathbf{m}=\mathbf{0} \), that is
Let \( \lambda =\frac{{\mathbf{m}}^{\mathrm{T}}\mathbf{R}\mathbf{m}+2{m}_{34}^{(1)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_1+2{m}_{34}^{(2)}{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_2+2N{\left({m}_{34}^{(1)}+{m}_{34}^{(2)}\right)}^2}{{\left\Vert \mathbf{m}\right\Vert}_2^2} \). According to the expected values of \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) from Eqs. (7) and (8), and assuming \( {m}_{34}^{(1)}=-{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_1/2N \) and \( {m}_{34}^{(2)}=-{\mathbf{m}}^{\mathrm{T}}{\mathbf{b}}_2/2N \), we have
where \( \mathbf{M}=\mathbf{R}-\mathbf{B} \), \( \lambda =\frac{{\mathbf{m}}^{\mathrm{T}}\mathbf{Mm}}{{\mathbf{m}}^{\mathrm{T}}\mathbf{m}} \), \( \mathbf{B}={\mathbf{B}}_1+{\mathbf{B}}_2 \), \( {\mathbf{B}}_1={\mathbf{b}}_1{\mathbf{b}}_1^{\mathrm{T}}/2N \), \( {\mathbf{B}}_2={\mathbf{b}}_2{\mathbf{b}}_2^{\mathrm{T}}/2N \), and \( \mathbf{M}=\left[\begin{array}{cc} {\mathbf{M}}_1 & \mathbf{0} \\ \mathbf{0} & {\mathbf{M}}_2 \end{array}\right] \). Thus λ is an eigenvalue of \( \mathbf{M} \), and \( \mathbf{m} \) is its corresponding eigenvector [1].
At the same time, assuming the eigenvalues of \( {\mathbf{M}}_1 \) and \( {\mathbf{M}}_2 \) to be \( {\lambda}_1, {\lambda}_2, \cdots, {\lambda}_{11} \) and \( {\mu}_1, {\mu}_2, \cdots, {\mu}_{11} \) respectively, there are orthogonal matrices \( \mathbf{P} \) and \( \mathbf{Q} \) satisfying \( {\mathbf{M}}_1=\mathbf{P}{\boldsymbol{\Lambda}}_1{\mathbf{P}}^{-1} \) and \( {\mathbf{M}}_2=\mathbf{Q}{\boldsymbol{\Lambda}}_2{\mathbf{Q}}^{-1} \), where \( {\boldsymbol{\Lambda}}_1=\mathrm{diag}\left({\lambda}_1, {\lambda}_2, \cdots, {\lambda}_{11}\right) \) and \( {\boldsymbol{\Lambda}}_2=\mathrm{diag}\left({\mu}_1, {\mu}_2, \cdots, {\mu}_{11}\right) \). Thus
The eigenvalues of \( \mathbf{M} \) are sorted in descending order, for example \( \boldsymbol{\Lambda} =\mathrm{diag}\left({\mu}_1, {\lambda}_1, \cdots, {\lambda}_{10}, {\lambda}_{11}, {\mu}_{11}\right) \) or \( \boldsymbol{\Lambda} =\mathrm{diag}\left({\lambda}_1, {\mu}_1, \cdots, {\mu}_{11}, {\lambda}_{10}, {\lambda}_{11}\right) \), and so on. Thus \( \mathbf{M} \) can be written as \( \mathbf{M}=\mathbf{B}\boldsymbol{\Lambda} {\mathbf{B}}^{-1} \), where \( \mathbf{B} \) consists of 22 column vectors of the form \( {\left[{\mathbf{m}}_1, \mathbf{0}\right]}^{\mathrm{T}} \), \( {\left[\mathbf{0}, {\mathbf{m}}_2\right]}^{\mathrm{T}} \), \( {\left[{\mathbf{m}}_3, \mathbf{0}\right]}^{\mathrm{T}} \), …, \( {\left[\mathbf{0}, {\mathbf{m}}_j\right]}^{\mathrm{T}} \), …, \( {\left[{\mathbf{m}}_{22}, \mathbf{0}\right]}^{\mathrm{T}} \), which are the orthogonal eigenvectors of \( \mathbf{M} \) corresponding to these eigenvalues; the projective matrices of the binocular vision system are obtained from the normalized eigenvectors of \( \mathbf{M} \) corresponding to the minimum eigenvalues.
Since the projective matrices of the left and right cameras in the binocular vision system are different, if the left camera's projective matrix is obtained from the eigenvector \( \mathbf{m}={\left[{\mathbf{m}}_{22}, \mathbf{0}\right]}^{\mathrm{T}} \) of the autocorrelation matrix corresponding to the minimal eigenvalue \( {\lambda}_{11} \), then the right camera's projective matrix can be obtained from the eigenvector \( \mathbf{m}={\left[\mathbf{0}, {\mathbf{m}}_{21}\right]}^{\mathrm{T}} \) corresponding to the minimal eigenvalue \( {\mu}_{11} \), where \( \mathbf{0} \) is a 1 × 11 row vector. And \( {m}_{34}^{(1)}=-{\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{m}}_{22}^{\mathrm{T}}{\mathbf{x}}_j/2N \) and \( {m}_{34}^{(2)}=-{\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{m}}_{21}^{\mathrm{T}}{\mathbf{x}}_j/2N \). Thus the projective matrix of the left camera is \( {\mathbf{m}}_L={\left[{\mathbf{m}}_{22}, {m}_{34}^{(1)}\right]}^{\mathrm{T}} \), and that of the right camera is \( {\mathbf{m}}_R={\left[{\mathbf{m}}_{21}, {m}_{34}^{(2)}\right]}^{\mathrm{T}} \).
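The network of Section 4 extracts these minor components adaptively; as an offline cross-check, the same projective matrices can be recovered by a standard eigendecomposition of the diagonal blocks \( {\mathbf{M}}_1 \) and \( {\mathbf{M}}_2 \). A sketch under the assumptions above (names are illustrative; R, b1, b2 are assembled from the combined coordinate vectors as in Section 3):

```python
import numpy as np

def minor_component_projections(R, b1, b2, N):
    """Closed-form counterpart of the network's output (illustrative sketch):
    form M = R - b1 b1^T/(2N) - b2 b2^T/(2N), take the eigenvectors of its
    two 11x11 diagonal blocks for the smallest eigenvalues, and append the
    recovered m34 elements to obtain the 3x4 projective matrices."""
    M = R - np.outer(b1, b1) / (2 * N) - np.outer(b2, b2) / (2 * N)
    M1, M2 = M[:11, :11], M[11:, 11:]       # left/right camera blocks
    _, V1 = np.linalg.eigh(M1)              # eigh sorts eigenvalues ascending
    _, V2 = np.linalg.eigh(M2)
    m22, m21 = V1[:, 0], V2[:, 0]           # minor components
    m34_1 = -m22 @ b1[:11] / (2 * N)        # m34 = -m^T b / 2N, as above
    m34_2 = -m21 @ b2[11:] / (2 * N)
    mL = np.append(m22, m34_1)              # 12 elements, row-major 3x4
    mR = np.append(m21, m34_2)
    return mL.reshape(3, 4), mR.reshape(3, 4)
```

For noise-free correspondences the minimal eigenvalue of each block is zero and the recovered matrices equal the true ones up to a common scale, which is all a projective matrix is defined up to.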
4 Design of the self-adaptive orthogonal learning neural network
An orthogonal learning neural network with lateral connections proposed by KUNG [10] was adopted in the experiments. The structure is shown in Fig. 2, and its input data are the row vectors of the autocorrelation matrix \( \mathbf{M} \). There are 22 neurons in the output layer. The 1st neuron connects to the input neurons with weights \( {\mathbf{m}}_1={\left[{m}_1^{(1)}, {m}_1^{(2)}, \cdots, {m}_1^{(22)}\right]}^{\mathrm{T}} \), and has no lateral connection while being trained. The j-th neuron connects to both the input neurons, with \( {\mathbf{m}}_j={\left[{m}_j^{(1)}, {m}_j^{(2)}, \cdots, {m}_j^{(22)}\right]}^{\mathrm{T}} \), and the previous (j−1) outputs, through the lateral weight vector \( {\mathbf{W}}_j={\left[{w}_j^{(1)}, {w}_j^{(2)}, \ldots, {w}_j^{(j-1)}\right]}^{\mathrm{T}} \). The network is trained step by step: while the j-th neuron is being trained, the 1st, 2nd, …, and (j−1)-th neurons have already been trained, i.e. the stable values \( {\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{j-1} \), all mutually orthogonal, have been obtained. When the j-th neuron is completely trained, its lateral connection weights approximate 0, and \( {\mathbf{m}}_j \) is perpendicular to each of \( {\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{j-1} \) [13].
The output of the 1st neuron is
Since there is no lateral connection, the learning algorithm for the 1st neuron is as follows,
The 1st term of Eq. (16) is the Hebbian learning rule, which represents a self-strengthening function. When the network reaches the stable state, i.e. \( \Delta {\mathbf{m}}_1\to \mathbf{0} \), we have \( {O}_1{\mathbf{M}}_i-\frac{O_1^2}{{\mathbf{m}}_1^{\mathrm{T}}{\mathbf{m}}_1}{\mathbf{m}}_1=\mathbf{0} \). According to Eqs. (15) and (16), \( \mathbf{M}{\mathbf{m}}_1-{\lambda}_1{\mathbf{m}}_1=\mathbf{0} \) and \( {\lambda}_1=\frac{{\mathbf{m}}_1^{\mathrm{T}}\mathbf{M}{\mathbf{m}}_1}{{\mathbf{m}}_1^{\mathrm{T}}{\mathbf{m}}_1} \), where \( {\mathbf{m}}_1 \) is the eigenvector of the autocorrelation matrix \( \mathbf{M} \) corresponding to the maximum eigenvalue \( {\lambda}_1 \).
The training of the j-th neuron is similar to the above, i.e.
where \( {\mathbf{O}}_j={\left[{O}_1, {O}_2, \cdots, {O}_{j-1}\right]}^{\mathrm{T}} \) is the output of the previous (j−1) neurons; \( {\mathbf{V}}_j={\left[{\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{j-1}\right]}^{\mathrm{T}} \) is a weight matrix; and \( {\mathbf{W}}_j={\left[{w}_j^{(1)}, {w}_j^{(2)}, \cdots, {w}_j^{(j-1)}\right]}^{\mathrm{T}} \) is the vector of lateral connection weights for the j-th neuron.
After normalization, the learning rule for the j-th neuron is as follows [8, 18],
where β and γ are positive parameters that determine the learning rates; their values are set according to the corresponding autocorrelation matrix so that training is fast and free of oscillations. The 1st term of Eq. (19) is the Hebbian learning rule, which represents a self-strengthening function; the 2nd terms in Eqs. (19) and (20) play a stabilizing role for the system; and the 1st term in Eq. (20) is the anti-Hebbian learning rule, which causes an inhibition function and makes the outputs of the network uncorrelated even if the input signals are correlated. That is, the weight \( {\mathbf{W}}_j \) plays the role of "subtracting" the first (j−1) components from the j-th neuron, i.e. the first principal component \( {\mathbf{m}}_1 \) of \( \mathbf{M} \), the second principal component \( {\mathbf{m}}_2 \) of \( \mathbf{M} \), …, and the (j−1)-th principal component \( {\mathbf{m}}_{j-1} \) of \( \mathbf{M} \) are subtracted. Thus the j-th vector \( {\mathbf{m}}_j \) tends to become orthogonal to all the previous components \( {\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{j-1} \) when the training of the j-th neuron is over. Hence the orthogonal learning rule constitutes an anti-Hebbian rule.
The iteration algorithms for \( {\mathbf{m}}_j\left(t+1\right) \) and \( {\mathbf{W}}_j\left(t+1\right) \) are \( {\mathbf{m}}_j\left(t+1\right)={\mathbf{m}}_j(t)+\Delta {\mathbf{m}}_j(t) \) and \( {\mathbf{W}}_j\left(t+1\right)={\mathbf{W}}_j(t)+\Delta {\mathbf{W}}_j(t) \), respectively. We assume that β and γ are sufficiently small that \( {\mathbf{m}}_j\left(t+1\right) \) and \( {\mathbf{W}}_j\left(t+1\right) \) remain approximately constant while an average of the variables is taken over one sweep of the training data (one sweep means one round of training involving all the given sample input patterns). To facilitate the proof, we assume that β = γ. Therefore, according to Eqs. (19) and (20), the weight iteration in one sweep of the orthogonal learning network can be rewritten in state-transition-matrix form as follows,
where \( {\mathbf{M}}_{11}={\mathbf{E}}_{22}+\gamma \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}-\frac{\sigma (t)}{{\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j}{\mathbf{E}}_{22}\right) \), \( {\mathbf{M}}_{12}=\gamma \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right){\mathbf{V}}_j^{\mathrm{T}} \), \( {\mathbf{M}}_{21}=-\gamma {\mathbf{V}}_j\left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right) \), \( {\mathbf{M}}_{22}={\mathbf{E}}_{j-1}-\gamma \left({\mathbf{V}}_j\left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right){\mathbf{V}}_j^{\mathrm{T}}+\sigma (t){\mathbf{E}}_{j-1}\right) \), and \( \sigma (t)=E\left\{{O}_j^2(t)\right\} \), where \( {\mathbf{E}}_{22} \) is the 22 × 22 identity matrix and \( {\mathbf{E}}_{j-1} \) is the (j−1) × (j−1) identity matrix.
In the experiment, \( {\mathbf{m}}_j \) is first initialized at random and normalized to a unit vector; at every iteration, the updated vector \( {\mathbf{m}}_j \) is normalized to a unit vector again. Thus \( {\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j=1 \), i.e. \( {\mathbf{m}}_j \) remains a unit vector throughout the iteration process. On the other hand, if \( {\mathbf{m}}_j\left(t+1\right) \) in Eq. (21) is left-multiplied by \( {\mathbf{V}}_j \) and \( {\mathbf{W}}_j\left(t+1\right) \) is added, we have
As 1 − γσ(t) < 1, when t → ∞, we have
At the same time, according to Eq. (19), when the system reaches the steady state, i.e. \( \Delta {\mathbf{m}}_j\left(t+1\right)\to \mathbf{0} \), we have \( \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}-{\lambda}_j{\mathbf{E}}_{22}\right){\mathbf{m}}_j+\left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right){\mathbf{V}}_j^{\mathrm{T}}{\mathbf{W}}_j\to \mathbf{0} \). Since \( \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}-{\lambda}_j{\mathbf{E}}_{22}\right){\mathbf{m}}_j\to \mathbf{0} \), we have \( \left({\sum}_{i=1}^{22}{\mathbf{M}}_i{\mathbf{M}}_i^{\mathrm{T}}\right){\mathbf{V}}_j^{\mathrm{T}}{\mathbf{W}}_j\to \mathbf{0} \), and \( {\mathbf{W}}_j\to \mathbf{0} \) (with probability 1). From Eq. (23) we know \( {\mathbf{V}}_j{\mathbf{m}}_j\left(t+1\right)\to \mathbf{0} \), that is, \( {\mathbf{m}}_j\left(t+1\right) \) is orthogonal to the vector elements of \( {\mathbf{V}}_j \) (i.e. \( {\mathbf{m}}_1, {\mathbf{m}}_2, \ldots, {\mathbf{m}}_{j-1} \)) if the number of iterations is sufficiently large.
Assuming the learning rates β and γ decrease to zero at a proper speed (for example, let β = Δt), Eq. (19) can be written in differential form, that is,
When the system reaches a stable equilibrium, i.e. \( \mathrm{d}{\mathbf{m}}_j/\mathrm{d}t\to \mathbf{0} \), the lateral connections approximate zero. Thus \( {O}_j={\mathbf{m}}_j^{\mathrm{T}}{\mathbf{M}}_i \). According to Eq. (24), the asymptotically stable solution is obtained as follows,
where \( \mathbf{M}={\sum}_{i=1}^{22}{\mathbf{M}}_i^{\mathrm{T}}{\mathbf{M}}_i \), which is a symmetric autocorrelation matrix.
The flow of the solving program is shown in Fig. 3, where \( {\varepsilon}_1 \) and \( {\varepsilon}_2 \) are the stopping thresholds for the program iteration.
When the network reaches the stable state, \( {\mathbf{m}}_1, {\mathbf{m}}_2, \cdots, {\mathbf{m}}_{22} \) converge to the eigenvectors of the autocorrelation matrix \( \mathbf{M} \), that is \( \underset{n\to \infty }{ \lim }{\lambda}_j=\left({\mathbf{m}}_j^{\mathrm{T}}\mathbf{M}{\mathbf{m}}_j\right)/\left({\mathbf{m}}_j^{\mathrm{T}}{\mathbf{m}}_j\right) \), which is equivalent to the Lagrange multiplier in Eq. (13). Thus the eigenvectors of \( \mathbf{M} \) corresponding to the minimum eigenvalues \( {\lambda}_{11} \) and \( {\mu}_{11} \), i.e. \( \left[{\mathbf{m}}_1\ \mathbf{0}\right] \) and \( \left[\mathbf{0}\ {\mathbf{m}}_2\right] \), are obtained, whose elements can be taken as the fitting coefficients of the projective matrices of the cameras in the binocular vision system.
5 Results of system calibration and 3D re-construction
5.1 Binocular vision system calibration experiment
As shown in Fig. 4, the experimental platform is a precision robot consisting of a servomechanism, motion controllers, a mechanical body, the binocular vision system and so on. In the vision system, the two cameras are mounted at the end of the manipulator and move together with the end-effector (eye-in-hand), so the transformation relation between the end-effector and the cameras remains constant while the manipulator moves. To calibrate the vision system, the 3D coordinates of the feature points are first measured with a three-dimensional coordinate measuring machine [16, 22]. In order to obtain variations of the feature-point coordinates in the Z-axis direction, images of the target block are sampled with the cameras at different positions by moving the manipulator vertically. Then the corresponding 2D coordinates are estimated with sub-pixel accuracy using an improved Canny edge detector algorithm [2, 9].
In the program, the forward and lateral connection weights are initialized at random, and we let \( {\varepsilon}_1=0.05 \) and \( {\varepsilon}_2=0.005 \). After the 22nd neuron is trained, the eigenvectors of the autocorrelation matrix of the input signals corresponding to the minimal eigenvalues are obtained, namely \( {\mathbf{v}}_{22}={\left[{\mathbf{m}}^{(1)}\ \mathbf{0}\right]}^{\mathrm{T}} \) and \( {\mathbf{v}}_{21}={\left[\mathbf{0}\ {\mathbf{m}}^{(2)}\right]}^{\mathrm{T}} \). The parameters \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \) can then be obtained, i.e. \( {m}_{34}^{(1)}=-{\sum}_{i=1}^N{\sum}_{j=4i-3}^{4i-2}{\mathbf{m}}_{22}^{\mathrm{T}}{\mathbf{x}}_j/2N \) and \( {m}_{34}^{(2)}=-{\sum}_{i=1}^N{\sum}_{j=4i-1}^{4i}{\mathbf{m}}_{21}^{\mathrm{T}}{\mathbf{x}}_j/2N \), where \( {\mathbf{m}}_{22} \) and \( {\mathbf{m}}_{21} \) correspond to the eigenvalues \( {\lambda}_{11} \) and \( {\mu}_{11} \) respectively. The projective matrices of the left and right cameras in the binocular vision system are then obtained from these eigenvectors together with \( {m}_{34}^{(1)} \) and \( {m}_{34}^{(2)} \), as shown in Table 1.
The elements in Table 1 constitute the projective matrices of the left and right cameras, so the transformation relations between the image frames and the world frame in the binocular vision system are estimated directly.
5.2 3D re-construction by the self-adaptive orthogonal learning network
In the binocular vision system, 3D re-construction can be carried out by means of the self-adaptive orthogonal learning network with lateral restraint. Assume the homogeneous coordinates of a feature point in the world frame to be (X, Y, Z, 1), whose projective coordinates in the left and right camera image planes are \( \left({u}_{1i}, {v}_{1i}, 1\right) \) and \( \left({u}_{2i}, {v}_{2i}, 1\right) \); according to the camera's mathematical model, 4 equations can be obtained from Eqs. (3)–(6) as follows:
According to analytic geometry, Eqs. (26) and (27) (or Eqs. (28) and (29)) denote the line \( O_1P_1 \) (or \( O_2P_2 \)) as shown in Fig. 1. In order to obtain the 3D information, the coordinates of \( P \) can be obtained from the intersection point of \( O_1P_1 \) and \( O_2P_2 \) [20]. In the solving algorithm, from Eqs. (26)–(29), we let \( \mathbf{N}=\left[\begin{array}{ccc} \frac{u_1{m}_{31}^{(1)}-{m}_{11}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}} & \frac{u_1{m}_{32}^{(1)}-{m}_{12}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}} & \frac{u_1{m}_{33}^{(1)}-{m}_{13}^{(1)}}{m_{14}^{(1)}-{u}_1{m}_{34}^{(1)}} \\ \frac{v_1{m}_{31}^{(1)}-{m}_{21}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}} & \frac{v_1{m}_{32}^{(1)}-{m}_{22}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}} & \frac{v_1{m}_{33}^{(1)}-{m}_{23}^{(1)}}{m_{24}^{(1)}-{v}_1{m}_{34}^{(1)}} \\ \frac{u_2{m}_{31}^{(2)}-{m}_{11}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}} & \frac{u_2{m}_{32}^{(2)}-{m}_{12}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}} & \frac{u_2{m}_{33}^{(2)}-{m}_{13}^{(2)}}{m_{14}^{(2)}-{u}_2{m}_{34}^{(2)}} \\ \frac{v_2{m}_{31}^{(2)}-{m}_{21}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}} & \frac{v_2{m}_{32}^{(2)}-{m}_{22}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}} & \frac{v_2{m}_{33}^{(2)}-{m}_{23}^{(2)}}{m_{24}^{(2)}-{v}_2{m}_{34}^{(2)}} \end{array}\right] \), \( \mathbf{d}={\left[x, y, z\right]}^{\mathrm{T}} \), and \( \mathbf{c}={\left[-1, -1, -1, -1\right]}^{\mathrm{T}} \), so we have
According to the above methods, let the objective function be as follows,
where \( {r}_i=\frac{\left|{\mathbf{N}}_i\mathbf{d}+{c}_i\right|}{{\left\Vert \mathbf{d}\right\Vert}_2} \), and \( {\mathbf{N}}_i \) is the i-th row vector of \( \mathbf{N} \). In the algorithm we let \( {c}_i=c\ \left(i=1,2,3,4\right) \), where c is a constant whose particular value has no influence on the solving iteration of the 3D re-construction. For example, if a fitting coefficient vector \( \mathbf{d} \) minimizes the sum of the squared distances from all the vector points \( {\mathbf{N}}_i \) to the fitting hyperplane in Eq. (31) with the constant equal to 1, the same vector does so for Eq. (31) with an arbitrary constant c; only the value of Eq. (31) is offset. Thus Eq. (31) can be rewritten as follows,
where \( \mathbf{s}={\sum}_{i=1}^4{\mathbf{N}}_i^{\mathrm{T}}{\mathbf{N}}_i \) and \( \mathbf{t}={\sum}_{i=1}^4{\mathbf{N}}_i \), \( {\mathbf{N}}_i \) being the i-th row vector of \( \mathbf{N} \).
In order to minimize L, the critical points of Eq. (32) can be obtained by letting \( \mathrm{d}L/\mathrm{d}\mathbf{d}=\mathbf{0} \), that is
According to Eq. (30), the expected value of \( c=-{\mathbf{N}}_i\mathbf{d}=-\mathbf{t}\mathbf{d}/4 \), thus
where \( \mathbf{T}=\mathbf{s}-{\mathbf{t}}^{\mathrm{T}}\mathbf{t}/4 \) and \( \lambda =\frac{{\mathbf{d}}^{\mathrm{T}}\mathbf{T}\mathbf{d}}{{\left\Vert \mathbf{d}\right\Vert}_2^2} \).
Thus \( \mathbf{d} \) is the eigenvector of \( \mathbf{T} \) corresponding to the eigenvalue λ, and the 3D coordinates of the feature points in the world frame can be obtained from the eigenvector of \( \mathbf{T} \) corresponding to the minimum eigenvalue.
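The reconstruction step can be sketched as follows, with a direct eigendecomposition of \( \mathbf{T} \) standing in for the trained three-neuron network and the scale recovered from the relation \( c=-\mathbf{t}\mathbf{d}/4 \) (all names are illustrative):

```python
import numpy as np

def reconstruct_point(mL, mR, uv1, uv2):
    """Recover (X, Y, Z) for one stereo correspondence as the minor
    component of T = s - t^T t / 4; a sketch with illustrative names.
    mL, mR: 3x4 projective matrices; uv1, uv2: left/right pixel coords."""
    rows = []
    for m, (u, v) in ((mL, uv1), (mR, uv2)):
        # Eqs. (26)-(29) rearranged so that each row N_i satisfies N_i d = 1
        rows.append((u * m[2, :3] - m[0, :3]) / (m[0, 3] - u * m[2, 3]))
        rows.append((v * m[2, :3] - m[1, :3]) / (m[1, 3] - v * m[2, 3]))
    Nmat = np.array(rows)                 # the 4x3 matrix N of Eq. (30)
    s = Nmat.T @ Nmat                     # s = sum_i N_i^T N_i
    t = Nmat.sum(axis=0)                  # t = sum_i N_i
    T = s - np.outer(t, t) / 4.0
    w, V = np.linalg.eigh(T)              # eigenvalues in ascending order
    d = V[:, 0]                           # eigenvector of the minimum eigenvalue
    return 4.0 * d / (t @ d)              # rescale so that N d = 1 (c = -1)
```

For noise-free correspondences the minimum eigenvalue is zero and the rescaled minor component reproduces the world coordinates of the point exactly.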
For the 3D re-construction, the experiment is carried out on the high-precision robot, as can be seen in Fig. 4. First, the manipulator is moved in the vertical direction through the servosystem controller, and images of the target block are sampled by the two cameras of the stereo vision system. Then 6 points are chosen at random for precision analysis at several positions, and the 2D coordinates of the feature points projected onto the left and right camera image planes are obtained, as shown in Table 2.
In the 3D re-construction program, the adaptive orthogonal learning network is designed as in Fig. 2, and the program flow chart is similar to Fig. 3, but with three input neurons and three output neurons. The input signals are the row vectors of \( \mathbf{T} \). After the 3rd neuron is trained, the system reaches an equilibrium state and the scale is obtained according to Eq. (35), so the world coordinates of the feature points can be obtained from the weight vector connecting the input neurons according to Eq. (36), and the 3D re-construction is achieved.
where \( {\mathbf{d}}^{(3)} \) is the eigenvector of \( \mathbf{T} \) corresponding to the minimal eigenvalue, and \( \mathbf{D} \) is the solved 3D coordinate vector of the feature point in the world frame.
On the other hand, if the least squares method (LSM) is adopted, the projective matrices of the left and right cameras are obtained as shown in Table 3.
For the precision analysis experiment, the actual coordinates (AC) of the feature points in the world frame are measured and shown in the 1st line of Table 4. After the 3D re-construction, the coordinates obtained with the adaptive orthogonal learning algorithm (abbreviated as CwAOL) are shown in the 2nd line of Table 4, and the 3D coordinates obtained with least-squares calibration (abbreviated as CwLSM) are shown in the 3rd line of Table 4.
The difference between the actual coordinates in the world frame and the solved coordinates is taken as the precision performance index [3, 7, 14], that is
where \( \left({X}_i^{(a)}, {Y}_i^{(a)}, {Z}_i^{(a)}\right) \) are the actual coordinates in the world frame, and \( \left({X}_i^{(s)}, {Y}_i^{(s)}, {Z}_i^{(s)}\right) \) are the 3D coordinates solved by the proposed technique or by other data processing methods such as LSM.
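One plausible reading of this index, the per-point Euclidean distance between actual and solved coordinates, can be sketched as (the helper name is ours, not the paper's code):

```python
import numpy as np

def precision_index(actual, solved):
    """Per-point Euclidean distance between actual and reconstructed world
    coordinates; one plausible form of the index in Eq. (37).
    actual, solved: (n, 3) arrays of (X, Y, Z) coordinates."""
    actual = np.asarray(actual, dtype=float)
    solved = np.asarray(solved, dtype=float)
    return np.linalg.norm(actual - solved, axis=1)
```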
The precision performance indexes of the two algorithms are computed according to Eq. (37) and shown in the 4th and 5th lines of Table 4, where PIwAOL denotes the precision index of the proposed technique, i.e. the self-adaptive orthogonal learning network, and PIwLSM denotes the precision index of the least squares method. Table 4 demonstrates that the proposed approach has higher precision and can meet the precision requirements of engineering practice. From the above calculation, we find the results interesting: with patterns of lateral inhibition, the orthogonal learning neural network can achieve fast self-organization under relatively loose structural constraints according to a simple anti-Hebbian rule, or a slight modification of it. The proposed technique is useful for precision manufacture and measurement, such as the precision machining of micro-drills, gears, and other work-pieces.
6 Conclusions and future work
Compared with other techniques [4–6, 11, 12, 15, 17, 19, 21], the approach proposed in this paper has the following key features: 1) the fitting projective matrixes of the left and right cameras in the binocular vision system are obtained from the eigen-vectors of an autocorrelation matrix corresponding to the minimal eigen-values, which minimize the sum of squared distances from the combined vector coordinates of the feature points to the fitting hyperplane; 2) a self-adaptive orthogonal learning neural network is designed to obtain these eigen-vectors, in which, when the j-th neuron is trained, its weight vector is kept perpendicular to the (j−1) vectors already obtained; 3) 3D re-construction is carried out according to the proposed technique, with advantages such as easy programming and high precision. The study thus provides a new and applicable data processing technique for calibrating binocular vision systems and for 3D re-construction. In future work, we will study how to set the learning rate of the orthogonal learning neural network so that training converges as fast as possible.
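The core idea behind features 1) and 2) — that the normal of the best-fit hyperplane is the minor component of the data's autocorrelation matrix — can be checked in batch form with a plain eigendecomposition. This is a simplified analogue of the adaptive network, under the assumption that the minor component is extracted offline; the plane, noise level, and variable names are illustrative.

```python
import numpy as np

# Batch analogue of minor-component extraction: the normal of the
# best-fit hyperplane through centered data is the eigenvector of the
# autocorrelation (scatter) matrix with the smallest eigenvalue.
rng = np.random.default_rng(1)

# Points lying (nearly) on the plane x + 2y - z = 0, plus tiny noise.
basis = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 2.0]])   # spans the plane
coeffs = rng.normal(size=(200, 2))
pts = coeffs @ basis + 1e-4 * rng.normal(size=(200, 3))

centered = pts - pts.mean(axis=0)
R = centered.T @ centered / len(pts)     # autocorrelation matrix
eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
normal = eigvecs[:, 0]                   # minor component = plane normal

expected = np.array([1.0, 2.0, -1.0])
expected /= np.linalg.norm(expected)
if normal @ expected < 0:                # align sign before comparing
    normal = -normal
```

The adaptive network in the paper reaches the same minor component iteratively with an anti-Hebbian update instead of an explicit eigendecomposition, which avoids forming and factoring R when data arrive sequentially.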
References
Božić B, Ristić K, Pejić M (2014) Parameter estimation and accuracy analysis of the free geodetic network adjustment using singular value decomposition. Technical Gazette 21(2):451–456
Brandusa PA, Catalin B (2011) Development of an embedded artificial vision system for an autonomous robot. Int J Innov Comput Inf Control 7(2):745–762
Chatterjee C, Roychowdhury VP, Chong EKP (1997) A nonlinear gauss-seidel algorithm for noncoplanar and coplanar camera calibration with convergence analysis. Comp Vision Image Underst (CVIU) 67(1):58–80
Chen SY, Li YF (2013) Finding optimal focusing distance and edge blur distribution for weakly calibrated 3-D vision. IEEE Trans Ind Inf 9(3):1680–1687
Ge DY, Yao XF (2010) Application of the Sanger operator with lateral connection in camera calibration. J Optoelectron Laser 21(11):1720–1724
Guillemaut JY, Aguado AS, Illingworth J (2005) Using points at infinity for parameter decoupling in camera calibration. IEEE Trans Pattern Anal Mach Intell 27(2):265–370
Kang JH, Kang SJ, Kim S (2015) Line recognition algorithm for 3D polygonal model using a parallel computing platform. Multimed Tools Appl 74(1):259–270
Khan MAU, Khan TM, Khan RB, Kiyani A, Khan MA (2012) Noise characterization in web cameras using independent component analysis. Int J Innov Comput Inf Control 7(2):302–311
Khan SA, Usman M, Riaz N (2015) Face recognition via optimized features fusion. J Intell Fuzzy Syst 28(4):1819–1828
Kung SY, Diamautaras KI (1990) A neural network learning algorithm for adaptive principal components extraction (APEX). Proc Int Conf Acoust Speech Signal Process 2:861–864
Ma SD (1996) A self-calibration technique for active vision system. IEEE Trans Robot Autom 12(1):114–120
Miyagawa I, Arai H, Koike H (2010) Simple camera calibration from a single image using five points on two orthogonal 1-d objects. IEEE Trans Image Process 19(6):1528–1538
Palmieri F, Zhu J, Chang CH (1993) Anti-Hebbian learning in topologically constrained linear networks: a tutorial. IEEE Trans Neural Netw 4(5):748–761
Perez U, Cho SH, Asfour S (2009) Volumetric calibration of stereo camera in visual servo based robot control. Int J Adv Robot Syst 6(1):35–42
Rahman T, Krouglicof N (2012) An efficient camera calibration technique offering robustness and accuracy over a wide range of lens distortion. IEEE Trans Image Process 21(2):626–637
Wang B, Liu W, Jia Z et al (2011) Dimensional measurement of hot, large forgings with stereo vision structured light system. Proc IMechE B J Eng Manuf 225(6):901–908
Yin F, Makris D, Velastin SA, Ellis T (2015) Calibration and object correspondence in camera networks with widely separated overlapping views. IET Comput Vis 9(3):354–367
Yuan YM, Fang YH, Cui FX, Li DC (2011) Research on preprocessing algorithm for differential polarization spectrum of oil spills on water. Acta Opt Sin 31(11):1128001-1–1128001-7
Zhang Z (2000) A flexible new technique for camera calibration. IEEE Trans Pattern Anal Mach Intell 22(11):1330–1334
Zhao ZX, Wen GJ, Zhang X, Li DR (2012) Model-based estimation for pose, velocity of projectile from stereo linear array image. Meas Sci Rev 12(3):104–110
Zheng Y, Peng SL (2014) A practical roadside camera calibration method based on least squares optimization. IEEE Trans Intell Transp Syst 15(2):831–843
Zhou JM, Zhou QX, Shu LL, Xu DD (2011) Research on thermal image enhancement for detecting early mechanical damage in apple. Sens Lett 9(3):1031–1036
Acknowledgments
Foundation item: The work described in this paper is partially supported by the National Natural Science Foundation of China (51175187), the Hunan Provincial Natural Science Foundation of China (09JJ6092), and the Science & Technology Foundation of Guangdong Province under grant Nos. 2013B021300023 and 2013B090600112. The authors also gratefully acknowledge the suggestions of the reviewers, which helped to improve the presentation.
Ge, Dy., Yao, Xf. & Lian, Zt. Binocular vision calibration and 3D re-construction with an orthogonal learning neural network. Multimed Tools Appl 75, 15635–15650 (2016). https://doi.org/10.1007/s11042-015-2845-5