
1 Introduction

At present, multi-camera pose estimation based on visual fiducial tags [1] is applied in many fields. Because a robotic arm's own perception of external signals is limited, most arms on the market carry a variety of sensors; radar, laser, and vision sensors are the most common. Traditional pose measurement relies on a vision sensor to acquire information about the measured object, and its accuracy is mainly determined by the sensor's resolution and the distance to the object [2]. When the resolution is low or the object is far away, both the economy and the accuracy of a vision sensor degrade. In computer vision, image information about an object can instead be obtained by placing cameras at different angles and positions, and its spatial position and attitude can be estimated through multi-view fusion [3]. Compared with a dedicated vision sensor, ordinary cameras are more economical, and multi-view vision is more accurate than monocular or binocular vision [4].

This paper presents a method for estimating the spatial pose of a robotic arm based on trinocular vision [5]. Building on a visual tag detection system, three CMOS cameras are placed at different positions and angles in space to obtain three real-time images of the moving manipulator. The tag information includes the four corner pixel coordinates, the center pixel coordinates, the homography matrix, and the ID of each tag. Using the known pixel coordinates of the four tag corners, their known spatial coordinates, and the calibrated intrinsic parameters, the PnP (perspective-n-point) algorithm [6] yields the extrinsic parameters of each camera and fixes the origin of the world coordinate system. From the tag [7] attached to the robotic arm, the real-time pixel coordinates of the tag are obtained as the arm moves, and the 3D position and attitude of the arm are reconstructed by the trinocular vision measurement system. The trinocular system covers a larger measurement area than a single binocular pair, is more robust, and is therefore better suited to complex practical scenarios [8].

2 Position and Pose Measurement System Based on Multi-view Vision

2.1 Pose Measurement Model Based on Multi-view Vision

The multi-view pose measurement system used in this paper consists of three cameras equipped with Sony Exmor R CMOS sensors, a manipulator, visual tags, and a computer (see Fig. 1).

Fig. 1. Three-dimensional schematic diagram of the position and pose measurement system based on multi-view vision.

Fig. 2. System flow chart.

The system consists of two parts (see Fig. 2): a camera calibration subsystem and a multi-camera pose measurement subsystem. Its functions include calibrating the cameras to obtain their parameters, collecting the data set synchronously, detecting the visual tags, transforming pixel coordinates into spatial coordinates, and computing the optimal fused pose from the multiple cameras.

3 Camera Model and Visual Fiducial System

3.1 Pinhole Camera Model

Fig. 3. Pinhole camera model.

Image processing involves the following coordinate systems. \(O_{W} - X_{W} Y_{W} Z_{W}\) is the world coordinate system, which describes the position of the camera; its unit is m. \(O_{C} - X_{C} Y_{C} Z_{C}\) is the camera coordinate system, with the optical center as its origin; its unit is m. \(o - xy\) is the image coordinate system, whose origin is the principal point at the center of the image; its unit is mm. \(uv\) is the pixel coordinate system, whose origin is the upper-left corner of the image; its unit is the pixel. \(P\) is a point in the world coordinate system, i.e., a point in the real world. \(p\) is the imaging point of \(P\) in the image, with coordinates \(\left( {x,y} \right)\) in the image coordinate system and \(\left( {u,v} \right)\) in the pixel coordinate system. \(f\) is the camera focal length, equal to the distance between \(o\) and \(O_{C}\) [9].

In binocular vision, the origin of the world coordinate system is usually set at the optical center of the left or right camera, or at the midpoint between the two cameras along the x-axis (see Fig. 3).

We next consider the transformation between these coordinate systems, that is, how a real-world object is projected onto the image:

$$ Z_{C} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_{x} & 0 & u_{0} & 0 \\ 0 & f_{y} & v_{0} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{R} & \boldsymbol{T} \\ \boldsymbol{0} & 1 \end{bmatrix} \begin{bmatrix} X_{W} \\ Y_{W} \\ Z_{W} \\ 1 \end{bmatrix} = \boldsymbol{M}_{1} \boldsymbol{M}_{2} \boldsymbol{X} = \boldsymbol{M}\boldsymbol{X} $$
(1)

In the above formula, \(f_{x}\) and \(f_{y}\) are the normalized focal lengths along the \(u\)-axis and \(v\)-axis of the pixel coordinate system, \(\boldsymbol{M}\) is the 3 × 4 projection matrix, \(\boldsymbol{M}_{1}\) is the intrinsic parameter matrix, and \(\boldsymbol{M}_{2}\) is the extrinsic parameter matrix [8].
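As a concrete illustration of Eq. (1), the following minimal NumPy sketch projects a world point into pixel coordinates. All numeric values (focal lengths, principal point, extrinsics, and the test point) are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

# Assumed intrinsics: fx = fy = 800 px, principal point (u0, v0) = (320, 240).
M1 = np.array([[800.0,   0.0, 320.0, 0.0],
               [  0.0, 800.0, 240.0, 0.0],
               [  0.0,   0.0,   1.0, 0.0]])   # 3x4 intrinsic matrix

# Assumed extrinsics: identity rotation, camera 0.5 m from the world origin.
R = np.eye(3)
T = np.array([[0.0], [0.0], [0.5]])
M2 = np.vstack([np.hstack([R, T]), [0.0, 0.0, 0.0, 1.0]])  # 4x4 [R T; 0 1]

M = M1 @ M2                          # 3x4 projection matrix M = M1 M2

X = np.array([0.1, 0.05, 1.0, 1.0])  # homogeneous world point (m)
Zc_uv = M @ X                        # equals Zc * [u, v, 1]^T
u, v = Zc_uv[:2] / Zc_uv[2]          # divide out the depth Zc
print(u, v)                          # pixel coordinates of the projection
```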

If only the pixel coordinates (u, v) of a space point \(p\) are known, the intrinsic parameters can be obtained by camera calibration, but the world coordinates \((X_{W}, Y_{W}, Z_{W})\) still cannot be determined, because \(\boldsymbol{M}\) is not invertible and the origin of the world coordinate system has yet to be fixed. Therefore, this paper constructs multiple linear equations from binocular or multi-view cameras, fixes the world origin and obtains the extrinsic parameters of each camera with PnP, and then estimates the three-dimensional coordinates of the space point \(P\).
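The extrinsic-parameter step can be sketched with OpenCV's solvePnP, using the four tag corners as the known world-image correspondences. The tag side length and all coordinates below are assumed values for illustration; the paper does not list its actual calibration numbers.

```python
import numpy as np
import cv2

TAG_SIZE = 0.05  # assumed tag side length in metres

# World coordinates of the four tag corners (tag plane at Z = 0).
object_pts = np.array([[0.0,      0.0,      0.0],
                       [TAG_SIZE, 0.0,      0.0],
                       [TAG_SIZE, TAG_SIZE, 0.0],
                       [0.0,      TAG_SIZE, 0.0]])

# Matching pixel coordinates reported by the tag detector (illustrative).
image_pts = np.array([[310.0, 250.0],
                      [381.0, 252.0],
                      [379.0, 323.0],
                      [308.0, 321.0]])

K = np.array([[800.0,   0.0, 320.0],     # intrinsics from calibration
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
dist = np.zeros(5)                       # assume distortion already removed

# solvePnP recovers the rotation (as a Rodrigues vector) and translation that
# map world coordinates into the camera frame -- the extrinsics R, T of Eq. (1).
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)               # 3x3 rotation matrix
```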

3.2 AprilTag Visual Fiducial System

The visual tag used in this paper comes from the AprilTag visual fiducial system, which is used in robotics, AR, camera calibration, and other fields. The system detects markers in real time and quickly computes their relative position. It consists of the following main parts: the detector locates edges in the input image by gradient detection and extracts the two-dimensional pixel coordinates of the four corners of each tag; the decoder then establishes the corresponding world coordinate system; and from the one-to-one correspondence between the four corner pixel coordinates and their world coordinates, the homography matrix can be obtained.

The visual fiducial system carries two-dimensional spatial information and is cheaper to decode than an ordinary two-dimensional barcode. As shown in Fig. 4, the visual tags come in multiple families, and the technique relies on the alignment of multiple locating and auxiliary points. The fiducial system can therefore be detected at longer distances, and it remains detectable in dark or otherwise poor imaging conditions, giving it high robustness (see Fig. 4); a minimal detection sketch is given after Fig. 4.

Fig. 4. Some families of AprilTag.
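The paper does not name a specific detector implementation. As one possibility, the third-party pupil-apriltags Python bindings expose exactly the quantities used here (corners, center, ID, and homography); the following is a minimal sketch under that assumption.

```python
import cv2
from pupil_apriltags import Detector  # assumed third-party AprilTag bindings

detector = Detector(families="tag36h11")     # a common AprilTag family

frame = cv2.imread("arm_frame.png")          # hypothetical camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for det in detector.detect(gray):
    print(det.tag_id)      # tag ID
    print(det.center)      # center pixel (u, v)
    print(det.corners)     # 4x2 array of corner pixel coordinates
    print(det.homography)  # 3x3 homography from tag plane to image
```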

4 The Model of the Trinocular Vision Measurement System

Fig. 5. The model of the trinocular vision measurement system.

With the three cameras at different positions, images of the visual tag on the manipulator are acquired at the same time in the same scene (see Fig. 5). Using standard machine-vision techniques, the series of two-dimensional coordinates is transformed into three-dimensional coordinates, yielding the pose of the manipulator [10].

In practice the data are always noisy, so, as shown in Fig. 5, fusing the three views yields three mutually disjoint estimates \(P_{1}, P_{2}, P_{3}\) of the measured point rather than a single intersection. Applying the least-squares method then gives the optimal three-dimensional coordinates [3]. Let the world coordinates of \(P\) be \(\left( {X_{W}, Y_{W}, Z_{W}} \right)\); they should satisfy the optimal objective function

$$ F = \min \sum\limits_{i = 1}^{3} \left\| P - P_{i} \right\|^{2} $$
(2)

The solution process is as follows:

$$ Z_{C} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_{x} & 0 & u_{0} & 0 \\ 0 & f_{y} & v_{0} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{R} & \boldsymbol{T} \\ \boldsymbol{0} & 1 \end{bmatrix} \begin{bmatrix} X_{W} \\ Y_{W} \\ Z_{W} \\ 1 \end{bmatrix} $$
(3)

where \(\boldsymbol{R}\) is the 3 × 3 rotation matrix and \(\boldsymbol{T}\) is the 3 × 1 translation vector. For camera \(i\):

$$ Z_{i} \begin{bmatrix} u_{i} \\ v_{i} \\ 1 \end{bmatrix} = \boldsymbol{M}_{i} \begin{bmatrix} X_{W} \\ Y_{W} \\ Z_{W} \\ 1 \end{bmatrix}, \quad i = 1,2,3 $$
(4)

Expanding \(\boldsymbol{M}_{i}\) into its elements gives:

$$ Z_{i} \begin{bmatrix} u_{i} \\ v_{i} \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11}^{i} & m_{12}^{i} & m_{13}^{i} & m_{14}^{i} \\ m_{21}^{i} & m_{22}^{i} & m_{23}^{i} & m_{24}^{i} \\ m_{31}^{i} & m_{32}^{i} & m_{33}^{i} & m_{34}^{i} \end{bmatrix} \begin{bmatrix} X_{W} \\ Y_{W} \\ Z_{W} \\ 1 \end{bmatrix} $$
(5)

where \(\boldsymbol{M}_{i} = \boldsymbol{A}_{i}\left[ \boldsymbol{R}_{i} \; \boldsymbol{T}_{i} \right]\) and \(\boldsymbol{A}_{i}\) is the intrinsic parameter matrix of camera \(i\). For each binocular pair, the least-squares solution [11, 12] gives the coordinates

$$ \begin{bmatrix} X_{W} \\ Y_{W} \\ Z_{W} \end{bmatrix} = \left( \boldsymbol{C}_{i}^{T} \boldsymbol{C}_{i} \right)^{-1} \boldsymbol{C}_{i}^{T} \boldsymbol{D}_{i}, \quad i = 1,2,3 $$
(6)
$$ \boldsymbol{C}_{1} = \begin{bmatrix} u_{1} m_{31}^{1} - m_{11}^{1} & u_{1} m_{32}^{1} - m_{12}^{1} & u_{1} m_{33}^{1} - m_{13}^{1} \\ v_{1} m_{31}^{1} - m_{21}^{1} & v_{1} m_{32}^{1} - m_{22}^{1} & v_{1} m_{33}^{1} - m_{23}^{1} \\ u_{2} m_{31}^{2} - m_{11}^{2} & u_{2} m_{32}^{2} - m_{12}^{2} & u_{2} m_{33}^{2} - m_{13}^{2} \\ v_{2} m_{31}^{2} - m_{21}^{2} & v_{2} m_{32}^{2} - m_{22}^{2} & v_{2} m_{33}^{2} - m_{23}^{2} \end{bmatrix}, \quad \boldsymbol{D}_{1} = \begin{bmatrix} m_{14}^{1} - u_{1} m_{34}^{1} \\ m_{24}^{1} - v_{1} m_{34}^{1} \\ m_{14}^{2} - u_{2} m_{34}^{2} \\ m_{24}^{2} - v_{2} m_{34}^{2} \end{bmatrix} $$
(7)
$$ \boldsymbol{C}_{2} = \begin{bmatrix} u_{2} m_{31}^{2} - m_{11}^{2} & u_{2} m_{32}^{2} - m_{12}^{2} & u_{2} m_{33}^{2} - m_{13}^{2} \\ v_{2} m_{31}^{2} - m_{21}^{2} & v_{2} m_{32}^{2} - m_{22}^{2} & v_{2} m_{33}^{2} - m_{23}^{2} \\ u_{3} m_{31}^{3} - m_{11}^{3} & u_{3} m_{32}^{3} - m_{12}^{3} & u_{3} m_{33}^{3} - m_{13}^{3} \\ v_{3} m_{31}^{3} - m_{21}^{3} & v_{3} m_{32}^{3} - m_{22}^{3} & v_{3} m_{33}^{3} - m_{23}^{3} \end{bmatrix}, \quad \boldsymbol{D}_{2} = \begin{bmatrix} m_{14}^{2} - u_{2} m_{34}^{2} \\ m_{24}^{2} - v_{2} m_{34}^{2} \\ m_{14}^{3} - u_{3} m_{34}^{3} \\ m_{24}^{3} - v_{3} m_{34}^{3} \end{bmatrix} $$
(8)
$$ \boldsymbol{C}_{3} = \begin{bmatrix} u_{1} m_{31}^{1} - m_{11}^{1} & u_{1} m_{32}^{1} - m_{12}^{1} & u_{1} m_{33}^{1} - m_{13}^{1} \\ v_{1} m_{31}^{1} - m_{21}^{1} & v_{1} m_{32}^{1} - m_{22}^{1} & v_{1} m_{33}^{1} - m_{23}^{1} \\ u_{3} m_{31}^{3} - m_{11}^{3} & u_{3} m_{32}^{3} - m_{12}^{3} & u_{3} m_{33}^{3} - m_{13}^{3} \\ v_{3} m_{31}^{3} - m_{21}^{3} & v_{3} m_{32}^{3} - m_{22}^{3} & v_{3} m_{33}^{3} - m_{23}^{3} \end{bmatrix}, \quad \boldsymbol{D}_{3} = \begin{bmatrix} m_{14}^{1} - u_{1} m_{34}^{1} \\ m_{24}^{1} - v_{1} m_{34}^{1} \\ m_{14}^{3} - u_{3} m_{34}^{3} \\ m_{24}^{3} - v_{3} m_{34}^{3} \end{bmatrix} $$
(9)

Substituting the three pairwise estimates \(P_{1}, P_{2}, P_{3}\) into the objective function gives

$$ F = \min \left( \left\| P - P_{1} \right\|^{2} + \left\| P - P_{2} \right\|^{2} + \left\| P - P_{3} \right\|^{2} \right) = \min \sum\limits_{i = 1}^{3} \left[ \left( X_{W} - X_{Wi} \right)^{2} + \left( Y_{W} - Y_{Wi} \right)^{2} + \left( Z_{W} - Z_{Wi} \right)^{2} \right] $$
(10)

To minimize the objective function, the following per-view terms must be made small simultaneously:

$$ f_{i} = \left( X_{W} - X_{Wi} \right)^{2} + \left( Y_{W} - Y_{Wi} \right)^{2} + \left( Z_{W} - Z_{Wi} \right)^{2}, \quad i = 1,2,3 $$
(11)

The sum of squared distances to a set of points is minimized by their centroid, so setting the partial derivatives of \(F\) to zero yields the center of gravity of the triangle \(P_{1}P_{2}P_{3}\) as the optimal 3D coordinates:

$$ \begin{array}{*{20}c} {X_{W} = \frac{1}{3}\mathop \sum \limits_{i = 1}^{3} X_{Wi} ,Y_{W} = \frac{1}{3}\mathop \sum \limits_{i = 1}^{3} Y_{Wi} ,Z_{W} = \frac{1}{3}\mathop \sum \limits_{i = 1}^{3} Z_{Wi} } \\ \end{array} $$
(12)

In the above formula, \(X_{Wi}\), \(Y_{Wi}\), \(Z_{Wi}\,(i = 1,\,2,\,3)\) are the pose outputs of the three binocular pairs, and \(X_{W}\), \(Y_{W}\), \(Z_{W}\) is the fused pose output of the trinocular view.
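The pairwise triangulation of Eqs. (4)-(6) and the centroid fusion of Eq. (12) can be summarized in a short NumPy sketch; the projection matrices and pixel measurements are assumed inputs from the calibrated cameras.

```python
import numpy as np

def triangulate_pair(M_a, uv_a, M_b, uv_b):
    """Least-squares triangulation from one camera pair, Eqs. (4)-(6).

    M_a, M_b: 3x4 projection matrices; uv_a, uv_b: (u, v) pixel coordinates.
    """
    C, D = [], []
    for M, (u, v) in ((M_a, uv_a), (M_b, uv_b)):
        # From Eq. (5): (u*m31 - m11) Xw + (u*m32 - m12) Yw + (u*m33 - m13) Zw
        #             = m14 - u*m34, and likewise for v.
        C.append(u * M[2, :3] - M[0, :3])
        C.append(v * M[2, :3] - M[1, :3])
        D.append(M[0, 3] - u * M[2, 3])
        D.append(M[1, 3] - v * M[2, 3])
    # Solve C X = D in the least-squares sense, i.e. X = (C^T C)^-1 C^T D.
    return np.linalg.lstsq(np.array(C), np.array(D), rcond=None)[0]

def fuse_trinocular(Ms, uvs):
    """Centroid of the three pairwise estimates, Eq. (12)."""
    pairs = [(0, 1), (1, 2), (0, 2)]
    P = [triangulate_pair(Ms[a], uvs[a], Ms[b], uvs[b]) for a, b in pairs]
    return np.mean(P, axis=0)
```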

5 Experiment

5.1 Camera Calibration

Camera calibration [13] obtains the camera's intrinsic parameters, distortion coefficients, and related quantities. Common methods include linear calibration, nonlinear calibration, camera self-calibration, and Zhang Zhengyou's calibration method [14]. This experiment adopts Zhang Zhengyou's method, which is simple and robust; the calibration procedure is shown in Fig. 6. Twenty calibration-board images at different angles and positions were collected, and the intrinsic parameters were computed from the resulting homography matrices (see Fig. 6).

Fig. 6. Calibration pictures.

The chessboard used in this experiment has 12 × 9 squares. By photographing it at different angles and distances, the intrinsic parameters and distortion coefficients of each camera are obtained with the calibration software. By the principles of camera calibration, the intrinsic parameters are fixed and do not change with the pose of the camera, while the extrinsic parameters do change with it. This experiment uses the PnP method to fix the origin of the world coordinate system and obtain the extrinsic parameters of each camera.
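A minimal OpenCV sketch of the calibration step follows; the square size and image folder are assumptions (the paper does not state them), and a 12 × 9-square board exposes 11 × 8 inner corners to the detector.

```python
import glob
import numpy as np
import cv2

PATTERN = (11, 8)   # inner corners of a 12 x 9-square chessboard
SQUARE = 0.02       # assumed square size in metres

# World coordinates of the inner corners on the planar board (Z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):          # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# Zhang's method: intrinsics and distortion from the per-view homographies.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
```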

5.2 Experimental Environment Layout

This experiment is based on the motion of the manipulator: the visual tags provide auxiliary localization from which the spatial pose of the manipulator is obtained. By plotting the manipulator's trajectory and measuring the error between fixed ground-truth points and the estimated points, we test the accuracy and robustness of the trinocular vision pose measurement system. The physical layout of the experimental system is shown in Fig. 7.

Fig. 7. The manipulator in motion.

The pose of the manipulator is acquired synchronously by the three cameras while we command the manipulator to swing. The three-dimensional trajectory of the manipulator [15] is shown in Fig. 8; the measured values are obtained at different fixed points.

Fig. 8. Robot fixed points.

The multi-view pose measurement system correctly predicts the motion trajectory of the manipulator (see Fig. 8), confirming the trinocular pose estimation. Because the manipulator vibrates slightly during motion, small oscillations appear in the plotted trajectory. Figure 9 (left) shows the motion trajectories of joints 2 and 3 from viewpoint 1, and Fig. 9 (right) shows joint 2 from viewpoints 2 and 3.

Fig. 9. Fixed-point trajectories.

During the motion of the manipulator, measurements are taken at different poses for the different joint points. As shown in Fig. 8 (pose1-pose4), measurements of joint points 1-3 are obtained at each pose. The error analysis of the trinocular and binocular vision pose measurement systems is shown in Table 1:

Table 1. Error analysis of binocular and trinocular pose measurement

Table 1 shows that, for joint 2 at pose1, trinocular measurement reduces the relative error on the x-axis from 7.0% to 2.4%; for joint 3 at pose1, it reduces the relative error on the y-axis from 9.8% to 2.8% and on the z-axis from 6.2% to 2.4%. The trinocular 3D pose estimation system thus effectively improves on binocular 3D pose measurement and achieves better accuracy.

6 Conclusion

In this paper, a manipulator spatial pose estimation system based on multi-view vision is proposed. The spatial pose of the manipulator is obtained in real time through a low-cost visual tag system. The experimental results show that pose estimation based on multi-view vision still succeeds under partial occlusion, which verifies the robustness of the system. The per-axis error analysis shows that the measurement error of the trinocular pose measurement system is less than 4.9 mm, which meets the requirements of practical use.