
1 Introduction

In the near future, industrial robots are projected to replace CNC machines in machining processes due to their flexibility, lower prices, and large working spaces. The accuracy required for robotic machining is around \( \pm 0.20\,\text{mm} \) based on aerospace specifications, but in practice only accuracies of around 1 mm are obtained [1]. The robots' relatively low accuracy therefore hinders their use in high precision applications.

Some works in the literature proposed static calibration or the use of secondary high accuracy encoders installed at each joint to increase the accuracy of industrial robots [2, 3]. However, static calibration methods do not take into account disturbances acting on the robot during the process, and installing secondary encoders is very expensive and not feasible for all robots. Thus, real time path tracking and correction based on visual servoing is a feasible alternative for achieving the desired accuracies in manufacturing processes [4]. Many works utilize highly accurate sensors such as laser trackers or photogrammetry sensors in the feedback loop of visual servoing [5, 6]. However, these sensors are very expensive, sometimes costing more than the industrial robot itself. Hence, relatively cheap alternatives based on monocular camera systems have been proposed in the literature. Nissler et al. [7] proposed the use of AprilTag markers attached to the end effector of a robot and applied optimization techniques to reduce position tracking errors to less than 10 mm. However, they used only planar markers and thus faced rank deficiency problems in pose estimation, and their work was not evaluated during trajectory tracking. Furthermore, two data fusion methods based on a multi sensor optimal information fusion algorithm (MOIFA) and a Kalman filter (KF) were proposed by Liu et al. [8]. These methods were used to fuse orientation data acquired from a digital inclinometer with position data obtained from a photogrammetry system while positioning a KUKA KP 5 Arc robot's end effector at 76 points in a 1 m cube workspace. However, they did not report orientation errors and did not evaluate their approach for trajectory tracking.

In general, these works assume that the dynamics or kinematics of the industrial robot are known in the proposed eye-in-hand approaches. KF type methods additionally assume a linear dynamic process model along with known process and measurement noise. Some works utilized an extended Kalman filter (EKF) [9] or an adaptive Kalman filter (AKF) [10] to overcome these shortcomings in estimating an industrial robot's pose. However, the accurate dynamic process model required by the EKF is hard to obtain, and the proposed AKF based methods do not consider measurement noise and time varying effects due to the robot's trajectories, which degrades their effectiveness. In such cases, data driven modeling techniques that can take into account all kinds of sensor errors, sensor noise, and uncertainties have been found to be more effective [11,12,13,14].

In this work, an eye-to-hand camera based pose estimation system is developed for industrial robots, for which a target object trackable by a monocular camera within ±90° in all directions is designed. The designed camera target (CT) is fitted with fiducial markers whose placement guarantees the detection of at least two non-planar markers in a single frame, thus preventing ambiguities in pose estimation.

Moreover, a data driven modeling method based on sparse regression is proposed for improving the pose estimated by the Levenberg-Marquardt (LM) based algorithm [15], with the ground truth obtained from a laser tracker. Using the proposed method, all the camera based systems in a factory where several industrial robots perform the same task can be trained with a single laser tracker.

The rest of the manuscript is structured as follows: In Sect. 2, a sparse regression based method for improving vision based pose estimation is presented. The effectiveness of the proposed approach is validated by an experimental study in Sect. 3, where the design and detection of the camera target for pose estimation are also described, followed by the conclusion in Sect. 4.

2 Improved Vision Based Pose Estimation Using Sparse Regression

This work proposes to improve the pose estimation accuracy of vision based systems through a data driven approach based on sparse regression. Using this method, existing camera based systems can be made to provide better accuracies when trained against a ground truth pose \( \left( {T_{X} ,T_{Y} ,T_{Z} ,\upalpha,\upbeta,\upgamma} \right) \) such as the one provided by a laser tracker. In order to formulate this problem under a sparse regression framework, the inputs and ground truth of the system need to be determined properly. The ground truth for the pose estimation problem can be obtained from a highly accurate laser tracker system. As for the inputs, the estimated pose \( \left( {\widehat{T}_{X} ,\widehat{T}_{Y} ,\widehat{T}_{Z} ,\widehat{\upalpha},\widehat{\upbeta},\widehat{\upgamma}} \right) \) provided by the vision system can be obtained through standard pose estimation algorithms in the literature such as the Levenberg-Marquardt (LM) based algorithm [15].

As for the proposed method based on sparse regression, this work builds upon the work of Brunton et al., who formulated sparse identification of nonlinear dynamics (SINDy) [16] for discovering governing dynamical equations from data. They leverage the fact that only a few terms are usually required to define the dynamics of a physical system, so the equations become sparse in a high dimensional nonlinear function space. Their work is formulated for dynamic systems, where a large amount of data is collected to determine a function in state space that defines the equations of motion. In their formulation, a time history of the state \( X\left( t \right) \) and its derivative is collected, from which candidate nonlinear functions such as constants, higher order polynomials, and sinusoids are generated. The problem is then formulated as sparse regression and solved with a sequential thresholded least-squares algorithm [16]. This method is a faster and more robust alternative to the least absolute shrinkage and selection operator (LASSO) [17], an \( \ell_{1} \)-regularized regression that promotes sparsity. Using this method, the sparse vectors of coefficients defining the dynamics can be determined, showing which nonlinearities are active in the physical system. This results in parsimonious models that balance accuracy with model complexity to avoid overfitting.

However, in this work the sparse regression problem is formulated for sparse identification of nonlinear statics (SINS). In particular, the relationship between the pose estimated by the vision system and the pose provided by the laser tracker is assumed to be represented by the following static nonlinear model:

$$ Y = \Psi \left( X \right)\Phi $$
(1)

where

$$ X = \left[ {\begin{array}{*{20}c} {x_{1} \left( {t_{1} } \right)} & \cdots & {x_{6} \left( {t_{1} } \right)} \\ \vdots & \ddots & \vdots \\ {x_{1} \left( {t_{m} } \right)} & \cdots & {x_{6} \left( {t_{m} } \right)} \\ \end{array} } \right]\,\text{and}\,Y = \left[ {\begin{array}{*{20}c} {y_{1} \left( {t_{1} } \right)} & \cdots & {y_{6} \left( {t_{1} } \right)} \\ \vdots & \ddots & \vdots \\ {y_{1} \left( {t_{m} } \right)} & \cdots & {y_{6} \left( {t_{m} } \right)} \\ \end{array} } \right] $$
(2)
$$ \Psi \left( X \right) = \left[ {\begin{array}{*{20}c} 1 & X & {X^{{P_{2} }} } \\ \end{array} } \right] $$
(3)
$$ X^{{P_{2} }} = \left[ {\begin{array}{*{20}c} {x_{1}^{2} \left( {t_{1} } \right)} & {x_{1} \left( {t_{1} } \right)x_{2} \left( {t_{1} } \right)} & \cdots & {x_{2}^{2} \left( {t_{1} } \right)} & {x_{2} \left( {t_{1} } \right)x_{3} \left( {t_{1} } \right)} & \cdots & {x_{6}^{2} \left( {t_{1} } \right)} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ {x_{1}^{2} \left( {t_{m} } \right)} & {x_{1} \left( {t_{m} } \right)x_{2} \left( {t_{m} } \right)} & \cdots & {x_{2}^{2} \left( {t_{m} } \right)} & {x_{2} \left( {t_{m} } \right)x_{3} \left( {t_{m} } \right)} & \cdots & {x_{6}^{2} \left( {t_{m} } \right)} \\ \end{array} } \right] $$
(4)

where \( x_{1} \) to \( x_{6} \) are the \( \widehat{T}_{X} ,\widehat{T}_{Y} ,\widehat{T}_{Z} ,\widehat{\upalpha},\widehat{\upbeta} \) and \( \widehat{\upgamma} \) estimated by the LM based pose estimation algorithm, \( y_{1} \) to \( y_{6} \) are the ground truth \( T_{X} ,T_{Y} ,T_{Z} ,\upalpha,\upbeta, \) and \( \upgamma \) measured by the laser tracker, \( \Phi \) contains the sparse vectors of coefficients, \( X^{{P_{2} }} \) denotes the quadratic nonlinearities in the variable \( X \), and \( \Psi \left( X \right) \) is the library consisting of candidate nonlinear functions of the columns of \( X \).

Each column of the augmented library \( \Psi \left( X \right) \) represents a candidate function for defining the relationship between the estimated and the ground truth pose. There is total freedom in choosing these functions, and in this work the augmented library was constructed using polynomials of up to \( 2^{nd} \) order (\( X^{{P_{2} }} \)) including cross terms. With six inputs this yields 1 constant column, 6 linear columns, and 21 quadratic columns (6 squares and 15 cross terms), i.e. 28 candidate functions in total, so the size of the sparse regression problem using \( m \) samples is as follows:

$$ Y_{m \times 6} = \Psi \left( {X_{m \times 6} } \right)_{m \times 28}\Phi_{28 \times 6} $$
(5)
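
As an illustration, the construction of this library can be sketched in a few lines of NumPy as below; this is a minimal sketch, and the function name build_library and the exact column ordering are illustrative choices, not taken from the original implementation.

```python
import numpy as np

def build_library(X):
    """Build the candidate function library Psi(X) = [1, X, X^P2].

    X is an (m, 6) array of vision-estimated poses. The result has
    1 + 6 + 21 = 28 columns: a constant term, the six linear terms,
    and all quadratic monomials x_i * x_j with i <= j.
    """
    m, n = X.shape
    columns = [np.ones((m, 1)), X]          # constant and linear terms
    for i in range(n):                      # quadratic and cross terms
        for j in range(i, n):
            columns.append((X[:, i] * X[:, j])[:, None])
    return np.hstack(columns)               # shape (m, 28) for n = 6
```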

The sequential thresholded least-squares algorithm proposed by Brunton et al. [16] starts by finding a least squares solution for \( \Phi \) and then setting all of its coefficients whose magnitudes are smaller than a threshold value (\( \uplambda \)) to zero. After determining the indices of the remaining nonzero coefficients, another least squares solution for \( \Phi \) restricted to those indices is obtained. This procedure is repeated for the new coefficients using the same \( \uplambda \) until the nonzero coefficients converge. The algorithm is computationally efficient and rapidly converges to a sparse solution in a small number of iterations. Moreover, only a single parameter \( \uplambda \) is required to determine the degree of sparsity in \( \Phi \). The overall flowchart of the proposed method is shown in Fig. 1.
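
A minimal sketch of this sequential thresholded least-squares procedure, in the spirit of [16], is given below; the fixed iteration count and the per-output refit loop are implementation choices under the stated assumptions, not prescribed details of the original work.

```python
import numpy as np

def stlsq(Psi, Y, lam=0.001, n_iter=10):
    """Sequentially thresholded least squares for Y = Psi @ Phi.

    Psi: (m, 28) library matrix, Y: (m, 6) ground-truth poses,
    lam: the sparsity threshold (lambda). Returns Phi, shape (28, 6).
    """
    Phi = np.linalg.lstsq(Psi, Y, rcond=None)[0]   # initial dense fit
    for _ in range(n_iter):
        small = np.abs(Phi) < lam                  # prune small terms
        Phi[small] = 0.0
        for k in range(Y.shape[1]):                # refit each output
            keep = ~small[:, k]                    # surviving indices
            if keep.any():
                Phi[keep, k] = np.linalg.lstsq(
                    Psi[:, keep], Y[:, k], rcond=None)[0]
    return Phi
```

At run time, the improved pose for a new vision estimate \( \widehat{x} \) is then obtained as \( \Psi ( \widehat{x} )\Phi \).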

Fig. 1. The proposed sparse identification of nonlinear statics (SINS) for improving vision based pose estimation.

3 Experimental Results

In this section, the design of the camera target for pose estimation, the detection of the camera target, and the improved pose estimation results obtained using the proposed method are presented.

3.1 Design of the Camera Target for Pose Estimation

In this work the pose of a KUKA KR240 R2900 ultra robot's end effector was tracked in real time using a vision based pose estimation system with a Basler acA2040-120um camera and compared with measurements obtained from a Leica AT960 laser tracker, as shown in Fig. 2. The laser tracker works in tandem with a T-MAC probe rigidly attached to the end effector, and the system has an accuracy of ±10 μm. A target object fitted with markers was designed and fixed to the end effector of the robot so that its pose can be estimated from the camera. Since vision based pose estimation algorithms require the exact locations of the markers on the image plane, it is crucial to design and distribute the markers on the tracked target properly. Therefore, this work proposes the use of fiducial markers generated from the ArUco library, which can be detected robustly in real time. ArUco markers are 2D barcode like patterns commonly used in robotics and augmented reality applications [18].

Fig. 2. Experimental setup.

The camera target (CT) was designed with 5 faces, each holding 8 ArUco markers. To produce nonplanar markers on each face, 4 markers were placed parallel to the face and the other 4 at 60° with respect to the horizontal axis. This design avoids the ambiguities that arise in pose estimation algorithms when only points extracted from a single plane are used; it has been proven in the literature that pose estimation algorithms provide a unique solution when points from at least two distinct non-parallel planes are used. The CT was built using 3D printing with a size of \( 250 \times 234 \times 250\,\text{mm} \) and a weight of \( 500\,\text{g} \). The markers were generated from ArUco's \( 4 \times 4 \) dictionary containing 100 markers and were fixed into 30 mm square holes made in the constructed target object. Using this CT, the locations of all the markers in the object frame can be obtained from the CAD model and used in the vision based pose estimation algorithms.
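
For reference, a minimal sketch of generating such markers with OpenCV's aruco module is given below; the file names and pixel size are illustrative assumptions, and the generation function is named generateImageMarker in OpenCV ≥ 4.7 (drawMarker in older releases).

```python
import cv2

# 4 x 4 bit patterns, 100 distinct marker ids (DICT_4X4_100).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_100)

for marker_id in range(40):          # 5 faces x 8 markers = 40 ids
    # 600 x 600 pixel marker image (cv2.aruco.drawMarker pre-4.7).
    img = cv2.aruco.generateImageMarker(dictionary, marker_id, 600)
    cv2.imwrite(f"marker_{marker_id:02d}.png", img)
```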

3.2 Detection of the Camera Target

In the experiments, the vision based pose estimation and the synchronization of data with the laser tracker were performed in the LabVIEW [19] software. Images were acquired from the Basler acA2040-120um camera at \( 375\,\text{Hz} \) with a resolution of \( 640 \times 480 \) pixels. These images were then fed into the Python [20] node inside LabVIEW, where the ArUco marker detection and the Levenberg-Marquardt based pose estimation algorithms both operated at \( 1000\,\text{Hz} \). Moreover, the proposed method can process a single frame at \( 6000\,\text{Hz} \). Therefore, the total processing time for each image is \( 0.00216 \) s, or about \( 463\,\text{Hz} \). The estimated pose of the camera target (CT) as well as the detected markers are shown in Fig. 3. These results clearly show that the designed CT allows the detection of multiple nonplanar markers within a viewing angle of ±90° from all sides; hence the rank deficiency problem in the pose estimation algorithm is prevented.
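
A simplified sketch of this detection and pose estimation step is given below, assuming OpenCV ≥ 4.7; camera_matrix, dist_coeffs, and the object_corners lookup (the 3D marker corner locations taken from the CAD model) are assumed to be available from calibration, and cv2.SOLVEPNP_ITERATIVE is OpenCV's Levenberg-Marquardt based PnP solver.

```python
import cv2
import numpy as np

detector = cv2.aruco.ArucoDetector(
    cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_100),
    cv2.aruco.DetectorParameters())

def estimate_pose(gray, object_corners, camera_matrix, dist_coeffs):
    """Estimate the pose of the CT from a single grayscale frame.

    object_corners maps a marker id to its four 3D corner points
    expressed in the target (CT) frame, taken from the CAD model.
    """
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None:
        return None
    obj_pts, img_pts = [], []
    for quad, marker_id in zip(corners, ids.flatten()):
        obj_pts.append(object_corners[marker_id])     # (4, 3) points
        img_pts.append(quad.reshape(4, 2))            # (4, 2) pixels
    obj_pts = np.concatenate(obj_pts).astype(np.float32)
    img_pts = np.concatenate(img_pts).astype(np.float32)
    # Iterative PnP refines the pose with Levenberg-Marquardt.
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, camera_matrix,
                                  dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    return (rvec, tvec) if ok else None
```

Since the CT guarantees that detected markers span at least two non-parallel planes, the stacked obj_pts are never coplanar, which is what keeps the PnP problem well conditioned.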

Fig. 3. (a)–(d) Samples showing marker detection (detected corners in red) and the estimated pose (red, green, blue coordinate axes) of the target object with respect to the camera frame. (Color figure online)

3.3 Pose Estimation Results

In order to evaluate the accuracy and precision of the camera based system, a trajectory tracking experiment based on the ISO 9283 standard was conducted using the KUKA KR240 R2900 robot. The accuracy and repeatability of industrial robots are typically evaluated using the ISO 9283 standard, in which the robot is tasked with following a set of trajectories multiple times, with or without changing the orientation of its end effector. To evaluate the effectiveness of the proposed SINS algorithm and the constructed vision based system, the robot's end effector was set to follow 16 distinct trajectories based on the ISO 9283 standard while changing its orientation continuously. As per the ISO 9283 guidelines, each of these trajectories contained 5 specific points at which the robot was stopped for 5 s, and the experiment took 105.9 min to complete.

First, the LM based pose estimation algorithm was used for the trajectory tracking of the KUKA KR240 R2900 robot's end effector. Then, the proposed sparse identification of nonlinear statics (SINS) method was used to improve the pose estimated by the LM based algorithm. In order to evaluate the robustness of the proposed method, the training phase was performed three times using \( 30\% \), \( 50\% \), and \( 70\% \) of the data and validated on the remaining \( 70\% \), \( 50\% \), and \( 30\% \), respectively, following the time series cross validation [21] approach. The training was performed for 10 iterations using a threshold value (\( \uplambda \)) of \( 0.001 \) for each of the three aforementioned cases, and the results obtained for trajectory tracking based on ISO 9283 are tabulated in Tables 1, 2 and 3. The errors in these tables, denoted \( E_{X} \), \( E_{Y} \), \( E_{Z} \), \( E_{Roll} \), \( E_{Pitch} \), and \( E_{Yaw} \), are the absolute errors between the ground truth pose provided by the laser tracker and the pose estimated by the LM based algorithm and improved with SINS. These tracking errors are given in mm for translation (\( E_{X} \), \( E_{Y} \), \( E_{Z} \)) and in degrees (\( ^\circ \)) for orientation (\( E_{Roll} \), \( E_{Pitch} \), \( E_{Yaw} \)).
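
For concreteness, this evaluation protocol can be sketched as below, reusing the hypothetical build_library and stlsq helpers from Sect. 2; the chronological split reflects the time series cross validation, and all names are illustrative rather than the original implementation.

```python
import numpy as np

def evaluate_split(X_hat, Y_true, train_frac=0.3, lam=0.001):
    """Train SINS on the first train_frac of the time series and
    return the mean absolute validation error per pose component."""
    split = int(train_frac * len(X_hat))    # chronological, no shuffling
    Phi = stlsq(build_library(X_hat[:split]), Y_true[:split], lam=lam)
    Y_pred = build_library(X_hat[split:]) @ Phi
    return np.abs(Y_pred - Y_true[split:]).mean(axis=0)
```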

Table 1. Pose tracking errors during trajectory tracking based on ISO 9283, trained with 30% of the dataset and validated on the rest.
Table 2. Pose tracking errors during trajectory tracking based on ISO 9283, trained with 50% of the dataset and validated on the rest.
Table 3. Pose tracking errors during trajectory tracking based on ISO 9283, trained with 70% of the dataset and validated on the rest.

As seen from the errors in these tables, the proposed method reduces the position tracking errors by factors of at least 1.23, 1.18, and 1.42 and up to 1.26, 1.23, and 1.64 for the X, Y, and Z axes, respectively, compared with the pure LM based algorithm, when using 30% and 70% of the data for training the models. This is in addition to reducing the standard deviation of the position errors by up to 1.14, 1.16, and 1.58 times for the X, Y, and Z axes, respectively. Furthermore, the orientation tracking errors were reduced by at least 4.65, 1.20, and 2.05 times and up to 4.79, 1.28, and 2.16 times for the Roll, Pitch, and Yaw axes, respectively. Moreover, the standard deviation of the orientation errors was reduced by up to 1.94, 1.19, and 1.46 times for the Roll, Pitch, and Yaw axes, respectively. From these results, it is seen that the proposed method improves the position and orientation tracking accuracies even when only \( 30\% \) of the data is used for training, thus proving its robustness.

Figures 4 and 5 show the position and orientation trajectories of the laser target as tracked by the laser tracker in blue. The gray trajectories are the ones estimated by the camera system using the LM based pose estimation algorithm, and the red trajectories show the pose improved by the proposed SINS method. These plots were obtained by training the proposed method with 70% of the data and evaluating it on the whole dataset.

Fig. 4. Position tracking results based on ISO 9283. (Color figure online)

Fig. 5. Orientation tracking results based on ISO 9283. (Color figure online)

It should be noted that the conducted experiment based on ISO 9283 is very challenging for vision based pose estimation, since the distance between the tracked target and the camera increases considerably, which decreases the accuracy of the estimated pose. This is particularly the case here because the robot covers a large working space of \( 1140 \times 610 \times 945\,\text{mm} \) along the \( X \), \( Y \), and \( Z \) axes, respectively. Owing to this, and the fact that the camera had to be placed 1 m away from the closest point of the workspace due to viewing angle restrictions, the distance between the robot's end effector and the camera varied from 1 m to 3 m during the 16 trajectories followed by the robot, making the position errors relatively high.

Moreover, the sparse coefficients determined when training the model with \( 70\% \) of the data are shown in Table 4. As seen, only about \( 50\% \) of the coefficients are active for position (\( \phi_{1} ,\phi_{2} ,\phi_{3} \)) and only around \( 30\% \) for orientation (\( \phi_{4} ,\phi_{5} ,\phi_{6} \)). This makes the model sparse in the space of possible functions, retaining only the fewest terms needed to accurately represent the data. Furthermore, such a method is intuitive in that one can clearly see the coefficients defining the nonlinear relationship, which provides insight into the structure of the problem at hand. Besides, training such a model in MATLAB [22] took only \( 0.35 \), \( 0.68 \), and \( 0.87 \) s for \( 30\% \), \( 50\% \), and \( 70\% \) of the dataset, which contains 63,551 samples in total.

Table 4. The identified sparse coefficients for training a model with 70% of the data.

4 Conclusion

In this work a monocular machine vision based system was developed for estimating the pose of an industrial robot's end effector in real time. A camera target guaranteeing the detectability of at least two non-parallel markers within ±90° in all directions of the camera's view was designed and fitted with fiducial markers. Moreover, sparse identification of nonlinear statics (SINS) based on sparse regression was proposed to determine a model with the least number of active coefficients relating the pose estimated by the Levenberg-Marquardt (LM) based algorithm to the ground truth pose provided by a laser tracker, thus providing a parsimonious model that increases the accuracy and precision of the vision based pose estimation.

The proposed method was validated by tracking an industrial robot's end effector over 16 distinct trajectories based on ISO 9283. The trajectories were followed by a KUKA KR240 R2900 ultra robot, and the ground truth data was provided by a Leica AT960 laser tracker. As seen from the experimental results, the proposed method reduced the position tracking errors by up to 1.26, 1.23, and 1.64 times for the X, Y, and Z axes, respectively, compared with the pure LM based algorithm, in addition to reducing the orientation tracking errors by up to 4.79, 1.28, and 2.16 times for the Roll, Pitch, and Yaw axes, respectively. Moreover, the standard deviation of the position errors was reduced by up to 1.14, 1.16, and 1.58 times for the X, Y, and Z axes, while the standard deviation of the orientation errors was reduced by up to 1.94, 1.19, and 1.46 times for the Roll, Pitch, and Yaw axes. Therefore, the proposed method increases the accuracy and precision of the standard LM based pose estimation algorithm during trajectory tracking of an industrial robot's end effector.

The sparse coefficients determined during training showed that only about \( 50\% \) of the coefficients were active for improving position, whereas only around \( 30\% \) were active for orientation. Thus, only the most important terms needed to accurately represent the data were retained by the proposed method. This results in simple and robust models obtained very quickly, in which one can clearly see the coefficients defining the nonlinear static system.