
1 Introduction

Vision-based feedback control is a promising solution for robots operating in unstructured environments through image features [1,2,3]. There have been many successful implementations of visual feedback control in robotics, among which position-based visual servoing (PBVS) [4] and image-based visual servoing (IBVS) [5] are the most popular. In PBVS, the task is defined in Cartesian space and the system retrieves 3D information about the scene using a geometric model; the pose of the target is then estimated with respect to the robot coordinate system [6, 7]. This servoing scheme is therefore inevitably tied to robotic hand-eye calibration. As a consequence, PBVS is more sensitive to calibration errors, and the image features may disappear from the field of view (FOV) of the camera [8, 9]. In contrast, IBVS regulates the robot's dynamic behavior using image features from its visual sensor [10,11,12,13]. This method does not require 3D reconstruction of the target, but it does require camera calibration and image depth information. IBVS is more suitable for preventing the image features from leaving the FOV; however, it cannot keep the robot motion inside its workspace, particularly when a large positioning displacement is required.

It is clear that both the robot calibration model and the image depth information must be provided for the aforementioned visual servoing (VS) methods to compute the dynamic mapping between the vision space and the robot workspace. To overcome these difficulties in the mapping calculation, we propose a new VS method with a mapping estimator, in which the mapping identification is treated as a stochastic state estimation problem.

The Kalman filter (KF) is one of the best linear state estimators, and its filtering gain is computed from the Gaussian white noise statistics of the plant. As these noise statistics are unknown in most real-world applications, the optimality of the Kalman filter is rarely guaranteed. Several colored-noise handling solutions have therefore been presented, such as the dimension-extended Kalman filter [14, 15], LMS-based adaptive filtering [16], adaptive Wiener filtering [17], and neural network techniques [18, 19]. However, most of these filtering approaches require noise variance parameters that are difficult to derive in most situations.

This paper proposes a model-free approach to the visual servoing control of a robotic manipulator operating in environments with unknown noise variance. The visual-motor mapping and its online estimation are handled by an adaptive Kalman filter with network learning techniques, in which the Kalman filtering model is built by adopting an equivalent observation equation for universal non-Gaussian noise. An observation-correlation updating method estimates the variance of the measurement noise through online learning: the network adjusts its weights so that the noise variances are dynamically estimated.

The proposed mapping estimator requires neither system calibration nor depth information; the 2D image measurements are used directly to estimate the desired movement of the robotic manipulator. Grasping and positioning tasks are performed by reducing the image error between a set of current and desired image features in the image plane, providing high flexibility for robots operating in unknown environments. Extensive experiments on challenging tasks verify the strong performance of the proposed approach.

2 The Problem Description

A visual servoing controller must first estimate a mapping matrix J(k) that describes the dynamic differential relationship between the visual space S and the robot workspace P, and then construct a control law that derives the end-effector motion U(k) needed to minimize the error of the image features F(k).

We consider a model-free system without hand-eye calibration. Let \(J\left( k \right) = \frac{\partial F(k)}{{\partial U(k)}} \in R^{r \times l}\) be the robot visual-motor mapping matrix. The mapping identification can then be formulated as a state estimation problem in the KF framework, in which the system state vector is formed by stacking the rows of J(k), i.e. \(J(k) \in R^{r \times l} \to X(k) \in R^{n}\) with \(n = r \cdot l\). Assume that the state and observation equations of the robotic system are as follows:

$$ X(k) = \varphi X(k - 1) + \Gamma \xi (k) $$
(1)
$$ Z(k) = hX(k) + \upsilon (k) $$
(2)

where \(Z(k) \in R^{m}\) is the observation vector, and \(\varphi \in R^{n \times n}\) and \(h \in R^{m \times n}\) are the state transition and observation matrices, respectively.

Assume that the process noise \(\xi (k) \in R^{n}\) and the observation noise \(\upsilon (k) \in R^{m}\) are zero-mean Gaussian white noise sequences with covariance matrices \(Q_{\xi }\) and \(R_{\upsilon }\), respectively. According to the KF equations [20], the system state is estimated by the following recursion:

$$ \hat{X}(k/k - 1) = \varphi \hat{X}(k - 1/k - 1) $$
(3)
$$ \hat{X}(k/k) = \hat{X}(k/k - 1) + K(k)\left( {Z(k) - h\hat{X}(k/k - 1)} \right) $$
(4)
$$ K(k) = P(k/k - 1)h^{T} \left( {hP(k/k - 1)h^{T} + R_{\upsilon } } \right)^{ - 1} $$
(5)
$$ P(k/k - 1) = \varphi P(k - 1/k - 1)\varphi^{T} + Q_{\xi } $$
(6)
$$ P(k/k) = \left( {E - K(k)h} \right)P(k/k - 1) $$
(7)

where E denotes the identity matrix. The online mapping estimate is recovered from the system state, i.e. \(\hat{X}(k/k) \in R^{n} \to \hat{J}(k) \in R^{r \times l}\). However, since the observation noise of real sensors is not a standard Gaussian white noise sequence, the noise variances are very difficult to determine. Therefore, the observation equation (2) must be adjusted, and the noise variances \(Q_{\xi }\) and \(R_{\upsilon }\) must be estimated online before the KF can be used in our visual servoing control system.
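For concreteness, the recursion of Eqs. (3)-(7) can be sketched in a few lines of NumPy; this is a minimal illustration in which phi, h, Q_xi, and R_upsilon stand for the matrices defined above, not the exact implementation used in our system:

```python
import numpy as np

def kf_step(x_est, P, z, phi, h, Q_xi, R_upsilon):
    """One Kalman filter recursion, Eqs. (3)-(7), on the vectorized mapping state."""
    x_pred = phi @ x_est                                             # Eq. (3)
    P_pred = phi @ P @ phi.T + Q_xi                                  # Eq. (6)
    K = P_pred @ h.T @ np.linalg.inv(h @ P_pred @ h.T + R_upsilon)   # Eq. (5)
    x_new = x_pred + K @ (z - h @ x_pred)                            # Eq. (4)
    P_new = (np.eye(x_new.size) - K @ h) @ P_pred                    # Eq. (7)
    return x_new, P_new

# The mapping estimate is recovered by reshaping the state vector:
# J_hat = x_new.reshape(r, l)
```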

3 The Mapping Estimator with Online Learning

Consider the universal non-Gaussian noise model as a stationary process generated by passing white noise through a first-order filter:

$$ \upsilon (k) = \lambda \upsilon (k - 1) + \eta (k - 1) $$
(8)

where \(\lambda\) is the transition coefficient and \(\eta (k)\) is a zero-mean white noise sequence with covariance \(R_{\eta }\).
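For illustration, such a colored noise sequence can be generated from zero-mean white noise as follows (a sketch; the values of λ and the white-noise standard deviation are arbitrary assumptions):

```python
import numpy as np

def colored_noise(n_steps, lam=0.5, sigma_eta=0.1, seed=0):
    """Generate colored noise by Eq. (8): v(k) = lam*v(k-1) + eta(k-1)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, n_steps)   # zero-mean white noise
    v = np.zeros(n_steps)
    for k in range(1, n_steps):
        v[k] = lam * v[k - 1] + eta[k - 1]
    return v
```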

Based on Eqs. (1), (2) and (8), we can derive the observation vector:

$$ \begin{gathered} Z(k + 1) = hX(k + 1) + \upsilon (k + 1) \\ = (h\varphi - \lambda h)X(k) + \lambda Z(k) + h\Gamma \xi (k) + \eta (k) \\ \end{gathered} $$
(9)

Equation (9) is regarded as an equivalent observation equation and can be rearranged as:

$$ \underbrace {{\left( {Z(k + 1) - \lambda Z(k)} \right)}}_{{Z^{*} (k)}} = \underbrace {{\left( {h\varphi - \lambda h} \right)}}_{{h^{*} }}X(k) + \underbrace {h\Gamma \xi (k) + \eta (k)}_{{\upsilon^{*} (k)}} $$
(10)

The variance of the observation noise \(\upsilon^{*} (k)\) is computed as:

$$ \begin{gathered} R_{{\upsilon^{*} }} (k) = E\left\{ {\left( {h\Gamma \xi (k) + \eta (k)} \right)\left( {h\Gamma \xi (k) + \eta (k)} \right)^{T} } \right\} \\ = h\Gamma Q_{\xi } (k)\left( {h\Gamma } \right)^{T} + R_{\eta } (k) \\ \end{gathered} $$
(11)

Then, an observation-correlation approach is used to estimate \(Q_{\xi }\) and \(R_{\eta }\) in Eq. (11) online.

According to Eq. (2) and Eq. (10), we have:

$$ Z^{*} (k) = h^{*} \varphi^{i} X(k - i) + h^{*} \Gamma \xi (k) + \upsilon^{*} (k) $$
(12)

Assume that X(k) and \(\upsilon^{*} (k)\) are uncorrelated and that the random series {Z*(k)} is stationary and ergodic. The auto-correlation function \(C_{{Z^{*} }} (i)\) of the new observation series {Z*(k)} can then be derived as follows:

$$ \begin{gathered} C_{{Z^{*} }} (i) = E\left[ {Z^{*} (k)Z^{*T} (k - i)} \right] \\ = E\left\{ {\left[ {h^{*} \varphi^{i} X(k - i) + h^{*} \Gamma \xi (k) + \upsilon^{*} (k)} \right]\left[ {h^{*} X(k - i) + \upsilon^{*} (k - i)} \right]^{T} } \right\} \\ = h^{*} \varphi^{i} \varepsilon_{X} h^{*T} + h\Gamma Q_{\xi } \left( {h\Gamma } \right)^{T} + R_{\eta } ,\quad i \ge 1 \\ \end{gathered} $$
(13)

where \(\varepsilon_{X} = E\left[ {X(k)X^{T} (k)} \right]\). However, \(C_{{Z^{*} }} (i)\) cannot be computed directly, as \(Q_{\xi }\) and \(R_{\eta }\) are unknown.

Figure 1 shows the online single-layer learning network used to estimate these unknown quantities through \(C_{{Z^{*} }} (i)\). Let the input vector of the network be:

$$ X(i) = \left( {h^{*} \varphi^{i} \varepsilon_{X} h^{*T} ,\left( {h\Gamma } \right)^{T} ,I} \right)^{T} $$
(14)

According to Eq. (13), the network output for the i-th training sample is given by:

$$ C_{{Z^{*} }} (i) = W^{T} X(i) $$
(15)

where W is the network’s weight vector shown below:

$$ W = \left( {I,h\Gamma Q_{\xi } ,R_{\eta } } \right)^{T} $$
(16)

After Z*(0),…,Z*(k) have been obtained, the auto-correlation function \(C_{{Z^{*} }} (i)\) can be estimated as follows:

$$ \begin{gathered} \hat{C}_{{Z^{*} }} (i) = \frac{1}{k + 1}\sum\limits_{m = i}^{k} {Z^{*} (m)Z^{*T} (m - i)} \hfill \\ \quad = \frac{1}{k + 1}\left( {k\hat{C}_{k - 1} (i) + Z^{*} (k)Z^{*T} (k - i)} \right),\quad 1 \le i \le k,\;k \ge 1 \hfill \\ \end{gathered} $$
(17)

Equation (17) is updated sequentially, so \(\hat{C}_{{Z^{*} }} (i)\) can be estimated online from the previous and new measurements. The network training updates the weights W to minimize the error between \(C_{{Z^{*} }} (i)\) and \(\hat{C}_{{Z^{*} }} (i)\), with the following training cost function:

$$ E = \frac{1}{k}\sum\limits_{i = 1}^{k} {\left| {\hat{C}_{{Z^{*} }} (i) - C_{{Z^{*} }} (i)} \right|}^{2} $$
(18)

The weight vector is updated according to:

$$ W(j + 1) = W(j) + \frac{\gamma }{k}\sum\limits_{i = 1}^{k} {\left( {\hat{C}_{{Z^{*} }} (i) - W^{T} X(i)} \right)X(i)} $$
(19)

where j and \(\gamma\) are the iteration index and the learning rate, respectively.

Thus, the noise variances \(Q_{\xi }\) and \(R_{\eta }\) can be recovered from the weights W. The non-Gaussian-noise Kalman filtering structure with the online learning algorithm, which we call the mapping estimator, is shown in Fig. 1.
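The online estimation procedure can be summarized in code. The sketch below forms the equivalent observation Z*(k) of Eq. (10), updates the sample auto-correlation by Eq. (17), and takes one gradient step on the weights by Eq. (19); the scalar formulation and all names are illustrative assumptions rather than our exact implementation:

```python
import numpy as np

class MappingNoiseLearner:
    """Online noise-variance learning via Eqs. (10), (17), and (19), scalar case."""

    def __init__(self, n_lags, n_inputs, lam=0.5, gamma=0.01):
        self.lam = lam                      # transition coefficient of Eq. (8)
        self.gamma = gamma                  # learning rate of Eq. (19)
        self.C_hat = np.zeros(n_lags + 1)   # running estimates of C_Z*(i)
        self.W = np.zeros(n_inputs)         # network weights, Eq. (16)
        self.z_star = []                    # history of Z*(k)
        self.k = 0

    def observe(self, z_next, z_curr):
        """Eq. (10): Z*(k) = Z(k+1) - lam*Z(k); then Eq. (17): recursive
        update of the auto-correlation estimates for each lag i."""
        z = z_next - self.lam * z_curr
        self.z_star.append(z)
        k = self.k
        for i in range(1, min(k, len(self.C_hat) - 1) + 1):
            self.C_hat[i] = (k * self.C_hat[i] + z * self.z_star[k - i]) / (k + 1)
        self.k += 1

    def train_step(self, X):
        """Eq. (19): one gradient step; X[i] is the input vector of Eq. (14)."""
        k = max(self.k - 1, 1)
        grad = np.zeros_like(self.W)
        for i in range(1, min(k, len(self.C_hat) - 1) + 1):
            grad += (self.C_hat[i] - self.W @ X[i]) * X[i]
        self.W += (self.gamma / k) * grad
        return self.W   # Q_xi and R_eta are read off the trained weights
```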

Fig. 1.

The structure of the mapping estimator based on the KF and the learning network, in which the network is used to estimate the noise variances.

4 Model-free Robotic Visual Servoing Based on the Mapping Estimator

This section presents a model-free visual servoing scheme based on the mapping estimator. The image error function in the image plane is defined as follows:

$$ e_{F} (k) = F\left( k \right) - F^{d} $$
(20)

where \(F\left( k \right) = \left( {f_{1} (k),...,f_{n} (k)} \right) \in R^{n}\) and \(F^{d} = \left( {f_{1}^{d} ,...,f_{n}^{d} } \right) \in R^{n}\) are the current and desired n-dimensional image feature vectors, respectively. As the desired features Fd are constant, the derivative of the error function (20) is:

$$ \dot{e}_{F} (k) = \frac{d}{dk}\left( {F(k) - F^{d} } \right) = \dot{F}(k) $$
(21)

For an arbitrary manipulation task, the time variation of the image features F(k) is related to the robot motion U(k) by [8]:

$$ \dot{F}(k) = J(k)U(k) $$
(22)

Substituting (22) into (21), we have:

$$ \dot{e}_{F} (k) = J(k)U(k) $$
(23)

where \(U(k) = \left( {V(k),\;W(k)} \right)^{T}\) is the robotic control variable, in which \(V(k)\) and \(W(k)\) are the linear and angular velocities of the end-effector, respectively.

There exists a nonzero constant \(\rho\) such that the error decreases exponentially:

$$ \dot{e}_{F} (k) = - \rho e_{F} (k) $$
(24)

Then substituting (24) into (23), we have the control law:

$$ U(k) = - \rho J^{ + } (k)e_{F} (k) $$
(25)

where \(\rho\) is the control rate, and \(J^{ + } (k)\) is the pseudo-inverse of the mapping matrix:

$$ J^{ + } (k) = J^{T} (k)\left( {J(k)J^{T} (k)} \right)^{ - 1} $$
(26)

where the mapping matrix J(k) is estimated by the mapping estimator (Fig. 1); the overall control scheme is shown in Fig. 2.

The steps of the model-free visual servoing are detailed below, followed by a code sketch of the loop:

  1) Given the desired features Fd, the control rate \(\rho\), the initial mapping matrix J(0), and the initial state vector \(J({0}) \to X({0/0})\).

  2) At time k, update the system state X(k) by Eq. (1), and then calculate the observation vector Z*(k) by Eq. (10).

  3) Obtain the state estimate \(\hat{X}(k/k)\) at time k by using the mapping estimator shown in Fig. 1.

  4) Recover the mapping estimate \(\hat{X}(k/k) \to \hat{J}(k)\).

  5) Control the robot motion by Eq. (25).

  6) Set k ← k + 1 and go to Step 2).
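A minimal sketch of this loop is given below; `get_features`, `send_velocity`, and `estimate_mapping` are hypothetical interfaces standing in for the camera, the robot, and the mapping estimator of Fig. 1:

```python
import numpy as np

def servo_loop(F_d, rho, J0, get_features, send_velocity, estimate_mapping,
               tol=10.0, max_steps=1000):
    """Model-free visual servoing loop, Steps 1)-6), a minimal sketch."""
    J_hat = J0                                            # Step 1): initial mapping J(0)
    for k in range(max_steps):
        F = get_features()                                # current image features F(k)
        e_F = F - F_d                                     # image error, Eq. (20)
        if np.linalg.norm(e_F) < tol:                     # stop once close to the goal
            break
        J_pinv = J_hat.T @ np.linalg.inv(J_hat @ J_hat.T) # pseudo-inverse, Eq. (26)
        U = -rho * J_pinv @ e_F                           # control law, Eq. (25)
        send_velocity(U)                                  # Step 5): move the end-effector
        J_hat = estimate_mapping(F, U)                    # Steps 2)-4): mapping estimator
```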

Fig. 2.

The scheme of the model-free robotic visual servoing.

5 Results and Discussions

For simplicity but without loss of generality, the state transition matrix and the noise drive matrix in the state equation (1) are set to \(\Gamma = \varphi = I\). Let the observation vector be \(Z(k) = \Delta F(k) = J(k)U(k)\); the observation matrix in Eq. (2) is then the block-diagonal matrix \(h = {\text{diag}}\left( {U^{T} (k), \ldots ,U^{T} (k)} \right)\). The observation noise model is described by the filter (8) with \(\lambda = 0.5\). Thus, the new equivalent observation equation is obtained by using (10), and the noise variances are estimated simultaneously by the observation correlation-based algorithm with the learning network of Fig. 1.
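With this choice, each row of J(k) maps U(k) to one component of ΔF(k), so the block-diagonal h can be built from the current control with a Kronecker product (a sketch under the row-stacking convention assumed above for X(k)):

```python
import numpy as np

def observation_matrix(U, r):
    """Block-diagonal h with U(k)^T repeated r times, so that
    h @ vec(J) = J @ U for the row-stacked state vector vec(J)."""
    return np.kron(np.eye(r), U.reshape(1, -1))
```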

The real experiment was carried out on an eye-in-hand robot in our lab, shown in Fig. 3. The task is to control the manipulator from an arbitrary initial pose to the desired grasping pose using the proposed model-free visual servoing with closed-loop feedback of image features. The center points of the small circular disks on the object board are used as image features. The control rate \(\rho\) in Eq. (25) is set to 0.25.

In test 1, the initial pose of the robot is far from the desired pose; the test results are shown in Fig. 4. In test 2, the initial and desired features are chosen so that the initial pose of the robot relative to the desired pose involves a rotation combined with a translation; the test results are shown in Fig. 5.

Fig. 3.

The eye-in-hand robot system without any calibration parameters, showing the robot's current pose and the desired pose.

The FOV of the camera is 640 × 480 pixels in the above two positioning tests. The initial and desired feature points are set as close as possible to the edge of the FOV. It can be seen from Fig. 4(a) and Fig. 5(a) that the motion trajectories of the image features are smooth and stable within the FOV, and no feature points deviate from the image plane. On the other hand, Fig. 4(b) and Fig. 5(b) show that the end-effector moved stably, without retreat or vibration, during the positioning process. The robot trajectories in Cartesian space are almost straight lines from the initial pose to the desired pose, with no conflict among the robot joints.

Figure 6 shows the image errors for the two tests, in which the errors of the feature points converge uniformly and the steady-state positioning error is within 10 pixels. It is clear that the model-free visual servoing controller provides high positioning precision.

Fig. 4.

Experimental results for Test 1 by using model-free visual servoing.

Fig. 5.

Experimental results for Test 2 by using model-free visual servoing.

Fig. 6.

The image errors during testing

In the following tests, we compare the performance of the mapping estimator with the traditional KF method of Eqs. (3)-(7). The two estimation approaches are applied to the same model-free visual servoing controller. We chose zero-mean Gaussian white noise, with system noise variance \(Q_{\xi } = 0.02\) and observation noise variance \(R_{\upsilon } = 0.15\) for the KF method. The mapping estimator uses the proposed approach with the learning network to estimate the noise variances online. The robot motion from the initial pose to the desired pose involves a large-range translation and rotations about the X and Z axes.

Figure 7 shows the results of the mapping estimator used in model-free visual servoing, and Fig. 8 shows the results of the traditional KF method. Comparing Fig. 7(a) with Fig. 8(a), the image feature trajectories produced by our estimator are smoother, shorter, and more stable than those produced by the KF method. From Fig. 7(b) and Fig. 8(b), the robot trajectory in Cartesian space with the proposed estimator is stable and free of oscillation, whereas the KF method exhibits large motion oscillations, retreats, and serious detours in the same tasks. As seen from Fig. 9, the steady-state positioning error of our estimator is smaller than that of the KF method.

To sum up, the model-free visual servoing with the mapping estimator eliminates the requirements for system calibration and target modeling. It is also capable of estimating the visual-motor mapping online in a stochastic environment without knowledge of the noise statistics, and the performance of the robot is greatly improved.

Fig. 7.

Experimental results by using the mapping estimator.

Fig. 8.

Experimental results by using the KF method.

Fig. 9.

The image errors

6 Conclusion

In this work, a mapping estimator and a model-free robotic visual servoing scheme have been investigated for robotic grasping manipulation. The proposed mapping estimator can be used in a visual servo system without calibration or image depth information. Moreover, the mapping identification problem was solved by combining KF and network learning techniques. The proposed approach can estimate the vision-motor differential relationship online in unknown environments without noise statistics. Experiments were conducted using both the KF method and ours. The results clearly show that the proposed visual servoing with the mapping estimator outperforms the traditional approach in terms of the trajectories of the image features and of the robot movement in Cartesian space.