
1 Introduction

Soft robotic arms are safe, user-friendly and flexible in human-machine and machine-environment interaction [1]; therefore, they have a wide range of real-world applications and can revolutionize the status quo through technological innovation [2]. For example, soft robotic arms can be applied to the sorting of differently shaped and fragile objects (fruits, vegetables, biological tissues, etc.) [3], and in the medical and healthcare industry, for instance as rehabilitation and assistive devices for stroke patients and in assisted surgery [4, 5, 11]. Their physical adaptation to the external environment gives them an excellent ability to deal with uncertainty and disturbance [6, 7], allowing for low-cost, safe and pleasant human-robot interaction [8,9,10].

However, there is currently no well-suited sensor for soft robotic arms on the market, and most soft robotic arms are still controlled with traditional sensors [12]. Although traditional rigid sensors have many applications on soft robotic arms, they are not always well matched [13, 14]. Compared with traditional robotic arms, soft robotic arms have no conventional link-and-joint structure and do not even need to be driven by electric motors [16]. In theory, the soft arm has multiple degrees of freedom, without an accurately characterized stiffness and damping [13]. Therefore, many mature sensors used in traditional robotic arms cannot be applied well to soft robotic arms. These factors greatly affect the control of soft robotic arms [14]. To bridge this gap, a method that controls the robotic arm with vision is employed: a visual control system automatically receives and processes images of real objects through optical, non-contact sensors to obtain the information required for robot motion [15, 18, 19].

Many current visual servoing robotic arms do not solve the visual positioning problem well. The main obstacle is that a vision sensor alone cannot accurately obtain the depth of the target point [20]. For example, it is difficult to solve the depth problem very accurately with a monocular camera [27]. At present, monocular distance measurement is popular in machine learning models [28], where the estimate comes from statistics. Even if the accuracy is continuously improved under a supervised learning model, the result remains a statistical regularity extracted from big data, without a physical theory or geometric model to support it, so the result analysis contains uncontrollable factors. Moreover, a monocular camera has a fixed focal length and cannot zoom as quickly as the human eye, so it cannot image accurately at different distances. A binocular camera compensates for this shortcoming well: by using two identical cameras it can cover scenes at different ranges, which addresses the monocular camera's inability to switch focal length back and forth and allows images to be recognized sharply at different distances.

Even with a binocular camera, depth measurement still faces many issues, such as result accuracy, the real-time trade-off [21], and the difficulty of accurately mapping disparity pixel values to actual distance with a linear model [22,23,24]. Therefore, in this work, simple and feasible methods are proposed to improve the measurement efficiency and accuracy of the binocular system, and the system is combined with the control of the soft robotic arm.

There are two main tasks of the binocular camera in this proposed system:

First, measure the depth from the target point to the end effector by binocular disparity, relatively accurately and efficiently [25].

Second, employ the left camera of the binocular pair to establish the imaging geometry model, and use the depth information to obtain the X and Y coordinates of the target point [26].

The novelty of this work:

  • ① Proposes a simple method to improve the linear model of binocular disparity, making the measurement results more accurate, and presents a strategy to speed up the binocular disparity calculation.

  • ② Provides a new idea for the eye-in-hand model. Compared with most monocular eye-in-hand models, the binocular camera used in this work obtains accurate depth measurements in real time with high positioning accuracy.

  • ③ Matches the vision system with the soft robotic arm model to control the soft robotic arm as it moves to the target point.

2 Design

The soft robotic arm system is shown in Fig. 1. The vision system in this paper is a binocular camera with a baseline length adjustable from 4.2 cm to 17.0 cm; the parameters of the binocular camera are listed in Table 1. It is mounted on the end effector of the soft robotic arm and moves with the end effector, forming an eye-in-hand configuration. The details of the binocular camera are shown in Fig. 2. The camera detects the depth from the target to the camera in real time, transmits it back to the base coordinate system, and keeps the end effector approaching the object until a predetermined distance is reached.

Fig. 1. (a) shows the control platform of the soft robotic arm system. (b) shows the soft robotic arm.

Table 1. Parameters of the binocular camera

Fig. 2. Details of the binocular camera

The first part of this work uses the binocular camera for depth measurement. At the beginning of the measurement, the camera is calibrated to remove distortion and to obtain the camera parameters and distortion coefficients. The intrinsic parameters are later combined with the depth information for the spatial position calculation. In this work, the camera is calibrated using the classic Zhang Zhengyou calibration method, with a 50 mm × 50 mm, 10 × 6 calibration plate. Binocular calibration has a special requirement compared with monocular calibration: the calibration plate must appear in the left and right frames at the same time, and making the calibration plate cover the entire field of view can effectively improve the accuracy of the calibration. A total of 30 pairs of images are collected to calibrate the binocular camera and obtain the camera parameters. The original frames are then rectified with the camera parameters and distortion coefficients, the epipolar geometry is used to convert the binocular pair into the standard form, and the disparity between the two row-aligned images is calculated. There are currently three popular methods for calculating disparity: StereoBMState, StereoSGBMState and StereoGCState. The comparison of the three methods is as follows:

  • ① Calculation speed: BM method > SGBM method > GC method.

  • ② Disparity accuracy: BM method < SGBM method < GC method.

In this work, both real-time performance and measurement accuracy are critical for the vision system; hence, the StereoSGBMState method, which balances speed and accuracy, is used. Based on this algorithm, the methods described in Sect. 4 are employed to further enhance the measurement results.
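For illustration, a minimal OpenCV sketch of the rectification and SGBM disparity step is given below. The baseline, image size and SGBM settings are placeholder values rather than the exact configuration used in this work; only the left-camera intrinsics are taken from the calibration results reported in Sect. 3.1.

```python
import cv2
import numpy as np

# Placeholder calibration results (intrinsics K_L/K_R, distortion d_L/d_R,
# rotation R and translation T between the cameras). In practice these come
# from the Zhang calibration with the 30 image pairs described above.
K_L = np.array([[236.4, 0.0, 151.7], [0.0, 234.9, 121.1], [0.0, 0.0, 1.0]])
K_R = K_L.copy()
d_L = d_R = np.zeros(5)
R = np.eye(3)
T = np.array([[-60.0], [0.0], [0.0]])   # example baseline (mm), within 4.2-17.0 cm
img_size = (320, 240)                    # (width, height), example only

# Rectify both views so that corresponding points lie on the same image row.
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K_L, d_L, K_R, d_R, img_size, R, T)
map_Lx, map_Ly = cv2.initUndistortRectifyMap(K_L, d_L, R1, P1, img_size, cv2.CV_32FC1)
map_Rx, map_Ry = cv2.initUndistortRectifyMap(K_R, d_R, R2, P2, img_size, cv2.CV_32FC1)

def disparity_sgbm(frame_L, frame_R):
    """Rectify a stereo pair and compute an SGBM disparity map (in pixels)."""
    rect_L = cv2.remap(frame_L, map_Lx, map_Ly, cv2.INTER_LINEAR)
    rect_R = cv2.remap(frame_R, map_Rx, map_Ry, cv2.INTER_LINEAR)
    sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
    # OpenCV returns fixed-point disparity scaled by 16.
    return sgbm.compute(rect_L, rect_R).astype(np.float32) / 16.0
```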

The spatial coordinates of the target point are then transformed into the camera coordinate frame, from the camera coordinate frame into the end effector coordinate frame, and finally from the end effector coordinate frame into the robotic arm base coordinate frame.
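As a minimal sketch, this coordinate chain reduces to multiplying 4 × 4 homogeneous transforms; the matrix names below are illustrative, and in this work the camera frame is later assumed to coincide with the end effector frame, so the corresponding transform would be the identity.

```python
import numpy as np

def to_base_frame(p_cam, T_ee_cam, T_base_ee):
    """Map a 3D point from the camera frame to the robot base frame by
    chaining 4x4 homogeneous transforms: base <- end effector <- camera."""
    p_h = np.append(p_cam, 1.0)            # homogeneous coordinates [x, y, z, 1]
    return (T_base_ee @ T_ee_cam @ p_h)[:3]
```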

3 Modeling

The soft robotic arm differs from a traditional rigid-body arm in that its end-effector position is characterized by a bending angle, a rotation angle and a length. Therefore, in this paper, an algorithm transforming the soft robot coordinate system into a Cartesian coordinate system is proposed to characterize the position of the end effector. The block diagram of the entire work is shown in Fig. 3.

Fig. 3. Block diagram of visual servoing control by binocular vision

3.1 Modeling of Binocular and Monocular

The 3D coordinates of an object in the real world can be determined with binocular stereo vision. Figure 4 demonstrates the principle of binocular stereo vision. In Fig. 4, OL and OR are the optical centers of the left and right cameras. Suppose the two cameras have identical intrinsic parameters with focal length f, that the distance between their optical centers (the baseline) is B, that their image planes are coplanar, and that the Y coordinates of their projection centers are equal. A point P in space has imaging points Pleft and Pright in the two cameras.

Fig. 4. The principle of binocular stereo vision

From trigonometry,

$$ X_{left} = f\frac{x}{z} $$
(1)
$$ X_{right} = f\frac{{\left( {x - B} \right)}}{z} $$
(2)
$$ D = X_{left} - X_{right} $$
(3)

where Xleft and Xright are the X coordinates of the target point on the left and right imaging planes, x is the distance from the target point to the left camera in the X direction, z is the depth from the target point to the binocular camera, and D is the disparity.

Solving (1), (2) and (3) simultaneously, the depth can be derived as:

$$ z = \frac{Bf}{D} $$
(4)
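A direct transcription of (4) might look as follows; this ideal linear model is what Sect. 4.1 later replaces with a fitted curve.

```python
def depth_from_disparity(d_pixels, f_pixels, baseline):
    """Ideal pinhole depth from Eq. (4): z = B * f / D.
    The result has the same length unit as the baseline B."""
    return baseline * f_pixels / d_pixels
```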

During the imaging process of the camera, there exist 4 coordinate frames, which are pixel, image, camera and world coordinate systems. Figure 5 demonstrates the principle of camera imaging.

Fig. 5. Principle of camera projection

The relation between the pixel coordinate frame and the camera coordinate frame can be expressed with homogeneous matrices:

$$ \left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ 1 \\ \end{array} } \right] = \frac{1}{{z_{c} }}\left[ {\begin{array}{*{20}c} {\frac{1}{{d_{x} }}} & 0 & {u_{0} } & 0 \\ 0 & {\frac{1}{{d_{y} }}} & {v_{0} } & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {x_{c} } \\ {y_{c} } \\ {z_{c} } \\ 1 \\ \end{array} } \right] $$
(5)

where \( u, v \) are the column and row indices of the pixel in the image, \( u_{0} \), \( v_{0} \) are the pixel coordinates of the principal point, and \( d_{x} \), \( d_{y} \) are the physical sizes of a pixel along the horizontal and vertical axes. Because depth is considered in this problem, a 4 × 4 matrix form is used, where \( x_{c} \), \( y_{c} \), \( z_{c} \) denote the point in the camera coordinate frame and \( f \) is the focal length of the camera. \( \frac{f}{{d_{x} }} \) and \( \frac{f}{{d_{y} }} \) are abbreviated as \( f_{x} \) and \( f_{y} \). The camera intrinsic parameters obtained from camera calibration are shown below:

$$ \begin{aligned} f_{x} & = 2.3643976238380526 \times 10^{2} \\ u_{0} & = 1.5168679331906756 \times 10^{2} \\ f_{y} & = 2.3486572411007802 \times 10^{2} \\ v_{0} & = 1.2106158962347398 \times 10^{2} \\ \end{aligned} $$

In this work, since the camera is mounted on the end effector, it is assumed that the camera coordinate frame and the robot end effector coordinate frame are the same.
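Combining the intrinsic parameters above with the depth measured by the binocular system, a pixel can be back-projected into the camera (and hence, under the assumption above, the end effector) coordinate frame. The following sketch simply inverts Eq. (5); the function name is illustrative.

```python
import numpy as np

# Intrinsics of the left camera from the calibration above (pixel units).
F_X, F_Y = 236.43976, 234.86572
U_0, V_0 = 151.68679, 121.06159

def pixel_to_camera(u, v, z_c):
    """Invert Eq. (5): recover (x_c, y_c, z_c) from a pixel (u, v) and the
    depth z_c measured by the binocular system."""
    x_c = (u - U_0) * z_c / F_X
    y_c = (v - V_0) * z_c / F_Y
    # Under the assumption of this work, the camera frame coincides with the
    # end effector frame, so this point can be used directly for control.
    return np.array([x_c, y_c, z_c])
```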

3.2 Modeling of Soft Robotic Arm

The soft robotic arm studied in this paper is a lightweight, backboneless soft robotic arm consisting of 6 long elastic bellows arranged circularly; its end-effector position is characterized by the bending angle α, the rotation angle β and the length \( l \). The geometric model of the soft arm is shown in Fig. 6, and the frame transfer relation between the base and the end effector is shown in Fig. 7.

Fig. 6. Geometric model of the soft arm

Fig. 7. Frame transfer relation

The homogeneous transformation from base to end effector coordinate frame is shown as follows:

$$ {}^{e} T_{b} = \left[ {\begin{array}{*{20}c} {c^{2} \beta \left( {c\alpha - 1} \right) + 1} & {s\beta c\beta \left( {c\alpha - 1} \right)} & {c\beta s\alpha } & { - \frac{l}{\alpha }c\beta \left( {c\alpha - 1} \right)} \\ {c\beta s\beta \left( {c\alpha - 1} \right)} & {s^{2} \beta \left( {c\alpha - 1} \right) + 1} & {s\beta s\alpha } & { - \frac{l}{\alpha }s\beta \left( {c\alpha - 1} \right)} \\ { - c\beta s\alpha } & { - s\beta s\alpha } & {c\alpha } & {\frac{l}{\alpha }s\alpha } \\ 0 & 0 & 0 & 1 \\ \end{array} } \right] $$
(6)

where c and s denote cosine and sine, respectively. Finally, combining (5) and (6) gives:

$$ \left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ 1 \\ \end{array} } \right] = \frac{1}{{{\text{Z}}\left( {\text{t}} \right)}}\varOmega_{L} {}_{ }^{e} T_{b} \left( t \right)\left[ {\begin{array}{*{20}c} {x_{b} } \\ {y_{b} } \\ {z_{b} } \\ 1 \\ \end{array} } \right] $$
(7)

in which \( \varOmega_{L} \) is the intrinsic parameter of the left camera.
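A minimal sketch of assembling the transform in (6) from the arc parameters is given below; the guard for the straight-arm case (α → 0) is an implementation detail added here and is not discussed in the text.

```python
import numpy as np

def transform_from_arc(alpha, beta, l):
    """Assemble the homogeneous transform of Eq. (6) from the bending angle
    alpha, rotation angle beta and arc length l (constant-curvature model)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    if abs(alpha) > 1e-9:
        k = l / alpha
        t_r, t_z = -k * (ca - 1.0), k * sa   # radial and axial translation terms
    else:                                     # straight-arm limit (alpha -> 0)
        t_r, t_z = 0.0, l
    return np.array([
        [cb * cb * (ca - 1) + 1, sb * cb * (ca - 1),     cb * sa, t_r * cb],
        [cb * sb * (ca - 1),     sb * sb * (ca - 1) + 1, sb * sa, t_r * sb],
        [-cb * sa,               -sb * sa,               ca,      t_z],
        [0.0,                    0.0,                    0.0,     1.0],
    ])
```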

Moreover, it can be derived that the target parameters \( \alpha \), \( \beta \) and \( l \) are:

$$ \alpha = 2\arctan \left( {\frac{{\sqrt {x^{2} + y^{2} } }}{z}} \right) $$
(8)
$$ \beta = \arctan \frac{y}{x} $$
(9)
$$ l = \frac{{\sqrt {x^{2} + y^{2} + z^{2} } }}{{\sin \left( {\arctan \frac{{\sqrt {x^{2} + y^{2} } }}{z}} \right)}} \cdot \arctan \frac{{\sqrt {x^{2} + y^{2} } }}{z} $$
(10)

Substituting the coordinates of the target point in the base coordinate frame into (8), (9) and (10) yields the target soft robotic arm parameters \( \alpha \), \( \beta \) and \( l \), which are used to drive the arm toward the target point.
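A corresponding sketch of Eqs. (8)-(10) is shown below; arctan2 is used instead of arctan for numerical robustness, and the straight-line case (x = y = 0) is handled explicitly as an added assumption.

```python
import numpy as np

def arc_parameters(x, y, z):
    """Target arc parameters (alpha, beta, l) from Eqs. (8)-(10), with the
    target point (x, y, z) expressed in the base coordinate frame."""
    rho = np.hypot(x, y)                       # sqrt(x^2 + y^2)
    half = np.arctan2(rho, z)                  # alpha / 2
    alpha = 2.0 * half                         # Eq. (8)
    beta = np.arctan2(y, x)                    # Eq. (9)
    if rho > 1e-9:
        l = np.sqrt(x * x + y * y + z * z) / np.sin(half) * half   # Eq. (10)
    else:
        l = float(z)                           # target straight ahead of the base
    return alpha, beta, l
```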

4 Improved Binocular Measurement Performance

4.1 Measurement Accuracy Optimization

Conversion of disparity pixel values to real distance. Since the soft robotic arm will eventually be fitted with a gripper at its end, and the length of the gripper is 60 cm, the experiment is designed over an actual range of 0–120 cm with a step of 1 cm. For each distance, 10 consecutive pixel value outputs are recorded, amounting to a total of 1600 sets of data, as shown in Fig. 8.

Fig. 8. Relationship between pixel and real distance

Two regions are omitted from the disparity map during the experiment: the blind-spot region when the distance is very small, and the undesirably low-resolution region when the distance is very large. As a result, a clear pattern between the pixel values and the real distances can be observed. The data are then fitted with a high-degree polynomial to obtain a more accurate fitted curve; the result is shown in Fig. 9. The curve-fitting formula (11) is used to calculate the actual distance.

Fig. 9. Curve fitting result

$$ \text{Real Distance} = p_{1} x^{7} + p_{2} x^{6} + p_{3} x^{5} + p_{4} x^{4} + p_{5} x^{3} + p_{6} x^{2} + p_{7} x + p_{8} $$
(11)

where \( x \) is the disparity pixel value, and \( p_{1} = -4.847 \times 10^{-12} \), \( p_{2} = 2.856 \times 10^{-9} \), \( p_{3} = -5.756 \times 10^{-7} \), \( p_{4} = 2.668 \times 10^{-5} \), \( p_{5} = 0.006586 \), \( p_{6} = -1.101 \), \( p_{7} = 67.91 \), \( p_{8} = -1532 \).
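Evaluating (11) reduces to a polynomial evaluation; the sketch below uses the coefficients listed above and assumes the result is in centimetres, consistent with the 0–120 cm experiment range.

```python
import numpy as np

# Coefficients p1..p8 of Eq. (11), highest power first.
P_COEFFS = [-4.847e-12, 2.856e-9, -5.756e-7, 2.668e-5,
            0.006586, -1.101, 67.91, -1532.0]

def pixel_to_distance(pixel_value):
    """Convert a disparity pixel value to real distance (cm) via Eq. (11)."""
    return np.polyval(P_COEFFS, pixel_value)
```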

4.2 Measurement Time Improvement

One challenge with binocular disparity is its long calculation time, which considerably restrains its application in real-time measurement. The binocular disparity algorithm calculates the difference of corresponding points by matching each pixel in the left and right images; hence, if a more accurate disparity map is desired, a large amount of time is consumed in the disparity calculation over the left and right images. For example, during the initial experiment, the disparity calculation time for full-size images was as long as 5 s, which clearly did not meet the time requirements for real-time measurement. To solve this problem, the proposed solution is to reduce the size of the measured region, as shown in Fig. 10. Specifically, a square area of a certain size is taken around the target point, and the disparity calculation is performed only on the pixels of this selected area. The relationship between the size of the region and the disparity calculation time, together with the relative error at each size, is shown in Fig. 11.

Fig. 10. Method of speeding up calculation time

Fig. 11. Relation between time/error and region

As shown in Fig. 11, when the region size is 160 × 160 pixels the calculation time is relatively short while the accuracy remains high; a single run of the vision system then takes about 0.2 s. Therefore, to maximize real-time performance, a 160 × 160 pixel area is selected in the final calculation process.
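A sketch of this region-of-interest strategy is given below; it reuses the hypothetical disparity_sgbm routine from the sketch in Sect. 2 and assumes the target's true disparity fits inside the cropped window.

```python
def disparity_around_target(frame_L, frame_R, u, v, size=160):
    """Compute disparity only inside a size x size window centred on the
    tracked target pixel (u, v) instead of over the full frames."""
    half = size // 2
    r0, c0 = max(v - half, 0), max(u - half, 0)
    roi_L = frame_L[r0:r0 + size, c0:c0 + size]
    roi_R = frame_R[r0:r0 + size, c0:c0 + size]
    # Reuse the SGBM routine sketched earlier on the cropped pair; this assumes
    # the disparity search range fits inside the cropped window.
    return disparity_sgbm(roi_L, roi_R)
```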

4.3 Distance Measurement Error

Using the improvements described above, a distance measurement experiment was conducted on the binocular measurement system, with the distance ranging from 57 cm to 120 cm. The measurement was recorded 10 times at each actual distance with two binocular disparity models, the linear model and the curve-fitting model. The relative error of the experiment was calculated by the following expression:

$$ \text{relative error} = \frac{\text{error}}{\text{real depth}} $$
(12)

The error analysis chart is shown in Fig. 12.

Fig. 12. Relative average error by real distance

The fitting model has a significantly higher measurement accuracy in the working range than the linear model, and it can be seen from Fig. 12 that the binocular measurement system meets the required error tolerance well over the working range.

5 Experiment

5.1 Visual Servoing Experiment

The binocular camera is mounted at the center of the end effector, and the mark point is placed at an arbitrary position in the camera view. In experiment 1, the visual servoing of the robotic arm is set to keep a fixed distance to the target point. The initial distance is set to 60 cm, the robotic arm then moves away from the target, and finally it returns to 60 cm. The configuration of experiment 1 is shown in Fig. 13. Experiment 2 is real-time tracking of the target object: when the target moves within the camera view, the camera tracks the target and records its pixel coordinates. The configuration of experiment 2 is shown in Fig. 14.

Fig. 13. (a) shows the visual servoing of the robot arm under bending condition to the fixed distance from the target point and (b) shows the depth change over time in experiment 1.

Fig. 14. (a) shows the real-time tracking of the target object. (b) shows the pixel coordinate change in the tracking task.

Based on the two experiments above, a third experiment is conducted to detect the spatial position of the object. A control signal is generated to move the soft robotic arm so that the end of the arm points at the target and moves to a fixed distance from the target object. The experiment configuration is shown in Fig. 15. The target pixel coordinate in the image is driven to the center of the camera view, and the distance is measured in real time until the fixed distance specified in the task is reached. The changes in target pixel coordinates and distance are recorded and shown in Fig. 16.

Fig. 15. Experiment 3 configuration

Fig. 16. The change of pixel coordinate and distance

6 Conclusion

In this work, the binocular vision system controls the distance from the end effector to the target rather accurately during the motion of the soft robotic arm, so that the end effector can perform its task within the acceptable error range.

This simple and feasible binocular servoing control method provides new ideas and inspiration for addressing the difficulties of control and sensing in soft robotics.

In the future, the aim is to achieve a more accurate estimation of the end position of the arm, enabling more precise motion control, and to equip soft robotic arms with better capabilities for fulfilling more tasks.