Keywords

1 Introduction

The development of mobile manipulators in recent years has faced a number of new robotic tasks and open issues. The ability to open standard doors is useful for the mobility in human-intended environments [1]. It is also theoretically interesting as it requires interaction with a large-size movable mechanical object (comparable to or larger than the size of the robot).

There is a number of works on robotic door opening; however, it still may be considered a relevant scientific task. Door opening may be performed by various types of robotic systems e.g. manipulator on a wheeled platform [2,3,4,5] or humanoid robot [6, 7]. Control approaches for door opening can be divided into two groups: real-time compliance control methods [2, 8, 9] and planning the trajectory of a robotic system for door opening [3, 4, 6, 7].

The works in the first group consider the following task statement. The endeffector of the robot interacts with the door handle; resistance of the door is identified by the force and torque sensors. The task is to define control inputs based on these sensor data, which helps to move the door in the required direction without damage. This task requires fast communication with sensors and high performance of feedback processing. Communication and control frequency have an order of tens or hundreds Hz. The time of door opening is much longer then for opening by human (e.g., 20 s with 100 Hz communication in [9] and 35 s with 20 Hz in [8]).

The works in the second group aim to determine the trajectory (the reference of robot configurations), which allow the system to open the door. Derivation of control inputs is out of this task statement and can be executed by inner regulators of the robotic system. Obviously, two approaches may be combined, e.g., in [4] compliant controller is used to execute the trajectory, which is defined by the planner. Trajectory planning can be based on a heuristic search or on numerical optimization. The first approach is used in [3] where anytime repairing A* algorithm from [10] is applied to define the executable robot trajectory on a configuration graph. In [4], the sampling-based RRT algorithm was applied for motion planning. [6] claims that the sampling-based search is both overpowered (as the configuration space of door opening is not yet complicated for applying these techniques) and underpowered (as it requires post-processing to make the trajectory smoother) for door opening. Through these statements, the authors substantiate their approach based on the CHOMP [11] trajectory optimization. A bit later, [7] applied for this the TrajOpt [12] method, which is faster than CHOMP. The aforementioned works applied to a robotic system with complicated dynamics and required relatively long time either for trajectory definition or for motion execution and consider path and motion planning as two separated tasks. It seems promising to apply a holistic approach based on the nonlinear model predictive control (MPC), which provides trajectory optimization and a definition of related control inputs as a single operation.

We present a novel approach for wide door opening using a mobile manipulator. Trajectory planning is made for a wheeled platform while the control of the arm is defined to adjust the trajectory of the end-effector to the door opening trajectory under impedance of the door handle. This statement is a bit similar to [3], but contrary to this work, we, first, develop our own planner based on trajectory optimization and, second, apply high-level force control of the arm. The use of high-level force control simplifies the opening task; however, it is challenging as the arm must avoid singular configurations. We insert singularity avoidance into objectives and constraints for the optimization task. The wide opening of the door is predated by the door positioning and unlocking. Unlocking is implemented via simple high-level control rules, while the positioning of the door is based on machine vision. We have developed a complex neural solution for handle positioning from stereo camera images and estimate the door kinematic model based on lidar data. We have implemented and tested our approach on a mobile manipulator, which is shown in Fig. 1. It consists of the differential drive Husky platform [13] and UR5 [14] the robotic arm with two-fingers gripper and stereo camera mounted on its end-effector. Compared with purely indoor mobile manipulators (e.g. TIAGo [15]) Husky has a relatively big footprint, which makes collision avoidance challenging and reduces effective workspace of the arm. The rest of the paper is organised as follows. In Sect. 2, we present our common view on the opening pipeline. The following three sections focused on the three main subtasks of the solution: recognition of the door handle (3), handle twisting and slightly opening with simultaneous identification of door kinematics (4), and wide opening (5). In Sect. 6, we present the experimental results, while Sect. 7 includes the concluding remarks.

Fig. 1.
figure 1

Experimental mobile manipulator composed of a four-wheels platform and a robotic arm.

2 Common Architecture of the Solution

Following the STRL architecture for cognitive agents [16,17,18], we divide the components of the solution into strategic, tactical, and reactive layers. A strategic layer corresponds to high-level cognitive tasks, the tactical one corresponds to local perception and planning for executing high-level action, while the reactive one includes low-level data processing and control. We consider door opening as a single task, which is stated on the strategic layer and executed on the tactic and reactive layers. Common architecture is shown in Fig. 2. We divide the opening procedure into three stages:

  1. 1.

    Positioning of the door and wheelbase before opening. The spatial position of the door handle is identified from stereo images (Sect. 3; the handle recognition component is shown in Fig. 2).

  2. 2.

    Handle twisting and slightly opening via the manipulator (Sect. 4). The manipulator unlocks the handle and slightly opens the door (the handle twisting and slightly opening component is shown in Fig. 2). In the meantime, the kinematic model of the door is identified based on LIDAR data (the door state recognition component is shown in Fig. 2).

  3. 3.

    Door opening via the whole-body motion. The wheelbase and the manipulator are acting in order to open the door wide (Sect. 5).

Fig. 2.
figure 2

Architecture of the solution components.

3 Recognition and Positioning of the Door Handle

This section describes in detail the method for determining the door handle position, as well as the collection and annotation of the necessary data for 2D detection and keypoint regression tasks. The dataset includes the markup of:

  1. 1.

    door handle localization data via 2D bounding boxes \({b}_{i}\),

  2. 2.

    each \({b}_{i}\) contains inside six 2D keypoints \({k}_{ij}\), \(j \in \{1,\dots , 6\}\).

So, the dataset helps us solve two tasks simultaneously: 2D object detection and keypoint regression. To determine the position in 3D space, the door handle can be represented as a simplified skeleton model consisting of six keypoints.

Using the symmetry of an object, 1–2-3 points are assigned as «start» and 4–5-6 as the “end” of a handle. Points 2–3 and 4–5 are responsible for bending along the main axis 3–4 (Fig. 3). Photos of door handles are taken from different angles and scales. The dataset does not include information about keypoint visibility.

Fig. 3.
figure 3

Keypoints location overview on the created dataset.

As a result of manual marking up, 264 images were collected, which contain 325 2D bounding boxes in total. The dataset was divided according to Table 1.

Table 1. Distribution of the Train Test Split.

The model for obtaining 3D door handle coordinates consists of three main blocks: detection, regression of 3D coordinates, and projection modules (see Fig. 4). We will describe each of them in more detail (Table 2).

Fig. 4.
figure 4

Data pipeline to get 3D door handle keypoints.

Table 2. Methods evaluation.

YoloX [19] was taken as a detection module. This high-performance anchorfree method tiny version works 23 FPS on Jetson Xavier NX with TensorRT optimization. Additionally, it has a good speed-accuracy trade-off. It has been trained with original instructions from authors with 50 epochs on TeslaV100 GPU. HRNetLite [20] was taken as a keypoint regressor module. This is an updated light-weight version of HRNet [21]. Thanks to the multi-scale architecture, the model enables one to cope well with the problems of rotating and obscuring keypoints. Using the MMPose framework [22], it has been trained in the 256 × 192 image scale with 20 epochs. To compensate the lack of data and make the keypoint regressor more robust, different augmentation techniques were applied.

  1. 1.

    Detection module. To make our method resistant to color changes, such as shadows, light environment, etc., we use RandomHSV. Also, since the manipulator needs to observe the handle from different sides, we apply image transformations as RandomFlip and RandomAffine. The latter includes RandomShear, RandomTranslation, RandomRotation. To make inference more robust, we apply MixUp and Mosaic augmentations.

  2. 2.

    Regression module. To accurately determine the skeleton of the handle, the following transformations were selected: RandomFlip and RandomAffine.

    Using a depth map and an intrinsic matrix, we can obtain 3D coordinates:

    $$\left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ {1/z} \\ \end{array} } \right] = \frac{1}{z}\underbrace {\left[ {\begin{array}{*{20}c} K & 0 \\ 0 & 1 \\ \end{array} } \right]}_{4 \times 4}\underbrace {\left[ {\begin{array}{*{20}c} R & t \\ 0 & 0 \\ \end{array} } \right]}_{4 \times 4}\left[ {\begin{array}{*{20}c} x \\ y \\ z \\ 1 \\ \end{array} } \right]$$
    (1)

where rotation matrix R, translation vector t, and the intrinsic matrix K make up the camera projection matrix. u, v is a camera coordinate system, and z, y, z is a world coordinate system.

4 A Strategy for the Slight Opening

Geometric appearance for the slightly opening procedure is shown at Fig. 5. At the beginning the only known information is the position of the handle center H in the coordinate system related to the robot base (it is got from the recognition component). This information is enough to perform three consecutive actions: first, capture the handle by the gripper at point H, second, twist the handle via compliant rotation of the gripper, and, third, slightly open the door by pulling the handle. Wide opening of the door require knowledge about the position of the door axis O and rotation radius r. This task is equivalent to estimating the transition between base-related and door-related coordinate systems. Estimation of the door axis is made from LIDAR data while slightly opening.

Fig. 5.
figure 5

Relative positions of the robot and the door.

Some modern robotic arms, including UR5, are able to be controlled by forces in Cartesian task space (if joint configuration is not singular). Our common philosophy for operating a mobile manipulator is to use this ability as much as we can. This mode called “force mode” could be used for soft operations avoiding destruction of the objects and dropping the control box in the safety block. It was used for gripping and twisting the handle. For the handle twist stage, forces were determined in the gripper frame: down pressure and small rotation force around zgrip. After that, the manipulator could be able to slightly open the door by pulling the handle toward the negative zgrip. At the final stage, the manipulator returns the handle to the original orientation, for which forces were given as reversed forces at the twist stage and iteratively checking the alignment of the yzgrip plane and xybase plane.

The position of the door plane is obtained by solving the plane-popout task on the LIDAR pointcloud. A subset of the points near the handle are taken from the pointcloud. RANSAC is applied to fit the plane to these points. As a result, we get the parameters of the plane equation in base-related 3D coordinates. As we consider the plane to be vertical, this equation is reduced to the straight line equation in base-related planar coordinates. During the slight opening, several such estimations are made resulting in a set of line equations. By assuming that all these lines intersect in O, we can define a set of linear equations with unknown coordinates of O. Each equation looks as follows:

$$\cos \alpha_i x_{O\left( {base} \right)} + \sin \alpha_i y_{O\left( {base} \right)} = p_i$$
(2)

Here \({\mathrm{\alpha }}_{i}\) and \({p}_{i}\) are the door plane parameters for the single lidar measurement; specifically, \({\mathrm{\alpha }}_{i}\) is the door opening angle with respect to robot base. If we have more than two such equations, we can estimate O by the least squares method.

5 Planning for the Whole-Body Opening Motion

The geometric appearance for the wide opening task is shown in Fig. 6. If we want to exploit the Cartesian force control of the arm, we should define the platform trajectory in such a way that it avoids the pushing arm into a singular configuration. The non-singular configuration requires that the position of the end-effector be within the certain limits. The end-effector must be within the black-dashed circle in Fig. 6. The center of the circle is at distance b forward to the robot base. If the base joint of the arm is rotated with a certain angle, then the center of the non-singular area is also rotated with this angle. The radius of non-singular area \(\Delta {r}_{max}\) is equal to approximately 30 cm for UR5 robotic arm. Another aspect of singularity avoidance is that the difference between the azimuth of the end-effector and the azimuth of the base joint must not exceed certain value \(\varphi \) (for UR5 \(\mathrm{\varphi }\approx 45^\circ \)). The singularity avoidance work as force that push end-effector from danger zone, strength determined from experiments. The opening trajectory of the wheeled platform should be defined in such a way that the point, which is b forwards to the robot base, is within the corridor between the two red-dash arcs. In this case, the end-effector pulling the door handle is within a non-singular area. Also, the angle between b-line and the door normal must not exceed \(\varphi \). This angular constraint is harder to satisfy, but its violation is less critical. If the angular constraint is violated, but other constraints are satisfied, the arm may be switched to a fully compliant mode, which has no singularity restrictions. As the platform is moving through the planned path within the corridor, a compliant mode will allow the arm to open the door.

Fig. 6.
figure 6

Geometric singularity-avoidance constraints for wide-opening.

The motion planning for a wheeled platform is achieved by turning the planning task into a model-predictive control statement, which represents the optimization problem. A numerical solution is achieved with the “Acados” toolkit [21]. Let \(s={\left[x,y,v,\uptheta \right]}^{T}\) and \(u = {\left[a,w\right]}^{T}\) be the state and the input vectors of the system. Here \(x,y\) and \(\theta \) are the coordinates and orientation of the geometric center of the robot in the inertial frame of the door, \(v\) and \(a\) are the linear velocity and acceleration of the robot, and \(\upomega \) is the angular velocity of the robot. The kinematic model of the differential-drive platform is the following:

$$\begin{array}{c}\left\{\dot{x}\right\}= vcos\theta ,\\ \left\{\dot{y}\right\}= vsin\theta ,\\ \left\{\dot{v}\right\}= a,\\ \left\{\dot{\theta }\right\}= \omega .\end{array}$$
(3)

The optimization problem for the model (3) has a cost function, which should be minimized in the presence of constraints on s and u. For the planning of the motion of the mobile robot, it is suggested that the position of the base link of the manipulator should be as near as possible to the arc-path of the door handle; therefore, it is necessary to introduce the coordinates of the base link in the inertial door frame:

$$\begin{array}{c}{x}_{b}= -b\mathrm{cos}\left(\theta \right)+ x,\\ {y}_{b}= -b\mathrm{sin}\left(\theta \right)+ y.\end{array}$$
(4)

The position of the base link is given as:

$$z = \sqrt{{x}_{b}^{2}+ {y}_{b}^{2}}$$
(5)

The cost function of the studied problem consists of three terms: square deviations of the parameters of the state vector from the desired values, square values of the input vector from zeros, and the position of the base link from the arc-path of the door handle. The general cost function of the problem is written as:

$$J = \int_0^T {(s - s_d )^T W_1 (s - s_d ) + + (u - u_d )^T W_2 (u - u_d ) + (z - z_d )^T W_3 (z - z_d )d\tau } ,$$
(6)

where \({W}_{1}\in {\mathbb{R}}^{n\times n}\), \({W}_{2}\in {\mathbb{R}}^{m\times m}\) and \({W}_{3}\in {\mathbb{R}}\) are the weight matrices of the optimization problem, \({s}_{d}\), \({u}_{d}\) are reference values (the initial guess) of the state and input vector respectively, and \({z}_{d}\) equals the radius of the arc-path of the door handle. In another formulation of the optimization problem, the desired position of the base link of the manipulator can be added by a term-constrained minimum and maximum distance from the arc-path. In this case, the cost function (6) contains only the first two terms. The initial guess for the optimization problem can be constructed as a straight line from the position of the robot in front of the closed door to the end position behind the opened door with angle \(\pi /2\), so the end position of the initial guess is fixed with respect to the door frame. The change in scenarios of the initial guess depends on the initial position and orientation of the robot only. Figure 7 shows the initial and optimized path of the center point of the robot, in case that the initial posture of the robot is (-0.8, 0.8, -1.57) and the end posture is (0.8, 0.9, \(-\pi \)) where the handle is located at position (0.8, 0) for the closed door and at (0, 0.8) for the opened door.

Fig. 7.
figure 7

Initial guess and optimized trajectory.

6 Experiments

The experimental evaluation was made in two stages. At the first stage, we validated the accuracy of identifying door and handle. At the second stage, the common pipeline was tested on a real mobile manipulator.

To obtain groundtruth data about the handle position and the door kinematics, we applied the ARUCO markers recognition module from OpenCV [24]. This module returns pixel coordinates of the marker’s angles, which are then converted into 3D according to (1). Their 3D positions are used to define the door normal and transform between ARUCO-related, camera-related and robot-related coordinate systems. Distance between the marker and the center point of the door handle is measured manually. Thanks to augmentations the 2D keypoints, the regressor works in the most difficult situations. Using a depth map, the method can transform coordinates in 3D (Fig. 8). TensorRT implementation increases the processing speed, which enables one to work in real time. The comparison of 17 cases of positioning with the Aruco data showed that the standard deviation in positioning is equal to 9 mm. To validate the method of the door axis estimation, a set of experiments were conducted. We manually measured the angle of the door in several positions and recorded lidar scans to apply our algorithm. The results of the example procedure of axis estimation are given Table 3. As a result, we calculate the mean door angle error equal 1.2° and max door angle error equal 2.9°. The LS estimation of the door axis based on lidar measurements provides accuracy improving up to 1–2 cm.

Fig. 8.
figure 8

Network output and depth map.

Table 3. Door angle errors, °.

At the second stage of the experiments, the real robot was able to open the office door with an 0.8-m radius. The unlocking and slight opening stage took around 15 s, the wide opening for 90° – around 8 s. See the supplementary video [25] for the details.

7 Conclusion

In this work, we have developed a solution for robotic door opening based on LIDAR and stereo camera measurements. The opening motion of a mobile manipulator is determined by a combination of compliant control for the robotic arm in the task space and model predictive trajectory planning for the wheeled platform. Numerical experiments showed that the proposed methods for positioning the door handle and the axis satisfy accuracy requirements of a real robotic system. Our approach was implemented on a real mobile manipulator and enabled it to open standard doors in an office environment. These results may help to provide the mobility of mobile manipulators in office environments.