Keywords

1 Introduction

In recent years, the rapid development of e-commerce, online shopping has been very popular in China. Logistics speed has become one of the most important factors to let consumers have a better online shopping experience. Generally speaking, the time required for the logistics sorting link accounts for more than 40% of the operation time of the logistics center, and the required cost accounts for 15–20% of the total cost [1]. However, the logistics sorting link in China is basically in the stage of manual sorting, with low accuracy, high cost, and low efficiency. Therefore, in the evolution process of the whole popular industry from traditional to automatic and intelligent, the use of mechanical arm and other automation equipment instead of labor is an important direction to reduce cost and improve efficiency in the field of logistics.

The use of mechanical arms for logistics sorting requires the use of computer vision technology for commodity identification, segmentation, pose estimation, planning the sorting trajectory, flexible and stable grasp of goods. FCN, FasterRCNN, and so on are used to solve the segmentation of dense goods in the bin. FCN can use a deep convolution neural network to segment the input image of any size end-to-end pixel by pixel semantic segmentation, [2] but it needs to obtain the target by post-processing. FasterRCNN proposes the RPN network, which can detect the target faster, but the output result has a deviation error [3]. The traditional path planning algorithm is used Methods include artificial potential field method, A* algorithm, etc., but they can not guarantee to search the optimal path and are not suitable for the complex environment; [4] in logistics sorting scenario, the gripper needs to stably grasp the goods with different sizes and weights in a relatively narrow space, which also has higher requirements for the gripper.

To complete the logistics sorting task efficiently and reliably, we design a kind of mechanical arm sorting system, which is very suitable for the logistics sorting scene.

2 The Overall Scheme of the System

The sorting system is composed of camera, computer, manipulator and gripper. Firstly, the camera captures the 2D & 3D image of the object. After the image signal is transmitted to the computer, the machine vision technology is used to identify the object, grasp planning and path planning. According to the end pose, the inverse solution of the manipulator is solved, and then the motion command is sent to the manipulator controller. The manipulator cooperates with the gripper to complete the sorting task.

3 End Effectors

3.1 Structural Design

The sorting scene has the characteristics of a complex sorting environment, narrow working area, the irregular shape of goods, and large size span. Therefore, we design an end effector which covers multi-size goods.

As shown in Fig. 1, The end effector is composed of six different sizes of suction cups, sucker extension parts, and quick-change devices (Fig. 2). The extension part of the suction cup is rod-shaped, and one end is connected with the suction cup, which can extend the operation range of the mechanical arm so that the suction cup can be extended into the material box; the quick-change device is used to connect the mechanical arm and the suction cup extension piece; the base can temporarily place the end actuator.

Fig. 1.
figure 1

An end effector for covering multi-size goods

Fig. 2.
figure 2

The quick-change device

The designed end effector has six suckers of different sizes, including four 1 × 1 sucker, 12 × 1 sucker, and 12 × 2 suckers. The diameters of four 1 × 1 suckers are 1 cm, 2 cm, 3 cm, and 4.5 cm, corresponding to the products of 1 cm–2.5 cm, 2.2 cm–4 cm, 3.2–8 cm, 5 cm–20 cm, respectively; one 2 × 1 suction cup sucks heavy goods between 10–20 cm; one 2 × 2 sucker absorbs more than 15 cm goods. There is a certain overlap in the range of products absorbed by the six suckers, which can ensure that 2102 kinds of goods can be covered in the business scenario (the specific number is shown in Table 1).

Table 1. Product and sucker mapping table

The manipulator we used is m-10iA/7L of FANUC company. It has six axes, the maximum load is 7 kg, the working radius is 1632 mm, the repositioning accuracy is 0.03 mm, and the working space is shown in the Fig. 3.

Fig. 3.
figure 3

The working space of M-10iA/7L

3.2 The Perception and Control Strategy in Grasping

Before the system goes online, all goods are marked, and the mapping table can be automatically generated according to the product size in the database and the product size that the suction cup can absorb, to establish the mapping relationship between the commodity and the suction cup. But in the process of grabbing, there will be the phenomenon that the size of the product to be absorbed meets the requirements of the suction cup, and the weight exceeds the maximum suction range that the suction cup can bear. Therefore, we add an online learning strategy to the system. When the system counts that it has failed to absorb a certain commodity for more than 5 times, it will automatically switch to the suction device with a larger suction force following the size.

In the process of system scheduling, we develop a set of intelligent perception and control strategies. Because the local server system has the mapping table of goods and suckers, when the corresponding goods arrive at the picking position, the system will send the corresponding suction cup instructions to the upper computer in advance, trigger the pre-defined fast switch control strategy of the upper computer, and switch the suckers according to the mapping table of the goods and suckers.

4 Object Segmentation and Pose Estimation

4.1 Object Recognition, Segmentation and Pose Estimation

To recognize the object and perceive its pose, hand-eye calibration is needed first, which is to determine the pose relationship between camera and robot. Our sorting system adopts the “eye in hand” mode, that is, the camera and robot end are fixed together for calibration [5]. The related principle is shown in Fig. 4.

Fig. 4.
figure 4

The principle of hand-eye calibration

Instance segmentation is to segment the object and background at pixel level while detecting the object [6]. Although the depth camera obtains the depth information of the image while obtaining the color image, and can quickly obtain the color scene three-dimensional point cloud image after hand-eye calibration [7]. However, in the sorting scene, the goods in the bin are closely linked together, and it is usually impossible to distinguish each commodity by using the depth camera. Therefore, we use multi-sensor data fusion based on a structured light depth camera and RGB industrial camera to obtain the depth and texture information of goods in the material box and use the case segmentation algorithm based on deep learning to solve the problem of commodity identification and segmentation in the material box and focus on the recognition of tightly fitting and non-textured goods.

The algorithm we use benefits from the Mask-RCNN algorithm by Kaiming He [8]. The Mask-RCNN algorithm combines Faster R-CNN and FCNs, adopts ResNet-FPN architecture in feature extraction, and uses ROI align to replace ROI pooling in the connection part, which both enhance the robustness of the algorithm for small target detection. The algorithm also adds a mask prediction branch, which can predict a more accurate target contour and distinguish each instance [9].

Besides, due to the problem of packaging materials, there will be some reflection, which will lead to holes in the point cloud, which will have an impact on the calculation and grasp position and posture of the manipulator. If there is a more accurate commodity segmentation mask, the point cloud can be repaired by using the mask. Therefore, the algorithm features of Mask-RCNN are very suitable for our business scenarios.

To avoid the capture failure caused by grasping the edges and corners of the product when the manipulator grabs, we take the surface of each commodity as an example when defining the target instance, and mark the surface of each commodity that can be seen by each commodity in the view of the camera, that is, to mark the outline of each commodity surface. When determining the category and category, since the goods in the business scene are generally box-shaped, bottle-shaped, cylindrical, bag-shaped, and sheet-shaped, and are stacked in the material box in a square, incomplete or circular shape, the categories are defined as the above three categories. We have carried out experiments to divide the three categories into one category, two categories, and three categories respectively. We find that the effect of combining the three categories into one category is the best and the generalization ability is the strongest. In essence, merging the three categories into one category is to let the algorithm learn the outline of goods.

After the recognition and segmentation of the case, we use 2D/3D bit technology based on point cloud normal vector to realize the pose estimation of the goods in the material box. That is, through the 3D point cloud data collected by the depth camera, the point cloud vector is calculated to determine the position and posture of the object in the manipulator coordinate system through the previously calibrated hand-eye coordinate conversion parameters.

4.2 Experimental Results

We collected pictures of nearly 500 kinds of commodities, and the trained model can effectively detect more than 1000 kinds of untrained commodities as shown in Figs. 5, 6, 7 and 8.

Fig. 5.
figure 5

Training classification results

Fig. 6.
figure 6

Segmentation results

Fig. 7.
figure 7

Product segmentation results in production operation

Fig. 8.
figure 8

Commodity pose estimation

The image data we collected are placed in the bin, which effectively increases the background semantic information, so that the algorithm can detect a certain area in the bin as a commodity (the virtual image of the bin wall mirror), avoiding the manipulator to grab the bin. We have also done a lot of data augmentation, including contrast, brightness, up and down flip, left and right flip, color channel transformation, salt and pepper noise, image dropout, etc., which has greatly improved the robustness of the model.

5 Real-Time Planning and Control of Mechanical Arm Sorting

5.1 The Overall Technical Scheme

Given the kinematics and dynamics description, environment model, initial and target states, kinematics, and dynamics constraints of the manipulator, the path that meets the constraint conditions can be obtained by solving the motion planning of the manipulator, to realize the flexible and intelligent operation of the manipulator. This process (Fig. 9) can be roughly divided into three parts: environmental information storage and perception, path segmentation and search, and real-time motion planning.

Fig. 9.
figure 9

The overall technical scheme

5.2 Environmental Information Storage and Perception

The environment of manipulator motion is divided into the known environment and the unknown environment. The known environment needs simplification, modeling and pre-storage, while the unknown environment needs perception.

For general crawling, most environments are known. To simplify, model and store it in advance, many triangles can be used to approximate the appearance details of the object as much as possible, store the normal vector and coordinate information of all the triangles, or only retain the basic contour, and use the basic geometry to represent the known environment as much as possible, to improve the detection speed. Each link of manipulators can also be represented by simple basic geometry, and the information of manipulators is stored in URDF files.

A small part of the environment is unknown, such as the material box to be caught, the items in the material box, and so on. The multi-sensor data fusion of depth camera based on structured light and RGB industrial camera can be used to obtain the depth and texture information of goods in the material box, and the collision detection model can be formed by storing in octrees.

5.3 Path Segmentation and Search

Based on the position, posture, shape, weight and other characteristics of the object in the box, the path and trajectory of the manipulator grasping the object to the halfway point will change constantly, while the trajectory from the halfway point to the destination box will not change. To reduce the space calculation range of real-time motion planning, the fixed trajectory is optimized offline.

The robot arm is composed of joints, so the trajectory formed by the acceleration and deceleration of the joint is the trajectory of the manipulator. The optimal trajectories can be determined by weighting the risk factors and the shortest time-consuming. The risk factors can be determined in many ways, including whether the trajectory can achieve high acceleration, the possibility of collision with the environment, and can also be obtained from the reinforcement experience of other similar types of manipulator movement. We can set the threshold of risk factor, and then select the trajectory with the shortest time, or set the threshold of time to select the trajectory with the minimum risk factor.

5.4 Real-Time Motion Planning

The variable part of the trajectory needs to be obtained by real-time motion planning. Motion planning is to get the path that meets the constraint conditions by given the kinematics and dynamics description, environment model, initial and target states, kinematics and dynamics constraints.

Mathematical description of motion planning is as follows:

  1. 1.

    World W is R3 or R2;

  2. 2.

    Closure o ∈ W;

  3. 3.

    Robot a consists of m links:

  4. 4.

    Configuration space C is the set of all possible transformations of the robot. Based on this, Cobs and Cfree were produced;

  5. 5.

    The configuration q1 is the initial configuration;

  6. 6.

    The configuration qG is the target configuration;

  7. 7.

    Provide a continuous path τ: [0, 1] → Cfree such that τ(0) = q1, τ(1) = qG, or correctly report that such a path does not exist.

We use a 6-DOF manipulator to plan the motion in the joint space, which only needs to solve the forward kinematics.

Feasibility Test of Grab Point.

The target point of motion planning is the output of the visual part of the environment perception, and there are many points to be grasped in the visual output. Therefore, it is necessary to detect the feasibility of grasping points, including judging whether there is an inverse solution (i.e. judging whether the grasping pose is reachable) and whether there is a collision.

Inverse Solution Detection.

Since the axes of the three joints of the end of the industrial manipulator intersect at one point, there is an analytical inverse solution [10]. However, in the process of solving, singular problems may appear. For serial manipulators with degrees of freedom less than or equal to 6, the singular configuration will bring about infinite solutions and sudden stop. So we should avoid the appearance of singular value.

After the inverse solution of the manipulator is completed, the joint angle needs to be mapped to the actual joint angle of the manipulator. The joint angle of the manipulator may be greater than [0, 2π], so the actual joint angle can be determined according to the principle of minimum weighted joint angle, and the inverse solution beyond the range of joint angle should be discarded.

Collision Detection.

Motion planning also avoids collisions. We build the environment model through known environment modeling and environment perception, and collision detection can be carried out through the environment model.

In the sorting system, we use AABB algorithm for collision detection [11]. In this method, objects in space are surrounded by a cube bounding box parallel to the coordinate axis, which is called AABB bounding box. Then, from the eight vertices of the bounding box of object a and the eight vertices of the bounding box of object B, two maximum and minimum vertices are selected for comparison. If the coordinates of these two vertices intersect, then the two bounding boxes must intersect and collide, otherwise, they will not. This algorithm is fast and widely used, but it is rough.

Sample-Based Motion Planning.

There are two kinds of motion planning algorithms based on sampling: PRM and RRT.

Probabilistic Roadmap Method (PRM) is to select n nodes randomly in the planning space, and then connect the nodes, and remove the connection line with the obstacles, to get a random road map. In this method, if the number of nodes is too small, the global path may not be obtained; if the number of nodes is too large, the quality of the global path obtained is difficult to guarantee [12].

Rapid-exploration random tree method (RRT) extends a tree structure outward from the starting point, and the expansion direction of the tree structure is determined by random sampling points in the planning space. The shortcomings of this method are obvious:① the convergence speed is slow; ② the obtained path is unstable and has a large deviation from the optimal path [13]. ③ the sampling process is random, and the best path point may not be sampled [14]. If two fast-expanding random trees are grown from the initial state point and the target state point simultaneously to search the state space, and the greedy strategy is added to the growth mode, the useless search in the blank area can be reduced and the search speed in the space can be significantly improved. This algorithm is called RRT connect algorithm.

RRT-connect algorithm has a longer step size and faster tree growth; besides, two trees continue to expand toward each other alternately instead of random expansion. This heuristic extension makes tree expansion more greedy and explicit, so it is more effective [15].

Our sorting system uses RRT connect and PRM algorithm to complete the motion planning. Because there is more than one target point to capture, the planning is completed once it is connected with the start tree for each target, which speeds up the speed of sampling planning; for repeated planning in a similar environment, the PRM algorithm is adopted, one thread samples and checks the effectiveness, and the other thread searches. When planning for many times, the previous sampling points are retained, and the existing map is used next time to construct the path from the starting point to the endpoint. Once the construction is successful, the validity of the previous points used in the path is detected. If the nodes are invalid, they will be removed and reconnected.

5.5 Actual Planning Process

Firstly, we simplify the modeling, store the known environment, and perceive the unknown environment; then, we optimize the fixed path through offline motion planning and optimize the variable path through online motion planning, that is, sampling in the joint space allowed by the manipulator, and the sampling results construct the overall posture of the manipulator through the forward kinematics of the manipulator, and then complete the online integration in consideration of joint constraints, dynamic constraints and collision detection sports planning. Combined with offline and online motion planning, the overall smooth path can be obtained. After the manipulator has been running for a while, the off-line learning is carried out periodically to improve the path planning strategy of the manipulator and realize the strategy update iteration.

6 System Integration and Verification

Our logistics sorting system is currently used in the sorting task in the Central China INTPLOG beauty warehouse (Fig. 10). After the sorting task is sent to the robot control system, the robot control system coordinates all hardware to perform relevant actions: first, trigger the 2D & 3D camera to take a photo, and obtain the 2D color image and 3D of the goods in the material box Point cloud image information, 2D & 3D images are transmitted to the image industrial computer through TCP/IP, and the corresponding visual algorithm is used to process the conversion relationship between the manipulator and the commodity in a Cartesian coordinate system. The conversion relationship is transmitted to the RCS through TCP/IP, and the RCS sends motion instructions to the manipulator controller, and then the manipulator performs a sorting task. The sorting tasks are repeated according to the execution order of the algorithm.

Fig. 10.
figure 10

The working scene

The orders processed by the manipulator accounted for 50%–80% of the total orders of the roadway, and the theoretical coverage rate of the whole warehouse SKU (more than 2400 kinds) reached 88.7%. The actual working verification shows that the actual picking and placing time of the manipulator is about 4.5 s, the grasping success rate is more than 99%, and the working efficiency of the manipulator workstation is 500 pieces/h.

7 Conclusion

We designed a kind of mechanical arm sorting system applied in logistics sorting scenes. The system is composed of manipulators, 2D & 3D camera, image industrial control computer, end effector, vibration table (used to adjust the position and posture of goods in the bin) and hood, etc. When the system receives the sorting task, the 2D & 3D camera takes pictures of the goods, and then obtains the transformation relationship between the manipulator and the goods in a Cartesian coordinate system through the corresponding visual algorithm, including the use of case segmentation technology based on deep learning to complete the identification and segmentation of goods in the bin, especially the identification of tightly fitting and non-textured goods, and the use of the point-based method. The 2D/3D position technology including the cloud normal vector is used to realize the pose estimation of the goods in the bin. Then, the trajectory of the manipulator is planned based on the RRT-Connect algorithm and the PRM algorithm. The manipulator is grasped by a specially designed end gripper with six specifications of suckers and can learn online. Finally, the sorting task is completed efficiently and accurately.