
1 Introduction

In an industrial automated warehouse, where autonomous vehicles must localize themselves, perform task planning and interact with each other, a map that represents the infrastructure and goods in the environment is required. One approach to obtaining such a map is to create it automatically using an unmanned vehicle. To explore an unknown environment, the simultaneous localization and mapping (SLAM) technique is often used: the joint estimation of the unmanned vehicle's position in the environment and of a model of its surroundings, i.e., the map. SLAM is a key requirement for fully autonomous mobile vehicles operating in an unknown workspace, and a vast number of SLAM implementations have been proposed for aerial and ground vehicles. In this paper, a quadrotor helicopter is used to create the map of a warehouse.

1.1 Related Work

Compared to ground vehicles, micro aerial vehicles (MAVs) have the advantage of greater mobility: they operate in three-dimensional space and do not require ground to travel on. While MAVs open up opportunities for new tasks, they also pose additional challenges compared to ground robots, in particular weight and power constraints, as discussed in [15]. Many studies have investigated SLAM with MAVs in indoor [7, 16, 20, 22] and outdoor [3, 17] environments. In these works, different types of sensors, such as cameras [9, 13] and lightweight laser scanners, are commonly used for mapping and tracking. In contrast to laser rangefinders, cameras are affordable, small, and light, and they can be used in both indoor and outdoor environments with fewer limitations. These characteristics make cameras a more suitable choice for aerial vehicles.

Parallel Tracking and Mapping (PTAM) [14] is one of the most popular keyframe-based algorithms. Using the image stream from a single camera, it constructs a point cloud of corner features in the environment. PTAM can be used for tracking-while-mapping purposes [3, 10]: no prior model of the scene is required, and the algorithm provides a 3D map of the corner features observed in the image frames.

The majority of previous work uses MAV platforms from Ascending Technologies GmbH (AscTec). These platforms have mostly been used for autonomous exploration and navigation tasks, such as the work by Shen et al. [21, 22], Weiss et al. [23], Bachrach et al. [4-6] and Pravitra et al. [20]. They offer high-performance on-board processors and a payload of around 600-650 g. Another alternative is the platform from MikroKopter [2], used for example in the work of Piratla et al. [18]. Finally, the Parrot AR.Drone 2.0 is a considerably cheaper quadrotor platform, although it offers no payload capacity. Only a few studies have used the AR.Drone, for example the autonomous indoor flight of Bills et al. [7] and the vision-based navigation of Blosch et al. [8] and Engel et al. [11]. The attractive features of the AR.Drone are its low cost and light weight, which make it well suited for exploration missions and human-robot interaction (HRI) [19], even though it cannot carry additional payload.

In this work we employ a cheap and easy-to-use platform, the AR.Drone, to explore and map the infrastructure of a real warehouse environment. The system provides a map representing the structure of the surveyed environment. Our experiments are carried out in a warehouse where the pillars of the storage shelves serve as landmark objects. In order to detect pillars, we combine the information from PTAM with a multi-stage image analysis algorithm that exploits prior knowledge about the unique and uniform color of the pillars. This information fusion provides a robust and accurate estimate of the pillar positions in the warehouse. The system is implemented in the Robot Operating System (ROS) and MATLAB, and has been successfully tested in real-world experiments. After scaling, the generated map has a metric error on the order of 20 cm.

The remainder of this paper is structured as follows. Section 2 explains the proposed method. Section 3 describes our experiments and results, including the system architecture and hardware modifications in Sect. 3.1. We summarize our contribution and discuss future work in Sect. 4.

2 Method

The system overview is given in Fig. 1. Based on the image stream from the AR.Drone, Parallel Tracking and Mapping (PTAM) [14] provides the position of the drone and a point cloud of corner features in the environment. Localization of the drone by means of PTAM and the required modifications are described in Sect. 2.1. From the same image stream, image analysis techniques are applied to obtain the pillars' positions (projections) in two-dimensional world coordinates. In Sect. 2.2, the pillar detection process and the corresponding assumptions are presented. Finally, the map of pillars in world coordinates is created from the correspondence between PTAM's point cloud and the two-dimensional positions of the pillars (described in Sect. 2.3).

Fig. 1. System overview

2.1 Localization and Point Cloud Map

Localization of the drone is carried out by an implementation of PTAM, a vision-based tracking system originally designed for augmented reality. It provides the position of the drone in the environment as well as a point cloud of corner features. The package employs an Extended Kalman Filter (EKF): the control inputs sent to the drone are used for the EKF prediction, while the locations of extracted corner features and the navigation data are used for the EKF update. After the initialization of PTAM, a point cloud map \(\varvec{m}=\{m_1, m_2,..., m_k\}\) that contains the locations of the corner features in the image frames is created.
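The prediction/update structure described above can be sketched generically as follows. This is a minimal EKF skeleton for illustration only; the state vector, motion model and noise matrices are hypothetical placeholders and do not reproduce the filter used by the actual PTAM and driver packages.

# Generic EKF skeleton illustrating the prediction/update scheme described above.
# All models and matrices are hypothetical placeholders, not the package's filter.
import numpy as np

class SimpleEKF:
    def __init__(self, x0, P0, Q, R):
        self.x = x0      # state estimate, e.g. [x, y, z, yaw]
        self.P = P0      # state covariance
        self.Q = Q       # process noise covariance
        self.R = R       # measurement noise covariance

    def predict(self, u, F, B):
        """Prediction step driven by the control input u sent to the drone."""
        self.x = F @ self.x + B @ u
        self.P = F @ self.P @ F.T + self.Q

    def update(self, z, H):
        """Update step using a measurement z (e.g. PTAM pose or navigation data)."""
        y = z - H @ self.x                      # innovation
        S = H @ self.P @ H.T + self.R           # innovation covariance
        K = self.P @ H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P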

2.2 Landmark Detection

The pillars of the shelves capture the structural layout of a warehouse. They are common in warehouse environments and are usually painted in a unique, uniform color, as shown in Fig. 2. Therefore, they are chosen as the representative landmark objects in this work. A multi-stage image analysis and grouping algorithm is employed to obtain the pillar positions \(\varvec{\varLambda }=\{\lambda _1, \lambda _2,..., \lambda _l\}\) from the image sequences and the drone's odometry [12].

Fig. 2. (left) A scene in the warehouse, where pillars are common infrastructure with a uniform and unique color. (right) Structural layout of the warehouse: the pillars can be considered as landmarks of the environment.

Initially, the image acquired from the AR.Drone is rotated to align the vertical axis of the image with the corridor. This rotation facilitates correlating the image coordinates with the point cloud data provided by PTAM. The first stage of the image analysis algorithm is color segmentation, performed in the HSV color space: pixels belonging to pillars are extracted by thresholding the HSV components. Afterwards, the edges of the pillars are detected using the Canny edge detector. Finally, the Hough transform is employed to extract the lines that represent the edges of the pillars. Since the correlation between the image coordinates and the point cloud is established in open loop, only horizontal edges are accepted, which improves the accuracy of the result. The pillar positions \(\varLambda \) in the global frame are published to a ROS topic and stored. After all image sequences have been processed, the estimated pillar positions are grouped into clusters, each cluster representing one pillar. A weighted average is then applied to all pillar projections in a cluster, and the resulting mean position is taken as the estimated position of that pillar (Fig. 3).
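A minimal sketch of this pipeline, using OpenCV in Python, is given below. The rotation angle, the HSV bounds and the Hough parameters are illustrative assumptions; the actual values depend on the corridor orientation and the pillar color in the target warehouse.

# Sketch of the multi-stage detection: rotation, HSV thresholding, Canny, Hough.
import cv2
import numpy as np

def detect_pillar_edges(bgr, corridor_angle_deg=0.0,
                        hsv_lo=(100, 80, 40), hsv_hi=(130, 255, 255)):
    # 1. Rotate the frame so the image's vertical axis is aligned with the corridor.
    h, w = bgr.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), corridor_angle_deg, 1.0)
    rotated = cv2.warpAffine(bgr, M, (w, h))

    # 2. Color segmentation in HSV: keep only pixels with the (assumed) pillar color.
    hsv = cv2.cvtColor(rotated, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lo, dtype=np.uint8),
                       np.array(hsv_hi, dtype=np.uint8))

    # 3. Edge detection on the segmented mask.
    edges = cv2.Canny(mask, 50, 150)

    # 4. Hough transform to extract line segments along the pillar edges.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=30, maxLineGap=10)
    if lines is None:
        return []

    # Keep only near-horizontal segments, as required by the open-loop correlation
    # with the point cloud.
    horizontal = []
    for x1, y1, x2, y2 in lines[:, 0]:
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if abs(angle) < 10 or abs(abs(angle) - 180) < 10:
            horizontal.append((x1, y1, x2, y2))
    return horizontal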

Fig. 3. (left) View from the bottom camera. (right) Red lines are the detected edges of the pillars. (Color figure online)
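The grouping step described above can be illustrated with the following sketch, which clusters the per-frame pillar projections by proximity and reduces each cluster to a weighted mean. The distance threshold and the choice of weights are assumptions for illustration; the grouping algorithm of [12] may differ in its details.

# Illustrative grouping of projected pillar positions into per-pillar clusters.
import numpy as np

def group_pillar_estimates(positions, weights, dist_thresh=0.5):
    """positions: (N, 2) array of 2D pillar projections; weights: (N,) array."""
    positions = np.asarray(positions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    clusters = []                                  # list of lists of indices
    for i, p in enumerate(positions):
        for cluster in clusters:
            centre = positions[cluster].mean(axis=0)
            if np.linalg.norm(p - centre) < dist_thresh:
                cluster.append(i)
                break
        else:                                      # no nearby cluster: start a new one
            clusters.append([i])
    # Weighted mean per cluster -> one estimated position per pillar.
    return [np.average(positions[c], axis=0, weights=weights[c]) for c in clusters]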

2.3 Mapping Pillars Using Data Fusion

The creation of the pillar map relies on two sources: the point cloud \(\{m_1, m_2,..., m_k\}\) from PTAM and the estimated pillar positions \(\{\lambda _1, \lambda _2,..., \lambda _l\}\) from the pillar extraction method. The mapping algorithm [12] fuses the two sources and generates a two-dimensional map of pillar positions \(\varvec{\hat{\lambda }}=\{\hat{\lambda }_1, \hat{\lambda }_2,..., \hat{\lambda }_l\}\). The point cloud map from PTAM contains the 3D positions of corner features in the image frames; some of these corners are extracted from the pillars (since every pillar has a structural pattern of rectangular holes) and therefore indicate a pillar's two-dimensional position \((x_{\lambda _t}, y_{\lambda _t})\). The estimated pillar positions from the pillar detection are used to locate these points and to filter out the irrelevant points extracted from other objects. To find the dominant orientation of the point cloud, the Radon transform is employed. Through this operation, the point cloud data and the estimated pillar positions are correlated in the same coordinate frame. Finally, a filter (1) is applied to accept the point cloud features that are close to the estimated pillar positions.

$$\begin{aligned} \varvec{\hat{\lambda }_n} = f(\varvec{m}, \lambda _n, W_{size}) \end{aligned}$$
(1)

Here \(\varvec{m}\) is the set of positions of all corner features, \(\lambda _n\) is the estimated two-dimensional position of a pillar and \(W_{size}\) is the size of a window function. Any corner point located within the window function of a pillar is selected as a candidate of the corresponding pillar. A weighted average is applied to these candidates, and the mean value of the points along the x and y axes is computed. Algorithm 1 describes the details of the pillar localization and mapping method. Figure 4a shows the point cloud aligned to the corridor, and Fig. 4b shows the point cloud belonging to the pillars after filtering by Algorithm 1. The red points are the estimated two-dimensional positions of the pillars \(\varvec{\hat{\lambda }}\).
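A minimal sketch of the filter in (1) is given below: corner features that fall inside a window of size \(W_{size}\) centred on a detected pillar position are accepted, and their mean along the x and y axes gives the refined pillar position. The rectangular window and the uniform weights used here are simplifying assumptions made for illustration.

# Sketch of the window-function filter of Eq. (1).
import numpy as np

def refine_pillar_position(m, lam_n, w_size):
    """m: (K, 2) corner feature positions; lam_n: (2,) detected pillar position;
    w_size: (wx, wy) window size. Returns the refined 2D position or None."""
    m = np.asarray(m, dtype=float)
    lam_n = np.asarray(lam_n, dtype=float)
    half = np.asarray(w_size, dtype=float) / 2.0
    inside = np.all(np.abs(m - lam_n) <= half, axis=1)   # window function
    candidates = m[inside]
    if len(candidates) == 0:
        return None
    return candidates.mean(axis=0)                       # mean along x and y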

Algorithm 1. Pillar localization and mapping

Fig. 4. (a) Point cloud from PTAM, (b) filtered point cloud, (c) the resulting map

3 Experiments and Results

3.1 System Architecture and Modifications

An overview of our system is shown in Fig. 1. It contains two major parts. The first is the Parrot AR.Drone 2.0, a quadrotor helicopter platform that carries two cameras, one looking forward (front camera) and the other facing downward (bottom camera). The second is a computer, which gathers data from the AR.Drone into the Robot Operating System (ROS); the computer and the AR.Drone are connected via a wireless link. The data consist of (a) the image stream from the selected camera and (b) the position of the AR.Drone estimated by the inertial measurement unit (IMU). ROS uses these data to run the PTAM algorithm, which provides the three-dimensional coordinates of corner features detected in the environment, referred to as the point cloud in Fig. 1. Moreover, pillars are detected in the images using color segmentation; “Estimated Pose of Landmarks in 2D” in Fig. 1 denotes the estimated positions of the pillars in two-dimensional space obtained by this detection method. Finally, MATLAB is used to combine the data and generate a map of the pillars in the environment. For more details on the methods, please refer to Sect. 2.
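The data-gathering part on the computer can be sketched as a small ROS node, as below. The topic names follow the ardrone_autonomy driver convention (/ardrone/image_raw, /ardrone/navdata) and may need to be adapted to a different driver or remapping; the node itself is an illustrative assumption, not the recording setup used in the experiments.

# Sketch of a ground-station recording node (ROS, Python).
import rospy
from sensor_msgs.msg import Image
from ardrone_autonomy.msg import Navdata
from cv_bridge import CvBridge

class DroneRecorder(object):
    def __init__(self):
        self.bridge = CvBridge()
        self.frames, self.navdata = [], []
        rospy.Subscriber("/ardrone/image_raw", Image, self.on_image)
        rospy.Subscriber("/ardrone/navdata", Navdata, self.on_navdata)

    def on_image(self, msg):
        # Convert the ROS image to an OpenCV BGR frame for the detection pipeline.
        self.frames.append(self.bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8"))

    def on_navdata(self, msg):
        # Store navigation (IMU-derived) data for localization.
        self.navdata.append(msg)

if __name__ == "__main__":
    rospy.init_node("warehouse_recorder")
    DroneRecorder()
    rospy.spin()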

One major modification was made to the original AR.Drone: the front camera was moved to the bottom of the drone, looking downward. The main reason for this modification is the PTAM algorithm. PTAM's performance depends on tracking corner features, and rapid changes of the scene cause the algorithm to fail (e.g. a sharp yaw turn of the AR.Drone). Moreover, PTAM expects the motion of the camera to be parallel to the scene. With the front camera setup, the scene changes topologically as the drone flies into the corridor, hence the feature motion is not parallel to the image plane. Figure 5 illustrates the difference between the front and bottom camera views. However, the built-in bottom camera has a considerably smaller field of view and a lower resolution, which does not provide enough features for the algorithm to track. Therefore, to satisfy both requirements, the front camera of the AR.Drone was remounted. This camera setup was inspired by the ethzasl_ptam package [1], where PTAM is employed with a similar camera configuration mounted on a high-altitude aerial vehicle.

Fig. 5. Illustration of scenes from the original and modified camera setups

3.2 Results

The experiment was conducted in a warehouse. The AR.Drone flew straight through a corridor, starting from \((0,0)\) and moving in the \(+y\) direction; it was manually controlled with a joystick. The following data were collected from the AR.Drone through ROS: (a) the image stream; (b) the IMU data; and (c) the control commands sent to the AR.Drone. The recorded data were then used for localization, pillar detection and point cloud map generation. Finally, the outputs of these methods were used to generate an object map in MATLAB, illustrated in Fig. 4c. Pillars are represented by their \((x,y)\) values in world coordinates. The corridor width is computed as the distance between the two peaks of the point cloud projection at the dominant orientation of the Radon transform, which is 2.8 m for this dataset. The distances between adjacent pillars are calculated in the x and y directions, and their mean and variance, shown in Table 1, are used for further evaluation.
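The corridor-width estimate can be sketched as follows: the 2D point cloud is rasterized into an occupancy image, the Radon transform gives the dominant orientation, and the distance between the two strongest peaks of the projection at that orientation is taken as the corridor width. The grid resolution and the peak-detection parameters are illustrative assumptions.

# Sketch of the corridor-width estimate from the 2D point cloud.
import numpy as np
from skimage.transform import radon
from scipy.signal import find_peaks

def corridor_width(points_xy, resolution=0.05):
    pts = np.asarray(points_xy, dtype=float)
    idx = np.floor((pts - pts.min(axis=0)) / resolution).astype(int)
    img = np.zeros(tuple(idx.max(axis=0) + 1))
    img[idx[:, 0], idx[:, 1]] = 1.0                      # occupancy image

    theta = np.arange(180.0)
    sinogram = radon(img, theta=theta, circle=False)     # projections over all angles
    dom_idx = int(np.argmax(sinogram.var(axis=0)))       # sharpest (dominant) orientation
    profile = sinogram[:, dom_idx]

    peaks, _ = find_peaks(profile, height=0.5 * profile.max(), distance=5)
    if len(peaks) < 2:
        return None
    two = peaks[np.argsort(profile[peaks])[-2:]]         # two strongest peaks
    return abs(int(two[1]) - int(two[0])) * resolution   # width in map units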

In order to evaluate the result, the map is scaled up to the actual distances in the environment. We scale the map based on three criteria: (a) the distance between pillars on the left side of the corridor; (b) the distance between pillars on the right side of the corridor; and (c) the width of the corridor. Table 2 presents the errors with respect to the real distances after scaling.
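The scaling and error evaluation amount to a simple computation, sketched below for illustration: a scale factor is derived from one known real-world distance (e.g. the corridor width) and applied to the map, and the scaled distances are then compared against ground truth. The function and its arguments are hypothetical helpers, not part of the described system.

# Illustrative scaling and error computation.
def scale_and_evaluate(map_distance, real_distance, estimated, ground_truth):
    scale = real_distance / map_distance                 # metric scale factor
    errors = [abs(scale * d_est - d_gt)                  # absolute error per distance
              for d_est, d_gt in zip(estimated, ground_truth)]
    return scale, errors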

Table 1. Mean and variance of distance between pillars in meters
Table 2. Estimation error after scaling

4 Conclusion

Towards the goal of industrial automation in warehouses, a surveying system based on a low-cost platform, the Parrot AR.Drone 2.0, is proposed in this paper. Based on the image stream and IMU data, the drone employs Parallel Tracking and Mapping (PTAM) to localize itself in the environment. PTAM also generates a point cloud of corner features, which is used for pillar localization and mapping. Concurrently, our pillar detection method detects the pillars and provides their 2D projections. Finally, a pillar map is created by finding the correspondence between the point cloud and the pillar projections.

As a result, the system provides a pillar map with a largest error of 20.3 cm (7.5189 %). The cost of the system is comparatively low, and it does not require any prior setup in the environment, e.g. pre-mounted tags.

4.1 Discussion

One problematic issue of the map generated by PTAM is its scale, which depends heavily on the initialization from the first two image frames (key frames), currently done manually. Therefore, the maps generated from two different trials can be inconsistent.

In addition, the image sequences from the AR.Drone are not always smooth and sometimes lack sufficient quality to detect enough corner features. As a result, the algorithm may lose tracking and the localization fails. The image sequences were made smoother by reducing the control speed of the AR.Drone. Furthermore, the pillar detection method is based on the color of the pillars, which is known and pre-defined before the experiment. Our test environment has a uniform pillar color, which might not be the case in other environments; thus, a more general approach is desirable.

4.2 Future Work

Long-term autonomy of intelligent vehicles in a warehouse environment requires autonomous exploration, obstacle avoidance and self-charging functionality. The proposed work can be integrated into an autonomous warehouse for surveillance purposes: for example, MAVs can be deployed to model traffic in the warehouse, provide useful information for planning the paths of ground vehicles, and detect anomalies in the warehouse.