1 Introduction

A fundamental problem for an autonomous mobile robot is determining its current position and orientation from sensor observations and previous localisation estimates. This is still an active research topic in the mobile robotics community, with the aim of increasing robot autonomy. Although the global positioning system (GPS) is suitable for mobile robot localisation outdoors, it is difficult to use in indoor environments. When GPS is unavailable, localisation using odometry [1] and dead reckoning using IMU sensors [2] may provide an alternative solution. However, odometry is subject to errors that grow over time and is hence insufficient for many tasks [3]. Indoor navigation therefore relies on exploiting the environment itself and on technologies that allow localisation in indoor scenarios. One of the most widely used techniques is to place landmarks at known points of the environment. With an on-board webcam focused on the landmarks, the localisation system uses the known position of each landmark to localise the robot inside the environment [4]. Nevertheless, in the presence of obstacles between the robot and the landmarks, the relative position cannot be evaluated. Moreover, especially in industrial scenarios, introducing landmarks into the environment is usually not well accepted. A possible alternative is to take advantage of a priori knowledge of the environment: features are extracted from the sensor data and compared with this prior knowledge. For example, in [5] the corridor is chosen as the working environment for two important reasons. First, the corridor is a common part of most domestic environments, and being able to navigate it has value on its own. Second, it has a very regular and simple structure, making it easier to start the development of more general vision-based solutions.

In recent years, smart buildings have adopted wireless sensor networks (WSNs) as localisation systems: in [6], a WSN assists the odometry and the other on-board sensors used by the localisation system. The main limitation of such an infrastructure is its non-negligible cost. Another possible infrastructure is based on cameras placed at known positions in the environment. For example, in [7] a camera is placed on the ceiling of the hall. A drawback of that method is localisation failure in certain situations: for instance, under bad illumination the markers on the robots cannot be detected and localised properly.

The purpose of this work is to create a localisation system based on low-cost devices, in order to verify the applicability of the algorithms presented in [8–10]. We want to identify robots moving in an indoor environment by fusing vision data from cameras mounted in the environment, odometry data from the encoders fixed on the robot's wheels, and data from the IMU sensor mounted on board. Starting from the work proposed in [11], where a single camera is used to localise a single robot, we develop a localisation system able to manage multiple robots with multiple cameras. The developed system is environment independent and hence does not require any a priori knowledge of the environment. Moreover, our framework is able to manage an unlimited number of robots and cameras at the same time; for this reason the system can operate in an arbitrarily wide environment. In addition, a method for online camera self-calibration is also proposed: once the initial calibration of the cameras is done, it is possible to move each camera while a monocular visual odometry algorithm [12] performs a continuous real-time calibration of the camera parameters. To speed up the visual localisation and to increase accuracy, the vision information is combined with the robot's sensors using two Kalman filters. One of the main problems when dealing with optical cameras is the illumination of the scene, which can change during robot operation. With the proposed approach, detection failures are avoided thanks to an online calibration of the HSV (Hue, Saturation, Value) values of the marker colours, performed by an operator. In this way the colour calibration can be changed whenever the lighting in the environment changes, increasing robot autonomy.

The paper is organized as follows: in Sect. 2 we define the visual tracking system. In Sect. 3 the integration of the on-board sensors is detailed, while Sect. 4 is devoted to the experimental set-up and results. Finally, Sect. 5 presents some concluding remarks and the planned future work, such as extending the system to other, more demanding situations.

2 Visual Tracking Module

The localisation system consists of two modules. The first is responsible for localising the vehicle through the fixed cameras, while the second integrates the data provided by the first module with all the other available sensors. The system has been developed so that the two modules can be implemented independently. This section provides details on the Visual Tracking Module. Two coloured balls are placed on top of each robot, mounted at a fixed, known distance such that their midpoint coincides with the centre of the robot, as shown in Fig. 1.

Fig. 1. Roomba robot equipped with a front red marker and a back blue marker mounted 10 cm apart (Color figure online)

In order to identify each marker position, a sequence of operations must be performed. First, each camera must be calibrated: eight parameters are estimated so that the position and orientation of the camera with respect to the fixed reference system can be computed. Once images are acquired from each camera, a filtering operation starts (see, e.g., [13]):

  1. Remove the image's noise through a Gaussian blur filter;

  2. Convert the RGB (Red, Green, Blue) colour model to HSV (Hue, Saturation, Value). In this way we can filter out the hue, saturation and value of the negligible colours, i.e. colours different from those used for the markers;

  3. Use morphological operations such as erosion and dilation. Their main uses are to:

     • remove noise;

     • isolate individual elements and join related elements in an image;

     • find intensity bumps or holes in an image.

Once the marker contours are identified, a white rectangle is drawn around them, whose centre identifies the marker position in the image, as shown in Fig. 2.
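
As an indicative illustration only (not the exact code of the system), the steps above map onto standard OpenCV calls. In the following Python sketch the HSV thresholds, the kernel size, the number of morphological iterations and the OpenCV 4 contour API are assumptions; the function returns the centre of the bounding rectangle of the largest detected blob in pixel coordinates.

```python
import cv2
import numpy as np

def detect_marker(frame_bgr, hsv_low, hsv_high):
    """Detect one coloured marker in a camera frame and return the centre
    of its bounding rectangle in pixel coordinates (or None)."""
    # 1. Remove image noise with a Gaussian blur filter.
    blurred = cv2.GaussianBlur(frame_bgr, (5, 5), 0)
    # 2. Convert to HSV and keep only the hue/saturation/value range of the
    #    marker colour (thresholds are tuned online by the operator).
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_low), np.array(hsv_high))
    # 3. Erosion and dilation remove residual noise and join fragments
    #    belonging to the same marker.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.erode(mask, kernel, iterations=2)
    mask = cv2.dilate(mask, kernel, iterations=2)
    # Contour extraction: the largest blob is taken as the marker and a
    # bounding rectangle is computed; its centre is the marker position.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return (x + w / 2.0, y + h / 2.0)
```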

Fig. 2. The robot is tracked by a camera and its position and orientation are marked with rectangles centred on the passive markers

The centre of each marker is reported in pixel coordinates of the camera reference frame. In order to allow multiple cameras to identify the same marker, a common reference frame is chosen, e.g. the world reference system.
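
Since the markers sit at a fixed height on top of the robots, the conversion from pixel to world coordinates can be expressed, for instance, through a planar homography derived from the calibrated camera parameters. The brief sketch below only illustrates this idea; the homography H_cam and its availability from the calibration step are assumptions, not a description of the actual implementation.

```python
import numpy as np

def pixel_to_world(u, v, H_cam):
    """Map a marker centre from the pixel coordinates of one camera to the
    common world reference frame, assuming H_cam is the 3x3 homography from
    the image plane to the plane containing the markers."""
    p = H_cam @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]   # world x, y on the marker plane
```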

Marker positions of the same colour, coming from different cameras, are collected into position arrays, so that each robot can be assigned only the two position arrays related to the colours of its markers. Each robot manages a private Kalman filter, which predicts the marker positions at the current time instant. This prediction is used to reduce false assignments through a spatial filter on the position arrays received from the cameras. Moreover, since the distance between the markers is fixed and known, a second check is performed to select the pair of markers that respects the given distance. Once the marker positions are selected, the robot position is computed as the midpoint of the selected markers and, based on the same information, the heading of each robot can also be computed. A graphical scheme of the entire process described in this section is reported in Fig. 3, where the software architecture of the visual tracking module is represented.
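
The association logic just described can be summarised, under simplifying assumptions, by the following sketch: the predicted marker positions come from the robot's private Kalman filter and the inter-marker distance is the 10 cm of Fig. 1, while the gating radius, the distance tolerance and the function names are illustrative choices rather than values from the paper.

```python
import numpy as np

MARKER_DISTANCE = 0.10   # known spacing between the two markers [m]
GATE_RADIUS = 0.30       # spatial gate around the Kalman prediction [m] (assumed)
DIST_TOLERANCE = 0.03    # tolerance on the inter-marker distance [m] (assumed)

def select_marker_pair(front_candidates, back_candidates,
                       predicted_front, predicted_back):
    """Among the detections received from all cameras, pick the front/back
    pair closest to the Kalman prediction that respects the known distance."""
    best, best_cost = None, np.inf
    for f in front_candidates:
        if np.linalg.norm(f - predicted_front) > GATE_RADIUS:
            continue                              # spatial filter on the prediction
        for b in back_candidates:
            if np.linalg.norm(b - predicted_back) > GATE_RADIUS:
                continue
            if abs(np.linalg.norm(f - b) - MARKER_DISTANCE) > DIST_TOLERANCE:
                continue                          # second check: marker distance
            cost = (np.linalg.norm(f - predicted_front)
                    + np.linalg.norm(b - predicted_back))
            if cost < best_cost:
                best, best_cost = (f, b), cost
    return best

def robot_pose(front, back):
    """Robot position is the midpoint of the two markers; the heading is the
    direction from the back (blue) marker to the front (red) one."""
    centre = (front + back) / 2.0
    heading = np.arctan2(front[1] - back[1], front[0] - back[0])
    return centre, heading
```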

Fig. 3. Visual Tracking Module software architecture. The main steps for each camera are: camera calibration, image acquisition, marker position computation, creation of the coloured marker position arrays, sorting of the arrays for each robot, and computation of the robot position and orientation (Color figure online)

In case of obstacles in the environment, the visual tracking module computes the robot position using the pure prediction of the Kalman filter. This kind of computation suffers from a fast degradation of performance, decreasing robot autonomy. To mitigate this problem and to improve the dead-reckoning computation, an integration with other sensor data can be performed, as described in Sect. 3.
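
For completeness, a possible way to combine the previous sketch with the pure-prediction fallback mentioned above is outlined below; the filter interface (predict_markers, correct) is purely hypothetical and only meant to show where the prediction replaces the missing detections.

```python
import numpy as np

def track_step(kf, front_candidates, back_candidates):
    """One visual tracking iteration for a robot (illustrative only)."""
    pred_front, pred_back = kf.predict_markers()         # Kalman prediction
    pair = select_marker_pair(front_candidates, back_candidates,
                              pred_front, pred_back)
    if pair is None:
        # Markers occluded by an obstacle: fall back on pure prediction,
        # whose quality degrades quickly until new detections arrive.
        return robot_pose(pred_front, pred_back)
    kf.correct(np.concatenate(pair))                     # measurement update
    return robot_pose(*pair)
```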

2.1 Online Calibration

In order to avoid relying on an external infrastructure, which is typically unavailable in disaster scenarios such as search and rescue tasks, the capability of tracking the passive markers with on-board cameras has also been developed. More specifically, by using a visual odometry module such as the one proposed in [12], an online calibration of the camera parameters can be obtained. Online calibration and tracking with on-board cameras allow robots to be localised in parts of the environment not reachable by a fixed camera infrastructure, e.g. hidden areas occluded by obstacles. Moreover, this feature can also be used to avoid localisation errors due to unexpected external perturbations of the camera positions in an infrastructure.
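
As a rough sketch of this mechanism, the extrinsic pose of a moving camera can be kept up to date by composing its last calibrated pose with the incremental motion reported by the visual odometry module; the 4x4 homogeneous transforms below are an assumption of the sketch, not the parameterisation used in [12].

```python
import numpy as np

def update_camera_extrinsics(T_world_cam, T_cam_prev_cam_now):
    """Keep the extrinsic calibration of a moving camera valid by composing
    its last known pose in the world frame with the relative motion estimated
    by the visual odometry module (both 4x4 homogeneous matrices)."""
    return T_world_cam @ T_cam_prev_cam_now
```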

3 On Board Sensor Integration Module

In general, localisation using odometry and dead reckoning using IMU sensors may provide an alternative solution for robot localisation. Unfortunately, it is well known that odometry from encoders and dead reckoning from IMU sensors may show substantial localisation errors due to wheel slippage and drift, respectively [14]. These errors can be reduced by integrating these sensors with the data provided by the visual tracking module proposed in the previous section. Fusing all these sensor data through two Extended Kalman Filters (EKFs) reduces the robot localisation error. The integration is performed with two sequential EKFs, as shown in Fig. 4. This approach combines the benefits of the three sensor typologies. The visual tracking module can be seen as a correction for the estimate obtained from the on-board sensors; conversely, the on-board odometry can be employed to overcome the time delays that occur in the vision system, as well as errors in regions where no visual markers are available. Referring to Fig. 4, the prediction step of the first EKF is executed whenever a new odometry pose estimate is available. Since the on-board odometry estimation is fast, the estimation of the robot's state can be performed at a high frequency. The second EKF integrates the filtered odometry obtained with the first EKF with the data provided by the visual tracking module. The update step of the second EKF is executed when a new pose estimate is obtained from the visual tracking module and is employed to correct the estimate produced by the on-board odometry. It is worth noticing that the second EKF has a lower update frequency than the first one.
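
A deliberately simplified, runnable view of this two-stage scheme is given below: the state is reduced to the planar pose \((x, y, \theta)\), a basic linear Kalman filter stands in for the EKFs of the paper, angle wrap-around is ignored and all noise values are placeholder assumptions.

```python
import numpy as np

class SimplePoseKF:
    """Minimal Kalman filter on the planar pose (x, y, theta); a stand-in
    for the EKFs of the paper, for illustration only."""
    def __init__(self):
        self.x = np.zeros(3)           # pose estimate
        self.P = np.eye(3) * 1e-2      # pose covariance

    def predict(self, delta, Q):
        # Propagate the pose with an increment (odometry/IMU or filtered odometry).
        self.x = self.x + np.asarray(delta, dtype=float)
        self.P = self.P + Q

    def update(self, z, R):
        # Standard Kalman correction with a direct measurement of the pose.
        K = self.P @ np.linalg.inv(self.P + R)
        self.x = self.x + K @ (np.asarray(z, dtype=float) - self.x)
        self.P = (np.eye(3) - K) @ self.P

ekf1 = SimplePoseKF()   # first filter: high-rate fusion of encoders and IMU
ekf2 = SimplePoseKF()   # second filter: low-rate correction with vision data

def on_odometry_and_imu(delta_from_encoders, yaw_from_imu):
    # Prediction driven by the encoders; the IMU refines only the heading
    # (large measurement noise on x and y makes them uninformative here).
    ekf1.predict(delta_from_encoders, Q=np.diag([1e-4, 1e-4, 1e-4]))
    ekf1.update([ekf1.x[0], ekf1.x[1], yaw_from_imu],
                R=np.diag([1e2, 1e2, 1e-3]))

def on_vision_pose(vision_pose):
    # The second filter follows the filtered odometry of the first one and
    # is corrected, at a lower frequency, by the visual tracking estimate.
    ekf2.predict(ekf1.x - ekf2.x, Q=np.diag([1e-4, 1e-4, 1e-4]))
    ekf2.update(vision_pose, R=np.diag([4e-4, 4e-4, 1e-3]))
```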

Fig. 4. Two EKFs fusing all sensor data. The first one fuses the on-board odometry data, the second one updates the pose estimate with the visual data

The time delay of the visual tracking module is also an issue that needs to be tackled. In our vision system, for example, the vision time delay is about 300 ms. To overcome this problem, the past odometry information is saved in a buffer. When a vision estimate is obtained, it is aligned with the robot's pose in the past: the Kalman belief state then corresponds to the (past) robot's pose and the update can be performed.
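
A possible implementation of this buffering is sketched below, reusing the predict/update interface of the previous sketch; the buffer length, the noise values and the roll-back/replay mechanics are assumptions made only to illustrate how the delayed vision estimate is aligned with the past pose.

```python
import numpy as np
from collections import deque

class DelayCompensator:
    """Buffer past odometry increments so that a vision estimate, arriving
    with roughly 300 ms of delay, can be applied to the pose the robot had
    when the image was captured (illustrative sketch)."""

    def __init__(self, history_seconds=1.0):
        self.history = deque()                    # (timestamp, pose increment)
        self.history_seconds = history_seconds

    def push_odometry(self, stamp, increment):
        self.history.append((stamp, np.asarray(increment, dtype=float)))
        while self.history and self.history[0][0] < stamp - self.history_seconds:
            self.history.popleft()

    def apply_delayed_vision(self, kf, vision_stamp, vision_pose):
        # Increments accumulated after the image was captured.
        newer = [inc for (t, inc) in self.history if t > vision_stamp]
        # Roll the belief back to the (past) pose, perform the vision update
        # there, then replay the newer increments to return to the present.
        for inc in reversed(newer):
            kf.predict(-inc, Q=np.zeros((3, 3)))
        kf.update(vision_pose, R=np.diag([4e-4, 4e-4, 1e-3]))
        for inc in newer:
            kf.predict(inc, Q=np.diag([1e-4, 1e-4, 1e-4]))
```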

4 Experimental Results and Validation

To facilitate the evaluation of the performance of the proposed approach, we developed a system fully integrated with the well-known Robot Operating System (ROS) [15]. This integration allowed us to test the framework in a simulation environment (Gazebo) and in a real-world scenario without any change to the code. In this section, the results and the performance achieved by the proposed localisation system are reported. Two types of tests have been performed:

  1. Static tests, used to verify the accuracy of the measurements provided by the visual tracking module;

  2. Dynamic tests, used to verify the accuracy of the measurements when the on-board sensors are integrated with the visual tracking module.

4.1 Static Tests

To evaluate the performance of the visual tracking module, two different types of cameras have been used, so as to test the applicability of the proposed approach on different hardware:

  1. Philips SPC1030NC/00;

  2. Logitech HD Webcam C270.

The first test aimed to determine the maximum distance from the camera lens at which robots on the floor can be localised. The cameras were mounted on a two-meter-tall tripod with a fixed pitch angle. The results are the following:

  1. the Philips SPC1030NC/00 localises robots up to a distance of 5 m;

  2. the Logitech HD Webcam C270 localises robots up to a distance of 5.5 m.

The second test was performed to determine the measurement accuracy provided by the cameras with respect to known positions. The accuracy error ranged from 2 cm to 5 cm for both cameras. The cameras were then positioned in the environment according to the performance determined by these static tests.

4.2 Dynamic Tests

Dynamic tests have been conducted to validate the performance of the proposed localisation system and, in particular, the integration of the data from the on-board sensors with the visual tracking module. To verify the functionality of the proposed framework, the vehicles are driven by a remote operator along a fixed trajectory: a rectangle of 2.5 m by 1.5 m. Data are recorded from the moment the robot starts moving until it reaches the initial position again. In Fig. 5, the raw odometry data and the filtered odometry (the output of the first EKF described in the previous section) are compared for a single lap of the rectangle.

Fig. 5. The robot's trajectory using only odometry data is shown in red, the trajectory using odometry and IMU data is shown in white. The grey squares bound the perimeter of the rectangle followed by the robot (Color figure online)

As expected, the final pose error in the case of raw robot odometry (red trajectory) is quite high: for the experiment reported in Fig. 5, the final pose is shifted by approximately 51 cm with respect to the ground truth. On the other hand, when the filtered odometry is used (white trajectory), the final position estimate is, as expected, affected by a smaller error of 33 cm with respect to the ground truth. These results show that the first EKF improves the estimation of the position and orientation of the robots, but they are still not sufficient to allow the robot to operate for a significant amount of time. When the visual tracking module is integrated with the on-board sensors, the estimate of the final position and orientation is affected by an error of 10 cm with respect to the ground truth, reducing the error by approximately \(70\,\%\) with respect to the filtered odometry.

Fig. 6. On the left, three images acquired by the cameras at the end of the experiment. On the right, the real trajectory is shown in red and the trajectory estimated using all the sensors is shown in yellow (Color figure online)

A further set of experiments has been conducted to validate the localisation system, first in the case of a single robot with multiple cameras, and then in the case of multiple robots with multiple cameras. In the first test the robot is piloted in an indoor area of 7 m \(\times \) 3 m monitored by three cameras located at known positions, each observing a different and possibly overlapping sub-area of the experiment. In Fig. 6, a snapshot of the three cameras is reported on the left; notice that the robot is detected only by the first camera. On the right, the real trajectory is reported in red, while the one estimated with the proposed localisation system is reported in yellow. It is worth noticing that the proposed method produces a smooth trajectory estimate even though the robot, while moving, is detected by different cameras; hence, no substantial errors occur when switching between cameras. In this case, the visual tracking module integrated with the on-board sensors is able to estimate the robot position with errors of less than 10 cm.

A final test has been performed with two different cameras and two robots. During this experiment the robots were at times in the field of view of the same camera and at times in different fields of view. Fig. 7 reports a snapshot of the two cameras when more than one robot is in their field of view at the same time. The proposed system was able to manage robots overlapping in the captured images without compromising the localisation. With this experiment the localisation system demonstrated good performance also in complex scenarios.

Overall, the experiments show very good localisation accuracy and robustness: the mean position error is between 1 and 10 cm and the mean absolute heading error is between 1\(^{\circ }\) and 5\(^{\circ }\), validating the proposed method.

Fig. 7. Two cameras manage two robots at the same time

5 Conclusion

In this paper a simple but efficient algorithm has been presented to fuse vision and on-board odometry for accurate indoor localisation using low-cost devices. Two sequential EKFs are used to fuse the sensor information. The proposed method combines the high tracking frequency of the odometry with the accuracy of the visual tracking system. This work allows the creation of a low-cost indoor localisation system that could also be used in outdoor scenarios, converting the visual data into GPS data without any substantial code modification. In the future we aim to investigate the possibility of employing brightness sensors to automate the calibration of the HSV values. In this way each robot would be able to adjust the HSV calibration of its markers according to the brightness detected at the robot's position.