
1 Introduction

The development of environment and activity monitoring systems, based on heterogeneous networks of sensors, is an active field of research with many potential applications, including safety, security, ambient intelligence and health-care assistance. In real scenarios, such as buildings, airports, road and rail networks, or sports grounds, a single sensor cannot monitor the whole environment or track a moving object or a person for a long period of time, due to field-of-view limitations. Furthermore, integrating information from multiple sensors is a basic requirement for achieving an adequate level of robustness and scalability. Typical solutions rely on networks of fixed cameras that cooperate to monitor wide areas and track objects beyond the capabilities of each single sensor [21]. However, fixed cameras may pose critical limitations in large environments and wherever infrastructure preparation is expensive or unfeasible. As an alternative, mobile and multi-functional robots have been proposed as a means to reduce the environment structuring and the number of devices needed to cover a given area [8]. The use of robots significantly expands the potential of monitoring systems, which can evolve from the traditional passive role, in which the system can only detect events and trigger alarms, to an active one, in which a robot can interact with the environment, with humans or with other robots to perform more complex cooperative actions [17].

Fig. 15.1 Conceptual representation of the proposed ambient assisted living system

In this chapter, a Distributed Ambient Assisted Living (DAAL) system is proposed. It is based on a distributed architecture exploiting fixed and mobile heterogeneous sensors to intelligently monitor large environments and track human activities. The proposed cooperative monitoring system integrates fixed calibrated cameras with a team of autonomous mobile robots equipped with different sensors. A conceptual representation of the system is shown in Fig. 15.1. The system is being developed as part of the project BAITAH (Italian National Research Program PON-BAITAH, "Methodology and Instruments of Building Automation and Information Technology for pervasive models of treatment and Aids for domestic Healthcare"), aimed at identifying and developing ICT-based Ambient Intelligence technologies to support the independent living of fragile people in their domestic environments. In this project, mobile sensors are intended to provide two main contributions: they can supply information about the observed human target in areas that are out of the field of view of fixed cameras (thus reducing the complexity and cost of the required infrastructure), and they can move close to the target to increase the precision and reliability of scene analysis whenever fixed sensors are unable to provide robust estimates. In designing such a system, a major challenge is the integration of high-level decision-making with simple primitive behaviors for different operative scenarios. This aim requires a modular and reconfigurable system, capable of simultaneously addressing low-level reactive control, general-purpose monitoring tasks and high-level control algorithms in a distributed fashion. This chapter presents an overview of both the system architecture and the implemented algorithms.

The remainder of the chapter is structured as follows. Section 15.2 presents related work. In Sect. 15.3 the distributed algorithmic framework for the ambient assisted living system is presented. In Sect. 15.4 details about the implementation of the system in a real-world scenario are provided. Results of simulation tests are shown in Sect. 15.5, while preliminary real-world experiments using the proposed system are described in Sect. 15.6. Finally, conclusions are drawn in Sect. 15.7.

2 Related Work

In the last few years, many researchers have focused their attention on Ambient Assisted Living (AAL) technologies [10, 17]. Among the several research challenges in the AAL domain, one of the main issues concerns the monitoring of people's activities [9]. Such a scenario relies on accurate and robust tracking of people in the environment, which can be achieved by exploiting the most recent techniques in multi-target tracking over distributed architectures. Several papers have addressed the problem of multi-target tracking by means of distributed camera networks. In [20], for example, the Kalman–Consensus filter [14] is used to fuse the information coming from each camera of the network in a decentralized way. The presence of a consensus step significantly increases the system performance, as shown by experimental results. In [19] an extension of [20] to wide-area scene understanding is presented. To optimize the dynamic scene analysis, a control framework for a PTZ camera network is introduced. In [21], a survey of distributed multi-camera systems for target tracking on planar surfaces is provided. In [22], a review of distributed algorithms for several computer vision applications is presented, emphasizing the advantages of distributed approaches over centralized ones. As a basic principle, in distributed estimation each node of the network locally estimates the state of a dynamical process using information provided by its local sensor and by a subset of nodes of the network, called neighbors [24]. Several approaches to distributed estimation in sensor networks can be found in the literature. Their common characteristic is the presence of an agreement step that minimizes the discrepancy among sensory nodes [2, 3, 7].

While the use of multiple sensors increases reliability and effectiveness in large environments, it poses problems related to the need for infrastructure that can be cumbersome and expensive. This infrastructure can be reduced by exploiting the flexibility of moving sensors mounted on semi- or fully autonomous vehicles, which can be employed as individual agents or organized in teams to provide intelligent distributed monitoring of broad areas. Mobile sensors may significantly expand the potential of AAL technologies beyond the traditional passive role of event detection and alarm triggering from a static point of view. Mobile robots can actively interact with the environment, with humans or with other robots to accomplish more complex cooperative actions [1, 23]. Nevertheless, mobile surveillance devices based on autonomous vehicles are still at an early stage of development and many issues are currently under investigation [5, 6, 12].

3 Distributed Ambient Assisted Living

In this chapter a Distributed Ambient Assisted Living (DAAL) framework is introduced. The proposed DAAL system is a multi-agent heterogeneous network for distributed monitoring of people in an indoor environment. It is composed of a network of fixed cameras, which execute surveillance tasks in areas of particular interest, and mobile robots, which perform local and specific monitoring tasks to completely cover the environment. Integration among the various agents, fixed and mobile, is performed via a distributed control architecture that uses a wireless network as the communication channel. In this section, first, the target detection algorithms used by the fixed and mobile agents are described. Then, the distributed target tracking algorithm is presented.

3.1 Target Detection Using Fixed Cameras

Fixed cameras of the DAAL system are distributed in different locations of the environment to optimize the target detection task. Each fixed node is equipped with the following functionalities [9].

Motion Detection. The binary shape of moving objects (e.g., people) is extracted. Specifically, a statistical background model is generated by evaluating the mean value and standard deviation of each pixel. Then, foreground moving regions are detected by estimating, for each pixel, the similarity between the current frame and the background model.
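A minimal sketch of such a per-pixel statistical model, assuming grayscale frames stored as NumPy arrays and a deviation threshold k that is not specified in the chapter:

```python
import numpy as np

class StatisticalBackgroundModel:
    """Per-pixel Gaussian background model: mean and standard deviation
    are estimated over an initial sequence of background frames."""

    def __init__(self, background_frames, k=2.5):
        stack = np.stack(background_frames).astype(np.float32)  # (N, H, W)
        self.mean = stack.mean(axis=0)
        self.std = stack.std(axis=0) + 1e-6   # avoid division by zero
        self.k = k                            # deviation threshold (assumed)

    def foreground_mask(self, frame):
        # A pixel is marked as foreground when it deviates from the
        # background model by more than k standard deviations.
        diff = np.abs(frame.astype(np.float32) - self.mean)
        return diff > self.k * self.std
```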

Shadow Removal. This task is necessary because foreground pixels may correspond not only to real moving objects, but also to their shadows. Shadow pixels need to be removed, as they alter the real shape of objects and decrease the precision of their localization. Finally, a connectivity analysis is performed to aggregate pixels belonging to the same moving object.
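The chapter does not detail the shadow test; a common HSV-based heuristic (shadow pixels keep the background hue and saturation but are darker) could look like the sketch below, where all thresholds are illustrative assumptions:

```python
import cv2
import numpy as np

def remove_shadows(frame_hsv, bg_hsv, fg_mask,
                   alpha=0.4, beta=0.9, tau_s=60, tau_h=50):
    """Drop foreground pixels whose value ratio and hue/saturation
    differences indicate a shadow, then run connectivity analysis."""
    v_ratio = frame_hsv[..., 2] / (bg_hsv[..., 2].astype(np.float32) + 1e-6)
    d_s = np.abs(frame_hsv[..., 1].astype(int) - bg_hsv[..., 1].astype(int))
    d_h = np.abs(frame_hsv[..., 0].astype(int) - bg_hsv[..., 0].astype(int))
    shadow = (v_ratio >= alpha) & (v_ratio <= beta) \
             & (d_s <= tau_s) & (d_h <= tau_h)
    cleaned = fg_mask & ~shadow
    # Connectivity analysis: label pixels belonging to the same object.
    n_labels, labels = cv2.connectedComponents(cleaned.astype(np.uint8))
    return cleaned, labels
```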

Object Tracking. The detected moving objects, after shadow removal, are tracked over time. Statistical information (tracked-object lifetime) and spatial information are extracted for each of them. This task enables the association of each moving region with the corresponding target object, based on its appearance. Furthermore, it reduces false detections due to noise or light reflections.
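As an illustration of the association step only, the sketch below uses centroid distance as a stand-in for the appearance-based matching described above; the track and detection objects are hypothetical:

```python
import numpy as np

def associate(tracks, detections, max_dist=50.0):
    """Greedy nearest-neighbor association of detected moving regions to
    existing tracks. Unmatched detections would start new tracks, and
    long-unmatched tracks would be dropped (not shown)."""
    unmatched = list(range(len(detections)))
    for t in tracks:
        if not unmatched:
            break
        dists = [np.linalg.norm(t.centroid - detections[i].centroid)
                 for i in unmatched]
        j = int(np.argmin(dists))
        if dists[j] < max_dist:
            t.update(detections[unmatched.pop(j)])
            t.lifetime += 1   # tracked-object lifetime statistic
```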

3D Moving Object Localization. The intersection of the central axis of the rectangular bounding box containing the moving region with its lower side provides the estimate of object position on the ground plane. The corresponding 3D position is evaluated using a pre-calibrated homographic matrix between the image plane and the 3D ground plane.
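Assuming the pre-calibrated homography is stored as a 3 × 3 NumPy matrix H mapping image points to ground-plane coordinates, this step reduces to a few lines:

```python
import numpy as np

def ground_position(bbox, H):
    """Project a moving region to the ground plane. bbox = (x, y, w, h)
    in image coordinates; the foot point is the intersection of the
    box's central axis with its lower side."""
    x, y, w, h = bbox
    foot = np.array([x + w / 2.0, y + h, 1.0])  # homogeneous image point
    ground = H @ foot                           # apply homography
    return ground[:2] / ground[2]               # normalize
```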

3.2 Target Detection Using the Mobile Robots

Robots used in the DAAL system are equipped with an RGB-D sensor, namely the Microsoft Kinect camera, to detect people in the environment [15]. Thanks to its 3D data representation, the Kinect sensor makes it possible to robustly track the positions of a group of people in the environment and to detect their movements [18]. Furthermore, when the Kinect is mounted on a robot, the relative motion between the camera and the global reference system should be taken into account in order to obtain better results.

In the DAAL system, people are identified in the scene captured by the Kinect camera onboard the robot and then a single person of interest is selected. Once the robot is focused on a person, a tracking algorithm keeps track of that person's position and a control algorithm moves the robot toward the person in order to improve the tracking performance. Thus, each mobile robot is equipped with the following functionalities.

Person Tracking. Through an algorithm for segmentation and recognition of the human skeleton, which takes advantage of both the RGB color information and the depth information of the image [13], the robot detects all the people in the scene. Then a person of interest is selected, for instance the person nearest to the robot or the one who made a given gesture. To improve the estimate of the position of the person of interest in realistic, noise-affected situations, the tracking algorithm smooths the raw measurements with a Kalman filter. Finally, the filtered position is used as the input for the motion controller.
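A minimal constant-velocity Kalman filter for the 2D person position is sketched below; the noise parameters are chosen for illustration, as the chapter does not report the actual filter tuning:

```python
import numpy as np

class ConstantVelocityKF:
    """2D constant-velocity Kalman filter for smoothing the person
    position from the skeleton tracker. State: [x, y, vx, vy]."""

    def __init__(self, dt=0.1, q=0.05, r=0.1):
        self.x = np.zeros(4)
        self.P = np.eye(4)
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)   # process noise (assumed)
        self.R = r * np.eye(2)   # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        self.predict()
        y = np.asarray(z) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```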

Person Following. The person-following controller is basically a motion control algorithm that requires two different inputs: the updated position of the person of interest, given by the person tracking algorithm, and information about static and dynamic obstacles. In particular, the algorithm feeds a trajectory planner with the position of the person as the arrival point, while the robot remains aware of the obstacles in the environment through a predefined map. In this way, the person follower controls the motion of the robot toward the position of the person of interest, exploiting the robust performance of a map-based navigation system, as sketched below.
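Under ROS, one plausible realization of this behavior is to forward the tracked position as a navigation goal to the standard move_base action server, which plans on the known map and avoids obstacles. This is a sketch under that assumption, not necessarily how the chapter's motion controller is implemented:

```python
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def follow_person(x, y, frame="map"):
    """Send the tracked person's position as a navigation goal; the ROS
    navigation stack handles planning and obstacle avoidance. Assumes
    rospy.init_node() has already been called."""
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()
    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0  # face along the map x-axis
    client.send_goal(goal)
```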

3.3 Distributed Target Tracking Algorithm

At the final stage of the DAAL system, all the information coming from the various sensors is fused using a distributed framework. To this end, the DAAL system exploits the fully distributed Consensus-based Distributed Target Tracking (CDTT) algorithm [16] to enhance people tracking performance in a heterogeneous sensor network. The CDTT consists of a two-phase iterative procedure: an estimation step and a consensus step. In the estimation phase, each node of the network produces an estimate of the position of the target. If the node can directly take a measurement, it estimates the target position by means of a Kalman filter; otherwise, it predicts the target motion according to the linear motion model embedded in the Kalman filter. In the consensus phase, all the estimates in the network converge to a common value via a max-consensus protocol, performed on a measurement accuracy metric called the perception confidence value. This approach has been proved to provide good performance in heterogeneous sensor networks composed of nodes with limited sensing capabilities [7]. The CDTT approach is totally distributed, as it does not involve any form of centralization. Moreover, it guarantees the agreement of the network nodes on the target position. The reader is referred to [16] for further details.
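Schematically, one CDTT time step could be organized as follows. The node interface (sense, kf, confidence_of) is hypothetical, and the sketch only mirrors the two phases described above; see [16] for the actual algorithm:

```python
def cdtt_step(nodes, neighbors, rounds):
    """One CDTT iteration: local estimation, then max-consensus on the
    perception confidence value. neighbors[i] lists the indices of the
    nodes that node i can communicate with."""
    # Estimation phase: nodes that see the target update their Kalman
    # filter; the others run its linear motion model in open loop.
    for n in nodes:
        z = n.sense()  # None when the target is out of the sensing area
        n.estimate = n.kf.update(z) if z is not None else n.kf.predict()
        n.confidence = n.confidence_of(z)

    # Consensus phase: after a number of rounds at least equal to the
    # network diameter, every node holds the estimate of the most
    # confident node, so all nodes agree on the target position.
    for _ in range(rounds):
        snapshot = [(n.confidence, n.estimate) for n in nodes]
        for i, n in enumerate(nodes):
            for j in neighbors[i]:
                if snapshot[j][0] > n.confidence:
                    n.confidence, n.estimate = snapshot[j]
```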

4 System Implementation

The DAAL system was implemented in the Robotics Laboratory of CNR ISSIA, Bari, Italy. In this section, details about the implementation of the system are provided.

In the DAAL network, each agent corresponds to an independent software component that is executed on the robot's embedded PC for the mobile agents and on a workstation for the fixed cameras. This architectural solution provides several advantages. First, the system is fully plug-and-play: to increase the number of sensors it is sufficient to add a new camera (with its IP address) to the network, with no further effort required to program new cameras. Software maintenance is easy and immediate, avoiding the broadcast updating of each camera's software. Moreover, algorithms for motion detection and shadow removal, such as the ones explained in the previous section, are based on the evaluation of pixel correlation, which requires a very fast processing unit to run in real time and cannot be performed efficiently on embedded cameras. The team of agents forms a peer-to-peer network whose nodes differ only in their sensing capabilities. In particular, every agent is able to detect an event (e.g., to perceive moving people or objects) and to localize it (e.g., tracking the position of a person) in the environment using one or more sensor devices; in addition, mobile agents are able to execute tasks through their actuators. Detailed descriptions of the fixed and mobile nodes are given in the following.

Fig. 15.2 Schematic representation of interconnections among nodes composing the Fixed Node module

4.1 Setup of Fixed Nodes

The fixed agent software runs on a workstation linked to each camera by the network infrastructure. The schematic representation of interconnections among nodes composing the Fixed Node module is shown in Fig. 15.2. For each connected camera, an autonomous thread integrated in the Robot Operating System (ROS) framework is implemented to execute the well-defined, ordered tasks explained in Sect. 15.3.

Fig. 15.3 The measurement error model for one of the cameras: \(f(x) = a\,e^{bx}\). The error is limited when the sensor-target distance is below 7–8 m and increases beyond that. This suggests that the cameras should be deployed so that the maximum distance to the observed object stays under 7–8 m

Moreover, in a real-world implementation, the measurement error of the cameras should be taken into account. This error depends on the distance of the target from the sensor. To characterize it, an error model is fitted to a series of measurement errors obtained by comparing the position measured by the camera with the real position of the target (retrieved by means of a theodolite). Figure 15.3 shows the model, fitted as an exponential function, for one of the cameras:

$$\begin{aligned} f(x) = a\,e^{bx}, \end{aligned}$$
(15.1)

where \(a = 0.009269 \pm 0.0074\) m and \(b = 0.3258 \pm 0.0696\) m\(^{-1}\) are the values of the coefficients (with 95 % confidence bounds) defining the actual function \(f(x)\).
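Such a fit can be reproduced, for example, with SciPy's curve_fit; the calibration samples below are hypothetical placeholders for the theodolite-referenced data:

```python
import numpy as np
from scipy.optimize import curve_fit

def error_model(x, a, b):
    """Exponential error model f(x) = a * exp(b * x) of Eq. (15.1)."""
    return a * np.exp(b * x)

# Hypothetical calibration samples: (distance [m], position error [m])
# obtained by comparing camera estimates with theodolite ground truth.
distances = np.array([2.0, 3.5, 5.0, 6.5, 8.0, 9.5])
errors = np.array([0.018, 0.029, 0.047, 0.078, 0.125, 0.210])

(a, b), cov = curve_fit(error_model, distances, errors, p0=(0.01, 0.3))
stderr = np.sqrt(np.diag(cov))   # scale standard errors for 95% bounds
print(f"a = {a:.5f} +/- {1.96 * stderr[0]:.5f} m")
print(f"b = {b:.4f} +/- {1.96 * stderr[1]:.4f} 1/m")
```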

Fig. 15.4 Schematic representation of interconnections among nodes composing the Mobile Node module

4.2 Setup of Mobile Nodes

The mobile nodes of the network consist of mobile robots. Each mobile agent is equipped with sensory devices to interact with the environment. Every node is able to localize itself in the environment and to navigate safely, avoiding static and dynamic obstacles. It is also able to identify and track the position of a target in the environment. ROS has been adopted as the framework for communication management, sensor acquisition and actuator control on the mobile robots. It is an open-source framework that provides several ready-to-run packages for controlling all the devices of a robotic platform. ROS provides a Navigation Stack, which enables the robot to navigate in a known environment while avoiding obstacles, as well as sensor management packages [11]. The most important characteristic of ROS is its modular structure, which makes it possible to modify or substitute individual modules. In order to develop a customized monitoring architecture, new functionalities have been developed and added to the native ROS framework. Specifically, the structure of the ROS navigation stack has been modified to add surveillance capabilities to the mobile nodes. A coordinate transformation from local to global coordinates was also introduced for the people tracking task.

In Fig. 15.4 a schematic representation of the mobile node module is reported. All ROS nodes run on the on-board laptop, except for sicktoolbox_wrapper and p2os_driver, which run on the embedded PC of the robot. As can be seen, the Navigation Stack of ROS produces robot position estimates, as well as information about obstacles, on the basis of laser measurements. The ROS node motion_control, implemented by our research team, sends velocity references to the p2os_driver ROS node, which is responsible for robot guidance. The people_tracker node estimates the relative position of people with respect to the robot, on the basis of the skeleton information received from openni_tracker. The relative coordinates of detected people, transformed into the world reference frame, provide the input data to the distributed target tracking algorithm.
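The local-to-global transformation mentioned above can be performed in ROS with tf2; this is a sketch with assumed frame names ("camera_link", "map"), since the chapter does not list the actual conventions:

```python
#!/usr/bin/env python
import rospy
import tf2_ros
import tf2_geometry_msgs  # registers PointStamped transform support
from geometry_msgs.msg import PointStamped

def to_world(point_xyz, buf):
    """Transform a person detection from the robot's camera frame into
    the world (map) frame using the tf2 buffer."""
    p = PointStamped()
    p.header.frame_id = "camera_link"     # assumed camera frame name
    p.header.stamp = rospy.Time(0)        # latest available transform
    p.point.x, p.point.y, p.point.z = point_xyz
    return buf.transform(p, "map", timeout=rospy.Duration(0.5))

if __name__ == "__main__":
    rospy.init_node("people_world_publisher")
    buf = tf2_ros.Buffer()
    tf2_ros.TransformListener(buf)  # fills the buffer in the background
```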

5 Simulation Results

This section deals with the evaluation of the DAAL system through a campaign of numerical simulations, focused on the performance of the tracking task. The setup and results of the simulations are described below.

5.1 Simulation Setup

The DAAL system performance is tested in a realistic scenario with a setup similar to the real one. A target moving inside a given environment, according to various random trajectories, is simulated for the target tracking task. The analysis focuses on the effect of mobile nodes in the network; specifically, the simulations investigate whether the presence of mobile nodes increases the tracking performance according to a given evaluation index. A network composed of three cameras, named \(C_1\), \(C_2\), \(C_3\), and one robot, \(R_1\), is assumed (as in the real DAAL system). Heterogeneity in the sensor network is due to the different sensing ranges of the sensors, set on the basis of the real devices' characteristics. Specifically, the sensing area is defined as a circular sector in front of the sensor, with radius \(r_{C_1} = 10\) m for camera \(C_1\), \(r_{C_2} = 8.5\) m for camera \(C_2\), \(r_{C_3} = 7\) m for camera \(C_3\), and \(r_{R_1} = 5\) m for robot \(R_1\). The sensors are modeled as range-bearing sensors, with measurement error depending on the distance and bearing of the target relative to the sensor. In order to assess the system performance, attention is focused on the tracking accuracy, evaluated as the discrepancy between the estimated and actual target trajectory. Specifically, as a metric for target tracking accuracy, the mean square error (in norm) is computed as:

$$\begin{aligned} \text{ MSE }=\frac{1}{k_f} \sum _{k=1}^{k_f} \Vert \overline{\varvec{\xi }}_i(k) - \varvec{\xi }(k) \Vert ^2 \end{aligned}$$
(15.2)

where \(k\) is the simulated discrete time, \(k_f\) is the duration (in time samples) of the target trajectory, \(\varvec{\xi }(k)\) is the actual target position at time \(k\), and \(\overline{\varvec{\xi }}_i(k)\) is the global target position estimate computed by the \(i\)th sensor of the network. It should be noted that the estimated target position is the same for every node of the network, since after convergence of the consensus step of the CDTT algorithm all the network nodes share the same information about the target location.
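For concreteness, the two simulation ingredients described above can be sketched as follows; the sector field-of-view angle is an assumption, since the chapter specifies only the sensing radii:

```python
import numpy as np

def in_sensing_area(sensor_xy, heading, target_xy, r, fov=np.deg2rad(60)):
    """Circular-sector footprint test: the target must lie within range r
    and within +/- fov/2 of the sensor's heading (fov is assumed)."""
    d = np.asarray(target_xy, dtype=float) - np.asarray(sensor_xy, dtype=float)
    dist = np.linalg.norm(d)
    bearing = np.arctan2(d[1], d[0]) - heading
    bearing = (bearing + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]
    return dist <= r and abs(bearing) <= fov / 2

def mean_square_error(estimates, truth):
    """MSE of Eq. (15.2): mean squared Euclidean distance between the
    consensus estimates and the true positions over k_f samples."""
    e = np.asarray(estimates) - np.asarray(truth)   # shape (k_f, 2)
    return float(np.mean(np.sum(e ** 2, axis=1)))
```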

Fig. 15.5 DAAL system evaluation: simulation of a randomly generated target trajectory for a network of \(3\) fixed nodes (in black) and \(1\) mobile node (in green). The red line indicates the actual trajectory of the target, while the blue dots are the estimated positions of the target returned by the CDTT algorithm. a The mobile robot surveys an assigned area of interest of the environment. b As soon as the target enters this area, the mobile node approaches it to perform a more accurate measurement

5.2 Numerical Results

The tracking performance of the DAAL system is analyzed using the simulation setup described in Sect. 15.5.1. A campaign of Monte Carlo simulations is run in two different cases. In the first case, the mobile robot is kept at a fixed position, thus acting as a fourth static node. In the second case, the mobile robot can move in the surroundings of its initial location.

A set of 250 random target trajectories is run both for the case of static nodes only and for the scenario including the mobile node. The tracking simulation for one of the trajectories is shown in Fig. 15.5, which refers to the simulation performed using a mobile node (green arrow) in addition to the static ones (black arrows). A solid red line denotes the actual target trajectory, while the estimated positions at each time step \(k\) are marked by blue circles. Initially (Fig. 15.5a), the target is sensed by the static node \(C_2\), while the other nodes are aware of the target position thanks to the consensus convergence. As soon as the target enters the area surveyed by the mobile node (Fig. 15.5b), the latter approaches the target to perform a more accurate measurement.

The numerical results of the simulation campaign are reported in Table 15.1, showing a mean square error of 0.2339 m and 0.1523 m for the static network and for the network including the mobile node, respectively. As can be noted, the presence of a mobile node increases the tracking accuracy. This is mainly due to two reasons: first, the mobile node can approach the target, measuring its position with higher accuracy according to the adopted range-bearing sensor model; second, the mobile node can track the target in areas hidden from the fixed nodes, thus increasing the overall coverage of the network.

Table 15.1 Average MSE and variance in the tracking of 250 repeated random target trajectories: comparison between results with and without the presence of a mobile node

6 Experimental Results

In this section, the DAAL system is validated through experimental tests conducted in a real-world scenario. The evaluation is focused on the distributed target tracking task. First, we describe the environment setup, then the system and the results of the experimental tests are presented.

Fig. 15.6 Map of one corridor of the office with the positions of three static cameras (red circles) and one mobile agent (green triangle) overlaid

6.1 Environment Setup

The environment setup used for the experimentation of the system is shown in Fig. 15.6. The picture shows the map of a corridor of the ISSIA-CNR building, as built by the gmapping node available in ROS using the laser data acquired by a mobile robot during a complete exploration of the environment. In this experimentation, three fixed cameras and one mobile robot were employed. The positions of the fixed cameras (\(C_1\), \(C_2\), \(C_3\)) and of the mobile robot (\(R_1\)) are overlaid on the map. The mobile agent is able to localize itself in the environment and, using its on-board sensors, to carry out surveillance tasks such as people detection and tracking. The cameras are calibrated, therefore events detected in the image plane can be located in the real world and their positions can be communicated to the mobile agent. The mobile robot can explore areas that are unobservable by the fixed cameras and improve the accuracy in detecting events by reaching appropriate positions in the environment. Hence, the proposed system can be useful to reduce the number of fixed sensors or to monitor areas (e.g., cluttered environments) in which the field of view of the fixed cameras may be temporarily and dynamically reduced.

Fig. 15.7 The nodes of the network. On the left, the mobile agent PeopleBot, equipped with a SICK LMS200 laser range-finder and a Kinect. On the right, two different Axis cameras: on the top, a megapixel Axis IP color camera with 1,280 \(\times \) 1,024 pixel resolution; on the bottom, an Axis IP color camera with \(640\times 480\) pixel resolution

6.2 System Setup

The fixed nodes are three wireless IP cameras (\(C_1\), \(C_2\), \(C_3\)) with different spatial resolutions, located at different points of the environment (see map in Fig. 15.6). \(C_2\) and \(C_3\) are Axis IP color cameras with a \(640\times 480\) pixel resolution and an acquisition frame rate of 10 frames/s. \(C_1\) is a megapixel Axis IP color camera with 1,280 \(\times \) 1,024 pixel resolution and a full-frame acquisition rate of 8 frames/s (see Fig. 15.7, on the right). A calibration step to estimate intrinsic and extrinsic parameters was performed for each camera using the Matlab Calibration Toolbox, so that camera coordinates can be mapped to the global world reference frame provided by the map built by the mobile robots.

Fig. 15.8 Trajectory 1. The measurements of the position of the target carried out by each sensor of the network (a) and the CDTT trajectory recovered online and in a distributed fashion by the network (b)

Fig. 15.9 Trajectory 2. The measurements of the position of the target carried out by each sensor of the network (a) and the CDTT trajectory recovered online and in a distributed fashion by the network (b)

The mobile agent (denoted as \(R_1\) in Fig. 15.6) consists of a PeopleBot mobile robot platform equipped with a laser range-finder, a Kinect, and an on-board laptop (see Fig. 15.7, on the left). The SICK laser is connected to the embedded robot control unit. The Kinect camera and the PeopleBot control unit are connected to the laptop via a USB cable and a crossover cable, respectively. The laser range-finder is used to build a map of the environment and to localize the vehicle. The Kinect is used both for navigation (e.g., obstacle avoidance) and for high-level tasks such as people detection and tracking.

The DAAL system performance is tested in a real-time application in which a network of three cameras and a robot is used to monitor a large environment and track the position of a given target. The target to be tracked is a person moving in the environment along two given trajectories. The robot is equipped with an on-board Kinect camera whose field of view is \(58^\circ \) horizontal, \(45^\circ \) vertical, and \(70^\circ \) diagonal, and whose operational range is between \(0.8\) m (\(2.6\) ft) and \(3.5\) m (\(11\) ft) [4].

6.3 Results of Experiments

In the experiments, the target follows two different trajectories in the environment, as shown in Figs. 15.8 and 15.9. Specifically, in Figs. 15.8a and 15.9a the target trajectory is denoted by a red line, while the target positions as estimated by the three cameras and the robot are denoted by different markers. In Figs. 15.8b and 15.9b the target trajectory (red line) is compared with the trajectory (blue dots) estimated by the CDTT algorithm. As in the simulations, the estimated target position is the same for every node of the network, since after convergence of the consensus step of the CDTT algorithm all the network nodes share the same information about the target location. In order to quantify the tracking performance, we assume that the target moves with a constant velocity and we calculate the MSE, as done for the simulated case. Results are collected in Table 15.2, showing a mean square error of \(1.15\) m and \(0.75\) m for Trajectory 1 and Trajectory 2, respectively. Figure 15.10 shows two frames acquired by the Kinect camera on the robot during the tracking of Trajectory 1, depicted in Fig. 15.8.

Table 15.2 Average MSE and variance in the tracking of a person moving in the laboratory by means of a network of \(4\) nodes, \(3\) fixed and \(1\) mobile
Fig. 15.10 Two different instants of the tracking of Trajectory 1, acquired from the Kinect sensor

7 Conclusions

In this chapter, a novel activity monitoring architecture for Ambient Assisted Living applications has been introduced. The main contribution is the combination of fixed and mobile nodes in the monitoring network: mobile sensors enable complete coverage of large environments with fewer fixed sensors and increase the accuracy of measurements by reaching the most favorable position to observe the current target. The global logical architecture used by the system has been presented, and the software agents developed to run on fixed and mobile nodes have been described. Simulations of the behavior of the system in a realistic environment (with sensor parameters closely matching the characteristics of the real fixed and mobile sensors) have been carried out using a distributed target tracking algorithm developed by some of the authors, and their results have been reported. Furthermore, preliminary experimental results obtained with the real sensors in our lab environment have been presented, showing the feasibility and effectiveness of the proposed system.