1 Introduction

The development of automated systems capable of interpreting and understanding the dynamics of objects and events in multimedia environments is intended to become a valuable tool for modern society and the smart world. New systems incorporating Wireless Multimedia Sensor Networks (WMSNs) are increasingly being applied to smart cities; thus, the smart cities concept is becoming a reality thanks to the deployment of a variety of these technologies. One of the main research areas where WMSNs can be widely applied is video surveillance, since these technologies allow easier installation and lower-maintenance systems. Video surveillance environments generate a huge volume of information, and distributed architectures make it possible to manage this information in different locations and to share only the relevant parts. Industry reports predict that the global video surveillance market will grow at an annual rate of 16.97% in the coming years [34], especially in the case of smart cities. The latest reports indicate that future surveillance solutions will focus on the introduction of video analysis and the concept of cloud computing. In fact, intelligent video surveillance is one of the most active research areas in computer vision, and many different types of video surveillance systems have been developed [20, 22, 24, 37].

A video surveillance system is a combination of hardware and software components used to capture and analyze video. The primary aim of these systems is to monitor the behaviour of objects (usually people or vehicles) in order to check for suspicious or abnormal behaviours, using the information extracted about those objects (physical features, trajectories, speed…) from a variety of sensors (such as surveillance cameras). Sensor sensitivities are decisive for extracting accurate information: size in pixels for high-resolution 2D or 3D cameras, temperature in degrees for thermal cameras, and time in seconds for any activity in the scene. Within its operating range, an ideal sensor has a constant sensitivity; in saturation, a sensor can no longer respond to changes. For real-world applications, the scene composition (e.g., occluded or abandoned objects) is also an important issue for activity detection, since a video surveillance system's performance is highly affected by the actual scene composition and the sensor sensitivities (see, for instance, [11]).

A review of computer vision and pattern recognition technologies used in intelligent multi-camera video surveillance is presented in [35]. A summary of the evolution of surveillance system generations, as well as previous developments on object detection, recognition, tracking, behaviour analysis and storage, can be found in [33]. However, there is still no standard solution that supports the entire life cycle of these information systems: acquisition of information, video processing, categorization and management, including automatic detection of risk situations and decision support, all based on context information according to the specific characteristics of the monitored environment. In addition, most of these systems do not allow automatic reconfiguration based on previous behaviours in the system, for instance, switching on a part of the system (one or several sensors) given an alarm generated in another part based on the combined information from one or several sensors.

INVISUM (Intelligent VideoSurveillance System) is a project funded by the Spanish Ministry of Economy and Competitiveness focused on the development of an advanced and complete security system [8]. The goal of the project is to develop an intelligent video surveillance system that addresses the scalability and flexibility limitations of current video surveillance systems, incorporating new compression techniques, pattern detection, decision support, and advanced architectures to maximize the efficiency of the system. Thus, a modular platform that facilitates the integration of modules from different manufacturers and based on various technologies is desired. In this paper we present a context-aware wireless distributed architecture for a real-time intelligent surveillance system that is able to detect and track multiple people and vehicles, monitoring their behaviours in a combined indoor-outdoor environment. The system has been designed so that many different types of devices (sensors and computers) can easily be connected to each other. Thus, the architecture described below is aimed at allowing wireless (as well as wired) connections and a flexible distribution of processes in a cloud-based, highly modular environment.

The paper is organized as follows. Section 2 details the wireless distributed architecture of the proposed system. Section 3 describes some developed high level components that use this architecture. A case study is presented in Section 4. Section 5 summarizes a discussion of the main results. Section 6 provides the concluding remarks.

2 System architecture

The proposed intelligent video surveillance architecture has been designed to simultaneously manage a variety of scenarios, using several scenes with several wireless interconnected sensors in each scene; that is, a WMSN is designed. The data extracted from each multimedia sensor in the network are processed to detect abnormal activities. In such a case, an alarm is generated and all the relevant information regarding this alarm is stored for later analysis. By fusing alarm information, more complex alarms can be generated later in the system.

Figure 1 presents the proposed system architecture based on six modules: Sensor Manager, Information Extractor, Activities, Scene Managers, System Core and High Level Components. Each module includes processes, and the different modules cooperate to achieve the system aims. The modules in orange correspond to data-processing modules, while the modules in blue correspond to data-managing modules. To transmit the information, there is a data flow (green lines), an alarm signal flow (red lines), and a control signal flow (blue lines).

Fig. 1 Proposed Architecture for the INVISUM project

In a nutshell, the system performs as follows. First, the Sensor Manager configures the sensors with the set-up provided by the Scene Manager. Then, the Sensor Manager reads the data from the sensors and sends these data to the Information Extractors in the proper format. The Information Extractor extracts the relevant information from the data given by the Sensor Manager. This information is sent to the Activities modules and to the corresponding Scene Manager. The Activities modules deal with the security activities defined for the proposed scenarios. This paper focuses on the security issues regarding the activities considered in the INVISUM project, i.e., a combined indoor-outdoor video surveillance environment. The main task of the Activities is to detect abnormal activities and to generate alarms when needed. When an alarm is generated, it is sent to the corresponding Scene Manager, which processes it and transmits the result to the System Core. The Scene Manager modules supervise the data, configurations and alarms of each scene. Notice that there are as many Scene Manager modules as possible scenes in the whole system. The Scene Managers configure the Sensor Managers, the Information Extractors and the Activities modules with the configuration given by the System Core for each scene. The System Core receives all the data and alarms from the Scene Managers, and manages them before transmitting global alarms and data to the High Level Components modules. In addition, the System Core can receive external configurations from the High Level Components and can also take some decisions regarding the system configuration. When a global alarm is generated, the System Core generates additional data and alarms, which are sent to the High Level Components modules in order to enrich the information available there.

To build this architecture, the Robot Operating System (ROS) was used [23]. ROS is a set of software libraries and tools that provides a structured communications layer above the host operating systems of a heterogeneous set of computers. Communications provided by ROS are established through a TCP-based network; thus, all the communications in the proposed architecture are wireless. Since the chosen communication protocol transmits just the useful information, fast inter-module communication is expected. Notice that all modules in the system run concurrently and asynchronously. Given the distributed nature of the designed architecture, each module can run on a different computer.
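
As an illustration, a minimal ROS 1 node for part of a Scene Manager could look as follows. This is only a sketch: the topic names, the plain-string message type and the trivial forwarding rule are assumptions for illustration, not the actual INVISUM implementation.

```python
# Minimal ROS 1 (rospy) sketch of a Scene Manager node.
# Topic names and the String message type are illustrative assumptions.
import rospy
from std_msgs.msg import String

def main():
    rospy.init_node('scene_manager_a')
    # Scene alarms are published towards the System Core.
    alarm_pub = rospy.Publisher('/scene_a/alarms', String, queue_size=10)

    def on_activity_alarm(msg):
        # Simplest combination rule: forward any activity alarm as a scene alarm.
        alarm_pub.publish(String(data='scene_a: ' + msg.data))

    # Activity modules publish their alarms on this topic (assumed name).
    rospy.Subscriber('/scene_a/activity_alarms', String, on_activity_alarm)
    rospy.spin()  # callbacks fire asynchronously while the node runs

if __name__ == '__main__':
    main()
```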

Given that one of the main tasks of the Scene Managers and the System Core is to fuse all the relevant information, the power and capability requirements for the WMSN are greater here than in the other modules. A typical distribution could be the following: the different Sensor Managers could be integrated on the sensors themselves or in low-capability computers (such as, for instance, a Raspberry Pi); the Information Extractors, the Activities and the High Level Components could be distributed over several standard computers; the Scene Managers and the System Core, however, should run on a more powerful and reliable dedicated computer (such as a server).

In Fig. 2, a more detailed representation of the system architecture is presented. This figure includes the components of each module, the interfaces and the connections. The proposed architecture has been designed for the INVISUM project [8]. One of the main aims of the project is to build a global, flexible and scalable video surveillance system that operates in a multiple-sensor environment over several indoor-outdoor scenarios. The modules, processes and signal flows presented in Fig. 2 respond to such a complex video surveillance environment. In the next sections, these modules are detailed.

Fig. 2 Detailed Proposed Architecture for the INVISUM project

2.1 Sensor manager

The Sensor Manager modules read the data from the sensors through a Reader process, and configure the sensors using a Configuration process. Notice that there is a Sensor Manager for each kind of sensor in the system. The idea behind these modules is to achieve a plug-and-play sensor connection. One of the tasks of the Sensor Managers is to homogenize data formats. For instance, if there are several kinds of video cameras in the system, the Sensor Manager module of each camera translates its video format into a unique format established by the Sensor Manager configuration. Each Sensor Manager provides data to the corresponding Information Extractor. For example, when the sensor is a camera, the Information Extractor processes the frames in order to extract the relevant information. Notice that, when the Sensor Manager is accompanied by algorithms capable of automatically extracting and processing useful sensor information, no Information Extractor action is needed; in such a case, the useful information is sent directly from the Sensor Manager to the Activities module. For instance, fingerprint sensors usually provide a vector of features that describes the individual pattern of interest (see [12] for a complete description of fingerprint recognition algorithms). These patterns are sent to the corresponding activity (Access Control). Another example is the thermal sensors, which provide the temperature of several points in the corresponding scene (see [7] for a complete description of thermal sensors). As before, these features are sent to the corresponding activity (Fire Detector).

Furthermore, the Configuration process translates the system configuration given by the Scene Manager into the proper sensor configuration format.
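
For instance, the Reader process of a camera Sensor Manager could homogenize heterogeneous video inputs roughly as sketched below. The target resolution and color space are assumed configuration values, not the project's actual format.

```python
# Sketch of a camera Sensor Manager Reader homogenizing frame formats.
import cv2

TARGET_SIZE = (640, 480)  # (width, height): assumed common output format

def homogenize(frame):
    """Translate any incoming frame into the unique configured format."""
    if frame.ndim == 2:  # single-channel input (e.g., a thermal image)
        frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)
    return cv2.resize(frame, TARGET_SIZE)

cap = cv2.VideoCapture(0)  # any camera source supported by OpenCV
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    unified = homogenize(frame)
    # ... send 'unified' to the corresponding Information Extractor ...
```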

2.2 Information extractor

Each Information Extractor module receives data, in the proper format, from the corresponding Sensor Manager. The configurations for the Information Extractors are set by the Scene Managers through the control signal flow. For the purposes of this project, the following processes have been considered for 2D and 3D camera sensors: Objects Information, Vehicle Features, Motion, and Faces. Several state-of-the-art methods to process the information have been used. Given a sensor and the corresponding Sensor Manager, one of these processes is included in the corresponding Information Extractor module. For instance, if the sensor is a 3D camera for access control, the process included in the corresponding Information Extractor is the Faces process; if the sensor is a 2D RGB camera at a car access control, the process included is the Vehicle Features process. These processes work on the data, and the resulting information is sent to the Activities and to the corresponding Scene Manager. Next, each process is explained in detail.

2.2.1 Objects information

This process is aimed at the extraction of information associated with 2D camera sensors. Given a sequence of images, the Objects Information process extracts objects, their features and their trajectories. We use a background subtraction method based on Gaussian Mixture Models (GMM) for object detection. After background subtraction, a bag of soft-biometric features is extracted from each detected object. These features are related to the RGB color space, grayscale statistics and histograms, geometry, the HSV color space, the co-occurrence matrix and Local Binary Patterns [17]. Other features, such as the size of the object, are also considered. The Kalman filter algorithm is used for trajectory prediction. The input parameters of the Objects Information process are the minimum object size to be considered and the background update rate, that is, the frequency at which the background is updated. These parameters are fixed by the corresponding Scene Manager Configuration process; for instance, given two different scenes, the minimum object size parameter can be different in each scene.
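
A minimal sketch of this detection and prediction step is shown below, using the OpenCV MOG2 background subtractor and a constant-velocity Kalman filter. The parameter values are illustrative stand-ins for the Scene Manager configuration, and the soft-biometric feature set of [17] is not reproduced.

```python
# Sketch: GMM background subtraction + Kalman trajectory prediction.
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
MIN_AREA = 400        # minimum object size in pixels (scene-dependent)
LEARNING_RATE = 0.01  # background update rate

kalman = cv2.KalmanFilter(4, 2)  # state: (x, y, vx, vy); measurement: (x, y)
kalman.transitionMatrix = np.array(
    [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kalman.measurementMatrix = np.array(
    [[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kalman.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2

def detect_and_predict(frame):
    mask = subtractor.apply(frame, learningRate=LEARNING_RATE)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    blobs = [cv2.boundingRect(c) for c in contours
             if cv2.contourArea(c) >= MIN_AREA]
    predicted = kalman.predict()  # predicted position before the measurement
    for (x, y, w, h) in blobs:
        centroid = np.array([[x + w / 2.0], [y + h / 2.0]], np.float32)
        kalman.correct(centroid)  # update the track with the observed centroid
    return blobs, predicted[:2].ravel()
```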

2.2.2 Vehicle features

The Vehicle Features process uses information extracted from 2D camera sensors. Typically, the sensor is placed in a parking area for access control and parking applications (see [5] for a complete state-of-the-art review). The proposed process recognizes a vehicle's license plate number from one or more images using domain information: the shape and the size of a car plate are used to locate the region of interest. For the character recognition, neural networks were trained. In addition, a vehicle brand recognition algorithm is used [10]. This is a complex task since each car brand has a unique logo, but the logo can vary in size, color, texture, etc.
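
As an illustration of the domain-based localization step, the sketch below filters contours by the expected plate shape. The aspect-ratio and size bounds are assumptions, and the trained character-recognition networks are not reproduced.

```python
# Sketch: locate license plate candidates using shape and size constraints.
import cv2

def plate_candidates(gray):
    edges = cv2.Canny(gray, 100, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        aspect = w / float(h)
        # European plates are wide rectangles; bounds are illustrative
        if 2.0 <= aspect <= 6.0 and w >= 60:
            candidates.append(gray[y:y + h, x:x + w])
    return candidates  # each crop is passed to the character recognizer
```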

2.2.3 Motion

This is a process for the extraction of information from 2D and 3D camera sensors. The Motion process extracts the well-known Motion History Image (MHI) and the Motion Energy Image (MEI) (see [1] for a complete description). The MHI gives the temporal information of the motion in the image plane, and the MEI indicates where the motion has occurred in the image plane [3]. The information extracted about vehicles and people is quite useful for the system functionality and for improving its performance.
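
A minimal numpy sketch of the MHI/MEI computation is given below; the history duration and the per-pixel difference threshold are assumed values.

```python
# Sketch: Motion History Image (MHI) and Motion Energy Image (MEI).
import numpy as np

TAU = 30    # frames of motion history retained (assumed)
DELTA = 30  # per-pixel intensity-change threshold (assumed)

def update_motion(mhi, prev_gray, gray, timestamp):
    """Update the MHI with the current frame; derive the MEI from it."""
    diff = np.abs(gray.astype(np.int16) - prev_gray.astype(np.int16))
    moving = diff > DELTA
    mhi[moving] = timestamp            # stamp pixels where motion occurs
    mhi[mhi < timestamp - TAU] = 0     # let old motion fade out
    mei = (mhi > 0).astype(np.uint8)   # MEI: where motion has occurred
    return mhi, mei
```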

2.2.4 Faces

The Faces process uses information from 2D and 3D camera sensors. For each image representation (2D, 3D), Parallel Gabor Principal Component Analysis is used for feature extraction (see [15, 29] for a complete description). These filters have the ability to extract the most significant features in a visual scene. The main challenge in the implementation of this module is the proper combination of 2D and 3D information. In [15] linear combinations of several sources of information are used. However, more complex methods could be considered [13].
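
A sketch of the 2D feature-extraction stage follows. The filter-bank parameters and the number of principal components are assumptions, and the parallel fusion scheme of [15] is not reproduced.

```python
# Sketch: Gabor filter bank followed by PCA for face feature extraction.
import cv2
import numpy as np
from sklearn.decomposition import PCA

def gabor_bank(orientations=8, wavelengths=(4, 8, 16)):
    kernels = []
    for theta in np.arange(0, np.pi, np.pi / orientations):
        for lam in wavelengths:
            kernels.append(cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5))
    return kernels

def gabor_features(face_img, kernels):
    responses = [cv2.filter2D(face_img, cv2.CV_32F, k) for k in kernels]
    return np.concatenate([r.ravel() for r in responses])

# Fit PCA on a gallery of enrolled faces, then project probe faces:
# kernels = gabor_bank()
# gallery = np.stack([gabor_features(img, kernels) for img in enrolled_faces])
# pca = PCA(n_components=50).fit(gallery)
# probe_code = pca.transform(gabor_features(probe_face, kernels)[None, :])
```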

2.3 Activities

Activities modules process information and generate alarms when abnormal situations are detected. The configuration for the Activities is set by the Scene Managers. The generated alarms are sent, through the alarm signal flow, to the corresponding Scene Manager to be processed. For the purposes of the INVISUM project, we consider several activity processes: Trajectory Analysis, Restricted Area, Behaviour Detector, Fire Detector, Suspect Detector, Abandoned Objects, Tampering, Fall Detector and Access Control. These processes require some configuration parameters to define when an alarm must be raised. These parameters are fixed in the Scene Manager Configuration process. Given the design of the proposed system, all the activities have the same interface. In addition, all the activities interact in the same way with the Information Extractors and Scene Managers.

2.3.1 Trajectory analysis

The Objects Information process of the Information Extractor module obtains the trajectory of an object as described in Section 2.2.1. In any trajectory anomaly detection method, one of the main challenges is to define what an anomaly (outlier) is. In the proposed system, the configuration from the Scene Manager defines what an anomalous trajectory is. This information can come as a result of a previous training stage, or as a direct definition from a user [19, 32]. Given the scene, the security user can define a trajectory as anomalous (for instance, wandering around the area of interest). Given a new trajectory, this process returns the maximum similarity to a predefined normal trajectory [30].
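
A sketch of this similarity computation follows: trajectories are resampled to a common length and compared pointwise. The similarity function is an illustrative choice, not the measure of [30].

```python
# Sketch: maximum similarity of a new trajectory to predefined normal ones.
import numpy as np

def resample(traj, n=32):
    """Linearly resample a (k, 2) trajectory to n points."""
    traj = np.asarray(traj, dtype=float)
    t = np.linspace(0.0, 1.0, len(traj))
    ti = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(ti, t, traj[:, d]) for d in (0, 1)], axis=1)

def max_similarity(new_traj, normal_trajs, n=32):
    q = resample(new_traj, n)
    dists = [np.mean(np.linalg.norm(q - resample(r, n), axis=1))
             for r in normal_trajs]
    return 1.0 / (1.0 + min(dists))  # in (0, 1]; low values flag an anomaly
```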

2.3.2 Restricted area

This process has been designed to detect objects in a restricted area of a scene. Such an area is defined by the Scene Manager configuration. The object features and trajectory are obtained in the Information Extractor module as described in Section 2.2.1. Given these features (especially the object size) and the trajectory (especially motionlessness), it is possible to detect illegal parking or unwanted human presence in the predefined area.
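
A sketch of the detection rule is shown below; the polygon coordinates and the dwell-time limit stand in for the Scene Manager configuration.

```python
# Sketch: restricted-area check with a dwell-time (motionlessness) rule.
import cv2
import numpy as np

AREA = np.array([[100, 100], [400, 100], [400, 300], [100, 300]], np.int32)
DWELL_LIMIT = 60.0  # seconds before an object inside the area raises an alarm

entered_at = {}  # object id -> time it entered the restricted area

def restricted_area_alarm(object_id, centroid, now):
    inside = cv2.pointPolygonTest(AREA, (float(centroid[0]),
                                         float(centroid[1])), False) >= 0
    if not inside:
        entered_at.pop(object_id, None)
        return False
    entered_at.setdefault(object_id, now)
    return now - entered_at[object_id] > DWELL_LIMIT
```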

2.3.3 Behaviour detector

Crowd behaviour analysis is a recent area of interest in computer vision (see [36] for a complete survey). However, real crowd motion exhibits complex behaviours that are difficult to model. We build a Behaviour Detector from the object trajectories obtained in the Information Extractor module. The Behaviour Detector provides a crowd density measure and generates an alarm when this measure is higher than a fixed threshold. Furthermore, an alarm is generated when a high correlation among a large number of trajectories in the crowd appears. As before, these parameters are defined in the Configuration process of the Scene Manager.
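
Both rules are sketched below. The density threshold is illustrative (cf. the parameter tuned from 10 to 20 in Section 5), and the heading-correlation measure is an assumed realization of the trajectory-correlation rule.

```python
# Sketch: crowd density and trajectory-correlation alarms.
import numpy as np

DENSITY_THRESHOLD = 10   # simultaneous tracked people (illustrative)
HEADING_CORRELATION = 0.9

def crowd_alarm(trajectories):
    """trajectories: list of (k, 2) arrays of tracked positions."""
    if len(trajectories) < DENSITY_THRESHOLD:
        return False
    # unit displacement (heading) of each trajectory
    vecs = np.array([t[-1] - t[0] for t in trajectories], dtype=float)
    vecs /= np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-9)
    sim = vecs @ vecs.T                      # pairwise cosine similarities
    n = len(vecs)
    mean_sim = (sim.sum() - n) / (n * (n - 1))
    return mean_sim > HEADING_CORRELATION    # many people, same direction
```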

2.3.4 Fire detector

In this process, the input information is the temperature from thermal information cameras; the information arrives directly from the Sensor Manager, so no Information Extractor process is needed. This thermal imaging system uses infrared imaging technology that detects infrared radiation, or heat. In each frame of the video, the pixels with a temperature higher than a threshold are detected; those pixels define a blob. When the size of the blob is large enough, and the blob is detected for a sufficient number of seconds, an alarm is generated and transmitted to the Scene Manager. Thus, the minimum blob size and the minimum number of seconds are the two parameters to be fixed.
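
A sketch of this two-parameter rule follows; the temperature threshold, blob size, persistence time and frame rate are all assumed values.

```python
# Sketch: fire detection on per-pixel temperature frames.
import numpy as np
from scipy import ndimage

TEMP_THRESHOLD = 60.0  # degrees Celsius (assumed)
MIN_BLOB = 50          # minimum blob size in pixels
MIN_SECONDS = 3.0      # minimum persistence before alarming
FPS = 9.0              # thermal camera frame rate (assumed)

hot_frames = 0  # consecutive frames containing a large hot blob

def fire_alarm(temps):
    """temps: 2D array of temperatures for the current thermal frame."""
    global hot_frames
    labels, n = ndimage.label(temps > TEMP_THRESHOLD)
    sizes = np.bincount(labels.ravel())[1:]  # blob sizes, background excluded
    if n > 0 and sizes.max() >= MIN_BLOB:
        hot_frames += 1
    else:
        hot_frames = 0
    return hot_frames / FPS >= MIN_SECONDS
```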

2.3.5 Suspect detector

The Suspect Detector process receives the blobs and features from the Objects Information process of the Information Extractor module. In addition, it receives the configuration signal from the Scene Manager that indicates the reference feature values that the Suspect Detector process should look for. The normalized (from 0 to 1) Euclidean distance between those reference values and the feature values of each new blob is calculated (as indicated in [17, 18]). If the distance is lower than a threshold, an alarm is generated and transmitted to the Scene Manager.
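
A sketch of the matching step is given below; the feature ranges used for normalization and the alarm threshold are assumptions.

```python
# Sketch: normalized Euclidean distance between blob and reference features.
import numpy as np

ALARM_THRESHOLD = 0.2  # alarm when the normalized distance falls below this

def suspect_distance(blob_features, reference, f_min, f_max):
    """All inputs are 1D arrays; f_min/f_max are the known feature ranges."""
    b = (np.asarray(blob_features, float) - f_min) / (f_max - f_min)
    r = (np.asarray(reference, float) - f_min) / (f_max - f_min)
    # divide by sqrt(dim) so the distance stays in [0, 1]
    return np.linalg.norm(b - r) / np.sqrt(len(b))

# alarm = suspect_distance(features, ref, mins, maxs) < ALARM_THRESHOLD
```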

2.3.6 Abandoned objects

The most popular abandoned object detection algorithms are based on background subtraction, due to their superior robustness in complex real-world scenarios such as those proposed in this project. Following [31], we consider an abandoned object to be a motionless object that has not been in the images before. The parameters for this process are the size of the object and the time lapse required to label an object as abandoned. The main challenge in this module is the proper context-aware configuration of these parameters; to perform this task, a human domain expert is mandatory.
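
One common background-subtraction realization of this idea uses two models with different adaptation speeds, as sketched below. This is a standard dual-background scheme given for illustration, not necessarily the exact method of [31].

```python
# Sketch: static (abandoned) object mask via dual background subtraction.
# A static new object stays foreground in the slow model but is quickly
# absorbed into the fast model; their difference isolates it.
import cv2

slow = cv2.createBackgroundSubtractorMOG2(history=3000)  # adapts slowly
fast = cv2.createBackgroundSubtractorMOG2(history=100)   # adapts quickly

def static_object_mask(frame):
    fg_slow = slow.apply(frame)
    fg_fast = fast.apply(frame)
    return cv2.bitwise_and(fg_slow, cv2.bitwise_not(fg_fast))

# Blobs in this mask larger than the configured size, persisting longer than
# the configured time lapse, are labelled as abandoned objects.
```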

2.3.7 Tampering

In this paper, camera tampering is defined as any relevant event which dramatically alters the image seen by the camera: for instance, the camera is moved, partially obscured, severely defocused, covered or sprayed. These situations imply big objects in the image; thus, numerous methods for tampering detection use image difference calculation (see, for instance, [25]). In the proposed system, tampering is detected via the size of the detected objects: if the size is larger than a threshold fixed by the Scene Manager configuration, an alarm is generated.
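
A sketch of this size-based rule follows; the area fraction stands in for the threshold fixed by the Scene Manager configuration.

```python
# Sketch: tampering detection from the size of detected objects.
MAX_OBJECT_FRACTION = 0.5  # fraction of the frame area (assumed threshold)

def tampering_alarm(blobs, frame_width, frame_height):
    """blobs: list of (x, y, w, h) bounding boxes from the object detector."""
    frame_area = frame_width * frame_height
    return any(w * h > MAX_OBJECT_FRACTION * frame_area
               for (_, _, w, h) in blobs)
```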

2.3.8 Fall detector

Recently, some works exploiting computer vision for detecting falls have been presented [3, 4, 26, 38]. The Motion process of the Information Extractor module obtains the motion information of an object as described in Section 2.2.3. In this process, human falls are detected based on the MHI and MEI.

2.3.9 Access control

Biometric characteristics are widely used in access control. For example, automated border control (ABC) systems have been installed at different airports worldwide in recent years (see, for instance, [21, 28]). In this paper, the Access Control process uses information from the Faces process in the Information Extractor module (Section 2.2.4) and information from fingerprint sensors. To perform the fingerprint biometric acquisition, a traditional touch-based fingerprint recognition system is used [12]. In this case, the information arrives at the Access Control process directly from the fingerprint sensors (no Information Extractor is needed). The methods proposed in [13] have been used to combine these two sources of information.

The process compares the biometric data with the models fixed by the Scene Manager configuration. Typically, this means searching for a person in a list. The output of the process is whether or not the person is identified. In the latter case, an alarm is generated and transmitted to the Scene Manager.

2.3.10 Additional activities

The considered activities correspond to those included in the scope of the INVISUM project, that is, an indoor-outdoor multisensor video surveillance problem. However, given that all the activities share the same interface, adding new activities requires little effort if necessary. That is, our system is flexible and easily customizable.

2.4 Scene manager

As shown in Fig. 1, there are three processes in the Scene Manager modules: Data, Scene Alarms and Configuration. The wireless and distributed system has been designed to manage several scenes with several sensors in each scene. The Scene Manager configures each Sensor Manager, each Information Extractor and each Activity related to the scene. The Data process receives the data from the Information Extractors in order to synchronize them; after that, the synchronized data are sent to the System Core. Besides, the Scene Manager receives all the alarms generated in the Activities modules. In the Scene Alarms process, all these alarms are synchronized and combined. This combination could be very simple, for instance using a rule such as "generate a scene alarm when a single activity alarm is generated", or it could be based on a more complex algorithm. For instance, a method to build an alarm buffer for the measurement of risk according to the system activities could be developed (see [16] for a similar approach in the traffic safety domain).
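
A sketch of such a buffer-based combination follows; the window length, weights and alarm level are assumptions, and the simple rule above corresponds to a window holding a single alarm.

```python
# Sketch: a sliding alarm buffer for scene-level risk measurement.
import collections

class AlarmBuffer:
    """Scene alarm fires when recent weighted activity alarms exceed a level."""
    def __init__(self, window_s=30.0, level=1.0):
        self.window_s = window_s
        self.level = level
        self.events = collections.deque()  # (timestamp, weight) pairs

    def push(self, timestamp, weight=1.0):
        self.events.append((timestamp, weight))
        # drop alarms that fell out of the time window
        while self.events and self.events[0][0] < timestamp - self.window_s:
            self.events.popleft()
        return sum(w for _, w in self.events) >= self.level
```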

The alarms from all the Scene Managers are sent to the System Core for further processing.

2.5 System core

The System Core is responsible for monitoring the system for proper performance. There are three processes in the System Core: DataBase, Global Alarms and Control (see Fig. 1). The System Core receives all the synchronized data and alarms from the Scene Managers. In the Global Alarms process, all received alarms are synchronized and combined following the configuration parameters given by the Control process. This process generates a unique global alarm for the system.

In the DataBase process, all received data are synchronized using the timestamps given by the Sensor Managers. When a global alarm is generated, these data are sent to the High Level Components following the configuration from the Control process. This DataBase process works as a buffer with a temporal memory allocation scheme.

The Control process generates the signals for the Scene Manager Configuration processes. These signals can be fixed by the Control Manager process in the High Level Components as a set of preconfigured rules, or they can be learned in order to respond to specific alarm situations. Notice that the security user manages the system through this Control process. For instance, when the system generates an alarm from the information of sensor A, the set of rules directs the other sensors B, C, etc., to collect as much information as possible about that alarm.
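
Such a rule set could be represented as sketched below. The sensor identifiers are hypothetical; the rules mirror the test configuration described in Section 4.1.

```python
# Sketch: preconfigured Control rules mapping an alarm source to the
# additional sensors that should be switched on. Identifiers are hypothetical.
RULES = {
    'rgb_indoor_1':  ['kinect_indoor'],
    'fingerprint':   ['kinect_indoor'],
    'rgb_outdoor_1': ['kinect_entry'],
}

def on_alarm(source_sensor, switch_on):
    """switch_on: callback that sends a control signal via the Scene Manager."""
    for sensor in RULES.get(source_sensor, []):
        switch_on(sensor)
```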

3 High level components

In this section, three high level components that use the previous architecture are detailed: Alarms DataBase, Multisensor Tracker and Control Manager. These processes receive/send data, alarms and configuration signals from/to the System Core.

3.1 Alarms database

This process manages the DataBase of the system’s alarms. When an alarm is generated in the System Core, all the data related to that alarm are stored in the DataBase.

3.2 Multisensor tracker

Tracking multiple people from standard cameras is challenging, especially when the cameras' fields of view do not overlap (see, for instance, [2, 9]). In the proposed problem, when an alarm is generated in the System Core, the Multisensor Tracker manages all the information regarding the object or trajectory that caused the alarm. For instance, when a person is detected in a restricted area by the corresponding Activity, an alarm is generated in the System Core. Then, the System Core activates the Multisensor Tracker in order to collect all the data regarding that person from the different sensors. Notice that, to do this, the process configures the Suspect Detectors of the corresponding Activities.

3.3 Control manager

As presented in the previous section, the security user configures the whole system through the Control process in the System Core module. The user sets the parameters of each Scene Manager Configuration process and the parameters controlling the Global Alarms process in the System Core. Notice that each Scene Manager Configuration process controls the parameters of the Activities, Information Extractors and Sensor Managers involved in each scene. To facilitate the installation and configuration of the proposed system, three basic configurations are presented to the security user: Level 3 or low security, Level 2 or medium security, and Level 1 or high security. Table 1 shows a summary of these security configurations and the parameters established for each process.

Table 1 Different security configurations for the security system

To manage the Control process in the System Core, a Control Manager Tool (CMT) was developed. The main aim of this tool is to facilitate the configuration of the system parameters for the security user. In addition, global information is presented as a result. The proposed CMT allows two visualizations (see Figs. 3 and 4). In the first visualization, called the Viewer, the CMT shows information about the global system: its architecture, sensors and connections. The user can choose a sensor to obtain its visual signal. In addition, the system status with all the generated alarms is presented. For instance, in the image in Fig. 3, an abnormal trajectory and an abandoned object have been detected in scenario A.

Fig. 3 Viewer screen in the Control Manager Tool (CMT)

Fig. 4 Configuration screen in the Control Manager Tool (CMT)

In the second visualization, called the Configurator, the CMT allows the user to configure the system. First, a control to run/stop the system is presented. To facilitate the configuration of the system, the user can easily choose the level of security, associated with a color scale: green for low security, orange for medium security, and red for high security. The user can add/delete Scene Managers, Sensor Managers and Activities. Furthermore, it is possible to change the level of security for Scene Managers and Activities; notice that different levels of security can be selected for each scene and activity. For instance, the user could choose a low security level for the Trajectory Analysis activity, but a high security level for the Abandoned Objects activity. In such a way, the system would respond almost immediately to an abandoned object situation, but slowly to an abnormal trajectory. Moreover, the security user can choose different security levels for different scenes.

To test the system, a set of experiments were designed to be executed on two real scenarios where our system was deployed. These experiments are presented in the next Section.

4 System tests (a case study)

To validate the performance of our system, we have performed experiments using two academic scenes at the campus of the University Rey Juan Carlos in Mostoles, Spain.

The first scenario is an indoor scene shown in Fig. 5. The image covers most of the main hall of a classroom building. The hall is mainly used to move from one classroom to another and to get into and out of the building. In this case, five inexpensive sensors were used: two RGB cameras, a 2D-3D camera, a thermal camera and a fingerprint sensor. The RGB cameras have a CMOS sensor with a resolution of 1920 × 1080 pixels and a frame rate of 30 fps. These cameras have a WI-FI connection; in these experiments, the standard 802.11b/g protocol was used. They were mounted overlooking the ground floor, on the first and second floors, respectively. The 2D-3D camera is a Kinect V1.0, which has a CCD sensor with a resolution of 640 × 480 pixels for RGB images and a CCD sensor with a resolution of 320 × 240 pixels for infra-red and 3D images. Besides, the Kinect sensor has an infra-red projector that emits a pattern used for the 3D image calculation. The Kinect is connected to a Raspberry Pi B+ through USB 2.0. The Raspberry Pi has an ARM1176JZF-S CPU at 700 MHz and 512 MB of RAM; the Sensor Manager process runs on it and sends the collected data through WI-FI to the corresponding Information Extractor. The temperature range of the thermal camera is from −20 to 120 degrees Celsius. It provides images of 640 × 480 pixels within a range of 30 m. The thermal camera is connected to a smartphone through micro USB. The smartphone is able to store, process in real time and send, through WI-FI, the information to the corresponding Activity process. The fingerprint sensor is used as an employee time clock attendance machine. It is supposed to be used by the teachers every hour, from ten minutes before to ten minutes past the hour, and not to be used outside that period. That is, there is a time window of twenty minutes per hour in which the sensor may be used. This context information is fixed by the configuration control.

Fig. 5 Indoor scenario and sensor positions

The second area of interest is an outdoor scene shown in Fig. 6. The image covers a parking area close to the classroom building considered in the previous scene. In this case, five inexpensive sensors were used: three RGB cameras, a 2D-3D camera, and a thermal camera. To monitor the entire scene, two RGB cameras similar to those in the indoor area are used. To extract vehicle information, one RGB camera is used; it has a CMOS sensor with a resolution of 2048 × 1536 pixels and a frame rate of 12.5 fps. The thermal camera has the same characteristics as the one used in the indoor scenario. The 2D-3D camera is a Kinect V1.0; in this case, it is placed at the entry of the building nearest to the parking area, for people access control.

Fig. 6 Outdoor scenario and sensor positions

For our purposes, we have used a distributed architecture of computers. In this case, two Raspberry Pi computers, two smartphones and four Intel Core i5 (4 GB of RAM) computers are considered. The connections between them have been made through WI-FI 802.11b/g. One computer (server) is used for the System Core and the two Scene Managers. A second computer is used for the High Level Components module. A third computer is used for the Sensor Managers, Activities and Information Extractors of the indoor scene, and a fourth computer is used for those of the outdoor scene.

4.1 Configuration for the tests

Given the sensors and the communications between the system modules previously presented, it is possible to define the set of rules to manage the system. In the indoor scenario, when one of the RGB cameras or the fingerprint sensor generates an alarm, the Kinect sensor switches on to collect as much information as possible about the person who generated the alarm. Similarly, in the outdoor scenario, when one of the RGB cameras generates an alarm, the Kinect sensor (at the entry of the building) switches on in order to collect as much information as possible about that person. When an alarm is generated in one scenario, the Multisensor Tracker process looks for the person who generated the alarm in the RGB cameras of both scenarios. In addition, the information collected by the plate-reader RGB camera is retrieved in order to associate a car with the suspect. All the acquired data are stored in a 2 GB memory.

4.2 Experiments

To test the performance of the proposed system, several security incidents were preprogrammed in each scenario to induce alarm situations.

The first security incident is performed to test context information. In the indoor scenario, the fingerprint sensor area is not physically protected. It is mandatory to generate an alarm when someone (an actor) operates the fingerprint sensor, or stands close to it for a long period of time, outside the allowed time interval. If an alarm is generated, then a signal is sent to switch on the Kinect sensor. The face information of the actor is collected and his/her identity is verified by the Access Control activity. That is, an actor is labelled as a suspect given the information from one sensor, and new information about the actor is collected by other sensors.

The second security incident is performed in the two scenes. First, an actor leaves an object (a backpack) on the floor in the indoor scenario. This should be detected as an abandoned object. Then, an alarm should be generated and the features of the actor collected from the RGB cameras. Next, the actor moves to the outdoor scenario. The system tries to locate the actor in the outdoor scenario using the information from the two RGB cameras.

5 Data analysis

A data analysis of the collected information was performed using the pre-configured security configurations presented in Table 1. The alarm situations detected by our system are summarized in Table 2. Notice that, at the High security level, the system is able to detect six situations as alarms. When performing at the Low security level, only one situation was reported as an alarm. Finally, when the system performs at the Medium security level, three situations were considered alarms.

Table 2 Different security situations detected by the security system. Given the security level configurations, the security levels at which each situation would be detected as an alarm are marked (X)

As expected, the system is able to detect all the preprogrammed security incidents. First, the Restricted Area activity sent an alarm to the Scene Manager of the indoor scenario when a person stayed in the predefined area for longer than the preconfigured time threshold (see Table 1). Notice that the alarm will be generated (or not) depending on the security level the system is working with. On the one hand, if the security level is Low, a person detected in the restricted area for 90 s will not generate an alarm; on the other hand, if the security level is Medium or High, the same situation will generate an alarm. When an alarm was generated, a signal to switch on the Kinect sensor was sent. In such a case, the face information was recorded in order to verify (if possible) the person's identity. Thus, several sensors are used to detect an event; in this case, the event is the suspicious behaviour of a previously registered suspect. A detailed schedule listing the events that take place during this situation is presented in Fig. 7. The RGB camera sensor sends images to the Sensor Manager located in computer A. The Sensor Manager sends the images in the proper format to the Objects Information process.

Fig. 7 Flow diagram of the Restricted Area experiment

This process obtains the blobs and sends them to the Restricted Area process. The Restricted Area process generates an alarm and sends that signal to Scene Manager A, located in a server. The System Core (located in the same server) receives this alarm and sends a control signal, in order to switch on a new sensor, to Scene Manager A. This signal arrives at Sensor Manager B (located in a Raspberry Pi), which manages the Kinect sensor. Figure 8a and b present images of this security incident.

Fig. 8 Security situations detected as alarms by the system

In addition to the preprogrammed security incidents, other incidents were detected. For instance, during the recording of the experiment, a hand appeared in front of the camera (see Fig. 8c). This was detected by the Tampering process and an alarm was generated. Since the hand was very close to the camera, the alarm would be generated at any of the predefined security levels.

The Abandoned Objects module detected a backpack on the floor, and the person was labelled as a suspect (see Fig. 8d and e for the RGB and thermal images, respectively). By using the features extracted in the Objects Information process, the suspect was detected in the outdoor scenario (using the Suspect Detector activity), and the person and vehicle information was recorded in the alarm database (see Fig. 8g).

The Behaviour Detector process generated an alarm related to the number of similar trajectories of people in the scene. In this particular case, a group of 13 people was walking along similar trajectories at the same time in the image (see Fig. 8f). Given the parameters in the security configurations (see Table 1), this was detected as an alarm when the system performed at the High security level. The detected situation corresponds to the movement of students from their classroom to the exit of the building when a class has finished. That is, it could be considered a false alarm, since it is not a risk situation. Thus, in the indoor scenario, it was decided to increase this particular security configuration parameter from 10 to 20. This case indicates that a new rule adding context knowledge to the Behaviour Detector activity should be considered.

Furthermore, in the outdoor scenario, one non-preprogrammed security incident was detected (see Fig. 8h): a vehicle stayed in a restricted area for 68 s. An alarm would be generated when the system performs under the Medium or High security levels. Figure 8i shows the information extracted by the Vehicle Features process (see Section 2.2.2) regarding the vehicle that caused the alarm.

Thus, the results of the tests show that the proposed system is reliable for security incident identification. In addition, some of the alarms are useful to redefine some of the default parameters of the system.

Beyond these qualitative evaluation results, an objective evaluation of the system has been performed. Notice that this goes beyond the purpose of the present paper; however, to validate the performance of the system, several studies have been developed. Using the architecture first presented in this paper, a context-aware distance for anomalous human trajectory detection has recently been presented [27]. The method outperforms the alternative distances, with 75.8% of outliers detected in a database of 182 real trajectories. Another paper, presenting a novel methodology for human re-identification in multi-camera video surveillance scenarios using the proposed distributed architecture, has been accepted for publication [14]. A number of discriminative features extracted in one scene are used in a new scene in order to detect suspicious persons using the images from a non-overlapping camera. In this research, 100% of the suspicious persons were detected, and 5.88% of false alarms were generated.

6 Conclusions

In this paper, a wireless and distributed architecture for an intelligent surveillance system has been presented and tested. The proposed architecture has been designed to manage several scenarios with several wireless interconnected sensors, as part of a video surveillance system. Several state-of-the-art methods to process information and to detect abnormal activities have been used. The proposed architecture has been built using the Robot Operating System, which manages wireless communications in a multimedia sensor network. The wireless multimedia sensor architecture presented in this paper has been tested in real situations. It has been shown, by employing multiple and heterogeneous sources of information, that the system is capable of detecting risk situations and generating appropriate alarm signals. The flexibility and adaptability of the proposed architecture have also been tested: in a real experiment, it has been shown that the detected alarms updated configuration parameters and rules of the system to improve its reliability. Overall, the results have been promising, and the proposed architecture can serve as the foundation for further enhancements.

An objective evaluation of the distributed intelligent surveillance system architecture has been performed via expert analysis. Two experienced experts (with more than 25 years in the field) working for the company responsible for security at the University [6] were questioned regarding the usability, features and capabilities of the system. They provided an average score of 9.5 out of 10 for the global performance and the security support provided by the system.

Although this goes beyond the purposes of the present paper, a preliminary statistical analysis and evaluation based on the Data Analysis section have been promising and show that the proposed architecture can serve as the foundation for further research.

Although only academic scenarios were considered in this paper, the proposed architecture can be fully extended to the smart world. As in smart cities, our system relies on multimedia sensory data acquired from multiple sensors in distributed locations.

In the future, the proposed multimedia sensor network can be enriched with new sensors, such as road sensors and portable air sensors, bringing new multimedia device capabilities.

To extend the system, a complete alarm module that combines information from several (and probably asynchronous) alarms is under development. This module will be part of the INVISUM project. It will consider relevant variables to increase or decrease the alarm level depending on the security incidents. In the past, the same idea has been successfully used to build a risk buffer in the area of driving risk detection [16]. The fusion of several security experts' knowledge is being considered to improve this future alarm module.