Introduction

The availability of new technologies such as remotely-operated and autonomous drones, wearable visual sensing equipment, and ground robots allows the rapid deployment of mobile cameras in unknown environments, with the ability to adapt to unforeseen situations, extend the duration of an observation, and improve the performance of video analysis1. Moreover, the increasing need for safety and security, combined with the growing availability of visual sensors mounted on mobile agents, has made camera networks an increasingly explored topic2. Applications span public and private environments and include robot navigation in post-disaster areas, crime prevention, traffic control, autonomous driving, accident detection, and monitoring patients, the elderly, and children at home3,4.

To automate the interaction between humans and the surrounding environment, mobile cameras need to find the objects of interest (detection), follow them through over-time localization (intra-camera tracking), and link the same objects across the camera network (re-identification) by exploiting the redundancy and richness of the information provided by all cameras. We refer to this overall task as object association; it is normally performed by each individual camera with the aim of monitoring an area as wide as possible.

When association is performed in a camera network whose cameras have both overlapping and non-overlapping Fields-of-View (FoVs), the task has to cope with constant changes in illumination and background, both locally and across cameras, without the possibility of reliably calibrating the cameras for position (viewpoint) and colour. Targets can appear and be seen from different viewing angles, which makes it challenging to associate them and to assign unique IDs that are robust to frequent entering and exiting of the cameras’ FoVs. In addition, time efficiency is fundamental when deploying mobile cameras because of the dynamic nature of the interactions between humans and the environment5. This can be achieved through efficient communication across the network that is robust to mis-communications2, and a fast on-board implementation of the association algorithm. For example, in forensic applications decisions must be taken immediately when an event occurs and suspects have to be followed continuously over time. A camera network is also required to be resilient to different network sizes and must be able to integrate new cameras joining the network, with a fully distributed approach being preferable to avoid single points of failure6. Figure 1 shows a typical mobile-camera scenario.

Figure 1
figure 1

Pictorial layout of a camera network. Each camera unit is a node in the network. Cameras can see different people at a certain time instant. Blue lines correspond to communication links.

In this paper, we propose DMMA, a real-time target-management module for Distributed Multi-camera Multi-target Association, a distributed strategy suitable for moving cameras (see Fig. 2). The management module updates and shares across the network a data-structure that maintains target labels and appearance over time using local and network information to obtain robustness to both occlusions and target appearance/disappearance. Moreover, a new camera joining the network can be fully operational after downloading the data-structure from the other nodes. A consensus among the cameras is obtained by sharing the data-structure variations across the network with decisions taken locally during association.

Figure 2
figure 2

Block diagram of the Distributed Multi-camera Multi-target Association (DMMA). A: switch activated periodically or when the tracking confidence is low. Target management: receives the extracted features as input; handles intra-camera and inter-camera associations, both via the Hungarian algorithm7; communicates with the other cameras. The local data-structure is updated at each time step.

In summary, our main contributions are:

  • a target-representation that consists of both appearance and deep features;

  • a target-management module that deals with occlusions as well as targets entering/exiting the cameras’ FoVs;

  • a novel mobile-camera dataset comprising six different scenes with moving cameras and objects.

Related work

Target association in camera networks involves detection8, tracking9, re-identification10 and distributed protocols11. We provide an overview of the main methods, focusing on solutions designed for real-time implementation.


Camera networks Strategies for target association in camera networks can be categorized into centralized, distributed, and decentralized11. Most camera networks adopt a centralized approach in which a server receives data from each camera in the network12. Although this strategy can directly exploit existing single-camera protocols (e.g. a single-camera tracker) by fusing the information centrally, the presence of a single fusion center limits scalability and may create a communication bottleneck13. Distributed approaches operate with no fusion center, thus improving scalability and potentially reducing communication bottlenecks. However, they are normally more complex protocols, as they require reaching a consensus remotely. Distributed approaches for camera networks include a multi-target square-root cubature information consensus filter to increase tracking accuracy and stability14 and an information-weighted consensus filter for solving the data-association problem15. Decentralized protocols are instead a hybrid between centralized and distributed: cameras are grouped into clusters and communicate only with their local fusion centers16. This may be more scalable than a fully centralized approach, but less so than a distributed one. Schwager et al.17 present a strategy for deploying robotic cameras in a decentralized way, which can accommodate groups of cameras to monitor an environment. The majority of solutions for camera networks focus on improving communication and how information is managed across the network, while assuming that targets are perfectly detected, tracked and re-identified12,18,19; however, this is not always the case. Graph modeling is an effective way to tackle object re-identification when the topology of the camera network is known. Chen et al.12 introduced a global graph model that takes as input different observations, such as detections, tracklets, trajectories or pairs. Cai et al.18 exploited the topology information of a camera network to re-identify objects across camera views. Hofmann et al.19 presented a global min-cost flow graph that joins detections from different views.


Detection To properly associate multiple targets across a camera network, the targets must be detected in each camera where they are visible20. Mobile cameras are challenging for background-subtraction techniques since the background changes constantly, so approaches based on learning the shape of the target are normally preferable21. Single-Shot Detector (SSD)22, You Only Look Once (YOLO)23, MobileNet24 and EfficientDet25 are examples of target detectors with real-time implementations that detect shapes learned during training.


Tracking Once the targets are detected, an identifier (ID) is assigned to each target and ideally kept over time and across all cameras. If a target is new to the network, a new ID is created. Tracking and re-identification deal with assigning an ID within a single camera and across cameras, respectively: the main challenge of a tracker is to maintain the same ID for the same target over time, while re-identification focuses on assigning the same ID to the same target seen by different cameras. A Multi-Object Tracking (MOT) framework for mobile cameras was proposed by Choi et al.26, where both the camera’s ego-motion and the objects’ paths are estimated. Detections can be linked with Markov Decision Processes (MDP)27, with Kalman filtering in the image space combined with frame-by-frame data association based on the Hungarian algorithm and weights obtained from the amount of bounding-box overlap (SORT)28, or with a Convolutional Neural Network (CNN)29. Graph-learning based methods30,31 are effective in associating trajectories to targets, but tend to fail in occlusion scenarios. This problem can be addressed by learning and updating the appearance of targets using a track-management module32 or a person re-identification dataset33. To increase robustness, a self-supervised learning detector can be employed by combining re-identification features34 or by using motion predictions35.


Re-identification Re-identification techniques deal with illumination changes and with variations of viewpoint and pose by extracting robust visual features describing the target, including colour36, texture37 and shape38 features, or by deep learning39. The latter methods are normally more effective, as they are capable of obtaining the most discriminative features for the targets, although they fail in scenarios that differ from the training set. A solution to this is reinforcement learning, which allows an algorithm trained on one dataset to be tested on another40. An unsupervised cross-dataset transfer learning approach was proposed in41, where an asymmetric multi-task dictionary model was learned to extract discriminative features from unlabelled target data. Cheng et al.42 introduced a transfer-metric learning approach with a shared latent subspace to describe the commonalities of persons in different datasets. Wang et al.43 proposed a transferable joint attribute-identity deep learning framework, which simultaneously learns attribute labels and identity features across different datasets.

Compared to state-of-the-art methods, we perform association by relying on a local database shared across the network, in order to cope with continuous changes of target appearance and with cameras entering/exiting the network. Moreover, our algorithmic choices are made to optimize speed and enable a real-time implementation.

Proposed approach

Overview

Let \({\mathcal {C}} = \left\{ C_1,\ldots ,C_c, \ldots ,C_N\right\}\) be a network with N cameras and \({\mathcal {L}}=\{l_1, \ldots , l_l, \ldots , l_L\}\) be the set of possible target labels. Each camera \(C_c\) has a local data-structure that stores the features for each target for the past J frames and is maintained up-to-date over time.

In order to operate in real time, a target-management module in each camera optimizes the assignment of the labels to the targets over time, and manages cameras leaving/joining the network.

For intra-camera tracking, each camera is equipped with target detection and tracking modules. As the latter has to be scale-invariant to cope with moving cameras and fast enough to maintain real-time operation, a trade-off has to be sought between fast trackers that may not be scale invariant44 and scale-invariant trackers that may be slow45. The target-management module performs association between existing targets and detections in each camera, and inter-camera association with the features of the targets received from other cameras.
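
As an illustration of the per-camera data-structure described above, the following Python sketch stores, for each target label, the features of the past J frames; the naming (e.g. TargetStore) is our own and is not part of DMMA.

from collections import defaultdict, deque


class TargetStore:
    """Per-camera store keeping, for each target label, the feature
    vectors observed in the past J frames (hypothetical naming)."""

    def __init__(self, J=2):
        self.J = J
        # label -> deque of (timestamp, feature_vector), newest last
        self._features = defaultdict(lambda: deque(maxlen=J))

    def update(self, label, t, feature):
        """Store the feature of `label` observed at time t."""
        self._features[label].append((t, feature))

    def latest(self, label):
        """Most recent feature of `label`, or None if unknown."""
        entries = self._features.get(label)
        return entries[-1][1] if entries else None

    def snapshot(self):
        """Copy of the whole store, e.g. for a camera joining the network."""
        return {label: list(entries) for label, entries in self._features.items()}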

Remark 1

Our focus is on implementing an efficient target association while assuming ideal communication across cameras, i.e. data transmission with no loss or delay. In our experiments, cameras exchange target information, wrapped in .xml files, through the computer memory. See46 for more details on non-ideal communication.
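
For illustration only, a possible way to wrap a target record into an XML message in Python is sketched below; the element and attribute names are hypothetical, since the exact DMMA message schema is not specified here.

import xml.etree.ElementTree as ET


def wrap_target(camera_id, label, t, feature):
    """Wrap one target record into an XML string (hypothetical schema)."""
    root = ET.Element("target", camera=str(camera_id), label=str(label), t=str(t))
    ET.SubElement(root, "feature").text = " ".join(f"{v:.4f}" for v in feature)
    return ET.tostring(root, encoding="unicode")


# Example: wrap_target(1, 3, 120, [0.12, 0.88, 0.05])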

Target descriptor

Let \(\mathbf{x }_c^{l}(t)\) represent the features of target \(l_l\) at time t in camera \(C_c\), obtained by target detection, and let a local data-structure in each \(C_c\) maintain over time the features of each target for the past J frames. The features of target \(l_l\) are defined as

$$\begin{aligned} \mathbf{x }_c^{l}(t)=[H_{\mathbf{x }_c^{l}(t)}, D_{\mathbf{x }_c^{l}(t)}], \end{aligned}$$
(1)

where \(H_{\mathbf{x }_c^{l}(t)}\) and \(D_{\mathbf{x }_c^{l}(t)}\) are the appearance and deep features of the target, respectively. \(H_{\mathbf{x }_c^{l}(t)}\) concatenates two RGB m-bin histograms, \(H^1_{\mathbf{x }_c^{l}(t)}\) and \(H^2_{\mathbf{x }_c^{l}(t)}\), which are computed on image patches of the upper and lower parts of a target. The bins of the histogram are defined through a computationally efficient colour-naming (CN) approach, following the insights of47, which showed that CN is a strong visual attribute, robust to intensity variations48,49, when the discriminative RGB values are learned directly from public datasets.

Similarly to47, we choose m = 11 for its discriminating accuracy, with bins representing the black, blue, brown, grey, green, orange, pink, purple, red, white and yellow colours. Unlike50, which employs same-size patches, we calculate the histograms on image patches whose size adapts to the target bounding box in order to cope with changes in target size. Let M and N be the bounding-box height and width, respectively; the side of an image patch is

$$\begin{aligned} a = \frac{\max \left\{ M,N\right\} }{2K} \end{aligned}$$
(2)

pixels. \(H^1_{\mathbf{x }_c^{l}(t)}\) and \(H^2_{\mathbf{x }_c^{l}(t)}\) are each obtained on K/2 square image patches, whose centres \({\varvec{r}}\) are sampled as in50:

$$\begin{aligned} {\mathcal {N}}({\varvec{r}}|\varvec{\mu } ,\varvec{\varSigma })=(2\pi )^{-1}|\varvec{\varSigma }|^{-\frac{1}{2}}e^{-\frac{1}{2}({\varvec{r}}-\varvec{\mu })^T\varvec{\varSigma }^{-1}({\varvec{r}}-\varvec{\mu })}, \end{aligned}$$
(3)

where \({\mathcal {N}}\) is a normal probability density function with mean \(\varvec{\mu }=[M/2, N/2]\) and covariance matrix

$$\begin{aligned} \varvec{\varSigma } =\begin{bmatrix}2N & 0\\ 0 & 3M\end{bmatrix}. \end{aligned}$$
(4)
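
A minimal Python sketch of the patch geometry in Eqs. (2)-(4), assuming NumPy and treating patch centres as (row, column) coordinates within the bounding box; the clipping to the box is our own safeguard, not part of the original formulation.

import numpy as np


def sample_patch_geometry(M, N, K=48, rng=None):
    """Return the patch side of Eq. (2) and K/2 patch centres for one body
    half, drawn from the 2-D normal of Eq. (3) with mean [M/2, N/2] and
    covariance diag(2N, 3M) (Eq. 4). M and N are the bounding-box height
    and width."""
    rng = np.random.default_rng() if rng is None else rng
    a = max(M, N) / (2 * K)                       # patch side, Eq. (2)
    mu = np.array([M / 2.0, N / 2.0])
    cov = np.diag([2.0 * N, 3.0 * M])             # Eq. (4)
    centres = rng.multivariate_normal(mu, cov, size=K // 2)
    # Keep centres inside the bounding box (our own safeguard).
    centres = np.clip(centres, [0.0, 0.0], [M - 1.0, N - 1.0])
    return a, centres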

The colour-histogram feature is insensitive to pose and shape-deformation variations because it relies on the statistical information of the target. However, as detected target images usually include background and occlusions, this statistical feature alone is not robust enough for real-world applications. Deep learning based methods have been successfully applied to extracting discriminative features for re-identification51. Although these methods achieve better accuracy, they are usually time-consuming. To achieve real-time processing, we use an efficient pre-trained backbone network to extract features. The choice of backbone is explained in detail in the “Experimental results” section.

As shown in Fig. 3, the appearance feature \(H_{\mathbf{x }_c^{l}(t)}\) concatenates upper and lower CN histograms and the deep feature \(D_{\mathbf{x }_c^{l}(t)}\) is extracted from a backbone network.

Figure 3
figure 3

Appearance feature (top) as the concatenation of upper (light green) and lower (light blue) histograms and deep feature (bottom) extracted from a backbone network.
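
For illustration, the descriptor of Eq. (1) can be assembled as in the following Python sketch. The colour-naming histogram here uses a hand-set nearest-prototype rule as a simplified stand-in for the learned mapping of47, and the deep feature is taken from a Keras MobileNet backbone with average pooling; these are assumptions of the sketch, not the exact pipeline of the paper.

import numpy as np
import tensorflow as tf

# Eleven colour-name prototypes in RGB (hand-set placeholder values; the
# colour-naming mapping used in the paper is learned from data instead).
CN_PROTOTYPES = np.array(
    [[0, 0, 0], [0, 0, 255], [139, 69, 19], [128, 128, 128], [0, 128, 0],
     [255, 165, 0], [255, 192, 203], [128, 0, 128], [255, 0, 0],
     [255, 255, 255], [255, 255, 0]], dtype=np.float32)

_backbone = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))


def cn_histogram(pixels):
    """11-bin colour-naming histogram of an (n, 3) RGB pixel array, using a
    nearest-prototype assignment as a simplified stand-in."""
    d = np.linalg.norm(pixels[:, None, :] - CN_PROTOTYPES[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=11).astype(np.float32)
    return hist / max(hist.sum(), 1.0)


def descriptor(crop, upper_pixels, lower_pixels):
    """x = [H, D] as in Eq. (1): upper/lower CN histograms concatenated with
    a MobileNet embedding of the target crop."""
    H = np.concatenate([cn_histogram(upper_pixels), cn_histogram(lower_pixels)])
    inp = tf.image.resize(tf.cast(crop, tf.float32), (224, 224))[None]
    D = _backbone(tf.keras.applications.mobilenet.preprocess_input(inp),
                  training=False).numpy().ravel()
    return np.concatenate([H, D])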

Target management

The target-management module performs association between existing targets and new target detections (intra-camera association), and between existing targets and new targets from the network (inter-camera association). The pairs of targets, i and j, considered for association are those with a high appearance-correlation

$$\begin{aligned} \kappa (\mathbf{x }_c^{i}(t), \mathbf{x }_c^{j}(t))>\psi , \end{aligned}$$
(5)

where \(\kappa\) is the correlation function and \(\psi\) is a threshold; for intra-camera association only, the pairs must also have a spatial intersection-over-union of the bounding boxes greater than \(\gamma\). The more abrupt the illumination changes expected in the scene, the lower \(\psi\) should be; the faster the targets and the lower the frame rate of the video stream, the lower \(\gamma\) should be. Association is performed with the Hungarian algorithm7 and, in intra-camera association, detections that are not associated are considered new targets. A consensus among cameras is obtained by performing the intra-camera association first, followed by the inter-camera association. This keeps the labels consistent over time for targets meeting the appearance-correlation constraint (Eq. 5). The target-management module processes sequentially the inputs received from the network and shares with the network the modifications of appearance (and label).
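
A possible implementation of the gated association step is sketched below in Python, with Pearson correlation standing in for the correlation function \(\kappa\) and scipy's Hungarian solver performing the assignment; the default thresholds correspond to the \(\psi\) and \(\gamma\) values used later in the experiments.

import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(track_feats, det_feats, iou=None, psi=0.4, gamma=0.2):
    """Gated Hungarian association (sketch). `track_feats` and `det_feats`
    are lists of descriptors; `iou` is an optional (tracks x detections)
    IoU matrix used only for intra-camera association. Pairs failing the
    gates of Eq. (5) are made effectively unassignable."""
    if not track_feats or not det_feats:
        return [], list(range(len(det_feats)))
    corr = np.array([[float(np.corrcoef(a, b)[0, 1]) for b in det_feats]
                     for a in track_feats])
    forbidden = corr <= psi
    if iou is not None:
        forbidden |= (np.asarray(iou) <= gamma)
    cost = 1.0 - corr
    cost[forbidden] = 1e6                 # disallow gated-out pairs
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if not forbidden[r, c]]
    unmatched = sorted(set(range(len(det_feats))) - {c for _, c in matches})
    return matches, unmatched             # unmatched detections become new targets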

Object features are updated in the data-structure as

$$\begin{aligned} \mathbf{x }_c^{l}(t+1) = (1-\alpha _f)\hat{\mathbf{x }}_c({\hat{t}}) + \alpha _f \mathbf{x }_c^{l}(t), \end{aligned}$$
(6)

for intra-camera association, where \(\hat{\mathbf{x }}_c({\hat{t}})\) is the appearance feature of the associated detection, \({\hat{t}}\in \{t-J, \ldots ,t-1,t\}\) and \(\alpha _f\) is the forgetting factor of each camera. A lower \(\alpha _f\) would result in a less discriminative feature vector, while a higher \(\alpha _f\) would make the tracking less responsive to appearance changes, thus producing drift.

For inter-camera association, appearance features are updated with the data received from other cameras as:

$$\begin{aligned} \mathbf{x }_c^{{\overline{l}}}(t+1) = (1-\alpha _n)\overline{\mathbf{x }}_{{\overline{c}}}^{{\overline{l}}}({\overline{t}}) + \alpha _n \mathbf{x }_c^{l}(t), \end{aligned}$$
(7)

where \(\overline{\mathbf{x }}_{{\overline{c}}}^{{\overline{l}}}({\overline{t}})\) is the appearance feature of the associated target with label \(l_{{\overline{l}}}\) from camera \(C_{{\overline{c}}}\), \({\overline{t}}\in \{t-J,\ldots ,t-1,t\}\) and \(\alpha _n\) is the network factor. The lower \(\alpha _n\), the more the information from the network is considered.
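
The two update rules of Eqs. (6) and (7) translate directly into code; the sketch below assumes the features are NumPy arrays (or scalars).

def update_intra(x_current, x_detection, alpha_f=0.5):
    """Eq. (6): blend the stored feature with the associated detection;
    alpha_f is the forgetting factor of the camera."""
    return (1.0 - alpha_f) * x_detection + alpha_f * x_current


def update_inter(x_current, x_network, alpha_n=0.2):
    """Eq. (7): blend the stored feature with the feature received from
    another camera; alpha_n is the network factor."""
    return (1.0 - alpha_n) * x_network + alpha_n * x_current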

Validation

Datasets and experimental setup

To validate the proposed method, we run our experiments with people as targets. Existing camera-network datasets, such as PETS200952, NLPR_MCT12 and DukeMTMC53, only contain static cameras for which the camera topology is also available; however, to properly test the proposed method we require a dataset with targets moving continuously across cameras. To this aim, we introduce a new dataset that contains six scenes with up to four people, recorded with two moving hand-held cameras, where people are annotated with bounding boxes (using vbb54). A diagrammatic overview of the six scenes is shown in Fig. 4. Videos are in HD (1280 \(\times\) 720 pixels), recorded at 30 Hz, with more than 10,000 frames in total.

Figure 4
figure 4

Diagrammatic overview of the proposed dataset. Legend: Trapezoid = camera; blue arrow = camera movement; red arrows = target movement.

In Scenes 1 and 2, people are static but continuously enter/exit the cameras’ FoVs due to the cameras’ motion; in Scenes 3 and 4, people move and the illumination conditions change drastically; and in Scenes 5 and 6, people move and occlude each other, besides entering/exiting the cameras’ FoVs. The dataset is fully labeled: each person in the sequences is manually annotated using the video bounding box (vbb) tool54. The annotations consist of the position and size of the objects, labeled with a unique ID.

For intra-camera tracking, we detect people with EfficientDet25, which is faster than YOLO23 and SSD22, and track them with Fast Compressive Tracking (FCT)55, chosen for its speed (150 fps) and scale-invariance properties. FCT differentiates between target and background by calculating, with an online Naive Bayes classifier, the likelihood that a nearby patch belongs to a target. A convolution with Haar filters56 generates a high-dimensional multi-scale feature vector, which is reduced by Compressive Sensing55. We initialize one FCT per EfficientDet detection and improve its performance by combining it with new detections obtained every \(\delta\) frames or when the FCT tracking confidence, \(\phi\), is lower than a threshold \(\beta\). DMMA can run live, but the validation in this section is performed on video datasets to allow a proper analysis. DMMA is instantiated with \(\delta = 5\) frames, J = 2 frames, \(\alpha _f=0.5\), \(\alpha _n=0.2\), \(\gamma =0.2\), \(\psi =0.4\) and K = 48, and FCT with \(\beta = 0.4\).
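
The interplay between detector and trackers described above can be sketched in Python as follows; Detector and Tracker are hypothetical interfaces, since EfficientDet and FCT expose no standard Python API in this context.

def track_step(frame, t, trackers, detector, delta=5, beta=0.4):
    """One intra-camera step. `trackers` is the list of per-target FCT
    instances and `detector` the EfficientDet wrapper; both interfaces
    (update(), detect()) are hypothetical. The detector is re-run every
    `delta` frames or when any tracking confidence phi drops below `beta`."""
    confidences = [trk.update(frame) for trk in trackers]   # phi per target
    redetect = (t % delta == 0) or any(phi < beta for phi in confidences)
    detections = detector.detect(frame) if redetect else []
    return detections, confidences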

We implement all experiments using the same system, whose configuration is shown in Table 1.

Table 1 Configuration of experimental environment.

Performance measures

To evaluate the performance of the target descriptors, we use Cumulative Matching Characteristic (CMC) curves57 as the evaluation criterion, defined as a function of Rank-r:

$$\begin{aligned} q(r)=\frac{|C(r)|}{|{\mathcal {P}}_g|}, \end{aligned}$$
(8)

where \(|{\mathcal {P}}_g|\) is the total number of images in the gallery and C(r), the set of queries correctly matched within rank r, is defined as:

$$\begin{aligned} C(r)=\left\{ p_i: rank(p_i) \le r \right\} \quad \forall p_i \in {\mathcal {P}}_g. \end{aligned}$$
(9)
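
A short sketch of the CMC computation of Eqs. (8)-(9) in Python, where ranks holds, for each query, the 1-based rank at which its correct match was retrieved; num_gallery corresponds to \(|{\mathcal {P}}_g|\).

import numpy as np


def cmc(ranks, num_gallery, max_rank=30):
    """CMC curve q(r) of Eqs. (8)-(9). `ranks` holds, for each query, the
    1-based rank at which its correct gallery match was retrieved."""
    ranks = np.asarray(ranks)
    return np.array([(ranks <= r).sum() / float(num_gallery)
                     for r in range(1, max_rank + 1)])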

Since most intra-camera tracking algorithms use multi-object tracking metrics as their evaluation criteria, we adopt the evaluation metrics defined in58. These include the number of False Positives (FP), number of False Negatives (FN), number of ID Switches (IDS), number of Mostly Lost (ML) trajectories, number of Mostly Tracked (MT) trajectories, Multiple Object Tracking Accuracy (MOTA, a summary of the overall tracking accuracy in terms of FP, FN and IDS), and IDF153. Inter-camera association is evaluated with the Multi-Camera object Tracking Accuracy (MCTA)12:

$$\begin{aligned} MCTA=\left( \frac{2pr}{p+r}\right) \left( 1-\frac{ \sum _{t}m_t^s}{\sum _{t}u_t^s}\right) \left( 1-\frac{ \sum _{t}m_t^c}{\sum _{t}u_t^c}\right) \end{aligned}$$
(10)

where \(p = 1-\frac{\sum _{t}f_t}{\sum _{t}h_t}\) is the precision, \(r = 1-\frac{\sum _{t}i_t}{\sum _{t}g_t}\) is the recall, and \(m_t\), \(u_t\), \(f_t\), \(h_t\), \(i_t\) and \(g_t\) are the number of ID switches, true positives, false positives, trajectory hypotheses, misses and ground truths at time t, respectively; s and c denote matches within the same camera and across cameras, respectively. MCTA ranges between 0 and 1 (the higher the MCTA, the better the performance). Speed is measured in frames per second (fps).
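
Eq. (10) can be computed, for instance, with the following Python sketch, where each argument is a per-frame sequence of counts.

def mcta(f, h, i, g, m_s, u_s, m_c, u_c):
    """Eq. (10). f/h = false positives / trajectory hypotheses, i/g = misses /
    ground truths, m/u = ID switches / true positives, with suffixes
    s (within-camera) and c (cross-camera)."""
    p = 1.0 - sum(f) / sum(h)             # precision
    r = 1.0 - sum(i) / sum(g)             # recall
    f1 = 2.0 * p * r / (p + r)
    handover_s = 1.0 - sum(m_s) / sum(u_s)
    handover_c = 1.0 - sum(m_c) / sum(u_c)
    return f1 * handover_s * handover_c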

Experimental results

In this section, we first evaluate the target representation and the intra-camera and inter-camera tracking performance. Then we analyze the impact of the parameters and compare with state-of-the-art methods on the MOT16 dataset. Finally, we present qualitative results.


Target representation performance Table 2 compares the appearance representation, CN, with the Hue (H) and Saturation (S) histograms of the randomly-sampled patches, projected onto 30 H bins and 32 S bins and concatenated (HS); a deep feature from an accurate backbone (NASNet59); an efficient backbone (MobileNet24); and the concatenation of CN and MobileNet (CN + MobileNet). Results are reported as the percentage of correctly matched pairs within a specific rank57 and as speed, on 600 pairs of images distributed among different targets and case difficulties (e.g. due to occlusions or lighting changes) of the proposed dataset. As can be observed, NASNet has the best performance, with 94.2% of queries resulting in a rank-1 correct match. CN + MobileNet is second, with approximately 92.1% of the queries resulting in a rank-1 correct match and 98.3% within the 30 top ranks. However, NASNet (12.5 fps) runs at less than half the speed of our descriptor (28.1 fps). The proposed CN + MobileNet therefore offers the best trade-off between performance and speed.

Table 2 Comparison of appearance and deep features (see “Experimental results” section for details).

Intra-camera tracking performance We compare the proposed method against DeepSORT29, MDP27, MFI_tst35 and FairMOT34 for intra-camera tracking. As DMMA uses information across cameras, we run DMMA as a pure intra-camera tracker, i.e. with no inter-camera communication (DMMA-nc). We also compare against running the detector and the Hungarian algorithm at every frame with no FCT tracking (DMMA-nt). DMMA-nc and DMMA-nt are baselines optimized for the task at hand. Table 3 compares the intra-camera tracking results. DMMA-nc is the only method running in real time (32 fps), while maintaining the best average MOTA. In the most difficult scenes in terms of colour changes and heavy occlusions (Scenes 3, 5 and 6), DeepSORT loses accuracy with respect to MDP and DMMA-nc, while FairMOT shows results comparable to DMMA-nc but cannot reach real-time performance. Where FairMOT and MDP have a higher MOTA, DMMA-nc has comparable accuracy. Figure 5 shows sample tracking results on the proposed dataset.

Table 3 Comparison of intra-camera tracking accuracy on the proposed dataset, and speed of detection and tracking combined.
Figure 5
figure 5

Intra-camera tracking comparison (proposed dataset: scene 2 and camera 2) with target-size changes and one heavy occlusion. Top to bottom: DeepSORT29, MDP27 and DMMA-nc. Left to right: frames 1, 190, 203 and 220. DeepSORT and MDP wrongly assign labels 3–4.


Inter-camera tracking performance Table 4 reports the inter-camera association results. DMMA has a higher MCTA than DMMA-nt and DMMA-nc. DMMA-nc performs better than DMMA-nt, but worse than DMMA, thus validating the use of information from the network. The result of DMMA on Scene 3 (MCTA 63.9), which has heavy illumination changes, can be considered satisfactory, given that no explicit cross-camera calibration or training is performed.

Table 4 Performance evaluation for inter-camera association on the proposed dataset.

In terms of speed, DMMA achieves 32 fps, only 1 fps slower than DMMA-nc, which does not receive data from the network. Note that DMMA-nc and DMMA have a higher standard deviation due to the variability of the target search performed by FCT. As we performed all tests with the display on to allow the analysis of the results, we also tested the proposed solution with no display, to simulate how the implementation would perform if deployed without a screen (when screens are not required or available in a system). In this case, the speed increases by about 24% on average.


Impact of parameters Table 5 shows the impact of the detection frequency \(\delta\) and of the number of maintained frames J on our dataset. Too large \(\delta\) and J lead to a degradation of accuracy, which indicates drift caused by the lack of the detector’s correction over a long duration. However, a smaller \(\delta\) results in calling the detector and re-initializing the trackers frequently, which is time-consuming. Consequently, we set \(\delta = 5\) and J = 2 to strike a good balance between speed and accuracy. We further performed a sensitivity analysis for \(\psi\), \(\gamma\), \(\alpha _f\) and \(\alpha _n\): on average, results remain substantially unchanged in our experiments under a 10% variation.

Table 5 MCTA of different \(\delta\) and J on the proposed dataset (bold: best results).

Performance on MOT16 We compare DMMA-nc with state-of-the-art MOT trackers, including one-shot (FairMOT34) and two-step (DeepSORT29 and MFI_tst35) trackers. Following FairMOT34, we pre-train the detector on the CrowdHuman dataset60. Table 6 shows the results. Due to the robustness of the proposed target representation, we obtain the lowest number of ID switches among the compared trackers, which demonstrates that we obtain consistent object trajectories. DMMA-nc also has the second highest MOTA and IDF1 scores. This can be attributed to the proposed target management maintaining object association in spite of occlusions and of targets entering/exiting the cameras’ FoVs. Although FairMOT outperforms DMMA-nc on the MOT metrics, the main contribution of DMMA is a data-association strategy for mobile camera networks that requires no cross-camera calibration.

Table 6 Comparison of MOT trackers on MOT16 dataset (\(\downarrow\) = the lower the better; and \(\uparrow\) = the higher the better; bold: best results).

Qualitative results Finally, qualitative results are shown in Fig. 6. In Fig. 6e,f, we can appreciate the heavy illumination change in Scene 3, which leads to a wrong label assignment in Camera 1, while tracking performs well in Camera 2. In Fig. 6h, although Target 2 is completely occluded by Target 4, the method properly assigns the correct label. Similarly, in Fig. 6k the correct labels are assigned even when the targets are not entirely visible. However, labels 5 and 6 are wrongly assigned due to the very dark conditions in the scene.

Figure 6
figure 6

DMMA results on the proposed dataset. Different scenes show different frame numbers to better demonstrate the challenging scenarios.

Conclusion

We presented a target-management module for multi-camera multi-target tracking in a moving-camera network that runs in real time, reaching 32 fps on HD videos. The tracker, DMMA, allows cameras to join or leave without affecting the network’s performance, and targets are re-identified when re-entering a camera’s FoV. The tracker can also deal with heavy occlusions and with targets at different scales. Experiments were performed on a new mobile-camera dataset and on the public MOT dataset. The results demonstrate that the proposed approach performs well in terms of accuracy, effectiveness and speed.

As future work, we will extend the validation to other camera networks with a variable number of cameras and with a real communication channel.

Informed consent

Informed consent for the online open-access publication of the images has been obtained from all the participants.