Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

In computer vision systems, a background model is a representation of the background image and it is based on its associated statistics. Background models are widely used for foreground objects segmentation, which is a fundamental task in many computer vision problems including moving object detection, shadow detection and removal, and image classification problems. As the visual scene changes with time, these models are continuously updated to include the required background modifications; the development of model updating algorithms is called Background Maintenance Problem.

As described in [3, 18], there are many problems that the background mainenance algorithms should solve, mainly related to the reaction of the background to both sudden or gradual changes in the visual scene, such as the sudden or gradual environmental light changes. Moreover, the moving object detection process can generate ghost images if the background image reconstruction is not fast enough. Other problems may be caused by shadows, because foreground objects often generate cast shadows which appear different from the modeled background.

Hence high quality background management algorithms are generally quite compex. In fact, there is a trade-off between the accuracy of the background image and the computing time the algorithms require. Of course the complexity of the computation of these methods is the big obstacle for real-time applications.

For this reason, many authors implemented background management algorithms on a Graphic Processing Unit (GPU). GPU’s have a high number of parallel processors and implement the stream processor paradigm, a form of Single Instruction, Multiple Data (SIMD) parallel processing. Under this paradigm, the same series of operations (kernel function) are independently applied onto each element in a set of data (stream) in an unspecified order and in batches of an (a priori) undetermined number of elements. This type of processing is particularly suited when each input pixel is independently processed.

In this paper we describe the GPU implementation of the background maintenance algorithm described in [4]. The algorithm is indeed pixel-wise, as almost all the processing are performed independently on each pixel and, as such, it is well suited to GPU implementation and high speed-ups can be expected. It is worth remarking that an important characteristic of real time computation of background maintenance algorithms is important at least for detecting moving objects from high definition images. It is worth recalling that the trend of digital cameras concerns increasing image resolution and increasing frame rate. In fact, with high resolution images it is possible to perform zoom of a particular region of the image, such as a door or an entrance (as shown in Fig. 1, right panel). Moreover, high frame rate allows to capture elements moving at high speed, such as fast running persons or high speed cars. Both these features are very important in video surveillance. Another application of the proposed algorithm is when the scene is crowded: the background is quite completely hidden for a long period of time and need to be reconstructed in real-time. In Fig. 1 we report two examples of typical images used in the experimental measurements described in Sect. 6.

Fig. 1.
figure 1

Example of complex scenes that requires highly accurate background maintenance.

This paper is organized as follows. Section 2 deals with some previously published work on GPU implementation of other Computer Vison algorithms. The related speed-up figures are reported. In Sect. 3 a review of the previously published ([4]) the pixel-wise algorithms for background maintenance is reported, and in Sect. 4 the parallelization of the previously proposed algorithm is presented. In Sect. 6 its performances in term of computational efficiency, speedup and quality are outlined and discussed. Finally in Sect. 7 some final remarks are proposed.

2 Related Work

Graphic Processing Units or GPUs are highly parallel processors, initially thought for graphic processing and rapidly evolved to general purpose processing, or GPGPU. Image processing and Computer graphic remain however the fields where the GPU can give the greatest benefits, as long as the algorithms are mostly pixel-wise. Clearly using the GPU for algorithmic processing, frees the CPU for other tasks.

Many GPU implementations of background methods have been published in the previous years. The work of [11] presents a foreground-background segmentation based on a color similarity test in a small pixel neighborhood, integrated into a Bayesian estimation framework, where iterative MRF-based model is applied. Using images of size 640x480 pixels on an NVIDIA GeForce 6800GT graphics card with AGP 8x, the following run-times are reported, depending on the number of MRF-iterations. The time needed for uploading the input images to the GPU and downloading the final segmentation is not included. The run-time varies from 2.51 ms to 4.84 ms for 0 and 10 iterations, correspondingly. In [15], a background model with low latency is constructed to detect moving objects in video sequences. Subsequent frames are stacked on top of each other using associated registration information that must be obtained in a preprocessing step. Their GPU implementation is running on a test system equipped with Nvidia GeForce 8800 GTX model, featuring 768 MB of video ram. The time required to construct the mean and median approximation backgrounds of length 48 on a combined GPU and CPU implementation is 16 and 151 ms, respectively. The authors in [20] describe a GPU-based implementation of motion detection from a moving platform. A step compensating for camera motion is required prior to estimating of the background model. Due to inevitable registration errors, the background model is estimated according to a sliding window of frames. The background model is based on an RGB texture histogram and a search for the bin with the largest number of samples. The resulting GPU-based implementation can build the background model and detect motion regions at around 18 fps on 320videos. Finally, the work of [13] proposes an approach that incorporates a pixel-based online learning method to adapt to temporal background changes, together with a graph cuts method to propagate per-pixel evaluation results over nearby pixels. The speed of their method is measured on a Lenovo T60 laptop with Intel Centrino T2500 CPU and ATI Mobility Radeon X1400 GPU. For image sequences with 320x240 resolutions, they achieve 16 fps.

In [16] Pham et al. describe a GPU implementation of an improved version of the Extended Gaussian mixture background model. Pham et al. show in this paper that their GPU implementation gained a speed-up of at least 10 with respect to a CPU implementation. Pham et al. used Core 2 2.6 GHz CPU and a GeForse 9600GT GPU.

In [14], the authors implemented a background maintenance algorithm based on an adaptive mixture of Gaussians (AGMM). Using an NVIDIA GPU with the CUDA [1] they achieved 18 acceleration compared with an implementation on an Intel multicore CPU using multithreading. In the same paper, an implementation on IBMs Cell Broadband Engine Architecture (CBEA) achieved 3 the acceleration compared with the same Intel multicore CPU benchmark.

3 The Baseline Method in a Nutshell

The block diagram of the resulting algorithm is shown in Fig. 2.

Fig. 2.
figure 2

Block diagram of the proposed algorithm.

The difference between background and foreground is computed to establish which pixels of the image would be updated. The difference vector \(\varDelta \) is calculated as follows:

$$\begin{aligned} \varDelta = \left[ \left| I^{R}_{x,y} - B^{R}_{x,y}\right| , \left| I^{G}_{x,y} - B^{G}_{x,y}\right| , \left| I^{B}_{x,y} - B^{B}_{x,y}\right| \right] ^T \end{aligned}$$

where (xy) is the pixel position, \(I^{c}\) the intensity of the current image for the channel c, \(c = \left( Red, Green, Blue \right) \), \(B^{c}\) the intensity of the background image and \(\tau =[\tau ^R,\tau ^G,\tau ^B]^T\) is a vector of thresholds used to detect changes in each channel.

For each image \(I^{c}\), at each frame t, the color distribution for each pixel (xy) is calculated using histogram analysis:

$$\begin{aligned} H\left( t+1, I^{c}\right) = \left\{ \begin{array}{ll} H\left( t, I^{c}\right) + 2 \cdot \delta \left[ p\left( x,y\right) - I^{c}\right] &{} \text{ if } \varDelta \ge \tau \\ H\left( t, I^{c}\right) + \delta \left[ p\left( x,y\right) - I^{c}\right] &{} \text{ otherwise } \end{array} \right. \end{aligned}$$
(1)

At each frame t, the numbers of Found Changes (FC) and Not Found Changes (NFC) are updated as shown in (2) and (3), where U is a parameter that have to be assigned in order to control the update rate of the background model. A typical value of U is equal to 100 frames.

$$\begin{aligned} FC_{x,y} (t+1) = \left\{ \begin{array}{cl} FC_{x,y} (t) + 1 &{} \text{ if } \varDelta \ge \tau \\ 0 &{} \text{ if } \varDelta \ngeq \tau \wedge NFC_{x,y}(t) = U \end{array} \right. \end{aligned}$$
(2)
$$\begin{aligned} NFC_{x,y} (t+1) = \left\{ \begin{array}{cl} NFC_{x,y} (t) + 1 &{} \text{ if } \varDelta \ngeq \tau \\ 0 &{} otherwise \end{array} \right. \end{aligned}$$
(3)

FC and NFC are used to trigger the background updating phase, which is performed if the number of Changes Found for the pixel (xy) is greater than a given threshold. In the EHB algorithm, this threshold is constant for all the image, while in the proposed algorithm is computed for each pixel, as follows.

Introducing a weight \(\alpha _{x,y}\) on the variability of the intensity of the pixel (xy):

$$\begin{aligned} \alpha _{x,y} = \frac{1}{max\left( 1,\sigma \left( x,y\right) \right) } \cdot \left( 1 - \frac{1}{\gamma }\frac{\sum ^{T}_{i=1} M_{x,y}(i)}{T} ) \right) , \end{aligned}$$
(4)

where the fraction \(\frac{1}{\gamma }\) is typically around \(\frac{1}{3}\), and a weight \(\beta _{x,y}\) on the number of changed pixels:

$$\begin{aligned} \beta _{x,y} = \frac{1}{\gamma } \cdot \left( \frac{ \sum _{x,y} M_{x,y}}{\text{ number } \text{ of } \text{ pixels } \text{ in } \text{ the } \text{ image }} + 1 \right) , \end{aligned}$$
(5)

we compute the threshold \(\phi _{x,y}\) as

$$\begin{aligned} \phi _{x,y} = ( \alpha _{x,y} - \beta _{x,y} ) \cdot U \end{aligned}$$
(6)

Equations (4) and (5) use the instantaneous change of pixel (xy), represented by the binary matrix \(M_{x,y}(t)\) computed as follows:

$$\begin{aligned} M_{x,y} (t) = \left\{ \begin{array}{cl} 1 &{} \text{ if } \varDelta \ge \tau \text{ at } \text{ time } \text{ t }\\ 0 &{} otherwise \end{array} \right. \end{aligned}$$
(7)

Thus, if \(FC_{x,y} > \phi _{x,y}\) the pixel in the background is considered to be changed and hence its histogram model should to be updated. Moreover, if the model is changed, the background image should be reconstructed from the histogram model.

The matrix NFC is also used for another background maintenance problem. Over long acquisition time, if a pixel has small variations under the threshold \(\phi \), it can have changed its value. So, if NFC is greater than 100 times U, the background image is computed from the histograms model even for unchanged pixels.

This algorithm offers some improved features with respect to EHB. First of all, the algorithm is capable to adapt the background to the gradual changes of lights that happens at different hours and weather conditions during the day, as the histograms are continuously updated, and is capable to adapt single parts of the background image taking into account the different dynamics of the changes in different regions of the grabbed image. The proposed algorithm is also well suited to face the problem of sudden light changes, as when a light is turned on or when sun appears among the numbs, choosing accordingly the parameter U. Moreover, one can expect a reduced number of I/O operations due to the reduced updates of the background image. Some other features are in common to EHB, such as the absence of a training phase and the fact that it can work properly when the start grabbed image has foreground elements already present.

4 GPU-Based Implementation

GPU computing turns the massive floating-point computational power of a modern graphics accelerator’s shader pipeline into general-purpose computing power. When utilizing a GPU there are several things that must be considered, as the internal structure of the GPU is completely different from the internal structure of CPUs (Fig. 3).

Fig. 3.
figure 3

Quick comparison between CPU and GPU.

First of all, the execution model of GPUs is really different from CPUs. GPUs employ massive parallelism and wide vector instructions, executing the same instruction for more elements at a time. Without designing algorithms that take this into consideration, the performance will be only a small fraction of what the hardware is capable of. Fig. 4 shows how a multithreaded program can easily adapt to different GPU structures.

Another thing to consider is that the GPU is on a separate circuit board, the graphic card, that is connected to the rest of the computer through a PCI express slot (as shown in Fig. 5), which implies that all data transfer between the CPU and the GPU is relatively slow. On the basis of this considerations, it is only through an accurate understanding of the architecture that we can develop and implement efficient algorithms.

5 Parallelization of the Proposed Histogram-Based Algorithm for Moving Object Segmentation

The proposed approach has been implemented on GPU: each acquired image is divided into 8x8 pixel blocks and for each block a thread pool of independent threads is instantiated.

A big amount of memory is required because, inside the GPU, for each concurrent thread several data structure have to be stored for each pixel, namely the three histograms \(H^c\), M, FC and NFC. A schema of the data structure is represented in Fig. 6.

Fig. 4.
figure 4

Execution of parallel threads on different GPU architectures.

Fig. 5.
figure 5

Architecture of the system GPU and CPU.

Fig. 6.
figure 6

Data structure used in the parallelized algorithm.

Each thread updates the model of a single pixel of the background. As the pixels are update by independent threads, this approach does not require inter-thread communication to synchronize the thread operations. A schematic representation of the overall parallelized algorithm is reported in Fig. 7.

6 Experimental Assessment and Analysis

The results described in this section have been computed on one core of an Intel Core 2 Quad Q9550 CPU running at 2.83 GHz and will be used to evaluate the GPU speedup. We implemented on GPUs the proposed algorithm and the EHB algorithm. In the following, the performance in terms of computational time are presented. In Table 1 the time required for the computation on two different GPU architectures is reported: a single GPU board (NVIDIA GeForce 9800 GT with 512 MB) and a dual GPU board (GTX 295 with 1024 MB).

A typical measure used to evaluate the scalability of parallel algorithms is the speedup, defined as the ratio of the CPU time over the GPU time. In Fig. 8 the speed-up of the proposed algorithm is reported for different dimensions of the image: the nominal resolution of the images on the camera used is 320x240 pixels, corresponding to 76800 pixels on the abscissa of Fig. 8. The GPU time is computed on an NVIDIA 9800 GTX.

Fig. 7.
figure 7

Data management in the parallelized algorithm.

Fig. 8.
figure 8

Speed-up evaluated on a Nvidia 9800 GTX.

Table 1. Computational time [ms] on different GPUs
Fig. 9.
figure 9

Similarity Index over the control parameter U.

Fig. 10.
figure 10

Results using a simple average algorithm implemented on CPU.

Fig. 11.
figure 11

Results using the proposed algorithm implemented on GPU.

Fig. 12.
figure 12

Ratio of the GPU computing time between EHB and the proposed algorithm.

In Fig. 9 we report the similarity of the proposed algorithm vs. the control parameter U (described in Sect. 3). As the frame rate increases, the image precision can reach a higher value, and more computational effort is required. This improvement is due to the fact that the background is updated more frequently and it can allow to observe fast event as quickly moving objects, fast changes and light variations. If the update frequency is too low, when an event occurs among two update time instants, it will not be recorded.

Finally, we evaluated the quality of the background model computed by GPU. This is in general difficult to perform as it would require ground truth background models. Hence, we compare the background model generated by GPU with the one generated by the CPU version, computing the average difference AD described in Eq. (8) as proposed in [22]:

$$\begin{aligned} AD = 2 \cdot abs \left( \frac{I_{GPU} - I_{CPU}}{I_{GPU} + I_{CPU}} \right) \end{aligned}$$
(8)

The average difference AD, evaluated on the same 13000 frames on GPU and CPU, is 0.5 % and the variance is 0.5 %. Thus, we can conclude that GPU and CPU versions are providing the same results.

7 Concluding Remarks and Future Work

It is worth noting that the background can be estimated, as in Yu-Medioni [22], with a simple average of previous frames. This average approach is faster. However, its performance are very poor, as shown in Fig. 10.

Figure 10 shows, from the top, the same complex scenes reported in Fig. 1. From the top, we see the grabbed image, the reconstructed background in the middle, and, at the bottom, the difference image D obtained with the average approach used in Yu-Medioni [22]. It is clear that in the difference image D there are many ghosts which make impossible to determine the moving objects.

On the other hand, in Fig. 11 the same scenes are processed with the proposed algorithm implemented on the GPU. It is evident from Fig. 11 that the algorithm implemented on GPU gives the same results of the algorithm implemented on CPU, as reported in [4]. Moreover the algorithm leads to a much better results than the average approach used by Yu-Medioni because the reconstructed backgrounds are cleaner (middle panel) and the difference images (on the bottom) allow a much more precise determination of moving objects.

Finally, it is important to note that the proposed algorithm is well tailored to parallel implementation. In fact, Fig. 12 shows that on GPU the improved background quality of the proposed algorithm is obtained with low computational time. In Fig. 12 we report the ratio of the computation time between EHB and the proposed algorithm versus different image resolutions. For the considered image dimensions, the proposed algorithm scales very well as the number of pixels increases.

As the evolution of video camera technology provides more powerful devices, the resolution of the acquired image becomes higher and higher to provide better definition of details. In the video surveillance field, higher resolutions allow to zoom a region of an image without sacrifice spatial resolution. It is worth noting that the proposed algorithm is well suited for high resolution images, as it presents a linear speedup as the number of pixel increases. Moreover, the proposed algorithm can manage high frame rate in real-time, and it is suited for video tracking of rapidly moving objects.

At current state of the art, full HD videos can be managed in real-time using the current generation of GPU. The proposed parallel algorithm slightly depends on the particular GPU architecture adopted, so it might operate properly on future generation GPUs. Other interesting extensions of the overall framework concern with: (i) studying how fragmentation techniques (e.g., [2, 7]) can be integrated as to improve the efficiency of our framework; (ii) moving towards the Big-Data’s philosophy (e.g., [5, 6, 21]), as moving objects naturally generate big data sets; (iii) exploring privacy-preservation issues (e.g., [810]), which are now becoming more and more critical for image processing research (e.g., [12, 17, 19]).