Abstract
This paper describes a real-time implementation of a recently proposed background maintenance algorithm and reports the relative performances. Experimental results on dynamic scenes taken from a fixed camera show that the proposed parallel algorithm produces background images with an improved quality with respect to classical pixel-wise algorithms, obtaining a speedup of more than 35 times compared to CPU implementation. It is worth noting that we used both the GeForce 9 series (actually a 9800 GPU) available from the year 2008 and the GeForce 200 series (actually a 295 GPU) available from the year 2009. Finally, we show that this parallel implementation allows us to use it in real-time moving object detection application.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
- Graphic Processing Unit
- Background Model
- Background Image
- Graphic Processing Unit Implementation
- Move Object Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
1 Introduction
In computer vision systems, a background model is a representation of the background image and it is based on its associated statistics. Background models are widely used for foreground objects segmentation, which is a fundamental task in many computer vision problems including moving object detection, shadow detection and removal, and image classification problems. As the visual scene changes with time, these models are continuously updated to include the required background modifications; the development of model updating algorithms is called Background Maintenance Problem.
As described in [3, 18], there are many problems that the background mainenance algorithms should solve, mainly related to the reaction of the background to both sudden or gradual changes in the visual scene, such as the sudden or gradual environmental light changes. Moreover, the moving object detection process can generate ghost images if the background image reconstruction is not fast enough. Other problems may be caused by shadows, because foreground objects often generate cast shadows which appear different from the modeled background.
Hence high quality background management algorithms are generally quite compex. In fact, there is a trade-off between the accuracy of the background image and the computing time the algorithms require. Of course the complexity of the computation of these methods is the big obstacle for real-time applications.
For this reason, many authors implemented background management algorithms on a Graphic Processing Unit (GPU). GPU’s have a high number of parallel processors and implement the stream processor paradigm, a form of Single Instruction, Multiple Data (SIMD) parallel processing. Under this paradigm, the same series of operations (kernel function) are independently applied onto each element in a set of data (stream) in an unspecified order and in batches of an (a priori) undetermined number of elements. This type of processing is particularly suited when each input pixel is independently processed.
In this paper we describe the GPU implementation of the background maintenance algorithm described in [4]. The algorithm is indeed pixel-wise, as almost all the processing are performed independently on each pixel and, as such, it is well suited to GPU implementation and high speed-ups can be expected. It is worth remarking that an important characteristic of real time computation of background maintenance algorithms is important at least for detecting moving objects from high definition images. It is worth recalling that the trend of digital cameras concerns increasing image resolution and increasing frame rate. In fact, with high resolution images it is possible to perform zoom of a particular region of the image, such as a door or an entrance (as shown in Fig. 1, right panel). Moreover, high frame rate allows to capture elements moving at high speed, such as fast running persons or high speed cars. Both these features are very important in video surveillance. Another application of the proposed algorithm is when the scene is crowded: the background is quite completely hidden for a long period of time and need to be reconstructed in real-time. In Fig. 1 we report two examples of typical images used in the experimental measurements described in Sect. 6.
This paper is organized as follows. Section 2 deals with some previously published work on GPU implementation of other Computer Vison algorithms. The related speed-up figures are reported. In Sect. 3 a review of the previously published ([4]) the pixel-wise algorithms for background maintenance is reported, and in Sect. 4 the parallelization of the previously proposed algorithm is presented. In Sect. 6 its performances in term of computational efficiency, speedup and quality are outlined and discussed. Finally in Sect. 7 some final remarks are proposed.
2 Related Work
Graphic Processing Units or GPUs are highly parallel processors, initially thought for graphic processing and rapidly evolved to general purpose processing, or GPGPU. Image processing and Computer graphic remain however the fields where the GPU can give the greatest benefits, as long as the algorithms are mostly pixel-wise. Clearly using the GPU for algorithmic processing, frees the CPU for other tasks.
Many GPU implementations of background methods have been published in the previous years. The work of [11] presents a foreground-background segmentation based on a color similarity test in a small pixel neighborhood, integrated into a Bayesian estimation framework, where iterative MRF-based model is applied. Using images of size 640x480 pixels on an NVIDIA GeForce 6800GT graphics card with AGP 8x, the following run-times are reported, depending on the number of MRF-iterations. The time needed for uploading the input images to the GPU and downloading the final segmentation is not included. The run-time varies from 2.51 ms to 4.84 ms for 0 and 10 iterations, correspondingly. In [15], a background model with low latency is constructed to detect moving objects in video sequences. Subsequent frames are stacked on top of each other using associated registration information that must be obtained in a preprocessing step. Their GPU implementation is running on a test system equipped with Nvidia GeForce 8800 GTX model, featuring 768 MB of video ram. The time required to construct the mean and median approximation backgrounds of length 48 on a combined GPU and CPU implementation is 16 and 151 ms, respectively. The authors in [20] describe a GPU-based implementation of motion detection from a moving platform. A step compensating for camera motion is required prior to estimating of the background model. Due to inevitable registration errors, the background model is estimated according to a sliding window of frames. The background model is based on an RGB texture histogram and a search for the bin with the largest number of samples. The resulting GPU-based implementation can build the background model and detect motion regions at around 18 fps on 320videos. Finally, the work of [13] proposes an approach that incorporates a pixel-based online learning method to adapt to temporal background changes, together with a graph cuts method to propagate per-pixel evaluation results over nearby pixels. The speed of their method is measured on a Lenovo T60 laptop with Intel Centrino T2500 CPU and ATI Mobility Radeon X1400 GPU. For image sequences with 320x240 resolutions, they achieve 16 fps.
In [16] Pham et al. describe a GPU implementation of an improved version of the Extended Gaussian mixture background model. Pham et al. show in this paper that their GPU implementation gained a speed-up of at least 10 with respect to a CPU implementation. Pham et al. used Core 2 2.6 GHz CPU and a GeForse 9600GT GPU.
In [14], the authors implemented a background maintenance algorithm based on an adaptive mixture of Gaussians (AGMM). Using an NVIDIA GPU with the CUDA [1] they achieved 18 acceleration compared with an implementation on an Intel multicore CPU using multithreading. In the same paper, an implementation on IBMs Cell Broadband Engine Architecture (CBEA) achieved 3 the acceleration compared with the same Intel multicore CPU benchmark.
3 The Baseline Method in a Nutshell
The block diagram of the resulting algorithm is shown in Fig. 2.
The difference between background and foreground is computed to establish which pixels of the image would be updated. The difference vector \(\varDelta \) is calculated as follows:
where (x, y) is the pixel position, \(I^{c}\) the intensity of the current image for the channel c, \(c = \left( Red, Green, Blue \right) \), \(B^{c}\) the intensity of the background image and \(\tau =[\tau ^R,\tau ^G,\tau ^B]^T\) is a vector of thresholds used to detect changes in each channel.
For each image \(I^{c}\), at each frame t, the color distribution for each pixel (x, y) is calculated using histogram analysis:
At each frame t, the numbers of Found Changes (FC) and Not Found Changes (NFC) are updated as shown in (2) and (3), where U is a parameter that have to be assigned in order to control the update rate of the background model. A typical value of U is equal to 100 frames.
FC and NFC are used to trigger the background updating phase, which is performed if the number of Changes Found for the pixel (x, y) is greater than a given threshold. In the EHB algorithm, this threshold is constant for all the image, while in the proposed algorithm is computed for each pixel, as follows.
Introducing a weight \(\alpha _{x,y}\) on the variability of the intensity of the pixel (x, y):
where the fraction \(\frac{1}{\gamma }\) is typically around \(\frac{1}{3}\), and a weight \(\beta _{x,y}\) on the number of changed pixels:
we compute the threshold \(\phi _{x,y}\) as
Equations (4) and (5) use the instantaneous change of pixel (x, y), represented by the binary matrix \(M_{x,y}(t)\) computed as follows:
Thus, if \(FC_{x,y} > \phi _{x,y}\) the pixel in the background is considered to be changed and hence its histogram model should to be updated. Moreover, if the model is changed, the background image should be reconstructed from the histogram model.
The matrix NFC is also used for another background maintenance problem. Over long acquisition time, if a pixel has small variations under the threshold \(\phi \), it can have changed its value. So, if NFC is greater than 100 times U, the background image is computed from the histograms model even for unchanged pixels.
This algorithm offers some improved features with respect to EHB. First of all, the algorithm is capable to adapt the background to the gradual changes of lights that happens at different hours and weather conditions during the day, as the histograms are continuously updated, and is capable to adapt single parts of the background image taking into account the different dynamics of the changes in different regions of the grabbed image. The proposed algorithm is also well suited to face the problem of sudden light changes, as when a light is turned on or when sun appears among the numbs, choosing accordingly the parameter U. Moreover, one can expect a reduced number of I/O operations due to the reduced updates of the background image. Some other features are in common to EHB, such as the absence of a training phase and the fact that it can work properly when the start grabbed image has foreground elements already present.
4 GPU-Based Implementation
GPU computing turns the massive floating-point computational power of a modern graphics accelerator’s shader pipeline into general-purpose computing power. When utilizing a GPU there are several things that must be considered, as the internal structure of the GPU is completely different from the internal structure of CPUs (Fig. 3).
First of all, the execution model of GPUs is really different from CPUs. GPUs employ massive parallelism and wide vector instructions, executing the same instruction for more elements at a time. Without designing algorithms that take this into consideration, the performance will be only a small fraction of what the hardware is capable of. Fig. 4 shows how a multithreaded program can easily adapt to different GPU structures.
Another thing to consider is that the GPU is on a separate circuit board, the graphic card, that is connected to the rest of the computer through a PCI express slot (as shown in Fig. 5), which implies that all data transfer between the CPU and the GPU is relatively slow. On the basis of this considerations, it is only through an accurate understanding of the architecture that we can develop and implement efficient algorithms.
5 Parallelization of the Proposed Histogram-Based Algorithm for Moving Object Segmentation
The proposed approach has been implemented on GPU: each acquired image is divided into 8x8 pixel blocks and for each block a thread pool of independent threads is instantiated.
A big amount of memory is required because, inside the GPU, for each concurrent thread several data structure have to be stored for each pixel, namely the three histograms \(H^c\), M, FC and NFC. A schema of the data structure is represented in Fig. 6.
Each thread updates the model of a single pixel of the background. As the pixels are update by independent threads, this approach does not require inter-thread communication to synchronize the thread operations. A schematic representation of the overall parallelized algorithm is reported in Fig. 7.
6 Experimental Assessment and Analysis
The results described in this section have been computed on one core of an Intel Core 2 Quad Q9550 CPU running at 2.83 GHz and will be used to evaluate the GPU speedup. We implemented on GPUs the proposed algorithm and the EHB algorithm. In the following, the performance in terms of computational time are presented. In Table 1 the time required for the computation on two different GPU architectures is reported: a single GPU board (NVIDIA GeForce 9800 GT with 512 MB) and a dual GPU board (GTX 295 with 1024 MB).
A typical measure used to evaluate the scalability of parallel algorithms is the speedup, defined as the ratio of the CPU time over the GPU time. In Fig. 8 the speed-up of the proposed algorithm is reported for different dimensions of the image: the nominal resolution of the images on the camera used is 320x240 pixels, corresponding to 76800 pixels on the abscissa of Fig. 8. The GPU time is computed on an NVIDIA 9800 GTX.
In Fig. 9 we report the similarity of the proposed algorithm vs. the control parameter U (described in Sect. 3). As the frame rate increases, the image precision can reach a higher value, and more computational effort is required. This improvement is due to the fact that the background is updated more frequently and it can allow to observe fast event as quickly moving objects, fast changes and light variations. If the update frequency is too low, when an event occurs among two update time instants, it will not be recorded.
Finally, we evaluated the quality of the background model computed by GPU. This is in general difficult to perform as it would require ground truth background models. Hence, we compare the background model generated by GPU with the one generated by the CPU version, computing the average difference AD described in Eq. (8) as proposed in [22]:
The average difference AD, evaluated on the same 13000 frames on GPU and CPU, is 0.5 % and the variance is 0.5 %. Thus, we can conclude that GPU and CPU versions are providing the same results.
7 Concluding Remarks and Future Work
It is worth noting that the background can be estimated, as in Yu-Medioni [22], with a simple average of previous frames. This average approach is faster. However, its performance are very poor, as shown in Fig. 10.
Figure 10 shows, from the top, the same complex scenes reported in Fig. 1. From the top, we see the grabbed image, the reconstructed background in the middle, and, at the bottom, the difference image D obtained with the average approach used in Yu-Medioni [22]. It is clear that in the difference image D there are many ghosts which make impossible to determine the moving objects.
On the other hand, in Fig. 11 the same scenes are processed with the proposed algorithm implemented on the GPU. It is evident from Fig. 11 that the algorithm implemented on GPU gives the same results of the algorithm implemented on CPU, as reported in [4]. Moreover the algorithm leads to a much better results than the average approach used by Yu-Medioni because the reconstructed backgrounds are cleaner (middle panel) and the difference images (on the bottom) allow a much more precise determination of moving objects.
Finally, it is important to note that the proposed algorithm is well tailored to parallel implementation. In fact, Fig. 12 shows that on GPU the improved background quality of the proposed algorithm is obtained with low computational time. In Fig. 12 we report the ratio of the computation time between EHB and the proposed algorithm versus different image resolutions. For the considered image dimensions, the proposed algorithm scales very well as the number of pixels increases.
As the evolution of video camera technology provides more powerful devices, the resolution of the acquired image becomes higher and higher to provide better definition of details. In the video surveillance field, higher resolutions allow to zoom a region of an image without sacrifice spatial resolution. It is worth noting that the proposed algorithm is well suited for high resolution images, as it presents a linear speedup as the number of pixel increases. Moreover, the proposed algorithm can manage high frame rate in real-time, and it is suited for video tracking of rapidly moving objects.
At current state of the art, full HD videos can be managed in real-time using the current generation of GPU. The proposed parallel algorithm slightly depends on the particular GPU architecture adopted, so it might operate properly on future generation GPUs. Other interesting extensions of the overall framework concern with: (i) studying how fragmentation techniques (e.g., [2, 7]) can be integrated as to improve the efficiency of our framework; (ii) moving towards the Big-Data’s philosophy (e.g., [5, 6, 21]), as moving objects naturally generate big data sets; (iii) exploring privacy-preservation issues (e.g., [8–10]), which are now becoming more and more critical for image processing research (e.g., [12, 17, 19]).
References
\(\bullet \), Cuda c programming guide. http://developer.nvidia.com/cuda/nvidia-gpu-computing-documentation
Bonifati, A., Cuzzocrea, A.: Efficient fragmentation of large XML documents. In: Wagner, R., Revell, N., Pernul, G. (eds.) DEXA 2007. LNCS, vol. 4653, pp. 539–550. Springer, Heidelberg (2007)
Cucchiara, R., Grana, C., Piccardi, M., Prati, A.: Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans. Pattern Anal. Mach. Intell. 25(10), 1337–1342 (2003)
Cuzzocrea, A., Mumolo, E., Moro, A., Umeda, K.: Effective and efficient moving object segmentation via an innovative statistical approach. In: Proceedings of International Conference on Complex, Intelligent, and Software Intensive Systems (2015)
Cuzzocrea, A.: Analytics over big data: exploring the convergence of datawarehousing, OLAP and data-intensive cloud infrastructures. In: 37th Annual IEEE Computer Software and Applications Conference, COMPSAC 2013, Kyoto, Japan, July 22–26, 2013, pp. 481–483 (2013)
Cuzzocrea, A., Bellatreche, L., Song, I.-Y.: Data warehousing and OLAP over big data: current challenges and future research directions. In: Proceedings of the Sixteenth International Workshop on Data Warehousing and OLAP, DOLAP 2013, San Francisco, CA, USA, October 28, 2013, pp. 67–70 (2013)
Cuzzocrea, A., Darmont, J., Mahboubi, H.: Fragmenting very large XML data warehouses via k-means clustering algorithm. IJBIDM 4(3/4), 301–328 (2009)
Cuzzocrea, A., Russo, V.: Privacy preserving OLAP and OLAP security. In: Encyclopedia of Data Warehousing and Mining, 2nd edn., vol. 4, pp. 1575–1581 (2009)
Cuzzocrea, A., Russo, V., Saccà, D.: A robust sampling-based framework for privacy preserving OLAP. In: Song, I.-Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2008. LNCS, vol. 5182, pp. 97–114. Springer, Heidelberg (2008)
Cuzzocrea, A., Saccà, D.: Balancing accuracy and privacy of OLAP aggregations on data cubes. Proceedings of the DOLAP 2010, ACM 13th International Workshop on Data Warehousing and OLAP, Toronto, Ontario, Canada, October 30, 2010, pp. 93–98 (2010)
Griesser, A., De Roeck, S., Neubeck, A., Van Gool, L.: Gpu-based foreground background segmentation using an extended colinearity criterion. In: Proceedings of Vision, Modeling and Visualization
Donghui, H., Bin, S., Zheng, S., Zhao, Z.-Q., Xintao, W., Xindong, W.: Security and privacy protocols for perceptual image hashing. IJSNet 17(3), 146–162 (2015)
Cheng, L., Gong, M.: Real-time foreground segmentation on gpus using local online learning and global graph cut optimization. In: ICPR
Wolf, M., Poremba, M., Xie, Y.: Accelerating adaptive background subtraction with gpu and cbea architecture. In: Proceedings of the IEEE Workshop Signal Processing Systems
Ohmer, J.F., Perry, P.G., Redding, N.J.: Gpu-accelerated background generation algorithm with low latency. In: Proceedings of the Conference of the Australian Pattern Recognition Society on Digital Image Compression Techniques and Applications
Pham, V., Phong, V.D., Hung, V.T., Bac, L.H.: Gpu implementation of extended gaussian mixture model for background subtraction. In: Proceedings of the IEEE International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future
Squicciarini, A.C., Lin, D., Sundareswaran, S., Wede, J.: Privacy policy inference of user-uploaded images on content sharing sites. IEEE Trans. Knowl. Data Eng. 27(1), 193–206 (2015)
Toyama, K., Krumm, J., Brumitt, B., Meyers, B.: Wallflower: principles and practice of background maintenance. In: The Proceedings of the Seventh IEEE International Conference on Computer Visio
Wang, C., Zhang, B., Ren, K., Roveda, J.: Privacy-assured outsourcing of image reconstruction service in cloud. IEEE Trans. Emerging Topics Comput. 1(1), 166–177 (2013)
Medioni G., Qian, Y.: A gpu implementation of motion detection from a moving platform. In: CVPR
Yu, B., Cuzzocrea, A., Jeong, D.H., Maydebura, S.: On managing very large sensor-network data using bigtable. In: 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012, Ottawa, Canada, May 13–16, 2012, pp. 918–922 (2012)
Qian, Yu., Medioni, G.: A gpu-based implementation of motion detection from a moving platform. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2008, pp. 1–6 (2008)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Cuzzocrea, A., Mumolo, E., Moro, A., Umeda, K. (2015). A GPU-Based Statistical Framework for Moving Object Segmentation: Implementation, Analysis and Applications. In: Di Fatta, G., Fortino, G., Li, W., Pathan, M., Stahl, F., Guerrieri, A. (eds) Internet and Distributed Computing Systems. IDCS 2015. Lecture Notes in Computer Science(), vol 9258. Springer, Cham. https://doi.org/10.1007/978-3-319-23237-9_19
Download citation
DOI: https://doi.org/10.1007/978-3-319-23237-9_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23236-2
Online ISBN: 978-3-319-23237-9
eBook Packages: Computer ScienceComputer Science (R0)