Abstract
A histogram is a compact representation of the distribution of data in an image, with a wide range of applications in diverse fields. Histogram generation is an inherently sequential operation in which every pixel votes in a reduced set of bins. This makes efficient parallel implementations very desirable but challenging to find, because on graphics processing units thousands of threads may be atomically updating a small number of histogram bins. Under these circumstances, collisions among threads are very frequent, and such collisions serialize thread execution, seriously degrading performance. In this paper we propose a highly optimized approach to histogram calculation that tackles these performance bottlenecks. It uses histogram replication to eliminate position conflicts, padding to reduce bank conflicts, and an improved access to input data called interleaved read access. Our so-called \({\mathcal{R}}\)-per-block approach to histogram calculation has been compared with the main state-of-the-art works using four histogram-based image processing kernels and two real image databases. Results show that our proposal is between 1.4 and 15.7 times faster than every previous implementation for histograms of up to 4,096 bins.
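The three ingredients named above (replication, padding, and interleaved reads) can be illustrated with a small sequential emulation. This is only a sketch under stated assumptions, not the paper's kernel: the function name `replicated_histogram`, the mapping of a thread to replica `tid % num_replicas`, the single padding word per replica, and the reading of "interleaved" as a stride of `num_threads` are all hypothetical choices made for illustration.

```python
def replicated_histogram(pixels, num_bins, num_replicas, num_threads=8):
    """Sequentially emulate one thread block voting into replicated sub-histograms.

    All names and the thread-to-replica mapping are illustrative assumptions.
    """
    # Padding (assumed: one extra word per replica) shifts each successive
    # replica by one position; on a GPU this would shift it by one bank.
    stride = num_bins + 1
    shared = [0] * (num_replicas * stride)

    for tid in range(num_threads):
        # Replication: threads are spread over the replicas, so two threads
        # voting for the same bin often update different memory positions.
        replica = tid % num_replicas
        # Interleaved read (assumed meaning): thread tid handles pixels
        # tid, tid + num_threads, tid + 2 * num_threads, ...
        for i in range(tid, len(pixels), num_threads):
            shared[replica * stride + pixels[i]] += 1  # an atomic add on a GPU

    # Final reduction: sum the replicas bin by bin, skipping the padding words.
    return [
        sum(shared[r * stride + b] for r in range(num_replicas))
        for b in range(num_bins)
    ]
```

Calling `replicated_histogram(pixels, 4, 2, num_threads=4)` on a list of values in the range 0 to 3 returns the same counts as a plain sequential histogram; the replicas only change where intermediate votes are stored, not the final result.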
Cite this article
Gómez-Luna, J., González-Linares, J.M., Benavides, J.I. et al. An optimized approach to histogram computation on GPU. Machine Vision and Applications 24, 899–908 (2013). https://doi.org/10.1007/s00138-012-0443-3
DOI: https://doi.org/10.1007/s00138-012-0443-3