Abstract
In this paper, we review alternatives for reducing the computational complexity of the Non-Local Means image filter and present a CUDA-based GPU implementation, comparing its performance across different GPUs and against reference CPU implementations. Starting from a naive CUDA implementation, we describe the aspects of CUDA and of the algorithm itself that can be leveraged to decrease execution time. Our GPU implementation achieved speedups of up to 35.8x over our reduced-complexity reference implementation on the CPU, and of more than 700x over a plain CPU implementation.
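For orientation, the plain CPU baseline mentioned above can be pictured with a minimal sketch of the textbook Non-Local Means filter. This is not the authors' code: the function name `nlm_plain`, the parameter names (`r` for patch radius, `s` for search-window radius, `h` for the filtering parameter), the replicate-border policy for patches, and the normalization of the patch distance by the patch area are all assumptions made for illustration. The sketch shows where the cost comes from: each output pixel requires about (2s+1)^2 patch comparisons of (2r+1)^2 pixels each.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Clamp a coordinate into [lo, hi]; used to replicate the border when a
// patch extends outside the image (an assumed border policy).
static int clampi(int v, int lo, int hi) { return std::min(std::max(v, lo), hi); }

// Plain Non-Local Means on a grayscale, row-major image of size w x hgt.
// r: patch radius, s: search-window radius, h: filtering parameter.
// All names are hypothetical; this is an illustrative sketch only.
std::vector<float> nlm_plain(const std::vector<float>& img, int w, int hgt,
                             int r, int s, float h) {
    std::vector<float> out(img.size());
    const float inv_h2 = 1.0f / (h * h);
    const float patch_area = static_cast<float>((2 * r + 1) * (2 * r + 1));
    for (int y = 0; y < hgt; ++y) {
        for (int x = 0; x < w; ++x) {
            float wsum = 0.0f, acc = 0.0f;
            // Visit every candidate pixel q in the search window around (x, y).
            for (int dy = -s; dy <= s; ++dy) {
                for (int dx = -s; dx <= s; ++dx) {
                    const int qy = y + dy, qx = x + dx;
                    if (qy < 0 || qy >= hgt || qx < 0 || qx >= w) continue;
                    // Squared distance between the patches centered at p and q.
                    float d2 = 0.0f;
                    for (int py = -r; py <= r; ++py) {
                        for (int px = -r; px <= r; ++px) {
                            const float a = img[clampi(y + py, 0, hgt - 1) * w +
                                                clampi(x + px, 0, w - 1)];
                            const float b = img[clampi(qy + py, 0, hgt - 1) * w +
                                                clampi(qx + px, 0, w - 1)];
                            d2 += (a - b) * (a - b);
                        }
                    }
                    // Similar patches get weights near 1, dissimilar ones near 0.
                    const float wt = std::exp(-(d2 / patch_area) * inv_h2);
                    wsum += wt;
                    acc += wt * img[qy * w + qx];
                }
            }
            // wsum >= 1 because the center pixel always contributes weight 1.
            out[y * w + x] = acc / wsum;
        }
    }
    return out;
}
```

In a GPU version, the two outer loops over `x` and `y` are the natural candidates for parallelization, since each output pixel is computed independently of the others.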
This work was supported by ANII FMV200913042 and SticAmsud MMVPSCV. The authors thank P. Ezzatti and E. Dufrechou from Univ. de la Republica for discussions and for running our code on their machines.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Márques, A., Pardo, A. (2013). Implementation of Non Local Means Filter in GPUs. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer Science, vol 8258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41822-8_51
DOI: https://doi.org/10.1007/978-3-642-41822-8_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41821-1
Online ISBN: 978-3-642-41822-8
eBook Packages: Computer Science, Computer Science (R0)