Abstract
In this paper, we propose a fast and flexible sorting algorithm implemented with CUDA. The proposed algorithm is more practical than previous GPU-based sorting algorithms because it can sort elements represented by integers, floats, and structures. Our algorithm is also optimized for modern GPU architectures to obtain high performance. To make it adaptive, we use different strategies for sorting unordered lists and nearly sorted lists. Extensive experiments demonstrate that our algorithm achieves higher performance than previous GPU-based sorting algorithms and can support real-time applications.
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Chen, S., Qin, J., Xie, Y., Zhao, J., Heng, PA. (2009). A Fast and Flexible Sorting Algorithm with CUDA. In: Hua, A., Chang, SL. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2009. Lecture Notes in Computer Science, vol 5574. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03095-6_28
DOI: https://doi.org/10.1007/978-3-642-03095-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03094-9
Online ISBN: 978-3-642-03095-6
eBook Packages: Computer Science, Computer Science (R0)