Using Free Scheduling for Programming Graphic Cards

Bielecki, Wlodzimierz; Palkowski, Marek

doi:10.1007/978-3-642-30397-5_7

Wlodzimierz Bielecki¹⁹ &
Marek Palkowski¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7174))

1206 Accesses
2 Citations

Abstract

An approach is presented permitting us to build free scheduling for statement instances of affine loops. Under the free schedule, loop statement instances are executed as soon as their operands are available. To describe and implement the approach, the dependence analysis by Pugh and Wonnacott was chosen where dependences are found in the form of tuple relations. The proposed algorithm has been implemented and verified by means of the Omega project software. Results of experiments with the NAS benchmark suite are discussed. Speed-up and efficiency of parallel code produced by means of the approach are studied. Problems to be resolved in order to enhance the power of the presented technique are outlined.

Access provided by Autonomous University of Puebla. Download to read the full chapter text

Chapter PDF

Parametric GPU Code Generation for Affine Loop Programs

Practical Approach to Introducing Parallelism in Sequential Programs

Towards High-Level Programming for Systems with Many Cores

Keywords

References

Bastoul, C.: Code generation in the polyhedral model is easier than you think. In: PACT 13 IEEE International Conference on Parallel Architecture and Compilation Techniques, Juan-les-Pins, pp. 7–16 (September 2004)
Google Scholar
Beletska, A., Bielecki, W., Cohen, A., Palkowski, M., Siedlecki, K.: Coarse-grained loop parallelization: Iteration space slicing vs affine transformations. Parallel Computing 37, 479–497 (2011)
Article Google Scholar
Beletskyy, V., Siedlecki, K.: Finding Free Schedules for Non-uniform Loops. In: Kosch, H., Böszörményi, L., Hellwagner, H. (eds.) Euro-Par 2003. LNCS, vol. 2790, pp. 297–302. Springer, Heidelberg (2003)
Chapter Google Scholar
Bielecki, W., Klimek, T., Trifunovic, K.: Calculating exact transitive closure for a normalized affine integer tuple relation. Electronic Notes in Discrete Mathematics 33, 7–14 (2009)
Article MathSciNet MATH Google Scholar
Wlodzimierz, B., Tomasz, K., Marek, P., Beletska, A.: An Iterative Algorithm of Computing the Transitive Closure of a Union of Parameterized Affine Integer Tuple Relations. In: Wu, W., Daescu, O. (eds.) COCOA 2010, Part I. LNCS, vol. 6508, pp. 104–113. Springer, Heidelberg (2010)
Chapter Google Scholar
Bielecki, W., Palkowski, M.: Extracting Both Affine and Non-linear Synchronization-Free Slices in Program Loops. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds.) PPAM 2009. LNCS, vol. 6067, pp. 196–205. Springer, Heidelberg (2010)
Chapter Google Scholar
Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: A practical automatic polyhedral parallelizer and locality optimizer. In: Conference on Programming Language Design and Implementation, pp. 101–113. ACM (2008)
Google Scholar
Chen, D.K.: Compiler optimizations for parallel loops with fine-grained synchronization. Ph.D. thesis, Champaign, IL, USA (1994), uMI Order No. GAX95-12325
Google Scholar
Darte, A., Robert, Y.: Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distrib. Syst. 5, 814–822 (1994)
Article Google Scholar
Darte, A., Vivien, F.: Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. In: Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques, PACT 1996, pp. 281–291. IEEE Computer Society, Washington, DC, USA (1996)
Chapter Google Scholar
Darte, A., Khachiyan, L., Robert, Y.: Linear scheduling is nearly optimal. Parallel Processing Letters 1(2), 73–81 (1991)
Article Google Scholar
Feautrier, P.: Some efficient solutions to the affine scheduling problem: I. one-dimensional time. Int. J. Parallel Program. 21(5), 313–348 (1992)
Article MathSciNet MATH Google Scholar
Feautrier, P.: Some efficient solutions to the affine scheduling problem: II. multi-dimensional time. Int. J. Parallel Program. 21(5), 389–420 (1992)
Article MathSciNet MATH Google Scholar
Kelly, W., Maslov, V., Pugh, W., Rosser, E., Shpeisman, T., Wonnacott, D.: The omega library interface guide. Tech. rep., College Park, MD, USA (1995)
Google Scholar
Kelly, W., Pugh, W.: A framework for unifying reordering transformations. Tech. rep., Univ. of Maryland Institute for Advanced Computer Studies Report No. UMIACS-TR-92-126.1, College Park, MD, USA (1993)
Google Scholar
Kelly, W., Pugh, W., Rosser, E., Shpeisman, T.: Transitive closure of infinite graphs and its applications. Int. J. Parallel Programming 24(6), 579–598 (1996)
Article Google Scholar
Krothapalli, V., Sadayappan, P.: Removal of redundant dependences in doacross loops with constant dependences. IEEE Transactions on Parallel and Distributed Systems 2, 281–289 (1991)
Article Google Scholar
Le Gouëslier d’Argence, P.: Affine scheduling on bounded convex polyhedric domains is asymptotically optimal. Theor. Comput. Sci. 196, 395–415 (1998)
Article MathSciNet MATH Google Scholar
Lim, A.W., Cheong, G.I., Lam, M.S.: An affine partitioning algorithm to maximize parallelism and minimize communication. In: Proceedings of the 13th ACM SIGARCH International Conference on Supercomputing, pp. 228–237. ACM Press (1999)
Google Scholar
Surhone, L.M., Tennoe, M.T., Henssonow, S.F.: Presburger Arithmetic. VDM Verlag Dr. Mueller AG & Co. Kg (2010); ISBN: 6133083557
Google Scholar
Midkiff, S.P., Padua, D.A.: Compiler algorithms for synchronization. IEEE Transactions on Computers 36, 1485–1495 (1987)
Article MATH Google Scholar
Midkiff, S.P., Padua, D.A.: A comparison of four synchronization optimization techniques. In: ICPP (2), pp. 9–16 (1991)
Google Scholar
NAS: Parallel Benchmarks Suite, Version 3.3 (February 2008), http://www.nas.nasa.gov
NVIDIA: NVIDIA CUDA C Programming Guide 4.0 (2011), http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
Pugh, W., Wonnacott, D.: An Exact Method for Analysis of Value-Based Array Data Dependences. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D.A. (eds.) LCPC 1993. LNCS, vol. 768, pp. 546–566. Springer, Heidelberg (1994)
Chapter Google Scholar
Verdoolaege, S.: Integer set library - manual. Tech. rep. (2011), http://www.kotnet.org/~skimo//isl/manual.pdf
Vivien, F.: On the Optimality of Feautrier’s Scheduling Algorithm. In: Monien, B., Feldmann, R.L. (eds.) Euro-Par 2002. LNCS, vol. 2400, pp. 299–308. Springer, Heidelberg (2002)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Faculty of Computer Science, West Pomeranian University of Technology, 70210, Zolnierska 49, Szczecin, Poland
Wlodzimierz Bielecki & Marek Palkowski

Authors

Wlodzimierz Bielecki
View author publications
You can also search for this author in PubMed Google Scholar
Marek Palkowski
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

High Performance Computing Center Stuttgart (HLRS), Universität Stuttgart, Nobelstraße 19, 70569, Stuttgart, Germany
Rainer Keller
Institute of Computer Science and Engineering, Karlsruhe Institute of Technology (KIT), Haid-und-Neu-Straße 7, 76131, Karlsruhe, Germany
David Kramer
Institute for Applied and Numerical Mathematics, SRG New Frontiers in High Performance Computing and Karlsruhe Institute of Technology (KIT), 4, Fritz-Erler-Straße 23, 76133, Karlsruhe, Germany
Jan-Philipp Weiss

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Bielecki, W., Palkowski, M. (2012). Using Free Scheduling for Programming Graphic Cards. In: Keller, R., Kramer, D., Weiss, JP. (eds) Facing the Multicore - Challenge II. Lecture Notes in Computer Science, vol 7174. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30397-5_7

Download citation

DOI: https://doi.org/10.1007/978-3-642-30397-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30396-8
Online ISBN: 978-3-642-30397-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Using Free Scheduling for Programming Graphic Cards

Abstract

Chapter PDF

Similar content being viewed by others

Parametric GPU Code Generation for Affine Loop Programs

Practical Approach to Introducing Parallelism in Sequential Programs

Towards High-Level Programming for Systems with Many Cores

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Publish with us

Navigation

Using Free Scheduling for Programming Graphic Cards

Abstract

Chapter PDF

Similar content being viewed by others

Parametric GPU Code Generation for Affine Loop Programs

Practical Approach to Introducing Parallelism in Sequential Programs

Towards High-Level Programming for Systems with Many Cores

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation