Abstract
Multiprocessors are becoming prevalent in the PC world. Major CPU vendors such as Intel and Advanced Micro Devices have migrated to multicore processors. However, this also means that a computer will run an application at full speed only if that application is parallelized. To take advantage of more than a fraction of the compute resources on a die, we develop a compiler that parallelizes a common and powerful programming paradigm, namely reduction. Our goal is to exploit the full potential of reductions for efficient execution of applications on multiprocessors, including multicores. Reduction operations are common in streaming applications, financial computing, and the HPC domain; in fact, 9% of all MPI invocations in the NAS Parallel Benchmarks are reduction library calls. Recognizing implicit reductions in Fortran and C is important for parallelization on multiprocessors. Recent languages such as the Brook streaming language and Chapel allow users to specify reduction functions. Our compiler provides a unified framework for processing both implicit and user-defined reductions. Both types of reductions are propagated and analyzed interprocedurally. Our global algorithm can enlarge the scope of user-defined reductions and parallelize coarser-grained reductions. Thanks to this powerful algorithm and representation, we obtain an average speedup of 3 on 4 processors; the speedup is only 1.7 if only intraprocedural scalar reductions are parallelized.
References
Buck, I.: Brook Language Specification (October 2003), http://merrimac.stanford.edu/brook
Deitz, S., Callahan, D., Chamberlain, B., Snyder, L.: Global-View Abstractions for User-Defined Reductions and Scans. In: Proceedings of the ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, New York (March 2006)
Hall, M., Amarasinghe, S., Murphy, B., Liao, S., Lam, M.: Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler. In: Proceedings of Supercomputing, San Diego, CA (December 1995)
Hall, M., Anderson, J., Amarasinghe, S., Murphy, B., Liao, S., Bugnion, E., Lam, M.S.: Maximizing Multiprocessor Performance with the SUIF Compiler. IEEE Computer 29(12) (December 1996)
Bailey, D., Harris, T., Saphir, W., Van der Wijngaart, R., Woo, A., Yarrow, M.: The NAS Parallel Benchmarks 2.0. Technical Report RNR-95-020, NASA Ames Research Center, Moffet Field, CA (December 1995)
Blelloch, G.E.: Vector Models for Data Parallel Computing. MIT Press, Cambridge (1990)
Intel Multi-Core and AMD Multi-Core Technology (June 2006), http://www.intel.com/multi-core/
Iverson, K.: A Programming Language. John Wiley & Sons, Chichester (1962)
Liao, S., Du, Z., Wu, G., Lueh, G.: Data and Computation Transformations for Brook Streaming Applications on Multiprocessors. In: IEEE/ACM International Symposium on Code Generation and Optimization, New York (March 2006)
High Performance Fortran Forum. High Performance Fortran Specification Version 2.0 (January 1997)
Gropp, W., Lusk, E., Skjellum, A.: Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd edn. MIT Press, Cambridge (1999)
Charles, P., Donawa, C., Ebcioglu, K., Grothoff, C., Kielstra, A., von Praun, C., Saraswat, V., Sarkar, V.: X10: An Object-oriented Approach to Non-uniform Cluster Computing. In: Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA) – Onward! Track (October 2005)
Official OpenMP Specifications Version 2.5 (May 2005), http://www.openmp.org
Fortress: A New Programming Language for Scientific Computing (2005), http://research.sun.com/projects/plrg/fortress0618.pdf
Ammarguellat, Z., Harrison, W.: Automatic Recognition of Induction Variables and Recurrence Relations by Abstract Interpretation. In: Proceedings of the SIGPLAN 1990 Conference on Programming Language Design and Implementation, White Plains, NY (1990)
Haghighat, M., Polychronopoulos, C.: Symbolic Analysis: A Basis for Parallelization, Optimization and Scheduling of Programs. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D.A. (eds.) LCPC 1993. LNCS, vol. 768. Springer, Heidelberg (1994)
Haghighat, M., Polychronopoulos, C.: Symbolic Analysis for Parallelizing Compilers. ACM Transactions on Programming Languages and Systems 18(4) (July 1996)
Pottenger, B., Eigenmann, R.: Parallelization in the Presence of Generalized Induction and Reduction Variables. In: Proceedings of the 1995 ACM International Conference on Supercomputing (June 1995)
Pointer, L.: Perfect: Performance Evaluation for Cost Effective Transformations Report 2. Technical Report 964, University of Illinois, Urbana-Champaign (March 1990)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Liao, Sw. (2006). Parallelizing User-Defined and Implicit Reductions Globally on Multiprocessors. In: Jesshope, C., Egan, C. (eds) Advances in Computer Systems Architecture. ACSAC 2006. Lecture Notes in Computer Science, vol 4186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11859802_16
DOI: https://doi.org/10.1007/11859802_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40056-1
Online ISBN: 978-3-540-40058-5