Abstract
Overlapping communication with computation is a well-known approach to improving performance. Previous research has focused on optimizations performed by the programmer. This paper presents a compiler algorithm that automatically determines the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. The algorithm avoids generating redundant communication by providing a framework for combining information on data dependence, communication, and reuse. The paper also describes a method of generating messages to exchange data between processors for tiled loops on distributed memory machines. The algorithm has been implemented in our High Performance Fortran (HPF) compiler, and experimental results show its effectiveness on distributed memory machines such as the RISC System/6000 Scalable POWERparallel System. Finally, the paper discusses the architectural problems of efficient optimization.
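To make the idea of tiling for overlap concrete, the following is a minimal Python sketch, not the algorithm from the paper. It assumes a one-dimensional loop split into fixed-size tiles, and it stands in for a non-blocking message send with a background thread. The names send_boundary, compute_tiled, tile_size, and inbox are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def send_boundary(values, inbox):
    """Stand-in for a non-blocking send (hypothetical helper).

    A real compiler would emit calls to a message-passing library here;
    this sketch just appends the message to a list acting as the receiver.
    """
    inbox.append(list(values))


def compute_tiled(a, tile_size, inbox):
    """Add 1 to every element of a, one tile at a time.

    As soon as a tile is finished, its data is handed to a background
    worker, so the 'send' of tile k can proceed while tile k+1 is computed.
    """
    n = len(a)
    pending = []
    # A single worker keeps messages in tile order.
    with ThreadPoolExecutor(max_workers=1) as comm:
        for start in range(0, n, tile_size):
            end = min(start + tile_size, n)
            for i in range(start, end):
                a[i] = a[i] + 1
            # Start communication for this tile without waiting for it.
            # The slice copies the tile, so later updates cannot alter it.
            pending.append(comm.submit(send_boundary, a[start:end], inbox))
        # Wait for all outstanding sends before returning.
        for future in pending:
            future.result()
    return a
```

Without tiling, the whole loop would have to finish before a single large message could be sent. With tiles, one message is issued per tile, which trades a higher message count for the chance to hide communication latency behind computation. Choosing the tile size is therefore a balance between per-message overhead and the amount of overlap.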
Cite this article
Ishizaki, K., Komatsu, H. & Nakatani, T. A Loop Transformation Algorithm for Communication Overlapping. International Journal of Parallel Programming 28, 135–154 (2000). https://doi.org/10.1023/A:1007554715418
DOI: https://doi.org/10.1023/A:1007554715418