Abstract
We describe a programming interface for parallel computing on NUMA (Non-Uniform Memory Access) shared-memory machines. Although interest in this architecture is growing rapidly and more and more hardware manufacturers offer products of this type, parallelization support is still lacking. We developed SMI, the Shared Memory Interface, and implemented it as a library on an SCI-coupled cluster of workstations. It aims to provide sophisticated support that accounts for NUMA performance characteristics and allows step-by-step parallelization. We show its application to the parallelization of a sparse matrix computation.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dormanns, M., Sprangers, W., Ertl, H., Bemmerl, T. (1997). A programming interface for NUMA shared-memory clusters. In: Hertzberger, B., Sloot, P. (eds) High-Performance Computing and Networking. HPCN-Europe 1997. Lecture Notes in Computer Science, vol 1225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0031641
DOI: https://doi.org/10.1007/BFb0031641
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62898-9
Online ISBN: 978-3-540-69041-2
eBook Packages: Springer Book Archive