Abstract
The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The M-Machine computing nodes are connected by a 3-D mesh network; each node is a multithreaded processor incorporating 9 function units, an on-chip cache, and local memory. The multiple function units are used to exploit both instruction-level and thread-level parallelism. A user-accessible message-passing system yields fast communication and synchronization between nodes. Rapid access to remote memory is provided transparently to the user through a combination of hardware and software mechanisms. This paper presents the architecture of the M-Machine and describes how its mechanisms attempt to maximize both single-thread performance and overall system throughput. The architecture is complete, and the MAP chip, which will serve as the M-Machine processing node, is currently being implemented.
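The abstract states only that nodes are connected by a 3-D mesh. As a minimal sketch of what that topology implies for communication distance, the helpers below compute a node's neighbors, the minimal hop count between two nodes, and one possible route. They assume a mesh without wraparound links and dimension-order routing (correct X, then Y, then Z); neither assumption is taken from the abstract, and all function names are hypothetical.

```python
from typing import List, Tuple

# A node is identified by its (x, y, z) position in the mesh (illustrative model only).
Coord = Tuple[int, int, int]


def hop_distance(src: Coord, dst: Coord) -> int:
    """Minimal number of links between two nodes in a 3-D mesh:
    the sum of the per-dimension coordinate differences."""
    return sum(abs(a - b) for a, b in zip(src, dst))


def dimension_order_route(src: Coord, dst: Coord) -> List[Coord]:
    """Nodes visited when the X offset is corrected first, then Y, then Z.
    Dimension-order routing is an ASSUMED policy, not one stated in the abstract."""
    path: List[Coord] = [src]
    cur = list(src)
    for dim in range(3):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step  # move one link along this dimension
            path.append((cur[0], cur[1], cur[2]))
    return path


def neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Up to six neighbors; nodes on a face, edge, or corner have fewer
    because this model ASSUMES the mesh has no wraparound links."""
    result: List[Coord] = []
    for dim in range(3):
        for delta in (-1, 1):
            n = list(node)
            n[dim] += delta
            if 0 <= n[dim] < dims[dim]:
                result.append((n[0], n[1], n[2]))
    return result


if __name__ == "__main__":
    route = dimension_order_route((0, 0, 0), (2, 1, 3))
    print(hop_distance((0, 0, 0), (2, 1, 3)), route)
```

For example, going from (0, 0, 0) to (2, 1, 3) takes 6 hops under this model, and a corner node of a 4×4×4 mesh has only 3 neighbors while an interior node has 6. Under these assumptions, the hop count is a rough proxy for why remote accesses cost more than local ones. It is not a latency model of the M-Machine network.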
Cite this article
Fillo, M., Keckler, S.W., Dally, W.J. et al. The M-machine multicomputer. Int J Parallel Prog 25, 183–212 (1997). https://doi.org/10.1007/BF02700035