Abstract
In recent years multi-core processors have seen broad adoption in application domains ranging from embedded systems through general-purpose computing to large-scale data centres. Simulation technology for multi-core systems, however, lags behind and does not provide the simulation speed required to effectively support design space exploration and parallel software development. While state-of-the-art instruction set simulators (ISS) for single-core machines reach or exceed the performance levels of speed-optimised silicon implementations of embedded processors, the same does not hold for multi-core simulators, which incur large performance penalties. In this paper we develop a fast and scalable simulation methodology for multi-core platforms based on parallel and just-in-time (JIT) dynamic binary translation (DBT). Our approach can model large-scale multi-core configurations, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded multi-core platform implementing the ARCompact instruction set architecture (ISA). We have evaluated our parallel simulation methodology against the industry-standard SPLASH-2 and EEMBC MultiBench benchmarks and demonstrate simulation speeds of up to 25,307 MIPS on a 32-core x86 host machine for as many as 2,048 target processors, whilst exhibiting minimal and near-constant overhead, also with respect to memory usage.
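To make the idea of parallel JIT DBT more concrete, the following is a minimal Python sketch, not the authors' simulator, and it does not model ARCompact. Everything in it is a hypothetical assumption for illustration: a three-instruction toy ISA, one host thread per simulated core, a block-granularity translation cache shared by all cores under a single lock, and a fixed hotness threshold `HOT_THRESHOLD`. Blocks are interpreted until they become hot (execution counts are gathered online, across all cores, with no prior profiling pass), after which they are translated into Python source, compiled once with `compile`/`exec`, and reused by every core. Note that CPython's global interpreter lock prevents these threads from actually running in parallel, so the sketch shows the structure, not the speed.

```python
import threading
from collections import defaultdict

# Hypothetical toy ISA (NOT ARCompact); each instruction is a tuple:
#   ("addi", rd, rs, imm)    r[rd] = r[rs] + imm
#   ("bne", rs, rt, target)  jump to target if r[rs] != r[rt], else fall through
#   ("halt",)                stop this core
# Assumed: a block is translated on its HOT_THRESHOLD-th execution (all cores).
HOT_THRESHOLD = 3

# Each core counts r1 up to its own bound in r2, then halts.
PROGRAM = [
    ("addi", 1, 1, 1),  # 0: r1 = r1 + 1
    ("bne", 1, 2, 0),   # 1: loop while r1 != r2
    ("halt",),          # 2
]

def block_at(program, pc):
    """Decode the basic block starting at pc (it ends at a branch or halt)."""
    block = []
    while True:
        insn = program[pc + len(block)]
        block.append(insn)
        if insn[0] in ("bne", "halt"):
            return block

def interpret(program, pc, r):
    """Slow path: execute one block instruction by instruction; return next pc."""
    block = block_at(program, pc)
    for insn in block:
        if insn[0] == "addi":
            _, rd, rs, imm = insn
            r[rd] = r[rs] + imm
        elif insn[0] == "bne":
            _, rs, rt, target = insn
            return target if r[rs] != r[rt] else pc + len(block)
        else:  # halt
            return None

def translate(program, pc):
    """Fast path: emit Python source for the whole block and compile it once."""
    block = block_at(program, pc)
    lines = ["def blk(r):"]
    for insn in block:
        if insn[0] == "addi":
            _, rd, rs, imm = insn
            lines.append(f"    r[{rd}] = r[{rs}] + {imm}")
        elif insn[0] == "bne":
            _, rs, rt, target = insn
            fallthrough = pc + len(block)
            lines.append(f"    return {target} if r[{rs}] != r[{rt}] else {fallthrough}")
        else:  # halt
            lines.append("    return None")
    namespace = {}
    exec(compile("\n".join(lines), f"<block@{pc}>", "exec"), namespace)
    return namespace["blk"]

class TranslationCache:
    """Block translations shared by all simulated cores, guarded by one lock."""

    def __init__(self, program):
        self.program = program
        self.lock = threading.Lock()
        self.code = {}                  # block pc -> translated function
        self.counts = defaultdict(int)  # block pc -> executions seen so far

    def lookup_or_profile(self, pc):
        """Return translated code for pc (or None), translating once it is hot."""
        with self.lock:
            fn = self.code.get(pc)
            if fn is None:
                self.counts[pc] += 1
                if self.counts[pc] >= HOT_THRESHOLD:
                    fn = self.code[pc] = translate(self.program, pc)
            return fn

def run_core(program, cache, r, stats):
    """Main loop of one simulated core, run in its own host thread."""
    pc = 0
    while pc is not None:
        fn = cache.lookup_or_profile(pc)
        if fn is not None:
            stats["translated"] += 1
            pc = fn(r)
        else:
            stats["interpreted"] += 1
            pc = interpret(program, pc, r)

def simulate(num_cores):
    cache = TranslationCache(PROGRAM)
    regs = [[0, 0, 10 * (c + 1)] for c in range(num_cores)]  # r2 = loop bound
    stats = [{"interpreted": 0, "translated": 0} for _ in range(num_cores)]
    threads = [threading.Thread(target=run_core, args=(PROGRAM, cache, regs[c], stats[c]))
               for c in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return regs, stats, cache
```

For example, `simulate(4)` leaves `r1` at 10, 20, 30 and 40 on the four cores. Because hotness is counted across cores, each of the two blocks is interpreted only twice in total, so 4 of the 104 block executions are interpreted and the remaining 100 run translated code. Sharing one cache lets a block made hot by one core benefit every other core running the same binary. Translating while holding the lock is a deliberate simplification; a scalable simulator would more plausibly translate asynchronously so that cores keep executing while code is being generated.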
Cite this article
Almer, O., Böhm, I., von Koch, T.E. et al. A Parallel Dynamic Binary Translator for Efficient Multi-Core Simulation. Int J Parallel Prog 41, 212–235 (2013). https://doi.org/10.1007/s10766-012-0222-9
DOI: https://doi.org/10.1007/s10766-012-0222-9