Characterizing the Performance-Energy Tradeoff of Small ARM Cores in HPC Computation

Laurenzano, Michael A.; Tiwari, Ananta; Jundt, Adam; Peraza, Joshua; Ward, William A.; Campbell, Roy; Carrington, Laura

doi:10.1007/978-3-319-09873-9_11


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8632)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

Deploying large numbers of small, low-power cores has been gaining traction recently as a system design strategy in high performance computing (HPC). The ARM platform that dominates the embedded and mobile computing segments is now being considered as an alternative to high-end x86 processors that largely dominate HPC because peak performance per watt may be substantially improved using off-the-shelf commodity processors.

In this work we methodically characterize the performance and energy of HPC computations drawn from a number of problem domains on current ARM and x86 processors. Unsurprisingly, we find that the performance, energy and energy-delay product of applications running on these platforms vary significantly across problem types and inputs. Using static program analysis we further show that this variation can be explained largely in terms of the capabilities of two processor subsystems: single instruction multiple data (SIMD)/floating point and the cache/memory hierarchy; and that static analysis of this kind is sufficient to predict which platform is best for a particular application/input pair. In the context of these findings, we evaluate how some of the key architectural changes being made for upcoming 64-bit ARM platforms may impact HPC application performance.
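The three metrics named in the abstract can rank the same pair of platforms differently, which is one reason the "best" platform depends on the application and input. The following minimal sketch uses the standard definitions (energy = average power × runtime, energy-delay product = energy × runtime). The `Run` class, the `best_platform` helper, and every number below are hypothetical placeholders chosen for illustration; they are not measurements or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One run of an application/input pair on one platform (hypothetical)."""
    platform: str
    seconds: float    # wall-clock runtime, i.e. the delay
    avg_watts: float  # mean power drawn over the run

    @property
    def energy_joules(self) -> float:
        # Energy is average power multiplied by runtime.
        return self.avg_watts * self.seconds

    @property
    def edp(self) -> float:
        # Energy-delay product: energy multiplied by runtime (joule-seconds).
        return self.energy_joules * self.seconds


def best_platform(runs, metric):
    """Return the platform with the lowest value of the named metric."""
    return min(runs, key=lambda r: getattr(r, metric)).platform


# Invented numbers: a slow, low-power core versus a fast, high-power core.
runs = [
    Run("arm", seconds=40.0, avg_watts=5.0),
    Run("x86", seconds=10.0, avg_watts=60.0),
]

for metric in ("seconds", "energy_joules", "edp"):
    print(metric, "->", best_platform(runs, metric))
```

With these placeholder values the low-power core uses less energy, while the faster core wins on both runtime and energy-delay product, so the choice of metric changes the ranking.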



Author information

Authors and Affiliations

EP Analytics, USA
Michael A. Laurenzano, Ananta Tiwari, Adam Jundt, Joshua Peraza & Laura Carrington
Dept. of Computer Science and Engineering, University of Michigan, USA
Michael A. Laurenzano
Performance Modeling and Characterization Lab., San Diego Supercomputer Center, USA
Ananta Tiwari & Laura Carrington
U.S. Dept. of Defense, High Performance Computing Modernization Program, USA
William A. Ward Jr. & Roy Campbell


Editor information

Editors and Affiliations

CRACS/INESC-TEC and FCUP, Universidade do Porto, Rua do Campo Alegre, 1021, 4169-007, Porto, Portugal
Fernando Silva, Inês Dutra & Vítor Santos Costa


Cite this paper

Laurenzano, M.A. et al. (2014). Characterizing the Performance-Energy Tradeoff of Small ARM Cores in HPC Computation. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_11


DOI: https://doi.org/10.1007/978-3-319-09873-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)
