Abstract
Deploying large numbers of small, low-power cores has recently been gaining traction as a system design strategy in high performance computing (HPC). The ARM platform, which dominates the embedded and mobile computing segments, is now being considered as an alternative to the high-end x86 processors that largely dominate HPC, because off-the-shelf commodity processors may substantially improve peak performance per watt.
In this work we methodically characterize the performance and energy of HPC computations drawn from a number of problem domains on current ARM and x86 processors. Unsurprisingly, we find that the performance, energy and energy-delay product of applications running on these platforms vary significantly across problem types and inputs. Using static program analysis, we further show that this variation can be explained largely in terms of the capabilities of two processor subsystems: single instruction multiple data (SIMD)/floating point and the cache/memory hierarchy. We also show that static analysis of this kind is sufficient to predict which platform is best for a particular application/input pair. In the context of these findings, we evaluate how some of the key architectural changes being made for upcoming 64-bit ARM platforms may impact HPC application performance.
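The three metrics named above can rank platforms differently for the same run. The following is a minimal sketch, assuming the usual definitions of energy as average power times runtime and energy-delay product as energy times runtime; the `Measurement` class, the `best_platform` helper and all numbers are hypothetical illustrations, not measurements or tooling from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One application/input run on one platform (hypothetical record)."""
    platform: str
    runtime_s: float      # wall-clock runtime in seconds
    avg_power_w: float    # average power draw in watts

    @property
    def energy_j(self) -> float:
        # Energy in joules: average power multiplied by runtime.
        return self.avg_power_w * self.runtime_s

    @property
    def edp_js(self) -> float:
        # Energy-delay product in joule-seconds: energy multiplied by runtime.
        return self.energy_j * self.runtime_s


def best_platform(measurements, metric):
    """Return the platform with the lowest value of the named metric."""
    return min(measurements, key=lambda m: getattr(m, metric)).platform


# Made-up numbers chosen only to illustrate the tradeoff.
runs = [
    Measurement("arm", runtime_s=40.0, avg_power_w=5.0),
    Measurement("x86", runtime_s=10.0, avg_power_w=60.0),
]

for metric in ("runtime_s", "energy_j", "edp_js"):
    print(metric, "->", best_platform(runs, metric))
```

With these invented inputs the slower, lower-power platform wins on energy alone, while the faster platform wins on runtime and on energy-delay product, which is why the choice of metric matters when comparing platforms.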
Keywords
- High Performance Computing
- Single Instruction Multiple Data
- Application Benchmark
- Power Draw
- High Performance Computing System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Laurenzano, M.A. et al. (2014). Characterizing the Performance-Energy Tradeoff of Small ARM Cores in HPC Computation. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_11
DOI: https://doi.org/10.1007/978-3-319-09873-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)