Abstract
Model sizes in engineering and scientific computing have increased significantly, and additional computing devices such as GPUs, accelerators, and co-processors are being used to improve computational performance. This paper presents several strategies for optimizing computational performance. The first strategy groups tetrahedra into computation units whose size is a multiple of four, so that the calculation can be vectorized with AVX instructions. The second strategy uses a GPU device, and several techniques are proposed to reduce the time spent exchanging data between the host and GPU memory spaces. The proposed techniques are implemented using the OpenCL framework. As a test case, the mass properties of a large number of solid finite elements are calculated, and the computational performance on various platforms is compared. Numerical experiments showed that computational performance improved by a factor of 26.47 on the CPU and 6.95 on the GPU, compared with a version that does not use the proposed techniques.
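The first strategy rests on processing tetrahedra in groups of four. Below is a minimal Python sketch of that idea under stated assumptions. The "mass properties" are taken to be the mass, center of mass, and inertia tensor of linear (4-node) tetrahedra with uniform density. Elements are stored structure-of-arrays in hypothetical batches of `LANES = 4`, the number of doubles in one 256-bit AVX register. The helper names `pack`, `batch_mass_props`, and `assemble` are illustrative and are not the authors' OpenCL kernels. Python does not vectorize these loops; the sketch only shows the data layout and the arithmetic each lane would perform.

```python
"""Hypothetical sketch: mass properties of linear tetrahedra in batches of 4.

Layout assumption: each batch is structure-of-arrays, coord[k][i][lane] =
axis i (0=x, 1=y, 2=z) of vertex k (0..3) of the element in that lane, so the
same arithmetic runs over 4 contiguous lanes. In native code those 4 lanes
would fill one 256-bit AVX register of doubles; Python only mimics the layout.
"""

LANES = 4  # elements per batch (4 doubles per 256-bit AVX register)


def pack(tets, densities):
    """Pack tetrahedra (4 vertices of (x, y, z)) into SoA batches of LANES.

    A short final batch is padded with zero-density copies, which add nothing.
    """
    batches = []
    for start in range(0, len(tets), LANES):
        chunk = list(tets[start:start + LANES])
        dens = list(densities[start:start + LANES])
        while len(chunk) < LANES:
            chunk.append(chunk[0])
            dens.append(0.0)
        coord = [[[chunk[lane][k][i] for lane in range(LANES)] for i in range(3)]
                 for k in range(4)]
        batches.append((coord, dens))
    return batches


def batch_mass_props(coord, density):
    """Per-lane mass, first moment (m * centroid) and second moments.

    second[i][j][lane] is the integral of rho * x_i * x_j over that element.
    """
    mass = [0.0] * LANES
    first = [[0.0] * LANES for _ in range(3)]
    second = [[[0.0] * LANES for _ in range(3)] for _ in range(3)]
    for lane in range(LANES):  # each step maps to one vector op in SIMD code
        p = [[coord[k][i][lane] for i in range(3)] for k in range(4)]
        e = [[p[k][i] - p[0][i] for i in range(3)] for k in (1, 2, 3)]
        # Scalar triple product e0 . (e1 x e2) equals 6 * signed volume.
        det = (e[0][0] * (e[1][1] * e[2][2] - e[1][2] * e[2][1])
               - e[0][1] * (e[1][0] * e[2][2] - e[1][2] * e[2][0])
               + e[0][2] * (e[1][0] * e[2][1] - e[1][1] * e[2][0]))
        m = density[lane] * abs(det) / 6.0
        s = [sum(p[k][i] for k in range(4)) for i in range(3)]
        mass[lane] = m
        for i in range(3):
            first[i][lane] = m * s[i] / 4.0  # mass times centroid coordinate
            for j in range(3):
                # Closed form for a linear tetrahedron with uniform density:
                # m / 20 * (sum_k x_i^k x_j^k + S_i * S_j), S_i = sum_k x_i^k.
                cross = sum(p[k][i] * p[k][j] for k in range(4))
                second[i][j][lane] = m / 20.0 * (cross + s[i] * s[j])
    return mass, first, second


def assemble(batches):
    """Sum element contributions about the common origin.

    Returns total mass, center of mass, and inertia tensor about the origin.
    """
    total = 0.0
    moment = [0.0] * 3
    cov = [[0.0] * 3 for _ in range(3)]
    for coord, density in batches:
        mass, first, second = batch_mass_props(coord, density)
        for lane in range(LANES):
            total += mass[lane]
            for i in range(3):
                moment[i] += first[i][lane]
                for j in range(3):
                    cov[i][j] += second[i][j][lane]
    com = [mi / total for mi in moment]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # I_ij = integral of (|r|^2 * delta_ij - x_i * x_j) dm
    inertia = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
               for i in range(3)]
    return total, com, inertia


if __name__ == "__main__":
    ref = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    tets = [tuple((x + t, y, z) for x, y, z in ref) for t in range(5)]
    print(assemble(pack(tets, [6.0] * 5)))
```

Because every element's contribution is taken about the same origin, assembly reduces to a plain sum; the inertia about the center of mass can be recovered afterwards with the parallel-axis theorem. Zero-density padding is one simple way to keep every batch full when the element count is not a multiple of four.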
Additional information
This paper was presented at the Joint Conference of the 3rd IMSD and the 7th ACMD, Busan, Korea, June 2014. Recommended by Guest Editors Sung-Soo Kim and Jin Hwan Choi.
Ji-Hyun Jung received his B.S. and M.S. degrees in Mechanical Engineering from Hanyang University in 2007 and 2010, respectively. He is now a doctoral student in Mechanical Engineering at Hanyang University. His current research interests are heterogeneous computing with GPUs and co-processors and the improvement of mechanical engineering software.
Dae-Sung Bae received his M.S. and Ph.D. degrees from the University of Iowa in 1983 and 1986, respectively. His current research interests are meshfree methods and parallel processing in mechanical and structural dynamics.
About this article
Cite this article
Jung, JH., Bae, DS. Optimization of operating and assembling mass properties of solid elements on heterogeneous platforms using OpenCL framework. J Mech Sci Technol 29, 2631–2637 (2015). https://doi.org/10.1007/s12206-015-0508-0