Abstract
As a consequence of the immense computational power available in GPUs, the use of these platforms for running data-intensive general-purpose programs has been increasing. Since the memory and processor architectures of CPUs and GPUs are substantially different, programs designed for each platform are also very different and often rely on distinct sets of algorithms and data structures. Selecting between the CPU and the GPU for a given program is not easy, as performance depends on variations in GPU hardware, on the amount of data, and on several other factors.
ÆminiumGPU is a new data-parallel framework for developing and running parallel programs on CPUs and GPUs. ÆminiumGPU programs are written in Java using Map-Reduce primitives and are compiled into hybrid executables which can run on either platform. Thus, the decision of which platform will execute a program is delayed until run-time and performed automatically by the system using Machine-Learning techniques.
Our tests show that ÆminiumGPU is able to achieve speedups of up to 65×, and that the platform selection algorithm chooses the best platform for executing a program with an average accuracy above 92%.
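To make the programming model concrete, the sketch below illustrates the map-reduce style of data-parallel program the abstract describes. The class and method names here are plain Java 8 stream operations, not the actual ÆminiumGPU API (which is not reproduced in this abstract); the point is only the shape of the computation: a side-effect-free map over a data set followed by an associative reduction, which is the form a framework like ÆminiumGPU can compile for either CPU or GPU execution.

```java
import java.util.stream.IntStream;

public class MapReduceSketch {
    public static void main(String[] args) {
        // Map-reduce over the integers 1..1000:
        //   map:    square each element (pure, element-wise -> parallelizable)
        //   reduce: sum the results (associative -> parallelizable)
        // A hybrid framework could dispatch this pipeline to the CPU or GPU
        // at run-time, since neither stage depends on evaluation order.
        double sumOfSquares = IntStream.rangeClosed(1, 1000)
                .mapToDouble(i -> (double) i * i)
                .reduce(0.0, Double::sum);
        System.out.println(sumOfSquares);
    }
}
```

Because the map function is pure and the reduction operator is associative, the same source program admits both a sequential CPU schedule and a massively parallel GPU schedule, which is what makes the run-time platform choice possible.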
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Fonseca, A., Cabral, B. (2013). ÆminiumGPU: An Intelligent Framework for GPU Programming. In: Keller, R., Kramer, D., Weiss, JP. (eds) Facing the Multicore-Challenge III. Lecture Notes in Computer Science, vol 7686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35893-7_9
Print ISBN: 978-3-642-35892-0
Online ISBN: 978-3-642-35893-7