Abstract
As a consequence of the immense computational power available in GPUs, the use of these platforms for running data-intensive general-purpose programs has been increasing. Since the memory and processor architectures of CPUs and GPUs are substantially different, programs designed for each platform are also very different and often rely on distinct sets of algorithms and data structures. Selecting between the CPU and the GPU for a given program is not easy, as performance depends on variations in GPU hardware, on the amount of data, and on several other factors.
ÆminiumGPU is a new data-parallel framework for developing and running parallel programs on CPUs and GPUs. ÆminiumGPU programs are written in Java using Map-Reduce primitives and are compiled into hybrid executables which can run on either platform. Thus, the decision of which platform will execute a program is delayed until run-time and performed automatically by the system using Machine-Learning techniques.
Our tests show that ÆminiumGPU is able to achieve speedups of up to 65×, and that the platform selection algorithm chooses the best platform for executing a program with an average accuracy above 92%.
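To make the programming model concrete, the sketch below illustrates the map-reduce style of data-parallel program the abstract describes. The class and method names here are plain Java 8 stream operations, not the actual ÆminiumGPU API (which is not reproduced in this abstract); the point is only the shape of the computation: a side-effect-free map over a data set followed by an associative reduction, which is the form a framework like ÆminiumGPU can compile for either CPU or GPU execution.

```java
import java.util.stream.IntStream;

public class MapReduceSketch {
    public static void main(String[] args) {
        // Map-reduce over the integers 1..1000:
        //   map:    square each element (pure, element-wise -> parallelizable)
        //   reduce: sum the results (associative -> parallelizable)
        // A hybrid framework could dispatch this pipeline to the CPU or GPU
        // at run-time, since neither stage depends on evaluation order.
        double sumOfSquares = IntStream.rangeClosed(1, 1000)
                .mapToDouble(i -> (double) i * i)
                .reduce(0.0, Double::sum);
        System.out.println(sumOfSquares);
    }
}
```

Because the map function is pure and the reduction operator is associative, the same source program admits both a sequential CPU schedule and a massively parallel GPU schedule, which is what makes the run-time platform choice possible.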
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Fonseca, A., Cabral, B. (2013). ÆminiumGPU: An Intelligent Framework for GPU Programming. In: Keller, R., Kramer, D., Weiss, JP. (eds) Facing the Multicore-Challenge III. Lecture Notes in Computer Science, vol 7686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35893-7_9
Print ISBN: 978-3-642-35892-0
Online ISBN: 978-3-642-35893-7