Abstract
Recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multiprocessor systems-on-chip. As a result, parallel processing is required in application areas that have traditionally not used parallel programs. This paper investigates the parallelism and scalability of an embedded image processing application. The major challenges in parallelizing the application were extracting enough parallelism and reducing load imbalance. The application has limited immediately available parallelism, and further extraction is constrained by small data sets and a relatively high parallelization overhead. Load balance is difficult to obtain because of the limited parallelism, and non-uniform memory latency makes it worse. Three parallel OpenMP implementations of the application are discussed and evaluated. We show that, with some modifications, relative speedups in excess of 9 can be reached on a 16-CPU system.
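To put the reported figure in context, the following minimal sketch computes parallel efficiency from a speedup and shows how a small number of indivisible work items caps the achievable speedup. It assumes the usual definition of relative speedup (run time of the parallel program on one CPU divided by its run time on p CPUs). The function names and the task counts of 20 and 32 are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_speedup(t_one_cpu: float, t_p_cpus: float) -> float:
    """Assumed definition: parallel program's time on 1 CPU over its time on p CPUs."""
    return t_one_cpu / t_p_cpus


def efficiency(speedup: float, cpus: int) -> float:
    """Fraction of ideal linear speedup that is actually achieved."""
    return speedup / cpus


def speedup_bound_equal_tasks(tasks: int, cpus: int) -> float:
    """Upper bound on speedup when `tasks` equal-cost, indivisible work
    items are spread over `cpus` CPUs and overhead is ignored.

    The busiest CPU executes ceil(tasks / cpus) items, so the run takes
    that many "rounds" instead of `tasks` rounds on a single CPU.
    """
    rounds = -(-tasks // cpus)  # integer ceiling division
    return tasks / rounds


if __name__ == "__main__":
    # A speedup of 9 on 16 CPUs corresponds to about 56% efficiency.
    print(efficiency(9.0, 16))                # 0.5625
    # Hypothetical: 20 equal tasks on 16 CPUs need 2 rounds, so speedup <= 10.
    print(speedup_bound_equal_tasks(20, 16))  # 10.0
    # Hypothetical: 32 equal tasks divide evenly, so the bound is 16.
    print(speedup_bound_equal_tasks(32, 16))  # 16.0
```

The bound ignores parallelization overhead and non-uniform memory latency, both of which the abstract names as further limits, so real speedups would be expected to fall below it.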
Cite this article
Rasmussen, M.S., Stuart, M.B. & Karlsson, S. Parallelism and Scalability in an Image Processing Application. Int J Parallel Prog 37, 306–323 (2009). https://doi.org/10.1007/s10766-009-0098-5
DOI: https://doi.org/10.1007/s10766-009-0098-5