Abstract
This paper presents experiences and results obtained in optimizing the parallel communication performance of a production-quality medical image reconstruction application. The fundamental communication operations in the application’s principal algorithm are collective reductions. The overhead of these operations was reduced by transforming the algorithm to overlap its computation and communication. Several approaches to communication progress were studied, both user-directed and asynchronous. Experimental results comparing the new approach to the previous implementation show overall application performance improvements of up to 8% when run on 32 nodes.
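To make the idea of overlapping computation with a collective reduction concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes the work can be split into independent blocks whose reduced results are simply accumulated. The names `local_contribution`, `allreduce_sum`, `blocking`, and `overlapped` are hypothetical, and a background thread stands in for a non-blocking collective operation; in a real MPI program the reduction would be issued through the library's non-blocking collective interface and completed later with a wait call.

```python
from concurrent.futures import ThreadPoolExecutor

VECTOR_LEN = 4  # assumed size of the reduced vector (illustrative only)


def local_contribution(rank, block):
    # Hypothetical stand-in for the computation one rank performs on one block.
    return [float(rank + 1) * (block + 1)] * VECTOR_LEN


def allreduce_sum(contribs):
    # Stand-in for a collective sum reduction across all ranks.
    return [sum(values) for values in zip(*contribs)]


def blocking(num_ranks, num_blocks):
    # Baseline: compute a block, then wait for its reduction before continuing.
    total = [0.0] * VECTOR_LEN
    for block in range(num_blocks):
        contribs = [local_contribution(r, block) for r in range(num_ranks)]
        reduced = allreduce_sum(contribs)
        total = [t + x for t, x in zip(total, reduced)]
    return total


def overlapped(num_ranks, num_blocks):
    # Pipelined: the reduction of block b-1 runs in the background
    # while block b is being computed.
    total = [0.0] * VECTOR_LEN
    pending = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for block in range(num_blocks):
            contribs = [local_contribution(r, block) for r in range(num_ranks)]
            if pending is not None:
                # Complete the previous reduction (analogous to a wait call).
                reduced = pending.result()
                total = [t + x for t, x in zip(total, reduced)]
            # Start the reduction for this block without waiting for it.
            pending = pool.submit(allreduce_sum, contribs)
        if pending is not None:
            reduced = pending.result()
            total = [t + x for t, x in zip(total, reduced)]
    return total
```

Both functions produce the same result; only the point at which the reduction is waited on differs. Whether the overlap pays off in practice depends on how the communication makes progress while the computation runs, which is the question the abstract refers to as user-directed versus asynchronous progress.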
Keywords
- Communication Overhead
- Message Passing Interface
- Communication Optimization
- Collective Operation
- Strong Scaling
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hoefler, T., Schellmann, M., Gorlatch, S., Lumsdaine, A. (2008). Communication Optimization for Medical Image Reconstruction Algorithms. In: Lastovetsky, A., Kechadi, T., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2008. Lecture Notes in Computer Science, vol 5205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87475-1_15
DOI: https://doi.org/10.1007/978-3-540-87475-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87474-4
Online ISBN: 978-3-540-87475-1
eBook Packages: Computer Science (R0)