Abstract
The Single Source Shortest Path (SSSP) computation over large graphs has raised significant challenges to the memory capacity and processing efficiency. Utilizing disk-based parallel iterative computing is an economic solution. However, costs of disk I/O and communication affect the performance heavily. This paper proposes a state-transition model for SSSP and then designs two optimization strategies based on it. First, we introduce a tunable hash index to reduce the scale of wasteful data loaded from the disk. Second, we propose a new iterative mechanism and design an Across-step Message Pruning (ASMP) policy to deal with the communication bottleneck. The experimental results illustrate that our SSSP computation is 2 times faster than a basic Giraph (a memory-resident parallel framework) implementation. Compared with Hadoop and Hama (disk-resident parallel frameworks), the speedup is 21 to 43.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Gao, J., Jin, R.M., Zhou, J.S., et al.: Relational Approach for Shortest Path Discovery over Large Graphs. PVLDB 5(4), 358–369 (2012)
Malewicz, G., Austern, M.H., Bik, A.J.C., et al.: Pregel: A System for Large-Scale Graph Processing. In: Proc. of SIGMOD, pp. 135–146 (2010)
Apache Incubator Giraph, http://incubator.apache.org/giraph/
Ewen, S., Tzoumas, K., Kaufmann, M., et al.: Spinning Fast Iterative Data Flows. PVLDB 5(11), 1268–1279 (2012)
Apache Hama, http://hama.apache.org/
Meyer, U., Sanders, P.: Δ-Stepping: A Parallel Single Source Shortest Path Algorithm. In: Bilardi, G., Pietracaprina, A., Italiano, G.F., Pucci, G. (eds.) ESA 1998. LNCS, vol. 1461, pp. 393–404. Springer, Heidelberg (1998)
Thorup, M.: Undirected Single-Source Shortest Paths with Positive Integer Wights in Linear Time. JACM 46(3), 362–394 (1999)
Meyer, U., Osipov, V.: Design and Implementation of a Practical I/O-efficient Shortest Paths Algorithm. In: Proc. of ALENEX, pp. 85–96 (2009)
Cheng, J., Ke, Y., Chu, S., et al.: Efficient Processing of Distance Queries in Large Graphs: A Vertex Cover Approach. In: Proc. of SIGMOD, pp. 457–468 (2012)
Valiant, L.G.: A Bridging Model for Parallel Computation. Communications of the ACM 33(8), 103–111 (1990)
Apache Hadoop, http://hadoop.apache.org/
Fagin, R., Nievergelt, J., Pippenger, N.: Extendible Hashing - A Fast Access Method for Dynamic Files. TODS 4(3), 315–344 (1979)
Litwin, W.: Linear Hashing: A New Tool for File and Table Addressing. In: Proc. of VLDB, pp. 212–223 (1980)
Xiao, Y.H., Wu, W.T., Pei, J.: Efficiently Indexing Shortest Paths by Exploiting Symmetry in Graphs. In: Proc. of EDBT, pp. 493–504 (2009)
Wei, F.: TEDI: Efficient Shortest Path Query Answering on Graphs. In: Proc. of SIGMOD, pp. 99–110 (2010)
Trinity, http://research.microsoft.com/en-us/projects/trinity/
Bu, Y., Howe, B., Balazinska, M., et al.: HaLoop: Efficient Iterative Data Processing on Large Clusters. PVLDB 3(1-2), 285–296 (2010)
Twister: Iterative MapReduce, http://www.iterativemapreduce.org/
SNAP: Network dataset, http://snap.stanford.edu/data/soc-LiveJournal1.html
9th DIMACS, http://www.dis.uniroma1.it/challenge9/download.shtml
Using the Wikipedia link dataset, http://haselgrove.id.au/wikipedia.htm
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, Z., Gu, Y., Zimmermann, R., Yu, G. (2013). Shortest Path Computation over Disk-Resident Large Graphs Based on Extended Bulk Synchronous Parallel Methods. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37450-0_1
Download citation
DOI: https://doi.org/10.1007/978-3-642-37450-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37449-4
Online ISBN: 978-3-642-37450-0
eBook Packages: Computer ScienceComputer Science (R0)