Abstract
With the popularity of uncertain data, queries over uncertain graphs have become a hot topic in the database community. As one of the important queries, the shortest path query over an uncertain graph has attracted much attention of researchers due to its wide applications. Although there are some efficient solutions addressing this problem, all existing models ignore an important property existing in uncertain graphs: the correlation among the edges sharing the same vertex. In this paper, we apply Markov network to model the hidden correlation in uncertain graphs and compute the shortest path. Unfortunately, calculating the shortest path and corresponding probability over uncertain graphs modeled by Markov networks is a #P-hard problem. Thus, we propose a filtering-and-verification framework to accelerate the queries. In the filtering phase, we design a probabilistic shortest path index based on vertex cuts and blocks of a graph. We find a series of upper bounds and prune the vertices and edges whose upper bounds of the shortest path probability are lower than the threshold. By carefully picking up the blocks and vertex cuts, the index is optimized to have the maximum pruning capability, so that we can filter a large number of vertices which make no contribution to the final shortest path query results. In the verification phase, we develop an efficient sampling algorithm to determine the final query answers. Finally, we verify the efficiency and effectiveness of our solutions with extensive experiments.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Yuan Y, Chen L, Wang G. Efficiently answering probability threshold-based shortest path queries over uncertain graphs. In Proc. the 15th DASFAA, Apr. 2010, pp.155-170.
Asthana S, King O D, Gibbons F D, Roth F P. Predicting protein complex membership using probabilistic network reliability. Genome Research, 2004, 14(6): 1170-1175.
Nierman A, Jagadish H. ProTDB: Probabilistic data in XML. In Proc. the 28th VLDB, Aug. 2002, pp.646-657.
Adar E, Ré C. Managing uncertainty in social networks. IEEE Data Eng. Bull, 2007, 30(2): 15-22.
Lian X, Chen L. Efficient query answering in probabilistic RDF graphs. In Proc. ACM SIGMOD International Conference on Management of Data, May 2011, pp.157-168.
Zou L, Peng P, Zhao D. Top-K possible shortest path query over a large uncertain graph. In Proc. the 12th Int. Conf. Web Information System Engineering, Oct. 2011, pp.72-86.
Tao Y, Sheng C, Pei J. On k-skip shortest paths. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2011, pp.421-432.
Yao B, Tang M, Li F. Multi-approximate-keyword routing in GIS data. In Proc. the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Nov. 2011, pp.201-210.
Fan W, Wang X, Wu Y. Performance guarantees for distributed reachability queries. Proceedings of the VLDB Endowment, 2012, 5(11): 1304-1315.
Gao J, Jin R, Zhou J, Yu J X, Jiang X, Wang T. Relational approach for shortest path discovery over large graphs. Proceedings of the VLDB Endowment, 2011, 5(4): 358-369.
Yan X, Yu P S, Han J. Substructure similarity search in graph databases. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2005, pp.766-777.
Atkinson K E. An Introduction to Numerical Analysis. John Wiley & Sons, 2008.
Deshpande A, Getoor L, Sen P. Graphical models for uncertain data. Managing and Mining Uncertain Data, Aggarwal C C (ed.), Springer, 1980: 77-112.
Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques (1 edition). The MIT Press, 2009.
Cheng Y, Yuan Y, Wang G, Qiao B, Wang Z. Efficient sampling methods for shortest path query over uncertain graphs. In Proc. the 19th DASFAA, Apr. 2014, pp.124-140.
Jin R, Ruan N, Xiang Y, Lee V. A highway-centric labeling approach for answering distance queries on large sparse graphs. In Proc. ACM SIGMOD International Conference on Management of Data, May 2012, pp.445-456.
Zou Z, Gao H, Li J. Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In Proc. the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul. 2010, pp.633-642.
Chandran L S, Ram L S. On the number of minimum cuts in a graph. SIAM Journal on Discrete Mathematics, 2004, 18(1): 177-194.
Shoikhet K, Geiger D. A practical algorithm for finding optimal triangulations. In Proc. the 14th National Conference on Artificial Intelligence and the 9th Innovative Applications of Artificial Intelligence Conference (AAAI), Jul. 1997, pp.185-190.
Becker A, Geiger D. A sufficiently fast algorithm for finding close to optimal junction trees. In Proc. the 12th International Conference on Uncertainty in Artificial Intelligence, Aug. 1996, pp.81-89.
Garg N, Vazirani V V, Yannakakis M. Multiway cuts in directed and node weighted graphs. In Proc. the 21st Int. Colloquium on Automata, Languages and Programming, Jul. 1994, pp.487-498.
Chua H N, Sung W K, Wong L. Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics, 2006, 22(13): 1623-1630.
Hua M, Pei J. Probabilistic path queries in road networks: Traffic uncertainty aware path selection. In Proc. the 13th International Conference on Extending Database Technology (EDBT), Mar. 2010, pp.347-358.
Bast H, Funke S, Matijevic D. TRANSIT—Ultrafast shortestpath queries with linear-time preprocessing. In Proc. the 9th DIMACS Implementation Challenge: Shortest Paths, Nov. 2006.
Delling D, Sanders P, Schultes D, Wagner D. Engineering route planning algorithms. In Algorithmics of Large and Complex Networks, Lerner J, Wagner D, Iweig K A (eds.), Springer, 2009, pp.117-139.
Klein P N, Mozes S, Weimann O. Shortest paths in directed planar graphs with negative lengths: A linear-space O(n log2 n)-time algorithm. ACM Transactions on Algorithms (TALG), 2010, 6(2): Article No. 30.
Cohen E, Halperin E, Kaplan H, Zwick U. Reachability and distance queries via 2-hop labels. SIAM Journal on Comp., 2003, 32(5): 1338-1355.
Wei F. TEDI: Efficient shortest path query answering on graphs. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp.99-110.
Potamias M, Bonchi F, Castillo C, Gionis A. Fast shortest path distance estimation in large networks. In Proc. the 18th ACM Conference on Information and Knowledge Management, Nov. 2009, pp.867-876.
Gubichev A, Bedathur S, Seufert S,Weikum G. Fast and accurate estimation of shortest paths in large graphs. In Proc. the 19th ACM Conference on Information and Knowledge Management, Nov. 2010, pp.499-508.
Sankaranarayanan J, Samet H, Alborzi H. Path oracles for spatial networks. Proceedings of the VLDB Endowment, 2009, 2(1): 1210-1221.
Samet H, Sankaranarayanan J, Alborzi H. Scalable network distance browsing in spatial databases. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2008, pp.43-54.
Sankaranarayanan J, Alborzi H, Samet H. Efficient query processing on spatial networks. In Proc. the 13th International Workshop on Geographic Information Systems (GIS), Nov. 2005, pp.200-209.
Rice M, Tsotras V J. Graph indexing of road networks for shortest path queries with label restrictions. Proceedings of the VLDB Endowment, 2010, 4(2): 69-80.
Jing N, Huang Y W, Rundensteiner E A. Hierarchical encoded path views for path query processing: An optimal model and its performance evaluation. IEEE Transactions on Knowledge and Data Engineering, 1998, 10(3): 409-432.
Wu L, Xiao X, Deng D, Cong G, Zhu A D, Zhou S. Shortest path and distance queries on road networks: An experimental evaluation. Proceedings of the VLDB Endowment, 2012, 5(5): 406-417.
Cheng J, Ke Y, Chu S, Cheng C. Efficient processing of distance queries in large graphs: A vertex cover approach. In Proc. ACM SIGMOD International Conference on Management of Data, May 2012, pp.457-468.
Tong Y, Chen L, Cheng Y, Yu P S. Mining frequent itemsets over uncertain databases. Proceedings of the VLDB Endowment, 2012, 5(11): 1650-1661.
Tong Y, Chen L, Ding B. Discovering threshold-based frequent closed itemsets over probabilistic data. In Proc. the 28th International Conference on Data Engineering (ICDE), Apr. 2012, pp.270-281.
Yuan Y, Wang G, Chen L, Wang H. Efficient subgraph similarity search on large probabilistic graph databases. Proceedings of the VLDB Endowment, 2012, 5(9): 800-811.
Zou Z, Li J, Gao H, Zhang S. Mining frequent subgraph patterns from uncertain graph data. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(9): 1203-1218.
Tong Y, Zhang X, Chen L. Tracking frequent items over distributed probabilistic data. World Wide Web, 2015.
Kanagal B, Deshpande A. Indexing correlated probabilistic databases. In Proc. ACM SIGMOD International Conference on Management of Data, Jun. 2009, pp.455-468.
Deshpande A, Guestrin C, Hong W H, Madden S. Exploiting correlated attributes in acquisitional query processing. In Proc. the 21st International Conference on Data Engineering (ICDE), Apr. 2005, pp.143-154.
Deshpande A, Garofalakis M, Rastogi R. Independence is good: Dependency-based histogram synopses for highdimensional data. ACM SIGMOD Record, 2001, 30(2): 199-210.
Sen P, Deshpande A, Getoor L. PrDB: Managing and exploiting rich correlations in probabilistic databases. The VLDB Journal, 2009, 18(5): 1065-1090.
Kanagal B, Deshpande A. Lineage processing over correlated probabilistic databases. In Proc. ACM SIGMOD Int. Conf. Management of Data, Jun. 2010, pp.675-686.
Tzoumas K, Deshpande A, Jensen C S. Lightweight graphical models for selectivity estimation without independence assumptions. Proceedings of the VLDB Endowment, 2011, 4(11): 852-863.
Loui R P. Optimal paths in graphs with stochastic or multidimensional weights. Communications of the ACM, 1983, 26(9): 670-676.
Jin R, Liu L, Ding B, Wang H. Distance-constraint reachability computation in uncertain graphs. Proceedings of the VLDB Endowment, 2011, 4(9): 551-562.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Cheng, YR., Yuan, Y., Chen, L. et al. Threshold-Based Shortest Path Query over Large Correlated Uncertain Graphs. J. Comput. Sci. Technol. 30, 762–780 (2015). https://doi.org/10.1007/s11390-015-1559-5
Received:
Revised:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11390-015-1559-5