Abstract
A growing number of applications store and analyze graph-structured data. These applications impose challenging infrastructure demands due to a need for scalable, high-throughput, and low-latency graph processing. Existing state-of-the-art storage systems and data processing systems are limited in at least one of these dimensions, and simply layering these technologies is inadequate.
We present Concerto, a graph store based on distributed, in-memory data structures. In addition to enabling efficient graph traversals by co-locating graph nodes and associated edges where possible, Concerto provides transactional updates while scaling to hundreds of nodes. Concerto introduces graph views to denote sub-graphs on which user-defined functions can be invoked. Using graph views, programmers can perform event-driven analysis and dynamically optimize application performance. Our results show that Concerto is significantly faster than in-memory MySQL, in-memory Neo4j, and GemFire for graph insertions as well as graph queries. We demonstrate the utility of Concerto’s features in the design of two real-world applications: real-time incident impact analysis on a road network and targeted advertising in a social network.
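The abstract does not specify Concerto's API, so the following is only a minimal single-process sketch of the graph-view idea as described above: a view names a sub-graph, and a user-defined function is invoked when an update falls inside it. The names `Graph`, `GraphView`, `add_edge`, and `on_update` are hypothetical and chosen for illustration; distribution, transactions, and data placement are deliberately left out.

```python
from typing import Callable, Dict, Iterable, List, Set


class Graph:
    """Toy in-memory graph: each vertex keeps its outgoing edges next to it."""

    def __init__(self) -> None:
        self.adj: Dict[str, Set[str]] = {}
        self.views: List["GraphView"] = []

    def add_edge(self, src: str, dst: str) -> None:
        # Store the edge with its source vertex and make sure dst exists.
        self.adj.setdefault(src, set()).add(dst)
        self.adj.setdefault(dst, set())
        # Event-driven step: let every registered view inspect the update.
        for view in self.views:
            view.notify(src, dst)


class GraphView:
    """Hypothetical view: a vertex subset plus a user-defined callback."""

    def __init__(
        self,
        graph: Graph,
        members: Iterable[str],
        on_update: Callable[[str, str], None],
    ) -> None:
        self.members = set(members)
        self.on_update = on_update
        graph.views.append(self)

    def notify(self, src: str, dst: str) -> None:
        # Assumption: the callback fires only when both endpoints are in the view.
        if src in self.members and dst in self.members:
            self.on_update(src, dst)


events: List[tuple] = []
g = Graph()
GraphView(g, {"a", "b"}, lambda s, d: events.append((s, d)))
g.add_edge("a", "b")  # both endpoints are in the view, so the callback runs
g.add_edge("a", "c")  # "c" is outside the view, so the callback is skipped
print(events)
```

In this sketch the view filters updates by vertex membership; a real system could just as well define views by predicate or by traversal, which the abstract leaves open.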
© 2013 IFIP International Federation for Information Processing
Cite this paper
Lee, M.M., Roy, I., AuYoung, A., Talwar, V., Jayaram, K.R., Zhou, Y. (2013). Views and Transactional Storage for Large Graphs. In: Eyers, D., Schwan, K. (eds) Middleware 2013. Lecture Notes in Computer Science, vol 8275. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45065-5_15
DOI: https://doi.org/10.1007/978-3-642-45065-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45064-8
Online ISBN: 978-3-642-45065-5
eBook Packages: Computer Science, Computer Science (R0)