Abstract
Crowdsourcing has become a popular means of completing large numbers of tasks quickly. CollabMap is an online mapping application in which we crowdsource the identification of evacuation routes in residential areas, to be used for planning large-scale evacuations. So far, approximately 38,000 micro-tasks have been completed by over 100 contributors. To assist with data verification, we introduced provenance tracking into the application, and approximately 5,000 provenance graphs have been generated. These graphs have given us various insights into the typical characteristics of provenance graphs in the crowdsourcing context. In particular, we have estimated probability distribution functions over three selected characteristics of these provenance graphs: the node degree, the graph diameter, and the densification exponent. We describe methods to define these three characteristics across specific combinations of node types and edge types, and present our findings in this paper. Applications of our methods include rapid comparison of one provenance graph with another, or of one style of provenance database with another. Our results also indicate that provenance graphs are a suitable area of application for existing network analysis tools concerned with modelling, prediction, and the inference of missing nodes and edges.
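As an illustration of the three characteristics named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a provenance graph is given as a list of (source, target) edges, ignores node and edge types, treats edges as undirected, and estimates the densification exponent as the least-squares slope of log(edges) against log(nodes) over a collection of graphs. The function names and the node labels in the toy graph are hypothetical.

```python
import math
from collections import Counter, deque


def degree_counts(edges):
    """Return {degree: number of nodes with that degree}, ignoring edge direction."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return Counter(degree.values())


def diameter(edges):
    """Longest shortest path, in hops, within any connected component (undirected view)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    longest = 0
    for start in adjacency:
        # Breadth-first search gives hop distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        longest = max(longest, max(dist.values()))
    return longest


def densification_exponent(sizes):
    """Least-squares slope of log(edges) vs log(nodes) over (nodes, edges) pairs."""
    xs = [math.log(n) for n, _ in sizes]
    ys = [math.log(e) for _, e in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Hypothetical toy graph; the labels are illustrative only.
toy_edges = [
    ("route1", "activity1"),
    ("activity1", "building1"),
    ("activity1", "user1"),
    ("vote1", "route1"),
    ("vote1", "user2"),
]
toy_degrees = degree_counts(toy_edges)
toy_diameter = diameter(toy_edges)
# Synthetic (nodes, edges) pairs chosen so that edges = nodes ** 1.5.
toy_exponent = densification_exponent([(4, 8), (16, 64), (64, 512)])
```

Restricting `toy_edges` to a chosen combination of node types and edge types before calling these functions would be one way to approximate the type-specific definitions the abstract mentions, although the exact definitions used in the paper may differ.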
Keywords
- Degree Distribution
- Node Degree
- Community Detection
- Process Network Analysis
- Nonnegative Matrix Factorization
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ebden, M., Huynh, T.D., Moreau, L., Ramchurn, S., Roberts, S. (2012). Network Analysis on Provenance Graphs from a Crowdsourcing Application. In: Groth, P., Frew, J. (eds) Provenance and Annotation of Data and Processes. IPAW 2012. Lecture Notes in Computer Science, vol 7525. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34222-6_13
DOI: https://doi.org/10.1007/978-3-642-34222-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34221-9
Online ISBN: 978-3-642-34222-6
eBook Packages: Computer Science, Computer Science (R0)