Abstract
SPADE is an open source software infrastructure for data provenance collection and management. The underlying data model used throughout the system is graph-based, consisting of vertices and directed edges that are modeled after the node and relationship types described in the Open Provenance Model. The system has been designed to decouple the collection, storage, and querying of provenance metadata. At its core is a novel provenance kernel that mediates between the producers and consumers of provenance information, and handles the persistent storage of records. It operates as a service, peering with remote instances to enable distributed provenance queries. The provenance kernel on each host handles the buffering, filtering, and multiplexing of incoming metadata from multiple sources, including the operating system, applications, and manual curation. Provenance elements can be located locally with queries that use wildcard, fuzzy, proximity, range, and Boolean operators. Ancestor and descendant queries are transparently propagated across hosts until a terminating expression is satisfied, while distributed path queries are accelerated with provenance sketches.
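The graph model and the ancestor query described above can be illustrated with a small sketch. This is not SPADE's implementation or API. The class `ProvenanceGraph`, its methods, and the sample vertex names are hypothetical, and the traversal stays on a single in-memory graph, whereas the system described here propagates such queries across hosts. The vertex and edge type names follow the Open Provenance Model, and edges are assumed to point from an effect to its cause.

```python
from collections import deque

# Node and relationship types taken from the Open Provenance Model.
OPM_VERTEX_TYPES = {"Artifact", "Process", "Agent"}
OPM_EDGE_TYPES = {
    "used", "wasGeneratedBy", "wasTriggeredBy",
    "wasDerivedFrom", "wasControlledBy",
}


class ProvenanceGraph:
    """Hypothetical in-memory provenance graph (illustration only)."""

    def __init__(self):
        self.vertices = {}  # vertex id -> (type, annotations)
        self.edges = {}     # source id -> list of (edge type, destination id)

    def add_vertex(self, vid, vtype, **annotations):
        if vtype not in OPM_VERTEX_TYPES:
            raise ValueError(f"unknown vertex type: {vtype}")
        self.vertices[vid] = (vtype, annotations)

    def add_edge(self, src, etype, dst):
        # Edges are directed from effect (src) to cause (dst).
        if etype not in OPM_EDGE_TYPES:
            raise ValueError(f"unknown edge type: {etype}")
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.setdefault(src, []).append((etype, dst))

    def ancestors(self, start, stop=lambda vid: False):
        """Breadth-first ancestor query.

        A vertex for which `stop` returns True is reported but not
        expanded, which stands in for a terminating expression.
        """
        visited = {start}
        found = []
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for _etype, parent in self.edges.get(current, []):
                if parent in visited:
                    continue
                visited.add(parent)
                found.append(parent)
                if not stop(parent):
                    queue.append(parent)
        return found


def build_example():
    """Build a tiny graph: a process reads one file and writes another."""
    g = ProvenanceGraph()
    g.add_vertex("input.txt", "Artifact")
    g.add_vertex("output.txt", "Artifact")
    g.add_vertex("sort", "Process", pid="1234")
    g.add_vertex("alice", "Agent")
    g.add_edge("sort", "used", "input.txt")
    g.add_edge("output.txt", "wasGeneratedBy", "sort")
    g.add_edge("sort", "wasControlledBy", "alice")
    return g
```

With this sample graph, `build_example().ancestors("output.txt")` walks from the output file to the process that generated it and then to that process's input and controlling agent. Passing `stop=lambda v: v == "sort"` ends the traversal at the process.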
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Gehani, A., Tariq, D. (2012). SPADE: Support for Provenance Auditing in Distributed Environments. In: Narasimhan, P., Triantafillou, P. (eds) Middleware 2012. Lecture Notes in Computer Science, vol 7662. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35170-9_6
DOI: https://doi.org/10.1007/978-3-642-35170-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35169-3
Online ISBN: 978-3-642-35170-9
eBook Packages: Computer Science