Abstract
With the proliferation of database views and curated databases, the issue of data provenance - where a piece of data came from and the process by which it arrived in the database - is becoming increasingly important, especially in scientific databases where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query. We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between “why” provenance (refers to the source data that had some influence on the existence of the data) and “where” provenance (refers to the location(s) in the source databases from which the data was extracted).
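To make the why/where distinction concrete, the following is a minimal sketch on a toy relational join. It is an illustration only, not the paper's formal definitions or its general data model: the tables `EMP` and `DEPT`, the `Loc` type, and the function `join_with_provenance` are all hypothetical names introduced here. "Why" is approximated as the set of source rows that together witness an output row, and "where" as the source cell each output value was copied from.

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """A source location: table name, row id, column name (hypothetical type)."""
    table: str
    row: int
    column: str


# Hypothetical source tables, keyed by row id.
EMP = {
    1: {"name": "Ann", "dept": "Bio"},
    2: {"name": "Bob", "dept": "Chem"},
}
DEPT = {
    10: {"dept": "Bio", "city": "Paris"},
}


def join_with_provenance(emp, dept):
    """Evaluate SELECT e.name, d.city FROM emp e, dept d WHERE e.dept = d.dept,
    returning each output row with toy why- and where-annotations."""
    results = []
    for e_id, e in sorted(emp.items()):
        for d_id, d in sorted(dept.items()):
            if e["dept"] == d["dept"]:
                out = {"name": e["name"], "city": d["city"]}
                # Why: every source row whose presence the output row depends on.
                why = {("emp", e_id), ("dept", d_id)}
                # Where: the single source cell each output value was copied from.
                where = {
                    "name": Loc("emp", e_id, "name"),
                    "city": Loc("dept", d_id, "city"),
                }
                results.append((out, why, where))
    return results


if __name__ == "__main__":
    for out, why, where in join_with_provenance(EMP, DEPT):
        print(out, sorted(why), where)
```

In this sketch the `dept` row belongs to the why-annotation of the whole output row, because the join would not produce the row without it, yet the where-annotation of the `name` value points only into `emp`, since that is where the value was copied from.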
Supported in part by an Alfred P. Sloan Research Fellowship.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buneman, P., Khanna, S., Wang-Chiew, T. (2001). Why and Where: A Characterization of Data Provenance. In: Van den Bussche, J., Vianu, V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44503-X_20
DOI: https://doi.org/10.1007/3-540-44503-X_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41456-8
Online ISBN: 978-3-540-44503-6
eBook Packages: Springer Book Archive