Abstract
We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it, and annotations are propagated from the source to the output as the data is transformed by a query. Such an annotation management system could be used to understand the provenance (also known as lineage) of data, who has seen or edited a piece of data, or the quality of data. These capabilities are useful for applications that integrate scientific and biological data.
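As a toy illustration of the propagation idea, the Python sketch below attaches a set of annotations to every value and evaluates a simple select-project query so that each output value carries the annotations of the source cell it was copied from. The names `Cell`, `genes` and `select_project`, and the sample data, are hypothetical and chosen only for this example; this is not the system's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A value together with the (possibly empty) set of annotations on it."""
    value: object
    notes: frozenset = frozenset()


# Hypothetical source relation: each tuple maps attribute names to cells.
genes = [
    {"id": Cell("g1", frozenset({"entered by Alice"})),
     "name": Cell("BRCA1", frozenset({"low confidence"}))},
    {"id": Cell("g2"),
     "name": Cell("TP53", frozenset({"verified"}))},
]


def select_project(relation, predicate, attrs):
    """Evaluate SELECT attrs FROM relation WHERE predicate, keeping annotations.

    The predicate sees plain values only. Each output value is copied from a
    source cell, so the cell's annotations travel with it to the output.
    """
    out = []
    for row in relation:
        if predicate({a: c.value for a, c in row.items()}):
            out.append({a: row[a] for a in attrs})
    return out


# SELECT name FROM genes WHERE name <> 'BRCA1'
result = select_project(genes, lambda t: t["name"] != "BRCA1", ["name"])
print(result)
```

The TP53 value reaches the output together with its "verified" annotation, while the filtered-out BRCA1 tuple contributes neither its value nor its annotations.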
We present pSQL, an extension of a fragment of SQL that supports three annotation propagation schemes, each suited to a different purpose. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from in all equivalent formulations of a given query. The custom scheme lets a user specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that correctly retrieve the relevant annotations. For the default-all scheme, we also show how to generate finitely many queries that simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. We have implemented the algorithms, and a set of experiments demonstrates the feasibility of the system.
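The sketch below, written in Python with the standard `sqlite3` module, illustrates how one user query can be answered under the three schemes by running one or more ordinary SQL queries over stored annotations. The table layout (a side table `ann` keyed by relation, tuple id and column), the sample data, and the names `FROM_R_A`, `SCHEMES` and `run_scheme` are all hypothetical. The default-all case covers only a single equi-join, so this is neither the paper's storage scheme nor its general translation algorithm.

```python
import sqlite3

# Hypothetical storage layout (an assumption, not the paper's scheme):
# data tables R(tid, A, B) and S(tid, A, C), plus a side table that
# records each annotation against (relation, tuple id, column).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE R (tid INTEGER PRIMARY KEY, A TEXT, B TEXT);
CREATE TABLE S (tid INTEGER PRIMARY KEY, A TEXT, C TEXT);
CREATE TABLE ann (rel TEXT, tid INTEGER, col TEXT, note TEXT);
INSERT INTO R VALUES (1, 'p53', 'human');
INSERT INTO S VALUES (1, 'p53', 'tumor suppressor');
INSERT INTO ann VALUES ('R', 1, 'A', 'curated by lab X');
INSERT INTO ann VALUES ('R', 1, 'B', 'species verified');
INSERT INTO ann VALUES ('S', 1, 'A', 'imported from source Y');
""")

# User query: SELECT R.A FROM R, S WHERE R.A = S.A
# Each string retrieves the annotations stored on one source column for every
# output value (values with no annotation on that column are omitted).
FROM_R_A = """SELECT R.A, ann.note FROM R JOIN S ON R.A = S.A
              JOIN ann ON ann.rel = 'R' AND ann.tid = R.tid AND ann.col = 'A'"""
FROM_S_A = """SELECT S.A, ann.note FROM R JOIN S ON R.A = S.A
              JOIN ann ON ann.rel = 'S' AND ann.tid = S.tid AND ann.col = 'A'"""
FROM_R_B = """SELECT R.A, ann.note FROM R JOIN S ON R.A = S.A
              JOIN ann ON ann.rel = 'R' AND ann.tid = R.tid AND ann.col = 'B'"""

SCHEMES = {
    # default: the output value is copied from R.A, so only R.A's notes flow.
    "default": [FROM_R_A],
    # default-all (this single equi-join only): the equivalent query
    # SELECT S.A FROM R, S WHERE R.A = S.A copies the value from S.A,
    # so S.A's notes flow as well.
    "default-all": [FROM_R_A, FROM_S_A],
    # custom: a user-chosen mapping, here R.B's notes attached to the output.
    "custom": [FROM_R_B],
}


def run_scheme(queries):
    """Execute each generated SQL query and merge notes per output value."""
    result = {}
    for sql in queries:
        for value, note in conn.execute(sql):
            result.setdefault(value, set()).add(note)
    return result


for name, queries in SCHEMES.items():
    print(name, run_scheme(queries))
```

For this data, the default scheme returns only the annotation on R.A. Default-all additionally returns the annotation on S.A, because the equivalent query SELECT S.A FROM R, S WHERE R.A = S.A copies the same value from S.A. The custom mapping returns only the annotation on R.B.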
Bhagwat, D., Chiticariu, L., Tan, WC. et al. An annotation management system for relational databases. The VLDB Journal 14, 373–396 (2005). https://doi.org/10.1007/s00778-005-0156-6