Abstract
Large-scale e-Science experiments, with their multi-petabyte data stores, present unprecedented data handling requirements. Complex software applications, such as those of the ATLAS High Energy Physics experiment at CERN, run in a distributed environment across Grid computing sites around the world, with scientists concurrently analysing data and producing new data products that are shared among the collaboration. In this paper, we introduce a multi-phase infrastructure to achieve data provenance for an e-Science experiment. The infrastructure integrates provenance into an existing legacy application with a strong emphasis on scalability. We also explore the relationship between provenance and metadata, introducing a model in which data provenance is made available as metadata through a separate reasoning phase.
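The separation between recording provenance and reasoning over it can be illustrated with a minimal sketch. It is not the infrastructure described in the paper: the class `ProvenanceStore`, the function `derive_metadata`, the process names, and the dataset names are all hypothetical, chosen only to show a cheap append-only recording phase followed by a separate phase that turns the raw records into per-dataset metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One raw provenance record: `process` read `inputs` and wrote `output`."""
    process: str
    inputs: tuple
    output: str


class ProvenanceStore:
    """Recording phase (hypothetical): append-only, no analysis while jobs run."""

    def __init__(self):
        self._log = []

    def record(self, process, inputs, output):
        self._log.append(Assertion(process, tuple(inputs), output))

    def assertions(self):
        return list(self._log)


def derive_metadata(assertions):
    """Reasoning phase (hypothetical): derive per-dataset metadata afterwards.

    For each produced dataset, report the producing process and every
    ancestor dataset reachable through the recorded input edges.
    """
    produced_by = {a.output: a for a in assertions}

    def ancestors(dataset, seen):
        record = produced_by.get(dataset)
        if record is None:  # dataset was not produced by a recorded process
            return seen
        for parent in record.inputs:
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    return {
        out: {
            "produced_by": a.process,
            "derived_from": sorted(ancestors(out, set())),
        }
        for out, a in produced_by.items()
    }


# Illustrative dataset and process names, not taken from the paper.
store = ProvenanceStore()
store.record("reconstruction", ["raw.001"], "esd.001")
store.record("analysis", ["esd.001"], "aod.001")
metadata = derive_metadata(store.assertions())
```

In this sketch the recording step does nothing beyond appending a record, so it adds little overhead to the running application, while the more expensive lineage traversal is deferred to the reasoning step, whose output can be published like any other dataset metadata.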
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Branco, M., Moreau, L. (2006). Enabling Provenance on Large Scale e-Science Applications. In: Moreau, L., Foster, I. (eds) Provenance and Annotation of Data. IPAW 2006. Lecture Notes in Computer Science, vol 4145. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11890850_7
DOI: https://doi.org/10.1007/11890850_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46302-3
Online ISBN: 978-3-540-46303-0
eBook Packages: Computer Science (R0)