Abstract
Capturing provenance information in scientific workflows is useful not only for determining data dependencies, but also for a wide range of queries, including those that support fault tolerance and usage statistics. As collaborative scientific workflow environments provide users with reusable shared workflows, collecting and using provenance data in a generic way that can serve multiple data and computational models becomes vital. This paper presents a method for capturing data-value and control dependencies for provenance collection in the Kepler scientific workflow system. It also describes how the information collected from these dependencies can be used in a fault tolerance framework under different models of computation.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Crawl, D., Altintas, I. (2008). A Provenance-Based Fault Tolerance Mechanism for Scientific Workflows. In: Freire, J., Koop, D., Moreau, L. (eds) Provenance and Annotation of Data and Processes. IPAW 2008. Lecture Notes in Computer Science, vol 5272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89965-5_17
DOI: https://doi.org/10.1007/978-3-540-89965-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89964-8
Online ISBN: 978-3-540-89965-5
eBook Packages: Computer Science (R0)