Abstract
This paper addresses the efficient handling of failures in ETL processes. An ETL process fills a data warehouse with data. Because large amounts of data must be processed, the whole process takes a long time, and after a failure there may not be enough time to restart it from the beginning. In such a situation a resumption algorithm should be applied. We present a new approach to the checkpoint-based resumption method that combines checkpointing with the Design-Resume algorithm. This combination is expected to work more efficiently than pure checkpointing. Moreover, not all modules of the ETL application must implement checkpointing. We present the basic idea of the algorithm, its requirements, and the necessary definitions. The proposed solution is then compared with other resumption methods, and the results obtained are discussed.
Copyright information
© 2006 International Federation for Information Processing
About this paper
Cite this paper
Gorawski, M., Marks, P. (2006). Checkpoint-based resumption in data warehouses. In: Sacha, K. (eds) Software Engineering Techniques: Design for Quality. IFIP International Federation for Information Processing, vol 227. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39388-9_30
DOI: https://doi.org/10.1007/978-0-387-39388-9_30
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-39387-2
Online ISBN: 978-0-387-39388-9
eBook Packages: Computer Science, Computer Science (R0)