Abstract
Data warehouses (DWs) are complex computer systems whose main goal is to facilitate the decision-making process of knowledge workers. ETL (Extraction-Transformation-Loading) processes are responsible for extracting data from heterogeneous operational data sources, transforming them (conversion, cleaning, normalization, etc.) and loading them into DWs. ETL processes are a key component of DWs because incorrect or misleading data lead to wrong business decisions; a correct design of these processes at the early stages of a DW project is therefore essential to improve data quality. However, little research has dealt with the modeling of ETL processes. In this paper, we present our approach, based on the Unified Modeling Language (UML), to the conceptual modeling of these ETL processes. We provide the necessary mechanisms for an easy and quick specification of the common operations defined in these ETL processes, such as the integration of different data sources, the transformation between source and target attributes, and the generation of surrogate keys. Further advantages of our proposal are the use of the UML (standardization, ease of use and functionality) and the seamless integration of the design of the ETL processes with the DW conceptual schema.
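As a rough illustration of the three common operations named in the abstract (integrating data sources, transforming source attributes into target attributes, and generating surrogate keys), the following minimal Python sketch may help. It is not the UML notation proposed in the paper; the sample rows, attribute names and function names are all hypothetical and chosen only for this example.

```python
from itertools import count

# Hypothetical rows from two operational sources with different layouts.
source_a = [{"cust_id": "A-17", "name": " alice ", "country": "es"}]
source_b = [{"id": 17, "full_name": "Bob", "nation": "US"}]


def extract():
    """Integrate both sources into one stream with common attribute names."""
    for row in source_a:
        yield {"source": "A", "natural_key": row["cust_id"],
               "name": row["name"], "country": row["country"]}
    for row in source_b:
        yield {"source": "B", "natural_key": str(row["id"]),
               "name": row["full_name"], "country": row["nation"]}


def transform(row):
    """Clean and normalize source attributes into target attributes."""
    return {**row,
            "name": row["name"].strip().title(),
            "country": row["country"].upper()}


def load(rows):
    """Assign one surrogate key per (source, natural key) pair."""
    next_key = count(1)
    key_map = {}
    table = []
    for row in rows:
        business_key = (row["source"], row["natural_key"])
        if business_key not in key_map:
            key_map[business_key] = next(next_key)
            table.append({"customer_sk": key_map[business_key], **row})
    return table


customer_dim = load(transform(r) for r in extract())
```

In this sketch the surrogate key `customer_sk` is a plain counter that is independent of the source identifiers, so the target table does not depend on how each operational source numbers its records.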
This paper has been partially supported by the Spanish Ministry of Science and Technology, project number TIC2001-3530-C02-02.
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Trujillo, J., Luján-Mora, S. (2003). A UML Based Approach for Modeling ETL Processes in Data Warehouses. In: Song, IY., Liddle, S.W., Ling, TW., Scheuermann, P. (eds) Conceptual Modeling - ER 2003. ER 2003. Lecture Notes in Computer Science, vol 2813. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39648-2_25
DOI: https://doi.org/10.1007/978-3-540-39648-2_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20299-8
Online ISBN: 978-3-540-39648-2
eBook Packages: Springer Book Archive