A Document-Based Data Warehousing Approach for Large Scale Data Mining

Chai, Hualei; Wu, Gang; Zhao, Yuan

doi:10.1007/978-3-642-37015-1_7

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 7719)

Included in the following conference series:

Joint International Conference on Pervasive Computing and the Networked World

Abstract

Data mining techniques are widely applied, and data warehousing plays an important role in this process. Scalability and efficiency have always been key issues in data warehousing. With the explosive growth of data, data warehousing now faces tough challenges on both fronts, and traditional methods are hitting a bottleneck. In this paper, we present a document-based data warehousing approach. In our approach, the ETL process is carried out with the MapReduce framework, and the data warehouse is built on a distributed, document-oriented database. A case study demonstrates the details of the entire process. Compared with RDBMS-based data warehousing, our approach shows better scalability, flexibility, and efficiency.
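
The abstract names two building blocks: a MapReduce-style ETL step and a document-oriented store. The following is a minimal, single-process sketch of how those two ideas can fit together. It is not the authors' implementation. The input rows, the field names (`item`, `clicked`, `events`, `click_count`) and all function names are hypothetical, and the map, shuffle and reduce phases run in memory rather than on Hadoop or a real document database.

```python
import json
from collections import defaultdict

# Hypothetical raw log rows "user_id,item_id,clicked"; not taken from the paper.
RAW_ROWS = [
    "u1,ad7,1",
    "u1,ad9,0",
    "u2,ad7,1",
    "bad row",
]


def map_phase(line):
    """Extract and transform: parse one raw line into (key, value) pairs."""
    parts = line.split(",")
    if len(parts) != 3:
        return []  # skip malformed input
    user_id, item_id, clicked = parts
    return [(user_id, {"item": item_id, "clicked": clicked == "1"})]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Load: fold all events for one key into a single nested document."""
    return {
        "_id": key,
        "events": values,
        "click_count": sum(1 for v in values if v["clicked"]),
    }


def run_etl(lines):
    """Run map, shuffle and reduce in sequence and return the documents."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    return [reduce_phase(key, groups[key]) for key in sorted(groups)]


if __name__ == "__main__":
    # Each printed line is a JSON document that a document store could ingest.
    for doc in run_etl(RAW_ROWS):
        print(json.dumps(doc))
```

The point of the sketch is the shape of the output: each reduce call emits one self-contained, nested document per key, so no join is needed at query time. That is one common argument for pairing MapReduce with a document store, stated here as an illustration rather than as a result of the paper.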

Author information

Authors and Affiliations

School of Software, Shanghai Jiao Tong University, Shanghai, China
Hualei Chai, Gang Wu & Yuan Zhao

Editor information

Editors and Affiliations

Wuhan University of Technology, Heping Road 1178, Wuchang District, 430081, Wuhan, Hubei, China
Qiaohong Zu
Hayes Park Central, Fujitsu Laboratories of Europe Ltd., Hayes End Road, UB4 8FE, Hayes, Middlesex, UK
Bo Hu
Department of Electrical and Electronics Engineering, Aksaray University, Merkez Kampüsü, 68100, Aksaray, Turkey
Atilla Elçi

About this paper

Cite this paper

Chai, H., Wu, G., Zhao, Y. (2013). A Document-Based Data Warehousing Approach for Large Scale Data Mining. In: Zu, Q., Hu, B., Elçi, A. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2012. Lecture Notes in Computer Science, vol 7719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37015-1_7

DOI: https://doi.org/10.1007/978-3-642-37015-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37014-4
Online ISBN: 978-3-642-37015-1
eBook Packages: Computer Science, Computer Science (R0)
