Abstract
In recent years, research on big data, data storage, and related innovations in analytics has become very popular. This paper proposes a big web data application and archive for distributed data processing with Apache Hadoop, together with a framework of selected methods that can be used with this platform. It sets out a workflow for building a web content mining application and a big data archive using modern technologies such as Python, PHP, JavaScript, MySQL, and cloud services. It also gives an overview of the architecture, methods, and data structures used in web mining, distributed processing, and big data analytics.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lnenicka, M., Hovad, J., Komarkova, J. (2015). A Proposal of a Big Web Data Application and Archive for the Distributed Data Processing with Apache Hadoop. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_28
DOI: https://doi.org/10.1007/978-3-319-24306-1_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24305-4
Online ISBN: 978-3-319-24306-1
eBook Packages: Computer Science (R0)