Abstract
Web mining applies data mining techniques to web data in order to extract useful information. As the amount of information on the Internet grows, search engines become less effective at returning relevant, required information. This paper proposes an approach to web content mining based on a genetic algorithm. Genetic algorithms are used for a wide range of optimization problems, and evolutionary computing methods help in developing web mining tools that extract relevant and required information. The proposed approach considers several parameters when selecting good-quality web pages, including the length of time the website has existed, backward links, and forward links. Experimental results show that the proposed approach selects good-quality web pages when compared with existing algorithms proposed in the literature.
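As a rough illustration of the idea, the sketch below uses a genetic algorithm to pick k pages from a candidate set. It is not the paper's method: the abstract does not give the fitness function or the operators, so the feature weights, the penalty term, the tournament selection, the single-point crossover, and the example URLs are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    age_years: float     # time the website has existed
    backward_links: int  # links pointing at the page
    forward_links: int   # links leaving the page


# Hypothetical weights for (age, backward links, forward links).
WEIGHTS = (0.3, 0.5, 0.2)


def quality_scores(pages):
    """Weighted sum of max-normalized features, one score per page."""
    def norm(values):
        top = max(values) or 1  # avoid dividing by zero
        return [v / top for v in values]

    ages = norm([p.age_years for p in pages])
    back = norm([p.backward_links for p in pages])
    fwd = norm([p.forward_links for p in pages])
    w_age, w_back, w_fwd = WEIGHTS
    return [w_age * a + w_back * b + w_fwd * f
            for a, b, f in zip(ages, back, fwd)]


def fitness(chromosome, scores, k, penalty=2.0):
    """Total quality of selected pages, penalized for not selecting exactly k."""
    total = sum(s for bit, s in zip(chromosome, scores) if bit)
    return total - penalty * abs(sum(chromosome) - k)


def evolve(pages, k, pop_size=30, generations=60, mutation_rate=0.05, seed=0):
    """Return the best bit string found; bit i == 1 means page i is selected.

    Assumes at least two pages and pop_size >= 2.
    """
    rng = random.Random(seed)
    scores = quality_scores(pages)
    n = len(pages)

    def key(chromosome):
        return fitness(chromosome, scores, k)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(population, key=key)[:]]  # elitism: keep the best as-is
        while len(children) < pop_size:
            # Tournament selection of size 2 for each parent.
            mother = max(rng.sample(population, 2), key=key)
            father = max(rng.sample(population, 2), key=key)
            cut = rng.randrange(1, n)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    return max(population, key=key)


# Made-up candidate pages, for illustration only.
pages = [
    Page("a.example", 10, 120, 15),
    Page("b.example", 2, 8, 40),
    Page("c.example", 7, 300, 22),
    Page("d.example", 1, 3, 5),
    Page("e.example", 12, 45, 60),
]
best = evolve(pages, k=2)
selected = [p.url for p, bit in zip(pages, best) if bit]
```

Encoding a selection as a bit string keeps crossover and mutation simple. The penalty term steers the search toward selections of exactly k pages instead of enforcing that size as a hard constraint.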
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Johnson, F., Kumar, S. (2013). Web Content Mining Using Genetic Algorithm. In: Unnikrishnan, S., Surve, S., Bhoir, D. (eds) Advances in Computing, Communication, and Control. ICAC3 2013. Communications in Computer and Information Science, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36321-4_8
DOI: https://doi.org/10.1007/978-3-642-36321-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36320-7
Online ISBN: 978-3-642-36321-4
eBook Packages: Computer Science (R0)