Abstract
This article reports the development of a generic search engine using an unsupervised learning approach. This approach has become increasingly important because data has been growing at a tremendous rate, challenging our capacity to write software algorithms and implementations around it. It was adopted as a means of better understanding how the algorithm behaves in an uncontrolled environment. The engine uses a Depth-First Search (DFS) retrieval strategy to retrieve pages for topical searching. An inverted indexing technique is then applied to store the mapping from content to its location in a database. These techniques must be applied carefully to avoid a flood of irrelevant links, which can cause a poorly designed and constructed search engine to crash. The main idea of this research is to learn how to crawl, index, search and rank the output in an uncontrolled environment. This contrasts with supervised learning conditions, which could lead to less information overloading.
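The crawl, index, search and rank pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four-page link graph `WEB` stands in for real HTTP fetching, the names `dfs_crawl`, `build_index` and `search` are hypothetical, the depth and page limits are one simple way to keep a flood of links from overwhelming the crawler, and ranking is a plain word-count score because the abstract does not describe the ranking function actually used.

```python
import re
from collections import defaultdict

# Hypothetical four-page "web": page id -> (page text, outgoing links).
# A real crawler would fetch these over HTTP; this dict stands in for that.
WEB = {
    "p1": ("search engines crawl the web", ["p2", "p3"]),
    "p2": ("an inverted index maps words to pages", ["p4", "p1"]),
    "p3": ("depth first search explores one branch fully", ["p4"]),
    "p4": ("ranking orders pages for a search query", []),
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def dfs_crawl(start, max_depth=2, max_pages=100):
    """Depth-first crawl from `start` using an explicit stack.

    The visited set skips pages already seen (e.g. link cycles), and the
    depth and page limits (hypothetical values) bound how far the crawl
    spreads, one simple guard against a flood of irrelevant links.
    """
    visited, order = set(), []
    stack = [(start, 0)]
    while stack and len(order) < max_pages:
        page, depth = stack.pop()
        if page in visited or page not in WEB:
            continue
        visited.add(page)
        order.append(page)
        if depth < max_depth:
            # Push links in reverse so the first listed link is explored first.
            for link in reversed(WEB[page][1]):
                if link not in visited:
                    stack.append((link, depth + 1))
    return order


def build_index(pages):
    """Inverted index: word -> {page id: occurrences of the word on that page}."""
    index = defaultdict(dict)
    for page in pages:
        for word in tokenize(WEB[page][0]):
            index[word][page] = index[word].get(page, 0) + 1
    return index


def search(index, query):
    """Rank pages by the summed counts of the query words they contain.

    Plain term-frequency scoring is an assumption made for illustration.
    """
    scores = defaultdict(int)
    for word in tokenize(query):
        for page, count in index.get(word, {}).items():
            scores[page] += count
    # Highest score first; ties broken alphabetically by page id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_index(dfs_crawl("p1"))
print(search(index, "search pages"))
```

Crawling from `p1` visits the pages in the order `p1, p2, p4, p3`, following each branch to its end before backtracking, and the query `"search pages"` ranks `p4` first because it is the only page containing both words. Lowering `max_depth` to 1 stops the crawl before it reaches `p4`, which shows how a depth limit trades coverage for protection against runaway link expansion.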
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rahman, M.N.A., Seyal, A.H., Omar, M.S., Maidin, S.A. (2015). A Search Engine Development Utilizing Unsupervised Learning Approach. In: Intan, R., Chi, CH., Palit, H., Santoso, L. (eds) Intelligence in the Era of Big Data. ICSIIT 2015. Communications in Computer and Information Science, vol 516. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46742-8_21
DOI: https://doi.org/10.1007/978-3-662-46742-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46741-1
Online ISBN: 978-3-662-46742-8
eBook Packages: Computer Science, Computer Science (R0)