Abstract
The need to search for specific information or products on the ever-expanding Internet has led to the development of Web search engines and recommender systems. While their benefit is a direct connection between users and the information or products sought within Big Data, any search outcome is influenced by commercial interests as well as by the users' own ambiguity in formulating their requests or queries. This research analyses the relevance of the ranked results provided by different Web search engines, metasearch engines, academic databases and recommender systems. We propose an Intelligent Internet Search Assistant (ISA) that acts as an interface between the user and Big Data search engines. We also present a new metric that combines both relevance and rank, and we use it to validate and compare the performance of our proposed algorithm against other search engines and recommender systems. On average, our ISA outperforms the other search engines.
Keywords
- Intelligent Internet Search Assistant
- Random Neural Network
- Web search
- Search Engines
- Recommender Systems
- Big Data
1 Introduction
The extensive size of Big Data does not allow Internet users to find all relevant information or products without the use of Web search engines or recommender systems. Web users cannot be guaranteed that the results provided by search applications are either exhaustive or relevant to their search needs. Businesses have a commercial interest in ranking higher in results or recommendations to attract more customers, while Web search engines and recommender systems profit from advertisements and product purchases. The main consequence is that irrelevant results or products may be shown in top positions and relevant ones "hidden" at the very bottom of the list. As the Internet and Big Data keep expanding, Web users become increasingly dependent on information-filtering applications.
We describe the application of neural networks in recommender systems in Sect. 2. To address the search issues presented above, Sect. 3 proposes an Intelligent Internet Search Assistant (ISA) that acts as an interface between an individual user's query and the different search engines. We validate our ISA against other Web search engines, metasearch engines, online databases and recommender systems in Sect. 4. Our conclusions are presented in Sect. 5.
2 Related Work
The ability of neural networks to learn iteratively from different inputs and acquire the desired outputs, as a mechanism of adaptation to users' interests that provides relevant answers, has already been applied to the World Wide Web and recommender systems. S. Patil et al. [1] propose a recommender system for Web-based marketing that uses a collaborative filtering mechanism with a k-separability approach. They build a model for each user in several steps: they cluster groups of individuals into different categories according to their similarity using Adaptive Resonance Theory (ART), and then calculate the Singular Value Decomposition matrix. M. Lee et al. [2] propose a recommender system that combines collaborative filtering with a Self-Organizing Map neural network. They segment all users by demographic characteristics, and users in each segment are clustered according to their item preferences using the neural network. C. Vassiliou et al. [3] propose a framework that combines neural networks and collaborative filtering; their approach uses a neural network to recognize implicit patterns between user profiles and items of interest, which are then further refined by collaborative filtering into personalized suggestions. K. Kongsakun et al. [4] develop an intelligent recommender system framework based on an investigation of the possible correlations between students' historical records and final results. C. Chang et al. [5] train artificial neural networks to group users into different types; they use an Adaptive Resonance Theory (ART) neural network in an unsupervised learning model where the input layer is a vector of user features and the output layer represents the different clusters. P. Chou et al. [6] integrate a back-propagation neural network with supervised learning and a feed-forward architecture in an "interior desire system". D. Billsus et al. [7] propose a representation for collaborative filtering tasks that allows the application of any machine learning algorithm, including a feed-forward neural network with k input neurons, two hidden neurons and one output neuron. M. Krstic et al. [8] apply a single-hidden-layer feed-forward neural network as a classifier that estimates whether a certain TV programme is relevant to the user, based on the programme description, contextual data and the feedback provided by the user. C. Biancalana et al. [9] propose a neural network that includes contextual information in film recommendations; its aim is to identify which member of a household gave a specific rating to a film at a specific time. M. Devi et al. [10] use a probabilistic neural network to calculate the rating between users based on the rating matrix; they smooth the sparse rating matrix by predicting the rating values of the unrated items.
3 The Intelligent Internet Search Assistant Model
The search assistant we design is based on the Random Neural Network (RNN) [11–13], a spiking recurrent stochastic model for neural networks. Its main analytical properties are its "product form" solution and the existence of a unique network steady state. It represents more closely how signals are transmitted in many biological neural networks, where they actually travel as spikes or impulses rather than as analogue signal levels. It has been used in different applications including network routing with cognitive packet networks, using reinforcement learning, which requires the search for paths that meet certain pre-specified quality-of-service requirements [14]; the search for exit routes for evacuees in emergency situations [15, 16]; network routing [17]; pattern-based search for specific objects [18]; video compression [19]; image texture learning and generation [20]; and Deep Learning [21].
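For reference, the "product form" and unique steady-state solution mentioned above are the classical RNN results of [11]: in steady state, each neuron \(i\) is excited with probability

```latex
q_i = \frac{\lambda^{+}(i)}{r(i) + \lambda^{-}(i)},
\qquad
\lambda^{+}(i) = \sum_{j} q_j\, w^{+}_{ji} + \Lambda(i),
\qquad
\lambda^{-}(i) = \sum_{j} q_j\, w^{-}_{ji} + \lambda(i),
```

where \(r(i)\) is the firing rate of neuron \(i\), \(w^{+}_{ji}\) and \(w^{-}_{ji}\) are the excitatory and inhibitory weights from neuron \(j\) to neuron \(i\), and \(\Lambda(i)\), \(\lambda(i)\) are the external excitatory and inhibitory arrival rates. The product-form property means the stationary distribution of the network state \(k = (k_1, \ldots, k_n)\) factorizes as \(P(k) = \prod_{i} (1 - q_i)\, q_i^{k_i}\).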
Gelenbe, E. et al. have investigated different search models [22–24]. In our own application of the RNN [25], the ISA acquires a query from the user and retrieves results from one or more search engines, assigning one neuron to each Web result dimension. Result relevance is calculated by applying our cost function, which divides a query into a multidimensional vector and weights its dimension terms with different relevance parameters. Our ISA adapts to and learns the perceived user interest and reorders the retrieved snippets based on our relevant dimension centre point. The ISA learns result relevance in an iterative process in which the user directly evaluates the listed results. We evaluate and compare its performance against other search engines with a newly proposed quality definition that combines both relevance and rank. We have also included two learning algorithms: Gradient Descent learns the centre of the relevant dimensions, and Reinforcement Learning updates the network weights by rewarding relevant dimensions and punishing irrelevant ones.
4 Validation
4.1 Web Search Validation
Based on the result master list, we can affirm that the superior search engine has the higher density of better-scoring results in top positions. To measure Web search quality numerically, and to establish a benchmark against which search performance can be compared, we propose the following algorithm, in which results shown at top positions are rewarded and results shown at lower positions are penalized. We define quality, Q, as:

\( Q = \sum\nolimits_{i = 1}^{Y} {(Y - RML_{i} + 1)(Y - RSE_{i} + 1)} \)
where RML is the rank of the result in the master list, which represents the optimum result relevance order; RSE is the rank of the same result in the particular search engine; and Y is the number of results shown to the user. If a result's rank is larger than Y, we discard it from the calculation, as it is considered irrelevant.
We define the normalized quality, \( \overline{\text{Q}} \), as the quality Q divided by its optimum value, which is attained when the results provided are ranked in the same order as in the master list; this value corresponds to the sum of the squares of the first Y integers:

\( \overline{\text{Q}} = \frac{Q}{{\sum\nolimits_{i = 1}^{Y} {i^{2} } }} \)
where Y is the total number of results shown to the user.
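As a minimal sketch (the function names are ours), assuming each result contributes the product of its reversed ranks — Y points for rank 1 down to 1 point for rank Y — in the master list and in the engine under test, which yields the stated optimum sum of squares when the two orders agree:

```python
def web_search_quality(ranks, Y):
    """Quality Q of a search engine's ordering against the master list.

    ranks maps a result's master-list rank (RML) to its rank in the
    engine under test (RSE); results ranked beyond Y in either list
    are discarded as irrelevant.
    """
    return sum((Y - rml + 1) * (Y - rse + 1)
               for rml, rse in ranks.items()
               if rml <= Y and rse <= Y)

def normalized_quality(q, Y):
    """Normalized quality: Q divided by the optimum, the sum of squares 1..Y."""
    return q / sum(i * i for i in range(1, Y + 1))
```

For Y = 3 and a perfect ordering {1: 1, 2: 2, 3: 3}, Q = 9 + 4 + 1 = 14 and the normalized quality is 1.0.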
The Intelligent Internet Search Assistant we propose emulates how Web search engines work by using a very similar interface to enter queries and display results. We validate our ISA against current metasearch engines: we retrieve the results from the Web search engines they use in order to generate the result master list, and we then compare the results provided by the metasearch engines against this master list. This method has the inconvenience that we do not consider results obtained from Internet Web directories or from online databases, from which metasearch engines may have retrieved some of the displayed results. We have selected Ixquick and Metacrawler as the metasearch engines against which we compare our ISA. After analysing the main characteristics of both metasearch engines, we consider that Metacrawler uses Google, Yahoo and Yandex, and that Ixquick uses Google, Yahoo and Bing, as their main sources of search results. We ran our ISA on 10 different user queries based on the travel industry. The ISA retrieves the first 30 results from each of the main Web search engines programmed as drivers (Google, Yahoo, Bing and Yandex); we score 30 points to the Web site result displayed in the top position, 1 point to the Web site result shown in the last position, and 0 points to each result from a Web site that has already been shown. After scoring the 120 results provided by the 4 different Web search engines, we combine them by adding the scores of results that share the same Web site, and rank them to generate the result master list. We carried out this evaluation exercise for each high-level query. We then retrieve the first 30 results from Metacrawler and Ixquick and benchmark them against the result master list using the proposed quality formula. The average quality values for the 10 different queries are presented in Table 1.
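The master-list construction described above can be sketched as follows; the function name and data layout are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

def build_master_list(engine_results, top_n=30):
    """Combine per-engine result lists into a single ranked master list.

    engine_results: {engine_name: [site_1, site_2, ...]}, where site_1
    is the top hit. The top result scores top_n points, the last scores
    1, and a site repeated within the same engine's list scores 0 after
    its first occurrence. Scores for the same site add across engines.
    """
    scores = defaultdict(int)
    for sites in engine_results.values():
        seen = set()
        for pos, site in enumerate(sites[:top_n]):
            if site in seen:
                continue  # repeated site from the same engine scores 0
            seen.add(site)
            scores[site] += top_n - pos
    # Rank by combined score, highest first
    return sorted(scores, key=scores.get, reverse=True)
```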
4.2 Online Academic Database Validation
To measure search quality, we can affirm that a better Online Academic Database provides a list with more relevant results in top positions. We propose the following quality description: within a list of N results, we score N for the first result and 1 for the last, and the proposed quality is then the summation of the position scores of each of the selected results. Our definition of quality, Q, is:

\( Q = \sum\nolimits_{i = 1}^{Y} {RSE_{i} } \)
where RSEi is the position score of result i in the particular search engine, with a value of N if the result is in the first position and 1 if it is in the last; Y is the total number of results selected by the user. The best Online Academic Database has the largest quality value. We define the normalized quality, \( \overline{\text{Q}} \), as the quality Q divided by its optimum value, which is attained when the user considers all the results provided by the Web search engine relevant; in this situation Y and N have the same value:

\( \overline{\text{Q}} = \frac{Q}{{N(N + 1)/2}} \)
We define I as the quality improvement of our Intelligent Search Assistant over an Online Academic Database:

\( I = \frac{{Q_{W} - Q_{R} }}{{Q_{R} }} \)
where I is the improvement, QW is the quality of the Intelligent Search Assistant and QR is the reference quality. In our validation exercise we use the quality of Google Scholar, IEEE Xplore, CiteSeerX or Microsoft Academic as QR in the first iteration; in further iterations, we use the quality value of the previous iteration as the reference.
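The academic-database quality, its normalization, and the improvement I can be sketched as follows; reading I as the relative gain (QW − QR)/QR is our assumption, and the function names are illustrative:

```python
def db_quality(selected_ranks, N):
    """Quality Q: a selected result at rank r in a list of N results
    scores N - r + 1 points (N for the first position, 1 for the last)."""
    return sum(N - r + 1 for r in selected_ranks)

def db_normalized_quality(selected_ranks, N):
    """Q divided by its optimum 1 + 2 + ... + N = N(N + 1)/2,
    attained when the user selects all N results as relevant."""
    return db_quality(selected_ranks, N) / (N * (N + 1) // 2)

def improvement(q_isa, q_ref):
    # Assumed relative improvement of the ISA over the reference quality
    return (q_isa - q_ref) / q_ref
```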
Our Intelligent Internet Search Assistant can select between the main Online Academic Databases (Google Scholar, IEEE Xplore, CiteSeerX or Microsoft Academic) and the type of learning to be implemented. The ISA collects the first 50 results from the selected search engine, reorders them according to its cost function and finally shows the first 20 results to the user. The ISA reorders results while learning in a two-step iterative process, showing only the best 20 results to the user. We searched for 6 different queries, using the four different Online Academic Databases for each query, 24 searches in total. We selected Gradient Descent and Reinforcement Learning for 3 queries (12 searches) each. Table 2 shows the average quality value of the database search engine and the ISA; the first I represents the improvement of the ISA over the Online Academic Databases, the second I is between ISA iterations 2 and 1, and the third I is between ISA iterations 3 and 2.
4.3 Recommender System Validation
We have implemented our Intelligent Search Assistant to reorder the results from three independent recommender systems: the GroupLens film database, TripAdvisor and Amazon. Our ISA reorders the films or products based on the updated result relevance, calculated by combining only the values of the relevant selected dimensions; the higher the value, the more relevant the film or product should be. The ISA shows the user the first 20 results, including their ranking. The user then selects the films or products with the higher rankings; these rankings are calculated in advance by collecting the user reviews of each product and taking the average value. We have included Gradient Descent and Reinforcement Learning for different queries in our validation. Experimental results are shown in Table 3.
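The review-averaging ranking used above to identify the higher-ranked films or products can be sketched as follows (illustrative names, not the datasets' actual schema):

```python
def rank_by_average_review(reviews, top_k=20):
    """Rank items by the average of their user review ratings.

    reviews: {item_id: [rating, rating, ...]}; items without reviews
    are ignored. Returns up to top_k item ids, best average first.
    """
    averages = {item: sum(rs) / len(rs) for item, rs in reviews.items() if rs}
    return sorted(averages, key=averages.get, reverse=True)[:top_k]
```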
5 Conclusions
We have proposed a novel approach to Web search and recommender systems within Big Data: the user iteratively trains the neural network while looking for relevant results. We have also defined a different process: the application of the Random Neural Network as a biologically inspired algorithm that measures both user relevance and result ranking based on a predetermined cost function. Our Intelligent Search Assistant generally performs slightly better than Google and other Web search engines; however, this evaluation may be biased because users tend to concentrate on the first results provided, which were the ones our algorithm showed. Our ISA adapts to and learns from the user's previous relevance measurements, increasing its quality and improvement significantly within the first iteration. The Reinforcement Learning algorithm performs better than Gradient Descent: although Gradient Descent provides better quality in the first iteration, Reinforcement Learning outperforms it in the second one due to its higher learning rate, and both show only residual learning in the third iteration. Gradient Descent would be the preferred learning algorithm if only one iteration is required; Reinforcement Learning would be the better option in the case of two iterations. Three iterations are not recommended because the learning is only residual.
References
Patil, S., Mane, Y., Dabre, K., Dewan, P., Kalbande, D.: An efficient recommender system using collaborative filtering methods with k-separability approach. Int. J. Eng. Res. Appl., 30–35 (2012)
Lee, M., Choi, P., Woo, Y.T.: A hybrid recommender system combining collaborative filtering with neural network. In: de Bra, P., Brusilovsky, P., Conejo, R. (eds.) AH 2002. LNCS, vol. 2347, pp. 531–534. Springer, Heidelberg (2002)
Vassiliou, C., Stamoulis, D., Martakos, D., Athanassopoulos, S.: A recommender system framework combining neural networks & collaborative filtering. In: International Conference on Instrumentation, Measurement, Circuits and Systems, pp. 285–290 (2006)
Kongsakun, K., Kajornrit, J., Fung, C.: Neural network modelling for an intelligent recommendation system supporting SRM for universities in Thailand. In: International Conference on Computing and Information Technology, vol. 2, pp. 34–44 (2013)
Chang, C., Chen, P., Chiu, F., Chen, Y.: Application of neural networks and Kanos’s method to content recommendation in Web personalization. Expert Syst. Appl. 36, 5310–5316 (2009)
Chou, P., Li, P., Chen, K., Wu, M.: Integrating Web mining and neural network for personalized e-commerce automatic service. Expert Syst. Appl. 37, 2898–2910 (2010)
Billsus, D., Pazzani, M.: Learning collaborative information filters. In: International Conference of Machine Learning, pp. 46–54 (1998)
Krstic, M., Bjelica, M.: Context aware personalized program guide based on neural network. IEEE Trans. Consum. Electron. 58, 1301–1306 (2012)
Biancalana, C., Gasparetti, F., Micarelli, A., Miola, A., Sansonetti, G.: Context-aware movie recommendation based on signal processing and machine learning. In: The Challenge on Context-Aware Movie Recommendation, pp. 5–10 (2011)
Devi, M., Samy, R., Kumar, S., Venkatesh, P.: Probabilistic neural network approach to alleviate sparsity and cold start problems in collaborative recommender systems. Comput. Intell. Comput. Res., 1–4 (2010)
Gelenbe, E.: Random neural network with negative and positive signals and product form solution. Neural Comput. 1, 502–510 (1989)
Gelenbe, E.: Learning in the recurrent Random Neural Network. Neural Comput. 5, 154–164 (1993)
Gelenbe, E., Timotheou, S.: Random neural networks with synchronized interactions. Neural Comput. 20(9), 2308–2324 (2008)
Gelenbe, E., Lent, R., Xu, Z.: Towards networks with cognitive packets. In: Goto, K., Hasegawa, T., Takagi, H., Takahashi, Y. (eds.) Performance and QoS of Next Generation Networking, pp. 3–17. Springer, London (2011)
Gelenbe, E., Wu, F.J.: Large scale simulation for human evacuation and rescue. Comput. Math. Appl. 64(12), 3869–3880 (2012)
Filippoupolitis, A., Hey, L., Loukas, G., Gelenbe, E., Timotheou, S.: Emergency response simulation using wireless sensor networks. In: Proceedings of the 1st International Conference on Ambient Media and Systems, p. 21 (2008)
Gelenbe, E.: Steps towards self-aware networks. Commun. ACM 52(7), 66–75 (2009)
Gelenbe, E., Koçak, T.: Area-based results for mine detection. IEEE Trans. Geosci. Remote Sens. 38(1), 12–24 (2000)
Cramer, C., Gelenbe, E., Bakircloglu, H.: Low bit-rate video compression with neural networks and temporal subsampling. Proc. IEEE 84(10), 1529–1543 (1996)
Atalay, V., Gelenbe, E., Yalabik, N.: The random neural network model for texture generation. Int. J. Pattern Recognit. Artif. Intell. 6(1), 131–141 (1992)
Gelenbe, E., Yin, Y.: Deep learning with random neural networks, In: International Joint Conference on Neural Networks (IJCNN 2016) World Congress on Computational Intelligence. IEEE Xplore, Vancouver (2016). Paper Number 16502
Gelenbe, E.: Search in unknown random environments. Phys. Rev. E 82(6), 061112 (2007)
Gelenbe, E., Abdelrahman, O.H.: Search in the universe of big networks and data. IEEE Netw. 28(4), 20–25 (2014)
Abdelrahman, O.H., Gelenbe, E.: Time and energy in team-based search. Phys. Rev. E 87, 032125 (2013)
Gelenbe, E., Serrano, W.: An intelligent internet search assistant based on the random neural network. In: Iliadis, L., Maglogiannis, I. (eds.) AIAI 2016, IFIP AICT, vol. 475, pp. 141–153. Springer, Switzerland (2016)
Acknowledgment
This research has used the GroupLens dataset from the Department of Computer Science and Engineering at the University of Minnesota; the TripAdvisor dataset from the University of California, Irvine, Machine Learning Repository, Center for Machine Learning and Intelligent Systems; and the Amazon dataset from Julian McAuley, Computer Science Department, University of California, San Diego.
© 2017 Springer International Publishing AG
Serrano, W. (2017). A Big Data Intelligent Search Assistant Based on the Random Neural Network. In: Angelov, P., Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds) Advances in Big Data. INNS 2016. Advances in Intelligent Systems and Computing, vol 529. Springer, Cham. https://doi.org/10.1007/978-3-319-47898-2_26
Print ISBN: 978-3-319-47897-5
Online ISBN: 978-3-319-47898-2