
1 Introduction

Detecting the sense of a word is a critical and complicated task. The family of techniques that addresses it is known as Word Sense Disambiguation (WSD). It is carried out within natural language processing (NLP) with reduced energy consumption. The phases of NLP include preprocessing, feature extraction, segmentation and classification. Preprocessing means removing any abnormality present in the data. Feature extraction is the next phase, which is used to fetch the critical and necessary features out of the available information. Segmentation is the phase used to divide the information into critical and non-critical words. Classification is the last phase and is used to assign the information to the correct class; all of these phases are critical in general data mining. NLP uses this pipeline to extract the meaningful information out of the user query (an illustrative sketch of these phases is given below).
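The following minimal Python sketch walks through the four phases in order; the function bodies, the word lists and the classification rule are placeholders chosen for illustration, not the paper's implementation.

```python
import re

def preprocess(text: str) -> str:
    """Preprocessing: remove abnormal characters and normalise case."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())

def extract_features(text: str) -> list[str]:
    """Feature extraction: fetch candidate tokens from the cleaned text."""
    return text.split()

def segment(tokens: list[str], critical: set[str]) -> list[str]:
    """Segmentation: keep critical words, discard non-critical ones."""
    return [t for t in tokens if t in critical]

def classify(tokens: list[str]) -> str:
    """Classification: assign the query to a class (placeholder rule)."""
    return "banking" if "bank" in tokens else "general"

cleaned = preprocess("What is the interest of the SBI bank?")
tokens = segment(extract_features(cleaned), {"interest", "sbi", "bank"})
print(classify(tokens))  # banking
```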

The problems associated with word sensing all fall under the NP-hard category; the problem is complex because the same word can carry different meanings depending on the user query. Consider the two sentences below.

E.g.

  1. "I am sitting near the bank."

  2. "What is the interest rate of the SBI bank?"

The word "bank" obviously has different meanings in the two contexts above [13]. In the first context it refers to the bank of a river, and in the second it refers to a financial institution. The machine cannot determine the intended meaning of the word on its own; the system must be trained to extract the sense of such words (a minimal sense-lookup sketch is given after the list below). There are four regular ways to deal with Word Sense Disambiguation:

  • Unsupervised methods: Unsupervised models [1] concentrate on learning patterns in the data with no external feedback. Clustering is a classic example of an unsupervised learning model.

  • Semi-supervised methods: Semi-supervised learning [2] uses a set of curated, labelled data and tries to infer new labels/attributes for new data sets. Semi-supervised learning models are a solid middle ground between supervised and unsupervised models.

  • Supervised methods: Supervised learning [3] models use external feedback to learn functions that map inputs to output observations. In these models the external environment acts as a "teacher" of the AI algorithm. Supervised word-sensing methods learn from labelled training sets. Some of the common techniques are decision lists, decision trees, naïve Bayes, neural networks and support vector machines (SVM).

  • Knowledge-based methods: Knowledge-based methods rely on dictionaries, thesauri and lexical resources as their knowledge bases. A related family, reinforcement learning, uses dynamics such as rewards and punishments to "reinforce" different types of knowledge; this type of learning technique is becoming popular in modern AI solutions.
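As an illustration of sense lookup for the "bank" example above, the sketch below uses NLTK's Lesk implementation (a knowledge-based method backed by WordNet); NLTK and its WordNet corpus are assumed to be installed, and the returned synsets depend on the dictionary glosses rather than on the paper's trained system.

```python
# Requires: pip install nltk, then nltk.download('wordnet') and nltk.download('omw-1.4')
from nltk.wsd import lesk

context_1 = "I am sitting near the bank of the river".split()
context_2 = "What is the interest rate of the SBI bank".split()

# lesk() returns the WordNet synset whose gloss overlaps most with the context words,
# so the two queries may resolve to different senses of "bank".
print(lesk(context_1, "bank"))
print(lesk(context_2, "bank"))
```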

In unsupervised learning, maintaining a dictionary is not possible since customization is not possible. The semi-supervised mechanism can be time consuming and expensive, and it can only be partially customized; historical data plays a large part in it, whereas the proposed work is concerned with present and future searches, so it cannot be used with the proposed system. Supervised learning, however, is customizable and can be used with the proposed system.

The learning mechanism greatly influences the pattern by which normal and abnormal phrases are discovered. For this purpose, a supervised learning mechanism is proposed in this research. Word correction and searching make use of an application program interface (API) from the online source JOC. The rest of the article is organised as follows: Section 2 gives the literature review, Section 3 illustrates the research gaps, Section 4 presents the proposed system, Section 5 gives the results, and the final section gives the conclusion and future scope.

2 Literature Survey

The literature survey was conducted to look for the optimal technique for browsing websites with the minimum amount of time consumed.

[4] proposed a model that unfolds social content along the lines of semantics and time. A clustering mechanism [5] is imposed to reduce the overall search time required. [6, 7] undertook the challenging task of surveying the mechanisms used for sentiment analysis; the sentiment analysis techniques suggested in the literature are used to accurately predict the desire of the user by looking at the search query. [8] highlights many searching techniques with various searching algorithms such as a fast string search algorithm with a vector approach and bilinear search. [9] proposed a method that beats different baselines and previously proposed web-based semantic closeness measures on three benchmark datasets, demonstrating a high correlation with human ratings; the proposed strategy fundamentally enhances precision in a community mining task. [10, 11] proposed a lexical pattern extraction algorithm to extract the various semantic relations that exist between two words; experiments on datasets of ambiguous queries demonstrate that the approach enhances query output clustering in terms of both clustering quality and degree of expansion. [12] proposed a kernel-based KNN clustering algorithm (KKNNC) which enhanced the accuracy of the KNN clustering algorithm; KKNNC was evaluated on six UCI data sets and compared with the KNNC algorithm, and the experimental results demonstrate that KKNNC significantly outperforms KNNC in precision. [13] proposed a model to identify some usability-related issues in the Semantic Web; the usability of some keyword- and form-based tools and their limitations are discussed, and the results and findings of a usability study of the tool are presented. [3] examines different systems for user-driven association of query and reasoning, drawing on human problem solving in cognitive science to interpret a user query in light of connected user interests; a multi-level technique leading from human problem solving to large-scale search was proposed in [3]. [14] proposed exponential-law-based interest-preserving modelling, network statistics-based data gathering, and ontology-guided hierarchical reasoning to implement user query parsing and searching criteria; the work discussed methods used for query intent recognition, exploiting user behaviour to understand their interests and inclinations in e-commerce, and the technique was intended to use the content of search engine result pages (SERPs), along with the information obtained from query strings, to examine characteristics of query intent, with a specific focus on sponsored search.

In the studied literature, execution time is high due to the lack of clustering and due to redundant information search and retrieval. The proposed system uses a tokenization and keyword searching mechanism to effectively find the resources required for the user query.

3 Research Gap

The existing literature provides content-based searching but does not eliminate redundant keywords. Dissimilar keyword searching and elimination is also missing, causing higher execution time and less efficient URL retrieval. Such a system requires a large amount of information in order to make correct decisions, and the information provided to the recommender system must be consistent in nature; some sort of information system is therefore required. The recommender system takes the information and formulates the decision in one of two ways: either by collaborative filtering or by content filtering. Collaborative filtering is the mechanism of filtering information among multiple agents, viewpoints, data sources, etc. Content filtering, on the other hand, is the mechanism of using a program to filter the information that is going to be used within the system (a minimal sketch is given below). People nowadays are more and more concerned with the environment, and for this purpose a concise information retrieval system is required.
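The sketch below is a minimal content-filtering example, assuming websites are described by keyword sets; the catalogue, scoring rule and names are illustrative assumptions, not part of the proposed system.

```python
def content_filter(query_keywords: set[str], items: dict[str, set[str]]) -> list[str]:
    """Rank items by how many query keywords their descriptions share."""
    scores = {name: len(query_keywords & keywords) for name, keywords in items.items()}
    return sorted((n for n, s in scores.items() if s > 0),
                  key=lambda n: scores[n], reverse=True)

# Hypothetical catalogue of websites and their descriptive keywords.
catalogue = {
    "sbi.co.in": {"bank", "interest", "loan"},
    "rivers.org": {"bank", "river", "water"},
}
print(content_filter({"bank", "interest"}, catalogue))  # ['sbi.co.in', 'rivers.org']
```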

For this purpose, an efficient parsing and correction system, along with clustering for reducing execution time, is designed. The proposed model is described in the next section.

4 Proposed Model

The proposed model is a combination of multiple phases, and parsing is one of the critical phases.

  • Parsing

Extracting the meaningful information out of a particular string is the main objective of parsing. To do this, the space character acts as the separator. Example: "My name is Sunita Mahajan". Suppose we have a dict.mdb database.

Since the specified words match the dictionary, tokenization and parsing succeed, yielding "my", "name", "is", "Sunita", "Mahajan". After parsing is performed successfully, the meaningful keywords are extracted from the given string (a minimal sketch is given below).
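A minimal sketch of this parsing step, in which the dict.mdb contents are replaced by an in-memory word set purely for illustration:

```python
# Stand-in for the dict.mdb database; the words here are assumptions for the example.
DICTIONARY = {"my", "name", "is", "sunita", "mahajan"}

def parse(query: str) -> list[str]:
    """Split on whitespace (the separator) and keep tokens found in the dictionary."""
    tokens = query.lower().split()
    return [t for t in tokens if t in DICTIONARY]

print(parse("My name is Sunita Mahajan"))
# ['my', 'name', 'is', 'sunita', 'mahajan']  -> all tokens matched, parsing succeeds
```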

  • Finding meaningful keywords

Another dictionary containing the co-related words is maintained in order to determine the meaning of the sentence. A match is counted as a hit, and no match is counted as a miss. The main task of our approach is to increase the hits; missed words are replaced with corrected words. The following equation is used to calculate the hit ratio.

$$ TS\_hit\,ratio = \frac{Hits_{i}}{TS_{i}} $$
(1)

Equation 1: Total hit ratio

This equation gives the ratio of the total number of keywords fetched by the proposed system to the total keywords present within the dictionary. The result is presented as a percentage.
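A small sketch of Eq. (1) is given below; the hit count and dictionary size are illustrative values, and the function name is an assumption, not taken from the implementation.

```python
def hit_ratio(hits: int, total_keywords: int) -> float:
    """Eq. (1): hits divided by the total dictionary keywords, as a percentage."""
    return 100.0 * hits / total_keywords

# Example: with a 100-word dictionary and 80 matched keywords, the hit ratio is 80%.
print(hit_ratio(80, 100))  # 80.0
```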

In the proposed word sensing model, the hit ratio is computed considering a total word count of 100 in the dictionary.

The hit ratios of the existing and proposed models indicate that the result of the proposed model is better, since words that do not exist in the dictionary are added to it with the user's permission. This procedure yields a higher hit ratio as compared to the existing model (Fig. 1).

Fig. 1. Comparison of hit ratio of existing system and proposed system

4.1 Proposed Algorithm

The algorithm describing the creation of the recommender system for the promotion of selected websites is presented through the following steps.

Fig. a. Proposed algorithm steps

In the first step, the proposed algorithm receives the parameters of the user query to be tested (Pi). In the second step, preprocessing divides the string into tokens; this process is also known as parsing, with the space character acting as the separator. After parsing, the extracted tokens are matched against the dict.mdb database to find the meaningful keywords, and the actual keywords are fetched from the user query. In the next step, the ambiguous words are found and the actual sense of each keyword is determined (a sketch of these steps is given below).
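The following sketch outlines these steps under the assumption that dict.mdb is replaced by an in-memory word set and that sense resolution is delegated to NLTK's Lesk algorithm; it is an illustration of the flow, not the paper's implementation.

```python
from nltk.wsd import lesk  # requires nltk with the WordNet corpus downloaded

# Stand-in for dict.mdb; the words are assumptions for the example.
DICTIONARY = {"what", "is", "the", "interest", "rate", "of", "sbi", "bank"}

def recommend(query: str):
    tokens = query.lower().split()                     # step 2: parsing on spaces
    keywords = [t for t in tokens if t in DICTIONARY]  # step 3: match against dictionary
    senses = {w: lesk(tokens, w) for w in keywords}    # steps 4-5: resolve word senses
    return keywords, senses

keywords, senses = recommend("What is the interest rate of the SBI bank")
print(keywords)
print(senses.get("bank"))
```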

The success of the system is determined using the hit ratio.

$$ hit\,ratio = \frac{Hits_{i}}{TS_{i}} $$
(2)

The higher the hit ratio, the more successful the given system will be.

The existing approach does not consider the identification of similar words, and its token matching process is slow. The proposed work considers the variation in keyword fetching and matching, and the complexity of the search is greatly reduced by the use of the proposed system.

5 Performance and Result Evaluation

In this section, performance is evaluated on a number of websites and compared in terms of execution time, which is the total time consumed to retrieve the relevant websites.

  • Recall

$$ Recall = \frac{\text{Number of RW}}{RW + NRW} \times 100 $$
(3)
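A small sketch of Eq. (3) is given below; the relevant (RW) and non-relevant (NRW) counts are illustrative placeholders rather than measured values.

```python
def recall(relevant_retrieved: int, non_relevant_retrieved: int) -> float:
    """Eq. (3): RW / (RW + NRW), expressed as a percentage."""
    return 100.0 * relevant_retrieved / (relevant_retrieved + non_relevant_retrieved)

print(recall(8, 2))  # 80.0
```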

The overall performance is described in Tables 1 and 2, which highlight a list of four keywords with their recall measure describing the relevant and non-relevant results.

Table 1. Dictionary containing the words along with their meanings
Table 2. Comparison in terms of ratio
  • Single-keyword based query

(See Figs. 2, 3 and Table 3).

Fig. 2. Plot of frequency vs keywords

Fig. 3. Confusion matrix for single keyword

Table 3. Study of quantitative analysis
  • Multi-keyword based query

(See Figs. 4, 5, Tables 4, 5 and 6).

Fig. 4. Plot of frequency vs keywords

Fig. 5. Confusion matrix for multi-keywords

Table 4. Confusion matrix for single-keyword
Table 5. Study of quantitative analysis
Table 6. Confusion matrix for multi-keywords

6 Conclusion

The results from the proposed system indicate an improvement in terms of the confusion matrix. The keyword matching and parsing process gives unique labels along with high precision. The keyword matching frequency yields the order in which the obtained websites are displayed in the browser. The pre-processing phase also filters the information to be displayed to the user, keeping the user's interests in mind. The time consumed in fetching a website greatly depends upon the server caches and processor speed. The proposed system was tested on a single CPU but may perform better on a GPU. In future work, clustering along with sense annotation and location sensitivity will be integrated with the proposed system.