6.1 Introduction

Web data keeps expanding and is available in various forms because of the rapid growth of online advertising, publishing, e-commerce and entertainment. Although Web search technology gives users efficient and effective access to information, it is still difficult to infer what users need from their search queries alone. Query suggestion is therefore an essential feature of commercial Web search engines. Users can apply the suggested queries directly in subsequent searches.

Query suggestion is an efficient way to enhance keyword-based search and is widely used in Web search systems. Users often need to modify their queries because queries are frequently informational. Users may seek specific information on a distinct subject and hence may try out various query terms. Users may also lack sufficient knowledge of a topic and therefore not know the terms needed to retrieve the required information.

According to Kato et al. [1], query recommendations are frequently used when (1) the initial query is an exceptional query, (2) a single-term query is used as the input query, (3) explicit queries are suggested, (4) suggestions are provided based on modification of the input query and (5) various URLs have been clicked by the user on the resulting search page.

Query suggestions that are provided to the user efficiently can reduce the complexity of the search and help users locate the required information more precisely. The method is widely adopted in product, music and video search, medical information retrieval and patent search. Commercial search engines implement query suggestion techniques, such as Searches related in Google, Search Assist in Yahoo! and Related Searches in Bing Search.

Search engines have succeeded in retrieving Web information for users, but keyword-based search does not help users organize and formulate their input queries. Silverstein et al. [2] found, from an AltaVista search engine query log, that the average length of a user query is 2.35 terms. This shows that most user queries are short. A short query cannot fully describe the user’s information need and is sometimes ambiguous. Because of insufficient domain knowledge, users find it difficult to organize and define appropriate input queries. The user then has to rephrase the query frequently, which affects search performance.

The authors of [3,4,5,6,7,8] have focused on query suggestion by considering users’ previous queries and click behaviour. There are two major issues with query-URL recommendations: (i) the number of commonly clicked URLs across different queries is limited and (ii) although users may click the same URLs for two different queries, the queries may be unrelated because the Web documents may cover different content [9]. Useful suggestions require solving these problems and discovering users’ information needs so that queries can be organized with a precise meaning. The users’ search log reveals information needs through click behaviour. If the user clicks a certain retrieved result, we cannot conclude that the clicked result is completely relevant to the query, since the user has not yet seen the full document. However, the brief description of the document, i.e. the snippet, is shown to the user and is read before the user decides to click. The snippet can therefore be considered to reflect the user’s information need.

Lu et al. [10] have designed a method to determine the goals of user search for a query. These search goals are obtained by clustering the proposed feedback sessions. A feedback session in a user search log consists of the clicked and un-clicked documents up to and including the last clicked document. Feedback sessions are mapped to pseudo documents to reflect the information needs of the users effectively. Pseudo documents are clustered using the k-means algorithm to derive the search goals of users.

In this chapter, the Related Search Recommendation (RSR) framework is proposed to recommend related queries for a user input query. This framework uses user feedback from the click through log of a search engine. The click through log is converted into feedback sessions, each consisting of clicked and un-clicked URLs and ending with the last clicked URL in a session. Each clicked and un-clicked URL of a feedback session is converted into an enriched document by calculating the term frequency–inverse document frequency of each term present in the title and snippet of that URL. A pseudo document is generated by merging all the enriched documents of a feedback session. Finally, the optimized pseudo document, which reflects the user’s information need, is generated by combining all the pseudo documents for a given input query. Recommendations are generated and ranked by combining the query and terms for all the methods.

6.2 Related Works

Users mostly access Web pages by querying search engines, which affects the performance of search engines. In this chapter, we recommend related search queries using user feedback sessions. Within a session, the snippets of clicked and un-clicked documents are used to formulate related search queries. To obtain the desired results, we need to calculate the similarity between the words that occur in snippets. This section reviews work on measuring the similarity between words and on techniques for query recommendation using snippets.

6.2.1 Measuring Similarity Between Two Words

Miao et al. [11] have developed a query expansion method based on Rocchio’s model. In this model, proximity information is captured by the proposed Proximity-based Term Frequency (ptf) in the pseudo-relevant documents. Expansion terms and their proximity relation with query terms are modelled by ptf. Window-based, kernel-based and Hyperspace Analogue to Language (HAL) methods are proposed as proximity measures for evaluating the relationship between query terms and expansion terms. This model achieves better performance than the position relevance model and the classic Rocchio’s model.

Hamai et al. [12] have discussed a transformation function to measure semantic similarity between two given words. This approach uses page counts of document titles to measure similarity. It outperforms similarity measures defined over snippets.

Bollegala et al. [13] have presented an approach to calculate semantic similarity between words. Text snippets are used to obtain Lexico-syntactic patterns from a Web search engine. Support vector machine is used to integrate page count based similarity score and lexico-syntactic patterns to generate semantic similarity measure. This method performs better than Information content measures and Edge counting WordNet-based methods.

Li et al. [14] have presented an approach to calculate the semantic similarity between terms and multiword statement. A large Web corpus is used to form an isA semantic network to provide contexts for the terms. The meaning of input terms is formulated by K-medoids clustering algorithm and similarity is computed with max–max similarity function. This algorithm outperforms multiword expression pairs and Pearson correlation coefficient on word pairs.

Bollegala et al. [15, 16] have developed a relational model to calculate the semantic similarity between two words. Snippets of Web pages are used to obtain lexical patterns. Semantically related patterns are identified from the clusters extracted by a sequential pattern clustering algorithm. Mahalanobis distance is used to calculate semantic similarity between two words. This method outperforms all WordNet-based approaches [17,18,19,20,21,22].

6.2.2 Query Recommendation Techniques

Song et al. [23] have designed a query suggestion method that uses users’ feedback in the query logs. Query-URL bipartite graphs are constructed for click and un-click information. The Random Walk with Restart (RWR) technique is applied to both graphs. The category of URLs is used to construct a correlation matrix for URLs. An optimal query correlation matrix is constructed by combining the two query correlation matrices and is used for query suggestion. This framework gives better performance than pseudo-relevance feedback models [24,25,26] and random walk models.

Kharitonov et al. [27] have focused on contextualization framework for diversifying query suggestion. This framework utilizes the user’s history query, the previously clicked and skipped documents and examines query suggestions. Mean Reciprocal Rank (MRR) is used as a performance evaluation metric. This framework is compared with non-diversified ranking with the previous query, ranking with the previous query as a context and clicks and skips as context.

Ozetem et al. [28] have developed a machine learning approach to learn the probability that a user finds a follow-up query relevant after executing the input query. To measure the relevance of a follow-up query, a probabilistic utility function is used which relies on query co-occurrence. To capture the semantic similarity of the suggestions, lexical and result set based features are developed. Gradient Boosted Decision Tree (GBDT) regression is performed to rank the suggestions for the input query and remove irrelevant ones. This approach shows significant improvement over the Mutual Information (MI) method.

Broccolo et al. [29] have investigated a query suggestion algorithm that can cover long tail queries. This algorithm uses search shortcuts model to process a full text query, which is indexed in user sessions recorded in a query log. This algorithm outperforms Query Flow Graph (QFG) and Cover Graph (CG) by providing the most relevant query suggestions.

Zhang et al. [30] have developed an approach for query suggestion based on query search. This approach constructs an ordered set of search terms drawn from documents to create candidate query suggestions. It builds query suggestions separately for each potentially relevant document. This approach provides more relevant query suggestions for short queries as well as long queries.

Gomex et al. [31] have designed a novel technique to visualize the collection of textual snippets returned from a Web query. This technique constructs intuitive and meaningful layouts that optimize the placement of snippets by employing an energy function. This function considers both overlapping removal and preservation of neighbourhood structures. This technique is compared with VPSC, PRISM, Voronoi based and RWordle-C by using Euclidean distance, layout similarity and neighbourhood preservation metrics.

Phan et al. [32] have introduced a method to process sparse and short documents by hidden-topic-based framework on the Web. This framework solves data sparseness and synonyms/homonyms problems of documents. Common hidden topics are determined from datasets to make documents short, less sparse and more topic oriented. This framework is evaluated for online advertising applications on Web search domain matching/ranking and classification. Precision and recall are used to evaluate hidden topics which are used in the improvement of ranking and matching performance.

He et al. [33] have presented a novel sequential query prediction approach for understanding users’ search intent and recommending queries. A sequential probabilistic model called Mixture Variable Memory Markov Model is developed for online query recommendation. Experimental results show that ordered queries within the same session are highly correlated and should be utilized to understand the user’s information needs. Coverage and accuracy are used as performance evaluation metrics.

Jiang et al. [34] have presented a query recommendation method based on Query Hashing (QH). QH generates many similar and dissimilar query pairs as prior knowledge from query sessions. QH then learns a transformation from the prior knowledge such that, after transformation, similar queries tend to have similar hash values. In the recommendation stage, queries that have similar hash values to the given query are ranked and the top K queries are displayed as the recommendation result. The QH model is compared with hashing-based methods, SimHash, Kernelized Locality Sensitive Hashing and Inverted list. This method achieves the best results in terms of efficiency and recommendation performance.

Li et al. [35] have proposed a query suggestion approach. In the learning step, a generative probabilistic model is obtained by learning external knowledge gained from the Web dataset for Web queries. A latent semantic topic model is used to organize the co-occurrence of the Web queries. The posterior distribution of hidden topics is obtained for each candidate query with this model. In the online query suggestion step, the topic distribution is acquired for a given input query. The similarity between the candidate queries and the input query is computed from their corresponding topic distributions. Finally, suggestions are provided by listing candidate queries based on similarity score. Precision and Mean Average Precision (MAP) are used as evaluation metrics. This approach gives better query suggestions than the URL model and comparable results to the term feature model.

Liu et al. [36] have proposed a snippet click model for query recommendation. This model determines the information need of users from search logs. The clicked snippets are used to represent the information need of the users and, with this judgement, snippet click models are constructed. Click through rate and click amount are used as metrics to evaluate the performance of the algorithm. The proposed algorithm provides more efficient recommendations than the Baidu and Sogou search engines.

Table 6.1 shows comparison of related works.

Table 6.1 Related work comparison

6.3 Related Search Recommendation Framework and RSR Algorithm

6.3.1 Problem Definition

Given a user input query q and a user click through log lg from the Web search engine S, our objective is to recommend expanded queries \(q_e\). It is assumed that the user is online while entering the input query and that only the top-50 retrieved search results are considered.

6.3.2 Co-occurrence Measures to Compute Semantic Similarity

The co-occurrence measures Jaccard, Dice, Overlap and Pointwise Mutual Information (PMI) are used to calculate semantic similarity. The notation P(Q) represents the page count for the query Q in the search engine. (i) WebJaccard(\(T_1\), \(T_2\)) between terms \(T_1\) and \(T_2\) is defined as

$$\begin{aligned} WebJaccard(T_1,T_2) = \frac{P(T_1 \cap T_2)}{P(T_1) + P(T_2) - P(T_1 \cap T_2) } \end{aligned}$$
(6.1)

Here, P(\(T_1\) \(\cap \) \(T_2\)) denotes the co-occurrence of terms \(T_1\) and \(T_2\).

(ii) WebDice(\(T_1\), \(T_2\)) is defined as

$$\begin{aligned} WebDice(T_1,T_2) = \frac{2P(T_1 \cap T_2)}{P(T_1) + P(T_2)} \end{aligned}$$
(6.2)

(iii) WebOverlap(\(T_1\), \(T_2\)) is a natural modification of the Simpson coefficient and is defined as

$$\begin{aligned} WebOverlap(T_1,T_2) = \frac{P(T_1 \cap T_2)}{min(P(T_1),P(T_2))} \end{aligned}$$
(6.3)

Pointwise Mutual Information (PMI) is a measure of association used in statistics and information theory. It reflects the dependence between two probabilistic events. (iv) WebPMI is defined as a modification of pointwise mutual information using page counts, where N is the number of documents indexed by the search engine, as

$$\begin{aligned} WebPMI(T_1,T_2) = \log _2\Bigg (\frac{\frac{P(T_1 \cap T_2)}{N}}{\frac{P(T_1)}{N}\frac{P(T_2)}{N}}\Bigg ) \end{aligned}$$
(6.4)
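The four measures in Eqs. 6.1 to 6.4 can be sketched directly from page counts. The following is a minimal illustration, not the chapter's implementation: the function names are hypothetical, the page counts are assumed to have been obtained from a search engine beforehand, and returning 0 when a count is zero is an added assumption to avoid division by zero and undefined logarithms.

```python
import math


def web_jaccard(p1, p2, p12):
    """Eq. 6.1: co-occurrence count over the union of the two page counts."""
    denom = p1 + p2 - p12
    return p12 / denom if denom > 0 else 0.0


def web_dice(p1, p2, p12):
    """Eq. 6.2: twice the co-occurrence count over the sum of page counts."""
    denom = p1 + p2
    return 2 * p12 / denom if denom > 0 else 0.0


def web_overlap(p1, p2, p12):
    """Eq. 6.3: co-occurrence count over the smaller page count."""
    denom = min(p1, p2)
    return p12 / denom if denom > 0 else 0.0


def web_pmi(p1, p2, p12, n):
    """Eq. 6.4: log2 of the joint probability over the product of marginals.

    n is the total number of indexed documents. Returning 0.0 for zero
    counts is an assumption of this sketch, not part of Eq. 6.4.
    """
    if min(p1, p2, p12) <= 0 or n <= 0:
        return 0.0
    return math.log2((p12 / n) / ((p1 / n) * (p2 / n)))


if __name__ == "__main__":
    # Hypothetical page counts: P(T1)=100, P(T2)=50, P(T1 and T2)=25, N=1000.
    print(web_jaccard(100, 50, 25))
    print(web_dice(100, 50, 25))
    print(web_overlap(100, 50, 25))
    print(web_pmi(100, 50, 25, 1000))
```

All four functions take the same three page counts, so a ranking step can swap one measure for another without changing the surrounding code.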

6.3.3 WordNet-Based Semantic Similarity

WordNet-based measures are discussed to calculate semantic similarity. WordNet [38], developed by Princeton University, is a lexical database of English. It is well suited for similarity measures, since it organizes nouns, verbs, adjectives and adverbs into synonym sets (synsets), each representing one concept and linked by various semantic relations. It uses the is-a relation to organize nouns and verbs into hierarchies. Semantic relations used by WordNet include antonymy, synonymy, member, hyponymy, domain, relation, cause and similar, among others. The measures wup (Wu and Palmer 1994), lch (Leacock and Chodorow 1998) and path calculate similarity from path length. The measures lin (Lin 1998), res (Resnik 1995) and jcn (Jiang and Conrath 1997) calculate similarity from information content, which is a corpus-based measure of the specificity of a concept. WordNet also provides non-hierarchical relations such as is-made-of, has-part and is-an-attribute-of. With these additional relations, WordNet also supports measures of relatedness, namely lesk (Banerjee and Pedersen 2003), hso (Hirst and St-Onge 1998) and vector (Patwardhan 2003).

6.3.4 Rocchio’s Model

Rocchio’s model [37] and the Snippet Click model [36] are compared with the RSR algorithm. Rocchio’s model [37] uses the relevant and irrelevant URLs identified by users in the search log to extend the input query. These URLs are converted into documents consisting of title and snippet, and the extended query is used to carry out retrieval again. Let the input query be q, the set of related documents accepted by users be \(D_r\) and the set of non-related documents be \(D_{ir}\). The expanded query \(q_e\) is computed by using Eq. 6.5. Here, a, b and c are parameters and their traditional values are 1, 0.8 and 0.1, respectively. Related documents are given more importance than non-related documents. The importance of terms that are present in both related and non-related documents, or only in non-related documents, is reduced by subtraction.

$$\begin{aligned} q_e = aq + \frac{b}{\mid D_r \mid }\sum _{d_r \in D_r}d_r - \frac{c}{\mid D_{ir} \mid }\sum _{d_{ir} \in D_{ir}}d_{ir} \end{aligned}$$
(6.5)
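Eq. 6.5 can be illustrated with sparse term vectors. The sketch below assumes that the query and each document are represented as dictionaries mapping terms to weights; this representation and the function name `rocchio_expand` are illustrative choices, not taken from [37]. Negative weights are kept as they are, although some implementations clip them to zero.

```python
from collections import defaultdict


def rocchio_expand(query_vec, related_docs, non_related_docs,
                   a=1.0, b=0.8, c=0.1):
    """Compute the expanded query vector of Eq. 6.5.

    query_vec, and every element of related_docs and non_related_docs,
    is a dict {term: weight}. The defaults for a, b and c are the
    values quoted in the text.
    """
    expanded = defaultdict(float)
    for term, weight in query_vec.items():
        expanded[term] += a * weight
    # Add the centroid of the related documents, scaled by b.
    for doc in related_docs:
        for term, weight in doc.items():
            expanded[term] += b * weight / len(related_docs)
    # Subtract the centroid of the non-related documents, scaled by c.
    for doc in non_related_docs:
        for term, weight in doc.items():
            expanded[term] -= c * weight / len(non_related_docs)
    return dict(expanded)


if __name__ == "__main__":
    q = {"bank": 1.0, "exam": 1.0}
    related = [{"tutor": 1.0, "exam": 1.0}, {"tutor": 1.0}]
    non_related = [{"river": 1.0}]
    print(rocchio_expand(q, related, non_related))
```

Terms with the largest weights in the returned vector are the natural candidates for expansion; terms that occur only in non-related documents end up with negative weights.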

6.3.5 Snippet Click Model

The global-scale snippet click model [36] uses the clicked URLs \(CLK_{url}\) from the user search log for a given input query q. Snippets are extracted for \(CLK_{url}\) and converted into documents D. The Term Frequency (TF) of each keyword in documents D is calculated. The top N keywords with the largest TFs are used as recommendation candidates. These N keywords are combined with the input query q and displayed as recommendations.
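The procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: snippets are tokenized by lowercasing and splitting on whitespace, no stopword removal or stemming is applied, and the function name `scm_recommend` is hypothetical.

```python
from collections import Counter


def scm_recommend(query, clicked_snippets, n=10):
    """Return query + keyword strings for the n most frequent snippet terms."""
    tf = Counter()
    for snippet in clicked_snippets:
        # Whitespace tokenization is an assumption of this sketch.
        tf.update(snippet.lower().split())
    return [f"{query} {term}" for term, _ in tf.most_common(n)]


if __name__ == "__main__":
    snippets = ["bank exam notification", "bank exam", "bank"]
    print(scm_recommend("bank exam", snippets, n=2))
```

Because the query terms themselves are not filtered out, frequent query words can reappear as keywords, which is consistent with recommendations such as bank exam bank reported for this model in Sect. 6.4.3.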

The Related Search Recommendation framework is shown in Fig. 6.1. Feedback sessions are generated for a given query from the user search logs and are mapped to pseudo documents.

Fig. 6.1 Related search recommendation framework

Feedback Sessions: Generally, a session can be defined as a list of consecutive queries to correlate a particular user search knowledge and clicked URLs for Web search [39]. Lu et al. [10] have focused on deriving a feedback session with a single query. In this chapter, query suggestions are generated for a query and hence a single session with a single query is suitable and is different from the traditional session.

The feedback session is defined with both clicked and un-clicked documents and ends with the last clicked document in a session. This feedback session indicates that all the URLs before the last click have been examined and assessed by the user. Figure 6.2 shows an example of a feedback session for the query bank exam. The left part shows the 19 search results of the query bank exam and the right part shows a user’s click series, with 1 denoting a clicked URL and 0 an un-clicked URL. Here, a single session includes 19 URLs, while the feedback session includes only 15 URLs. The feedback session consists of four clicked and six un-clicked URLs. Inside this session, the clicked URLs indicate what is relevant to the user and the un-clicked URLs indicate what is irrelevant. The un-clicked URLs that follow the last clicked URL are ignored in the feedback session since it is not certain whether the user has scanned them.

Fig. 6.2 An example of feedback session for query bank exam in rectangular box
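Extracting a feedback session from a click series amounts to truncating the result list at the last click. The sketch below assumes that the search results are given in rank order together with a parallel list of 0/1 click flags; the function name and the data layout are illustrative only.

```python
def feedback_session(urls, clicks):
    """Keep the ranked URLs up to and including the last clicked one.

    urls   -- list of result URLs in rank order
    clicks -- list of 0/1 flags of the same length (1 = clicked)
    Returns a list of (url, click_flag) pairs; empty if nothing was clicked.
    """
    if len(urls) != len(clicks):
        raise ValueError("urls and clicks must have the same length")
    last = max((i for i, flag in enumerate(clicks) if flag), default=-1)
    return list(zip(urls[:last + 1], clicks[:last + 1]))


if __name__ == "__main__":
    urls = ["u1", "u2", "u3", "u4", "u5", "u6"]
    clicks = [0, 1, 0, 1, 0, 0]
    print(feedback_session(urls, clicks))
```

In the example, the two results after the last click are dropped because there is no evidence that the user examined them.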

Generate Enriched Documents from Feedback Sessions: It is not suitable to use feedback sessions directly to obtain meaningful information for suggestions, as they differ across search histories and queries. Users usually have only ambiguous keywords in mind to represent their information need. Hence, it is not a good idea to derive recommendations from relations between the user’s query keywords alone. Enriched documents [10] are generated from feedback sessions and are used to locate keywords that appear in the snippets of clicked and un-clicked documents in the feedback session. The method of generating an enriched document is given in Function 6.1.

Function 6.1 (listing not reproduced)

\(T_v\) and \(S_v\) vectors are given in Eqs. 6.6 and 6.7.

$$\begin{aligned} T_v = [t_{w1},t_{w2},...t_{wm}] \end{aligned}$$
(6.6)
$$\begin{aligned} S_v = [t_{w1},t_{w2},...t_{wn}], \end{aligned}$$
(6.7)

where \(t_{wm} =\) Term Frequency–Inverse Document Frequency (TF-IDF) value of the mth term in URL’s title and \(t_{wn} =\) TF-IDF value of the nth term in the URL’s snippet. The enriched document is defined as given in Eq. 6.8.

$$\begin{aligned} ED = w_tT_v + w_sS_v = [ed_{w1}, ed_{w2}, \ldots , ed_{wk}] \end{aligned}$$
(6.8)

where \(w_t\) is the weight of the title, \(w_s\) is the weight of the snippet and \(ed_{wi}\) indicates the importance of the ith term in the URL. As the title directly represents the URL information, it is necessary to give more importance to title terms than to snippet terms, and therefore \(w_t\) is set to 2 and \(w_s\) is set to 1. Five enriched documents are generated for the five URLs of a feedback session (see Fig. 6.1).
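The weighted combination in Eq. 6.8 can be sketched as follows. The example assumes that the TF-IDF weights of the title and snippet terms have already been computed and are stored as dictionaries over a shared vocabulary, with absent terms treated as zero; the function name and this representation are assumptions made for illustration.

```python
def enriched_document(title_weights, snippet_weights, w_t=2.0, w_s=1.0):
    """Combine title and snippet TF-IDF vectors into one enriched document.

    title_weights, snippet_weights -- dicts {term: tf-idf weight}
    w_t, w_s -- title and snippet weights (2 and 1 in the text)
    A term missing from one vector contributes 0 from that vector.
    """
    terms = set(title_weights) | set(snippet_weights)
    return {
        term: w_t * title_weights.get(term, 0.0)
        + w_s * snippet_weights.get(term, 0.0)
        for term in terms
    }


if __name__ == "__main__":
    title = {"bank": 0.5, "exam": 0.2}
    snippet = {"exam": 0.3, "tutor": 0.4}
    print(enriched_document(title, snippet))
```

With \(w_t = 2\) and \(w_s = 1\), a term that occurs only in the title with weight 0.5 receives the same importance as a term that occurs only in the snippet with weight 1.0.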

Generate Pseudo Documents from Enriched Documents: For a feedback session, each URL is converted into an enriched document. This document contains frequent terms that appear in clicked and un-clicked documents. For each feedback session, a Pseudo Document is generated from its enriched documents. The method of generating a Pseudo Document (PD) is shown in Function 6.2.

Function 6.2 (listing not reproduced)

The generated \(PD =\) [\(ed_{w1}\), \(ed_{w2}\)...\(ed_{wp}\)]

$$\begin{aligned} ed_w = \mathop {\text {argmin}}\limits _{ed_w} \{ \sum _{M}[ed_w - ed_{wclk}]^2 - \lambda \sum _{N}[ed_w - ed_{wunclk}]^2 \} \end{aligned}$$
(6.9)

Here, \(ed_w\) is the optimized term in the Pseudo Document, \(ed_{wclk}\) is the term from clicked enriched documents, \(ed_{wunclk}\) is the term from un-clicked enriched documents and \(\lambda \) is a parameter balancing the importance of clicked and un-clicked URLs. \(\lambda \) is set to 0.5: if \(\lambda \) is too small, the importance of un-clicked URLs is reduced, and if \(\lambda \) is too large, un-clicked URLs dominate the value of \(ed_w\). A pseudo document generated from five enriched documents is shown in Fig. 6.1.
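For a single term, the objective in Eq. 6.9 is a quadratic in \(ed_w\), so its minimizer has a closed form. Setting the derivative to zero gives \(ed_w = (\sum _{M} ed_{wclk} - \lambda \sum _{N} ed_{wunclk}) / (M - \lambda N)\), which is a minimum only when \(M - \lambda N > 0\); otherwise the objective has no finite minimum. The sketch below applies this closed form term by term. It is an illustration derived from Eq. 6.9 rather than a reproduction of Function 6.2, the function names are hypothetical, and treating a term absent from an enriched document as weight 0 is an added assumption.

```python
def optimize_term(clicked, unclicked, lam=0.5):
    """Closed-form minimizer of Eq. 6.9 for one term.

    clicked   -- weights of the term in the M clicked enriched documents
    unclicked -- weights of the term in the N un-clicked enriched documents
    """
    m, n = len(clicked), len(unclicked)
    denom = m - lam * n
    if denom <= 0:
        # The quadratic is not convex, so no finite minimum exists.
        raise ValueError("requires M - lambda * N > 0")
    return (sum(clicked) - lam * sum(unclicked)) / denom


def pseudo_document(clicked_docs, unclicked_docs, lam=0.5):
    """Apply optimize_term to every term of a feedback session.

    Each document is a dict {term: weight}; missing terms count as 0.0.
    """
    terms = set().union(*clicked_docs, *unclicked_docs)
    return {
        term: optimize_term(
            [doc.get(term, 0.0) for doc in clicked_docs],
            [doc.get(term, 0.0) for doc in unclicked_docs],
            lam,
        )
        for term in terms
    }


if __name__ == "__main__":
    clicked = [{"tutor": 0.8}, {"tutor": 0.6, "bank": 0.4}]
    unclicked = [{"bank": 0.2}]
    print(pseudo_document(clicked, unclicked))
```

With \(\lambda = 0\) the closed form reduces to the mean of the clicked weights, which corresponds to ignoring un-clicked URLs altogether.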

Generate Optimized Pseudo Document from Pseudo Documents: The pseudo document reflects both the documents that are relevant and those that are irrelevant to the users. The Optimized Pseudo document is generated by combining all the pseudo documents for an input query. The method for generating the optimized pseudo document is shown in Function 6.3. N is set to 10 as we observe that the top 10 terms represent the users’ information need.
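One way to picture this combination step is shown below. The exact merging rule belongs to Function 6.3 and is not restated in the text, so this sketch assumes, purely for illustration, that term weights are summed across pseudo documents before the top N terms are selected; the function name is hypothetical.

```python
from collections import defaultdict


def optimized_pseudo_document(pseudo_docs, n=10):
    """Merge pseudo documents and keep the n highest-weighted terms.

    pseudo_docs -- list of dicts {term: weight}, one per feedback session
    Summing the weights is an assumption of this sketch.
    Returns a list of (term, weight) pairs, highest weight first.
    """
    totals = defaultdict(float)
    for doc in pseudo_docs:
        for term, weight in doc.items():
            totals[term] += weight
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    docs = [{"tutor": 0.9, "bank": 0.2}, {"tutor": 0.5, "finance": 0.6}]
    print(optimized_pseudo_document(docs, n=2))
```

Other merging rules, such as averaging over sessions, would change the weights but fit the same interface.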

Semantic similarity is calculated between optimized pseudo document terms by WebJaccard, WebDice, WebPMI, WebOverlap methods and WordNet-based similarity measures as discussed. Recommendation results are generated and ranked by combining query and terms for all the methods. These results are evaluated in performance evaluation.

Function 6.3 (listing not reproduced)

6.3.6 RSR Algorithm

In this section, we present the Related Search Recommendation (RSR) Algorithm as shown in Algorithm 6.1.

Algorithm 6.1 (listing not reproduced)

6.4 Experiments

6.4.1 Data Collection

To evaluate the proposed method, 95 students participated and each student was assigned 5 queries to collect feedback sessions (permission was obtained from the Chairperson, Department of Computer Science and Engineering, UVCE, Bangalore). A Google middleware is implemented to monitor the user clicks. The top 50 search results from Google are retrieved for the submitted query. The titles and Web snippets of the search results are presented to the users, as the snippets provide more information about the documents and guide users in deciding which URLs to click. Feedback sessions are generated from the click information of a user for a given input query. Table 6.2 shows the statistics of the click information of users for this experiment.

Table 6.2 Statistics of clicked information of users

6.4.2 Experimental Setup

The setup of the Related Search Recommendation (RSR) framework is as follows: Feedback sessions are generated for a given input query from the user click through log as discussed. Each URL in the feedback session is enriched with title and snippet terms after removing stopwords and applying stemming. Terms are weighted using Term Frequency–Inverse Document Frequency (TF-IDF) as explained in Function 6.1. The enriched documents of a feedback session are classified into clicked and un-clicked documents. Pseudo documents are generated by Eq. 6.9. Similarly, pseudo documents are generated for all the feedback sessions for an input query. The Optimized Pseudo document is generated by combining all the pseudo documents as shown in Function 6.3. The Optimized Pseudo document has the top-10 terms, which reflect the user’s information need. Semantic similarity between these terms \(t_s\) is calculated by the WebJaccard, WebDice, WebPMI and WebOverlap methods and the WordNet-based similarity measures. Recommendations are generated and ranked by combining the query and terms \(t_s\) for all the methods.
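The final ranking step can be pictured as follows. The text does not spell out how the query and terms are combined, so this sketch rests on one plausible reading, suggested by the two-term RSR recommendations listed in Sect. 6.4.3: every pair of top terms is scored with a similarity measure, pairs are sorted by score and each of the best pairs is appended to the input query. The function name and the `similarity` callable are hypothetical.

```python
from itertools import combinations


def rank_recommendations(query, terms, similarity, k=5):
    """Rank term pairs by similarity and append the best k to the query.

    terms      -- top terms of the optimized pseudo document
    similarity -- callable (t1, t2) -> float, e.g. a WebOverlap score
    Pairing terms this way is an assumption, not the chapter's stated rule.
    """
    scored = [(similarity(t1, t2), t1, t2) for t1, t2 in combinations(terms, 2)]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [f"{query} {t1} {t2}" for _, t1, t2 in scored[:k]]


if __name__ == "__main__":
    # Hypothetical similarity scores for illustration only.
    scores = {
        frozenset(("tutor", "ibpsadda")): 0.9,
        frozenset(("tutor", "institute")): 0.4,
        frozenset(("ibpsadda", "institute")): 0.1,
    }
    sim = lambda a, b: scores[frozenset((a, b))]
    print(rank_recommendations("bank exam", ["tutor", "ibpsadda", "institute"], sim, k=2))
```

Passing the similarity measure as a callable keeps the ranking code identical across WebJaccard, WebDice, WebPMI, WebOverlap and the WordNet-based measures.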

The setup of Rocchio’s model is as follows: User-identified relevant and irrelevant URLs are partitioned from the user click through log for a given input query. These URLs are converted into documents with title and snippet. Stopwords removal and stemming are applied for these documents to reduce noise. Expanded queries are generated by Eq. 6.5.

The setup of Snippet Click Model (SCM) is as follows: All the clicked URLs from user click through log are obtained for a given input query. Snippets are extracted from these URLs. Top-10 keywords are extracted by calculating the term frequency of the terms present in snippets. Query recommendations are generated by combining the input query with extracted keywords.

To examine the effect of considering only clicked URLs in our proposed method (click-RSR), enriched documents are generated with only clicked URLs. Pseudo documents are generated by setting \(\lambda \) to zero in Eq. 6.9 to remove the effect of un-clicked URLs. The Optimized Pseudo document is generated by combining all the pseudo documents as shown in Function 6.3. The Optimized Pseudo document has the top-10 terms. Semantic similarity between these terms \(t_s\) is calculated by the WebJaccard, WebDice, WebPMI and WebOverlap methods. Recommendations are generated and ranked by combining the query and terms \(t_s\) for all the methods.

6.4.3 Query Recommendation Results

The top-5 recommendation results of Rocchio’s model, the Snippet Click model, Click-RSR and our RSR algorithm are shown in Table 6.3. Only terms are displayed in the recommendation results due to space restrictions. The actual recommendations for all models are query + terms. For the query bank exam, the recommendations of Rocchio’s model are bank exam finance, bank exam institute, bank exam tutor, bank exam ibpsadda and bank exam gr8ambitionz. The recommendations of the Snippet Click Model are bank exam bank, bank exam competitive, bank exam exam, bank exam notification and bank exam awareness. The recommendations of Click-RSR are bank exam question bank, bank exam question tutor, bank exam papers bank, bank exam shortcuts bank and bank exam bank facebook. The recommendations of the RSR algorithm are bank exam tutor ibpsadda, bank exam institute finance, bank exam courses prepare, bank exam papers content and bank exam sector tutor.

Table 6.3 Related search recommendation results comparison

6.4.4 Performance Analysis

From the results shown in Table 6.3, it is observed that the RSR algorithm recommends queries related to the given input query. One hundred test queries from various topics such as Science, Shopping and Health care are included.

Lu et al. [10] have discovered different user search goals for a query by using feedback sessions. These search goals can be utilized in query recommendation. Feedback sessions are utilized in this work and the performance of the RSR algorithm is compared with different recommendation methods: the classical Rocchio’s model [37], the Snippet Click Model [36] and a modified RSR algorithm that considers only clicked URLs. We have adopted the Click Through Rate (CTR) method used in [36] to evaluate related search recommendations. CTR is the percentage of recommendations that were ever clicked among all recommendations for a given query. The students who participated in collecting the click through log also participated in computing CTR, as they can judge the recommendation results effectively. CTR is used to evaluate whether a recommendation is clicked by the user, and a higher CTR value indicates a more effective algorithm.
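The CTR definition above can be written out as a short calculation. The sketch assumes that each query comes with a list of 0/1 flags recording whether each displayed recommendation was ever clicked; the function names and the data layout are illustrative.

```python
def click_through_rate(clicked_flags):
    """Percentage of recommendations for one query that were ever clicked."""
    if not clicked_flags:
        return 0.0
    clicked = sum(1 for flag in clicked_flags if flag)
    return 100.0 * clicked / len(clicked_flags)


def average_ctr(per_query_flags):
    """Mean CTR over a collection of queries."""
    if not per_query_flags:
        return 0.0
    rates = [click_through_rate(flags) for flags in per_query_flags]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Top-5 recommendations for one query: three were clicked.
    print(click_through_rate([1, 0, 1, 1, 0]))
    print(average_ctr([[1, 0], [1, 1]]))
```

Averaging per-query CTR values in this way yields the kind of figure reported per method in the tables that follow.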

CTR is calculated for the top-5 recommendation results generated with the WebJaccard, WebDice, WebPMI and WebOverlap methods for the RSR algorithm. The average CTR value and the ranked recommendation results are depicted in Fig. 6.3 for all the methods. The average CTR values of the top-5 recommendations are displayed in Table 6.4. CTR is also calculated for the different WordNet semantic similarity measures. The average CTR values of the top-5 recommendations are displayed in Table 6.5. It is observed that a few terms are not available in the WordNet database, so the WordNet measures cannot compute the similarity for some term pairs. It is observed from Tables 6.4 and 6.5 that recommendations ranked with the WebOverlap method have a higher CTR value. Hence, the WebOverlap method is adopted to rank RSR recommendations.

Fig. 6.3 CTR versus ranked recommendation results

Table 6.4 Average CTR value for Top-5 recommendations for RSR and Click-RSR algorithm
Table 6.5 Average CTR value for Top-5 recommendations for WordNet similarity measures

Similarly, CTR is calculated for the top-5 recommendation results generated with the WebJaccard, WebDice, WebPMI and WebOverlap methods for the click-RSR algorithm. The average CTR values of the top-5 recommendations are displayed in Table 6.4. It is observed that recommendations ranked with the WebOverlap method have a higher CTR value. Hence, the WebOverlap method is adopted to rank click-RSR recommendations.

To compare the RSR algorithm with other models, the average CTR value and ranked recommendations are displayed in Fig. 6.4. The average CTR values of the top-5 recommendations for all the models are given in Table 6.6. It is observed that the RSR algorithm has the highest CTR value in comparison with the other models.

It is observed that the CTR value of the RSR algorithm increases by 25% in comparison with SCM. The major difference between our algorithm and SCM is that our algorithm considers un-clicked URLs along with clicked URLs, while SCM considers only clicked URLs. Moreover, term weighting in SCM is limited to term frequency, whereas it is further optimized in the RSR algorithm.

The CTR value of the RSR algorithm increases by 24% in comparison with Rocchio’s model. The differences between the two approaches are as follows: (1) In our approach, feedback sessions are limited to the last clicked URL, as the left-out URLs may not be of interest to the user. (2) Click through data is considered as sessions in the RSR algorithm, while in Rocchio’s model it is treated as a group of clicked/un-clicked URLs.

Fig. 6.4 CTR comparison with other models

Table 6.6 Average CTR value for Top-5 recommendations for all models

The CTR value of the RSR algorithm increases by 5% in comparison with click-RSR. The major difference is that click-RSR considers only the clicked URLs in the feedback session. It is observed that the recommendation results of the RSR algorithm also contain terms from un-clicked URLs. For the 100 test queries, about 23.5% of the terms in the top-5 recommendations of the RSR algorithm come from the un-clicked URLs in the feedback sessions, which shows the importance of the un-clicked URLs scanned by users. Thus, the RSR algorithm outperforms click-RSR.

6.5 Summary

In this chapter, we have presented the Related Search Recommendation (RSR) algorithm to suggest related queries for a given input query by using feedback sessions from the user click through log. Each feedback session is converted into enriched documents. Pseudo Documents are generated by combining all the enriched documents of a feedback session. The Optimized Pseudo Document, which reflects the user’s information need, is generated by combining all the Pseudo Documents for a given input query. Semantic similarity is calculated by the WebJaccard, WebDice, WebPMI and WebOverlap methods for terms present in the Optimized Pseudo Document. Recommendations are generated and ranked by combining the query and terms for all the methods. Simulations are performed on a click through log generated by displaying titles and snippets to the students of our college, and the results are compared with Rocchio’s model, the Snippet Click Model and Click-RSR. Click Through Rate (CTR) is used as the performance evaluation metric. Simulation results show that the RSR algorithm outperforms Rocchio’s model, the Snippet Click Model and Click-RSR by providing a higher CTR value. Further, this work can be extended to classify the search results into different topics.