1 Introduction

In the present scenario, large and interesting real-world datasets are available worldwide for every discipline, such as sports, politics, education, psychology and the stock market. However, searching them for useful information is a tedious and complex task. Metasearch engines return lists of Web sites in response to a user query containing its keywords, but these lists seldom fully match the user's satisfaction level. Hence, rank aggregation of several lists into one “super” list is one of the best solutions for obtaining better results. Various rank aggregation techniques have been employed in the literature, such as score-based methods, positional methods, probabilistic methods, learning techniques and soft computing techniques (Beg and Ahmad 2003, 2004; Beg et al. 2016; Dwork et al. 2001; Renda and Straccia 2003; Montague et al. 2001). To investigate the performance of rank aggregation techniques, two distance measures are used: Spearman’s footrule and Kendall-tau. The distance of the final aggregated list is calculated against the input lists, a normalized aggregated distance is obtained, and the top rank is assigned to the Web site having the minimum distance value. Minimizing this normalized aggregated distance is known as optimal aggregation, which is an NP-hard problem. A common rank aggregation technique is Borda’s method, developed in the 1770s by Borda (1781). Although this method was originally applied to political data, it was later investigated on several new problems, with sports, psychology and the stock market as application areas (Langville and Meyer 2012). In 1951, Arrow proved his impossibility theorem, which describes the limitations inherent in voting systems. Shimura (1973) proposed a fuzzy logic technique for ranking objects. Dubois and Prade’s (DP) technique (Ross 1997) was adopted for rank aggregation; it is based on fuzzy ordering of elements and, apart from the mean position of a document, also uses averages over the results, an aspect missing from Borda’s method. The entropy technique of Ross (1997) was employed for ranking large datasets with much variation and complexity, using a threshold to obtain the positions of documents within a given range.

In 1998, the paper “Anatomy of a Search Engine” by Page and Brin (1998) opened new horizons in the field of ranking and rating through the design of the Google search engine, which used Markov chains to rank web pages. The Google search engine was based on the PageRank algorithm, which helped in calculating the relevancy of new pages in comparison with other existing pages. PageRank (Page and Brin 1998) and Kleinberg’s HITS (Kleinberg 1999) algorithms are currently used for measuring the importance of a document on the web. Dwork et al. (2001) considered the problem of rank aggregation in the context of the web and studied rank aggregation techniques for overcoming the problem of spam fighting. Montague et al. (2001) came up with various methods of metasearch, focusing on normalized score values used to decompose the metasearch concept.

Beg and Ahmad (2003) employed a genetic algorithm (GA) to optimize the Spearman footrule distance for partial lists, known as partial footrule optimal aggregation (PFOA). Beg and Ahmad (2004) proposed three strategies, which were compared with the classical fuzzy technique and Borda’s method for web applications. Yan et al. (2011) improved the quality of the PageRank algorithm by introducing a genetic-algorithm-based variant known as genetic PageRank (GPRA). GPRA is based on the topic-sensitive idea and is divided into two phases: collecting themes related to the calculation of PageRank vectors, and online inquiries to determine the theme context. The problem of refining page ranking was considered by Laughlin et al. (2011) using fuzzy sets and logic. Akritidis et al. (2011) highlighted effective rank aggregation techniques for metasearch; their paper introduced QuadRank, a new rank aggregation method that considers the query terms, the collected results and the data correlated with each result, such as title, text, snippet, URLs and individual rankings. Likewise, the concept of feature selection was employed by Waad et al. (2013) in the area of rank aggregation. The rank aggregation problem was investigated by Pihur et al. (2014) using an R tool for computing cross-entropy. The concept of multiple-objective GA-based rank aggregation techniques was addressed by Kaur et al. (2015), where the Mean and Stuart methods were applied to the ranking of different queries. Beg et al. (2016) covered the rank aggregation problem in metasearch using a modified Shimura method with the addition of an OWA operator. Kaur et al. (2017) implemented a genetic algorithm that optimizes the distance measures for various URLs obtained from different search engines using rank aggregation techniques; genetic-based Kendall-tau (GKTu) and Spearman’s footrule (GSFD) distance measures were proposed to derive optimized distance values. Although the work reported by Beg and Kaur provides better solutions to the ranking problem, it has the drawback of large execution time. Napoles et al. (2015) presented an ant-colony-based model for partial ranking using the Kemeny approach.

In the past few years, more advanced nature-inspired optimization techniques, such as ant colony, particle swarm (PSO) and bee colony, have been proposed for solving dynamic and mathematical problems. These techniques are yet to be applied to the ranking of Web sites. Motivated by the different ranking techniques and by the need for a better rank aggregation strategy with lower execution time, these are taken as the problem objectives of this work.

In this paper, a bio-inspired ant colony-based rank aggregation methodology is proposed. Simulation results are obtained for five aggregation methods: Borda, Markov chain, scaled footrule, PageRank and mean-by-variance. The performance of the suggested scheme is then compared with the traditional genetic algorithm on the basis of execution time and minimized distance value. Finally, three main performance metrics, Precision, Recall and F-Measure, are discussed for checking the efficiency of various search engines.

2 Rank aggregation methods used for metasearch

The area of rank aggregation is divided into two categories, i.e., unsupervised and supervised learning approaches. In the unsupervised approach, no training dataset is required for deriving the results, as in the Borda count (Borda 1781; Saari 2000), median rank aggregation by Dwork et al. (2001), and genetic and fuzzy-based rank aggregation (Beg and Ahmad 2002), whereas supervised techniques utilize a training dataset.

2.1 Kemeny optimal aggregation method

Kemeny optimal aggregation (Kemeny and Snell 1962) is one of the desirable methods used for ranking problems. Given a set of full rankings \(\sigma ^{1},\ldots ,\sigma ^{j},\ldots , \sigma ^{N}\) on the set of items Z, the goal is to find an ordering \(\sigma \) of Z. Suppose that the input rankings \(\sigma ^{j}\) are noisy versions of \(\sigma \), each obtained by swapping pairs of elements of \(\sigma \) with a probability \(r<0.5\). The optimal aggregation is then given as:

$$\begin{aligned} \sigma ^{*}= {\arg \min }_\sigma \frac{1}{N}\sum _{j=1}^N {K(\sigma ,\sigma ^{j}}) \end{aligned}$$
(1)

The estimate \(\sigma ^{*}\) is referred to as the Kemeny optimal aggregation. Dwork et al. (2001) showed that computing the Kemeny optimal aggregation is an NP-hard problem even when the number of input rankings is only four.
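Because the minimization in Eq. (1) ranges over all |Z|! orderings, an exact computation is feasible only for very small item sets. The following minimal Python sketch (illustrative only; the function and variable names are our own) evaluates Eq. (1) by brute force, using the Kendall-tau distance K defined in Sect. 3.1:

```python
from itertools import combinations, permutations

def pairwise_disagreements(sigma, tau):
    # K(sigma, tau): number of item pairs ordered differently in the two rankings.
    pos_s = {item: r for r, item in enumerate(sigma)}
    pos_t = {item: r for r, item in enumerate(tau)}
    return sum(1 for a, b in combinations(sigma, 2)
               if (pos_s[a] - pos_s[b]) * (pos_t[a] - pos_t[b]) < 0)

def kemeny_optimal(rankings):
    # Exhaustive search over all orderings of Z; the problem is NP-hard,
    # so this is practical only for a handful of items.
    items = rankings[0]
    def avg_distance(candidate):
        return sum(pairwise_disagreements(candidate, r)
                   for r in rankings) / len(rankings)
    return min(permutations(items), key=avg_distance)
```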

2.2 Borda’s method

Borda’s method (Borda 1781) is one of the classical methods based on position-oriented scores. Scores are assigned according to the position at which a candidate appears within each voter’s ranked list of preferences, and the score values of all candidates are used for sorting. Formally, let \(\alpha _1,\alpha _2,\ldots ,\alpha _k\) be the set of full lists and let \(S_i (c)\) be the number of candidates ranked below c in list \(\alpha _i\). For each candidate \(c\in D\), the total Borda score is expressed as:

$$\begin{aligned} \sum _{i=1}^k {S_i (c)} \end{aligned}$$
(2)

The candidates are sorted in decreasing order of the Borda score.
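As a concrete illustration, the following short Python sketch (our own naming; it assumes full lists over the same candidate set) computes the Borda score of Eq. (2) and sorts the candidates:

```python
from collections import defaultdict

def borda_aggregate(full_lists):
    # Each list ranks the same candidates, best first.
    scores = defaultdict(int)
    for lst in full_lists:
        for pos, c in enumerate(lst):
            # S_i(c): number of candidates ranked below c in this list.
            scores[c] += len(lst) - 1 - pos
    # Candidates sorted in decreasing order of total Borda score.
    return sorted(scores, key=scores.get, reverse=True)

# Example: borda_aggregate([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]])
# gives ["a", "b", "c"] with scores a = 5, b = 3, c = 1.
```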

2.3 Markov chain method

Markov chains were introduced by Markov in 1906; their use for rank aggregation, in several variants, was proposed by Dwork et al. (2001). A Markov chain models a system as a set of states, here corresponding to the items to be ranked. A nonnegative \(n\times n\) stochastic transition matrix M is formed, where entry \(M_{ij}\) gives the probability of moving from state i to state j. The method progresses by moving from one state to another: at each step, the next state distribution is the product of the current state distribution and M. The starting distribution is denoted by x. Iterations continue until the distribution reaches a steady state, known as the stationary distribution. The stationary distribution is given by the principal left eigenvector y of M, i.e., \(yM=\gamma y\). The entries of y induce a natural ordering on the state set S, called the Markov chain ordering; Dwork et al. define four variants of M, denoted MC1, MC2, MC3 and MC4, which can be applied to different applications as per their requirements.
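A minimal sketch of the stationary-distribution computation by power iteration is given below (our own naming; it assumes M is row-stochastic):

```python
import numpy as np

def stationary_distribution(M, tol=1e-10, max_iter=10000):
    # M: n x n row-stochastic transition matrix.
    n = M.shape[0]
    x = np.full(n, 1.0 / n)      # uniform start distribution
    for _ in range(max_iter):
        x_next = x @ M           # next distribution = current distribution times M
        if np.abs(x_next - x).sum() < tol:
            return x_next        # steady state reached: y with yM = y
        x = x_next
    return x

# Ranking: the items are then sorted by their stationary probabilities
# in decreasing order to obtain the Markov chain ordering.
```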

2.4 PageRank method

Page and Brin (1998) developed the PageRank algorithm at Stanford University. This algorithm became the backbone of the Google search engine, along with several other search engines. The PageRank method calculates the relevancy of a page in comparison with other existing web pages. Other parameters, such as the title, tags, anchor text and keywords, are also used to calculate the overall rank of a page under PageRank.

The final results are arranged in descending order of PageRank value, computed using the following relation:

$$\begin{aligned} \hbox {PR}\left( u \right) =\left( {1-d} \right) +d\mathop \sum \limits _{v\in B\left( u \right) } \frac{\hbox {PR}\left( v \right) }{N_v} \end{aligned}$$
(3)

where PR(u) denotes the PageRank of page u, B(u) is the set of pages linking to u, and \(N_v\) denotes the number of out-links of page v. The damping factor d, typically set to 0.85 but adjustable as per the user’s requirement, weights the contribution of directly linked pages, while \((1-d)\) accounts for the rank contribution from pages that are not directly linked.
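A minimal sketch of the PageRank iteration of Eq. (3) follows (our own naming; the link structure in the example and the iteration count are illustrative assumptions):

```python
def pagerank(in_links, out_degree, d=0.85, iters=50):
    # in_links[u]: the pages v in B(u) that link to u;
    # out_degree[v]: the number of out-links N_v of page v.
    pr = {u: 1.0 for u in in_links}
    for _ in range(iters):
        pr = {u: (1 - d) + d * sum(pr[v] / out_degree[v]
                                   for v in in_links[u])
              for u in in_links}
    return pr  # pages are then sorted by PR value in descending order

# Example: a tiny 3-page web where "a" links to "b" and "c",
# and both "b" and "c" link back to "a".
# pagerank({"a": ["b", "c"], "b": ["a"], "c": ["a"]},
#          {"a": 2, "b": 1, "c": 1})
```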

2.5 HITS algorithm

The HITS algorithm (Kleinberg 1999) is based on the concepts of authorities and hubs, instead of in-links and out-links, for assigning ranks to pages. The edges between hubs and authorities are used for the calculations, and weights are assigned to hubs and authorities in order to refine the results. The algorithm computes the ranks iteratively; at each iteration, the weights assigned to a node are computed as:

$$\begin{aligned} a_i = \sum \limits _{j\in B\left( i \right) } h_{j} \quad {\text {and}} \quad h_j = \sum \limits _{i\in F\left( j \right) } a_i \end{aligned}$$
(4)

where \(a_i\) is the authority score of node i, updated from the hub scores of the nodes B(i) linking to it, and \(h_{j}\) is the hub score of node j, updated from the authority scores of the nodes F(j) it links to.
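A minimal sketch of the iterative update of Eq. (4), with normalization added to keep the scores bounded (a standard practice, though not stated explicitly above), is:

```python
import numpy as np

def hits(adj, iters=50):
    # adj[i][j] = 1 if page i links to page j,
    # so j is in F(i) and i is in B(j).
    A = np.asarray(adj, dtype=float)
    a = np.ones(A.shape[0])  # authority scores
    h = np.ones(A.shape[0])  # hub scores
    for _ in range(iters):
        a = A.T @ h          # a_i = sum of h_j over pages j in B(i)
        h = A @ a            # h_j = sum of a_i over pages i in F(j)
        a /= np.linalg.norm(a)
        h /= np.linalg.norm(h)
    return a, h
```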

3 Rank aggregation-based distance measures

Rank aggregation can be defined as generating a new list of elements that is closest to a set of given lists. The performance of a rank aggregation method is measured by two distance measures, namely the Kendall-tau distance and Spearman’s footrule distance. These measures were first reported by Diaconis and Graham (1977) and Diaconis (1988).

3.1 Kendall-tau distance measure

This measure is widely used to calculate the correlation between two ranked lists. The Kendall-tau distance (Kendall 1938) counts the number of inversions in the order of pairs of candidates between two ranked lists, i.e., the number of pair-wise disagreements between two full lists. For two full lists \(\sigma \) and \(\tau \), with k and l denoting items, it is formulated as:

$$\begin{aligned} K\left( {\sigma ,\tau } \right) =\left| {\left\{ {\left( {k,l} \right) : k<l,\ \sigma \left( k \right) <\sigma \left( l \right) \ {\text {but}} \ \tau \left( k \right) >\tau \left( l \right) } \right\} } \right| \end{aligned}$$
(5)

The normalized Kendall-tau distance is obtained by dividing \(K(\sigma ,\tau )\) by its greatest possible value, \(0.5\,n(n-1)\) pair-wise disagreements for lists of n items.
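A direct Python transcription of Eq. (5) with the normalization just described (our own naming) is:

```python
from itertools import combinations

def kendall_tau_distance(sigma, tau, normalize=True):
    # sigma, tau: full ranked lists over the same items, best first.
    pos_s = {item: r for r, item in enumerate(sigma)}
    pos_t = {item: r for r, item in enumerate(tau)}
    # Count pairs (k, l) with sigma(k) < sigma(l) but tau(k) > tau(l).
    d = sum(1 for k, l in combinations(sigma, 2)
            if (pos_s[k] - pos_s[l]) * (pos_t[k] - pos_t[l]) < 0)
    if normalize:
        n = len(sigma)
        d /= 0.5 * n * (n - 1)   # divide by the greatest possible value
    return d
```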

3.2 Spearman’s footrule distance measure

Spearman’s footrule distance (Diaconis and Graham 1977) is defined as the absolute distance between the positions of candidates in two ranking lists. Suppose two document lists l1 : (d1, d2, d3, d4) and l2 : (d4, d3, d2, d1) are given. The footrule distance d is calculated as the sum of absolute differences between the positions of the documents; e.g., the position of d1 in l1 is 1st and in l2 is 4th, so the contribution of document d1 to the distance is \(|{1-4}|\). Hence, the total distance is calculated as:

$$\begin{aligned} d= & {} \left| {1-4} \right| +\left| {2-3} \right| +\left| {3-2} \right| +\left| {4-1} \right| \\ d= & {} 8 \end{aligned}$$

In general, Spearman’s footrule distance sums, over all elements k of a given set, the absolute difference between the ranks of k according to the two lists. For two given lists \(\sigma ,\tau \), their Spearman’s footrule distance is given as:

$$\begin{aligned} F\left( {\sigma ,\tau } \right) =\mathop \sum \limits _k \left| {\sigma \left( k \right) -\tau \left( k \right) } \right| \end{aligned}$$
(6)
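The worked example above can be reproduced with a direct transcription of Eq. (6) (our own naming):

```python
def footrule_distance(sigma, tau):
    # F(sigma, tau): sum of absolute rank differences; ranks are 1-based.
    pos_s = {d: r + 1 for r, d in enumerate(sigma)}
    pos_t = {d: r + 1 for r, d in enumerate(tau)}
    return sum(abs(pos_s[d] - pos_t[d]) for d in sigma)

# Reproduces the example above:
# footrule_distance(["d1", "d2", "d3", "d4"],
#                   ["d4", "d3", "d2", "d1"])  # -> 8
```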

In this work, minimization of Kendall-tau and Spearman’s footrule distance measures is formulated as objective functions:

$$\begin{aligned} {\text {Obj}}\_{\text {function}} =\min \{K(\sigma ,\tau ),F(\sigma ,\tau )\} \end{aligned}$$
(7)

4 Rank aggregation by genetic algorithms approach

The genetic algorithm (GA) was developed by Holland (1975) and forms a strong foundation for evolutionary computing (Yang 2010, 2014). The GA process is basically divided into three operators: crossover, mutation and selection. GA was first applied to NP-hard problems by Goldberg (1989) in the search and machine learning areas. The general procedure for a GA is to encode the objective or cost, define the fitness function or selection criterion, and create a population of individuals. An iteration or evolution cycle is then carried out by evaluating the fitness of all individuals in the population, and within these cycles new populations are created from the older ones. Lastly, the results are decoded to obtain the optimized value (Yang 2010). The pseudo-code for GA-optimized distance measures for rank aggregation is given as:

[Figure a: pseudo-code of the GA-based rank aggregation procedure]
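Since the pseudo-code figure is not reproduced here, the following simplified Python sketch illustrates the loop described above; it uses a plain swap mutation in place of the permutation-multiplication crossover detailed below, evaluates only the aggregated footrule distance for brevity, and all parameter values and names are illustrative assumptions:

```python
import random

def ga_rank_aggregation(input_rankings, universe, pop_size=50, gens=100):
    # Chromosomes are permutations of the universe; fitness is the
    # aggregated Spearman's footrule distance to the M input rankings.
    def aggregated_distance(perm):
        pos = {item: r for r, item in enumerate(perm)}
        total = 0.0
        for ranking in input_rankings:
            rpos = {item: r for r, item in enumerate(ranking)}
            total += sum(abs(pos[k] - rpos[k]) for k in universe)
        return total / len(input_rankings)

    # Initial pool of randomly generated permutations of the universe.
    pool = [random.sample(list(universe), len(universe))
            for _ in range(pop_size)]
    for _ in range(gens):
        pool.sort(key=aggregated_distance)        # lower distance survives
        survivors = pool[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = random.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]  # swap keeps a valid permutation
            children.append(child)
        pool = survivors + children
    return min(pool, key=aggregated_distance)
```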

The objective function from Eq. (7) is evaluated on a set of chromosomes selected from a pool of randomly generated permutations of the universe \(U=\{1,2,3,\ldots , |U|\}\). Each chromosome in the pool takes its turn at evaluating the objective function, which is the aggregated footrule distance and the aggregated Kendall distance of that chromosome with respect to the rankings given by the M participating search engines. Chromosomes with lower values survive, whereas those with higher values leave the pool (Beg and Ahmad 2003).

During the crossover stage, consider two chromosomes to be crossed, \(D_i^k =[d_{(i,1)},d_{(i,2)},\ldots ,d_{(i,n)}]\) and \(D_j^k =[d_{(j,1)},d_{(j,2)},\ldots ,d_{(j,n)}]\), where \(n=|U|\).

Then, the first \(m\,(m<n)\) elements are exchanged between the two chromosomes, and the remaining elements, \(d_{(i,m+1)},d_{(i,m+2)},\ldots ,d_{(i,n)}\) in \(D_i^k\) and \(d_{(j,m+1)},d_{(j,m+2)},\ldots ,d_{(j,n)}\) in \(D_j^k\), are adjusted to form the new chromosomes \(D_i^{(k+1)} =[d_{(j,1)},\ldots , d_{(j,m)},d_{(i,m+1)}^{\prime }, \ldots ,d_{(i,n)}^{\prime }]\) and \(D_j^{(k+1)} =[d_{(i,1)},\ldots ,d_{(i,m)},d_{(j,m+1)}^{\prime }, \ldots ,d_{(j,n)}^{\prime }]\), which are valid permutations. The adjustment is implemented using the concept of multiplication of permutations (Beg and Ahmad 2003). The adjusted tails are formed by multiplication of permutations as per the following relations:

$$\begin{aligned} \left\{ d_{(i,m+1)}^{\prime },\ldots ,d_{(i,n)}^{\prime } \right\}= & {} \{d_{(j,m+1)} ,d_{(j,m+2)},\ldots ,d_{(j,n)}\} \nonumber \\&\times \{d_{(i,m+1)},d_{(i,m+2)} ,\ldots ,d_{(i,n)}\} \end{aligned}$$
(8)
$$\begin{aligned} \left\{ d_{(j,m+1)}^{\prime },\ldots ,d_{(j,n)}^{\prime } \right\}= & {} \{d_{(i,m+1)} ,d_{(i,m+2)},\ldots ,d_{(i,n)}\} \nonumber \\&\times \{d_{(j,m+1)},d_{(j,m+2)} ,\ldots ,d_{(j,n)}\} \end{aligned}$$
(9)

This produces a new generation of valid permutations of the universe U, \(D_i^{(k+1)}\) and \(D_j^{(k+1)}\), for the two distinct chromosomes. During mutation, the digit to be mutated is exchanged with another randomly selected digit within the same permutation, so mutation also produces chromosomes that are valid permutations (Beg and Ahmad 2003).
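For completeness, one common convention for the multiplication (composition) of permutations used in Eqs. (8) and (9) is sketched below; the 0-based-position encoding is our own assumption and may differ from the convention in Beg and Ahmad (2003):

```python
def multiply_permutations(p, q):
    # Composition p * q acting on 0-based positions: the position that q
    # assigns to rank r is re-mapped through p, so the product is again
    # a valid permutation, as required for the crossover tails.
    return [p[q[r]] for r in range(len(q))]

# Example: multiply_permutations([1, 2, 0], [2, 0, 1]) -> [0, 1, 2]
# (the two permutations are inverses, so their product is the identity).
```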

GA (Kaur et al. 2017) was applied to find the optimal values of the Kendall-tau (KTu) and Spearman’s footrule (SFD) distance measures. The initial population is created with the help of rank aggregation methods, such as the Borda, MBV, Markov, PageRank and scaled footrule methods, given in Table 2. The Kendall-tau and Spearman’s footrule measures are then chosen as objective functions to calculate the distance values. The results obtained from 50 and 100 iterations using GA are shown in Tables 3 and 4. The complete steps of the GA algorithm using Kendall-tau and Spearman’s footrule for rank aggregation are discussed in Kaur et al. (2017).

The literature points to several drawbacks of GA: execution time is large, and the rates of mutation, crossover and selection need to be handled carefully to avoid false results. The performance of individual queries was also not covered in our previous work (Kaur et al. 2017) in terms of Precision, Recall and F-Measure. Here we propose ant colony-based Kendall-tau and Spearman’s footrule distance measures for the rank aggregation of each query. The performance of each search engine for each query, with and without word association, is also tested using the Precision, Recall and F-Measure parameters.

5 Proposed ant colony-based rank aggregation strategy

In 1992, M. Dorigo introduced an optimization technique modeled on the foraging behavior of wild ants, noting their food searching capabilities. This foraging behavior was later utilized for solving various optimization problems, as reported by Dorigo (1992) and Dorigo and Gambardella (1997). The ant colony optimization (ACO) strategy is mainly inspired by the food searching technique of ants: pheromone concentration is strengthened on the path closest to the food, and pheromone evaporates from paths far away from the food or destination (Yang 2014). ACO has also been applied to discrete problems, as addressed by Dorigo et al. (1999) and Blum (2005). Similarly, the natural food-search behavior of honey bees is described by Gould and Gould (1988). ACO is based on the principle that a random route is generated first, which is similar to mutation; pheromone-based selection then helps in selecting the shortest routes. There is no explicit crossover in ACO, but fitness-based mutation helps in providing the solutions. In this work, a web link rank aggregation problem is framed and two objective functions are defined, Kendall-tau and Spearman’s footrule, and the URLs are arranged so that the URL with the minimum objective function value occupies the highest (first) rank position in the whole list. The pseudo-code for the proposed ant colony-based rank aggregation strategy is given as:

[Figure b: pseudo-code of the proposed ACO-based rank aggregation strategy]
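As the pseudo-code figure is not reproduced here, the following simplified Python sketch illustrates one way to realize the strategy: pheromone values are kept per (URL, rank position) pair, ants build candidate permutations by pheromone-weighted sampling, and the iteration-best list is reinforced. All parameter values and names are illustrative assumptions, and only the footrule objective is shown for brevity:

```python
import random

def aco_rank_aggregation(input_rankings, universe, ants=20, iters=100,
                         rho=0.1, q=1.0):
    # Pheromone tau[item][rank]: desirability of placing item at that rank.
    n = len(universe)
    tau = {item: [1.0] * n for item in universe}

    def aggregated_distance(perm):
        # Aggregated Spearman's footrule distance to the input rankings.
        pos = {item: r for r, item in enumerate(perm)}
        total = 0.0
        for ranking in input_rankings:
            rpos = {item: r for r, item in enumerate(ranking)}
            total += sum(abs(pos[k] - rpos[k]) for k in universe)
        return total / len(input_rankings)

    best_perm, best_cost = None, float("inf")
    for _ in range(iters):
        solutions = []
        for _ in range(ants):
            remaining = list(universe)
            perm = []
            for rank in range(n):
                weights = [tau[item][rank] for item in remaining]
                choice = random.choices(remaining, weights=weights)[0]
                perm.append(choice)
                remaining.remove(choice)
            solutions.append((aggregated_distance(perm), perm))
        # Evaporate pheromone, then reinforce the iteration-best list.
        for item in universe:
            tau[item] = [(1 - rho) * t for t in tau[item]]
        cost, perm = min(solutions, key=lambda s: s[0])
        for rank, item in enumerate(perm):
            tau[item][rank] += q / (1.0 + cost)
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost
```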

The optimized values of the objective functions obtained with the proposed ACO-based strategy are depicted in Tables 3 and 4, respectively.

6 Results and discussion

In this section, the performance of the proposed ant colony-based strategy is compared with the genetic algorithm for rank aggregation. The initial dataset is taken from Kaur et al. (2017); it comprises three main queries (Q1–Q3), which are further refined using word association, as shown in Table 1. All these queries are fed to five search engines: \({ Google}^{1}\), \({ AltaVista}^{2}\), \({ Deeperweb}^{3}\), \({ Excite}^{4}\) and \({ Hotbot}^{5}\).

Table 1 Main queries and new logical operator-based Queries
Table 2 Aggregated rank list for each query by rank aggregation methods

The aggregated rank list for each query, produced by the rank aggregation methods Borda, mean-by-variance, Markov chain MC4, PageRank and scaled footrule, is given in Table 2.

The comparative analysis between GA- and ACO-based ranking strategies is discussed in the next subsections.

6.1 Performance comparison of GA- and ACO-based ranking strategies

Both the proposed ACO-based and the GA-based (Kaur et al. 2017) ranking strategies are compared on the basis of experimental simulation results for 50 and 100 iterations. Table 3 lists the optimized values of the Kendall-tau distance measure and the execution time at 100 iterations for each query and rank aggregation method. The values for 50 iterations are not shown due to space constraints, but the graphical results for 50 iterations are given in Figs. 1 and 2, respectively.

Table 3 Comparison of proposed ACO and GA optimized Kendall-tau distance measure at (100 gen.)

Likewise, the optimized values of the Spearman’s footrule distance measure and the execution time for both the GA-based and the proposed ACO-based techniques at 100 iterations are shown in Table 4.

Fig. 1 Graphical representation of fitness value at 50 and 100 iterations for GA and ACO for query Q1 by Borda

Fig. 2 Graphical representation of fitness value at 50 and 100 iterations for GA and ACO for query Q3b by Borda

Table 4 Comparison of proposed ACO and GA optimized Spearman’s footrule distance measures at (100 gen.)

Tables 3 and 4 show the optimized distance values for the Kendall-tau and Spearman’s footrule distance measures in standard form and with GA and ACO at 100 iterations, respectively. Table 3 shows that the proposed ACO-based strategy gives better results than GA. For example, for Query 1 the Kendall-tau distance value is 0.237 with the standard method and 0.226 with GA, while ACO reduces it to 0.178 with a much smaller execution time of 13.16 s. Likewise, at the 100th iteration the Spearman’s footrule value is 0.51 with GA, whereas with ACO it is 0.31, with an execution time of 14.98 s compared to 139.8 s for GA, as shown in Table 4. Hence, from the overall analysis, the optimized results of the proposed ACO-based strategy are found to be much better than those of GA and the standard form for both distance measures.

Table 5 Performance comparison of rank aggregation methods on the basis of aggregated ranks

Although the average aggregated distance value for the Borda method is found to be minimum in its standard form, the ACO-based strategy gives optimized results for the other ranking methods. Table 5 shows the average aggregated rank of all queries for the Kendall-tau and Spearman’s distance measures.

Table 6 Comparison of rank aggregation methods on the basis of GA and ACO optimized Kendall and Spearman’s distance measures

Table 6 indicates the average aggregated rank values of the Kendall-tau and Spearman’s footrule measures for the genetic and the proposed ACO-based ranking strategies. The aggregated distance values for all the ranking methods are found to be minimum with the proposed strategy compared to genetic optimization (Kaur et al. 2017), as indicated in Table 6. It is concluded from the aggregated ranks that the Borda method has the minimum rank value among its counterparts. Hence, the Borda method has been chosen for further analysis and for obtaining a more refined rank list that can meet the user’s satisfaction level.

Figure 1 shows the graphical representation of the best cost derived from the GA and the proposed ACO approach for the Borda method with Query Q1 at the 50th and 100th iterations. Figure 2 shows that the ACO-based approach outperforms GA in terms of best cost, i.e., the optimized distance measure value; it presents the best cost for the Borda method with Query Q3b at 100 iterations, which improves on the 50th-iteration value under the proposed ACO algorithm.

6.2 Best optimized aggregated rank list for Queries

For the given set of three queries Q1–Q3, two sub-queries were derived from each using the AND–OR technique to test query relevancy, as indicated in Table 1, giving a total of nine queries. From the above analysis, among the set of queries Q1, Q1a and Q1b, the query Q1 occupies the topmost position in the aggregated list, with the minimum distance value obtained by Borda’s method, as indicated in Tables 3 and 4. Table 7 shows the previous and the optimized new ranks of 20 URLs for query Q1. Hence, the final aggregated list of 20 URLs for query Q1 is

Q1:\(\{1,3,2,4,7,9,16,19,12,18,5,10,6,11,15,8,13,14,17,20\}\) as given in Table 7.

Table 7 Optimized aggregated rank list of URLs

Similarly, among the queries Q2, Q2a and Q2b, the query Q2a has the most optimized aggregated list of URLs, obtained from the scaled footrule rank aggregation method, as given in Tables 3 and 4. Among the queries Q3, Q3a and Q3b, the query Q3a under PageRank is the winner on the basis of the minimum distance measure value obtained from the proposed ACO-based strategy, given in Tables 3 and 4. Hence, the best final aggregated URL lists for Q2a and Q3a are:

$$\begin{aligned} t_\mathrm{query2a}= & {} \left\{ 15,4,9,13,8,7,6,11,3,10,5,12,14,1,2,\right. \\&\left. 16,17,18,19,20\right\} \\ t_\mathrm{query3a}= & {} \left\{ 4,14,20,8,16,2,1,3,10,6,7,9,12,11,15,\right. \\&\left. 13,5,17,18,19 \right\} \end{aligned}$$

6.3 Performance metrics for optimized lists

In this section, various testing parameters, namely Precision, Recall and F-Measure, are applied to investigate the performance of metasearch engines.

6.3.1 Precision metrics

Precision, also called positive predictive value, is the fraction of retrieved instances that are relevant, i.e., the ratio of retrieved relevant documents to all retrieved documents. The performance of the final aggregated queries Q1, Q2a and Q3a is evaluated on the basis of Precision. The Precision values for the URLs derived for query Q1 are calculated using Eq. (13). Table 8 presents the Precision of the top twenty documents retrieved from each of the five search engines.

$$\begin{aligned} {\text {Precision}} =\frac{{\text {retrieved relevant}}}{{\text {retrieved}}}= \frac{|{A\cap B}|}{|B|} \end{aligned}$$
(13)
Table 8 Precision for Query Q1

In Table 8, the average Precision for the Google and Excite search engines is found to be better than for the other search engines.

6.3.2 Recall metrics

Recall, also known as sensitivity, is the fraction of relevant instances that are retrieved, i.e., the ratio of retrieved relevant documents to all relevant documents. Using Eq. (14), the Recall is calculated for Q1 and the results are shown in Table 9. The Recall value for the first URL of Q1 is 0.05, i.e., 1/20, since the first URL accounts for one of the total of 20 relevant URLs. Similarly, the Recall is calculated for all of the top twenty documents, as shown in Table 9.

$$\begin{aligned} {\text {Recall}} =\frac{{\text {retrieved relevant}}}{{\text {relevant}}}= \frac{|{A\cap B}|}{|A|} \end{aligned}$$
(14)
Table 9 Recall for Query Q1

6.3.3 F-Measure metrics

F-Measure, also known as F-score, is the weighted harmonic mean used for measuring accuracy. It is used here with Precision and Recall equally weighted, and it considers both values to calculate the score. The F-Measure is calculated using Eq. (15) for the top twenty documents retrieved for Query Q1, as shown in Table 10.

$$\begin{aligned} F{\text {-Measure}} =\frac{2\times {\text {Precision}} \times {\text {Recall}}}{{\text {Precision}}+ {\text {Recall}}} \end{aligned}$$
(15)
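A minimal sketch computing Eqs. (13)–(15) for one ranked result list (our own naming; A is the relevant set and B the retrieved list, as in the equations) is:

```python
def precision_recall_f(retrieved, relevant):
    # retrieved: ranked list B returned by a search engine;
    # relevant:  set A of documents judged relevant.
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0   # Eq. (13)
    recall = hits / len(relevant) if relevant else 0.0        # Eq. (14)
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)            # Eq. (15)
    return precision, recall, f_measure
```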

Table 10 presents the F-Measure for Query Q1, calculated using Eq. (15) for the twenty URLs.

Table 10 F-Measure for Query Q1
Fig. 3 Average performance metrics

Figure 3 shows the comparison of Precision, Recall and F-Measure on the basis of the average value of the performance metrics for query Q1. It also shows that, out of the five search engines, Google’s performance metrics surpass those of the other search engines. The positions acquired by the search engines in terms of performance metrics are \(\hbox {Google}>\hbox {Excite}> \hbox {HotBot}> \hbox {AltaVista} > \hbox {Deeperweb}\).

The values of Precision, Recall and F-Measure for all the final aggregated queries, used to check the relevancy of the documents, are shown in Table 11. Table 11 also reports the average Precision and Recall values used to check the relevancy of each query given in Tables 3 and 4 under the proposed ACO scheme. Hence, it is concluded from the performance analysis that the Google and Excite search engines are the winners in producing the most relevant documents for users under the proposed ACO-based strategy.

7 Conclusion and future scope

Table 11 Average performance metrics

In this work, the ranking problem for web links has been framed by applying different rank aggregation techniques. Objective functions based on the Kendall-tau and Spearman’s footrule distance measures are chosen for optimization. An ant colony-based rank optimization strategy has been proposed to minimize the objective function value and to assign the highest rank to the web link having the lowest objective function value. The performance of the proposed ranking strategy has been compared with the standard and GA-optimized rank aggregation methods on the basis of distance values (Kendall and Spearman) and execution time for each query, as given in Tables 3 and 4, respectively. From the simulation results, it is concluded that Borda is the best among the aggregation methods considered, as it gives the minimum distance values. Finally, performance tests, namely Precision, Recall and F-Measure, are applied to evaluate the proposed strategy. The word association operators applied to the queries provide more prominent results, very close to the web lists provided by the search engines. In the future, other meta-heuristic approaches, such as particle swarm, teacher-learner, moth search and hybrid techniques, can be applied to optimize the performance of search engines and to improve metasearch techniques.