
1 Introduction

The almost ubiquitous availability of the World Wide Web has given many organizations an unprecedented opportunity to maintain an effective online presence in order to build a large user base, which would enable them to accomplish their business goals. This necessitates the design of an effective and efficient web site that is capable of serving a large number of users in the best possible manner. Such a web site should have the potential to convert visitors into users while retaining its existing users, enabling the organization to sustain itself and remain competitive in the online environment. This can be achieved by providing an effective and efficient browsing experience to its visitors. Designing such a web site poses many challenges due to the changing navigational behavior of users, since every user visits a site for different purposes and the same user may visit the web site with different goals at different times [16]. No matter how carefully or judiciously a web site is designed, it may still not adequately cater to the needs and requirements of users due to the highly unpredictable nature of their future navigational behavior. In [16], adaptive web site design is suggested as an alternative to address these issues. An adaptive web site is one that is capable of automatically improving its organization and presentation based on the access patterns exhibited by users' navigational behavior. There are two approaches to web site adaptation, namely customization based web site adaptation and transformation based web site adaptation [13]. The former mainly targets individual users: using their past browsing behavior, it speculates on their possible future moves or preferences and customizes the web page accordingly. In the latter, partial or, in some cases, almost total modification of the web site structure is carried out in order to cater to the browsing behavior of a large number of users.

This paper focuses on transformation based web site adaptation, where there is a need to extract information from the web log data, comprising user-web interaction data, to understand and analyze user navigation behavior. This extracted information can be used to improve the web site structure, which in turn results in an improvement of the overall performance of the web site. Web site structure organization, which primarily aims to provide users with easier navigation, has been an active area of research [1, 3, 7, 12, 13, 24]. In order to make navigation easier and simpler for a user, several aspects or criteria need to be taken into consideration while optimizing the structure of a web site. These may include navigation, response time, credibility, content, usability, organization, interactivity, download time, presentation, visualization, product association level, number of clicks, search depth, information overload, web site links, access patterns, cost, etc. [1, 3, 7, 13–15, 17, 20, 22, 28]. The relevance and importance of these criteria may vary across web sites. The relevant criteria need to be considered when selecting a set of web objects, from amongst a large number of web objects, to be displayed on the web site. Being a combinatorial optimization problem, this requires the simultaneous optimization of multiple objectives based on the relevant and key criteria for a given web site.
Several approaches exist in the literature [1, 3, 7, 13, 22, 24, 28] that address this problem as a multi-criteria web site optimization problem. In this paper, the multi-criteria web site optimization problem given in [1] is addressed as a tri-objective optimization problem, with the three objectives being minimization of download time, maximization of web site visualization and maximization of product association level. This tri-objective optimization problem is addressed using the vector evaluated genetic algorithm (VEGA) [19], which is a multi-objective genetic algorithm.

This paper is organized as follows: Sect. 2 briefly discusses web site optimization followed by the multi-criteria web site optimization (MCWSO) problem in Sect. 3. MCWSO using VEGA is discussed in Sect. 4 followed by an example based on it in Sect. 5. Section 6 discusses experimental results. The conclusion is given in Sect. 7.

2 Website Optimization

The galloping growth of the World Wide Web has eased our daily life by offering almost all sorts of services; on the other hand, it has posed major challenges to web designers. The biggest challenge for them is to provide a website that caters to a large number of users having varied interests and dynamic navigation behavior. The goal is to provide users with a smooth and satisfying browsing experience on the web. Many attempts have been made in this direction by using the knowledge stored in the weblog data, and by considering various criteria and constraints [1, 7, 12–15, 23, 25, 28], with the aim of improving the overall website structure and simplifying access to the web. In [7], user access pattern based website reorganization has been proposed, which evolves the site into an adaptive web site that caters efficiently to users' requirements despite their changing navigation behavior. This approach comprises three main steps, namely pre-processing, page classification and site reorganization. In [28], an approach for website link structure improvement based on user visiting patterns has been discussed, which optimizes and re-evaluates the link structure in order to improve the average connectivity. While improving the website structure, several criteria are considered, such as download time, product association level and visualization score [1].

In [27], design features for website optimization, based on empirical studies of users' perception, have been discussed. These design features include easier navigation, up-to-date information, visual design, multimedia, site responsiveness, search tool, comprehensiveness of information, security of information, accuracy of data, etc. Among these features, easier navigation is considered one of the most important for an effective website design. Low download time and high website visualization help in attracting users' attention. Up-to-date information makes the content more interesting and reliable. More comprehensive information makes users stay longer and eases navigation. Availability of a search tool facilitates quick search for the desired items on a complex website. A quickly responsive site can easily engage a visitor and make navigation smooth. Comprehensibility of information and accuracy of data help increase interest and trust in the website and may result in the visitor revisiting the site.

While optimizing the website design, there are many constraints that need to be considered in practice. These may include maximum out-degree of a page, maximum depth of the web site, overall site connectivity, basic links maintenance, page classification, compatible links and security [7, 13, 22, 25, 26]. Fixing the maximum out-degree, i.e. the number of links from a web page, helps reduce information overload. Keeping the website depth to a minimum helps a user find the desired information in fewer clicks, resulting in user satisfaction. Overall site connectivity constraints entail the provisioning of a path from the home page to the rest of the pages of the website. The basic link structure should remain intact after the web site structure optimization.

Approaches applied to achieve an optimal website are mostly based either on mathematical programming models [3, 12, 22, 24] or on metaheuristic based approaches [1, 10, 13, 18, 23, 26]. In [24], web site optimization has been considered as a structural optimization problem, and a framework has been proposed that maps the website design problem onto a graph model, to which mathematical optimization is applied to arrive at an optimal web design. In [22], a mathematical programming model has been proposed for improved user navigation efficiency, catering to both experienced and new users. With the intent of assisting existing users and site designers, a structure stability constraint has been discussed in this approach. In [3], a mathematical programming model has been proposed for website structure enhancement, while minimizing the cost of user disorientation caused by complete reorganization of the website. This approach advocates structure reorganization by making minimal changes to the existing structure. In [12], a 0-1 programming model has been proposed for website reorganization based on information overload and search depth. In order to reduce the computation time, a two-stage heuristic has been discussed in this paper. This approach faces scalability issues for websites having a large number of links.

The ability of metaheuristic based methods to provide good solutions in reasonable amounts of time makes them suitable for addressing website structure optimization problems. In [10], a simulated annealing based approach for website structure optimization, considering webpage relinking, has been proposed. This approach utilizes combined user preference data to improve the website structure for easier navigation. In [13], an ant colony optimization based website structure reorganization using two criteria, namely information overload and search depth, has been proposed for effective and easier navigation. This approach has been shown to take less computation time for relatively small synthetic and real websites. In [23], a mathematical model has been proposed for e-commerce websites, with the objective of minimizing the weighted distance between web pages. A statistical Hopfield neural network and a strategic oscillation based tabu search algorithm were applied to optimize the link structure of the web pages. In [18], a quadratic assignment problem based method has been applied to improve the link structure and navigation of the website. The degree of connectivity and the distance between web pages have been used in this approach, and an ant colony metaheuristic based technique is applied to solve this optimization problem. In [26], an enhanced tabu search based technique for website structure optimization, using web usage data, has been proposed. This approach allows cyclic paths and derives the bi-directional association between two pages. It also discusses the notion of compatible link constraints, which maintain the business logic by preventing links between designated pages. In [1], a multi-criteria website optimization (MCWSO) approach comprising three criteria, namely download time, visualization score and product association level, has been proposed. This paper addresses this MCWSO problem, which is discussed next.

3 The MCWSO Problem

In order to achieve an effective and efficient web site that has the potential to convert visitors into users, there is a need to consider multiple relevant and key criteria to select an optimum sequence of distinct web objects for a web site, with the aim of improving the browsing experience of the user. In [1], three such criteria, namely product association level, download time and visualization, have been considered for web site optimization. For an effective and efficient web site, the primary aims would be to minimize the download time (efficient access to web items), maximize the web site visualization (improve the look and feel of the web site) and maximize the product association level (increase the overall sale of items). Considering these three criteria, this problem has been formulated as a single objective optimization problem, as given below:

$$ F = DT \cdot w_1 + (1 - VS) \cdot w_2 + (1 - PS) \cdot w_3 $$

where

$$ DT = \frac{\sum_{k=1}^{M} D(k)}{\left(\max_{k=1}^{N} D(k)\right) \cdot M}, \quad VS = \frac{\sum_{k=1}^{M} V(k)}{\left(\max_{k=1}^{N} V(k)\right) \cdot M} \quad \text{and} \quad PS = \frac{\sum_{k=1}^{M} P(k)}{\left(\max_{k=1}^{N} P(k)\right) \cdot M} $$

where DT is the download time, VS is the visualization score, PS is the probability of selling an object after accessing the previous object, w1, w2 and w3 are the weights assigned to the three criteria, D(k) is the download time of the kth web-object in the sequence, V(k) is the visualization score of the kth web-object, P(k) is the probability that the product or service represented by the kth web-object will be sold if it follows the (k−1)th web-object in the sequence, N is the total number of web-objects and M is the number of web-objects appearing in the sequence under consideration, i.e. a sequence contains any M web-objects out of the given N objects. The objective is to minimize F.
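To make these definitions concrete, the following is a minimal Java sketch (illustrative only, not the implementation from [1]) of the three normalized measures and the aggregate fitness F. The names WosObjectives, d, v and p are assumptions introduced here: d[i] and v[i] hold the download time and visualization score of web-object i, p[i][j] holds the probability of selling object j immediately after object i, and a sequence is an array of M distinct object indices drawn from the N objects.

```java
// Minimal sketch of the objective computations above (illustrative, not the
// authors' code). d[i], v[i]: download time and visualization score of object i;
// p[i][j]: probability of selling object j right after object i (assumed layout).
final class WosObjectives {

    static double dt(int[] seq, double[] d) {
        double max = 0.0, sum = 0.0;
        for (double x : d) max = Math.max(max, x);   // max over all N objects
        for (int k : seq) sum += d[k];               // sum over the M objects in the sequence
        return sum / (max * seq.length);             // normalized download time (to be minimized)
    }

    static double vs(int[] seq, double[] v) {
        double max = 0.0, sum = 0.0;
        for (double x : v) max = Math.max(max, x);
        for (int k : seq) sum += v[k];
        return sum / (max * seq.length);             // normalized visualization score (to be maximized)
    }

    static double ps(int[] seq, double[][] p) {
        double max = 0.0, sum = 0.0;
        for (double[] row : p)
            for (double x : row) max = Math.max(max, x);
        // The first object has no predecessor, so the sum runs over k = 2..M here.
        for (int k = 1; k < seq.length; k++) sum += p[seq[k - 1]][seq[k]];
        return sum / (max * seq.length);             // normalized product association level (to be maximized)
    }

    // Aggregate weighted fitness F of the single-objective formulation in [1].
    static double f(int[] seq, double[] d, double[] v, double[][] p,
                    double w1, double w2, double w3) {
        return dt(seq, d) * w1 + (1 - vs(seq, v)) * w2 + (1 - ps(seq, p)) * w3;
    }
}
```

With the data of Fig. 2a loaded into d, v and p, calling f with w1 = w2 = w3 = 1 corresponds to the fitness F used for comparison in Sect. 6.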

The above single objective optimization problem has been addressed using a genetic algorithm in [1]. The choice of weights, which requires profound knowledge of the domain and user preferences, is the major limitation of this aggregate weighted sum approach [21]. Moreover, addressing a tri-objective optimization problem as a single objective optimization problem does not appropriately reflect the real-life web site optimization problem. In this paper, an attempt has been made to address this problem as a tri-objective optimization problem with the three objectives being Minimize Download Time, Maximize Web Site Visualization and Maximize Product Association Level, i.e.

$$ \text{Minimize}\; DT = \frac{\sum_{k=1}^{M} D(k)}{\left(\max_{k=1}^{N} D(k)\right) \cdot M}, \quad \text{Maximize}\; VS = \frac{\sum_{k=1}^{M} V(k)}{\left(\max_{k=1}^{N} V(k)\right) \cdot M} \quad \text{and} \quad \text{Maximize}\; PS = \frac{\sum_{k=1}^{M} P(k)}{\left(\max_{k=1}^{N} P(k)\right) \cdot M} $$

The objective is to select a sequence of distinct web objects, from amongst all possible web objects, based on the simultaneous optimization of the above three objectives. The inherent nature of this problem makes it a combinatorial optimization problem, and metaheuristic techniques have been widely used for solving such problems. In this paper, the multi-objective genetic algorithm VEGA is used to solve this multi-criteria web site optimization problem. The use of VEGA addresses the major limitation of the aggregate weighted sum approach used in [1], i.e. the choice of weights, since in VEGA no weights need to be associated with the objectives. MCWSO using VEGA is discussed next.

4 MCWSO Using VEGA

The MCWSO problem [1], briefly discussed above, requires the simultaneous optimization of three objectives, namely Minimize Download Time, Maximize Web Site Visualization and Maximize Product Association Level, for selecting web object sequences (wos), comprising web objects, that would result in the sale of products. For this, the multi-objective genetic algorithm VEGA is used, which is discussed next.

4.1 VEGA

VEGA is considered to be the first multi-objective genetic algorithm [4, 6]. Instead of the scalar objective function used in the aggregated weighted sum approach, VEGA evaluates an objective vector, with each element of the vector representing one objective function [4, 6]. VEGA is a simple extension of the GA that modifies the selection operator of the GA to solve multi-objective optimization problems.

In VEGA [4, 6, 19], for a problem comprising M objectives, the population of N individuals (I1, I2, …, IN) is divided into M sub-populations (SP1, SP2, …, SPM), each of size N/M. Each sub-population considers only one of the M objectives for computing the fitness of the individuals in it. Thereafter, the proportionate selection operator is applied to select individuals in each sub-population to generate a corresponding selected sub-population (SSP1, SSP2, …, SSPM) of size N/M. These M selected sub-populations, comprising individuals for mating, are shuffled together to obtain a population of mating individuals (MI1, MI2, …, MIN) of size N. This is followed by applying the crossover and mutation operators, as carried out in the traditional GA, on the population of mating individuals to produce a population of N individuals (I1, I2, …, IN) for the next generation. VEGA has been used to solve the multi-criteria web site optimization problem discussed above. The proposed VEGA based MCWSO algorithm (MCWSOVEGA) is discussed next.
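The selection step just described can be illustrated with the short Java sketch below. It is a sketch under assumptions, not a reference VEGA implementation: the Objective interface, the requirement of nonnegative fitness values (a minimization objective such as DT must first be converted, e.g. to 1 − DT) and a population size divisible by the number of objectives are choices made here for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class VegaSelection {

    /** One fitness evaluator per criterion; higher values are treated as fitter. */
    interface Objective { double value(int[] individual); }

    static List<int[]> select(List<int[]> population, List<Objective> objectives, Random rnd) {
        int n = population.size(), m = objectives.size();
        int slice = n / m;                                    // assumes n is divisible by m
        List<int[]> matingPool = new ArrayList<>(n);
        for (int s = 0; s < m; s++) {
            List<int[]> sub = population.subList(s * slice, (s + 1) * slice);
            double[] fit = new double[slice];
            double total = 0.0;
            for (int i = 0; i < slice; i++) {                 // fitness of SP_s under its single objective
                fit[i] = objectives.get(s).value(sub.get(i)); // must be nonnegative for roulette-wheel use
                total += fit[i];
            }
            for (int i = 0; i < slice; i++) {                 // proportionate (roulette-wheel) selection
                double r = rnd.nextDouble() * total, acc = 0.0;
                int pick = slice - 1;
                for (int j = 0; j < slice; j++) {
                    acc += fit[j];
                    if (acc >= r) { pick = j; break; }
                }
                matingPool.add(sub.get(pick).clone());
            }
        }
        Collections.shuffle(matingPool, rnd);                 // mix the M selected sub-populations
        return matingPool;                                    // crossover and mutation follow as in a plain GA
    }
}
```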

4.2 MCWSOVEGA

In order to address the MCWSO problem discussed above, there is a need to select optimal sequences of distinct web objects with respect to the criteria of download time, web site visualization and product association level. The proposed VEGA based MCWSO algorithm (MCWSOVEGA) attempts to generate such web object sequences. Algorithm MCWSOVEGA is given in Fig. 1.

Fig. 1

Algorithm MCWSOVEGA

As per algorithm MCWSOVEGA, a population of web object sequences Pwos is initially generated at random. Each wos comprises M distinct web objects, each randomly chosen from amongst all N web objects. Since three criteria are used in the web site optimization problem under consideration, i.e. |C| = 3, the population Pwos is divided into |C| sub-populations SPwos, each of size |Pwos|/|C|. Each sub-population SPwos uses only one of the criteria or objectives for evaluating the wos in it. This is followed by applying proportionate selection [9], based on the computed objective values of the wos in each sub-population, to generate the corresponding mating sub-population of wos MSPwos. Thereafter, crossover and mutation are applied to produce a new population Pwos of wos for the next generation. One of the major limitations of VEGA is the issue of speciation [8], i.e. it emphasizes individuals that are good for individual objective functions. As a result, it is unlikely to find trade-off solutions that take all the objectives into consideration. In MCWSOVEGA, an attempt has been made to address this problem by considering one wos from each sub-population and performing crossover between them to produce an offspring wos. Since there is a need to generate a sequence of distinct web objects, single point order crossover [5] is used. Next, the offspring wos produced after the crossover undergoes mutation. Again, since the web objects in a wos are required to be distinct, swap mutation [11] is used.
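The two operators named above can be sketched as follows, assuming each wos is stored as an array of distinct web-object indices; the class and method names (WosOperators, orderCrossover, swapMutation) are hypothetical.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class WosOperators {

    // Single point order crossover: keep parent-1's prefix up to a random cut point,
    // then append parent-2's objects in their original order, skipping any object
    // already present, so the child again contains M distinct objects.
    static int[] orderCrossover(int[] p1, int[] p2, Random rnd) {
        int m = p1.length;                                   // assumes both parents have length m >= 2
        int cut = 1 + rnd.nextInt(m - 1);                    // cut point in [1, m-1]
        int[] child = new int[m];
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < cut; i++) { child[i] = p1[i]; used.add(p1[i]); }
        int pos = cut;
        // p2 holds m distinct objects, of which at most cut duplicate the prefix,
        // so the remaining m - cut positions are always filled.
        for (int x : p2) if (pos < m && used.add(x)) child[pos++] = x;
        return child;
    }

    // Swap mutation: with probability pm, exchange two randomly chosen positions,
    // which preserves the distinctness of the objects in the sequence.
    static void swapMutation(int[] wos, double pm, Random rnd) {
        if (rnd.nextDouble() >= pm) return;
        int i = rnd.nextInt(wos.length), j = rnd.nextInt(wos.length);
        int tmp = wos[i]; wos[i] = wos[j]; wos[j] = tmp;
    }
}
```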

After crossover and mutation, the new population of wos Pwos is produced for the next generation. The algorithm runs till the stopping condition is met. The stopping condition can be a pre-specified number of generations for which the algorithm runs, or a stagnation condition, i.e. there is not much improvement in the solutions for a pre-specified number of generations. Thereafter, algorithm MCWSOVEGA produces the Top-K web object sequences WOSTop-K as output.
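Putting the pieces together, the following sketch of the overall loop reuses the hypothetical helper classes from the earlier sketches (WosObjectives, VegaSelection, WosOperators). The pairing of parents taken consecutively from the shuffled mating pool, the fixed-generation stopping condition and the ranking of the final Top-K wos by the aggregate F with unit weights are simplifications made here for illustration; the algorithm description above does not fix these details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class McwsoVegaDriver {

    // One random wos: M distinct object indices drawn from the N available objects.
    static int[] randomWos(int n, int m, Random rnd) {
        int[] all = new int[n];
        for (int i = 0; i < n; i++) all[i] = i;
        for (int i = n - 1; i > 0; i--) {                    // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            int t = all[i]; all[i] = all[j]; all[j] = t;
        }
        return Arrays.copyOf(all, m);
    }

    static List<int[]> run(double[] d, double[] v, double[][] p, int popSize, int m,
                           int generations, int k, double pc, double pm, Random rnd) {
        List<int[]> pop = new ArrayList<>();
        for (int i = 0; i < popSize; i++) pop.add(randomWos(d.length, m, rnd));

        // One objective per sub-population; DT is mapped to 1 - DT so that a higher
        // value is better for all three criteria during proportionate selection.
        List<VegaSelection.Objective> objectives = Arrays.asList(
                wos -> 1.0 - WosObjectives.dt(wos, d),
                wos -> WosObjectives.vs(wos, v),
                wos -> WosObjectives.ps(wos, p));

        for (int g = 0; g < generations; g++) {              // fixed-generation stopping condition
            List<int[]> mates = VegaSelection.select(pop, objectives, rnd);
            List<int[]> next = new ArrayList<>(popSize);
            for (int i = 0; i < popSize; i++) {
                // Simplified pairing of consecutive mates from the shuffled pool;
                // MCWSOVEGA's cross-sub-population pairing is not reproduced here.
                int[] p1 = mates.get(i), p2 = mates.get((i + 1) % popSize);
                int[] child = rnd.nextDouble() < pc
                        ? WosOperators.orderCrossover(p1, p2, rnd)
                        : p1.clone();
                WosOperators.swapMutation(child, pm, rnd);
                next.add(child);
            }
            pop = next;
        }
        // Rank the final population by the aggregate F with unit weights purely for
        // reporting the Top-K wos; this choice mirrors the comparison in Sect. 6.
        pop.sort(Comparator.comparingDouble((int[] w) -> WosObjectives.f(w, d, v, p, 1, 1, 1)));
        return pop.subList(0, Math.min(k, pop.size()));
    }
}
```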

An example illustrating the use of MCWSOVEGA to generate web object sequences, based on the minimization of download time, maximization of web site visualization and maximization of product association levels, is discussed next.

5 An Example

Suppose there are ten items to be sold through the website. Each of these items is represented by a web-object (A, B, C, D, E, F, G, H, I, J). The download time (D), visualization score (V) and probability Pij (of buying the jth item just after visiting the ith item) for each of these web-objects are given in Fig. 2a. First, the population of web object sequences Pwos is randomly generated and is shown in Fig. 2b. Next, the population Pwos is divided into three sub-populations SPwos-1, SPwos-2 and SPwos-3, each containing four web object sequences. These are given in Fig. 2c.

Fig. 2

a D, V and Pij of web objects. b Sub-population of wos SPwos-1, SPwos-2 and SPwos-3

Next, the fitness values of the wos in each of the sub-populations are computed. For the wos in SPwos-1, SPwos-2 and SPwos-3, the fitness is computed using DT, VS and PS respectively, as shown in Fig. 3. Next, proportionate roulette wheel selection is applied to each of the sub-populations SPwos-1, SPwos-2 and SPwos-3 to select the corresponding mating sub-populations of wos MSPwos-1, MSPwos-2 and MSPwos-3, given in Fig. 4a. Single point order crossover, as discussed before, with crossover probability Pc = 0.75 is performed between wos, one from each of the mating sub-populations MSPwos-1, MSPwos-2 and MSPwos-3, to generate offspring wos. This is followed by performing swap mutation with mutation probability Pm = 0.05. The new population of wos Pwos produced after crossover and mutation is given in Fig. 4b. The above process is repeated for a pre-specified number of generations, after which the Top-K wos are produced as output.

Fig. 3

DT, VS and PS of wos in SPwos-1, SPwos-2 and SPwos-3, respectively

Fig. 4

a Mating sub-populations of wos MSPwos-1, MSPwos-2 and MSPwos-3. b Population of wos Pwos for the next generation

In order to compare the quality of wos produced by MCWSOVEGA and the GA based MCWSO algorithm MCWSOGA [1], experiments were performed. The results and observations based on these experiments are discussed next.

6 Experimental Results

Algorithms MCWSOVEGA and MCWSOGA were implemented using JDK 1.6 in a Windows 7 environment. These algorithms were compared by conducting experiments on an Intel based 2.13 GHz PC having 4 GB RAM. First, in order to ascertain the performance of MCWSOVEGA and MCWSOGA for different crossover rates (CR) and mutation rates (MR), graphs showing the average value of fitness F, for weights w1 = 1, w2 = 1 and w3 = 1, against the number of generations for displaying 5, 10, 15 and 20 web objects were plotted. These graphs for MCWSOVEGA (VEGA(CR, MR)) and MCWSOGA (GA(CR, MR)), where CR = {0.6, 0.8} and MR = {0.05, 0.1}, for displaying 5, 10, 15 and 20 web objects, are shown in Fig. 5. It can be inferred from these graphs that, for every CR and MR, MCWSOVEGA converges to a lower average fitness value F of the Top-10 wos than MCWSOGA. Further, MCWSOVEGA performs best for CR = 0.6 and MR = 0.1, while MCWSOGA performs best for CR = 0.8 and MR = 0.05.

Fig. 5

MCWSOVEGA versus MCWSOGA (Average F Vs. Generations of Top-10 wos—5, 10, 15 and 20 Web Objects)

Next, graphs comparing the average fitness F of the Top-K wos generated by MCWSOVEGA and MCWSOGA, for the best observed crossover and mutation rates {CR = 0.6, MR = 0.1} and {CR = 0.8, MR = 0.05} respectively, for 5, 10, 15 and 20 web objects were plotted and are shown in Fig. 6. It can be observed from each of these graphs that the Top-K wos generated by MCWSOVEGA have a lower average fitness F than those generated by MCWSOGA. This may be attributed to the fact that MCWSOVEGA simultaneously optimizes the three objectives and produces offspring resulting from crossover performed between wos, one from each of the sub-populations concerned with the three objectives.

Fig. 6

MCWSOVEGA versus MCWSOGA (Average F Vs. Top-K wos after 500 Generations—5, 10, 15 and 20 Web Objects)

7 Conclusion

A VEGA based MCWSO algorithm, MCWSOVEGA, has been proposed in this paper that generates optimal web object sequences for a given web site based on three criteria, i.e. the download time, the visualization and the product association level of web objects. Unlike MCWSOGA, MCWSOVEGA does not require the assignment of weights to these three criteria. MCWSOVEGA selects web object sequences, from amongst all possible web object sequences, based on the simultaneous optimization of three objectives, namely the minimization of download time, the maximization of web site visualization and the maximization of product association levels. Further, the crossover operation has been adapted to ensure that the population produced after every generation comprises web object sequences that reflect the three objectives. As a result, MCWSOVEGA is able to generate web object sequences that have an acceptable trade-off between the three objectives. Experimental results show that MCWSOVEGA performs better than MCWSOGA, generating comparatively better web object sequences for a given web site. This would improve the browsing experience of the user and motivate the user to buy products online.