
10.1 Introduction

There are currently several ways of measuring the success of e-commerce sites. Many companies specialize in analyzing e-commerce and each of them uses different methods and techniques to solve the same problem: Is this Website successful? Why or why not? How do you make it successful?

In this chapter a case study is presented in which the effectiveness of an e-commerce site is studied using Web server log data (Web access logs) and commercial data (sales figures). The Web data and sales data are combined to determine the Website’s ability to achieve the goal for which it was created: to sell. For this we use clustering techniques and site success metrics adapted from the work of Spiliopoulou and Pohle [10]. The final goal is to provide the company under analysis with objective information that can be used to improve the site, raise marketers’ awareness of these factors and increase sales. This process combines technical Web data analysis with marketing analysis.

First, theoretical concepts such as e-loyalty and how to measure the success of a site are presented. This is followed by an introduction to the problem and to the company behind this case study. We then analyze how the buying process on the company’s Website is organized and describe how the Web server log data are pre-processed and stored in a data warehouse. The Web server log results are measured, the sales results are evaluated and a clustering analysis is performed. The Website’s success is then measured for each of the clusters discovered. Finally, the main conclusions of this case study are presented along with proposals for future work.

10.2 Electronic Commerce on the Web

Electronic commerce may be defined as the process of buying, selling, or exchanging products, services, or information via computer networks [13]. E-commerce has been changing the world: how people interact and schedule their time, how companies reorganize their selling processes and human resources, and how governments relate to citizens, to companies and to other countries. The economy, markets, society, the labor market and industry have all been, and still are being, shaken by e-commerce.

10.2.1 Electronic Commerce from a Marketing Perspective

“The ability to track a user’s browsing behavior down to the individual mouse clicks has brought the vendor and end customer closer than ever before. It is now possible for a vendor to personalize their product message for individual customers on a massive scale; a phenomenon that is being referred to as mass customization” [11].

Electronic commerce can be a huge source of income. Analyzing the data generated by Web usage on e-commerce sites provides important added value. Web usage mining [4, 10, 11] can help optimize a company’s Websites, meet the needs of users/customers and accomplish the aims of the sites’ owners. Marketers are, above all, concerned with the return on investment. For Travis [12], approaching the World Wide Web in the right way brings considerable advantages: “There are four key benefits from a customer-centred approach: higher revenues, loyal customers, improved brand volume and process improvement.”

10.2.2 The Customer and E-Loyalty

Understanding consumer behavior on the World Wide Web is essential for the success of a business. With this knowledge, marketers will be able to respond to their consumers’ needs on time. After all, competitors are just one click away. Moreover, customers’ tolerance of inconsistency and mediocrity is rapidly disappearing [8].

Turban et al. [13] define e-loyalty as the customer’s loyalty to an entity that sells online, be it apparel, music, books or any kind of service. Customer acquisition and retention are critical success factors in e-commerce. Liao et al. [6] warn that the maturity and low cost of the technology in this business lower the entry threshold for new competitors. Furthermore, the transparency of information makes the business model easy for competitors to mimic.

After acquiring a new customer, how can we gain his/her loyalty? “To gain the loyalty of customers, you must first gain their trust” [8]. According to Liao et al. [6], in order to maintain customers’ trust, you have to constantly improve the usability of your Website. The Website is the only way the customer has of getting to know the supplier. Therefore, the site’s usability is extremely important when the customer decides whether or not to trust the supplier.

The basis for loyalty is not technological; loyalty is based on old-fashioned customer service basics like: quality customer support, on-time delivery, compelling product presentations, convenient and reasonably priced shipping and handling and clear and trustworthy privacy policies. What is actually changing is the rhythm at which economies are played out and the need for speed in improving products and services [8]. Companies must constantly deliver a total customer experience to their customers. Seybold et al. [9] define a total customer experience as: “A consistent representation and flawless execution, across distribution channels and interaction touchpoints, of the emotional connection and relationship you want your customers to have with your brand.”

When customers identify themselves with a brand it is more likely that they become and remain loyal to it. Every time they come into contact with the brand, if they have a positive experience they are reinforcing that loyalty [9].

A company must define what kind of customers it wants to attract, because the site design strongly influences which customers it attracts. When deciding which customers to attract and which to avoid, the company must be aware of the different categories of online customers. Some types of customers are loyalty-oriented, while others flit, like butterflies, from site to site seeking bargains [8].

For companies that use both channels for business (the traditional one and the Web), it is important to balance the two in terms of human resources. A company should not view the Web channel merely as a way to reduce costs by bypassing its commissioned sales force. Reichheld and Schefter [8] give the example of a successful company that seamlessly integrates its Web channel with its traditional channel. This company pays sales commissions regardless of the channel through which a sale is made, so that its sales representatives direct customers to the most convenient channel. Windham and Orton [14] conclude that if Web retailers wish to establish Web brand loyalty and remain competitive, they must provide the components that create a good consumer experience.

10.3 How to Measure the Success of a Site?

“For a business deploying a personalization system, accuracy of the system will be little solace if it does not translate into an increase in quantitative business metrics such as profits or qualitative metrics such as customer loyalty” [1].

Website usage has long been monitored, in one way or another, by examining Web server logs. A large number of success measures have been tried and developed since the first Websites were created, and many programs are available for this purpose. Berthon et al. [2] modeled the flow of surfer activity on a Website as a six-stage process, with five indexes measuring the transitions between stages: the awareness efficiency, the locatability/attractability efficiency, the contact efficiency, the conversion efficiency and the retention efficiency. In 2001, Spiliopoulou and Pohle [10] used two of Berthon’s measures of a site’s success: the contact efficiency and the conversion efficiency. The first measure assesses how effectively the organization transforms Website hits into visits, that is, into users who spent at least a user-defined minimum amount of time exploring the site. The second measures the ability to turn visitors into purchasers, as the ratio of users who, after exploring the site, also made a purchase. In this study, Spiliopoulou and Pohle [10] defined their goal as being not only to measure but also to improve a Website’s success. Mafe and Navarre [7] went further by defining an online consumer typology, obtained by segmenting consumers according to their Web behavior and Web purchases. Their study aims to make a Website’s marketing actions more profitable and to obtain a competitive advantage.
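The two measures used by Spiliopoulou and Pohle can be illustrated with a short Python sketch. It assumes, for illustration only, that every recorded visitor counts as one hit and that each visitor has been summarized in a hypothetical `Visitor` record holding the time spent on the site and whether a purchase was made; `min_seconds` stands for the user-defined minimum exploration time.

```python
from dataclasses import dataclass


@dataclass
class Visitor:
    """Hypothetical per-visitor summary derived from the access log."""
    seconds_on_site: float
    purchased: bool


def active_visitors(visitors: list[Visitor], min_seconds: float) -> list[Visitor]:
    """Visitors who explored the site for at least the minimum time."""
    return [v for v in visitors if v.seconds_on_site >= min_seconds]


def funnel_contact_efficiency(visitors: list[Visitor], min_seconds: float) -> float:
    """Share of hits (assumed here: all visitors) that turn into actual visits."""
    if not visitors:
        return 0.0
    return len(active_visitors(visitors, min_seconds)) / len(visitors)


def funnel_conversion_efficiency(visitors: list[Visitor], min_seconds: float) -> float:
    """Share of active visitors who, after exploring the site, also purchased."""
    active = active_visitors(visitors, min_seconds)
    if not active:
        return 0.0
    return sum(v.purchased for v in active) / len(active)


visitors = [Visitor(5, False), Visitor(120, False), Visitor(300, True), Visitor(40, True)]
print(funnel_contact_efficiency(visitors, 30))     # 3 of 4 visitors stay >= 30 s
print(funnel_conversion_efficiency(visitors, 30))  # 2 of those 3 purchase
```

The threshold is deliberately a parameter: the choice of minimum time directly changes which hits count as visits, and therefore both ratios.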

10.4 The B2B Portal Case

Starting from the measures mentioned above, the online activity of the e-commerce site owned by INTROduxi, a Portuguese company (www.introduxi.pt), was analyzed with the aim of identifying its strengths and weaknesses, taking into account its customers’ profiles. The core business of this Website is selling hardware and software products to retailers; it is a business-to-business Website. To become a registered user of this Website, the client must be a company whose activity is related to selling computing material or providing assistance in this area. The company sells through the online channel as well as through the call centre located at its headquarters (the latter will be called the “classic channel”). As a consequence, the Website not only offers electronic commerce but also allows registered users to check which products exist and whether they are available, and then use the call centre. In fact, for some products the price is not shown even after they are added to the shopping cart; in these situations the user is invited to contact their account manager at the call centre.

INTROduxi was founded in 1995 and is now one of the main players in the Portuguese computer market. The company aims to make its Website more successful with respect to both direct sales (selling through the Web channel) and indirect sales (using the Web channel to promote traditional sales). Although the activity of the site can be analyzed using Web analytics tools such as Google Analytics [5], such tools have limitations that the study presented in this chapter aims to overcome; for example, we present a cross analysis of Web accesses and offline sales.

The buying process, which leads the user to the site’s goal by confirming and concluding an order, is the focus of this analysis. On INTROduxi’s Website it takes five steps (Fig. 10.1), from entering the site (step 1) to actually ordering the product (step 5). There are two turning points in this process that deserve attention. The first is when the user enters an action page, that is, a page indicating that the user is potentially interested in purchasing (step 2). The second is when the user accesses a target page, the page that corresponds to an actual purchase (step 5). Note that step 2 can be skipped when a product is added directly to the shopping cart: after selecting the kind of product of interest (step 1), the user immediately adds it to the shopping cart (step 3) instead of taking a closer look at the specific product (step 2).
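The two paths through the buying process can be told apart mechanically once each request in a session has been mapped to its step. The Python sketch below assumes that this mapping has already been done (the URI rules that would implement it are site-specific and not given here) and only checks which of the two turning points, step 2 and step 5, a session reaches; the category labels are illustrative.

```python
ACTION_STEP = 2  # action page: the user shows potential interest in buying
TARGET_STEP = 5  # target page: the order is actually placed


def classify_session(steps: list[int]) -> str:
    """Classify a session from the ordered buying-process steps (1-5) it visited."""
    saw_action = ACTION_STEP in steps
    reached_target = TARGET_STEP in steps
    if reached_target and saw_action:
        return "purchase via action page"
    if reached_target:
        return "purchase with action page skipped"
    if saw_action:
        return "interest shown, no purchase"
    return "no buying intent shown"


print(classify_session([1, 2, 3, 4, 5]))  # full path through the product page
print(classify_session([1, 3, 4, 5]))     # product added to the cart directly
```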

Fig. 10.1 The two different ways of completing a buying process on the INTROduxi Website

Table 10.1 Success measures

The site’s success measures are now defined more precisely: the contact efficiency of an action page A, the relative contact efficiency of an action page A and the conversion efficiency of a page P with respect to a target page T. The definitions presented in Table 10.1 are adapted from Spiliopoulou and Pohle [10].

The pre-processed Web server log data are stored in a data warehouse and the measures are implemented using SQL queries as described in the next section.
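Because Table 10.1 is not reproduced in this text, the following Python sketch is only one plausible reading of the three measures, assuming the session-based formulation of Spiliopoulou and Pohle [10]: contact efficiency as the share of all sessions containing action page A; relative contact efficiency as the share of active sessions containing A, where a session is assumed to be active if it contains at least one action page; and conversion efficiency of P as the share of sessions containing P in which target page T is reached after P. The page identifiers are hypothetical strings.

```python
Session = list[str]  # ordered page identifiers requested in one session


def contact_efficiency(sessions: list[Session], action: str) -> float:
    """Fraction of all sessions that contain the action page `action`."""
    if not sessions:
        return 0.0
    return sum(action in s for s in sessions) / len(sessions)


def relative_contact_efficiency(sessions: list[Session], action: str,
                                action_pages: set[str]) -> float:
    """Fraction of active sessions (assumed: sessions with at least one
    action page) that contain `action`."""
    active = [s for s in sessions if any(p in action_pages for p in s)]
    if not active:
        return 0.0
    return sum(action in s for s in active) / len(active)


def conversion_efficiency(sessions: list[Session], page: str, target: str) -> float:
    """Fraction of sessions containing `page` in which `target` is requested
    after the first occurrence of `page`."""
    with_page = [s for s in sessions if page in s]
    if not with_page:
        return 0.0
    converted = sum(target in s[s.index(page) + 1:] for s in with_page)
    return converted / len(with_page)


sessions = [
    ["home", "laptops", "cart", "order"],
    ["home"],
    ["home", "printers"],
    ["laptops", "home"],
]
actions = {"laptops", "printers"}
print(contact_efficiency(sessions, "laptops"))                    # 2 of 4 sessions
print(relative_contact_efficiency(sessions, "laptops", actions))  # 2 of 3 active sessions
print(conversion_efficiency(sessions, "laptops", "order"))        # 1 of 2 sessions
```

The example also shows why the relative measure is larger: the session that never touches an action page is excluded from its denominator.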

10.4.1 Pre-processing and Storage of Web Server Log Data

The ETL (extraction–transformation–loading) process and the data warehouse proposed in [3] were used to pre-process and store the Web server log data. The general architecture of this data warehouse is presented in Fig. 10.2. In [3] the architecture was designed to be as widely applicable as possible; for this case study it was adapted to our needs.

Fig. 10.2 Architecture of the data warehouse

Only the site-independent part of our data warehouse is used here. We exploit usage data extracted from Web access logs covering one month of activity. The data warehouse follows a star schema, with central fact tables connected to multiple dimension tables (Fig. 10.3). The characters “#” and “*” indicate that a field is, respectively, a primary key or a foreign key of the table. The tables Parameter Page and Parameter Referer store the name and value of the parameters of a URI. These are neither fact nor dimension tables; they are simply normalized relational tables that make it easier to use the parameters of a URI.
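As an illustration of what the Parameter Page table holds, the Python sketch below splits the query string of a requested URI into name/value rows. The row layout (a request identifier plus name and value) and the example URI are assumptions, since the exact columns of Fig. 10.3 are not reproduced here; the parsing itself uses the standard `urllib.parse` module.

```python
from urllib.parse import parse_qsl, urlsplit


def uri_parameters(request_id: int, uri: str) -> list[tuple[int, str, str]]:
    """Return one (request_id, name, value) row per query-string parameter."""
    query = urlsplit(uri).query
    # keep_blank_values keeps parameters such as "page=" that have no value
    pairs = parse_qsl(query, keep_blank_values=True)
    return [(request_id, name, value) for name, value in pairs]


print(uri_parameters(7, "/loja/produto.asp?familia=portateis&id=123"))
# [(7, 'familia', 'portateis'), (7, 'id', '123')]
```

Storing parameters as rows rather than as one long string is what lets a query select, for example, all requests for a given product family with a simple equality test.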

Fig. 10.3 Star schema of the data warehouse

The Fact Table Usage is filled with data about accesses/requests to pages of the Website. The ETL process is described in Fig. 10.4 and it is implemented as a composition of different existing tools.

Fig. 10.4 The process of extracting, transforming and loading (ETL) the Web data into the data warehouse

In the extraction step, the process creates a local version of the remote Website and access logs. This local version is stored in the Data Staging Area (DSA), a simple directory in the file system. Wget and Scp were used for this task. In the transformation step, the local version of the site and logs is pre-processed and transformed into useful information that is ready to be loaded into the data warehouse. The pre-processing of the access logs consists of merging the log files, removing irrelevant requests and/or data fields, removing robot requests and identifying users and sessions in the local version of the access logs. For this, WUMPrep, a collection of Perl programs supporting data preparation for data mining of Web logs, was used. For the loading step, two components were implemented, etlHtml and etlLog, which use simple SQL commands to load data into the data warehouse. Additionally, a component called etlDb was developed to select data from a transactional database, pre-process them and load them into the data warehouse.
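The transformation step relies on WUMPrep, whose exact heuristics are not described here. As a rough, hypothetical stand-in, the Python sketch below performs the same kinds of operations on already merged log records: it drops requests for embedded resources, drops requests whose user agent looks like a robot, approximates a user by the pair (host, user agent) and starts a new session after 30 minutes of inactivity. The suffix list, robot markers and timeout are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Request:
    host: str
    agent: str
    time: datetime
    uri: str


# Assumed, deliberately simple filters; real robot detection is more involved.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_relevant(r: Request) -> bool:
    """Keep page requests from (apparently) human visitors only."""
    path = r.uri.split("?", 1)[0].lower()
    agent = r.agent.lower()
    return not path.endswith(IRRELEVANT_SUFFIXES) and not any(m in agent for m in ROBOT_MARKERS)


def sessionize(requests: list[Request],
               timeout: timedelta = timedelta(minutes=30)) -> list[list[Request]]:
    """Group relevant requests into sessions per (host, agent) user."""
    pages = sorted((r for r in requests if is_relevant(r)),
                   key=lambda r: (r.host, r.agent, r.time))
    sessions: list[list[Request]] = []
    for r in pages:
        last = sessions[-1][-1] if sessions else None
        same_user = last is not None and (last.host, last.agent) == (r.host, r.agent)
        if same_user and r.time - last.time <= timeout:
            sessions[-1].append(r)  # continue the current session
        else:
            sessions.append([r])  # new user or inactivity gap: new session
    return sessions


t0 = datetime(2000, 1, 1, 10, 0)
log = [
    Request("10.0.0.1", "Mozilla/5.0", t0, "/index.asp"),
    Request("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=5), "/img/logo.gif"),
    Request("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=10), "/produto.asp?id=1"),
    Request("10.0.0.1", "Mozilla/5.0", t0 + timedelta(minutes=60), "/carrinho.asp"),
    Request("10.0.0.2", "Googlebot/2.1", t0, "/index.asp"),
]
for s in sessionize(log):
    print([r.uri for r in s])
```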

Once the data have been stored in the data warehouse, the success measures can be calculated. Only the fields highlighted in Fig. 10.3 are needed for this. Using simple SQL queries, the numerator and denominator values can be calculated for each of the three measures. For example, the contact efficiency of the action pages about laptops can be calculated as follows:

Numerator: a query that counts all distinct sessions containing action pages about laptops.

Denominator: a query that counts all distinct sessions.
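The original SQL statements are not reproduced here. As a hedged illustration, the Python sketch below runs equivalent queries with the standard `sqlite3` module against a hypothetical, simplified table `fact_usage` (one row per page request, with a session identifier and the requested URI) rather than the actual star schema of Fig. 10.3; the substring `f=portateis` used to recognize laptop action pages is likewise invented.

```python
import sqlite3

# Numerator: distinct sessions with at least one request for a matching action page.
# Note: '%' and '_' inside the substring would act as LIKE wildcards.
NUMERATOR_SQL = """
    SELECT COUNT(DISTINCT session_id)
    FROM fact_usage
    WHERE uri LIKE '%' || ? || '%'
"""
# Denominator: all distinct sessions in the log.
DENOMINATOR_SQL = "SELECT COUNT(DISTINCT session_id) FROM fact_usage"


def contact_efficiency_sql(conn: sqlite3.Connection, action_substring: str) -> float:
    numerator = conn.execute(NUMERATOR_SQL, (action_substring,)).fetchone()[0]
    denominator = conn.execute(DENOMINATOR_SQL).fetchone()[0]
    return numerator / denominator if denominator else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_usage (session_id INTEGER, uri TEXT)")
conn.executemany(
    "INSERT INTO fact_usage VALUES (?, ?)",
    [
        (1, "/familia.asp?f=portateis"),
        (1, "/carrinho.asp"),
        (2, "/index.asp"),
        (3, "/produto.asp?f=portateis&id=9"),
        (4, "/index.asp"),
    ],
)
print(contact_efficiency_sql(conn, "f=portateis"))  # sessions 1 and 3 out of 4
```

`COUNT(DISTINCT session_id)` matters in the numerator: a session that visits several laptop pages must still count only once.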

10.5 Case Study Results

10.5.1 Contact Efficiency

The contact efficiency of A is the percentage of sessions in which an attempt to reach the site’s goal has been made using action page A. By computing this value for each action page, it is possible to (1) identify the impact of each page on the overall success of the site in engaging visitors and (2) detect pages with low contact efficiency. In the case of INTROduxi, an action page and a target page were chosen for each of the 81 product families. Each page is identified in the access log through a specific substring in the URI, as in the laptop example above. Table 10.2 shows the results for the three measures aggregated by product type (accessories, components, computers, consumables, images, peripherals, networks and communications, software).

The overall results for the contact efficiency measure are necessarily low. This is an expected consequence of the large number of action pages the user can go to: a user is unlikely to visit many different products in every session. However, different product types can have very different contact efficiencies, which can be read as a measure of the popularity of a product or product type. Components and peripherals are, in fact, the products sold in the highest quantities. The relative contact efficiency is appropriate for sites with many action pages or with a large number of inactive sessions; both apply here.

Table 10.2 Efficiency results per product type

10.5.2 Conversion Efficiency

The conversion efficiency estimates how successful a given page is in guiding users towards a target page. With this measure, it is possible to study the impact of each page on the success of the site and to identify pages that have low conversion efficiency and require improvement. The action page for Computer (Computador), for example, has a very low conversion efficiency (the lowest of the eight product types) for a page with such a high contact efficiency (the fourth highest). This kind of discrepancy can be monitored and reported to the sales force. However, as we will see, such differences can also be explained by differences in price and product prevalence.
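Monitoring such discrepancies can be automated. The Python sketch below is a hypothetical rule, not one used in the chapter: it flags product types that rank among the `top_k` by contact efficiency but among the `bottom_k` by conversion efficiency. The defaults (top four, bottom one) mirror the Computador case described above; the input values in the example are invented, not taken from Table 10.2.

```python
def flag_discrepancies(metrics: dict[str, tuple[float, float]],
                       top_k: int = 4, bottom_k: int = 1) -> list[str]:
    """Product types with high contact but low conversion efficiency.

    `metrics` maps a product type to (contact_efficiency, conversion_efficiency).
    """
    by_contact = sorted(metrics, key=lambda t: metrics[t][0], reverse=True)
    by_conversion = sorted(metrics, key=lambda t: metrics[t][1])
    return sorted(set(by_contact[:top_k]) & set(by_conversion[:bottom_k]))


example = {  # illustrative values only
    "A": (0.30, 0.10),
    "B": (0.20, 0.50),
    "C": (0.05, 0.05),
    "D": (0.25, 0.40),
}
print(flag_discrepancies(example, top_k=2, bottom_k=2))  # ['A']
```

A rank-based rule like this ignores absolute levels, which suits measures whose overall values are necessarily low; as the text notes, a flagged page still needs to be checked against price and popularity before it is treated as a usability problem.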

10.5.3 Log Based Metrics vs. Sales Metrics

Fig. 10.5 Combining success measures and sales figures per family of products

As seen above, the efficiency of a page may differ greatly between products or product types. Although this variation may be caused by usability problems, most of it is probably explained by the popularity and the price of the products: products sold in large quantities also get more visits to their pages, and expensive products may have lower conversion efficiency. This hypothesized relationship is studied in this section by relating the success measures obtained from the access log to the sales figures. The sales data cover three variables that determine the success of the business: the quantity sold, the average price and the turnover of each product. The success metrics are the contact efficiency, relative contact efficiency and conversion efficiency computed over the entire log.

In Fig. 10.5 each point represents a family of products. The first observation is that the point clouds have different shapes: some tend to follow a diagonal line, with y increasing with x, while others follow a kind of power law. On closer inspection, contact efficiency tends to increase with average price and turnover, whereas conversion efficiency shows the opposite behaviour. In other words, the data support the hypothesis that expensive products have low conversion efficiency. Inexpensive products with low conversion efficiency should therefore be analysed in terms of the usability of their Web pages.

The relation between contact efficiency and quantity sold is more surprising, since there are highly visited products sold in relatively low quantities and vice versa. This may be explained by direct sales external to the Web channel.
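The trends above are read visually from Fig. 10.5. One way to quantify them, not used in the chapter, is Spearman’s rank correlation across product families, which captures monotone relationships whether they look linear or closer to a power law. The pure-Python sketch below computes it as the Pearson correlation of average ranks; it would be applied, for instance, to the per-family contact efficiencies and average prices.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)


print(spearman([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0: perfectly monotone, though not linear
```

A positive value for contact efficiency against average price, and a negative one for conversion efficiency, would put numbers on the visual reading above; families far from the overall trend are the ones worth inspecting individually.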

10.5.4 Segmented Analysis

The next question is how usage behavior varies across customer segments. For this, INTROduxi’s customer profiles were grouped according to their clickstream behaviour and sales information. Unlike the analysis above, which covered all users of the site, whether or not they bought anything, this clustered analysis was performed on customers only (users who actually made a purchase). The variables chosen were the following: the number of page views, the average page views per session, the average time per session, the customer share, the total number of units ordered and the average price per order. Using the SPAD software, seven clusters were found, which can be characterized according to the relative contact efficiency results as follows:

Cluster 1: “Low-price I” (557 customers). The action pages, or types of products, that are relatively most important on the site for these customers are those with the lowest average prices.

Cluster 2: “Good share & Poor navigation performance” (64 customers). The Componentes and Perifericos (components and peripherals) product types are the most important on the site; although most products of these types are cheap, this cluster leans towards the higher-value products within this range.

Cluster 3: “Big value orders” (eight customers). The most important product type for this cluster is Computador (computer), which is consistent with the very high average price of these customers’ orders.

Cluster 4: “Online all day long” (ten customers). The product type relatively most important to the customers in this cluster is Perifericos (peripherals). They view more pages per session than the others and have an extreme relative contact efficiency result of 100 %.

Cluster 5: “Low-price II” (197 customers). As in cluster 1, the product types that are relatively most important on the site for these customers are those with the lowest average prices.

Cluster 6: “Good share & Good performance” (140 customers). The Componentes and Perifericos (components and peripherals) product types are the most important on the site for this cluster, which offsets the low value of the products it orders with the large quantities ordered.

Cluster 7: “Top share customers” (30 customers). The action pages relatively most important on the site are those for the product type Componentes (components), again a type of product with a low average price; this cluster clearly leads in ordering, achieving the highest customer share through the quantities ordered.
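The chapter does not detail the clustering procedure used in SPAD. As a substitute for illustration only, the Python sketch below standardizes the six variables listed above (one row per customer) and groups customers with plain k-means (Lloyd’s algorithm); the chapter’s result corresponds to seven clusters, but k, the random seed and the choice of k-means itself are assumptions.

```python
import random
from statistics import mean, pstdev


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column so that variables measured on different scales
    (page views, seconds, units, prices) weigh equally in the distances."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: list[list[float]], k: int, iters: int = 100, seed: int = 0) -> list[int]:
    """Lloyd's algorithm; returns a cluster index for each row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels: list[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centroids[c])) for row in rows]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


# Toy data with two obvious groups and two variables, for illustration.
rows = [[1, 1], [1.2, 0.9], [0.8, 1.1], [10, 10], [10.2, 9.8], [9.9, 10.1]]
print(kmeans(standardize(rows), k=2))
```

Standardizing first matters here: without it, a variable such as average price per order would dominate the distances simply because of its units.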

Table 10.3 shows how the action pages for each type of product are visited by each cluster. The pages about computers, accessories and peripherals are successful within cluster 3; outside this cluster, the pages for computers have little impact. With this information, the owners of the site can focus on particular groups when they analyze the success of each part of the site. In other words, the pages for a certain product type are only expected to be successful for certain groups of customers. Note that this analysis can be performed at a finer level of detail, such as product family or even individual product. Another interesting example is cluster 4, for which peripherals are very important: if, during continuous monitoring of these measures, this value drops for these customers, something is probably going wrong on the site.
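Such monitoring can be reduced to comparing the current value of each (cluster, product type) measure with a reference value. The Python sketch below is a hypothetical rule, not the chapter’s: it raises an alert when a measure falls by more than a chosen fraction (`tolerance`, assumed here to be 20 %) of its baseline. The example values are illustrative and not taken from Table 10.3.

```python
Key = tuple[int, str]  # (cluster number, product type)


def drops(baseline: dict[Key, float], current: dict[Key, float],
          tolerance: float = 0.2) -> list[Key]:
    """Keys whose current value fell by more than `tolerance` relative to baseline.

    Keys missing from `current`, or with a zero baseline, are skipped.
    """
    alerts = []
    for key, base in baseline.items():
        now = current.get(key)
        if now is not None and base > 0 and (base - now) / base > tolerance:
            alerts.append(key)
    return sorted(alerts)


baseline = {(4, "Perifericos"): 1.00, (3, "Computador"): 1.00}
current = {(4, "Perifericos"): 0.70, (3, "Computador"): 0.95}
print(drops(baseline, current))  # [(4, 'Perifericos')]
```

Using a relative rather than absolute drop keeps the rule meaningful for measures of very different magnitude across clusters.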

Table 10.3 Relative contact efficiency results per cluster and by type of product

Table 10.4 shows the conversion efficiency of the Web pages for each type of product in each segment. It is interesting to see, for example, that despite the huge difference in the contact efficiency of the pages on computers for clusters 3 and 2 (100 vs. 19.69 %), the difference in conversion is very small (71.43 vs. 60 %). This suggests that something on the action pages for selling computers is not working as expected for cluster 3.

Clusters 1 “Low-price I” and 5 “Low-price II” showed unexpectedly good conversion efficiency results for most types of products. It could be interesting to develop a targeted marketing campaign for the customers of these clusters. The purpose would be to increase these customers’ interest in higher added value products similarly to cluster 2 or to raise the number of orders among these customers similarly to cluster 6. Since the behaviour of these two clusters is very similar it would be better to test this marketing campaign on only one of the clusters. It could make more sense to start with cluster 5 “Low-price II” since its navigation performance results are better than those from cluster 1 “Low-price I”.

Table 10.4 Conversion efficiency results per cluster and by type of product

10.6 Conclusions and Future Work

In this chapter we have objectively measured the success of an e-commerce Website from a Portuguese company. We have adapted the Website success measures proposed in the literature to the specific buying process under study. We have observed that products with relatively high contact efficiency can have low conversion efficiency and vice versa. This highlights the pages which need more attention and the ones which are already successful.

Furthermore, we have combined the analysis of the access log data with the available sales data, obtaining further validation of the marketing success of the Web pages. We conclude that, although major tendencies can be observed, some product pages do not follow the general pattern. Further investigation into the relation between the Web success measures and the commercial activity measures is required.

We have performed clustering to segment clients according to sales and navigational behavior and studied the success of the site by segment. This makes it possible to see how successful the parts of the site are for different user groups. We can therefore design different action pages for different customer segments. We can also disregard low Web performance that is expected for some pages within some groups, as well as high performance that is expected within other groups.

Following the work presented in this chapter we would like to study the dynamics of Website usability, collecting contact and conversion efficiency measures over a period of time and determining maximum and minimum threshold values that can be used to monitor the success of the site and its parts. At the same time we are interested in performing a similar study for each of the segments identified.

It would be interesting to relate sudden variations in the success of the site and of its pages to decisions made by the marketing team, or to outside events. For example, if a new campaign increases contact but reduces conversion, this could be studied.

We would also like to implement a tool that continuously collects data, compiles the measures and makes them available to the site’s management. This way, management could make decisions in real time, based on how customers are behaving on the Website.