1 Introduction

The click-through rate (CTR) is the ratio of the number of clicks on an online advertisement to the number of impressions it receives. CTR is an important factor in determining which advertisements should be displayed and in what sequence. The choice of a suitable advertisement, together with an appropriate display sequence, significantly influences the number of clicks users make on that advertisement (Koonin and Galperin 2003). The impression made by an advertisement is how its content influences the user and attracts him or her to click on it. Advertisements that include expert evidence and statistical evidence have a higher CTR than advertisements that involve causal evidence only (Haans et al. 2013). CTR also captures the number of users who reach a specific website through an advertisement placed in channels such as e-mail campaigns, HTML links, search engines, and display advertising. However, clicks alone are not enough: one has to check whether they generate revenue for the business, because otherwise the clicks are worthless (Hopewell 2007).
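The definition above can be written as a few lines of Python. This is only a minimal sketch: the function name and the decision to return 0.0 when there are no impressions are assumptions made for illustration, not something specified in the chapter.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return clicks divided by impressions (0.0 when nothing was shown)."""
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if clicks > impressions:
        raise ValueError("clicks cannot exceed impressions")
    if impressions == 0:
        # Assumption: an ad that was never shown is reported with a CTR of 0.
        return 0.0
    return clicks / impressions


# Illustrative numbers only: 25 clicks out of 1000 impressions.
example_ctr = click_through_rate(clicks=25, impressions=1000)
print(f"CTR = {example_ctr:.2%}")  # CTR = 2.50%
```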

Most search engines aim to maximize ad quality (measured by clicks) in order to maximize revenue, and they order advertisements by expected revenue. Moreover, the chance that an advertisement will be clicked drops sharply as its position on the page gets lower. The accuracy with which CTR is estimated is therefore very important, as it has a significant effect on the revenue that can be earned.
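To make the idea of ordering by expected revenue concrete, the sketch below ranks a few ads by the product of a per-click bid and an estimated CTR. The pay-per-click bid, the `Ad` class, and all numbers are hypothetical; the chapter does not state how any particular search engine computes expected revenue.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid_per_click: float   # hypothetical amount paid by the advertiser per click
    estimated_ctr: float   # estimated probability of a click per impression


def expected_revenue(ad: Ad) -> float:
    """Expected revenue per impression under a simple pay-per-click assumption."""
    return ad.bid_per_click * ad.estimated_ctr


def rank_ads(ads):
    """Order ads so that the highest expected revenue comes first."""
    return sorted(ads, key=expected_revenue, reverse=True)


# Made-up example values.
ads = [Ad("A", 2.00, 0.010), Ad("B", 0.50, 0.060), Ad("C", 1.00, 0.025)]
ranked = rank_ads(ads)
print([ad.name for ad in ranked])  # ['B', 'C', 'A']
```

Under this assumption an ad with a low bid can still outrank a high bidder if its estimated CTR is high enough, which is why errors in the CTR estimate translate directly into revenue differences.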

1.1 Need for CTR

The prominent question is how the description of an advertisement can influence the user or searcher in a way that results in a high CTR. A high CTR helps the business convert leads and thereby increase its revenue. CTR estimates also help in forecasting the digital performance of campaigns, estimating user traffic on a website, and identifying target search keywords. A report based on keyword search statistics indicates that the majority of searches on platforms like Google result in a listing of paid advertisements intended to attract clicks from users (Catalyst and Martineau 2013).

The probability of a click on an advertisement drops significantly when the displayed results do not match the user’s query. The drop could be as high as 90% according to some studies (Richardson et al. 2007). Therefore, the accuracy with which the CTR of a website or an advertisement is estimated can have a significant effect on the revenue of the business. When a user types a query, the search engine matches it against all available keywords and displays the corresponding advertisements. These matches fall into three categories: exact match, phrase match, and broad match. An exact match means the user’s query is identical to a keyword held by the search engine. A phrase match means the keyword occurs as a subset of the query. A broad match means there is a close similarity between the keyword and the advertisement that appears on the screen (Trofimov et al. 2012). On the basis of these evaluations, a relevance judgment is made that indicates the relevance of each document (Carterette and Jones 2007).
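The three match types can be illustrated with a small classifier. This is a rough sketch under stated assumptions: queries and keywords are compared as lowercase word lists, a phrase match requires the keyword words to appear contiguously in the query, and a broad match is approximated as any shared word. Real search engines use considerably richer similarity rules, so the function `match_type` is purely hypothetical.

```python
def tokenize(text: str) -> list:
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def match_type(query: str, keyword: str) -> str:
    """Classify how a bidded keyword matches a user query."""
    q, k = tokenize(query), tokenize(keyword)
    if not k:
        return "none"
    if q == k:
        return "exact"
    # Phrase match: the keyword words occur contiguously inside the query.
    for start in range(len(q) - len(k) + 1):
        if q[start:start + len(k)] == k:
            return "phrase"
    # Broad match (assumption): at least one word in common.
    if set(q) & set(k):
        return "broad"
    return "none"


for query in ["running shoes", "cheap running shoes online",
              "shoes for running", "laptop bag"]:
    print(query, "->", match_type(query, "running shoes"))
```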

A ranking with a single perfectly relevant document might have a lower CTR than a ranking for the same query that lists many somewhat relevant documents. New advertisements need to be ranked alongside existing ones, as ranking can strongly influence the satisfaction of both users and advertisers.

1.2 Factors Impacting CTR

Broadly, two factors influence the results of queries on a search engine: the selection of the advertisements to be displayed and the order in which they are displayed, i.e., their ranking. Advertisers specify the circumstances under which their advertisements will be shown. Advertisements are ranked in order to place the “best performing advertisement first”. If a user is shown an advertisement, will it be attractive enough to earn a click from the user? This is a very important question for any sponsored search advertising. “We find that click-through rates are higher for advertisements involving expert/statistical evidence than for advertisements involving causal evidence” (Haans et al. 2013).

Display Position

The number of advertisements that match a query, i.e., that are eligible to appear for the keywords a user searches, is far greater than the number of advertisements that can actually be displayed (Richardson et al. 2007).

Advertisers compete for the slots in which search engines display advertisements, and the number of valuable slots available for eligible advertisements is very small. For instance, the majority of users searching for information seldom move beyond the first page of the search results. If they do not find what they are looking for, they try a new set of keywords. The number of advertisements that can be displayed on the first page for a given query is therefore limited. CTR is greatly affected by the position of an advertisement on this page: as visibility decreases with lower positioning, CTR declines correspondingly. In other words, advertisements placed in lower slots have relatively less impact.

Search Engine Advertisement

Online marketers focus on designing advertisements that persuade users to click on them, and they make every effort to turn these users into buyers. Advertisers write persuasive text in their advertisements to get them clicked, but the quality of the argument is of critical importance in deciding the outcome of this process of persuasion. Advertisers make claims in their advertisements and present evidence in support of those claims; the evidence improves the quality of the argument presented by the advertiser. Evidence may be referred to as “data (facts or opinions) presented as proof for an assertion” (Reynolds and Reynolds 2002, p. 429). Advertisers use different types of evidence to improve the effectiveness of advertisement texts: causal, statistical, anecdotal, and expert evidence (Hornikx 2005). The general idea of each type has been given in the literature and is formulated by Rieke and Sillars (1975) as follows:

  a. Causal evidence: explaining the occurrence of an effect;

  b. Statistical evidence: a numerical representation of numbers;

  c. Anecdotal evidence: usage of case stories, examples, or illustrations;

  d. Expert evidence: citing experts in order to enhance the credibility of the advertisement.

Thus, for users conducting an early-stage search, ads that rely on expert or statistical evidence, which lends credibility and makes the information easier to verify (Lindsey and Yun 2003), may achieve higher CTRs than ads based on causal evidence, which do not offer simple acceptance or rejection cues. On the other hand, central-route users, i.e., those who browse in order to find particular information, are better placed to grasp the underlying information and meaning of the message content. For these users, a causal argument is a valuable source of information, and so a causal advertisement is more likely to convert them than any other kind of advertisement. Depending on the environment and circumstances, the two types of evidence may perform differently, with one being superior to the other. The most important task for any advertisement is therefore to decide its objective. If the aim is to increase traffic, statistical or expert evidence should be used, as it results in more clicks; if the aim is conversions, causal evidence works best and outperforms all other types (Haans et al. 2013).

Banner Advertisement

Content elements include emotional appeals and the use of incentives. Design elements include color, animation, and interactivity. An effective Internet banner advertisement builds brand value and thereby increases the efficiency of advertising. Users tend to process the information in an advertisement according to the extent of their involvement. If they are highly involved, they take a cognitive approach to evaluation, and factors such as color and sound make little impact. With low involvement, users barely pay attention to the content and simply scan through it, so there is only a subconscious level of engagement. The interactive design of a banner advertisement has a substantial effect on its CTR, as it facilitates two-way communication. The animation and color it uses can also elicit an emotional response in users (Lohtia et al. 2003). In either case, it is the design or the content of the advertisement that affects the CTR directly.

User Interactivity

This is one factor that primarily differentiates traditional advertisement methods from new-age online advertising. “The point and click nature of the online medium makes it easy for frustrated or bored visitors to head off to other websites; therefore, ads displayed on websites need to capture web surfers’ and web searchers’ attention through an emotional, rational, or mixed (i.e., rational and emotional) appeal” (Singh and Dalal 1999). “Interactive advertising enhances brand awareness and usually results in higher click-through rates than other forms of online advertising” (Rosenkrans 2010; Lemonnier 2008). A unifying principle behind advertisements based on good media content is their ability to interact with the audience (Lemonnier 2008). An advertisement with better media-based content can attract more consumer attention and convert more consumers than static banners, and it can enhance interactivity while allowing online users to engage in e-commerce transactions without leaving the website that hosts the advertisement (Briones 1999; Li and Bukovac 1999).

Relevancy of Keywords

Whenever a user enters a query, the search engine matches it against all keywords and retrieves the advertisements relevant to the query. The relevancy of a keyword is decided by the degree of match between the query and the result returned by the search engine. It depends on factors such as the word count, the word count of the body, the title of the advertisement, the use of capital letters in the advertisement, and the length of the query. These factors determine the ranking of an advertisement on the basis of the relevance of the keywords used in its content.

Cyclic Fluctuations

CTR can be estimated from the available historical data, with some notable exceptions: infrequent searches, occasional occurrences, and isolated events. Cyclic fluctuations, in particular, are complicated and widely studied. According to one study, approximately 34% of the keywords in queries show some degree of periodic behavior, but approximately 90% show no periodic changes (Regelson and Fain 2006).

Offer

Most people surfing the web are looking for things they cannot get offline. They search for better deals and discounts, and for what a brand offers in its online advertisements (Lohtia et al. 2003). Offers sent via e-mail advertisements build a brand’s relationship with its customers and have become a source of engagement. The web has also become a place to search for new products and talent.

1.3 Issues with CTR

The query issued by the user is matched to advertisements through web mining, and similar advertisements are ranked so that the best among them is displayed at the top by the search engine. Optimizing for relevance to the user’s query may or may not lead to clicks, and such analysis is carried out in order to show the correlation between clicks and query rewrites. The CTR of an advertisement is strongly affected by the popularity of its keywords (Jerath et al. 2014). Investigators predict CTR by collecting historical click information, as it provides tangible and intangible examples of user behavior. When sufficient historical data is available, it gives a reliable estimate, and by clustering the searched information investigators can predict CTR. Even so, a CTR estimated from historical data may vary when the number of searches is small or impressions are ineffective, because the sample available for estimation is then small (Regelson and Fain 2006).
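The effect of sample size on a historical CTR estimate can be illustrated with the binomial standard error, a standard statistical approximation that is not taken from the cited studies. The sketch below compares two made-up click logs with the same observed CTR but very different numbers of impressions; the function name is an assumption for illustration.

```python
import math


def ctr_with_standard_error(clicks: int, impressions: int):
    """Return the observed CTR and its binomial standard error."""
    if impressions <= 0:
        raise ValueError("at least one impression is required")
    p = clicks / impressions
    # Standard error of a proportion: sqrt(p * (1 - p) / n).
    se = math.sqrt(p * (1 - p) / impressions)
    return p, se


# Illustrative counts only: both logs have an observed CTR of 2%.
for clicks, impressions in [(2, 100), (200, 10_000)]:
    p, se = ctr_with_standard_error(clicks, impressions)
    print(f"n={impressions:>6}  CTR={p:.3f}  standard error={se:.4f}")
```

With 100 impressions the standard error is about ten times larger than with 10,000, which is one way to see why estimates for rarely searched keywords are less reliable.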

Nowadays, CTR prediction is also applied to news results, which should be displayed only when they are highly relevant to the user’s query. The growing trend of furnishing web search results with commercialized specialized content, such as news and products, introduces the challenge of mixing news search results with the regular search results panel (König et al. 2009).

Although online marketers of all kinds strive to incorporate every element that makes an advertisement effective enough not only to gain a click but also to generate the desired lead, studies discussed in the literature review show that the rate at which advertisements are clicked is much higher than the rate at which they generate conversions. Studies also show that the CTR of mobile advertisements is higher than that of advertisements seen in desktop browsing (Ackley 2015).

1.4 Research Purpose

The research purpose for the current work is as follows.

  a. To study click patterns of individuals on desktop and mobile ads, and

  b. To develop a machine learning based prediction model to estimate the click-through rate.

The chapter is divided into five sections. This first section has given the background on CTR. Section 2 gives an overview of prior work in this field. Section 3 describes the methodology of the study. Section 4 presents the results of the experiments, and Sect. 5 discusses the results and concludes.

2 Literature Review

Advertisement refers to any paid, non-personal communication about a product or a service. Advertising started with print media such as newspapers and magazines and then moved on to radio and television. With the evolution of the Internet, the focus shifted toward online advertisement, and with the increasing adoption of mobile phones and other portable devices, advertisers are now shifting their interest toward mobile advertisement. Mobile is considered the fastest growing platform for advertisement (Bart et al. 2012; Wang et al. 2011).

After clicking on the advertisement, the user is redirected to the advertiser’s website, where the user decides whether to purchase the product or use the services offered. Our definition of CTR does not cover the purchase of the product by the user after clicking on the advertisement.

Sponsored search accounts for two-fifths of the overall online advertising market. According to a study by Catalyst and Martineau (2013) on Google Desktop CTR, 48% of searches result in an organic click on page one and the remaining 52% result in paid clicks. Thus, there is a high possibility of a user clicking on a sponsored ad (Catalyst and Martineau 2013).

The position of an advertisement in the sponsored search list can affect its CTR. CTR decreases with position, while the conversion rate initially goes up and then comes down for larger keywords (Agarwal et al. 2011). Joachims et al. (2005) also discuss trust bias, which results in more clicks on advertisements ranked higher by the search engine. The study by Catalyst and Martineau (2013) on Google Desktop CTR likewise suggests that the order in which an advertisement is displayed affects its CTR. Position can thus be considered an integral influence on the CTR of an advertisement.

The content and design of a banner advertisement can affect its CTR (Lohtia et al. 2003). Advertisement size and design help to increase the CTR of ads (Sigel et al. 2008). The authors stated that a banner of size 160 × 160 achieved the highest CTR when compared with banners of size 728 × 90 and 300 × 250. They further stated that the 160 × 160 banner performed better than the 728 × 90 banner in interaction rate, while the 300 × 250 banner achieved the highest interaction rate. Banner size is therefore relevant to study when trying to increase the efficiency of an advertisement.

There is a close relationship between the visual appearance of an advertisement banner and user response (Azimi et al. 2012). The authors conducted several experiments to examine how visual features relate to the prediction of click-through rate, as well as to performance classification and ranking. Many factors can lead a user to click on an advertisement. According to Richardson et al. (2007), factors such as reputation, attention capture, relevance, and landing page quality affect the CTR of an advertisement. The authors developed a model that predicts the probability that an advertisement will be clicked by the user on the basis of these factors.

Zorn et al. (2012) discuss the influence of language and animation on banner CTR. The authors conducted an experiment across two website types. They reported that language had no impact on either type of website, while search sites showed large differences between the static and animated versions.

Lohtia et al. (2003) studied the dependence of CTR on the design of the advertisement and the content of the message. To characterize design, the authors studied interactivity, color, and animation. They found that CTR is affected by interactivity in the advertisement, that animation and emotion improved CTR for B2C banner advertisements but decreased it for B2B banner advertisements, and that a moderate level of color was more effective than a high or low level.

According to Tucker (2010), banner advertisements that give users privacy control over personalized information are likely to attract more users. The study indicated that clicks can be doubled for advertisements that provide privacy control and use unique private information to personalize their message. Most Internet-based firms collect huge amounts of information about web visitors and use it to create personalized advertisements (Tucker 2010). Using personalized information about users can, however, provoke a negative reaction that causes consumers to avoid the advertisement’s appeal (White et al. 2008). Cleff (2007) also discusses the privacy issues of mobile advertisement. According to the author, the success of m-advertising depends on the development and execution of industrial and legislative initiatives. The author adds that users should have some form of control over the data on their phones, and that there should be a mechanism for choosing which mobile advertisements they receive. Trust is an important factor in deciding whether a user will click on an advertisement (Young and Wilkinson 1989). A study by Davis et al. (2011) suggests that a user’s trust is affected by factors such as the reputation of the vendor and structural assurance.

Bart et al. (2012) suggest that product type and product involvement are two important determinants of a user’s intention to purchase the advertised products. The authors further state that the presence of these two factors provides more exposure to the advertisement. The effectiveness of online advertisement is also affected by determinants such as Internet skills and usage, content of the advertisement, location of the ad, and income (Mohammed and Alkubise 2012). The authors’ findings suggest that the location of the advertisement is the most important determinant of its effectiveness.

Mobile advertisement is the process of advertising on wireless devices (Chen, Wu and Li 2014). The authors’ findings suggest that quality and creativity affect the performance of product marketing and positively affect the sales performance of the product. A study by Catalyst and Martineau (2013) suggests that paid advertisements on mobile are more effective than on desktop, as the screen is small and only a limited number of results are viewed. Free applications available on Google Play (earlier known as Android Market) and on Apple’s App Store follow a revenue model in which advertisements are inserted in the application itself and shown at different positions during usage (Vallina et al. 2012). 73% of the applications available on Google Play are free (Leontiadis et al. 2012), so free applications can be expected to attract a larger number of users, and hence more downloads, than paid applications (Hamburger 2014). The mobile advertisement model consists of three actors: the user who uses the application, the developer who expects to benefit from usage of the application, and the advertisement network, which compensates the developer in exchange for showing advertisements of interest to the user (Leontiadis et al. 2012).

The prior literature (Agarwal et al. 2011; Catalyst and Martineau 2013; Lohtia et al. 2003; Azimi et al. 2012) reveals that advertisement position affects the CTR of an advertisement. The display and content of advertisements and the use of animation, language, and privacy controls are further factors that can affect CTR. The literature also indicates that advertisers are now more focused on mobile advertisement because of the increasing number of mobile users.

Most studies of mobile advertisement have focused on estimating the number of clicks on an advertisement, and very little research has been done on the conversion of those clicks. Click-through rate only provides information about the number of clicks on a particular advertisement; it is not an effective measure of the number of users who actually visited the advertiser’s website with the intent of purchasing the product or using the services provided there. In this chapter, we build a CTR model through machine learning that predicts whether or not an advertisement will be clicked by the user.

3 Methodology

The study has been designed to forecast user clicks on an online advertisement. A total of 0.8 million transactional records has been used to prepare and test the model. A classification tree model has been used to estimate clicks, using SPSS version 21.0. The data is available from Avazu, which has shared it through Kaggle (https://www.kaggle.com/c/avazu-ctr-prediction/details/timeline). The available data fields are the ad identifier, a binary indicator for click or non-click, the time, the position of the banner, site details (id, domain, and category), app details (id, domain, and category), mobile device details (id, ip, model, and type), the type of connection, and some anonymized categorical variables.
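As a rough illustration of how such records can be summarized before modeling, the sketch below computes CTR per value of a categorical field using only the Python standard library. The column names (`click`, `banner_pos`, `site_category`, `device_conn_type`) are assumed to mirror the field list above, and the six rows are invented for the example; they are not taken from the Avazu dataset.

```python
import csv
import io
from collections import defaultdict

# Made-up toy rows; column names are assumptions based on the field list.
SAMPLE_CSV = """id,click,banner_pos,site_category,device_conn_type
1,0,0,cat_a,0
2,1,0,cat_a,2
3,0,1,cat_b,0
4,0,1,cat_b,0
5,1,0,cat_a,0
6,0,0,cat_b,2
"""


def ctr_by(rows, field):
    """Return {field value: clicks / impressions} for one categorical field."""
    clicks = defaultdict(int)
    shown = defaultdict(int)
    for row in rows:
        shown[row[field]] += 1
        clicks[row[field]] += int(row["click"])
    return {value: clicks[value] / shown[value] for value in shown}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(ctr_by(rows, "site_category"))
print(ctr_by(rows, "banner_pos"))
```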

The sample data was partitioned into two groups, with eighty percent as the training group and twenty percent as the test group. The first group was used for model development, while the second was used to test the rules generated. Two measures were used to assess the model: accuracy and precision. Accuracy (AC) is the number of correct predictions divided by the total number of predictions:

$$ \text{AC} = \frac{\text{TruePositive} + \text{TrueNegative}}{\text{TruePositive} + \text{FalsePositive} + \text{TrueNegative} + \text{FalseNegative}} $$
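The split and the two measures can be sketched in Python as follows. The study itself used SPSS, so this is only an illustration: the shuffling seed, the function names, and the toy labels are assumptions, and precision is taken in its standard sense, TruePositive / (TruePositive + FalsePositive), which the text does not spell out.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle the records and return (training 80%, test 20%)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]


def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = click, 0 = no click)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    # Standard definition; returns 0.0 when no clicks were predicted.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up labels for ten impressions.
actual = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("precision:", precision(tp, fp))
```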

Classification trees are easy to implement and are widely used in areas such as retail, health care, and BFSI. The CHAID classification tree algorithm has been used in this study to segment the dataset into different groups. “These segments, called nodes, are split in such a way that the variation of the response variable (categorical) is minimized within the segments and maximized among the segments. After the initial splitting of the population into two or more nodes (defined by values of an independent or predictor variable), the splitting process is repeated on each of the nodes. Each node is treated like a new sub-population” (Ramaswami and Bhaskaran 2010). CHAID modeling generates a hierarchical output, as shown in Sect. 4, and the resulting tree has been used to forecast the click-through rate (CTR).
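CHAID chooses its splits with chi-square tests of independence between each candidate predictor and the response. The sketch below computes only the Pearson chi-square statistic for one predictor against the click indicator; it is a simplified illustration, not the SPSS procedure used in the study. A full CHAID implementation also merges similar categories and compares Bonferroni-adjusted p-values rather than raw statistics. The data and variable names are invented.

```python
from collections import Counter


def chi_square_statistic(categories, clicks):
    """Pearson chi-square statistic and degrees of freedom for a
    predictor-by-click contingency table."""
    n = len(clicks)
    cell = Counter(zip(categories, clicks))
    row_totals = Counter(categories)
    col_totals = Counter(clicks)
    stat = 0.0
    for cat, row_total in row_totals.items():
        for outcome, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = cell[(cat, outcome)]
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Made-up toy data: eight impressions with two candidate predictors.
clicks = [1, 1, 1, 0, 0, 0, 0, 1]
site_category = ["a", "a", "a", "a", "b", "b", "b", "b"]
connection = ["wifi", "cell", "wifi", "cell", "wifi", "cell", "wifi", "cell"]

for name, values in [("site_category", site_category),
                     ("connection", connection)]:
    stat, dof = chi_square_statistic(values, clicks)
    print(f"{name}: chi-square = {stat:.2f}, dof = {dof}")
```

In this toy example both predictors have one degree of freedom, so the larger statistic for the hypothetical `site_category` marks it as the stronger candidate for the first split.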

4 Results

Each node in Fig. 1 contains the node id (ID), the number of data objects (N), and the possible outcomes, “CTR and non-click-through rate.” The tree begins at the topmost decision node (ID = 0) with N = 799,999 instances of the dataset, and the dataset is divided further on the basis of the variable most likely to affect a customer’s intention to click on a particular website. The topmost node suggests that site category is the most important variable affecting that intention. The first level of branches, i.e., ID = 1 to 9, represents the groups produced by this split. It can be seen that ID = 1 and ID = 5 have the best CTR for the site category, while ID = 3, 4, 7, and 8 have the next best set of CTRs.

Fig. 1 First-level classification tree for the dataset based on clicks and non-clicks

Figure 2 shows the further split of node ID = 1 on the basis of site id and, subsequently, device connection type. Site id is the second most important variable that advertisers should take into consideration when running ads online. Node ID = 1, containing 177,984 instances, is further split into 18 nodes (ID = 1 to 18), each representing different site ids with their respective CTR and non-click-through rates. Node 10 is further split into two nodes (ID = 37 and 38), and Node 13 into two nodes (ID = 39 and 40), in both cases on the basis of the predictor variable device connection type.

Fig. 2 Classification tree nodes at level 2 based on site ID

It can be concluded that for site category (ID = 1) and site id (ID = 15), the CTR is 63.8%, which is higher than for the other nodes. More nodes were tested for the dataset in the same way. Model validation was done with the split-sample method: the sample data was divided into two sub-groups, test and training, with the first used to test the rules generated from the second. The CHAID technique achieved an overall prediction accuracy of 83.83%, and the accuracy on the test set and the training set was approximately the same. This suggests that CHAID is a fairly efficient classification model for CTR prediction, and it was found to perform better than the analysis presented by Ramaswami and Bhaskaran (2010).

5 Conclusion

This study has examined the problem of estimating CTR for advertisements using a CHAID model. The CHAID method was useful in visualizing the relation between CTR and related factors. Decision rules have been formulated to estimate CTR, and some of the important variables that affect the probability of an advertisement being clicked have been presented. The study should be useful for online advertisement agencies and marketing managers in deciding on the placement of online advertisements, and it should help managers obtain a better ROI from their ad campaigns.