1 Introduction

Understanding user behavior is one of the prime concerns of Web search related studies. It provides opportunities for advertising, suggestions on the design of user interface, and assessment indicators for result relevance.

One of the most common approaches to understanding user behavior is the analysis of query logs. Previous studies on general Web search focused on individual queries, session analysis, and click positions on search engine result pages (SERPs) [3, 8, 10]. These findings help us better understand how users use search engines to get information. However, the way results are shown in image search differs greatly from that in general Web search. Most image search results are organized in grids instead of the linear result list used in general Web search. Therefore, user behavior differs considerably between general Web search and image search [4, 10]. Queries in image search have more zero hits and are more specific. Moreover, image search engines provide preview pages. The image preview page is an enlarged preview of an image, which is usually shown after the image is clicked on the SERP. On the preview page, the enlarged picture is presented together with navigation buttons (e.g., Previous, Next), a download button, and thumbnails for further exploration of the results (see Fig. 1). This interaction mechanism is significantly different from the landing page reading process of general Web search engines. If users want to view more results, they can jump to another search result directly instead of going back to the SERP. However, to the best of our knowledge, few existing works investigate its influence on user behavior.

Fig. 1. Image preview page (shown after the image is clicked on the SERP).

In this study, we analyze user behavior, including individual query analyses, temporal distribution, click-through behavior on the SERPs, and behavior on the preview page, based on logs collected from a popular image search engine.

To summarize, the main contributions of this work are as follows:

  • We find a number of differences in search behavior between general Web search and image search. Compared to general Web search, image search users usually submit shorter query strings and their selections of query terms are more diverse.

  • We show that clicked images for the same query vary greatly across users, which potentially indicates a serious challenge for click models to perform as well as they do in general Web search.

  • We analyze users’ exploratory behavior on the preview page. 61.4% of users view more than one image on a preview page, which provides further evidence of exploratory behavior in image search.

The paper is organized as follows. In the next section, we review related work. Section 3 introduces our dataset. We report the analysis results of user behavior in image search in Sects. 4–7. Finally, we discuss conclusions and future work.

2 Related Work

With the wide application of Web search engines, log analysis has become one of the most common approaches to understanding user behavior. There are many previous large-scale studies of Web search [3, 8, 13, 14], from which we can get a timeline of the evolution of general Web search. Intents behind queries can be classified into three categories: informational, navigational, and transactional [1]. Search engines need to deal with all three types because each type is best satisfied by very different results.

Several image search studies characterizing general user behavior have also been performed in the past [2, 4, 5, 6]. Their approaches take different factors into consideration, such as query length, query reformulation patterns, and search depth. Andre [4] made a large-scale analysis of query logs to characterize some of the differences between general Web search and image search. The study derived four main characteristics that distinguish image search from its Web search counterpart. Compared to general Web search, image search leads to shorter queries, tends to be more exploratory, and requires more interactions [10]. Another related work on understanding image search behavior was conducted by Goodrum and Spink [7]. They found that image queries contained on average 3.74 words. They also reported a high percentage of unique terms. These studies help us better understand general user behavior in image search.

What do users search for in image search? A query log analysis showed that more than 20% of image search queries are related to the location, nature, and daily life categories [15]. Pu [10] classified the 1,000 most frequent image queries based on a proprietary subject-based categorization scheme and found that the majority of the queries were in the entertainment domain. Most recently, Park et al. [2] further examined the differences between head queries and long-tail queries. They looked at the query types that belong to the intersection of subject-based and facet-based categorizations to uncover more fine-grained categories that cover a significant amount of search requests. Their work sheds light on the importance of considering query categories to better understand user behavior on image search platforms.

Understanding interactive behavior with image search result pages is also of vital importance. Interactive behavior provides abundant implicit user feedback for image search engines. Smith [16] presented a study that compares click-through on image searches with what has been discovered for traditional text search. They also evaluated searches for different types of images. Since the presented results are quite different (image vs. text), the findings of previous studies based on traditional text search interactions were not applicable to image search interactions.

Most of the above studies focused mainly on interactive behavior on image search result pages. The variance of click behavior between different users still lacks fine-grained analysis. Although Park et al. [2] showed several user behavior patterns across query types on preview pages, little attention has been paid to the general situation.

3 Dataset

We use 7 days of desktop image search server logs collected in March 2017. We extract four types of behavior from the logs: the SERP information, click events, download events, and preview page events. The details of the data are presented in Table 1. From the SERP information, we can get the rank position of all thumbnail result images. For click events, we have the id of the clicked image. For download events, we extract the entrance information, including time and image id. For preview page events, we have the event type (e.g., mouse scroll, click on the thumbnails below).

Table 1. The details of the data.

The dataset contains approximately 19.2M searches, 7.9M unique queries, 7.6M sessions, and 5.3M users (see Table 2). In this study, we also use another dataset collected in January 2017 from a popular Web search engine in China. It contains 287.5M searches and 67.9M unique queries.

Table 2. Dataset statistics.

4 Query Distributions

In this section, we analyze the query distributions from different aspects, including query length, query frequency, and the usage of advanced search. We also make a comparison between image search and general Web search.

4.1 Query Length

As the first step of our analysis, we look at the number of words and characters per query. Note that the number of words here refers to the number of tokens separated by spaces. As shown in Fig. 2, there are 5.69 characters in each query on average. Among all the queries, 6.3% contain only one word and each query contains only 1.05 words on average. This means that the average length of the words used in image search is 5.69/1.05 ≈ 5.42 characters, which is much larger than the average length of Chinese words. Users usually put two or more Chinese words together as the query string instead of using spaces to separate them. The search engine should therefore run an effective word segmentation program after receiving a query.
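As an illustration, the two length statistics could be computed as in the following minimal sketch. The function name and the sample queries are hypothetical and not taken from the log; words are counted by splitting on whitespace, as described above, and we assume that spaces are not counted as characters, which the text does not specify.

```python
def query_length_stats(queries):
    """Return (mean words, mean characters, mean characters per word)."""
    # A "word" is a whitespace-separated token, as in the text.
    word_counts = [len(q.split()) for q in queries]
    # Assumption: whitespace itself is not counted as a character.
    char_counts = [sum(len(w) for w in q.split()) for q in queries]
    mean_words = sum(word_counts) / len(queries)
    mean_chars = sum(char_counts) / len(queries)
    return mean_words, mean_chars, mean_chars / mean_words


# Hypothetical sample queries (not from the dataset).
sample = ["可爱猫咪图片", "风景 壁纸", "手机壁纸高清"]
mean_words, mean_chars, chars_per_word = query_length_stats(sample)
```

For this toy sample there are 4 words and 16 characters in 3 queries, so the average word length is 4 characters. With the averages reported above, the same ratio gives 5.69/1.05 ≈ 5.42.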

We also find that the average number of words per query is 6.64 in Web search. This suggests that users are lazy when they input a query string, and that an image search engine gets even less information about what users really want to search for. Therefore, we need to pay more attention to the analysis of user behavior in image search. It helps us understand users’ search intent and improve the quality of search results.

Fig. 2. The distribution of query length: (a) number of words per query, (b) number of characters per query.

4.2 Query Frequency

We also examine the query frequency distribution. We count how many times each unique query was submitted during the 7 days.

As expected, a small number of queries account for a large part of all queries. At the head of the distribution, the top 0.17% of unique queries occur more than 100 times each and account for 26.12% of all queries; the top 3.5% of unique queries account for 50% of all queries, and the top 23% account for 70%. The distribution also has a long tail: 78.14% of unique queries were submitted only once, and they account for 30.41% of all queries.
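The head and tail statistics above can be derived from per-query counts. The following is a minimal sketch on a toy log; the function names and data are hypothetical, and rounding the top fraction to a whole number of unique queries is our own assumption.

```python
from collections import Counter


def traffic_share_of_top(counts, top_fraction):
    """Share of all submissions covered by the most frequent unique queries."""
    freqs = sorted(counts.values(), reverse=True)
    # Assumption: round the fraction to a whole number of queries, at least 1.
    k = max(1, round(len(freqs) * top_fraction))
    return sum(freqs[:k]) / sum(freqs)


def singleton_stats(counts):
    """Return (share of unique queries seen once, their share of submissions)."""
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts), singletons / sum(counts.values())


# Hypothetical toy log: one entry per submitted query.
log = ["cat"] * 6 + ["dog"] * 2 + ["sunset", "tree"]
counts = Counter(log)
head_share = traffic_share_of_top(counts, 0.25)  # top 1 of 4 unique queries
unique_once, traffic_once = singleton_stats(counts)
```

In this toy log, the most frequent query covers 60% of submissions, while the two queries seen only once make up 50% of unique queries but only 20% of submissions.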

This is very similar to the report of Park et al. for an English-language environment [2]: 75% of unique queries were issued only once, and they account for approximately 25% of the traffic in the sample; the other 25% of queries account for 75% of all traffic. In Web search, by contrast, only the top 5.99% of unique queries account for 70% of all queries, which suggests that Web search users’ selections of query terms are less diverse.

This shows that a large part of the queries submitted by users are repeated: a small number of queries account for a large part of users’ needs. If the search engine pays more attention to improving the ranking quality for those popular queries, user satisfaction should improve significantly.

4.3 Usage of Advanced Search

Users can add some specific words and symbols (such as “and”, “or”, “not”, ‘+’) to query strings to use advanced search. In 2007, the reported percentage of advanced search queries was only 0.73% [3]. In our analysis, the percentage has grown substantially, to 4.57% in image search and 6.06% in Web search.

5 Temporal Distribution

Next, we focus on the time of day at which users use the image search engine on desktop. Figure 3 shows the distribution of queries over the hours of the day. The majority of desktop image searches occur from 9AM to 5PM, which are normal working hours. Interestingly, this is very similar to the statistics reported by Yang Song on Web search: the majority of desktop searches occur from 8AM to 5PM [11]. The number of searches decreases significantly during lunch time. We also compare the time distribution between weekends and workdays. The number of searches from 9AM to 5PM on a workday is clearly greater than that on a weekend day. Desktop searchers tend to use the image search engine at work.
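A minimal sketch of how such temporal distributions might be computed from query timestamps is given below. The function names and timestamps are hypothetical; we assume timestamps are already in local time and treat Monday to Friday as workdays, ignoring public holidays.

```python
from collections import Counter
from datetime import datetime


def hourly_distribution(timestamps):
    """Fraction of searches falling in each hour of the day (0-23)."""
    hours = Counter(ts.hour for ts in timestamps)
    total = len(timestamps)
    return {h: hours.get(h, 0) / total for h in range(24)}


def split_by_day_type(timestamps):
    """Partition timestamps into (workday, weekend) lists.

    datetime.weekday() returns 0 for Monday through 6 for Sunday.
    """
    workday = [ts for ts in timestamps if ts.weekday() < 5]
    weekend = [ts for ts in timestamps if ts.weekday() >= 5]
    return workday, weekend


# Hypothetical timestamps; 2017-03-06 was a Monday, 2017-03-11 a Saturday.
stamps = [
    datetime(2017, 3, 6, 9, 15),
    datetime(2017, 3, 6, 14, 40),
    datetime(2017, 3, 6, 14, 55),
    datetime(2017, 3, 11, 21, 5),
]
dist = hourly_distribution(stamps)
workday, weekend = split_by_day_type(stamps)
```

Computing `hourly_distribution` separately on the `workday` and `weekend` lists gives the kind of comparison shown in panel (b) of Fig. 3.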

Fig. 3. Temporal distributions: (a) for hour of the day, (b) comparison between weekend and workday.

6 Session Characteristics

A session is “a series of queries by a single user made within a small range of time”, as defined in Craig Silverstein’s study [8]. In this section, we partition a user’s actions into separate sessions whenever the time between consecutive actions exceeds 30 min [2]. We look at the distribution of queries per session and the average number of sessions for a user in one day.
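The 30-minute partitioning rule can be sketched as follows. This is a minimal illustration with hypothetical names and timestamps, not the actual log-processing code; a gap of exactly 30 min is kept in the same session because the rule splits only when the gap exceeds the threshold.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def split_sessions(action_times, gap=SESSION_GAP):
    """Split one user's action timestamps into sessions.

    A new session starts when the time since the previous action exceeds `gap`.
    """
    sessions = []
    for ts in sorted(action_times):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Hypothetical actions of a single user.
times = [
    datetime(2017, 3, 6, 9, 0),
    datetime(2017, 3, 6, 9, 10),
    datetime(2017, 3, 6, 9, 45),   # 35 min after the previous action
    datetime(2017, 3, 6, 10, 15),  # exactly 30 min later: same session
]
sessions = split_sessions(times)
```

Here the four actions fall into two sessions of two actions each. Queries per session and sessions per user then follow by counting over the resulting lists.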

As shown in Fig. 4, the average number of queries per session is 2.54. 56.6% of sessions contain only one query, and 91% of sessions contain fewer than 6 queries. This is similar to the previously published result of 2.95 queries per session [2].

Fig. 4. Number of queries per session.

A total of 5,255,291 users used the image search engine during the 7 days. The average number of queries per user is 3.66, and the average number of sessions per user is 1.44. 78.9% of users used the image search engine only once in the 7 days.

7 Interaction with Search Results

Understanding users’ interaction with search results helps make search intent explicit. Users’ interaction behavior can be used by search engines as relevance feedback data. In this section, we analyze browsing depth, dwell time, the position distribution of clicks, the click entropy distribution, and behavior on the preview page.

7.1 Browsing Depth

There are 5 rows of images on each page in the data we use. As users scroll down the result page, images are loaded page after page automatically. We define browsing depth as the number of result pages a user explores during one query. Table 3 shows the distribution of browsing depth. We find that image search has deeper browsing depths than Web search, where experiments show that 85% of users view only the first page of results [3].
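To make the definition concrete, the sketch below derives browsing depth from the deepest row a user reached, using the 5 rows per page stated above. That the log exposes a deepest 1-based row index per query is our assumption, and the function names and sample values are hypothetical.

```python
from collections import Counter

ROWS_PER_PAGE = 5  # stated in the text


def browsing_depth(deepest_row):
    """Pages explored for one query, given the deepest 1-based row reached."""
    # Integer ceiling division: rows 1-5 map to page 1, rows 6-10 to page 2.
    return (deepest_row + ROWS_PER_PAGE - 1) // ROWS_PER_PAGE


def depth_distribution(deepest_rows):
    """Fraction of queries at each browsing depth."""
    depths = Counter(browsing_depth(r) for r in deepest_rows)
    total = len(deepest_rows)
    return {d: n / total for d, n in sorted(depths.items())}


# Hypothetical deepest rows for five queries.
dist = depth_distribution([1, 3, 5, 6, 12])
```

In this toy example, three of the five queries stop on the first page, one reaches the second page, and one reaches the third.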

Unlike image search, where results are loaded automatically, general Web search engines show only 10 results on each page, and users must click on the next page button if they want to browse more results. So rather than viewing more results, Web search users tend to change their query strings to get new results. This also indicates that image search is more exploratory than text search.

Table 3. Distribution of browsing depth.

7.2 Dwell Time

In our previous work, we conducted an eye-tracking study to investigate users’ examination behavior in image search [17]. Based on the fixation data collected in that eye-tracking experiment, in this paper we calculate the dwell time for images on each page and row and plot the distribution of users’ dwell time in Fig. 5.

Fig. 5. Distribution of dwell time on each (a) page, (b) row.

As shown in Fig. 5, the dwell time decreases with the page number and the row number. The first page and the first row have longer dwell times than the other positions. This shows that users pay more attention to images placed on the first page, especially in the first row.

7.3 Click Position Distribution

Next, we examine the position distribution of clicks. Most image search engines adopt a two-dimensional result placement instead of the linear result list of general Web search. As stated in the previous section, each result page contains 5 rows, and each of the SERPs we collected contains 5 pages of results. Figure 6 shows the distributions of click counts over rows and pages.

Clicks on the first page account for 57.68% of all clicks. A sharp decay is observed over the top 2 rows and the top 2 pages. Top-ranked images receive more clicks than lower-ranked ones, which shows that position has a strong influence on clicks. On the other hand, it may also indicate that images on the first page are more relevant and that users trust the search engine’s ranking.

Fig. 6. Distribution of click position: % of clicks on (a) N-th row, (b) N-th page.

7.4 Click Entropy Distribution

In this section, we use click entropy to answer the following questions.

  • What’s the distribution of clicks in one session?

  • Is click behavior different between users in the same query?

The click entropy is calculated as follows:

$$\begin{aligned} ClickEntropy = -\sum _{i} P_i \log (P_i). \end{aligned}$$
(1)

Clicks in One Query: In one query, \(P_i\) refers to the fraction of one user’s clicks that fall on image i: \(P_i\) = \(\frac{the\ number\ of\ clicks\ on\ image\ i}{the\ number\ of\ all\ clicks} \). We compute entropy only for queries that have at least one click. Figure 7 shows that for 72.04% of queries only one image is clicked (possibly with multiple clicks on the same image), so the entropy is zero; only a minority of queries have clicks on two or more images. In general Web search, by contrast, clicks are more dispersed: for only 32.2% of queries is a single document clicked [12].

This is possibly because there are few clicks in image search (0.89 clicks per query on average). Based on Rongwei Cen’s findings [12], if users’ clicks are definite, that is, concentrated on few results, we tend to think that users are satisfied with the search results.
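For illustration, the per-query click entropy could be computed as in the minimal sketch below. We use the usual Shannon convention with a leading minus sign, so values are non-negative, and the natural logarithm; the log base is our assumption, since the text does not specify it. The function name and image ids are hypothetical.

```python
import math
from collections import Counter


def click_entropy(clicked_image_ids):
    """Shannon entropy of one user's clicks over images within one query.

    P_i is the number of clicks on image i divided by the number of all
    clicks. Assumption: natural logarithm.
    """
    counts = Counter(clicked_image_ids)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


# Hypothetical click sequences (image ids are made up).
single_image = click_entropy(["img1", "img1", "img1"])  # repeated clicks, one image
two_images = click_entropy(["img1", "img2"])            # two images, one click each
```

Repeated clicks on a single image give an entropy of zero, which is the case covering 72.04% of queries above, while two equally clicked images give log 2.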

Fig. 7. Distribution of click entropy (considering clicks in one session).

Clicks of Different Users: Across users, we compute click entropy over all clicks and over first clicks in one query. Here \(P_i\) refers to the fraction of clicks on image i among the clicks of all users who submitted the same query: \(P_i\) = \(\frac{the\ number\ of\ clicks\ on\ image\ i}{the\ number\ of\ all\ clicks} \). We compute entropy only for queries that are submitted by at least two users.

Figure 8 shows that for only 4.7% of queries is there a single clicked image. Clicked images for the same query therefore vary greatly across users. This potentially indicates a serious challenge for click models to perform as well as they do in general Web search.
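The cross-user variant can be sketched in the same spirit: clicks from all users who submitted the same query are pooled, once using all clicks and once using only each user's first click. The record layout, names, and data below are hypothetical, and the natural logarithm with a leading minus sign is again our assumption.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (natural log) of a Counter of click counts."""
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def cross_user_entropy(records):
    """records: iterable of (user_id, query, clicked image ids in click order).

    Returns {query: (entropy over all clicks, entropy over first clicks)}
    for queries submitted by at least two distinct users.
    """
    users = defaultdict(set)
    all_clicks = defaultdict(Counter)
    first_clicks = defaultdict(Counter)
    for user_id, query, clicks in records:
        users[query].add(user_id)
        all_clicks[query].update(clicks)
        if clicks:
            first_clicks[query][clicks[0]] += 1
    return {
        q: (entropy(all_clicks[q]), entropy(first_clicks[q]))
        for q in users
        if len(users[q]) >= 2 and all_clicks[q]
    }


# Hypothetical records: two users issue "cat", one user issues "dog".
records = [
    ("u1", "cat", ["img1", "img2"]),
    ("u2", "cat", ["img1"]),
    ("u3", "dog", ["img9"]),
]
result = cross_user_entropy(records)
```

In this toy example, "dog" is excluded because only one user submitted it. For "cat", both users click "img1" first, so the first-click entropy is zero, while the all-click entropy is positive because "img2" is also clicked.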

Fig. 8. Distributions of click entropy (considering clicks in one query between different users): (a) all clicks of different users in one query, (b) the first click of different users in one query.

7.5 Behavior on Preview Page

Most image search engines provide a preview of an image after users click on it. Users can click the thumbnails or the Previous and Next buttons to view more images. They can also download the full-size image. We analyze these interaction behaviors.

Table 4 shows that the average preview duration is 4.67 min, which indicates that users spend a long time on further exploration. There is a huge difference between the average numbers of clicks on the Previous button and the Next button: few users ever look back at previous images. For 38.60% of result clicks, only one image is viewed on the preview page. This means that the user clicks on an image, views it, and does not preview any other result images on the page. In other words, 61.4% of users view more than one image on a preview page, which provides further evidence of exploratory behavior in image search.

Table 4. Preview page characteristics.

8 Conclusions and Future Work

In this paper, we carry out a log-based analysis of user behavior in image search. It includes individual query analyses, temporal distribution, click-through behavior on the result pages, and behavior on the preview page. We obtain three interesting findings. (1) We find a number of differences in search behavior between general Web search and image search. Compared to general Web search, image search users usually submit shorter query strings and their selections of query terms are more diverse. (2) We show that clicked images for the same query vary greatly across users, which potentially indicates a serious challenge for click models to perform as well as they do in general Web search. (3) We analyze users’ exploratory behavior on the preview page. 61.4% of users view more than one image on a preview page, which provides further evidence of exploratory behavior in image search.

Our study provides a comprehensive analysis of user behavior in image search. Interesting directions for future work include the impact of image content, relevance, and diversity on clicks. As mobile search is becoming larger than desktop search, we also plan to investigate user behavior in image search on mobile devices and examine the differences between mobile devices and desktop.