A Study of User Image Search Behavior Based on Log Analysis

Wu, Zhijing; Xie, Xiaohui; Liu, Yiqun; Zhang, Min; Ma, Shaoping

doi:10.1007/978-3-319-68699-8_6


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10390)

Included in the following conference series:

China Conference on Information Retrieval


Abstract

The study of user behavior in Web search helps us understand users’ search intents and improve the ranking quality of search results. To better understand users’ Web image search behavior in a practical environment, we analyze a query log collected over one week from a popular image search engine in China. We focus on individual query analyses, temporal distribution, click-through behavior on search engine result pages (SERPs), and behavior on preview pages. Compared to general Web search, image search users usually submit shorter query strings, and their selections of query terms are more diverse. We find that click-through behavior in image search differs greatly across users, and that users are more likely to perform exploratory search than in general Web search. These findings provide insights into users’ behavior in the context of image search and may benefit several aspects of image search, such as UI design, effectiveness evaluation, and ranking algorithms.

This work is supported by Natural Science Foundation of China (Grant No. 61622208, 61532011, 61672311) and National Key Basic Research Program (2015CB358700).

1 Introduction

Understanding user behavior is one of the prime concerns of Web search related studies. It provides opportunities for advertising, suggestions on the design of user interface, and assessment indicators for result relevance.

One of the most common approaches to understanding user behavior is the analysis of query logs. Previous studies on general Web search focused on individual queries, session analysis, and click positions on the SERPs [3, 8, 10]. These findings help us better understand how users use search engines to get information. However, the way results are shown in image search differs greatly from that of general Web search. Most image search results are organized in grids instead of the linear result list used in general Web search. As a result, user behavior differs substantially between general Web search and image search [4, 10]. Queries in image search have more zero hits and are more specific. Moreover, image search engines provide preview pages. The image preview page is an enlarged preview of an image, which is usually shown after the image is clicked on the SERP. On the preview page, the enlarged picture is presented together with navigation buttons (e.g. Previous, Next), a download button, and thumbnails for further exploration of the results (see Fig. 1). This interaction mechanism is significantly different from the landing page reading process of general Web search engines. If users want to view more results, they can jump to another search result directly instead of returning to the SERP. However, to the best of our knowledge, few existing works investigate the influence of preview pages on user behavior.

In this study, we analyze user behavior including individual query analyses, temporal distribution, click-through behavior on the SERPs, and behavior on preview page based on the logs collected from a popular image search engine.

To summarize, the main contributions of this work are as follows:

We find a number of differences in search behavior between general Web search and image search. Compared to general Web search, image search users usually submit shorter query strings and their selections of query terms are more diverse.
We show that clicked images for the same query vary greatly across users, which potentially indicates a serious challenge for click models to perform as well as they do in general Web search.
We analyze users’ exploratory behavior on preview pages. 61.4% of users view more than one image on a preview page, which provides further evidence of exploratory behavior in image search.

The paper is organized as follows. In the next section, we review related work. Section 3 introduces our dataset. We report the analysis results of user behavior in image search in Sects. 4–7. Finally, we discuss conclusions and future work.

2 Related Work

With the wide application of Web search engines, log analysis has become one of the most common approaches to understanding user behavior. Many previous large-scale studies of Web search exist [3, 8, 13, 14], from which we can trace the evolution of general Web search. Intents behind queries can be classified into three categories: informational, navigational, and transactional [1]. Search engines need to deal with all three types because each type is best satisfied by very different results.

Several studies characterizing general user behavior in image search have also been performed in the past [2, 4,5,6]. Their approaches take different factors into consideration, such as query length, query reformulation patterns, and search depth. André et al. [4] performed a large-scale analysis of query logs to characterize some of the differences between general Web search and image search, and derived four main characteristics that distinguish image search from its Web search counterpart. Compared to general Web search, image search leads to shorter queries, tends to be more exploratory, and requires more interactions [10]. Another related work on understanding image search behavior was conducted by Goodrum and Spink [7]. They found that image queries contained 3.74 words on average. They also reported a high percentage of unique terms. These studies help us better understand general user behavior in image search.

What do users search for in image search? A query log analysis showed that more than 20% of image search queries are related to the location, nature, and daily life categories [15]. Pu [10] classified the 1,000 most frequent image queries based on a proprietary subject-based categorization scheme and found that the majority of the queries were in the entertainment domain. Most recently, Park et al. [2] further examined the differences between head queries and long-tail queries. They looked at the query types that belong to the intersection of subject-based and facet-based categorizations to uncover more fine-grained categories that cover a significant amount of search requests. Their work sheds light on the importance of considering query categories to better understand user behavior on image search platforms.

Understanding interactive behavior on image search result pages is also of vital importance. Interactive behavior provides abundant implicit user feedback for image search engines. Smith et al. [16] presented a study that compares click-through on image searches with what has been discovered for traditional text search. They also evaluated searches for different types of images. Since the presented results are quite different (image vs. text), the findings of previous studies based on traditional text search interactions were not applicable to image search interactions.

Most of the above studies mainly focused on interactive behavior on image search result pages. The variance of click behavior between different users still lacks fine-grained analysis. Although Park et al. [2] showed several user behavior patterns across query types on preview pages, little attention has been paid to the general situation.

3 Dataset

We use 7 days of desktop image search server logs collected in March 2017. We extract four types of behavior from the log: the SERP information, click events, download events, and preview page events. The details of the data are presented in Table 1. From the SERP information, we can get the rank position of all thumbnail result images. For click events, we have the id of the clicked image. For download events, we extract the entrance information including time and image id. For preview page events, we have the event type (e.g. mouse scroll, click on the thumbnails below).

Table 1. The details of the data.

The dataset contains approximately 19.2M searches, 7.9M unique queries, 7.6M sessions, and 5.3M users (see Table 2). In this study, we also use another dataset collected in January 2017 from a popular Web search engine in China. It contains 287.5M searches and 67.9M unique queries.

Table 2. Dataset statistics.

4 Query Distributions

In this section, we analyze the query distributions from different aspects, including query length, query frequency, and the usage of advanced search. We also make a comparison between image search and general Web search.

4.1 Query Length

As the first step of our analysis, we look at the number of words and characters per query. Here, the number of words refers to the number of tokens separated by spaces. As shown in Fig. 2, a query contains 5.69 characters on average. Among all the queries, 6.3% contain only one word and each query contains only 1.05 words on average. This means that the average length of a word used in image search is $5.69/1.05 \approx 5.42$ characters, which is much larger than the average length of a Chinese word. Users usually concatenate two or more Chinese words into the query string instead of separating them with spaces. The search engine should therefore run an effective word segmentation step after receiving a query.

In Web search, by comparison, we find that the average number of words per query is 6.64. Image search users thus put less effort into entering a query string, and the image search engine gets even less information about what users really want to search for. Therefore, we need to pay more attention to the analysis of user behavior in image search, which helps us understand users’ search intent and improve the quality of search results.
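The length statistics used in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `query_length_stats` and the sample queries are hypothetical, words are taken to be space-separated tokens as described above, and whitespace is assumed not to count as a character (the paper does not say how spaces are counted).

```python
def query_length_stats(queries):
    """Return mean characters, mean space-separated words, and one-word share.

    Assumption: whitespace is excluded from the character count.
    """
    if not queries:
        raise ValueError("queries must not be empty")
    # A "word" is a token separated by whitespace, as in the text above.
    word_counts = [len(q.split()) for q in queries]
    char_counts = [sum(1 for ch in q if not ch.isspace()) for q in queries]
    n = len(queries)
    return {
        "mean_chars": sum(char_counts) / n,
        "mean_words": sum(word_counts) / n,
        "one_word_share": sum(1 for w in word_counts if w == 1) / n,
    }


# Hypothetical sample: two unsegmented Chinese queries and one with a space.
sample = ["北京天气", "猫 图片", "壁纸"]
stats = query_length_stats(sample)
# Average characters per word, analogous to the 5.69 / 1.05 ratio in the text.
chars_per_word = stats["mean_chars"] / stats["mean_words"]
```

A high characters-per-word ratio on real data would be the signal discussed above that users concatenate several words without spaces.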

4.2 Query Frequency

We also focus on the query frequency distribution. We get how many times each unique query was submitted in the 7 days.

As expected, a small number of queries account for a large part of all queries. At the head of the distribution, the top 0.17% of unique queries occur more than 100 times each and account for 26.12% of all queries; the top 3.5% of unique queries account for 50% of all queries, and the top 23% account for 70%. The distribution also has a long tail: 78.14% of unique queries were submitted only once, and they account for 30.41% of all queries.

This is very similar to the findings of Park et al. for an English-language environment [2]: 75% of unique queries were issued only once, and they account for approximately 25% of the traffic in the sample; the other 25% of queries account for 75% of all traffic. In Web search, by contrast, only the top 5.99% of unique queries account for 70% of all queries, which suggests that Web search users’ selections of query terms are less diverse.

This shows that a large part of the queries submitted by users are repeated, and that a small number of queries account for a large part of users’ needs. If the search engine pays more attention to improving the ranking quality of results for those popular queries, users’ satisfaction will improve significantly.
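The head and tail figures in this section can be derived from per-query submission counts. The following Python sketch shows one way to do so; the function names and the toy log are hypothetical, and rounding the size of the "top fraction" up to a whole number of queries is an assumption rather than the procedure used in the paper.

```python
import math
from collections import Counter


def traffic_share_of_top(queries, top_fraction):
    """Share of all submissions covered by the most frequent unique queries.

    `top_fraction` is a fraction of the unique queries (e.g. 0.035 for 3.5%).
    """
    counts = sorted(Counter(queries).values(), reverse=True)
    # Assumption: round up so that at least one unique query is included.
    k = max(1, math.ceil(top_fraction * len(counts)))
    return sum(counts[:k]) / len(queries)


def singleton_stats(queries):
    """Return (share of unique queries seen once, their share of all submissions)."""
    counts = Counter(queries)
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / len(counts), singles / len(queries)


# Hypothetical toy log: 10 submissions, 4 unique queries.
log = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
head_share = traffic_share_of_top(log, 0.25)   # top 1 of 4 unique queries
unique_share, traffic_share = singleton_stats(log)
```

On this toy log the single most frequent query covers 60% of submissions, while the queries submitted only once make up half of the unique queries but only 20% of the traffic.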

4.3 Usage of Advanced Search

Users can invoke advanced search by adding specific words and symbols (such as “and”, “or”, “not”, ‘+’) to the query string. In 2007, advanced search was reported to account for only 0.73% of queries [3]. In our analysis, the percentage has grown substantially, to 4.57% in image search and 6.06% in Web search.

5 Temporal Distribution

Next, we focus on the time of day at which users use the image search engine on desktop. Figure 3 illustrates the distribution of queries over the hours of the day. It shows that the majority of desktop image searches occur from 9AM to 5PM, which are normal working hours. Interestingly, this is very similar to the statistics reported by Song et al. for Web search: the majority of desktop searches occur from 8AM to 5PM [11]. The number of searches decreases significantly during lunch time. We also compare the time distribution between weekends and workdays. The number of searches from 9AM to 5PM on a workday is clearly greater than that on a weekend day. Desktop searchers tend to use the image search engine at work.
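A distribution of this kind can be computed by bucketing query timestamps by hour. The sketch below is illustrative only: the function names and timestamps are hypothetical, and the 9AM to 5PM window is represented as hours 9 through 16, which is an assumption about how the interval is bounded.

```python
from collections import Counter
from datetime import datetime


def hourly_share(timestamps):
    """Return a 24-element list with the share of queries issued in each hour."""
    if not timestamps:
        raise ValueError("timestamps must not be empty")
    counts = Counter(ts.hour for ts in timestamps)
    total = len(timestamps)
    return [counts.get(hour, 0) / total for hour in range(24)]


def working_hours_share(timestamps, start=9, end=17):
    """Share of queries issued in the half-open hour range [start, end)."""
    return sum(hourly_share(timestamps)[start:end])


# Hypothetical timestamps, one query at each listed hour of a single day.
times = [datetime(2017, 3, 6, h, 0) for h in (8, 9, 12, 16, 17, 22)]
share = working_hours_share(times)  # queries at 9, 12 and 16 fall in the window
```

Running the same computation separately on workday and weekend timestamps would give the comparison described above.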

6 Session Characteristics

A session is “a series of queries by a single user made within a small range of time”, as defined by Silverstein et al. [8]. In this section, we partition a user’s actions into separate sessions when the time between consecutive actions exceeds 30 min [2]. We look at the distribution of queries per session and the average number of sessions per user in one day.

As shown in Fig. 4, the average number of queries per session is 2.54. 56.6% of sessions contain only one query, and 91% of sessions contain fewer than 6 queries. This is similar to the previously published result of 2.95 queries per session [2].

In total, 5,255,291 users used the image search engine during the 7 days. The average number of queries per user is 3.66, and the average number of sessions per user is 1.44. 78.9% of users used the image search engine only once in the 7 days.
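The 30-minute rule used to partition a user's actions into sessions can be sketched as follows. The function name `split_sessions` and the example timestamps are hypothetical; the sketch assumes that a gap of exactly 30 minutes does not start a new session, since the text only says the gap must exceed 30 min.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def split_sessions(action_times, gap=SESSION_GAP):
    """Group one user's action timestamps into sessions.

    A new session starts when the time since the previous action exceeds `gap`.
    """
    sessions = []
    for ts in sorted(action_times):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap exceeded (or first action)
    return sessions


# Hypothetical actions of a single user on one day.
day = datetime(2017, 3, 6)
actions = [day.replace(hour=10, minute=m) for m in (0, 10, 45, 50)]
actions.append(day.replace(hour=12, minute=0))
sessions = split_sessions(actions)
session_sizes = [len(s) for s in sessions]
```

Here the 35-minute gap between 10:10 and 10:45 and the 70-minute gap before 12:00 each open a new session, giving three sessions. Averaging the session sizes over all users yields a per-session statistic of the kind reported above, provided the timestamps are restricted to query submissions.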

7 Interaction with Search Results

Understanding users’ interaction with search results helps make search intent explicit. Users’ interaction behavior can be used by search engine as relevance feedback data. In this section, we analyze browsing depth, the position distribution of clicks and click entropy distribution.

7.1 Browsing Depth

There are 5 rows of images on each page in the data we use. As users scroll down the result page, images are loaded page after page automatically. We define browsing depth as the number of result pages a user explores during one query. Table 3 shows the distribution of browsing depth. We find that image search has deeper browsing depths than Web search. Experiments show that 85% of users view only the first page of results in Web search [3].

Unlike image search, where results are loaded automatically, general Web search engines show only 10 results per page. Users must click on the next page button if they want to browse more results, so rather than viewing more results, they tend to change the query string to get new results. This also indicates that image search is more exploratory than text search.

Table 3. Distribution of browsing depth.

7.2 Dwell Time

In our previous work, we conducted an eye-tracking study to investigate users’ examination behavior in image search [17]. Based on the fixation data collected in the eye-tracking experiment, in this paper we calculate the dwell time for images on each page and row and plot users’ dwell time distribution in Fig. 5.

As is shown in Fig. 5, the dwell time decreases with the page number and row number. The first page and first row have longer dwell time than the other positions. It shows that users pay more attention to images placed in the first page, especially in the first row.

7.3 Click Position Distribution

Next, we examine the position distribution of clicks. Most image search engines adopt a two-dimensional result placement instead of the linear result list used in general Web search. As stated in the previous section, each result page contains 5 rows, and each of the SERPs we collected contains 5 pages of results. Figure 6 shows the distributions of click counts over rows and pages.

Clicks on the first page account for 57.68% of all clicks. A sharp decay is observed over the top 2 rows and top 2 pages. Top-ranked images receive more clicks than lower-ranked ones, which shows that position has a strong influence on clicks. On the other hand, it may also indicate that results on the first page are more relevant and that users trust the search engine’s ranking.

7.4 Click Entropy Distribution

In this section, we use click entropy to answer the following questions.

What’s the distribution of clicks in one session?
Is click behavior different between users who submit the same query?

The click entropy is calculated as follows:

$$\begin{aligned} ClickEntropy = -\sum _{i} P_i \log (P_i). \end{aligned} \tag{1}$$

Clicks in One Query: Within one query, $P_i$ is the proportion of one user’s clicks that fall on image $i$: $P_i = \frac{\text{number of clicks on image } i}{\text{number of all clicks}}$. We compute entropy only for queries that have at least one click. Figure 7 shows that for 72.04% of queries only one image is clicked (possibly with multiple clicks on the same image). Only a few queries have clicks on two or more images. In general Web search, by contrast, clicks are more dispersed: only one document is clicked for just 32.2% of queries [12].

One possible reason is that clicks are rare in image search (0.89 clicks per query on average). Based on the findings of Cen et al. [12], if users’ clicks are definite, we tend to think users are satisfied with the search results.

Clicks of Different Users: Across users, we compute click entropy both over all clicks and over the first click in each query. Here $P_i$ is the proportion of clicks on image $i$ among the clicks of all users who submit the same query: $P_i = \frac{\text{number of clicks on image } i}{\text{number of all clicks}}$. We compute entropy only for queries that are submitted by at least two users.

Figure 8 shows that for 4.7% of queries there is only one image clicked. Clicked images for the same query vary greatly across users. It potentially indicates a large challenge for click models to perform as well as they do in general Web search.
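Both uses of click entropy in this section reduce to the same computation over a list of clicked image ids. The sketch below is a minimal illustration, assuming the conventional entropy form $-\sum_i P_i \log P_i$ and a base-2 logarithm (the paper does not state the base); the function name and image ids are hypothetical.

```python
import math
from collections import Counter


def click_entropy(clicked_image_ids):
    """Entropy of the click distribution over images.

    P_i is the number of clicks on image i divided by the number of all clicks.
    Assumption: base-2 logarithm.
    """
    counts = Counter(clicked_image_ids)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one click is required")
    # sum(p * log2(1/p)) equals -sum(p * log2(p)).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# One user, one query: repeated clicks on a single image give zero entropy.
single_image = click_entropy(["img1", "img1", "img1"])
# Pooled clicks of several users on the same query, spread over four images.
across_users = click_entropy(["img1", "img2", "img3", "img4"])
```

An entropy of zero corresponds to the "only one image clicked" case reported for Figs. 7 and 8, while larger values mean clicks are spread over more images. For the cross-user analysis, the input would be the pooled clicks of all users who submitted the query.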

7.5 Behavior on Preview Page

Most of the image search engines provide preview of an image for users after clicking on it. Users can click the thumbnail, previous and next button to view more images. They can also download the full-size image. We make an analysis on these interaction behaviors.

Table 4 shows that the average preview duration is 4.67 min, which illustrates that users spend a long time on further exploration. There is a huge difference between the average number of clicks on the previous button and on the next button: few users ever look back at previous images. For 38.60% of result clicks, only one image is viewed on the preview page. This means that a user clicks on an image, views the image, and does not preview any other result images on the page. In other words, 61.4% of users view more than one image on a preview page, which provides further evidence of exploratory behavior in image search.

Table 4. Preview page characteristics.

8 Conclusions and Future Work

In this paper, we carry out a log-based analysis of user behavior in image search. It includes individual query analyses, temporal distribution, click-through behavior on the result pages, and behavior on preview pages. We obtain three interesting findings. (1) We find a number of differences in search behavior between general Web search and image search. Compared to general Web search, image search users usually submit shorter query strings and their selections of query terms are more diverse. (2) We show that clicked images for the same query vary greatly across users, which potentially indicates a serious challenge for click models to perform as well as they do in general Web search. (3) We analyze users’ exploratory behavior on preview pages. 61.4% of users view more than one image on a preview page, which provides further evidence of exploratory behavior in image search.

Our study makes a comprehensive analysis on user behavior in image search. Interesting directions for future work include the impact of image content, relevance, and diversity on the click. As mobile search is getting larger than desktop search, we also plan to investigate user behavior in image search on mobile devices and examine the difference between mobile devices and desktop.

References

1. Broder, A.: A taxonomy of web search. SIGIR Forum 36(2), 3–10 (2002)
2. Park, J.Y., O’Hare, N., Schifanella, R., Jaimes, A., Chung, C.: A large-scale study of user image search behavior on the web. In: Proceedings of CHI (2015)
3. Huijia, Y., Liu, Y., Zhang, M., Liyun, R., Ma, S.: Research in search engine user behavior based on log analysis (in Chinese). J. Chin. Inf. Process. 21(1), 109–114 (2007)
4. André, P., Cutrell, E., Tan, D.S., Smith, G.: Designing novel image search interfaces by understanding unique characteristics and usage. In: Gross, T., Gulliksen, J., Kotzé, P., Oestreicher, L., Palanque, P., Prates, R.O., Winckler, M. (eds.) INTERACT 2009. LNCS, vol. 5727, pp. 340–353. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03658-3_40
5. Goodrum, A., Spink, A.: Visual information seeking: a study of image queries on the world wide web. In: Proceedings of the ASIS Annual Meeting, vol. 36, pp. 665–674. ERIC (1999)
6. O’Hare, N., de Juan, P., Schifanella, R., He, Y., Yin, D., Chang, Y.: Leveraging user interaction signals for web image search. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 559–568. ACM (2016)
7. Goodrum, A., Spink, A.: Image searching on the excite web search engine. Inf. Process. Manage. 37(2), 295–311 (2001)
8. Silverstein, C., Henzinger, M., Marais, H., et al.: Analysis of a very large web search engine query log. SIGIR Forum 33(1), 6–12 (1999)
9. Kamvar, M., Baluja, S.: A large scale study of wireless search behavior: Google mobile search. In: Proceedings of SIGCHI Conference on Human Factors in Computing Systems, pp. 701–709. ACM Press (2006)
10. Pu, H.-T.: A comparative analysis of web image and textual queries. OIR 29(5), 457–467 (2005)
11. Song, Y., Ma, H., Wang, H., Wang, K.: Exploring and exploiting user search behavior on mobile and tablet devices to improve search relevance. In: Proceedings of 22nd International Conference on World Wide Web, pp. 1201–1212, Rio de Janeiro, ACM (2013)
12. Cen, R., Liu, Y., Zhang, M., Liyun, R., Ma, S.: Reliability analysis for the behavior of web retrieval users. J. Softw. 21(5), 1055–1066 (2010)
13. Jansen, B.J., Spink, A., Bateman, J., Saracevic, T.: Real life information retrieval: a study of user queries on the web. SIGIR Forum 32(1), 5–17 (1998)
14. Spink, A., Jansen, B., Wolfram, D., Saracevic, T.: From E-sex to E-commerce: web search changes. IEEE Comput. 35(3), 107–110 (2002)
15. Zhang, L., Chen, L., Jing, F., Deng, K., Ma, W.: Enjoyphoto: a vertical image search engine for enjoying high-quality photos. In: MM 2006, pp. 367–376 (2006)
16. Smith, G., Brien, C., Ashman, H.: Evaluating implicit judgments from image search clickthrough data. JASIST 63, 2451–2462 (2012)
17. Xie, X., Liu, Y., Wang, X., Wang, M., Wu, Z., Wu, Y., Zhang, M., Ma, S.: Investigating examination behavior of image search users. In: The 39th ACM SIGIR International Conference on Research and Development in Information Retrieval (2017)

Author information

Authors and Affiliations

State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing, China
Zhijing Wu, Xiaohui Xie, Yiqun Liu, Min Zhang & Shaoping Ma

Corresponding author

Correspondence to Yiqun Liu.

Editor information

Editors and Affiliations

Renmin University, Beijing, China
Jirong Wen
Université de Montréal, Montreal, Canada
Jianyun Nie
East China University of Science and Technology, Shanghai, China
Tong Ruan
Tsinghua University, Beijing, China
Yiqun Liu
Wuhan University, Wuhan, Hubei, China
Tieyun Qian

About this paper

Cite this paper

Wu, Z., Xie, X., Liu, Y., Zhang, M., Ma, S. (2017). A Study of User Image Search Behavior Based on Log Analysis. In: Wen, J., Nie, J., Ruan, T., Liu, Y., Qian, T. (eds) Information Retrieval. CCIR 2017. Lecture Notes in Computer Science, vol 10390. Springer, Cham. https://doi.org/10.1007/978-3-319-68699-8_6

DOI: https://doi.org/10.1007/978-3-319-68699-8_6
Published: 21 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68698-1
Online ISBN: 978-3-319-68699-8
eBook Packages: Computer Science, Computer Science (R0)
