1 Introduction

It has been widely recognized that when people are looking for information relating to one topic, they may accidentally discover useful or interesting information relating to another topic. That is, low or no involvement and low or no expectation can also result in the acquisition of information [1]. This phenomenon is known as information encountering (IE). Nowadays, information overload and the abundance of ways to access information make IE unavoidable and ubiquitous [2]. Despite decades of research efforts to conceptualize, model, and interpret IE, there is still a lack of context-specific studies that can yield more practical implications for developing IE-friendly environments.

Social question and answer (Q&A) sites, such as Quora and Zhihu, essentially support a questioning-based information seeking process in which users pose their information needs as natural-language questions to a community and receive targeted answers from peer users who are willing to share their knowledge [3]. Often, social Q&A sites categorize questions into topics and allow commenting on answers to ensure objectivity and quality [4]. As typical social media, they provide a vigorous environment where pivotal navigation is enabled among various elements, including questions, answers, users, topics, and comments [5].

Notably, social Q&A sites demonstrate environmental characteristics that may facilitate the occurrence of IE, such as enabling exploration, being trigger-rich, highlighting triggers, enabling connections, and leading to the unexpected [6]. However, the phenomenon of IE has been largely ignored in the existing literature on social Q&A. To fill this research gap, this study aims to reveal how people encounter information in this particular context. Real users’ IE experiences on social Q&A sites were collected through a diary study combined with the critical incident technique (CIT), and a context-specific IE process model was established based on both qualitative and quantitative analysis of the incidents.

2 Literature Review

2.1 IE Process Models

A number of models have been created to describe the process of serendipity or IE. McCay-Peet and Toms’s [7] model of the serendipity process embodies the following components: a search for a solution to Task A, precipitating conditions, a bisociation between previously unconnected pieces of information, a trigger that activates the bisociation, and unexpected solutions to both Tasks A and B. Recently, they consolidated several previous models into a new one that consists of trigger, connection, follow-up, valuable outcome, unexpected thread, and perception of serendipity. In particular, the trigger is “a verbal, textual, or visual cue that initiates or sparks an individual’s experience of serendipity” [8].

Existing IE process models place an emphasis on users’ behavioral characteristics that are observable. Erdelez [9] indicated that IE is embedded within a high-level process of information seeking. A typical IE episode contains five steps, i.e. noticing, stopping, examining, capturing, and returning. A further development is the integrated model of online IE which provides a global view of the three phases respectively accommodating the pre-, mid-, and post-activities of IE. Specifically, IE may happen during online browsing, searching, or social interaction; the acquisition of interesting/useful information is preceded by noticing an information stimulus and examining the content; and the encountered information may be explored further, used immediately, saved, and/or shared. The stimulus, which can be identified and consumed within an instant, is a navigational representation of the content [10].

The above models have laid a solid theoretical foundation for IE research. Nevertheless, the environment has an important impact on what users will do by defining what they can do. As IE research evolves, context-specific models are needed to reflect more accurately the subdivisions of the field.

2.2 IE on Social Media

Thanks to the broad coverage of topics, natural style of human-to-human querying, and enriched means of participation and interaction, social Q&A sites are gaining popularity among information seekers [4]. Previous related studies can be categorized into two major streams: content-centered and user-centered [11].

Although the IE phenomenon on social Q&A sites has been neglected, IE researchers have considered the more general context of social media. In Dantonio et al. [12], interviewees reported that they had come across content serendipitously when undertaking unfocused browsing on academic social media. Panahi et al. [13] found through interviews with physicians that social media supports IE in six ways: publicizing, dissemination, personalization, keeping up to date, documentation, and retrieving.

Users might share their everyday serendipity experiences on blogging or microblogging services. For example, Rubin et al. [14] retrieved naturally occurring accounts of chance encounters from GoogleBlog, Bogers and Björneborn [15] crawled a large quantity of tweets containing the word “serendipity” on Twitter, and Tsai [16] used selected tweets to stimulate IE on Twitter in a laboratory setting. In addition, microblogging provides context for the discussion of serendipitous learning, which occurs while browsing through the stream of social updates [17].

Millions of questions and answers have been accumulated on mature social Q&A sites, and they are usually searchable with internal search tools and browsable by topical classifications. User ratings and comments are aggregated to rank and recommend popular questions, best answers, and active users. The high visibility and accessibility of available content greatly increase the chance of IE. This study is interested in how IE occurs on social Q&A sites.

3 Methods

3.1 Research Setting

This study chose Zhihu (http://www.zhihu.com/), the most influential Chinese-language social Q&A site, as the research setting. It has attracted 65 million registered users, including 18.5 million daily active users. In 2016, more than 6 million questions were posted to the site, engendering 23.33 million answers. Zhihu was chosen not only for its prevalence, but also for its representativeness as an IE-prone environment that provides various triggers.

Zhihu consists of three basic page categories, i.e. Q&A pages, topic pages, and user pages. Q&A pages display questions and their affiliated answers and comments, together with the users who ask, answer, or comment. Comments may be made on questions and/or answers. One can also find system-recommended related questions and user-created collections of questions on Q&A pages. Topic pages exist mainly for navigational purposes, allowing users to browse for questions within a loosely structured multi-level hierarchy. Topics are in fact system-supplied tags (e.g. “art” and “health”) that askers select to describe their questions upon submission. Finally, user pages show users’ profiles and their activities on the site, such as asking, answering, sharing, following, and collecting. It should be mentioned that Zhihu has also introduced a column feature where users can contribute thematic articles.

3.2 Data Collection

The CIT has been widely employed to capture people’s experiences and perceptions in IE research. It “outlines procedures for collecting observed incidents having special significance and meeting systematically defined criteria” [18]. The CIT is commonly applied in interviews, but the integrity and accuracy of the descriptions provided by interviewees may be affected by their ability to retrieve the incidents from memory. Equally important, the quantity of IE incidents collected through time-consuming and cost-intensive interviews may be very limited, varying from 20 to 30 incidents per study.

This study instead combined the CIT with diaries for data collection. Diary studies ask participants to record the event of interest as soon as it happens, which makes them especially suitable for capturing information that changes easily and is hard to discern. Diaries are also superior to interviews in terms of both sample representativeness and size due to the absence of geographic restrictions [19]. Participants were recruited via multiple channels with small incentives. They were asked to complete an online questionnaire (https://sojump.com/jq/13713854.aspx) as soon as they encountered information on Zhihu.

The questionnaire is divided into three parts. The first part (Q1) begins with a brief description of IE according to Erdelez’s [1] definition of the concept. A couple of real-world examples of IE incidents are also provided, and the participants are invited to give a similar account of their own IE incidents on Zhihu. Such free-style narration may include unexpected interesting facts about their IE experiences, but it may also lack desirable details. The second part (Q2–Q17) therefore asks a series of questions that help the participants recall and think about their experiences in terms of smaller components. These questions were created and arranged based mainly on the models by Erdelez [1], Jiang et al. [10], and McCay-Peet and Toms [8] as mentioned above. This part will be described in more detail in the next section. The last part (Q18–Q22) of the questionnaire collects the participants’ background information.

3.3 Data Analysis

The CIT-based diary study was conducted between July 28th and August 14th, 2016. A total of 163 IE incidents were collected, but 55 were eliminated for containing contradictory and/or inaccurate information. The remaining 108 incidents were contributed by 83 participants, as one participant might submit multiple responses over time. The top participant contributed 7 incidents, while the majority contributed only one. The narrative descriptions of the IE incidents were analyzed in combination with the answers provided for the questions in the second part, which can be divided into 6 major sections:

  • Q2 to Q4 were created based on the pre-activities phase in Jiang et al.’s [10] model, including the type of the foreground activity, its urgency, and the participant’s emotional state;

  • The inclusion of Q5 to Q8 reflects the importance of a trigger [8] or a stimulus [10] that diverts people’s attention from the foreground activity. Q5 and Q7 ask respectively about the noticing and stopping steps in Erdelez’s [9] model, with Q6 and Q8 eliciting the reasons behind the actions;

  • The examining of the content associated with the trigger or stimulus considered in all three models is explored through Q9 to Q12, including the specific content examined as well as its relevance to the foreground activity and value type and level;

  • Q13 focuses on the capturing step in Erdelez’s [9] model or the post-activities phase in Jiang et al.’s [10] model, i.e. how one deals with the encountered information;

  • Q14 focuses on the returning step in Erdelez’s [9] model, i.e. what one does after IE;

  • Q15 and Q16 collect one’s overall IE frequency and attitude.

This division provided a natural framework for the content analysis of the 108 valid IE incidents collected in this study. The qualitative data analysis tool NVivo 10 was employed to perform the analysis. The written texts exported from each response were unitized according to the above sections, and concepts were extracted and categorized section by section. In addition, the statistical analysis software package SPSS was employed to perform independent-samples t-tests and correlation analysis.
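The unitization step can be illustrated with a small lookup from question number to analysis section. The following Python sketch is an assumption-laden illustration, not the NVivo workflow used in the study: the section labels and the `section_of` and `unitize` helpers are hypothetical names, and Q17, which is not assigned to any of the six sections above, is left as "unassigned".

```python
# Hypothetical coding frame: the labels are illustrative names for the six
# sections of the second questionnaire part described in the text.
SECTIONS = {
    "foreground_activity": range(2, 5),   # Q2-Q4
    "noticing_stopping": range(5, 9),     # Q5-Q8
    "examining": range(9, 13),            # Q9-Q12
    "capturing": range(13, 14),           # Q13
    "follow_up": range(14, 15),           # Q14
    "overall_ie": range(15, 17),          # Q15-Q16
}


def section_of(question: int) -> str:
    """Return the section label for a question number, or 'unassigned'."""
    for name, numbers in SECTIONS.items():
        if question in numbers:
            return name
    return "unassigned"


def unitize(response: dict) -> dict:
    """Group one response's answers (question number -> text) by section."""
    units = {}
    for question, answer in response.items():
        units.setdefault(section_of(question), {})[question] = answer
    return units


# Made-up response fragment, for illustration only.
example = {2: "browsing the homepage", 5: "a question title", 13: "collected it"}
print(unitize(example))
```

Grouping answers this way keeps each incident's narrative and its structured answers aligned section by section before concepts are extracted.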

4 Results

In general, the 83 participants of the diary study were well-educated young people (pursuing a bachelor’s or master’s degree, N = 74, 89.16%) aged between 19 and 25 (N = 68, 81.93%), with slightly more females (N = 47, 56.63%) than males (N = 36, 43.37%). Most of them (N = 68, 81.93%) were familiar with Zhihu (4 points and above). Their information seeking on this social Q&A site featured keyword searching (N = 77, 92.77%), monitoring followed sources (N = 68, 81.93%), and browsing popular (N = 55, 66.27%) and personalized recommendations (N = 52, 62.65%), while direct question asking (N = 28, 33.73%) and topic classification browsing (N = 17, 20.48%) were less frequently seen. Based on their frequencies of IE on Zhihu, “encounterers” (4–5 points) dominated (N = 46, 55.42%), followed by “occasional encounterers” (1–3 points, N = 28, 33.73%) and “super encounterers” (6–7 points, N = 9, 10.84%). The majority (N = 75, 90.36%) had a positive attitude toward IE (4 points and above).

4.1 Foreground Activities

Foreground activities are the activities that people intentionally or consciously get themselves involved in (Q2). All of the above mentioned information seeking activities were found to have provided context for IE to occur on Zhihu, and they basically divide into:

  • Purposeless scanning (N = 46, 42.59%) usually involves no specific need or goal. When visiting Zhihu as a daily habit or pastime during their spare time, the participants tended to browse popular or personalized recommendations provided on the homepage.

  • Purposeful searching/browsing (N = 43, 39.81%) is a goal-driven activity that may in turn belong to a higher-level task. Some participants would distinguish “searching” from “browsing”, while others used the general term “looking for”.

  • Monitoring (N = 19, 17.59%) refers to following a topic, a question, or a user of interest to obtain updates. Most related incidents explicitly indicated a source that had been identified at an earlier time and was checked regularly for updates.

When engaged in the foreground activities, the participants were in general in a relatively positive emotional state (Q3, M = 4.72) and did not feel the activities to be very urgent (Q4, M = 2.88). Independent-samples t-tests revealed significant differences between searching/browsing and scanning for both emotional state (t = 2.088, p = .04) and urgency level (t = 2.771, p = .007). In addition, the participants’ emotional state in monitoring was significantly more positive than that in scanning (t = 3.228, p = .002).
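For readers unfamiliar with the test, the independent-samples t statistic can be sketched in a few lines of Python. This is a minimal illustration that assumes the pooled-variance (equal variances) form; the study used SPSS, and which variant was reported is not stated. The `pooled_t` helper is a hypothetical name and the ratings are made-up values, not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's independent-samples t statistic (pooled variance) and its df."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df


# Made-up 7-point ratings for two foreground activities (illustration only).
searching_browsing = [5, 6, 4, 5, 5, 6]
scanning = [4, 5, 3, 4, 4, 5]

t, df = pooled_t(searching_browsing, scanning)
print(f"t = {t:.3f}, df = {df}")
```

The p value would then be read from the t distribution with `df` degrees of freedom, which statistical packages such as SPSS report directly.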

4.2 Noticing and Stopping

Embedded in the above foreground activities are various information stimuli from the basic elements of Zhihu. If the stimuli are not components of the foreground activities or closely related, users need to stop the foreground activities, probably temporarily, so as to deal with the stimuli, which initiates the process of IE [9]. The information stimuli involved in the IE incidents (Q5) present themselves in different formats:

  • Texts (N = 95, 87.96%) are words, phrases, and short sentences that convey linguistic meanings. Such stimuli might appear in question titles and descriptions (N = 40), answers (N = 34), topic tags and descriptions (N = 11), column articles (N = 7), user profiles (N = 2), and comments (N = 1).

  • Images (N = 22, 20.37%) as stimuli took two primary forms: affiliated pictures (N = 15) which were inserted into answers, question descriptions, or column articles to support surrounding texts, and avatars or icons (N = 7) used to recognize specific users or topics.

  • Numbers (N = 15, 13.89%) refer to how often the questions, answers, topics, users, and articles have been liked, followed, and/or commented on. Zhihu keeps track of the liking, following, and commenting activities and generates total counts to imply popularity. High popularity tended to attract special attention.

Nevertheless, in some uncommon situations an information stimulus is not a necessity. For example, participants P51, P54, and P57 reported that they clicked on some random links by mistake but opened pages that happened to contain desirable information. These situations should be deemed pure chance.

The participants were attracted by the stimuli for two major reasons (Q6). First, the stimuli were perceived to be interesting (N = 71, 65.74%): they might evoke curiosity through novelty or evoke resonance through commonality. Second, the stimuli were perceived to be useful (N = 37, 34.26%): they might be connected to existing problems in one’s life or work or to one’s current feelings that needed adjustment.

Stopping the foreground activities is a reaction to the stimuli. The stopping act may last for a very short time before the next tangible act, i.e. clicking. According to the responses to Q7, the participants clicked on the stimuli immediately in most incidents (N = 83, 76.85%), while clicking on the stimuli after a moment was much less frequent (N = 25, 23.15%).

Q8 elicits the reasons for the difference in stopping durations. Immediate clicking was explained in multiple ways: Zhihu was highly trusted for its secure environment and high-quality information; the value of the stimuli was easy to recognize; and the stimuli helped escape from the fatigue of prolonged purposeful searching/browsing or the boredom of prolonged purposeless scanning. In contrast, deferred clicking allowed time for the participants to think so that they could search in their memory for existing problems to which the stimuli were relevant. It is also possible that they hesitated to click for fear of hindering the foreground activities or just because the value of the stimuli was doubtful.

4.3 Examining

Upon accessing the information content that a stimulus represents and is usually hyperlinked to, one may examine it through various mental actions in order to determine the relevance, quality, and value of the content. This study found by analyzing the responses to Q9 that all basic elements of Zhihu had been examined as content encountered:

  • Answers (N = 91, 84.26%). The commonest stimulus-content pair is question-answer (N = 40). The participants tended to be attracted by question titles or descriptions and then examined the affiliated answers. Also frequently seen is the answer-answer pair (N = 35).

  • Questions (N = 31, 28.70%). The top pairs in this category include answer-question (N = 13), topic-question (N = 8), and question-question (N = 5). The descriptions of questions would be examined in more detail in response to the stimuli from their answers, topics, or related questions.

  • Users (N = 30, 27.78%). Once a question or an answer caught one’s attention, it was naturally desirable to know more about its contributor, thus resulting in examining his or her profile and activities.

  • Topics (N = 26, 24.07%). As questions and answers are typical semantic objects, another inclination after noticing a question or an answer was to examine the descriptions of the topics to which it had been assigned.

  • Other (N = 12, 11.11%). This mainly refers to examining the content of a column article as preceded by noticing the title or the number of likes of the article.

  • Comments (N = 10, 9.26%). The examination of comments usually resulted from the noticing of questions or answers.

The participants considered the above information content not very relevant to their foreground activities (Q10, M = 3.86), suggesting that the incidents provided basically met the unexpectedness criterion of IE. A significant difference was found between searching/browsing and scanning in terms of relevance (t = 2.277, p = .025). This is understandable because relevance was evaluated against purpose: the weaker the purpose, the lower the relevance.

The value of the content (Q11) consists either in its usefulness in helping solve a problem or in its interestingness in satisfying curiosity, each explaining half of the incidents (N = 54, 50.00%). The value of the content was evaluated to be high (Q12, M = 5.30), with the value scored 4 points and above in the vast majority of the incidents (N = 103, 95.37%). The interesting content demonstrated significantly higher value than the useful content (t = 2.072, p = .008). Furthermore, content value is positively related to the relevance to foreground activities, but only moderately (r = .317, p = .001).
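The correlation reported above is a Pearson coefficient, which can be sketched as follows. This is an illustrative Python snippet rather than the SPSS procedure used in the study; the `pearson_r` helper is a hypothetical name and the paired ratings are made-up values. Only the final line uses a figure from the text: squaring r = .317 gives the proportion of variance the two ratings share.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up 7-point ratings of relevance (Q10) and value (Q12), illustration only.
relevance = [2, 3, 4, 5, 6, 3, 4]
value = [5, 4, 6, 6, 7, 5, 5]
print(f"r = {pearson_r(relevance, value):.3f}")

# Shared variance implied by the reported coefficient r = .317.
shared_variance = 0.317 ** 2
print(f"r^2 = {shared_variance:.3f}")
```

An r of .317 therefore corresponds to roughly 10% shared variance, which is consistent with describing the relationship as only moderate.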

4.4 Capturing

Zhihu is a well-established social Q&A site, and it enables users to capture encountered information in the following modes by providing corresponding functionalities (Q13):

  • Collecting (N = 71, 65.74%) is the most popular way of capturing. Most participants (N = 65) collected encountered answers or column articles to their Zhihu Favorites, while a few (N = 6) bookmarked the pages carrying the encountered content in their Web browsers.

  • Exploring (N = 47, 43.52%) can be considered as extended efforts to enhance the examination of the content. The participants proceeded to exploration in order to understand the content more deeply.

  • Sharing (N = 41, 37.96%) involves sending encountered answers, questions, topics, and column articles to the social networking service WeChat or the microblogging service Sina Weibo, as supported directly by Zhihu (N = 15). Other content was shared by sending the page URLs via Web browser functions (N = 13). However, the channels and/or targets of sharing were not specified in the remaining incidents.

  • Using (N = 29, 26.85%) is applying immediately the encountered information to satisfy existing needs or personal interests.

  • Saving (N = 12, 11.11%) means storing the desirable content as texts or screenshots on the local disks of computers (N = 1) and cell phones (N = 1), in online cloud storage (N = 1), or in other unspecified places (N = 9).

  • Following (N = 7, 6.48%) refers to monitoring a source for updates.

The total frequency of the above modes of capturing exceeds 108 because some participants adopted multiple modes in the same incident. Combining two modes is the commonest (N = 62, 57.41%), and the most frequent combinations are “collecting + exploring” (N = 13, 12.04%) and “collecting + sharing” (N = 11, 10.19%). Since no significant difference was found among different modes or mode combinations for the value type (useful/interesting) or level (from low to high) of the encountered information, the adoption of capturing modes should be attributed basically to user habits or preferences.
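The arithmetic behind this observation can be checked directly from the counts reported above. The short Python sketch below uses only the mode frequencies and the number of valid incidents given in the text; the variable names are illustrative.

```python
# Capturing-mode frequencies as reported for Q13 (one incident may use several).
CAPTURE_COUNTS = {
    "collecting": 71,
    "exploring": 47,
    "sharing": 41,
    "using": 29,
    "saving": 12,
    "following": 7,
}
N_INCIDENTS = 108  # valid IE incidents

# Each percentage is computed against the number of incidents, not mode uses.
shares = {
    mode: round(100 * count / N_INCIDENTS, 2)
    for mode, count in CAPTURE_COUNTS.items()
}

total_mode_uses = sum(CAPTURE_COUNTS.values())
modes_per_incident = total_mode_uses / N_INCIDENTS

print(shares)
print(total_mode_uses, round(modes_per_incident, 2))
```

The mode frequencies sum to 207, or about 1.92 modes per incident on average, which is why the percentages add up to well over 100%.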

4.5 Follow-Up Activities

Erdelez [20] deemed it natural that people would return to the “initial information seeking task”, i.e. the foreground activity, after IE. This study, however, identified more possibilities for the follow-up activities (Q14):

  • Terminating (N = 63, 58.33%) refers to ending IE and also abandoning the original foreground activity.

  • Returning (N = 27, 25.00%) is reconnecting with the foreground activity.

  • Probing (N = 18, 16.67%) is a follow-up activity that deserves special attention in spite of its lower frequency. Following the cues that appeared in the IE process, one might initiate information seeking in new directions, with the original foreground activity continuing to stay in the background.

Again, there exists no significant difference in value type or level among the three follow-up activities.

5 Discussion and Conclusions

5.1 Research Methods in IE Research

This study developed a CIT-based diary questionnaire to add structure to users’ recording of their IE experiences. The quantity of incidents collected in this study (108 valid incidents) far exceeded the 20 to 30 incidents typical of IE studies employing interviews. The increased sample size resulted in an abundance of data, which made it easier to detect trends. More importantly, the combination of free-style narration of IE incidents and targeted questions addressing each basic incident component contributed to data completeness, allowing complementation and verification between user-concerned and researcher-concerned details. Grounded on the common steps or phases in established models, the questions in the second part of the questionnaire are mostly directly applicable to the research of IE in other contexts. However, a special observation is that when one participant contributed four or more incidents, these incidents tended to be similar. Hence the questionnaire should be distributed as widely as possible to avoid such bias. Also, since the questionnaire requires time-consuming text input, appropriate incentives are necessary to encourage participation.

5.2 Design Implications for Social Q&A Sites

An IE process model (Fig. 1) emerged from the above phased analysis. This context-specific model suggests that social Q&A sites have reshaped the IE process. The facilitation of IE is reflected in the greater variety of foreground activities and capturing modes, and the differences in follow-up activities reveal the multiple roles that IE plays for users.

Fig. 1. A model of the IE process on social Q&A sites
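The phased structure of the model can be summarized as a small data structure. The Python sketch below is only an illustration: the phase and category names are taken from Sections 4.1 to 4.5 rather than from the figure itself, and the `Incident` class and its method are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum

# Phases in the order analyzed in Sections 4.1-4.5.
PHASES = (
    "foreground activity",
    "noticing",
    "stopping",
    "examining",
    "capturing",
    "follow-up activity",
)


class Foreground(Enum):
    SCANNING = "purposeless scanning"
    SEARCHING_BROWSING = "purposeful searching/browsing"
    MONITORING = "monitoring"


class Capture(Enum):
    COLLECTING = "collecting"
    EXPLORING = "exploring"
    SHARING = "sharing"
    USING = "using"
    SAVING = "saving"
    FOLLOWING = "following"


class FollowUp(Enum):
    TERMINATING = "terminating"
    RETURNING = "returning"
    PROBING = "probing"


@dataclass
class Incident:
    """One coded IE incident (hypothetical record layout)."""
    foreground: Foreground
    stimulus_formats: frozenset  # e.g. {"text", "image", "number"}
    capture_modes: frozenset     # one incident may combine several modes
    follow_up: FollowUp

    def resumes_foreground(self) -> bool:
        """True only when the user reconnects with the foreground activity."""
        return self.follow_up is FollowUp.RETURNING


# Made-up incident, for illustration only.
incident = Incident(
    foreground=Foreground.SCANNING,
    stimulus_formats=frozenset({"text"}),
    capture_modes=frozenset({Capture.COLLECTING, Capture.EXPLORING}),
    follow_up=FollowUp.TERMINATING,
)
print(incident.resumes_foreground())
```

Representing capturing modes as a set reflects the finding that a single incident may combine several modes, while the follow-up activity is a single outcome per incident.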

This study differentiated purposeless scanning and purposeful browsing as foreground activities of IE. The latter often involves a need or goal and relies on systematic strategies of relevance recognition. In addition, monitoring was identified for the first time as a type of foreground activity, popularized by the convenience of following various elements (i.e. topics, questions, and users) on social Q&A sites. One actually specifies a direction of information seeking when establishing a source of monitoring, since the topic or question followed is associated with a theme and the user with an interest. Although it is difficult to control the occurrence of IE in purposeless scanning, which is nearly random, stimuli can be intentionally embedded into or removed from other information seeking activities to induce (e.g. providing associative query suggestions in searching) or prevent (e.g. simplifying source presentation in monitoring) IE as needed.

“Capturing” in the IE process originally refers to “the extraction and saving of the encountered information for future use” [20]. As found in this study, even more modes were enabled and adopted, which realized personal information management and collaboration. Given the prevalence of collecting for future use, however, social Q&A sites need to consider how to revitalize the encountered information in storage and help users utilize it at a later time. Alternatively, more powerful support should be provided to encourage the immediate use of encountered information that can be easily forgotten.

Previous studies mostly neglected the activities that happen after the encountered information is captured. On social Q&A sites, the original information seeking could be totally overwhelmed by the occurrence of IE. The actual reasons behind different follow-up activities are worth further exploration. It is important to realize that the value of IE should not be achieved at the cost of jeopardizing the main information seeking task. As returning does not come naturally, the system should consider providing reminders and/or shortcuts that help users return to what they originally came for.