1 Introduction

Web usage mining has been studied since the late 1990s and early 2000s, when researchers began contributing dozens of studies on handling interaction logs and on how to use them in their fields of research. These early studies focus on search behaviour, interpreting how users interact with search systems and what they actually search for [5, 34]. Initial findings gave insight into average query length, the number of queries and reformulations, and the number of visited result pages.

The identification of sessions in interaction logs itself, however, has received growing interest. Identifying patterns and segmenting logs into user sessions has become a focal point because it is the foundation for any further analysis or research [13]. Various methods were tested for finding reasonable session boundaries, often applying mechanical cuts like time outs. The most common inactivity time out of 30 min, which most likely evolved from the 25.5 min proposed by [5], is still used today. Later, research interest shifted from mechanical sessions to a more intent-oriented approach, acknowledging that finding suitable user context is easier when sessions are segmented logically rather than mechanically. Definitions of a session therefore range from mechanical [5] to logical [17].

Today, most related publications still apply the 30 min inactivity cut as a foundation. From user modelling to recommendation to personalisation - the 30 min rule seems to be omnipresent. This position paper is part of a dissertation project researching the impact of different session modelling concepts. It presents a brief timeline of the development of session concepts and discusses the sole use of a temporal constraint.

2 Literature Review

Session Identification. Early studies identifying sessions as the basic unit of measurement in interaction logs mostly relied on time gaps to decide if two consecutive queries belong to the same session, resulting in mechanically segmented sessions. [5] were among the first to introduce a temporal constraint. They report an average time of 9.3 min between interactions, adding 1.5 standard deviations to propose a temporal inactivity limit of 25.5 min. Other temporal cuts are also reported: 5 min [33], 15 min [14, 15] or even 60 min and longer [3].
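The mechanical segmentation described above can be illustrated with a minimal sketch. It assumes a single user's chronologically sorted timestamps; the function names, the sample log and the choice of the sample standard deviation are assumptions made for illustration and are not taken from [5]. The first function mirrors the "mean plus 1.5 standard deviations" rule, the second applies a fixed inactivity time out.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def inactivity_threshold(gaps_min, k=1.5):
    """Mean gap plus k standard deviations, in minutes.

    Assumption: the sample standard deviation is used; the source does
    not state which estimator [5] applied.
    """
    return mean(gaps_min) + k * stdev(gaps_min)


def segment_by_timeout(timestamps, timeout_min=30.0):
    """Split chronologically sorted timestamps into mechanical sessions.

    Assumption: a gap exactly equal to the time out stays in the same session.
    """
    limit = timedelta(minutes=timeout_min)
    sessions = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= limit:
            sessions[-1].append(ts)  # gap within the limit: same session
        else:
            sessions.append([ts])    # first event or gap too long: new session
    return sessions


# Hypothetical log of one user (timestamps are made up for illustration).
log = [
    datetime(2020, 1, 1, 9, 0),
    datetime(2020, 1, 1, 9, 10),
    datetime(2020, 1, 1, 9, 50),   # 40 min after the previous interaction
    datetime(2020, 1, 1, 10, 5),
]
sessions = segment_by_timeout(log, timeout_min=30.0)
```

With the 30 min cut the sample log splits into two sessions, whereas a 60 min cut keeps it as one. This is the sensitivity to the chosen threshold that the reported range of 5 to 60 min suggests.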

Over the years, these time constraints have evolved into a 30 min inactivity time out. Many works rely exclusively on this arbitrarily set time limit [4, 8, 21, 24, 37]; others recognized a need for more evidence, using stopping patterns [39] or dynamic time thresholds based on visited pages [7, 41] and users [27]. After [35] reported multitasking during search sessions, even identifying interleaving intents, interest increasingly turned to the identification of tasks rather than mechanical sessions.

Task Identification. Tasks may be similar to sessions, but they move away from purely mechanical thresholds to logical boundaries. Simple approaches use lexical similarity between adjacent queries [11] to identify topically related segments, assuming that queries that do not share any terms with previous ones indicate a new session [17] (although the sessions are identified with a temporal constraint in the first place). A prime example of the combination of lexical similarity and temporal relationship is [9], who use a geometric approach to calculate similarity between query pairs based on a 24 h temporal limit. Most approaches still use (mechanical) session-based features to calculate similarity between queries. Some use sequential patterns [28, 30], others employ external sources to create a richer semantic context like thesauri [16] or pre-trained embeddings [10].
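The simplest lexical approach mentioned above, starting a new segment when a query shares no terms with its predecessor, could be sketched as follows. Jaccard overlap on whitespace-separated tokens is used here as one possible similarity measure; it is an assumption for illustration, not necessarily the measure applied in [11] or [17], and the function names and sample queries are hypothetical.

```python
def terms(query):
    """Lower-cased whitespace tokens of a query (naive tokenisation)."""
    return set(query.lower().split())


def jaccard(a, b):
    """Jaccard overlap of two term sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_by_term_overlap(queries, min_similarity=0.0):
    """Start a new segment whenever adjacent queries are not similar enough.

    With min_similarity=0.0 a new segment starts exactly when the current
    query shares no term with the previous one.
    """
    segments = []
    for query in queries:
        if segments and jaccard(terms(segments[-1][-1]), terms(query)) > min_similarity:
            segments[-1].append(query)
        else:
            segments.append([query])
    return segments


# Hypothetical query sequence of one user.
queries = ["cheap flights berlin", "flights berlin hotel", "python tutorial"]
segments = segment_by_term_overlap(queries)
```

In this sample the first two queries share terms and end up in one segment, while the third opens a new one. A purely lexical check of this kind misses semantically related queries without common terms, which is one motivation for the thesauri and embeddings cited above.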

Even more advanced is the identification of cross-session tasks, recognizing the importance of interleaving and multiple tasks across the boundaries of mechanical sessions. [19] identified tasks as just another level of measurement. They define search sessions as user activity within a fixed time window, search goals as the atomic information need producing one or more queries and search missions as the overarching concept, connecting various search goals and therefore possibly spanning multiple sessions. This hierarchical point of view works well for describing user behaviour: visiting an information system in a session, searching for several goals belonging to one search mission. In [22], this concept is exploited via hierarchical clustering algorithms based on multiple query features. [12] and [13] propose a cascading method for connecting related adjacent queries by consecutively using lexical and semantic similarity, temporal proximity, search results and context comparison to find logically coherent search missions. Other studies compare adjacent queries with binary classifiers [1, 20], use latent structural Support Vector Machines [38] or utilize term and context embeddings [25, 32].
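The control flow of a cascading decision for adjacent queries might look like the sketch below. It only illustrates the general idea of consulting steps one after another until one of them is decisive; it is not the procedure of [12] or [13]. The step interface (returning True, False or None for "undecided"), the single lexical step and the default value are all assumptions.

```python
from typing import Callable, Optional, Sequence

# A step decides whether two adjacent queries belong together:
# True = same mission, False = different mission, None = undecided.
Step = Callable[[str, str], Optional[bool]]


def lexical_step(prev: str, curr: str) -> Optional[bool]:
    """Hypothetical first step: shared terms count as evidence for one mission."""
    if set(prev.lower().split()) & set(curr.lower().split()):
        return True
    return None  # no shared terms: leave the decision to later steps


def cascade(prev: str, curr: str, steps: Sequence[Step], default: bool = False) -> bool:
    """Return the first decisive answer; fall back to `default` if none decides."""
    for step in steps:
        decision = step(prev, curr)
        if decision is not None:
            return decision
    return default


# Only the lexical step is implemented here; semantic, temporal or
# result-based steps would be appended to the list in the same way.
same_mission = cascade("jaguar speed", "jaguar habitat", [lexical_step])
new_mission = cascade("jaguar speed", "python tutorial", [lexical_step])
```

Ordering the steps from simple to elaborate means the more elaborate checks are only consulted for query pairs that the earlier ones leave undecided.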

3 Discussion

[40] qualitatively analysed real web sessions, identifying multiple factors as potential indicators for session boundaries: changing topics or tasks related to the topic, switching to a different phase of a mission, different environmental context (e.g. being among people) and the time gap as the traditional measure. Acknowledging the potential co-existence of these measures strongly supports a development from mechanical sessions to logically connected segments, possibly connecting multiple mechanical sessions and tasks. These concepts build upon each other and should be applied accordingly.

However, sessions identified with temporal boundaries are still widely used. 30 min of inactivity is the industry standard [2], despite clear indicators that solitary use of time gaps is not reliable [6, 10, 26]. Many applications using interaction logs still exclusively apply the 30 min inactivity time out rule as a foundation for algorithms or analysis. Receiving much attention lately is sequential user or topic modeling with recurrent neural networks. From predictions about sequences or session outcomes [36] to session-based or session-aware recommendation [23, 29, 31], either the 30 min or a slightly changed temporal constraint is used to detect sessions.

[12] criticized that published studies often do not state how sessions are built. Worse, mechanical sessions are often used even when the aim of the study strongly suggests logical sessions [12]. Little thought is put into segmentation. Depending on the application, there are multiple possible definitions of how to structure a user’s history [18], and the potential impact of different session models deserves more attention in research.

4 Conclusion

Algorithms need input data. In Information Retrieval, this input data very often comes in the form of interaction logs. Besides laboratory studies, interaction logs represent the main source of information for understanding users, their information needs and how they interact with search engines or information systems.

Although much effort has been put into segmenting logs in a meaningful way, and although task- and mission-based approaches have received much attention, many recent studies still apply only temporal constraints. They use mechanical sessions to model user context in many different ways (cf. the recent wave of studies using recurrent neural networks). The actual basis for these algorithms is still sessions identified with a 30 min inactivity time out.

This position paper questions the lack of effort put into the pre-processing of interaction logs. A significant amount of thought should be put into the input for any algorithm. The 30 min inactivity time out might be perfectly fine for most applications - but arbitrarily and unquestioningly applying it as the basis for any and all algorithms may lead to wrong conclusions, no matter the algorithm quality.