
11.1 Introduction

The emergence of a new type of user, the so-called digital natives, has posed new questions in the field of information systems (IS). Digital natives have grown up surrounded by information and communication technology that is pervasive and ubiquitous (Prensky 2001; Tapscott 2008). As we are convinced that digital natives are the future users of decision support systems (DSS), we would like to present a ubiquitous in-store recommender system.

Our work presented here is part of a larger project on a ubiquitous DSS (UDSS) called Attentive Mobile Interactive Cognitive Assistant (AMICA). The aim of that project is to realize AMICA as an attentive recommender system that is worn by the user during in-store purchase decisions. Such a system is context-aware: eye-tracking technology is applied to learn from the attentional processes of users standing in front of a product shelf. The system will automatically detect the user’s needs and provide appropriate product information and recommendations.

Recently published articles have argued that a challenge for existing recommender systems is to elicit preference information in a minimally intrusive manner in order to reduce the user’s effort (Ansari et al. 2000; Murray et al. 2010). A solution is to use UDSSs that are context-aware (Adomavicius et al. 2005; Adomavicius and Tuzhilin 2011; Lee and Benbasat 2010; Palmisano et al. 2008). Such systems learn the user’s preferences in real-time and thus take into account that users construct their preferences while they process the decision-relevant information (Häubl and Murray 2003; Slovic 1995). Furthermore, previous research shows that not only users’ preferences but also their decision strategies are highly contingent upon the context (Bettman et al. 1998; Payne et al. 1993). UDSS, such as AMICA, have the advantage that in real-world decision environments rich data can be gathered about the context, for example location information, eye movements, gestures and speech.

This paper gives a short overview of the technology to be introduced and presents results from a pre-study to convey the underlying concepts. It explains the AMICA design, which is based on specific user requirements for ubiquitous systems, and presents a first idea on how to automatically detect the decision context.

In the following sections, we provide a literature overview of recommender systems for in-store purchase decisions and suggest new concepts to make them context-aware. Then, we illustrate how our system architecture achieves specific requirements for making the DSS ubiquitous. Finally, we present a pre-study in which users evaluated a prototype of the system. This proof-of-concept demonstrates how the system can learn about users’ information needs from the context.

11.2 In-Store Decision Support with Mobile Devices

Several publications have developed techniques for identifying products which are of interest to users. Those DSS display product information on mobile devices (Resatsch et al. 2008; van der Heijden 2006; von Reischach et al. 2009). The most frequently used techniques are manually typing in a barcode or the product name, automatic barcode recognition, near field communication (NFC) and RFID. NFC, as the fastest method, achieves the highest perceived ease of use (von Reischach et al. 2009). All approaches require the user to get into close proximity of the target or even to pick it up and turn it around. NFC, for example, requires the user to hold the mobile device within a 5 cm range of the product and takes on average 3.3 s to detect it. Considering that more than 100 different muesli products can be found on a standard supermarket shelf, this highly manual interaction is tedious and unfeasible unless information is required for only a small number of products. In addition, all described techniques only collect information about the products themselves, but not about the topology of the product arrangement in the shelf. Therefore, no further orientation help can be given to the user, and these systems cannot support the visual search process itself. In summary, these approaches focus on displaying product information and/or provide simple recommendations made by other consumers or experts. The approaches are non-personalized, non-social and not very interactive. They require explicit input about the user’s context and have very limited context-awareness overall.

Resatsch et al. (2008) found that digital natives valued their in-store decision support system more than digital immigrants did. Moreover, they were interested in receiving recommendations from the system in addition to just getting product information. Furthermore, privacy and data security were not a concern for those respondents. However, the credibility of the information source was very important to them. Lee and Benbasat (2010) compared the applicability of two types of online recommender systems for mobile in-store usage. They found that users who compared complete products with one another achieved higher decision accuracy than users who compared products attribute by attribute (for example, first across prices and then across brands). Thus, the authors achieved context-awareness by taking into account the more typical of these two ways of comparing products in-store, which is the complete product comparison. Though their approach can be classified as interactive and personalized, the degrees of intuitiveness, attractiveness and social components are limited, and their system requires extensive explicit user input.

Other researchers have suggested using location-awareness to build context-aware systems. Kawashima et al. (2006), for example, estimate the user’s interest in an object based on the user’s physical distance from the objects in the store. Fang et al. (2012) estimate the user’s preference for a brand using the time spent on a particular brand in a store and how often the user looks at it. Their system achieved higher ease of use, usefulness and satisfaction than a benchmark that required explicit user input. We think that, among the systems presented, these location-aware, context-aware systems come closest to what a UDSS for in-store purchase decisions should look like. However, geospatial location is too coarse to convey relevant information for decisions in a supermarket, where shelves are closely packed with different product types. In the following, we will thus present the AMICA platform, which supports more sophisticated and fine-grained localization, not only of the user, but also of the user’s target of visual attention.

11.3 Towards a UDSS-Design

For the design of AMICA, we orient ourselves on the requirements described by Vodanovich et al. (2010) and Junglas and Watson (2006), as they cover most of the requirements found by other research groups (Resatsch et al. 2007, 2008; Tilvawala et al. 2011). Junglas and Watson (2006) identified four fundamental requirements for UDSS in shopping environments: ubiquity, universality, uniqueness, and unison. Ubiquity is defined as reachability, accessibility and portability. Universality refers to universal mobile devices. Uniqueness means that users can be identified and localized; this concept is thus similar to personalization. Unison calls for integration of data so that people have a consistent view of information. Vodanovich et al. (2010) suggest a guiding list of requirements for UDSS, particularly if they are used by digital natives, that includes: (1) personalized, (2) interactive, (3) intuitive, (4) attractive and (5) social. Personalization refers to “the ability to provide content and services that are tailored to individuals based on knowledge about their preferences and behaviors” (Adomavicius and Tuzhilin 2005, p. 84). Interactivity is usually defined as the quality of being interactive, i.e. responding to previous actions. Intuitive refers to interfaces that can be navigated without further explanation. The attractive dimension is achieved by including “cool” and up-to-date designs (Vodanovich et al. 2010, p. 719). Social systems allow users to express their own identities or show users who contributed what.

We try to achieve Ubiquity by putting the UDSS into an object that is commonly used every day: glasses. They are an accepted and often necessary accessory of our culture. Besides their primary function, different manifestations for sports, fashion or safety exist. In contrast to existing approaches that work with mobile devices such as smartphones, we believe glasses to be more ubiquitous: they are wearables that users will likely integrate into their body schema over time, and because the envisioned system requires little, if any, explicit interaction, they are much easier to handle. The technical basis of the UDSS, such as the AMICA prototype described in the following, may provide an open platform for many extensions (apps). Such a UDSS is thus a sophisticated technical device and we expect it to be attractive for users.

Universality is guaranteed by building on top of existing mobile technology, which includes means for mobile communication and mobile apps. In addition, we introduce cognitive apps: applications that are tailored to specific interaction contexts, require little explicit user interaction, are based on cognitive models and are triggered by observing behavioral patterns of the user. In fact, the UDSS described in this paper is just one such cognitive app. This should also increase attractiveness, as the user is not required to handle additional technologies (cell phone/smartphone).

Uniqueness is given on multiple levels. Glasses are very personal devices with distinct ownership. The system includes different technologies for localization (GPS, WiFi, 3G, NFC, Accelerometer, Gyroscope, and Compass) and thus supports a solid level of context awareness.

The system’s cognitive architecture is tailored to support a high-level of personalization, e.g. by adapting to the goals of the user. A conversational interface is at the heart of the system, which adds to personalization and supports a social binding between the user and the system. The possibility to use speech, gaze and gestures to communicate with the system should make it very intuitive.

Unison is supported by a cognitive architecture that supports means-ends reasoning and an elaborated memory model. Interaction with the system is a social activity on its own, but common technologies which make use of social media can easily be integrated as well.

11.4 Attentive Mobile Interactive Cognitive Assistant (AMICA)

AMICA is a platform for personal ubiquitous computing. The underlying architecture is that of an intelligent agent who has the capabilities to perceive its environment, reason about it and act accordingly (Russell and Norvig 1995).

The perception of the system is supported by several sensors. Besides the internal proprioception sensors for localization described above, it supports a microphone, a scene camera and an eye-tracking system (see Figs. 11.1 and 11.2). The use of mobile eye-tracking is a unique feature of the system. It supports a highly localized detection of the visual attention of the wearer and thus increases context awareness beyond geospatial localization and basic activity recognition (see Meißner et al. 2013 for a discussion of requirements).

Fig. 11.1

The AMICA system prototype is based on standard technologies, such as a laptop (backpack), microphones, earphones, a scene camera and a binocular eye-tracking system

Fig. 11.2

The AMICA system in action. The backpack contains the laptop hosting all the functionality of AMICA

A cognitive architecture based on a belief-desire-intention (BDI) architecture is at the core of the reasoning system (Bratman 1999). It supports modal logic to represent beliefs about the world and about the intentions and goals of the user. The architecture supports domain-specific extensions, called cognitive apps, for different contexts and daily activities. The idea is that the system dynamically detects the current situation and enables the relevant cognitive apps. For example, in a current prototype we have a cognitive app for chess tutoring which automatically detects chess boards using computer vision and provides hints to support the user in learning chess. Figure 11.3 displays the information flow of the system’s components. Based on the available sensors, the system localizes itself and detects the context of the current interaction. If a particular context, e.g. a store, aisle or shelf, is detected, the decision support app is activated and a more fine-grained detection of the current decision context is triggered. The system then continues with the detection of the decision type that defines the user’s intention (e.g., goal-oriented buying of a particular product versus exploratory browsing, see below) and then detects the exact stage in the decision process (e.g., orientation, evaluation and validation; Russo and Leclerc 1994). Based on this detection of the context, recommendations can be communicated in the dialogue with the user, taking into account the model of the user and her preferences.
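This staged detection can be illustrated with a minimal sketch. All names below (e.g. SensorFrame, run_pipeline) are hypothetical and only mirror the information flow of Fig. 11.3; in particular, the decision-type classifier is a placeholder, not the actual AMICA implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, List, Tuple


class DecisionType(Enum):
    GOAL_ORIENTED = auto()
    EXPLORATORY = auto()


class DecisionStage(Enum):
    ORIENTATION = auto()
    EVALUATION = auto()
    VALIDATION = auto()


@dataclass
class SensorFrame:
    """One time slice of fused sensor input (hypothetical structure)."""
    location: str                  # e.g. "store/aisle-3/muesli-shelf"
    gaze_xy: Tuple[float, float]   # gaze point in scene-camera coordinates
    scene_objects: List[str]       # products recognized by computer vision


def shelf_context_detected(frame: SensorFrame) -> bool:
    """Coarse context detection: is the user in front of a known shelf?"""
    return "shelf" in frame.location


def classify_decision_type(gaze_history: List[Tuple[float, float]]) -> DecisionType:
    """Placeholder for the eye-movement-based detection of Sect. 11.6.

    The threshold below is arbitrary and only keeps the sketch runnable.
    """
    return DecisionType.GOAL_ORIENTED if len(gaze_history) > 50 else DecisionType.EXPLORATORY


def run_pipeline(frames: Iterable[SensorFrame]):
    """Simplified mirror of the staged detection shown in Fig. 11.3."""
    gaze_history: List[Tuple[float, float]] = []
    for frame in frames:
        gaze_history.append(frame.gaze_xy)
        if not shelf_context_detected(frame):
            continue                          # decision support app stays inactive
        decision_type = classify_decision_type(gaze_history)
        stage = DecisionStage.ORIENTATION     # stage detection omitted in this sketch
        yield decision_type, stage            # would drive the recommendation dialogue
```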

Fig. 11.3

Information architecture for the AMICA system

The current version of AMICA conveys information to the user using audio output (sonification or voice), but extensions to support a near-eye visual display are possible, such as the already available EPSON Moverio BT-200 or the upcoming GlassUp or Google Glass. Using vibration, the user can be subtly made aware of potential decision support offered by AMICA.

11.5 Context Detection: Goal-Oriented Versus Exploratory Decisions

One important aspect of AMICA is the unobtrusive detection of the current context and of the tasks the users are occupied with. Geospatial localization can narrow down the set of possible decision contexts. Detecting the current task requires more sophisticated methods. As a unique approach, AMICA tries to infer tasks from visual information about the current scene, e.g. the location in front of a certain shelf (see Figs. 11.4 and 11.5), using computer-vision methods (Harmening and Pfeiffer 2013), combined with the observation of the user’s attentional processes as captured by eye-tracking.
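To illustrate how gaze and computer vision might be combined, the following sketch maps a single fixation to a recognized product, assuming the vision step returns bounding boxes of packages in scene-camera coordinates. The data structures, names and the noise threshold are illustrative assumptions, not the actual AMICA code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    """Axis-aligned bounding box of a recognized product in scene-camera pixels."""
    label: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h


def fixated_product(fix_xy: Tuple[float, float], boxes: List[Box],
                    max_dist: float = 30.0) -> Optional[str]:
    """Return the label of the product hit by a fixation, or None.

    If the fixation falls slightly outside every box (eye-tracker noise),
    the nearest box centre within max_dist pixels is accepted instead.
    """
    px, py = fix_xy
    for box in boxes:
        if box.contains(px, py):
            return box.label

    def centre_dist(box: Box) -> float:
        cx, cy = box.x + box.w / 2, box.y + box.h / 2
        return ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5

    nearest = min(boxes, key=centre_dist, default=None)
    if nearest is not None and centre_dist(nearest) <= max_dist:
        return nearest.label
    return None
```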

Fig. 11.4

Scanpath of participant 4 during the exploratory task

Fig. 11.5

Scanpath of participant 7 during the goal-oriented task

In recent years, more sophisticated mobile eye-tracking systems have enabled researchers to investigate attentional processes in natural environments, like supermarkets. It has been shown that attentional processes can differ considerably between laboratory settings and more natural environments (see e.g., Hayhoe and Ballard 2005) and that attentional processes are highly task dependent (Gidlöf et al. 2013). Castelhano et al. (2009), for example, found large differences in eye movements when participants processed information under two instruction sets: visual search and memorization.

In line with the above research, we expect similar effects in a supermarket. Imagine, for example, that a consumer has planned to buy muesli. When entering the supermarket, she already knows that she likes mueslis with chocolate and almonds and that the muesli should be low in calories. For this planned purchase, the consumer obviously is goal-oriented. Now imagine another consumer who does not know which characteristics are important for her and therefore is browsing the supermarket aisles. This kind of attentional process can be best described as exploratory. This distinction can be derived from research on search behavior which implies that searching can be dichotomized into goal-directed versus exploratory search (Janiszewski 1998).

We follow the idea of Moe (2003) with the aim to find indicators that allow us to differentiate between goal-oriented and exploratory tasks in in-store purchase decisions using AMICA. We argue that a UDSS that is able to automatically detect whether a goal-oriented or an exploratory context is given will help to better adapt the decision support to the users’ needs. In another study we found that users more strongly prefer to receive ratings, comments and product recommendations in goal-oriented tasks than in exploratory tasks (Pfeiffer et al. 2014). Furthermore, in exploratory tasks a perfect detection rate of products is more important than in goal-oriented tasks. More work is needed to find out about the different requirements for decision support in different purchase decisions.

In the following, we investigate differences in attentional processes when consumers are experimentally induced to make goal-oriented or exploratory decisions, and we examine whether the two cognitive tasks can be identified from these attentional processes.

11.6 Evaluation of AMICA

11.6.1 Setup of the Experiment

We conducted an experimental study in a medium-sized grocery store. Twenty shoppers were recruited directly after entering the store and received 10 € as an incentive for participation. The mean age was 31.3 years (standard deviation (std.) = 13.27, maximum 53 years) and 70 % were female. We chose muesli as the product category because it is information-intensive, offered a sufficient variety of 116 different products, and the packages have a form that can be more easily annotated for the eye-tracking analyses than other products. Furthermore, the packages are more or less of equal size, which makes certain measures, such as the distance between products considered, easier to compare. Participants differed considerably in their interest in muesli: five reported never buying muesli, seven buying it less than once a month, three once a month and five several times per month. Fifteen participants said that they eat muesli up to three times per week. On average, muesli was consumed 1.95 times per week (std. = 2.26).

The participants were randomly assigned to either a goal-oriented task (GT) or an exploratory task (ET), yielding a group size of ten respondents for each of the two decision tasks. In each group, the task description was first read out to the participants and the experimenter then ensured that they had understood the task. In the GT, participants were told to select a muesli for a friend who would come for a visit. In that scenario, the friend likes a muesli which fulfills two binary criteria and one continuous criterion, i.e. the muesli should (1) contain chocolate, (2) contain almonds, and (3) be as low in calories as possible. Eight products fulfilled both binary criteria and there was one optimal product that was lowest in calories. In the ET, participants were asked to gain a fairly good overview of the muesli assortment and to determine the criteria which are important to them when buying muesli. Afterwards, they had to choose one product they would potentially buy themselves.
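For illustration, the GT can be expressed as a simple filter-and-minimize rule over the assortment. The product records below are invented placeholders; in the study, eight of the 116 mueslis met both binary criteria.

```python
# Hypothetical product records; the names and values are not from the study.
products = [
    {"name": "Muesli A", "chocolate": True,  "almond": True,  "kcal_per_100g": 380},
    {"name": "Muesli B", "chocolate": True,  "almond": False, "kcal_per_100g": 350},
    {"name": "Muesli C", "chocolate": True,  "almond": True,  "kcal_per_100g": 410},
]

# Step 1: apply the two binary criteria (contains chocolate AND almonds).
candidates = [p for p in products if p["chocolate"] and p["almond"]]

# Step 2: minimize the continuous criterion (calories).
optimal = min(candidates, key=lambda p: p["kcal_per_100g"])
print(optimal["name"])  # -> "Muesli A"
```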

During the task, participants had to wear the AMICA prototype. Running in non-interactive mode, the sensors of AMICA (scene camera, microphone and eye-tracking system) recorded the behavior of the participants, but the system provided no feedback. In this way, we collected, by means of a post-hoc interview, first data on the acceptance of wearing such a system in public areas and on the benefit of the envisaged functionalities, while at the same time we recorded a corpus of real-life data on customer behavior in specified contexts with identified intentions.

11.6.2 Results: Differentiating Between Decision Situations Based on Eye Movements

Table 11.1 summarizes the measures describing the decision processes of respondents in the GT versus the ET. Four observations are missing because of technical problems with the USB port during the recording of the eye movements. The results show that several measures are potentially useful for detecting whether a user is in a goal-oriented or an exploratory decision situation. In the GT, respondents put much more effort into the purchase decision (a, c, d, g). They made more fixations (d) on more products (c) and subsequent fixations were closer together (b). They also spent more time (e) on individual products, especially before making their decision (g). This last result is in line with Gidlöf et al. (2013), who found an attentional focus on the finally chosen product when participants selected products from a supermarket shelf. We categorized the information available on muesli packages into three categories: (h) brand and logo, (i) price and (j) detailed information, such as ingredients. In the GT, information about the brand and the details was important, while it was primarily the brand in the ET.

Table 11.1 Key differences between the performances in GT and ET
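A minimal sketch of how per-trial eye-movement measures of this kind could be computed from a list of fixations is given below. The fixation record format and category labels are assumptions for illustration and do not reproduce the exact operationalization behind Table 11.1.

```python
from statistics import mean


def summarize_fixations(fixations):
    """Aggregate per-trial eye-movement measures from fixation records.

    Each fixation is assumed to be a dict like
    {"product": "Muesli A", "duration_ms": 240, "category": "detail"}
    with category in {"brand_logo", "price", "detail"}.
    """
    n_fix = len(fixations)
    products = {f["product"] for f in fixations if f["product"] is not None}
    total_dwell = sum(f["duration_ms"] for f in fixations)
    detail_share = (
        sum(f["duration_ms"] for f in fixations if f["category"] == "detail") / total_dwell
        if total_dwell else 0.0
    )
    return {
        "n_fixations": n_fix,                 # effort spent on the decision
        "n_products_fixated": len(products),  # breadth of the visual search
        "mean_fixation_ms": mean(f["duration_ms"] for f in fixations) if n_fix else 0.0,
        "detail_dwell_share": detail_share,   # detail vs. brand/price processing
    }
```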

These behavioral differences also show up in typical scanpaths of participants for the ET (Fig. 11.4) and the GT (Fig. 11.5). Each fixation is represented by a circle and eye movements between fixations are drawn as paths between these circles. Larger circles represent more fixations on the same object, and more intense colors represent a longer summed dwell time. S stands for start (the first fixation) and E for end (the last fixation).

These first results show that when exploring information in a product category, participants focus on brand-related and price information and neglect detailed information. However, when pursuing the goal to select a product based on predefined preferences, participants acquire more detailed information. In sum, the results show that the level of processing of detailed information can provide information about the goal orientation of the participant. The aim of the UDSS therefore should be to identify the degree of goal-orientation and, based on that, to give detailed information which is of help in the purchase situation at hand. The next aim of our project therefore is to investigate how a recommender system can be trained to learn which specific information is relevant in a certain decision situation.
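One conceivable baseline for distinguishing the GT from the ET on the basis of such measures is a regularized logistic regression, sketched below with scikit-learn. The feature values are invented placeholders, not data from our study; this is merely an illustrative starting point, not the classifier envisaged for AMICA.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One row per trial: [n_fixations, n_products_fixated, mean_fixation_ms, detail_dwell_share].
# The numbers below are invented placeholders to keep the sketch runnable.
X = np.array([
    [310, 42, 260, 0.45],   # goal-oriented trials ...
    [280, 38, 250, 0.40],
    [120, 20, 210, 0.10],   # exploratory trials ...
    [140, 25, 220, 0.15],
])
y = np.array([1, 1, 0, 0])  # 1 = goal-oriented (GT), 0 = exploratory (ET)

clf = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))

# With only 16 usable observations in the pre-study, leave-one-out or
# few-fold cross-validation would be appropriate for estimating accuracy.
scores = cross_val_score(clf, X, y, cv=2)
print(scores.mean())
```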

11.7 Conclusion

The design of mobile recommender systems for in-store usage gives rise to several research questions. We are developing such a system based on the presented AMICA platform. For this paper, we collected ground-truth data on real-life attentional processes during decision making in front of a supermarket shelf. The AMICA system provides advanced technologies for context-awareness based on computer vision using a scene camera and eye-tracking. This promises unmatched possibilities to monitor the attentional processes of the user in real time and enables us to differentiate between decision tasks with distinct needs for support from a recommender system. The results of our study show that the attentional processes during the goal-oriented and the exploratory tasks differ sufficiently in basic eye-movement features, such as the duration and number of fixations or the ratio of fixations on detailed information versus brand and logo, to allow us to differentiate between the two tasks.

These first results suggest that using the AMICA framework for mobile recommender systems is promising. In future work, we would like to build a classifier using an appropriate subset of indicators that describe the differences in decision processes between goal-oriented and exploratory tasks. Next, we would like to investigate the differences in the users’ needs in the two decision contexts and define the decision model. Putting both results together, we should be able to implement a first prototype of AMICA.