
1 Introduction

Traditional physical stores are closing down rapidly, a trend dubbed the “retail apocalypse” [3]. Still, these stores remain the main pillar of the retail sector, accounting for 90% of worldwide sales [18]. Currently, physical stores are moving towards omnichannel retail [5], which places customer experience at the center of the business model and aims to enhance it as much as possible. This omnichannel approach differs from the traditional approaches (brick-and-mortar and e-commerce) [14] in terms of deployment, which makes the transition difficult for retailers. A promising way to support this transition is the deployment of novel digital technologies [13, 15, 27]. One of the mega-trends expected to revolutionize retail is mixed reality (MR) [5], as it offers a personalized hedonic experience through immersive holographic interfaces. However, little research has been done on concrete use cases of MR in omnichannel retail. In this paper, we aim to close this gap by proposing a conceptual system that integrates the Microsoft HoloLens with an in-store recommender system for offline stores. We argue that the proposed system will have a positive impact on customers’ path-to-purchase [29] and will accelerate both the search for information and the evaluation of alternatives [34].

We first discuss relevant previous work and the necessary theoretical and technological background in Sect. 2, before outlining the proposed system in Sect. 3. Finally, Sect. 4 offers an in-depth discussion, including limitations and future work, and concludes the paper.

2 Previous Work and Background

To justify the use of MR for in-store recommendations, we review previous approaches to deploying recommender systems in physical retail and also investigate other use cases of MR in retail, both to identify their limitations and to draw inspiration for our recommender system (Sect. 3).

Researchers have experimented with a range of technologies to integrate recommender systems into brick-and-mortar stores. For example, Fang et al. [9] created a smartphone-based store recommender system for shopping malls that implicitly captures customers’ preferences by analyzing their positions using WiFi received signal strength (RSS). Similarly, Silva et al. [30] provided recommendations of new or unseen stores in a mall to users via their mobile phones. By now, smartphones can also be used to check out and pay [25]. Recently, [24] identified resolving the complexity of context data and creating more personalized advertisements as two directions in which researchers need to contribute to improve mobile recommender systems; in the present article, we aim to work towards both. Another technology that has been integrated with recommender systems is computer vision, which can be used to track customers in-store via surveillance cameras [22]. Other options include smart mirrors that detect RFID tags and recommend similar products [11].

In the recommender systems literature, collaborative filtering (CF) [12] is one of the most common approaches to generating recommendations. CF is often based on matrix factorization, e.g., using singular value decomposition (SVD) [4]. First, an initial matrix R that captures the preferences of a set of users for a set of items is constructed from known ratings of items by users. Then, SVD decomposes this user-item matrix into a user-to-feature similarity matrix, an item-to-feature similarity matrix and a diagonal feature-weight matrix; these are used to predict users’ ratings, from which the top-N most relevant item suggestions are generated for each customer. To improve recommendation accuracy, additional information can be integrated, e.g., transactions of purchased products, or extensions of SVD such as SVDFeature [7] can be used. Also, by capturing customers’ movements inside the store and their interactions with products, context-aware collaborative filtering approaches can be realized, which aim to use “contextual signals to become more human-centered” [10]. Methods to achieve context awareness can be categorized into pre-filtering, post-filtering and contextual modelling approaches, whose advantages and shortcomings have been discussed extensively in the literature [1].
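The SVD-based procedure described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors’ implementation: it assumes a small dense rating matrix in which 0 marks an unknown rating and simply treats those zeros as values, whereas practical systems impute missing entries or fit only the observed ones. The function name `svd_recommend` and the rank `k` are illustrative choices.

```python
import numpy as np


def svd_recommend(R, k, n):
    """Return the indices of the top-n unrated items for each user.

    R: users x items rating matrix; 0 marks an unknown rating (simplifying
    assumption: unknowns are kept as 0 when factorizing).
    k: number of latent features kept in the truncated SVD.
    """
    # R = U diag(s) Vt: U relates users to latent features, Vt relates
    # latent features to items, s holds the feature weights.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # Rank-k reconstruction serves as the predicted rating matrix
    R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Exclude items the user has already rated from the suggestions
    scores = np.where(R > 0, -np.inf, R_hat)
    # For each user, item indices sorted by descending predicted rating
    return np.argsort(-scores, axis=1)[:, :n]


R = np.array([[5, 4, 0, 1],
              [4, 5, 4, 0],
              [1, 0, 1, 5]], dtype=float)
print(svd_recommend(R, k=2, n=1))
```

Here `U`, `s` and `Vt` play the roles of the user-to-feature matrix, the feature weights and the (transposed) item-to-feature matrix; since already-rated items are masked out, each user in the example receives their single unrated item.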

Our proposed system will use MR technology. The term was defined to describe semi-virtual environments in which physical and digital objects coexist and interact with each other [21]. The possible advantages of such environments in retail have been discussed in the literature, with the conclusion that they improve customer experience in terms of both utilitarian and hedonic value [20, 23] by providing useful tools and making the shopping experience more enjoyable. One use case was demonstrated in [31], where the authors evaluated augmented reality at the point of sale in retail stores to improve the assessment of information. Recently, recommender systems researchers have discovered the technology, too, and the first combinations have emerged [2, 19], which promise to improve the item-exploration and decision-making stages of the shopping process.

Table 1. Input and output channels provided by the Microsoft HoloLens.

The Microsoft HoloLens is an MR head-mounted display (HMD) that shows holographic content: objects made of light and sound that appear in the world around the user as if they were real objects. Holograms respond to gaze, gestures and voice commands, and can interact with real-world surfaces, so digital objects can be made part of the real world. The HoloLens enriches holograms with light and sound effects (Table 1, third column) and uses sensors to enable interaction. The same sensors (Table 1, first column) make it possible to extract several valuable inputs for a recommender system (Table 1, second column): the position, head gaze and eye gaze of the user are determined by spatial mapping of the surrounding environment. Using the cameras and microphone, the HoloLens can recognize hand gestures and voice commands. Practitioners have added support for additional capabilities such as QR code and object recognition [32].
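To illustrate how head gaze could be turned into a recommender input, the sketch below casts a gaze ray from the head position and returns the nearest product whose bounding box it hits, using the standard slab test for ray/box intersection. It assumes, hypothetically, that product locations are available as axis-aligned boxes in the same coordinate frame as the device’s spatial map; `gazed_product` and the box format are illustrative and not part of any HoloLens API.

```python
import numpy as np


def gazed_product(origin, direction, boxes):
    """Return the id of the nearest product box hit by the gaze ray, or None.

    origin, direction: head position and gaze direction as 3-vectors.
    boxes: hypothetical mapping product id -> (min_corner, max_corner) of an
    axis-aligned bounding box in the same coordinate frame as origin.
    """
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    best_id, best_t = None, np.inf
    for pid, (lo, hi) in boxes.items():
        # Slab test: ray parameters where it crosses each pair of box planes.
        # A zero direction component yields +/-inf, which the min/max logic
        # handles for rays parallel to a slab (an origin lying exactly on a
        # slab plane would yield NaN and is not handled here).
        with np.errstate(divide="ignore", invalid="ignore"):
            t1 = (np.asarray(lo, dtype=float) - o) / d
            t2 = (np.asarray(hi, dtype=float) - o) / d
        t_near = np.max(np.minimum(t1, t2))  # last entry into a slab
        t_far = np.min(np.maximum(t1, t2))   # first exit from a slab
        if t_near <= t_far and t_far >= 0:
            t_hit = max(t_near, 0.0)  # 0 if the head is inside the box
            if t_hit < best_t:
                best_id, best_t = pid, t_hit
    return best_id


shelf = {"milk": ((2, -0.5, -0.5), (3, 0.5, 0.5)),
         "yogurt": ((5, -0.5, -0.5), (6, 0.5, 0.5))}
print(gazed_product((0, 0, 0), (1, 0, 0), shelf))  # milk is in front of yogurt
```

Returning the nearest hit matters because products on a shelf often occlude each other along the gaze direction.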

3 An In-store Mixed Reality Recommender System

Our MR recommender system proposes relevant products to customers throughout their shopping journey via an application running on a HoloLens, making product information search and comparison more efficient [34]. In this scenario, the device is owned by the customer, so it contains the personal information the system needs to provide more personalized product suggestions. Recommendations take the customer’s in-store behavior and other information, e.g., their purchase history, into account to facilitate the search for relevant products. We first evaluate which of the input channels provided by the HoloLens (Table 1, second column) can be used to extract information about the user (Sect. 3.1) and then conceptualize the recommender system using this information (Sect. 3.2).

Fig. 1. Overview of our system. The bottom row shows two usage scenarios. In the left example, triggered by the examination of a package of milk, the user is presented with two product recommendations (a milk drink and yogurt) shown as 3D holograms. In the right example, while looking at the box of a video game, the user is presented with additional information in the form of a video trailer, screenshots and prices.

3.1 Input Data

The top left corner of Fig. 1 shows the input channels provided by a HoloLens. Users can enter information through the device’s point-and-click interface, for example to maintain personal profiles with basic data such as age or interests, but also dynamic information such as their current shopping list or wish list. The device itself provides the user’s current and previous positions inside the store. By combining the raw positions with semantic information about the store layout, a list of encountered products, shelves and departments is computed. The inertial measurement unit provides the orientation of the user’s head, from which an approximate gaze direction is derived that can be refined using the integrated cameras. The cameras are also responsible for image and object recognition tasks, e.g., identifying examined products. Finally, using gestures and voice commands, the customer can navigate the system’s user interface in a natural way (Fig. 1, bottom) and provide feedback about products. Apart from these types of input data, the recommender system incorporates information maintained in the retailer’s IT systems, including product ratings by individual users, product information and the overall purchases of all customers.
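Combining raw positions with store-layout semantics can be as simple as a point-in-zone lookup over the position trace. The following sketch assumes, hypothetically, that departments are modelled as axis-aligned floor rectangles and that positions arrive as timestamped 2D samples; it accumulates how long the user stayed in each department, which could serve as implicit feedback for the recommender.

```python
def dwell_per_department(samples, zones):
    """Accumulate dwell time per department from a timestamped position trace.

    samples: list of (t_seconds, x, y) floor positions sorted by time.
    zones: hypothetical layout, department name -> (x_min, y_min, x_max, y_max).
    Each interval between consecutive samples is attributed to the department
    containing its starting position; positions in aisles are ignored.
    Returns a dict ordered by first encounter: department -> seconds.
    """
    def zone_of(x, y):
        for name, (x0, y0, x1, y1) in zones.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                return name
        return None  # aisle or unmapped area

    dwell = {}
    for (t, x, y), (t_next, _, _) in zip(samples, samples[1:]):
        name = zone_of(x, y)
        if name is not None:
            dwell[name] = dwell.get(name, 0.0) + (t_next - t)
    return dwell


layout = {"dairy": (0, 0, 5, 5), "games": (10, 0, 15, 5)}
trace = [(0, 1, 1), (10, 2, 2), (20, 7, 2), (25, 11, 1), (40, 12, 2)]
print(dwell_per_department(trace, layout))
```

The keys of the result give the list of encountered departments in visiting order; the same lookup could be applied at shelf granularity if the layout provides it.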

3.2 Recommender System

We follow van Capelleveen et al.’s [6] Recommender Canvas methodology as a guiding framework. The design process is subdivided into six areas (goal, domain characteristics, functional design, technical framework, interface design and evaluation), which we describe in the following.

The goal of the proposed system is to support customers’ purchase decisions with personalized product recommendations and relevant information. Examples of such information include the location of a recommended product in the store and product information presented through a holographic interface (Fig. 1, bottom right).

The design of the system is influenced by several characteristics of the target domain, retail. One of the biggest challenges for the adoption of recommender systems in offline retail is the availability of relevant data. By tapping the HoloLens inputs (Sect. 3.1) as a valuable source of additional data, we can overcome many of these limitations, e.g., the lack of contextual real-time information. In the proposed physical store system, this data includes the customer’s location and trajectory inside the store. This helps retailers to understand personal preferences and to provide a purchase experience centered around individual customers. The additional inputs can be integrated through data hybridization with SVDFeature [8], which can leverage latent features and side information such as visited departments.
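The idea behind this kind of hybridization is that users and items are described by feature vectors rather than bare IDs, so side information enters the same factorization. The sketch below is a simplified model in the spirit of SVDFeature’s feature-based matrix factorization, written from scratch in NumPy; it is not the SVDFeature toolkit or its API. It assumes binary feature vectors (a one-hot user ID plus a hypothetical “visited dairy” flag, and a one-hot item ID), omits SVDFeature’s global features, and is trained with plain stochastic gradient descent on the squared error.

```python
import numpy as np


class FeatureMF:
    """Simplified feature-based matrix factorization (SVDFeature-style sketch).

    Prediction for user features alpha and item features beta:
        y_hat = mu + bu . alpha + bi . beta + (alpha P) . (beta Q)
    """

    def __init__(self, n_user_feats, n_item_feats, k=8, lr=0.05, reg=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.mu = 0.0                     # global mean rating
        self.bu = np.zeros(n_user_feats)  # bias per user feature
        self.bi = np.zeros(n_item_feats)  # bias per item feature
        self.P = 0.1 * rng.standard_normal((n_user_feats, k))  # user-feature factors
        self.Q = 0.1 * rng.standard_normal((n_item_feats, k))  # item-feature factors
        self.lr, self.reg = lr, reg

    def predict(self, alpha, beta):
        u = alpha @ self.P  # latent user vector: sum of active user-feature factors
        v = beta @ self.Q   # latent item vector: sum of active item-feature factors
        return self.mu + alpha @ self.bu + beta @ self.bi + u @ v

    def fit(self, data, epochs=300):
        """data: list of (alpha, beta, rating) triples with binary feature vectors."""
        self.mu = float(np.mean([r for _, _, r in data]))
        for _ in range(epochs):
            for alpha, beta, r in data:
                err = r - self.predict(alpha, beta)
                u, v = alpha @ self.P, beta @ self.Q
                au = (alpha != 0).astype(float)  # regularize only active features
                ai = (beta != 0).astype(float)
                # SGD steps on 0.5 * err^2 plus L2 regularization
                self.bu += self.lr * (err * alpha - self.reg * au * self.bu)
                self.bi += self.lr * (err * beta - self.reg * ai * self.bi)
                self.P += self.lr * (err * np.outer(alpha, v) - self.reg * au[:, None] * self.P)
                self.Q += self.lr * (err * np.outer(beta, u) - self.reg * ai[:, None] * self.Q)
        return self


# User features: [user0, user1, visited_dairy]; item features: [milk, game]
user0, user1 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])
milk, game = np.array([1.0, 0.0]), np.array([0.0, 1.0])
model = FeatureMF(3, 2).fit([(user0, milk, 5.0), (user0, game, 1.0),
                             (user1, milk, 2.0), (user1, game, 4.0)])
print(model.predict(user0, milk), model.predict(user0, game))
```

Because a user’s latent vector is the sum of the factors of all active user features, every customer who visited the dairy department shares that factor, which is how such side information can help when individual ratings are sparse.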

The functional design of the system is driven by the question of what functionality a user might expect from it. Contemporary customers want retailers to treat them on a personal level (Sect. 1), which is facilitated by the system’s high level of context awareness and its use of real-time information [10].

The core technical framework of our recommender system follows a collaborative filtering approach, in which the choice of the filtering method is critical. We propose to use SVDFeature [7], which is based on matrix factorization (Sect. 2) and offers a balanced trade-off between complexity and flexibility. This makes it possible to incorporate all of the inputs provided by the HoloLens. Another important consideration is the training method. The model can be trained by minimizing the loss function using stochastic gradient descent [17]. As recommendations are known to be more relevant when customers are close to the recommended product [28], we propose to use the user’s location for contextual post-filtering [1].
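Contextual post-filtering leaves the trained model untouched and adjusts its output afterwards. A minimal sketch of the location-based variant proposed above: predicted scores are discarded for products beyond a radius and exponentially down-weighted with distance otherwise. The radius, the decay rate, the 2D shelf coordinates and the function name are illustrative assumptions, not values or names from the paper.

```python
import math


def post_filter_by_location(scores, product_pos, user_pos, radius=15.0, decay=0.1):
    """Re-rank context-free predictions by proximity to the user.

    scores: product id -> predicted rating from the recommender model.
    product_pos: product id -> (x, y) shelf location in metres (hypothetical).
    user_pos: current (x, y) position of the customer.
    Products farther than `radius` are dropped; the remaining scores are
    multiplied by exp(-decay * distance) before sorting.
    """
    ranked = []
    for pid, score in scores.items():
        dist = math.dist(user_pos, product_pos[pid])
        if dist <= radius:
            ranked.append((score * math.exp(-decay * dist), pid))
    ranked.sort(reverse=True)  # highest adjusted score first
    return [pid for _, pid in ranked]


scores = {"yogurt": 4.0, "milk_drink": 4.5, "tv": 5.0}
positions = {"yogurt": (1, 0), "milk_drink": (8, 0), "tv": (40, 0)}
print(post_filter_by_location(scores, positions, (0, 0)))
```

In the example, the highest-rated product is dropped because it lies across the store, and the nearby yogurt overtakes the slightly better-rated but more distant milk drink.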

The interface design considers how, when, what and where to present recommendations. These could take the form of image, video or text overlays as well as three-dimensional holograms of recommended products. We devise two use cases. In the first (Fig. 1, bottom left), the user sees a top-N list of recommended products. The second is the automated presentation of additional information (Fig. 1, bottom right) that may help the user make buying decisions. In both cases, users can receive recommendations and information effortlessly by clicking the respective holographic buttons.

To evaluate the system’s performance, user-centric studies along the lines of [26] and [16] can be conducted to validate the claim that the system improves a customer’s shopping experience. Common quantitative measures such as precision and recall can be used to assess the quality of the recommendations. Van Capelleveen et al.’s Recommender Canvas [6] also considers ways to optimize the recommender system further. However, given the prototypical nature of our system, optimization methods are not yet applicable.
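Precision and recall are usually computed on a top-N list against a set of items known to be relevant, for instance products the customer actually bought afterwards (an assumed evaluation protocol; the paper does not fix one). A minimal per-user sketch, with a hypothetical function name:

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision@N and recall@N for one user's ranked recommendation list.

    recommended: ranked list of item ids produced by the recommender.
    relevant: collection of item ids considered relevant for this user.
    """
    top = recommended[:n]
    hits = len(set(top) & set(relevant))
    precision = hits / n  # share of the N suggestions that are relevant
    recall = hits / len(relevant) if relevant else 0.0  # share of relevant items found
    return precision, recall


print(precision_recall_at_n(["yogurt", "milk_drink", "cheese", "butter"],
                            {"yogurt", "cheese", "bread"}, n=4))
```

System-level figures would then be averages of these per-user values over all evaluated customers.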

4 Discussion

Both mixed reality and recommender systems have proven advantageous in retail [9, 23]. This paper proposes combining the two to aid the transition of traditional retail to the omnichannel model. We leverage the capabilities of the Microsoft HoloLens by using it both as a valuable source of input and as an output channel for recommendations. The HoloLens can continuously monitor which shelves or products a customer is examining, collecting more information and allowing retailers to assist users better. The larger amount of input data reduces the problem of data sparsity, resulting in more accurate recommendations for the customer.

An alternative approach to obtaining similar real-time information is computer vision [22], which, however, has some limitations. For example, inferring which product a customer is looking at from standard surveillance cameras would require a large number of cameras placed at different locations, powerful hardware and algorithms to analyze the video streams. In contrast, the HoloLens provides this information as a single integrated device that does not require any additional hardware or software to be set up. There are also evident advantages regarding the output, perhaps the most important being the immersive environment for the end user. Interacting with holograms is similar to interacting with real objects, which makes it more natural than interaction on mobile phones or in fully immersive environments. This immersion makes the shopping experience more pleasant for the user and more streamlined towards a buying decision.

In its current form, our system has some limitations. The first is the accessibility and usability of its hardware. Also, new users typically need some time to get used to the gesture-based interactions. Future work includes deploying our prototype in a lab or an actual retail store for qualitative and quantitative user studies. The system can be extended in multiple directions, for example by specializing it for a specific type of retail. Regarding hardware, the next generation of the HoloLens has already been released. Among its features are more precise position and eye-gaze estimation, which would improve the precision of the recommender system and reduce some of the current limitations. A challenge for retailers will be to understand customers’ perception of privacy. Customers usually express higher privacy concerns about personalized services than about non-personalized ones [33]. It will be interesting to see, through user studies, to what extent immersion can aid the perception of the personalized recommender system.