1 Introduction

Retail business models have evolved over the years to create a value chain that combines multiple channels to interact with customers and suppliers. For many years, the traditional retail model operated with physical stores complemented by a variety of marketing efforts to drive traffic to each location and provide an attractive shopping experience. Technological advances have steadily expanded the set of channels used to interact with customers. E-commerce has perhaps been the most disruptive of these, creating new business models that have shaped the current landscape into what is referred to as omnichannel retailing.

Retail firms have followed different paths to reach their current channel configurations. A large fraction of companies, such as Walmart and Barnes & Noble, began as traditional brick-and-mortar operations and opened online channels to complement their still-dominant physical stores. Other companies were born as purely online players but have found opportunities to sell some product categories in physical stores; this is the case of Amazon with its Amazon Books and Amazon Go stores. Still other e-tailers have attempted to gain a presence in the physical world through third-party stores. For example, Dell currently sells not only through its own website but also through Best Buy stores.

Regardless of how companies have evolved toward an omnichannel strategy, in most cases their online and offline divisions have grown as silos operating with limited coordination (Herhausen et al. 2015). Although such structures make it difficult to analyze customer behavior in an integrated way, they also create opportunities for integrated, evidence-based decisions. For example, new configurations of retailers’ logistics chains have made it possible for customers to move more freely across the boundaries of traditional channels. Webrooming and showrooming arose naturally with the appearance of electronic channels, but retailers can take advantage of them by designing processes that lead to smoother and more profitable purchase journeys. As pointed out by Mehra et al. (2017), retailers can use different strategies to respond to these phenomena, including price matching and exclusive product assortments. Similarly, Verhoef et al. (2015) show that showrooming can be moderated not only by price savings but also by perceived price dispersion.

In terms of technological progress, the last 30 years have brought radical advances in data management tools, and large collections of data can now be stored and accessed relatively inexpensively. More recently, cloud computing, column-oriented databases, and other NoSQL databases have emerged as common tools for handling big data (Hashem et al. 2015). Moreover, every year new technologies emerge that capture types of data that were not previously available (Bradlow et al. 2017). For example, video data have been used to support store operations (Musalem et al. 2016), and customer location data have been used to better understand customer responses to promotional activities (Goic et al. 2018). Although most tangible advances in data management come from software and hardware components, they must be accompanied by adequate organizational support so that the firm can react promptly and flexibly to the new insights generated (McAfee et al. 2012).

To provide a seamless customer experience, an omnichannel strategy requires carefully designing the relationships between customers and every available channel. To build a conceptual framework that organizes the practice of analytics in this setting, it is useful to first think about retail analytics more generally. This section therefore builds the framework constructively by reviewing existing research on retail analytics in operations management, marketing, and economics.

Bell et al. (2014) propose a simple framework to think about alternative business models in retail. They adopt a customer-focused view and identify two fundamental components to classify omnichannel strategies: first, where a customer obtains the information she needs in the purchase process (e.g., online vs. brick-and-mortar stores) and, second, how the transaction is fulfilled (e.g., store vs. home delivery). In a traditional brick-and-mortar business model, both information acquisition and fulfillment take place in the store: customers obtain product information by looking at the product assortment on display and receiving assistance from sales employees. With this information, customers decide what to buy, and fulfillment is performed immediately with the available inventory. In the pure e-commerce business model, customers acquire all the information online and make a purchase decision, and fulfillment is performed through direct delivery. In this framework, omnichannel retailing corresponds to a business model in which some of these components (information acquisition and fulfillment) are performed online and others offline. For example, a retailer implementing a buy-online, pick-up-in-store initiative builds a new business model in which the customer acquires information and processes the payment online, but fulfillment is performed in the physical store. Another business model is showrooming (Economist 2016), in which customers acquire product information in a physical store with limited inventory, execute the final purchase online, and receive the product via delivery.
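The two dimensions of Bell et al. (2014) form a simple 2×2 grid. As a minimal sketch, the four business models discussed above can be encoded as a lookup table. The enum and function names are hypothetical, and treating home delivery as the "online" side of fulfillment is a simplifying assumption of this sketch, not part of the original framework.

```python
from enum import Enum


class Information(Enum):
    """Where the customer gathers product information."""
    ONLINE = "online"
    STORE = "store"


class Fulfillment(Enum):
    """How the transaction is fulfilled."""
    STORE = "store"             # pick-up or take-home in a physical store
    HOME_DELIVERY = "delivery"  # shipped to the customer


# Hypothetical lookup table for the four models described in the text.
BUSINESS_MODELS = {
    (Information.STORE, Fulfillment.STORE): "traditional brick-and-mortar",
    (Information.ONLINE, Fulfillment.HOME_DELIVERY): "pure e-commerce",
    (Information.ONLINE, Fulfillment.STORE): "buy online, pick up in store",
    (Information.STORE, Fulfillment.HOME_DELIVERY): "showrooming",
}


def classify(info: Information, fulfillment: Fulfillment) -> str:
    """Return the business-model label for an (information, fulfillment) pair."""
    return BUSINESS_MODELS[(info, fulfillment)]


def is_omnichannel(info: Information, fulfillment: Fulfillment) -> bool:
    """True when one component is online and the other offline.

    Assumption of this sketch: home delivery counts as the online side.
    """
    info_online = info is Information.ONLINE
    fulfillment_online = fulfillment is Fulfillment.HOME_DELIVERY
    return info_online != fulfillment_online


print(classify(Information.STORE, Fulfillment.HOME_DELIVERY))  # showrooming
```

This grid does not record where payment takes place, which the buy-online, pick-up-in-store example above also moves online; a richer encoding would add that as a third component.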

The framework of Bell et al. (2014) is useful to understand the underlying processes of a wide variety of retail operations, structuring the sequence of steps required to create a smooth experience for customers and how the relevant information should be delivered to them. However, when thinking about analytics, the focus shifts to the mechanisms required to learn from customer data to support decision making. Here, we extend this scheme to describe not only retail business models but also operational decisions that can be supported using analytical models. More specifically, we extend the conceptual framework of Bell et al. (2014) by adding another layer to describe how to analyze omnichannel data. Given the complexity of customer journeys in an omnichannel world, a key element to create a fruitful environment to apply analytics is the identification of all relevant instances in which customer behavior is observed. With the recent and continuous growth of electronic channels, an enormous amount of data of different types is being recorded, including browsing behavior and store visits, which are driving some of the most recent advances in omnichannel analytics.

In our view, information acquisition and product fulfillment are parts of a unified customer experience; therefore, the practice of analytics should primarily support decisions that enhance that experience as a whole. In an omnichannel world, interactions with customers form a continuous process in which retailers act as facilitators of a complex customer journey. Thus, in our extended framework, analytical models should be not only designed systemically but also evaluated across all possible paths that a customer may follow in the shopping process. A key premise of omnichannel retailing is a fully integrated approach that gives shoppers a unified experience across all available touchpoints; as a result, an intervention in one channel can drive sales through others.

Within the customer experience, we keep the two key concepts of information delivery and fulfillment proposed by Bell et al. (2014), because they emphasize that customer insights derived from formal analysis of customer data can support not only the manner in which retailers inform customers but also the mechanisms retailers use to make products available to them. Regarding information delivery, customers can obtain valuable information in many different forms. From a broad perspective, availability, physical inspection (e.g., texture, color, and smell), warranties, recommendations, and product reviews are all types of information that a customer might want when evaluating alternatives. Regarding fulfillment, we include delivery and pick-up as the most prominent cases, but the framework is open to new mechanisms.

Consequently, we propose an extended framework to understand retail analytics built from the following components:

  • Decision focus: Research in retail operations and related fields focuses on transforming data into prescriptions and decisions related to some specific aspect of the customer experience. Hence, one dimension to characterize work in analytics is by the type of decisions under study.

  • Data sources: There are numerous types of data that can be used to conduct analytics. This typically goes hand in hand with the decision focus, as some data sources are more relevant to analyze certain decisions. The type and sources of data are another dimension to classify work in retail analytics.

  • Methods: This refers to the type of tools and methodologies used to process and analyze data in order to prescribe practical decisions. The selection of the best tools depends on the facets of customer experience that are analyzed and also on the types of data.

This framework is illustrated in Fig. 1. Online and offline channels share some commonalities across these three dimensions but also differ in important ways. Data in online channels tend to be more granular but are usually more difficult to process because they integrate information from multiple sources. Since the methods used depend on the type of data, the channels also differ in the methodologies used for data analysis. The next sections describe each of these components in more detail.

Fig. 1 Framework of retail analytics

2 Data-Driven Decisions

The main purpose of conducting an analytics project is to provide a quantitative and systematic approach to improve decisions and the allocation of resources in an organization. A retail analytics project is typically geared towards improving decisions related to some specific aspect of the customer experience, either through the information acquisition or fulfillment parts or both. Examples include decisions related to product assortment, pricing and service quality, among others. Some of these decisions are applicable in a single channel, and others apply to both online and offline channels. Consequently, a first dimension that differentiates retail analytics work is the type of decisions that the study seeks to support.

From an operations perspective, we identify the following key decisions that directly determine customer experiences in retail: (1) design and layout of the shopping environment (website or store), (2) inventory and product variety, (3) customer assistance in the shopping process, (4) pricing optimization, (5) promotion planning and execution, (6) customer reviews, and (7) locations of stores and fulfillment centers.

The definition of the type of decisions determines the sources of data that are used to conduct the analysis. Although our classification of retail analytics involves three dimensions—decisions, data, and methods—the data requirements go hand in hand with the types of decision; therefore, it is natural to discuss data sources along with the decisions that use them.

The data sources and technologies used to collect the data can be quite different depending on whether the decisions are focused on the online or offline channel. Hence, it is useful to first discuss some fundamental differences between the data coming from online and offline channels. The online shopping process can be observed in detail through browsing records and search history, whereas offline, it is difficult to track details about the products on which customers focus their attention. In this regard, the technology used to capture and manage the data that track customer experiences during the shopping process is important to determine the scope of retail analytics. The rest of this section describes specific differences in the data sources and structures across online and offline settings and also shows how recent data-capture technologies have led the way toward an integrated view of omnichannel analytics.

Data about customer purchase decisions are an essential element in most retail analytics studies and are perhaps the ultimate performance outcome for measuring the effectiveness of retail management. Pioneering work by Guadagni and Little (1983) initiated a vast body of research using point-of-sale transactional data made available by scanner technology. Transactional data have been used to study the effect of promotions (Gupta 1988), the existence of price references (Roberts and Lattin 1991), and the adoption of new products (Bronnenberg and Mela 2004), to name a few applications. An important difference between online and offline purchase data is the level of detail at which customers can be tracked across shopping visits. In traditional stores, purchase transactions are typically anonymous unless the retailer operates a loyalty program or offers some other incentive for customers to provide their identity. Marketing research companies such as IRI and Nielsen construct customer panel data, but the sample for a specific retail chain or store can be limited to a handful of observations per panelist. In contrast, online purchases more commonly carry a customer identifier that can be used to track repeated purchases with higher frequency. Customer visits are also easier to record in online channels, using cookies and other technologies that track web visits. Consequently, customer panel data describing repeated shopping behavior are less costly to collect and are available with higher frequency in online channels than offline. Nevertheless, aggregate store-level data have been analyzed extensively in the offline channel and have successfully informed retail management in a broad set of applications (see examples in Fisher et al. (2009)).

In online channels, conversion, another important performance measure, is straightforward to compute by combining purchase data with browsing information. It can also be calculated at the individual customer level by tracking which products customers view, add to the shopping cart, and ultimately buy. In contrast, offline channels have only recently started measuring conversion, thanks to advances in traffic counters and other people-tracking technology in retail stores (Kesavan and Mani 2015; Musalem et al. 2016). Because traffic counters do not identify customers, however, offline conversion is still measured at a more aggregate level than in studies using online data.
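To make the contrast concrete, here is a minimal sketch of the individual-level funnel rates that online data support, next to the kind of aggregate ratio that traffic counters allow offline. The clickstream log of (customer, product, event) tuples and all counts are hypothetical, and the sketch does not check event ordering, which a real pipeline would do with timestamps.

```python
from collections import defaultdict

# Hypothetical clickstream log: (customer_id, product_id, event_type).
events = [
    ("c1", "p1", "view"), ("c1", "p1", "cart"), ("c1", "p1", "purchase"),
    ("c2", "p1", "view"), ("c2", "p2", "view"), ("c2", "p2", "cart"),
    ("c3", "p2", "view"),
]


def funnel_rates(log):
    """Conversion rates between funnel stages, at the customer-product level."""
    reached = defaultdict(set)
    for customer, product, event in log:
        reached[event].add((customer, product))
    viewed = reached["view"]
    carted = reached["cart"] & viewed      # carted pairs that were also viewed
    bought = reached["purchase"] & carted  # purchased pairs that were also carted
    return {
        "view_to_cart": len(carted) / len(viewed),
        "cart_to_purchase": len(bought) / len(carted),
        "view_to_purchase": len(bought) / len(viewed),
    }


def store_conversion(transactions, counted_entries):
    """Aggregate offline conversion: transactions per store entry counted."""
    return transactions / counted_entries if counted_entries else float("nan")


print(funnel_rates(events))
print(store_conversion(transactions=120, counted_entries=800))
```

The online function can be broken down by customer or product because every event carries an identifier; the offline ratio cannot, since the counter only knows how many people entered.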

Understanding customer shopping paths and browsing behavior has been useful for designing e-commerce websites effectively. With the widespread availability of clickstream data in the early 2000s, researchers from different fields became interested in studying various components of online browsing. For example, Montgomery et al. (2004b) show that page viewing is informative about purchase intention and significantly improves the accuracy of conversion models. Park and Fader (2004) extend this stream of research and demonstrate that clickstream data from multiple retailers can be combined to further improve predictions. More recently, Huang and Van Mieghem (2014) show that clickstream data can effectively improve operational forecasting and inventory management. In general, whereas early works concentrate on describing how users navigate retailer websites (e.g., Bucklin and Sismeiro 2003), more recent works use browsing data, including visit duration and page views, as predictors of basket value (Mallapragada et al. 2016).

The design of an e-commerce website can be viewed as the equivalent of the layout of a physical store. In contrast to the online setting, studies of shopping and browsing behavior in offline channels are rare. Moreover, online retailers can target customers with websites personalized on the basis of past browsing behavior, whereas the layout of a physical store is less dynamic and cannot be personalized for individual shoppers. More recently, customer geolocation data obtained via RFID, beacons, and WiFi appear promising for measuring browsing behavior in retail stores at larger scale and low cost. For example, Larson et al. (2005) use RFID technology installed in supermarket carts to track customer shopping paths, infer browsing behavior, and characterize shopping visits. Section 5 describes some of these applications in detail and shows how methods used for studying online browsing behavior can be extended to analyze offline channels.

Inventory management has been an important area of research in the operations management community. Inventory affects customer experience through several mechanisms. First, it determines the options that customers can choose from, in terms of both the breadth of the assortment and the amount of inventory for each option. Second, the level of inventory can influence customer perceptions of the product (Cui et al. 2018; Cachon et al. 2018). Researchers studying inventory find the available data to be quite different in online and offline channels. Offline, customers are usually directly exposed to the inventory available in the store, but it is difficult to track the exact inventory each customer was exposed to because inventory records are imprecise (DeHoratius and Raman 2008). Therefore, empirical research using inventory information from the offline channel typically works with aggregate store-level data (Musalem et al. 2010; Conlon and Mortimer 2013; Vulcano et al. 2012). In contrast, in online channels, inventory information is not always available to customers. In some cases, customers can see whether a product is in stock (because out-of-stock products are usually not displayed) but not the exact level of inventory available. Some online retailers tell customers how many units are left; in this case, the exact level of inventory each customer was exposed to is recorded, making it possible to analyze the impact of inventory on purchases at the individual customer level (Cui et al. 2018). In addition, online inventory is typically managed in centralized warehouses, whereas store inventory is exposed to customer handling, which increases inventory movement and shrinkage and reduces the retailer’s control over it. Consequently, online channels typically offer better control over the level of inventory to which customers are exposed during their shopping.

In offline retail, the salesforce is a key resource for effective execution in physical stores. It is also the second-largest operating expense in retail after inventory costs (Kesavan and Mani 2015). Measuring how much the salesforce contributes to sales is useful for optimizing staffing levels within a store and across the chain (Fisher et al. 2017). Studies by Perdikaki et al. (2012) and Chuang et al. (2016) have investigated this problem by relating staffing levels and customer traffic to store revenues. More recently, technologies that track people inside retail stores have made it possible to measure specific interactions between customers and employees, providing a finer-grained view of the role of the salesforce in generating revenue. For example, Musalem et al. (2016) and Jain et al. (2016) use video analytics to measure how assistance by employees affects sales.

Although the salesforce is a less critical resource in online channels, many e-commerce websites offer sales assistance through chat or phone. When available, information about customer–employee interactions can be recorded in great detail (requiring voice transcription in the case of telephone contact centers, as in Netzer et al. (2012)). Tools from machine learning and natural language processing have made it possible to process such information and have been used to track customer sentiment and other metrics related to the customer experience (Yom-Tov et al. 2018).

Pricing optimization is an active area in which analytics have been successful in changing management practice (Phillips 2005). The literature on data analytics in pricing is extensive and includes work on markdown optimization (Soysal and Krishnamurthi 2012; Moon et al. 2017), dynamic pricing (Elmaghraby and Keskinocak 2003), market competition (Besanko et al. 1998), and behavioral aspects of consumer choice (Busse et al. 2013). Transaction prices in offline channels are recorded precisely in point-of-sale data; however, posted prices are not always recorded for unsold items and may require imputation to account for missing data (Bradley 2003). In general, pricing online tends to be more dynamic than offline, making it possible to use real-time information about market conditions to optimize prices frequently (Fisher et al. 2017). Online browsing data also record the posted prices observed by each customer and are therefore more precise than what is typically recorded in offline channels.

Promotions have been a fundamental tool in traditional retail. Following a long tradition in marketing research (Blattberg et al. 1995; Christen et al. 1997), it is useful to distinguish displays, which are promotional information presented during a customer’s shopping visit, and features, corresponding to coupons, mail and other promotional information used to attract customers to the store. Similar to the case of inventory, data about promotions tend to be more granular in the online channel. Online, it is possible to target in-site promotions to specific customers and identify which customers are actually exposed to the promotion; this is usually not possible in a retail store. Moreover, for feature promotions in offline channels, it is not always possible to know which of the customers in the store actually received the promotion, and researchers have developed specific methods to address this issue (Musalem et al. 2008). In e-commerce, there are technological solutions for this problem—such as website cookies, click redirects and pixels—in order to identify which customers were targeted with online feature promotions (Visual IQ 2018).

Advertising is another area where electronic channels have played a crucial role. Manchanda et al. (2006) were among the first to show a positive correlation between exposure to online advertising and sales. Since then, different studies have analyzed a variety of aspects of online advertising, such as the decomposition between short- and long-term effects (Breuer et al. 2011), the effect of position in sponsored search (Agarwal et al. 2011), and privacy concerns derived from increasing levels of personalization in online advertising (Goldfarb and Tucker 2011). Several innovations have been introduced in online advertising, adding new mechanisms to target customers with more relevant information. Consequently, research on online advertising has focused on the effect of these new mechanisms, such as retargeting (Lambrecht and Tucker 2013) and personalization (Bleier and Maik 2015). When thinking about advertising in omnichannel environments, a first-order concern is whether online advertising has an effect on brick-and-mortar channels (Dinner et al. 2014; Goic et al. 2018). Compared to brick-and-mortar retailing, the availability of detailed information about online advertising exposure has facilitated the estimation of its effectiveness. However, customers today are exposed to many different advertising instruments, including TV and radio, sponsored search, and social media. Although some specific interactions have been well studied (e.g., Liaukonyte et al. (2015)), attributing to each promotional vehicle its impact on converting customers is still an active area of research (Li and Kannan 2014; Kannan et al. 2016).
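In practice, attribution is often approached first with simple rule-based heuristics. The sketch below contrasts two such rules, last-touch and linear credit, on hypothetical converting paths. It is not the model-based attribution of Li and Kannan (2014), which estimates each channel's incremental effect rather than assigning credit by a fixed rule; the function names and data are illustrative assumptions.

```python
from collections import Counter


def last_touch(path):
    """All credit to the final touchpoint before conversion."""
    return {path[-1]: 1.0}


def linear(path):
    """Equal credit to every touchpoint on the path."""
    credit = Counter()
    for channel in path:
        credit[channel] += 1.0 / len(path)
    return credit


def attribute(converting_paths, rule):
    """Sum the credit a rule assigns to each channel over all converting paths."""
    total = Counter()
    for path in converting_paths:
        total.update(rule(path))
    return dict(total)


# Hypothetical ordered touchpoint sequences of three converting customers.
paths = [["tv", "search", "social"], ["search"], ["social", "search"]]
print(attribute(paths, last_touch))  # {'social': 1.0, 'search': 2.0}
print(attribute(paths, linear))
```

Under last-touch, TV receives no credit at all, while the linear rule gives it a third of a conversion; the disagreement between rules on the same data illustrates why heuristic credit assignment alone cannot settle the attribution question.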

Online channels have also produced an explosion in the amount of data related to product reviews. Before the e-commerce era, product reviews were typically available through consumer reports and expert reviews. With the growth of e-commerce and social networks, online reviews have grown exponentially through different sources, including retail websites, social networks, and specialized search engines. Online reviews have been used to analyze their own impact on sales (Luca 2016; Floyd et al. 2014), to predict quality issues (Kang et al. 2013), and as an outcome to measure product reputation (Li 2016). Moreover, product review data are rich enough to describe customer learning (Zhao et al. 2013) or even to infer market structures (Lee and Bradlow 2011). Analyzing these massive amounts of online data has required suitable methods from machine learning, which are described in Sect. 3.

Information about the competition—which is relevant for retail management in a competitive market—has been used in multiple research studies involving analytics. Price and promotion data for both offline and online channels are usually available from third parties, such as Nielsen and IRI, but in online channels, it is also possible to obtain data about the competition using publicly available data from retail websites. For example, Li et al. (2017) use prices posted for hotels to identify relevant competitors in a market. Inventory levels of the competition are more difficult to obtain but feasible in cases in which the inventory is published online (Cui et al. 2018).

Offline, the location of stores and warehouses is an important decision that has been studied extensively through data analytics (Zheng 2016). Location decisions usually require data about the consumer population and demographics, in addition to information about other establishments from the same chain and the competition. Some studies combine this information with customers’ addresses recorded in sales transactions to measure the relevant market of a given location more accurately (Albuquerque and Bronnenberg 2012). In online channels, the most common approach to fulfillment is direct delivery, for which the location of retail outlets might seem irrelevant. However, evidence from store openings suggests that the location of physical stores can affect online sales (Wang and Goldfarb 2017). Recent studies have analyzed alternative fulfillment methods to solve the last-mile problem in online purchases, such as temporary pick-up locations (Glaeser et al. 2019) and pick-up lockers (Yuen et al. 2018). In the move to omnichannel, locating stores for online order pick-up and locating showrooms become interesting applications of analytics.

Table 1 summarizes the decisions and the differences in the types of data that are typically used in the offline and online channels.

Table 1 Comparison of different types of data in online and offline channels

3 Methods in Retail Analytics

A second dimension to characterize a retail analytics study is the type of methodological tools used to study a problem. Most analytic initiatives combine tools from optimization and stochastic modeling with data analysis methods from statistics and data science. Our framework is focused on the data analysis piece, and this section provides a brief classification of the different approaches to conduct data analysis and empirical research in the context of retail analytics.

Based on Terwiesch et al. (2018), we identify three approaches to conducting data analytics in retail. The first approach focuses on evaluating the impact of an intervention. For example, Gallino and Moreno (2014) conduct an empirical study to measure the impact of implementing a buy-online, pick-up-in-store initiative in a retail chain. In e-commerce, it is common practice to validate a new design using A/B testing: controlled experiments in which some customers are randomly selected to receive an intervention and are then compared with a similar pool of customers who serve as a control group. For example, Dinerstein et al. (2018) analyze how a change to eBay’s search engine that facilitates price comparison among similar products affects customers’ choices and their price elasticity. The main objective of this impact-evaluation approach is to validate whether a change in management practices effectively improved some aspect of the customer experience and, based on this analysis, to decide whether to deploy the change throughout the system.
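As a minimal sketch of how such an A/B test is often read out, assuming conversion is the outcome of interest, the standard two-sided two-proportion z-test with a pooled variance estimate compares the two groups. The function name and the counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical results: control (A) vs. new page design (B), 5,000 visitors each.
lift, z, p = two_proportion_ztest(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p:.4f}")
```

Because assignment to A or B is random, a significant difference here can be read as the causal effect of the new design on conversion, which is what makes this readout useful for deciding whether to deploy the change.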

The second approach is to validate whether theoretical predictions are observed in practice. For example, Cachon et al. (2018) describe alternative theories that provide ambiguous predictions regarding how inventory levels affect demand and use observational data from an automobile company to test which of these mechanisms dominates in that particular setting. Similarly, Santos et al. (2012) use browsing data to test different economic theories about how customers search in online markets. In some cases, this approach overlaps with the impact evaluation of an intervention, but it focuses on understanding the underlying mechanisms through which the intervention affects customer experience and performance outcomes.

The third approach uses data analytics to estimate the key parameters required as inputs for a decision model. The operations management field has a long tradition of combining analytical models and optimization methods from operations research and economics to help managers improve the efficiency of their operations. In retail management applications, these models incorporate more sophisticated customer decision processes and decentralized decisions within an organization, supply chain, or competitive market. Applying these models in practice requires empirically validating some of their assumptions and estimating some of their input parameters and primitives. For example, a key input for assortment planning models is the pattern of demand substitution when a product is added to or removed from the assortment. Kök and Fisher (2007) develop a mathematical programming model to optimize assortments that includes a parametric demand model characterizing substitution patterns; the authors also develop a method to estimate the parameters of this demand model from historical data.
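Kök and Fisher's assortment model is too involved to reproduce here, so as a swapped-in illustration of the same estimate-then-optimize logic, the sketch below assumes constant-elasticity demand q = a·p^ε, estimates ε from two hypothetical price points, and plugs the estimate into the textbook profit-maximizing rule p* = c·ε/(1 + ε), which follows from setting the derivative of (p − c)·a·p^ε to zero and is valid only for ε < −1. All names and numbers are assumptions.

```python
from math import log


def arc_elasticity(p1, q1, p2, q2):
    """Constant-elasticity estimate from two (price, quantity) observations."""
    return log(q2 / q1) / log(p2 / p1)


def optimal_price(unit_cost, elasticity):
    """Profit-maximizing price for demand q = a * p**elasticity."""
    if elasticity >= -1:
        raise ValueError("demand must be elastic (elasticity < -1) for an interior optimum")
    return unit_cost * elasticity / (1 + elasticity)


# Hypothetical experiment: two price arms with average weekly units sold.
eps = arc_elasticity(p1=10.0, q1=1000.0, p2=12.0, q2=700.0)
price = optimal_price(unit_cost=6.0, elasticity=eps)
print(round(eps, 3), round(price, 2))
```

The estimation step and the optimization step are deliberately separate functions: the decision model only needs the parameter, so a better estimator (for example, one using many price points and controls) can be swapped in without touching the optimizer.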

In addition, data analysis methods can be separated into two groups depending on the type of question to be answered: (1) descriptive analysis, where the objective is to find correlations among factors that can be used to predict a given outcome without changing the current configuration of the system being analyzed, and (2) causal analysis, which goes one step further and aims to measure the causal effect of one or several factors on a given outcome in order to predict the effect of an intervention. Note that both approaches involve making predictions. Consider as an example the predictions that a store manager must make to decide staffing levels in a retail store. First, the manager must forecast the traffic that the store will receive, based on information about seasonality, weather patterns, location demographics, and market-level variables such as consumer confidence indices. This prediction does not require a causal analysis: it is a descriptive study that associates different covariates with the customer traffic expected at the store. Second, the manager has to predict how store sales would be affected by increasing or decreasing the staffing level for a given traffic forecast. This prediction requires a causal analysis because it answers a “what-if” question about an intervention that changes the current process of assigning labor to stores (see Fisher et al. (2017) for an example).
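The staffing example can be made concrete with a small synthetic dataset (all numbers hypothetical and noise-free). If managers schedule more staff when they expect more traffic, a naive regression of sales on staffing mixes the two effects. The sketch below assumes traffic is the only confounder and that it is observed, and adjusts for it via the Frisch–Waugh–Lovell theorem (partialling traffic out of both variables before regressing). In real data, that assumption is exactly what an experimental or quasi-experimental design is meant to avoid relying on.

```python
def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def residualize(y, x):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


# Hypothetical stores: managers staff up when they expect more traffic.
traffic = [100, 200, 300, 400]
staff = [1, 3, 3, 5]
# Assumed true process: each extra employee adds 2 units of sales, no noise.
sales = [0.1 * t + 2 * s for t, s in zip(traffic, staff)]

naive = slope(staff, sales)  # mixes the staffing effect with the traffic effect
# Frisch-Waugh-Lovell: partial traffic out of both variables, then regress.
adjusted = slope(residualize(staff, traffic), residualize(sales, traffic))
print(round(naive, 2), round(adjusted, 2))
```

With these numbers, the naive slope is 9.5 sales units per employee, while the adjusted slope recovers the assumed effect of 2. The forecasting step in the text needs only the naive kind of association; the staffing decision needs the adjusted one.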

Finally, data analysis can be grouped into different types of methods that are used to analyze the data. Here, we distinguish two families of techniques: (1) statistical/econometric methods, which are based on probability models that describe the data generating process, and (2) machine learning methods, which are useful to analyze dependencies among a large number of variables in large-scale data, without going into the details of the data generating process. Both approaches can be used to perform descriptive analysis or causal inference, depending on the data used in the study.

The gold standard for conducting causal analysis is an experimental design, in which the analyst manipulates the data-generating process in order to produce exogenous variation of an intervention of interest, typically by randomly assigning the treatment (i.e., the intervention) across different groups. The advantage of this approach is that any systematic association between the treatment and the outcome of interest can be attributed to a causal effect of the intervention. This type of research design is more frequently used in online retailing because randomization across treatment and control groups can usually be implemented at the individual customer level through the e-commerce website at low cost and on a large scale. Designing such experiments involves determining a priori the sample size required for a properly powered statistical test of the experiment's results. In addition to A/B testing, experimental designs are also used to estimate the parameters that enter a decision model. A recent example is Fisher et al. (2017), who use field experiments to measure demand elasticities, which are then used to prescribe dynamic pricing strategies.
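To make the a priori sample-size calculation concrete, the following is a minimal Python sketch for a two-sided test comparing two conversion rates, using the standard normal-approximation formula. The baseline conversion rate (2.0%), the lift, and the function name `sample_size_per_group` are hypothetical choices for illustration, not values from any of the studies cited here.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treat: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Customers needed in each arm to detect p_treat vs. p_control with a
    two-sided test (normal approximation for two proportions)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    effect = p_treat - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: 2.0% baseline conversion and a 10% relative lift.
n = sample_size_per_group(0.020, 0.022)
print(f"customers per group: {n}")
```

Because the required sample grows roughly with the inverse square of the effect, halving the detectable lift roughly quadruples the number of customers needed.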

In contrast to online channels, in offline channels it is difficult to target interventions to specific customers; therefore, the experimental manipulation has to be performed at a more aggregate level (product category, store, region, etc.), which increases the cost of the experiment. For this reason, experimental designs are scarcer in offline retail, but a few have been effective in practice. A notable example is the recent work by Williams et al. (2018), who conducted a field study (in collaboration with Gap Inc.) to analyze the impact of work schedules on employee productivity, randomly assigning stores to a treatment group in which the company changed work schedules to be more stable and more under employee control. More generally, data analysis in offline channels is frequently conducted via an observational design, in which the analyst is limited to collecting historical data without intervening in the data-generating process. This reduces the cost of data collection but introduces several challenges for causal inference.

A common challenge of observational designs is that the association between two variables of interest may not be entirely driven by a causal effect, as there may be other factors not observed in the data, termed “omitted variables,” that simultaneously affect the two variables of interest in a systematic direction. Hence, statistical correlation between two variables does not imply a causal effect between them: the association may be confounded by a third factor that is not included in the data analysis (hence the established mantra that correlation does not imply causation). The problem of omitted variables is present in most empirical studies in OM that use an observational design. Consider, for example, a causal analysis to measure the effect of inventory on demand. Cui et al. (2018) study this question using an experimental design, manipulating the level of inventory that different customers were shown on the product webpages of an e-commerce site. In this experiment, the correlation between the level of inventory to which customers are exposed and the conversion rates provides a direct measure of the causal effect of inventory on sales. Now consider answering the same question with an observational design, using historical transaction data on customer browsing behavior and the inventory levels that were presented to each customer. Inventory levels are decisions made by managers based on demand forecasts: products that are predicted to be popular are stocked with higher inventory. This generates a positive correlation between inventory and demand that is not causal. Consequently, the variation of inventory in an observational design cannot be used directly to conduct a causal analysis. This example can be generalized to other factors that are typically of interest in retail analytics, such as prices, promotions, staffing levels, and location, all of which are decisions made by managers based on demand projections.
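The inventory example can be illustrated with a small simulation, a sketch under assumed (hypothetical) parameter values: an unobserved popularity index drives both the manager's stocking decision and demand, while inventory itself has no causal effect on demand. A naive regression of demand on inventory nevertheless recovers a clearly positive slope.

```python
import random


def slope(xs, ys):
    """OLS slope of ys on xs (with intercept): cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


rng = random.Random(7)
inventory, demand = [], []
for _ in range(10_000):
    popularity = rng.gauss(0, 1)  # unobserved by the analyst (omitted variable)
    # The manager stocks more of the products forecast to be popular...
    inventory.append(10 + 5 * popularity + rng.gauss(0, 1))
    # ...while demand depends on popularity only: inventory has no causal effect.
    demand.append(20 + 8 * popularity + rng.gauss(0, 1))

naive = slope(inventory, demand)
print(f"naive slope of demand on inventory: {naive:.2f} (true causal effect: 0)")
```

Under these assumed values the naive slope converges to 40/26 (about 1.5), entirely produced by the omitted popularity variable.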

A fundamental aspect of conducting causal analysis with an observational design is the identification strategy: a clear definition of the sources of variation in the data that can be used to estimate the causal effect of interest. Whereas the variation in an experimental design is generated by the analyst, in an observational design it is necessary to determine which exogenous sources of variation can be used to identify a causal effect. In the example of the causal effect of inventory on demand, an exogenous source of variation is a supply shock to inventory that is unrelated to demand. Cachon et al. (2018) use this identification strategy, collecting data about extreme weather events occurring at automobile assembly plants, which propagate supply shocks through the supply chain and produce exogenous variation in inventory levels at the dealerships.
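A minimal sketch of this identification strategy, using simulated data: a hypothetical supply shock `z`, independent of demand, shifts inventory, and the instrumental-variable (Wald) estimator cov(z, demand) / cov(z, inventory) recovers the assumed causal effect that the naive regression overstates. All coefficients are illustrative assumptions, not estimates from Cachon et al. (2018).

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


rng = random.Random(11)
TRUE_EFFECT = 0.5  # hypothetical causal effect of one unit of inventory on demand
shock, inventory, demand = [], [], []
for _ in range(20_000):
    popularity = rng.gauss(0, 1)  # omitted variable: drives stocking and demand
    z = rng.gauss(0, 1)           # supply shock (e.g., a plant disruption), unrelated to demand
    inv = 10 + 5 * popularity - 3 * z + rng.gauss(0, 1)
    dem = 20 + 8 * popularity + TRUE_EFFECT * inv + rng.gauss(0, 1)
    shock.append(z)
    inventory.append(inv)
    demand.append(dem)

ols = cov(inventory, demand) / cov(inventory, inventory)  # confounded by popularity
iv = cov(shock, demand) / cov(shock, inventory)           # IV (Wald) estimator
print(f"OLS slope: {ols:.2f}, IV estimate: {iv:.2f}, true effect: {TRUE_EFFECT}")
```

With one instrument and one endogenous regressor, this ratio coincides with the two-stage least squares estimate; in practice one would also check that the instrument is strong and plausibly unrelated to demand except through inventory.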

Although experimental designs are more common online, most academic research on e-commerce still uses observational designs. Several reasons explain the limited use of experiments in empirical research on online retail. First, managers often resist implementing field experiments beyond marginal variations of current practices, which is the most typical use of A/B testing platforms. Although the culture of validating interventions through experimentation is at the heart of technology-oriented companies that were “born online,” this approach has been more difficult to disseminate in traditional companies born in the “old economy.” Hence, offline retailers that moved to omnichannel tend to show some resistance to adopting experimental designs throughout the organization.

Second, the effect sizes of the interventions are typically small, which increases the number of customers needed in the experimental design to provide a precise estimate of the intervention’s impact. This issue is particularly salient in studies that seek to analyze the effects of online advertising: Lewis and Rao (2015) show that measuring returns on advertising is difficult, requiring an immense number of observations (over 10 million person-weeks) to measure the effects with sufficient precision (Berman and Elea (2018) show how to use post-stratification to increase the statistical power in marketing experiments).
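Post-stratification can be sketched as follows, under hypothetical assumptions: customers fall into a heavy and a light segment whose spending levels differ sharply, treatment is randomized without regard to segment, and the estimate weights the within-segment treatment-control differences by each segment's share of the sample. Repeating the simulated experiment shows that both estimators center on the same effect, while the post-stratified one is far less variable. The segment shares, spending levels, and effect size are illustrative only.

```python
import random
from statistics import fmean, pvariance


def simulate(rng, n=2000, effect=1.0):
    """One hypothetical experiment: (heavy_user, treated, spend) per customer."""
    rows = []
    for _ in range(n):
        heavy = rng.random() < 0.3    # 30% heavy users (assumed)
        treated = rng.random() < 0.5  # randomized independently of segment
        base = 100 if heavy else 10   # heavy users spend far more (assumed)
        rows.append((heavy, treated, base + effect * treated + rng.gauss(0, 5)))
    return rows


def diff_in_means(rows):
    """Mean spend of treated minus mean spend of control customers."""
    return (fmean(y for _, t, y in rows if t)
            - fmean(y for _, t, y in rows if not t))


def post_stratified(rows):
    """Within-segment differences weighted by each segment's sample share."""
    estimate = 0.0
    for segment in (True, False):
        sub = [r for r in rows if r[0] == segment]
        estimate += len(sub) / len(rows) * diff_in_means(sub)
    return estimate


rng = random.Random(3)
naive, strat = [], []
for _ in range(100):
    rows = simulate(rng)
    naive.append(diff_in_means(rows))
    strat.append(post_stratified(rows))

print(f"mean estimates: {fmean(naive):.2f} vs {fmean(strat):.2f}")
print(f"variances: {pvariance(naive):.3f} vs {pvariance(strat):.3f}")
```

The gain comes from removing between-segment variation from the comparison, which is largest when the stratifying variable explains much of the outcome.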

Third, analytics in e-commerce often involve integrating multiple sources of data, some of which cannot be directly manipulated by the analyst. Consider the work by Luca (2016) investigating the causal impact of online restaurant reviews on sales. This causal analysis is challenging because restaurant quality, which is difficult to measure precisely, constitutes an omitted variable that correlates with sales and review ratings. It may not be the review per se that drives the sales: the review is just a signal indicating the high quality of the restaurant. Running an experimental design would require manipulating online reviews, which has led to ethical and legal issues (Gonzalez 2016). This limits many retail analytics projects to using observational designs for performing causal analysis.

The most common approaches to address issues of endogeneity in observational designs come from the econometrics field, which has focused on developing estimation techniques for nonexperimental data. Some of these tools include combining cross-sectional and longitudinal data through panel datasets and difference-in-differences (Gallino and Moreno 2014), instrumental variables (Cachon et al. 2018), natural experiments (Parker et al. 2016; Sorensen 2007), and regression discontinuity (Cohen et al. 2016; Luca 2016). Machine learning methods have traditionally been used in descriptive analysis involving a large number of variables. For example, Fu (2018) uses machine learning to conduct descriptive analysis to improve demand forecasts, analyzing large-scale data from online fashion blogs to predict the color popularity of apparel products, which can be used to update production plans to better match demand. Nevertheless, more recent work has extended machine learning methods to causal analysis. In this context, Li et al. (2017) use high-dimensional online pricing data to identify causal relationships among prices that can be used to identify competitive sets (products and firms that compete with each other). Another example is Glaeser et al. (2019), who combine machine learning techniques with panel data methods to identify the market potential of multiple delivery locations of an online retailer.
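As a hedged illustration of regression discontinuity with review data, the sketch below assumes that a platform displays average ratings rounded to the nearest half star, so restaurants with nearly identical underlying ratings on either side of a rounding threshold show different star levels. The data are simulated, and `rd_jump` is a hypothetical helper that fits a local linear regression on each side of the cutoff; it is not a reproduction of the estimation in Luca (2016).

```python
import random


def fit_line(xs, ys):
    """OLS intercept and slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def rd_jump(scores, outcomes, cutoff, bandwidth):
    """Local linear RD estimate: gap between right and left fitted lines at the cutoff."""
    left = [(s - cutoff, y) for s, y in zip(scores, outcomes)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    intercept_left, _ = fit_line(*zip(*left))
    intercept_right, _ = fit_line(*zip(*right))
    return intercept_right - intercept_left


rng = random.Random(5)
ratings, revenue = [], []
for _ in range(5000):
    r = rng.uniform(3.0, 4.5)     # underlying average rating (hypothetical)
    shown = int(r * 2 + 0.5) / 2  # displayed rating, rounded to the nearest half star
    ratings.append(r)
    revenue.append(5 * r + 2 * shown + rng.gauss(0, 0.5))  # simulated outcome

# Just below 3.75 the displayed rating is 3.5 stars; from 3.75 up it is 4.0.
jump = rd_jump(ratings, revenue, cutoff=3.75, bandwidth=0.2)
print(f"estimated jump at the cutoff: {jump:.2f}")
```

In this simulation the true jump is 1.0 (a coefficient of 2 on the displayed rating times the half-star increase), while the smooth effect of the underlying rating is absorbed by the local linear fits on each side.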

4 Convergence of Data and Methods in Omnichannel Analytics

Although there are important differences in the decisions, data, and methods that have been used to analyze offline and online customer experiences, there is an emerging trend of convergence in some research areas. We pick two application domains that exhibit two-way convergence between online and offline retail analytics. The first domain is related to location decisions, which traditionally have been analyzed in offline retail but are now becoming relevant to online retail. The second domain is customer browsing behavior, which has been dominant in online retail but is now emerging in offline channels, facilitated by new technologies that track customer shopping paths.

In traditional e-commerce, in which the online and offline channels are viewed as separate business units, physical store locations have a minor role in the online business. However, with the emergence of the omnichannel view, physical stores become an important lever to influence the customer experience by providing complementary information to the online channel and adding a new fulfillment channel for online purchases.

The study by Gallino and Moreno (2014) analyzes the impact of adding the option to pick up online purchases in the store, analyzing outcomes in both the online and offline channels. The study combines information about store locations with online browsing and transaction data to study how online shopping was affected by the intervention depending on the customer’s proximity to a physical store. The study also analyzes how the intervention affects sales at the physical stores using aggregate data. Hence, the analysis combines data sources of various forms to provide a complete view of the impact of this omnichannel initiative. In terms of methods, the study conducts a causal analysis using econometric techniques in an observational design. Although it would have been possible to conduct A/B testing, the company had already implemented the intervention when the researchers collected the data. Nevertheless, a clever identification strategy that uses customer zip codes located far away from physical stores, for which the intervention should have no effect, provides a quasi-experimental design to conduct the causal analysis.
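The logic of using distant zip codes as a comparison group can be expressed as a two-by-two difference-in-differences, sketched below with simulated data. The key assumption is that near-store and far-from-store areas would have followed parallel trends without the intervention; the level difference, the common trend, and the effect size are hypothetical values, not results from Gallino and Moreno (2014).

```python
import random
from statistics import fmean


def did(data):
    """2x2 difference-in-differences from (near_store, after, sales) tuples."""
    def cell(near, after):
        return fmean(s for n, a, s in data if n == near and a == after)
    return ((cell(True, True) - cell(True, False))
            - (cell(False, True) - cell(False, False)))


rng = random.Random(21)
data = []
for _ in range(4000):
    near = rng.random() < 0.5   # zip code close to a physical store
    after = rng.random() < 0.5  # observed after the in-store pick-up option launched
    sales = (50 + 8 * near            # near-store areas differ in level...
             + 3 * after              # ...both groups share a common time trend...
             - 4 * (near and after)   # ...plus a hypothetical effect near stores only
             + rng.gauss(0, 5))
    data.append((near, after, sales))

estimate = did(data)
print(f"DiD estimate: {estimate:.2f} (simulated effect: -4)")
```

The far-from-store group absorbs the common time trend, so the estimate isolates the change specific to customers who could actually use the new option.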

Glaeser et al. (2019) provide another good example of convergence, analyzing the design of the last-mile delivery system of an online retailer. The retailer collaborating in this study developed a fulfillment approach based on trucks that park at temporary, convenient locations and wait for customers to pick up their purchases. The problem of choosing truck locations resembles the problem of deciding which store locations to open in the offline channel: opening a new location captures demand from that geographical area, but as more locations are opened, demand is cannibalized across locations. In Glaeser et al. (2019), the location problem is more dynamic: there is not only cannibalization across locations in close proximity but also intertemporal cannibalization from opening the same location too frequently. Hence, two inputs are required to build a decision model that optimizes pick-up locations: (1) the base attractiveness of a location, defined as the demand that would be captured if the location were opened in isolation, and (2) the interlocation and intertemporal cannibalization effects that occur when customers switch from one pick-up opportunity to another. For the first input, there are massive amounts of data describing demographics, store outlets, transportation, commuting patterns, and many other variables. Glaeser et al. (2019) evaluate multiple machine learning techniques to identify the key variables that predict the base attractiveness of a location by explaining online purchases with the many location characteristics. This descriptive analysis is combined with a causal analysis to measure the effect of opening a location close to the focal location, in terms of either geographic or temporal proximity. This is done using panel data methods that exploit the dynamic nature of pick-up locations, which is typically more difficult with traditional store networks because they expand slowly (a notable exception is the set of studies on Walmart store openings, e.g., Basker 2005; Basker and Noel 2009).

Whereas the above examples use physical locations as a fulfillment option for e-commerce, stores can also complement the information acquisition stage of the customer experience in online shopping. Showrooming (using the inventory in the store to obtain product information that is then used to purchase online) has become both an opportunity and a threat for the online and offline channels. For offline retail, the threat arises when customers explore the inventory in the store and then purchase (usually at a lower price) from another e-commerce retailer. Several strategies have been suggested to counter this threat (Mehra et al. 2017). Showrooming has also become an opportunity for omnichannel retailers: separating the purchase from the distribution allows retailers to lower inventory costs at the stores and perform fulfillment more efficiently from warehouses. By reducing inventory handling, showrooms also free up time for the salesforce to focus on customer assistance (Economist 2016). Bell et al. (2017), in collaboration with the online eyeglass retailer Warby Parker, evaluate the impact of opening showrooms on the online business. They exploit the sequential opening of showrooms to identify the causal effect on online purchases for customers within the trading area of each location, and they use propensity score matching, an econometric technique that matches control and treatment groups based on observable characteristics. They find that sales within the trading area increase by 3% in the online channel and that product returns are reduced by 1%. As with buy-online, pick-up-in-store, this work demonstrates an interesting convergence of data and methods, integrating geographical information from location openings with granular data from online customer purchases.
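A minimal propensity score matching sketch on simulated data: treatment (for example, a showroom opening nearby) is more likely in areas with higher values of two hypothetical observed characteristics `x1` and `x2`, which also raise online sales. The propensity score is estimated with a logistic regression fitted by plain gradient ascent, and each treated unit is matched to the control unit with the nearest score (with replacement) to estimate the average treatment effect on the treated. The design is illustrative and makes no claim about the specification used by Bell et al. (2017).

```python
import math
import random
from bisect import bisect_left


def propensity(w, xi):
    """Logistic probability of treatment for covariate row xi (with intercept)."""
    return 1 / (1 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))


def fit_logit(X, t, lr=2.0, iters=200):
    """Logistic regression fitted by batch gradient ascent on the log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, ti in zip(X, t):
            residual = ti - propensity(w, xi)
            for j, xj in enumerate(xi):
                grad[j] += residual * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


rng = random.Random(13)
X, t, y = [], [], []
for _ in range(1500):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)  # hypothetical area characteristics
    treated = rng.random() < 1 / (1 + math.exp(-(-0.3 + x1 + 0.8 * x2)))
    X.append([1.0, x1, x2])
    t.append(int(treated))
    y.append(1 + 2 * x1 + x2 + 3 * treated + rng.gauss(0, 1))  # simulated effect: 3

w = fit_logit(X, t)
scores = [propensity(w, xi) for xi in X]

# Nearest-neighbour matching (with replacement) of each treated unit to a control.
controls = sorted((s, yi) for s, yi, ti in zip(scores, y, t) if ti == 0)
control_scores = [s for s, _ in controls]
effects = []
for s, yi, ti in zip(scores, y, t):
    if ti == 1:
        k = bisect_left(control_scores, s)
        nearest = min((c for c in (k - 1, k) if 0 <= c < len(controls)),
                      key=lambda c: abs(control_scores[c] - s))
        effects.append(yi - controls[nearest][1])

att = sum(effects) / len(effects)
n_treated = sum(t)
naive = (sum(yi for yi, ti in zip(y, t) if ti) / n_treated
         - sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - n_treated))
print(f"naive difference: {naive:.2f}, matched ATT: {att:.2f}")
```

Matching balances only the observed characteristics; unobserved differences between treated and control areas would still bias the estimate.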

Another prominent example of convergence of online and offline retailing is the analysis of customer browsing behavior. Early work studying customer shopping behavior started with observational and ethnographic research (Paco 1999; Underhill 2005). E-commerce opened new opportunities to study shopping behavior, using detailed browsing data from web logs. For example, using online browsing data, Moe (2003) identifies different store visit profiles that are associated with specific shopping objectives. Similarly, Danaher et al. (2006) analyze which factors explain web site visit duration and depth, including demographic variables and website characteristics.

Studies based on online purchase data have detailed information about the search history of customers, which can be used to understand customers’ consideration sets and to measure purchase incidence more precisely (Wu and Rangaswamy 2003). In contrast, data regarding browsing behavior in retail stores have been, for the most part, nonexistent. Studies that seek to measure the effect of changes in the layout and display of a store have typically used aggregate store-level data to conduct causal analysis. Moreover, most studies using scanner data from supermarkets need to make strong assumptions to infer which customers were actually considering purchasing a product; in general, conversion rates are low, which reduces the statistical power of models that seek to explain purchase incidence. Even panel data collected by third parties suffer from this problem, although to a lesser extent if past purchases predict future purchase patterns well (Bell et al. 1998).

Farley and Ring (1966) were among the first to conduct data analytics with customer shopping path data, which they collected manually. With the deployment of sensors that can track the movement of customers inside the store, new opportunities to study browsing behavior in physical stores have emerged. Larson et al. (2005) and Hui et al. (2009b) use Radio Frequency Identification (RFID) technology to track the movements of shopping carts in supermarkets. Larson et al. (2005) develop novel multivariate clustering algorithms to identify customer segments based on their shopping paths. Hui et al. (2009b) take a closer look at individual customer shopping paths, linking them with basket purchases, to study how customers browse in the store and how they deviate from the optimal shopping path that minimizes travel distance. Hui et al. (2009) test several behavioral hypotheses to explain customer shopping paths. Seiler and Pinna (2017) use RFID to measure the search effort of customers shopping in a supermarket, showing that an additional minute of search lowers total expenditures by 8%. Sorensen (2003) developed an RFID-based shopping-tracking system, which has been patented and used to track customer shopping paths with high frequency.
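The comparison between an observed shopping path and the distance-minimizing path can be sketched by brute force over the order in which the purchased items are visited, which is feasible for small baskets. The store coordinates, the straight-line distance metric, and the function names are hypothetical; real store layouts would need distances that respect aisles and fixtures.

```python
import math
from itertools import permutations


def path_length(points):
    """Total straight-line distance along a sequence of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def shortest_path_length(entrance, items, checkout):
    """Length of the best visiting order, found by brute force over all
    orders (feasible only for small baskets)."""
    return min(path_length([entrance, *order, checkout])
               for order in permutations(items))


entrance, checkout = (0.0, 0.0), (10.0, 0.0)
items = [(2.0, 8.0), (8.0, 8.0), (5.0, 2.0)]  # hypothetical shelf locations
observed = [entrance, (8.0, 8.0), (2.0, 8.0), (5.0, 2.0), checkout]  # tracked path

excess = path_length(observed) / shortest_path_length(entrance, items, checkout) - 1
print(f"observed path is {excess:.0%} longer than the shortest path")
```

For larger baskets, the number of orders grows factorially, so a dedicated traveling-salesman heuristic or solver would replace the brute-force search.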

Although RFID has opened new possibilities to track customer paths, its application has been limited to supermarkets and other stores where RFID tags can be installed in shopping carts. More recent studies have used computer-vision technology to track customers inside stores from video camera recordings, which can be applied in a wider set of retail sectors. Burke (2006) describes an application of this technology in a consumer electronics retailer to measure retail productivity during the holiday shopping season. The study shows that despite the large increase in customer traffic to the stores, conversion rates were relatively low, in part due to long waiting times at checkout, overwhelmed sales employees, and high stock-out rates. A more recent study by Jain et al. (2016) uses similar technology to track customer shopping paths and customers’ interactions with sales staff, providing a detailed analysis of how customers acquire information using different resources at the stores. Lu et al. (2013) use computer vision to track the lengths of lines at supermarket deli counters to measure the impact of waiting times on customer purchases. They find that customers focus primarily on the length of the queue and that queue lengths above 5 people have a significant impact on conversion rates. Musalem et al. (2016) develop a scalable methodology that uses cameras to track customer assistance by employees, which can be used to monitor this service metric on a daily basis. They combine these data with aggregate sales to measure the mediating effect of salespeople in converting customer traffic into sales.

Analyzing customer shopping paths can be viewed more generally as a relevant problem for studying consumer behavior, as described by the general framework developed by Hui et al. (2009a). They identify three types of path data in retail, from online and offline channels, that illustrate the convergence of data and methods across these domains:

  • Shopping paths in stores, tracking customer movements and dwelling times in different store areas.

  • Eye-tracking to measure how customers focus their attention on advertising, shelf facings, and display promotions.

  • Web browsing behavior tracking the sequence of web pages visited by a customer on an e-commerce site.

Burke (2006) also describes interesting research applications that exhibit this convergence in retail analytics, in what he calls “customer experience management” (an analogy to customer relationship management), facilitated by technologies that enable real-time tracking of the customer experience.

New technologies to track customer paths in stores are emerging. In particular, tracking the location of WiFi devices inside stores is now feasible, which has great potential given the widespread adoption of smartphones. In addition to customer path-tracking, these technologies also allow for two-way communication with the customer, which can be used to assist customers during their shopping and provide georeferenced targeted promotions. In Sect. 6, we describe new applications using this technology.

5 Examples of Data and Methodology Integration

The previous section presents examples from the literature that suggest a trend of convergence in data and methods across online and offline channels, propelled by the omnichannel retail business model. This section provides two examples from our own work that give more detail about the research design and execution of analytics projects in omnichannel retail, with special emphasis on integrating online and offline data into a seamless, unified approach to study decisions that affect customer experiences in both channels.

5.1 Triggered Email Marketing

Triggered or behavioral emails are personalized messages sent automatically in response to specific customer actions. Typical examples of this type of campaign include confirmation and order status emails, cross-selling recommendations, cart abandonment reminders, and re-engagement emails. There are at least two reasons to believe that triggered emails can achieve relatively high response rates compared with traditional emailing. First, identifying the right time to deliver a marketing communication can be an important driver of effectiveness (Li et al. 2011). Second, triggered emails enable the identification of good prospects when most historical data are not very informative. Whereas for many product categories purchase history is a good predictor of future purchases (Rossi et al. 1996), for infrequently purchased items there is insufficient history at the customer level to make a proper inference about purchase intentions. Omnichannel analytics can play a crucial role in delivering effective communications to customers in this setting. In fact, omnichannel retailers can leverage information gathered from different channels to build a more complete description of customers’ needs at any point in time. In the particular case of products with noninformative purchase histories, retailers can rely on recent browsing data to infer purchase intentions. For example, a customer actively browsing products in the washing machine category online can be a predictor of purchase incidence in that category. This example illustrates how retailers can combine data from different channels to support decisions on how to allocate sales effort.

How can analytics contribute to a better understanding of the impact of triggered email marketing? How can analytics help managers design more effective communications? First, it is necessary to evaluate whether this initiative is indeed effective. A simple exploration of key performance metrics without an adequate evaluation of causality is insufficient to justify adopting this strategy. In the context of behavioral targeting, causality is a crucial concern because high response rates can be driven mainly by the selection of customers who would buy regardless of the firm’s intervention. In other words, customers receiving triggered emails might exhibit larger sales just because they were interested in buying in the first place, not because the firm communicated with them. From an omnichannel perspective, the evaluation should not only be performed against concurrent controls but also be decomposed by channel and category. In this way, product and store managers can better anticipate variations in sales volume.

The second motivation to use analytics in this context is to guide the design of effective emails. Several decisions must be made when designing a promotional campaign. A first decision is which customers should be prioritized to receive the automated messages, based on how many times they visit the product web page and other indications of purchase intention. Once good prospects have been identified, what should the content of the message be? Should the retailer recommend only the products that the customer already visited, or should it inform customers about a broader assortment? Finally, given the recipient and the content, when should the retailer send the message? Should it be sent right after identifying that a customer is actively browsing in a product category? Whereas an immediate message can be considered intrusive, a late one might arrive after the customer has made her purchase decision. An experimental design was implemented to answer these questions empirically, as described next.

5.1.1 Experimental Design

To evaluate the effectiveness of triggered email marketing from an omnichannel point of view, we partnered with a large regional retailer in Latin America. This firm operates several department stores and has a well-established online channel that accounts for approximately 10% of corporate sales. The firm was interested in conducting a pilot study to evaluate the business potential of browsing abandonment triggers. To pilot the study, we selected categories in the electronic goods department (LED TVs, smartphones, washers, dryers, and heaters) because products in these categories receive a relatively large share of page views, and transactional data have relatively low explanatory power to describe short-term buying behavior.

To reduce the impact of seasonality and other confounding effects, each day we randomly assigned approximately half of the customers who satisfied the triggering condition to a control group, to whom no message was sent. This design helps to evaluate the marginal impact of emails and to separate this effect from the mere identification of purchase intentions. In a period of 30 days, we identified 23,906 browsing abandonment events, but we restricted our attention to customers who had opened emails from the company in recent months. For this group, we sent 5,723 emails, and for each one, we observed whether it was opened, whether the customer clicked on the message to visit the website, and whether the customer purchased in any department through any of the available channels. The emails also varied in terms of timing, repetition, and the personalization level used to create the content of the message. Here, we concentrate on two key design variables: the content of the message and the time at which it was sent. Regarding content, we evaluated two product recommendation strategies: a set of products closely related to the product the customer visited most, and a set of the most popular items in that product category. Regarding timing, we evaluated sending the email 2 or 4 days after identifying a browsing abandonment event. For a more detailed list of conditions, see Goic et al. (2016).
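The daily randomization can be sketched as follows, assuming each triggering event is recorded as a (day, customer) pair; the function name, the seed, and the event data are hypothetical. Shuffling within each day, rather than across the whole period, ensures that day-specific factors such as seasonality cannot systematically favor either group.

```python
import random
from collections import defaultdict


def assign_daily(events, seed=2016):
    """Split each day's triggered customers at random into two halves:
    'treatment' (email sent) and 'control' (no email)."""
    rng = random.Random(seed)  # fixed seed so the assignment can be audited
    by_day = defaultdict(set)
    for day, customer in events:
        by_day[day].add(customer)  # a customer triggering twice in a day counts once
    assignment = {}
    for day in sorted(by_day):
        customers = sorted(by_day[day])  # fixed order before shuffling
        rng.shuffle(customers)
        half = len(customers) // 2
        for position, customer in enumerate(customers):
            assignment[(day, customer)] = "treatment" if position < half else "control"
    return assignment


# Hypothetical browsing-abandonment events as (day, customer id) pairs.
events = [(1, "c01"), (1, "c02"), (1, "c03"), (2, "c04"), (2, "c05"), (2, "c04")]
assignment = assign_daily(events)
```

With an odd number of customers on a given day, the extra customer goes to the control group, which is consistent with assigning approximately half of each day's customers to each arm.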

5.1.2 Results

We started the analysis with a simple comparison of sales between the treated and control groups, as reported in Table 2. Overall, we found that communicating with customers through triggered emails can significantly increase sales for this segment. When decomposing sales into online and offline channels, both exhibited positive lifts, but only the effect on online sales was significant. Similarly, when analyzing the categories from which customers ended up buying, we found that sales in the category that triggered the email are significantly larger for treated customers, whereas sales in other categories are not. The positive but nonsignificant effects on offline and cross-category sales invite further analysis with larger sample sizes or different treatments to evaluate whether these business opportunities can be translated into profit.

Table 2 Overall effect of triggered emails on sales
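The treated-versus-control comparison can be sketched as a difference in mean sales per customer with a large-sample two-sided test; the same function can be applied separately to online sales, offline sales, and sales by category to obtain the decomposition described above. The data below are hypothetical, and the normal approximation assumes reasonably large groups.

```python
from statistics import NormalDist, mean, variance


def lift_test(treated, control):
    """Difference in mean sales per customer (treated minus control), with a
    large-sample two-sided p-value based on the normal approximation."""
    diff = mean(treated) - mean(control)
    se = (variance(treated) / len(treated) + variance(control) / len(control)) ** 0.5
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


# Hypothetical sales per customer: most customers buy nothing.
treated = [0] * 90 + [100] * 10
control = [0] * 95 + [100] * 5
diff, z, p_value = lift_test(treated, control)
print(f"lift: {diff:.1f}, z: {z:.2f}, p-value: {p_value:.3f}")
```

Because most customers buy nothing, sales per customer are highly skewed, which is one reason why lifts that look large in relative terms can still fail to reach significance.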

To analyze the impact of design variables such as timing and the content of the recommendation, we need to compare customers treated under one condition with customers treated under another. A simple method to do this is a difference-in-differences approach, in which we compare the lift of each type of campaign relative to its corresponding control group. Let $y_i$ be the sales of individual $i$ and $Trigger_i$ a dummy variable indicating whether the individual was treated with a triggered email. To complete the regression model, we also need to include dummy variables indicating the type of treatment used. Because we analyzed the nature of the product recommendation and the timing of the message, we define $Narrow_i$ to indicate whether the customer received a narrow set of recommendations (as opposed to the most popular products in the category) and $TwoDays_i$ to indicate whether the email was sent 2 days after the triggering condition was observed (as opposed to 4 days after). Thus, we used the following regressions to compare the effectiveness of different treatments:

$$\displaystyle y_{i} = \alpha_{0}+\alpha_{1} Narrow_i+\alpha_{2} Trigger_i+\tau\, Narrow_i \cdot Trigger_i+\epsilon_{i} \tag{1}$$

$$\displaystyle y_{i} = \beta_{0}+\beta_{1} TwoDays_i+\beta_{2} Trigger_i+\gamma\, TwoDays_i \cdot Trigger_i+\varepsilon_{i} \tag{2}$$

In these equations, we are interested in the parameters τ and γ, which measure the additional sales lift produced by a specific type of treatment relative to the alternative condition. For example, if τ is positive, we conclude that recommending products similar to those the customer was browsing is associated with larger sales than recommending the most popular products in the category. The results of these regressions are presented in Table 3 and indicate almost no significant differences between the types of treatment: although narrow assortments and earlier communications generated larger sales, those differences are not significant. Among all the partial effects, the only significant positive effect is that of the narrow assortment on online sales.

Table 3 Effect of design variables in email effectiveness
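The coefficients in Eq. (1) can be estimated by ordinary least squares; the sketch below solves the normal equations directly on simulated data. It assumes, as the specification requires, that control customers are also labeled with the recommendation condition (narrow or popular) they would have been assigned. Because the model is saturated in two binary variables, the estimate of τ equals the difference-in-differences of the four cell means, which the code checks. All coefficient values are hypothetical.

```python
import random
from statistics import fmean


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(42)
X, y = [], []
for _ in range(4000):
    narrow = int(rng.random() < 0.5)   # Narrow_i
    trigger = int(rng.random() < 0.5)  # Trigger_i (1 = email sent)
    sales = 10 + 1 * narrow + 4 * trigger + 2 * narrow * trigger + rng.gauss(0, 3)
    X.append([1, narrow, trigger, narrow * trigger])  # intercept, dummies, interaction
    y.append(sales)

alpha0, alpha1, alpha2, tau = ols(X, y)


def cell(n, t):
    """Mean sales for customers with Narrow_i == n and Trigger_i == t."""
    return fmean(yi for xi, yi in zip(X, y) if xi[1] == n and xi[2] == t)


did = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
print(f"tau (OLS): {tau:.3f}, difference-in-differences of means: {did:.3f}")
```

In practice a statistics package would also report standard errors for τ; the same construction applies to γ in Eq. (2) with $TwoDays_i$ in place of $Narrow_i$.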

Having established the main results, a variety of additional analyses can be conducted to refine the execution of automatic communications with customers. In fact, we expect that the manner in which the retailer communicates with customers should depend on the product category and customers’ characteristics. For some categories, the recommendation should be concentrated on the attributes in which the customer has expressed interest. For others, the retailer could be better off recommending from a wider product range. Similarly, the relative importance of online versus offline sales could also depend on the nature of the product, with a larger impact on offline sales for products with nondigital attributes, such as furniture or apparel (Gallino and Moreno 2018). In this regard, we ran a series of complementary regression analyses and found that effectiveness is moderated by customer characteristics and by the time of day when the message is delivered. For example, our results suggest that event-based marketing is more effective for older customers. A plausible explanation is that younger customers browse very actively, but tighter budgets can limit their expenditures.

Overall, this example illustrates that an omnichannel perspective of analytics can make communications more effective. The observation of customer behavior in one channel can lead to a response in a different channel. However, proper implementation and evaluation of these practices require a careful statistical analysis.

5.2 Enhancing Store Operations with Browsing Data

The use of multiple channels by customers in their purchase processes has been documented since the early 2000s (Thomas and Sullivan 2005). Since then, evaluating the incremental value of each channel has been an active area of academic research (Kannan et al. 2016). An important cross-channel behavior is research shopping, in which customers use electronic channels to gather information and compare products but complete the transaction at brick-and-mortar stores (Verhoef et al. 2007). Recent statistics indicate that this is indeed a growing tendency in the industry (Ellet 2018). How can retailers leverage this behavior to improve store operations and provide better service quality? How can analytics help to achieve these goals?

As in the previous example of triggered emails, we consider that online navigation patterns can be informative about short-term customer preferences. Because a sizable fraction of customers browsing online today will turn to physical stores to complete their purchases in the near future, patterns in online browsing can be used to improve store execution. For example, if we identify that a specific product is being visited more intensively, then store managers can check inventory levels, create special displays or simply let salespersons know which products are most visited so that they can make recommendations based on browsing popularity. This is precisely what we analyze in this example: what is the effect on store sales of informing store personnel about online browsing patterns?

5.2.1 Methodology

The analytical solution required detailed exploration of customer data, not only to evaluate the performance of the initiative but also to craft the reports delivered to store personnel. For example, product preferences are likely to be store-specific; therefore, each store should receive information only about customers who are likely to visit that specific store. After evaluating several alternatives, we selected customers based on two criteria. First, we applied a geographical filter such that only customers living within a certain distance from the store were eligible. Second, among them, we examined purchase histories and selected customers who concentrated the majority of their purchases in that store in the last 6 months. In terms of customer heterogeneity, we explored several classifications but ended up classifying customers by gender and age only, because these are the characteristics that salespersons can easily identify in stores. Navigation reports are therefore made at that aggregation level.
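A minimal Python sketch of the two selection criteria follows. The data layout (a hypothetical `Purchase` record and a `homes` dictionary of home coordinates), the 10 km radius, the 182-day approximation of 6 months and the strict-majority threshold of 0.5 are all illustrative assumptions; the text does not report the distance cutoff actually used.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Purchase:
    # Hypothetical transaction record: who bought, where, and when.
    customer_id: str
    store_id: str
    day: date


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def eligible_customers(store_id, store_loc, homes, purchases, today,
                       max_km=10.0, window_days=182, min_share=0.5):
    """Customers living within max_km of the store (assumed cutoff) whose
    purchases in the last ~6 months are mostly (> min_share) at this store."""
    start = today - timedelta(days=window_days)
    total, at_store = {}, {}
    for p in purchases:
        if start <= p.day <= today:
            total[p.customer_id] = total.get(p.customer_id, 0) + 1
            if p.store_id == store_id:
                at_store[p.customer_id] = at_store.get(p.customer_id, 0) + 1
    selected = set()
    for cid, (lat, lon) in homes.items():
        # Criterion 1: geographical filter around the store.
        if haversine_km(lat, lon, *store_loc) > max_km:
            continue
        # Criterion 2: a majority of recent purchases made in this store.
        n = total.get(cid, 0)
        if n > 0 and at_store.get(cid, 0) / n > min_share:
            selected.add(cid)
    return selected
```

Requiring a strict majority means a customer who split purchases evenly between two stores is assigned to neither, which keeps each store's report focused on its most loyal nearby customers.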

To determine which products to include in the reports, we simply considered the 10 most-visited products in each department, broken down by customer segment, because this format was judged the easiest for salespersons to interpret. Some minor variations were made to accommodate the distinctive characteristics of each department. For example, for female shoes, we differentiated only by age, not by gender. Similarly, for audio & video, an additional list with accessories was provided. Finally, we checked inventories to report only available products.
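The report construction can be sketched as a per-segment popularity count. This is an illustration under assumptions: page views are assumed to arrive as hypothetical `(department, segment, product_id)` tuples, a segment is any hashable key such as `(gender, age_band)`, and out-of-stock products are removed before the top 10 is taken (the text does not say whether filtering happened before or after ranking).

```python
from collections import Counter, defaultdict


def build_reports(page_views, stock, top_n=10):
    """page_views: iterable of (department, segment, product_id), one per
    product-page visit by an eligible customer.
    stock: dict product_id -> units currently available in the store.
    Returns {(department, segment): [product_id, ...]}, most visited first."""
    counts = defaultdict(Counter)
    for dept, segment, product in page_views:
        counts[(dept, segment)][product] += 1
    reports = {}
    for key, counter in counts.items():
        # most_common() sorts by visit count; keep only in-stock products.
        ranked = [p for p, _ in counter.most_common() if stock.get(p, 0) > 0]
        reports[key] = ranked[:top_n]
    return reports
```

Department-specific variations fit the same function: for female shoes, the segment key could simply be the age band.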

Reports were delivered once per week by email to the sales coordinator of each department in the treated stores and then as a physical copy handed over directly by the research team. During the same visit, a short survey was conducted to verify, among other things, how much they had used the reports in the previous period, whether the reports effectively reflected customers’ preferences and how much they thought they could influence customer decisions.

To evaluate the impact of providing store personnel with browsing data, we compared it with a situation in which such information is not provided. Unlike the previous case, in which the intervention was made at the customer level, here the treatment is executed at the store level, making it more difficult to run a perfectly controlled experiment. Therefore, in the analysis, we tried to control for other observables that might drive variations in sales. We evaluated the impact on sales in two stores, and in each store, we covered nine departments with relatively large percentages of customers who report conducting online research before going to the store, as indicated in Fig. 2.

Fig. 2 Framework examples

Every week, each department could be treated (T) or left as a control (C). For treated departments, we also considered a weaker treatment variant (W), in which the recommended products do not correspond to the 10 most-visited products but instead to a random sample of products drawn from the 11th through 30th most visited. The comparison against this condition enabled us to test whether a potential positive effect depends on the quality of the information provided. For this purpose, we had also considered giving salespersons a placebo with a random selection from the whole list of available products. Unfortunately, such a placebo generated lists of unpopular products that were not credible to salespersons. In addition, we monitored the performance of two additional untreated stores and used them as complementary controls. These mirror stores were selected to match the size and sales volume of the treated stores as closely as possible. The experiment ran for a 6-week period, from October 10 to November 30 of 2016, as reported in Table 4.
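A small sketch of how the weekly report content could depend on the assigned condition follows. The helper names `weak_treatment_list` and `report_for` are hypothetical, and so is the weak-list size: the text does not say how many products were drawn from ranks 11 through 30, so 10 is assumed here to mirror the strong list.

```python
import random


def weak_treatment_list(ranked_products, k=10, lo=11, hi=30, rng=None):
    """Sample k products without replacement from ranks lo..hi (1-based,
    inclusive) of a most-visited-first ranking."""
    rng = rng or random.Random()
    pool = ranked_products[lo - 1:hi]  # ranks 11..30 -> list indices 10..29
    if len(pool) < k:
        raise ValueError("not enough products in the rank window")
    return rng.sample(pool, k)


def report_for(condition, ranked_products, rng):
    """Hypothetical helper mapping a weekly condition code to a report."""
    if condition == "T":  # strong treatment: the 10 most-visited products
        return ranked_products[:10]
    if condition == "W":  # weak treatment: a random draw from ranks 11-30
        return weak_treatment_list(ranked_products, rng=rng)
    return []  # control (C): no report is delivered
```

Drawing the weak list from just below the top of the ranking, rather than from the whole catalogue, is what keeps it plausible to salespersons, which was the problem with the discarded placebo.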

Table 4 Schedule of treatment for different stores and departments

5.2.2 Results

We focus our description on incremental revenues per department, week, and store. We have a relatively large number of independent variables, including store, department, and week fixed effects and several covariates characterizing the salesforce in each department and store: for example, the total number of employees in the department, the percentage of male workers, their average age, and how long the employees have been working at the specific store. Given the sample size, the selection of control variables can have a relevant impact on the evaluation; therefore, we ran regression models for all 8192 combinations of regressors. Considering only the cases in which the regression coefficient for the treatment is significant at the 95% confidence level, we found an average treatment effect (ATE) of 9.34%. That is, if a department receives a summary of what customers were browsing in online channels, that department on average increases weekly revenues by more than 9%. As a robustness check, we ran a series of LASSO regressions (Tibshirani 1996), in which we include all the available controls and the model automatically selects a subset to be included in the prediction. As presented in Table 5, the results of the LASSO regressions are similar to, but slightly weaker than, those of the previous regression analysis.
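Since 8192 = 2^13, the count is consistent with 13 optional regressors, each either included or excluded. The sketch below illustrates such a specification search in plain Python: it fits `y ~ 1 + treat + S` for every subset `S` of a dictionary of candidate controls and averages the treatment coefficient over the specifications in which |t| > 1.96. The normal critical value, the omission of always-included fixed effects and the hand-rolled OLS are simplifying assumptions; a real analysis would typically rely on a statistics library.

```python
import itertools
import math


def invert(a):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    k = len(a)
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))  # partial pivoting
        if abs(m[p][c]) < 1e-12:
            raise ValueError("singular design matrix")
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[k:] for row in m]


def ols(X, y):
    """OLS coefficients and classical standard errors via normal equations."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][i] * beta[i] for i in range(k)) for r in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se


def specification_search(treat, controls, y, crit=1.96):
    """Fit y ~ 1 + treat + S for every subset S of the candidate controls
    (2**len(controls) models) and average the treatment coefficient over
    the specifications in which it is significant (|t| > crit)."""
    names = list(controls)
    kept = []
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            X = [[1.0, treat[r]] + [controls[c][r] for c in subset]
                 for r in range(len(y))]
            beta, se = ols(X, y)
            if se[1] > 0 and abs(beta[1] / se[1]) > crit:
                kept.append(beta[1])
    mean = sum(kept) / len(kept) if kept else float("nan")
    return mean, len(kept)
```

Averaging only over significant specifications mirrors the procedure described above; reporting the share of specifications that pass the threshold alongside the mean makes the sensitivity to control selection visible.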

Table 5 LASSO ATE results
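The LASSO robustness check can be illustrated with a plain cyclic coordinate-descent solver for the objective (1/2n)·‖y − b0 − Xb‖² + λ·Σ|b_j|. Assumptions, none of which come from the text: the treatment column is left unpenalized so that its coefficient is always estimated, controls are on comparable scales (for example, standardized beforehand), and λ is fixed by hand rather than chosen by cross-validation.

```python
def soft_threshold(z, g):
    """Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, unpenalized=(0,), iters=500, tol=1e-8):
    """Cyclic coordinate descent for
        (1 / 2n) * ||y - b0 - X b||^2 + lam * sum_{j not in unpenalized} |b_j|.
    X is a list of rows without an intercept column; column 0 is assumed to
    be the treatment indicator and is left unpenalized by default."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    ybar = sum(y) / n
    # Centering absorbs the (unpenalized) intercept.
    Xc = [[row[j] - means[j] for j in range(k)] for row in X]
    resid = [v - ybar for v in y]  # residuals when b = 0
    sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(k)]
    b = [0.0] * k
    for _ in range(iters):
        max_change = 0.0
        for j in range(k):
            if sq[j] == 0:
                continue  # constant column: nothing to estimate
            # (1/n) x_j' (partial residual with coordinate j added back)
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + sq[j] * b[j]
            new = soft_threshold(rho, 0.0 if j in unpenalized else lam) / sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = ybar - sum(m * bj for m, bj in zip(means, b))
    return b0, b
```

Controls whose coefficients are shrunk exactly to zero are the ones the LASSO drops; keeping the treatment unpenalized is one way to make its coefficient comparable to the OLS estimates.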

To complete the analysis, we considered the possibility that sales workers may not have fully received the treatment. Whereas we are certain that the reports were given to all store managers, we cannot be sure that they were passed on to all associates in a timely manner. Moreover, even if associates received the reports, they might not have used them actively. To address this concern, we used information from the employee surveys in an instrumental variable (IV) approach. More specifically, the survey asked “how much did you use the report over the course of the last week?”; for every week, we took the mean response for each department in each store and used it as an instrument. Using the IV approach, and again considering only cases in which the treatment coefficient is significant at the 95% confidence level, we found a local average treatment effect (LATE) of 4.71%.

Overall, these results suggest a large treatment effect, increasing revenue by 4% to 9%, and possibly more for departments that use the information more actively. Notice, however, that this is the total effect of the treatment; the previous analysis is silent about how much of the effect is explained by the quality of the information provided. As mentioned in the methodological discussion, our design also included a set of weaker treatments in which the recommended products do not correspond to the 10 most-visited products but instead to a random sample of products drawn from the 11th to 30th most visited. When repeating the LASSO regression in a variety of specifications, we found no significant difference between the strong and weak treatments. Whereas these results might indicate that the effect of the reports is not very sensitive to identifying the very best-preferred products, they also raise concerns about a Hawthorne effect (a change in the behavior of sales employees due to their awareness of forming part of a pilot study; Schwartz et al. 2013) and therefore call for further exploration.

The nature of these results highlights two good practices when applying analytics: first, the importance of running several sensitivity analyses to evaluate the robustness of the results, and second, as indicated in our conceptual framework, the recognition that omnichannel analytics is part of a continuous learning process. Therefore, after deriving new business insights, it is always beneficial to evaluate what other analyses can be performed with the newly available data.

6 Current Challenges and Future Developments

In the previous sections, we have described how different data sources can be combined to create more profitable interactions with customers. With the consolidation of additional channels and the emergence of new technologies, important challenges must be addressed. In this section, we describe current challenges in the practice of omnichannel analytics and some directions that we believe will be important in the near future.

6.1 Mobile Retailing and the Blurry Boundary Between Online and Offline Shopping

The penetration of mobile marketing has been rising fast in the past few years, and expenditure on mobile advertising is expected to continue to increase. Although early academic work on mobile marketing was devoted to characterizing its adoption (Yang 2010) and describing consumer attitudes toward and acceptance of the technology (Holmes et al. 2013), only in recent years has its effectiveness been empirically analyzed. For example, Wang et al. (2015) explored how purchase patterns differ between customers buying from their mobile phones and those shopping from traditional desktop devices and found that mobile customers are indeed associated with more purchases. However, what is really new about mobile in an omnichannel world? Is it just another channel in an already complex set of available touchpoints? We posit that there are structural reasons to believe that mobile technologies not only play a special role in omnichannel retailing but also are blurring the boundaries between online and offline shopping.

First, mobile technologies are ubiquitous (Okazaki and Mendez 2013). At present, mobile users have access to thousands of applications that assist them in a myriad of tasks, ranging from financial planning to learning new languages. Moreover, as pointed out by Kleijnen et al. (2007), those activities can be performed anytime and anywhere. Thus, from an omnichannel perspective, mobile devices radically expand the scope in which retailers can connect with customers. Second, the portability of mobile devices provides the conditions for a joint experience of online and offline channels. That is, while customers are shopping in brick-and-mortar stores, they can simultaneously compare prices or read product reviews. Bearing in mind that a seamless customer experience is the ultimate goal of omnichannel retailing, the simultaneous character of mobile usage brings important operational challenges in terms of channel coordination. The mobile channel also opens opportunities to offer superior experiences through interactions throughout the complete shopping journey. Consider, for example, a grocery store at which customers can create shopping lists on their smartphones. Using beacons, the retailer can identify that the customer has arrived upon entrance, send them information about discounted products on their shopping lists and help them locate those products in the store. While in the store, the customer can even indicate if a given product is not available, and at checkout, mobile technologies can also help to speed up the payment process. Similar experiences can be implemented in other formats. For instance, in home improvement, mobile apps can provide image recognition capabilities to identify the specific type of item that customers are looking for. For apparel, customers can interact with the stores in real time to check product availability for a given combination of size and color. All these examples require extensive use of analytics to determine the right customer, the right content, and the right time for the interaction.

We have argued that online and offline data are complementary because they are informative about different aspects of consumer behavior. Similarly, information coming from mobile sources exhibits patterns different from those of data collected from other electronic sources. For example, Fig. 3 displays the browsing behavior of customers visiting the website of a large regional retailer, comparing the visit patterns of mobile customers with those of customers using regular desktop computers. Panel (a) shows the times at which customers use these devices, revealing that whereas desktop users peak during afternoon office hours, mobile users peak much later. Similarly, panel (b) compares age profiles, showing that mobile users are more concentrated in younger segments.
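A time-of-day comparison like panel (a) can be produced from session logs with a simple hourly count. The sketch below assumes each session is available as a hypothetical `(device_type, start_timestamp)` pair; it returns, per device type, the share of sessions starting in each hour and the peak hour. An age comparison like panel (b) would follow the same pattern with age bands in place of hours.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def hourly_profile(sessions: Iterable[Tuple[str, datetime]]):
    """Per device type, the share of sessions starting in each hour of the
    day (0-23) and the peak hour."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for device, started in sessions:
        counts[device][started.hour] += 1
    profile = {}
    for device, c in counts.items():
        total = sum(c.values())
        shares = {h: c[h] / total for h in range(24)}  # Counter gives 0 for unseen hours
        peak = max(range(24), key=lambda h: c[h])  # earliest hour wins ties
        profile[device] = (shares, peak)
    return profile
```

Normalizing to shares rather than raw counts lets the two device types be compared on one chart even when desktop traffic is much larger than mobile traffic, or vice versa.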

Fig. 3 (a) Desktop vs. (b) mobile browsing behavior

We have also argued that both data and methodologies are converging. The joint experience of online and offline channels through mobile makes it difficult to classify whether information comes from an online or an offline source. Mobile also constitutes a good example of convergence in methods. In mobile, each data point is usually associated with a specific location; therefore, we can analyze relatively long sequences of information belonging to a shopping trip. This structure closely resembles what online marketers have explored to describe browsing sessions using clickstream data; therefore, the development of unifying frameworks to describe path data associated with customer journeys is a promising stream of research (Kannan et al. 2016; Hui et al. 2009a).

Mobile data can also help to understand new omnichannel phenomena. For example, to understand whether customers are showrooming in certain categories, we need to know not only when customers visit the stores but also which product categories they focus their attention on. Modern smartphones are equipped with several sensors providing detailed information about customer usage. For example, retailers can easily identify the position of customers in the store using beacons or WiFi signals. This information can be used to compute, for each category, the ratio of customers who purchase to customers who visit and then to characterize their purchase patterns. For example, Fig. 4 displays the percentage of customers visiting selected areas of a supermarket as a function of total time in the store. Whereas most customers are detected crossing the cashier area, those spending more than an hour in the store are significantly more likely to visit the electronics department and, at the same time, less likely to complete a transaction at the cashiers. This pattern is what we would expect if customers were showrooming in the electronics department.
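Assuming that beacon or WiFi detections have already been turned into one record per shopping trip (total minutes in the store and the set of zones in which the customer was detected), the quantities described above reduce to simple counts. The record format, the 30-minute buckets and the function names below are illustrative assumptions, not the retailer's actual pipeline.

```python
from collections import defaultdict


def visit_shares_by_dwell(trips, zones, bucket_minutes=30):
    """trips: iterable of (minutes_in_store, set of zones detected on the trip).
    Returns {bucket_start_minute: {zone: share of trips in that bucket that
    were detected in the zone}}, with buckets in increasing order."""
    n_trips = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for minutes, seen in trips:
        bucket = int(minutes // bucket_minutes) * bucket_minutes
        n_trips[bucket] += 1
        for zone in zones:
            if zone in seen:
                hits[bucket][zone] += 1
    return {b: {z: hits[b][z] / n_trips[b] for z in zones} for b in sorted(n_trips)}


def purchase_to_visit_ratio(visitors, buyers):
    """Share of customers detected in a category area who also bought in it."""
    return len(visitors & buyers) / len(visitors) if visitors else float("nan")
```

A rising share of electronics visits in the long-dwell buckets, together with a falling cashier share and a low purchase-to-visit ratio for electronics, is the kind of signature the text associates with showrooming.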

Fig. 4 Percentages of customers visiting each product category as a function of total time in the store

To complete an omnichannel experience, the retailer should be able not only to identify the categories in which customers are showrooming but also to identify those customers in order to personalize the value proposition across all available channels. Personalization is precisely another challenge in omnichannel analytics.

6.2 New Technologies, Personalization, and Privacy Concerns

Personalization corresponds to the practice of using historical customer data to decide which value offering is most suitable for each individual customer (Arora et al. 2008). The literature suggests that personalization provides several benefits for customers, such as better communication and better product matches to their needs (Vesanen 2007; Murray and Häubl 2009). Personalization also benefits retailers because it generates higher response rates and profits (Postma and Brokke 2002).

As pointed out by Montgomery and Smith (2009), an important challenge associated with personalization is its computational complexity. This is particularly relevant for personalization implementations, such as shopbots (Montgomery et al. 2004a) or adaptive websites (Hauser et al. 2009), that require real-time responses. In this context, analytics should play a major role not only in developing the algorithms to personalize content, assortment and prices but also in determining which data sources are most informative for each type of recommendation.

One of the major criticisms raised against personalization is the invasion of consumers’ privacy (Van Doorn and Hoekstra 2013). Personalization necessarily reveals to customers that their transactional and demographic data are being used to generate content, which some of them may perceive as invasive. Although some level of personalization can be implemented without collecting personal data (Sackmann et al. 2006), it is a major challenge for managers to find the proper balance between more detailed information, which leads to more effective recommendations, and the privacy concerns that come with this information. From an omnichannel perspective, this is even more relevant because the existence of multiple communication channels should motivate retailers to rationalize the manner in which they connect with customers, contacting them only when it is relevant (Ketelaar et al. 2018).

New technologies are continually introducing challenges and opportunities for analytics. We identify at least three areas in which technology can drive new innovations in retail analytics (Grewal et al. 2017). First, some technologies will be disruptive, enabling the creation of new business models in whose design and execution analytics will play an important role. In addition to the changes that mobile devices are already producing in retail operations, we expect that technologies such as dynamic digital signage (Roggeveen et al. 2016) and virtual reality shopping (Nantel 2004; Suh and Eun Lee 2005) will become established mechanisms to enhance customers’ shopping experience. Second, the Internet of Things (Da et al. 2014) and other sensing technologies capturing real-time data about the shopping environment will enable retail operations to adapt dynamically and thereby become more efficient and responsive. Finally, in the last few years, we have observed enormous advances in methods to analyze data (Bradlow et al. 2017). In the context of omnichannel retailing, we expect that some of these methods will push the current frontier, expanding the possibilities of combining customer data from different channels to derive a truly integrated view of customer behavior.

7 Conclusions

We have based our discussion of omnichannel analytics on the following four key ideas:

  1. The data that retailers can collect from online and offline sources are different.

  2. The methodologies developed to address different types of data from online and offline sources are different.

  3. The types of information contained in online and offline data are complementary; therefore, they can be informative about different facets of customer behavior.

  4. In omnichannel environments, the boundaries dividing online and offline data are disappearing, and the methodologies to analyze these data are converging.

In comparing online and offline retailing, a literature review in the fields of operations management and marketing reveals important differences in the decisions, data and methods used across channels:

  • Online data tend to be more granular, providing information at the individual-customer level. Moreover, online browsing data can be used to calculate conversion precisely at the customer level. Offline, transaction data tend to be analyzed at a more aggregate level, and customer panel data are usually collected via third parties.

  • Online decisions tend to be more dynamic than offline ones. Prices change more frequently, and layout, display, and inventory/variety can be personalized to each customer.

  • Experimental designs are more frequently found in online channels. Consequently, causal analysis is more challenging in offline channels and requires different methodologies to address causality.

These differences notwithstanding, there is also some evidence for convergence across methods and data in omnichannel retailing. We identified several examples of decisions that become relevant in an omnichannel business model and that illustrate how the analytics originally used in one channel can be extended to the other. Location decisions, which have been extensively studied in offline retail, are now relevant in online channels through fulfillment strategies and showrooming. We also observe that customer browsing behavior, which first became possible to analyze with web-log data in e-commerce, has now become feasible to study in offline channels using different types of tracking technologies.

In our description, we have provided some specific examples illustrating how different data sources can be used to support decision-making in different channels. Future opportunities are arising with the proliferation of mobile retailing to generate a seamless omnichannel customer experience. Moreover, new technologies will continue to emerge, creating new challenges and opportunities, and having an integrated view of analytics that combines data and methods from both streams of research will become even more valuable to develop innovations in the retail sector.