1 Introduction

What are the important features of the pricing and promotion tactics used in the U.S. retail grocery industry? We study this question using Nielsen RMS scanner data that record weekly store-level prices and quantities for nearly 50,000 products in 17,184 stores that belong to 81 different retail chains, including grocery stores, drug stores, and mass merchandisers. This comprehensive data set allows us to provide generalizable insights that are not specific to a small number of products, categories, stores, or retailers as in most of the extant literature. Similar (or the same) data have been widely used for academic research in marketing, industrial organization, and macroeconomics as well as by managers and analysts in the industry. Our goal is to provide a range of generalizable insights on pricing, promotion, and demand patterns that we hope will spur future research. Further, we intend these insights to inform industry practitioners to evaluate and improve current pricing and promotion tactics.

The first part of the paper documents the degree of price dispersion for (almost) identical products across stores at a given moment in time. We define products as UPCs,Footnote 1 which are identical across stores, and also as brands, which are “almost identical” aggregates of UPCs that differ in form or pack size but contain the exact same product content. The overall degree of price dispersion is large even at narrowly defined geographic levels. For instance, at the 3-digit ZIP code level, the ratio of the 95th to 5th percentile of prices is 1.294 for the median UPC and 1.433 for the median brand. Further, there is substantial heterogeneity in the degree of price dispersion across products.

The overall price dispersion may be due to variation in base prices, i.e. the regular shelf-price of a product, or due to price promotions when a product is sold at a temporary discount. To distinguish between these two sources of price dispersion we develop a new algorithm that classifies prices as base or promoted prices. We document a substantial degree of base price dispersion that is only modestly smaller than the overall degree of price dispersion, both nationally and at the local market level. Further, most products are frequently promoted. For instance, the median product is promoted once in 6.8 weeks, and the median promotional discount is 19.5%. Also, there is much heterogeneity in the promotion frequency and depth across stores.

To quantify the contribution of the different sources to the overall price dispersion, we decompose the overall variance in prices across stores and weeks in a year into separate components. Within markets, persistent UPC base price differences across stores account for the largest share of the price variance (46.5%), whereas the within-store variance of base prices during a year only accounts for a substantially smaller share (18%). Despite extensive promotional activity, the contribution of promotions to the market-level price variance is modest (35%). This is due to co-existing EDLP (every-day low price) vs. Hi-Lo pricing patterns, whereby stores with systematically high base prices offer deeper or more frequent price discounts than stores with systematically low base prices. Thus, the EDLP vs. Hi-Lo pattern compresses the overall price dispersion across stores.

In the second part of the paper we study if differences across stores in prices and promotion tactics, i.e. promotion frequency and depth, are related to market versus retail chain-specific factors. We find that market (3-digit ZIP code) factors can explain a significant percentage of the national price variance (46.5% for the median product). However, a substantially larger percentage of the price variance is explained by retail chain factors (69.9%), and local market/retail chain factors in particular (88.1% for the median product). An analogous analysis for differences in promotion frequency and depth across stores reveals almost identical results. Further, it is not just that the overall frequency and depth of promotions are similar at the chain and especially at the market/chain level, but individual promotion events are coordinated, too. In particular, we document that the incidence of a store-level promotion is strongly associated with the percentage of other stores in the same retail chain that promote the product in the same week, both a the local market and at the national level.

The third part of the paper documents that the similarity in prices and promotions among stores in the same retail chain can be explained by similarity in demand. We estimate store-level price elasticities and promotion effects for 2,000 brands using similar data and a statistical approach (a Bayesian hierarchical model) that is employed by sophisticated analysts in the industry. Hence, our estimates are similar to the estimated effects used by retailers to make price and promotion decisions.

Mirroring our findings for price and promotion tactics, we find that a large percentage of the overall variance in price elasticities and promotion effects is explained by retail chain and especially local market/retail chain factors. In particular, using the estimates that most closely emulate the information available to industry analysts, 51.0% of the overall variation in price elasticities is explained by chain factors and 70.5% is explained by market/chain factors. This finding raises the question if retail managers can actually distinguish among price elasticities and promotion effects across stores. To address this question, we predict credible intervals for the estimates and document the percentage of store-level price and promotion effects with credible intervals that exclude the local chain-level average of the estimated effects. Using this approach, a large majority of the estimates is indistinguishable from the market/chain-level price and promotion effects. Further, our results show that store-level cross-price elasticities in particular are typically imprecisely estimated.

The difficulty to distinguish among local store-level price and promotion effects and to obtain precise cross-price elasticity estimates suggests that retail managers may plausibly consider price discrimination across stores as infeasible. This hypothesis is also consistent with the anecdotal evidence from conversations with retail chain managers, who frequently indicated that local price discrimination (“store-specific marketing”) is challenging to implement in practice. Our explanation for price and promotion similarity is different, although not mutually exclusive, from other explanations that have been proposed in the literature, including managerial inertia (DellaVigna & Gentzkow, 2019), brand image and fairness concerns (Ater & Rigbi, 2020), and the softening of competition if firms can commit to less granular pricing (Adams & Williams, 2019; Dobson & Waterson, 2005). Industry insiders have also proposed that price discrimination across stores is infeasible in practice due to the institutional constraints of feature advertising, i.e. retailer advertising that highlights price promotions in local markets. Our analysis rejects this last hypothesis, however, because not only promoted but also base prices are similar across stores.

The paper is organized as follows. We discuss the related literature in Section 2, and provide an overview of the data sources in Section 3. Section 4 documents the basic facts of price dispersion, Section 5 separately documents the degree of base price dispersion and the prevalence of price promotions, and Section 6 provides a decomposition of the overall price variance into base price and promotion components. Section 7 shows how price and promotion differences across stores are related to market versus retail chain-specific factors, and Section 8 documents chain-level price similarity and promotion coordination. In Section 9 we examine how the chain-level similarity in prices and promotions is related to similarity in demand, and Section 10 examines if retailers can distinguish among store-level price elasticities and promotion effects. We discuss our proposed explanation for price similarity in Section 11 and relate it to alternative hypotheses. Section 12 concludes.

2 Literature review

Several studies have documented price dispersion in retail grocery stores, including (Lach, 2002; Eden, 2014; Dubois & Perrone, 2015; Eizenberg et al., 2021), although with less extensive product, market, or retail chain coverage. Most closely related to our analysis of price dispersion in Section 4 is Kaplan and Menzio (2015), which studies price dispersion using the Nielsen Homescan household panel data. The Homescan panel data only contains the prices of products purchased. Hence, because demand decreases in price, the panel data do not represent the full distribution of prices at which products were available for purchase. This problem affects our analysis to a substantially lesser degree, because we observe the price of a product whenever at least one unit was sold in a given store/week.Footnote 2 The size of the Homescan panel only allows to systematically capture the prices of a small number of products. Related, due to the limited sample size, Kaplan and Menzio (2015) present results for 54 Scantrack markets, compared to the 840 3-digit ZIP codes that we use as the most granular market definition. However, an advantage of the Homescan panel is that it contains purchase data from all retail chains, unlike the Nielsen RMS data used in our paper.Footnote 3

Our paper documents a large degree of similarity in prices and promotions among stores in the same retail chain. Similar results are reported in Nakamura (2008) and DellaVigna and Gentzkow (2019). Our main explanation for this similarity is based on similarity in demand, and in particular the difficulty to distinguish among and to obtain precise price and promotion effect estimates at the store level. We will discuss this explanation and the relationship to other explanations that have been provided by (Adams & Williams, 2019; DellaVigna & Gentzkow, 2019), and Ater and Rigbi (2020) in Section 11.

Related to our work, Adams and Williams (2019) provide evidence on retail price similarity in the home improvement industry. Similar to our findings, they document that prices are not store-specific, but cluster within pricing zones.

Our analysis, which focuses on the dispersion of prices at a given moment in time, is related to work that provides generalizable evidence on the time-series variation in prices (Bronnenberg et al., 2006). A related literature documents the frequency of price adjustments and price rigidity, which has important implications for macroeconomics (Nakamura & Steinsson, 2008, 2013). Our work is also related to research on assortments across stores. For a sample of products in four categories, Hwang et al. (2010) find that stores that belong to the same retail chain in a market (state) carry similar assortments. This finding mirrors our results on price and promotion similarities within retail chains.

Our analysis of store-level price elasticities and promotion effects in Section 9 is related to meta-analyses of price elasticities in the marketing literature, in particular Tellis (1988) and Bijmolt et al. (2005). Other studies relate brand or category elasticities and promotion effects to market characteristics (Bolton, 1989), demographic and competitor information (Hoch et al., 1995; Boatwright et al., 2004), and category characteristics (Narasimhan et al., 1996).

3 Data description

Our analysis primarily uses the Nielsen RMS (Retail Measurement Services) retail scanner data. The data are available for academic research through a partnership between the Nielsen Company and the James M. Kilts Center for Marketing at the University of Chicago Booth School of Business.Footnote 4 We also use the Nielsen Homescan consumer panel data to select the products in our sample.

3.1 Nielsen RMS retail scanner data

The RMS scanner data record sales units and prices at the week-level, separately for all stores and UPCs (universal product code). The data available from the Kilts Center for Marketing covers close to 40,000 stores across various channels, including grocery stores, mass merchandisers, drug stores, convenience stores, and gas stations. Although the data cover cover many stores and retailers, the subset of the RMS data available from the Kilts Center for Marketing is neither a census nor a randomly selected sample. However, the data have broad geographic coverage and, on average, account for between 50 and 60% of all market-level spending in grocery and drug stores and for one third of all spending at mass merchandisers.Footnote 5 The data allow us to identify all stores that belong to the same retailer. However, the exact identity (name) of a retail chain is concealed.

A week in the RMS data is a seven-day period that ends on a Saturday. If the shelf price of a UPC changes during this period the quantity-weighted average over the different shelf prices in the week is recorded.

The RMS data only contain records for weeks when at least one unit of the product was sold in a given store. Hence, in particular for small (in terms of revenue) products the incidence of weeks without price observations can be high. For such products the observed sample is more likely to include low, promoted prices than high, regular prices, and hence the sample will not represent the true price distribution. To ameliorate this problem, we impute the missing prices using on an algorithm that first classifies the observed prices as either base (regular) prices or promoted prices. The algorithm distinguishes between regular and promoted prices based on the frequently observed saw-tooth pattern in a store-level time series of prices whereby prices alternate between periods with (almost) constant regular price levels and shorter periods with temporarily reduced price levels. We perform this classification separately for each store-UPC pair. We assume that weeks without sales are non-promoted weeks, which is justified by the large sales spikes that are frequently observed in promoted weeks. Hence, we impute the missing prices using the predictions of the current regular (non-promoted) price levels based on the price classification algorithm. Two examples of observed store-level price series and the corresponding predicted base prices are given in the Appendix in Figs. 15 and 16.

3.2 Nielsen Homescan household panel data

We use the Nielsen Homescan household panel to select the product sample for our analysis. The participating households scan all purchased items after each shopping trip, and thus Homescan provides comprehensive data on the UPCs purchased and the corresponding prices paid. During the sample period in this paper, 2008-2010, the Homescan panel includes more than 60,000 households. Nielsen provides sampling weights (“projection factors”) to make summary statistics from the data, such as total spending in retail stores, representative of the US population at large.

3.3 Sample selection

The 2008-2010 RMS data include information on almost one million UPCs. We intend our analysis to be as comprehensive as possible. However, including all products is challenging, in particular because “small” products rarely sell. For such products, store-level price observations are only available for a small percentage of weeks, and hence our price imputation algorithm (Section 3.1) is likely to yield noisy results.

Therefore, we use a subset of all products in our analysis. We select the UPCs that are observed in both the RMS scanner data and the Homescan household panel data, and we then choose the top 50,000 products based on total Homescan expenditure. We use the Homescan expenditure data to select products that are representative of the overall buying-behavior of US households. As discussed above, Homescan is intended to be representative, whereas the stores and retailers in the RMS data need not be representative of the US population at large.

The 50,000 chosen products account for 73% of the total Homescan expenditure and 79% of the revenue in the RMS data.Footnote 6 Excluding some infrequently sold UPCs, our final sample contains 47,355 products. Table 1 provides summary statistics for the product/store/week level price observations. In each year we observe more than 6 billion prices. In total, there are 18.81 billion price observations corresponding to 434 billion dollars in revenue. Including the imputed prices the number of observations is 27.09 billion; 30.6% of the prices are imputed.

Table 1 Product sample descriptive statistics

3.4 UPCs versus brands

We will frequently compare results that use UPCs as product definition with results for products defined as a brand. We obtain brand-level data by aggregating UPCs that share a common brand name. For example, all UPCs with the brand description “COCA-COLA CLASSIC R” or “COCA-COLA R” belong to the same brand, Coca-Cola, while products with the description “COCA-COLA DT” belong to a different brand, Diet Coke. The sales volume of a brand is measured in equivalent units, such as ounces or counts, and the brand price is measured as the average price per equivalent unit (e.g. 12 cents per ounce). We calculate weighted average prices, using the total product-level revenue over all stores and weeks as weights. Thus, differences in brand prices are entirely due to differences in the underlying UPC prices, not due to differences in the aggregation weights.

Using this aggregation process we obtain 11,279 brands. Summary statistics for the brand sample are shown in Table 1.

3.5 Chain and store coverage

Our sample includes data from 17,184 stores that belong to 81 different retail chains, including grocery stores, drug stores, and mass merchandisers. These stores represent more than 90% of the total revenue in the Kilts-Nielsen RMS data. For stores with a large incidence of weeks when the products in our sample did not sell, the predictions of the price imputation algorithm will be unreliable. We hence exclude these predominantly small stores, especially convenience stores and gas stations, from the analysis.

Table 8 in the Appendix summarizes the number of retail chains and stores in our sample at the DMA (designated market area) and ZIP+ 3 level.Footnote 7 Three-digit ZIP codes and counties are the smallest geographic areas accessible to researchers in the Kilts-Nielsen RMS data. Our sample contains 205 DMAs and 840 3-digit ZIP codes with at least one store. At the DMA level, the median number of chains is 6 and the median number of stores is 32. At the ZIP+ 3 level, the corresponding numbers are 4 and 10, respectively. Hence, even at the smallest geographic level there are multiple retailers and stores, and thus the measured price dispersion is unlikely to be limited due to a small number of store or retail chain observations.

3.6 Product assortments

The dispersion of prices will also be limited if many products are sold only in few stores. We provide a detailed analysis of UPC and brand availability across stores and retail chains in Appendix A.3. We find that most “large” (as measured by total revenue) products are widely available, but there are also many predominantly “small” UPCs and brands that are available only in few stores. We account for product size differences by analyzing the revenue-weighted distribution of prices, which places less weight on small products that are not widely available.

4 Price dispersion: The basic facts

We start our analysis by documenting the price dispersion of identical or almost identical products across stores at any given moment in time. We present the results separately for products defined as UPCs or brands. UPCs are identical across stores. Brand prices are calculated as a weighted average over the prices of the individual UPCs that share the same brand name (Section 3.4). These UPCs typically differ along pack size (15 oz, 20 oz, etc.) or form factor (bottles, cans, etc.). Hence, the brand aggregates need not be exactly identical across stores. Apart from the packaging, however, the main product is de facto physically identical across UPCs, and consumers are likely to perceive the product content as identical. Hence, the comparison of weighted average brand prices across stores is meaningful, and we refer to brands as “almost identical products.”

4.1 Price dispersion measures

For each product j we measure the dispersion of prices in week t using two statistics. Both statistics are calculated using the sample of store-level prices, \(\mathcal {P}_{jt}=\{p_{jst}:s\in \mathcal {S}_{jt}\}\), where \(\mathcal {S}_{jt}\) is the set of all stores that sell product j in week t. The first statistic is the standard deviation of the log of prices,

$$ \sigma_{jt}=\sqrt{\frac{1}{N_{jt}-1}\sum\limits_{s\in\mathcal{S}_{jt}}\left( \log(p_{jst})-\overline{\log(p_{jt})}\right)^{2}}. $$

σjt measures dispersion based on the percentage price differences from the geometric mean across stores.Footnote 8 The second statistic, rjt(0.05), is the ratio of the 95th to the 5th percentile of the price observations \(\mathcal {P}_{jt}\). We calculate the statistics for each week in 2010, and then take the mean over all weeks to report the average dispersion statistics σj and rj(0.05).

4.2 Price dispersion: UPCs

Figure 1 and Table 2 summarize the distribution of the price dispersion statistics across all 47,355 UPCs. To account for differences in the “importance” of each product we summarize the weighted distribution of the dispersion statistics using total product-level revenue across all stores and weeks in 2010 as weights.

Fig. 1
figure 1

Price dispersion statistics: UPC prices

Table 2 Price and base price dispersion statistics

The top row of Fig. 1 displays the weighted distribution of the price dispersion statistics at the national level. Overall, the degree of price dispersion for identical products across stores at any given moment in time is large. The log-price standard deviation, σj, for the median product is 0.161, which roughly indicates that 95% of prices vary over a range from 32% below to 32% above the average national price of the median product.Footnote 9 The ratio of the 95th to 5th percentile of prices, rj(0.05), is 1.646 for the median product.

The large degree of price dispersion at the national level may simply reflect systematic differences in price levels across regions. Our main focus, however, is on documenting the price dispersion of identical products in local markets, where consumers could at least in principle buy the same product from any of the local stores. To account for systematic regional price differences, we calculate the dispersion statistics separately for each market m using the price observations \(\mathcal {P}_{jmt}=\{p_{jts}:s\in \mathcal {S}_{jmt}\}\).Footnote 10 We then take the weighted average over all market-level dispersion statistics using the number of observations in each market as weights. We use two market definitions: DMAs (designated market areas) and 3-digit ZIP codes.Footnote 11

The market-average price dispersion statistics are shown in the middle and lower rows of Fig. 1 (see Table 2 for detailed numbers). The log-price standard deviation for the median product is 0.110 at the DMA level and 0.099 at the 3-digit ZIP code level, and the corresponding rj(0.05) values are 1.360 and 1.294 respectively. Hence, even in local markets there is a large degree of price dispersion for identical products.

While the overall degree of price dispersion of identical products at any given moment in time is large, Figure 1 and Table 2 also reveal another, equally important fact: There is substantial heterogeneity in the dispersion statistics across products. At the market (3-digit ZIP code) level, the standard deviation of log prices ranges from from 0.021 at the 5th percentile to 0.196 at the 95th percentile of the dispersion statistics. Similarly, the 95th to 5th percentile ratio of prices ranges from 1.045 to 1.713 when comparing the 5th and 95th percentile values.

4.3 Price dispersion: Brands

As discussed in Section 3.4, differences in brand prices across stores with the same brand-level assortments of UPCs are entirely due to differences in the UPC prices. However, brand prices may differ across stores if the assortments differ, even if the underlying UPC prices are identical.

Figure 2 and Table 2 summarize the distributions of the brand-price dispersion statistics. Generally, the degree of price dispersion is larger at the brand level compared to the UPC level. Nationally, the standard deviation of log brand prices is 0.175, compared to 0.161 for UPC prices. At the 3-digit ZIP code level the corresponding numbers are 0.129 (brands) and 0.099 (UPCs), respectively.

Fig. 2
figure 2

Price dispersion statistics: Brand-level prices

The large degree of heterogeneity in price dispersion that we documented for UPCs is also evident for brands. For example, at the 3-digit ZIP code level the standard deviation of log brand prices is 0.060 at the 5th percentile, compared to 0.239 at the 95th percentile.

We refer the reader to Section B in the Appendix for some additional results, in particular a sensitivity analysis that uses two alternative measures of price dispersion. Section B also compares our results to the results in Kaplan and Menzio (2015).

5 Base prices and promotions

For many products, prices alternate between periods when the product is sold at the base price, i.e. the regular or every-day shelf price, and periods when the product is promoted, i.e. offered at a discount. Base prices change only infrequently. The large degree of price dispersion documented in the previous section could be due to differences in base prices across stores, reflecting a relatively persistent component in the dispersion of prices, or due to price promotions that are idiosyncratic to stores. In this section we document the dispersion of base prices and the promotion policies for the products in our data. We also document the “importance” of price promotions based on the percentage of the total product volume sold on promotion.

5.1 Base price dispersion

We measure the dispersion of base prices, \({\mathscr{B}}_{jt}=\{b_{jst}:s\in \mathcal {S}_{jt}\}\), using the same statistics as before, the standard deviation of the log of base prices and the ratio of the 95th to the 5th percentile of base prices across stores. We show the results for UPCs in Fig. 3 and the results for brands in Fig. 19 in the Appendix. Detailed summary statistics are in Table 2.

Fig. 3
figure 3

Base prices dispersion statistics: UPC base prices

Generally, the degree of base price dispersion is substantial. The standard deviation of the log of base prices is 0.133 for the median product at the national level and 0.078 at the 3-digit ZIP code market level. The degree of base price dispersion is not much smaller than the degree of price dispersion. For example, the standard deviation of the log of prices at the 3-digit ZIP code level is 0.099, compared to 0.078 for the log of base prices. For brands, the difference is even smaller. Similar to the results in Section 4, there is much heterogeneity in the base price dispersion statistics across products.

5.2 Price promotions

The Nielsen RMS data do not directly indicate if a product was promoted. Instead, we infer a price promotion from the difference between the imputed base price and the realized price. We define the percentage price discount, or promotion depth, as follows:

$$ \delta_{jst}=\frac{b_{jst}-p_{jst}}{b_{jst}}. $$

We then classify the product as promoted if the percentage price discount exceeds a threshold \(\bar {\delta }\). The variable \(D_{jst}=\mathbb {I}\{\delta _{jst}\geq \bar {\delta }\}\) captures promotional events, where Djst = 1 indicates a promotion. We use a promotion threshold of \(\bar {\delta }=0.05\) in our analysis, a choice that we justify in detail in Appendix C.1. In short, small percentage price discounts where δjst is close to 0 are unlikely to represent a planned price promotion, but are likely due to measurement error in prices.

The price dispersion analysis so far was based on data from 2010. Here we extend the sample period to 2008-2010 to reduce measurement error, in particular in the promotion frequency statistic discussed below.

5.2.1 Promotion frequency

We measure the promotion frequency of product j in store s as

$$ \pi_{js}=\frac{1}{N_{js}}\sum\limits_{t\in\mathcal{T}_{js}}D_{jst}, $$

where \(\mathcal {T}_{js}\) includes the weeks in 2008-2010 when product j was sold in store s, and Njs is the corresponding number of observations. We calculate the product-level promotion frequency, πj, by taking a weighted average of πjs across all stores using the number of store-level observations as weights.Footnote 12 We measure the heterogeneity in the promotion frequency across stores using the difference between the 95th and the 5th percentile of the πjs observations (weighted by Njs).

The left panel in the middle row of Fig. 4 displays the weighted distribution (using total product revenue) of the promotion frequency across products (see Table 3 for key summary statistics). The average promotion frequency for the median product is 0.147, implying that the product is promoted about once in 6.8 weeks. The promotion frequency varies strongly across products, ranging from 0.011 (once in 91 weeks) at the 5th percentile to 0.370 (once in 2.7 weeks) at the 95th percentile level. The left panel in the middle row of Fig. 4 shows the corresponding differences between the 95th and the 5th percentile of πjs across stores s. For the median product this difference is 0.314, compared to the median average promotion frequency of 0.147. Hence, there are large differences in the promotion frequency of a given product across stores.

Fig. 4
figure 4

Promotion depth, frequency, and differences in promotion depth and frequency across stores. Note: The top-left panel displays the distribution of the percentage price discounts, δjst, pooled across all products j, stores s, and weeks t when the price is strictly less than the base price, pjst < bjst. The middle and bottom-left panels display the weighted distribution of promotion frequency and promotion depth across products, j. Here, promotion depth is measured conditional on the product being promoted at a 5 percent promotion threshold. The middle and bottom-right panels summarize across-store differences in promotion frequency and promotion depth for all products. In particular, for each product j the differences are based on the 95th and 5th percentile of promotion frequency and promotion depth across all stores where the product is sold

Table 3 Price promotion statistics

5.2.2 Promotion depth

The promotion depth for product j in store s, i.e. the average promotional discount across all weeks when the product was promoted, is given by

$$ \delta_{js}=\frac{1}{N_{js}^{D}}\sum\limits_{t\in\mathcal{T}_{js},D_{jst}=1}\delta_{jst}, $$

where \(N_{js}^{D}\) is the number of promotion events. The product-level promotion depth statistic, δj, is the weighted average of δjs across all stores using \(N_{js}^{D}\) as weights. As for the promotion frequency, we measure the heterogeneity in the promotion depth across stores using the difference between the 95th and the 5th percentile of δjs across stores (the distribution of δjs is weighted using \(N_{js}^{D}\)).

The bottom row of Fig. 4 shows the distributions of the average promotion depth, δj, and the across-store heterogeneity in δjs. The promotion depth for the median product is 19.5%, and the distribution across products ranges from 10.2% at the 5th percentile to 31.5% at the 95th percentile. As for the promotion frequency, there is a much heterogeneity in the promotion depth across stores.

5.3 Volume sold on promotion and promotion multipliers

We measure the “importance” of promotions using the percentage of product volume that is sold during a promotional period. We also document the ratio of the average product volume sold during a promotion relative to the average product volume when the product was not promoted. In the retail industry and in brand management, this ratio is called a lift factor or promotion multiplier.

The top left panel in Fig. 5 displays the weighted distribution of the percentage volume sold on promotion across products,Footnote 13 and Table 3 provide detailed summary statistics. The median percentage of volume sold on promotion is 28.7%, and ranges from 1.8% to 61.4% at the 5th and 95th percentiles. The bottom left panel displays the corresponding distribution of the promotion multipliers, with a median of 3.04 and a range from 1.34 to 9.22. Hence, as expected, the volume sold on promotion is disproportionately high (relative to the overall incidence of promotions), and units sales spike relative to the non-promoted volume when a product is promoted. The right panels in Fig. 5 reveal a large degree of heterogeneity in these two statistics across stores for most products.

Fig. 5
figure 5

Promotion volume and multiplier, and differences in volume and multipliers across stores

6 Price variance decomposition

We documented both a substantial degree of base price dispersion and that many products are frequently promoted. To quantify the contribution of these factors to the overall price dispersion, we decompose the price variance of a product into components that capture (i) price differences across markets, (ii) persistent price or base price differences across stores within markets, (iii) within-store price or base price variation over time, and (iv) price variation due to promotions.

We perform the variance decomposition separately for each product (UPC or brand). To simplify the notation, we drop the product subscript j. \({\mathscr{M}}\) is the set of all markets, and \(\mathcal {S}\) is the set of all stores. For each store s we observe prices in periods \(t\in \mathcal {T}_{s}.\) Ns is the number of observations for stores s, Nm is the total number of observations (across stores and time periods) in market m, and N is the total number of observations across all markets. \(\bar {p}\) is the overall (national) average price, \(\bar {p}_{m}\) is the average price in market m, and \(\bar {p}_{s}\) is the average price in store s. The overall price variance of a product is

$$ \text{var}(p_{st})=\frac{1}{N}\sum\limits_{s\in\mathcal{S}}\sum\limits_{t\in\mathcal{T}_{s}}(p_{st}-\bar{p})^{2}. $$

6.1 Basic decomposition

The basic decomposition quantifies the contribution of price variation across markets, within-market price variation across stores, and the variation of prices within stores:Footnote 14

$$ \begin{array}{llll} \text{var}(p_{st}) & =\text{var}(\bar{p}_{m}) & & \text{(across-market)}\\ & + \frac{1}{N}{\sum}_{m\in\mathcal{M}}N_{m}\text{var}(\bar{p}_{s}|m) & & \text{(across-store)}\\ & + \frac{1}{N}{\sum}_{s\in\mathcal{S}}N_{s}\text{var}(p_{st}|s). & & \text{(within-store)} \end{array} $$
(1)

\(\text {var}(\bar {p}_{m})\) is the variance of average market-level prices across markets, \(\text {var}(\bar {p}_{s}|m)\) is the within-market variance of average store-level prices, and var(pst|s) is the within-store variance of prices over time.Footnote 15

We report the revenue-weighted mean of the variance components in Table 4 using the 2010 data. Price-level differences across markets (ZIP+ 3 areas) account for 32.7% of the overall price variance for UPCs and 29.7% for brands. Hence, more than two-thirds of the national price variance is due to price variation within markets. Price-level differences across stores within markets account for 27% of the overall variance for UPCs and 42.3% for brands. Furthermore, the contribution of within-store price variation is 40.3% for UPCs and 28% for brands.

Table 4 Price variance decomposition

6.2 Decomposition into base prices and promotions

We now provide a more detailed decomposition that distinguishes between the contribution of base price differences and price promotions to the overall price variance

$$ \begin{array}{llll} \text{var}(p_{st}) & =\text{var}(\bar{p}_{m}) & & \text{(across-market)}\\ & + \frac{1}{N}{\sum}_{m\in\mathcal{M}}N_{m}\text{var}(\bar{b}_{s}|m) & & \text{(across-store base price var.)}\\ & + \frac{1}{N}{\sum}_{s\in\mathcal{S}}N_{s}\text{var}(b_{st}|s) & & \text{(within-store base price var.)}\\ & + \frac{1}{N}{\sum}_{m\in\mathcal{M}}N_{m}\text{var}(b_{st}-p_{st}|m) & & \text{(promotional discount var.)}\\ & - 2\frac{1}{N}{\sum}_{m\in\mathcal{M}}N_{m}\text{cov}(b_{st}-p_{st},b_{st}|m) & & \text{(EDLP vs. Hi-Lo adjustment)} \end{array} $$
(2)

The first component captures the variance of price levels across markets. The second component is the within-market variance of base prices, indicating persistent base price differences across stores, whereas the third component is the within-store variance in base prices over time.

The last two terms in Eq. 2 capture the contribution of price promotions to the overall price variance. The fourth term is the variance of promotional price discounts, bstpst, which is zero in the absence of price promotions. The last term is negative if the weighted average of the covariances between promotional price discounts and base prices is positive. A positive correlation indicates an EDLP (everyday low price) vs. Hi-Lo pricing pattern at the product level: Stores with above average base prices offer larger promotional price discounts than stores with below average base prices. Correspondingly, we call the last term in the decomposition (2) the “EDLP vs. Hi-Lo adjustment.” In the absence of an EDLP vs. Hi-Lo pricing pattern, price promotions increase the overall price variance. However, if there is EDLP vs. Hi-Lo pricing, the adjustment term will be negative and the overall variance in prices will be reduced. Intuitively, EDLP vs. Hi-Lo pricing compresses the price distribution and thus reduces the variance in prices. In Appendix D.3 we present an example that shows that price promotions may even decrease the overall price variance in the presence of an EDLP vs. Hi-Lo pricing pattern.

The results in Table 4 show that the within-market variance of average store-level base prices accounts for 31.3% of the overall price variance for UPCs and for 49.9% of the price variance for brands. The within-store base price variation over the course of a year accounts for a much smaller percentage of the overall price variance, 12.3% for UPCs and 13.4% for brands. Hence, store-level base prices are fairly persistent over the course of a year.

Of particular interest if the role of price promotions. The promotional price discount component in Eq. 2 is large and positive—36.0% for UPCs and 29.9% for brands. However, the EDLP vs. Hi-Lo adjustment term is negative, -12.3% for UPCs and -22.9% for brands. Hence, there is strong evidence for EDLP vs. Hi-Lo pricing at the product-level, which reduces the contribution of price promotions to the overall price variance. Table 4 also shows the total contribution of promotions to the overall price variance, 23.7% for UPCs and 7.0% for brands.Footnote 16

The last two columns in Table 4 express the contributions of the across-store and within-store base price variances and the contribution of price promotions as a percentage of the within-marke t price variance. The results highlight the importance of persistent base price differences across stores (46.5% for UPCs and 70.9% for brands) relative to the total contribution of price promotions (35.2% for UPCs and 10% for brands).

More detailed statistics, including the key percentiles of the price variance components, are provided in Table 9 in the Appendix. In particular, Table 9 reports the percentage of products for which price promotions decrease the overall price variance: 3.6% for UPCs and 30.8% for brands.

A key result from the price variance decomposition is that the large differences in base (non-promoted) prices across stores in a local market persist over time. This finding implies the presence of potentially large consumer search costs (e.g. Sorensen, 2000, or Honka et al., 2019 for a general survey) or rational inattention (e.g. Joo, 2020).

7 Price dispersion at the market and retail chain level

The analysis so far has shown that some of the overall, national price variance is due to market-specific factors. We now extend this analysis and compare how much of the overall variance in prices and differences in promotion tactics across stores can be attributed to market versus retail chain-specific factors.

To measure the percentage of the price variance that can be explained by market or chain factors, we regress the price of a UPC, pjst, on indicator variables for (i) all markets (3-digit ZIP codes), (ii) all retail chains, or (iii) all market/retail chain combinations. Specifically, we estimate three separate regressions corresponding to one of the three sets of indicator variables. The regressions are performed separately for each product j and week t in 2010, and we take the average over all weeks to obtain a single R2 value for each product.

The revenue-weighted distribution of the R2 values is shown in the top row of Fig. 6 (Table 10 in the Appendix contains detailed numbers). For the median product, 46.5% of the overall price variance is explained by local market factors.Footnote 17 Chain-specific factors explain 69.9% of the price variance for the median product, and 88.1% of the price variance is explained by market/chain factors. Hence, prices are substantially more homogenous within the 81 different retail chains than within the 840 different 3-digit ZIP code areas in our data, and prices are particularly homogenous in retail chains at the local market level. To further illustrate the large difference in price homogeneity at the market versus market/chain level, note that for 90% of products the R2 is at least 26.6% based on market factors and at least 70.7% based on market/chain factors.

Fig. 6
figure 6

Percentage of variance of prices, promotion frequency, and promotion depth explained by market and chain factors

We perform a similar analysis for the store-level promotion frequency and promotion depth, πjs and δjs. The results are shown in the middle and bottom rows of Fig. 6. The results mirror the previous findings. For the median product, 36.1% of the variation in the promotion frequency across stores is explained by market factors, 62.3% is explained by retail chain factors, and 80.0% is explained by market/chain factors. The corresponding findings for promotion depth are similar.

We visualize the similarity of prices at the retail chain and market level for the case of Tide HE Liquid Laundry Detergent (100 oz) in Figs. 7 and 8. The figures display a two-dimensional representation of the store-level time series of prices using the projections onto the first two principal components.Footnote 18 Figure 7 is split into six panels that contain identical gray dots representing all projected store-level prices. In each of the panels some of the dots are colored according to the retail chain that the prices belong to. All stores that belong to the same retail chain appear in exactly one panel.Footnote 19 Figure 7 shows that the projected price vectors that belong to the same retail chain cluster and exhibit much less variance compared to the overall variance in projected prices. In Fig. 8 we color the projected store-level price vectors according to the market (DMA) that a store belongs to.Footnote 20 There is some clustering of prices also at the market level, but the similarity of prices within a market appears much smaller than the similarity of prices within a retail chain.

Fig. 7
figure 7

Projected store-level price vectors colored by retail chain

Fig. 8
figure 8

Projected store-level price vectors colored by DMA

8 Chain-level price similarity and promotion coordination

We now provide a more detailed analysis of the similarity in product pricing and promotions at the retail chain level.

8.1 Price dispersion at the retail chain level: Summary statistics

We quantify the chain-level price dispersion using the approach used to measure price dispersion at the national and market level in Section 4. Table 5 shows the results for UPCs. The overall log-price standard deviation at the chain level is 0.079 for the median product. At the market level, the chain-level price dispersion, 0.039, is substantially smaller, both compared to the overall chain-level price dispersion and the market-level price dispersion, which is 0.099 at the 3-digit ZIP code level. The smaller degree of chain-level price dispersion at the market level compared to the national level is evidence for zone pricing (Adams & Williams, 2019).

Table 5 UPC price and base price dispersion statistics: Market vs. retail chain-level

We discussed in Section 5.2 that the discrepancy between a retailer’s promotion calendar and the Nielsen RMS week definition creates measurement error that may exaggerate the variance in chain-level prices. However, this measurement error predominantly affects promoted prices, not base prices. For base prices, the log-price standard deviation is 0.078 at the chain level versus 0.027 at the chain/market (3-digit ZIP code) level. Hence, even if we account for measurement error, the local chain-level prices are not exactly identical, but similar across stores.Footnote 21

8.2 Promotion coordination

Next, we focus on promotions. In particular, we examine if price promotions are coordinated, in the sense that the same product is systematically promoted at the same time among the stores in a retail chain or in a market.

Appendix F provides an overview of promotion coordination, and analyzes the distribution of chain/market level promotion percentages, i.e. the fraction of local stores in a retail chain that simultaneously promote a product.

Here, we focus on a more formal statistical analysis to estimate the dependence in the incidence of promotions across stores. The promotion incidence is captured using the indicator Djst ∈{0,1}, where Djst = 1 if product j is promoted in store s in week t. For store s in market m we define the inside promotion percentage, the mean promotion incidence in week t among all other stores in the local retain chain except s:

$$ I_{jst}=\frac{1}{|\mathcal{S}_{sm}|}\sum\limits_{r\in\mathcal{S}_{sm},}D_{jrt}. $$

\(\mathcal {S}_{sm}\) is the set of all stores in market m that belong to the same retail chain as store s, not including store s itself.Footnote 22 Vice versa, the outside promotion percentage is the mean promotion incidence in week t among all stores in other local retain chains:

$$ O_{jst}=\frac{1}{|\mathcal{\bar{S}}_{sm}|}\sum\limits_{r\in\mathcal{\bar{S}}_{sm}}D_{jrt}. $$

\(\mathcal {\bar {S}}_{sm}\) is the set of all stores in market m that belong to a retail chain other than the chain that store s belongs to. We define markets as DMAs to ensure that the local retail chains have sufficiently many stores. In the extreme case when a chain had only one local store, the inside promotion percentage would not be defined.

To test for promotion coordination, we estimate the statistical association between the promotion indicator Djst and the inside and outside promotion percentages, xjst = (Ijst, Ojst), separately for each product j and store s:

$$ \mathbb{E}[D_{jst}|x_{jst}]=\Pr\{D_{jst}|x_{jst}\}=\alpha_{js}+{\upbeta}_{js}I_{jst}+\gamma_{js}O_{jst}. $$
(3)

If the promotions in store s are set independently of the same-chain and other-chain promotions in market m, then βjs = γjs = 0.

The estimates are summarized in Table 6 (Fig. 23 displays corresponding histograms).Footnote 23 We provide the results separately for the full model (3) and for a restricted version that only includes Ojst as independent variable. We first focus on the DMA-level results for the full model in the top panel. The median across all inside percentage coefficients, βjs, is 1.005, and 97.7% of the estimates are positive. Furthermore, we reject the null hypothesis that the inside percentage coefficient is not positive, βjs ≤ 0, for 95.8% of all estimates, and 52.8% of the estimates are not statistically different from 1 at the 5% level. These results provide clear evidence that the price promotions for most products are coordinated across stores within a retail chain. On the other hand, the median of the outside percentage coefficients, γjs, is -0.001, and 89.7% of the estimates are not statistically different from 0. Hence, conditional on the inside promotion percentage, Ijst, information on the contemporaneous promotion incidence in other retail chains in the local market is typically not predictive of Djst. The estimates of the intercept are small and mostly not distinguishable from 0, indicating that the promotion probability in a store is 0 if none of the other stores in the chain promote the product. This is further evidence that promotions are coordinated at the local chain level.

Table 6 Promotion coordination: Promotion dependence regressions

To investigate if promotions are also unconditionally independent of promotions in other retail chains, we estimate a restricted model that only includes the outside percentage as independent variable. The median of the coefficient estimates is 0.082, and the distribution of the estimates is skewed to the right. Hence, there is evidence that for some products promotions are unconditionally dependent on the promotion incidence in other local retail chains. This dependence is likely due to promotional allowances—trade deals that are offered by the product manufacturers to multiple or all retail chains. Another explanation is seasonality in demand, although seasonality is unlikely to account for the large documented degree in promotion coordination.

We also test if promotions are coordinated across markets at the national level. We define the national inside promotion percentage, the mean promotion incidence in week t among all other stores of the chain in different markets:

$$ I_{jst}^{\prime}=\frac{1}{|\mathcal{S}_{s,-m}|}\sum\limits_{r\in\mathcal{S}_{s,-m}}D_{jrt}. $$

\(\mathcal {S}_{s,-m}\) includes all stores in the same chain at the national level, excluding the market that store s belongs to. We similarly define the national outside promotion percentage, \(O_{jst}^{\prime }\), the mean promotion incidence in other chains outside the market. The estimates in Table 6 show that promotions are also strongly coordinated at the national level. The median of the national inside promotion percentage estimates is 1.017, and 97.2% of the estimates are positive. Further, mirroring the market-level results, promotions in store s are conditionally independent of the promotions in other retails chains outside the market. However, in the restricted regression, the median of the national outside percentage coefficients is 0.363. Thus, the national-level estimates provide stronger evidence for unconditional promotion dependence across retail chains than the market-level results.

8.3 Is price discrimination constrained by feature advertising?

Retailers use feature advertising to inform households of specific products and their prices. Feature ads are distributed in the form of circulars (print or digital) or newspaper inserts. Because feature ads typically apply to all stores in a market, they constrain the degree of price discrimination that is feasible for a retail chain. Hence, we analyze if the similarity of prices at the chain-level is due to feature advertising.

Our analysis relies on the fact that features are typically used to advertise promoted prices but not base prices. We are able to measure the association between feature ads and promotions because data on feature advertising are available in the Nielsen RMS scanner data for a sub-sample of 17% of stores.Footnote 24 We find that in weeks when a product is not promoted, the probability that a product is featured is only 0.03. Hence, while feature advertising may constrain price discrimination in promoted prices, it cannot constrain base prices.

We repeat the analysis in Section 7, where we measure the percentage of the price variance that can be attributed to market or retail chain-specific factors, for base prices only. If price similarity were due to feature advertising, chain and market/chain factors should explain a smaller percentage of the base price variance compared to the overall price variance. Figure 9 shows the revenue-weighted distributions of the R2 values from regressions of base prices on market (3-digit ZIP code), chain, and market/chain indicators. For comparison, we also show the corresponding distributions discussed in Section 7, where we do not differentiate between promoted and base prices. For the median product, the R2 values are somewhat larger for the base price regressions. For example, the median R2 for the base price regressions using market/chain indicators is 90.6%, compared to 88.1% for the analogous price regressions.

Fig. 9
figure 9

Percentage of variance of prices and base prices explained by market and chain factors

The results indicate that the degree of price similarity at the chain and market/chain level is somewhat higher for base prices, which are typically not featured. We conclude that the observed price similarity is not primarily due to feature advertising as an institutional constraint on price discrimination.

We also compared the degree of price dispersion across weeks with and without feature advertising. This analysis is inconclusive, however, because there is more measurement error for promoted prices than for base prices (Section 5.2), which introduces more artificial price dispersion in featured than in non-featured weeks.

8.4 Discussion

The similarity in prices and promotion strategies in our data differs from the heterogeneity in pricing strategies within the same retail chain that is documented in Ellickson and Misra (2008). The analysis in Ellickson and Misra (2008) uses data from the 1998 Trade Dimensions Supermarkets Plus Database, which provides information on store-level pricing strategies based on surveys of retail chain managers. Hence, the data are not directly comparable and cover a different time period than our work. In particular, the managers surveyed in the Trade Dimensions data classify store-level pricing policies as EDLP (everyday low price), promotional/Hi-Lo, or as a hybrid of EDLP and Hi-Lo. These qualitative responses may be consistent with the residual variation in pricing and promotion policies after accounting for market/chain dummies as shown in Fig. 6.

Also related to our work, Arcidiacono et al. (2020) find that the price similarity pattern remains unchanged even after the entry of a strong competitor, a Walmart Supercenter, in the local market.Footnote 25

9 Does demand similarity explain price similarity?

As shown in the previous two sections, prices are more similar at the chain level than at the market level, and prices are especially similar within chains at the local market level. Furthermore, promotions are highly coordinated across stores that belong to the same retail chain.

Without context, the economic implications of these findings are hard to assess. In particular, if there is a loss in profits because price discrimination is not employed depends on whether price elasticities and promotion effects differ significantly across the stores of a retailer. To provide this necessary context, we estimate store-level demand for the top 2,000 brands in our data, and we compare the similarity in prices and promotions to the corresponding similarity in price elasticities and promotion effects.

9.1 Demand model

We estimate demand for each product at the store level. We define products as brands. Estimating demand at the UPC level is difficult because of the large number of UPCs in most categories and because stores carry different assortments of UPCs under the same brand name. Hence, demand estimation in industrial organization and marketing is typically performed at the brand level (e.g. Hoch et la., 1995 and Nevo, 2001).

We use a log-linear demand model for brand j in store s and week t:

$$ \log(1+q_{jst})=\alpha_{js}+\sum\limits_{k\in\mathcal{J}_{js}}{\upbeta}_{jks}\log(p_{kst})+\sum\limits_{k\in\mathcal{J}_{js}}\gamma_{jks}D_{kst}+\tau_{j}(s,t)+\epsilon_{jst}. $$
(4)

We add 1 to the sales quantity, qjst, to ensure that the demand model is valid for the substantial number of observations with 0 sales in the data. pkst is the price of brand \(k\in \mathcal {J}_{js}\), where \(\mathcal {J}_{js}\) is a set of products in store s that are in the same category that brand j belongs to, including j itself. Dkst is a promotion indicator. The demand model includes brand-store fixed effects, αjs, and time fixed effects, τj(s, t). τj(s, t) is identical for all stores in a local market, defined based on the 3-digit ZIP code. Our main estimates are obtained with τj(s, t) defined as month fixed effects.Footnote 26

Using the demand model (4), the predicted price elasticity for brand j with respect to the price of product k is given by

$$ e_{jk}=\frac{\partial q_{jst}}{\partial p_{kst}}\frac{p_{kst}}{q_{jst}}={\upbeta}_{jks}\frac{q_{jst}+1}{q_{jst}}. $$

The price coefficient βjks approximates the price elasticity ejk, and for simplicity we will refer to βjks as a price elasticity from now on.

9.2 Sample selection

We estimate demand for the top 2,000 brands (based on total revenue) during the 2008-2010 period. We focus on these large brands to avoid measurement error in prices. This measurement error occurs because the price of a small brand frequently needs to be imputed for weeks with no sales, qjst = 0. Additionally, to avoid measurement error, for each brand we only estimate demand for stores where prices are observed in at least 80% of weeks. In many product categories it is not feasible to include the prices and promotions of all competing brands in the demand model. Hence, we only include the brands that account for at least 80% of the category revenue, with a maximum of 5. In total, we estimate 27.2 million brand-store demand models.

9.3 Estimation results

9.3.1 Main results

The distribution of the own-price elasticities, pooled across all brand-store estimates, is shown in the top panel of Fig. 10 (see Table 12 in the Appendix for detailed summary statistics).Footnote 27 The estimates are obtained using local (3-digit ZIP code) month fixed effects. We color estimates that are not statistically different from 0 at the 5% level in gray and all other estimates in blue.

Fig. 10
figure 10

Own-price elasticity estimates

The median of the brand-store price elasticity estimates is -1.93. There is a large degree of heterogeneity, with the estimates ranging from -6.647 at the 5th percentile to 2.025 at the 95th percentile of the distribution. 85.6% of the own-price elasticity estimates are negative, but only a small percentage, 4.1%, of the estimates is positive and statistically different from 0. That many parameters are not precisely estimated or have the “wrong” sign is expected because the estimates are brand/store-specific and obtained using observations for at most 156 weeks during the 2008-2010 period. 70.6% of the estimates indicate elastic demand, βjjs < − 1.

The top panels in Fig. 11 display the distributions of the estimated cross-price elasticities with respect to the two largest competitors in the product category. The medians of the cross-price elasticities are positive, but a large percentage of the estimates, 42.8% for the largest and 44.7% for the second largest competitor, is negative. The majority of the estimates is not statistically different from zero. This evidence indicates that it is particularly challenging to obtain precise cross-price elasticity estimates at the brand/store level using a 156 time series of weekly observations. To improve on these estimates we would have to impose parameter restrictions or estimate a demand model that relies on a smaller number of parameters, such as a logit or random coefficients logit demand system (Berry et al., 1994, 1995).

Fig. 11
figure 11

Cross-price and promotion effect estimates

The own-promotion effect estimates are shown in the bottom panel of Fig. 11. Among these estimates a larger percentage have the expected sign: 77.5% of all estimates and 94.8% of the estimates that are statistically different from zero are positive.

9.3.2 Bayesian hierarchical model results

As an alternative to the OLS estimates we use a Bayesian hierarchical model to obtain the posterior distribution of the demand parameters for each brand and store, 𝜃js = (αjsjs, γjs).Footnote 28 In the Bayesian hierarchical model specification, 𝜃js is assumed to be distributed according to the population distribution or first-stage prior p(𝜃), which we assume to be normal, \(N(\bar {\theta }_{j},V_{j})\). The posterior distribution of the store-level and population parameters is obtained using MCMC sampling. We obtain store-level estimates of the demand parameters based on the posterior means of 𝜃js. A detailed summary of the model specification and sampling approach is provided in Appendix G.

There are two reasons that motivate us to provide these additional estimates. First, the posterior means in the Bayesian hierarchical model are shrinkage estimators by which imprecise store-level estimates are shrunk to the population mean. This shrinkage property provides a form of regularization to guard against noisy parameter estimates, which is particularly important given the goal to obtain a large number of brand/store-level demand estimates. Second, Bayesian hierarchical models are widely used in the industry by in-house analysts and analytics companies that provide demand estimates for retail chains and brand manufacturers. Hence, the pricing and promotion decisions in our data are often made using estimates from a Bayesian hierarchical model.

The bottom panel in Fig. 10 displays the distribution of the own-price elasticity estimates, i.e. posterior means, from the Bayesian hierarchical model. The median is almost identical to the median of the OLS estimates. However, the distribution of the Bayesian hierarchical model estimates has thinner tails than the distribution of the OLS estimates, which is expected due to the shrinkage property of the Bayesian hierarchical model (see Table 12 for detailed results). Also, the percentage of negative elasticities is larger for the Bayesian hierarchical model compared to the OLS estimates, 90.3% versus 85.6%, and 74.9% versus 70.6% of the OLS estimates indicate elastic demand. In this sense, the Bayesian hierarchical model estimates conform more to expectations, although the overall difference with respect to the OLS estimates is only moderate.

9.3.3 Causal price and promotion effects?

In the presence of endogeneity or confounding the price and promotion coefficients do not have a causal interpretation. Our strategy to adjust for confounding relies on the store and market/time fixed effects that are included in the demand model. If τj(s, t) captures all time-varying demand components that are associated with the prices, and if the residual variation in prices, conditional on the fixed effects, is due to factors such as costs that do not directly affect demand, then we can interpret the estimated price and promotion coefficients as causal. This strategy is discussed further in Appendix H, and we show that the coefficient estimates are largely insensitive to the choice of less granular (quarterly) or more granular (weekly) time fixed effects. The robustness to the exact choice of the fixed effects suggests that confounding is of little concern, but we cannot conclusively rule out some remaining endogeneity.

We could have alternatively pursued an instrumental variables strategy (e.g. Berry et al., 1994, 1995). However, for the case of retail pricing, good instruments are hard to find and often weak, as emphasized by Rossi (2014). Ultimately, if we should use instruments or not is besides the point, for an important reason that we explain next.

In particular, instrumental variables strategies have not been employed widely in the industry for the purpose of price and promotion effect estimation. Hence, the estimates that we obtain are similar to the estimates available to sophisticated retailers and brand manufacturers. In our empirical analysis below we investigate if the price elasticities and promotion effects that are available to managers are similar or discernibly different across stores at the local retail chain level. For this purpose, what matters is an analysis of the estimates that managers use to set prices and promotions, whereas any potential bias due to confounding is of little relevance.

9.4 Similarity of price and promotion effects at the market and retail chain level

In Section 7 we measured the percentage of the variance in prices, promotion frequency, and promotion depth at the market and retail chain level. Analogous to this analysis, we now document what percentage of the variance of the own-price and promotion effects can be attributed to market (3-digit ZIP code), retail chain, and market/retail chain factors.

Figure 12 displays the distribution of the revenue-weighted R2 values from regressions of \(\hat {\upbeta }_{jjs}\) and \(\hat {\gamma }_{jjs}\) on market, chain, and market/chain indicators (for detailed results see Table 13 in the Appendix).Footnote 29 The top row shows the results for the OLS price elasticity estimates. For the median product, market factors explain 14.6% of the price elasticity variance across stores. Retail chain factors explain 17.2%, and in particular, market/chain factors explain almost half, 47.3% of the variance across stores. The results indicate that market factors account for a modest percentage of the differences in own-price elasticities across stores. However, within local markets, the elasticities are much more similar for stores that belong to the same retail chain.

Fig. 12
figure 12

Percentage of variance of own-price elasticities and own-promotion effects explained by market and chain factors

Although similar, the price elasticities are not identical at the local retail chain level—slightly more than half of the variation is not captured by market/chain fixed effects. This remaining variation in the elasticities might represent an unexploited opportunity to price discriminate across stores, or it may simply reflect measurement error in the estimated own-price elasticities. Hence, we compare the R2 values for the OLS own-price elasticity estimates to the corresponding R2 values for the posterior means of the elasticities from the Bayesian hierarchical demand model, which reduces measurement error by design. The results in the second row of Fig. 12 are consistent with less measurement error in the estimates from the Bayesian hierarchical demand model, as the R2 values are uniformly higher. Based on these alternative estimates, local market factors explain 20.7% of the overall variance, whereas chain factors explain 23.5% and market/chain factors explain more than half, 52.3% of the variation in own-price elasticities.

We perform an analogous analysis for the estimated own-promotion effects. The results are shown in the bottom two rows of Fig. 12. Based on the estimates using the Bayesian hierarchical model, market factors explain 19.3%, chain factors explain 30.5%, and market/chain factors explain 56.4% of the variance in the estimated promotion effects. Hence, compared to the results for the own-price elasticities, chain factors explain a larger percentage of the variation in the estimated promotion effects.

10 Can retailers distinguish among store-level price elasticities and promotion effects?

The documented chain-level similarity in own-price and promotion effects raises the question if retail managers are able to distinguish among the corresponding estimates that are available to them. Hence, unlike in the previous section, we now focus on the estimates used by retail managers who employ marketing analytics to make price and promotion decisions. In the industry, Bayesian hierarchical models of demand that are similar to our specification have been used since the early 2000s.Footnote 30 To emulate the empirical analysis available to a sophisticated retailer, we now estimate Bayesian hierarchical demand models separately for each retailer. Using only data for their own stores is a practical necessity for retailers, because scanner data for their competitors are typically not available to them. Different from the prior analysis, the population distribution (first-stage prior) of the demand parameters is now specified at the chain level, not at the national level. Hence, the store-level estimates will be shrunk to the chain-level mean, and the price and promotion effects that are visible to the managers will likely appear more similar compared to the previous estimates that were obtained using a national first-stage prior.

10.1 Similarity of price and promotion effects visible to managers

We first replicate the analysis in Section 9.4 using the store-level own-price elasticities and promotion effect estimates that are available to retailers, and document the percentage of the variance of the estimates that can be explained by market, retail chain, and market/chain factors. The results are shown in Fig. 13, with the estimates using a national first-stage prior displayed in the first row and the results using chain-level first-stage priors in the second row of each panel.

Fig. 13
figure 13

Percentage of variance of own-price elasticities and own-promotion effects explained by market and chain factors: Chain level vs. national priors

The variation in price elasticities and promotion effects explained by market factors is almost identical across the two specifications. However, chain and market/chain factors explain a substantially larger fraction of the variation in the estimates that are obtained using chain-level first-stage priors. Whereas 23.5% and 52.3% of the variation in price elasticities is explained by chain and market/chain factors when using national first-stage priors, 51.0% and 70.5% of the variation is explained by these factors when using using chain-level first-stage priors. Similarly, focusing on the promotion effects, the variation explained across the specifications is 30.5% versus 56.0% for chain factors and 56.4% versus 73.5% for market/chain factors.

10.2 Statistical distinguishability of the store-level estimates

The results in the previous sub-section indicate a large degree of similarity in the price and promotion effect estimates that are visible to sophisticated retailers who make data-driven pricing and promotion decisions. The similarity is especially stark considering that the store-level price and promotion effects are obtained using at most 156 weekly observations and hence likely affected by substantial sampling error. Hence, although the posterior means of the price and promotion effects are different, managers may consider price discrimination across stores to be impractical or infeasible due to the statistical uncertainty around the estimates.

To assess the statistical distinguishability of the estimates we again emulate the demand analysis that sophisticated retailers use in practice, and we construct credible intervals for the store-level price elasticity and promotion effect estimates. For each brand/market/retail chain combination, we then record the percentage of store-level price and promotion effects with credible intervals that exclude the mean of the local chain-level estimates. This approach to assess the difference in the estimated effects is not exactly identical to a frequentist hypothesis test, but plausibly corresponds to how differences between price and promotion affects are assessed in the industry practice.

Figure 14 displays histogram of the percentages of price and promotion effects that can be distinguished from the local chain-level mean. The observations in the distributions are at the brand/market/retail chain level. The results are shown separately using 95%, 90%, and 80% percent credible intervals. In the top row of each panel we show the distributions for all brands, whereas the histograms in the bottom row show the distributions for the 250 largest brands.Footnote 31

Fig. 14
figure 14

Percentage of estimated price elasticities and promotion effects different from market/retail chain level mean

Focusing on the results for all brands, the median percentages of statistically distinguishable price elasticities are 7.5%, 12.4%, and 21.2% when using 95%, 90%, and 80% credible intervals. For the 250 largest brands, the respective median percentages are 11.6%, 17.7%, and 28.4%. The somewhat larger fraction of statistically distinguishable price effects reflects the higher precision of the estimates for the largest products. However, even for the largest brands, retail managers who conduct a similar analysis will find that between 71.6% and 88.4% of all price elasticity estimates are not distinguishable from the chain/market-level average. A similar pattern holds for the estimated promotion effects, shown in the bottom panel of Fig. 14.

Note that the Bayesian hierarchical model estimates were obtained after residualizing the data to account for the common time/market fixed effects (Appendix G). Our credible intervals do not account for the uncertainty in the fixed effect estimates, and hence they understate the uncertainty about the price and promotion effects. Therefore, our numbers should be interpreted as upper bounds on the true statistical distinguishability of the store-level estimates.

11 Discussion: Explanations for price similarity

The analysis in the previous section provides one explanation for the observed lack of local price discrimination by retailers. In particular, the lack in precision and the corresponding difficulty to distinguish between store-level price and promotion effects may lead managers to conclude that price discrimination across stores is infeasible in practice. This explanation also conforms with the anecdotal evidence that we gathered in informal discussions with retail chain managers.

A related explanation is that cross-price elasticities are hard to estimate. Both valid own and cross-price elasticity estimates are required to predict and optimize category profits. However, as discussed in Section 9.3.1, the brand/store cross-price elasticity estimates in our data are imprecise, which may be expected given the limited sample size of at most 156 weekly observations for each brand/store. Our data, sample size, and estimation approach are similar to what is used in the industry.Footnote 32 Hence, it is unlikely that managers have access to better estimates. Further, pricing is part of what is called category management in the industryFootnote 33, and managers are aware that price changes for one product may also affect demand for other products in a category (or even lead to substitution across categories or stores). Hence, given the two related challenges to distinguish among price elasticities across stores and to obtain precise cross-price elasticities, retail managers may plausibly consider price discrimination across stores as infeasible.

We do not attempt to predict optimal, profit-maximizing prices in this paper, largely because of the challenges discussed above when estimating store-level price elasticities. Closely related to our work, DellaVigna and Gentzkow (2019) also document price similarity at the retail chain level. They estimate log-linear demand models that are similar to our specification using UPCs as product definition. Using the elasticity estimates they predict optimal product-level prices separately for each store within a chain. The corresponding predicted annual profit loss at the observed prices compared to optimal pricing for all products is $239,000 (1.79% of revenue) for the median store in their sample. To predict the optimal prices, DellaVigna and Gentzkow (2019) maximize profits for each UPC separately only with respect to its own price. In particular, competitor prices are not included in the estimated demand models. Thus, by abstracting away from substitution to other UPCs sold under the same brand name or to other brands in the category, they avoid the problems that we discussed above. DellaVigna and Gentzkow (2019) propose that managerial inertia, including agency frictions and behavioral factors, are the reason for price similarity and the deviation of actual prices from their prediction of optimal prices. This hypothesis is different from our proposed explanation for price similarity, which neither requires agency issues nor behavioral factors, although the two alternative explanations are not mutually exclusive.

We ruled out that price similarity is due to feature advertising (Section 8.3), in particular because not just promoted prices but also base prices exhibit a high degree of price similarity. However, consumers may prefer retailers with predictable, consistent pricing patterns over chains that sell products at different price points across stores, and hence the increase in profits from store-level price discrimination may be offset by the loss of consumers who substitute to competing retail chains. This aversion of consumers to price discrimination may be due to fairness concerns, as proposed by Kahneman et al. (1986). We are unable to test this explanation with our data. However, fairness concerns are consistent with the findings in Ater and Rigbi (2020), who provide evidence that a food price transparency regulation in Israel caused an increase in price similarity in grocery chains. As corroborating evidence, Ater and Rigbi (2020) argue that fairness concerns were an important part of the public debate that resulted in the legislation that included the price transparency regulation.

Adams and Williams (2019) and Dobson and Waterson (2005) argue that uniform pricing may soften competition and increase retailer profits. To achieve uniform pricing as the outcome in the Dobson and Waterson (2005) model, one competitor needs to be able to pre-commit to uniform pricing. In an empirical analysis of the home-improvement industry, Adams and Williams (2019) find that one retailer in the home-improvement industry would achieve higher profits if both the retailer and its main competitor used uniform pricing. Even with pre-commitment, however, uniform pricing is not an equilibrium outcome in the empirical example.

12 Conclusions

We analyze patterns in retail (grocery) price and promotion strategies and in the relationship between pricing and promotion strategies and product demand. We emphasize generalizable results that are based on a large sample of products, representing almost 80% of retail revenue and sold across a large number of stores.

We document a large degree of price dispersion at a given moment in time for identical products (UPCs) and almost identical products (brands) across U.S. retail stores, both nationally and at the local market level. The degree of price dispersion strongly varies across products. Most products are frequently promoted, but the overall price dispersion is not only due to promotions but also due to a large degree of dispersion in the non-promoted base prices. Decomposing the overall yearly price variance into base price and promotion components, we find that within markets, persistent base price differences across stores account for the largest share of the price variance, whereas the contribution of the within-store variance of base prices is relatively small. Despite the high degree of promotional activity, promotions account for only a moderate share of the overall price variance. This counterintuitive finding is due to an EDLP vs. Hi-Lo pricing pattern, which compresses the overall price dispersion across stores.

A key finding is the substantial similarity in prices, promotion frequency, and promotion depth at the retail chain and especially at the local market/retail chain level. Individual promotion events are also strongly coordinated within chains. This similarity in prices and promotions may appear like a missed opportunity to price discriminate, and it contradicts what has been the conventional wisdom in the academic marketing community, that there was a trend towards “store-specific marketing” in the grocery retail industry. However, mirroring the observed patterns for prices and promotions, we find that also store-level estimates of price elasticities and promotion effects are similar at the retail chain level, and even more so at local market/chain level. Further, the price and promotion effects across stores are hard to statistically distinguish from the local average effects, and cross-price elasticity estimates in particular are highly imprecise. We are confident that managers do not have access to more precise information, because our demand model, estimation method, and data closely emulate what sophisticated analysts use in practice. Therefore, managers may consider store-by-store pricing and promotion decisions to be practically infeasible.

Our novel explanation for price similarity differs from related work that has attributed price similarity to managerial inertia (DellaVigna & Gentzkow, 2019), brand image or fairness concerns (Ater & Rigbi, 2020), and competitive considerations (Adams & Williams, 2019). However, although different, the explanations are not mutually exclusive.

We emphasize, however, that our analysis is not normative. We argued that retail managers or analysts may consider price discrimination across stores to be infeasible if they attempt to statistically distinguish between price and promotion effects using an approach similar to Section 10.2. Our argument provides an explanation for the observed price and promotion similarity, but does not imply that retailers should not try to price discriminate based on store-level price and promotion effects. In particular, a strategy that sets store-level prices and promotions based on the expectation of profits with respect to the posterior distribution of the store-level demand estimates may increase profits despite the large statistical uncertainty. An empirical investigation into whether such a price discrimination strategy can improve retailer profits, and if it is feasible given the potential brand image and fairness concerns, would be of great practical value.