
1 Pricing Strategies for Stochastic Demand

Firms offering goods on online marketplaces face increasing competition and stochastic demand. One reason for the increasing competition is the growing use of automated repricing algorithms and the resulting shortening of time spans between price updates. Time pressure and stochastic demand make it challenging for firms to determine prices quickly and efficiently (often for a large number of products) while still employing pricing strategies that maximize their expected profits. At the same time, online marketplaces also provide numerous advantages. Sellers are now able to observe the market situation at any given point in time and set prices accordingly. Historical market data also enables sellers to learn the demand over time and to better understand consumers’ decision making. More interestingly in the context of this paper, firms can learn their competitors’ strategies. Pricing strategies that exploit knowledge of both demand and competitor strategies will thus be of increasing interest.

Nevertheless, determining suitable price reactions is a highly challenging task. While fixed price strategies are relatively straightforward to manage, automated price adjustment strategies are employed in an increasing number of contexts involving both perishable (e.g., fashion goods, seasonal products, event tickets) and durable goods (e.g., books, natural resources, gasoline). Markets with automated response strategies typically exhibit cyclic price patterns over time, e.g., Edgeworth cycles as illustrated in Fig. 1. Here, firms undercut each other’s prices until a lower bound is reached (e.g., when the margin nears zero) and one competitor raises the price in order to allow for future profits [1, 2].

Fig. 1. Exemplary illustration of Edgeworth price cycles in a duopoly. Both firms undercut each other until the green firm reaches its lower bound and adjusts its price to the upper bound. (Color figure online)

In this paper, we present a duopoly pricing model in a stochastic dynamic framework in which sales probabilities are allowed to be an arbitrary function of time and competitor prices. The goal is to take into account (i) varying (randomized) reaction times, (ii) various given competitor strategies, (iii) additional passive competitors that use constant prices, and (iv) competitors that optimally react.

1.1 Literature Review

The challenge of determining optimal prices for the sale of products is one of the key aspects of revenue management theory. This field of dynamic pricing has been discussed in an array of books (e.g., [3,4,5]). Chen and Chen published a survey giving an excellent overview of recent pricing models under competition [6]. Gallego and Wang consider a continuous time multi-product oligopoly for differentiated perishable goods using optimality conditions to reduce the multi-dimensional dynamic pricing problem to a one-dimensional one [7]. Gallego and Hu analyze structural properties of equilibrium strategies in more general oligopoly models for the sale of perishable products [8], basing the solution model on a deterministic version of the model. Martínez-de-Albéniz and Talluri consider duopoly and oligopoly pricing models for identical products [9]. They use a general stochastic counting process to model customer demand.

Further related models are studied by Yang and Xia [10] as well as Wu and Wu [11]. Levin et al. [12] and Liu and Zhang [13] analyze dynamic pricing models under competition including strategic customers. Dynamic pricing competition models with limited demand information are analyzed by Adida and Perakis [14], Tsai and Hung [15], and Chung et al. [16] using robust optimization and learning approaches. Many models consider continuous time models with finite horizon and limited inventory. In most existing models, discounting is not included and the demand is assumed to be of a somewhat artificial and stylized form. We consider an infinite horizon model without inventory restrictions (i.e., products can be reproduced or reordered) [17]. Demand is allowed to depend generally on time as well as on the market participants’ prices.

Current automated pricing strategies are comparatively simple and aggressive. One example is the often employed strategy of slightly undercutting the price of the cheapest competitor [18]. We do not assume that all market participants act rationally. In order to be able to respond to arbitrary suboptimal pricing strategies we provide applicable solution algorithms that allow computing optimal response strategies.

1.2 Contribution

This paper is an extended version of [17] in which we analyzed optimal price response strategies that are based on anticipated competitor strategies. The model is characterized by a discrete time setting, an infinite horizon, subsequent price reactions, and no inventory considerations.

Compared to [17], in this paper we make the following contributions: First, instead of applying value iteration, we compute optimal strategies by solving the Hamilton-Jacobi-Bellman equation using a non-linear solver. Second, we allow both firms to apply optimal price response strategies in order to study iterated mutual strategy adjustments. Third, we identify equilibrium strategies and analyze their characteristics. Fourth, we study how equilibrium strategies are affected by the discount factor.

The remainder of this paper is structured as follows. In Sect. 2, we describe the stochastic dynamic duopoly model with infinite time horizon for durable goods. We allow sales probabilities to depend on competitor prices as well as on time (seasonal effects). The state space is characterized by time and the actual competitors’ prices. The stochastic dynamic control problem is expressed in discrete time. In Sect. 3, we consider a duopoly competition. The competitor is assumed to frequently adjust its prices using a predetermined strategy. We assume that the price reactions of competitors as well as their reaction times can be anticipated. We set up a firm’s Hamilton-Jacobi-Bellman equation and use recursive methods (value iteration) to approximate the value function. We are able to compute optimal feedback prices as well as expected long-term profits of the two competing firms. Evaluating price paths over time, we are able to explain specific price cycles. Additionally, the results obtained are generalized to scenarios with randomized reaction times and mixed strategies.

In Sect. 4, optimal response strategies in the presence of active and passive competitors are analyzed. We examine how the duopoly game of two active competitors is affected by additional passive competitors. We show how to compute optimal pricing strategies and to evaluate expected profits. We also discuss how the cyclic price paths of the active competitors are affected by different price levels of passive competitors.

In Sect. 5, we evaluate the expected profits when different strategies are played against each other. We study scenarios in which the competitor also applies optimal response strategies. In Sect. 6, we study mutual optimal reaction strategies. We show that equilibrium strategies can be identified by iterating optimal response strategies. Finally, conclusions and managerial recommendations are given in Sect. 7.

2 Model Description

For this work, we consider the situation where a firm wants to sell goods (e.g., groceries, technical devices, gasoline) on a digital marketplace (e.g., Amazon, eBay, Alibaba). We assume that several sellers compete for the same market, i.e., customers are able to compare prices of different competitors at any given point in time.

We assume that the time horizon is infinite. We assume that firms are able to reproduce or reorder products (promise to deliver), and the ordering is decoupled from pricing decisions. If a sale takes place, shipping costs c have to be paid, \(c \ge 0\). A sale of one item at price a, \(a \ge 0\), leads to a net profit of \(a-c\). Discounting is also included in the model. We will use the discount factor \(\delta \), \(0< \delta < 1\), for the length of one period.

On the majority of marketplaces, prices cannot be continuously adjusted. Thus, we consider a discrete time model. The sales intensity of our product is denoted by \(\lambda \). Due to customer choice, the sales intensity depends in particular on our offer price a and the competitors’ prices. We also allow the sales intensity to depend on time, e.g., the time of the day or the week. We assume that the time dependence is periodic and has an integer cycle length of J periods. In our model, the sales intensity \(\lambda \) is a general function of time, our offer price a and the competitors’ prices \(\varvec{p}\). Given the prices a and \(\varvec{p}\) in period t, the sales intensity \(\lambda \) satisfies, \(t = 0,1,2,...\), \(a \ge 0\), \(\varvec{p} \ge \varvec{0}\),

$$\begin{aligned} {\lambda _t}(a,\varvec{p}) = {\lambda _{{t_{}}\,{{\bmod }_{}}\,J}}(a,\varvec{p}). \end{aligned}$$
(1)

We assume the number of sales within one period to be Poisson distributed in our discrete time model. That is, the probability of selling exactly i items within one period of time is given by, \(t = 0,1,2,...\), \(a \ge 0\), \(\varvec{p} \ge \varvec{0}\), \(i = 0,1,2,...\),

$$\begin{aligned} {P_t}(i,a,\varvec{p}) = \frac{{{\lambda _t}{{(a,\varvec{p})}^i}}}{{i!}} \cdot {e^{ - {\lambda _t}(a,\varvec{p})}}. \end{aligned}$$
(2)
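As a minimal sketch of Eqs. (1) and (2), the following Python snippet evaluates the periodic intensity lookup and the Poisson sales probability. The two intensity functions in `cycle` are purely hypothetical placeholders (they are not estimated from data and do not appear in the model); only the modulo lookup and the Poisson formula follow the text.

```python
import math


def sales_probability(i, lam):
    """Probability of selling exactly i items in one period, cf. Eq. (2)."""
    return lam ** i / math.factorial(i) * math.exp(-lam)


def periodic_intensity(t, a, p, base_intensities):
    """Return lambda_t(a, p) for a cycle of length J, cf. Eq. (1).

    base_intensities is a hypothetical list of J functions (a, p) -> intensity.
    """
    J = len(base_intensities)
    return base_intensities[t % J](a, p)


# Hypothetical cycle of length J = 2: demand is twice as high in odd periods.
cycle = [
    lambda a, p: 0.5 * math.exp(-0.1 * a),
    lambda a, p: 1.0 * math.exp(-0.1 * a),
]
lam = periodic_intensity(t=5, a=10, p=12, base_intensities=cycle)
probs = [sales_probability(i, lam) for i in range(20)]
```

Truncating the support at 20 sales is an arbitrary choice for illustration; for small intensities the remaining probability mass is negligible.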

A price a has to be determined for each period t. We call strategies \({({a_t})_t}\) admissible when they belong to the class of Markovian feedback policies; i.e., pricing decisions \({a_t} \ge 0\) may depend on time t and the current prices of the competitors. By A we denote the set of admissible prices. A list of variables and parameters is given in the Appendix, cf. Table 4.

By \({X_t}\) we denote the random number of sales in period t. Depending on the chosen pricing strategy \({({a_t})_t}\), the random accumulated profit from time/period t on (discounted on time t) amounts to, \(t = 0,1,2,...\),

$$\begin{aligned} {G_t}:= \sum \limits _{s = t}^\infty {{\delta ^{s - t}} \cdot ({a_s} - c) \cdot {X_s}}. \end{aligned}$$
(3)

The objective is determining a non-anticipating (Markovian) pricing strategy that maximizes the expected total profit \(E({G_0})\).

In the next sections, we will solve dynamic pricing problems that are related to (1)–(3). Further, we mostly assume a duopoly situation. We assume that the competitor frequently adjusts his/her prices and show how to derive optimal response strategies. We analyze the impact of different reaction times as well as randomized reaction times. We also consider the case in which the competitor plays mixed strategies. In Sect. 4, we compute pricing strategies for duopoly scenarios with additional passive competitors. Finally, we let the competitor also apply optimized response strategies in Sects. 5 and 6.

3 Duopoly: Optimal Reaction Strategies

Due to the increasing market transparency on e-commerce platforms, sellers can observe and thus anticipate transitions of the market situation. In this section, we examine a duopoly where we compete with a seller that frequently adjusts her prices using a predetermined strategy.

3.1 Fixed Reaction Times

Having information about a competitor’s strategy at hand and being able to anticipate it allows us to optimize expected profits. Here, the price responses of competitors as well as their reaction time can be taken into account. In this case, a change of the market situation \(\varvec{p}\) can take place within a period. A typical scenario is that a competitor adjusts its price in response to our price with a certain delay. Throughout this section, we assume that the pricing strategy and the reaction time of the competitor are known; i.e., we assume that choosing a price a at time t is followed by a state transition (e.g., a competitor’s price reaction) and the current market situation \(\varvec{p}\) changes to a subsequent state described by a transition function F, which can depend both on the market situation \(\varvec{p}\) as well as price a.

We want to derive optimal price response strategies to a given competitor’s strategy. For simplicity, we consider the sale of one type of product in a duopoly situation. We assume that the state of the system (the market situation) is one-dimensional and simply characterized by the competitor’s price p, i.e., we let \(\varvec{p}:=p\).

In real-life applications, a firm is not able to adjust its prices immediately after the price reaction of the competing firm. Consequently, we assume that in each period the price reaction of the competing firm takes place with a delay of h periods, \(0< h < 1\). Thus, after an interval of size h the competitor adjusts its price from p to F(a), as illustrated in Fig. 2.

Fig. 2. Duopoly: sequence of price reactions, cf. [17].

In period t, the probability to sell exactly i items during the first interval (Phase 1, cf. Fig. 2) of size h is

$$\begin{aligned} P_t^{(h)}(i,a,p): = Pois\left( {h \cdot {\lambda _t}(a,p)} \right) \end{aligned}$$

while for the rest of the period (Phase 2, cf. Fig. 2) the sales probability changes to \(P_t^{(1 - h)}\left( {i,a,F(a)} \right) = Pois\left( {(1 - h) \cdot {\lambda _t}\left( {a,F(a)} \right) } \right) \).

We will use value iteration to approximate the value function which represents the present value of future profits. For a given “large” number T, \(T \gg J\), we let \({V_T}(p) = 0\) for all p, and compute, \(t = 0,1,2,...,T - 1\), \(0< h < 1\), \(p \in A\),

$$\begin{aligned} {V_t}(p) = \mathop {\max }\limits _{a \in A} \left\{ {\sum \limits _{{i_1} \ge 0} {P_t^{(h)}({i_1},a,p)} \cdot \sum \limits _{{i_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},a,F(a)} \right) } } \right. \end{aligned}$$
$$\begin{aligned} \left. { \cdot \left( {(a - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + 1}^{}\left( {F(a)} \right) } \right) } \right\} . \end{aligned}$$
(4)

The associated pricing strategy \(a_t^*(p)\), \(t = 0,1,2,...,J - 1\), \(p \in A\), is determined by the arg max of

$$a_t^*(p) = \mathop {\arg \max }\limits _{a \in A} \left\{ {\sum \limits _{{i_1} \ge 0} {P_t^{(h)}({i_1},a,p)} } \right. { \cdot \sum \limits _{{i_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},a,F(a)} \right) } }$$
$$\begin{aligned} \left. { \cdot \left( {(a - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + 1}^{}\left( {F(a)} \right) } \right) } \right\} . \end{aligned}$$
(5)

If \(a_t^*(p)\) is not unique, we choose the largest one.
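As a worked step, the double sum in (4) can be collapsed, since the bracketed term is linear in \(i_1\) and \(i_2\) and the Poisson probabilities sum to one. Using the Poisson means \(h \cdot {\lambda _t}(a,p)\) for Phase 1 and \((1 - h) \cdot {\lambda _t}(a,F(a))\) for Phase 2 as defined above, (4) is equivalent to

$$\begin{aligned} {V_t}(p) = \mathop {\max }\limits _{a \in A} \left\{ {(a - c) \cdot \left( {h \cdot {\lambda _t}(a,p) + (1 - h) \cdot {\lambda _t}\left( {a,F(a)} \right) } \right) + \delta \cdot V_{t + 1}\left( {F(a)} \right) } \right\} . \end{aligned}$$

This restatement is added for illustration only; it relies solely on linearity of expectation and avoids evaluating the infinite sums numerically.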

Remark 1

The recursive solution approach also allows solving problems with perishable products and finite horizons T, simply by evaluating Eqs. (4)–(5) for all \(t = 0,1,2,...,T - 1\).

In order to illustrate the approach, let us consider a numerical example for durable goods. We assume the competitor applies one of the most common strategies: our current price is undercut by \(\varepsilon \) down to a certain minimum (e.g., the shipping costs c). The sales dynamics of the following example are based on a large data set from the Amazon marketplace for used books [19].

Definition 1

By \(P_t^{(h)}(i,a,\varvec{p}): = Pois\left( {h \cdot {e^{\varvec{x}(a,\varvec{p})'{{\varvec{\beta }}}}}/(1 + {e^{\varvec{x}(a,\varvec{p})'{{\varvec{\beta }}}}})} \right) \) we define sales probabilities for oligopoly settings which are based on linear combinations of the following five regressors \(\varvec{x} = \varvec{x}(a,\varvec{p})\), \(\varvec{p} = (p_1,...,p_K)\) with given coefficients \(\varvec{\beta }= ({\beta _1},...,{\beta _5})\):

  • (i) constant/intercept

    $$\begin{aligned} {x_1}(a,\varvec{p}) = 1 \end{aligned}$$
  • (ii) rank of price a within the set of competitor prices \(\varvec{p}\)

    $$\begin{aligned} {x_2}(a,\varvec{p}) = 1 + \left| {\left\{ {k = 1,...,K \left| {p_k < a} \right. } \right\} } \right| + 0.5 \cdot \left| {\left\{ {k = 1,...,K\left| {p_k = a} \right. } \right\} } \right| \end{aligned}$$
  • (iii) price gap between price a and the best competitor price

    $$\begin{aligned} {x_3}(a,\varvec{p}) = a_{} - \mathop {\min }\limits _{k = 1,...,K_{}} \{ p_k\} \end{aligned}$$
  • (iv) total number of competitors

    $$\begin{aligned} {x_4}(a,\varvec{p}) = K \end{aligned}$$
  • (v) average price level

    $$\begin{aligned} {x_5}(a,\varvec{p}) = (a_{} + \sum \nolimits _k {p_k} )/(1 + {K}) \end{aligned}$$
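The five regressors and the resulting sales probability of Definition 1 can be sketched in a few lines of Python. The function names below are illustrative choices, not part of the model; the coefficient vector `beta` has to be supplied by the caller (the usage line takes the values reported in Example 1).

```python
import math


def regressors(a, prices):
    """Return x(a, p) = (x1, ..., x5) for our price a and competitor prices."""
    K = len(prices)
    # (ii) rank of a: ties count half
    rank = 1 + sum(1 for p in prices if p < a) + 0.5 * sum(1 for p in prices if p == a)
    gap = a - min(prices)                   # (iii) gap to the best competitor price
    avg = (a + sum(prices)) / (1 + K)       # (v) average price level
    return [1.0, rank, gap, float(K), avg]  # (i) intercept, (iv) K


def intensity(a, prices, beta):
    """Logistic sales intensity e^{x'beta} / (1 + e^{x'beta})."""
    z = sum(x * b for x, b in zip(regressors(a, prices), beta))
    return 1.0 / (1.0 + math.exp(-z))


def sales_probability(i, a, prices, beta, h=1.0):
    """P^{(h)}(i, a, p): Poisson probability of i sales in an interval of length h."""
    mean = h * intensity(a, prices, beta)
    return mean ** i / math.factorial(i) * math.exp(-mean)


beta = (-3.89, -0.56, -0.01, 0.07, -0.02)  # coefficients as reported in Example 1
undercut = intensity(9, [10], beta)   # we are cheaper than the competitor
overbid = intensity(11, [10], beta)   # we are more expensive
```

With these coefficients, the negative weight on the rank regressor means that undercutting the competitor yields a higher intensity than pricing above it.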

Example 1

We assume a duopoly, i.e., \(K=1\) and \(\varvec{p} = p\). Let \(c=3\), \(\delta = 0.99\), \(0 \le h \le 1\), and let \(F(a): = \max (a - \varepsilon ,c)\), \(\varepsilon = 1\), \(a \in A: = \left\{ {1,2,...,100} \right\} \). For the computation of the value function, we let \(T:=1000\). We assume the sales probabilities \(P_t^{(h)}( \cdot ,a,p)\), cf. Definition 1, where \(\varvec{\beta }= (-3.89, -0.56, -0.01, 0.07, -0.02)\).
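The recursion (4)–(5) for this example can be sketched as follows. The sketch assumes time-homogeneous demand (Definition 1 has no time dependence) and replaces the Poisson sums by their expectations, i.e., the expected number of sales in a period is \(h \cdot \lambda (a,p) + (1 - h) \cdot \lambda (a,F(a))\). It further assumes that c and all values F(a) lie on the price grid. The function and variable names are illustrative, and the pure-Python loops are chosen for self-containment rather than speed.

```python
import math

BETA = (-3.89, -0.56, -0.01, 0.07, -0.02)  # coefficients from Example 1


def intensity(a, p):
    """Duopoly sales intensity of Definition 1 (K = 1)."""
    rank = 1 + (1 if p < a else 0) + (0.5 if p == a else 0)
    x = (1.0, rank, a - p, 1.0, (a + p) / 2)
    z = sum(xi * bi for xi, bi in zip(x, BETA))
    return 1.0 / (1.0 + math.exp(-z))


def value_iteration(h, c=3, eps=1, delta=0.99, prices=range(1, 101), T=1000):
    """Approximate V_0(p) and the response strategy a*(p) by backward recursion."""
    prices = list(prices)

    def F(a):  # competitor undercuts by eps, but never below c
        return max(a - eps, c)

    # Expected one-period profit for each state p and action a (precomputed once).
    reward = {
        p: [(a - c) * (h * intensity(a, p) + (1 - h) * intensity(a, F(a)))
            for a in prices]
        for p in prices
    }
    V = {p: 0.0 for p in prices}  # terminal condition V_T(p) = 0
    policy = {}
    for _ in range(T):
        cont = [delta * V[F(a)] for a in prices]  # discounted continuation values
        V_new = {}
        for p in prices:
            best_val, best_a = -math.inf, None
            for a, r, w in zip(prices, reward[p], cont):
                if r + w >= best_val:  # ">=" keeps the largest maximizer
                    best_val, best_a = r + w, a
            V_new[p] = best_val
            policy[p] = best_a
        V = V_new
    return V, policy


# Reduced grid and horizon so that the illustration runs quickly.
V, policy = value_iteration(h=0.5, prices=range(1, 31), T=50)
```

Calling `value_iteration(h=0.5)` with the defaults corresponds to the parameters of Example 1 but takes noticeably longer in pure Python; a vectorized implementation would be the natural choice for larger experiments.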

Figures 3(a) and 4(a) illustrate optimal response strategies for the reaction times \(h = 0.1\) and \(h = 0.9\). The case \(h=0.1\) represents a fast reaction of the competitor; \(h=0.9\) represents a slow reaction. In the case of \(h=0.5\), both competing firms react equally fast. In all three cases the optimal response strategies are of similar shape. If the competitor’s price is either very low or very high, it is optimal to set the price to a certain moderate level. If the competitor’s price is somewhere in between (intermediate range), it is advisable to undercut that price by one price unit \(\varepsilon \). The larger h is, the larger the intermediate range and the higher the upper price level.

Fig. 3. Example 1 with \(h = 0.1\): optimal response strategy and price paths, cf. [17].

Employing optimal response strategies can create cyclic price patterns over time, so-called Edgeworth cycles [1, 2, 18]. The resulting price paths are illustrated in Figs. 3(b) and 4(b). We observe that the cycle length and the amplitude of the price patterns are increasing if the reaction time of the competitor is longer. Note, roughly \(h \cdot 100\%\) of the time our firm is offering the lowest price; i.e., the parameter h can also be used to model situations in which one firm is able to adjust its prices more often than another firm [20, 21].
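How such cyclic price paths arise from alternating price reactions can be illustrated with a small simulation. In the sketch below, the competitor rule is F(a) with \(\varepsilon = 1\) and \(c = 3\) as in Example 1, whereas our own rule `make_undercut` with its bounds 5 and 20 is a hypothetical stand-in for a computed response strategy \(a^*(p)\), chosen only to mimic its qualitative shape.

```python
def simulate_price_path(our_strategy, competitor_strategy, p0, periods):
    """Alternate price reactions: we respond to p, then the competitor responds to a."""
    path = []
    p = p0
    for t in range(periods):
        a = our_strategy(p)          # our price at the start of period t
        path.append((t, a, p))       # Phase 1: prices (a, p)
        p = competitor_strategy(a)   # competitor reacts after the delay h
        path.append((t, a, p))       # Phase 2: prices (a, F(a))
    return path


def make_undercut(eps, lower, upper):
    """Hypothetical rule: undercut by eps; at or below `lower`, reset to `upper`."""
    def strategy(price):
        return upper if price <= lower else max(price - eps, lower)
    return strategy


ours = make_undercut(eps=1, lower=5, upper=20)  # illustrative bounds
theirs = lambda a: max(a - 1, 3)                # F(a) with eps = 1, c = 3
path = simulate_price_path(ours, theirs, p0=20, periods=30)
```

Each entry of `path` is a tuple (period, our price, competitor price); plotting the two price columns over time shows the sawtooth pattern of mutual undercutting followed by a price reset.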

Fig. 4. Example 1 with \(h = 0.9\): optimal response strategy and price paths, cf. [17].

Additionally, we are able to analyze the impact of the reaction time on expected long-term profits of our firm as well as the competitor. We assume that the competitor faces the same sales probabilities and shipping costs. The competitor’s expected profits can be recursively evaluated by, cf. (4), \(t = 0,1,2,...,T - 1\), \(0< h < 1\), \(a \in A\), \(V_{T + h}^{(c)}(a) = 0\),

$$V_{t + h}^{(c)}(a) = \sum \limits _{{i_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},F(a),a} \right) } \cdot \sum \limits _{{i_1} \ge 0} {P_{t + 1}^{(h)}\left( {{i_1},F(a),a_{t + {1_{}}\,{{\bmod }_{}}\,J}^*(F(a))} \right) } $$
$$\begin{aligned} \quad \cdot \left( {(F(a) - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + h + 1}^{(c)}\left( {a_{t + {1_{}}\,{{\bmod }_{}}\,J}^*\left( {F(a)} \right) } \right) } \right) . \end{aligned}$$
(6)

Because of the cyclic price paths, expected future profits \({V_0}(p)\) and \(V_h^{(c)}(a)\) are (almost) independent of the initial states or prices. Figure 5 depicts V as well as the competitor’s expected profits \({V^{(c)}}\) as a function of h. We observe that the expected profit V increases linearly in the competitor’s reaction time h, while the competitor’s profit \({V^{(c)}}\) decreases in h. Note, the impact of h is substantial. The “disadvantage” of the player that stops the undercutting phase is already compensated if our reaction time is smaller than 0.46, i.e., if h is larger than 0.54.

3.2 Randomized Reaction Times

Because reaction times have such a significant impact, firms will try to gain an advantage by updating their prices more frequently. In addition, firms might also try to minimize their reaction times by anticipating their competitor’s time of adjustment. In order not to act predictably, firms will randomize their reaction times.

Our model can be extended to capture the cases in which reaction times are not deterministic. If the distribution of the reaction time of competitors is known, the Hamilton-Jacobi-Bellman (HJB) equation, cf. (4), can be modified. The different reaction scenarios just have to be considered with their corresponding probability. Note, the reaction times of different competitors can be observed over longer time spans.

Fig. 5. Expected profit for different reaction times of the competitor (Example 1), cf. [17].

In the following, we consider scenarios with randomized reaction times. We assume that each firm adjusts its price with a certain intensity (e.g., on average once a period of size 1). We model that approach as follows: we assume that at each point in time d, \(d = t + \varDelta ,t + 2\varDelta ,...,t + 1\), \(0 < \varDelta \ll 1\), our firm adjusts its price with probability q, \(0 < q \ll 1\); i.e., on average we adjust our price \(q/\varDelta \) times a period of size 1. Similarly, the competitor adjusts its price with probability \({q^{(c)}}\), \(0 < {q^{(c)}} \ll 1\). The competitor applies a certain strategy F(a). By \({a^ - }\) we denote our current price at time d, the beginning of the sub-period \((d,d + \varDelta )\). With probability \({q^{(c)}}\), the competitor adjusts its price from p to \(F({a^ - })\). We adjust the price \({a^ - }\) to price a with probability q. Since q and \({q^{(c)}}\) are assumed to be “small” we do not consider the case in which both firms adjust their prices at the same time. The related value function is given by, \({a^ - }, p \in A\), \(t = 0,\varDelta ,2\varDelta ,...,T - \varDelta \), \({\tilde{V}_T}({a^ - },p) = 0\),

$${\tilde{V}_t}({a^ - },p) = (1 - q - {q^{(c)}}) \cdot \sum \limits _{i \ge 0} {P_t^{(\varDelta )}(i,{a^ - },p)} $$
$$\begin{aligned} \cdot \left( {({a^ - } - c) \cdot i + {\delta ^\varDelta } \cdot \tilde{V}_{t + \varDelta }^{}({a^ - },p)} \right) \end{aligned}$$
$$ +~{q^{(c)}} \cdot \sum \limits _{i \ge 0} {P_t^{(\varDelta )}(i,{a^ - },F({a^ - }))} \cdot \left( {({a^ - } - c) \cdot i + {\delta ^\varDelta } \cdot \tilde{V}_{t + \varDelta }^{}({a^ - },F({a^ - }))} \right) $$
$$\begin{aligned} + q \cdot \mathop {\max }\limits _{a \in A} \left\{ {\sum \limits _{i \ge 0} {P_t^{(\varDelta )}(i,a,p) \cdot } \left( {(a - c) \cdot i + {\delta ^\varDelta } \cdot \tilde{V}_{t + \varDelta }^{}\left( {a,p} \right) } \right) } \right\} . \end{aligned}$$
(7)

The optimal price \(\tilde{a}_t^*({a^ - },p)\), \(t = 0,\varDelta ,2\varDelta ,...,J - \varDelta \), is determined by the arg max of (7). The competitor’s expected profit corresponds to, \(t = 0,\varDelta ,2\varDelta ,...,T - \varDelta \), \(\tilde{V}_T^{(c)}({a^ - },p) = 0\),

$$\tilde{V}_t^{(c)}({a^ - },p) = (1 - q - {q^{(c)}}) \cdot \sum \limits _{i \ge 0} {P_t^{(\varDelta )}(i,p,{a^ - })} $$
$$\begin{aligned} \cdot \left( {(p - c) \cdot i + {\delta ^\varDelta } \cdot \tilde{V}_{t + \varDelta }^{(c)}({a^ - },p)} \right) \end{aligned}$$
$$\begin{aligned} +\,{q^{(c)}} \cdot \sum \limits _{i \ge 0} {P_t^{(\varDelta )}(i,F({a^ - }),{a^ - })} \end{aligned}$$
$$\begin{aligned} \cdot \left( {(F({a^ - }) - c) \cdot i + {\delta ^\varDelta } \cdot \tilde{V}_{t + \varDelta }^{(c)}({a^ - },F({a^ - }))} \right) \end{aligned}$$
$$\begin{aligned} +\,q \cdot \sum \limits _{i \ge 0} {P_t^{(\varDelta )}\left( {i,p,\tilde{a}_{{t_{}}{{\bmod }_{}}J}^*({a^ - },p)} \right) } \end{aligned}$$
$$\begin{aligned} \cdot \left( {(p - c) \cdot i + {\delta ^\varDelta } \cdot \tilde{V}_{t + \varDelta }^{(c)}\left( {\tilde{a}_{{t_{}}{{\bmod }_{}}J}^*({a^ - },p),p} \right) } \right) . \end{aligned}$$
(8)
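A minimal sketch of the backward recursion (7) is given below. It assumes time-homogeneous demand according to Definition 1 with the coefficients of Example 1 and replaces each Poisson sum by its expectation, so that one sub-period at prices (a, p) contributes \((a - c) \cdot \varDelta \cdot \lambda (a,p)\). The function name, the argument names, and the restriction to Eq. (7) (the competitor’s profit (8) is not computed) are choices made for this illustration.

```python
import math

BETA = (-3.89, -0.56, -0.01, 0.07, -0.02)  # coefficients from Example 1


def intensity(a, p):
    """Duopoly sales intensity of Definition 1 (K = 1)."""
    rank = 1 + (1 if p < a else 0) + (0.5 if p == a else 0)
    x = (1.0, rank, a - p, 1.0, (a + p) / 2)
    z = sum(xi * bi for xi, bi in zip(x, BETA))
    return 1.0 / (1.0 + math.exp(-z))


def randomized_value(q, q_c, steps, dt=0.1, c=3, eps=1, delta=0.99,
                     prices=range(1, 101)):
    """Backward recursion of Eq. (7) over `steps` sub-periods of length dt."""
    prices = list(prices)
    d = delta ** dt                    # discount factor of one sub-period
    F = lambda a: max(a - eps, c)      # competitor strategy (must stay on the grid)
    # Expected profit of one sub-period at prices (a, p).
    r = {(a, p): (a - c) * dt * intensity(a, p) for a in prices for p in prices}
    V = {s: 0.0 for s in r}            # terminal condition
    for _ in range(steps):
        # The value of adjusting our own price depends on p only.
        adjust = {p: max(r[a, p] + d * V[a, p] for a in prices) for p in prices}
        V = {
            (a, p): (1 - q - q_c) * (r[a, p] + d * V[a, p])   # nobody adjusts
            + q_c * (r[a, F(a)] + d * V[a, F(a)])             # competitor adjusts
            + q * adjust[p]                                   # we adjust
            for (a, p) in r
        }
    return V


# Small grid and few steps for illustration; keys of V are (a_minus, p).
V = randomized_value(q=0.1, q_c=0.1, steps=5, prices=range(1, 11))
```

The setting of Example 2 would correspond to the default grid with `steps = T / dt`, which is slow in pure Python; the small grid above only demonstrates the structure of the recursion.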

Example 2

We assume the duopoly setting of Example 1 and let \(c=3\), \(F(a): = \max (a - \varepsilon ,c)\), \(\varepsilon = 1\), \(a \in A: = \left\{ {1,2,...,100} \right\} \), \(\delta \) = 0.99, \(\varDelta \) = 0.1. We use \(T:=1000\). We consider different reaction probabilities q and \({q^{(c)}}\).

Table 1 contains the expected profits \((\tilde{V}, {\tilde{V}^{(c)}})\) of the two competing firms for different reaction probabilities. We observe that \(\tilde{V}\) is increasing in q and decreasing in \({q^{(c)}}\). For \({\tilde{V}^{(c)}}\) it is the other way around. We found that the ratio \(q/{q^{(c)}}\) of the adjustment frequencies is a critical quantity.

Table 1. Expected profits \((\tilde{V},{\tilde{V}^{(c)}})\) of both firms for different reaction probabilities q, \({q^{(c)}}= 0.05, 0.1, 0.2\), \(\delta =0.99\), \(\varDelta =0.1\); Example 2, cf. [17].

The overall adjustment frequency is of minor importance as long as the ratio \(q/{q^{(c)}}\) stays the same. Hence, the expected profits of both firms can be approximated by the profits from the model with deterministic reaction time, cf. Sect. 3.1, where \(h=q/(q+{q^{(c)}})\), i.e., the percentage of time our firm has the most recently updated price.

Fig. 6. Comparison of evaluated price paths, cf. [17].

Figure 6(b) shows the price paths for the parameter setting of Example 2. Figure 6(a) shows the deterministic case of Example 1 for \(h=0.5\). We observe that overall the price patterns have similar characteristics. However, in the randomized case, the timing of the price reactions is not predictable. While in the deterministic \(h=0.5\) case (cf. Sect. 3.1) we have \(\tilde{V} = 16.44\) and \({\tilde{V}^{(c)}} = 17.13\), in the randomized case (\(\varDelta =0.1\), \(q = {q^{(c)}}=0.1\)) the expected profits are \(\tilde{V} = 16.48\) and \({\tilde{V}^{(c)}} = 17.09\). In both models the advantage of the aggressive player is basically the same, but for the model with randomized reaction times the advantage is slightly smaller.

3.3 Mixed Competitors’ Strategies

If the competitor’s strategy is known, suitable response strategies can be computed. Hence, firms might try to randomize their strategies. In this section, we will analyze scenarios in which competitors play a mixed pricing strategy.

Let us assume that the competitor plays strategy \({F_k}(a)\), \(a \in A\), with probability \({\pi _k}\), \(1 \le k \le K < \infty \), \(\sum \nolimits _k {{\pi _k}} = 1\). Further, we assume deterministic reaction times. We adjust our model, cf. Sect. 3.1, by using a weighted sum of potential price reactions. The Hamilton-Jacobi-Bellman (HJB) equation can be written as, \(t = 0,1,2,...,T - 1\), \(0< h < 1\), \(p \in A\),

$${V_t}(p) = \mathop {\max }\limits _{a \in A} \left\{ {\sum \limits _{{i_1} \ge 0} {P_t^{(h)}({i_1},a,p)} } \right. { \cdot \sum \limits _k {{\pi _k} \cdot \sum \limits _{{i_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},a,{F_k}(a)} \right) } } }$$
$$\begin{aligned} \left. { \cdot \left( {(a - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + 1}^{}\left( {{F_k}(a)} \right) } \right) } \right\} , \end{aligned}$$
(9)

where \({V_T}(p) = 0\) for all p. The associated pricing strategy \(a_t^*(p)\), \(t = 0,1,2,...,J - 1\), \(0< h < 1\), \(p \in A\), is determined by the arg max of (9). The resulting competitor’s expected profits can be computed by (starting from, e.g., \(V_{T + h}^{(c)}(a) = 0\)), \(t = 0,1,2,...,T - 1\), \(0< h < 1\), \(a \in A\),

$$\begin{aligned} V_{t + h}^{(c)}(a) = \sum \limits _k {{\pi _k} \cdot } \sum \limits _{{i_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},{F_k}(a),a} \right) } \end{aligned}$$
$$\begin{aligned} \cdot \sum \limits _{{i_1} \ge 0} {P_{t + 1}^{(h)}\left( {{i_1},{F_k}(a),a_{t + 1{_{}}\,{{\bmod }_{}}\,J}^*({F_k}(a))} \right) } \end{aligned}$$
$$\begin{aligned} \cdot \left( {({F_k}(a) - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + h + 1}^{(c)}\left( {a_{t + 1{_{}}\,{{\bmod }_{}}\,J}^*\left( {{F_k}(a)} \right) } \right) } \right) . \end{aligned}$$
(10)
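One backward step of the recursion (9) can be sketched as follows. Since the bracketed term is linear in the sales counts, the sums reduce to a probability-weighted average over the competitor’s possible reactions \({F_k}(a)\). The demand function `demand` and the second competitor rule `match` are hypothetical and serve only as illustration; the intensity is passed in as a function so that any demand model can be plugged in.

```python
import math


def mixed_backup(V_next, strategies, probs, h, intensity, c, delta, prices):
    """One backward step of Eq. (9): returns (V_t, a_t^*) as dicts keyed by p."""
    V, policy = {}, {}
    for p in prices:
        best_val, best_a = -math.inf, None
        for a in prices:
            val = 0.0
            for F_k, pi_k in zip(strategies, probs):
                nxt = F_k(a)  # competitor's reaction under strategy k
                sales = h * intensity(a, p) + (1 - h) * intensity(a, nxt)
                val += pi_k * ((a - c) * sales + delta * V_next[nxt])
            if val >= best_val:  # ">=" keeps the largest maximizer
                best_val, best_a = val, a
        V[p], policy[p] = best_val, best_a
    return V, policy


def demand(a, p):
    """Hypothetical intensity, for illustration only."""
    return math.exp(-0.1 * a) * (1.0 if a <= p else 0.5)


prices = list(range(1, 21))
undercut = lambda a: max(a - 1, 3)  # F(a) with eps = 1, c = 3
match = lambda a: a                 # hypothetical rule: competitor matches our price
V_T = {p: 0.0 for p in prices}      # terminal condition V_T(p) = 0
V, policy = mixed_backup(V_T, [undercut, match], [0.7, 0.3], 0.5, demand, 3, 0.99, prices)
```

Repeating this step T times, each time feeding the returned value function back in as `V_next`, yields the value iteration for the mixed-strategy case.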

Using the models just introduced, we can compute suitable pricing strategies in various competitive markets. As long as the number of competing firms is small, the value function and the optimal prices can be computed. Note, due to the coupled state transitions in general the value function has to be computed for all states in advance. When the number of competitors is large this can cause serious problems since the state space can grow exponentially (curse of dimensionality).

The approach is suitable if the number of competitors is small and their strategies are known. If the number of competitors is large and the strategies are unknown, we recommend using simple but robust strategies [19].

4 Active and Passive Sellers in Competition

In case the pricing strategies and the competitors’ reaction times are known, the model can be extended to an oligopoly setting. For each additional competitor the state space of the model has to be extended by one dimension. Note, only active competitors that frequently adjust their prices should be taken into account. Inactive competitors will be treated as external fixed effects.

We assume one active competitor and Z passive competitors. The prices of the passive competitors are denoted by \(\varvec{z} = ({z_1},...,{z_Z})\), \({z_j} \ge 0\), \(j = 1,...,Z\), and assumed to be constant over time. The active competitor employs a (non-randomized) strategy F(a) that refers to our price a (not the passive one). The Hamilton-Jacobi-Bellman (HJB) equation can be written as, \(t = 0,1,2,...,T - 1\), \(0< h < 1\), \(p \ge 0\), \({V_T}(p,\varvec{z}) = 0\) for all \(p,\varvec{z}\),

$${V_t}(p,\varvec{z}) = \mathop {\max }\limits _{a \in A} \left\{ {\sum \limits _{{i_1} \ge 0} {P_t^{(h)}({i_1},a,p,\varvec{z})} } \right. { \cdot \sum \limits _{{i_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},a,F(a),\varvec{z}} \right) } }$$
$$\begin{aligned} \left. { \cdot \left( {(a - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + 1}^{}\left( {F(a),\varvec{z}} \right) } \right) } \right\} . \end{aligned}$$
(11)

The associated pricing strategy amounts to, \(t = 0,1,2,...,J - 1\), \(0< h < 1\), \(p \in A\),

$$a_t^*(p,\varvec{z}) = \mathop {\arg \max }\limits _{a \in A} \left\{ {\sum \limits _{{i_1} \ge 0} {P_t^{(h)}({i_1},a,p,\varvec{z})} } \right. { \cdot \sum \limits _{{i_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},a,F(a),\varvec{z}} \right) } }$$
$$\begin{aligned} \left. { \cdot \left( {(a - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + 1}^{}\left( {F(a),\varvec{z}} \right) } \right) } \right\} . \end{aligned}$$
(12)

The competitor’s profits can be computed by (using, e.g., \(V_{T + h}^{(c)}(a,\varvec{z}) = 0\) for all \(a,\varvec{z}\)), \(t = 0,1,2,...,T - 1\), \(0< h < 1\), \(a \ge 0\),

$$\begin{aligned} V_{t + h}^{(c)}(a,\varvec{z}) = \sum \limits _{{i_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},F(a),a,\varvec{z}} \right) } \end{aligned}$$
$$\begin{aligned} \cdot \sum \limits _{{i_1} \ge 0} {P_{t + 1}^{(h)}\left( {{i_1},F(a),a_{t + {1_{}}\,{{\bmod }_{}}\,J}^*(F(a),\varvec{z}),\varvec{z}} \right) } \end{aligned}$$
$$\begin{aligned} \cdot \left( {(F(a) - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + h + 1}^{(c)}\left( {a_{t + {1_{}}{{\bmod }_{}}J}^*\left( {F(a),\varvec{z}} \right) ,\varvec{z}} \right) } \right) . \end{aligned}$$
(13)

It is not necessary to compute the value function for all price combinations of passive competitors in advance. The value function and the associated pricing strategy can be computed separately for specific market situations (e.g., just when they occur). In the following, we consider an example with active and passive competitors.
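To illustrate how a passive competitor enters the computation, the following sketch evaluates the expected one-period profit inside the braces of (11), without the continuation value. It assumes that passive prices simply enter the competitor price vector of Definition 1 (so that K counts active and passive competitors) and uses the coefficients of Example 1; the function names and the chosen prices are illustrative.

```python
import math

BETA = (-3.89, -0.56, -0.01, 0.07, -0.02)  # coefficients from Example 1


def intensity(a, competitor_prices):
    """Sales intensity of Definition 1 for an arbitrary number of competitors."""
    K = len(competitor_prices)
    rank = (1 + sum(p < a for p in competitor_prices)
            + 0.5 * sum(p == a for p in competitor_prices))
    x = (1.0, rank, a - min(competitor_prices), float(K),
         (a + sum(competitor_prices)) / (1 + K))
    return 1.0 / (1.0 + math.exp(-sum(xi * bi for xi, bi in zip(x, BETA))))


def expected_period_profit(a, p, z, h=0.5, c=3, eps=1):
    """Expected profit of one period at our price a, active price p, passive prices z."""
    f_a = max(a - eps, c)  # reaction of the active competitor
    sales = (h * intensity(a, [p] + list(z))
             + (1 - h) * intensity(a, [f_a] + list(z)))
    return (a - c) * sales


# Pricing just below vs. just above a passive competitor at z = 20.
below = expected_period_profit(19, p=25, z=[20])
above = expected_period_profit(21, p=25, z=[20])
```

Under these assumptions, pricing just above the passive competitor worsens the price rank in both phases, which lowers the expected one-period profit compared with pricing just below.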

Fig. 7. Optimal response strategy and evaluated price paths (Example 3; \(h=0.5\), \(z=15\)), cf. [17].

Fig. 8. Optimal response strategy and evaluated price paths (Example 3; \(h=0.5\), \(z=25\)), cf. [17].

Example 3

We assume the duopoly setting of Example 1 and let \(F(a):= \max (a - \varepsilon ,c)\), \(\varepsilon =1\), \(c=3\), \(h=0.5\), \(a \in A: = \left\{ {1,2,...,100} \right\} \), \(\delta =0.99\), and \(T=1000\). Further, we consider an additional passive competitor with a constant price z, \(z=15, 20, 25\).

For the three cases \(z=15\), \(z=25\), and \(z=20\) the results are shown in Figs. 7, 8, and 9, respectively. We observe three different characteristics.

If the passive competitor’s price is low (\(z=15\)) the cyclic price battle between our firm and the aggressive firm takes place at a high price level, see Fig. 7(b). The response strategies of the three firms are illustrated in Fig. 7(a).

If the price of the passive firm is sufficiently high (\(z=25\)), the cyclic price paths of the two active firms take place below that level, see Fig. 8. If the constant price is “moderate” (\(z=20\)), then a mixture of the characteristics shown in Figs. 7 and 8 is optimal. Note that it is not advisable to place price offers that slightly exceed competitors’ prices (see Fig. 9).

Fig. 9.
figure 9

Optimal response strategy and evaluated price paths (Example 3; \(h=0.5\), \(z=20\)), cf. [17].

5 Duopoly: Iterated Strategy Adjustments

In this section, we generally evaluate the outcome when different strategies are played against each other in a duopoly setting.

5.1 Evaluating Competing Strategies

We assume time homogeneous demand and \(h=0.5\). If firm 1 plays a pure strategy \(S_1\) and firm 2 plays the pure strategy \(S_2\) then the associated expected profits can be computed by, \(t = 0,1,2,...,T - 1\), \(V_T^{(1)}(p) = V_T^{(2)}(p) = 0\), for all \(p \ge 0\),

$$\begin{aligned} V_t^{(1)}(p) = \sum \limits _{{i_1} \ge 0} {P_{}^{(h)}\left( {{i_1},{S_1}(p),p} \right) } \cdot \sum \limits _{{i_2} \ge 0} {P_{}^{(1 - h)}\left( {{i_2},{S_1}(p),{S_2}({S_1}(p))} \right) } \end{aligned}$$
$$\begin{aligned} \cdot \left( {({S_1}(p) - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + 1}^{(1)}\left( {{S_2}\left( {{S_1}(p)} \right) } \right) } \right) , \end{aligned}$$
(14)
$$\begin{aligned} V_t^{(2)}(p) = \sum \limits _{{i_1} \ge 0} {P_{}^{(h)}\left( {{i_1},{S_2}(p),p} \right) } \cdot \sum \limits _{{i_2} \ge 0} {P_{}^{(1 - h)}\left( {{i_2},{S_2}(p),{S_1}({S_2}(p))} \right) } \end{aligned}$$
$$\begin{aligned} \cdot \left( {({S_2}(p) - c) \cdot ({i_1} + {i_2}) + \delta \cdot V_{t + 1}^{(2)}\left( {{S_1}\left( {{S_2}(p)} \right) } \right) } \right) . \end{aligned}$$
(15)

Alternatively, for given strategies \(S_k\), \(k = 1,2\), we can exactly evaluate the associated expected profits \({V^{(k)}}\) by solving the linear system of equations, \(p \in A\), \(j,k = 1,2\), \(j \ne k\),

$$\begin{aligned} {V^{(k)}}(p) = \sum \limits _{{i_1} \ge 0} {P_{}^{({\varDelta _k})}\left( {{i_1},{S_k}(p),p} \right) } \cdot \sum \limits _{{i_2} \ge 0} {P_{}^{({\varDelta _j})}\left( {{i_2},{S_k}(p),{S_j}({S_k}(p))} \right) } \end{aligned}$$
$$\begin{aligned} \cdot \left( {({S_k}(p) - c) \cdot ({i_1} + {i_2}) + \delta \cdot {V^{(k)}}\left( {{S_j}\left( {{S_k}(p)} \right) } \right) } \right) , \end{aligned}$$
(16)

where \({\varDelta _k}:= h\) and \({\varDelta _j}:= 1-h\), \(0< h < 1\). Note, the system (16) has \(\left| A \right| \) equations and can be solved using standard linear solvers.
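The linear system (16) can be written as \((I - \delta M)V^{(k)} = r\), where \(r(p)\) collects the expected profit of one period and \(M\) maps each state \(p\) to the successor state \(S_j(S_k(p))\). The following is a minimal sketch under stated assumptions: `sales_rate` is a hypothetical duopoly sales intensity (not the model of Example 1), and the sums over \(i_1, i_2\) are replaced by expected sales, which is valid when the sales probabilities sum to one.

```python
import numpy as np

def sales_rate(a, p):
    """Hypothetical expected sales per unit of time at own price a, rival price p."""
    base = max(0.0, 1.0 - a / 100.0)
    if a < p:
        share = 0.8
    elif a == p:
        share = 0.5
    else:
        share = 0.2
    return base * share

def evaluate_strategies(S_k, S_j, prices, c=3.0, h=0.5, delta=0.99):
    """Solve (I - delta * M) V = r for the expected profits of firm k, cf. (16)."""
    prices = list(prices)
    index = {p: n for n, p in enumerate(prices)}
    n = len(prices)
    M = np.zeros((n, n))
    r = np.zeros(n)
    for row, p in enumerate(prices):
        a = S_k(p)   # firm k's reaction to the competitor's price p
        q = S_j(a)   # competitor's reaction to a after the delay h
        r[row] = (a - c) * (h * sales_rate(a, p) + (1 - h) * sales_rate(a, q))
        M[row, index[q]] = 1.0
    # delta < 1 and M is row-stochastic, so the matrix is nonsingular.
    return np.linalg.solve(np.eye(n) - delta * M, r)

def S_U(p, eps=1, c=3):
    """Aggressive undercutting strategy S_U(p) = max(p - eps, c)."""
    return max(p - eps, c)

V_UU = evaluate_strategies(S_U, S_U, range(1, 101))
```

The strategies must map into the price grid, since each successor state is looked up by its index.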

5.2 Iterating Optimal Response Strategies

In this subsection we let two firms optimally adjust their strategies in order to identify equilibrium strategies. The approach, cf. (16), can not only be used to evaluate competing strategies; it can also be applied to exactly compute optimal reaction strategies, cf. (4)–(5), by solving the nonlinear system of equations, \(p \in A\), \(j,k = 1,2\), \(j \ne k\),

$$\begin{aligned} {V^{(k)}}(p) = \mathop {\max }\limits _{a \in A} \left\{ {\sum \limits _{{i_1} \ge 0} {P_{}^{({\varDelta _k})}\left( {{i_1},a,p} \right) } \cdot \sum \limits _{{i_2} \ge 0} {P_{}^{({\varDelta _j})}\left( {{i_2},a,{S_j}(a)} \right) } } \right. \end{aligned}$$
$$\begin{aligned} \left. { \cdot \left( {(a - c) \cdot ({i_1} + {i_2}) + \delta \cdot {V^{(k)}}\left( {{S_j}(a)} \right) } \right) } \right\} . \end{aligned}$$
(17)

If the number of admissible prices \(\left| A \right| \) is sufficiently small, the system (17) can be solved using standard nonlinear solvers, such as MINOS. The associated pricing strategy \({a^{(k)}}(p;{S_j})\), \(p \in A\), \(j,k = 1,2\), \(j \ne k\), is given by the arg max of (17). If the maximizer \({a^{(k)}}(p;{S_j})\) is not unique, we choose the largest one. In the following example, we will iterate optimal response strategies.
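As an alternative to a nonlinear solver, the fixed point of (17) can be approximated by value iteration, since the right-hand side is a contraction for \(\delta < 1\). The sketch below uses this swapped-in technique, not MINOS. It assumes a hypothetical sales intensity `sales_rate` and replaces the sums over \(i_1, i_2\) by expected sales; ties in the arg max are resolved in favor of the largest price.

```python
import numpy as np

def sales_rate(a, p):
    """Hypothetical expected sales per unit of time at own price a, rival price p."""
    base = max(0.0, 1.0 - a / 100.0)
    share = 0.8 if a < p else (0.5 if a == p else 0.2)
    return base * share

def best_response(S_j, prices, c=3.0, h=0.5, delta=0.99, tol=1e-8, max_iter=10000):
    """Approximate V^(k) and the arg max of (17) by value iteration."""
    prices = list(prices)
    index = {p: n for n, p in enumerate(prices)}
    n = len(prices)
    # nxt[a] is the index of the competitor's reaction S_j(a).
    nxt = np.array([index[S_j(a)] for a in prices])
    # R[p, a] is the expected one-period profit of price a given rival price p.
    R = np.array([[(a - c) * (h * sales_rate(a, p) + (1 - h) * sales_rate(a, S_j(a)))
                   for a in prices] for p in prices])
    V = np.zeros(n)
    for _ in range(max_iter):
        V_new = (R + delta * V[nxt][None, :]).max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + delta * V[nxt][None, :]
    # Largest maximizer: scan the columns from the highest price downwards.
    is_max = Q[:, ::-1] >= Q.max(axis=1, keepdims=True) - 1e-12
    best = n - 1 - np.argmax(is_max, axis=1)
    return V, np.array(prices)[best]

def S_U(p, eps=1, c=3):
    """Aggressive undercutting strategy S_U(p) = max(p - eps, c)."""
    return max(p - eps, c)

V1, S1 = best_response(S_U, range(1, 101))
```

With \(\delta = 0.99\) the error shrinks by a factor of 0.99 per sweep, so a few thousand sweeps are needed for the tolerance used here.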

Example 4

We assume the duopoly setting of Example 1. If not chosen differently, we let \(c=3\), \(h=0.5\), \(a \in A: = \left\{ {1,2,...,100} \right\} \), \(\delta =0.99\). We consider an initial strategy \({S^{(0)}}(p): = {S_U}(p): = \max (p - \varepsilon ,c)\), \(\varepsilon =1\). Additionally, by \({S^{(k)}}(p) = {S^{(k)}}(p;{S^{(k - 1)}})\) we denote the optimal response strategy to strategy \({S^{(k - 1)}}\), \(k = 1,2,...\), cf. (17).

Considering Example 4, we evaluate the expected profits of the different strategy combinations according to (16). The results are summarized in Table 2. We observe that the aggressive strategy \(S_{U}\) yields good results except when the competitor also plays \(S_{U}\). The strategy \(S^{(1)}\) yields good results in all constellations. Strategy \(S^{(2)}\) is excellent when played against \(S^{(1)}\) but yields only moderate results in the other cases.

Table 2. Expected profits \(V_0^{(1)}(50)\) of firm 1 when its strategy \({S^{(k)}}\) is played against a strategy \({S^{(j)}}\), \(k,j = 0,1,2,...,5\), \({S^{(0)}}:= {S_U}\); Example 4.

Our example shows that optimal response strategies have a significant impact on expected profits. They help to gain profits, especially, when aggressive competitors are involved. On the other hand, we learn that it is also important to know a competitor’s strategies. In practical applications, a competitor’s price reactions can be inferred from market data over time.

6 Equilibrium Strategies

In this section, we want to identify mutual best response strategies. We consider the duopoly setting of Sect. 5. In order to identify equilibrium strategies, we further iterate mutual strategy responses.

We consider the setting of Example 4. Starting with the aggressive strategy \(S_U\) we allow the two competing firms to repeatedly adjust their strategies using optimal response strategies. Figure 10 illustrates the different iterated response strategies \({S^{(k)}}\) for \(k = 0,1,2,...,20\).

Fig. 10.
figure 10

Iterated response strategies (Example 4; \({S^{(0)}}:= {S_U}\), \(h=0.5\)).

We observe that optimal response strategies do not converge to mutual optimal pure strategies. Instead, we obtain a repeating cyclic sequence of strategy adjustments. The structure of the single response strategies is similar to those shown in Figs. 3 and 4.
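Whether iterated responses settle at a fixed point or enter a repeating cycle can be checked by recording every strategy that has already occurred. The following sketch is generic: `respond` stands for any map from a strategy to its optimal response (for instance a solver of (17)), strategies are represented as tuples of prices, and the two toy maps at the end are purely hypothetical stand-ins that only exercise the routine.

```python
def iterate_responses(respond, s0, max_iter=100):
    """Iterate s -> respond(s); return the visited strategies and the period.

    A period of 1 means a fixed point (mutual best response), a period
    greater than 1 means a response cycle, and None means undecided.
    """
    seen = {s0: 0}        # strategy -> iteration at which it first occurred
    history = [s0]
    s = s0
    for k in range(1, max_iter + 1):
        s = respond(s)
        if s in seen:
            return history, k - seen[s]
        seen[s] = k
        history.append(s)
    return history, None

# Toy response map that converges: every price moves up by one, capped at 5.
converging = lambda s: tuple(min(x + 1, 5) for x in s)
history_fix, period_fix = iterate_responses(converging, (3, 4))

# Toy response map that cycles: the two entries swap in every step.
swapping = lambda s: tuple(reversed(s))
history_cyc, period_cyc = iterate_responses(swapping, (1, 2))
```

Storing strategies as hashable tuples keeps the lookup cheap even for long iteration sequences.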

However, pure mutual optimal response strategies do exist. We consider Example 4 for a different starting strategy. Figure 11 illustrates iterated response strategies \({S^{(k)}}\), \(k = 0,1,2,...,20\), for \({S^{(0)}}: = {S^{(0)}}(p) \equiv 20\).

We observe that after 11 iterations the optimal response strategies converge to a pure equilibrium strategy \({S^*}\) which is such that no firm has an incentive to deviate. The equilibrium strategy has a characteristic structure which can be described as follows.

Remark 2

If the competitor’s price is either below a certain low price \({p_{min}}\) or above a certain large price \({p_{max}}\), it is optimal to set the price to the upper level \({p_{max}}\). If the competitor’s price is slightly under that upper price level \({p_{max}}\) (upper intermediate range), it is best to undercut that price by one price unit \(\varepsilon \) as long as the competitor’s price is above a certain medium price \({p_{med}}\). If the competitor’s price is below the medium price \({p_{med}}\) and above \({p_{min}}\) (lower intermediate range), it is optimal to decrease the price to \({p_{min}}\).

Fig. 11.
figure 11

Iterating equilibrium strategies (Example 4; \({S^{(0)}}:= 20\), \(h=0.5\)).

The equilibrium strategy is similar to the type of strategy derived in Sect. 3, see Figs. 3 and 4. The difference is the counterintuitive massive price drop (lower intermediate range) to the minimal price \({p_{min}}\).

This phenomenon can be explained as follows. The price drop forces the rational competitor to give in and to raise the price immediately. This way the price range in which the undercutting price battle takes place is shifted to a higher level, which in turn is advantageous for both competitors.
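The structure of Remark 2 and the resulting price path can be illustrated with a short simulation in which both firms alternately apply the same threshold strategy. The threshold values `p_min=10`, `p_med=20`, `p_max=30` are hypothetical and not taken from Fig. 11. The treatment of the boundary cases (a competitor price exactly equal to a threshold) is also an assumption of this sketch.

```python
def equilibrium_response(p, p_min=10, p_med=20, p_max=30, eps=1):
    """Threshold strategy with the structure of Remark 2 (hypothetical values)."""
    if p <= p_min or p > p_max:
        return p_max      # restore the upper price level
    if p > p_med:
        return p - eps    # upper intermediate range: undercut by eps
    return p_min          # lower intermediate range: drop to p_min

def simulate_path(start, steps, strategy=equilibrium_response):
    """Alternating reactions: each firm responds to the other's latest price."""
    path, price = [], start
    for _ in range(steps):
        price = strategy(price)
        path.append(price)
    return path

path = simulate_path(start=50, steps=14)
```

In this sketch the firm that observes the price \({p_{min}}\) immediately returns to \({p_{max}}\), so the undercutting phase restarts at the upper level rather than continuing downwards.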

Table 3. Expected profits \(V_0^{(1)}(50)\) of firm 1 when its strategy \({S^{(k)}}\) is played against a strategy \({S^{(j)}}\), \(k,j = 0,1,2,...,5\), \({S^{(0)}}:= {20}\); Example 4.

Table 3 illustrates the expected profits of a firm when different iterated response strategies are played against each other, cf. Table 2, for \({S^{(0)}}:= 20\), i.e., the equilibrium case. We observe that profits quickly converge at a moderate level (16.43) compared to those in Table 2.

We varied different parameters of our model, such as the price granularity, the discount factor, and the initial strategy \({S^{(0)}}\). We found that mainly the initial strategy \({S^{(0)}}\) is responsible for pure equilibrium strategies to exist. In the context of Example 4 we obtain the same equilibrium, see Fig. 11, as long as \({S^{(0)}} \ge 18\). For \({S^{(0)}} < 18\) we obtain response cycles similar to Fig. 10.

Remark 3

If the starting strategy is aggressive, i.e., characterized by low prices, we do not obtain a pure strategy equilibrium. If the starting strategy is not aggressive, we usually obtain a pure strategy equilibrium. Furthermore, if a pure equilibrium strategy exists, it is of the structure described above, cf. Remark 2.

At the end of this section, we study how equilibrium strategies are affected by the discount factor. We consider the setting of Example 4. Figure 12 illustrates pure equilibrium strategies for five different discount factors between 0 and 0.99.

Fig. 12.
figure 12

Equilibrium strategies for different discount factors, \(\delta = 0, 0.4, 0.7, 0.85, 0.99\), \(h=0.5\); Example 4.

We observe that for all \(\delta \) the mutual optimal response strategy \(S^{*}\) is of the structure described above, which is characterized by (\({p_{min}}\), \({p_{med}}\), \({p_{max}}\)). While \({p_{min}}\) is not affected by \(\delta \), the thresholds \({p_{med}}\) and \({p_{max}}\) increase in \(\delta \). The range of the resulting staircase-like price trajectories is hardly affected by \(\delta \), but the level at which the price battle takes place is higher if \(\delta \) increases.

7 Conclusion

The recent rise of e-commerce and the development of web technologies made it increasingly easy for merchants to observe market situations and automatically adjust their prices. Subsequently, more and more companies employ dynamic pricing strategies. In this paper, we analyze stochastic dynamic infinite horizon duopoly models characterized by active competitors. We set up a dynamic pricing model including discounting and shipping costs. The sales probabilities are allowed to arbitrarily depend on time, our price as well as the competitor’s prices. Data-driven estimations of sales intensities under pricing competition can be used to calibrate the model.

Assuming that a competitor’s response strategy is known, we show how to compute optimal reaction strategies that take advantage of price anticipations. As expected, it is often optimal to slightly undercut the competitor’s price. However, when the price falls below a certain lower bound it is advisable to raise the price to a certain upper bound. Our optimized strategies optimally choose these critical price bounds. Optimized feedback strategies effectively avoid a decline in price. Especially, when competitors play aggressive strategies it is important to react in a reasonable way in order not to lose potential profits.

Furthermore, we analyze reaction times, i.e., price adjustment frequencies. We find that they have a huge impact on expected profits. Being able to adjust prices more often than the competitor is a competitive advantage. Hence, the ratio of the competitors’ price adjustment frequencies is crucial for the firm’s expected profits. Moreover, it can be profitable to strategically time price adjustments. To avoid predictable reaction times, firms should randomize their price adjustments. We show how to derive optimal response strategies when reaction times are randomized.

We also derive optimal response strategies if additional players are involved that employ fixed price strategies. We analyzed how the presence of such additional passive competitors affects the price battle of two active players that frequently adjust their prices. Our technique to compute prices is simple and easy to implement.

Finally, we evaluated expected profits of competing pairs of strategies when both players apply optimized price reactions. In order to identify equilibrium strategies, we analyzed iterated strategy adjustments. Mutual strategy responses do not necessarily converge, as pure strategy equilibria might not exist. However, pure equilibrium strategies can be identified by iterating mutual strategy responses. We found that as long as strategies are not too aggressive, optimal strategy adjustments lead to equilibrium strategies. These strategies have a characteristic structure: in a certain price range it is optimal to undercut the competitor’s price; otherwise it is optimal to either raise the price or force the competitor to restore the price level by significantly dropping the price.

In future research, we will use market data to estimate competitors’ response strategies. We will also extend the model to study the sale of perishable products with inventory restrictions.