
1 Introduction

In real-life applications, firms have to deal with competition and limited information. Sellers have to make appropriate pricing decisions to maximize their expected profits. In e-commerce, prices have become easy to observe and to change. Hence, dynamic pricing strategies that take the competitors' strategies into account will be applied increasingly often.

However, optimal price reactions are not easy to find. Applications can be found in a variety of domains that involve perishable (e.g., airline tickets, accommodation services, seasonal products) as well as durable goods (e.g., technical devices, natural resources).

In this paper, we study duopoly pricing models in a stochastic dynamic framework. We focus on perishable goods. In our model, sales probabilities are allowed to be an arbitrary function of time and the competitor’s prices. Our aim is to take into account scenarios in which (i) the competitor’s inventory level is observable, (ii) the competitor’s inventory level is not observable, and (iii) even the competitor’s pricing strategy is unknown.

1.1 Literature Review

Selling products optimally is a classical application of revenue management theory. The problem is closely related to the field of dynamic pricing, which is summarized in the books by Talluri, van Ryzin [1], Phillips [2], and Yeoman, McMahon-Beattie [3]. The survey by Chen, Chen [4] provides an excellent overview of recent pricing models under competition.

Gallego, Wang [5] consider a continuous-time multi-product oligopoly for differentiated perishable goods. They use optimality conditions to reduce the multi-dimensional dynamic pure pricing problem to a one-dimensional one. Gallego, Hu [6] analyze structural properties of equilibrium strategies in more general oligopoly models for the sale of perishable products. Martinez-de-Albeniz, Talluri [7] consider duopoly and oligopoly pricing models for identical products. They use a general stochastic counting process to model the demand of customers.

Further related models are studied by Yang, Xia [8] and Wu, Wu [9]. Dynamic pricing models under competition that also include strategic customers are analyzed by Levin et al. [10] and Liu, Zhang [11]. Competitive pricing models with limited demand information are studied by Tsai, Hung [12], Adida, Perakis [13], and Chung et al. [14] using robust optimization and demand learning approaches. The effects of strategic interaction of data-driven policies in competitive settings are studied by, e.g., Kephart et al. [15] or Serth et al. [16], using interactive simulation platforms.

In most existing models, strong assumptions are made: (i) sales probabilities are assumed to be of a highly stylized form, (ii) the competitors' inventory levels are assumed to be observable, and (iii) competitors adjust their prices at the same point in time. While many papers concentrate on (the existence of) equilibrium strategies, we look for applicable solution algorithms that compute effective response strategies in more realistic settings: demand probabilities may depend generally on time as well as on the prices of all market participants, inventory levels need not be mutually observable, and, as in practical applications, mutual price reactions are sequential and occur with some delay. We consider a discrete-time model based on the infinite horizon model described in [17], which we extend by additional inventory considerations and a finite horizon.

1.2 Contribution

This paper is an extended version of [18]. The main contribution of [18] is threefold: we (i) derive optimal pricing strategies when the competitor's inventory level is observable, (ii) derive near-optimal pricing strategies for the case that the competitor's inventory level is not observable, and (iii) present a heuristic for the case that the competitor's strategy is not known.

Compared to [18], in this paper, we present extended evaluation studies and make the following contributions. First, to determine the value of information, we let our three types of strategies play against each other in different duopoly setups. We show that sales results are quite similar across the different symmetric setups. Our evaluations of asymmetric setups show that additional information leads to significantly higher profits than those of the competitor. We also observe that strategies that use more information tend to have higher standard deviations of profits and a lower load factor. Second, we study to what extent the performance of various competitive setups is affected by the consumers' price sensitivity. We find that a higher price sensitivity (e.g., when customers are less loyal) does not lead to a significant decrease in expected profits. Third, we study the impact of price response times on our strategies' performance under various competitive setups. We observe that higher price reaction frequencies can even overcompensate for a lack of information.

The remainder of this paper is organized as follows. In Sect. 2, we describe the stochastic dynamic duopoly model for the sale of a finite number of perishable goods. We allow sales intensities to depend on the competitor's price as well as on time (cf. seasonal effects). The state space of our model is characterized by time, the inventory levels, and the competitor's current price. The stochastic dynamic control problem is expressed in discrete time.

In Sect. 3, we consider a duopoly competition, in which the inventory level of the competitor is observable. We assume that both competitors act rationally. We set up a firm’s Hamilton-Jacobi-Bellman equation and use recursive methods (value iteration) to compute both firms’ value functions. Finally, we are able to compute optimal feedback prices as well as expected profits of the two competing firms. By using numerical examples, we investigate typical properties of optimal pricing policies.

In Sect. 4, we analyze response strategies for cases where the inventory level of the competitor is not observable. By using a Hidden Markov Model, we show how to compute efficient pricing strategies and how to evaluate expected profits. Our proposed solution approach is based on the results of the full information model introduced in the previous section. The key idea is to let the competing firms mutually estimate their competitor’s remaining inventory level. In Sect. 5, we show how to derive applicable dynamic pricing heuristics for cases in which the competitor’s inventory level as well as its pricing strategy are unknown.

In Sect. 6, we compare the different strategies derived in this paper using various numerical experiments. We consider symmetric as well as asymmetric combinations of strategies that use different information structures. Conclusions and future work are given in the final section.

2 Model Description

We consider a situation in which a firm wants to sell a finite number of perishable goods (e.g., airline tickets, hotel rooms) on a digital market platform. We assume that a second seller competes for the same market. In our model, customers can compare the prices of the two competitors.

The initial numbers of items of firm 1 and firm 2 are denoted by \({N^{(1)}}\) and \({N^{(2)}}\), respectively, \({N^{(1)}},{N^{(2)}} < \infty \). We assume that items cannot be reproduced or reordered. The time horizon T is finite, \(T < \infty \). If firm k sells one item, the shipping costs \({c^{(k)}}\) have to be paid, \(k = 1,2\). A sale of one of firm k’s items at price a leads to a net revenue of \(a - {c^{(k)}}\). Discounting is also included in the model. For the length of one period we use the discount factor \(\delta \), \(0 < \delta \le 1\).

Since customers compare offers, the sales probabilities of a firm depend on its offer price a and on the competitor's price p. We also allow the sales probabilities to depend on time.

The (joint) probability that between time t and \(t + \varDelta \) firm 1 can sell exactly i items at a price a, \(a \ge 0\), while firm 2 can sell j items at price p, \(p \ge 0\), is denoted by, \(0 \le t < T\), \(\varDelta > 0\), \(i,j = 0,1,2,...\),

$$\begin{aligned} P_t^{(\varDelta )}(i,j,a,p). \end{aligned}$$

For ease of exposition, in the following, we assume Poisson distributed sales probabilities, i.e.,

$$\begin{aligned} \begin{gathered} P_t^{(\varDelta )}(i,j,a,p): = \frac{{\varLambda _{t,\varDelta }^{(1)}{{(a,p)}^i}}}{{i!}} \cdot {e^{ - \varLambda _{t,\varDelta }^{(1)}(a,p)}}\\ \cdot \frac{{\varLambda _{t,\varDelta }^{(2)}{{(p,a)}^j}}}{{j!}} \cdot {e^{ - \varLambda _{t,\varDelta }^{(2)}(p,a)}}, \end{gathered} \end{aligned}$$
(1)

where \(\varLambda _{t,\varDelta }^{(k)}(a,p): = \int _t^{t + \varDelta } \lambda _s^{(k)}(a,p)\,ds\), \(k = 1,2\), \(a,p \ge 0\), and \({\lambda ^{(k)}}\) denotes the sales intensity of firm k's product. In our model, the sales intensity of firm k, \(k=1,2\), \(t \in \left[ {0,T} \right] \), \(a \ge 0\), \(p \ge 0\),

$$\begin{aligned} \lambda _t^{(k)}(a,p) \end{aligned}$$
(2)

is a general function of time t, offer price a, and the competitor's price p. The random inventory level of firm k at time t is denoted by \(X_t^{(k)}\), \(0 \le t \le T\). The end of sale for firm k is the random time \({\tau ^{(k)}}\) at which all of its items are sold, that is, \({\tau ^{(k)}}: = \min \{ t \in [0,T]:X_t^{(k)} = 0\} \wedge T\); for all remaining \(t \ge {\tau ^{(k)}}\), we let the firm's price be \({a_t}: = 0\) and \(\lambda _t^{(k)}(0, \cdot ):= 0\), \(k=1,2\). As long as a firm has items left to sell, a price a has to be chosen in each period t.
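As a minimal sketch (not the authors' implementation), the joint probability (1) can be evaluated for arbitrary user-supplied intensity functions: \(\varLambda\) is obtained by numerically integrating \(\lambda\) over \([t, t+\varDelta]\), and the two Poisson factors are multiplied. The trapezoidal rule and the names `cumulative_intensity`, `poisson_pmf`, and `joint_pmf` are illustrative choices of ours.

```python
import math
from typing import Callable

# lam(s, a, p): sales intensity at time s for own price a and rival price p
Intensity = Callable[[float, float, float], float]


def cumulative_intensity(lam: Intensity, t: float, dt: float,
                         a: float, p: float, steps: int = 200) -> float:
    """Approximate Lambda_{t,dt}(a, p), the integral of lam over [t, t+dt] (trapezoidal rule)."""
    ds = dt / steps
    values = [lam(t + k * ds, a, p) for k in range(steps + 1)]
    return ds * (sum(values) - 0.5 * (values[0] + values[-1]))


def poisson_pmf(k: int, mu: float) -> float:
    return mu ** k / math.factorial(k) * math.exp(-mu)


def joint_pmf(i: int, j: int, a: float, p: float, t: float, dt: float,
              lam1: Intensity, lam2: Intensity) -> float:
    """P_t^(dt)(i, j, a, p) as in (1): independent Poisson sales counts of both firms."""
    mu1 = cumulative_intensity(lam1, t, dt, a, p)  # firm 1 offers a, rival offers p
    mu2 = cumulative_intensity(lam2, t, dt, p, a)  # firm 2 offers p, rival offers a
    return poisson_pmf(i, mu1) * poisson_pmf(j, mu2)
```

For instance, with the hypothetical price-independent intensity \(\lambda_s = 0.5 + 0.1 s\) on \([0,1]\), both means equal 0.55, so the probability of no sales at all is \(e^{-1.1}\).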

We call strategies \({({a_t})_t}\) admissible if they belong to the class of Markovian feedback policies; i.e., pricing decisions \({a_t} \ge 0\) may depend on time t, the current own inventory level, the current prices of the competitor, and (if observable) the inventory level of the competitor. By A we denote the set of admissible prices. A list of variables and parameters is given in the Appendix, see Table 7.

In some applications, sellers are able to anticipate transitions of the market situation. In particular, the price responses of competitors as well as their reaction time can be taken into account. In this case, a change of the competitor’s price p can take place within one period. A typical scenario is that a competitor adjusts its price in response to another competitor’s price adjustment with a certain delay.

In the following two sections, we assume that the pricing strategy and the reaction time of the competitor are known. We assume that choosing a price a at time t is followed by a state transition (e.g., a competitor's price reaction), i.e., the competitor's current price p is replaced by a new price, which may depend on the current price decision a.

We assume that the state of the system is characterized by the inventory levels of both firms and the current competitor’s price. In real-life applications, a firm is not able to adjust its prices immediately after the price reaction of the competing firm. Hence, we assume that in each period the price reaction of the competing firm (firm 2) takes place with a delay of h periods, \(0<h<1\). After an interval of size h the competitor adjusts its price, see Fig. 1. Firm 1 responds to firm 2 with a delay of \(1-h\).

In period t, the probability that firm 1 sells exactly i items and firm 2 sells exactly j items during the first interval of size h, i.e., \([t,t + h]\), is \(P_t^{(h)}(i,j,{a_t},{p_{t - 1 + h}})\), \(t = 0,1,...,T - 1\). Due to the competitor's price reaction, the sales probability for the rest of the period, \([t + h,t + 1]\), changes to \(P_{t + h}^{(1 - h)}(i,j,{a_t},{p_{t + h}})\), \(t = 0,1,...,T - 1\).

Fig. 1. Sequence of price reactions in a duopoly with reaction times h and \(1-h\), respectively, \(0<h<1\), cf. [18].

For the boundary intervals [0, h] and \([T,T+h]\), we assume that there is no demand, i.e., we let \(P_0^{(h)}(i,j,{a_0},{p_0}) = P_T^{(h)}(i,j,{a_T},{p_{T - 1 + h}}): = {1_{\{ i = j = 0\} }}\).

The evolution of the accumulated profit of firm k, \(k=1,2\), is connected to its inventory process \(X_t^{(k)}\) and characterized by each period's realized net revenues. Depending on the chosen pricing strategy \({({a_t})_t}\) of firm 1 and the strategy \({({p_t})_t}\) of firm 2, the random accumulated profit of firm k from time t on (discounted to time t) amounts to, \(0 \le t \le T\), \(k=1,2\),

$$\begin{aligned} G_t^{(k)}:= \sum \limits _{s = t}^{T - 1} {{\delta ^{s - t}} \cdot ({a_s} - {c^{(k)}}) \cdot \left( {X_s^{(k)} - X_{s + 1}^{(k)}} \right) }. \end{aligned}$$
(3)
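For a single simulated sample path, the accumulated profit (3) is simply a discounted sum of per-period net revenues. The sketch below assumes a hypothetical representation of the path as plain lists, where `prices[s]` holds \(a_s\) and `inventory[s]` holds \(X_s^{(k)}\).

```python
from typing import Sequence


def accumulated_profit(prices: Sequence[float], inventory: Sequence[int],
                       cost: float, delta: float, t: int = 0) -> float:
    """G_t from (3): sum over s = t..T-1 of delta^(s-t) * (a_s - c) * (X_s - X_{s+1}).

    prices[s]    -- price a_s charged in period s, s = 0..T-1
    inventory[s] -- inventory X_s at the start of period s, s = 0..T
    """
    T = len(prices)
    if len(inventory) != T + 1:
        raise ValueError("inventory must contain X_0, ..., X_T")
    return sum(delta ** (s - t) * (prices[s] - cost) * (inventory[s] - inventory[s + 1])
               for s in range(t, T))
```

For example, prices (50, 60, 0), inventory levels (3, 2, 0, 0), \(c = 10\), and \(\delta = 0.9\) give \(40 + 0.9 \cdot 50 \cdot 2 = 130\); the zero price after the run-out contributes nothing because no items are sold.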

Each firm k seeks to determine a non-anticipating (Markovian) pricing policy that maximizes its expected total profit, \(k=1,2\),

$$\begin{aligned} E\left( {G_0^{(k)}\left| {X_0^{(1)} = {N^{(1)}},X_0^{(2)} = {N^{(2)}}} \right. } \right) . \end{aligned}$$
(4)

In the following sections, we solve dynamic pricing problems that are related to (1)–(4). In the next section, we consider competitive duopoly markets with complete information. In Sect. 4, we compute pricing strategies for scenarios with incomplete information and partially observable states, i.e., we assume that the competitor’s inventory level is not observable. In Sect. 5, we additionally assume that the competitor’s strategy is unknown. In Sect. 6, we compare the results of the three different models using extensive numerical experiments.

3 Optimal Dynamic Pricing Strategies in a Duopoly with Observable States

In this section, we derive mutually optimal price response strategies. We assume that both firms can mutually observe their inventory levels.

3.1 Solution with Full Knowledge

Following the Bellman approach, the best expected future profits of firm 1 and firm 2, i.e., \(E(G_t^{(1)}|X_t^{(1)} = n,~X_t^{(2)} = m,~{p_t} = p)\) and \(E(G_{t + h}^{(2)}|X_{t + h}^{(1)} = n,~X_{t + h}^{(2)} = m,~{a_{t + h}} = a)\), respectively, cf. (4), are described by the value functions \(V_t^*(n,m,p)\) and \(W_{t+h}^*(n,m,a)\), \(t=0,1,...,T\). The set of admissible prices A can be continuous or discrete. If either all items are sold or the time is up, no future profits can be made, i.e., the natural boundary conditions for the value functions V and W are given by, \(n = 0,1,...,{N^{(1)}}\), \(m = 0,1,...,{N^{(2)}}\), \(a,p \in A\), \(t = 0,1,...,T - 1\),

$$\begin{aligned} V_t^*(0,m,p) = 0, \quad \text{and} \quad V_T^*(n,m,p) = 0, \end{aligned}$$
(5)
$$\begin{aligned} W_{t+h}^*(n,0,a) = 0, \quad \text{and} \quad W_{T+h}^*(n,m,a) = 0. \end{aligned}$$
(6)

We assume that in case of a run-out a firm sets its price equal to zero for the rest of the time horizon. The Hamilton-Jacobi-Bellman (HJB) equation of firm 1 can be written as, \(t = 0,1,...,T - 1\), \(n = 1,...,{N^{(1)}}\), \(m = 0,...,{N^{(2)}}\), \(0< h < 1\), \(a,p \in A\),

$$\begin{aligned} \begin{gathered} V_t^*(n,m,p) = \mathop {\max }\limits _{a \in A} \left\{ {\sum \limits _{{i_1},{j_1} \ge 0} {P_t^{(h)}({i_1},{j_1},a,p)} } \right. \\ \cdot \sum \limits _{{i_2},{j_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},{j_2}{{,1}_{\{ n - {i_1}> 0\} }} \cdot a,} \right. } \\ \left. {p_{t + h}^*\left( {{{(n - {i_1})}^ + },{{(m - {j_1})}^ + }{{,1}_{\{ n - {i_1}> 0\} }} \cdot a} \right) } \right) \\ \cdot \left( {(a - {c^{(1)}}) \cdot \min (n,{i_1} + {i_2}) } \right. \\ + \,\delta \cdot V_{t + 1}^*\left( {{{(n - {i_1} - {i_2})}^ + },{{(m - {j_1} - {j_2})}^ + }, {1_{\{ m - {j_1} - {j_2}> 0\} }}} \right. \\ \left. {\left. {\left. { \cdot \,p_{t + h}^*\left( {{{(n - {i_1})}^ + },{{(m - {j_1})}^ + }{{,1}_{\{ n - {i_1} > 0\} }} \cdot a} \right) } \right) } \right) } \right\} . \end{gathered} \end{aligned}$$
(7)

Note that (7) covers all possible sales scenarios within one period and takes the corresponding inventory transitions as well as the anticipated optimal price reactions of the competitor into account.

The HJB equation of firm 2 is given by, \(t = 0,1,...,T - 1\), \(n = 0,...,{N^{(1)}}\), \(m = 1,...,{N^{(2)}}\), \(0< h < 1\), \(a,p \in A\),

$$\begin{aligned} \begin{gathered} W_{t + h}^*(n,m,a) = \mathop {\max }\limits _{p \in A} \left\{ {\sum \limits _{{i_1},{j_1} \ge 0} {P_{t + h}^{(1 - h)}({i_1},{j_1},a,p)} } \right. \\ \cdot \sum \limits _{{i_2},{j_2} \ge 0} {P_{t + 1}^{(h)}\left( {{i_2},{j_2},} \right. } \\ \left. {a_{t + 1}^*\left( {{{(n - {i_1})}^ + },{{(m - {j_1})}^ + }{{,1}_{\{ m - {j_1}> 0\} }} \cdot p} \right) {{,1}_{\{ m - {j_1}> 0\} }} \cdot p} \right) \\ \cdot \,\left( {(p - {c^{(2)}}) \cdot \min (m,{j_1} + {j_2}) } \right. \\ +\, \delta \cdot W_{t + 1 + h}^*\left( {{{(n - {i_1} - {i_2})}^ + },{{(m - {j_1} - {j_2})}^ + }, } \right. \\ \left. {\left. {\left. { {1_{\{ n - {i_1} - {i_2}> 0\} }} \cdot a_{t + 1}^*\left( {{{(n - {i_1})}^ + },{{(m - {j_1})}^ + }{{,1}_{\{ m - {j_1} > 0\} }} \cdot p} \right) } \right) } \right) } \right\} . \end{gathered} \end{aligned}$$
(8)

The associated prices of both firms are given by the arg max of (7) and (8), respectively, i.e., \(n,m > 0\), \(t = 0,1,...,T - 1\),

$$\begin{aligned} a_t^*(n,m,p) = \mathop {\arg \max }\limits _{a \in A} \left\{ {...} \right\} , \end{aligned}$$
(9)
$$\begin{aligned} p_{t + h}^*(n,m,a) = \mathop {\arg \max }\limits _{p \in A} \left\{ {...} \right\} . \end{aligned}$$
(10)

If a firm runs out of inventory, we set the price 0, i.e., for all m, p we let \(a_t^*(0,m,p) = 0\) and for all n, a, we let \(p_{t + h}^*(n,0,a) = 0\). The coupled value functions and the optimal feedback policies of the two competing firms can be computed in the following recursive order, cf. (5)–(6):

$$\begin{aligned} \begin{gathered} p_{T - 1 + h}^*(n,m,a),~W_{T - 1 + h}^*(n,m,a) \rightarrow \\ a_{T - 1}^*(n,m,p),~V_{T - 1}^*(n,m,p) \rightarrow \ldots \\ \ldots \rightarrow p_h^*(n,m,a),~W_h^*(n,m,a)\\ \rightarrow a_0^*(n,m,p),~V_0^*(n,m,p). \end{gathered} \end{aligned}$$
(11)
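The recursive order (11) can be sketched in Python on a deliberately tiny instance. This is an illustrative sketch under several assumptions, not the authors' code: the horizon, inventories, and price grid are hypothetical toy values; the intensity follows the form of Example 3.1 with the factor h replaced by the interval length; a competitor price of 0 (sold out) is assumed to leave the full demand to the remaining firm, i.e., \(\beta(a,0)=1\); and Poisson sales beyond the remaining stock are lumped into one tail probability, which is exact here because such sales lead to the same state and revenue. The dictionary `W[t]` stores \(W_{t+h}^*\).

```python
import math

# HYPOTHETICAL toy instance (far smaller than Example 3.1) so the sketch runs instantly.
T, h, delta = 4, 0.5, 1.0
N1, N2, c1, c2, L = 2, 2, 10.0, 10.0, 0.8
A = [20.0, 60.0, 100.0, 200.0]
PRICES = [0.0] + A  # price 0 encodes "sold out"


def beta(a, p):
    if a <= 0:
        return 0.0
    if p <= 0:  # ASSUMPTION: a sold-out rival leaves the full demand to this firm
        return 1.0
    mn = min(a, p)
    return (p - L * mn) / (a + p - 2 * L * mn)


def Lam(t, dt, a, p):
    """Expected sales on [t, t+dt] (form of Example 3.1); no demand on [0, h] and [T, T+h]."""
    if a <= 0 or t <= 0 or t >= T:
        return 0.0
    return dt * (1 - math.exp(-1e5 * a ** (-2.5 + t / T))) * beta(a, p)


def lumped(mu, n):
    """Poisson(mu) probabilities of 0..n-1 sales plus P(sales >= n) at index n."""
    pmf = [math.exp(-mu) * mu ** k / math.factorial(k) for k in range(n)]
    return pmf + [max(0.0, 1.0 - sum(pmf))]


STATES = [(n, m, q) for n in range(N1 + 1) for m in range(N2 + 1) for q in PRICES]
V = {T: {s: 0.0 for s in STATES}}       # V[t][(n, m, p)] = V_t^*, boundary (5)
W = {T: {s: 0.0 for s in STATES}}       # W[t][(n, m, a)] = W_{t+h}^*, boundary (6)
a_star = {T: {s: 0.0 for s in STATES}}  # a_T^* is irrelevant (no demand on [T, T+h])
p_star = {}


def step_W(t):
    """Firm 2 sets p at t+h; firm 1 reacts at t+1 with a_{t+1}^*."""
    W[t], p_star[t] = {}, {}
    for (n, m, a) in STATES:
        if m == 0:
            W[t][(n, m, a)], p_star[t][(n, m, a)] = 0.0, 0.0
            continue
        best = (-math.inf, 0.0)
        for p in A:
            val = 0.0
            for i1, q1 in enumerate(lumped(Lam(t + h, 1 - h, a, p), n)):
                for j1, r1 in enumerate(lumped(Lam(t + h, 1 - h, p, a), m)):
                    n1, m1 = n - i1, m - j1
                    pe = p if m1 > 0 else 0.0
                    an = a_star[t + 1][(n1, m1, pe)]
                    for i2, q2 in enumerate(lumped(Lam(t + 1, h, an, pe), n1)):
                        for j2, r2 in enumerate(lumped(Lam(t + 1, h, pe, an), m1)):
                            n2, m2 = n1 - i2, m1 - j2
                            nxt = W[t + 1][(n2, m2, an if n2 > 0 else 0.0)]
                            val += q1 * r1 * q2 * r2 * ((p - c2) * (j1 + j2) + delta * nxt)
            best = max(best, (val, p))
        W[t][(n, m, a)], p_star[t][(n, m, a)] = best


def step_V(t):
    """Firm 1 sets a at t; firm 2 reacts at t+h with p_{t+h}^*, cf. (7)."""
    V[t], a_star[t] = {}, {}
    for (n, m, p) in STATES:
        if n == 0:
            V[t][(n, m, p)], a_star[t][(n, m, p)] = 0.0, 0.0
            continue
        best = (-math.inf, 0.0)
        for a in A:
            val = 0.0
            for i1, q1 in enumerate(lumped(Lam(t, h, a, p), n)):
                for j1, r1 in enumerate(lumped(Lam(t, h, p, a), m)):
                    n1, m1 = n - i1, m - j1
                    ae = a if n1 > 0 else 0.0
                    pn = p_star[t][(n1, m1, ae)]
                    for i2, q2 in enumerate(lumped(Lam(t + h, 1 - h, ae, pn), n1)):
                        for j2, r2 in enumerate(lumped(Lam(t + h, 1 - h, pn, ae), m1)):
                            n2, m2 = n1 - i2, m1 - j2
                            nxt = V[t + 1][(n2, m2, pn if m2 > 0 else 0.0)]
                            val += q1 * r1 * q2 * r2 * ((a - c1) * (i1 + i2) + delta * nxt)
            best = max(best, (val, a))
        V[t][(n, m, p)], a_star[t][(n, m, p)] = best


for t in range(T - 1, -1, -1):  # order (11): W_{t+h}, p*_{t+h}, then V_t, a*_t
    step_W(t)
    step_V(t)
```

The four nested loops mirror the double sums in the Bellman equations; the runtime grows quickly with inventories and the price grid, which is one reason why the parallelization discussed in Remark 4.1 becomes relevant for realistic instances.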

3.2 Numerical Examples

To illustrate the approach, cf. (5)–(11), we consider a numerical example.

Example 3.1

We assume a duopoly. Let \(T=50\), \({c^{(1)}} = {c^{(2)}} = 10\), \({N^{(1)}} = {N^{(2)}} = 10\), \(\delta = 1\), \(h = 0.5\), and \(A: = \{10,20,...,400\}\). We assume Poisson distributed sales probabilities \(P_t^{(h)}(i,j,a,p)\), which are determined by, \(t = 0,h,1,...,T\), \(k=1,2\), \(a,p \in A\), cf. (1),

$$\begin{aligned} \varLambda _{t,h}^{(k)}(a,p):= h \cdot \left( {1 - {e^{ - {{10}^5} \cdot {a^{ - 2.5 + t/T}}}}} \right) \cdot \beta (a,p), \end{aligned}$$

and the competition factor \(\beta (a,p)\), \(a,p \in A\),

$$\begin{aligned} \beta (a,p):= {1_{\{ a > 0\} }} \cdot \frac{{ {p - L \cdot \min (a,p)} }}{{a + p - 2 \cdot L \cdot \min (a,p)}} \end{aligned}$$

which is characterized by the competition parameter L, \(-\infty< L < 1\). Note that the price sensitivity of customers is increasing in L. For the time being, we let \(L:=0.8\).
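The intensity and competition factor of Example 3.1 translate directly into code. In this sketch the function names are our own, and scaling by a general interval length `dt` (rather than only h) is an assumption for the \((1-h)\) sub-intervals. Since \(\beta(a,p) + \beta(p,a) = 1\) for \(a,p > 0\), the factor splits demand between the two firms, and equal prices give each firm one half.

```python
import math


def beta(a: float, p: float, L: float = 0.8) -> float:
    """Competition factor of Example 3.1 for own price a and rival price p (a, p > 0)."""
    if a <= 0:
        return 0.0
    m = min(a, p)
    return (p - L * m) / (a + p - 2 * L * m)


def Lambda(t: float, dt: float, a: float, p: float,
           T: float = 50.0, L: float = 0.8) -> float:
    """Expected sales on [t, t+dt]; using dt in place of h is an assumption for general intervals."""
    if a <= 0:
        return 0.0
    return dt * (1.0 - math.exp(-1e5 * a ** (-2.5 + t / T))) * beta(a, p, L)
```

For example, \(\beta(50, 120) = 80/90 \approx 0.889\) with \(L = 0.8\) but \(0.9375\) with \(L = 0.9\): the cheaper firm captures a larger share as L increases, in line with the statement that price sensitivity is increasing in L.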

Table 1 illustrates the expected profits of firm 1 for different inventory levels n and different points in time t (for the case that firm 2's price is \(p=100\) and its inventory level is \(m=10\)). We observe that the expected future profits are decreasing in time and increasing-decreasing in the number of items left to sell. The optimal expected profits of the second firm have the same characteristics. Compared to firm 1, the total expected profits of firm 2 are slightly larger (\(W_h^*(10,10,a_0^*(10,10,0)) = 1769\)).

Table 1. Expected profits \(V_t^*(n,10,100)\), Example 3.1, cf. [18].

Table 2 illustrates the feedback prices of firm 1 for different inventory levels m and prices p of the competitor (for the case that \(t = 20\) and firm 1's inventory level is \(n=10\)). We observe that optimal response prices are decreasing-increasing in the competitor's price and decreasing in the competitor's inventory level. In general, there is thus an incentive to (slightly) undercut the competitor's price.

However, if the competitor has a low price and a small inventory level, it is more advantageous to set high prices such that the competitor is likely to sell all of its remaining items; in turn, our firm becomes a monopolist for the rest of the time horizon. If the competitor's inventory level is small, the optimal price can even exceed the monopoly price (cf. \(a_{20}^*(10,0,0)=260\) in Table 2).

Table 2. Optimal feedback prices \(a_{20}^*(10,m,p)\), Example 3.1, cf. [18].

Remark 3.1

(i) The expected profits are increasing-decreasing in their own inventory level.
(ii) The expected profits are decreasing in the competitor's inventory level.
(iii) If there is no discounting, then the expected profits are increasing in the time-to-go.
(iv) The expected profits are increasing-decreasing in the current competitor's price.

Remark 3.2

(i) The optimal prices are not necessarily decreasing in their own inventory level.
(ii) The optimal prices are decreasing in the competitor's inventory level.
(iii) If demand is not increasing in time, then the optimal prices are decreasing in time.
(iv) The optimal prices are decreasing-increasing in the current competitor's price.

Fig. 2. Simulated price paths and associated inventory levels over time; Example 3.1, cf. [18].

Figure 2 illustrates simulated sales processes in the context of Example 3.1. Figure 2a illustrates the price trajectories of the two competing firms. Figure 2b shows the associated evolutions of the inventory levels. As demand is increasing in time, on average, both prices and the number of sales increase towards the end of the time horizon.

4 A Hidden Markov Model with Partially Observable States

In this section, we assume that the competitor’s inventory level cannot be observed. To derive feedback pricing strategies we use a Hidden Markov Model. We will use probability distributions for the competitor’s inventory level, which are based on the observable prices of both firms.

4.1 Theoretical Solution

Let \({\pi _t}(m)\) denote the (estimated) probability that firm 2 has exactly m items left at time t; let \({\varpi _t}(n)\) denote the probability that firm 1 has exactly n items left at time t. We assume that the initial inventory levels of both competitors are common knowledge, i.e., the starting distributions are \({\pi _0}(m) = {\pi _h}(m) = {1_{\{ m = {N^{(2)}}\} }}\) and \({\varpi _0}(n) = {\varpi _h}(n) = {1_{\{ n = {N^{(1)}}\} }}\). Furthermore, a run-out is observable, since we assume that in case of a run-out a firm has to set its price equal to zero. The evolutions of the probabilities \({\pi _t}(m)\) and \({\varpi _t}(n)\) are given by, \(n = 0,...,{N^{(1)}}\), \(m = 0,...,{N^{(2)}}\), \({a_t},{p_t},{a_{t - 1 + h}},{p_{t - 1 + h}} \in A\), \(t = 0,1,...,T\),

$$\begin{aligned}\begin{gathered} {\pi _{t + h}}(m;{a_t},{p_t}) = \\ \sum \limits _{{i_1},{j_1} \ge 0,0 \le {m^ - } \le {N^{(2)}}:\atop m = {({m^ - } - {j_1})^ + }} {P_t^{(h)}\left( {{i_1},{j_1},{a_t},{p_t}} \right) } \cdot {\pi _t}({m^ - }) \end{gathered}\end{aligned}$$
$$\begin{aligned} \begin{gathered} {\pi _t}(m;{a_{t - 1 + h}},{p_{t - 1 + h}}) = \\ \sum \limits _{{i_2},{j_2} \ge 0,\atop {0 \le {m^ - } \le {N^{(2)}}:\atop m = {({m^ - } - {j_2})^ + }}} {P_{t - 1 + h}^{(1 - h)}\left( {{i_2},{j_2},{a_{t - 1 + h}},{p_{t - 1 + h}}} \right) \cdot {\pi _{t - 1 + h}}({m^ - })} \end{gathered} \end{aligned}$$
(12)
$$\begin{aligned}\begin{gathered} {\varpi _{t + h}}(n;{a_t},{p_t}) = \\ \sum \limits _{{i_1},{j_1} \ge 0,0 \le {n^ - } \le {N^{(1)}}:\atop n = {({n^ - } - {i_1})^ + }} {P_t^{(h)}\left( {{i_1},{j_1},{a_t},{p_t}} \right) } \cdot {\varpi _t}({n^ - })\end{gathered}\end{aligned}$$
$$\begin{aligned} \begin{gathered} {\varpi _t}(n;{a_{t - 1 + h}},{p_{t - 1 + h}}) = \\ \sum \limits _{{i_2},{j_2} \ge 0,\atop {0 \le {n^ - } \le {N^{(1)}}:\atop n = {({n^ - } - {i_2})^ + }}} {P_{t - 1 + h}^{(1 - h)}\left( {{i_2},{j_2},{a_{t - 1 + h}},{p_{t - 1 + h}}} \right) \cdot {\varpi _{t - 1 + h}}({n^ - })}. \end{gathered} \end{aligned}$$
(13)
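A single step of the belief update (12) can be sketched as follows. Summing out firm 1's own sales count i in (1) leaves the Poisson marginal of the rival's count j, so only the rival's expected sales \(\mu = \varLambda^{(2)}\) on the sub-interval, evaluated at the observed prices, is needed. The function names are illustrative; the same routine applies to \(\varpi\) in (13) when firm 1's expected sales are passed instead.

```python
import math
from typing import List


def poisson_pmf(k: int, mu: float) -> float:
    return mu ** k / math.factorial(k) * math.exp(-mu)


def propagate_belief(belief: List[float], mu: float) -> List[float]:
    """One step of (12): new[m] = sum of Pois(j; mu) * belief[m_prev] over m = (m_prev - j)^+.

    belief[m] -- current probability that the rival holds m items
    mu        -- rival's expected sales on the sub-interval at the observed prices
    """
    new = [0.0] * len(belief)
    for m_prev, w in enumerate(belief):
        if w == 0.0:
            continue
        tail = 1.0
        for j in range(m_prev):          # rival sells j < m_prev items
            q = poisson_pmf(j, mu)
            new[m_prev - j] += w * q
            tail -= q
        new[0] += w * max(tail, 0.0)     # j >= m_prev: rival is sold out
    return new
```

Starting from a belief concentrated on two remaining items and \(\mu = 0.5\), the updated probability of still holding two items is \(e^{-0.5} \approx 0.607\).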

Note that (12) and (13) are relevant for both firms, as each firm might try to estimate (i) the competitor's inventory level as well as (ii) the competitor's beliefs concerning its own inventory. This way, the competitor's price reactions can be anticipated via a probability distribution.

Both firms are assumed to act rationally, i.e., pricing decisions are such that no firm can benefit from unilaterally deviating from its strategy. Due to the defined sequence of events, optimal decisions can, in theory, be inferred recursively. The corresponding value functions of both firms, denoted by

$$\begin{aligned} V_t^{(*)}(n,p,{\varvec{\pi }_t},{\varvec{\omega }_t}) \end{aligned}$$
(14)
$$\begin{aligned} W_{t + h}^{(*)}(m,a,{\varvec{\pi }_{t + h}},{\varvec{\omega }_{t + h}}), \end{aligned}$$
(15)

are determined by the usual boundary conditions \(V_t^{(*)}(0, \cdot , \cdot , \cdot ) = 0\), \(V_T^{(*)}( \cdot , \cdot , \cdot , \cdot ) = 0\) (for firm 1) and \(W_{t + h}^{(*)}(0, \cdot , \cdot , \cdot ) = 0\), \(W_{T + h}^{(*)}( \cdot , \cdot , \cdot , \cdot ) = 0\) (for firm 2) as well as an associated system of Bellman equations similar to (7)–(8) extended by transitions for the beliefs, cf. (12)–(13). The corresponding optimal feedback policies \(a_t^{(*)}(n,p,{\varvec{\pi }_t},{\varvec{\omega }_t})\) and \(p_{t + h}^{(*)}(m,a,{\varvec{\pi }_{t + h}},{\varvec{\omega }_{t + h}})\) of the two competing firms can be computed in recursive order (similar to (9)–(11)).

However, in practical applications, optimal policies cannot be computed: since the probability distributions \(\varvec{\pi }\) and \(\varvec{\omega }\) are part of the state, the size of the state space explodes (curse of dimensionality). Hence, heuristic solutions are needed.

In the next subsection, we present an approach to compute viable heuristic feedback pricing strategies for the model with partially observable states. The key idea is to approximate the functions \(V_t^{(*)}(n,p,{\varvec{\pi }_t},{\varvec{\omega }_t})\) and \(W_{t + h}^{(*)}(m,a,{\varvec{\pi }_{t + h}},{\varvec{\omega }_{t + h}})\) by using weighted expressions of the value functions \(V_t^*(n,m,p)\) and \(W_{t+h}^*(n,m,a)\) (of the model with full knowledge) and their associated policies \(a_t^*(n,m,p)\) and \(p_{t+h}^*(n,m,a)\) derived in Sect. 3.

4.2 Solution with Partial Knowledge

Motivated by the Hidden Markov Model (HMM), cf. Sect. 4.1, in which the competitor’s inventory level cannot be observed, we want to define viable heuristic pricing strategies for the two competing firms. Based on the current beliefs regarding the competitor’s inventory, we approximate the correct value functions (14)–(15) (and related controls) using price reactions, cf. (9)–(10), and future profits, cf. (7)–(8), of the fully observable model. As the value functions of the fully observable model might systematically overestimate the correct values (14)–(15), we include an additional positive penalty factor z. If z is smaller than 1, future profits (7)–(8) are reduced.

For firm 1 we define the feedback prices, \(t = 0,1,...,T - 1\), \(n = 1,...,{N^{(1)}}\), \(p \in A\),

$$\begin{aligned} \begin{gathered} \tilde{a}_t^{}(n,p;{\varvec{\pi }_t},{\varvec{\omega }_t}) = \mathop {\arg \max }\limits _{a \in A} \left\{ {\sum \limits _{{i_1},{j_1} \ge 0} {P_t^{(h)}({i_1},{j_1},a,p)} } \right. \\ \cdot \sum \limits _{0 \le \tilde{m} \le {N^{(2)}}} {{\pi _t}(\tilde{m})} \cdot \sum \limits _{0 \le \tilde{n} \le {N^{(1)}}} {{\varpi _t}(\tilde{n})} \cdot \sum \limits _{{i_2},{j_2} \ge 0} {P_{t + h}^{(1 - h)}\left( {{i_2},{j_2},} \right. } \\ \left. 1_{\{ \tilde{n} - {i_1}> 0\} } \cdot a, {p_{t + h}^*\left( {{{(\tilde{n} - {i_1})}^ + },{{(\tilde{m} - {j_1})}^ + }{{,1}_{\{ \tilde{n} - {i_1}> 0\} }} \cdot a} \right) } \right) \\ \cdot \, \left( {(a - {c^{(1)}}) \cdot \min (n,{i_1} + {i_2})} \right. + \delta \cdot z \\ \cdot \, V_{t + 1}^*\left( {{{(n - {i_1} - {i_2})}^ + },{{(\tilde{m} - {j_1} - {j_2})}^ + }, {1_{\{ \tilde{m} - {j_1} - {j_2}> 0\} }}} \right. \\ \left. {\left. {\left. { \cdot \, p_{t + h}^*\left( {{{(\tilde{n} - {i_1})}^ + },{{(\tilde{m} - {j_1})}^ + }{{,1}_{\{ \tilde{n} - {i_1} > 0\} }} \cdot a} \right) } \right) } \right) } \right\} . \end{gathered} \end{aligned}$$
(16)

Note that (16) accounts for the beliefs regarding both inventory levels and the corresponding transitions. For anticipated price reactions, we use \({p^*}\), cf. (10); to estimate future profits, we use \(z \cdot {V^*}\), cf. (7).

Similarly, the prices of firm 2 are given by, \(t = 0,1,...,T - 1\), \(m = 1,...,{N^{(2)}}\), \(a \in A\),

$$\begin{aligned} \begin{gathered} \tilde{p}_{t + h}(m,a;{\varvec{\pi }_{t + h}},{\varvec{\omega }_{t + h}}) = \mathop {\arg \max }\limits _{p \in A} \left\{ {\sum \limits _{{i_1},{j_1} \ge 0} {P_{t + h}^{(1 - h)}({i_1},{j_1},a,p)} } \right. \\ \cdot \sum \limits _{0 \le \tilde{m} \le {N^{(2)}}} {{\pi _{t + h}}(\tilde{m})} \cdot \sum \limits _{0 \le \tilde{n} \le {N^{(1)}}} {{\varpi _{t + h}}(\tilde{n})} \cdot \sum \limits _{{i_2},{j_2} \ge 0} {P_{t + 1}^{(h)}\left( {{i_2},{j_2},} \right. } \\ \left. {a_{t + 1}^*\left( {{{(\tilde{n} - {i_1})}^ + },{{(\tilde{m} - {j_1})}^ + }{{,1}_{\{ \tilde{m} - {j_1}> 0\} }} \cdot p} \right) {{,1}_{\{ \tilde{m} - {j_1}> 0\} }} \cdot p} \right) \\ \cdot \, \left( {(p - {c^{(2)}}) \cdot \min (m,{j_1} + {j_2})} \right. + \delta \cdot z \\ \cdot \, W_{t + 1 + h}^*\left( {{{(\tilde{n} - {i_1} - {i_2})}^ + },{{(m - {j_1} - {j_2})}^ + }, {1_{\{ \tilde{n} - {i_1} - {i_2}> 0\} }}} \right. \\ \left. {\left. {\left. { \cdot \, a_{t + 1}^*\left( {{{(\tilde{n} - {i_1})}^ + },{{(\tilde{m} - {j_1})}^ + }{{,1}_{\{ \tilde{m} - {j_1} > 0\} }} \cdot p} \right) } \right) } \right) } \right\} . \end{gathered} \end{aligned}$$
(17)

In each period, realized sales are used to update the beliefs \(\pi \) and \(\omega \) such that the prices (16) and (17) can be computed during the sales process, i.e.:

$$\begin{aligned} \begin{gathered} \tilde{a}_0({N^{(1)}},0;{\varvec{\pi }_0},{\varvec{\omega }_0}) \rightarrow {\varvec{\pi }_h},{\varvec{\omega }_h} \rightarrow \tilde{p}_h({N^{(2)}},{a_h};{\varvec{\pi }_h},{\varvec{\omega }_h})\\ \rightarrow {\varvec{\pi }_1},{\varvec{\omega }_1} \rightarrow \tilde{a}_1(X_1^{(1)},{p_1};{\varvec{\pi }_1},{\varvec{\omega }_1}) \rightarrow \ldots \\ \ldots \rightarrow \tilde{a}_{T - 1}(X_{T - 1}^{(1)},{p_{T - 1}};{\varvec{\pi }_{T - 1}},{\varvec{\omega }_{T - 1}}) \rightarrow {\varvec{\pi }_{T - 1 + h}},{\varvec{\omega }_{T - 1 + h}}\\ \rightarrow \tilde{p}_{T - 1 + h}(X_{T - 1 + h}^{(2)},{a_{T - 1 + h}};{\varvec{\pi }_{T - 1 + h}},{\varvec{\omega }_{T - 1 + h}}). \end{gathered} \end{aligned}$$
(18)

By using simulations, both firms' expected profits as well as their distributions can be easily approximated. Evaluating different z values makes it possible to identify the (mutually) best z value.

4.3 Numerical Example

To illustrate our approach, in this subsection, we consider a numerical example.

Example 4.1

We assume the setting of Example 3.1. Both firms use the heuristic Hidden Markov strategies, cf. (16)–(18), for different parameter values z, \(0.2 \le z \le 1.5\).

We observe that z has an impact on the expected profits of both competing firms. In our example, the simulated average profits of both firms are maximized for \(z=0.8\). Note that the lower z is, the more risk-averse the pricing policies are (see the standard deviations \(\sigma \) in Table 3).

Table 3. Simulated expected profits and their standard deviations for both firms for different z values, Example 4.1, cf. [18].

Remark 4.1

(Parallelization). The computation of feedback policies and particularly extensive simulation studies can become CPU-intensive. Parallelization can be used to compute results more efficiently:

(i) Feedback prices for the same point in time can be computed in parallel.
(ii) Simulation runs can be computed independently of each other.
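Point (ii) can be sketched with Python's standard `concurrent.futures` module. The simulator below is a hypothetical stand-in (one firm with a fixed price and Poisson demand), not one of the strategies of this paper; the point is that each run owns a random generator seeded by its run index, so results are reproducible and independent of scheduling. A thread pool keeps the sketch simple; for CPU-bound pure-Python code, `ProcessPoolExecutor` (with module-level functions and a `__main__` guard) is needed for true parallel speed-ups. The same pattern applies to point (i), with states of a fixed time step mapped to workers.

```python
import math
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def sample_poisson(rng: random.Random, mu: float) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_once(seed: int, price: float = 100.0, cost: float = 10.0,
                  stock: int = 10, periods: int = 50, mu: float = 0.3) -> float:
    """HYPOTHETICAL stand-in for one simulated sales process (fixed price, Poisson demand)."""
    rng = random.Random(seed)  # private generator: runs do not share random state
    profit = 0.0
    for _ in range(periods):
        if stock == 0:
            break
        sold = min(stock, sample_poisson(rng, mu))
        stock -= sold
        profit += (price - cost) * sold
    return profit


def run_parallel(n_runs: int, workers: int = 4):
    """Run independent replications concurrently; map() keeps results in seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        profits = list(pool.map(simulate_once, range(n_runs)))
    return statistics.mean(profits), statistics.stdev(profits), profits
```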

Figure 3 illustrates simulated sales processes in the context of Example 4.1. Figure 3a shows the price trajectories of the two competing firms. Figure 3b shows the associated evolution of the inventory levels and each firm’s estimate of its competitor’s inventory level (dashed lines).

Fig. 3.

Simulated price paths and associated (estimated) inventory levels over time, \(z=0.8\); Example 4.1, cf. [18].

5 Unknown Strategies

In this section, we present another heuristic approach to derive effective pricing strategies in competitive markets with limited information. We assume that the strategy of the competitor is completely unknown.

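Our key idea for dealing with unknown price reactions is to assume sticky prices, i.e., firm 1 plans as if the competitor’s current price p remained unchanged for the rest of the time horizon. For firm 1, we define the following value function, \(p \in A\), \(n \ge 1\), \(t = 0,1,...,T - 1\), with \({\bar{V}_t}(0,p) = 0\) for all t, p and \({\bar{V}_T}(n,p) = 0\) for all n, p: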

$$\begin{aligned} \bar{V}_t(n,p) = \max_{a \in A} \Bigg\{ &\sum_{i_1,j_1} P_t^{(h)}(i_1,j_1,a,p) \cdot \sum_{i_2,j_2} P_{t+h}^{(1-h)}(i_2,j_2,a,p) \\ &\cdot \Big( (a - c^{(1)}) \cdot \min(n, i_1 + i_2) + \delta \cdot \bar{V}_{t+1}\big( (n - i_1 - i_2)^+, p \big) \Big) \Bigg\}. \end{aligned}$$
(19)

The heuristic strategy \({\bar{a}_t}(n,p)\), determined by the arg max of (19), only depends on t, n, and p. Similarly, the corresponding pricing strategy \({\bar{p}_t}(m,a)\) of firm 2 is determined by the arg max of the following value function, \(a \in A\), \(m \ge 1\), \(t = 0,1,...,T - 1\), with \({\bar{W}_{t+h}}(0,a) = 0\) for all t, a and \({\bar{W}_{T+h}}(m,a) = 0\) for all m, a:

$$\begin{aligned} \bar{W}_{t+h}(m,a) = \max_{p \in A} \Bigg\{ &\sum_{i_2,j_2} P_{t+h}^{(1-h)}(i_2,j_2,a,p) \cdot \sum_{i_1,j_1} P_{t+1}^{(h)}(i_1,j_1,a,p) \\ &\cdot \Big( (p - c^{(2)}) \cdot \min(m, j_1 + j_2) + \delta \cdot \bar{W}_{t+1+h}\big( (m - j_1 - j_2)^+, a \big) \Big) \Bigg\}. \end{aligned}$$
(20)

The advantage of this approach is that the value function does not need to be computed for all of the competitor’s prices p in advance. The value function and the associated pricing policy can be computed separately for single prices p (e.g., only when they occur). If the competitor’s strategy is not known (which is often the case), it is not possible to anticipate potential price adjustments. This feedback strategy, however, can react immediately when the competitor’s price changes. In such an event, the value functions (19)–(20) and the associated prices have to be computed for the new state.
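As a minimal sketch of how (19) can be solved by backward induction for one competitor price at a time, the code below caches the solution per observed price p, so the value function is only computed when a new price occurs. The price grid `A`, the horizon, the costs, and in particular the sales model `sell_prob` (at most one sale per firm and sub-period, independent across firms) are hypothetical assumptions for illustration and not the setting of Example 3.1.

```python
import math
from functools import lru_cache

# HYPOTHETICAL setting, not Example 3.1: price grid A, horizon T, reaction time h,
# unit cost c^(1) of firm 1, discount factor delta, and maximal inventory.
A = (2.0, 4.0, 6.0, 8.0)
T, H, C1, DELTA, N_MAX = 10, 0.5, 1.0, 1.0, 5


def sell_prob(t, own, other, length):
    """HYPOTHETICAL probability that a firm sells one item in a sub-period of the
    given length; it falls with the own price and rises when undercutting."""
    return length * (1 - own / 10) / (1 + math.exp(own - other))


def P(t, i, j, a, p, length):
    """Stand-in for P_t^{(length)}(i, j, a, p): i, j in {0, 1} are the sales of
    firms 1 and 2, assumed independent across firms."""
    q1, q2 = sell_prob(t, a, p, length), sell_prob(t, p, a, length)
    return (q1 if i else 1 - q1) * (q2 if j else 1 - q2)


@lru_cache(maxsize=None)  # solve only for competitor prices p that actually occur
def solve_sticky(p):
    """Backward induction for (19) with the competitor's price p held fixed.
    Returns (V, policy): V[t][n] = V_bar_t(n, p), policy[t][n] = a_bar_t(n, p)."""
    V = [[0.0] * (N_MAX + 1) for _ in range(T + 1)]  # V_T(n, p) = V_t(0, p) = 0
    policy = [[None] * (N_MAX + 1) for _ in range(T)]
    outcomes = [(i, j) for i in (0, 1) for j in (0, 1)]
    for t in range(T - 1, -1, -1):
        for n in range(1, N_MAX + 1):
            best_val, best_a = -math.inf, None
            for a in A:
                val = 0.0
                for i1, j1 in outcomes:            # sub-period [t, t + h)
                    w1 = P(t, i1, j1, a, p, H)
                    for i2, j2 in outcomes:        # sub-period [t + h, t + 1)
                        w2 = P(t + H, i2, j2, a, p, 1 - H)
                        reward = (a - C1) * min(n, i1 + i2)
                        cont = DELTA * V[t + 1][max(n - i1 - i2, 0)]
                        val += w1 * w2 * (reward + cont)
                if val > best_val:
                    best_val, best_a = val, a
            V[t][n], policy[t][n] = best_val, best_a
    # Immutable tuples so callers cannot corrupt the cached solution.
    return tuple(map(tuple, V)), tuple(map(tuple, policy))
```

For example, `solve_sticky(6.0)[1][0][N_MAX]` is firm 1's price at t = 0 with a full inventory when the competitor currently charges 6.0. Firm 2's policy from (20) follows the same pattern with the roles of the firms swapped and the sub-periods taken in the order 1 − h, h.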

Remark 5.1

(Oligopoly competition). Note that, due to the curse of dimensionality, the strategies derived in Sects. 3 and 4 are only applicable when the number of competitors is small. The heuristic strategy described above, however, can still be applied if the number of competitors is large. In case of K competitors, the state p in (19) only has to be replaced by \(\varvec{p} = ({p^{(1)}},...,{p^{(K)}})\), \({p^{(k)}} \in A\), \(k = 1,...,K\).
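A back-of-the-envelope count illustrates the remark. It assumes that a full-information approach for K competitors would track the own inventory plus every competitor's inventory and price (a natural extension of the duopoly state of Sect. 3, which is our assumption), with the same capacity for every firm; both function names are hypothetical.

```python
def full_info_states(n_items: int, n_prices: int, k: int) -> int:
    """States per time step if firm 1 tracks its own inventory plus each of the
    k competitors' inventories and prices (ASSUMED extension of Sect. 3)."""
    return (n_items + 1) ** (k + 1) * n_prices ** k


def sticky_states(n_items: int, observed_vectors: int = 1) -> int:
    """States per time step for heuristic (19): only the own inventory n,
    solved separately for each competitor price vector actually observed."""
    return (n_items + 1) * observed_vectors
```

With 20 items per firm and 10 admissible prices, `full_info_states` gives 4,410 states per period for K = 1 and 194,481,000 for K = 3, whereas the heuristic needs 21 states for each price vector that actually occurs.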

Fig. 4.

Simulated price paths and associated inventory levels over time; setting of Example 3.1, cf. [18].

For the case in which the competitor’s strategy is unknown, Fig. 4 illustrates simulated sales processes based on the heuristic, cf. (19)–(20), in the context of Example 3.1. Figure 4a shows the price trajectories of the two competing firms. We observe that firms either significantly raise the price or undercut the competitor’s price. Figure 4b shows the corresponding inventory levels.

6 Evaluation

In this section, we compare the outcomes of our solution strategies, which exploit different kinds of information.

6.1 Comparison of Strategies

If pricing strategies are allowed to use full information (i.e., the own inventory level, the competitor’s inventory level, and the competitor’s price), the optimal expected profits can be computed exactly (e.g., via backward induction), cf. Sect. 3. In case the competitor’s inventory level is not known, we presented an approach to compute viable strategies via a Hidden Markov Model, cf. Sect. 4. If, in addition, the competitor’s pricing strategy and reaction time are unknown, we proposed an efficient heuristic, cf. Sect. 5.

By \({S_{FK}}\), we denote the strategy derived in Sect. 3 (full knowledge). By \({S_{PK}}\), we denote the response strategy derived in Sect. 4 (partial knowledge) with \(z=0.8\). By \({S_{UK}}\), we denote the heuristic strategy of Sect. 5 for the case in which the competitor’s strategy is unknown.

Considering the setting of Examples 3.1 and 4.1, the expected profits of the different strategy combinations, both symmetric and asymmetric, are summarized in Table 4. In all cases, the expected total profits, the expected remaining inventory, and the standard deviations of total profits were estimated using simulations.
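The evaluation behind Table 4 can be organized as a loop over all ordered strategy pairs, where the diagonal holds the symmetric cases. The sketch assumes a hypothetical `simulate(s1, s2, rng)` in which firm 1 plays `s1` and firm 2 plays `s2` for one horizon; the labels merely mirror \(S_{FK}\), \(S_{PK}\), \(S_{UK}\), and `compare_strategies` is an illustrative name.

```python
import random
import statistics
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# HYPOTHETICAL simulator interface: firm 1 plays s1, firm 2 plays s2 for one
# horizon; returns (profit 1, profit 2, remaining items 1, remaining items 2).
Sim = Callable[[str, str, random.Random], Tuple[float, float, int, int]]


def compare_strategies(simulate: Sim, strategies: Sequence[str], runs: int = 1000,
                       seed: int = 0) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Estimate EG_0^(1), EG_0^(2), their standard deviations and the expected
    remaining inventories for every ordered strategy pair (s1, s2)."""
    table = {}
    for s1, s2 in product(strategies, repeat=2):  # diagonal = symmetric cases
        rng = random.Random(seed)
        out = [simulate(s1, s2, rng) for _ in range(runs)]
        g1, g2 = [o[0] for o in out], [o[1] for o in out]
        table[(s1, s2)] = {
            "EG1": statistics.fmean(g1), "sd1": statistics.stdev(g1),
            "EG2": statistics.fmean(g2), "sd2": statistics.stdev(g2),
            "ER1": statistics.fmean([o[2] for o in out]),
            "ER2": statistics.fmean([o[3] for o in out]),
        }
    return table
```

Because the pairs are ordered, `("FK", "UK")` and `("UK", "FK")` are separate entries, which keeps the last-mover role of firm 2 visible in the comparison.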

Table 4. Strategy comparison (benchmark case \(h=0.5\), \(L=0.8\)): Expected profits \(EG_0^{(1)}\) (of firm 1) and \(EG_0^{(2)}\) (of firm 2), when firm 1 and firm 2 play different pairs of strategies using \({S_{FK}}\) (full knowledge), \({S_{PK}}\) (partial knowledge), and \({S_{UK}}\) (unknown strategies), cf. Examples 3.1 and 4.1.

In the first three cases, i.e., the three symmetric scenarios, we observe that both firms can expect similar results, cf. Figs. 5, 6 and 7. It turns out that as long as the information structure is the same for both firms, a lack of information does not necessarily result in lower expected profits.

Fig. 5.

Simulated expected price paths, associated inventory levels, and accumulated profits over time, full knowledge FK vs. FK; setting of Examples 3.1 and 4.1.

Fig. 6.

Simulated expected price paths, associated inventory levels, and accumulated profits over time, partial knowledge PK vs. PK; setting of Examples 3.1 and 4.1.

The number of unsold items and the variance of profits, however, differ significantly. In case of fully observable states (\(S_{FK}\) vs. \(S_{FK}\)), the remaining inventory and the variance of profits are comparatively high, while both firms can expect almost equal results. In the second case with partially observable states (\(S_{PK}\) vs. \(S_{PK}\)), we observe that the load factor of both firms is higher and the variation of profits is much smaller. Since less information is available, the competition between both firms is less intense.

Fig. 7.

Simulated expected price paths, associated inventory levels, and accumulated profits over time, no knowledge UK vs. UK; setting of Examples 3.1 and 4.1.

In case of mutually unknown strategies (\(S_{UK}\) vs. \(S_{UK}\)), we obtain similar results. Furthermore, we expect the heuristic strategy \(S_{UK}\) to yield robust results when played against various other strategies: the other two strategies are optimized to play against a specific strategy and might therefore perform less well when the competitor plays a different strategy. Moreover, because our heuristic \(S_{UK}\) can be computed efficiently, it allows a high price reaction frequency, which is also a competitive advantage.

In the remaining cases of Table 4, we present the results of asymmetric strategy pairs. As expected, we observe that strategies that use more information outperform strategies with less information. However, the profit differences are relatively small, which means that our strategies with incomplete information are surprisingly competitive.

Further, the firm that makes the last price adjustment in each period (firm 2) has a slight advantage. In general, we observe that strategies that use more information tend to have higher standard deviations of profits and a lower load factor.

Note that in the asymmetric setups, neither strategy is an optimal response to the other; each is optimized to be played against its symmetric counterpart. Hence, in theory, results could be worse than in the symmetric cases, since the competitor might not react as expected. However, we observe that profits are hardly lower. The reason is that the derived strategies (\(S_{FK}\), \(S_{PK}\), \(S_{UK}\)) are quite robust due to their feedback nature. Further, in asymmetric setups the competition is less fierce, as price reactions are not optimized to be played against the competitor’s strategy. For optimized response strategies against given strategies, see [17].

6.2 Impact of Customers’ Price Sensitivity

In this subsection, we study to what extent the results of Table 4 are affected if customers are more price-sensitive. Such cases can be modelled using a higher competition factor L, cf. Example 3.1. Similarly, a lower factor L corresponds to cases in which customers are more loyal and tend to stick to a certain firm instead of constantly comparing prices.

Table 5 summarizes the performance results for all symmetric and asymmetric duopoly scenarios for the case \(L=0.95\). Again, the results were obtained by simulation.

In case of a higher price sensitivity, we again observe that strategies are more successful if more information is available and used. More interestingly, compared to the benchmark case (cf. Table 4), the fiercer competition makes it more important whether a firm has the last move. One might expect profits to be lower under high price sensitivity, since the product with the higher price can hardly be sold and, in turn, both firms are forced to systematically undercut the competitor’s price in order to sell items (race to the bottom). Surprisingly, profits are not necessarily lower. The reason is that the effects of a higher price sensitivity are counterbalanced by the fact that the firm that sells more slowly is likely to become a monopolist for the rest of the time horizon.

Table 5. Impact of price sensitivity factor L (case \(h=0.5\), \(L=0.95\)): Expected profits \(EG_0^{(1)}\) (of firm 1) and \(EG_0^{(2)}\) (of firm 2), when firm 1 and firm 2 play different pairs of strategies using \({S_{FK}}\) (full knowledge), \({S_{PK}}\) (partial knowledge), and \({S_{UK}}\) (unknown strategies), cf. Examples 3.1 and 4.1.
Table 6. Impact of reaction time (case \(h=0.2\) vs. \(h=0.8\), \(L=0.8\)): Expected profits \(EG_0^{(1)}\) (of firm 1) and \(EG_0^{(2)}\) (of firm 2), when firm 1 and firm 2 play different pairs of strategies using \({S_{FK}}\) (full knowledge), \({S_{PK}}\) (partial knowledge), and \({S_{UK}}\) (unknown strategies), cf. Examples 3.1 and 4.1.

6.3 Impact of Reaction Time

In this subsection, we investigate the impact of reaction times on the performance of our strategies. In our model, the reaction time can be varied via the parameter h, \(0<h<1\). While firm 2 reacts to firm 1’s action with a delay of h, firm 1’s reaction time to firm 2’s price adjustment is \(1-h\). A reaction time \(h=0.2\) corresponds to the case in which firm 1 holds the “fresh” (most recent) price \(h=20\%\) of the time, while firm 2’s share is \(1-h=80\%\).

In real-life applications, firms often randomize their reaction times to avoid acting predictably. In this case, the ratio of the competing firms’ reaction frequencies determines the share of time in which a firm has the most recent price update. [17] demonstrates that such scenarios can be effectively modelled via our duopoly model with fixed reaction times h and \(1-h\), respectively.
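Under one simple randomization scheme, which is our assumption and not necessarily the setup used in [17], each firm updates its price at the jump times of an independent Poisson process with rates \(\lambda_1\) and \(\lambda_2\). The most recent update then comes from firm 1 with probability \(\lambda_1/(\lambda_1+\lambda_2)\), so this ratio plays the role of h. The simulation below checks this numerically; `fresh_price_share` is a hypothetical helper.

```python
import random


def fresh_price_share(rate1: float, rate2: float,
                      horizon: float = 10_000.0, seed: int = 0) -> float:
    """Share of time in which firm 1 holds the most recent price update.

    ASSUMPTION: both firms update at the jump times of independent Poisson
    processes with rates rate1 and rate2 (randomized reaction times).
    """
    rng = random.Random(seed)
    total = rate1 + rate2
    t, last, time1 = 0.0, None, 0.0  # last = firm that updated most recently
    while True:
        dt = rng.expovariate(total)  # next update of either firm (superposition)
        end = min(t + dt, horizon)
        if last == 1:
            time1 += end - t
        if t + dt >= horizon:
            return time1 / horizon
        t += dt
        # Each update belongs to firm 1 with probability rate1 / total (thinning).
        last = 1 if rng.random() < rate1 / total else 2
```

For example, rates 1 and 4 give a share close to 0.2, which corresponds to the case \(h=0.2\) of Table 6.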

Table 6 shows simulated performance results for all duopoly scenarios for two different (uneven) reaction times \(h=0.2\) and \(h=0.8\). The price sensitivity factor is \(L=0.8\).

We observe that, in general, profits are significantly affected by response times. Hence, a higher price update frequency is a competitive advantage. We find that the competitor with a better (more frequent) reaction time can even beat its opponent when it applies a strategy that uses less information, i.e., a better reaction time can overcompensate the lack of information.

7 Conclusion

In e-commerce, it has become easier to observe and adjust prices automatically. Consequently, the demand for dynamic pricing has increased. The computation of suitable pricing strategies is highly challenging as soon as strategic competitors are involved and remaining inventory levels play a major role. In this paper, we analyzed stochastic dynamic finite horizon duopoly models characterized by price responses in discrete time. We allow sales probabilities to depend on time and on the competitors’ prices in a general way. Further, we are able to model different price reaction times.

We have considered three different types of information structures. In the first setting, we assume that the inventory levels of the competing firms are mutually observable. We show that optimal price reaction strategies, which are based on mutual price anticipations, can be derived using standard methods (e.g., backward induction). Examples are used to identify structural properties of expected profits and feedback pricing strategies. Optimal prices balance two effects: (i) slightly undercutting the competitor’s price in order to sell more items, and (ii) using high prices in order to promote the competitor’s run-out and then act as a monopolist for the rest of the time horizon.

In the second setting, we assume that the inventory of the competitor is not observable. Based on observable prices, we compute probability distributions (beliefs) for the number of items the competitor might have left to sell. We propose a Hidden Markov Model to compute applicable feedback pricing strategies. Our examples show that the resulting expected profits of both firms are similar to those obtained in the model with full knowledge. The variance of profits and the average number of remaining items, however, are significantly lower.

In the third setting, we assume that the competitor’s strategy is completely unknown, i.e., firms cannot anticipate price responses. We propose an efficient decomposition approach to circumvent the curse of dimensionality and demonstrate how to compute powerful pricing strategies. We observe that, when applied by both competitors, the heuristic yields expected profits similar to those in the two other settings, in which more information is available.

We have shown how to compute applicable reaction strategies for real-life scenarios with different information structures. We find that sales results are quite similar as long as the information structure is symmetric. Our numerical experiments with asymmetric strategy setups show that additional information leads to higher profits than the competitor’s, although the differences are relatively small. Further, we observe that a higher price sensitivity (e.g., when customers are less loyal) does not lead to a significant decrease in expected profits. Moreover, we find that higher price reaction frequencies can even overcompensate a lack of information.

In future research, the model could be extended to study scenarios with (i) multiple products and substitution effects in demand, (ii) strategic customers who anticipate typical price trends, or (iii) firms that seek to learn their competitor’s pricing strategy from historical data.