1 Introduction

In this paper, we consider a stochastic inventory system with both minimum and maximum order quantity requirements. The minimum order quantity (MinOQ) requirement means that the order quantity must equal or exceed a specified level if an order is placed. The maximum order quantity (MaxOQ) constraint means that the order quantity cannot be larger than a specified amount.

The MinOQ requirement is widely adopted by suppliers in practice, as it sets the lowest quantity of a certain product that a supplier is willing to sell. If the buyer cannot reach the MinOQ requirement, then the supplier is not willing or able to enter production. With the prevalence of e-commerce, MinOQ requirements are becoming increasingly common in our lives, especially in online sourcing portals such as alibaba.com, where suppliers often set such requirements, e.g., 500 pieces for bathroom carpetsFootnote 1, 100 pieces for quartz watchesFootnote 2, and 50 pieces for denim jacketsFootnote 3.

The application of MaxOQ is also ubiquitous in practice. On the one hand, the MaxOQ requirement can be set by the supplier on its own. This may be due to the supplier’s insufficient production capacity or limited resources to be rationed among several competitors. For example, during the outbreak of the COVID-19 pandemic, several businesses set a maximum order quantity of face masks that each organization/person could buy (Global Times 2020; Strong 2020). On the other hand, the MaxOQ constraint can be caused by the buyer’s limited storage space or capital. For example, Chan and Muckstadt (1999) found that some firms have an upper limit on their order quantity, because fluctuations in inventory levels can be costly.

Since MinOQ and MaxOQ requirements are prevalent in industries, it is of no surprise that suppliers may simultaneously apply both requirements. Indeed, our decision to jointly consider both MinOQ and MaxOQ requirements in this paper is largely motivated by our experience with a wholesale company in China. For a variety of products, the firm first replenishes its stock from suppliers and then sells to retail customers, and for most of these products, the firm stipulates both MinOQ and MaxOQ requirements. Managers in such situations need principles or tools to help control their inventory, because both requirements have a negative effect on buyers’ inventory control, especially when MinOQs are relatively large compared with their demand, which is common in practice. However, to the best of our knowledge, no research has investigated inventory systems with both MinOQ and MaxOQ requirements. Thus, the primary goal of this paper is to fill this gap in the literature.

In this paper, we consider a single-product stochastic periodic-review inventory system with both MinOQ and MaxOQ requirements. The selling firm can make a decision at the beginning of each time period after reviewing the inventory position. When the firm decides to place an order, the order quantity must satisfy both the MinOQ and MaxOQ constraints, where we assume that the MinOQ is less than the MaxOQ. The leftover inventory is carried over to the next period and incurs a holding cost, whereas unsatisfied demand is fully backlogged and incurs a backordering cost. The total costs consist of the linear ordering cost, the holding cost, and the backordering cost. The objective is to minimize the long-run average cost of the system.

It is shown by Federgruen and Zipkin (1986a) that base-stock policies are optimal for systems with only MaxOQ requirements (without MinOQ requirements). However, the existence of MinOQ generates a considerable difficulty for inventory control. To see this, note that when there is only a MaxOQ constraint, the action sets for the problem are joint, from zero to the MaxOQ. When MinOQ requirements are taken into consideration, the action space becomes disjoint, from MinOQ to MaxOQ plus zero. Such a change makes optimally solving the inventory problem challenging. Zhao and Katehakis (2006) shows that the optimal policy structure is too complicated to be fully characterized due to disjoint action sets even for systems with only MinOQ requirements, to say nothing of systems with both MinOQ and MaxOQ requirements.

Therefore, the focus of this paper is to propose effective and easy-to-implement heuristic policies for stochastic inventory systems with both MinOQ and MaxOQ requirements. To achieve this, we develop a simple heuristic policy, called the modified (sS) policy, which is motivated by the modified echelon (rQ) policies recently adopted by Hu and Yang (2014) and Zhu et al. (2021) for multi-echelon inventory systems. The idea of the modified (sS) policy is quite simple: when the inventory position (before ordering) is at or below the reorder level s, an order is placed to raise the inventory position (after ordering) as close as possible to the order-up-to level S.

We find that under the modified (sS) policy, the inventory position after ordering follows a discrete-time Markov chain. This enables us to compute the steady-state probabilities and thus the expected long-run average cost. In addition, we show that the optimal inventory parameters are bounded, which allows us to optimize our proposed policy. Our numerical results verify the effectiveness of the modified (sS) policy, as its numerical performance is excellent.

Our results shed light on inventory control for stochastic systems with both MinOQ and MaxOQ constraints. First, although the optimal policy structure is still unknown, our proposed easy-to-implement policy can have excellent performance. Second, it is well known that (sS) policies perform well for single-stage systems, e.g., systems with fixed ordering costs. Our work shows that (sS) policies and their adaptations can still perform well in systems with both MinOQ and MaxOQ requirements. Third, note that (rQ) policies are reduced to (sS) policies when there is no batch size constraint, i.e., batch size \(Q=1\). Our work also extends the excellent performance of modified (rQ) policies for multi-echelon systems to single-stage systems.

The remainder of this paper is organized as follows. The literature on MinOQ and MaxOQ requirements is discussed in Sect.  2. In Sect. 3, the model description and notations are presented. We introduce our proposed policy and provide cost-evaluation methods in Sect. 4. Policy optimization and the corresponding algorithm are presented in Sect. 5. Numerical examples are presented in Sect.  6 to measure the effectiveness of the proposed policy by comparing it with a benchmark. Finally, Sect. 7 concludes the paper by summarizing the findings.

2 Literature review

This work is closely related to two streams of research literature: inventory systems with MaxOQ requirements and inventory systems with MinOQ requirements.

The first stream of related research is on inventory systems with MaxOQ requirements, usually in the context of capacitated inventory systems in inventory management literature. For pure capacitated inventory systems without MinOQ requirements, the issue of MaxOQ has been extensively studied. For example, Federgruen and Zipkin (1986a, 1986b) establish the optimality of (modified) base-stock policies for capacitated single-stage inventory systems under average-cost and discounted-cost criteria, respectively. Tayur (1993) designs an efficient algorithm to compute the optimal base-stock level. Levi et al. (2008) propose a novel cost-accounting scheme for capacitated inventory systems and develop a corresponding heuristic policy with a worst-case performance guarantee of two. Then Shi et al. (2014) develop an approximation algorithm with a worst-case performance guarantee of four for capacitated systems with setup costs.

A number of studies consider inventory systems with MaxOQ requirements in a more complicated setting. Kapuściński and Tayur (1998) extend the optimality of base-stock policies from stationary demand to nonstationary demand settings. Parker and Kapuscinski (2004) characterize the optimal policy structure for a capacitated two-echelon system. Huh et al. (2016) develop heuristic policies and derive performance bounds for serial systems with MaxOQ constraints at each stage. Gallego and Scheller-Wolf (2000) consider capacitated inventory systems with fixed ordering costs and show that the optimal capacitated policy has an (sS)-like structure. Similar to this work, there is a vast body of research that simultaneously considers MaxOQ and other requirements/factors, including remanufacturing (Gong and Chao 2013), random capacity (Ciarallo et al. 1994), advanced demand information (Özer and Wei 2004), and outsourcing options (Yang et al. 2005). Our work also falls into this category, as we simultaneously consider both MaxOQ and MinOQ requirements.

The second stream concerns inventory systems with MinOQ constraints. Zhao and Katehakis (2006) first partially characterize the optimal policy structure of a single-stage, single-item problem with an MinOQ requirement and demonstrate the complexity of the optimal policy by showing that the cost functions may have multiple local minimums in the domain. Then, Zhou et al. (2007) propose a two-parameter heuristic policy, called the (st) policy, for the same system. The (st) policy operates as follows. When the inventory position is less than s, an order is placed to raise the inventory position to \(s+M\); when the inventory position is between s and t, an order with exactly MinOQ units is placed; otherwise, no order is placed. In a later work, Kiesmüller et al. (2011) also study the same system and propose a simpler one-parameter policy, dubbed the S policy, which is a special case of the (st) policy with \(s=S-M_{min}\) and \(t=S-1\). Both heuristic policies in Zhou et al. (2007) and Kiesmüller et al. (2011) perform well in their numeric studies, and the latter, the performance of which may not be as strong as the former, is more effective and easier to implement.

Following these works, some papers study MinOQ requirements in a more complicated model setting. For example, Zhu et al. (2015) investigate a system with both MinOQ and batch ordering constraints. That is, each time an order is placed, the order quantity must be not only larger than a threshold but also an integral multiple of a specified batch size. In another example, Shen et al. (2019) consider a multi-echelon inventory system where the upstream installation faces an MinOQ requirement. Motivated by Zhou et al. (2007) and Kiesmüller et al. (2011), the authors develop two-parameter and one-parameter heuristic policies for a stochastic distribution inventory system. Other related works on MinOQ requirements include Chang (2004), Kesen et al. (2010), Okhrin and Richter (2011), Chow et al. (2012), Park and Klabjan (2015) and Hou et al. (2017). In addition to the MinOQ requirement, our work also takes into consideration MaxOQ constraints, which are absent in all of the aforementioned papers in this stream.

It should also be clarified that our work and other work in the MinOQ stream differ from the papers that consider minimum total order quantity commitments, e.g., Chen and Krass (2001), Yuan et al. (2015), Wang et al. (2017), and Gong et al. (2021). In works with a minimum total order quantity requirement, usually in a multi-period setting, the manager makes ordering decisions for each period, and the total order quantity across all periods is required to be no less than a specified level. However, there is no requirement on the order quantity for each period; that is, the manager can just order one single unit if she likes. Unlike those papers, in our work, it is the order quantity in each period that is required to be no less than a threshold, rather than the total order quantity.

Our work also belongs to the body of research that develops general (sS) or (rQ) policies. In addition to the above-mentioned (st) and S policies, which can be categorized as general (sS) policies, several papers particularly develop modified (sS) or (rQ) policies for complicated inventory systems. For example, Hu and Yang (2014) first propose modified echelon (rQ) policies for stochastic serial inventory systems. More important, the authors provide explicit performance bounds to verify the effectiveness of the proposed heuristic. Recently, Zhu et al. (2021) extend this work by developing modified echelon (rQ) policies and deriving performance bounds for stochastic distribution systems. A number of papers also consider adaptations of (rQ) policies that outperform classical (rQ) policies, e.g., Moinzadeh (2002), Özer (2003) and Axsäter and Marklund (2008). However, none of these papers take order quantity constraints into consideration.

Besides Hu and Yang (2014) and Zhu et al. (2021), the paper most closed to ours may be Chan and Muckstadt (1999). In their work, the authors study a production smoothing problem in which the production quantity in each period is strictly positive and further constrained by both minimum and maximum levels in each period. They characterize the optimal policy under the discounted cost criterion. However, our model is fundamentally different from their work because the order quantity in our work is either zero or within an interval (at least the MinOQ and at most the MaxOQ); therefore, the action space in our model is disjoint, while in Chan and Muckstadt (1999), the action sets are connected and compact. Zhao and Katehakis (2006) shows that the structure of the optimal policy is too complicated to be fully characterized due to these disjoint action sets, even for a system with only MinOQ requirements (without MaxOQ restrictions).

2.1 Contribution

In this work, we study a stochastic inventory system with both MinOQ and MaxOQ constraints, the optimal policy of which is still unknown, and even if it exists, it must be extremely complicated. Such a system is not well-studied yet in the literature due to its complexity, and no solution has been particularly developed. Therefore, it is of great relevance to propose some effective and easy-to-implement heuristic policies to manage this kind of inventory systems. To the best of our knowledge, our work is the first one that fills this gap in the literature. In particular, the disjoint action space makes it difficult for us to characterize any structural property or search the inventory parameters of the optimal policy. To solve this problem, we propose the modified (sS) policy, the salient feature of which is to bound the system state into a closed set (interval). Such a property enables us to compute the exact system-wide cost. In this sense, our work also contributes to the literature on modified (sS) polices that used to focus on computing cost upper bounds.

The contribution of our work on managerial insights is twofold. On the one hand, although the proposed policy makes intuitive sense, we show that the effectiveness of the modified echelon (rQ) policyFootnote 4 studied by Hu and Yang (2014) and Zhu et al. (2021) in multi-echelon inventory systems can carry over to single-stage systems, even with both MinOQ and MaxOQ requirementsFootnote 5. On the other hand, we extend the effectiveness of general (sS) policies (i.e., (sS) policies with some modifications) from standard single-stage systems without (or with partial) order quantity constraints (see Zheng 1991; Zhou et al. 2007; Kiesmüller et al. 2011) to systems with both MinOQ and MaxOQ constraints.

3 Model description

We consider a single-item period-review inventory system with stochastic demand. The stochastic demand D in each period is an independent identically distributed (i.i.d.) random variable and bounded on \([0,D_{max}]\). Whenever demand cannot be fully satisfied from stock, the unsatisfied units are backordered. We assume no fixed ordering cost, but there is an inventory holding cost h per unit per period, and a backlogging/penalty cost b per unit per period. In each period, the inventory system replenishes its stock from an external supplier with both MinOQ and MaxOQ requirements. That is, when an order is placed to the supplier, the order quantity q is lower bounded by MinOQ (denoted by \(M_{min}\)) and upper bounded by MaxOQ (denoted by \(M_{max}\)), i.e., \(M_{min}\le q\le M_{max}\). Note that it is also allowed to order nothing or not to place an order.

In addition, we make the following two assumptions on \(M_{max}\) and \(M_{min}\).

Assumption 1

(i). \(M_{min}\le M_{max}\); and (ii). \(D_{max}\le M_{max}\).

Assumption 1(i) is imposed to avoid trivial cases. Recall that as introduced in Sect. 1, the objective of this research, due to the existence of MinOQ requirements, is to provide effective inventory control policies for the settings where the MinOQ (i.e., \(M_{min}\)) is relatively large compared with demand (otherwise the MinOQ requirements would become trivial). Then, Assumption 1(ii) follows from this and Assumption 1(i). Note that if this assumption is violated, then for some combinations of specific realizations of D (e.g., consecutive large Ds), it is quite possible that the retailer needs a large amount of stock, while the supplier fails to fully replenish the retailer due to the capacity (MaxOQ) constraint. In such a case, inventory management is perhaps much less important than finding a supplier with enough capacity. Moreover, in such a case, E(D) tends to be relatively large, while the MinOQ requirement may be a loose constraint or even redundant. It is noteworthy that the scope of this research is inventory systems for which the MinOQ and MaxOQ requirements are both stringent.

We would like to note that when \(M_{min}=1\), our model is reduced to the capacitated inventory systems (without MinOQ requirements) studied in Federgruen and Zipkin (1986a). When \(M_{max}\rightarrow \infty \), our model is reduced to inventory systems with MinOQ requirements, which are well studied in Zhao and Katehakis (2006), Zhou et al. (2007), and Kiesmüller et al. (2011).

The sequence of events is as follows. A possibly outstanding order arrives at the beginning of each period. The inventory position is then reviewed and an order is placed if necessary. During the period, the stochastic demand is realized, and unsatisfied demands (if any) are fully backlogged. At the end of each period, the inventory cost is evaluated. Let \(Z^+=\max \{0,Z\}\) and denote the average inventory position after ordering by y. The long-run average cost function C(y) is given by

$$\begin{aligned} C(y)=h{\mathbb {E}}[(y-D)^+ ]+b{\mathbb {E}}[(D-y)^+], \end{aligned}$$

where y and D are both integers. It is easy to see that C(y) is convex. Let \(y^*\) be a minimizer of C(y). We also assume that \(M_{min}\) and \(M_{max}\) are both integers. In the remainder of this paper, we use [A, B] to denote the integer numbers between A and B (if A and B are integers, they are included).

4 Model analysis

In this section, we first introduce our proposed policy–the modified (sS) policy. Then we present how to compute the long-run average cost of the proposed policy.

4.1 Policy description

Zhao and Katehakis (2006) first find the structure of the optimal policy for the system with an MinOQ to be rather complex and conclude that such an optimal policy is not practically implementable. The presence of MaxOQ constraint makes the problem even more complicated. Therefore, it is necessary to develop easily implementable policies that have good performance. Motivated by the modified echelon (rQ) policies studied by Hu and Yang (2014) and Zhu et al. (2021) for multi-echelon systems without order quantity constraints and based on the analysis of Zhou et al. (2007) and Kiesmüller et al. (2011), we propose a modified (sS) policy for single-stage systems with both MinOQ and MaxOQ requirements.

Definition 1

(The modified (s S) policy) Under the modified (sS) policy, when the inventory position (before ordering) x is at or below the reorder level s, an order is placed to raise the inventory position (after ordering) y as close as possible to the order-up-to level S. Notably, the order size q must satisfy the MinOQ and MaxOQ constraints, i.e., \(M_{min}\le q\le M_{max}\).

Compared to the standard (sS) policy, the most important difference of the modified (sS) policy is that it does not require the inventory position after ordering y to be exactly the order-up-to level S. Rather, here, the modified echelon (sS) policy only requires y to approach S as close as possible but not necessarily to reach S, which implies that y can actually be either higher or lower than S.

Note that under the modified echelon (rQ) policy studied in multi-echelon systems (see Hu and Yang 2014; Zhu et al. 2021), an order is placed to raise the inventory position as close as possible to the order-up-to level. Following their spirit, we also refer to our proposed policy as a modified (sS) policy. However, there is also one fundamental difference between our policy and theirs. In their work, y at retailers must be equal to or less than the order-up-to level S (otherwise, their methods fail to work), while here under our policy, y can be higher than S. The reason for this is that the order quantity must satisfy the MinOQ and MaxOQ requirements. Specifically, in some scenarios, even an order of the largest size \(M_{max}\) fails to raise the inventory position to S. If so, we have \(y=x+M_{max}<S\). In some other scenarios, an order of the smallest size \(M_{min}\) raises y above S. If x is less than the reorder level s (probably close to s), such an order has to be placed to raise y to \(x+M_{min}\) above S.

Due to the complicated relationship among multiple parameters, including \(M_{min}\), \(M_{max}\), s, S and \(D_{max}\), the modified (sS) policy does not have a uniform expression for q. Therefore, we next separately consider six different cases. Simply speaking, the six cases are obtained based on 1) the relationship among \(M_{min}\), \(M_{max}\) and \(S-s\); and 2) the relation ship among \(M_{min}\), \(M_{max}\) and \(S-(s+1-D_{max})\). Note that \(s+1-D_{max}\) is the lowest possible inventory position x before ordering in the modified (sS) policy, and thus \(S-(s+1-D_{max})\) denotes the largest possible gap between x and S. Following the same logic, when an order is to be placed, s is the highest possible inventory position x before ordering, and \(S-s\) is the smallest possible gap between x and S.

4.2 Cost evaluation

As mentioned above, the order quantity q can have different expressions across six cases. Therefore, to evaluate the system-wide cost, we need to consider all possible cases. In this subsection, we provide cost evaluation for each separate case. Specifically, we provide detailed analysis for Case 1, and expressions of q for Cases 2–6 in the main body. The remaining analysis for Cases 2–6 follows the same logic of that for Case 1, and thus we relegate the details to online appendix. The purpose of this analysis is to compute the long-run average cost for the proposed modified (sS) policy, for any given pair of (sS). In next section, we will study how to select the values of s and S to minimize the total cost.

Case 1: \(M_{min}\le S-s\le M_{max}\) and \(S-(s+1-D_{max})\le M_{max}\). In this case, the gap between S and \(s+1-D_{max}\) is less than \(M_{max}\), this implies that the order-up-to level S can always be achieved by placing an order with the order size \(q\le M_{max}\). In addition, because \(S-s\ge M_{min}\), the order-up-to level S can always be reached exactly by placing an order with the order size \(q\ge M_{min}\). Therefore, in this case, the modified (sS) policy is reduced to the standard (sS) policy.

To better illustrate the system dynamics, we take period n as an example. Let \(x_n\) and \(y_n\) be the inventory positions before and after ordering in period n, respectively. Then, by definition, the order quantity in period n, denoted by \(q_n\), under the modified (sS) policy is given by

$$\begin{aligned} q_{n}=y_n-x_n=\left\{ \begin{array}{ll} 0, &{} \hbox {if }x_n>s; \\ S-x_n, &{} \hbox {if }x_n\le s. \end{array} \right. \end{aligned}$$

It follows that \(y_{n+1}\) in this case can be expressed as

$$\begin{aligned} y_{n+1}=\left\{ \begin{array}{ll} y_n-D_n, &{} \hbox {if }y_n-D_n>s; \\ S, &{} \hbox {if }y_n-D_n\le s, \end{array} \right. \end{aligned}$$

where \(D_n\) denotes the demand in period n. We can see that \(\{y_n\}\) is a discrete-time Markov chain (DTMC) and has a finite state space \([s+1, S]\). The state space can be divided into two segments:

  1. 1.

    \([s+1, S-1]\): if \(y_{n+1}\in [s+1, S-1]\), then this means that \(q_{n+1}=0\). It follows that \(D_n=y_n-y_{n+1}\).

  2. 2.

    S: if \(y_{n+1}=S\), then \(q_{n+1}=0\) or \(q_{n+1}=S-(y_n-D_n)\). Specifically, when \(q_{n+1}=0\), then \(D_n=y_n-y_{n+1}\); when \(q_{n+1}=S-(y_n-D_n)\), then \(y_n-D_n \le s\), i.e., \(D_n \ge y_n-s\).

For each state \(y_{n+1}\in [s+1,S]\), we have \(q_{n+1}=0\) or \(q_{n+1}=S-(y_n-D_n)\). Specifically, when \(q_{n+1}=0\), then \(D_n=y_n-y_{n+1}\); when \(q_{n+1}=S-(y_n-D_n)\), then \(D_n\ge y_n-s\).

It is easy to compute the transition probabilities \(P_{i,j}=Prob(y_{n+1}=j|y_n=i)\), and hence the transition matrix \({\mathbf {P}}\) for any \(i\in [s+1,S]\) is given by

$$\begin{aligned} P_{i,j}=\left\{ \begin{array}{ll} p_{(i-j)^+}, &{} \hbox {for }j\in [s+1,S-1] \\ \sum _{m=i}^{\infty } p_{(m-s)}+p_{(i-j)^+}, &{} \hbox {for }j=S \\ \end{array} \right. \end{aligned}$$
(1)

where \(p_k=Prob(D_n=k)\) and \(p_{a^+}\) equals \(p_{a}\) if \(a\ge 0\) and zero otherwise. Note that for a given pair of (sS), the transition matrix \({\mathbf {P}}\) is of a size \((S-s)\times (S-s)\). When computing \(P_{i,j}\), what really matters is the relatively location of the component in the matrix, i.e., the interaction of which row and which column. Once the row and column numbers are known, then \({\mathbf {P}}\) is readily computed, regardless of s and S. That is, the transition matrix \({\mathbf {P}}\) is independent of s and S.

Because the Markov chain is irreducible and positive recurrent, unique steady-state probabilities \(\vec {\pi }=\{\pi _1,\pi _2,\ldots , \pi _{S-s}\}\) exist, where \(\pi _i\) denotes the long-run average proportion of time in which the inventory position y is \(s+i\). Because the Markov chain is also aperiodic, \(\pi _i\) is also the limiting probability that the chain is in state i. Let \(n\rightarrow \infty \); we can have

$$\begin{aligned} \left\{ \begin{array}{ll} \sum _{i=1}^{S-s} \pi _i=1\\ \vec {\pi }{\mathbf {P}}=\vec {\pi } \end{array} \right. \end{aligned}$$
(2)

Therefore, we can calculate the stationary probabilities by solving the linear equations (2). We note that given the MinOQ \(M_{min}\), the MaxOQ \(M_{max}\), the demand distribution D and the fact that the values of s and S satisfy the conditions for this case (i.e., \(M_{min}\le S-s\le M_{max}\) and \(S-(s+1-D_{max})\le M_{max}\)), the transition matrix \({\mathbf {P}}\) and corresponding stationary probabilities \(\vec {\pi }\) depend only on \(\Delta \equiv S-s\) and are independent of the values of s and S. For any given \(\Delta \) in this case, there exists one and only one corresponding \({\mathbf {P}}\) and \(\vec {\pi }\). Now, we can calculate the long-run average cost for this case:

$$\begin{aligned} L(\Delta ,s)=\sum _{i=1}^{\Delta } \pi _iC(s+i). \end{aligned}$$
(3)

The analysis of other cases is similar to that for Case 1. Therefore, we relegate it to the online appendix. Below for each case, we provide the expressions of \(q_n\). Readers are referred to online appendix for detailed analysis for each case.

Case 2: \(M_{min}\le S-s\le M_{max}\) and \(S-(s+1-D_{max})\ge M_{max}\). In this case, the lowest possible inventory position \(x_n=s+1-D_{max}\) is far below S with a gap greater than \(M_{max}\). Therefore, in some scenarios, even an order of the largest size \(M_{max}\) cannot raise the inventory position to S. The inventory position \(y_n\) in such scenarios is raised as close as possible to S but is still less than S. By definition, \(q_n\) under the modified (sS) policy in this case is given by

$$\begin{aligned} q_{n}=y_n-x_n=\left\{ \begin{array}{ll} 0, &{} \hbox {if }x_n>s; \\ S-x_n, &{} \hbox {if }S-M_{max}<x_n\le s; \\ M_{max}, &{} \hbox {if }x_n\le S-M_{max}. \end{array} \right. \end{aligned}$$

Case 3: \(S-s> M_{max}\). It follows that \(s+M_{max}\le S\). It is easy to see that in this case, whenever \(x_n\) reaches s or below, an order of the largest size \(M_{max}\) is placed to raise \(y_n\) as close as possible to S, but \(y_n\) in this case is always less than S because \(s+M_{max}\le S\). The order quantity in this case is given by

$$\begin{aligned} q_{n}=y_n-x_n=\left\{ \begin{array}{ll} 0, &{} \hbox {if }x_n>s; \\ M_{max}, &{} \hbox {if }x_n\le s. \end{array} \right. \end{aligned}$$

Case 4: \(S-s< M_{min}\) and \(M_{min}\le S-(s+1-D_{max})\le M_{max}\). In this case, when \(x_n\) is slightly lower than s, by the definition of the proposed policy, an order of the smallest size \(M_{min}\) has to be placed to raise the inventory position. By \(S-s<M_{min}\), the inventory position after ordering \(y_n\) can be higher than the order-up-to level S. The order quantity in this case is given by

$$\begin{aligned} q_{n}=y_n-x_n=\left\{ \begin{array}{ll} 0, &{} \hbox {if }x_n>s; \\ M_{min}, &{} \hbox {if }s-M_{min}\le x_n\le s;\\ S-x_n, &{} \hbox {if }x_n<s-M_{min}. \end{array} \right. \end{aligned}$$

Case 5: \(S-s<M_{min}\) and \(S-(s+1-D_{max})\ge M_{max}\). This is the most complicated case. First, by \(S-(s+1-D_{max})\ge M_{max}\), we have \(s+1-D_{max}+M_{max}\le S\). This implies that in some scenarios, even an order of the largest size \(M_{max}\) cannot raise \(y_n\) exactly to S. Second, by \(S-s<M_{min}\), in some scenarios when \(x_n\) is slightly lower than s, an order of the smallest size \(M_{min}\) is placed, and the resulting \(y_n\) is higher than S. By definition, the order quantity is given by

$$\begin{aligned} q_{n}=y_n-x_n=\left\{ \begin{array}{ll} 0, &{} \hbox {if }x_n>s; \\ M_{min}, &{} \hbox {if }S-M_{min}\le x_n\le s;\\ S-x_n, &{} \hbox {if }S-M_{max}<x_n<S-M_{min}; \\ M_{max}, &{} \hbox {if }x_n\le S-M_{max}. \end{array} \right. \end{aligned}$$

Case 6: \(S-s< M_{min}\) and \(S-(s+1-D_{max})\le M_{min}\). In this case, we have \(S-s<M_{min}-D_{max}+1 \le M_{min}\). Therefore, whenever \(x_n\) reaches s or below, an order of the smallest size \(M_{min}\) is placed, and the inventory position \(y_n\) is not lower than S. The order quantity is given by

$$\begin{aligned} q_{n}=y_n-x_n=\left\{ \begin{array}{ll} 0, &{} \hbox {if }x_n>s; \\ M_{min}, &{} \hbox {if }x_n\le s. \end{array} \right. \end{aligned}$$

To summarize, for each case (possible combination of s, S, \(D_{max}\), \(M_{min}\) and \(M_{max}\)), we first provide the corresponding order quantity \(q_n\) based on the inventory position before ordering \(x_n\). Then, we can have the inventory position after ordering \(y_{n+1}\) based on \(x_{n+1}\), or equivalently \(y_n-D_n\). This enable us to formulate \(\{y_n\}\) as a discrete-time Markov chain. We then characterize the state space of the Markov chain, and further divide the state space into several segments based on the order quantity needed to reach the corresponding state space. We also express the transition probability between states as a transition matrix \({\mathbf {P}}\). Given \({\mathbf {P}}\), the steady-state probability \(\vec {\pi }\) can be obtained by solving the equation \(\vec {\pi }{\mathbf {P}}=\vec {\pi }\) with the summation of all elements in \(\vec {\pi }\) equal to 1. It follows that the long-run average cost can be easily computed with \(\vec {\pi }\). We provide detailed analysis of all six cases in online appendix.

Combining Cases 1 to 6, we summarize all six cases in Table 1, in an increasing order of \(\Delta >0\). In particular, note that Cases 1 and 5 are mutually exclusive. Specifically, if \(M_{max}-D_{max}+1 \le \Delta \le M_{min}\), we have Case 5, while Case 1 is impossible; if \(M_{min}\le \Delta \le M_{max}-D_{max}+1\), then Case 1 will arise with Case 5 being impossible.

Table 1 Summary of Cases 1 to 6

5 Policy optimization

Thus far, we have introduced the cost evaluation method for the modified (sS) policy. For any given pair of \((\Delta ,s)\), we are able to compute the long-run average cost \(L(\Delta ,s)\) using the method mentioned above. In this section, we will discuss how to optimize our proposed policy by selecting optimal policy parameters.

We first characterize the following property of \(L(\Delta , s)\).

Proposition 1

In each case, for any given \(\Delta \), \(L(\Delta , s)\) is always convex in s.

Proof of Proposition 1. Note that \(L(\Delta ,s)\) is expressed as the sum of several functions \(C(\cdot )\). In addition, the number of functions is fixed and independent of s, i.e., \(\Delta \) in Cases 1–3 and \(M_{min}\) in Cases 4–6, and \(C(\cdot )\) is convex for each i, so is their sum. Moreover, for a given \(\Delta \), the transition matrix \({\mathbf {P}}\) depends only on \(\Delta \). It follows that the limiting probability \(\vec {\pi }\) depends only on \(\Delta \), and is independent of s or S. Therefore, \(L(\Delta , s)\) is convex in s for any given \(\Delta \). \(\square \)

Proposition 1 establishes the convexity of \(L(\Delta , \cdot )\). Let \({\hat{s}}^{*}(\Delta )\) be the value of s at which L reaches its minimum for a given \(\Delta \). Note that \({\hat{s}}^{*}(\Delta )\) is a function of \(\Delta \). For convenience of notation, we use \(s^{*}_{\Delta }\) to replace \({\hat{s}}^{*}(\Delta )\); that is, \(s^{*}_{\Delta }\) denotes the corresponding optimal value of s for a given \(\Delta \).

In addition, observing Cases 1–6, \(\{y_n\}\) is always a DTMC with a finite state space \([s+1, s+a_\Delta ]\), where \(a_\Delta \) denotes the length of the vector \(\vec {\pi }\) for a given \(\Delta \). That is, \(a_\Delta \) takes a value of \(\Delta \) when \(\Delta \) satisfies the conditions of Cases 1–3 and takes a value of \(M_{min}\) when the conditions of Cases 4–6 are satisfied. Given this, the long-run average cost in this case can be generalized as \(L(\Delta ,s)=\sum _{i=1}^{a_\Delta }\pi _i C(s+i)\).

Proposition 2

Given \(\Delta \), \(s^{*}_{\Delta }\) in each case always satisfies \(s^{*}_{\Delta }<y^*\le s^{*}_{\Delta }+a_\Delta \).

Proof of Proposition 2. Based on the definitions of \(s^{*}_{\Delta }\) and \(y^*\), we prove the proposition by contradiction. First, assume that \(s^{*}_{\Delta }\ge y^*\). Because C(y) is non-decreasing when \(y\ge y^*\), \(C(s^{*}_{\Delta })\ge C(y^*)\), and hence \(C(s^{*}_\Delta +j)\ge C(y^*+j)\), \(\forall j\in [1,a_\Delta ]\). Therefore, we should have \(L(\Delta ,s^{*}_{\Delta })=\sum _{j=1}^{a_\Delta } \pi _j C(s^{*}_\Delta +j)\ge \sum _{j=1}^{a_\Delta } \pi _j C(y^*+j)=L(\Delta ,y^*)\). This contradicts the definition of \(s^{*}_{\Delta }\), and hence \(s^{*}_{\Delta }\le y^*\). We can also prove that \(y^*\le s^{*}_{\Delta }+\Delta \) by contradiction. Assume that \(y^*>s^*_{\Delta }+\Delta \), or equivalently, \(s^*_{\Delta }<y^*-\Delta \). Because C(y) is convex and \(y^*\) is the minimizer of C(y), it can be easily seen that \(C(s^{*}_{\Delta })>C(y^*-\Delta )>C(y^*)\). Therefore, \(C(s^{*}_{\Delta }+j)>C(y^*-\Delta +j)\), \(\forall j\in [1, a_\Delta ]\), and hence \(L(\Delta , s^{*}_{\Delta })>L(\Delta , y^*-\Delta )\). This means that \(s^{*}_{\Delta }\) is not optimal, which contradicts the definition of \(s^{*}_{\Delta }\). Therefore, we have \(y^*\le s^{*}_{\Delta }+\Delta \). \(\square \)

For any given \(\Delta \), Proposition 2 provides lower and upper bounds of \(s^*_{\Delta }\): \(y^*-a\le s^*_{\Delta }<y^*\). The narrowed search space significantly facilitates the search for \(s^*_{\Delta }\).

The optimal inventory parameters \((\Delta ^*, s^*)\) should be selected as those that minimize \(L(\Delta ,s)\). Now, the only remaining problem is how to search for \(\Delta ^*\). Once \(\Delta ^*\) is found, we can obtain \(s^*=s^*_{\Delta ^*}\) by applying the results in Proposition 2. The following lemma provides an upper bound on \(\Delta ^*\).

Proposition 3

There always exists \(\Delta ^*\) such that \(\Delta ^*\le M_{max}\).

Proof of Proposition 3. Suppose that there exists a pair of \((\Delta ^*, s^*_{\Delta ^*})\) such that \(\Delta ^*> M_{max}\) and \(s^*_{\Delta ^*}\) minimizes \(L(\Delta ^*,s)\) over s. Then, by definition, the inventory position after ordering \(\{y_n\}\) under this policy has a finite state space \([s^*_{\Delta ^*}+1, s^*_{\Delta ^*}+M_{max}]\). It can be easily checked that the cost of such a system always equals that of a system with \(\Delta =M_{max}\) and \(s=s^*_{M_{max}}\). \(\square \)

Based on the preceding propositions, we design an efficient and easy-to-implement algorithm (see Table 2) to compute \(\Delta ^*\) and \(s^{*}\) that minimize the long-run average cost.

Table 2 Algorithm for finding \(\Delta ^*\) and \(s^*\) in the modified (sS) policy

The complexity of solving linear equations (e.g., Eq.  (2)) to obtain steady-state probabilities is \(O(a^3)\) if Gaussian elimination is used. In addition, by Proposition 3, \(\Delta \) can take \(M_{max}\) different values; therefore, the total complexity of the algorithm is \(O(M_{max}^4)\).

6 Numerical experiments

In this section, we conduct numerical experiments to test the performance of the modified (sS) policy and measure the impact of inventory parameters.

6.1 Performance of the modified (sS) policy

In our numerical studies, we set \(M_{max}=100\) and assume that demands follow a Poisson distribution, as in Zhou et al. (2007) and Kiesmüller et al. (2011). We then conduct numerical studies with respect to the following parameters: the minimum order quantity \(M_{min}\), the expected demand per period E(D), and the backlogging cost b. Specifically, \(M_{min}\) takes values of 30, 40, 50, 60, 70, 80 and 90, E(D) takes values of 10, 20, 30, 40 and 50, and b takes values of 4, 9, 19, 49 and 99. Since we fix \(h=1\), \(b/(h+b)\) takes values of 0.80, 0.90, 0.95, 0.98 and 0.99. The complete set of parameter values is given in Table 3. All combinations of these parameters provide \(7*5*5=175\) test instances.

Table 3 Base parameter values for the numerical experiments

To better illustrate the performance of the modified (sS) policy, we compare it with a benchmark. We choose as a benchmark the “optimal policy" that achieves the minimal average cost among all admissible policies. We use the value iteration method to compute the optimal long-run average cost, and the optimal policy is computed as follows. We initially compute the minimal average cost of a certain number of periods. Then, we continue increasing for a fixed number of periods, computing the minimal average cost of these periods, and comparing the deviation of the two costs. The iteration does not end until the deviation is insensitive to the increments of periods. The cost of such a policy is widely adopted as a benchmark in the inventory management literature with an MinOQ requirement; see Zhou et al. (2007); Kiesmüller et al. (2011). We compare the long-run average cost of our proposed modified (sS) policy to this optimal cost. Denote the average cost of the (sS) policy by \(C^*\), i.e., \(C^*=L(\Delta ^*, s^*)\), and the optimal cost by \(C_{OPT}\). For each instance, we use G to denote the gap between the costs of these two policies as follows:

$$\begin{aligned} G=\frac{C^*-C_{OPT}}{C^*}\times 100\%. \end{aligned}$$

We calculate the average, maximal, and minimal gaps, which are denoted by avg G, max G, and min G, respectively. The numerical results are reported in Table 4.

Table 4 Performance of the modified (sS) policy
Table 5 Distribution of G

It can be observed from the last line of Table 4 that the overall performance of the proposed heuristic policy is excellent. The average gap G between the costs of the proposed and optimal policies is 1.19%, and the minimum G can be as small as less than 0.01%Footnote 6, while even the maximum G is approximately \(10\%\).

Table 5 presents the distribution information of G. As shown in Table 5, G is less than 0.05% in more than 40% of all instances. In addition, G is less than 0.5% and 1% in about 60% and 70% of instances, respectively. For approximately 85% of instances, G is less than 3%. All these results show that the performance of the modified (sS) policy is quite good and robust.

We now revisit Table 4 and conduct sensitivity analysis with respect to the inventory parameters. First, Table 4 shows that as the backlogging cost b increases, or equivalently, as the ratio \(b/(h+b)\) increases, the gap G also increases. In our numerical examples, avg G is raised from 0.72 to 1.35% when \(b/(h+b)\) increases from 0.8 to 0.99.

Second, as shown in Table 4, G tends to first increases and then decreases, as \(M_{min}\) increases. The intuition behind this is that when \(M_{min}\) is relatively small, the impact of MinOQ requirement is not very significant. As as \(M_{min}\) increases, the gap becomes larger, because the MinOQ requirement becomes more stringent. When \(M_{min}\) is relatively large, although the MinOQ requirement is quite stringent, the action space becomes limited as \(M_{min}\) increases and approaches to \(M_{max}\). Due to such a limited action space, the performance of the heuristic and optimal policy becomes close. As a result, when \(M_{min}\) is quite large (closed to \(M_{max}\)), G is relatively small.

Third, Table 4 also shows that G tends to increases when E(D) increases. The possible reason for this observation may be demand variability. Note that when E(D) increases for a Poisson-distributed demand, the demand variability, a direct measure of which may be the variance of demands, also increases.

Finally, we study how the optimal policy parameters \(s^*\) and \(S^*\) vary with respect to inventory system parameters, including \(M_{min}\), E(D), h and b. We plot the results in Fig. 1. The base parameters are chosen as \(h=1\), \(b=4\), \(M_{min}=30\) and \(E(D)=10\), and each time, we vary one of them. First, it can be observed from Fig. 1a that \(s^*\) and \(S^*\) are decreasing and increasing in \(M_{min}\), respectively, resulting in \(\Delta ^*\) increasing in \(M_{min}\). This is because as \(M_{min}\) increases, the MinOQ requirement becomes more stringent. As a result, \(\Delta ^*\) in turn increases to accommodate a larger MinOQ. Second, as shown in Fig.  1b, \(s^*\) and \(S^*\) tend to increase as E(D) increases. This observation is intuitive given the results in Proposition 2. Finally, combining Fig. 1c, d, it is shown that \(s^*\) and \(S^*\) increase as \(b/(h+b)\) increases. This is because when h increases, the firm tends to hold less inventory; on the other hand, when b increases, the firm would like to stock more inventory to reduce the possible backlogging/penalty cost.

Fig. 1
figure 1

Optimal policy parameters \(s^*\) and \(S^*\) with respect to inventory system parameters

7 Conclusion

In this work, we study stochastic inventory systems with both MinOQ and MaxOQ requirements. The optimal policy of such a system is unknown, and even if it exists, it must be quite complicated. Motivated by recent work on multi-echelon inventory systems, we propose a simple heuristic policy for which we are able to compute the system-wide costs by applying a Markov chain approach. We also derive bounds for the optimal parameters, which facilitate the optimization of the policy. We conduct numerical studies to verify the effectiveness of the proposed policy. The heuristic policy has an excellent performance with a gap (compared to the optimal cost) of less than 0.05% in more than half of all instances. In addition, we also conduct sensitivity analysis to investigate the impact of some inventory parameters.

Our work also has limitations and can be extended in several directions. First, we consider an infinite horizon problem under the average cost criterion. An immediate question is whether and how the modified (sS) policies can be used in finite horizon models. We believe that such an extension is interesting and nontrivial. Because the steady-state method may not continue to work in finite horizon models, the classical dynamic programming technique will be used in solving inventory management problems. Second, our work studies a single-stage, single-product system. One may extend our work to more complicated model settings, e.g., multi-echelon (e.g., Yuan et al. 2020), multi-product or batch-ordering systems.