1 Introduction

Flow lines consist of a number of stations that are arranged in series and separated by limited buffer spaces. The workpieces flow through the system from station to station, waiting in the buffer if the downstream station is not available. This type of production system is often applied in practice, mainly for mass production. Examples can be found in the automotive industry (Colledani et al. 2010; Li 2013) and in food production (Cooke et al. 2005; Liberopoulos and Tsarouhas 2005), among others.

Burman et al. (1998) note that there is a great potential in the systematic optimization of the buffer allocation in such stochastic flow lines, as it highly influences the performance of the line. The stochastic influences are due to random machine breakdowns, uncertain times of repair, and random processing times. This can lead to blocking and starvation of the stations, which may lead to a reduction of the throughput. The allocation of additional buffer space may increase the throughput, although it leads to an increase of the average work in process in the line. In this paper, we develop an optimization approach for the buffer allocation in a linear flow line with all those stochastic influences.

Two basic streams of research can be found regarding the allocation of buffers in stochastic flow lines: performance evaluation and optimization. Dallery and Gershwin (1992) and Gershwin and Schor (2000) provide an overview of the different evaluation approaches. Exact evaluation is only possible for small lines as analytical results are difficult to obtain (Li and Meerkov 2009). For longer lines, simulation and other approximative methods, e.g., decomposition or aggregation, are applied. The methods proposed in the literature on performance evaluation can also be used as integral parts of optimization approaches by applying generative methods and evaluative methods iteratively. The generative methods are used to obtain candidate solutions that are then evaluated. The optimization of buffer allocations, referred to as Buffer Allocation Problem (BAP) in literature, is NP-hard (MacGregor Smith and Cruz 2005). Three types of objective functions can be found: minimization of the total buffer capacities with respect to a given goal throughput, throughput maximization with respect to a limited number of buffers, and profit maximization. This paper focuses on the minimization of total buffer capacities.

The optimization approaches can be divided into exact approaches, heuristics, and rules of thumb. Demir et al. (2014) provide an overview on the approaches published after 1998. Exact approaches only exist for small lines because of the combinatorial complexity and the lack of exact evaluation methods (MacGregor Smith and Cruz 2005; Li and Meerkov 2009). Recently, sample-based approaches have been proposed to optimize flow lines with limited buffer capacities. For sufficiently large sample sizes, the obtained allocations converge to the exact solution. Matta and Chefson (2005) propose an iterative change of configurations to determine buffer allocations based on a mathematical programming formulation developed by Schruben (2000) and Chan and Schruben (2008). Matta (2008) presents an exact mixed-integer programming (MIP) formulation that optimizes the number of buffer spaces behind each station, using samples of the processing times in continuous time.

Heuristic methods based on samples are developed by Gürkan (2000), Helber et al. (2011) and Alfieri and Matta (2012, 2013). Gürkan (2000) uses sample-based gradient estimates of performance measures to obtain buffer allocations in continuous lines. She points out that this approach may also be used to approximate lines with discrete goods. Helber et al. (2011) present a discrete-time linear programming (LP) formulation that incorporates the BAP. The authors use sampling to transform the stochastic processing times of the different jobs at a given station into the corresponding realizations of production capacities per discrete time period. This method leads to simulation and discretization errors. Alfieri and Matta (2012) introduce the concept of time buffers, which can be used to derive approximate buffer allocations. This approach can also be applied to reduce the feasible region of the buffer capacities as necessary in Matta (2008). Recently, Alfieri and Matta (2013) proposed a time-based decomposition approach that solves the mathematical programming formulation by iteratively solving a number of subsystems. These subsystems contain only a portion of the entities in the whole model. The subsystems are connected via additional constraints reflecting the status of the system defined by previous subsystems. Other heuristic methods include Tabu Search and Simulated Annealing, as generative methods, in combination with simulation or decomposition, as evaluation methods (Lutz et al. 1998; Spinellis and Papadopoulos 2000). Yamashita and Altiok (1998) and Diamantidis and Papadopoulos (2004) apply Dynamic Programming in combination with decomposition or aggregation. In addition to the risk of obtaining local optima as final solutions, some of these methods are based on restrictive assumptions. Caramanis (1987) applies Generalized Benders Decomposition with gradient estimates for performance approximation. However, due to errors in the gradient estimates, optimal solutions cannot be guaranteed. Li and Meerkov (2009) propose heuristics based on closed formulas and recursion approaches. They show that these heuristics are fast, but do not necessarily provide good allocations.

Rules of thumb based on extensive numerical studies are proposed by Hillier et al. (1993), Powell and Pyke (1996), and Hillier (2000). However, these results may not be generalized, and a large computational effort is needed for their derivation.

This paper deals with exact sample-based MIP formulations, i.e., the obtained results are sample-exact. The advantage of these sampling approaches compared to other approaches proposed in literature is based on their flexibility: besides the ability to cope with both reliable and unreliable lines, they do not require the assumption of statistical independency. The processing times, times to failure, and repair times can follow any distribution, or may be taken from empirical data. However, when using standard solvers, the sample-based MIP formulations proposed in literature remain intractable for flow lines with more than three stations due to extensive computation times (Matta 2008). Therefore, to exploit the flexibility of these approaches, a fast solution method has to be developed. We develop a Benders Decomposition approach for such a MIP formulation of the BAP.

The main contribution of this paper is to develop a Benders Decomposition approach with combinatorial cuts to optimally and efficiently solve the BAP with respect to an underlying sample. The performance of this algorithm is improved via the derivation of initial bounds. The numerical study shows the great degree of flexibility of this approach, as its sample-based structure allows to take account for correlations and arbitrary distributions of processing times, times to failure, and repair times.

This paper is organized as follows. Section 2 introduces the MIP formulation for the optimization of flow lines. In Sect. 3, the Benders Decomposition approach and a procedure to obtain initial bounds are presented. Section 4 provides a numerical study on the performance of Benders Decomposition and the initial bounds. Section 5 presents the conclusions and further research efforts.

2 Sample-based flow line model

This section formulates the evaluation problem and the optimization problem with respect to the buffer allocation in flow lines. First, the underlying assumptions are given in Sect. 2.1. The Benders Decomposition approach is based on iterative generation of candidate allocations and evaluation of these candidates. Therefore, Sect. 2.2 presents a fast simulation algorithm for throughput evaluation for a given buffer allocation. Finally, Sect. 2.3 describes the mixed-integer program for buffer optimization.

The key idea of the sample-based modeling approach is to simulate the flow of a large number of workpieces throughout the line. Therefore, the start and departure times of processing a workpiece, \(w\), at a station, \(s\), are represented by a set of real-valued decision variables. The random processing times are replaced by a deterministic sample. The samples are generated by descriptive sampling (DS) (Saliby 1990a). In DS, deterministic values serve as the input for the inverse distribution function. These values are then shuffled randomly to represent random behavior. This method is more appropriate than simple random sampling (SRS) because it leads to a more precise description of the underlying distribution (Saliby 1990b). Moreover, DS leads to a reduction of the variability of the input sample and therefore to a reduction of the variability of the simulation results. The numerical study in Sect. 4.1 supports this claim in the case of the BAP [see also Stolletz and Weiss (2013)].

The samples consist of effective processing times, i.e., the repair times are assumed to be included in the (raw) processing times. This can be accomplished with a single distribution or the sum of the distributions of processing times and repair times.

2.1 Assumptions

The model of the flow line is based on the following assumptions:

  • The flow line consists of \(S\) stations, which process \(W\) workpieces.

  • A number of \(W_0\) workpieces corresponds to the warm-up phase.

  • The maximum capacity of the buffer behind station \(s\) is limited to \(B_s\).

  • The material supply to the first station is unlimited, i.e., the first station never starves.

  • The buffer behind the last station is infinitely large. Thus, this station cannot be blocked.

  • The processing times of the workpieces at each station are generally distributed or deterministic. The MIP uses sampled processing times, \(d_{s,w}\), for each station, \(s\), and each workpiece, \(w\).

  • The stations may be subject to operation-dependent failures. Times to failure and repair times are generally distributed. Sampled repair times are added to the sampled processing times, \(d_{s,w}\), of the workpiece \(w\), which is processed when the breakdown of station \(s\) occurs.

  • In the event of blocking, the station finishes the currently processed workpiece. Then, the workpiece waits at the station until a buffer space or the following station becomes available (blocking after service).

  • Transportation times are insignificant or are already included in the processing times.

  • A minimum throughput rate of \(\mathrm{TH}^*\) has to be reached after the warm-up.

Figure 1 shows an example of a flow line according to these assumptions.

Fig. 1
figure 1

Flow line under consideration

2.2 Evaluation of given allocations

If the capacities of the buffers are known, the start times and the departure times of each workpiece at each station can be derived using a fast simulation algorithm, as Algorithm 1. The corresponding notation can be found in Table 1.

figure a

The algorithm calculates the start and departure times of each workpiece \(w\) at each station \(s\). The first workpiece starts processing at the first station at time zero (line 1) and flows through the line without ever being blocked, because the line is empty. Consequently, it leaves a station \(s\) after the processing time elapsed (line 3) and starts processing at the subsequent station \(s+1\) as soon as it leaves \(s\) (line 4). Lines 7–24 model the flow of the remaining workpieces. Start times \(\mathrm{XS}_{s,w}\) of stations \(s = 2,\ldots ,S\) depend on the availability of the workpiece \(w\). Since the first station never starves, processing of a workpiece \(w\) starts when \(w-1\) leaves the station (line 10). At stations \(s = 2,\ldots ,S\), it may happen that no workpieces are available. In this case, \(s\) idles until station \(s-1\) provides a workpiece (lines 12 and 22). Departure times \(\mathrm{XF}_{s,w}\) of stations \(s = 2,\ldots ,S-1\) depend on the downstream buffer capacities \(X_s\) and the occurrence of blocking. If the capacities \(X_s\) are set to zero, a workpiece \(w\) leaves station \(s\) after it finished processing and the subsequent station becomes available (that is, workpiece \(w-1\) leaves station \(s+1\), line 15). In contrast, if buffer spaces are allocated behind station \(s\), but the available buffer capacity suffices for all workpieces in the system, blocking can never occur (line 17). If there are more workpieces in the system than buffer capacities at station \(s\), blocking may occur. Therefore, workpiece \(w\) leaves station \(s\) when its processing is completed and a buffer space becomes available (that is, a workpiece leaves the buffer, because it starts processing at station \(s+1\), line 19). The last station \(S\) is never blocked and consequently, workpieces leave this station directly after processing (line 23).

Table 1 Notation for the models

Based on this information, the realized throughput TH is calculated by the fraction of the number of finished parts \(W-W_0\) and the required time \(\mathrm{XF}_{S,W}-\mathrm{XF}_{S,W_0}\) after the warm-up phase:

$$\begin{aligned} \mathrm{TH} = \frac{W-W_0}{\mathrm{XF}_{S,W}-\mathrm{XF}_{S,W_0}} \end{aligned}$$
(1)

2.3 Optimization of buffer allocations

The problem of allocating a minimum number of total buffer spaces while achieving a given minimum throughput can be solved by a MIP formulation as follows. Additionally to the notation used in Sect. 2.2, a binary variable \(Y_{s,b}\) is required to indicate that the buffer capacity behind station \(s\) equals \(b\).

$$\begin{aligned}&\displaystyle \text {Minimize} \,\, \sum _{s=1}^{S-1} X_s&\end{aligned}$$
(2)
$$\begin{aligned} \text {s.t.}\qquad \qquad \mathrm{XS}_{s,w} + d_{s,w} \le \,&\mathrm{XF}_{s,w},&\quad \forall s,\; \forall w \end{aligned}$$
(3)
$$\begin{aligned} \mathrm{XF}_{s,w} \le \,&\mathrm{XS}_{s+1,w} ,&\quad \forall s \le S-1,\; \forall w \end{aligned}$$
(4)
$$\begin{aligned} \mathrm{XF}_{s,w} \le \,&\mathrm{XS}_{s,w+1} ,&\quad \forall s,\; \forall w \le W-1 \end{aligned}$$
(5)
$$\begin{aligned} \mathrm{XF}_{S,W}-\mathrm{XF}_{S,W_0} \le \,&\frac{W-W_0}{\mathrm{TH}^*},&\end{aligned}$$
(6)
$$\begin{aligned} \mathrm{XS}_{s+1,w} - \mathrm{XF}_{s,w+b} \le \,&M \cdot (1-{Y}_{s,b}),&\quad \forall s \le S-1,\; \forall b,\; \forall w \le W-b\end{aligned}$$
(7)
$$\begin{aligned} \sum _{b=0}^{B_s} {Y}_{s,b} =\,&1&\quad \forall s \le S-1,\end{aligned}$$
(8)
$$\begin{aligned} X_{s} =\,&\sum _{b=0}^{B_s} b \cdot {Y}_{s,b}&\quad \forall s \le S-1,\end{aligned}$$
(9)
$$\begin{aligned} \mathrm{XS}_{s,w},\mathrm{XF}_{s,w} \ge \,&0,&\quad \forall s,\; \forall w\end{aligned}$$
(10)
$$\begin{aligned} Y_{s,b} \in \,&\{0,1\}&\quad \forall s \le S-1,\; \forall b \end{aligned}$$
(11)

The objective function (2) minimizes the overall number of buffer spaces in the line. The constraints are linearizations of the formulas given in Algorithm 1. Constraint (3) states that a workpiece, \(w\), departs from station \(s\) at the earliest time after being processed. Consequently, the slack of the inequality corresponds to the blocking time of workpiece \(w\) after being processed at station \(s\). A workpiece cannot start being processed by station \(s+1\) until it departs from station \(s\). This is ensured by the inequality described by (4). The slack of this inequality defines the waiting time of workpiece \(w\) in the buffer between station \(s\) and station \(s+1\). As a station can only process one workpiece at a given time, the inequality in (5) states that workpiece \(w+1\) does not start processing at station \(s\) until the preceding workpiece \(w\) departs from this station. A station may starve between the processing of two consecutive workpieces, which is equivalent to the slack of Constraint (5). Inequality (6) ensures that a minimum throughput, \(\mathrm{TH}^*\), is reached [see Equality (1)]. Constraint (7) states that the buffer capacity is not exceeded. If \(b = X_s\), the inequality ensures that workpiece \(w\) departs from the buffer between stations \(s\) and \(s+1\) before workpiece \(w+b\) enters. Otherwise, the inequality is deactivated by the BigM on the right-hand side (RHS). We choose BigM as the product of the maximum possible buffer capacity, \(\max _s B_s\), and the maximum processing time, \(\max _{s,w} d_{s,w}\). If there is no buffer between station \(s\) and station \(s+1\), i.e., \(b = 0\), the inequality reduces to \(\mathrm{XS}_{s+1,w} \le \mathrm{XF}_{s,w}\). Together with Inequality (4), the departure time of workpiece \(w\) at station \(s\) is assured to equal the starting time of \(w\) at station \(s+1\). Compared to the formulation presented by Matta (2008), we assume blocking after service instead of blocking before service. The capacity of each buffer between two stations must be unique. This is stated in Eq. (8). Constraint (9) connects the (redundant) buffer space variables \(X_s\) and the binary variables \(Y_{s,b}\). Variables \(X_s\) are used for notational convenience.

Note that the combination of Equalities (4) and (5) determines the start times as in Algorithm 1. Accordingly, the combination of Eqs. (3) and (7) determines the completion times.

If the buffer capacities behind each station are given, the MIP can also be used for evaluation (instead of Algorithm 1). However, the throughput may be overestimated, because the warm-up phase is based on the number of workpieces instead of a specific point in time. This results in a degree of freedom regarding the start and departure times in the warm-up phase of the optimal solution. Due to this flexibility, the workpieces do not necessarily start processing as soon as possible. To avoid this overestimation, the start and departure times have to be added to the objective function (2).

3 Application of Benders Decomposition to the Buffer Allocation Problem

The complexity of the MIP presented in the previous section incurs long computation times. Therefore, it is necessary to apply certain techniques to reduce the computation time. One literature stream concerns decomposition methods, which aim to split the original problem into smaller parts and to solve them iteratively. One of these methods is Benders Decomposition (Benders 1962). Benders Decomposition divides the original problem into a master problem and a subproblem, both of which are solved iteratively. The master problem is a relaxation of the original problem, calculates a solution, and passes it to the subproblem. The subproblem uses this solution to generate cuts that contain information about the feasibility and optimality of the current master solution. These cuts are added to the master problem such that optimality is proven at the termination of the algorithm. Consequently, a sequence of master- and subproblems has to be solved to obtain an optimal solution of the original problem.

3.1 Adjustments and specific features

Figure 2 provides an overview of the decomposition procedure for the BAP. The master problem contains only binary and integer decision variables. The subproblem considers the remaining variables, assuming that the variables of the master problem are fixed. In the case of the MIP formulation presented in Sect. 2, the binary variables, \(Y_{s,b}\), and the integer variables, \(X_s\), become part of the master problem. The real-valued decision variables, \(\mathrm{XS}_{s,w}\) and \(\mathrm{XF}_{s,w}\), belong to the subproblem.

Fig. 2
figure 2

Overview of Benders Decomposition for buffer allocation

Constraints (3)–(6) and (10) only contain real-valued decision variables and thus belong to the subproblem. Constraints (8)–(11) are included in the master problem, as they only contain binary variables. Constraint (7) contains both types of variables. It forms part of the subproblem and contains the master variables, \(X_s\), as parameters. Consequently, Constraint (7) can be replaced by (12).

$$\begin{aligned} \mathrm{XS}_{s+1,w} - \mathrm{XF}_{s,w+X_s} \le \, 0, \quad \forall s \le S-1,\; \forall w \le W-X_s \end{aligned}$$
(12)

Moreover, as the integer variables are assumed to be known when the subproblem is solved, the subproblem reduces to the evaluation version of the MIP. Note that the objective function (2) includes no real-valued decision variables. Thus, the objective function of the master problem is equal to the objective function (2). To avoid an overestimation of the throughput as outlined in Sect. 2.3, we use Algorithm 1 to evaluate the throughput of a given buffer allocation. The feasibility of this throughput is then checked by comparison to the goal throughput, \(\mathrm{TH}^*\).

The information on feasibility is expressed in additional constraints, which include only the integer variables. We add these constraints, called feasibility cuts, to the master problem. If the master problem contains all of the feasibility cuts, it is equivalent to the original problem.

In general, an exponential number of such constraints exists, which are usually not known in advance. Therefore, we consider a relaxed master problem, which includes no feasibility constraints at the beginning of the solution process. By iterating between the relaxed master problem and the subproblem, additional cuts are generated to ensure the feasibility of the final solution. If the subproblem is feasible, the resulting solution is optimal.

Based on Eqs. (2) and (8)–(11), the complete master problem is defined as follows.

$$\begin{aligned}&\displaystyle \text {Minimize} \,\, \sum _{s=1}^{S-1} X_s \end{aligned}$$
(2)
$$\begin{aligned}&\displaystyle \text {s.t.}\quad \sum _{b=0}^{B_s} {Y}_{s,b} =\, 1 \quad \forall s \le S-1, \end{aligned}$$
(8)
$$\begin{aligned}&\displaystyle X_{s} =\, \sum _{b=0}^{B_s} b \cdot {Y}_{s,b} \quad \forall s \le S-1, \end{aligned}$$
(9)
$$\begin{aligned}&\displaystyle \text {Feasibility} \text { cuts} \nonumber \\&\displaystyle Y_{s,b} \in \, \{0,1\} \quad \forall s \le S-1,\; \forall b \end{aligned}$$
(11)

If the master problem is infeasible, the original problem is also infeasible because the master problem is a relaxation of the original problem as long as not all feasibility cuts are added. Because of the restriction of the buffer capacities to \(B_s\), unboundedness cannot occur in the master problem. The subproblem cannot be unbounded because it is a simple evaluation. If the original problem has an optimal solution, the algorithm finishes after a finite number of iterations when the subproblem does not return new feasibility cuts.

As described in the literature on Benders Decomposition, the feasibility cuts are obtained from inequality (13) (classical feasibility cut).

$$\begin{aligned} 0 \ge \! -\left( \sum _{s=1}^{S-1} \sum _{w=1}^{W-b_s} \mu ^h_{5,s,w,b_s} \cdot M \cdot (1\!-Y_{s,b_s}) \!+ \mu ^h_4 \cdot \frac{W-W_0}{\mathrm{TH}^*} - \sum _{s=1}^S \sum _{w=1}^W \mu ^h_{1,s,w} \cdot d_{s,w}\right) \end{aligned}$$
(13)

\(\mu ^h\) is an extreme ray. The cut only contains the binary variables associated with the buffer capacities in the current solution. Note that we use the LP to solve the subproblem in the case of the classical feasibility cut, as information from the dual subproblem is needed for the extreme rays. Because the original formulation uses BigM coefficients in Constraints (7), the classical feasibility cuts (13) are weak. As a solution, Codato and Fischetti (2006) propose combinatorial cuts for Benders Decomposition. These cuts force at least one variable to be changed and exclude the redundant constraints that are caused by the usage of BigM coefficients. For the BAP, more information is available. We develop new Combinatorial Cuts based on the following observations. If the current buffer allocation is infeasible, the capacity of at least one buffer has to be increased. If the buffer capacities are decreased, the throughput remains the same or decreases and the goal throughput cannot be reached. Therefore, all solutions that include only the combinations of smaller buffer capacities than the current solution are known to be infeasible as well. We propose the following combinatorial cut if the current buffer capacity behind station \(s\) equals \(b_s\):

$$\begin{aligned} 1 \le \sum _{s=1}^{S-1} \sum _{b=b_{s}+1}^{B_s} Y_{s,b}. \end{aligned}$$
(14)

The RHS sums all the variables of possible buffer capacities for every station that are larger than the current buffer capacities (\(b > b_{s}\)). At least one of these variables must assume a value of one, i.e., at least one of the buffers increases.

3.2 Generation of lower bounds from subsystems

Figure 3 depicts the solution process using Benders Decomposition with combinatorial cuts for an exemplary flow line with 5 stations, a sample size of \(W = 250{,}000\) workpieces, and a bottleneck at the end of the line. One can observe that the solver takes only a few steps to find upper bounds that are close to the optimum, while the lower bound increases in many small steps. This is because if a candidate solution reaches the goal throughput, the optimal total buffer capacity has to be smaller or equal to the total buffer capacity of this solution. In contrast, if a candidate solution does not fulfill the requirement of the goal throughput, it does not necessarily mean that the total buffer capacity of this solution has to be increased. There may be other solutions with the same total number of buffer spaces (or even less) but with a different allocation that are feasible. Therefore, it is crucial to find appropriate lower bounds.

Fig. 3
figure 3

Course of the lower and upper bounds during the solution process

In literature, numerous studies propose algorithms, which approximate the optimal buffer allocation. Li and Meerkov (2009) propose several approaches to approximate the optimal solution for lines with more than three stations, which have deterministic processing times and stochastic up and down times of the stations. To use these solutions as bounds, they have to be evaluated. Depending on whether the solution is feasible or infeasible, it serves as a feasibility cut or as an upper bound on the total number of buffer spaces. Derivation of guaranteed lower bounds or individual upper bounds cannot be accomplished with these approaches. Therefore, we focus on the generation of guaranteed lower bounds and compare the different strategies in the numerical study in Sect. 4.

We decompose the line into several subsystems assuming that the supply of the first station of each subsystem is unlimited. As a result, the effect of starvation, which can occur in the original line, is neglected for the first station in each subsystem. Additionally, it is assumed that the workpieces can always leave the subsystem. Therefore, the last station of each subsystem is never blocked. Thus, for given buffer capacities, the isolated subsystem will never have a lower throughput than the original system as proven in the following theorems.

Theorem 1

In steady-state, the throughput of a system with unlimited supply at the first station is higher or equal to the throughput of an identical system with limited supply.

Proof

Let \(\mathrm{Arr}_w \ge 0\) be the arrival time of workpiece \(w\) in the system with limited supply. According to Algorithm 1, the start times of workpieces 1 and 2 at the first station of the system with limited supply are calculated from \(\mathrm{XS}_{1,1}^{\mathrm{lim}} = \mathrm{Arr}_w\) and \(\mathrm{XS}_{1,2}^{\mathrm{lim}} = \max \{\mathrm{XF}_{1,1}, \mathrm{Arr}_w\}\), respectively. As \(\mathrm{Arr}_w = 0\) for all \(w\) in the system with unlimited supply, the start times equal \(\mathrm{XS}_{1,1}^{\mathrm{unl}} = 0\) and \(\mathrm{XS}_{1,2}^{\mathrm{unl}} = \mathrm{XF}_{1,1}\). Consequently, \(\mathrm{XS}_{1,2}^{\mathrm{unl}} \le \mathrm{XS}_{1,2}^{\mathrm{lim}}\). With mathematical induction using the above formulas for \(w\) and the formulas of Algorithm 1 to calculate start and departure times, it follows that \(\mathrm{XF}_{S,W}^{\mathrm{unl}} \le \mathrm{XF}_{S,W}^{\mathrm{lim}}\), i.e., less time to produce \(W\) workpieces is required in the unlimited case, and therefore, the throughput of the system with unlimited outflow is higher or equal to the throughput of an identical system with limited outflow. \(\square \)

Theorem 2

In steady-state, the throughput of a system with unlimited outflow at the last station is higher or equal to the throughput of an identical system with limited outflow.

Proof

Let \(\mathrm{Dep}_w \ge 0\) be the time, workpiece \(w\) is allowed to leave the system with limited supply. According to Algorithm 1, the departure time of workpiece 1 at the last station, \(S\), is calculated as \(\mathrm{XF}_{S,1}^{\mathrm{lim}} = \max \left\{ \mathrm{XS}_{S,1}+d_{S,1}, \mathrm{Dep}_1\right\} \) for the system with limited outflow. As \(Dep_w = 0\) for all \(w\) in the system with unlimited outflow, the departure time equals \(\mathrm{XF}_{S,1}^{\mathrm{unl}} = \mathrm{XS}_{S,1}+d_{S,1}\). Consequently, \(\mathrm{XF}_{S,1}^{\mathrm{unl}} \le \mathrm{XF}_{S,1}^{\mathrm{lim}}\). With mathematical induction using the above formulas for \(w\) and the formulas of Algorithm 1 to calculate start and departure times, it follows that \(\mathrm{XF}_{S,W}^{\mathrm{unl}} \le \mathrm{XF}_{S,W}^{\mathrm{lim}}\), i.e., less time to produce \(W\) workpieces is required in the unlimited case, and therefore, the throughput of the system with unlimited outflow is higher or equal to the throughput of an identical system with limited outflow. \(\square \)

Consequently, the optimal buffer capacities of the subsystems are lower than or equal to the optimal buffer capacities in the original line. Levantesi et al. (2001) use lower bounds from subsystems of size 2 as a starting point for a gradient algorithm to approximate optimal buffer allocations in continuous lines.

The larger the subsystems are, the better the original setting is approximated. However, for large subsystems, the computation time may be long. Therefore, we propose an iterative procedure. We first solve subsystems with \(i=2\) stations, as shown in Fig. 4. Each solution of a subsystem provides a certain buffer capacity that forms a lower bound for the respective buffer. These buffer capacities are then used as lower bounds in the original system and all of the subsequent subsystems. In the next step, we solve the subsystems of size \(i \ge 3\). The optimal buffer capacity of each subsystem \(l = 1,\ldots ,S-i+1\) of size \(i\) at station \(s\) is denoted by \(b_{s,l,i}\). Figures 4 and 5 depict a line with 5 stations divided into subsystems of size \(i = 2\) and \(i = 3\), respectively.

Fig. 4
figure 4

Generation of lower bounds via subsystems of size \(i=2\)

Fig. 5
figure 5

Generation of lower bounds via subsystems of size \(i=3\)

In contrast to the subsystems of size \(i=2\), the lower bounds derived from the subsystems of larger sizes do not form bounds for individual buffers. Individual bounds, i.e., \(b_{s,l,i} \le X_s\) for \(i \ge 3\), may force a certain buffer to be larger than necessary in the original line, resulting in a sub-optimal final solution for the original line. This is because the buffer allocation of the subsystem, which is found by the solver, may not be unique, as only the total number of buffer spaces is minimized. However, their sum forms a lower bound for all of the respective buffer capacities in the original line. Figure 5 illustrates this case for a subsystem of size \(i=3\). Inequality (15) presents the bounds obtained from subsystems of size \(i\).

$$\begin{aligned} \sum _{j = 0}^{i-2} b_{j+l,l,i} \le \sum _{j = 0}^{i-2} X_{j+l} \qquad \forall l \end{aligned}$$
(15)

We apply Benders Decomposition to solve each subsystem. The size of the subsystems is increased iteratively, until the size of the original line is reached. This procedure is depicted in Fig. 6.

Fig. 6
figure 6

Overview of bound calculation

4 Numerical study

All of the algorithms are implemented in C++. Gurobi 5.0, with default settings, is used to solve the linear and mixed-integer programs. The numerical study is performed on an Intel Core i7-3930K with 6\(\times \) 3.2 GHz and 32 GB RAM.

For all instances, the capacity of each buffer is limited to \(B_s = 20\), and the warm-up phase is chosen to be \(W_0 = 2000\).

To further speed up the solution process, we use callbacks, i.e., the master problem is not solved to optimality before handing over the values of the binary variables to the subproblem. Instead, a potential incumbent solution (the best integer solution found at any point of the search) is tested by the subproblem algorithm whenever the solver identifies one. If the solution is feasible, it becomes the new incumbent solution, and the solution process continues. Otherwise, a feasibility cut (13) or (14) is added to the master problem. We thereby avoid proving optimality in every step and visiting the nodes several times during different runs of the master problem. Both aspects waste time (Bai and Rubin 2009). Note that an implementation without callbacks would lead to complete enumeration for the BAP.

4.1 A note on robustness

We investigate the robustness based on the instances from the numerical study of Matta (2008). We assume a line with 5 stations and a bottleneck at the end. The processing times are exponentially distributed, with a base processing rate of 7.0. The processing rate of the bottleneck is assumed to be 6.0. The goal throughput is set to 5.776.

Figure 7 depicts the results of a throughput evaluation for different optimal buffer allocations for a varying number of workpieces. These allocations are obtained by independently solving 20 samples with 10,000 (Fig. 7a), 250,000 (Fig. 7b), 1,000,000 (Fig. 7c), and 5,000,000 workpieces (Fig. 7d) each. The throughput evaluation is conducted with 20 additional samples of 5,000,000 workpieces. Figure 7 presents the relative deviation of the minimum, average, and maximum throughput from the goal throughput that is obtained by these 20 samples for each buffer allocation. For 10,000 workpieces (Fig. 7a), the independent optimization of 20 samples leads to 19 different buffer allocations. The total buffer capacity lies between 36 and 44 for the different samples. For a total number of buffer spaces of 39 or above, the goal throughput is always reached, whereas a total number of 37 (or less) is not (even in the best case) sufficient. On average, the goal throughput is reached for the allocations with a buffer capacity of 38 in total. This means that in the case of the allocation with 44 buffer spaces in total, 6 redundant buffer spaces (14 % of the total buffer space needed) are allocated in the line. In Fig. 7b (250,000 workpieces), only 8 different buffer allocations are obtained with a total number of 38 or 39 buffer spaces in the line. On average, the goal throughput is always reached for all allocations. Even in the worst case, the maximum deviation from the goal throughput equals 0.03 %. Figure 7c shows very similar results for \(W=1,000,000\), with a maximum deviation of 0.01 %. Therefore, it can be concluded that 250,000 workpieces is sufficient to obtain robust results for the given configuration. However, for increasing number of stations or increasing squared coefficients of variation (SCV), additional workpieces may be required to obtain robust results, because more different allocations are obtained (see Tables 11 and 12 in the Appendix, respectively). Figure 7d shows that the algorithm converges to a unique solution of the total buffer capacity, i.e., 38 buffer spaces are allocated. Two allocations result from the optimization of 20 samples, which both always reach the goal throughput.

Fig. 7
figure 7figure 7

Robustness of the approach regarding the no. of workpieces (\(S~=~5\), bottleneck last)

Figure 8 shows the results of a throughput evaluation for the different optimal buffer allocations obtained from samples generated with simple random sampling (SRS) instead of descriptive sampling (DS), as explained in Sect. 2 (\(W=250{,}000\)). Compared to the results in Fig. 7b, the total number of buffer spaces varies between 37 and 39. The total number of different solutions obtained for 20 samples is 11 for SRS instead of 8 for DS. Moreover, the maximum deviation from the goal throughput is 0.03 % for DS. In contrast, in the case of SRS, a maximum deviation of 0.15 % is observed. Consequently, this demonstrates that DS leads to more robust results than SRS.

Fig. 8
figure 8

Robustness of simple random sampling (\(S~=~5, W~=~250{,}000\), bottleneck last)

4.2 Impact of bounds

This subsection compares three types of bounds: bounds derived from rules of thumb, bounds obtained from the optimal allocation (theoretical best case), and bounds generated from the subsystems as described in Sect. 3.2.

We use the rules of thumb developed by Powell and Pyke (1996) to generate allocations for given total buffer capacities. Powell and Pyke (1996) point out that balanced allocations lead to better throughput unless the imbalance caused by the bottleneck is more than 20 %. In the case of an imbalance of more than 20 %, the buffer capacity of the buffer, which is located farthest from the bottleneck, shall be decreased. The available buffer space shall be placed around the bottleneck. Infeasible allocations are used as feasibility cuts (14), while feasible allocations are upper bounds.

We investigate bounds from the optimal allocation (theoretical best case) to show the impact of near-optimal buffer allocations. In general, however, this solution is not known and can only be approximated, e.g., by rules of thumb and heuristics. The optimal solution provides the best upper bound for the buffer capacities. Moreover, solutions that are infeasible but close to the optimum are good candidates for feasibility cuts. Therefore, as upper bound, the optimal solution is used, whereas \(S-1\) feasibility cuts can be generated, each by decreasing the optimal capacity of a buffer by one.

The bounds generated from the subsystems according to Sect. 3.2 are of a different type as they provide (individual) lower bounds instead of only feasibility cuts.

Table 2 demonstrates the benefit of using different types of bounds for the exemplary flow line, which is described in the previous chapter. The first row shows the computation time without bounds of 7142 s as a reference. For the rules of thumb and the theoretical best cases, column 1 shows the tested allocations. Each of these allocations results either in a feasibility cut or a (non-individual) upper bound. We apply the rules of thumb for total buffer capacities of 37 and 38. Columns 2 and 3 depict whether the evaluation of the allocations results in a feasible or an infeasible throughput. The fourth and the fifth column show the computation times using these bounds and the resulting time savings in comparison to the calculation without bounds.

Table 2 Time saving potential of approximate solutions

It can be observed that, in most of the cases, the bounds have a positive impact on the computation time. The feasibility cuts generated from rules of thumb with 38 buffer spaces in total reduce the computation time by around 20 %. In contrast, the feasibility cuts with a total buffer capacity of 37 have little impact on the computation time. The effect of feasibility cuts generated from the theoretical best cases varies for the different allocations from 1 to 32 %. The upper bound obtained from the optimal solution leads to the highest decrease in computation time (34 %). However, even in this case, the impact is rather low. Moreover, approximate solutions generated by rules of thumb or heuristics, in general, are worse than the allocations generated from known optimal solutions, which further reduces the usefulness of such bounds. In contrast, the (individual) lower bounds generated from the subsystems reduce the computation time by 99 %. Therefore, it is more advantageous to implement the lower bounds generated from the subsystems as described in Sect. 3.2 instead of feasibility cuts or upper bounds from near-optimal solutions. Due to this reason, we omit further investigations of these upper bounds and feasibility cuts and focus on the lower bounds obtained from the subsystems.

4.3 Exponentially distributed processing times

The investigation of instances with exponentially distributed processing times is based on the instances from the numerical study of Matta (2008), but varies the number of stations and the location of the bottleneck. The distribution of the processing times is as described in Sect. 4.1. We test instances with 3, 5, and 7 stations with bottleneck at the end of the line or in the middle of the line. We generate 10 independent samples for each configuration. As the original MIP formulation is able to solve only small instances, we use samples of 10,000 workpieces to demonstrate the improvements in the computation time of Benders Decomposition. However, Sect. 4.1 shows that this sample size is not sufficient to obtain robust results. Therefore, further studies use samples with \(W=250{,}000\).

Table 3 presents the computation times of complete enumeration, the original formulation, Benders Decomposition with classical feasibility cuts (Cl. Cut), and benders decomposition with combinatorial feasibility cuts (Comb. Cut). In the latter case, we present both results with and without initial bounds. The computation time is limited to 10,000 s. Only two settings are solvable within this time limit using the original MIP or the Benders Decomposition approach with classical cuts.

Table 3 Mean computation times (exponential)

Benders Decomposition with combinatorial cuts finds the optimal solution much faster than the implementation with classical cuts. This matches the findings of Codato and Fischetti (2006). Even the implementation without callbacks leads to faster computation times. However, callbacks are required to solve instances with more than 3 stations. The procedure with Combinatorial Cuts without bounds is able to solve instances with up to 5 stations within the time limit. The additional computation time of Benders Decomposition with Classical Cuts is composed of the computation time due to the usage of the LP and the computation time that stems from the weakness of the cut. Benders Decomposition with Combinatorial Cuts and initial bounds solves all instances to optimality within a reasonable amount of time.

Table 3 also shows that the instances with a bottleneck in the middle of the line are easier to solve than the instances with a bottleneck at the end. The reason is that a bottleneck in the middle of the line is covered by more subsystems. Therefore, the obtained bounds are better, which results in a smaller feasible region.

To analyze the impact of the initial bounds, Fig. 9 compares the course of the lower and upper bounds for Benders Decomposition with Combinatorial Cuts, with and without initial bounds, for one sample of a 5-station line with 250,000 workpieces and bottleneck at the end. To derive the lower bounds, we optimized four 2-station subsystems, three 3-station subsystems, and two 4-station subsystems. The computation of the bounds is completed after 8 s, with a lower bound of 31 buffer spaces for the whole line. The lower bound for the case without bounds slowly rises by 1 in each step. In the case with initial bounds, the optimal solution of 38 is found after 19 s and is proven after 69 s. Without bounds, the upper bound drops in large steps until the optimal solution is found after 7051 s. This solution is proven to be optimal after 7141 s.

Fig. 9
figure 9

Course of the lower and upper bounds during the solution process (\(S~=~5, W~=~250{,}000\), bottleneck last)

Figure 10 depicts the shares of computation time for the bound calculation, the time until the optimal solution is found by the upper bound, and the time until this solution is proven to be optimal for a 5-station line with 250,000 workpieces and a bottleneck at the end. Most of the computation time is needed for the optimality proof. The calculation of the bounds represents only a small proportion of the total time, ranging from 9 to 15 % of the total computation time.

Fig. 10
figure 10

Share of computation times for bound calculation and optimality proof (\(S~=~5, W~=~250{,}000\), bottleneck last)

4.4 Generally distributed processing times

The following experiments give further insights on the performance of Benders Decomposition with Combinatorial Cuts and initial bounds. We investigate the performance of the algorithm with respect to generally distributed effective processing times. The generation of instances focuses on a base case, which is adapted from Helber et al. (2011) according to Table 4. We generate 10 independent samples for each configuration.

Table 4 Parameter settings for the base case

The experiment varies the distribution of the effective processing times and the number of stations based on the study in Helber et al. (2011). The Erlang-k distribution is used to generate processing times with squared coefficients of variation of 0.25 and 0.5, while the balanced mean variant of the Cox-2 distribution (Buzacott and Shanthikumar 1993) is used to generate processing times with squared coefficients of variation 1.0 and 2.0, respectively. The number of stations is set to 5 and 7, respectively.

The computational results are given in Table 5. The first four columns describe the setting. Column 5 gives the range of the total number of buffer spaces in the optimal solutions of 10 samples. The average computation times for the bounds and the total time are given in columns 6 and 7. The last column presents the maximum deviation from the goal throughput of all samples.

The instances with low SCV are solved quickly. The reason is that the initial lower bounds are better for small SCVs, as less starving and blocking occurs; see Tables 6, 10, 11, and 12 in the Appendix. Tables 6, 10, 11, and 12 present the values of the optimal solutions and the initial bounds for all of the subsystems of all of the samples (samples with identical bounds and identical optimal solutions are aggregated in a single line). For an SCV of 0.25, some initial bounds are tight (marked in bold).

Table 5 Mean computation times (Erlang-k and Cox-2)
Table 6 Detailed results (Erlang-k distribution, \(S\) = 5)

Instances with Cox-2 distributed processing times and 7 stations are especially difficult to solve, the computation time takes more than 1 h on average.

Table 5 also shows the computation time for the initial bounds. For Erlang-k instances with an SCV of 0.25, the calculation of the initial bounds takes a significant proportion of the total amount of computation time, summing up to approximately 50 % or even more. With increasing SCV, this proportion decreases. In the case of an SCV of 2.0, the portion of the bound calculation accounts for approximately 15 % and less of the total time. The detailed results for the initial bounds in Tables 6, 10, 11, and 12 show that it is reasonable to calculate all subsystems, as even large subsystems improve the (aggregated) bounds on the buffer capacities.

The column “Max. deviation from goal throughput” depicts the results of a throughput evaluation for the different optimal buffer allocations obtained from the different samples. The throughput evaluation is conducted with 10 new samples of 1,000,000 workpieces for each category of instances. The column shows the largest relative downward deviation of all optimal allocations if the goal throughput was not reached and the smallest relative upward deviation if it was reached. The deviation for each buffer allocation is shown in Tables 6, 10, 11, and 12. Very small downward and upward deviations are denoted as \(-\)0.00 and 0.00, respectively.

The maximum downward deviation obtained from all 80 optimization runs is only 0.36 %. Altogether, this shows that the Benders Decomposition approach with combinatorial cuts and initial bounds is able to optimize flow lines with generally distributed processing times quite well.

4.5 Correlated processing times

This experiment investigates the impact of statistical dependency on the optimal buffer allocation. Inman (1999) points out that statistical dependency of processing times, i.e., workpiece-dependent processing times at each station, occurs for example in the automotive industry when two- and four-door models are manufactured on the same line. We model this by generating processing times from Erlang-4 distribution with different rates for the two different types of workpieces. The rate corresponding to a workpiece of type 1 is set to 0.5, while the rate for workpieces of type 2 is 0.25. We assume that the probability that a workpieces is of type 2 is 20%. Non-listed parameters remain as in the base case (Table 4). We compare the results to allocations obtained from instances generated by Generalized Erlang distribution based on identical parameters. This corresponds to the case where correlation is neglected and approximated by independent identically distributed processing times. Generalized Erlang distribution may be interpreted as a random decision on the type for each processed workpiece at each station.

All instances were solved in less than 15 min. Further computational results are given in Table 7. Columns 2–4 correspond to the instances with correlation in processing times and the last three columns to the instances with Generalized Erlang distribution. The results show that the instances with Generalized Erlang distribution underestimate the throughput and therefore allocate more buffer spaces than necessary, mainly around the bottleneck. For the instances under investigation, on average 26 % additional buffer spaces were allocated. In conclusion, the approximation of correlated processing times by identical independently distributed processing times leads to substantial misallocation of buffer spaces. Therefore, it is important that correlations are considered in the solution approach, as it is possible with our approach.

Table 7 Detailed results (correlated processing times)

4.6 Long lines with reliable and unreliable stations

This experiment is devoted to long lines comprising 14 and 24 stations, respectively, some of which are reliable and others are unreliable (see Figs. 11, 12).

Fig. 11
figure 11

Setting of the 14-station line

Fig. 12
figure 12

Setting of the 24-station line

Reliable stations have Erlang-4-distributed (\(E_4\)) or deterministic (\(D\)) processing times, both with rate 0.5. Unreliable stations (DF) have deterministic processing times with rate 0.5 and exponentially distributed times to failure (TTF) and times to repair (TTR). The mean TTF and the mean TTR are chosen such that stations in the middle of the line, i.e., 7 and 8 in the line with 14 stations and 12 and 13 in the line with 24 stations, form the bottlenecks of the line. Non-listed parameters remain as in the base case (Table 4).

Tables 8 and 9 contain the results of the third experiment on long lines. The algorithm solves instances with 14 stations within 310 s on average for TTF = 10 and TTR = 4. If TTF = 5 and TTR = 2, the algorithm takes 50 s on average. For the line with 24 stations, 14  h on average are required to prove the optimal solution for TTF = 10 and TTR = 4. The instances with TTF = 5 and TTR = 2 can be solved within 10 min on average. The algorithm spends most of the time to calculate the results for the subsystems ( 93–98 % of the total time). Altogether, under consideration of the strategic nature of the Buffer Allocation Problem, the algorithm is able to optimize long lines in acceptable time.

Table 8 Detailed results (\(S\) = 14)
Table 9 Detailed results ( \(S=24\))

The majority of the buffer spaces (in many cases half of the total allocated capacities) is allocated between the bottleneck stations. At the beginning and at the end of the line, zero or only few buffer spaces are required. The last column in each table depicts the results of a throughput evaluation for the different optimal buffer allocations obtained from the different samples. The throughput evaluation is conducted with 10 new samples of 1,000,000 workpieces for each category of instances. The column shows the largest relative downward deviation of all optimal allocations if the goal throughput was not reached and the smallest relative upward deviation if it was reached. The maximum deviation obtained from all optimization runs is only 0.61 % for the 14-station line and 0.62 % for the 24-station line.

5 Conclusion and further research

In this paper, we develop a Benders Decomposition approach that is able to optimally solve the BAP with respect to an underlying sample. This approach divides the original problem into a master problem and a subproblem, which are both solved iteratively by exchanging information via cuts. We compared two types of cuts, classical feasibility cuts and combinatorial feasibility cuts. Our numerical study shows that the application of combinatorial cuts leads to substantial reductions in the computation time. Furthermore, we develop initial lower bounds based on the iterative solutions of subsystems for the original line. This approach is able to optimally allocate buffer spaces in long lines with arbitrary distributions of processing times, times to failure, and repair times within a reasonable amount of time. The numerical study also reveals that correlation effects in processing times have a significant effect, as the optimal buffer allocation is highly influenced. This demonstrates the necessity for flexible solution approaches, as the sample-based mathematical programming formulations.

Further research should be directed towards improving the computation times for lines with more stations. This may be performed by the analysis of additional bounds or by the development of a problem-specific branch and bound method. Additionally, the approach could be extended to more complex systems, such as flow lines with closed loops or several product types.