1 Introduction

Order scheduling is the problem of assigning customer orders to different machines. An order may consist of multiple items, all of which must be fully processed before the order can be packaged and shipped. This problem arises in many applications, including make-to-order production systems [1], assemble-to-order production systems [2], central fill pharmacies [3,4,5], and automotive repair shops [6]. In the literature, this scheduling problem has been studied under a variety of objectives, such as the makespan, the total completion time, and the total tardiness. This paper studies the order scheduling problem in high-throughput make-to-order manufacturing systems with the objective of minimizing the makespan and the total order collation delays.

Order collation delay is the difference between the completion times of the first and the last processed items within the same order [3, 4]. Large order collation delays contribute to reduced throughput, non-recoverable productivity loss, or even system deadlocks. This research is motivated by an existing order collation delay problem in mail-order pharmacy automation (MOPA) systems. A MOPA system is a high-throughput pharmacy fulfillment facility that receives large volumes of customized prescription orders from local pharmacies, fills them, and sends them back. The main component of the MOPA system is the robotic dispensing system (RDS), which consists of many auto-dispensers that count different medications in parallel and a robot arm that holds the vial during the dispensing operation. Many of the received prescription orders consist of more than one medication, and each medication might be processed at a different RDS unit. Prescription orders with multiple medications have to be collated before packaging and shipping, which means that the first processed medication within an order has to wait until the remaining medications from the same order are processed. As the number of orders awaiting collation increases, so does the chance that the system enters a deadlock or a crippled state. Another example of this problem can be found in automotive repair shops [6]. If a car has multiple failures, it may require more than one mechanic working simultaneously to repair it. The car occupies a space in the shop until all repairs are completed, which prevents new cars from entering the system.

There are two options to avoid deadlocks: (1) increase order collation resources by investing in the material handling system or (2) schedule orders efficiently such that expected order collation delays are minimized. Several studies in the literature have solved the order collation delay scheduling problem using exact methods or metaheuristics [3,4,5]. It was shown that minimizing order collation delays as the sole objective leads to an increased makespan [4]. In addition, the performance of the proposed methods degrades significantly as the problem size increases. In high-throughput manufacturing systems, thousands of orders are processed every day; thus, computationally efficient algorithms are needed. In this research, the workload balancing with single-item orders (WBSO) heuristic is proposed to minimize the total order collation delays and the makespan. The proposed WBSO is compared with the non-dominated sorting genetic algorithm II (NSGA-II) and other greedy heuristics.

Machine flexibility is a critical factor that affects the performance of the order scheduling system. In the literature, machines are classified by flexibility into three categories: (1) fully flexible, where machines are capable of processing all types of jobs; (2) fully dedicated, where machines are capable of processing only one type of job; and (3) multi-purpose or general, where machines are capable of processing certain types of jobs [7]. A fully flexible machine environment is expected to generate a smaller makespan and fewer total collation delays than the other cases; however, in most applications, machines are either dedicated or multi-purpose. The performance of the proposed WBSO heuristic is evaluated under different amounts of machine flexibility. In addition, different percentages of multi-item orders are incorporated in the scenarios.

The remainder of this paper is organized as follows: the literature review is summarized in Sect. 2; the proposed heuristics are presented in detail in Sect. 3; results and analysis are discussed in Sect. 4; conclusions and future work are given in Sect. 5.

2 Literature Review

The order scheduling problem on parallel machines has been proven to be NP-hard when the number of machines is larger than two [7,8,9,10,11]. Therefore, various heuristic methods have been proposed to provide near-optimal solutions in minimal computational time for the three machine environments: fully flexible, fully dedicated, and multi-purpose. For the fully flexible machine environment, the problem has been studied to minimize the total weighted completion time [12]. Several heuristics and rules have been proposed, such as the longest processing time (LPT) rule, the weighted shortest total processing time (WSTP), and the weighted shortest LPT makespan first (WSLM) [12]. The WSTP-based heuristics provided better performance in terms of solution quality and computational time than the other proposed heuristics.

For the fully dedicated machine environment, the order scheduling problem has been studied to optimize different performance measures, including the total weighted completion time [12, 13], the total tardiness [14], and the number of late jobs [9]. To minimize the total weighted completion time, several heuristics have been proposed, such as the weighted shortest total processing time (WSTP), the weighted shortest maximum processing time (WSMP), and the weighted earliest completion time first (WECT) [12]. It was concluded that the WECT heuristic is preferable because it provides good results with short computational time [15]. A linear formulation of the problem has been proposed to generate optimal solutions [13]. However, a problem with 200 orders and three machines takes up to 1 h of computational time. An approximation algorithm that incorporates a look-ahead mechanism has been proposed to minimize the total completion time [16]. Results showed that the proposed algorithm outperforms the earliest completion time (ECT) heuristic while achieving similar computational performance.

Few studies in the literature have addressed the order scheduling problem when the machines are multi-purpose. Three heuristics were proposed in [17] to minimize the total completion time of orders and the makespan while considering a job-split property in the system. It was shown that although multi-stage optimization heuristics provide better solution quality, they are computationally inefficient for solving large-size problems. The problem of minimizing the total tardiness of orders on multi-purpose machines has also been studied. Multi-stage and bottleneck-focused algorithms were proposed to solve this problem in an air conditioner manufacturing company where a job-splitting property exists [18]. It was shown that the bottleneck-focused algorithms, which take the processor schedule into consideration, outperform the multi-stage algorithms. The multi-site order scheduling problem (MSOS) is an application of this case [19]. In MSOS, which arises in companies with multiple manufacturing plants, the company receives many production orders that need to be manufactured in different self-owned or collaborative facilities. This problem has been solved using NSGA-II to minimize three objectives: the total tardiness, the idle time, and the throughput time. Results indicated that NSGA-II provides superior results compared to the current scheduling algorithms. However, the number of orders in this application is relatively small compared with MOPA systems.

The order scheduling problem in MOPA systems has several interesting features: (1) machines in this system are multi-purpose, and their flexibility significantly affects the overall system performance [3]; (2) the number of orders that need to be scheduled is relatively high (for some systems, the daily average is 45,000 orders); (3) factors such as the percentage of multi-item orders and the amount of machine flexibility vary from day to day and are not the same across MOPA systems. Several studies have addressed this scheduling problem in the literature. An adaptive parallel tabu search (APTS) algorithm combined with an \(\epsilon\)-constraint method has been proposed to solve the problem [4]. The proposed APTS outperformed LPT and tabu search in terms of total collation delays. This problem has also been solved using multi-objective genetic algorithms [5]. It was shown that NSGA-II provides the best Pareto front and the most stable behavior, especially when the number of jobs in the system increases. However, both previous studies assumed that the machines are fully flexible, which means that the RDS units are able to fill all types of medications. The effect of machine flexibility was analyzed later in [3]. It was shown that machine flexibility significantly affects the total collation delays and the makespan. It was also concluded that the computational time of the metaheuristic increases significantly as the number of jobs, the flexibility of the machines, and the percentage of multi-item orders increase. Therefore, constructive heuristics that utilize simple priority rules are needed to efficiently schedule orders in MOPA systems.

3 Research Methodology

This research extends [3] by proposing a computationally efficient constructive heuristic that minimizes the total collation delays and the makespan. Therefore, the same notation and mathematical model are used to give the reader a better understanding of the problem. The proposed WBSO heuristic follows the concept of job insertion to balance the workload. The details of this heuristic are presented in Sect. 3.2. The proposed algorithm is compared with NSGA-II, which is discussed in Sect. 3.3, and several greedy heuristics: (1) longest processing time with order priorities (LPT-P), (2) least total workload with order priorities (LTW-P), and (3) multi-item orders first (MIOF). The details of the greedy heuristics are explained in Sect. 3.4. The list of notations used in the mathematical model is presented in Table 1.

Table 1 List of notations

3.1 Mathematical Model

A mixed integer linear programming model was proposed in [3] to solve this problem. In this mathematical model, the eligible machines for each item are defined by the eligibility matrix A. If item j from order i can be processed on machine m, then the parameter \(a_{ijm}\) is equal to one. Through this mechanism, the multi-purpose nature of the machines is incorporated into the problem.

$$\begin{aligned} \min \;\; C_{\max } \end{aligned}$$
(1)
$$\begin{aligned} \min \;\; \sum _{i=1}^{I} (L_{i} - E_{i}) \end{aligned}$$
(2)

s.t.

$$\begin{aligned} \sum _{m=1}^{K} \sum _{t=1}^{T} x_{ijmt} = 1\qquad\qquad \forall i,j \end{aligned}$$
(3)
$$\begin{aligned} \sum _{i=1}^{I} \sum _{j=1}^{n_i} x_{ijmt} \le 1\qquad\qquad \forall m,t \end{aligned}$$
(4)
$$\begin{aligned} x_{ijmt} - a_{ijm} \le 0\qquad\qquad \forall i,j,m,t \end{aligned}$$
(5)
$$\begin{aligned} s_{ijm1} = 0\qquad\qquad \forall i,j,m \end{aligned}$$
(6)
$$\begin{aligned} s_{ijm(t+1)} - \sum _{i=1}^{I} \sum _{j=1}^{n_i} \sum _{r=1}^{t} p_{ij} \cdot x_{ijmr}= 0\qquad\qquad \forall i,j,m,t < T \end{aligned}$$
(7)
$$\begin{aligned} c_{ij} - (s_{ijmt} + p_{ij}) - M (1-x_{ijmt}) \le 0\qquad\qquad \forall i,j,m,t \end{aligned}$$
(8)
$$\begin{aligned} s_{ijmt} + p_{ij} - c_{ij} - M (1-x_{ijmt}) \le 0\qquad\qquad \forall i,j,m,t \end{aligned}$$
(9)
$$\begin{aligned} \sum _{i=1}^{I} \sum _{j=1}^{n_i}\sum _{t=1}^{T} p_{ij} \cdot x_{ijmt} - C_{\max } \le 0\qquad\qquad \forall m \end{aligned}$$
(10)
$$\begin{aligned} c_{ij} - L_{i} \le 0\qquad\qquad \forall i,j \end{aligned}$$
(11)
$$\begin{aligned} E_{i} - c_{ij} \le 0\qquad\qquad \forall i,j \end{aligned}$$
(12)
$$\begin{aligned} s_{ijmt} \ge 0\qquad\qquad \forall i,j,m,t \end{aligned}$$
(13)
$$\begin{aligned} c_{ij} \ge 0\qquad\qquad \forall i,j \end{aligned}$$
(14)
$$\begin{aligned} E_{i}, L_{i} \ge 0\qquad\qquad \forall i\end{aligned}$$
(15)
$$\begin{aligned} x_{ijmt} \in \lbrace 0,1 \rbrace\qquad\qquad \forall i,j,m,t \end{aligned}$$
(16)

Equations (1) and (2) are the objective functions that minimize the schedule makespan and the total collation delay over all orders. Equations (3) and (4) ensure that each item is processed exactly once and that each position is filled by at most one item, while Eq. (5) guarantees that items are processed on eligible machines. Equations (6) and (7) calculate item starting times, while Eqs. (8) and (9) calculate item completion times. Equation (10) calculates the schedule makespan, while Eqs. (11) and (12) define the latest (\(L_i\)) and earliest (\(E_i\)) order completion times. Equations (13)–(16) state the feasible region for the decision variables.

3.2 Workload Balancing by Single-Item Orders (WBSO)

There are two types of heuristics for solving scheduling problems: constructive and improvement heuristics. Constructive heuristics build a schedule from scratch by adding jobs at each step, while improvement heuristics start with an initial solution and iteratively improve it by altering the schedule. Although improvement heuristics, including local search methods, usually provide the decision maker with higher quality solutions, they are criticized for their large computational time. In contrast, constructive heuristics, if designed well, can obtain high-quality solutions in short computational time. The complexity of this problem arises from three aspects: the order scheduling nature of the problem, the job assignment restrictions, and the presence of multiple objectives. These three aspects make it challenging to design an effective constructive heuristic that minimizes both objectives simultaneously.

In this research, a heuristic that is based on the concept of job insertion is proposed to solve this order scheduling problem. The concept of this heuristic, which is called the workload balancing with single-item orders (WBSO), is illustrated in Fig. 1. When a multi-item order is assigned, the workload difference is calculated, and a single-item order is inserted to balance the current workload before another multi-item order is assigned.

Fig. 1

Illustration of WBSO algorithm concept

The WBSO algorithm can be summarized in the following steps:

Step 1.:

Calculate the maximum workload possible per each machine \(EW_m\)

$$\begin{aligned} EW_m = \sum _{i=1}^{I} \sum _{j=1}^{n_i} a_{ijm} \cdot p_{ij} \end{aligned}$$
(17)
Step 2.:

Choose the least flexible machine \(m_l\), which is the machine with the minimum expected workload, \(min \lbrace EW_1, EW_2,..., EW_K \rbrace\)

Step 3.:

For each order i, set order priority \(O_i\) as

$$\begin{aligned} O_i = \sum _{j=1}^{n_i} a_{ijm_l} \end{aligned}$$
(18)
Step 4.:

Sort the orders based on their priority \(O_i\) in descending order. Ties are broken arbitrarily.

$$\begin{aligned} O_1 \ge O_2 \ge O_3 \ge ... \ge O_I \end{aligned}$$
(19)
Step 5.:

Schedule the next multi-item order based on the priority list in Step (4) by following Steps (5.1–5.5):

5.1.:

Choose the least flexible item \(j_l\), min \(\lbrace \sum _{m=1}^K a_{i1m}\), \(\sum _{m=1}^K a_{i2m},\) \(\cdots ,\) \(\sum _{m=1}^K a_{i{n_i}m}\rbrace\). Ties are broken based on LPT and then arbitrarily.

5.2.:

Find the set of eligible machines (\(A_{j_l}\)) for item \(j_l\)

5.3.:

Calculate the current workload for each machine in set \(A_{j_l}\)

$$\begin{aligned} W_m = \sum _{i=1}^I \sum _{j=1}^{n_i} \sum _{t=1}^T p_{ij} \cdot x_{ijmt} \qquad \qquad \forall m \in A_{j_l} \end{aligned}$$
(20)
5.4.:

Assign item \(j_l\) on the machine with the minimum current workload

5.5.:

Repeat from Step (5.1) until all the items within the order are scheduled

Step 6.:

Calculate the current workload for each machine \(W_m\) in the system

$$\begin{aligned} W_m = \sum _{i=1}^I \sum _{j=1}^{n_i} \sum _{t=1}^T p_{ij} \cdot x_{ijmt} \end{aligned}$$
(21)
Step 7.:

Calculate the current largest workload difference D

$$\begin{aligned} D = \max \lbrace W_1, W_2, ..., W_K \rbrace - \min \lbrace W_1, W_2, ..., W_K \rbrace \end{aligned}$$
(22)
Step 8.:

While \(D > s\), where s is the shortest processing time in the system, do

8.1.:

Choose the machine f with the minimum current workload

8.2.:

Find the set of unscheduled single-item orders that can be processed on machine f

8.3.:

If this set is not empty, do

8.3.1.:

Choose a single-item order with \(a_{ijf} = 1\) based on SPT

8.3.2.:

Schedule that single-item order on f to reduce the workload gap

8.3.3.:

Calculate the workload difference D

Step 9.:

Repeat from Step (5) until all multi-item orders are scheduled, then go to Step (10)

Step 10.:

Schedule the remaining single-item orders based on \(O_i\) and the LPT rule

To illustrate the algorithm steps, consider the orders in Table 2. In this example, there are five single-item orders (1, 2, 3, 4, and 8) and three multi-item orders (5, 6, and 7). The binary values in the table represent the eligibility matrix. First, the maximum possible workload for each machine \(EW_m\) is calculated using Eq. (17). Based on these values (136, 137, 74), Machine 3 is picked as the least flexible machine (\(m_l\)); thus, the priority of orders with items that can be processed on this machine is increased. In Step 5, the next multi-item orders on the priority list are Orders 5 and 7. Ties are broken arbitrarily; thus, Order 5 is chosen to be scheduled in this step. For this order, Item 1 is assigned to Machine 3, while Item 2 can be assigned to Machine 1 or 2. Because the workloads of the two machines are equal at this step, Machine 1 is chosen. In Step 6, the current workload \(W_m\) for each machine is calculated (14, 0, 15), and the workload difference D is determined (15).

Then, the workload difference is compared to a predefined value s, taken here as the shortest processing time in the system (14). Because \(D > s\), the machine with the minimum current workload is picked (Machine 2), and the single-item order with the shortest processing time is assigned to it (Order 1). Because the workload difference across the three machines is now less than s, the next multi-item order in the list is scheduled (Order 7). After all multi-item orders are assigned, single-item orders are assigned based on the LPT rule. The resulting schedule for these orders is presented in Fig. 2.
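As a concrete sketch, the WBSO steps can be implemented in Python under an assumed data layout (each order maps to a list of (processing time, eligible-machine set) pairs); time positions and some tie-breaking details from the paper are simplified, and Step 10 uses only the LPT rule:

```python
def wbso_schedule(orders, K):
    """Sketch of the WBSO heuristic.

    orders: dict order_id -> list of (p, elig), the processing time and
            the set of eligible machine indices of each item (assumed layout).
    K: number of machines.
    Returns (load, assign): per-machine workload and (order, item) -> machine.
    """
    items = [(p, e) for its in orders.values() for p, e in its]
    # Step 1: maximum possible workload per machine, EW_m (Eq. 17)
    EW = [sum(p for p, e in items if m in e) for m in range(K)]
    m_l = EW.index(min(EW))                    # Step 2: least flexible machine

    # Steps 3-4: order priority = number of items eligible on m_l, descending
    def pri(o):
        return sum(1 for p, e in orders[o] if m_l in e)
    multi = sorted((o for o in orders if len(orders[o]) > 1),
                   key=pri, reverse=True)
    singles = sorted((o for o in orders if len(orders[o]) == 1),
                     key=lambda o: orders[o][0][0])   # SPT for insertion
    s = min(p for p, e in items)               # shortest processing time
    load, assign = [0] * K, {}

    def place(o, j, p, e):
        m = min(e, key=lambda m: load[m])      # least current workload (5.3-5.4)
        load[m] += p
        assign[(o, j)] = m

    for o in multi:                            # Steps 5 and 9
        # 5.1: least flexible item first; ties broken by LPT
        for j, (p, e) in sorted(enumerate(orders[o]),
                                key=lambda x: (len(x[1][1]), -x[1][0])):
            place(o, j, p, e)
        # Steps 6-8: insert single-item orders while the imbalance D > s
        while singles and max(load) - min(load) > s:
            f = load.index(min(load))
            cand = [o2 for o2 in singles if f in orders[o2][0][1]]
            if not cand:
                break
            o2 = cand[0]                       # 8.3.1: SPT among eligible
            singles.remove(o2)
            place(o2, 0, *orders[o2][0])
    for o in sorted(singles, key=lambda o: -orders[o][0][0]):  # Step 10: LPT
        place(o, 0, *orders[o][0])
    return load, assign
```

Applied to a small instance in the spirit of Table 2, the sketch assigns every item to an eligible machine while keeping the machine workloads within one shortest processing time of each other whenever eligible single-item orders remain available.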

Fig. 2

Calculated schedule based on WBSO

Table 2 Sample of random orders for WBSO numerical example

The performance of the WBSO is compared to the NSGA-II and the other greedy heuristics in this research.

3.3 Non-dominated Sorting Genetic Algorithm (NSGA-II)

Multi-objective optimization has received considerable attention in the literature in recent years because solutions to real-world problems are evaluated based on multiple objectives. The multi-objective optimization problem can be mathematically expressed as follows [20]:

$$\begin{aligned} \min \quad f = [f_1(x), f_2(x),..., f_m(x)] \end{aligned}$$
(23)
$$\begin{aligned} \text {subject to} \quad x \in \Omega \end{aligned}$$
(24)

where \(\Omega\) is the decision variable space. In some problems, the objectives do not conflict with each other, and it is easy to find a single compromise solution; however, in most practical problems, the objectives conflict, and it is difficult to find a single optimal solution [19]. In this situation, the goal is to find the set of Pareto optimal solutions, i.e., the solutions that cannot be improved in any objective without degrading another objective. These solutions are also referred to as the non-dominated solutions of the problem.

Methods used to solve multi-objective optimization problems can be classified into two approaches. The first approach, which is the classical approach, attempts to combine multiple objectives into a single aggregating function. Examples of this approach include the weighted sum, goal programming, min-max Pareto point, and the \(\varepsilon\)-constraint methods [20]. The second approach, by contrast, attempts to generate the Pareto front, which is the set of non-dominated solutions in the problem. Pareto-based evolutionary algorithms have been widely applied in solving multi-objective optimization problems because they utilize a population-based approach which allows them to generate multiple solutions in one simulation run. Another advantage of using multi-objective evolutionary algorithms is that they are not affected by the shape or the continuity of the Pareto front in contrast to traditional optimization techniques.

The non-dominated sorting genetic algorithm II (NSGA-II), proposed by [21], is considered one of the most effective multi-objective evolutionary algorithms. It utilizes a fast non-dominated sorting algorithm, a crowding distance estimation procedure, and a crowded-comparison operator to overcome some of the problems encountered by earlier multi-objective evolutionary algorithms (MOEAs). These problems include non-elitism and the need to specify a sharing parameter. In this section, NSGA-II is briefly described.

The first feature of the NSGA-II is the fast sorting algorithm, which can be summarized in the following steps:

Step 1.:

For each solution p in the population, calculate the domination count \(n_p\), which represents the number of individuals that dominate the solution, and create \(S_p\), which is the set of individuals dominated by the solution. Solutions that belong to the first non-dominated front will have an \(n_p\) equal to zero.

Step 2.:

Initialize the front counter i to one. For each solution p that belongs to the front \(F_i\), visit each solution q in its set \(S_p\) and decrement its domination count \(n_q\) by one. If \(n_q\) becomes zero, put q in a separate list Q. The set Q stores the members of the \((i+1)^{th}\) front.

Step 3.:

Increment i, set \(F_i = Q\), and repeat Step 2 until \(Q=\phi\).
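The three steps above can be sketched in Python for a minimization problem; the representation of solutions as plain objective vectors is an assumption for illustration:

```python
def fast_nondominated_sort(objs):
    """Sketch of NSGA-II's fast non-dominated sort (minimization).

    objs: list of objective vectors. Returns a list of fronts, each a
    list of solution indices, best front first.
    """
    n = len(objs)

    def dominates(a, b):
        # a dominates b: no worse in all objectives, strictly better in one
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    S = [[] for _ in range(n)]     # S_p: solutions dominated by p
    n_p = [0] * n                  # domination counts
    fronts = [[]]
    for p in range(n):             # Step 1
        for q in range(n):
            if dominates(objs[p], objs[q]):
                S[p].append(q)
            elif dominates(objs[q], objs[p]):
                n_p[p] += 1
        if n_p[p] == 0:
            fronts[0].append(p)    # first non-dominated front
    i = 0
    while fronts[i]:               # Steps 2-3
        Q = []
        for p in fronts[i]:
            for q in S[p]:
                n_p[q] -= 1
                if n_p[q] == 0:
                    Q.append(q)    # member of front i+1
        i += 1
        fronts.append(Q)
    return fronts[:-1]             # drop the trailing empty front
```

For instance, the points (1, 5), (2, 3), and (3, 1) are mutually non-dominated and form the first front, while (4, 4) and (5, 5) fall into successive later fronts.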

NSGA-II employs a crowding distance assignment process and a crowded-comparison operator to guide the selection process towards a better approximation of the Pareto front. The crowding distance assignment process starts by sorting the population in ascending order of each objective and assigning the boundary solutions (solutions with the largest and smallest objective function values) an infinite distance value. Solutions in between (\(i=2\) to \(i=l-1\)) are assigned a distance value according to the following formula:

$$\begin{aligned} d_i = d_i + \frac{d_{(i+1)}.m - d_{(i-1)}.m}{f_m^{max} - f_m^{min}} \end{aligned}$$
(25)

where \(d_i.m\) is the \(m^{th}\) objective function value of solution i, and \(f_m^{max}\) and \(f_m^{min}\) are the maximum and minimum values of the \(m^{th}\) objective. After the individuals are sorted based on non-domination and the crowding distance of each solution is assigned, selection is performed using the crowded-comparison operator. The operator checks the rankings and chooses the solution with the better ranking. If the rankings are equal, the solution with the larger distance value is chosen (i.e., the solution located in a less crowded region).
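The crowding distance assignment of Eq. (25) can be sketched as follows, again assuming solutions within one front are given as plain objective vectors:

```python
def crowding_distance(objs):
    """Sketch of NSGA-II crowding distance (Eq. 25) for one front.

    objs: list of objective vectors of the solutions in the front.
    Returns a list of distance values, one per solution.
    """
    n, m = len(objs), len(objs[0])
    d = [0.0] * n
    for r in range(m):
        # sort the front in ascending order of objective r
        order = sorted(range(n), key=lambda i: objs[i][r])
        # boundary solutions get an infinite distance
        d[order[0]] = d[order[-1]] = float('inf')
        f_min, f_max = objs[order[0]][r], objs[order[-1]][r]
        if f_max == f_min:
            continue                    # degenerate objective; skip
        for k in range(1, n - 1):
            i = order[k]
            # normalized distance between the two neighboring solutions
            d[i] += (objs[order[k + 1]][r]
                     - objs[order[k - 1]][r]) / (f_max - f_min)
    return d
```

The crowded-comparison operator then prefers a lower front rank, and, within the same rank, the solution with the larger crowding distance.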

The chromosome used in this problem is a 2D chromosome. Each row represents a machine, and each column represents a position. Each gene is an assigned item, and the sequence of genes represents the scheduling order of these items on the machine. The main loop of NSGA-II starts by creating a random parent population \(P_0\), assigning items randomly to machines while taking eligibility constraints into consideration. Fast non-dominated sorting is then applied to the initial population, and a crowding distance is assigned to each solution. An offspring set \(Q_0\) is then created by applying binary selection, crossover, and mutation to the initial population. The crossover operator creates new solutions by combining two or more parent solutions, which helps explore the search space by combining different elements of the parents. A partially mapped crossover operator is used in this GA to produce offspring. This crossover process can be summarized in the following steps: (1) randomly pick a segment from Parent 1 and copy it to the offspring; (2) locate the segment in Parent 2 and find the set of elements i that were not copied; (3) find the set of elements j in the offspring that do not exist in the segment of Parent 2; (4) place elements i in the locations of elements j in the offspring. This process is restricted by the machine assignment constraints. The mutation operator introduces small random changes into existing solutions, which maintains diversity in the population and helps avoid premature convergence. The mutation strategy applied in this research is a gene-swap mutation, where two genes are swapped randomly, provided that the eligibility constraints are not violated. Figure 3 illustrates the crossover and mutation operators.
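As an illustration of the partially mapped crossover described above, a generic PMX on permutations can be sketched as follows; the machine-eligibility restrictions and the 2D chromosome of the paper are omitted, so this is a simplified, hypothetical version:

```python
def pmx(p1, p2, i, j):
    """Sketch of partially mapped crossover (PMX) on permutations.

    Copies p1[i:j] into the child, places the displaced elements of
    p2's segment via the segment mapping, and fills the rest from p2.
    """
    n = len(p1)
    child = [None] * n
    child[i:j] = p1[i:j]                  # (1) copy segment from Parent 1
    for k in range(i, j):                 # (2) elements of p2's segment
        v = p2[k]                         #     that were not copied
        if v in child[i:j]:
            continue
        pos = k
        while i <= pos < j:               # (3) follow the mapping chain
            pos = p2.index(p1[pos])       #     until a vacant position
        child[pos] = v                    # (4) place the element there
    for k in range(n):                    # remaining genes come from Parent 2
        if child[k] is None:
            child[k] = p2[k]
    return child
```

The result is always a valid permutation, which is why PMX is a common choice for sequencing chromosomes.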

Fig. 3

GA operators: (left) crossover (right) mutation

The population set \(P_0\) and the offspring set \(Q_0\) are combined to form \(R_0\), and fast non-dominated sorting is applied to it. After the crowding distance assignment process is applied, solutions are selected to fill the new population \(P_{t+1}\). An offspring set \(Q_{t+1}\) is then created through binary selection, crossover, and mutation. If the stopping criteria are met, \(Q_{t+1}\) is returned; otherwise, the parent and offspring sets are combined and the process continues. The main loop is described as follows:

Step 1.:

Initialize a random population \(P_0\)

Step 2.:

Apply fast non-dominated sorting to \(P_0\) and calculate \(d_i\) for each solution i

Step 3.:

Use binary selection, crossover, and mutation to create offspring \(Q_0\)

Step 4.:

Set counter \(t=1\)

Step 5.:

Combine \(P_t\) and \(Q_t\) into one set \(R_t\)

Step 6.:

Apply fast non-dominated sorting to \(R_t\) and calculate \(d_i\) for each solution i

Step 7.:

Generate population \(P_{t+1}\)

Step 8.:

Terminate if stopping criteria are met and output \(Q_t\)

Step 9.:

Use binary selection, crossover, and mutation to create offspring \(Q_{t+1}\)

Step 10.:

Set \(t=t+1\) and go to Step 5

3.4 Greedy Heuristics

The proposed WBSO heuristic is evaluated against the following greedy heuristics.

3.4.1 Longest Processing Time with Order Priorities (LPT-P)

The longest processing time (LPT) heuristic, which prioritizes items based on the length of their processing times, yields optimal or near-optimal makespans in parallel machine scheduling. The LPT heuristic is modified here to consider the nature of the studied order scheduling problem. The modified heuristic can be summarized in the following steps:

Step 1.:

Pick the item with the longest processing time \(j_{LP}\)

Step 2.:

Assign item \(j_{LP}\) to the first available eligible machine

Step 3.:

Increase the priority of the remaining items in item \(j_{LP}\)'s order

Step 4.:

Assign the items that have higher priority

Step 5.:

Repeat until all items are assigned
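The LPT-P steps can be sketched in Python under the same assumed data layout (order id mapped to a list of (processing time, eligible-machine set) pairs); "first available eligible machine" is approximated here by the least-loaded eligible machine:

```python
def lpt_p(orders, K):
    """Sketch of the LPT-P greedy rule.

    orders: dict order_id -> list of (p, elig) pairs (assumed layout).
    K: number of machines.
    Returns (load, assign): per-machine load and (order, item) -> machine.
    """
    pool = [(p, o, j, e) for o, its in orders.items()
            for j, (p, e) in enumerate(its)]
    pool.sort(key=lambda t: t[0], reverse=True)   # Step 1: LPT ordering
    load, assign, queue = [0] * K, {}, []
    while pool or queue:
        # Step 4: items promoted because a sibling is already scheduled
        p, o, j, e = queue.pop(0) if queue else pool.pop(0)
        # Step 2: least-loaded eligible machine (proxy for first available)
        m = min(e, key=lambda m: load[m])
        load[m] += p
        assign[(o, j)] = m
        # Step 3: raise the priority of the remaining items of order o
        queue += [it for it in pool if it[1] == o]
        pool = [it for it in pool if it[1] != o]
    return load, assign
```

Promoting the siblings of a scheduled item is what keeps an order's items close together in time and limits its collation delay.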

3.4.2 Least Total Workload with Order Priorities (LTW-P)

The least total workload heuristic considers the scheduled and the expected unassigned workload of each machine [22]. This heuristic is also modified to consider order scheduling. Let \(S_m\) and \(W_m\) represent the scheduled and the expected unassigned workloads of machine m, respectively. The modified heuristic can be summarized in the following steps:

Step 1.:

Find machine m with minimum \(S_m + W_m\)

Step 2.:

List all the items that can be processed on machine m

Step 3.:

Pick the item with the LPT and assign it to machine m

Step 4.:

Increase the priority of the remaining items in the order

Step 5.:

Assign the items that have higher priority

Step 6.:

Repeat until all items are assigned
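A sketch of LTW-P under the same assumed data layout follows; the sibling-promotion step is modeled by preferring items whose order already has a scheduled item:

```python
def ltw_p(orders, K):
    """Sketch of the LTW-P rule.

    orders: dict order_id -> list of (p, elig) pairs (assumed layout).
    Returns (S, assign): scheduled workload per machine and
    (order, item) -> machine.
    """
    pool = {(o, j): (p, e) for o, its in orders.items()
            for j, (p, e) in enumerate(its)}
    S = [0] * K                      # S_m: scheduled workload
    assign, started, blocked = {}, set(), set()
    while pool:
        # W_m: expected unassigned workload of each machine
        W = [sum(p for p, e in pool.values() if m in e) for m in range(K)]
        # Step 1: machine with minimum S_m + W_m (skipping exhausted ones)
        m = min((m for m in range(K) if m not in blocked),
                key=lambda m: S[m] + W[m])
        # Steps 2-5: prefer items whose order already has a scheduled sibling
        elig = [k for k in pool if m in pool[k][1]]
        cand = [k for k in elig if k[0] in started] or elig
        if not cand:
            blocked.add(m)           # no item fits this machine; try another
            continue
        o, j = max(cand, key=lambda k: pool[k][0])   # Step 3: LPT
        p, _ = pool.pop((o, j))
        S[m] += p
        assign[(o, j)] = m
        started.add(o)
        blocked.clear()
    return S, assign
```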

3.4.3 Multi-item Orders First (MIOF)

In this heuristic, items that belong to multi-item orders will be given higher priority, and therefore will be scheduled first. This heuristic can be summarized in the following steps:

Step 1.:

Take a multi-item order (starting with three-item orders)

Step 2.:

Schedule all items within this order

Step 3.:

Repeat until all multi-item orders are scheduled

Step 4.:

Schedule single-item orders
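MIOF reduces to a short loop under the same assumed data layout; the least-loaded eligible machine is an assumed dispatch rule, since the steps above do not specify one:

```python
def miof(orders, K):
    """Sketch of the MIOF rule: multi-item orders first (most items
    first), then single-item orders."""
    load, assign = [0] * K, {}
    for o in sorted(orders, key=lambda o: -len(orders[o])):
        for j, (p, e) in enumerate(orders[o]):   # keep the order's items together
            m = min(e, key=lambda m: load[m])    # assumed dispatch rule
            load[m] += p
            assign[(o, j)] = m
    return load, assign
```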

4 Results and Analysis

Six scenarios with different problem sizes are designed to evaluate the performance of the proposed WBSO heuristic. In the first three scenarios, the amount of machine flexibility \(F_p\) is set to 0%, 50%, and 100%, respectively, while the percentage of items that belong to multi-item orders \(\gamma\) is fixed at 50%. Machine flexibility is calculated based on the measure proposed by [22], which is given in Eq. (26). In this equation, \(a_{jm}\) is a binary parameter that indicates whether item j can be processed on machine m, n is the number of items, and K is the number of machines. A system with \(F_p=0\%\) is fully dedicated, while a system with \(F_p=100\%\) is fully flexible. In the second three scenarios, \(\gamma\) is set to 25%, 50%, and 75%, respectively, while \(F_p\) is fixed at 50%. Processing times are generated from a discrete uniform distribution (U[14, 17]), and the number of machines is three. The number of items in a multi-item order follows a discrete uniform distribution (U[2, 3]). The term “number of items” refers to the items that have accumulated at the dynamic scheduling system at the same moment; a large-scale problem is defined as one in which the number of accumulated items reaches 96. Instances and CPLEX results for these scenarios are obtained from [3] and provided here as a performance reference.

$$\begin{aligned} F_p = \dfrac{\sum _{j,m} a_{jm} - n}{n(K-1)} \cdot 100 \% \end{aligned}$$
(26)
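Eq. (26) translates directly into code; the eligibility matrix is assumed to be stored row-per-item:

```python
def machine_flexibility(A):
    """Machine flexibility F_p (Eq. 26), as a percentage.

    A: eligibility matrix, one row per item, with A[j][m] = 1 if item j
    can be processed on machine m (assumed layout).
    """
    n, K = len(A), len(A[0])
    # each item is eligible on at least one machine, so subtract n and
    # normalize by the n*(K-1) extra eligibilities a fully flexible
    # system would have
    return (sum(map(sum, A)) - n) / (n * (K - 1)) * 100
```

A fully dedicated system (one eligible machine per item) gives 0%, and a fully flexible one gives 100%.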

It is difficult to compare the results of the WBSO and the NSGA-II because the latter provides a set of solutions rather than a single one. To assess the convergence of both algorithms towards an ideal reference point, the modified mean ideal distance (MMID) measure is employed. This metric is widely used to evaluate the performance of the NSGA-II algorithm and to compare it with other heuristics that yield single or multiple solutions in the scheduling domain. The MMID measure is calculated based on Eq. (27) [23]. In this equation, NPS is the number of Pareto solutions, s is the solution index, r is the objective index, \(f_{sr}\) is the objective value of the solution, and z is the reference point. The reference point z is taken as \((C_{max}^*, 0)\), where \(C_{max}^*\) is the optimal makespan and 0 is the ideal value of the total collation delays.

$$\begin{aligned} MMID = \dfrac{1}{NPS} \sum _{s=1}^{NPS} \sqrt{\sum _{r=1}^2 (f_{sr} - z_r)^2} \end{aligned}$$
(27)
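Eq. (27) is simply the mean Euclidean distance of the front to the reference point, which a few lines of Python make explicit; the front is assumed to be a list of (makespan, total collation delay) pairs:

```python
import math

def mmid(front, z):
    """MMID (Eq. 27): mean Euclidean distance of the Pareto front
    solutions to the reference point z = (C_max*, 0)."""
    return sum(math.dist(f, z) for f in front) / len(front)
```

A heuristic that returns a single solution is treated as a front of size one, so the same measure applies to WBSO and the greedy rules.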

The NSGA-II parameters and operators used in these experiments are presented in Table 3. These parameters have been optimized using a design of experiments approach to achieve the best performance in terms of solution quality and computational time. The results comparing WBSO and the NSGA-II based on the MMID measure are shown in Table 4. For these experiments, \(F_p\) and \(\gamma\) are fixed at 50%. It can be observed that for large-size problems (96, 120, and 160 items), the WBSO outperforms the NSGA-II by 33.7% on average. The experimental results also showed that the difference between the MMID values and the average total collation delays is not significant. Therefore, the NSGA-II average values are used to benchmark the WBSO performance.

Table 3 NSGA-II operators and parameters
Table 4 Comparison between WBSO and NSGA-II based on the MMID performance measure

The experimental results for different values of \(F_p\) are presented in Table 5.

Table 5 Experimental results for different amounts of machine flexibility

In these experiments, \(\gamma\) is fixed at 50%. Figure 4 illustrates the WBSO performance in terms of total collation delays when \(F_p = 50\%\). For small-size problems (12, 24, and 48 items), the WBSO generates a larger makespan and larger total collation delays than the NSGA-II. However, compared to the other heuristics (LPT-P, LTW-P, MIOF), the WBSO generates (63%, 90%, 65%) fewer collation delays on average. For large-size problems, the WBSO outperforms all other heuristics in generating fewer collation delays; compared to the NSGA-II, it generates 28% fewer collation delays and a 6% larger makespan on average. This indicates that the performance of local search methods degrades significantly as the number of items increases. Moreover, compared to the other heuristics (LPT-P, LTW-P, MIOF), the WBSO generates (68%, 94%, 55%) fewer collation delays on average. The primary reason is that the WBSO is designed to combine the features of the three scheduling rules.

Fig. 4

Algorithms performance when \(F_p = 50\%, \gamma = 50\%\)

Figure 5 illustrates the performance of the proposed WBSO heuristic when the system is fully flexible (\(F_p=100\%\)). All the proposed heuristics generate optimal or near-optimal makespan values; the WBSO heuristic generates only 0.27% more makespan than the NSGA-II. For the total collation delays, the WBSO heuristic outperforms all other heuristics. Compared to the NSGA-II, it generates 70–85% fewer collation delays in large-size problems (96, 120, and 160 items), and compared to the other heuristics (LPT-P, LTW-P, MIOF), it generates (92%, 92%, 80%) fewer total collation delays on average in large-size problems.

Fig. 5

Algorithms performance when \(F_p = 100\%, \gamma = 50\%\)

The performance of the proposed heuristic in the fully dedicated environment (\(F_p=0\%\)) is illustrated in Fig. 6. All heuristics generate the same makespan, because each item can be processed on one machine only. For the total collation delays, the NSGA-II outperforms the WBSO heuristic in the 12-, 24-, 48-, 96-, and 120-item problems; however, both heuristics provide similar solution quality in the 160-item problem. Compared to the other heuristics (LPT-P, LTW-P, MIOF), the WBSO heuristic generates (87%, 90%, 78%) fewer total collation delays on average in large-size problems.

Fig. 6

Algorithms performance when \(F_p = 0\%, \gamma = 50\%\)

To evaluate the performance of the proposed heuristic for different values of \(\gamma\), the following scenarios are designed. In these scenarios, \(\gamma\) is varied between 25%, 50%, and 75%, while \(F_p\) is fixed at 50%. The obtained results are presented in Table 6. Figure 7 illustrates the WBSO performance when \(\gamma\) is 25%. Compared to the NSGA-II, the WBSO achieves 27% fewer collation delays on average in large-size problems. Compared to the other heuristics, the WBSO heuristic generates (80%, 95%, 65%) fewer collation delays on average in large-size problems.

Table 6 Experimental results for different values of \(\gamma\)
Fig. 7

Algorithms performance when \(F_p = 50\%, \gamma = 25\%\)

The WBSO performance when \(\gamma\) is 75% is illustrated in Fig. 8. The WBSO heuristic generates (50%, 78%, 30%) fewer collation delays on average compared to the LPT-P, LTW-P, and MIOF heuristics in large-size problems. It can also be noticed that the NSGA-II performance degrades significantly as the number of items increases.

Fig. 8

Algorithms performance when \(F_p = 50\%, \gamma = 75\%\)

5 Conclusions and Future Work

In this research, an order scheduling heuristic is proposed to minimize the makespan and the total collation delays in high-throughput make-to-order manufacturing systems. The proposed heuristic, called the WBSO, is based on the concept of workload balancing by inserting single-item orders. The performance of the WBSO is compared to the NSGA-II, LPT-P, LTW-P, and MIOF. The experimental results show that the proposed heuristic provides 33% fewer collation delays and a 6% larger makespan on average when compared to the NSGA-II. The results also show that for large-size problems, the WBSO generates (74%, 89%, and 62%) fewer collation delays on average than the LPT-P, LTW-P, and MIOF rules. A major advantage of the WBSO algorithm is its ability to generate minimal total collation delays with minimal computational time (one second) for large-size problems. In contrast, the performance of meta-heuristic methods, such as the NSGA-II, degrades significantly as the problem size increases. In MOPA systems, the number of orders per day can reach 40,000 to 50,000; therefore, computationally efficient algorithms, such as the WBSO, are needed.

Future work of this research would include improving the WBSO heuristic's multi-objective performance and testing it with different processing time values, levels of machine flexibility, and percentages of multi-item orders. In addition, the WBSO heuristic can be combined with local search methods to provide superior solutions. Future work would also include integrating this scheduling problem with the RDS planogram design problem [24], in which medication assignments to machines are optimized, leading to more machine flexibility. The integrated problem will have more objectives; thus, designing a scheduling rule for it will be a more challenging task.