1 Introduction

Flexible flowshops are becoming increasingly popular in industry, primarily because of the large workloads that jobs impose on machines at one or more stages of a flowshop scheduling problem (Logendran et al. 2005). In a flexible flowshop environment, there are stages in series with a number of identical parallel machines at each stage. All jobs have to pass through all stages, and at each stage each job is processed by exactly one machine.

Usually, the jobs assigned to a manufacturing cell are partitioned into groups based on similarities such as shape or the setups they require on machines, in order to improve efficiency. This subject is called group scheduling (GS). GS improves production efficiency by reducing setup times, tooling needs, and work-in-process inventories. The setup needed on a machine to switch between two jobs of the same group is usually negligible. However, a setup time must be considered on each machine when switching from a job of one group to a job of another group. This setup time may depend on the group previously processed on the machine, in which case the problem is called the sequence-dependent group scheduling problem.

Schaller et al. (2000) are the first to consider minimization of makespan in the sequence-dependent permutation flowshop group scheduling problem, i.e., \(Fm|fmls,s_{hgi} ,prmu|C_{\max } \) in the scheduling notation of Pinedo (2012). They propose several heuristic algorithms as well as a lower bounding method for this problem. França et al. (2005) develop two metaheuristics based on genetic algorithms (GA) and memetic algorithms (MA) for the same problem and show that the MA outperforms both the GA and the heuristics of Schaller et al. (2000). Logendran et al. (2006b) propose metaheuristic algorithms based on tabu search (TS) for the \(F2|fmls,s_{hgi} ,prmu|C_{\max } \) problem, along with a lower bounding method based on the mathematical model of the problem to evaluate the performance of the TS algorithms. Salmasi et al. (2011) develop a metaheuristic based on ant colony optimization (ACO) and a lower bounding method based on the mathematical model for the \(Fm|fmls,s_{hgi} ,prmu|C_{\max } \) problem, and show that the ACO outperforms the other metaheuristics available in the literature for this problem. Ravetti et al. (2012) propose and analyze parallel hybrid heuristics for the permutation flowshop problem. The \(Fm|fmls,s_{hgi} ,prmu|\sum C_j \) problem is first investigated by Salmasi et al. (2010), who develop a mathematical model and several metaheuristics based on TS and ACO, show that their ACO outperforms the TS, and also develop a lower bounding method based on the branch and price (B&P) algorithm. Hajinejad et al. (2011) propose a particle swarm optimization (PSO) algorithm for the \(Fm|fmls,s_{hgi} ,prmu|\sum C_j \) problem and show that it outperforms the ACO of Salmasi et al. (2010). Naderi and Salmasi (2012) develop two mathematical models for the same problem and show that one of them is effective enough to solve even medium size instances (up to 60 jobs across all groups) to optimality in reasonable time. They also propose a hybrid metaheuristic based on genetic algorithms and simulated annealing and show that it outperforms the ACO of Salmasi et al. (2010).

Most prior research addresses the flowshop group scheduling problem; only a few studies consider the flexible flowshop environment. Logendran et al. (2005) are the first to consider flexible flowshop group scheduling, developing three constructive heuristics for the \(FFm|fmls|C_{\max } \) problem. Logendran et al. (2006a) develop a TS algorithm for the \(FFm|fmls,s_{hgi} |C_{\max } \) problem and perform a detailed statistical experiment to analyze both the makespan value and the computation time. Paternina-Arboleda et al. (2008) consider makespan minimization in a flexible flowshop and propose a heuristic based on identifying and exploiting the bottleneck stage. Zandieh et al. (2009) develop two metaheuristics based on simulated annealing and GA for the \(FFm|fmls,s_{hgi} |C_{\max } \) problem. Salmasi et al. (2011) propose another TS algorithm for the same problem, show that it outperforms the algorithm of Logendran et al. (2006a), and also propose the first mathematical model for the problem. Keshavarz and Salmasi (2013) propose another mathematical model as well as a metaheuristic based on MA for the \(FFm|fmls,s_{hgi} |C_{\max } \) problem, show that both outperform their counterparts in Salmasi et al. (2011), and also propose a lower bounding mechanism inspired by Salmasi et al. (2011).

All previous research on the \(FFm|fmls,s_{hgi} |\gamma \) problem considers minimization of makespan as the criterion. To the best of our knowledge, this research is the first to consider the \(FFm|fmls,s_{hgi} |\sum C_j \) problem. A mixed integer linear programming model and a metaheuristic algorithm based on MA are proposed for the research problem, along with a lower bounding method based on the B&P algorithm to evaluate the quality of the proposed MA.

The rest of the paper is organized as follows. The characteristics of the problem are described in Sect. 2. A mathematical model for the research problem is proposed in Sect. 3. The MA for obtaining approximate solutions to the problem is explained in Sect. 4. The lower bounding method is described in Sect. 5. Results of computational experiments evaluating the performance of the MA and the lower bounding method are reported in Sect. 6. Finally, the results are discussed in Sect. 7, along with directions for future research.

2 Problem descriptions

Consider a flexible flowshop environment with \(m\) stages. Assume that \(N\) groups of jobs should be processed within these stages. Each stage, say stage \(i\), has \(nm_i \) identical parallel machines, and at least one stage has more than one machine in parallel. Each group, say group \(g\), consists of \(n_g \) jobs. The goal is to find the best sequence of the groups, as well as of the jobs within each group, that minimizes the total completion time. Following Pinedo (2012), the flexible flowshop sequence-dependent group scheduling problem (FFSDGSP) with minimization of total completion time is denoted \(FFm|fmls,s_{hgi} |\sum C_j \): flexible flowshop with \(m\) stages (\(FFm\)); group (family) scheduling (\(fmls\)); sequence-dependent setups (\(s_{hgi} \)); and total completion time minimization (\(\sum C_j \)).

The setups are assumed to be anticipatory: the setup on a machine at a stage to process a group can be started before any job of that group physically arrives at that stage. The group scheduling assumption also holds: once a machine starts processing a job of a group, all jobs of that group must be processed before the machine switches to jobs of another group.

3 The mathematical model

A binary mixed integer linear programming formulation is developed for the research problem. In this model, a dummy group, say group 0, is set as the first group on each machine; this assumption is used to account for the setup time required by the first group on each machine at each stage. The completion time of the dummy group is assumed to be 0 at all stages. The notations, parameters, decision variables, and the mathematical model are as follows:

Parameters and notations:

\(N\): Number of groups
\(g\): Index for groups, \(g=1,\ldots ,N\)
\(m\): Number of stages
\(i\): Index for stages, \(i=1,\ldots ,m\)
\(nm_i \): Number of parallel machines at stage \(i\), \(i=1,\ldots ,m\)
\(n_g \): Number of jobs in group \(g\), \(g=1,\ldots ,N\)
\(p_{gji} \): Processing time of job \(j\) of group \(g\) at stage \(i\), \(g=1,\ldots ,N;\; j=1,\ldots ,n_g ;\; i=1,\ldots ,m\)
\(s_{hgi} \): Setup time for processing group \(g\) immediately after group \(h\) on one of the machines of stage \(i\), \(h=1,\ldots ,N;\; g=1,\ldots ,N;\; g\ne h;\; i=1,\ldots ,m\)
\(s_{0gi} \): Setup time for processing group \(g\) as the first group on any machine of stage \(i\), \(g=1,\ldots ,N;\; i=1,\ldots ,m\)
\(M\): A large positive number

Decision variables:

\(X_{hgi} \): 1 if group \(g\) is processed immediately after group \(h\) on one of the machines of stage \(i\); 0 otherwise
\(Y_{gj_1 j_2 i} \): 1 if, in group \(g\), job \(j_2 \) is processed after job \(j_1 \) at stage \(i\); 0 otherwise
\(C_{gji} \): Completion time of job \(j\) of group \(g\) at stage \(i\)
\(ST_{gi} \): Start time of processing the jobs of group \(g\) at stage \(i\)
\(FT_{gi} \): Completion time of processing the jobs of group \(g\) at stage \(i\)

The model:

$$\begin{aligned} \min \sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } C_{gjm} \end{aligned}$$
(1)
$$\begin{aligned} \sum \limits _{\begin{array}{c} h=0 \\ h\ne g \end{array}}^N X_{hgi} =1 \quad \quad \quad g&= 1,...,N;i=1,...,m \end{aligned}$$
(2)
$$\begin{aligned} \sum \limits _{\begin{array}{c} {g=1} \\ {g\ne h} \end{array}}^N X_{hgi} \le 1 \quad \quad \quad h&= 1,...,N;i=1,...,m \end{aligned}$$
(3)
$$\begin{aligned} X_{hgi} +X_{ghi} \le 1 \quad \quad \quad h&= 1,...,N;g=2,...,N;g>h;i=1,...,m \end{aligned}$$
(4)
$$\begin{aligned} \sum \limits _{g=1}^N X_{0gi} \le nm_i \quad \quad \quad i&= 1,...,m \end{aligned}$$
(5)
$$\begin{aligned} ST_{gi} \ge FT_{hi} +s_{hgi} -M(1-X_{hgi} ) \quad h&= 0,...,N;g=1,\ldots ,N;g\ne h;i=1,...,m\nonumber \\ \end{aligned}$$
(6)
$$\begin{aligned} C_{gji} \ge ST_{gi} +p_{gji}\quad \quad \quad g&= 1,...,N;j=1,\ldots ,n_g ;i=1,\ldots ,m;p_{gji} \ne 0 \end{aligned}$$
(7)
$$\begin{aligned} FT_{gi} \ge C_{gji} \quad \quad \quad g&= 1,..,N;j=1,\ldots ,n_g ;i=1,\ldots ,m;p_{gji} \ne 0 \end{aligned}$$
(8)
$$\begin{aligned}&C_{g{j_{2}} i} \ge C_{gj_{1} i} +p_{gj_{2} i} -M(1-Y_{gj_{1} j_{2} i} )\nonumber \\&\quad g=1,...,N;j_{1} =1,\ldots ,n_{g} ;j_{2} =2,\ldots ,n_{g} ; j_{2} >j_{1} ; \quad i=1,\ldots ,m;p_{gj_{2} i} \ne 0\quad \quad \quad \end{aligned}$$
(9)
$$\begin{aligned}&C_{gj_{1} i} \ge C_{gj_{2} i} +p_{gj_{1} i} -MY_{gj_{1} j_{2} i} \quad g=1,...,N;j_{1} =1,\ldots ,n_{g} ;j_{2} =2,\ldots ,n_{g} ; \nonumber \\&\quad j_{2} >j_{1} ; i=1,\ldots ,m;p_{gj_{1} i} \ne 0 \end{aligned}$$
(10)
$$\begin{aligned}&C_{gji} \ge C_{gj\left( {i-1} \right) } +p_{gji}\quad g=1,...,N;j=1,\ldots ,n_{g} ; \quad i=2,\ldots ,m \end{aligned}$$
(11)
$$\begin{aligned}&C_{gji} ,ST_{gi} ,FT_{gi} \ge 0\\&X_{hgi} ,Y_{gj_{1} j_{2} i} \in \left\{ {0,1} \right\} \end{aligned}$$

The objective function (1) minimizes the total completion time. Constraint set (2) ensures that each group has exactly one immediate predecessor at each stage. Constraint set (3) ensures that each group has at most one immediate successor at each stage. Any two groups \(h\) and \(g\) can be in one of three relative positions at a stage \(i\): (1) group \(h\) is processed immediately before group \(g\); (2) group \(h\) is processed immediately after group \(g\); (3) the two groups are not processed immediately after each other (they are processed either on different machines or on the same machine but not consecutively). Constraint set (4) enforces this. Constraint set (5) ensures that at most \(nm_i \) machines are scheduled at stage \(i\): at each stage \(i\), up to \(nm_i \) groups can be scheduled after the dummy group, and each of these groups is assigned to one of the parallel machines at that stage. Constraint set (6) ensures that the start time of each group at each stage is greater than or equal to the completion time of its immediate predecessor plus the required setup time. Constraint set (7) ensures that the completion time of a job at a stage is greater than or equal to the start time of its group plus the job's processing time at that stage. Constraint set (8) ensures that the completion time of each group is greater than or equal to the completion times of its jobs. Constraint sets (9) and (10) ensure that the difference between the completion times of two jobs processed on the same machine at a stage is at least the processing time of the job processed later. Constraint set (11) ensures that the completion time of each job at each stage is greater than or equal to its completion time at the previous stage plus its processing time at the current stage.
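To make the formulation concrete, the sketch below builds constraint sets (1)–(11) on a tiny random instance. It is an illustrative translation of the model, not the authors' implementation: the instance data, the big-\(M\) value, and the open-source PuLP/CBC toolchain (rather than the CPLEX setup reported in Sect. 6) are all assumptions.

```python
import itertools
import random

import pulp

random.seed(0)
N, m = 3, 2                                  # number of groups, number of stages
nm = {i: 2 for i in range(1, m + 1)}         # identical parallel machines per stage
n = {g: 2 for g in range(1, N + 1)}          # jobs per group
J = {g: range(1, n[g] + 1) for g in n}       # job index sets
p = {(g, j, i): random.randint(1, 9) for g in n for j in J[g] for i in nm}
s = {(h, g, i): random.randint(1, 5)         # setups s_{hgi}; h = 0 is the dummy group
     for h in range(N + 1) for g in n for i in nm if h != g}
M = 10_000

mdl = pulp.LpProblem("FFSDGSP", pulp.LpMinimize)
gs, js, ss = range(1, N + 1), range(1, max(n.values()) + 1), range(1, m + 1)
X = pulp.LpVariable.dicts("X", (range(N + 1), gs, ss), cat="Binary")
Y = pulp.LpVariable.dicts("Y", (gs, js, js, ss), cat="Binary")
C = pulp.LpVariable.dicts("C", (gs, js, ss), lowBound=0)
ST = pulp.LpVariable.dicts("ST", (gs, ss), lowBound=0)
FT = pulp.LpVariable.dicts("FT", (gs, ss), lowBound=0)

mdl += pulp.lpSum(C[g][j][m] for g in n for j in J[g])                       # (1)
for i in nm:
    mdl += pulp.lpSum(X[0][g][i] for g in n) <= nm[i]                        # (5)
    for g in n:
        mdl += pulp.lpSum(X[h][g][i] for h in range(N + 1) if h != g) == 1   # (2)
        mdl += pulp.lpSum(X[g][h][i] for h in n if h != g) <= 1              # (3)
        mdl += ST[g][i] >= s[0, g, i] - M * (1 - X[0][g][i])                 # (6), FT of dummy = 0
        for h in n:
            if h != g:                                                       # (6)
                mdl += ST[g][i] >= FT[h][i] + s[h, g, i] - M * (1 - X[h][g][i])
        for j in J[g]:
            mdl += C[g][j][i] >= ST[g][i] + p[g, j, i]                       # (7)
            mdl += FT[g][i] >= C[g][j][i]                                    # (8)
            if i > 1:
                mdl += C[g][j][i] >= C[g][j][i - 1] + p[g, j, i]             # (11)
        for j1, j2 in itertools.combinations(J[g], 2):
            mdl += C[g][j2][i] >= C[g][j1][i] + p[g, j2, i] - M * (1 - Y[g][j1][j2][i])  # (9)
            mdl += C[g][j1][i] >= C[g][j2][i] + p[g, j1, i] - M * Y[g][j1][j2][i]        # (10)
    for h, g in itertools.combinations(n, 2):
        mdl += X[h][g][i] + X[g][h][i] <= 1                                  # (4)

mdl.solve(pulp.PULP_CBC_CMD(msg=False))
print("total completion time:", pulp.value(mdl.objective))
```

Replacing the random data with a real instance reproduces the model of this section; only the solver backend differs from the computational setup of Sect. 6.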

The classical \(1|s_{hg} |\sum C_j \) and \(F2||\sum C_j \) problems are NP-hard (Pinedo 2012). Since the flexible flowshop generalizes both, the \(FFm|fmls,s_{hgi} |\sum C_j \) problem is also NP-hard. Thus, metaheuristic algorithms are required to solve large size instances.

4 Metaheuristic–memetic algorithm

Since the research problem is NP-hard, six metaheuristic algorithms based on MA are developed to solve it heuristically. The MA is an evolutionary algorithm introduced by Moscato (1989) as a hybrid of a GA and an individual learning procedure for local refinement. Previous research shows that the MA performs well on scheduling and timetabling problems (Hart and Krasnogor 2005; França et al. 2005; Keshavarz and Salmasi 2013). Keshavarz and Salmasi (2013) propose a very efficient MA for the \(FFm|fmls,s_{hgi} |C_{\max } \) problem. Motivated by this, we apply the same algorithm to the proposed research problem with several modifications to the solution representation, the fitness calculation, the crossover operator, and the local search procedure. The other features of the applied MA are the same as in Keshavarz and Salmasi (2013). The characteristics of the proposed algorithm are discussed in the following subsections.

4.1 Solution representation

Generally, solutions are represented by a string of digits called a chromosome. The different steps of the algorithm, such as crossover and mutation, are applied to chromosomes.

The algorithm should determine the sequence of groups, as well as of the jobs within each group, on the machines at each stage. Each permutation of the digits 1 to \(N\) represents a priority list for assigning groups to machines at every stage. Similarly, each permutation of the digits 1 to \(n_g \) represents the sequence of the jobs belonging to the \(g^{th}\) group. Thus, the solution representation for each stage is a string such as [\(G_1 \), \(G_2 \),..., \(G_N \) | \(J_{11} \), \(J_{12} \),..., \(J_{1n_1 } \) | ... | \(J_{N1} \), \(J_{N2} \), ..., \(J_{Nn_N } \)]. The first part of this chromosome represents the priority list of groups; the remaining parts represent the job sequence within each group. In this way, chromosomes are divided into \(N+1\) independent parts. To reduce the search space of the MA, it is assumed that the sequence of jobs within a group does not change across stages. The procedure for assigning groups and jobs to machines at each stage is discussed in Sect. 4.10.

4.2 Calculating the total completion time of a chromosome

Assigning groups to the first available machine is not an efficient sequencing rule for the proposed research problem, since it ignores the effect of sequence-dependent setup times between groups. In other words, keeping some machines idle while a group is waiting for processing may result in a better schedule. Hence, the optimal schedule may not be a non-delay schedule (a feasible schedule is called non-delay if no machine is kept idle while an operation is waiting for processing).

To calculate the fitness of a chromosome, i.e., its total completion time, the groups are assigned to machines at each stage according to the chromosome's priority list. Taking into account the setup time needed to process a group on a machine (which depends on the last group processed on that machine), each group is assigned to the machine with the earliest possible start time. After a group is assigned to a machine, the completion times of its jobs are computed according to the job sequence of that group. After the completion times of the jobs have been computed at all stages, the total completion time equals the sum of the job completion times at the last stage.

As an example, consider \(G_1 \), \(G_2 \),..., \(G_N \) as the priority list of groups and let \(nm_1 =3\). First, group \(G_1 \) is assigned to the first machine of stage 1, and the completion times of its jobs are calculated based on its job sequence. Let \(C_1 \) be the completion time of group \(G_1 \) at stage 1 (i.e., the completion time of the last job of group \(G_1 \)). Group \(G_2 \) is then assigned to the second machine of stage 1 if \(s_{0G_2 } \le C_1 +s_{G_1 G_2 } \); otherwise, it is assigned to the first machine and processed after group \(G_1 \). Assume that group \(G_2 \) is assigned to the second machine and its completion time at stage 1 is \(C_2 \). If \(\min \{ {s_{0G_3 } ,C_1 +s_{G_1 G_3 } ,C_2 +s_{G_2 G_3 } } \}=s_{0G_3 }\), group \(G_3 \) is assigned to the third machine; if the minimum is \(C_1 +s_{G_1 G_3 }\), group \(G_3 \) is assigned to the first machine; otherwise, it is assigned to the second machine. The remaining groups are assigned to machines in the same way.
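The following is a minimal sketch of this fitness evaluation for a single stage, assuming simple dictionary-based data structures; the function and variable names are illustrative, not the authors' code.

```python
def evaluate_stage(priority, jobs, p, s, nmach, ready=None):
    """Schedule one stage; return {(group, job): completion time}.

    priority -- group ids in chromosome order
    jobs     -- {group: job sequence}, fixed across stages
    p        -- {(group, job): processing time at this stage}
    s        -- {(prev_group, group): setup time}; prev_group 0 is the dummy
    nmach    -- number of identical parallel machines at this stage
    ready    -- {(group, job): completion time at the previous stage}, if any
    """
    free = [0.0] * nmach   # time at which each machine becomes free
    last = [0] * nmach     # last group processed on each machine (0 = dummy)
    done = {}
    for g in priority:
        # pick the machine with the earliest possible start, setup included
        k = min(range(nmach), key=lambda q: free[q] + s[last[q], g])
        t = free[k] + s[last[k], g]          # anticipatory setup finishes here
        for j in jobs[g]:
            if ready is not None:
                t = max(t, ready[g, j])      # wait for the job's previous stage
            t += p[g, j]
            done[g, j] = t
        free[k], last[k] = t, g
    return done
```

Chaining `evaluate_stage` over stages 1 through \(m\), feeding each stage's result in as the next stage's `ready`, and summing the values returned at the last stage yields the total completion time of the chromosome.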

4.3 Crossover operator

Three crossover operators, namely "Order Crossover" (OX), "Similar Job Order Crossover" (SJOX), and "Partially Mapped Crossover" (PMX), are employed for this research problem. The OX crossover is the one applied by Keshavarz and Salmasi (2013): it generates two random cut points, copies the segment of the first parent between them to the offspring, and fills the remaining positions with the missing alleles in the order they appear in the second parent. The SJOX and PMX crossovers are proposed by Ruiz et al. (2003) and Goldberg and Lingle (1985), respectively. SJOX examines each position of the parents and copies alleles that are identical at the same position to both offspring; each offspring then directly inherits all alleles from one of the parents up to a randomly chosen cut point, and the remaining positions are filled from the other parent. PMX partitions each parent into substrings by two cut points, exchanges the substrings to produce proto-offspring, determines a mapping relationship based on the exchanged substrings, and uses it to legalize the proto-offspring.
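As an illustration, a minimal sketch of the OX operator on one permutation part of a chromosome is given below; in the MA it would be applied independently to the group part and to each job part. The function name and list representation are assumptions.

```python
import random

def order_crossover(p1, p2):
    """OX: copy a random slice of p1; fill the rest in the order of p2."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b + 1] = p1[a:b + 1]         # segment inherited from the first parent
    kept = set(child[a:b + 1])
    fill = iter(x for x in p2 if x not in kept)
    return [x if x is not None else next(fill) for x in child]
```

For example, with parents `[1, 2, 3, 4, 5]` and `[3, 1, 4, 5, 2]` and cut points 1 and 3, the segment `[2, 3, 4]` is copied and the remaining alleles `1, 5` are placed in the order they occur in the second parent.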

4.4 Local search procedure

Two local search procedures are considered: (1) swap and insertion moves (SIM) and (2) path relinking (PR). Keshavarz and Salmasi (2013) propose an MA based on the SIM local search. In a swap move, the positions of two alleles are exchanged, whereas in an insertion move one allele is removed from its position and inserted at another one. PR is a search technique originally presented by Glover and Laguna (1997) whose goal is to explore the search space, or "path", between a given set (usually two) of good solutions. The steps of the proposed MA are presented as a flowchart in Fig. 2 in Appendix 1.
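A minimal sketch of the two SIM neighborhoods on one permutation part of a chromosome, under the same list representation assumed above:

```python
import itertools

def swap_neighbors(seq):
    """Neighbors obtained by exchanging the positions of two alleles."""
    for i, j in itertools.combinations(range(len(seq)), 2):
        s = list(seq)
        s[i], s[j] = s[j], s[i]
        yield s

def insertion_neighbors(seq):
    """Neighbors obtained by removing one allele and reinserting it elsewhere."""
    for i in range(len(seq)):
        for j in range(len(seq)):
            if j != i:
                s = list(seq)
                s.insert(j, s.pop(i))
                yield s
```

In the local search, each candidate neighbor would be evaluated with the fitness procedure of Sect. 4.2 and accepted if it improves the incumbent.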

5 Lower bounding method: the branch and price algorithm

The B&P algorithm is known as an efficient method for solving integer linear programming problems with a huge number of variables. Details of the B&P algorithm can be found in Barnhart et al. (1998), Wilhelm (2001), and Wilhelm et al. (2003). Salmasi et al. (2010) and Gelogullari and Logendran (2010) use the B&P algorithm to find lower bounds for flowshop group scheduling problems.

In this research, a B&P algorithm is proposed for finding a lower bound for the research problem. The reformulated mathematical model, the column generation approach, and the details of the proposed B&P algorithm are discussed in the following subsections.

5.1 Dantzig–Wolfe decomposition model

The FFSDGSP is reformulated as a set-partitioning master problem (MP) using Dantzig–Wolfe decomposition. The following parameters and decision variables, in addition to those defined in Sect. 3, are required to present this model:

Parameters:

\(B_i \): Set of all feasible schedules at stage \(i\), \(i=1,\ldots ,m\)
\(K_i \): Number of all feasible schedules at stage \(i\), \(i=1,\ldots ,m\)
\(b_i^k \): The \(k^{th}\) feasible schedule at stage \(i\), \(i=1,\ldots ,m;\; k=1,\ldots ,K_i \)
\(C_{gji}^k \): Completion time of job \(j\) of group \(g\) at stage \(i\) in schedule \(b_i^k \), \(g=1,\ldots ,N;\; j=1,\ldots ,n_g ;\; i=1,\ldots ,m\)

Decision variables:

\(\lambda _i^k \): 1 if schedule \(b_i^k \in B_i \) is selected at stage \(i\); 0 otherwise, \(i=1,\ldots ,m;\; k=1,\ldots ,K_i \)

Each column corresponds to a feasible schedule of groups and jobs at a stage. The set of all feasible schedules at stage \(i\) is denoted by \(B_i \). Each of these schedules \(b_i^k ,k=1,\ldots ,K_i \), where \(K_i =\left| {B_i } \right| \), is defined by a set of completion times \(\left\{ {C_{gji}^k } \right\} \). MP is an integer programming model with a huge number of binary variables. Since there is a huge number of feasible schedules at each stage, it is not possible to include them all in the model. Therefore, a column generation approach is used to solve the LP relaxation of MP, adding new schedules with negative reduced costs to the model as needed. The linear programming master problem that includes only a subset of all possible schedules is called the restricted linear master problem (RLMP) and can be written as below:

RLMP Model:

$$\begin{aligned} \min Z=\sum \limits _{k=1}^{K_m } \sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } \lambda _m^k C_{gjm}^k \end{aligned}$$
(12)

s.t.

$$\begin{aligned}&\displaystyle \sum \limits _{k=1}^{K_i } \lambda _i^k C_{gji}^k - \sum \limits _{k=1}^{K_{\left( {i-1} \right) } } \lambda _{\left( {i-1} \right) }^k C_{gj\left( {i-1} \right) }^k \ge p_{gji} \quad \quad \quad \quad \begin{array}{c} i=2,\ldots ,m; \\ g=1,\ldots ,N; \\ j=1,\ldots ,n_g \\ \end{array} \quad \left( {\omega _{gji} } \right) \end{aligned}$$
(13)
$$\begin{aligned}&\displaystyle \sum \limits _{k=1}^{K_i } \lambda _i^k =1\quad \quad \quad i=1,\ldots ,m \left( {\alpha _i } \right) \end{aligned}$$
(14)
$$\begin{aligned}&\displaystyle \lambda _i^k \ge 0 \quad \quad \quad \begin{array}{c} i=1,\ldots ,m; \\ k=1,\ldots ,K_i \\ \end{array} \end{aligned}$$
(15)

The objective (12) is to minimize the total completion time. Constraint set (13) ensures that the completion time of a job at a stage (say stage \(i\)) is greater than or equal to its completion time at the preceding stage (stage \(i-1\)) plus its processing time at stage \(i\). Since this constraint set involves the completion times of jobs at more than one stage, it is the linking (complicating) constraint of the model. Constraint set (14) is the convexity constraint and ensures that exactly one schedule is selected at each stage. The variables \(\lambda _i^k \) are binary in MP but are relaxed to continuous decision variables in the RLMP.

\(\omega _{gji} \) and \(\alpha _i \) denote the dual variables corresponding to constraint sets (13) and (14), respectively. The dual problem of RLMP is presented to facilitate the presentation of the sub-problems, which are used to identify whether any column can be added to RLMP to improve the objective function value. The dual of RLMP is as follows:

The Dual of RLMP Model:

$$\begin{aligned} \max Z^{D}=\sum \limits _{i=2}^m \sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } p_{gji} \omega _{gji} + \sum \limits _{i=1}^m \alpha _i \end{aligned}$$
(16)

s.t.

$$\begin{aligned}&\displaystyle -\sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } \omega _{gj2} C_{gj1}^k +\alpha _1 \le 0\quad \quad \quad k=1,\ldots K_1\end{aligned}$$
(17)
$$\begin{aligned}&\displaystyle \sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } \omega _{gji} C_{gji}^k -\sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } \omega _{gj(i+1)} C_{gji}^k +\alpha _i \le 0\quad \quad \quad i=2,\ldots ,m-1;k=1,\ldots ,K_i\end{aligned}$$
(18)
$$\begin{aligned}&\displaystyle \sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } (\omega _{gjm} -1)C_{gjm}^k +\alpha _m \le 0 \quad \quad \quad k=1,\ldots ,K_m\end{aligned}$$
(19)
$$\begin{aligned}&\displaystyle \omega _{gji} \ge 0\quad \quad \quad i=2,\ldots ,m;g=1,\ldots ,N;j=1,\ldots ,n_g\end{aligned}$$
(20)
$$\begin{aligned}&\displaystyle \alpha _i \quad \quad \quad \hbox { Unrestricted} \quad \quad i=1,\ldots ,m \end{aligned}$$
(21)

Define \(\omega _{gj1} =0\) and \(\omega _{gj\left( {m+1} \right) } =1\) for \(g=1,\ldots ,N\) and \(j=1,\ldots ,n_g \). Then, to find the schedule with the smallest reduced cost at the \(i^{th}\) stage, the following sub-problem (SP\(i\)) must be solved:

SP\(i\) Model (\(i=1,\ldots ,m)\):

$$\begin{aligned} \min Z^{SPi}=\sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } \left( {\omega _{gj(i+1)} -\omega _{gji} } \right) C_{gji} -\alpha _i \end{aligned}$$
(22)

s.t.

capacity constraint sets for \(nm_i \) parallel machines at stage \(i\) (i.e., constraints (2)–(10) of the original model for a specific \(i)\)

The last term in (22), i.e., \(\alpha _i \), is constant. Thus, solving each sub-problem is equivalent to solving a parallel machine sequence-dependent group scheduling problem with total weighted completion time as the objective, i.e., \(Pm|fmls,s_{hgi} |\sum w_j C_j \). Kan (1976) shows that \(1|fmls,s_{hgi} |\sum w_j C_j \) is strongly NP-hard, so it can be concluded that \(Pm|fmls,s_{hgi} |\sum w_j C_j \) is also strongly NP-hard.

Thus, all sub-problems have a similar structure and are strongly NP-hard. If the optimal value of at least one sub-problem is negative, there are column(s) that can be added to the master problem to improve the objective function value. The process of solving the sub-problems and adding new columns to RLMP continues until all sub-problems have nonnegative optimal objective function values.

In SP2 through SP\(m\), the objective function coefficient of \(C_{gji} \), i.e., \(\left( {\omega _{gj(i+1)} -\omega _{gji} } \right) \), can be negative. In this case, the sub-problem would be unbounded, which causes the column generation approach to converge to the optimal solution very slowly. To resolve this issue, a constraint set is incorporated into the dual of the RLMP model to force all of these coefficients to be nonnegative. The following constraints are incorporated into the dual of RLMP:

$$\begin{aligned} (\omega _{gj(i+1)} -\omega _{gji} )\ge 0\quad \quad \quad i=2,\ldots ,m;g=1,\ldots ,N;j=1,\ldots ,n_g \end{aligned}$$
(23)

Imposing constraints (23) on the dual corresponds to adding artificial variables \(R_{gji} \) to the primal RLMP. The RLMP model with artificial variables (RLMP-A) is as follows:

RLMP-A Model:

$$\begin{aligned} \min Z=\sum \limits _{k=1}^{K_m } \sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } \lambda _m^k C_{gjm}^k +\sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g } R_{gjm} \end{aligned}$$
(24)

s.t.

$$\begin{aligned}&\displaystyle \sum \limits _{k=1}^{K_2 } \lambda _2^k C_{gj2}^k -\sum \limits _{k=1}^{K_1 } \lambda _1^k C_{gj1}^k +R_{gj2} \ge p_{gj2} \quad \quad \quad g=1,\ldots ,N;j=1,\ldots ,n_g\end{aligned}$$
(25)
$$\begin{aligned}&\displaystyle \sum \limits _{k=1}^{K_i } \lambda _i^k C_{gji}^k -\sum \limits _{k=1}^{K_{\left( {i-1} \right) } } \lambda _{\left( {i-1} \right) }^k C_{gj\left( {i-1} \right) }^k +R_{gji} -R_{gj(i-1)} \ge p_{gji}\nonumber \\&\displaystyle i=3,\ldots ,m;g=1,\ldots ,N; j=1,\ldots ,n_g\end{aligned}$$
(26)
$$\begin{aligned}&\displaystyle \sum \limits _{k=1}^{K_i } \lambda _i^k =1 \quad \quad \quad i=1,\ldots ,m\end{aligned}$$
(27)
$$\begin{aligned}&\displaystyle \lambda _i^k \ge 0 \quad \quad \quad i=1,\ldots ,m;k=1,\ldots ,K_i\end{aligned}$$
(28)
$$\begin{aligned}&\displaystyle R_{gji} \ge 0\quad \quad \quad \quad i=2,\ldots ,m;g=1,\ldots ,N; j=1,\ldots ,n_g \end{aligned}$$
(29)

Since additional constraints are imposed on the dual model, the optimal value of the primal model (RLMP-A) provides a lower bound for the LP relaxation of MP.

5.2 Solving sub-problems

During the column generation approach, the sub-problems must be solved many times. Solving the sub-problems with an integer programming model showed very poor performance in Salmasi et al. (2010) for a similar problem. For this reason, two efficient algorithms are developed in this research for solving the sub-problems, one heuristic and one exact.

It is not necessary to solve the sub-problems optimally during the intermediate iterations of the column generation approach; finding an improving column is sufficient. Thus, a metaheuristic is used to solve the sub-problems during the early iterations, and the sub-problems are solved optimally only when the metaheuristic is unable to find a suitable column for any sub-problem. At the end of the column generation approach, all sub-problems must be solved optimally to ensure that the optimal solution of the node has been found. In this research, the MA is used to solve the sub-problems heuristically. After the sub-problems have been solved by the metaheuristic, if at least one of them has a negative objective value, new columns are added to RLMP-A. Solving the updated RLMP-A yields new dual values, which are used to update the sub-problem objective functions. Adding new columns by solving the sub-problems heuristically continues until no sub-problem can provide a column that improves the objective function value of RLMP-A.

To ensure the optimality of the RLMP-A solution, the sub-problems should then be solved by an exact algorithm. A branch and bound (B&B) algorithm is developed to solve the sub-problems optimally. If the optimal value of one of the sub-problems is negative, the newly generated column is added to RLMP-A; solving the updated RLMP-A yields new dual values, the sub-problem objectives are updated, and the MA is used again for solving the sub-problems. The column generation approach stops when the MA cannot find new columns and the optimal values of all sub-problems (obtained by the B&B algorithm) are nonnegative. In this case the optimality of the LP relaxation of the master problem is proved. After solving RLMP-A, if the values of \(\lambda _i^k \) do not satisfy the integrality condition, branching occurs. The flowchart of the column generation procedure is presented in Appendix 1 as Fig. 3.
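A high-level sketch of this two-phase pricing loop is shown below; `solve_rlmp`, `ma_price`, and `bb_price` are assumed stand-ins for the components described above (each pricing routine returns a negative-reduced-cost schedule for a stage, or `None`), not the authors' code.

```python
def column_generation(rlmp, stages, solve_rlmp, ma_price, bb_price):
    """Two-phase column generation: heuristic pricing first, exact pricing last."""
    while True:
        duals = solve_rlmp(rlmp)                  # optimal duals (omega, alpha) of RLMP-A
        # Phase 1: fast heuristic pricing with the MA at every stage.
        new = [c for i in stages if (c := ma_price(i, duals)) is not None]
        if not new:
            # Phase 2: exact B&B pricing certifies that no improving column exists.
            new = [c for i in stages if (c := bb_price(i, duals)) is not None]
            if not new:
                return rlmp                       # LP relaxation of MP solved at this node
        for col in new:
            rlmp.add_column(col)                  # grow the restricted master problem
```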

5.2.1 Job sequence in sub-problem optimal solution

For each assignment of groups to the parallel machines at a stage, the optimal sequence of jobs within each group can be found by a sorting rule. For a group assigned to a machine, finding its optimal job sequence is equivalent to solving a \(1||\sum {w_j C_j } \) problem (Pinedo 2012). Hence, in each sub-problem, the jobs should be sorted based on the weighted shortest processing time (WSPT) rule. The WSPT rule for each SP\(i\) (\(i=1\) through \(m\)) is defined as follows:

WSPT rule for SP \(i\) ( \(i= 1\) through \(m\)):

Sort the jobs within each group in decreasing order of \({(\omega _{gj(i+1)} -\omega _{gji} )}/{p_{gji} }\).

This rule is used in both the MA and the B&B algorithm for finding the optimal job sequences within each group.
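For example, the rule can be written as a one-line sort, assuming dictionaries `omega` and `p` keyed by the paper's indices (an illustrative sketch, not the authors' code):

```python
def wspt_order(jobs, omega, p, g, i):
    """Jobs of group g in decreasing (omega[g,j,i+1] - omega[g,j,i]) / p[g,j,i] order."""
    return sorted(jobs,
                  key=lambda j: (omega[g, j, i + 1] - omega[g, j, i]) / p[g, j, i],
                  reverse=True)
```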

5.2.2 The Branch and Bound algorithm for solving sub-problems

All sub-problems are \(Pm|fmls,s_{hgi} |\sum w_j C_j \) problems, and the job sequence within each group can be determined by the WSPT rule. To find the optimal sequence of groups, each group is treated as an aggregated job whose processing time equals the sum of the processing times of the jobs belonging to that group. Let \(j_{(1)} ,j_{(2)} ,\ldots ,j_{(n_g )} \) be the optimal job sequence of group \(g,g=1,\ldots ,N,\) and let \(Ct_g \) denote the completion time of group \(g\). The completion time of each job can then be represented as:

$$\begin{aligned} C_{gj_{(r)} i} =Ct_g -\sum \limits _{t=r+1}^{n_g } p_{gj_{\left( t \right) } i}\quad r=1,\ldots ,n_g \end{aligned}$$
(30)

The objective function of SP\(i\), \(i=1,\ldots ,m\), can be written as follows:

$$\begin{aligned} Z^{SPi}&= \sum \limits _{g=1}^N \sum \limits _{j=1}^{n_g} \left( {\omega _{gj(i+1)} -\omega _{gji} } \right) C_{gji} -\alpha _i =\sum \limits _{g=1}^N \sum \limits _{r=1}^{n_g } \left( {\omega _{gj_{\left( r \right) } \left( {i+1} \right) } -\omega _{gj_{\left( r \right) } i} } \right) C_{gj_{\left( r \right) } i} -\alpha _i\nonumber \\&= \sum \limits _{g=1}^N \sum \limits _{r=1}^{n_g } \left( {\omega _{gj_{\left( r \right) } \left( {i+1} \right) } -\omega _{gj_{\left( r \right) } i} } \right) \left( {Ct_g -\sum \limits _{t=r+1}^{n_g } p_{gj_{\left( t \right) } i} } \right) -\alpha _i\nonumber \\&= \sum \limits _{g=1}^N \sum \limits _{r=1}^{n_g } \left( {\omega _{gj_{\left( r \right) } \left( {i+1} \right) } -\omega _{gj_{\left( r \right) } i} } \right) Ct_g -\sum \limits _{g=1}^N \sum \limits _{r=1}^{n_g } \sum \limits _{t=r+1}^{n_g } \left( {\omega _{gj_{\left( r \right) } \left( {i+1} \right) } -\omega _{gj_{\left( r \right) } i} } \right) p_{gj_{\left( t \right) } i} -\alpha _i\nonumber \\&= \sum \limits _{g=1}^N \sum \limits _{r=1}^{n_g } \left( {\omega _{gj_{\left( r \right) } \left( {i+1} \right) } -\omega _{gj_{\left( r \right) } i} } \right) Ct_g -constant_i\nonumber \\&= \sum \limits _{g=1}^N \left( {\sum \limits _{j=1}^{n_g } \left( {\omega _{gj\left( {i+1} \right) } -\omega _{gji} } \right) } \right) Ct_g -constant_i \end{aligned}$$
(31)

Thus, SP1 through SP\(m\) can be considered as \(Pm|s_{hg} |\sum w_j C_j \) problems with \(N\) aggregated jobs, where the weight of aggregated job \(g\) is \(\left( {\sum \nolimits _{j=1}^{n_g } \left( {\omega _{gj\left( {i+1} \right) } -\omega _{gji} } \right) } \right) \).

A B&B algorithm is developed to solve the \(Pm|s_{hg} |\sum w_j C_j \) problem, i.e., to find the optimal group sequence of each \(Pm|fmls,s_{hgi} |\sum w_j C_j \) sub-problem; it can also be used to obtain a valid lower bound for each sub-problem. Every B&B algorithm consists of three main procedures: initialization, branching, and bounding. The solution of the MA is used as the initial upper bound of the B&B tree (initialization). Each node of the B&B tree consists of a partial solution. Let \(\mathcal{S}\) be the set of scheduled groups and \({\mathcal{S}}'\) its complement, the set of unscheduled groups. At each level of the B&B tree, one of the unscheduled groups is added to the set of scheduled groups (branching). Each parent node at level \(l\) therefore generates \(\left( {N-l} \right) \) new nodes at level \(\left( {l+1} \right) \). At level 0 of the B&B tree, \(\mathcal{S}=\varnothing \), and at each node of level \(l=1,\ldots ,N-2\), \(\mathcal{S}\) contains \(l\) groups; there are \(P\left( {N,l} \right) ={N!}/{\left( {N-l} \right) !}\) nodes at level \(l\). \(\mathcal{S}\) is a priority list, and the scheduled groups are assigned to machines based on this priority list, each group being assigned to the machine with the earliest possible start time (see Sect. 4.10 for details on assigning groups to machines).

At each node, the total weighted completion time of the scheduled groups \((\mathcal{S})\) and a lower bound on the total weighted completion time of the unscheduled groups \(({\mathcal{S}}')\) are calculated (bounding). The sum of these two values provides the lower bound of that node. For \(\mathcal{S}=\left\{ {g_{(1)} ,g_{(2)} ,\ldots ,g_{(l)} } \right\} ,\) the total weighted completion time of the scheduled groups (\(Z^{\mathcal{S}}\)) for each SP\(i\) (\(i=1\) through \(m\)) is given by formula (32):

$$\begin{aligned} Z^{\mathcal{S}}=\sum \limits _{t=1}^l \left( {\sum \limits _{j=1}^{n_{g_{(t)} } } \left( {\omega _{g_{(t)} j\left( {i+1} \right) } -\omega _{g_{(t)} ji} } \right) } \right) Ct_{g_{(t)} } \end{aligned}$$
(32)

where \(Ct_{g_{(t)} } ,t=1,\ldots ,l\) is calculated based on the assignment of groups to \(nm_i \) parallel machines at stage \(i\).

To calculate the lower bound on the total weighted completion time of the unscheduled groups \(({\mathcal{S}}')\), each unscheduled group, say group \(g\), is considered as a job with processing time equal to:

$$\begin{aligned} \left( {{\min \limits _h} \left\{ {s_{hgi}} \right\} +\sum _{j=1}^{n_g } {p_{gji} } } \right) \big /{nm_i } \end{aligned}$$
(33)

In other words, the minimum possible setup time for processing group \(g\) is added to the total processing time of the group, and the result is divided by the number of parallel machines. The unscheduled groups are then assigned to a single machine with ready time equal to the minimum ready time of the parallel machines after processing the scheduled groups (the ready time of a machine is the completion time of the last group processed on it). Let \(r\) be this minimum ready time. With this assumption, the problem of sequencing the unscheduled groups reduces to \(1||\sum w_j C_j \), whose optimal sequence is given by the WSPT rule. Accordingly, for each SP\(i\) (\(i=1\) through \(m\)), the unscheduled groups should be sorted in decreasing order of

$$\begin{aligned} {\left( {\sum \nolimits _{j=1}^{n_g } \left( {\omega _{gj\left( {i+1} \right) } -\omega _{gji} } \right) } \right) }\Bigg /{\left( {{\left( {{\min \limits _h} \left\{ {s_{hgi} } \right\} +\sum _{j=1}^{n_g } {p_{gji} } } \right) }\Bigg /{nm_i }} \right) }. \end{aligned}$$

Assume that after using the WSPT rule, the sorted unscheduled groups are \(g_{(l+1)} ,g_{(l+2)} ,\ldots , g_{(N)} \). Then, the completion time of unscheduled groups can be calculated using the following formulas:

$$\begin{aligned} Ct_{g_{(l+1)} }&= r+{\left( {{\min \limits _h } \left\{ {s_{hg_{(l+1)} i} } \right\} + \sum \nolimits _{j=1}^{n_{g_{(l+1)} } } {p_{g_{(l+1)} ji} } } \right) }\Bigg /{nm_i } \end{aligned}$$
(34)
$$\begin{aligned} Ct_{g_{(t)} }&= Ct_{g_{\left( {t-1} \right) } } +{\left( {{\min \limits _h } \left\{ {s_{hg_{(t)} i} } \right\} +\sum \nolimits _{j=1}^{n_{g_{(t)} } } {p_{g_{(t)} ji} } } \right) }\Bigg /{nm_i }, \quad t=l+2,\ldots ,N \end{aligned}$$
(35)

In this case, the lower bound on the total weighted completion time of the unscheduled groups (\(Z^{{\mathcal{S}}'}\)) for each sub-problem can be calculated by formula (36):

$$\begin{aligned} Z^{{\mathcal{S}}'}=\sum \limits _{t=l+1}^N \left( {\sum \limits _{j=1}^{n_{g_{(t)} } } \left( {\omega _{g_{(t)} j\left( {i+1} \right) } -\omega _{g_{(t)} ji} } \right) } \right) Ct_{g_{(t)} } \end{aligned}$$
(36)

Then, the lower bound of each SP\(i\) (\(i=1\) through \(m\)) associated with a node is obtained by adding \(Z^{\mathcal{S}}\), \(Z^{{\mathcal{S}}'}\), and the constant value of the sub-problem, as given by formula (37):

$$\begin{aligned} Z^{\mathcal{S}}+Z^{{\mathcal{S}}'}-constant_i&= \sum \limits _{t=1}^l \left( {\sum \limits _{j=1}^{n_{g_{(t)} } } \left( {\omega _{g_{(t)} j\left( {i+1} \right) } -\omega _{g_{(t)} ji} } \right) } \right) Ct_{g_{(t)} } +\sum \limits _{t=l+1}^N \left( {\sum \limits _{j=1}^{n_{g_{(t)} } } \left( {\omega _{g_{(t)} j\left( {i+1} \right) } -\omega _{g_{(t)} ji} } \right) } \right) Ct_{g_{(t)} } \nonumber \\&\quad -\sum \limits _{g=1}^N \sum \limits _{r=1}^{n_g } \sum \limits _{t=r+1}^{n_g } \left( {\omega _{gj_{\left( r \right) } \left( {i+1} \right) } -\omega _{gj_{\left( r \right) } i} } \right) p_{gj_{\left( t \right) } i} -\alpha _i \end{aligned}$$
(37)

A node is fathomed if its lower bound is greater than or equal to the current upper bound, or if it is at level \(N-2\) of the B&B tree (at this level the sequence of all groups is determined). Whenever the algorithm reaches a node at level \(N-2\) whose objective function value is lower than the current upper bound, the upper bound is updated. The best-first search policy is used for traversing the B&B tree.
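The bound on the unscheduled groups can be sketched as follows, assuming `weight[g]` \(=\sum _{j=1}^{n_g } (\omega _{gj(i+1)} -\omega _{gji} )\) and `agg_p[g]` equal to the aggregated processing time of formula (33); names and data structures are illustrative.

```python
def unscheduled_bound(groups, weight, agg_p, r):
    """Lower bound Z^{S'}: WSPT over aggregated groups released at time r."""
    order = sorted(groups, key=lambda g: weight[g] / agg_p[g], reverse=True)
    t, lb = r, 0.0
    for g in order:
        t += agg_p[g]        # completion time Ct_g, as in (34)-(35)
        lb += weight[g] * t  # weighted completion time term of (36)
    return lb
```

Adding this value to \(Z^{\mathcal{S}}\) and the sub-problem constant, as in (37), gives the lower bound used to fathom nodes.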

To illustrate how the B&B algorithm is implemented, an example for \(P2|s_{hg} |\sum w_j C_j \) with four jobs is presented in Appendix 2.

5.3 Branching method for branch and price algorithm

After the RLMP-A is solved optimally, the values of \(\lambda _i^k \) are not guaranteed to be integer (0 or 1). In this case, branching is performed in order to improve the lower bound. Applying a standard B&B procedure to the RLMP-A with its existing columns does not guarantee an optimal (or even feasible) solution (Barnhart et al. 1998). Barnhart et al. (1994) and Desrosiers et al. (1995) suggest branching on the decision variables of the original problem. In this research, the decision variables related to the sequence of groups, \(X_{hgi} \), are selected for branching. If \(X_{hgi} \) is selected, then in one of the nodes generated at the next level of the B&P tree, group \(h\) is processed immediately before group \(g\) (on one of the parallel machines at stage \(i\)), and in the other node, group \(g\) is not processed immediately after group \(h\) (the two groups are processed on different machines, or on the same machine but not consecutively).

To find the best decision variable for branching, a branching index is calculated for each pair of groups \(h\) and \(g\) and each stage \(i\). The index equals the sum of the \(\lambda _i^k \) values whose corresponding schedules process group \(h\) immediately before group \(g\), and it varies between 0 and 1. An index close to 0 means that in most schedules of stage \(i\), group \(h\) is not processed immediately before group \(g\); an index close to 1 means that in most schedules it is. The decision variable whose branching index is closest to 0.5 is selected for branching.
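A sketch of this selection rule, assuming `lam[(i, k)]` holds the LP value of \(\lambda _i^k \) and `precedes[(i, k)]` the set of immediately adjacent group pairs \((h,g)\) in schedule \(b_i^k \) (both names are illustrative assumptions):

```python
def pick_branching_variable(lam, precedes):
    """Return the (h, g, i) whose branching index is closest to 0.5."""
    index = {}
    for (i, k), value in lam.items():
        for (h, g) in precedes[i, k]:
            # accumulate lambda over schedules where h immediately precedes g
            index[(h, g, i)] = index.get((h, g, i), 0.0) + value
    return min(index, key=lambda key: abs(index[key] - 0.5))
```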

5.4 Branch and price algorithm stopping criteria

The branching process continues until every node either yields an integer solution, is infeasible, or is fathomed. Since this process can require a considerable amount of time, especially for large size problems, a time limit of 15 min is imposed: the B&P algorithm terminates either when it solves the problem or when the time limit is reached. If the B&P algorithm solves the problem, it provides a lower bound for the original problem. If not all nodes are solved within the time limit, the B&P tree is traversed in breadth-first order, and the lower bound reported for the original problem is the minimum value over the nodes that are solved optimally but whose child nodes have not yet been solved optimally. A simple flowchart of the B&P algorithm is presented in Appendix 1 as Fig. 4.

6 Computational results

Salmasi et al. (2010) generate sets of test problems for the \(Fm|fmls,s_{hgi} ,prmu|\sum C_j \) problem with two, three, and six stages. These test problems are generalized for the present research problem: to use them for the FFSDGSP, motivated by Logendran et al. (2006a), the number of machines at each stage is chosen randomly between one and three. The specifications of the test problems are as follows:

  • The number of groups is between 2 and 16, in three categories: small (problems with 2 to 5 groups), medium (problems with 6 to 10 groups), and large (problems with 11 to 16 groups).

  • The number of jobs in each group is between 2 and 10.

  • The ratio of the setup times of groups at consecutive stages has three levels: the setup time of a group at a stage can be less than, equal to, or greater than its setup time at the next stage.

These three specifications generate a test problem. Each test problem is then solved with each of the three crossovers as an algorithm factor, combined with one of the two local search procedures. Hence, each experimental unit of the first three factors (which generate a test problem) is split into six parts, one for each combination of crossover and local search procedure. Based on Salmasi and Logendran (2008), the split plot design is the most appropriate model to compare the results. The model is a mixed model, since it includes fixed factors (groups, jobs, ratio of setup times, algorithms, and local search procedures) as well as a random factor (problem instances). The model of the experiment for a three stage problem can be represented as:

$$\begin{aligned} Y_{ijklmnr} =\mu +G_i +J_j +R1_k +R2_l +T_{t(ijkl)} +\alpha _m +L_n \hbox {+}\varepsilon _{ijklmnr} \end{aligned}$$
(38)

where \(\mu \) is the overall mean, \(G_{i}\) is the effect of the group factor (\(i\) = 1, 2, 3), \(J_{j}\) is the effect of the job factor (\(j\) = 1, 2, 3), \(R1_{k}\) is the ratio of setup times of \(M_{1}\)/\(M_{2}\) (\(k\) = 1, 2, 3), \(R2_{l}\) is the ratio of setup times of \(M_{2}\)/\(M_{3}\) (\(l\) = 1, 2, 3), \(T_{t}\) is the block factor (a random factor), \(t=1,\ldots ,162\) (162 is the number of test problems), \(\alpha _{m}\) is the effect of the algorithm factor (\(m\) = 1, 2, 3), \(L_{n}\) is the effect of the local search procedure factor (\(n\) = 1, 2), and \(\varepsilon _{ijklmnr}\) is the error term.

The goals of performing the experimental design are:

  • Find the algorithm with the best performance.

  • Identify if there is a statistically significant difference between the performances of local search procedures.

The hypothesis tests to investigate for these goals are:

$$\begin{aligned} \begin{array}{l} \left\{ {\begin{array}{l} {\hbox {H}_0 : \alpha _1 =\alpha _2 =\alpha _3 } \\ {\hbox {H}_1 : \hbox {at least one of the } \alpha \hbox {'s differs from the others}} \\ \end{array} }} \right. \\ \left\{ {\begin{array}{l} \hbox {H}_0 : L_1 =L_2 \\ \hbox {H}_1 : L_1 \ne L_2 \\ \end{array}} \right. \\ \end{array} \end{aligned}$$

Microsoft Visual C++ 2008 is used to code the different versions of the MA. The proposed B&P algorithm is also coded in Visual C++ 2008 using ILOG CPLEX (version 12.1) Concert Technology. The test problems are run on a laptop with a 2.1 GHz CPU and 2 GB of RAM. The experimental design is analyzed with the Statistical Analysis System (SAS), Release 9.1, to find the best proposed MA. A significance level of 5 % is used in this experiment, and each problem is solved for 30 s by each algorithm.

The results of the experiment are presented in Appendix 3 for two, three, and six stage problems, separately. The results show that there is no significant difference among the algorithms (the p-values of A (Algorithm) are 0.9431, 0.9728, and 0.9422 for two, three, and six stage problems, respectively), but there is a significant difference between the local search procedures (the p-values of L (local search procedures) are less than 0.0001 for two, three, and six stage problems in Tables 15, 16 and 17). The averages of the objective function values obtained with the SIM and PR local search procedures are presented in Table 1.

Table 1 The average of the objective function values for the test problems by using different local search procedures

Since the SIM local search provides solutions with the lower average, it is considered the better procedure. Since there is no significant difference among the algorithms, the averages of the objective function values of the algorithms with SIM as the local search procedure are calculated as presented in Table 2.

Table 2 The average of the objective function values by using different algorithms

To compare the algorithms based on the time required to reach their best solution, the time spent to find the best solution in the test problems with SIM as the local search procedure is reported in Table 3 and Fig. 1. As shown in Table 3, the average time to find the best solution with the OX crossover and SIM local search is lower than for the other combinations. The details of this comparison are presented in Fig. 1, which shows the distribution of the time required to find the best solution within the 30-second time limit applied to each problem.

Table 3 The average time of finding the best solution by using different algorithms
Fig. 1 Runtime percentage histogram

To evaluate the performance of the lower bounding method as well as the proposed MA (the MA with OX crossover and SIM local search), all test problems are solved by these algorithms. The percentage gap of the MA for each test problem is calculated as:

$$\begin{aligned} {(\hbox {The MA solution}-\hbox {The lower bound solution})}/{(\hbox {The lower bound solution})} \end{aligned}$$

The average percentage gap of the MA is presented in Table 4. The average percentage gap over all test problems is 6.03 %, which shows that both the upper and lower bounding methods are efficient. The average percentage gaps of the MA by group size are presented separately for two, three, and six-stage problems in Tables 5, 6 and 7.

Table 4 The average percentage gap for the MA from the lower bound
Table 5 The average percentage gap for 2-stage problems
Table 6 The average percentage gap for 3-stage problems
Table 7 The average percentage gap for 6-stage problems

The distributions of the percentage gap for two, three, and six-stage problems are also presented in Tables 8, 9 and 10.

Table 8 The distribution of percentage gap for 2-stage problems
Table 9 The distribution of percentage gap for 3-stage problems
Table 10 The distribution of percentage gap for 6-stage problems

To speed up the convergence of the column generation approach, artificial variables are added to the master problem. This restricts the dual space and leads to a lower bound for the LP relaxation. The average percentage gap of the MA shows that the lower bounding method is efficient and that the effect of the artificial variables is not considerable.

The proposed mathematical model is used to solve small size problems optimally. The number of small size problems that are solved optimally, the largest solved problem, the average percentage error of the MA, and the average percentage error of the lower bounding method using the B&P algorithm are presented in Table 11.

Table 11 The comparison among the optimal solution, the memetic algorithm, and the lower bound for small size problems

The quality of the solutions provided by the proposed lower bound is compared with a standard lower bound obtained by CPLEX. In total, 36 random test problems (12 in each of the 2, 3, and 6-stage categories) are solved by CPLEX, and the percentage gaps after 1 h are reported. The average percentage gap of CPLEX is compared with that of the proposed lower bounding method in Table 12. The results show that the average percentage gap of the proposed B&P is significantly lower than that of the lower bounds provided by CPLEX, so the proposed method outperforms CPLEX.

Table 12 The comparison among the average percentage gap of B&P and CPLEX

There is no other research available on the proposed research problem in the literature, so the proposed methods cannot be compared with the results of other research. However, Salmasi et al. (2010) develop several upper bounding methods and a lower bounding method based on the B&P algorithm for the \(Fm|fmls,s_{hgi} ,prmu|\sum C_j \) problem. Their problem is much easier than the research problem in this paper, since the flexible flowshop problem is a generalization of the flowshop problem. Their algorithms are run on a Power Edge 2650 with a 2.4 GHz Xeon processor and 4 GB of RAM. With 4 h as their stopping criterion, they solve only 43 out of 270 test problems; the average percentage gap of their proposed algorithms (the ACO as the upper bound and the B&P algorithm as the lower bound) for these 43 test problems is 14.76 %. In contrast, all of these test problems are solved for the flexible flowshop problem by the methods proposed in this research within a 15 min time limit, reaching an average percentage gap of 6.03 %. The superiority of the proposed algorithm is mainly due to more efficient algorithms for solving the sub-problems and a more balanced branching rule.

7 Conclusions and suggestions for future research

In this research, a mathematical model, a metaheuristic algorithm, and a lower bounding method based on the B&P algorithm are proposed for the FFSDGSP. Experiments are performed on small, medium, and large test problems with two, three, and six stages. The performance of the MA is compared with the result of the lower bounding method; the average percentage gap of the MA is 6.03 % over the randomly generated test problems. To reduce the search space, and hence the run time of the metaheuristic algorithm, it is assumed in the MA that the sequence of jobs within a group does not change across stages. The results show that this assumption speeds up the algorithms without a significant effect on solution quality.

Recognizing the industrial relevance of FFSDGSP, further research can be performed by considering other optimization criteria such as minimization of total tardiness and earliness. Other exact, heuristic and lower bounding algorithms can be developed for the proposed research problem.