Introduction

To remain competitive, companies must focus on a continuous improvement of their processes which must meet an increased customers needs of quality, cost and time of availability. This improvement is supported by a quick development of new information and communication technologies. In this context, production scheduling and preventive maintenance are important mechanisms that affect performances of an industrial company. These two functions, generally operate on the same resources which may cause some conflicts between their respective objectives. However, in most real workshops their activities are independents and resources immobilization for maintenance tasks is rarely planned so that it does not interfere with productivity. In addition, cost incurred by a failure (involving a corrective maintenance, unscheduled stop of production, significant delivery delays, etc.) can be much higher than a scheduled stop of production for a preventive maintenance. Therefore, we are facing a joint scheduling problem of production and preventive maintenance tasks, with objectives of respecting, firstly, constraints of delay, cost and quality, and secondly, constraints of functioning security that ensure sustainability of production tools.

In this paper, we are interested in joint production and maintenance scheduling in permutation flow shop workshops (Blazewicz et al. 1996; Malakooti 2013). The complexity studies have shown that usually the flow shop scheduling problems lie in the NP-hard problem class (Rudek and Rudek 2013). Difficulty especially rises as the number of involved jobs or machines increases. In such a situation, despite the spectacular evolutions in computers calculation power, techniques that give the exact solution become insufficient. Thus, several methods were developed to deal with this issue in case of production scheduling (Gen and Lin 2014; Yagmahan and Yenisey 2009; Yenisey and Yagmahan 2014). Among these methods, our study provides an implementation of Ants Colony Optimisation (ACO) (Dorigo 1992), Genetic Algorithms (GA) (Goldberg 1989), Tabu Search (TS) (Glover 1986) and two hybrid GA an ACO approaches. Choice of these methods is not arbitrary but based on the fact that they allow us to obtain solutions as close as possible to the optimal one with a reasonable processing time. In case of only production flow shop scheduling, performances of these methods were demonstrated in works of : Alaykyran et al. (2007), Marimuthu et al. (2009) for Ants Colony Optimisation, Chiou et al. (2012), Ventura and Yoon (2013) for Genetic Algorithms and Bozejko et al. (2013), Grabowski and Wodecki (2004) for Tabu Search. We propose two different strategies to implement these methods for joint production and maintenance scheduling : sequential and integrated. The main difference between these two strategies is the way in which maintenance tasks will be processed during the joint scheduling construction.

The joint scheduling generated by these methods and strategies may deviate, at any time, from the desired theoretical performances. This deviation occurs when the joint scheduling faces disturbances due to various causes (machines failures, operators overloading, project objectives changing, etc.). Therefore, we must still search through all possible joint scheduling, the most robust ones, i.e. the most resisting scheduling to uncertainties which represent an important problem in real workshops. An approach to address this issue is to keep good performances even in presence of disturbances in workshops instead of trying to achieve exceptional performances in a deterministic status. This solution examines reaction of scheduling to disturbances after or during their generations.

Thus, the main objective of our work is to study robustness of joint production and maintenance scheduling in permutation flow shop workshops and to demonstrate that performances loss due to insertion of maintenance tasks results in a gain of joint scheduling robustness.

Given their complexity, there are few works dealing with scheduling robustness problems. The cases studied in this field are mainly those of only production scheduling. Works in Cui et al. (2004), Leus and Herroelen (2007) address the problem of finding robust scheduling for a single machine with failure uncertainty. In Cardin et al. (2013), authors carry out experimentations on a complex flexible manufacturing system in order to determine whether or not flexibility of scheduling methods can absorb uncertainties. Scheduling robustness for a flexible job shop problem with random machine failures is addressed in Cardin et al. (2013), He and Sun (2013), Weckman et al. (2012). Two-stage approaches to achieve stable and robust scheduling despite uncertain events are proposed in Alfieri et al. (2012), Huang et al. (2014), Mirabi et al. (2013), Rahmani and Heydari (2014). Authors in Ghezail et al. (2010), Rasconi et al. (2010) introduce different methodologies to perform a comparative evaluation of different approaches used to deal with the problem of scheduling with uncertainty. It is important to note that one of the original aspects of our contribution is that all of these works consider robustness of only production scheduling and did not study the impact of maintenance on this robustness.

The rest of this paper is organized as follows. “Literature review” section provides a focused literature review on joint production and maintenance scheduling. “Study background” section gives a background to our study describing the used data, functions, measures and their denotations. “Joint scheduling strategies” section presents the two joint production and maintenance scheduling strategies used in our study to know the sequential and the integrated strategy with the adaptation of the scheduling generating methods to each strategy. “Scheduling robustness” section describes our approach to study the scheduling robustness by presenting the considered disturbances and the procedures of applying these disturbances on both production and joint scheduling.“Experimental results” section provides experimental studies showing the merits of our approach. “Comparative Study” section presents a comparative study between our approach and other joint scheduling strategies. “Conclusion” section concludes this paper.

Literature review

In the literature, different ways of understanding and characterizing the problem of production and maintenance scheduling are proposed. Nevertheless, most works have opted for an integrated approach to solve a static problem (Ben Ali et al. 2011; Fitouhi and Nourelfath 2014; Wang and Liu 2013; Cui et al. 2014; Hadidi et al. 2011; Najid et al. 2011). In other words, maintenance and production tasks are considered simultaneously and the characteristics (availability date, operating time, etc.) of these tasks are known a priori. Some authors propose to carry out the maintenance tasks during the programmed stops of machines for other activities (quality control for example) (Ben Ali et al. 2011; Kovács et al. 2011; Fitouhi and Nourelfath 2014; Gustavsson et al. 2014; Wong et al. 2013; Roux et al. 2013; Najid et al. 2011; Tsai et al. 2001).

Other works focus on the planning level and propose a scheduling of maintenance and production operations, without necessarily worrying about operational conflicts that may arise (Buzacott and Shanthikumar 1993; Boukas et al. 1995; Rahim and Ben-Daya 2012; Nourelfath and Châtelet 2012; Wang and Liu 2013). On the other hand, authors in Moradi et al. (2011), Ben Ali et al. (2011), Kovács et al. (2011), Luo et al. (2011), Xia et al. (2015), Weinstein and H (1999), Lee and Chen (2000) deal with scheduling problems by considering all constraints to optimize a given criterion, regardless of the decision-making level.

Many studies in the literature concern single-machine problems that can form basis of complex scheduling (Wang and Liu 2013; Cui et al. 2014; Leus and Herroelen 2007; Mehta 1999; Shrouf et al. 2014; Tambe and Kulkarni 2015). Other works were interested in the scheduling problems of parallel machines that can be identical or not (Joo and Kim 2015; Lee and Chen 2000; Nourelfath and Châtelet 2012; Behnamian and Ghomi 2013; Berrichi and Yalaoui 2013; Mensendiek et al. 2015). The complexity of parallel machines scheduling prompted the majority of researchers to propose relaxed simulation of the generalized problems or some heuristics to solve them (James and Almada-Lobo 2011; Edis et al. 2013; Beraldi et al. 2008). Since the author in Johnson (1954) proposed a resolution algorithm of the flow shop model, many works considered this model (Sun et al. 2011; Ribas et al. 2010; Pan and Ruiz 2013; Neufeld et al. 2016; Yenisey and Yagmahan 2014). A part of these works has been restricted to the case of two machines given the difficulty of the general flow shop problem (Yang and Wang 2011; Espinouse et al. 2001; Zhao and Tang 2011; Mirabi et al. 2013; Huang et al. 2014). Furthermore, only few works are devoted to this model in presence of maintenance tasks, even if it has many applications in several industrial fields (Sun et al. 2011; Yenisey and Yagmahan 2014; Neufeld et al. 2016).

In the literature, many works focus on production criteria even in cases where the periods of unavailability are due to maintenance interventions (Harjunkoski et al. 2014; Moon et al. 2013; Joo and Kim 2015; Branke et al. 2016). This gap justified the orientations to give to our study concerning the consideration of maintenance when solving the production scheduling problem and the resolution strategy.

First, we consider the preventive maintenance regarding its predefined nature (Roux et al. 2013; Ruiz et al. 2007; Tsai et al. 2001; Wang and Liu 2013; Wong et al. 2013; Fitouhi and Nourelfath 2014; Gustavsson et al. 2014; Moradi et al. 2011; Naderi et al. 2011). Unlike the randomness of corrective maintenance, which acts to repair the failures, preventive maintenance results in planned interventions or periodic inspections aimed at avoiding the failures. In this case, finding the production scheduling is correlated with the resolution of the maintenance scheduling, which justify the need of a close cooperation between the departments responsible of these two functions. The literature mainly proposes two ways to choose periods of preventive maintenance: predetermined fixed periods or random periods generated during scheduling (Gertsbakh 2013; Xiong et al. 2013). For our case study, we consider predetermined periods of preventive maintenance. Nevertheless, we add tolerance intervals around each maintenance period. During these intervals, the maintenance cost is considered low, which gives more flexibility to maintenance planning.

Second, we chose a joint policy of production and maintenance scheduling. This policy allows companies to have better control of production and maintenance costs. An example of these costs are shortfalls generated by the production downtime during failures or maintenance periods (Wong et al. 2013; Hadidi et al. 2011; Rahim and Ben-Daya 2012; Tambe and Kulkarni 2015; Beraldi et al. 2008; Najid et al. 2011). Hence, this policy must be the result of a joint consultation between the two respective departments responsible of production and maintenance functions. Nevertheless, since production remains the predominant function in manufacturing systems, it will be a strong constraint for maintenance planning. Mainly, because priorities management related to the machines on which maintenance tasks must run. One of the main methods proposed to address this constraint is to take advantage of machines idle time to schedule the maximum tasks of maintenance and minimize production downtime (Naderi et al. 2011; Ruiz et al. 2007).

Third, we consider some scheduling strategies to manage operational conflicts that may arise when integrating maintenance into production scheduling. Since the functions of production and maintenance act on the same resources, namely machines, three main scheduling strategies of these two functions were identified. These strategies aim to resolve most effectively structural conflicts that may exist between the activities of production and maintenance (Campbell and Reyes-Picknell 2015; Lee and Chen 2000). The first strategy deals with an independent scheduling of these activities resulting in additional costs and delays in system operation (Tsai et al. 2001; Shahidehpour and Marwali 2012). The second strategy is to schedule jointly and sequentially the tasks of production and maintenance (Wong et al. 2013; Espinouse et al. 2001; Sloan 2004). The third strategy, called integrated, is to create a joint and simultaneous scheduling of maintenance and production tasks (Hadidi et al. 2011; Ben Ali et al. 2011; Najid et al. 2011). In our study, we opted for the two last strategies since they limit the risk of interference between production and maintenance activities and thus optimize the scheduling quality.

Study background

In this section we first describe data used in our work and their denotations, to know data of the permutation flow shop workshops and those relating to the systematic preventive maintenance as shown in Fig. 1. Then, we present the objective function used in our study.

Fig. 1
figure 1

Temporal characteristics of production and maintenance tasks

Joint scheduling data

In this study we are interested in permutation flow shop workshops which are frequently encountered in practice. Let m be the number of machines in the workshop and n the number of jobs to be processed. A job is defined as a sequence of m elementary production tasks following the order of machines. We denote by \(j (1 \le j \le m)\) a machine, \(i (1 \le i \le n)\) a job and P(ij) a production task of a job i processed on a machine j. It is also noted for each job i:

  • \(r_i\) : release date (the arrival date of the job i in the workshop).

  • \(d_i\) : due date (the delivery date of the job i).

  • \(t_{ij}\) : starting date on a machine j.

  • \(c_{ij}\) : completion date on a machine j.

  • \(p_{ij}\) : processing time on a machine j.

  • \(Cmax_i\) : \(max(c_{ij})\), the completion date of the last task of the job i.

In a permutation flow shop workshop, we consider that each machine can be subjected to one or more systematic preventive maintenance tasks which are periodic interventions. Each preventive maintenance task is characterized by some parameters predetermined by the maintenance department or by the manufacturer of the involved equipments. In practice some positive or negative deviations are tolerated about the ideal maintenance period. These deviations are supported by some tolerance intervals around each maintenance period (Benbouzid et al. 2003). In our study, the cost of a maintenance task if it is advanced or delayed within these intervals is acceptable. These intervals denoted give more flexibility to the maintenance planning if necessary and they represent a compromise between the maintenance cost and the risk of machine availability loss. In the following we denote by :

  • \(M_j\) : the maintenance task associated with the machine j. A machine may be subjected to several interventions (occurrences of \(M_j\))

  • \(T^*_j\) : the optimum periodicity of the maintenance task \(M_j\).

  • \(T_{minj}\) : the minimum time between two successive maintenance tasks \(M_j\) on a machine j.

  • \(T_{maxj}\) : the maximum time between two successive maintenance tasks \(M_j\) on a machine j.

  • \(p'_j\) : the processing time of the maintenance task \(M_j\). It is assumed to be known and constant.

The following notations are related to the kth occurrence of the maintenance task \(M_j\) :

  • \(t'_{jk}\) : the start date of the kth occurrence of \(M_j\).

  • \(c'_{jk}\) : the completion date of the kth occurrence of \(M_j\).

  • And the tolerance interval of the kth occurrence of the maintenance task \(M_j\)\([T_{minjk}, T_{maxjk}]\) is determined as follows :

  • \(T_{minjk} = t'_{jk-1} + p'_j + T_{minj}\) and \(T_{maxjk} = t'_{jk-1} + p'_j - T_{maxj}\).

Objective function

The objective function to optimize is a compromise between goals we want to achieve for a joint scheduling of production and maintenance tasks. Several constraints imposed by customers to their suppliers often imply the minimization of the manufacturing total time or makespan \(C_{max}\).

The respect of customers constraints imposes a proper functioning of suppliers production system which requires a planned maintenance periods. We denote by \(f_1\) the evaluation function of this maintenance planning expressed by:

$$\begin{aligned} f_1 = \sum _{j=1}^m \sum _{k=1}^{k_j} E'_{jk} + L'_{jk} \end{aligned}$$
(1)

where : \(k_j\) is the number of effective occurrences of the maintenance task \(M_j\).

\(E'_{jk}\) : the advance of the kth occurrence of \(M_j\).

$$\begin{aligned} E'_{jk} = max(0, (t'_{jk-1} + p'_{jk} + T_{minjk}) - t'_{jk}) \end{aligned}$$

\(L'_{jk}\) : the delay of the kth occurrence of \(M_j\).

$$\begin{aligned} L'_{jk} = max (0, t'_{jk} - (t'_{jk-1} + p'_{jk} - T_{maxjk}) \end{aligned}$$

To optimize the two evaluation function \(C_{max}\) and \(f_1\) we consider the following objective function:

$$\begin{aligned} f = \alpha C_{max} + \beta f_1 \end{aligned}$$
(2)

\(\alpha \) and \(\beta \) are parameters that measure the respective contributions of production and maintenance in the global objective function. In case of only production scheduling, the objective function is obtained by putting the coefficient \(\beta \) equal to 0. The objective function is first used to generate initial joint scheduling using methods and strategies presented in “Joint scheduling strategies” Section. Then, it is used to evaluate performances deviation of these scheduling after application of disturbances according to the procedure presented in “Scheduling robustness” section.

Joint scheduling strategies

The activities of production and maintenance appear as antagonistic. Indeed, if the production has the role of operating a system to achieve certain objectives, maintenance often requires system stops to maintain it. However, the connection between these two functions is necessary in order to achieve an optimal system functioning. The objective is to respect the best equipments maintenance frequency with a minimal damage to the production plan. To do this, we will consider two options: the sequential strategy supported by two separate services and the integrated strategy supported by a single production and maintenance service. These strategies use methods of solving permutation flow shop workshops scheduling that are based on techniques of combinatorial optimization.

Traditionally, the field of combinatorial optimization has preferred the exact methods to the detriment of approximate methods. The use of accurate methods allows benefiting from strong theoretical results accompanying notions of convergence and global optimality. Thus, several classical problems (reasonably sized) can be solved exactly, in a reasonable period of time. The exact methods generally involve enumerating all solutions of the search space in order to select the best one. In addition, to improve the enumeration of solutions, exact methods have techniques to detect the earliest possible failures and specific heuristics to guide choices. However, the computation time of such methods is exponential which increases as a function of the problem size. Therefore, these methods are only usable on problems where the number of combinations is small enough to explore the solutions space in a reasonable time. As exact methods applied to production scheduling problem, we can cite linear and dynamic programming (Johnson and Montgomery 1974) and Branch and Bound algorithm (Park and Kim 2000).

Recently, there is growing interest for the ability of the approximate methods, particularly metaheuristics, to provide high quality solutions for a reasonable computation time. The loss of optimality with these methods is compensated by the decrease of calculation time and therefore an increased responsiveness. Metaheuristics also benefit from other significant advantages such as their quick adaptation to structural changes of the problem. All metaheuristics are based on a balance between intensifying research and diversifying it. On the one hand, intensification allows searching for higher quality solutions based on already found ones and on the other hand, diversification implements strategies to explore a larger solutions space and escape from local minima. We present in the following an adaptation of some joint scheduling resolution metaheuristics, that provide this balance, to the sequential and integrated strategies. Before that, we give the solution coding that we used to represent joint scheduling.

Solution coding

A joint scheduling solution describes the executing sequence of production and maintenance tasks on all machines. Therefore, each solution is encoded by two components. The first component is a sequence S which represents order of production tasks execution coding as vector of size n (number of tasks) which elements represent tasks numbers. The second component is a matrix M that represents the insertion locations of maintenance tasks. An element M[jk] of the matrix M represents the insertion location of the kth maintenance task on the jth machine according to the sequence S. For example, \(M [1, 2] = 5\) means that the second maintenance task of the first machine is inserted at the 5th position in the sequence S.

Sequential strategy

We give in the following settings and algorithms of the approaches that we used in the first phase of the sequential strategy to know the production scheduling generating. Then, we present the second phase of this strategy which concerns the heuristic of maintenance tasks insertion.

Ant colony optimization

Ant colony optimization (ACO) is a metaheuristic proposed by in Dorigo (1992) for solving combinatorial problems. It has proved its efficiency in many optimization problems, especially in production scheduling (Alaykyran et al. 2007; Marimuthu et al. 2009). In the following we show our ACO implementation for generating a joint scheduling with the sequential strategy in a permutation flow shop workshop.

  • (1) Pheromone information initialization we represent pheromone information (called also intensity) in a general matrix \(\tau \) of dimension \(n \times n\) with n the number of tasks. A value of an element \(\tau [i, r]\), such that i represents the source task and r the destination task is initialized by a real value equal to 0.1.

  • (2) Heuristic information initialization this information, called also visibility, is represented by a global matrix \(\eta \) of dimension \(n\times n\), such that n is the number of tasks. An element \(\eta [i, r]\) represents quality of a task r for an ant positioned on a task i. Elements of the matrix \(\eta \) are initialized by values related to the processing times of the tasks belonging to the neighbourhood of an ant a using the following formula:

$$\begin{aligned} \eta _{ir} = 1 - \frac{\sum _j p_{rj}}{\sum _{l\in N_i^a}\sum _j p_{lj}} \end{aligned}$$
(3)

where:

  • i : current task;

  • \(p_{rj}\) : processing time of a task r on a machine j

  • l : tasks belonging to the neighbourhood \(N_i^a\) of the ant a;

  • (3) Ants positioning each ant is positioned on a randomly chosen task which will be inserted into a Tabu list belonging to this ant. Tabu list of an ant is a vector of a maximum size equal to the number of tasks in the problem.

  • (4) Solution construction (production sequence) each ant a selects the next task to be scheduled by calculating a transition probability at a time t given by:

$$\begin{aligned} \Pi _{ir}^a = \frac{\left[ \tau _{ir}(t) \right] ^\alpha \left[ \eta _{ir}(t) \right] ^\beta }{\sum _{l\in N_i^a \left[ \tau _{ir}(t) \right] ^\alpha \left[ \eta _{ir}(t) \right] ^\beta }} \end{aligned}$$
(4)

where: \(N_i^a\) : set of neighbours of task i not yet visited by ant a; \(\alpha \) and \(\beta \) are two parameters controlling the relative intensity \(\tau _{ir}\) and the visibility \(\eta _{ir}\) of the track (ir).

Then, the selected task in the previous step is inserted in the Tabu list of the ant a. An ant moving from a task i to a task r reduces the value of the pheromone information value \(\tau _{ir}\) on the track (ir). This update is performed in order to make less attractive the visited task to further research. The new value of \(\tau _{ir}\) is calculated as following :

$$\begin{aligned} \tau _{ir}(t + 1) = (1 - \rho ) \tau _{ir}(t) + \rho \tau _0 \end{aligned}$$
(5)

where:

  • \(\rho \) : the evaporation rate of pheromone with \(\rho \in \left[ 0, 1 \right] \);

  • \(\tau _0\) : the initial value of the pheromone on the track (ir). If there are still tasks to be scheduled by the ant a, then the construction process is repeated. The production sequence obtained by the ant a is the sequence contained in its Tabu list.

  • (5) The best solution updating once each ant has constructed a solution (production sequence), an update is performed to the circuit of the ant that generated the best solution. This update concerns the matrix \(\tau \) (pheromone information) and it is made according to the following formula:

    $$\begin{aligned} \tau _{ir}(t + 1) = (1 - \alpha ) \tau _{ir}(t) + \alpha \Delta \tau _{ir}(t) \end{aligned}$$
    (6)

    where : \(\Delta \tau _{ir}(t) = 1 / Objective function\).

  • (6) Solution improvement a kind of hybridization of the ACO algorithm and the local search heuristic (Michiels et al. 2010) is used in order to improve generated solutions and then accelerate the convergence process. In our ACO implementation, three levels of improvements are used (presented in Algorithm 1).

  • (7) Stop criterion The ant colony is restarted from step (3) while the fixed number of generations is not reached. We give in the following the general algorithm (Algorithm 1) that illustrates the different stages of our ACO implementation.

figure a

Genetic algorithm

Genetic algorithm (GA) (Goldberg 1989) belongs to the class of evolutionary algorithms. These algorithms generate solutions to optimization problems using techniques inspired by natural evolution. GA has already demonstrated its performances to solve production scheduling problems (Chiou et al. 2012; Ventura and Yoon 2013). We present here steps of our GA implementation for a joint production and maintenance scheduling problem using sequential strategy.

  • Initial population generating initial population is formed by a set of randomly generated individuals representing production sequences.

  • Adaptation function as dealing with a problem that aims a minimization of the objective function, we chose the following adaptation value : \(1 / (1 {+} objective function)\). The objective function is added to 1 to avoid division by zero.

  • Selection we used the roulette wheel method that provides a selection of the best individuals by rank and tournament (Goldberg 1989).

  • Crossover the objective is to combine two individuals called parents to make one or two new individuals called sons. In our case the k-point crossover (Goldberg 1989) is used. Its principle is to randomly generate k points of crossover which provides \(k +1\) segments in each parent. Then to copy odd parts (respectively even parts) of the first parent in the first son (respectively the second son). The missing segments are completed by the missing tasks according to their apparition order in the second parent.

  • Mutation it consists of swapping tasks of two randomly chosen positions of an individual according to some mutation rate.

  • Replacement we implemented the N-Best strategy (Goldberg 1989) where the N (given as parameter) best individuals are chosen to form the new population.

  • Evaluation adaptation values (fitness) are used to favour the best individuals. This evaluation is only conducted on new individuals selected for replacement.

  • Stopping criterion the algorithm stops when the number of generations which is defined by the user is reached or when a stagnation of the best solution is observed.

The general functioning principle of Genetic Algorithm used with the sequential strategy is given in the following (Algorithm 2).

figure b

Tabu search

Tabu search (TS) is an iterative metaheuristic introduced by Glover (1986). It is very effective to solve a considerable number of combinatorial optimization problems, in particular production scheduling problem (Bozejko et al. 2013; Grabowski and Wodecki 2004). In the following, we present our adaptation of Tabu search used with the sequential strategy for solving the joint production and maintenance scheduling.

Initial solution and neighbourhood generating To generate an initial production sequence of size n, we first initializes it with a canonical solution \(\lbrace 0, 1, 2, \ldots , n \rbrace \). Then, some changes are performed on this solution by randomly generating two positions and switching the tasks in these positions. The same principal is used to generate neighbourhood of a production sequences used by Tabu search algorithm (Algorithm 3).

Tabu list Tabu list allows to avoid blockages in local minima by prohibiting consideration of previous configurations. The list size is maintained throughout the method evolution. If its size is reached while a good solution is presented then the oldest solution in the list is removed to insert the new one.

Stopping criterion Search stops when a certain number of iterations is reached (defined as parameter) or when no improvement is observed for some number of iterations (stagnation). Algorithm 3 presents our adaptation of Tabu search used with the sequential strategy.

figure c

Maintenance tasks insertion

The principle of maintenance tasks insertion in a production sequences obtained by one of the previous algorithms is to process all maintenance tasks of the first machine, then we move to the second and so on until the last machine according to the depth-first search algorithm (Kleinberg et al. 2006). This strategy does not bear risks of missing tasks and has the advantage that an inserted maintenance task will no longer be shifted.

Insertion of the kth occurrence of a maintenance task on a machine j is done at locations that minimize a partial objective function (a partial evaluation of the obtained joint scheduling) according to the following cases:

  • Within its tolerance interval \([T_{minj}, T_{maxj}]\). In this case, there is no delay of a maintenance task.

  • Before its \(T_{minjk}\). To avoid having a large number of maintenance tasks too close, this insertion is possible only if \(T_{minjk}\) does not exceed a margin \(\tau \times T_{minj} (0<\tau <1) : T_{minjk} - t'_{jk} \le \tau \times T_{minj}\).

  • After its \(T_{maxjk}\) (delayed task). To ensure that no maintenance task will be missed, this insertion is possible only if its delay does not exceed a margin \(\tau \times T_{maxj} (0<\tau <1) : t'_{jk} - T_{maxjk} \le \tau \times T_{maxj}\).

Integrated strategy

The integrated strategy consists on ordering production and maintenance tasks at the same time, while optimizing the overall objective function. In the following, we present the integrated versions of the same approaches used with the sequential strategy.

Integrated ant colony optimization

Steps for generating an ACO joint scheduling using the integrated strategy are the same as those of the sequential strategy. The main modifications concern the solution construction and the best solution updating. These modifications are given in the following.

Construction of a joint solution (production and maintenance sequence) In this step, each ant builds a production sequence using the same procedure described in the sequential strategy. Then maintenance tasks are inserted using the depth-first search algorithm after each production sequence generating by each ant. The insertion of maintenance tasks at this level allows the evaluation to be done on the joint sequence and not only on the production sequence. The best solution is the joint scheduling that optimizes (minimizes) the overall objective function.

The best solution updating Once each ant in the colony built a solution (joint scheduling), an update is performed to the circuit of the ant which obtained the best solution at each generation. The considered best solution here is the one that minimizes the overall objective function given in formula (2).

Integrated genetic algorithm

The main difference between GA using the integrated strategy and GA using the sequential strategy is that, with the integrated strategy, the crossover and mutation operators consider at the same time production and maintenance tasks. In the following, we present the principle of these operators.

Crossover The final crossover operators are obtained by combining crossovers operators on production and those on maintenance. Crossover operators on production tasks are those defined for the sequential strategy and presented in “Genetic algorithm” section. The new crossover operator on maintenance is inspired by the k-points crossover operator principle. It consists on generating \(k \ (k \in [0 \ldots m])\) random points of crossover on each parent (this provides \(k + 1\) segments in each parent). Then, copying maintenance tasks located in the even parts (respectively odd parts) of the first parent into the even parts (respectively odd parts) of the first son (respectively the second son). In the same way, copying maintenance tasks located in the odd parts (respectively even parts) of the second parent into the odd parts (respectively even parts) of the first son (respectively the second son).

Mutation The mutation operator on production follows the same principle as that of the sequential strategy. At the same time, locations of maintenance tasks will be changed relatively to the new locations of the production tasks. In addition, randomly shifts to the left, or to the right, are applied on some maintenance tasks.

Integrated tabu search

In the following, we present the main changes of the integrated Tabu search algorithm compared to that used with the sequential strategy.

Initial solution generating To obtain the initial solution, Tabu search began by randomly generating a production sequence using the same principal presented in “Tabu search” section. Then, it inserts maintenance tasks in the obtained production sequence using the heuristic presented in “Maintenance tasks insertion” section.

Neighbourhood generating We present in the following two principles of movements used to generate the neighbourhood solutions :

  1. 1.

    Neighbourhood generating by maintenance tasks shifting: a maintenance task can have several possible locations in its tolerance interval. Therefore, it is useful to generate the neighbourhood of a solution using some possible locations that one or more maintenance tasks can have on one or more machines.

  2. 2.

    Neighbourhood generating by production tasks swapping: the purpose of this operation is to create new individuals by changing the execution order of production tasks groups using the same principle presented in “Tabu search” section, but keeping the right maintenance tasks location in relation with production tasks.

Hybrid methods

This trend of solutions aimed hybridization of optimization methods by making two or more of these methods working together to achieve the best solutions (Costa et al. 2015; Laha and Chakraborty 2009; Meeran and Morshed 2012). In our study, a hybridization between methods mentioned above is tried. Hybridizations are performed on the two joint scheduling strategies, the sequential and the integrated one. In the following, we propose two hybrid methods used in our study.

GA hybridization with tabu search (GAH) GA algorithm generates an initial solution for Tabu search. Then, Tabu search try to improve this solution.

ACO hybridization with tabu search (ACOH) The same precedent principle is applied, Tabu search starts with an initial solution generated by the ACO algorithm, then try to improved it.

Parameter tuning

Many parameters have to be tuned for the metaheuristics used in our study. In literature, there are two different strategies for parameter tuning : the off-line parameter initialization (or meta-optimization) and the online parameter tuning strategy (Talbi 2009). In off-line parameter initialization, the values of different parameters are fixed before the execution of the metaheuristic, while in the online approach, the parameters are updated dynamically during the execution of the metaheuristic. In our experimentations, we used the off-line tuning method “irace” (López-Ibánez et al. 2011), a publicly available implementation of Iterated F-Race proposed by Balaprakash et al. (2007) and further developed by Birattari et al. (2010). The main purpose of irace is to automatically congure optimization algorithms by nding the most appropriate settings given a set of tuning instances of an optimization problem. Irace starts by sampling a number of parameter settings of a given algorithm uniformly at random. Then, at each iteration, it selects a set of best settings using a racing procedure (Maron and Moore 1997) and Friedman’s non-parametric two-way analysis of variance by ranks (Daniel et al. 1990). This racing procedure iteratively runs the algorithm settings on a sequence of problem instances, and rejects settings as soon as there is enough statistical evidence that they perform worse than the best one. Then, the best settings are used to bias a local sampling model. The next iteration begins by selecting new settings from this model, then it applies the racing procedure to these settings with the previous best settings. This procedure continues to run until a given computational budget is exhausted. This computational budget may be an overall computation time or a number of experiments, where an experiment is the application of a configuration to an instance. In our study, we tuned each optimization algorithm (ACO, GA and TS) using a budget of 5000 experiments.

Scheduling robustness

In literature of scheduling in dynamic and stochastic environments, the robustness of a solution usually refers to a scheduling prepared for uncertainties of future unknown events (Cowling and Johansson 2002; Jensen 2001; Yellig and Mackulak 1997). The uncertainties may be from changes in the environment or from inaccurate execution of the solution itself. This is why the authors in Billaut et al. (2013), Pinedo (2012) noted that the robustness is a difficult concept to measure or even to define. Some authors prefer to use the concept of robust methods to describe a method that provides scheduling which preserves its performances in the presence of disturbances (Daniels and Kouvelis 1995; Shafaei and Brunn 1999).

The notion of disturbances here, is defined generally as any unanticipated operational event introduced during the implementation of the scheduling and which can affect its performances (Cowling and Johansson 2002; Mehta 1999; Shafaei and Brunn 1999). The disturbances considered in our study concern extensions of tasks processing time according to some classes and levels. These extensions can be justified by the need of having a better product quality or by consequences of eventual machines failures.

In literature, there are several approaches proposed to handle disturbances when solving optimization problems, a part of these approaches are cited in Bertsimas et al. (2011). However, a practical drawback of these approaches is that they are note computationally tractable for some optimization problems where the number of possible combinations is important (case of flow shop scheduling when considering a significant number of machines and tasks). In addition, these approaches generally have some cost in terms of optimality associated with robustness (Bertsimas and Sim 2004).

In the following, we give some necessary definitions for designing a model to study the robustness of joint production and maintenance scheduling in the presence of disturbances. In this model, we generate and apply disturbances on both production and joint scheduling. Our approach doesn’t handle disturbances when generating scheduling but it evaluates the robustness of the generated scheduling using algorithms presented in “Joint scheduling strategies” section. Our approach can also be adapted to study the robustness of joint scheduling generated using other metaheuristics or exact methods.

The disturbances classes represent the rate of tasks assigned to the machines subjected to disruptions (which will fail for example). Three classes intervals noted \(c_i\) (\(c_i \in \) { \([0, 40\%]\), \(]40, 70\%]\), \(]70, 100\%]\) }) are considered. The disturbances rate \(U_c\) is randomly chosen following a uniform distribution in the interval \(c_i\). \(U_c \sim U [a_i, b_i]\) with \(a_i\) and \(b_i\) are the interval bounds.

Disturbances levels are used to calculate the maximum extending processing time rate of tasks subjected to disturbances. Ten intervals noted \(l_i\) are considered. The extending rate is randomly chosen following a uniform distribution in the interval \(l_i\) (\(l_i \in \) { \([0, 10\%]\), \(]10, 20\%]\), ..., \(]90, 100\%]\) }). \(U_l \sim U[ \alpha _i, \beta _i]\) with \(\alpha _i\) and \(\beta _i\) are the interval bounds.

Assuming that there are n production tasks then the total number of tasks to disturb NT is given by the following formula:

$$\begin{aligned} NT = [n *U_c] \end{aligned}$$
(7)

where : [i] is the integer part of i.

Note that a total number of tasks to disrupt NT equal to 0 can be obtained using the latter formula. To avoid this case, NT is considered to be at least equal to 1.

On the other hand, a disturbance level is used to specify the maximum extension rate of production tasks processing times. New processing times of tasks subjected to disturbances is calculated using the following formula:

$$\begin{aligned} p_{ij}'' = p_{ij} *(1 + \xi _{ij}) \end{aligned}$$
(8)

where :

  • \(p_{ij}''\) : the new processing time of the disturbed task P(ij).

  • \(p_{ij}\) : the initial processing time before disturbing the task P(ij).

  • \(\xi _{ij}\) : the disturbance coefficient of the task P(ij). For production scheduling, before inserting maintenance tasks, \(\xi _{ij}\) is equal to \(U_l\).

For joint production and maintenance scheduling, the calculation of \(\xi _{ij}\) is more complicated. In fact, the insertion of maintenance tasks in the production scheduling may reduce the machines failures. Since the machines reliability does not last a long time, the risk of failure increase, the further one gets away from the last maintenance intervention. Thus, the evolution of the failure (disturbance) intensity in a period of time bounded by two successive maintenance operations on the same machine can be divided into two intervals. A first interval where the failure intensity is nil, which means that no disruption application is allowed since the machine is just subjected to a maintenance operation. This first interval is denoted by Rj for a given machine j. A second interval where the risk of a machine failure increases progressively as her reliability decreases. This interval is called the disturbance interval and denoted by \(D_{jk}\) for a given maintenance occurrence k on a machine j.

Within the second interval, calculation of disturbances coefficient \(\xi _{ij}\) depends on the level of disturbances (the maximum disturbance intensity to be considered) and the the distance of the production task from the last maintenance task performed on the same machine. For this reason it was assumed to divide the disturbance interval \(D_{j}\) in ten subdivisions. These subdivisions are used to realize the effect of maintenance tasks on production tasks.

If we consider these dependencies, the value of \(\xi _{ij}\) in case of a joint scheduling is given by the following formula:

$$\begin{aligned} \xi _{ij} = \chi _{ij} *U_l \end{aligned}$$
(9)

where \(\chi _{ij}\) depends on the subdivision in which the task P(ij) is located. In the following we present the steps of finding the formula used to calculate its value: First, the duration \(\rho \) between the starting date \(t_{ij}\) of the task P(ij) and the end date of the reliability interval Rj, is calculated by the following formula:

$$\begin{aligned} \rho = t_{ij} - (c'_{jk} + Rj) \end{aligned}$$

Now, if we denote by \(I_{jk}\) the time between two successive maintenance tasks on the same machine j (between the kth and the \((k+1)\)th occurrences) then the duration of the disturbance interval \(D_{jk}\) is given by the following formula : \(D_{jk} = I_{jk} - Rj\).

By performing a three-rule, the number of subdivisions x in \(\rho \) may be calculated, in order to find the subdivision in which the task P(ij) is located as follows : \(x = \rho *10 / D_{jk}\).

Finally, by substituting x in the formula \(\chi _{ij} = x / 10\), we will have:

$$\begin{aligned} \chi _{ij} = (t_{ij} - (c'_{jk} + Rj))/(I_{jk} - Rj) \end{aligned}$$

Thus, the general formula of the disturbance coefficient is given by :

$$\begin{aligned} \xi _{ij} = \frac{t_{ij} - (c'_{jk} + Rj)}{D_{jk} - Rj} *U_l \end{aligned}$$
(10)

The application of disturbances to the scheduling causes some problems due to the new coordinates of production tasks, which do not respect the nature of the workshop and the precedence constraints between tasks. In other words, overlaps, which are defined as an interlacing or a superposition of two tasks, have a great chance to appear. Two types of overlaps can be encountered : horizontal overlaps (between two tasks in succession on the same machine) and vertical overlap (between tasks of the same job).

To address this problem, we opted for the right shift approach (Vieira et al. 2003), which is a reactive approach to solve the problem of overlaps while maintaining the tasks order. This order should be maintained to make a valid comparison between performances of the initial scheduling and performances of the disturbed ones.

If the task concerned by an overlap is a maintenance task, two situations arise. If its shifting does not cause exceeding of its \(T_{max}\), then this shifting will be performed. Else it will be switched with the production task that caused the overlap. The switched production task will not be disturbed since it will be located just after the maintenance task, i.e. in the reliability interval. The starting date of the maintenance task will be chosen so that it does not come before its \(T_{min}\).

Since the calculation of the new processing times after disturbances uses a deterministic function (formula (10)), the complexity of our approach is mainly related to the right shift algorithm when overlaps occur. Right shift rescheduling means simply to delay the entire scheduling by the duration of the overlap. Since this is done by adding a fixed increment of time to each tasks of the scheduling, the right shift algorithm is clearly polynomial, since the computation only involves the addition of a time increment to each of the remaining operations. At most m*n operations are concerned by this shift.

The general principle of disruption is given in the Algorithm 4.

figure d

Experimental results

To implement the model presented in the previous sections, a sample of 20 initial scheduling is generated by each method and strategy presented in “Joint scheduling strategies” section. Each initial scheduling is disturbed according to 10 levels and 3 classes. For each class and level, 100 simulations are performed and the averages results of these 100 simulations are considered. The differences between the objective functions values of the disturbed scheduling and the initial scheduling (before application of disturbances) are considered to evaluate the performances deviations of these scheduling.

Benchmarks of Taillard (1993) and Reeves (1995) are used as production data. Maintenance data are randomly generated using a specific service developed to perform this study.

The experimental studies presented in this section try to answer the two essential questions of our study, to know : are there robust scheduling methods? Does maintenance contribute to the robustness of joint scheduling? In the following we present the experimental results according to these two questions.

Are there robust scheduling methods?

Figure 2 shows the performances deviations averages of the production scheduling calculated from the combined levels and classes of disturbances according to the used method and the workshop size. The main remark, is that methods robustness is inherent with small benchmarks for the most cases of the disturbance protocol. Moreover, the results of hybrid methods, ACO and GA are generally better than those of Tabu search.

Fig. 2
figure 2

Performances deviations averages of the production scheduling methods

Figure 3 shows the performances deviations averages of the joint scheduling obtained from the combined levels and classes of disturbances using the sequential strategy according to the initial scheduling method and the workshop size. The general trend of the curves is towards a full influence of machines and tasks number. We note again that generally, results of small benchmarks are better than those of big ones. Furthermore, in the first position we have the results of the hybrid methods which are better than those of the evolutionary methods (ACO and GA). In the last position we have results of the iterative method TS.

Fig. 3
figure 3

Performances deviations averages of the joint scheduling methods using the sequential strategy

Figure 4 shows the performances deviations averages of the joint scheduling using the integrated strategy calculated from the combined levels and classes of disturbances according to the initial scheduling methods and the workshop size. Again, the conclusion remains the same, since the robustness of scheduling methods is related to the disturbance protocol (level, class and size of the studied problems). Thus, the general trend remains towards results deterioration which increases by raising the workshop size. Although, the best results are those given by the hybrid methods followed by the evolutionary methods (ACO and GA) which give better results than the iterative method TS.

Fig. 4
figure 4

Performances deviations averages of the joint scheduling methods using the integrated strategy

Does maintenance contribute to the robustness of joint scheduling?

The objective is to demonstrate that, when disturbances occur, performances loss of joint production and maintenance scheduling is less than performances loss of only production scheduling. Figure 5 shows the performances deviations averages of the joint scheduling strategies compared to those of the production scheduling obtained from the combined levels and classes of disturbances of all scheduling methods presented in “Joint scheduling strategies” section.

The first observation concerns the impact of maintenance on robustness of scheduling. If we take any workshop (small or big), we note a clear performance stability of joint production and maintenance scheduling compared to only production scheduling. On average, \(75\%\) of disrupted production scheduling (different levels and classes) have a deviation over than \(30\%\). Another remark is that for small workshops, scheduling using the integrated strategy generally gives the best results. Also, it is important to note that the results of small workshops (sequential and integrated strategy) are better than those of the big ones because of the high density of these latter on tasks and machines.

Fig. 5
figure 5

Performances deviations averages of joint and production scheduling

As conclusion of this part, we can say that the insertion of maintenance tasks in production scheduling is certainly done at the expense of performance of these later (while remaining in very acceptable limits), but it provides a gain in terms of robustness. Indeed the contribution of maintenance is located at the reliability intervals of the maintained machines which allows the absorption of some disturbances. Furthermore, the general trend is the same for both strategies, however, the generated scheduling by the integrated strategy are more robust than those generated by the sequential strategy if we consider the small workshops.

Fig. 6
figure 6

Comparative study between joint scheduling strategies

Comparative study

This section gives a comparative study between the robustness of our joint scheduling strategies using hybridization of ACO and Tabu search approach (which gave the best results in “Experimental results” section) and other strategies used to generate production and maintenance scheduling. In the following, we denote by ACOHSEQ and ACOHINT, our sequential and integrated strategy using hybridization of ACO and Tabu search approach. This study uses the same experimental conditions presented in “Experimental results” section.

The first strategy considered in our study, called SALPT, was presented in Allaoui and Artiba (2004). To develop this strategy, the authors used the simulated annealing heuristic (Černỳ 1985), a dispatching rule that sequences the jobs in a decreasing order of the total processing times and a flexible simulation model. The second strategy of our study was presented in Ruiz et al. (2007). This strategy, called PACO, is an adaptation of the ant colony optimization algorithm to implicitly consider preventive maintenance operations in a permutation flow shop problem. The proposed preventive maintenance policy aims to maintain a minimum level of reliability after the production period. The third strategy was proposed in Wong et al. (2013) and considers multiple resources and preventive maintenance tasks in traditional production scheduling. This strategy, called joint scheduling (JS), adopts the multi-resource maintenance modelling approach where the starting times of the preventive tasks were treated as decision variables. The new problem is formulated as an integer-programming model and is solved by the genetic algorithm approach.

Figure 6 shows the performances deviations averages of the obtained joint scheduling calculated from the combined levels and classes of disturbances according to the used strategy and the workshop size. The main remark is that the robustness of joint scheduling strategies is related to the considered number of machines and tasks. Thus, the general trend of the curves is towards results deterioration which increases by raising the workshop density on tasks and machine. Although, the hybrid method using our integrated strategy (ACOHINT) gives better results than the other strategies.

Another observation concerns the impact of maintenance on robustness of scheduling. If we consider the results presented in Figs. 2 and 6, whatever the considered strategy, performance loss of disturbed joint scheduling is less than performance loss of only production scheduling.

Conclusion

This paper proposed a study of joint production and maintenance scheduling robustness in a permutation flow shop workshop using two strategies : sequential and integrated. As methods of scheduling resolution, we proposed an adaptation of Tabu Search, Genetic Algorithm, Ants Colony Optimization and some hybridizations of these methods.

Since in the real industrial workshops, the generated joint scheduling of production and maintenance tasks will be inevitably disrupted, we proposed a new approach to study the robustness of these scheduling. Our approach considers disturbances on the processing time of production tasks according to some levels and classes. Several experimental studies were conducted in this paper. Results of these experimentations show first the merits of maintenance on production scheduling robustness. Then, the comparative study of all studied methods showed that the integrated strategy is more robust than the sequential one, and that the hybrid and the evolutionary approaches generally give the best results.

Future work is to investigate the robustness of other scheduling methods with other workshops models such us job shop and open shop (Blazewicz et al. 1996; Malakooti 2013). For the scheduling robustness study, we can define another protocol and type of disturbances.