
1 Introduction

The resource constrained job scheduling (RCJS) problem is an \( NP \)-hard scheduling problem originally motivated by a mineral supply chain application. It involves simultaneously solving multiple single-machine scheduling problems that are subject to a shared resource constraint. In mining supply chains this situation arises when multiple mines plan their production around a shared rail link that connects the mines to an export port.

Due to the complexity of the RCJS problem, several methods have been developed to solve it. Exact approaches such as integer linear programming [26] and constraint programming [15] have been applied successfully [20, 21]. However, these approaches are computationally expensive and can only solve rather small instances. Hence, alternatives such as metaheuristics [4] have been explored. Overall, the most effective methods so far are hybrid approaches, e.g., combinations of ant colony optimisation (ACO) and integer programming [22], ACO and constraint programming [7, 21], Lagrangian relaxation and particle swarm optimisation (PSO) [9], and column generation and differential evolution [18].

Project scheduling [2, 6, 8, 17], a very well-known class of problems, is closely related to the RCJS problem. There are two main differences: (1) in RCJS, jobs must execute on the machine to which they are allocated, and (2) there is only one common shared resource. In addition, most variants of project scheduling focus on minimising the makespan rather than tardiness. Brucker et al. [6] categorise project scheduling problem variants. Demeulemeester and Herroelen [8] investigate different heuristic and metaheuristic approaches for the problem. Neumann et al. [17] tackle project scheduling with time windows and show that genetic algorithms, simulated annealing and exact approaches can be effective. Ballestin and Trautmann [2] explore a problem very similar to the RCJS problem, in which the objective is to minimise the cumulative deviation from the desired completion times of all the tasks; the approach they use is a population-based iterated local search. The studies [5, 23, 24] investigate resource constrained project scheduling with the objective of maximising the net present value. Thiruvady et al. [23] show that a Lagrangian relaxation and ACO hybrid finds good heuristic solutions and upper bounds. Brent et al. [5] improve the same hybrid with a parallelisation in a multi-core shared memory architecture. Thiruvady et al. [24] show that a matheuristic derived from construct, solve, merge and adapt and parallel ACO improves upon previous approaches.

Unfortunately, current approaches require a substantial amount of computational resources, both in terms of computation time and in terms of parallel computing facilities. With the aim of deriving a computationally less intensive method, we tackle the RCJS problem in this work by means of a biased random key genetic algorithm (Brkga). This type of genetic algorithm [16] was first introduced in [11]. Since then, Brkgas have been shown to obtain excellent results for a substantial range of combinatorial optimisation problems, including the maximum quasi-clique problem [19] and the project scheduling problem with flexible resources [1], to name just a few of the more recent applications. Furthermore, parallel and distributed versions of Brkga have been investigated [10, 12]. Júnior et al. [12] explore an irregular strip packing problem, and the study by Alixandre and Dorn [10] shows good performance on the CEC 2013 benchmark datasets.

2 Resource Constrained Job Scheduling

The RCJS problem consists of a number of nearly independent single machine weighted tardiness problems that are linked by a single shared resource constraint. The problem can formally be described as follows. Each job from a given set \(J = \{1,\ldots ,n\}\) must execute in a non-preemptive way on one specific machine from a set M of machines. Each job \(j \in J\) has the following data associated with it: a release time \(r_j\), a processing time \(p_j\), a due time \(d_j\), the amount \(g_j\) required from the shared resource during the job's execution, a weight \(w_j\), and the machine \(m_j \in M\) to which it belongs. The maximum amount of shared resource available at any time is G. Precedence constraints, given as a set C, may apply between pairs of jobs on the same machine: \(i \rightarrow j \in C\) requires that job i completes before job j starts. The objective is to minimise the total weighted tardiness. Note that this problem is NP-hard, as the single machine weighted tardiness problem is already NP-hard [13].
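To make the instance data concrete, the following sketch models a job and the total weighted tardiness objective in Python. All field and function names are our own illustrative choices, not taken from any implementation of the paper.

```python
from dataclasses import dataclass

@dataclass
class Job:
    """One job of an RCJS instance (field names are illustrative)."""
    release: int    # r_j: release time
    proc: int       # p_j: processing time
    due: int        # d_j: due time
    resource: int   # g_j: shared resource consumed while running
    weight: float   # w_j: tardiness weight
    machine: int    # m_j: machine on which the job must run

def total_weighted_tardiness(jobs, start_times):
    """Objective value: sum over all jobs of w_j * max(0, C_j - d_j),
    where C_j = start_j + p_j is the completion time of job j."""
    return sum(job.weight * max(0, start_times[j] + job.proc - job.due)
               for j, job in enumerate(jobs))
```

For instance, a job with processing time 3, due time 2 and weight 2.0 that starts at time 0 completes at time 3 and contributes a weighted tardiness of 2.0.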

This problem can be expressed in terms of a time-discretized integer linear program (ILP) as follows. Let \(T = \{1,\ldots ,t_{\max }\}\) be the set of considered discrete times (with \(t_{\max }\) being sufficiently large), and let \(z_{jt}\) be a binary variable for all \(j \in J\) and \(t \in T\) that takes value one if the processing of job j completes at time t or earlier. By defining the weighted tardiness for a job j at time t as \(w_{jt}:=\max \{0,w_j\,(t-d_j)\}\), the resulting ILP can be stated as follows:

$$\begin{aligned}&\min \quad {\sum _{j \in J}\sum _{t \in T}} w_{jt} \cdot (z_{jt} - z_{jt-1})&\end{aligned}$$
(1)
$$\begin{aligned} \text {s.t.}&z_{jt_{\max }} = 1&\forall \ j \in J \end{aligned}$$
(2)
$$\begin{aligned}&z_{jt} - z_{jt-1} \ge 0&\forall \ j \in J,\ t \in \{2,\ldots ,t_{\max }\} \end{aligned}$$
(3)
$$\begin{aligned}&z_{jt} = 0&\forall \ t \in T : t < r_j + p_j,\ j \in J \end{aligned}$$
(4)
$$\begin{aligned}&z_{bt} - z_{a,t-p_b} \le 0&\forall \ (a,b) \in C,\ t\in T: t> r_b+p_b \end{aligned}$$
(5)
$$\begin{aligned}&\sum _{j \in J^i} \left( z_{j,t+p_j} - z_{jt}\right) \le 1&\forall \ i \in M,\ t\in T \end{aligned}$$
(6)
$$\begin{aligned}&\sum _{j \in J} g_j \cdot (z_{j,t+p_j}-z_{jt}) \le G&\forall \ t \in T \end{aligned}$$
(7)
$$\begin{aligned}&z_{jt} \in \{0,1\}&\forall \ j \in J,\ t\in T \end{aligned}$$
(8)

Equalities (2) ensure that all jobs complete by \(t_{\max }\). Inequalities (3) guarantee that once a job completes it stays completed. Equalities (4) account for the release times of the jobs. Inequalities (5) ensure that the precedence constraints are satisfied, and inequalities (6), in which \(J^i \subseteq J\) denotes the set of jobs belonging to machine i, make sure that at any time at most one job is processed on each machine. Inequalities (7) require that the capacity constraint on the common resource is satisfied at any time. There are many other ways to formulate this problem, but this is one of the most computationally efficient formulations [20].
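The feasibility conditions encoded by constraints (4)-(7) can be checked directly on a concrete schedule, i.e., on a vector of start times. The following sketch does exactly that; the record and function names are illustrative, not from the paper.

```python
from collections import namedtuple

# Minimal job record for the feasibility check (illustrative names)
Job = namedtuple("Job", "release proc resource machine")

def check_schedule(jobs, start_times, precedences, G):
    """Check a concrete schedule against the ILP constraints:
    release times (4), precedences (5), one job per machine at a
    time (6), and the shared resource capacity G (7)."""
    n = len(jobs)
    # (4) no job may start before its release time
    if any(start_times[j] < jobs[j].release for j in range(n)):
        return False
    # (5) job a must complete before job b starts
    if any(start_times[a] + jobs[a].proc > start_times[b]
           for (a, b) in precedences):
        return False
    horizon = max(start_times[j] + jobs[j].proc for j in range(n))
    for t in range(horizon):
        running = [j for j in range(n)
                   if start_times[j] <= t < start_times[j] + jobs[j].proc]
        # (6) at most one running job per machine
        machines = [jobs[j].machine for j in running]
        if len(machines) != len(set(machines)):
            return False
        # (7) total resource consumption bounded by G
        if sum(jobs[j].resource for j in running) > G:
            return False
    return True
```

Such a checker is also a convenient safeguard when testing a heuristic decoder, since every decoded schedule must satisfy exactly these conditions.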

3 A BRKGA for the RCJS Problem

A biased random key genetic algorithm (Brkga) is a steady-state genetic algorithm. The main machinery of the algorithm is problem-independent. Individuals are always coded in terms of random keys, that is, vectors of floating point values in [0, 1]. Moreover, the population management and the crossover operator are problem-independent as well. The only problem-dependent part is the way in which individuals are translated into valid solutions for the specific problem. The problem-independent part of the algorithm is shown in Algorithm 1. It starts with a call to function \(\textsf {GenerateInitialPopulation}(p_{\mathrm {size}})\) in order to generate a population P of \(p_{\mathrm {size}}\) random individuals. Hereby, each individual \(\pi \in P\) is a vector of length n (the number of jobs of the RCJS instance). The value of each position j of \(\pi \) (denoted by \(\pi (j)\)) is randomly chosen from [0, 1]. Note that \(\pi (j)\) is associated with job j of the RCJS instance. The next step consists of the evaluation of the individuals from the initial population, that is, the translation of the individuals into valid schedules for the RCJS problem, which will be explained in Sect. 3.1. As a consequence, each individual obtains its objective function value, denoted by \(f(\pi )\). After that, the following actions are performed at each iteration of the algorithm’s main loop. First, the best \(\max \{\lfloor p_e \cdot p_{\mathrm {size}} \rfloor , 1\}\) individuals are copied over from P to \(P_e\) (function \(\textsf {EliteSolutions}(P,p_e)\)). Second, a set of \(\max \{\lfloor p_m \cdot p_{\mathrm {size}} \rfloor , 1\}\) so-called mutants—that is, randomly generated individuals—is produced and stored in \(P_m\). Next, a set \(P_c\) of \(p_{\mathrm {size}} - |P_e| - |P_m|\) new individuals is generated by crossover (function \(\textsf {Crossover}(P, P_e, prob_{\mathrm {elite}})\)).
The generation of an offspring individual \(\pi _{\mathrm {off}}\) by crossover works as follows: (1) an elite parent \(\pi _1\) is chosen uniformly at random from \(P_e\), (2) a second parent \(\pi _2\) is chosen uniformly at random from \(P \setminus P_e\), and (3) \(\pi _{\mathrm {off}}\) is generated on the basis of \(\pi _1\) and \(\pi _2\) and stored in \(P_c\). Hereby, value \(\pi _{\mathrm {off}}(i)\) is set to \(\pi _1(i)\) with probability \(prob_{\mathrm {elite}}\), and to \(\pi _2(i)\) otherwise, for all \(i=1,\ldots ,n\). After generating all new individuals in \(P_m\) and \(P_c\), these new individuals are evaluated in function \(\textsf {Evaluate}(P_m \cup P_c)\). Remember that the individuals in \(P_e\) are already evaluated. Finally, the next generation’s population is obtained by the union of \(P_e\) with \(P_m\) and \(P_c\).
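The problem-independent loop described above can be sketched as follows; the decoder is passed in as a black box mapping a key vector to an objective value to be minimised. Parameter names mirror the paper, but the defaults and all identifiers are illustrative, not the paper's implementation.

```python
import random

def brkga(n, decoder, p_size=50, p_e=0.2, p_m=0.15, prob_elite=0.7,
          generations=100, seed=0):
    """Problem-independent Brkga loop (a sketch of Algorithm 1).
    `decoder` maps a key vector (n floats in [0, 1]) to an objective
    value; the population is kept sorted by that value."""
    rng = random.Random(seed)
    new_individual = lambda: [rng.random() for _ in range(n)]
    pop = sorted((new_individual() for _ in range(p_size)), key=decoder)
    n_elite = max(int(p_e * p_size), 1)
    n_mutant = max(int(p_m * p_size), 1)
    for _ in range(generations):
        elites = pop[:n_elite]                    # best individuals survive
        mutants = [new_individual() for _ in range(n_mutant)]
        offspring = []
        for _ in range(p_size - n_elite - n_mutant):
            elite = rng.choice(elites)            # one elite parent
            other = rng.choice(pop[n_elite:])     # one non-elite parent
            # biased uniform crossover: take each key from the elite
            # parent with probability prob_elite
            offspring.append([e if rng.random() < prob_elite else o
                              for e, o in zip(elite, other)])
        pop = sorted(elites + mutants + offspring, key=decoder)
    return pop[0], decoder(pop[0])
```

As a toy usage example, calling `brkga(n=5, decoder=sum)` searches for a key vector with small component sum; any problem-specific decoder, such as the one of Sect. 3.1, can be plugged in the same way.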

[Algorithm 1: pseudocode of the problem-independent part of Brkga]

3.1 Evaluation of an Individual: The Decoder

The evaluation of an individual \(\pi \) (lines 4 and 9 of Algorithm 1) is the problem-dependent part of the Brkga. The function that evaluates individuals is called the decoder. In our Brkga implementation for the RCJS problem, the decoder involves the application of a greedy construction heuristic that was introduced in [25]. This greedy heuristic works as follows. It chooses, at each construction step, exactly one of the so-far unscheduled jobs, and provides it with a feasible starting time and, therefore, also with a finishing time. Henceforth, let \(J_{\mathrm {done}} \subseteq J\) be the set of jobs that are already scheduled, and let \(s_j\) denote the starting time of \(j \in J_{\mathrm {done}}\). At the start of the solution construction process it holds that \(J_{\mathrm {done}} := \emptyset \). The process stops when \(J_{\mathrm {done}} = J\).

Let \(\max _t := \max _{j=1}^n r_j + \sum _{j=1}^n p_j\) be a crude upper bound for the makespan of any feasible solution. Moreover, let \(C_j\) be the set of jobs that, according to the precedence constraints in C, must be executed before j, and let \(M_{m_h} \subseteq J\) be the subset of jobs that must be processed on machine \(m_h \in M\). Furthermore, given a partial solution, let \(g^{\mathrm {sum}}_{t} \ge 0\) be the amount of the shared resource already consumed at time t.

Given \(J_{\mathrm {done}}\), the set of feasible jobs—that is, the set of jobs from which the next job to be scheduled can be chosen—is defined as follows: \(\hat{J} := \{j \in J \setminus J_{\mathrm {done}} \mid C_j \cap J_{\mathrm {done}} = C_j\}\). In words, the set of feasible jobs consists of those jobs that (1) are not scheduled yet and (2) whose predecessors are already scheduled. A time step \(t' \ge 0\) is a feasible starting time for a job \(j \in \hat{J}\), if and only if

  1. \(t' \ge s_k+ p_k\), for all \(k \in J_{\mathrm {done}} \cap C_j\);

  2. \(t' \ge s_k + p_k\), for all \(k \in M_{m_j} \cap J_{\mathrm {done}}\) (remember that \(m_j\) refers to the machine on which job j must be processed); and

  3. \(g^{\mathrm {sum}}_{t} + g_j \le G\), for all \(t = t', \ldots , t' + p_j\).

Let \(T'\) denote the set of feasible starting times of a job \(j \in \hat{J}\); the earliest possible starting time is then defined as \(s^{\min }_j := \min \{t' \mid t' \in T'\}\). Finally, for choosing a feasible job at each construction step, the jobs \(j \in \hat{J}\) must be ordered in some way. In many scheduling applications, ordering the jobs according to their earliest possible starting times (in an increasing way) is a powerful mechanism. Therefore, our decoder combines the earliest starting time information with the numerical values of \(\pi \) in the following way. It produces an ordered list L of all the jobs j in \(\hat{J}\), sorted according to increasing values of \(\pi (j) \cdot (s^{\min }_j + 1)\). Then, the first job of L—let us call this job \(j^*\)—is chosen and added to \(J_{\mathrm {done}}\), and its starting time \(s_{j^*}\) is fixed to \(s^{\min }_{j^*}\).
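The decoder just described can be sketched as follows. The field and function names are our own; treating the execution interval as half-open and assuming \(g_j \le G\) for every job are simplifying assumptions of this sketch.

```python
from collections import namedtuple

# Minimal job record for the decoder (illustrative names); `preds` is
# the tuple of jobs that must complete before this one may start.
Job = namedtuple("Job", "release proc resource machine preds")

def decode(jobs, keys, G):
    """Greedy decoder sketch: repeatedly pick, among the eligible jobs,
    the one minimising keys[j] * (s_min_j + 1), where s_min_j is the
    earliest feasible start time, and schedule it there."""
    n = len(jobs)
    start = {}    # scheduled job -> fixed start time
    usage = {}    # discrete time step -> resource already consumed
    while len(start) < n:
        eligible = [j for j in range(n) if j not in start
                    and all(p in start for p in jobs[j].preds)]
        best_j = best_rank = best_t = None
        for j in eligible:
            t = jobs[j].release
            # respect predecessors and jobs already on the same machine
            for k in start:
                if k in jobs[j].preds or jobs[k].machine == jobs[j].machine:
                    t = max(t, start[k] + jobs[k].proc)
            # delay until the shared resource fits for the whole duration
            # (terminates because usage is bounded in time; assumes g_j <= G)
            while any(usage.get(u, 0) + jobs[j].resource > G
                      for u in range(t, t + jobs[j].proc)):
                t += 1
            rank = keys[j] * (t + 1)
            if best_rank is None or rank < best_rank:
                best_j, best_rank, best_t = j, rank, t
        start[best_j] = best_t
        for u in range(best_t, best_t + jobs[best_j].proc):
            usage[u] = usage.get(u, 0) + jobs[best_j].resource
    return start
```

For example, with two unit-resource jobs on different machines and G = 1, the job with the smaller key is scheduled first and the other is delayed until the resource becomes free again.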

3.2 Applying the Decoder in a Rollout Fashion

Any constructive heuristic can be applied in a so-called rollout fashion [3]. In the context of the decoder from the previous sub-section, this works as follows. Instead of ordering the jobs \(j \in \hat{J}\) at each construction step according to their \(\pi (j) \cdot (s^{\min }_j + 1)\) values, the decoder is completely applied to each partial solution \(J_{\mathrm {done}} \cup \{j\}\), for all \(j \in \hat{J}\). Hereby, the starting time of j is set to \(s^{\min }_{j}\) in each case. This provides us with \(|\hat{J}|\) complete solutions whose objective function values—henceforth called the rollout values—are then used for producing the ordered list L of all jobs from \(\hat{J}\) (in an increasing way). As in the standard decoder, the first job of L—let us call this job again \(j^*\)—is chosen and added to \(J_{\mathrm {done}}\), and its starting time \(s_{j^*}\) is set to \(s^{\min }_{j^*}\).

Even though applying the decoder in a rollout fashion provides better evaluations of the individuals, the computational time needed for evaluating an individual increases substantially. Therefore, we make use of the following techniques for shortening the run time:

  1. We use an explicit rollout width \(\mathrm {ro}_{\mathrm {width}} > 0\). In those construction steps in which \(\mathrm {ro}_{\mathrm {width}} < |\hat{J}|\), the rollout is only applied to the first \(\mathrm {ro}_{\mathrm {width}}\) jobs from list L (when ordered according to the \(\pi (j) \cdot (s^{\min }_j + 1)\) values). The remaining jobs in L receive a rollout value of \(\infty \). After that, the list L is reordered according to the rollout values, and the first job from L is selected and used to extend \(J_{\mathrm {done}}\), before we proceed to the next construction step.

  2. The decoder is only applied in a rollout fashion (with a rollout width of \(\mathrm {ro}_{\mathrm {width}}\)) after \(n_{\mathrm {noimpr}}^{\max } \ge 0\) consecutive Brkga iterations without an improvement of the best-so-far solution. After the execution of such a Brkga iteration in which the decoder is applied in a rollout fashion, the counter for consecutive non-improving Brkga iterations is re-initialized to zero, just as at the start of the Brkga algorithm.

Clearly, \(\mathrm {ro}_{\mathrm {width}}\) and \(n_{\mathrm {noimpr}}^{\max }\) are two important algorithm parameters that control to what extent the decoder is applied in a rollout fashion.
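The width-capped rollout selection can be sketched generically as follows. Here `greedy_rank` stands for the \(\pi (j) \cdot (s^{\min }_j + 1)\) ranking and `complete_and_evaluate` for completing the partial schedule with the standard decoder and returning its objective value; both names, and the function itself, are illustrative rather than the paper's implementation.

```python
def rollout_choose(candidates, greedy_rank, complete_and_evaluate, width):
    """One width-limited rollout step (sketch). Only the `width` best
    candidates under the greedy ranking are rolled out, i.e. evaluated
    by completing the construction; the rest receive an infinite
    rollout value, so a rolled-out candidate is always chosen."""
    ordered = sorted(candidates, key=greedy_rank)
    scores = {c: float("inf") for c in ordered}
    for c in ordered[:width]:        # roll out only the top `width` jobs
        scores[c] = complete_and_evaluate(c)
    # ties in rollout value fall back to the greedy order (min is stable)
    return min(ordered, key=lambda c: scores[c])
```

With `width` equal to the number of candidates this degenerates to a full rollout step, while `width = 1` reproduces the purely greedy choice.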

4 Experimental Evaluation

All experiments concerning Brkga were performed on a cluster of machines with Intel\(^{\textregistered }\) Xeon\(^{\textregistered }\) 5670 CPUs (12 cores at 2.933 GHz) and a minimum of 32 GB RAM. As mentioned before, the current state-of-the-art results for the RCJS problem were obtained by a recent hybrid algorithm labelled Cg-De-Ls that combines column generation with differential evolution and local search; see [18]. Note that, while Brkga was run in single-threaded mode with a limit of 3600 s of CPU time for each problem instance, Cg-De-Ls was implemented in a parallel framework and each run (limited to 3600 s of wall clock time) was given 16 cores on Monash University's Campus Cluster. Each machine of that cluster provides 24 cores and 256 GB RAM, using hyper-threaded Intel Xeon E5-2680 v3 CPUs (2.5 GHz, 12 cores/24 threads). In summary, a run of Cg-De-Ls consumes at least one order of magnitude more computation time than a run of Brkga.

Problem Instances. The comparison of Brkga with Cg-De-Ls was conducted on 36 instances from a dataset that was originally introduced in [20]. This dataset consists of problem instances with the number of machines ranging from three to twenty, and there are three instances per number of machines. Each machine has to process, on average, 10.5 jobs; that is, an instance with three machines has approximately 32 jobs. Further details concerning the problem instances and their job characteristics (processing times, release times, weights, etc.) can be obtained from the original study.

Tuning of Brkga. The proposed Brkga approach has six parameters which require suitable values. In this work we made use of the automatic configuration tool irace [14] for finding such parameter values. More specifically, we aimed at identifying one parameter setting that works well for all 36 test problem instances. For this purpose, we selected six problem instances (having between 3 and 12 machines) from the additional instances provided in [20] which have not been tested in [18]. In addition, we added instances 15–3 and 20–5 from the 36 instances that will be used for the final experimentation, because [20] does not contain any other instances of that size. In total, this makes a set of eight tuning instances. The following parameter value ranges were considered:

  • \(p_{\mathrm {size}} \in \{10, 50, 100, 200, 500, 1000, 5000\}\).

  • \(p_e \in \{0.05, 0.1, 0.15, 0.2, 0.25\}\).

  • \(p_m \in \{0.1, 0.15, 0.2, 0.25, 0.3\}\).

  • \(prob_{\mathrm {elite}} \in \{0.5, 0.6, 0.7, 0.8, 0.9\}\).

  • \(\mathrm {ro}_{\mathrm {width}} \in \{2, 3, 5, 10, 20\}\).

  • \(n_{\mathrm {noimpr}}^{\max } \in \{10, 50, 100, 200, 500\}\).

In total, we allowed a maximum of 5000 experiments—with a computation time limit of 3600 s per run—for tuning. The results provided by irace were as follows: \(p_{\mathrm {size}} = 1000\), \(p_e = 0.25\), \(p_m = 0.15\), \(prob_{\mathrm {elite}} = 0.5\), \(\mathrm {ro}_{\mathrm {width}} = 3\), and \(n_{\mathrm {noimpr}}^{\max } = 200\). These parameter value settings were used for the final experimentation. The parameter settings of Cg-De-Ls (for the same set of problem instances) are described in [18].

4.1 Numerical Results

Brkga was applied ten times to each of the 36 considered problem instances with a CPU time limit of 3600 s per run. The numerical results—in comparison to those of Cg-De-Ls taken from [18]—are presented in Table 1. The first column provides the instance names. The following three columns show the results of Cg-De-Ls in terms of the best solution found in 30 runs (column best), the average of the 30 solution values (column avg) and the corresponding standard deviation (column std). The same three columns (based on ten runs per problem instance) are provided for Brkga. Two additional columns provide the average computation time at which the best solution of each run was found and the corresponding standard deviation. Finally, note that values in columns avg are marked in bold font when the corresponding result is better (with statistical significance according to Student’s t-test with \(\alpha =0.05\)) than the result of the competing algorithm.

Table 1. A comparison of Brkga with Cg-De-Ls [18]. Cg-De-Ls was run 30 times and Brkga ten times on each problem instance; every run was allowed 3600 s of run-time. Statistically significant results at \(\alpha = 0.05\) are shown in bold.

The results in Table 1 allow for the following observations:

  • For the small-medium problem instances (see the first 14 rows of Table 1) there is no clear pattern, with Brkga outperforming Cg-De-Ls in some cases, and vice versa in others.

  • Starting from instance 7–47 (i.e., this instance and all larger instances with eight machines or more), Brkga clearly outperforms Cg-De-Ls. Hereby, the advantage of Brkga over Cg-De-Ls seems to grow with increasing problem instance size. For the largest 11 instances, for example, the average performance of Brkga is better than the best solution values found by Cg-De-Ls.

In order to better understand the behaviour of Brkga, we provide graphics showing the evolution of the best-so-far solution over time for four rather large problem instances in Fig. 1. More specifically, the graphics show the mean performance of Brkga over 10 runs, while the grey-shaded area around each curve shows, for each point in time, the performance of the worst run and of the best run among the 10 runs. Furthermore, the dashed horizontal lines indicate the value of the best solutions found by Cg-De-Ls within 30 runs, where each run made use of 16 threads in parallel. Finally, the vertical bars indicate the initiation of iterations with rollout evaluations (in any of the ten runs). In those cases in which such a vertical bar has a white square head, the rollout iteration was successful in the sense that the best-so-far solution was improved. Otherwise—that is, in those cases in which such a bar has a black diamond head—the rollout iteration was not successful. Note that for instances 11–63 and 15–2 (Fig. 1a and b) only the successful rollout iterations are indicated, because showing all rollout iterations would have made these graphics unreadable.

Fig. 1.
figure 1

Evolution of the best-so-far solution of Brkga for four large problem instances. The curves show the mean performance over 10 runs, while the gray-shaded area behind the curves shows the spread of the 10 runs. The dashed horizontal bars indicate the best result of Cg-De-Ls after 30 runs. The vertical bars indicate the initiation of rollout iterations.

The graphics in Fig. 1 allow us to make the following conclusions:

  • First, in all four cases all ten runs of Brkga improve over the best solution found by Cg-De-Ls after a few hundred seconds. This is despite the fact that Cg-De-Ls makes use of 16 threads in parallel, while Brkga is run in one-threaded mode.

  • Second, the best moment to make use of rollout iterations seems to be when the algorithm has been stuck for quite a while in a local minimum. Remember that the parameter setting was determined by our tuning procedure with irace, as described in the third paragraph of Sect. 4. The chosen settings are \(\mathrm {ro}_{\mathrm {width}} = 3\) and \(n_{\mathrm {noimpr}}^{\max } = 200\), that is, a very narrow rollout width and a rather high number of consecutive non-improving iterations before a rollout iteration is initiated. The effect of this can be nicely seen in the four graphics. In fact, the first rollout iterations are—in all four cases—initiated after the algorithm has already outperformed Cg-De-Ls. The reason for making use of rollout iterations in this way is the significant difference in computation time requirements: a standard iteration requires 0.157 s for instance 11–63, 0.34 s for instance 15–2, 0.52 s for instance 20–2, and 0.64 s for instance 20–5. In contrast, a rollout iteration requires 12.7 s for instance 11–63, 41.1 s for instance 15–2, 89.3 s for instance 20–2, and 115.0 s for instance 20–5. That is, a rollout iteration consumes about two orders of magnitude more time than a standard algorithm iteration.

Summarizing, we can say that our Brkga algorithm significantly outperforms the current state-of-the-art algorithm Cg-De-Ls, especially with growing problem instance size. Moreover, the algorithm requires much less computational resources than its competitor from the literature.

5 Conclusions and Future Work

We considered the resource constrained job scheduling problem, in which multiple single machine scheduling problems are linked by one limited shared resource. The objective is to minimise the total weighted tardiness of all jobs. We tackled this problem by means of a biased random key genetic algorithm, which is a quite generic framework. For the problem-dependent part of the algorithm—the decoder—we apply a greedy construction heuristic which processes the jobs in an order determined by the jobs’ random keys in combination with their earliest starting times. The basic greedy heuristic is further substantially enhanced by applying rollouts in a carefully controlled way in order to obtain a more promising ranking of the jobs. As rollouts are time-expensive, they are only used when the optimisation gets stuck with the standard greedy criterion for a certain number of iterations.

Our experimental results show that in particular with growing problem instance size our approach significantly outperforms the leading column generation/differential evolution hybrid from the literature, both in terms of solution quality and computation time.