Comparison of ensemble learning methods for creating ensembles of dispatching rules for the unrelated machines environment

Ɖurasević, Marko; Jakobović, Domagoj

doi:10.1007/s10710-017-9302-3

Comparison of ensemble learning methods for creating ensembles of dispatching rules for the unrelated machines environment

Published: 08 April 2017

Volume 19, pages 53–92, (2018)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Genetic Programming and Evolvable Machines Aims and scope Submit manuscript

Comparison of ensemble learning methods for creating ensembles of dispatching rules for the unrelated machines environment

Download PDF

509 Accesses
39 Citations
Explore all metrics

Abstract

Dispatching rules are often the method of choice for solving various scheduling problems, especially since they are applicable in dynamic scheduling environments. Unfortunately, dispatching rules are hard to design and are also unable to deliver results which are of equal quality as results achieved by different metaheuristic methods. As a consequence, genetic programming is commonly used in order to automatically design dispatching rules. Furthermore, a great amount of research with different genetic programming methods is done to increase the performance of the generated dispatching rules. In order to additionally improve the effectiveness of the evolved dispatching rules, in this paper the use of several different ensemble learning algorithms is proposed to create ensembles of dispatching rules for the dynamic scheduling problem in the unrelated machines environment. Four different ensemble learning approaches will be considered, which will be used in order to create ensembles of dispatching rules: simple ensemble combination (proposed in this paper), BagGP, BoostGP and cooperative coevolution. Additionally, the effectiveness of these algorithms is analysed based on some ensemble learning parameters. Finally, an additional search method, which finds the optimal combinations of dispatching rules to form the ensembles, is proposed and applied. The obtained results show that by using the aforementioned ensemble learning approaches it is possible to significantly increase the performance of the generated dispatching rules.

Constructing Ensembles of Dispatching Rules for Multi-objective Problems

Creating dispatching rules by simple ensemble combination

Article 29 May 2019

Evolving Ensembles of Dispatching Rules Using Genetic Programming for Job Shop Scheduling

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Scheduling can be defined as a decision making process concerned with the allocation of tasks to scarce resources with the intention of optimising one or more user defined scheduling objectives [40]. Although different approaches have been defined for solving various scheduling problems, dispatching rules represent the methods of choice when dealing with dynamic scheduling problems. Dispatching rules (DRs) usually represent a simple function which determines the priorities of jobs that need to be scheduled, and based on those priorities decides which job should be scheduled. They are very popular methods for solving scheduling problems since they can be designed to optimise various scheduling criteria, and can be used for different scheduling environments and conditions. Since designing good DRs usually represents a lengthy trial and error process, researchers have focused on defining procedures which could automatically design new dispatching rules.

In order to deal with the problem of manual design of DRs, many different machine learning methods were used in order to automatically create DRs [5]. One of the most commonly used procedures in the automatic development of DRs is genetic programming (GP) [24, 41]. By using GP it is possible to create DRs for a wide variety of different scheduling conditions and scheduling objectives. This feature becomes even more important when there is a need to design DRs for arbitrary user defined criteria, since DRs for such criteria might not even exist. Additionally, DRs generated by GP have in most cases been able to outperform manually designed DRs. Because GP is able to generate good DRs efficiently, in recent years a lot of research has been undertaken in order to apply GP for generating DRs for a wide variety of scheduling problems, as well as to improve the performance of the generated DRs.

This paper analyses if the performance of DRs generated by GP can be improved by using different ensemble learning approaches. The motivation for using ensemble learning approaches comes from the fact that, in the machine learning field, ensemble learning approaches have shown to improve the results achieved for various classification problems [42]. Four ensemble learning approaches will be considered: simple ensemble combination, BagGP, BoostGP and cooperative coevolution. For each of the considered approaches the influence of the ensemble size and the ensemble combination method on the results will be analysed. Additionally, for all the aforementioned approaches a further step, which tries to find a better subset of DRs that should form the ensemble, is introduced.

The remainder of this paper is organised as follows. Section 2 gives a short literature overview concerned with the automatic creation of DRs with GP. The unrelated machines environment is described in Sect. 3. Section 4 describes the GP procedure used in order to automatically create DRs, while Sect. 5 describes the ensemble learning approaches used in this paper. The results achieved by the ensemble learning approaches are outlined in Sect. 6. In Sect. 7 a short discussion about the achieved results is given. Finally, Sect. 8 gives a short conclusion and outlines possibilities for future work.

2 Literature overview

Since it is able to evolve quite complex expressions and functions, GP has been used in the field of hyper-heuristics quite often [7, 8]. Consequentially, GP is also used in order to evolve new DRs for different scheduling problems. One of the first uses of GP in scheduling was in order to generate a sequence in which existing DRs need to be applied in order to create the schedule [9]. Miyashita later evolved DRs for the job-shop environment by using GP with a terminal set that contained several job properties [25]. In his work, Miyashita considered the scheduling environment as a multi agent system where each machine represented an individual agent. Based on that he proposed three different models: the homogeneous model, the distinct agent model and the mixed agent model. The homogeneous model generated a single DR for all machines in the scheduling environment. On the other hand, the distinct agent model generated a distinct DR for each machine in the scheduling environment. Finally, the mixed agent model combines the two aforementioned models in a way that two DRs are evolved, first of which will be used by bottleneck machines, while the second will be used by all other machines. Although the mixed agent model achieved the best results among the three multi-agent models, it comes with an obvious disadvantage, which is that the knowledge about which machines are bottlenecks needs to be known before the system starts with its execution. In their work, Jakobović et al. propose a GP model which extends the mixed agent model from Miyashita. Their GP approach generates three expressions instead of one. Two of those expressions represent regular DRs, while the third expression represents a decision function which determines which of the two DRs will be used for a concrete machine. In that way there is no need to have prior knowledge about which machines represent bottleneck resources, but this can rather be determined during the system execution using the decision function. Apart from its application in the single machine and job-shop environments, GP was also used to create new DRs in the parallel machines environment with good results [22].

Unlike in the aforementioned works, where only a single optimisation criterion was considered, Tay and Ho have used GP to generate DRs which were designed to optimise three criteria at the same time [45]. Hildebrandt et al. have performed an extensive analysis on creating DRs for the job-shop environment [16]. Jakobović and Marasović have further investigated the creation of DRs for the single machine and job-shop environments [23]. In their work they analysed the influence of the GP parameters on the quality of the evolved DRs. Apart from that they also analysed scheduling in the single machine environment under various constraints like set-up times and precedence constraints, and have also shown that GP was able to achieve better results than some standard DRs. Gene expression programming [11], a method similar to GP, was also used in order to evolve DRs for both the single machine environment [35] and job-shop environment [34]. The problem of global perspective of DRs and how GP can be used in order to evolve DRs with a better global perspective was analysed in [18]. In a study by Hunt et al. it was shown that GP is able to evolve optimal DRs for the static two machine job-shop environment, which demonstrates that with the right parameters GP is able to evolve optimal DRs. Different representations in GP were analysed by Nguyen et al. and it was shown that the representation used for evolving DRs influences the quality of the generated DRs [30]. A new GP approach which evolved iterative dispatching rules (IDRs) was proposed by Nguyen et al. [32]. Although the aforementioned approach was able to achieve better results than GP which evolves standard DRs, IDRs can only be used in the static environment in which information about the scheduling environment is known beforehand. Ɖurasevic et al. have compared several GP approaches for creating DRs in the unrelated scheduling environment, including GEP, IDRs and dimensionally aware GP [10]. Apart from generating DRs for the standard scheduling problems GP has also been applied in order to generate DRs for the order and acceptance (OAS) scheduling problem [26, 27, 37]. In the OAS problem, aside from scheduling jobs on machines, the system needs to decide which jobs will be accepted for scheduling. The generated DRs have also shown to be better than the standard DRs for the OAS problem, which shows that GP can generate DRs even for other forms of scheduling problems.

GP was also used in order to generate complete scheduling procedures (SPs), which consist of both DRs and due-date assignment rules (DDARs) [28, 33]. Those approaches used the cooperative coevolution procedure in order to generate two expressions (one of which represents a DR, while the other represents a DDAR) which together form a SP. The SPs evolved by GP have shown to be able to outperform some standard SPs from the literature. Nguyen et al. have used GP in order to generate DRs for optimising five scheduling criteria simultaneously and have shown that GP was able to evolve efficient DRs for the considered multi-objective criteria [29, 31]. A more in depth review of creating DRs by using GP can be found in [5].

Ensemble learning is often used in order to improve the performance of classifier systems [42]. Although ensemble learning approaches like bagging [6] or boosting [14] are commonly used in the machine learning community, ensemble learning approaches have not been as extensively used together with GP in order to improve its performance. Some notable applications of GP ensembles in the literature include classification with unbalanced data [3, 4], pattern classification [13] and intrusion detection [12]. GP ensemble learning approaches have been used for creating ensembles of DRs in only few occasions. In their work Park et al. [38] used the cooperative coevolution approach in order to create ensembles of DRs, and it was shown that such an approach achieves better results than standard GP. Unfortunately, in their work they only considered the static scheduling environment and did not additionally consider dynamic scheduling. Hart and Sim [15] propose a new hyper-heuristic called NELLI-GP which was used to solve static job-shop scheduling problems. This method creates an ensemble of DRs where each DR in the ensemble tries to adapt to a certain subset of problem instances.

3 Unrelated machines environment

The unrelated machines environment can be defined as a scheduling environment which consists of n jobs that need to be scheduled on one of the m available machines. Each machine can only execute one job at a time, and similarly, each job can only be executed by one machine. Preemption is not allowed, meaning that when a job starts executing on a given machine, it will execute until it is completed, after which a new job can be scheduled on the machine. Additionally, if release times are defined for jobs, then no job can start with execution before its respective release time. In this environment each job consists of several parameters:

processing time $p_{ij}$—defines the time needed for job with the index j to be executed on machine with the index i
release time $r_j$—defines the time in which the job with the index j becomes available
due date $d_j$—defines the point in time until which the job with the index j should finish with its execution, otherwise a certain loss will be incurred
weight $w_j$—defines the weight (importance) of the job with index j

After constructing the entire schedule, certain metrics are calculated for each job:

$C_j$—finishing time of job j
$F_j$—flowtime of job j:
$$F_j=C_j-r_j.$$
(1)
$T_j$—tardiness of job j:
$$T_j=\max \{C_j-d_j,0\}.$$
(2)
$U_j$—flag if job is tardy or not:
$$U_j=\left\{ \begin{array}{ll} 1{:}&T_j>0 \\ 0{:}&T_j=0 \end{array}\right. .$$
(3)

Based on the previously defined job metrics, many different scheduling criteria can be defined [1, 2]. This study will focus on optimising the following four scheduling criteria:

Twt—total weighted tardiness:
$$Twt=\sum _{j}w_{j}T_j,$$
(4)
Nwt—weighted number of tardy jobs:
$$Nwt=\sum _{j}w_{j}U_j.$$
(5)
Ft—total flowtime:
$$Ft=\sum _{j}F_j,$$
(6)
$C_{max}$—maximum finish time of all jobs:
$$C_{max}=\max _{j}\{C_j\}.$$
(7)

Apart from the scheduling criteria which are optimised, it is also important to outline under which scheduling conditions the problem is solved. If all job parameters are available before the system starts with its execution, then this type of scheduling is called static scheduling. As a consequence, search-based methods (like genetic algorithms or ant colony optimisation) can be used in order to construct the schedule before the start of the system execution. On the other hand, if job parameters become available only as the jobs are released into the system, and no knowledge about their values is available beforehand, then this type of scheduling is called dynamic scheduling. Since there is a need to quickly adapt to the changing scheduling conditions, search-based methods most often cannot be used for this type of scheduling. Because of that reason, DRs are the most commonly used methods for creating schedules in dynamic environments, since they can quickly react to the changing environment. In this paper the dynamic scheduling environment is considered, in which job parameters become available only when the job is released, and the schedule is constructed together with the execution of the system. Therefore, since the schedule is constructed in parallel with the execution of the system it is important that the scheduling decision can be performed quickly in order to not incur any additional delay.

4 Creating DRs with GP

DRs which are constructed in this study can be divided into two parts: a meta-algorithm and a priority function (PF). The meta-algorithm defines a procedure which is used in order to create the entire schedule incrementally. Although the meta-algorithm defines a global scheduling procedure, it still needs to use a concrete PF which is used to calculate priority values for jobs and machines. These priority values are then used by the meta-algorithm in order to determine which job should be scheduled on which machine and in which order. Algorithm 1 represents the meta-algorithm which is used in this study. This procedure tries to find the best mapping between a job and a machine. If the machine on which the chosen job should be scheduled is available, then the job is immediately scheduled on that machine. On the other hand if the machine is currently busy and executing another job, then the job will not be scheduled, but the scheduling decision will be postponed to a later moment in time.

Unlike the meta-algorithm, which is manually defined, the PFs used by it are evolved by GP. However, in order for GP to be able to evolve quality DRs, relevant information about the scheduling environment and its current state, which will be available to GP in the evolution process, needs to be defined. This is done by specifying a set of terminal nodes which will be used by GP in the construction of DRs. Table 1 represents the set of selected terminal nodes which will be used in the evolution process. The time variable, which appears in the description of some terminal nodes, represents the current time of the system. Terminals pt, dd and w represent given properties of the jobs. The pmin and pavg terminals are included in order to give DRs the ability to determine whether the currently considered processing time belongs to faster or slower processing times of the job (which depends on the machine). The SL terminal is included since it is commonly used in some standard DRs, as well as a terminal in many other studies. The remaining two terminals are more machine-centric, with TMR allowing us to determine how soon the considered machine will be available, and TFMA how soon the machine with the shortest processing time will be available. The latter is useful since jobs will quite often be scheduled on the machine with the fastest processing time. Therefore the inclusion of such a terminal has proven useful.

Table 1 Terminal nodes

Comparison of ensemble learning methods for creating ensembles of dispatching rules for the unrelated machines environment

Abstract

Similar content being viewed by others

Constructing Ensembles of Dispatching Rules for Multi-objective Problems

Creating dispatching rules by simple ensemble combination

Evolving Ensembles of Dispatching Rules Using Genetic Programming for Job Shop Scheduling

Explore related subjects

1 Introduction

2 Literature overview

3 Unrelated machines environment

4 Creating DRs with GP

5 Ensemble learning methods for GP

5.1 Simple ensemble combinations

5.2 BagGP

5.3 BoostGP

5.4 Cooperative coevolution

5.5 Ensemble subset search

6 Results

6.1 Benchmark setup and evaluation

6.2 Results for SEC

6.3 Results for BagGP

6.4 Results for BoostGP

6.5 Results for cooperative coevolution

6.6 Comparison of ensemble learning approaches

7 Discussion

7.1 SEC

7.2 BagGP

7.3 BoostGP

7.4 Cooperative coevolution

7.5 ESS

7.6 Comparison of all approaches

8 Conclusion

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation