
1 Introduction

Due to the complexity of actual production processes, uncertain factors such as equipment downtime, urgent orders, rework due to quality problems, and time adjustments are difficult to avoid. Job shop scheduling therefore often takes the form of complex dynamic scheduling, in which the job plan must be adjusted at any time as production conditions change. Traditional approaches to the dynamic scheduling problem simplify it, ignore disturbances and uncertainty, and transform the complex problem into a static scheduling problem. These methods must be redesigned for each new system state, and their models must be readjusted whenever the production environment changes. Because they do not account for disturbances to the system state, traditional methods cannot meet the needs of actual production. Moreover, in large-scale production workshops, order tasks are complex and involve a great number of resources, which increases the difficulty of the scheduling problem geometrically; for these large-scale and more complex scheduling problems, traditional methods are hard to apply. Studying the dynamic scheduling problem is therefore of great significance for actual production.

A scheduling rule is a priority allocation rule with low time complexity and high robustness, which makes it suitable for solving dynamic scheduling problems. In recent years, scheduling rules have frequently been used to solve job shop scheduling problems. Durasevic et al. [1] studied the applicability of different scheduling rules and identified the scheduling criteria suited to each rule by testing nine criteria and four job types. Kuck et al. [2] proposed an optimization method based on adaptive simulation to select appropriate scheduling rules for production control in complex manufacturing systems subject to equipment failure. Zhang et al. [3] proposed a semantics-based scheduling rule selection system that associates scheduling rules with optimization objectives through semantic similarity and semantic expressions, generating scheduling rule combinations for a given production target. Rolf et al. [4] presented a method for allocating scheduling rules in hybrid flow shop problems with sequence-dependent setup times. Lee et al. [5] proposed a sequential search method to set appropriate weight sets for scheduling rules, using decision trees and hierarchical clustering to improve search efficiency. Braune et al. [6] proposed a tree-based method for generating scheduling priority rules that makes job allocation and machine sequencing decisions through single and multiple trees.

Reinforcement learning is a class of methods that does not rely on samples and, compared with traditional intelligent algorithms, offers higher efficiency and generalization ability. Q-learning (QL), one of the main reinforcement learning methods, is model-free; it avoids the huge computational burden of large-scale scheduling and is applicable to dynamic scheduling problems. Research on reinforcement learning for scheduling problems continues to grow. Bouazza et al. [7] selected suitable equipment and process routes for dynamic scheduling by improving the state-action value table of the reinforcement learning algorithm. Shahrabi et al. [8] used QL to find appropriate parameters for problems with equipment failures and dynamically arriving workpieces. Shiue et al. [9] studied real-time scheduling (TRS) and proposed a real-time scheduling system using a multiple dispatching rules (MDR) mechanism to ensure that the knowledge base (KB) responds to changes in the workshop environment in real time. Wang [10] proposed an adaptive scheduling strategy that avoids the blind search of traditional methods through dynamic greedy search and realizes weighted iteration of the Q function by defining a state error, improving both the speed and the accuracy of learning. Qu et al. [11] proposed a multi-agent method for scheduling production systems covering multiple types of products, equipment, and labor, which adaptively updates production plans in real time. Chen et al. [12] proposed a flexible job shop scheduling method built around a genetic algorithm whose parameters are adjusted intelligently through reinforcement learning. Kardos et al. [13] proposed a new method that selects machines according to real-time information so as to reduce workpiece delays in production.

Most current research on dynamic scheduling is based on specific workshop scenarios, so the resulting scheduling schemes suit only those special environments and are not universal. Building on the research above, this paper proposes a multi-objective dynamic scheduling method based on QL. Through QL, the optimal scheduling strategy under dynamic disturbance is obtained, and real-time matching between the scheduling strategy and the production environment is realized.

2 Description of Dynamic Scheduling Problem

Job shop dynamic scheduling is a complex optimization problem that can be described as follows: n workpieces \(Q= \{Q_{1} {,}Q_{2} {,}...,Q_{n} \}\) are processed on m pieces of equipment \(M = \{M_{1} {,}M_{2} {,}...,M_{m} \}\). Each workpiece Qi has its own process route and processes, and each process corresponds to a set of optional equipment. At the same time, disturbance factors in actual production, such as equipment failure and urgent orders, must be considered. The goal of scheduling is to select suitable processing equipment for each workpiece under the various constraints and dynamic disturbances, to determine the processing sequence of the workpieces and their working times, and to continuously improve the scheduling indices through optimization until the expected requirements are met.

For generality, the dynamic scheduling problem in this paper is based on the following assumptions:

  • The workpieces arrive dynamically and their arrival times are random; material delivery times are ignored. The processing time of an operation includes its preparation time.

  • Each process of a workpiece corresponds to a set of optional equipment, and exactly one piece of equipment from this set is selected to complete the process.

  • A process cannot be interrupted once it has started.

  • Each piece of equipment can process only one workpiece at a time, and no other workpiece may preempt it once processing has started.

  • Each workpiece has a definite process route, and processing follows the order specified in that route; a process can start only after its predecessor has finished.

  • The processing time of an operation is independent of the process route.
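As a minimal sketch of the problem data under these assumptions, the structures below (the `Operation` and `Job` names are illustrative, not from the paper) capture the route, the optional equipment set per process, and the dynamic arrival and due dates:

```python
from dataclasses import dataclass, field

@dataclass
class Operation:
    # processing time per eligible machine: {machine_id: time};
    # times already include preparation time (assumption 1)
    times: dict[int, float]

@dataclass
class Job:
    arrival: float                 # random dynamic arrival time
    due: float                     # delivery date
    # fixed process route; operations must run in this order
    route: list[Operation] = field(default_factory=list)

# n jobs on m machines; each operation picks exactly one machine
# from its eligible set (assumption 2)
jobs = [
    Job(arrival=0.0, due=50.0,
        route=[Operation({0: 12.0, 1: 14.0}), Operation({2: 10.0})]),
    Job(arrival=5.0, due=60.0,
        route=[Operation({1: 18.0})]),
]
```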

To define the dynamic scheduling problem, the definitions of relevant parameters are shown in Table 1:

Table 1. Parameter definition of dynamic scheduling problem

For the dynamic scheduling problem, the constraints can be described as follows:

$$TQ_{sij} + X_{ijk} \times TQ_{ijk} \le TQ_{eij}$$
(1)
$$TQ_{eij} \le TQ_{si(j + 1)}$$
(2)
$$TQ_{{eiL_{i} }} \le C_{\max }$$
(3)
$$TQ_{sij} + TQ_{ijk} \le TQ_{shr} + \inf \cdot (1 - Y_{ijkhr} )$$
(4)
$$TQ_{eij} \le TQ_{si(j + 1)} + \inf \cdot (1 - Y_{ikhr(j + 1)} )$$
(5)
$$\sum\limits_{k = 1}^{{H_{ij} }} {X_{ijk} } = 1$$
(6)
$$\sum\limits_{i = 1}^{n} {\sum\limits_{k = 1}^{{L_{i} }} {Y_{ijkhr} } = X_{hrk} }$$
(7)
$$\sum\limits_{h = 1}^{n} {\sum\limits_{r = 1}^{{L_{h} }} {Y_{ijkhr} } = X_{ijk} }$$
(8)

In the above constraints, Eqs. 1 and 2 represent the process route constraints of a workpiece; Eq. 3 constrains the completion time of the final process; Eqs. 4 and 5 state that a piece of equipment can serve only one process at a time; Eq. 6 is the exclusivity constraint of a process (exactly one machine is chosen); Eqs. 7 and 8 represent the usability constraints of the equipment.
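A hedged sketch of a feasibility check mirroring these constraints may clarify their meaning; the schedule tuple format `(job, op_index, machine, start, end)` is an assumption for illustration, not the paper's notation:

```python
def feasible(schedule, c_max):
    """Check route order (Eqs. 1-2), horizon (Eq. 3), and
    one-process-per-machine (Eqs. 4-5) on a candidate schedule.
    schedule: list of (job, op_index, machine, start, end)."""
    by_job, by_machine = {}, {}
    for job, op, mach, s, e in schedule:
        if e > c_max or s > e:                 # Eq. 3 and sanity
            return False
        by_job.setdefault(job, []).append((op, s, e))
        by_machine.setdefault(mach, []).append((s, e))
    # Eqs. 1-2: operation j must finish before operation j+1 starts
    for ops in by_job.values():
        ops.sort()
        for (_, _, e1), (_, s2, _) in zip(ops, ops[1:]):
            if e1 > s2:
                return False
    # Eqs. 4-5: intervals on the same machine must not overlap
    for ivals in by_machine.values():
        ivals.sort()
        for (_, e1), (s2, _) in zip(ivals, ivals[1:]):
            if e1 > s2:
                return False
    return True
```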

Defining the goal of dynamic scheduling facilitates evaluation of the scheduling effect. Common scheduling performance measures include production efficiency indices, such as the maximum makespan; production stability indices, such as deviation measures; and economic indices, such as production cost and total processing energy consumption. This paper uses a synthesis of multiple indices as the evaluation criterion.

$$f_{1} = \min (\max (C_{i} ))$$
(9)
$$f_{2} = \min (\frac{1}{n}\sum\limits_{i = 1}^{n} {(C_{i} - G_{i} )} )$$
(10)
$$f_{3} = \min (\mathop {\max }\limits_{i = 1}^{n} (\max (C_{i} - T_{ei} )))$$
(11)

Equation 9 represents the maximum makespan objective, Eq. 10 the average flow time index, and Eq. 11 the maximum tardiness index. The final scheduling performance index is constructed by synthesizing these indicators, as shown in Eq. 12, where f is the objective function of the comprehensive optimization.

$$f = \min F(f_{1} ,f_{2} ,f_{3} )$$
(12)
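The three objectives can be sketched directly from Eqs. 9-11; note that the paper does not specify the synthesis function F of Eq. 12, so the weighted sum below (and its weights) is only one assumed choice:

```python
def objectives(completion, arrival, due):
    """completion, arrival, due: per-workpiece lists C_i, G_i, T_ei."""
    n = len(completion)
    f1 = max(completion)                                       # makespan, Eq. 9
    f2 = sum(c - a for c, a in zip(completion, arrival)) / n   # mean flow time, Eq. 10
    f3 = max(max(c - d, 0.0) for c, d in zip(completion, due)) # max tardiness, Eq. 11
    return f1, f2, f3

def f(completion, arrival, due, w=(0.5, 0.3, 0.2)):
    # assumed scalarization F(f1, f2, f3); weights are illustrative only
    return sum(wi * fi for wi, fi in zip(w, objectives(completion, arrival, due)))
```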

3 Dynamic Disturbance and Scheduling Rules

3.1 Analysis of Dynamic Disturbance Factors

The dynamic disturbance factors in actual production can be divided, according to their characteristics, into indirect and direct disturbances. Indirect disturbances, such as processing time deviations, poor material turnover, and declining equipment efficiency, affect the execution of scheduling only after they accumulate to a certain extent. Direct disturbances significantly interfere with scheduling and force the plan to be adjusted. They fall into two types: resource-related disturbances, such as equipment failure, operation interruption, personnel absence, and material shortage or delay; and workpiece-related disturbances, such as urgent order insertion, random workpiece arrival, task cancellation, delivery date adjustment, working-hour changes, and workpiece rework. Common dynamic disturbance factors are shown in Table 2.

Table 2. Classification of dynamic disturbance factors

This paper focuses on direct disturbances, specifically four typical factors: dynamic workpiece arrival, urgent orders, equipment maintenance, and workpiece rework. In actual production, workpieces arrive randomly; in this paper their arrival times are set to be uniformly distributed. Urgent orders are modeled by selecting a proportion R1 of orders and advancing their delivery dates. Equipment under maintenance is unavailable, and this disturbance is set through the maintenance time proportion R2. Quality problems are modeled by sending a certain number of workpieces, in proportion F, for rework. To reflect the differing severity of disturbances in actual production, different disturbance intensities are used.
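An illustrative generator for these four disturbances is sketched below; the proportions R1, R2, and F follow the paper, but the due-date advance factor, the maintenance-window placement, and the data format are assumptions:

```python
import random

def disturb(jobs, R1=0.1, R2=0.05, F=0.05, horizon=1000.0, m=10):
    """jobs: list of dicts with a 'due' key (assumed format)."""
    random.seed(42)  # fixed seed for reproducibility of the sketch
    # urgent orders: a proportion R1 of jobs gets its due date advanced
    for job in random.sample(jobs, int(R1 * len(jobs))):
        job["due"] *= 0.7                       # assumed advance factor
    # maintenance: each machine is unavailable for R2 of the horizon
    windows = {}
    for k in range(m):
        start = random.uniform(0, horizon * (1 - R2))
        windows[k] = (start, start + R2 * horizon)
    # rework: a proportion F of jobs must repeat processing
    rework = random.sample(range(len(jobs)), int(F * len(jobs)))
    return jobs, windows, rework
```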

3.2 Scheduling Rules

Scheduling rules compute workpiece priorities from quantities such as processing time, number of processes, and delivery date, and then select the next workpiece to be processed on idle equipment.

For dynamic scheduling problems, an evaluation method based on scheduling rules can be used. This approach is relatively easy to implement in an actual production environment and constitutes an efficient closed-loop control method, so it is suitable for real-time job shop scheduling. During scheduling, both the selection of equipment and the allocation of jobs must be considered. This paper analyzes scheduling rules for these two types of decisions, studies the performance differences of the rules under dynamic disturbance, and finally selects the rules with excellent performance. Common scheduling rules are shown in Table 3.
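A few of the classical rules of the kind listed in Table 3 can be sketched as priority keys (smaller value means higher priority); the queue-entry dictionary keys here are hypothetical:

```python
def spt(op):
    # Shortest Processing Time: prioritize the shortest operation
    return op["ptime"]

def edd(op):
    # Earliest Due Date: prioritize the earliest delivery date
    return op["due"]

def mst(op, now):
    # Minimum Slack Time: due date minus current time minus remaining work
    return op["due"] - now - op["remaining"]

def pick(queue, rule, now=0.0):
    """Select the highest-priority candidate for an idle machine."""
    key = (lambda o: rule(o, now)) if rule is mst else rule
    return min(queue, key=key)
```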

Table 3. Typical scheduling rules

4 Dynamic Scheduling Problem Solving Based on Reinforcement Learning

4.1 State Space Definition

The job shop scheduling process is mapped to a state space that expresses the system environment of reinforcement learning. The state definition must reflect the features and evolution of the scheduling environment and must be able to express different scenarios. In this paper, the state space is defined by feature vectors derived from the states of the equipment and the workpieces. It comprises five features, given below.

$$s_{k,1} = \frac{n_{k}}{n}$$
(13)
$$s_{k,2} = \frac{T_{k}^{a}}{T^{a}}$$
(14)
$$s_{k,3} = \frac{\sum\limits_{j = 1}^{L_{i}} TQ_{ijk}}{\sum\sum TQ_{ij}}$$
(15)
$$s_{k,4} = \frac{\sum\limits_{h = j + 1}^{L_{i}} TQ_{ih}}{\sum\limits_{j = 1}^{L_{i}} TQ_{ij}}$$
(16)
$$s_{k,5} = \frac{n_{k}^{d}}{n}$$
(17)

The above equations describe the current state of the system, where nk is the number of workpieces processed on equipment k at the current moment, \(T_{k}^{a}\) is the number of operations processed on equipment k, and \(n_{k}^{d}\) is the number of delayed workpieces processed on equipment k. Equation 13 reflects the distribution of processed workpieces across the equipment; Eq. 14 the distribution of operations on the equipment; Eq. 15 the proportion of processing time of the equipment; Eq. 16 the proportion of remaining time of work in process; and Eq. 17 the distribution of delayed workpieces.

The state values form the vector [s11, s12, …, s15, s21, s22, …, s25, s31, …, sm5]. A neural network maps this vector to a scalar state value within a fixed interval (such as [0, 100]), which serves as the criterion for distinguishing states of the scheduling environment.
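The five per-machine features of Eqs. 13-17 can be sketched as below; since the paper uses a neural network for the final aggregation, the scaled mean used here as `state_value` is only an assumed stand-in for illustration:

```python
def machine_state(n_k, n, ops_k, ops_total, busy_k, busy_total,
                  remaining_k, assigned_k, late_k):
    """Feature vector for one machine k (Eqs. 13-17)."""
    s1 = n_k / n                   # share of processed workpieces (Eq. 13)
    s2 = ops_k / ops_total         # share of operations (Eq. 14)
    s3 = busy_k / busy_total       # share of processing time (Eq. 15)
    s4 = remaining_k / assigned_k  # remaining-work ratio (Eq. 16)
    s5 = late_k / n                # share of delayed workpieces (Eq. 17)
    return [s1, s2, s3, s4, s5]

def state_value(features, lo=0.0, hi=100.0):
    # assumed stand-in for the paper's network: mean scaled to [lo, hi]
    return lo + (hi - lo) * sum(features) / len(features)
```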

4.2 Q-value Table

The action space of QL consists of the scheduling behaviors available in the current state. This paper selects seven typical scheduling rules as the action space. Combining these rules with the state values, the Q(s, a) table can be established; 11 states are used to construct it, as shown in Table 4.

Table 4. Q(s, a) table

State 0, in which no scheduling action has yet taken place, is the empty initial state, so its value is 0.
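A minimal Q-table skeleton under this design might look as follows: 11 states, 7 rule actions, and epsilon-greedy selection. The values of alpha, epsilon, and the discount factor follow the parameters given in Sect. 5 (with the paper's β used as the discount); everything else is an illustrative sketch:

```python
import random

N_STATES, N_ACTIONS = 11, 7
# Q(s, a) table initialized to zero; state 0 is the empty initial state
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def choose(state, eps=0.15):
    """Epsilon-greedy choice of a scheduling rule for the current state."""
    if random.random() < eps:                 # explore
        return random.randrange(N_ACTIONS)
    row = Q[state]                            # exploit best-known rule
    return row.index(max(row))

def update(s, a, r, s_next, alpha=0.05, gamma=0.9):
    """Standard one-step Q-learning update."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
```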

4.3 Design of Reward Function

The reward function R must consider the performance indicators of the scheduling system and reflect the impact of action selection on the scheduling results. Its design must therefore capture not only the immediate reward of an action but also the cumulative impact on the production cycle, and it should be suitable for scheduling problems of any size. Since time-related production efficiency indicators depend on equipment utilization, the working state of the equipment can serve as the basis of the reward and punishment function. The equipment state is defined as follows.

$$\delta_{i} (t) = \left\{ {\begin{array}{*{20}l} { - 1,\;\text{equipment } i \text{ is idle at time } t} \hfill \\ {0,\;\;\;\,\text{equipment } i \text{ is working at time } t} \hfill \\ \end{array} } \right.$$

The reward and punishment function is as follows.

$$r_{k} = \frac{1}{m}\sum\limits_{i = 1}^{m} \int_{{t_{k - 1} }}^{{t_{k} }} {\delta_{i} (\tau )} \,d\tau$$
(18)

In Eq. 18, m is the number of machines and rk is the reward received on the transition from sk-1 to sk. The absolute value of rk equals the average idle time per machine during the transition between the two states. It follows that the production cycle is negatively correlated with the cumulative return.

$$\begin{gathered} R_{{\text{a}}} = \sum\limits_{k = 1}^{K} {r_{k} } = \frac{1}{m}\sum\limits_{k = 1}^{K} {\sum\limits_{i = 1}^{m} {\int_{{t_{k - 1} }}^{{t_{k} }} {\delta_{i} (\tau )} \,d\tau } } = \frac{1}{m}\sum\limits_{i = 1}^{m} {\int_{0}^{{C_{\max } }} {\delta_{i} (\tau )} \,d\tau } \hfill \\ = - \frac{1}{m}\sum\limits_{k = 1}^{m} {\left(C_{\max } - \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{{L_{i} }} {TQ_{ijk} } } \right)} = \frac{1}{m}\sum\limits_{k = 1}^{m} {\sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{{L_{i} }} {TQ_{ijk} } } } - C_{\max } \hfill \\ \end{gathered}$$
(19)

In Eq. 19, Ra is the cumulative return; the smaller Cmax is, the greater Ra becomes.
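The per-transition reward of Eq. 18 can be sketched from each machine's busy intervals; the interval-list input format is an assumption for illustration:

```python
def reward(busy, t_prev, t_now, m):
    """Negative mean idle time over machines 0..m-1 between two decision
    points t_prev and t_now (Eq. 18).
    busy: {machine_id: [(start, end), ...]} busy intervals."""
    total_idle = 0.0
    for k in range(m):
        # time machine k actually worked inside [t_prev, t_now]
        worked = sum(min(e, t_now) - max(s, t_prev)
                     for s, e in busy.get(k, [])
                     if e > t_prev and s < t_now)
        total_idle += (t_now - t_prev) - worked
    return -total_idle / m
```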

5 Case Study

To verify the effectiveness of the method, a simulation analysis is carried out in MATLAB. The QL algorithm parameter settings are α = 0.05, β = 0.9, and ε = 0.15, and the initial state-action reward values are zero. In the experimental data, each workpiece has between 1 and 4 processes, there are 10 pieces of equipment in total, process processing times lie between 10 and 25, and the workpiece cache is 100. The processing times, arrival intervals, and required completion times all follow normal distributions. The dynamic disturbance parameter settings are shown in Table 5. The training data for QL are randomly generated by the system, with a total of 100,000 workpiece records (time-related data are in hours). In the test phase, seven sets of workpiece data (300, 600, 1000, 1500, 2000, 2500, and 3000 workpieces) are scheduled using both single scheduling rules and the QL algorithm of this paper.
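A hypothetical generator for data of this shape is sketched below (1-4 operations per workpiece, times in [10, 25], 10 machines); the mean and standard deviation of the arrival-interval and due-date distributions are assumptions, since the paper does not report them:

```python
import random

def make_jobs(n, m=10, seed=0):
    """Generate n workpieces in the style of the experiment in Sect. 5."""
    rng = random.Random(seed)
    jobs, t = [], 0.0
    for _ in range(n):
        t += max(0.0, rng.gauss(5.0, 1.5))        # assumed arrival interval
        ops = [{"machine": rng.randrange(m),
                "time": rng.uniform(10, 25)}       # process time in [10, 25]
               for _ in range(rng.randint(1, 4))]  # 1-4 processes per job
        # assumed due date: arrival + total work + normally distributed slack
        due = t + sum(o["time"] for o in ops) + max(0.0, rng.gauss(30, 10))
        jobs.append({"arrival": t, "due": due, "ops": ops})
    return jobs
```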

Table 5. Dynamic disturbance parameter setting
Table 6. Performance comparison of different rules and QL

Table 6 shows the makespan obtained by the different scheduling rules and by the QL algorithm of this paper. The advantage of QL is not obvious when the number of workpieces is small, but it becomes clear as the number of workpieces increases. Table 7 shows how frequently the system, through QL, selects the different rules under different delivery periods.

Table 7. Selection frequency of dispatching rules

There are clear differences in the selection frequency of the scheduling rules. MST, EDD, and LPT are used most frequently under a tight period; MST, LPT, EDD, and SPT under a medium period; and MST and SPT under a loose period. MST has the highest usage frequency under all three periods. The rules with higher frequencies contribute the most to the QL solution, while the remaining rules are used less often and contribute less.

Table 8 compares the time limits and completion of QL and MST under the three periods. QL achieves better results in actual time limit, tardiness, and earliness.

Table 8. Comparison of time limit and completion under three periods

Figure 1 shows the tardiness of QL and MST under different workpiece cache capacities for the tight and medium periods. The tardiness of both methods increases as the cache capacity grows, and tardiness under the tight period is worse than under the medium period. In addition, the QL curve rises more gently than the MST curve in both periods: in the medium period the MST curve crosses above the QL curve when the cache capacity reaches 70, while in the tight period it crosses when the capacity is only 30.

Fig. 1.
figure 1

Comparison of tardiness rate under different cache capacity

Figures 2, 3 and 4 show the change in overdue rate with the number of workpieces for QL and MST under different disturbance intensities and periods. Under all three periods, the overdue rate of both methods increases with disturbance intensity. When the number of workpieces grows from 200 to 1000, the overdue rate rises rapidly for every period and disturbance intensity, and the higher the intensity, the steeper the curve. After the number of workpieces exceeds 1000, the overdue rate drops slightly but remains high; over this range, the overdue rate of QL is generally lower than that of MST. After the number exceeds 2000, the overdue rates of both methods gradually decrease and stabilize, and the decrease is faster for QL under a tight period and high disturbance intensity. The comparison shows that QL performs better under tight periods and high disturbance intensities.

Fig. 2.
figure 2

Comparison of over time of different disturbance intensities under tight period

Fig. 3.
figure 3

Comparison of over time conditions of different disturbance intensities under medium period

Fig. 4.
figure 4

Comparison of over time conditions of different disturbance intensities under loose period

Figure 5 compares the maximum makespan of QL, GA, and PSO under different workpiece cache capacities in the medium period with 1000 workpieces. The GA population size is 20, the crossover and mutation parameters are 0.4 and 0.2, and the acceleration index is 1.5; the PSO swarm size is 40, the acceleration factor is 2, the inertia weight is 0.5, the maximum particle speed is 0.7, and the number of iterations is 100. Figure 5 shows that with 1000 workpieces, GA and PSO outperform QL in makespan optimization. However, Fig. 6 shows that the scheduling running time of GA and PSO is much higher than that of QL. At low cache capacities, GA takes about 12 times the running time of QL; its running time decreases slightly for cache capacities of 80-150 and rises again beyond 150. PSO performs worse, its running time increasing sharply once the cache capacity exceeds 40. The running time of QL stays around 2 s, depending only on the number of workpieces and not on the cache. Thus, although QL is slightly weaker than GA and PSO in makespan, it has a clear time-efficiency advantage for large numbers of workpieces and large workpiece caches.

Fig. 5.
figure 5

Comparison of makespan of three algorithms

Fig. 6.
figure 6

Comparison of running time of three algorithms

6 Conclusion

This paper studies the dynamic scheduling problem using reinforcement learning and scheduling rules. Typical production disturbances are classified and described through disturbance parameters, allowing the dynamic scheduling problem and its optimization objectives to be represented. On this basis, the state space, Q-value table, and reward function of the reinforcement learning scheduler are designed. The algorithm is validated through simulation analysis on example data. The case study shows that the proposed method is far superior to traditional intelligent algorithms in time efficiency and also achieves good dynamic scheduling performance, providing a new idea for large-scale job shop dynamic scheduling. For equipment selection, this paper adopts a processing-time-priority approach, which still lacks an overall analysis of processing time and load along the complete process route; further in-depth research can continue in this direction in the future.