A deep reinforcement learning based on discrete state transition algorithm for solving fuzzy flexible job shop scheduling problem
-
摘要:
柔性作业车间调度问题(FJSP)作为一种在实际生活中应用广泛的调度问题,对其智能算法具有重要价值。为了解决FJSP,以最小化最大完工时间为优化目标,提出了一种基于近端策略优化的离散状态转移算法(DSTA-PPO)。DSTA-PPO具有3个特点:考虑到FJSP需要同时对工序排序、机器分配同时进行调度安排,结合工序编码和机器编码,设计了一种能够充分表达当前调度问题的状态特征;针对工序排序、机器分配设计了多种基于关键路径的搜索操作;通过强化学习的训练,能够有效地引导智能体选择正确的搜索操作优化当前的调度序列。通过基于不同数据集的仿真实验,验证了算法各环节的有效性,同时在相同算例上以最小化最大完工时间为对比指标与现有算法进行了比较,对比结果表明了所提算法能够在多数算例上以更短的完工时间对算例完成求解,有效地求解了柔性作业车间调度问题。
Abstract:The study of the intelligent algorithms for the flexible job shop scheduling issue (FJSP), a scheduling problem with a broad range of application backgrounds, is very relevant both academically and practically. To address FJSP with the objective of minimizing the maximum completion time, this paper proposes a discrete state transfer algorithm based on proximal policy optimization (DSTA-PPO). DSTA-PPO has the following three characteristics. Considering that FJSP requires simultaneous scheduling arrangements for operation sequencing and machine assignment. This state feature can adequately express the current scheduling problem that was designed by combining operation coding and machine coding. Various critical path based search operations have been designed for operation sequencing and machine allocation. Reinforcement learning training is an efficient way to direct intelligence to choose the best search operation to maximize the current scheduling sequence.. The effectiveness of each component of the algorithm is verified through simulation experiments on different datasets. Furthermore, a comparison is conducted with existing algorithms using the objective of minimizing the maximum completion time in the same instances. The comparison results show that the suggested method successfully resolves the flexible job shop scheduling issue by typically achieving shorter completion times.
-
表 1 3×3的FJSP实例
Table 1. 3×3 FJSP instance
工件 工序 工序号 工序和加工时间 M1 M2 M3 J1 O11 1 1 2 * O12 2 3 2 1 J2 O21 3 * 2 1 O22 4 1 2 4 O23 5 * * 4 J3 O31 6 3 2 2 O32 7 1 * 2 注:“*”表示该工序无法在该机器上进行加工。 表 2 正交试验表及各参数组合的响应值
Table 2. Orthogonal test table and response values of various parameter combinations
编号 参数因子组合 RV SE α batch mini-batch 1 1 1 1 1 3.054784 2 1 2 2 2 3.181015 3 1 3 3 3 3.08003 4 1 4 4 4 2.928553 5 2 1 2 3 2.802323 6 2 2 1 4 2.575107 7 2 3 4 1 2.600353 8 2 4 3 2 2.827569 9 3 1 3 4 2.2974 10 3 2 4 3 2.398384 11 3 3 1 2 2.019692 12 3 4 2 1 2.09543 13 4 1 4 2 2.322646 14 4 2 3 1 2.171169 15 4 3 2 4 2.145923 16 4 4 1 3 2.196415 表 3 参数各水平的极差分析
Table 3. Extreme variance analysis of parameters at different levels
参数因子 K值 Kavg值 最佳水平 R 水平数量 每水平重复数 1 2 3 4 1 2 3 4 SE 12.24 10.81 8.81 8.84 3.06 2.7 2.2 2.21 3 −0.86 4 4 α 10.48 10.33 9.85 10.05 2.62 2.58 2.46 2.51 3 −0.16 4 4 batch 9.85 10.22 10.38 10.25 2.46 2.56 2.59 2.56 1 −0.13 4 4 mini-batch 9.92 10.35 10.48 9.95 2.48 2.59 2.62 2.49 1 −0.14 4 4 表 4 第一类测试集对比
Table 4. Comparison of the first type of dataset
数据集 最优解 DSTA SLGA 集成强化学习 DSTA-PPO Mk01 36 40* 40* 42 40* Mk02 24 28 27 31 26* Mk03 204 204 204* 204* 204* Mk04 48 66 60* 69 60* Mk05 168 178 172* 180 172* Mk06 33 73 69 68 62 Mk07 133 147 144 153 142 Mk08 523 523* 523* 523* 523* Mk09 299 327 320 315 307* Mk10 165 263 254 230 225 表 5 第二类测试集对比
Table 5. Comparison of the second type of dataset
Dataset Val gap/% DSTA Multi-PPO DSTA-PPO UB DSTA Multi-PPO DSTA-PPO 10×5 Vdata_la1 613 610 573 570 7.54 7.02 0.53 Vdata_la2 557 555 536 529 5.29 4.91 1.32 Vdata_la3 531 532 487 477 11.32 11.53 2.10 Vdata_la4 538 530 510 502 7.17 5.58 1.59 Vdata_la5 490 507 467 457 7.22 10.94 2.19 15×5 Vdata_la6 824 820 802 799 3.13 2.63 0.38 Vdata_la7 788 757 755 749 5.21 1.07 0.80 Vdata_la8 791 782 765 765 3.40 2.22 0.00 Vdata_la9 925 879 859 853 8.44 3.05 0.70 Vdata_la10 830 862 808 804 3.23 7.21 0.50 20×5 Vdata_la11 1102 1101 1074 1071 2.89 2.80 0.28 Vdata_la12 960 950 939 936 2.56 1.50 0.32 Vdata_la13 1070 1053 1042 1038 3.08 1.45 0.39 Vdata_la14 1095 1086 1071 1070 2.34 1.56 0.09 Vdata_la15 1117 1111 1092 1089 2.57 2.02 0.28 10×10 Vdata_la16 717 717 717 717 0.00 0.00 0.00 Vdata_la17 646 647 646 646 0.00 0.15 0.00 Vdata_la18 714 663 663 663 7.69 0.00 0.00 Vdata_la19 705 626 644 617 14.26 1.46 4.38 Vdata_la20 756 756 756 756 0.00 0.00 0.00 15×10 Vdata_la21 903 887 870 804 12.31 10.32 8.21 Vdata_la22 849 793 798 736 15.35 7.74 8.42 Vdata_la23 970 858 888 815 19.02 5.28 8.96 Vdata_la24 891 883 834 775 14.97 13.94 7.61 Vdata_la25 896 883 820 756 18.52 16.80 8.47 20×10 Vdata_la26 1195 1089 1100 1054 13.38 3.32 4.36 Vdata_la27 1231 1123 1133 1084 13.56 3.60 4.52 Vdata_la28 1249 1106 1121 1070 16.73 3.36 4.77 Vdata_la29 1121 1049 1048 994 12.78 5.53 5.43 Vdata_la30 1256 1117 1122 1069 17.49 4.49 4.96 30×10 Vdata_la31 1688 1561 1553 1520 11.05 2.70 2.17 Vdata_la32 1837 1693 1697 1658 10.80 2.11 2.35 Vdata_la33 1630 1531 1530 1497 8.88 2.27 2.20 Vdata_la34 1731 1562 1565 1537 12.62 1.63 1.82 Vdata_la35 1711 1574 1582 1549 10.46 1.61 2.13 15×15 Vdata_la36 1100 985 967 948 16.03 3.90 2.00 Vdata_la37 1171 1028 1035 986 18.76 4.26 4.97 Vdata_la38 1061 948 943 943 12.51 0.53 0.00 Vdata_la39 1109 976 959 922 20.28 6.18 4.01 Vdata_la40 1041 968 955 955 9.01 1.36 0.00 注:Ave.Gap: DSTA、Multi-ppo、DSTA-PPO分别为9.55、4.20、2.58。 -
[1] THAMES L, SCHAEFER D. Software-defined cloud manufacturing for industry 4.0[J]. Procedia CIRP, 2016, 52: 12-17. doi: 10.1016/j.procir.2016.07.041 [2] GOLPÎRA H, KHAN S A R, SAFAEIPOUR S. A review of logistics Internet-of-things: current trends and scope for future research[J]. Journal of Industrial Information Integration, 2021, 22: 100194. doi: 10.1016/j.jii.2020.100194 [3] LIU Y K, WANG L H, WANG X V, et al. Scheduling in cloud manufacturing: state-of-the-art and research challenges[J]. International Journal of Production Research, 2019, 57(15-16): 4854-4879. doi: 10.1080/00207543.2018.1449978 [4] FANG Y L, PENG C, LOU P, et al. Digital-twin-based job shop scheduling toward smart manufacturing[J]. IEEE Transactions on Industrial Informatics, 2019, 15(12): 6425-6435. doi: 10.1109/TII.2019.2938572 [5] ZHANG Y F, WANG J, LIU S C, et al. Game theory based real‐time shop floor scheduling strategy and method for cloud manufacturing[J]. International Journal of Intelligent Systems, 2017, 32(4): 437-463. doi: 10.1002/int.21868 [6] GAO K Z, CAO Z G, ZHANG L, et al. A review on swarm intelligence and evolutionary algorithms for solving flexible job shop scheduling problems[J]. IEEE/CAA Journal of Automatica Sinica, 2019, 6(4): 904-916. doi: 10.1109/JAS.2019.1911540 [7] CHOI I C, CHOI D S. A local search algorithm for jobshop scheduling problems with alternative operations and sequence-dependent setups[J]. Computers & Industrial Engineering, 2002, 42(1): 43-58. [8] ZHANG G H, SHAO X Y, LI P G, et al. An effective hybrid particle swarm optimization algorithm for multi-objective flexible job-shop scheduling problem[J]. Computers & Industrial Engineering, 2009, 56(4): 1309-1318. [9] NOUIRI M, BEKRAR A, JEMAI A, et al. Two stage particle swarm optimization to solve the flexible job shop predictive scheduling problem considering possible machine breakdowns[J]. Computers & Industrial Engineering, 2017, 112: 595-606. [10] NOUIRI M, BEKRAR A, JEMAI A, et al. An effective and distributed particle swarm optimization algorithm for flexible job-shop scheduling problem[J]. Journal of Intelligent Manufacturing, 2018, 29(3): 603-615. doi: 10.1007/s10845-015-1039-3 [11] XING L N, CHEN Y W, WANG P, et al. A knowledge-based ant colony optimization for flexible job shop scheduling problems[J]. Applied Soft Computing, 2010, 10(3): 888-896. doi: 10.1016/j.asoc.2009.10.006 [12] JIANG T H, ZHANG C. Application of grey wolf optimization for solving combinatorial problems: job shop and flexible job shop scheduling cases[J]. IEEE Access, 2018, 6: 26231-26240. doi: 10.1109/ACCESS.2018.2833552 [13] ZHOU X J, YANG C H, GUI W H. State transition algorithm[J]. Journal of Industrial & Management Optimization, 2012, 8(4): 1039-1056. [14] 董天雪, 阳春华, 周晓君, 等. 一种求解企业员工指派问题的离散状态转移算法[J]. 控制理论与应用, 2016, 33(10): 1378-1388. doi: 10.7641/CTA.2016.50982DONG T X, YANG C H, ZHOU X J, et al. A novel discrete state transition algorithm for staff assignment problem[J]. Control Theory & Applications, 2016, 33(10): 1378-1388(in Chinese). doi: 10.7641/CTA.2016.50982 [15] 张凤雪, 阳春华, 周晓君, 等. 基于控制周期计算的锌液净化除铜过程优化控制[J]. 控制理论与应用, 2017, 34(10): 1388-1395. doi: 10.7641/CTA.2017.60621ZHANG F X, YANG C H, ZHOU X J, et al. Optimal control based on control period calculation for copper removal process of zinc solution purification[J]. Control Theory & Applications, 2017, 34(10): 1388-1395(in Chinese). doi: 10.7641/CTA.2017.60621 [16] DU Y, LI J Q, CHEN X L, et al. Knowledge-based reinforcement learning and estimation of distribution algorithm for flexible job shop scheduling problem[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2022, 7(4): 1036-1050. [17] CHEN R H, YANG B, LI S, et al. A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem[J]. Computers & Industrial Engineering, 2020, 149: 106778. [18] LEI K, GUO P, ZHAO W C, et al. A multi-action deep reinforcement learning framework for flexible Job-shop scheduling problem[J]. Expert Systems with Applications, 2022, 205: 117796. doi: 10.1016/j.eswa.2022.117796 [19] 张凯, 毕利, 焦小刚. 集成强化学习算法的柔性作业车间调度问题研究[J]. 中国机械工程, 2023, 34(2): 201-207. doi: 10.3969/j.issn.1004-132X.2023.02.010ZHANG K, BI L, JIAO X G. Research on flexible job-shop scheduling problems with integrated reinforcement learning algorithm[J]. China Mechanical Engineering, 2023, 34(2): 201-207(in Chinese). doi: 10.3969/j.issn.1004-132X.2023.02.010 [20] 王凌, 王晶晶. 考虑运输时间的分布式绿色柔性作业车间调度协同群智能优化[J]. 中国科学: 技术科学: 1-15. -