
Optimization of Office Process Task Allocation Based on Deep Reinforcement Learning

LIAO Chenyang, YU Jinsong, LE Xiangli

Citation: LIAO C Y, YU J S, LE X L. Optimization of office process task allocation based on deep reinforcement learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 487-498 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0290

doi: 10.13700/j.bh.1001-5965.2022.0290

  • Funds: National Key R&D Program of China (2018YFB1004100)
  • Corresponding author: yujs@buaa.edu.cn
  • CLC number: TP301.6
  • Abstract:

    On office platforms, large numbers of heterogeneous process tasks run in parallel, which demands not only strong capability from task executors but also high performance from the collaborative scheduling system. A reinforcement learning (RL) algorithm is adopted, combined with quantitative measures such as cooperation degree and slack, and a multi-agent game model based on Markov game theory is proposed, yielding a scheduling system that optimizes the overall process cooperation degree and the makespan and thereby improves overall execution efficiency. With real business-system processes as the experimental scenario and under the same optimization objectives, the proposed method is compared against three deep reinforcement learning (DRL) algorithms, including D3QN, and an ant-colony-based metaheuristic, verifying its effectiveness.
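    This page gives no implementation details beyond the abstract and the caption of Figure 5 ("Combination of DDQN and rule base"), so the following is a minimal sketch, assuming PyTorch, of that general pattern: a Double DQN agent whose discrete actions index the six dispatching rules of Table 1. The class names, state dimension, network sizes, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): a Double DQN agent that picks one of
# the six dispatching rules of Table 1 at each scheduling decision point.
import random
from collections import deque

import torch
import torch.nn as nn

N_RULES = 6      # rules 1-6 from Table 1
STATE_DIM = 8    # assumed feature vector, e.g. queue lengths, mean slack

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_RULES),
        )

    def forward(self, s):
        return self.net(s)

class DDQNRuleAgent:
    """Learns which dispatching rule to apply, using Double DQN updates."""

    def __init__(self, gamma=0.95, lr=1e-3, eps=0.1):
        self.online, self.target = QNet(), QNet()
        self.target.load_state_dict(self.online.state_dict())
        self.opt = torch.optim.Adam(self.online.parameters(), lr=lr)
        self.buffer = deque(maxlen=10_000)
        self.gamma, self.eps = gamma, eps

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def act(self, state):
        if random.random() < self.eps:          # epsilon-greedy exploration
            return random.randrange(N_RULES)
        with torch.no_grad():
            q = self.online(torch.as_tensor(state, dtype=torch.float32))
        return int(q.argmax())

    def learn(self, batch_size=32):
        if len(self.buffer) < batch_size:
            return
        s, a, r, s2, d = zip(*random.sample(self.buffer, batch_size))
        s = torch.tensor(s, dtype=torch.float32)
        s2 = torch.tensor(s2, dtype=torch.float32)
        a = torch.tensor(a).unsqueeze(1)
        r = torch.tensor(r, dtype=torch.float32)
        d = torch.tensor(d, dtype=torch.float32)
        q = self.online(s).gather(1, a).squeeze(1)
        # Double DQN: the online net chooses the next rule, the target net
        # evaluates it, reducing the overestimation bias of vanilla DQN.
        a2 = self.online(s2).argmax(1, keepdim=True)
        q2 = self.target(s2).gather(1, a2).squeeze(1).detach()
        y = r + self.gamma * q2 * (1.0 - d)
        loss = nn.functional.smooth_l1_loss(q, y)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    def sync_target(self):
        # Periodically copy online weights into the target network.
        self.target.load_state_dict(self.online.state_dict())
```

    In such a setup, the environment would build a state vector from the current process pool, the agent would return a rule index, the chosen rule's (A, A′, B) functions would perform the actual task assignment, and the reward would be shaped from the two stated objectives, overall cooperation degree and makespan.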

     

  • Figure 1. Workflow instance
    Figure 2. Call-back situation
    Figure 3. Mathematical expression of the instance
    Figure 4. Overall process
    Figure 5. Combination of DDQN and the rule base
    Figure 6. Loss function calculation
    Figure 7. Cooperation degree and overall cooperation degree
    Figure 8. Multi-agent game
    Figure 9. Update method for the internal parameters of θTQN
    Figure 10. Original topology structure
    Figure 11. Convergence verification
    Figure 12. Comparison with single rules
    Figure 13. Time complexity comparison
    Figure 14. Box plot comparison of cooperation degree
    Figure 15. Makespan comparison

    Table 1. Correspondence between rules and scheduling functions

    Rule No. | A function (selects $O_{i,j}$) | A′ function (selects $O_{i,j}$) | B function (selects $P_{i,j,k}$)
    1 | A1( ) | A2( ) | B1( )
    2 | A1( ) | A1( ) | B2( )
    3 | randomly select process $J_i$ from $J_{\mathrm{uc}}(t)$; the $j$ of $Q_{i,j}$ is computed by B1( ) | randomly select process $J_i$ from $J_{\mathrm{uc}}(t)$; the $j$ of $Q_{i,j}$ is computed by B1( ) | B1( )
    4 | A3( ) | A2( ) | B1( )
    5 | A2( ) | A2( ) | B1( )
    6 | $J_i, P_k \leftarrow \arg\max_{i \in J_{\mathrm{uc}}(t)} (\Delta S_i)$ | $J_i, P_k \leftarrow \arg\max_{i \in J_{\mathrm{uc}}(t)} (\Delta S_i)$ | B2( )
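    To make the table concrete, here is a hedged sketch of the rule base as a dispatch table. Only the rule-to-triple wiring is transcribed from Table 1; the Job/Executor fields and every selector body (A1, A2, A3, B1, B2, the random pick, and the argmax rule) are hypothetical placeholders, since the paper's definitions are not reproduced on this page.

```python
# Hedged sketch: the rule No. -> (A, A', B) wiring follows Table 1; all
# selector bodies and data fields below are hypothetical placeholders.
import random
from dataclasses import dataclass

@dataclass
class Job:
    jid: int
    slack: float       # slack of the process; assumed feature
    ops_left: int      # remaining operations; assumed feature

@dataclass
class Executor:
    pid: int
    load: float        # current workload; assumed feature
    coop: float        # cooperation degree; assumed feature

# Hypothetical A-type selectors: pick the next process/operation.
def A1(jobs): return min(jobs, key=lambda j: j.slack)      # least slack first
def A2(jobs): return min(jobs, key=lambda j: j.ops_left)   # fewest ops left
def A3(jobs): return max(jobs, key=lambda j: j.ops_left)   # most ops left
def A_rand(jobs): return random.choice(jobs)               # rule 3: random J_i
def A_dmax(jobs): return max(jobs, key=lambda j: j.slack)  # rule 6: argmax dS_i

# Hypothetical B-type selectors: pick the executor P_{i,j,k}.
def B1(executors, job): return min(executors, key=lambda p: p.load)
def B2(executors, job): return max(executors, key=lambda p: p.coop)

# Rule No. -> (A, A', B), transcribed from Table 1.
RULES = {
    1: (A1, A2, B1),
    2: (A1, A1, B2),
    3: (A_rand, A_rand, B1),
    4: (A3, A2, B1),
    5: (A2, A2, B1),
    6: (A_dmax, A_dmax, B2),
}

def dispatch(rule_no, jobs, executors, use_alt=False):
    """Apply one rule: A (or A' on an alternative branch) picks the job,
    then B picks its executor."""
    A, A_alt, B = RULES[rule_no]
    job = (A_alt if use_alt else A)(jobs)
    return job, B(executors, job)
```

    Under these placeholder definitions, `dispatch(1, jobs, executors)` would serve the least-slack job with the least-loaded executor; a DRL agent such as the one sketched after the abstract would supply `rule_no` at each decision point.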

    Table 2. Process simulation data classes

    Category No. | $E_{\mathrm{deadline}}$ | $n_1/n_2$
    (Rows for categories 1-6; the cell values did not survive page extraction.)

    Table 3. Final completion time under each single rule

    Rule No. | Final completion time/h
    1 | 622
    2 | 618
    3 | 894
    4 | 843
    5 | 663
    6 | 1113
Publication history
  • Received: 2022-04-28
  • Accepted: 2022-05-06
  • Available online: 2022-05-27
  • Issue publication date: 2024-02-27
