Journal of Beijing University of Aeronautics and Astronautics ›› 2019, Vol. 45 ›› Issue (9): 1894-1901. doi: 10.13700/j.bh.1001-5965.2018.0789

• Article

Time-triggered communication scheduling method based on reinforcement learning

LI Haoruo, HE Feng, ZHENG Zhong, LI Ershuai, XIONG Huagang

  1. School of Electronic and Information Engineering, Beihang University, Beijing 100083, China
  • Received: 2019-01-02 Online: 2019-09-20 Published: 2019-09-29
  • Corresponding author: HE Feng E-mail: robinleo@buaa.edu.cn
  • About the authors: LI Haoruo, male, master's student. Main research interest: real-time communication systems; HE Feng, male, Ph.D., associate professor. Main research interests: avionics networks, distributed real-time systems.
  • Supported by:
    National Natural Science Foundation of China (61301086,71701020); Open Fund of Tianjin Civil Aircraft Airworthiness and Maintenance Key Laboratory of Civil Aviation University of China (2017SW02)

Abstract: Future avionics systems will increasingly adopt time-triggered communication mechanisms for information transmission in order to guarantee deterministic information exchange. Designing the time-triggered communication schedule properly is the key to applying time-triggered communication in avionics interconnection systems. For the periodic tasks of time-triggered scheduling, a method for generating periodic schedule tables based on reinforcement learning is proposed. First, the traffic scheduling task is transformed into a tree search problem so that it has the Markov property required by reinforcement learning. Then, a neural-network-based reinforcement learning algorithm explores the schedule table and continually shortens the delay to optimize it; once training is complete, the model can be applied directly to tasks with a similar message distribution. Compared with methods that solve the time-triggered schedule formally using satisfiability modulo theories (SMT) solvers such as Yices, the proposed method does not run into cases that cannot be decided, and it can guarantee the correctness and optimality of the scheduling result. For a large network with 1 000 messages, the proposed method computes the schedule more than tens of times faster than the SMT method, and the end-to-end delay of the scheduled messages is less than 1% of that obtained with the SMT method, which greatly improves the timeliness of message transmission.

Key words: time-triggered, scheduling method, reinforcement learning, tree search, offset time
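
The tree-search formulation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the single shared link, the one-slot message length, the use of the offset as a stand-in for delay, and all function names (`slots`, `feasible_offsets`, `learn_schedule`) are assumptions made for illustration, and a tabular Q-learning update with optimistic initial values is swapped in for the neural-network-based learner. A state is the tuple of offsets already assigned, an action is a conflict-free offset for the next message, and the reward is the negative offset, so the feasible actions depend only on the current state.

```python
from math import lcm  # multi-argument lcm needs Python 3.9+

DEAD_END = -1000.0  # assumed penalty for a partial schedule that cannot be completed


def slots(period, offset, hyper):
    """Slots used in one hyperperiod by a 1-slot message with this period and offset."""
    return {offset + k * period for k in range(hyper // period)}


def feasible_offsets(state, periods, hyper):
    """Actions available in `state`: conflict-free offsets for the next message."""
    busy = set()
    for period, offset in zip(periods, state):
        busy |= slots(period, offset, hyper)
    nxt = periods[len(state)]
    return [o for o in range(nxt) if not (slots(nxt, o, hyper) & busy)]


def learn_schedule(periods, episodes=50):
    """Tabular Q-learning over the offset tree (illustrative stand-in for a neural network)."""
    hyper = lcm(*periods)
    q = {}  # (state, action) -> return estimate; unseen pairs count as 0, which is optimistic

    def value(state):
        if len(state) == len(periods):
            return 0.0  # complete schedule, nothing left to pay
        acts = feasible_offsets(state, periods, hyper)
        if not acts:
            return DEAD_END
        return max(q.get((state, a), 0.0) for a in acts)

    def rollout(train):
        state = ()
        while len(state) < len(periods):
            acts = feasible_offsets(state, periods, hyper)
            if not acts:
                return None  # dead end: no conflict-free offset remains
            # Greedy choice; optimistic initial values make untried offsets attractive.
            action = max(acts, key=lambda a: q.get((state, a), 0.0))
            nxt = state + (action,)
            if train:
                # Deterministic environment, so a learning rate of 1 is used.
                q[(state, action)] = -action + value(nxt)
            state = nxt
        return state

    for _ in range(episodes):
        rollout(train=True)
    return rollout(train=False)


if __name__ == "__main__":
    periods = [2, 4, 4]  # periods in slots; hyperperiod is 4
    print(learn_schedule(periods))
```

In this toy case a first-fit assignment would give the offsets (0, 1, 3) with a total offset of 4, whereas the learned schedule delays the first message by one slot so that the other two can be sent earlier, for a total of 3. The example only shows why the partial schedule is a sufficient state; it makes no claim about the scale or performance reported in the paper.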


