Journal of Beijing University of Aeronautics and Astronautics, 2018, Vol. 44, Issue (12): 2651-2659. doi: 10.13700/j.bh.1001-5965.2018.0357

• Aircraft Design and Manufacturing •

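Software defined satellite attitude control algorithm based on deep reinforcement learning

XU Ke, WU Fengge, ZHAO Junsuo

  1. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2018-06-13 Revised: 2018-08-14 Online: 2018-12-20 Published: 2018-12-28
  • Corresponding author: WU Fengge, E-mail: fengge@iscas.ac.cn
  • Author biographies: XU Ke, male, Ph.D. candidate, research interest: intelligent information processing; WU Fengge, female, Ph.D., associate research professor, research interest: intelligent information processing.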




Abstract: Deep reinforcement learning (DRL) is an emerging machine-learning-based control approach that has shown outstanding performance in intelligent control fields such as robotics and unmanned aerial vehicles. Satellite attitude control, however, still relies widely on the traditional PID control algorithm. As satellites become smaller and more intelligent, and as software defined satellites emerge, traditional control algorithms find it increasingly difficult to meet the adaptability, autonomy and robustness requirements of attitude control systems. To address these problems, a DRL-based attitude control algorithm is proposed. It is a model-based algorithm and converges faster than model-free algorithms. Unlike traditional control strategies, it requires no prior knowledge of the satellite's physical or orbital parameters and offers stronger adaptability and autonomous control capability, which allows software defined satellites to adapt to different hardware environments and to be developed and deployed rapidly. Furthermore, introducing a target network and a parallelized heuristic search algorithm improves the network's accuracy and the computation speed, and simulation experiments verify these improvements.

Key words: reinforcement learning, deep learning, intelligent control, satellite attitude control, software defined satellite
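The abstract names three ingredients (a learned dynamics model, a target network, and a parallelized heuristic search) without describing how they are implemented. The Python sketch below shows one hypothetical way such pieces can fit together on a toy single-axis attitude problem; it is not the authors' algorithm. Assumptions: a linear model stands in for the deep network, the target copy is maintained by Polyak (soft) averaging, the heuristic search is a grid search over constant torques scored in a thread pool, and every name and constant (`DT`, `INERTIA`, `U_MAX`, `HORIZON`, `train_model`, `plan`, ...) is illustrative. The controller learns only from sampled transitions and never reads `INERTIA`, echoing the claim that no prior knowledge of physical parameters is needed.

```python
"""Hypothetical sketch, not the paper's algorithm: learned model + target copy + parallel search."""
import random
from concurrent.futures import ThreadPoolExecutor

DT = 0.1        # control period [s] (assumed)
INERTIA = 1.0   # moment of inertia [kg*m^2]; read only by the simulated plant
U_MAX = 1.0     # torque limit [N*m] (assumed)
HORIZON = 20    # planning horizon in control steps (assumed)


def true_step(theta, omega, u):
    """Simulated plant: a rigid body rotating about one axis."""
    omega_next = omega + u / INERTIA * DT
    return theta + omega_next * DT, omega_next


def predict(weights, theta, omega, u):
    """Linear stand-in for a dynamics network: next state = W @ [theta, omega, u]."""
    x = (theta, omega, u)
    return tuple(sum(w * xi for w, xi in zip(row, x)) for row in weights)


def train_model(steps=3000, lr=0.1, tau=0.02, seed=0):
    """Fit the model by SGD on random transitions, tracking a soft-updated target copy."""
    rng = random.Random(seed)
    online = [[0.0] * 3 for _ in range(2)]
    target = [[0.0] * 3 for _ in range(2)]
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # random (theta, omega, u)
        y = true_step(*x)
        y_hat = predict(online, *x)
        for i in range(2):
            err = y_hat[i] - y[i]
            for j in range(3):
                online[i][j] -= lr * err * x[j]  # gradient step on squared error
                # Soft update: target <- tau * online + (1 - tau) * target
                target[i][j] = tau * online[i][j] + (1 - tau) * target[i][j]
    return online, target


def rollout_cost(weights, theta, omega, u):
    """Predicted cost of holding torque u for HORIZON steps under the model."""
    cost = 0.0
    for _ in range(HORIZON):
        theta, omega = predict(weights, theta, omega, u)
        cost += theta ** 2 + 0.1 * omega ** 2
    return cost


def plan(pool, weights, theta, omega, n_candidates=41):
    """Grid search over constant torques; each candidate is scored independently."""
    candidates = [-U_MAX + 2 * U_MAX * k / (n_candidates - 1) for k in range(n_candidates)]
    # Threads show the independent-evaluation structure; CPython's GIL means a real
    # speedup would need processes or vectorized/GPU evaluation.
    costs = list(pool.map(lambda u: rollout_cost(weights, theta, omega, u), candidates))
    return candidates[costs.index(min(costs))]


def control_loop(weights, theta=0.5, omega=0.0, steps=100):
    """Receding horizon: apply the best torque for one step, then replan."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(steps):
            u = plan(pool, weights, theta, omega)
            theta, omega = true_step(theta, omega, u)
    return theta, omega


online, target = train_model()
theta_final, omega_final = control_loop(target)  # plan with the target copy
print(f"final error {theta_final:+.4f} rad, rate {omega_final:+.4f} rad/s")
```

Planning on the slowly moving target copy rather than the online weights is a design choice made here for illustration: it keeps the planner's model from shifting abruptly between SGD updates. The abstract does not say how the paper's target network is actually used.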


