
Software Defined Satellite Attitude Control Algorithm Based on Deep Reinforcement Learning

XU Ke, WU Fengge, ZHAO Junsuo

Citation: XU Ke, WU Fengge, ZHAO Junsuo, et al. Software defined satellite attitude control algorithm based on deep reinforcement learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2018, 44(12): 2651-2659. doi: 10.13700/j.bh.1001-5965.2018.0357 (in Chinese)


doi: 10.13700/j.bh.1001-5965.2018.0357
Details
    About the authors:

    XU Ke, male, Ph.D. candidate. Research interests: intelligent information processing.

    WU Fengge, female, Ph.D., associate research fellow. Research interests: intelligent information processing.

    Corresponding author:

    WU Fengge, E-mail: fengge@iscas.ac.cn

  • CLC number: V448.22+3; TP273+.2

Software defined satellite attitude control algorithm based on deep reinforcement learning

More Information
  • Abstract:

    Deep reinforcement learning (DRL) is an emerging machine-learning-based control method that has shown excellent performance in intelligent control fields such as robotics and unmanned aerial vehicles, whereas satellite attitude control still relies largely on traditional PID algorithms. With the miniaturization and growing intelligence of satellites, and the emergence of software defined satellites, traditional control algorithms find it increasingly difficult to meet the attitude control system's demands for adaptability, autonomy, and robustness. This paper therefore investigates a DRL-based attitude control algorithm. The algorithm is model-based, and thus converges faster than model-free alternatives. Compared with traditional control strategies, it requires no prior knowledge of the satellite's physical or orbital parameters, giving it strong adaptability and autonomous control capability; this suits the software defined satellite's need to adapt to different hardware environments and to be developed and deployed rapidly. In addition, after introducing a target network and a parallelized heuristic search algorithm, the algorithm was optimized in terms of network accuracy and computation speed, which was verified through simulation experiments.
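The control loop the abstract describes (learn a dynamics model from observed transitions, then plan actions through that model) can be sketched on a toy single-axis plant. Everything below is an illustrative stand-in: the plant constants, the one-parameter least-squares "model", and the random-shooting planner substitute for the paper's neural-network model and parallelized heuristic search.

```python
import random

# Toy single-axis attitude plant: state = (angle, angular rate).
# The controller never reads these dynamics; it only sees transitions.
DT, INERTIA = 0.1, 2.0

def plant_step(theta, omega, torque):
    omega += (torque / INERTIA) * DT
    theta += omega * DT
    return theta, omega

class ModelBasedController:
    """Model-based loop: (1) fit a one-step dynamics model from transitions,
    (2) plan by sampling candidate torques through the model."""
    def __init__(self):
        self.transitions = []          # (state, action, next_state)
        self.accel_per_torque = 0.0    # learned torque -> delta-omega gain

    def observe(self, state, torque, next_state):
        self.transitions.append((state, torque, next_state))
        # Least-squares fit of d_omega = k * torque over all transitions.
        num = sum((s2[1] - s1[1]) * u for s1, u, s2 in self.transitions)
        den = sum(u * u for _, u, _ in self.transitions) or 1.0
        self.accel_per_torque = num / den

    def predict(self, state, torque):
        theta, omega = state
        omega = omega + self.accel_per_torque * torque
        return theta + omega * DT, omega

    def act(self, state, target=0.0, n_samples=64):
        # Random shooting: pick the torque whose predicted next state is
        # closest to the target attitude with zero angular rate.
        def cost(u):
            th, om = self.predict(state, u)
            return (th - target) ** 2 + 0.1 * om ** 2
        return min((random.uniform(-1, 1) for _ in range(n_samples)), key=cost)

random.seed(0)
ctrl = ModelBasedController()
theta, omega = 0.5, 0.0                # start 0.5 rad away from target
for _ in range(200):
    u = ctrl.act((theta, omega))
    nxt = plant_step(theta, omega, u)
    ctrl.observe((theta, omega), u, nxt)
    theta, omega = nxt
print(f"final attitude error: {theta:.4f} rad")
```

Because the fitted gain is exact for this linear plant after a single transition, the planner regulates the attitude without ever being told the inertia, mirroring the abstract's claim that no prior physical parameters are needed.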

     

  • Figure 1.  Procedure chart of satellite attitude control system

    Figure 2.  Schematic diagram of PID control algorithm

    Figure 3.  Schematic diagram of model-based deep reinforcement learning algorithm

    Figure 4.  Satellite attitude control simulation system structure

    Figure 5.  Comparison of control algorithms

    Figure 6.  Mean square error with different networks
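For comparison, the PID baseline shown in Figure 2 reduces to a short discrete-time update. This is the generic textbook form with illustrative gains and the same toy single-axis plant, not the paper's tuned controller:

```python
class PID:
    """Discrete PID: u = Kp*e + Ki*integral(e) + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        self.integral += error * self.dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# Regulate a toy single-axis attitude toward zero (illustrative gains).
pid = PID(kp=4.0, ki=0.1, kd=6.0, dt=0.1)
theta, omega, inertia = 0.5, 0.0, 2.0
for _ in range(300):
    torque = pid.update(0.0 - theta)
    omega += (torque / inertia) * 0.1
    theta += omega * 0.1
print(f"final attitude error: {theta:.4f} rad")
```

Unlike the DRL controller, the gains here encode knowledge of the plant: retuning is needed whenever the inertia or actuators change, which is the adaptability gap the abstract argues against.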

    Table 1.  Disturbance torque under three classic orbit states (unit: N·m)

    Disturbance torque        LEO (200 km)   MEO (1 000 km)   GEO (35 800 km)
    Earth gravity gradient    10^-3          10^-3            10^-7
    Air drag                  0.4×10^-5      10^-9            0
    Solar radiation pressure  0.2×10^-8      10^-9            0.4×10^-9
    Solar gravity gradient    0.3×10^-7      0.5×10^-7        0.2×10^-6
    Lunar gravity gradient    0.6×10^-7      0.1×10^-6        0.5×10^-6
    Solar tide                0.4×10^-7      0.4×10^-7        10^-11
    Lunar tide                0.5×10^-7      0.3×10^-7        0.8×10^-11
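The dominant LEO entry in Table 1 can be sanity-checked with the standard gravity-gradient torque formula. The inertia difference of 1 000 kg·m² below is an assumed value for illustration, not a figure from the paper; only the order of magnitude is being checked.

```python
import math

MU_EARTH = 3.986e14   # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6     # mean Earth radius, m

def gravity_gradient_torque(altitude_m, inertia_diff, angle_rad=math.pi / 4):
    """Gravity-gradient torque T = 3*mu/(2*r^3) * dI * sin(2*theta),
    maximized at a 45-degree attitude offset."""
    r = R_EARTH + altitude_m
    return 1.5 * MU_EARTH / r**3 * inertia_diff * math.sin(2 * angle_rad)

# 200 km orbit, assumed inertia difference of 1 000 kg*m^2
t_leo = gravity_gradient_torque(200e3, inertia_diff=1000.0)
print(f"{t_leo:.1e} N*m")  # on the order of 10^-3 N*m, matching Table 1's LEO entry
```

The 1/r^3 dependence also explains why the gravity-gradient entry drops by several orders of magnitude between LEO and GEO.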

    Table 2.  Comparison of convergence accuracy and speed among different algorithms

    Network setting           MSE (min)   MSE (max)   MSE (mean)   Convergence time/s
    3-layer fully connected   0.931       1.515       1.047        301.05
    5-layer fully connected   0.745       1.510       0.913        440.15
    Target network            0.014       1.908       0.448        28.10
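The "target network" row in Table 2 refers to the stabilization trick popularized by DQN/DDPG: bootstrap targets are computed from a slowly updated copy of the online network, so the regression target moves smoothly during training. A minimal sketch of the soft-update rule (the mixing factor and parameter lists are illustrative, not the paper's exact settings):

```python
TAU = 0.01  # small mixing factor; TAU = 1 would be a hard copy

def soft_update(online_params, target_params, tau=TAU):
    """Move each target parameter a small step toward its online twin."""
    return [tau * o + (1.0 - tau) * t
            for o, t in zip(online_params, target_params)]

online = [0.8, -1.2, 3.0]   # e.g. flattened network weights
target = [0.0, 0.0, 0.0]    # target copy starts from a separate init
for _ in range(500):
    target = soft_update(online, target)
print(target)  # after many updates, target trails close behind online
```

Because the target parameters are an exponential moving average of the online ones, the value used on the right-hand side of the Bellman update changes slowly, which is consistent with the much faster and more accurate convergence reported in Table 2.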
Publication history
  • Received: 2018-06-13
  • Accepted: 2018-08-14
  • Available online: 2018-12-20
