Software-defined satellite attitude control algorithm based on deep reinforcement learning
-
Abstract: Deep reinforcement learning (DRL) is an emerging machine-learning-based control approach that has shown outstanding performance in intelligent control fields such as robotics and unmanned aerial vehicles, whereas satellite attitude control still relies widely on the traditional PID algorithm. As satellites become smaller and more intelligent, and with the emergence of software-defined satellites, traditional control algorithms are increasingly unable to meet the attitude control system's requirements for adaptability, autonomy, and robustness. To address these problems, a DRL-based attitude control algorithm is studied. It is a model-based algorithm and therefore converges much faster than model-free alternatives. Compared with traditional control strategies, it requires no prior knowledge of the satellite's physical or orbital parameters and offers stronger adaptability and autonomous control capability, enabling software-defined satellites to adapt to different hardware environments and to be developed and deployed rapidly. Furthermore, by introducing a target network and a parallelized heuristic search algorithm, the proposed method is optimized in terms of network accuracy and computation speed, and these improvements are verified by simulation experiments.
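The sketch below illustrates, under stated assumptions, how the components named in the abstract (a model-based controller, a target network, and a parallelized heuristic search over torque commands) can fit together. The linear stand-in for the learned dynamics model, the random-shooting planner, and all names and dimensions (plan_torque, soft_update, STATE_DIM, etc.) are illustrative assumptions only; they do not reproduce the paper's actual networks or search procedure.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's implementation): a model-based
# DRL attitude-control step combining (1) a learned one-step dynamics model,
# kept here as a simple linear map for brevity, (2) a slowly updated "target"
# copy of that model, and (3) a parallelized random-shooting search over
# candidate torque sequences.

rng = np.random.default_rng(0)

STATE_DIM = 6      # e.g. attitude error (3) + angular velocity (3)
ACTION_DIM = 3     # torque command about each body axis
HORIZON = 10       # planning horizon in steps
CANDIDATES = 256   # torque sequences evaluated in parallel
TAU = 0.01         # soft-update rate for the target model

# Online and target copies of the dynamics-model parameters.
W_online = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM + ACTION_DIM))
W_target = W_online.copy()

def plan_torque(state):
    """Parallelized random-shooting search: roll out many candidate torque
    sequences with the target model and return the first torque of the
    sequence with the lowest accumulated quadratic state cost."""
    seqs = rng.uniform(-0.1, 0.1, size=(CANDIDATES, HORIZON, ACTION_DIM))
    states = np.tile(state, (CANDIDATES, 1))
    total_cost = np.zeros(CANDIDATES)
    for t in range(HORIZON):
        inputs = np.concatenate([states, seqs[:, t]], axis=1)
        states = inputs @ W_target.T          # batched one-step prediction
        total_cost += np.sum(states**2, axis=1)
    return seqs[np.argmin(total_cost), 0]

def soft_update():
    """Let the target model slowly track the online (trained) model."""
    global W_target
    W_target = (1.0 - TAU) * W_target + TAU * W_online

# Example control step: plan a torque from the current state estimate,
# then refresh the target model (training of W_online is omitted here).
state = rng.normal(size=STATE_DIM)
print("commanded torque:", plan_torque(state))
soft_update()
```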
-
Table 1. Disturbance torque under three classic orbit states (unit: N·m)

Disturbance torque         Low orbit (200 km)   Medium orbit (1 000 km)   Geosynchronous orbit (35 800 km)
Earth gravity gradient     10^-3                10^-3                     10^-7
Aerodynamic drag           0.4×10^-5            10^-9                     0
Solar radiation pressure   0.2×10^-8            10^-9                     0.4×10^-9
Solar gravity gradient     0.3×10^-7            0.5×10^-7                 0.2×10^-6
Lunar gravity gradient     0.6×10^-7            0.1×10^-6                 0.5×10^-6
Solar tide                 0.4×10^-7            0.4×10^-7                 10^-11
Lunar tide                 0.5×10^-7            0.3×10^-7                 0.8×10^-11

Table 2. Comparison of convergence accuracy and speed among different algorithms

Network configuration      MSE (min)   MSE (max)   MSE (mean)   Convergence time/s
3-layer fully connected    0.931       1.515       1.047        301.05
5-layer fully connected    0.745       1.510       0.913        440.15
Target network             0.014       1.908       0.448        28.10