Software-defined satellite attitude control algorithm based on deep reinforcement learning
-
Abstract: Deep reinforcement learning (DRL) is an emerging machine-learning-based control approach that has shown outstanding performance in intelligent control fields such as robotics and unmanned aerial vehicles, whereas satellite attitude control still relies widely on the traditional PID algorithm. As satellites become smaller and more intelligent, and with the emergence of software-defined satellites, traditional control algorithms are increasingly unable to meet the attitude control system's requirements for adaptability, autonomy, and robustness. To address these problems, a DRL-based attitude control algorithm is studied. It is a model-based algorithm and therefore converges much faster than model-free alternatives. Compared with traditional control strategies, it requires no prior knowledge of the satellite's physical or orbital parameters and offers stronger adaptability and autonomous control capability, enabling software-defined satellites to adapt to different hardware environments and to be developed and deployed rapidly. Furthermore, by introducing a target network and a parallelized heuristic search algorithm, the proposed method is optimized in terms of network accuracy and computation speed, and these improvements are verified by simulation experiments.
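The sketch below illustrates, under stated assumptions, how the components named in the abstract (a model-based controller, a target network, and a parallelized heuristic search over torque commands) can fit together. The linear stand-in for the learned dynamics model, the random-shooting planner, and all names and dimensions (plan_torque, soft_update, STATE_DIM, etc.) are illustrative assumptions only; they do not reproduce the paper's actual networks or search procedure.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's implementation): a model-based
# DRL attitude-control step combining (1) a learned one-step dynamics model,
# kept here as a simple linear map for brevity, (2) a slowly updated "target"
# copy of that model, and (3) a parallelized random-shooting search over
# candidate torque sequences.

rng = np.random.default_rng(0)

STATE_DIM = 6      # e.g. attitude error (3) + angular velocity (3)
ACTION_DIM = 3     # torque command about each body axis
HORIZON = 10       # planning horizon in steps
CANDIDATES = 256   # torque sequences evaluated in parallel
TAU = 0.01         # soft-update rate for the target model

# Online and target copies of the dynamics-model parameters.
W_online = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM + ACTION_DIM))
W_target = W_online.copy()

def plan_torque(state):
    """Parallelized random-shooting search: roll out many candidate torque
    sequences with the target model and return the first torque of the
    sequence with the lowest accumulated quadratic state cost."""
    seqs = rng.uniform(-0.1, 0.1, size=(CANDIDATES, HORIZON, ACTION_DIM))
    states = np.tile(state, (CANDIDATES, 1))
    total_cost = np.zeros(CANDIDATES)
    for t in range(HORIZON):
        inputs = np.concatenate([states, seqs[:, t]], axis=1)
        states = inputs @ W_target.T          # batched one-step prediction
        total_cost += np.sum(states**2, axis=1)
    return seqs[np.argmin(total_cost), 0]

def soft_update():
    """Let the target model slowly track the online (trained) model."""
    global W_target
    W_target = (1.0 - TAU) * W_target + TAU * W_online

# Example control step: plan a torque from the current state estimate,
# then refresh the target model (training of W_online is omitted here).
state = rng.normal(size=STATE_DIM)
print("commanded torque:", plan_torque(state))
soft_update()
```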
-
Table 1. Disturbance torque under three classic orbit states (unit: N·m)

Disturbance torque         Low orbit (200 km)   Medium orbit (1 000 km)   Geosynchronous orbit (35 800 km)
Earth gravity gradient     10^-3                10^-3                     10^-7
Aerodynamic drag           0.4×10^-5            10^-9                     0
Solar radiation pressure   0.2×10^-8            10^-9                     0.4×10^-9
Solar gravity gradient     0.3×10^-7            0.5×10^-7                 0.2×10^-6
Lunar gravity gradient     0.6×10^-7            0.1×10^-6                 0.5×10^-6
Solar tide                 0.4×10^-7            0.4×10^-7                 10^-11
Lunar tide                 0.5×10^-7            0.3×10^-7                 0.8×10^-11

Table 2. Comparison of convergence accuracy and speed among different algorithms

Network configuration      MSE (min)   MSE (max)   MSE (mean)   Convergence time/s
3-layer fully connected    0.931       1.515       1.047        301.05
5-layer fully connected    0.745       1.510       0.913        440.15
Target network             0.014       1.908       0.448        28.10