Citation: | XU Ke, WU Fengge, ZHAO Junsuoet al. Software defined satellite attitude control algorithm based on deep reinforcement learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2018, 44(12): 2651-2659. doi: 10.13700/j.bh.1001-5965.2018.0357(in Chinese) |
Deep reinforcement learning (DRL) technique is a new kind of machine learning based control algorithm, which shows its outstanding performance in the area of robotics and unmanned aerial vehicle. Meanwhile, in the area of satellite attitude control, traditional PID control algorithm is still widely used. As satellites become smaller and more intelligent and software defined satellite emerges, traditional control methods are even harder to meet the needs of adaptability, autonomy and robustness. To deal with these problems, a deep reinforcement learning based attitude control algorithm is proposed. It is a kind of model-based algorithm, which has much faster convergence speed than model-free algorithm. Compared with traditional method, this algorithm does not need prior knowledge of satellite's physical or orbit parameters and has better adaptability and autonomy, which make it possible for software defined satellite to adapt to different hardware environments and to be developed and deployed much faster. Furthermore, through introducing target network and parallelized heuristic search algorithm, the proposed algorithm has higher network accuracy and faster computation speed. The simulation experiment verifies these improvements.
[1] |
WILLIAMS T W, SHULMAN S, SEDLAK J, et al.Magnetospheric multiscale mission attitude dynamics: Observations from flight data[C]//AIAA/AAS Astrodynamics Specialist Conference.Reston: AIAA, 2016.
|
[2] |
HU Q, LI L, FRISWELL M I.Spacecraft anti-unwinding attitude control with actuator nonlinearities and velocity limit[J].Journal of Guidance, Control, and Dynamics, 2015, 38(10):1-8. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=3c610737816cd29f1f9057191968bdc9
|
[3] |
MAZMANYAN L, AYOUBI M A.Takagi-Sugeno fuzzy model-based attitude control of spacecraft with partially-filled fuel tank[C]//AIAA/AAS Astrodynamics Specialist Conference.Reston: AIAA, 2014.
|
[4] |
LEVINE S, FINN C, DARRELL T, et al.End-to-end training of deep visuomotor policies[J].Journal of Machine Learning Research, 2016, 17(1):1334-1373. http://dl.acm.org/citation.cfm?id=2946684
|
[5] |
SILVER D, HUBERT T, SCHRITTWIESER J, et al.Mastering chess and shogi by self-play with a general reinforcement learning algorithm[EB/OL].(2017-12-05)[2018-06-13].http://cn.arxiv.org/abs/1712.01815.
|
[6] |
LILLICRAP T P, HUNT J J, PRITZEL A, et al.Continuous control with deep reinforcement learning[J].Computer Science, 2015, 8(6):A187. http://cn.bing.com/academic/profile?id=6acdf290bd7c97fa970faf7a2ba649ce&encoded=0&v=paper_preview&mkt=zh-cn
|
[7] |
BROCKMAN G, CHEUNG V, PETTERSSON L, et al.OpenAI Gym[EB/OL].(2016-01-05)[2018-06-13].http://cn.arxiv.org/abs/1606.01540.
|
[8] |
GROSS K, SWENSON E, AGTE J S.Optimal attitude control of a 6u cubesat with a four-wheel pyramid reaction wheel array and magnetic torque coils[C]//AIAA Modeling and Simulation Technologies Conference.Reston: AIAA, 2015. https://www.researchgate.net/publication/306357094_Optimal_Attitude_Control_of_a_6U_CubeSat_with_a_Four-Wheel_Pyramid_Reaction_Wheel_Array_and_Magnetic_Torque_Coils
|
[9] |
AKELLA M R, THAKUR D, MAZENC F.Partial Lyapunov strictification: Smooth angular velocity observers for attitude tracking control[C]//AIAA/AAS Astrodynamics Specialist Conference.Reston: AIAA: 2015: 442-451. https://www.researchgate.net/publication/269163652_Partial_Lyapunov_Strictification_Smooth_Angular_Velocity_Observers_for_Attitude_Tracking_Control
|
[10] |
XIAO B, HU Q, ZHANG Y, et al.Fault-tolerant tracking control of spacecraft with attitude-only measurement under actuator failures[J].Journal of Guidance, Control, and Dynamics, 2014, 37(3):838-849. doi: 10.2514/1.61369
|
[11] |
WALKER A R, PUTMAN P T, COHEN K.Solely magnetic genetic/fuzzy-attitude-control algorithm for a CubeSat[J].Journal of Spacecraft & Rockets, 2015, 52(6):1627-1639. http://www.wanfangdata.com.cn/details/detail.do?_type=perio&id=c7f82cb08e9cf6425c7c40e0d4f749dd
|
[12] |
GHADIRI H, SADEGHI M, ABASPOUR A, et al.Optimized fuzzy-quaternion attitude control of satellite in large maneuver[C]//International Conference on Space Operations.Reston: AIAA, 2015. http://www.researchgate.net/publication/303098915_Optimized_Fuzzy-Quaternion_Attitude_control_of_Satellite_in_Large_maneuver
|
[13] |
卢伟.基于阻力参数估计的低轨卫星轨道确定与预报[D].哈尔滨: 哈尔滨工业大学, 2008. http://www.wanfangdata.com.cn/details/detail.do?_type=degree&id=D254198
LU W.Orbit determination and prediction of low earth orbit satellites based on estimating drag parameters[D]. Harbin: Harbin Institute of Technology, 2008(in Chiense). http://www.wanfangdata.com.cn/details/detail.do?_type=degree&id=D254198
|
[14] |
WANG P, ZHENG W, ZHANG H B, et al.Attitude control of low-orbit micro-satellite with active magnetic torque and aerodynamic torque[C]//20103rd In International Symposium on Systems and Control in Aeronautics and Astronautics.Reston: AIAA, 2010: 1460-1464. https://www.researchgate.net/publication/251967428_Attitude_control_of_low-orbit_micro-satellite_with_active_magnetic_torque_and_aerodynamic_torque
|
[15] |
YOO Y, KOO S, KIM G, et al.Attitude control system of a cube satellite with small solar sail[C]//AIAA Aerospace Sciences Meeting.Reston: AIAA, 2013. http://www.researchgate.net/publication/273135427_attitude_control_system_of_a_cube_satellite_with_small_solar_sail
|
[16] |
FRANKLIN G F.Feedback control of dynamic systems[M].Beijign:Posts and Telecom Press, 2007.
|
[17] |
WU B L.Spacecraft attitude control with input quantization[J].Journal of Guidance, Control, and Dynamics, 2016, 39(1):176-181. doi: 10.2514/1.G001427
|
[18] |
TURKOGLU K, GONG A.Preliminary design and prototyping of a low-cost spacecraft attitude determination and control setup[C]//AIAA Guidance, Navigation, and Control Conference.Reston: AIAA, 2015. http://gateway.proquest.com/openurl?res_dat=xri:pqm&ctx_ver=Z39.88-2004&rfr_id=info:xri/sid:baidu&rft_val_fmt=info:ofi/fmt:kev:mtx:article&genre=article&jtitle=Aiaa%20Journal&atitle=Preliminary%20Design%20and%20Prototyping%20of%20a%20Low-Cost%20Spacecraft%20Attitude%20Determination%20and%20Control%20Setup
|
[19] |
WATKINS C J C H, DAYAN P.Q-learning[J].Machine Learning, 1992, 8(3-4):279-292. doi: 10.1007/BF00992698
|
[20] |
KIRKPATRICK S, VECCHI M P.Optimization by simulated annealing[M]//MEZARO M, PARISI G, VIRASORO M.Spin glass theory and beyond: An introduction to the replica method and its applications.Singapore: World Scientific Press, 1987: 339-348.
|
[21] |
KENNEDY J, EBERHART R.Particle swarm optimization[C]//Proceedings of ICNN'95-International Conference on Neural Networks.Piscataway, NJ: IEEE Press, 2002: 1942-1948.
|
[22] |
SALIMANS T, HO J, CHEN X, et al.Evolution strategies as a scalable alternative to reinforcement learning[EB/OL].(2017-12-07).[2018-06-13].https://arxiv.org/abs/1703.03864.
|