Abstract: To study the altitude control of stratospheric aerostats based on deep reinforcement learning, a dynamic model of the stratospheric aerostat that takes air temperature into consideration is established, and an altitude control method based on the deep Q-network (DQN) algorithm is proposed. Because the stratospheric wind field is difficult to predict and the physical model of the aerostat itself is unknown, most model-based control methods cannot solve the long-term altitude control problem of the stratospheric aerostat. In the proposed method, the aerostat's current velocity, position, and altitude error form the agent's observed state; the opening and closing duration of the ballonet blower is the agent's output action; and the nonlinear aerostat dynamics together with a disturbing wind field constitute the agent's learning environment. The altitude control problem is thereby transformed into a reinforcement learning process over continuous states and continuous actions with unknown transition probabilities, which the DQN algorithm, combining reinforcement learning with neural networks, is well suited to solve; the method accounts for random wind-field disturbances and velocity-change constraints to achieve stable variable-altitude control. Simulation results show that, under the influence of the wind-field environment on the aerostat, the DQN controller tracks commanded altitude changes well, with a maximum steady-state error of about 10 m; compared with a conventional proportional-integral-derivative (PID) controller, it achieves better control performance and robustness.
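The abstract describes the method only at a high level. The minimal sketch below shows how a DQN agent of this kind is commonly structured: a small Q-network mapping the three-dimensional state (velocity, altitude, altitude error) to Q-values over a discretized set of blower commands, with experience replay and a frozen target network. The state and action dimensions, network sizes, hyperparameters, and all names here are illustrative assumptions (using PyTorch), not the paper's exact design.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small fully connected Q-network: state -> Q-value per discrete action."""

    def __init__(self, n_state=3, n_action=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_action),
        )

    def forward(self, s):
        return self.net(s)


class DQNAgent:
    def __init__(self, n_state=3, n_action=5, gamma=0.99, lr=1e-3):
        self.q = QNet(n_state, n_action)
        self.q_target = QNet(n_state, n_action)       # frozen target network
        self.q_target.load_state_dict(self.q.state_dict())
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=100_000)           # experience replay
        self.gamma, self.n_action = gamma, n_action

    def act(self, state, eps=0.1):
        # epsilon-greedy exploration over the discretized blower commands
        if random.random() < eps:
            return random.randrange(self.n_action)
        with torch.no_grad():
            s = torch.as_tensor(state, dtype=torch.float32)
            return int(self.q(s).argmax())

    def store(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def learn(self, batch_size=64):
        if len(self.buffer) < batch_size:
            return
        s, a, r, s2, d = map(np.array, zip(*random.sample(self.buffer, batch_size)))
        s = torch.as_tensor(s, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.as_tensor(r, dtype=torch.float32)
        s2 = torch.as_tensor(s2, dtype=torch.float32)
        d = torch.as_tensor(d, dtype=torch.float32)
        q_sa = self.q(s).gather(1, a).squeeze(1)
        with torch.no_grad():
            # Bellman target using the target network (standard DQN update)
            y = r + self.gamma * (1.0 - d) * self.q_target(s2).max(1).values
        loss = nn.functional.mse_loss(q_sa, y)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    def sync_target(self):
        self.q_target.load_state_dict(self.q.state_dict())
```

At each control step the agent would observe the state from the simulated dynamics and wind field, pick a blower command with `act`, store the transition with `store`, and call `learn`, syncing the target network every few hundred steps. The reward design, for example penalizing altitude error and excessive vertical speed, is the part most specific to the paper and is not reproduced here.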
Key words: near space / stratospheric aerostat / altitude control / deep reinforcement learning / PID controller
Table 1. Control results of the DQN agent

Flight segment/s   Settling time/s   Overshoot/m   Steady-state error/m
0~250              10                –             9.6
250~450            100               –             11.2
450~650            68                –             4.5
650~850            76                –             6
850~1 000          61                –             10.5

Table 2. Control results of the Q-learning agent

Flight segment/s   Settling time/s   Overshoot/m   Steady-state error/m
0~250              49                –             –
250~450            –                 –             –
450~650            153               –             –
650~850            53                13            10.8
850~1 000          97                –             14.3
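Tables 1 and 2 report settling time, overshoot, and steady-state error for each flight segment. For reference, the sketch below shows one conventional way to extract these three metrics from a simulated altitude trace; the 2% settling band, the averaging window, and the synthetic trace are illustrative assumptions rather than the paper's definitions.

```python
import numpy as np


def step_metrics(t, h, h_ref, band=0.02, tail=50):
    """Settling time, overshoot, and steady-state error of an altitude
    step response h(t) toward the reference altitude h_ref.

    band : settling band as a fraction of the step size (2% assumed here).
    tail : number of final samples averaged for the steady-state error.
    """
    step = h_ref - h[0]
    err = h - h_ref
    # settling time: first instant after which |error| stays inside the band
    inside = np.abs(err) <= abs(step) * band
    t_settle = None
    for i in range(len(t)):
        if inside[i:].all():
            t_settle = t[i] - t[0]
            break
    # overshoot: peak excursion beyond the reference, in meters
    overshoot = max(0.0, float(np.max(np.sign(step) * err)))
    # steady-state error: mean absolute error over the final samples
    sse = float(np.mean(np.abs(err[-tail:])))
    return t_settle, overshoot, sse


# toy usage with a synthetic, decaying-oscillation altitude response
t = np.linspace(0.0, 250.0, 2501)
h = 19000.0 + 500.0 * (1.0 - np.exp(-t / 15.0) * np.cos(t / 8.0))
print(step_metrics(t, h, h_ref=19500.0))
```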