Station keeping control for aerostat in wind fields based on deep reinforcement learning

柏方超; 杨希祥; 邓小龙; 侯中喜

doi: 10.13700/j.bh.1001-5965.2022.0629

College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China

Funds: National Natural Science Foundation of China (61903369, 52272445); Natural Science Foundation of Hunan Province (2023JJ10056)

Corresponding author: E-mail: nkyangxixiang@163.com

CLC number: V274

Publication history
- Received: 2022-07-19
- Accepted: 2022-12-09
- Published online: 2022-12-26
- Issue published: 2024-07-18

Abstract:
A station keeping model for a stratospheric aerostat is established. Based on the Markov decision process, double deep Q-learning (DDQN) with prioritized experience replay is applied to station keeping control of the stratospheric aerostat, both with and without powered propulsion. The control method is evaluated using metrics such as the average station keeping radius and the station keeping effective time ratio. Simulation results in a typical wind field show that, for a mission with a station keeping radius of 50 km and a station keeping duration of 3 days, the unpowered stratospheric aerostat achieves an average station keeping radius of 28.16 km and a station keeping effective time ratio of 83%. With powered propulsion, the average station keeping radius is reduced to 8.84 km, and the aerostat can achieve flight control within a station keeping radius of 20 km with a station keeping effective time ratio of 100%.

Keywords:
- stratospheric aerostat
- dynamic wind field
- station keeping control
- deep reinforcement learning
- power propulsion
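
The two evaluation metrics named in the abstract can be illustrated with a short sketch. The definitions used here are assumptions, since this page does not spell them out: the average station keeping radius is taken as the mean horizontal distance from the region center, and the effective time ratio as the fraction of equally spaced time samples that lie inside the region. The function name `station_keeping_metrics` is hypothetical.

```python
import math


def station_keeping_metrics(xs_km, ys_km, region_radius_km):
    """Return (mean distance from the region center, fraction of samples inside).

    Assumed definitions (not taken from the paper): the trajectory is sampled
    at equal time steps and the region is a circle centered on the origin.
    """
    if len(xs_km) != len(ys_km) or not xs_km:
        raise ValueError("xs_km and ys_km must be non-empty and of equal length")
    # Horizontal distance of every sample from the region center.
    distances = [math.hypot(x, y) for x, y in zip(xs_km, ys_km)]
    mean_radius = sum(distances) / len(distances)
    # Count samples that lie inside (or on) the region boundary.
    inside = sum(1 for d in distances if d <= region_radius_km)
    return mean_radius, inside / len(distances)
```

For a 3-day mission, `xs_km` and `ys_km` would hold the simulated east and north positions, and `region_radius_km` would be 50 or 20 depending on the task.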

Figure 1. Stratospheric aerostat system

Figure 2. Schematic diagram of the horizontal transfer strategy of the stratospheric aerostat

Figure 3. Flow chart of the control strategy based on fixed-point hovering

Figure 4. Agent state transition process

Figure 5. Station keeping control flow based on DDQN

Figure 6. Schematic of wind fields

Figure 7. Flight simulation results under fixed-point hovering

Figure 8. Flight simulation results under reinforcement learning control

Figure 9. Flight simulation results under wind disturbance

Figure 10. Flight simulation results under single-channel control in the east-west direction

Figure 11. Flight simulation results under single-channel control in the north-south direction

Figure 12. Flight simulation results under dual-channel control

Figure 13. Average reward during training

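Table 1. Environment state space parameter settings

Parameter	Range
Altitude h/km	18–22
Eastward position x/km	−50 to 50
Northward position y/km	−50 to 50
Ballonet air mass m_air/kg	0–158
Wind speed S_w/(m·s⁻¹)	−
Angle δ between wind direction and position	0–π
Note: the eastward and northward positions are constrained by $\sqrt{x^2 + y^2} \leqslant 50$. The wind speed S_w is determined from the real wind field, and the angle δ between the wind direction and the position is determined from the real wind field and the current position.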
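
The bounds in Table 1 can be read as a validity check on a candidate state. The following is a minimal sketch under that reading; the function name `in_state_space` and its argument names are hypothetical, and wind speed is omitted because the table gives no fixed range for it.

```python
import math


def in_state_space(h_km, x_km, y_km, m_air_kg, delta_rad):
    """Check a state against the Table 1 bounds (wind speed has no fixed range)."""
    return (
        18.0 <= h_km <= 22.0                      # altitude band
        and math.hypot(x_km, y_km) <= 50.0        # circular region, not a square
        and 0.0 <= m_air_kg <= 158.0              # ballonet air mass
        and 0.0 <= delta_rad <= math.pi           # angle between wind and position
    )
```

Note that the circular constraint is stricter than the per-axis ranges: a point at x = 40 km, y = 40 km satisfies both axis limits but lies about 56.6 km from the center, so it is outside the region.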

Table 2. Action space of the stratospheric aerostat without a propulsion system

Action	Description
a₁	Valve venting
a₂	Valve closed
a₃	Blower intake

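Table 3. Action space of the stratospheric aerostat with a single-channel propulsion system in the east-west direction

Action	Description
a₁	Valve venting, propeller thrust eastward
a₂	Valve venting, propeller thrust westward
a₃	Valve venting, propeller off
a₄	Valve closed, propeller thrust eastward
a₅	Valve closed, propeller thrust westward
a₆	Valve closed, propeller off
a₇	Blower intake, propeller thrust eastward
a₈	Blower intake, propeller thrust westward
a₉	Blower intake, propeller off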

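Table 4. Action space of the stratospheric aerostat with a single-channel propulsion system in the north-south direction

Action	Description
a₁	Valve venting, propeller thrust northward
a₂	Valve venting, propeller thrust southward
a₃	Valve venting, propeller off
a₄	Valve closed, propeller thrust northward
a₅	Valve closed, propeller thrust southward
a₆	Valve closed, propeller off
a₇	Blower intake, propeller thrust northward
a₈	Blower intake, propeller thrust southward
a₉	Blower intake, propeller off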

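Table 5. Action space of the stratospheric aerostat with a dual-channel propulsion system

Action	Description
a₁	Valve venting, propeller north
a₂	Valve venting, propeller east
a₃	Valve venting, propeller south
a₄	Valve venting, propeller west
a₅	Valve venting, propeller northeast
a₆	Valve venting, propeller southeast
a₇	Valve venting, propeller southwest
a₈	Valve venting, propeller northwest
a₉	Valve venting, propeller off
a₁₀	Valve closed, propeller north
a₁₁	Valve closed, propeller east
a₁₂	Valve closed, propeller south
a₁₃	Valve closed, propeller west
a₁₄	Valve closed, propeller northeast
a₁₅	Valve closed, propeller southeast
a₁₆	Valve closed, propeller southwest
a₁₇	Valve closed, propeller northwest
a₁₈	Valve closed, propeller off
a₁₉	Blower intake, propeller north
a₂₀	Blower intake, propeller east
a₂₁	Blower intake, propeller south
a₂₂	Blower intake, propeller west
a₂₃	Blower intake, propeller northeast
a₂₄	Blower intake, propeller southeast
a₂₅	Blower intake, propeller southwest
a₂₆	Blower intake, propeller northwest
a₂₇	Blower intake, propeller off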
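
The dual-channel action space is the Cartesian product of three altitude-control actions and nine propeller settings, which gives 27 discrete actions. The sketch below enumerates them in the order used by rows a₁₀ to a₁₈ (north, east, south, west, northeast, southeast, southwest, northwest, off). The identifiers `ALTITUDE_ACTIONS`, `PROPELLER_ACTIONS`, `ACTIONS`, and `decode_action` are hypothetical names chosen for illustration.

```python
from itertools import product

# Altitude-control actions: valve venting, valve closed, blower intake.
ALTITUDE_ACTIONS = ("vent", "closed", "intake")
# Propeller settings in the row order of the table; "off" means no thrust.
PROPELLER_ACTIONS = ("N", "E", "S", "W", "NE", "SE", "SW", "NW", "off")

# product() varies the last factor fastest, so the altitude action changes
# every nine entries, matching the grouping a1-a9, a10-a18, a19-a27.
ACTIONS = list(product(ALTITUDE_ACTIONS, PROPELLER_ACTIONS))


def decode_action(index):
    """Map a 1-based table index (a_1 ... a_27) to an (altitude, propeller) pair."""
    if not 1 <= index <= len(ACTIONS):
        raise ValueError("action index must be between 1 and 27")
    return ACTIONS[index - 1]
```

Restricting `PROPELLER_ACTIONS` to `("E", "W", "off")` or `("N", "S", "off")` yields the nine-action single-channel spaces, and dropping the propeller factor leaves the three-action unpowered space.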

Table 6. DDQN algorithm parameter settings

Training parameter	Value
Batch size N_b	512
Maximum number of training episodes N_max	2×10⁴
Replay memory size M	2×10⁶
Learning rate	0.001
Reward bias	−0.1
ε-greedy decay parameter $\varepsilon_{\text{dec}}$	0.01
Initial ε	0.98
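
For readers unfamiliar with double deep Q-learning, the learning target differs from standard DQN in one step: the online network selects the next action and the target network evaluates it. The sketch below shows that target for a single transition using plain Python lists in place of network outputs. The discount factor `gamma` is an assumption, since Table 6 does not list one, and the function name `double_dqn_target` is hypothetical.

```python
def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Compute the Double DQN target for one transition.

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    gamma: discount factor (assumed value; not given in Table 6).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Action selection uses the online network...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...while the value of that action comes from the target network.
    return reward + gamma * next_q_target[best_action]
```

Decoupling selection from evaluation is what reduces the overestimation bias of standard Q-learning; with prioritized experience replay, the absolute difference between this target and the current Q-value would typically serve as the sampling priority.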

Table 7. Stratospheric aerostat parameters

Parameter	Value
Envelope radius/m	8.7
Envelope volume/m³	2780
Total envelope mass/kg	48
Total system mass/kg	177.2
Number of valves	1
Valve radius/m	0.04
Operating altitude/km	18–22

Table 8. Initial state of the stratospheric aerostat

State variable	Value
Altitude $h_0$/km	20
x position x₀/km	0
y position y₀/km	0
Initial air mass/kg	67.58
Start time	2021-08-03T0
End time	2021-08-06T0
