Trajectory optimization algorithm of skipping missile based on deep reinforcement learning

龚开奇 魏宏夔 李嘉玮 宋晓 李勇 李怡昕 张岳

Citation: GONG K Q, WEI H K, LI J W, et al. Trajectory optimization algorithm of skipping missile based on deep reinforcement learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(6): 1383-1393 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0436

doi: 10.13700/j.bh.1001-5965.2021.0436
Funds: National Key R&D Program of China (2018YFB1702703)
Corresponding author E-mail: songxiao@buaa.edu.cn

  • CLC number: V448.23; TP183

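  • Abstract:

    The skipping flight of a missile can be modeled as a system of time-varying nonlinear differential equations. This system is difficult to solve analytically, which makes trajectory optimization hard. To address this problem, a deep reinforcement learning algorithm is proposed that builds on the double deep Q-network (DDQN) and adds a network-selection (NEO) strategy; it is referred to as NEO-DDQN. The algorithm takes maximization of the skipping missile's range as the objective and solves the trajectory optimization problem under constraints on heat flux density, dynamic pressure, overload, and terminal velocity. The action space, state space, and reward function of the problem are designed. The value of the key parameter, the learning rate, and a suitable greedy strategy are determined, and the NEO strategy is introduced to obtain NEO-DDQN. Comparison experiments are run against the optimal constant angle of attack (OCAOA) scheme and a genetic algorithm (GA). The results show that the NEO strategy improves the stability of the solution and increases the range by 2.52%. Compared with the OCAOA scheme and GA, the proposed algorithm increases the range of the skipping missile by 2.61% and 1.33%, respectively. The algorithm also avoids solving the complex nonlinear differential equations directly, which offers a new learning-based approach to trajectory optimization.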
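The abstract names the double deep Q-network (DDQN) as the base learner. The following is a minimal sketch of the standard Double DQN target, in which the online network selects the next action and the target network evaluates it. It assumes a discrete action set; the function name `ddqn_targets`, the array shapes, and the Q-values in the usage lines are illustrative and are not taken from the paper.

```python
import numpy as np

def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.95):
    """Standard Double DQN targets for a batch of transitions.

    rewards, dones: shape (B,); q_online_next, q_target_next: shape (B, n_actions).
    gamma=0.95 is the discount factor listed in Table 4.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    # The online network picks the greedy next action ...
    best = np.argmax(q_online_next, axis=1)
    # ... and the target network supplies the value of that action.
    next_value = q_target_next[np.arange(len(best)), best]
    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_value

# Made-up numbers, only to exercise the function.
targets = ddqn_targets(
    rewards=[1.0, 2.0],
    dones=[0, 1],
    q_online_next=[[0.2, 0.9], [0.5, 0.1]],
    q_target_next=[[10.0, 4.0], [7.0, 8.0]],
)
```

In the first transition the online network prefers action 1, so the target uses the target network's value for action 1 (4.0) rather than its own maximum (10.0). Decoupling selection from evaluation in this way is what distinguishes Double DQN from plain DQN.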

     

  • Figure 1.  Three-dimensional model of the skipping missile

    Figure 2.  Force analysis of the missile

    Figure 3.  Diagram of the skipping flight of the missile

    Figure 4.  Training model of the DQN (DDQN) algorithm

    Figure 5.  Training model of the NEO-DDQN algorithm

    Figure 6.  Training iterations for 5 learning rates

    Figure 7.  Training iterations under 5 greedy strategies

    Figure 8.  Heat flux density, dynamic pressure, and overload of the three algorithms relative to the constraint limits

    Figure 9.  Range-altitude diagram of the three algorithms

    Figure 10.  Angle-of-attack sequences of the three algorithms

    Figure 11.  Performance analysis of missile skipping flight

    Table 1.  Simulation parameters of missile skipping flight

    Parameter                            Value
    Initial position/m                   (0, 45000)
    Initial speed/(m·s⁻¹)                2224
    Initial flight path angle/(°)        0
    Mass/kg                              574
    Reference area/m²                    0.203
    Altitude constraint/km               25~45

    Table 2.  Influence of different time intervals on training

    Segment time interval/s    Range/m
    10                         747878
    20                         776593
    40                         775804
    50                         768351

    Table 3.  Parameters of network structure

    Layer             Number of neurons
    Input layer       5
    Hidden layer 1    40
    Hidden layer 2    40
    Output layer      1
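As a rough illustration of the layer sizes in Table 3, the sketch below builds a 5-40-40-1 fully connected network in NumPy and runs one forward pass. Only the layer sizes come from the table. The ReLU activations, the He-style initialization, and the helper names `init_params` and `forward` are assumptions, and this excerpt does not state what the five inputs or the single output represent.

```python
import numpy as np

# Input, hidden layer 1, hidden layer 2, output (Table 3).
LAYER_SIZES = [5, 40, 40, 1]

def init_params(sizes, seed=0):
    """Random weights and zero biases for each pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """Forward pass: ReLU on hidden layers (assumed), linear output layer."""
    h = np.asarray(x, dtype=float)
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

params = init_params(LAYER_SIZES)
batch = np.zeros((32, 5))            # batch of 32, the batch size in Table 4
q_values = forward(params, batch)    # one scalar output per input row
n_weights = sum(w.size + b.size for w, b in params)
```

With these sizes the network has 5·40 + 40 + 40·40 + 40 + 40·1 + 1 = 1921 trainable parameters, which is small enough to train quickly on a CPU.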

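    Table 4.  Parameters of NEO-DDQN algorithm

    Parameter                               Value
    Observation-period episodes $N_1$       500
    Training-period episodes $N_2$          59500
    Training frequency $N_3$                20
    Update frequency $N_4$                  200
    Discount factor $\gamma$                0.95
    Batch size B                            32
    Memory pool size M                      4000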
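To make the roles of the Table 4 parameters concrete, here is a minimal sketch of a fixed-capacity replay memory together with a train/sync schedule. The numeric values come from the table. The class `ReplayMemory`, the function `schedule`, and the reading of the training and update frequencies as counts of environment steps are assumptions, since this excerpt does not give their units.

```python
import random
from collections import deque

# Values from Table 4; the constant names are illustrative.
N_OBSERVE = 500      # N1: observation-period episodes
N_TRAIN = 59500      # N2: training-period episodes
TRAIN_EVERY = 20     # N3: training frequency
UPDATE_EVERY = 200   # N4: target-network update frequency
BATCH_SIZE = 32      # B
MEMORY_SIZE = 4000   # M

class ReplayMemory:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def schedule(step, episode):
    """Return (do_train, do_sync), assuming N3 and N4 count environment steps."""
    training = episode >= N_OBSERVE   # no updates during the observation period
    do_train = training and step % TRAIN_EVERY == 0
    do_sync = training and step % UPDATE_EVERY == 0
    return do_train, do_sync

memory = ReplayMemory(MEMORY_SIZE)
for i in range(5000):
    memory.push((i, 0, 0.0, i + 1, False))  # dummy (s, a, r, s', done)
batch = memory.sample(BATCH_SIZE)
```

After 5000 pushes the memory holds only the most recent 4000 transitions, so early experience gathered during the observation period is gradually replaced by data from the improving policy.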

    Table 5.  Final range convergence values of 5 learning rates

    Learning rate    Converged value/m
    0.001            745654
    0.01             781434
    0.1              774864
    0.2              743651
    0.5              746867

    Table 6.  Final range convergence values of 5 greedy strategies

    Epoch    Converged value/m
    10000    771926
    20000    778832
    30000    771433
    40000    764878
    50000    776593

    Table 7.  Influence of NEO strategy on algorithm results

    No.     $\theta_{1}$/m    $\theta_{2}$/m    $\theta_{2}$ − $\theta_{1}$/m
    1       767040            771830            4790
    2       760609            774440            13831
    3       767642            769734            2092
    4       770645            771134            489
    5       771587            784626            13039
    6       702365            781316            78951
    7       762389            785295            22906
    8       764834            783575            18741
    9       754677            777747            23070
    10      755698            768921            13223
    Mean    757748            776861            19113
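The 2.52% figure quoted in the abstract can be reproduced from Table 7. The short check below recomputes the per-run differences, the column means, and the relative gain. It assumes that $\theta_{1}$ and $\theta_{2}$ are the ranges obtained without and with the NEO strategy, which this excerpt does not state explicitly; the variable names are illustrative.

```python
# Ranges in metres, copied from Table 7.
theta1 = [767040, 760609, 767642, 770645, 771587,
          702365, 762389, 764834, 754677, 755698]
theta2 = [771830, 774440, 769734, 771134, 784626,
          781316, 785295, 783575, 777747, 768921]

diffs = [b - a for a, b in zip(theta1, theta2)]
mean1 = sum(theta1) / len(theta1)
mean2 = sum(theta2) / len(theta2)
mean_diff = sum(diffs) / len(diffs)

# Relative gain of the mean range, in percent.
gain_pct = 100.0 * mean_diff / mean1

# Spread (max minus min) across the ten runs, as a crude stability measure.
spread1 = max(theta1) - min(theta1)
spread2 = max(theta2) - min(theta2)
```

The gain rounds to 2.52%, and the spread across runs falls from 69222 m to 16374 m, which is consistent with the abstract's claim that the NEO strategy improves solution stability.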

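    Table 8.  Parameter settings of the genetic algorithm

    Parameter                        Value
    Time interval T/s                20
    Angle of attack/(°)              0~30 (integer)
    Population size $N_5$            10
    Crossover probability $P_c$      0.8
    Mutation probability $P_v$       0.9
    Number of iterations I           100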
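For orientation, the sketch below wires the Table 8 settings into a small genetic algorithm over integer angle-of-attack sequences. It is not the paper's GA: the tournament selection, one-point crossover, single-gene mutation, elitism, the number of segments, and above all the placeholder objective `toy_fitness` are assumptions. In the paper the fitness would come from simulating the missile's flight, which is not reproduced here.

```python
import random

ALPHA_MIN, ALPHA_MAX = 0, 30   # integer angle of attack in degrees (Table 8)
POP_SIZE = 10                  # population size (Table 8)
P_CROSS, P_MUT = 0.8, 0.9      # crossover / mutation probability (Table 8)
N_ITER = 100                   # number of iterations (Table 8)
N_SEGMENTS = 12                # hypothetical number of 20 s control segments

def toy_fitness(seq):
    # Placeholder objective, NOT the missile simulation: it rewards
    # sequences close to an arbitrary constant 15-degree profile.
    return -sum((a - 15) ** 2 for a in seq)

def run_ga(fitness, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(ALPHA_MIN, ALPHA_MAX) for _ in range(N_SEGMENTS)]
           for _ in range(POP_SIZE)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(N_ITER):
        children = [best[:]]  # elitism: keep the best individual (assumed)
        while len(children) < POP_SIZE:
            # Binary tournament selection (assumed selection scheme).
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            child = p1[:]
            if rng.random() < P_CROSS:
                cut = rng.randrange(1, N_SEGMENTS)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            if rng.random() < P_MUT:
                # Replace one randomly chosen segment with a new integer angle.
                child[rng.randrange(N_SEGMENTS)] = rng.randint(ALPHA_MIN, ALPHA_MAX)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

best_seq, history = run_ga(toy_fitness)
```

Because the best individual is always carried over, the recorded best fitness never decreases from one iteration to the next. Different random seeds can still end at different values, which mirrors the run-to-run variation visible in Table 9.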

    Table 9.  Optimization results of the genetic algorithm

    No.    Range/m
    1      741955
    2      767656
    3      752029
    4      730913
    5      721760
    6      775017
    7      764682
    8      751243
    9      744770
    10     761691

    Table 10.  Optimization results of the three algorithms

    Algorithm    Range/m    Terminal velocity/(m·s⁻¹)    Range improvement/%
    OCAOA        765343     715                          0.00
    GA           775017     722                          1.26
    NEO-DDQN     785295     713                          2.61
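The percentages in Table 10 and in the abstract follow directly from the ranges. The check below recomputes them; the helper name `gain_pct` is illustrative.

```python
# Ranges in metres, copied from Table 10.
ranges_m = {"OCAOA": 765343, "GA": 775017, "NEO-DDQN": 785295}

def gain_pct(new, base):
    """Relative range improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base

ga_vs_ocaoa = gain_pct(ranges_m["GA"], ranges_m["OCAOA"])
neo_vs_ocaoa = gain_pct(ranges_m["NEO-DDQN"], ranges_m["OCAOA"])
neo_vs_ga = gain_pct(ranges_m["NEO-DDQN"], ranges_m["GA"])
```

The last column of Table 10 is measured against OCAOA (1.26% for GA and 2.61% for NEO-DDQN), while the 1.33% quoted in the abstract is the gain of NEO-DDQN over GA.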
Publication history
  • Received:  2021-08-02
  • Accepted:  2021-11-11
  • Published online:  2021-12-14
  • Issue published:  2023-06-30
