Longitudinal control of fixed-wing UAV based on deep reinforcement learning

HE Haiyang, ZHAO Zhengen, KONG Fei

Citation: HE H Y, ZHAO Z G, KONG F. Longitudinal control of fixed-wing UAV based on deep reinforcement learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2026, 52(4): 1306-1315 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2024.0075
Funds: National Natural Science Foundation of China (62233009, 62003161)

Corresponding author E-mail: zhaozhengen@nuaa.edu.cn

CLC number: V249.1

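  • Abstract:

    As a typical nonlinear system, the fixed-wing unmanned aerial vehicle (UAV) has increasingly complex dynamic characteristics. Traditional control methods are designed mainly from models and experience, and they adapt poorly to complex environments and tasks. Based on the deep deterministic policy gradient (DDPG) algorithm, which accepts multidimensional continuous states as input and produces multidimensional continuous actions as output, a longitudinal flight controller for a fixed-wing UAV is designed. The controller takes the velocity and pitch-angle tracking errors at multiple time steps, together with related quantities, as inputs, and outputs the elevator deflection angle and the engine thrust signal. To improve learning efficiency and reduce the effect of sparse rewards on learning, the reward function contains, in addition to dense penalty terms on the tracking error, a positive incentive factor: a positive reward is given when the tracking error is kept within a certain range and the target is tracked quickly. End-to-end control from UAV states to control surfaces is realized, and flight simulations with changed control targets and perturbed model parameters are compared against a proportional-integral-derivative (PID) controller. The simulation results show that the control system built on the deep reinforcement learning (DRL) algorithm not only achieves the control objectives but also has a degree of generalization ability and robustness, and its control performance is better than that of the PID controller in some cases.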
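The reward shaping described in the abstract (a dense tracking-error penalty plus a positive incentive inside an error band) can be illustrated with a minimal Python sketch. The function name, the weights, the tolerance bands, and the bonus value below are all hypothetical placeholders chosen for illustration; the abstract does not give the actual reward formula, and the "fast tracking" condition is simplified here to "both errors are inside the band".

```python
def tracking_reward(v_err, theta_err,
                    w_v=0.1, w_theta=1.0,       # hypothetical penalty weights
                    v_tol=0.5, theta_tol=0.02,  # hypothetical error bands
                    bonus=1.0):                 # hypothetical positive incentive
    """Dense penalty on tracking errors plus a bonus inside the error band.

    v_err     : velocity tracking error
    theta_err : pitch-angle tracking error
    """
    # Dense term: every step is penalized in proportion to the error size,
    # so the agent receives a learning signal even far from the target.
    penalty = -(w_v * abs(v_err) + w_theta * abs(theta_err))

    # Incentive term: a positive reward only when both errors are small.
    in_band = abs(v_err) <= v_tol and abs(theta_err) <= theta_tol
    return penalty + (bonus if in_band else 0.0)


if __name__ == "__main__":
    print(tracking_reward(0.0, 0.0))   # perfect tracking earns the full bonus
    print(tracking_reward(2.0, 0.1))   # large errors earn only the penalty
```

The dense term keeps the return informative throughout an episode, while the bonus marks the region the controller should settle into; how the two are balanced in the paper is not stated in the abstract.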

     

  • Figure 1. Illustration of UAV system coordinates, angles, and aerodynamic forces

    Figure 2. DDPG algorithm structure

    Figure 3. Comparison of reward variation curves

    Figure 4. Comparison of DDPG control results

    Figure 5. Comparison of control results between DDPG and PID

    Figure 6. Control results of DDPG controller

    Figure 7. Control signal output of DDPG

    Figure 8. Comparison of DDPG and PID control results after changing desired pitch angle

    Figure 9. DDPG control results for changing desired pitch angle

    Figure 10. Comparison of control results between DDPG and PID controllers under perturbed model parameters

    Figure 11. Control results of DDPG with model parameter perturbation

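    Table 1. DDPG training parameters

    Parameter                   Value
    Critic learning rate        0.001
    Actor learning rate         0.0005
    Optimizer                   Adam
    Batch size                  128
    Experience replay pool size 1000000
    Soft (inertial) update rate 0.001
    Discount factor             0.98
    Number of training episodes 1000
    Note: each training episode lasts 10 s.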
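As a rough illustration of how the Table 1 settings might be held in code, the sketch below collects them in a configuration object and applies a soft target-network update. The class and field names are hypothetical, and reading the "inertial update rate" of 0.001 as the soft-update coefficient commonly written as tau in DDPG is an assumption; plain Python lists stand in for network weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DDPGConfig:
    """Training settings copied from Table 1 (field names are hypothetical)."""
    critic_lr: float = 1e-3
    actor_lr: float = 5e-4
    optimizer: str = "Adam"
    batch_size: int = 128
    replay_pool_size: int = 1_000_000
    tau: float = 1e-3              # assumed to be the soft-update coefficient
    gamma: float = 0.98            # discount factor
    episodes: int = 1000
    episode_length_s: float = 10.0


def soft_update(target, online, tau):
    """Move each target weight a fraction tau toward the online weight."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


if __name__ == "__main__":
    cfg = DDPGConfig()
    target_weights = [0.0, 1.0]    # toy stand-ins for network parameters
    online_weights = [1.0, 1.0]
    print(soft_update(target_weights, online_weights, cfg.tau))
```

With tau this small the target networks change slowly, which is the usual reason for a soft update in DDPG: the bootstrapped critic target stays nearly fixed between gradient steps.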

    Table 2. Parameter uncertainties used for the controller comparison simulation

    Parameter          Uncertainty/%
    $m$                25
    $I_{y}$            25
    $\rho$             25
    $C_{L_{\alpha}}$   25
    $C_{L_{0}}$        10
    $C_{L_{\delta_{\mathrm{e}}}}$   10
    $C_{m_{0}}$        −20
    $C_{m_{\alpha}}$   −20
    $C_{m_{\delta_{\mathrm{e}}}}$   −20
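One way such a perturbation could be applied in a simulation is sketched below. It assumes each percentage in Table 2 is a relative (multiplicative) deviation from the nominal value, which the page does not state explicitly. The dictionary keys are hypothetical names for the listed symbols, and the nominal values in the usage example are arbitrary placeholders, not the UAV parameters used in the paper.

```python
# Percentages from Table 2; keys are hypothetical names for the listed symbols.
UNCERTAINTY_PCT = {
    "m": 25, "I_y": 25, "rho": 25, "C_L_alpha": 25,
    "C_L_0": 10, "C_L_delta_e": 10,
    "C_m_0": -20, "C_m_alpha": -20, "C_m_delta_e": -20,
}


def perturb(nominal, uncertainty_pct):
    """Scale each nominal parameter by (1 + pct/100).

    Assumes the uncertainty is a relative deviation; parameters without an
    entry in uncertainty_pct are returned unchanged.
    """
    return {
        name: value * (1.0 + uncertainty_pct.get(name, 0.0) / 100.0)
        for name, value in nominal.items()
    }


if __name__ == "__main__":
    # Placeholder nominal values for illustration only.
    nominal = {"m": 10.0, "rho": 1.2, "C_m_alpha": -0.5}
    print(perturb(nominal, UNCERTAINTY_PCT))
```

A controller trained on the nominal model can then be evaluated on the perturbed one to probe robustness, in the spirit of the comparison captioned in Figures 10 and 11.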
Publication history
  • Received: 2024-02-01
  • Accepted: 2024-03-29
  • Published online: 2024-04-28
  • Issue published: 2026-04-30
