Direct lift control technology of carrier aircraft landing based on reinforcement learning
-
Abstract: During automatic carrier landing, deck motion and ship air-wake disturbances can easily cause a carrier aircraft to strike the ship. To address this problem, a direct lift control method for automatic carrier landing based on the proximal policy optimization (PPO) algorithm is proposed. The PPO controller takes six state variables as input (pitch angle, height, flight path angle, pitch rate, height error, and flight path angle rate) and outputs the flap deflection angle increment, so that the flight path angle responds rapidly during landing. Compared with a traditional PID controller, the Actor-Critic framework in the PPO controller greatly improves the computational efficiency of the control command and reduces the difficulty of parameter tuning. The simulation is based on an F/A-18 dynamics/kinematics model built in MATLAB/Simulink; the deep reinforcement learning training environment is built on the PyCharm platform, and the two platforms exchange data over user datagram protocol (UDP) communication. Simulation results show that the proposed method responds quickly with small dynamic error and keeps the landing height error within ±0.2 m, indicating high control accuracy.
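The UDP exchange between the simulation side and the learning side can be illustrated with a small loopback sketch. This is not the authors' implementation: the wire format (six little-endian float64 values for the state, one float64 for the flap deflection increment), the helper names, and the proportional `placeholder_policy` are all assumptions made for illustration. The sketch binds to OS-assigned ports so it can run anywhere; the paper's setup uses the fixed ports listed in Table 2.

```python
import socket
import struct

STATE_FMT = "<6d"   # assumed wire format: six little-endian float64 state values
ACTION_FMT = "<1d"  # assumed wire format: one float64 flap deflection increment


def make_udp_socket(host="127.0.0.1", port=0, timeout=1.0):
    """Create a UDP socket bound to host:port (port 0 lets the OS pick one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(timeout)
    return sock


def send_state(sock, addr, state):
    """Pack the six state variables into one datagram and send it."""
    sock.sendto(struct.pack(STATE_FMT, *state), addr)


def recv_state(sock):
    """Receive one state datagram; return (state tuple, sender address)."""
    data, sender = sock.recvfrom(1024)
    return struct.unpack(STATE_FMT, data), sender


def send_action(sock, addr, delta_flap):
    """Send the flap deflection increment back to the simulation side."""
    sock.sendto(struct.pack(ACTION_FMT, delta_flap), addr)


def recv_action(sock):
    """Receive one action datagram and return it as a float."""
    data, _ = sock.recvfrom(1024)
    return struct.unpack(ACTION_FMT, data)[0]


def placeholder_policy(state):
    """Hypothetical stand-in for the trained actor: proportional to height error."""
    height_error = state[4]
    return -0.5 * height_error


def loopback_demo():
    """Run one state/action round trip between two local sockets."""
    sim = make_udp_socket()    # plays the role of the Simulink side
    agent = make_udp_socket()  # plays the role of the Python agent
    try:
        # Order assumed: pitch, height, flight path angle, pitch rate,
        # height error, flight path angle rate.
        state = (0.05, 120.0, -0.06, 0.0, 0.4, 0.0)
        send_state(sim, agent.getsockname(), state)
        received, sender = recv_state(agent)
        send_action(agent, sender, placeholder_policy(received))
        return received, recv_action(sim)
    finally:
        sim.close()
        agent.close()
```

In a real co-simulation the `sim` socket would be replaced by the Simulink model's UDP send and receive blocks, and the byte order and data type would have to match whatever those blocks are configured to use.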
-
Table 1. Aerodynamic and geometric parameters

| Parameter | Value |
| --- | --- |
| Wing area ${S_{\text{w}}}$/m² | 37.16 |
| Mean aerodynamic chord $\bar c$/m | 3.51 |
| Wingspan $b$/m | 11.43 |
| Aircraft mass $m$/kg | 16 651 |
| Moment of inertia about the ${O_{\text{b}}}{x_{\text{b}}}$ axis ${I_{xx}}$/(kg·m²) | 31 184 |
| Moment of inertia about the ${O_{\text{b}}}{y_{\text{b}}}$ axis ${I_{yy}}$/(kg·m²) | 205 130 |
| Moment of inertia about the ${O_{\text{b}}}{z_{\text{b}}}$ axis ${I_{zz}}$/(kg·m²) | 230 415 |
| Product of inertia ${I_{xz}}$/(kg·m²) | −4 028 |
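Table 2. PPO algorithm parameter settings

| Parameter | Value |
| --- | --- |
| PyCharm platform IP address and port | 127.0.0.1:50000 |
| Simulink platform IP address and port | 127.0.0.1:50001 |
| Actor network structure | 6×128×128×2 |
| Critic network structure | 6×128×128×128×1 |
| Actor network learning rate | $1\times10^{-3}$ |
| Critic network learning rate | $5\times10^{-3}$ |
| Number of episodes | 20 000 |
| Number of steps | 1 500 |
| ${\gamma _{{\text{rl}}}}$ | 0.99 |
| $\lambda $ | 0.95 |
| $\varepsilon $ | 0.2 |
| Interaction steps before training | 2 048 |
| Cross-entropy loss function weight | 0.01 |
| Mini-batch size | 256 |
| Number of episodes for average-reward calculation | 10 |
| Action bound | 30 |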
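To show where the values of ${\gamma _{{\text{rl}}}}$, $\lambda $, $\varepsilon $ and the action bound from Table 2 would enter a PPO update, here is a minimal pure-Python sketch. It assumes, as in the standard PPO formulation, that $\lambda $ is the generalized advantage estimation (GAE) parameter and $\varepsilon $ is the clipping range of the surrogate objective; the table itself does not say so. The function names, the hard-limit handling of the action bound, and its unit are illustrative assumptions, not the authors' code.

```python
import math

GAMMA = 0.99         # discount factor gamma_rl (Table 2)
LAMBDA = 0.95        # lambda (Table 2); assumed to be the GAE parameter
CLIP_EPS = 0.2       # epsilon (Table 2); assumed to be the PPO clipping range
ACTION_BOUND = 30.0  # action bound (Table 2); unit not stated in the table


def gae_advantages(rewards, values, last_value=0.0, gamma=GAMMA, lam=LAMBDA):
    """Generalized advantage estimates for one rollout without early termination."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(log_prob_new, log_prob_old, advantage, eps=CLIP_EPS):
    """Per-sample PPO clipped surrogate objective (to be maximized)."""
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def bound_action(raw_action, bound=ACTION_BOUND):
    """Hard-limit a raw actor output to [-bound, bound] (assumed handling)."""
    return max(-bound, min(bound, raw_action))
```

With these defaults, a probability ratio above 1.2 earns no extra objective when the advantage is positive, and a ratio below 0.8 earns none when the advantage is negative, which is what limits the size of each policy update.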