Journal of Beijing University of Aeronautics and Astronautics, 2018, Vol. 44, Issue (11): 2247-2256. doi: 10.13700/j.bh.1001-5965.2018.0132


Maneuvering decision in air combat based on multi-objective optimization and reinforcement learning

DU Haiwen¹, CUI Minglang¹, HAN Tong¹, WEI Zhenglei¹, TANG Chuanlin², TIAN Ye³

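  1. College of Aeronautics and Astronautics, Air Force Engineering University, Xi'an 710038, China;
  2. Unit 94782 of PLA, Hangzhou 310004, China;
  3. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China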
  • Received: 2018-03-15  Revised: 2018-06-15  Online: 2018-11-20  Published: 2018-11-27
  • Corresponding author: DU Haiwen, E-mail: 18191856512@163.com
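  • About the authors: DU Haiwen, male, master's degree, professor and master's supervisor; research interest: application engineering of airborne weapon systems. CUI Minglang, male, master's student; research interest: weapon and combat systems and technology for unmanned aerial vehicles. HAN Tong, male, Ph.D., associate professor and master's supervisor; research interest: application engineering of airborne weapon systems.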
  • Supported by:
    National Natural Science Foundation of China (61601505); Natural Science Foundation of Shaanxi Province of China (2017JM6078)



Abstract: To solve the maneuvering decision problem in autonomous air combat of unmanned combat aerial vehicles, existing research is analyzed and a maneuvering decision model that combines optimization with machine learning is proposed. A multi-objective optimization method serves as the core of the decision model: it avoids the difficulty, faced by traditional optimization methods, of assigning weights to multiple optimization objectives, and it improves the extensibility of the decision model. On top of the multi-objective optimization, an evaluation network is trained by reinforcement learning and used for auxiliary decision-making, compensating for the decision model's insufficient game-playing capability in confrontation. To test the decision model, three simulation experiments set in short-range air combat are designed to verify, respectively, the feasibility of the multi-objective optimization method, the effectiveness of the auxiliary decision network, and the overall performance of the decision model. The simulation results show that the decision model can carry out effective real-time maneuvering confrontation against a maneuvering enemy aircraft.

Key words: autonomous air combat, maneuvering decision, multi-objective optimization, reinforcement learning, neural network
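The abstract states only that the decision core is multi-objective and avoids objective weights, and that a reinforcement-learning evaluation network assists the decision. One standard way to make a weight-free multi-objective choice is Pareto dominance: keep only the candidate maneuvers that no other candidate beats on every objective, then rank that set with a separate evaluation function. The Python sketch below illustrates this under stated assumptions; it is not the authors' implementation. The maneuver names, the three objective scores, and `value_fn` (a placeholder for the trained evaluation network, whose training is not shown) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical objective vector for one candidate maneuver; larger is better on
# every objective (e.g. angle, distance and energy advantage over the enemy).
Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is no worse than `b` on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Names of the candidates that no other candidate dominates (insertion order kept)."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


def choose_maneuver(candidates: Dict[str, Scores], value_fn: Callable[[str], float]) -> str:
    """Weight-free choice: restrict to the Pareto front, then let an evaluation
    function rank it. `value_fn` is a hypothetical stand-in for a network
    trained by reinforcement learning."""
    front = pareto_front(candidates)
    return max(front, key=value_fn)


if __name__ == "__main__":
    # Hypothetical scores: (angle advantage, distance advantage, energy advantage)
    candidates = {
        "level_flight": (0.2, 0.5, 0.6),
        "climb_left": (0.6, 0.4, 0.7),
        "dive_right": (0.7, 0.6, 0.3),
        "turn_left": (0.5, 0.3, 0.5),  # dominated by climb_left
    }
    # Hypothetical outputs of the evaluation function for each maneuver.
    learned_value = {"level_flight": 0.1, "climb_left": 0.4, "dive_right": 0.8, "turn_left": 0.9}
    print(pareto_front(candidates))  # ['level_flight', 'climb_left', 'dive_right']
    print(choose_maneuver(candidates, learned_value.__getitem__))  # dive_right
```

Note that `turn_left` has the highest evaluation value but is never chosen, because it is dominated: in this sketch the evaluation function only orders maneuvers that the objectives alone cannot separate, which is what removes the need for objective weights.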



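Copyright © Editorial Office of Journal of Beijing University of Aeronautics and Astronautics
Address: Editorial Office of Journal of Beijing University of Aeronautics and Astronautics, 37 Xueyuan Road, Haidian District, Beijing 100191, China. E-mail: jbuaa@buaa.edu.cn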