Cognition behavior model for air combat based on reinforcement learning

Ma Yaofei; Gong Guanghong; Peng Xiaoyuan

School of Automation Science and Electrical Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China

Funding: Key Equipment Pre-research Fund (9140A04040106HT0801)

About the author:
Ma Yaofei (born 1981), male, from Yucheng, Henan; lecturer; mayaofei@gmail.com.

CLC number: TP 391.9
Publication history
- Received: 2009-03-26
- Published online: 2010-04-30

Abstract

Abstract: A cognitive behavior model was proposed to support the tactical decisions of simulated fighters engaging each other in virtual air combat; the model accumulates tactical decision experience through reinforcement learning (RL). In the virtual battlefield environment, the combat situation is described by multiple attributes, so the learning process faces a high-dimensional problem space. Traditional methods that discretize this space place very large demands on computation and storage when the dimension is high, and are therefore impractical. To address this, an approximation network based on Gaussian radial basis functions was constructed to approximate the state value, which greatly reduced both the resource demand and the learning time and ultimately produced reasonable maneuver strategies. The effectiveness and adaptability of the model were verified in one-on-one air combat simulations, and the resulting engagement trajectories were similar to those flown by human pilots.
Keywords:
- reinforcement learning /
- adaptive systems /
- simulation
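
The abstract does not give the network structure, the learning rule, or the state attributes, so the following is only a minimal sketch of the general idea: a state value approximated as a weighted sum of Gaussian radial basis functions and trained with a TD(0) update. The function names, the 2-D state, the 3x3 grid of centers, the width, and the learning parameters are all assumptions chosen for illustration, not the authors' configuration. The helper `grid_cells` shows why a lookup table over a discretized space grows quickly with the number of attributes.

```python
import math

def grid_cells(bins_per_dim, num_dims):
    """Table entries needed when every attribute is split into equal bins."""
    return bins_per_dim ** num_dims

def rbf_features(state, centers, width):
    """Gaussian RBF activations exp(-||s - c||^2 / (2 * width^2)), one per center."""
    feats = []
    for c in centers:
        sq_dist = sum((s - ci) ** 2 for s, ci in zip(state, c))
        feats.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    return feats

def value(state, weights, centers, width):
    """Approximate state value V(s) as a weighted sum of RBF activations."""
    phi = rbf_features(state, centers, width)
    return sum(w * p for w, p in zip(weights, phi))

def td0_update(weights, state, reward, next_state, done,
               centers, width, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(state) toward reward + gamma * V(next_state)."""
    phi = rbf_features(state, centers, width)
    v = sum(w * p for w, p in zip(weights, phi))
    v_next = 0.0 if done else value(next_state, weights, centers, width)
    delta = reward + gamma * v_next - v  # temporal-difference error
    return [w + alpha * delta * p for w, p in zip(weights, phi)]

# Hypothetical setup: 2-D state in [0, 1]^2 covered by a 3x3 grid of centers.
demo_centers = [(x, y) for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)]
demo_width = 0.5
demo_weights = [0.0] * len(demo_centers)

# Repeatedly observe one terminal transition that yields reward 1.0.
for _ in range(50):
    demo_weights = td0_update(demo_weights, (0.5, 0.5), 1.0, (0.5, 0.5), True,
                              demo_centers, demo_width)

print("learned V(0.5, 0.5):", value((0.5, 0.5), demo_weights, demo_centers, demo_width))
print("table entries for 10 bins over 6 attributes:", grid_cells(10, 6))
print("RBF weights stored in this sketch:", len(demo_weights))
```

In this sketch the stored parameters are one weight per center, and each update also shifts the value of nearby states because neighboring basis functions overlap. How many centers a real air combat state space would need is not something the abstract states.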
