Journal of Beijing University of Aeronautics and Astronautics ›› 2017, Vol. 43 ›› Issue (12): 2431-2438. doi: 10.13700/j.bh.1001-5965.2016.0903

• Paper •

Airship control based on Q-Learning algorithm and neural network

NIE Chunyu1, ZHU Ming1, ZHENG Zewei2, WU Zhe1

  1. School of Aeronautic Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, China;
  2. School of Automation Science and Electrical Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, China
  • Received: 2016-11-29 Revised: 2017-02-06 Online: 2017-12-20 Published: 2017-03-23
  • Corresponding author: ZHU Ming E-mail: zhuming@buaa.edu.cn
  • About the authors: NIE Chunyu, male, Ph.D. candidate; research interests: airborne software technology, intelligent control of near-space low-speed aircraft. ZHU Ming, male, Ph.D., associate professor, doctoral supervisor; research interests: overall aircraft design, unmanned aerial vehicle system design, unmanned aerial vehicle control, integrated avionics and airborne software technology.
  • Supported by:
    National Natural Science Foundation of China (61503010); the Fundamental Research Funds for the Central Universities (YWF-14-RSC-103)


Abstract: An autonomous on-line learning control strategy based on an adaptive modeling mechanism was proposed to address the complex system modeling and parameter identification work caused by dynamic model uncertainties in modern airship control. An adaptive method for establishing an airship control Markov decision process (MDP) model was introduced on the basis of an analysis of the airship's actual motion. On-line learning was carried out by the Q-Learning algorithm, and a cerebellar model articulation controller (CMAC) neural network was introduced to generalize the action value function and accelerate convergence. Simulations of the proposed on-line learning controller, together with comparisons against parameter-tuned PID controllers in common control tasks, were presented to demonstrate the Q-Learning controller's effectiveness. The results show that the controller's on-line learning process can converge within a few hours, and that the airship control MDP model established by the adaptive method satisfies the needs of common airship control tasks. The proposed controller achieves precision comparable to that of PID controllers while behaving more intelligently.
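The abstract's core mechanism, Q-Learning with a CMAC (tile-coding) approximator generalizing the action value function, can be illustrated with a minimal sketch. The one-dimensional "heading error" plant, the three-action rudder set, and all parameter values below are hypothetical stand-ins for illustration only, not the paper's airship dynamics or MDP model:

```python
import numpy as np

class CMAC:
    """Minimal CMAC (tile-coding) approximator for Q(s, a) over a bounded 1-D state."""
    def __init__(self, n_tilings=8, n_tiles=16, lo=-1.0, hi=1.0, n_actions=3):
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.lo, self.width = lo, (hi - lo) / (n_tiles - 1)
        # one weight per (tiling, tile, action); the extra tile absorbs the upper edge
        self.w = np.zeros((n_tilings, n_tiles + 1, n_actions))

    def _tiles(self, s):
        # each tiling is shifted by a fraction of the tile width
        offsets = np.arange(self.n_tilings) * self.width / self.n_tilings
        idx = ((s - self.lo + offsets) / self.width).astype(int)
        return np.clip(idx, 0, self.n_tiles)

    def q(self, s):
        # a Q-value is the sum of the one active weight in each tiling
        return self.w[np.arange(self.n_tilings), self._tiles(s)].sum(axis=0)

    def update(self, s, a, target, alpha):
        t = self._tiles(s)
        err = target - self.w[np.arange(self.n_tilings), t, a].sum()
        self.w[np.arange(self.n_tilings), t, a] += alpha / self.n_tilings * err

def train(episodes=2000, gamma=0.9, alpha=0.2, eps=0.2, seed=0):
    """Q-Learning on a toy first-order 'heading error' plant (not the airship model)."""
    rng = np.random.default_rng(seed)
    cmac, actions = CMAC(), np.array([-1.0, 0.0, 1.0])  # hypothetical rudder commands
    for _ in range(episodes):
        s = float(rng.uniform(-1.0, 1.0))
        for _ in range(30):
            # epsilon-greedy action selection over the CMAC-generalized Q-values
            a = int(rng.integers(3)) if rng.random() < eps else int(np.argmax(cmac.q(s)))
            s2 = float(np.clip(0.9 * s + 0.1 * actions[a], -1.0, 1.0))
            r = -s2 ** 2                                 # penalize heading error
            cmac.update(s, a, r + gamma * cmac.q(s2).max(), alpha)  # Q-Learning target
            s = s2
    return cmac, actions

def rollout(cmac, actions, s=0.8, steps=40):
    """Follow the greedy learned policy and return the final heading error."""
    for _ in range(steps):
        s = float(np.clip(0.9 * s + 0.1 * actions[int(np.argmax(cmac.q(s)))], -1.0, 1.0))
    return s
```

Training covers roughly 60 000 transitions; the greedy policy should then steer a large initial error toward zero, whereas the untrained controller drifts to the boundary. The CMAC's overlapping tilings are what let a single update generalize to neighboring states, which is the acceleration role the abstract assigns to the network.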

Key words: airship, Markov decision process (MDP), machine learning, Q-Learning, cerebellar model articulation controller (CMAC)

CLC number:


Copyright © Editorial Office of Journal of Beijing University of Aeronautics and Astronautics
Address: Editorial Office, Journal of Beijing University of Aeronautics and Astronautics, 37 Xueyuan Road, Haidian District, Beijing 100191, China E-mail: jbuaa@buaa.edu.cn