Airship control based on Q-Learning algorithm and neural network

NIE Chunyu; ZHU Ming; ZHENG Zewei; WU Zhe

doi:10.13700/j.bh.1001-5965.2016.0903

Volume 43 Issue 12

Dec. 2017


Journal of Beijing University of Aeronautics and Astronautics > 2017 > 43(12): 2431-2438.


Citation:

NIE Chunyu, ZHU Ming, ZHENG Zewei, et al. Airship control based on Q-Learning algorithm and neural network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2017, 43(12): 2431-2438. doi: 10.13700/j.bh.1001-5965.2016.0903 (in Chinese)

1. School of Aeronautic Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, China
2. School of Automation Science and Electrical Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, China

Funds:

National Natural Science Foundation of China (61503010)

Fundamental Research Funds for the Central Universities (YWF-14-RSC-103)

Corresponding author: ZHU Ming, E-mail: zhuming@buaa.edu.cn
Received Date: 29 Nov 2016
Accepted Date: 06 Feb 2017
Publish Date: 20 Dec 2017

Abstract

To address the system modeling and parameter identification problems caused by dynamic model uncertainties in modern airship control, an autonomous on-line learning control strategy based on an adaptive modeling mechanism is proposed. Based on an analysis of the airship's actual motion, an adaptive method for establishing the Markov decision process (MDP) model of airship control is introduced. On-line learning is carried out with the Q-Learning algorithm, and a cerebellar model articulation controller (CMAC) network is used to generalize the action value function and thereby accelerate convergence. The autonomous on-line learning controller is simulated and compared with parameter-tuned PID controllers on normal control tasks to demonstrate its effectiveness. The results show that the controller's on-line learning process converges within a few hours and that the airship control MDP model established by the adaptive method meets the needs of normal control tasks. The proposed controller achieves precision similar to that of the PID controllers while behaving more intelligently.
- airship
- Markov decision process (MDP)
- machine learning
- Q-Learning
- cerebellar model articulation controller (CMAC)
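
The combination named in the abstract, Q-Learning with a CMAC network generalizing the action value function, can be illustrated with a minimal sketch. This is not the paper's controller or its adaptive MDP model: the class `CMAC`, the helper `q_learning_step`, the toy task in `train_demo`, and all parameter values (tile counts, `alpha`, `gamma`, `eps`) are assumptions chosen only for illustration. The CMAC is implemented here as plain tile coding over a scalar state.

```python
import random


class CMAC:
    """Tile-coding (CMAC-style) approximator of Q(s, a) for a scalar state.

    Illustrative assumption: several shifted tilings cover [lo, hi], and
    Q(s, a) is the sum of the weights of the one active tile per tiling.
    """

    def __init__(self, lo, hi, n_tiles=8, n_tilings=4, n_actions=3):
        self.lo, self.hi = lo, hi
        self.n_tiles, self.n_tilings = n_tiles, n_tilings
        self.n_actions = n_actions
        self.width = (hi - lo) / n_tiles
        # One extra tile per tiling so the shifted tilings still cover hi.
        self.w = [[[0.0] * n_actions for _ in range(n_tiles + 1)]
                  for _ in range(n_tilings)]

    def active_tiles(self, s):
        """Return the index of the active tile in each tiling."""
        s = min(max(s, self.lo), self.hi)
        tiles = []
        for k in range(self.n_tilings):
            offset = k * self.width / self.n_tilings
            idx = int((s - self.lo + offset) / self.width)
            tiles.append(min(idx, self.n_tiles))
        return tiles

    def q(self, s, a):
        return sum(self.w[k][t][a] for k, t in enumerate(self.active_tiles(s)))

    def update(self, s, a, target, alpha):
        """Move Q(s, a) toward target, spreading the step over active tiles."""
        delta = target - self.q(s, a)
        for k, t in enumerate(self.active_tiles(s)):
            self.w[k][t][a] += alpha / self.n_tilings * delta
        return delta


def q_learning_step(cmac, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One Q-Learning update: target = r + gamma * max_b Q(s_next, b)."""
    best_next = max(cmac.q(s_next, b) for b in range(cmac.n_actions))
    return cmac.update(s, a, r + gamma * best_next, alpha)


def train_demo(episodes=200, steps=30, eps=0.1, seed=0):
    """Toy task (hypothetical, not an airship model): drive a scalar error e
    in [-1, 1] toward 0 with moves of -0.1, 0, or +0.1; reward is -|e|
    after the move. Actions are chosen epsilon-greedily."""
    rng = random.Random(seed)
    cmac = CMAC(lo=-1.0, hi=1.0)
    moves = [-0.1, 0.0, 0.1]
    for _ in range(episodes):
        e = rng.uniform(-1.0, 1.0)
        for _ in range(steps):
            if rng.random() < eps:
                a = rng.randrange(len(moves))
            else:
                a = max(range(len(moves)), key=lambda b: cmac.q(e, b))
            e_next = min(max(e + moves[a], -1.0), 1.0)
            q_learning_step(cmac, e, a, -abs(e_next), e_next)
            e = e_next
    return cmac
```

Because neighbouring states share tiles, a single update at one state also shifts the value estimates of nearby states, which is the generalization effect the abstract credits with faster convergence. Calling `train_demo()` returns the trained approximator, whose `q(e, a)` method can then be queried for any error `e` and action index `a`.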
