Abstract: Conventional flight control algorithms for target tracking adapt poorly to demanding tasks, such as targets that perform large maneuvers or actively play a game against our aircraft. This paper proposes a target tracking control algorithm based on approximate dynamic programming (ADP). Our UAV is trained against a game strategy to accumulate experience. The states of both sides, such as their positions, are treated as known quantities, and the roll direction is the control input. Features are derived from the relative position of the two aircraft and used to form an approximate value function. Finally, the rollout algorithm solves for the optimal tracking decision, giving flexible, effective and accurate tracking of ordinary targets and even of targets that play a game against the tracker. Simulation results verify the effectiveness of approximate dynamic programming for this control problem.
Key words: approximate dynamic programming; target tracking; flight control; optimal decision; game
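Table 1. ADP symbology

| Symbol | Description |
| --- | --- |
| x | State vector |
| x_i | State at step i |
| x_n | The n-th state vector in X |
| x_term | Special terminal state |
| x_pos | x coordinate of the UAV |
| y_pos | y coordinate of the UAV |
| X | Vector of states [x_1, x_2, …, x_n]^T |
| f(x, u) | State transition function |
| π(x) | Maneuvering policy |
| π*(x) | Optimal maneuvering policy |
| π̄(x) | Policy generated by the rollout algorithm |
| J(x) | Future reward value of state x |
| J_k(x) | The k-th iteration of J(x) |
| J_approx(x) | Function approximation of J(x) |
| S(x) | Scoring function of the UAV |
| γ | Reward discount factor |
| u | Control or movement action |
| ζ(x) | Feature vector of state x |
| β | Function parameter vector |
| g(x) | Goal reward function |
| g_pa(x) | Position-of-advantage function |
| p_t | Probability of the termination function |
| T | Bellman backup operator |
| J*(x) | Optimal value of J(x) |

Algorithm 1. Position-of-advantage function g_pa(x)

    Input: {x}.
    R = Euclidean distance between the aircraft and the target
    if (0.1 m < R < 3.0 m) & (|AA| < 60°) & (|ATA| < 30°) then
        g_pa(x) = 1.0
    else
        g_pa(x) = 0
    end if
    Output reward: (g_pa).

Algorithm 2. State transition function f(x_i, u_b, u_r)

    Input: {x_i, u_b, u_r}.
    for i = 1:5 (once per Δt = 0.05 s) do
        for {red, blue} do
            (ϕ̇ = 40°/s, ϕ_r,max = 18°, ϕ_b,max = 23°)
            if u = L then
                ϕ = max(ϕ - ϕ̇ Δt, -ϕ_max)
            else if u = R then
                ϕ = min(ϕ + ϕ̇ Δt, ϕ_max)
            end if
            ψ̇ = (9.81 / v) tan ϕ (assume v = 2.5 m/s)
            ψ = ψ + ψ̇ Δt
            x_pos = x_pos + Δt v sin ψ
            y_pos = y_pos + Δt v cos ψ
        end for
    end for

Algorithm 3. Rollout algorithm

    Input: x_i.
    Initialize: J_best = -∞.
    for u_b = {L, S, R} do
        x_temp = f(x_i, u_b, π_r^nom(x_i))
        for j = {1:N_rolls} do
            x_temp = f(x_temp, π_approx^N(x_temp), π_r^nom(x_temp))
        end for
        J_current = [γ J_approx^N(x_temp) + g(x_temp)]
        if J_current > J_best then
            u_best = u_b, J_best = J_current
        end if
    end for
    Output: u_best.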
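Table 2. Initial state and target strategy of each simulation figure

| x_init | x_pos,b (m) | y_pos,b (m) | ψ_b (rad) | ϕ_b (rad) | x_pos,r (m) | y_pos,r (m) | ψ_r (rad) | ϕ_r (rad) | π_r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | Minimax |
| 2 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | Maintain |
| 3 | 0 | 1 | -π/6 | 0 | 1 | 0 | 0 | 0 | Maintain |
| 4 | 0 | 1 | π/6 | 0 | 1 | 0 | 0 | 0 | Minimax |

Note: π_r is the maneuvering policy of the enemy aircraft.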
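To make Algorithms 1 to 3 concrete, the following Python sketch wires them together and starts from scenario 1 of Table 2. It is an illustration under stated assumptions, not the authors' implementation. The roll rate, roll limits, time step and speed come from Algorithm 2. The gravity constant `G = 9.81`, the geometric definitions of AA and ATA (angles between each aircraft's heading and the line of sight from blue to red), the values `gamma=0.95` and `n_rolls=3`, and the stand-ins `hold` and `neg_ata` are all hypothetical choices. In particular, `neg_ata` replaces the trained approximate value function, and `hold` replaces both the learned blue policy and the nominal red policy.

```python
import math

# Constants taken from Algorithm 2; G and the angle conventions are assumptions.
DT, SUBSTEPS = 0.05, 5              # 5 sub-steps of 0.05 s per decision step
ROLL_RATE = math.radians(40.0)      # roll rate, rad/s
V, G = 2.5, 9.81                    # speed (m/s); gravity (m/s^2, assumed)
PHI_MAX = {"blue": math.radians(23.0), "red": math.radians(18.0)}


def step_aircraft(state, u, phi_max):
    """Advance one aircraft (x, y, psi, phi) by one decision step.

    u is 'L' (roll left), 'S' (hold roll angle) or 'R' (roll right).
    """
    x, y, psi, phi = state
    for _ in range(SUBSTEPS):
        if u == "L":
            phi = max(phi - ROLL_RATE * DT, -phi_max)
        elif u == "R":
            phi = min(phi + ROLL_RATE * DT, phi_max)
        psi += (G / V) * math.tan(phi) * DT   # coordinated-turn heading rate
        x += DT * V * math.sin(psi)
        y += DT * V * math.cos(psi)
    return (x, y, psi, phi)


def f(x, u_b, u_r):
    """State transition for the joint state x = (blue, red)."""
    blue, red = x
    return (step_aircraft(blue, u_b, PHI_MAX["blue"]),
            step_aircraft(red, u_r, PHI_MAX["red"]))


def angle_to_los(psi, dx, dy):
    """Unsigned angle between heading psi and the line of sight (dx, dy)."""
    hx, hy = math.sin(psi), math.cos(psi)
    return abs(math.atan2(hx * dy - hy * dx, hx * dx + hy * dy))


def g_pa(x):
    """Position-of-advantage reward (Algorithm 1)."""
    (xb, yb, psi_b, _), (xr, yr, psi_r, _) = x
    dx, dy = xr - xb, yr - yb
    rng = math.hypot(dx, dy)
    ata = angle_to_los(psi_b, dx, dy)   # assumed: blue nose vs. line of sight
    aa = angle_to_los(psi_r, dx, dy)    # assumed: red heading vs. line of sight
    ok = 0.1 < rng < 3.0 and aa < math.radians(60) and ata < math.radians(30)
    return 1.0 if ok else 0.0


def rollout(x_i, pi_blue, pi_red_nom, j_approx, g, gamma=0.95, n_rolls=3):
    """One-step lookahead followed by n_rolls simulated steps (Algorithm 3)."""
    u_best, j_best = None, -math.inf
    for u_b in ("L", "S", "R"):
        x_tmp = f(x_i, u_b, pi_red_nom(x_i))
        for _ in range(n_rolls):
            x_tmp = f(x_tmp, pi_blue(x_tmp), pi_red_nom(x_tmp))
        j_cur = gamma * j_approx(x_tmp) + g(x_tmp)
        if j_cur > j_best:
            u_best, j_best = u_b, j_cur
    return u_best


def hold(x):
    """Placeholder policy (hypothetical): always keep the current roll angle."""
    return "S"


def neg_ata(x):
    """Placeholder value estimate: prefer states where blue points at red."""
    (xb, yb, psi_b, _), (xr, yr, _, _) = x
    return -angle_to_los(psi_b, xr - xb, yr - yb)


# Scenario 1 of Table 2: blue at (0, 1), red at (1, 0), both with psi = phi = 0.
x_init = ((0.0, 1.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
print(rollout(x_init, hold, hold, neg_ata, g_pa))
```

Passing the policies and the value estimate as callables keeps `rollout` independent of how they are obtained, so a trained approximation or a minimax opponent could be substituted for the placeholders without changing the loop.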