Abstract: To address inter-UAV communication resource allocation in multi-UAV area coverage missions, a reinforcement-learning-based multi-agent dynamic communication resource allocation model is proposed. The coverage route of each UAV in the mission area is first generated with the multi-agent spanning tree coverage (MSTC) method, and the communication links between the UAVs and the ground base station, as well as between UAV pairs, are modeled. Because of the uncertainty of the flight environment, the long-term resource allocation problem is formulated as a stochastic game in which each UAV-to-UAV (U2U) air-to-air link is treated as an agent whose actions are the choice of operating frequency band and the transmit power of the sending UAV. On this basis, a multi-agent reinforcement learning (MARL) model is designed based on the double deep Q-network (DDQN), so that each agent learns the optimal communication resource allocation strategy from the feedback of the reward function. Simulation results show that the MARL model adaptively selects the best resource allocation strategy under dynamic trajectories, improves the payload delivery success rate under the delay constraint, reduces the interference imposed by the air-to-air links on the UAV-to-infrastructure (U2I) links, and increases the total channel capacity.
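As a concrete illustration of the action space described above, where each U2U-link agent jointly chooses a sub-band and a transmit-power level, the following Python sketch shows one ε-greedy decision step over the Q-values of a per-agent network. The observation size, helper names, and network layout are illustrative assumptions for this sketch, not the paper's released implementation.

```python
import numpy as np
import torch
import torch.nn as nn

# Assumed sizes for illustration: 4 sub-bands x 4 power levels = 16 joint actions.
N_SUBBANDS = 4
POWER_LEVELS_DBM = [23, 10, 5, -100]           # selectable U2U transmit powers (Table 1)
N_ACTIONS = N_SUBBANDS * len(POWER_LEVELS_DBM)
OBS_DIM = 16                                   # hypothetical local-observation size

class QNetwork(nn.Module):
    """Per-agent Q-network: maps a local observation to Q-values of all joint actions."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 500), nn.ReLU(),
            nn.Linear(500, 250), nn.ReLU(),
            nn.Linear(250, 120), nn.ReLU(),
            nn.Linear(120, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def select_action(q_net: QNetwork, obs: np.ndarray, epsilon: float) -> tuple[int, int]:
    """Epsilon-greedy choice of a joint (sub-band, power-level) action for one U2U agent."""
    if np.random.rand() < epsilon:
        a = np.random.randint(N_ACTIONS)                      # explore
    else:
        with torch.no_grad():
            q = q_net(torch.as_tensor(obs, dtype=torch.float32))
            a = int(torch.argmax(q).item())                   # exploit
    return a // len(POWER_LEVELS_DBM), a % len(POWER_LEVELS_DBM)  # (sub-band, power index)

# Usage: one decision step for one U2U-link agent.
agent = QNetwork(OBS_DIM, N_ACTIONS)
band, power_idx = select_action(agent, np.zeros(OBS_DIM, dtype=np.float32), epsilon=0.02)
print(f"chosen sub-band {band}, transmit power {POWER_LEVELS_DBM[power_idx]} dBm")
```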
Table 1  Environmental parameters

Parameter                                   Value
Number of U2U agents $ k $                  4
Bandwidth $ W $/MHz                         4
Carrier frequency $ f_{\mathrm{c}} $/GHz    2
Base-station antenna height/m               25
UAV flight altitude/m                       100
UAV speed/(m·s−1)                           [16, 20]
Transmission delay constraint $ T $/ms      100
Transmission payload $ L $/Byte             2×1 060
U2U transmit power levels/dBm               [23, 10, 5, −100]
U2I transmit power/dBm                      23
Coverage area size/(m×m)                    400×400
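For readers reproducing the simulation, the settings in Table 1 can be collected into a single configuration object. The sketch below is a minimal example; the class and field names are chosen here for illustration and do not come from the paper's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CoverageEnvConfig:
    """Simulation settings from Table 1 (field names are illustrative)."""
    n_u2u_agents: int = 4                            # number of U2U link agents k
    bandwidth_mhz: float = 4.0                       # total bandwidth W
    carrier_freq_ghz: float = 2.0                    # carrier frequency f_c
    bs_antenna_height_m: float = 25.0                # ground base-station antenna height
    uav_altitude_m: float = 100.0                    # UAV flight altitude
    uav_speed_ms: tuple = (16.0, 20.0)               # UAV speed range
    delay_budget_ms: float = 100.0                   # transmission delay constraint T
    payload_bytes: int = 2 * 1060                    # transmission payload L
    u2u_power_levels_dbm: tuple = (23, 10, 5, -100)  # selectable U2U transmit powers
    u2i_power_dbm: float = 23.0                      # fixed U2I transmit power
    area_m: tuple = (400, 400)                       # coverage area size

config = CoverageEnvConfig()
```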
Table 2  Parameters of the DDQN network

Parameter                                        Value
Number of fully connected layers                 3
Neurons per layer                                [500, 250, 120]
Minimum greedy coefficient $ \varepsilon $       0.02
Discount factor $ \delta $                       1.0
Number of training episodes $ e^{\mathrm{max}} $ 3 000
Training steps per episode $ N $                 100
Replay memory capacity/entries                   200 000
Mini-batch size $ B $/entries                    100
Target network update interval $ c $/steps       400
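To make the role of these hyperparameters concrete, the sketch below shows the double-DQN target used to train each agent's Q-network, with the discount factor, batch size, and target-network update interval taken from Table 2; the network objects and tensor layouts are assumptions for illustration.

```python
import torch

GAMMA = 1.0            # discount factor δ (Table 2)
BATCH_SIZE = 100       # mini-batch size B
TARGET_UPDATE_C = 400  # copy online weights to the target network every c steps

def ddqn_targets(online_net, target_net, rewards, next_obs, dones):
    """Double-DQN targets: the online network selects the next action,
    the target network evaluates it (reduces over-estimation vs. plain DQN).
    `rewards` and `dones` are 1-D float tensors of length BATCH_SIZE;
    `dones` is 1.0 where the episode terminated."""
    with torch.no_grad():
        next_actions = online_net(next_obs).argmax(dim=1, keepdim=True)   # action selection
        next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)  # action evaluation
    return rewards + GAMMA * (1.0 - dones) * next_q

def maybe_sync_target(step, online_net, target_net):
    """Hard update of the target network every TARGET_UPDATE_C training steps."""
    if step % TARGET_UPDATE_C == 0:
        target_net.load_state_dict(online_net.state_dict())
```

In a full training loop, each of the $ k $ U2U agents would sample $ B $ = 100 transitions from its replay memory (capacity 200 000), minimize the loss between the online Q-values and these targets, and anneal $ \varepsilon $ down to 0.02 over the 3 000 training episodes.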