An offline training method using CGAN for anti-jamming communication decision network

JIANG Minmin; LI Dapeng; QIU Xin; MU Fuqi; CHAI Xurong; SUN Zhihao

doi:10.13700/j.bh.1001-5965.2019.0448


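1. School of Microelectronics, University of Chinese Academy of Sciences, Beijing 100029, China
2. Institute of Microelectronics of the Chinese Academy of Sciences, Beijing 100029, China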

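Author biographies:
JIANG Minmin, male, master's student. Main research interests: artificial intelligence, cognitive radio.

LI Dapeng, male, Ph.D., associate research fellow. Main research interests: digital signal processing.

QIU Xin, male, Ph.D., research fellow. Main research interests: wireless communication system design, communication signal processing.

MU Fuqi, male, research fellow, doctoral supervisor. Main research interests: wireless communication systems and technologies, Internet of Things transmission and applications.

CHAI Xurong, male, master's degree, senior engineer. Main research interests: wireless communication systems and technologies, communication signal processing.

SUN Zhihao, male, master's student. Main research interests: digital signal processing.

Corresponding author:
LI Dapeng, E-mail: insanegtp@sina.cn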

CLC number: TN974; TP181
Publication history
- Received: 2019-08-16
- Accepted: 2020-01-18
- Published online: 2020-07-20


Abstract:
Anti-jamming communication based on reinforcement learning must interact with the environment continuously to learn the optimal decision, so the training time of its decision network is constrained by the feedback rate of the environment and is usually long. To address this problem, an offline training method is proposed. A spectrum virtual environment generator is constructed that can quickly generate a large number of realistic synthetic spectrum waterfall images for training the anti-jamming communication decision network. Because the proposed method does not depend on feedback from the real environment, training is carried out offline, which significantly improves training efficiency. Experimental results show that, compared with real-time online training, the proposed offline method reduces training time by more than 50%.

Keywords:
- reinforcement learning
- anti-jamming communications
- spectrum waterfall image
- conditional generative adversarial nets (CGAN)
- offline training

Figure 1. Training process of ADRLA^[3]

Figure 2. Offline fast training framework for the anti-jamming communication model

Figure 3. Details of building the spectrum virtual environment generator

Figure 4. Real spectrum waterfall (SW) image and the corresponding labeled image

Figure 5. Roles of the generator and the discriminator in pix2pix

Figure 6. Comparison of synthetic SW images generated by the enhanced pix2pix and by the original pix2pix

Figure 7. Synthetic SW image and real SW image

Figure 8. Condition image and synthetic SW image

Figure 9. Condition image for the sweeping jamming mode and the corresponding synthetic SW image

Figure 10. Variation of the total reward at time t

Figure 11. Experimental environment

Figure 12. Validation in the real environment

Figure 13. Comparison of time consumption between the proposed method and ADRLA in the real environment

Table 1. Comparison of time consumption between the proposed method and ADRLA

Computer performance	Parameters	T_{OFFLINE_ADRLA}/min	T_{ADRLA}/min
Low (CPU)	T_{offline_init}=0.5 s, T_{CGAN}=0.4 s, T_{online_init}=40 s, T_{sample}=0.8 s, T_{DQN}=0.1 s, k=1000	8.3	15.7
Medium (1 GPU)	T_{offline_init}=0.3 s, T_{CGAN}=0.2 s, T_{online_init}=40 s, T_{sample}=0.8 s, T_{DQN}=0.05 s, k=1000	4.2	14.8
High (4 GPUs)	T_{offline_init}=0.1 s, T_{CGAN}=0.1 s, T_{online_init}=40 s, T_{sample}=0.8 s, T_{DQN}=0.01 s, k=1000	1.8	14.2
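
The two time columns of Table 1 are numerically consistent with a simple additive model: a one-off initialisation time plus k iterations, where each iteration costs one sample-acquisition step (T_CGAN offline, T_sample online) and one T_DQN step. This model is inferred from the tabulated numbers and is not stated in this excerpt, so the sketch below should be read as an assumption; the class and function names are hypothetical, and the per-step meanings given in the comments are likewise a guess from the parameter names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingParams:
    """Timings in seconds for one row of Table 1 (field names are illustrative)."""
    t_offline_init: float
    t_cgan: float
    t_online_init: float
    t_sample: float
    t_dqn: float
    k: int


def offline_minutes(p: TimingParams) -> float:
    # Assumed model: initialisation + k * (CGAN step + DQN step), in minutes.
    return (p.t_offline_init + p.k * (p.t_cgan + p.t_dqn)) / 60.0


def online_minutes(p: TimingParams) -> float:
    # Assumed model: initialisation + k * (real sampling step + DQN step), in minutes.
    return (p.t_online_init + p.k * (p.t_sample + p.t_dqn)) / 60.0


# Parameter values copied from Table 1.
TABLE_1 = {
    "low (CPU)": TimingParams(0.5, 0.4, 40.0, 0.8, 0.1, 1000),
    "medium (1 GPU)": TimingParams(0.3, 0.2, 40.0, 0.8, 0.05, 1000),
    "high (4 GPUs)": TimingParams(0.1, 0.1, 40.0, 0.8, 0.01, 1000),
}

for name, params in TABLE_1.items():
    print(f"{name}: offline {offline_minutes(params):.1f} min, "
          f"online {online_minutes(params):.1f} min")
```

Rounded to one decimal place, the computed values agree with the T_{OFFLINE_ADRLA} and T_{ADRLA} columns for all three rows. Under this model the online time is dominated by T_{sample}, which does not shrink with faster hardware.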


References (16)

[1]	DUAN R J, JIA L L, GUO P C.Research on spectrum allocation of HF access network based on intelligent frequency hopping[C]//8th International Symposium on Computational Intelligence and Design(ISCID).Piscataway: IEEE Press, 2015, 2: 295-300.
[2]	ZHANG L Y, GUAN Z Y, MELODIA T.United against the enemy:Anti-jamming based on cross-layer cooperation in wireless networks[J].IEEE Transactions on Wireless Communications, 2016, 15(8):5733-5747. doi: 10.1109/TWC.2016.2569083
[3]	LIU X, XU Y H, JIA L L, et al.Anti-jamming communications using spectrum waterfall:A deep reinforcement learning approach[J].IEEE Communications Letters, 2018, 22(5):998-1001. doi: 10.1109/LCOMM.2018.2815018
[4]	LIU Y, XU Y H, CHENG Y P, et al.A heterogeneous information fusion deep reinforcement learning for intelligent frequency selection of HF communication[J].China Communications, 2018, 15(9):73-84. doi: 10.1109/CC.2018.8456453
[5]	RIYAZ S, SANKHE K, IOANNIDIS S, et al.Deep learning convolutional neural networks for radio identification[J].IEEE Communications Magazine, 2018, 56(9):146-152. doi: 10.1109/MCOM.2018.1800153
[6]	XIAO L, CHEN T H, LIU J L, et al.Anti-jamming transmission Stackelberg game with observation errors[J].IEEE Communications Letters, 2015, 19(6):949-952. doi: 10.1109/LCOMM.2015.2418776
[7]	SHI Y, SAGDUYU Y E, ERPEK T, et al.Adversarial deep learning for cognitive radio security: Jamming attack and defense strategies[C]//IEEE International Conference on Communications Workshops (ICC Workshops).Piscataway: IEEE Press, 2018, 5: 1-6.
[8]	WUNSCH F, PAOSANA F, RAJENDRAN S, et al.DySPAN spectrum challenge:Situational awareness and opportunistic spectrum access benchmarked[J].IEEE Transactions on Cognitive Communications and Networking, 2017, 3(3):550-562. doi: 10.1109/TCCN.2017.2745682
[9]	MIRZA M, OSINDERO S.Conditional generative adversarial nets[EB/OL].(2014-11-06)[2019-08-15].https://arxiv.org/abs/1411.1784.
[10]	MNIH V, KAVUKCUOGLU K, SILVER D, et al.Playing Atari with deep reinforcement learning[EB/OL].(2013-12-19)[2019-08-15].https://arxiv.org/abs/1312.5602.
[11]	MNIH V, KAVUKCUOGLU K, SILVER D, et al.Human-level control through deep reinforcement learning[J].Nature, 2015, 518(7540):529-533. doi: 10.1038/nature14236
[12]	WANG J F, LI X, YANG J, et al.Stacked conditional generative adversarial networks for jointly learning shadow detection and shadow removal[EB/OL].(2017-12-07)[2019-08-15].https://arxiv.org/abs/1712.02478.
[13]	ODENA A, OLAH C, SHLENS J.Conditional image synthesis with auxiliary classifier GANs[C]//Proceedings of the 34th International Conference on Machine Learning.Piscataway: IEEE Press, 2017, 70: 2642-2651.
[14]	ISOLA P, ZHU J Y, ZHOU T H, et al.Image-to-image translation with conditional adversarial networks[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Piscataway: IEEE Press, 2017: 5967-5976.
[15]	ARJOVSKY M, BOTTOU L.Towards principled methods for training generative adversarial networks[EB/OL].(2017-01-07)[2019-08-15].https://arxiv.org/abs/1701.04862.
[16]	ARJOVSKY M, CHINTALA S, BOTTOU L.Wasserstein GAN[EB/OL].(2017-12-06)[2019-08-15].https://arxiv.org/abs/1701.07875.