Text-to-image synthesis based on modified deep convolutional generative adversarial network

LI Yunhong (李云红), ZHU Mianyun (朱绵云), REN Jie (任劼), SU Xueping (苏雪平), ZHOU Xiaoji (周小计), YU Huikang (于惠康)

Citation: LI Y H, ZHU M Y, REN J, et al. Text-to-image synthesis based on modified deep convolutional generative adversarial network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(8): 1875-1883 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0588

doi: 10.13700/j.bh.1001-5965.2021.0588
Funds: National Natural Science Foundation of China (61902301); Key Project of Natural Science Basic Research Plan in Shaanxi Province of China (2022JZ-35)

  • Corresponding author. E-mail: hitliyunhong@163.com

  • CLC number: TP391.41
  • Abstract:

    To address the problem that the sparsity of the high-dimensional text input representation in the deep convolutional generative adversarial network (DCGAN) model causes text-conditioned generated images to lack structure and look unrealistic, an improved deep convolutional generative adversarial network model, CA-DCGAN, is proposed. A deep convolutional network and a recurrent text encoder encode the input text into a text feature vector. A conditioning augmentation (CA) module is introduced, which derives an additional conditioning variable from the mean and covariance matrix of the text feature vector and uses it in place of the original high-dimensional text feature vector. The conditioning variable is combined with random noise as the generator input, and a KL-divergence regularization term is added to the generator loss to avoid overfitting during training and help the model converge better. A spectral normalization (SN) layer is used in the discriminator to keep its gradient from descending too fast, which would otherwise unbalance the training of generator and discriminator and cause mode collapse. Experimental results show that, on the Oxford-102-flowers and CUB-200 datasets, the proposed model generates images of better quality, and closer to real samples, than the alignDRAW, GAN-CLS, GAN-INT-CLS, StackGAN (64×64), and StackGAN-v1 (64×64) models: the inception score improves by at least 10.9% and 5.6% and at most 41.4% and 37.5% on the two datasets, respectively, while the FID decreases by at least 11.4% and 8.4% and at most 43.9% and 42.5%, further demonstrating the effectiveness of the proposed model.
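    To make the pipeline concrete, here is a minimal sketch of the conditioning augmentation step, KL regularizer, and spectral normalization described above. It is a hypothetical PyTorch version written under stated assumptions: the layer sizes, class and variable names are illustrative, not taken from the authors' code, and the CA module uses a diagonal (log-variance) covariance in the style of StackGAN's conditioning augmentation [20].

```python
# A minimal, hypothetical PyTorch sketch of the CA module and KL term.
# Dimensions and names are assumptions for illustration only.
import torch
import torch.nn as nn

class CondAugment(nn.Module):
    """Turns a text embedding into a low-dimensional Gaussian condition variable."""
    def __init__(self, text_dim=1024, cond_dim=128):
        super().__init__()
        self.fc = nn.Linear(text_dim, cond_dim * 2)  # predicts mean and log-variance

    def forward(self, text_embedding):
        mu, logvar = self.fc(text_embedding).chunk(2, dim=1)
        # Reparameterization: c = mu + sigma * eps, with eps ~ N(0, I)
        c = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # KL(N(mu, sigma^2) || N(0, I)) -- the regularizer added to the generator loss
        kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1, dim=1).mean()
        return c, kl

# Spectral normalization on a discriminator layer (PyTorch built-in):
disc_conv = nn.utils.spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1))

# Generator input: condition variable concatenated with random noise.
ca = CondAugment()
text_emb = torch.randn(16, 1024)   # stand-in for the recurrent text encoder output
z = torch.randn(16, 100)           # random noise vector
c, kl_loss = ca(text_emb)
g_input = torch.cat([z, c], dim=1)
# Generator loss: adversarial term plus lambda * kl_loss
# (Table 4 reports the best GIS/GFID at lambda = 2).
```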

     

  • Figure 1. GAN network structure

    Figure 2. Macroscopic structure of CA-DCGAN

    Figure 3. Generator network structure of CA-DCGAN

    Figure 4. Discriminator network structure of CA-DCGAN

    Figure 5. Comparison of results generated by the alignDRAW, GAN-CLS, GAN-INT-CLS, and CA-DCGAN models on the Oxford-102-flowers dataset

    Figure 6. Comparison of results generated by the alignDRAW, GAN-CLS, GAN-INT-CLS, and CA-DCGAN models on the CUB-200 dataset

    Figure 7. Comparison of results generated by the StackGAN (64×64), StackGAN-v1 (64×64), and CA-DCGAN models on the Oxford-102-flowers and CUB-200 datasets

    Figure 8. Comparison of the diversity of images generated by GAN-CLS, GAN-INT-CLS, and CA-DCGAN on the Oxford-102-flowers dataset

    Figure 9. Comparison of the diversity of images generated by GAN-CLS, GAN-INT-CLS, and CA-DCGAN on the CUB-200 dataset

    Table 1. Experimental datasets

    Dataset              Subset          Images    Text descriptions per image    Classes
    Oxford-102-flowers   Training set    7034      10                             102
                         Test set        1155      10                             102
    CUB-200              Training set    8855      10                             200
                         Test set        2933      10                             200

    Table 2. GIS comparison of different models on Oxford-102-flowers and CUB-200 datasets

    Model                      Oxford-102-flowers    CUB-200
    alignDRAW                  2.15                  2.32
    GAN-CLS                    2.49                  2.64
    GAN-INT-CLS                2.66                  2.88
    StackGAN[20] (64×64)       2.70                  2.93
    StackGAN-v1[21] (64×64)    2.74                  3.02
    CA-DCGAN                   3.04                  3.19

    Table 3. GFID comparison of different models on Oxford-102-flowers and CUB-200 datasets

    Model                      Oxford-102-flowers    CUB-200
    alignDRAW                  96.87                 88.54
    GAN-CLS                    88.78                 79.25
    GAN-INT-CLS                79.95                 68.79
    StackGAN[20] (64×64)       61.32                 55.58
    StackGAN-v1[21] (64×64)    43.02                 35.11
    CA-DCGAN                   54.36                 50.92
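    As a quick sanity check, the percentage changes quoted in the abstract can be recomputed from Tables 2 and 3. The short Python sketch below (values copied from the tables; dictionary keys are illustrative) reproduces them; note that the smallest GFID reduction is measured against StackGAN (64×64), since Table 3 shows StackGAN-v1 (64×64) reaches a lower GFID than CA-DCGAN.

```python
# Recompute the relative changes quoted in the abstract from Tables 2 and 3.
# Tuples are (Oxford-102-flowers, CUB-200); values copied from the tables above.
gis = {"alignDRAW": (2.15, 2.32), "StackGAN-v1": (2.74, 3.02), "CA-DCGAN": (3.04, 3.19)}
gfid = {"alignDRAW": (96.87, 88.54), "StackGAN": (61.32, 55.58), "CA-DCGAN": (54.36, 50.92)}

for ds, name in enumerate(["Oxford-102-flowers", "CUB-200"]):
    gis_max = (gis["CA-DCGAN"][ds] / gis["alignDRAW"][ds] - 1) * 100    # 41.4, 37.5
    gis_min = (gis["CA-DCGAN"][ds] / gis["StackGAN-v1"][ds] - 1) * 100  # 10.9, 5.6
    fid_max = (1 - gfid["CA-DCGAN"][ds] / gfid["alignDRAW"][ds]) * 100  # 43.9, 42.5
    fid_min = (1 - gfid["CA-DCGAN"][ds] / gfid["StackGAN"][ds]) * 100   # 11.4, 8.4
    print(f"{name}: GIS +{gis_min:.1f}%..+{gis_max:.1f}%, "
          f"GFID -{fid_min:.1f}%..-{fid_max:.1f}%")
```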

    Table 4. Comparison of metric changes after introducing the KL term

    Model                    GIS                              GFID
                             Oxford-102-flowers   CUB-200     Oxford-102-flowers   CUB-200
    Baseline                 2.91                 3.12        69.75                60.24
    Baseline + KL, λ = 1     2.95                 3.14        63.34                54.56
    Baseline + KL, λ = 2     3.04                 3.19        54.36                50.92
    Baseline + KL, λ = 5     2.92                 3.11        65.49                59.51
    Baseline + KL, λ = 10    2.89                 3.06        68.47                63.89
  • [1] ZHOU K Y, YANG Y X, HOSPEDALES T, et al. Deep domain-adversarial image generation for domain generalisation[C]//34th AAAI Conference on Artificial Intelligence / 32nd Innovative Applications of Artificial Intelligence Conference / 10th AAAI Symposium on Educational Advances in Artificial Intelligence. Palo Alto: AAAI, 2020, 34: 13025-13032.
    [2] LU T T, LI X, ZHANG Y, et al. A technology for generation of space object optical image based on 3D point cloud model[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(2): 274-286 (in Chinese).
    [3] ZHANG Z, XIE Y, YANG L. Photographic text-to-image synthesis with a hierarchically-nested adversarial network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6199-6208.
    [4] NIU M M, SHEN M R, QIN B, et al. A data augmentation method based on GAN in tool condition monitoring[J]. Combined Machine Tool and Automatic Machining Technology, 2021(4): 113-115 (in Chinese). doi: 10.13462/j.cnki.mmtamt.2021.04.027
    [5] VENDROV I, KIROS R, FIDLER S, et al. Order-embeddings of images and language[EB/OL]. (2016-03-01)[2021-09-01]. https://arxiv.org/abs/1511.06361v3.
    [6] MANSIMOV E, PARISOTTO E, BA J L, et al. Generating images from captions with attention[EB/OL]. (2016-02-29)[2021-09-01]. https://arxiv.org/abs/1511.02793v2.
    [7] GREGOR K, DANIHELKA I, GRAVES A, et al. DRAW: A recurrent neural network for image generation[C]//Proceedings of the 32nd International Conference on Machine Learning. New York: ACM, 2015: 1462-1471.
    [8] REED S, VAN DEN OORD A, KALCHBRENNER N, et al. Generating interpretable images with controllable structure[C]//5th International Conference on Learning Representations. Appleton: ICLR, 2016.
    [9] NGUYEN A, CLUNE J, BENGIO Y, et al. Plug & play generative networks: Conditional iterative generation of images in latent space[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017.
    [10] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680.
    [11] MIRZA M, OSINDERO S. Conditional generative adversarial nets[EB/OL]. (2014-11-06)[2021-09-01]. https://arxiv.org/abs/1411.1784.
    [12] SCHUSTER M, PALIWAL K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11): 2673-2681.
    [13] RADFORD A, METZ L, CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[C]//4th International Conference on Learning Representations. Appleton: ICLR, 2016.
    [14] REED S, AKATA Z, YAN X, et al. Generative adversarial text to image synthesis[C]//Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1060-1069.
    [15] REED S, AKATA Z, LEE H, et al. Learning deep representations of fine-grained visual descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 49-58.
    [16] NILSBACK M E, ZISSERMAN A. Automated flower classification over a large number of classes[C]//Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing. Piscataway: IEEE Press, 2008: 722-729.
    [17] WAH C, BRANSON S, WELINDER P, et al. The Caltech-UCSD birds-200-2011 dataset: CNS-TR-2011-001[R]. Pasadena: California Institute of Technology, 2011.
    [18] SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[C]//Proceedings of the 30th Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2016: 2234-2242.
    [19] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2017: 6626-6637.
    [20] ZHANG H, XU T, LI H, et al. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 5907-5915.
    [21] ZHANG H, XU T, LI H, et al. StackGAN++: Realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962. doi: 10.1109/TPAMI.2018.2856256
Publication history
  • Received: 2021-10-01
  • Accepted: 2021-12-24
  • Published online: 2022-02-07
  • Issue published: 2023-08-31
