-
摘要:
在对抗生成网络(GAN)这一概念的诞生及发展推动下,文本生成图像的研究取得进展和突破,但大部分的研究内容集中于提高生成图片稳定性和解析度的问题,提高生成结果美观度的研究则很少。而计算机视觉中另一项经典的课题——图像美观度评判的研究也在深度神经网络的推动下提出了一些成果可信度较高的美观度评判模型。本文借助美观度评判模型,对实现文本生成图像目标的GAN模型进行了改造,以期提高其生成图片的美观度指标。首先针对StackGAN++模型,通过选定的美观度评判模型从美学角度评估其生成结果;然后通过借助评判模型构造美学损失的方式对其进行优化。结果使得其生成图像的总体美学分数比原模型提高了3.17%,同时Inception Score提高了2.68%,证明所提方法具有一定效果,但仍存在一定缺陷和提升空间。
-
关键词:
- 文本生成图像 /
- 对抗生成网络(GAN) /
- 美观度评判 /
- StackGAN++ /
- 美学损失
Abstract:Due to the development of generative adversarial network (GAN), much progress has been achieved in the research of text-to-image synthesis. However, most of the researches focus on improving the stability and resolution of generated images rather than aesthetic quality. On the other hand, image aesthetic assessment research is also a classic task in computer vision field, and currently there exists several state-of-the-art models on image aesthetic assessment. In this work, we propose to improve the aesthetic quality of images generated by text-to-image GAN by incorporating an image aesthetic assessment model into a conditional GAN. We choose StackGAN++, a state-of-the-art text-to-image synthesis model, assess the aesthetic quality of images generated by it with a chosen aesthetic assessment model, then define a new loss function:aesthetic loss, and use it to improve StackGAN++. Compared with the original model, the total aesthetic score of generated images is improved by 3.17% and the inception score is improved by 2.68%, indicating that the proposed optimization is effective but still has several weaknesses that can be improved in future work.
-
表 1 不同生成数量情况下最高分数与前10平均分数
Table 1. Highest aesthetic score and average of top 10 aesthetic score when generating different quantities of images
图像生成数量 最高分数 前10平均分数 100 0.892 8 0.769 8 200 0.804 2 0.775 3 350 0.845 1 0.788 9 500 0.843 9 0.797 8 750 0.858 4 0.815 5 1 000 0.954 8 0.859 2 表 2 不同β取值对应模型的IS
Table 2. IS of models using different β
β IS 0 4.10±0.06 45 4.05±0.03 1 4.13±0.05 0.000 1 4.21±0.03 表 3 不同β取值对应模型的美学分数对比
Table 3. Comparison of aesthetic scores of models using different β
β 测试集平均分数 0 0.628 3 1 0.600 6 0.000 1 0.648 2 表 4 原模型与优化模型使用24条文本生成1 000张图像的平均美学分数对比
Table 4. Comparison of average aesthetic score of 1 000 images generated by original models and optimized models using 24 different texts
编号 平均美学分数 对比提升量 原模型 β= 0.000 1 1 0.592 9 0.619 2 +0.026 3 2 0.586 6 0.618 0 +0.031 4 3 0.567 1 0.617 0 +0.049 9 4 0.574 1 0.543 3 -0.030 8 5 0.581 9 0.551 5 -0.030 4 6 0.551 0 0.518 8 -0.032 2 7 0.602 4 0.622 1 +0.019 7 8 0.568 8 0.546 7 -0.022 1 9 0.598 7 0.627 9 +0.029 2 10 0.616 6 0.589 0 -0.027 6 11 0.600 8 0.613 8 +0.013 0 12 0.551 1 0.622 7 +0.071 6 13 0.700 7 0.691 3 -0.009 4 14 0.586 2 0.659 6 +0.073 4 15 0.586 8 0.623 2 +0.036 4 16 0.595 1 0.622 7 +0.027 6 17 0.621 7 0.634 5 +0.012 8 18 0.555 0 0.579 2 +0.024 2 19 0.585 9 0.597 4 +0.011 5 20 0.568 9 0.567 3 -0.001 6 21 0.598 8 0.640 5 +0.041 7 22 0.579 4 0.547 7 -0.031 7 23 0.558 7 0.593 0 +0.034 3 24 0.589 0 0.541 9 -0.047 1 -
[1] BODNAR C.Text to image synthesis using generative adversarial networks[EB/OL].(2018-05-02)[2019-07-08]. [2] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al.Generative adversarial nets[C]//Advances in Neural Information Processing Systems.Cambridge: MIT Press, 2014: 2672-2680. [3] ZHANG H, XU T, LI H, et al.Stackgan++: Realistic image synthesis with stacked generative adversarial networks[EB/OL].(2018-06-28)[2019-07-08]. [4] SALIMANS T, GOODFELLOW I, ZAREMBA W, et al.Improved techniques for training gans[C]//Advances in Neural Information Processing Systems.Cambridge: MIT Press, 2016: 2234-2242. [5] LI Z, TANG J, MEI T.Deep collaborative embedding for social image understanding[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(9):2070-2083. [6] DENG Y, LOY C C, TANG X.Image aesthetic assessment:An experimental survey[J].IEEE Signal Processing Magazine, 2017, 34(4):80-106. doi: 10.1109/MSP.2017.2696576 [7] DATTA R, JOSHI D, LI J, et al.Studying aesthetics in photographic images using a computational approach[C]//European Conference on Computer Vision.Berlin: Springer, 2006: 288-301. doi: 10.1007/11744078_23 [8] KRIZHEVSKY A, SUTSKEVER I, HINTON G E.ImageNet classification with deep convolutional neural networks[C]//Advances in Neural Information Processing Systems.Cambridge: MIT Press, 2012: 1097-1105. doi: 10.5555/2999134.2999257 [9] KONG S, SHEN X, LIN Z, et al.Photo aesthetics ranking network with attributes and content adaptation[C]//European Conference on Computer Vision.Berlin: Springer, 2016: 662-679. doi: 10.1007%2F978-3-319-46448-0_40 [10] CHOPRA S, HADSELL R, LECUN Y.Learning a similarity metric discriminatively, with application to face verification[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2005: 539-546. doi: 10.1109/CVPR.2005.202 [11] RADFORD A, METZ L, CHINTALA S.Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL].(2016-01-07)[2019-07-08]. [12] SALIMANS T, GOODFELLOW I, ZAREMBA W, et al.Improved techniques for training gans[C]//Advances in Neural Information Processing Systems.Cambridge: MIT Press, 2016: 2234-2242. [13] ARJOVSKY M, CHINTALA S, BOTTOU L.Wasserstein gan[EB/OL].(2017-12-06)[2019-07-08]. [14] MIRZA M, OSINDERO S.Conditional generative adversarial nets[EB/OL].(2014-11-06)[2019-07-08]. [15] REED S, AKATA Z, YAN X, et al.Generative adversarial text to image synthesis[EB/OL].(2016-06-05)[2019-07-08]. [16] ZHANG H, XU T, LI H, et al.Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision.Piscataway, NJ: IEEE Press, 2017: 5907-5915. [17] XU T, ZHANG P, HUANG Q, et al.Attngan: Fine-grained text to image generation with attentional generative adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2018: 1316-1324. [18] CHA M, GWON Y, KUNG H T.Adversarial nets with perceptual losses for text-to-image synthesis[C]//2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).Piscataway, NJ: IEEE Press, 2017: 1-6. [19] JOHNSON J, ALAHI A, LI F.Perceptual losses for real-time style transfer and super-resolution[C]//European Conference on Computer Vision.Berlin: Springer, 2016: 694-711. doi: 10.1007/978-3-319-46475-6_43 [20] SZEGEDY C, VANHOUCKE V, IOFFE S, et al.Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2016: 2818-2826. 期刊类型引用(24)
1. 黄润琴,苏珉,刘佳,王涛. 基于小波变换与奇异值分解的飞鸟动态电磁散射特征提取. 广西师范大学学报(自然科学版). 2024(04): 74-89 . 百度学术
2. 刘闪亮,吴仁彪,屈景怡,乔晗,何雨龙. Bi-PPYOLO tiny:一种轻量型的机场无人机检测方法. 安全与环境学报. 2023(02): 480-488 . 百度学术
3. 韩虎虎,祁鑫,王鹤飞,李惠翔,齐吉祥. 基于无人机巡检的光伏面板缺陷识别方法. 电子设计工程. 2023(06): 176-179+184 . 百度学术
4. 薛珊,陈宇超,吕琼莹,曹国华. 基于双支路神经网络的无人机图像识别方法. 计算机仿真. 2023(07): 233-238 . 百度学术
5. 刘佳,徐群玉,陈唯实. 无人机雷达航迹运动特征提取及组合分类方法. 系统工程与电子技术. 2023(10): 3122-3131 . 百度学术
6. 苏志刚,王雪萌. 基于双层特征选择的空中目标分类算法研究. 激光与光电子学进展. 2022(02): 243-251 . 百度学术
7. 睢丙东,苗林星,于国庆. 基于深度学习的无人机目标检测系统. 科技风. 2022(06): 4-6 . 百度学术
8. 陈唯实,黄毅峰,陈小龙,卢贤锋,张洁. 机场探鸟雷达技术发展与应用综述. 航空学报. 2022(01): 184-204 . 百度学术
9. 张虹波,匡银虎. 基于齐次变换的低空无人机防撞导引控制方法. 计算机仿真. 2022(06): 42-46 . 百度学术
10. 何炜琨,柳振明,王晓亮. 微动特征和运动特征融合处理的鸟与旋翼无人机目标辨别方法. 电子测量与仪器学报. 2022(07): 33-43 . 百度学术
11. Weiming TIAN,Linlin FANG,Rui WANG,Weidong LI,Chao ZHOU,Cheng HU. A robust tracking method focusing on target fluctuation and maneuver characteristics. Science China(Information Sciences). 2022(11): 263-280 . 必应学术
12. 刘雷,刘大卫,王晓光,陈俊男,刘东兴. 无人机集群与反无人机集群发展现状及展望. 航空学报. 2022(S1): 4-20 . 百度学术
13. 冯维婷,梁青. 闪烁现象下基于微动补偿的旋转叶片参数估计方法. 信号处理. 2022(12): 2617-2627 . 百度学术
14. 卢晓东,辛佳宁,顾嘉耀,王俊康,程承. 析构时间相关模型的面对称飞行器机动估计. 宇航学报. 2021(02): 167-174 . 百度学术
15. 陈唯实,黄毅峰,陈小龙,卢贤锋,张洁. 基于模型转换频率估计的低空目标分类. 系统工程与电子技术. 2021(04): 927-936 . 百度学术
16. 侯旋,薛飞,陈涛. 无人机目标检测量子多模式识别优化算法. 计算机工程与应用. 2021(07): 228-236 . 百度学术
17. 杨名宇,王浩,王含宇. 基于深度学习与光电探测的无人机防范技术. 液晶与显示. 2021(09): 1323-1330 . 百度学术
18. 苏子康,李春涛,余跃,徐忠楠,王宏伦. 绳系拖曳飞行器高抗扰轨迹跟踪控制. 北京航空航天大学学报. 2021(11): 2234-2248 . 本站查看
19. 屈旭涛,庄东晔,谢海斌. “低慢小”无人机探测方法. 指挥控制与仿真. 2020(02): 128-135 . 百度学术
20. 陈唯实,黄毅峰,陈小龙,卢贤锋. 机场净空区飞鸟与非合作无人机目标识别. 民航学报. 2020(03): 27-33 . 百度学术
21. 陈唯实,黄毅峰,卢贤锋. 多传感器融合的无人机探测技术应用综述. 现代雷达. 2020(06): 15-29 . 百度学术
22. 蒋镕圻,白若楷,彭月平. 低慢小无人机目标探测技术综述. 飞航导弹. 2020(09): 100-105 . 百度学术
23. 陈小龙,陈唯实,饶云华,黄勇,关键,董云龙. 飞鸟与无人机目标雷达探测与识别技术进展与展望. 雷达学报. 2020(05): 803-827 . 百度学术
24. 薛珊,李广青,吕琼莹,毛逸维. 基于卷积神经网络的反无人机系统声音识别方法. 工程科学学报. 2020(11): 1516-1524 . 百度学术
其他类型引用(18)
-