Journal of Beijing University of Aeronautics and Astronautics ›› 2019, Vol. 45 ›› Issue (12): 2438-2448. doi: 10.13700/j.bh.1001-5965.2019.0366

• Article

Text-to-image synthesis optimization based on aesthetic assessment

XU Tianyu, WANG Zhi

  1. Department of Computer Science and Technology, Tsinghua University, Shenzhen 518055, China
  • Received: 2019-07-09 Online: 2019-12-20 Published: 2019-12-31
  • Corresponding author: WANG Zhi E-mail: wangzhi@sz.tsinghua.edu.cn
  • About the authors: XU Tianyu, male, master's student. Main research interests: data mining and deep learning; WANG Zhi, male, Ph.D., associate professor. Main research interest: multimedia networks.
  • Supported by:
    National Natural Science Foundation of China (61872215, 61531006)

Abstract: Driven by the emergence and development of generative adversarial networks (GANs), research on text-to-image synthesis has made considerable progress. However, most of this work focuses on improving the stability and resolution of generated images, and few studies address their aesthetic quality. Meanwhile, image aesthetic assessment, another classic task in computer vision, has also benefited from deep neural networks, which have yielded several assessment models with fairly reliable results. In this work, we use an aesthetic assessment model to modify a text-to-image GAN, with the aim of improving the aesthetic quality of the images it generates. We first take StackGAN++, a state-of-the-art text-to-image synthesis model, and evaluate its generated images with a chosen aesthetic assessment model; we then construct a new loss function, the aesthetic loss, from the assessment model and use it to optimize StackGAN++. Compared with the original model, the total aesthetic score of the generated images improves by 3.17% and the Inception Score improves by 2.68%, indicating that the proposed optimization is effective to some extent, although it still has weaknesses and room for improvement.

Key words: text-to-image synthesis, generative adversarial network (GAN), aesthetic assessment, StackGAN++, aesthetic loss
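
The abstract does not state the form of the aesthetic loss, so the following is only a minimal sketch under stated assumptions: the assessment model is taken to return one score in [0, 1] per generated image, the aesthetic loss is assumed to be the mean shortfall from a perfect score, and it is added to the generator's adversarial loss with a weight `beta`. The function names, the weight `beta`, and the example scores are all hypothetical and are not taken from the paper or from StackGAN++.

```python
from statistics import mean


def aesthetic_loss(scores):
    """Mean shortfall from a perfect score.

    Assumption: each score comes from an aesthetic assessment model
    and lies in [0, 1], where 1 is the most pleasing image.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    return mean(1.0 - s for s in scores)


def generator_loss(adversarial_loss, scores, beta=1.0):
    """Adversarial loss plus a weighted aesthetic term.

    `beta` is a hypothetical trade-off weight, not a value from the paper.
    """
    return adversarial_loss + beta * aesthetic_loss(scores)


def relative_gain(baseline, improved):
    """Percentage improvement of `improved` over `baseline`."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    batch_scores = [0.5, 0.7]  # made-up scores for two generated images
    print(generator_loss(1.0, batch_scores, beta=0.5))
```

Under these assumptions, a larger `beta` pushes the generator harder toward images the assessment model scores highly, at the cost of weakening the adversarial objective. `relative_gain` shows how a percentage improvement such as those reported in the abstract would be computed from a baseline and an improved metric value.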
