Volume 45 Issue 12
Dec. 2019
XU Tianyu, WANG Zhi. Text-to-image synthesis optimization based on aesthetic assessment[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(12): 2438-2448. doi: 10.13700/j.bh.1001-5965.2019.0366 (in Chinese)

Text-to-image synthesis optimization based on aesthetic assessment

doi: 10.13700/j.bh.1001-5965.2019.0366
Funds:

National Natural Science Foundation of China 61872215

National Natural Science Foundation of China 61531006

  • Corresponding author: WANG Zhi, E-mail: wangzhi@sz.tsinghua.edu.cn
  • Received Date: 09 Jul 2019
  • Accepted Date: 14 Aug 2019
  • Publish Date: 20 Dec 2019

Abstract: With the development of generative adversarial networks (GANs), much progress has been made in text-to-image synthesis. However, most research focuses on improving the stability and resolution of generated images rather than their aesthetic quality. Meanwhile, image aesthetic assessment is a classic task in computer vision, and several state-of-the-art models for it currently exist. In this work, we propose to improve the aesthetic quality of images generated by a text-to-image GAN by incorporating an image aesthetic assessment model into a conditional GAN. We choose StackGAN++, a state-of-the-art text-to-image synthesis model, and assess the aesthetic quality of the images it generates with a selected aesthetic assessment model. We then define a new loss function, the aesthetic loss, and use it to improve StackGAN++. Compared with the original model, the total aesthetic score of the generated images improves by 3.17% and the inception score improves by 2.68%, indicating that the proposed optimization is effective, although it still has several weaknesses that can be addressed in future work.
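The abstract describes adding an aesthetic loss to StackGAN++ but does not give its formula. As a minimal sketch only, assuming a pretrained aesthetic assessment model that returns one score per generated image on a hypothetical 0 to 10 scale, the aesthetic loss below is taken to be the mean shortfall from the maximum score and is added to the adversarial generator loss with a hypothetical trade-off weight `weight`. The names `aesthetic_loss`, `generator_objective`, `MAX_SCORE`, and the linear weighting are illustrative assumptions, not the authors' formulation. The `inception_score` function follows the standard definition, IS = exp(E_x[KL(p(y|x) || p(y))]); in practice the class probabilities come from an Inception network, whereas here they are passed in as plain lists.

```python
import math
from typing import List, Sequence

# ASSUMPTION: scores come from a pretrained aesthetic assessment model
# (not shown) on a hypothetical 0-10 scale; higher means more aesthetic.
MAX_SCORE = 10.0


def aesthetic_loss(scores: Sequence[float], max_score: float = MAX_SCORE) -> float:
    """Hypothetical aesthetic loss: mean shortfall of the predicted
    aesthetic scores from the best attainable score (0 when all are max)."""
    if len(scores) == 0:
        raise ValueError("scores must be non-empty")
    return sum(max_score - s for s in scores) / len(scores)


def generator_objective(adv_loss: float, scores: Sequence[float],
                        weight: float = 0.1) -> float:
    """Adversarial generator loss plus a weighted aesthetic term.
    ASSUMPTION: simple linear weighting; `weight` is a hypothetical hyperparameter."""
    return adv_loss + weight * aesthetic_loss(scores)


def inception_score(probs: List[List[float]]) -> float:
    """Standard inception score: exp of the mean KL divergence between each
    conditional class distribution p(y|x) and the marginal p(y)."""
    n, k = len(probs), len(probs[0])
    # Marginal class distribution p(y), averaged over all generated samples.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Terms with p(y|x) = 0 contribute nothing to the KL divergence.
    mean_kl = sum(
        sum(pj * math.log(pj / mj) for pj, mj in zip(p, marginal) if pj > 0)
        for p in probs
    ) / n
    return math.exp(mean_kl)


if __name__ == "__main__":
    # Two generated images rated 4.0 and 6.0 by the (assumed) aesthetic model.
    print(aesthetic_loss([4.0, 6.0]))            # 5.0
    print(generator_objective(1.0, [4.0, 6.0]))  # 1.5
    # Confident, diverse predictions over two classes give the maximum IS of 2.
    print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
```

The linear penalty keeps the sketch simple; a real training loop would presumably compute these terms on tensors so that gradients from the aesthetic model can flow back into the generator.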

     


    Figures (5) / Tables (4)
