Citation: XU Tianyu, WANG Zhi. Text-to-image synthesis optimization based on aesthetic assessment[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(12): 2438-2448. doi: 10.13700/j.bh.1001-5965.2019.0366 (in Chinese)
With the development of generative adversarial networks (GANs), considerable progress has been made in text-to-image synthesis. However, most research has focused on improving the stability and resolution of generated images rather than their aesthetic quality. Image aesthetic assessment, meanwhile, is a classic task in computer vision, and several state-of-the-art assessment models already exist. In this work, we propose to improve the aesthetic quality of images generated by a text-to-image GAN by incorporating an image aesthetic assessment model into a conditional GAN. We take StackGAN++, a state-of-the-art text-to-image synthesis model, assess the aesthetic quality of its generated images with a chosen assessment model, define a new loss function, the aesthetic loss, and use it to improve StackGAN++. Compared with the original model, the total aesthetic score of the generated images improves by 3.17% and the inception score by 2.68%, indicating that the proposed optimization is effective, although several weaknesses remain to be addressed in future work.
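As a rough illustration of the idea, the following PyTorch sketch adds an aesthetic term to the generator objective of a conditional GAN. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `generator`, `discriminator`, `aesthetic_model`, and the weight `lambda_aes` are illustrative placeholders, and a single generator/discriminator pair stands in for the multi-stage stack of StackGAN++.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch (not the paper's code): one generator update for a
# conditional text-to-image GAN whose objective is augmented with an
# "aesthetic loss" produced by a frozen, pretrained assessment network.

def generator_step(generator, discriminator, aesthetic_model, optimizer,
                   noise, text_embedding, lambda_aes=0.05):
    fake_images = generator(noise, text_embedding)

    # Conditional adversarial term: the generator tries to make the
    # discriminator label its (image, text) pair as real.
    logits = discriminator(fake_images, text_embedding)
    adv_loss = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))

    # Aesthetic term: the scorer rates each generated image; minimizing the
    # negative mean score pushes the generator toward higher-rated images.
    # Gradients flow through the scorer back to the generator even though
    # the scorer's own parameters are frozen.
    aes_loss = -aesthetic_model(fake_images).mean()

    loss = adv_loss + lambda_aes * aes_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# The assessment network would be frozen once, before training:
#   for p in aesthetic_model.parameters():
#       p.requires_grad_(False)
```

The weighting factor (here the assumed `lambda_aes`) matters in such a setup: too large a value lets the aesthetic term dominate the adversarial signal and can pull generated images away from the text condition.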