双重模态文本引导的图像修复算法

李海燕; 陈杰; 余鹏飞; 李海江; 张榆锋

doi:10.13700/j.bh.1001-5965.2021.0720

双重模态文本引导的图像修复算法

doi: 10.13700/j.bh.1001-5965.2021.0720

1.
云南大学信息学院，昆明 650000
2.
云南省交通投资建设集团有限公司，昆明 650000

基金项目: 国家自然科学基金(62266049,62066046)；云南省万人计划“教学名师”；云南省智能系统与计算重点实验室项目(202205AG070003)

详细信息

通讯作者:
E-mail：pfyu@ynu.edu.cn

中图分类号: TP391.4
计量
- 文章访问数: 2045
- HTML全文浏览量: 73
- PDF下载量: 132
- 被引次数: 0
出版历程
- 收稿日期: 2021-11-30
- 录用日期: 2022-01-16
- 网络出版日期: 2022-01-25
- 整期出版日期: 2023-10-31

Bimodal text-guided image inpainting algorithm

1.
School of Information Science and Engineering，Yunnan University，Kunming 650000，China
2.
Yunnan Communications Investment and Construction Group Co.，Ltd.，Kunming 650000，China

Funds: National Natural Science Foundation of China (62266049,62066046); “Famous Teacher” of Yunnan 10 000 Talents Program; Program of Yunnan Key Laboratory of Intelligent Systems and Computing (202205AG070003)

More Information

Corresponding author: E-mail：pfyu@ynu.edu.cn

摘要

摘要:
为解决现有图像修复算法因缺乏足够的上下文信息导致修复大面积破损时效果差且修复结果不可控的缺陷，提出了双重模态文本引导的图像修复算法。引入文本标签作为修复的控制引导，确保修复结果的整体与区域一致，并增加修复的可控多样性。设计双重模态掩码注意力机制提取破损区域的语义信息；通过深度文本图像融合模块加深生成器中的文本图像融合过程，并应用图像文本匹配损失最大化生成图像与文本之间的语义相似度；采用投射鉴别器训练生成图像与真实图像增强修复图像的真实性。在2个带有文本标签的数据集上进行定量和定性实验，结果表明：生成的修复图像与引导文本描述一致，可根据不同的文字描述生成多样的结果。
- 图像修复 /
- 文本引导 /
- 注意力机制 /
- 文本图像融合 /
- 图像文本匹配损失
Abstract:
A bimodal text-guided image inpainting model is proposed to address shortcomings of the existing image restoration algorithms, such as the restored results are poor and uncontrollable when repairing large areas of distortions due to lack of sufficient contextual information. The proposed algorithm introduces text labels as the control guide for restoration to ensure the overall and regional consistency of the inpainted results and to increase the controllable diversity of the results. Firstly, a dual bi-modal mask attention mechanism is designed to extract semantic information from the damaged region. Subsequently, the text image fusion process in the generator is deepened by a deep text-image fusion module, and the image-text matching loss is applied to maximize the semantic similarity between the generated images and the text. Finally, a projection discriminator is used to train the generated image with the original image to enhance the authenticity of the restored image. Quantitative and qualitative experiments are conducted on two datasets with textual labels. The experimental results demonstrate that the repaired images are consistent with the guidance text description, and various results can be generated according to various textual descriptions.
- image inpainting /
- text guidance /
- attention mechanism /
- text-image fusion /
- image-text matching loss

HTML全文

图 1 双重模态文本引导的图像修复模型

Figure 1. Bimodal text-guided image inpainting model

下载: 全尺寸图片幻灯片

图 2 对偶概率结构生成模型

Figure 2. Pairwise probabilistic structural generation module

下载: 全尺寸图片幻灯片

图 3 深度文本图像融合模块

Figure 3. Deep text image fusion module

下载: 全尺寸图片幻灯片

图 4 双重模态掩码注意力机制

Figure 4. Dual modal mask attention mechanism

下载: 全尺寸图片幻灯片

图 5 对CUB数据集叠加矩形掩码的图像修复结果对比

Figure 5. Comparison of image inpainting results of superimposed rectangular mask on CUB dataset

下载: 全尺寸图片幻灯片

图 6 对COCO数据集叠加矩形掩码的图像修复结果对比

Figure 6. Comparison of image inpainting results of superimposed rectangular masks on COCO dataset

下载: 全尺寸图片幻灯片

图 7 对CUB数据集叠加不规则掩码的修复结果对比

Figure 7. Comparison of image inpainting of superimposed irregular mask on CUB dataset

下载: 全尺寸图片幻灯片

图 8 对COCO数据集叠加不规则掩码的修复结果对比

Figure 8. Comparison of image inpainting results of superimposed irregular masks on COCO dataset

下载: 全尺寸图片幻灯片

图 9 消融实验图像修复结果对比

Figure 9. Comparison of image inpainting results of ablation experiments

下载: 全尺寸图片幻灯片

图 10 采用不同模型消融实验定量评价结果

Figure 10. Quantitative evaluation results of ablation experiments using different models

下载: 全尺寸图片幻灯片

图 11 可控制的修复实验

Figure 11. Controllable repair experiment

下载: 全尺寸图片幻灯片

表 1 对规则掩码在CUB/COCO数据集上定量对比

Table 1. Quantitative comparison of rule masks on CUB/COCO datasets

修复算法	峰值信噪比/dB		结构相似度		平均绝对误差
修复算法	CUB	COCO	CUB	COCO	CUB	COCO
PICNet^[20]	20.52	15.71	0.773	0.762	36.09	29.85
TDANet^[22]	21.99	16.64	0.781	0.766	33.57	33.85
CTSDG^[30]	21.86	16.62	0.769	0.758	33.36	30.67
本文算法	25.47	17.11	0.808	0.768	32.63	28.89

下载: 导出CSV

表 2 对不规则掩码在CUB/COCO数据集上定量对比

Table 2. Quantitative comparison of irregular masks on CUB/COCO datasets

修复算法	峰值信噪比/dB		结构相似度		平均绝对误差
修复算法	CUB	COCO	CUB	COCO	CUB	COCO
PICNet^[20]	22.96	22.89	0.827	0.816	27.47	29.64
TDANet^[22]	23.08	23.01	0.837	0.832	26.40	28.37
CTSDG^[30]	24.07	23.68	0.839	0.835	26.36	27.69
本文算法	24.36	24.25	0.841	0.843	25.87	26.45

下载: 导出CSV

表 3 消融实验定量对比

Table 3. Quantitative comparison of ablation experiments

修复算法	峰值信噪比/dB	结构相似度	平均绝对误差
基础模型	22.15	0.817	33.82
双重模态掩码注意力模块	22.31	0.818	32.27
深度文本图像融合模块	22.60	0.822	31.52
完整模型	25.77	0.847	27.54

下载: 导出CSV

参考文献(30)

[1]	BERTALMIO M, SAPIRO G, CASELLES V, et al. Image inpainting[C]//Proceedings of the ACM SIGGRAPH Conference on Computer Graphics. New York: ACM, 2000: 417-424.
[2]	胡循勇, 杨晓梅, 李昊怡, 等. 融合低秩和稀疏先验的结构性缺失图像修复[J]. 北京航空航天大学学报, 2022, 48(5): 855-862. HUX Y, YANG X M, LI H Y, et al. Structural missing image inpainting based on low rank and sparse prior[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(5): 855-862(in Chinese).
[3]	WANG X, ZHANG D, XING G. A novel image recovery method based on discrete cosine transform and matched blocks[J]. Nonlinear Dynamics, 2013, 73(3): 1945-1954. doi: 10.1007/s11071-013-0915-7
[4]	王兴元, 张继明. 一种基于混沌和汉明码的数字图像篡改检测及修复算法[J]. 物理学报, 2014, 63(2): 73-83. WANG X Y, ZHANG J M. A novel image authentication and recovery algorithm based on chaos and Hamming code[J]. Acta Physica Sinica, 2014, 63(2): 73-83(in Chinese).
[5]	XIA Z, WANG X, WANG C, et al. Subpixel-based accurate and fast dynamic tumor image recognition[J]. Journal of Medical Imaging and Health Informatics, 2018, 8(5): 925-931. doi: 10.1166/jmihi.2018.2390
[6]	LIN Z, ZHAO Z, ZHANG Z, et al. Weakly-supervised video moment retrieval via semantic completion network[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020, 34(7): 11539-11546.
[7]	PATHAK D, KRAHENBUHL P, DONAHUE J, et al. Context encoders: Feature learning by inpainting[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2536-2544.
[8]	IIZUKA S, SIMO-SERRA E, ISHIKAWA H. Globally and locally consistent image completion[J]. ACM Transactions on Graphics, 2017, 36(4): 1-14.
[9]	MA Y, LIU X, BAI S, et al. Coarse-to-fine image inpainting via region-wise convolutions and non local correlation[C]//Proceedings of the IJCAI International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann, 2019: 3123-3129.
[10]	SHIN Y G, SAGONG M C, YEO Y J, et al. PEPSI++: Fast and lightweight network for image inpainting[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(1): 252-265.
[11]	LIU H , JIANG B, XIAO Y, et al. Coherent semantic attention for image inpainting[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 4170-4179.
[12]	WANG N, WANG W, HU W, et al. Thank a mural inpainting based on multi-scale adaptive partial convolution and stroke-like mask[J]. IEEE Transactions on Image Processing, 2021, 30: 3720-3733. doi: 10.1109/TIP.2021.3064268
[13]	YAN Z, LI X, LI M, et al. Shift-Net: Image inpainting via deep feature rearrangement[C]//Proceedings of the European Conference on the Computer Vision. Berlin: Springer, 2018, 1: 1-17.
[14]	YU J, LIN Z, YANG J, et al. Free-form image inpainting with gated convolution[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 4471-4480.
[15]	李海燕, 黄和富, 郭磊, 等. 基于残缺图像样本的生成对抗网络图像修复方法[J]. 北京航空航天大学学报, 2021, 47(10): 1949-1958. LI H Y, HUANG H F, GUO L, et al. Image inpainting method based on incomplete image samples in generative adversarial network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(10): 1949-1958(in Chinese).
[16]	李海燕, 吴自莹, 郭磊, 等. 基于混合空洞卷积网络的多鉴别器图像修复[J]. 华中科技大学学报(自然科学版), 2021, 49(3): 40-45. LI H Y, WU Z Y, GUO L, et al. Multidiscriminator image inpainting algorithm based on hybrid dilated convolution network[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2021, 49(3): 40-45(in Chinese).
[17]	徐涛, 周纪勇, 张国梁, 等. 场内外特征融合的残缺图像精细修复[J]. 光学精密工程, 2021, 29(10): 2481-2494. doi: 10.37188/OPE.20212910.2481 XU T, ZHOU J Y, ZHANG G L, et al. Fine restoration of incomplete image with external features and image features[J]. Optics and Precision Engineering, 2021, 29(10): 2481-2494(in Chinese). doi: 10.37188/OPE.20212910.2481
[18]	张雪菲, 程乐超, 白升利, 等. 基于变分自编码器的人脸图像修复[J]. 计算机辅助设计与图形学学报, 2020, 32(3): 401-409. ZHANG X F, CHEN L C, BAI S L, et al. Face image restoration based on variational autoencoder[J]. Journal of Computer Aided Design and Graphics, 2020, 32(3): 401-409(in Chinese).
[19]	WALKER J, DOERSCH C, GUPTA A, et al. An uncertain future: Forecasting from static images using variational autoencoders[C]//Proceedings of the European Conference on the Computer Vision. Berlin: Springer, 2016: 835-851.
[20]	ZHANG C, CHAM T J, CAI J. Pluralistic image completion[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1438-1447.
[21]	NAZERI, NG E, JOSEPH T, et al. EdgeConnect: Generative image inpainting with adversarial edge learning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 3265-3274.
[22]	ZHANG L, CHEN Q, HU B, et al. Text-guided neural image inpainting[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1302-1310.
[23]	SOHN K, LEE H, YAN X. Learning structured output representation using deep conditional generative models[J]. Advances in Neural Information Processing Systems, 2015, 28: 3483-3491.
[24]	ZHANG H, XU T, LI H, et al. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 5907-5915.
[25]	DOERSCH C. Tutorial on variational autoencoders[EB/OL]. (2021-01-03)[2021-11-01].
[26]	BAO J, CHEN D, WEN F, et al. CVAE-GAN: Fine-grained image generation through asymm-etric training[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2745-2754.
[27]	XU T, ZHANG P, HUANG Q, et al. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1316-1324.
[28]	DUAN K, PARIKH D, CRANDALL D, et al. Discovering localized attributes for fine-grained recognition[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2012: 3474-3481.
[29]	LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//Proceedings of the European Conference on the Computer Vision. Berlin: Springer, 2014: 740-755.
[30]	GUO X, YANG H, HUANG D. Image inpainting via conditional texture and structure dual generation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 14134-14143.