留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

双重模态文本引导的图像修复算法

李海燕 陈杰 余鹏飞 李海江 张榆锋

李海燕,陈杰,余鹏飞,等. 双重模态文本引导的图像修复算法[J]. 北京航空航天大学学报,2023,49(10):2547-2557 doi: 10.13700/j.bh.1001-5965.2021.0720
引用本文: 李海燕,陈杰,余鹏飞,等. 双重模态文本引导的图像修复算法[J]. 北京航空航天大学学报,2023,49(10):2547-2557 doi: 10.13700/j.bh.1001-5965.2021.0720
LI H Y,CHEN J,YU P F,et al. Bimodal text-guided image inpainting algorithm[J]. Journal of Beijing University of Aeronautics and Astronautics,2023,49(10):2547-2557 (in Chinese) doi: 10.13700/j.bh.1001-5965.2021.0720
Citation: LI H Y,CHEN J,YU P F,et al. Bimodal text-guided image inpainting algorithm[J]. Journal of Beijing University of Aeronautics and Astronautics,2023,49(10):2547-2557 (in Chinese) doi: 10.13700/j.bh.1001-5965.2021.0720

双重模态文本引导的图像修复算法

doi: 10.13700/j.bh.1001-5965.2021.0720
基金项目: 国家自然科学基金(62266049,62066046);云南省万人计划“教学名师”; 云南省智能系统与计算重点实验室项目(202205AG070003)
详细信息
    通讯作者:

    E-mail:pfyu@ynu.edu.cn

  • 中图分类号: TP391.4

Bimodal text-guided image inpainting algorithm

Funds: National Natural Science Foundation of China (62266049,62066046); “Famous Teacher” of Yunnan 10 000 Talents Program; Program of Yunnan Key Laboratory of Intelligent Systems and Computing (202205AG070003)
More Information
  • 摘要:

    为解决现有图像修复算法因缺乏足够的上下文信息导致修复大面积破损时效果差且修复结果不可控的缺陷,提出了双重模态文本引导的图像修复算法。引入文本标签作为修复的控制引导,确保修复结果的整体与区域一致,并增加修复的可控多样性。设计双重模态掩码注意力机制提取破损区域的语义信息;通过深度文本图像融合模块加深生成器中的文本图像融合过程,并应用图像文本匹配损失最大化生成图像与文本之间的语义相似度;采用投射鉴别器训练生成图像与真实图像增强修复图像的真实性。在2个带有文本标签的数据集上进行定量和定性实验,结果表明:生成的修复图像与引导文本描述一致,可根据不同的文字描述生成多样的结果。

     

  • 图 1  双重模态文本引导的图像修复模型

    Figure 1.  Bimodal text-guided image inpainting model

    图 2  对偶概率结构生成模型

    Figure 2.  Pairwise probabilistic structural generation module

    图 3  深度文本图像融合模块

    Figure 3.  Deep text image fusion module

    图 4  双重模态掩码注意力机制

    Figure 4.  Dual modal mask attention mechanism

    图 5  对CUB数据集叠加矩形掩码的图像修复结果对比

    Figure 5.  Comparison of image inpainting results of superimposed rectangular mask on CUB dataset

    图 6  对COCO数据集叠加矩形掩码的图像修复结果对比

    Figure 6.  Comparison of image inpainting results of superimposed rectangular masks on COCO dataset

    图 7  对CUB数据集叠加不规则掩码的修复结果对比

    Figure 7.  Comparison of image inpainting of superimposed irregular mask on CUB dataset

    图 8  对COCO数据集叠加不规则掩码的修复结果对比

    Figure 8.  Comparison of image inpainting results of superimposed irregular masks on COCO dataset

    图 9  消融实验图像修复结果对比

    Figure 9.  Comparison of image inpainting results of ablation experiments

    图 10  采用不同模型消融实验定量评价结果

    Figure 10.  Quantitative evaluation results of ablation experiments using different models

    图 11  可控制的修复实验

    Figure 11.  Controllable repair experiment

    表  1  对规则掩码在CUB/COCO数据集上定量对比

    Table  1.   Quantitative comparison of rule masks on CUB/COCO datasets

    修复算法峰值信噪比/dB结构相似度平均绝对误差
    CUBCOCOCUBCOCOCUBCOCO
    PICNet[20]20.5215.710.7730.76236.0929.85
    TDANet[22]21.9916.640.7810.76633.5733.85
    CTSDG[30]21.8616.620.7690.75833.3630.67
    本文算法25.4717.110.8080.76832.6328.89
    下载: 导出CSV

    表  2  对不规则掩码在CUB/COCO数据集上定量对比

    Table  2.   Quantitative comparison of irregular masks on CUB/COCO datasets

    修复算法峰值信噪比/dB结构相似度平均绝对误差
    CUBCOCOCUBCOCOCUBCOCO
    PICNet[20]22.9622.890.8270.81627.4729.64
    TDANet[22]23.0823.010.8370.83226.4028.37
    CTSDG[30]24.0723.680.8390.83526.3627.69
    本文算法24.3624.250.8410.84325.8726.45
    下载: 导出CSV

    表  3  消融实验定量对比

    Table  3.   Quantitative comparison of ablation experiments

    修复算法峰值信噪比/dB结构相似度平均绝对误差
    基础模型22.150.81733.82
    双重模态掩码
    注意力模块
    22.310.81832.27
    深度文本图像
    融合模块
    22.600.82231.52
    完整模型25.770.84727.54
    下载: 导出CSV
  • [1] BERTALMIO M, SAPIRO G, CASELLES V, et al. Image inpainting[C]//Proceedings of the ACM SIGGRAPH Conference on Computer Graphics. New York: ACM, 2000: 417-424.
    [2] 胡循勇, 杨晓梅, 李昊怡, 等. 融合低秩和稀疏先验的结构性缺失图像修复[J]. 北京航空航天大学学报, 2022, 48(5): 855-862.

    HUX Y, YANG X M, LI H Y, et al. Structural missing image inpainting based on low rank and sparse prior[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(5): 855-862(in Chinese).
    [3] WANG X, ZHANG D, XING G. A novel image recovery method based on discrete cosine transform and matched blocks[J]. Nonlinear Dynamics, 2013, 73(3): 1945-1954. doi: 10.1007/s11071-013-0915-7
    [4] 王兴元, 张继明. 一种基于混沌和汉明码的数字图像篡改检测及修复算法[J]. 物理学报, 2014, 63(2): 73-83.

    WANG X Y, ZHANG J M. A novel image authentication and recovery algorithm based on chaos and Hamming code[J]. Acta Physica Sinica, 2014, 63(2): 73-83(in Chinese).
    [5] XIA Z, WANG X, WANG C, et al. Subpixel-based accurate and fast dynamic tumor image recognition[J]. Journal of Medical Imaging and Health Informatics, 2018, 8(5): 925-931. doi: 10.1166/jmihi.2018.2390
    [6] LIN Z, ZHAO Z, ZHANG Z, et al. Weakly-supervised video moment retrieval via semantic completion network[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020, 34(7): 11539-11546.
    [7] PATHAK D, KRAHENBUHL P, DONAHUE J, et al. Context encoders: Feature learning by inpainting[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2536-2544.
    [8] IIZUKA S, SIMO-SERRA E, ISHIKAWA H. Globally and locally consistent image completion[J]. ACM Transactions on Graphics, 2017, 36(4): 1-14.
    [9] MA Y, LIU X, BAI S, et al. Coarse-to-fine image inpainting via region-wise convolutions and non local correlation[C]//Proceedings of the IJCAI International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann, 2019: 3123-3129.
    [10] SHIN Y G, SAGONG M C, YEO Y J, et al. PEPSI++: Fast and lightweight network for image inpainting[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(1): 252-265.
    [11] LIU H , JIANG B, XIAO Y, et al. Coherent semantic attention for image inpainting[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 4170-4179.
    [12] WANG N, WANG W, HU W, et al. Thank a mural inpainting based on multi-scale adaptive partial convolution and stroke-like mask[J]. IEEE Transactions on Image Processing, 2021, 30: 3720-3733. doi: 10.1109/TIP.2021.3064268
    [13] YAN Z, LI X, LI M, et al. Shift-Net: Image inpainting via deep feature rearrangement[C]//Proceedings of the European Conference on the Computer Vision. Berlin: Springer, 2018, 1: 1-17.
    [14] YU J, LIN Z, YANG J, et al. Free-form image inpainting with gated convolution[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 4471-4480.
    [15] 李海燕, 黄和富, 郭磊, 等. 基于残缺图像样本的生成对抗网络图像修复方法[J]. 北京航空航天大学学报, 2021, 47(10): 1949-1958.

    LI H Y, HUANG H F, GUO L, et al. Image inpainting method based on incomplete image samples in generative adversarial network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(10): 1949-1958(in Chinese).
    [16] 李海燕, 吴自莹, 郭磊, 等. 基于混合空洞卷积网络的多鉴别器图像修复[J]. 华中科技大学学报(自然科学版), 2021, 49(3): 40-45.

    LI H Y, WU Z Y, GUO L, et al. Multidiscriminator image inpainting algorithm based on hybrid dilated convolution network[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2021, 49(3): 40-45(in Chinese).
    [17] 徐涛, 周纪勇, 张国梁, 等. 场内外特征融合的残缺图像精细修复[J]. 光学精密工程, 2021, 29(10): 2481-2494. doi: 10.37188/OPE.20212910.2481

    XU T, ZHOU J Y, ZHANG G L, et al. Fine restoration of incomplete image with external features and image features[J]. Optics and Precision Engineering, 2021, 29(10): 2481-2494(in Chinese). doi: 10.37188/OPE.20212910.2481
    [18] 张雪菲, 程乐超, 白升利, 等. 基于变分自编码器的人脸图像修复[J]. 计算机辅助设计与图形学学报, 2020, 32(3): 401-409.

    ZHANG X F, CHEN L C, BAI S L, et al. Face image restoration based on variational autoencoder[J]. Journal of Computer Aided Design and Graphics, 2020, 32(3): 401-409(in Chinese).
    [19] WALKER J, DOERSCH C, GUPTA A, et al. An uncertain future: Forecasting from static images using variational autoencoders[C]//Proceedings of the European Conference on the Computer Vision. Berlin: Springer, 2016: 835-851.
    [20] ZHANG C, CHAM T J, CAI J. Pluralistic image completion[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 1438-1447.
    [21] NAZERI, NG E, JOSEPH T, et al. EdgeConnect: Generative image inpainting with adversarial edge learning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 3265-3274.
    [22] ZHANG L, CHEN Q, HU B, et al. Text-guided neural image inpainting[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York: ACM, 2020: 1302-1310.
    [23] SOHN K, LEE H, YAN X. Learning structured output representation using deep conditional generative models[J]. Advances in Neural Information Processing Systems, 2015, 28: 3483-3491.
    [24] ZHANG H, XU T, LI H, et al. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 5907-5915.
    [25] DOERSCH C. Tutorial on variational autoencoders[EB/OL]. (2021-01-03)[2021-11-01].https://arxiv.org/abs/1606.05908v1.
    [26] BAO J, CHEN D, WEN F, et al. CVAE-GAN: Fine-grained image generation through asymm-etric training[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2745-2754.
    [27] XU T, ZHANG P, HUANG Q, et al. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1316-1324.
    [28] DUAN K, PARIKH D, CRANDALL D, et al. Discovering localized attributes for fine-grained recognition[C]//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2012: 3474-3481.
    [29] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//Proceedings of the European Conference on the Computer Vision. Berlin: Springer, 2014: 740-755.
    [30] GUO X, YANG H, HUANG D. Image inpainting via conditional texture and structure dual generation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 14134-14143.
  • 加载中
图(11) / 表(3)
计量
  • 文章访问数:  1783
  • HTML全文浏览量:  32
  • PDF下载量:  99
  • 被引次数: 0
出版历程
  • 收稿日期:  2021-11-30
  • 录用日期:  2022-01-16
  • 网络出版日期:  2022-01-25
  • 整期出版日期:  2023-10-31

目录

    /

    返回文章
    返回
    常见问答