留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

联合语义分割和边缘纹理的人脸图像修复

石计亮 张乾 周遵富 杨思红

石计亮,张乾,周遵富,等. 联合语义分割和边缘纹理的人脸图像修复[J]. 北京航空航天大学学报,2026,52(6):2194-2207
引用本文: 石计亮,张乾,周遵富,等. 联合语义分割和边缘纹理的人脸图像修复[J]. 北京航空航天大学学报,2026,52(6):2194-2207
SHI J L,ZHANG Q,ZHOU Z F,et al. Face image inpainting combining semantic segmentation and edge texture[J]. Journal of Beijing University of Aeronautics and Astronautics,2026,52(6):2194-2207 (in Chinese)
Citation: SHI J L,ZHANG Q,ZHOU Z F,et al. Face image inpainting combining semantic segmentation and edge texture[J]. Journal of Beijing University of Aeronautics and Astronautics,2026,52(6):2194-2207 (in Chinese)

联合语义分割和边缘纹理的人脸图像修复

doi: 10.13700/j.bh.1001-5965.2024.0258
基金项目: 

贵州民族大学校级科研项目(GZMUZK[2021]YB23)

详细信息
    通讯作者:

    E-mail:gzmuzq@gzmu.edu.cn

  • 中图分类号: TP391.41

Face image inpainting combining semantic segmentation and edge texture

Funds: 

School-level Scientific Research Projects of Guizhou Minzu University (GZMUZK[2021]YB23)

More Information
  • 摘要:

    现有的图像修复方法通过预测辅助结构信息来填充逼真的补丁,但不准确的先验可能导致不合理的结构和模糊的纹理。同时,现有方法仅关注原始图像与修复图像之间的关系,未充分利用受损图像的信息。针对上述问题,提出一种端到端的Transformer人脸图像修复网络,利用语义分割和边缘纹理信息引导修复过程。其中,主修复网络包含一个RGB修复分支、2个用于语义分割和边缘纹理的辅助分支。在编码器中设计一组大核卷积上下文瓶颈(LKCCB)模块,以增加有效感受野和更好地上下文推理。为捕获遥远距离的上下文信息,提出嵌套动态辅助归一化多头注意力(NDAN-MHA)模块,其中,含有的动态辅助归一化(DAN)模块能够动态整合3个分支的结构特征,以此丰富语义一致性。此外,提出引入对比正则化(CR)网络来稳定和改进网络的训练,以生成更真实的修复图像。在CelebA-HQ和FFHQ数据集上进行定性和定量实验,结果表明:所提方法在主客观指标上均优于对比方法,能够合理地修复大面积不规则遮挡的人脸图像。

     

  • 图 1  网络整体框架

    Figure 1.  Overall framework of the network

    图 2  LKCCB模块

    Figure 2.  LKCCB module

    图 3  DAN模块

    Figure 3.  DAN module

    图 4  NDAN-MHA模块

    Figure 4.  NDAN-MHA module

    图 5  不同方法在2个数据集上的定性比较结果

    Figure 5.  Results of qualitative comparison of different methods on two datasets

    图 6  不同模块在2个数据集上的定性消融结果

    Figure 6.  Qualitative ablation results of different modules on two datasets

    图 7  LKCCB性能对比可视化

    Figure 7.  Visualization of LKCCB performance comparison

    图 8  CelebA-HQ数据集上瓶颈块的定性结果

    Figure 8.  Qualitative results of bottleneck blocks on the CelebA-HQ dataset

    图 9  NDAN-MHA模块中不同组合定性结果

    Figure 9.  Qualitative results of different combinations in the NDAN-MHA module

    图 10  损失函数定性消融结果

    Figure 10.  Loss function qualitative ablation results

    图 11  不同权重值对修复效果的定性结果

    Figure 11.  Qualitative results of different weighting values on the effectiveness of restoration

    表  1  不同方法在CelebA-HQ数据集上的定量比较结果

    Table  1.   Quantitative comparison results of different methods on the CelebA-HQ dataset

    方法PSNR/dB↑SSIM↑L1FID↓LPIPS↓
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    CTSDG[8]38.0033.1826.430.9840.9470.8460.0030.0080.0211.2582.89510.4750.0180.0590.161
    HAN[24]35.9331.4226.240.9740.9170.8330.0050.0110.0241.9844.2168.7390.0320.0980.165
    T-former[25]36.1431.4426.180.9740.9150.8300.0050.0110.0241.5724.6309.5760.0250.0980.166
    MMT[21]37.7132.7726.480.9820.9420.8440.0030.0080.0221.1642.7007.5840.0180.0580.141
    AOT[15]35.0731.5924.590.9630.9210.7920.0090.0140.0332.4274.64219.6540.0350.0850.223
    HINT[26]37.4631.4225.780.9810.9110.8130.0030.0100.0251.2705.63511.4350.0200.1130.193
    本文方法38.5633.4726.820.9850.9490.8520.0030.0080.0210.9322.0716.3180.0150.0490.129
     注:“↑”表示该评价指标的值越高,图像修复效果越好;“↓”表示该评价指标的值越低,图像修复效果越好;各项指标的最优值用粗体表示。
    下载: 导出CSV

    表  2  不同方法在FFHQ数据集上的定量比较结果

    Table  2.   Quantitative comparison results of different methods on the FFHQ dataset

    方法PSNR/dB↑SSIM↑L1FID↓LPIPS↓
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    掩码
    比例
    0.01~
    0.2
    掩码
    比例
    0.2~
    0.4
    掩码
    比例
    0.4~
    0.6
    PRVS[6]31.8026.1722.390.9090.7760.6370.0150.0280.04429.01142.73558.9210.1320.2570.357
    CTSDG[8]37.3631.7025.120.9810.9370.8220.0030.0080.0240.6132.12510.1160.0200.0690.182
    HAN[24]35.4029.7724.730.9710.9050.8060.0050.0130.0280.9304.1617.8490.0310.1040.181
    T-former[25]35.9830.2925.070.9740.9120.8150.0050.0120.0260.5462.3595.7930.0230.0860.163
    MMT[21]36.3330.9224.780.9770.9280.8150.0030.0090.0250.8232.9388.8450.0250.0800.177
    AOT[15]34.0730.3323.570.9520.9080.7700.0110.0160.0371.4983.88425.2730.0400.0900.244
    本文方法37.5231.8025.320.9810.9380.8310.0030.0080.0230.4731.5105.3300.0180.0590.149
     注:“↑”表示该评价指标的值越高,图像修复效果越好;“↓”表示该评价指标的值越低,图像修复效果越好;各项指标的最优值用粗体表示。
    下载: 导出CSV

    表  3  不同方法效率比较结果

    Table  3.   Comparative results of the efficiency of different methods

    方法 参数量 浮点运算速度/(109·s−1) 平均推理时间/ms
    PRVS[6] 56×106 44 14
    CTSDG[8] 52×106 92 25
    HAN[24] 19×106 138 16
    T-former[25] 15×106 51 22
    MMT[21] 51×106 98 22
    AOT[15] 15×106 73 20
    HINT[26] 72×106 222 112
    本文方法 61×106 93 21
    下载: 导出CSV

    表  4  不同模块在2个数据集上的定量消融结果

    Table  4.   Quantitative ablation results for different modules on two datasets

    序号LKCCB模块NDAN-MHA模块CR模块PSNR/dB↑SSIM↑L1FID↓LPIPS↓
    实验1×××27.34 / 25.840.865 / 0.8470.018 / 0.0206.026 / 7.3520.121 / 0.151
    实验2××27.54 / 26.060.868 / 0.8500.017 / 0.0205.947 / 6.3230.116 / 0.145
    实验3×27.67 / 26.130.870 / 0.8510.017 / 0.0205.697 / 6.1210.115 / 0.143
    实验4√(无负样本)27.98 / 26.380.875 / 0.8560.016 / 0.0194.902 / 4.2760.105 / 0.128
    本文方法28.09 / 26.590.878 / 0.8600.016 / 0.0184.575 / 3.6570.102 / 0.121
     注:“√”表示使用该模块;“×”表示未使用该模块;各指标值第1个为CelebA-HQ数据集的结果;第2个为FFHQ数据集的结果。
    下载: 导出CSV

    表  5  LKCCB中不同核大小K的消融实验

    Table  5.   Ablation experiments in LKCCB with different nucleus size K

    K d PSNR/dB↑ SSIM↑ L1 FID↓ LPIPS↓
    6 2 28.08 0.877 0.0162 4.635 0.1028
    15 3 28.05 0.877 0.0163 4.625 0.1026
    21 3 28.09 0.878 0.0162 4.575 0.1023
    28 4 28.08 0.878 0.0162 4.598 0.1030
    下载: 导出CSV

    表  6  CelebA-HQ数据集上瓶颈块的定量结果

    Table  6.   Quantitative results on bottleneck blocks on the CelebA-HQ dataset

    瓶颈块 PSNR/dB↑ SSIM↑ L1 FID↓ LPIPS↓
    RES@8 28.00 0.877 0.0164 4.783 0.1049
    AOT@8 27.97 0.876 0.0164 4.538 0.1032
    LKCCB@8
    (-DLKCB)
    28.06 0.877 0.0163 4.702 0.1037
    LKCCB@2 28.00 0.876 0.0164 4.654 0.1031
    LKCCB@4 28.06 0.877 0.0163 4.576 0.1026
    LKCCB@6 28.07 0.877 0.0162 4.556 0.1023
    LKCCB@8 28.09 0.878 0.0162 4.575 0.1023
     注:$ \mathrm{LKCCB}@L (L=2,  4,  6,  8) $表示使用$ L $个LKCCB模块进行实验;$ \mathrm{RES} $$ @8 $和$ \mathrm{AOT} @8 $分别表示8个ResNet模块和8个AOT模块;$ \mathrm{LKCCB}@8 $$ (-\mathrm{DLKCB}) $表示在8个LKCCB模块中去掉DLKCB进行实验。
    下载: 导出CSV

    表  7  NDAN-MHA模块中不同组合消融实验

    Table  7.   Ablation experiments with different combinations in the NDAN-MHA module

    序号 组合 PSNR/dB↑ SSIM↑ L1
    A NDAN-MHA+FFN[19] 27.96 0.877 0.0166
    B NDAN-MHA+拼接+Conv 27.98 0.876 0.0164
    C NDAN-MHA+AdaIN[28] 27.92 0.874 0.0166
    D NDAN-MHA+SPADE[17] 28.06 0.878 0.0163
    E NDAN-MHA+DAN 28.09 0.878 0.0162
    下载: 导出CSV

    表  8  损失函数定量消融结果

    Table  8.   Loss function quantitative ablation results

    损失函数 FID↓ LPIPS↓
    无损失函数 5.079 0.111
    去除感知损失 4.874 0.106
    去除风格+总变分损失 4.921 0.104
    去除风格+重构损失 5.107 0.108
    去除风格损失 4.834 0.102
    本文方法 4.575 0.102
    下载: 导出CSV

    表  9  不同权重值对修复效果的定量结果

    Table  9.   Quantitative results of different weighting values on the effectiveness of restoration

    权重设置 $ {\lambda }_{\text{edge}} $ $ {\lambda }_{\text{seg}} $ $ {\lambda }_{\text{contra}} $ PSNR/dB↑ FID↓ LPIPS↓
    设置1 1 1 1 27.84 5.187 0.109
    设置2 0.5 0.5 0.5 27.85 5.216 0.109
    设置3 0.5 0.5 2 28.03 4.741 0.104
    最终设置 0.5 0.5 1 28.09 4.575 0.102
    下载: 导出CSV
  • [1] NAZERI K, NG E, JOSEPH T, et al. EdgeConnect: structure guided image inpainting using edge prediction[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop. Piscataway: IEEE Press, 2019: 3265-3274.
    [2] YANG J, QI Z Q, SHI Y. Learning to incorporate structure knowledge for image inpainting[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020, 34(7): 12605-12612.
    [3] YAMASHITA Y, SHIMOSATO K, UKITA N. Boundary-aware image inpainting with multiple auxiliary cues[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE Press, 2022: 618-628.
    [4] 谭骏珊, 李雅芳, 秦姣华. 基于推理注意力机制的二阶段网络图像修复[J]. 电讯技术, 2022, 62(11): 1545-1553.

    TAN J S, LI Y F, QIN J H. Two-stage network image inpainting based on reasoning attention mechanism[J]. Telecommunication Engineering, 2022, 62(11): 1545-1553(in Chinese).
    [5] SHAO X R, YE H L, YANG B, et al. Two-stream coupling network with bidirectional interaction between structure and texture for image inpainting[J]. Expert Systems with Applications, 2023, 231: 120700.
    [6] LI J Y, HE F X, ZHANG L F, et al. Progressive reconstruction of visual structure for image inpainting[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 5961-5970.
    [7] XU S X, LIU D, XIONG Z W. E2I: Generative inpainting from edge to image[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(4): 1308-1322.
    [8] GUO X F, YANG H Y, HUANG D. Image inpainting via conditional texture and structure dual generation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2021: 14114-14123.
    [9] 陈晓雷, 杨佳, 梁其铎. 结合语义先验和深度注意力残差的图像修复[J]. 计算机科学与探索, 2023, 17(10): 2450-2461.

    CHEN X L, YANG J, LIANG Q D. Image inpainting combining semantic priors and deep attention residuals[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(10): 2450-2461(in Chinese).
    [10] HORITA D, YANG J L, CHEN D, et al. A structure-guided diffusion model for large-hole image completion[EB/OL]. (2023-09-06)[2024-05-06]. https://doi.org/10.48550/arXiv.2211.10437.
    [11] NICHOL A Q, DHARIWAL P, RAMESH A, et al. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models[C]//Proceedings of the 39th International Conference on Machine Learning. New York: PMLR Press, 2022: 16784-16804.
    [12] MA X, ZHOU X Q, HUANG H B, et al. Free-form image inpainting via contrastive attention network[C]//Proceedings of the 25th International Conference on Pattern Recognition. Piscataway: IEEE Press, 2021: 9242-9249.
    [13] ZUO Z W, ZHAO L, LI A L, et al. Generative image inpainting with segmentation confusion adversarial training and contrastive learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2023, 37(3): 3888-3896.
    [14] LUO P J, XIAO G Q, GAO X B, et al. LKD-net: large kernel convolution network for single image dehazing[C]//Proceedings of the IEEE International Conference on Multimedia and Expo. Piscataway: IEEE Press, 2023: 1601-1606.
    [15] ZENG Y, FU J, CHAO H, et al. Aggregated contextual transformations for high-resolution image inpainting[J]. IEEE Transactions on Visualization and Computer Graphics, 2023, 29(7): 3266-3280.
    [16] SUVOROV R, LOGACHEVA E, MASHIKHIN A, et al. Resolution-robust large mask inpainting with fourier convolutions[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2022: 3172-3182.
    [17] PARK T, LIU M Y, WANG T C, et al. Semantic image synthesis with spatially-adaptive normalization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 2332-2341.
    [18] MA N N, ZHANG X, LIU M Y, et al. Activate or not: learning customized activation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 8028-8038.
    [19] MA X Z, KONG X, WANG S N, et al. Luna: linear unified nested attention[EB/OL]. (2021-11-02)[2024-05-21]. https://doi.org/10.48550/arXiv.2106.01540.
    [20] LIU G L, REDA F A, SHIH K J, et al. Image inpainting for irregular holes using partial convolutions[C]//Proceedings of the 15th European Conference on Computer Vision. Berlin: Springer, 2018: 89-105.
    [21] YU Y S, DU D W, ZHANG L B, et al. Unbiased multi-modality guidance for image inpainting[C]//Proceedings of the 17th European Conference on Computer Vision. Berlin: Springer Press, 2022: 668-684.
    [22] KARRAS T, AILA T, LAINE S, et al. Progressive growing of GANs for improved quality, stability, and variation[EB/OL]. (2018-02-26)[2024-05-30]. https://doi.org/10.48550/arXiv.1710.10196.
    [23] KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4401-4410.
    [24] DENG Y, HUI S Q, MENG R Y, et al. Hourglass attention network for image inpainting[C]//Proceedings of the 17th European Conference on Computer Vision. Berlin: Springer, 2022: 483-501.
    [25] DENG Y, HUI S Q, ZHOU S P, et al. T-former: an efficient transformer for image inpainting[C]//Proceedings of the 30th ACM International Conference on Multimedia. New York: ACM Press, 2022: 6559-6568.
    [26] CHEN S, ATAPOUR-ABARGHOUEI A, SHUM H P H. HINT: high-quality inpainting transformer with mask-aware encoding and enhanced attention[J]. IEEE Transactions on Multimedia, 2024, 26: 7649-7660.
    [27] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
    [28] HUANG X, BELONGIE S. Arbitrary style transfer in real-time with adaptive instance normalization[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 1510-1519.
  • 加载中
图(11) / 表(9)
计量
  • 文章访问数:  243
  • HTML全文浏览量:  125
  • PDF下载量:  8
  • 被引次数: 0
出版历程
  • 收稿日期:  2024-04-26
  • 录用日期:  2024-07-05
  • 网络出版日期:  2024-12-11
  • 整期出版日期:  2026-06-30

目录

    /

    返回文章
    返回
    常见问答