Saliency-guided image translation

蒋铼 (JIANG Lai), 戴宁 (DAI Ning), 徐迈 (XU Mai), 邓欣 (DENG Xin), 李胜曦 (LI Shengxi)

Citation: JIANG L, DAI N, XU M, et al. Saliency-guided image translation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(10): 2689-2698 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0732

doi: 10.13700/j.bh.1001-5965.2021.0732
Funds: National Natural Science Foundation of China (61876013, 61922009, 61573037)
    Corresponding author E-mail: maixu@buaa.edu.cn

  • CLC number: TP391.4

  • Abstract:

    This study introduces saliency-guided image translation as a new task and proposes a novel generative adversarial network (SalG-GAN) that translates an image to match a user-specified saliency map while keeping the image content and fidelity unchanged. Given an original image and a target saliency map, the proposed method efficiently generates a translated image that conforms to the target saliency map. In the proposed method, a disentangled representation framework is introduced to encourage the model to generate translated images with different content for the same saliency-map input. On top of this framework, a saliency-map-based attention module is designed as a special attention mechanism that helps the network focus on key regions during image translation. Meanwhile, saliency-based deep network structures are constructed, including a generator, an encoder, and global and local discriminators. In addition, large-scale datasets for the saliency-guided image translation task are established to train and evaluate the proposed method, consisting of a large-scale synthetic dataset and a real-world dataset with human visual fixations. Experimental results on the two datasets show that the proposed method achieves excellent performance on the saliency-guided image translation task and far outperforms the baseline generative adversarial network methods.
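
    The abstract describes a disentangled representation framework combined with a saliency-map-based attention module that steers the generator toward the regions highlighted by the target saliency map. The following is a minimal, hypothetical PyTorch sketch of such a saliency-gated attention block; the module name, layer sizes, and residual-gating formulation are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of a saliency-map-based attention block (not the authors' code).
# The target saliency map is resized to the feature resolution and turned into a
# per-pixel, per-channel gate, so features in salient regions are amplified while
# the remaining features pass through unchanged.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SaliencyAttention(nn.Module):
    def __init__(self, feat_channels: int):
        super().__init__()
        # 1-channel saliency map -> gate in [0, 1] with the same channels as the features
        self.gate = nn.Sequential(
            nn.Conv2d(1, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor, saliency: torch.Tensor) -> torch.Tensor:
        # feat:     (B, C, H, W) generator features
        # saliency: (B, 1, h, w) target saliency map
        sal = F.interpolate(saliency, size=feat.shape[-2:], mode="bilinear", align_corners=False)
        attn = self.gate(sal)
        # Residual gating: emphasise salient regions, keep the rest intact
        return feat + attn * feat

if __name__ == "__main__":
    module = SaliencyAttention(feat_channels=64)
    feats = torch.randn(2, 64, 32, 32)
    target_saliency = torch.rand(2, 1, 256, 256)
    out = module(feats, target_saliency)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

    The residual form (feat + attn * feat) is one simple way to let the saliency signal modulate features without suppressing non-salient content entirely, consistent with the requirement that image content and fidelity be preserved.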

     

  • Figure 1.  Image translation

    Figure 2.  Training pipeline of the proposed method

    Figure 3.  Structural details of the proposed method

    Figure 4.  Example results of the baseline methods and the proposed method on the SGIT-S, SGIT-R, and SGIT-G datasets

    Table 1.  Performance comparison between the proposed and baseline methods

    Method     | FID (SGIT-S / SGIT-R / SGIT-C) | SS (SGIT-S / SGIT-R / SGIT-C)           | KL (SGIT-S / SGIT-R / SGIT-C)
    HAG        | 60.31 / 69.63 / 45.21          |                                         | 0.26 ± 0.12 / 0.65 ± 0.22 / 0.73 ± 0.12
    CycleGAN   | 34.81 / 106.45 / 114.58        |                                         | 0.03 ± 0.02 / 0.03 ± 0.03 / 0.43 ± 0.33
    BicycleGAN | 113.92 / 122.03 / 118.67       |                                         | 0.06 ± 0.05 / 0.08 ± 0.03 / 0.38 ± 0.28
    Proposed   | 30.48 / 48.78 / 53.35          | 0.34 ± 0.17 / 0.02 ± 0.01 / 0.08 ± 0.02 | 0.02 ± 0.01 / 0.02 ± 0.01 / 0.11 ± 0.07
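
    Table 1 evaluates realism with FID and saliency correspondence with a KL term. A common way to compute a KL divergence between the target saliency map and the saliency map predicted on the translated image is sketched below; this is a generic metric implementation assuming both maps are normalized to probability distributions, not the authors' evaluation code, and the epsilon value is an arbitrary choice.

```python
# Generic KL-divergence metric between two saliency maps (not the authors' evaluation code).
# Both maps are normalized to sum to 1 before computing KL(target || predicted);
# lower values mean the translated image attracts attention closer to the target.
import numpy as np

def saliency_kl(target: np.ndarray, predicted: np.ndarray, eps: float = 1e-7) -> float:
    p = target.astype(np.float64).ravel()
    q = predicted.astype(np.float64).ravel()
    p = p / (p.sum() + eps)   # normalize target map to a probability distribution
    q = q / (q.sum() + eps)   # normalize predicted map to a probability distribution
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((240, 320))     # e.g. a fixation-based saliency map
    predicted = rng.random((240, 320))  # e.g. saliency predicted on the translated image
    print(f"KL = {saliency_kl(target, predicted):.4f}")
```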

    Table 2.  User study: Preference between results obtained using the proposed and baseline methods

    Method     | Realism/% (SGIT-S / SGIT-R / SGIT-C) | Saliency/% (SGIT-S / SGIT-R / SGIT-C) | Consistency/% (SGIT-S / SGIT-R / SGIT-C)
    HAG        | 8.0 / 25.2 / 47.3                    | 4.2 / 9.4 / 2.0                       | 18.1 / 21.6 / 41.7
    CycleGAN   | 21.0 / 12.3 / 2.0                    | 21.3 / 16.3 / 12.1                    | 10.1 / 16.4 / 2.1
    BicycleGAN | 15.5 / 8.4 / 2.0                     | 17.8 / 10.3 / 12.0                    | 10.0 / 14.1 / 2.1
    Proposed   | 55.5 / 54.1 / 48.7                   | 56.7 / 64.0 / 73.9                    | 61.8 / 47.9 / 55.1
  • [1] EL-NOUBY A, SHARMA S, SCHULZ H, et al. Tell, draw, and repeat: Generating and modifying images based on continual linguistic instruction[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 10303-10311.
    [2] HONG S, YANG D D, CHOI J, et al. Inferring semantic layout for hierarchical text-to-image synthesis[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7986-7994.
    [3] ISOLA P, ZHU J Y, ZHOU T H, et al. Image-to-image translation with conditional adversarial networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5967-5976.
    [4] ZHAO B, MENG L L, YIN W D, et al. Image generation from layout[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 8576-8585.
    [5] CHOI Y, CHOI M, KIM M, et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8789-8797.
    [6] YIN W D, LIU Z W, CHANGE LOY C. Instance-level facial attributes transfer with geometry-aware flow[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 9111-9118. doi: 10.1609/aaai.v33i01.33019111
    [7] JOHNSON J, GUPTA A, LI F F. Image generation from scene graphs[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1219-1228.
    [8] KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4396-4405.
    [9] WANG T C, LIU M Y, ZHU J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8798-8807.
    [10] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2242-2251.
    [11] BAU D, ZHU J Y, STROBELT H, et al. GAN dissection: Visualizing and understanding generative adversarial networks[EB/OL]. (2018-12-26)[2021-10-20].https://arxiv.org/abs/1811.10597.
    [12] YU J H, LIN Z, YANG J M, et al. Free-form image inpainting with gated convolution[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2020: 4470-4479.
    [13] MATEESCU V A, BAJIC I V. Visual attention retargeting[J]. IEEE MultiMedia, 2016, 23(1): 82-91. doi: 10.1109/MMUL.2015.59
    [14] MECHREZ R, SHECHTMAN E, ZELNIK-MANOR L. Saliency driven image manipulation[J]. Machine Vision and Applications, 2019, 30(2): 189-202. doi: 10.1007/s00138-018-01000-w
    [15] FRIED O, SHECHTMAN E, GOLDMAN D B, et al. Finding distractors in images[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 1703-1712.
    [16] NGUYEN T V, NI B B, LIU H R, et al. Image re-attentionizing[J]. IEEE Transactions on Multimedia, 2013, 15(8): 1910-1919. doi: 10.1109/TMM.2013.2272919
    [17] ITTI L, KOCH C, NIEBUR E. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254-1259. doi: 10.1109/34.730558
    [18] JIANG L, XU M, WANG X F, et al. Saliency-guided image translation[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 16504-16513.
    [19] CHEN Y C, CHANG K J, TSAI Y H, et al. Guide your eyes: Learning image manipulation under saliency guidance[C]//British Machine Vision Conference. Cardiff: BMVA Press, 2019.
    [20] WONG L K, LOW K L. Saliency retargeting: An approach to enhance image aesthetics[C]//2011 IEEE Workshop on Applications of Computer Vision. Piscataway: IEEE Press, 2011: 73-80.
    [21] GATYS L A, KÜMMERER M, WALLIS T S A, et al. Guiding human gaze with convolutional neural networks[EB/OL]. (2017-09-18)[2021-10-20].https://arxiv.org/abs/1712.06492.
    [22] MEJJATI Y A, GOMEZ C F, KIM K I, et al. Look here! A parametric learning based approach to redirect visual attention[C]//European Conference on Computer Vision. Berlin: Springer, 2020: 343-361.
    [23] HAGIWARA A, SUGIMOTO A, KAWAMOTO K. Saliency-based image editing for guiding visual attention[C]//Proceedings of the 1st International Workshop on Pervasive Eye Tracking & Mobile Eye-Based Interaction. New York: ACM, 2011: 43-48.
    [24] MENDEZ E, FEINER S, SCHMALSTIEG D. Focus and context in mixed reality by modulating first order salient features[C]//International Symposium on Smart Graphics. Berlin: Springer, 2010: 232-243.
    [25] BERNHARD M, ZHANG L, WIMMER M. Manipulating attention in computer games[C]//2011 IEEE 10th IVMSP Workshop: Perception and Visual Signal Analysis. Piscataway: IEEE Press, 2011: 153-158.
    [26] ZHU J Y, ZHANG R, PATHAK D, et al. Multimodal image-to-image translation by enforcing bi-cycle consistency[C]//Advances in Neural Information Processing Systems. Long Beach: NIPS, 2017: 465-476.
    [27] LARSEN A B L, SØNDERBY S K, LAROCHELLE H, et al. Autoencoding beyond pixels using a learned similarity metric[C]//Proceedings of the 33rd International Conference on International Conference on Machine Learning. New York: ACM, 2016: 1558-1566.
    [28] MAO X D, LI Q, XIE H R, et al. Least Squares generative adversarial networks[C]//2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2813-2821.
    [29] JIANG M, HUANG S S, DUAN J Y, et al. SALICON: Saliency in context[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 1072-1080.
    [30] JOHNSON J, HARIHARAN B, VAN DER MAATEN L, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1988-1997.
    [31] ZHOU B L, LAPEDRIZA A, KHOSLA A, et al. Places: A 10 million image database for scene recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(6): 1452-1464. doi: 10.1109/TPAMI.2017.2723009
    [32] MIYATO T, KATAOKA T, KOYAMA M, et al. Spectral normalization for generative adversarial networks[EB/OL]. (2018-02-16)[2021-10-18]. https://arxiv.org/abs/1802.05957.
    [33] ULYANOV D, VEDALDI A, LEMPITSKY V. Instance normalization: The missing ingredient for fast stylization[EB/OL]. (2016-07-27)[2021-10-24]. https://arxiv.org/abs/1607.08022
    [34] KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. (2014-12-22)[2021-10-25]. https://arxiv.org/abs/1412.6980.
    [35] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6629-6640.
    [36] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 586-595.
    [37] LEE H Y, TSENG H Y, HUANG J B, et al. Diverse image-to-image translation via disentangled representations[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 36-52.
Figures (4) / Tables (2)
Metrics
  • Article views:  1068
  • HTML full-text views:  61
  • PDF downloads:  25
  • Citations:  0
Publication history
  • Received:  2021-12-06
  • Accepted:  2022-01-21
  • Published online:  2022-03-04
  • Issue published:  2023-10-31
