Video summarization by learning semantic information

HUA Rui, WU Xinxiao, ZHAO Wentian

Citation: HUA Rui, WU Xinxiao, ZHAO Wentian, et al. Video summarization by learning semantic information[J]. Journal of Beijing University of Aeronautics and Astronautics, 2021, 47(3): 650-657 (in Chinese). doi: 10.13700/j.bh.1001-5965.2020.0447

doi: 10.13700/j.bh.1001-5965.2020.0447
Funds: 

National Natural Science Foundation of China 61673062

National Natural Science Foundation of China 62072041

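More Information
    About the authors:

    HUA Rui   Female, master's student. Research interest: video summarization

    WU Xinxiao   Female, Ph.D., associate professor, doctoral supervisor. Research interests: vision and language, video content understanding, machine learning

    ZHAO Wentian   Male, Ph.D. candidate. Research interest: image and video captioning

    Corresponding author:

    WU Xinxiao, E-mail: wuxinxiao@bit.edu.cn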

  • CLC number: TP391

  • Abstract:

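    Video summarization aims to represent the main content of an original video with a short set of video clips. Because existing methods do not sufficiently explore semantic information, we propose a video summarization model that incorporates semantic information: it learns video features that carry rich semantic information and uses them to generate, simultaneously, a video summary and a text summary that describe the content of the original video. The model consists of three modules: a frame-level score weighting module, a visual-semantic embedding module, and a video description generation module. The frame-level score weighting module combines a convolutional network with fully connected layers to obtain a frame-level importance score. The visual-semantic embedding module maps visual features and textual features into a common space so that the two kinds of features are pulled close to each other. The video description generation module minimizes the distance between the description generated from the video summary and the ground-truth text annotation, so that the generated video summary carries semantic information. At test time, in addition to the video summary, the model produces a short text summary as a by-product, which helps people understand the video content more intuitively. Experiments on the SumMe and TVSum datasets show that, by incorporating semantic information, the model outperforms existing state-of-the-art methods, improving the F-score by 0.5 and 1.6 percentage points on the two datasets, respectively.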
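The abstract describes the frame-level score weighting module and the visual-semantic embedding module only at a high level. The NumPy sketch below illustrates one way these two pieces could fit together; it is not the authors' implementation. The kernel width, feature sizes, ReLU activation, score-weighted pooling and the cosine-distance embedding loss are all assumptions, the helper names (`conv1d_same`, `frame_scores`, `weighted_video_feature`, `embedding_loss`) are hypothetical, the weights are random rather than trained, and the description generation module (typically trained with a cross-entropy loss on the caption) is omitted.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Temporal 1D convolution with zero padding so the output keeps T steps.

    x: (T, D) frame features; kernel: (K, D, H) weights; returns (T, H).
    """
    k = kernel.shape[0]
    left = k // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    windows = np.stack([xp[t:t + k] for t in range(x.shape[0])])  # (T, K, D)
    return np.einsum("tkd,kdh->th", windows, kernel)


def frame_scores(x, kernel, w_fc, b_fc):
    """Hypothetical scoring head: conv -> ReLU -> fully connected -> sigmoid, one score per frame."""
    h = np.maximum(conv1d_same(x, kernel), 0.0)
    logits = h @ w_fc + b_fc  # (T,)
    return 1.0 / (1.0 + np.exp(-logits))


def weighted_video_feature(x, scores):
    """Score-weighted average of frame features (assumed summary representation)."""
    return (scores[:, None] * x).sum(axis=0) / (scores.sum() + 1e-8)


def embedding_loss(v, t, W_v, W_t):
    """Cosine distance between visual and text vectors after projection into a shared space."""
    a, b = W_v @ v, W_t @ t
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return 1.0 - cos


# Toy run with random (untrained) weights; all sizes are illustrative.
rng = np.random.default_rng(0)
T, D, H, E, V = 20, 16, 8, 12, 10
x = rng.normal(size=(T, D))                      # frame features
kernel = rng.normal(scale=0.1, size=(3, D, H))   # temporal kernel of width 3
w_fc = rng.normal(scale=0.1, size=H)
scores = frame_scores(x, kernel, w_fc, 0.0)      # frame-level importance scores
v = weighted_video_feature(x, scores)
t = rng.normal(size=V)                           # stand-in for an encoded ground-truth caption
W_v, W_t = rng.normal(size=(E, D)), rng.normal(size=(E, V))
loss = embedding_loss(v, t, W_v, W_t)            # would be minimized during training
```

Minimizing a loss of this kind pulls the score-weighted video representation toward the caption representation, which is the intuition the abstract gives for the embedding module; a ranking loss with negative pairs would be an equally plausible choice.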

     

  • Figure 1.  Flowchart of video summarization by learning semantic information

    Figure 2.  Framework of the frame-level score weighting module

    Figure 3.  Examples of generated video summaries on TVSum

    Figure 4.  Examples of generated text summaries on TVSum

    Table 1.  F-score comparison between our framework and six state-of-the-art methods

    Method                 SumMe F-score/%   TVSum F-score/%
    vsLSTM[4]              37.6              54.2
    dppLSTM[4]             38.6              54.7
    SUM-GAN_sup[5]         41.7              56.3
    DR-DSN_sup[17]         42.1              58.1
    SASUM_sup[11]          45.3              58.2
    CSNet_sup[24]          48.6              58.5
    Ours (unsupervised)    45.5              57.3
    Ours (supervised)      49.1              60.1
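Table 1 reports the F-score commonly used on SumMe and TVSum: a predicted keyshot summary and a user summary are treated as binary per-frame masks, precision and recall are computed from their temporal overlap, and F = 2PR/(P + R). The sketch below assumes that protocol, including the frequently used convention of taking the maximum over user summaries on SumMe and the average on TVSum; the paper's own evaluation code is not given here, and the function names are hypothetical.

```python
import numpy as np


def f_score(pred, gt):
    """F-score (in %) between two binary per-frame summary masks of equal length."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    overlap = np.logical_and(pred, gt).sum()  # frames selected by both
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / gt.sum()
    return 100.0 * 2 * precision * recall / (precision + recall)


def evaluate(pred, user_summaries, reduce="avg"):
    """Score one predicted summary against several user summaries.

    reduce="max" is often used for SumMe, reduce="avg" for TVSum (assumed convention).
    """
    scores = [f_score(pred, u) for u in user_summaries]
    return max(scores) if reduce == "max" else float(np.mean(scores))


# Toy example: 6 frames, two annotators.
pred = [1, 1, 0, 0, 1, 0]
users = [[1, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0]]
avg_f, max_f = evaluate(pred, users, "avg"), evaluate(pred, users, "max")
```

For reference, the gains quoted in the abstract are the differences from the strongest baseline in Table 1, CSNet_sup: 49.1 - 48.6 = 0.5 and 60.1 - 58.5 = 1.6 percentage points.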

    Table 2.  Evaluation of text summaries generated on different datasets

    Dataset   BLEU-1/%   ROUGE-L/%   CIDEr/%
    SumMe     28.3       27.6        9.6
    TVSum     32.8       29.9        12.7
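Table 2 scores the generated text summaries with BLEU-1, ROUGE-L and CIDEr. The paper does not say which implementation it used, so the sketch below only illustrates the standard definitions of two of them: BLEU-1 as clipped unigram precision times a brevity penalty, and ROUGE-L as an F-measure over the longest common subsequence (LCS). It assumes sentence-level scoring with a single reference, whitespace tokenization, and beta = 1.2 (the value used in the widely used coco-caption toolkit); corpus-level aggregation, multiple references and CIDEr are omitted. Values in Table 2 are these scores multiplied by 100.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Sentence-level BLEU-1 with one reference: clipped unigram precision x brevity penalty."""
    c, r = candidate.split(), reference.split()
    if not c:
        return 0.0
    ref_counts = Counter(r)
    # Each candidate word counts at most as often as it appears in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(c).items())
    precision = clipped / len(c)
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / len(c))
    return precision * bp


def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (dynamic programming)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l(candidate, reference, beta=1.2):
    """Sentence-level ROUGE-L F-measure with one reference."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * p * rec / (rec + beta ** 2 * p)
```

For example, the candidate "a man is riding a bike" against the reference "a man rides a bike" matches 4 of its 6 unigrams (BLEU-1 = 4/6, no brevity penalty) and shares the 4-token subsequence "a man a bike".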

    Table 3.  Results of ablation experiments on TVSum

    No. Embedding space Caption generation F-score/%
    1 × × 50.4
    2 × 51.5
    3 × 58.3
    4 ✓ ✓ 60.1
  • [1] LIU B. Survey of video summary[J]. Journal of Nanjing University of Information Science & Technology, 2020, 12(3): 274-278 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-NJXZ202003003.htm
    [2] RAV-ACHA A, PRITCH Y, PELEG S. Making a long video short: Dynamic video synopsis[C]//Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2006: 435-441.
    [3] ZHAO B, XING E P. Quasi real-time summarization for consumer videos[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2014: 2513-2520.
    [4] ZHANG K, CHAO W L, SHA F, et al. Video summarization with long short-term memory[C]//European Conference on Computer Vision. Berlin: Springer, 2016: 766-782.
    [5] MAHASSENI B, LAM M, TODOROVIC S, et al. Unsupervised video summarization with adversarial LSTM networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2982-2991.
    [6] ZHAO B, LI X, LU X, et al. HSA-RNN: Hierarchical structure-adaptive RNN for video summarization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7405-7414.
    [7] JI Z, JIANG J J. Video summarization based on decoder attention mechanism[J]. Journal of Tianjin University (Science and Technology), 2018, 51(10): 1023-1030 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-TJDX201810004.htm
    [8] LI Y Y, WANG J L. Self-attention based video summarization[J]. Journal of Computer-Aided Design & Computer Graphics, 2020, 32(4): 652-659 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-JSJF202004016.htm
    [9] CHEN Y, TAO L, WANG X, et al. Weakly supervised video summarization by hierarchical reinforcement learning[C]//Proceedings of the ACM Multimedia Asia. New York: ACM Press, 2019: 1-6.
    [10] CHOI J, OH T, KWEON I S, et al. Contextually customized video summaries via natural language[C]//Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). Piscataway: IEEE Press, 2018: 1718-1726.
    [11] WEI H, NI B, YAN Y, et al. Video summarization via semantic attended networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018: 216-223.
    [12] SHARGHI A, BORJI A, LI C, et al. Improving sequential determinantal point processes for supervised video summarization[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 533-550.
    [13] ROCHAN M, YE L, WANG Y, et al. Video summarization using fully convolutional sequence networks[C]//European Conference on Computer Vision. Berlin: Springer, 2018: 358-374.
    [14] ZHANG Y, KAMPFFMEYER M, LIANG X, et al. Query-conditioned three-player adversarial network for video summarization[C]//British Machine Vision Conference. Berlin: Springer, 2018.
    [15] YUAN L, TAY F E, LI P, et al. Cycle-SUM: Cycle-consistent adversarial LSTM networks for unsupervised video summarization[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019: 9143-9150.
    [16] ZHANG Y, KAMPFFMEYER M, ZHAO X, et al. DTR-GAN: Dilated temporal relational adversarial network for video summarization[C]//Proceedings of the ACM Turing Celebration Conference-China. New York: ACM Press, 2019: 1-6.
    [17] ZHOU K, QIAO Y, XIANG T. Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018: 7582-7589.
    [18] WANG L, ZHU Y, PAN H. Unsupervised reinforcement learning for video summarization reward function[C]//Proceedings of the 2019 International Conference on Image, Video and Signal Processing. New York: ACM Press, 2019: 40-44.
    [19] XU J, MEI T, YAO T, et al. MSR-VTT: A large video description dataset for bridging video and language[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 5288-5296.
    [20] POTAPOV D, DOUZE M, HARCHAOUI Z, et al. Category-specific video summarization[C]//European Conference on Computer Vision. Berlin: Springer, 2014: 540-555.
    [21] GONG B, CHAO W L, GRAUMAN K, et al. Diverse sequential subset selection for supervised video summarization[C]//Advances in Neural Information Processing Systems. New York: Curran Associates, 2014: 2069-2077.
    [22] SONG Y, VALLMITJANA J, STENT A, et al. TVSum: Summarizing web videos using titles[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 5179-5187.
    [23] GYGLI M, GRABNER H, RIEMENSCHNEIDER H, et al. Creating summaries from user videos[C]//European Conference on Computer Vision. Berlin: Springer, 2014: 505-520.
    [24] JUNG Y, CHO D, KIM D, et al. Discriminative feature learning for unsupervised video summarization[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2019: 8537-8544.
Publication history
  • Received:  2020-08-24
  • Accepted:  2020-09-27
  • Published:  2021-03-20
