Journal of Beijing University of Aeronautics and Astronautics, 2021, Vol. 47, Issue (3): 441-450. doi: 10.13700/j.bh.1001-5965.2020.0479


Video summary generation based on multi-feature image and visual saliency

JIN Haiyan1,2, CAO Tian1, XIAO Cong1, XIAO Zhaolin1,2

  1. College of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048, China;
  2. Shaanxi Key Laboratory for Network Computing and Security Technology, Xi'an 710048, China
  • Received: 2020-08-31 Published: 2021-04-08
  • Corresponding author: XIAO Zhaolin, E-mail: xiaozhaolin@xaut.edu.cn
  • About the authors: JIN Haiyan, female, Ph.D., professor, doctoral supervisor, CCF member. Main research interests: computer vision, image processing, and intelligent information processing. CAO Tian, female, master's student. Main research interests: computer vision and image processing. XIAO Cong, male, master's student. Main research interests: computer vision and image processing. XIAO Zhaolin, male, Ph.D., associate professor, master's supervisor, CCF member. Main research interests: computer vision and computational photography.
  • Supported by:
    Technology Innovation Leading Program of Shaanxi (2020CGXNG-026); Natural Science Basic Research Program of Shaanxi (2019JM-221)

Abstract: Video summarization, that is, efficiently extracting the content of a video, has long been a research hotspot in computer vision. Detection that relies only on simple image features such as color and texture can no longer produce an effective and complete video summary. Based on the visual attention pyramid model, this paper proposes an improved center-surround video summarization method with variable ratio and double contrast calculation. First, the video image sequence is divided into pixel blocks by a superpixel method to speed up image computation. Then, the differences in image contrast features under different color backgrounds are detected and fused. Finally, optical flow motion information is incorporated, and the static and dynamic saliency results are merged to extract the key frames of the video; during key frame extraction, a perceptual hash function is used to judge the similarity between frames, which completes the video summary generation. Simulation experiments are carried out on the Segtrack V2, ViSal and OVP datasets. The results show that the proposed method can effectively extract regions of interest in images and produce a video summary represented by a sequence of key frame images.

Key words: video summarization, visual attention pyramid, visual saliency, key frame extraction, similarity judgment
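
The similarity judgment used during key frame extraction can be illustrated with a small sketch. The abstract does not say which perceptual hash variant, which threshold, or which comparison order the authors use, so everything below is an assumption: it uses the average hash (aHash), one simple member of the perceptual hash family, written in pure Python for grayscale frames stored as nested lists. The names average_hash, hamming_distance and select_key_frames, and the max_distance threshold of 5 bits, are hypothetical, and the saliency-based candidate selection that precedes this step in the paper is not modeled.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = List[List[float]]


def average_hash(frame: Frame, hash_size: int = 8) -> int:
    """Return a (hash_size * hash_size)-bit average hash of a grayscale frame."""
    height, width = len(frame), len(frame[0])
    if height < hash_size or width < hash_size:
        raise ValueError("frame is smaller than the hash grid")
    # Downsample: mean intensity of each cell in a hash_size x hash_size grid.
    cells = []
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the mean of all cells.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits in which two hashes differ."""
    return bin(a ^ b).count("1")


def select_key_frames(frames: Sequence[Frame], max_distance: int = 5) -> List[int]:
    """Return indices of frames that are not near-duplicates of any kept frame.

    A frame is kept only if its hash differs from the hash of every
    previously kept frame by more than max_distance bits (hypothetical rule).
    """
    kept_indices: List[int] = []
    kept_hashes: List[int] = []
    for index, frame in enumerate(frames):
        frame_hash = average_hash(frame)
        if all(hamming_distance(frame_hash, k) > max_distance for k in kept_hashes):
            kept_indices.append(index)
            kept_hashes.append(frame_hash)
    return kept_indices
```

Under these assumptions, a frame that differs from an already kept frame only by a uniform brightness shift hashes to the same bits and is dropped, while a frame with a different spatial layout is kept. A DCT-based perceptual hash could be substituted for average_hash without changing select_key_frames.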


