Video summary generation based on multi-feature image and visual saliency

JIN Haiyan^{1, 2}, CAO Tian^{1}, XIAO Cong^{1}, XIAO Zhaolin^{1, 2}

doi: 10.13700/j.bh.1001-5965.2020.0479

1. College of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048, China
2. Shaanxi Key Laboratory for Network Computing and Security Technology, Xi'an 710048, China

Funds:

Technology Innovation Leading Program of Shaanxi 2020CGXNG-026

Natural Science Basic Research Program of Shaanxi 2019JM-221

Author biographies:
JIN Haiyan   female, Ph.D., professor, doctoral supervisor, CCF member. Main research interests: computer vision, image processing, intelligent information processing, etc.

CAO Tian   female, master's student. Main research interests: computer vision, image processing, etc.

XIAO Cong   male, master's student. Main research interests: computer vision, image processing, etc.

XIAO Zhaolin   male, Ph.D., associate professor, master's supervisor, CCF member. Main research interests: computer vision, computational photography, etc.

Corresponding author: XIAO Zhaolin, E-mail: xiaozhaolin@xaut.edu.cn

CLC number: TP391.41

Publication history
- Received: 2020-08-31
- Accepted: 2020-10-27
- Published online: 2021-03-20

Abstract:
Video summarization, the efficient extraction of video content, is a long-standing research topic in computer vision. Detection based only on simple image features such as color and texture can no longer produce effective and complete video summaries. Building on the visual attention pyramid model, this paper proposes an improved center-surround video summarization method with variable ratio and dual-contrast calculation. First, each image in the video sequence is partitioned into pixel blocks by a superpixel method to speed up computation. Then, contrast feature differences under different color backgrounds are detected and fused. Finally, optical flow motion information is incorporated, and the static and dynamic saliency results are merged to extract the key frames of the video; during key frame extraction, a perceptual hash function is used for similarity judgment to complete the video summary. Simulation experiments on the Segtrack V2, ViSal and OVP datasets show that the proposed method effectively extracts regions of interest and yields a video summary represented as a sequence of key frame images.

Keywords:
- video summarization
- visual attention pyramid
- visual saliency
- key frame extraction
- similarity judgment
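
The similarity judgment used during key frame extraction can be illustrated with a small sketch. The abstract does not specify which perceptual hash the authors use, so this example assumes a simple average hash, one member of the perceptual-hash family, with a greedy selection rule. The names `average_hash`, `hamming`, `select_key_frames` and the threshold `max_distance` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[float]]  # grayscale frame as rows of intensities


def average_hash(frame: Gray, size: int = 8) -> int:
    """Return a size*size-bit average hash of a grayscale frame (assumed variant)."""
    rows, cols = len(frame), len(frame[0])
    if rows < size or cols < size:
        raise ValueError("frame must be at least size x size pixels")
    # Downsample by averaging each cell of a size x size grid.
    cells = []
    for i in range(size):
        r0, r1 = i * rows // size, (i + 1) * rows // size
        for j in range(size):
            c0, c1 = j * cols // size, (j + 1) * cols // size
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the global mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def select_key_frames(frames: List[Gray], max_distance: int = 5) -> List[int]:
    """Greedily keep indices of frames whose hash is far from every kept frame."""
    kept: List[int] = []
    hashes: List[int] = []
    for idx, frame in enumerate(frames):
        h = average_hash(frame)
        # A frame is redundant if it is within max_distance bits of a kept frame.
        if all(hamming(h, k) > max_distance for k in hashes):
            kept.append(idx)
            hashes.append(h)
    return kept
```

In a full pipeline the candidate frames would first be ranked by their fused saliency, and the hash comparison would only remove near-duplicates among those candidates. A DCT-based hash could be substituted for `average_hash` without changing the selection loop.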

Figure 1. Comparison of the dynamic saliency map before and after adjustment

Figure 2. Adaptive fusion of saliency results

Figure 3. Main components and overall technical framework of the key frame extraction method

Figure 4. Results of saliency detection enhancement

Figure 5. Comparison of saliency maps produced by different methods on the datasets

Figure 6. F-measure on different datasets

Figure 7. Results for videos "v20.flv" and "v101.flv" under different summarization algorithms

Figure 8. Results for sports video under different summarization algorithms

Table 1. Comparison of different summarization algorithms on sports videos

Algorithm	Accuracy	Error rate	Miss rate	Precision	Recall	F-measure
OV	0.58	0.08	0.42	0.88	0.58	0.70
VSUMM	0.42	0.08	0.58	0.83	0.42	0.56
STIMO	0.67	0.08	0.33	0.89	0.67	0.76
SD	0.33	0.25	0.67	0.57	0.33	0.42
KBKS	0.50	0.08	0.50	0.86	0.50	0.63
Proposed	0.92	0.00	0.08	0.92	0.92	0.92
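
The F-measure column of Table 1 appears consistent with the balanced F-measure, that is, the harmonic mean of the tabulated precision and recall. The sketch below recomputes it from the two columns as a check. The helper name `f_measure` and the dictionary `TABLE_1` are hypothetical, the last row is the proposed method, and the page itself does not state the formula, so treating it as the balanced form is an assumption.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) copied from Table 1.
TABLE_1 = {
    "OV": (0.88, 0.58, 0.70),
    "VSUMM": (0.83, 0.42, 0.56),
    "STIMO": (0.89, 0.67, 0.76),
    "SD": (0.57, 0.33, 0.42),
    "KBKS": (0.86, 0.50, 0.63),
    "Proposed": (0.92, 0.92, 0.92),
}

for name, (p, r, reported) in TABLE_1.items():
    print(f"{name}: computed {f_measure(p, r):.2f}, reported {reported:.2f}")
```

Under this assumption the recomputed values agree with the reported column to two decimal places for every row.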
