Journal of Beijing University of Aeronautics and Astronautics, 2019, Vol. 45, Issue (12): 2506-2513. doi: 10.13700/j.bh.1001-5965.2019.0374


Video multi-frame quality enhancement method via spatial-temporal context learning

TONG Junchao, WU Xilin, DING Dandan

  1. School of Information Science and Engineering, Hangzhou Normal University, Hangzhou 311121, China
  • Received: 2019-07-09  Online: 2019-12-20  Published: 2019-12-31
  • Corresponding author: DING Dandan, E-mail: DandanDing@hznu.edu.cn
  • About the authors: TONG Junchao, male, M.S. candidate; research interests: video and image processing, video coding. DING Dandan, female, Ph.D., lecturer, master's supervisor; research interests: video coding, video and image processing.
  • Supported by: Natural Science Foundation of Zhejiang Province, China (LY20F010013); National Key R&D Program of China (2017YFB1002803)


Abstract: Convolutional neural networks (CNNs) have achieved great success in video enhancement. Existing video enhancement methods mainly exploit the pixel correlations within the spatial domain of a single image, ignoring the temporal similarity between consecutive frames. To address this issue, this paper proposes a multi-frame quality enhancement method, namely spatial-temporal multi-frame video enhancement (STMVE), which learns the spatial-temporal context of the current frame. The basic idea of STMVE is to use the frames adjacent to the current frame to help enhance its quality. To this end, virtual frames of the current frame are first predicted from its neighboring frames, and the current frame is then enhanced with these virtual frames. An adaptive separable convolutional neural network (ASCNN) is employed to generate the virtual frames. In the subsequent enhancement stage, a multi-frame CNN (MFCNN) is designed: an early-fusion CNN structure extracts both temporal and spatial correlations between the current and virtual frames and outputs the enhanced current frame. Experimental results show that the proposed STMVE method obtains PSNR gains of 0.47 dB, 0.43 dB, 0.38 dB, and 0.28 dB over H.265/HEVC at quantization parameter values 37, 32, 27, and 22, respectively. Compared with the multi-frame quality enhancement (MFQE) method, STMVE achieves an average PSNR gain of 0.17 dB.
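To make the early-fusion and residual-learning idea concrete, the following is a minimal PyTorch-style sketch. The layer count, channel width, kernel sizes, number of virtual frames, and the use of PyTorch are illustrative assumptions, not the exact MFCNN configuration reported in the paper: the decoded current frame and its ASCNN-predicted virtual frames are concatenated along the channel axis, passed through stacked convolutions, and the network predicts a residual that is added back to the current frame.

import torch
import torch.nn as nn

class MFCNN(nn.Module):
    # Early-fusion multi-frame CNN with residual learning (sketch; hyperparameters
    # are assumed): the current frame and its virtual frames are stacked along the
    # channel axis, filtered by a plain stack of 3x3 convolutions, and the predicted
    # residual is added back to the current frame.
    def __init__(self, num_virtual_frames=2, features=64, num_layers=8):
        super().__init__()
        in_ch = 1 + num_virtual_frames  # luma of current frame + virtual frames
        layers = [nn.Conv2d(in_ch, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, 1, 3, padding=1))  # predicted residual
        self.body = nn.Sequential(*layers)

    def forward(self, current, virtual_frames):
        x = torch.cat([current] + list(virtual_frames), dim=1)  # early fusion
        return current + self.body(x)  # residual learning

# Usage: enhance one 64x64 luma patch using two virtual frames.
current = torch.rand(1, 1, 64, 64)
virtuals = [torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)]
enhanced = MFCNN()(current, virtuals)

Early fusion merges the temporal information once at the input, so a single convolutional trunk processes all frames jointly instead of maintaining a separate branch per frame.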

Key words: spatial-temporal context learning, multi-frame quality enhancement (MFQE), convolutional neural network (CNN), residual learning, virtual frame

CLC number:

