Volume 50 Issue 2
Feb.  2024
Citation: LU G, ZHONG T X, GENG J. A Transformer based deep conditional video compression[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 442-448 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0374

A Transformer based deep conditional video compression

doi: 10.13700/j.bh.1001-5965.2022.0374
Funds:  National Natural Science Foundation of China (62102024)
  • Corresponding author: E-mail: janegeng@bit.edu.cn
  • Received Date: 18 May 2022
  • Accepted Date: 23 Jun 2022
  • Available Online: 31 Oct 2022
  • Publish Date: 09 Oct 2022
  • Most recent learning-based video compression algorithms are built on convolutional neural networks (CNNs) and adopt a motion compensation and residual coding architecture. Because a typical CNN can exploit only local correlations, and because the prediction residual is sparse, it is difficult for these methods to reach optimal compression performance. To address these problems, this paper proposes a Transformer-based deep conditional video compression algorithm that achieves better compression performance. The proposed algorithm uses deformable convolution to obtain the predicted frame feature from the motion information between adjacent frames. The predicted frame feature then serves as conditional information for encoding the original input frame feature, which avoids directly encoding sparse residual signals. To further exploit the non-local correlation between features, the algorithm adopts a Transformer-based autoencoder architecture for both motion coding and conditional coding, which further improves compression performance. Experiments show that the proposed algorithm surpasses current mainstream learning-based video compression algorithms on both the HEVC and UVG datasets.
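
The motivation for conditional coding over residual coding can be illustrated with a small entropy calculation. The sketch below is not the paper's implementation. It is a toy example with made-up pixel values (`preds` and `frame` are hypothetical) that compares the empirical entropy of the residual, H(x − p), with the conditional entropy H(x | p). Since H(x | p) = H(x − p | p) ≤ H(x − p), coding the frame conditioned on the prediction never needs more bits than coding the residual in this idealized setting.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable symbols."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(xs, conds):
    """Empirical H(X | C) = H(X, C) - H(C), in bits."""
    return entropy(list(zip(xs, conds))) - entropy(list(conds))


# Hypothetical toy data: the prediction is exact in flat regions (pred = 0)
# and off by a varying amount near moving edges (pred = 1).
preds = [0, 0, 0, 0, 1, 1, 1, 1]
frame = [0, 0, 0, 0, 1, 2, 3, 4]

# Residual coding: encode x - p with no further use of p.
residual = [x - p for x, p in zip(frame, preds)]
h_residual = entropy(residual)

# Conditional coding: encode x with p available as context.
h_conditional = conditional_entropy(frame, preds)

print(f"H(x - p) = {h_residual:.3f} bits per symbol")
print(f"H(x | p) = {h_conditional:.3f} bits per symbol")
```

In a learned codec the conditional distribution is modeled by a network rather than counted from samples, so the bound holds only to the extent that the learned model is accurate.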

     



    Figures(3)  / Tables(2)
