A high-efficient Swin Transformer accelerator design

CAI Qingzhu, LIU Qiang

Citation: CAI Q Z, LIU Q. A high-efficient Swin Transformer accelerator design[J]. Journal of Beijing University of Aeronautics and Astronautics, 2026, 52(6): 2102-2113 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2024.0222
Funds: National Natural Science Foundation of China (U21B2031)

Details
    Corresponding author e-mail: qiangliu@tju.edu.cn

  • CLC number: TN431.2; V247.1+9

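  • Abstract:

    To address the challenges of deploying and running the Swin Transformer model in resource-constrained environments, adaptive polling pruning and search-based offset-free quantization are adopted to reduce model complexity and storage requirements while preserving accuracy. Adaptive polling pruning dynamically removes unimportant weights, which cuts the model's storage and computation requirements. Search-based offset-free quantization optimizes the storage of weights and activations, shrinking the model further while keeping accuracy loss as small as possible. An acceleration architecture optimized specifically for the Swin Transformer model is also proposed, which raises data processing speed and efficiency through hardware-level optimization. Experimental results show that after pruning and quantization the model is compressed to 14.4% of its original size while reaching a Top-1 accuracy of 77.4% on the ImageNet-1K dataset.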
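The search-based offset-free quantization named in the abstract can be illustrated with a minimal sketch. It assumes 8-bit integer codes (suggested by the 62.4 MB to 15.6 MB reduction in Table 2, but not stated on this page) and uses reconstruction mean squared error as a stand-in search criterion; the paper's own criterion is not given here and may instead be model accuracy. All function names are hypothetical and this is not the authors' implementation.

```python
import random


def quantize(values, scale, bits=8):
    """Offset-free quantization: q = clamp(round(x / scale)), with no zero point."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer codes back to real values."""
    return [q * scale for q in q_values]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def search_scale(values, bits=8, steps=100):
    """Grid-search a clipping ratio in (0, 1] of max|x| and keep the best scale.

    Assumption: the criterion is reconstruction MSE, a proxy chosen for this sketch.
    Ratio 1.0 reproduces plain max-value clipping, so the result is never worse.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 1.0, 0.0  # all-zero tensor: any scale is exact
    best_scale, best_err = None, float("inf")
    for i in range(1, steps + 1):
        ratio = i / steps
        scale = ratio * max_abs / qmax
        err = mse(values, dequantize(quantize(values, scale, bits), scale))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err


if __name__ == "__main__":
    random.seed(0)
    # Synthetic weights with one outlier; real layer weights would be used instead.
    weights = [random.gauss(0.0, 1.0) for _ in range(2000)] + [12.0]
    scale, err = search_scale(weights)
    print("searched scale:", scale, "MSE:", err)
```

Because the candidate set includes the max-value scale, the searched result cannot have a higher MSE than max-value clipping on the same tensor. Whether that translates into higher model accuracy depends on the network and is not shown by this sketch.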

     

  • Figure 1.  Overall architecture of the Swin Transformer model[6]

    Figure 2.  Structure of two consecutive Swin Transformer blocks[6]

    Figure 3.  Pruning and patching process

    Figure 4.  Comparison of the workload balancing of two pruning methods at 50% sparsity

    Figure 5.  Schematic diagram of CSC format compressed storage

    Figure 6.  Schematic diagram of UCSC format compressed storage

    Figure 7.  Parameter distribution of Swin Transformer before quantization

    Figure 8.  Relationship between model accuracy and the scaling factor of the multilayer perceptron weights

    Figure 9.  Overall architecture of the accelerator

    Figure 10.  Performance analysis of the window-parallel computation resource optimization strategy

    Figure 11.  Multiplication unit computing engine

    Figure 12.  Comparison of the tanh function with its approximating fifth-order polynomial

    Figure 13.  Comparison of the GELU function before and after replacing the tanh function

    Figure 14.  Variation of pruning effectiveness with the initial polling pruning rate

    Figure 15.  Variation of pruning effectiveness with the starting point of the patching pruning rate

    Figure 16.  Comparison of inference latency with existing FPGA accelerators

    Table 1.  Impact of different tolerance thresholds on model pruning effectiveness

    Tolerance threshold/%    Compression rate/%    Accuracy loss/%
    5                        54.920                5.672
    2                        43.802                2.572
    1                        28.877                1.502

    Table 2.  Comparison of inference accuracy for different quantization methods

    Quantization method after pruning               Model size/MB    Accuracy/%    Quantization loss/%
    Unquantized                                     62.4             78.34
    Quantization with offset                        15.6             72.22         6.120
    Offset-free quantization, max-value clipping    15.6             75.68         2.660
    Search-based offset-free quantization           15.6             77.42         0.924

    Table 3.  Accelerator resource consumption, power consumption and performance evaluation

    Latency/ms    BRAM/blocks    DSP/count    LUT/count    FF/count    URAM/blocks    Power/W
    5.437         1402           3267         508787       682198      33             41

    Table 4.  Comparison of the accelerator in this paper with CPU and GPU implementations

    Platform            Frequency/GHz    Latency/ms    Power/W    Frame rate/(frames·s−1)    Energy efficiency/(frames·(s·W)−1)
    E5-2695[18]         3.12             198           100        5.051                      0.05051
    Tesla T4[19]        1.005            17.90         75         55.86                      0.745
    This accelerator    0.2              5.437         41         183.9                      4.485
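The frame rate and energy efficiency columns of Table 4 appear to follow directly from the latency and power columns. The short sketch below recomputes them, assuming one frame is processed per reported latency (no batching or overlap between frames); that assumption is not stated on this page, but it reproduces the tabulated values to within rounding. The function names are illustrative.

```python
# (latency in ms, power in W) as reported in Table 4
PLATFORMS = {
    "E5-2695": (198.0, 100.0),
    "Tesla T4": (17.90, 75.0),
    "This accelerator": (5.437, 41.0),
}


def frame_rate(latency_ms):
    """Frames per second, assuming one frame per latency period."""
    return 1000.0 / latency_ms


def energy_efficiency(latency_ms, power_w):
    """Frames per second per watt."""
    return frame_rate(latency_ms) / power_w


def speedup(baseline_ms, latency_ms):
    """Latency ratio of a baseline platform to the platform of interest."""
    return baseline_ms / latency_ms


if __name__ == "__main__":
    accel_ms, _ = PLATFORMS["This accelerator"]
    for name, (latency_ms, power_w) in PLATFORMS.items():
        print(
            name,
            round(frame_rate(latency_ms), 3),
            round(energy_efficiency(latency_ms, power_w), 5),
            round(speedup(latency_ms, accel_ms), 2),
        )
```

Under the same assumption, the latency figures correspond to a speedup of roughly 36.4 times over the CPU and roughly 3.3 times over the GPU.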
  • [1] ZHANG B W, GU S Y, ZHANG B, et al. StyleSwin: transformer-based GAN for high-resolution image generation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 11294-11304.
    [2] NGUYEN L X, TUN Y L, TUN Y K, et al. Swin transformer-based dynamic semantic communication for multi-user with different computing capacity[J]. IEEE Transactions on Vehicular Technology, 2024, 73(6): 8957-8972.
    [3] WU H N, CHEN C F, LIAO L, et al. DisCoVQA: temporal distortion-content transformers for video quality assessment[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(9): 4840-4854.
    [4] LIU Z, NING J, CAO Y, et al. Video swin transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2022: 3192-3201.
    [5] ZHAO H, ZHANG C, ZHU B L, et al. S3T: self-supervised pre-training with swin transformer for music classification[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2022: 606-610.
    [6] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2022: 9992-10002.
    [7] LIN X F, KIM S, JOO J. FairGRAPE: fairness-aware GRAdient pruning mEthod for Face attribute classification[C]//Proceedings of the Computer Vision-ECCV 2022. Berlin: Springer, 2022: 414-432.
    [8] BLALOCK D, ORTIZ J J G, FRANKLE J, et al. What is the state of neural network pruning? [EB/OL]. (2020-03-06)[2024-02-16]. https://arxiv.org/abs/2003.03033.
    [9] PRAGNESH T, MOHAN B R. Compression of convolution neural network using structured pruning[C]//Proceedings of the IEEE 7th International Conference for Convergence in Technology. Piscataway: IEEE Press, 2022: 1-5.
    [10] LIU Z C, MU H Y, ZHANG X Y, et al. MetaPruning: meta learning for automatic neural network channel pruning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 3295-3304.
    [11] CHEN H L, YANG J F, MAO S A. Convolutional layers acceleration by exploring optimal filter structures[C]//Proceedings of the IEEE International Conference on Recent Advances in Systems Science and Engineering. Piscataway: IEEE Press, 2022: 1-6.
    [12] CARBALLO M V, KIL LEE B. Accuracy-aware structured filter pruning for deep neural networks[C]//Proceedings of the International Conference on Computational Science and Computational Intelligence. Piscataway: IEEE Press, 2021: 679-682.
    [13] JAYAKODY S, WANG J. EMBARK: memory bounded architectural improvement in CSR-CSC sparse matrix multiplication[C]//Proceedings of the IEEE 9th International Conference on Collaboration and Internet Computing. Piscataway: IEEE Press, 2024: 8-17.
    [14] WANG S, KANG Y. Gradient distribution-aware INT8 training for neural networks[J]. Neurocomputing, 2023, 541: 126269.
    [15] KIM J, LEE C, CHO E, et al. Towards next-level post-training quantization of hyper-scale transformers[EB/OL]. (2024-02-14)[2024-02-16]. https://arxiv.org/abs/2402.08958.
    [16] SCHABACK R. Convergence analysis of the general Gauss-Newton algorithm[J]. Numerische Mathematik, 1985, 46(2): 281-309.
    [17] ARBEL M, MENEGAUX R, WOLINSKI P. Rethinking Gauss-Newton for learning over-parameterized models[C]//Proceedings of the Advances in Neural Information Processing Systems 36. San Diego: Neural Information Processing Systems Foundation, Inc., 2023: 33379-33402.
    [18] HU W, XU D, FAN Z M, et al. Vis-TOP: visual transformer overlay processor[EB/OL]. (2021-10-21)[2024-02-16]. https://arxiv.org/abs/2110.10957.
    [19] HAN Y T, LIU Q. HPTA: a high performance transformer accelerator based on FPGA[C]//Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications. Piscataway: IEEE Press, 2023: 27-33.
Publication history
  • Received:  2024-04-16
  • Accepted:  2024-05-17
  • Published online:  2024-06-04
  • Issue published:  2026-06-30
