A shifter look-up table technique based on HXDSP

YE Hong, GU Naijie, LIN Chuanwen, ZHANG Xiaoci, CHEN Rui

Citation: YE Hong, GU Naijie, LIN Chuanwen, et al. A shifter look-up table technique based on HXDSP[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 2044-2050. doi: 10.13700/j.bh.1001-5965.2019.0039 (in Chinese)

doi: 10.13700/j.bh.1001-5965.2019.0039
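Funds:

Anhui Province Science and Technology Major Project 18030901011

Scientific Research and Development Fund Project of Hefei University 19ZR03ZDA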

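Details
    About the authors:

    YE Hong: male, doctoral candidate. Main research interests: parallel computing, architecture optimization.

    GU Naijie: male, Ph.D., professor, doctoral supervisor. Main research interests: network computing, big data processing and analysis, cloud computing and applications, online algorithms, software and code optimization, large-scale software code inspection, deep learning hardware and software systems, parallel and distributed computing.

    LIN Chuanwen: male, Ph.D. Main research interests: compiler optimization, deep learning training optimization and applications.

    CHEN Rui: male, master's degree. Main research interests: architecture optimization.

    Corresponding author:

    GU Naijie, E-mail: gunj@ustc.edu.cn

  • Chinese Library Classification number: TP402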

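  • Abstract:

    The rapid development of high-performance signal processing applications poses great challenges to the computing speed and throughput of the corresponding processors. The shifter is an important component of a digital signal processor (DSP). By designing an additional dedicated random access memory (RAM) and a look-up table (LUT) for the shifter, and by optimizing and adjusting its instruction set and architecture, the utilization efficiency and transfer rate of the processor are improved. In addition, with the shifter and the corresponding look-up table instructions, shifting, extraction, arithmetic, and logic operations can be performed while data is being temporarily stored, so that part of the data computation is merged directly into the storing and loading of data in the shifter RAM, which significantly improves the utilization of the computing units. The results show that the temporary storage technique based on the shifter look-up table achieves a throughput close to that of the transfer bus, and that for the fast Fourier transform (FFT), a signal processing algorithm, it achieves a speedup of about 1.15 to 1.20.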

     

  • Figure 1.  Architecture of HXDSP104x system

    Figure 2.  Vector processing unit for HXDSP processor

    Figure 3.  Structure of shifter look-up table

    Figure 4.  16-point DIT-FFT

    Figure 5.  Comparison with and without shifter look-up table

    Table 1.  Offset addresses for the 64-point FFT

    Stage  Offset (decimal)  Offset (binary)
    1      32                100000
    2      16                010000
    3      8                 001000
    4      4                 000100
    5      2                 000010
    6      1                 000001
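
The pattern in Table 1 can be reproduced with a short sketch. It assumes only what the table itself shows: for a 64-point FFT with 6 stages, the offset halves at every stage, so it is a single bit moving from the most significant position to the least significant one. The function name `stage_offsets` is hypothetical and is not part of the HXDSP instruction set or of the authors' implementation.

```python
def stage_offsets(n_points):
    """Return (stage, offset, offset as binary string) for each FFT stage.

    Assumption inferred from Table 1: the offset at stage s of an
    N-point transform is N >> s, where N is a power of two.
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two and at least 2")
    n_stages = n_points.bit_length() - 1  # log2(N), e.g. 6 for N = 64
    rows = []
    for stage in range(1, n_stages + 1):
        offset = n_points >> stage  # halve the offset at every stage
        # Zero-pad to n_stages bits so the moving bit is visible
        rows.append((stage, offset, format(offset, f"0{n_stages}b")))
    return rows


if __name__ == "__main__":
    for stage, offset, bits in stage_offsets(64):
        print(stage, offset, bits)
```

Because each offset is a power of two, it can be produced by a single right shift, which is the kind of operation a shifter handles directly.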
Publication history
  • Received:  2019-01-29
  • Accepted:  2019-03-22
  • Published online:  2019-10-20
