Abstract:
The rapid growth of high-performance signal processing applications places heavy demands on the computing speed and throughput of the processors that run them. The shifter is an important functional unit of a digital signal processor (DSP). This paper adds a dedicated random access memory (RAM) and look-up table (LUT) to the shifter and adapts its instruction set and architecture accordingly, in order to improve processor utilization and data transfer rate. In addition, with the shifter and its look-up table instructions, shift, extraction, arithmetic, and logical operations can be performed while data are being temporarily stored. Part of the data computation is thereby merged directly into the reads and writes of the shifter RAM, which markedly improves the utilization of the arithmetic units. The results show that the temporary storage technique based on the shifter look-up table achieves a throughput close to that of the transfer bus, and that it yields a speedup of about 1.15 to 1.20 for the fast Fourier transform (FFT).
Keywords:
- digital signal processor (DSP)
- shifter
- look-up table (LUT)
- single instruction multiple data (SIMD)
- very long instruction word (VLIW)
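Table 1. Offset addresses for the 64-point FFT

| Stage | Offset (D, decimal) | Offset (O, 6-bit binary) |
| --- | --- | --- |
| 1 | 32 | 100000 |
| 2 | 16 | 010000 |
| 3 | 8 | 001000 |
| 4 | 4 | 000100 |
| 5 | 2 | 000010 |
| 6 | 1 | 000001 |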
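The pattern in Table 1 is a single set bit that moves one position to the right at each stage, so the offset at stage s equals 64 shifted right by s bits. The following is a minimal sketch that regenerates the table rows under that reading. The function name `butterfly_offsets` is hypothetical, and interpreting the offset as the distance between paired elements in a radix-2 FFT stage is an assumption; the source gives only the table values.

```python
def butterfly_offsets(n_points):
    """Return (stage, offset, binary string) rows for an n_points FFT.

    Assumption: n_points is a power of two and the offset halves at
    every stage, as the values in Table 1 suggest.
    """
    stages = n_points.bit_length() - 1  # log2(n_points) for a power of two
    if n_points < 2 or n_points != 1 << stages:
        raise ValueError("n_points must be a power of two >= 2")
    rows = []
    for stage in range(1, stages + 1):
        offset = n_points >> stage  # one right shift per stage
        # Zero-pad the binary form to one bit per stage (6 bits for 64 points).
        rows.append((stage, offset, format(offset, "0{}b".format(stages))))
    return rows


if __name__ == "__main__":
    for stage, offset, bits in butterfly_offsets(64):
        print(stage, offset, bits)
```

Calling `butterfly_offsets(64)` returns six rows, one per stage, which can be compared directly against the decimal and binary columns of Table 1.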