Citation: YE Hong, GU Naijie, LIN Chuanwen, et al. A shifter look-up table technique based on HXDSP[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 2044-2050. doi: 10.13700/j.bh.1001-5965.2019.0039 (in Chinese)
As digital signal processing technology advances, high-performance signal processing applications are attracting growing attention and placing heavy demands on the computing speed and throughput of the processors that run them. The shifter is an important component of a digital signal processor (DSP). This paper adds a dedicated random access memory (RAM) and a look-up table (LUT) to the shifter unit and adjusts the instruction set and architecture accordingly, improving the processor's utilization and data transfer rate. With the shifter and its look-up table instructions, shift, extraction, arithmetic, and logical operations can be performed while data is being temporarily stored. Part of the data processing is thus merged directly into the read/write path of the shifter RAM, which greatly improves the efficiency of the arithmetic units. Experiments show that temporary storage based on the shifter look-up table achieves a throughput close to that of the transfer bus, and that the fast Fourier transform (FFT) is accelerated by a factor of 1.15 to 1.20.
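The idea of folding an operation into the temporary-storage write can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions: the class `ShifterRam`, its `store`/`load` methods, the operation names, and the 32-bit word width are all hypothetical and are not taken from the HXDSP instruction set or from the paper.

```python
MASK32 = 0xFFFFFFFF  # assumption: model 32-bit words


class ShifterRam:
    """Toy model of a small RAM attached to a shifter unit (hypothetical)."""

    def __init__(self, depth=16):
        self.words = [0] * depth

    def store(self, addr, value, op="none", arg=0, width=32):
        """Write `value` to `addr`, applying one operation on the way in."""
        value &= MASK32
        if op == "shl":        # logical shift left by `arg` bits
            value = (value << arg) & MASK32
        elif op == "shr":      # logical shift right by `arg` bits
            value >>= arg
        elif op == "extract":  # take `width` bits starting at bit `arg`
            value = (value >> arg) & ((1 << width) - 1)
        elif op == "add":      # accumulate into the word already stored
            value = (self.words[addr] + value) & MASK32
        elif op != "none":
            raise ValueError(f"unknown op: {op}")
        self.words[addr] = value

    def load(self, addr):
        """Read back the stored word unchanged."""
        return self.words[addr]


ram = ShifterRam()
ram.store(0, 0x12345678, op="extract", arg=8, width=8)  # keep bits 8..15
ram.store(1, 0x000000FF, op="shl", arg=4)               # shift while storing
ram.store(2, 5)                                         # plain store
ram.store(2, 7, op="add")                               # accumulate on write
```

In this model the operation costs no separate pass over the data: each value is transformed as it is written, which mirrors the abstract's claim that part of the computation is merged into the RAM read/write process. The real hardware would do this in the datapath rather than in software.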