Memory optimized parallel LDPC decoder architecture design on GPU
Abstract: An optimized decoder architecture is proposed for the layered decoding algorithm of low-density parity-check (LDPC) codes on Nvidia's Fermi graphics processing unit (GPU). To exploit the parallelism of the GPU hardware, processing is parallelized both across frames and within each layer (inter-frame and intra-layer parallelism), which makes full use of the streaming multiprocessor (SM) resources and mitigates the limited parallelism of the layered decoding algorithm. In addition, the parity-check matrix is stored in compressed form in on-chip constant memory, and the iterative decoding messages held in off-chip global memory are accessed in a coalesced manner, which reduces memory access latency and raises decoding throughput. Test results show that multi-frame parallel processing combined with memory access optimization increases the throughput of the GPU-based LDPC decoder by a factor of 14.9 to 34.8.

Keywords:
- quasi-cyclic low-density parity-check codes
- graphics processing unit
- multi-frame processing
- layered decoding algorithm
- memory optimization
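The two memory optimizations named in the abstract can be illustrated with a small host-side model. The sketch below is not the authors' implementation: it is plain Python rather than GPU code, and the function names, the `-1` marker for an all-zero block, the shift convention (row `r` of a circulant has its 1 in column `(r + s) mod z`), and the frame-interleaved address formula are all assumptions chosen for illustration. It shows (a) storing a quasi-cyclic parity-check matrix as per-layer lists of `(block_column, shift)` pairs instead of the full binary matrix, and (b) laying out messages so that the same variable node of consecutive frames sits at consecutive addresses, which is the access pattern that coalescing favours when one thread handles one frame.

```python
from typing import List, Tuple

Layer = List[Tuple[int, int]]  # (block column, circulant shift) pairs


def compress_base_matrix(base: List[List[int]]) -> List[Layer]:
    """Keep only the non-zero circulants of each layer (block row).

    Assumption: base[i][j] == -1 marks an all-zero z-by-z block, and a
    value s >= 0 marks an identity matrix cyclically shifted by s.
    """
    return [[(col, s) for col, s in enumerate(row) if s >= 0] for row in base]


def check_node_columns(layer: Layer, z: int, row_in_layer: int) -> List[int]:
    """Column indices of the 1s in one row of a layer of the expanded matrix.

    Assumed convention: row r of a circulant with shift s has its single 1
    in column (r + s) mod z of that block.
    """
    return [col * z + (row_in_layer + s) % z for col, s in layer]


def interleaved_index(var: int, frame: int, num_frames: int) -> int:
    """Frame-interleaved address of the message for (variable node, frame).

    Consecutive frames of the same variable node map to consecutive
    addresses (hypothetical layout used here for illustration).
    """
    return var * num_frames + frame


# Toy 2-layer base matrix with circulant size z = 4 (made-up values).
base = [[0, -1, 2],
        [1, 3, -1]]
layers = compress_base_matrix(base)
cols = check_node_columns(layers[0], z=4, row_in_layer=3)
addresses = [interleaved_index(var=cols[1], frame=f, num_frames=8) for f in range(4)]
print(layers)     # [[(0, 0), (2, 2)], [(0, 1), (1, 3)]]
print(cols)       # [3, 9]
print(addresses)  # [72, 73, 74, 75]
```

In a CUDA version of this idea, the compressed table would be a natural candidate for `__constant__` memory because every thread reads the same entry at the same time, while the interleaved message array would reside in global memory; the actual data layout used in the paper is not given in the abstract.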