Memory optimized parallel LDPC decoder architecture design on GPU

Ge Shuai, Liu Rongke, Hou Yi

Citation: Ge Shuai, Liu Rongke, Hou Yi, et al. Memory optimized parallel LDPC decoder architecture design on GPU[J]. Journal of Beijing University of Aeronautics and Astronautics, 2013, 39(3): 421-426. (in Chinese)


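Funding: Joint project of the Key Laboratory of Avionics System Integrated Technology and the Aeronautical Science Foundation (20115551022)
    About the author:

    Ge Shuai (b. 1988), male, from Harbin, Heilongjiang; master's student. Email: gysn.erwrew@gmail.com

  • CLC number: TN911.2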


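  • Abstract: An optimized decoder architecture is proposed for the layered decoding algorithm of low-density parity-check (LDPC) codes on graphics processing units (GPUs) built on Nvidia's Fermi architecture. Exploiting the parallelism of the GPU architecture, the design combines inter-frame and intra-layer parallel processing, which makes full use of the streaming multiprocessor hardware resources and effectively alleviates the limited parallelism of the layered decoding algorithm. In addition, the parity-check matrix is stored in compressed form in on-chip constant memory, and the iterative decoding messages held in off-chip global memory are accessed in a coalesced manner; these optimizations effectively reduce memory access latency and raise decoding throughput. Test results show that multi-frame parallel processing together with memory access optimization improves the throughput of a GPU-based LDPC decoder by a factor of 14.9 to 34.8.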
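The ideas named in the abstract can be illustrated with a small CPU-side sketch. It is not the authors' implementation: the toy code, the row-index table used as "compressed" parity-check storage, the min-sum check-node rule, and the names `LAYERS`, `update_row` and `decode_batch` are all assumptions made for illustration. The two innermost loops mark where a GPU version would run rows of a layer and frames in parallel, and the flat index `v * n_frames + f` mimics a frame-interleaved layout in which threads handling consecutive frames would touch consecutive addresses.

```python
# Hypothetical toy code: 6 variable nodes, 4 check rows grouped into 2 layers.
# Rows inside one layer share no variable, so they are independent of each other.
# Only the column indices of the nonzero entries are stored (compressed H); on a
# GPU a small read-only table like this is a candidate for constant memory.
LAYERS = (
    ((0, 1, 2), (3, 4, 5)),  # layer 0
    ((0, 3, 4), (1, 2, 5)),  # layer 1
)
N_VARS = 6


def update_row(row, base, f, n_frames, post, c2v):
    """Min-sum update of one check row for one frame (one GPU thread's work)."""
    # Subtract this row's old messages to obtain variable-to-check messages.
    q = [post[v * n_frames + f] - c2v[(base + k) * n_frames + f]
         for k, v in enumerate(row)]
    for k, v in enumerate(row):
        others = q[:k] + q[k + 1:]
        sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
        new_msg = sign * min(abs(x) for x in others)
        c2v[(base + k) * n_frames + f] = new_msg
        post[v * n_frames + f] = q[k] + new_msg


def decode_batch(channel_llrs, iterations=3):
    """Layered min-sum decoding of several frames at once.

    channel_llrs: list of frames, each a list of N_VARS LLRs (positive = bit 0).
    Returns one list of hard-decision bits per frame.
    """
    n_frames = len(channel_llrs)
    # Frame-interleaved layout: element (v, f) lives at index v * n_frames + f.
    post = [0.0] * (N_VARS * n_frames)
    for f, frame in enumerate(channel_llrs):
        for v, llr in enumerate(frame):
            post[v * n_frames + f] = llr

    # One check-to-variable message per edge and frame, same interleaving.
    edge_base, n_edges = [], 0
    for layer in LAYERS:
        for row in layer:
            edge_base.append(n_edges)
            n_edges += len(row)
    c2v = [0.0] * (n_edges * n_frames)

    for _ in range(iterations):
        r = 0
        for layer in LAYERS:               # layers must run one after another
            for row in layer:              # intra-layer: parallel on a GPU
                for f in range(n_frames):  # inter-frame: parallel on a GPU
                    update_row(row, edge_base[r], f, n_frames, post, c2v)
                r += 1

    return [[1 if post[v * n_frames + f] < 0 else 0 for v in range(N_VARS)]
            for f in range(n_frames)]
```

Calling `decode_batch` with two frames, each carrying one weak, wrong-signed LLR, shows the batch interface: every frame is decoded independently while sharing the same compressed row table. In a real CUDA kernel the interleaved indexing is what would let a warp's loads from global memory be coalesced.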

     

Publication history
  • Received: 2012-04-18
  • Published online: 2013-03-31
