Journal of Beijing University of Aeronautics and Astronautics ›› 2021, Vol. 47 ›› Issue (9): 1900-1907. doi: 10.13700/j.bh.1001-5965.2020.0310


Design and FPGA implementation of fast convolution algorithm based on 3D-Winograd

LIN Keyu, JIANG Hongxu, ZHANG Yonghua, CONG Rongzi

  1. Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100083, China
  • Received: 2020-07-03 Published: 2021-10-09
  • Corresponding author: JIANG Hongxu E-mail: jianghx@buaa.edu.cn
  • Supported by:
    Aerospace Science and Technology Fund (190109); National Natural Science Foundation of China (61872017)

Abstract: In recent years, convolutional neural networks (CNNs) have been widely adopted for computer vision tasks. Owing to its high performance, energy efficiency, and reconfigurability, the FPGA is regarded as the most promising hardware accelerator for CNNs. However, constrained by the computing power and storage resources of FPGAs, existing FPGA solutions that compute three-dimensional convolution with the traditional Winograd algorithm still leave room for performance improvement. This paper first studies the one-dimensional expansion of the Winograd algorithm for three-dimensional operations. It then improves the performance of CNNs on FPGA by increasing the dimensions of the input feature map and convolution block processed at one time, and by quantizing weights and input data to low bit widths. The optimization comprises four parts: replacing some divisions with shifts, a tiling scheme, the extension from two dimensions to three, and low-bit quantization. Compared with the traditional two-dimensional Winograd algorithm, the optimized algorithm reduces the number of clock cycles per convolutional layer by a factor of about 7; compared with the traditional sliding-window convolution algorithm, the average reduction per convolutional layer is also about a factor of 7. The results show that the 3D-Winograd algorithm based on one-dimensional expansion can greatly reduce computational complexity and improve the performance of CNNs running on FPGA.

Key words: Convolutional Neural Network (CNN), FPGA, Winograd, convolution algorithm, fast algorithm
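
The idea of building a three-dimensional Winograd transform by applying the one-dimensional transform along each axis can be illustrated with a minimal Python sketch. It assumes the commonly used F(2,3) transform matrices and a single 4x4x4 input tile with a 3x3x3 kernel. The function names (`apply_along_axis`, `transform3d`, `winograd3d_f2_k3`, `direct3d`) are hypothetical, and the sketch does not reproduce the authors' FPGA design, tile sizes, or quantization scheme.

```python
from itertools import product

# Commonly used F(2,3) Winograd matrices (assumed here, not taken from the paper).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
# The 0.5 factors are divisions by 2; in fixed-point hardware such a
# division can be realized as a right shift (an assumption about what
# "shift instead of division" refers to).
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def apply_along_axis(M, T, axis):
    """Multiply matrix M into a 3D nested-list tensor T along one axis."""
    shape = [len(T), len(T[0]), len(T[0][0])]
    out_shape = shape[:]
    out_shape[axis] = len(M)
    out = [[[0.0] * out_shape[2] for _ in range(out_shape[1])]
           for _ in range(out_shape[0])]
    for idx in product(*(range(n) for n in out_shape)):
        acc = 0.0
        for k in range(shape[axis]):
            src = list(idx)
            src[axis] = k
            acc += M[idx[axis]][k] * T[src[0]][src[1]][src[2]]
        out[idx[0]][idx[1]][idx[2]] = acc
    return out


def transform3d(M, T):
    """Apply the same 1D transform M along all three axes in turn."""
    for axis in range(3):
        T = apply_along_axis(M, T, axis)
    return T


def winograd3d_f2_k3(d, g):
    """d: 4x4x4 input tile, g: 3x3x3 kernel -> 2x2x2 output tile."""
    U = transform3d(G, g)    # transformed kernel, 4x4x4
    V = transform3d(BT, d)   # transformed input, 4x4x4
    # Element-wise stage: 4 * 4 * 4 = 64 multiplications.
    prod = [[[U[i][j][k] * V[i][j][k] for k in range(4)]
             for j in range(4)] for i in range(4)]
    return transform3d(AT, prod)  # inverse transform, 2x2x2


def direct3d(d, g):
    """Sliding-window reference: 8 outputs * 27 taps = 216 multiplications."""
    out = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for x, y, z in product(range(2), repeat=3):
        out[x][y][z] = sum(d[x + i][y + j][z + k] * g[i][j][k]
                           for i, j, k in product(range(3), repeat=3))
    return out
```

For one tile, the element-wise stage uses 64 multiplications where the sliding-window form uses 8 × 27 = 216, not counting the additions and shifts spent in the transforms themselves. Calling `winograd3d_f2_k3(d, g)` and `direct3d(d, g)` on the same tile and kernel should give matching 2x2x2 outputs up to floating-point rounding.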
