Keywords:
- Field-programmable gate array (FPGA)
- Scale-invariant feature transform (SIFT)
- Hardware design
- Gradient information
- Feature descriptor extraction
Abstract: The scale-invariant feature transform (SIFT) algorithm is widely used in computer vision because of its excellent robustness. To address the poor real-time performance of the computation-intensive SIFT algorithm on CPUs, a low-complexity fast SIFT hardware architecture is designed on a field-programmable gate array (FPGA), with the optimization focused on the feature descriptor extraction stage of the algorithm. By reducing the bit width of the gradient information (gradient magnitude and gradient direction), optimizing the generation of the Gaussian weight coefficients, simplifying the calculation of the trilinear interpolation coefficients, and simplifying the computation of the gradient magnitude histogram index, the design avoids complex operations such as exponentials, trigonometric functions, and multiplications, reducing both hardware design complexity and hardware resource consumption. Experimental results show that the proposed low-complexity fast SIFT hardware architecture achieves a speedup of about 200 times over a software implementation; compared with related work, it is about 3 times faster and the stability of the feature descriptors is improved by more than 18%.
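The abstract describes the simplifications only at a high level; the sketch below illustrates the general flavor of such a low-complexity gradient stage, producing a truncated-magnitude and coarse-direction pair from pixel differences with only comparisons and additions (no trigonometric functions, multipliers, or square roots). The function name, bit widths, and the |dx|+|dy| magnitude approximation are illustrative assumptions, not the paper's exact design.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative low-complexity gradient stage (not the paper's exact design).
 * Magnitude: |dx| + |dy| approximation, truncated to a reduced bit width,
 * so no multiplier or square root is required.
 * Direction: one of 8 coarse 45-degree sectors, chosen from the signs of
 * dx, dy and the relative sizes of |dx|, |dy|, so no arctangent is required. */
typedef struct {
    uint8_t mag;  /* reduced-bit-width gradient magnitude */
    uint8_t dir;  /* coarse gradient direction, 0..7      */
} gradient_t;

gradient_t gradient_lowcost(int16_t dx, int16_t dy)
{
    gradient_t g;
    int ax = abs(dx);
    int ay = abs(dy);

    /* Reduced-bit-width magnitude: saturate the |dx|+|dy| sum to 8 bits. */
    int m = ax + ay;
    g.mag = (uint8_t)(m > 255 ? 255 : m);

    /* Coarse direction: quadrant from the signs, then split each quadrant
     * at the 45-degree diagonal by comparing |dx| and |dy|. */
    if (dx >= 0 && dy >= 0)      g.dir = (ay > ax) ? 1 : 0;  /*   0..90  deg */
    else if (dx <  0 && dy >= 0) g.dir = (ay > ax) ? 2 : 3;  /*  90..180 deg */
    else if (dx <  0 && dy <  0) g.dir = (ay > ax) ? 5 : 4;  /* 180..270 deg */
    else                         g.dir = (ay > ax) ? 6 : 7;  /* 270..360 deg */

    return g;
}
```

Each branch maps onto a simple comparator in hardware, so the whole stage costs only adders and comparators, in line with the stated goal of eliminating exponential, trigonometric, and multiplication operations.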
Table 1. Classification method of relative gradient direction

| Category before classification | 0~7 | 8~11 | 12~15 | 16~19 | 20~23 | 24~27 | 28~31 | 32~35 |
|---|---|---|---|---|---|---|---|---|
| Category after classification | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
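Table 1 is a fixed many-to-one lookup from the 36 relative gradient direction categories to 8 coarse categories: categories 0~7 collapse to class 0, after which every 4 consecutive categories share one class. A minimal sketch of that mapping, with the function name and integer types chosen for illustration:

```c
#include <stdint.h>

/* Relative gradient direction classification from Table 1:
 * categories 0~7 -> 0, 8~11 -> 1, 12~15 -> 2, ..., 32~35 -> 7. */
uint8_t classify_relative_direction(uint8_t cat36)
{
    if (cat36 <= 7)
        return 0;
    /* (cat36 - 8) / 4 is a 2-bit right shift in hardware: a subtractor
     * plus wiring, no divider needed. */
    return (uint8_t)(((cat36 - 8) >> 2) + 1);
}
```

Because the division by 4 reduces to dropping two bits, the classification needs only a comparator and a small subtractor, which is consistent with the modest 102-LUT cost of the gradient direction classification module reported in Table 5.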
Table 2. Stability detection of feature descriptors for rotation change
| Rotation angle/(°) | Matching rate |
|---|---|
| 5 | 0.80 |
| 10 | 0.85 |
| 15 | 0.75 |
| 20 | 0.50 |
| 25 | 0.45 |
| 30 | 0.35 |

Note: the average matching rate under rotation change is 0.62.
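The note's average is simply the arithmetic mean of the six matching rates listed in Table 2:

$$\bar{r}=\frac{0.80+0.85+0.75+0.50+0.45+0.35}{6}=\frac{3.70}{6}\approx 0.62$$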
Table 3. Stability comparison of feature descriptors
Table 4. Comparison of hardware resource consumption
Table 5. Comparison of hardware resource consumption before and after gradient direction optimization
| Module | LUTs | Registers |
|---|---|---|
| Gradient array (m=14, g=16) | 0 | 7 680×2 |
| Gradient array (m=14, g=6) | 0 | 5 120×2 |
| Gradient direction classification module | 102 | 6 |
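Reading Table 5 directly (taking g as the stored gradient direction bit width, as the table caption suggests), reducing g from 16 to 6 saves registers in the two gradient arrays:

$$\Delta_{\text{registers}} = (7\,680 - 5\,120)\times 2 = 5\,120$$

That is roughly one third of the gradient array registers, obtained for an overhead of only 102 LUTs and 6 registers in the gradient direction classification module.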