Journal of Beijing University of Aeronautics and Astronautics, 2022, Vol. 48, Issue (3): 522-532. doi: 10.13700/j.bh.1001-5965.2020.0611


Pixel-wise visible image registration based on deep neural network

HUANG Chenwei1, CHENG Jingchun1, PAN Xiong1, SONG Ningfang1, LIU Bing2

  1. School of Instrumentation and Optoelectronic Engineering, Beihang University, Beijing 100083, China;
  2. Hubei Sanjiang Aerospace Hongfeng Control Co., Ltd., Xiaogan 432000, China
  • Received: 2020-11-03 Published: 2022-03-29
  • Corresponding author: CHENG Jingchun E-mail: chengjingchun14@163.com

Abstract: Among existing image registration algorithms, those that rely on the parameters of the image acquisition device suffer because the intrinsic parameters are hard to obtain or insufficiently precise, while those that match image features to compute a homography matrix make incomplete use of scene depth information. To address this, we propose a method that combines monocular visible images with their depth maps to generate more realistic registration image pairs, and we use these data to train PIR-Net, a deep neural network for pixel-wise image registration. We construct a large-scale, multi-view, highly realistic image registration dataset, the multi-view image registration (MVR) dataset, which contains 7 240 pairs of RGB images with depth information together with their pixel-wise coordinate alignment ground truth. With this dataset, we train PIR-Net, a fully convolutional network with an encoder-decoder structure that reconstructs, at full resolution, the coordinate transformation matrix between two input images. Experiments on the MVR dataset show that PIR-Net can register visible images taken from different viewpoints without access to the camera intrinsic parameters, and that it achieves higher registration accuracy than traditional methods. On the MVR dataset, the registration error of PIR-Net is only 18% of that of the commonly used feature-matching method (SIFT+RANSAC), and its time cost is 30% lower.

Key words: deep learning, image registration, coordinate transformation, homography estimation, depth-map
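
The abstract does not spell out how the pixel-wise alignment labels in MVR are produced. As a rough illustration of why depth matters, the sketch below computes a dense coordinate map by back-projecting each pixel with its depth and re-projecting it into a second view. It assumes a pinhole camera with intrinsics shared by both views and a known relative pose; the function name `dense_coordinate_map` and the parameters `K`, `R`, `t` are hypothetical and are not taken from the paper.

```python
import numpy as np

def dense_coordinate_map(depth, K, R, t):
    """Map every pixel of view A to its (x, y) position in view B.

    Illustrative assumptions (not from the paper):
    depth: (H, W) depth of view A along the camera z-axis
    K:     (3, 3) pinhole intrinsics, shared by both views
    R, t:  rotation (3, 3) and translation (3,) taking view-A camera
           coordinates to view-B camera coordinates
    Returns an (H, W, 2) array of target pixel coordinates.
    """
    h, w = depth.shape
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float64),
                         np.arange(h, dtype=np.float64))
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous pixels
    rays = pixels @ np.linalg.inv(K).T       # back-project to unit-depth rays
    points_a = rays * depth[..., None]       # 3D points in view-A coordinates
    points_b = points_a @ R.T + t            # same points in view-B coordinates
    projected = points_b @ K.T               # project into view B
    return projected[..., :2] / projected[..., 2:3]

# Toy scene: left half at depth 1, right half at depth 2, camera moved sideways.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.array([[1.0, 1.0, 2.0, 2.0]] * 3)
coords = dense_coordinate_map(depth, K, np.eye(3), np.array([0.1, 0.0, 0.0]))
shift_x = coords[..., 0] - np.arange(4)      # horizontal displacement per pixel
```

In this toy scene the pixels at depth 1 move by 10 pixels and those at depth 2 by 5 pixels, so no single global homography describes the displacement of every pixel. This is the kind of depth-dependent displacement that a per-pixel coordinate map can represent.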


