A visual localization method based on encoder-decoder dual-stream CNN

JIA Ruiming, LIU Shengjie, LI Jintao, WANG Yunhao, PAN Haixia

Citation: JIA Ruiming, LIU Shengjie, LI Jintao, et al. A visual localization method based on encoder-decoder dual-stream CNN[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 1965-1972. doi: 10.13700/j.bh.1001-5965.2019.0046 (in Chinese)


doi: 10.13700/j.bh.1001-5965.2019.0046
Funds:

National Key R & D Program of China 2017YFB0802300

The General Program of Beijing Municipal Education Commission KM201510009005

Science and Technology Activities for Students of NCUT 110051360007

Details
    About the authors:

    JIA Ruiming   male, Ph.D., assistant research fellow. Research interests: computer vision, deep learning, pattern recognition

    LIU Shengjie   male, M.S. candidate. Research interests: computer vision, deep learning

    Corresponding author:

    JIA Ruiming, E-mail: jiaruiming@ncut.edu.cn

  • CLC number: V249.32+9; TP391

  • Abstract:

    To estimate the camera pose from a single RGB image, a deep encoder-decoder dual-stream convolutional neural network (CNN) is proposed that improves the accuracy of visual localization. First, an encoder extracts high-dimensional features from the input image; then, a decoder restores the spatial resolution of those features; finally, a multi-scale pose predictor outputs the pose parameters. Because position and orientation have different characteristics, the network splits into two streams at the decoder and processes position and orientation separately, and skip connections are added between encoder and decoder to preserve spatial information. Experiments show that the proposed network is clearly more accurate than current algorithms of the same type, with a particularly large gain in camera orientation accuracy.
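
    The article itself ships no code; the following is a minimal PyTorch sketch of the topology the abstract describes (shared encoder, dual-stream decoder with skip connections, pooled pose regressors). The three-block encoder stands in for the paper's Inception-ResNet-V2, all layer widths and names are illustrative assumptions, and the multi-scale predictor is reduced to a single pooled head.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ConvBlock(nn.Module):
            """3x3 conv + BN + ReLU; stride 2 downsamples in the encoder."""
            def __init__(self, c_in, c_out, stride=1):
                super().__init__()
                self.body = nn.Sequential(
                    nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
                    nn.BatchNorm2d(c_out),
                    nn.ReLU(inplace=True))

            def forward(self, x):
                return self.body(x)

        class DecoderStream(nn.Module):
            """One decoder stream (position or orientation) with encoder skips."""
            def __init__(self, out_dim):
                super().__init__()
                self.up1 = ConvBlock(256, 128)
                self.up2 = ConvBlock(128, 64)
                self.head = nn.Linear(64, out_dim)

            def forward(self, f1, f2, f3):
                x = F.interpolate(f3, scale_factor=2)
                x = self.up1(x) + f2              # skip connection keeps spatial detail
                x = F.interpolate(x, scale_factor=2)
                x = self.up2(x) + f1
                return self.head(x.mean(dim=(2, 3)))   # global average pooling

        class BiLocNetSketch(nn.Module):
            def __init__(self):
                super().__init__()
                self.enc1 = ConvBlock(3, 64, stride=2)
                self.enc2 = ConvBlock(64, 128, stride=2)
                self.enc3 = ConvBlock(128, 256, stride=2)
                self.pos_stream = DecoderStream(3)   # x, y, z position
                self.ori_stream = DecoderStream(4)   # orientation quaternion

            def forward(self, img):
                f1 = self.enc1(img)
                f2 = self.enc2(f1)
                f3 = self.enc3(f2)
                t = self.pos_stream(f1, f2, f3)
                q = self.ori_stream(f1, f2, f3)
                return t, F.normalize(q, dim=1)      # unit quaternion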


  • Figure 1.  Architecture of BiLocNet

    Figure 2.  Architecture of encoder Inception-ResNet-V2

    Figure 3.  Module architecture of MSR

    Figure 4.  Module architecture of SSR

    Figure 5.  Samples of 7-Scenes dataset

    Figure 6.  FC estimator

    Figure 7.  GAP estimator

    Figure 8.  Loss function curves of pre-trained and non-pre-trained BiLocNet

    Table 1.  Position error and orientation error with different scenes for various algorithms
    (each cell: position error/m, orientation error/(°))

    Scene     PoseNet[11]   Bayesian PoseNet[14]   LSTM-Pose[12]   VidLoc[13]   Hourglass[15]   BiLocNet
    Chess     0.32, 8.12    0.28, 7.05             0.24, 5.77      0.18, N/A    0.15, 6.17      0.13, 5.13
    Fire      0.47, 14.4    0.43, 12.52            0.34, 11.9      0.26, N/A    0.27, 10.84     0.29, 10.48
    Heads     0.29, 12.0    0.25, 12.72            0.21, 13.7      0.14, N/A    0.19, 11.63     0.16, 12.67
    Office    0.48, 7.68    0.30, 8.92             0.30, 8.08      0.26, N/A    0.21, 8.48      0.25, 6.82
    Pumpkin   0.47, 8.42    0.36, 7.53             0.33, 7.00      0.36, N/A    0.25, 8.12      0.25, 5.23
    Kitchen   0.59, 8.64    0.45, 9.80             0.37, 8.83      0.32, N/A    0.27, 10.15     0.26, 6.95
    Stairs    0.47, 13.8    0.42, 13.06            0.40, 13.7      0.26, N/A    0.29, 12.46     0.33, 9.86
    Mean      0.44, 10.44   0.35, 10.22            0.31, 9.85      0.25, N/A    0.23, 9.69      0.23, 8.16
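
    A note on the metrics in Table 1: for 7-Scenes results of this kind, position error is conventionally the Euclidean distance in metres and orientation error the angle between the predicted and ground-truth unit quaternions. A sketch of that (assumed) evaluation, not code from the paper:

        import numpy as np

        def position_error(t_pred, t_true):
            """Euclidean distance between predicted and true camera positions (m)."""
            return float(np.linalg.norm(np.asarray(t_pred) - np.asarray(t_true)))

        def orientation_error(q_pred, q_true):
            """Angle in degrees between two unit quaternions.
            abs() handles the q / -q double cover of rotations."""
            d = abs(float(np.dot(q_pred, q_true)))
            return float(np.degrees(2.0 * np.arccos(np.clip(d, 0.0, 1.0))))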

    Table 2.  Comparison of network accuracy with different estimators

    Estimator                    Position error/m   Orientation error/(°)
    FC estimator                 0.26               8.03
    GAP estimator                0.14               5.21
    Multi-scale pose estimator   0.13               5.13
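
    The gap between the first two rows of Table 2 is easy to see structurally: a fully connected estimator flattens the whole feature map, while a GAP estimator averages each channel first. A sketch of both heads; the 64-channel 28x28 feature size and the 7-dimensional output (3 position + 4 quaternion) are illustrative assumptions:

        import torch.nn as nn

        # FC estimator: flatten, then regress. Parameter-heavy and tied to one
        # input resolution (here assumed 64 channels at 28x28).
        fc_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 28 * 28, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 7))

        # GAP estimator: average each channel over space, then regress.
        # Far fewer parameters and independent of input resolution.
        gap_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, 7))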

    Table 3.  Comparison of network accuracy between α-weighted and σ-weighted loss functions

    Loss function   Position error/m   Orientation error/(°)
    α weights       0.19               6.55
    σ weights       0.17               5.64
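
    For context on Table 3: an α-weighted loss fixes the balance by hand, L = L_pos + α·L_ori, whereas the σ-weighted loss in the style of [21, 23] learns the balance through per-task homoscedastic-uncertainty terms. A sketch of that learned weighting under that assumption; the initial values are illustrative:

        import torch
        import torch.nn as nn

        class SigmaWeightedLoss(nn.Module):
            """Learned uncertainty weighting of the two pose losses [21, 23].
            s_x and s_q are trainable log-variances (initial values illustrative)."""
            def __init__(self, s_x=0.0, s_q=-3.0):
                super().__init__()
                self.s_x = nn.Parameter(torch.tensor(s_x))
                self.s_q = nn.Parameter(torch.tensor(s_q))

            def forward(self, loss_pos, loss_ori):
                return (loss_pos * torch.exp(-self.s_x) + self.s_x
                        + loss_ori * torch.exp(-self.s_q) + self.s_q)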

    Table 4.  Comparison of network accuracy with and without transfer learning

    Transfer learning   Position error/m   Orientation error/(°)
    Without             0.32               10.64
    With                0.29               10.48
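
    The transfer learning compared in Table 4 amounts to initializing the encoder from ImageNet weights rather than from scratch. A hypothetical sketch with torchvision; the paper's encoder is Inception-ResNet-V2, which torchvision does not provide, so ResNet-50 stands in purely for illustration:

        import torch.nn as nn
        import torchvision.models as models

        # Start from ImageNet weights instead of random initialization.
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        # Replace the classifier with a 7-dim pose regressor (3 position + 4 quaternion).
        backbone.fc = nn.Linear(backbone.fc.in_features, 7)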
  • [1] CHEN D M, BAATZ G, KOSER K, et al. City-scale landmark identification on mobile devices[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2011: 12258110.
    [2] TORII A, SIVIC J, PAJDLA T, et al. Visual place recognition with repetitive structures[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2013: 883-890.
    [3] SCHINDLER G, BROWN M, SZELISKI R. City-scale location recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2007: 1-7.
    [4] ARTH C, PIRCHHEIM C, VENTURA J, et al. Instant outdoor localization and SLAM initialization from 2.5D maps[J]. IEEE Transactions on Visualization and Computer Graphics, 2015, 21(11): 1309-1318. doi: 10.1109/TVCG.2015.2459772
    [5] POGLITSCH C, ARTH C, SCHMALSTIEG D, et al. A particle filter approach to outdoor localization using image-based rendering[C]//IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Piscataway, NJ: IEEE Press, 2015: 132-135.
    [6] SATTLER T, LEIBE B, KOBBELT L. Improving image-based localization by active correspondence search[C]//Proceedings of European Conference on Computer Vision. Berlin: Springer, 2012: 752-765.
    [7] LI Y, SNAVELY N, HUTTENLOCHER D, et al. Worldwide pose estimation using 3D point clouds[C]//Proceedings of European Conference on Computer Vision. Berlin: Springer, 2012: 15-29.
    [8] CHOUDHARY S, NARAYANAN P J. Visibility probability structure from SfM datasets and applications[C]//Proceedings of European Conference on Computer Vision. Berlin: Springer, 2012: 130-143.
    [9] SVARM L, ENQVIST O, OSKARSSON M, et al. Accurate localization and pose estimation for large 3D models[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2014: 532-539.
    [10] SHOTTON J, GLOCKER B, ZACH C, et al. Scene coordinate regression forests for camera relocalization in RGB-D images[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2013: 2930-2937.
    [11] KENDALL A, GRIMES M, CIPOLLA R. PoseNet: A convolutional network for real-time 6-DOF camera relocalization[C]//Proceedings of IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE Press, 2015: 2938-2946.
    [12] WALCH F, HAZIRBAS C, LEAL-TAIXÉ L, et al. Image-based localization with spatial LSTMs[EB/OL]. (2016-11-23)[2018-12-25]. https://arxiv.org/pdf/1611.07890v1.
    [13] CLARK R, WANG S, MARKHAM A, et al. VidLoc: A deep spatio-temporal model for 6-DOF video-clip relocalization[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2017: 6856-6864.
    [14] KENDALL A, CIPOLLA R. Modelling uncertainty in deep learning for camera relocalization[C]//Proceedings of IEEE International Conference on Robotics and Automation (ICRA). Piscataway, NJ: IEEE Press, 2016: 4762-4769.
    [15] MELEKHOV I, YLIOINAS J, KANNALA J, et al. Image-based localization using hourglass networks[EB/OL]. (2017-08-24)[2018-12-25]. https://arxiv.org/abs/1703.07971.
    [16] LI R, LIU Q, GUI J, et al. Indoor relocalization in challenging environments with dual-stream convolutional neural networks[J]. IEEE Transactions on Automation Science and Engineering, 2018, 15(2): 651-662. doi: 10.1109/TASE.2017.2664920
    [17] RADWAN N, VALADA A, BURGARD W. VLocNet++: Deep multitask learning for semantic visual localization and odometry[EB/OL]. (2016-10-11)[2018-12-25]. https://arxiv.org/abs/1804.08366.
    [18] BRAHMBHATT S, GU J, KIM K, et al. Geometry-aware learning of maps for camera localization[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2018: 2616-2625.
    [19] LI X, YLIOINAS J, KANNALA J. Full-frame scene coordinate regression for image-based localization[EB/OL]. (2018-01-25)[2018-12-25]. https://arxiv.org/abs/1802.03237.
    [20] LASKAR Z, MELEKHOV I, KALIA S, et al. Camera relocalization by computing pairwise relative poses using convolutional neural network[C]//Proceedings of IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE Press, 2017: 929-938.
    [21] KENDALL A, GAL Y, CIPOLLA R. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2018: 7482-7491.
    [22] SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning[C]//Thirty-First AAAI Conference on Artificial Intelligence, 2017: 4-12.
    [23] KENDALL A, GAL Y. What uncertainties do we need in Bayesian deep learning for computer vision?[EB/OL]. (2017-10-05)[2018-12-26]. https://arxiv.org/abs/1703.04977.
    [24] IZADI S, KIM D, HILLIGES O, et al. KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera[C]//Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. New York: ACM, 2011: 559-568.
    [25] HE K M, GIRSHICK R, DOLLÁR P. Rethinking ImageNet pre-training[EB/OL]. (2018-11-21)[2018-12-25]. https://arxiv.org/abs/1811.08883.
    [26] WU Y X, HE K M. Group normalization[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018: 3-19.
Publication history
  • Received:  2019-02-13
  • Accepted:  2019-05-18
  • Published online:  2019-10-20
