Citation: | JIA Ruiming, LIU Shengjie, LI Jintao, et al. A visual localization method based on encoder-decoder dual-stream CNN[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 1965-1972. doi: 10.13700/j.bh.1001-5965.2019.0046(in Chinese) |
In order to calculate the camera pose from a single RGB image, a deep encoder-decoder dual-stream convolutional neural network (CNN) is proposed, which can improve the accuracy of visual localization. The network first uses an encoder to extract advanced features from input images. Second, the spacialresolution is enhancedby a pose decoder.Finally, a multi-scale estimator is used to output pose parameters. Becauseof the differentperformance of position and orientation, the network adopts a dual-stream structure from the decoder to process the position and orientationseparately. To restore the spatial information, several skip connections are added to encoder-decoder architecture. The experimental results show that the accuracy of the network is obviously improved compared with the congener state-of-the-art algorithms, and the orientation accuracy of camera pose is improved dramatically.
[1] |
CHEN D M, BAATZ G, KOSER K, et al.City-scale landmark identification on mobile devices[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2011: 12258110.
|
[2] |
TORII A, SIVIC J, PAJDLA T, et al.Visual place recognition with repetitive structures[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE Press, 2013: 883-890.
|
[3] |
SCHINDLER G, BROWN M, SZELISKI R.City-scale location recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2007: 1-7.
|
[4] |
ARTH C, PIRCHHEIM C, VENTURA J, et al.Instant outdoor localization and SLAM initialization from 2.5 D maps[J].IEEE Transactions on Visualization and Computer Graphics, 2015, 21(11):1309-1318. doi: 10.1109/TVCG.2015.2459772
|
[5] |
POGLITSCH C, ARTH C, SCHMALSTIEG D, et al.A particle filter approach to outdoor localization using image-based rendering[C]//IEEE International Symposium on Mixed and Augmented Reality(ISMAR).Piscataway, NJ: IEEE Press, 2015: 132-135.
|
[6] |
SATTLER T, LEIBE B, KOBBELT L.Improving image-based localization by active correspondence search[C]//Proceedings of European Conference on Computer Vision.Berlin: Springer, 2012: 752-765.
|
[7] |
LI Y, SNAVELY N, HUTTENLOCHER D, et al.Worldwide pose estimation using 3D point clouds[C]//Proceedings of European Conference on Computer Vision.Berlin: Springer, 2012: 15-29.
|
[8] |
CHOUDHARY S, NARAYANAN P J.Visibility probability structure from SFM datasets and applications[C]//Proceedings of European Conference on Computer Vision.Berlin: Springer, 2012: 130-143.
|
[9] |
SVARM L, ENQVIST O, OSKARSSON M, et al.Accurate localization and pose estimation for large 3D models[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2014: 532-539.
|
[10] |
SHOTTON J, GLOCKER B, ZACH C, et al.Scene coordinate regression forests for camera relocalization in RGB-D images[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2013: 2930-2937.
|
[11] |
KENDALL A, GRIMES M, CIPOLLA R.PoseNet: A convolutional network for real-time 6-DOF camera relocalization[C]//Proceedings of IEEE International Conference on Computer Vision.Piscataway, NJ: IEEE Press, 2015: 2938-2946.
|
[12] |
WALCH F, HAZIRBAS C, LEAL-TAIXÉ L, et al.Image-based localization with spatial LSTMs[EB/OL].(2016-11-23)[2018-12-25].https://arxiv.org/pdf/1611.07890v1.
|
[13] |
CLARK R, WANG S, MARKHAM A, et al.VidLoc: A deep spatio-temporal model for 6-DOF video-clip relocalization[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2017: 6856-6864.
|
[14] |
KENDALL A, CIPOLLA R.Modelling uncertainty in deep learning forcamera relocalization[C]//Proceedings of IEEE International Conference on Robotics and Automation (ICRA).Piscataway, NJ: IEEE Press, 2016: 4762-4769.
|
[15] |
MELEKHOV I, YLIOINAS J, KANNALA J, et al.Image-based localization using hourglass networks[EB/OL].(2017-08-24)[2018-12-25].https://arxiv.org/abs/1703.07971.
|
[16] |
LI R, LIU Q, GUI J, et al.Indoor relocalization in challenging environments with dual-stream convolutional neural networks[J].IEEE Transactions on Automation Science and Engineering, 2018, 15(2):651-662. doi: 10.1109/TASE.2017.2664920
|
[17] |
RADWAN N, VALADA A, BURGARD W.Vlocnet++: Deep multitask learning for semantic visual localization and odometry[EB/OL].(2016-10-11)[2018-12-25].https://arxiv.org/abs/1804.08366.
|
[18] |
BRAHMBHATT S, GU J, KIM K, et al.Geometry-aware learning of maps for camera localization[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2018: 2616-2625.
|
[19] |
LI X, YLIOINAS J, KANNALA J.Full-frame scene coordinate regression for image-based localization[EB/OL].(2018-01-25)[2018-12-25].https://arxiv.org/abs/1802.03237.
|
[20] |
LASKAR Z, MELEKHOV I, KALIA S, et al.Camera relocalization by computing pairwise relative poses using convolutional neural network[C]//Proceedings of IEEE International Conference on Computer Vision.Piscataway, NJ: IEEE Press, 2017: 929-938.
|
[21] |
KENDALL A, GAL Y, CIPOLLA R.Multi-task learning using uncertainty to weigh losses for scene geometry and semantics[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Piscataway, NJ: IEEE Press, 2018: 7482-7491.
|
[22] |
SZEGEDY C, IOFFE S, VANHOUCKE V, et al.Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-First AAAI Conference on Artificial Intelligence, 2017: 4-12.
|
[23] |
KENDALL A, GAL Y.What uncertainties do we need in Bayesian deep learning for computer vision [EB/OL].(2017-10-05)[2018-12-26].https://arxiv.org/abs/1703.04977.
|
[24] |
IZADI S, KIM D, HILLIGES O, et al.KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera[C]//Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology.New York: ACM, 2011: 559-568.
|
[25] |
HE K M, ROSS G, PIOTR D.Rethinking imagenet pre-training[EB/OL].(2015-11-21)[2018-12-25].https://arxiv.org/abs/1811.08883.
|
[26] |
WU Y X, HE K M.Group normalization[C]//Proceedings of the European Conference on Computer Vision (ECCV), 2018, 3-19.
|