Citation: JIA Ruiming, LIU Shengjie, LI Jintao, et al. A visual localization method based on encoder-decoder dual-stream CNN[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(10): 1965-1972. doi: 10.13700/j.bh.1001-5965.2019.0046 (in Chinese)
To estimate the camera pose from a single RGB image, a deep encoder-decoder dual-stream convolutional neural network (CNN) is proposed to improve the accuracy of visual localization. The network first uses an encoder to extract high-level features from the input image. Second, a pose decoder enhances the spatial resolution of these features. Finally, a multi-scale estimator outputs the pose parameters. Because position and orientation exhibit different performance characteristics, the network adopts a dual-stream structure from the decoder onward to process position and orientation separately. To restore spatial information, several skip connections are added to the encoder-decoder architecture. The experimental results show that the accuracy of the network is clearly improved compared with comparable state-of-the-art algorithms, and that the orientation accuracy of the camera pose is improved dramatically.
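The data flow described in the abstract (shared encoder, two decoder streams with skip connections, multi-scale estimator) can be illustrated with a toy, framework-free sketch. This is not the authors' network: it uses 1-D lists in place of image feature maps, fixed arithmetic in place of learned convolutions, and every function name, the `gain` parameter, the depth of 3, and the 4-D orientation output (assumed here to be a quaternion) are hypothetical choices made only for illustration.

```python
# Toy sketch of the encoder -> dual-stream decoder -> multi-scale estimator flow.
# All names, sizes, and the arithmetic are illustrative assumptions, not the paper's model.

def downsample(x):
    """Encoder stage: halve the resolution by averaging neighbouring pairs."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]

def upsample(x):
    """Decoder stage: double the resolution by repeating each value."""
    return [v for v in x for _ in range(2)]

def encode(image, depth=3):
    """Return the feature maps of every encoder stage (kept for skip connections)."""
    feats = [image]
    for _ in range(depth):
        feats.append(downsample(feats[-1]))
    return feats

def decode(feats, gain):
    """One decoder stream: upsample, then add the matching encoder map (skip connection)."""
    x = feats[-1]
    scales = []
    for skip in reversed(feats[:-1]):
        x = [gain * u + s for u, s in zip(upsample(x), skip)]
        scales.append(x)
    return scales  # one feature map per decoder scale

def estimate(scales, out_dim):
    """Multi-scale estimator: pool each scale to a scalar, then average across scales."""
    pooled = [sum(s) / len(s) for s in scales]
    mean = sum(pooled) / len(pooled)
    # Stand-in for a learned linear layer mapping the pooled feature to out_dim values.
    return [mean * (k + 1) for k in range(out_dim)]

def localize(image):
    """Shared encoder, then separate position and orientation streams."""
    feats = encode(image)
    position = estimate(decode(feats, gain=0.5), out_dim=3)     # position stream
    orientation = estimate(decode(feats, gain=2.0), out_dim=4)  # orientation stream (assumed quaternion)
    return position, orientation

position, orientation = localize([float(i) for i in range(8)])
```

The two calls to `decode` share the encoder features but use different parameters, which mirrors the idea of splitting into two streams at the decoder; the skip connections reuse the higher-resolution encoder maps that downsampling would otherwise discard.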