CHAI G Q,BO X S,LIU H J,et al. Self-supervised scene depth estimation for monocular images based on uncertainty[J]. Journal of Beijing University of Aeronautics and Astronautics,2024,50(12):3780-3787 (in Chinese) doi: 10.13700/j.bh.1001-5965.2022.0943
Self-supervised scene depth estimation for monocular images based on uncertainty

doi: 10.13700/j.bh.1001-5965.2022.0943
Funds:  National Natural Science Foundation of China (62201333,62001063); Basic Research Plan of Shanxi Province (20210302124647); Science and Technology Innovation Project of Colleges and Universities in Shanxi Province (2021L269)
More Information
  • Corresponding author: E-mail: haijun_liu@cqu.edu.cn
  • Received Date: 24 Nov 2022
  • Accepted Date: 17 Mar 2023
  • Available Online: 31 Mar 2023
  • Publish Date: 27 Mar 2023
  • Depth information plays an important role in accurately understanding the three-dimensional structure of a scene and the three-dimensional relationships between objects in an image. An end-to-end self-supervised depth estimation algorithm for monocular images based on uncertainty was proposed in this paper by combining structure-from-motion, image reprojection, and uncertainty theory. The depth map of the target image was obtained by an encoder-decoder depth estimation network built on an improved densely connected module, and the transformation matrix between the camera poses of the target image and the source image was computed by a pose estimation network. The source image was then sampled pixel by pixel according to the image reprojection to obtain a reconstructed target image. The proposed algorithm was optimized by a reconstruction objective function, an uncertainty objective function, and a smoothness objective function, and self-supervised depth estimation was realized by minimizing the difference between the reconstructed image and the real target image. Experimental results show that the proposed algorithm achieves better depth estimation than mainstream algorithms such as competitive collaboration (CC), Monodepth2, and HR-Depth in terms of both objective indicators and subjective visual comparison.
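The uncertainty objective described above can be illustrated with a short sketch. The code below is a minimal NumPy illustration, not the paper's implementation: the function name `uncertainty_photometric_loss` and the toy images are hypothetical, and the loss is assumed to take the common aleatoric-uncertainty form of Kendall and Gal [30], |I − Î|·e^(−s) + s, where s is a per-pixel predicted log-uncertainty.

```python
import numpy as np

def uncertainty_photometric_loss(target, reconstructed, log_sigma):
    """Per-pixel reconstruction error attenuated by predicted uncertainty.

    Pixels with high predicted uncertainty (e.g. moving objects that
    violate the static-scene assumption behind image reprojection)
    contribute less to the loss; the additive log-sigma term penalises
    predicting large uncertainty everywhere.
    """
    abs_err = np.abs(target - reconstructed)
    return float(np.mean(abs_err * np.exp(-log_sigma) + log_sigma))

# Toy example: a 4x4 "image" pair with one badly reconstructed pixel.
rng = np.random.default_rng(0)
target = rng.random((4, 4))
recon = target.copy()
recon[0, 0] += 10.0                # large violation at one pixel

uniform = np.zeros((4, 4))         # no uncertainty predicted anywhere
adaptive = np.zeros((4, 4))
adaptive[0, 0] = 3.0               # high uncertainty at the bad pixel

# Localising uncertainty on the inconsistent pixel lowers the loss.
assert uncertainty_photometric_loss(target, recon, adaptive) < \
       uncertainty_photometric_loss(target, recon, uniform)
```

In training, s would be an extra output channel of the depth decoder, so the network learns where reprojection is unreliable instead of being penalised for it.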

     

  • [1] LI H G, WANG Y P, LIAO Y P, et al. Perception and control method of driverless mining vehicle[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(11): 2335-2344 (in Chinese).
    [2] CHENG Z Y, ZHANG Y, TANG C K. Swin-depth: Using transformers and multi-scale fusion for monocular-based depth estimation[J]. IEEE Sensors Journal, 2021, 21(23): 26912-26920. doi: 10.1109/JSEN.2021.3120753
    [3] IZADINIA H, SHAN Q, SEITZ S M. IM2CAD[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2422-2431.
    [4] ZHANG Y Y, XIONG Z W, YANG Z, et al. Real-time scalable depth sensing with hybrid structured light illumination[J]. IEEE Transactions on Image Processing, 2014, 23(1): 97-109. doi: 10.1109/TIP.2013.2286901
    [5] LEE J, KIM Y, LEE S, et al. High-quality depth estimation using an exemplar 3D model for stereo conversion[J]. IEEE Transactions on Visualization and Computer Graphics, 2015, 21(7): 835-847. doi: 10.1109/TVCG.2015.2398440
    [6] DENG H P, SHENG Z C, XIANG S, et al. Depth estimation based on semantic guidance for light field image[J]. Journal of Electronics & Information Technology, 2022, 44(8): 2940-2948 (in Chinese).
    [7] ZHANG J, CAO Y, ZHA Z J, et al. A unified scheme for super-resolution and depth estimation from asymmetric stereoscopic video[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2016, 26(3): 479-493. doi: 10.1109/TCSVT.2014.2367356
    [8] YANG J Y, ALVAREZ J M, LIU M M. Self-supervised learning of depth inference for multi-view stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 7522-7530.
    [9] FU H, GONG M M, WANG C H, et al. Deep ordinal regression network for monocular depth estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 2002-2011.
    [10] UMMENHOFER B, ZHOU H Z, UHRIG J, et al. DeMoN: Depth and motion network for learning monocular stereo[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5622-5631.
    [11] KENDALL A, MARTIROSYAN H, DASGUPTA S, et al. End-to-end learning of geometry and context for deep stereo regression[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 66-75.
    [12] HAMBARDE P, MURALA S. S2DNet: Depth estimation from single image and sparse samples[J]. IEEE Transactions on Computational Imaging, 2020, 6: 806-817. doi: 10.1109/TCI.2020.2981761
    [13] BADKI A, TROCCOLI A, KIM K, et al. Bi3D: Stereo depth estimation via binary classifications[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 1597-1605.
    [14] DU Q C, LIU R K, PAN Y, et al. Depth estimation with multi-resolution stereo matching[C]//Proceedings of the IEEE Visual Communications and Image Processing. Piscataway: IEEE Press, 2019: 1-4.
    [15] JOHNSTON A, CARNEIRO G. Self-supervised monocular trained depth estimation using self-attention and discrete disparity volume[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 4755-4764.
    [16] SONG M, LIM S, KIM W. Monocular depth estimation using Laplacian pyramid-based depth residuals[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(11): 4381-4393. doi: 10.1109/TCSVT.2021.3049869
    [17] RANJAN A, JAMPANI V, BALLES L, et al. Competitive collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 12232-12241.
    [18] GODARD C, MAC AODHA O, FIRMAN M, et al. Digging into self-supervised monocular depth estimation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE Press, 2019: 3827-3837.
    [19] ZHOU T H, BROWN M, SNAVELY N, et al. Unsupervised learning of depth and ego-motion from video[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6612-6619.
    [20] LI K H, FU Z H, WANG H Y, et al. Adv-depth: Self-supervised monocular depth estimation with an adversarial loss[J]. IEEE Signal Processing Letters, 2021, 28: 638-642. doi: 10.1109/LSP.2021.3065203
    [21] ZOU Y L, JI P, TRAN Q H, et al. Learning monocular visual odometry via self-supervised long-term modeling[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 710-727.
    [22] LYU X Y, LIU L, WANG M M, et al. HR-depth: High resolution self-supervised monocular depth estimation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2021, 35(3): 2294-2301.
    [23] WAN Y C, ZHAO Q K, GUO C, et al. Multi-sensor fusion self-supervised deep odometry and depth estimation[J]. Remote Sensing, 2022, 14(5): 1228. doi: 10.3390/rs14051228
    [24] MAHJOURIAN R, WICKE M, ANGELOVA A. Unsupervised learning of depth and ego-motion from monocular video using 3D geometric constraints[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 5667-5675.
    [25] LIU X M, DU M Z, MA Z B, et al. Depth estimation method of light field image based on occlusion scene[J]. Acta Optica Sinica, 2020, 40(5): 0510002 (in Chinese). doi: 10.3788/AOS202040.0510002
    [26] YIN Z C, SHI J P. GeoNet: Unsupervised learning of dense depth, optical flow and camera pose[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 1983-1992.
    [27] KONG C, LUCEY S. Deep non-rigid structure from motion with missing data[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(12): 4365-4377. doi: 10.1109/TPAMI.2020.2997026
    [28] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2261-2269.
    [29] WANG P Q, CHEN P F, YUAN Y, et al. Understanding convolution for semantic segmentation[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2018: 1451-1460.
    [30] KENDALL A, GAL Y. What uncertainties do we need in Bayesian deep learning for computer vision?[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. California: NIPS, 2017: 5580-5590.
    [31] GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: The KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237. doi: 10.1177/0278364913491297

    Figures(8)  / Tables(3)
