Volume 50, Issue 11, Nov. 2024
Citation: ZOU Y B, LI T, CHEN M, et al. Indoor spatial layout estimation model based on multi-task supervised learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(11): 3327-3337 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0834

Indoor spatial layout estimation model based on multi-task supervised learning

doi: 10.13700/j.bh.1001-5965.2022.0834
Funds: Shanghai Science and Technology Innovation Action Plan (20dz1203800)
  • Corresponding author: E-mail: mchen@shou.edu.cn
  • Received Date: 04 Oct 2022
  • Accepted Date: 04 Dec 2022
  • Available Online: 30 Dec 2022
  • Publish Date: 26 Dec 2022
Abstract: Indoor spatial layout estimation is currently a research hotspot in computer vision and plays a crucial role in object detection, augmented reality, and robot navigation. This paper proposes an indoor spatial layout estimation method based on multi-task supervised learning that efficiently perceives the layout relationships of indoor scenes and extracts their spatial segmentation maps in an end-to-end manner. According to the segmentation characteristics of indoor layout images, an encoder-decoder network structure is designed, and multi-task supervised learning is introduced to obtain both the indoor spatial layout and the semantic edges of each region. A joint loss function is defined to continuously optimize the segmentation results during model training. To better express the layout relationships between regions, the edge predictions for each region are used to locally refine the network output and infer the final spatial layout of the indoor scene. Experiments on the public LSUN and Hedau datasets show that the proposed method effectively improves indoor spatial layout estimation, achieving pixel errors of 7.54% and 7.08%, respectively, and generally outperforming the comparison methods.
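To make the described pipeline concrete, the following is a minimal PyTorch sketch of a multi-task encoder-decoder of the kind the abstract outlines: a shared trunk with two supervised heads, one predicting the per-pixel layout region and one predicting semantic edges, trained with a joint loss. The backbone, channel widths, region count, and loss weighting here are illustrative assumptions, not the paper's reported architecture or hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskLayoutNet(nn.Module):
    """Hypothetical encoder-decoder with two supervised heads: a
    layout-segmentation head (assuming 5 planar regions for a typical
    room: floor, ceiling, and up to three walls) and a semantic-edge head."""

    def __init__(self, num_regions: int = 5):
        super().__init__()
        # Shared encoder (a small stand-in for the paper's backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Shared decoder upsamples back to the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Task heads: per-pixel region logits and a per-pixel edge logit.
        self.seg_head = nn.Conv2d(32, num_regions, 1)
        self.edge_head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        feats = self.decoder(self.encoder(x))
        return self.seg_head(feats), self.edge_head(feats)

def joint_loss(seg_logits, edge_logits, seg_gt, edge_gt, lam=1.0):
    """Joint objective: cross-entropy on the layout segmentation plus
    binary cross-entropy on the semantic edges. The weight `lam` and the
    simple additive form are assumptions for illustration."""
    seg_loss = F.cross_entropy(seg_logits, seg_gt)
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_gt)
    return seg_loss + lam * edge_loss

# Toy forward/backward pass on random data to show the training step.
model = MultiTaskLayoutNet()
img = torch.randn(2, 3, 64, 64)
seg_gt = torch.randint(0, 5, (2, 64, 64))          # per-pixel region labels
edge_gt = torch.rand(2, 1, 64, 64).round()         # binary edge map
seg_logits, edge_logits = model(img)
loss = joint_loss(seg_logits, edge_logits, seg_gt, edge_gt)
loss.backward()

Sharing the trunk while supervising both tasks is what lets the edge signal regularize the region boundaries during training, which is the usual motivation for this kind of joint loss.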
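The abstract's final step, using the edge predictions to locally refine the network output, could look roughly like the sketch below. This refinement scheme is a stand-in of my own devising, not the paper's procedure: it smooths region probabilities away from predicted edges while keeping raw predictions near edges, so boundaries stay sharp and interior noise is suppressed. The threshold and window size are hypothetical.

import torch.nn.functional as F

def edge_guided_refine(seg_logits, edge_logits, edge_thresh=0.5, kernel=5):
    """Hypothetical local refinement: box-filter the per-pixel region
    probabilities so interior pixels adopt their neighborhood's dominant
    label, but keep the original (sharp) predictions near predicted edges."""
    probs = seg_logits.softmax(dim=1)                       # (N, C, H, W)
    edges = (edge_logits.sigmoid() > edge_thresh).float()   # (N, 1, H, W)
    pad = kernel // 2
    smoothed = F.avg_pool2d(probs, kernel, stride=1, padding=pad)
    # Dilate the edge mask so "near an edge" covers a small neighborhood.
    near_edge = F.max_pool2d(edges, kernel, stride=1, padding=pad)
    refined = near_edge * probs + (1.0 - near_edge) * smoothed
    return refined.argmax(dim=1)                            # (N, H, W) labels

# Usage with the sketch above:
# labels = edge_guided_refine(seg_logits, edge_logits)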

     

References

[1] PARK S J, HONG K S. Recovering an indoor 3D layout with top-down semantic segmentation from a single image[J]. Pattern Recognition Letters, 2015, 68: 70-75. doi: 10.1016/j.patrec.2015.08.014
[2] SAITO H, BABA S, KANADE T. Appearance-based virtual view generation from multicamera videos captured in the 3-D room[J]. IEEE Transactions on Multimedia, 2003, 5(3): 303-316. doi: 10.1109/TMM.2003.813283
[3] DE CRISTÓFORIS P, NITSCHE M, KRAJNÍK T, et al. Hybrid vision-based navigation for mobile robots in mixed indoor/outdoor environments[J]. Pattern Recognition Letters, 2015, 53: 118-128. doi: 10.1016/j.patrec.2014.10.010
[4] XIE H T, FANG S C, ZHA Z J, et al. Convolutional attention networks for scene text recognition[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2019, 15(1s): 1-17.
[5] HONG J, HONG Y, UH Y, et al. Discovering overlooked objects: Context-based boosting of object detection in indoor scenes[J]. Pattern Recognition Letters, 2017, 86: 56-61. doi: 10.1016/j.patrec.2016.12.017
[6] YU F, SEFF A, ZHANG Y D, et al. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop[EB/OL]. (2016-06-04)[2022-10-01]. http://arxiv.org/abs/1506.03365.
[7] HEDAU V, HOIEM D, FORSYTH D. Thinking inside the box: Using appearance models and context based on room geometry[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2010: 224-237.
[8] YAN C G, LI L, ZHANG C J, et al. Cross-modality bridging and knowledge transferring for image understanding[J]. IEEE Transactions on Multimedia, 2019, 21(10): 2675-2685. doi: 10.1109/TMM.2019.2903448
[9] DI MAURO D, FURNARI A, PATANÈ G, et al. SceneAdapt: Scene-based domain adaptation for semantic segmentation using adversarial learning[J]. Pattern Recognition Letters, 2020, 136: 175-182. doi: 10.1016/j.patrec.2020.06.002
[10] BAHETI B, INNANI S, GAJRE S, et al. Semantic scene segmentation in unstructured environment with modified DeepLabV3+[J]. Pattern Recognition Letters, 2020, 138: 223-229. doi: 10.1016/j.patrec.2020.07.029
[11] ISMAIL A S, SEIFELNASR M M, GUO H X. Understanding indoor scene: Spatial layout estimation, scene classification, and object detection[C]//Proceedings of the 3rd International Conference on Multimedia Systems and Signal Processing. New York: ACM, 2018: 64-70.
[12] HUANG C, HE Z H. Task-driven progressive part localization for fine-grained recognition[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2016: 1-9.
[13] TANG J H, JIN L, LI Z C, et al. RGB-D object recognition via incorporating latent data structure and prior knowledge[J]. IEEE Transactions on Multimedia, 2015, 17(11): 1899-1908. doi: 10.1109/TMM.2015.2476660
[14] HUANG R Z, MENG Q H, LIU Y B. Real-time indoor layout estimation method based on multi-task supervised learning[J]. Laser & Optoelectronics Progress, 2021, 58(14): 1410023 (in Chinese).
[15] COUGHLAN J, YUILLE A L. The Manhattan world assumption: Regularities in scene statistics which enable Bayesian inference[C]//Proceedings of the Neural Information Processing Systems. Cambridge: MIT Press, 2000.
[16] XU H K, QIN Y Y, CHEN H R. An improved algorithm for edge detection based on Canny[J]. Infrared Technology, 2014, 36(3): 210-214 (in Chinese).
[17] YILMAZ B, ABDULLAH S N H, KOK V J. Vanishing region loss for crowd density estimation[J]. Pattern Recognition Letters, 2020, 138: 336-345. doi: 10.1016/j.patrec.2020.08.001
[18] WANG H Y, GOULD S, KOLLER D. Discriminative learning with latent variables for cluttered indoor scene understanding[J]. Communications of the ACM, 2013, 56(4): 92-99. doi: 10.1145/2436256.2436276
[19] LIU C X, SCHWING A G, KUNDU K, et al. Rent3D: Floor-plan priors for monocular layout estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3413-3421.
[20] DEL PERO L, BOWDISH J, KERMGARD B, et al. Understanding Bayesian rooms using composite 3D object models[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2013: 153-160.
[21] DEL PERO L, BOWDISH J, FRIED D, et al. Bayesian geometric modeling of indoor scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2012: 2719-2726.
[22] REN Y Z, LI S W, CHEN C, et al. A coarse-to-fine indoor layout estimation (CFILE) method[C]//Proceedings of the Asian Conference on Computer Vision. Berlin: Springer, 2017: 36-51.
[23] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
[24] LEE C Y, BADRINARAYANAN V, MALISIEWICZ T, et al. RoomNet: End-to-end room layout estimation[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 4875-4884.
[25] ZHANG W D, ZHANG W, GU J. Edge-semantic learning strategy for layout estimation in indoor environment[J]. IEEE Transactions on Cybernetics, 2020, 50(6): 2730-2739. doi: 10.1109/TCYB.2019.2895837
[26] ZHENG W Z, LU J W, ZHOU J. Structural deep metric learning for room layout estimation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 735-751.
[27] HIRZER M, ROTH P M, LEPETIT V. Smart hypothesis generation for efficient and robust room layout estimation[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2020: 2901-2909.
[28] WANG A P, WEN S T, GAO Y J, et al. An efficient method for indoor layout estimation with FPN[C]//Proceedings of the International Conference on Web Information Systems Engineering. Berlin: Springer, 2021: 94-106.
[29] KIRILLOV A, GIRSHICK R, HE K M, et al. Panoptic feature pyramid networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 6392-6401.
[30] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 936-944.
[31] CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 833-851.
[32] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
[33] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1800-1807.
[34] CHEN L C, BARRON J T, PAPANDREOU G, et al. Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4545-4554.
[35] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2009: 248-255.
[36] MALLYA A, LAZEBNIK S. Learning informative edge maps for indoor scene layout prediction[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 936-944.
[37] DASGUPTA S, FANG K, CHEN K, et al. DeLay: Robust spatial layout estimation for cluttered indoor scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 616-624.