
Indoor spatial layout estimation model based on multi-task supervised learning

ZOU Yibo, LI Tao, CHEN Ming, GE Yan, ZHAO Linlin

Citation: ZOU Y B, LI T, CHEN M, et al. Indoor spatial layout estimation model based on multi-task supervised learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(11): 3327-3337 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0834

doi: 10.13700/j.bh.1001-5965.2022.0834
Funds: Shanghai Science and Technology Innovation Action Planning (20dz1203800)
Corresponding author. E-mail: mchen@shou.edu.cn
CLC number: V221+.3; TB553
  • Abstract:

    Indoor spatial layout estimation, an active research topic in computer vision, plays an important role in tasks such as object detection, augmented reality, and robot navigation. To perceive the layout relationships of indoor scenes more effectively, this paper proposes an indoor spatial layout estimation method based on multi-task supervised learning, which extracts the spatial segmentation map of an indoor scene end to end. Tailored to the segmentation characteristics of indoor images, an encoder-decoder network is designed and multi-task supervised learning is introduced, so that both the indoor spatial layout and the semantic edges of each region are inferred. A joint loss function is defined to progressively optimize the segmentation results during training. To better express the layout relationships among regions, the network output is locally refined using the edge predictions of each region, yielding the final layout of the indoor scene. Experiments on the public LSUN and Hedau datasets show that the proposed method effectively improves indoor spatial layout estimation, achieving pixel errors of 7.54% and 7.08% respectively and outperforming the compared methods overall.
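    The abstract names three components: a multi-task encoder-decoder, a joint loss over the layout segmentation and the edge predictions, and an edge-guided refinement step. As a minimal sketch of the joint loss only, assuming a 5-class layout head, a 1-channel binary edge head, and a 5-channel per-region semantic-edge head (matching the decoder output shapes in Table 1) with placeholder equal weights:

```python
import torch.nn.functional as F

def joint_loss(seg_logits, tedge_logits, aedge_logits,
               seg_gt, tedge_gt, aedge_gt,
               w_seg=1.0, w_tedge=1.0, w_aedge=1.0):
    """Weighted sum of the three supervision terms.

    seg_logits:   (N, 5, H, W) layout segmentation logits
    tedge_logits: (N, 1, H, W) binary layout-edge logits
    aedge_logits: (N, 5, H, W) per-region semantic-edge logits
    The loss forms and the weights w_* are assumptions; the paper only
    states that a joint loss is defined, not its exact terms.
    """
    l_seg = F.cross_entropy(seg_logits, seg_gt)  # seg_gt: (N, H, W) int labels
    l_tedge = F.binary_cross_entropy_with_logits(tedge_logits, tedge_gt)
    l_aedge = F.binary_cross_entropy_with_logits(aedge_logits, aedge_gt)
    return w_seg * l_seg + w_tedge * l_tedge + w_aedge * l_aedge
```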

     

  • Figure 1. Examples of the indoor spatial layout estimation task

    Figure 2. Network architecture of multi-task supervised learning

    Figure 3. Main architecture of the encoder

    Figure 4. Main architecture of the decoder

    Figure 5. Semantic segmentation and semantic edges of indoor spatial layout data

    Figure 6. Performance of the proposed method on the LSUN dataset

    Figure 7. Examples where the proposed method performs poorly on the LSUN dataset

    Figure 8. Performance of the proposed method on the Hedau dataset

    Figure 9. Visual comparison before and after introducing multi-task supervised learning

    Table 1. Relevant parameters of the encoder and decoder

    Module  | Layer No. | Block type | Input feature map | Output feature map | Stride | Dilation rate
    Encoder | 1     | DSBlock    | 3×512×512   | 32×256×256  | 2 |
    Encoder | 2     | IRBlock    | 32×256×256  | 16×256×256  | 1 |
    Encoder | 3~4   | IRBlock    | 16×256×256  | 24×128×128  | 2 |
    Encoder | 5~7   | IRBlock    | 24×128×128  | 32×64×64    | 2 |
    Encoder | 8~11  | IRBlock    | 32×64×64    | 64×64×64    | 2 |
    Encoder | 12~14 | IRBlock    | 64×64×64    | 96×64×64    | 1 |
    Encoder | 15~17 | IRBlock    | 96×64×64    | 160×64×64   | 2 |
    Encoder | 18    | IRBlock    | 160×64×64   | 320×64×64   | 1 |
    Encoder | 19    | DSCBlock   | 320×64×64   | 256×64×64   | 1 | 1
    Encoder | 20    | DSCBlock   | 320×64×64   | 256×64×64   | 1 | 6
    Encoder | 21    | DSCBlock   | 320×64×64   | 256×64×64   | 1 | 12
    Encoder | 22    | DSCBlock   | 320×64×64   | 256×64×64   | 1 | 8
    Encoder | 23    | IPBlock    | 320×64×64   | 256×64×64   | 1 |
    Decoder | 1     | SCBlock    | 24×128×128  | 48×128×128  | 1 |
    Decoder | 2     | CCBlock    | 304×128×128 | 256×128×128 | 1 |
    Decoder | 3-1   | SegBlock   | 256×128×128 | 5×128×128   | 1 |
    Decoder | 3-2   | TEdgeBlock | 256×128×128 | 1×128×128   | 1 |
    Decoder | 3-3   | AEdgeBlock | 256×128×128 | 5×128×128   | 1 |
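    Encoder layers 19-22 apply DSCBlock at stride 1 with varying dilation rates, which, together with the cited DeepLab family [31-33], suggests a bank of depthwise-separable atrous convolutions. A sketch of one such block under that assumption (the composition below, including the BatchNorm/ReLU placement, is inferred rather than taken from the paper):

```python
import torch
import torch.nn as nn

class DSCBlock(nn.Module):
    """Depthwise-separable atrous convolution: one reading of the
    DSCBlock rows in Table 1 (stride 1, varying dilation rate)."""

    def __init__(self, in_ch=320, out_ch=256, dilation=1):
        super().__init__()
        # 3x3 depthwise conv; padding = dilation keeps the 64x64 spatial size
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        # 1x1 pointwise conv mixes channels (320 -> 256)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Shape check against the layer 19-21 rows: 320×64×64 in, 256×64×64 out.
x = torch.randn(1, 320, 64, 64)
for rate in (1, 6, 12):
    assert DSCBlock(dilation=rate)(x).shape == (1, 256, 64, 64)
```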

    Table 2. Comparison between the proposed method and existing methods on the LSUN dataset

    Method       | Pixel error/% | End-to-end
    Ref. [7]     | 24.23 | ×
    Ref. [36]    | 16.71 | ×
    Ref. [37]    | 10.63 | ×
    RoomNet [24] | 9.86  | √
    CFILE [22]   | 9.31  | √
    Ref. [27]    | 7.79  | √
    Ref. [28]    | 7.99  | √
    Proposed     | 7.54  | √
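    The pixel error reported in Tables 2 and 3 is the standard layout-estimation metric: the percentage of pixels whose predicted planar-region label disagrees with the ground-truth layout. A minimal sketch, assuming the score is averaged over all test images (the exact evaluation protocol is not restated on this page):

```python
import numpy as np

def pixel_error(pred, gt):
    """Fraction of mislabeled pixels between two H×W integer label maps."""
    pred, gt = np.asarray(pred), np.asarray(gt)
    return float(np.mean(pred != gt))

# Dataset-level score as reported, in percent (pairs is a hypothetical
# list of (predicted, ground-truth) label-map pairs):
# score = 100 * np.mean([pixel_error(p, g) for p, g in pairs])
```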

    Table 3. Comparison between the proposed method and existing methods on the Hedau dataset

    Method       | Pixel error/% | End-to-end
    Ref. [7]     | 21.20 | ×
    Ref. [36]    | 12.83 | ×
    Ref. [37]    | 9.73  | ×
    RoomNet [24] | 8.36  | √
    CFILE [22]   | 8.67  | √
    Ref. [27]    | 7.44  | √
    Proposed     | 7.08  | √

    Table 4. Optimization results of each module of the proposed method in the ablation experiments

    Baseline | Improved encoder | Multi-task supervised learning | Feature-fusion post-processing | Params/MB | Per-image prediction time/ms | Pixel error/%
    √ | × | × | × | 209.6 | 45 | 11.15
    √ | √ | × | × | 22.1  | 26 | 11.34
    √ | × | √ | × | 209.8 | 46 | 8.41
    √ | √ | √ | × | 22.4  | 28 | 8.63
    √ | √ | √ | √ | 22.4  | 28 | 7.54
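    The last ablation column toggles the edge-guided local refinement described in the abstract (the network output is locally refined using the edge predictions of each region). The paper's exact procedure is not given on this page; purely as an illustration of the idea, the sketch below re-labels pixels inside the predicted edge band from their nearest confident neighbor. The function name and the band threshold are hypothetical.

```python
import numpy as np
from scipy import ndimage

def refine_layout(seg_pred, edge_prob, band_thresh=0.5):
    """Illustrative edge-guided refinement (not the paper's algorithm).

    seg_pred:  (H, W) integer layout labels from the segmentation head
    edge_prob: (H, W) predicted edge probability in [0, 1]
    """
    band = edge_prob > band_thresh  # uncertain strip around predicted edges
    # For each pixel, get the indices of the nearest pixel outside the band;
    # pixels outside the band map to themselves (distance 0).
    _, (iy, ix) = ndimage.distance_transform_edt(band, return_indices=True)
    return seg_pred[iy, ix]  # band pixels inherit the nearest confident label
```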
  • [1] PARK S J, HONG K S. Recovering an indoor 3D layout with top-down semantic segmentation from a single image[J]. Pattern Recognition Letters, 2015, 68: 70-75. doi: 10.1016/j.patrec.2015.08.014
    [2] SAITO H, BABA S, KANADE T. Appearance-based virtual view generation from multicamera videos captured in the 3-D room[J]. IEEE Transactions on Multimedia, 2003, 5(3): 303-316. doi: 10.1109/TMM.2003.813283
    [3] DE CRISTÓFORIS P, NITSCHE M, KRAJNÍK T, et al. Hybrid vision-based navigation for mobile robots in mixed indoor/outdoor environments[J]. Pattern Recognition Letters, 2015, 53: 118-128. doi: 10.1016/j.patrec.2014.10.010
    [4] XIE H T, FANG S C, ZHA Z J, et al. Convolutional attention networks for scene text recognition[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2019, 15(1s): 1-17.
    [5] HONG J, HONG Y, UH Y, et al. Discovering overlooked objects: Context-based boosting of object detection in indoor scenes[J]. Pattern Recognition Letters, 2017, 86: 56-61. doi: 10.1016/j.patrec.2016.12.017
    [6] YU F, SEFF A, ZHANG Y D, et al. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop[EB/OL]. (2016-06-04)[2022-10-01]. http://arxiv.org/abs/1506.03365.
    [7] HEDAU V, HOIEM D, FORSYTH D. Thinking inside the box: Using appearance models and context based on room geometry[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2010: 224-237.
    [8] YAN C G, LI L, ZHANG C J, et al. Cross-modality bridging and knowledge transferring for image understanding[J]. IEEE Transactions on Multimedia, 2019, 21(10): 2675-2685. doi: 10.1109/TMM.2019.2903448
    [9] DI MAURO D, FURNARI A, PATANÈ G, et al. SceneAdapt: Scene-based domain adaptation for semantic segmentation using adversarial learning[J]. Pattern Recognition Letters, 2020, 136: 175-182. doi: 10.1016/j.patrec.2020.06.002
    [10] BAHETI B, INNANI S, GAJRE S, et al. Semantic scene segmentation in unstructured environment with modified DeepLabV3+[J]. Pattern Recognition Letters, 2020, 138: 223-229. doi: 10.1016/j.patrec.2020.07.029
    [11] ISMAIL A S, SEIFELNASR M M, GUO H X. Understanding indoor scene: Spatial layout estimation, scene classification, and object detection[C]//Proceedings of the 3rd International Conference on Multimedia Systems and Signal Processing. New York: ACM, 2018: 64-70.
    [12] HUANG C, HE Z H. Task-driven progressive part localization for fine-grained recognition[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2016: 1-9.
    [13] TANG J H, JIN L, LI Z C, et al. RGB-D object recognition via incorporating latent data structure and prior knowledge[J]. IEEE Transactions on Multimedia, 2015, 17(11): 1899-1908. doi: 10.1109/TMM.2015.2476660
    [14] HUANG R Z, MENG Q H, LIU Y B. Real-time indoor layout estimation method based on multi-task supervised learning[J]. Laser & Optoelectronics Progress, 2021, 58(14): 1410023 (in Chinese).
    [15] COUGHLAN J, YUILLE A L. The Manhattan world assumption: Regularities in scene statistics which enable Bayesian inference[C]//Advances in Neural Information Processing Systems. Cambridge: MIT Press, 2000.
    [16] XU H K, QIN Y Y, CHEN H R. An improved algorithm for edge detection based on Canny[J]. Infrared Technology, 2014, 36(3): 210-214 (in Chinese).
    [17] YILMAZ B, ABDULLAH S N H, KOK V J. Vanishing region loss for crowd density estimation[J]. Pattern Recognition Letters, 2020, 138: 336-345. doi: 10.1016/j.patrec.2020.08.001
    [18] WANG H Y, GOULD S, KOLLER D. Discriminative learning with latent variables for cluttered indoor scene understanding[J]. Communications of the ACM, 2013, 56(4): 92-99. doi: 10.1145/2436256.2436276
    [19] LIU C X, SCHWING A G, KUNDU K, et al. Rent3D: Floor-plan priors for monocular layout estimation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3413-3421.
    [20] DEL PERO L, BOWDISH J, KERMGARD B, et al. Understanding Bayesian rooms using composite 3D object models[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2013: 153-160.
    [21] DEL PERO L, BOWDISH J, FRIED D, et al. Bayesian geometric modeling of indoor scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2012: 2719-2726.
    [22] REN Y Z, LI S W, CHEN C, et al. A coarse-to-fine indoor layout estimation (CFILE) method[C]//Proceedings of the Asian Conference on Computer Vision. Berlin: Springer, 2017: 36-51.
    [23] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
    [24] LEE C Y, BADRINARAYANAN V, MALISIEWICZ T, et al. RoomNet: End-to-end room layout estimation[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 4875-4884.
    [25] ZHANG W D, ZHANG W, GU J. Edge-semantic learning strategy for layout estimation in indoor environment[J]. IEEE Transactions on Cybernetics, 2020, 50(6): 2730-2739. doi: 10.1109/TCYB.2019.2895837
    [26] ZHENG W Z, LU J W, ZHOU J. Structural deep metric learning for room layout estimation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2020: 735-751.
    [27] HIRZER M, ROTH P M, LEPETIT V. Smart hypothesis generation for efficient and robust room layout estimation[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2020: 2901-2909.
    [28] WANG A P, WEN S T, GAO Y J, et al. An efficient method for indoor layout estimation with FPN[C]//Proceedings of the International Conference on Web Information Systems Engineering. Berlin: Springer, 2021: 94-106.
    [29] KIRILLOV A, GIRSHICK R, HE K M, et al. Panoptic feature pyramid networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 6392-6401.
    [30] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 936-944.
    [31] CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 833-851.
    [32] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. doi: 10.1109/TPAMI.2017.2699184
    [33] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1800-1807.
    [34] CHEN L C, BARRON J T, PAPANDREOU G, et al. Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 4545-4554.
    [35] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2009: 248-255.
    [36] MALLYA A, LAZEBNIK S. Learning informative edge maps for indoor scene layout prediction[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 936-944.
    [37] DASGUPTA S, FANG K, CHEN K, et al. DeLay: Robust spatial layout estimation for cluttered indoor scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 616-624.
Publication history
  • Received: 2022-10-04
  • Accepted: 2022-12-04
  • Available online: 2022-12-26
  • Issue date: 2024-11-30
