留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

真实场景水下语义分割方法及数据集

马志伟 李豪杰 樊鑫 罗钟铉 李建军 王智慧

马志伟, 李豪杰, 樊鑫, 等 . 真实场景水下语义分割方法及数据集[J]. 北京航空航天大学学报, 2022, 48(8): 1515-1524. doi: 10.13700/j.bh.1001-5965.2021.0527
引用本文: 马志伟, 李豪杰, 樊鑫, 等 . 真实场景水下语义分割方法及数据集[J]. 北京航空航天大学学报, 2022, 48(8): 1515-1524. doi: 10.13700/j.bh.1001-5965.2021.0527
MA Zhiwei, LI Haojie, FAN Xin, et al. A real scene underwater semantic segmentation method and related dataset[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(8): 1515-1524. doi: 10.13700/j.bh.1001-5965.2021.0527(in Chinese)
Citation: MA Zhiwei, LI Haojie, FAN Xin, et al. A real scene underwater semantic segmentation method and related dataset[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(8): 1515-1524. doi: 10.13700/j.bh.1001-5965.2021.0527(in Chinese)

真实场景水下语义分割方法及数据集

doi: 10.13700/j.bh.1001-5965.2021.0527
基金项目: 

国家自然科学基金 61976038

国家自然科学基金 61932020

国家自然科学基金 61772108

国家自然科学基金 U1908210

详细信息
    通讯作者:

    王智慧, E-mail: zhwang@dlut.edu.cn

  • 中图分类号: TP37; TP242.6

A real scene underwater semantic segmentation method and related dataset

Funds: 

National Natural Science Foundation of China 61976038

National Natural Science Foundation of China 61932020

National Natural Science Foundation of China 61772108

National Natural Science Foundation of China U1908210

More Information
  • 摘要:

    随着水下生物抓取技术的不断发展,高精度的水下物体识别与分割成为了挑战。已有的水下目标检测技术仅能给出物体的大体位置,无法提供物体轮廓等更加细致的信息,严重影响了抓取效率。为了解决这一问题,标注并建立了真实场景水下语义分割数据集DUT-USEG,该数据集包含6 617张图像,其中1 487张具有语义分割和实例分割标注,剩余5 130张图像具有目标检测框标注。基于该数据集,提出了一个关注边界的半监督水下语义分割网络(US-Net),该网络通过设计伪标签生成器和边界检测子网络,实现了对水下物体与背景之间边界的精细学习,提升了边界区域的分割效果。实验表明:所提方法在DUT-USEG数据集的海参、海胆和海星3个类别上相较于对比方法提升了6.7%,达到了目前最好的分割精度。

     

  • 图 1  边界难以分割的样例图像

    Figure 1.  Sample image with indivisible border

    图 2  标注过程

    Figure 2.  Process of labeling

    图 3  语义分割的标注样例

    Figure 3.  Examples of annotations for semantic segmentation

    图 4  本文数据集中不同分辨率图像数量统计

    Figure 4.  Statistics on number of images with different resolutions in the proposed dataset

    图 5  本文数据集中不同类别对应的实例数量统计

    Figure 5.  Statistics on number of instances of different categories in the proposed dataset

    图 6  海洋生物不同类别数目不均衡问题的样例图像

    Figure 6.  Sampleimages showing the unbalanced numbers of different categories marine organisms

    图 7  US-Net网络框架

    Figure 7.  Framework of US-Net

    图 8  对比实验可视化结果

    Figure 8.  Visualization results of contrast experiments

    图 9  边界图可视化结果

    Figure 9.  Visualization results of boundary maps

    图 10  失败案例

    Figure 10.  Failure examples

    表  1  对比实验结果

    Table  1.   Results of contrast experiments

    方法 IOU/% mIOU/% FWIOU/% PA/%
    海参 海胆 扇贝 海星
    GrabCut 52.5 63.8 77.8 58.2 70.1 97.6 98.6
    GGANet 32.1 42.9 52.8 34.5 52.1 96.6 98.0
    US-Net(本文方法) 54.9 67.7 70.4 63.6 71.1 98.1 98.9
    下载: 导出CSV

    表  2  运行时间对比实验结果

    Table  2.   Results of contrast experiment for running time

    方法 运行时间/(s·张-1)
    GGANet 38.9
    US-Net(本文方法) 47.6
    下载: 导出CSV

    表  3  边界检测网络消融实验结果

    Table  3.   Results of ablation experiment for boundary detection network

    方法 mIOU/%
    只用高斯注意力图 68.8
    高斯注意力图+边界检测网络(本文方法) 71.1
    下载: 导出CSV

    表  4  伪标签生成器消融实验结果

    Table  4.   Results of ablation experiment for pseudo label generator

    方法 mIOU/%
    不使用伪标签生成器 67.6
    高斯注意力图作为伪标签 68.6
    高斯注意力图+类激活图作为伪标签(本文方法) 71.1
    下载: 导出CSV

    表  5  伪标签前景阈值消融实验结果

    Table  5.   Results of ablation experiment for foreground threshold of pseudo labels

    前景阈值 0.2 0.3 0.4
    mIOU % 70.5 71.1 69.7
    下载: 导出CSV

    表  6  像素对最大距离消融实验结果

    Table  6.   Results of ablation experiment for maximum distance of pixel pairs

    最大距离γ 5 10 15
    mIOU/% 70.2 71.1 71.3
    下载: 导出CSV
  • [1] LI C, GUO C, REN W, et al. An underwater image enhancement benchmark dataset and beyond[J]. IEEE Transactions on Image Processing, 2019, 29: 4376-4389.
    [2] LI C, ANWAR S, PORIKLI F. Underwater scene prior inspired deep underwater image and video enhancement[J]. Pattern Recognition, 2020, 98: 107038. doi: 10.1016/j.patcog.2019.107038
    [3] GUO Y, LI H, ZHUANG P. Underwater image enhancement using a multiscale dense generative adversarial network[J]. IEEE Journal of Oceanic Engineering, 2019, 45(3): 862-870.
    [4] CHEN L, LIU Z, TONG L, et al. Underwater object detection using invert multi-class adaboost with deep learning[C]//2020 International Joint Conference on Neural Networks (IJCNN). Piscataway: IEEE Press, 2020: 1-8.
    [5] LI X, SHANG M, QIN H, et al. Fast accurate fish detection and recognition of underwater images with fast R-CNN[C]//OCEANS 2015-MTS/IEEE Washington. Piscataway: IEEE Press, 2015: 1-5.
    [6] VILLON S, CHAUMONT M, SUBSOL G, et al. Coral reef fish detection and recognition in underwater videos by supervised machine learning: Comparison between deep learning and HOG+ SVM methods[C]//International Conference on Advanced Concepts for Intelligent Vision Systems. Berlin: Springer, 2016: 160-171.
    [7] LIU C W, LI H J, WANG S C, et al. A dataset and benchmark of underwater object detection for robot picking[EB/OL]. (2021-06-10)[2021-09-01]. https://arxiv.org/abs/2106.05681.
    [8] JIAN M, QI Q, DONG J, et al. The OUC-vision large-scale underwater image database[C]//2017 IEEE International Conference on Multimedia and Exposition (ICME). Piscataway: IEEE Press, 2017: 1297-1302.
    [9] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 3431-3440.
    [10] DAI J, HE K, SUN J. BoxSup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2015: 1635-1643.
    [11] KHOREVA A, BENENSON R, HOSANG J, et al. Simple does it: Weakly supervised instance and semantic segmentation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 876-885.
    [12] ROTHER C, KOLMOGOROV V, BLAKE A. "GrabCut" interactive foreground extraction using iterated graph cuts[J]. ACM Transactions on Graphics, 2004, 23(3): 309-314. doi: 10.1145/1015706.1015720
    [13] PONT-TUSET J, ARBELAEZ P, BARRON J T, et al. Multiscale combinatorial grouping for image segmentation and object proposal generation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(1): 128-140.
    [14] SONG C, HUANG Y, OUYANG W, et al. Box-driven class-wise region masking and filling rate guided loss for weakly supervised semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 3136-3145.
    [15] LEE J, YI J, SHIN C, et al. BBAM: Bounding box attribution map for weakly supervised semantic and instance segmentation[EB/OL]. (2021-03-16)[2021-09-01]. https://arxiv.org/abs/2103.08907.
    [16] ZHANG P, WANG Z, MA X, et al. Learning to segment unseen category objects using gradient gaussian attention[C]//2019 IEEE International Conference on Multimedia and Exposition (ICME). Piscataway: IEEE Press, 2019: 1636-1641.
    [17] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//European Conference on Computer Vision. Berlin: Springer, 2014: 740-755.
    [18] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer, 2015, 111: 98-136.
    [19] CHENG H K, CHUNG J, TAI Y W, et al. CascadePSP: Toward class-agnostic and very high-resolution segmentation via global and local refinement[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2020: 8890-8899.
    [20] ZHOU B, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 2921-2929.
    [21] AHN J, CHO S, KWAK S. Weakly supervised learning of instance segmentation with inter-pixel relations[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 2209-2218.
    [22] KRÄHENBVHL P, KOLTUN V. Efficient inference in fully connected CRFS with gaussian edge potentials[J]. Advances in Neural Information Processing Systems, 2011, 24: 109-117.
    [23] LIU W, RABINOVICH A, BERG A C. ParseNet: Looking wider to see better[EB/OL]. (2015-11-19)[2021-09-01]. https://arxiv.org/abs/1506.04579.
    [24] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
  • 加载中
图(10) / 表(6)
计量
  • 文章访问数:  765
  • HTML全文浏览量:  182
  • PDF下载量:  98
  • 被引次数: 0
出版历程
  • 收稿日期:  2021-09-06
  • 录用日期:  2021-10-01
  • 网络出版日期:  2022-01-28
  • 整期出版日期:  2022-08-20

目录

    /

    返回文章
    返回
    常见问答