Volume 46 Issue 9
Sep.  2020
DU Yujia, LI Haisheng, YAO Chunlian, et al. Monocular image based 3D model retrieval using triplet network[J]. Journal of Beijing University of Aeronautics and Astronautics, 2020, 46(9): 1691-1700. doi: 10.13700/j.bh.1001-5965.2020.0057(in Chinese)

Monocular image based 3D model retrieval using triplet network

doi: 10.13700/j.bh.1001-5965.2020.0057
Funds:

National Natural Science Foundation of China 61877002

Beijing Natural Science Foundation and Fengtai Rail Transit Frontier Research Joint Fund L191009

Beijing Municipal Education Commission Research Team Construction Project PXM2019_014213_000007

More Information
  • Corresponding author: LI Haisheng, E-mail: lihsh@btbu.edu.cn
  • Received Date: 28 Feb 2020
  • Accepted Date: 28 Mar 2020
  • Publish Date: 20 Sep 2020
  • Abstract: With the diversification of media data, cross-domain retrieval between images and 3D models has become a new challenge for 3D model retrieval. Because images and 3D models differ greatly and are hard to match, a cross-domain retrieval algorithm based on a triplet network is proposed to construct a joint embedding space for real images and 3D shapes in an end-to-end manner. The similarity between data of different modalities can then be computed effectively as a distance in this space, enabling accurate retrieval of similar 3D models from a single image. To improve cross-domain retrieval accuracy, each 3D model is represented by a set of sequential views, and a Gated Recurrent Unit (GRU) aggregates the view-level features into a global feature. In addition, an attention mechanism is introduced when extracting image features to bridge the semantic gap between real images and rendered 3D views. Experimental results show that the mean average precision is improved by at least 2.98%-3.05% on two cross-domain datasets compared with other similar algorithms.
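
The idea of a joint embedding space trained with a triplet objective can be illustrated with a minimal sketch. It assumes Euclidean distance and a hinge-style triplet loss with a margin of 1.0; the abstract only states that similarity is computed as a distance in the embedding space, so the distance function, the margin, the function names (`euclidean`, `triplet_loss`, `retrieve`), and the toy embeddings are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor   -- embedding of a query image
    positive -- embedding of a 3D model that matches the image
    negative -- embedding of a 3D model that does not match
    margin   -- assumed value; the abstract does not specify one
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def retrieve(image_embedding, shape_embeddings, top_k=1):
    """Return the names of the top_k shapes closest to the image embedding."""
    ranked = sorted(
        shape_embeddings,
        key=lambda name: euclidean(image_embedding, shape_embeddings[name]),
    )
    return ranked[:top_k]


# Toy 2-D embeddings (illustrative values only, not taken from the paper).
image = [0.0, 0.0]
shapes = {"chair_01": [0.1, 0.0], "table_07": [3.0, 4.0]}

# A well-separated triplet incurs no loss; a swapped one is penalized.
good = triplet_loss(image, shapes["chair_01"], shapes["table_07"])
bad = triplet_loss(image, shapes["table_07"], shapes["chair_01"])
nearest = retrieve(image, shapes)
```

Once the embedding has been trained, retrieval reduces to a nearest-neighbor ranking, which is why `retrieve` only sorts by distance. In the described method the shape embedding would come from GRU-aggregated view features and the image embedding from an attention-augmented network; both are replaced here by fixed vectors.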

     
