Journal of Beijing University of Aeronautics and Astronautics, 2020, Vol. 46, Issue (9): 1691-1700. doi: 10.13700/j.bh.1001-5965.2020.0057


Single-image 3D model retrieval based on a triplet network

DU Yujia1,2,3, LI Haisheng1,2,3, YAO Chunlian1,2,3, CAI Qiang1,2,3

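  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
    2. National Engineering Laboratory for Agri-product Quality Traceability, Beijing 100048, China;
    3. Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing 100048, China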
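  • Received: 2020-02-28 Published: 2020-09-22
  • Corresponding author: LI Haisheng E-mail: lihsh@btbu.edu.cn
  • About the authors: DU Yujia, female, master's student. Main research interest: computer graphics; LI Haisheng, male, Ph.D., professor, doctoral supervisor. Main research interest: computer graphics; YAO Chunlian, female, Ph.D., associate professor. Main research interests: video and image processing, embedded system design; CAI Qiang, male, Ph.D., professor, doctoral supervisor. Main research interest: computer graphics.
  • Supported by:
    National Natural Science Foundation of China (61877002); Beijing Natural Science Foundation and Fengtai Rail Transit Frontier Research Joint Fund (L191009); Beijing Municipal Education Commission Research Team Construction Project (PXM2019_014213_000007)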

Monocular image based 3D model retrieval using triplet network

DU Yujia1,2,3, LI Haisheng1,2,3, YAO Chunlian1,2,3, CAI Qiang1,2,3   

  1. School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China;
    2. National Engineering Laboratory for Agri-product Quality Traceability, Beijing 100048, China;
    3. Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing 100048, China
  • Received: 2020-02-28 Published: 2020-09-22
  • Supported by:
    National Natural Science Foundation of China (61877002); Beijing Natural Science Foundation and Fengtai Rail Transit Frontier Research Joint Fund (L191009); Beijing Municipal Education Commission Research Team Construction Project (PXM2019_014213_000007)

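Abstract: As media data become increasingly diverse, cross-domain retrieval that jointly handles images and 3D models has become a new challenge in 3D model retrieval. To address the large differences between images and 3D models, which make them hard to match, a cross-domain retrieval method based on a triplet network is proposed. A joint feature embedding space for real images and 3D models is built in an end-to-end manner, and the similarity between data of different modalities is measured by the distance between features, so that similar 3D models can be retrieved from a single image. To improve cross-domain retrieval accuracy, each 3D model is represented by a set of sequential views, and a gated recurrent unit (GRU) aggregates the view-level features; an attention mechanism is also introduced for image feature extraction to narrow the semantic gap between real images and projected views. Experimental results show that, compared with similar methods, the proposed method improves the mean average precision of retrieval on two cross-domain datasets by at least 2.98%~3.05%.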
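The idea of a joint embedding space trained with a triplet objective and queried by distance can be illustrated with a small sketch. This is not the authors' implementation: the 2-D embeddings, the model names, the margin value, and the helper names `euclidean`, `triplet_loss`, and `retrieve` are all assumptions chosen for illustration. In the paper the embeddings would come from the image and 3D-model branches of the network.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (margin is an assumed hyperparameter).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def retrieve(query, gallery, top_k=3):
    """Rank gallery entries (name -> embedding) by distance to the query."""
    ranked = sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
    return ranked[:top_k]


# Hypothetical embeddings: one image query and three 3D models.
image_embedding = [0.9, 0.1]
model_embeddings = {
    "chair_01": [1.0, 0.0],
    "chair_02": [0.8, 0.3],
    "table_01": [0.0, 1.0],
}

# Anchor = image, positive = matching model, negative = non-matching model.
loss = triplet_loss(
    image_embedding,
    model_embeddings["chair_01"],
    model_embeddings["table_01"],
)
ranking = retrieve(image_embedding, model_embeddings)
print(loss, ranking)
```

During training, minimising such a loss over many (image, matching model, non-matching model) triplets pulls matching pairs together and pushes mismatched pairs apart. At query time only the distance ranking is needed.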

Keywords: 3D model retrieval, deep learning, cross-domain retrieval, triplet network, gated recurrent unit (GRU), attention mechanism

Abstract: With the growing diversity of media data, cross-domain retrieval between images and 3D models has become a new challenge for 3D model retrieval. Because images and 3D models differ greatly and are hard to match, a cross-domain retrieval algorithm based on a triplet network is proposed to construct a joint embedding space for real images and 3D shapes in an end-to-end manner. The similarity between data of different modalities can then be computed from distances in this space, enabling retrieval of similar 3D models from a single image. To improve the accuracy of cross-domain retrieval, each 3D model is represented by a set of sequential views, and a Gated Recurrent Unit (GRU) aggregates the view-level features into a global feature. In addition, an attention mechanism is introduced to extract image features and bridge the semantic gap between real images and rendered 3D views. Experimental results show that, compared with similar algorithms, the mean average precision is improved by at least 2.98%-3.05% on two cross-domain datasets.
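The reported metric, mean average precision, can be sketched as follows. This is a generic formulation, not the paper's evaluation code: it assumes the whole gallery is ranked for each query, so every relevant model appears in the list, and that relevance means sharing the query's category label. The function names and the toy labels are hypothetical.

```python
def average_precision(ranked_labels, query_label):
    """Average precision for one query.

    ranked_labels: category labels of the retrieved 3D models, best match first.
    Assumes the full gallery is ranked, so all relevant items are present.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of per-query average precision.

    queries: list of (query_label, ranked_labels) pairs.
    """
    if not queries:
        return 0.0
    return sum(average_precision(r, q) for q, r in queries) / len(queries)


# Toy example with two image queries over a three-model gallery.
queries = [
    ("chair", ["chair", "table", "chair"]),
    ("table", ["chair", "table", "table"]),
]
print(mean_average_precision(queries))
```

A ranking that places all relevant models first scores 1.0 for that query, and the score drops as relevant models appear lower in the list.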

Key words: 3D model retrieval, deep learning, cross-domain retrieval, triplet network, Gated Recurrent Unit (GRU), attention mechanism


