Journal of Beijing University of Aeronautics and Astronautics ›› 2021, Vol. 47 ›› Issue (3): 431-440. doi: 10.13700/j.bh.1001-5965.2020.0443


Image captioning based on dependency syntax

BI Jianqi1,2, LIU Maofu1,2, HU Huijun1,2, DAI Jianhua3   

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China;
    2. Hubei Provincial Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan University of Science and Technology, Wuhan 430081, China;
    3. Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China
  • Received: 2020-08-21 Published: 2021-04-08
  • Corresponding author: LIU Maofu, E-mail: liumaofu@wust.edu.cn
  • About the authors: BI Jianqi, male, M.S. candidate; research interests: natural language processing. LIU Maofu, male, Ph.D., professor and doctoral supervisor; research interests: natural language processing, image analysis and understanding. HU Huijun, female, Ph.D., associate professor and master's supervisor; research interests: intelligent information processing, image analysis and understanding. DAI Jianhua, male, Ph.D., professor and doctoral supervisor; research interests: artificial intelligence, intelligent information processing.
  • Supported by:
    Major Research Program of the National Social Science Foundation of China (11&ZD189); Whole-Army Shared Information System Equipment Pre-research Project (31502030502)


Abstract: Existing image captioning models can apply part-of-speech sequences and syntactic trees to make the generated text conform to grammatical rules, but the generated captions are mostly simple sentences, and little work has examined how language models can promote the interpretability of deep learning models. This paper integrates dependency syntax into a deep learning model to supervise image captioning, which also makes the deep learning model more interpretable. An image structure attention mechanism, based on dependency syntax and visual information, computes the relations between image regions and yields image region relation features. These relation features are fused with the image region features and fed, together with word embeddings, into a Long Short-Term Memory (LSTM) network to generate image captions. In the testing stage, the content overlap between the test image and each training image is computed from their content keywords, so that a dependency syntax template matching the test image can be indirectly extracted; guided by this template, the model generates diverse image captions. Experimental results verify the ability of the proposed model to improve the diversity and syntactic complexity of the generated captions, and indicate that the dependency syntax information enhances the interpretability of the deep learning model.
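As a concrete illustration of the fusion-and-decoding step described in the abstract, the minimal PyTorch sketch below pairs region features with pairwise region-relation features and feeds the attended visual context, together with word embeddings, into an LSTM cell. This is an assumption-laden sketch, not the authors' implementation: all module names and dimensions are illustrative, and the pair_index input stands in for the paper's dependency-syntax-guided selection of related regions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureAttentionDecoder(nn.Module):
    """Sketch: fuse region features with pairwise region-relation features,
    attend over the fused pool at each step, and decode with an LSTM cell."""
    def __init__(self, region_dim=2048, embed_dim=300, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.rel_proj = nn.Linear(2 * region_dim, region_dim)  # pairwise relation features
        self.att = nn.Linear(region_dim + hidden_dim, 1)       # additive attention scorer
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTMCell(embed_dim + region_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, regions, pair_index, words):
        # regions: (B, R, D) detected-region features, e.g., from an object detector
        # pair_index: (P, 2) indices of region pairs treated as related; in the
        #   paper such pairs would be chosen under dependency-syntax supervision
        # words: (B, T) ground-truth token ids (teacher forcing)
        B = regions.size(0)
        left, right = regions[:, pair_index[:, 0]], regions[:, pair_index[:, 1]]
        relations = torch.tanh(self.rel_proj(torch.cat([left, right], dim=-1)))  # (B, P, D)
        pool = torch.cat([regions, relations], dim=1)  # fused region + relation pool
        h = regions.new_zeros(B, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        logits = []
        for t in range(words.size(1)):
            expanded = h.unsqueeze(1).expand(-1, pool.size(1), -1)
            alpha = F.softmax(self.att(torch.cat([pool, expanded], dim=-1)), dim=1)
            context = (alpha * pool).sum(dim=1)  # attended visual context, (B, D)
            h, c = self.lstm(torch.cat([self.embed(words[:, t]), context], dim=-1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)  # (B, T, vocab_size)
```

As a smoke test, `StructureAttentionDecoder()(torch.randn(2, 36, 2048), torch.randint(0, 36, (10, 2)), torch.randint(0, 10000, (2, 12)))` returns a (2, 12, 10000) logit tensor, one vocabulary distribution per time step.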

Key words: image captioning, dependency syntax, image structure attention, content overlap, interpretability of deep learning model
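The testing-stage template retrieval via content overlap (see the abstract and the "content overlap" keyword) could be instantiated as below. Jaccard overlap over content-keyword sets is one plausible reading of the overlap measure, since the abstract does not give the exact formula; every function and variable name here is hypothetical rather than the paper's.

```python
# A minimal sketch of testing-stage dependency-syntax template retrieval:
# pick the training image whose content keywords overlap most with the
# test image's, and reuse the dependency template of its caption.

def content_overlap(a: set, b: set) -> float:
    """Jaccard overlap between two images' content-keyword sets
    (an assumed instantiation of the paper's content overlap measure)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def retrieve_template(test_keywords, training_items):
    """training_items: (keywords, dependency_template) pairs, where a template
    might be a dependency-relation sequence such as ['det', 'nsubj', 'root',
    'dobj'] parsed from a training caption."""
    keywords, template = max(
        training_items, key=lambda item: content_overlap(test_keywords, item[0]))
    return template

# Hypothetical usage: the best-matching template then guides generation.
items = [
    ({"man", "horse", "field"}, ["det", "nsubj", "root", "prep", "det", "pobj"]),
    ({"dog", "frisbee", "grass"}, ["det", "nsubj", "root", "det", "dobj"]),
]
print(retrieve_template({"man", "horse", "grass"}, items))
```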


