
Research on abstractive text summarization based on triplet information guidance

ZHANG Yunzuo, LI Yi

Citation: ZHANG Y Z, LI Y. Research on abstractive text summarization based on triplet information guidance[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(12): 3677-3685 (in Chinese). doi: 10.13700/j.bh.1001-5965.2022.0896


doi: 10.13700/j.bh.1001-5965.2022.0896
Corresponding author: E-mail: zhangyunzuo888@sina.com

  • CLC number: TP391.1

Research on abstractive text summarization based on triplet information guidance

Funds: National Natural Science Foundation of China (61702347,62027801); Natural Science Foundation of Hebei Province (F2022210007,F2017210161); Science and Technology Project of Hebei Education Department (ZD2022100); Central Guidance on Local Science and Technology Development Fund (226Z0501G); Shijiazhuang Tiedao University Graduate Innovation Funding Project (YC2022058)
  • Abstract:

    To address the insufficient use of factual information from the source text during decoding in current abstractive summarization models, a text summarization model guided by fact triples, SPOATS, is proposed. Built on the Transformer architecture, the model consists of a dual encoder with fact-extraction capability and a decoder that fuses factual features. An LTP-BiLSTM-GAT (LBiG) model is constructed, together with an optimal fact-triple selection algorithm, to extract the best fact triples from unstructured Chinese text and obtain feature representations of this factual information. An improved S-BERT model produces sentence-level vector representations of the source text, yielding semantically rich sentence encodings. An attention-based fact fusion mechanism then combines the dual-encoder features to strengthen the model's ability to select factual information at the decoding stage. Experimental results show that, on the LCSTS dataset, the proposed model improves the R1 score by 2.0% over the baseline model ERPG, a clear improvement in summary quality.
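The attention-based fact fusion described in the abstract can be sketched in a minimal, framework-free form: the decoder state attends separately over the sentence encodings and the fact-triple encodings, and the two contexts are mixed. This is an illustrative sketch under assumed shapes; the function names (`attend`, `fuse`), the scalar gate, and the plain dot-product scoring are hypothetical simplifications, not the actual SPOATS layers or parameters.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys):
    # Scaled dot-product attention: weight each key vector by its
    # similarity to the query, then return the weighted sum (context).
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]

def fuse(query, sentence_feats, fact_feats, gate=0.5):
    # Hypothetical fusion step: attend over sentence encodings and
    # fact-triple encodings separately, then mix the two contexts
    # with a scalar gate (a learned gate in a real model).
    c_sent = attend(query, sentence_feats)
    c_fact = attend(query, fact_feats)
    return [gate * s + (1.0 - gate) * f for s, f in zip(c_sent, c_fact)]
```

With `gate=1.0` the decoder sees only sentence features, with `gate=0.0` only fact features; intermediate values blend the two, which is the intuition behind letting the decoder select factual information at each step.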

     

  • Figure 1.  Task diagram of factual text summarization

    Figure 2.  Structure of SPOATS model

    Figure 3.  Structure of S-BERT model

    Figure 4.  Embedding of factual triples

    Table 1.  Information on the LCSTS dataset

    Dataset    Number of samples    Purpose
    PART I     2400591              Training set
    PART II    10666                Validation set
    PART III   1106                 Test set

    Table 2.  Experimental results of different models on the LCSTS dataset (%)

    Model        R1    R2    RL
    RNN[13]      19.8   8.4  16.8
    PGN[14]      29.3  17.0  24.9
    ERPG[15]     28.8  17.6  26.9
    BERT-Trans   29.1  16.7  26.2
    SPOATS       32.1  19.5  27.6
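R1, R2, and RL in Table 2 are ROUGE-1, ROUGE-2, and ROUGE-L scores[37]. As a minimal sketch of what ROUGE-N measures, recall over n-gram overlap with clipped counts can be computed as below; real evaluations typically report an F-measure and use a full ROUGE toolkit, so treat this as illustrative only.

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token sequence, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n):
    # ROUGE-N recall: fraction of reference n-grams that also appear
    # in the candidate, with counts clipped to the candidate's counts.
    ref_counts = Counter(ngrams(reference, n))
    cand_counts = Counter(ngrams(candidate, n))
    overlap = sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0
```

For example, comparing a candidate to a reference that differs in one word yields a ROUGE-1 recall just below 1.0 and a lower ROUGE-2 recall, since one substitution breaks two bigrams.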

    Table 3.  Results of case study

    Sample 1
      Reference summary: Experts suggest pensions should increase by 5% for each extra year of contributions
      RNN[13]: Sun Jie believes pensions increase by 5%
      PGN[14]: An NPC deputy believes that once pension contributions exceed 15 years, pensions increase by 5%
      ERPG[15]: Experts believe that for each extra year of pension contributions, pensions increase by 5%
      BERT-Trans: An NPC deputy believes pension insurance should increase by 5%
      SPOATS: Experts propose that after more than 15 years of contributions, pensions should increase by 5%

    Sample 2
      Reference summary: Ten design principles for wearable technology
      RNN[13]: This article summarizes ten design principles for wearable products, stressing the need to solve repetitive problems
      PGN[14]: Ten design principles for wearable technology released, focusing on human-centered design
      ERPG[15]: Wearable product design principles: attract attention without being contrived, and enhance user capability
      BERT-Trans: Ten design principles guide wearable technology, emphasizing user capability rather than replacing people
      SPOATS: This article sets out ten design principles for wearable technology, closely tied to enhancing user capability

    Sample 3
      Reference summary: A film breaks box-office records
      RNN[13]: A film breaks the XX-hundred-million box-office mark within days of release
      PGN[14]: A film sells strongly at the box office, breaking XX hundred million on release; its fine production wins praise
      ERPG[15]: A film breaks box-office records; its exciting plot and excellent acting win praise
      BERT-Trans: A film becomes one of this year's top earners, breaking the XX-hundred-million mark
      SPOATS: A film breaks XX hundred million at the box office; plot, cast, and production all praised by audiences and industry
  • [1] REINSEL D, GANTZ J, RYDNING J. Data age 2025[R]. Framingham: International Data Corporation, 2017.
    [2] WAZERY Y M, SALEH M E, ALHARBI A, et al. Abstractive Arabic text summarization based on deep learning[J]. Computational Intelligence and Neuroscience, 2022, 2022: 1566890.
    [3] DENG L, HU P, LI X H. Abstracting biomedical documents with knowledge enhancement[J]. Data Analysis and Knowledge Discovery, 2022, 6(11): 1-12(in Chinese).
    [4] YAN W Y, GUO J J, YU Z T, et al. Case propensity extract summarization based on case attribute perception[J]. Journal of Shanxi University (Natural Science Edition), 2021, 44(3): 445-453(in Chinese).
    [5] CAI Z X, SUN J W. News text summarization model integrating pointer network[J]. Journal of Chinese Computer Systems, 2021, 42(3): 462-466(in Chinese).
    [6] HUANG H N, CHEN Z M, XU C, et al. Automatic summarization model of aerospace news based on domain concept graph[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(1): 317-327(in Chinese).
    [7] WU H R, GUO W, DENG Y, et al. Review of semantic analysis techniques of agricultural texts[J]. Transactions of the Chinese Society for Agricultural Machinery, 2022, 53(5): 1-16(in Chinese). doi: 10.6041/j.issn.1000-1298.2022.05.001
    [8] LI J P, ZHANG C, CHEN X J, et al. Survey on automatic text summarization[J]. Journal of Computer Research and Development, 2021, 58(1): 1-21(in Chinese). doi: 10.7544/issn1000-1239.2021.20190785
    [9] ZHU Y Q, ZHAO P, ZHAO F F, et al. Survey on abstractive text summarization technologies based on deep learning[J]. Computer Engineering, 2021, 47(11): 11-21(in Chinese).
    [10] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL]. (2014-09-03)[2022-11-01].
    [11] SHI L, RUAN X M, WEI R B, et al. Abstractive summarization based on sequence to sequence models: A review[J]. Journal of the China Society for Scientific and Technical Information, 2019, 38(10): 1102-1116(in Chinese). doi: 10.3772/j.issn.1000-0135.2019.10.010
    [12] ALOMARI A, IDRIS N, SABRI A Q M, et al. Deep reinforcement and transfer learning for abstractive text summarization: A review[J]. Computer Speech & Language, 2022, 71: 101276.
    [13] HU B T, CHEN Q C, ZHU F Z. LCSTS: A large scale Chinese short text summarization dataset[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 1967-1972.
    [14] SEE A, LIU P J, MANNING C D. Get to the point: Summarization with pointer-generator networks[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2017: 1073-1083.
    [15] HUANG T, LU G, LI Z, et al. Entity relations based pointer-generator network for abstractive text summarization[C]//Proceedings of the International Conference on Advanced Data Mining and Applications. Berlin: Springer, 2022: 219-236.
    [16] RUSH A M, CHOPRA S, WESTON J. A neural attention model for abstractive sentence summarization[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 379-389.
    [17] CHOPRA S, AULI M, RUSH A M. Abstractive sentence summarization with attentive recurrent neural networks[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2016: 93-98.
    [18] FENG Z P, WANG Y. A Chinese text summarization model combining word segmentation and semantic awareness[J]. Computer Science and Application, 2021, 11(12): 2913-2923(in Chinese). doi: 10.12677/CSA.2021.1112295
    [19] WU S X, HUANG D G, LI J Y. Abstractive text summarization based on semantic alignment network[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2021, 57(1): 1-6(in Chinese).
    [20] ZHOU Q Y, YANG N, WEI F R, et al. Selective encoding for abstractive sentence summarization[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2017: 1095-1104.
    [21] XU R Y, ZENG B Q, HAN X L, et al. Convolutional self-attention encoding for reinforced automatic summarization model[J]. Journal of Chinese Computer Systems, 2020, 41(2): 271-277(in Chinese). doi: 10.3969/j.issn.1000-1220.2020.02.008
    [22] DENG W B, LI Y B, ZHANG Y M, et al. An abstractive text summarization method combining BERT and convolutional gating unit[J]. Control and Decision, 2023, 38(1): 152-160(in Chinese).
    [23] QIU D, YANG B. Text summarization based on multi-head self-attention mechanism and pointer network[J]. Complex & Intelligent Systems, 2022, 8(1): 555-567.
    [24] PAULUS R, XIONG C M, SOCHER R. A deep reinforced model for abstractive summarization[EB/OL]. (2017-11-13)[2022-11-01].
    [25] DANG H S, TAO Y F, ZHANG X D. Abstractive summarization model based on mixture attention and reinforcement learning[J]. Computer Engineering and Applications, 2020, 56(1): 185-190(in Chinese).
    [26] ZENG D J, TONG G W, DAI Y, et al. Keyphrase extraction for legal questions based on a sequence to sequence model[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(4): 256-261(in Chinese).
    [27] ZHAO S, YOU F C, CHANG W, et al. Augment BERT with average pooling layer for Chinese summary generation[J]. Journal of Intelligent & Fuzzy Systems, 2022, 42(3): 1859-1868.
    [28] CAO Z Q, WEI F R, LI W J, et al. Faithful to the original: Fact aware neural abstractive summarization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2018: 4784-4791.
    [29] FALKE T, RIBEIRO L F R, UTAMA P A, et al. Ranking generated summaries by correctness: an interesting but challenging application for natural language inference[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 2214-2220.
    [30] KRYSCINSKI W, MCCANN B, XIONG C M, et al. Evaluating the factual consistency of abstractive text summarization[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2020: 9332-9346.
    [31] KHANAM S A, LIU F, CHEN Y P P. Joint knowledge-powered topic level attention for a convolutional text summarization model[J]. Knowledge-Based Systems, 2021, 228: 107273. doi: 10.1016/j.knosys.2021.107273
    [32] GUNEL B, ZHU C G, ZENG M, et al. Mind the facts: Knowledge-boosted coherent abstractive text summarization[EB/OL]. (2020-06-27)[2022-11-01].
    [33] WANG L, YAO J L, TAO Y Z, et al. A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence. Freiburg: International Joint Conferences on Artificial Intelligence Organization, 2018: 4453-4460.
    [34] CHE W X, LI Z H, LIU T, et al. LTP: A Chinese language technology platform[C]//Proceedings of the 23rd International Conference on Computational Linguistics. New York: ACM, 2010: 13-16.
    [35] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the Advances in Neural Information Processing Systems. La Jolla: NIPS, 2017: 5998-6008.
    [36] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2019: 4171-4186.
    [37] LIN C Y. ROUGE: A package for automatic evaluation of summaries[C]//Proceedings of the ACL Workshop on Text Summarization Branches Out. Stroudsburg: ACL, 2004: 74-81.
Publication history
  • Received:  2022-11-03
  • Accepted:  2023-02-17
  • Published online:  2023-03-09
  • Issue date:  2024-12-31
