Volume 50 Issue 1
Jan.  2024
HUANG H N,CHEN Z M,XU C,et al. Automatic summarization model of aerospace news based on domain concept graph[J]. Journal of Beijing University of Aeronautics and Astronautics,2024,50(1):317-327 (in Chinese) doi: 10.13700/j.bh.1001-5965.2022.0233

Automatic summarization model of aerospace news based on domain concept graph

doi: 10.13700/j.bh.1001-5965.2022.0233
Funds:  National Natural Science Foundation of China (91738101); National Key R & D Program of China (2020YFB1807900)
More Information
  • Corresponding author: E-mail:chenzm@nssc.ac.cn
  • Received Date: 12 Apr 2022
  • Accepted Date: 25 Jun 2022
  • Available Online: 02 Sep 2022
  • Publish Date: 26 Aug 2022
  • The effectiveness of subsequent intelligence analysis can be improved by understanding and compressing the vast amount of aerospace information hidden in online aerospace news. However, general automatic summarization algorithms tend to ignore key domain information, and existing supervised summarization algorithms require extensive annotation of domain text, which is time-consuming and laborious. We therefore propose DCG-TextRank, an unsupervised automatic summarization model built on a domain concept graph, which uses domain terms to guide graph ranking and improve the model's understanding of domain text. The model has three modules: domain concept graph generation, graph weight initialization, and graph ranking with semantic filtering. First, the text is transformed into a domain concept graph containing sentence nodes and domain-term nodes, according to sentence-vector similarity and a domain term database. Second, the graph weights are initialized according to the features of aerospace news text. Finally, the TextRank algorithm ranks the sentences, and the semantic filtering module refines the TextRank output by clustering the graph nodes and setting a semantic-retention threshold for the summary, which preserves the semantic information of the text and reduces redundancy. The proposed model is portable across domains. Experimental results on an aerospace news dataset show that it performs 14.97% better than the conventional TextRank model and 4.37%–12.97% better than the supervised extractive summarization models BertSum and MatchSum.
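The ranking module above builds on the standard TextRank formulation. As a rough illustration only (not the paper's implementation — this sketch omits the domain-term nodes, sentence vectors, weight initialization, and clustering-based semantic filtering that DCG-TextRank adds), a minimal sentence-graph ranking with word-overlap similarity might look like:

```python
import math

def sentence_similarity(a, b):
    """Word-overlap similarity, as in the original TextRank paper.

    The +1 inside each log guards against log(1) = 0 for one-word sentences.
    """
    wa, wb = set(a.split()), set(b.split())
    overlap = len(wa & wb)
    if overlap == 0:
        return 0.0
    return overlap / (math.log(len(wa) + 1) + math.log(len(wb) + 1))

def textrank(sentences, damping=0.85, iters=50):
    """Score sentences by power iteration over a weighted similarity graph."""
    n = len(sentences)
    # Build the (symmetric) weighted graph; no self-loops.
    w = [[sentence_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    # Out-weight of each node; fall back to 1.0 for isolated nodes.
    out_sum = [sum(row) or 1.0 for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_sum[j] * scores[j] for j in range(n) if j != i
            )
            for i in range(n)
        ]
    return scores

# Usage: pick the highest-scoring sentences as the extractive summary.
sents = [
    "the satellite was launched into orbit",
    "the rocket carried the satellite into orbit",
    "engineers monitored the launch from the ground",
]
ranked = sorted(range(len(sents)), key=lambda i: textrank(sents)[i], reverse=True)
```

In DCG-TextRank, the graph would additionally contain domain-term nodes, and the initial weights would encode aerospace-news features rather than the uniform values used here.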

