Named entity recognition in nuclear power field based on ELMo-GCN

JING Xin, WANG Huafeng, LIU Qianfeng, LUO Siwu, ZHANG Fan

Citation: JING Xin, WANG Huafeng, LIU Qianfeng, et al. Named entity recognition in nuclear power field based on ELMo-GCN[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(12): 2556-2565. doi: 10.13700/j.bh.1001-5965.2021.0155 (in Chinese)


doi: 10.13700/j.bh.1001-5965.2021.0155
Funds:

Beijing Municipal Commission of Education Scientific Research Program KM202110009001

Scientific Research Program of Hebei Province 203777116D

Details
    Corresponding author: WANG Huafeng, E-mail: wanghuafeng@buaa.edu.cn

  • CLC number: TP391

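  • Abstract:

    Knowledge management in the nuclear power field relies on named entity recognition to extract high-quality semantic entities for the intelligent analysis and processing of nuclear power texts. Building on existing work, the proposed method strengthens the network's ability to extract contextual information and thereby improves recognition accuracy for nested named entities. Experiments show that it clearly outperforms existing methods in precision and recall: compared with the BiFlaG network, precision, recall, and F1 rise by 9.52, 8.51, and 9.02 percentage points, respectively, so the method is better suited to nested named entity recognition than BiFlaG and the other compared networks.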

     

  • Figure 1.  DTGCN network model

    Figure 2.  Proportion of nested named entities in nuclear power corpus

    Figure 3.  Example of nuclear power corpus named entity labeling

    Figure 4.  Structure of LSTM

    Figure 5.  Entity adjacent graph (G1)

    Figure 6.  Entity relationship graph (G2)

    Figure 7.  Comparison of nuclear power corpus entity types

    Table 1.  Labeling scheme, entity counts, and occurrence frequency in the nuclear power corpus

    Entity type            Begin tag  Inside/end tag  Entities  Occurrences
    Proper nouns           B-NOU      I-NOU                684         3548
    Cooling and coolants   B-COO      I-COO               1365         6894
    Fuel and materials     B-FUE      I-FUE                207         1198
    Reactors               B-REA      I-REA                251         2134
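The B-/I- tagging scheme of Table 1 can be illustrated with a minimal sketch. The function name `spans_to_bio`, the example tokens, and the `(start, end, type)` span format are assumptions made for illustration and are not taken from the paper. A single flat tag sequence like this can only encode non-overlapping entities, so the sketch rejects overlapping spans rather than guessing how nested entities are stored.

```python
from typing import List, Tuple

# Tag suffixes from Table 1; "O" marks tokens outside any entity.
ENTITY_TYPES = {"NOU", "COO", "FUE", "REA"}


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, type) spans into B-/I- tags."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError("overlapping spans cannot share one flat tag sequence")
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):     # remaining tokens, including the last
            tags[i] = f"I-{etype}"
    return tags


# Hypothetical tokenized phrase; the spans are made up for illustration.
tokens = ["pressurized", "water", "reactor", "uses", "light", "water"]
print(spans_to_bio(len(tokens), [(0, 3, "REA"), (4, 6, "COO")]))
# ['B-REA', 'I-REA', 'I-REA', 'O', 'B-COO', 'I-COO']
```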

    Table 2.  Entity type numbering

    Entity type            Tag     ID
    Non-entity             O       1
    Proper nouns           B-NOU   2
    Proper nouns           I-NOU   3
    Cooling and coolants   B-COO   4
    Cooling and coolants   I-COO   5
    Fuel and materials     B-FUE   6
    Fuel and materials     I-FUE   7
    Reactors               B-REA   8
    Reactors               I-REA   9
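The numbering in Table 2 amounts to a lookup table between tags and integer IDs. A minimal sketch follows; the IDs are copied from the table, while the helper names `encode` and `decode` are assumptions for illustration.

```python
from typing import Dict, List

# Tag-to-ID mapping copied from Table 2 (IDs start at 1).
TAG_TO_ID: Dict[str, int] = {
    "O": 1,
    "B-NOU": 2, "I-NOU": 3,
    "B-COO": 4, "I-COO": 5,
    "B-FUE": 6, "I-FUE": 7,
    "B-REA": 8, "I-REA": 9,
}
ID_TO_TAG: Dict[int, str] = {i: t for t, i in TAG_TO_ID.items()}


def encode(tags: List[str]) -> List[int]:
    """Map a tag sequence to the integer IDs of Table 2."""
    return [TAG_TO_ID[t] for t in tags]


def decode(ids: List[int]) -> List[str]:
    """Map integer IDs back to tags."""
    return [ID_TO_TAG[i] for i in ids]


ids = encode(["B-REA", "I-REA", "O", "B-COO"])
print(ids)          # [8, 9, 1, 4]
print(decode(ids))  # ['B-REA', 'I-REA', 'O', 'B-COO']
```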

    Table 3.  Comparison of model experimental results (%)

    Model            P      R      F1
    LSTM[12]         52.47  50.26  51.34
    BiLSTM+CRF[13]   60.63  70.63  65.25
    ELMo[14]         67.98  70.12  69.03
    BiFlaG[25]       71.73  73.81  72.76
    MGNER[24]        74.94  74.72  74.83
    DTGCN            81.25  82.32  81.78
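The F1 column in Table 3 is the harmonic mean of P and R, and the gains quoted in the abstract are absolute differences in percentage points between the DTGCN and BiFlaG rows. The short check below recomputes both from the table values; the function name `f1_score` is an assumption for illustration.

```python
def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (both in %)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


# (P, R, F1) rows copied from Table 3.
results = {
    "BiFlaG": (71.73, 73.81, 72.76),
    "MGNER": (74.94, 74.72, 74.83),
    "DTGCN": (81.25, 82.32, 81.78),
}

for name, (p, r, f1) in results.items():
    print(f"{name}: reported F1 = {f1}, recomputed F1 = {f1_score(p, r):.2f}")

# Absolute gains of DTGCN over BiFlaG, in percentage points.
gains = [round(d - b, 2) for d, b in zip(results["DTGCN"], results["BiFlaG"])]
print(gains)  # [9.52, 8.51, 9.02]
```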

    Table 4.  Comparison of classification results (%)

            DTGCN               BiFlaG[25]          MGNER[24]
    Type    P     R     F1      P     R     F1      P     R     F1
    NOU     90.7  76.3  82.9    73.4  69.2  71.2    83.8  65.5  73.5
    COO     82.7  90.8  86.6    72.3  81.8  76.7    73.5  84.1  78.4
    FUE     60.0  66.7  63.2    71.7  53.0  60.9    48.0  59.9  53.3
    REA     79.9  68.8  74.5    82.4  63.0  71.4    80.4  76.7  78.5
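Table 4 reports P, R, and F1 separately for each entity type. One common way to obtain such figures is exact-match evaluation over sets of `(start, end, type)` spans. The matching criterion used by the authors is not stated on this page, so the sketch below is an assumption, and the spans in it are made up. Because spans are compared as sets, nested (overlapping) entities need no special handling.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def per_type_prf(gold: Set[Span], pred: Set[Span]) -> Dict[str, Tuple[float, float, float]]:
    """Exact-match precision, recall, and F1 (in %) for each entity type."""
    counts = defaultdict(lambda: [0, 0, 0])  # [true positives, predicted, gold]
    for span in pred:
        counts[span[2]][1] += 1
        if span in gold:
            counts[span[2]][0] += 1
    for span in gold:
        counts[span[2]][2] += 1

    scores = {}
    for etype, (tp, n_pred, n_gold) in counts.items():
        p = 100.0 * tp / n_pred if n_pred else 0.0
        r = 100.0 * tp / n_gold if n_gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        scores[etype] = (p, r, f1)
    return scores


# Made-up spans; (0, 3, "REA") contains the nested entity (1, 2, "COO").
gold = {(0, 3, "REA"), (1, 2, "COO"), (4, 6, "COO")}
pred = {(0, 3, "REA"), (4, 6, "COO"), (7, 8, "COO")}
for etype, (p, r, f1) in sorted(per_type_prf(gold, pred).items()):
    print(f"{etype}: P={p:.1f} R={r:.1f} F1={f1:.1f}")
```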

    Table 5.  Ablation experiment results (%)

    No.    P      R      F1
           81.25  82.32  81.78
           74.53  77.47  75.97
           75.31  76.51  75.90
           76.46  79.10  77.76
           78.10  77.29  77.69
           78.61  79.67  79.13
           77.22  80.73  78.94
  • [1] WANG F Y, SUN Q, JIANG G J, et al. Nuclear energy 5.0: New formation and system architecture of nuclear power industry in the new IT era[J]. Acta Automatica Sinica, 2018, 44(5): 922-934 (in Chinese). https://www.cnki.com.cn/Article/CJFDTOTAL-MOTO201805015.htm
    [2] Institute of Resources and Environmental Policy, Development Research Center of the State Council. The energy development report of China (2020)[M]. Beijing: Petroleum Industry Press, 2020 (in Chinese).
    [3] KRIPKE S, DAVIDSON D, HARMAN G. Naming and necessity[M]. Cambridge: Harvard University Press, 1980: 253-355.
    [4] CHINCHOR N. MUC-6 named entity task definition[C]//Conference on Message Understanding, 1995.
    [5] FLEISCHMAN M. Automated subcategorization of named entities[C]//Association for Computational Linguistics, 2001.
    [6] LEE S, LEE G. Heuristic methods for reducing errors of geographic named entities learned by bootstrapping[C]//Natural Language Processing. Berlin: Springer, 2005, 3651: 658-669.
    [7] FLEISCHMAN M, HOVY E. Fine grained classification of named entities[C]//Proceedings of the 19th International Conference on Computational Linguistics, 2002: 1-7.
    [8] BODENREIDER O, ZWEIGENBAUM P. Identifying proper names in parallel medical terminologies[J]. Studies in Health Technology and Informatics, 2000, 77: 443-447.
    [9] SHEN D, ZHANG J, ZHOU G D, et al. Effective adaptation of a hidden Markov model-based named entity recognizer for biomedical domain[C]//Proceedings of the ACL 2003 Workshop on Natural Language Processing in Biomedicine, 2003, 13: 49-56.
    [10] GOYAL A, GUPTA V, KUMAR M. Recent named entity recognition and classification techniques: A systematic review[J]. Computer Science Review, 2018, 29: 21-43.
    [11] GOLLER C, KÜCHLER A. Learning task-dependent distributed representations by backpropagation through structure[C]//Proceedings of International Conference on Neural Networks. Piscataway: IEEE Press, 1996, 1: 347-352.
    [12] HAMMERTON J. Named entity recognition with long short-term memory[C]//Proceedings of the 7th Conference on Natural Language Learning, 2003, 4: 172-175.
    [13] HUANG Z H, XU W, YU K. Bidirectional LSTM-CRF models for sequence tagging[EB/OL]. (2015-08-09)[2021-03-01]. https://arxiv.org/abs/1508.01991.
    [14] PETERS M, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]//Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018, 1: 2227-2237.
    [15] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014, 2: 3104-3112.
    [16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017: 5998-6008.
    [17] DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, 1: 4171-4186.
    [18] KAZAMA J, MAKINO T, OHTA Y, et al. Tuning support vector machines for biomedical named entity recognition[C]//Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain, 2002, 3: 1-8.
    [19] TAULÉ M, MARTÍ M A, RECASENS M. AnCora: Multilingual and multilevel annotated corpora[C]//LREC 2008, 2008: 96-101.
    [20] DODDINGTON G R, MITCHELL A, PRZYBOCKI M A, et al. The automatic content extraction (ACE) program-tasks, data, and evaluation[C]//LREC 2004, 2004: 837-840.
    [21] FINKEL J R, MANNING C D. Nested named entity recognition[C]//Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009, 1: 141-150.
    [22] ALEX B, HADDOW B, GROVER C. Recognising nested named entities in biomedical text[C]//Proceedings of the Workshop on BioNLP 2007 Biological, Translational, and Clinical Language Processing, 2007: 65-72.
    [23] LU W, ROTH D. Joint mention extraction and classification with mention hypergraphs[C]//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015: 857-867.
    [24] XIA C Y, ZHANG C W, YANG T, et al. Multi-grained named entity recognition[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019: 1430-1440.
    [25] LUO Y, ZHAO H. Bipartite flat-graph network for nested named entity recognition[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020: 6408-6418.
    [26] YU P A, ZHU R A, YU Z W, et al. Nuclear reactor thermal analysis[M]. 3rd ed. Shanghai: Shanghai Jiao Tong University Press, 2002 (in Chinese).
    [27] XIE Z S, WU H C, ZHANG S H. Nuclear reactor physical analysis (Revised Edition)[M]. Xi'an: Xi'an Jiaotong University Press, 2002 (in Chinese).
    [28] LAFFERTY J D, MCCALLUM A, PEREIRA F C N. Conditional random fields: Probabilistic models for segmenting and labeling sequence data[C]//Proceedings of the 18th International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc, 2001: 282-289.
    [29] LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al. Neural architectures for named entity recognition[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016: 260-270.
    [30] MA X Z, HOVY E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF[C]//Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016, 1: 1064-1074.
    [31] YANG J, ZHANG Y. NCRF++: An open-source neural sequence labeling toolkit[C]//Proceedings of ACL 2018, 2018: 74-79.
    [32] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[EB/OL]. (2017-02-22)[2021-03-01]. https://arxiv.org/abs/1609.02907v3.
Publication history
  • Received:  2021-03-30
  • Accepted:  2021-05-31
  • Published online:  2021-07-13
  • Issue published:  2022-12-20
