Chinese named entity recognition for regional geological survey text

Fund Project:

This work was supported by the National Key Research and Development Program of China (No. 2022YFF0711601), the National Natural Science Foundation of China (No. 42050101), and the China Postdoctoral Science Foundation (No. 2021M702991).
    Abstract:

    Geological survey reports are among the most important data sources in China's geological survey field. They contain rich geoscience knowledge, descriptions of geological bodies, and other key information, so accurate, high-quality extraction of geological named entities provides the basis for geoscience knowledge graph construction, knowledge inference, and knowledge evolution. After describing the geological named entity recognition task, this paper analyses the domain characteristics that make it difficult: geological entities contain many specialized terms and are often nested or long. The authors propose a geological named entity recognition method that combines a lightweight pre-trained model (ALBERT), a bidirectional long short-term memory network (BiLSTM), and a conditional random field (CRF). ALBERT first models the contextual features of the input characters, BiLSTM then further encodes these contextual features, and the CRF finally predicts the tag sequence. Experimental results on the constructed geological named entity recognition dataset show that the proposed method achieves better extraction performance than mainstream named entity recognition models. The model can serve as a reference for domain entity recognition and provides strong methodological support for entity relation extraction and knowledge graph construction in the geoscience domain.
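    The final step of the pipeline described in the abstract, CRF-based tag sequence prediction, can be illustrated with a minimal sketch. This is not the authors' implementation: the ALBERT and BiLSTM stages are replaced here by hand-set per-character emission scores, and the `ROCK` label, the example sentence, the transition scores, and the helper names `viterbi_decode` and `bio_to_spans` are all assumptions made for illustration. The sketch shows how transition scores let the decoder reject an invalid BIO sequence that a per-character argmax would have produced.

```python
from typing import Dict, List, Tuple

Tag = str


def viterbi_decode(emissions: List[Dict[Tag, float]],
                   transitions: Dict[Tuple[Tag, Tag], float],
                   tags: List[Tag]) -> List[Tag]:
    """Return the highest-scoring tag path.

    emissions[i][t] is the score of tag t for character i (in the paper's
    pipeline these scores would come from the BiLSTM output; here they are
    hand-set). transitions[(a, b)] scores moving from tag a to tag b;
    missing pairs default to 0.0.
    """
    if not emissions:
        return []
    # best[t]: score of the best path that ends in tag t at the current position
    best = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[Tag, Tag]] = []
    for scores in emissions[1:]:
        new_best: Dict[Tag, float] = {}
        pointers: Dict[Tag, Tag] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def bio_to_spans(chars: str, tags: List[Tag]) -> List[Tuple[str, str]]:
    """Collect (entity text, label) pairs from a BIO tag sequence."""
    spans: List[Tuple[str, str]] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes a trailing entity
        if start is not None and not (tag.startswith("I-") and tag[2:] == label):
            spans.append((chars[start:i], label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical example: "花岗岩发育" ("granite is developed"), one tag per character.
TAGS = ["O", "B-ROCK", "I-ROCK"]
EMISSIONS = [
    {"O": 0.1, "B-ROCK": 2.0, "I-ROCK": 0.5},
    {"O": 0.3, "B-ROCK": 1.2, "I-ROCK": 1.0},  # argmax alone would pick B-ROCK here
    {"O": 0.2, "B-ROCK": 0.1, "I-ROCK": 1.5},
    {"O": 2.0, "B-ROCK": 0.1, "I-ROCK": 0.2},
    {"O": 2.0, "B-ROCK": 0.1, "I-ROCK": 0.2},
]
TRANSITIONS = {
    ("O", "I-ROCK"): -10.0,      # I- may not follow O in the BIO scheme
    ("B-ROCK", "I-ROCK"): 1.0,
    ("I-ROCK", "I-ROCK"): 0.5,
    ("B-ROCK", "B-ROCK"): -1.0,
}

path = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
entities = bio_to_spans("花岗岩发育", path)
print(path, entities)
```

    In a full model the emission scores would be learned by the ALBERT and BiLSTM layers and the transition scores by the CRF layer; long and nested entities, which the abstract names as the main difficulties, would need more than this flat BIO sketch handles.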

Cite this article

QIU Qinjun, TIAN Miao, MA Kai, XIE Zhong, JIN Xiangguo, DUAN Yuxi, TAO Liufeng. 2023. Chinese named entity recognition for regional geological survey text[J]. Geological Review, 69(1): 2023010005.

History
  • Received: 2022-11-23
  • Last revised: 2023-01-10
  • Published online: 2023-01-20