Joint extraction of named entities and relations from Chinese lithological description text based on the BERT-BiLSTM-CRF model
Authors: CHEN Zhongliang, YUAN Feng, LI Xiaohui, ZHANG Mingming

Fund Project:

This paper is an outcome of projects funded by the National Natural Science Foundation of China (grant nos. 41820104007, 42072321, 41872247).

    Abstract:

    Geological survey is moving from digitization towards intelligent processing. Under a big-data approach, this requires machine reading of unstructured data and automatic extraction of geological knowledge. Joint extraction of geoscience named entities and relations is both the core of this research and a current difficulty. This paper applies a BERT-BiLSTM-CRF method, built on BERT, a large-scale pre-trained Chinese language model, to the joint task of named entity recognition (NER) and relation extraction (RE) on a lithological description corpus. First, a sentence-level rock description corpus was built from the section survey and route geological observation data collected during digital geological mapping with the digital geological survey information system designed by the China Geological Survey (CGS). Second, guided by petrological theory, the composition of rock knowledge was analyzed, a schema of named entities and relations for a rock knowledge graph was designed, and the corpus was manually labeled. Third, deep learning training and comparative ablation experiments for knowledge extraction were carried out on the labeled corpus. The results show that the BERT model is well suited to knowledge extraction from rock description text. The recommended BERT-BiLSTM-CRF model achieved an F1 score of 91.75% on the joint extraction of rock named entities and relations, and an F1 score of 97.38% on rock named entity recognition. The ablation experiments show that the BERT-based embedding layer contributes markedly to the performance of rock knowledge extraction, and that the bidirectional long short-term memory layer (BiLSTM layer) improves the performance of joint entity and relation extraction.
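    The abstract reports its results as F1 scores but does not state the tagging scheme or the matching rule behind them. As a rough illustration only, the sketch below shows one common way such a score is computed for sequence labeling: BIO tags are decoded into entity spans, and a predicted span counts as correct only if its type and boundaries match a gold span exactly. The BIO scheme, the exact-match rule, the tag names `COLOR` and `ROCK`, and the helper names `bio_to_spans` and `span_f1` are all assumptions made for this example, not details taken from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode one BIO-tagged sequence into a set of entity spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # The current entity continues only on an I- tag of the same type.
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over all sentences."""
    gold_spans, pred_spans = set(), set()
    for k, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans from
        # different sentences never collide.
        gold_spans |= {(k, *s) for s in bio_to_spans(g)}
        pred_spans |= {(k, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the COLOR span is predicted correctly,
# the ROCK span is cut one token short and so counts as an error.
gold = [["B-COLOR", "I-COLOR", "B-ROCK", "I-ROCK", "I-ROCK", "O"]]
pred = [["B-COLOR", "I-COLOR", "B-ROCK", "I-ROCK", "O", "O"]]
print(span_f1(gold, pred))
```

    Under exact matching a span with a wrong boundary is penalized twice, once as a missed gold span and once as a spurious prediction, which is why boundary errors can pull F1 down noticeably even when most tokens are tagged correctly.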

Cite this article

CHEN Zhongliang, YUAN Feng, LI Xiaohui, ZHANG Mingming. 2022. Joint extraction of named entities and relations from Chinese lithological description text based on the BERT-BiLSTM-CRF model[J]. Geological Review, 68(2): 742-750.

History
  • Received: 2021-07-07
  • Last revised: 2022-01-05
  • Published online: 2022-03-19
  • Publication date: 2022-03-15