Abstract: Geological survey work is currently moving from digitization toward intelligent processing. From a big-data perspective, machine reading and the automatic extraction of geological knowledge from unstructured data deserve academic attention in the geosciences. The joint extraction of geological named entities and relations is central to this research, yet it has received little study. This paper proposes a BERT-BiLSTM-CRF model, built on the pre-trained Chinese language representation model BERT, to jointly perform geological named entity recognition (NER) and relation extraction (RE) on a lithological description corpus. First, a sentence-level corpus was collected from the profile and field geological observation data produced by the digital geological survey information system designed by the China Geological Survey (CGS). Second, based on petrological theory, a meta-graph of rock named entities and relations was designed, and the corpus was manually labeled. Third, comparison experiments on the geological knowledge extraction task were carried out on the labeled corpus. The experimental results show that the BERT model is applicable to the NER and RE tasks on the lithological description corpus. The proposed BERT-BiLSTM-CRF model achieved an F1 score of 91.75% on the joint extraction of lithological named entities and relations, and 97.38% on named entity recognition. The ablation experiments indicate that the BERT embedding layer has a prominent influence on the lithological knowledge extraction task, and that the BiLSTM layer improves the performance of joint entity and relation extraction.
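For readers unfamiliar with how an F1 score is typically computed for sequence labeling, the following is a minimal sketch of entity-level F1 over BIO-tagged sequences. It is illustrative only: the abstract does not state the tagging scheme or the exact matching criterion used in the experiments, so the BIO scheme, the strict span matching, the function names `bio_to_spans` and `entity_f1`, and the labels `ROCK` and `COLOR` are all assumptions made for this example.

```python
from typing import List, Set, Tuple

# A span is (start index, exclusive end index, entity label).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of labeled spans.

    Assumption: strict BIO. An "I-X" tag that does not continue an open
    span of type X is ignored rather than starting a new span.
    """
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel flushes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold_tags: List[str], pred_tags: List[str]) -> float:
    """Entity-level F1: a prediction counts only if span and label both match."""
    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two gold entities, one of which is predicted.
gold_example = ["B-ROCK", "I-ROCK", "O", "B-COLOR", "O"]
pred_example = ["B-ROCK", "I-ROCK", "O", "O", "O"]
demo_f1 = entity_f1(gold_example, pred_example)  # precision 1.0, recall 0.5
```

In this toy case precision is 1.0 and recall is 0.5, so F1 is 2/3. Scoring joint entity and relation extraction would additionally require matching relation triples, which this sketch does not cover.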