Abstract: Geological named entity recognition (NER) is the task of identifying geological entities in geological texts and assigning them to the correct geological concept categories. It is a key technology for constructing knowledge graphs in the geological domain. This research addresses two major challenges in geological NER: insufficient accuracy in recognizing complex entities and the high cost of sample annotation. We develop a geological entity recognition model, BERTwwm-BiLSTM-Attention-CRF, which significantly improves recognition accuracy for complex geological entities by incorporating an improved pre-training layer, BERTwwm, and adding a self-attention module. The model achieves a precision of 92.67%, a recall of 94.21%, and an F1-score of 93.29%. To reduce annotation costs and improve recognition accuracy on small-scale datasets, we optimize the model construction process by using model-assisted annotation to speed up dataset labeling. We also refine the Easy Data Augmentation (EDA) approach and expand the dataset using a geological dictionary, which reduces the manual annotation burden. Comparative experiments and ablation studies show that the proposed improvements enhance the effectiveness of geological entity recognition. The approach offers an efficient and economical solution for geological text analysis, supporting the construction of geological knowledge graphs and the intelligent processing of geological information.
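
The dictionary-based dataset expansion mentioned above can be illustrated with a minimal sketch. It is not the refined EDA procedure of this study, whose details are not given in the abstract. It assumes BIO-tagged token sequences and a hypothetical dictionary mapping entity types to alternative entity mentions; the function name `replace_entities` and the labels `ROCK` and `AGE` are illustrative only.

```python
import random


def replace_entities(tokens, tags, dictionary, rng):
    """Swap each BIO-tagged entity for a same-type dictionary entry.

    tokens and tags are parallel lists. dictionary maps an entity type
    (e.g. "ROCK") to a list of alternative mentions, each a token list.
    Entity types missing from the dictionary are left unchanged.
    """
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            etype = tag[2:]
            # Find the end of the span: B-<type> followed by I-<type> tags.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            # Ignore empty dictionary entries so labels stay aligned.
            candidates = [c for c in dictionary.get(etype, []) if c]
            entity = rng.choice(candidates) if candidates else tokens[i:j]
            new_tokens.extend(entity)
            # Re-label the replacement so tokens and tags stay parallel.
            new_tags.extend(["B-" + etype] + ["I-" + etype] * (len(entity) - 1))
            i = j
        else:
            new_tokens.append(tokens[i])
            new_tags.append(tag)
            i += 1
    return new_tokens, new_tags


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible augmentation
    tokens = ["The", "granite", "intrudes", "Jurassic", "sandstone"]
    tags = ["O", "B-ROCK", "O", "B-AGE", "B-ROCK"]
    geo_dict = {"ROCK": [["quartz", "diorite"], ["basalt"]]}
    print(replace_entities(tokens, tags, geo_dict, rng))
```

Because the labels are regenerated together with the replacement tokens, each augmented sentence is already annotated, which is how this kind of expansion can reduce manual labeling effort.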