Semi-supervised geological disasters named entity recognition using few labeled data
GeoInformatica (IF 2), Pub Date: 2022-10-18, DOI: 10.1007/s10707-022-00474-1
Xinya Lei, Weijing Song, Runyu Fan, Ruyi Feng, Lizhe Wang

Geological disaster Named Entity Recognition (NER) aims to recognize entities that reflect disaster event information in unstructured text, in order to construct a geohazard knowledge graph that can serve as a reference for disaster emergency response. Without training on large-scale labeled data, current deep-learning NER methods cannot identify specific geological disaster entities in geological disaster situation reports, yet manually labeling such reports is tedious and time-consuming. We therefore present Semi-GDNER, a semi-supervised geological disaster NER approach that can effectively extract six kinds of geological disaster entities when only a small amount of manually labeled data and some unlabeled in-domain data are available. The approach has two stages: (1) transferring the parameters of the pre-trained BERT-base model to the BERT layer of the backbone model, BERT-BiLSTM-CRF, and training the backbone model with the few labeled samples; (2) continuing to train the backbone model while expanding the training set with unlabeled data using a self-training (ST) strategy. To reduce noise in the second stage, only pseudo-labeled samples with high confidence are selected to join the training set in each ST iteration. Experiments on our constructed Geological Disaster NER dataset show that our approach achieves a higher F1 (0.88) than other NER approaches (five supervised NER approaches and a semi-supervised NER approach whose ST strategy expands the training set with all pseudo-labeled data), demonstrating its effectiveness. Furthermore, experiments on four general Chinese NER datasets show that the framework is transferable.
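The second stage described in the abstract, self-training that admits only high-confidence pseudo-labels, can be illustrated with a minimal sketch. This is not the authors' implementation: `ToyTagger` is a hypothetical per-token frequency tagger standing in for BERT-BiLSTM-CRF, the tag names (`B-TYPE`, `B-LOC`) and the `threshold` value are invented for illustration, sentence confidence is assumed to be the mean token confidence, and the toy model is refit from scratch each round rather than trained further.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[str]
Tags = List[str]
Example = Tuple[Sentence, Tags]


class ToyTagger:
    """Hypothetical stand-in for the backbone model (the paper uses BERT-BiLSTM-CRF)."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, data: List[Example]) -> None:
        # Refit from scratch: count how often each token carries each tag.
        self.counts = defaultdict(Counter)
        for tokens, tags in data:
            for tok, tag in zip(tokens, tags):
                self.counts[tok][tag] += 1

    def predict(self, tokens: Sentence) -> Tuple[Tags, float]:
        # Returns the predicted tags and a sentence-level confidence
        # (assumed here to be the mean of the per-token confidences).
        if not tokens:
            return [], 0.0
        tags: Tags = []
        confs: List[float] = []
        for tok in tokens:
            seen = self.counts.get(tok)
            if not seen:
                tags.append("O")      # unseen token: guess "outside"
                confs.append(0.5)     # and mark the guess as low confidence
            else:
                tag, n = seen.most_common(1)[0]
                tags.append(tag)
                confs.append(n / sum(seen.values()))
        return tags, sum(confs) / len(confs)


def self_train(
    model: ToyTagger,
    labeled: List[Example],
    unlabeled: List[Sentence],
    threshold: float = 0.9,
    max_iters: int = 5,
) -> Tuple[List[Example], List[Sentence]]:
    """Grow the training set with pseudo-labels whose confidence >= threshold."""
    train = list(labeled)
    pool = list(unlabeled)
    model.fit(train)
    for _ in range(max_iters):
        confident: List[Example] = []
        remaining: List[Sentence] = []
        for tokens in pool:
            tags, conf = model.predict(tokens)
            if conf >= threshold:
                confident.append((tokens, tags))
            else:
                remaining.append(tokens)
        if not confident:
            break  # nothing passed the filter, so further rounds change nothing
        train.extend(confident)
        pool = remaining
        model.fit(train)
    return train, pool


if __name__ == "__main__":
    labeled = [(["landslide", "in", "Wuhan"], ["B-TYPE", "O", "B-LOC"])]
    unlabeled = [["landslide", "in", "Wuhan"], ["earthquake", "near", "Chengdu"]]
    train, pool = self_train(ToyTagger(), labeled, unlabeled)
    print(len(train), "training examples;", len(pool), "still unlabeled")
```

Setting `threshold` to 0 admits every pseudo-labeled sentence, which corresponds to the all-pseudo-label ST baseline the abstract compares against.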




Updated: 2022-10-18