CHTopoNER model-based method for recognizing Chinese place names from social media information
Journal of Geographical Systems (IF 2.417), Pub Date: 2024-01-11, DOI: 10.1007/s10109-023-00433-w
Mengwei Zhang, Xingui Liu, Zheng Zhang, Yue Qiu, Zhipeng Jiang, Pengyu Zhang

Chinese toponym recognition is a key task in named entity recognition and has significant implications for improving geographic information systems. Because social media is real-time and rich in geographical data, identifying the Chinese toponyms it contains, including compound toponyms, informal toponyms, and other forms, is important for automatic geospatial information extraction. However, the strong word-building ability, diverse features, and ambiguity of Chinese toponyms, combined with the linguistic irregularities of social media, make it difficult to locate toponym boundaries accurately and to resolve ambiguities. Furthermore, existing Chinese toponym recognition methods often ignore the fusion of local and global features during feature extraction, which results in a loss of semantic information. To address these problems, we used the Chinese-roberta-wwm-ext pre-trained language model to encode the input text and obtain character-level information. An improved SoftLexicon-based statistical method was employed to acquire word-level semantic information, which was then integrated with the character-level semantic information. A two-channel neural network layer, comprising a bi-directional long short-term memory network and an inception-dilated convolutional neural network, was used to extract global and local features from the text. Additionally, a conditional random field was applied to enforce label constraints. The proposed deep neural network model, called CHTopoNER, is designed to identify various forms of Chinese toponyms in irregular Chinese social media content. Its effectiveness was validated on four publicly available annotated toponym datasets and a custom social media dataset. CHTopoNER surpasses state-of-the-art Chinese toponym recognition models and achieves promising results in extracting various types of toponyms and spatial location terms.
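
The word-level step in the abstract builds on SoftLexicon. As a rough illustration of the underlying idea only (the abstract does not describe the authors' improved variant, so nothing here should be read as their method), the sketch below groups the lexicon words that match a sentence into four sets per character: B (the character begins the word), M (it is in the middle), E (it ends the word), and S (it is a single-character word). The function name `softlexicon_sets`, the `max_word_len` parameter, and the toy lexicon are all hypothetical.

```python
def softlexicon_sets(sentence, lexicon, max_word_len=4):
    """For each character, collect the matched lexicon words by position.

    Returns one dict per character with keys "B", "M", "E", "S".
    This is a plain sketch of basic SoftLexicon matching, not the
    improved statistical method mentioned in the abstract.
    """
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring that starts here, up to max_word_len characters.
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)
                continue
            sets[start]["B"].add(word)      # first character of the word
            sets[end - 1]["E"].add(word)    # last character of the word
            for mid in range(start + 1, end - 1):
                sets[mid]["M"].add(word)    # interior characters
    return sets


# Toy lexicon (an assumption for illustration, not data from the paper).
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "桥"}
sets = softlexicon_sets("南京市长江大桥", lexicon)

for char, groups in zip("南京市长江大桥", sets):
    print(char, {k: sorted(v) for k, v in groups.items() if v})
```

In the original SoftLexicon formulation, each of the four sets is then pooled into a fixed-size vector (weighted by word frequency) and concatenated with the character representation. How CHTopoNER modifies that step is not stated in the abstract.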




Updated: 2024-01-12