A spatially-aware algorithm for location extraction from structured documents
GeoInformatica (IF 2), Pub Date: 2022-11-04, DOI: 10.1007/s10707-022-00482-1
Praval Sharma, Ashok Samal, Leen-Kiat Soh, Deepti Joshi

Place names help locate and distinguish the geographic spaces where human activities and natural phenomena occur. Extracting place names at multiple spatial resolutions from text is beneficial in several tasks, such as identifying the location of events, enriching gazetteers, and discovering connections between events and places. Most modern place name extraction approaches treat linguistic rules and lexical features as universal and ignore the patterns inherent in place names within specific geographic contexts. As a result, they lack the spatial awareness needed to identify place names effectively across different geographic contexts, especially lesser-known place names. In this research, we develop a novel Spatially-Aware Location Extraction (SALE) algorithm for place name extraction from structured documents that uses a hybrid approach comprising knowledge-driven and data-driven methods. We build a custom named entity recognition (NER) system based on the conditional random field (CRF) and train and fine-tune it using spatial features extracted from a dataset based on a given geographic region. SALE uses multiple pathways, including the spatially tuned NER, to improve the efficacy of place name extraction. Experimental results on a large geographic region show that our algorithm outperforms well-known state-of-the-art place name recognizers.
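
The abstract does not list the spatial features SALE uses, so the following is only a minimal sketch of how region-specific cues could be turned into per-token feature dictionaries, the input format that linear-chain CRF toolkits commonly accept. The gazetteer contents, the feature names, and the helper functions `token_features` and `sentence_features` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set

# Hypothetical regional gazetteer with made-up entries; a real one would be
# built from a dataset covering the target geographic region.
REGION_GAZETTEER: Set[str] = {"springfield", "riverton", "lakeview"}


def token_features(tokens: List[str], i: int, gazetteer: Set[str]) -> Dict[str, object]:
    """Build a feature dict for tokens[i] (illustrative features only)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_upper": tok.isupper(),
        "suffix3": tok[-3:].lower(),
        # Assumed "spatial" cue: is the token a known place name in the region?
        "in_gazetteer": tok.lower() in gazetteer,
    }
    if i > 0:
        prev = tokens[i - 1]
        feats["prev_lower"] = prev.lower()
        feats["prev_in_gazetteer"] = prev.lower() in gazetteer
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        nxt = tokens[i + 1]
        feats["next_lower"] = nxt.lower()
        feats["next_in_gazetteer"] = nxt.lower() in gazetteer
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str], gazetteer: Set[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i, gazetteer) for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["Floods", "hit", "Springfield", "yesterday"]
    for tok, feats in zip(sentence, sentence_features(sentence, REGION_GAZETTEER)):
        print(tok, feats["in_gazetteer"])
```

A list of such dictionaries per sentence, paired with a label sequence, is what a CRF trainer would typically consume. Swapping the gazetteer for one built from a different region is the sense in which this sketch is "spatially tuned".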




Updated: 2022-11-04