A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants

Journal of Web Semantics (IF 2.5), Pub Date: 2024-02-20, DOI: 10.1016/j.websem.2024.100815
Andreas Eibeck, Shaocong Zhang, Mei Qi Lim, Markus Kraft

Knowledge graphs store and link semantically annotated data about real-world entities from a variety of domains and on a large scale. The World Avatar is based on a dynamic decentralised knowledge graph and on semantic technologies to realise complex cross-domain scenarios. Accurate computational results for such scenarios require the availability of complete, high-quality data. This work focuses on instance matching, one of the subtasks of automatically populating the knowledge graph with data from a wide spectrum of external sources. Instance matching compares two data sets and seeks to identify instances (data, records) referring to the same real-world entity. We introduce AutoCal, a new instance matcher which does not require labelled data and runs out of the box for a wide range of domains without tuning method-specific parameters. AutoCal achieves results competitive with recently proposed unsupervised matchers from the field of Machine Learning. We also select an unsupervised state-of-the-art matcher from the field of Deep Learning for a thorough comparison. Our results show that neither AutoCal nor the state-of-the-art matcher is superior regarding matching quality, while AutoCal has only moderate hardware requirements and runs 2.7 to 60 times faster. In summary, AutoCal is particularly well suited to use in an automated environment. We present its prototypical integration into the World Avatar and apply AutoCal to the domain of power plants, which is relevant for practical environmental scenarios of the World Avatar.

Updated: 2024-02-20
