A study of the quality of Wikidata, Journal of Web Semantics

A study of the quality of Wikidata
Journal of Web Semantics (IF 2.5), Pub Date: 2021-12-05, DOI: 10.1016/j.websem.2021.100679
Kartik Shenoy₁, Filip Ilievski₁, Daniel Garijo₂, Daniel Schwabe₃, Pedro Szekely₁

Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indicators of data quality in Wikidata, based on: (1) community consensus on the currently recorded knowledge, assuming that statements that have been removed and not added back are implicitly agreed to be of low quality; (2) statements that have been deprecated; and (3) constraint violations in the data. We combine these indicators to detect low-quality statements, revealing challenges with duplicate entities, missing triples, violated type rules, and taxonomic distinctions. Our findings complement ongoing efforts by the Wikidata community to improve data quality, aiming to make it easier for users and editors to find and correct mistakes.

Last updated: 2021-12-16
