IndeGx: A model and a framework for indexing RDF knowledge graphs with SPARQL-based test suits
Journal of Web Semantics (IF 2.5), Pub Date: 2023-01-20, DOI: 10.1016/j.websem.2023.100775
Pierre Maillot, Olivier Corby, Catherine Faron, Fabien Gandon, Franck Michel

In recent years, a large number of RDF datasets have been built and published on the Web, both in fields as diverse as linguistics and life sciences and as general-purpose datasets such as DBpedia and Wikidata. The joint exploitation of these datasets requires specific knowledge about their content, access points, and commonalities. However, not all datasets contain a self-description, and not all access points can handle the complex queries used to generate such a description.

In this article, we provide a standards-based approach to generate the description of a dataset. The generated descriptions, as well as the process of their computation, are expressed using standard vocabularies and languages. We implemented our approach in a framework called IndeGx, where each indexing feature and its computation is collaboratively and declaratively defined in a GitHub repository. We tested IndeGx over 8 months on a set of 339 RDF datasets whose endpoints are listed in public catalogs. The results show that we can collect as many important characteristics of the datasets as their availability and capacities allow. The resulting index captures the commonalities, variety, and disparity in the offered content and services, and it provides important support to any application designed to query RDF datasets.
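
To make the idea of a query-derived, standards-based description concrete, here is a minimal Python sketch. It is not taken from IndeGx: the query, the function name describe_dataset, and the "#dataset" IRI convention are assumptions chosen for illustration, and the VoID vocabulary is only one plausible choice of standard vocabulary. The sketch shows how the result of a simple SPARQL count query could be turned into a Turtle description, and how a feature is simply omitted when the endpoint could not answer.

```python
from typing import Optional

# Hypothetical indexing feature: count the triples an endpoint exposes.
# This query is an illustrative assumption, not one of IndeGx's own rules.
COUNT_TRIPLES_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def describe_dataset(endpoint_url: str, triple_count: Optional[int]) -> str:
    """Build a small VoID description in Turtle for one endpoint.

    triple_count is the value returned by COUNT_TRIPLES_QUERY, or None
    if the endpoint timed out or refused the query.
    """
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        # The "#dataset" suffix is an arbitrary naming convention for this sketch.
        f"<{endpoint_url}#dataset> a void:Dataset ;",
    ]
    if triple_count is not None:
        # Only record the feature when the endpoint was able to compute it.
        lines.append(f'    void:triples "{triple_count}"^^xsd:integer ;')
    lines.append(f"    void:sparqlEndpoint <{endpoint_url}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_dataset("http://example.org/sparql", 42))
```

Running the count query itself is left out so that the sketch stays self-contained; in practice the count would come from sending COUNT_TRIPLES_QUERY to the endpoint with a SPARQL client.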




Updated: 2023-01-20