Scaling up knowledge graph creation to large and heterogeneous data sources
Journal of Web Semantics (IF 2.5), Pub Date: 2022-09-16, DOI: 10.1016/j.websem.2022.100755
Enrique Iglesias, Samaneh Jozashoori, Maria-Esther Vidal

RDF knowledge graphs (KGs) are powerful data structures for representing factual statements created from heterogeneous data sources. KG creation is laborious and requires data management techniques to run efficiently. This paper tackles the problem of automatically generating KG creation processes that are specified declaratively; it proposes techniques for planning and transforming heterogeneous data into RDF triples following mapping assertions specified in the RDF Mapping Language (RML). Given a set of mapping assertions, the planner provides an optimized execution plan by partitioning the assertions and scheduling their execution. First, the planner determines an optimized number of partitions, considering the number of data sources, the types of mapping assertions, and the associations between different assertions. After producing the list of partitions and the assertions that belong to each, the planner determines their execution order. A greedy algorithm generates a bushy tree execution plan over the partitions. Bushy tree plans are translated into operating system commands that guide the execution of the partitions of mapping assertions in the order indicated by the bushy tree. The proposed optimization approach is evaluated on state-of-the-art RML-compliant engines, using existing benchmarks of data sources and RML triples maps. The experimental results suggest that the performance of the studied engines can be considerably improved, particularly in complex settings with numerous triples maps and large data sources. As a result, with the planner, engines that time out in complex cases are able to produce at least a portion of the KG.
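The partition-then-schedule pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' planner: the abstract does not give the partitioning rule, the cost model, or the command format, so everything below is an assumption. Here, triples maps that share a data source or are linked by a join are assumed to belong to the same partition, the greedy step is assumed to repeatedly merge the two cheapest subplans, and `engine --maps` is a hypothetical placeholder command, not the CLI of any real RML engine.

```python
import heapq
from collections import defaultdict
from itertools import count


def partition_assertions(sources, joins):
    """Group triples maps into partitions (assumed rule, not from the paper).

    sources: dict mapping a triples map name to its data source name.
    joins:   list of (child_map, parent_map) pairs linked by a join.
    Maps that share a source or a join end up in the same partition.
    """
    parent = {m: m for m in sources}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    by_source = defaultdict(list)
    for m, s in sources.items():
        by_source[s].append(m)
    for maps in by_source.values():
        for other in maps[1:]:
            union(maps[0], other)
    for child, par in joins:
        union(child, par)

    groups = defaultdict(list)
    for m in sources:
        groups[find(m)].append(m)
    return sorted(sorted(g) for g in groups.values())


def greedy_bushy_plan(partitions, cost=len):
    """Greedily merge the two cheapest subplans until one tree remains.

    Leaves are ("leaf", maps); inner nodes are ("node", left, right).
    The cost function (partition size by default) is an assumption.
    Expects at least one partition.
    """
    tie = count()  # unique tie-breaker so trees are never compared
    heap = [(cost(p), next(tie), ("leaf", tuple(p))) for p in partitions]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next(tie), ("node", left, right)))
    return heap[0][2]


def to_command(plan, engine="engine"):
    """Render a plan as a shell-style command string (hypothetical format).

    Sibling subplans are started in the background and joined with `wait`.
    """
    if plan[0] == "leaf":
        return f"{engine} --maps {','.join(plan[1])}"
    left = to_command(plan[1], engine)
    right = to_command(plan[2], engine)
    return f"( {left} & {right} & wait )"


# Hypothetical input: six triples maps over five sources, one join.
example_sources = {
    "TM1": "genes.csv", "TM2": "genes.csv", "TM3": "drugs.csv",
    "TM4": "samples.csv", "TM5": "patients.csv", "TM6": "visits.csv",
}
example_joins = [("TM3", "TM4")]
example_partitions = partition_assertions(example_sources, example_joins)
example_plan = greedy_bushy_plan(example_partitions)
example_command = to_command(example_plan)
```

With this input, the two single-map partitions are merged first and the two larger partitions second, so the resulting tree has two inner nodes at the same level rather than a left-deep chain, which is the property that makes a plan bushy.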




Updated: 2022-09-16