Materialisation and data partitioning algorithms for distributed RDF systems
Journal of Web Semantics (IF 2.5), Pub Date: 2022-03-31, DOI: 10.1016/j.websem.2022.100711
Temitope Ajileye, Boris Motik

Many RDF systems support reasoning with Datalog rules via materialisation, where all conclusions of the RDF data and the rules are precomputed and explicitly stored in a preprocessing step. As the amount of RDF data used in applications keeps increasing, processing large datasets often requires distributing the data across a cluster of shared-nothing servers. While numerous distributed query answering techniques are known, distributed materialisation is less well understood. In this paper, we present several techniques that facilitate scalable materialisation in distributed RDF systems. First, we present a new distributed materialisation algorithm that aims to minimise communication and synchronisation in the cluster. Second, we present two new algorithms for partitioning RDF data, both of which aim to produce tightly connected partitions without loading complete datasets into memory. We evaluate our materialisation algorithm against two state-of-the-art distributed Datalog systems and show that our technique offers competitive performance, particularly when the rules are complex. Moreover, we analyse in depth the effects of data partitioning on reasoning performance and show that our techniques offer performance comparable or superior to that of state-of-the-art min-cut partitioning, while computing the partitions requires considerably less time and memory.
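
To make the notion of materialisation concrete, the following is a minimal single-machine sketch, not the distributed algorithm of the paper. It assumes triples are Python tuples of strings, variables are strings starting with `?`, and every head variable also occurs in the rule body; the function names and the example facts and rules are hypothetical. It repeatedly applies the rules until no new triples are derived, so that all conclusions are stored explicitly.

```python
def is_var(term):
    # Assumption: variables are written as strings starting with "?".
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None  # variable already bound to a different value
        elif p != t:
            return None  # constant mismatch
    return result


def evaluate_body(body, facts):
    """Return all variable bindings that satisfy every atom of a rule body."""
    bindings = [{}]
    for atom in body:
        extended = []
        for b in bindings:
            for f in facts:
                b2 = match(atom, f, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


def materialise(facts, rules):
    """Naive fixpoint: apply all rules until no new triples appear."""
    facts = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            for b in evaluate_body(body, facts):
                derived.add(tuple(b.get(t, t) for t in head))
        new = derived - facts
        if not new:
            return facts
        facts |= new


# Hypothetical example data: a small class hierarchy.
facts = {
    ("alice", "type", "Student"),
    ("Student", "subClassOf", "Person"),
    ("Person", "subClassOf", "Agent"),
}
rules = [
    # ?x type ?d  <-  ?x type ?c, ?c subClassOf ?d
    (("?x", "type", "?d"), [("?x", "type", "?c"), ("?c", "subClassOf", "?d")]),
    # ?c subClassOf ?e  <-  ?c subClassOf ?d, ?d subClassOf ?e
    (("?c", "subClassOf", "?e"), [("?c", "subClassOf", "?d"), ("?d", "subClassOf", "?e")]),
]
closure = materialise(facts, rules)
```

This sketch re-evaluates every rule over all facts in each round, which is simple but wasteful; practical systems typically use semi-naive evaluation, and a distributed setting additionally has to decide which server stores and joins which triples.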




Updated: 2022-03-31