Linking Entities across Relations and Graphs
ACM Transactions on Database Systems (IF 1.8), Pub Date: 2024-02-28, DOI: 10.1145/3639363
Wenfei Fan, Ping Lu, Kehan Pang, Ruochun Jin

This article proposes a notion of parametric simulation to link entities across a relational database 𝒟 and a graph G. Taking as parameters the functions and thresholds that measure vertex closeness, path associations, and important properties, parametric simulation identifies tuples t in 𝒟 and vertices v in G that refer to the same real-world entity, based on both topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. We show that parametric simulation can be computed in quadratic time by providing such an algorithm. Moreover, we develop an incremental algorithm for parametric simulation; we show that the incremental algorithm is bounded relative to its batch counterpart, i.e., it incurs the minimum cost for incrementalizing the batch algorithm. Putting these together, we develop HER, a parallel system that checks whether (t, v) is a match, finds all vertex matches of t in G, and computes all matches across 𝒟 and G, all in quadratic time; moreover, HER supports incremental computation of these in response to updates to 𝒟 and G. Using real-life and synthetic data, we empirically verify that HER is accurate, with an F-measure of 0.94 on average, and that it scales with the database 𝒟 and the graph G for both batch and incremental computations.
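The abstract does not spell out the definition of parametric simulation, so the following is only a rough illustration of the general idea of combining semantic closeness with topological matching. It is a minimal sketch of classic graph simulation with a label-similarity threshold, not the authors' algorithm: it ignores path associations, important properties, and learned parameters, and it makes no attempt to meet the quadratic-time bound. The encoding of a tuple as a small graph, the function names `label_sim` and `threshold_simulation`, the threshold `sigma`, and the sample data are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def label_sim(a: str, b: str) -> float:
    """Hypothetical closeness function: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def threshold_simulation(p_labels, p_edges, g_labels, g_edges, sigma):
    """Return the largest relation R of (u, v) pairs such that
    (1) label_sim(label(u), label(v)) >= sigma, and
    (2) every child of u is matched in R by some child of v.
    This is plain graph simulation with a similarity threshold,
    used here as a simplified stand-in for the notion in the abstract."""
    p_children = {u: set() for u in p_labels}
    for u, w in p_edges:
        p_children[u].add(w)
    g_children = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_children[v].add(w)

    # Semantic step: start from all pairs whose labels are close enough.
    rel = {
        (u, v)
        for u in p_labels
        for v in g_labels
        if label_sim(p_labels[u], g_labels[v]) >= sigma
    }

    # Topological step: drop pairs that violate the child condition
    # until nothing changes (naive fixpoint, not optimized).
    changed = True
    while changed:
        changed = False
        for u, v in list(rel):
            ok = all(
                any((uc, vc) in rel for vc in g_children[v])
                for uc in p_children[u]
            )
            if not ok:
                rel.discard((u, v))
                changed = True
    return rel


# Hypothetical tuple t = (name="Alan Turing", city="London"), encoded as a
# root vertex "t" with one child per attribute value.
p_labels = {"t": "Alan Turing", "t.city": "London"}
p_edges = [("t", "t.city")]

# Hypothetical graph G with two candidate vertices for t.
g_labels = {"v1": "Alan M. Turing", "v2": "London",
            "v3": "Alan Turing", "v4": "Paris"}
g_edges = [("v1", "v2"), ("v3", "v4")]

matches = threshold_simulation(p_labels, p_edges, g_labels, g_edges, sigma=0.8)
print(sorted(matches))  # [('t', 'v1'), ('t.city', 'v2')]
```

In this toy run, v3 has the exact same name as t but is discarded because its neighbor ("Paris") does not match the tuple's city, while v1 is kept despite the slightly different spelling. That is the intuition behind matching on both labels and structure.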




Updated: 2024-02-28