ACM Transactions on Database Systems (IF 1.8), Pub Date: 2024-02-28, DOI: 10.1145/3639363
Wenfei Fan, Ping Lu, Kehan Pang, Ruochun Jin
This article proposes a notion of parametric simulation to link entities across a relational database 𝒟 and a graph G. Taking as parameters the functions and thresholds that measure vertex closeness, path associations, and important properties, parametric simulation identifies tuples t in 𝒟 and vertices v in G that refer to the same real-world entity, based on both topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. We show that parametric simulation can be computed in quadratic time by providing such an algorithm. Moreover, we develop an incremental algorithm for parametric simulation; we show that the incremental algorithm is bounded relative to its batch counterpart, i.e., it incurs the minimum cost for incrementalizing the batch algorithm. Putting these together, we develop …
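To make the idea of threshold-based matching that is both semantic and topological more concrete, the following is a minimal sketch and not the algorithm from the article. It assumes a much simpler setting: each tuple and each vertex carries one string label, tuples reference other tuples directly, and an edge in G must match a reference one-to-one (the article's path associations are not modeled). The closeness function, the threshold `sigma`, and all names and data below are hypothetical stand-ins for the learned parameters.

```python
from difflib import SequenceMatcher


def closeness(a: str, b: str) -> float:
    """Hypothetical closeness score in [0, 1]; a stand-in for a learned function."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def approximate_simulation(tuples, graph, sigma=0.7):
    """Return pairs (t, v) that survive a simulation-style refinement.

    tuples: tuple id  -> {"label": str, "refs": [tuple ids it references]}
    graph:  vertex id -> {"label": str, "succ": [successor vertex ids]}
    sigma:  assumed closeness threshold (hypothetical value)
    """
    # Semantic step: keep pairs whose labels are close enough.
    pairs = {
        (t, v)
        for t, t_data in tuples.items()
        for v, v_data in graph.items()
        if closeness(t_data["label"], v_data["label"]) >= sigma
    }

    # Topological step: drop (t, v) if some tuple referenced by t has no
    # matching successor of v; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for t, v in list(pairs):
            for child in tuples[t]["refs"]:
                if not any((child, u) in pairs for u in graph[v]["succ"]):
                    pairs.discard((t, v))
                    changed = True
                    break
    return pairs


# Toy data, invented for illustration only.
tuples = {
    "t1": {"label": "Apple Inc.", "refs": ["t2"]},
    "t2": {"label": "Cupertino", "refs": []},
}
graph = {
    "v1": {"label": "Apple Inc", "succ": ["v2"]},
    "v2": {"label": "Cupertino, CA", "succ": []},
    "v3": {"label": "Apple Inc.", "succ": ["v4"]},
    "v4": {"label": "Seattle", "succ": []},
}

result = approximate_simulation(tuples, graph)
print(sorted(result))  # [('t1', 'v1'), ('t2', 'v2')]
```

In this toy run, v3 has exactly the same label as t1 but is rejected because its only successor does not match the city that t1 references. This is the kind of case where label similarity alone would produce a false link.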
Title: Linking Entities across Relations and Graphs