Representative and Back-In-Time Sampling from Real-World Hypergraphs
ACM Transactions on Knowledge Discovery from Data (IF 3.6), Pub Date: 2024-03-19, DOI: 10.1145/3653306
Minyoung Choe 1, Jaemin Yoo 2, Geon Lee 1, Woonsung Baek 2, U Kang 3, Kijung Shin 1

Graphs are widely used for representing pairwise interactions in complex systems. Since such real-world graphs are large and often ever-growing, sampling subgraphs is useful for various purposes, including simulation, visualization, stream processing, representation learning, and crawling. However, many complex systems consist of group interactions (e.g., collaborations of researchers and discussions on online Q&A platforms) and thus are represented more naturally and accurately by hypergraphs than by ordinary graphs. Motivated by the prevalence of large-scale hypergraphs, we study the problem of sampling from real-world hypergraphs, aiming to answer (Q1) how can we measure the goodness of sub-hypergraphs, and (Q2) how can we efficiently find a “good” sub-hypergraph. Regarding Q1, we distinguish between two goals: (a) representative sampling, which aims to capture the characteristics of the input hypergraph, and (b) back-in-time sampling, which aims to closely approximate a past snapshot of the input time-evolving hypergraph. To evaluate the similarity of the sampled sub-hypergraph to the target (i.e., the input hypergraph or its past snapshot), we consider 10 graph-level, hyperedge-level, and node-level statistics. Regarding Q2, we first conduct a thorough analysis of various intuitive approaches using 11 real-world hypergraphs. Then, based on this analysis, we propose MiDaS and MiDaS-B, designed for representative sampling and back-in-time sampling, respectively. Regarding representative sampling, we demonstrate through extensive experiments that MiDaS, which employs a sampling bias towards high-degree nodes in hyperedge selection, is (a) Representative: finding overall the most representative samples among 15 considered approaches, (b) Fast: several orders of magnitude faster than the strongest competitors, and (c) Automatic: automatically tuning the degree of sampling bias.
Regarding back-in-time sampling, we demonstrate that MiDaS-B inherits the strengths of MiDaS despite an additional challenge: the unavailability of the target (i.e., the past snapshot). It effectively handles this challenge by focusing on replicating universal evolutionary patterns, rather than directly replicating the target.
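
To make the idea of "a sampling bias towards high-degree nodes in hyperedge selection" concrete, the following Python sketch draws hyperedges with a tunable bias. It is only an illustration and should not be read as the authors' implementation: the abstract does not state the weighting rule, so the choice to weight each hyperedge by the minimum degree of its nodes raised to a power `alpha` is an assumption made here, as are the names `node_degrees`, `sample_hyperedges`, and `alpha`. The abstract says MiDaS tunes the degree of bias automatically; in this sketch `alpha` is set by hand.

```python
import random
from collections import Counter


def node_degrees(hyperedges):
    """Count how many hyperedges each node appears in."""
    degrees = Counter()
    for edge in hyperedges:
        degrees.update(set(edge))
    return degrees


def sample_hyperedges(hyperedges, k, alpha=1.0, seed=None):
    """Draw k distinct hyperedges, favoring those whose nodes all have high degree.

    Assumption (not stated in the abstract): a hyperedge's weight is
    (minimum degree among its nodes) ** alpha. With alpha = 0 every
    hyperedge has weight 1, which reduces to uniform sampling.
    Hyperedges are assumed to be non-empty.
    """
    if not 0 <= k <= len(hyperedges):
        raise ValueError("k must be between 0 and the number of hyperedges")
    rng = random.Random(seed)
    degrees = node_degrees(hyperedges)
    remaining = list(range(len(hyperedges)))
    weights = [min(degrees[v] for v in hyperedges[i]) ** alpha for i in remaining]
    chosen = []
    for _ in range(k):
        # Weighted draw without replacement: pick one position, then remove it.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(hyperedges[remaining.pop(pos)])
        weights.pop(pos)
    return chosen


# Toy hypergraph: each hyperedge is a set of node labels.
edges = [{"a", "b"}, {"a", "b", "c"}, {"a", "d"}, {"e", "f"}]
sample = sample_hyperedges(edges, k=2, alpha=2.0, seed=0)
```

A larger `alpha` concentrates the sample on hyperedges made up entirely of well-connected nodes, while `alpha = 0` recovers uniform hyperedge sampling. A sample drawn this way could then be compared with the input hypergraph on statistics such as the node degree distribution.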




Updated: 2024-03-19