A near-optimal approach to edge connectivity-based hierarchical graph decomposition
The VLDB Journal (IF 4.2), Pub Date: 2023-05-06, DOI: 10.1007/s00778-023-00797-x
Lijun Chang, Zhiyi Wang

The problem of efficiently computing all \(k\)-edge-connected components (\(k\)-ECCs) of a graph G for a user-given k has been extensively studied recently because of its importance in many applications. The \(k\)-ECCs of G for all possible values of k form a hierarchical structure; that is, any two different \(k\)-ECCs for the same k value are disjoint, and any \(k\)-ECC is contained in a unique \((k\text {-}1)\)-ECC. In this paper, we study the problem of efficiently constructing the hierarchy tree of the \(k\)-ECCs of a graph G for all possible k values. The existing approaches \(\textsf{TD}\) and \(\textsf{BU}\) construct the hierarchy tree in a top-down and a bottom-up manner, respectively, and both have time complexity \({{\mathcal {O}}}\big (\delta (G)\times {\mathsf {T_{KECC}}} (G)\big )\), where \(\delta (G)\) is the degeneracy of G and \({\mathsf {T_{KECC}}} (G)\) is the time complexity of computing all \(k\)-ECCs of G for a specific k value. Here, the degeneracy of G is defined as the maximum value among the minimum vertex degrees of all subgraphs of G and is at most \(\sqrt{m}\), where m is the number of edges in G. To improve the time complexity, we propose a divide-and-conquer approach \(\textsf{DC}\) that runs in \({{\mathcal {O}}}\big ( (\log \delta (G))\times {\mathsf {T_{KECC}}} (G)\big )\) time; this time complexity is optimal up to a logarithmic factor. However, a straightforward implementation of \(\textsf{DC}\) would take \({{\mathcal {O}}}( (m + n) \log \delta (G))\) main-memory space, where n is the number of vertices in G, and could easily run out of memory when processing large graphs. To reduce the main-memory footprint of our algorithm, we propose adjacency array-based techniques that reduce the space complexity to \(2m+{{\mathcal {O}}}(n\log \delta (G))\), and we denote the resulting algorithm by \(\mathsf {DC\text {-}AA}\).
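The definition of degeneracy above can be made concrete with a small sketch. It is not taken from the paper: it uses the standard peeling procedure (repeatedly remove a vertex of minimum remaining degree and record the largest degree seen at removal time), and the function name `degeneracy` and the edge-list input format are assumptions chosen for illustration.

```python
import heapq


def degeneracy(n, edges):
    """Degeneracy of an undirected simple graph on vertices 0..n-1.

    Illustrative sketch only: peels a minimum-degree vertex at each step
    and returns the largest degree observed at removal time.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # ignore self-loops; the set drops parallel edges
            adj[u].add(v)
            adj[v].add(u)

    deg = [len(a) for a in adj]
    heap = [(deg[v], v) for v in range(n)]
    heapq.heapify(heap)
    removed = [False] * n
    best = 0

    while heap:
        d, v = heapq.heappop(heap)
        if removed[v] or d != deg[v]:
            continue  # stale heap entry left over from an earlier degree
        removed[v] = True
        best = max(best, d)
        for w in adj[v]:
            if not removed[w]:
                deg[w] -= 1
                heapq.heappush(heap, (deg[w], w))
    return best


if __name__ == "__main__":
    # A triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
    print(degeneracy(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))
```

The lazy-deletion heap keeps the sketch short; a bucket-based implementation would bring the peeling down to linear time, which matters at the graph sizes the abstract discusses.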
As a by-product of \(\mathsf {DC\text {-}AA}\), we also improve the space complexity of the state-of-the-art algorithm for computing all \(k\)-ECCs for a specific k to \(2m + {{\mathcal {O}}}(n)\), using the same technique as in \(\mathsf {DC\text {-}AA}\). Finally, we propose optimization techniques to improve the practical efficiency of the existing approach \(\textsf{BU}\) and denote its space-optimized version by \(\mathsf {BU^*\text {-}AA}\), which runs in \({{\mathcal {O}}}\big (\delta (G)\times {\mathsf {T_{KECC}}} (G)\big )\) time and \(2m+{{\mathcal {O}}}(n)\) space. Extensive experiments on large real graphs and synthetic graphs demonstrate that our algorithms \(\mathsf {DC\text {-}AA}\) and \(\mathsf {BU^*\text {-}AA}\) outperform the state-of-the-art approaches by up to 28 times in running time and by up to 8 times in main-memory usage. In particular, our approach \(\mathsf {BU^*\text {-}AA}\) processes the Twitter graph, which has more than 1 billion undirected edges, in 29 minutes with 13.5 GB of memory, while the state-of-the-art approaches take more than 13 hours even with our space optimization and run out of memory without it. Our empirical study also shows that \(\mathsf {BU^*\text {-}AA}\), despite having a higher time complexity, performs better than \(\mathsf {DC\text {-}AA}\) in practice. We also remark that \(\mathsf {BU^*\text {-}AA}\) is much simpler and easier to implement than \(\mathsf {DC\text {-}AA}\).
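To illustrate where a \(2m\) term in the space bounds can come from, the following sketch builds a plain adjacency array (compressed sparse row layout) for an undirected graph: each edge is stored once per endpoint, giving a neighbor array of exactly \(2m\) entries plus an offset array of \(n+1\) entries. This shows only the basic layout under assumed names (`build_adjacency_array`, `neighbors_of`); it is not the paper's \(\mathsf {DC\text {-}AA}\) technique, whose details the abstract does not give.

```python
from array import array


def build_adjacency_array(n, edges):
    """Build (offsets, neighbors) for an undirected graph on vertices 0..n-1.

    Illustrative sketch only. offsets has n + 1 entries and neighbors has
    2 * len(edges) entries; the neighbors of v are stored in
    neighbors[offsets[v]:offsets[v + 1]].
    """
    offsets = array("q", [0]) * (n + 1)
    for u, v in edges:  # count the degree of each endpoint
        offsets[u + 1] += 1
        offsets[v + 1] += 1
    for i in range(n):  # prefix sums turn degrees into start positions
        offsets[i + 1] += offsets[i]

    neighbors = array("q", [0]) * (2 * len(edges))
    cursor = offsets[:n]  # copy: next free slot for each vertex
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def neighbors_of(offsets, neighbors, v):
    """Return the slice of the neighbor array that belongs to vertex v."""
    return neighbors[offsets[v]:offsets[v + 1]]


if __name__ == "__main__":
    offsets, neighbors = build_adjacency_array(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
    print(list(offsets))
    print(sorted(neighbors_of(offsets, neighbors, 2)))
```

Because every vertex owns one contiguous range of the neighbor array, a subgraph can in principle be represented by reordering entries within those ranges rather than by copying edges, which is the kind of saving an adjacency-array design makes possible.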




Updated: 2023-05-07