Abstract
Graph summarization has become integral to managing and analyzing large-scale graphs in diverse real-world applications, including social, biological, and communication networks. Existing summarization methods are often either too computationally expensive to apply to large graphs or unable to incorporate node attributes. In response, we introduce SsAG, an efficient and scalable lossy graph summarization method designed to preserve the essential structure of the original graph.
SsAG computes a sparse representation (summary) of an input graph G and accommodates graphs with node attributes. The summary is a graph on supernodes (subsets of the vertices of G) in which weighted superedges connect pairs of supernodes. SsAG constructs a summary graph with k supernodes that aims to minimize the reconstruction error (the difference between the original graph and the graph reconstructed from the summary) while maximizing homogeneity with respect to the node attributes. The summary is built by iteratively merging pairs of nodes.
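To make the notion of reconstruction error concrete, the following is a minimal sketch, not the authors' implementation. It assumes an undirected simple graph, a superedge weight equal to the number of original edges between two supernodes, a reconstructed (expected) adjacency equal to that weight divided by the number of possible node pairs, and an L1 error summed over unordered node pairs. The abstract does not state these details; the function name `reconstruction_error` and its parameters are hypothetical.

```python
from itertools import combinations


def reconstruction_error(n, edges, partition):
    """L1 reconstruction error of a summary (assumed definition, see lead-in).

    n         -- number of nodes, labelled 0..n-1
    edges     -- iterable of undirected edges (u, v) with u != v
    partition -- list of supernodes, each a list of node labels
    """
    edge_set = {frozenset(e) for e in edges}

    # Map every node to the index of the supernode that contains it.
    block = {}
    for i, part in enumerate(partition):
        for v in part:
            block[v] = i

    # Superedge weight: number of original edges between two supernodes
    # (or inside one supernode when both indices are equal).
    weight = {}
    for e in edge_set:
        u, v = tuple(e)
        key = tuple(sorted((block[u], block[v])))
        weight[key] = weight.get(key, 0) + 1

    sizes = [len(p) for p in partition]
    error = 0.0
    for u, v in combinations(range(n), 2):
        i, j = sorted((block[u], block[v]))
        # Number of node pairs the superedge (i, j) stands for.
        pairs = sizes[i] * (sizes[i] - 1) / 2 if i == j else sizes[i] * sizes[j]
        expected = weight.get((i, j), 0) / pairs
        actual = 1.0 if frozenset((u, v)) in edge_set else 0.0
        error += abs(actual - expected)
    return error


# Path graph 0-1-2-3 summarized with supernodes {0, 1} and {2, 3}.
path_edges = [(0, 1), (1, 2), (2, 3)]
coarse = reconstruction_error(4, path_edges, [[0, 1], [2, 3]])
exact = reconstruction_error(4, path_edges, [[0], [1], [2], [3]])
```

Under these assumptions, a summary in which every node is its own supernode reconstructs the graph exactly, and merging nodes trades a smaller summary for a larger error.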
To enhance computational efficiency, we derive a closed-form expression for the reconstruction error (RE) after merging a pair, which enables constant-time approximation of this score. We assign each supernode a weight that quantifies its contribution to the score of pairs, and we use a weighted sampling strategy to select the best pair for merging. Notably, a logarithmic-sized sample yields a summary of comparable quality according to various measures. Additionally, we propose a sparsification step that reduces the storage cost of the constructed summary to a specified target size with only a marginal increase in RE.
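The sampling idea can be sketched as follows. This is an illustration under stated assumptions rather than the SsAG procedure itself: the abstract does not give the weight formula or the score, so both are passed in by the caller, lower scores are assumed to be better, and the sample size is taken to be the base-2 logarithm of the number of supernodes. The name `pick_merge_pair` is hypothetical.

```python
import math
import random


def pick_merge_pair(weights, score, rng=None):
    """Pick a pair of supernodes to merge from a logarithmic-sized sample.

    weights -- dict mapping supernode id -> non-negative sampling weight
               (assumed given; at least two ids need a positive weight)
    score   -- callable score(u, v) -> float, lower is better (assumed)
    rng     -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    nodes = list(weights)
    if len(nodes) < 2:
        raise ValueError("need at least two supernodes to merge")

    # Logarithmic-sized sample of candidate pairs.
    sample_size = max(1, math.ceil(math.log2(len(nodes))))

    best_pair, best_score = None, math.inf
    for _ in range(sample_size):
        # Draw the first endpoint proportionally to its weight.
        (u,) = rng.choices(nodes, weights=[weights[x] for x in nodes], k=1)
        # Draw a distinct second endpoint the same way.
        others = [x for x in nodes if x != u]
        (v,) = rng.choices(others, weights=[weights[x] for x in others], k=1)
        s = score(u, v)
        if s < best_score:
            best_pair, best_score = (u, v), s
    return best_pair


# Toy usage: only "a" and "b" carry weight, so they form the chosen pair.
toy_weights = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
pair = pick_merge_pair(toy_weights, score=lambda u, v: 0.0, rng=random.Random(0))
```

Evaluating only a logarithmic number of candidate pairs per merge, each scored in constant time, is what keeps the per-merge cost low compared with scoring every pair.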
Empirical evaluations on diverse real-world graphs demonstrate that SsAG is up to 17× faster while generating summaries of comparable quality. This work addresses the computational challenges of graph summarization and demonstrates the effectiveness of SsAG.
Index Terms
- SsAG: Summarization and Sparsification of Attributed Graphs