research-article

SsAG: Summarization and Sparsification of Attributed Graphs

Published: 12 April 2024

Abstract

Graph summarization has become integral to managing and analyzing large-scale graphs in diverse real-world applications, including social networks, biological networks, and communication networks. Existing graph summarization methods are often either too computationally expensive to apply to large graphs or do not incorporate node attributes. In response, we introduce SsAG, an efficient and scalable lossy graph summarization method designed to preserve the essential structure of the original graph.

SsAG computes a sparse representation (summary) of an input graph G and accommodates graphs with node attributes. The summary is a graph on supernodes (subsets of the vertices of G), in which weighted superedges connect pairs of supernodes. SsAG constructs a summary graph with k supernodes, aiming to minimize the reconstruction error (the difference between the original graph and the graph reconstructed from the summary) while maximizing the homogeneity of supernodes with respect to node attributes. The summary is built by iteratively merging pairs of nodes.
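A minimal Python sketch of the summary structure and the reconstruction error follows. The abstract does not specify these details, so the sketch assumes a simple undirected graph, superedge weights equal to the edge density between two supernodes, and an RE equal to the sum of squared differences between the adjacency matrix and the reconstructed matrix over vertex pairs. Node attributes are ignored, and all function names are hypothetical. Under these assumptions, a superedge carrying e edges among P vertex pairs contributes e(P - e)/P, so RE can be computed from superedge counts alone; the sketch checks this against a brute-force reconstruction.

```python
from collections import Counter
from itertools import combinations


def possible_pairs(sizes, a, b):
    """Number of vertex pairs that superedge (a, b) stands for (no self-loops)."""
    return sizes[a] * (sizes[a] - 1) // 2 if a == b else sizes[a] * sizes[b]


def summarize(assign, edges):
    """Build a summary from a vertex -> supernode assignment.

    Assumption: superedge weight = edge density between the two supernodes.
    """
    sizes = Counter(assign)
    counts = Counter()
    for u, v in edges:
        counts[tuple(sorted((assign[u], assign[v])))] += 1
    weights = {key: e / possible_pairs(sizes, *key) for key, e in counts.items()}
    return sizes, counts, weights


def reconstruction_error_bruteforce(assign, edges, weights):
    """Squared error between A and the reconstructed matrix over pairs u < v."""
    edge_set = {frozenset(e) for e in edges}
    err = 0.0
    for u, v in combinations(range(len(assign)), 2):
        approx = weights.get(tuple(sorted((assign[u], assign[v]))), 0.0)
        actual = 1.0 if frozenset((u, v)) in edge_set else 0.0
        err += (actual - approx) ** 2
    return err


def reconstruction_error_closed_form(sizes, counts):
    """Same error summed per superedge: e edges among P pairs give e*(P-e)/P."""
    return sum(e * (possible_pairs(sizes, a, b) - e) / possible_pairs(sizes, a, b)
               for (a, b), e in counts.items())


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)]
assign = [0, 0, 0, 1, 1]  # supernodes {0, 1, 2} and {3, 4}
sizes, counts, weights = summarize(assign, edges)
re_closed = reconstruction_error_closed_form(sizes, counts)
re_brute = reconstruction_error_bruteforce(assign, edges, weights)
print(round(re_closed, 4), round(re_brute, 4))  # → 0.8333 0.8333
```

In this example only the superedge between the two supernodes is imperfect: it carries 1 edge among 6 pairs, so its error is 1 · 5/6.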

To improve computational efficiency, we derive a closed-form expression for the reconstruction error (RE) after merging a pair, which enables a constant-time approximation of this score. We assign each supernode a weight that quantifies its contribution to the scores of pairs, and use weighted sampling to select the best pair for merging. Notably, a sample of logarithmic size yields a summary of comparable quality under various measures. Additionally, we propose a sparsification step that reduces the storage cost of the constructed summary to a specified target size with only a marginal increase in RE.
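The merge loop can also be sketched in Python. This is not the paper's algorithm: the abstract gives neither the closed-form expression nor the supernode weights. The sketch assumes density-weighted superedges and a squared-error RE (a superedge with e edges among P pairs contributes e(P - e)/P). It computes the change in RE for a candidate pair exactly, in time proportional to the number of neighbouring supernodes rather than in constant time, and it uses a hypothetical sampling weight of one plus the edge counts on a supernode's superedges. The attribute-homogeneity term is omitted. `Summary`, `merge_delta`, and `greedy_summarize` are illustrative names.

```python
import math
import random


def block_error(e, possible):
    """Assumed squared-error RE of one superedge: e edges among `possible` pairs."""
    return e * (possible - e) / possible if possible > 0 else 0.0


def pairs_within(s):
    """Vertex pairs inside a supernode of size s (no self-loops)."""
    return s * (s - 1) // 2


class Summary:
    """Supernode sizes plus symmetric superedge edge counts (adj[a][b] = e_ab)."""

    def __init__(self, n, edges):
        # Every vertex starts as its own supernode.
        self.sizes = {v: 1 for v in range(n)}
        self.adj = {v: {} for v in range(n)}
        for u, v in edges:
            self.adj[u][v] = self.adj[u].get(v, 0) + 1
            self.adj[v][u] = self.adj[v].get(u, 0) + 1

    def total_error(self):
        err = 0.0
        for a, row in self.adj.items():
            for b, e in row.items():
                if a == b:
                    err += block_error(e, pairs_within(self.sizes[a]))
                elif a < b:  # count each between-supernode superedge once
                    err += block_error(e, self.sizes[a] * self.sizes[b])
        return err

    def merge_delta(self, a, b):
        """Exact change in RE if a and b are merged; cost grows with their neighbours."""
        sa, sb = self.sizes[a], self.sizes[b]
        before = after = 0.0
        for x in (set(self.adj[a]) | set(self.adj[b])) - {a, b}:
            eax, ebx = self.adj[a].get(x, 0), self.adj[b].get(x, 0)
            sx = self.sizes[x]
            before += block_error(eax, sa * sx) + block_error(ebx, sb * sx)
            after += block_error(eax + ebx, (sa + sb) * sx)
        eaa, ebb = self.adj[a].get(a, 0), self.adj[b].get(b, 0)
        eab = self.adj[a].get(b, 0)
        before += (block_error(eaa, pairs_within(sa)) + block_error(ebb, pairs_within(sb))
                   + block_error(eab, sa * sb))
        after += block_error(eaa + ebb + eab, pairs_within(sa + sb))
        return after - before

    def merge(self, a, b):
        """Merge supernode b into supernode a and update superedge counts."""
        for x, e in list(self.adj[b].items()):
            if x in (a, b):
                # a-b edges and b's internal edges become internal to a.
                self.adj[a][a] = self.adj[a].get(a, 0) + e
            else:
                self.adj[a][x] = self.adj[a].get(x, 0) + e
                self.adj[x][a] = self.adj[x].get(a, 0) + e
                del self.adj[x][b]
        self.adj[a].pop(b, None)
        del self.adj[b]
        self.sizes[a] += self.sizes.pop(b)

    def weight(self, a):
        # Hypothetical sampling weight: 1 + edge counts on a's superedges.
        return 1 + sum(self.adj[a].values())


def greedy_summarize(n, edges, k, seed=0):
    """Merge until k >= 1 supernodes remain, scoring only ~log2(#supernodes) sampled pairs."""
    rng = random.Random(seed)
    s = Summary(n, edges)
    while len(s.sizes) > k:
        nodes = list(s.sizes)
        weights = [s.weight(x) for x in nodes]
        best = None
        for _ in range(max(1, math.ceil(math.log2(len(nodes))))):
            a, b = rng.choices(nodes, weights=weights, k=2)
            if a != b:
                delta = s.merge_delta(a, b)
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        if best is None:  # every sampled pair was degenerate; pick one uniformly
            a, b = rng.sample(nodes, 2)
            best = (0.0, a, b)
        s.merge(best[1], best[2])
    return s


summary = greedy_summarize(5, [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)], k=2)
print(summary.sizes, round(summary.total_error(), 3))
```

Sampling keeps each iteration to about log2 of the current number of supernodes pair evaluations instead of scoring all pairs, which is the trade-off the abstract describes.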

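The sparsification step can be sketched under two assumptions that the abstract does not confirm: density-weighted superedges with a squared-error RE, and storage cost measured simply as the number of stored superedges. Dropping a superedge with e edges among P vertex pairs means those pairs are reconstructed as 0, which raises its error from e(P - e)/P to e, an increase of e^2/P. Because these increases are independent, keeping the `target` superedges with the largest e^2/P minimizes the increase in RE for that target under this model. `sparsify` is a hypothetical name, and this is a sketch, not the paper's procedure.

```python
import heapq


def sparsify(superedges, target):
    """Keep at most `target` superedges, dropping those whose removal raises RE least.

    `superedges` maps (a, b) -> (e, possible). Under the assumed squared-error RE
    with density weights, a kept superedge contributes e*(P-e)/P and a dropped one
    (all P pairs reconstructed as 0) contributes e, so dropping costs e*e/P.
    Returns the kept superedges and the total increase in RE.
    """
    cost = {key: e * e / p for key, (e, p) in superedges.items()}
    kept = set(heapq.nlargest(target, cost, key=cost.get))
    increase = sum(c for key, c in cost.items() if key not in kept)
    return {key: superedges[key] for key in kept}, increase


summary = {(0, 0): (3, 3), (1, 1): (1, 1), (0, 1): (1, 6)}
kept, increase = sparsify(summary, target=2)
print(sorted(kept), round(increase, 4))  # → [(0, 0), (1, 1)] 0.1667
```

Here the sparse superedge (1 edge among 6 pairs) is the cheapest to drop: its error grows only from 5/6 to 1.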
Empirical evaluations on diverse real-world graphs demonstrate that SsAG is up to 17× faster while generating summaries of comparable quality. This work represents a significant advance in the field: it addresses the computational challenges of graph summarization and demonstrates the effectiveness of SsAG for this task.

    Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 6
      July 2024
      535 pages
      ISSN: 1556-4681
      EISSN: 1556-472X
      DOI: 10.1145/3613684

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 April 2024
      • Online AM: 6 March 2024
      • Accepted: 3 March 2024
      • Revised: 3 January 2024
      • Received: 4 July 2023