An effective representation learning model for link prediction in heterogeneous information networks

  • Special Issue Article
Computing

Abstract

Heterogeneous Information Networks (HINs) consist of multiple types of nodes and edges and encode rich semantic information. Representing HINs in a low-dimensional feature space is challenging because of their complex structure and rich semantics. In this paper, we address link prediction and node classification by learning efficient low-dimensional feature representations of HINs. Metapath-guided random walkers have been studied extensively in the literature for learning such representations. However, the standard metapath walker does not control the length of its random walks, so the resulting embeddings capture structural and semantic information only weakly. In this work, we present an influence propagation controlled metapath-guided random walk model (called IPCMetapath2Vec) for representation learning in HINs. The model works in three phases. First, we perform node transitions to generate a metapath-guided random walk, where each transition is conditioned on two factors: (i) the type of the next node, which must match the metapath, and (ii) an influence propagation score computed for each node, with a threshold-based filter that detects potential influencers on the walk. Next, we feed the collected random walks into the skip-gram model to learn a feature representation for each node. Lastly, we employ an attention mechanism that aggregates each node's representations learned from different semantic metapath-guided walks, preserving the relative importance of the different semantics. We use these network representations to address link prediction and multi-label node classification. Experimental results on two public HIN datasets, DBLP and IMDB, show that our model outperforms state-of-the-art representation learning models such as DeepWalk, Node2vec, Metapath2Vec, and HIN2Vec by 4.5% to 17.2% in micro-F1 score for multi-label node classification and by 4% to 14.5% in AUC-ROC score for link prediction.
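The walk-generation phase described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not say how the influence propagation score is computed, so scores are passed in as a precomputed dictionary; neighbours scoring below the threshold are filtered out; and the walk ends early when no neighbour of the required type survives the filter, which is one plausible way influence could control walk length. The names `ipc_metapath_walk`, `influence` and `threshold` are hypothetical.

```python
import random
from typing import Dict, List, Optional


def ipc_metapath_walk(
    adjacency: Dict[str, List[str]],
    node_type: Dict[str, str],
    influence: Dict[str, float],
    metapath: List[str],
    start: str,
    max_length: int,
    threshold: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Metapath-guided random walk with an ASSUMED influence-threshold filter.

    `metapath` must be symmetric, e.g. ["A", "P", "A"], so its type pattern
    can repeat (A -> P -> A -> P -> ...). `influence` holds hypothetical
    precomputed per-node scores.
    """
    if rng is None:
        rng = random.Random()
    if len(metapath) < 2 or metapath[0] != metapath[-1] or node_type[start] != metapath[0]:
        raise ValueError("metapath must be symmetric and match the start node type")
    cycle = metapath[:-1]  # repeating unit, e.g. ["A", "P"]
    walk = [start]
    while len(walk) < max_length:
        next_type = cycle[len(walk) % len(cycle)]
        candidates = [
            v
            for v in adjacency.get(walk[-1], [])
            # (i) type mapping: the neighbour's type must follow the metapath
            if node_type[v] == next_type
            # (ii) assumed influence filter: drop low-influence neighbours
            and influence.get(v, 0.0) >= threshold
        ]
        if not candidates:
            break  # no influential neighbour of the right type: stop the walk early
        walk.append(rng.choice(candidates))
    return walk
```

On a toy author-paper graph with metapath `["A", "P", "A"]`, a paper whose score is below `threshold` is never visited, and a threshold above every paper's score yields a walk containing only the start node.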
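In the second phase, the skip-gram model treats each collected walk like a sentence and each node like a word. The dependency-free sketch below is for illustration only: it builds (center, context) pairs within a window and trains skip-gram with negative sampling, but it draws negatives uniformly, whereas word2vec samples them from a smoothed unigram distribution. The function names and hyperparameter defaults are hypothetical; in practice a library implementation, for example gensim's `Word2Vec` with `sg=1`, would normally be used.

```python
import math
import random
from typing import Dict, List, Tuple


def skipgram_pairs(walks: List[List[str]], window: int) -> List[Tuple[str, str]]:
    """All (center, context) pairs at most `window` positions apart in a walk."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs


def train_sgns(
    walks: List[List[str]],
    dim: int = 8,
    window: int = 2,
    negatives: int = 2,
    lr: float = 0.05,
    epochs: int = 50,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Tiny skip-gram with (uniform) negative sampling; one vector per node."""
    rng = random.Random(seed)
    vocab = sorted({v for walk in walks for v in walk})
    emb = {v: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for v in vocab}
    ctx = {v: [0.0] * dim for v in vocab}  # separate "output" context vectors
    pairs = skipgram_pairs(walks, window)

    def sigmoid(x: float) -> float:
        x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            # one observed context (label 1) plus uniformly drawn negatives (label 0)
            targets = [(context, 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(negatives)]
            grad = [0.0] * dim
            for target, label in targets:
                score = sigmoid(sum(a * b for a, b in zip(emb[center], ctx[target])))
                g = lr * (label - score)
                for k in range(dim):
                    grad[k] += g * ctx[target][k]  # accumulate the center-vector update
                    ctx[target][k] += g * emb[center][k]  # update the context vector
            for k in range(dim):
                emb[center][k] += grad[k]
    return emb
```

Keeping separate center and context vectors mirrors word2vec; only the center vectors are returned as node features.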
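The third phase fuses the embeddings a node receives from different metapaths. The abstract does not give the exact attention formulation, so the sketch below assumes a simple semantic-attention form: each metapath-specific embedding z_m gets a score q · tanh(z_m), the scores are normalised with a softmax over metapaths, and the fused embedding is the weighted sum. In a trained model the query vector q (and usually a projection matrix) would be learned; here `query` is simply passed in. The names `attend_metapaths` and `query` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_metapaths(
    per_metapath: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse one node's embeddings from several metapaths (ASSUMED attention form).

    per_metapath[m] is the node's embedding learned from walks of metapath m;
    `query` stands in for a learned attention vector.
    """
    scores = [sum(q * math.tanh(x) for q, x in zip(query, z)) for z in per_metapath]
    weights = softmax(scores)  # one importance weight per metapath
    dim = len(per_metapath[0])
    fused = [sum(w * z[k] for w, z in zip(weights, per_metapath)) for k in range(dim)]
    return fused, weights
```

With two metapath embeddings `[1.0, 0.0]` and `[0.0, 1.0]` and `query=[1.0, 1.0]`, both scores are equal, so each metapath receives weight 0.5 and the fused vector is `[0.5, 0.5]`.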
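The two metrics reported in the abstract can be made concrete with a short sketch. It scores a candidate link by the dot product of the two node embeddings (an assumption, since the abstract does not state the scoring operator), computes AUC-ROC as the fraction of (positive edge, negative edge) pairs in which the positive edge scores higher, with ties counting one half, and computes micro-F1 by pooling true positives, false positives and false negatives over all labels, i.e. micro-F1 = 2TP / (2TP + FP + FN). Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def link_score(emb: Dict[str, List[float]], u: str, v: str) -> float:
    """ASSUMED link score: dot product of the two node embeddings."""
    return sum(a * b for a, b in zip(emb[u], emb[v]))


def auc_roc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def micro_f1(true_labels: Sequence[Set[str]], pred_labels: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label classification: 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)
        fp += len(pred - truth)
        fn += len(truth - pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, `auc_roc([0.9, 0.8], [0.1, 0.8])` returns 3.5 / 4 = 0.875: three of the four positive/negative pairs are ordered correctly and one is tied. The pairwise loop is quadratic and meant only to show the definition.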

Figs. 1–9; Algorithms 1–4

References

  1. Shi C, Li Y, Zhang J, Sun Y, Yu PS (2016) A survey of heterogeneous information network analysis. IEEE Trans Knowl Data Eng 29(1):17–37

  2. Zhao H, Rui P, Chen J, Zhang Y, Wang Y, Zhao S, Tang J (2023) Hinchip: heterogeneous information network representation with community hierarchy preserving. Knowl Based Syst 264:110343

  3. Fu TY, Lee WC, Lei Z (2017) Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning. In: Proceedings of the 2017 ACM on conference on information and knowledge management, pp 1797–1806

  4. Wu H, Song C, Ge Y, Ge T (2022) Link prediction on complex networks: an experimental survey. Data Sci Eng 7(3):253–278

  5. Jacob Y, Denoyer L, Gallinari P (2014) Learning latent representations of nodes for classifying in heterogeneous social networks. In: Proceedings of the 7th ACM international conference on web search and data mining, pp 373–382

  6. Tan Q, Liu N, Hu X (2019) Deep representation learning for social network analysis. Front Big Data 2:2

  7. Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the Fourth ACM international conference on web search and data mining, pp 635–644

  8. Butun E, Kaya M, Alhajj R (2016) A new topological metric for link prediction in directed, weighted and temporal networks. In: 2016 IEEE/ACM International conference on advances in social networks analysis and mining (ASONAM), pp 954–959

  9. Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 701–710

  10. Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) Line: Large-scale information network embedding. In: Proceedings of the 24th international conference on world wide web, pp 1067–1077

  11. Grover A, Leskovec J (2016) node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 855–864

  12. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781

  13. Dong Y, Chawla NV, Swami A (2017) metapath2vec: Scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp 135–144

  14. Pham P, Do P (2019) W-metapath2vec: the topic-driven meta-path-based model for large-scaled content-based heterogeneous information network representation learning. Expert Syst Appl 123:328–344

  15. Cao X, Zheng Y, Shi C, Li J, Wu B (2017) Meta-path-based link prediction in schema-rich heterogeneous information network. Int J Data Sci Anal 3:285–296

  16. Sun Y, Han J (2013) Meta-path-based search and mining in heterogeneous information networks. Tsinghua Sci Technol 18(4):329–338

  17. Berahmand K, Nasiri E, Li Y (2021) Spectral clustering on protein-protein interaction networks via constructing affinity matrix using attributed graph embedding. Comput Biol Med 138:104933

  18. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828

  19. Singh R, Agarwal P, Bhattacharya M (2016) MR brain tumor detection employing Laplacian eigen maps and kernel support vector machine. In: 2016 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 827–830

  20. Cao S, Lu W, Xu Q (2015) Grarep: Learning graph representations with global structural information. In: Proceedings of the 24th ACM international on conference on information and knowledge management, pp 891–900

  21. Cui P, Wang X, Pei J, Zhu W (2018) A survey on network embedding. IEEE Trans Knowl Data Eng 31(5):833–852

  22. Yuliansyah H, Othman ZA, Bakar AA (2022) Extending Adamic Adar for cold-start problem in link prediction based on network metrics. Int J Adv Intell Inform 8(3):271–284

  23. Palau J, Montaner M, Lopez B, De La Rosa JL (2004) Collaboration analysis in recommender systems using social networks. In: CIA, vol. 3191, pp 137–151

  24. Bonhard P, Sasse MA (2006) Knowing me, knowing you-using profiles and social networking to improve recommender systems. BT Technol J 24(3):84–98

  25. Kong X, Shi Y, Yu S, Liu J, Xia F (2019) Academic social networks: modeling, analysis, mining and applications. J Netw Comput Appl 132:86–103

  26. Stroele V, Zimbrao G, Souza JM (2012) Modeling, mining and analysis of multi-relational scientific social network. J Univ Comput Sci 18(8):1048–1068

  27. Rossi RA, Ahmed NK (2014) Role discovery in networks. IEEE Trans Knowl Data Eng 27(4):1112–1131

  28. Wang X, Chai Y, Li H, Wu D (2021) Link prediction in heterogeneous information networks: an improved deep graph convolution approach. Decis Support Syst 141:113448

  29. Zhao Z, Gou Z, Du Y, Ma J, Li T, Zhang R (2022) A novel link prediction algorithm based on inductive matrix completion. Expert Syst Appl 188:116033

  30. Samorodnitsky G, Resnick S, Towsley D, Davis R, Willis A, Wan P (2016) Nonstandard regular variation of in-degree and out-degree in the preferential attachment model. J Appl Probab 53(1):146–161

  31. Tan L, Zhu Z, Ge F, Xiong N (2015) Utility maximization resource allocation in wireless networks: methods and algorithms. IEEE Trans Syst Man Cybern Syst 45(7):1018–1034

  32. Eirinaki M, Vazirgiannis M (2005) Usage-based pagerank for web personalization. In: Fifth IEEE international conference on data mining (ICDM’05), p 8

  33. Yang Y, Chawla N, Sun Y, Han J (2012) Predicting links in multi-relational and heterogeneous networks. In: 2012 IEEE 12th international conference on data mining, pp 755–764

  34. Zheng W, Zou L, Feng Y, Chen L, Zhao D (2013) Efficient simrank-based similarity join over large graphs. Proc VLDB Endow 6(7):493–504

  35. Guthrie TD, Benadjaoud YY, Chavez RS (2022) Social relationship strength modulates the similarity of brain-to-brain representations of group members. Cereb Cortex 32(11):2469–2477

  36. Zhang J, Yu PS, Zhou ZH (2014) Meta-path based multi-network collective link prediction. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1286–1295

  37. Liang H, Markchom T (2022) Tne: a general time-aware network representation learning framework for temporal applications. Knowl-Based Syst 240:108050

  38. Meng C, Cheng R, Maniu S, Senellart P, Zhang W (2015) Discovering meta-paths in large heterogeneous information networks. In: Proceedings of the 24th international conference on world wide web, pp 754–764

Author information

Corresponding authors

Correspondence to Vishnu Kumar or P. Radha Krishna.

Ethics declarations

Conflict of interest

The authors confirm that they do not have known competing interests of any kind that could influence the work reported in this paper.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kumar, V., Krishna, P.R. An effective representation learning model for link prediction in heterogeneous information networks. Computing (2023). https://doi.org/10.1007/s00607-023-01238-x

  • DOI: https://doi.org/10.1007/s00607-023-01238-x
