Abstract
Heterogeneous Information Networks (HINs) consist of multiple categories of nodes and edges and encompass rich semantic information. Representing HINs in a low-dimensional feature space is challenging due to their complex structure and rich semantics. In this paper, we focus on link prediction and node classification by learning efficient low-dimensional feature representations of HINs. Metapath-guided walkers have been extensively studied in the literature for learning feature representations. However, the metapath walker does not control the length of random walks, resulting in embeddings that capture structural and semantic information only weakly. In this work, we present an influence-propagation-controlled metapath-guided random walk model (called IPCMetapath2Vec) for representation learning in HINs. The model works in three phases. First, we perform node transitions to generate a metapath-guided random walk, conditioned on two factors: (i) the type of the next node, as dictated by the metapath, and (ii) an influence propagation score computed for each node, with a threshold-based filter used to detect potential influencers on the walk. Next, we provide the collected random walks as input to the skip-gram model to learn each node’s feature representation. Lastly, we employ an attention mechanism that aggregates the learned feature representations of each node from various semantic metapath-guided walks, preserving the importance of different semantics. We use these network representation features to address link prediction and multi-label node classification tasks. Experimental results on two public HIN datasets, namely DBLP and IMDB, show that our model outperforms state-of-the-art representation learning models such as DeepWalk, Node2vec, Metapath2Vec, and HIN2Vec by 4.5% to 17.2% in terms of micro-F1 score for multi-label node classification and by 4% to 14.5% in terms of AUC-ROC score for link prediction.
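The first phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the influence propagation score, so the code assumes a hypothetical stand-in (degree normalized by the maximum degree), and it assumes that a walk simply stops when no neighbor of the required type passes the threshold. The names `metapath_walk`, `degree_influence`, and the toy author-paper graph are all illustrative.

```python
import random

def degree_influence(adj):
    """Hypothetical influence score: node degree divided by the maximum degree.
    The paper's actual influence propagation score is not given in the abstract."""
    max_deg = max(len(nbrs) for nbrs in adj.values())
    return {u: len(nbrs) / max_deg for u, nbrs in adj.items()}

def metapath_walk(adj, node_type, metapath, start, influence, threshold, max_len, rng):
    """Generate one metapath-guided walk with a threshold-based influence filter.

    adj:       dict mapping node -> list of neighbors
    node_type: dict mapping node -> type label
    metapath:  cyclic type pattern whose first and last types match, e.g. ["A", "P", "A"]
    influence: dict mapping node -> score (assumed stand-in, see above)
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first metapath type")
    period = len(metapath) - 1  # the pattern repeats after this many steps
    walk = [start]
    step = 0
    while len(walk) < max_len:
        next_type = metapath[(step % period) + 1]
        # Condition (i): neighbor type must follow the metapath.
        # Condition (ii): neighbor must pass the influence threshold.
        candidates = [v for v in adj[walk[-1]]
                      if node_type[v] == next_type and influence[v] >= threshold]
        if not candidates:
            break  # assumed behavior: the walk ends early, which bounds its length
        walk.append(rng.choice(candidates))
        step += 1
    return walk

# Toy author (A) / paper (P) network, purely for illustration.
adj = {
    "a1": ["p1"], "a2": ["p1", "p2"], "a3": ["p2"],
    "p1": ["a1", "a2"], "p2": ["a2", "a3"],
}
node_type = {n: ("A" if n.startswith("a") else "P") for n in adj}
influence = degree_influence(adj)

walk = metapath_walk(adj, node_type, ["A", "P", "A"], "a1",
                     influence, threshold=0.75, max_len=7, rng=random.Random(0))
print(walk)
```

Walks collected this way would then be passed, as sequences of node identifiers, to a skip-gram model in the second phase. Raising the threshold shortens walks and concentrates them on high-scoring nodes, which is one plausible reading of how the filter controls walk length.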
Ethics declarations
Conflict of interest
The authors confirm that they do not have known competing interests of any kind that could influence the work reported in this paper.
About this article
Cite this article
Kumar, V., Krishna, P.R. An effective representation learning model for link prediction in heterogeneous information networks. Computing (2023). https://doi.org/10.1007/s00607-023-01238-x
DOI: https://doi.org/10.1007/s00607-023-01238-x
Keywords
- Link prediction
- Node classification
- Metapath
- Attention mechanism
- Semantic confusion
- Feature representation learning
- Networks embedding
- Influence propagation