Abstract
Knowledge graphs (KGs) are important resources for many artificial intelligence tasks but usually suffer from incompleteness, which has motivated the task of knowledge graph completion (KGC). Embedding-based methods, which infer missing triples from the structural information of the KG, are the mainstream approach to this task. However, these methods cannot perform inference for entities that do not appear in the KG and are constrained by the available structural information. To address these issues, text-based methods have been proposed. Such methods improve the reasoning ability of the model by using pre-trained language models (PLMs) to learn textual information from the knowledge graph data. However, the performance of text-based methods still lags behind that of embedding-based methods. We identify expensive negative sampling as the key reason. We introduce positive-unlabeled (PU) learning to collect high-confidence negative samples from a small number of samples, and prompt learning to produce good training results. The proposed PLM-based KGC model outperforms earlier text-based methods and rivals earlier embedding-based approaches on several benchmark datasets. By exploiting the structural information of KGs, the proposed model also achieves satisfactory inference speed.
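To make the PU-learning idea concrete, the following is a minimal sketch (not the authors' implementation) of one common PU heuristic: rank unlabeled triples by a plausibility score from the current model and keep only the lowest-scoring fraction as high-confidence negatives. The function name `select_reliable_negatives`, the `keep_ratio` parameter, and the toy scores are illustrative assumptions standing in for a PLM-based triple scorer.

```python
def select_reliable_negatives(unlabeled, score_fn, keep_ratio=0.3):
    """PU-style heuristic: rank unlabeled triples by the model's
    plausibility score (ascending) and keep only the lowest-scoring
    fraction as high-confidence negatives."""
    ranked = sorted(unlabeled, key=score_fn)
    cutoff = max(1, int(len(ranked) * keep_ratio))
    return ranked[:cutoff]

# Toy demonstration: hand-made scores stand in for a trained scorer.
triples = [("a", "r", "b"), ("a", "r", "c"), ("a", "r", "d"), ("a", "r", "e")]
toy_scores = {
    ("a", "r", "b"): 0.9,
    ("a", "r", "c"): 0.1,
    ("a", "r", "d"): 0.2,
    ("a", "r", "e"): 0.8,
}
negatives = select_reliable_negatives(
    triples, lambda t: toy_scores[t], keep_ratio=0.5
)
print(negatives)  # the two lowest-scoring triples
```

The appeal of this style of selection is that it avoids treating every unobserved triple as negative: only the triples the model itself finds least plausible are used, which reduces false negatives during training.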
Code availability
Code that supports the findings of this study and data extracted for training the classification algorithms are available from the corresponding author upon reasonable request.
Acknowledgements
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Funding
No funding was received for conducting this study.
Author information
Contributions
DL: conceptualization, methodology, validation, writing—review & editing. WJ: data preparation, software, writing—original draft, writing—review & editing. LB: conceptualization, methodology, validation, investigation. SQ: conceptualization, methodology, validation, investigation, supervision, and project administration.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval
We declare that this submission follows the policies outlined in the Guide for Authors. The research involves no human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Duan, L., Wang, J., Luo, B. et al. Simple knowledge graph completion model based on PU learning and prompt learning. Knowl Inf Syst 66, 2683–2697 (2024). https://doi.org/10.1007/s10115-023-02040-z