CodeKGC: Code Language Model for Generative Knowledge Graph Construction, ACM Transactions on Asian and Low-Resource Language Information Processing


CodeKGC: Code Language Model for Generative Knowledge Graph Construction
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2), Pub Date: 2024-03-09, DOI: 10.1145/3641850
Zhen Bi₁, Jing Chen₁, Yinuo Jiang₁, Feiyu Xiong₂, Wei Guo₂, Huajun Chen₁, Ningyu Zhang₁


Current generative knowledge graph construction approaches usually fail to capture structural knowledge because they simply flatten natural language into serialized texts or a specification language. However, large generative language models trained on structured data such as code have demonstrated impressive capability in understanding natural language for structural prediction and reasoning tasks. Intuitively, we address the task of generative knowledge graph construction with a code language model: given a code-format natural language input, the target is to generate triples, which can be represented as a code completion task. Specifically, we develop schema-aware prompts that effectively utilize the semantic structure within the knowledge graph. As code inherently possesses structure, such as class and function definitions, it serves as a useful model for prior semantic structural knowledge. Furthermore, we employ a rationale-enhanced generation method to boost performance. Rationales provide intermediate steps, thereby improving knowledge extraction abilities. Experimental results indicate that the proposed approach achieves better performance on benchmark datasets than the baselines.
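The idea of casting triple extraction as code completion can be illustrated with a minimal sketch. This is not the authors' implementation: the schema classes (Person, Organization, Rel, Triple), the prompt layout, and the helper names build_prompt and parse_completion are all hypothetical, chosen only to show how class definitions can carry the schema and how a model's completion of an open list could be read back as triples. No language model is called; the completion string is hand-written for illustration.

```python
import ast

# Hypothetical schema, written as class definitions so the prompt carries
# structural knowledge (entity types, relation, triple shape). Assumption:
# these names are for illustration and are not taken from the paper.
SCHEMA = '''class Entity:
    def __init__(self, name: str):
        self.name = name

class Person(Entity): ...
class Organization(Entity): ...

class Rel:
    def __init__(self, name: str):
        self.name = name

class Triple:
    def __init__(self, head: Entity, rel: Rel, tail: Entity):
        self.head, self.rel, self.tail = head, rel, tail
'''


def build_prompt(text: str) -> str:
    """Return a code-format prompt: schema, the input text as a docstring,
    and an unfinished list that a code language model would complete."""
    return f'{SCHEMA}\n""" {text} """\ntriples = [\n'


def parse_completion(completion: str) -> list[tuple[str, str, str]]:
    """Parse the completed list with ast (no exec of model output) and
    return (head, relation, tail) name tuples."""
    tree = ast.parse("triples = [\n" + completion)
    triples = []
    for call in tree.body[0].value.elts:
        if isinstance(call, ast.Call) and getattr(call.func, "id", None) == "Triple":
            head, rel, tail = call.args
            triples.append((head.args[0].value, rel.args[0].value, tail.args[0].value))
    return triples


if __name__ == "__main__":
    prompt = build_prompt("Alice works for Acme.")
    # Hand-written stand-in for a model completion. The comment line plays
    # the role of a rationale (an intermediate step) before the triple.
    completion = (
        '    # Rationale: "Alice" is a person, "Acme" is an organization.\n'
        '    Triple(Person("Alice"), Rel("work for"), Organization("Acme")),\n'
        "]\n"
    )
    print(prompt + completion)
    print(parse_completion(completion))
```

Parsing the completion with ast rather than executing it keeps the sketch safe to run on untrusted model output, and comment lines used as rationales are ignored by the parser automatically.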


Updated: 2024-03-09
