PTM-APIRec: Leveraging Pre-trained Models of Source Code in API Recommendation, ACM Transactions on Software Engineering and Methodology


PTM-APIRec: Leveraging Pre-trained Models of Source Code in API Recommendation
ACM Transactions on Software Engineering and Methodology (IF 4.4), Pub Date: 2024-03-15, DOI: 10.1145/3632745
Zhihao Li₁, Chuanyi Li₁, Ze Tang₁, Wanhong Huang₁, Jidong Ge₁, Bin Luo₁, Vincent Ng₂, Ting Wang₃, Yucheng Hu₃, Xiaopeng Zhang₃


API recommendation is a practical and essential feature of IDEs, and improving its accuracy is an effective way to improve coding efficiency. With the success of deep learning in software engineering, state-of-the-art (SOTA) performance in API recommendation is now achieved by deep-learning-based approaches. However, existing SOTA approaches either consider only the API sequences in code snippets or rely on complex operations to extract hand-crafted features. Both carry the risk of under-encoding the input code snippet, which in turn leads to sub-optimal recommendation performance. To address this, this article proposes PTM-APIRec, which uses the code understanding ability of existing general-purpose pre-trained models (PTMs) of code to fully encode the input code snippet and thereby improve the accuracy of API recommendation. To ensure that the semantics of the input code are fully understood and that the recommended API actually exists, we use separate vocabularies for the input code snippet and for the APIs to be predicted. Experimental results on the JDK and Android datasets show that PTM-APIRec surpasses existing approaches. In addition, an effective way to further improve PTM-APIRec is to enhance the pre-trained model with more pre-training data, which is easier to obtain than API recommendation datasets.
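The separate-vocabulary idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the encoder here is a deterministic stand-in for a pre-trained code model, the prediction head is untrained, and `API_VOCAB`, `encode`, `recommend`, and `HEAD` are hypothetical names chosen for illustration. The sketch only shows why scoring over a closed API vocabulary guarantees that every recommendation is an API that actually exists, however the input code is tokenized.

```python
import random

# Hypothetical closed output vocabulary: only these APIs can ever be recommended.
API_VOCAB = [
    "java.lang.StringBuilder.append",
    "java.lang.StringBuilder.toString",
    "java.util.List.add",
    "java.util.List.size",
]

DIM = 16  # size of the toy code representation (assumption)


def token_vector(token, dim=DIM):
    """Deterministic pseudo-embedding for an input token.

    Stand-in for the token embeddings of a pre-trained code model;
    the input side accepts any token string (open vocabulary).
    """
    rng = random.Random(token)  # string seeds are reproducible across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode(code_tokens):
    """Stand-in for the pre-trained encoder: mean-pool the token vectors."""
    vectors = [token_vector(t) for t in code_tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(code_tokens, head, k=2):
    """Score every API in the closed vocabulary and return the top-k names."""
    hidden = encode(code_tokens)
    logits = [sum(w * x for w, x in zip(row, hidden)) for row in head]
    ranked = sorted(range(len(API_VOCAB)), key=lambda i: logits[i], reverse=True)
    return [API_VOCAB[i] for i in ranked[:k]]


# Untrained head with one weight row per API; in practice this would be
# learned on an API recommendation dataset.
_head_rng = random.Random(0)
HEAD = [[_head_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in API_VOCAB]

if __name__ == "__main__":
    snippet = ["StringBuilder", "sb", "=", "new", "StringBuilder", "(", ")", ";", "sb", "."]
    print(recommend(snippet, HEAD, k=3))
```

Because the head has exactly one row per entry in `API_VOCAB`, the model cannot emit a non-existent API name, which a shared sub-token vocabulary would allow. The ranking itself is meaningless here since nothing is trained.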


Updated: 2024-03-15
