Code Comment Inconsistency Detection Based on Confidence Learning
IEEE Transactions on Software Engineering (IF 7.4), Pub Date: 2024-01-29, DOI: 10.1109/tse.2024.3358489
Zhengkang Xu, Shikai Guo, Yumiao Wang, Rong Chen, Hui Li, Xiaochen Li, He Jiang

Code comments are a crucial source of software documentation that captures various aspects of the code. Such comments play a vital role in understanding the source code and facilitating communication between developers. However, as software is iteratively released, software projects grow larger and more complex, leading to a corresponding increase in issues such as mismatched, incomplete, or outdated code comments. These inconsistencies can mislead developers and result in potential bugs, and reports of such inconsistencies have risen steadily over time. Although numerous methods have been proposed for detecting code comment inconsistencies, their effectiveness remains limited because they overlook issues such as characterization noise and labeling errors in the datasets. To overcome these limitations, we propose a novel approach called MCCL that first removes noise from the dataset and then detects inconsistent code comments in a timely manner, thereby enhancing the model's learning ability. Our proposed model facilitates better matching between code and comments, leading to improved development of software engineering projects. MCCL comprises two components, namely method comment detection and confidence learning denoising. The method comment detection component captures the intricate relationships between code and comments by learning their syntactic and semantic structures, and it correlates code and comments through an attention mechanism to identify how changes in the code affect the comments. The confidence learning denoising component identifies and removes characterization noise and labeling errors to enhance the quality of the datasets. This is achieved by applying principles such as pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence. By effectively eliminating noise from the dataset, our model can learn inconsistencies between comments and source code more accurately. Our experiments on 1,518 open-source projects demonstrate that MCCL detects inconsistencies accurately, achieving an average F1-score of 82.6% and outperforming state-of-the-art methods by 2.4% to 28.0%. Therefore, MCCL is more effective than existing approaches at identifying inconsistent comments based on code changes.
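
The denoising step described above (pruning noisy data, counting with probabilistic thresholds, and ranking examples by confidence) follows the general recipe of confident learning. The sketch below is a minimal, hypothetical NumPy illustration of that recipe, not the authors' MCCL implementation; the function name estimate_noise_and_prune, the use of out-of-sample predicted probabilities, and the binary consistent/inconsistent labels are assumptions made for illustration.

```python
import numpy as np

def estimate_noise_and_prune(pred_probs, labels):
    """Hypothetical confident-learning-style denoising sketch (not MCCL itself).

    pred_probs: (n_samples, n_classes) out-of-sample predicted probabilities
    labels:     (n_samples,) observed, possibly noisy labels
    Returns indices of examples to keep, ranked by self-confidence.
    """
    n_classes = pred_probs.shape[1]

    # Count: a per-class probabilistic threshold, i.e. the mean predicted
    # probability of class c over examples currently labeled c.
    thresholds = np.array(
        [pred_probs[labels == c, c].mean() for c in range(n_classes)]
    )

    # An example is "confidently" of class j if its probability for j meets
    # that class's threshold; among such classes, take the most probable one.
    above = pred_probs >= thresholds
    masked = np.where(above, pred_probs, -np.inf)
    confident_class = np.where(above.any(axis=1), masked.argmax(axis=1), -1)

    # Prune: examples whose confident class disagrees with the observed
    # label are treated as characterization noise / labeling errors.
    suspected_noise = (confident_class != -1) & (confident_class != labels)

    # Rank: keep the remaining examples, ordered by self-confidence so
    # training can favor the most trustworthy code-comment pairs.
    keep = np.where(~suspected_noise)[0]
    self_confidence = pred_probs[keep, labels[keep]]
    return keep[np.argsort(-self_confidence)]

if __name__ == "__main__":
    # Toy usage: 0 = consistent, 1 = inconsistent code-comment pair.
    rng = np.random.default_rng(0)
    probs = rng.dirichlet([2.0, 2.0], size=200)        # stand-in model outputs
    noisy_labels = np.where(rng.random(200) < 0.9,
                            probs.argmax(axis=1),       # mostly correct labels
                            1 - probs.argmax(axis=1))   # ~10% flipped labels
    kept = estimate_noise_and_prune(probs, noisy_labels)
    print(f"kept {len(kept)} of 200 examples after denoising")
```

In a pipeline like the one the abstract outlines, this kind of filtering would run over the code-comment training pairs before the attention-based detector is trained; the actual thresholding and ranking choices are the paper's, not this sketch's.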
