Source code expert identification: Models and application, Information and Software Technology


Source code expert identification: Models and application
Information and Software Technology (IF 3.9), Pub Date: 2024-03-16, DOI: 10.1016/j.infsof.2024.107445
Otávio Cury, Guilherme Avelino, Pedro Santos Neto, Marco Túlio Valente, Ricardo Britto

Identifying source code expertise is useful in several situations. Activities like bug fixing and helping newcomers are best performed by knowledgeable developers. Some studies have proposed repository-mining techniques to identify source code experts. However, it is still unclear which variables are most related to code knowledge and how they can be used to identify expertise. This study explores models of expertise identification and how these models can be used to improve a Truck Factor algorithm. First, we built an oracle with the knowledge of developers from software projects. Then, we used this oracle to analyze the correlation between measures from the development history and source code knowledge. We investigated the use of linear and machine-learning models to identify file experts. Finally, we used the proposed models to improve a Truck Factor algorithm and analyzed their performance using data from public and private repositories. […] and […] have the highest positive and negative correlations with source code knowledge, respectively. Machine learning classifiers outperformed the linear techniques ([…] = 71% to 73%) in the largest analyzed dataset, but this advantage is unclear in the smallest one. The Truck Factor algorithm using the proposed models could handle developers missed by the previous expertise model, with the best average […] of 74%. It was also perceived as more accurate in computing the Truck Factor of an industrial project. If we analyze […], the studied models have similar performance. However, machine learning classifiers achieve higher […], while linear models obtain the highest […]. Therefore, choosing the best technique depends on the user's tolerance for false positives and false negatives. Additionally, the proposed models significantly improved the accuracy of a Truck Factor algorithm, which supports their effectiveness in identifying the key developers within software projects.
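The abstract does not spell out how a Truck Factor is computed from a set of file experts. As a rough illustration only, the sketch below uses a commonly described greedy heuristic: repeatedly remove the developer who is an expert on the most files until more than half of the files have no expert left. The function name `truck_factor`, the `threshold` parameter, and the sample data are assumptions made for this example, not the authors' implementation; in the study, the file-to-expert mapping would come from the proposed expertise models.

```python
from collections import Counter


def truck_factor(file_experts, threshold=0.5):
    """Greedy Truck Factor estimate (illustrative sketch, not the paper's code).

    file_experts maps each file path to the set of developers considered
    experts on that file. Returns (truck_factor, removed_developers).
    """
    # Copy the sets so the caller's data is not modified.
    remaining = {path: set(devs) for path, devs in file_experts.items()}
    total = len(remaining)
    if total == 0:
        return 0, []

    removed = []
    while True:
        # A file is orphaned once none of its experts remain.
        orphaned = sum(1 for devs in remaining.values() if not devs)
        if orphaned / total > threshold:
            return len(removed), removed

        # Count how many files each remaining developer is an expert on.
        counts = Counter(d for devs in remaining.values() for d in devs)
        if not counts:
            return len(removed), removed

        # Remove the developer covering the most files (ties broken by name).
        top = min(counts, key=lambda dev: (-counts[dev], dev))
        removed.append(top)
        for devs in remaining.values():
            devs.discard(top)


# Hypothetical input: in practice this would be produced by an expertise model.
experts = {
    "a.py": {"ana"},
    "b.py": {"ana", "bob"},
    "c.py": {"ana"},
    "d.py": {"bob"},
}
tf, key_developers = truck_factor(experts)
print(tf, key_developers)
```

Under this heuristic, the quality of the result depends directly on the expert sets: an expertise model that misses a real expert on a file makes that file look orphaned too early, which lowers the estimate.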


Updated: 2024-03-16
