Effective test generation using pre-trained Large Language Models and mutation testing
Information and Software Technology (IF 3.9), Pub Date: 2024-04-06, DOI: 10.1016/j.infsof.2024.107468
Arghavan Moradi Dakhel, Amin Nikanjam, Vahid Majdinasab, Foutse Khomh, Michel C. Desmarais

One of the critical phases in the software development life cycle is software testing. Testing helps identify potential bugs and reduce maintenance costs. The goal of automated test generation tools is to ease the development of tests by suggesting efficient, bug-revealing tests. Recently, researchers have leveraged Large Language Models (LLMs) of code to generate unit tests. While the code coverage of generated tests is usually assessed, the literature acknowledges that coverage is weakly correlated with the efficiency of tests in bug detection. To improve over this limitation, in this paper, we introduce MuTAP (Mutation Test case generation using Augmented Prompt) for improving the effectiveness of test cases generated by LLMs in terms of revealing bugs, by leveraging mutation testing. Our goal is achieved by augmenting prompts with surviving mutants, as those mutants highlight the limitations of test cases in detecting bugs. MuTAP is capable of generating effective test cases in the absence of natural language descriptions of the Program Under Test (PUT). We employ different LLMs within MuTAP and evaluate their performance on different benchmarks. Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets. Among these, 17% remained undetected by both the current state-of-the-art fully automated test generation tool (i.e., Pynguin) and zero-shot/few-shot learning approaches on LLMs. Furthermore, MuTAP achieves a Mutation Score (MS) of 93.57% on synthetic buggy code, outperforming all other approaches in our evaluation. Our findings suggest that although LLMs can serve as a useful tool to generate test cases, they require specific post-processing steps to enhance the effectiveness of the generated test cases, which may suffer from syntactic or functional errors and may be ineffective in detecting certain types of bugs and testing corner cases in PUTs.
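To make the workflow described in the abstract concrete, below is a minimal Python sketch of a prompt-augmentation loop driven by surviving mutants. The callables llm_generate, make_mutants, and tests_kill are hypothetical placeholders standing in for an LLM of code and a mutation-testing tool; this is an illustration of the general idea, not the authors' actual implementation.

from typing import Callable, List

def mutation_score(killed: int, total: int) -> float:
    # Mutation Score (MS) = killed mutants / total mutants, as a percentage.
    return 100.0 * killed / total if total else 0.0

def augment_tests_with_mutants(
    put_source: str,
    llm_generate: Callable[[str], str],        # prompt -> test code (an LLM of code)
    make_mutants: Callable[[str], List[str]],  # PUT source -> mutated variants
    tests_kill: Callable[[str, str], bool],    # (tests, mutant) -> True if some test fails on the mutant
    max_rounds: int = 3,
) -> str:
    # Regenerate tests for the PUT, feeding surviving mutants back into the prompt each round.
    prompt = "Write unit tests for this function:\n" + put_source
    tests = llm_generate(prompt)
    mutants = make_mutants(put_source)

    for _ in range(max_rounds):
        surviving = [m for m in mutants if not tests_kill(tests, m)]
        if not surviving:
            break  # every mutant is killed; the current tests are already effective
        # Surviving mutants expose behaviours the current tests cannot distinguish,
        # so they are appended to the prompt to steer the next generation round.
        prompt = (
            "These buggy variants survive the tests below; add tests that kill them.\n"
            "Original function:\n" + put_source + "\n"
            "Surviving mutants:\n" + "\n".join(surviving) + "\n"
            "Current tests:\n" + tests
        )
        tests = llm_generate(prompt)
    return tests

Under these assumptions, a mutant "survives" when no generated test fails on it, and the loop stops once all mutants are killed or the round budget is exhausted; the same mutation_score helper can then report the MS metric quoted in the abstract.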

Updated: 2024-04-06