ACM Transactions on Software Engineering and Methodology (IF 4.4), Pub Date: 2024-04-17, DOI: 10.1145/3635713. Peng Zhang 1, Yang Wang 2, Xutong Liu 2, Zeyu Lu 2, Yibiao Yang 2, Yanhui Li 2, Lin Chen 2, Ziyuan Wang 3, Chang-ai Sun 4, Xiao Yu 5, Yuming Zhou 2
Background. Software testing is a critical activity for ensuring the quality and reliability of software systems. To evaluate the effectiveness of different test suites, researchers have developed a variety of metrics. Problem. However, comparing these metrics is challenging because there is no standardized evaluation framework that accounts for all relevant factors. As a result, researchers often focus on a single factor (e.g., size), which leads to different or even contradictory conclusions. After comparing dozens of studies in detail, we have identified the two problems most troubling to our community: (1) researchers tend to oversimplify the description of the ground truth they use, and (2) data involving real defects is not suitable for analysis with traditional statistical indicators. Objective. We aim to scrutinize the whole process of comparing test suites for our community. Method. To achieve this aim, we propose a framework ASSENT (ev
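As a minimal illustration of the pitfall the abstract describes (judging suites by a single factor such as size), the hypothetical sketch below compares two toy test suites by the real faults each one detects. All suite and fault names are invented for illustration; this is not the paper's ASSENT framework.

```python
# Hypothetical sketch: two suites of different sizes can detect the same
# fraction of known faults, so size alone is a misleading comparison factor.

def fault_detection_rate(suite, fault_matrix):
    """Fraction of known faults detected by at least one test in `suite`.

    `fault_matrix` maps each fault id to the set of tests that detect it
    (an assumed, toy ground truth)."""
    detected = {f for f, tests in fault_matrix.items() if tests & suite}
    return len(detected) / len(fault_matrix)

# Toy ground truth: which tests detect which faults (invented data).
faults = {"F1": {"t1", "t3"}, "F2": {"t2"}, "F3": {"t4"}}

suite_a = {"t1", "t2"}        # smaller suite: detects F1, F2
suite_b = {"t1", "t3", "t4"}  # larger suite: detects F1, F3

print(len(suite_a), fault_detection_rate(suite_a, faults))  # 2 0.666...
print(len(suite_b), fault_detection_rate(suite_b, faults))  # 3 0.666...
```

Both suites detect two of the three faults, so ranking them by size alone would wrongly suggest that the larger suite is more effective.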
Evaluating the Effectiveness of Test Suites: What Do We Know and What Should We Do?