An Empirical Study on Correlations Between Deep Neural Network Fairness and Neuron Coverage Criteria
IEEE Transactions on Software Engineering ( IF 7.4 ) Pub Date : 2024-01-08 , DOI: 10.1109/tse.2023.3349001
Wei Zheng 1 , Lidan Lin 1 , Xiaoxue Wu 2 , Xiang Chen 3

With the widespread use of deep neural networks (DNNs) in high-stakes decision-making systems (such as fraud detection and prison sentencing), concerns have arisen about the fairness of DNNs and the potential negative impact they may have on individuals and society. Fairness testing has therefore become an important research topic in DNN testing. At the same time, neural network coverage criteria (such as criteria based on neuron activation) are regarded as test adequacy measures for DNN white-box testing, under the implicit assumption that improving coverage enhances the quality of a test suite. Nevertheless, the correlation between DNN fairness (a test property) and coverage criteria (a test method) has not been adequately explored. To address this issue, we conducted a systematic empirical study of seven coverage criteria, six fairness metrics, three fairness testing techniques, and five bias mitigation methods on five DNN models and nine fairness datasets to assess the correlation between coverage criteria and DNN fairness. Our study yielded the following findings: 1) as the size of the test suite increased, some of the coverage and fairness metrics changed significantly; 2) the statistical correlation between coverage criteria and DNN fairness is limited; 3) after bias mitigation is applied to improve the fairness of a DNN, the coverage criteria change in different patterns; and 4) models debiased by different bias mitigation methods show a lower correlation between coverage and fairness than the original models. Our findings cast doubt on the validity of coverage criteria with respect to DNN fairness (i.e., increasing the coverage may even have a negative impact on the fairness of DNNs). We therefore warn DNN testers against blindly pursuing higher coverage at the cost of test properties of DNNs (such as fairness).
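
To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the seven coverage criteria, the six fairness metrics, or the correlation statistic, so the choices here are assumptions made purely for illustration: basic neuron coverage with an activation threshold, statistical parity difference as the fairness metric, and Spearman rank correlation. All function names are hypothetical.

```python
import numpy as np


def neuron_coverage(activations, threshold=0.5):
    """Fraction of neurons activated above `threshold` by at least one input.

    `activations` has shape (n_inputs, n_neurons). The threshold value is an
    assumption; the abstract does not state one.
    """
    activations = np.asarray(activations, dtype=float)
    covered = (activations > threshold).any(axis=0)
    return float(covered.mean())


def statistical_parity_difference(predictions, protected):
    """|P(y_hat = 1 | protected) - P(y_hat = 1 | not protected)|.

    `predictions` holds binary model outputs and `protected` marks membership
    in the protected group. Both groups must be non-empty.
    """
    predictions = np.asarray(predictions, dtype=float)
    protected = np.asarray(protected).astype(bool)
    rate_protected = predictions[protected].mean()
    rate_other = predictions[~protected].mean()
    return float(abs(rate_protected - rate_other))


def _average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Returns NaN if either input is constant.
    """
    return float(np.corrcoef(_average_ranks(x), _average_ranks(y))[0, 1])


if __name__ == "__main__":
    # Synthetic inputs for illustration only; these are not data from the study.
    rng = np.random.default_rng(0)
    coverage_per_suite, fairness_per_suite = [], []
    for suite_size in (10, 20, 40, 80, 160):
        activations = rng.random((suite_size, 50))
        predictions = rng.integers(0, 2, size=suite_size)
        protected = np.arange(suite_size) % 2  # alternate group membership
        coverage_per_suite.append(neuron_coverage(activations, threshold=0.95))
        fairness_per_suite.append(
            statistical_parity_difference(predictions, protected)
        )
    print(spearman_rho(coverage_per_suite, fairness_per_suite))
```

In this sketch, each test suite yields one coverage value and one fairness value, and the rank correlation is computed across suites. A coefficient near zero would correspond to the kind of limited statistical correlation the abstract reports, although the study's actual procedure may differ.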

Updated: 2024-01-08