Generating Deep Learning Model-Specific Explanations at the End User’s Side, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems

Generating Deep Learning Model-Specific Explanations at the End User’s Side
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IF 1.5), Pub Date: 2023-01-27, DOI: 10.1142/s0218488522400219
R. Haffar, N. Jebreel, D. Sánchez, J. Domingo-Ferrer

End users who cannot afford to collect and label big data to train accurate deep learning (DL) models resort to Machine Learning as a Service (MLaaS) providers, who offer paid access to accurate DL models. However, the lack of transparency in how the providers’ models make predictions causes a problem of trust. A way to increase trust (and also to align with ethical regulations) is for predictions to be accompanied by explanations generated locally and independently by the end users (rather than by explanations offered by the model providers). Explanation methods that use internal components of DL models (a.k.a. model-specific explanations) are more accurate and effective than those relying solely on the inputs and outputs (a.k.a. model-agnostic explanations). However, end users lack white-box access to the internal components of the providers’ models. To tackle this issue, we propose a novel approach allowing an end user to locally generate model-specific explanations for a DL classification model accessed via a provider’s API. First, we approximate the provider’s model with a local surrogate model. We then use the surrogate model’s components to locally generate model-specific explanations that approximate the explanations obtainable with white-box access to the provider’s DL model. Specifically, we leverage the surrogate model’s gradients to generate adversarial examples that counterfactually explain why an input example is classified into a specific class. Our approach only requires the end user to have unlabeled data amounting to 0.5% of the size of the provider’s training data and following a similar distribution; given the small size and unlabeled nature of these data, they can be assumed to be already available to the end user or even to be supplied by the provider to build trust in its model. We demonstrate the accuracy and effectiveness of our approach through extensive experiments on two ML tasks: image classification and tabular data classification. The locally generated explanations are consistent with those obtainable with white-box access to the provider’s model, thus giving end users an independent and reliable way to determine if the provider’s model is trustworthy.
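
The pipeline the abstract describes (query the provider's API on unlabeled data, fit a local surrogate, then follow the surrogate's gradients to a counterfactual) can be illustrated with a toy sketch. This is not the authors' implementation: the paper works with DL models on image and tabular data, whereas the sketch below assumes a logistic-regression surrogate and a two-feature input to stay short. `provider_api`, `train_surrogate`, and `counterfactual` are hypothetical names, and the "provider" is simulated locally by a hidden linear model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the provider's model. The end user never sees
# these parameters and only calls provider_api().
_W_SECRET = np.array([2.0, -3.0])
_B_SECRET = 0.5


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def provider_api(x):
    """Simulated black-box API: returns P(class 1) for each row of x."""
    return sigmoid(x @ _W_SECRET + _B_SECRET)


def train_surrogate(x, soft_labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression surrogate to the provider's outputs."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(x @ w + b) - soft_labels  # cross-entropy gradient w.r.t. logits
        w -= lr * (x.T @ err) / len(x)
        b -= lr * err.mean()
    return w, b


def counterfactual(x0, w, b, step=0.05, max_iter=500):
    """Step along the surrogate's input gradient until its predicted class flips."""
    x = x0.astype(float)
    orig_class = sigmoid(x @ w + b) >= 0.5
    for _ in range(max_iter):
        p = sigmoid(x @ w + b)
        if (p >= 0.5) != orig_class:
            break
        grad = p * (1.0 - p) * w  # gradient of P(class 1) w.r.t. the input
        sign = -1.0 if orig_class else 1.0  # push the probability toward the other class
        x = x + sign * step * grad / (np.linalg.norm(grad) + 1e-12)
    return x


# Step 1: label the user's small unlabeled set through the API, then fit the surrogate.
unlabeled = rng.normal(size=(500, 2))
w, b = train_surrogate(unlabeled, provider_api(unlabeled))

# Step 2: generate a counterfactual for one input using only surrogate gradients.
x0 = np.array([1.0, -1.0])
x_cf = counterfactual(x0, w, b)

print("input:", x0, "provider P(class 1):", provider_api(x0))
print("counterfactual:", x_cf, "provider P(class 1):", provider_api(x_cf))
print("feature changes:", x_cf - x0)
```

The difference `x_cf - x0` is the explanation in this toy setting: it shows which features had to change, and by how much, for the predicted class to flip. Whether a counterfactual found on the surrogate also flips the provider's prediction depends on how faithfully the surrogate mimics the provider, which is why the final lines re-query the API.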

Updated: 2023-01-27
