Towards a Game-theoretic Understanding of Explanation-based Membership Inference Attacks
arXiv - CS - Computer Science and Game Theory, Pub Date: 2024-04-10, DOI: arxiv-2404.07139
Kavita Kumari, Murtuza Jadliwala, Sumit Kumar Jha, Anindya Maiti

Model explanations improve the transparency of black-box machine learning (ML) models and their decisions; however, they can also be exploited to carry out privacy threats such as membership inference attacks (MIA). Existing works have analyzed MIA only in a single "what if" interaction between an adversary and the target ML model; thus, they do not discern the factors that affect an adversary's ability to launch MIA in repeated interaction settings. Additionally, these works rely on assumptions about the adversary's knowledge of the target model's structure and, thus, do not guarantee the optimality of the predefined threshold required to distinguish members from non-members. In this paper, we delve into the domain of explanation-based threshold attacks, where the adversary endeavors to carry out MIA by leveraging the variance of explanations through iterative interactions with the system comprising the target ML model and its corresponding explanation method. We model such interactions using a continuous-time stochastic signaling game framework. In our framework, an adversary plays a stopping game, interacting with the system (which has imperfect information about the adversary's type, i.e., honest or malicious) to obtain explanation variance information and compute an optimal threshold that accurately determines the membership of a datapoint. First, we propose a sound mathematical formulation to prove that such an optimal threshold exists, which can be used to launch MIA. Then, we characterize the conditions under which a unique Markov perfect equilibrium (or steady state) exists in this dynamic system. Through a comprehensive set of simulations of the proposed game model, we assess different factors that can affect the capability of an adversary to launch MIA in such repeated interaction settings.
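To make the idea of an explanation-based threshold attack concrete, here is a minimal Python sketch of the single-query decision rule only; it does not model the signaling game or the optimal-threshold derivation. It assumes, as is not stated in the abstract, that training points tend to yield lower-variance explanations than unseen points. The function names, the attribution vectors, and the threshold `TAU` are all hypothetical and chosen purely for illustration.

```python
from statistics import pvariance
from typing import Sequence


def explanation_variance(attribution: Sequence[float]) -> float:
    """Population variance of one explanation (feature-attribution) vector."""
    return pvariance(attribution)


def infer_membership(attribution: Sequence[float], threshold: float) -> bool:
    """Guess 'member' when the explanation variance is at or below the threshold.

    Assumption (not from the abstract): members of the training set tend to
    produce lower-variance explanations than non-members.
    """
    return explanation_variance(attribution) <= threshold


# Hypothetical attribution vectors, made up for illustration only.
flat_explanation = [0.01, -0.02, 0.00, 0.01]
spiky_explanation = [0.90, -1.10, 0.40, -0.60]
TAU = 0.05  # hypothetical threshold, not the optimal one the paper studies

print(infer_membership(flat_explanation, TAU))   # low variance: guessed member
print(infer_membership(spiky_explanation, TAU))  # high variance: guessed non-member
```

In the setting the abstract describes, the threshold would not be fixed in advance as it is here; the adversary would refine it over repeated interactions with the system.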

Updated: 2024-04-11