Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy
Journal of Computer Science and Technology (IF 1.9), Pub Date: 2023-12-01, DOI: 10.1007/s11390-023-1346-7
Dan Zhao, Su-Yun Zhao, Hong Chen, Rui-Xuan Liu, Cui-Ping Li, Xiao-Ying Zhang

Abstract

Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Because each user transaction is a set of items, most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies. The current state-of-the-art approach, set-value itemset mining (SVSM), must trade off variance against bias to achieve accurate results, so an unbiased FIM approach with lower variance is highly desirable. To close this gap, we propose an item-level LDP frequency oracle, the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is applied to set-valued data: all items of a set are encoded into a single fixed-length vector, to which perturbation is then applied. We further propose an FIM approach, optimized united itemset mining (O-UISM), which combines the padding-and-sampling-based frequency oracle (PSFO) and IHFO in one framework to obtain accurate frequent itemsets and their frequencies. Finally, we show theoretically and experimentally that O-UISM significantly outperforms existing approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.
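The “padding and sampling” step can be illustrated with a minimal Python sketch. It assumes generalized randomized response (GRR) as the single-item frequency oracle, a hypothetical padding length `ell`, and hypothetical helper names (`grr_perturb`, `pad_and_sample`, `ps_estimate_counts`). It follows the general padding-and-sampling idea from the LDP itemset-mining literature and is not the authors' PSFO. Each user pads its transaction to `ell` items with dummy items (or truncates it), samples one item, and perturbs it; the server unbiases the GRR counts and scales them by `ell`.

```python
import math
import random
from collections import Counter

def grr_params(epsilon, d):
    """Keep-probability p and per-other-value probability q of GRR over d values."""
    denom = math.exp(epsilon) + d - 1
    return math.exp(epsilon) / denom, 1.0 / denom

def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with prob p, otherwise a different value uniformly."""
    p, _ = grr_params(epsilon, len(domain))
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def pad_and_sample(transaction, ell, rng):
    """Pad a transaction to ell items with dummies (or truncate it to ell), then sample one."""
    items = sorted(transaction)
    if len(items) > ell:
        items = rng.sample(items, ell)  # truncation: the source of bias in this sketch
    # Dummy names are assumed not to collide with real item names.
    items += [f"_dummy{i}" for i in range(ell - len(items))]
    return rng.choice(items)

def ps_estimate_counts(transactions, items, ell, epsilon, seed=0):
    """Estimate how many transactions contain each real item."""
    rng = random.Random(seed)
    domain = list(items) + [f"_dummy{i}" for i in range(ell)]
    reports = [grr_perturb(pad_and_sample(t, ell, rng), domain, epsilon, rng)
               for t in transactions]
    p, q = grr_params(epsilon, len(domain))
    n = len(reports)
    counts = Counter(reports)
    # Unbias the GRR counts, then scale by ell to undo the 1/ell sampling rate.
    return {v: ell * (counts[v] - n * q) / (p - q) for v in items}
```

For example, with 10000 transactions `{"a", "b"}` and 10000 transactions `{"a", "c"}`, `ps_estimate_counts(..., ["a", "b", "c", "d"], ell=2, epsilon=4.0)` should land near the true counts 20000, 10000, 10000 and 0. Transactions longer than `ell` have each item sampled with probability below `1/ell` and are therefore underestimated, while a larger `ell` multiplies the noise; this is the variance–bias tension the abstract attributes to SVSM.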
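Encoding a whole set of items into one fixed-length vector with Hadamard rows, and perturbing that vector, can be sketched in Python as follows. This is an illustrative assumption, not the authors' IHFO, whose exact perturbation the abstract does not specify. Item `v` is mapped to row `v` of a Sylvester Hadamard matrix of size `D` (the next power of two at least the number of items), a set becomes the sum of its rows, and each user reports one randomly chosen coordinate as a single randomized-response bit after stochastic rounding. The bound `ell` on set size and all function names are hypothetical. Because distinct Hadamard rows are orthogonal, the server can decode every item's count from the same reports.

```python
import math
import random

def hadamard_entry(i, j):
    """Entry (i, j) of the Sylvester Hadamard matrix: (-1)^popcount(i AND j)."""
    return -1 if bin(i & j).count("1") % 2 else 1

def encode_set(item_ids, D):
    """Sum the Hadamard rows of all items in the set into one length-D vector."""
    return [sum(hadamard_entry(v, j) for v in item_ids) for j in range(D)]

def perturb(x, ell, epsilon, rng):
    """Report one coordinate of the encoded vector as a single randomized-response bit."""
    j = rng.randrange(len(x))                 # data-independent coordinate choice
    z = x[j] / ell                            # lies in [-1, 1] when the set has <= ell items
    bit = 1 if rng.random() < (1 + z) / 2 else -1  # stochastic rounding: E[bit] = z
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    if rng.random() >= p:                     # randomized response: flip with prob 1 - p
        bit = -bit
    return j, bit

def hadamard_estimate_counts(sets_of_ids, num_items, ell, epsilon, seed=0):
    """Unbiased estimate of how many sets contain each item id in range(num_items)."""
    rng = random.Random(seed)
    D = 1 << max(1, (num_items - 1).bit_length())  # power of two >= num_items
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    scale = ell / (2 * p - 1)                 # undoes the rounding range and RR shrinkage
    est = [0.0] * num_items
    for s in sets_of_ids:
        assert len(s) <= ell, "sketch assumes every set has at most ell items"
        j, bit = perturb(encode_set(s, D), ell, epsilon, rng)
        # Orthogonality (row v dotted with row w equals D if v == w, else 0)
        # lets a single report contribute to every item's estimate.
        for v in range(num_items):
            est[v] += hadamard_entry(v, j) * bit * scale
    return est
```

In this sketch the coordinate index is chosen independently of the data and the reported bit passes through randomized response with keep-probability `e^ε/(e^ε+1)`, so each report is ε-LDP; the price is a per-user noise term of magnitude `ell/(2p-1)` shared by every item.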




Updated: 2023-12-01