Skip to main content
Log in

Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Local differential privacy (LDP) approaches to collecting sensitive information for frequent itemset mining (FIM) can reliably guarantee privacy. Most current approaches to FIM under LDP add “padding and sampling” steps to obtain frequent itemsets and their frequencies because each user transaction represents a set of items. The current state-of-the-art approach, namely set-value itemset mining (SVSM), must balance variance and bias to achieve accurate results. Thus, an unbiased FIM approach with lower variance is highly promising. To narrow this gap, we propose an Item-Level LDP frequency oracle approach, named the Integrated-with-Hadamard-Transform-Based Frequency Oracle (IHFO). For the first time, Hadamard encoding is introduced to a set of values to encode all items into a fixed vector, and perturbation can be subsequently applied to the vector. An FIM approach, called optimized united itemset mining (O-UISM), is proposed to combine the padding-and-sampling-based frequency oracle (PSFO) and the IHFO into a framework for acquiring accurate frequent itemsets with their frequencies. Finally, we theoretically and experimentally demonstrate that O-UISM significantly outperforms the extant approaches in finding frequent itemsets and estimating their frequencies under the same privacy guarantee.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

References

  1. Li N H, Qardaji W, Su D, Cao J N. PrivBasis: Frequent itemset mining with differential privacy. Proceedings of the VLDB Endowment, 2012, 5(11): 1340–1351. https://doi.org/10.14778/2350229.2350251.

  2. Xiong X Y, Chen F, Huang P Z, Tian M M, Hu X F, Chen B D, Qin J. Frequent itemsets mining with differential privacy over large-scale data. IEEE Access, 2018, 6: 28877–28889. https://doi.org/10.1109/ACCESS.2018.2839752.

    Article  Google Scholar 

  3. Wang T H, Li N H, Jha S. Locally differentially private frequent itemset mining. In Proc. the 2018 IEEE Symposium on Security and Privacy (SP), May 2018, pp.127–143. https://doi.org/10.1109/SP.2018.00035.

  4. Agrawal R, Srikant R. Fast algorithms for mining association rules. In Proc. the 20th International Conference on Very Large Data Bases, Sept. 1994, pp.487–499. https://doi.org/10.5555/645920.672836.

  5. Adar E, Weld D S, Bershad B N, Gribble S S. Why we search: Visualizing and predicting user behavior. In Proc. the 16th International Conference on World Wide Web, May 2007, pp.161–170. https://doi.org/10.1145/1242572.1242595.

  6. Brin S, Motwani R, Silverstein C. Beyond market baskets: Generalizing association rules to correlations. In Proc. the 1997 ACM SIGMOD International Conference on Management of Data, Jun. 1997, pp.265–276. https://doi.org/10.1145/253260.253327.

  7. Dwork C. Differential privacy: A survey of results. In Proc. the 5th International Conference on Theory and Applications of Models of Computation, Apr. 2008. https://doi.org/10.1007/978-3-540-79228-4_1.

  8. Dwork C, McSherry F, Nissim K, Smith A. Calibrating noise to sensitivity in private data analysis. In Proc. the 3rd Theory of Cryptography Conference, Mar. 2006, pp.265–284. https://doi.org/10.1007/11681878_14.

  9. Wang S W, Huang L S, Nie Y W, Wang P Z, Xu H L, Yang W. PrivSet: Set-valued data analyses with locale differential privacy. In Proc. the 2018 IEEE Conference on Computer Communications, Apr. 2018, pp.1088–1096. https://doi.org/10.1109/INFOCOM.2018.8486234.

  10. Su S, Xu S Z, Cheng X, Li Z Y, Yang F C. Differentially private frequent itemset mining via transaction splitting. IEEE Trans. Knowledge and Data Engineering, 2015, 27(7): 1875–1891. https://doi.org/10.1109/TKDE.2015.2399310.

    Article  Google Scholar 

  11. Qin Z, Yang Y, Yu T, Khalil I, Xiao X K, Ren K. Heavy hitter estimation over set-valued data with local differential privacy. In Proc. the 2016 ACM SIGSAC Conference on Computer and Communications Security, Oct. 2016, pp.192–203. https://doi.org/10.1145/2976749.2978409.

  12. Cormode G, Jha S, Kulkarni T, Li N H, Srivastava D, Wang T H. Privacy at scale: Local differential privacy in practice. In Proc. the 2018 International Conference on Management of Data, May 2018, pp.1655–1658. https://doi.org/10.1145/3183713.3197390.

  13. Zeng C, Naughton J F, Cai J Y. On differentially private frequent itemset mining. Proceedings of the VLDB Endowment, 2012, 6(1): 25–36. https://doi.org/10.14778/2428536.2428539.

  14. Vadhan S. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, Lindell Y (ed.), Springer, 2017, pp.347–450. https://doi.org/10.1007/978-3-319-57048-8_7.

  15. Li N H, Lyu M, Su D, Yang W N. Differential privacy: From theory to practice. Synthesis Lectures on Information Security, Privacy, & Trust, 2016, 8(4): 1–138. https://doi.org/10.1007/978-3-031-02350-7.

    Article  Google Scholar 

  16. Papernot N, Song S, Mironov I, Raghunathan A, Talwar K, Erlingsson Ú. Scalable private learning with PATE. arXiv: 1802.08908, 2018. https://arxiv.org/abs/1802.08908, Dec. 2023.

  17. Kulkarni T, Cormode G, Srivastava D. Marginal release under local differential privacy. arXiv: 1711.02952, 2017. https://arxiv.org/abs/1711.02952, Dec. 2023.

  18. Nguyên T T, Xiao X K, Yang Y, Hui S C, Shin H, Shin J. Collecting and analyzing data from smart device users with local differential privacy. arXiv: 1606.05053, 2016. https://arxiv.org/abs/1606.05053, Dec. 2023.

  19. Wang T H, Blocki J, Li N H, Jha S. Locally differentially private protocols for frequency estimation. In Proc. the 26th USENIX Conference on Security Symposium, Aug. 2017, pp.729–745. https://doi.org/10.5555/3241189.3241247.

  20. Bassily R, Smith A. Local, private, efficient protocols for succinct histograms. In Proc. the 47th Annual ACM Symposium on Theory of Computing, Jun. 2015, pp.127–135. https://doi.org/10.1145/2746539.2746632.

  21. Wang T H, Li N H, Jha S. Locally differentially private heavy hitter identification. IEEE Trans. Dependable and Secure Computing, 2021, 18(2): 982–993. https://doi.org/10.1109/TDSC.2019.2927695.

    Article  Google Scholar 

  22. Duchi J C, Wainwright M J, Jordan M I. Local privacy and minimax bounds: Sharp rates for probability estimation. In Proc. the 26th International Conference on Neural Information Processing Systems, Dec. 2013, pp.1529–1537. https://doi.org/10.5555/2999611.2999782.

  23. Ye Q Q, Hu H B, Meng X F, Zheng H D. PrivKV: Keyvalue data collection with local differential privacy. In Proc. the 2019 IEEE Symposium on Security and Privacy (SP), May 2019, pp.317–331. https://doi.org/10.1109/SP.2019.00018.

  24. Gu X L, Li M, Cheng Y Q, Xiong L, Cao Y. PCKV: Locally differentially private correlated key-value data collection with optimized utility. In Proc. the 29th USENIX Conference on Security Symposium, Aug. 2020, pp.967–984. https://doi.org/10.5555/3489212.3489267.

  25. Erlingsson Ú, Feldman V, Mironov I, Raghunathan A, Talwar K, Thakurta A. Amplification by shuffling: From local to central differential privacy via anonymity. In Proc. the 30th Annual ACM-SIAM Symposium on Discrete Algorithms, Jan. 2019, pp.2468–2479. https://doi.org/10.1137/1.9781611975482.151.

  26. Duchi J C, Jordan M I, Wainwright M J. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 2018, 113(521): 182–201. https://doi.org/10.1080/01621459.2017.1389735.

    Article  MathSciNet  Google Scholar 

  27. Wang N, Xiao X K, Yang Y, Zhao J, Hui S C, Shin H, Shin J, Yu G. Collecting and analyzing multidimensional data with local differential privacy. In Proc. the 35th International Conference on Data Engineering (ICDE), Apr. 2019, pp.638–649. https://doi.org/10.1109/ICDE.2019.00063.

  28. Liu R X, Cao Y, Chen H, Guo R Y, Yoshikawa M. FLAME: Differentially private federated learning in the shuffle model. In Proc. the AAAI Conference on Artificial Intelligence, May 2021, pp.8688–8696. https://doi.org/10.1609/aaai.v35i10.17053.

  29. Chen R, Li H R, Qin A K, Kasiviswanathan S P, Jin H. Private spatial data aggregation in the local setting. In Proc. the 32nd IEEE International Conference on Data Engineering (ICDE), May 2016, pp.289–300. https://doi.org/10.1109/ICDE.2016.7498248.

  30. Gursoy M E, Tamersoy A, Truex S, Wei W Q, Liu L. Secure and utility-aware data collection with condensed local differential privacy. IEEE Trans. Dependable and Secure Computing, 2021, 18(5): 2365–2378. https://doi.org/10.1109/TDSC.2019.2949041.

    Article  Google Scholar 

  31. Murakami T, Kawamoto Y. Utility-optimized local differential privacy mechanisms for distribution estimation. In Proc. the 28th USENIX Conference on Security Symposium, Aug. 2019, pp.1877–1894. https://doi.org/10.5555/3361338.3361468.

  32. Gu X L, Li M, Xiong L, Cao Y. Providing input-discriminative protection for local differential privacy. In Proc. the 36th IEEE International Conference on Data Engineering (ICDE), Apr. 2020, pp.505–516. https://doi.org/10.1109/ICDE48307.2020.00050.

  33. Han X, Wang M, Zhang X J, Meng X F. Differentially private top-k query over mapreduce. In Proc. the 4th International Workshop on Cloud Data Management, Oct. 2012, pp.25–32. https://doi.org/10.1145/2390021.2390027.

  34. Evfimievski A, Gehrke J, Srikant R. Limiting privacy breaches in privacy preserving data mining. In Proc. the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Jun. 2003, pp.211–222. https://doi.org/10.1145/773153.773174.

  35. Bassily R, Nissim K, Stemmer U, Thakurta A. Practical locally private heavy hitters. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.2288–2296. https://doi.org/10.5555/3294771.3294989.

  36. Bun M, Nelson J, Stemmer U. Heavy hitters and the structure of local privacy. In Proc. the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, May 2018, pp.435–447. https://doi.org/10.1145/3196959.3196981.

  37. Acharya J, Sun Z T, Zhang H. Hadamard response: Estimating distributions privately, efficiently, and with little communication. arXiv: 1802.04705, 2018. https://arxiv.org/abs/1802.04705, Dec. 2023.

  38. Liu Y H, Suresh A T, Yu F, Kumar S, Riley M. Learning discrete distributions: User vs item-level privacy. arXiv: 2007.13660, 2020. https://arxiv.org/abs/2007.13660, Dec. 2023.

  39. Kairouz P, Oh S, Viswanath P. Extremal mechanisms for local differential privacy. In Proc. the 27th International Conference on Neural Information Processing Systems, Dec. 2014, pp.2879–2887. https://doi.org/10.5555/2969033.2969148.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hong Chen.

Supplementary Information

ESM 1

(PDF 150 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhao, D., Zhao, SY., Chen, H. et al. Hadamard Encoding Based Frequent Itemset Mining under Local Differential Privacy. J. Comput. Sci. Technol. 38, 1403–1422 (2023). https://doi.org/10.1007/s11390-023-1346-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-023-1346-7

Keywords

Navigation