research-article

X-FSPMiner: A Novel Algorithm for Frequent Similar Pattern Mining

Published: 26 March 2024

Abstract

Frequent similar pattern mining (FSP mining) allows for finding frequent patterns hidden from the classical approach. However, the use of similarity functions implies more computational effort, necessitating the development of more efficient algorithms for FSP mining. This work aims to improve the efficiency of mining all FSPs when using Boolean and non-increasing monotonic similarity functions. A data structure to condense an object description collection, named FV-Tree, and an algorithm for mining all FSPs from the FV-Tree, named X-FSPMiner, are proposed. The experimental results reveal that the novel algorithm X-FSPMiner vastly outperforms the state-of-the-art algorithms for mining all FSPs using Boolean and non-increasing monotonic similarity functions.
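To make the notion of a frequent similar pattern concrete, here is a minimal brute-force sketch. It is not X-FSPMiner and does not build an FV-Tree, neither of which the abstract describes in detail. It assumes one common formulation: a pattern is the set of values an object takes on a subset of features, its frequency is the fraction of objects whose values on those features are similar to it, and similarity is a Boolean conjunction of per-feature tolerance checks. The names `similar`, `mine_fsp`, `tol`, and `min_freq` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def similar(a, b, features, tol):
    """Boolean similarity (assumed form): True only if every selected
    feature of a and b differs by at most that feature's tolerance."""
    return all(abs(a[f] - b[f]) <= tol[f] for f in features)


def mine_fsp(objects, tol, min_freq):
    """Naively enumerate every feature subset and every distinct pattern,
    keeping the patterns whose similarity-based frequency reaches min_freq."""
    n_features = len(objects[0])
    results = {}
    for k in range(1, n_features + 1):
        for features in combinations(range(n_features), k):
            seen = set()
            for obj in objects:
                pattern = tuple(obj[f] for f in features)
                if pattern in seen:
                    continue  # each distinct pattern is evaluated once
                seen.add(pattern)
                # Count the objects similar to this pattern on these features.
                count = sum(similar(obj, other, features, tol) for other in objects)
                freq = count / len(objects)
                if freq >= min_freq:
                    results[(features, pattern)] = freq
    return results


# Toy data: feature 0 is numeric with tolerance 2, feature 1 must match exactly.
objects = [(10, 1), (12, 1), (30, 2), (11, 3)]
tol = (2, 0)
patterns = mine_fsp(objects, tol, min_freq=0.5)
for (features, pattern), freq in sorted(patterns.items()):
    print(features, pattern, freq)
```

Because a conjunction can only become false as more features are added, this similarity function is non-increasing in the feature set, so a pattern's frequency cannot grow when it is extended. The sketch ignores that property and checks every subset; efficient miners exploit it to prune the search.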



        Published in

        ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 5
        June 2024
        699 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/3613659

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 26 March 2024
        • Online AM: 30 January 2024
        • Accepted: 26 January 2024
        • Revised: 27 February 2023
        • Received: 15 July 2022