Abstract
Frequent similar pattern mining (FSP mining) finds frequent patterns that the classical approach, which treats two object subdescriptions as similar only when they are exactly equal, cannot detect. However, evaluating similarity functions requires more computational effort, so more efficient algorithms for FSP mining are needed. This work aims to improve the efficiency of mining all FSPs when Boolean and non-increasing monotonic similarity functions are used. Two contributions are proposed: FV-Tree, a data structure that condenses a collection of object descriptions, and X-FSPMiner, an algorithm that mines all FSPs from the FV-Tree. The experimental results show that X-FSPMiner vastly outperforms the state-of-the-art algorithms for mining all FSPs with Boolean and non-increasing monotonic similarity functions.
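To make the notion of frequency under a Boolean similarity function concrete, the following minimal sketch counts how many objects in a small collection are similar to a given subdescription. It is an illustration only and does not reproduce FV-Tree or X-FSPMiner: the toy collection, the per-attribute comparators, and the names `boolean_similarity` and `similar_frequency` are all assumptions made for this example. The similarity used here is a conjunction of per-attribute Boolean comparisons, so adding attributes can only lower the frequency, which is the non-increasing monotonic behavior the abstract refers to.

```python
from typing import Callable, Dict, List, Sequence

Obj = Dict[str, object]
Comparator = Callable[[object, object], bool]


def boolean_similarity(a: Obj, b: Obj, attrs: Sequence[str],
                       comparators: Dict[str, Comparator]) -> bool:
    """Hypothetical Boolean similarity: True only if every attribute in
    `attrs` is judged similar by its own comparator."""
    return all(comparators[x](a[x], b[x]) for x in attrs)


def similar_frequency(pattern: Obj, collection: List[Obj], attrs: Sequence[str],
                      comparators: Dict[str, Comparator]) -> float:
    """Fraction of objects whose subdescription over `attrs` is similar
    to the subdescription of `pattern` over the same attributes."""
    hits = sum(1 for o in collection
               if boolean_similarity(pattern, o, attrs, comparators))
    return hits / len(collection)


# Toy collection of object descriptions (assumed data, not from the paper).
collection: List[Obj] = [
    {"age": 30, "color": "red"},
    {"age": 33, "color": "red"},
    {"age": 41, "color": "blue"},
    {"age": 28, "color": "blue"},
]

# Assumed comparators: ages within 5 years count as similar; colors must match.
comparators: Dict[str, Comparator] = {
    "age": lambda x, y: abs(x - y) <= 5,
    "color": lambda x, y: x == y,
}

# Classical setting: every attribute is compared by strict equality.
exact: Dict[str, Comparator] = {
    "age": lambda x, y: x == y,
    "color": lambda x, y: x == y,
}

pattern = collection[0]
freq_age = similar_frequency(pattern, collection, ["age"], comparators)
freq_age_color = similar_frequency(pattern, collection, ["age", "color"], comparators)
freq_age_exact = similar_frequency(pattern, collection, ["age"], exact)
```

With a minimum frequency threshold of 0.5, the subdescription `age = 30` is frequent under the similarity-based comparator (`freq_age` is 0.75) but not under exact equality (`freq_age_exact` is 0.25), which shows the kind of pattern the classical approach misses. Extending the attribute set to `age` and `color` lowers the frequency to 0.5, never raises it.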