Mining Top-k High On-shelf Utility Itemsets Using Novel Threshold Raising Strategies
ACM Transactions on Knowledge Discovery from Data (IF 3.6), Pub Date: 2024-03-26, DOI: 10.1145/3645115
Kuldeep Singh, Bhaskar Biswas

High utility itemset (HUI) mining is an emerging area of data mining that discovers sets of items generating a high profit from transactional datasets. In recent years, several algorithms have been proposed for this task. However, most of them consider neither the on-shelf time period of items nor items with negative utility. High on-shelf utility itemset (HOUI) mining is more difficult than traditional HUI mining because it must handle both on-shelf time periods and negative item utilities. Moreover, most algorithms require a minimum utility threshold (min_util) to find itemsets, and specifying an appropriate min_util threshold is difficult for users. A low min_util threshold may generate too many itemsets, and a high one may generate too few, which can degrade performance. To address these issues, a novel top-k HOUI mining algorithm named TKOS (Top-K high On-Shelf utility itemsets miner) is proposed, which considers both the on-shelf time period and negative utility. TKOS presents a novel branch-and-bound-based strategy to raise the internal min_util threshold efficiently. It also presents two pruning strategies to speed up the mining process. To reduce the cost of dataset scans, we utilize transaction merging and dataset projection techniques. Extensive experiments have been conducted on real and synthetic datasets with various characteristics. Experimental results show that the proposed algorithm outperforms the state-of-the-art algorithms: it is up to 42 times faster and uses up to 19 times less memory than the state-of-the-art KOSHU algorithm. Moreover, the proposed algorithm scales well with the number of time periods and the number of transactions.
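
The general idea behind an internal min_util threshold in top-k mining can be illustrated with a small Python sketch. This is not the TKOS algorithm: it enumerates itemsets exhaustively, ignores on-shelf time periods, and uses none of the paper's branch-and-bound, pruning, transaction merging, or projection techniques. The toy transactions, the `utility` helper, and the `top_k_itemsets` function are all hypothetical names introduced for illustration. The sketch only shows how a min-heap holding the k best utilities found so far lets the threshold start at 0 and rise as the search proceeds, so the user supplies k instead of min_util.

```python
import heapq
from itertools import combinations

# Hypothetical toy dataset: each transaction maps item -> utility.
# Item "c" has negative utility (for example, an item sold at a loss).
transactions = [
    {"a": 5, "b": 2, "c": -1},
    {"a": 4, "c": -2},
    {"b": 3, "c": -1, "d": 6},
    {"a": 5, "b": 1, "d": 2},
]


def utility(itemset, txs):
    """Total utility of an itemset over the transactions containing all its items."""
    return sum(
        sum(t[i] for i in itemset)
        for t in txs
        if all(i in t for i in itemset)
    )


def top_k_itemsets(txs, k):
    """Return the k highest-utility itemsets and the final internal threshold.

    Exhaustive enumeration, for illustration only. A real miner would use
    the rising threshold to prune the search space rather than just to
    filter candidates after computing their utility.
    """
    items = sorted({i for t in txs for i in t})
    heap = []     # min-heap of (utility, itemset); heap[0] is the k-th best
    min_util = 0  # internal threshold: starts at 0 and only ever rises

    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, txs)
            if u <= min_util:
                continue  # cannot enter the current top-k
            heapq.heappush(heap, (u, itemset))
            if len(heap) > k:
                heapq.heappop(heap)  # drop the weakest candidate
            if len(heap) == k:
                min_util = heap[0][0]  # raise threshold to the k-th best utility

    return sorted(heap, reverse=True), min_util


result, final_threshold = top_k_itemsets(transactions, k=3)
for u, itemset in result:
    print(itemset, u)
print("final internal min_util:", final_threshold)
```

On this toy data the threshold rises from 0 to 6, then 8, then 12 as better itemsets are found, and every later candidate at or below the current threshold is discarded. Ties with the k-th best utility are dropped in this sketch, which is a simplification.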




Updated: 2024-03-26