Alice and the Caterpillar: A more descriptive null model for assessing data mining results
Knowledge and Information Systems (IF 2.7), Pub Date: 2023-11-02, DOI: 10.1007/s10115-023-02001-6
Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato

We introduce novel null models for assessing the results obtained from observed binary transactional and sequence datasets, using statistical hypothesis testing. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to other properties considered by other models. We describe Alice, a suite of Markov chain Monte Carlo algorithms for sampling datasets from our null models, based on a carefully defined set of states and efficient operations to move between them. The results of our experimental evaluation show that Alice mixes fast and scales well, and that our null model finds different significant results than ones previously considered in the literature.
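To make the role of the Bipartite Joint Degree Matrix (BJDM) concrete, here is a minimal sketch, not the authors' code, for a simple binary transactional dataset. It assumes the usual bipartite view (transactions on one side, items on the other, one edge per item occurrence) and takes BJDM entry (i, j) to be the number of edges joining a transaction of length i to an item of support j. The function names and the toy dataset are hypothetical, and the multigraph case for sequence datasets is not covered. Because every caterpillar has exactly one middle edge, the caterpillar count equals the sum over BJDM entries of n * (i - 1) * (j - 1), so any dataset with the same BJDM has the same number of caterpillars.

```python
from collections import Counter


def bjdm(transactions):
    """Map (transaction length, item support) -> number of edges with those degrees."""
    txs = [set(t) for t in transactions]
    support = Counter(item for t in txs for item in t)
    matrix = Counter()
    for t in txs:
        for item in t:
            matrix[(len(t), support[item])] += 1
    return matrix


def caterpillars_from_bjdm(matrix):
    """Count paths with three edges using only the BJDM.

    An edge between a transaction of degree i and an item of degree j is the
    middle edge of (i - 1) * (j - 1) such paths in a simple bipartite graph.
    """
    return sum(n * (i - 1) * (j - 1) for (i, j), n in matrix.items())


def caterpillars_brute_force(transactions):
    """Count the same paths by explicit enumeration, as a sanity check."""
    txs = [set(t) for t in transactions]
    count = 0
    for u, t in enumerate(txs):
        for v in t:  # (u, v) is the middle edge of the path
            for _ in t - {v}:  # another item of transaction u
                for b, other in enumerate(txs):
                    if b != u and v in other:  # another transaction with item v
                        count += 1
    return count


# Hypothetical toy dataset: three transactions over items a, b, c, d.
toy = [{"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
toy_bjdm = bjdm(toy)
print(dict(toy_bjdm))
print(caterpillars_from_bjdm(toy_bjdm), caterpillars_brute_force(toy))
```

The two counts agree on the toy dataset, which illustrates why a null model that fixes the BJDM also fixes the number of caterpillars.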




Updated: 2023-11-02