Single MCMC chain parallelisation on decision trees
Annals of Mathematics and Artificial Intelligence (IF 1.2), Pub Date: 2023-07-02, DOI: 10.1007/s10472-023-09876-9
Efthyvoulos Drousiotis, Paul Spirakis

Decision trees (DT) are very popular in machine learning and often achieve state-of-the-art performance. Despite that, well-known variants such as CART, ID3, random forests, and boosted trees lack a probabilistic version that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian DT relies on Markov Chain Monte Carlo (MCMC), which can be computationally slow, especially with high-dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC DT chain on an average laptop or personal computer, reducing its run time through multi-core processing while producing results that are statistically identical to those of a conventional sequential implementation. We also calculate the theoretical and practical reduction in run time that our method can obtain on multi-processor architectures. Experiments showed that we could achieve an 18-fold faster run time while the serial and parallel implementations remain statistically identical.
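
The abstract does not describe the parallelisation scheme itself, so the following is only a minimal sketch of one generic way to use several workers inside a single Metropolis-Hastings chain without changing its output: the log-likelihood is split over data shards and the shards are evaluated concurrently. It is not the authors' method. The target (a scalar Gaussian mean with a flat prior rather than a decision tree), the function names `chunk_loglik`, `loglik` and `run_chain`, the proposal scale, and the use of a thread pool are all assumptions made for illustration.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def chunk_loglik(theta, chunk):
    # Unit-variance Gaussian log-likelihood of one data shard, up to a constant.
    return sum(-0.5 * (x - theta) ** 2 for x in chunk)


def loglik(theta, chunks, pool=None):
    # Evaluate every shard, concurrently if a pool is given (hypothetical helper).
    mapper = pool.map if pool is not None else map
    parts = list(mapper(lambda c: chunk_loglik(theta, c), chunks))
    # Both modes return shard results in the same order, so the sum is identical.
    return sum(parts)


def run_chain(chunks, n_iter, seed, pool=None):
    # Single random-walk Metropolis-Hastings chain with a flat prior (assumption).
    rng = random.Random(seed)
    theta = 0.0
    current = loglik(theta, chunks, pool)
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, 0.5)
        candidate = loglik(proposal, chunks, pool)
        # 1 - random() lies in (0, 1], which keeps log() well defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Toy data: 400 draws around 2.0, split into 4 shards.
data_rng = random.Random(0)
data = [data_rng.gauss(2.0, 1.0) for _ in range(400)]
chunks = [data[i::4] for i in range(4)]

serial = run_chain(chunks, 200, seed=1)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = run_chain(chunks, 200, seed=1, pool=pool)

print(serial == parallel)
```

With the same seed, the serial and pooled runs draw the same random numbers and sum the shard results in the same order, so the two sample lists are expected to match exactly. In CPython a thread pool will not speed up pure-Python arithmetic because of the global interpreter lock; an actual multi-core gain would need a process pool or native code, and the thread pool here only keeps the sketch short.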

Updated: 2023-07-04