Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis, Australian & New Zealand Journal of Statistics


Australian & New Zealand Journal of Statistics (IF 1.1), Pub Date: 2022-02-10, DOI: 10.1111/anzs.12350
Jan Greve, Bettina Grün, Gertraud Malsiner‐Walli, Sylvia Frühwirth‐Schnatter


Cluster analysis aims to partition data into groups, or clusters. In applications, the number of clusters is commonly unknown. Bayesian mixture models used in such applications usually specify a flexible prior that accounts for the uncertainty about the number of clusters. However, a major empirical challenge in using these models is characterising the prior they induce on the partitions. This work introduces an approach to compute descriptive statistics of the prior on the partitions for three selected Bayesian mixture models developed in the areas of Bayesian finite mixtures and Bayesian nonparametrics. The proposed methodology involves computationally efficient enumeration of the prior on the number of clusters in the sample (termed 'data clusters') and determination of the first two prior moments of symmetric additive statistics characterising the partitions. A reference implementation is available in the R package fipp. Finally, we illustrate the proposed methodology through comparisons and discuss the implications for prior elicitation in applications.
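To make "the prior on the number of data clusters" concrete, the following is a minimal sketch for only one of the model classes mentioned, assuming a Dirichlet process mixture with concentration parameter alpha, so that the partition of N observations follows the Chinese restaurant process. It is not the fipp implementation and does not reproduce the fipp API; the function names dp_prior_kplus and kplus_moments are hypothetical. Observation i+1 opens a new cluster with probability alpha/(alpha + i), which yields Pr(K+ = k) = alpha^k |s(N, k)| / (alpha)_N (unsigned Stirling numbers of the first kind). K+ can also be written as a sum of indicators 1{N_j > 0} over clusters, so it serves as one simple example of an additive statistic whose first two prior moments have a closed form.

```python
"""Prior on the number of data clusters K+ under a Dirichlet process mixture.

Sketch only (assumption): a DP with concentration alpha, so the partition of
n_obs observations follows the Chinese restaurant process. Function names are
hypothetical and not part of the R package fipp.
"""


def dp_prior_kplus(n_obs: int, alpha: float) -> list[float]:
    """Return probs where probs[k] = Pr(K+ = k) for k = 0..n_obs.

    Recursion: observation i+1 (0-based i) opens a new cluster with
    probability alpha / (alpha + i), otherwise it joins an existing one.
    This equals alpha**k * |s(N, k)| / (alpha)_N without ever forming the
    (very large) Stirling numbers explicitly.
    """
    if n_obs < 1 or alpha <= 0:
        raise ValueError("need n_obs >= 1 and alpha > 0")
    probs = [1.0] + [0.0] * n_obs  # before any observation, K+ = 0
    for i in range(n_obs):
        p_new = alpha / (alpha + i)
        # update k downwards so probs[k - 1] still holds the previous value
        for k in range(i + 1, 0, -1):
            probs[k] = probs[k] * (1.0 - p_new) + probs[k - 1] * p_new
        probs[0] *= 1.0 - p_new
    return probs


def kplus_moments(n_obs: int, alpha: float) -> tuple[float, float]:
    """Closed-form prior mean and variance of K+.

    Under the CRP, the "new cluster" indicators are independent Bernoulli
    variables with success probability alpha / (alpha + i).
    """
    mean = sum(alpha / (alpha + i) for i in range(n_obs))
    var = sum(alpha * i / (alpha + i) ** 2 for i in range(n_obs))
    return mean, var


if __name__ == "__main__":
    n, a = 100, 1.0
    pmf = dp_prior_kplus(n, a)
    mean, var = kplus_moments(n, a)
    mode = max(range(len(pmf)), key=pmf.__getitem__)
    print(f"total={sum(pmf):.6f} mean={mean:.3f} var={var:.3f} mode={mode}")
```

Working with the probability recursion rather than explicit Stirling numbers keeps every intermediate value in [0, 1], which avoids overflow for large N; the finite-mixture models covered in the paper require different formulas.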


Updated: 2022-02-10
