Exact clustering in tensor block model: Statistical optimality and computational limit
The Journal of the Royal Statistical Society, Series B (Statistical Methodology) (IF 5.8). Published: 2022-10-30. DOI: 10.1111/rssb.12547
Rungang Han, Yuetian Luo, Miaoyan Wang, Anru R. Zhang

High-order clustering aims to identify heterogeneous substructures in multiway datasets, which arise commonly in neuroimaging, genomics, social network studies, and other fields. The non-convex and discontinuous nature of this problem poses significant challenges for both statistics and computation. In this paper, we propose a tensor block model and two computationally efficient methods for high-order clustering: the high-order Lloyd algorithm (HLloyd) and high-order spectral clustering (HSC). Convergence guarantees and statistical optimality are established for the proposed procedures under a mild sub-Gaussian noise assumption. Under the Gaussian tensor block model, we completely characterise the statistical-computational trade-off for achieving high-order exact clustering across three signal-to-noise ratio regimes. The analysis relies on new techniques of high-order spectral perturbation analysis and a ‘singular-value-gap-free’ error bound in tensor estimation, which differ substantially from the matrix spectral analyses in the literature. Finally, we demonstrate the merits of the proposed procedures through extensive experiments on both synthetic and real datasets.
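The abstract names HLloyd but does not describe how it works. As a rough illustration only, the sketch below assumes the usual order-3 tensor block model, Y = S ×₁ M₁ ×₂ M₂ ×₃ M₃ + E, where S is a core of block means, each M_k is a one-hot membership matrix, and E is Gaussian noise. It then runs a generic Lloyd-style alternating update: average each slice within the current clusters of the other modes, and reassign it to the nearest slice of the block-mean core. This is not the authors' HLloyd. All function names (`one_hot`, `simulate_tbm`, `block_means`, `lloyd_update`, `lloyd_cluster`), the balanced-label generator, the example core, and the use of NumPy are hypothetical choices made for this sketch. A spectral initialization such as HSC is not implemented here; the demo instead starts from a perturbed copy of the true labels.

```python
import numpy as np


def one_hot(labels, r):
    """n x r membership matrix M with M[i, labels[i]] = 1."""
    M = np.zeros((labels.size, r))
    M[np.arange(labels.size), labels] = 1.0
    return M


def simulate_tbm(core, dims, sigma, rng):
    """Draw Y = core x1 M1 x2 M2 x3 M3 + E, with E i.i.d. N(0, sigma^2).

    Labels are balanced (every cluster non-empty) and randomly permuted;
    this generator is an illustrative assumption, not the paper's design.
    """
    labels = [rng.permutation(np.arange(n) % r) for n, r in zip(dims, core.shape)]
    M1, M2, M3 = (one_hot(z, r) for z, r in zip(labels, core.shape))
    signal = np.einsum("abc,ia,jb,kc->ijk", core, M1, M2, M3)
    return signal + sigma * rng.standard_normal(dims), labels


def block_means(Y, labels, ranks):
    """Average of Y over each block (cluster a, cluster b, cluster c)."""
    M1, M2, M3 = (one_hot(z, r) for z, r in zip(labels, ranks))
    sums = np.einsum("ijk,ia,jb,kc->abc", Y, M1, M2, M3)
    counts = np.einsum("a,b,c->abc", M1.sum(0), M2.sum(0), M3.sum(0))
    return sums / np.maximum(counts, 1.0)  # empty blocks get mean 0


def lloyd_update(Y, labels, ranks):
    """One Lloyd-style pass that reassigns every index of every mode.

    All three modes are updated from the same (previous) labels.
    """
    S = block_means(Y, labels, ranks)
    # Column-normalised memberships: W[m][:, a] averages over cluster a of mode m.
    W = [m / np.maximum(m.sum(0), 1.0)
         for m in (one_hot(z, r) for z, r in zip(labels, ranks))]
    new_labels = []
    for k in range(3):
        # Move mode k to the front so one einsum pattern serves every mode.
        Yk = np.moveaxis(Y, k, 0)
        Sk = np.moveaxis(S, k, 0)
        others = [W[m] for m in range(3) if m != k]
        # reduced[i, b, c]: slice i averaged within the other modes' clusters.
        reduced = np.einsum("ijk,jb,kc->ibc", Yk, *others)
        # Squared distance from each reduced slice to each core slice.
        diff = reduced[:, None, :, :] - Sk[None, :, :, :]
        dist = (diff ** 2).sum(axis=(2, 3))
        new_labels.append(dist.argmin(axis=1))
    return new_labels


def lloyd_cluster(Y, init_labels, ranks, n_iter=10):
    """Iterate lloyd_update until the labels stop changing (or n_iter passes)."""
    labels = [np.asarray(z) for z in init_labels]
    for _ in range(n_iter):
        new = lloyd_update(Y, labels, ranks)
        if all(np.array_equal(a, b) for a, b in zip(new, labels)):
            break
        labels = new
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical core whose slices are well separated along every mode.
    core = np.fromfunction(lambda a, b, c: 3 * a + 6 * b + 12 * c, (3, 3, 3))
    Y, truth = simulate_tbm(core, (30, 30, 30), sigma=0.5, rng=rng)
    # Stand-in initialisation: truth with 10% of labels per mode changed.
    init = []
    for z in truth:
        z0 = z.copy()
        idx = rng.choice(z.size, size=z.size // 10, replace=False)
        z0[idx] = (z0[idx] + rng.integers(1, 3, size=idx.size)) % 3
        init.append(z0)
    est = lloyd_cluster(Y, init, core.shape)
    print("exact recovery per mode:", [np.array_equal(a, b) for a, b in zip(est, truth)])
```

The update uses cluster averages over the other modes rather than raw slices. This averaging is what lets a Lloyd-type step exploit the multiway structure, because each reduced slice has much less noise than a single fibre. Whether this sketch matches the paper's exact update rule, initialization, or stopping criterion should be checked against the full text.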

Updated: 2022-10-30