Annals of Mathematics and Artificial Intelligence (IF 1.2), Pub Date: 2024-03-19, DOI: 10.1007/s10472-024-09929-7
L. Thorne McCarty
This paper develops a theory of clustering and coding that combines a geometric model with a probabilistic model in a principled way. The geometric model is a Riemannian manifold with a Riemannian metric, \({g}_{ij}(\textbf{x})\), which we interpret as a measure of dissimilarity. The probabilistic model consists of a stochastic process with an invariant probability measure that matches the density of the sample input data. The link between the two models is a potential function, \(U(\textbf{x})\), and its gradient, \(\nabla U(\textbf{x})\). We use the gradient to define the dissimilarity metric, which guarantees that our measure of dissimilarity will depend on the probability measure. Finally, we use the dissimilarity metric to define a coordinate system on the embedded Riemannian manifold, which gives us a low-dimensional encoding of our original data.
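As a rough illustration of how a potential function can tie a stochastic process to a probability density, the sketch below simulates a one-dimensional diffusion whose drift is the gradient \(\nabla U(\textbf{x})\). Everything specific in it is an assumption made for illustration and is not taken from the paper: the quadratic potential \(U(x) = -x^{2}/2\), the unit-noise diffusion \(dX = \nabla U(X)\,dt + dW\), the Euler-Maruyama discretization, and the helper names `grad_U`, `simulate`, and `variance`. For a diffusion of this form the invariant density is proportional to \(e^{2U(x)}\), which for this potential is a Gaussian with variance 1/2.

```python
import math
import random


def grad_U(x):
    """Gradient of the illustrative potential U(x) = -x**2 / 2 (an assumption)."""
    return -x


def simulate(n_chains=2000, n_steps=500, dt=0.01, seed=0):
    """Euler-Maruyama simulation of dX = grad_U(X) dt + dW.

    Runs independent chains from x = 0 and returns the final state of each,
    which approximates a draw from the invariant measure once the total
    time n_steps * dt is long compared with the relaxation time.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(dt)  # standard deviation of one Brownian increment
    samples = []
    for _ in range(n_chains):
        x = 0.0
        for _ in range(n_steps):
            x += grad_U(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


samples = simulate()
sample_variance = variance(samples)
# The density proportional to exp(2 * U(x)) = exp(-x**2) has variance 0.5,
# so the printed estimate should land close to that value.
print(f"sample variance: {sample_variance:.3f}")
```

The sketch only shows the probabilistic half of the picture. How the paper turns \(\nabla U(\textbf{x})\) into the metric \({g}_{ij}(\textbf{x})\) is not stated in the abstract, so no attempt is made to reproduce it here.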
Title: Clustering, coding, and the concept of similarity