Clustering, coding, and the concept of similarity
Annals of Mathematics and Artificial Intelligence (IF 1.2), Pub Date: 2024-03-19, DOI: 10.1007/s10472-024-09929-7
L. Thorne McCarty

This paper develops a theory of clustering and coding that combines a geometric model with a probabilistic model in a principled way. The geometric model is a Riemannian manifold with a Riemannian metric, \({g}_{ij}(\textbf{x})\), which we interpret as a measure of dissimilarity. The probabilistic model consists of a stochastic process with an invariant probability measure that matches the density of the sample input data. The link between the two models is a potential function, \(U(\textbf{x})\), and its gradient, \(\nabla U(\textbf{x})\). We use the gradient to define the dissimilarity metric, which guarantees that our measure of dissimilarity will depend on the probability measure. Finally, we use the dissimilarity metric to define a coordinate system on the embedded Riemannian manifold, which gives us a low-dimensional encoding of our original data.
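
The abstract does not state the explicit form of the stochastic process or of the metric, so the following is only a rough sketch under assumptions, not the paper's construction. It assumes an overdamped Langevin diffusion, \(d\textbf{x} = -\nabla U(\textbf{x})\,dt + \sqrt{2}\,dW\), whose invariant density is proportional to \(e^{-U(\textbf{x})}\), and it uses a placeholder one-dimensional metric \(g(x) = 1 + U'(x)^2\) purely to show how a gradient-based metric makes lengths depend on the density. The double-well potential and the names `langevin_samples` and `path_length` are hypothetical.

```python
import numpy as np


def langevin_samples(grad_U, n_chains=2000, n_steps=2000, dt=0.01, seed=0):
    """Euler-Maruyama simulation of dX = -grad U(X) dt + sqrt(2) dW.

    Assumption: this is a standard overdamped Langevin diffusion, whose
    invariant density is proportional to exp(-U(x)). It is not claimed
    to be the process used in the paper.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_chains)  # arbitrary starting points
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        x = x - grad_U(x) * dt + np.sqrt(2.0 * dt) * noise
    return x


def path_length(grad_U, a, b, n=2001):
    """Length of the segment [a, b] under the placeholder metric
    g(x) = 1 + U'(x)^2 (illustrative only, not the paper's metric)."""
    xs = np.linspace(a, b, n)
    speed = np.sqrt(1.0 + grad_U(xs) ** 2)  # sqrt(g(x)) along the path
    dx = np.abs(np.diff(xs))
    # Trapezoidal rule for the integral of sqrt(g(x)) dx.
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * dx))


def double_well_grad(x):
    """Gradient of the hypothetical potential U(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x ** 2 - 1.0)


def flat_grad(x):
    """Gradient of a constant potential (uniform density)."""
    return np.zeros_like(x)


samples = langevin_samples(double_well_grad)
print("fraction of samples with x > 0:", np.mean(samples > 0))
print("length of [-1, 1], flat potential:", path_length(flat_grad, -1.0, 1.0))
print("length of [-1, 1], double well:", path_length(double_well_grad, -1.0, 1.0))
```

With a flat potential the placeholder metric reduces to the Euclidean one, so any extra length between the two wells comes entirely from \(\nabla U\), which is the sense in which a gradient-based dissimilarity depends on the probability measure.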

Updated: 2024-03-20
