Approximating (k,ℓ)-Median Clustering for Polygonal Curves
ACM Transactions on Algorithms (IF 1.3), Pub Date: 2023-02-23, DOI: https://dl.acm.org/doi/10.1145/3559764
Maike Buchin, Anne Driemel, Dennis Rohde

In 2015, Driemel, Krivošija, and Sohler introduced the (k,ℓ)-median clustering problem for polygonal curves under the Fréchet distance. Given a set of input curves, the problem asks for k median curves of at most ℓ vertices each that minimize the sum, over all input curves, of the Fréchet distance to the closest median curve. A major shortcoming of their algorithm is that the input curves are restricted to lie on the real line. In this article, we present a randomized bicriteria-approximation algorithm that works for polygonal curves in ℝ^d and achieves approximation factor (1+ε) with respect to the clustering costs. The algorithm has worst-case running time linear in the number of curves, polynomial in the maximum number of vertices per curve (i.e., their complexity), and exponential in d, ℓ, 1/ε, and 1/δ, where δ is the failure probability. We achieve this result through a shortcutting lemma, which guarantees the existence of a polygonal curve with cost similar to that of an optimal median curve of complexity ℓ, but of complexity at most 2ℓ−2, and whose vertices can be computed efficiently. We combine this lemma with the superset sampling technique by Kumar et al. to derive our clustering result. In doing so, we describe and analyze a generalization of the algorithm by Ackermann et al., which may be of independent interest.
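
The objective described in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the article: it only evaluates the clustering cost of a given candidate solution, and it substitutes the discrete Fréchet distance (an upper bound on the continuous Fréchet distance used in the article) because it has a short dynamic-programming formulation. The function names `discrete_frechet` and `clustering_cost` are hypothetical and chosen for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two curves given as vertex lists.

    Assumption: this is a stand-in for the continuous Fréchet distance.
    """
    n, m = len(p), len(q)
    # dp[i][j] = discrete Fréchet distance between p[:i+1] and q[:j+1]
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]


def clustering_cost(curves, medians, ell):
    """Sum over all input curves of the distance to the closest median curve.

    Each median curve may have at most `ell` vertices.
    """
    if any(len(c) > ell for c in medians):
        raise ValueError("a median curve exceeds the complexity bound ell")
    return sum(min(discrete_frechet(t, c) for c in medians) for t in curves)


if __name__ == "__main__":
    curves = [[(0, 0), (1, 0)], [(0, 1), (1, 1)], [(5, 5), (6, 5)]]
    medians = [[(0, 0), (1, 0)], [(5, 5), (6, 5)]]  # k = 2, ell = 2
    print(clustering_cost(curves, medians, ell=2))
```

Here the first and third input curves coincide with a median curve, and the second is a unit translate of the first median, so the three terms of the sum are 0, 1, and 0.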




Updated: 2023-02-23