1 Introduction

The Gromov–Hausdorff (GH) distance, a notion of distance between compact metric spaces, was introduced by Gromov in the 1980s and was eventually adapted to data/shape analysis by the second author [54, 69, 70] as a tool for measuring the dissimilarity between shapes/datasets.

Despite its usefulness in providing a mathematical model for shape matching procedures [9, 69, 70], the Gromov–Hausdorff distance leads to NP-hard problems: [58] relates it to the well-known Quadratic Assignment Problem, which is NP-hard, and Schmiedl in his PhD thesis [78] (see also [4]) directly proves the NP-hardness of computing the Gromov–Hausdorff distance even for ultrametric spaces. Recent work has also identified certain fixed-parameter tractable algorithms for the GH distance between ultrametric spaces [71].

These hardness results have motivated research in other directions:

  (I) finding suitable relaxations of the Gromov–Hausdorff distance which are more amenable to computations, and

  (II) finding lower bounds for the Gromov–Hausdorff distance which are easier to compute, yet retain good discriminative power.

Related to the first thread, and based on ideas from optimal transport, the notion of Gromov–Wasserstein distance was proposed in [55, 56]. This notion of distance leads to continuous quadratic optimization problems (as opposed to the combinatorial nature of the problems induced by the Gromov–Hausdorff distance) and, as such, it has benefited from the wealth of continuous optimization computational techniques that are available in the literature [75, 76] and has seen a number of applications in data analysis and machine learning [6, 11, 32, 50, 86] in recent years.

The second thread mentioned above is that of obtaining computationally tractable lower bounds for the usual Gromov–Hausdorff distance. Several such lower bounds were identified in [58] by the second author, and then in [26, 27] and [22] it was proved that hierarchical clustering dendrograms and persistence diagrams or barcodes, metric invariants which arose in the Applied Algebraic Topology community, provide a lower bound for the GH distance. These persistence diagrams will eventually become central to the present paper, but before reviewing them, we will describe the notion of curvature sets introduced by Gromov.

Gromov’s curvature sets and curvature measures

Given a compact metric space \((X,d_X)\), in the book [46] Gromov identified a class of invariants of metric spaces indexed by the natural numbers that classifies compact metric spaces up to isometry. In more detail, Gromov defines for each \(n\in {\mathbb {N}}\), the n-th curvature set of X, denoted by \({\textbf{K}}_n(X)\), as the collection of all \(n\times n\) matrices that arise from restricting \(d_X\) to all possible n-tuples of points chosen from X, possibly with repetitions. The terminology curvature sets is justified by the observation that these sets contain, in particular, metric information about configurations of closely clustered points in a given metric space. This information is enough to recover the curvature of a manifold; see Fig. 1.
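To make the definition concrete, the following is a minimal brute-force sketch (ours, not from the paper; the function name `curvature_set` is our own) that enumerates \({\textbf{K}}_n(X)\) for a finite metric space presented as a distance matrix.

```python
import itertools

def curvature_set(D, n):
    """Enumerate K_n(X) for a finite metric space with distance matrix D:
    the set of n-by-n matrices obtained by restricting D to all n-tuples
    of points, repetitions allowed."""
    m = len(D)
    K = set()
    for tup in itertools.product(range(m), repeat=n):
        K.add(tuple(tuple(D[i][j] for j in tup) for i in tup))
    return K

# Two-point space at distance 1: K_2 consists of the zero matrix
# (a repeated point) and the matrix [[0, 1], [1, 0]].
K2 = curvature_set([[0, 1], [1, 0]], 2)
```

Note that the enumeration runs over \(|X|^n\) tuples: exponential in n, but polynomial in \(|X|\) for each fixed n, in line with the computability remarks below.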

Fig. 1

The curvature of a smooth curve C can be estimated as the inverse of the radius R of the circle passing through the points \(x,x'\) and p. By plane geometry results [28, Thm. 2.3], this radius can be computed from the 3 interpoint distances a, b, and c, and hence from \({\textbf{K}}_3(C)\), as \(R=R(a,b,c) = \frac{a\,b\,c}{\left( (a+b+c)(a+b-c)(a-b+c)(-a+b+c)\right) ^{1/2}}\). In fact, [28] proves that \(R^{-1} = \kappa + \frac{1}{3}(b-a) \kappa _s + \cdots \) where \(\kappa \) and \(\kappa _s\) are the curvature and its arc length derivative at the point p.
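The circumradius formula in the caption is easy to sanity-check numerically; below is a small sketch of ours evaluating \(R(a,b,c)\) on two classical triangles.

```python
import math

def circumradius(a, b, c):
    """R(a, b, c) = abc / sqrt((a+b+c)(a+b-c)(a-b+c)(-a+b+c)):
    radius of the circle through three points with pairwise distances
    a, b, c (degenerate/collinear triples are excluded)."""
    s = (a + b + c) * (a + b - c) * (a - b + c) * (-a + b + c)
    return a * b * c / math.sqrt(s)

# A 3-4-5 right triangle is inscribed in a circle whose diameter is the
# hypotenuse, so R = 5/2; an equilateral triangle of side 1 has R = 1/sqrt(3).
R_right = circumradius(3, 4, 5)
R_equilateral = circumradius(1, 1, 1)
```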

These curvature sets have the property that \({\textbf{K}}_n(X)={\textbf{K}}_n(Y)\) for all \(n\in {\mathbb {N}}\) if and only if the compact metric spaces X and Y are isometric. Constructions similar to the curvature sets of Gromov were also identified by Peter Olver in [74] in his study of invariants for curves and surfaces under different group actions (including the group of Euclidean isometries).

[58] points out that the GH distance admits lower bounds based on these curvature sets:

$$\begin{aligned} d_{\mathcal{G}\mathcal{H}}(X,Y)\ge {\widehat{d}}_{{\mathcal {G}}{\mathcal {H}}}\left( {X},{Y}\right) :=\frac{1}{2}\sup _{n\in {\mathbb {N}}} d_{{\mathcal {H}}}({\textbf{K}}_n(X),{\textbf{K}}_n(Y)) \end{aligned}$$
(1)

for all compact metric spaces X and Y. Here, \(d_{{\mathcal {H}}}\) denotes the Hausdorff distance on \({\mathbb {R}}^{n\times n}\) with the \(\ell ^\infty \) distance. As we mentioned above, the computation of the Gromov–Hausdorff distance leads in general to NP-hard problems, whereas the lower bound in the equation above can be computed in polynomial time for any fixed value of n. In [58] it is argued that work of Peter Olver [74] and of Boutin and Kemper [17] leads to identifying rich classes of shapes where these lower bounds permit full discrimination.
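On finite inputs the right-hand side of (1) for a fixed n is a Hausdorff distance between finite sets of matrices, which can be computed directly. A brute-force sketch (ours) on the two-point example:

```python
def linf(M, N):
    """ell-infinity distance between two n-by-n matrices."""
    return max(abs(x - y) for rM, rN in zip(M, N) for x, y in zip(rM, rN))

def hausdorff(A, B):
    """Hausdorff distance between two finite sets of matrices under linf."""
    return max(max(min(linf(a, b) for b in B) for a in A),
               max(min(linf(a, b) for a in A) for b in B))

# Curvature sets K_2 for two 2-point spaces with interpoint distances 1 and 3:
KX = {((0, 0), (0, 0)), ((0, 1), (1, 0))}
KY = {((0, 0), (0, 0)), ((0, 3), (3, 0))}
bound = 0.5 * hausdorff(KX, KY)   # the n = 2 term of the lower bound in (1)
```

Here the bound equals 1, which happens to coincide with the GH distance between these two spaces.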

In the category of compact mm-spaces, that is triples \((X,d_X,\mu _X)\) where \((X,d_X)\) is a compact metric space and \(\mu _X\) is a fully supported probability measure on X (see Definition 2.6), Gromov also discusses the following parallel construction: for an mm-space \((X,d_X,\mu _X)\) let \(\Psi _X^{(n)}:X^{\times n}\longrightarrow {\mathbb {R}}^{n\times n}\) be the map that sends the n-tuple \((x_1,x_2,\ldots ,x_n)\) to the matrix M with elements \(M_{ij}=d_X(x_i,x_j)\). Then, the n-th curvature measure of X is defined as

$$\begin{aligned} \mu _{n}(X):=\Bigg (\Psi _X^{(n)}\Bigg )_{\#}\mu _X^{\otimes n}, \end{aligned}$$
(2)

where \(\mu _X^{\otimes n}\) is the product measure on \(X^{\times n}\) and \((\Psi _X^{(n)})_\# \mu _X^{\otimes n}\) is the pushforward to \({\mathbb {R}}^{n \times n}\). Clearly, curvature measures and curvature sets are related by \(\textrm{supp}(\mu _n(X))={\textbf{K}}_n(X)\) for all \(n\in {\mathbb {N}}\). Gromov then proves in his mm-reconstruction theorem that the collection of all curvature measures permits reconstructing any given mm-space up to isomorphism. See Theorem 3.8 for a relationship, analogous to (1), between curvature measures and the Gromov–Wasserstein distance.
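In practice \(\mu _n(X)\) can be approximated empirically by drawing i.i.d. n-tuples from \(\mu _X\) and pushing them through \(\Psi _X^{(n)}\); a Monte Carlo sketch of ours (the function name is our own):

```python
import random
from collections import Counter

def empirical_curvature_measure(D, weights, n, N, seed=0):
    """Empirical approximation of the n-th curvature measure: draw N
    n-tuples i.i.d. from mu_X (given by point weights) and push them
    through the distance-matrix map Psi_X^(n)."""
    rng = random.Random(seed)
    idx = list(range(len(D)))
    samples = Counter()
    for _ in range(N):
        tup = rng.choices(idx, weights=weights, k=n)
        samples[tuple(tuple(D[i][j] for j in tup) for i in tup)] += 1
    return samples

# Uniform measure on a 2-point space at distance 1; by the support relation,
# the matrices observed are exactly the elements of K_2(X).
mu2 = empirical_curvature_measure([[0, 1], [1, 0]], [0.5, 0.5], 2, 1000)
```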

Persistent Homology Ideas related to what is nowadays known as persistent homology appeared already in the late 1980s and early 1990s in the work of Patrizio Frosini [37,38,39], then in the work of Vanessa Robins [77], in the work of Edelsbrunner and collaborators [35], and then in the work of Carlsson and Zomorodian [90]. Some excellent references for this topic are [19, 34, 41, 89].

In a nutshell, persistent homology (PH) assigns to a given compact metric space X and an integer \(k\ge 0\), a multiset of points \(\textrm{dgm}_k^\textrm{VR}(X)\) in the plane, known as the k-th (Vietoris–Rips) persistence diagram of X. The standard PH pipeline is shown in Fig. 2.

Fig. 2

The pipeline to compute a persistence diagram. Starting with a distance matrix, we compute the Vietoris–Rips complex and its reduced homology, and produce an interval decomposition. Together, we call these three steps \({\text {PH}}_k^\textrm{VR}\).

These diagrams indicate the presence of k-dimensional multi-scale topological features in the space X, and can be compared via the bottleneck distance (which is closely related to but is stronger than the Hausdorff distance in \(({\mathbb {R}}^2,\ell ^\infty )\)).

Following work by Cohen-Steiner et al. [29], in [22] it is proved that the map \(X\mapsto \textrm{dgm}_k^\textrm{VR}(X)\) sending a given compact metric space to its k-th Vietoris–Rips persistence diagram is 2-Lipschitz with respect to the GH and bottleneck distances.

Algorithmic work by Edelsbrunner and collaborators [35] and more recent developments [8] guarantee that \(\textrm{dgm}_k^\textrm{VR}(X)\) can be computed in polynomial time (in the cardinality of X); it is also well known that the bottleneck distance can be computed in polynomial time [34]. This means that persistence diagrams provide another source of stable invariants which permit estimating (lower bounding) the Gromov–Hausdorff distance.

It is known that persistence diagrams are not full invariants of metric spaces. For instance, any two tree metric spaces, that is, metric graphs that are \(\delta \)-hyperbolic with \(\delta =0\) [45], have trivial persistence diagrams in all degrees \(k\ge 1\). It is also not difficult to find two finite tree metric spaces with the same degree-zero persistence diagrams. See [52] for more examples and [73] for results about stronger invariants (i.e. persistent homotopy groups).

Despite the fact that persistence diagrams for a fixed degree k can be computed with effort which depends polynomially on the size of the input metric space [7, 34], the computations are actually quite onerous and, as of today, it is not realistic to compute the degree 1 Vietoris–Rips persistence diagram of a finite metric space with more than a few thousand points even with state of the art implementations such as Ripser [8] and Ripser++ [91].

Curvature sets over persistence diagrams In this paper, we consider a version of the curvature set ideas which arises when combining their construction with Vietoris–Rips persistent homology. For a compact metric space X and integers \(n\ge 1\) and \(k\ge 0\), the (n, k)-Vietoris–Rips persistence set of X is (cf. Definition 3.10) the collection \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) of all persistence diagrams in degree k of subsets of X with cardinality at most n. In a manner similar to how the n-th curvature measure \(\mu _n(X)\) arose above, we also study the probability measure \({\textbf{U}}_{n,k}^{\textrm{VR}}(X)\) defined as the pushforward of \(\mu _n(X)\) under the degree k Vietoris–Rips persistence diagram map (cf. Definition 3.18). We also study a more general version wherein, for any simplicial filtration functor \({\mathfrak {F}}\) (cf. Definition 2.15), we consider both the persistence sets \({\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X)\) and the persistence measures \({\textbf{U}}_{n,k}^{{\mathfrak {F}}}(X)\). Furthermore, as we discuss below, for certain choices of the parameters k and n, persistence sets are not only more efficient to compute (in terms of memory requirements and/or in terms of overall computational cost) than standard persistence diagrams, but they also often capture information which is not directly visible through the lens of standard persistence diagrams.
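For degree \(k=0\) the diagrams entering a persistence set are computable with elementary tools: with reduced homology, the degree-0 VR diagram of a finite metric space consists of the bars (0, w), where w runs over the edge weights of a minimum spanning tree (the single-linkage merge heights). The sketch below (our own; brute force over subsets) uses this to enumerate \({\textbf{D}}_{n,0}^{\textrm{VR}}\) of a small space, illustrating how these sets retain distance information even though \(\textrm{dgm}_0^\textrm{VR}\) of a connected space is empty.

```python
import itertools

def dgm0_VR(D, pts):
    """Reduced degree-0 VR diagram of the subspace indexed by pts:
    |pts| - 1 bars (0, w), where w runs over the MST edge weights
    (Prim's algorithm on the complete graph)."""
    pts = list(pts)
    in_tree, rest, deaths = {pts[0]}, set(pts[1:]), []
    while rest:
        w, nxt = min((min(D[i][j] for i in in_tree), j) for j in rest)
        deaths.append(w)
        in_tree.add(nxt)
        rest.remove(nxt)
    return tuple(sorted((0, d) for d in deaths))

def persistence_set_0(D, n):
    """D_{n,0}: degree-0 diagrams of all subsets of cardinality 2..n."""
    return {dgm0_VR(D, S)
            for k in range(2, n + 1)
            for S in itertools.combinations(range(len(D)), k)}

# Three points on a line at positions 0, 1, 3.
D = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
PS = persistence_set_0(D, 3)
```

For this space the pair diagrams recover all three interpoint distances 1, 2, 3, and the full triple contributes the diagram {(0, 1), (0, 2)}.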

Fig. 3

The pipeline for computing \({\textbf{D}}_{n,k}^{\textrm{VR}}\). Starting with a metric space \((X,d_X)\), we take samples of the distance matrix as elements of \({\textbf{K}}_{n}(X)\), apply \({\text {PH}}_k\) to each, and aggregate the resulting persistence diagrams. For example, Theorem 4.4 guarantees that the VR-persistence diagram in dimension k of a metric space with \(n=2k+2\) points has at most one point. The aggregation in this case means plotting the set \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) by plotting all diagrams simultaneously in one set of axes. In general, the diagrams in \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) have more than one point, so one possibility for aggregation is constructing a one-point summary or average of each persistence diagram (for instance, a Chebyshev center or an \(\ell _\infty \) mean) and then plotting all such points simultaneously. The figure aims to convey the eminently parallelizable nature of \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\).

Fig. 4

A graphical representation of how the principal persistence set \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) is obtained by overlaying the persistence diagrams of all samples \(Y \subset X\) (with \(|Y| \le 2k+2\)) into a single set of axes. This is made possible since by Theorem 4.4 these diagrams have at most one off-diagonal point.

1.1 Contributions

We believe that persistence sets are useful as an alternative paradigm for the efficient computation of invariants/features based on persistent homology. Persistence sets are designed to generalize and complement, not substitute, the usual persistence diagrams. We provide a thorough study of persistence sets and, in particular, analyze the following points.

Persistence sets and measures generalize \(\textrm{dgm}_*^\textrm{VR}\). The family \(\{{\textbf{D}}_{n,k}^{\textrm{VR}}(X)\}_{n\ge 1,k\ge 0}\) of all persistence sets of X generalizes the family \(\{\textrm{dgm}^\textrm{VR}_k(X)\}_{k\ge 0}\) of all Vietoris–Rips persistence diagrams of X in the sense that, when \(n = |X| < \infty \), \(\textrm{dgm}^\textrm{VR}_k(X)\) is an element of \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) for each \(k\ge 0\).

Some persistence sets and measures can discriminate spaces that \(\textrm{dgm}_*^\textrm{VR}\) cannot. There are many cases in which Vietoris–Rips barcodes are unable to discriminate spaces; see the discussion in Sect. 9.4 of [52]. For instance, the existence of a crushing \(X \rightarrow Y\) (in the sense of Hausmann) between metric spaces such that \(Y\subseteq X\) gives for each \(r>0\) homotopy equivalences \(\textrm{VR}_r(X) \simeq \textrm{VR}_r(Y)\) via Proposition 2.2 of [47]. Furthermore, the VR-persistence diagrams of X and Y are equal; see Fig. 5 for an example.

In contrast, it is interesting that in many such scenarios some elements of the family of persistence sets can capture strictly more information than VR persistence diagrams. In Example 3.16 we show that the sets \({\textbf{D}}_{n,0}^{\textrm{VR}}(X)\) contain information about the distances in X, whereas \(\textrm{dgm}_0^\textrm{VR}(X)\) is empty whenever X is connected (recall that we use reduced homology). Additionally, in Example 6.6 we show a graph G consisting of a cycle C with 4 edges attached for which \({\textbf{D}}_{4,1}^{\textrm{VR}}(G)\) is different from (more precisely, larger than) \({\textbf{D}}_{4,1}^{\textrm{VR}}(C)\); cf. Figure 6. This observation generalizes to the k-sphere with \(2k+2\) edges attached; see Proposition 6.8 and Fig. 35.

Fig. 5

A graph G formed by a circle C with two trees attached. Since there is a crushing of G to C (in the sense of Hausmann [47]), \(\textrm{dgm}_k^\textrm{VR}(G)=\textrm{dgm}_k^\textrm{VR}(C)\) for all k.

Fig. 6

Left: A metric graph G formed by a cycle C with four edges attached. All edges have length 1. In the notation of Example 6.6, the edges are attached at \(y_1, y_2, y_3\), and \(y_4\). Middle: The persistence set \({\textbf{D}}_{4,1}^{\textrm{VR}}(C)\). Right: Even though \(\textrm{VR}_*(G) \simeq \textrm{VR}_*(C)\), and as a consequence the persistence diagrams are identical, the set \({\textbf{D}}_{4,1}^{\textrm{VR}}(G) {\setminus } {\textbf{D}}_{4,1}^{\textrm{VR}}(C)\) is non-empty (see Remark 6.7). The middle and right figures were obtained by sampling 100,000 configurations of 4 points uniformly from G. Of those, about 12.98% were contained in C. The fraction of configurations in G (resp. C) that produced a non-diagonal point in \({\textbf{D}}_{4,1}^{\textrm{VR}}(G)\) (resp. \({\textbf{D}}_{4,1}^{\textrm{VR}}(C)\)) is 7.59% (resp. 10.97%).

Discriminating power on a classification task In Sect. 4.3.1, we describe results on a shape classification experiment which indicate that persistence sets can be useful invariants for practical data classification applications. In order to carry out this test, we computed approximations of the persistence sets \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}\) and the persistence measures \({\textbf{U}}_{2k+2,k}^{\textrm{VR}}\), for \(k=0,1,2\), of 62 three-dimensional shapes in 6 different classes from the publicly available database [80]. We classified these shapes using the 1-nearest-neighbor classifier induced by the Hausdorff and 1-Wasserstein distances between persistence sets and measures, respectively.

Computational cost, memory requirements, parallelizability, and approximation Besides their ability to often detect useful information that is not captured by standard VR persistence diagrams, another motivation for considering persistence sets \({\textbf{D}}_{n,k}^{\textrm{VR}}\) for small n as features for shape/data classification is that the cost incurred in their computation/approximation compares favourably against the cost and memory requirements of computing \(\textrm{dgm}_{k}^\textrm{VR}(X)\) as the size of X increases. Furthermore, not only are the associated computational tasks eminently parallelizable (cf. Figure 3) but also, when n is small, the amount of memory needed for computing persistence sets is notably smaller than for computing persistence diagrams over the same data set. See Sects. 3.1.2 and 4.4 for a detailed discussion.

Principal persistence sets, their characterization and an algorithm Persistence sets are defined to be sets of persistence diagrams and, although a single persistence diagram is easy to visualize, large collections of them might not be. However, our main result (Theorem 4.4) says that the degree k persistence diagram of X contains no points if \(|X| < 2k+2\) and at most one point if \(|X|=2k+2\). For that reason, we aggregate all persistence diagrams in the principal persistence set \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) on the same set of axes; cf. Figure 4.

Furthermore, Theorem 4.4 gives a precise representation of the unique point in the degree k persistence diagram of a metric space with at most \(n_k:=2k+2\) points via a formula which induces an algorithm for computing the principal persistence sets. This algorithm is purely geometric in the sense that it does not rely on analyzing boundary matrices as the standard persistent homology algorithms do but, in contrast, operates directly at the level of distance matrices. For any k, this geometric algorithm has cost \(O(n_k^2)=O(k^2)\) as opposed to the much larger cost incurred by the algebraic algorithms; see Proposition 4.6. This makes the practical approximation of principal persistence sets very efficient; see Corollary 4.9.
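The general formula of Theorem 4.4 is not reproduced in this introduction, but its simplest instance (\(k=0\), \(n_0=2\)) already illustrates the sampling pipeline: the reduced degree-0 diagram of a two-point space is the single bar \((0, d_X(x,x'))\), so \({\textbf{D}}_{2,0}^{\textrm{VR}}\) can be approximated directly from pairs of samples. A sketch of ours for the geodesic circle, whose death times fill \([0,\pi ]\):

```python
import math
import random

def principal_D20_circle(N=2000, seed=1):
    """Monte Carlo approximation of D_{2,0} of the geodesic circle:
    sample pairs of points and record the single death time d(x, x')."""
    rng = random.Random(seed)
    deaths = []
    for _ in range(N):
        t1 = rng.uniform(0, 2 * math.pi)
        t2 = rng.uniform(0, 2 * math.pi)
        diff = abs(t1 - t2) % (2 * math.pi)
        deaths.append(min(diff, 2 * math.pi - diff))  # geodesic distance
    return deaths

deaths = principal_D20_circle()
```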

Fig. 7

Characterization of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\): The (4, 1)-persistence set of \({\mathbb {S}}^{1}\) (with geodesic distance) is the shaded triangular area where the top left and top right points have coordinates \((\frac{\pi }{2},\pi )\) and \((\pi ,\pi )\), respectively, whereas the lowest diagonal point has coordinates \((\frac{2\pi }{3},\frac{2\pi }{3})\). This is the \(k=1\) case of Theorem 5.6. The figure also shows exemplary configurations \(X \subset {\mathbb {S}}^{1}\) with \(|X| \le 4\) together with their respective persistence diagrams inside of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\).

Characterization results We fully characterize the principal persistence sets \({\textbf{D}}_{{2k+2},k}^{\textrm{VR}}({\mathbb {S}}^{1})\) (Theorems 5.4 and 5.6). In particular, these results prove that \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) coincides with the triangle in \({\mathbb {R}}^2\) with vertices \((\frac{2\pi }{3},\frac{2\pi }{3})\), \((\frac{\pi }{2},\pi )\), and \((\pi ,\pi )\); see Fig. 7. We also characterize the persistence measure \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\), which is supported on \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\), in Proposition 5.9. Furthermore, if \({\mathbb {S}}^{1}\) has the uniform probability measure, we show that \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) has probability density function \(f(t_b,t_d)=\frac{12}{\pi ^3}(\pi -t_d)\), for any \((t_b,t_d)\) in the triangular region specified in Fig. 7. Propositions 5.17 and 5.21, and Corollary 5.22 provide additional information about higher dimensional spheres. In particular, we discuss the use of a MCMC random walk to effectively sample from \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2})\); see Conjecture 5.25. Example 4.13 has computational approximations of the persistence measure \({\textbf{U}}_{4,1}^{\textrm{VR}}\) of the 2-sphere and the torus. These characterization results are in the same spirit as those pioneered by Adamaszek and Adams on the Vietoris–Rips persistence diagrams of circles and spheres [1]; see also [52].
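As a consistency check (ours, not from the paper), the stated density can be integrated over the triangle of Fig. 7: for each \(t_d \in [\frac{2\pi }{3}, \pi ]\) the admissible births are \(t_b \in [\pi - t_d/2,\ t_d]\), and the total off-diagonal mass works out to \(1/9 \approx 11.1\%\), in line with the roughly 11% empirical fraction of configurations producing a non-diagonal point reported in Fig. 6 (the remaining mass sits on the diagonal, i.e. on configurations with empty degree-1 diagram).

```python
import math

def offdiagonal_mass(steps=100000):
    """Midpoint-rule integral of f(t_b, t_d) = 12/pi^3 (pi - t_d) over the
    triangle with vertices (2pi/3, 2pi/3), (pi/2, pi), (pi, pi). Since f
    does not depend on t_b, the inner integral is f times the width
    t_d - (pi - t_d/2) of the admissible birth interval."""
    lo, hi = 2 * math.pi / 3, math.pi
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        td = lo + (i + 0.5) * h
        width = td - (math.pi - td / 2)
        total += (12 / math.pi**3) * (math.pi - td) * width * h
    return total

mass = offdiagonal_mass()   # analytically equal to 1/9
```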

We also compute \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n)\) using Ptolemy’s inequality (Proposition 5.17). In fact, Ptolemy’s inequality generalizes to non-Euclidean geometries, so we can also characterize the (4, 1)-persistence sets of the model surfaces \(M_\kappa \) of constant curvature. For clarity: \(M_0 = {\mathbb {R}}^2\), while \(M_\kappa \) is the sphere of radius \(1/\sqrt{\kappa }\) if \(\kappa >0\) and a rescaling of the hyperbolic plane if \(\kappa <0\).

Theorem 5.19

Let \(M_\kappa \) be the 2-dimensional model space with constant sectional curvature \(\kappa \). Then:

  • If \(\kappa >0\), \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa ) = \left\{ (t_b,t_d) \,\middle |\, \sin \left( \tfrac{\sqrt{\kappa }}{2} t_d \right) \le \sqrt{2}\sin \left( \tfrac{\sqrt{\kappa }}{2} t_b \right) \text { and } 0< t_b< t_d \le \tfrac{\pi }{\sqrt{\kappa }} \right\} \).

  • If \(\kappa =0\), \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_0) = \left\{ (t_b,t_d) \,\middle |\, 0 \le t_b < t_d \le \sqrt{2}\, t_b \right\} \).

  • If \(\kappa <0\), \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa ) = \left\{ (t_b,t_d) \,\middle |\, \sinh \left( \tfrac{\sqrt{-\kappa }}{2} t_d \right) \le \sqrt{2}\sinh \left( \tfrac{\sqrt{-\kappa }}{2} t_b \right) \text { and } 0< t_b < t_d \right\} \).
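Transcribed directly from the theorem, the following sketch (the function name is ours) tests membership of a point \((t_b, t_d)\) in \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa )\); note that as \(\kappa \rightarrow 0\) both curved conditions recover \(t_d \le \sqrt{2}\, t_b\).

```python
import math

def in_D41(tb, td, kappa):
    """Membership in the (4,1)-persistence set of the model surface
    M_kappa, per the three cases of Theorem 5.19."""
    if not (0 < tb < td):
        return False
    if kappa > 0:
        s = math.sqrt(kappa)
        return (td <= math.pi / s
                and math.sin(s * td / 2) <= math.sqrt(2) * math.sin(s * tb / 2))
    if kappa == 0:
        return td <= math.sqrt(2) * tb
    s = math.sqrt(-kappa)
    return math.sinh(s * td / 2) <= math.sqrt(2) * math.sinh(s * tb / 2)
```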

A similar result appears in [16], where the authors study the Čech complex of triangles in the model spaces of constant curvature. Using the logarithmic persistence (that is, the ratio \(t_d/t_b\)), they detect the curvature of the ambient space both analytically and experimentally. See their paper for more details.

An application of \({\textbf{D}}_{4,1}^{\textrm{VR}}\) to detecting homotopy type of graphs In Sect. 6, as an application of the characterization of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\), we study a class of metric graphs for which \({\textbf{D}}_{4,1}^{\textrm{VR}}\), a rather coarse invariant which is fairly easy to estimate and compute in practice, is able to characterize the homotopy type of graphs in this class.

Stability In Theorems 3.13 and 3.19 we establish the stability of persistence sets and measures under the modified Gromov–Hausdorff and Gromov–Wasserstein distances. Such results give lower bounds for these distances which are computable in polynomial time. In particular, see Sect. 5.8.1.

1.2 Related Work

The measures \({\textbf{U}}_{n,k}^{\textrm{VR}}\) first appeared in a preprint by Blumberg et al. [14] in 2012 and then in print in [15]. These measures were also exploited a few years later by Chazal et al. in the articles [24, 25] in order to devise bootstrapping methods for the estimation of persistence diagrams.

The connection to Gromov’s curvature sets and measures was not recognized in either of these two papers. [58] studied curvature sets and their role in shape comparison and, as a natural follow-up, some results regarding the persistence sets \({\textbf{D}}_{n,k}^{\textrm{VR}}\) and the measures \({\textbf{U}}_{n,k}^{\textrm{VR}}\) (as well as the more general objects \({\textbf{D}}_{n,k}^{{\mathfrak {F}}}\) and \({\textbf{U}}_{n,k}^{{\mathfrak {F}}}\)) were first described in Banff in 2012 during a conference [57] by the second author as stable and computationally easier alternatives to the usual Vietoris–Rips persistence diagrams of metric spaces [63].

In [83] Bendich et al. discuss ideas related to our construction of \({\textbf{D}}_{n,k}^{{\mathfrak {F}}}\). The authors pose questions about the discriminative power of a certain labeled version of the persistence sets \({\textbf{D}}_{n,k}^{\textrm{VR}}\) (even though they do not call them that). [66] has recently explored the classificatory power of \(\mu _2\) (see Eq. (2)) as well as that of certain localizations of \(\mu _2\). In [20] the authors identify novel classes of simplicial filtrations arising from curvature sets together with suitable notions of locality.

In terms of data-centric applications, the neuroscience paper [79] made use of ideas related to \({\textbf{U}}_{n,k}^{\textrm{VR}}\) and \({\textbf{D}}_{n,k}^{\textrm{VR}}\) in the analysis of neuroscientific data.

2 Background

For us, \({\mathcal {M}}\) and \({\mathcal {M}}^\text {fin}\) will denote, respectively, the categories of compact and of finite metric spaces. The morphisms in both categories will be 1-Lipschitz maps, that is, functions \(\varphi :X \rightarrow Y\) such that \(d_Y(\varphi (x), \varphi (x')) \le d_X(x,x')\) for all \((X,d_X),(Y,d_Y)\) in \({\mathcal {M}}\) or \({\mathcal {M}}^\text {fin}\). We say that two metric spaces are isometric if there exists a surjective isometry \(\varphi :X \rightarrow Y\), i.e. a surjective map such that \(d_Y(\varphi (x), \varphi (x')) = d_X(x,x')\) for all \(x,x' \in X\). We also say that a space is geodesic if for any \(x,x' \in X\), there exists an isometry \(\gamma :[0,d] \rightarrow X\) such that \(d = d_X(x,x')\), \(\gamma (0)=x\) and \(\gamma (d)=x'\).

2.1 Metric Geometry

In this section, we define the tools that we will use to quantitatively compare metric spaces [10].

Definition 2.1

For any subset A of a metric space X, its diameter is \(\textbf{diam}_X(A):= \sup _{a,a' \in A} d_X(a,a')\), and its radius is \(\textbf{rad}_X(A):= \inf _{p \in X} \sup _{a \in A} d_X(p,a)\). Note that \(\textbf{rad}_X(A) \le \textbf{diam}_X(A)\). The separation of X is \(\textbf{sep}(X):= \inf _{x \ne x'} d_X(x,x')\).
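For a finite metric space presented as a distance matrix, these quantities are one-liners; a small sketch (names ours):

```python
def diam(D, A):
    """Diameter of the subset with indices A."""
    return max(D[i][j] for i in A for j in A)

def rad(D, A):
    """Radius of A: the center p ranges over all of X."""
    return min(max(D[p][a] for a in A) for p in range(len(D)))

def sep(D):
    """Separation: smallest positive interpoint distance."""
    return min(D[i][j] for i in range(len(D)) for j in range(len(D)) if i != j)

# Three points on a line at positions 0, 1, 3.
D = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
```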

Definition 2.2

(Hausdorff distance) Let A, B be subsets of a compact metric space \((X,d_X)\). The Hausdorff distance between A and B is defined as

$$\begin{aligned} d_{{\mathcal {H}}}^X(A,B):= \inf \left\{ \varepsilon >0 \ | \ A \subset B^\varepsilon \text { and } B \subset A^\varepsilon \right\} , \end{aligned}$$

where \(A^\varepsilon := \left\{ x \in X \ | \ \inf _{a \in A} d_X(x,a) < \varepsilon \right\} \) is the \(\varepsilon \)-thickening of A. It is known that \(d_{{\mathcal {H}}}^X(A,B)=0\) if and only if the closures of A and B coincide: \({\bar{A}} = {\bar{B}}\).

We will use an alternative definition that is useful for calculations, but is not standard in the literature.

Definition 2.3

A correspondence between two sets X and Y is a set \(R \subset X \times Y\) such that \(\pi _1(R) = X\) and \(\pi _2(R)=Y\), where \(\pi _i\) is the projection to the i-th coordinate. We will denote the set of all correspondences between X and Y as \({\mathcal {R}}(X,Y)\).

Proposition 2.4

(Proposition 2.1 of [56]) For any compact metric space \((X,d_X)\) and any \(A,B \subset X\) closed,

$$\begin{aligned} d_{{\mathcal {H}}}^X(A,B) = \inf _{R \in {{\mathcal {R}}}(A,B)} \sup _{(a,b) \in R} d_X(a,b). \end{aligned}$$

The standard method for comparing two metric spaces is a generalization of the Hausdorff distance.

Definition 2.5

For any correspondence R between \((X, d_X), (Y, d_Y) \in {\mathcal {M}}\), we define its distortion as

$$\begin{aligned} \textrm{dis}(R):= \max \left\{ |d_X(x,x') - d_Y(y,y')|: (x,y),(x',y') \in R \right\} . \end{aligned}$$

Then the Gromov–Hausdorff distance between X and Y is defined as

$$\begin{aligned} d_{\mathcal{G}\mathcal{H}}(X,Y):= \dfrac{1}{2} \inf _{R \in {{\mathcal {R}}}(X,Y)} \textrm{dis}(R). \end{aligned}$$

2.2 Metric Measure Spaces

To model the situation in which points are endowed with a notion of weight (signaling their trustworthiness), we will also consider finite metric spaces enriched with probability measures [56]. Recall that the support \(\textrm{supp}(\nu )\) of a Borel measure \(\nu \) defined on a topological space Z is defined as the minimal closed set \(Z_0\) such that \(\nu (Z {\setminus } Z_0)=0\). If \(\varphi :Z \rightarrow X\) is a measurable map from a measure space \((Z,\Sigma _Z,\nu )\) into the measurable space \((X,\Sigma _X)\), then the pushforward measure of \(\nu \) induced by \(\varphi \) is the measure \(\varphi _\#\nu \) on X defined by \(\varphi _\#\nu (A):= \nu (\varphi ^{-1}(A))\) for all \(A \in \Sigma _X\).

Definition 2.6

A metric measure space is a triple \((X, d_X, \mu _X)\) where \((X,d_X)\) is a compact metric space and \(\mu _X\) is a Borel probability measure on X with full support, i.e. \(\textrm{supp}(\mu _X)=X\). Two mm-spaces \((X, d_X, \mu _X)\) and \((Y, d_Y, \mu _Y)\) are isomorphic if there exists an isometry \(\varphi :X \rightarrow Y\) such that \(\varphi _\#\mu _X = \mu _Y\). We define the category of mm-spaces \({\mathcal {M}}^w\), where the objects are mm-spaces and the morphisms are 1-Lipschitz maps \(\varphi :X \rightarrow Y\) such that \(\varphi _\# \mu _X = \mu _Y\).

The following definitions are used to compare mm-spaces.

Definition 2.7

Given two measure spaces \((X, \Sigma _X, \mu _X)\) and \((Y, \Sigma _Y, \mu _Y)\), a coupling between \(\mu _X\) and \(\mu _Y\) is a measure \(\mu \) on \(X \times Y\) such that \(\mu (A \times Y) = \mu _X(A)\) and \(\mu (X \times B) = \mu _Y(B)\) for all measurable \(A \in \Sigma _X\) and \(B \in \Sigma _Y\) (in other words, \((\pi _1)_\# \mu = \mu _X\) and \((\pi _2)_\# \mu = \mu _Y\)). We denote the set of couplings between \(\mu _X\) and \(\mu _Y\) as \({\mathcal {M}}(\mu _X, \mu _Y)\).

Remark 2.8

(The support of a coupling is a correspondence) Notice that, since \(\mu _X\) is fully supported and X is finite, \(\mu (\pi _1^{-1}(x)) = \mu _X(\{x\}) \ne 0\) for any fixed coupling \(\mu \in {\mathcal {M}}(\mu _X, \mu _Y)\). Thus, the set \(\pi _1^{-1}(x) \cap \textrm{supp}(\mu )\) is non-empty for every \(x \in X\). The same argument on Y shows that \(\textrm{supp}(\mu )\) is a correspondence between X and Y.

Definition 2.9

Given a metric space \((Z, d_Z)\), let \({\mathcal {P}}_1(Z)\) be the set of Borel probability measures on Z. Given \(\alpha ,\beta \in {\mathcal {P}}_1(Z)\) and \(p \ge 1\), the Wasserstein distance of order p is defined as [87]:

$$\begin{aligned} d_{{\mathcal {W}},p}^Z(\alpha , \beta ):= \inf _{\mu \in {\mathcal {M}}(\alpha , \beta )} \left( \iint _{Z \times Z} (d_Z(z,z'))^p \mu (dz \times dz') \right) ^{1/p}. \end{aligned}$$

To compare two mm-spaces, we have the following distance.

Definition 2.10

Given two mm-spaces \((X, d_X, \mu _X)\) and \((Y, d_Y, \mu _Y)\), \(p \ge 1\), and \(\mu \in {\mathcal {M}}(\mu _X, \mu _Y)\), we define the p-distortion of \(\mu \) as

$$\begin{aligned} \textrm{dis}_p(\mu ):= \left( \iint |d_X(x,x')-d_Y(y,y')|^p \mu (dx \times dy) \mu (dx' \times dy') \right) ^{1/p} \end{aligned}$$

For \(p=\infty \) we set \(\textrm{dis}_\infty (\mu ):= \textrm{dis}(\textrm{supp}(\mu ))\).

The Gromov–Wasserstein distance of order \(p \in [1,\infty ]\) between X and Y is defined as [56]:

$$\begin{aligned} d_{\mathcal{G}\mathcal{W},p}(X,Y):= \dfrac{1}{2} \inf _{\mu \in {\mathcal {M}}(\mu _X, \mu _Y)} \textrm{dis}_p(\mu ). \end{aligned}$$

Remark 2.11

For each \(p \in [1,\infty ]\), \(d_{\mathcal{G}\mathcal{W},p}\) defines a legitimate metric on the collection of isomorphism classes of mm-spaces in \({\mathcal {M}}^w\) [56].

2.3 Simplicial Complexes

Definition 2.12

Let V be a set. An abstract simplicial complex K with vertex set V is a collection of finite subsets of V such that if \(\sigma \in K\), then every \(\tau \subset \sigma \) is also in K. We also use K to denote its geometric realization. A set \(\sigma \in K\) is called a k-face if \(|\sigma |=k+1\). A simplicial map \(f:K_1 \rightarrow K_2\) is a set map \(f:V_1 \rightarrow V_2\) between the vertex sets of \(K_1\) and \(K_2\) such that if \(\sigma \in K_1\), then \(f(\sigma ) \in K_2\).

We will focus on two particular complexes.

Definition 2.13

Let \((X, d_X) \in {\mathcal {M}}\) and \(r \ge 0\). The Vietoris–Rips complex of X at scale r is the simplicial complex

$$\begin{aligned} \textrm{VR}_{r}(X):= \left\{ \sigma \subset X \text { finite}: \textbf{diam}_X(\sigma ) \le r \right\} . \end{aligned}$$
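For a finite metric space given by its distance matrix, this definition can be enumerated directly; a self-contained sketch (feasible only for small spaces, since it scans all subsets):

```python
from itertools import combinations

def vietoris_rips(d, r):
    """All nonempty simplices of VR_r(X) for a finite metric space with
    distance matrix d: the subsets whose diameter is at most r."""
    n = len(d)
    simplices = []
    for size in range(1, n + 1):
        for sigma in combinations(range(n), size):
            diam = max((d[i][j] for i, j in combinations(sigma, 2)),
                       default=0.0)  # singletons have diameter 0
            if diam <= r:
                simplices.append(sigma)
    return simplices
```

For three points at mutual distance 1 and scale \(r=1\), this returns the full 2-simplex (3 vertices, 3 edges, and 1 triangle).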

Definition 2.14

Fix \(n \ge 1\). Let \(e_i:= (0,\dots ,1,\dots ,0)\) be the i-th standard basis vector in \({\mathbb {R}}^n\) and \(V:=\{\pm e_1, \dots , \pm e_n\}\). Let \({\mathfrak {B}}_{n}\) be the collection of subsets \(\sigma \subset V\) that do not contain both \(e_i\) and \(-e_i\) for any i. This simplicial complex is called the n-th cross-polytope.

Fig. 8
figure 8

From left to right: \({\mathfrak {B}}_{1}, {\mathfrak {B}}_{2}, {\mathfrak {B}}_{3}\) (there is no edge between the vertices of \({\mathfrak {B}}_{1}\)). See Definition 2.14.
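A quick enumeration (a sketch, encoding \(\pm e_i\) as the signed integers \(\pm i\)) recovers the face counts of \({\mathfrak {B}}_{n}\) and its Euler characteristic, consistent with \({\mathfrak {B}}_{n}\) being a triangulation of the sphere \(S^{n-1}\):

```python
from itertools import combinations

def cross_polytope(n):
    """Nonempty faces of the n-th cross-polytope: subsets of
    {+-e_1, ..., +-e_n} containing no antipodal pair of vertices."""
    V = [s * i for i in range(1, n + 1) for s in (+1, -1)]
    faces = []
    for size in range(1, 2 * n + 1):
        for sigma in combinations(V, size):
            if all(-v not in sigma for v in sigma):
                faces.append(sigma)
    return faces

def euler_characteristic(faces):
    # chi = sum over faces of (-1)^dim, where dim = |face| - 1.
    return sum((-1) ** (len(f) - 1) for f in faces)
```

For example, \({\mathfrak {B}}_{3}\) has 6 vertices, 12 edges, and 8 triangles, so its Euler characteristic is \(6-12+8=2\), as expected for \(S^2\).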

2.4 Persistent Homology

We adopt definitions from [20, 64].

Definition 2.15

A filtration on a finite set X is a function \(F_X:\textrm{pow}(X) \rightarrow {\mathbb {R}}\) such that \(F_X(\sigma ) \le F_X(\tau )\) whenever \(\sigma \subset \tau \), and we call the pair \((X,F_X)\) a filtered set. \({\mathcal {F}}\) will denote the category of finite filtered sets, where objects are pairs \((X,F_X)\) and the morphisms \(\varphi :(X,F_X) \rightarrow (Y,F_Y)\) are set maps \(\varphi :X \rightarrow Y\) such that \(F_Y(\varphi (\sigma )) \le F_X(\sigma )\). A filtration functor is any functor \({\mathfrak {F}}:{\mathcal {M}}^\textrm{fin} \rightarrow {\mathcal {F}}\) where \((X, F_X) = {\mathfrak {F}}(X)\) and \(F_X: \textrm{pow}(X) \rightarrow {\mathbb {R}}\). Observe that filtration functors are equivariant under isometries.

Definition 2.16

Given \((X, d_X) \in {\mathcal {M}}^\text {fin}\), define the Vietoris–Rips filtration \(F_X^\textrm{VR}\) by setting \(F_X^\textrm{VR}(\sigma ):= \textbf{diam}(\sigma )\) for \(\sigma \subset X\). It is straightforward to check that this construction is functorial, so we define the Vietoris–Rips filtration functor \({\mathfrak {F}}^\textrm{VR}:{\mathcal {M}}^\text {fin} \rightarrow {\mathcal {F}}\) by \((X,d_X) \mapsto (X,F_X^\textrm{VR})\).

More examples of filtration functors, such as the Čech filtration, can be found in [20].

Given a filtration functor \({\mathfrak {F}}\), we assign a persistence diagram to \((X,d_X)\) as follows. Let \((X,F_X^{\mathfrak {F}})={\mathfrak {F}}(X,d_X)\). For every \(r > 0\), we construct the simplicial complex \(L_r:= \left\{ \sigma \subset X: F_X^{\mathfrak {F}}(\sigma ) \le r \right\} \), giving a nested sequence of simplicial complexes \(L_{r_0} \subset L_{r_1} \subset L_{r_2} \subset \cdots \subset L_{r_m}\). We apply reduced homology \({\widetilde{H}}_k(\cdot , {\mathbb {F}})\) with field coefficients at each step, and we get a persistent vector space \({\text {PH}}_k^{\mathfrak {F}}(X)\) which decomposes as a sum of interval modules \({\text {PH}}_k^{\mathfrak {F}}(X) \cong \bigoplus _{\alpha \in A} {\mathbb {I}}[b_\alpha ,d_\alpha )\), where A is a finite indexing set [23]. We can also represent a persistent vector space by the multiset \(\textrm{dgm}_k^{\mathfrak {F}}(X) = \{(b_\alpha , d_\alpha ) | \ 0 \le b_\alpha < d_\alpha , \alpha \in A\}\), called a persistence diagram. We denote the empty persistence diagram, which corresponds to the persistence module \({\text {PH}}_k^{\mathfrak {F}}(X) = 0\), by \(\emptyset \). Notice that using reduced homology implies that \({\widetilde{H}}_k(L_r)=0\) for \(r \ge F_X^{\mathfrak {F}}(X)\), and so \(d_\alpha < \infty \) for all \(\alpha \in A\), regardless of the dimension k. In dimension 0, this removes the infinite interval. We denote by \({\mathcal {D}}\) the collection of all finite persistence diagrams. We say that a point \(P = (b_\alpha , d_\alpha )\) in a persistence diagram \(D \in {\mathcal {D}}\) has persistence \(\textrm{pers}(P):= d_\alpha -b_\alpha \) and define \(\textrm{pers}(D):= \max \{\textrm{pers}(P) | \ P \in D\}\). Let \(d_{{\mathcal {B}}}\) be the bottleneck distance.
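In dimension \(k=0\) with the Vietoris–Rips filtration, this pipeline admits a well-known combinatorial shortcut: by the standard correspondence with single-linkage clustering, the reduced diagram \(\textrm{dgm}_0^\textrm{VR}(X)\) of a finite metric space consists of one point \((0, w)\) for each edge weight w of a minimum spanning tree of X. A self-contained sketch via Kruskal's algorithm:

```python
from itertools import combinations

def dgm0_vr(d):
    """Reduced 0-dimensional VR persistence diagram of a finite metric
    space with distance matrix d: one point (0, w) per edge weight w of
    a minimum spanning tree (Kruskal's algorithm with union-find)."""
    n = len(d)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    diagram = []
    edges = sorted((d[i][j], i, j) for i, j in combinations(range(n), 2))
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # two components merge at scale w
            diagram.append((0.0, w))
    return sorted(diagram)
```

For three points with pairwise distances 1, 2, 3, the minimum spanning tree uses the edges of weight 1 and 2, so the diagram is \(\{(0,1),(0,2)\}\): two classes die, and the infinite class is removed by reduced homology.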

Definition 2.17

We say that a filtration functor \({\mathfrak {F}}:{\mathcal {M}}^\textrm{fin} \rightarrow {\mathcal {F}}\) is stable if there exists a constant \(L>0\) such that

$$\begin{aligned} d_{{\mathcal {B}}}(\textrm{dgm}_k^{\mathfrak {F}}(X), \textrm{dgm}_k^{\mathfrak {F}}(Y)) \le L \cdot d_{\mathcal{G}\mathcal{H}}(X,Y) \end{aligned}$$

for all \(X,Y \in {\mathcal {M}}^\text {fin}\) and \(k \in {\mathbb {N}}\). The infimal L that satisfies the above is called the Lipschitz constant of \({\mathfrak {F}}\) and denoted by \(L({\mathfrak {F}})\).

The Vietoris–Rips and Čech filtrations are stable and, in fact, \(L({\mathfrak {F}}^\textrm{VR})=2\).

3 Curvature Sets, Persistence Diagrams and Persistent Sets

Given a compact metric space \((X,d_X)\), Gromov identified a class of full invariants called curvature sets (see Section \(1.19_+\) of [46] for the definition, and Section \(3\frac{1}{2}.4\) for the terminology “curvature sets”). Intuitively, the n-th curvature set contains the metric information of all possible samples of n points from X. In this section, we define persistence sets as an invariant that captures the persistent homology of all n-point samples of X. We start by recalling Gromov’s definition, and defining an analogue of the Gromov–Hausdorff distance in terms of curvature sets. We then define persistence sets and study their stability with respect to this modified Gromov–Hausdorff distance. We also extend these constructions to mm-spaces.

Definition 3.1

Let \((X,d_X)\) be a metric space. Given a positive integer n, let \(\Psi _X^{(n)}:X^n \rightarrow {\mathbb {R}}^{n \times n}\) be the map that sends an n-tuple \((x_1,\dots ,x_n)\) to the distance matrix M, where \(M_{ij} = d_X(x_i,x_j)\). The n-th curvature set of X is \({\textbf{K}}_n(X):= {\text {im}}(\Psi _X^{(n)})\), the collection of all distance matrices of n points from X.

Remark 3.2

(Functoriality of curvature sets) Curvature sets are functorial in the sense that if X is isometrically embedded in Y, then \({\textbf{K}}_n(X) \subset {\textbf{K}}_n(Y)\).

Example 3.3

\({\textbf{K}}_2(X)\) is the set of distances of X. If X is geodesic, \({\textbf{K}}_2(X) = [0, \textbf{diam}(X)]\).

Example 3.4

Let \(X = \{p,q\}\) be a two point metric space with \(d_X(p,q) = \delta \). Then

$$\begin{aligned} {\textbf{K}}_3(X)&= \left\{ \Psi _X^{(3)}(p,p,p), \Psi _X^{(3)}(p,p,q), \Psi _X^{(3)}(p,q,p), \Psi _X^{(3)}(q,p,p),\right. \\&\left. \Psi _X^{(3)}(q,q,q), \Psi _X^{(3)}(q,q,p), \Psi _X^{(3)}(q,p,q), \Psi _X^{(3)}(p,q,q) \right\} \\&= \left\{ \left( {\begin{matrix} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 \end{matrix}} \right) , \left( {\begin{matrix} 0 &{} 0 &{} \delta \\ 0 &{} 0 &{} \delta \\ \delta &{} \delta &{} 0 \end{matrix}} \right) , \left( {\begin{matrix} 0 &{} \delta &{} 0\\ \delta &{} 0 &{} \delta \\ 0 &{} \delta &{} 0 \end{matrix}} \right) , \left( {\begin{matrix} 0 &{} \delta &{} \delta \\ \delta &{} 0 &{} 0\\ \delta &{} 0 &{} 0 \end{matrix}} \right) \right\} . \end{aligned}$$

For \(n \ge 2\) and \(0< k < n\), let \(x_1 = \cdots = x_k=p\) and \(x_{k+1} = \cdots = x_n = q\). Define

$$\begin{aligned} M_k(\delta ):= \Psi _X^{(n)}(x_1, \dots , x_n) = \left( \begin{array}{c|c} {\textbf{0}}_{k \times k} &{} \delta \cdot {\textbf{1}}_{k \times (n-k)} \\ \hline \delta \cdot {\textbf{1}}_{(n-k) \times k} &{} {\textbf{0}}_{(n-k) \times (n-k)} \end{array} \right) , \end{aligned}$$

where \({\textbf{0}}_{r \times s}\) and \({\textbf{1}}_{r \times s}\) are the \(r \times s\) matrices with all entries equal to 0 and 1, respectively. If we make another choice of \(x_1,\dots ,x_n\), the resulting distance matrix will change only by a permutation of its rows and columns. Thus, if we define \(M_k^\Pi (\delta ):= \Pi ^T \cdot M_k(\delta ) \cdot \Pi \), for some permutation matrix \(\Pi \in S_n\), then

$$\begin{aligned} {\textbf{K}}_n(X) = \left\{ {\textbf{0}}_{n \times n} \right\} \cup \left\{ M_k^\Pi (\delta ): 0< k < n \text { and } \Pi \in S_n \right\} . \end{aligned}$$
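For finite spaces, Definition 3.1 can be enumerated by brute force, which makes it easy to check the counts in this example (feasible only for tiny |X| and n):

```python
from itertools import product

def curvature_set(d, n):
    """K_n(X) for a finite metric space with distance matrix d: the set
    of all n x n distance matrices of n-tuples of (not necessarily
    distinct) points, as in Definition 3.1."""
    pts = range(len(d))
    return {tuple(tuple(d[i][j] for j in tup) for i in tup)
            for tup in product(pts, repeat=n)}
```

For the two-point space above, `curvature_set([[0, 1], [1, 0]], 3)` returns exactly the four matrices displayed in Example 3.4.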

Example 3.5

In this example we describe \({\textbf{K}}_3({\mathbb {S}}^{1})\), where \({\mathbb {S}}^{1} = [0,2\pi ] / (0 \sim 2\pi )\) is equipped with the geodesic metric. Depending on the position of \(x_1,x_2,x_3\), we distinguish two cases. If the three points are not contained in the same semicircle, then \(d_{12}+d_{23}+d_{31}=2\pi \). If they are, then there exists a point, say \(x_2\), that lies on the shortest path joining the other two, so that \(d_{13} = d_{12}+d_{23} \le \pi \). The other possibilities are \(d_{12}=d_{13}+d_{32}\) and \(d_{23}=d_{21}+d_{13}\).

Let \(M:= \Psi _{{\mathbb {S}}^{1}}^{(3)}(x_1,x_2,x_3)\). Since M is symmetric and its diagonal entries are 0, we only need 3 entries to characterize it. If we label \(x=d_{12}, y=d_{23}\) and \(z=d_{31}\), then \({\textbf{K}}_3({\mathbb {S}}^{1})\) is the boundary of the 3-simplex with vertices (0, 0, 0), \((\pi , \pi , 0)\), \((\pi , 0, \pi )\), and \((0, \pi , \pi )\) in \({\mathbb {R}}^3\) (see Fig. 9). Each of the cases in the previous paragraph corresponds to a face of this simplex. See also Appendix A and Theorem 4.33 of [33] for a more thorough calculation.

Fig. 9
figure 9

The curvature set \({\textbf{K}}_3({\mathbb {S}}^{1})\); cf. Example 3.5.

Gromov proved that curvature sets are a full invariant of compact metric spaces, which means that two compact spaces X and Y are isometric if and only if \({\textbf{K}}_n(X)={\textbf{K}}_n(Y)\) for all \(n \ge 1\) [46, Sect. 3.27]. For this reason, the following definition from [58] yields a bona fide metric on compact metric spaces.

Definition 3.6

([58]) The modified Gromov–Hausdorff distance between \(X,Y \in {\mathcal {M}}\) is

$$\begin{aligned} {\widehat{d}}_\mathcal{G}\mathcal{H}(X,Y):= \dfrac{1}{2} \sup _{n \in {\mathbb {N}}} d_{{\mathcal {H}}}({\textbf{K}}_n(X), {\textbf{K}}_n(Y)). \end{aligned}$$
(3)

Here \(d_{{\mathcal {H}}}\) denotes the Hausdorff distance on \({\mathbb {R}}^{n \times n}\) with \(\ell ^\infty \) distance.

[58] proved that

$$\begin{aligned} {\widehat{d}}_\mathcal{G}\mathcal{H}(X,Y) \le d_{\mathcal{G}\mathcal{H}}(X,Y). \end{aligned}$$
(4)

A benefit of \({\widehat{d}}_\mathcal{G}\mathcal{H}\) when compared to the standard Gromov–Hausdorff distance is that, while the computation of the latter leads in general to NP-hard problems [78], computing the lower bound in the equation above for fixed values of n leads to polynomial-time problems. In [58] it is argued that work of Peter Olver [74] and of Boutin and Kemper [17] leads to identifying rich classes of shapes on which these lower bounds permit full discrimination.
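For finite spaces and fixed n, the Hausdorff distance appearing in (3) is directly computable; a sketch for the two-point spaces with interpoint distances \(\delta _1=1\) and \(\delta _2=3\), where the \(n=2\) term of (3) already equals \(\frac{1}{2}|\delta _1-\delta _2|\):

```python
def linf(M1, M2):
    """l-infinity distance between two matrices given as tuples of rows."""
    return max(abs(a - b) for r1, r2 in zip(M1, M2) for a, b in zip(r1, r2))

def hausdorff(A, B):
    """Hausdorff distance between two finite sets of n x n matrices
    under the l-infinity norm."""
    return max(max(min(linf(a, b) for b in B) for a in A),
               max(min(linf(b, a) for a in A) for b in B))

# K_2 of a two-point space with interpoint distance delta: the zero
# matrix and the matrix with off-diagonal entries delta.
def K2(delta):
    return [((0.0, 0.0), (0.0, 0.0)), ((0.0, delta), (delta, 0.0))]

# (1/2) d_H(K_2(X1), K_2(X2)) is a lower bound for d_GH(X1, X2).
bound = 0.5 * hausdorff(K2(1.0), K2(3.0))
```

Here `bound` evaluates to 1.0, which is \(\frac{1}{2}|1-3|\), matching the Gromov–Hausdorff distance between these two spaces.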

The analogous definitions for mm-spaces are the following.

Definition 3.7

Let \((X, d_X, \mu _X)\) be an mm-space. The n-th curvature measure of X is defined as

$$\begin{aligned} \mu _n(X):= \left( \Psi _X^{(n)} \right) _\# \mu _X^{\otimes n}, \end{aligned}$$

where \(\mu _X^{\otimes n}\) is the product measure on \(X^n\). Observe that \(\textrm{supp}(\mu _n(X))={\textbf{K}}_n(X)\) for all \(n \in {\mathbb {N}}\).

We also define the modified Gromov–Wasserstein distance between \(X,Y \in {\mathcal {M}}^w\) as

$$\begin{aligned} {\widehat{d}}_{\mathcal{G}\mathcal{W},p}(X,Y):= \dfrac{1}{2} \sup _{n \in {\mathbb {N}}} d_{{\mathcal {W}},p}(\mu _n(X), \mu _n(Y)), \end{aligned}$$

where \(d_{{\mathcal {W}},p}\) is the p-Wasserstein distance [87] on \({\mathcal {P}}_1({\mathbb {R}}^{n \times n})\), and \({\mathbb {R}}^{n \times n}\) is equipped with the \(\ell ^\infty \) distance.

The modified p-Gromov–Wasserstein distance satisfies an inequality similar to (4).

Theorem 3.8

For any \(X, Y \in {\mathcal {M}}^w\),

$$\begin{aligned} d_{{\mathcal {W}},p}(\mu _n(X),\mu _n(Y)) \le 2 \left( {\begin{array}{c}n\\ 2\end{array}}\right) ^{\frac{1}{p}} d_{\mathcal{G}\mathcal{W},p}(X,Y) \end{aligned}$$

for \(1 \le p < \infty \). If \(p=\infty \),

$$\begin{aligned} {\widehat{d}}_{\mathcal{G}\mathcal{W},\infty }(X,Y) \le d_{\mathcal{G}\mathcal{W},\infty }(X,Y). \end{aligned}$$
(5)

See Appendix 1 for the proof.

Remark 3.9

(Interpretation as “motifs”) In network science [68], it is of interest to identify substructures of a dataset (network) X which appear with high frequency. The interpretation of the definitions above is that the curvature sets \({\textbf{K}}_n(X)\) for different \(n \in {\mathbb {N}}\) capture the information of those substructures whose cardinality is at most n, whereas the curvature measures \(\mu _n(X)\) capture their frequency of occurrence.

3.1 Persistence Sets

The idea behind curvature sets is to study a metric space by taking the distance matrix of a sample of n points. This is the inspiration for the next definition: we want to study the persistence of a compact metric space X by looking at the persistence diagrams of samples with n points induced by a given filtration functor \({\mathfrak {F}}\).

Definition 3.10

Fix \(n \ge 1\) and \(k \ge 0\). Let \((X,d_X) \in {\mathcal {M}}\) and \({\mathfrak {F}}:{\mathcal {M}}^\text {fin} \rightarrow {\mathcal {F}}\) be any filtration functor. The (n,k)-\({\mathfrak {F}}\) persistence set of X is

$$\begin{aligned} {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X):= \left\{ \textrm{dgm}_k^{\mathfrak {F}}(X'): X'\subset X \text { such that } |X'| \le n \right\} . \end{aligned}$$

Even though the empty persistence diagram \(\emptyset \) always belongs to the set \({\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X)\), we establish the convention to omit writing it explicitly whenever convenient.

Remark 3.11

(Persistence sets are functorial and isometry invariant) Notice that, similarly to curvature sets (cf. Remark 3.2), persistence sets are functorial and isometry invariant. If \(X \hookrightarrow Y\) isometrically, then \({\textbf{K}}_n(X) \subset {\textbf{K}}_n(Y)\), and consequently, \({\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X) \subset {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(Y)\) for all \(n,k \in {\mathbb {N}}\). As such, they can be regarded, in principle, as signatures that can be used to gain insight into datasets or to discriminate between different shapes.

Remark 3.12

Recall from Definition 2.15 that filtration functors are equivariant under isometry. This implies that we can define the \({\mathfrak {F}}\)-persistence diagram of a distance matrix as the diagram of the underlying pseudometric space. More explicitly, if a finite pseudometric space \(X=\{x_1,\dots ,x_n\}\) has distance matrix \(\Psi _X^{(n)}(x_1,\dots ,x_n) = M\), we define \(\textrm{dgm}_k^{\mathfrak {F}}(M):= \textrm{dgm}_k^{\mathfrak {F}}(X)\). For that reason, we can view the persistence set \({\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X)\) as the image of the map \(\textrm{dgm}_k^{\mathfrak {F}}:{\textbf{K}}_n(X) \rightarrow {\mathcal {D}}\).

Persistence sets inherit the stability of the filtration functor.

Theorem 3.13

Let \({\mathfrak {F}}\) be a stable filtration functor with Lipschitz constant \(L({\mathfrak {F}})\). Then for all \(X,Y \in {\mathcal {M}}\) and integers \(n \ge 1\) and \(k \ge 0\), one has

$$\begin{aligned} d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X), {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(Y)) \le \frac{1}{2} L({\mathfrak {F}}) \cdot d_{{\mathcal {H}}}({\textbf{K}}_n(X), {\textbf{K}}_n(Y)), \end{aligned}$$

and thus

$$\begin{aligned} d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X), {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(Y)) \le L({\mathfrak {F}}) \cdot {\widehat{d}}_\mathcal{G}\mathcal{H}(X,Y), \end{aligned}$$

where \(d_{{\mathcal {H}}}^{\mathcal {D}}\) denotes the Hausdorff distance between subsets of \({\mathcal {D}}\).

Proof

We will show that \(d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X), {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(Y)) \le \frac{1}{2} L({\mathfrak {F}}) \cdot d_{{\mathcal {H}}}({\textbf{K}}_n(X), {\textbf{K}}_n(Y))\). Since \(L({\mathfrak {F}}) \cdot {\widehat{d}}_\mathcal{G}\mathcal{H}(X,Y)\) is an upper bound for the right-hand side, the theorem will follow.

Assume \(d_{{\mathcal {H}}}({\textbf{K}}_n(X), {\textbf{K}}_n(Y)) < \eta \). Pick any \(D_1 \in {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X)\). Let \({\mathbb {X}} = (x_1, \dots , x_n) \in X^n\) such that \(\Psi _X^{(n)}({\mathbb {X}})=M_1\) and \(D_1=\textrm{dgm}_k^{\mathfrak {F}}(M_1)\). From the assumption on \(d_{{\mathcal {H}}}({\textbf{K}}_n(X), {\textbf{K}}_n(Y))\), there exists \(M_2 \in {\textbf{K}}_n(Y)\) such that \(\Vert M_1-M_2\Vert _{\infty } < \eta \). As before, let \({\mathbb {Y}} = (y_1,\dots ,y_n)\) be such that \(M_2=\Psi _Y^{(n)}({\mathbb {Y}})\) and \(D_2 = \textrm{dgm}_k^{\mathfrak {F}}(M_2)\). By abuse of notation, consider \({\mathbb {X}}\) and \({\mathbb {Y}}\) as pseudometric spaces and observe that \(D_1 = \textrm{dgm}_k^{\mathfrak {F}}({\mathbb {X}})\) and \(D_2 = \textrm{dgm}_k^{\mathfrak {F}}({\mathbb {Y}})\) (see Remark 3.12). Then, by Definition 2.17,

$$\begin{aligned} d_{{\mathcal {B}}}(D_1,D_2) \le L({\mathfrak {F}}) \cdot d_{\mathcal{G}\mathcal{H}}({\mathbb {X}}, {\mathbb {Y}}). \end{aligned}$$

With the correspondence \(R = \left\{ (x_i,y_i) \in {\mathbb {X}} \times {\mathbb {Y}}: i=1, \dots , n\right\} \), we can bound the \(d_{\mathcal{G}\mathcal{H}}({\mathbb {X}}, {\mathbb {Y}})\) term by

$$\begin{aligned} d_{\mathcal{G}\mathcal{H}}({\mathbb {X}}, {\mathbb {Y}}) \le \dfrac{1}{2} \textrm{dis}(R){} & {} =\dfrac{1}{2} \max _{i,j=1,\dots ,n} |d_X(x_i,x_j)-d_Y(y_i,y_j)|\\{} & {} =\dfrac{1}{2} \Vert M_1-M_2\Vert _{\infty } < \dfrac{\eta }{2}. \end{aligned}$$

In summary, for every \(D_1 \in {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X)\), we can find \(D_2 \in {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(Y)\) such that \(d_{{\mathcal {B}}}(D_1,D_2) \le L({\mathfrak {F}}) \cdot d_{\mathcal{G}\mathcal{H}}({\mathbb {X}}, {\mathbb {Y}}) < L({\mathfrak {F}}) \cdot \eta /2\). Changing the roles of X and Y gives the same bound on the Hausdorff distance so, when we let \(\eta \rightarrow d_{{\mathcal {H}}}({\textbf{K}}_n(X), {\textbf{K}}_n(Y))\), we obtain

$$\begin{aligned} d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{n,k}^{{\mathfrak {F}}}(X), {\textbf{D}}_{n,k}^{{\mathfrak {F}}}(Y)) \le \frac{1}{2} L({\mathfrak {F}}) \cdot d_{{\mathcal {H}}}({\textbf{K}}_n(X), {\textbf{K}}_n(Y)). \end{aligned}$$

as desired. \(\square \)

Remark 3.14

(Tightness of the bound) Recall that \(L({\mathfrak {F}}^\textrm{VR})=2\). Let \(\delta _1 \ne \delta _2\) be positive real numbers. For \(i=1,2\), let \(X_i = \{x_1^{(i)},x_2^{(i)}\}\) be a two-point metric space with \(d_{X_i}(x_1^{(i)}, x_2^{(i)}) = \delta _i > 0\). Observe that \(\textrm{dgm}_0^\textrm{VR}(X_i) = \{(0,\delta _i) \}\), so

$$\begin{aligned}{} & {} d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{2,0}^{\textrm{VR}}(X_1), {\textbf{D}}_{2,0}^{\textrm{VR}}(X_2))\\{} & {} = d_{{\mathcal {B}}}(\textrm{dgm}_{0}^\textrm{VR}(X_1), \textrm{dgm}_{0}^\textrm{VR}(X_2)) = \Vert (0,\delta _1)-(0,\delta _2)\Vert _\infty \\{} & {} =|\delta _1-\delta _2|. \end{aligned}$$

On the other hand, \({\widehat{d}}_\mathcal{G}\mathcal{H}(X_1, X_2) = \frac{1}{2}|\delta _1-\delta _2|\). To wit, since \({\textbf{K}}_2(X_i) = \left\{ ({\begin{matrix}0 &{} 0\\ 0 &{} 0\end{matrix}}), ({\begin{matrix}0 &{} \delta _i\\ \delta _i &{} 0\end{matrix}}) \right\} \), we have the lower bound \({\widehat{d}}_\mathcal{G}\mathcal{H}(X_1, X_2) \ge \frac{1}{2}d_{{\mathcal {H}}}({\textbf{K}}_2(X_1), {\textbf{K}}_2(X_2)) = \frac{1}{2}|\delta _1-\delta _2|\). The upper bound is given by \(d_{\mathcal{G}\mathcal{H}}(X_1,X_2)=\frac{1}{2}|\delta _1-\delta _2|\). Thus, \(d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{2,0}^{\textrm{VR}}(X_1),{\textbf{D}}_{2,0}^{\textrm{VR}}(X_2)) = L({\mathfrak {F}}^\textrm{VR}) \cdot {\widehat{d}}_\mathcal{G}\mathcal{H}(X_1,X_2)\).

3.1.1 VR-Persistence Sets of Ultrametric Spaces

We now show that \({\textbf{D}}_{n,0}^{\textrm{VR}}\), the simplest of all persistence sets, can sometimes capture information that persistence diagrams cannot see.

Definition 3.15

An ultrametric space \((U, d_U)\) is a metric space such that every triple \(x_1, x_2, x_3 \in U\) satisfies the ultrametric inequality:

$$\begin{aligned} d_{U}(x_1,x_3) \le \max (d_{U}(x_1,x_2), d_{U}(x_2,x_3)). \end{aligned}$$

Observe that applying the ultrametric inequality to \(d_{U}(x_1,x_2)\) and \(d_{U}(x_2,x_3)\) implies that the two largest distances among \(d_{U}(x_1,x_3)\), \(d_{U}(x_1,x_2)\), and \(d_{U}(x_2,x_3)\) are equal. Ultrametric spaces are usually represented as dendrograms [27], where \(d_U(x_1, x_2)\) is the first value of the scale parameter t at which \(x_1\) and \(x_2\) belong to the same cluster.

Example 3.16

(\(\{{\textbf{D}}_{n,0}^{\textrm{VR}}\}_{n\ge 1}\) can distinguish spaces that \(\textrm{dgm}_0^\textrm{VR}\) cannot.) Let X be a metric space with N points. The collection of persistence sets \(\{{\textbf{D}}_{n,0}^{\textrm{VR}}(X)\}_{n\ge 1}\) generally contains more information than \(\textrm{dgm}_0^\textrm{VR}(X)\). Indeed, as we pointed out before, \(\textrm{dgm}_0^\textrm{VR}(X)\in {\textbf{D}}_{N,0}^{\textrm{VR}}(X).\) The diagram \(\textrm{dgm}_0^\textrm{VR}(X)\) contains \(N-1\) (non-infinite) points (recall that we are using reduced homology) corresponding to the distances in a minimum spanning tree for X, while \({\textbf{D}}_{2,0}^{\textrm{VR}}(X)\) contains one point for every distinct distance in X, and \({\textbf{U}}_{2,0}^{\textrm{VR}}(X)\) (see Definition 3.18) counts the number of times each distance appears. Therefore, if all pairwise distances in X are different, \({\textbf{D}}_{2,0}^{\textrm{VR}}(X)\) will capture all \(\left( {\begin{array}{c}N\\ 2\end{array}}\right) \) pairwise distances, whereas \(\textrm{dgm}_0^\textrm{VR}(X)\) will recover only \(N-1\) of them. If X is instead assumed to be compact and connected, \(\textrm{dgm}_0^\textrm{VR}(X)\) will be empty, whereas \({\textbf{D}}_{2,0}^{\textrm{VR}}(X)\) will recover the set \(\textrm{im}(d_X) = [0,\textbf{diam}(X)]\) of all possible distances attained by pairs of points in X.

The difference between the invariants \(\textrm{dgm}_0^\textrm{VR}(X)\) and \({\textbf{D}}_{n,0}^{\textrm{VR}}(X)\) becomes more apparent in the case of ultrametric spaces. Any ultrametric space U is tree-like (see Definition 6.1), so by Lemma 6.2, both \(\textrm{dgm}_k^\textrm{VR}(U) = \emptyset \) and \({\textbf{D}}_{n,k}^{\textrm{VR}}(U) = \{ \emptyset \}\) for \(k \ge 1\). Thus, all the persistence information of ultrametric spaces is concentrated in dimension 0. With that in mind, Fig. 10 shows two ultrametric spaces \(U_1\) and \(U_2\) such that \(\textrm{dgm}_0^\textrm{VR}(U_1) = \textrm{dgm}_0^\textrm{VR}(U_2) = \{ (0,1), (0,1), (0,2) \}\). Notice that \({\textbf{D}}_{3,0}^{\textrm{VR}}(U_1)\) consists of only the diagram \(D_1:= \{(0,1), (0,2)\}\), whereas \({\textbf{D}}_{3,0}^{\textrm{VR}}(U_2)\) consists of \(D_1\) and \(D_2:= \{(0,1), (0,1)\}\). Thus, \({\textbf{D}}_{3,0}^{\textrm{VR}}\) differentiates two (ultra)metric spaces that \(\textrm{dgm}_*^\textrm{VR}\) cannot tell apart.

Fig. 10
figure 10

Two ultrametric spaces \(U_1\), \(U_2\) for which \(\textrm{dgm}_k^\textrm{VR}(U_1)=\textrm{dgm}_k^\textrm{VR}(U_2)\) for all \(k \ge 0\) but, in contrast, \({\textbf{D}}_{n,0}^{\textrm{VR}}(U_1) \ne {\textbf{D}}_{n,0}^{\textrm{VR}}(U_2)\) for \(n=3\).
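As a concrete check of this phenomenon, consider the following pair of 4-point ultrametric spaces (an illustration in the spirit of Fig. 10, though not necessarily the spaces pictured there). Both have \(\textrm{dgm}_0^\textrm{VR}\) equal to \(\{(0,1),(0,1),(0,2)\}\), yet their 3-point persistence sets differ; diagrams are encoded by their multisets of death times, i.e., the minimum spanning tree edge weights:

```python
from itertools import combinations

def mst_weights(d, pts):
    """Sorted MST edge weights of the submetric on pts (Prim's
    algorithm); by the single-linkage fact, these are the death times of
    the reduced 0-dimensional VR persistence diagram."""
    pts = list(pts)
    in_tree, rest, weights = {pts[0]}, set(pts[1:]), []
    while rest:
        w, q = min((min(d[p][r] for p in in_tree), r) for r in rest)
        in_tree.add(q); rest.discard(q); weights.append(w)
    return sorted(weights)

# A: two pairs at distance 1, merged at distance 2.
dA = {0: {0: 0, 1: 1, 2: 2, 3: 2}, 1: {0: 1, 1: 0, 2: 2, 3: 2},
      2: {0: 2, 1: 2, 2: 0, 3: 1}, 3: {0: 2, 1: 2, 2: 1, 3: 0}}
# B: three points mutually at distance 1, a fourth point at distance 2.
dB = {0: {0: 0, 1: 1, 2: 1, 3: 2}, 1: {0: 1, 1: 0, 2: 1, 3: 2},
      2: {0: 1, 1: 1, 2: 0, 3: 2}, 3: {0: 2, 1: 2, 2: 2, 3: 0}}

def D30(d):
    """All distinct dgm_0 of 3-point samples, encoded by MST weights."""
    return {tuple(mst_weights(d, s)) for s in combinations(d, 3)}
```

Every triple of A has death times (1, 2), while B also contains a triple with death times (1, 1), even though both full spaces have death times (1, 1, 2).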

3.1.2 Computational Cost and Memory Requirements

One thing to keep in mind is that computing the single diagram \(\textrm{dgm}_{1}^\textrm{VR}(X)\) for a space with \(n_X:= |X| = 1000\) points is likely to be much more computationally expensive than computing 10,000 one-dimensional VR persistence diagrams obtained by randomly sampling points from X, i.e. approximating \({\textbf{D}}_{n,1}^{\textrm{VR}}(X)\) with small n. Let \(c(n_X, k)\) denote the worst-case time that it takes to compute \(\textrm{dgm}_k^\textrm{VR}(X)\). Earlier algorithms, like the one in [65], are based on Gaussian elimination, and their complexity is bounded in terms of the number of simplices in the filtration. In the worst case, computing \(\textrm{dgm}_k^\textrm{VR}(X)\) requires knowledge of the \((k+1)\)-simplices of \(\textrm{VR}_r(X)\), each of which is a subset of size \(k+2\), so [65] gives a worst-case bound of \(c(n_X,k) \approx O\bigg (\left( {\begin{array}{c}n_X\\ k+2\end{array}}\right) ^\omega \bigg )\). Here, we are assuming that multiplication of \(m \times m\) matrices has cost \(O(m^\omega )\). In contrast, since there are \(\left( {\begin{array}{c}n_X\\ n\end{array}}\right) \) possible n-tuples of points of X (up to permutation), the complexity of computing \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) is bounded by \(O\bigg (c(n,k) \cdot \left( {\begin{array}{c}n_X\\ n\end{array}}\right) \bigg )\). For example, let \(n=4\) and \(k=1\). Since k is a small constant, we approximate \(\left( {\begin{array}{c}n_X\\ k+2\end{array}}\right) \approx n_X^{k+2}\). Then the worst-case bound for \(\textrm{dgm}_1^\textrm{VR}(X)\) is \(c(n_X,1) \approx O(n_X^{3\omega }) \approx O(n_X^{7.11})\), while \({\textbf{D}}_{4,1}^{\textrm{VR}}(X)\) only takes \(O\left( c(4,1) \cdot \left( {\begin{array}{c}n_X\\ 4\end{array}}\right) \right) \approx O(n_X^4)\). In general, \(O\left( c(n,k)\cdot \left( {\begin{array}{c}n_X\\ n\end{array}}\right) \right) \) will be smaller than \(c(n_X,k)\) as long as \(n < \omega (k+2)\).

Modern implementations of VR persistent homology [8, 91] are much more efficient in practice, and their performance is linear in the number of simplices, that is, they have cost \(c'(n_X,k) \approx O\bigg (\left( {\begin{array}{c}n_X\\ k+2\end{array}}\right) \bigg )\). Several sources give evidence for this claim. For example, the authors of [18] argue that the practical linear bound is due to the sparsity of the boundary matrix. Similarly, the paper [40] presents a rich family of examples where the expected runtime of the standard algorithm is better than the worst-case, at least for boundary matrices in degree 1. They also construct an example that realizes the worst-case runtime, although they argue that such an example is not typical in practice. In contrast, we will show that \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) is non-empty only when \(n \ge 2k+2\) in Theorem 4.4, so the cost of computing the full persistence set \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) with modern algorithms is at least \(O\bigg (c'(n,k) \cdot \left( {\begin{array}{c}n_X\\ 2k+2\end{array}}\right) \bigg )\), which is larger than \(c'(n_X,k)\).
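To get a feel for these bounds, one can evaluate them numerically; a sketch, taking \(\omega \approx 2.373\) as an assumed value of the matrix multiplication exponent:

```python
import math

def worst_case_dgm_cost(n_pts, k, omega=2.373):
    """Worst-case bound C(n_pts, k+2)^omega for one diagram computed by
    Gaussian-elimination-style reduction."""
    return math.comb(n_pts, k + 2) ** omega

def persistence_set_cost(n_pts, n, k, omega=2.373):
    """Bound C(n, k+2)^omega * C(n_pts, n) for the full persistence set
    D_{n,k}, one small diagram per n-point sample."""
    return math.comb(n, k + 2) ** omega * math.comb(n_pts, n)
```

For \(n_X=1000\), \(k=1\), and \(n=4\), the persistence-set bound is several orders of magnitude below the single-diagram bound, consistent with the comparison \(O(n_X^{3\omega })\) versus \(O(n_X^4)\) above.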

Approximation Another point which lends flexibility to the approximate computation of persistence sets is that one can cap the number of n-tuples considered by a parameter N; in this case, the complexity of estimating \({\textbf{D}}_{n,k}^{\textrm{VR}}\) will be \(O\bigg (\left( {\begin{array}{c}n\\ k+2\end{array}}\right) N\bigg )\). This is the pragmatic approach we have followed in the experiments reported in this paper and in the code provided in our GitHub repository [42]. In Sect. 3.2.1 we provide probabilistic convergence results as well as approximation bounds that justify this approach.
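A minimal Monte Carlo sketch of this capped estimation, for \(X={\mathbb {S}}^{1}\) with the geodesic metric and \(k=0\) (using the standard fact that the reduced \(\textrm{dgm}_0\) of a finite sample is given by the edge weights of a minimum spanning tree; diagrams are deduplicated after rounding):

```python
import math
import random

def geodesic_circle(a, b):
    """Geodesic distance on S^1 = [0, 2*pi] / (0 ~ 2*pi)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def sample_D_n0(n, N, seed=0):
    """Approximate D_{n,0}^{VR}(S^1) from N random n-point
    configurations; each sampled diagram is encoded by the sorted MST
    weights of the configuration (its death times)."""
    rng = random.Random(seed)
    diagrams = set()
    for _ in range(N):
        pts = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
        # Prim's algorithm; the weights are the death times of dgm_0.
        in_tree, rest, ws = {0}, set(range(1, n)), []
        while rest:
            w, q = min((min(geodesic_circle(pts[i], pts[r])
                            for i in in_tree), r) for r in rest)
            in_tree.add(q); rest.discard(q); ws.append(w)
        diagrams.add(tuple(round(w, 3) for w in sorted(ws)))
    return diagrams
```

Since geodesic distances on the circle never exceed \(\pi \), every death time produced this way is at most \(\pi \), as it must be.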

Parallelizability and memory requirements Furthermore, these calculations are eminently parallelizable and, if \(n\ll n_X\), the memory requirements for computing an estimate of \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) are substantially more modest than those of computing \(\textrm{dgm}_k^\textrm{VR}(X)\), since the boundary matrices that one needs to store in memory are several orders of magnitude smaller. We continue this discussion in Sect. 4.4, where we show datasets of increasing cardinality for which the memory used to approximate principal persistence sets remains almost constant, whereas the memory required during the computation of persistence diagrams grows to the point that the calculation cannot finish beyond a certain cardinality.

Proposition 3.17

If \(|X|=N\), the (worst case) computational cost of computing \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) is \(O\bigg (\left( {\begin{array}{c}n\\ k+2\end{array}}\right) ^\omega \, N^n\bigg )\), where \(\omega \) is the matrix multiplication exponent.

Finally, if one is only interested in the principal persistence set, a much faster geometric algorithm is available; cf. Sect. 4.1.1. See our GitHub repository [42] for a parfor-based MATLAB implementation.

3.2 Persistence Measures

We now extend the constructions in the previous section to mm-spaces.

Definition 3.18

For each filtration functor \({\mathfrak {F}}\), integers \(n \ge 1, k \ge 0\), and \(X \in {\mathcal {M}}^w\), define the (nk)-persistence measure of X as (see Definition 3.7 and Remark 3.12)

$$\begin{aligned} {\textbf{U}}_{n,k}^{{\mathfrak {F}}}(X):= \left( \textrm{dgm}_{k}^{\mathfrak {F}}\right) _\# \mu _n(X). \end{aligned}$$

We also have a stability result for these measures in terms of the Gromov–Wasserstein distance.

Theorem 3.19

Let \({\mathfrak {F}}\) be a stable filtration functor with Lipschitz constant \(L({\mathfrak {F}})\). For all \(X,Y \in {\mathcal {M}}^w\) and integers \(n \ge 1\) and \(k \ge 0\),

$$\begin{aligned} d_{{\mathcal {W}},p}^{\mathcal {D}}({\textbf{U}}_{n,k}^{{\mathfrak {F}}}(X), {\textbf{U}}_{n,k}^{{\mathfrak {F}}}(Y)) \le \frac{L({\mathfrak {F}})}{2} \cdot d_{{\mathcal {W}},p}(\mu _n(X), \mu _n(Y)) \end{aligned}$$

and, as a consequence,

$$\begin{aligned} d_{{\mathcal {W}},p}^{\mathcal {D}}({\textbf{U}}_{n,k}^{{\mathfrak {F}}}(X), {\textbf{U}}_{n,k}^{{\mathfrak {F}}}(Y)) \le L({\mathfrak {F}}) \cdot {\widehat{d}}_{\mathcal{G}\mathcal{W},p}(X,Y). \end{aligned}$$

We prove this theorem in Appendix 1.

3.2.1 Probabilistic Approximation of Persistence Sets

Regarding the idea of approximating \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) using N samples of n points, consider the persistence sets shown in Fig. 12. These figures were obtained by sampling \(N=10^6\) configurations of \(n=4\) points uniformly at random from \({\mathbb {S}}^{1}\), \({\mathbb {S}}^{2}\), and from the torus \({\mathbb {T}}^2 = {\mathbb {S}}^{1} \times {\mathbb {S}}^{1}\). Observe that the analytical graph of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) in Fig. 7 and the approximation in the leftmost panel of Fig. 12 are very similar. Their similarity indicates that using \(N=10^6\) was more than enough to get a good approximation of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\).

More generally, consider an mm-space \((X, d_X, \mu _X)\). Let \({\textbf{x}}_1, \dots , {\textbf{x}}_N \in X^n\) be i.i.d. random variables distributed according to the product measure \(\mu _X^{\otimes n}\). Using the stochastic covering theorem from [27, Thm. 34], we find a lower bound for N so that an approximation to \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) via \(\{ \textrm{dgm}_k^\textrm{VR}({\textbf{x}}_i) \}_{i=1}^N\) is \(\varepsilon \)-close (with respect to the Hausdorff distance) to \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) with probability at least p.

Now, define the function \(f_X(\varepsilon ):= \displaystyle \min _{x \in X} \mu _X \left( B_\varepsilon (x) \right) \). Recall that an mm-space X is (lower) Ahlfors regular (see Definition 3.18, page 252 of [31]) if there exist constants \(c,d>0\) such that \(f_X(\varepsilon )\ge \min (1,c\,\varepsilon ^d)\) for all \(\varepsilon >0\). In the next theorem we assume that X is Ahlfors regular.

Theorem 3.20

(Approximation of \({\textbf{K}}_n(X)\) and \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\)) Let \(n \ge 2\). Fix a confidence level \(p \in [0,1]\) and \(\varepsilon > 0\). Let \(\displaystyle N_0 =N_0(X;n,p,\varepsilon ):= \Bigg \lceil \dfrac{-\ln \left[ (1-p)f_X^n(\varepsilon /2)\right] }{f_X^n(\varepsilon /2)} \Bigg \rceil \). Then, for all \(N \ge N_0\),

  • \(d_{{\mathcal {H}}}^{{\textbf{K}}_n(X)}\left( \{ \Psi _X^{(n)}({\textbf{x}}_i) \}_{i=1}^N, {\textbf{K}}_n(X) \right) \le \varepsilon \) with probability \(\ge p\).

  • \(d_{{\mathcal {H}}}^{\mathcal {D}}\left( \{ \textrm{dgm}_k^\textrm{VR}({\textbf{x}}_i) \}_{i=1}^N, {\textbf{D}}_{n,k}^{\textrm{VR}}(X) \right) \le \varepsilon \) with probability \(\ge p\).

Furthermore, the estimators \(\{ \Psi _X^{(n)}({\textbf{x}}_i) \}_{i=1}^N\) and \(\{ \textrm{dgm}_k^\textrm{VR}({\textbf{x}}_i) \}_{i=1}^N\) converge to \({\textbf{K}}_n(X)\) and \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\), respectively, almost surely as \(N \rightarrow \infty \).

See Appendix 1 for the proof.
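For concreteness, the sample-size bound \(N_0\) of Theorem 3.20 can be evaluated directly once a lower bound for \(f_X\) is available. The following Python sketch is ours; it assumes lower Ahlfors regularity with illustrative constants c and d, so that \(f_X(\varepsilon /2)\ge \min (1, c\,(\varepsilon /2)^d)\).

```python
import math

def N0(n, p, eps, c, d):
    """Sample-size bound N_0(X; n, p, eps) from Theorem 3.20, assuming X is
    lower Ahlfors regular with (illustrative) constants c, d, so that
    f_X(eps/2) >= min(1, c * (eps/2)**d)."""
    f = min(1.0, c * (eps / 2) ** d)   # lower bound for f_X(eps/2)
    fn = f ** n                        # f_X^n(eps/2)
    return math.ceil(-math.log((1 - p) * fn) / fn)
```

As expected, the bound grows as the confidence level p approaches 1 or as \(\varepsilon \) shrinks.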

4 Vietoris–Rips Principal Persistence Sets

From this point on, we focus on the Vietoris–Rips persistence sets \({\textbf{D}}_{n,k}^{\textrm{VR}}\) with \(n=2k+2\). The reason to do so is Theorem 4.4, which states that the k-dimensional persistence diagram of \(\textrm{VR}_{*}(X)\) is empty if \(|X| < 2k+2\) and has at most one point if \(|X|=2k+2\). What this means for persistence sets \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) is that given a fixed k, the first interesting choice of n is \(n=2k+2\). We prove this fact in Sect. 4.1 and then use it to construct a graphical representation of \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) in Sect. 4.2. We also present the results of a classification experiment in Sect. 4.3 and a comparison of the computational resources consumed by persistence sets and persistence diagrams in Sect. 4.4.

4.1 Some Properties of VR-Filtrations and Their Persistence Diagrams

Let X be a finite metric space with n points. The highest dimensional simplex of \(\textrm{VR}_*(X)\) has dimension \(n-1\), but even if \(\textrm{VR}_*(X)\) contains k-dimensional simplices, it won’t necessarily produce persistent homology in dimension k. The first definition of this section is inspired by the structure of the cross-polytope \({\mathfrak {B}}_{m}\); see Fig. 8. Recall that a set \(\sigma \subset V = \{ \pm e_1, \dots , \pm e_m \}\) is a face if it doesn’t contain both \(e_i\) and \(-e_i\). In particular, there is an edge between \(e_i\) and every other vertex except \(-e_i\). The next definition tries to emulate this phenomenon in \(\textrm{VR}_{*}(X)\).

Definition 4.1

Let \((X, d_X)\) be a finite metric space, \(A \subset X\), and fix \(x_0 \in X\). Find two distinct points \(x_1, x_2 \in A\) such that \(d_X(x_0,x_1) \ge d_X(x_0,x_2) \ge d_X(x_0,a)\) for all \(a \in A {\setminus } \{x_1,x_2\}\). Define

$$\begin{aligned} t_d(x_0,A) := d_X(x_0,x_1), \text { and } t_b(x_0,A) := d_X(x_0,x_2). \end{aligned}$$

We set \(v_d(x_0,A):= x_1\). When \(A=X\) and there is no risk of confusion, we will denote \(t_b(x_0,X) \), \(t_d(x_0,X) \), and \(v_d(x_0,X)\) simply as \(t_b(x_0) , t_d(x_0) \), and \(v_d(x_0)\), respectively. Also define

$$\begin{aligned} t_b(X) := \max _{x \in X} t_b(x,X) \,\,\,\text{ and }\,\,\, t_d(X) := \min _{x \in X} t_d(x,X) . \end{aligned}$$

In a few words, \(t_d(x) \ge t_b(x) \) are the two largest distances between x and any other point of X. The motivation behind these choices is that if r satisfies \(t_b(x) \le r < t_d(x) \), then \(\textrm{VR}_r(X)\) contains all edges between x and all other points of X, except for \(v_d(x)\). If this holds for all \(x \in X\), then \(\textrm{VR}_r(X)\) is isomorphic to a cross-polytope. Note also that \(t_d(X) \) is the radius \(\textbf{rad}(X)\) of X (cf. Definition 2.1) and that, according to [52, Prop. 9.6], the death time of any interval in \(\textrm{dgm}_*(X)\) is bounded by \(\textbf{rad}(X)\).
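Definition 4.1 translates directly into code. The following Python sketch (the function name and the distance-matrix representation are ours) computes \(t_b(x,X) \), \(t_d(x,X) \), a farthest point \(v_d(x,X)\), and the global quantities \(t_b(X) \) and \(t_d(X) \).

```python
def t_values(D):
    """Given a symmetric distance matrix D (list of lists) for a finite
    metric space X, return the per-point lists t_b(x), t_d(x), v_d(x) of
    Definition 4.1 together with t_b(X) = max_x t_b(x) and
    t_d(X) = min_x t_d(x)."""
    n = len(D)
    tb, td, vd = [], [], []
    for i in range(n):
        # the other points, sorted by decreasing distance to x_i
        order = sorted((j for j in range(n) if j != i), key=lambda j: -D[i][j])
        td.append(D[i][order[0]])   # largest distance from x_i
        tb.append(D[i][order[1]])   # second largest distance from x_i
        vd.append(order[0])         # a farthest point v_d(x_i)
    return tb, td, vd, max(tb), min(td)
```

On the vertex set of a unit square (sides 1, diagonals \(\sqrt{2}\)) this yields \(t_b(X) =1\), \(t_d(X) =\sqrt{2}\), and \(v_d\) pairs each vertex with the opposite one, in agreement with Lemma 4.2.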

As defined above, \(v_d(x)\) need not be well defined, since the point farthest from x may not be unique. However, it is well defined in the case that interests us.

Lemma 4.2

Let \((X,d_X)\) be a finite metric space and suppose that \(t_b(X) < t_d(X) \). Then \(v_d:X\rightarrow X\) is well defined and \(v_d\circ v_d= \textrm{id}\).

Proof

Given a point \(x \in X\), suppose there exist \(x_1 \ne x_2 \in X\) such that \(d_X(x,x_1) = d_X(x,x_2) \ge d_X(x,x')\) for all \(x' \in X\). Since \(t_b(x) \) and \(t_d(x) \) are the two largest distances between x and any \(x' \in X\), we have \(t_b(x) = t_d(x) \). However, this implies \(t_d(X) \le t_d(x) = t_b(x) \le t_b(X) \), which contradicts the hypothesis \(t_b(X) < t_d(X) \). Thus, we have a unique choice of \(v_d(x)\) for every \(x \in X\).

For the second claim, suppose that \(v_d^2(x):=v_d(v_d(x)) \ne x\). Then \(t_d(v_d(x)) = d_X(v_d(x), v_d^2(x)) \ge d_X(v_d(x), x)\). Hence, the second largest distance \(t_b(v_d(x)) \) is at least \(d_X(v_d(x), x)\). However,

$$\begin{aligned} t_d(X) \le t_d(x) = d_X(x, v_d(x)) \le t_b(v_d(x)) \le t_b(X) , \end{aligned}$$

which is, again, a contradiction. Thus, \(v_d^2(x) = x\). \(\square \)

Under these conditions, we can produce the claimed isomorphism between \(\textrm{VR}_r(X)\) and a cross-polytope.

Proposition 4.3

Let \((X,d_X)\) be a metric space with \(|X|=n\), where \(n \ge 2\) is even, and suppose that \(t_b(X) < t_d(X) \). Let \(k=\frac{n}{2}-1\). Then \(\textrm{VR}_{r}(X)\) is isomorphic, as a simplicial complex, to the cross-polytope \({\mathfrak {B}}_{k+1}\) for all \(r \in [t_b(X) ,t_d(X) )\).

Proof

Let \(r \in [t_b(X) , t_d(X) )\). Lemma 4.2 implies that we can partition X into \(k+1\) pairs \(\{x_i^+, x_i^-\}\) such that \(x_i^- = v_d(x_i^+)\), so define \(f:\{\pm e_1, \dots , \pm e_{k+1}\} \rightarrow X\) as \(f(\varepsilon \cdot e_i) = x_i^\varepsilon \), for \(\varepsilon = \pm 1\). Both cross-polytopes and Vietoris–Rips complexes are flag complexes, so it’s enough to verify that f induces an isomorphism of their 1-skeleta. Indeed, for any \(i=1,\dots ,k+1\), \(\varepsilon =\pm 1\), and \(x \ne x_i^{-\varepsilon }\), we have \(d_X(x_i^\varepsilon ,x) \le t_b(x_i^\varepsilon ) \le t_b(X) \le r < t_d(X) \le t_d(x_i^\varepsilon ) = d_X(x_i^+, x_i^-)\). Thus, \(\textrm{VR}_r(X)\) contains the edges \([x_i^\varepsilon , x]\) for \(x \ne x_i^{-\varepsilon }\), but not \([x_i^+, x_i^-]\). Since \(f(\varepsilon \cdot e_i) = x_i^\varepsilon \), f sends the simplices \([\varepsilon \cdot e_i, v]\) to the simplices \([x_i^\varepsilon , f(v)]\) and the non-simplex \([e_i, -e_i]\) to the non-simplex \([x_i^+, x_i^-]\). \(\square \)

A consequence of the previous proposition is that \(H_k(\textrm{VR}_{r}(X)) \simeq H_k({\mathfrak {B}}_{k+1}) = {\mathbb {F}}\) for \(r \in [t_b(X) , t_d(X) )\). It turns out that \(n=2k+2\) is the minimum number of points that X needs to have in order to produce persistent homology in dimension k, which is what we prove next. The proof is inspired by the use of the Mayer–Vietoris sequence to find \(H_k({\mathbb {S}}^{k})\) by splitting \({\mathbb {S}}^{k}\) into two hemispheres that intersect in an equator \({\mathbb {S}}^{k-1}\). Since the hemispheres are contractible, the Mayer–Vietoris sequence produces an isomorphism \(H_k({\mathbb {S}}^{k}) \simeq H_{k-1}({\mathbb {S}}^{k-1})\). We emulate this by splitting \(\textrm{VR}_r(X)\) into two halves which, under the right circumstances, are contractible and find the k-th persistent homology of \(\textrm{VR}_*(X)\) in terms of the \((k-1)\)-dimensional persistent homology of a subcomplex.
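The combinatorial content of Proposition 4.3 can be verified on a small example. The following Python sketch (our own test case, not from the paper) uses a regular hexagon inscribed in the unit circle, where the pairwise Euclidean distances are 1 (sides), \(\sqrt{3}\) (short diagonals), and 2 (antipodal pairs), so \(t_b(X) =\sqrt{3} < t_d(X) =2\).

```python
import math

# Regular hexagon inscribed in the unit circle. For any r in [sqrt(3), 2),
# the 1-skeleton of VR_r(X) should be the complete graph on 6 vertices
# minus the perfect matching of antipodal pairs, i.e. the 1-skeleton of
# the cross-polytope B_3 (Proposition 4.3 with n = 6, k = 2).
pts = [(math.cos(math.pi * i / 3), math.sin(math.pi * i / 3)) for i in range(6)]
dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])

r = 1.9  # any r in [sqrt(3), 2) works
edges = {(i, j) for i in range(6) for j in range(i + 1, 6)
         if dist(pts[i], pts[j]) <= r}
missing = {(i, j) for i in range(6) for j in range(i + 1, 6)} - edges
# The only missing edges are the three antipodal pairs {i, i+3}.
```

Since both complexes are flag complexes, checking the 1-skeleta suffices, exactly as in the proof above.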

Two related results appear in [3, 21, 48]. The first two references prove that a flag complex with non-trivial \(H_k\) has at least \(2k+2\) vertices (Lemma 5.3 in [48] and Proposition 5.4 in [3]); case (A.) in our Theorem 4.4 is a consequence of this fact. The decomposition \(\textrm{VR}_r(X) = \textrm{VR}_r(B_0) \cup \textrm{VR}_r(B_1)\) (see the proof for the definition of \(B_0\) and \(B_1\)) already appears as Proposition 2.2 in the appendix of [21]. The novelty in the next theorem is the characterization of the persistence diagram \(\textrm{dgm}_k^\textrm{VR}(X)\) in terms of \(t_b(X) \) and \(t_d(X) \).

Theorem 4.4

Let \((X, d_X)\) be a metric space with n points. Then:

  1. A.

    For all integers \(k > \frac{n}{2}-1\), \(\textrm{dgm}_k^\textrm{VR}(X)=\emptyset \).

  2. B.

    If n is even and \(k = \frac{n}{2}-1\), then \(\textrm{dgm}_k^\textrm{VR}(X)\) consists of a single point \((t_b(X) , t_d(X) )\) if and only if \(t_b(X) < t_d(X) \), and is empty otherwise.

Example 4.5

(The conclusion of Theorem 4.4 when \(n=4\)) Let us consider the case \(k=1\) and \(n=4\). Let \(X = \{x_1, x_2, x_3, x_4\}\) as shown in Fig. 11. In order for \(\textrm{dgm}_1^\textrm{VR}(X)\) to be non-empty, \(\textrm{VR}_r(X)\) has to contain all the “outer edges” and none of the “diagonals”. That is, there exists \(r>0\) such that

$$\begin{aligned} d_{12}, d_{23}, d_{34}, d_{41} \le r < d_{13}, d_{24}. \end{aligned}$$

On the other hand, the calculations in Table 1 yield \(t_b(X) =\max (d_{12}, d_{23}, d_{34}, d_{41})\), \(t_d(X) =\min (d_{13}, d_{24})\). We also have \(v_d(x_1)=x_3\), \(v_d(x_2)=x_4\), and \(v_d\circ v_d= \textrm{id}\). In either case, \(\textrm{dgm}_1^\textrm{VR}(X) = \bigg (\max (d_{12}, d_{23}, d_{34}, d_{41}), \min (d_{13}, d_{24}) \bigg )\).

Fig. 11

A generic metric space with 4 points. In order for \({\text {PH}}_1^\textrm{VR}(X)\) to be non-zero, the two diagonals should be larger than the outer edges.

However, if we had \(d_{12}, d_{23}, d_{34}< d_{24}< d_{41} < d_{13}\) for example, then the 2-simplex \([x_2, x_3, x_4]\) appears before the would-be generator \([x_1,x_2]+[x_2,x_3]+[x_3,x_4]+[x_4,x_1]\), so \(\textrm{dgm}_1^\textrm{VR}(X) = \emptyset \). According to Table 2, \(t_b(X) =d_{41} > d_{24}=t_d(X) \), and \(v_d(x_2)=x_4\) but \(v_d(x_4)=x_1 \ne x_2\).

Table 1 \(t_b(x_i) \) and \(t_d(x_i) \) when the sides of the quadrilateral X are smaller than the diagonals
Table 2 \(t_b(x_i) \) and \(t_d(x_i) \) when the side \(d_{41}\) of the quadrilateral X is larger than the diagonal \(d_{24}\)

In general, we want to partition X into pairs of “opposite” points, that is, pairs \(x, y\) such that \(v_d(x)=y\) and \(v_d(y)=x\). Intuitively, this says that the diagonals are larger than every other edge. If not, as in the second case, then no persistence is produced. For \(k=1\) and \(n=4\), we will generally label the points as \(x_1,x_2,x_3,x_4\) in such a way that

$$\begin{aligned} t_b(X) =\max (d_{12}, d_{23}, d_{34}, d_{41}) \text { and } t_d(X) =\min (d_{13}, d_{24}). \end{aligned}$$
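With this labeling convention, the \(n=4\) case of Theorem 4.4 becomes a two-line computation. A small Python sketch (ours; the assignment of sides \(d_{12}, d_{23}, d_{34}, d_{41}\) and diagonals \(d_{13}, d_{24}\) is assumed given, as in the convention just described):

```python
import math

def dgm1_four_points(d12, d23, d34, d41, d13, d24):
    """dgm_1^VR of a 4-point space, per Theorem 4.4 (n = 4, k = 1):
    the single point (max(sides), min(diagonals)) when
    max(sides) < min(diagonals), and the empty diagram (None) otherwise."""
    tb = max(d12, d23, d34, d41)   # t_b(X)
    td = min(d13, d24)             # t_d(X)
    return (tb, td) if tb < td else None

# Unit square: sides 1, diagonals sqrt(2) -> single point (1, sqrt(2)).
# If a side exceeds a diagonal (as in Table 2), the diagram is empty.
```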

Proof of Theorem 4.4

The proof is by induction on n. Recall that \({\text {PH}}^\textrm{VR}_k(X)\) denotes the reduced homology of the VR-complex \({\widetilde{H}}_k(\textrm{VR}_*(X))\). If \(n=1\), \(\textrm{VR}_{r}(X)\) is contractible for all r, and so \({\text {PH}}^\textrm{VR}_k(X) = 0\) for all \(k \ge 0 > \frac{n}{2}-1\). If \(n=2\), let \(X=\{x_0, x_1\}\). The space \(\textrm{VR}_{r}(X)\) is two discrete points when \(r \in [0,\textbf{diam}(X))\) and an interval when \(r \ge \textbf{diam}(X)\). Then \({\text {PH}}^\textrm{VR}_k(X) = 0\) for all \(k \ge 1 > \frac{n}{2}-1\), and \({\text {PH}}^\textrm{VR}_0(X) = {\mathbb {I}}[0,\textbf{diam}(X))\). Furthermore, this interval module equals \({\mathbb {I}}[t_b(X) ,t_d(X) )\) because \(d_X(x_0,x_1) > d_X(x_0,x_0) = 0\), so \(t_b(x_0) =0\) and \(t_d(x_0) =d_X(x_0,x_1)\). The same holds for \(x_1\), so \(t_b(X) = 0\) and \(t_d(X) = d_X(x_0, x_1) = \textbf{diam}(X)\).

For the inductive step, assume that the proposition holds for every metric space with fewer than n points. Fix X with \(|X|=n\) and an integer \(k \ge \frac{n}{2}-1\). \(\textrm{VR}_{r}(X)\) is contractible when \(r \ge \textbf{diam}(X)\), so let \(r < \textbf{diam}(X)\) and choose any pair \(x_0, x_1 \in X\) such that \(d_X(x_0,x_1) = \textbf{diam}(X)\). Let \(B_j = X {\setminus } \{x_j\}\) for \(j=0,1\) and \(A = X {\setminus } \{x_0,x_1\}\). Because of the restriction on r, \(\textrm{VR}_{r}(X)\) contains no simplex \(\sigma \supset [x_0,x_1]\), so \(\textrm{VR}_{r}(X) = \textrm{VR}_{r}(B_0) \cup \textrm{VR}_{r}(B_1)\). At the same time, \(\textrm{VR}_{r}(A) = \textrm{VR}_{r}(B_0) \cap \textrm{VR}_{r}(B_1)\), so we can use the Mayer–Vietoris sequence:

$$\begin{aligned} \cdots \rightarrow {\widetilde{H}}_{k}(\textrm{VR}_{r}(B_0)) \oplus {\widetilde{H}}_{k}(\textrm{VR}_{r}(B_1)) \rightarrow {\widetilde{H}}_{k}(\textrm{VR}_{r}(X)) \xrightarrow {\partial _*} {\widetilde{H}}_{k-1}(\textrm{VR}_{r}(A)) \rightarrow {\widetilde{H}}_{k-1}(\textrm{VR}_{r}(B_0)) \oplus {\widetilde{H}}_{k-1}(\textrm{VR}_{r}(B_1)) \rightarrow \cdots \end{aligned}$$
(6)

Since \(|B_j| < n\), the induction hypothesis implies that \({\text {PH}}^\textrm{VR}_k(B_j) = 0\), and so \(\partial _*\) is injective for any r. Now we verify the two claims in the statement.

Item A.: Suppose \(k > \frac{n}{2}-1\).

Observe that \(k-1 > \frac{n-2}{2}-1\) and \(|A|=n-2\), so the induction hypothesis gives \({\text {PH}}^\textrm{VR}_{k-1}(A) = 0\). Then \({\widetilde{H}}_{k}(\textrm{VR}_{r}(X)) = 0\) for all \(r \in [0,\textbf{diam}(X)]\) and, since \(\textrm{VR}_{r}(X)\) is contractible when \(r \ge \textbf{diam}(X)\), the homology of \(\textrm{VR}_r(X)\) is still 0 for \(r \in [\textbf{diam}(X), \infty )\).

Item B.: Suppose \(k = \frac{n}{2}-1\).

By induction hypothesis, \({\text {PH}}^\textrm{VR}_{k-1}(A)\) is either a single interval \({\mathbb {I}}[t_b(A) ,t_d(A) )\) or 0 depending on whether \(t_b(A) < t_d(A) \) or not. Also define

$$\begin{aligned} b:= \max \left[ t_b(A) , \max _{a \in A} d_X(x_0,a), \max _{a \in A} d_X(x_1,a)\right] . \end{aligned}$$
(7)

We claim that \({\text {PH}}^\textrm{VR}_k(X) \cong {\mathbb {I}}[b,t_d(A) )\) if and only if \(b < t_d(A) \).

Case 1: If \(r \in [0, t_b(A) )\) or \(r \in [t_d(A) , \infty )\), then \({\widetilde{H}}_k(\textrm{VR}_{r}(X)) \cong 0\).

Since \({\text {PH}}^\textrm{VR}_{k-1}(A) \cong {\mathbb {I}}[t_b(A) ,t_d(A) )\), we have \({\widetilde{H}}_{k-1}(\textrm{VR}_{r}(A)) = 0\) for \(r \notin [t_b(A) , t_d(A) )\). Now \({\widetilde{H}}_k(\textrm{VR}_{r}(X))=0\) follows from the Mayer–Vietoris sequence.

Case 2: If \(r \in [t_b(A) ,b)\), then \({\widetilde{H}}_k(\textrm{VR}_{r}(X)) \cong 0\).

Notice that we might have \(b \ge t_d(A) \). However, the conclusion for \(r \in [t_d(A) , b)\) follows from Case 1, so we can assume \(r \in [t_b(A) , b) \cap [t_b(A) , t_d(A) )\). Additionally, if \(b=t_b(A) \), then the interval \([t_b(A) ,b)\) is empty and there is nothing to prove. Suppose, then, \(b = d_X(x_0,a_0) > t_b(A) \) for some \(a_0 \in A\). In that case, \(\textrm{VR}_r(B_1)\) doesn’t contain the 1-simplex \([x_0,a_0]\), so \(\textrm{VR}_r(A) \subset \textrm{VR}_r(B_1) \subset C(\textrm{VR}_r(A),x_0) {\setminus } [x_0,a_0]\). Additionally, since \(r \in [t_b(A) , b) \cap [t_b(A) , t_d(A) )\), \(\textrm{VR}_r(A) \simeq {\mathfrak {B}}_{k}\) by Proposition 4.3, that is, \(\textrm{VR}_r(A)\) has the homotopy type of \({\mathbb {S}}^{k-1}\). Then \(C(\textrm{VR}_r(A), x_0)\) has the homotopy type of a hemisphere of \({\mathbb {S}}^{k}\) whose equator is \(\textrm{VR}_r(A) \simeq {\mathbb {S}}^{k-1}\). Hence, \(C(\textrm{VR}_r(A),x_0) {\setminus } [x_0,a_0]\) is homotopy equivalent to a punctured hemisphere of \({\mathbb {S}}^{k}\), which strong deformation retracts onto \(\textrm{VR}_r(A)\). Thus, the composition induced by inclusions

$$\begin{aligned} {\widetilde{H}}_{k-1}(\textrm{VR}_r(A)) \rightarrow {\widetilde{H}}_{k-1}(\textrm{VR}_r(B_1)) \rightarrow {\widetilde{H}}_{k-1}(C(\textrm{VR}_r(A),x_0) \setminus [x_0,a_0]) \end{aligned}$$

is an isomorphism. This implies that the first map \({\widetilde{H}}_{k-1}(\textrm{VR}_r(A)) \rightarrow {\widetilde{H}}_{k-1}(\textrm{VR}_r(B_1))\) is injective which, in turn, makes \({\widetilde{H}}_{k-1}(\textrm{VR}_{r}(A)) \rightarrow {\widetilde{H}}_{k-1}(\textrm{VR}_{r}(B_0)) \oplus {\widetilde{H}}_{k-1}(\textrm{VR}_{r}(B_1))\) injective. Since \(\partial _*\) in (6) is also an injection, \({\widetilde{H}}_k(\textrm{VR}_{r}(X))=0\) for \(r \in [t_b(A) ,b)\).

Case 3: If \(r \in [b,t_d(A) )\), then \({\widetilde{H}}_k(\textrm{VR}_{r}(X)) \cong {\mathbb {F}}\).

The definition of b implies that \(t_b(A) \le b\), so \(t_b(A) < t_d(A) \). Then, the induction hypothesis on A implies that \({\text {PH}}^\textrm{VR}_{k-1}(A) = {\mathbb {I}}[t_b(A) ,t_d(A) )\) and, in particular, \( {\widetilde{H}}_{k-1}(\textrm{VR}_r(A)) = {\mathbb {F}}\) for \(r \in [b,t_d(A) )\). Now, since \(\max _{a \in A} d_X(x_1,a) \le b \le r\), \(\textrm{VR}_r(B_0)\) contains all simplices \([x_1,a_1,\dots ,a_m]\), where \([a_1, \dots , a_m]\) is a simplex of \(\textrm{VR}_{r}(A)\). In other words, \(\textrm{VR}_{r}(B_0) = C(\textrm{VR}_{r}(A), x_1) \simeq *\). The same holds for \(\textrm{VR}_r(B_1)\), so their homology is 0, and the Mayer–Vietoris sequence gives an isomorphism \({\widetilde{H}}_k(\textrm{VR}_{r}(X)) \xrightarrow {\sim } {\widetilde{H}}_{k-1}(\textrm{VR}_{r}(A)) \simeq {\mathbb {F}}\).

Case 4: If \(b \ge t_d(A) \), then \({\widetilde{H}}_k(\textrm{VR}_{r}(X)) \cong 0\) for all \(r \ge 0\).

The conclusion follows from Cases 1 and 2 because the interval \([b, t_d(A) )\) from Case 3 is empty and \([0, \infty ) = [0, t_b(A) ) \cup [t_b(A) , b) \cup [t_d(A) , \infty )\).

The last thing left to check is that \(\textrm{VR}_{*}(X)\) produces persistent homology precisely when \(t_b(X) < t_d(X) \). So far we have \({\text {PH}}^\textrm{VR}_k(X) = {\mathbb {I}}[b,t_d(A) )\) if and only if \(b < t_d(A) \), so now we show that \(t_b(X) < t_d(X) \) is equivalent to \(b < t_d(A) \).

Case 1: \(b < t_d(A) \) implies \(t_b(X) < t_d(X) \).

Let \(a \in A\). Since \(t_b(A) \le b < t_d(A) \), \(v_d(a,A)\) is well-defined by Lemma 4.2. Then for every \(a' \ne v_d(a,A)\), \(d_X(a,a') \le t_b(a,A) < t_d(a,A) \). Also, for \(j=0,1\), we have \(d_X(a, x_j) \le b < t_d(A) \le t_d(a,A) \) by definition of b. In other words, for every \(x \in X {\setminus } \{v_d(a,A)\}\), \(d_X(a,x) < t_d(a,A) \), which means that the point in X furthest away from a is still \(v_d(a,A) \in A\). Thus, \(t_d(a,X) = t_d(a,A) \) and \(t_b(a,X) = \max \left[ t_b(a,A) , d_X(a,x_0), d_X(a,x_1) \right] \). Additionally, \(d_X(x_0,x_1) = \textbf{diam}(X)\) and \(d_X(a,x_j) \le b < t_d(A) \le \textbf{diam}(X)\), so \(t_d(x_j,X) = \textbf{diam}(X)\), \(t_b(x_j,X) = \max _{a \in A} d_X(x_j,a)\), and \(v_d(x_0,X) = x_1\). Hence,

$$\begin{aligned} t_d(X)&= \min \left\{ t_d(x_0,X) , t_d(x_1,X) , \min _{a \in A} t_d(a,X) \right\} \\&= \min \left\{ \textbf{diam}(X), \min _{a \in A} t_d(a,A) \right\} = t_d(A) , \end{aligned}$$

and

$$\begin{aligned} b&= \max \left[ t_b(A) , \max _{a \in A} d_X(x_0,a), \max _{a \in A} d_X(x_1,a)\right] \\&= \max \left[ \max _{a \in A} t_b(a,A) , \max _{a \in A} d_X(x_0,a), \max _{a \in A} d_X(x_1,a)\right] \\&= \max \left[ \max _{a \in A} t_b(a,X) , t_b(x_0,X) , t_b(x_1,X) \right] = t_b(X) . \end{aligned}$$

In conclusion, \(t_b(X) = b < t_d(A) = t_d(X) \).

Case 2: \(b \ge t_d(A) \) implies \(t_b(X) \ge t_d(X) \).

Let \(a_0 \in A\) such that \(t_d(A) = t_d(a_0,A) \). Notice that \(t_d(a_0,X) \) can differ from \(t_d(a_0,A) \) if \(d_X(a_0,x_j) \ge d_X(a_0,v_d(a_0,A))\) for some \(j=0,1\). However, we have \(b \ge d_X(a_0,x_j)\) by definition, so b would still be at least \(t_d(a_0,X) \) even if \(t_d(a_0,X) \ne t_d(a_0,A) \). With this in mind, we have two sub-cases.

Case 2.1: \(b=t_b(A) \).

Since \(t_b(a,X) \) takes the maximum over a larger set than \(t_b(a,A) \) does, \(t_b(a,A) \le t_b(a,X) \) for all \(a \in A\). Then

$$\begin{aligned} t_b(X) \ge t_b(A) = b \ge t_d(a_0,X) \ge t_d(X) . \end{aligned}$$

Case 2.2: \(b > t_b(A) \). Write \(b = d_X(a_1,x_j)\), where \(a_1 \in A\) and j is either 0 or 1. Observe that \(t_d(x_j,X) = \textbf{diam}(X) = d_X(x_0,x_1) \ge d_X(a_1,x_j)\), so \(t_b(x_j,X) \ge d_X(a_1,x_j)\). Then

$$\begin{aligned} t_b(X) \ge t_b(x_j,X) \ge d_X(a_1,x_j) = b \ge t_d(a_0,X) \ge t_d(X) . \end{aligned}$$

This concludes the proof of Case 2. \(\square \)

4.1.1 A Geometric Algorithm for Computing \(\textrm{dgm}_k^\textrm{VR}(X)\) when \(|X|=n\) and \(k=\frac{n}{2}-1\).

Thanks to Theorem 4.4, we can compute \(\textrm{dgm}_k^\textrm{VR}(X)\) in \(O(n^2)\) time if \(|X|=n=2k+2\). Indeed, both \(t_b(x) \) and \(t_d(x) \) can be found in at most \((n-1)+(n-2) = 2n-3\) steps because finding a maximum takes as many steps as the number of entries. We compute both quantities for each of the n points in X and then find \(t_b(X) = \max _{x \in X} t_b(x) \) and \(t_d(X) = \min _{x \in X} t_d(x) \) in n steps each. After comparing \(t_b(X) \) and \(t_d(X) \), we are able to determine whether \(\textrm{dgm}_k^\textrm{VR}(X)\) is \(\{({t_b(X) },{t_d(X) })\}\) or empty in at most \(n(2n-3)+2n+1 = O(n^2)\) steps. This is a significant improvement over the linear bound (in the number of simplices) \(O\bigg (\left( {\begin{array}{c}n\\ k+2\end{array}}\right) \bigg ) = O\bigg (\left( {\begin{array}{c}n\\ n/2+1\end{array}}\right) \bigg )\) discussed in Sect. 3.1.2. We summarize this paragraph as follows:

Proposition 4.6

Let X be a metric space with n points and \(k = \frac{n}{2}-1\). The cost of computing \(t_b(X) \) and \(t_d(X) \) as in Definition 4.1 is \(O(n^2)\).

A parfor-based MATLAB implementation is provided in our GitHub repository [42].
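For reference, the \(O(n^2)\) procedure can be sketched in a few lines of Python (this transcription is ours; the implementation in [42] is in MATLAB).

```python
def principal_dgm(D):
    """O(n^2) computation of dgm_k^VR(X) for |X| = n = 2k+2 (Theorem 4.4):
    scan each row of the distance matrix D for its two largest entries,
    then compare t_b(X) = max_x t_b(x) with t_d(X) = min_x t_d(x).
    Returns the single point (t_b(X), t_d(X)), or None for the empty
    diagram."""
    n = len(D)
    tb_X, td_X = 0.0, float("inf")
    for i in range(n):
        largest = second = 0.0
        for j in range(n):
            if j == i:
                continue
            if D[i][j] >= largest:
                largest, second = D[i][j], largest
            elif D[i][j] > second:
                second = D[i][j]
        tb_X = max(tb_X, second)   # t_b(x_i) = second largest distance
        td_X = min(td_X, largest)  # t_d(x_i) = largest distance
    return (tb_X, td_X) if tb_X < td_X else None
```

On the vertex set of a unit square this returns \((1,\sqrt{2})\); on four equally spaced collinear points it returns the empty diagram, since there \(t_b(X) = t_d(X) \).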

4.2 The Definition of VR-Principal Persistence Sets

Theorem 4.4 has two consequences for \(\textrm{VR}\)-persistence sets. The first is the following corollary.

Corollary 4.7

Let X be any metric space. Given \(k \ge 0\) fixed, \({\textbf{D}}_{n,k}^{\textrm{VR}}(X)\) is empty for all \(n < 2k+2\).

This means that the first interesting choice of n is \(n=2k+2\), and in that case, any sample \(Y \subset X\) with \(|Y|=n\) will produce at most one point in its persistence diagram. This case will be the focus of the rest of the paper, so we give it a name.

Definition 4.8

\({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) and \({\textbf{U}}_{2k+2,k}^{\textrm{VR}}(X)\) are called, respectively, the Vietoris–Rips principal persistence set and the principal persistence measure of X in dimension k.

Let \(k \ge 0\) and \(n=2k+2\). Notice that the results in Sect. 3.1.2 imply the worst case bound \(O(n^{k+2} \cdot N)\) for approximating principal persistence sets with N samples (cf. page 18). We improve this bound via the algorithm from Sect. 4.1.1.

Corollary 4.9

Let X be a metric space. Fix \(k \ge 0\) and \(n=2k+2\). The cost of approximating \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) with N samples is \(O(n^2 \cdot N)\).
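A Monte Carlo approximation in the spirit of Corollary 4.9 can be sketched as follows for \(X={\mathbb {S}}^{1}\) with N 4-tuples (Python, our own illustrative code; the per-sample cost is \(O(n^2)\) with \(n=4\)).

```python
import math, random

def circle_dist(a, b):
    """Geodesic distance on the unit circle, points given as angles."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def approx_principal_set(N, seed=0):
    """Monte Carlo approximation of D_{4,1}^VR(S^1): sample N 4-tuples of
    points uniformly at random and keep the non-empty diagrams."""
    rng = random.Random(seed)
    out = []
    for _ in range(N):
        pts = [rng.uniform(0, 2 * math.pi) for _ in range(4)]
        D = [[circle_dist(p, q) for q in pts] for p in pts]
        tb_X, td_X = 0.0, float("inf")
        for row in D:
            s = sorted(row, reverse=True)  # two largest distances per row
            td_X = min(td_X, s[0])
            tb_X = max(tb_X, s[1])
        if tb_X < td_X:
            out.append((tb_X, td_X))
    return out
```

Every point produced this way satisfies \(0 \le t_b < t_d \le \pi = \textbf{rad}({\mathbb {S}}^{1})\), and (as in Fig. 12) only a minority of the sampled configurations yield a non-diagonal point.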

The fact that the diagrams in \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) have at most one point allows us to visualize principal persistence sets as subsets of points in \({\mathbb {R}}^2\) (cf. Figure 4), and also to recast their properties as properties of these subsets of \({\mathbb {R}}^2\).

Definition 4.10

Let \({\mathcal {D}}_1:= \{D \in {\mathcal {D}}\mid |D| \le 1\}\) and \(\Delta _0^+:= \{ (x,y) \in {\mathbb {R}}^2 \mid 0< x < y \text { or } x=y=0\}\). Define \(\Phi :{\mathcal {D}}_1 \rightarrow \Delta _0^+\) by \(\Phi (\emptyset ) = (0,0)\) and \(\Phi (\{(t_b,t_d)\}) = (t_b,t_d)\).

The immediate use of \(\Phi \) is to visualize principal persistence sets as subsets of \({\mathbb {R}}^2\) by plotting \(\Phi \bigg ({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\bigg )\) (as we do in Fig. 12). Additionally, via the map \(\Phi \) we can also import principal persistence measures, metrics, and the stability of principal persistence sets into easier concepts involving \({\mathbb {R}}^2\). For example, we can visualize the pushforward measure \(\Phi _\# {\textbf{U}}_{2k+2,k}^{\textrm{VR}}(X)\) by coloring its support \(\Phi \bigg ({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\bigg )\) according to density. See Fig. 12.

If we define the metric \(d_{{\mathcal {B}}}\) on \(\Delta _0^+\) by

$$\begin{aligned} \textstyle d_{{\mathcal {B}}}\bigg ( (x,y), (x',y') \bigg ):= \min \left\{ \max (|x-x'|, |y-y'|), \frac{1}{2}\max (y-x, y'-x') \right\} \end{aligned}$$
(8)

then the map \(\Phi \) is an isometry between \(({\mathcal {D}}_1,d_{{\mathcal {D}}})\) and \((\Delta _0^+,d_{{\mathcal {B}}})\). It follows that the stability Theorems 3.13 and 3.19 can be rephrased (in a way that will be immediately useful in Sect. 4.3.1) as follows.
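The metric (8) admits a direct transcription (Python, ours): two at-most-one-point diagrams are matched either point-to-point or by pushing each point to the diagonal, whichever is cheaper.

```python
def d_B(P, Q):
    """The metric (8) on Delta_0^+. P and Q are pairs (t_b, t_d);
    the empty diagram is represented by (0, 0), as Phi prescribes."""
    (x, y), (u, v) = P, Q
    return min(max(abs(x - u), abs(y - v)),       # match the two points
               0.5 * max(y - x, v - u))           # or kill both on the diagonal
```

For instance, the distance from the empty diagram \((0,0)\) to the point \((1,3)\) is \(\min \{3, 1\} = 1\): killing the point on the diagonal (cost half its persistence) is cheaper than matching it to \((0,0)\).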

Theorem 4.11

Let \(X,Y \in {\mathcal {M}}\). For any \(k \ge 0\),

$$\begin{aligned} d_{{\mathcal {H}}}^{\Delta _0^+}(\Phi \circ {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X), \Phi \circ {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(Y)) \le d_{{\mathcal {H}}}({\textbf{K}}_n(X), {\textbf{K}}_n(Y)) \le 2 \cdot {\widehat{d}}_\mathcal{G}\mathcal{H}(X,Y), \end{aligned}$$

and

$$\begin{aligned} d_{{\mathcal {W}},p}^{\Delta _0^+}(\Phi _\# {\textbf{U}}_{2k+2,k}^{\textrm{VR}}(X), \Phi _\# {\textbf{U}}_{2k+2,k}^{\textrm{VR}}(Y)) \le d_{{\mathcal {W}},p}(\mu _n(X), \mu _n(Y)) \le 2 \cdot {\widehat{d}}_{\mathcal{G}\mathcal{W},p}(X,Y), \end{aligned}$$

where \(d_{{\mathcal {H}}}^{\Delta _0^+}\) and \(d_{{\mathcal {W}},p}^{\Delta _0^+}\) denote the Hausdorff and p-Wasserstein distances defined on \((\Delta _0^+, d_{{\mathcal {B}}})\).

Remark 4.12

To reduce notational overload, we will simply write \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) and \({\textbf{U}}_{2k+2,k}^{\textrm{VR}}(X)\) instead of \(\Phi \circ {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) and \(\Phi _\# {\textbf{U}}_{2k+2,k}^{\textrm{VR}}(X)\). Additionally, whenever referring to distances between points in a given \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) or between two sets \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(\cdot )\) or between measures \({\textbf{U}}_{2k+2,k}^{\textrm{VR}}(\cdot )\), we will invoke the metrics \(d_{{\mathcal {B}}}\), \(d_{{\mathcal {H}}}^{\Delta _0^+}\), and \(d_{{\mathcal {W}},p}^{\Delta _0^+}\) described above.

Example 4.13

Fig. 12 shows computational approximations to the principal persistence measure \({\textbf{U}}_{4,1}^{\textrm{VR}}\) of \({\mathbb {S}}^{1}, {\mathbb {S}}^{2}\), and \({\mathbb {T}}^2:= {\mathbb {S}}^{1} \times {\mathbb {S}}^{1}\). The spheres are equipped with their usual Riemannian metrics \(d_{{\mathbb {S}}^{1}}\) and \(d_{{\mathbb {S}}^{2}}\) respectively. As for the torus, we used the \(\ell ^2\) product metric defined as

$$\begin{aligned} d_{{\mathbb {T}}^2}\left( (\theta _1,\theta _2), (\theta _1',\theta _2') \right) := \sqrt{ \left( d_{{\mathbb {S}}^{1}}(\theta _1,\theta _1') \right) ^2 + \left( d_{{\mathbb {S}}^{1}}(\theta _2,\theta _2') \right) ^2}, \end{aligned}$$

for all \((\theta _1,\theta _2), (\theta _1',\theta _2') \in {\mathbb {T}}^2\). The diagrams were computed with the algorithm in Sect. 4.1.1 implemented in MATLAB using \(10^6\) 4-tuples of points sampled uniformly at random. The calculations took 12.11 seconds for the circle, 20.08 seconds for the sphere, and 25.96 seconds for the torus. The fractions of configurations that produced a non-diagonal point were 11.08 % for the circle, 12.63 % for the sphere, and 14.80 % for the torus.
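The \(\ell ^2\) product metric used here is straightforward to implement; a Python sketch (ours), with points of \({\mathbb {T}}^2\) given as pairs of angles:

```python
import math

def s1(a, b):
    """Geodesic distance on the unit circle (angles in radians)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def d_T2(p, q):
    """The l^2 product metric on T^2 = S^1 x S^1 from Example 4.13."""
    return math.hypot(s1(p[0], q[0]), s1(p[1], q[1]))
```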

In these graphs we observe the functoriality property \({\textbf{D}}_{n,k}^{\textrm{VR}}(X) \subset {\textbf{D}}_{n,k}^{\textrm{VR}}(Y)\) whenever \(X \hookrightarrow Y\) (see Remark 3.11). Notice that \({\mathbb {S}}^{1}\) embeds into \({\mathbb {S}}^{2}\) as the equator, and as slices \({\mathbb {S}}^{1} \times \{x_0\}\) and \(\{x_0\} \times {\mathbb {S}}^{1}\) in \({\mathbb {T}}^2\). The effect on the persistence sets is that a copy of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) appears in both \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2})\) and \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {T}}^2)\).

Fig. 12

From left to right: computational approximations to the 1-dimensional persistence measures \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}), {\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2})\), and \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {T}}^2)\). The colors represent the density of points in the diagram. The support of each measure (that is, the colored region) is the persistence set \({\textbf{D}}_{4,1}^{\textrm{VR}}\) of the corresponding metric space. Notice how these results agree with the functoriality property (cf. Remark 3.11): namely, that the persistence set of \({\mathbb {S}}^{1}\) is a subset of the respective persistence sets of \({\mathbb {S}}^{2}\) and \({\mathbb {T}}^2\) (see Example 4.13).

4.3 Discriminating Power of VR-Principal Persistence Sets

In this section, we study the discriminating power of principal persistence sets in two synthetic examples and in one practical dataset. In the first example, we see that \({\textbf{D}}_{4,1}^{\textrm{VR}}(R)\) correlates with the “size” of the hole of a rectangle \(R \subset {\mathbb {R}}^2\). The second example shows that \({\textbf{D}}_{6,2}^{\textrm{VR}}\) can distinguish a flat torus from a rectangle. Lastly, we show that various metrics induced by persistence sets (or persistence measures) can classify the 3D shapes from [80] with classification error as low as 7.38 %.

Example 4.14

(\({\textbf{D}}_{6,2}^{\textrm{VR}}\) can distinguish the torus from a rectangle) Let \(R>0\), and define \(S_R:= \frac{R}{\pi } \cdot {\mathbb {S}}^{1}\) to be the circle with geodesic distance rescaled to have perimeter 2R. Define the rectangle \(Q_{R_1, R_2}:= [0,2R_1] \times [0,2R_2] \subset {\mathbb {R}}^2\) and the torus \(T_{R_1,R_2}:= S_{R_1} \times S_{R_2}\), and equip both spaces with the \(\ell ^p\) product metric for some \(p \ge 1\). Inspired by the observation that \(H_2(T_{R_1,R_2}) \cong {\mathbb {F}}\) and \(H_2(Q_{R_1,R_2}) \cong 0\), we ask if \({\textbf{D}}_{6,2}^{\textrm{VR}}\) can distinguish \(T_{R_1, R_2}\) from \(Q_{R_1, R_2}\). Figure 13 shows experimental approximations to \({\textbf{D}}_{6,2}^{\textrm{VR}}(T_{R_1, R_2})\) and \({\textbf{D}}_{6,2}^{\textrm{VR}}(Q_{R_1, R_2})\) for several values of \(R_1\) and \(R_2\), and different \(\ell ^p\) metrics. The diagrams were obtained by uniformly sampling 1,000,000 6-point subsets from each space.

Regardless of the choice of parameters, the approximations of \({\textbf{D}}_{6,2}^{\textrm{VR}}(Q_{R_1, R_2})\) have almost no points, while those of \({\textbf{D}}_{6,2}^{\textrm{VR}}(T_{R_1, R_2})\) have a significant number of non-diagonal points. It is important to note that the diagrams \({\textbf{D}}_{6,2}^{\textrm{VR}}(Q_{R_1, R_2})\) with the \(\ell ^2\) metric have more points than are shown here. For instance, both \(Q_{1,1}\) and \(Q_{1,3}\) contain a circle of radius 1, so \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{1}_E) \subset {\textbf{D}}_{6,2}^{\textrm{VR}}(Q_{1,1}) \subset {\textbf{D}}_{6,2}^{\textrm{VR}}(Q_{1,3})\) (cf. Theorem 5.4). However, these examples show that the measures \({\textbf{U}}_{6,2}^{\textrm{VR}}(Q_{R_1,R_2})\) and \({\textbf{U}}_{6,2}^{\textrm{VR}}(T_{R_1,R_2})\) induced by the uniform measures on the respective spaces are different. Lastly, it is interesting to note that these computations require fewer points (6) than the number of vertices (7) in a minimal simplicial complex homeomorphic to the torus. See, for instance, Theorem 1 of [53].

\({\textbf{D}}_{4,1}^{\textrm{VR}}\) can also tell apart the torus and the rectangle. We will see in Proposition 5.17 that any \((t_b, t_d) \in {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^2)\) satisfies \(t_d \le \sqrt{2} t_b\). This holds, in particular, for any \((t_b, t_d) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(Q_{R_1,R_2})\). In contrast, the set \(X = \{(0,0), (R_1/2, 0), (R_1, 0), (3R_1/ 2, 0)\} \subset T_{R_1,R_2}\) satisfies \(t_b(X) = R_1/2\) and \(t_d(X) = R_1\), so that \(t_d(X) = 2t_b(X) > \sqrt{2} t_b(X) \).
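This 4-point configuration can be checked numerically; a Python sketch (ours, with \(R_1=1\) for concreteness), working in arc-length coordinates on the circle factor \(S_{R_1}\) of the torus:

```python
import math

def circ(a, b, per):
    """Geodesic distance on a circle of perimeter `per` (arc-length coords)."""
    d = abs(a - b) % per
    return min(d, per - d)

# Four equally spaced points on the slice S_{R1} x {pt} of T_{R1,R2}.
# With R1 = 1 they form a "square" with sides 1/2 and diagonals 1, so
# t_b(X) = 1/2 and t_d(X) = 1 = 2 t_b(X) > sqrt(2) t_b(X), violating the
# planar constraint t_d <= sqrt(2) t_b of Proposition 5.17.
R1 = 1.0
arcs = [0.0, R1 / 2, R1, 3 * R1 / 2]
D = [[circ(a, b, 2 * R1) for b in arcs] for a in arcs]
tb = max(sorted(row, reverse=True)[1] for row in D)  # t_b(X)
td = min(max(row) for row in D)                      # t_d(X)
```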

Fig. 13

Diagrams \({\textbf{D}}_{6,2}^{\textrm{VR}}\) for the torus \(T_{R_1, R_2} = \frac{R_1}{\pi } \cdot {\mathbb {S}}^{1} \times \frac{R_2}{\pi } \cdot {\mathbb {S}}^{1}\) and the rectangle \(Q_{R_1, R_2} = [0, 2R_1] \times [0, 2R_2]\) equipped with the \(\ell ^p\) product metric for \(p=2, \infty \) (see Example 4.14). Out of the 1,000,000 configurations sampled from \(Q_{R_1, R_2}\), the percentage that produced a non-diagonal point are, from left to right, \(0.01\ \%\), \(0.00\ \%\), \(0.00\ \%\), and \(0.00\ \%\). For \(T_{R_1, R_2}\), the percentages are \(2.20\ \%\), \(2.00\ \%\), \(2.86\ \%\), and \(1.99\ \%\).

Example 4.15

(Sampling effects) The following two experiments illustrate how sampling affects persistence sets and persistence diagrams. For both experiments we used \(N=10^5\) tuples when estimating persistence sets.

The first experiment. We consider the metric gluing \({\mathbb {S}}^1 \vee {\mathbb {S}}^2\), where each sphere is given its own geodesic metric. For a given parameter value \(0 \le p \le 1\), we sample a set \(X \subset {\mathbb {S}}^1 \vee {\mathbb {S}}^2\) with 1000 i.i.d. points as follows: with probability p the point is sampled uniformly at random from \({\mathbb {S}}^1\) and with probability \(1-p\) the point is sampled uniformly at random from \({\mathbb {S}}^2\). For \(p=0,0.1,0.2,0.3,0.4\), and 0.5 we calculated \(\textrm{dgm}_1^\textrm{VR}(X)\), \(\textrm{dgm}_2^\textrm{VR}(X)\), and \({\textbf{U}}_{4,1}^{\textrm{VR}}(X)\); the results are shown in Fig. 14. For \(p>0\), the persistence diagrams clearly indicate that X has one cycle in each dimension 1 and 2, whereas the measures \({\textbf{U}}_{4,1}^{\textrm{VR}}(X)\) are very similar to each other for \(0 \le p \le 0.20\); compare with the central panel in Fig. 12. Note that the measures \({\textbf{U}}_{4,1}^{\textrm{VR}}(X)\) only begin to ‘detect’ the \({\mathbb {S}}^1\) component for \(p\ge 0.3\).

The second experiment. Let \({\mathbb {S}}^1_E\) and \(D^2\) be the unit circle and the unit disk in \({\mathbb {R}}^2\), both equipped with the Euclidean metric. We sample a subset \(X \subset D^2\) consisting of 1000 points as follows: each point is sampled uniformly from the interior of \(D^2\) with probability p and from its boundary \({\mathbb {S}}^1_E\) with probability \(1-p\). We endow X with the Euclidean metric and calculate \(\textrm{dgm}_1^\textrm{VR}(X)\) and \({\textbf{U}}_{4,1}^{\textrm{VR}}(X)\) for 16 different values of p; see Fig. 16. This time, \(\textrm{dgm}_1^\textrm{VR}(X)\) does not have a unique significant cycle for values of p as low as 0.01, and all the diagrams between \(p=0.20\) and \(p=1\) are virtually indistinguishable. In contrast, the measures \({\textbf{U}}_{4,1}^{\textrm{VR}}(X)\) assign substantial weight to \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^1_E)\) for \(0 \le p \le 0.30\) (cf. the third panel of Fig. 27). This can be interpreted as signaling that these measures permit detecting topological features in the presence of outliers/noise.
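The sampling scheme of the second experiment can be sketched as follows; this is a minimal illustration (function name and fixed seed are our own), not the code used for the figures.

```python
import math
import random

def sample_disk_mixture(n, p, seed=0):
    """Sample n points from the closed unit disk D^2: with probability p a
    point is drawn uniformly from the interior, and with probability 1-p
    uniformly from the boundary circle S^1_E."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        if rng.random() < p:
            # Uniform in area: the radius has density 2r on [0, 1].
            r = math.sqrt(rng.random())
        else:
            r = 1.0  # on the boundary circle
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts
```

The square-root transform of the radius is what makes the interior samples uniform with respect to area rather than clustered near the center.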

Fig. 14

Given \(0 \le p \le 1\), we sampled \(X \subset {\mathbb {S}}^1 \vee {\mathbb {S}}^2\) with 1000 points so that each point is uniformly distributed in \({\mathbb {S}}^1\) with probability p or in \({\mathbb {S}}^2\) with probability \(1-p\). Top: \(\textrm{dgm}_1^\textrm{VR}(X)\) (blue) and \(\textrm{dgm}_2^\textrm{VR}(X)\) (orange). Bottom: \({\textbf{U}}_{4,1}^{\textrm{VR}}(X)\). See Example 4.15.

Example 4.16

(\({\textbf{D}}_{4,1}^{\textrm{VR}}(R)\) correlates with the size of the rectangle R.) Given \(0 \le a \le b\) such that \(a+b=1\), consider the boundary of the rectangle \(R_{a,b} \subset {\mathbb {R}}^2\) with side lengths a and b and constant perimeter 2, and give \(R_{a,b}\) the Euclidean metric. Figure 15 shows computational approximations of the persistence measures \({\textbf{U}}_{4,1}^{\textrm{VR}}(R_{a,b})\) for several values of a and b. We sampled \(10^5\) sets of 4 points uniformly at random from \(R_{a,b}\). Observe that as a increases, the minimal Euclidean distance from the origin to the support \({\textbf{D}}_{4,1}^{\textrm{VR}}(R_{a,b})\) of \({\textbf{U}}_{4,1}^{\textrm{VR}}(R_{a,b})\) increases. Also, note that the maximal persistence of points in \({\textbf{D}}_{4,1}^{\textrm{VR}}(R_{a,b})\) decreases rapidly with a. These two observations indicate that \({\textbf{D}}_{4,1}^{\textrm{VR}}(R_{a,b})\) is sensitive to the size of the “hole” determined by the rectangle \(R_{a,b}\).

Fig. 15

The persistence measures \({\textbf{U}}_{4,1}^{\textrm{VR}}(R_{a,b})\) of rectangles with side lengths a, b such that \(a+b=1\). The lines shown in red are the diagonal \(t_d=t_b\) and the upper bound \(t_d = \sqrt{2} t_b\) given by Proposition 5.17. These graphs were generated by sampling 4 points uniformly at random from each rectangle \(10^5\) times. The percentages of samples that produced a non-diagonal point in each graph are, from left to right, \(0.17\ \%\), \(1.17\ \%\), \(3.15\ \%\), \(6.01\ \%\), and \(9.12\ \%\).

Fig. 16

Given \(0 \le p \le 1\), we sampled \(X \subset D^2 \subset {\mathbb {R}}^2\) with 1000 points so that each point is uniformly distributed in the interior of \(D^2\) with probability p or on its boundary \({\mathbb {S}}^1_E\) with probability \(1-p\). Top: Examples of X for several values of p. Middle: \(\textrm{dgm}_1^\textrm{VR}(X)\). Bottom: \({\textbf{U}}_{4,1}^{\textrm{VR}}(X)\).

4.3.1 Performance in a Pose-Invariant Shape Classification Task

To test the discriminative power of persistence sets, we performed a classification experiment similar to the one outlined in [22]. In this experiment one has a database with multiple classes, each containing several poses of the same shape, and the goal is to classify the database so that poses in the same class are clustered together while poses in two different classes are well discriminated; see Fig. 17.

We used a subset of the database from [80] consisting of 62 shapes from six different classes: camel, cat, elephant, face, head, and horse. Each class has either 10 or 11 poses of the same shape. A pose is encoded with a mesh \((V_i,T_i)\) (\(i=1, \dots , 62\)) which consists of a set of vertices \(V_i \subset {\mathbb {R}}^3\) and a set of triangles \(T_i \subset V_i^3\). Let \(G_i = (V_i, E_i)\) be the 1-skeleton of \((V_i, T_i)\) with an edge \(\{p,q\} \in E_i\) weighted by the Euclidean distance \(\Vert p-q\Vert \). Let \(d_{G_i}\) be the shortest path distance on \(G_i\). Note that this implies that any two poses of the same class will be (nearly) isometric. We first select a subset \(X_i \subset G_i\) of 4000 points using farthest point sampling, that is, we start with a random initial point \(p_1 \in V_i\), and at each step, we choose \(p_{t+1} \in V_i\) to be any point that maximizes the Euclidean distance to \(\{p_1, \dots , p_t\}\). The metric on \(X_i \subset G_i\) is the restriction of \(d_{G_i}\) and we denote it by \(d_i\). We normalized each \(X_i\) to have diameter 1 and endowed it with the uniform probability measure \(\gamma _i\) to obtain an mm-space \((X_i, d_i, \gamma _i)\) representing the ith shape.
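The farthest point sampling step can be sketched as follows. This is a minimal illustration (function name is our own): the distance from a candidate point to the already-chosen set is taken as the minimum distance to its members, as is standard for this greedy scheme, and the metric is supplied as a plain distance matrix.

```python
def farthest_point_sampling(D, n_samples, start=0):
    """Greedy farthest point sampling: starting from `start`, repeatedly add
    the point whose minimum distance to the already-chosen set is largest.
    D is a symmetric distance matrix given as a list of lists."""
    chosen = [start]
    # dist_to_chosen[j] = min distance from point j to the chosen set
    dist_to_chosen = list(D[start])
    for _ in range(n_samples - 1):
        nxt = max(range(len(D)), key=lambda j: dist_to_chosen[j])
        chosen.append(nxt)
        for j in range(len(D)):
            dist_to_chosen[j] = min(dist_to_chosen[j], D[nxt][j])
    return chosen
```

Maintaining the running minimum distances makes each step linear in the number of points, so selecting m landmarks costs O(mn) given the distance matrix.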

For each shape, we computed the full \({\textbf{D}}_{2,0}^{\textrm{VR}}(X_i)\) (which consists of \(\left( {\begin{array}{c}4000\\ 2\end{array}}\right) +4000 \approx 10^7\) points counting repetitions), together with approximations to \({\textbf{D}}_{4,1}^{\textrm{VR}}(X_i)\) and \({\textbf{D}}_{6,2}^{\textrm{VR}}(X_i)\) with \(N=10^6\) and \(N=10^7\) samples, respectively, chosen uniformly at random (see the description on page 18). These samples also induce an approximation of \({\textbf{U}}_{2k+2,k}^{\textrm{VR}}(X_i)\) given by the empirical measure \(\gamma _{i,k}\) for \(k=1,2\). For \(k=0\), we have the exact measure \({\textbf{U}}_{2,0}^{\textrm{VR}}(X_i)\).

Fig. 17

Exemplar shapes from the database of 3D shapes we used in our shape classification task. Note that different poses of the camel shape are nearly isometric when regarded as metric spaces (endowed with their geodesic distances). See Sect. 4.3.1 for details.

Coarsening of \({\textbf{U}}_{2k+2,k}^{\textrm{VR}}(X_i)\). Ideally, we would like to compute the Wasserstein distance on \(\Delta _0^+\) between the different \(\gamma _{i,k}\)s. However, this calculation would require finding the bottleneck distance between every pair of points in the product of the supports of \(\gamma _{i,k}\) and \(\gamma _{j,k}\). This cost matrix would be unmanageable (in the sense that its size would be at least \(10^7 \times 10^7\) when \(k=0\) and \(10^6\times 10^6\) when \(k=1,2\)), so we replace \(\gamma _{i,k}\) with a coarsened measure \(\gamma _{i,k}^c\) defined via a Voronoi partition on \(\Delta _0^+\) as follows. Choose a set of landmark points \(L:= \{p_{1}, \dots , p_{\ell }\} \subset \Delta _0^+\). For every \(t=1, \dots , \ell \), let \(\gamma _{i,k}^c(p_t)\) be the sum of \(\gamma _{i,k}(p)\) over all points \(p \in {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X_i)\) that are closer to \(p_t\) than to any other \(p' \in L\). For \(k=0\), L consists of 850 points spaced uniformly on the line \(\{0\} \times [0,1]\). For \(k=1\) and \(k=2\), we constructed a grid of uniformly spaced points in \([0,1] \times [0,1]\), and retained the origin and the points that were strictly above the diagonal; the final landmark set L had 947 points.
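The coarsening step admits a short sketch. In this minimal illustration (function name is our own) the measure is given as support points with masses, and `dist` stands in for the metric on \(\Delta _0^+\), i.e. the bottleneck distance \(d_{{\mathcal {B}}}\) in the experiment above.

```python
def coarsen_measure(support, masses, landmarks, dist):
    """Coarsen a discrete measure via a Voronoi partition: each support
    point sends its full mass to its nearest landmark."""
    coarse = [0.0] * len(landmarks)
    for pt, mass in zip(support, masses):
        # Index of the landmark closest to this support point.
        t = min(range(len(landmarks)), key=lambda s: dist(pt, landmarks[s]))
        coarse[t] += mass
    return coarse
```

Total mass is preserved by construction, so the coarsened measure remains a probability measure whenever the input is one.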

The pairwise distance matrices arising from persistence sets and measures. We first computed 8 distance matrices of size 62-by-62, where the \((i,j)\) entry of each matrix is the (Hausdorff or Wasserstein) distance between a certain invariant of \(X_i\) and \(X_j\), as we describe next:

  • For each \(k=0,1,2\), we computed the Hausdorff distance between the persistence sets \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X_i)\) for \(i=1, \dots , 62\) as subsets of \((\Delta _0^+, d_{{\mathcal {B}}})\) (recall Definition 4.10). We denote these matrices as \({\mathcal {H}}_{0}\), \({\mathcal {H}}_1\) and \({\mathcal {H}}_2\).

  • The next three matrices are given by the 1-Wasserstein distance between the coarsened measures \(\gamma _{i,k}^c\). We denote them as \({\mathcal {W}}_k\). We used the Matlab mex interface from [5] for this step.

  • The last two matrices are defined by the entry-wise maxima

    $$\begin{aligned} {\mathcal {H}}_{\max }:= \max _{k} {\mathcal {H}}_k\,\,\text{ and }\,\, {\mathcal {W}}_{\max }:= \max _{k} {\mathcal {W}}_k. \end{aligned}$$

Observe that, since we are operating on \((\Delta _0^+, d_{{\mathcal {B}}})\), we used Eq. (8) to directly find \(d_{{\mathcal {B}}}(D_1, D_2)\) for \(D_1 \in {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X_i)\) and \(D_2 \in {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X_j)\) instead of optimizing between all partial matchings \(\varphi :D_1 \rightarrow D_2\).
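Eq. (8) itself appears earlier in the paper and is not reproduced here; for two diagrams that each contain a single off-diagonal point, the bottleneck distance reduces to the standard closed form sketched below (either the two points are matched to each other, or both are matched to their diagonal projections).

```python
def bottleneck_single(p, q):
    """Bottleneck distance between two diagrams consisting of one
    off-diagonal point each, p = (b1, d1) and q = (b2, d2)."""
    (b1, d1), (b2, d2) = p, q
    # Cost of matching the two points to each other (l-infinity distance).
    match = max(abs(b1 - b2), abs(d1 - d2))
    # Cost of matching each point to its diagonal projection.
    to_diagonal = max((d1 - b1) / 2.0, (d2 - b2) / 2.0)
    return min(match, to_diagonal)
```

This avoids any optimization over partial matchings, which is what makes the direct evaluation on \((\Delta _0^+, d_{{\mathcal {B}}})\) cheap.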

The pairwise distance matrices coming from VR-persistence diagrams. We also computed the VR-persistence diagrams of subsets \(X_i' \subset X_i\) with \(|X_i'|=500\) obtained from the farthest point sampling induced by the metric of \(X_i\). We equip \(X_i'\) with the metric inherited from \(X_i\) and normalize it so that it has diameter 1. Then, we computed \(\textrm{dgm}_k^\textrm{VR}(X_i')\) for \(k=0,1,2\) with a modification of C. Tralie’s wrapper for Ripser [8]. Define the matrices \({\mathcal {B}}_k\) by setting the \((i,j)\)-entry of \({\mathcal {B}}_k\) to be the bottleneck distance between \(\textrm{dgm}_k^\textrm{VR}(X_i')\) and \(\textrm{dgm}_k^\textrm{VR}(X_j')\). As before, we define \({\mathcal {B}}_{\max }:= \max _k {\mathcal {B}}_k\). We used Hera to compute the bottleneck distances [51].

Classification tasks and results. Let \({\textbf{M}}:=\{{\mathcal {H}}_k, {\mathcal {W}}_k, {\mathcal {B}}_k \mid k=0,1,2\} \cup \{{\mathcal {H}}_{\max }, {\mathcal {W}}_{\max }, {\mathcal {B}}_{\max }\}\). For each \(M \in {\textbf{M}}\), we performed a 1-nearest neighbor classification task. Our training set contains one random member \(R_j\) from each class (\(j=1, \dots , 6\)), and we assign each \(X_i\) to the class of the \(R_j\) that is closest to \(X_i\) as given by M. We repeated this experiment 2000 times and computed the average classification error \(P_e(M)\). The results are shown in Table 3, and the heatmaps of the matrices in \({\textbf{M}}\) are shown in Figs. 18, 19, and 20.
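The classification protocol can be sketched as follows. This is a minimal illustration (function name is our own): `M` plays the role of one of the 62-by-62 matrices, and, as a simplification, the random exemplars themselves are also classified (they are trivially assigned to their own class since the diagonal of M is zero).

```python
import random

def one_nn_error(M, labels, trials=2000, seed=0):
    """Average 1-NN classification error: each trial draws one random
    exemplar per class and assigns every item to the class of its
    nearest exemplar under the distance matrix M."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    members = {c: [i for i, lab in enumerate(labels) if lab == c] for c in classes}
    errors, total = 0, 0
    for _ in range(trials):
        exemplars = {c: rng.choice(members[c]) for c in classes}
        for i, lab in enumerate(labels):
            nearest = min(classes, key=lambda c: M[i][exemplars[c]])
            errors += (nearest != lab)
            total += 1
    return errors / total
```

Averaging over many random draws of exemplars is what makes \(P_e(M)\) insensitive to a single lucky or unlucky choice of training poses.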

We make the following remarks:

  • Regarding the \({\mathcal {B}}_k\) matrices, it is interesting to note that \({\mathcal {B}}_2\) performed much better than \({\mathcal {B}}_0\) and \({\mathcal {B}}_1\). \({\mathcal {B}}_0\) can apparently separate head from the other classes (see Fig. 20). In the same vein, we believe that \({\mathcal {B}}_1\) can separate instances of face and head from other classes because of the “holes” induced by the eyes and mouth.

  • All the metrics induced by persistence sets (\({\mathcal {H}}_k\) and \({\mathcal {W}}_k\)) perform better than \({\mathcal {B}}_k\) for \(k=0 \,\text{ or }\, 1\), and the best classification errors obtained by each metric are \(P_e({\mathcal {H}}_{\max }) = 9.17\ \%\), \(P_e({\mathcal {W}}_1) = 9.06\ \%\), and \(P_e({\mathcal {B}}_2) = 5.93\ \%\). That is, the performance of the best \({\mathcal {H}}_k\) and \({\mathcal {W}}_k\) is comparable to that of \({\mathcal {B}}_2\). This is promising, especially since the computation of the latter is particularly costly.

  • An important observation from Table 3 is that \(P_e({\mathcal {W}}_{\max }) = 19.28\ \%\) despite the fact that \(P_e({\mathcal {W}}_1) = 9.06\ \%\). The reason is that \({\mathcal {W}}_0\) dominates the maximum in \({\mathcal {W}}_{\max }\), so the discriminating power of \({\mathcal {W}}_1\) is obfuscated.

Appendix 2 contains additional results regarding this classification task.

Table 3 Average classification error \(P_e(M)\) over 2000 trials for all possible choices \(M\in {\textbf{M}}\)
Fig. 18

Heatmaps of the matrices \({\mathcal {H}}_0\), \({\mathcal {H}}_1\), \({\mathcal {H}}_2\), \({\mathcal {H}}_{\max }\). Notice that the scale of each matrix is different. Notice that \({\textbf{D}}_{4,1}^{\textrm{VR}}\) can tell apart the classes face and head from all the others. In addition, a head has a 2-dimensional cavity and a face doesn’t, which suggests a reason why \({\textbf{D}}_{6,2}^{\textrm{VR}}\) can also tell those two classes apart.

Fig. 19

Heatmaps of the matrices \({\mathcal {W}}_0\), \({\mathcal {W}}_1\), \({\mathcal {W}}_2\), \({\mathcal {W}}_{\max }\). Notice that the scale of each matrix is different. Notice that \({\textbf{U}}_{4,1}^{\textrm{VR}}\) can tell apart the classes face and head from all the others. In addition, a head has a 2-dimensional cavity and a face doesn’t, which suggests why \({\textbf{U}}_{6,2}^{\textrm{VR}}\) can also tell those two classes apart.

Fig. 20

Heatmaps of the matrices \({\mathcal {B}}_0\), \({\mathcal {B}}_1\), \({\mathcal {B}}_2\), \({\mathcal {B}}_{\max }\). Notice that the scale of each matrix is different.

4.4 Comparison of Computational Performance of VR-Persistence Sets and VR-Persistent Homology

Time benchmarks of the geometric algorithm and VR-persistent homology. We tested the algorithm from Sect. 4.1.1 by calculating \(\textrm{dgm}_k^\textrm{VR}(X)\) for sets X with \(n_X:= |X|=2k+2\) and \(n_X\) ranging from 4 to 50. For each value of \(n_X\), we used MATLAB to generate 250 sets \(X \subset {\mathbb {R}}^2\) whose points were drawn from a two-dimensional normal distribution. We then attempted to calculate \(\textrm{dgm}_k(X)\) in three ways: with the geometric algorithm (from Theorem 4.4) coded in MATLAB and in C++, and with Ripser (which is written in C++) using a MATLAB wrapper for Ripser [8] developed by C. Tralie. The results are given in the boxplot in Fig. 21. Ripser was unable to compute \(\textrm{dgm}_k^\textrm{VR}(X)\) for \(n_X>28\), while both versions of the geometric algorithm did so successfully. C. Tralie’s wrapper calls Ripser inside MATLAB with the system command, and we adopted that approach to run our C++ implementation of the geometric algorithm [42]. The time was measured with the tic, toc functions.
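A sketch of the geometric computation for \(n_X = 2k+2\) points is below. It assumes the characterization used throughout this paper (cf. Lemma 5.1 and the proof of Lemma 5.3): \(t_d(x_i)\) is the distance from \(x_i\) to its farthest point, \(t_b(x_i)\) the distance to its second-farthest, \(t_b(X) = \max _i t_b(x_i)\), \(t_d(X) = \min _i t_d(x_i)\), and \(\textrm{dgm}_k^\textrm{VR}(X)\) is the single point \((t_b(X), t_d(X))\) whenever \(t_b(X) < t_d(X)\). The function name is our own.

```python
def principal_persistence_pair(D):
    """Compute (t_b(X), t_d(X)) for a metric space X with n = 2k+2 points
    given by its distance matrix D (a list of lists)."""
    n = len(D)
    tb_local, td_local = [], []
    for i in range(n):
        # Distances from x_i to all other points, sorted descending.
        row = sorted((D[i][j] for j in range(n) if j != i), reverse=True)
        td_local.append(row[0])  # t_d(x_i): farthest point
        tb_local.append(row[1])  # t_b(x_i): second-farthest point
    return max(tb_local), min(td_local)
```

For the four equally spaced points on the geodesic circle of the torus example above (circumference \(2R_1\), here with \(R_1 = 1\)), this returns \(t_b = 1/2\) and \(t_d = 1\).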

Fig. 21

The time required by three algorithms to compute \(\textrm{dgm}_k^\textrm{VR}(X)\) for a space X with \(n_X = 2k+2\) points. We did 250 repetitions for each value of \(n_X\). Ripser was not able to finish the calculations past \(n_X=28\) due to having exceeded the memory available on the local machine (see text for specifications).

It must be noted that both C++ executables required that we write the distance matrices to disk before running the programs. In contrast, MATLAB can run the geometric algorithm with the distance matrix loaded in memory, and this explains why MATLAB outperformed the other two programs. This observation has implications for the implementation of persistence sets. Principal persistence sets (i.e. when \(n=2k+2\)) can be calculated in any programming language without significant overhead after implementing the geometric algorithm. Similarly, the computation of non-principal persistence sets could be integrated into existing software for persistent homology in order to avoid the costly I/O operations described above. The tests in this section were performed on a Dell Precision 7540 laptop with an Intel Core i7-9850H CPU and 8GB of RAM, running Fedora 35 and gcc version 11.3.1.

Benchmarks of VR-persistence sets and VR-persistent homology. At this point, it is important to emphasize that we view persistence sets as a family of invariants that complements the standard persistent homology pipeline. Persistence sets are, in many cases, efficiently computable both in terms of their complexity and approximability and, importantly, in terms of memory requirements. In contrast, computing standard persistence diagrams tends to require very substantial memory resources, to the point that this is a factor limiting their applicability. To illustrate this contrast, we sampled a collection of sets X uniformly at random from the sphere \({\mathbb {S}}^{3}\) with \(n_X=|X|\) ranging from 100 to 1000 in increments of 100. We attempted to calculate persistent homology in dimension k with Ripser and an approximation to \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) with \(N=10^6\). We used \(k=1,2,3\) in both experiments and implemented the algorithm from Sect. 4.1.1 in C++ for the latter computation. The computation of \(\textrm{dgm}_k^\textrm{VR}(X)\) failed to finish beyond \(n_X = 500\) when \(k=2\) and \(n_X = 100\) when \(k=3\). See Fig. 22. We measured elapsed time and consumed memory with the /usr/bin/time -v command. These tests were performed on the same machine described above.

Fig. 22

Left column: The time (top) and memory (bottom) required to compute \(\textrm{dgm}_k^\textrm{VR}(X)\) for a space with \(n_X\) points. Ripser was unable to finish the calculations due to having exceeded the memory available on the local machine for \(n_X>500\) when \(k=2\) and for \(n_X>100\) when \(k=3\) (see text for memory specifications). This is why the graph for \(k=3\) has a single point rather than a line. Right column: The time (top) and memory (bottom) required to approximate \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) with \(10^6\) samples for a space with \(n_X\) points.

The impact of parallelization. We also compared the running time of persistent homology and principal persistence sets when the computation of the latter is parallelized. We selected 15 random shapes \(X_{i_j}\) from the database in Sect. 4.3.1 and computed approximations to \({\textbf{D}}_{4,1}^{\textrm{VR}}(X_{i_j})\) and \({\textbf{D}}_{6,2}^{\textrm{VR}}(X_{i_j})\) with \(10^6\) and \(10^7\) samples, respectively, and \(\textrm{dgm}_2^\textrm{VR}(X_{i_j}')\) for a subset \(X_{i_j}' \subset X_{i_j}\) with 500 points selected by farthest point sampling. The calculations were carried out in MATLAB running on a cluster (see below for the specifications). We timed the computation of each \({\textbf{D}}_{4,1}^{\textrm{VR}}(X_{i_j})\), \({\textbf{D}}_{6,2}^{\textrm{VR}}(X_{i_j})\) and \(\textrm{dgm}_2^\textrm{VR}(X_{i_j}')\) with the tic, toc functions, and we show the average running time in Fig. 23. Although the computation of \({\textbf{D}}_{6,2}^{\textrm{VR}}(X_{i_j})\) with 1 core takes much longer than \(\textrm{dgm}_2^\textrm{VR}(X_{i_j}')\), the parallelized computation of the former is in the same ballpark as the latter in terms of running time for the range of numbers of cores that we utilized. For example, \({\textbf{D}}_{6,2}^{\textrm{VR}}(X_{i_j})\) took less time than \(\textrm{dgm}_2^\textrm{VR}(X_{i_j}')\) as soon as we had 4 cores, and the running time halved with 8 or more cores. In this test, we ran all calculations (both persistence sets and persistence diagrams) within the same node with the following specifications: it has a Broadwell architecture with AVX2 and runs Linux 3.10.0-1160.81.1.el7.x86_64. It has 22 cores available and 128 GB maximum memory. Our program used 12 cores and was allotted 44.50 GB of memory, of which we used 13.51 GB. The experiments ran with an average CPU frequency of 3.35 GHz.
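The dotted theoretical-speedup curves mentioned in the caption of Fig. 23 come from Amdahl's law: if a fraction f of the work parallelizes perfectly over n workers while the rest runs sequentially, the speedup is \(1/\big ((1-f)+f/n\big )\). A one-line sketch:

```python
def amdahl_speedup(f, n_workers):
    """Amdahl's law: speedup of a computation whose parallelizable
    fraction f runs on n_workers, the remaining 1-f being sequential."""
    return 1.0 / ((1.0 - f) + f / n_workers)
```

As n_workers grows, the speedup saturates at \(1/(1-f)\), which is why the parallel curves in such plots flatten out past a moderate number of cores.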

Fig. 23

Running time of \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X_i)\) for \(k=1,2\) in parallel with a variable number of workers (nWorkers). The dashed line is the running time of \(\textrm{dgm}_2(X_i')\) computed sequentially, and the dotted lines are the theoretical speedup guaranteed by Amdahl’s law.

5 Vietoris–Rips Persistence Sets of Spheres

In this section, we will describe the principal persistence sets \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1})\) for all \(k \ge 0\). After that, we will take advantage of functoriality to find some of the persistence sets of the higher dimensional spheres \({\mathbb {S}}^{m}\), \(m \ge 2\), and describe the obstructions (if any) to obtaining higher principal persistence sets. We begin with a general technical lemma.

Lemma 5.1

Let \(k \ge 1\) and \(n=2k+2\). Let \((X, d_X)\) be a metric space with n points. Then:

  1.

    \(t_d(X) \le 2t_b(X) \), and equality holds if and only if \(v_d\) is well defined and, for every \(i=1,\dots ,n\), \(d_X(x_i,v_d(x_i))=t_d(X) \) and \(d_X(x_i,x)=t_b(X) \) for every \(x \ne v_d(x_i)\).

  2.

    \(\textrm{pers}(\textrm{dgm}_k^\textrm{VR}(X)) = t_d(X) -t_b(X) \le \textbf{sep}(X)\).

  3.

    If X can be isometrically embedded into an interval, then \(t_b(X) \ge t_d(X) \).

Proof

We prove the 3 claims in order.

1. If \(t_b(X) \ge t_d(X) \), then \(\textrm{pers}(\textrm{dgm}_k^\textrm{VR}(X))=0\) and items 1 and 2 are trivially true. Suppose, then, \(t_b(X) < t_d(X) \). Choose any \(x_0,x \in X\) such that \(x \ne x_0,v_d(x_0)\). By definition of \(v_d(x_0)\), we have \(d_X(x_0, x) \le t_b(x_0) \) and \(d_X(x,v_d(x_0)) \le t_b(v_d(x_0)) \). Then

$$\begin{aligned} d_X(x_0,x) \ge d_X(x_0,v_d(x_0)) - d_X(x,v_d(x_0)) \ge t_d(x_0) - t_b(v_d(x_0)) \ge t_d(X) -t_b(X) . \end{aligned}$$
(9)

Since \(d_X(x_0,x) \le t_b(X) \), we get the bound \(t_d(X) \le 2t_b(X) \). If \(t_d(X) = 2t_b(X) \), then every intermediate inequality becomes an equality; in particular, we have \(d_X(x_0,x) = t_b(X) \) and \(d_X(x_0, v_d(x_0)) = t_d(X) \).

2. The finer bound \(\textbf{sep}(X) \ge t_d(X) -t_b(X) = \textrm{pers}(\textrm{dgm}_k^\textrm{VR}(X))\) follows by taking the minimum of \(d_X(x_0,x)\) over \(x_0\) and x in inequality (9).

3. Suppose, without loss of generality, that \(X \subset {\mathbb {R}}\) and that \(x_1< x_2< \cdots < x_n\). Notice that \(t_d(x_\ell ) = \max (x_\ell -x_1, x_n-x_\ell )\) for every \(\ell \) and, in particular, \(t_d(x_1) = t_d(x_n) = x_n-x_1\). If \(\ell \ne 1,n\), then \(t_b(x_1) \ge x_\ell -x_1\) and \(t_b(x_n) \ge x_n-x_\ell \). Then

$$\begin{aligned} t_b(X) \ge \max (t_b(x_1) , t_b(x_n) ) \ge \max (x_\ell -x_1, x_n-x_\ell ) = t_d(x_\ell ) \ge t_d(X) . \end{aligned}$$

\(\square \)

5.1 Characterization of \(t_b(X) \) and \(t_d(X) \) for \(X \subset {\mathbb {S}}^{1}\)

Now we focus on subsets of the circle. We refer to a set \(X=\{x_1, x_2, \dots , x_n\} \subset {\mathbb {S}}^{1}\) as a configuration of n points in \({\mathbb {S}}^{1}\).

Definition 5.2

Let \({\mathbb {S}}^{1}\) be the quotient \([0,2\pi ]/(0 \sim 2\pi )\) equipped with the geodesic distance, i.e.

$$\begin{aligned} d_{{\mathbb {S}}^{1}}(x, y):= \min (|x-y|, 2\pi -|x-y|), \end{aligned}$$

for \(x,y \in {\mathbb {S}}^{1}\). Also, we adopt the cyclic order \(\prec \) on \({\mathbb {S}}^{1}\) from [1]. We refer to the increasing direction in \([0,2\pi ]\) as counter-clockwise, and define \(x \prec y \prec z\) to mean that the counter-clockwise path starting at x meets y before reaching z. We also use \(\preceq \) to allow the points to be equal.

Throughout this section, \(k \ge 1\) and \(n=2k+2\) will be fixed. Addition of indices is done modulo n. Let \(X=\{x_1, x_2, \dots , x_n\} \subset {\mathbb {S}}^{1}\) be such that \(x_i \prec x_{i+1} \prec x_{i+2}\) for all i. Write \(d_{ij}=d_{{\mathbb {S}}^{1}}(x_i,x_j)\) for the distances, and assume \(t_b(X) < t_d(X) \).

Lemma 5.3

Let \(X=\{x_1, x_2, \dots , x_n\} \subset {\mathbb {S}}^{1}\) be such that \(x_{i-1} \prec x_{i} \prec x_{i+1}\) for all i. Then:

  1.

    For every i, \(t_b(x_i) = \max (d_{i,i+k}, d_{i,i-k})\) and \(t_d(x_i) =d_{i,i+k+1}\).

  2.

\(t_b(X) = \max _{i=1,\dots ,n} d_{i,i+k}\) and \(t_d(X) = \min _{i=1,\dots ,n} d_{i,i+k+1}.\)

  3.

    For every i,

    $$\begin{aligned} d_{i,i+k}=d_{i,i+1}+d_{i+1,i+2}+\cdots +d_{i+k-1,i+k}. \end{aligned}$$
  4. 4.

    \(t_b(X) \ge \frac{k}{k+1} \pi \).

Fig. 24

This configuration shows the edges that realize \(t_b(x_1) = \max (d_{1,1+3}, d_{1,1-3})\) and \(t_d(x_1) =d_{1,1+3+1}\) when \(k=3\) and \(n=8\). The shortest path between \(x_1\) and \(x_5\) contains \(x_8,x_7,x_6\), so when \(r>d_{15}\), \(\textrm{VR}_{r}(X)\) will contain a 4-simplex. These ideas were inspired by [49].

Proof

1. Let \(r \in [t_b(X) ,t_d(X) )\). By Proposition 4.3, \(\textrm{VR}_{r}(X)\) is a cross-polytope with n points. In particular, \(\textrm{VR}_{r}(X)\) contains no simplices of dimension \(k+1\). We claim that this forces \(t_d(x_i) = d_{i,i+k+1}\) for all i. Indeed, the shortest path between \(x_i\) and \(x_{i+k+1}\) contains either the set \(\{x_{i+1},\dots ,x_{i+k-1}\}\) or the set \(\{x_{i+k+2},\dots ,x_{i-1}\}\) (see Fig. 24). For any \(x_j\) in that shortest path, \(d_{i,j} \le d_{i,i+k+1}\), so if we had \(d_{i,i+k+1} \le r\), \(\textrm{VR}_{r}(X)\) would contain a \((k+1)\)-simplex, either \([x_i,x_{i+1}, \dots , x_{i+k+1}]\) or \([x_{i+k+1},x_{i+k+2}, \dots , x_{i}]\). Thus, \(r< d_{i,i+k+1}\) for all i.

In particular, \(\textrm{VR}_r(X)\) doesn’t contain the edge \([x_i,x_{i+k+1}]\). According to Definition 2.14, cross-polytopes contain all edges incident on a fixed point \(x_i\) except one, so \([x_i,x_j] \in \textrm{VR}_{r}(X)\) for all \(j \ne i+k+1\). As a consequence, \(d_{i,j} \le r < d_{i,i+k+1}\) for all \(j \ne i+k+1\), so \(t_d(x_i) = d_{i,i+k+1}\) and \(t_b(x_i) = \max _{j \ne i+k+1} d_{i,j}\). Additionally, the shortest path between \(x_i\) and \(x_{i+k}\) contains the set \(\{x_{i+1},\dots ,x_{i+k-1}\}\) rather than \(\{x_{i+k+1}, \dots , x_{i-1}\}\), so \(d_{i,i+j} \le d_{i,i+k}\) for \(j=1,\dots ,k-1\) (otherwise, \(\textrm{VR}_{r}(X)\) would contain the \((k+2)\)-simplex \([x_{i+k}, x_{i+k+1}, \dots , x_i]\)). The analogous statement \(d_{i,i-j} \le d_{i,i-k}\) holds for \(j=1,2,\dots ,k-1\). Thus, \(t_b(x_i) = \max (d_{i,i+k}, d_{i,i-k})\).

2. These equations follow by taking the maximum (resp. minimum) over all i of the above expression for \(t_b(x_i) \) (resp. \(t_d(x_i) \)), as per Definition 4.1.

3. As we saw in the proof of item 1, the shortest path from \(x_i\) to \(x_{i+k}\) contains the set \(\{x_{i+1},\dots ,x_{i+k-1}\}\). The length of this path is \(d_{i,i+k} = d_{i,i+1} + \cdots + d_{i+k-1,i+k}\).

4. By items 2 and 3, \(n t_b(X) \ge \sum _{i=1}^{n} d_{i,i+k} = \sum _{i=1}^{n} \sum _{j=1}^{k} d_{i+j-1,i+j} = \sum _{j=1}^{k} \sum _{i=1}^{n} d_{i+j-1,i+j} = k \cdot 2\pi \). Thus, \(t_b(X) \ge \frac{2k}{n}\pi = \frac{k}{k+1}\pi \). \(\square \)
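The formulas of Lemma 5.3 item 2 are straightforward to evaluate numerically; a minimal Python sketch (function names our own):

```python
import math

def circle_dist(x, y):
    """Geodesic distance on S^1 = [0, 2*pi] / (0 ~ 2*pi)."""
    d = abs(x - y) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def tb_td_circle(xs, k):
    """t_b(X) and t_d(X) of a cyclically ordered configuration of
    n = 2k+2 points on the circle, via Lemma 5.3 item 2."""
    n = len(xs)
    tb = max(circle_dist(xs[i], xs[(i + k) % n]) for i in range(n))
    td = min(circle_dist(xs[i], xs[(i + k + 1) % n]) for i in range(n))
    return tb, td
```

For instance, the square configuration with \(k=1\) gives \(t_b(X) = \pi /2\) and \(t_d(X) = \pi \), so the lower bound \(t_b(X) \ge \frac{k}{k+1}\pi \) of item 4 is attained with equality.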

5.2 Characterization of \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1})\) for k Even

As a followup to Lemma 5.3 item 4, we show that for every pair of values \(t_b,t_d\) with \(\frac{k}{k+1}\pi \le t_b < t_d \le \pi \), there exists \(X \subset {\mathbb {S}}^{1}\) with \(|X|=2k+2\) such that \(t_b(X) =t_b\) and \(t_d(X) =t_d\).

Theorem 5.4

For even k, \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1}) = \left\{ (t_b,t_d): \dfrac{k}{k+1}\pi \le t_b < t_d \le \pi \right\} \).

Proof

We will first construct what we call the critical configurations, those where \(t_b(X) = \frac{k}{k+1}\pi \) and \(t_d(X) =t_d \in (t_b(X) ,\pi ]\). Consider the points

$$\begin{aligned} x_{i} = {\left\{ \begin{array}{ll} \frac{\pi }{k+1} \cdot (i-1), &{} i \text { odd}\\ \frac{\pi }{k+1} \cdot (i-1)-(\pi -t_d), &{} i \text { even}, \end{array}\right. } \end{aligned}$$

for \(i=1,\dots ,n\). When i is odd, \(x_{i-1} < x_i\). If i is even, by Lemma 5.3 item 4, we have \(x_i-x_{i-1} = -\frac{k\pi }{k+1}+t_d > -\frac{k\pi }{k+1}+t_b \ge 0\). Thus, \(0 = x_1< x_2< \cdots < x_n\). Additionally, since \(t_d \le \textbf{diam}({\mathbb {S}}^{1})\), we have \(x_{2k+2} = \frac{k\pi }{k+1} +t_d \le \frac{(2k+1) \pi }{k+1} < 2\pi \), so we have \(x_{i} \prec x_{i+1} \prec x_{i+2}\) for all i.

Since k is even, i and \(i+k\) have the same parity, so if \(1 \le i \le k+2\),

$$\begin{aligned} \textstyle x_{i+k}-x_i = \frac{\pi }{k+1}[(i+k-1)-(i-1)] = \frac{k}{k+1} \pi . \end{aligned}$$
(10)

If \(k+3 \le i \le 2k+2\), \(x_{i+k} = x_{i-k-2}\), and the last equation gives \(|x_{i+k}-x_{i}| = x_{i} - x_{i-k-2} = \frac{k+2}{k+1} \pi \). Since \(\frac{k}{k+1}\pi + \frac{k+2}{k+1}\pi = 2\pi \), for all i we have \(d_{i,i+k} = \min (|x_{i+k}-x_{i}|, 2\pi - |x_{i+k}-x_{i}|) = \min \left( \frac{k}{k+1}\pi , \frac{k+2}{k+1}\pi \right) = \frac{k}{k+1} \pi \). Thus, \(t_b(X) = \max _i d_{i,i+k} = \frac{k}{k+1}\pi \). To find \(t_d(X) = \min _i d_{i,i+k+1}\), we have two cases depending on the parity of i. If \(i \le k+1\) is odd (and \(i+k+1 \le 2k+2\) even),

$$\begin{aligned} \textstyle |x_{i+k+1}-x_{i}| = \frac{\pi }{k+1} [(i+k)-(i-1)] -(\pi -t_d) = t_d, \end{aligned}$$
(11)

and if \(i \le k+1\) is even,

$$\begin{aligned} \textstyle |x_{i+k+1}-x_{i}| = \left| \frac{\pi }{k+1} [(i+k)-(i-1)] + (\pi -t_d)\right| = 2\pi - t_d. \end{aligned}$$
(12)

Since \(d_{i,i+k+1} = \min (|x_{i+k+1}-x_{i}|, 2\pi - |x_{i+k+1}-x_{i}|)\), the above equations imply \(d_{i,i+k+1} = t_d\) irrespective of the parity of i. If \(i>k+1\), the index \(i+k+1\) equals \(i-k-1\) modulo n, and we have \(1 \le i-k-1 \le k+1\). Hence, the paragraph above gives \(d_{i,i+k+1} = d_{i-k-1,i} = t_d\). All in all, \(t_d(X) = \min d_{i,i+k+1} = t_d\).

Lastly, we can use these critical configurations to construct \(X'\) such that \(t_b(X') = t_b > \frac{k}{k+1}\pi \). Let \(\varepsilon :=t_b-\frac{k}{k+1}\pi > 0\). Define \(x'_{1}:= x_{1} + \varepsilon \), \(x'_{k+2}:= x_{k+2} + \varepsilon \), and \(x_i':= x_i\) for \(i \ne 1, k+2\). Write \(d_{ij}' = d_{{\mathbb {S}}^{1}}(x_i', x_j')\). In order to use Lemma 5.3 item 2 to find \(t_b(X') \) and \(t_d(X') \), we have to check that \(x_{i}' \prec x_{i+1}' \prec x_{i+2}'\) for all \(1 \le i \le 2k+2\). This boils down to checking \(x_{2k+2}' \prec 0 \prec x_{1}' \prec x_{2}'\) and \(x_{k+1}' \prec x_{k+2}' \prec x_{k+3}'\) because \(x_{i}' = x_{i}\) for all \(i \ne 1, k+2\). Since the points are listed in counter-clockwise order, the desired cyclic orderings hold as long as \(x_{1}' < x_{2}'\) and \(x_{k+2}' < x_{k+3}'\). Furthermore, these inequalities are equivalent to \(\varepsilon < x_{2}-x_{1}, x_{k+3}-x_{k+2}\). In fact, \(\varepsilon = t_b - \frac{k}{k+1} \pi < t_d-\frac{k}{k+1}\pi = x_{2}-x_{1}\) and, since \(t_d \le \pi \), \(x_{2} - x_{1} = t_d-\frac{k}{k+1}\pi \le \frac{k+2}{k+1}\pi - t_d = x_{k+3}-x_{k+2}\). In conclusion, \(x_{i}' \prec x_{i+1}' \prec x_{i+2}'\) for all \(1 \le i \le 2k+2\), and by Lemma 5.3 item 2, \(t_b(X') = \max _i d_{i,i+k}'\) and \(t_d(X') = \min _i d_{i,i+k+1}'\).

The only distances among \(d_{i,i+k}'\) and \(d_{i,i+k+1}'\) that might differ from the corresponding \(d_{ij}\) are those involving \(x_{1}'\) and \(x_{k+2}'\), namely \(d_{1,k+1}\), \(d_{1-k,1} = d_{k+3,1}\), \(d_{k+2,2k+2}\), \(d_{2,k+2}\), and \(d_{1,k+2}\). To compute the first pair of distances, the arguments following equation (10) give \(d_{1,k+1} = x_{k+1} - x_{1}\) and \(d_{k+3,1} = 2\pi - (x_{k+3} - x_{1})\). Then

$$\begin{aligned} x_{k+1}'-x_{1}'&=\textstyle x_{k+1}-x_{1}-\varepsilon = d_{1,k+1} - \varepsilon = \frac{k}{k+1}\pi - \varepsilon \text {, and }\\ 2\pi - (x_{k+3}'-x_{1}')&=\textstyle 2\pi - (x_{k+3}-x_{1}) + \varepsilon = d_{k+3,1} + \varepsilon = \frac{k}{k+1}\pi + \varepsilon = t_b. \end{aligned}$$

Both quantities are strictly less than \(\pi \), so \(d_{1,k+1}' = x_{k+1}'-x_{1}' = \frac{k}{k+1}\pi - \varepsilon \) and \(d_{k+3,1}' = 2\pi - (x_{k+3}'-x_{1}') = \frac{k}{k+1}\pi + \varepsilon \). An analogous argument gives \(d_{k+2,2k+2}' = \frac{k}{k+1}\pi - \varepsilon \) and \(d_{2,k+2}' = \frac{k}{k+1}\pi + \varepsilon \). Lastly, since \(x_{k+2}'-x_{1}' = x_{k+2}-x_{1}\), we have \(d_{1,k+2}' = d_{1,k+2}\). Thus, \(t_d(X') = \min d'_{i,i+k+1} = t_d\) and \(t_b(X') = \max d'_{i,i+k} = \max (\frac{k}{k+1}\pi - \varepsilon , \frac{k}{k+1}\pi , \frac{k}{k+1}\pi + \varepsilon ) = \frac{k}{k+1}\pi + \varepsilon = t_b\). \(\square \)
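The critical configurations constructed in this proof can be generated and verified numerically; a minimal Python sketch (function names our own), which checks via Lemma 5.3 that all \(d_{i,i+k}\) equal \(\frac{k}{k+1}\pi \) and all \(d_{i,i+k+1}\) equal \(t_d\):

```python
import math

def circle_dist(x, y):
    """Geodesic distance on S^1 = [0, 2*pi] / (0 ~ 2*pi)."""
    d = abs(x - y) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def critical_configuration(k, td):
    """Critical configuration from the proof of Theorem 5.4 (k even):
    x_i = pi/(k+1)*(i-1) for odd i, shifted by -(pi - td) for even i."""
    xs = []
    for i in range(1, 2 * k + 3):
        x = math.pi / (k + 1) * (i - 1)
        if i % 2 == 0:
            x -= math.pi - td
        xs.append(x)
    return xs
```

Running this for \(k=2\) and any \(t_d \in (\frac{2}{3}\pi , \pi ]\) reproduces \(t_b(X) = \frac{2}{3}\pi \) and \(t_d(X) = t_d\), as in the left panel of Fig. 25.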

Fig. 25

Left: Example of a critical configuration for \(k=2\) as in Theorem 5.4. The solid blue lines have length \(t_b(X) = 2\pi /3\), while the dotted red line has length \(t_d(X) \). Right: Example of a critical configuration for \(k=3\) in Theorem 5.5. Here, \(t_b(X) =2L+s\) and \(t_d(X) =2L+2s\). Both: The sequence \(x_1, x_{1+k}, x_{1+2k}, \dots \) forms a regular \((k+1)\)-gon in the left image and a \((2k+2)\)-gon in the right.

5.3 Characterization of \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1})\) for k Odd

An important difference between even and odd k is that only for even k can we find configurations with the minimal possible birth time \(t_b(X) =\frac{k}{k+1}\pi \) for any prescribed \(t_d(X) \in (t_b(X) ,\pi ]\). The reason is that sequences of the form \(x_i,x_{i+k},x_{i+2k},\dots \) eventually reach all points when k is odd, but only half of them when k is even (see Fig. 25). When k is even, this allows us to separate \(X \subset {\mathbb {S}}^{1}\) into two regular \((k+1)\)-gons with fixed \(t_b(X) \) while retaining control of \(t_d(X) \), as in Theorem 5.4. For odd k, we will instead use an idea from Proposition 5.4 of [1]. We won’t need that result in its full generality, so we only use part of its argument to bound \(t_b(X) \) in terms of \(t_d(X) \).

Theorem 5.5

Let k be an odd positive integer and let \(X \subset {\mathbb {S}}^{1}\) be a configuration of \(n = 2k+2\) points with \(t_b(X) < t_d(X) \). Then \(t_d(X) \ge (k+1)(\pi - t_b(X) )\), and this inequality is tight.

Proof

Fix \(i \in \{1,\dots ,n\}\). Notice that \(k^2 = \frac{1}{2}(k-1) \cdot n + 1\), so the path that passes through the points \(x_i, x_{i+k}, \dots , x_{i+k\cdot k}\) makes \(\frac{1}{2}(k-1)\) revolutions around the circle and stops at \(x_{i+k^2}=x_{i+1}\). At the same time, \(d_{\ell ,\ell +k} \le t_b(X) \) for every \(\ell \). These facts give:

$$\begin{aligned} \frac{1}{2}(k-1) \cdot 2\pi + d_{i,i+1} = \sum _{j=1}^{k} d_{i+(j-1)k,i+jk} \le kt_b(X) . \end{aligned}$$

Thus, \((k-1)\pi + \max _{i=1,\dots ,n} d_{i,i+1} \le kt_b(X) \). On the other hand, by Lemma 5.3, there exists an \(\ell \) for which \(d_{\ell ,\ell +k+1}=t_d(X) \). Let \(\gamma \) be the path between \(x_\ell \) and \(x_{\ell +k+1}\) such that \(d_{\ell ,\ell +k+1} + |\gamma | = 2\pi \). Assume, without loss of generality, that \(\gamma \) contains \(x_{\ell +1}\). This means that \(|\gamma | = d_{\ell ,\ell +1} + d_{\ell +1,\ell +k+1}\), so

$$\begin{aligned} d_{\ell ,\ell +1} = |\gamma |-d_{\ell +1,\ell +k+1} = 2\pi -t_d(X) -d_{\ell +1,\ell +k+1} \ge 2\pi - t_d(X) - t_b(X) . \end{aligned}$$

Thus, \(kt_b(X) \ge (k-1)\pi +\max _{i=1,\dots ,n} d_{i,i+1} \ge (k+1)\pi -t_d(X) -t_b(X) \). Solving this inequality for \(t_d(X) \) gives the result.

In order to prove tightness, we describe the critical configurations in terms of the distances between consecutive points. Let \(0< t_b < t_d \le \pi \) be such that \(t_d = (k+1)(\pi -t_b)\). Replacing \(t_d\) with the bounds \(t_b\) and \(\pi \) in the equation \(t_d = (k+1)(\pi -t_b)\) implies \(\frac{k}{k+1} \pi \le t_b < \frac{k+1}{k+2}\pi \). Define \(L:= kt_b-(k-1)\pi \) and \(s:= -(k+2)t_b+(k+1)\pi \). Observe that the bounds \(\frac{k}{k+1} \pi \le t_b < \frac{k+1}{k+2}\pi \) imply that \(0 < s \le L\). Additionally, it can be checked that \((k+2)L+ks = 2\pi \). Let

$$\begin{aligned} x_{i}:= {\left\{ \begin{array}{ll} \left\lfloor \frac{i}{2} \right\rfloor L + \left\lfloor \frac{i-1}{2} \right\rfloor s &{} 1 \le i \le k+1,\\ \left\lfloor \frac{i+1}{2} \right\rfloor L + \left\lfloor \frac{i-2}{2} \right\rfloor s &{} k+2 \le i \le 2k+2. \end{array}\right. } \end{aligned}$$
(13)

By Lemma 5.3 item 2, \(t_b(X) = \max _i d_{i,i+k}\), so we compute the distances \(d_{i,i+k} = \min (|x_{i+k}-x_{i}|, 2\pi - |x_{i+k}-x_{i}|)\). For \(i=1\), since k is odd, \(\frac{k \pm 1}{2}\) is an integer and so

$$\begin{aligned} \textstyle x_{i+k}-x_{i} = x_{k+1}-x_{1} = \left( \left\lfloor \frac{k+1}{2} \right\rfloor L + \left\lfloor \frac{k-1}{2} \right\rfloor s \right) - 0 = \frac{k+1}{2} L + \frac{k-1}{2} s. \end{aligned}$$

If \(2 \le i \le k+1\), we have \(k+2 \le i+k \le 2k+1\). Also, observe that if \(x-y \in {\mathbb {Z}}\), then \(\lfloor x \rfloor - \lfloor y \rfloor = x-y\). Hence,

$$\begin{aligned} x_{i+k} - x_{i}&=\textstyle \left( \left\lfloor \frac{i+k+1}{2} \right\rfloor L + \left\lfloor \frac{i+k-2}{2} \right\rfloor s \right) - \left( \left\lfloor \frac{i}{2} \right\rfloor L + \left\lfloor \frac{i-1}{2} \right\rfloor s \right) \\&=\textstyle \left( \left\lfloor \frac{i+k+1}{2} \right\rfloor - \left\lfloor \frac{i}{2} \right\rfloor \right) L + \left( \left\lfloor \frac{i+k-2}{2} \right\rfloor - \left\lfloor \frac{i-1}{2} \right\rfloor \right) s =\textstyle \frac{k+1}{2} L + \frac{k-1}{2} s. \end{aligned}$$

For \(i=k+2\),

$$\begin{aligned} x_{i+k} - x_{i}&= x_{2k+2} - x_{k+2}\\&=\textstyle \left( \left\lfloor \frac{2k+3}{2} \right\rfloor L + \left\lfloor \frac{2k}{2} \right\rfloor s \right) - \left( \left\lfloor \frac{k+3}{2} \right\rfloor L + \left\lfloor \frac{k}{2} \right\rfloor s \right) \\&=\textstyle \left( \left\lfloor \frac{2k+3}{2} \right\rfloor - \left\lfloor \frac{k+3}{2} \right\rfloor \right) L + \left( \left\lfloor \frac{2k}{2} \right\rfloor - \left\lfloor \frac{k}{2} \right\rfloor \right) s \\&=\textstyle \left( \frac{2k+2}{2} - \frac{k+3}{2} \right) L + \left( \frac{2k}{2} - \frac{k-1}{2} \right) s\\&=\textstyle \frac{k-1}{2} L + \frac{k+1}{2} s. \end{aligned}$$

If \(k+3 \le i \le 2k+2\), then \(i+k\) modulo n is \(i-k-2\). Since the counterclockwise arcs from \(x_a\) to \(x_b\) and from \(x_b\) back to \(x_a\) add up to \(2\pi \) for any \(a \ne b\), and \(1 \le i-k-2 \le k\), the case above gives

$$\begin{aligned} |x_{i+k} - x_{i}|&= 2\pi - |x_{i} - x_{i+k}| = 2\pi - |x_{i} - x_{i-k-2}|\\&=\textstyle 2\pi - \left( \frac{k+1}{2} L + \frac{k-1}{2} s \right) = \frac{k+3}{2} L + \frac{k+1}{2} s. \end{aligned}$$

Also, since \((k+2)L + ks = 2\pi \), we have \(\left( \frac{k+1}{2} L + \frac{k-1}{2} s \right) + \left( \frac{k+3}{2} L + \frac{k+1}{2} s \right) = 2\pi \). Thus, putting together the above calculations gives, for \(i \ne k+2\),

$$\begin{aligned} d_{i,i+k}&=\textstyle \min \left( |x_{i+k}-x_{i}|, 2\pi - |x_{i+k}-x_{i}| \right) \nonumber \\&= {\left\{ \begin{array}{ll} |x_{i+k}-x_{i}|, &{} 1 \le i \le k+1,\\ 2\pi - |x_{i+k}-x_{i}|, &{} k+3 \le i \le 2k+2. \end{array}\right. } \end{aligned}$$
(14)

In both cases we obtain \(d_{i,i+k} = \frac{k+1}{2} L + \frac{k-1}{2} s\). For \(i=k+2\), we have

$$\begin{aligned} d_{i,i+k}&=\textstyle \min \left( \frac{k-1}{2} L + \frac{k+1}{2} s, 2\pi - \frac{k-1}{2} L - \frac{k+1}{2} s\right) \\&=\textstyle \min \left( \frac{k-1}{2} L + \frac{k+1}{2} s, \frac{k+5}{2} L + \frac{k-1}{2} s \right) = \frac{k-1}{2} L + \frac{k+1}{2} s. \end{aligned}$$

Hence,

$$\begin{aligned} t_b(X) = \max \left( \frac{k+1}{2} L + \frac{k-1}{2} s, \frac{k-1}{2} L + \frac{k+1}{2} s \right) = \frac{k+1}{2} L + \frac{k-1}{2} s = t_b. \nonumber \\ \end{aligned}$$
(15)

To find \(t_d(X) \), we compute the distances \(d_{i,i+k+1}\) (cf. Lemma 5.3 item 2). For \(1 \le i \le k+1\),

$$\begin{aligned} x_{i+k+1} - x_{i}&=\textstyle \left( \left\lfloor \frac{i+k+2}{2} \right\rfloor L + \left\lfloor \frac{i+k-1}{2} \right\rfloor s \right) - \left( \left\lfloor \frac{i}{2} \right\rfloor L + \left\lfloor \frac{i-1}{2} \right\rfloor s \right) \\&=\textstyle \left( \left\lfloor \frac{i+k+2}{2} \right\rfloor - \left\lfloor \frac{i}{2} \right\rfloor \right) L + \left( \left\lfloor \frac{i+k-1}{2} \right\rfloor - \left\lfloor \frac{i-1}{2} \right\rfloor \right) s. \end{aligned}$$

When i is odd, the above simplifies to

$$\begin{aligned} \textstyle x_{i+k+1} - x_i = \left( \frac{i+k+2}{2} - \frac{i-1}{2} \right) L + \left( \frac{i+k-2}{2} - \frac{i-1}{2} \right) s = \frac{k+3}{2} L + \frac{k-1}{2} s, \end{aligned}$$

and when i is even,

$$\begin{aligned} \textstyle x_{i+k+1} - x_{i} = \left( \frac{i+k+1}{2} - \frac{i}{2} \right) L + \left( \frac{i+k-1}{2} - \frac{i-2}{2} \right) s = \frac{k+1}{2} L + \frac{k+1}{2} s. \end{aligned}$$

Notice that \(\left( \frac{k+3}{2} L + \frac{k-1}{2} s\right) + \left( \frac{k+1}{2} L + \frac{k+1}{2} s \right) = (k+2)L + ks = 2\pi \). When \(k+2 \le i \le 2k+2\), we get \(x_{i+k+1} = x_{i-k-1}\) and, since \(1 \le i-k-1 \le k+1\), the above equations give

$$\begin{aligned} \textstyle |x_{i+k+1} - x_{i}|&= 2\pi - |x_{i}-x_{i+k+1}| = 2\pi - |x_{i} - x_{i-k-1}|\\ {}&= {\left\{ \begin{array}{ll} \frac{k+1}{2} L + \frac{k+1}{2} s, &{} i \text { odd},\\ \frac{k+3}{2} L + \frac{k-1}{2} s, &{} i \text { even}. \end{array}\right. } \end{aligned}$$

Hence,

$$\begin{aligned} d_{i,i+k+1}&= \min \{ |x_{i+k+1} - x_{i}|, 2\pi - |x_{i+k+1} - x_{i}| \}\\&= \textstyle \min \{ \frac{k+1}{2} L + \frac{k+1}{2} s, \frac{k+3}{2} L + \frac{k-1}{2} s \}\\&= \textstyle \frac{k+1}{2} L + \frac{k+1}{2} s = (k+1)(\pi -t_b). \end{aligned}$$

Thus, \(t_d(X) = \min _{i} d_{i,i+k+1} = (k+1)(\pi -t_b) = (k+1)(\pi -t_b(X) )\). \(\square \)
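The construction above lends itself to a quick numerical sanity check. The following Python sketch (ours, not part of the paper; the helper names are assumptions) builds the configuration (13) and evaluates the formulas \(t_b(X) = \max _i d_{i,i+k}\) and \(t_d(X) = \min _i d_{i,i+k+1}\) from Lemma 5.3:

```python
import math

def critical_configuration(k, t_b):
    """Points (13) on S^1 = [0, 2*pi)/~ for odd k and
    t_b in [k*pi/(k+1), (k+1)*pi/(k+2))."""
    L = k * t_b - (k - 1) * math.pi
    s = -(k + 2) * t_b + (k + 1) * math.pi
    xs = []
    for i in range(1, 2 * k + 3):
        if i <= k + 1:
            xs.append((i // 2) * L + ((i - 1) // 2) * s)
        else:
            xs.append(((i + 1) // 2) * L + ((i - 2) // 2) * s)
    return xs

def geodesic(a, b):
    """Geodesic distance on the circle of circumference 2*pi."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def birth_death(xs):
    """Lemma 5.3 item 2 for n = 2k+2 cyclically ordered points:
    t_b(X) = max_i d_{i,i+k} and t_d(X) = min_i d_{i,i+k+1}."""
    n = len(xs)
    k = n // 2 - 1
    t_b = max(geodesic(xs[i], xs[(i + k) % n]) for i in range(n))
    t_d = min(geodesic(xs[i], xs[(i + k + 1) % n]) for i in range(n))
    return t_b, t_d

# k = 3 and t_b = 0.78*pi lies in [3*pi/4, 4*pi/5); the configuration should
# realize t_b(X) = t_b and t_d(X) = (k+1)*(pi - t_b).
k, t_b = 3, 0.78 * math.pi
tb, td = birth_death(critical_configuration(k, t_b))
```

For this choice the computed pair agrees with \(t_b(X)=t_b\) and \(t_d(X)=(k+1)(\pi -t_b)\) to machine precision, matching the tight case of Theorem 5.5.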

Theorem 5.6

For odd k,

$$\begin{aligned} \textstyle {\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1}) = \left\{ (t_b,t_d): (k+1)(\pi -t_b) \le t_d \text { and } \frac{k}{k+1}\pi \le t_b < t_d \le \pi \right\} .\nonumber \\ \end{aligned}$$
(16)

Proof

Theorem 5.5 and Lemma 5.3 item 4 imply that \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1})\) is contained in the right-hand side of (16). To show the other inclusion, choose any pair \((t_b', t_d')\) in the right-hand side of (16). We now exhibit a set \(X' = \{x_1', \dots , x_n'\} \subset {\mathbb {S}}^{1}\) with \(t_b(X') =t_b'\) and \(t_d(X') =t_d'\). Let \(t_d:= t_d'\) and \(t_b = \pi -\frac{1}{k+1}t_d\). Notice that \(t_d = (k+1)(\pi - t_b)\), so let \(X = \{x_1, \dots , x_n\}\) be the set defined in (13). Let \(\varepsilon = t_b'-t_b\). Since \((k+1)(\pi -t_b) = t_d = t_d' \ge (k+1)(\pi -t_b')\), we must have \(\varepsilon \ge 0\). Now define \(x_1':= x_1 + \varepsilon \), \(x_{k+2}':= x_{k+2}+\varepsilon \), and \(x_i':= x_i\) for \(i \ne 1, k+2\). Let \(d_{ij} = d_{{\mathbb {S}}^{1}}(x_i,x_j)\) and \(d_{ij}' = d_{{\mathbb {S}}^{1}}(x_i',x_j')\). We claim that \(t_b(X') = t_b+\varepsilon = t_b'\) and \(t_d(X') = t_d = t_d'\).

Notice that \(\varepsilon = t_b'-t_b < t_d'-t_b = t_d-t_b = t_d(X) -t_b(X) \), which, by Lemma 5.1 item 2, is bounded above by \(\textbf{sep}(X)\). Because of this, \(x_1' = x_1+\varepsilon < x_1+\textbf{sep}(X) \le x_{2} = x_{2}'\), so \(x_1 \prec x_1' \prec x_2'\). Analogously, \(x_{k+2} \prec x_{k+2}' \prec x_{k+3}'\). Since \(x_1'\) and \(x_{k+2}'\) are the only points for which \(x_i' \ne x_i\), the previous two inequalities combined with \(x_i \prec x_{i+1} \prec x_{i+2}\) imply that \(x_i' \prec x_{i+1}' \prec x_{i+2}'\). Hence, by Lemma 5.3, \(t_b(X') = \max _i d_{i,i+k}'\) and \(t_d(X') = \min _i d_{i,i+k+1}'\).

Now we find \(d_{i,i+k}'\) in terms of \(d_{i,i+k}\) and \(\varepsilon \). Observe that \(d_{i,i+k}' = d_{i,i+k}\) whenever \(i \ne 1, 2, k+2, k+3\) because \(x_i' \ne x_i\) only when \(i=1, k+2\). In fact, \(x_1< x_{1+k} < x_{1-k}\) and \(\varepsilon < \textbf{sep}(X)\), so we can write the distances and absolute values in (14) as

$$\begin{aligned} d_{1,1+k} - \varepsilon&= |x_{1+k}-x_1|-\varepsilon = x_{1+k}-(x_1+\varepsilon ) = |x_{1+k}'-x_{1}'|, \text { and}\\ d_{1-k,1} + \varepsilon&= 2\pi - |x_{1} - x_{1-k}| + \varepsilon = 2\pi - [x_{1-k} - (x_{1}+\varepsilon )] = 2\pi - |x_{1-k}' - x_{1}'|. \end{aligned}$$

In particular, the two quantities \(|x_{1+k}'-x_{1}'|\) and \(2\pi - |x_{1-k}' - x_{1}'|\) are bounded above by \(t_b+\varepsilon = t_b' < \pi \) because both \(d_{1,1+k} - \varepsilon \) and \(d_{1,1-k} + \varepsilon \) are. Hence, \(d_{1,1 \pm k}' = \min (|x_{1 \pm k}' - x_{1}'|, 2\pi - |x_{1 \pm k}' - x_{1}'|) = d_{1,1 \pm k} \mp \varepsilon \). An analogous argument gives \(d_{k+2,(k+2) \pm k}' = d_{k+2,(k+2) \pm k} \mp \varepsilon \).

Now we compute \(t_b(X') \) and \(t_d(X') \). Observe that (15) gives \(t_b(X) = d_{i,i+k}\) for all \(i \ne k+2\) and, in particular, that \(d_{1-k,1} \ge d_{i,i+k}\) for all i. By the above paragraph, the distances \(d_{i,i+k}'\) are either equal to \(d_{i,i+k}\) or differ by \(\varepsilon \). For this reason, \(d_{1-k,1}' = d_{1-k,1}+\varepsilon \ge d_{i,i+k} + \varepsilon \ge d_{i,i+k}'\). Thus, \(t_b(X') = \max _i d_{i,i+k}' = d_{1-k,1}' = d_{1-k,1} + \varepsilon = t_b+\varepsilon = t_b'\). To compute \(t_d(X') = \min _i d_{i,i+k+1}'\), observe that the only values of i for which the distance \(d_{i,i+k+1}'\) might differ from \(d_{i,i+k+1}\) are \(i=1, k+2\). However, \(x_1' = x_1+\varepsilon \) and \(x_{k+2}' = x_{k+2}+\varepsilon \), so \(|x_{k+2}'-x_{1}'| = |x_{k+2}-x_{1}|\) and, thus, \(d_{1,k+2}' = d_{1,k+2}\). Hence, \(t_d(X') = \min _i d_{i,i+k+1}' = \min _i d_{i,i+k+1} = t_d = t_d'\). \(\square \)
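The perturbation in this proof can likewise be checked numerically. Below is a small sketch (our helper names, not the paper's) that builds \(X'\) from a target pair \((t_b', t_d')\) in the region (16) and recovers it via the formulas of Lemma 5.3:

```python
import math

def perturbed_configuration(k, tb_target, td_target):
    """X' from the proof of Theorem 5.6: the tight configuration (13) with
    t_d = (k+1)*(pi - t_b), with x_1 and x_{k+2} shifted counterclockwise
    by eps = tb_target - t_b."""
    t_b = math.pi - td_target / (k + 1)
    eps = tb_target - t_b  # nonnegative inside the region (16)
    L = k * t_b - (k - 1) * math.pi
    s = -(k + 2) * t_b + (k + 1) * math.pi
    xs = []
    for i in range(1, 2 * k + 3):
        if i <= k + 1:
            xs.append((i // 2) * L + ((i - 1) // 2) * s)
        else:
            xs.append(((i + 1) // 2) * L + ((i - 2) // 2) * s)
    xs[0] += eps       # x_1
    xs[k + 1] += eps   # x_{k+2}
    return xs

def geodesic(a, b):
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def birth_death(xs):
    # Lemma 5.3 item 2: t_b = max_i d_{i,i+k}, t_d = min_i d_{i,i+k+1}
    n = len(xs)
    k = n // 2 - 1
    return (max(geodesic(xs[i], xs[(i + k) % n]) for i in range(n)),
            min(geodesic(xs[i], xs[(i + k + 1) % n]) for i in range(n)))

# A point of the region (16) for k = 1: (t_b', t_d') = (0.6*pi, 0.9*pi).
tb_prime, td_prime = 0.6 * math.pi, 0.9 * math.pi
tb, td = birth_death(perturbed_configuration(1, tb_prime, td_prime))
```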

Remark 5.7

The persistence sets of a circle \(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1}\) with diameter \(\lambda \) are obtained by rescaling the results of this section. For example, \({\textbf{D}}_{4,1}^{\textrm{VR}}(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1})\) is the set bounded by \(2(\lambda - t_b) \le t_d\) and \(t_b < t_d \le \lambda \).

In general, there are multiple configurations with the same persistence diagram, even among those that minimize the death time. The exception is the configuration that has the minimal birth time, as the following proposition shows.

Proposition 5.8

For any \(k \ge 0\), let \(n = 2k+2\). If \(X \subset {\mathbb {S}}^{1}\) has n points and satisfies \(t_b(X) = \frac{k}{k+1}\pi \) and \(t_d(X) = \pi \), then X is a regular n-gon. As a consequence, the configuration X with n points such that \(\textrm{dgm}_k^\textrm{VR}(X) = \{(\frac{k}{k+1}\pi , \pi )\}\) is unique up to rotations.

Proof

An application of Lemma 5.3 item 3 and the triangle inequality gives:

$$\begin{aligned} \frac{k}{k+1}\pi&= t_b(X) = \max (d_{i,i+k}) \ge \frac{1}{2k+2} \sum _{i=1}^{2k+2} d_{i,i+k}\\&= \frac{1}{2k+2} \sum _{i=1}^{2k+2} \sum _{j=1}^{k} d_{i+j-1,i+j} = \frac{1}{2k+2} \sum _{j=1}^{k} \sum _{i=1}^{2k+2} d_{i+j-1,i+j}\\&= \frac{1}{2k+2} \sum _{j=1}^{k} \left[ \sum _{i=1}^{k+1} d_{i+j-1,i+j} + \sum _{i=k+2}^{2k+2} d_{i+j-1,i+j}\right] \\&\ge \frac{1}{2k+2} \sum _{j=1}^{k} \left[ d_{j,j+k+1} + d_{j+k+1,j} \right] \ge \frac{1}{2k+2} \sum _{j=1}^{k} \left[ 2t_d(X) \right] = \frac{k}{k+1}\pi . \end{aligned}$$

Thus, all intermediate inequalities become equalities, most notably, \(d_{i,i+k} = \frac{k}{k+1}\pi \) and \(d_{j,j+k+1} = \sum _{i=1}^{k+1} d_{i+j-1,i+j} = \pi \). Then \( d_{i,i+1} = d_{i-k,i+1} - d_{i-k,i} = \frac{2\pi }{2k+2}. \) That is, X is a regular n-gon. \(\square \)

5.4 Characterization of \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^1)\)

In addition to the characterization of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) given in Theorem 5.6, we can also characterize the persistence measure \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\). To set up the context, consider the diagonal \(\Delta _0\subset {\mathbb {R}}^2\). Since any two points in \(\Delta _0\) are at bottleneck distance 0, we can view \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) as a subset of \({\mathbb {R}}^2/\Delta _0\). Let \({\mathcal {L}}\) be the pushforward of the Lebesgue measure under the quotient \({\mathbb {R}}^2 \rightarrow {\mathbb {R}}^2 / \Delta _0\).

Proposition 5.9

Let \(\mu _{{\mathbb {S}}^{1}}\) be the uniform measure on \({\mathbb {S}}^{1}\). With respect to \({\mathcal {L}}\), the persistence measure \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) decomposes into a singular measure supported on \(\Delta _0\) and a measure supported on \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}) {\setminus } \Delta _0\) with Radon-Nikodym derivative

$$\begin{aligned} f(t_b,t_d) = \frac{12}{\pi ^3}\left( \pi -t_d \right) , \end{aligned}$$

for \((t_b,t_d) \in {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}) \setminus \Delta _0\). In particular, the probability that the 1-dimensional persistence diagram of a 4-point subset of \({\mathbb {S}}^{1}\) is in \(\Delta _0\) is \(\frac{8}{9}\).

Remark 5.10

Given a set \(X = \{x_1,x_2,x_3,x_4\} \subset {\mathbb {S}}^{1}\) chosen uniformly at random, the probability that \(\textrm{dgm}_1(X)\) is a non-diagonal point is \(\frac{1}{9} \approx 11 \%\). This is consistent with the 11.08 % success rate obtained in the simulations; cf. Example 4.13.
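The probability in this remark can be reproduced with a short Monte Carlo simulation. The sketch below is ours (it is not the simulation of Example 4.13); it samples gap vectors as in the proof of Proposition 5.9 and applies the \(n=4\) case of Lemma 5.3:

```python
import math
import random

def has_nonzero_dgm1(gaps):
    """n = 4, k = 1 case of Lemma 5.3: with consecutive gaps y_1,...,y_4,
    t_b(X) = max_i y_i and t_d(X) = min_i (y_i + y_{i+1})."""
    t_b = max(gaps)
    t_d = min(gaps[i] + gaps[(i + 1) % 4] for i in range(4))
    return t_b < t_d

random.seed(0)
trials = 200_000
hits = 0
for _ in range(trials):
    xs = sorted(random.uniform(0, 2 * math.pi) for _ in range(4))
    gaps = [xs[1] - xs[0], xs[2] - xs[1], xs[3] - xs[2],
            2 * math.pi - xs[3] + xs[0]]
    hits += has_nonzero_dgm1(gaps)
frac = hits / trials  # expected to be close to 1/9
```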

Before proving Proposition 5.9, we give an application where \({\textbf{U}}_{4,1}^{\textrm{VR}}\) can distinguish spaces that \({\textbf{D}}_{4,1}^{\textrm{VR}}\) cannot.

Example 5.11

(Persistence sets of \({\mathbb {S}}^{1}\) without a segment) Let \(L \in (0,2\pi )\). Define \(S_L:= {\mathbb {S}}^{1} {\setminus } (2\pi -L, 2\pi )\) to be the circle with an open segment of length L removed, and give \(S_L\) the restriction of the geodesic metric induced from \({\mathbb {S}}^{1}\) (which, in particular, will not be geodesic). We will show in Proposition 5.12 that \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(S_{L})\) equals \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1})\) for \(0 < L \le \frac{1}{k+1}\pi \) and is strictly contained in \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1})\) otherwise.

As for the degree 1 persistence diagrams of \(S_{L}\) and \({\mathbb {S}}^{1}\), they are different for any value of L. Indeed, if \(r<L\), the balls of radius r in \(S_L\) are isometric to the corresponding balls in \([0, 2\pi -L]\) (with the absolute value metric). Hence, \(\textrm{VR}_r(S_{L}) \simeq \textrm{VR}_r([0,2\pi -L]) \simeq *\) for \(0<r<L\). This implies that, if \((b,d) \in \textrm{dgm}^\textrm{VR}_1(S_L)\), then \(b\ge L>0\). However, it is known that \(\textrm{dgm}^\textrm{VR}_1({\mathbb {S}}^{1})=\{(0,\frac{2\pi }{3})\}\). Thus, \(S_L\) and \({\mathbb {S}}^{1}\) are an example of a pair of spaces that can be distinguished by \(\textrm{dgm}_1^\textrm{VR}\) but not by any \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}\) for which \(L \le \frac{1}{k+1}\pi \).

The last invariant we consider is the measure \({\textbf{U}}_{4,1}^{\textrm{VR}}\), which can distinguish \({\mathbb {S}}^{1}\) and \(S_{L}\) for every \(L \in (0,2\pi )\). For instance, when \(L=\frac{\pi }{2}\), there is a circle's worth of squares in \({\mathbb {S}}^{1}\) but, since all sides of an inscribed square have length \(\frac{\pi }{2}\), only one square fits in \(S_{\pi /2}\). Moreover, Proposition 5.9 says that the Radon-Nikodym derivative of \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) away from the diagonal is independent of \(t_b\) but, as Fig. 26 shows, the derivative of \({\textbf{U}}_{4,1}^{\textrm{VR}}(S_{\pi /2})\) is not (compare with Fig. 12).

A related example appears in Sect. 9 of [88] which shows that \(\textrm{dgm}_1^\textrm{VR}\) can itself be insensitive to small holes. For a slightly deformed 2-dimensional torus T and a small enough open disk \(D \subset T\), the author shows that \({\text {PH}}_1^\textrm{VR}(T) \cong {\text {PH}}_1^\textrm{VR}(T \setminus D)\). It is interesting that in this case \(\textrm{dgm}^\textrm{VR}_1\) cannot detect the absence of D, in contrast to the case of \({\mathbb {S}}^{1}\) and \(S_L\).

Proposition 5.12

Let \(L \in (0,2\pi )\). Define \(S_L:= {\mathbb {S}}^{1} {\setminus } (2\pi -L, 2\pi )\) to be the circle with an open segment of length L removed, and equip \(S_L\) with the metric induced by the inclusion \(S_L \subset {\mathbb {S}}^1\). Then \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(S_L) \ne {\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1})\) if and only if \(L > \frac{\pi }{k+1}\).

Proof

Suppose that \(0 < L \le \frac{\pi }{k+1}\). Let \(X = \{x_1, \dots , x_{2k+2} \} \subset {\mathbb {S}}^{1}\) such that \(x_{i-1} \prec x_{i} \prec x_{i+1}\) and \(t_b(X) < t_d(X) \). By Lemma 5.3 items 2 and 4, \(t_b(X) = d_{i,i+k}\) for some i and \(t_b(X) \ge \frac{k}{k+1}\pi \). In particular, by Lemma 5.3 item 3, one of the distances \(d_{j,j+1}\) is at least \(\frac{1}{k+1}\pi \) for some \(i \le j < i+k\). In other words, the gap between \(x_j\) and \(x_{j+1}\) is larger than or equal to L, so if we rotate X anticlockwise by \(2\pi -x_{j+1}\), we obtain a set \(X' \subset S_L\) isometric to X. Hence, \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1}) \subset {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(S_{L})\). Since \(S_{L} \hookrightarrow {\mathbb {S}}^{1}\), we also have the other inclusion.

Now suppose that \(L > \frac{1}{k+1}\pi \). The point \(\left( \frac{k}{k+1}\pi , \pi \right) \) is in \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{1})\) and, by Proposition 5.8, it is generated only by a regular \((2k+2)\)-gon. The side length of that polygon is \(\frac{1}{k+1}\pi < L\), so \(S_L\) cannot contain any regular \((2k+2)\)-gon and, thus, \(\left( \frac{k}{k+1}\pi , \pi \right) \notin {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(S_{L})\). See, for example, \({\textbf{U}}_{4,1}^{\textrm{VR}}(S_{3\pi /4})\) in Fig. 26. \(\square \)
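The rotation argument in the first paragraph of the proof can be illustrated in code. The following sketch (helper names are ours) rotates an arbitrary configuration so that its largest gap ends at \(0\); since the \(2k+2\) gaps average \(\frac{\pi }{k+1}\), the largest gap is always at least \(\frac{\pi }{k+1}\), so the rotated set lies in \(S_L\) for any \(L \le \frac{\pi }{k+1}\):

```python
import math
import random

def rotate_past_largest_gap(xs):
    """Rotate a configuration on [0, 2*pi) counterclockwise so that the point
    following the largest gap lands at 0; the rotated set then lies in the
    interval [0, 2*pi - gap]."""
    n = len(xs)
    xs = sorted(xs)
    gaps = [(xs[(j + 1) % n] - xs[j]) % (2 * math.pi) for j in range(n)]
    j = max(range(n), key=gaps.__getitem__)
    pivot = xs[(j + 1) % n]
    rotated = sorted((x - pivot) % (2 * math.pi) for x in xs)
    return rotated, gaps[j]

# The 2k+2 gaps of any configuration sum to 2*pi, so the largest one is at
# least pi/(k+1); hence every 8-point set (k = 3) rotates into S_L, L = pi/4.
random.seed(1)
pts = [random.uniform(0, 2 * math.pi) for _ in range(8)]
rotated, gap = rotate_past_largest_gap(pts)
```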

Fig. 26

Left: The space \(S_{\pi /2}\) formed by removing a segment of length \(\pi /2\) from \({\mathbb {S}}^{1}\). Middle and right: The persistence measures \({\textbf{U}}_{4,1}^{\textrm{VR}}(S_L)\) of the truncated circle \(S_L\) for \(L=\pi /2\) (middle) and \(L=3\pi /4\) (right).

Proof of Proposition 5.9

Since \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) is a probability measure (and hence finite) and \({\mathcal {L}}\) is positive and \(\sigma \)-finite, the Lebesgue-Radon-Nikodym Theorem ([36, Thm. 3.8]) says that \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) decomposes as a sum of a singular measure and an absolutely continuous measure with respect to \({\mathcal {L}}\). We will show that \({\textbf{U}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) is absolutely continuous in \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}) \setminus \Delta _0\) and that the persistence diagram of a 4-point subset of \({\mathbb {S}}^{1}\) is in \(\Delta _0\) with non-zero probability. These facts give the desired decomposition.

Recall that we model \({\mathbb {S}}^{1}\) as the quotient \([0, 2\pi ]/0 \sim 2\pi \). Let \(X = \{x_1, x_2, x_3, x_4\} \subset [0, 2\pi ]\) be 4 points chosen uniformly at random. Since \(t_b(X) \) and \(t_d(X) \) only depend on the distances between the \(x_i\), we may assume \(x_1=0\). Notice that the tuple \((x_2,x_3,x_4)\) is still distributed uniformly in \([0,2\pi ]^3\). Relabel \(x_i\) as \(x^{(j)} \in [0, 2\pi ]\) so that \(0 = x^{(1)}< x^{(2)}< x^{(3)} < x^{(4)}\) and set \(y_i:= x^{(i+1)} - x^{(i)}\) for \(i=1, 2, 3\) and \(y_4:= 2\pi - x^{(4)}\). Let \(D:= \{ (x^{(2)}, x^{(3)}, x^{(4)}) \in [0,2\pi ]^3: x^{(2)}< x^{(3)} < x^{(4)} \}\), and \(\Delta _3(2\pi ):= \{(y_1, y_2, y_3) \in [0, 2\pi ]^3: y_1 + y_2 + y_3 \le 2\pi \}\). Since the only difference between \((x_2, x_3, x_4)\) and \((x^{(2)}, x^{(3)}, x^{(4)})\) is the order of the coordinates, the latter is uniformly distributed in D. Furthermore, the pushforward of the uniform measure on D onto \(\Delta _3(2 \pi )\) under the map \(\Psi (x^{(2)}, x^{(3)}, x^{(4)}) = (x^{(2)}, x^{(3)}-x^{(2)}, x^{(4)} - x^{(3)})\) is the uniform measure because the Jacobian of \(\Psi \) has determinant 1. Hence, we will model a configuration of four points in \({\mathbb {S}}^{1}\) as the set of distances \(y_1, y_2, y_3, y_4\) instead, where \((y_1,y_2,y_3) \in \Delta _3(2\pi )\) is uniformly distributed and \(y_4 = 2\pi - (y_1+y_2+y_3)\).

Now we characterize the measure on non-diagonal points of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\). Fix \((t_b,t_d) \in {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\) with \(t_b < t_d\). By Lemma 5.3, \(t_b(X) = \max _i y_i\) and \(t_d(X) = \min _i(y_i+y_{i+1})\). Since \(\Delta _3(2\pi )\) has the uniform measure, the probability that \(t_b \le t_b(X) < t_d(X) \le t_d\) is the volume of the set

$$\begin{aligned} R(t_b,t_d):= \{(y_1,y_2,y_3) \in \Delta _3(2\pi ): t_b \le t_b(X) < t_d(X) \le t_d\} \end{aligned}$$

divided by \({\text {Vol}}(\Delta _3(2\pi )) = \frac{(2\pi )^3}{3!}\). We will find \({\text {Vol}}(R(t_b,t_d))\) using an integral with a suitable parametrization of \(y_1,y_2,y_3\).

Assume that \(t_b(X) = y_1\). There are four choices for \(t_d(X) \), but to start, let \(t_d(X) =y_1+y_2\). Since \(y_3 \le y_1\) by definition of \(t_b(X) \), we have \(y_3+y_2 \le y_1+y_2\) and, by definition of \(t_d(X) \), \(y_1+y_2 = y_3+y_2\). Thus, the case \(t_d(X) =y_1+y_2\) is a subset of the case when \(t_d(X) = y_2+y_3\). Similarly, \(t_d(X) = y_1+y_4\) implies \(t_d(X) = y_3+y_4\), so we only have to consider two choices for \(t_d(X) \).

Let \(R'(t_b,t_d)\) be the subset of \(R(t_b,t_d)\) where \(t_b(X) =y_1\) and \(t_d(X) =y_2+y_3\). Observe that \(y_2+y_3 \le y_3+y_4\) if and only if \(y_2 \le y_4\), so the conditions \(t_b(X) = y_1 \ge t_b\) and \(t_d(X) = y_2+y_3 \le t_d\) are equivalent to the system of inequalities \(t_b \le y_1 < y_2+y_3 \le t_d\), \(y_2 \le y_4 \le y_1\), and \(y_3 \le y_1\). Consider the substitution \(s=y_2+y_3\) and rewrite \(y_4 = 2\pi -y_1-s\). These changes give, for example, that \(y_4 \le y_1\) is equivalent to \(2\pi -2y_1 = y_4-y_1+s \le s\). In a similar fashion, substituting s and \(y_4\) into the rest of the inequalities yields the following characterization of \(R'(t_b,t_d)\) in terms of \(y_1\), \(y_2\), and \(s\):

$$\begin{aligned} t_b \le y_1&< t_d\\ \max (2\pi -2y_1,y_1) \le s&\le t_d\\ s-y_1 \le y_2&\le 2\pi -s-y_1. \end{aligned}$$

Notice that the Jacobian \(\left| \frac{\partial (y_1,y_2,y_3)}{\partial (y_1,y_2,s)} \right| \) of the transformation \((y_1,y_2,y_3) \mapsto (y_1,y_2,s) = (y_1,y_2,y_2+y_3)\) is 1. Also, when defining \(R'(t_b,t_d)\), we had four choices for \(t_b(X) \) (all four \(y_i\)) and for each, two choices for \(t_d(X) \). Then

$$\begin{aligned} {\text {Vol}}(R(t_b,t_d)) = 8{\text {Vol}}(R'(t_b,t_d)) = \int _{t_b}^{t_d}\int _{\max (2\pi -2y_1,y_1)}^{t_d}\int _{s-y_1}^{2\pi -s-y_1} 8\, \textrm{d}y_2\, \textrm{d}s\, \textrm{d}y_1. \end{aligned}$$

If we define \(f(t_b,t_d):= \frac{1}{{\text {Vol}}(\Delta _3(2\pi ))} \int _{t_d-t_b}^{2\pi -t_d-t_b} 8\, \textrm{d}y_2\) and \(F(t_b, t_d):= {\mathbb {P}}(t_b \le t_b(X) < t_d(X) \le t_d)\), we obtain

$$\begin{aligned} F(t_b,t_d) = \frac{{\text {Vol}}(R(t_b,t_d))}{{\text {Vol}}(\Delta _3(2\pi ))} = \int _{t_b}^{t_d} \int _{\max (2(\pi -\tau _b),\tau _b)}^{t_d} f(\tau _b, \tau _d) \, \textrm{d}\tau _d \, \textrm{d}\tau _b. \end{aligned}$$
(17)

Notice that the lower bound on \(\tau _d\) equals the bound \(t_d \ge 2(\pi -t_b)\) given by Theorem 5.6 when \(k=1\). In other words, \(F(t_b,t_d)\) is the integral of \(f(\tau _b,\tau _d)\) over the subset of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}) {\setminus } \Delta _0\) where \(t_b \le \tau _b < \tau _d \le t_d\). In particular, \(F(t_b, t_d)\) is absolutely continuous with respect to \({\mathcal {L}}\) and its Radon-Nikodym derivative is

$$\begin{aligned} f(t_b,t_d) = \frac{1}{{\text {Vol}}(\Delta _3(2\pi ))} \int _{t_d-t_b}^{2\pi -t_d-t_b} 8\, \textrm{d}y_2 = \dfrac{16(\pi -t_d)}{(2\pi )^3/3!} = \frac{12}{\pi ^3}(\pi -t_d). \end{aligned}$$

Furthermore, the probability that \(t_b(X) < t_d(X) \) equals \(F(\pi /2, \pi ) = \frac{{\text {Vol}}(R(\pi /2, \pi ))}{{\text {Vol}}(\Delta _3(2\pi ))} = \frac{4\pi ^3/27}{(2\pi )^3/3!} = \frac{1}{9}\). Hence, the probability that \(\textrm{dgm}_1(X)\) is in \(\Delta _0\) is \(\frac{8}{9}\). \(\square \)

5.5 Persistence Sets of Ptolemaic Spaces

Example 4.5 showed that, in a metric space with four points, the birth time of its one-dimensional persistent homology is the length of the longest side, and the death time is the length of the shorter diagonal (under a suitable labeling of the points). In this section, we use Ptolemy’s inequality, which relates the lengths of the diagonals and sides of Euclidean quadrilaterals, to bound the first persistence set \({\textbf{D}}_{4,1}^{\textrm{VR}}\) of several spaces, and we exhibit examples where the bound is attained.

Definition 5.13

A metric space \((X, d_X)\) is called Ptolemaic if for any \(x_1,x_2,x_3,x_4\,{\in }\,X\),

$$\begin{aligned} d_X(x_1,x_3) \cdot d_X(x_2,x_4) \le d_X(x_1,x_2) \cdot d_X(x_3,x_4) + d_X(x_1,x_4) \cdot d_X(x_2,x_3). \nonumber \\ \end{aligned}$$
(18)

It should be noted that the inequality holds for any permutation of \(x_1,x_2,x_3,x_4\) and, in \({\mathbb {R}}^m\), equality holds if and only if the points \(x_1,x_2,x_3,x_4\) lie on a circle or a line. Examples of Ptolemaic metric spaces include the Euclidean spaces \({\mathbb {R}}^m\) and \(\text {CAT}(0)\) spaces; see [13] for a more complete list of references. The basic result of this section is the following.

Proposition 5.14

Let \((X,d_X)\) be Ptolemaic. Then \(t_d \le \sqrt{2}t_b\) for any \((t_b, t_d) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(X)\).

Proof

Let \(X' = \{x_1,x_2,x_3,x_4\} \subset X\) be such that \(t_b(X') < t_d(X') \). As per Example 4.5, relabel the points so that \(t_b(X') = \max (d_{12}, d_{23}, d_{34}, d_{41})\) and \(t_d(X') = \min (d_{13}, d_{24})\). Then, Ptolemy’s inequality gives

$$\begin{aligned} \left( t_d(X') \right) ^2 \le d_{13} d_{24} \le d_{12} d_{34} + d_{23} d_{14} \le 2 \left( t_b(X') \right) ^2. \end{aligned}$$

Taking square root gives the result. \(\square \)
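As a sanity check of inequality (18) in the Euclidean plane, which is a Ptolemaic space, one can test random quadruples. This sketch is ours (helper names are assumptions):

```python
import math
import random

def dist(p, q):
    # Euclidean distance in R^2
    return math.hypot(p[0] - q[0], p[1] - q[1])

def ptolemy_holds(p1, p2, p3, p4):
    """Inequality (18) specialized to the Euclidean plane."""
    lhs = dist(p1, p3) * dist(p2, p4)
    rhs = dist(p1, p2) * dist(p3, p4) + dist(p1, p4) * dist(p2, p3)
    return lhs <= rhs + 1e-9  # small tolerance for floating point

random.seed(2)
ok = all(
    ptolemy_holds(*[(random.uniform(-1, 1), random.uniform(-1, 1))
                    for _ in range(4)])
    for _ in range(10_000)
)
```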

Remark 5.15

If \(t_d(X') =\sqrt{2}t_b(X') \) in the proof of Proposition 5.14, we have \(d_{13} \cdot d_{24} = d_{12} \cdot d_{34} + d_{23} \cdot d_{41}\). In particular, if \(X' \subset {\mathbb {R}}^m\), then \(X'\) must lie on a circle. In other words, any point in the boundary \(t_d = \sqrt{2}t_b\) of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^m)\) is the persistence diagram of a concyclic 4-point set \(X'\).
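A concrete instance of this remark is a square inscribed in a circle of radius \(R\): its sides have length \(\sqrt{2}R\) and its diagonals length \(2R\), so \(t_d = \sqrt{2}t_b\). A minimal check in Python (labels and helper names are ours):

```python
import math

# Vertices of a square inscribed in the unit circle of R^2.
R = 1.0
square = [(R * math.cos(t), R * math.sin(t))
          for t in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# With the labeling of Example 4.5: sides are consecutive pairs and the
# diagonals are the two long chords.
sides = [dist(square[i], square[(i + 1) % 4]) for i in range(4)]
diagonals = [dist(square[0], square[2]), dist(square[1], square[3])]
t_b, t_d = max(sides), min(diagonals)  # t_b = sqrt(2)*R, t_d = 2*R
```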

Another way to phrase the above proposition is to say that \({\textbf{D}}_{4,1}^{\textrm{VR}}(X)\) is contained in the set

$$\begin{aligned} P_R:= \left\{ (t_b,t_d)| 0 \le t_b < t_d \le \min (\sqrt{2}t_b, R) \right\} , \end{aligned}$$
(19)

with \(R=\textbf{diam}(X)\). A key example where the containment is strict is the following.

Proposition 5.16

Let \({\mathbb {S}}^{1}_E\) denote the unit circle in \({\mathbb {R}}^2\) equipped with the Euclidean metric. Then

$$\begin{aligned} {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}_E) = \left\{ (t_b,t_d)\ \bigg |\ 2t_b \sqrt{1-\dfrac{t_b^2}{4}} \le t_d, \text { and } \sqrt{2} \le t_b < t_d \le 2 \right\} . \end{aligned}$$

Proof

Observe that the Euclidean distance \(d_E\) between two points in \({\mathbb {S}}^{1}\) is related to their geodesic distance d by \(d_E = f_E(d):= 2\sin (d/2)\). Since \(f_E\) is increasing on \([0,\pi ]\), an interval that contains all possible distances between points in \({\mathbb {S}}^{1}\), a configuration \(X = \{x_1,x_2,x_3,x_4\} \subset {\mathbb {S}}^{1}\) produces non-zero persistence if and only if its Euclidean counterpart \(X_E \subset {\mathbb {S}}^{1}_E\) does. For this reason, \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}_E) = f_E\left( {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\right) \).

From Theorem 5.6,

$$\begin{aligned} {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}) = \{ (t_b,t_d) \mid 2(\pi -t_b) \le t_d \text { and } \pi /2 \le t_b < t_d \le \pi \}. \end{aligned}$$

Applying \(f_E\) to the bound \(t_d \ge 2(\pi -t_b)\) gives

$$\begin{aligned} t_{d,E} = 2\sin (t_d/2)&\ge 2\sin (\pi -t_b) = 2\sin (t_b) = 2\sin (2\arcsin (t_{b,E}/2)) \\&= 4 \sin (\arcsin (t_{b,E}/2)) \cos (\arcsin (t_{b,E}/2)) = 2 t_{b,E} \sqrt{1-t_{b,E}^2/4}, \end{aligned}$$

while the image of the bound \(\pi /2 \le t_b < t_d \le \pi \) under \(f_E\) is \(\sqrt{2} \le t_{b,E} < t_{d,E} \le 2\). \(\square \)
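The boundary computation can also be verified numerically: along \(t_d = 2(\pi - t_b)\), the chordal images satisfy the claimed algebraic relation. A short sketch (the function name is ours):

```python
import math

def f_E(d):
    """Chordal (Euclidean) distance corresponding to geodesic distance d."""
    return 2 * math.sin(d / 2)

# Along the boundary t_d = 2*(pi - t_b) of D_{4,1}^VR(S^1), with
# pi/2 < t_b < pi, the chordal images should satisfy
# t_dE = 2 * t_bE * sqrt(1 - t_bE**2 / 4), as in Proposition 5.16.
ok = True
for j in range(1, 100):
    t_b = math.pi / 2 + j * (math.pi / 200)
    t_d = 2 * (math.pi - t_b)
    t_bE, t_dE = f_E(t_b), f_E(t_d)
    ok &= abs(t_dE - 2 * t_bE * math.sqrt(1 - t_bE ** 2 / 4)) < 1e-9
```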

Even though \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}_E)\) does not attain equality in the bound given by Proposition 5.14, it can be used to show that other spaces do. Two examples are \({\mathbb {R}}^n\) and \({\mathbb {S}}^{n}_E\).

Proposition 5.17

For \(n \ge 2\),

$$\begin{aligned} {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n)&= \left\{ (t_b,t_d) \ \bigg |\ 0< t_b< t_d \le \sqrt{2}t_b \right\} \quad \text {and}\\ {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{n}_E)&= \left\{ (t_b,t_d) \ \bigg |\ 0< t_b < t_d \le \min (\sqrt{2}t_b, 2) \right\} . \end{aligned}$$

In particular, both sets are convex.

Proof

Since both \({\mathbb {R}}^n\) and \({\mathbb {S}}^{n}_E \subset {\mathbb {R}}^{n+1}\) are Ptolemaic spaces, Proposition 5.14 gives \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n) \subset P_\infty \) and \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{n}_E) \subset P_2\) (see equation (19)). To show the other direction, notice that \({\mathbb {R}}^n\) contains circles \(R \cdot {\mathbb {S}}^{1}_E\) of any radius \(R>0\). By functoriality of persistence sets (Remark 3.11), \({\textbf{D}}_{4,1}^{\textrm{VR}}(R \cdot {\mathbb {S}}^{1}_E) \subset {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n)\) so, in particular, \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n)\) contains the segment \([\sqrt{2} R, 2R) \times \{2R\}\) that bounds \({\textbf{D}}_{4,1}^{\textrm{VR}}(R \cdot {\mathbb {S}}^{1}_E)\) from above (see Proposition 5.16 and Fig. 27). The inequality \(t_b < t_d \le \sqrt{2} t_b\) can be rearranged to \(\frac{\sqrt{2}}{2} t_d \le t_b < t_d\), so given any point \((t_b,t_d) \in P_\infty \), taking \(R = t_d/2\) gives \((t_b,t_d) \in [\sqrt{2} R, 2R) \times \{2R\} \subset {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n)\). Thus, \(P_\infty \subset {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n)\). The same argument with the added restriction \(R \le 1\) shows that \(P_2 \subset {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{n}_E)\). Lastly, \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n)\) (resp. \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{n}_E)\)) is convex because it is the intersection of two (resp. three) half-planes. \(\square \)

Fig. 27
figure 27

From left to right: \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\), \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2})\), \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}_E)\), and \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). Notice that \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}) \subset {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2})\), as indicated by the red line in the second diagram from the left. The analogous statement holds for \({\mathbb {S}}^{1}_E \subset {\mathbb {S}}^{2}_E\). Cf. the two rightmost figures with Proposition 5.17.

Two observations summarize the proof of Proposition 5.17: Ptolemy’s inequality gives a region \(P_\infty \) that contains \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {R}}^n)\), while the circles in \({\mathbb {R}}^n\) produce enough points to fill \(P_\infty \). It turns out that this technique can be generalized to other spaces, provided that we have a suitable analogue of Ptolemy’s inequality. This is explored in the next section.
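For readers who want to experiment, the side/diagonal labeling used in the proof lends itself to a short numerical check. The sketch below is ours, not the authors’ code, and `pd41` is a hypothetical helper name; it computes the single possible 1-dimensional bar of a 4-point Euclidean configuration (via the matching characterization referenced around Theorem 4.4) and confirms that random planar quadruples never leave \(P_\infty \):

```python
import itertools
import math
import random

def pd41(points):
    """(t_b, t_d) of the single possible 1-dimensional bar of the
    Vietoris-Rips filtration on a 4-point Euclidean set, or None if the
    diagram is trivial.  A cycle is born at the largest "side" and dies
    at the smallest "diagonal" of the labeling (if any) that makes every
    side shorter than both diagonals."""
    dist = lambda p, q: math.dist(p, q)
    for diagonals in [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]:
        diag = {frozenset(p) for p in diagonals}
        sides = [dist(points[a], points[b])
                 for a, b in itertools.combinations(range(4), 2)
                 if frozenset((a, b)) not in diag]
        t_b = max(sides)
        t_d = min(dist(points[a], points[b]) for a, b in diagonals)
        if t_b < t_d:
            return (t_b, t_d)
    return None

# Random planar quadruples stay inside P_infinity: t_d <= sqrt(2) * t_b.
random.seed(0)
for _ in range(10_000):
    X = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(4)]
    bar = pd41(X)
    if bar is not None:
        t_b, t_d = bar
        assert t_b < t_d <= math.sqrt(2) * t_b + 1e-12
```

At most one of the three matchings can satisfy the side/diagonal condition, so returning the first hit is safe; the unit square realizes the boundary case \(t_d = \sqrt{2}\,t_b\).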

5.6 Persistence Sets of Surfaces with Constant Curvature

Consider the surface \(M_\kappa \) with constant sectional curvature \(\kappa \). In this section, we will characterize \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa )\). Proposition 5.17 already has the case \(\kappa =0\), so now we deal with \(\kappa \ne 0\). To fix notation, let \(x,y \in {\mathbb {R}}^3\). Define \(\langle x,y \rangle := x_1y_1+x_2y_2+x_3y_3\) and \(\langle x|y \rangle := -x_1y_1+x_2y_2+x_3y_3\). We model \(M_\kappa \) as \(\displaystyle M_\kappa := \left\{ x \in {\mathbb {R}}^3 \ | \ \langle x,x \rangle = \frac{1}{\kappa } \right\} \text { if } \kappa >0\), and \(\displaystyle M_\kappa := \left\{ x \in {\mathbb {R}}^3 \ | \ \langle x|x \rangle = \frac{1}{\kappa } \text { and } x_1>0 \right\} \) if \(\kappa <0\). In other words, \(M_\kappa \) is the sphere of radius \(1/\sqrt{\kappa }\) if \(\kappa >0\), or a rescaling of the hyperbolic plane if \(\kappa <0\). The geodesic distance in \(M_\kappa \) is given by

$$\begin{aligned} d_{M_\kappa }(x,y):= {\left\{ \begin{array}{ll} \frac{1}{\sqrt{\kappa }} \arccos (\kappa \langle x,y \rangle ), &{} \text {if } \kappa >0,\\ \frac{1}{\sqrt{-\kappa }} {{\,\textrm{arcosh}\,}}(\kappa \langle x|y \rangle ), &{} \text {if } \kappa <0. \end{array}\right. } \end{aligned}$$
(20)
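A direct implementation of (20) is immediate. The sketch below is ours (the helper name `dist_Mk` is an assumption); the clamping only guards against floating-point round-off pushing the argument outside the domain of \(\arccos \) or \({{\,\textrm{arcosh}\,}}\):

```python
import math

def dist_Mk(x, y, kappa):
    """Geodesic distance on the model surface M_kappa, following Eq. (20).
    Points are triples on the sphere (kappa > 0) or hyperboloid (kappa < 0)."""
    if kappa > 0:
        c = kappa * (x[0] * y[0] + x[1] * y[1] + x[2] * y[2])
        return math.acos(min(1.0, max(-1.0, c))) / math.sqrt(kappa)
    elif kappa < 0:
        c = kappa * (-x[0] * y[0] + x[1] * y[1] + x[2] * y[2])
        return math.acosh(max(1.0, c)) / math.sqrt(-kappa)
    raise ValueError("kappa must be non-zero")
```

For instance, antipodal points on the unit sphere (kappa = 1) are at distance pi, and two points at hyperbolic parameter 1 apart on the hyperboloid (kappa = -1) are at distance 1.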

To use the same technique as in Proposition 5.17, we use a version of Ptolemy’s inequality for spaces of non-zero curvature.

Theorem 5.18

(Spherical and Hyperbolic Ptolemy’s inequality, [84, 85]) Let \(x_1,x_2,x_3,x_4 \in M_\kappa \), and \(d_{ij}=d_{M_\kappa }(x_i,x_j)\). Then

$$\begin{aligned} \textstyle s_\kappa \left( d_{13}/2\right) s_\kappa \left( d_{24}/2\right) \le s_\kappa \left( d_{12}/2\right) s_\kappa \left( d_{34}/2\right) + s_\kappa \left( d_{14}/2\right) s_\kappa \left( d_{23}/2\right) , \end{aligned}$$
(21)

where \(s_\kappa (t)\) is defined as \(\sin (\sqrt{\kappa }t)\) if \(\kappa >0\), and \(\sinh (\sqrt{-\kappa }t)\) if \(\kappa <0\).
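Inequality (21) is easy to stress-test numerically. The sketch below is ours (helper names `s_kappa`, `sphere_dist` are assumptions); it checks the case \(\kappa = 1\) on random quadruples of the unit sphere, and the hyperbolic case can be checked the same way:

```python
import math
import random

def s_kappa(kappa, t):
    # the function s_kappa from Theorem 5.18
    return math.sin(math.sqrt(kappa) * t) if kappa > 0 else math.sinh(math.sqrt(-kappa) * t)

def sphere_dist(x, y):
    # geodesic distance on the unit sphere (kappa = 1)
    return math.acos(min(1.0, max(-1.0, sum(a * b for a, b in zip(x, y)))))

def random_sphere_point(rng):
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-9:
            return [c / n for c in v]

rng = random.Random(0)
for _ in range(5000):
    p = [random_sphere_point(rng) for _ in range(4)]
    d = {(i, j): sphere_dist(p[i], p[j]) for i in range(4) for j in range(4)}
    lhs = s_kappa(1, d[0, 2] / 2) * s_kappa(1, d[1, 3] / 2)
    rhs = (s_kappa(1, d[0, 1] / 2) * s_kappa(1, d[2, 3] / 2)
           + s_kappa(1, d[0, 3] / 2) * s_kappa(1, d[1, 2] / 2))
    assert lhs <= rhs + 1e-9
```

For \(\kappa = 1\) the quantity \(s_\kappa (d/2) = \sin (d/2)\) is half the Euclidean chord length, so (21) reduces to the Euclidean Ptolemy inequality applied to the chordal embedding, which holds for every labeling of the four points.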

With these tools, we are ready to prove the main theorem of this section.

Theorem 5.19

Let \(M_\kappa \) be the 2-dimensional model space with constant sectional curvature \(\kappa \). Then:

  • If \(\kappa >0\), \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa ) = \Big \{ (t_b,t_d)|\ \sin \left( \frac{\sqrt{ \kappa }}{2} t_d \right) \le \sqrt{2}\sin \left( \frac{\sqrt{ \kappa }}{2} t_b \right) \text { and } 0< t_b < t_d \le \frac{\pi }{\sqrt{\kappa }} \Big \}\).

  • If \(\kappa =0\), \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_0) = \left\{ (t_b,t_d)|\ 0< t_b < t_d \le \sqrt{2}t_b \right\} \).

  • If \(\kappa <0\), \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa ) = \Big \{ (t_b,t_d)|\ \sinh \left( \frac{\sqrt{-\kappa }}{2} t_d \right) \le \sqrt{2}\sinh \left( \frac{\sqrt{ -\kappa }}{2} t_b \right) \text { and } 0< t_b < t_d \Big \}\).

Proof

The case \(\kappa =0\) was already done in Proposition 5.17. For \(\kappa >0\), let

$$\begin{aligned} \textstyle P:= \left\{ (t_b,t_d)|\ \sin \left( \frac{\sqrt{ \kappa }}{2} t_d \right) \le \sqrt{2}\sin \left( \frac{\sqrt{ \kappa }}{2} t_b \right) \text { and } 0< t_b < t_d \le \frac{\pi }{\sqrt{\kappa }} \right\} . \end{aligned}$$

Let \(X = \{x_1,x_2,x_3,x_4\} \subset M_\kappa \) and \(d_{ij} = d_{M_\kappa }(x_i,x_j)\). Suppose that \(t_b(X) < t_d(X) \) and label the \(x_i\) so that \(t_b(X) = \max (d_{12}, d_{23}, d_{34}, d_{41})\) and \(t_d(X) = \min (d_{13}, d_{24})\). Let \(s_{ij}:= \sin \left( \frac{\sqrt{\kappa }}{2} d_{ij} \right) \). By Theorem 5.18, \(s_{13}s_{24} \le s_{12}s_{34}+s_{14}s_{23}\), and, since the function \(t \mapsto \sin \left( \frac{\sqrt{\kappa }}{2} t\right) \) is increasing when \(\frac{\sqrt{\kappa }}{2} t \in \left[ 0,\frac{\sqrt{\kappa }}{2} \textbf{diam}(M_\kappa )\right] = \left[ 0,\frac{\pi }{2}\right] \), we get

$$\begin{aligned} \textstyle \sin ^2\left( \frac{\sqrt{\kappa }}{2} t_d(X) \right) = (\min (s_{13},s_{24}))^2 \le s_{13}s_{24} \le s_{12}s_{34}+s_{14}s_{23} \le \textstyle 2\sin ^2\left( \frac{\sqrt{\kappa }}{2} t_b(X) \right) . \end{aligned}$$

Thus,

$$\begin{aligned} \displaystyle \sin \left( \frac{\sqrt{ \kappa }}{2} t_d \right) \le \sqrt{2}\sin \left( \frac{\sqrt{ \kappa }}{2} t_b \right) . \end{aligned}$$
(22)

This shows that \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa ) \subset P\). For the other direction, let \(0 < t \le 1\) and \(s \in [0,\pi /2]\), and consider \(X = \{x_1,x_2,x_3,x_4\}\) where

$$\begin{aligned} \begin{array}{l} \bullet x_1 =\textstyle \left( \frac{1}{\sqrt{\kappa }} \sqrt{1-t^2}, \frac{t}{\sqrt{\kappa }}, 0\right) \\ \bullet x_3 =\textstyle \left( \frac{1}{\sqrt{\kappa }} \sqrt{1-t^2}, -\frac{t}{\sqrt{\kappa }}, 0\right) \\ \bullet x_2 =\textstyle \left( \frac{1}{\sqrt{\kappa }} \sqrt{1-t^2}, \frac{t}{\sqrt{\kappa }} \sin (s), \frac{t}{\sqrt{\kappa }} \cos (s)\right) \\ \bullet x_4 =\textstyle \left( \frac{1}{\sqrt{\kappa }} \sqrt{1-t^2}, -\frac{t}{\sqrt{\kappa }} \sin (s), -\frac{t}{\sqrt{\kappa }} \cos (s)\right) . \end{array} \end{aligned}$$

Notice that the set \(\left\{ (p_1, p_2, p_3) \in M_\kappa : p_1 = \frac{1}{\sqrt{\kappa }} \sqrt{1-t^2} \right\} \) is a circle with radius \(t/\sqrt{\kappa }\). Inside this circle, the configuration \(\{x_1, x_2, x_3, x_4\}\) is a parallelogram in which \(x_1\) and \(x_2\) are antipodal to \(x_3\) and \(x_4\), respectively. Indeed, it can be checked that:

$$\begin{aligned} \begin{array}{ll} \bullet x_i \in M_\kappa , &{} \bullet \langle x_1, x_2 \rangle = \langle x_3, x_4 \rangle = \frac{1}{\kappa }(1-t^2(1-\sin (s))),\\ \bullet \langle x_1, x_3 \rangle = \langle x_2, x_4 \rangle = \frac{1}{\kappa }(1-2t^2), &{} \bullet \langle x_1, x_4 \rangle = \langle x_2, x_3 \rangle = \frac{1}{\kappa }(1-t^2(1+\sin (s))), \end{array} \end{aligned}$$

and (since \(s \in [0, \pi /2]\)) \(\langle x_1,x_3 \rangle \le \langle x_1,x_4 \rangle \le \langle x_1,x_2 \rangle \). Since \(\arccos (t)\) is decreasing, we have

$$\begin{aligned} t_b(X)&= \frac{1}{\sqrt{\kappa }} \arccos \left( \kappa \langle x_1, x_4 \rangle \right) = \frac{1}{\sqrt{\kappa }} \arccos (1-t^2(1+\sin (s))), \text { and }\\ t_d(X)&= \frac{1}{\sqrt{\kappa }} \arccos \left( \kappa \langle x_1, x_3 \rangle \right) = \frac{1}{\sqrt{\kappa }} \arccos (1-2t^2). \end{aligned}$$

Notice that for a fixed t, \(t_b(X) \) is minimized at \(s=0\) and the equality in (22) is achieved. Also, \(t_d(X) \) is maximized at \(t=1\), at which point \(t_d(X) = \frac{\pi }{\sqrt{\kappa }}\). Now, let \((t_b,t_d) \in P\) be arbitrary. If we set \(t_b(X) =t_b\) and \(t_d(X) =t_d\), we can solve the equations above to get

$$\begin{aligned} t = \sqrt{\frac{1-\cos (\sqrt{\kappa }t_d)}{2}}, \text { and } \sin (s) = 2 \cdot \frac{1-\cos (\sqrt{\kappa }t_b)}{1-\cos (\sqrt{\kappa }t_d)}-1. \end{aligned}$$

Such a t exists because \(\cos (\sqrt{\kappa }t_d) \le 1\). As for s, the half-angle identity \(1-\cos (x) = 2\sin ^2(x/2)\) gives the equivalent expression \(\sin (s) = 2 \cdot \frac{\sin ^2(\sqrt{\kappa }\,t_b/2)}{\sin ^2(\sqrt{\kappa }\,t_d/2)}-1\). Since \((t_b,t_d)\) satisfies inequality (22), the right side is bounded below by 0 and, since \(t_b < t_d \le \frac{\pi }{\sqrt{\kappa }}\), it is also bounded above by 1. Thus, there exists an \(s \in [0,\pi /2]\) that satisfies the equality. This finishes the proof of \(P \subset {\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa )\).

The proof for \(\kappa <0\) proceeds in much the same way. The only major change is in the definition of the points \(x_i\) when showing \(P \subset {\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa )\):

$$\begin{aligned} \begin{array}{l} \bullet x_1 =\left( \frac{1}{\sqrt{-\kappa }} \sqrt{1+t^2}, \frac{t}{\sqrt{-\kappa }}, 0\right) \\ \bullet x_3 =\left( \frac{1}{\sqrt{-\kappa }} \sqrt{1+t^2}, -\frac{t}{\sqrt{-\kappa }}, 0\right) \\ \bullet x_2 =\left( \frac{1}{\sqrt{-\kappa }} \sqrt{1+t^2}, \frac{t}{\sqrt{-\kappa }} \sin (s), \frac{t}{\sqrt{-\kappa }} \cos (s)\right) \\ \bullet x_4 =\left( \frac{1}{\sqrt{-\kappa }} \sqrt{1+t^2}, -\frac{t}{\sqrt{-\kappa }} \sin (s), -\frac{t}{\sqrt{-\kappa }} \cos (s)\right) . \end{array} \end{aligned}$$

Other than that, and the fact that \(M_\kappa \) is unbounded when \(\kappa <0\), the proof is completely analogous. \(\square \)
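The constructive half of the proof for \(\kappa > 0\) can be replayed numerically: given \((t_b, t_d) \in P\), build the four points from the formulas for t and \(\sin (s)\), and recover \((t_b, t_d)\) from the pairwise geodesic distances. A sketch for \(\kappa = 1\), with helper names of our choosing:

```python
import math

def quad_on_sphere(tb, td):
    """The 4-point configuration from the proof of Theorem 5.19 (kappa = 1)
    realizing a prescribed (tb, td) in P, listed in cycle order x1,x2,x3,x4."""
    t = math.sqrt((1 - math.cos(td)) / 2)
    sin_s = 2 * (1 - math.cos(tb)) / (1 - math.cos(td)) - 1
    s = math.asin(min(1.0, max(-1.0, sin_s)))
    h = math.sqrt(max(0.0, 1 - t * t))
    return [(h, t, 0.0),
            (h, t * math.sin(s), t * math.cos(s)),
            (h, -t, 0.0),
            (h, -t * math.sin(s), -t * math.cos(s))]

def geo(x, y):
    # geodesic distance on the unit sphere
    return math.acos(min(1.0, max(-1.0, sum(a * b for a, b in zip(x, y)))))

# Recover (tb, td) as the largest cycle edge and smallest diagonal.
tb, td = 1.2, 1.5   # a point of P: sin(td/2) <= sqrt(2) sin(tb/2), td <= pi
X = quad_on_sphere(tb, td)
max_side = max(geo(X[i], X[(i + 1) % 4]) for i in range(4))
min_diag = min(geo(X[0], X[2]), geo(X[1], X[3]))
assert abs(max_side - tb) < 1e-9
assert abs(min_diag - td) < 1e-9
```

The recovery is exact up to round-off because \(t^2(1+\sin s) = 1-\cos t_b\) and \(2t^2 = 1-\cos t_d\) hold identically for the chosen t and s.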

Remark 5.20

A related result appears in [16]. The authors explore the question of whether persistent homology can detect the curvature of the ambient \(M_\kappa \). On the theoretical side, they found a geometric formula to compute the Čech persistence diagram \(\textrm{dgm}_1^{\check{\textrm{C}}\textrm{ech}}(T)\) of a sample \(T \subset M_\kappa \) with three points, much in the same vein as our Theorem 4.4. They used it to find the logarithmic persistence \(P_a(\kappa ):= t_d(T_{\kappa ,a})/t_b(T_{\kappa ,a})\) for an equilateral triangle \(T_{\kappa ,a}\) of fixed side length \(a>0\), and proved that \(P_a\), when viewed as a function of \(\kappa \), is invertible. On the experimental side, they sampled 1000 points from a unit disk in \(M_\kappa \) and were able to approximate \(\kappa \) using, among other things, average persistence landscapes in dimension 1 of 100 such samples. For example, one method consisted in finding a collection of landscapes \(L_\kappa \) labeled with a known curvature \(\kappa \), and estimating \(\kappa _*\) for an unlabeled \(L_*\) with the average curvature of the three nearest neighbors of \(L_*\). They were also able to approximate \(\kappa _*\) without labeled examples by using PCA. See their paper [16] for more details. Compare with Fig. 29.

Our Theorem 5.19 is in the same spirit. The curvature value \(\kappa \) determines the boundary of \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa )\), and instead of triangles, we use squares with a given \(t_d\) and minimal \(t_b\) to find \(\kappa \). Additionally, we can qualitatively detect the sign of the curvature by looking at the boundary of \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa )\): it is concave up when \(\kappa >0\), a straight line when \(\kappa =0\), and concave down when \(\kappa <0\). See Fig. 28.

Fig. 28
figure 28

The boundary of \({\textbf{D}}_{4,1}^{\textrm{VR}}(M_\kappa )\) for multiple \(\kappa \) (see Theorem 5.19). Observe that this set is bounded only when \(\kappa >0\), and that the left boundary of these persistence sets is concave up when \(\kappa >0\), a straight line when \(\kappa =0\), and concave down when \(\kappa <0\).

Fig. 29
figure 29

The diagrams \({\textbf{D}}_{4,1}^{\textrm{VR}}(D_\kappa )\) for disks \(D_\kappa \subset M_\kappa \) of radius \(R=\pi /\sqrt{|\kappa |}\) for various \(\kappa \ne 0\) (compare with Theorem 5.19). Also shown is \({\textbf{D}}_{4,1}^{\textrm{VR}}(D_0)\) for \(D_0 \subset M_0\), a disk of radius 1.

5.7 Persistence Sets of Spheres

After surfaces, the next case we study is higher-dimensional Euclidean spheres. Observe that if \(n \le m\), an n-point subset of \({\mathbb {S}}^{m}_E\) is contained in a sphere of dimension at most \(n-2\). Hence, the computation of the persistence sets of spheres reduces to a specific dimension that depends on n. After proving this result and giving an example, we comment on the first unknown case.

Proposition 5.21

For all \(m \ge n-1\) and all \(k \ge 0\),

$$\begin{aligned} {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E) = {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{n-1}_E) = \bigcup _{\lambda \in [0,1]} \lambda \cdot {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{n-2}_E). \end{aligned}$$

Proof

\({\mathbb {S}}^{m}_E\) contains copies of \(\lambda \cdot {\mathbb {S}}^{n-2}_E\) for \(\lambda \in [0,1]\), so \(\bigcup _{\lambda \in [0,1]} \lambda \cdot {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{n-2}_E) \subset {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\). For the other direction, notice that a set \(X \subset {\mathbb {S}}^{m}_E \subset {\mathbb {R}}^{m+1}\) with n points spans an affine subspace of dimension at most \(n-1\), which intersects \({\mathbb {S}}^{m}_E\) in an \((n-2)\)-dimensional sphere of radius \(\lambda \le 1\). Thus, \(X \subset \lambda \cdot {\mathbb {S}}^{n-2}_E\), so \({\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E) \subset \bigcup _{\lambda \in [0,1]} \lambda \cdot {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{n-2}_E)\). \(\square \)

If \(n=4\), the above proposition reduces the computation of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\) to the union of rescalings of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). However, as seen in the proof of Proposition 5.17, \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\) is itself \(\bigcup _{\lambda \in [0,1]} \lambda \cdot {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}_E)\). This observation improves the above result, replacing \({\mathbb {S}}^{n-1}_E\) with \({\mathbb {S}}^{n-2}_E\) when \(n=4\).

Corollary 5.22

For all \(m \ge 2\),

$$\begin{aligned} {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{m}_E)&= {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E) = \left\{ (t_b,t_d)\ |\ 0< t_b < t_d \le \min (\sqrt{2}t_b, 2) \right\} . \end{aligned}$$

Proof

By Proposition 5.21, for every \(m \ge 3\), \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{m}_E) = \bigcup _{\lambda \in [0,1]} \lambda \cdot {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\), and by the proof of Proposition 5.17, \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\) is convex and equals \(\bigcup _{\lambda \in [0,1]} \lambda \cdot {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}_E)\). Hence, \(\bigcup _{\lambda \in [0,1]} \lambda \cdot {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E) = {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). \(\square \)

The reason why the dimension of the spheres drops between Proposition 5.21 and Corollary 5.22 comes from two facts. First, when \(X \subset {\mathbb {R}}^m\) is a 4-point set, the quotient \(t_d(X) /t_b(X) \) is maximized when X is concyclic (see Remark 5.15), and second, \({\mathbb {S}}^{2}_E\) contains all circles of radius \(0 \le \lambda \le 1\). Thus, to bound \(t_d(X) /t_b(X) \) for \(X \subset {\mathbb {S}}^{m}_E\), we take a concyclic \(X'\) such that \(t_d(X) /t_b(X) \le \sqrt{2} = t_d(X') /t_b(X') \). Since \({\mathbb {S}}^{2}_E\) contains enough circles, the information in \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\) is sufficient to determine \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\). We are curious to see how far this technique can be pushed. More specifically, we are interested in finding inequalities in higher dimensions that play the same role as Ptolemy’s inequality, namely, that provide bounds for \({\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\) and whose equality condition is attained in a sphere of dimension lower than \(n-2\). If such an inequality existed, we could improve the equality \({\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E) = {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{n-1}_E)\) in Proposition 5.21 to a lower-dimensional sphere in the same way as we did in Corollary 5.22.

At this point, we have characterized \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\) for \(k=1\) and any m. For \(k=2\), the first case \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{1}_E)\) can be obtained from Theorem 5.4. This is the extent of our knowledge of the sets \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{m})\). We now discuss our partial results for \(m=2\).

Lemma 5.23

Fix \(\sqrt{2/3} \le r \le 1\) and define \(f(\rho ):= 2 - r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)} - 3r^2\). The equation \(f(\rho )=0\) has a unique solution \(\rho _0:= r \cdot \frac{8-9r^2}{4-3r^2}\), which satisfies \(-r \le \rho _0 \le r\), with \(\rho _0 = -r\) only when \(r=1\).

Proof

Isolating the square root and squaring the resulting equation gives \(-4(1-r^2)\rho ^2 + 4(1-r^2) = r^2 \rho ^2 +2(3r^2-2) r\rho + (3r^2-2)^2\). After reordering terms and simplifying, we get

$$\begin{aligned} g(\rho ):= (4-3r^2) \rho ^2 + (6r^3-4r) \rho + (9r^4 - 8r^2) = 0. \end{aligned}$$

Observe that \(g(-r)=0\), and that \(g(\rho )/(\rho +r) = (4-3r^2)\rho + (9r^2-8)r\). Hence, \(g(\rho )\) has the roots \(\rho =-r\) and \(\rho = \rho _0\), where \(\rho _0:= r \cdot \frac{8-9r^2}{4-3r^2}\). However, when \(r<1\), \(\rho =-r\) is not a solution of \(f(\rho )=0\) because \(f(-r) = 4-4r^2 > 0\) (when \(r=1\), the two roots of g coincide and \(\rho _0 = -r\)). Still, \(f(r)=4-6r^2 \le 0\) because \(r \ge \sqrt{2/3}\), so since f is continuous, \(f(\rho )=0\) must have a solution \(-r \le \rho \le r\). This solution must be \(\rho =\rho _0\). \(\square \)
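The algebra in the lemma can be spot-checked numerically. The sketch below is ours (the names `f` and `rho0` mirror the lemma's notation); it verifies that \(\rho _0\) solves \(f(\rho )=0\) and lies in \([-r, r]\) across the allowed range of r:

```python
import math

def f(rho, r):
    # the function from Lemma 5.23
    return 2 - r * rho + 2 * math.sqrt((1 - r * r) * (1 - rho * rho)) - 3 * r * r

def rho0(r):
    # the claimed root
    return r * (8 - 9 * r * r) / (4 - 3 * r * r)

for r in [math.sqrt(2 / 3), 0.85, 0.9, 0.95, 1.0]:
    rho = rho0(r)
    assert -r - 1e-12 <= rho <= r + 1e-12
    assert abs(f(rho, r)) < 1e-9
```

Note the endpoint behavior: at \(r=\sqrt{2/3}\) we get \(\rho _0 = r\), and at \(r=1\) we get \(\rho _0 = -r\).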

Proposition 5.24

Let \(P_{6,2}\) be

$$\begin{aligned} P_{6,2}:= \left\{ (t_b,t_d) \mid 0< t_b < t_d \le 2 \text { and either } t_d \le \frac{2}{\sqrt{3}} t_b \text { or } 4t_b^2 \cdot \dfrac{3-t_b^2}{4-t_b^2} \le t_d^2 \right\} . \end{aligned}$$

Then \(P_{6,2} \subset {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\).

Proof

We begin by noting that the lines \(t_d = \frac{2}{\sqrt{3}}t_b\) and \(t_d=2\) intersect at \(t_b=\sqrt{3}\). Then, \(P_{6,2}\) splits as the union of the sets

$$\begin{aligned} A&:=\textstyle \left\{ (t_b,t_d) \mid 0< t_b \le \sqrt{3} \text { and } t_d \le \frac{2}{\sqrt{3}} t_b \right\} ,\\ B&:= \left\{ (t_b,t_d) \mid \sqrt{3} \le t_b< t_d \le 2 \right\} \text { and }\\ C&:= \left\{ (t_b,t_d) \mid \sqrt{2} \le t_b \le \sqrt{3}, t_b < t_d \le 2 \text { and } 4t_b^2 \cdot \dfrac{3-t_b^2}{4-t_b^2} \le t_d^2 \right\} . \end{aligned}$$

We will show that \(A \cup B\) is generated by configurations \(X \subset {\mathbb {S}}^{2}_E\) inscribed in a circle of radius \(r \in (0,1]\) contained in \({\mathbb {S}}^{2}_E\). The set C will take a bit more work, but we will show that it is generated by equilateral triangles inscribed in parallel circles of controlled radii.

By Theorem 5.4, \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{1}) = \{(t_b,t_d) \mid \frac{2\pi }{3} \le t_b < t_d \le \pi \}\) and, analogously to Proposition 5.16 (recall that \(f_E(t) = 2\sin (t/2)\)),

$$\begin{aligned} {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{1}_E) = f_E\left( {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{1}) \right) = \{(t_b,t_d) \mid \sqrt{3} \le t_b < t_d \le 2\} = B. \end{aligned}$$

Since \({\mathbb {S}}^{1}_E \hookrightarrow {\mathbb {S}}^{2}_E\), \(B \subset {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). More generally, we have \(\bigcup _{0 < \lambda \le 1} \lambda \cdot {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{1}_E) \subset {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\) because \(\lambda \cdot {\mathbb {S}}^{1}_E \hookrightarrow {\mathbb {S}}^{2}_E\) for \(0 < \lambda \le 1\). Since \(\bigcup _{0 < \lambda \le 1} \lambda \cdot {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{1}_E) = A \cup B\), we have \(A \cup B \subset {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\).

Now we show \(C \subset {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). Given \(\sqrt{2/3} \le r \le 1\), let \(\rho _0:= r \cdot \frac{8-9r^2}{4-3r^2}\), and \(\max (0,\rho _0) \le \rho \le r\). Let \(X = \{x_1,..., x_6\} \subset {\mathbb {S}}^{2}_E\), where \(x_1, x_2, x_3\) form an equilateral triangle inscribed in the circle at height \(\sqrt{1-r^2}\), and \(x_4, x_5, x_6\) form the antipodally rotated equilateral triangle inscribed in the circle at height \(-\sqrt{1-\rho ^2}\). More explicitly,

$$\begin{aligned} \begin{array}{ll} \bullet x_1 =\left( r, 0, \sqrt{1-r^2}\right) , &{} \bullet x_4 =\left( -\rho , 0, -\sqrt{1-\rho ^2}\right) , \\ \bullet x_2 =\left( -\frac{1}{2}r, \frac{\sqrt{3}}{2} r, \sqrt{1-r^2}\right) , &{} \bullet x_5 =\left( \frac{1}{2}\rho , -\frac{\sqrt{3}}{2} \rho , -\sqrt{1-\rho ^2}\right) , \\ \bullet x_3 =\left( -\frac{1}{2}r, -\frac{\sqrt{3}}{2} r, \sqrt{1-r^2}\right) , &{} \bullet x_6 =\left( \frac{1}{2}\rho , \frac{\sqrt{3}}{2} \rho , -\sqrt{1-\rho ^2}\right) . \end{array} \end{aligned}$$

We can verify that

$$\begin{aligned} d_{ij}^2 = {\left\{ \begin{array}{ll} 3r^2 &{} \text { if } i \ne j \in \{1,2,3\}, \\ 3\rho ^2 &{} \text { if } i \ne j \in \{4,5,6\}, \\ 2 - r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)} &{} \text { if } i \in \{1,2,3\}, j \in \{4,5,6\}, j \ne i+3, \text { and } \\ 2 + 2r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)} &{} \text { if } j=i+3. \end{array}\right. } \end{aligned}$$

Given \((t_b, t_d) \in C\), we claim there exist \(\sqrt{2/3} \le r \le 1\) and \(\rho _0 \le \rho \le r\) such that \(t_b^2 = 3r^2\), \(t_d^2 = 2 + 2r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)}\), \(t_b(X) =t_b\), and \(t_d(X) =t_d\).

Finding r is immediate. Since \(\sqrt{2} \le t_b \le \sqrt{3}\), \(r:= t_b/\sqrt{3}\) satisfies \(\sqrt{2/3} \le r \le 1\). To find \(\rho \), define \(g(\rho ):= 2 + 2r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)} - t_d^2\). If we show that \(g(\rho _0) \le 0 \le g(r)\), there will exist \(\rho _0 \le \rho \le r\) such that \(g(\rho )=0\). To wit, since \(t_d \le 2\),

$$\begin{aligned} g(r) = 2+2r^2+2(1-r^2)-t_d^2 = 4-t_d^2 \ge 0. \end{aligned}$$

For the other inequality, recall that \(t_d^2 \ge 4t_b^2 \cdot \frac{3-t_b^2}{4-t_b^2}\), and that \(f(\rho _0)=0\) by Lemma 5.23. Then

$$\begin{aligned} g(\rho _0)&= 2 + 2r\rho _0 + 2\sqrt{(1-r^2)(1-\rho _0^2)} - t_d^2 = f(\rho _0) +3r^2+3r\rho _0 - t_d^2\\&= 3r \left( r+ r \cdot \frac{8-9r^2}{4-3r^2} \right) - t_d^2 \\&= 3r^2 \cdot \frac{12-12r^2}{4-3r^2} - t_d^2 = t_b^2 \cdot \frac{12-4t_b^2}{4-t_b^2} - t_d^2 = 4t_b^2 \cdot \frac{3-t_b^2}{4-t_b^2} - t_d^2 \le 0, \end{aligned}$$

as desired.

The remaining facts to verify are \(t_b(X) =t_b\) and \(t_d(X) =t_d\). By definition of C and the previous paragraph, we have \(3r^2 = t_b^2 < t_d^2 = 2 +2r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)}\). Thus, if we can show

$$\begin{aligned} \max \left( 3\rho ^2, 2 - r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)} \right) \le 3r^2, \end{aligned}$$

we will have \(t_b(X) ^2 = 3r^2\) and \(t_d(X) ^2 = 2 +2r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)}\). The inequality \(3\rho ^2 \le 3r^2\) is immediate from the assumption \(\rho \le r\). For the second inequality, recall that the function \(f(\rho )\) from Lemma 5.23 has a unique zero at \(\rho =\rho _0\) and that \(f(r) \le 0\). Hence, since f is continuous, we have \(f(\rho ) \le 0\) for any \(\rho _0 \le \rho \le r\). Thus, \(2 - r\rho + 2\sqrt{(1-r^2)(1-\rho ^2)} \le 3r^2\).

In conclusion, for every \((t_b, t_d) \in C\), we found \(X \subset {\mathbb {S}}^{2}_E\) with \(|X|=6\) such that \(t_b(X) =t_b\) and \(t_d(X) =t_d\). Hence, \(C \subset {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). Together with the case of \(A \cup B\), we have \(P_{6,2} = A \cup B \cup C \subset {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). \(\square \)
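The distance pattern of this configuration can be verified directly from the coordinates. The following sketch is ours (helper names `two_triangles`, `d2` are assumptions); it rebuilds the two triangles and checks all four cases of the formula for \(d_{ij}^2\):

```python
import math

def two_triangles(r, rho):
    """The configuration from the proof of Proposition 5.24: an equilateral
    triangle of circumradius r at height sqrt(1-r^2) and the antipodally
    rotated triangle of circumradius rho at height -sqrt(1-rho^2)."""
    angles = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]
    top = [(r * math.cos(a), r * math.sin(a), math.sqrt(1 - r * r)) for a in angles]
    bot = [(-rho * math.cos(a), -rho * math.sin(a), -math.sqrt(1 - rho * rho))
           for a in angles]
    return top + bot  # index i and i + 3 are the "antipodal" pairs

def d2(x, y):
    # squared Euclidean distance
    return sum((a - b) ** 2 for a, b in zip(x, y))

r, rho = 0.9, 0.5
X = two_triangles(r, rho)
cross = 2 - r * rho + 2 * math.sqrt((1 - r * r) * (1 - rho * rho))
anti = 2 + 2 * r * rho + 2 * math.sqrt((1 - r * r) * (1 - rho * rho))
for i in range(6):
    for j in range(i + 1, 6):
        if i < 3 and j < 3:
            expected = 3 * r * r        # within the top triangle
        elif i >= 3 and j >= 3:
            expected = 3 * rho * rho    # within the bottom triangle
        elif j == i + 3:
            expected = anti             # antipodal pairs
        else:
            expected = cross            # remaining cross pairs
        assert abs(d2(X[i], X[j]) - expected) < 1e-9
```

The check holds for any radii \(r, \rho \in (0,1]\); the constraints \(\sqrt{2/3} \le r \le 1\) and \(\rho _0 \le \rho \le r\) only matter for the persistence claim, not for the distance formula.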

\(P_{6,2}\) is shown in blue in Fig. 30. It is generated by two parallel equilateral triangles inscribed in \({\mathbb {S}}^{2}_E\). We haven’t been able to prove that \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E) = P_{6,2}\), but we have strong experimental evidence. We first sampled 4.5 million configurations uniformly at random from \({\mathbb {S}}^{2}_E\) and retained only the 88,708 configurations that produced non-trivial persistence (1.9713% of the total samples). This produces a set \({\textbf{D}}_\text {unif}\) of persistence diagrams, which are shown in green in Fig. 30. The second step was a biased MCMC random walk. The Metropolis–Hastings MCMC starts with a choice of parameter \(\sigma ^2\), an initial configuration \(X_0\), and a set \({\textbf{D}}_0={\textbf{D}}_\text {unif}\) [81]. We additionally fix a radius \(\varepsilon \). At each step t, we obtain \(X_{t-1}^{\sigma ^2}\) by perturbing the previous configuration \(X_{t-1}\) with Gaussian noise of variance \(\sigma ^2\). Let \(D_{t-1}\) and \(D_{t-1}^{\sigma ^2}\) be the persistence diagrams of \(X_{t-1}\) and \(X_{t-1}^{\sigma ^2}\), respectively. We then compute the cardinalities \(N_{\text {pre}} = |B_\varepsilon (D_{t-1}) \cap {\textbf{D}}_{t-1}|\) and \(N_{\text {post}} = |B_\varepsilon (D_{t-1}^{\sigma ^2}) \cap {\textbf{D}}_{t-1}|\).

Fig. 30
figure 30

The set \(P_{6,2} \subset {\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\) described in Proposition 5.24. The green points were generated with a uniform sample of sets with 6 points. Under that, the magenta points were generated with an MCMC random walk. The blue points were generated by the vertices of two parallel equilateral triangles inscribed in \({\mathbb {S}}^{2}_E\).

The balls of radius \(\varepsilon \) are defined with the bottleneck distance. The next step is where we diverge from the usual algorithm. Normally, we would accept the new configuration \(X_{t-1}^{\sigma ^2}\) with probability \(\min (1,N_\text {post}/N_\text {pre})\). Eventually, the distribution of persistence diagrams in \({\textbf{D}}_t\) would approximate the distribution that we are sampling from, that is, \({\textbf{U}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). However, sampling uniformly from \({\mathbb {S}}^{2}_E\) also produces diagrams with that distribution, and this method did not produce points close to the boundary of \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). Instead, we accept \(X_{t-1}^{\sigma ^2}\) with probability \(\min (1,N_\text {pre}/N_\text {post})\), and set \(X_t:= X_{t-1}^{\sigma ^2}\) and \({\textbf{D}}_t:= {\textbf{D}}_{t-1} \cup \{D_{t-1}^{\sigma ^2}\}\). This causes the random walk to diverge from the diagrams already in \({\textbf{D}}_\text {unif}\) and produces configurations closer to the boundary of \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\). The diagrams produced by the random walk are colored in magenta in Fig. 30. This figure suggests that there are no points outside of \(P_{6,2}\).
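The biased walk can be sketched in code. The toy version below is our simplification, not the authors’ implementation: it uses 4-point configurations and \({\textbf{D}}_{4,1}^{\textrm{VR}}\) (whose diagrams are computable by hand via the matching characterization of Theorem 4.4) instead of 6-point configurations and \({\textbf{D}}_{6,2}^{\textrm{VR}}\), replaces bottleneck balls by \(\ell _\infty \) balls between single-point diagrams, and smooths the acceptance ratio to \(\min (1, (N_\text {pre}+1)/(N_\text {post}+1))\) to avoid division by zero:

```python
import itertools
import math
import random

def diagram(X):
    """(t_b, t_d) of the single possible H1 bar of VR(X) for |X| = 4, or None."""
    d = lambda p, q: math.dist(p, q)
    for diagonals in [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]:
        dset = {frozenset(p) for p in diagonals}
        t_b = max(d(X[a], X[b]) for a, b in itertools.combinations(range(4), 2)
                  if frozenset((a, b)) not in dset)
        t_d = min(d(X[a], X[b]) for a, b in diagonals)
        if t_b < t_d:
            return (t_b, t_d)
    return None

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def biased_walk(steps=2000, sigma=0.15, eps=0.05, seed=1):
    rng = random.Random(seed)
    # start from an inscribed square, whose diagram (sqrt(2), 2) is non-trivial
    X = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    D = [diagram(X)]
    near = lambda dgm: sum(1 for e in D
                           if max(abs(dgm[0] - e[0]), abs(dgm[1] - e[1])) < eps)
    for _ in range(steps):
        # perturb with Gaussian noise and project back to the sphere
        Y = [normalize([c + rng.gauss(0.0, sigma) for c in x]) for x in X]
        dX, dY = diagram(X), diagram(Y)
        if dY is None:
            continue
        n_pre = near(dX) if dX is not None else 0
        n_post = near(dY)
        # inverted Metropolis ratio: favour sparsely populated regions
        if rng.random() < min(1.0, (n_pre + 1) / (n_post + 1)):
            X = Y
            D.append(dY)
    return D

D = biased_walk()
# every diagram of a 4-point subset of S^2_E obeys Proposition 5.17
assert all(tb < td <= min(math.sqrt(2) * tb, 2.0) + 1e-9 for tb, td in D)
```

Every diagram collected by the walk must still satisfy Proposition 5.17, which the final assertion checks; only the sampling density near the boundary changes.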

Conjecture 5.25

\({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E) = P_{6,2}\).

5.8 Principal Persistence Sets Can Differentiate Spheres

Any non-diagonal point \((t_b, t_d) \in {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(X)\) corresponds to a subset \(A \subset X\) coinciding with the vertex set of a cross-polytope inscribed in X. If, in addition, \(t_d = 2t_b\), then the cross-polytope must be regular, as Lemma 5.3 item 1 shows. For example, \({\mathbb {S}}^{m}\) admits an inscribed regular cross-polytope whose number of vertices depends on the dimension m. It turns out that principal persistence sets can pick up this difference, and that is enough to tell apart spheres of different dimensions.

Proposition 5.26

\((\pi /2, \pi ) \in {\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{m})\) if and only if \(1 \le k \le m\).

Proof

Let \(k \ge 1\). Suppose that a set \(X = \{x_1, \dots , x_{2k+2}\} \subset {\mathbb {S}}^{m}\) satisfies \(t_b(X) = \pi /2\) and \(t_d(X) = \pi \), and label the points so that \(v_d(x_i) = x_{i+k+1}\). By Lemma 5.1 item 1, we must have \(d_{{\mathbb {S}}^{m}}(x_i, x_j) = t_b(X) = \pi /2\) for all \(j \ne i+k+1\) and \(d_{{\mathbb {S}}^{m}}(x_i, x_{i+k+1}) = t_d(X) = \pi \) for all i. The fact that \(d_{{\mathbb {S}}^{m}}(x_i, x_j) = \pi /2 = \arccos \langle x_i, x_j \rangle \) means that \(x_1, \dots , x_{k+1}\) are mutually orthogonal and, hence, linearly independent. This forces \(k \le m\). Conversely, for any \(1 \le k \le m\), we can construct a set of mutually orthogonal vectors \(x_1, \dots , x_{k+1} \in {\mathbb {S}}^{m}\) by setting \(x_i\) as, for instance, the i-th standard basis vector of \({\mathbb {R}}^{m+1}\). In that case, \(X:= \{\pm x_1, \dots , \pm x_{k+1}\}\) has \(2k+2\) points and satisfies \(t_b(X) = \pi /2\) and \(t_d(X) = \pi \). \(\square \)
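The configuration used in the converse direction is easy to write down explicitly. The sketch below is ours (helper names `cross_polytope`, `geo` are assumptions) and only verifies the distance pattern of the regular cross-polytope, namely that all pairs are at geodesic distance \(\pi /2\) except the \(k+1\) antipodal pairs at distance \(\pi \); it does not recompute the persistence diagram:

```python
import math

def cross_polytope(k, m):
    """Vertices +/- e_1, ..., +/- e_{k+1} of a regular cross-polytope
    inscribed in S^m (requires k <= m); point i and point i + k + 1
    are antipodal, matching the labeling in the proof."""
    assert 1 <= k <= m
    plus = []
    for i in range(k + 1):
        e = [0.0] * (m + 1)
        e[i] = 1.0
        plus.append(tuple(e))
    minus = [tuple(-c for c in e) for e in plus]
    return plus + minus

def geo(x, y):
    # geodesic distance on the unit sphere S^m
    return math.acos(min(1.0, max(-1.0, sum(a * b for a, b in zip(x, y)))))

X = cross_polytope(2, 3)  # k = 2 <= m = 3: six points in S^3
for i in range(6):
    for j in range(i + 1, 6):
        expected = math.pi if j == i + 3 else math.pi / 2
        assert abs(geo(X[i], X[j]) - expected) < 1e-12
```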

Remark 5.27

(Principal persistence sets and fundamental classes of spheres) The point \((\pi /2, \pi ) \in {\textbf{D}}_{2\,m+2,m}^{\textrm{VR}}({\mathbb {S}}^{m})\) is generated by a regular cross-polytope \(X \in {\mathbb {S}}^{m}\) with \(2\,m+2\) vertices. It is interesting to note that the m-simplices of \(\textrm{VR}_r(X)\), when \(\pi /2 \le r < \pi \), determine an m-chain that represents the fundamental class \([{\mathbb {S}}^{m}]\).

Remark 5.28

(Distances between persistence sets can distinguish spheres) For \(m=1, \dots , 5\) and \(k=1,\dots ,5\), we computed an approximation \(D_{k}({\mathbb {S}}^{m})\) of the principal persistence set \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{m})\) by sampling \(10^5\) configurations of \(2k+2\) points uniformly at random from \({\mathbb {S}}^{m}\). Then, for each k, we computed the Hausdorff distance induced by the bottleneck distance for all \(1 \le i,j \le 5\), which we denote by \(d_{k}({\mathbb {S}}^{i}, {\mathbb {S}}^{j}):= d_{{\mathcal {H}}}^{{\mathcal {D}}}(D_{k}({\mathbb {S}}^{i}), D_{k}({\mathbb {S}}^{j}))\). Analogously to Definition 3.6, we set \(d({\mathbb {S}}^{i}, {\mathbb {S}}^{j}):= \max _{k} d_{k}({\mathbb {S}}^{i}, {\mathbb {S}}^{j})\). Lastly, we computed the single-linkage hierarchical clustering; the resulting dendrogram is shown in Fig. 31 and it indicates that principal persistence sets can discriminate these 5 spheres.
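For finite approximations such as \(D_{k}({\mathbb {S}}^{m})\), the Hausdorff distance is a two-line computation. In the sketch below (ours; `hausdorff` and `linf` are hypothetical helpers), the bottleneck distance between two single-point diagrams is simplified to the \(\ell _\infty \) distance between the points, ignoring the option of matching to the diagonal, so it upper-bounds the true bottleneck distance:

```python
def hausdorff(A, B, d):
    """Hausdorff distance between finite non-empty sets A and B under metric d."""
    return max(max(min(d(a, b) for b in B) for a in A),
               max(min(d(a, b) for a in A) for b in B))

def linf(p, q):
    # l-infinity distance between two (t_b, t_d) points
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

# toy example: two small clouds of (t_b, t_d) points
A = [(0.5, 0.7), (0.6, 0.8)]
B = [(0.5, 0.7), (0.9, 1.4)]
assert abs(hausdorff(A, B, linf) - 0.6) < 1e-9
```

In the toy example the point (0.9, 1.4) of B is far from everything in A, so the distance is driven by the B-to-A direction.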

Fig. 31
figure 31

The dendrogram induced by the distances \(d({\mathbb {S}}^{i}, {\mathbb {S}}^{j})\) in Remark 5.28, for \(1 \le i,j \le 5\).

5.8.1 Lower Bounds for \(d_{\mathcal{G}\mathcal{H}}({\mathbb {S}}^{1},{\mathbb {S}}^{m})\)

Using the stability in Theorem 3.13 and the characterization of \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1})\), we can find lower bounds for the Gromov–Hausdorff distance between the circle and other spheres. Suppose \(n=2k+2\), and let \(D_k = (\frac{\pi }{2}, \pi ) \in {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{k})\) be the diagram of the cross-polytope \({\mathfrak {B}}_{k} \subset {\mathbb {S}}^{k}\). We were able to prove

  • \(d_{\mathcal{G}\mathcal{H}}({\mathbb {S}}^{1}, {\mathbb {S}}^{2}) \ge \frac{1}{2} d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}), {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{2})) \approx 0.2147 \approx \frac{\pi }{14.6344}\).

  • For \(k \ge 3\), \(d_{\mathcal{G}\mathcal{H}}({\mathbb {S}}^{1}, {\mathbb {S}}^{k}) \ge \frac{1}{2} d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{1}), {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{k})) \ge \frac{1}{2} \inf _{D_1 \in {\textbf{D}}_{n,k}^{\textrm{VR}}({\mathbb {S}}^{1})} d_{{\mathcal {B}}} (D_1,D_k) = \frac{\pi }{8}\).

See our preprint [43] for a detailed proof.

6 Persistence Sets of Metric Graphs

Let G be a metric graph, that is, the geometric realization of a finite one-dimensional simplicial complex equipped with the shortest path distance induced by a collection of weights \(\ell _e\) on the edges \(e \in E(G)\) (see [10, Sect. 3.2.2] or [67, 72] for other definitions). The central question in this section is what features of G are detected by \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}(G)\). Our first setting is when G is a metric tree.

Definition 6.1

We say that a metric space X is tree-like if there exists a metric tree T such that X is isometrically embedded in T. See Fig. 32.

Lemma 6.2

Let \(k \ge 1\) and \(n \ge 1\) be fixed. For any metric tree T and \(X \subset T\) with \(|X|=n\), \({\text {PH}}_k(X)=0\) and, thus, \({\textbf{D}}_{n,k}^{\textrm{VR}}(T)\) is empty. In particular, if \(n=2k+2\), then \(t_b(X) \ge t_d(X) \).

Proof

Observe that X is tree-like, so by Theorem 2.1 of the appendix of [21], the persistence module \({\text {PH}}_k(X)\) is 0 for any \(k \ge 1\). In particular, if \(n=2k+2\), Theorem 4.4 implies that \(t_b(X) \ge t_d(X) \). \(\square \)
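For \(n=4\), Lemma 6.2 is easy to sanity-check numerically. The sketch below (helper names are ours) uses the farthest-point description of \(t_b\) and \(t_d\) implicit in the proof of Proposition 6.4 below, namely that \(t_d(x_i)\) is the distance from \(x_i\) to its farthest point and \(t_b(x_i)\) is the largest of the remaining distances, and verifies \(t_b(X) \ge t_d(X)\) for random 4-leaf tree metrics as in Fig. 32.

```python
import random

def tb_td(d):
    # Farthest-point description (cf. the proof of Proposition 6.4):
    # t_d(x_i) = distance from x_i to its farthest point, and
    # t_b(x_i) = largest distance from x_i to the remaining points.
    n = len(d)
    rows = [sorted((d[i][j] for j in range(n) if j != i), reverse=True)
            for i in range(n)]
    return max(r[1] for r in rows), min(r[0] for r in rows)

def caterpillar_metric(a1, a2, a3, a4, e):
    # Four leaves of a metric tree as in Fig. 32: x1, x2 hang from one end
    # of an inner edge of length e, and x3, x4 from the other end.
    a = [a1, a2, a3, a4]
    d = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                inner = e if (i < 2) != (j < 2) else 0.0
                d[i][j] = a[i] + inner + a[j]
    return d

random.seed(0)
samples = [caterpillar_metric(*(random.uniform(0.1, 5.0) for _ in range(5)))
           for _ in range(1000)]
# Lemma 6.2 (n = 4): 4-point subsets of a tree never give birth before death.
assert all(tb >= td for tb, td in map(tb_td, samples))
```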

As a consequence, a metric graph G must have a cycle if \({\textbf{D}}_{n,k}^{\textrm{VR}}(G)\) is to be non-empty and, even if it does, not all configurations \(X \subset G\) with \(|X|=n\) have \(t_b(X) < t_d(X) \). In fact, X can be tree-like even if there is no metric tree T such that \(X \hookrightarrow T \hookrightarrow G\). We will see an example in the proof of Proposition 6.5. Hence, it would be useful to have a notion of a minimal metric graph \(\Gamma _X\) containing X so that, if \(\Gamma _X\) is a tree, then \(\textrm{PH}_k^\textrm{VR}(X)=0\). For now, we deal with the case of \(n=4\), where split metric decompositions provide one possible construction for \(\Gamma _X\).

Fig. 32

A tree-like metric space \(X = \{x_1,x_2,x_3,x_4,x_5\}\) and a metric tree T such that \(X \hookrightarrow T\).

6.1 Split Metric Decompositions

We follow the exposition in [12]. Let \((X, d_X)\) be a finite pseudo-metric space. Given a partition \(X = A \cup B\), let

$$\begin{aligned} \beta _{\{a,a'\},\{b,b'\}}:= \frac{1}{2} \Big ( \max \big [ d_X(a,b)+d_X(a',b'),\ d_X(a,b')+d_X(a',b),\ d_X(a,a')+d_X(b,b') \big ] - d_X(a,a') - d_X(b,b') \Big ), \end{aligned}$$

and define the isolation index \(\alpha _{A,B}:= \min \left\{ \beta _{\{a,a'\}, \{b,b'\}} \ \bigg | \ a,a' \in A \text { and } b,b' \in B \right\} \).

Notice that both \(\alpha _{A,B}\) and \(\beta _{\{a,a'\}, \{b,b'\}}\) are non-negative. Also, if \(A=\{a,a'\}\) and \(B=\{b,b'\}\), \(\alpha _{A,B} = \beta _{A,B}\). If the isolation index \(\alpha _{A,B}\) is non-zero, then the unordered partition AB is called a \(d_X\)-split. The main theorem regarding isolation indices and split metrics is the following.

Theorem 6.3

([12]) Any (pseudo-)metric \(d_X\) on a finite set X can be written uniquely as

$$\begin{aligned} d_X = d_0 + \sum \alpha _{A,B}\, \delta _{A,B}, \qquad \text {where } \delta _{A,B}(x,y):= {\left\{ \begin{array}{ll} 0, &{} \text {if } x,y \in A \text { or } x,y \in B,\\ 1, &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

and the sum runs over all \(d_X\)-splits AB. The function \(\delta _{A,B}\) is called a split metric, and the term \(d_0\) is a (pseudo-)metric that admits no \(d_0\)-splits (a so-called split-prime metric).

The importance of split metric decompositions is motivated by the following example. If \(X = \{x_1,x_2,x_3,x_4\}\), then the metric graph \(\Gamma _X\) shown in Fig. 33 contains an isometric copy of X [30, 82], and the lengths of the edges of \(\Gamma _X\) are given by isolation indices [12]. Furthermore, no metric on 4 points contains a split-prime component [12]; that is, \(d_0 = 0\) whenever \(|X| = 4\). Another related construction is the tight span of a metric space X, an extension of X that is universal in the sense that it is the smallest injective space in which X embeds [30]. In Fig. 33, for instance, the tight span of X can be obtained from \(\Gamma _X\) by filling in the rectangle with a 2-cell equipped with the \(L^1\) metric. See [12] for more connections between the tight span and metric decompositions.
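Theorem 6.3 can be verified mechanically on 4-point metrics, where there is no split-prime part. The sketch below (helper names are ours) computes all seven isolation indices of a 4-point space, namely the four singleton splits and the three pair splits, and checks that the split metrics alone reconstruct \(d_X\). The input is the "square" metric of four equally spaced points on a circle of circumference 4.

```python
def beta(d, a, a2, b, b2):
    # beta_{{a,a'},{b,b'}} exactly as in the definition above.
    return 0.5 * (max(d[a][b] + d[a2][b2],
                      d[a][b2] + d[a2][b],
                      d[a][a2] + d[b][b2])
                  - d[a][a2] - d[b][b2])

def alpha(d, A, B):
    # Isolation index: minimize beta over points of A and B (repetitions
    # allowed, which is what makes singleton parts meaningful).
    return min(beta(d, a, a2, b, b2)
               for a in A for a2 in A for b in B for b2 in B)

# Four equally spaced points on a circle of circumference 4:
# adjacent distance 1, diagonal distance 2.
d = [[0, 1, 2, 1],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [1, 2, 1, 0]]

splits = [({i}, {0, 1, 2, 3} - {i}) for i in range(4)]
splits += [({0, 1}, {2, 3}), ({0, 2}, {1, 3}), ({0, 3}, {1, 2})]

# Theorem 6.3 with d_0 = 0 (no split-prime part on 4 points): d_X equals
# the sum of alpha_{A,B} * delta_{A,B} over all splits A|B.
recon = [[sum(alpha(d, A, B) for A, B in splits if (i in A) != (j in A))
          for j in range(4)] for i in range(4)]
assert recon == d
```

For this metric, the only non-zero isolation indices are those of the two "crossing" pair splits, matching the rectangle of Fig. 33 with \(b = c = 1\) and all pendant edges of length 0.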

Regarding persistent homology, the tight span has several properties that make it suitable for studying Vietoris–Rips complexes [52]. The key fact is the following. Let M be a metric space and let \(T_M \supset M\) be its tight span. Let \(B_{r}^{T_M}(m) \subset T_M\) be the open ball of radius r around \(m \in M\). Then, there exists a filtered homotopy equivalence \(f_r:\textrm{VR}_{2r}(M) \rightarrow \bigcup _{m \in M} B_{r}^{T_M}(m)\). This result gives the type of construction that we want: an extension of a metric space X where we can study the Vietoris–Rips complex of X. Given this property and the similarity of the tight span and \(\Gamma _X\), it is reasonable to expect that split metric decompositions are also a good tool to study the Vietoris–Rips complex of X. Split metric decompositions do have an important advantage in our setting: they produce a graph \(\Gamma _X\) such that \(X \hookrightarrow \Gamma _X\) whose edge lengths are computable from isolation indices. For these reasons, we now study the persistence diagram of \(X \hookrightarrow \Gamma _X\).

Fig. 33

The metric graph \(\Gamma _X\) resulting from the split-metric decomposition of a metric space \((X,d_X)\) with 4 points (Theorem 6.3). In this case, \(a_i = \alpha _{\{x_i\}, X {\setminus } \{x_i\}}\), \(b = \alpha _{\{x_2,x_3\}, \{x_1,x_4\}}\), \(c = \alpha _{\{x_1,x_2\}, \{x_3,x_4\}}\), and \(\alpha _{\{x_1,x_3\}, \{x_2,x_4\}}=0\). Notice that \(d_X = \sum _{i=1}^4 a_{i} \cdot \delta _{x_i} + b \cdot \delta _{\{x_2,x_3\}, \{x_1,x_4\}} + c \cdot \delta _{\{x_1,x_2\}, \{x_3,x_4\}}\).

Proposition 6.4

Let \(\Gamma _X\) be the metric graph shown in Fig. 33, and \(X = \{x_1,x_2,x_3,x_4\} \subset \Gamma _X\). Let \(a_i = \alpha _{\{x_i\}, X {\setminus } \{x_i\}}\), \(b = \alpha _{\{x_2,x_3\}, \{x_1,x_4\}}\), and \(c = \alpha _{\{x_1,x_2\}, \{x_3,x_4\}}\).

  1.

    If \(t_b(X) < t_d(X) \), then \(t_b(X) = \max (d_{12}, d_{23}, d_{34}, d_{41})\) and \(t_d(X) = \min (d_{13}, d_{24})\).

  2.

    \(t_b(X) < t_d(X) \) if and only if

    $$\begin{aligned} \begin{aligned} |a_{2}-a_{1}|, |a_{4}-a_{3}|<b,\\ |a_{3}-a_{2}|, |a_{1}-a_{4}|<c. \end{aligned} \end{aligned}$$
    (23)
  3.

    \(t_d(X) -t_b(X) \le \min (b,c)\), regardless of whether \(t_b(X) < t_d(X) \) or not.

Proof

1. If \(t_b(X) < t_d(X) \), the desired formulas for \(t_b(X) \) and \(t_d(X) \) hold if and only if \(v_d(x_1)=x_3\) and \(v_d(x_2)=x_4\). To see why, recall that \(v_d\) is well defined by Lemma 4.2, and suppose \(v_d(x_1) = x_4\) and \(v_d(x_2) = x_3\). In particular, this means that \(d_{13} < d_{14}\) and \(d_{24} < d_{23}\). Since X is isometrically embedded in \(\Gamma _X\), \(d_{ij}\) equals the length of the shortest path in \(\Gamma _X\) between \(x_i\) and \(x_j\). Then, the inequalities \(d_{13} < d_{14}\) and \(d_{24} < d_{23}\) are equivalent to

$$\begin{aligned} a_{1} + (b+c) + a_{3}< a_{1} + c + a_{4} \text { and } a_{2} + (b+c) + a_{4} < a_{2} + c + a_{3}. \end{aligned}$$

After rearranging terms, we get \(b< a_{4} - a_{3} < -b\), a contradiction. The case \(v_d(x_1)=x_2\) and \(v_d(x_3)=x_4\) follows analogously, so \(v_d(x_1)=x_3\) and \(v_d(x_2)=x_4\).

2. Notice that the inequalities \(d_{23}<d_{13}\) and \(d_{14}<d_{24}\) are equivalent to

$$\begin{aligned} a_{2}+c+a_{3}&< a_{1} + (b+c) + a_{3},\\ a_{1}+c+a_{4}&< a_{2} + (b+c) + a_{4}, \end{aligned}$$

which, after rearranging terms, result in \(-b< a_{2}-a_{1} < b\). Using similar combinations, we find that \(\max (d_{12},d_{23},d_{34},d_{41}) < \min (d_{13},d_{24})\) is equivalent to the system of inequalities in (23).

If (23) holds, then for all \(1 \le i \le 4\), \(d_{i,i+2} \ge \min (d_{13}, d_{24}) > \max (d_{12},d_{23},d_{34}, d_{41}) \ge \max (d_{i-1,i}, d_{i,i+1})\). As a consequence, \(t_d(x_i) = d_{i,i+2}\) and \(t_b(x_i) = \max (d_{i-1,i}, d_{i,i+1})\). Hence, \(t_b(X) =\max _i t_b(x_i) = \max (d_{12},d_{23},d_{34},d_{41})\) and \(t_d(X) = \min _i t_d(x_i) = \min (d_{13},d_{24})\), and thus, \(t_b(X) < t_d(X) \). Conversely, if \(t_b(X) <t_d(X) \), then item 1 and the paragraph above imply (23).

3. If \(t_b(X) \ge t_d(X) \), the bound is trivially satisfied. Suppose then, without loss of generality, that \(t_b(X) =d_{12}\). Since \(a_{3}+b+a_{4} = d_{34} \le d_{12} = a_{1} + b + a_{2}\), we have

$$\begin{aligned} t_d(X) - t_b(X)&= \min (d_{13},d_{24}) - d_{12} \le \frac{1}{2}[d_{13}+d_{24}] - d_{12} \\&= \frac{1}{2}[a_{1}+a_{2}+a_{3}+a_{4} + 2(b+c)] - (a_{1}+b+a_{2})\\&\le \frac{1}{2}[a_{1}+a_{2}+(a_{1}+a_{2}) + 2(b+c)] - (a_{1}+a_{2})-b = c. \end{aligned}$$

On the other hand, \(d_{14} \le d_{12}\) and \(d_{23} \le d_{12}\) give \(a_{4}+c \le a_{2}+b\) and \(a_{3}+c \le a_{1}+b\). Then

$$\begin{aligned} t_d(X) - t_b(X)&\le \frac{1}{2}[a_{1}+a_{2}+a_{3}+a_{4} + 2(b+c)] - (a_{1}+b+a_{2})\\&\le \frac{1}{2}[a_{1}+a_{2}+(a_{1}+a_{2}) + 4b] - (a_{1}+a_{2})-b = b. \end{aligned}$$

In summary, \(t_d(X) -t_b(X) \le \min (b,c)\). \(\square \)
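Proposition 6.4 lends itself to a numerical stress test: the distance matrix of \(X \subset \Gamma _X\) can be written directly in terms of \(a_1,\dots ,a_4\), b, c, and each item compared against the farthest-point description of \(t_b\) and \(t_d\) used in the proof above. A sketch (helper names are ours):

```python
import random

def tb_td(d):
    # Farthest-point description used in the proof above: t_d(x_i) is the
    # distance to the farthest point, t_b(x_i) the largest remaining distance.
    n = len(d)
    rows = [sorted((d[i][j] for j in range(n) if j != i), reverse=True)
            for i in range(n)]
    return max(r[1] for r in rows), min(r[0] for r in rows)

def gamma_metric(a, b, c):
    # Distances between x1, ..., x4 in the graph of Fig. 33: pendant edges
    # of lengths a[0..3] attached to a rectangle with sides b and c.
    a1, a2, a3, a4 = a
    return [[0, a1 + b + a2, a1 + b + c + a3, a1 + c + a4],
            [a1 + b + a2, 0, a2 + c + a3, a2 + b + c + a4],
            [a1 + b + c + a3, a2 + c + a3, 0, a3 + b + a4],
            [a1 + c + a4, a2 + b + c + a4, a3 + b + a4, 0]]

random.seed(1)
for _ in range(2000):
    a = [random.uniform(0.1, 3.0) for _ in range(4)]
    b, c = random.uniform(0.1, 3.0), random.uniform(0.1, 3.0)
    d = gamma_metric(a, b, c)
    tb, td = tb_td(d)
    cond = (abs(a[1] - a[0]) < b and abs(a[3] - a[2]) < b
            and abs(a[2] - a[1]) < c and abs(a[0] - a[3]) < c)
    assert cond == (tb < td)                # item 2: condition (23)
    if tb < td:                             # item 1
        assert tb == max(d[0][1], d[1][2], d[2][3], d[3][0])
        assert td == min(d[0][2], d[1][3])
    assert td - tb <= min(b, c) + 1e-12     # item 3
```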

The following examples illustrate different uses of Proposition 6.4.

Proposition 6.5

Let \(\lambda _1,\dots ,\lambda _n\) be positive numbers, and consider the wedge \(\bigvee _{k=1}^n \frac{\lambda _k}{\pi } \cdot {\mathbb {S}}^{1}\) of n circles at a common point \(p_0 = \bigcap _{k=1}^n \frac{\lambda _k}{\pi } \cdot {\mathbb {S}}^{1}\). Then

$$\begin{aligned} {\textbf{D}}_{4,1}^{\textrm{VR}}\left( \bigvee _{k=1}^n \frac{\lambda _k}{\pi } \cdot {\mathbb {S}}^{1} \right) = \bigcup _{k=1}^{n} \frac{\lambda _k}{\pi } \cdot {\textbf{D}}_{4,1}^{\textrm{VR}}\left( {\mathbb {S}}^{1} \right) . \end{aligned}$$
Fig. 34

A metric graph formed by the wedge of two circles at \(p_0\) as in Proposition 6.5. Left: \(\Gamma _X\) is a metric tree. Center: One circle contains three points, while the other only has one. Right: Both circles have two out of four points.

Proof

Let \(S_k = \frac{\lambda _k}{\pi } {\mathbb {S}}^{1}\) and \(G = \bigvee _{k=1}^n \frac{\lambda _k}{\pi } \cdot {\mathbb {S}}^{1}\). Observe that the set \({\textbf{D}}_{4,1}^{\textrm{VR}}(S_k)\) is the triangle in \({\mathbb {R}}^2\) bounded by

$$\begin{aligned} t_b \le t_d \le \lambda _k \quad \text {and}\quad 2t_b + t_d \ge 2\lambda _k \end{aligned}$$
(\(\star _k\))

with vertices \((\frac{1}{2}\lambda _k,\lambda _k), (\frac{2}{3}\lambda _k, \frac{2}{3}\lambda _k)\), and \((\lambda _k, \lambda _k)\) (see Remark 5.7). By functoriality of persistence sets, \(\bigcup _{k=1}^{n} {\textbf{D}}_{4,1}^{\textrm{VR}}(S_k) \subset {\textbf{D}}_{4,1}^{\textrm{VR}}(G)\). We now show the other inclusion.
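The triangular shape of \({\textbf{D}}_{4,1}^{\textrm{VR}}\) of a circle can be probed by Monte Carlo sampling. The sketch below (helper names are ours) samples 4-point configurations on a circle of circumference \(2\lambda \), computes \((t_b, t_d)\) via the farthest-point description from the proof of Proposition 6.4, and checks membership in the triangle with vertices \((\lambda /2, \lambda )\), \((\frac{2}{3}\lambda , \frac{2}{3}\lambda )\), \((\lambda , \lambda )\); the inscribed square realizes the vertex \((\lambda /2, \lambda )\).

```python
import random

def tb_td(d):
    # Farthest-point description from the proof of Proposition 6.4.
    n = len(d)
    rows = [sorted((d[i][j] for j in range(n) if j != i), reverse=True)
            for i in range(n)]
    return max(r[1] for r in rows), min(r[0] for r in rows)

def circle_metric(pos, circ):
    # Geodesic distances between arc-length positions on a circle of
    # circumference circ; our lambda corresponds to circ / 2.
    n = len(pos)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            gap = abs(pos[i] - pos[j])
            d[i][j] = min(gap, circ - gap)
    return d

def in_triangle(tb, td, lam, eps=1e-9):
    # Triangle with vertices (lam/2, lam), (2lam/3, 2lam/3), (lam, lam).
    return (2 * tb + td >= 2 * lam - eps
            and tb <= td + eps and td <= lam + eps)

lam = 2.0
random.seed(2)
for _ in range(5000):
    pos = [random.uniform(0, 2 * lam) for _ in range(4)]
    tb, td = tb_td(circle_metric(pos, 2 * lam))
    if tb < td:
        assert in_triangle(tb, td, lam)
# The inscribed square realizes the top-left vertex (lam/2, lam).
square = circle_metric([0.0, lam / 2, lam, 3 * lam / 2], 2 * lam)
assert tb_td(square) == (lam / 2, lam)
```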

Let \(X = \{x_1,x_2,x_3,x_4\} \subset G\), and set \(d_{ij}=d_G(x_i,x_j)\). Define \(X_k = X \cap (S_k {\setminus } \{p_0\})\). The proof will go case by case depending on the cardinality of the sets \(X_k\).

Case 1: \(|X_{a}| = 4\) for some a. Observe that X is contained in \(S_{a}\), so \(\textrm{dgm}_1^\textrm{VR}(X) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(S_{a})\).

Case \(\mathbf {1'}\): \(|X_{a}|=3\) for some a and \(|X_j|=0\) for all \(j \ne a\). In this case, \(X = X_{a} \cup \{p_0\} \subset S_{a}\), so, similarly to Case 1, we have \(\textrm{dgm}_1^\textrm{VR}(X) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(S_{a})\).

Case 2: \(|X_a| = 3\) and \(|X_b|=1\) for some \(a \ne b\). For concreteness, write \(X_a = \{x_1,x_2,x_3\}\) and \(X_b = \{x_4\}\), and assume \(x_2\) is in the connected component of \(S_a {\setminus } \{x_1,x_3\}\) that does not contain \(p_0\); see Fig. 34. Then \(d_{24} > d_{21},d_{23}\), so \(v_d(x_2) = x_4\) (see Definition 4.1). If \(t_b(X) \ge t_d(X) \), then \(\textrm{dgm}_1^\textrm{VR}(X)\) is the empty diagram and it belongs to \({\textbf{D}}_{4,1}^{\textrm{VR}}(S_k)\) for all k. Assume, then, that \(t_b(X) < t_d(X) \). In particular, we have \(v_d(x_1)=x_3\). Let \(X' = \{p_0, x_1, x_2,x_3\}\), and \(t = d_G(p_0,x_4)\). Let \(d_{0i}=d_G(p_0,x_i)\) for \(i \ne 4\), and notice that \(d_{i4} = d_{i0}+t\). This implies that \(t_b(X') \le t_b(X) \) and \(t_d(X') \le t_d(X) \). Now we have two cases, depending on whether \(t_b(X') < t_d(X') \) or not. If the inequality holds, then, since \(X' \subset S_a\), \(t_b(X') \) and \(t_d(X') \) satisfy (\(\star _a\)). This allows us to verify \((\star _a)\) for \(t_b(X) \) and \(t_d(X) \). Indeed, we have

$$\begin{aligned} 2\lambda _a \le 2t_b(X') +t_d(X') \le 2t_b(X) +t_d(X) . \end{aligned}$$

Also, \(t_d(X) = \min _i t_d(x_i) \le t_d(x_1) = d_{13} \le \lambda _a\), regardless of the position of \(x_4\). Thus, \(t_b(X) \) and \(t_d(X) \) satisfy \((\star _a)\), so \((t_b(X) , t_d(X) ) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(S_a)\).

For the second case, it is possible for \(t_b(X) \) to be smaller than \(t_d(X) \) even if \(t_b(X') \ge t_d(X') \). However, several conditions must be met. First, recall that any 4-point metric space has a split metric decomposition as in Fig. 33. By Proposition 6.4 item 3, \(0 < t_d(X) - t_b(X) \le \min (b,c)\), so \(b,c > 0\). Moreover, writing \(x_0:= p_0\),

$$\begin{aligned} \beta _{\{x_4\},\{x_i,x_j\}} = \frac{1}{2}(d_{i4}+d_{j4}-d_{ij}) = \frac{1}{2}(d_{i0}+d_{j0}-d_{ij}) + t = \beta _{\{x_0\},\{x_i,x_j\}}+t. \end{aligned}$$

Thus, \(\alpha _{\{x_4\}, X {\setminus } \{x_4\}} = \alpha _{\{x_0\}, X' {\setminus } \{x_0\}}+t\). By Theorem 2 of [12], all other isolation indices satisfy \(\alpha _{A,B} = \alpha _{A',B'}\), where \(X = A \cup B\), \(X'=A' \cup B'\), \(A'\) is the set A with \(x_4\) replaced by \(x_0\), and \(B'\) is defined analogously. In other words, the only isolation indices that are different between X and \(X'\) are \(\alpha _{\{x_4\}, X \setminus \{x_4\}}\) and \(\alpha _{\{x_0\}, X' \setminus \{x_0\}}\). For this reason, \(X'\) has the split metric decomposition shown in Fig. 33 except that \(x_4\) is changed to \(x_0\) and \(a_4\) is changed to \(a_4-t \ge 0\). In particular, since \(b,c>0\), \(X'\) is not tree-like. In other words, \(X'\) is not contained in any semicircle of \(S_a\), so

$$\begin{aligned} 2\lambda _a = d_{01}+d_{12}+d_{23}+d_{30}. \end{aligned}$$
(24)

The second set of conditions comes from comparing \(t_b(X) \) and \(t_d(X) \) with \(t_b(X') \) and \(t_d(X') \). First, observe that \(t_d(X) = \min (d_{13}, d_{24}) = \min (d_{13}, d_{20}+t)\). Since \(t_d(X') = \min (d_{13},d_{20})\) is smaller than \(t_b(X') \) and \(t_b(X') \le t_b(X) < t_d(X) \), we need \(d_{20} < d_{13}\). Second, \(t_b(X') \) cannot be \(d_{i0}\) for \(i=1,3\). Otherwise, \(t_b(X) = \max (d_{12},d_{23},d_{30}+t,d_{01}+t)\) would be \(d_{i4}=d_{i0}+t\) for either \(i=1,3\). This, however, induces a contradiction:

$$\begin{aligned} t_b(X) = d_{i0} + t = t_b(X') +t \ge t_d(X') +t \ge t_d(X) . \end{aligned}$$

For concreteness, let \(t_b(X') = \max (d_{12},d_{23}) = d_{12}\). Also, since \(d_{02} = t_d(X') \le t_b(X') = d_{12}\), we must have \(d_{02} = \min (d_{01}+d_{12}, d_{23}+d_{30}) = d_{23}+d_{30} \le d_{12}\).

Now we are ready to prove that \((t_b(X) , t_d(X) ) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(S_a)\). By Eq. (24) and the conditions in the preceding paragraph,

$$\begin{aligned} 2\lambda _a = d_{01}+d_{12}+(d_{23}+d_{30}) \le 3 d_{12} \le 3t_b(X) . \end{aligned}$$

Hence, \(t_b(X) \ge \frac{2}{3}\lambda _a\). Then

$$\begin{aligned} 2t_b(X) +t_d(X) > 3t_b(X) \ge 2\lambda _a. \end{aligned}$$

Lastly, \(t_d(X) = \min (d_{13}, d_{24}) \le d_{13} \le \lambda _a\). Thus, \(t_b(X) \) and \(t_d(X) \) satisfy \((\star _a)\).

Case \(\mathbf {2'}\): \(|X_a| = 2\), \(|X_b|=1\) for some \(a \ne b\), and \(|X_c|=0\) for all \(c \ne a,b\).

Here \(X = X_a \cup \{p_0\} \cup X_b\), and the proof of Case 2 remains valid if we replace \(X_a\) with \(X_a \cup \{p_0\}\).

Case 3: \(|X_a|=2\) and either \(|X_b|=2\) or \(|X_b|=|X_c|=1\) for distinct a, b, c. Let \(X_a = \{x_1,x_2\}\) and \(X_a' = X {\setminus } X_a\). Let \(a_i = d_G(x_i,p_0)\). Notice that \(d_{ij}=a_i+a_j\) for \(i \in \{1,2\}\) and \(j \in \{3,4\}\). Then:

$$\begin{aligned} d_{13}+d_{24} = d_{14}+d_{23} = a_1+a_2+a_3+a_4 \,\,\,\,\text{ and }\,\,\,\, d_{12}+d_{34} \le a_1+a_2+a_3+a_4. \end{aligned}$$

As a consequence,

$$\begin{aligned} \alpha _{\{x_1,x_3\}, \{x_2,x_4\}}= & {} \beta _{\{x_1,x_3\}, \{x_2,x_4\}}\\= & {} \frac{1}{2} [\max (d_{13}+d_{24}, d_{14}+d_{23}, d_{12}+d_{34} ) - d_{13}-d_{24} ] = 0. \end{aligned}$$

Analogously, \(\alpha _{\{x_1,x_4\}, \{x_2,x_3\}} = 0 \le \alpha _{\{x_1,x_2\}, \{x_3,x_4\}}\). Then \(b=0\) in Proposition 6.4, and item 2 gives that \(\textrm{dgm}_1^\textrm{VR}(X)\) is the empty diagram. Note, in particular, that \(\Gamma _X\) is a metric tree.

Case 4: \(|X_{a}| \le 1\) for all a. Observe that X is isometrically embedded in the tree \(T \subset G\) formed by the four shortest paths joining each \(x_i\) to \(p_0\), so \(\textrm{dgm}_1^\textrm{VR}(X)\) is empty by Lemma 6.2. \(\square \)
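Proposition 6.5 can also be checked by sampling: pick four random points on a wedge of two circles, compute \((t_b, t_d)\) via the farthest-point description from the proof of Proposition 6.4, and verify that every non-empty diagram lands in one of the two triangles. A sketch (the parametrization and helper names are ours):

```python
import random

def tb_td(d):
    # Farthest-point description from the proof of Proposition 6.4.
    n = len(d)
    rows = [sorted((d[i][j] for j in range(n) if j != i), reverse=True)
            for i in range(n)]
    return max(r[1] for r in rows), min(r[0] for r in rows)

def wedge_dist(p, q, lams):
    # A point of the wedge is a pair (circle index k, arc position s in
    # [0, 2 * lams[k])); the wedge point p0 sits at s = 0 on every circle.
    def circ(s, t, lam):
        gap = abs(s - t)
        return min(gap, 2 * lam - gap)
    (i, s), (j, t) = p, q
    if i == j:
        return circ(s, t, lams[i])
    return circ(s, 0.0, lams[i]) + circ(t, 0.0, lams[j])

def in_triangle(tb, td, lam, eps=1e-9):
    return 2 * tb + td >= 2 * lam - eps and tb <= td + eps and td <= lam + eps

lams = [1.0, 2.0]   # lambda_1, lambda_2: circles of total lengths 2 and 4
random.seed(3)
for _ in range(5000):
    X = []
    for _ in range(4):
        k = random.randrange(2)
        X.append((k, random.uniform(0, 2 * lams[k])))
    d = [[wedge_dist(p, q, lams) for q in X] for p in X]
    tb, td = tb_td(d)
    if tb < td:
        # Proposition 6.5: every non-empty diagram lies in one triangle.
        assert in_triangle(tb, td, lams[0]) or in_triangle(tb, td, lams[1])
```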

The proof of Proposition 6.5 shows that a configuration \(X \subset G\) produces persistence only if it is close to a cycle in the sense that either X is contained in a circle \(\frac{\lambda _i}{\pi } \cdot {\mathbb {S}}^{1}\), or only one point of X is outside of \(\frac{\lambda _i}{\pi } \cdot {\mathbb {S}}^{1}\). In both cases, the metric graph \(\Gamma _X\) contains a cycle since both b and c in Fig. 33 are non-zero. In any other scenario, \(\Gamma _X\) is a metric tree. This might lead to the conjecture that \({\textbf{D}}_{4,1}^{\textrm{VR}}(G) = \bigcup _{C \subset G} {\textbf{D}}_{4,1}^{\textrm{VR}}(C)\) where the union runs over all cycles \(C \subset G\). However, the following examples show that this is false.

Example 6.6

Recall the cyclic order \(\prec \) from Definition 5.2. Let G be a metric graph formed by attaching edges of length L to a cycle C at the points \(y_1 \prec y_2 \prec y_3 \prec y_4\); see Fig. 6. Let \(X = \{x_1,x_2,x_3,x_4\} \subset G\). If \(X \subset C\), then no new persistence is produced, so the points in X have to be in the attached edges. Also, if \(t_b(X) \) is to be smaller than \(t_d(X) \), then each \(x_i\) must be on a different edge. For example, if \(x_1\) and \(x_2\) are on the edge attached to \(y_1\), and \(x_3\) and \(x_4\) are on the edges adjacent to \(y_3\) and \(y_4\), respectively, let \(X' = \{x_1,x_2,y_3,y_4\}\). This \(X'\) consists of two points inside of a cycle and two points outside, so, as we saw in Case 3 of the proof of Proposition 6.5, \(X'\) is tree-like, and attaching edges at \(y_3\) and \(y_4\) does not change that. Thus, X is also a tree-like metric space.

Suppose, then, that each \(x_i\) is on the edge attached to \(y_i\). Let \(Y = \{y_1, y_2, y_3, y_4\}\). Since the decomposition in Theorem 6.3 is unique, the isolation indices of the metrics of X and Y satisfy \(\alpha _{\{x_i\}, X {\setminus } \{x_i\}} = \alpha _{\{y_i\}, Y {\setminus } \{y_i\}} + d_G(x_i,y_i)\), and \(\alpha _{\{x_i,x_j\},\{x_h,x_k\}} = \alpha _{\{y_i,y_j\},\{y_h,y_k\}}\), where \(\{i,j,h,k\} = \{1,2,3,4\}\). Suppose that \(\alpha _{\{y_1,y_3\}, \{y_2, y_4\}}=0\), and let \(m:= \min (\alpha _{\{y_1,y_2\}, \{y_3, y_4\}}, \alpha _{\{y_1,y_4\}, \{y_2, y_3\}})\). By Proposition 6.4, \(t_d(X) -t_b(X) \le m\), so

$$\begin{aligned} {\textbf{D}}_{4,1}^{\textrm{VR}}(G) \subset {\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{1}) \cup \{ (t_b,t_d) \ | \ t_b(Y) \le t_b < t_d \le t_b+m, \text { and } t_b \le t_b(Y) +2L \}. \end{aligned}$$

Observe that \({\textbf{D}}_{4,1}^{\textrm{VR}}(G)\) can contain points outside of \({\textbf{D}}_{4,1}^{\textrm{VR}}(C)\). For example, if \(t_b(Y) < t_d(Y) \), then the point \((t_b(Y) +2L, t_d(Y) +2L) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(G)\).

Remark 6.7

(\({\textbf{D}}_{4,1}^{\textrm{VR}}\) captures information that is invisible to \(\textrm{dgm}_*^\textrm{VR}\)) Note that, in the last example, the simplicial complex \(\textrm{VR}_r(G)\) is homotopy equivalent to \(\textrm{VR}_r(C)\) at every scale r. The reason is that the VR complex of a wedge sum \(X \vee Y\) decomposes as \(\textrm{VR}_r(X \vee Y) \simeq \textrm{VR}_r(X) \vee \textrm{VR}_r(Y)\) (see Proposition 3.7 of [2] or Theorem 4.1 in [52] for a reformulation in terms of persistence modules). Since G is the wedge sum of C with 4 edges \(E_i\), Lemma 6.2 gives that each \(\textrm{VR}_r(E_i)\) is contractible and, hence, \(\textrm{VR}_r(G) \simeq \textrm{VR}_r(C)\), which implies that \(\textrm{dgm}_*(G)=\textrm{dgm}_*(C)\). In contrast, \({\textbf{D}}_{4,1}^{\textrm{VR}}(G) \ne {\textbf{D}}_{4,1}^{\textrm{VR}}(C)\). In other words, \({\textbf{D}}_{4,1}^{\textrm{VR}}\) is able to detect features of G which the Vietoris–Rips persistence diagram does not. See Fig. 6.

Let \(F_k\) be a geodesic space formed by attaching \(2k+2\) edges of length L to \({\mathbb {S}}^{k}\) at the vertices of the regular cross-polytope. We can generalize Example 6.6 to the following proposition (cf. Figure 35).

Proposition 6.8

\({\mathbb {S}}^{k}\) and \(F_k\) have the same persistence diagrams, but \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{k}) \subsetneq {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(F_{k})\).

Proof

By the explanation in the previous remark, the persistence diagrams of \({\mathbb {S}}^{k}\) and \(F_k\) are equal. Also, \({\mathbb {S}}^{k} \hookrightarrow F_k\), so \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{k}) \subset {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(F_{k})\). To see that the containment is strict, suppose that the i-th edge was attached to \(y_i \in {\mathbb {S}}^{k}\) for \(i=1, \dots , 2k+2\) and choose the labels so that \(y_i\) and \(y_{i+k+1}\) are antipodal (addition of indices is done modulo \(2k+2\)). Thus, \(d_{{\mathbb {S}}^{k}}(y_i,y_j)\) equals \(\pi /2\) if \(j \ne i, i+k+1\) and \(\pi \) if \(j=i+k+1\). If \(x_i\) is the point on the i-th edge at distance L from \(y_i\) and \(X = \{x_1, \dots , x_{2k+2}\}\), then \(d_{F_{k}}(x_i,x_j)\) is \(\pi /2+2L\) when \(j \ne i, i+k+1\) and \(\pi +2L\) when \(j=i+k+1\). Hence, \(t_b(X) =\pi /2+2L\) and \(t_d(X) =\pi +2L\). Since every point \((t_b, t_d) \in {\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{k})\) satisfies \(t_b < t_d \le \textbf{diam}({\mathbb {S}}^{k}) = \pi \), \((t_b(X) , t_d(X) ) \in {\textbf{D}}_{2k+2,k}^{\textrm{VR}}(F_k) {\setminus } {\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{k})\). \(\square \)
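The distance matrix used in this proof is simple enough to verify mechanically. The sketch below (helper names are ours; \(t_b, t_d\) computed via the farthest-point description from the proof of Proposition 6.4) builds the \((2k+2)\times (2k+2)\) matrix of spike tips and confirms that the death time exceeds \(\textbf{diam}({\mathbb {S}}^{k}) = \pi \).

```python
import math

def tb_td(d):
    # Farthest-point description from the proof of Proposition 6.4.
    n = len(d)
    rows = [sorted((d[i][j] for j in range(n) if j != i), reverse=True)
            for i in range(n)]
    return max(r[1] for r in rows), min(r[0] for r in rows)

def spike_tips(k, L):
    # Distance matrix of the 2k+2 spike tips of F_k: pi/2 + 2L between tips
    # over non-antipodal attachment points, pi + 2L over antipodal ones
    # (indices i and i + k + 1 modulo 2k + 2).
    n = 2 * k + 2
    return [[0.0 if i == j else
             (math.pi if (j - i) % n == k + 1 else math.pi / 2) + 2 * L
             for j in range(n)] for i in range(n)]

for k in (1, 2, 3):
    L = 0.25
    tb, td = tb_td(spike_tips(k, L))
    assert (tb, td) == (math.pi / 2 + 2 * L, math.pi + 2 * L)
    # td > pi = diam(S^k): this point cannot come from the round sphere.
    assert td > math.pi
```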

Fig. 35

A spiky sphere. See Proposition 6.8.

Example 6.9

Not all cycles \(C \subset G\) with the induced subspace metric produce the persistence sets of a cycle graph. For instance, let G be the metric graph with edges of length 1 shown in Fig. 36. Let C be the cycle that passes through the vertices 1, 2, 6, 5, 8, 7, 3, 4. C has length 8, but there is no point (2, 4) in \({\textbf{D}}_{4,1}^{\textrm{VR}}(G)\). The reason is that the shortest path between points in C is often not contained in C, and so C is not isometric to a circle. For example, the edge connecting 1 and 5 is not contained in C despite it being the shortest path between its endpoints. We will explain this phenomenon in the next section.
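A sampling experiment like the one behind Fig. 36 is easy to reproduce in simplified form. The sketch below (vertex labels 0–7 instead of the example's 1–8; helper names are ours) samples 4-point configurations on the cube graph with unit edges and confirms that no death time reaches 4, since \(\textrm{diam}(G)=3\), so the point (2, 4) never appears, even though some configurations do produce non-trivial persistence.

```python
import random

def tb_td(d):
    # Farthest-point description from the proof of Proposition 6.4.
    n = len(d)
    rows = [sorted((d[i][j] for j in range(n) if j != i), reverse=True)
            for i in range(n)]
    return max(r[1] for r in rows), min(r[0] for r in rows)

# Cube graph on vertices 0..7 (bitstrings of length 3), unit-length edges
# between vertices differing in one bit; vertex distance = Hamming distance.
def hamming(u, v):
    return bin(u ^ v).count("1")

edges = [(u, v) for u in range(8) for v in range(8)
         if u < v and hamming(u, v) == 1]

def point_dist(p, q):
    # A point of the metric graph is (u, v, t): at distance t from u on the
    # unit edge uv.  A shortest path either stays on a shared edge or
    # routes through the edge endpoints.
    (u1, v1, t1), (u2, v2, t2) = p, q
    best = min(a + hamming(x, y) + b
               for x, a in ((u1, t1), (v1, 1 - t1))
               for y, b in ((u2, t2), (v2, 1 - t2)))
    if (u1, v1) == (u2, v2):
        best = min(best, abs(t1 - t2))
    return best

random.seed(4)
max_td, nontrivial = 0.0, 0
for _ in range(3000):
    X = [(*random.choice(edges), random.random()) for _ in range(4)]
    tb, td = tb_td([[point_dist(p, q) for q in X] for p in X])
    if tb < td:
        nontrivial += 1
        max_td = max(max_td, td)
assert nontrivial > 0          # non-trivial diagrams do occur
assert max_td <= 3.0 + 1e-9    # diam(G) = 3, so (2, 4) can never appear
```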

Fig. 36

Left: The cube metric graph G contains a cycle that is not isometric to a circle. Right: Its persistence set \({\textbf{D}}_{4,1}^{\textrm{VR}}(G)\) does not contain the point (2, 4). See Example 6.9. This figure was obtained by sampling 100,000 configurations of 4 points from G. About 13% of those configurations produced a non-diagonal point.

6.2 A Family of Metric Graphs Whose Homotopy Type is Characterized via \({\textbf{D}}_{4,1}^{\textrm{VR}}\)

Recall that the persistence set \({\textbf{D}}_{4,1}^{\textrm{VR}}(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1})\) is a triangle with vertices \((\lambda /2, \lambda )\), \((\frac{2}{3}\lambda , \frac{2}{3}\lambda )\), \((\lambda , \lambda )\). Observe that the only point in \({\textbf{D}}_{4,1}^{\textrm{VR}}(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1})\) that satisfies \(t_d = 2t_b\) is \((\lambda /2, \lambda )\). A similar observation holds in Examples 6.6 and 6.9. In both cases, the metric graph in question contains an isometrically embedded cycle, and by functoriality, the persistence set of the metric graph contains a triangle generated by such a cycle. However, not all cycles produce such a triangle as Example 6.9 shows. Proposition 6.10 gives conditions under which \({\textbf{D}}_{4,1}^{\textrm{VR}}(G)\) is capable of detecting all cycles in G, and examples of admissible graphs are shown in Fig. 40.

Proposition 6.10

Let \(T_1,\dots ,T_m\) be metric trees and, for each \(k=1,\dots ,n\), let \(C_k\) be a cycle. Suppose that all cycles have distinct lengths. Let G be a metric graph formed by iteratively attaching either a metric tree \(T_i\) or a cycle \(C_k\) along a vertex or an edge e that satisfies the following property: for any cycle \(C \subset G\) that intersects e, the lengths satisfy \(|e| < \frac{1}{3} |C|\). Then the first Betti number of G equals the number of points of the form \((\lambda /2, \lambda ) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(G)\).

We prove this statement at the end of the section. For now, we work toward the proof by recalling Lemma 5.1 item 1. For a 4-point set X, Lemma 5.1 says that if \(t_d(X) = 2t_b(X) \), then X has to be a square, that is, \(d_X(x_i,x_{i+1})=t_b(X) \) and \(d_X(x_i,x_{i+2})=t_d(X) \) for \(i=1,\dots ,4\). If X is a subset of a metric graph G, it is tempting to suggest that X must be contained in a cycle \(C \subset G\) isometric to \(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1}\). However, as Fig. 37 shows, this is not always the case. Still, if G satisfies the hypothesis of Theorem 6.12, then at least we can ensure that X lies in a specific metric subgraph. Before that, we need one more preparatory result, inspired by Theorem 3.15 in [2].

Fig. 37

A metric graph G and a set \(X \subset G\) such that \(t_b(X) =\pi /2\) and \(t_d(X) =\pi \). Notice that the outer black cycle C contains X but is not isometric to a circle. If it were, the shortest path in G between \(p_1\) and \(p_2\) would be contained in C, but that path is the blue edge of length \(\pi -\varepsilon \).

Fig. 38

In Lemma 6.11, any path in \(G_1\) between u and v has length greater than \(\alpha \).

Lemma 6.11

Let \(G = G_1 \cup _A G_2\) be a metric gluing of the metric graphs \(G_1\) and \(G_2\) such that \(A = G_1 \cap G_2\) is a closed path of length \(\alpha \). Let \(\ell _j\) be the length of the shortest cycle contained in \(G_j\) that intersects A, and set \(\ell = \min (\ell _1, \ell _2)\). Assume that \(\alpha < \frac{\ell }{2}\). Then the shortest path \(\gamma _{uv}\) between any two points \(u,v \in A\) is contained in A. As a consequence, if \(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1} \hookrightarrow G\) is an isometric embedding, then \(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1}\) is contained in either \(G_1\) or \(G_2\).

Proof

Let \(\gamma _{uv}'\) be the shortest path contained in A that connects u and v. We will show that \(\gamma _{uv} = \gamma _{uv}'\). Let \(\gamma \) be any path that joins u and v, and is contained in either \(G_1\) or \(G_2\) but not in A; see Fig. 38. Since \(\gamma \) is not contained in A, \(\gamma \cup \gamma _{uv}'\) contains a non-trivial cycle C that intersects A. Since \(\gamma _{uv}' \subset A\), its length is at most \(\alpha \). Then

$$\begin{aligned} 2\alpha < \ell \le |\gamma |+|\gamma _{uv}'| \le |\gamma |+\alpha . \end{aligned}$$

Thus, \(|\gamma | > \alpha \ge |\gamma _{uv}'| = d_G(u,v)\). More generally, any path \(\gamma \) between u and v can be split into subpaths \(\gamma _1, \dots , \gamma _k\) such that either \(\gamma _j \subset A\), or \(\gamma _j \subset G_i\) for some \(i=1,2\) and \(\gamma _j \cap A = \{u', v'\}\), where \(u'\) and \(v'\) are the endpoints of \(\gamma _j\). Applying the reasoning above to each \(\gamma _j\) that is not contained in A shows that \(|\gamma | \ge |\gamma _{uv}'|\). In particular, we must have \(\gamma _{uv} = \gamma _{uv}'\).

Now, a cycle \(C \subset G\) is isometric to \(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1}\) if there is a shortest path between any \(x,x' \in C\) contained in C. If \(C \cap A\) has several connected components, then C can be decomposed as the union of paths in A and paths contained in \(G_1\) or \(G_2\). If we pick two points u and v that lie in different connected components of \(C \cap A\), then the shortest sub-path of C between them will contain a sub-path that lies either in \(G_1\) or \(G_2\). By the previous paragraph, the sub-path contained in \(G_1\) or \(G_2\) has length larger than \(\alpha \ge d_G(u,v)\). Thus, the shortest path between u and v lies outside of C, so C is not isometric to \(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1}\). Instead, the only possibility for C to be isometric to \(\frac{\lambda }{\pi } \cdot {\mathbb {S}}^{1}\) is that \(C \cap A\) is either empty or connected. This implies \(C \subset G_1\) or \(C \subset G_2\). \(\square \)

The next theorem is the main result of this section and is similar in spirit to Proposition 6.5. The proof of Proposition 6.5 relied on the observation that if \(X \subset G\) has \(t_b(X) < t_d(X) \) then either X lies inside a cycle \(\frac{\lambda _i}{\pi } {\mathbb {S}}^{1}\) or, at worst, only one point lies outside. In a more general metric gluing \(G_1 \cup _A G_2\), however, the condition \(t_b(X) < t_d(X) \) is not enough to guarantee that most of X lies inside one component. Instead, we give hypotheses on \(G_1 \cup _A G_2\) under which the stronger condition \(t_d(X) = 2t_b(X) \) (as opposed to just \(t_b(X) < t_d(X) \)) implies that X is contained in either \(G_1\) or \(G_2\).

Theorem 6.12

Let \(G = G_1 \cup _A G_2\) be a metric gluing of the metric graphs \(G_1\) and \(G_2\) such that \(A = G_1 \cap G_2\) is a path of length \(\alpha \). Let \(\ell _j\) be the length of the shortest cycle contained in \(G_j\) that intersects A, and set \(\ell = \min (\ell _1, \ell _2)\). Assume that \(\alpha < \frac{\ell }{3}\). If \(X = \{x_1,x_2,x_3,x_4\} \subset G\) satisfies \(t_b(X) = \lambda /2\) and \(t_d(X) = \lambda \) for some \(\lambda > 0\), then either \(X \subset G_1\) or \(X \subset G_2\).

Proof

Let \(\gamma _{ij}\) be a shortest path in G from \(x_i\) to \(x_j\). Since \(t_d(X) = 2t_b(X) \), Lemma 5.1 item 1 gives that \(d_G(x_i, v_d(x_i)) = \lambda \) and \(d_G(x_i, x) = \lambda /2\) for every \(x \in X \setminus \{x_i, v_d(x_i)\}\). For this reason, we relabel the points \(x_i\) so that \(\lambda = |\gamma _{13}| = |\gamma _{24}|\) and \(\lambda /2 = |\gamma _{12}| = |\gamma _{23}| = |\gamma _{34}| = |\gamma _{41}|\).

During this proof, if a path \(\gamma \) has one endpoint in \(G_1\) and one in \(G_2\), we decompose it as \(\gamma ^{(1)} \cup \gamma ^{(A)} \cup \gamma ^{(2)}\), where \(\gamma ^{(i)} \subset G_i\), \(\gamma ^{(A)} \subset A\) and each intersection \(\gamma ^{(i)} \cap \gamma ^{(A)}\) is a single point. Let \(X_1:= X \cap G_1\) and \(X_2:= X \cap G_2\). We will break down the proof depending on the size of \(X_1\) and \(X_2\).

Fig. 39

Possible arrangements of paths between 4 points in Theorem 6.12. Left: \(x_1 \in G_1\) and \(x_2,x_3,x_4 \in G_2\) (Case 1). Middle: \(X_1 = \{x_1,x_2\}\) and \(X_2=\{x_3,x_4\}\) (Case 3). Right: The paths between points of X form a cycle in G (Case 3.2).

Case 0: If either \(X_1\) or \(X_2\) is empty, the theorem holds immediately.

Case 1: \(X_1\) or \(X_2\) is a singleton. Suppose that \(X_1 = \{x_1\}\) (see Fig. 39). Let \(u:= \gamma _{21}^{(1)} \cap A\) and \(v:= \gamma _{14}^{(1)} \cap A\). By Lemma 6.11, \(d_G(u,v) < |\gamma _{21}^{(1)}| + |\gamma _{14}^{(1)}|\). However, if \(\gamma _{uv}\) is a shortest path between u and v, then \(\gamma _{24}':= \gamma _{21}^{(2)} \cup \gamma _{21}^{(A)} \cup \gamma _{uv} \cup \gamma _{14}^{(A)} \cup \gamma _{14}^{(2)}\) is a path between \(x_2\) and \(x_4\) such that

$$\begin{aligned} |\gamma _{24}'|&\le |\gamma _{21}^{(2)}| + |\gamma _{21}^{(A)}| + |\gamma _{uv}| + |\gamma _{14}^{(A)}| + |\gamma _{14}^{(2)}|\\&< |\gamma _{21}^{(2)}|+ |\gamma _{21}^{(A)}| + |\gamma _{21}^{(1)}| + |\gamma _{14}^{(1)}| + |\gamma _{14}^{(A)}| + |\gamma _{14}^{(2)}|\\&= |\gamma _{21}|+|\gamma _{14}| = \lambda /2 + \lambda /2 = \lambda . \end{aligned}$$

This contradicts the assumption that \(d_G(x_2,x_4) = \lambda \).

Case 2: \(|X_1|=|X_2|=2\) and \(\textbf{diam}(X_1)=\textbf{diam}(X_2)=\lambda \). Without loss of generality, write \(X_1 = \{x_1, x_3\}\) and \(X_2 = \{x_2,x_4\}\). The path \(\gamma _{12} \cup \gamma _{23} \cup \gamma _{31}\) is a cycle in G that intersects both \(G_1\) and \(G_2\). Let \(u = \gamma _{12}^{(1)} \cap A\) and \(v = \gamma _{23}^{(1)} \cap A\), and let \(\gamma _{uv} \subset A\) be a path between them. By Lemma 6.11, \(d_G(u,v) < |\gamma _{12}^{(2)}|+|\gamma _{12}^{(A)}|+|\gamma _{23}^{(2)}|+|\gamma _{23}^{(A)}|\), so following the reasoning of Case 1, \(\gamma _{12}^{(1)} \cup \gamma _{uv} \cup \gamma _{23}^{(1)}\) is a path between \(x_1\) and \(x_3\) with length less than \(|\gamma _{12}|+|\gamma _{23}|=\lambda \). This is again a contradiction.

Case 3: \(|X_1|=|X_2|=2\) and \(\textbf{diam}(X_1)=\textbf{diam}(X_2)=\lambda /2\). Now we may assume \(X_1 = \{x_1,x_2\}\) and \(X_2=\{x_3,x_4\}\) (see Fig. 39). Let \(u = \gamma _{14}^{(1)} \cap A\) and \(v = \gamma _{23}^{(1)} \cap A\). By the triangle inequality,

$$\begin{aligned} \lambda = d_G(x_1,x_3) \le d_G(x_1,u)+d_G(u,v)+d_G(v,x_3). \end{aligned}$$
(25)

Analogously,

$$\begin{aligned} \lambda \le d_G(x_2,v)+d_G(v,u)+d_G(u,x_4). \end{aligned}$$
(26)

On the other hand, since \(\gamma _{23}\) is the shortest path between \(x_2\) and \(x_3\) and it passes through v, \(\lambda /2 = d_G(x_2,x_3) = d_G(x_2,v)+d_G(v,x_3)\). If there existed a path between v and \(x_3\) of length smaller than \(d_G(v,x_3)\), then the concatenation of that path and \(\gamma _{23}^{(1)}\) would give a path between \(x_2\) and \(x_3\) shorter than \(\gamma _{23}\). The same reasoning applies to \(x_2\) and v, so the above equality holds. By a similar argument, we get \(\lambda /2 = d_G(x_1,u)+d_G(u,x_4)\). Adding these two equations gives

$$\begin{aligned} d_G(x_1,u) + d_G(x_2,v) + d_G(v,x_3) + d_G(u,x_4) = \lambda , \end{aligned}$$

and combining this last equation with inequalities (25) and (26) produces, respectively,

$$\begin{aligned} d_G(x_2,v) + d_G(u,x_4)&\le d_G(u,v) \end{aligned}$$
(27)
$$\begin{aligned} d_G(x_1,u) + d_G(v,x_3)&\le d_G(v,u) . \end{aligned}$$
(28)

Then, using inequalities (25) and (28), we obtain \(\lambda \le 2 d_G(u,v)\). Furthermore, since \(u,v \in A\), we get \(\lambda /2 \le d_G(u,v) \le \alpha \). Now we break down Case 3 depending on whether \(\gamma _{12}\) and \(\gamma _{34}\) intersect A or not.

Case 3.1: Suppose that \(\gamma _{12}\) intersects A. Write \(\gamma _{12} = \gamma _{12}^{(1)} \cup \gamma _{12}^{(A)} \cup \gamma _{12}^{(2)}\), and let \(w_i = \gamma _{12}^{(i)} \cap \gamma _{12}^{(A)}\). Since u and \(w_1\) both lie in A, and A is a path of length \(\alpha \), the subpath \(\gamma _{w_1} \subset A\) between u and \(w_1\) satisfies

$$\begin{aligned} |\gamma _{w_1}| \le \alpha . \end{aligned}$$

If \(u \ne w_1\), then \(\gamma _{14}^{(1)} \cup \gamma _{w_1} \cup \gamma _{12}^{(1)}\) is a cycle contained in \(G_1\) that intersects A, and its length is at most \(|\gamma _{14}^{(1)}| + \alpha + |\gamma _{12}^{(1)}| \le \lambda /2 + \alpha + \lambda /2 = \lambda + \alpha \le 3\alpha \), where the last step uses \(\lambda \le 2\alpha \) from above. Then, \(\ell \le 3\alpha \) by definition. However, this is a contradiction because \(3\alpha < \ell \) by hypothesis. Thus, \(w_1=u\), and an analogous argument shows that \(w_2=v\). Since \(\gamma _{12}\) is a shortest path between \(x_1\) and \(x_2\),

$$\begin{aligned} \lambda /2 = d_G(x_1,x_2)&= d_G(x_1,w_1)+d_G(w_1,w_2)+d_G(w_2,x_2)\\&= d_G(x_1,u) + d_G(u,v) + d_G(v,x_2) \ge d_G(u,v) \ge \lambda /2. \end{aligned}$$

Thus, \(x_1=u\) and \(x_2=v\). In other words, \(X_1 \subset A \subset G_2\), so \(X = X_1 \cup X_2 \subset G_2\). Naturally, if \(\gamma _{34}\) intersected A instead of \(\gamma _{12}\), then an analogous argument would give \(X \subset G_1\).

Case 3.2: Neither \(\gamma _{34}\) nor \(\gamma _{12}\) intersects A (see Fig. 39). Once more, let \(u = \gamma _{14}^{(1)} \cap A\) and \(v = \gamma _{23}^{(1)} \cap A\), let \(\gamma _{uv} \subset A\) be a shortest path between them, and set \(\nu = d_G(u,v)\). Define the cycles \(C = \gamma _{12} \cup \gamma _{23} \cup \gamma _{34} \cup \gamma _{41}\), \(C_1 = \gamma _{12} \cup \gamma _{23}^{(1)} \cup \gamma _{uv} \cup \gamma _{41}^{(1)}\) and \(C_2 = \gamma _{34} \cup \gamma _{41}^{(2)} \cup \gamma _{41}^{(A)} \cup \gamma _{uv} \cup \gamma _{23}^{(A)} \cup \gamma _{23}^{(2)}\). Set \(L=|C|\) and \(L_j = |C_j|\) for \(j=1,2\). Clearly, \(L=2\lambda \), and since \(C_1\) and \(C_2\) both traverse \(\gamma _{uv}\) while C does not, \(L_1+L_2-2\nu = L = 2\lambda \). For this reason, write \(\lambda = \frac{L_1+L_2}{2}-\nu \).

For brevity, let \(\delta _1 = d_G(x_1,u), \delta _2 = d_G(x_2,v), \delta _3 = d_G(x_3,v)\), and \(\delta _4 = d_G(x_4,u)\). By definition of u and v, we have

$$\begin{aligned} \lambda /2 = d_G(x_1,x_4) = d_G(x_1,u)+d_G(u,x_4) = \delta _1+\delta _4, \end{aligned}$$
(29)

and

$$\begin{aligned} \lambda /2 = d_G(x_2,x_3) = \delta _2+\delta _3. \end{aligned}$$
(30)

Additionally,

$$\begin{aligned} L_1&= |\gamma _{12}| + |\gamma _{23}^{(1)}| + |\gamma _{uv}| + |\gamma _{14}^{(1)}| = d_G(x_1,x_2) + d_G(x_2,v) + d_G(u,v) + d_G(u,x_1) \nonumber \\&= \lambda /2+\delta _2+\nu +\delta _1, \text { and } \end{aligned}$$
(31)
$$\begin{aligned} L_2&= |\gamma _{34}| + |\gamma _{41}^{(2)} \cup \gamma _{41}^{(A)}| + |\gamma _{uv}| + |\gamma _{23}^{(A)} \cup \gamma _{23}^{(2)}|\nonumber \\&= d_G(x_3,x_4) + d_G(x_4,u) + d_G(u,v) + d_G(v,x_3) \nonumber \\&= \lambda /2+\delta _4+\nu +\delta _3. \end{aligned}$$
(32)

If we interpret the \(\delta _i\) as variables and \(L_1,L_2,\nu \), and \(\lambda \) as constants, equations (29)–(32) form a system of 4 equations in 4 unknowns. The coefficient matrix has rank 3 (the sum of the first two rows equals the sum of the last two), so the solution set is a one-parameter family. Thus, choosing \(\delta _4 = t\) as the parameter gives the general solution

$$\begin{aligned} \delta _1 = \lambda /2-t,\,\, \delta _2 = L_1-\lambda -\nu +t,\,\, \delta _3 = L_2-\lambda /2-\nu -t,\,\, \delta _4 = t. \end{aligned}$$
(33)
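One can verify directly that (33) solves (29)–(32) for every value of t. A minimal Python sketch, where the constants are arbitrary illustrative choices subject only to the consistency relation \(L_1+L_2 = 2\lambda +2\nu \) established above:

```python
# Check that the one-parameter family (33) solves the system (29)-(32),
# assuming the consistency relation L1 + L2 = 2*lam + 2*nu from the text.
def deltas(t, lam, nu, L1, L2):
    d1 = lam / 2 - t              # delta_1
    d2 = L1 - lam - nu + t        # delta_2
    d3 = L2 - lam / 2 - nu - t    # delta_3
    d4 = t                        # delta_4
    return d1, d2, d3, d4

lam, nu = 6.0, 2.0                # arbitrary illustrative constants
L1 = 7.5                          # arbitrary; L2 is forced by L1 + L2 = 2*lam + 2*nu
L2 = 2 * lam + 2 * nu - L1        # = 8.5

for t in [0.0, 1.0, lam / 4, lam / 2]:
    d1, d2, d3, d4 = deltas(t, lam, nu, L1, L2)
    assert abs((d1 + d4) - lam / 2) < 1e-9            # equation (29)
    assert abs((d2 + d3) - lam / 2) < 1e-9            # equation (30)
    assert abs((lam / 2 + d2 + nu + d1) - L1) < 1e-9  # equation (31)
    assert abs((lam / 2 + d4 + nu + d3) - L2) < 1e-9  # equation (32)
```

Note that summing (29) and (30) also gives \(\delta _1+\delta _2+\delta _3+\delta _4 = \lambda \), independently of t.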

This means that there exists a particular number \(0 \le t \le \lambda /2\) such that the distances between points of X and u and v are given by the equations above. With this tool at hand, we now claim that at least one of the paths \(\gamma _1:= \gamma _{14}^{(1)} \cup \gamma _{uv} \cup \gamma _{23}^{(A)} \cup \gamma _{23}^{(2)}\) or \(\gamma _2:= \gamma _{14}^{(2)} \cup \gamma _{14}^{(A)} \cup \gamma _{uv} \cup \gamma _{23}^{(1)}\) has length less than \(\lambda \). This would imply that either \(d_G(x_1,x_3)\) or \(d_G(x_2,x_4)\) is less than \(\lambda \), violating the assumption that \(t_d(X) = \lambda \).

An equivalent formulation of the claim is

$$\begin{aligned} \max _{t}\left( \min (|\gamma _1|,|\gamma _2|) \right) < \lambda . \end{aligned}$$
(34)

If this inequality holds, then either \(|\gamma _1|\) or \(|\gamma _2|\) is smaller than \(\lambda \), regardless of the value of t. Notice, though, that \(|\gamma _1| = \delta _1+\nu +\delta _3\) and \(|\gamma _2|=\delta _4+\nu +\delta _2\). Using the equations in (33), we see that \(|\gamma _1|+|\gamma _2| = L_1+L_2-\lambda \) is a quantity independent of t. Thus, the maximum in Eq. (34) is achieved when \(|\gamma _1|=|\gamma _2|\). This happens when \(t=\frac{1}{4}(L_2-L_1+\lambda )\), and gives \( |\gamma _1| = \frac{L_1+L_2}{2}-\frac{\lambda }{2} = \frac{L_1+L_2}{4}+\frac{\nu }{2}. \) The claim is that this quantity is less than \(\lambda = \frac{L_1+L_2}{2}-\nu \). Solving for \(\nu \) shows that the claim is equivalent to \(\nu < \frac{L_1+L_2}{6}\). Recall that \(\gamma _{uv} \subset A\), that A is a path of length \(\alpha < \frac{\ell }{3}\), and that \(\ell \) is the length of the smallest cycle contained in either \(G_1\) or \(G_2\) that intersects A. Since each \(C_i \subset G_i\) intersects A, we have \(\ell \le \min (L_1,L_2) \le \frac{L_1+L_2}{2}\), and hence \(\nu \le \alpha < \frac{\ell }{3} \le \frac{L_1+L_2}{6}\), as desired. This forces either \(d_G(x_1,x_3) \le |\gamma _1| < \lambda \) or \(d_G(x_2,x_4) \le |\gamma _2| < \lambda \), violating the assumption that \(t_d(X) = \lambda \). This concludes the proof of Case 3.2, and gives the Theorem. \(\square \)
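The max–min computation in Case 3.2 can be sanity-checked numerically. In the following Python sketch, the constants are arbitrary illustrative choices satisfying \(\lambda = \frac{L_1+L_2}{2}-\nu \) and the hypothesis \(\nu < \frac{L_1+L_2}{6}\); the formulas for \(|\gamma _1|\) and \(|\gamma _2|\) come from (33):

```python
# Numeric check of the max-min step: scanning t over [0, lam/2], the maximum of
# min(|gamma_1|, |gamma_2|) should equal (L1 + L2)/4 + nu/2, which is < lam
# exactly when nu < (L1 + L2)/6.  All constants are illustrative choices.
nu, L1, L2 = 2.0, 11.0, 13.0
lam = (L1 + L2) / 2 - nu          # = 10.0, as dictated by the text
assert nu < (L1 + L2) / 6         # hypothesis: nu <= alpha < l/3 <= (L1+L2)/6

def gamma_lengths(t):
    d1 = lam / 2 - t              # delta_1, as in (33)
    d2 = L1 - lam - nu + t        # delta_2
    d3 = L2 - lam / 2 - nu - t    # delta_3
    d4 = t                        # delta_4
    return d1 + nu + d3, d4 + nu + d2   # |gamma_1|, |gamma_2|

best = max(min(gamma_lengths(i * (lam / 2) / 1000)) for i in range(1001))
predicted = (L1 + L2) / 4 + nu / 2      # value at t = (L2 - L1 + lam)/4
assert abs(best - predicted) < 1e-9
assert best < lam                       # hence d(x1,x3) or d(x2,x4) < lam
```

Here \(|\gamma _1| = L_2 - 2t\) decreases in t while \(|\gamma _2| = L_1 - \lambda + 2t\) increases, so the maximum of their minimum occurs at the crossing point, as in the proof.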

To close this section, we explore a consequence of Theorem 6.12. Once more, this application is inspired by [2], specifically Proposition 4.1.

Theorem 6.13

Let \(T_1,\dots ,T_m\) be metric trees. For each \(k=1,\dots ,n\), let \(\lambda _k>0\) and let \(C_k\) be a cycle of length \(L_k = 2\lambda _k\). Suppose that all the \(\lambda _k\) are distinct. Let G be a metric graph formed by iteratively attaching either a metric tree \(T_i\) or a cycle \(C_k\) along a vertex or an edge e that satisfies the following property: for any cycle \(C \subset G\) that intersects e, the lengths satisfy \(|e| < \frac{1}{3} |C|\). Then the number of points of the form \((\lambda /2, \lambda ) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(G)\) is equal to the number n of cycles \(C_k\) that were attached. Furthermore, if \(X \subset G\) is a set of 4 points such that \(t_b(X) = \lambda /2\) and \(t_d(X) = \lambda \), then X is contained in a cycle \(C_k\) with \(L_k = 2\lambda \).

Proof

First, label the metric trees and the cycles as \(G_1, G_2, \dots , G_N\) according to the order in which they were attached. Consider a cycle \(C_k\) and denote it by \(G_m\). Suppose that there is a path \(\gamma \) between \(x,x' \in C_k\) that intersects \(C_k\) only at x and \(x'\). We claim that the edge \([x,x']\) is in \(C_k\). Otherwise, since we only attach metric graphs along an edge or a vertex, there are two different metric graphs attached to \(C_k\), one at x and one at \(x'\). However, if we follow \(\gamma \), we will find a metric graph that was attached to the previous metric graphs along two disjoint segments. This contradicts the construction of G, so \(e:= [x,x']\) is an edge of \(C_k\). Moreover, \(|\gamma | > 2|e|\): otherwise \(e \cup \gamma \) would be a cycle of length at most \(3|e|\) that intersects e, contradicting the attaching condition \(|e| < \frac{1}{3}|e \cup \gamma |\). Thus, \(d_G(x,x') = |e| < |\gamma |\), and consequently shortest paths between points of \(C_k\) lie in \(C_k\). Therefore \(C_k\), with the metric inherited from G, is isometric to a circle of circumference \(L_k\) which, as a metric space, has \(\textbf{diam}_G(C_k)=\lambda _k\). Then \((\lambda _k/2, \lambda _k) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(C_k) \subset {\textbf{D}}_{4,1}^{\textrm{VR}}(G)\).

Now, suppose that there is a point \((\lambda /2, \lambda ) \in {\textbf{D}}_{4,1}^{\textrm{VR}}(G)\) generated by a set \(X = \{x_1, x_2, x_3, x_4 \}\subset G\), with the labels chosen so that \(t_d(X) = \min \{d_G(x_{1}, x_{3}), d_G(x_{2}, x_{4}) \}\). By Lemma 5.1 item 1, \(t_b(X) = \lambda /2 = d_G(x_{i}, x_{i+1})\) and \(t_d(X) = \lambda = d_G(x_{i}, x_{i+2})\) for all \(1 \le i \le 4\), with indices taken mod 4. Let m be the largest index such that \(X \cap G_m \ne \emptyset \). By Theorem 6.12, either \(X \subset G_1 \cup \cdots \cup G_{m-1}\) or \(X \subset G_m\). If X is not contained in \(G_m\), we can apply Theorem 6.12 repeatedly to discard metric graphs until we find one that contains X. Notice that X cannot be contained in a metric tree \(T_i\) because of Lemma 6.2, so \(X \subset C_k\) for some k. Let \(\gamma _{i}\) be a shortest path between \(x_i\) and \(x_{i+1}\). Then the sum \(d_G(x_1,x_2)+d_G(x_2,x_3)+d_G(x_3,x_4)+d_G(x_4,x_1) = 2\lambda \) equals \(L_k\) because \(\gamma _1 \cup \gamma _2 \cup \gamma _3 \cup \gamma _4\) is a cycle contained in \(C_k\). Since \(L_k = 2\lambda _k\), we conclude \(\lambda = \lambda _k\). \(\square \)

Now we prove Proposition 6.10, which was stated at the start of the section. Since the metric graphs in Theorem 6.13 are pasted along a contractible space, we can detect the homotopy type of the metric graph.

Fig. 40

Two examples of admissible metric graphs G as in Proposition 6.10 and their persistence set \({\textbf{D}}_{4,1}^{\textrm{VR}}(G)\). The red triangles are the boundaries of the sets \({\textbf{D}}_{4,1}^{\textrm{VR}}(C)\) for every cycle \(C \subset G\). Left: Two cycles of lengths \(\ell _1=3.5\) and \(\ell _2=4.5\) pasted over an edge of length \(\alpha = 0.5 < \frac{1}{3} \min (\ell _1, \ell _2)\). Right: A tree of cycles. Each persistence set was found by sampling 100,000 uniform configurations from G.
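To illustrate how persistence sets such as those in Fig. 40 can be sampled, the following Python sketch computes points of \({\textbf{D}}_{4,1}^{\textrm{VR}}\) for a single circle. It uses the standard fact that the Vietoris–Rips complex of 4 points carries at most one degree-1 bar: \(H_1\) is nonzero at scale r exactly when, for some pairing of the points into two "diagonals", all four remaining "side" edges are present while both diagonals are absent, so the bar is (max side, min diagonal). The function names are our own, and this is an illustrative sketch under these assumptions, not the code used to produce Fig. 40.

```python
import itertools
import random

def circle_dist(a, b, L):
    """Geodesic distance between points a, b on a circle of circumference L."""
    d = abs(a - b) % L
    return min(d, L - d)

def h1_bar(pts, L):
    """Degree-1 VR bar of a 4-point subset of the circle, or None if empty.

    For each pairing of the 4 points into two 'diagonals', the candidate bar is
    (max of the four side distances, min of the two diagonal distances); at most
    one pairing yields a nonempty interval.
    """
    pairs = list(itertools.combinations(range(4), 2))
    dist = {p: circle_dist(pts[p[0]], pts[p[1]], L) for p in pairs}
    for diag in [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]:
        birth = max(dist[p] for p in pairs if p not in diag)
        death = min(dist[p] for p in diag)
        if birth < death:
            return birth, death
    return None

L = 7.0                 # circumference L_k = 2 * lambda_k, so lambda_k = 3.5
lam_k = L / 2
# The 'square' configuration realizes the extremal point (lambda_k/2, lambda_k).
assert h1_bar([0.0, L / 4, L / 2, 3 * L / 4], L) == (lam_k / 2, lam_k)
# Random configurations produce bars with death <= diam = lambda_k, and
# death <= 2 * birth by the triangle inequality.
random.seed(0)
for _ in range(1000):
    bar = h1_bar([random.uniform(0, L) for _ in range(4)], L)
    if bar is not None:
        assert bar[0] < bar[1] <= lam_k + 1e-9
        assert bar[1] <= 2 * bar[0] + 1e-9
```

Repeating the loop over uniform configurations on all of G and collecting the resulting (birth, death) pairs is the kind of sampling described in the caption of Fig. 40.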

Proof of Proposition 6.10

Attaching a metric tree to a metric graph does not change its homotopy type, while attaching a cycle \(C_k\) to \(G_1 \cup \dots \cup G_m\) along a contractible subspace induces a homotopy equivalence \((G_1 \cup \cdots \cup G_m) \cup C_k \simeq (G_1 \cup \cdots \cup G_m) \vee C_k\). Thus, by induction, \(G \simeq C_1 \vee \cdots \vee C_n\), and \(\beta _1(G)=n\). \(\square \)

7 Discussion and Questions

Here we mention other results that can be obtained:

  • As an application of the stability theorem and of our characterization results, one can show that the Gromov–Hausdorff distance between \({\mathbb {S}}^{1}\) and \({\mathbb {S}}^{m}\) is bounded below by \(\frac{\pi }{14.6344}\) when \(m=2\) and by \(\frac{\pi }{8}\) when \(m \ge 3\). See [43] for details.

  • As the objects \({\textbf{U}}_{n,k}^{\textrm{VR}}\) can be considerably complex, a system of coordinates \(\{\zeta _\alpha :{\mathcal {D}}\rightarrow {\mathbb {R}}\}_{\alpha \in A}\) that exhausts the information contained in the persistence measures is desirable. See the preprint version [43] for results in this direction.

  • Another consequence of the stability of persistence measures is the concentration of \({\textbf{U}}_{n,k}^{{\mathfrak {F}}}(X)\) as \(n \rightarrow \infty \), which can also be found in [43].

Now we outline some open questions and conjectures.

  • Are there rich classes of compact metric spaces that can be distinguished with persistence sets?

    This question is a generalization of Theorem 6.13 and Proposition 6.10. The persistence set \({\textbf{D}}_{4,1}^{\textrm{VR}}(G)\) captures the number and length of cycles in a metric graph G that was constructed according to the instructions in Theorem 6.13. Are there other families of compact metric spaces where higher order diagrams \({\textbf{D}}_{n,k}^{\textrm{VR}}(G)\) can detect relevant features? In other words, are there families \({\mathcal {C}}\) of compact metric spaces such that

    $$\begin{aligned} \sup _{n,k} d_{{\mathcal {H}}}^{\mathcal {D}}({\textbf{D}}_{n,k}^{\textrm{VR}}(X), {\textbf{D}}_{n,k}^{\textrm{VR}}(Y)) \end{aligned}$$

    is a metric when \(X, Y \in {\mathcal {C}}\)?

  • Description of \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\) for all k and m: Propositions 5.17 and 5.21 are a step in that direction. In fact, the latter implies that we only need to find \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{2k}_E)\) to determine \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\) for all spheres with \(m \ge 2k+1\). In particular, for \({\textbf{D}}_{6,2}^{\textrm{VR}}({\mathbb {S}}^{2}_E)\), does Conjecture 5.25 hold?

  • Stabilization of \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\): When \(k=1\), Corollary 5.22 shows that \({\textbf{D}}_{4,1}^{\textrm{VR}}({\mathbb {S}}^{m})\) stabilizes at \(m=2\) instead of \(m=3\), as given by Proposition 5.21. The key to the reduction was the use of Ptolemy’s inequality as in Theorem 5.19. A natural follow-up question, even if it is subsumed by the previous one, is to determine when \({\textbf{D}}_{2k+2,k}^{\textrm{VR}}({\mathbb {S}}^{m}_E)\) really stabilizes for general k.