Minkowski-type distances in approximate query searches

Singh, Arpan; Jayaram, Balasubramaniam

doi:10.1007/s40314-024-02704-8

Published: 20 April 2024

Volume 43, article number 187, (2024)

Computational and Applied Mathematics

Abstract

In approximate query searching (AQS), the given query point (\({\bar{\textbf{q}}}'\)) can be seen as a noise-corrupted (\({\bar{\eta }}\)) version of one of the points (\({\bar{\textbf{q}}}\)) in the existing database \({\mathcal {X}}\), i.e., \({\bar{\textbf{q}}}' = {\bar{\textbf{q}}} + {\bar{\eta }}\). Choosing a distance d that returns the correct match (\({\bar{\textbf{q}}}\)) therefore requires that the distance be suited to the type of distribution of the noise. In this work, we study the suitability of Minkowski-type distances in AQS when \({\bar{\textbf{q}}}\) is afflicted by both white and coloured noise to different extents. To this end, we employ a simple similarity-search-based scoring algorithm proposed in François et al. (2005). Our study reveals an interesting interplay of the following three D's in the quest for an appropriate distance: the Dimensionality and Domain geometry of the data, and the type of noise Distribution. This has led us to explore the problem from a basic geometric perspective. Our main contribution herein is the proposal of a novel index called the Relative Contained Volume (RCV) that helps explain the performance of the considered distances.
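
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the scoring algorithm of François et al. (2005) nor the authors' MATLAB code; it only assumes a database drawn uniformly from \([0,1]^m\), white Gaussian noise with a common standard deviation `sigma`, and a plain nearest-neighbour rule. The function names `minkowski_pth_power` and `retrieval_accuracy` and all parameter values are hypothetical and chosen for illustration.

```python
import numpy as np

def minkowski_pth_power(X, q, p):
    """Return sum_i |x_i - q_i|^p for every row x of X (no p-th root taken)."""
    return np.sum(np.abs(X - q) ** p, axis=1)

def retrieval_accuracy(X, p, sigma, n_queries, rng):
    """Fraction of noisy queries whose nearest neighbour is the original point."""
    n, m = X.shape
    indices = rng.integers(0, n, size=n_queries)
    hits = 0
    for i in indices:
        # q' = q + eta, with eta assumed to be white Gaussian noise
        noisy_q = X[i] + rng.normal(0.0, sigma, size=m)
        nearest = int(np.argmin(minkowski_pth_power(X, noisy_q, p)))
        hits += int(nearest == i)
    return hits / n_queries

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 20))  # assumed database in [0, 1]^m
for p in (0.5, 1.0, 2.0, 4.0):
    acc = retrieval_accuracy(X, p, sigma=0.05, n_queries=200, rng=rng)
    print(f"p = {p}: accuracy = {acc:.3f}")
```

Because taking the p-th root is monotone, ranking by the p-th power returns the same nearest neighbour as ranking by \(\ell _p\) itself, so the root is skipped in this sketch.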

Data availability

The analysis of all the data generated during this study is included in this article. The MATLAB codes to generate and analyse the data are available from the corresponding author on reasonable request.

Notes

Note that \(\left( \ell _p({\bar{\textbf{x}}}, {\bar{\textbf{y}}}) \right) ^p = \Vert {\bar{\textbf{x}}} - {\bar{\textbf{y}}} \Vert _p^p = \sum _{i=1}^m |x_i - y_i|^p \) is, in fact, not a metric, since it does not satisfy the triangle inequality. It does, however, satisfy the other two properties of a metric, hence our choice of terminology.
This is an expanded version of the preliminary findings presented in Singh and Jayaram (2020).
Perhaps this explains the choice of \(\sigma _i = 0.3\) in François et al. (2005) where \({\mathcal {D}} = [0,1]^m\).
Notice that the scale on the y-axis is in the order of \(10^{-4}\) in some of the plots in Figs. 5, 6, 7, and 8.
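
As a small illustration of the first note above, the following Python sketch checks one concrete triple of points for which the p-th power of \(\ell _p\) violates the triangle inequality. The choice \(p = 2\) and the one-dimensional points are assumptions made here for illustration, and `lp_pth_power` is a hypothetical helper name.

```python
def lp_pth_power(x, y, p):
    """Return sum_i |x_i - y_i|^p, i.e. the p-th power of the l_p distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))

# Three collinear points in one dimension (illustrative choice).
x, y, z = (0.0,), (1.0,), (2.0,)
p = 2

direct = lp_pth_power(x, z, p)                          # |0 - 2|^2 = 4
detour = lp_pth_power(x, y, p) + lp_pth_power(y, z, p)  # 1 + 1 = 2

# The triangle inequality would require direct <= detour.
print(direct > detour)  # True
```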

References

Aeberhard S, Forina M (1991) Wine. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J
Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional spaces. In: Database theory—ICDT 2001, 8th International Conference, London, UK, January 4–6, 2001, Proceedings, pp 420–434
Bator M (2015) Dataset for sensorless drive diagnosis. UCI Machine Learning Repository. https://doi.org/10.24432/C5VP5F
Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Database theory—ICDT ’99, 7th International Conference, Jerusalem, Israel, January 10–12, 1999, Proceedings, pp 217–235
Blum A, Hopcroft J, Kannan R (2020) Foundations of data science. Cambridge University Press, Cambridge
Bock R (2007) Magic gamma telescope. UCI Machine Learning Repository. https://doi.org/10.24432/C52C8B
Durrant RJ, Kabán A (2009) When is ‘nearest neighbour’ meaningful: a converse theorem and implications. J Complex 25(4):385–397
François D, Wertz V, Verleysen M (2005) Non-Euclidean metrics for similarity search in noisy datasets. In: ESANN 2005, 13th European symposium on artificial neural networks, Bruges, Belgium, April 27–29, 2005, Proceedings, pp 339–344
François D, Wertz V, Verleysen M (2007) The concentration of fractional distances. IEEE Trans Knowl Data Eng 19(7):873–886
Hinneburg A, Aggarwal CC, Keim DA (2000) What is the nearest neighbor in high dimensional spaces? In: VLDB 2000, Proceedings of 26th international conference on very large data bases, September 10–14, 2000, Cairo, Egypt, pp 506–515
Hu Y, Yu M, Wang H, Ting Z (2015) A similarity-based learning algorithm using distance transformation. IEEE Trans Knowl Data Eng 27(6):1452–1464
Jayaram B, Klawonn F (2012) Can unbounded distance measures mitigate the curse of dimensionality? Int J Data Min Model Manag 4(4):361–383
Klawonn F, Höppner F, Jayaram B (2012) What are clusters in high dimensions and are they difficult to find? In: Revised selected papers of the first international workshop on clustering high-dimensional data, vol 7627. Springer, Berlin, pp 14–33
Kumari S, Jayaram B (2017) Measuring concentration of distances—an effective and efficient empirical index. IEEE Trans Knowl Data Eng 29(2):373–386
Pestov V (2000) On the geometry of similarity search: dimensionality curse and concentration of measure. Inf Process Lett 73(1–2):47–51
Qiao M, Li J (2016) Distance-based mixture modeling for classification via hypothetical local mapping. Stat Anal Data Min 9(1):43–57
Singh A, Jayaram B (2020) Performance of Minkowski-type distances in similarity search—a geometrical approach. In: IEEE 5th International Conference on Computing Communication and Automation (ICCCA), pp 467–47
Smith DJ, Vamanamurthy MK (1989) How small is a unit ball? Math Mag 62(2):101–107
Wang Z, Bovik AC (2009) Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process Mag 26(1):98–117
Weinberger KQ, Sha F, Saul LK (2010) Convex optimizations for distance metric learning and pattern classification [Applications Corner]. IEEE Signal Process Mag 27(3):146–158
Wolberg W, Street W, Mangasarian O (1995) Breast cancer Wisconsin (prognostic). UCI Machine Learning Repository. https://doi.org/10.24432/C5GK50

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Department of Mathematics, Indian Institute of Technology Hyderabad, Hyderabad, 502285, India
Arpan Singh & Balasubramaniam Jayaram

Corresponding author

Correspondence to Arpan Singh.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest. The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Singh, A., Jayaram, B. Minkowski-type distances in approximate query searches. Comp. Appl. Math. 43, 187 (2024). https://doi.org/10.1007/s40314-024-02704-8

Received: 08 August 2023
Revised: 24 February 2024
Accepted: 16 March 2024
Published: 20 April 2024
DOI: https://doi.org/10.1007/s40314-024-02704-8
