
Minkowski-type distances in approximate query searches

Computational and Applied Mathematics

Abstract

In approximate query searching (AQS), the given query point \({\bar{\textbf{q}}}'\) can be seen as a noise-corrupted version of one of the points \({\bar{\textbf{q}}}\) in the existing database \({\mathcal {X}}\), i.e., \({\bar{\textbf{q}}}' = {\bar{\textbf{q}}} + {\bar{\eta }}\), where \({\bar{\eta }}\) denotes the noise. Choosing a distance d that returns the correct match \({\bar{\textbf{q}}}\) thus requires the distance to be aware of the distribution of the noise. In this work, we study the suitability of Minkowski-type distances in AQS when \({\bar{\textbf{q}}}\) is afflicted, to different extents, by both white and coloured noise. To this end, we employ a simple similarity-search-based scoring algorithm proposed in François et al. (ESANN 2005, 13th European Symposium on Artificial Neural Networks, Bruges, Belgium, April 27–29, 2005, Proceedings, pp 339–344, 2005). Our study reveals an interesting interplay of the following three D's in the quest for an appropriate distance: the Dimensionality and Domain geometry of the data, and the type of noise Distribution. This has led us to explore the problem from a basic geometric perspective. Our main contribution herein is the proposal of a novel index, called the Relative Contained Volume (RCV), that helps explain the performance of the considered distances.
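To make the setup above concrete, the following is a minimal MATLAB sketch (not the authors' code): it draws a database uniformly from \([0,1]^m\), corrupts a randomly chosen point with white Gaussian noise to obtain the query \({\bar{\textbf{q}}}' = {\bar{\textbf{q}}} + {\bar{\eta }}\), and scores each Minkowski exponent p by the fraction of trials in which the nearest neighbour under \(\ell_p\) is the original point. The database size, dimensionality, noise level and this particular scoring loop are illustrative assumptions, not necessarily the exact protocol of François et al. (2005).

```matlab
% Minimal AQS sketch: score ell_p "distances" by how often the nearest
% neighbour of a noise-corrupted query is the original database point.
% All parameters below are illustrative, not taken from the paper.
rng(1);
n      = 1000;            % number of database points
m      = 20;              % dimensionality
sigma  = 0.1;             % std. dev. of the white Gaussian noise
ps     = [0.5 1 2 4];     % Minkowski exponents (p < 1 is "fractional")
trials = 500;

X     = rand(n, m);       % database drawn uniformly from [0,1]^m
score = zeros(size(ps));

for t = 1:trials
    idx     = randi(n);                          % the "true" point q
    q_noisy = X(idx, :) + sigma * randn(1, m);   % q' = q + eta (white noise)
    % coloured noise could instead be drawn as randn(1, m) * chol(C)
    % for a chosen positive-definite covariance matrix C
    for k = 1:numel(ps)
        p = ps(k);
        d = sum(abs(X - q_noisy).^p, 2).^(1/p);  % ell_p distance to all points
        [~, nn]  = min(d);
        score(k) = score(k) + (nn == idx);       % correct match?
    end
end

score = score / trials;   % fraction of correct matches for each p
disp(table(ps(:), score(:), 'VariableNames', {'p', 'score'}));
```

Repeating such a loop for different noise distributions, dimensionalities and domain geometries is, in essence, the experiment the abstract refers to; which exponent p performs best depends on that interplay.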

Data availability

The analysis of all the data generated during this study is included in this article. The MATLAB codes to generate and analyse the data are available from the corresponding author on reasonable request.

Notes

  1. Note that \(\left( \ell _p({\bar{\textbf{x}}}, {\bar{\textbf{y}}}) \right)^p = \Vert {\bar{\textbf{x}}} - {\bar{\textbf{y}}} \Vert _p^p = \sum _{i=1}^m |x_i - y_i|^p\) is, in fact, not a metric, since it does not satisfy the triangle inequality; it does, however, satisfy the other two properties of a metric, hence our choice of terminology (a short counterexample is sketched after these notes).

  2. This is an expanded version of the preliminary findings presented in Singh and Jayaram (2020).

  3. Perhaps this explains the choice of \(\sigma _i = 0.3\) in François et al. (2005) where \({\mathcal {D}} = [0,1]^m\).

  4. Notice that the scale on the y-axis is on the order of \(10^{-4}\) in some of the plots in Figs. 5, 6, 7, and 8.
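For a concrete instance of the triangle-inequality failure mentioned in note 1 (a standard observation, not specific to this paper), take \(m = 1\), \(p = 2\) and the scalars \(x = 0\), \(z = 1\), \(y = 2\):

```latex
% the triangle inequality would require the left-hand quantity to be
% at most the right-hand one, which fails here:
\Vert x - y \Vert_2^2 = 2^2 = 4,
\qquad
\Vert x - z \Vert_2^2 + \Vert z - y \Vert_2^2 = 1^2 + 1^2 = 2,
\qquad
4 \not\le 2 .
```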

References

  • Aeberhard S, Forina M (1991) Wine. UCI Machine Learning Repository. https://doi.org/10.24432/C5PC7J

  • Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional spaces. In: Database theory—ICDT 2001, 8th International Conference, London, UK, January 4–6, 2001, Proceedings, pp 420–434

  • Bator M (2015) Dataset for sensorless drive diagnosis. UCI Machine Learning Repository. https://doi.org/10.24432/C5VP5F

  • Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: Database theory—ICDT ’99, 7th International Conference, Jerusalem, Israel, January 10–12, 1999, Proceedings, pp 217–235

  • Blum A, Hopcroft J, Kannan R (2020) Foundations of data science. Cambridge University Press, Cambridge

  • Bock R (2007) Magic gamma telescope. UCI Machine Learning Repository. https://doi.org/10.24432/C52C8B

  • Durrant RJ, Kabán A (2009) When is ‘nearest neighbour’ meaningful: a converse theorem and implications. J Complex 25(4):385–397

  • François D, Wertz V, Verleysen M (2005) Non-Euclidean metrics for similarity search in noisy datasets. In: ESANN 2005, 13th European symposium on artificial neural networks, Bruges, Belgium, April 27–29, 2005, Proceedings, pp 339–344

  • François D, Wertz V, Verleysen M (2007) The concentration of fractional distances. IEEE Trans Knowl Data Eng 19(7):873–886

  • Hinneburg A, Aggarwal CC, Keim DA (2000) What is the nearest neighbor in high dimensional spaces? In: VLDB 2000, Proceedings of 26th international conference on very large data bases, September 10–14, 2000, Cairo, Egypt, pp 506–515

  • Hu Y, Yu M, Wang H, Ting Z (2015) A similarity-based learning algorithm using distance transformation. IEEE Trans Knowl Data Eng 27(6):1452–1464

  • Jayaram B, Klawonn F (2012) Can unbounded distance measures mitigate the curse of dimensionality? Int J Data Min Model Manag 4(4):361–383

  • Klawonn F, Höppner F, Jayaram B (2012) What are clusters in high dimensions and are they difficult to find? In: Revised selected papers of the first international workshop on clustering high-dimensional data, vol 7627. Springer, Berlin, pp 14–33

  • Kumari S, Jayaram B (2017) Measuring concentration of distances—an effective and efficient empirical index. IEEE Trans Knowl Data Eng 29(2):373–386

  • Pestov V (2000) On the geometry of similarity search: dimensionality curse and concentration of measure. Inf Process Lett 73(1–2):47–51

  • Qiao M, Li J (2016) Distance-based mixture modeling for classification via hypothetical local mapping. Stat Anal Data Min 9(1):43–57

  • Singh A, Jayaram B (2020) Performance of Minkowski-type distances in similarity search—a geometrical approach. In: IEEE 5th International Conference on Computing Communication and Automation (ICCCA), pp 467–47

  • Smith DJ, Vamanamurthy MK (1989) How small is a unit ball? Math Mag 62(2):101–107

  • Wang Z, Bovik AC (2009) Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process Mag 26(1):98–117

  • Weinberger KQ, Sha F, Saul LK (2010) Convex optimizations for distance metric learning and pattern classification [Applications Corner]. IEEE Signal Process Mag 27(3):146–158

  • Wolberg W, Street W, Mangasarian O (1995) Breast cancer Wisconsin (prognostic). UCI Machine Learning Repository. https://doi.org/10.24432/C5GK50

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Arpan Singh.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest. They have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Singh, A., Jayaram, B. Minkowski-type distances in approximate query searches. Comp. Appl. Math. 43, 187 (2024). https://doi.org/10.1007/s40314-024-02704-8

  • DOI: https://doi.org/10.1007/s40314-024-02704-8
