Abstract

The k-nearest neighbors (kNN) algorithm is a simple yet powerful non-parametric classifier that is robust to noisy data and easy to implement. However, as the literature on kNN methods grows, it becomes increasingly difficult for new researchers and practitioners to navigate the field. This review provides a comprehensive overview of recent developments in the kNN algorithm, covering its strengths and weaknesses, applications, benchmarks, and available software, together with the corresponding publications and a citation analysis. It also discusses the potential of kNN in other data science tasks, such as anomaly detection, dimensionality reduction, and missing value imputation. By offering an in-depth analysis of kNN, the paper aims to help researchers and practitioners make informed decisions and identify the best kNN implementation for a given application.
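To make the basic mechanism concrete, here is a minimal brute-force sketch in plain Python of three kNN uses named above: majority-vote classification, a kNN-distance anomaly score, and kNN missing value imputation. The function names, the choice of Euclidean distance, the tie-breaking rule, and the "distance to the k-th neighbour" anomaly score are illustrative assumptions, not the specific methods or implementations surveyed in the review.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_X: List[Vector], train_y: List[str],
                 query: Vector, k: int = 3) -> str:
    """Predict a label by majority vote over the k nearest training points.

    Neighbours are found by brute force (every training point is compared).
    Counter.most_common keeps first-encountered order on ties, and votes are
    counted nearest-first, so a tie goes to the label with the nearest member.
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def knn_outlier_scores(X: List[Vector], k: int = 2) -> List[float]:
    """Score each point by its distance to its k-th nearest *other* point.

    Larger scores mean the point sits farther from its neighbours, which
    this sketch treats as a sign of being anomalous. Assumes len(X) > k.
    """
    scores = []
    for i, x in enumerate(X):
        dists = sorted(euclidean(x, X[j]) for j in range(len(X)) if j != i)
        scores.append(dists[k - 1])
    return scores


def knn_impute(X: List[List[Optional[float]]], k: int = 2) -> List[List[float]]:
    """Fill None entries with the mean of that column over the k nearest
    complete rows, measuring distance only on the row's observed columns."""
    complete = [row for row in X if None not in row]
    if not complete:
        raise ValueError("need at least one row without missing values")
    filled = []
    for row in X:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [c for c, v in enumerate(row) if v is not None]

        def partial_dist(other: List[float]) -> float:
            # Distance restricted to the columns this row actually has.
            return math.sqrt(sum((row[c] - other[c]) ** 2 for c in observed))

        nearest = sorted(complete, key=partial_dist)[:k]
        filled.append([
            v if v is not None else sum(r[c] for r in nearest) / len(nearest)
            for c, v in enumerate(row)
        ])
    return filled


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    y = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify(X, y, [1.1, 1.0], k=3))  # a
```

Each query in this sketch is compared against every stored point, so its cost grows linearly with the size of the training set; that is why practical kNN libraries rely on spatial indexes or approximate search when datasets become large.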

Acknowledgements

The authors express their gratitude to the anonymous reviewers for their valuable feedback and insightful comments on the earlier version of the manuscript. Their constructive criticism greatly contributed to the improvement of the paper, and we are grateful for their time and effort in reviewing our work.

Author information

Corresponding author

Correspondence to Panos K. Syriopoulos.

Ethics declarations

The authors declare that they have no competing interests, direct or indirect, related to this work. A data availability statement is not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Syriopoulos, P.K., Kalampalikis, N.G., Kotsiantis, S.B. et al. kNN Classification: a review. Ann Math Artif Intell (2023). https://doi.org/10.1007/s10472-023-09882-x

  • DOI: https://doi.org/10.1007/s10472-023-09882-x
