
Information theory divergences in principal component analysis

  • Short Paper
  • Published:
Pattern Analysis and Applications

Abstract

The metric learning area studies methodologies to find the most appropriate distance function for a given dataset. It has been shown that dimensionality reduction algorithms are closely related to metric learning because, in addition to producing a more compact representation of the data, such methods also implicitly derive a distance function that best represents the similarity between pairs of objects in the collection. Principal Component Analysis is a traditional linear dimensionality reduction algorithm that is still widely used by researchers. However, its procedure faithfully represents outliers in the generated space, which can be an undesirable characteristic in pattern recognition applications. With this in mind, the replacement of the traditional pointwise approach by a contextual one, based on the neighborhoods of the data samples, was proposed. This approach implements a mapping from the usual feature space to a parametric feature space, in which the difference between two samples is defined by the vector whose scalar coordinates are given by the statistical divergence between two probability distributions. For some divergences, it has been demonstrated that the new approach outperforms several existing dimensionality reduction algorithms on a wide range of datasets. However, it is important to investigate the sensitivity of the framework to the choice of divergence. Experiments using the Total Variation, Rényi, Sharma–Mittal and Tsallis divergences are reported in this paper, and the results evidence the robustness of the method.

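To make the contextual idea concrete, the sketch below illustrates (but does not reproduce) the kind of pipeline the abstract describes. It is a minimal sketch under several assumptions that go beyond the text: k-nearest-neighbor neighborhoods, a univariate Gaussian model per feature, the closed-form Rényi divergence of order α between Gaussians, and a PCA-style eigendecomposition of a surrogate scatter matrix built from divergence-valued difference vectors taken against an average reference distribution. The function names (`renyi_divergence_gauss`, `parametric_pca`), the choice of reference distribution and the symmetrization are illustrative, not the authors' formulation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def renyi_divergence_gauss(mu1, var1, mu2, var2, alpha=0.5, eps=1e-10):
    """Renyi divergence of order alpha between N(mu1, var1) and N(mu2, var2).

    Closed form for univariate Gaussians; valid for 0 < alpha < 1 (and for
    alpha > 1 whenever the mixed variance below stays positive).
    """
    var1, var2 = var1 + eps, var2 + eps                      # guard against zero variance
    var_star = alpha * var2 + (1.0 - alpha) * var1           # mixed variance
    mean_term = alpha * (mu1 - mu2) ** 2 / (2.0 * var_star)
    var_term = ((1.0 - alpha) * np.log(var1) + alpha * np.log(var2)
                - np.log(var_star)) / (2.0 * (alpha - 1.0))
    return mean_term + var_term

def parametric_pca(X, k=10, alpha=0.5, n_components=2):
    """Sketch of a contextual, divergence-based PCA in the spirit of the
    approach described in the abstract (illustrative assumptions only)."""
    n, m = X.shape
    # 1) k-NN neighborhoods in the original feature space (each includes the sample itself).
    idx = np.argsort(cdist(X, X), axis=1)[:, :k + 1]
    # 2) Parametric feature space: per-sample, per-feature Gaussian parameters.
    mu = np.array([X[nb].mean(axis=0) for nb in idx])        # shape (n, m)
    var = np.array([X[nb].var(axis=0) for nb in idx])        # shape (n, m)
    # 3) Difference vectors: coordinate-wise divergences against an average
    #    reference distribution, replacing the usual deviations from the mean.
    mu_bar, var_bar = mu.mean(axis=0), var.mean(axis=0)
    D = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            # Symmetrized so the surrogate scatter matrix is well behaved.
            D[i, j] = 0.5 * (renyi_divergence_gauss(mu[i, j], var[i, j],
                                                    mu_bar[j], var_bar[j], alpha)
                             + renyi_divergence_gauss(mu_bar[j], var_bar[j],
                                                      mu[i, j], var[i, j], alpha))
    # 4) Eigendecomposition of the surrogate scatter matrix, exactly as in PCA.
    S = np.cov(D.T)
    eigvals, eigvecs = np.linalg.eigh(S)                     # ascending eigenvalues
    W = eigvecs[:, ::-1][:, :n_components]                   # top components
    return D @ W

# Usage: X is an (n_samples, n_features) array; Y is the 2-D embedding.
# X = np.random.default_rng(0).normal(size=(200, 8))
# Y = parametric_pca(X, k=15, alpha=0.5)
```

The other divergences studied in the paper (Total Variation, Sharma–Mittal, Tsallis) would slot into the same place as `renyi_divergence_gauss`; only the closed-form expression for two univariate Gaussians changes.
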
Availability of data and materials

The datasets analyzed during the current study are available at www.openml.org.

Code availability

Code is available from the corresponding author on reasonable request.


Funding

No funds, grants, or other support were received.

Author information


Corresponding author

Correspondence to Eduardo K. Nakao.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nakao, E.K., Levada, A.L.M. Information theory divergences in principal component analysis. Pattern Anal Applic 27, 19 (2024). https://doi.org/10.1007/s10044-024-01215-w

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10044-024-01215-w
