Algorithm 1038: KCC: A MATLAB Package for k-Means-based Consensus Clustering

Published: 15 December 2023

Abstract

Consensus clustering is attracting increasing attention for its high quality and robustness. In particular, k-means-based Consensus Clustering (KCC) converts the usually computationally expensive consensus clustering problem into a classic k-means clustering problem with generalized utility functions, which opens the way to large-scale clustering of different types of data. Despite KCC’s applicability and generalizability, implementing the method is challenging (for example, representing the binary dataset within the k-means heuristic), and implementation details have seldom been discussed in prior work. To fill this gap, we present a MATLAB package, KCC, that fully implements the KCC framework and uses a sparse representation technique to achieve low space complexity. Compared with alternative consensus clustering packages, the KCC package offers high flexibility, efficiency, and effectiveness. Extensive numerical experiments are included to demonstrate its usability on real-world datasets.
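To make the "binary dataset" and "sparse representation" ideas concrete, here is a minimal Python sketch. It is an illustration only, not the KCC package's MATLAB API: all function names are hypothetical, and it assumes a utility function for which the matching k-means distance is squared Euclidean. Each data point is described by its labels in r basic partitions. The binary dataset places a one-hot block per partition side by side, so every row has exactly r ones. One plausible sparse representation therefore stores only the n x r label indices and computes centroids and distances from them directly.

```python
def offsets_of(sizes):
    """Starting column of each basic partition's block in the binary matrix."""
    out, total = [], 0
    for s in sizes:
        out.append(total)
        total += s
    return out

def binary_rows(labels, sizes):
    """Dense n x sum(sizes) binary matrix; built here only for comparison."""
    offs, width = offsets_of(sizes), sum(sizes)
    rows = []
    for row in labels:
        b = [0] * width
        for i, lab in enumerate(row):
            b[offs[i] + lab] = 1
        rows.append(b)
    return rows

def centroid(labels, members, offs, width):
    """Mean of the members' binary rows, computed from label indices alone."""
    c = [0.0] * width
    for p in members:
        for i, lab in enumerate(labels[p]):
            c[offs[i] + lab] += 1.0
    return [v / len(members) for v in c]

def sq_dist(row, c, c_norm2, offs):
    """||b - c||^2 = r - 2*(b.c) + ||c||^2, because a binary row b has r ones."""
    dot = sum(c[offs[i] + lab] for i, lab in enumerate(row))
    return len(row) - 2.0 * dot + c_norm2

def kcc_sketch(labels, sizes, k, init, max_iter=100):
    """Plain k-means on the implicit binary matrix (assumed squared Euclidean).

    labels: n rows of r cluster indices, one per basic partition.
    sizes:  number of clusters in each basic partition.
    init:   indices of k points whose binary rows seed the centroids.
    """
    offs, width = offsets_of(sizes), sum(sizes)
    cents = [centroid(labels, [p], offs, width) for p in init]
    assign = None
    for _ in range(max_iter):
        norms = [sum(v * v for v in c) for c in cents]
        new_assign = [
            min(range(k), key=lambda j: sq_dist(row, cents[j], norms[j], offs))
            for row in labels
        ]
        if new_assign == assign:
            break  # assignments are stable, so the heuristic has converged
        assign = new_assign
        for j in range(k):
            members = [p for p, a in enumerate(assign) if a == j]
            if members:  # an emptied cluster keeps its previous centroid
                cents[j] = centroid(labels, members, offs, width)
    return assign

# Six points, three basic partitions with 2, 2, and 3 clusters respectively.
labels = [(0, 1, 0), (0, 1, 0), (0, 1, 1), (1, 0, 1), (1, 0, 2), (1, 0, 2)]
sizes = [2, 2, 3]
print(kcc_sketch(labels, sizes, k=2, init=(0, 5)))  # [0, 0, 0, 1, 1, 1]
```

In this sketch the label matrix takes n x r integers of storage, whereas the dense binary matrix would take n times the total number of basic clusters. The dense builder `binary_rows` is included only so the two representations can be checked against each other.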

Published in

ACM Transactions on Mathematical Software, Volume 49, Issue 4
December 2023
226 pages
ISSN: 0098-3500
EISSN: 1557-7295
DOI: 10.1145/3637452
Editors: Zhaojun Bai, Wolfgang Bangerth

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 15 December 2023
• Online AM: 15 August 2023
• Accepted: 7 August 2023
• Revised: 14 June 2023
• Received: 13 May 2021
Qualifiers

• research-article
