Algorithm 1038: KCC: A MATLAB Package for k-Means-based Consensus Clustering

Authors:
Hao Lin, School of Economics and Management, Beihang University, China (ORCID: 0000-0002-1921-3036)
Hongfu Liu, Michtom School of Computer Science, Brandeis University, USA (ORCID: 0000-0002-4261-8154)
Junjie Wu, School of Economics and Management, Beihang University, China (ORCID: 0000-0001-7650-3657)
Hong Li, School of Economics and Management, Beihang University, China (ORCID: 0009-0002-9702-0253)
Stephan Günnemann, Department of Informatics, Technical University of Munich, Germany (ORCID: 0000-0001-7772-5059)

ACM Transactions on Mathematical Software, Volume 49, Issue 4, Article No. 40, pp. 1–27. https://doi.org/10.1145/3616011

Published: 15 December 2023

Abstract

Consensus clustering is attracting increasing attention for its high quality and robustness. In particular, k-means-based Consensus Clustering (KCC) converts the usually computationally expensive consensus problem into a classic k-means clustering with generalized utility functions, opening up the potential for large-scale clustering on different types of data. Despite KCC’s applicability and generalizability, implementing the method is challenging (for example, representing the binary dataset in the k-means heuristic), and these implementation issues have seldom been discussed in prior work. To fill this gap, we present a MATLAB package, KCC, that fully implements the KCC framework and uses a sparse representation technique to achieve low space complexity. Compared to alternative consensus clustering packages, the KCC package offers high flexibility, efficiency, and effectiveness. Extensive numerical experiments are also included to show its usability on real-world datasets.
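
To make the idea in the abstract concrete, the following is a minimal sketch, written in Python rather than MATLAB, of how a consensus problem can be posed as k-means over a binary dataset without ever building that dataset densely. It is not the KCC package's API. The function name `consensus_kmeans`, the farthest-first seeding, and the use of squared Euclidean distance (only one possible choice among the generalized utility functions) are all assumptions made for illustration. The sketch also assumes that the "sparse representation" amounts to storing, for each object, only the column indices of its nonzero entries, one per basic partition.

```python
import numpy as np

def consensus_kmeans(partitions, k, n_iter=50):
    """Illustrative consensus k-means (hypothetical helper, not the KCC API).

    partitions: (n, r) integer array; column i holds the labels 0..K_i-1
    that basic partition i assigns to each of the n objects.
    """
    partitions = np.asarray(partitions)
    n, r = partitions.shape
    sizes = partitions.max(axis=0) + 1                  # K_i per basic partition
    offsets = np.concatenate(([0], np.cumsum(sizes)[:-1]))
    # Sparse form of the binary dataset: each row keeps only the r column
    # indices where its one-hot blocks are 1, instead of sum(K_i) entries.
    cols = partitions + offsets
    d = int(sizes.sum())

    # Assumed deterministic seeding: farthest-first on label disagreement.
    seeds = [0]
    gap = (partitions != partitions[0]).sum(axis=1)
    for _ in range(1, k):
        nxt = int(gap.argmax())
        seeds.append(nxt)
        gap = np.minimum(gap, (partitions != partitions[nxt]).sum(axis=1))
    disagree = np.stack(
        [(partitions != partitions[s]).sum(axis=1) for s in seeds], axis=1)
    labels = disagree.argmin(axis=1)

    for _ in range(n_iter):
        # Centroid entry = fraction of a consensus cluster's members that
        # fall in each basic cluster.
        counts = np.bincount(labels, minlength=k)
        centroids = np.zeros((k, d))
        np.add.at(centroids, (labels[:, None], cols), 1.0)
        nonempty = counts > 0
        centroids[nonempty] /= counts[nonempty, None]
        # For a binary row x with exactly r ones:
        # ||x - m||^2 = r - 2 * sum_j m[cols_j] + ||m||^2
        cross = centroids[:, cols].sum(axis=2).T        # shape (n, k)
        dist = r - 2.0 * cross + (centroids ** 2).sum(axis=1)[None, :]
        dist[:, ~nonempty] = np.inf                     # never pick empty clusters
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

if __name__ == "__main__":
    # Three basic partitions of six objects that mostly agree on two groups.
    demo = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 1],
                     [1, 0, 1], [1, 0, 1], [1, 0, 2]])
    print(consensus_kmeans(demo, k=2))
```

Under these assumptions the stored data is an n-by-r index array rather than an n-by-sum(K_i) binary matrix, which is the kind of space saving the abstract attributes to the sparse representation.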

Published in
ACM Transactions on Mathematical Software, Volume 49, Issue 4
December 2023
226 pages
ISSN: 0098-3500
EISSN: 1557-7295
DOI: 10.1145/3637452
Editors: Zhaojun Bai (University of California at Davis, USA), Wolfgang Bangerth (Colorado State University, USA)
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 December 2023
- Online AM: 15 August 2023
- Accepted: 7 August 2023
- Revised: 14 June 2023
- Received: 13 May 2021
Badges
- Artifacts Evaluated & Functional / v1.1
- Artifacts Available / v1.1
Author Tags
Consensus clustering
k-means
utility functions
MATLAB
Qualifiers
- research-article