Abstract
Consensus clustering is gaining increasing attention for its high quality and robustness. In particular, k-means-based Consensus Clustering (KCC) converts the usual computationally expensive problem into a classic k-means clustering with generalized utility functions. This makes it promising for large-scale clustering on different types of data. Despite KCC’s applicability and generalizability, implementing the method is challenging and has seldom been discussed in prior work; one example is how to represent the binary dataset in the k-means heuristic. To fill this gap, we present a MATLAB package, KCC, that fully implements the KCC framework and uses a sparse representation technique to achieve low space complexity. Compared with alternative consensus clustering packages, the KCC package offers high flexibility, efficiency, and effectiveness. Extensive numerical experiments on real-world datasets demonstrate its usability.
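The abstract combines two ideas. First, consensus clustering is recast as k-means over a binary matrix built from the basic partitions. Second, that matrix is stored sparsely. The Python sketch below is not the package's MATLAB code; it only illustrates both ideas under stated assumptions. It assumes complete basic partitions with labels 0..K_i-1, so each object has exactly one 1 per partition. It also uses plain squared Euclidean distance, which to our understanding corresponds to the category utility function in the KCC framework. The other utility functions that KCC supports are not reproduced. The function names (`sparse_binary`, `sq_dist`, `dense`, `kcc_kmeans`) and the farthest-first initialization are hypothetical, chosen for illustration and determinism.

```python
from typing import List, Sequence, Tuple


def sparse_binary(partitions: Sequence[Sequence[int]]) -> Tuple[List[List[int]], int]:
    """Encode r basic partitions (labels 0..K_i-1, assumed) as a sparse 0/1 matrix.

    Row k keeps only the r column indices holding a 1 (one per basic
    partition) instead of a dense row of length sum(K_i).
    """
    offsets, dim = [], 0
    for labels in partitions:
        offsets.append(dim)          # first column of this partition's block
        dim += max(labels) + 1       # K_i columns for partition i
    n = len(partitions[0])
    rows = [[offsets[i] + p[k] for i, p in enumerate(partitions)] for k in range(n)]
    return rows, dim


def sq_dist(row: List[int], c: List[float], c_sq_norm: float) -> float:
    # ||x - c||^2 for a binary x with ones at `row`: |row| - 2*sum_j c[j] + ||c||^2
    return len(row) - 2.0 * sum(c[j] for j in row) + c_sq_norm


def dense(row: List[int], dim: int) -> List[float]:
    # Expand one sparse binary row into a dense centroid vector.
    c = [0.0] * dim
    for j in row:
        c[j] = 1.0
    return c


def kcc_kmeans(rows: List[List[int]], dim: int, k: int, max_iter: int = 100) -> List[int]:
    """k-means on the sparse binary matrix with squared Euclidean distance (assumed utility)."""
    # Hypothetical deterministic farthest-first initialization.
    centroids = [dense(rows[0], dim)]
    while len(centroids) < k:
        norms = [sum(v * v for v in c) for c in centroids]
        far = max(range(len(rows)),
                  key=lambda i: min(sq_dist(rows[i], c, s) for c, s in zip(centroids, norms)))
        centroids.append(dense(rows[far], dim))

    labels: List[int] = []
    for _ in range(max_iter):
        norms = [sum(v * v for v in c) for c in centroids]
        new = [min(range(k), key=lambda t: sq_dist(row, centroids[t], norms[t]))
               for row in rows]
        if new == labels:
            break                    # assignments stable: converged
        labels = new
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for row, t in zip(rows, labels):
            counts[t] += 1
            for j in row:
                sums[t][j] += 1.0
        for t in range(k):
            if counts[t]:            # keep the old centroid if a cluster empties
                centroids[t] = [s / counts[t] for s in sums[t]]
    return labels


parts = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]]
rows, dim = sparse_binary(parts)
print(kcc_kmeans(rows, dim, k=2))    # [0, 0, 0, 1, 1, 1]
```

Each binary row has exactly r ones. Its distance to a centroid therefore needs only r lookups plus the centroid's precomputed squared norm, so the dense n × ΣK_i matrix is never materialized. Only the k centroids are kept dense.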
Algorithm 1038: KCC: A MATLAB Package for k-Means-based Consensus Clustering