Supervised Clustering of Persian Handwritten Images Using Regularization and Dimension Reduction Methods
ACM Transactions on Knowledge Discovery from Data (IF 3.6), Pub Date: 2024-02-27, DOI: 10.1145/3638060
Sajedeh Moradnia 1 , Mousa Golalizadeh 1

Clustering is a fundamental exploratory technique: it is used to discover patterns and structures in complex datasets and also to group variables in high-dimensional data analysis. Dimension reduction through clustering helps identify important variables and reduce data dimensions without losing significant information. High-dimensional image datasets, such as Persian handwritten images, contain numerous pixels, which makes statistical inference difficult. This high dimensionality poses challenges for analysis and processing and calls for specialized techniques, such as clustering, to extract information. Incorporating response variable information enhances clustering analysis and turns it into a supervised method. This article evaluates a supervised clustering approach using Ridge and Lasso penalties, comparing the two on a real dataset while identifying important variables. We demonstrate that, despite selecting only a small number of variables as important, the Lasso penalty performs relatively well in predicting the labels of new observations for this multi-class dataset.
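
The contrast between the two penalties can be illustrated with a small textbook sketch. It is not the authors' method or data. It assumes an orthonormal design, where the Ridge estimate shrinks every least-squares coefficient by the same factor and the Lasso estimate soft-thresholds each one. The coefficient values, the penalty `lam`, and the function names are hypothetical and chosen only for illustration.

```python
# Textbook illustration (assumption: orthonormal design, X^T X = I).
# Ridge: minimize 0.5*||y - Xb||^2 + 0.5*lam*||b||^2  ->  b_ols / (1 + lam)
# Lasso: minimize 0.5*||y - Xb||^2 + lam*||b||_1      ->  soft-threshold(b_ols, lam)

def ridge_shrink(b_ols, lam):
    """Shrink every coefficient by the same factor; none becomes exactly zero."""
    return [b / (1.0 + lam) for b in b_ols]

def lasso_shrink(b_ols, lam):
    """Soft-threshold: coefficients with |b| <= lam are set exactly to zero."""
    out = []
    for b in b_ols:
        mag = max(abs(b) - lam, 0.0)
        out.append(mag if b >= 0 else -mag)
    return out

def n_selected(coefs):
    """Count the variables kept in the model (nonzero coefficients)."""
    return sum(1 for c in coefs if c != 0.0)

# Hypothetical least-squares coefficients: two strong signals, four weak ones.
b_ols = [3.0, -2.0, 0.4, -0.3, 0.1, 0.05]
lam = 0.5

ridge = ridge_shrink(b_ols, lam)
lasso = lasso_shrink(b_ols, lam)

print("ridge keeps", n_selected(ridge), "variables")
print("lasso keeps", n_selected(lasso), "variables")
```

In this toy setting Ridge retains all variables with reduced magnitude, while Lasso discards the weak ones entirely, which is the sense in which the Lasso penalty identifies a small set of important variables.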




Updated: 2024-02-27