Sparse orthogonal supervised feature selection with global redundancy minimization, label scaling, and robustness
Information Sciences (IF 8.1), Pub Date: 2024-03-12, DOI: 10.1016/j.ins.2024.120454
Huming Liao, Hongmei Chen, Yong Mi, Chuan Luo, Shi-Jinn Horng, Tianrui Li

Selecting discriminative features to build effective learning models is an important research problem in machine learning. In practical applications, data distributions vary widely, and this uncertainty makes it challenging to build learning models that are robust and generalize well. Because one-hot encoding is well suited to representing independent labels, regression-based feature selection (FS) methods usually encode their label matrix with one-hot encoding. However, one-hot encoding does not adapt well to different data distributions. To address these problems, this paper proposes a sparse orthogonal supervised FS model with global redundancy minimization, label scaling, and robustness (GRMLSRSOFS). The model uses a label scaling technique, proposed in this paper, to better adapt to different data distributions. An iterative optimization method is given, and its convergence is demonstrated both theoretically and experimentally. Experimental results on 12 public datasets show that: 1) in most cases, GRMLSRSOFS achieves higher classification accuracy with fewer features than several state-of-the-art FS methods; for example, it achieves 100% classification accuracy using only 20 features on the warpPIE10P dataset and improves on the other methods by nearly 6% on the Yale dataset; and 2) GRMLSRSOFS converges faster after label scaling.
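For readers unfamiliar with the regression-based FS setting the abstract refers to, the following is a minimal sketch of a generic baseline, not the authors' GRMLSRSOFS. Labels are one-hot encoded into a matrix Y, a projection W is fitted by minimizing ||XW - Y||_F^2 + λ||W||_{2,1} with iteratively reweighted least squares, and features are ranked by the row norms of W. It deliberately omits the orthogonality constraint, global redundancy minimization, label scaling, and robust loss described in the paper, whose exact forms are not given in the abstract. The helper names (`one_hot`, `solve`, `l21_feature_scores`, `rank_features`), the smoothing constant `eps`, and the toy data are hypothetical, and the sketch assumes the columns of X are centred so no bias term is fitted.

```python
"""Generic l2,1-regularised least-squares feature ranking (hypothetical sketch, NOT GRMLSRSOFS)."""
import math


def one_hot(labels, num_classes):
    """Encode integer class labels as a one-hot label matrix Y (n x c)."""
    return [[1.0 if j == y else 0.0 for j in range(num_classes)] for y in labels]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve A Z = B for Z by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [A | B], built from copies so the inputs are not modified.
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def l21_feature_scores(x, y, lam=1.0, iters=30, eps=1e-8):
    """Minimise ||XW - Y||_F^2 + lam * ||W||_{2,1} by iterative reweighting.

    Each step solves (X^T X + lam * D) W = X^T Y with
    D = diag(1 / (2 * sqrt(||w_i||^2 + eps))), then recomputes D from the new W.
    eps (an assumption) keeps D finite when a row of W is exactly zero.
    Returns the row norms ||w_i||, used as feature importance scores.
    """
    xt = transpose(x)
    xtx = matmul(xt, x)
    xty = matmul(xt, y)
    d_feat = len(xtx)
    d = [1.0] * d_feat  # start from D = I
    w = None
    for _ in range(iters):
        a = [[xtx[i][j] + (lam * d[i] if i == j else 0.0) for j in range(d_feat)]
             for i in range(d_feat)]
        w = solve(a, xty)
        d = [1.0 / (2.0 * math.sqrt(sum(v * v for v in row) + eps)) for row in w]
    return [math.sqrt(sum(v * v for v in row)) for row in w]


def rank_features(scores):
    """Feature indices sorted by descending importance score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical toy data: 6 samples, 3 centred features; only feature 0 separates the classes.
X = [[-1, 1, 0], [-1, -1, 1], [-1, 0, -1], [1, 1, -1], [1, -1, 0], [1, 0, 1]]
labels = [0, 0, 0, 1, 1, 1]
Y = one_hot(labels, 2)
scores = l21_feature_scores(X, Y, lam=1.0)
print(rank_features(scores))  # prints [0, 1, 2]
```

In this toy example the two noise features are uncorrelated with the labels, so their rows of W stay at zero and feature 0 ranks first. The 0/1 entries of `Y` are the part of this baseline that the paper's label scaling appears to target; the specific scaling used by GRMLSRSOFS is not reproduced here.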

Updated: 2024-03-12