A New Separation Index and Classification Techniques Based on Shannon Entropy
Methodology and Computing in Applied Probability (IF 0.9), Pub Date: 2023-09-22, DOI: 10.1007/s11009-023-10055-w
Jorge Navarro, Francesco Buono, Jorge M. Arevalillo

The purpose of this paper is to use Shannon entropy measures to develop classification techniques and an index that estimates the separation of the groups in a finite mixture model. These measures can be applied to machine learning techniques such as discriminant analysis, cluster analysis and exploratory data analysis. If the number of groups is known and training samples from each group are available (supervised learning), the index is used to measure the separation of the groups, and entropy measures are used to classify new individuals into one of these groups. If the number of groups is uncertain (unsupervised learning), the index can be used to determine the optimal number of groups from an entropy (information/uncertainty) criterion. It can also be used to determine the best variables for separating the groups. In all cases we assume absolutely continuous random variables and use the Shannon entropy based on the probability density function. Theoretical, parametric and non-parametric techniques are proposed to approximate these entropy measures in practice. An application to gene selection in a colon cancer discrimination study with a large number of variables is also provided.
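The abstract does not state the formula of the index, so the following is only an illustrative sketch of how an entropy-based separation measure for a finite mixture can be computed, not the authors' definition. It assumes univariate Gaussian groups with known parameters and uses the standard identity that the mixture entropy minus the weighted average of the group entropies equals the mutual information between an observation and its group label, which is bounded above by the discrete entropy of the mixing weights. The function names `gaussian_entropy`, `mixture_pdf` and `separation_sketch` are hypothetical.

```python
import numpy as np


def gaussian_entropy(sigma):
    """Differential Shannon entropy (in nats) of N(mu, sigma^2)."""
    return 0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)


def mixture_pdf(x, weights, means, sigmas):
    """Density of a univariate Gaussian mixture evaluated at each point of x."""
    x = np.asarray(x, dtype=float)[:, None]
    comp = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (sigmas * np.sqrt(2.0 * np.pi))
    return comp @ weights


def separation_sketch(weights, means, sigmas, n=200_000, seed=0):
    """Illustrative normalized separation in [0, 1] (an assumption, not the paper's index).

    Computes (H(mixture) - sum_i p_i H(f_i)) / H(p), where H(mixture) is
    estimated by Monte Carlo and H(p) is the entropy of the mixing weights.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    rng = np.random.default_rng(seed)

    # Sample from the mixture: draw a group label, then a point from that group.
    labels = rng.choice(len(weights), size=n, p=weights)
    x = rng.normal(means[labels], sigmas[labels])

    # Monte Carlo estimate of the mixture's differential entropy.
    h_mix = -np.mean(np.log(mixture_pdf(x, weights, means, sigmas)))
    # Weighted average of the within-group entropies (closed form for Gaussians).
    h_within = np.sum(weights * gaussian_entropy(sigmas))
    # Discrete entropy of the group labels, the upper bound of the difference.
    h_labels = -np.sum(weights * np.log(weights))
    return (h_mix - h_within) / h_labels


if __name__ == "__main__":
    for gap in (0.0, 1.0, 4.0, 20.0):
        value = separation_sketch([0.5, 0.5], [0.0, gap], [1.0, 1.0])
        print(f"gap={gap:5.1f}  separation={value:.3f}")
```

Under these assumptions the value is close to 0 when the groups coincide and approaches 1 as they stop overlapping. With real data the densities are unknown, so the entropies would have to be approximated by parametric or non-parametric estimates, as the abstract indicates.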




Updated: 2023-09-23