Use of SVM-based ensemble feature selection method for gene expression data analysis
Statistical Applications in Genetics and Molecular Biology (IF 0.9), Pub Date: 2022-07-13, DOI: 10.1515/sagmb-2022-0002
Shizhi Zhang, Mingjin Zhang

Gene selection is one of the key steps in gene expression data analysis. This paper proposes an SVM-based ensemble feature selection method. First, the method builds many subsets by Monte Carlo sampling. Second, it ranks all features on each subset and integrates the per-subset rankings into a final ranked list. Finally, it determines the optimal feature set with a backward feature elimination strategy. The method is applied to four public datasets (Leukemia, Prostate, Colorectal, and SMK_CAN), yielding 7, 10, 13, and 32 selected features, respectively. The AUC values obtained on independent test sets are 0.9867, 0.9796, 0.9571, and 0.9575, respectively. These results indicate that the features selected by the proposed method can improve sample classification accuracy, and that the method is therefore effective for gene selection from gene expression data.
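
The three steps named in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The abstract does not state the sampling fraction, the number of subsets, the SVM solver, the rule for integrating rankings, or the elimination criterion, so all of the following are assumptions made for illustration: subsets are drawn over samples without replacement, features are ranked by the absolute weight of a linear SVM (fitted here by a simple Pegasos-style subgradient solver without a bias term, as a stand-in), rankings are merged by summed rank, and backward elimination keeps the smallest ranked prefix with the best hold-out accuracy. The helper names (`train_linear_svm`, `ensemble_ranking`, `backward_elimination`, `make_toy`) and the toy data are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=30, seed=0):
    """Linear SVM without bias, fitted by Pegasos-style subgradient descent.

    Assumption: stand-in solver for illustration; labels must be +1 / -1.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def take(rows, feats):
    """Keep only the columns listed in feats."""
    return [[row[j] for j in feats] for row in rows]


def ensemble_ranking(X, y, n_subsets=20, frac=0.8, seed=0):
    """Rank features on random sample subsets, then merge by summed rank."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    rank_sum = [0] * d
    for k in range(n_subsets):
        idx = rng.sample(range(n), int(frac * n))  # Monte Carlo draw (assumed)
        w = train_linear_svm([X[i] for i in idx], [y[i] for i in idx], seed=k)
        order = sorted(range(d), key=lambda j: -abs(w[j]))  # best first
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return sorted(range(d), key=lambda j: rank_sum[j])  # lowest sum first


def accuracy(w, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = sum(1 for xi, yi in zip(X, y)
               if yi * sum(wj * xj for wj, xj in zip(w, xi)) > 0)
    return hits / len(X)


def backward_elimination(X_tr, y_tr, X_va, y_va, ranking):
    """Drop the lowest-ranked feature one at a time; keep the best prefix."""
    feats = list(ranking)
    best_feats, best_acc = list(feats), -1.0
    while feats:
        w = train_linear_svm(take(X_tr, feats), y_tr)
        acc = accuracy(w, take(X_va, feats), y_va)
        if acc >= best_acc:  # ties favor the smaller feature set
            best_feats, best_acc = list(feats), acc
        feats = feats[:-1]  # remove the currently lowest-ranked feature
    return best_feats, best_acc


def make_toy(n=80, d=8, seed=1):
    """Synthetic data: features 0 and 1 carry the class signal, rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        row[0] += 3.0 * label
        row[1] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy()
X_tr, y_tr, X_va, y_va = X[:60], y[:60], X[60:], y[60:]
ranking = ensemble_ranking(X_tr, y_tr)
selected, best_acc = backward_elimination(X_tr, y_tr, X_va, y_va, ranking)
print("final ranking:", ranking)
print("selected features:", selected, "hold-out accuracy:", best_acc)
```

On the toy data, the two signal-carrying features would be expected to appear at or near the top of the merged ranking. A real analysis would more likely use an established SVM library and score candidate feature sets by AUC on an independent test set, as the abstract reports.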

Updated: 2022-07-13