Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data
Journal of Big Data (IF 8.1), Pub Date: 2024-04-03, DOI: 10.1186/s40537-024-00902-z
Amr A. Abd El-Mageed, Ahmed E. Elkhouli, Amr A. Abohany, Mona Gafar

Abstract

RNA sequencing (RNA-Seq) is widely regarded as a revolutionary technique for gene profiling and quantification. It offers a comprehensive view of the transcriptome, giving it broader coverage than microarrays. Genes that discriminate malignant from normal tissue can be identified from quantitative gene expression. However, these data form a high-dimensional dense matrix in which each sample has more than 20,000 genes, which makes them challenging to handle. This paper proposes RBNRO-DE (Relief Binary NRO based on Differential Evolution), a gene selection method applied to (rnaseqv2 illuminahiseq rnaseqv2 un edu Level 3 RSEM genes normalized) data with more than 20,000 genes, to pick the most informative genes, and evaluates it on 22 cancer datasets. The k-nearest neighbor (k-NN) and support vector machine (SVM) classifiers are used to assess the quality of the selected genes. The proposed RBNRO-DE algorithm is compared with binary versions of the most common meta-heuristic algorithms. On most of the 22 cancer datasets, RBNRO-DE with k-NN and SVM classifiers achieved optimal convergence and classification accuracy of up to 100%, together with a feature size reduction of up to 98%; according to Wilcoxon's rank-sum test (5% significance level), this is a clear advantage over its counterparts.
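
The abstract does not state the objective that RBNRO-DE optimizes. As a rough illustration of how a wrapper-style gene selector can score a candidate gene subset with k-NN, the sketch below uses a weighted sum of classification error and the fraction of genes kept, a formulation often used for binary meta-heuristic feature selection. The function names, the weight `alpha`, the choice of `k`, leave-one-out evaluation, and the toy data are all assumptions made for illustration; this is not the authors' algorithm.

```python
import math
from collections import Counter


def knn_loo_error(X, y, mask, k=3):
    """Leave-one-out error of k-NN using only the columns where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty gene subset cannot classify anything
    errors = 0
    for i, xi in enumerate(X):
        # Euclidean distance from sample i to every other sample
        dists = sorted(
            (math.sqrt(sum((xi[j] - xo[j]) ** 2 for j in cols)), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        # Majority vote among the k nearest neighbours
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] != y[i]:
            errors += 1
    return errors / len(X)


def fitness(mask, X, y, k=3, alpha=0.99):
    """Lower is better: weighted sum of error rate and fraction of genes kept.

    alpha is an assumed trade-off weight, not a value taken from the paper.
    """
    error = knn_loo_error(X, y, mask, k)
    kept = sum(mask) / len(mask)
    return alpha * error + (1 - alpha) * kept


# Hypothetical toy data: gene 0 separates the classes, genes 1-3 are noise
X = [
    [0.1, 5.0, 2.0, 7.0],
    [0.2, 1.0, 9.0, 3.0],
    [0.0, 8.0, 4.0, 1.0],
    [0.3, 2.0, 6.0, 9.0],
    [5.1, 4.0, 3.0, 2.0],
    [5.3, 9.0, 1.0, 8.0],
    [4.9, 3.0, 7.0, 4.0],
    [5.2, 6.0, 5.0, 6.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

only_gene_0 = fitness([1, 0, 0, 0], X, y)
all_genes = fitness([1, 1, 1, 1], X, y)
print(only_gene_0 < all_genes)
```

In a scheme like this, a binary optimizer would search over `mask` vectors to minimize `fitness`, so a subset that keeps accuracy while dropping genes is preferred over the full gene set.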




Updated: 2024-04-04