DHFS-ECM: Design of a Dual Heuristic Feature Selection-based Ensemble Classification Model for the Identification of Bamboo Species from Genomic Sequences
Current Genomics (IF 2.6), Pub Date: 2024-02-21, DOI: 10.2174/0113892029268176240125055419
Aditi R. Durge, Deepti D. Shrimankar

Problem: Analyzing genomic sequences plays a crucial role in understanding biological diversity and classifying bamboo species. Existing methods for genomic sequence analysis suffer from limitations such as complexity, low accuracy, and the need for constant reconfiguration as genomic datasets evolve.

Aim: This study addresses these limitations by introducing a novel Dual Heuristic Feature Selection-based Ensemble Classification Model (DHFS-ECM) for the precise identification of bamboo species from genomic sequences.

Methods: The proposed DHFS-ECM method employs a Genetic Algorithm to perform dual heuristic feature selection. The first heuristic maximizes inter-class variance, which leads to the selection of informative N-gram feature sets. The second uses intra-class variance levels to create optimal training and validation sets, ensuring comprehensive coverage of class-specific features. The selected features are then processed through an ensemble classification layer that combines multiple stratification models for species-specific categorization.

Results: Comparative analysis with state-of-the-art methods demonstrates that DHFS-ECM improves accuracy (9.5%), precision (5.9%), recall (8.5%), and AUC performance (4.5%). Importantly, the model maintains its performance even as the number of species classes increases, owing to the continuous learning facilitated by the Dual Heuristic Genetic Algorithm Model.

Conclusion: DHFS-ECM offers several key advantages, including efficient feature extraction, reduced model complexity, enhanced interpretability, and increased robustness and accuracy through the ensemble classification layer. These attributes make DHFS-ECM a promising tool for real-time clinical applications and a valuable contribution to the field of genomic sequence analysis.
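
The first selection heuristic described in the abstract (picking N-gram features that maximize inter-class variance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the N-gram length, the fitness function, or the genetic operators, so everything below is an assumption made for illustration. The function names (`kmer_features`, `inter_class_variance`, `select_features`), the choice of 2-mers, the fitness definition (variance of per-class feature means, summed over selected features), and the simplified evolutionary loop (elitist selection plus swap mutation, no crossover) are all hypothetical.

```python
import random
from collections import Counter
from itertools import product


def kmer_features(seq, k=2):
    """Relative frequency of every DNA k-mer (N-gram) in seq, in fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[m] / windows for m in kmers]


def inter_class_variance(X, y, mask):
    """Assumed fitness: variance of per-class means, summed over selected features."""
    classes = sorted(set(y))
    score = 0.0
    for j, keep in enumerate(mask):
        if not keep:
            continue
        means = [
            sum(x[j] for x, c in zip(X, y) if c == cls) / y.count(cls)
            for cls in classes
        ]
        mu = sum(means) / len(means)
        score += sum((m - mu) ** 2 for m in means) / len(means)
    return score


def select_features(X, y, n_keep=4, pop=20, gens=30, seed=0):
    """Simplified evolutionary search for a mask with exactly n_keep features.

    Assumes n_keep is smaller than the number of features.
    """
    rng = random.Random(seed)
    n = len(X[0])

    def random_mask():
        chosen = set(rng.sample(range(n), n_keep))
        return [i in chosen for i in range(n)]

    def mutate(mask):
        # Swap one selected feature for one unselected feature,
        # so the number of selected features stays at n_keep.
        on = [i for i, b in enumerate(mask) if b]
        off = [i for i, b in enumerate(mask) if not b]
        child = list(mask)
        child[rng.choice(on)] = False
        child[rng.choice(off)] = True
        return child

    def fitness(mask):
        return inter_class_variance(X, y, mask)

    population = [random_mask() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop // 2]  # keep the better half unchanged
        children = [mutate(rng.choice(elite)) for _ in range(pop - len(elite))]
        population = elite + children
    return max(population, key=fitness)


# Toy data (invented for illustration): two AT-rich and two GC-rich sequences.
sequences = ["ATATATAT", "ATTATAAT", "GCGCGCGC", "GCCGCGGC"]
labels = ["a", "a", "b", "b"]
X = [kmer_features(s) for s in sequences]
best_mask = select_features(X, labels)
```

In a fuller pipeline the features flagged in `best_mask` would be passed to an ensemble of classifiers; that stage, and the second heuristic based on intra-class variance, are left out here because the abstract does not describe them in enough detail to sketch.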

Updated: 2024-02-21