Developing a New Phylogeny-Driven Random Forest Model for Functional Metagenomics
IEEE Transactions on NanoBioscience ( IF 3.9 ) Pub Date : 2023-06-06 , DOI: 10.1109/tnb.2023.3283462
Jyotsna Talreja Wassan 1 , Haiying Wang 2 , Huiru Zheng 2

Metagenomics is an unobtrusive science linking microbial genes to biological functions or environmental states. Classifying microbial genes into their functional repertoire is an important task in the downstream analysis of metagenomic studies. The task relies on supervised machine learning (ML) methods to achieve good classification performance. Random Forest (RF) has been applied rigorously to microbial gene abundance profiles, mapping them to functional phenotypes. The current research aims to tune RF using the evolutionary ancestry captured in microbial phylogeny, developing a Phylogeny-RF model for the functional classification of metagenomes. This method captures the effects of phylogenetic relatedness within the ML classifier itself, rather than just applying a supervised classifier to the raw abundances of microbial genes. The idea is rooted in the fact that phylogenetically closely related microbes are highly correlated and tend to have similar genetic and phenotypic traits. Such microbes behave similarly and hence tend to be selected together, or one of them can be dropped from the analysis, to improve the ML process. The proposed Phylogeny-RF algorithm was compared with state-of-the-art classification methods, including RF and the phylogeny-aware methods MetaPhyl and PhILR, using three real-world 16S rRNA metagenomic datasets. The proposed method not only achieved significantly better performance than the traditional RF model but also outperformed the other phylogeny-driven benchmarks (p < 0.05). For example, Phylogeny-RF attained the highest AUC (0.949) and Kappa (0.891) among the benchmarks on soil microbiomes.
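The abstract does not spell out how Phylogeny-RF uses the tree, so the following is only a minimal sketch of the general idea it mentions: when two taxa are close relatives, one of them can be dropped before training. The function name `prune_related_taxa`, the distance threshold, the per-taxon importance scores, and the toy values are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def prune_related_taxa(
    taxa: Sequence[str],
    distances: Dict[Tuple[str, str], float],
    importance: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Greedily keep taxa, skipping any taxon that is closer than
    `threshold` (in phylogenetic distance) to one already kept.

    Hypothetical helper: this is not the published Phylogeny-RF algorithm.
    """

    def dist(a: str, b: str) -> float:
        # Distances are stored once per unordered pair.
        if (a, b) in distances:
            return distances[(a, b)]
        return distances[(b, a)]

    # Visit taxa from most to least important so that, within a group of
    # close relatives, the highest-scoring representative survives.
    ranked = sorted(taxa, key=lambda t: importance[t], reverse=True)
    kept: List[str] = []
    for taxon in ranked:
        if all(dist(taxon, k) >= threshold for k in kept):
            kept.append(taxon)
    return kept


# Toy example with made-up values: A and B are close relatives, C is distant.
taxa = ["A", "B", "C"]
distances = {("A", "B"): 0.02, ("A", "C"): 0.60, ("B", "C"): 0.58}
importance = {"A": 0.40, "B": 0.35, "C": 0.25}
selected = prune_related_taxa(taxa, distances, importance, threshold=0.10)
print(selected)  # ['A', 'C']
```

In this sketch the surviving taxa would define the columns of the abundance table passed to an ordinary random forest. The importance scores could come, for instance, from a first RF fit on the full table, which is again an assumption rather than something the abstract states.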

Updated: 2023-06-06