Comparing feature selection and machine learning approaches for predicting CYP2D6 methylation from genetic variation
Frontiers in Neuroinformatics (IF 3.5), Pub Date: 2024-02-21, DOI: 10.3389/fninf.2023.1244336
Wei Jing Fong, Hong Ming Tan, Rishabh Garg, Ai Ling Teh, Hong Pan, Varsha Gupta, Bernadus Krishna, Zou Hui Chen, Natania Yovela Purwanto, Fabian Yap, Kok Hian Tan, Kok Yen Jerry Chan, Shiao-Yng Chan, Nicole Goh, Nikita Rane, Ethel Siew Ee Tan, Yuheng Jiang, Mei Han, Michael Meaney, Dennis Wang, Jussi Keppo, Geoffrey Chern-Yee Tan

Introduction: Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing, where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and, consequently, prescribing decisions. Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict methylation at epigenetic loci from single nucleotide polymorphisms (SNPs) related to CYP2D6 in children from the GUSTO cohort.

Methods: Buffy coat DNA methylation was quantified using the Illumina Infinium Methylation EPIC beadchip. CpG sites associated with CYP2D6 were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 MB of the CYP2D6 gene, as well as the impact of adding demographic data. The samples were split into training (75%) and test (25%) sets for validation. For the Elastic Net and XGBoost models, hyperparameters were tuned using 10-fold cross-validation. Root Mean Square Error and R-squared values were obtained to assess each model's performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified, several of which appeared to influence multiple CpG sites.

Results: Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites, and a number of top variables were identified for each model.

Discussion: The SNP-based prediction models for CYP2D6 CpG methylation developed in this study in Singaporean children of varying ethnicities have clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.
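
The evaluation scheme described in the Methods (a 75%/25% train/test split scored with Root Mean Square Error and R-squared) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses simulated data, a single hypothetical SNP coded as allele dosage (0, 1 or 2), and a closed-form one-feature linear regression instead of the Elastic Net or XGBoost models and the 10-fold cross-validated hyperparameter search. All function names, the effect size and the noise level are assumptions made for illustration only.

```python
import math
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_ols(rows):
    """Closed-form least squares for y = intercept + slope * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root Mean Square Error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def simulate(n=400, seed=1):
    """Simulated (dosage, methylation) pairs; effect size is made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        dosage = rng.choice([0, 1, 2])
        methylation = 0.40 + 0.10 * dosage + rng.gauss(0.0, 0.05)
        rows.append((dosage, methylation))
    return rows


rows = simulate()
train, test = train_test_split(rows, test_fraction=0.25, seed=0)
intercept, slope = fit_simple_ols(train)

# Score the model only on the held-out 25% of samples.
y_true = [y for _, y in test]
y_pred = [intercept + slope * x for x, _ in test]
print("train/test sizes:", len(train), len(test))
print("test RMSE:", round(rmse(y_true, y_pred), 3))
print("test R-squared:", round(r_squared(y_true, y_pred), 3))
```

Scoring on the held-out set rather than the training set is what lets RMSE and R-squared be compared across feature sets and model types without rewarding overfitting.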

Updated: 2024-02-21