Omics feature selection with the extended SIS R package: identification of a body mass index epigenetic multi-marker in the Strong Heart Study
American Journal of Epidemiology (IF 5), Pub Date: 2024-02-20, DOI: 10.1093/aje/kwae006
Arce Domingo-Relloso, Yang Feng, Zulema Rodriguez-Hernandez, Karin Haack, Shelley A Cole, Ana Navas-Acien, Maria Tellez-Plaza, Jose D Bermudez

The statistical analysis of omics data poses a great computational challenge given its ultra-high dimensional nature and frequent between-feature correlation. In this work, we extended the Iterative Sure Independence Screening (ISIS) algorithm by pairing ISIS with elastic-net (Enet) and two versions of adaptive Enet (AEnet and MSAEnet) to efficiently improve feature selection and effect estimation in omics research. We subsequently used genome-wide human blood DNA methylation data from American Indians of the Strong Heart Study (N=2,235 participants), measured in 1989-1991, to compare the performance (predictive accuracy, coefficient estimation and computational efficiency) of SIS-paired regularization methods with that of Bayesian shrinkage and traditional linear regression in identifying an epigenomic multi-marker of body mass index (BMI). ISIS-AEnet outperformed the other methods in prediction. In biological pathway enrichment analysis of genes annotated to BMI-related differentially methylated positions, ISIS-AEnet captured most of the enriched pathways shared by at least two of the evaluated methods. ISIS-AEnet can favor biological discovery because it identifies the most robust biological pathways while achieving an optimal balance between bias and efficient feature selection. In the extended SIS R package, we also implemented ISIS paired with Cox and logistic regression for time-to-event and binary endpoints, respectively, as well as bootstrap confidence intervals for the estimated regression coefficients.
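
The general idea of pairing screening with elastic-net can be illustrated with a minimal sketch. This is not the SIS R package and not the authors' implementation: it is a single, non-iterative screening pass written in Python with NumPy, and it assumes a continuous outcome, marginal-correlation ranking, the n/log(n) screening size commonly used in the SIS literature, and a plain (non-adaptive) elastic-net fitted by coordinate descent. The function names (`sis_screen`, `elastic_net`, `sis_enet`), the penalty values and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def sis_screen(X, y, d):
    """Keep the d features with the largest absolute marginal correlation with y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    score = np.abs(Xs.T @ ys) / len(y)
    return np.sort(np.argsort(score)[::-1][:d])

def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Assumes standardized columns in X and a centered y."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                       # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]        # remove feature j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            r -= X[:, j] * b[j]        # add the updated feature j back
    return b

def sis_enet(X, y, d=None, lam=0.1, alpha=0.5):
    """One screening pass followed by elastic-net on the retained features."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))         # assumed screening size, not taken from the paper
    keep = sis_screen(X, y, min(d, p))
    Xk = X[:, keep]
    Xk = (Xk - Xk.mean(axis=0)) / Xk.std(axis=0)
    b = elastic_net(Xk, y - y.mean(), lam, alpha)
    return keep[b != 0], b[b != 0]

# Simulated example (hypothetical data): 3 informative features out of 2000.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2000))
y = 3 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2] + rng.standard_normal(200)
selected, coefs = sis_enet(X, y)
print("selected feature indices:", selected)
```

Screening first reduces the 2,000 candidate features to a few dozen, so the penalized regression runs on a small problem. The iterative variant (ISIS) described in the abstract repeats screening on residuals to recover features missed by marginal ranking, and the adaptive variants reweight the penalty; neither is implemented in this sketch.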

Updated: 2024-02-20