Identification of influential rare variants in aggregate testing using random forest importance measures
Annals of Human Genetics (IF 1.9), Pub Date: 2023-05-23, DOI: 10.1111/ahg.12509
Rachel Z Blumhagen 1, 2 , David A Schwartz 3 , Carl D Langefeld 4, 5, 6 , Tasha E Fingerlin 1, 2, 3

Aggregate tests of rare variants are often used to identify associated regions, as an alternative to testing each variant individually. When an aggregate test is significant, it is of interest to identify which rare variants are “driving” the association. We recently developed the rare variant influential filtering tool (RIFT) to identify influential rare variants and showed that RIFT had higher true positive rates than other published methods. Here we use importance measures from the standard random forest (RF) and the variable importance weighted RF (vi-RF) to identify influential variants. For very rare variants (minor allele frequency [MAF] < 0.001), the vi-RF:Accuracy method had the highest median true positive rate (TPR = 0.24; interquartile range [IQR]: 0.13, 0.42), followed by the RF:Accuracy method (TPR = 0.16; IQR: 0.07, 0.33), and both were superior to RIFT (TPR = 0.05; IQR: 0.02, 0.15). Among uncommon variants (0.001 < MAF < 0.03), the RF methods had higher true positive rates than RIFT, with comparable false positive rates. Finally, we applied the RF methods to a targeted resequencing study in idiopathic pulmonary fibrosis (IPF), in which the vi-RF approach identified eight and seven variants in TERT and FAM13A, respectively. In summary, the vi-RF provides an improved, objective approach to identifying influential variants following a significant aggregate test. We have expanded our previously developed R package RIFT to include the random forest methods.
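
The abstract reports median true positive rates with interquartile ranges, stratified by MAF bin. The sketch below shows one way such summaries could be computed from simulation replicates. It is an illustration under stated assumptions, not the authors' code and not the RIFT package API: the function names, the toy variant IDs, the handling of a MAF exactly equal to 0.001, and the quartile convention are all hypothetical choices.

```python
from statistics import median, quantiles

VERY_RARE_MAX = 0.001  # "very rare" bound quoted in the abstract
UNCOMMON_MAX = 0.03    # "uncommon" upper bound quoted in the abstract


def maf_bin(maf):
    """Assign a variant to a MAF bin.

    Assumption: a MAF of exactly 0.001 is placed in "uncommon";
    the abstract does not say how that boundary is handled.
    """
    if maf < VERY_RARE_MAX:
        return "very_rare"
    if maf < UNCOMMON_MAX:
        return "uncommon"
    return "other"


def rates(flagged, causal, all_variants):
    """Return (TPR, FPR) for one replicate.

    A rate is None when its denominator is empty.
    """
    flagged, causal, all_variants = set(flagged), set(causal), set(all_variants)
    non_causal = all_variants - causal
    tpr = len(flagged & causal) / len(causal) if causal else None
    fpr = len(flagged & non_causal) / len(non_causal) if non_causal else None
    return tpr, fpr


def summarize(values):
    """Median and (Q1, Q3) of per-replicate rates.

    Assumption: quartiles use the "inclusive" method; the source
    does not state which quartile convention was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Toy replicate with hypothetical variant IDs and MAFs.
mafs = {"v1": 0.0004, "v2": 0.0008, "v3": 0.002, "v4": 0.01, "v5": 0.0002}
causal = {"v1", "v3", "v5"}    # variants simulated as influential
flagged = {"v1", "v3", "v4"}   # variants a method marked as influential

very_rare = {v for v, m in mafs.items() if maf_bin(m) == "very_rare"}
tpr, fpr = rates(flagged & very_rare, causal & very_rare, very_rare)
```

Repeating `rates` over many replicates and passing the resulting TPR values to `summarize` would give a median and IQR in the same form as the figures quoted above.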

Updated: 2023-05-23