A reassessment of Hardy-Weinberg equilibrium filtering in large sample Genomic studies.
medRxiv - Genetic and Genomic Medicine Pub Date : 2024-03-19 , DOI: 10.1101/2024.02.07.24301951
Phil J Greer , Anastazie Sedlakova , Mitchell Ellison , Talia DeFrancesco Oranburg , Martin Maiers , David C Whitcomb , Ben Busby

Hardy-Weinberg equilibrium (HWE) is a fundamental principle of population genetics. Adherence to HWE, assessed with a p-value filter, is used as a quality control (QC) measure to remove potential genotyping errors prior to certain analyses. Larger sample sizes increase the power to detect smaller effect sizes, but they also affect methods of quality control. Here, we test the effects of current methods of HWE QC filtering on varying sample sizes up to 486,178 subjects for imputed and whole exome sequencing (WES) genotypes using data from the UK Biobank, and propose potential alternative filtering methods.
METHODS: Simulations were performed on imputed genotype data using chromosome 1. A WES genome-wide association study (GWAS) was performed using PLINK2.
RESULTS: Our simulations on the imputed data from chromosome 1 show a progressive increase in the number of SNPs eliminated from analysis as sample sizes increase. With the HWE p-value filter held constant at p<1e-15, the proportion of SNPs removed increases from 1.66% at n=10,000 to 18.86% at n=486,178 in a multi-ancestry cohort, and from 0.002% at n=10,000 to 0.334% at n=300,000 in a European ancestry cohort. Greater reductions are shown in the WES analysis, with an 11.91% reduction in analyzed SNPs in a European ancestry cohort (n=362,192) and a 32.70% reduction in SNPs in a multi-ancestry dataset (n=463,605). Using a sample-size-specific HWE p-value cutoff removes ~2.25% of SNPs in the all-ancestry cohort across all sample sizes, but does not currently scale beyond 300,000 samples. A hard cutoff of +/- 20% deviation from HWE produces the most consistent results and scales across all sample sizes, but requires additional user steps.
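The abstract does not spell out how the +/- 20% deviation from HWE is computed. As one possible reading, the sketch below measures the relative difference between the observed heterozygote count and the count expected under HWE, and keeps a variant when that difference is within 20%. The definition, the function names, and the `max_dev` parameter are assumptions made for illustration, not the authors' implementation.

```python
def hwe_relative_deviation(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Relative deviation of observed heterozygotes from the HWE expectation.

    Assumed definition (not taken from the paper):
        (observed_het - expected_het) / expected_het
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("genotype counts must not all be zero")
    # Frequency of allele A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    expected_het = 2 * p * (1 - p) * n
    if expected_het == 0:
        # Monomorphic site: no heterozygotes expected, treat as no deviation.
        return 0.0
    return (n_ab - expected_het) / expected_het


def passes_hard_cutoff(n_aa: int, n_ab: int, n_bb: int, max_dev: float = 0.20) -> bool:
    """Keep the variant if its deviation is within +/- max_dev (hypothetical rule)."""
    return abs(hwe_relative_deviation(n_aa, n_ab, n_bb)) <= max_dev


if __name__ == "__main__":
    # 2% heterozygote excess: kept regardless of how many samples there are.
    print(passes_hard_cutoff(2450, 5100, 2450))
    # 60% heterozygote deficit: removed.
    print(passes_hard_cutoff(400, 200, 400))
```

Because this measure depends on genotype proportions rather than on a test statistic, it does not tighten as the sample size grows, which is consistent with the abstract's observation that a hard cutoff scales across all sample sizes.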
CONCLUSION: Testing for deviation from HWE may still be an important quality control step in GWAS; however, we demonstrate here that an HWE p-value threshold that is acceptable for smaller sample sizes is inappropriate for large-sample studies because it removes an unnecessarily high number of variants prior to analysis. Rather than excluding variants that fail HWE prior to analysis, it may be better to include all variants in the analysis and examine their deviation from HWE afterward. We believe that adjusting the cutoffs will be even more important for large whole genome sequencing results and for studies of more diverse populations.
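To illustrate why a fixed p-value threshold removes more variants as the sample size grows, the following sketch applies a 1-degree-of-freedom chi-square goodness-of-fit test for HWE to the same genotype proportions at two sample sizes. The chi-square approximation is a simplification chosen so the example needs only the standard library; it is not claimed to be the test PLINK2 or the authors used, and the 2% heterozygote excess is a made-up illustrative value.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """p-value of a 1-df chi-square goodness-of-fit test against HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def counts_for(n: int, het_frac: float) -> tuple:
    """Genotype counts for n samples with a given heterozygote fraction,
    splitting the remaining samples evenly between the two homozygotes."""
    n_ab = round(het_frac * n)
    rest = n - n_ab
    n_aa = rest // 2
    return n_aa, n_ab, rest - n_aa


if __name__ == "__main__":
    threshold = 1e-15  # the fixed filter discussed in the abstract
    for n in (10_000, 486_178):
        pval = hwe_chi2_pvalue(*counts_for(n, het_frac=0.51))
        print(n, pval, "removed" if pval < threshold else "kept")
```

Under these assumptions the variant has p ≈ 0.0455 at n=10,000 and is kept, while the identical 2% deviation at n=486,178 yields a p-value far below 1e-15 and is removed, even though the underlying departure from HWE has not changed.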

Updated: 2024-03-20