Mapping structural variants to rare disease genes using long-read whole genome sequencing and trait-relevant polygenic scores
medRxiv - Genetic and Genomic Medicine. Pub Date: 2024-03-18, DOI: 10.1101/2024.03.15.24304216
C. LeMaster, C. Schwendinger-Schreck, B. Ge, W. Cheung, J. J. Johnston, T. Pastinen, C. Smail

Recent studies have revealed a pervasive landscape of rare structural variants (rSVs) in human genomes. rSVs can have extreme effects on the expression of nearby genes and, in the rare disease setting, have been implicated in patient cases where no diagnostic single nucleotide variant (SNV) was found. To date, approaches for integrating rSVs have focused on targeted analysis of known Mendelian rare disease genes. This strategy is intractable for rare diseases with many causal loci or for patients with complex, multi-phenotype syndromes. We hypothesized that integrating trait-relevant polygenic scores (PGS) would substantially reduce the number of candidate disease genes in which to assess rSV effects. We further implemented a method for ranking PGS genes to define a set of core/key genes in which an rSV has the potential to exert relatively larger effects on disease risk. In a subset of patients enrolled in the Genomic Answers for Kids (GA4K) rare disease program (N=497), we used PacBio HiFi long-read whole genome sequencing (lrWGS) to identify rSVs intersecting genes in trait-relevant PGSs. Illustrating our approach in autism (N=54 cases), we identified 1,827 deletions, 158 duplications, 619 insertions, and 14 inversions overlapping putative core/key PGS genes. Additionally, by integrating genomic constraint annotations from gnomAD, we observed that rare duplications overlapping putative core/key PGS genes were more frequently located in higher-constraint regions than controls (P = 2×10⁻⁴). This difference was not observed in the lowest-ranked gene set (P = 0.18). Overall, our study provides a framework for annotating rSVs from lrWGS data and prioritizing disease-linked genomic regions for downstream functional validation of rSV impacts. To enable reuse by other researchers, we have made SV allele frequencies and gene associations freely available.
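The abstract does not specify how PGS genes are ranked or how rSVs are intersected with them, so the following is only a minimal sketch of one plausible shape, under stated assumptions. PGS variants are assumed to be already mapped to genes (the `variant_to_gene` dictionary stands in for some upstream annotation). Genes are ranked by the summed absolute weight of their variants, and the top fraction of genes is treated as a putative core/key set. SVs are then counted by type when they overlap those genes. The ranking rule, the 10% cutoff, every function name and all toy coordinates are hypothetical and are not the authors' method.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, half-open [start, end)
    end: int


@dataclass(frozen=True)
class SV:
    sv_id: str
    sv_type: str  # "DEL", "DUP", "INS" or "INV"
    chrom: str
    start: int
    end: int


def rank_pgs_genes(variant_weights, variant_to_gene):
    """Hypothetical ranking: sum of |PGS weight| over the variants mapped to each gene."""
    scores = defaultdict(float)
    for variant, weight in variant_weights.items():
        gene = variant_to_gene.get(variant)
        if gene is not None:
            scores[gene] += abs(weight)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda g: (-scores[g], g))


def top_fraction(ranked, fraction=0.1):
    """Assumed cutoff: the top `fraction` of ranked genes is the putative core/key set."""
    n = max(1, int(len(ranked) * fraction))
    return set(ranked[:n])


def overlaps(sv, gene):
    """Half-open overlap test; zero-length insertions are treated as a 1-bp point."""
    sv_end = max(sv.end, sv.start + 1)
    return sv.chrom == gene.chrom and sv.start < gene.end and sv_end > gene.start


def count_overlapping(svs, genes, gene_set):
    """Count SVs by type that overlap at least one gene in `gene_set` (each SV once)."""
    counts = Counter()
    for sv in svs:
        if any(overlaps(sv, genes[g]) for g in gene_set if g in genes):
            counts[sv.sv_type] += 1
    return counts


# Toy inputs, made up for illustration only.
variant_weights = {"rs1": 0.3, "rs2": -0.5, "rs3": 0.1, "rs4": 0.05}
variant_to_gene = {"rs1": "GENEA", "rs2": "GENEA", "rs3": "GENEB", "rs4": "GENEC"}
genes = {
    "GENEA": Interval("chr1", 1_000, 5_000),
    "GENEB": Interval("chr1", 10_000, 12_000),
    "GENEC": Interval("chr2", 0, 3_000),
}
svs = [
    SV("sv1", "DEL", "chr1", 900, 1_100),
    SV("sv2", "DUP", "chr1", 4_000, 9_000),
    SV("sv3", "INS", "chr1", 2_000, 2_000),
    SV("sv4", "INV", "chr1", 6_000, 7_000),
    SV("sv5", "DEL", "chr2", 100, 200),
]

ranked = rank_pgs_genes(variant_weights, variant_to_gene)
core = top_fraction(ranked)
lowest = {ranked[-1]}
print(ranked)                                 # ['GENEA', 'GENEB', 'GENEC']
print(count_overlapping(svs, genes, core))    # Counter({'DEL': 1, 'DUP': 1, 'INS': 1})
print(count_overlapping(svs, genes, lowest))  # Counter({'DEL': 1})
```

Real call sets would be read from a VCF, and a production pipeline would typically use an interval index (or a tool such as bedtools) instead of the quadratic loop used here for readability. Comparing the core/key set against the lowest-ranked set mirrors the contrast described in the abstract, but any statistical test applied to such counts would be a further assumption.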

Updated: 2024-03-19