A complete pipeline enables haplotyping and phasing macrohaplotype in long sequencing reads for polyploidy samples and a multi-source DNA mixture
Electrophoresis (IF 2.9), Pub Date: 2024-01-09, DOI: 10.1002/elps.202300143
Xuewen Wang 1 , Melissa Muenzler 1 , Jonathan King 1 , Muyi Liu 1 , Hongmin Li 2 , Bruce Budowle 3, 4 , Jianye Ge 1

Macrohaplotypes combine multiple types of phased DNA variants, increasing forensic discrimination power. High-quality long sequencing reads, for example, PacBio HiFi reads, provide the data needed to detect macrohaplotypes in polyploid samples and DNA mixtures. However, bioinformatics tools for detecting macrohaplotypes are lacking. In this study, we developed a bioinformatics tool, MacroHapCaller, in which targeted loci (i.e., short tandem repeats [STRs], single nucleotide polymorphisms, and insertions and deletions) are genotyped and combined with novel algorithms to call macrohaplotypes from long reads. MacroHapCaller uses physical phasing (i.e., read-backed phasing) to identify macrohaplotypes, and thus it can detect multi-allelic macrohaplotypes for a given sample. MacroHapCaller was validated with data generated from our designed targeted PacBio HiFi sequencing pipeline, which sequenced ∼8-kb amplicon regions harboring 20 core forensic STR loci in human benchmark samples HG002 and HG003. MacroHapCaller was also validated with whole-genome long-read sequencing data. Compared with the known ground truth, MacroHapCaller produced robust and accurate genotypes and phased macrohaplotypes. MacroHapCaller achieved higher or comparable genotyping accuracy and faster speed than the existing tools HipSTR and DeepVar. MacroHapCaller enables efficient macrohaplotype analysis from high-throughput sequencing data and supports applications that rely on highly discriminating macrohaplotypes.
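
To make the idea of read-backed (physical) phasing concrete, the following is a minimal sketch and not MacroHapCaller's actual algorithm, which the abstract does not describe in detail. It assumes each long read has already been genotyped at every targeted locus it spans; the function name `call_macrohaplotypes`, the dictionary-per-read input format, and the `min_reads` and `min_fraction` thresholds are all hypothetical choices made for illustration. Because alleles observed on the same read are physically linked, counting identical allele combinations across reads can yield more than two macrohaplotypes, which is what a polyploid sample or a DNA mixture requires.

```python
from collections import Counter


def call_macrohaplotypes(reads, loci, min_reads=2, min_fraction=0.1):
    """Toy read-backed phasing (illustrative only, not MacroHapCaller).

    reads: list of dicts mapping locus name -> allele observed on that read.
    loci:  ordered list of locus names that make up the macrohaplotype.
    Returns a list of (macrohaplotype, read_count), most supported first.
    """
    counts = Counter()
    for read in reads:
        # Only a read that spans every locus can phase the full set.
        if all(locus in read for locus in loci):
            counts[tuple(read[locus] for locus in loci)] += 1

    total = sum(counts.values())
    if total == 0:
        return []

    # Drop combinations with too little support (hypothetical thresholds),
    # which is a crude way to discard sequencing or PCR errors.
    kept = [
        (hap, n)
        for hap, n in counts.items()
        if n >= min_reads and n / total >= min_fraction
    ]
    # Sort by descending support, then by allele tuple for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    loci = ["STR", "SNP", "INDEL"]  # hypothetical locus names
    reads = (
        [{"STR": "12", "SNP": "A", "INDEL": "ref"}] * 4
        + [{"STR": "14", "SNP": "G", "INDEL": "del"}] * 3
        + [{"STR": "12", "SNP": "G", "INDEL": "ref"}]  # likely error read
        + [{"STR": "12", "SNP": "A"}]  # does not span INDEL, so it is skipped
    )
    for hap, n in call_macrohaplotypes(reads, loci):
        print(hap, n)
```

In this sketch, nothing limits the number of reported macrohaplotypes to two, so the same counting step applies to diploid samples, polyploid samples, and mixtures; only the support thresholds would need tuning.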

Updated: 2024-01-10