Pairwise comparative analysis of six haplotype assembly methods based on users’ experience, BMC Genetics


BMC Genetics ( IF 2.9 ) Pub Date : 2023-06-29 , DOI: 10.1186/s12863-023-01134-5
Shuying Sun, Flora Cheng, Daphne Han, Sarah Wei, Alice Zhong, Sherwin Massoudian, Alison B Johnson


A haplotype is a set of DNA variants inherited together from one parent or chromosome. Haplotype information is useful for studying genetic variation and disease association. Haplotype assembly (HA) is the process of obtaining haplotypes from DNA sequencing data. Many HA methods are currently available, each with its own strengths and weaknesses. This study compared six HA methods or algorithms (HapCUT2, MixSIH, PEATH, WhatsHap, SDhaP, and MAtCHap) using two NA12878 datasets named hg19 and hg38. The six algorithms were run on chromosome 10 of both datasets, each at three filtering levels based on sequencing depth (DP1, DP15, and DP30), and their outputs were then compared.

Run time (CPU time) was compared to assess the efficiency of the six HA methods. HapCUT2 was the fastest on all six datasets (two datasets at three depth filtering levels), with run time consistently under 2 min. WhatsHap was also relatively fast, with a run time of 21 min or less on all six datasets. The run time of the other four algorithms varied across datasets and coverage levels. To assess accuracy, pairwise comparisons were conducted for each pair of the six packages by generating their disagreement rates for both haplotype blocks and single nucleotide variants (SNVs). The authors also compared the methods using switch distance (error), i.e., the number of positions where the two chromosomes of a given phase must be switched to match the known haplotype. HapCUT2, PEATH, MixSIH, and MAtCHap generated output files with similar numbers of blocks and SNVs, and they performed relatively similarly. WhatsHap generated a much larger number of SNVs in the hg19 DP1 output, which gave it high disagreement percentages with the other methods. For the hg38 data, however, WhatsHap performed similarly to the other four algorithms, excluding SDhaP. SDhaP showed a much larger disagreement rate when compared with the other algorithms in all six datasets.

This comparative analysis is important because each algorithm is different. The findings of this study provide a deeper understanding of the performance of currently available HA algorithms and useful input for other users.
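
The switch distance described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes a single haplotype block in which every site is heterozygous and biallelic, so each phasing is written as a string of `0`/`1` alleles for the first haplotype (the second haplotype is its complement). The function name `switch_distance` and this string encoding are hypothetical choices made for illustration.

```python
def switch_distance(predicted: str, truth: str) -> int:
    """Count the switches needed to turn `predicted` into `truth` in one block.

    Assumption (not from the paper): each argument is a string over {'0', '1'}
    giving the allele on the first haplotype at each heterozygous site; the
    second haplotype is the complement, so an all-sites flip costs nothing.
    """
    if len(predicted) != len(truth):
        raise ValueError("phasings must cover the same sites")

    # mismatch[i] is True where the predicted first haplotype carries the
    # opposite allele from the truth at site i.
    mismatch = [p != t for p, t in zip(predicted, truth)]

    # A switch is needed wherever the mismatch state changes between two
    # adjacent sites: from that point on, the two haplotypes trade places.
    return sum(1 for a, b in zip(mismatch, mismatch[1:]) if a != b)


if __name__ == "__main__":
    # One switch between the second and third site.
    print(switch_distance("0011", "0000"))
    # The complementary phasing describes the same pair of haplotypes.
    print(switch_distance("1111", "0000"))
```

Counting changes in the mismatch state, rather than raw mismatches, is what distinguishes a switch distance from a Hamming distance: a single misplaced phase boundary counts once, no matter how many downstream sites it flips.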


Updated: 2023-06-29
