The benefit of a complete reference genome for cancer structural variant analysis
medRxiv - Genetic and Genomic Medicine, Pub Date: 2024-03-18, DOI: 10.1101/2024.03.15.24304369
Luis F. Paulin, Jeremy Fan, Kieran O’Neill, Erin Pleasance, Vanessa L. Porter, Steven J.M. Jones, Fritz J. Sedlazeck

The complexities of cancer genomes are becoming easier to interpret thanks to advances in sequencing technologies and improved bioinformatic analysis. Structural variants (SVs) represent an important subset of somatic events in tumors. While long-read sequencing has markedly improved SV detection, somatic variant identification and annotation remain challenging. We hypothesized that use of a completed human reference genome (CHM13-T2T) would improve somatic SV calling. Our findings in a tumour/normal matched benchmark sample and two patient samples show that CHM13-T2T improves SV detection and prioritization accuracy compared to GRCh38, with a notable reduction in false positive calls. We also overcame the lack of annotation resources for CHM13-T2T by lifting over CHM13-T2T-aligned reads to the GRCh38 genome, thereby combining improved alignment with advanced annotations. In this process, we assessed the current SV benchmark set for COLO829/COLO829BL across four replicates sequenced at different centers with different long-read technologies. We discovered instability of this cell line across these replicates; 346 SVs (1.13%) were only discoverable in a single replicate. We identified 49 somatic SVs that appear to be stable, as they are consistently present across the four replicates. We therefore propose this consensus set as an updated benchmark for somatic SV calling and include both GRCh38 and CHM13-T2T coordinates in our benchmark. The benchmark is available at: 10.5281/zenodo.10819636. Our work demonstrates new approaches to optimize somatic SV prioritization in cancer, with potential improvements in other genetic diseases.
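
The idea of a consensus set across replicates can be illustrated with a minimal sketch. This is not the authors' pipeline: the `SV` record, the `same_event` matching rule, and the 500 bp breakpoint tolerance are all assumptions made for illustration, and the calls in the usage note are invented toy data. The sketch treats two calls as the same event when they share chromosome and SV type and their positions lie within the tolerance, then separates calls found in every replicate from calls found in only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a format used by the paper)."""
    chrom: str
    pos: int
    svtype: str  # e.g. "DEL", "INS", "INV"


def same_event(a: SV, b: SV, max_dist: int = 500) -> bool:
    # Assumed matching rule: same chromosome, same type, nearby breakpoint.
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.pos - b.pos) <= max_dist
    )


def consensus(replicates: list[list[SV]], max_dist: int = 500) -> list[SV]:
    """SVs from the first replicate that have a match in every other replicate."""
    first, *rest = replicates
    return [
        sv
        for sv in first
        if all(any(same_event(sv, other, max_dist) for other in rep) for rep in rest)
    ]


def private(replicates: list[list[SV]], max_dist: int = 500) -> list[tuple[int, SV]]:
    """(replicate index, SV) pairs for SVs with no match in any other replicate."""
    found = []
    for i, rep in enumerate(replicates):
        for sv in rep:
            seen_elsewhere = any(
                same_event(sv, other, max_dist)
                for j, other_rep in enumerate(replicates)
                if j != i
                for other in other_rep
            )
            if not seen_elsewhere:
                found.append((i, sv))
    return found
```

With four toy replicates in which one deletion near `chr1:1000` recurs and two other calls each appear once, `consensus` returns the single recurring deletion and `private` returns the two replicate-specific calls. A real analysis would also need to compare SV length and sequence, which this sketch ignores.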

Updated: 2024-03-19