Robust multi-read reconstruction from noisy clusters using deep neural network for DNA storage
Computational and Structural Biotechnology Journal (IF 6), Pub Date: 2024-03-01, DOI: 10.1016/j.csbj.2024.02.019
Yun Qin, Fei Zhu, Bo Xi, Lifu Song

DNA holds immense potential as an emerging data storage medium. However, information recovery in DNA storage systems is hampered by errors that are inevitably introduced during synthesis, amplification, sequencing, and storage, including insertion, deletion, and substitution (IDS) errors, strand breaks, and rearrangements. Sequence reconstruction, a crucial step in decoding, infers the original DNA reference from a cluster of erroneous copies. Most methods treat every read in a cluster as an equally weighted noisy copy of the same reference, and so overlook contaminated sequences caused by DNA breaks, rearrangements, or mis-clustered reads. To address this issue, we propose RobuSeqNet, a multi-read reconstruction neural network designed to tolerate noisy clusters containing strand breakage, rearrangements, and mis-clustered strands. Using an attention mechanism and a purpose-built network design, RobuSeqNet is resilient to highly noisy clusters and handles in-strand IDS errors effectively. The method's effectiveness and robustness are validated on three representative next-generation sequencing datasets. RobuSeqNet maintains sequence reconstruction success rates of 99.74%, 99.58%, and 96.44% across the three datasets even when clusters contain up to 20% contaminated sequences, outperforming known sequence reconstruction models. In scenarios without contaminated sequences, it performs comparably to existing models, achieving success rates of 99.88%, 99.82%, and 97.68% on the same three datasets.
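To make the equal-contribution assumption concrete, the following is a minimal Python sketch of a naive per-position majority vote over a cluster. It is not RobuSeqNet and not any method from the paper: the function name `majority_vote_consensus`, the toy sequences, and the restriction to equal-length reads (substitution errors only, no insertions or deletions) are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote_consensus(reads):
    """Naive baseline: every read gets one equal vote at every position.

    Assumes all reads have the same length, i.e. substitution errors only.
    Real clusters also contain insertions and deletions, which this
    illustrative sketch does not handle.
    """
    if not reads:
        raise ValueError("cluster is empty")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("this sketch assumes equal-length reads (no indels)")

    consensus = []
    for column in zip(*reads):  # one column per sequence position
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical toy data, not taken from the paper's datasets.
reference = "ACGTACGTAC"

clean_cluster = [
    "ACGTACGTAC",
    "ACGAACGTAC",  # substitution at position 3
    "ACGTACGTAC",
    "ACGTACCTAC",  # substitution at position 6
    "TCGTACGTAC",  # substitution at position 0
]

noisy_cluster = [
    "ACGTACGTAC",
    "ACGTTCGTAC",  # substitution at position 4
    "GGGTTTTTTT",  # mis-clustered read from an unrelated reference
]

clean_result = majority_vote_consensus(clean_cluster)
noisy_result = majority_vote_consensus(noisy_cluster)

print(clean_result == reference)  # True
print(noisy_result == reference)  # False
```

In the second cluster the mis-clustered read votes with the same weight as the true copies, so at position 4 it sides with a substitution error and the vote returns the wrong base. This is the failure mode the abstract attributes to methods that weight all reads equally; how RobuSeqNet's attention mechanism down-weights such reads is not described in the abstract and is not modeled here.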

Updated: 2024-03-01