Enricherator: A Bayesian Method for Inferring Regularized Genome-wide Enrichments from Sequencing Count Data
Journal of Molecular Biology (IF 5.6), Pub Date: 2024-04-05, DOI: 10.1016/j.jmb.2024.168567
Jeremy W. Schroeder, P. Lydia Freddolino

A pervasive question in biological research studying gene regulation, chromatin structure, or genomics is where, and to what extent, a signal of interest arises genome-wide. This question is addressed using a variety of methods that rely on high-throughput sequencing data as their final output, including ChIP-seq for protein-DNA interactions, GapR-seq for measuring supercoiling, and HBD-seq or DRIP-seq for R-loop positioning. Current computational methods for calculating genome-wide enrichment of the signal of interest usually do not properly handle the count-based nature of sequencing data, often do not make use of the local correlation structure of sequencing data, and do not apply any regularization to enrichment estimates. This can result in unrealistic estimates of the true underlying biological enrichment of interest, unrealistically low estimates of confidence in point estimates of enrichment (or no estimates of confidence at all), unrealistic gyrations in enrichment estimates at very close (<10 bp) genomic loci due to noise inherent in sequencing data, and a multiple-hypothesis testing problem during interpretation of genome-wide enrichment estimates. We developed a tool called Enricherator to infer genome-wide enrichments from sequencing count data. Enricherator uses the variational Bayes algorithm to fit a generalized linear model to sequencing count data and to sample from the approximate posterior distribution of enrichment estimates. Enrichments inferred by Enricherator more precisely identify known binding sites in cases where low coverage between binding sites leads to false-positive peak calls in these noisy regions of the genome; these benefits extend to published datasets.
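
The abstract's point about regularizing count-based enrichment estimates can be illustrated with a toy calculation. The sketch below is not Enricherator's model or code; it is a minimal single-position example under stated assumptions: ChIP and control libraries of equal depth, Poisson counts, and a Gaussian prior on the log-enrichment. The function names and the prior width are hypothetical choices made for illustration only. It compares a naive pseudocount log-ratio with a maximum a posteriori (MAP) estimate that is shrunk toward zero when coverage is low.

```python
import math


def naive_log_enrichment(chip: int, control: int, pseudocount: float = 0.5) -> float:
    """Log ratio of ChIP to control counts, with a pseudocount to avoid log(0)."""
    return math.log((chip + pseudocount) / (control + pseudocount))


def map_log_enrichment(chip: int, control: int, prior_sd: float = 1.0) -> float:
    """MAP log-enrichment b under a N(0, prior_sd^2) prior (illustrative only).

    If control ~ Poisson(lam) and chip ~ Poisson(lam * exp(b)), then conditional
    on the total t = chip + control, chip ~ Binomial(t, sigmoid(b)).
    Equal sequencing depth is an assumption of this sketch.
    """
    total = chip + control

    def gradient(b: float) -> float:
        # Derivative of the log posterior in b; strictly decreasing in b.
        return chip - total / (1.0 + math.exp(-b)) - b / prior_sd**2

    # The gradient is positive at lo and negative at hi, so bisection finds the root.
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gradient(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Same 3:1 ratio at low and high coverage.
    for chip, control in [(3, 1), (300, 100)]:
        print(chip, control,
              round(naive_log_enrichment(chip, control), 3),
              round(map_log_enrichment(chip, control), 3))
```

With the same 3:1 ratio, the low-coverage position is pulled strongly toward zero while the high-coverage position stays close to log(3), which is the qualitative behavior regularization is meant to provide in noisy, low-coverage regions.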

Updated: 2024-04-05