Perplexity: evaluating transcript abundance estimation in the absence of ground truth
Algorithms for Molecular Biology (IF 1), Pub Date: 2022-03-25, DOI: 10.1186/s13015-022-00214-y
Jason Fan, Skylar Chan, Rob Patro

There has been rapid development of probabilistic models and inference methods for transcript abundance estimation from RNA-seq data. These models aim to accurately estimate transcript-level abundances, to account for different biases in the measurement process, and even to assess uncertainty in the resulting estimates that can be propagated to subsequent analyses. The assumed accuracy of the estimates inferred by such methods underpins gene-expression-based analysis routinely carried out in the lab. Although hyperparameter selection is known to affect the distributions of inferred abundances (e.g. producing smooth versus sparse estimates), strategies for performing model selection in experimental data have been addressed informally at best. We derive perplexity for evaluating abundance estimates on fragment sets directly. We adapt perplexity from the analogous metric used to evaluate language and topic models and extend the metric to carefully account for corner cases unique to RNA-seq. In experimental data, estimates with the best perplexity also best correlate with qPCR measurements. In simulated data, perplexity is well behaved and concordant with genome-wide measurements against ground truth and differential expression analysis. Furthermore, we demonstrate theoretically and experimentally that perplexity can be computed for arbitrary transcript abundance estimation models. Alongside the derivation and implementation of perplexity for transcript abundance estimation, our study is the first to make model selection for transcript abundance estimation possible on experimental data in the absence of ground truth.
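For readers unfamiliar with the metric, the following is a minimal sketch of perplexity as it is generally defined for language and topic models: the exponential of the negative mean log-likelihood of held-out observations. It is not the authors' implementation. The function name `perplexity` and the input `fragment_likelihoods` (one model-assigned likelihood per held-out fragment) are assumptions made for illustration, and the RNA-seq-specific corner cases mentioned in the abstract are not handled here; a fragment with zero likelihood simply yields infinite perplexity.

```python
import math
from typing import Sequence


def perplexity(fragment_likelihoods: Sequence[float]) -> float:
    """Generic perplexity: exp(-(1/N) * sum_i log p_i).

    `fragment_likelihoods` is a hypothetical input holding the likelihood
    a fitted model assigns to each held-out fragment. Lower is better.
    """
    if not fragment_likelihoods:
        raise ValueError("need at least one held-out fragment")
    # Assumption: no smoothing. A single impossible fragment (p <= 0)
    # makes the perplexity infinite. The paper treats such corner cases
    # explicitly; this sketch does not reproduce that treatment.
    if any(p <= 0.0 for p in fragment_likelihoods):
        return math.inf
    mean_log_likelihood = sum(math.log(p) for p in fragment_likelihoods) / len(
        fragment_likelihoods
    )
    return math.exp(-mean_log_likelihood)


# Usage: compare two hypothetical models on the same held-out fragments.
model_a = [0.20, 0.10, 0.25, 0.15]
model_b = [0.05, 0.10, 0.08, 0.02]
preferred = "A" if perplexity(model_a) < perplexity(model_b) else "B"
```

Under this definition the model that assigns higher likelihood to the held-out fragments receives the lower perplexity, which is why the metric can rank estimates without ground-truth abundances.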

Updated: 2022-03-25