ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis, ACM Transactions on Architecture and Code Optimization

ACM Transactions on Architecture and Code Optimization (IF 1.6), Pub Date: 2023-12-28, DOI: 10.1145/3632950
Can Firtina ₁ , Kamlesh Pillai ₂ , Gurpreet S. Kalsi ₂ , Bharathwaj Suresh ₂ , Damla Senol Cali ₃ , Jeremie S. Kim ₁ , Taha Shahroodi ₄ , Meryem Banu Cavlak ₁ , Joël Lindegger ₁ , Mohammed Alser ₁ , Juan Gómez Luna ₁ , Sreenivas Subramoney ₂ , Onur Mutlu ₁

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method, uses these probabilities to compute similarity scores and to optimize the probabilities themselves. Accurate computation of these probabilities is essential for correctly identifying sequence similarities. However, the Baum-Welch algorithm is computationally intensive, and existing solutions are either software-only or hardware-only approaches with fixed pHMM designs. Our analysis of state-of-the-art works reveals an urgent need for a flexible, high-performance, and energy-efficient hardware-software co-design that addresses the major inefficiencies of the Baum-Welch algorithm for pHMMs.
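
To make the scoring-and-optimization loop above concrete, here is a minimal sketch of one Baum-Welch (expectation-maximization) iteration in plain Python. It is an illustration under stated assumptions, not ApHMM's implementation: a small fully connected HMM stands in for a pHMM graph (real pHMMs use a left-to-right topology with match, insert, and delete states), and the toy parameters are made up. The forward pass yields the similarity score P(seq | model); the backward pass plus the re-estimation step update the transition and emission probabilities.

```python
# Minimal sketch: one Baum-Welch (EM) iteration for a small discrete HMM.
# Assumption: a fully connected toy HMM stands in for a pHMM graph; real pHMMs
# use a left-to-right topology with match/insert/delete states, plus scaling
# or log-space arithmetic to avoid underflow on long sequences.
from typing import List, Tuple

Matrix = List[List[float]]


def forward(seq: List[int], start: List[float], trans: Matrix, emit: Matrix) -> Matrix:
    """alpha[t][j] = P(x_0..x_t, state at t = j)."""
    n = len(start)
    alpha = [[start[j] * emit[j][seq[0]] for j in range(n)]]
    for t in range(1, len(seq)):
        prev = alpha[-1]
        alpha.append([sum(prev[i] * trans[i][j] for i in range(n)) * emit[j][seq[t]]
                      for j in range(n)])
    return alpha


def backward(seq: List[int], trans: Matrix, emit: Matrix) -> Matrix:
    """beta[t][i] = P(x_{t+1}..x_{T-1} | state at t = i)."""
    n, T = len(trans), len(seq)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def baum_welch_step(seq: List[int], start: List[float], trans: Matrix,
                    emit: Matrix) -> Tuple[List[float], Matrix, Matrix, float]:
    """Return re-estimated (start, trans, emit) and P(seq | old model)."""
    n, m, T = len(start), len(emit[0]), len(seq)
    alpha, beta = forward(seq, start, trans, emit), backward(seq, trans, emit)
    likelihood = sum(alpha[-1])  # similarity score of seq against the model
    # gamma[t][i]: posterior probability of occupying state i at position t
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]
    # xi[i][j]: expected number of i -> j transitions, summed over positions
    xi = [[sum(alpha[t][i] * trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j]
               for t in range(T - 1)) / likelihood for j in range(n)] for i in range(n)]
    new_start = gamma[0][:]
    new_trans = [[xi[i][j] / sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
                 for i in range(n)]
    new_emit = [[sum(gamma[t][j] for t in range(T) if seq[t] == k) /
                 sum(gamma[t][j] for t in range(T)) for k in range(m)] for j in range(n)]
    return new_start, new_trans, new_emit, likelihood


# Toy usage: 2 hidden states, DNA alphabet A, C, G, T encoded as 0..3.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]
seq = [0, 3, 1, 2, 2, 0]
start1, trans1, emit1, p0 = baum_welch_step(seq, start, trans, emit)
p1 = sum(forward(seq, start1, trans1, emit1)[-1])
print(f"P(seq) before: {p0:.3e}, after one iteration: {p1:.3e}")  # EM: p1 >= p0
```

Because each iteration is an EM step, the likelihood under the re-estimated parameters never decreases; the nested sums over states and positions are also why the algorithm is costly on large pHMM graphs and long sequences.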

We introduce ApHMM, the first flexible acceleration framework designed to significantly reduce both computational and energy overheads associated with the Baum-Welch algorithm for pHMMs. ApHMM employs hardware-software co-design to tackle the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to accommodate various pHMM designs, 2) exploiting predictable data dependency patterns through on-chip memory with memoization techniques, 3) rapidly filtering out unnecessary computations using a hardware-based filter, and 4) minimizing redundant computations.
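
The abstract names memoization and filtering but does not describe how ApHMM realizes them in hardware. As a rough software analogue only, the sketch below (a) memoizes the transition-times-emission products once per alphabet symbol, which is cheap because the alphabet is small, and (b) filters the forward pass so that only the `keep` highest-probability states propagate at each position. The function names `memoize_trans_emit` and `filtered_forward`, the parameter `keep`, and the top-k filter criterion are hypothetical choices for illustration; ApHMM's actual filter may use a different criterion.

```python
# Minimal sketch (assumption): software analogues of two ideas the abstract names,
# memoization and filtering. ApHMM implements these in hardware; the function
# names and the top-k filter criterion below are hypothetical illustrations.
from typing import List

Matrix = List[List[float]]


def memoize_trans_emit(trans: Matrix, emit: Matrix) -> List[Matrix]:
    """table[k][i][j] = trans[i][j] * emit[j][k], precomputed once per symbol k.

    The alphabet is small (4 for DNA, 20 for proteins), so the table stays small
    and each forward step reuses it instead of recomputing the products.
    """
    n, m = len(trans), len(emit[0])
    return [[[trans[i][j] * emit[j][k] for j in range(n)] for i in range(n)]
            for k in range(m)]


def filtered_forward(seq: List[int], start: List[float], trans: Matrix,
                     emit: Matrix, keep: int) -> float:
    """Forward likelihood that propagates only the `keep` best states per position.

    With keep >= number of states this is the exact forward algorithm; with a
    smaller `keep` it skips work and returns a lower bound on P(seq | model).
    """
    n = len(start)
    table = memoize_trans_emit(trans, emit)
    alpha = [start[j] * emit[j][seq[0]] for j in range(n)]
    for symbol in seq[1:]:
        # Filter: keep only the indices of the highest-probability states.
        top = sorted(range(n), key=lambda i: alpha[i], reverse=True)[:keep]
        te = table[symbol]  # memoized trans * emit products for this symbol
        alpha = [sum(alpha[i] * te[i][j] for i in top) for j in range(n)]
    return sum(alpha)


# Toy usage: 2 hidden states, DNA alphabet A, C, G, T encoded as 0..3.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.4, 0.1, 0.1, 0.4], [0.1, 0.4, 0.4, 0.1]]
seq = [0, 3, 1, 2, 2, 0]
exact = filtered_forward(seq, start, trans, emit, keep=2)
approx = filtered_forward(seq, start, trans, emit, keep=1)
print(f"exact: {exact:.3e}, filtered (keep=1): {approx:.3e}")
```

Since pruning only drops non-negative terms, the filtered score is always a lower bound on the exact one; the trade-off is how aggressively `keep` can shrink before the dropped probability mass starts to change which sequences are judged similar.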

ApHMM achieves substantial speedups of 15.55×–260.03×, 1.83×–5.34×, and 27.97× over CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms state-of-the-art CPU implementations of three key bioinformatics applications, 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29×–59.94×, 1.03×–1.75×, and 1.03×–1.95×, respectively, while improving their energy efficiency by 64.24×–115.46×, 1.75×, and 1.96×, respectively.

Updated: 2023-12-29