CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition
arXiv - CS - Sound Pub Date : 2024-01-04 , DOI: arxiv-2401.02046
Junfeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. As model sizes grow and applications broaden, selectively executing model components for different inputs to improve inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method that leverages the CTC blank output from intermediate layers to trigger the skipping of the last few encoder layers for frames with high blank probabilities. Furthermore, we factorize the CTC output distribution and perform knowledge distillation on intermediate layers to reduce computation and improve recognition accuracy. Experimental results show that by utilizing the CTC blank, the encoder layer depth can be adjusted dynamically, yielding a 29% acceleration of CTC model inference with minor performance degradation.
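The core idea of blank-triggered layer skipping can be illustrated with a minimal sketch. The code below is a hypothetical NumPy toy, not the paper's implementation: random matrices stand in for trained encoder layers and CTC projections, and the dimensions, threshold, and layer split are all illustrative. Frames whose intermediate CTC blank probability exceeds a threshold bypass the last encoder layers, while the remaining frames are processed in full.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- illustrative only, not from the paper.
T, D, V = 8, 16, 5          # frames, feature dim, vocab size (index 0 = blank)
BLANK, THRESHOLD = 0, 0.9   # blank token index and skip threshold

# Stand-ins for trained weights: small random linear maps.
early_layers = [rng.standard_normal((D, D)) * 0.1 for _ in range(4)]
late_layers = [rng.standard_normal((D, D)) * 0.1 for _ in range(2)]
inter_ctc_head = rng.standard_normal((D, V)) * 0.1   # intermediate CTC projection
final_ctc_head = rng.standard_normal((D, V)) * 0.1   # final CTC projection

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(frames):
    h = frames
    for w in early_layers:          # early layers run on every frame
        h = np.tanh(h @ w)
    # Intermediate CTC output: frames the model is already confident are
    # blank skip the remaining encoder layers.
    inter_probs = softmax(h @ inter_ctc_head)
    skip = inter_probs[:, BLANK] > THRESHOLD
    keep = ~skip
    h_late = h.copy()
    for w in late_layers:           # late layers run only on kept frames
        h_late[keep] = np.tanh(h_late[keep] @ w)
    return softmax(h_late @ final_ctc_head), skip

frames = rng.standard_normal((T, D))
probs, skip = forward(frames)
print(f"skipped {skip.sum()}/{T} frames at the late layers")
```

With trained weights, speech frames are dominated by blanks (silence and intra-token frames), so a large fraction of frames can skip the late layers, which is where the reported 29% inference speedup comes from.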

Updated: 2024-01-06