Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation
arXiv - CS - Sound, Pub Date: 2023-12-06, DOI: arxiv-2312.03312
Wonjun Lee, Gary Geunbae Lee, Yunsu Kim

This research optimizes two-pass cross-lingual transfer learning for low-resource languages by improving both the phoneme recognition model and the phoneme-to-grapheme translation model. Optimizing these two stages improves speech recognition across languages. We improve phoneme vocabulary coverage by merging phonemes that share articulatory characteristics, which increases recognition accuracy. Additionally, we introduce a global phoneme noise generator that produces realistic ASR noise during phoneme-to-grapheme training, reducing error propagation between the two passes. Experiments on the CommonVoice 12.0 dataset show significant reductions in Word Error Rate (WER) for low-resource languages, highlighting the effectiveness of our approach. This research contributes to the advancement of two-pass ASR systems for low-resource languages and offers the potential for improved cross-lingual transfer learning.
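The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the specifics are assumptions. The articulatory feature table, the "map each dropped phoneme to the kept phoneme with the most shared features" rule, the substitution-only confusion estimate, and the sub/del/ins noise model are all hypothetical illustrations. The names `merge_phonemes`, `estimate_confusion`, and `add_phoneme_noise` are likewise hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical articulatory feature sets (illustrative only, not from the paper).
FEATURES = {
    "p": {"bilabial", "plosive", "voiceless"},
    "b": {"bilabial", "plosive", "voiced"},
    "t": {"alveolar", "plosive", "voiceless"},
    "d": {"alveolar", "plosive", "voiced"},
    "s": {"alveolar", "fricative", "voiceless"},
    "ɗ": {"alveolar", "implosive", "voiced"},  # rare phoneme to be merged
}


def merge_phonemes(features, keep):
    """Map every phoneme onto a kept phoneme (assumed rule: max shared features).

    Phonemes in `keep` map to themselves; ties go to the first phoneme in `keep`,
    because max() returns the first maximal element.
    """
    mapping = {}
    for ph, feats in features.items():
        if ph in keep:
            mapping[ph] = ph
        else:
            mapping[ph] = max(keep, key=lambda k: len(feats & features[k]))
    return mapping


def estimate_confusion(aligned_pairs):
    """Pool (reference, hypothesis) phoneme pairs into one 'global' substitution
    distribution. Assumes the pairs are already aligned; deletions and insertions
    are ignored here for simplicity."""
    counts = defaultdict(Counter)
    for ref, hyp in aligned_pairs:
        if ref != hyp:
            counts[ref][hyp] += 1
    return {
        ref: {hyp: c / sum(cnt.values()) for hyp, c in cnt.items()}
        for ref, cnt in counts.items()
    }


def add_phoneme_noise(seq, confusion, inventory,
                      p_sub=0.1, p_del=0.05, p_ins=0.05, seed=None):
    """Corrupt a clean phoneme sequence so a phoneme-to-grapheme model trains on
    ASR-like input (hypothetical noise model: substitution, deletion, insertion)."""
    rng = random.Random(seed)
    noisy = []
    for ph in seq:
        r = rng.random()
        if r < p_del:
            continue  # deletion: drop this phoneme
        if r < p_del + p_sub and confusion.get(ph):
            options = list(confusion[ph])
            weights = [confusion[ph][o] for o in options]
            noisy.append(rng.choices(options, weights=weights, k=1)[0])  # substitution
        else:
            noisy.append(ph)  # keep the phoneme unchanged
        if rng.random() < p_ins:
            noisy.append(rng.choice(inventory))  # insertion of a random phoneme
    return noisy


if __name__ == "__main__":
    mapping = merge_phonemes(FEATURES, keep=["p", "b", "t", "d", "s"])
    confusion = estimate_confusion([("t", "d"), ("t", "d"), ("t", "s"), ("p", "b")])
    clean = [mapping[ph] for ph in ["t", "ɗ", "p", "s"]]
    print(mapping)
    print(add_phoneme_noise(clean, confusion, inventory=["p", "b", "t", "d", "s"], seed=0))
```

In this sketch, "ɗ" shares two features ("alveolar", "voiced") with "d" and at most one with every other kept phoneme, so it merges into "d". Drawing substitutions from a single confusion distribution pooled across languages is one plausible reading of "global". The paper may estimate or apply its noise differently.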

Updated: 2023-12-07