Using LSTM neural networks for cross-lingual phonetic speech segmentation with an iterative correction procedure
Computational Intelligence (IF 2.8), Pub Date: 2023-09-19, DOI: 10.1111/coin.12602
Zdeněk Hanzlíček, Jindřich Matoušek, Jakub Vít

This article describes experiments on speech segmentation using long short-term memory (LSTM) recurrent neural networks. The main part of the paper deals with multi-lingual and cross-lingual segmentation, that is, segmentation performed on a language different from the one on which the model was trained. The experimental data comprise large Czech, English, German, and Russian speech corpora intended for speech synthesis. For optimal multi-lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering the phones of the particular languages. Many experiments were performed to explore various experimental conditions and data combinations. We proposed a simple procedure that iteratively adapts the inaccurate default model to the new voice/language. Segmentation accuracy was evaluated by comparison with a reference segmentation created by a well-tuned hidden Markov model-based framework with additional manual corrections. The resulting segmentation was also employed in a unit-selection text-to-speech system, and the quality of the generated speech was compared with that obtained using the reference segmentation in a preference listening test.
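The following minimal PyTorch sketch illustrates the two core ideas of the abstract: frame-level phone classification with a bidirectional LSTM, and an iterative correction loop that adapts a default model to a new voice/language by re-segmenting the new data with the current model and fine-tuning on that segmentation. It is not the authors' implementation; the architecture, feature dimensions, the self-training style of the correction loop, and all hyper-parameters are illustrative assumptions.

```python
# Sketch only: frame-level phone tagging with a BiLSTM plus an iterative
# adaptation loop. All names, sizes, and training settings are assumptions.

import torch
import torch.nn as nn


class FramePhoneTagger(nn.Module):
    """Bidirectional LSTM mapping acoustic frames to phone-class logits."""

    def __init__(self, n_features: int = 40, n_phones: int = 60, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_phones)

    def forward(self, x):              # x: (batch, frames, n_features)
        h, _ = self.lstm(x)
        return self.out(h)             # (batch, frames, n_phones)


def segment(model, feats):
    """Frame-wise phone decisions; boundaries are frames where the label changes."""
    with torch.no_grad():
        return model(feats).argmax(dim=-1)      # (batch, frames)


def iterative_adaptation(model, new_feats, n_iterations=3, lr=1e-4):
    """Adapt an (initially inaccurate) default model to a new voice/language:
    re-segment the new data with the current model, fine-tune on that
    segmentation, and repeat (a simple self-training interpretation)."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for it in range(n_iterations):
        pseudo_labels = segment(model, new_feats)   # current best segmentation
        model.train()
        for _ in range(5):                          # a few fine-tuning passes
            logits = model(new_feats)
            loss = loss_fn(logits.transpose(1, 2), pseudo_labels)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
        model.eval()
        print(f"iteration {it}: loss {loss.item():.4f}")
    return model


if __name__ == "__main__":
    torch.manual_seed(0)
    model = FramePhoneTagger()
    new_voice = torch.randn(2, 300, 40)   # 2 utterances, 300 frames, 40-dim features
    iterative_adaptation(model, new_voice)
```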
