Leveraging Laryngograph Data for Robust Voicing Detection in Speech



arXiv - CS - Sound, Pub Date: 2023-12-05, DOI: arxiv-2312.03129
Yixuan Zhang, Heming Wang, DeLiang Wang

Accurately detecting voiced intervals in speech signals is a critical step in pitch tracking and has numerous applications. Conventional signal processing methods and deep learning algorithms have both been proposed for this task, but their utility in real-world applications is restricted by limited generalization and by the need to fine-tune threshold parameters for each dataset. To address these challenges, this study proposes a supervised voicing detection model that leverages recorded laryngograph data. The model is based on a densely-connected convolutional recurrent neural network (DC-CRN) and is trained on data with reference voicing decisions extracted from laryngograph datasets. Pretraining is also investigated to improve the generalization ability of the model. The proposed model produces robust voicing detection results, outperforms other strong baseline methods, and generalizes well to unseen datasets. The source code of the proposed model with pretraining is provided, along with the list of laryngograph datasets used, to facilitate further research in this area.
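To make the threshold-tuning problem mentioned in the abstract concrete, here is a minimal sketch of a conventional frame-level voicing detector based on short-time energy and zero-crossing rate. It is not the paper's DC-CRN model or any of its baselines. The function names, frame sizes, and threshold values are all assumptions chosen for illustration.

```python
import math


def frame_voicing(signal, frame_len=400, hop=160,
                  energy_threshold=0.01, zcr_threshold=0.25):
    """Label each frame as voiced (True) or unvoiced (False).

    Hypothetical baseline: a frame counts as voiced when its mean energy
    is above energy_threshold and its zero-crossing rate is below
    zcr_threshold. Both thresholds are illustrative assumptions.
    """
    decisions = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)
        decisions.append(energy > energy_threshold and zcr < zcr_threshold)
    return decisions


def demo_signal(sample_rate=16000):
    """Synthetic test input: 0.1 s of silence, then 0.1 s of a 120 Hz tone."""
    n = sample_rate // 10
    silence = [0.0] * n
    tone = [0.5 * math.sin(2 * math.pi * 120 * t / sample_rate)
            for t in range(n)]
    return silence + tone


if __name__ == "__main__":
    sig = demo_signal()
    print(frame_voicing(sig))
    # The same signal with a stricter energy threshold yields no voiced frames.
    print(frame_voicing(sig, energy_threshold=0.2))
```

The second call shows the sensitivity the abstract refers to: changing only the energy threshold flips every decision for the same input, which is why such detectors typically need retuning when recording level or dataset changes.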


Updated: 2023-12-07
