Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
arXiv - CS - Sound, Pub Date: 2024-03-21, DOI: arxiv-2403.14286
Nikhil Raghav, Md Sahidullah

Clustering speaker embeddings is a crucial step in speaker diarization, but it has received less attention than other components. Moreover, the robustness of speaker diarization across datasets has not been explored for the case where the development and evaluation data come from different domains. To bridge this gap, this study thoroughly examines spectral clustering for both same-domain and cross-domain speaker diarization. Our extensive experiments on two widely used corpora, AMI and DIHARD, reveal the performance trend of speaker diarization in the presence of domain mismatch. We observe that the performance difference between the two domain conditions can be attributed to the role of spectral clustering. In particular, keeping the other modules unchanged, we show that the differences in optimal tuning parameters and in speaker count estimation originate from the mismatch. This study opens several future directions for speaker diarization research.
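
For readers unfamiliar with the technique, the following is a minimal sketch of spectral clustering applied to speaker embeddings, not the authors' pipeline. It assumes a cosine affinity matrix, row-wise pruning controlled by a hypothetical `keep_fraction` parameter (standing in for the kind of tuning parameter the abstract mentions), an unnormalized graph Laplacian, and an eigengap heuristic for estimating the speaker count. All function and parameter names are illustrative assumptions.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Pairwise cosine similarity, with negative values clipped to zero."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, None)


def prune_affinity(affinity, keep_fraction):
    """Keep the largest entries of each row, zero the rest, then symmetrize.

    keep_fraction is a hypothetical tuning parameter (an assumption here).
    """
    n = affinity.shape[0]
    keep = max(1, int(round(keep_fraction * n)))
    pruned = np.zeros_like(affinity)
    for i in range(n):
        top = np.argsort(affinity[i])[-keep:]
        pruned[i, top] = affinity[i, top]
    return 0.5 * (pruned + pruned.T)


def spectral_decomposition(affinity):
    """Eigenvalues (ascending) and eigenvectors of the Laplacian L = D - A."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    return np.linalg.eigh(laplacian)


def estimate_num_speakers(eigvals, max_speakers):
    """Eigengap heuristic: pick k at the largest gap between consecutive eigenvalues."""
    limit = min(max_speakers, len(eigvals) - 1)
    if limit < 1:
        return 1
    gaps = np.diff(eigvals[: limit + 1])
    return int(np.argmax(gaps)) + 1


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min(
            [np.sum((points - c) ** 2, axis=1) for c in centroids], axis=0
        )
        centroids.append(points[np.argmax(dist)])
    centroids = np.array(centroids)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        sq_dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_embeddings(embeddings, keep_fraction=0.5, max_speakers=10):
    """Return (estimated speaker count, cluster label per embedding)."""
    affinity = prune_affinity(cosine_affinity(embeddings), keep_fraction)
    eigvals, eigvecs = spectral_decomposition(affinity)
    k = estimate_num_speakers(eigvals, max_speakers)
    # Cluster the rows of the k eigenvectors with the smallest eigenvalues.
    labels = kmeans(eigvecs[:, :k], k)
    return k, labels
```

In this sketch both the pruning level and the eigengap estimate depend on the distribution of affinity values, which gives some intuition for why a parameter tuned on one domain may not transfer to another, as the abstract reports.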

Updated: 2024-03-22