Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
arXiv - CS - Sound. Pub Date: 2024-03-21. DOI: arxiv-2403.14286. Authors: Nikhil Raghav, Md Sahidullah
Clustering speaker embeddings is crucial in speaker diarization but hasn't
received as much focus as other components. Moreover, the robustness of speaker
diarization across various datasets hasn't been explored when the development
and evaluation data are from different domains. To bridge this gap, this study
thoroughly examines spectral clustering for both same-domain and cross-domain
speaker diarization. Our extensive experiments on two widely used corpora, AMI
and DIHARD, reveal the performance trend of speaker diarization in the presence
of domain mismatch. We observe that the performance difference between two
different domain conditions can be attributed to the role of spectral
clustering. In particular, keeping other modules unchanged, we show that
differences in optimal tuning parameters as well as in speaker count estimation
originate from the mismatch. This study opens several future directions for
speaker diarization research.
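
The abstract does not spell out the clustering pipeline, so the following is only a minimal sketch of one common way spectral clustering is applied to speaker embeddings, not the authors' implementation. It assumes a cosine affinity matrix, a row-wise pruning step controlled by a hypothetical tuning parameter `pruning_rate`, an unnormalized graph Laplacian, and an eigengap heuristic for estimating the speaker count. All function and parameter names are illustrative.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Pairwise cosine similarity, clipped at zero (an assumption of this sketch)."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.maximum(x @ x.T, 0.0)


def prune_affinity(affinity, pruning_rate):
    """Zero the smallest `pruning_rate` fraction of each row, then symmetrize."""
    n_drop = int(pruning_rate * affinity.shape[0])
    pruned = affinity.copy()
    if n_drop > 0:
        smallest = np.argsort(affinity, axis=1)[:, :n_drop]
        np.put_along_axis(pruned, smallest, 0.0, axis=1)
    return 0.5 * (pruned + pruned.T)


def estimate_speakers_and_embed(affinity, max_speakers):
    """Eigengap heuristic on the unnormalized Laplacian L = D - A."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    gaps = np.diff(eigvals[: max_speakers + 1])
    k = int(np.argmax(gaps)) + 1  # largest gap follows the k-th eigenvalue
    return k, eigvecs[:, :k]


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dist))])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def spectral_cluster(embeddings, pruning_rate=0.5, max_speakers=8):
    """Return (estimated speaker count, per-segment cluster labels)."""
    affinity = prune_affinity(cosine_affinity(embeddings), pruning_rate)
    k, spectral_embedding = estimate_speakers_and_embed(affinity, max_speakers)
    return k, kmeans(spectral_embedding, k)
```

In a sketch like this, both the estimated speaker count and the final labels depend on `pruning_rate`, which is consistent with the abstract's observation that a parameter tuned on one domain may not transfer to another.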
Updated: 2024-03-22