DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
arXiv - CS - Sound. Pub Date: 2023-12-07, DOI: arxiv-2312.04324
Federico Landini, Mireia Diez, Themos Stafylakis, Lukáš Burget

Until recently, the field of speaker diarization was dominated by cascaded systems. Because of their limitations, mainly in handling overlapped speech and in their cumbersome pipelines, end-to-end models have recently gained great popularity. One of the most successful is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA: better performance on the widely studied Callhome dataset, more accurate estimation of the number of speakers in a conversation, and inference in almost half the time on long recordings. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. We also compare against other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.
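To make the attractor idea concrete, the following is a minimal sketch and not the authors' released implementation. It assumes the usual Perceiver pattern, in which a small, fixed set of latent vectors cross-attends to the sequence of frame embeddings, and the usual EEND-EDA decoding step, in which each frame's speaker activity is the sigmoid of its dot product with an attractor. The sizes T, D and S, the function names, the random inputs, and the omission of learned projections and stacked layers are all illustrative assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, frames):
    """Each latent (query) attends over all frame embeddings (keys and values).

    Simplification: no learned query/key/value projections, one head, one layer.
    """
    scale = math.sqrt(len(frames[0]))
    updated = []
    for q in latents:
        weights = softmax([dot(q, f) / scale for f in frames])
        updated.append([
            sum(w * f[d] for w, f in zip(weights, frames))
            for d in range(len(q))
        ])
    return updated


def speaker_activities(frames, attractors):
    """Per-frame, per-attractor activity: sigmoid of the dot product."""
    return [
        [1.0 / (1.0 + math.exp(-dot(f, a))) for a in attractors]
        for f in frames
    ]


random.seed(0)
T, D, S = 50, 8, 4  # hypothetical: frames, embedding size, number of latents
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
latents = [[random.gauss(0, 1) for _ in range(D)] for _ in range(S)]

attractors = cross_attend(latents, frames)           # S vectors of size D
activities = speaker_activities(frames, attractors)  # T x S values in (0, 1)
```

In this sketch the attention step computes S x T scores, so with a fixed number of latents its cost grows linearly with recording length. Whether this accounts for the reported inference speed-up is not stated in the abstract.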

Updated: 2023-12-08