

ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
arXiv - CS - Sound. Pub Date: 2024-03-04, DOI: arxiv-2403.01792
Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Speech separation has recently made significant progress thanks to the fine-grained perspective offered by time-domain methods. However, several studies have shown that using the Short-Time Fourier Transform (STFT) for feature extraction can be beneficial under harsher conditions, such as noise or reverberation. We therefore propose ConSep, a magnitude-conditioned time-domain framework designed to inherit these beneficial characteristics. Experiments show that ConSep improves performance in anechoic, noisy, and reverberant settings compared with two well-known methods, SepFormer and Bi-Sep. Furthermore, we visualize the components of ConSep to illustrate its advantages and show that they agree with the findings of our preliminary studies.
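The abstract does not say how the STFT magnitude is injected into the time-domain model. Purely as an illustration (not ConSep's actual architecture), the NumPy sketch below computes Hann-windowed STFT magnitude frames and concatenates them, frame by frame, with the output of a toy time-domain encoder. The function names, the random filterbank standing in for a learned encoder, the `log1p` compression, and concatenation as the conditioning mechanism are all assumptions made for this example.

```python
import numpy as np

def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude of a Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def time_domain_encoder(x, weights, kernel=512, hop=128):
    """Toy stand-in for a learned encoder: framed waveform times a filterbank, then ReLU.

    Hypothetical: uses the same kernel/hop as the STFT so frames line up one-to-one.
    """
    n_frames = 1 + (len(x) - kernel) // hop
    frames = np.stack([x[i * hop : i * hop + kernel] for i in range(n_frames)])
    return np.maximum(frames @ weights, 0.0)  # shape (frames, n_filters)

def condition_on_magnitude(enc, mag):
    """Hypothetical conditioning: append log-compressed magnitudes to each encoder frame."""
    assert enc.shape[0] == mag.shape[0], "encoder and STFT frames must be aligned"
    return np.concatenate([enc, np.log1p(mag)], axis=1)

# Usage: a 1 s, 16 kHz random "mixture" and a random 256-filter encoder.
rng = np.random.default_rng(0)
mixture = rng.standard_normal(16000)
W = rng.standard_normal((512, 256)) * 0.01
feats = condition_on_magnitude(time_domain_encoder(mixture, W), stft_magnitude(mixture))
print(feats.shape)  # (122, 513): 256 encoder channels + 257 magnitude bins
```

Sharing one window and hop keeps the two feature streams aligned trivially. Real time-domain separators usually use much shorter encoder kernels for finer temporal resolution, in which case the magnitude frames would have to be resampled or otherwise aligned before conditioning.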


Updated: 2024-03-06
