ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning
arXiv - CS - Sound. Pub Date: 2024-03-04, DOI: arxiv-2403.01792
Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

Speech separation has recently made significant progress thanks to the fine-grained view of the signal that time-domain methods provide. However, several studies have shown that using the Short-Time Fourier Transform (STFT) for feature extraction can be beneficial under harsher conditions, such as noise or reverberation. We therefore propose ConSep, a magnitude-conditioned time-domain framework that inherits these beneficial characteristics. Experiments show that ConSep improves performance in anechoic, noisy, and reverberant settings compared with two well-known methods, SepFormer and Bi-Sep. Furthermore, we visualize the components of ConSep to illustrate its advantages and to show that they are consistent with the observations from our preliminary studies.

Updated: 2024-03-06