Diffusion-Based Speech Enhancement in Matched and Mismatched Conditions Using a Heun-Based Sampler
arXiv - CS - Sound. Pub Date: 2023-12-05, DOI: arxiv-2312.02683
Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May
Diffusion models are a new class of generative models that have recently been
successfully applied to speech enhancement. Previous works have demonstrated
their superior performance in mismatched conditions compared to state-of-the-art
discriminative models. However, this was investigated with a single
database for training and another one for testing, which makes the results
highly dependent on the particular databases. Moreover, recent developments
from the image generation literature remain largely unexplored for speech
enhancement. These include several design aspects of diffusion models, such as
the noise schedule or the reverse sampler. In this work, we systematically
assess the generalization performance of a diffusion-based speech enhancement
model by using multiple speech, noise and binaural room impulse response (BRIR)
databases to simulate mismatched acoustic conditions. We also experiment with a
noise schedule and a sampler that have not been applied to speech enhancement
before. We show that the proposed system substantially benefits from using
multiple databases for training, and achieves superior performance compared to
state-of-the-art discriminative models in both matched and mismatched
conditions. We also show that a Heun-based sampler achieves superior
performance at a smaller computational cost compared to a sampler commonly used
for speech enhancement.
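The abstract names two design choices borrowed from image generation, the noise schedule and the reverse sampler, but gives no formulas. As an illustration only, the sketch below assumes the rho-parameterized noise schedule and the deterministic second-order Heun sampler described by Karras et al. (2022), with a plain first-order Euler sampler as the baseline. It is not the paper's implementation, and the Euler baseline is not necessarily the sampler the paper compares against. The names `karras_sigmas`, `toy_denoiser`, `slope`, `sample` and `compare`, the constant `SIGMA_DATA`, and the defaults (sigma_min = 0.002, sigma_max = 80, rho = 7, taken from the image setting) are hypothetical placeholders. The trained network is replaced by the closed-form optimal denoiser for Gaussian toy data, so the exact result is known and each sampler's error at a fixed number of denoiser evaluations can be measured.

```python
import math

# Hypothetical toy setup, not the paper's model: clean "signal" x0 ~ N(0, SIGMA_DATA^2).
SIGMA_DATA = 0.5


def karras_sigmas(n_levels, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels sigma_0 > ... > sigma_{n-1}, followed by a final 0.

    Interpolates linearly in sigma**(1/rho) space, so steps get finer at low noise.
    Defaults are image-domain placeholders, not values from the paper.
    """
    hi, lo = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    sigmas = [(hi + i / (n_levels - 1) * (lo - hi)) ** rho for i in range(n_levels)]
    return sigmas + [0.0]


def toy_denoiser(x, sigma):
    # Optimal denoiser E[x0 | x] for x = x0 + sigma * noise with Gaussian x0.
    # A real enhancement model would be a network that also sees the noisy recording.
    return SIGMA_DATA ** 2 / (SIGMA_DATA ** 2 + sigma ** 2) * x


def slope(x, sigma):
    # Probability-flow ODE with sigma(t) = t:  dx/dsigma = (x - D(x, sigma)) / sigma
    return (x - toy_denoiser(x, sigma)) / sigma


def sample(x, sigmas, method="heun"):
    """Integrate from sigmas[0] down to 0. Returns (estimate, denoiser evaluations)."""
    n_evals = 0
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = slope(x, s_cur)
        n_evals += 1
        x_next = x + (s_next - s_cur) * d_cur  # Euler (first-order) step
        if method == "heun" and s_next > 0:
            # Heun correction: re-evaluate the slope at the Euler prediction and
            # average both slopes. Skipped on the last step to avoid dividing by 0.
            d_next = slope(x_next, s_next)
            n_evals += 1
            x_next = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)
        x = x_next
    return x, n_evals


def compare(n_heun_levels, sigma_max=80.0, x_start=100.0):
    """Relative errors of Heun vs. Euler at an equal number of denoiser evaluations."""
    # Closed-form solution of the toy ODE: x(sigma) is proportional to
    # sqrt(SIGMA_DATA^2 + sigma^2), evaluated here at sigma = 0.
    exact = x_start * SIGMA_DATA / math.sqrt(SIGMA_DATA ** 2 + sigma_max ** 2)
    heun_sigmas = karras_sigmas(n_heun_levels, sigma_max=sigma_max)
    x_heun, nfe_heun = sample(x_start, heun_sigmas, "heun")
    # Heun with N levels costs 2N - 1 evaluations, so give Euler that many levels.
    euler_sigmas = karras_sigmas(2 * n_heun_levels - 1, sigma_max=sigma_max)
    x_euler, nfe_euler = sample(x_start, euler_sigmas, "euler")
    return {
        "heun": (abs(x_heun - exact) / exact, nfe_heun),
        "euler": (abs(x_euler - exact) / exact, nfe_euler),
    }


if __name__ == "__main__":
    for n in (10, 32, 50):
        print(n, compare(n))
```

Each Heun step costs two denoiser evaluations, except the last step to sigma = 0, which costs one. N noise levels therefore cost 2N - 1 evaluations, and `compare` gives the Euler baseline the same budget. How the two samplers rank in this toy depends on the schedule and the step count, so printing the errors for a few budgets is more informative than assuming the outcome.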
Updated: 2023-12-06