ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning
arXiv - CS - Sound, Pub Date: 2023-12-11, DOI: arxiv-2312.06118
Authors: Xincheng Yu, Dongyue Guo, Jianwei Zhang, Yi Lin
Radio speech echo is a phenomenon specific to the air traffic control (ATC)
domain: it degrades speech quality and, in turn, reduces automatic speech
recognition (ASR) accuracy. In this work, a recognition-oriented speech
enhancement (ROSE) framework is proposed to improve both speech
intelligibility and ASR accuracy. ROSE serves as a plug-and-play tool in ATC
scenarios and does not require retraining the ASR model.
Specifically, an encoder-decoder U-Net framework is proposed to eliminate
radio speech echo, built on a corpus collected in real-world conditions. By
combining a speech enhancement (SE)-oriented loss with an ASR-oriented loss,
ROSE is trained in a multi-objective manner, learning representations shared
across the two optimization objectives. An attention-based skip-fusion (ABSF)
mechanism is applied to the skip connections to refine the features. A
channel and sequence attention (CSAtt) block is designed to guide the model
to focus on informative representations and suppress interfering features.
Experimental results show that ROSE significantly outperforms other
state-of-the-art methods on both the SE and ASR tasks. In addition, the
proposed approach can also improve performance on public datasets.
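The abstract does not specify the exact loss formulation or attention design, so what follows is only a toy sketch of two of its ideas under stated assumptions, not the authors' implementation. `multi_objective_loss` assumes the SE-oriented term is a mean squared error between enhanced and clean waveforms, the ASR-oriented term is a mean squared error between features that a frozen ASR front end extracts from the enhanced and clean signals, and the two are mixed with a hypothetical weight `alpha`. `channel_sequence_gate` is a parameter-free stand-in for a channel and sequence attention block in the spirit of squeeze-and-excitation: it rescales each channel by a sigmoid of its time-averaged value, then each frame by a sigmoid of its channel-averaged value; a real attention block would learn these weights. All function names here are hypothetical.

```python
import math
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two non-empty, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_objective_loss(
    enhanced: Sequence[float],
    clean: Sequence[float],
    enhanced_feats: Sequence[float],
    clean_feats: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical weighted sum of an SE-oriented and an ASR-oriented loss."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # SE-oriented term (assumed): waveform-level error against clean speech.
    se_loss = mse(enhanced, clean)
    # ASR-oriented term (assumed): error between features a frozen ASR
    # front end produces for enhanced vs. clean speech (precomputed here).
    asr_loss = mse(enhanced_feats, clean_feats)
    return alpha * se_loss + (1.0 - alpha) * asr_loss


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_sequence_gate(feats: List[List[float]]) -> List[List[float]]:
    """Parameter-free channel-then-sequence gating of a C x T feature map.

    feats[c][t] is the value of channel c at frame t (hypothetical layout).
    """
    if not feats or not feats[0]:
        raise ValueError("feature map must be non-empty")
    num_ch, num_t = len(feats), len(feats[0])
    # Channel step: one weight in (0, 1) per channel, from its mean over time.
    ch_w = [sigmoid(sum(row) / num_t) for row in feats]
    x = [[v * ch_w[c] for v in row] for c, row in enumerate(feats)]
    # Sequence step: one weight in (0, 1) per frame, from its mean over channels.
    seq_w = [
        sigmoid(sum(x[c][t] for c in range(num_ch)) / num_ch)
        for t in range(num_t)
    ]
    return [[x[c][t] * seq_w[t] for t in range(num_t)] for c in range(num_ch)]
```

For example, with `alpha=0.5`, an SE error of 0.5 (enhanced `[0.0, 1.0]` vs. clean `[0.0, 0.0]`) and an ASR-feature error of 1.0 (`[1.0]` vs. `[0.0]`) combine to 0.75. Passing only precomputed features from a frozen ASR model into the loss is consistent with the abstract's statement that ROSE does not retrain the ASR model; how gradients would flow through a real ASR front end is not modeled in this sketch.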
Updated: 2023-12-12