ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning

arXiv - CS - Sound. Pub Date: 2023-12-11, DOI: arxiv-2312.06118
Xincheng Yu, Dongyue Guo, Jianwei Zhang, Yi Lin

Radio speech echo is a phenomenon specific to the air traffic control (ATC) domain; it degrades speech quality and, in turn, automatic speech recognition (ASR) accuracy. This work proposes a recognition-oriented speech enhancement (ROSE) framework that improves speech intelligibility and ASR accuracy. It serves as a plug-and-play tool in ATC scenarios and does not require retraining the ASR model. Specifically, an encoder-decoder U-Net is proposed to eliminate the radio speech echo, based on a corpus collected in the real world. By combining a speech-enhancement-oriented (SE) loss with an ASR-oriented loss, ROSE is trained in a multi-objective manner, learning representations shared across the two optimization objectives. An attention-based skip-fusion (ABSF) mechanism is applied to the skip connections to refine the features, and a channel and sequence attention (CSAtt) block is designed to guide the model to focus on informative representations and suppress disturbing features. Experimental results show that ROSE significantly outperforms other state-of-the-art methods on both the SE and ASR tasks. The proposed approach also yields performance improvements on public datasets.
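
The multi-objective idea in the abstract can be illustrated with a small sketch. The abstract does not give the actual loss functions or their weighting, so everything below is an assumption made for illustration: the SE term is taken to be a mean absolute error on waveform samples, the ASR term a frame-level negative log-likelihood computed from a frozen recognizer's posteriors, and `alpha` is a hypothetical mixing weight. The function names are likewise hypothetical.

```python
import math


def se_loss(enhanced, clean):
    """Assumed SE objective: mean absolute error between two sample lists."""
    if len(enhanced) != len(clean):
        raise ValueError("signals must have the same length")
    return sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)


def asr_loss(posteriors, labels):
    """Assumed ASR objective: mean negative log-likelihood of reference labels.

    posteriors: one list of class probabilities per frame, as a frozen
                recognizer might output for the enhanced speech
    labels:     one reference class index per frame
    """
    if len(posteriors) != len(labels):
        raise ValueError("need one label per frame")
    eps = 1e-12  # guards against log(0)
    nll = -sum(math.log(p[y] + eps) for p, y in zip(posteriors, labels))
    return nll / len(labels)


def multi_objective_loss(enhanced, clean, posteriors, labels, alpha=0.5):
    """Weighted sum of both objectives; alpha is a hypothetical weight in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * se_loss(enhanced, clean)
            + (1.0 - alpha) * asr_loss(posteriors, labels))


if __name__ == "__main__":
    enhanced = [0.1, -0.2, 0.3]
    clean = [0.0, -0.2, 0.5]
    posteriors = [[0.5, 0.5], [0.25, 0.75]]
    labels = [0, 1]
    print(multi_objective_loss(enhanced, clean, posteriors, labels, alpha=0.5))
```

In a real training setup the gradient of this combined scalar would flow only into the enhancement network, which is one way to keep the recognizer itself untouched; whether ROSE does exactly this is not stated in the abstract.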

Updated: 2023-12-12