Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
arXiv - CS - Sound Pub Date : 2024-03-26 , DOI: arxiv-2403.17420
Dongjin Kim, Sung Jin Um, Sangmin Lee, Jung Uk Kim

The goal of multi-sound source localization is to localize each sound source in a mixture individually. While recent multi-sound source localization methods have shown improved performance, they are limited by their reliance on prior information about the number of objects to be separated. To overcome this limitation, we present a novel multi-sound source localization method that performs localization without prior knowledge of the number of sound sources. To achieve this, we propose an iterative object identification (IOI) module, which recognizes sound-making objects iteratively. After finding the regions of sound-making objects, we devise an object similarity-aware clustering (OSC) loss that guides the IOI module to combine regions of the same object while distinguishing between different objects and backgrounds. This enables our method to accurately localize sound-making objects without any prior knowledge. Extensive experiments on the MUSIC and VGGSound benchmarks show that the proposed method significantly outperforms existing methods in both single-source and multi-source settings. Our code is available at: https://github.com/VisualAIKHU/NoPrior_MultiSSL
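
To make concrete the idea of identifying sources iteratively without a known count, here is a toy sketch. It is not the paper's IOI module or OSC loss; the function name `iterative_regions`, the fixed `threshold`, and the assumption that an audio-visual similarity map is already available as a 2D array are all hypothetical choices made for illustration. The sketch repeatedly takes the strongest remaining peak, grows a connected region around it, masks that region out, and stops when no peak clears the threshold, so the number of regions comes from the stopping rule rather than from a preset count.

```python
from collections import deque

import numpy as np


def iterative_regions(sim_map, threshold=0.5):
    """Toy illustration only (not the authors' method).

    Peels off one connected high-similarity region per iteration and
    stops when the strongest remaining value falls below `threshold`.
    """
    remaining = np.asarray(sim_map, dtype=float).copy()
    height, width = remaining.shape
    regions = []
    while True:
        # Strongest location that has not been assigned to a region yet.
        peak = np.unravel_index(np.argmax(remaining), remaining.shape)
        if remaining[peak] < threshold:
            break  # nothing left that looks like a sounding object
        # Breadth-first flood fill over 4-connected cells above threshold.
        mask = np.zeros((height, width), dtype=bool)
        mask[peak] = True
        queue = deque([peak])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < height and 0 <= nc < width
                if inside and not mask[nr, nc] and remaining[nr, nc] >= threshold:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
        regions.append(mask)
        # Remove the found region so the next iteration finds a new one.
        remaining[mask] = -np.inf
    return regions
```

Calling `iterative_regions` on a map with two separated high-similarity blobs returns two boolean masks, and on a map with no value above the threshold it returns an empty list. The learned method in the paper replaces this hand-set threshold and flood fill with trained components.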

Updated: 2024-03-28