Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation
Applied Acoustics ( IF 3.4 ) Pub Date : 2024-02-28 , DOI: 10.1016/j.apacoust.2024.109929
Zirui Ge , Xinzhou Xu , Haiyan Guo , Tingting Wang , Zhen Yang

The emergence of self-supervised representations (e.g., wav2vec 2.0) allows speaker-recognition approaches to process spoken signals through foundation models built on speech data. Nevertheless, effective fusion of these representations requires further investigation, because existing approaches rely on fixed or sub-optimal temporal pooling strategies. Although improved strategies incorporate graph learning and graph attention, they still involve non-injective aggregation, which may affect speaker-recognition performance. In this regard, we propose a speaker-recognition approach using an Isomorphic Graph ATtention network (IsoGAT) on self-supervised representations. The proposed approach contains three modules (representation learning, graph attention, and aggregation) and jointly considers learning on the self-supervised representation and the IsoGAT. We then perform experiments on speaker-recognition tasks using the VoxCeleb1&2 datasets, with the experimental results demonstrating the recognition performance of the proposed approach in comparison with existing pooling approaches on the self-supervised representation.
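
To make the term "non-injective aggregation" concrete, the following minimal sketch compares two generic pooling functions on toy data. It is not the paper's IsoGAT and does not use wav2vec 2.0; the helper names `mean_pool` and `sum_pool` and the frame values are hypothetical, chosen only for illustration. It shows that mean pooling can map two different multisets of frame-level features to the same utterance-level vector, whereas sum pooling happens to separate this particular pair.

```python
def mean_pool(frames):
    """Average frame-level feature vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def sum_pool(frames):
    """Sum frame-level feature vectors into one utterance-level vector."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) for i in range(dim)]


# Two different multisets of toy 2-D frame features (hypothetical values):
# frames_b repeats every frame of frames_a twice.
frames_a = [[1.0, 0.0], [0.0, 1.0]]
frames_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

# Mean pooling collapses both inputs to the same vector (non-injective).
mean_collides = mean_pool(frames_a) == mean_pool(frames_b)

# Sum pooling yields different vectors for this pair.
sum_collides = sum_pool(frames_a) == sum_pool(frames_b)

print("mean pooling collides:", mean_collides)
print("sum pooling collides:", sum_collides)
```

The example only illustrates why an aggregation step that cannot distinguish different inputs may lose speaker-relevant information; how IsoGAT addresses this is described in the paper itself.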

Updated: 2024-02-28