Bimodal semantic fusion prototypical network for few-shot classification
Information Fusion (IF 18.6), Pub Date: 2024-04-15, DOI: 10.1016/j.inffus.2024.102421
Xilang Huang , Seon Han Choi

Few-shot classification learns from a small number of image samples to recognize unseen images. Recent few-shot learning methods exploit auxiliary text information, such as class labels and names, to obtain more discriminative class prototypes. However, most existing approaches rarely use text information as a cue to highlight important feature regions and do not consider feature alignment between prototypes and targets, leading to prototype ambiguity owing to the information gap. To address this issue, a prototype generator module was developed that lets the text knowledge of the class name interact with the visual feature maps along the spatial and channel dimensions. This module learns to assign mixture weights to the essential regions of each sample feature to obtain informative prototypes. In addition, a feature refinement module was proposed to embed text information into query images without knowing their labels; it generates attention from the concatenated query and text features through a pairwise distance loss. To improve the alignment between the prototype and relevant targets, a prototype calibration module was designed to preserve the important features of the prototype by considering the interrelationships between the prototype and query features. Extensive experiments were conducted on five few-shot classification benchmarks, and the results demonstrate the superiority of the proposed method over state-of-the-art methods in both 1-shot and 5-shot settings.
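
To make the described prototype generation step concrete, the following is a minimal PyTorch sketch of how a class-name text embedding could re-weight support feature maps along the channel and spatial dimensions before they are averaged into a class prototype. The module name, tensor shapes, and gating layers here are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of text-guided prototype generation (not the paper's code).
import torch
import torch.nn as nn

class PrototypeGenerator(nn.Module):
    """Fuses a class-name text embedding with support feature maps by
    producing channel-wise and spatial mixture weights."""
    def __init__(self, feat_dim: int, text_dim: int):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, feat_dim)          # map text into the visual feature space
        self.channel_gate = nn.Linear(2 * feat_dim, feat_dim)   # channel attention from [visual; text]
        self.spatial_gate = nn.Conv2d(2 * feat_dim, 1, kernel_size=1)  # spatial attention map

    def forward(self, support_feats: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # support_feats: (K, C, H, W) feature maps of the K support images of one class
        # text_emb:      (text_dim,) embedding of the class name
        K, C, H, W = support_feats.shape
        t = self.text_proj(text_emb)                             # (C,)
        t_map = t.view(1, C, 1, 1).expand(K, C, H, W)            # broadcast text over space

        # Channel-dimension interaction: gate each channel with the text cue.
        pooled = support_feats.mean(dim=(2, 3))                  # (K, C) global context
        ch_w = torch.sigmoid(self.channel_gate(torch.cat([pooled, t.expand(K, C)], dim=1)))
        feats = support_feats * ch_w.view(K, C, 1, 1)

        # Spatial-dimension interaction: highlight regions that agree with the text.
        sp_w = torch.sigmoid(self.spatial_gate(torch.cat([feats, t_map], dim=1)))  # (K, 1, H, W)
        feats = feats * sp_w

        # Average the re-weighted support features into a single class prototype.
        return feats.mean(dim=(0, 2, 3))                         # (C,)
```

In a 5-way 5-shot episode, such a generator would be called once per class, e.g. `PrototypeGenerator(64, 300)(torch.randn(5, 64, 7, 7), torch.randn(300))`, yielding one text-informed prototype vector per class for nearest-prototype classification of the queries.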

Updated: 2024-04-15