On the Value of Head Labels in Multi-Label Text Classification
ACM Transactions on Knowledge Discovery from Data (IF 3.6), Pub Date: 2024-03-26, DOI: 10.1145/3643853
Haobo Wang 1 , Cheng Peng 2 , Hede Dong 2 , Lei Feng 3 , Weiwei Liu 4 , Tianlei Hu 2 , Ke Chen 2 , Gang Chen 2

A major challenge in multi-label text classification (MLTC) is that labels often follow a long-tailed distribution, which typically prevents deep MLTC models from achieving satisfactory performance. To alleviate this problem, most existing solutions attempt to improve tail performance through sampling or by introducing extra knowledge. Data-rich head labels, though more trustworthy, have not received the attention they deserve. In this work, we propose a multi-stage training framework that exploits both model-level and feature-level knowledge from the head labels in order to improve the representation and generalization ability of MLTC models. Moreover, we theoretically prove the superiority of our framework design over alternative designs. Comprehensive experiments on widely used MLTC datasets show that the proposed framework outperforms state-of-the-art methods, highlighting the value of head labels in MLTC.




Updated: 2024-03-27