Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding
ACM Transactions on Information Systems (IF 5.6), Pub Date: 2024-04-27, DOI: 10.1145/3652599
Yunchang Zhu, Liang Pang, Kangxi Wu, Yanyan Lan, Huawei Shen, Xueqi Cheng

Current natural language understanding (NLU) models have been continuously scaling up, both in model size and input context, introducing more hidden and input neurons. While this generally improves average performance, the extra neurons do not yield a consistent improvement across all instances, because some hidden neurons are redundant and the noise mixed into the input neurons tends to distract the model. Previous work mainly avoids this problem by extrinsically reducing low-utility neurons through additional post- or pre-processing, such as network pruning and context selection. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model utilizes its neurons efficiently, then no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on this comparison principle between models, we propose a cross-model comparative loss applicable to a broad range of tasks. Comparative loss is essentially a ranking loss imposed on top of the task-specific losses of the full and ablated models, with the expectation that the task-specific loss of the full model is the minimum. We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from three distinct NLU tasks, based on five widely used pre-trained language models, and find it particularly superior for models with few parameters or long inputs.
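
The abstract describes the comparative loss as a ranking loss placed on top of the task-specific losses of the full model and its ablated submodels, with the full model expected to incur the smallest loss. Below is a minimal PyTorch-style sketch of that idea; the dropout-based ablation, the `num_ablations` and `margin` parameters, and the way the ranking term is combined with the task loss are illustrative assumptions rather than the authors' exact formulation.

```python
# Minimal sketch of a cross-model comparative (ranking) loss.
# Ablation is simulated here with dropout; the paper's actual ablation
# scheme and weighting may differ.
import torch
import torch.nn.functional as F


def comparative_loss(model, inputs, labels, task_loss_fn,
                     num_ablations: int = 2, margin: float = 0.0):
    """Ranking loss expecting the full model's task loss to be minimal.

    task_loss_fn(logits, labels) -> scalar task-specific loss (e.g. cross-entropy).
    """
    # Task loss of the full model (no neurons disabled: dropout off).
    model.eval()
    full_loss = task_loss_fn(model(inputs), labels)

    # Task losses of randomly ablated submodels (dropout on disables neurons).
    model.train()
    ablated_losses = [task_loss_fn(model(inputs), labels)
                      for _ in range(num_ablations)]

    # Hinge-style ranking terms: nonzero only when the full model's loss
    # exceeds an ablated submodel's loss, i.e. when it is not the minimum.
    rank_terms = [F.relu(full_loss - ab_loss + margin)
                  for ab_loss in ablated_losses]
    comparative = torch.stack(rank_terms).mean()

    # One plausible training objective: task loss plus the ranking penalty.
    return full_loss + comparative
```

In this sketch the hinge terms vanish whenever the full model already has the smallest task loss, so the extra gradient signal only appears when some ablated submodel outperforms the full model, nudging the model to make every neuron pull its weight.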




Updated: 2024-04-27