MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation
arXiv - CS - Artificial Intelligence. Pub Date: 2024-03-26, DOI: arxiv-2403.17698
Weiguo Gao

When the predicted sequence length exceeds the length seen during training, the transformer's inference accuracy diminishes. Existing relative position encoding methods, such as those based on the ALiBi technique, address the length extrapolation challenge through a single kernel function, which introduces a constant bias to each post-softmax attention score according to its distance. These approaches do not investigate or employ multiple kernel functions for the extrapolation challenge. Drawing on the ALiBi approach, this study proposes a novel relative positional encoding method, called MEP, which uses a weighted average to combine distinct kernel functions (such as the exponential kernel and the Gaussian kernel) into a bias applied to the post-softmax attention scores. First, the framework combines several distinct kernel functions, each assigned a consistent mean weight coefficient, harnessing the synergistic advantages of different kernels to form a new bias function. Next, a specific slope is tailored to each kernel function so that penalties are applied at varying rates, enhancing the model's extrapolation capability. Finally, this bias is incorporated as a penalty into the post-softmax scores. We present two versions of our method: a parameter-free variant that requires no new learnable parameters and enhances length extrapolation without compromising training efficiency, and a parameterized variant capable of integrating state-of-the-art techniques. Empirical evaluations across diverse datasets demonstrate that both variants achieve state-of-the-art performance, outperforming traditional parameter-free and parameterized approaches.
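To make the construction concrete, below is a minimal NumPy sketch of how such a multi-kernel bias might be assembled and applied. The specific kernel forms exp(-s·d) and exp(-s·d²), the equal 1/2 averaging weights, the ALiBi-style geometric slope schedule in the usage lines, and the multiplicative application to post-softmax weights followed by renormalization are all illustrative assumptions; the abstract only states that distinct kernels are averaged, that each kernel receives its own slope, and that the result penalizes post-softmax scores.

import numpy as np

def mep_bias(seq_len, exp_slopes, gauss_slopes):
    # Hedged sketch: average an exponential kernel exp(-s*d) and a
    # Gaussian kernel exp(-s*d^2) with equal weights, one slope pair
    # per attention head. Kernel forms and weights are assumptions.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    d = np.abs(i - j).astype(np.float64)              # relative distance |i - j|
    heads = []
    for s_e, s_g in zip(exp_slopes, gauss_slopes):
        k_exp = np.exp(-s_e * d)                      # exponential kernel
        k_gauss = np.exp(-s_g * d ** 2)               # Gaussian kernel
        heads.append(0.5 * (k_exp + k_gauss))         # equal mean weights
    return np.stack(heads)                            # (n_heads, seq, seq)

def attention_with_mep(q, k, v, bias):
    # Assumed application: multiply post-softmax weights by the kernel
    # bias (a value in (0, 1] that shrinks with distance), renormalize.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    attn = np.exp(scores - scores.max(-1, keepdims=True))
    attn /= attn.sum(-1, keepdims=True)               # standard softmax
    attn = attn * bias                                # distance penalty
    attn /= attn.sum(-1, keepdims=True)               # renormalize rows
    return attn @ v

# Usage with a hypothetical ALiBi-style geometric slope schedule:
n_heads, seq_len, d_k = 4, 8, 16
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n_heads, seq_len, d_k)) for _ in range(3))
slopes = [2.0 ** -(h + 1) for h in range(n_heads)]
out = attention_with_mep(q, k, v, mep_bias(seq_len, slopes, slopes))

Both kernels equal 1 at zero distance and decay monotonically, so the multiplication leaves local attention intact while down-weighting distant positions; the Gaussian decays faster at long range, which is one plausible motivation for mixing it with the slower exponential.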

Updated: 2024-03-28