EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention
arXiv - CS - Machine Learning. Pub Date: 2024-03-26, DOI: arxiv-2403.17729
Zhen Tian, Wayne Xin Zhao, Changwang Zhang, Xin Zhao, Zhongrui Ma, Ji-Rong Wen

To capture user preference, Transformer models have been widely applied to model sequential user behavior data. The core of the Transformer architecture lies in the self-attention mechanism, which computes the pairwise attention scores in a sequence. Due to the permutation-equivariant nature of self-attention, positional encoding is used to enhance the attention between token representations. In this setting, the pairwise attention scores can be derived from both semantic difference and positional difference. However, prior studies often model the two kinds of difference measurements in different ways, which potentially limits the expressive capacity of sequence modeling. To address this issue, this paper proposes a novel Transformer variant with complex vector attention, named EulerFormer, which provides a unified theoretical framework for formulating both semantic difference and positional difference. EulerFormer involves two key technical improvements. First, it employs a new transformation function that efficiently transforms the sequence tokens into polar-form complex vectors using Euler's formula, enabling the unified modeling of both semantic and positional information in a complex rotation form. Second, it develops a differential rotation mechanism, in which the semantic rotation angles can be controlled by an adaptation function, enabling the adaptive integration of semantic and positional information according to the semantic contexts. Furthermore, a phase contrastive learning task is proposed to alleviate the anisotropy of contextual representations in EulerFormer. Our theoretical framework possesses a high degree of completeness and generality: it is more robust to semantic variations and, in principle, possesses superior theoretical properties. Extensive experiments conducted on four public datasets demonstrate the effectiveness and efficiency of our approach.
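The abstract describes the mechanism only at a high level. As a rough illustration of attention over polar-form complex vectors, here is a minimal PyTorch sketch under stated assumptions: a fixed RoPE-style frequency schedule stands in for the paper's adaptive differential rotation, and the names `to_polar` and `euler_attention_scores` are hypothetical, not taken from the paper.

```python
import torch

def to_polar(x: torch.Tensor):
    """Pair adjacent feature dimensions into complex numbers and return
    their polar form (modulus, phase). Hypothetical helper, not the
    paper's exact formulation."""
    z = torch.complex(x[..., 0::2], x[..., 1::2])
    return z.abs(), z.angle()

def euler_attention_scores(q: torch.Tensor, k: torch.Tensor,
                           theta_base: float = 10000.0) -> torch.Tensor:
    """Sketch of rotation-form attention: each token becomes a polar-form
    complex vector r * exp(i * phase) via Euler's formula, a position-
    dependent rotation exp(i * m * w) is applied, and the pairwise score
    is Re(q_m conj(k_n)), so it depends only on the semantic phase
    difference plus the positional difference (m - n)."""
    B, L, D = q.shape
    half = D // 2
    # Fixed rotation frequencies (a RoPE-style assumption, not EulerFormer's
    # learned adaptation function).
    freqs = theta_base ** (-torch.arange(half, dtype=torch.float32) / half)
    pos = torch.arange(L, dtype=torch.float32)
    rot = pos[:, None] * freqs[None, :]            # (L, half) rotation angles

    rq, pq = to_polar(q)                           # moduli and semantic phases
    rk, pk = to_polar(k)
    zq = rq * torch.exp(1j * (pq + rot))           # rotate query at position m
    zk = rk * torch.exp(1j * (pk + rot))           # rotate key at position n
    # Re(zq @ zk^H): phases cancel into (semantic diff) + (m - n) * freq.
    scores = torch.einsum("bld,bmd->blm", zq, zk.conj()).real
    return scores / (half ** 0.5)
```

In EulerFormer proper, the semantic rotation angles would additionally be modulated by the adaptation function according to the semantic context, and the phase contrastive learning task would further regularize the phase distribution; both are omitted from this sketch.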

Updated: 2024-03-27