An Empirical Study of Training ID-Agnostic Multi-modal Sequential Recommenders
arXiv - CS - Information Retrieval. Pub Date: 2024-03-26, DOI: arXiv-2403.17372
Youhua Li, Hanwen Du, Yongxin Ni, Yuanqi He, Junchen Fu, Xiangyan Liu, Qi Guo

Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions. While many SR approaches concentrate on user IDs and item IDs, the human perception of the world through multi-modal signals, like text and images, has inspired researchers to explore constructing SR from multi-modal information without using IDs. However, the complexity of multi-modal learning manifests in diverse feature extractors, fusion methods, and pre-trained models. Consequently, designing a simple and universal Multi-Modal Sequential Recommendation (MMSR) framework remains a formidable challenge. We systematically summarize existing multi-modal-related SR methods and distill their essence into four core components: a visual encoder, a text encoder, a multimodal fusion module, and a sequential architecture. Along these dimensions, we dissect the model designs and answer the following sub-questions: First, we explore how to construct MMSR from scratch, ensuring its performance is on par with or exceeds existing SR methods without complex techniques. Second, we examine whether MMSR can benefit from existing multi-modal pre-training paradigms. Third, we assess MMSR's capability in tackling common challenges like cold start and domain transfer. Our experimental results across four real-world recommendation scenarios demonstrate the great potential of ID-agnostic multi-modal sequential recommendation. Our framework can be found at: https://github.com/MMSR23/MMSR.
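The four-component pipeline described above (visual encoder, text encoder, fusion module, sequential architecture) can be sketched as follows. This is a minimal, hypothetical illustration using numpy only: the projection matrices, dimensions, sum-based fusion, and single parameter-free self-attention layer are simplifying assumptions for exposition, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # shared embedding dimension (illustrative choice)

# (1) Visual encoder: project raw image features (e.g., 512-d CNN outputs) to D.
W_img = rng.normal(size=(512, D)) / np.sqrt(512)
# (2) Text encoder: project raw text features (e.g., 768-d LM outputs) to D.
W_txt = rng.normal(size=(768, D)) / np.sqrt(768)

def fuse(img_feat, txt_feat):
    """(3) Multimodal fusion: here, a simple sum of the projected modalities."""
    return img_feat @ W_img + txt_feat @ W_txt

def encode_sequence(item_embs):
    """(4) Sequential architecture: one parameter-free self-attention layer;
    the last position's output serves as the user state."""
    scores = item_embs @ item_embs.T / np.sqrt(D)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn @ item_embs)[-1]

# Toy history of 5 interacted items, each carrying image + text features.
imgs = rng.normal(size=(5, 512))
txts = rng.normal(size=(5, 768))
user_state = encode_sequence(fuse(imgs, txts))

# Score candidates by dot product with the user state. No item IDs are used:
# a brand-new (cold-start) item is scorable from its modality features alone.
cand_scores = fuse(rng.normal(size=(3, 512)), rng.normal(size=(3, 768))) @ user_state
```

The key property of the ID-agnostic setup shows in the last step: because items are represented purely by their modality features, the same scoring path works for unseen items and, in principle, for items from a different domain.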

Updated: 2024-03-27