Deconfounded Cross-modal Matching for Content-based Micro-video Background Music Recommendation
ACM Transactions on Intelligent Systems and Technology (IF 5) Pub Date: 2024-04-15, DOI: 10.1145/3650042
Jing Yi, Zhenzhong Chen

Object-oriented micro-video background music recommendation is a complicated task in which the matching degree between videos and background music is a major issue. However, music selections in user-generated content (UGC) are prone to selection bias caused by the historical preferences of uploaders. Since historical preferences are not fully reliable and may reflect obsolete behaviors, over-reliance on them should be avoided as knowledge and interests dynamically evolve. In this article, we propose a Deconfounded Cross-Modal matching model to mitigate such bias. Specifically, uploaders’ personal preferences for music genres are identified as confounders that spuriously correlate music embeddings and background music selections, causing the learned system to over-recommend music from majority groups. To resolve such confounding, backdoor adjustment is utilized to deconfound the spurious correlation between music embeddings and prediction scores. We further utilize a Monte Carlo estimator with a batch-level average as an approximation, avoiding integration over the entire confounder space required by the adjustment. Furthermore, we design a teacher–student network that exploits the matching in music videos, which are professionally generated content (PGC) with expert matching, to better recommend content-matched background music. The PGC data are modeled by a teacher network, which guides the matching of the uploader-selected UGC data in the student network through Kullback–Leibler-divergence-based knowledge transfer. Extensive experiments on the TT-150k-genre dataset demonstrate the effectiveness of the proposed method. The code is publicly available at https://github.com/jing-1/DecCM.
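To make the backdoor adjustment concrete, below is a minimal PyTorch sketch, not the paper's implementation: the layer sizes, the concatenation-based fusion, and names such as `DeconfoundedMatcher` and `score_head` are assumptions. It approximates P(Y | do(video, music)) = Σ_z P(Y | video, music, z) P(z) by averaging the scoring head over the uploader genre-preference embeddings z present in the current batch, i.e., the batch-level Monte Carlo approximation mentioned in the abstract.

```python
import torch
import torch.nn as nn

class DeconfoundedMatcher(nn.Module):
    """Backdoor-adjusted video-music scoring (illustrative sketch)."""

    def __init__(self, video_dim=128, music_dim=128, genre_dim=32):
        super().__init__()
        # Scoring head P(Y | video, music, z); the architecture is an assumption.
        self.score_head = nn.Sequential(
            nn.Linear(video_dim + music_dim + genre_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, video_emb, music_emb, genre_emb_batch):
        # video_emb:       (B, video_dim)  micro-video representations
        # music_emb:       (B, music_dim)  candidate background-music embeddings
        # genre_emb_batch: (B, genre_dim)  confounder: uploader genre preferences
        B = video_emb.size(0)
        pair = torch.cat([video_emb, music_emb], dim=-1)        # (B, V+M)
        # Pair every (video, music) sample with every confounder in the batch.
        pair = pair.unsqueeze(1).expand(B, B, -1)               # (B, B, V+M)
        z = genre_emb_batch.unsqueeze(0).expand(B, B, -1)       # (B, B, G)
        scores = self.score_head(torch.cat([pair, z], dim=-1))  # (B, B, 1)
        # Batch-level average over z ~ P(z) approximates sum_z P(Y|x,z)P(z).
        return scores.mean(dim=1).squeeze(-1)                   # (B,)
```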
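The teacher–student transfer can likewise be sketched as a standard KL-divergence distillation loss. This is a hedged sketch: the temperature, the logits-over-candidate-tracks framing, and the name `kl_transfer_loss` are common distillation conventions assumed here, not details taken from the paper.

```python
import torch.nn.functional as F

def kl_transfer_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over candidate music tracks (distillation sketch)."""
    s_log_prob = F.log_softmax(student_logits / temperature, dim=-1)
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target;
    # the T^2 scaling follows standard knowledge-distillation practice.
    return F.kl_div(s_log_prob, t_prob, reduction="batchmean") * temperature ** 2
```

Here the PGC-trained teacher's soft matching distribution supervises the student trained on uploader-selected UGC pairs, as described in the abstract.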




Updated: 2024-04-15