Multimodal Interactive Network for Sequential Recommendation

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Building an effective sequential recommendation system remains challenging because interactions between users and items are limited. Recent work has shown that incorporating textual or visual information into sequential recommendation can alleviate this data sparsity problem, which is attracting considerable attention in both industry and academia. However, modeling interactions among modalities in a sequential scenario is an interesting yet challenging task because of multimodal heterogeneity. In this paper, we introduce a novel recommendation approach that considers both textual and visual information, namely the Multimodal Interactive Network (MIN). The advantage of MIN lies in a learning framework that leverages interactions among modalities at both the item level and the sequence level to build an efficient system. First, an item-wise interactive layer based on the encoder-decoder mechanism models the item-level interactions among modalities to select informative content. Second, a sequence interactive layer based on the attention strategy captures the sequence-level preference of each modality. MIN seamlessly incorporates interactions among modalities at both the item level and the sequence level for sequential recommendation. This is the first time that interactions in each modality have been explicitly discussed and utilized in sequential recommenders. Experimental results on four real-world datasets show that our approach significantly outperforms all the baselines on the sequential recommendation task.
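The two levels of interaction described in the abstract can be illustrated with a small sketch. The abstract gives no architectural details, so everything below is an assumption made for illustration and not the authors' model: a sigmoid gate stands in for the encoder-decoder item-wise layer, dot-product attention with the last item as the query stands in for the sequence interactive layer, and the function names `item_level_fusion`, `sequence_attention`, and `score` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def item_level_fusion(text_vec, image_vec):
    """Assumed stand-in for the item-wise layer: a sigmoid gate on the
    agreement between the two modalities mixes each one with the other."""
    gate = 1.0 / (1.0 + math.exp(-dot(text_vec, image_vec)))
    text_out = [gate * t + (1.0 - gate) * v for t, v in zip(text_vec, image_vec)]
    image_out = [gate * v + (1.0 - gate) * t for t, v in zip(text_vec, image_vec)]
    return text_out, image_out


def sequence_attention(seq):
    """Assumed stand-in for the sequence layer: scaled dot-product attention
    pooling over one modality, using the most recent item as the query."""
    d = len(seq[0])
    query = seq[-1]
    weights = softmax([dot(query, h) / math.sqrt(d) for h in seq])
    return [sum(w * h[i] for w, h in zip(weights, seq)) for i in range(d)]


def score(text_seq, image_seq, cand_text, cand_image):
    """Score a candidate item against the per-modality sequence preferences."""
    fused = [item_level_fusion(t, v) for t, v in zip(text_seq, image_seq)]
    text_pref = sequence_attention([f[0] for f in fused])
    image_pref = sequence_attention([f[1] for f in fused])
    return dot(text_pref, cand_text) + dot(image_pref, cand_image)
```

In this sketch, a user's history is a pair of aligned lists of text and image vectors, and candidates are ranked by calling `score` on each one. A trained model would learn the gate and attention parameters rather than compute them directly from raw features.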

Author information

Corresponding author

Correspondence to Peng-Fei Wang.

Supplementary Information

ESM 1

(PDF 126 kb)

About this article

Cite this article

Han, TY., Wang, PF. & Niu, SZ. Multimodal Interactive Network for Sequential Recommendation. J. Comput. Sci. Technol. 38, 911–926 (2023). https://doi.org/10.1007/s11390-022-1152-7

  • DOI: https://doi.org/10.1007/s11390-022-1152-7
