A Picture Is Worth a Graph: Blueprint Debate on Graph for Multimodal Reasoning
arXiv - CS - Multimedia, Pub Date: 2024-03-22, DOI: arxiv-2403.14972
Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li

This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization, and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address them, we propose a deductive (top-down) debating approach called Blueprint Debate on Graphs (BDoG). In BDoG, debates are confined to a blueprint graph, which prevents opinions from being trivialized by world-level summarization. Moreover, by storing evidence in branches within the graph, BDoG mitigates distractions caused by frequent but irrelevant concepts. Extensive experiments validate BDoG, which achieves state-of-the-art results on Science QA and MMBench with significant improvements over previous methods.
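The abstract does not describe BDoG's data structures, so the following is only a hypothetical toy sketch of the two ideas it names: debate contributions are confined to a blueprint graph, and evidence is stored per branch so that summarizing one branch does not pull in the whole debate. The names `BlueprintGraph`, `add_concept`, `add_evidence`, and `branch_summary` are illustrative assumptions, not the authors' implementation, and plain strings stand in for agent contributions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A concept in the (hypothetical) blueprint graph."""
    name: str
    evidence: list[str] = field(default_factory=list)  # branch-local evidence
    children: list[Node] = field(default_factory=list)


class BlueprintGraph:
    """Toy sketch: every debate contribution must attach to an existing node."""

    def __init__(self, root: str) -> None:
        self.root = Node(root)
        self.index: dict[str, Node] = {root: self.root}

    def add_concept(self, parent: str, name: str) -> None:
        # Grow the blueprint top-down from a concept that already exists.
        node = Node(name)
        self.index[parent].children.append(node)
        self.index[name] = node

    def add_evidence(self, concept: str, statement: str) -> bool:
        # Evidence about concepts outside the blueprint is rejected,
        # a crude stand-in for ignoring irrelevant image concepts.
        node = self.index.get(concept)
        if node is None:
            return False
        node.evidence.append(statement)
        return True

    def branch_summary(self, concept: str) -> list[str]:
        # Collect evidence from one branch only (depth-first, pre-order),
        # instead of summarizing every statement in the debate.
        out: list[str] = []
        stack = [self.index[concept]]
        while stack:
            node = stack.pop()
            out.extend(node.evidence)
            stack.extend(reversed(node.children))
        return out
```

In this sketch, calling `add_evidence("sky", ...)` on a graph with no `sky` node returns `False`, and `branch_summary("leaf")` returns only the statements stored under the `leaf` branch.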

Updated: 2024-03-25