Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
arXiv - CS - Information Retrieval. Pub Date: 2024-03-23, DOI: arXiv-2403.15765
Hao Wang, Tang Li, Chenhui Chu, Nengjun Zhu, Rui Wang, Pinpin Zhu

Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to exploit this valuable prior information about visual and spatial features, resulting in suboptimal performance, particularly when only a few examples are available. To address this limitation, our research focuses on few-shot relational learning, specifically targeting the extraction of key-value relation triplets in VRDs. Given the absence of a suitable dataset for this task, we introduce two new few-shot benchmarks built upon existing supervised benchmark datasets. Furthermore, we propose a variational approach that incorporates relational 2D-spatial priors and prototypical rectification techniques. This approach aims to generate relation representations that are more aware of the spatial context and of unseen relations, in a manner similar to human perception. Experimental results demonstrate the effectiveness of the proposed method, which outperforms existing approaches. This study also opens up new possibilities for practical applications.
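The abstract does not give implementation details, but the two ingredients it names, a prototypical (few-shot) classifier and a relational 2D-spatial prior, can be illustrated with a minimal sketch. The sketch below is an assumption: the `spatial_feature` function, the box format `(x, y, w, h)`, and the toy relation labels are all hypothetical, and the actual model surely combines such spatial cues with learned text/layout embeddings rather than using them alone.

```python
import numpy as np

def spatial_feature(key_box, value_box):
    """Hypothetical 2D-spatial prior: offset of the value box relative to
    the key box, normalized by the key box size. Boxes are (x, y, w, h)."""
    kx, ky, kw, kh = key_box
    vx, vy, _, _ = value_box
    return np.array([(vx - kx) / max(kw, 1), (vy - ky) / max(kh, 1)])

def prototypes(support_feats, support_labels):
    """Prototypical-network step: one mean feature vector per relation class."""
    protos = {}
    for label in set(support_labels):
        feats = [f for f, l in zip(support_feats, support_labels) if l == label]
        protos[label] = np.mean(feats, axis=0)
    return protos

def classify(query_feat, protos):
    """Assign the query to the nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda l: np.linalg.norm(query_feat - protos[l]))

# Toy few-shot episode: key-value pairs whose value sits beside vs. below the key.
support_feats = [
    spatial_feature((0, 0, 10, 10), (12, 0, 20, 10)),   # "beside"
    spatial_feature((0, 0, 10, 10), (14, 1, 20, 10)),   # "beside"
    spatial_feature((0, 0, 10, 10), (0, 12, 20, 10)),   # "below"
    spatial_feature((0, 0, 10, 10), (1, 14, 20, 10)),   # "below"
]
support_labels = ["beside", "beside", "below", "below"]
protos = prototypes(support_feats, support_labels)

query = spatial_feature((0, 0, 10, 10), (13, 0, 20, 10))
pred = classify(query, protos)  # nearest prototype is "beside"
```

Even this spatial-only toy separates the two relation layouts, which is the intuition the paper appeals to: spatial priors carry signal that pure text models discard.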

Updated: 2024-03-27