BioVL-QR: Egocentric Biochemical Video-and-Language Dataset Using Micro QR Codes
arXiv - CS - Multimedia, Pub Date: 2024-04-04, DOI: arxiv-2404.03161
Taichi Nishimura, Koki Yamamoto, Yuto Haneji, Keiya Kajimura, Chihiro Nishiwaki, Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Hirotaka Kameko, Shinsuke Mori

This paper introduces a biochemical vision-and-language dataset consisting of 24 egocentric experiment videos, the corresponding protocols, and video-and-language alignments. A key challenge in the wet-lab domain is that detecting equipment, reagents, and containers is difficult: the lab bench is cluttered with objects, and some objects are visually indistinguishable. Previous studies therefore assume that objects are manually annotated and provided for downstream tasks, but such annotation is costly and time-consuming. To address this issue, this study uses Micro QR Codes to detect objects automatically. In a preliminary study, we found that detecting objects with Micro QR Codes alone is still difficult because researchers manipulate the objects, which frequently causes blur and occlusion. We therefore also propose a novel object labeling method that combines a Micro QR Code detector with an off-the-shelf hand object detector. As one application of our dataset, we conduct the task of generating protocols from experiment videos and find that our approach can generate accurate protocols.
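The abstract does not spell out how the two detectors are combined, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a hand object detector yields at most one bounding box per frame for the object being held, and a Micro QR Code detector yields decoded labels with code centers. A frame is labeled when a decoded code falls inside the held-object box, and that label is carried forward through frames where blur or occlusion prevents decoding. All names (`QRDetection`, `label_frames`, `fill_gaps`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); an assumed format.
Box = Tuple[float, float, float, float]


@dataclass
class QRDetection:
    """One decoded Micro QR Code (hypothetical container type)."""
    label: str                   # decoded payload, e.g. an object name
    center: Tuple[float, float]  # code center in image coordinates


def contains(box: Box, point: Tuple[float, float]) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max


def label_frames(
    object_boxes: List[Optional[Box]],
    qr_detections: List[List[QRDetection]],
) -> List[Optional[str]]:
    """Label each frame's held object with a code decoded inside its box."""
    labels: List[Optional[str]] = []
    for box, codes in zip(object_boxes, qr_detections):
        label = None
        if box is not None:
            for code in codes:
                if contains(box, code.center):
                    label = code.label
                    break
        labels.append(label)
    return labels


def fill_gaps(
    object_boxes: List[Optional[Box]],
    labels: List[Optional[str]],
) -> List[Optional[str]]:
    """Carry the last decoded label forward while an object is still held.

    Assumption: a frame with no held-object box ends the current
    manipulation, so the carried label is reset there.
    """
    filled: List[Optional[str]] = []
    current: Optional[str] = None
    for box, label in zip(object_boxes, labels):
        if box is None:
            current = None
        elif label is not None:
            current = label
        filled.append(current)
    return filled


# Toy input: the code is readable in frame 1 only; frames 2 and 3 are blurred
# (frame 3 also sees an unrelated code outside the held-object box).
boxes = [None, (0, 0, 10, 10), (0, 0, 10, 10), (1, 1, 11, 11), None, (0, 0, 10, 10)]
codes = [[], [QRDetection("tube_A", (5, 5))], [], [QRDetection("pipette", (50, 50))], [], []]
raw = label_frames(boxes, codes)
print(fill_gaps(boxes, raw))
# prints [None, 'tube_A', 'tube_A', 'tube_A', None, None]
```

Forward filling is a deliberately simple choice for the sketch. The last frame stays unlabeled because no code was decoded after the object was picked up again, which shows why detection with codes alone leaves gaps.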

Updated: 2024-04-05