Language Models are Free Boosters for Biomedical Imaging Tasks
arXiv - CS - Machine Learning, Pub Date: 2024-03-26, DOI: arxiv-2403.17343
Zhixin Lai, Jing Wu, Suiyao Chen, Yucheng Zhou, Anna Hovakimyan, Naira Hovakimyan

In this study, we uncover the unexpected efficacy of residual-based large language models (LLMs) as part of encoders for biomedical imaging tasks, a domain traditionally devoid of language or textual data. The approach diverges from established methodologies by utilizing a frozen transformer block, extracted from pre-trained LLMs, as an innovative encoder layer for the direct processing of visual tokens. This strategy represents a significant departure from the standard multi-modal vision-language frameworks, which typically hinge on language-driven prompts and inputs. We found that these LLMs could boost performance across a spectrum of biomedical imaging applications, including both 2D and 3D visual classification tasks, serving as plug-and-play boosters. More interestingly, as a byproduct, we found that the proposed framework achieved superior performance, setting new state-of-the-art results on extensive, standardized datasets in MedMNIST-2D and 3D. Through this work, we aim to open new avenues for employing LLMs in biomedical imaging and enriching the understanding of their potential in this specialized domain.
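
The core idea in the abstract, a frozen pre-trained transformer block inserted as an encoder layer that processes visual tokens directly, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `FrozenBlockBooster`, the projection layers, the dimensions, and the mean-pooling classifier head are all hypothetical, and a randomly initialized `nn.TransformerEncoderLayer` stands in for a block that would in practice be taken from a pre-trained LLM. The skip connection around the frozen block is one reading of "residual-based" in the abstract.

```python
import torch
import torch.nn as nn


class FrozenBlockBooster(nn.Module):
    """Hypothetical sketch: wrap a frozen transformer block with trainable projections."""

    def __init__(self, vis_dim=64, llm_dim=128, num_heads=4, num_classes=9):
        super().__init__()
        # Trainable projection from the visual token width to the block width.
        self.proj_in = nn.Linear(vis_dim, llm_dim)
        # Stand-in for a block extracted from a pre-trained LLM (assumption:
        # a real setup would load pre-trained weights here).
        self.block = nn.TransformerEncoderLayer(
            d_model=llm_dim,
            nhead=num_heads,
            dim_feedforward=4 * llm_dim,
            dropout=0.0,
            batch_first=True,
        )
        # Freeze the block: its parameters receive no gradient updates.
        for p in self.block.parameters():
            p.requires_grad_(False)
        # Trainable projection back to the visual token width.
        self.proj_out = nn.Linear(llm_dim, vis_dim)
        # Simple classification head over mean-pooled tokens (assumption).
        self.head = nn.Linear(vis_dim, num_classes)

    def forward(self, tokens):
        # tokens: (batch, num_tokens, vis_dim), e.g. patch embeddings of an image.
        boosted = self.proj_out(self.block(self.proj_in(tokens)))
        # Residual connection around the frozen block.
        tokens = tokens + boosted
        return self.head(tokens.mean(dim=1))


model = FrozenBlockBooster()
visual_tokens = torch.randn(2, 49, 64)  # 2 images, 49 visual tokens each
logits = model(visual_tokens)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Only the projections and the head appear in `trainable`, which is what makes the block a plug-and-play addition: the surrounding vision encoder trains as usual while the borrowed block stays fixed.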

Updated: 2024-03-27