Visual question answering on blood smear images using convolutional block attention module powered object detection
The Visual Computer (IF 3.5), Pub Date: 2024-04-09, DOI: 10.1007/s00371-024-03359-6
A. Lubna, Saidalavi Kalady, A. Lijiya

One of the vital characteristics that determine a person's health is the shape and number of the red blood cells, white blood cells and platelets in their blood. Any abnormality in these characteristics indicates that the person may be suffering from a disease such as anaemia, leukaemia or thrombocytosis. Blood cells are conventionally counted through microscopic studies after suitable chemical substances have been applied to the blood. These conventional methods are labour-intensive, time-consuming and costly, and they require highly skilled medical professionals. This paper proposes a novel scheme for analysing an individual's blood sample with a visual question answering (VQA) system, which accepts a blood smear image as input and answers questions about the sample, such as the number of blood cells and the nature of abnormalities, very quickly and without requiring the service of a skilled medical professional. In VQA, the computer generates textual answers to questions about an input image. Solving this difficult problem requires visual understanding, question comprehension and deductive reasoning. The proposed approach uses a convolutional neural network for question categorisation and an object detector with an attention mechanism for visual comprehension. Experiments were conducted with two types of attention, (1) the convolutional block attention module and (2) the squeeze-and-excitation network, and the attention mechanism facilitates very fast and reliable results. Because no public dataset was available, a VQA dataset was created for this study. The proposed system achieved an accuracy of 94% on numeric-response and yes/no questions and a BLEU score of 0.91. The attention-based object recognition model that the system uses for counting achieved an accuracy of 97%, 100% and 98% for red blood cell, white blood cell and platelet counts, respectively, an improvement of 1%, 0.06% and 1.61% over the state-of-the-art model.
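As a rough illustration of how a numeric-response question could be answered from object-detector output, the sketch below counts detections per cell class and looks up the class named in the question. It is not the authors' implementation: the detection list, the class labels (`RBC`, `WBC`, `Platelet`), the confidence threshold and the keyword lookup are all assumptions made for this example, and the keyword lookup merely stands in for the convolutional question-categorisation network described in the abstract.

```python
from collections import Counter

# Hypothetical detector output: one (class label, confidence) pair per detected cell.
detections = [
    ("RBC", 0.98), ("RBC", 0.91), ("RBC", 0.42),
    ("WBC", 0.95),
    ("Platelet", 0.88), ("Platelet", 0.67),
]


def count_cells(detections, min_confidence=0.5):
    """Count detections per class, ignoring boxes below the (assumed) threshold."""
    return Counter(label for label, score in detections if score >= min_confidence)


def answer_count_question(question, counts):
    """Answer a 'how many' question via a simple keyword lookup.

    The lookup is a stand-in for a learned question classifier.
    """
    keywords = {"red blood": "RBC", "white blood": "WBC", "platelet": "Platelet"}
    q = question.lower()
    for phrase, label in keywords.items():
        if phrase in q:
            return str(counts[label])
    return "unknown"


counts = count_cells(detections)
print(answer_count_question("How many red blood cells are in the image?", counts))
```

Here the low-confidence red blood cell detection (0.42) is dropped before counting, so the answer depends directly on the detector's accuracy, which is why the abstract reports per-class counting accuracy separately from the overall VQA accuracy.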




Updated: 2024-04-10