Merging databases for CNN image recognition, increasing bias or improving results?

Marine Micropaleontology (IF 1.9), Pub Date: 2023-10-11, DOI: 10.1016/j.marmicro.2023.102296
Martin Tetard, Veronica Carlsson, Mathias Meunier, Taniel Danelian

Automated microscopy, image processing, and recognition using artificial intelligence are attracting growing interest from the scientific community, as more and more research centres actively build image datasets for training convolutional neural networks (CNNs) to identify microscopic objects. However, images acquired at different institutes may differ in light and contrast intensity, which can bias identification when using datasets or CNNs from another institute.

One might therefore ask whether combining datasets acquired under different conditions is likely to improve the efficiency of the resulting CNN, by increasing the number of images and integrating lighting variability into the learning process, or whether it instead biases CNN training by adding recurrent noise, common to all classes, through substantial variability in light and contrast.

In order to ease collaboration between laboratories, two datasets of middle Eocene radiolarian images, acquired separately at GNS Science (NZ) and the University of Lille (France), were generated to assess the accuracy of CNNs trained on both datasets individually, and also when combined into a third dataset. The performance of the three resulting CNNs was then evaluated on new images acquired at both institutions.
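The evaluation design described above (three models, each tested on new images from both institutions) can be sketched as a small cross-evaluation table. This is only an illustrative sketch under loud assumptions: the data are synthetic two-feature vectors rather than radiolarian images, a nearest-centroid classifier stands in for the CNN, and the names `make_dataset`, `cross_evaluate`, `site_a`, `site_b`, and the `brightness` offset are hypothetical. It is not the authors' pipeline and says nothing about their results.

```python
import random


def make_dataset(n_per_class, brightness, seed):
    """Build a synthetic stand-in for an image set (assumption, not real data).

    Each 'image' is a 2-feature vector. `brightness` is a hypothetical
    per-institute offset added to every class, mimicking a lighting shift.
    """
    rng = random.Random(seed)
    class_centres = {"taxon_a": (0.0, 0.0), "taxon_b": (3.0, 3.0)}
    data = []
    for label, (cx, cy) in class_centres.items():
        for _ in range(n_per_class):
            x = cx + brightness + rng.gauss(0, 1)
            y = cy + brightness + rng.gauss(0, 1)
            data.append(((x, y), label))
    return data


def train(data):
    """Nearest-centroid 'model' (CNN stand-in): mean feature vector per class."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(model, features):
    """Return the class whose centroid is closest to the feature vector."""
    x, y = features
    return min(
        model,
        key=lambda lab: (model[lab][0] - x) ** 2 + (model[lab][1] - y) ** 2,
    )


def accuracy(model, data):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(model, feats) == label for feats, label in data)
    return correct / len(data)


def cross_evaluate(seed=0):
    """Train on site A, site B, and their union; test each on both sites."""
    train_a = make_dataset(200, brightness=0.0, seed=seed)
    train_b = make_dataset(200, brightness=1.5, seed=seed + 1)
    training_sets = {
        "site_a": train_a,
        "site_b": train_b,
        "combined": train_a + train_b,
    }
    # Held-out sets play the role of the new images acquired at each site.
    test_sets = {
        "site_a": make_dataset(100, brightness=0.0, seed=seed + 2),
        "site_b": make_dataset(100, brightness=1.5, seed=seed + 3),
    }
    return {
        train_name: {
            test_name: accuracy(train(train_data), test_data)
            for test_name, test_data in test_sets.items()
        }
        for train_name, train_data in training_sets.items()
    }


if __name__ == "__main__":
    for train_name, row in cross_evaluate().items():
        print(train_name, row)
```

Reading the table row by row shows how each training set transfers to each acquisition site. With real data, the same layout would make it easy to see whether the combined model helps or hurts relative to the single-site models.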

Finally, the new radiolarian dataset generated at GNS made it easy to detect unknown taxa, which are otherwise abundant in the studied material. Seven new species are described: Ceratospyris metroid n. sp., Ceratospyris okazakii n. sp., Desmospyris biloba n. sp., Botryostrobus lagena n. sp., Buryella apiculata n. sp., Lophocyrtis cortesei n. sp., and Cromyosphaera fulgurans n. sp.

Updated: 2023-10-11