

Computational metadata generation methods for biological specimen image collections
International Journal on Digital Libraries. Pub Date: 2022-11-23. DOI: 10.1007/s00799-022-00342-1
Kevin Karnani, Joel Pepper, Yasin Bakiş, Xiaojun Wang, Henry Bart, David E. Breen, Jane Greenberg

Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens available online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research on the Illinois Natural History Survey (INHS) collection (7244 specimen images), which used computational approaches to analyze image quality and then automatically generate 22 metadata properties representing the image quality and morphological features of the specimens. In the research reported here, we extend that initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). Further, we enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these changes reduced our overall error rate from 4.6% to 1.1%. They also allowed us to compute an additional set of 17 image-based metadata properties, which provide supplemental features and information that may also be used to analyze and classify the fish specimens. Examples of these new features include convex area, eccentricity, perimeter, and skew. The refined process further outperforms humans in time and labor cost, as well as in accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates that computational methods can enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
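To make the image-based properties named in the abstract more concrete, here is a minimal sketch of how area, perimeter, eccentricity, and convex area could be derived from a binary specimen mask. It is not the authors' pipeline: the function names (`shape_properties`, `convex_hull`, `polygon_area`), the mask representation (a list of rows of 0/1), and the specific definitions (edge-count perimeter, hull of pixel corners, moment-based eccentricity) are all assumptions chosen for illustration. Skew is left out because the abstract does not say how it is defined.

```python
import math


def _cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    points = sorted(set(points))
    if len(points) <= 2:
        return points
    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def shape_properties(mask):
    """Illustrative shape descriptors for a binary mask (list of rows of 0/1)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    # Centroid and second-order central moments of the foreground pixels.
    cr = sum(r for r, _ in pixels) / area
    cc = sum(c for _, c in pixels) / area
    mu_rr = sum((r - cr) ** 2 for r, _ in pixels) / area
    mu_cc = sum((c - cc) ** 2 for _, c in pixels) / area
    mu_rc = sum((r - cr) * (c - cc) for r, c in pixels) / area

    # Eigenvalues of the 2x2 moment matrix; their ratio gives the eccentricity
    # of the ellipse that has the same second moments as the region.
    half_trace = (mu_rr + mu_cc) / 2
    spread = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    lam_max = half_trace + spread
    lam_min = max(half_trace - spread, 0.0)
    eccentricity = math.sqrt(1 - lam_min / lam_max) if lam_max > 0 else 0.0

    # Perimeter as the number of pixel edges that face background
    # (4-connectivity); the image border counts as background.
    foreground = set(pixels)
    perimeter = sum(
        1
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in foreground
    )

    # Convex area as the polygon area of the hull around all pixel corners.
    corners = {(r + dr, c + dc) for r, c in pixels for dr in (0, 1) for dc in (0, 1)}
    convex_area = polygon_area(convex_hull(corners))

    return {
        "area": area,
        "perimeter": perimeter,
        "eccentricity": eccentricity,
        "convex_area": convex_area,
    }


if __name__ == "__main__":
    # A small L-shaped region standing in for a segmented specimen.
    l_shape = [
        [1, 0, 0],
        [1, 0, 0],
        [1, 1, 1],
    ]
    print(shape_properties(l_shape))
```

Image-processing libraries expose similar region measurements, but their definitions can differ from the ones used here (for example, counting hull pixels rather than measuring hull polygon area), so values from this sketch are not directly comparable to the paper's metadata.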


Updated: 2022-11-25
