Information Science & Library Science; Science & Technology; Technology
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens that can be found online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research conducted with the Illinois Natural History Survey (INHS) collection (7244 specimen images), which used computational approaches to analyze image quality and then automatically generate 22 metadata properties representing the image quality and morphological features of the specimens. In the research reported here, we demonstrate the extension of our initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). Further, we enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these new methods reduced our overall error rate from 4.6% to 1.1%. These enhancements also allowed us to compute an additional set of 17 image-based metadata properties. The new metadata properties provide supplemental features and information that may also be used to analyze and classify the fish specimens. Examples of these new features include convex area, eccentricity, perimeter, and skew. The newly refined process further outperforms humans in terms of time and labor cost, as well as accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
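To illustrate the kind of image-based properties the abstract names (area, perimeter, eccentricity), the following is a minimal sketch of how such descriptors could be computed from a binary segmentation mask. It is not the authors' pipeline: the function name `shape_descriptors`, the mask input, and the simplified perimeter definition (a count of boundary pixels) are all assumptions made for illustration, and the paper's own definitions may differ.

```python
import numpy as np


def shape_descriptors(mask):
    """Compute simple shape descriptors for one binary object mask.

    Hypothetical helper for illustration only; not the method used in the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    rows, cols = np.nonzero(mask)
    area = int(rows.size)
    if area == 0:
        raise ValueError("mask contains no foreground pixels")

    # Centroid of the foreground pixels.
    r0, c0 = rows.mean(), cols.mean()

    # Second central moments (covariance of the pixel coordinates).
    dr, dc = rows - r0, cols - c0
    cov = np.array([
        [np.mean(dr * dr), np.mean(dr * dc)],
        [np.mean(dr * dc), np.mean(dc * dc)],
    ])

    # Eigenvalues in ascending order: variance along the minor and major axes.
    minor, major = np.linalg.eigvalsh(cov)

    # Eccentricity of the ellipse with the same second moments:
    # 0 for a circle-like shape, approaching 1 for an elongated one.
    if major > 0:
        eccentricity = float(np.sqrt(max(0.0, 1.0 - minor / major)))
    else:
        eccentricity = 0.0

    # Simplified perimeter (assumption): number of foreground pixels that
    # have at least one background pixel among their 4-connected neighbours.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[:-2, 1:-1] & padded[2:, 1:-1]
        & padded[1:-1, :-2] & padded[1:-1, 2:]
    )
    perimeter = int((mask & ~interior).sum())

    return {
        "area": area,
        "centroid": (float(r0), float(c0)),
        "eccentricity": eccentricity,
        "perimeter": perimeter,
    }


if __name__ == "__main__":
    # Toy mask: a 3 x 8 rectangle standing in for a segmented specimen.
    toy = np.zeros((7, 12), dtype=bool)
    toy[2:5, 2:10] = True
    print(shape_descriptors(toy))
```

Established image-processing libraries compute perimeter and related properties with more careful definitions, so values from this sketch should not be expected to match the published metadata.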
Computational metadata generation methods for biological specimen image collections
Creators
Kevin Karnani - Drexel University
Joel Pepper - Drexel University
Yasin Bakis - Tulane University
Xiaojun Wang - Tulane University
Henry Bart - Tulane University
David E. Breen - Drexel University
Jane Greenberg - Drexel University
Publication Details
International journal on digital libraries
Publisher
Springer Nature
Number of pages
18
Grant note
1940233; 1940322 / NSF Office of Advanced Cyberinfrastructure (OAC); National Science Foundation (NSF); NSF - Directorate for Computer & Information Science & Engineering (CISE)
Resource Type
Journal article
Language
English
Academic Unit
Computer Science
Web of Science ID
WOS:000886817000001
Scopus ID
2-s2.0-85142431045
Other Identifier
991020531935404721