Journal article
Computational metadata generation methods for biological specimen image collections
International journal on digital libraries
23 Nov 2022
Featured in Collection: UN Sustainable Development Goals @ Drexel
Abstract
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens that can be found online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research conducted with the Illinois Natural History Survey (INHS) collection (7244 specimen images) that uses computational approaches to analyze image quality, and then automatically generates 22 metadata properties representing the image quality and morphological features of the specimens. In the research reported here, we demonstrate the extension of our initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). Further, we enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these new methods reduced our overall error rate from 4.6% to 1.1%. These enhancements also allowed us to compute an additional set of 17 image-based metadata properties. The new metadata properties provide supplemental features and information that may also be used to analyze and classify the fish specimens. Examples of these new features include convex area, eccentricity, perimeter, and skew. The newly refined process further outperforms humans in terms of time and labor cost, as well as accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
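To make the shape-based properties named in the abstract more concrete, the following is a minimal sketch of how area, perimeter, convex area, and eccentricity could be computed from a binary specimen mask. It is not the authors' pipeline: the function names `mask_features` and `convex_hull`, the edge-counting perimeter, and the moment-based eccentricity are all assumptions chosen for illustration, and skew is left out because the abstract does not define it.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; expects sorted, unique (x, y) points."""
    if len(points) < 3:
        return list(points)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def mask_features(mask):
    """Illustrative shape features for a binary mask (rows of 0/1 values)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask has no foreground pixels")
    filled = set(pixels)
    area = len(pixels)

    # Perimeter (assumed definition): number of pixel edges that face
    # background or the image border.
    perimeter = sum(
        (r + dr, c + dc) not in filled
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )

    # Convex area: shoelace area of the convex hull of all pixel corners.
    corners = sorted(
        {(c + dx, r + dy) for r, c in pixels for dx in (0, 1) for dy in (0, 1)}
    )
    hull = convex_hull(corners)
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1])
    )
    convex_area = abs(twice_area) / 2

    # Eccentricity of the ellipse with the same second central moments
    # as the foreground pixel centers (0 for a circle-like shape).
    mean_r = sum(r for r, _ in pixels) / area
    mean_c = sum(c for _, c in pixels) / area
    var_r = sum((r - mean_r) ** 2 for r, _ in pixels) / area
    var_c = sum((c - mean_c) ** 2 for _, c in pixels) / area
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area
    spread = math.sqrt((var_r - var_c) ** 2 + 4 * cov ** 2)
    major = (var_r + var_c + spread) / 2
    minor = (var_r + var_c - spread) / 2
    eccentricity = math.sqrt(max(0.0, 1 - minor / major)) if major > 0 else 0.0

    return {
        "area": area,
        "perimeter": perimeter,
        "convex_area": convex_area,
        "eccentricity": eccentricity,
    }
```

In practice such a mask would come from a segmentation step, and an image-processing library would likely use more refined perimeter estimates than the edge count used here.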
Details
- Title
- Computational metadata generation methods for biological specimen image collections
- Creators
- Kevin Karnani (Drexel University); Joel Pepper (Drexel University); Yasin Bakis (Tulane University); Xiaojun Wang (Tulane University); Henry Bart (Tulane University); David E. Breen (Drexel University); Jane Greenberg (Drexel University)
- Publication Details
- International journal on digital libraries
- Publisher
- Springer Nature
- Number of pages
- 18
- Grant note
- 1940233; 1940322 / NSF Office of Advanced Cyberinfrastructure (OAC); National Science Foundation (NSF); NSF - Directorate for Computer & Information Science & Engineering (CISE)
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Computer Science
- Web of Science ID
- WOS:000886817000001
- Scopus ID
- 2-s2.0-85142431045
- Other Identifier
- 991020531935404721
InCites Highlights
Data related to this publication, from InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- Web of Science research areas
- Information Science & Library Science