Automatic Metadata Generation for Fish Specimen Image Collections
Conference proceeding   Open access

Joel Pepper, Jane Greenberg, Yasin Bakis, Xiaojun Wang, Henry Bart and David Breen
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)
Sep 2021
url
https://doi.org/10.1101/2021.10.04.463070

Abstract

applied machine learning; bioinformatics; Feature extraction; Fish; image analysis; Libraries; Machine learning; Measurement; Metadata; Ontologies
Metadata are key descriptors of research data, particularly for researchers seeking to apply machine learning (ML) to the vast collections of digitized specimens. Unfortunately, the available metadata are often sparse and, at times, erroneous. Additionally, it is prohibitively expensive to address these limitations through traditional, manual means. This paper reports on research that applies machine-driven approaches to analyzing digitized fish images and extracting various important features from them. The digitized fish specimens are being analyzed as part of the Biology Guided Neural Networks (BGNN) initiative, which is developing a novel class of artificial neural networks using phylogenies and anatomy ontologies. Automatically generated metadata are crucial for identifying the high-quality images needed for the neural network's predictive analytics. Methods that combine ML and image informatics techniques allow us to rapidly enrich the existing metadata associated with the 7,244 images from the Illinois Natural History Survey (INHS) used in our study. Results show we can accurately generate many key metadata properties relevant to the BGNN project, as well as general image quality metrics (e.g., brightness and contrast). Results also show that we can accurately generate bounding boxes and segmentation masks for fish, which are needed for subsequent machine learning analyses. The automatic process outperforms humans in terms of time and accuracy, and provides a novel solution for leveraging digitized specimens in ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide.
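
The abstract names brightness and contrast as examples of automatically generated image quality metrics but does not define them in this record. As a minimal sketch only, the following assumes one common convention: brightness as the mean grayscale intensity and contrast as the standard deviation of intensity (RMS contrast), both scaled to the range 0 to 1. The function name `brightness_and_contrast` and the nested-list image representation are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pstdev


def brightness_and_contrast(gray):
    """Return (brightness, contrast) for a grayscale image.

    `gray` is a 2D list of pixel intensities in [0, 255].
    Assumed definitions (not confirmed by the source):
      brightness = mean intensity / 255
      contrast   = population standard deviation of intensity / 255
    """
    # Flatten the rows into a single list of pixel values.
    pixels = [p for row in gray for p in row]
    if not pixels:
        raise ValueError("image has no pixels")
    brightness = fmean(pixels) / 255.0
    contrast = pstdev(pixels) / 255.0
    return brightness, contrast


# Toy 2x2 checkerboard: half the pixels are black, half are white.
checker = [[0, 255], [255, 0]]
b, c = brightness_and_contrast(checker)

# A uniform mid-gray image has no variation, so its contrast is zero.
flat = [[128, 128], [128, 128]]
fb, fc = brightness_and_contrast(flat)
```

Under these assumed definitions, metrics of this kind could be stored alongside each image record and used to filter out under-exposed or low-contrast images before further analysis.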

Metrics

22 Record Views
5 citations in Scopus

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#2 Zero Hunger

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
Web of Science research areas
Computer Science, Interdisciplinary Applications
Information Science & Library Science