Flexible metadata pipelines are crucial for supporting the FAIR data
principles. Despite this need, researchers seldom report their approaches for
identifying metadata standards and protocols that support optimal flexibility.
This paper reports on an initiative targeting the development of a flexible
metadata pipeline for a collection containing over 300,000 digital fish
specimen images, harvested from multiple data repositories and fish
collections. The images and their associated metadata are being used for
AI-related scientific research involving automated species identification,
segmentation and trait extraction. The paper provides contextual background,
then presents a four-phase approach: (1) Assessment of the Problem,
(2) Investigation of Solutions, (3) Implementation, and (4) Refinement.
The work is part of the NSF Harnessing the Data Revolution, Biology
Guided Neural Networks (NSF/HDR-BGNN) project and the HDR Imageomics Institute.
An RDF graph prototype pipeline is presented, followed by a discussion of
research implications and a conclusion summarizing the results.
Title
Toward a Flexible Metadata Pipeline for Fish Specimen Images