Geologic data standardization for database entry: Preparing diverse datasets for hosting and accessibility
Conference proceeding   Open access

Leah LeVay, Andrew Fraass, Shanan Peters, Jocelyn Sessa, Seth Kaufman and Wai-Yin Kwan
01 Jan 2022
url
https://doi.org/10.6084/m9.figshare.20123972
Preprint (Author's original), Open Access (License Unspecified)

Abstract

Aggregating large numbers of datasets into databases opens new research avenues and makes information easier to discover. When the datasets come from multiple sources spanning decades, however, harmonizing them can become onerous. Extending Ocean Drilling Pursuits (eODP) is an EarthCube-funded project that compiles scientific ocean drilling data spanning six decades and migrates them into three representations: an eODP-specific aggregation database, the Macrostrat database, and the Paleobiology Database (PBDB). Sediment rock descriptions and microfossil assemblage data have been stored as flat files in various places and formats depending on the year of collection, which limits their utility. Additionally, as methodologies evolved over the years, new and different information was captured in loose, customizable formats that require translation by content experts. A major goal of eODP is to centralize and harmonize much of this information. Preparing the raw data for ingestion into Macrostrat and the PBDB has required intensive effort, including entering taxonomic opinions for every microfossil genus, cleaning and editing microfossil names, and standardizing and cross-walking column header names. This standardization and database preparation has combined computational cleaning and formatting with manual cleaning and data entry. To format all of the files consistently, a new database, the eODP database, was created. It contains all of the retrieved files, with cross-walking completed and data staged for migration into Macrostrat and the PBDB. Before microfossil assemblage data could be transferred to the PBDB, all taxonomic opinions were manually entered and fossil name errors or misspellings were manually reviewed. Furthermore, stratigraphic age information and sediment rock descriptions not stored in a digital format have required manual database entry. Setting the foundation for eODP has been complex because of inconsistencies within files and the sheer volume of data, but progress is being made toward the best outcome for the community.
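
As an illustration of what cross-walking column header names can involve, the following minimal Python sketch maps legacy header variants onto a single standardized name and tidies whitespace in cell values. It is not the eODP implementation: the crosswalk entries, the standardized field names, and the function names are all hypothetical and chosen only for the example.

```python
import csv
import io

# Hypothetical crosswalk: legacy header variants -> one standardized name.
# These entries are illustrative assumptions, not the eODP mapping.
HEADER_CROSSWALK = {
    "top depth (mbsf)": "top_depth_mbsf",
    "top_depth": "top_depth_mbsf",
    "lithology": "lithology",
    "lith.": "lithology",
    "taxon": "taxon_name",
    "species": "taxon_name",
}


def normalize_header(header):
    """Return the standardized name for a header, or None if unmapped."""
    # Lowercase and collapse whitespace so spelling variants share one key.
    key = " ".join(header.strip().lower().split())
    return HEADER_CROSSWALK.get(key)


def standardize_rows(text):
    """Parse CSV text; return (rows with standardized headers, unmapped headers)."""
    reader = csv.DictReader(io.StringIO(text))
    headers = reader.fieldnames or []
    # Unmapped headers are reported rather than guessed, so that a content
    # expert can decide how each one should be translated.
    unmapped = [h for h in headers if normalize_header(h) is None]
    rows = []
    for raw in reader:
        row = {}
        for header in headers:
            name = normalize_header(header)
            if name is not None:
                # Collapse stray whitespace inside values such as taxon names.
                row[name] = " ".join((raw.get(header) or "").split())
        rows.append(row)
    return rows, unmapped


legacy = "Top Depth (mbsf),Lith.,Species,Notes\n1.5,nannofossil ooze, Globigerina  bulloides ,ok\n"
rows, unmapped = standardize_rows(legacy)
```

Returning the unmapped headers separately reflects the mix of computational and manual work described above: the script handles known variants, and anything it does not recognize is left for manual review.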
