Aggregating large numbers of datasets into databases opens new research avenues and makes information easier to discover. When the datasets come from multiple sources spanning decades, however, harmonizing them can be onerous. Extending Ocean Drilling Pursuits (eODP) is an EarthCube-funded project that compiles six decades of scientific ocean drilling data and migrates them into three representations: an eODP-specific aggregation database, the Macrostrat database, and the Paleobiology Database (PBDB). Sediment rock descriptions and microfossil assemblage data have been stored as flat files in various places and formats depending on the year of collection, which limits their utility. Additionally, as methodologies evolved over the years, new and different information was captured in loose, customizable formats that require translation by content experts. A major goal of eODP is to centralize and harmonize much of this information. Preparing the raw data for ingestion into Macrostrat and the PBDB has required intensive effort, including entering taxonomic opinions for every microfossil genus, cleaning and editing microfossil names, and standardizing and cross-walking column header names. This standardization and database preparation has combined computational cleaning and formatting with manual cleaning and data entry. To format all of the files consistently, a new database, the eODP database, was created. It holds all of the retrieved files, with cross-walking completed and data staged for migration into Macrostrat and the PBDB. Before microfossil assemblage data could be transferred to the PBDB, all taxonomic opinions were manually entered, and fossil name errors and misspellings were manually reviewed and corrected.
Furthermore, stratigraphic age information and sediment rock descriptions not stored in a digital format have required manual database entry. Laying the foundation for eODP has been complex because of inconsistencies within files and the sheer volume of data, but the project is making the progress needed to produce the best outcome for the community.
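As a rough illustration of the column header cross-walking mentioned above, the following is a minimal Python sketch, not the eODP implementation. The header names, the `CROSSWALK` table, and the functions `normalize` and `crosswalk_row` are all hypothetical and chosen only for illustration. The sketch maps variant source headers onto one standardized name and sets aside any header it cannot map so that a content expert can review it.

```python
# Hypothetical crosswalk table: normalized source header -> standardized name.
# These header names are illustrative assumptions, not taken from eODP.
CROSSWALK = {
    "top depth (cm)": "top_depth_cm",
    "top_depth": "top_depth_cm",
    "lithology": "lithology",
    "rock type": "lithology",
}


def normalize(header):
    """Lowercase a header and collapse whitespace so variants compare equal."""
    return " ".join(header.strip().lower().split())


def crosswalk_row(row, crosswalk=CROSSWALK):
    """Rename the keys of one record.

    Returns a tuple (renamed, unmapped): the record with standardized keys,
    and the list of original headers that had no crosswalk entry.
    """
    renamed, unmapped = {}, []
    for header, value in row.items():
        target = crosswalk.get(normalize(header))
        if target is None:
            unmapped.append(header)  # leave for manual review
        else:
            renamed[target] = value
    return renamed, unmapped


if __name__ == "__main__":
    row = {"  Top Depth (cm) ": "12.5", "Rock  Type": "ooze", "Observer": "XY"}
    print(crosswalk_row(row))
```

Returning unmapped headers instead of discarding them reflects the mix of computational and manual cleaning described in the abstract: the script handles known variants, and everything else is left for a person to resolve.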
Details
Title
Geologic data standardization for database entry: Preparing diverse datasets for hosting and accessibility
Creators
Leah LeVay
Andrew Fraass
Shanan Peters
Jocelyn Sessa
Seth Kaufman
Wai-Yin Kwan
Publisher
figshare
Resource Type
Conference proceeding
Language
English
Academic Unit
Biodiversity, Earth, and Environmental Science (BEES)