Conference proceeding
AI-Ready Data: Knowledge Extraction from Archival Lab Notebooks
IEEE International Conference on Big Data (Print), pp 2489-2495
15 Dec 2024
Abstract
Collections of analog lab notebooks are an invaluable source of data about research conditions, steps, and outcomes, and in aggregate have the potential to provide new insights into the successes, failures, and pedagogy of research laboratories. Unfortunately, these artifacts are increasingly at risk of being lost from the historical scientific record, given limited archiving and an absence of computational and AI readiness. This paper reports on research addressing this challenge by testing mechanisms for transforming digital scans of analog lab notebooks into AI-ready data resources. The research is framed by the field of computational archival science (CAS) and aims to make analog research lab notebook data usable for scientific study. The paper presents background context on archival lab notebooks and CAS; discusses the synthesis of metal-organic frameworks (MOFs) and covalent organic frameworks (COFs), the scientific domain of the lab notebooks under study; and details our research methods. We demonstrate a promising approach that automatically segments pages into discrete entry types, extracts the contents of those entries, refines the output, and assesses the automated results. These efforts represent a first step towards developing a framework for both improving the usability of archival lab notebooks and enabling their contents to be used in subsequent scientific inquiry.
Details
- Creators
- Joel Pepper - Drexel University; Elizabeth Jones - Northeastern University; Xintong Zhao - Drexel University; Jacob Furst - University of Central Florida; Kyle Langlois - University of Central Florida; Fernando Uribe-Romo - University of Central Florida; David Breen - Drexel University; Jane Greenberg - Drexel University
- Publisher
- IEEE
- Grant note
- National Science Foundation (10.13039/100000001)
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Information Science (Informatics); Computer Science (Computing)
- Scopus ID
- 2-s2.0-85218053327
- Other Identifier
- 991022020336004721