A unified framework for collecting, annotating, modeling and prediction from phenotypes in multi-institutional pediatric data resources
Dissertation   Open access


Alex S. Felmeister
Doctor of Philosophy (Ph.D.), Drexel University
Apr 2019
DOI:
https://doi.org/10.17918/0p1j-yq02

Abstract

Information science; Data integration (Computer science); Brain--Tumors; Pediatrics--Research; Bioinformatics; Medical Informatics; Phenotype
Predictive models built on sequential observational clinical data have made headway in using the large data sets that health systems collect through electronic medical record (EMR) and billing systems. These models predict outcomes, help physicians make decisions through clinical decision support (CDS), and may change the course of treatment for patients suffering from some of the most common diseases and illnesses. For diseases that affect a large portion of the population, standardized clinical data observations can be used to create new models that predict precise treatment options for individuals. This is referred to as precision medicine, and these treatments are derived from techniques in machine learning, artificial intelligence and other areas that data scientists are bringing to the surface in healthcare. The question remains: is this data useful for those affected by rare diseases? Unlike diseases that are common in the population, such as asthma or diabetes, rare diseases do not have characteristics that are well described in, or easily derived from, clinical records. Pediatric cancer, and specifically brain tumors in children, severely lacks the sample size, consistent data capture and simple infrastructure needed to collect data. This slows the pace at which scientists can make breakthroughs in precision medicine compared with more common diseases. With the attention and buzz around the disciplines under the umbrella of "artificial intelligence" in precision medicine, especially in cancer, collaborations have been created to pool resources, data and biomaterials. In parallel, the United States government has over the years dedicated money and resources to large-scale health data programs called Clinical Data Resource Networks (CDRNs), in which standard observational clinical data is integrated across institutions in one common data model to make large, relevant sample sizes available for observational research.
In this research, we investigate the intersection of a specific rare disease repository, the Children's Brain Tumor Tissue Consortium (CBTTC), and a CDRN, PEDSnet, with the goal of leveraging the CDRN to ease the burden of human data entry in the CBTTC and make data available faster to more researchers across multiple disciplines. We use PEDSnet data from children diagnosed with malignant brain tumors to try to predict a highly malignant type of brain tumor, primitive neuroectodermal tumors (PNETs), and to reproduce a data set similar to one used in a physical laboratory project developed by researchers at Seattle Children's Hospital (SCH), Children's Hospital of Philadelphia (CHOP) and Toronto Hospital for Sick Children. We use a threefold pipeline of exploratory data analysis, multi-modal data transformation and predictive analytics to derive a data-driven phenotype from a sample of more than 1,900 brain tumor cases from PEDSnet originating at CHOP and SCH. Our hypothesis is that the CDRN could contain a set of data that can be derived through traditional machine learning techniques and used to supplement the current manual data entry and abstraction process, bringing automation to data harmonization across institutions where longitudinal data is required for biomedical research. This research is supported by the iFellowship Program administered by the University of Pittsburgh as part of the iSchool Consortium and funded by the Andrew W. Mellon Foundation. It reflects the goals of the Committee on Coherence at Scale for Higher Education: it aims to take information that is usually 'siloed' in a lab and make it available to more minds from different domains, using a national organization with large sets of patient data to bring an analog project that depends heavily on human data entry to scale.


