Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Learning
Creation and curation of knowledge graphs can accelerate disease discovery
and analysis in real-world data. While disease ontologies aid in biological
data annotation, codified categories (SNOMED-CT, ICD10, CPT) may not capture
patient condition nuances or rare diseases. Multiple disease definitions across
data sources complicate ontology mapping and disease clustering. We propose
creating patient knowledge graphs using large language model extraction
techniques, allowing data extraction via natural language rather than rigid
ontological hierarchies. Our method maps to existing ontologies (MeSH,
SNOMED-CT, RxNORM, HPO) to ground extracted entities.
Using a large ambulatory care EHR database with 33.6M patients, we
demonstrate our method through the patient search for Dravet syndrome, which
received ICD10 recognition in October 2020. We describe our construction of
patient-specific knowledge graphs and symptom-based patient searches. Using
confirmed Dravet syndrome ICD10 codes as ground truth, we employ LLM-based
entity extraction to characterize patients in grounded ontologies. We then
apply this method to identify Beta-propeller protein-associated
neurodegeneration (BPAN) patients, demonstrating real-world discovery where no
ground truth exists.
Metrics
6 Record Views
Details
Title
Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search