Interpretable Metadata-Based Microbial Risk Prediction Using Large Language Models
Conference proceeding   Open access

Hyunwoo Ted Yoo and Gail L Rosen
BCB '25: Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, pp 1-10
10 Dec 2025
Featured in Collection: Research Supported by Drexel Libraries' OA Programs
URL: https://doi.org/10.1145/3765612.3767197
Published, Version of Record (VoR). Open Access via Drexel Libraries Read and Publish Program 2025. CC BY-NC V4.0.

Abstract

Keywords: Large language models; microbial risk prediction; metadata analysis; ontology classification; environmental biosurveillance
Environmental metadata provides a widely accessible signal for microbiome analysis, especially in settings where genomic sequencing is unavailable. However, traditional machine learning models often struggle to make reliable predictions using metadata alone. The potential of large language models (LLMs) to effectively utilize such metadata remains underexplored. In this work, we investigate whether LLMs can perform accurate microbial ontology classification and pathogen risk prediction using metadata alone. We evaluate models such as ChatGPT-4o, Claude 3.7, Grok-3, and LLaMA 4 on tasks including ontology classification based on the Earth Microbiome Project Ontology (EMPO 3) and binary prediction of E. coli contamination risk, across both zero-shot and few-shot settings. To further understand how these models make predictions, we conduct a comprehensive ablation study to identify which environmental features such as rainfall, turbidity, or temperature most influence performance. Our results show that LLMs not only achieve competitive predictive performance but also exhibit distinct patterns of reliance on specific metadata features. These patterns enable insights into biological decision factors without requiring access to genomic sequences. Our findings highlight the potential of LLMs as robust and interpretable tools for metadata-based microbiome analysis and environmental biosurveillance.
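
The feature-ablation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`build_prompt`, `ablation`, `toy_predict`), the prompt wording, the field names, and the four toy records are all assumptions made for illustration. In particular, `toy_predict` is a deterministic stand-in for a real LLM call. The sketch renders metadata as a text prompt (optionally with few-shot examples), then measures how much accuracy drops when each field is withheld.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, str]


def format_fields(metadata: Record) -> str:
    """Render metadata as 'key=value' pairs in a stable (sorted) order."""
    return "; ".join(f"{k}={v}" for k, v in sorted(metadata.items()))


def build_prompt(metadata: Record,
                 few_shot: Sequence[Tuple[Record, str]] = ()) -> str:
    """Build a zero-shot prompt, or a few-shot one if labeled examples are given."""
    lines = ["Predict E. coli contamination risk (high or low) from the metadata."]
    for example, label in few_shot:
        lines.append(format_fields(example) + " -> " + label)
    lines.append(format_fields(metadata) + " ->")  # the query is the last line
    return "\n".join(lines)


def accuracy(predict: Callable[[str], str], records: List[Record],
             labels: List[str], drop: str = "") -> float:
    """Fraction of correct predictions, optionally withholding one field."""
    correct = 0
    for record, label in zip(records, labels):
        kept = {k: v for k, v in record.items() if k != drop}
        if predict(build_prompt(kept)) == label:
            correct += 1
    return correct / len(records)


def ablation(predict: Callable[[str], str], records: List[Record],
             labels: List[str], features: List[str]) -> Dict[str, float]:
    """Accuracy drop caused by removing each feature in turn."""
    baseline = accuracy(predict, records, labels)
    return {f: baseline - accuracy(predict, records, labels, drop=f)
            for f in features}


def toy_predict(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it only looks at rainfall."""
    query = prompt.splitlines()[-1]
    return "high" if "rainfall=heavy" in query else "low"


# Made-up records for illustration only; not data from the paper.
records = [
    {"rainfall": "heavy", "turbidity": "high", "temperature": "warm"},
    {"rainfall": "light", "turbidity": "low", "temperature": "warm"},
    {"rainfall": "heavy", "turbidity": "low", "temperature": "cool"},
    {"rainfall": "light", "turbidity": "high", "temperature": "cool"},
]
labels = ["high", "low", "high", "low"]

drops = ablation(toy_predict, records, labels,
                 ["rainfall", "turbidity", "temperature"])
```

With this stand-in predictor, only removing `rainfall` lowers accuracy, which is the kind of reliance pattern an ablation is meant to expose. Swapping `toy_predict` for a function that queries an actual model would leave the rest of the sketch unchanged.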

Details

UN Sustainable Development Goals (SDGs)

This publication has contributed to the advancement of the following goals:

#3 Good Health and Well-Being
#6 Clean Water and Sanitation
#11 Sustainable Cities and Communities
#14 Life Below Water

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Web of Science research areas
Mathematical & Computational Biology
Medical Informatics