Conference proceeding
Identifying Physiochemical Variables important to LLM Prediction of Coliform Bacteria Presence
Companion Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 23
11 Oct 2025
Abstract
The objective of this study was to evaluate whether large language models (LLMs) can effectively predict the presence of coliform bacteria in water using physicochemical and sanitizer residual features, providing a rapid and low-cost alternative to conventional laboratory testing. This question is significant because coliform detection is a key public health concern, yet current methods are resource-intensive and not always feasible in real-time monitoring contexts. For data, we used experimental measurements from irrigation water that were collected from coliform-inoculated lettuce crops by the University of Arizona. The purpose of the experiment was to determine whether coliform bacteria could be detected after no irrigation treatment, peracetic acid (PAA) treatment, and calcium hypochlorite (Cl) addition to the water. In this study, we tested the importance of other water quality attributes for the prediction. To do so, we tested five LLMs (ChatGPT-4o, Claude 4, Llama 4, Grok 3, and Gemini 2.5) under both zero-shot conditions, where predictions relied solely on heuristic reasoning without labeled data, and few-shot conditions, where limited labeled samples guided learning.
Predictions were generated across three scenarios that excluded specific features (Date and Time, pH, and Water Temperature), and the outputs were compared against ground-truth labels as well as across models. Results revealed that Llama 4 with few-shot information was strongest, depending on the metric, when deprived of date/time information, with 80% accuracy and 78% F1 of precision-recall; Grok 3 consistently performed best when deprived of pH information, achieving the highest accuracy (78% zero-shot/81% few-shot) and macro F1 of precision-recall (77% zero-shot/80% few-shot); and while all models suffered losses when deprived of water temperature information, Gemini 2.5 delivered the top overall results, particularly in macro F1 (in the 70% range for both zero- and few-shot). ChatGPT-4o performed well in heuristic zero-shot reasoning but adapted poorly in few-shot learning, while Claude 4 and Llama 4 delivered steady midrange performance across most scenarios. We conclude that although no single LLM dominated in every case, Grok 3 emerged as the most reliable across conditions, and Gemini 2.5 demonstrated context-dependent strengths in the few-shot setting.
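For readers unfamiliar with the metrics quoted above, the following is a minimal Python sketch of how accuracy and macro F1 can be computed from binary presence/absence predictions. It is an illustration only, not the authors' evaluation code; the function names and the label encoding (1 = coliform present, 0 = absent) are assumptions made for this example.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 (harmonic mean of precision and recall) treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when the class is never true nor predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical labels (assumed encoding: 1 = coliform present, 0 = absent).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because macro F1 averages the per-class scores without weighting by class frequency, it penalizes a model that does well only on the majority class, which is one reason it is often reported alongside accuracy.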
These comparative findings further revealed that while all models relied on similar heuristic cues such as turbidity and sanitizer residuals, their ability to balance precision and recall varied across experimental conditions. We also found that date/time and pH can possibly confound and lower performance, while water temperature (in °C) turned out to be the most important variable that we tested. These findings highlight the potential of LLMs as decision-support tools for water quality monitoring, and the importance of cross-model evaluation to improve robustness and reliability in environmental applications. This suggests that combining or triangulating outputs from multiple LLMs could provide a more robust and reliable framework than relying on a single model in isolation.
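One simple way to combine outputs from multiple LLMs, as suggested above, is a per-sample majority vote. The sketch below is an assumption for illustration, not a method described in the abstract: the function name, the model keys, the label encoding (1 = coliform present, 0 = absent), and the tie-breaking rule (default to "present" as the more cautious call) are all hypothetical.

```python
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[int]], tie_label: int = 1) -> List[int]:
    """Combine binary predictions from several models by per-sample majority vote.

    `predictions` maps a model name to its list of 0/1 labels, one per sample.
    `tie_label` is returned when votes are evenly split (assumed default: 1).
    """
    if not predictions:
        raise ValueError("at least one model's predictions are required")
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) groups the votes of all models for each sample in turn.
    for votes in zip(*predictions.values()):
        positives = sum(votes)
        negatives = len(votes) - positives
        if positives > negatives:
            combined.append(1)
        elif negatives > positives:
            combined.append(0)
        else:
            combined.append(tie_label)
    return combined


# Hypothetical predictions from three placeholder models on three samples.
example = {
    "model_a": [1, 0, 1],
    "model_b": [1, 1, 0],
    "model_c": [0, 0, 1],
}
print(majority_vote(example))
```

With an odd number of models and binary labels, ties cannot occur, so the tie-breaking rule only matters when an even number of models is combined.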
This work was supported in part by a National Science Foundation grant #2107108 and by the Center for Produce Safety under a grant entitled "Development of a risk ranking tool for evaluating hazards and risks related to agricultural water subpart E".
Details
- Title
- Identifying Physiochemical Variables important to LLM Prediction of Coliform Bacteria Presence
- Creators
- Hamza Shahrukh Kiyani - Drexel University; Kristina Yoo - Drexel University; Hyunwoo Yoo - Drexel University; Gail Rosen - Drexel University; Hunter Quon - Arizona State University; Alyssa Rosenbaum - University of Arizona; Charles Gerba - University of Arizona; Kelly Bright - University of Arizona; Natalie Brassill - University of Arizona; Channah Rock - University of Arizona; Kerry Hamilton - Arizona State University
- Publication Details
- Companion Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 23
- Conference
- BCB Companion '25: Companion Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics
- Series
- ACM Conferences
- Publisher
- Association for Computing Machinery
- Number of pages
- 1
- Grant note
- National Science Foundation: 2107108 Center for Produce Safety
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Electrical and Computer Engineering; College of Computing and Informatics
- Web of Science ID
- WOS:001661442600022
- Other Identifier
- 991022138658004721