Thesis
Using accessory gene content to predict the clinical isolation source of Haemophilus influenzae
Master of Science (M.S.), Drexel University
Jul 2023
DOI: https://doi.org/10.17918/00001799
Abstract
Introduction: Haemophilus influenzae is a human-restricted bacterium that commonly colonizes the nasopharynx of healthy children. However, H. influenzae also causes a high disease burden due to its role in various infections and chronic disease conditions. This commensal-to-pathogen transition stems from complex factors, including the presence of co-infecting viruses; the genetics, immunocompetence, and environment of the human host; and phenotypic and genomic variation among H. influenzae strains, which includes high variation in accessory gene content. We hypothesized that accessory genes affect a given strain's potential to cause different diseases, and thus that the accessory gene profile of a strain could be used to predict its clinical isolation source.
Methods: We collected 1618 H. influenzae whole genome sequences (of which 896 were newly sequenced for this work) representing clinical isolates collected from diverse health states: (1) carriage (isolated from healthy subjects' nasal passages); (2) ear (from the middle ears of children with otitis media); (3) eye (from the eyes of children with conjunctivitis); (4) invasive (from blood or cerebrospinal fluid of children with sepsis, bacteremia, or meningitis); (5) lung (from sputum or other lung-associated specimens from subjects with chronic obstructive pulmonary disease, cystic fibrosis, or other lung disease); and (6) other (spanning various less well-sampled and rare disease states, or with ambiguous or unknown provenance). Using each strain's gene possession profile, we trained a decision tree-based XGBoost machine learning classifier to predict clinical isolation source based on the presence of 2040 features representing the accessory genes.
Results: An optimized XGBoost model correctly predicted the clinical provenance of H. influenzae isolates with significantly higher accuracy (62%) than naïve guessing (31%). Notably, accuracy varied substantially among the five classes, with carriage and lung isolates called with much higher accuracy than the other classes, especially middle ear isolates. Although performance correlated with class size, corrections for class imbalance and under-sampling yielded minimal improvement, and rarefaction analysis indicated that classes were reasonably well sampled. These data suggest that accessory genes contribute to some disease states more than others. Removal of encapsulated isolates, which are strongly associated with invasive infections, resulted in models with comparable overall performance (~62% accuracy). However, class-specific performance varied substantially, with a dramatic decrease in predictive capability for invasive and ear isolates, coupled with improved performance for isolates collected from conjunctivitis. Using recursive feature elimination, we identified a reduced set of 50 accessory gene features that predicted clinical isolation source with reasonably high accuracy (55%). Many of these highly informative genes had known or predicted functions in H. influenzae pathogenesis, including cell surface adhesins, toxin-antitoxin systems, and phage-associated recombinases.
Discussion: These results underline the importance of accessory gene content in influencing the pathogenic potential of H. influenzae. Furthermore, variation in performance by disease state suggests that accessory genes have varying influence across these different pathologies. More specifically, middle ear and invasive infections from unencapsulated H. influenzae may be less affected by variable accessory gene content than others, such as those isolated from chronic lung disease. Most top-ranking informative features were accessory genes with known functions associated with pathogenesis, but 14 of these encoded proteins with unknown functions, providing a list of candidate genes for future molecular mechanistic investigations.
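As a rough illustration of the inputs described in the abstract, the sketch below builds a binary gene presence/absence matrix restricted to accessory genes and computes a naïve baseline accuracy. It is not the thesis pipeline: the strain names, gene names, and labels are made up, the function names are hypothetical, and "naïve guessing" is assumed here to mean always predicting the most common class, which the abstract does not specify.

```python
from collections import Counter

def presence_absence_matrix(gene_sets):
    """Return (accessory_genes, rows): one 0/1 row per strain over accessory genes."""
    n_strains = len(gene_sets)
    # Count how many strains carry each gene (each strain's genes are a set).
    counts = Counter(g for genes in gene_sets.values() for g in genes)
    # Assumption: "accessory" means present in some strains but not all of them.
    accessory = sorted(g for g, c in counts.items() if 0 < c < n_strains)
    rows = {
        strain: [int(g in genes) for g in accessory]
        for strain, genes in gene_sets.items()
    }
    return accessory, rows

def majority_class_accuracy(labels):
    """Accuracy obtained by always guessing the most common class (assumed baseline)."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Toy, made-up data for illustration only.
toy_gene_sets = {
    "strainA": {"core1", "accX", "accY"},
    "strainB": {"core1", "accX"},
    "strainC": {"core1", "accY"},
    "strainD": {"core1"},
}
toy_labels = ["carriage", "lung", "carriage", "ear"]

accessory, rows = presence_absence_matrix(toy_gene_sets)
baseline = majority_class_accuracy(toy_labels)
print(accessory, rows, baseline)
```

In a setup like this, the 0/1 rows would serve as the feature vectors for a classifier, and the baseline gives the accuracy that a trained model needs to exceed to be informative.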
Metrics
61 File views/downloads
16 Record Views
Details
- Title: Using accessory gene content to predict the clinical isolation source of Haemophilus influenzae
- Creators: Kelvin Koser
- Contributors: Joshua Chang Mell (Advisor); Gabriele Romano (Advisor)
- Awarding Institution: Drexel University
- Degree Awarded: Master of Science (M.S.)
- Publisher: Drexel University; Philadelphia, Pennsylvania
- Number of pages: x, 158 pages
- Resource Type: Thesis
- Language: English
- Academic Unit: School of Biomedical Engineering, Science, and Health Systems (1997-2026); Drexel University
- Other Identifier: 991021227814104721