Content based search in gene expression databases and a meta-analysis of host responses to infection
Dissertation   Open access

Francis X. Bell
Doctor of Philosophy (Ph.D.), Drexel University
Nov 2015
DOI: https://doi.org/10.17918/etd-7254

Abstract

Keywords: Meta-analysis, Biomedical Engineering, Gene Expression
The expression of a gene is a function of the number of times that the information encoded by the gene is transcribed. Gene expression levels are typically estimated by measuring the concentration of RNA molecules. The high-throughput technology of cDNA microarrays allows the RNA concentrations for thousands of genes to be estimated simultaneously. Databases of gene expression studies have been growing, but the data they contain are not fully interpreted: cross-comparison of all gene expression studies requires large amounts of memory and computing time, which makes content-based searching in these databases impractical. To improve the efficiency of searching gene expression databases, we introduce a method for representing gene expression data in binary format, inspired by the use of binary fingerprints in chemoinformatics applications. We test whether binary representations of significant gene lists from gene expression studies are appropriate for this purpose by running cross-validation experiments on small and large benchmark datasets. Among the numerical and binary distance measures surveyed, the modified Tanimoto distance was found to provide the best accuracy-speed trade-off: it identifies many relevant profiles as similar to a query profile, and the distance calculation can be executed efficiently using fast bit-wise operations.
The availability of data from different gene expression experiments in public databases provides an opportunity to perform meta-analysis and obtain common expression profiles across experiments that may not be apparent from individual studies. We have undertaken a meta-analysis of host responses to infections, utilizing both gene and miRNA expression studies. Common and unique differential expression patterns in response to pathogens from different taxonomic groups are identified, and gene set enrichment analysis is performed to identify the affected biological pathways.
Our findings corroborate known common host response mechanisms, while also identifying novel expression profiles that are important for pathogen-specific responses. Some of the significantly differentially expressed genes we have identified are involved in pathogen recognition, chromosome and proteasome assembly, and common pathogen evasion mechanisms including TNF signaling and apoptosis. Our findings identify under-studied classes of pathogens and also provide insights into pathogen-specific evasion and response mechanisms.
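
To illustrate the binary fingerprint comparison described in the abstract, here is a minimal Python sketch. It assumes each study's significant gene list is packed into an integer bitmask, and it computes the standard Tanimoto (Jaccard) distance with bit-wise operations. The abstract does not specify the dissertation's modification to this measure, so it is not reproduced here. The function names `fingerprint` and `tanimoto_distance`, the gene symbols, and the bit positions are all hypothetical and chosen only for illustration.

```python
def fingerprint(significant_genes, gene_index):
    """Pack a list of significant genes into an integer bitmask.

    gene_index maps each gene symbol to a fixed bit position
    (a hypothetical layout, assumed here for illustration).
    """
    bits = 0
    for gene in significant_genes:
        bits |= 1 << gene_index[gene]
    return bits


def tanimoto_distance(a, b):
    """Standard Tanimoto distance between two bitmask fingerprints.

    This is the unmodified measure: 1 - |A and B| / |A or B|.
    """
    union = bin(a | b).count("1")
    if union == 0:
        # Two empty fingerprints are treated as identical (an assumption).
        return 0.0
    shared = bin(a & b).count("1")
    return 1.0 - shared / union


# Hypothetical four-gene universe and two significant gene lists.
gene_index = {"TNF": 0, "IL6": 1, "CASP3": 2, "TLR4": 3}
query = fingerprint(["TNF", "IL6", "TLR4"], gene_index)
profile = fingerprint(["TNF", "CASP3", "TLR4"], gene_index)

print(tanimoto_distance(query, profile))  # 2 shared of 4 in the union -> 0.5
```

Because each comparison reduces to one AND, one OR, and two bit counts, a query fingerprint can be compared against many stored profiles cheaply, which is the speed advantage the abstract attributes to binary representations.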
