Development and application of microarray technology in biological research have led to the compilation of expression and sequence data on a genome-wide scale. Given the volume of data produced and the complexity of gene regulatory mechanisms, extracting meaningful biological information can be difficult. Classification can reduce this complexity by detecting genes, genetic loci, or conditions that share common attributes, and by identifying gene expression patterns or genotypes associated with phenotype. In cancer research, supervised classification has been applied to identify gene expression biomarkers of different disease states. Clinically validated biomarkers are valuable indicators for diagnosis and for guiding therapeutic strategy. We developed an iterative machine learning algorithm to compare the predictive value of biomarker sets chosen by supervised classification against sets selected randomly from known disease-related genes. Both supervised classification and feature selection based on prior knowledge discriminated between molecular phenotypes in breast cancer and lymphoma.

Compilation of gene expression data has also led to the identification of genes with bimodal, or switch-like, expression patterns. We used unsupervised, supervised, and model-based classification methods to investigate the biological relevance of bimodal expression patterns and to evaluate their potential for class discovery and prediction. Both model-based and supervised classification accurately classified samples by tissue phenotype or infectious disease. Functional enrichment analysis indicates that switch-like genes are involved in tissue-specific or immune response functions. Taken together, this evidence supports the assertion that bimodal expression patterns are biologically relevant.
The clinical relevance of bimodal expression patterns was investigated in an association study using genotypes from families affected by autism. A subset of neural-specific switch-like genes was used to identify candidate gene regions that may contain genetic variants associated with autism risk. A two-stage family-based association test detected an autism susceptibility locus in the q26 region of chromosome 10 (10q26). The coding region of the fibroblast growth factor receptor 2 (FGFR2) gene lies 80 kilobases downstream of the identified locus. Altered expression of FGFR2 may be a contributing genetic factor in the development of autism. Identification of this susceptibility locus motivates new hypotheses concerning the molecular basis of autism. In addition, we provide a method for integrating gene expression and genotype data that may lead to the identification of disease-related polymorphisms in other disorders.
Details
Title
Classification of tissues and disease subtypes using whole-genome signatures
Creators
Michael P. Gormley - Drexel University
Contributors
Aydin Tözeren (Advisor) - Drexel University (1970-)
Awarding Institution
Drexel University
Degree Awarded
Doctor of Philosophy (Ph.D.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Resource Type
Dissertation
Language
English
Academic Unit
School of Biomedical Engineering, Science, and Health Systems (1997-2026); Drexel University