One of the major issues in biological research is that events happen continuously, while we sample them at only a few discrete time points. This is especially evident when studying mRNA expression. Because these experiments are expensive and difficult, they seldom have more than a handful of time points, on time scales ranging from hours to days. This time frame is much longer than the minimal time step of changes in mRNA. I have developed here a computational method that categorizes genes by their expression patterns over time. Starting from a time course of mRNA gene arrays, I fit each gene to the least complex segmented linear model that describes the time course well. In this fashion each gene was fit to either a single slope or multiple contiguous slopes. After normalizing the distribution of the fold changes of the slopes, each slope was assigned a general direction: up (u), down (d), or flat (f). We could thus fit all the genes in a single experiment to one of 39 patterns: 3 with a single slope (u, d, f), 9 with two slopes (uu, ud, uf, du, dd, df, fu, fd, ff), and 27 with three slopes (uuu, uud, uuf, udu, udd, udf, etc.). We could now ask not only which genes are over- or under-expressed at a given time point in a given response, but also whether there are general trends in the dynamics of specific genes. This new question is much less likely to be affected by the happenstance of when measurements are taken, as it relies on multiple time points rather than a single one. As a test case of our method we analyzed a published dataset of 11 gene arrays showing mRNA expression of monocyte-derived human dendritic cells at 0 to 18 hours post infection by Newcastle disease virus. Specifically, we checked whether any patterns of expression were evident among genes with transcription factor binding sites for members of the IRF and STAT gene families. We chose these because they are known to be important in the antiviral response. A clear pattern emerged at once.
Only 7 of the 39 categories had genes whose binding sites were significantly closer to the transcription start site (an indication of their reliability as putative target sites). Most of these categories involved up slopes in some constellation. In the genes driven by IRF we identified most particularly the 'uf', 'fu' and 'uuf' categories, while the STAT-activated genes also included some genes whose general pattern was one of down-regulation ('df' and 'fd'). Interestingly, from these patterns we could also identify differences in the extent of temporal control of up- and down-regulated genes. For both IRF and STAT transcription factors, all up-regulated genes appeared to stop raising their expression levels at ~8-10 hours. We could see this because in all the genes with IRF and STAT binding sites the timing of the final 'f' slope (i.e. the last slope in the 'uf' and 'uuf' categories) was at 8-10 hours post infection. We did not find such a pattern in the down-regulated genes, which included genes that stopped decreasing at every time point from 2 to 12 hours post infection (the limit of our range of analysis). One caveat of our study is that 11 time points is potentially the lower bound on the data needed for this analysis. Following FDR correction we could only consider ~1700 genes for a 3-slope model. Despite this, the slope method did lead to some interesting results, relating different transcription factors to patterns of gene activation and determining whether these patterns are specifically constrained in time. It is my hope to further develop this model in the future and use it to study other cellular responses.
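The pattern scheme described in the abstract (3 + 9 + 27 = 39 categories) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract states only that the distribution of slope fold changes was normalized, so the z-score normalization, the cutoff of 1.0, and the function names `all_patterns`, `label_slope`, and `pattern_for_gene` are all assumptions made for illustration.

```python
from itertools import product

# Direction labels used in the abstract: up, down, flat.
DIRECTIONS = ("u", "d", "f")


def all_patterns(max_segments=3):
    """Enumerate every direction string with 1..max_segments slopes."""
    return [
        "".join(combo)
        for n in range(1, max_segments + 1)
        for combo in product(DIRECTIONS, repeat=n)
    ]


def label_slope(z, threshold=1.0):
    """Map a normalized slope to 'u', 'd', or 'f'.

    The threshold is a hypothetical cutoff, not a value from the thesis.
    """
    if z > threshold:
        return "u"
    if z < -threshold:
        return "d"
    return "f"


def pattern_for_gene(slopes, mu, sigma, threshold=1.0):
    """Label each fitted slope of one gene and join the labels.

    mu and sigma are the mean and standard deviation of the slope
    distribution across all genes (assumed z-score normalization).
    """
    return "".join(label_slope((s - mu) / sigma, threshold) for s in slopes)


patterns = all_patterns()
print(len(patterns))                             # number of categories
print(pattern_for_gene([2.0, 0.1], 0.0, 1.0))    # two-slope gene
```

Under these assumptions, a gene whose two fitted slopes normalize to a large positive value followed by a near-zero value falls in the 'uf' category, one of the IRF-associated patterns noted above.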
Details
Title
The use of segmented linear models to analyze gene array time course experiments
Creators
Bailu Xu - DU
Contributors
Uri Hershberg (Advisor) - Drexel University (1970-)
Awarding Institution
Drexel University
Degree Awarded
Master of Science (M.S.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Resource Type
Thesis
Language
English
Academic Unit
Chemical (and Biological) Engineering [Historical]; College of Engineering (1970-2026); Drexel University