Dissertation
Statistical and machine learning contributions to the analysis of single-cell RNA sequencing data
Doctor of Philosophy (Ph.D.), Drexel University
Jun 2024
DOI: https://doi.org/10.17918/00010658
Abstract
Single-cell RNA sequencing (scRNA-seq) has been widely adopted as a powerful technology that enables whole-transcriptome profiling of individual cells. Nevertheless, the complexity of scRNA-seq data means that its analysis requires specialized computational approaches. In this dissertation, I proposed novel statistical and machine learning solutions to existing bottlenecks in three key aspects of scRNA-seq data analysis. 1) Feature selection: In the first part of my dissertation, I introduced a framework that integrates Multiple Correspondence Analysis, graph-based community detection, and a new statistical test to improve the identification of highly variable genes, which in turn supports downstream analyses of scRNA-seq data. 2) Cell type annotation: In the second part, I developed an ensemble learning framework that improves the precision of automated cell type annotation. It combines an oversampling strategy, which addresses the inherent imbalance of cell type proportions within biological samples, with a compositional data transformation, which accounts for the compositional nature of sequencing-based gene expression quantification. 3) Trajectory-based functional gene set testing: In the last part, drawing on Functional Data Analysis (a branch of statistics dealing with data sets consisting of functions, curves, or trajectories), I developed a new method for piecewise functional two-sample testing of gene sets. This method enables statistical comparison of a group of genes across the developmental progression or differentiation status of individual cells within a biological sample. Collectively, this dissertation introduces statistical and machine learning approaches that address current limitations in three fundamental aspects of scRNA-seq data analysis.
Details
- Title
- Statistical and machine learning contributions to the analysis of single-cell RNA sequencing data
- Creators
- Saishi Cui
- Contributors
- Issa Zakeri (Advisor)
- Sina Nassiri (Advisor)
- Awarding Institution
- Drexel University
- Degree Awarded
- Doctor of Philosophy (Ph.D.)
- Publisher
- Drexel University; Philadelphia, Pennsylvania
- Number of pages
- ix, 171 pages
- Resource Type
- Dissertation
- Language
- English
- Academic Unit
- Dana and David Dornsife School of Public Health; Epidemiology and Biostatistics; Drexel University
- Other Identifier
- 991021890213004721