Statistical and machine learning contributions to the analysis of single-cell RNA sequencing data
Dissertation   Open access


Saishi Cui
Doctor of Philosophy (Ph.D.), Drexel University
Jun 2024
DOI: https://doi.org/10.17918/00010658

Abstract

Nucleotide sequence RNA
Single-cell RNA sequencing (scRNA-seq) has become widely adopted as a powerful technology for whole-transcriptome profiling of individual cells. Nevertheless, the complexity of scRNA-seq data calls for dedicated computational approaches. In this dissertation, I proposed novel statistical and machine learning solutions to existing bottlenecks in three key aspects of scRNA-seq data analysis. (1) Feature selection: In the first part, I introduced a framework that integrates Multiple Correspondence Analysis, graph-based community detection, and a new statistical test to improve the identification of highly variable genes, a step that enables downstream analyses of scRNA-seq data. (2) Cell type annotation: In the second part, I developed an ensemble learning framework that uses an oversampling strategy and a compositional data transformation to improve the precision of automated cell type annotation, addressing both the inherent imbalance of cell type proportions within biological samples and the compositional nature of sequencing-based gene expression quantification. (3) Trajectory-based functional gene set testing: In the last part, leveraging ideas from Functional Data Analysis (a branch of statistics dealing with data that consist of functions, curves, or trajectories), I developed a new method for piecewise functional two-sample testing of gene sets, enabling statistical comparison of a group of genes across the developmental progression or differentiation status of individual cells within a biological sample. Collectively, this dissertation contributes to the field by introducing statistical and machine learning approaches that address current limitations in three fundamental aspects of scRNA-seq data analysis.

