CRISPR-associated protein 9; Long read sequencing; Optical mapping; Structural variant; Target enrichment; Molecular Biology
The last two decades have seen tremendous advances in tools such as CRISPR, single-molecule sequencing, optical genome mapping, and single-cell omics. Genomic analyses of multiple individuals have become important for addressing medical questions, and advances in genomic technologies and methods contribute directly to improved outcomes in disease research and healthcare. Past genome-wide association studies show the limitations of analyzing the factors that contribute to human traits and diseases using only short-read amplicon sequencing, especially in repetitive genomic stretches where structural variations are more likely. While short-read sequencing is a highly evolved technology, discovery of causative variants with it alone is impeded by its read length; depending on the assay, it is limited to gene coding regions, and less penetrant variants are often missed. A recent report found that only about one in three known genetic disorders in children has been diagnosed successfully with short-read sequencing (SRS, including whole exome and whole genome assays). On the other hand, growing evidence that variants in extragenic regions can be driver mutations for disease shows a need to capture both intragenic and extragenic information. Long-read platforms, including optical genome mapping and Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT) sequencers, offer improved contiguity and long-range haplotype phasing, and therefore better identification of disease-associated structural variants, especially where alleles extend beyond typical read length.
They are better suited to capturing large structural variations, their genotypes, and copy number alterations, but practical challenges prevent routine adoption of long-read platforms: lack of base-level resolution for optical mapping, and high error rates, cost per sample, low throughput, and resource-heavy informatics analysis for long-read sequencing. Targeted interrogation of the genome, in lieu of whole-genome analysis, alleviates some of these challenges. CRISPR-Cas9-based targeting methods offer programmability and are therefore increasingly used with long-read platforms. This thesis discusses novel applications of experimental methods based on CRISPR-Cas9 nickase (Cas9n) for targeted interrogation of genomic features on long-read sequencing and optical mapping platforms. First, the linked-pair sequencing strategy enabled by CRISPR-Cas9n chemistry is described. Here, DNA molecules are fragmented in an orderly fashion using CRISPR-Cas9n and pairs of sgRNAs such that the resulting fragments share identical termini. This strategy is capable of greater multiplicity and can assemble critical genetic loci effectively. The validity and efficacy of the method were first demonstrated on lambda phage and Haemophilus influenzae models, and later by sequencing a cancer panel of 100 full-length genes in the human genome. When the designed linker sequences contained heterozygous genetic variants, long haplotypes could be established. These whole-gene haplotypes enable the study of genetic variants at non-coding regulatory elements and the detection of allele-specific effects. Using Cas9n to introduce nicks allowed preferential ligation of sequencing adapters, resulting in significant enrichment of target regions. Second, a multicolor whole-genome labeling technique for optical mapping, enabled by CRISPR-Cas9n, is described. Here, Cas9n is used to make directed nicks at target sites to incorporate fluorophores.
This technique was used to detect and quantify known biomarkers (telomeres and D4Z4 units) and to locate LINE-1 insertions across the genome. With Cas9-mediated nick-labeling, it is possible to target and fluorescently label any 20mer, or a combination of multiple 20mers, across the whole genome, especially in repetitive regions lacking DLE motifs. Custom maps can be generated to detect breakpoints precisely and to interrogate repetitive sequences, enabling more in-depth analysis of structural variants (SVs) than was previously possible. Application of this method to the interrogation of exogenous DNA integrations in the host genome, for gene therapy validation and safety assessment, is also demonstrated. Finally, additional applications of Cas9n chemistry that combine long-read sequencing and optical genome mapping platforms are demonstrated. Multicolor Cas9n-mediated labeling chemistry is leveraged to create high-density optical maps with increased resolution for structural variant analysis, breakpoint calling, and genome assembly scaffolding. Such high-density optical maps can also be used to study interactions of Cas9 with target DNA.
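As an illustration of how a custom in-silico label map for a chosen 20mer might be produced, the following is a minimal sketch and not the method used in the thesis. It assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG PAM immediately 3' of the 20-nucleotide target; the abstract does not state which Cas9 variant or PAM was used. The function names `reverse_complement` and `find_label_sites` are hypothetical. Matching is exact only, so mismatch tolerance and the choice of nicked strand are not modeled.

```python
import re

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_label_sites(genome: str, target: str) -> list[tuple[int, str]]:
    """List expected label sites for one 20mer target.

    Returns sorted (position, strand) pairs, where position is the
    0-based forward-strand coordinate at which the 20mer (or its
    reverse complement) starts. Assumption: an NGG PAM must directly
    follow the target on the targeted strand.
    """
    genome = genome.upper()
    target = target.upper()
    if len(target) != 20 or set(target) - set("ACGT"):
        raise ValueError("target must be a 20-nucleotide A/C/G/T sequence")

    sites = []
    # Forward strand: 20mer followed by NGG. The lookahead allows
    # overlapping matches, which matter in repetitive regions.
    for m in re.finditer(f"(?={target}[ACGT]GG)", genome):
        sites.append((m.start(), "+"))
    # Reverse strand: seen on the forward strand as CCN followed by
    # the reverse complement of the 20mer.
    rc = reverse_complement(target)
    for m in re.finditer(f"(?=CC[ACGT]{rc})", genome):
        sites.append((m.start() + 3, "-"))
    return sorted(sites)


if __name__ == "__main__":
    demo_target = "GATTACAGATTACAGATTAC"  # arbitrary illustrative 20mer
    demo_genome = (
        "TTT" + demo_target + "AGG" + "AAAA"
        + "CCT" + reverse_complement(demo_target) + "TT"
    )
    for pos, strand in find_label_sites(demo_genome, demo_target):
        print(pos, strand)
```

Running the same search for several 20mers, with one color assigned per target, would give a simple multicolor reference pattern to compare against observed label positions.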
Details
Title
CRISPR-Cas9 mediated methods for targeted genomic analyses using long-read platforms
Creators
Lahari Uppuluri
Contributors
Matthew McCarthy (Advisor)
Awarding Institution
Drexel University
Degree Awarded
Doctor of Philosophy (Ph.D.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Number of pages
xiv, 127 pages
Resource Type
Dissertation
Language
English
Academic Unit
College of Engineering (1970-2026); Mechanical Engineering (and Mechanics) [Historical]; Drexel University