CRISPR-associated protein 9; DNA combing; Optical mapping; Single molecule; Biomedical Engineering; DNA Sequencing; Engineering; Genomics
Since the completion of the Human Genome Project in 2003, structural and genetic factors within the genome have been continuously mapped to diseases not previously considered to have a genetic etiology. Although small disease-associated genetic variants have been identified, structural variations (SVs), particularly those larger than 50 bp, have not been well characterized using short-read sequencing (SRS). Large SVs can have a profound influence on many complex diseases. Long-read sequencing (LRS) and genome mapping are emerging technologies for revealing genomic variations of all sizes. Current LRS technologies, PacBio's SMRT and Oxford Nanopore sequencing, have read lengths of 10-20 kbp, which are insufficient to detect megabase-scale SVs. Moreover, their high base-calling error rates (up to 15%) necessitate combining them with SRS data. Optical mapping technology extracts sequence-motif information from 300 kbp-long linearized DNA molecules. This substantially longer read length largely addresses the read-length limitations for de novo genome assembly and large-SV detection in complex genomes. However, optical mapping must be combined with sequencing to obtain complete sequence information, making the entire process expensive and resource intensive. Although suitable for discovering large SVs, current approaches are not viable for large-scale clinical diagnostics. Hence, there is a need for an economical single-molecule long-read sequencing technology that offers megabase-scale reads with single-base resolution, high accuracy, and high throughput. In this project, we developed a single-DNA analysis device that uses molecular combing to isolate and linearize megabase-long DNA molecules. These molecules can act as templates from which sequence information is obtained directly by optical means.
The unique capability of the device is the application of on-surface enzymatic reactions to ultra-long DNA molecules, which opens up various modalities for interrogating genomic loci, including optical mapping and sequencing. At the device's core is a micropatterned substrate with two opposing surface functionalities: one binds DNA ends, and the other passivates the surface against DNA, proteins, and fluorophores. This surface design generates highly ordered adsorption of DNA, prevents overstretching of individual molecules (thereby permitting sequence recognition), and provides a low-fluorescence background for efficient fluorescent labeling chemistries. First, the development of the novel substrate is discussed, followed by a demonstration of ordered DNA linearization on the optimized substrates. Next, combed DNA on the substrate was tested for enzymatic activity by performing on-surface transcription. Detection of bright fluorescent labels confirmed unambiguously that combed DNA can be modified enzymatically. Subsequently, an optical mapping experiment was performed by generating single-strand nicks in λ-DNA molecules using Nb.BbvCI and then incorporating fluorescent nucleotides with Klenow fragment (exo-). Labeled molecules showed a high degree of alignment against the reference, confirming sequence specificity and the ability to perform multiple sequential reactions. To assess whether single-base detection was possible, a similar set of experiments was performed with fluorescent dideoxynucleotides. The results indicated successful single-base incorporation, encouraging further development towards base-by-base sequencing. These results were achieved through developments in various aspects of the device, primarily the incorporation of an optimized hydrogel overlay to protect and stabilize combed DNA molecules, and the construction of microliter flow cells for efficient implementation of enzymatic reactions.
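The reference against which nick-labeled molecules are aligned in an optical mapping experiment can be illustrated with a minimal sketch. It assumes that the labeling sites coincide with occurrences of the 7-bp BbvCI recognition motif CCTCAGC on either strand; the function names and the toy sequence are hypothetical, chosen for illustration, and are not taken from this work.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def motif_sites(seq: str, motif: str = "CCTCAGC") -> list[int]:
    """0-based start positions of the motif on either strand.

    The motif is assumed to be the BbvCI recognition sequence. It is not
    palindromic, so its reverse complement is searched as well.
    """
    seq = seq.upper()
    hits = set()
    for pattern in (motif, reverse_complement(motif)):
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def label_intervals(sites: list[int]) -> list[int]:
    """Distances in bp between consecutive expected label positions."""
    return [b - a for a, b in zip(sites, sites[1:])]


# Toy sequence (hypothetical): one site on each strand.
toy = "AAAA" + "CCTCAGC" + "TTTTT" + "GCTGAGG" + "AA"
sites = motif_sites(toy)
print(sites, label_intervals(sites))
```

In practice the input would be the full reference sequence (for example, the λ-DNA genome), and the resulting interval pattern would be compared with the label spacing measured along each combed molecule.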
Additionally, fully automated microscope imaging instruments were built to scan large areas of the device and detect single fluorophores. A novel, rapid, sequence-specific labeling chemistry based on CRISPR-dCas9 was developed and tested. Successful labeling of combed DNA was observed, and alignment to the reference was compared with that of a state-of-the-art nanochannel-based assay. A high error rate was observed, and further work to minimize it, through gRNA design and potential dCas9 protein modifications, might be necessary. In summary, the design and development of a novel, inexpensive single-DNA analysis device capable of megabase read lengths is described.