Medical sciences Bioinformatics Congenital Heart Disease
The 22q11.2 Deletion Syndrome (22q11DS) is a congenital malformation disorder and the most frequent microdeletion syndrome in humans [1]. It has a prevalence of 1 in every 3000 live births [1,2] and 1 in every 1000 pregnancies [3]. Affected individuals face significant medical issues, which may include congenital cardiac defects (~75%), immune deficiencies, speech/language defects, intellectual disabilities, and a 25-30% risk of developing schizophrenia in adolescence or adulthood [2]. The causative deletion of 22q11DS occurs as a de novo event in meiosis, and 90% of affected individuals have a hemizygous 3 million base pair (Mbp) deletion in the chromosome 22q11.2 region [2]. The mechanism responsible for the deletion is non-allelic homologous recombination (NAHR) between surrounding low copy repeats specific to chromosome 22 (LCR22s) [4,5]. There are 8 LCRs on chromosome 22, labeled alphabetically from LCRA to LCRH, from centromere to telomere on the long (q) arm [5]. The LCR22s are composed of sequence modules of varying lengths containing interspersed genes and pseudogenes. Sequence analysis of these modules reveals a complex organization of duplicated modules. The most frequent ~3Mbp deletion is immediately flanked by LCRA and LCRD, two of the largest LCR22s at approximately 240kbp each [2,5,9]. LCRA and LCRD each contain a direct, highly homologous (>99% sequence identity) 160kbp repeat [6,9,10]. Their large, near-identical segments make the LCR22s substrates for NAHR, leading to genomic rearrangements. Unfortunately, these same characteristics make it difficult to sequence the LCR22s reliably and to identify rearrangement breakpoints within them in individuals with 22q11DS.
While significant progress has been made toward elucidating the genomic structures of 22q11.2 and the mechanisms leading to the causative deletion, any predisposing structures and the exact location of the deletion breakpoints remain unknown. Currently, there is no complete and specific model of the NAHR mechanism. Such a model would require completely contiguous LCR22 modules on each chromosome 22 homolog in the parent in whom the homologous recombination occurred, along with the resulting 22q11.2 deletion-containing haplotype in this parent's 22q11DS proband. Numerous genomic disorders arise from NAHR of LCRs specific to other chromosomes [6,7,8]. Predisposition to Williams-Beuren syndrome [11] and 16p12.1 microdeletions [12] has been linked to copy number variation of subunits within LCRs. Copy number variation in LCR22 modules has yet to be linked to predisposition to 22q11DS, as the complex arrangement and large size of the LCR22s have made the identification of any NAHR-driving sequences difficult. To complicate matters, the last two human genome reference assembly builds, hg19 (GRCh37) and hg38 (GRCh38.p11), contain large gaps in sequence, predominantly in LCRA. Additionally, LCRB contains an AT-rich palindrome in the center of its mapped location, leaving mischaracterized sequence [13,14]. The sum of gaps, large identical stretches of sequence modules, tandem repeats, and variants of all classes and sizes makes it difficult to define where the chromosomal breakage and exchange leading to the 3Mbp deletion occur. These same characteristics hamper the identification of any specific LCR22 configurations that might increase the risk of NAHR. Duplicate modules in LCRs are assigned orientations relative to the human genome reference: typically, the orientation of the module in the reference is designated "forward" or "direct".
Flanking LCRs with inverted modules may influence inter- or intrachromosomal rearrangements, creating inversion polymorphisms [15,16]. If present, inversions can hamper the meiotic pairing of homologous chromosomes and are predisposing factors in NAHR [19]. Additionally, variations in LCR22s may influence NAHR events, but a larger cohort of individuals is required to confirm this association [21]. It is possible that any combination of inversions and/or other variants predisposes to the events leading to NAHR, highlighting the importance of large-scale, population-based studies exhaustively detailing every aspect of the 22q11.2 region. The remaining issues in the 22q11DS region cannot be solved with current sequencing technologies alone. While whole-genome NGS approaches have enabled thousands of human genomes to be sequenced, the technologies are still not sufficient to sequence a whole genome end-to-end, with phased chromosomes, without leaving gaps in structurally complex regions. Combinations of genome mapping and phased sequencing technologies have shown promising results in previous studies but have yet to be applied to a region as complex as 22q11.2. Given the current state of whole-genome approaches, optimizing complementary mapping and sequencing technologies to resolve structural variation in this extremely complex region of the genome, while discerning parental origin, will provide an innovative and comprehensive approach to understanding the mechanism giving rise to 22q11DS. Here, the development of a comprehensive whole-genome approach is described, leveraging the increased sensitivity afforded by long single-molecule optical mapping on nanochannel arrays coupled with 10x Genomics (10xG) Linked-Read whole-genome sequencing and the CRISPR-Cas9 labeling system.
This combination of technologies, along with novel informatics approaches, will elucidate the previously unmapped structure and variation of the chromosome 22 LCRs and surrounding regions. This will provide enhanced insight into the role of variable genetic structures in producing 22q11DS and its associated phenotypes. Our lab's preliminary studies show previously unobserved variability in LCR22 structures, arising through the 160kbp modules. Typical sequencing approaches fail because the 160kbp modules cause read pile-ups, and these approaches cannot discriminate between LCRA- and LCRD-specific sequences. The proposed approach may determine whether specific haplotypes predominate in the parent-of-deletion-origin. This approach will likely serve as a paradigm, providing resources for the analysis of numerous other significant regions of the genome that have resisted accurate characterization because of the presence and complexity of other LCRs, many of which cause disease. An added advantage of using mapping as an upfront technology to drive sequencing is its high throughput: whole human genomes can be mapped in less than one week. This enables the large-scale structures of the LCR22s to be determined quickly and efficiently. Using this information, assembled and/or mapped sequence data may be assigned to the proper high-homology LCR22 modules. The coordinates of the deletion breakpoint may then be homed in on using three different methods. First, nucleotide differences between LCR22-specific modules in 22q11.2 may be identified and optically mapped with targeted gRNAs using the CRISPR-Cas9 labeling system. Second, to obtain sequence-based information and changes from the reference genome, the medium-range heterozygous variant linking ability of 10xG allows crossing from accessible, known mapped regions into unknown, highly repetitive regions.
Finally, label polymorphisms between repeat copies in 22q11.2, which may be detected using the DLE-1 optical mapping labeling system, are used to distinguish individual duplicons. By incorporating these methods, the complex configurations in 22q11.2 were disentangled and locally contiguous haplotypes were produced for each LCR22 region. Using this information, insight was gained into the genetic mechanisms involved in the recombination leading to the predominant 22q11.2 deletion. Overall, this work has produced effective mapping and sequencing approaches for use in other difficult-to-analyze genomic regions and for the eventual creation of pre-diagnostic tests that might aid in preconception screening for 22q11DS risk. This platform has many advantages over the systematic use and expense of current sequencing technologies, which fail to resolve the 22q11.2 region altogether. Using our methods, NAHR has been observed for the first time, and these methods may be applied to future genomes, enabling the direct observation of genomic structures participating in the NAHR that leads to the 22q11.2 deletion.
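The first of the three methods, placing CRISPR-Cas9 labels on nucleotide differences between near-identical LCR22 modules, can be illustrated with a minimal sketch. This is not the dissertation's pipeline: the function names, the toy sequences, and the simplifications (forward strand only, an NGG PAM, exact-match comparison with no mismatch tolerance) are assumptions made purely for illustration.

```python
import re


def cas9_sites(seq, guide_len=20):
    """Map each forward-strand target (guide + NGG PAM) to its start positions.

    Hypothetical helper: only the forward strand and an NGG PAM are considered.
    """
    seq = seq.upper()
    sites = {}
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d}[ACGT]GG))" % guide_len)
    for match in pattern.finditer(seq):
        sites.setdefault(match.group(1), []).append(match.start())
    return sites


def copy_specific_sites(copy_a, copy_b, guide_len=20):
    """Return targets found in copy_a whose guide + PAM never occurs in copy_b.

    Exact-match comparison only; a real design would also score near-matches.
    """
    sites_a = cas9_sites(copy_a, guide_len)
    sites_b = cas9_sites(copy_b, guide_len)
    return {site: pos for site, pos in sites_a.items() if site not in sites_b}


if __name__ == "__main__":
    # Toy stand-ins for two repeat copies differing by one base in the PAM.
    guide = "ACGTACGTACGTACGTACGT"
    copy_a = "TTTT" + guide + "TGG" + "AAAA"
    copy_b = "TTTT" + guide + "TGA" + "AAAA"
    print(copy_specific_sites(copy_a, copy_b))
```

In this toy case, a single-base difference removes the PAM from the second copy, so a label would be placed on the first copy only. This is the kind of copy-specific signal that could help tell two otherwise near-identical modules apart on an optical map.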
Details
Title
Analysis of Genomic Structures Involved in 22q Deletion Syndrome
Creators
Steven Pastor - DU
Contributors
Ming Xiao (Advisor) - Drexel University (1970-)
Awarding Institution
Drexel University
Degree Awarded
Doctor of Philosophy (Ph.D.)
Publisher
Drexel University; Philadelphia, Pennsylvania
Number of pages
xiv, 204 pages
Resource Type
Dissertation
Language
English
Academic Unit
School of Biomedical Engineering, Science, and Health Systems (1997-2026); Drexel University